Searching indexed data with Apache Lucene
Now that you have indexed your data, this recipe searches it using Apache Lucene. The search code depends on the index that you created in the previous recipe, so it will run successfully only if you followed the instructions in that recipe.
Getting ready
- Complete the previous recipe, then go to the index directory that you created in your project in step 11 of that recipe. Make sure that it contains some index files.
- Create a Java file named SearchFiles in the org.apache.lucene.demo package you created in the previous recipe.
- Now you are ready to type in some code in the SearchFiles.java file.
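The check on the index directory can also be done in code. The following is a minimal sketch that uses only the Java standard library and assumes that the index directory is named index relative to the working directory; the class name IndexDirectoryCheck and the method hasIndexFiles are hypothetical names chosen for this illustration. It only confirms that the directory exists and is not empty; it does not verify that the files form a usable Lucene index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class IndexDirectoryCheck {

    // Returns true if the path is an existing directory with at least one entry.
    // This does NOT validate that the entries are a usable Lucene index.
    static boolean hasIndexFiles(Path indexDir) {
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        // Files.list returns a stream that must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(indexDir)) {
            return entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the index lives in a directory named "index"
        // relative to the directory from which the program is started.
        Path indexDir = Paths.get("index");
        if (hasIndexFiles(indexDir)) {
            System.out.println("Found index files in " + indexDir.toAbsolutePath());
        } else {
            System.out.println("No index files in " + indexDir.toAbsolutePath()
                    + "; run the indexing recipe first.");
        }
    }
}
```

If the program reports that no index files were found, rerun the previous recipe before continuing.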
How to do it...
- Open SearchFiles.java in the Eclipse editor and create the following class:
public class SearchFiles {
- You need to create two constant String variables. The first variable contains the path of the index that you created in the previous recipe. The second variable contains the name of the field in which you will be searching. In our case, we will be searching in the contents field of the index:
public static final String INDEX_DIRECTORY = "index";
public static final String FIELD_CONTENTS = "contents";
- Start creating your main method:
public static void main(String[] args) throws Exception {
- Create an IndexReader by opening the index in your index directory:
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_DIRECTORY)));
- The next step will be to create a searcher that will search the index:
IndexSearcher indexSearcher = new IndexSearcher(reader);
- As your analyzer, create a standard analyzer:
Analyzer analyzer = new StandardAnalyzer();
- Create a query parser by providing two arguments to the QueryParser constructor, the field where you will be searching and the analyzer you have created:
QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
- In this recipe, you will be using a predefined search term. In this search, you are trying to find the documents that contain both "over-full" and "persuasion":
String searchString = "over-full AND persuasion";
- Using the search string, create a query:
Query query = queryParser.parse(searchString);
- The searcher looks into the index for documents that match the query. You also specify the maximum number of search results to return, which in our case is 5:
TopDocs results = indexSearcher.search(query, 5);
- Create an array to hold the hits:
ScoreDoc[] hits = results.scoreDocs;
- Note that during indexing, we used only one document, shakespeare.txt, so the length of this array can, in our case, be at most 1.
- You will also be interested in knowing the number of documents in which the search found a hit:
int numTotalHits = results.totalHits;
System.out.println(numTotalHits + " total matching documents");
- Finally, iterate through the hits. For each hit, you get the ID of the matching document. With the document ID, you then retrieve the document and print its path along with the score that Lucene calculated for it for the search term you used:
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = indexSearcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("path") + " score=" + hits[i].score);
}
- Close the method and the class:
}
}
- If you run the code, it prints the total number of matching documents, followed by one numbered line per hit that shows the document's path and score.
- Open the shakespeare.txt file in the input folder of your project. Search manually, and you will find that both "over-full" and "persuasion" are present in the document.
- Change the searchString in step 8, as follows:
String searchString = "shakespeare";
- Keep the rest of the code as it is. If you run the code now, it reports 0 total matching documents and lists no hits.
- Open the shakespeare.txt file again and double-check whether the term shakespeare appears in it. You will find that it does not.
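The two searches above behave differently because an AND query matches only documents that contain every term, and a term that was never indexed matches nothing. The following is a toy sketch of that idea in plain Java. It is not how Lucene is implemented: the class ToyConjunctiveSearch, its methods, its crude tokenizer (lowercase, split on anything that is not a letter or digit), and the two sample texts are all invented for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyConjunctiveSearch {

    // Maps each term to the IDs of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Records every term of the text under the given document ID.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the IDs of documents that contain ALL terms of the query.
    Set<Integer> searchAll(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its documents
            } else {
                result.retainAll(docs);         // later terms: keep the intersection
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyConjunctiveSearch index = new ToyConjunctiveSearch();
        // Invented sample texts, not taken from shakespeare.txt.
        index.add(0, "made-up text with persuasion and an over-full cup");
        index.add(1, "made-up text with persuasion only");

        System.out.println(index.searchAll("over-full persuasion")); // prints [0]
        System.out.println(index.searchAll("shakespeare"));          // prints []
    }
}
```

Only document 0 contains every term of the first query, and no document contains the term of the second query, so the second result is empty.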
The complete code for this recipe is as follows:
package org.apache.lucene.demo;

import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchFiles {
    public static final String INDEX_DIRECTORY = "index";
    public static final String FIELD_CONTENTS = "contents";

    public static void main(String[] args) throws Exception {
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(INDEX_DIRECTORY)));
        IndexSearcher indexSearcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();
        QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
        String searchString = "shakespeare";
        Query query = queryParser.parse(searchString);
        TopDocs results = indexSearcher.search(query, 5);
        ScoreDoc[] hits = results.scoreDocs;
        int numTotalHits = results.totalHits;
        System.out.println(numTotalHits + " total matching documents");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = indexSearcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("path") + " score=" + hits[i].score);
        }
    }
}
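Because this recipe indexes a single document, the difference between the total number of matches and the number of hits returned by search(query, 5) never shows up. The following plain-Java sketch illustrates the distinction with invented scores for seven hypothetical matching documents. The class TopHitsSketch and the method topN are hypothetical names, and the code is only a stand-in for the idea of keeping the best n results, not Lucene's own ranking code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopHitsSketch {

    // Returns at most n scores, highest first. The input list is not modified.
    static List<Double> topN(List<Double> scores, int n) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        // Invented scores for seven hypothetical matching documents.
        List<Double> allMatches = Arrays.asList(0.12, 0.80, 0.33, 0.05, 0.61, 0.47, 0.29);

        // Ask for the five best hits, as the recipe does with search(query, 5).
        List<Double> hits = topN(allMatches, 5);

        // The total counts every match; the hit list is capped at five entries.
        System.out.println(allMatches.size() + " total matching documents");
        for (int i = 0; i < hits.size(); ++i) {
            System.out.println((i + 1) + ". score=" + hits.get(i));
        }
    }
}
```

In this sketch the total is 7 while only 5 hits are listed, which is why the recipe prints the total separately from the loop over the hits array.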
Note
You can visit https://lucene.apache.org/core/2_9_4/queryparsersyntax.html for the query syntax supported by Apache Lucene.
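As a starting point for experimenting, the following sketch lists a few search strings written in the classic query parser syntax that the linked page describes. Any of them could be assigned to searchString in this recipe. The class QuerySyntaxExamples is a hypothetical helper that only prints the strings; it does not call Lucene, and the words in the phrase example are invented, so that query may well return no hits on your index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QuerySyntaxExamples {

    // Maps each example search string to a short description of its meaning.
    static Map<String, String> examples() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("over-full AND persuasion", "both terms must match");
        m.put("over-full OR persuasion", "at least one of the terms must match");
        m.put("persuasion NOT over-full", "the first term must match, the second must not");
        m.put("\"sweet sorrow\"", "the words must occur together as a phrase (invented example words)");
        m.put("persua*", "any term that starts with persua");
        m.put("contents:persuasion", "the term is searched in the contents field explicitly");
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : examples().entrySet()) {
            System.out.println(e.getKey() + "  ->  " + e.getValue());
        }
    }
}
```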