Text processing

Simple text processing can be done with the standard Java library alone, using classes such as StringTokenizer, the java.text package, or regular expressions (the java.util.regex package).
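As an illustration, here is a minimal sketch that uses only the JDK to tokenize a sentence in two ways (StringTokenizer and a regular expression) and to split text into sentences with java.text.BreakIterator. The class name SimpleTextProcessing and its method names are hypothetical, chosen for this example only. The regular expression (runs of Unicode letters and digits) is one simple assumption about what counts as a word, not a general-purpose tokenizer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: JDK-only text processing, no external libraries.
public class SimpleTextProcessing {

    // Splits on StringTokenizer's default delimiters (space, tab, newline, CR, form feed).
    // Punctuation stays attached to words, so "world!" remains a single token.
    public static List<String> tokenizeWithStringTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Assumed definition of a "word": a run of Unicode letters (\p{L}) or digits (\p{N}).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Extracts every match of WORD, which drops punctuation entirely.
    public static List<String> tokenizeWithRegex(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Uses java.text.BreakIterator to find sentence boundaries for the given locale.
    public static List<String> splitSentences(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // Each segment includes trailing whitespace, so trim it off.
            sentences.add(text.substring(start, end).trim());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello, world! Java has text tools. Use them.";
        // prints [Hello,, world!, Java, has, text, tools., Use, them.]
        System.out.println(tokenizeWithStringTokenizer(text));
        // prints [Hello, world, Java, has, text, tools, Use, them]
        System.out.println(tokenizeWithRegex(text));
        System.out.println(splitSentences(text, Locale.ENGLISH));
    }
}
```

Note that the StringTokenizer Javadoc describes it as a legacy class and recommends String.split or the java.util.regex package for new code, which is why the regex variant is usually the better starting point.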

In addition, a wide variety of text processing frameworks is available for Java:

Apache Lucene (https://lucene.apache.org/), a library for information retrieval
Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/)
Apache OpenNLP (https://opennlp.apache.org/)
LingPipe (http://alias-i.com/lingpipe/)
GATE (https://gate.ac.uk/)
MALLET (http://mallet.cs.umass.edu/)
Smile (http://haifengl.github.io/smile/) also has some algorithms for NLP

Most NLP libraries offer very similar functionality and coverage of algorithms, so choosing one is usually a matter of habit or taste. They typically provide tokenization, parsing, part-of-speech tagging, named entity recognition, and other text processing algorithms. Some of them (such as Stanford CoreNLP) support multiple languages, while others support only English.

We will cover some of these libraries in Chapter 6, Working with Text - Natural Language Processing and Information Retrieval.