Getting Started with Python for the Internet of Things

How to do it...

  1. Import the sentence tokenizer:
from nltk.tokenize import sent_tokenize
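  1. Split the text into sentences:
# text holds the input string to tokenize; sent_tokenize relies on NLTK's
# pre-trained Punkt model, which must be downloaded before first use
tokenize_list_sent = sent_tokenize(text)
print("\nSentence tokenizer:")
print(tokenize_list_sent)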
  1. Split the text into words:
from nltk.tokenize import word_tokenize
print("\nWord tokenizer:")
print(word_tokenize(text))
  1. Split the text with the WordPunct tokenizer, which separates every run of punctuation from the surrounding letters and digits:
from nltk.tokenize import WordPunctTokenizer
word_punct_tokenizer = WordPunctTokenizer()
print("\nWord punct tokenizer:")
print(word_punct_tokenizer.tokenize(text))
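NLTK's `WordPunctTokenizer` is a regular-expression tokenizer built on the pattern `\w+|[^\w\s]+`. It keeps each run of word characters together and emits each run of non-space punctuation as its own token. The following is a minimal sketch of that behavior using only Python 3's `re` module, so it runs without NLTK installed. The helper name `word_punct_tokenize` and the sample sentence are hypothetical, chosen only for illustration.

```python
import re

# Same pattern NLTK's WordPunctTokenizer uses: a run of word characters,
# or a run of characters that are neither word characters nor whitespace.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Split text into word tokens and punctuation tokens (hypothetical helper)."""
    return WORD_PUNCT_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Let's see how it works!"
    print(word_punct_tokenize(sample))
    # → ['Let', "'", 's', 'see', 'how', 'it', 'works', '!']
```

For comparison, `word_tokenize` follows Penn Treebank conventions and keeps the contraction suffix together, so it produces `'s` as a single token instead of `'` and `s`.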

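The output of each tokenizer is shown here. The sentence tokenizer splits the text into a list of sentences, whereas the two word tokenizers split it into lists of word and punctuation tokens.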
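Sentence and word tokenization are often chained: the text is first split into sentences, and each sentence is then split into tokens. The sketch below illustrates that two-level structure using only the standard library. It swaps NLTK's trained Punkt model for a deliberately naive rule (break at whitespace that follows `.`, `!`, or `?`), so it will mis-split abbreviations such as "Dr. Smith". The helper names `naive_sent_tokenize` and `tokenize_by_sentence` are hypothetical.

```python
import re

# Naive sentence boundary: whitespace preceded by ., ! or ?.
# This is a simplification; NLTK's sent_tokenize uses a trained Punkt model instead.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Same word/punctuation pattern as NLTK's WordPunctTokenizer.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def naive_sent_tokenize(text):
    """Split text into sentences at terminal punctuation (hypothetical helper)."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize_by_sentence(text):
    """Return one list of word/punctuation tokens per sentence."""
    return [WORD_PUNCT.findall(sentence) for sentence in naive_sent_tokenize(text)]


if __name__ == "__main__":
    sample = "Are you curious about tokenization? Let's see how it works!"
    for tokens in tokenize_by_sentence(sample):
        print(tokens)
```

The abbreviation weakness is the main reason to prefer `sent_tokenize` in practice: Punkt learns which periods end sentences instead of treating every period as a boundary.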