Python Natural Language Processing

Practical Understanding of a Corpus and Dataset

In this chapter, we'll explore the first building block of natural language processing. We are going to cover the following topics to get a practical understanding of a corpus or dataset:

  • What is a corpus?
  • Why do we need a corpus?
  • Understanding corpus analysis
  • Understanding types of data attributes
  • Exploring different file formats of datasets
  • Resources for accessing free corpora
  • Preparing datasets for NLP applications
  • Developing the web scraping application