Decide by yourself
This section discusses how to approach preprocessing when you don't know what kind of preprocessing your NLP application requires. In that situation, ask yourself the following questions and make a decision based on the answers.
What is your NLP application, and what kind of data do you need to build it?
- Once you understand the problem statement and are clear about what your output should be, you are in a good position.
- With the problem statement and the expected output in hand, work out which data points you need from your raw dataset.
- To illustrate the previous two points, let's take an example. Suppose you want to build a text summarization application for news articles published on the web, and you have built a scraper that collects those articles. This raw news article dataset may contain HTML tags, long texts, and so on.
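To make the example concrete, here is a minimal sketch of what inspecting one scraped article might look like. It assumes the scraper returns each article as a raw HTML string. The `TextExtractor` class, the `strip_html` function, and the sample article are hypothetical names and data invented for illustration; only the standard library's `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by HTMLParser for every run of text between tags.
        self.parts.append(data)

    def text(self):
        # Join the text runs and collapse leftover whitespace.
        return " ".join(" ".join(self.parts).split())


def strip_html(raw_html):
    """Return the visible text of an HTML string (illustrative only)."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


# Made-up sample standing in for one scraped news article.
raw_article = (
    "<html><body><h1>Local team wins</h1>"
    "<p>The match ended <b>3-1</b> on Sunday.</p></body></html>"
)

clean_article = strip_html(raw_article)
print(clean_article)
print(len(raw_article), "characters raw,", len(clean_article), "characters clean")
```

This sketch only shows that the raw data carries markup the summarizer does not need. It does not skip the contents of `<script>` or `<style>` elements, and because it joins text runs with a space, it can split a word that has an inline tag in the middle, so a real pipeline would likely need a more careful extraction step.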
How should we preprocess this data for news text summarization? To answer that, we need to ask ourselves a few questions about preprocessing.