Sampling
In this stage, you work out which data attributes the present dataset already has and which ones can be derived from it. You also decide which attributes matter most for the application. Suppose we are building a chatbot. We break sentences down into words in order to identify the keywords of each sentence. Word-level information can therefore be derived from the sentences, and both word-level and sentence-level information matter for the chatbot application. For that reason, we do not remove any sentences apart from junk sentences. Through sampling, we try to extract the data attributes that best represent the overall dataset.
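The chatbot example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a prescribed method: the stop-word list, the rule that a sentence with no letters counts as junk, and the names tokenize, is_junk, and derive_attributes are all hypothetical and chosen for this example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "my", "what"}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def is_junk(sentence):
    """Assumed junk rule: a sentence containing no letters at all is junk."""
    return not any(ch.isalpha() for ch in sentence)


def derive_attributes(sentences):
    """Keep each non-junk sentence and derive word-level attributes from it."""
    records = []
    for sentence in sentences:
        if is_junk(sentence):
            continue  # junk sentences are the only ones we drop
        tokens = tokenize(sentence)
        keywords = [t for t in tokens if t not in STOP_WORDS]
        records.append(
            {"sentence": sentence, "tokens": tokens, "keywords": keywords}
        )
    return records


corpus = ["What is the status of my order?", "!!! ###", "Book a flight to Paris"]
records = derive_attributes(corpus)
for record in records:
    print(record["sentence"], "->", record["keywords"])
```

Each record keeps the original sentence alongside the derived tokens and keywords, so both sentence-level and word-level information remain available to the application.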
Now we can look at the last stage, which is the transformation stage.