Mastering Machine Learning on AWS
上QQ阅读APP看书,第一时间看更新

Classification algorithms

One of the popular subsets of ML algorithms is classification algorithms. They are also referred to as supervised learning algorithms. For this approach, we assume that we have a rich dataset of features and events associated with those features. The task of the algorithm is to predict an event given a set of features. The event is referred to as a class variable. For example, consider the following dataset of features related to weather and whether it snowed on a particular day:

Table 1: Sample dataset

 

In the dataset, a weather station has information about the temperature, the sky condition, and the wind speed for the day. They also have records of when they received snowfall. The classification problem they are working on is to predict snowfall based on features such as temperature, sky condition, and wind speed.

Let's discuss some terminology that is used in ML datasets. In the example, if the classification problem is to predict snowfall, then the snowfall feature is referred to as a class or target variable. Non-class values are referred to as attribute or feature variables. Each row in this dataset is referred to as an observation.