Hands-On Machine Learning with Microsoft Excel 2019

Frequency table

Let's build a frequency table, which is the usual way of counting how often each combination of values occurs between variables. In our case, we use it to decide which variable choice leads to a larger reduction of the entropy:

  1. Count the different combinations of feature values, pairing each feature with the target variable, Train outside. You can count them manually in this particular example, but it is useful to have a general method in case we are working with a larger dataset.
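  2. To count the number of feature combinations, we start by concatenating the values in the data table in pairs. For example, =CONCATENATE(B2;"_";F2) gives us Hot_No (use commas instead of semicolons if your regional settings require them as the argument separator).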
  3. If we copy the formula down to the last data row, we get the combination of the Temperature and Train outside variables for every record.
  4. If we repeat the same calculation with the rest of the features, the results will be as follows:
  5. Create pivot tables to count how many times each unique value appears in each column, that is, how often each combination occurs. This can be done by selecting the full range in the column, right-clicking anywhere in the selection, and left-clicking on Quick Analysis. The following dialogue will pop up:
  6. Select Tables | PivotTable to create a table like the following:
  7. Repeat the same procedure with all columns and build all frequency tables and the two-variable entropy. The resulting tables and the entropy calculations are shown in the following subsection.
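
For readers who want to cross-check the spreadsheet results, the steps above can be sketched outside Excel. The following is a minimal Python sketch, not the book's method: the rows are hypothetical toy values rather than the book's dataset, the function names frequency_table and conditional_entropy are illustrative, and the "two-variable entropy" is assumed here to mean the entropy of the target within each feature value, weighted by how often that feature value occurs.

```python
from collections import Counter
from math import log2

# Hypothetical toy data (NOT the book's dataset): one feature column
# and the target column, aligned row by row.
temperature = ["Hot", "Hot", "Mild", "Cold", "Cold", "Mild"]
train_outside = ["No", "No", "Yes", "Yes", "No", "Yes"]


def frequency_table(feature, target):
    """Count each feature_target key, mirroring CONCATENATE plus a pivot table."""
    keys = [f"{f}_{t}" for f, t in zip(feature, target)]
    return Counter(keys)


def conditional_entropy(feature, target):
    """Entropy (in bits) of the target within each feature value,
    weighted by the share of rows having that feature value.
    Assumed interpretation of the text's two-variable entropy."""
    pair_counts = Counter(zip(feature, target))
    feature_counts = Counter(feature)
    total = len(feature)
    entropy = 0.0
    for (f, _t), n in pair_counts.items():
        p = n / feature_counts[f]      # P(target value | feature value)
        weight = feature_counts[f] / total  # P(feature value)
        entropy -= weight * p * log2(p)
    return entropy


table = frequency_table(temperature, train_outside)
for key, count in sorted(table.items()):
    print(key, count)
print(conditional_entropy(temperature, train_outside))
```

Calling the two functions once per feature column reproduces the "repeat the same procedure with all columns" step; under the assumption above, the feature with the lowest weighted entropy gives the largest entropy reduction.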