Deciding whether to train outdoors depending on the weather
Suppose we have historical data on whether an experienced football trainer decided to train her team outdoors (outside the gym), together with the weather conditions on the days when those decisions were made.
A typical dataset could look as follows:
The dataset was created specifically for this example and does not necessarily reflect any real decisions.
In this example, the target variable is Train outside and the rest of the variables are the model features.
According to the data table, a possible decision tree would be as follows:
We choose to start by splitting the data on the value of the Outlook feature. If the value is Overcast, the decision to train outside is always Yes, regardless of the values of the other features. The Sunny and Rainy branches are still mixed, so they must be split further on the remaining features before they yield an answer.
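The observation about Overcast can be checked mechanically: group the rows by a feature and see which values lead to a single decision. The following is a minimal sketch, not the author's procedure. The rows are hypothetical and made up for illustration (they are not the table from this example), and the helper names `split_by` and `pure_values` are assumptions.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; NOT the dataset from the text.
rows = [
    {"Outlook": "Sunny", "Train outside": "Yes"},
    {"Outlook": "Sunny", "Train outside": "No"},
    {"Outlook": "Overcast", "Train outside": "Yes"},
    {"Outlook": "Overcast", "Train outside": "Yes"},
    {"Outlook": "Rainy", "Train outside": "Yes"},
    {"Outlook": "Rainy", "Train outside": "No"},
]

def split_by(rows, feature, target):
    """Group the target labels by the value of the given feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    return dict(groups)

def pure_values(rows, feature, target):
    """Return {feature value: label} for values whose rows all share one label."""
    return {
        value: labels[0]
        for value, labels in split_by(rows, feature, target).items()
        if len(set(labels)) == 1
    }

# Values listed here can become leaves; all other values need further splits.
leaves = pure_values(rows, "Outlook", "Train outside")
print(leaves)
```

With these made-up rows, only Overcast ends up as a leaf, while Sunny and Rainy remain mixed and would need another split.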
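How can we decide which feature to split on first, and how to continue from there? We will use entropy: for each candidate feature, we measure how much the entropy of the target variable decreases when the data is split on that feature. This reduction is known as the information gain, and the feature with the largest gain is chosen for the split.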
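To make the idea concrete, here is a minimal sketch of how entropy and the change in entropy after a split could be computed. It assumes the usual Shannon entropy in bits, H = -sum(p * log2(p)) over the class proportions p. The rows and the Windy feature are hypothetical and invented for illustration (they are not the table from this example), and the function names `entropy` and `information_gain` are assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical rows for illustration only; NOT the dataset from the text.
# "Windy" is an assumed second feature, added to have something to compare.
rows = [
    {"Outlook": "Sunny", "Windy": "No", "Train outside": "No"},
    {"Outlook": "Sunny", "Windy": "Yes", "Train outside": "No"},
    {"Outlook": "Sunny", "Windy": "No", "Train outside": "Yes"},
    {"Outlook": "Overcast", "Windy": "No", "Train outside": "Yes"},
    {"Outlook": "Overcast", "Windy": "Yes", "Train outside": "Yes"},
    {"Outlook": "Rainy", "Windy": "No", "Train outside": "Yes"},
    {"Outlook": "Rainy", "Windy": "Yes", "Train outside": "No"},
    {"Outlook": "Rainy", "Windy": "No", "Train outside": "Yes"},
]

# Rank the candidate features by how much they reduce the entropy.
gains = {f: information_gain(rows, f, "Train outside") for f in ("Outlook", "Windy")}
best = max(gains, key=gains.get)
print(gains, best)
```

On these made-up rows, Outlook gives the larger reduction in entropy, so it would be chosen for the first split. The same calculation would then be repeated inside each branch that is still mixed.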