Tree visualization
Let's take a look at the code to visualize a tree. First, extract the list of class labels:

In []: labels = df.label.astype('category').cat.categories
       labels = list(labels)
       labels
Out[]: [u'platyhog', u'rabbosaurus']
Define a variable to store all the feature names:
In []: feature_names = map(lambda x: x.encode('utf-8'), features.columns.get_values())
       feature_names
Out[]: ['length', 'fluffy', 'color_light black', 'color_pink gold', 'color_purple polka-dot', 'color_space gray']
Then, generate a dot description of the tree using the export_graphviz function, and turn it into a graph object with pydotplus:
In []: import pydotplus
       dot_data = tree.export_graphviz(tree_model, out_file=None,
                                       feature_names=feature_names,
                                       class_names=labels,
                                       filled=True, rounded=True,
                                       special_characters=True)
       dot_data
Out[]: u'digraph Tree {\nnode [shape=box, style="filled, rounded", color="black", fontname=helvetica] ;\nedge [fontname=helvetica] ;\n0 [label=<length &#8804; 26.6917<br/>entropy = 0.9971<br/>samples = 700<br/>value = [372, ...

In []: graph = pydotplus.graph_from_dot_data(dot_data.encode('utf-8'))
       graph.write_png('tree1.png')
Out[]: True
Put the following markdown in the next cell to display the newly created file:
![](tree1.png)
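
Alternatively, the same image can be displayed from a code cell with IPython's display helpers (assuming tree1.png was written to the notebook's working directory):

In []: from IPython.display import Image
       Image('tree1.png')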
The preceding diagram shows what our decision tree looks like. During training, it grows upside down: data (features) travels through it from the root (top) to the leaves (bottom). To predict the label for a sample from our dataset using this classifier, we start at the root and move down until we reach a leaf. In each internal node, one feature is compared to some value; for example, in the root node, the tree checks whether length ≤ 26.6917. If the condition is met, we move along the left branch; if not, along the right one.
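To make this traversal concrete, here is a minimal sketch that replays the root-to-leaf path for one sample. It assumes the tree_model and feature_names defined above; the sample values themselves are invented for illustration:

In []: import numpy as np
       # A made-up creature: length, fluffy, and the four one-hot color columns
       sample = np.array([[24.0, 1, 0, 0, 1, 0]])
       # decision_path marks every node the sample visits, from root to leaf
       node_ids = tree_model.decision_path(sample).indices
       for node_id in node_ids:
           f = tree_model.tree_.feature[node_id]
           t = tree_model.tree_.threshold[node_id]
           if f >= 0:  # internal node; leaf nodes store feature = -2
               branch = 'left' if sample[0, f] <= t else 'right'
               print('%s <= %.4f? go %s' % (feature_names[f], t, branch))

Each printed line corresponds to one box in the diagram, so this is a handy way to cross-check the picture against the model.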
Let's take a closer look at a part of the tree. In addition to the condition in each node, we have some useful information (a worked example follows the list):
- The entropy value
- The number of training samples that support this node
- The number of samples that support each outcome
- The most likely outcome at this stage
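
For instance, the entropy shown in the root node can be recomputed from its value counts. Here is a minimal sketch; the counts [372, 328] are taken from the root node's label above:

In []: import numpy as np
       counts = np.array([372.0, 328.0])  # class counts from the root node's label
       p = counts / counts.sum()          # class probabilities
       entropy = -np.sum(p * np.log2(p))  # Shannon entropy in bits
       round(entropy, 4)
Out[]: 0.9971

An entropy close to 1 bit means the two classes are still almost evenly mixed at the root, which is exactly why further splits are needed.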