How it works...
Let's consider a test dataset containing 100 items, out of which 82 are of interest to us. Now, we want our classifier to identify these 82 items for us. Our classifier picks out 73 items as the items of interest. Out of these 73 items, only 65 are actually items of interest, and the remaining 8 are misclassified. We can compute precision in the following way:
- The number of correct identifications = 65
- The total number of identifications = 73
- Precision = 65 / 73 = 89.04%
To compute recall, we use the following:
- The total number of items of interest in the dataset = 82
- The number of items retrieved correctly = 65
- Recall = 65 / 82 = 79.27%
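The worked example above can be reproduced with a short sketch. The counts below come straight from the scenario (82 items of interest, 73 retrieved, 65 correct); everything else is plain arithmetic:

```python
# Counts taken from the scenario described above
tp = 65        # items of interest correctly identified (true positives)
fp = 73 - 65   # retrieved items that were misclassified (false positives)
fn = 82 - 65   # items of interest the classifier missed (false negatives)

precision = tp / (tp + fp)   # 65 / 73
recall = tp / (tp + fn)      # 65 / 82

print(f"Precision: {precision:.2%}")  # Precision: 89.04%
print(f"Recall: {recall:.2%}")        # Recall: 79.27%
```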
A good machine learning model needs to have good precision and good recall simultaneously. It's easy to drive one of them to 100%, but the other metric then suffers; we need to keep both metrics high at the same time. To quantify this, we use the F1 score, which is a combination of precision and recall. It is actually the harmonic mean of precision and recall:

F1 score = 2 x precision x recall / (precision + recall)

In the preceding case, the F1 score will be as follows:

F1 score = 2 x 0.8904 x 0.7927 / (0.8904 + 0.7927) = 0.8387 = 83.87%