Hands-On Machine Learning with Microsoft Excel 2019

Calculating the Mean Squared Error (MSE)

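MSE takes the average of the squared differences between the actual values and the predicted values:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Here, $n$ is the number of observations, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value for observation $i$. Because the errors are squared, large errors are penalized more heavily than small ones, and positive and negative errors cannot cancel each other out.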
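To make the calculation concrete, here is a minimal sketch in Python. The function name `mean_squared_error` and the sample values are assumptions chosen for illustration; they are not taken from the book.

```python
def mean_squared_error(actual, predicted):
    """Return the average of the squared differences between two sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    # Square each error so that positive and negative errors do not cancel
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values, for illustration only
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # squared errors are 1, 0, 4, 1, so the average is 1.5
```

In a worksheet, the same quantity could be obtained with something like `=SUMXMY2(A2:A5,B2:B5)/COUNT(A2:A5)`, assuming the actual values sit in `A2:A5` and the predictions in `B2:B5` (both ranges are hypothetical).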

No matter which evaluation method we choose, it is extremely important to take the business side of the problem into account. The optimal solution is not always the most accurate model, but the one that best satisfies your business needs. A less accurate model that can be built quickly may be better than a perfect one that takes a year to produce. Taking both the dataset imbalance and the business needs into account is important when fine-tuning the model to improve the confusion matrix values.

Another important factor to consider, in the case of a classification problem, is whether we have a balanced dataset. A dominant class can lead to a model that predicts the same result almost every time. For example, a model trained on a dataset with 99% YES labels may simply predict YES for nearly every input, and it will be right about 99% of the time while rarely or never identifying a NO case. There are many known techniques used to balance a dataset and find the problems in our data.
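The effect of a dominant class can be illustrated with a short Python sketch. It assumes a hypothetical dataset of 99 YES labels and 1 NO label, and a trivial "model" that always predicts the majority class; the helper `confusion_counts` is an illustrative name, not part of any library.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="YES"):
    """Count the four confusion matrix cells for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted YES, actually YES
        elif p == positive:
            fp += 1  # predicted YES, actually NO
        elif a == positive:
            fn += 1  # predicted NO, actually YES
        else:
            tn += 1  # predicted NO, actually NO
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical imbalanced labels: 99 YES and 1 NO
actual = ["YES"] * 99 + ["NO"]

# A trivial baseline that always predicts the most common label
majority_label = Counter(actual).most_common(1)[0][0]
predicted = [majority_label] * len(actual)

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
# Share of actual NO cases that were correctly identified as NO
no_recall = counts["TN"] / (counts["TN"] + counts["FP"])

print(counts)     # {'TP': 99, 'FP': 1, 'TN': 0, 'FN': 0}
print(accuracy)   # 0.99
print(no_recall)  # 0.0
```

The accuracy looks excellent, yet the single NO case is missed entirely. This is why the individual confusion matrix values, rather than accuracy alone, are worth inspecting when the classes are imbalanced.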