
Suppose a binary classifier classifies 1000 items. Of those items, 700 belong to class A and 300 to class B. The results are as follows:

          Predicted
          |  #A |  #B
 ---------+-----+-----
  True A  | 550 | 150
  True B  |  50 | 250

We'll call class B a positive result (1) and class A a negative one (0). So there were 550 true negatives, 150 false positives, 50 false negatives, and 250 true positives. Several metrics are defined for this classification:

Recall = TP / (TP + FN) = 250 / 300 = 0.833
Precision = TP / (TP + FP) = 250 / 400 = 0.625
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 800 / 1000 = 0.8

When a data scientist has chosen a target variable - the "column" in a spreadsheet they wish to predict - and has done the prerequisites of transforming the data and building a model, one of the most important steps in the process is evaluating the model's performance.

Confusion Matrix

Choosing a performance metric often depends on the business problem being solved. Let's say you have 100 examples in your data and you've fed each one to your model and received a classification. The predicted vs. actual classifications can be charted in a table called a confusion matrix:

                    | Negative (predicted) | Positive (predicted)
 Negative (actual)  |          98          |           0
 Positive (actual)  |           1          |           1

The table above describes an output of negative vs. positive. These two outcomes are the "classes" of each example. Because there are only two classes, the model used to generate the confusion matrix can be described as a binary classifier. To better interpret the table, you can also see it in terms of true positives, false negatives, etc.:

                    | Negative (predicted) | Positive (predicted)
 Negative (actual)  |    true negative     |    false positive
 Positive (actual)  |    false negative    |    true positive

Accuracy

Overall, how often is our model correct? As a heuristic, accuracy can immediately tell us whether a model is being trained correctly and how it may perform in general. However, it does not give detailed information about how the model applies to the specific business problem.
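The arithmetic above can be checked with a few lines of plain Python. This is just a sketch of the worked example (the counts are taken from the 1000-item table; the variable names are our own):

```python
# Counts from the worked example: class B is positive (1), class A negative (0).
tp, tn, fp, fn = 250, 550, 150, 50

recall = tp / (tp + fn)                      # 250 / 300
precision = tp / (tp + fp)                   # 250 / 400
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 800 / 1000

print(f"recall={recall:.3f} precision={precision:.3f} accuracy={accuracy:.3f}")
# recall=0.833 precision=0.625 accuracy=0.800
```

Note that accuracy looks respectable at 0.8 even though the model misclassifies half of the class-B items, which is exactly why the per-class metrics below matter.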
Precision

When the model predicts positive, how often is it correct? Precision matters when the cost of false positives is high. Let's assume the business problem involves the detection of skin cancer. If our model has very low precision, many patients will be told they have melanoma when they do not, and lots of extra tests and stress are at stake.

Recall

Of all the actual positives, how many did the model correctly identify? Recall matters when the cost of false negatives is high. What if our problem requires that we check for a fatal virus such as Ebola? If many patients are told they don't have Ebola when they actually do, the likely result is widespread infection and an epidemiological crisis.

F1 Score

F1 is a helpful measure of a test's accuracy that considers both precision and recall: it is their harmonic mean, F1 = 2 * (Precision * Recall) / (Precision + Recall). An F1 score is considered perfect at 1 and a total failure at 0.