To test the results of my multi-label classification model, I measured the precision, recall, and F1 scores. I wanted to compare two different averaging methods, micro and macro. I have a dataset with few rows, but my label count is around 1700. Why is the macro score so low even though I get a high micro score, and which one is more useful to look at for a multi-class problem?
Accuracy: 0.743999
Micro Precision: 0.743999
Macro Precision: 0.256570
Micro Recall: 0.743999
Macro Recall: 0.264402
Micro F1 score: 0.743999
Macro F1 score: 0.250033
Cohen's kappa: 0.739876
Answer
Micro-Average
The micro-average precision and recall scores are calculated from the individual classes' true positives (TPs), false positives (FPs), and false negatives (FNs): the counts are pooled across all classes, and a single score is computed from the pooled totals.
Macro-Average
The macro-average precision and recall scores are calculated as the arithmetic mean of the individual classes' precision and recall scores. The macro-average F1-score is calculated as the arithmetic mean of the individual classes' F1-scores.
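As a minimal sketch of the difference (the per-class counts below are invented purely for illustration), both averages can be computed by hand from each class's counts:

```python
import numpy as np

# Hypothetical per-class counts for a 3-class problem:
# class 0 is frequent and well predicted, class 2 is rare and poorly predicted.
tp = np.array([50, 5, 1])   # true positives per class
fp = np.array([10, 2, 4])   # false positives per class
fn = np.array([ 5, 8, 9])   # false negatives per class

# Micro-average: pool the counts across classes, then compute one score.
micro_precision = tp.sum() / (tp.sum() + fp.sum())   # 56/72 ~ 0.78
micro_recall    = tp.sum() / (tp.sum() + fn.sum())   # 56/78 ~ 0.72

# Macro-average: compute the score per class, then take the unweighted mean.
macro_precision = np.mean(tp / (tp + fp))             # ~ 0.58
macro_recall    = np.mean(tp / (tp + fn))              # ~ 0.46

print(micro_precision, macro_precision)
print(micro_recall, macro_recall)
```

A rare, poorly handled class drags the macro scores well below the micro scores, which is the same pattern as in the numbers you posted.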
When to use micro-averaging and macro-averaging scores?
Use micro-averaging score when there is a need to weigh each instance or prediction equally.
Use the macro-averaging score when all classes need to be treated equally, i.e. to evaluate the classifier's overall performance regardless of how frequent each class label is.
Use a weighted macro-averaging score in case of class imbalance (different numbers of instances for different class labels). The weighted macro-average is calculated by weighting each class's score by the number of true instances of that class when computing the average.
The macro-average method can be used when you want to know how the system performs overall across all classes; you should not use it to draw conclusions about any specific class. The micro-average, on the other hand, can be a useful measure when class sizes vary, because it weights every instance equally.
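For instance (the toy labels below are made up), scikit-learn exposes all three averaging choices through the average parameter of its scoring functions:

```python
from sklearn.metrics import f1_score

# Imbalanced toy labels: class 0 dominates, class 2 is rare and never predicted correctly.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 2, 1, 1, 0, 0]

print(f1_score(y_true, y_pred, average="micro"))     # 0.70 - dominated by the frequent class
print(f1_score(y_true, y_pred, average="macro"))     # ~0.59 - every class counts equally
print(f1_score(y_true, y_pred, average="weighted"))  # ~0.66 - per-class scores weighted by support
```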
Micro-Average & Macro-Average Precision Scores for Multi-class Classification
For multi-class classification problems, the micro-average precision score can be defined as the sum of true positives for all the classes divided by the total number of positive predictions. The positive predictions are the sum of all true positives and false positives.
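Written out per class, with TP_i and FP_i the true-positive and false-positive counts for class i:

Micro-average precision = (TP_1 + TP_2 + … + TP_k) / ((TP_1 + … + TP_k) + (FP_1 + … + FP_k))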
Micro-Average & Macro-Average Recall Scores for Multi-class Classification
For multi-class classification problems, the micro-average recall score can be defined as the sum of true positives for all the classes divided by the total number of actual positives (and not the predicted positives).
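Correspondingly, with FN_i the false-negative count for class i:

Micro-average recall = (TP_1 + TP_2 + … + TP_k) / ((TP_1 + … + TP_k) + (FN_1 + … + FN_k))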