Accuracy, Precision, Recall, F1-Score, or MCC? Empirical Evidence from Advanced Statistics, ML, and XAI

Metrics guide our choice—
F1, MCC, or accuracy?
Evidence speaks truth.
Machine learning
Performance metrics
XAI
Classification

K. M. Sujon, R. Hassan, K. Choi, M. A. Samad, et al., “Accuracy, precision, recall, f1-score, or MCC? empirical evidence from advanced statistics, ML, and XAI,” Journal of Big Data 12:268 (2025), doi: 10.1186/s40537-025-01313-4

Authors

K. M. Sujon

R. Hassan

K. Choi

Md Abdus Samad

Published

January 2025

DOI

10.1186/s40537-025-01313-4

Abstract

Selecting the right performance metric is crucial for evaluating machine learning models. This study provides empirical evidence comparing accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC) across various classification scenarios. Using advanced statistical methods, machine learning techniques, and explainable AI (XAI), we demonstrate when each metric is most appropriate and how metric selection impacts model evaluation and decision-making.
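To make the comparison concrete, here is a minimal sketch of how the five metrics are computed from binary confusion-matrix counts. It is an illustration only and does not reproduce the paper's experiments. The function name `binary_metrics`, the example counts, and the convention of returning 0.0 when a denominator is zero are assumptions made for this sketch.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1, and MCC from confusion-matrix counts.

    Assumption: any metric whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical imbalanced case: 95 positives, 5 negatives,
# and a trivial model that predicts "positive" for every sample.
always_positive = binary_metrics(tp=95, fp=5, fn=0, tn=0)

# Hypothetical balanced case with a moderately good model.
balanced = binary_metrics(tp=40, fp=10, fn=10, tn=40)

for name, result in [("always_positive", always_positive), ("balanced", balanced)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In the imbalanced case the trivial model reaches an accuracy of 0.95 and an F1-score of about 0.974, while its MCC is 0.0 under the zero-denominator convention assumed here. In the balanced case, accuracy, precision, recall, and F1 are all 0.8 and MCC is 0.6.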

Citation

@article{SujonEtAl:2025,
  Author  = {Sujon, K. M. and Hassan, R. and Choi, K. and Samad, M. A. and others},
  Title   = {Accuracy, precision, recall, f1-score, or MCC? empirical evidence from advanced statistics, ML, and XAI},
  Journal = {Journal of Big Data},
  Volume  = {12},
  Pages   = {268},
  Year    = {2025},
  Doi     = {10.1186/s40537-025-01313-4}
}