Measures, Metrics and Indicators Derived from the Ubiquitous Two-by-two Contingency Table, Part I: Background
Published: 2021-06-04
Page: 133-147
Issue: 2021 - Volume 4 [Issue 2]
Muzainah Ali Rushdi
Kasr Al-Ainy Faculty of Medicine, Cairo University, Cairo, 11562, Arab Republic of Egypt.
Ali Muhammad Rushdi *
Department of Electrical and Computer Engineering, King Abdulaziz University, P.O. Box 80200, Jeddah 21589, Saudi Arabia.
*Author to whom correspondence should be addressed.
Abstract
This paper (the first of two sibling parts) provides a tutorial exposition of indicators derived from the ubiquitous two-by-two contingency table (confusion matrix), which has widespread applications in many fields, particularly binary classification and clinical or epidemiological testing. These indicators include the eight most prominent indicators used in diagnostic testing, namely the Sensitivity or True Positive Rate (TPR), the Specificity or True Negative Rate (TNR), and the Positive and Negative Predictive Values (PPV and NPV), together with their respective complements, namely the False Negative Rate (FNR), False Positive Rate (FPR), False Discovery Rate (FDR) and False Omission Rate (FOR). We also consider some other indicators, such as the total error and accuracy, pre-test prevalence, the diagnostic odds ratio (DOR), the inverse DOR, the F-scores, Youden’s Index (Informedness), Markedness and the Index of Association (Matthews Correlation Coefficient, MCC). We review recent studies asserting that the MCC is the most reliable single metric derivable from the contingency matrix. We suggest that any mean (signed geometric mean, arithmetic mean, or harmonic mean) of Informedness and Markedness might be as effective as the MCC in summarizing the contingency matrix into a single value. We set criteria in terms of basic and composite indicators for identifying the quality of binary classification, going down from the perfect type to the completely-contradictory type, where random-guessing-like classification marks the midpoint of the transition between good and bad classification. In a sequel paper, we present a potpourri of example or test cases to reveal and unravel many of the properties and inter-relationships among binary and composite indicators.
Keywords: Diagnostic testing, binary classification, sensitivity, specificity, predictive values, F scores, Matthews correlation coefficient, means of Informedness and Markedness
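
As an illustration of the indicators named in the abstract, the following minimal Python sketch computes them from the four cell counts of a two-by-two table. It uses the standard textbook definitions rather than the paper's own notation, which is not shown on this page. The function name `indicators`, the argument order, the dictionary keys, the convention of returning infinity for the DOR when FP x FN is zero, and the handling of signs in the three means are all assumptions made for this example. The sketch also assumes that all four marginal sums are nonzero.

```python
import math


def indicators(tp, fn, fp, tn):
    """Indicators of a 2x2 contingency table given as four counts.

    Hypothetical helper: assumes every marginal sum (tp+fn, tn+fp,
    tp+fp, tn+fn) is nonzero, so no ratio below divides by zero.
    """
    n = tp + fn + fp + tn

    # The four basic indicators and their complements.
    tpr = tp / (tp + fn)          # Sensitivity
    tnr = tn / (tn + fp)          # Specificity
    ppv = tp / (tp + fp)          # Positive Predictive Value
    npv = tn / (tn + fn)          # Negative Predictive Value

    # Composite indicators.
    informedness = tpr + tnr - 1  # Youden's Index
    markedness = ppv + npv - 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Three means of Informedness and Markedness. The two quantities
    # share the sign of (tp*tn - fp*fn), so the sign of their sum is
    # used for the signed geometric mean; abs() guards against
    # rounding noise near zero.
    prod = informedness * markedness
    total = informedness + markedness
    signed_gm = math.copysign(math.sqrt(abs(prod)), total)
    arithmetic = total / 2
    harmonic = 2 * prod / total if total != 0 else 0.0

    return {
        "TPR": tpr, "FNR": 1 - tpr,
        "TNR": tnr, "FPR": 1 - tnr,
        "PPV": ppv, "FDR": 1 - ppv,
        "NPV": npv, "FOR": 1 - npv,
        "accuracy": (tp + tn) / n,
        "total_error": (fp + fn) / n,
        "prevalence": (tp + fn) / n,
        # Assumed convention: infinite DOR when fp * fn == 0.
        "DOR": (tp * tn) / (fp * fn) if fp * fn else math.inf,
        "F1": 2 * tp / (2 * tp + fp + fn),
        "informedness": informedness,
        "markedness": markedness,
        "MCC": mcc,
        "signed_gm": signed_gm,
        "arithmetic_mean": arithmetic,
        "harmonic_mean": harmonic,
    }


if __name__ == "__main__":
    # Illustrative counts only; not data from the paper.
    for name, value in indicators(tp=40, fn=10, fp=5, tn=45).items():
        print(f"{name:16s} {value:.4f}")
```

With these definitions the signed geometric mean of Informedness and Markedness coincides with the MCC, a standard algebraic identity. A table with no errors gives MCC = 1, a table with TP x TN = FP x FN gives MCC = 0, and a table with no correct decisions gives MCC = -1, matching the perfect, random-guessing-like and completely-contradictory types mentioned in the abstract.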