Binary-Classification Performance Metric-Spaces Data

Bibliographic Details
Main author: Canbek, Gürol
Format: Dataset
Language: English
Description
Summary: Metric-space is a concept proposed by Gürol Canbek et al. (2019). A metric-space comprises all possible permutations of the contingency table (confusion matrix) elements that yield the same sample size (Sn). It holds every possible result of a hypothetical classification conducted on a dataset with a given sample size, expressed in terms of one or more performance metrics (e.g. Accuracy, F1, or TPR). A metric-space thus provides a pseudo-universal space in which metrics can be analyzed and compared with complete coverage. The formal definition and the details are given in the article.

Each data file contains the following 13 performance measures and 13 performance metrics:
* Measures: True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN), Positive (P), Negative (N), Outcome Positive (OP), Outcome Negative (ON), True Classification (TC), False Classification (FC), Sample Size (Sn), Prevalence (PREV), Bias (BIAS)
* Metrics: True Positive Rate (TPR), True Negative Rate (TNR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy (ACC), Informedness (INFORM), Markedness (MARK), Balanced Accuracy (BACC), G, Normalized Mutual Information (nMI), F1, Cohen's Kappa (CK), and Matthews Correlation Coefficient (MCC)

Each data file holds the metric-space for one Sn value (10, 25, 50, 75, 100, 125, 150, 175, 200, 225). The files are in RData format (compatible with The R Project for Statistical Computing) rather than CSV (comma-separated values) because the CSV files would be very large. MATLAB users should therefore load the files in R and export them to CSV:

    > load('MetricSpaces_Sn_10.RData')
    > metric_spaces_Sn_10
    > write.csv(metric_spaces_Sn_10, file='MetricSpaces_Sn_10.csv')

Note that metric-space sizes (numbers of permutations) grow steeply with Sn, on the order of Sn cubed: Sn=25 (3,276); Sn=50 (23,426); Sn=75 (76,076); Sn=100 (176,851); Sn=125 (341,376); Sn=150 (585,276); Sn=175 (924,176); Sn=200 (1,373,701); Sn=250 (2,667,126).
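To make the idea of a metric-space concrete, the following is a minimal Python sketch, not the authors' generation code. It assumes that a metric-space for a given Sn is the set of all (TP, FP, FN, TN) tuples of non-negative integers summing to Sn; under that assumption the size is the binomial coefficient C(Sn+3, 3), which matches the sizes quoted above. The function names (`metric_space`, `metrics`, `space_size`, `safe_div`) are hypothetical, only a handful of the 26 columns are computed, the formulas PREV = P/Sn and BIAS = OP/Sn are assumed definitions, and representing undefined ratios as NaN is likewise an assumption about how the data files handle zero denominators.

```python
from math import comb, sqrt


def metric_space(sn):
    """Yield every (TP, FP, FN, TN) of non-negative integers summing to sn."""
    for tp in range(sn + 1):
        for fp in range(sn - tp + 1):
            for fn in range(sn - tp - fp + 1):
                yield tp, fp, fn, sn - tp - fp - fn


def space_size(sn):
    """Closed-form count of the tuples above: C(sn + 3, 3)."""
    return comb(sn + 3, 3)


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero (assumption)."""
    return num / den if den else float("nan")


def metrics(tp, fp, fn, tn):
    """Compute a few of the measures and metrics for one confusion matrix."""
    p, n = tp + fn, fp + tn        # actual positives / negatives
    op, on = tp + fp, fn + tn      # predicted positives / negatives
    sn = p + n
    return {
        "PREV": p / sn,            # assumed definition: P / Sn
        "BIAS": op / sn,           # assumed definition: OP / Sn
        "TPR": safe_div(tp, p),
        "TNR": safe_div(tn, n),
        "PPV": safe_div(tp, op),
        "ACC": (tp + tn) / sn,
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
        "MCC": safe_div(tp * tn - fp * fn, sqrt(p * n * op * on)),
    }


if __name__ == "__main__":
    # Compare the enumerated size with the closed form for a small Sn.
    rows = [metrics(*cm) for cm in metric_space(25)]
    print(len(rows), space_size(25))
```

Enumerating the tuples directly is practical for small Sn; for the larger spaces the closed form gives the row count without building the table.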
DOI:10.17632/64r4jr8c88.2