Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data

Bibliographic details
Published in:	PloS one 2024-04, Vol.19 (4), p.e0301541-e0301541
Main authors:	Uddin, Shahadat; Lu, Haohui
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Biology and Life Sciences; Computer and Information Sciences; Data mining; Engineering and Technology; Evaluation; Forecasts and trends; Machine learning; Medical research; Medicine, Experimental; Physical Sciences; Research and Analysis Methods

Description
Abstract:	Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p…
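The core statistical step described in the abstract is a paired-sample t-test: each dataset yields one score per algorithm, and the test asks whether the mean per-dataset difference is nonzero. The sketch below illustrates it for one pair (for example Random forest vs. Logistic regression on accuracy). It is a minimal illustration, not the authors' code. It assumes one score per dataset per algorithm and a two-sided test (the abstract does not state the sidedness). It computes the p-value from the Student t distribution via a standard-library continued-fraction evaluation of the regularized incomplete beta function instead of a statistics package. The function names (`paired_t_test`, `t_two_sided_p`, `reg_inc_beta`) are hypothetical, and the five example scores are invented for illustration.

```python
import math
from statistics import mean, stdev

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:  # converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    # Use the continued fraction directly where it converges fastest,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(scores_a: list[float], scores_b: list[float]) -> tuple[float, int, float]:
    """Paired-sample t-test on per-dataset scores; returns (t, df, two-sided p)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical accuracies on five datasets (illustrative only, NOT from the study).
    rf_accuracy = [0.91, 0.85, 0.78, 0.88, 0.93]
    lr_accuracy = [0.89, 0.82, 0.77, 0.85, 0.90]
    t, df, p = paired_t_test(rf_accuracy, lr_accuracy)
    print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

Where SciPy is available, `scipy.stats.ttest_rel(rf_accuracy, lr_accuracy)` should give the same statistic and two-sided p-value. The hand-rolled version only keeps the sketch dependency-free. The design in the abstract repeats this test for every tree-based/non-tree-based pair and each of the four measures: 2 × 3 × 4 = 24 comparisons.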
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0301541