A Practical Tutorial for Decision Tree Induction: Evaluation Measures for Candidate Splits and Opportunities

Bibliographic Details
Published in:	ACM Computing Surveys, 2022-01, Vol. 54 (1), p. 1-38, Article 18
Main authors:	Sosa Hernandez, Victor Adrian; Monroy, Raul; Angel Medina-Perez, Miguel; Loyola-Gonzalez, Octavio; Herrera, Francisco
Format:	Article
Language:	English
Subjects:	Algorithms; Computer Science; Computer Science, Theory & Methods; Decision making; Decision trees; Domains; Machine learning; Performance evaluation; Science & Technology; Statistical analysis; Technology
Online access:	Full text

Description
Abstract:	Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that has been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants, considering 110 databases, two performance measures, and 10 × 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database, and we discuss some further opportunities for decision tree models.
ISSN:	0360-0300; 1557-7341
DOI:	10.1145/3429739