On the Capability of Classification Trees and Random Forests to Estimate Probabilities


Bibliographic details
Published in: Journal of Statistical Theory and Practice, 2024-06, Vol. 18 (2), Article 25
Authors: Plante, Jean-François; Radatz, Marisa
Format: Article
Language: English
Description
Abstract: With the rising popularity of artificial intelligence, machine learning algorithms are being considered for an increasing number of problems. For binary classification, most algorithms can provide an estimate of the probability that an event will occur, but the statistical properties thereof are often unknown. After reviewing convergence results for classification trees and random forests in the literature, we discuss how some methods could be negatively impacted by poor probability estimates. We design an extensive Monte Carlo simulation inspired by nine datasets to evaluate the ability of different algorithms to estimate probabilities. We find that while trees and forests may perform better at ranking, their ability to estimate probabilities rarely exceeds that of logistic regression, even when the logistic regression is misspecified.
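The abstract distinguishes ranking ability from the ability to estimate probabilities. A minimal sketch of why these can diverge is shown below. It is not the authors' simulation design. The logistic data-generating model, its coefficients (0.5 and 1.5), the squared-probability distortion, and the three metrics are all illustrative assumptions. Because squaring is monotone on (0, 1), the distorted scores rank observations the same way as the true probabilities and so keep (up to rare floating-point ties) the same AUC. However, they are systematically too small, which shows up in the mean squared error against the true probabilities (measurable only in a simulation, where the truth is known) and in the Brier score.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical setup: one Gaussian covariate, true probabilities from a
    logistic model (assumed coefficients 0.5 and 1.5), and Bernoulli outcomes."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    probs = [1.0 / (1.0 + math.exp(-(0.5 + 1.5 * x))) for x in xs]
    ys = [1 if rng.random() < p else 0 for p in probs]
    return probs, ys


def auc(scores, ys):
    """Ranking metric: share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mse_to_truth(estimates, truth):
    """Probability-estimation metric available only in simulations:
    mean squared distance between estimated and true probabilities."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


def brier(estimates, ys):
    """Brier score: mean squared difference between estimate and 0/1 outcome."""
    return sum((e - y) ** 2 for e, y in zip(estimates, ys)) / len(ys)


true_p, y = simulate(1000, seed=42)
# Monotone distortion: same ordering as true_p, but every probability shrinks.
distorted = [p ** 2 for p in true_p]

for name, est in [("true probabilities", true_p), ("squared probabilities", distorted)]:
    print(
        f"{name}: AUC={auc(est, y):.4f}  "
        f"MSE vs truth={mse_to_truth(est, true_p):.4f}  "
        f"Brier={brier(est, y):.4f}"
    )
```

Running the script prints nearly identical AUC values for both rows, while the squared probabilities show a positive error against the truth and a larger Brier score. A method can therefore rank well and still estimate probabilities poorly, which is the distinction the abstract draws between ranking and probability estimation.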
ISSN: 1559-8608, 1559-8616
DOI: 10.1007/s42519-024-00376-5