Approximating XGBoost with an interpretable decision tree



Bibliographic Details
Published in: Information sciences 2021-09, Vol. 572, p. 522-542
Main Authors: Sagi, Omer; Rokach, Lior
Format: Article
Language: English
Online Access: Full text
Description
Abstract:
• A GBDT model can be converted into a single decision tree.
• The generated tree approximates the accuracy of its source forest.
• The developed tree provides interpretable classifications, as opposed to GBDT.
• The generated tree outperforms CARET-induced trees in terms of predictive performance.
• The complexity of the tree can be configured by the method's user.

The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare and finance, the model consumer must understand the rationale behind the model's output in order to use it when making a decision. For this reason, it is impossible to use black-box models in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered state-of-the-art in many classification challenges, reflected by the fact that the majority of Kaggle's recent winners used GBDT methods as part of their solution (such as XGBoost). Despite their superior predictive performance, however, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the tool-set available to machine-learning practitioners who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models like XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of an XGBoost model while enabling better transparency of the outputs.
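The abstract describes forest-to-tree conversion only at a high level. As an illustration of the general idea (a simple distillation baseline, not the authors' actual algorithm, and using scikit-learn's GradientBoostingClassifier in place of XGBoost), one can fit a single depth-capped decision tree to a trained GBDT's predicted labels; the synthetic dataset and all parameter choices below are assumptions for the sketch:

```python
# Hedged sketch: distill a GBDT into one interpretable decision tree
# by training the tree on the forest's predictions, not on ground truth.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The "black-box" source model.
gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# The surrogate tree learns to mimic the GBDT; max_depth caps its
# complexity, echoing the paper's point that tree size is configurable.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_tr, gbdt.predict(X_tr))

# Fidelity: how often the tree agrees with the forest on held-out data.
fidelity = (tree.predict(X_te) == gbdt.predict(X_te)).mean()
# Accuracy: how well the tree does against the true labels.
accuracy = (tree.predict(X_te) == y_te).mean()
print(f"fidelity to GBDT: {fidelity:.3f}, test accuracy: {accuracy:.3f}")
```

Unlike the paper's method, which merges the forest's own structure into a tree, this baseline only matches input-output behavior; it serves here solely to make the interpretability trade-off concrete.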
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2021.05.055