Optimization of Tree Ensembles

Bibliographic details
Published in:	Operations Research 2020-09, Vol. 68 (5), pp. 1605-1624
Main author:	Mišić, Velibor V.
Format:	Article
Language:	English
Subjects:	applications: integer: programming; Benders decomposition; Business machines; customized pricing; Datasets; Decision trees; Dependent variables; drug design; Heuristic; Independent variables; Machine learning; Mixed integer; mixed-integer optimization; Operations research; Optimization; Prediction models; random forests; statistics; Stochastic models; tree ensembles; Variables
Online access:	Full text

Description
Abstract:	From Tree Ensemble Models to Decisions. Predictive models based on ensembles of trees, such as random forests and gradient boosted trees, are widely used in machine learning and data science. In many applications, the features that these models use are controllable and can be regarded as decision variables. This leads to a natural prescriptive analytics problem: how should these features be set so as to maximize the value predicted by the tree ensemble model? In “Optimization of Tree Ensembles,” Velibor V. Mišić proposes a mixed-integer optimization (MIO) model of this problem, introduces a hierarchy of approximations to this formulation based on truncating the trees at a particular depth, and develops two specialized constraint generation methods for solving the problem at scale. Using real data sets, including two detailed case studies in drug design and customized pricing, the author shows that this approach can efficiently solve large-scale problem instances to full or near optimality and that it outperforms solutions obtained by heuristic approaches.

Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality and outperforms solutions obtained by heuristic approaches.
ISSN:	0030-364X, 1526-5463
DOI:	10.1287/opre.2019.1928
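
Illustration:	The tree ensemble optimization problem stated in the abstract can be made concrete with a small sketch. The code below is not the paper's mixed-integer formulation or either of its solution methods. It is a brute-force baseline under assumed conventions: a hypothetical nested-tuple tree encoding, a "go left if x[j] <= threshold" split rule, and a toy two-tree ensemble invented for this example. It relies only on the fact that an ensemble's prediction is piecewise constant, so one representative point per interval between split thresholds is enough to find the exact maximizer.

```python
from itertools import product

# Assumed (hypothetical) tree encoding:
#   ("leaf", value)
#   ("split", feature_index, threshold, left_subtree, right_subtree)
# Assumed split rule: go left when x[feature_index] <= threshold.


def predict_tree(tree, x):
    """Follow splits from the root until a leaf is reached."""
    while tree[0] == "split":
        _, j, threshold, left, right = tree
        tree = left if x[j] <= threshold else right
    return tree[1]


def predict_ensemble(trees, weights, x):
    """Weighted sum of the individual tree predictions."""
    return sum(w * predict_tree(t, x) for t, w in zip(trees, weights))


def collect_thresholds(tree, out):
    """Record every split threshold, grouped by feature index."""
    if tree[0] == "split":
        _, j, threshold, left, right = tree
        out.setdefault(j, set()).add(threshold)
        collect_thresholds(left, out)
        collect_thresholds(right, out)


def candidate_values(trees, n_features):
    """One representative value per interval induced by the thresholds.

    For sorted thresholds t1 < ... < tk, the value ti represents the
    interval (t(i-1), ti], and tk + 1 represents (tk, +inf).
    """
    thresholds = {}
    for tree in trees:
        collect_thresholds(tree, thresholds)
    candidates = []
    for j in range(n_features):
        ts = sorted(thresholds.get(j, ()))
        candidates.append(ts + [ts[-1] + 1.0] if ts else [0.0])
    return candidates


def optimize_by_enumeration(trees, weights, n_features):
    """Exact maximizer by enumerating every cell of the partition."""
    best_x, best_value = None, float("-inf")
    for x in product(*candidate_values(trees, n_features)):
        value = predict_ensemble(trees, weights, x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Toy ensemble, invented for illustration only.
tree_a = ("split", 0, 2.0,
          ("leaf", 1.0),
          ("split", 1, 5.0, ("leaf", 4.0), ("leaf", 2.0)))
tree_b = ("split", 1, 3.0, ("leaf", 3.0), ("leaf", 0.0))

best_x, best_value = optimize_by_enumeration([tree_a, tree_b], [0.5, 0.5], 2)
print(best_x, best_value)  # (3.0, 3.0) 3.5
```

The number of cells enumerated is the product, over features, of the number of distinct thresholds plus one, so this baseline grows exponentially with the number of features. That growth is why an exact approach at scale needs something like the mixed-integer formulation and constraint generation methods described in the abstract.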