Efficient Realization of Decision Trees for Real-Time Inference

For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a signi...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on embedded computing systems 2022-10, Vol.21 (6), p.1-26, Article 68
Hauptverfasser:	Chen, Kuan-Hsun, Su, Chiahui, Hakert, Christian, Buschjäger, Sebastian, Lee, Chao-Lin, Lee, Jenq-Kuen, Morik, Katharina, Chen, Jian-Jia
Format:	Artikel
Sprache:	eng
Schlagworte:	Classification and regression trees Computer systems organization Computing methodologies Embedded systems Software and its engineering Software organization and properties
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	For timing-sensitive edge applications, the demand for efficient lightweight machine learning solutions has increased recently. Tree ensembles are among the state-of-the-art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a significant memory footprint leading to cache locality issues, which are crucial to performance in terms of execution time. In this work, we analyze memory-locality issues of the two most common realizations of decision trees, i.e., native and if-else trees. We highlight that both realizations demand a more careful memory layout to improve caching behavior and maximize performance. We adopt a probabilistic model of decision tree inference to find the best memory layout for each tree at the application layer. Further, we present an efficient heuristic to take architecture-dependent information into account thereby optimizing the given ensemble for a target computer architecture. Our code-generation framework, which is freely available on an open-source repository, produces optimized code sessions while preserving the structure and accuracy of the trees. With several real-world data sets, we evaluate the elapsed time of various tree realizations on server hardware as well as embedded systems for Intel and ARM processors. Our optimized memory layout achieves a reduction in execution time up to 75 % execution for server-class systems, and up to 70 % for embedded systems, respectively.
ISSN:	1539-9087 1558-3465
DOI:	10.1145/3508019