An Empirical Study on Distributed Bayesian Approximation Inference of Piecewise Sparse Linear Models

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2019-07, Vol. 30 (7), p. 1481-1493
Main authors: Asahara, Masato; Fujimaki, Ryohei
Format: Article
Language: English
Description
Abstract: The importance of interpretability of machine learning models has grown with the rise of enterprise predictive analytics. Piecewise linear models have been actively studied as a way to achieve both accuracy and interpretability. They often produce accuracy competitive with state-of-the-art non-linear methods. In addition, their representations (i.e., rule-based segmentation plus a sparse linear formula) are often preferred by domain experts. A disadvantage of such models, however, is the high computational cost of simultaneously determining the number of "pieces" and the cardinality of each linear predictor, which has restricted their applicability to medium-scale data sets. This paper presents an empirical study that derives, from the original factorized asymptotic Bayesian (FAB) inference algorithm, a distributed FAB inference for learning piecewise sparse linear models on distributed-memory architectures. The distributed FAB inference solves the simultaneous model selection problem without communicating O(N) data, where N is the number of training samples, and scales out linearly with the number of CPU cores. Experimental results demonstrate that the distributed FAB inference achieves high prediction accuracy and performance scalability on both synthetic and public benchmark data.
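The abstract's claim of avoiding O(N) communication can be illustrated with a much simpler model. The following is a minimal sketch, not the FAB algorithm from the paper. It assumes a one-dimensional least-squares fit, where each worker reduces its shard to five sufficient statistics and only those are merged. All names (`local_stats`, `merge`, `solve`, `fit_distributed`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Sufficient statistics for 1-D least squares:
# (n, sum_x, sum_y, sum_xx, sum_xy). The size is constant, independent of N.
Stats = Tuple[float, float, float, float, float]


def local_stats(xs: Sequence[float], ys: Sequence[float]) -> Stats:
    """Computed on each worker from its own shard; raw rows never leave the worker."""
    return (
        float(len(xs)),
        float(sum(xs)),
        float(sum(ys)),
        float(sum(x * x for x in xs)),
        float(sum(x * y for x, y in zip(xs, ys))),
    )


def merge(a: Stats, b: Stats) -> Stats:
    """Element-wise sum; this stands in for a reduce step across workers."""
    return tuple(p + q for p, q in zip(a, b))  # type: ignore[return-value]


def solve(stats: Stats) -> Tuple[float, float]:
    """Closed-form slope and intercept from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


def fit_distributed(
    shards: List[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Simulates the workers sequentially; each contributes five numbers."""
    total: Stats = (0.0, 0.0, 0.0, 0.0, 0.0)
    for xs, ys in shards:
        total = merge(total, local_stats(xs, ys))
    return solve(total)


# Hypothetical data: two shards drawn from y = 2x + 1.
shards = [([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]), ([3.0, 4.0], [7.0, 9.0])]
slope, intercept = fit_distributed(shards)
```

Each shard contributes a fixed-size tuple regardless of how many samples it holds, so the communication volume in this toy setting depends on the number of workers rather than on N. How the paper achieves the analogous property for FAB inference is not described in the abstract.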
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2019.2892972