Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees

Bibliographic details
Published in:	Proceedings of the VLDB Endowment, 2021-03, Vol. 14 (7), pp. 1228-1240
Authors:	Jankov, Dimitrije, Yuan, Binhang, Luo, Shangyu, Jermaine, Chris
Format:	Article
Language:	English
Subjects:	Computer Science; Computer Science, Information Systems; Computer Science, Theory & Methods; Science & Technology; Technology

Description
Abstract:	When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.
ISSN:	2150-8097
DOI:	10.14778/3450980.3450991
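To make the two phases in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It expresses matrix multiplication C = AB as an aggregated join tree (join A and B on the shared index k, then SUM grouped by (i, j)), with scalars standing in for matrix tiles. The function names `pilot_run`, `plan`, and `execute` are hypothetical, and the greedy placement rule in `plan` (equal-weight sum of newly shipped inputs and current load) is an illustrative heuristic swapped in for whatever optimizer the paper actually uses.

```python
from collections import defaultdict

# Hypothetical toy relations A(i, k, val) and B(k, j, val), stored as
# {(row, col): value}. A real system would store matrix tiles, not scalars.


def pilot_run(A, B):
    """Phase 1: evaluate only the join and grouping keys, recording lineage.

    Returns {(i, j): [(a_key, b_key), ...]}: for each output group, the pairs
    of input tuples that join and feed its SUM. No values are multiplied.
    """
    b_by_k = defaultdict(list)
    for (k, j) in B:
        b_by_k[k].append((k, j))
    lineage = defaultdict(list)
    for (i, k) in A:
        for b_key in b_by_k.get(k, []):
            lineage[(i, b_key[1])].append(((i, k), b_key))
    return dict(lineage)


def plan(lineage, num_workers):
    """Assign every output group (with all its join pairs) to one worker.

    Illustrative greedy rule (an assumption, not the paper's optimizer):
    pick the worker minimizing (input tuples that must newly be shipped
    there) + (join pairs already assigned to it).
    """
    placed = [set() for _ in range(num_workers)]  # inputs already on each worker
    load = [0] * num_workers                      # join pairs assigned so far
    assignment = {}
    # Place the largest groups first; Python's sort is stable for ties.
    for group, pairs in sorted(lineage.items(), key=lambda kv: -len(kv[1])):
        # Tag keys by relation so A's (0, 1) and B's (0, 1) stay distinct.
        needed = {("A",) + a for a, _ in pairs} | {("B",) + b for _, b in pairs}
        w = min(range(num_workers),
                key=lambda x: len(needed - placed[x]) + load[x])
        assignment[group] = w
        placed[w] |= needed
        load[w] += len(pairs)
    return assignment


def execute(A, B, lineage, assignment, num_workers):
    """Phase 2: each worker joins and SUM-aggregates only its assigned groups."""
    partials = [{} for _ in range(num_workers)]
    for group, pairs in lineage.items():
        partials[assignment[group]][group] = sum(A[a] * B[b] for a, b in pairs)
    result = {}
    for part in partials:  # groups are disjoint across workers
        result.update(part)
    return result


A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
B = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
lineage = pilot_run(A, B)
assignment = plan(lineage, num_workers=2)
C = execute(A, B, lineage, assignment, num_workers=2)
print(assignment)  # {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1}
print(C)           # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```

In this toy run, the lineage lets the planner see that the two outputs in each row of C read the same A tuples, so it keeps each row on a single worker: A is never replicated, and only B is copied to both workers. A plain hash partitioning on (i, j) has no such record-level view of which inputs each group needs.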