Disaggregated query processing on data lakes based on pipelined, massively parallel, distributed native query execution on compute clusters utilizing precise, parallel, asynchronous shared storage repository access

Executing a query in a disaggregated cluster. A query is received at the disaggregated cluster. A query graph is created based on the query that identifies a hierarchy of vertices, where each vertex is associated with a set of data responsive to at least a portion of the query. The compute nodes pro...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Krishnan, Manavalan, O'Krafka, Brian Walter, Rothauge, Kai, Busch, John Richard
Format:	Patent
Sprache:	eng
Schlagworte:	CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING PHYSICS
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Executing a query in a disaggregated cluster. A query is received at the disaggregated cluster. A query graph is created based on the query that identifies a hierarchy of vertices, where each vertex is associated with a set of data responsive to at least a portion of the query. The compute nodes process the query graph by first identifying all tables, files, and objects stored on the storage nodes whose access is required to retrieve data that satisfy the query. Next, the compute nodes selectively assign the identified tables, files, and objects to a leaf vertex of the query graph to optimize retrieving data from the storage nodes. Thereafter, the compute nodes process the retrieved data sets associated with each vertex using separate threads of execution for each vertex of the query graph such that leaf vertices are performed in parallel. The compute nodes then provide a result set.