POLARIS: The Distributed SQL Engine in Azure Synapse

Bibliographic details
Published in:	Proceedings of the VLDB Endowment 2020-08, Vol.13 (12), p.3204-3216
Main authors:	Aguilar-Saborit, Josep, Ramakrishnan, Raghu, Srinivasan, Krish, Bocksrocker, Kevin, Alagiannis, Ioannis, Sankara, Mahadevan, Shafiei, Moe, Blakeley, Jose, Dasarathy, Girish, Dash, Sumeet, Davidovic, Lazar, Damjanic, Maja, Djunic, Slobodan, Djurkic, Nemanja, Feddersen, Charles, Galindo-Legaria, Cesar, Halverson, Alan, Kovacevic, Milana, Kicovic, Nikola, Lukic, Goran, Maksimovic, Djordje, Manic, Ana, Markovic, Nikola, Mihic, Bosko, Milic, Ugljesa, Milojevic, Marko, Nayak, Tapas, Potocnik, Milan, Radic, Milos, Radivojevic, Bozidar, Rangarajan, Srikumar, Ruzic, Milan, Simic, Milan, Sosic, Marko, Stanko, Igor, Stikic, Maja, Stanojkov, Sasa, Stefanovic, Vukasin, Sukovic, Milos, Tomic, Aleksandar, Tomic, Dragan, Toscano, Steve, Trifunovic, Djordje, Vasic, Veljko, Verona, Tomer, Vujic, Aleksandar, Vujic, Nikola, Vukovic, Marko, Zivanovic, Marko
Format:	Article
Language:	English

Description
Abstract:	In this paper, we describe the Polaris distributed SQL query engine in Azure Synapse. It is the result of a multi-year project to re-architect the query processing framework in the SQL DW parallel data warehouse service, and addresses two main goals: (i) converge data warehousing and big data workloads, and (ii) separate compute and state for cloud-native execution. From a customer perspective, these goals translate into many useful features, including the ability to resize live workloads, deliver predictable performance at scale, and to efficiently handle both relational and unstructured data. Achieving these goals required many innovations, including a novel "cell" data abstraction, and flexible, fine-grained, task monitoring and scheduling capable of handling partial query restarts and PB-scale execution. Most importantly, while we develop a completely new scale-out framework, it is fully compatible with T-SQL and leverages decades of investment in the SQL Server single-node runtime and query optimizer. The scalability of the system is highlighted by a 1PB scale run of all 22 TPC-H queries; to our knowledge, this is the first reported run with scale larger than 100TB.
ISSN:	2150-8097
DOI:	10.14778/3415478.3415545