Query grouping–based multi‐query optimization framework for interactive SQL query engines on Hadoop
Published in: | Concurrency and computation 2018-10, Vol.30 (19), p.n/a |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Summary
In the past few years, executing high‐concurrency queries with interactive SQL query engines on Hadoop has become an important activity for many organizations. However, these systems do not adopt Multi‐Query Optimization (MQO) to accelerate the process. There are two major concerns. First, traditional MQO research assumes that the queries being optimized together are highly similar. However, these systems usually serve a variety of applications. Although queries from the same application are highly similar, queries from different applications may have low similarity, so applying traditional MQO across all of them is inefficient and time consuming. Second, integrating MQO may require extensive system modifications. To integrate MQO into interactive SQL query engines on Hadoop efficiently, a query grouping–based MQO framework is proposed. A lightweight mechanism is used to represent SQL queries, on which a grouping method is exploited to speed up the optimization process. A cost model is integrated to estimate the execution cost of interactive SQL query engines on Hadoop. Using the proposed framework, we modify the Impala system to support MQO, and the experimental results on TPC‐DS show significant performance improvements. |
---|---|
ISSN: | 1532-0626 1532-0634 |
DOI: | 10.1002/cpe.4676 |
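
The abstract does not say how queries are represented or grouped, so the following is only an illustrative sketch of the general idea of grouping similar queries before applying MQO within each group. It assumes a hypothetical lightweight signature (the set of table names following FROM or JOIN) and a greedy grouping by Jaccard similarity; the function names, the regular expression, and the 0.5 threshold are all assumptions made for illustration, not the paper's method.

```python
import re

# Hypothetical signature: table names that follow FROM or JOIN.
# Limitation: comma-separated FROM lists only yield their first table.
_TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql):
    """Return the lower-cased set of table names found in the query text."""
    return frozenset(name.lower() for name in _TABLE_RE.findall(sql))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_queries(queries, threshold=0.5):
    """Greedily assign each query to the first group whose representative
    signature (that of the group's first query) is similar enough."""
    groups = []  # list of (representative_signature, member_queries)
    for query in queries:
        signature = referenced_tables(query)
        for representative, members in groups:
            if jaccard(signature, representative) >= threshold:
                members.append(query)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append((signature, [query]))
    return [members for _, members in groups]
```

With this sketch, two queries that both read `store_sales` and `item` would fall into one group, while a query over `web_returns` and `customer` would start its own group, so an optimizer would only search for shared work among queries that are likely to have some.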