Efficiently approximating selectivity functions using low overhead regression models

Today's query optimizers use fast selectivity estimation techniques but are known to be susceptible to large estimation errors. Recent work on supervised learned models for selectivity estimation significantly improves accuracy while ensuring relatively low estimation overhead. However, these m...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the VLDB Endowment 2020-07, Vol.13 (12), p.2215-2228
Hauptverfasser: Dutt, Anshuman, Wang, Chi, Narasayya, Vivek, Chaudhuri, Surajit
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Today's query optimizers use fast selectivity estimation techniques but are known to be susceptible to large estimation errors. Recent work on supervised learned models for selectivity estimation significantly improves accuracy while ensuring relatively low estimation overhead. However, these models impose significant model construction cost as they need large numbers of training examples and computing selectivity labels is costly for large datasets. We propose a novel model construction method that incrementally generates training data and uses approximate selectivity labels, that reduces total construction cost by an order of magnitude while preserving most of the accuracy gains. The proposed method is particularly attractive for model designs that are faster-to-train for a given number of training examples, but such models are known to support a limited class of query expressions. We broaden the applicability of such supervised models to the class of select-project-join query expressions with range predicates and IN clauses. Our extensive evaluation on synthetic benchmark and real-world queries shows that the 95th-percentile error of our proposed models is 10-100X better than traditional selectivity estimators. We also demonstrate significant gains in plan quality as a result of improved selectivity estimates.
ISSN:2150-8097
2150-8097
DOI:10.14778/3407790.3407820