FLAT: fast, lightweight and accurate method for cardinality estimation
Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over-simplify the models o...
Gespeichert in:
Veröffentlicht in: | Proceedings of the VLDB Endowment 2021-05, Vol.14 (9), p.1489-1502 |
---|---|
Hauptverfasser: | , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over-simplify the models only using independent factorization which leads to inaccurate estimates, or over-complicate them by lossless conditional factorization without any independent assumption which results in slow probability computation. In this paper, we propose FLAT, a CardEst method that is simultaneously
fast
in probability computation,
lightweight
in model size and
accurate
in estimation quality. The key idea of FLAT is a novel unsupervised graphical model, called FSPN. It utilizes both independent and conditional factorization to adaptively model different levels of attributes correlations, and thus combines their advantages. FLAT supports efficient online probability computation in near linear time on the underlying FSPN model, provides effective offline model construction and enables incremental model updates. It can estimate cardinality for both single table queries and multi-table join queries. Extensive experimental study demonstrates the superiority of FLAT over existing CardEst methods: FLAT achieves 1--5 orders of magnitude better accuracy, 1--3 orders of magnitude faster probability computation speed and 1--2 orders of magnitude lower storage cost. We also integrate FLAT into Postgres to perform an end-to-end test. It improves the query execution time by 12.9% on the well-known IMDB benchmark workload, which is very close to the optimal result 14.2% using the true cardinality. |
---|---|
ISSN: | 2150-8097 2150-8097 |
DOI: | 10.14778/3461535.3461539 |