ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems
PVLDB Volume 17, 2023-2024 Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark - the...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | PVLDB Volume 17, 2023-2024 Natural Language to SQL systems (NL-to-SQL) have recently shown a significant
increase in accuracy for natural language to SQL query translation. This
improvement is due to the emergence of transformer-based language models, and
the popularity of the Spider benchmark - the de-facto standard for evaluating
NL-to-SQL systems. The top NL-to-SQL systems reach accuracies of up to 85\%.
However, Spider mainly contains simple databases with few tables, columns, and
entries, which does not reflect a realistic setting. Moreover, complex
real-world databases with domain-specific content have little to no training
data available in the form of NL/SQL-pairs leading to poor performance of
existing NL-to-SQL systems.
In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL
benchmark for three real-world, highly domain-specific databases. For this new
benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for
each domain. To garner more data, we extended the small amount of
human-generated data with synthetic data generated using GPT-3. We show that
our benchmark is highly challenging, as the top performing systems on Spider
achieve a very low performance on our benchmark. Thus, the challenge is
many-fold: creating NL-to-SQL systems for highly complex domains with a small
amount of hand-made training data augmented with synthetic data. To our
knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with
complex real-world scientific databases, containing challenging training and
test data carefully validated by domain experts. |
---|---|
DOI: | 10.48550/arxiv.2306.04743 |