Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries given
relational databases, which has been traditionally implemented in a cascaded
manner while facing the following challenges: 1) model training is faced with
the major issue of data scarcity, where limited parallel data is available; and
2) the systems should be robust enough to handle diverse out-of-domain speech
samples that differ from the source data. In this work, we propose the first
direct speech-to-SQL parsing model, Wav2SQL, which avoids error compounding
across cascaded systems. Specifically, 1) to accelerate speech-driven SQL
parsing research in the community, we release a large-scale and multi-speaker
dataset MASpider; 2) leveraging recent progress in large-scale
pre-training, we show that it alleviates the data scarcity issue and allows for
direct speech-to-SQL parsing; and 3) we include speech re-programming and
gradient reversal classifier techniques to reduce acoustic variance and learn
style-agnostic representations, improving generalization to unseen out-of-domain
custom data. Experimental results demonstrate that Wav2SQL avoids error
compounding and achieves state-of-the-art results, improving accuracy by up to
2.5% over the baseline. |
---|---|
DOI: | 10.48550/arxiv.2305.12552 |