Joint intent detection and slot filling using weighted finite state transducer and BERT

Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2022-12, Vol. 52 (15), p. 17356-17370
Main authors: Abro, Waheed Ahmed, Qi, Guilin, Aamir, Muhammad, Ali, Zafar
Format: Article
Language: English
Online access: Full text
Description
Summary: Intent detection and slot filling are the two most essential tasks of natural language understanding (NLU). Deep neural models have produced impressive results on these tasks. However, the predictive accuracy of these models depends heavily on massive amounts of supervised data, and in many applications collecting high-quality labeled data is an expensive and time-consuming process. This paper proposes the WFST-BERT model, which augments the fine-tuning of a BERT-like architecture with a weighted finite-state transducer (WFST) to reduce the need for massive supervised data. WFST-BERT employs regular expression (RE) rules to encode domain knowledge and a pre-trained BERT model to generate contextual representations of user sentences. In particular, the model converts the REs into a trainable WFST, which can generate reasonable predictions when few or no training examples are available. Moreover, the BERT contextual representations are combined with the WFST and trained simultaneously on supervised data using gradient descent. Experimental results on the ATIS dataset show that, in limited-data settings, the F1-score of WFST-BERT improves by around 1.8% and 1.3% on intent detection, and by 0.9% and 0.7% on slot filling, compared to its counterparts RE-NN and JointBERT. Further, in full-data settings, the proposed model achieves better recall and F1-score than state-of-the-art models.
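The abstract gives only the high-level design, so the following minimal PyTorch sketch is one way the described combination could look. The class name WFSTBERTSketch, the rule_matches input, and the rule_weights parameter are all hypothetical, and the trainable WFST is simplified here to a flat weighted vote over RE matches rather than a genuine transducer.

import torch
import torch.nn as nn
from transformers import BertModel

class WFSTBERTSketch(nn.Module):
    # Hypothetical sketch: BERT classification heads plus a trainable
    # stand-in for the paper's WFST component.
    def __init__(self, num_intents, num_slots, num_rules):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # neural intent scores
        self.slot_head = nn.Linear(hidden, num_slots)      # per-token slot scores
        # One learnable weight per (RE rule, intent) pair, so rule firings can
        # vote for intents even before any supervised fine-tuning.
        self.rule_weights = nn.Parameter(torch.ones(num_rules, num_intents))

    def forward(self, input_ids, attention_mask, rule_matches):
        # rule_matches: (batch, num_rules) 0/1 indicators of which REs matched.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                  # sentence representation
        intent_logits = self.intent_head(cls) + rule_matches @ self.rule_weights
        slot_logits = self.slot_head(out.last_hidden_state)
        return intent_logits, slot_logits

Training such a model would minimize the sum of an intent cross-entropy and a per-token slot cross-entropy, so gradient descent updates the encoder, the linear heads, and the rule weights jointly, matching the abstract's description of simultaneous training. Note that the actual paper compiles the REs into a full weighted transducer rather than the flat rule vote used here.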
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-03295-9