Improved features using convolution-augmented transformers for keyword spotting

Bibliographic details
Published in: ITM Web of Conferences 2022, Vol. 47, p. 2039
Main authors: Wang, Yi; Yang, Junan; Liu, Jingtao; Chen, Qiang; Li, Song
Format: Article
Language: English
Online access: Full text
Description
Summary: Transformers can effectively model long-range dependencies but are poor at extracting local feature patterns, whereas CNNs exploit local features effectively. In this paper, we seek to combine convolution and Transformers so that the combination improves over using either individually, and we propose improved features using convolution-augmented transformers for keyword spotting. The model consists of a ResNet front-end and a convolution-augmented transformer back-end connected in series. The resulting improved features are then used for the keyword spotting task. The results show that the improved features using convolution-augmented transformers yield at least a 3% improvement compared with other features.
ISSN: 2271-2097, 2431-7578
DOI: 10.1051/itmconf/20224702039
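
The series arrangement described in the summary (a convolutional front-end followed by a block that mixes global self-attention with local convolution) can be illustrated with a small NumPy sketch. This is not the authors' model: the single residual convolution standing in for the ResNet front-end, the single attention head, the kernel size, the feature dimensions, and all function and parameter names are assumptions chosen only to show how the two kinds of operation compose.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time; x has shape (T, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (T, T): every frame scores every frame
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # global mixing across all frames

def depthwise_conv1d(x, kernel):
    """Depthwise 1-D convolution along time with zero 'same' padding.

    x: (T, D), kernel: (K, D) with K odd. As in most deep-learning code,
    the kernel is not flipped (this is technically cross-correlation).
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        out += xp[i:i + x.shape[0]] * kernel[i]   # each output frame sees only K neighbours
    return out

def residual_conv_frontend(x, kernel):
    """Hypothetical stand-in for the ResNet front-end: x + ReLU(conv(x))."""
    return x + np.maximum(depthwise_conv1d(x, kernel), 0.0)

def conv_augmented_block(x, p):
    """Hypothetical back-end block: residual attention, then residual convolution."""
    x = x + self_attention(x, p["wq"], p["wk"], p["wv"])
    x = x + depthwise_conv1d(x, p["block_kernel"])
    return x

def model(x, p):
    """Front-end and back-end applied in series."""
    return conv_augmented_block(residual_conv_frontend(x, p["front_kernel"]), p)

# Made-up input: 50 frames of 8-dimensional acoustic features.
rng = np.random.default_rng(0)
T, D, K = 50, 8, 3
features = rng.standard_normal((T, D))
params = {
    "wq": 0.1 * rng.standard_normal((D, D)),
    "wk": 0.1 * rng.standard_normal((D, D)),
    "wv": 0.1 * rng.standard_normal((D, D)),
    "front_kernel": 0.1 * rng.standard_normal((K, D)),
    "block_kernel": 0.1 * rng.standard_normal((K, D)),
}
out = model(features, params)
print(out.shape)  # (50, 8)
```

In this sketch, changing one input frame alters the convolution output only within the kernel window around that frame, while the attention output changes at every frame. That contrast between local and long-range modelling is what the summary draws between CNNs and Transformers.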