Improved features using convolution-augmented transformers for keyword spotting

Bibliographic Details
Published in:	ITM Web of Conferences, 2022, Vol. 47, p. 2039
Authors:	Wang, Yi; Yang, Junan; Liu, Jingtao; Chen, Qiang; Li, Song
Format:	Article
Language:	English
Subjects:	attention; convolution; convolutional neural networks; feature extraction; keyword spotting; Transformers
Online Access:	Full text

Description
Summary:	Transformers can effectively model long-range dependencies but are poor at extracting local feature patterns, whereas CNNs exploit local features effectively. In this paper, we combine convolution and Transformers, on the premise that the combination improves over using either one individually, and propose improved features using convolution-augmented transformers for keyword spotting. The model is constructed from a ResNet front-end and a convolution-augmented transformer back-end connected in series. The resulting features are then used for the keyword spotting task. The results show that the improved features from convolution-augmented transformers yield at least a 3% improvement compared with other features.
ISSN:	2271-2097, 2431-7578
DOI:	10.1051/itmconf/20224702039
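
The summary describes a ResNet front-end feeding a convolution-augmented transformer back-end in series. The sketch below only illustrates that serial data flow in plain Python; it is not the authors' implementation. This record gives no layer sizes, kernel widths, or block ordering, so everything here is an assumption made for illustration: the function names, the fixed smoothing kernel, identity projections in the attention step, placing attention before convolution, and the omission of layer normalization, feed-forward layers, and learned weights.

```python
import math

def conv1d(seq, kernel):
    """Same-padded 1D convolution over time, applied to each feature channel separately."""
    pad = len(kernel) // 2
    T, D = len(seq), len(seq[0])
    out = []
    for t in range(T):
        row = []
        for d in range(D):
            acc = 0.0
            for k, w in enumerate(kernel):
                i = t + k - pad
                if 0 <= i < T:  # zero padding outside the sequence
                    acc += w * seq[i][d]
            row.append(acc)
        out.append(row)
    return out

def add(a, b):
    """Element-wise sum of two [time][feature] sequences (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(seq):
    return [[max(0.0, x) for x in row] for row in seq]

def residual_conv_block(seq, kernel):
    """ResNet-style block (assumed form): output = input + ReLU(conv(input))."""
    return add(seq, relu(conv1d(seq, kernel)))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity projections (a simplification)."""
    D = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(D)])
    return out

def conv_augmented_block(seq, kernel):
    """Attention for global context, then convolution for local patterns, each with a residual."""
    seq = add(seq, self_attention(seq))
    seq = add(seq, conv1d(seq, kernel))
    return seq

def extract_features(frames, kernel=(0.25, 0.5, 0.25)):
    """Front-end and back-end in series, as the summary describes."""
    front = residual_conv_block(frames, kernel)
    return conv_augmented_block(front, kernel)

# Hypothetical input: 6 time frames with 4 acoustic features each.
frames = [[0.1 * (t + d) for d in range(4)] for t in range(6)]
features = extract_features(frames)
print(len(features), len(features[0]))  # prints "6 4"
```

Because every stage uses same-padding and residual connections, the output keeps the input's time and feature dimensions, so a keyword classifier could be attached to `features` without reshaping. A practical version would replace the fixed kernel and identity projections with learned parameters in a deep-learning framework.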