EdgeRNN: A Compact Speech Recognition Network With Spatio-Temporal Features for Edge Computing
Published in: | IEEE Access 2020, Vol. 8, p. 81468-81478 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Driven by the vision of the Internet of Things, some research efforts have focused on designing efficient speech recognition networks for edge computing. Other studies (such as tpool2) do not make full use of the spatial and temporal information in the acoustic features of speech. In this paper, we propose a compact speech recognition network with spatio-temporal features for edge computing, named EdgeRNN. EdgeRNN uses a 1-dimensional convolutional neural network (1-D CNN) to process the overall spatial information of each frequency domain of the acoustic features, and a recurrent neural network (RNN) to process the temporal information of each frequency domain. In addition, we propose a simplified attention mechanism to enhance the portion of the network that contributes to the final identification. The overall performance of EdgeRNN has been verified on speech emotion recognition and keyword recognition. For speech emotion recognition, the IEMOCAP dataset is used, and the unweighted average recall (UAR) reaches 63.98%. For keyword recognition, Google's Speech Commands Dataset V1 is used, with a weighted average recall (WAR) of 96.82%. Compared with the experimental results of related efficient networks on a Raspberry Pi 3B+, EdgeRNN improves accuracy on both speech emotion recognition and keyword recognition. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2020.2990974 |
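
The summary reports two different metrics: unweighted average recall (UAR) for emotion recognition and weighted average recall (WAR) for keyword recognition. The sketch below shows how the two are commonly computed, so the figures are not read as directly comparable. It is a minimal illustration and not the authors' evaluation code. The function names and the toy labels are hypothetical, and it assumes the usual definitions: UAR is the plain mean of per-class recalls, and WAR weights each class recall by its share of the samples, which makes it equal to overall accuracy.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


def unweighted_average_recall(y_true, y_pred):
    """UAR: every class counts equally, regardless of its size."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def weighted_average_recall(y_true, y_pred):
    """WAR: class recalls weighted by class frequency (equals accuracy)."""
    support = Counter(y_true)
    recalls = per_class_recall(y_true, y_pred)
    total = len(y_true)
    return sum(recalls[c] * support[c] / total for c in support)


# Toy, imbalanced example (hypothetical labels, not data from the paper).
y_true = ["angry"] * 8 + ["sad"] * 2
y_pred = ["angry"] * 8 + ["angry", "sad"]

uar = unweighted_average_recall(y_true, y_pred)
war = weighted_average_recall(y_true, y_pred)
print(f"UAR = {uar:.2f}, WAR = {war:.2f}")
```

On imbalanced data, WAR tends to sit above UAR because the majority class dominates the weighting. This is one reason UAR is a common choice for emotion corpora whose classes are unevenly represented.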