NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

Bibliographic details
Published in:	IEEE Transactions on Circuits and Systems I: Regular Papers, 2021-05, Vol. 68 (5), p. 1892-1905
Main authors:	Li, Qin; Liu, Changlu; Dong, Peiyan; Zhang, Yanming; Li, Tong; Lin, Sheng; Yang, Minda; Qiao, Fei; Wang, Yanzhi; Luo, Li; Yang, Huazhong
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Analog to digital conversion; Analog to digital converters; Artificial neural networks; Computer architecture; Computer networks; configurable feature; distributed network; Electronic devices; Energy consumption; Feature extraction; Hardware; Keyword spotting; lightweight GRU; Microphones; near-sensor processing architecture; Network latency; Optimization; Power consumption; Power demand; Redundancy; Sensors; Signal processing; Speech recognition

Description
Abstract:	Always-on keyword spotting (KWS), which detects wake-up words, has become an indispensable module in voice interaction systems. However, ultra-low-power embedded devices impose strict requirements on the energy consumption, latency, and recognition accuracy of KWS. In this work, we propose a near-sensor processing architecture of feature-configurable distributed network (NS-FDN) for always-on KWS applications. The proposed distributed network adapts to flexible keyword demands in real-world use by splitting the conventional single network into distributed sub-networks. We design a channel-independent training framework to improve the recognition accuracy of the distributed networks. NS-FDN evaluates the speech features and reduces their redundancy, and it can also configure the speech features to further reduce computational complexity and improve processing speed. For deeper optimization, we implement a prototype chip in a 65 nm process with a near-sensor mixed-signal processing architecture that avoids an energy-consuming analog-to-digital converter. By improving the system, algorithm, and hardware designs of KWS, our co-optimized architecture eliminates the long-standing energy consumption bottleneck in conventional KWS systems and achieves state-of-the-art system performance. The experimental results show that NS-FDN achieves 31.6% energy consumption savings, 1.6 times memory savings, a 57 times speedup, and 3.4% higher recognition accuracy compared with the state of the art.
ISSN:	1549-8328 1558-0806
DOI:	10.1109/TCSI.2021.3059649