Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks

Bibliographic details
Published in: ACM Transactions on Embedded Computing Systems, 2022-08, Vol. 21 (4), pp. 1-28, Article 46
Authors: Daghero, Francesco; Burrello, Alessio; Xie, Chen; Castellano, Marco; Gandolfi, Luca; Calimera, Andrea; Macii, Enrico; Poncino, Massimo; Pagliari, Daniele Jahier
Format: Article
Language: English
Description
Abstract: Human Activity Recognition (HAR) based on inertial data is an increasingly widespread task on embedded devices, from smartphones to ultra-low-power sensors. Due to the high computational complexity of deep learning models, most embedded HAR systems are based on simple and less accurate classic machine learning algorithms. This work bridges the gap between on-device HAR and deep learning, proposing a set of efficient one-dimensional Convolutional Neural Networks (CNNs) that can be deployed on general-purpose microcontrollers (MCUs). Our CNNs are obtained by combining hyper-parameter optimization with sub-byte and mixed-precision quantization, to find good trade-offs between classification results and memory occupation. Moreover, we leverage adaptive inference as an orthogonal optimization that tunes the inference complexity at runtime based on the processed input, producing a more flexible HAR system. With experiments on four datasets, targeting an ultra-low-power RISC-V MCU, we show that (i) we obtain a rich set of Pareto-optimal CNNs for HAR, spanning more than one order of magnitude in memory, latency, and energy consumption; (ii) thanks to adaptive inference, we can derive more than 20 runtime operating modes starting from a single CNN, differing by up to 10% in classification scores and by more than 3× in inference complexity, with a limited memory overhead; (iii) on three of the four benchmarks, we outperform all previous deep learning methods while reducing memory occupation by more than 100×; the few methods that obtain better performance (both shallow and deep) are not compatible with MCU deployment; (iv) all our CNNs are compatible with real-time on-device HAR, with inference latencies between 9 μs and 16 ms. Their memory occupation ranges from 0.05 to 23.17 kB and their energy consumption from 0.05 to 61.59 μJ, allowing years of continuous operation on a small battery supply.
ISSN: 1539-9087, 1558-3465
DOI: 10.1145/3542819
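The idea of tuning inference complexity at runtime based on the processed input can be illustrated with a minimal sketch. It assumes a generic two-stage scheme: a cheap early stage (for example, an early-exit branch of a CNN) classifies the input first, and the full network runs only when the early stage's top class probability falls below a threshold. The confidence measure (maximum probability), the helper names (`adaptive_predict`, `evaluate`, `argmax`), the relative stage costs, and the toy lookup-table models are all hypothetical illustrations, not the method or numbers from the paper. Each threshold value acts as one runtime operating mode.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: maps an input to a sequence of class probabilities.
Model = Callable[[object], Sequence[float]]

# Hypothetical relative costs (e.g., MAC counts) of the two stages; not from the paper.
EARLY_COST = 1.0
FULL_EXTRA_COST = 4.0


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def adaptive_predict(x: object, early: Model, full: Model, threshold: float) -> Tuple[int, bool]:
    """Classify with the early stage; escalate to the full model if its top probability < threshold.

    Returns (predicted class, whether the full model was invoked).
    """
    probs = early(x)
    if max(probs) >= threshold:
        return argmax(probs), False
    # Simplification: the full model is a separate callable here; an early-exit
    # network would instead reuse the features already computed by the early stage.
    return argmax(full(x)), True


def evaluate(samples: Sequence[object], labels: Sequence[int], early: Model, full: Model,
             threshold: float) -> Tuple[float, float]:
    """Return (accuracy, average cost per input) for one operating mode (one threshold)."""
    correct = 0
    total_cost = 0.0
    for x, y in zip(samples, labels):
        pred, used_full = adaptive_predict(x, early, full, threshold)
        correct += int(pred == y)
        total_cost += EARLY_COST + (FULL_EXTRA_COST if used_full else 0.0)
    n = len(samples)
    return correct / n, total_cost / n


if __name__ == "__main__":
    # Toy lookup-table "models" over four inputs (indices 0..3), invented for illustration.
    early_table = {0: [0.9, 0.1], 1: [0.4, 0.6], 2: [0.3, 0.7], 3: [0.55, 0.45]}
    full_table = {0: [0.95, 0.05], 1: [0.1, 0.9], 2: [0.8, 0.2], 3: [0.2, 0.8]}
    samples, labels = [0, 1, 2, 3], [0, 1, 0, 1]
    # Sweep the threshold: each value is one accuracy/cost operating mode.
    for t in (0.0, 0.65, 0.8, 1.01):
        acc, cost = evaluate(samples, labels, early_table.__getitem__, full_table.__getitem__, t)
        print(f"threshold={t:.2f}  accuracy={acc:.2f}  avg_cost={cost:.2f}")
```

On this toy data, raising the threshold from 0.0 to 1.01 moves accuracy from 0.50 to 1.00 while the average cost grows from 1.0 to 5.0, so a single pair of stages yields several selectable trade-offs without storing additional models.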