A Runtime Switchable Multi-Phase Convolutional Neural Network for Resource-Constrained Systems
Published in: | IEEE Access 2023-01, Vol.11, p.1-1 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Convolutional Neural Networks (CNNs) are widely used in various systems, including resource-constrained embedded systems and IoT devices. In such systems, it is typical to deploy compressed or pruned CNNs instead of the original ones, at the cost of reduced accuracy. Existing CNN pruning techniques have primarily focused on minimizing resource requirements. However, today's embedded systems are increasingly dynamic in both resource demands and availability. Thus, previous techniques that consider only a given static case are no longer efficient. In this paper, we propose a novel multi-phase CNN that enables multi-objective exploration of a number of pruning candidates derived from a single CNN. In the proposed technique, a CNN can operate in various versions depending on which subsets of weights are used, and it can be switched adaptively and efficiently to the version that best matches the given constraint. To this end, a CNN is first pruned to its sparsest form; then a set of parameters (a sub-network) is additionally supplemented with each subsequent phase. As a result, the network versions for all phases can be represented by a single network, and they form Pareto solutions over the trade-off between accuracy and resource usage. In this work, we target CPU-based CNN inference engines, as most embedded systems do not have the luxury of specialized co-processor support such as GPUs or hardware accelerators. The proposed technique has been implemented in a publicly available CPU inference engine, Darknet, and its effectiveness has been validated with a popular CNN in terms of design space exploration capability and runtime switchability. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3287998 |
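
The multi-phase idea described in the abstract (start from the sparsest network, then add a further subset of weights in each phase, so that every version is contained in one network) can be illustrated with a minimal sketch. This is not the authors' implementation: ranking weights by magnitude is an assumption made only for illustration, and the names `assign_phases`, `active_weights`, and `pick_phase` are hypothetical.

```python
from typing import List, Sequence


def assign_phases(weights: Sequence[float],
                  keep_fractions: Sequence[float]) -> List[int]:
    """Give each weight the index of the first phase in which it is active.

    keep_fractions must be non-decreasing, e.g. [0.25, 0.5, 1.0]; phase 0 is
    the sparsest version. Magnitude ranking is an illustrative assumption,
    not the pruning criterion of the paper.
    """
    if list(keep_fractions) != sorted(keep_fractions):
        raise ValueError("keep_fractions must be non-decreasing")
    n = len(weights)
    # Weight indices ordered from largest to smallest magnitude.
    order = sorted(range(n), key=lambda i: -abs(weights[i]))
    phase_of = [len(keep_fractions)] * n  # default: never active
    start = 0
    for phase, frac in enumerate(keep_fractions):
        end = round(frac * n)
        for i in order[start:end]:
            phase_of[i] = phase
        start = max(start, end)
    return phase_of


def active_weights(weights: Sequence[float],
                   phase_of: Sequence[int],
                   phase: int) -> List[float]:
    """Return the weights as seen in `phase`; inactive weights read as 0."""
    return [w if p <= phase else 0.0 for w, p in zip(weights, phase_of)]


def pick_phase(phase_of: Sequence[int], num_phases: int, budget: int) -> int:
    """Pick the densest phase whose active-weight count fits the budget.

    Falls back to phase 0 (the sparsest version) if nothing fits.
    """
    best = 0
    for phase in range(num_phases):
        if sum(p <= phase for p in phase_of) <= budget:
            best = phase
    return best


if __name__ == "__main__":
    w = [0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.3, -0.4]
    phases = assign_phases(w, [0.25, 0.5, 1.0])
    chosen = pick_phase(phases, num_phases=3, budget=5)
    print(phases, chosen, active_weights(w, phases, chosen))
```

Because each weight stores only the first phase in which it becomes active, switching versions at run time amounts to changing a single threshold rather than loading a different model, which is the property the abstract calls runtime switchability.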