ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks
Abstract: Closing the gap between the hardware requirements of state-of-the-art convolutional neural networks and the limited resources constraining embedded applications is the next big challenge in deep learning research. The computational complexity and memory footprint of such neural networks are typically daunting for deployment in resource-constrained environments. Model compression techniques, such as pruning, are emphasized among other optimization methods for solving this problem. Most existing techniques require domain expertise or result in irregular sparse representations, which increase the burden of deploying deep learning applications on embedded hardware accelerators. In this paper, we propose the autoencoder-based low-rank filter-sharing technique (ALF). When applied to various networks, ALF is compared to state-of-the-art pruning methods, demonstrating its efficient compression capabilities on theoretical metrics as well as on an accurate, deterministic hardware model. In our experiments, ALF showed a reduction of 70% in network parameters, 61% in operations, and 41% in execution time, with minimal loss in accuracy.
DOI: 10.48550/arxiv.2007.13384
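The abstract does not detail ALF's mechanism, but the general idea behind low-rank filter sharing can be illustrated with a minimal sketch: approximate a convolutional layer's filter bank by a small set of shared basis filters plus per-filter mixing coefficients. The sketch below uses a truncated SVD as the linear-autoencoder equivalent; the layer shapes, the rank, and the random filter bank are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: compress a filter bank with shared low-rank
# basis filters. Shapes and rank are assumed, not taken from the paper.
rng = np.random.default_rng(0)
num_filters, k, channels = 64, 3, 16
W = rng.standard_normal((num_filters, k * k * channels))  # flattened filters

rank = 8  # number of shared basis filters (assumed hyperparameter)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
basis = Vt[:rank]        # shared basis filters, shape (rank, k*k*channels)
codes = W @ basis.T      # per-filter mixing coefficients, shape (64, rank)
W_approx = codes @ basis # reconstructed filter bank, same shape as W

orig_params = W.size
shared_params = basis.size + codes.size
print(f"parameter reduction: {1 - shared_params / orig_params:.0%}")
# → parameter reduction: 82%
```

Storing only `basis` and `codes` instead of the full filter bank is what yields the parameter and operation savings the abstract reports; in the paper this factorization is learned during training rather than computed post hoc as here.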