Gated Compression Layers for Efficient Always-On Models
Format: Article
Language: English
Abstract: Mobile and embedded machine learning developers frequently have to compromise between two inferior on-device deployment strategies: sacrifice accuracy and aggressively shrink their models to run on dedicated low-power cores, or sacrifice battery life by running larger models on more powerful compute cores such as neural processing units or the main application processor. In this paper, we propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks. Gated Neural Networks have multiple properties well suited to on-device use cases: they significantly reduce power, boost accuracy, and take advantage of heterogeneous compute cores. We provide results across five public image and audio datasets demonstrating that the proposed Gated Compression layer effectively stops up to 96% of negative samples and compresses 97% of positive samples, while maintaining or improving model accuracy.
DOI: 10.48550/arxiv.2303.08970
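
The abstract describes the mechanism only at a high level: a layer that gates likely-negative samples to stop early and compresses the features of likely-positive samples before they reach a more powerful core. The following is a minimal PyTorch sketch of how such a gate-plus-bottleneck layer might be wired between a cheap early subnetwork and an expensive later one. All module names, dimensions, and the 0.5 stop threshold are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the gating-plus-compression idea from the abstract.
# Names, dimensions, and the threshold below are assumed for illustration.
import torch
import torch.nn as nn


class GatedCompressionLayer(nn.Module):
    """Sits between an early (low-power) subnetwork and a later
    (high-power) subnetwork. The gate scores whether a sample is worth
    passing on; the bottleneck compresses the features that do pass."""

    def __init__(self, in_dim: int, compressed_dim: int):
        super().__init__()
        self.gate = nn.Linear(in_dim, 1)                   # scalar "pass" score
        self.compress = nn.Linear(in_dim, compressed_dim)  # feature bottleneck

    def forward(self, x: torch.Tensor):
        pass_prob = torch.sigmoid(self.gate(x))  # probability sample is positive
        compressed = self.compress(x)             # reduced representation
        return pass_prob, compressed


# Toy usage: run the cheap early network, gate, and only invoke the
# expensive late network when the gate fires.
early_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
gc_layer = GatedCompressionLayer(in_dim=32, compressed_dim=8)
late_net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

x = torch.randn(1, 64)  # one always-on sensor frame (dummy data)
features = early_net(x)
pass_prob, compressed = gc_layer(features)

if pass_prob.item() < 0.5:  # assumed threshold; tuned in practice
    print("gate stopped the sample early (predicted negative)")
else:
    # Only the compressed features cross the boundary to the expensive core.
    logits = late_net(compressed)
    print("late network output:", logits)
```

In this reading, "stopping" negative samples saves power by skipping the expensive subnetwork entirely, while "compressing" positive samples shrinks the data that must move between heterogeneous compute cores; how the gate and bottleneck are trained is detailed in the paper itself.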