NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration
With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed i...
Gespeichert in:
Hauptverfasser: | , , , , , , , , , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | With the increasing demand to efficiently deploy DNNs on mobile edge devices,
it becomes much more important to reduce unnecessary computation and increase
the execution speed. Prior methods towards this goal, including model
compression and network architecture search (NAS), are largely performed
independently and do not fully consider compiler-level optimizations which is a
must-do for mobile acceleration. In this work, we first propose (i) a general
category of fine-grained structured pruning applicable to various DNN layers,
and (ii) a comprehensive, compiler automatic code generation framework
supporting different DNNs and different pruning schemes, which bridge the gap
of model compression and NAS. We further propose NPAS, a compiler-aware unified
network pruning, and architecture search. To deal with large search space, we
propose a meta-modeling procedure based on reinforcement learning with fast
evaluation and Bayesian optimization, ensuring the total number of training
epochs comparable with representative NAS frameworks. Our framework achieves
6.7ms, 5.9ms, 3.9ms ImageNet inference times with 78.2%, 75% (MobileNet-V3
level), and 71% (MobileNet-V2 level) Top-1 accuracy respectively on an
off-the-shelf mobile phone, consistently outperforming prior work. |
---|---|
DOI: | 10.48550/arxiv.2012.00596 |