FP-NAS: Fast Probabilistic Neural Architecture Search
Main authors:
Format: Article
Language: eng
Subjects:
Abstract: Differential Neural Architecture Search (NAS) requires all layer choices to
be held in memory simultaneously; this limits the size of both search space and
final architecture. In contrast, Probabilistic NAS, such as PARSEC, learns a
distribution over high-performing architectures, and uses only as much memory
as needed to train a single model. Nevertheless, it needs to sample many
architectures, making it computationally expensive for searching in an
extensive space. To solve these problems, we propose a sampling method adaptive
to the distribution entropy, drawing more samples to encourage exploration at
the beginning, and reducing samples as learning proceeds. Furthermore, to
search fast in the multi-variate space, we propose a coarse-to-fine strategy by
using a factorized distribution at the beginning which can reduce the number of
architecture parameters by over an order of magnitude. We call this method Fast
Probabilistic NAS (FP-NAS). Compared with PARSEC, it can sample 64% fewer
architectures and search 2.1x faster. Compared with FBNetV2, FP-NAS is 1.9x -
3.5x faster, and the searched models outperform FBNetV2 models on ImageNet.
FP-NAS allows us to expand the giant FBNetV2 space to be wider (i.e. larger
channel choices) and deeper (i.e. more blocks), while adding Split-Attention
blocks and enabling search over the number of splits. When searching a model
of size 0.4G FLOPS, FP-NAS is 132x faster than EfficientNet, and the searched
FP-NAS-L0 model outperforms EfficientNet-B0 by 0.7% accuracy. Without using any
architecture surrogate or scaling tricks, we directly search large models up to
1.0G FLOPS. Our FP-NAS-L2 model with simple distillation outperforms BigNAS-XL
with advanced in-place distillation by 0.7% accuracy using similar FLOPS.
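
The abstract describes drawing a number of architecture samples that adapts to the entropy of the architecture distribution. Below is a minimal, hypothetical sketch of that idea (not the authors' code): the scaling factor `lam`, the single categorical variable `probs`, and the minimum of one sample are assumptions made for illustration.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def adaptive_sample_count(probs, lam=1.5):
    """More samples when the distribution is flat (early, exploratory phase),
    fewer as it concentrates on a few choices (late phase). `lam` is assumed."""
    return max(1, int(round(lam * entropy(probs))))

def sample_architectures(probs, rng):
    """Draw layer choices i.i.d. from the current architecture distribution."""
    n = adaptive_sample_count(probs)
    return rng.choice(len(probs), size=n, p=probs)

rng = np.random.default_rng(0)
early = np.full(8, 1.0 / 8)                                          # near-uniform: explore
late = np.array([0.90, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01])    # concentrated: exploit
print(adaptive_sample_count(early), adaptive_sample_count(late))     # e.g. 3 and 1
print(sample_architectures(early, rng))
```

In the paper the sample count is tied to the entropy of the full multi-variate architecture distribution; this toy version uses a single categorical variable only to keep the example short.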
DOI: 10.48550/arxiv.2011.10949
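
The coarse-to-fine strategy mentioned in the abstract relies on a factorized architecture distribution needing far fewer parameters than a joint one. The sketch below is illustrative only; the block layout, choice counts, and variable names are assumptions, not the paper's actual search space. It contrasts the two parameter counts and shows how a factorized distribution still assigns a probability to every full combination of choices.

```python
import numpy as np
from itertools import product

# Hypothetical searchable block with three architecture variables:
# operator type, output channels, number of splits (counts are assumed).
cardinalities = {"op": 9, "channels": 4, "splits": 2}

# Joint distribution: one parameter per combination of choices.
joint_params = int(np.prod(list(cardinalities.values())))      # 9 * 4 * 2 = 72

# Factorized distribution: one categorical per variable.
factorized_params = int(np.sum(list(cardinalities.values())))  # 9 + 4 + 2 = 15
print(joint_params, factorized_params)

# A factorized distribution still induces a probability for every
# combination: the product of the per-variable probabilities.
factors = {k: np.full(v, 1.0 / v) for k, v in cardinalities.items()}
combo_probs = {
    combo: float(np.prod([factors[k][i] for k, i in zip(cardinalities, combo)]))
    for combo in product(*(range(v) for v in cardinalities.values()))
}
assert abs(sum(combo_probs.values()) - 1.0) < 1e-9
```

With many searchable blocks the joint form grows multiplicatively while the factorized form grows only additively, which is roughly where the reported order-of-magnitude reduction in architecture parameters comes from.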