FP-DARTS: Fast parallel differentiable neural architecture search for image classification
Published in: | Pattern Recognition 2023-04, Vol.136, p.109193, Article 109193 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | • A two-parallel-path super-network is carefully designed at three levels: operation level, channel level and training level.
• Instead of always training the super-network with both paths, we adopt a binary gate that randomly eliminates at most one path to reduce the memory cost.
• We adopt a partially-connected strategy to further reduce the memory cost, in which each sub-network uses only a subset of the channels.
• A sigmoid function is introduced to select the best input for each node across the two operator sub-sets.
• The proposed method obtains better results in both search speed and accuracy.
Neural Architecture Search (NAS) has made remarkable progress in automatic machine learning. However, it still suffers from massive computational overhead, which limits its wider application. In this paper, we present an efficient search method, Fast Parallel Differentiable Neural Architecture Search (FP-DARTS). The proposed method is designed at three levels to construct and train the super-network. First, at the operation level, to reduce the computational burden, we depart from the standard DARTS search space (8 operations) and decompose the operation set into two non-overlapping operator sub-sets (4 operations each). Using these two reduced search spaces, two over-parameterized sub-networks are constructed. Second, at the channel level, a partially-connected strategy is adopted, in which each sub-network uses only a subset of the channels. The outputs of the two sub-networks are then added to form a two-parallel-path super-network. Third, at the training level, a binary gate is introduced to control whether a path participates in the super-network training. Using softmax to select the best input for intermediate nodes across the two operator sub-sets can be unfair. To tackle this problem, the sigmoid function is introduced, which measures the performance of operations without compression. Extensive experiments demonstrate the effectiveness of the proposed algorithm. Specifically, FP-DARTS achieves a 2.50% test error on CIFAR10 with only 0.08 GPU-days of search, and a state-of-the-art top-1 error rate of 23.7% on ImageNet with only 2.44 GPU-days of search. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2022.109193 |
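
The operation-level split, the binary gate and the softmax-versus-sigmoid issue described in the summary can be illustrated with a small sketch. This is not the authors' implementation. The assignment of operations to the two sub-sets, the gate probability `drop_prob`, the architecture parameters `alpha_a` and `alpha_b`, and all function names are assumptions chosen for illustration. The sketch also assumes that softmax would be normalized within each sub-set, which is one reading of the "unfair issue" mentioned above. The partially-connected channel strategy is omitted.

```python
import math
import random

# The eight candidate operations of the standard DARTS search space.
# The summary does not say which operations go into which sub-set,
# so this particular 4 + 4 split is an assumption.
SUBSET_A = ["sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]
SUBSET_B = ["none", "skip_connect", "max_pool_3x3", "avg_pool_3x3"]


def sample_gates(rng, drop_prob=0.5):
    """Sample binary gates (g_a, g_b) for the two paths.

    With probability drop_prob exactly one path is switched off;
    otherwise both stay active. At least one path always remains.
    drop_prob is a hypothetical parameter, not a value from the paper.
    """
    if rng.random() < drop_prob:
        return (0, 1) if rng.random() < 0.5 else (1, 0)
    return (1, 1)


def active_ops(gates):
    """List the operations that take part in one training step."""
    g_a, g_b = gates
    return (SUBSET_A if g_a else []) + (SUBSET_B if g_b else [])


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function; scores one parameter independently of the others."""
    return 1.0 / (1.0 + math.exp(-x))


def best_op(scores_a, scores_b):
    """Return the highest-scoring operation across both sub-sets."""
    scored = list(zip(SUBSET_A + SUBSET_B, scores_a + scores_b))
    return max(scored, key=lambda pair: pair[1])[0]


# Made-up architecture parameters: sub-set A is uniformly strong,
# sub-set B has a single moderately good operation.
alpha_a = [2.0, 1.9, 1.8, 1.7]
alpha_b = [-1.0, 0.5, -1.0, -1.0]

# Per-sub-set softmax forces each sub-set's weights to sum to 1, so the
# lone good operation in B gets a larger weight than any operation in A.
softmax_pick = best_op(softmax(alpha_a), softmax(alpha_b))

# Sigmoid scores depend only on each operation's own parameter.
sigmoid_pick = best_op([sigmoid(a) for a in alpha_a],
                       [sigmoid(a) for a in alpha_b])

rng = random.Random(0)
gates = sample_gates(rng)
print("gates:", gates, "active ops:", active_ops(gates))
print("softmax pick:", softmax_pick)
print("sigmoid pick:", sigmoid_pick)
```

With these made-up parameters, the per-sub-set softmax ranks `skip_connect` first even though its raw parameter (0.5) is lower than every parameter in the other sub-set, whereas the sigmoid scores rank `sep_conv_3x3` first. Because the gate never switches off both paths, every training step in this sketch updates at least one sub-network.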