Anytime Recognition with Routing Convolutional Networks
Published in: | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021-06, Vol. 43 (6), p. 1875-1886 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Achieving an automatic trade-off between accuracy and efficiency for a single deep neural network is highly desired in time-sensitive computer vision applications. To achieve anytime prediction, existing methods only embed fixed exits into neural networks and make predictions with those fixed exits for all samples (referred to as the "latest-all" strategy). However, it is observed that the latest exit within a time budget does not always provide a more accurate prediction than the earlier exits for testing samples of varying difficulty, making the "latest-all" strategy a sub-optimal solution. Motivated by this, we propose to improve the anytime prediction accuracy by allowing each sample to adaptively select its own optimal exit within a specific time budget. Specifically, we propose a new Routing Convolutional Network (RCN). For any given time budget, it adaptively selects the optimal layer as the exit for a specific testing sample. To learn an optimal policy for sample routing, a Q-network is embedded into the RCN at each exit, considering both the potential information gain and the time cost. To further boost the anytime prediction accuracy, the exits and the Q-networks are optimized alternately so that they mutually reinforce each other in a cost-sensitive environment. Apart from applying to whole-image classification, RCN can also be adapted to dense prediction tasks, e.g., scene parsing, to achieve pixel-level anytime prediction. Extensive experimental results on the CIFAR-10, CIFAR-100, and ImageNet classification benchmarks, and the Cityscapes scene parsing benchmark, demonstrate the efficacy of the proposed RCN for anytime recognition. |
---|---|
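The routing mechanism the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: each exit is assumed to expose a pair of Q-values ("stop here" vs. "continue deeper") produced by its embedded Q-network, along with a cumulative inference cost; a sample takes the first exit whose stop-value wins, falling back to the latest affordable exit when the budget runs out (the "latest-all" baseline behavior). All names, costs, and Q-values below are illustrative stand-ins.

```python
def route_sample(q_values, exit_costs, predictions, budget):
    """Pick an exit for one sample under a time budget.

    q_values:    per-exit (Q_stop, Q_continue) pairs, as if produced by
                 the per-exit Q-networks (hypothetical values here)
    exit_costs:  cumulative inference cost up to each exit
    predictions: the classifier output available at each exit
    budget:      total time budget allowed for this sample
    Returns (exit_index, prediction).
    """
    last_affordable = 0
    for i, (cost, (q_stop, q_cont)) in enumerate(zip(exit_costs, q_values)):
        if cost > budget:
            break  # deeper exits are no longer affordable
        last_affordable = i
        if q_stop >= q_cont:  # routing policy: this exit is good enough
            return i, predictions[i]
    # Budget exhausted (or no exit chose to stop): latest affordable exit.
    return last_affordable, predictions[last_affordable]


if __name__ == "__main__":
    costs = [1.0, 2.0, 3.0]
    preds = ["pred_exit0", "pred_exit1", "pred_exit2"]

    # An "easy" sample: the first exit's Q-network already prefers stopping.
    q_easy = [(0.9, 0.1), (0.5, 0.6), (0.2, 0.8)]
    print(route_sample(q_easy, costs, preds, budget=10.0))

    # A "hard" sample: every exit prefers continuing, but the budget
    # only covers the first two exits, so routing stops at exit 1.
    q_hard = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7)]
    print(route_sample(q_hard, costs, preds, budget=2.5))
```

This captures only the test-time policy; the paper's contribution also includes the cost-sensitive alternating optimization of the exits and Q-networks, which is omitted here.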
ISSN: | 0162-8828 1939-3539 2160-9292 |
DOI: | 10.1109/TPAMI.2019.2959322 |