DR-Block: Convolutional Dense Reparameterization for CNN Generalization Free Improvement

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-11, Vol. 34 (11), pp. 10618-10631
Authors: Yan, Qingqing; Li, Shu; He, Zongtao; Hu, Mengxian; Liu, Chengju; Chen, Qijun
Format: Article
Language: English
Description
Abstract: As an emerging and popular technique for boosting CNNs, structural reparameterization (SR) decouples the training and inference structures, altering the training dynamics to achieve cost-free improvement of a given network. Existing SR methods often prioritize enhancing network expressiveness, but have yet to investigate ways to mitigate the significant prediction bias and non-robustness caused by over-reliance on the training data distribution and by image noise. To this end, inspired by the effectiveness of implicit regularization on this problem, this paper introduces, for the first time, an extra balanced implicit regularization mechanism into SR techniques to enhance the generalization of a given network. Specifically, we propose a novel SR module named DR-Block, which complicates each convolutional layer of a given CNN during training. It draws on the regularizing effect of deep matrix factorization and further improves singular-value dynamics by introducing batch normalization and dense connections to alleviate network degradation. At inference time, DR-Block can be equivalently reparameterized back into a single convolution for deployment. Furthermore, we empirically demonstrate the role of each design choice in DR-Block and explicitly reveal its inherent mechanism, which lies in enhancing the movement of large singular values while countering the attenuation of small ones; this helps improve the interpretability of SR techniques. Experiments illustrate that DR-Block is an impressive drop-in alternative to a regular convolution layer in any architecture and outperforms existing SR methods in improving mainstream networks on various visual tasks. The code is available at https://github.com/qyan0131/DRBlock.
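To make the general SR principle in the abstract concrete (a training-time stack of linear operations collapsing into one convolution for inference), here is a minimal NumPy sketch. It models 1x1 convolutions as matrix products, stacks two of them with a batch-normalization layer in between (a deep matrix factorization, as the abstract describes), then folds everything into a single equivalent weight matrix. This is an illustrative assumption about the mechanism, not the authors' exact DR-Block design; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked 1x1 convolutions act as a deep matrix factorization:
# y = W2 @ (W1 @ x), with channels 8 -> 16 -> 8.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))

# Batch norm at inference is a per-channel affine map:
# bn(h) = gamma * (h - mu) / sigma + beta
gamma = rng.standard_normal(16)
beta = rng.standard_normal(16)
mu = rng.standard_normal(16)
sigma = rng.random(16) + 0.5  # strictly positive std estimates

def train_forward(x):
    """Training-time structure: conv1 -> BN -> conv2."""
    h = W1 @ x
    h = gamma * (h - mu) / sigma + beta
    return W2 @ h

# Reparameterization: fold BN into the first conv, then merge the
# two convolutions into a single weight matrix and bias.
W1_fused = (gamma / sigma)[:, None] * W1   # absorb BN scale row-wise
b1_fused = beta - gamma * mu / sigma       # absorb BN shift
W_merged = W2 @ W1_fused                   # single equivalent conv
b_merged = W2 @ b1_fused

def deploy_forward(x):
    """Inference-time structure: one convolution plus bias."""
    return W_merged @ x + b_merged

x = rng.standard_normal(8)
assert np.allclose(train_forward(x), deploy_forward(x))
```

The same algebra extends to k-by-k convolutions and to dense (skip) connections, since each added branch is still linear in the input and thus mergeable; that equivalence is what lets SR methods deploy at the cost of a single convolution.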
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3411804