Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN, which can be trained in polynomial time. Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes. Furthermore, we find that Gradient Descent provides an algorithmic bias effect on the standard non-convex BN network, and we design an approach to explicitly encode this implicit regularization into the convex objective. Experiments with CIFAR image classification highlight the effectiveness of this explicit regularization for mimicking and substantially improving the performance of standard BN networks.
DOI: | 10.48550/arxiv.2103.01499 |
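
As a concrete illustration of the non-convex objective the abstract refers to, the minimal NumPy sketch below sets up a two-layer ReLU network whose hidden pre-activations are batch-normalized and evaluates a weight-decay (L2) regularized squared loss on a batch. This is a hypothetical illustration only, not the paper's convex reformulation or closed-form solution; the dimensions, loss, and regularization strength are assumed for the example.

```python
import numpy as np

# Minimal sketch (illustrative assumptions throughout): a two-layer ReLU
# network with batch normalization on the hidden pre-activations, plus
# weight decay, i.e. the kind of regularized non-convex objective whose
# convex equivalent the paper studies.

rng = np.random.default_rng(0)
n, d, m = 64, 10, 32                     # batch size, input dim, hidden width (assumed)
X = rng.standard_normal((n, d))          # synthetic inputs
y = rng.standard_normal(n)               # synthetic targets

W1 = 0.1 * rng.standard_normal((d, m))   # first-layer weights
gamma = np.ones(m)                       # BN scale
beta = np.zeros(m)                       # BN shift
w2 = 0.1 * rng.standard_normal(m)        # second-layer weights
lam = 1e-3                               # weight-decay strength (assumed)

def forward(X):
    z = X @ W1                                         # pre-activations
    mu, sigma = z.mean(axis=0), z.std(axis=0) + 1e-5   # batch statistics
    z_bn = gamma * (z - mu) / sigma + beta             # batch normalization
    return np.maximum(z_bn, 0.0) @ w2                  # ReLU, then linear head

def objective(X, y):
    resid = forward(X) - y
    # squared loss plus weight decay on the layer weights
    return 0.5 * np.mean(resid ** 2) + lam * (np.sum(W1 ** 2) + np.sum(w2 ** 2))

print(objective(X, y))
```

Running the script prints the regularized training loss for the random batch; in the paper's setting this objective is what gets replaced by an equivalent convex program solvable in polynomial time.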