Norm matters: efficient and accurate normalization schemes in deep networks
Format: Article
Language: English
Online access: Order full text
Abstract (NeurIPS 2018): Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remain largely unexplained, and several shortcomings have hindered its use for certain tasks. In this work, we present a novel view on the purpose and function of normalization methods and weight-decay: they serve as tools to decouple the weights' norm from the underlying optimized objective. This property highlights the connection between practices such as normalization, weight decay, and learning-rate adjustments. We suggest several alternatives to the widely used $L^2$ batch-norm, using normalization in $L^1$ and $L^\infty$ spaces, which can substantially improve numerical stability in low-precision implementations and provide computational and memory benefits. We demonstrate that such methods enable the first batch-norm alternative that works in half-precision implementations. Finally, we suggest a modification to weight-normalization that improves its performance on large-scale tasks.
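To make the $L^1$ alternative concrete, here is a minimal NumPy sketch of the forward pass of an $L^1$ batch-norm layer. It is illustrative only, not the authors' reference implementation: the function name, shapes, and `eps` are my own choices, while the $\sqrt{\pi/2}$ factor is the standard Gaussian-matching constant relating the mean absolute deviation to the standard deviation.

```python
import numpy as np

def l1_batch_norm(x, gamma, beta, eps=1e-5):
    """Forward-pass sketch of L1 batch-norm on a (batch, features) array.

    The usual L2 statistic (the standard deviation) is replaced by the
    mean absolute deviation, scaled by sqrt(pi/2) so that it matches the
    standard deviation in expectation when activations are Gaussian.
    """
    mu = x.mean(axis=0)                       # per-feature batch mean
    mad = np.abs(x - mu).mean(axis=0)         # mean absolute deviation
    scale = np.sqrt(np.pi / 2.0) * mad + eps  # L1 analogue of the std
    return gamma * (x - mu) / scale + beta    # learned scale and shift

# Usage: normalize a batch of 32 activation vectors with 8 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
y = l1_batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

An $L^\infty$ variant would replace the mean absolute deviation with the maximum absolute deviation over the batch. In both cases the statistic is computed without squaring activations, which is the source of the numerical-stability and half-precision benefits claimed above.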
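The decoupling of the weights' norm from the objective can likewise be illustrated with weight-normalization (Salimans & Kingma, 2016), which the abstract's final claim builds on. In the standard reparameterization, sketched below, the norm and direction of a weight vector are separate parameters; the paper's specific modification for large-scale tasks is not reproduced here.

```python
import numpy as np

def weight_norm_forward(x, v, g):
    """Standard weight-normalization: w = g * v / ||v||.

    g is a learned scalar controlling the weight's norm and v a learned
    direction vector, so the norm is decoupled from the direction: steps
    on v only rotate w, while g alone sets its length.
    """
    w = g * v / np.linalg.norm(v)  # unit direction scaled by g
    return x @ w                   # linear unit with normalized weight

# Usage: one linear unit applied to a batch of 4 inputs with 8 features.
rng = np.random.default_rng(1)
out = weight_norm_forward(rng.normal(size=(4, 8)),
                          v=rng.normal(size=8), g=1.0)
```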
DOI: 10.48550/arxiv.1803.01814