The effect of Target Normalization and Momentum on Dying ReLU
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Optimizing parameters with momentum, normalizing data values, and using rectified linear units (ReLUs) are popular choices in neural network (NN) regression. Although ReLUs are popular, they can collapse to a constant function and "die", effectively removing their contribution from the model. While some mitigations are known, the underlying reasons for ReLUs dying during optimization are currently poorly understood. In this paper, we consider the effects of target normalization and momentum on dying ReLUs. We find empirically that unit-variance targets are well motivated and that ReLUs die more easily when the target variance approaches zero. To investigate this further, we analyze a discrete-time linear autonomous system and show theoretically how it relates to a model with a single ReLU and how common properties can result in dying ReLUs. We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLUs, and show how parameters evolve into these regions when momentum is used. Finally, we show empirically that this problem persists, and is aggravated, for deeper models, including residual networks.
DOI: 10.48550/arxiv.2005.06195
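The abstract describes a single-ReLU regression model trained with momentum on targets of varying variance. The sketch below is a minimal illustration of that setup, not the authors' code: the model form (y = w2 * relu(w1 * x + b)), the data distribution, and all hyperparameters are assumptions made here for demonstration. It trains with plain gradient descent plus heavy-ball momentum on a mean-squared-error loss and reports how often the unit ends up "dead", i.e. its pre-activation is non-positive for every training input.

```python
import numpy as np

def train_single_relu(target_std, steps=2000, lr=0.05, momentum=0.9, seed=0):
    """Fit y_hat = w2 * relu(w1 * x + b) to random targets with SGD + momentum.

    Returns True if the ReLU is 'dead' at the end of training, i.e. its
    pre-activation is non-positive for every training input, so the unit
    (and its gradient) contributes nothing from then on.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=200)                  # standardized inputs
    y = target_std * rng.normal(size=200)     # targets with the chosen std

    w1, b, w2 = rng.normal(size=3) * 0.5      # small random initialization
    velocity = np.zeros(3)                    # heavy-ball momentum buffer

    for _ in range(steps):
        pre = w1 * x + b                      # pre-activation
        h = np.maximum(pre, 0.0)              # ReLU output
        err = w2 * h - y                      # residuals of the MSE loss
        active = (pre > 0).astype(float)      # ReLU derivative (0/1 mask)

        # Gradients of mean((w2 * relu(w1*x + b) - y)^2) w.r.t. (w1, b, w2).
        g_w1 = np.mean(2.0 * err * w2 * active * x)
        g_b = np.mean(2.0 * err * w2 * active)
        g_w2 = np.mean(2.0 * err * h)

        velocity = momentum * velocity + np.array([g_w1, g_b, g_w2])
        w1, b, w2 = np.array([w1, b, w2]) - lr * velocity

    return bool(np.all(w1 * x + b <= 0.0))    # dead: no input activates the unit

if __name__ == "__main__":
    # Rough, informal check of the trend stated in the abstract:
    # smaller target variance should yield more runs with a dead ReLU.
    for std in (1.0, 0.1, 0.01):
        dead = [train_single_relu(std, seed=s) for s in range(20)]
        print(f"target std {std:>4}: {sum(dead)}/20 runs ended with a dead ReLU")
```

This toy setup only probes the empirical claim about target variance; the paper's theoretical analysis of the discrete-time linear autonomous system and of saddle points is not reproduced here.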