Dropout as a Regularizer of Interaction Effects
Saved in:
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | We examine Dropout through the perspective of interactions. This view provides a symmetry to explain Dropout: given $N$ variables, there are ${N \choose k}$ possible sets of $k$ variables to form an interaction (i.e. $\mathcal{O}(N^k)$); conversely, the probability that an interaction of $k$ variables survives Dropout at rate $p$ is $(1-p)^k$ (decaying with $k$). These rates effectively cancel, and so Dropout regularizes against higher-order interactions. We prove this perspective analytically and empirically. This view of Dropout as a regularizer against interaction effects has several practical implications: (1) higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions, (2) caution should be exercised when interpreting Dropout-based explanations and uncertainty measures, and (3) networks trained with Input Dropout are biased estimators. We also compare Dropout to other regularizers and find that it is difficult to obtain the same selective pressure against high-order interactions. |
DOI: | 10.48550/arxiv.2007.00823 |
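The counting argument in the abstract can be checked numerically: with $N$ variables, the number of order-$k$ interactions grows as ${N \choose k}$, while the probability that all $k$ inputs of a given interaction survive an independent Dropout mask at rate $p$ is $(1-p)^k$, so the expected number of surviving order-$k$ interactions is ${N \choose k}(1-p)^k$. The sketch below illustrates this; the values of `n` and `p` are illustrative choices, not taken from the paper.

```python
from math import comb

def expected_surviving_interactions(n: int, k: int, p: float) -> float:
    """Expected number of order-k interactions among n variables whose
    k inputs all survive a Dropout mask with drop rate p:
    C(n, k) * (1 - p)**k."""
    return comb(n, k) * (1 - p) ** k

# Illustrative numbers: n = 20 variables, Dropout rate p = 0.5.
n, p = 20, 0.5
for k in range(1, 6):
    count = comb(n, k)          # possible order-k interactions
    survive = (1 - p) ** k      # survival probability for any one of them
    print(k, count, survive, expected_surviving_interactions(n, k, p))
```

Note how the two factors pull in opposite directions: `comb(n, k)` grows with `k` while `(1 - p)**k` shrinks, which is the cancellation the abstract refers to, and raising `p` tilts the balance further against high-order terms.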