Jitter: Random Jittering Loss Function
Saved in:

Main author: , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Regularization plays a vital role in machine learning optimization. One novel regularization method called flooding makes the training loss fluctuate around the flooding level. It intends to make the model continue a random walk until it reaches a flat loss landscape, enhancing generalization. However, the flooding level, a hyperparameter of the flooding method, is difficult to select properly and uniformly. We propose a novel method called Jitter to improve on it. Jitter is essentially a kind of random loss function. Before training, we randomly sample the Jitter Point from a specific probability distribution. The flooding level is then replaced by the Jitter Point to obtain a new objective function, and the model is trained accordingly. Since the Jitter Point acts as a random factor, we effectively add randomness to the loss function, which is consistent with the innumerable random behaviors in the learning process of a machine learning model and should make the model more robust. In addition, Jitter performs a random walk that divides the loss curve into small intervals and flips them over, ideally making the loss curve much flatter and enhancing generalization ability. Moreover, Jitter is a domain-, task-, and model-independent regularization method that can train the model effectively after the training error reduces to zero. Our experimental results show that the Jitter method improves model performance more significantly than the previous flooding method and makes the test loss curve descend twice.
DOI: 10.48550/arxiv.2106.13749
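The abstract describes the construction only in words. Below is a minimal PyTorch sketch of how such an objective could look, based on the flooded objective |L − b| + b from the earlier flooding method, with the fixed flooding level b replaced by a randomly sampled Jitter Point. The uniform sampling range used here is an illustrative assumption; the abstract says only that the Jitter Point is drawn from "a specific probability distribution" before training.

```python
# Minimal sketch of the Jitter idea: a flooding-style loss whose level is a
# Jitter Point sampled once before training. The sampling distribution and
# range below are hypothetical choices, not specified in the abstract.
import random

import torch
import torch.nn as nn


def sample_jitter_point(low: float = 0.0, high: float = 0.1) -> float:
    """Draw the Jitter Point once before training (range is an assumption)."""
    return random.uniform(low, high)


class JitterLoss(nn.Module):
    """Wraps a base loss so the training loss fluctuates around the Jitter Point.

    Below the Jitter Point the wrapped loss is flipped, so gradient descent on
    this objective performs gradient ascent on the original loss; the training
    loss therefore random-walks around the sampled level instead of going to zero.
    """

    def __init__(self, base_loss: nn.Module, jitter_point: float):
        super().__init__()
        self.base_loss = base_loss
        self.jitter_point = jitter_point

    def forward(self, outputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        loss = self.base_loss(outputs, targets)
        # Flooding-style flip: |L - b| + b, with b replaced by the Jitter Point.
        return (loss - self.jitter_point).abs() + self.jitter_point


# Usage: wrap an ordinary loss and train as usual.
criterion = JitterLoss(nn.CrossEntropyLoss(), sample_jitter_point())
```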