Improving Regression Performance with Distributional Losses
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | There is growing evidence that converting targets to soft targets in
supervised learning can provide considerable gains in performance. Much of this
work has considered classification, converting hard zero-one values to soft
labels, for example by adding label noise, incorporating label ambiguity, or using
distillation. In parallel, there is some evidence from a regression setting in
reinforcement learning that learning distributions can improve performance. In
this work, we investigate the reasons for this improvement in a regression
setting. We introduce a novel distributional regression loss, and similarly
find it significantly improves prediction accuracy. We investigate several
common hypotheses, concerning reduced overfitting and improved representations. We
instead find evidence for an alternative hypothesis: this loss is easier to
optimize, with better behaved gradients, resulting in improved generalization.
We provide theoretical support for this alternative hypothesis, by
characterizing the norm of the gradients of this loss. |
---|---|
DOI: | 10.48550/arxiv.1806.04613 |
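
The summary describes converting scalar regression targets to soft targets and training with a distributional loss, but this record does not spell out the construction. The following is a minimal sketch of one plausible reading, not necessarily the loss defined in the paper: a Gaussian centred on the target is truncated to a fixed range and projected onto histogram bins, and the model's softmax output is trained with cross-entropy against that histogram. All function names, the bin range, and the value of sigma are illustrative assumptions.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_target(y, edges, sigma):
    """Bin probabilities of a Gaussian centred on y, truncated to [edges[0], edges[-1]].

    Assumption: sigma and the bin edges are hyperparameters chosen by the user.
    """
    cdf = [gaussian_cdf(e, y, sigma) for e in edges]
    total = cdf[-1] - cdf[0]  # mass inside the truncation range
    return [(hi - lo) / total for lo, hi in zip(cdf[:-1], cdf[1:])]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def histogram_loss(target, logits):
    """Cross-entropy between the soft target and the predicted bin distribution."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def predicted_mean(logits, edges):
    """Point prediction: expected bin centre under the predicted distribution."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical setup: 10 equal-width bins on [0, 1], target y = 0.42.
edges = [i * 0.1 for i in range(11)]
target = soft_target(0.42, edges, sigma=0.1)
logits = [0.0] * 10  # an untrained model predicting a uniform distribution
loss = histogram_loss(target, logits)
mean = predicted_mean(logits, edges)
```

In this sketch the network would output one logit per bin, and a scalar prediction is recovered as the expected bin centre. Because the soft target sums to one, the loss for a uniform prediction equals the log of the number of bins regardless of where the target lies.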