Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Autoregressive generative models are commonly used, especially for those
tasks involving sequential data. They have, however, been plagued by a slew of
inherent flaws due to the intrinsic characteristics of chain-style conditional
modeling (e.g., exposure bias or lack of long-range coherence), severely
limiting their ability to model distributions properly. In this paper, we
propose a unique method termed E-ARM for training autoregressive generative
models that takes advantage of a well-designed energy-based learning objective.
By leveraging the extra degree of freedom of the softmax operation, we can
make the autoregressive model itself an energy-based model for
measuring the likelihood of input without introducing any extra parameters.
Furthermore, we show that E-ARM can be trained efficiently and is capable of
alleviating the exposure bias problem and increasing temporal coherence for
autoregressive generative models. Extensive empirical results, covering
benchmarks like language modeling, neural machine translation, and image
generation, demonstrate the effectiveness of the proposed approach. |
---|---|
DOI: | 10.48550/arxiv.2206.12840 |
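
The "extra degree of freedom of the softmax operation" mentioned in the summary can be illustrated with a small sketch. This is not the paper's E-ARM implementation, and the record does not give the exact energy definition. It only demonstrates the well-known fact that adding a constant to every logit leaves the softmax output unchanged, so the overall level of the logits is free to carry additional information, such as an unnormalized score for the input. The helper names `softmax` and `logsumexp` and the example logits are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(logits):
    # Log of the softmax normalizer; it tracks the overall level of the logits.
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

# Hypothetical next-token logits over a three-token vocabulary.
logits = [2.0, 0.5, -1.0]
shifted = [z + 3.0 for z in logits]  # add the same constant to every logit

probs = softmax(logits)
probs_shifted = softmax(shifted)

# The conditional distribution is identical under the shift ...
same_conditional = all(
    math.isclose(p, q, rel_tol=1e-9) for p, q in zip(probs, probs_shifted)
)

# ... while the unnormalized level moves by exactly the added constant.
level_change = logsumexp(shifted) - logsumexp(logits)
```

Because the conditional likelihood alone does not pin down this level, a training objective can, in principle, use it as an energy-like quantity without adding parameters, which is the general idea the summary describes.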