Approximate Distribution Matching for Sequence-to-Sequence Learning
Format: Article
Language: English
Online access: Order full text
Abstract: Sequence-to-sequence models were introduced to tackle many real-life problems such as machine translation, summarization, and image captioning. The standard optimization algorithms are mainly based on example-to-example matching, such as maximum likelihood estimation, which is known to suffer from the data sparsity problem. Here we present an alternative view that explains sequence-to-sequence learning as a distribution matching problem, where each source or target example is viewed as representing a local latent distribution in the source or target domain. We then interpret sequence-to-sequence learning as learning a transductive model that transforms the source local latent distributions to match their corresponding target distributions. In our framework, we approximate both the source and target latent distributions with recurrent neural networks (augmenters). During training, the parallel augmenters learn to better approximate the local latent distributions, while the sequence prediction model learns to minimize the KL-divergence between the transformed source distributions and the approximated target distributions. This algorithm can alleviate the data sparsity issue in sequence learning by locally augmenting unseen data pairs and increasing the model's robustness. Experiments on machine translation and image captioning consistently demonstrate the superiority of the proposed algorithm over competing algorithms.
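The following is a minimal, illustrative sketch of the KL-divergence matching term described in the abstract, not the authors' implementation: the sequence prediction model's predicted distribution p_theta is pulled toward the target augmenter's approximated local distribution q_tgt by minimizing D_KL(q_tgt || p_theta). The direction of the KL term, the function name kl_matching_loss, the logit shapes, and the use of PyTorch are all assumptions for illustration; the paper's actual augmenter architecture and joint training procedure are not reproduced here.

import torch
import torch.nn.functional as F

def kl_matching_loss(model_logits, target_logits):
    # model_logits: predictions of the sequence model p_theta, shape (batch, seq_len, vocab)
    # target_logits: output of a (hypothetical) target augmenter q_tgt, same shape
    log_p = F.log_softmax(model_logits, dim=-1)   # log p_theta(y' | x')
    q = F.softmax(target_logits, dim=-1)          # q_tgt(y' | y)
    log_q = F.log_softmax(target_logits, dim=-1)
    # D_KL(q || p) = sum_v q(v) * (log q(v) - log p(v)), computed per position
    kl = (q * (log_q - log_p)).sum(dim=-1)
    # In this sketch the augmenter output is detached by the caller; the paper
    # trains the augmenters jointly, which is not reproduced here.
    return kl.mean()

# Toy usage with random tensors standing in for augmenter and model outputs.
batch, seq_len, vocab = 2, 5, 100
target_logits = torch.randn(batch, seq_len, vocab).detach()
model_logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
loss = kl_matching_loss(model_logits, target_logits)
loss.backward()  # gradients flow into the sequence prediction model only
print(loss.item())

Because q_tgt places probability mass on a neighborhood of each observed target rather than on a single reference sequence, this loss effectively trains against locally augmented data pairs, which is how the abstract's claim about alleviating data sparsity enters the objective.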
DOI: 10.48550/arxiv.1808.08003