Non-Autoregressive Text Generation with Pre-trained Language Models
Format: Article
Language: English
Abstract: Non-autoregressive generation (NAG) has recently attracted great attention
due to its fast inference speed. However, the generation quality of existing
NAG models still lags behind their autoregressive counterparts. In this work,
we show that BERT can be employed as the backbone of a NAG model to greatly
improve performance. Additionally, we devise mechanisms to alleviate the two
common problems of vanilla NAG models: the inflexibility of prefixed output
length and the conditional independence of individual token predictions.
Lastly, to further increase the speed advantage of the proposed model, we
propose a new decoding strategy, ratio-first, for applications where the output
lengths can be approximately estimated beforehand. For a comprehensive
evaluation, we test the proposed model on three text generation tasks,
including text summarization, sentence compression and machine translation.
Experimental results show that our model significantly outperforms existing
non-autoregressive baselines and achieves competitive performance with many
strong autoregressive models. In addition, we also conduct extensive analysis
experiments to reveal the effect of each proposed component.
DOI: 10.48550/arxiv.2102.08220
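To make the core idea of the abstract concrete, the snippet below is a minimal, conceptual sketch of non-autoregressive decoding with a BERT-style masked language model: all target positions are filled with [MASK] tokens and predicted in parallel from a single forward pass. It is not the authors' implementation; it omits their length-handling, dependency-modelling, and ratio-first components, and it assumes the Hugging Face `transformers` library, the `bert-base-uncased` checkpoint, and a hand-set source-to-target length ratio chosen purely for illustration.

```python
# Conceptual sketch of non-autoregressive (parallel) decoding with a BERT-style
# masked language model. NOT the paper's model: every masked position is
# predicted independently in one forward pass, with no mechanism to handle
# flexible output lengths or inter-token dependencies.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

source = "the quick brown fox jumps over the lazy dog"
src_ids = tokenizer(source, add_special_tokens=False)["input_ids"]

# Guess the output length from a source-length ratio. This is only a stand-in
# for the kind of rough length estimate the ratio-first strategy assumes is
# available; the paper's actual procedure differs.
length_ratio = 0.6
num_target_tokens = max(1, round(length_ratio * len(src_ids)))

# Build the input: [CLS] source [SEP] [MASK] * L [SEP]
mask_id = tokenizer.mask_token_id
input_ids = torch.tensor([
    [tokenizer.cls_token_id]
    + src_ids
    + [tokenizer.sep_token_id]
    + [mask_id] * num_target_tokens
    + [tokenizer.sep_token_id]
])

with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (1, seq_len, vocab_size)

# One argmax per masked slot: all target tokens are produced simultaneously,
# which is what gives non-autoregressive inference its speed advantage.
mask_positions = (input_ids[0] == mask_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```

Because each slot is predicted independently, this naive scheme exhibits exactly the conditional-independence problem the abstract mentions; the paper's proposed mechanisms are aimed at mitigating it while keeping the parallel, single-pass speed benefit.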