Retrieval is Accurate Generation
| Main authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Abstract: | Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift. |
| DOI: | 10.48550/arxiv.2402.17532 |
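The abstract's core move, replacing next-token selection over a fixed vocabulary with next-phrase selection over supporting documents, can be illustrated with a toy sketch. Everything below is a stand-in assumption rather than the paper's implementation: the trained encoder is replaced by random embeddings, the vocabulary, phrase pool, and document names are invented, and scoring is a plain dot product.

```python
import numpy as np

# Hypothetical toy setup: a tiny token vocabulary plus a pool of candidate
# phrases extracted from supporting documents. Embeddings are random
# stand-ins for what a trained encoder would produce.
rng = np.random.default_rng(0)
DIM = 16

vocab = ["the", "a", "paris", "moon"]
phrase_pool = [
    ("the capital of France", "doc_geo_01"),
    ("a large language model", "doc_nlp_07"),
]

vocab_emb = rng.standard_normal((len(vocab), DIM))
phrase_emb = rng.standard_normal((len(phrase_pool), DIM))

def next_unit(context_emb: np.ndarray) -> str:
    """Pick the next generation unit: either a vocabulary token or a
    retrieved phrase, whichever scores highest against the context."""
    token_scores = vocab_emb @ context_emb
    phrase_scores = phrase_emb @ context_emb
    if phrase_scores.max() >= token_scores.max():
        phrase, source = phrase_pool[int(phrase_scores.argmax())]
        return f"[phrase from {source}] {phrase}"
    return vocab[int(token_scores.argmax())]

context = rng.standard_normal(DIM)  # stand-in for an encoded prefix
print(next_unit(context))
```

The point of the unified scoring is that a single argmax covers both token-level and phrase-level continuations, so the model can fall back to ordinary token generation whenever no retrieved phrase fits the context.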
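The oracle problem the abstract describes, namely that one string admits many phrase segmentations and each phrase may come from many documents, can likewise be sketched. The phrase table, the heuristic, and the bootstrap loop below are all hypothetical simplifications of the abstract's "linguistic heuristics plus iterative self-reinforcement" recipe; a real trained model's score would replace the stand-in.

```python
# Hypothetical phrase table mapping phrases to the documents
# they could be retrieved from.
phrase_table = {
    "new york": ["doc_a"],
    "new": ["doc_b"],
    "york": ["doc_b", "doc_c"],
    "city": ["doc_a", "doc_c"],
}

def segmentations(tokens):
    """Enumerate every way to split `tokens` into phrases from the table."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        if head in phrase_table:
            for rest in segmentations(tokens[i:]):
                yield [head] + rest

def heuristic_score(seg):
    """Linguistic-heuristic stand-in: prefer fewer, longer phrases."""
    return -len(seg)

def bootstrap_oracle(tokens, model_score=None, rounds=2):
    """Initialize the oracle with the heuristic, then re-pick it each
    round using the model's own score, mimicking self-reinforcement.
    With only the heuristic available here, every round re-picks the
    same oracle; a trained model's score would shift the choice."""
    candidates = list(segmentations(tokens))
    oracle = max(candidates, key=heuristic_score)
    for _ in range(rounds):
        score = model_score or heuristic_score
        oracle = max(candidates, key=score)
    return oracle

print(bootstrap_oracle("new york city".split()))  # -> ['new york', 'city']
```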