Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
DOI: 10.48550/arxiv.2210.15458
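As a rough illustration of the idea described in the abstract: each sample corresponds to a code point in [0, 1); at every decoding step the unit interval is partitioned according to the model's next-token distribution, the token whose sub-interval contains the code point is emitted, and the code point is renormalized into that sub-interval. Spacing the code points evenly (with a shared random shift) keeps samples distinct while letting each one decode independently. The sketch below is a minimal NumPy rendering under these assumptions; the `next_token_probs` callback, the function names, and the offset scheme are illustrative, not the authors' implementation.

```python
import numpy as np

def arithmetic_sample(next_token_probs, code_point, max_len, eos_id):
    """Decode one sequence from a single code point u in [0, 1).

    next_token_probs(prefix) is a hypothetical callback returning the
    model's next-token distribution as a 1-D array summing to 1.
    """
    u = float(code_point)
    prefix = []
    for _ in range(max_len):
        probs = next_token_probs(prefix)
        cdf = np.cumsum(probs)
        # Pick the token whose probability sub-interval contains u.
        token = int(np.searchsorted(cdf, u, side="right"))
        token = min(token, len(probs) - 1)  # guard against float rounding
        prefix.append(token)
        if token == eos_id:
            break
        # Renormalize u into the chosen token's sub-interval so the
        # remaining "code bits" drive the next decoding step.
        lo = cdf[token - 1] if token > 0 else 0.0
        u = (u - lo) / max(probs[token], 1e-12)
    return prefix

def arithmetic_beam(next_token_probs, beam_size, max_len, eos_id, seed=0):
    """Evenly spaced code points with one shared random offset; every
    sample decodes independently of the others, so this loop can be
    parallelized across workers or devices."""
    offset = np.random.default_rng(seed).random() / beam_size
    codes = offset + np.arange(beam_size) / beam_size
    return [arithmetic_sample(next_token_probs, u, max_len, eos_id)
            for u in codes]
```

With this stratification, two code points can only decode to the same output when that output's total probability exceeds the 1/beam_size spacing, which is one way to read the abstract's "provable beam diversity under certain conditions."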