Adaptive speculative decoding

Examples herein relate to decoding tokens using speculative decoding operations to decode tokens at an offset from a token decoded by a sequential decoding operation. At a checkpoint, a determination is made as to whether tokens to be decoded by the sequential and speculative decoding operations ali...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Satpathy, Sudhir K, Suresh, Vikram B, Mathew, Sanu K
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Examples herein relate to decoding tokens using speculative decoding operations to decode tokens at an offset from a token decoded by a sequential decoding operation. At a checkpoint, a determination is made as to whether tokens to be decoded by the sequential and speculative decoding operations align. If there is alignment, the speculatively decoded tokens after a discard window are committed and made available for access. If there is not alignment, the speculatively decoded tokens are discarded. A miss in alignment and a fullness level of a buffer that stores speculatively decoded tokens are assessed to determine a next offset level for a start of speculative decoding. A size of a discard window can be set using a relationship based on the offset level to improve buffer utilization and to attempt to improve changes of alignments.