Adaptive speculative decoding
Examples herein relate to decoding tokens using speculative decoding operations to decode tokens at an offset from a token decoded by a sequential decoding operation. At a checkpoint, a determination is made as to whether tokens to be decoded by the sequential and speculative decoding operations ali...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Examples herein relate to decoding tokens using speculative decoding operations to decode tokens at an offset from a token decoded by a sequential decoding operation. At a checkpoint, a determination is made as to whether tokens to be decoded by the sequential and speculative decoding operations align. If there is alignment, the speculatively decoded tokens after a discard window are committed and made available for access. If there is not alignment, the speculatively decoded tokens are discarded. A miss in alignment and a fullness level of a buffer that stores speculatively decoded tokens are assessed to determine a next offset level for a start of speculative decoding. A size of a discard window can be set using a relationship based on the offset level to improve buffer utilization and to attempt to improve changes of alignments. |
---|