State-Free Inference of State-Space Models: The Transfer Function Approach
Format: Article
Language: English
Abstract: We approach designing a state-space model for deep learning applications
through its dual representation, the transfer function, and uncover a highly
efficient sequence-parallel inference algorithm that is state-free: unlike
other proposed algorithms, state-free inference does not incur any significant
memory or computational cost with an increase in state size. We achieve this
using properties of the proposed frequency-domain transfer function
parametrization, which enables direct computation of its corresponding
convolutional kernel's spectrum via a single Fast Fourier Transform. Our
experimental results across multiple sequence lengths and state sizes
illustrate, on average, a 35% training speed improvement over S4 layers
(parametrized in the time domain) on the Long Range Arena benchmark, while
delivering state-of-the-art downstream performance over other attention-free
approaches. Moreover, we report improved perplexity in language modeling over a
long convolutional Hyena baseline simply by introducing our transfer function
parametrization. Our code is available at https://github.com/ruke1ire/RTF.
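To make the abstract's central claim concrete, below is a minimal sketch (not the authors' implementation; see the linked repository for that) of how a rational transfer function parametrization lets the convolution kernel's spectrum be obtained from a single FFT of the coefficient vectors. The names `rtf_kernel_spectrum`, `rtf_convolve`, `b`, and `a` are illustrative, and the simple FFT ratio used here yields the spectrum of the *aliased* impulse response, which approximates the truncated kernel when the filter is stable and its response decays quickly.

```python
import numpy as np

def rtf_kernel_spectrum(b, a, L):
    """Hypothetical sketch: spectrum of the length-L convolution kernel of a
    rational transfer function H(z) = b(z^-1) / a(z^-1).

    b, a: numerator / denominator coefficients (state size n gives length n+1).
    Zero-padding both to length L and taking one FFT each evaluates the two
    polynomials at the L roots of unity; their pointwise ratio is the kernel
    spectrum. Cost is O(L log L), independent of state size n ("state-free").
    Caveat: this ratio is the DFT of the period-L aliased impulse response,
    a good approximation of the truncated kernel for a stable, decaying filter.
    """
    B = np.fft.rfft(b, n=L)   # numerator b(z^-1) at the roots of unity
    A = np.fft.rfft(a, n=L)   # denominator a(z^-1) at the roots of unity
    return B / A

def rtf_convolve(u, b, a):
    """Apply the layer as an FFT convolution of input u with the implied kernel."""
    L = len(u)
    H = rtf_kernel_spectrum(b, a, 2 * L)   # pad to 2L to avoid circular wrap
    U = np.fft.rfft(u, n=2 * L)
    return np.fft.irfft(U * H, n=2 * L)[:L]

# Usage: a random rational filter of state size 4 applied to a length-64 input.
rng = np.random.default_rng(0)
b = rng.standard_normal(5)
a = np.concatenate(([1.0], 0.1 * rng.standard_normal(4)))  # monic, near-stable
u = rng.standard_normal(64)
y = rtf_convolve(u, b, a)
print(y.shape)  # (64,)
```

Note that at no point is a state vector of size n materialized, which is why memory and compute stay flat as the state size grows.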
DOI: 10.48550/arxiv.2405.06147