Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
Format: Article
Language: English
Abstract: Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms of different layers on similar grounds and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative, more limited formulation that was recently proposed for Mamba. For the other architectures, for which our method is the first to provide such a view, it is effective and competitive in the relevant metrics compared to the results obtained by state-of-the-art Transformer explainability methods. Our code is publicly available.
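To make the implicit-attention view concrete, here is a minimal sketch (an illustration, not the paper's exact formulation): a scalar gated linear recurrence h_t = a_t * h_{t-1} + b_t * x_t unrolls into h = M @ x, where M is a data-dependent lower-triangular, attention-like matrix with M[t, s] = (a_{s+1} * ... * a_t) * b_s. The names (a, b, M) and the single-channel scalar setting are assumptions made for brevity.

```python
import numpy as np

# Sketch: unroll a gated linear recurrence into an implicit causal attention matrix.
T = 6
rng = np.random.default_rng(0)
x = rng.normal(size=T)               # input sequence (single channel, for simplicity)
a = rng.uniform(0.5, 1.0, size=T)    # data-dependent gates / decay factors (illustrative)
b = rng.normal(size=T)               # input projections (illustrative)

# Recurrent computation: h_t = a_t * h_{t-1} + b_t * x_t
h = np.zeros(T)
state = 0.0
for t in range(T):
    state = a[t] * state + b[t] * x[t]
    h[t] = state

# Equivalent implicit attention matrix: M[t, s] = a_{s+1} * ... * a_t * b_s for s <= t
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = np.prod(a[s + 1:t + 1]) * b[s]

assert np.allclose(h, M @ x)  # the recurrence and the matrix form agree
```

Rows of such a matrix can then be read as causal, data-dependent scores over past positions, which is what makes Transformer-style explainability and attribution methods applicable to these attention-free layers.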
DOI: 10.48550/arxiv.2405.16504