RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
| | |
| --- | --- |
| Main author: | |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
| Summary: | We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens. |
| DOI: | 10.48550/arxiv.2404.07839 |
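
The abstract's efficiency claim rests on the recurrence carrying a fixed-size state. As a rough illustration only, not Griffin's actual recurrence (which uses learned gating and is interleaved with local attention), the sketch below shows a toy diagonal linear recurrence whose per-channel state keeps the same shape regardless of sequence length; the coefficients `a` and `b` are made-up placeholders.

```python
import numpy as np

def diagonal_linear_recurrence(x, a, b):
    """Toy diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t.

    The state h keeps a fixed shape (d,) for any sequence length, which is
    why inference memory does not grow with the number of generated tokens.
    x: (seq_len, d) inputs; a, b: (d,) per-channel coefficients (assumed).
    """
    h = np.zeros(x.shape[-1])
    outputs = []
    for x_t in x:              # one constant-cost update per token
        h = a * h + b * x_t    # state size never changes
        outputs.append(h)
    return np.stack(outputs)

# 16 tokens, 8 channels: the recurrent state is always shape (8,).
x = np.random.randn(16, 8)
a = np.full(8, 0.9)  # decay near 1 retains information over long ranges
b = np.full(8, 0.1)
print(diagonal_linear_recurrence(x, a, b).shape)  # (16, 8)
```

Contrast this with a transformer's key-value cache, whose memory grows linearly with the number of tokens; the constant-size recurrent state is what enables the efficient long-sequence inference the abstract describes.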