Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context size). Hybrid architectures combine State Sp...
Saved in:
Published in: | arXiv.org 2024-12 |
---|---|
Main authors: | Nunez, Elvis; Zancato, Luca; Bowman, Benjamin; Golatkar, Aditya; Xia, Wei; Soatto, Stefano |
Format: | Article |
Language: | eng |
Subjects: | Attention; Context; Natural language; State space models |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Nunez, Elvis; Zancato, Luca; Bowman, Benjamin; Golatkar, Aditya; Xia, Wei; Soatto, Stefano |
description | The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context size). Hybrid architectures combine State Space layers with Attention, but still cannot recall the distant past and can access only the most recent tokens eidetically. Unlike current methods of combining SSM and Attention layers, we allow the state to be allocated based on relevancy rather than recency. In this way, for every new set of query tokens, our models can "eidetically" access tokens from beyond the Attention span of current Hybrid SSMs without requiring extra hardware resources. We describe a method to expand the memory span of the hybrid state by "reserving" a fraction of the Attention context for tokens retrieved from arbitrarily distant in the past, thus expanding the eidetic memory span of the overall state. We call this reserved fraction of tokens the "expansion span," and the mechanism to retrieve and aggregate it "Span-Expanded Attention" (SE-Attn). To adapt Hybrid models to using SE-Attn, we propose a novel fine-tuning method that extends LoRA to Hybrid models (HyLoRA) and allows efficient adaptation on long spans of tokens. We show that SE-Attn enables us to efficiently adapt pre-trained Hybrid models on sequences of tokens up to 8 times longer than the ones used for pre-training. We show that HyLoRA with SE-Attn is cheaper and more performant than alternatives like LongLoRA when applied to Hybrid models on natural language benchmarks with long-range dependencies, such as PG-19, RULER, and other common natural language downstream tasks. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3147267435 |
source | Free E-Journals |
subjects | Attention; Context; Natural language; State space models |
title | Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T08%3A02%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Expansion%20Span:%20Combining%20Fading%20Memory%20and%20Retrieval%20in%20Hybrid%20State%20Space%20Models&rft.jtitle=arXiv.org&rft.au=Nunez,%20Elvis&rft.date=2024-12-17&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3147267435%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3147267435&rft_id=info:pmid/&rfr_iscdi=true |
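
The abstract above describes Span-Expanded Attention (SE-Attn) only at a high level: a fraction of the Attention context is reserved for tokens retrieved from the distant past by relevance rather than recency. As a rough, non-authoritative illustration of that general idea, the NumPy sketch below retrieves the most query-relevant chunks of distant keys/values and attends over them together with the local window. It is not the paper's implementation; every function and variable name, the chunked mean-similarity retrieval heuristic, and all sizes are assumptions made for the example.

```python
# Illustrative sketch only: retrieval-augmented ("span-expanded") local attention.
# Names, chunking, and the relevance heuristic are assumptions, not the paper's API.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def retrieve_expansion_span(q, past_k, past_v, chunk=16, top_k=2):
    """Pick the top_k past chunks whose mean key is most similar to the mean query."""
    d = past_k.shape[1]
    n = (past_k.shape[0] // chunk) * chunk
    if n == 0 or top_k == 0:
        return np.empty((0, d)), np.empty((0, past_v.shape[1]))
    k_chunks = past_k[:n].reshape(-1, chunk, d)
    v_chunks = past_v[:n].reshape(-1, chunk, past_v.shape[1])
    scores = k_chunks.mean(axis=1) @ q.mean(axis=0)   # relevance, not recency
    idx = np.argsort(scores)[::-1][:top_k]            # most relevant chunks first
    return k_chunks[idx].reshape(-1, d), v_chunks[idx].reshape(-1, past_v.shape[1])

def span_expanded_attention(q, local_k, local_v, past_k, past_v, top_k=2):
    """Attend over the local window plus a retrieved 'expansion span' of distant tokens."""
    r_k, r_v = retrieve_expansion_span(q, past_k, past_v, top_k=top_k)
    k = np.concatenate([r_k, local_k], axis=0)
    v = np.concatenate([r_v, local_v], axis=0)
    att = softmax(q @ k.T / np.sqrt(q.shape[1]))
    return att @ v

# Toy usage: 8 query tokens, a 32-token local window, 256 distant tokens, d=64.
rng = np.random.default_rng(0)
d = 64
q = rng.standard_normal((8, d))
local_k, local_v = rng.standard_normal((32, d)), rng.standard_normal((32, d))
past_k, past_v = rng.standard_normal((256, d)), rng.standard_normal((256, d))
print(span_expanded_attention(q, local_k, local_v, past_k, past_v).shape)  # (8, 64)
```

In the paper's actual setting the retrieved expansion span shares the context with the fading SSM state inside a Hybrid model adapted via HyLoRA; this toy omits the SSM path entirely.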