Recurrent Memory Transformer

Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations....

Detailed description

Saved in:
Bibliographic details
Main authors: Bulatov, Aydar, Kuratov, Yuri, Burtsev, Mikhail S
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Bulatov, Aydar
Kuratov, Yuri
Burtsev, Mikhail S
description Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information, as well as to pass information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The model is then trained to control both memory operations and the processing of sequence representations. Experimental results show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning of long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
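The mechanism described above can be illustrated with a minimal PyTorch sketch (an assumption-laden reading aid, not the authors' implementation): learned memory tokens are concatenated to each segment, an unmodified Transformer backbone processes the joint sequence, and the output memory tokens are carried over to the next segment. All names (`RMTWrapper`, `num_mem_tokens`, `d_model`) are illustrative.

```python
# Illustrative sketch of the RMT idea (not the paper's released code):
# a segment-level recurrent wrapper that surrounds each segment with
# memory tokens and passes the updated memory to the next segment.
import torch
import torch.nn as nn


class RMTWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, num_mem_tokens: int = 10):
        super().__init__()
        self.backbone = backbone          # any encoder mapping [B, T, d] -> [B, T, d]
        self.num_mem_tokens = num_mem_tokens
        # Learned initial memory, shared across sequences (assumed initialization).
        self.mem_init = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)

    def forward(self, segments: list) -> list:
        """segments: list of [B, T, d] embedding tensors forming one long sequence."""
        batch = segments[0].size(0)
        memory = self.mem_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Place read memory before and write memory after the segment tokens.
            x = torch.cat([memory, seg, memory], dim=1)
            h = self.backbone(x)
            m = self.num_mem_tokens
            outputs.append(h[:, m:-m, :])   # ordinary token representations
            memory = h[:, -m:, :]           # write memory becomes next segment's read memory
        return outputs
```

Because the memory is just extra tokens, gradients can flow through it across segments (truncated backpropagation through time), which is how, per the abstract, the model learns to control memory read/write operations without any architectural change to the backbone.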
doi_str_mv 10.48550/arxiv.2207.06881
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2207.06881
language eng
recordid cdi_arxiv_primary_2207_06881
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
title Recurrent Memory Transformer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T13%3A46%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recurrent%20Memory%20Transformer&rft.au=Bulatov,%20Aydar&rft.date=2022-07-14&rft_id=info:doi/10.48550/arxiv.2207.06881&rft_dat=%3Carxiv_GOX%3E2207_06881%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true