Recurrent Memory Transformer

Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations....

Detailed description

Saved in:
Bibliographic details
Main authors: Bulatov, Aydar, Kuratov, Yuri, Burtsev, Mikhail S
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Bulatov, Aydar
Kuratov, Yuri
Burtsev, Mikhail S
description Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information, as well as to pass information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The model is then trained to control both memory operations and the processing of sequence representations. Experimental results show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning of long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
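The mechanism described above can be illustrated with a minimal PyTorch sketch (an assumption-laden reading aid, not the authors' implementation): learned memory tokens are concatenated to each segment, an unmodified Transformer backbone processes the joint sequence, and the output memory tokens are carried over to the next segment. All names (`RMTWrapper`, `num_mem_tokens`, `d_model`) are illustrative.

```python
# Illustrative sketch of the RMT idea (not the paper's released code):
# a segment-level recurrent wrapper that surrounds each segment with
# memory tokens and passes the updated memory to the next segment.
import torch
import torch.nn as nn


class RMTWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, num_mem_tokens: int = 10):
        super().__init__()
        self.backbone = backbone          # any encoder mapping [B, T, d] -> [B, T, d]
        self.num_mem_tokens = num_mem_tokens
        # Learned initial memory, shared across sequences (assumed initialization).
        self.mem_init = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)

    def forward(self, segments: list) -> list:
        """segments: list of [B, T, d] embedding tensors forming one long sequence."""
        batch = segments[0].size(0)
        memory = self.mem_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Place read memory before and write memory after the segment tokens.
            x = torch.cat([memory, seg, memory], dim=1)
            h = self.backbone(x)
            m = self.num_mem_tokens
            outputs.append(h[:, m:-m, :])   # ordinary token representations
            memory = h[:, -m:, :]           # write memory becomes next segment's read memory
        return outputs
```

Because the memory is just extra tokens, gradients can flow through it across segments (truncated backpropagation through time), which is how, per the abstract, the model learns to control memory read/write operations without any architectural change to the backbone.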
doi_str_mv 10.48550/arxiv.2207.06881
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2207.06881
language eng
recordid cdi_arxiv_primary_2207_06881
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
title Recurrent Memory Transformer
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T13%3A46%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recurrent%20Memory%20Transformer&rft.au=Bulatov,%20Aydar&rft.date=2022-07-14&rft_id=info:doi/10.48550/arxiv.2207.06881&rft_dat=%3Carxiv_GOX%3E2207_06881%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true