AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

In this paper we investigate transformer architectures designed for partially observable online reinforcement learning. The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies, and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their applicability in online reinforcement learning: (1) in order to remember all past information, the self-attention mechanism requires the whole history to be provided as context, and (2) inference in transformers is expensive. In this paper, we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in online reinforcement learning tasks. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially observable environments (e.g., T-Maze, Mystery Path, Craftax, and Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach either performs similarly to or better than GTrXL, improving on GTrXL's performance by more than 37% in harder tasks.
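
The abstract's claim of a context-independent inference cost can be illustrated with the general recipe behind recurrent (linear) attention, which AGaLiTe builds on: a fixed-size state replaces attention over the full history. The sketch below is only an illustration of that family of methods, not the paper's AGaLiTe architecture (which adds gating and approximations); the feature map `phi`, the class name, and all dimensions are assumptions made for the example.

```python
import numpy as np

def phi(x):
    # Positive feature map (ELU(x) + 1), a common choice in the linear-attention
    # literature; AGaLiTe's actual feature map and gating are not reproduced here.
    return np.where(x > 0, x + 1.0, np.exp(x))

class RecurrentLinearAttention:
    """Linear attention written as a recurrence over a fixed-size state.

    Each step updates S (d_k x d_v) and z (d_k), so per-step compute and memory
    are constant in the number of past timesteps, unlike softmax self-attention,
    which must attend over the whole stored history.
    """

    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
        self.z = np.zeros(d_k)         # running sum of phi(k_t), used to normalize

    def step(self, q, k, v):
        fk = phi(k)
        self.S += np.outer(fk, v)
        self.z += fk
        fq = phi(q)
        # Output is a normalized, feature-weighted read of the accumulated state.
        return (fq @ self.S) / (fq @ self.z + 1e-6)

# Example: one layer processing a stream of observations one step at a time.
layer = RecurrentLinearAttention(d_k=8, d_v=16)
rng = np.random.default_rng(0)
for _ in range(5):
    q, k, v = rng.normal(size=8), rng.normal(size=8), rng.normal(size=16)
    out = layer.step(q, k, v)  # shape (16,), cost independent of history length
```

Because the state (S, z) has a fixed shape, the per-step cost does not grow with the length of the interaction, which is the property the abstract contrasts with standard self-attention.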


Bibliographic Details
Main Authors: Pramanik, Subhojeet; Elelimy, Esraa; Machado, Marlos C; White, Adam
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Online Access: https://arxiv.org/abs/2310.15719
DOI: 10.48550/arxiv.2310.15719
Published: 2023-10-24
Source: arXiv.org