AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning
In this paper we investigate transformer architectures designed for partially observable online reinforcement learning. The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequen...
Main authors: | Pramanik, Subhojeet ; Elelimy, Esraa ; Machado, Marlos C ; White, Adam |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Pramanik, Subhojeet ; Elelimy, Esraa ; Machado, Marlos C ; White, Adam |
description | In this paper we investigate transformer architectures designed for partially
observable online reinforcement learning. The self-attention mechanism in the
transformer architecture is capable of capturing long-range dependencies and it
is the main reason behind its effectiveness in processing sequential data.
Nevertheless, despite their success, transformers have two significant
drawbacks that still limit their applicability in online reinforcement
learning: (1) in order to remember all past information, the self-attention
mechanism requires access to the whole history to be provided as context. (2)
The inference cost in transformers is expensive. In this paper, we introduce
recurrent alternatives to the transformer self-attention mechanism that offer
context-independent inference cost, leverage long-range dependencies
effectively, and perform well in online reinforcement learning tasks. We
quantify the impact of the different components of our architecture in a
diagnostic environment and assess performance gains in 2D and 3D pixel-based
partially-observable environments (e.g. T-Maze, Mystery Path, Craftax, and
Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference
in our approach is at least 40% cheaper while reducing memory use more than
50%. Our approach performs similarly to or better than GTrXL, improving
more than 37% upon GTrXL performance in harder tasks. |
doi_str_mv | 10.48550/arxiv.2310.15719 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2310_15719</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2310_15719</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2310_157193</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgYKGJqaG1pyMgQ4uif6ZIakWik4FhQU5Vdk5iaWpCq4A4kUBZ_MvNTEIoWQosS84rT8otzUomIFIK3gn5cDlFEISs3MA3KTU3NT80oUfIBK8zLz0nkYWNMSc4pTeaE0N4O8m2uIs4cu2O74giKgDUWV8SA3xIPdYExYBQDJ0jxp</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning</title><source>arXiv.org</source><creator>Pramanik, Subhojeet ; Elelimy, Esraa ; Machado, Marlos C ; White, Adam</creator><creatorcontrib>Pramanik, Subhojeet ; Elelimy, Esraa ; Machado, Marlos C ; White, Adam</creatorcontrib><description>In this paper we investigate transformer architectures designed for partially
observable online reinforcement learning. The self-attention mechanism in the
transformer architecture is capable of capturing long-range dependencies and it
is the main reason behind its effectiveness in processing sequential data.
Nevertheless, despite their success, transformers have two significant
drawbacks that still limit their applicability in online reinforcement
learning: (1) in order to remember all past information, the self-attention
mechanism requires access to the whole history to be provided as context. (2)
The inference cost in transformers is expensive. In this paper, we introduce
recurrent alternatives to the transformer self-attention mechanism that offer
context-independent inference cost, leverage long-range dependencies
effectively, and perform well in online reinforcement learning tasks. We
quantify the impact of the different components of our architecture in a
diagnostic environment and assess performance gains in 2D and 3D pixel-based
partially-observable environments (e.g. T-Maze, Mystery Path, Craftax, and
Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference
in our approach is at least 40% cheaper while reducing memory use more than
50%. Our approach performs similarly to or better than GTrXL, improving
more than 37% upon GTrXL performance in harder tasks.</description><identifier>DOI: 10.48550/arxiv.2310.15719</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning</subject><creationdate>2023-10</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2310.15719$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2310.15719$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Pramanik, Subhojeet</creatorcontrib><creatorcontrib>Elelimy, Esraa</creatorcontrib><creatorcontrib>Machado, Marlos C</creatorcontrib><creatorcontrib>White, Adam</creatorcontrib><title>AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning</title><description>In this paper we investigate transformer architectures designed for partially
observable online reinforcement learning. The self-attention mechanism in the
transformer architecture is capable of capturing long-range dependencies and it
is the main reason behind its effectiveness in processing sequential data.
Nevertheless, despite their success, transformers have two significant
drawbacks that still limit their applicability in online reinforcement
learning: (1) in order to remember all past information, the self-attention
mechanism requires access to the whole history to be provided as context. (2)
The inference cost in transformers is expensive. In this paper, we introduce
recurrent alternatives to the transformer self-attention mechanism that offer
context-independent inference cost, leverage long-range dependencies
effectively, and perform well in online reinforcement learning tasks. We
quantify the impact of the different components of our architecture in a
diagnostic environment and assess performance gains in 2D and 3D pixel-based
partially-observable environments (e.g. T-Maze, Mystery Path, Craftax, and
Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference
in our approach is at least 40% cheaper while reducing memory use more than
50%. Our approach performs similarly to or better than GTrXL, improving
more than 37% upon GTrXL performance in harder tasks.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMgYKGJqaG1pyMgQ4uif6ZIakWik4FhQU5Vdk5iaWpCq4A4kUBZ_MvNTEIoWQosS84rT8otzUomIFIK3gn5cDlFEISs3MA3KTU3NT80oUfIBK8zLz0nkYWNMSc4pTeaE0N4O8m2uIs4cu2O74giKgDUWV8SA3xIPdYExYBQDJ0jxp</recordid><startdate>20231024</startdate><enddate>20231024</enddate><creator>Pramanik, Subhojeet</creator><creator>Elelimy, Esraa</creator><creator>Machado, Marlos C</creator><creator>White, Adam</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231024</creationdate><title>AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning</title><author>Pramanik, Subhojeet ; Elelimy, Esraa ; Machado, Marlos C ; White, Adam</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2310_157193</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Pramanik, Subhojeet</creatorcontrib><creatorcontrib>Elelimy, Esraa</creatorcontrib><creatorcontrib>Machado, Marlos C</creatorcontrib><creatorcontrib>White, Adam</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Pramanik, Subhojeet</au><au>Elelimy, Esraa</au><au>Machado, Marlos C</au><au>White, Adam</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AGaLiTe: Approximate Gated Linear Transformers for Online 
Reinforcement Learning</atitle><date>2023-10-24</date><risdate>2023</risdate><abstract>In this paper we investigate transformer architectures designed for partially
observable online reinforcement learning. The self-attention mechanism in the
transformer architecture is capable of capturing long-range dependencies and it
is the main reason behind its effectiveness in processing sequential data.
Nevertheless, despite their success, transformers have two significant
drawbacks that still limit their applicability in online reinforcement
learning: (1) in order to remember all past information, the self-attention
mechanism requires access to the whole history to be provided as context. (2)
The inference cost in transformers is expensive. In this paper, we introduce
recurrent alternatives to the transformer self-attention mechanism that offer
context-independent inference cost, leverage long-range dependencies
effectively, and perform well in online reinforcement learning tasks. We
quantify the impact of the different components of our architecture in a
diagnostic environment and assess performance gains in 2D and 3D pixel-based
partially-observable environments (e.g. T-Maze, Mystery Path, Craftax, and
Memory Maze). Compared with a state-of-the-art architecture, GTrXL, inference
in our approach is at least 40% cheaper while reducing memory use more than
50%. Our approach performs similarly to or better than GTrXL, improving
more than 37% upon GTrXL performance in harder tasks.</abstract><doi>10.48550/arxiv.2310.15719</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2310.15719 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2310_15719 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning |
title | AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T19%3A26%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AGaLiTe:%20Approximate%20Gated%20Linear%20Transformers%20for%20Online%20Reinforcement%20Learning&rft.au=Pramanik,%20Subhojeet&rft.date=2023-10-24&rft_id=info:doi/10.48550/arxiv.2310.15719&rft_dat=%3Carxiv_GOX%3E2310_15719%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
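
Note on the abstract: the phrase "context-independent inference cost" can be illustrated with a generic linear-attention recurrence. The sketch below is not the AGaLiTe architecture described in the paper, and this record does not specify the paper's gating or approximation details. It assumes a plain kernelized attention with an `elu(x) + 1` feature map. The names `LinearAttentionCell`, `phi`, and `full_history_attention` are hypothetical and chosen only for illustration. The point it shows is that a fixed-size state can reproduce the output that would otherwise require the whole history as context.

```python
import math


def phi(x):
    # Assumed feature map elu(x) + 1, which is strictly positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class LinearAttentionCell:
    """Recurrent form of kernelized attention with a fixed-size state."""

    def __init__(self, d_k, d_v):
        # S accumulates phi(k) v^T (d_k x d_v); z accumulates phi(k) (d_k).
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        fq, fk = phi(q), phi(k)
        # The state update costs O(d_k * d_v), independent of history length.
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(v))
        ]


def full_history_attention(qs, ks, vs):
    # Reference: recompute the last output from the entire stored history.
    fq = phi(qs[-1])
    weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in ks]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, vs)) / total
        for j in range(len(vs[0]))
    ]


if __name__ == "__main__":
    qs = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
    ks = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
    vs = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 0.5, 0.0]]
    cell = LinearAttentionCell(d_k=2, d_v=3)
    for q, k, v in zip(qs, ks, vs):
        out = cell.step(q, k, v)
    print(out)
    print(full_history_attention(qs, ks, vs))
```

Run as a script, the two printed vectors should agree up to floating-point error: the recurrent cell keeps only `S` and `z`, while the reference recomputes over all stored keys and values at every step.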