PID Accelerated Temporal Difference Algorithms

Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
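The record contains no code, but the idea described in the abstract, treating the TD update as a feedback loop and combining the usual (proportional) TD error with integral and derivative correction terms, can be illustrated with a small sketch. The snippet below is not the paper's algorithm: the environment interface (`env_step`, `policy`), the gain values, and the exact placement of the P/I/D terms are assumptions made only for illustration. With `kappa_p = 1` and the other gains at zero it reduces to ordinary TD(0) policy evaluation.

```python
import numpy as np

def pid_td_sketch(env_step, policy, num_states, gamma=0.99, alpha=0.05,
                  kappa_p=1.0, kappa_i=0.0, kappa_d=0.0, beta=0.95,
                  num_steps=100_000, seed=0):
    """Illustrative PID-style TD(0) policy evaluation (NOT the paper's exact update).

    The correction applied to V(s) is a PID-style combination of:
      P: the current TD error,
      I: a leaky running sum of past TD errors for that state,
      D: the most recent change made to V(s).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(num_states)       # value estimates
    z = np.zeros(num_states)       # integral term: accumulated TD errors per state
    V_prev = np.zeros(num_states)  # previous values, used for the derivative term

    s = int(rng.integers(num_states))  # arbitrary start state (assumption)
    for _ in range(num_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)  # hypothetical sampler: returns (next state, reward, done)

        # Standard TD error (proportional signal)
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]

        # Integral: leaky accumulator of TD errors; Derivative: last change to V(s)
        z[s] = beta * z[s] + delta
        d = V[s] - V_prev[s]
        V_prev[s] = V[s]

        # PID-style update of the value estimate
        V[s] += alpha * (kappa_p * delta + kappa_i * z[s] + kappa_d * d)

        s = int(rng.integers(num_states)) if done else s_next
    return V
```

The abstract also mentions a method for adapting the PID gains in the presence of noise; this sketch keeps the gains fixed for simplicity.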


Saved in:
Bibliographic Details
Main Authors: Bedaywi, Mark; Rakhsha, Amin; Farahmand, Amir-massoud
Format: Article
Language: English
Published: 2024-07-11
Source: arXiv.org
DOI: 10.48550/arXiv.2407.08803
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Systems and Control; Mathematics - Optimization and Control; Statistics - Machine Learning
Online Access: Order full text (https://arxiv.org/abs/2407.08803)