Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent
Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and works for stochastic nonconvex optimization with a possibly nonconvex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.
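The abstract describes each agent estimating a gradient of its own local cost and updating only its own block of the decision variable, with agents activated asynchronously. Below is a minimal Python sketch of such a zeroth-order block coordinate step, not the authors' implementation; the `local_costs` callables, the `block_shapes` argument, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def zo_block_coordinate_descent(local_costs, block_shapes, iters=500,
                                step=1e-3, smoothing=1e-2, seed=0):
    """Each agent i owns one block K[i]; local_costs[i](K) returns agent i's
    scalar local cost for the full block list K (no global cost is evaluated)."""
    rng = np.random.default_rng(seed)
    K = [np.zeros(shape) for shape in block_shapes]    # structured decision variable

    for _ in range(iters):
        # Asynchronous flavour: a single randomly activated agent updates per round.
        i = rng.integers(len(K))
        U = rng.standard_normal(K[i].shape)            # random perturbation direction
        U /= np.linalg.norm(U) + 1e-12

        K_plus = [k.copy() for k in K]
        K_minus = [k.copy() for k in K]
        K_plus[i] += smoothing * U
        K_minus[i] -= smoothing * U

        # Two-point zeroth-order estimate of agent i's block gradient,
        # built only from agent i's own local cost evaluations.
        d_i = K[i].size
        g_i = d_i * (local_costs[i](K_plus) - local_costs[i](K_minus)) \
              / (2.0 * smoothing) * U

        K[i] = K[i] - step * g_i                       # block coordinate descent step
    return K
```

In the LQR application from the abstract, each block K[i] would be agent i's portion of a structured state-feedback gain, and local_costs[i] would be obtained from trajectory rollouts rather than in closed form; a rollout-based sketch of such a local cost appears after the record fields at the end of this page.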
Saved in:
Published in: | IEEE Transactions on Automatic Control, 2024-11, Vol.69 (11), p.7524-7539 |
---|---|
Main authors: | Jing, Gangshan; Bai, He; George, Jemin; Chakrabortty, Aranya; Sharma, Piyush K. |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Computer aided instruction; Convergence; Cost function; Costs; Design optimization; Distance learning; Distributed learning; Estimation; Hybrid learning; Linear programming; Linear quadratic regulator; Machine learning; multiagent systems (MASs); Optimization; Reinforcement learning; reinforcement learning (RL); Variance; zeroth-order optimization (ZO) |
Online access: | Order full text |
container_end_page | 7539 |
---|---|
container_issue | 11 |
container_start_page | 7524 |
container_title | IEEE transactions on automatic control |
container_volume | 69 |
creator | Jing, Gangshan; Bai, He; George, Jemin; Chakrabortty, Aranya; Sharma, Piyush K. |
description | Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and works for stochastic nonconvex optimization with a possibly nonconvex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm. |
doi_str_mv | 10.1109/TAC.2024.3386061 |
format | Article |
coden | IETAA9 |
orcidid | 0000-0003-0066-204X; 0000-0002-4247-0698; 0000-0002-3474-8215; 0000-0001-8417-5411; 0000-0001-8829-9390 |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9286 |
ispartof | IEEE transactions on automatic control, 2024-11, Vol.69 (11), p.7524-7539 |
issn | 0018-9286 1558-2523 |
language | eng |
recordid | cdi_ieee_primary_10494371 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Computer aided instruction; Convergence; Cost function; Costs; Design optimization; Distance learning; Distributed learning; Estimation; Hybrid learning; Linear programming; Linear quadratic regulator; Machine learning; multiagent systems (MASs); Optimization; Reinforcement learning; reinforcement learning (RL); Variance; zeroth-order optimization (ZO) |
title | Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T20%3A25%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Asynchronous%20Distributed%20Reinforcement%20Learning%20for%20LQR%20Control%20via%20Zeroth-Order%20Block%20Coordinate%20Descent&rft.jtitle=IEEE%20transactions%20on%20automatic%20control&rft.au=Jing,%20Gangshan&rft.date=2024-11-01&rft.volume=69&rft.issue=11&rft.spage=7524&rft.epage=7539&rft.pages=7524-7539&rft.issn=0018-9286&rft.eissn=1558-2523&rft.coden=IETAA9&rft_id=info:doi/10.1109/TAC.2024.3386061&rft_dat=%3Cproquest_RIE%3E3120655653%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3120655653&rft_id=info:pmid/&rft_ieee_id=10494371&rfr_iscdi=true |
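For the distributed LQR application mentioned in the abstract, each agent's local cost would come from simulating its own subsystem under a candidate feedback gain rather than from a closed-form expression. The following is a minimal, hypothetical sketch of such a rollout-based local cost; the matrices A_i, B_i, Q_i, R_i, the horizon, and the omission of neighbor coupling are simplifying assumptions, not the paper's model.

```python
import numpy as np

def local_lqr_cost(K_i, A_i, B_i, Q_i, R_i, x0, horizon=100):
    """Finite-horizon quadratic cost of agent i's subsystem under u = -K_i x.
    Coupling terms from neighboring subsystems are omitted for brevity."""
    x, cost = x0.astype(float), 0.0
    for _ in range(horizon):
        u = -K_i @ x
        cost += float(x @ Q_i @ x + u @ R_i @ u)   # accumulate the stage cost
        x = A_i @ x + B_i @ u                      # advance the local dynamics
    return cost
```

The scalar returned here is the kind of local cost evaluation an agent would plug into the two-point zeroth-order estimator sketched after the abstract above.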