Online inverse reinforcement learning for nonlinear systems with adversarial attacks

Bibliographic Details
Published in: International Journal of Robust and Nonlinear Control, 2021-09, Vol. 31 (14), pp. 6646-6667
Main authors: Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Chai, Tianyou
Format: Article
Language: English
Online access: Full text
Description: In the inverse reinforcement learning (RL) problem, there are two agents. A learner agent seeks to mimic an expert agent's state and control-input trajectories by observing the expert's behavior. These observations are used to reconstruct the expert's unknown performance objective. This article develops novel inverse RL algorithms for the case in which both agents suffer from adversarial attacks and have continuous-time nonlinear dynamics. We first propose an offline inverse RL algorithm with which the learner reconstructs the expert's unknown performance objective. This offline algorithm is based on integral RL (IRL) and needs only partial knowledge of the system dynamics. It has two learning stages: an optimal control learning stage followed by a second stage based on inverse optimal control. Building on the offline algorithm, an online inverse RL algorithm is then developed to solve the inverse RL problem in real time without knowledge of the system drift dynamics. This online adaptive learning method consists of simultaneous adaptation of four neural networks (NNs): a critic NN, an actor NN, an adversary NN, and a state-penalty NN. Convergence of the algorithms, as well as stability of the learner system and of the synchronously tuned NNs, is guaranteed. Simulation examples verify the effectiveness of the online method.
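To make the setting above concrete, the following is a minimal sketch, in LaTeX notation, of the kind of performance objective and integral-RL relation involved. The input-affine dynamics, the state penalty Q(x), the control weight R, and the attenuation level \gamma follow the standard zero-sum (H-infinity-style) formulation commonly paired with integral RL; these symbols are illustrative assumptions, not notation taken from the article itself.

    Dynamics with control input u and adversarial input d (assumed input-affine):
        \dot{x} = f(x) + g(x)\,u + k(x)\,d

    Expert's performance objective, with Q(x) the unknown state penalty the learner must reconstruct:
        V\big(x(t)\big) = \int_t^{\infty} \Big( Q\big(x(\tau)\big) + u^{\top} R\, u - \gamma^{2}\, d^{\top} d \Big)\, d\tau

    Integral-RL (IRL) Bellman equation over a reinforcement interval T, which removes the explicit dependence on the drift dynamics f(x):
        V\big(x(t)\big) = \int_t^{t+T} \Big( Q(x) + u^{\top} R\, u - \gamma^{2}\, d^{\top} d \Big)\, d\tau + V\big(x(t+T)\big)

Under this reading, the critic NN approximates V, the actor NN approximates u, the adversary NN approximates d, and the state-penalty NN approximates Q(x); inverse RL tunes Q(x) so that the learner's resulting optimal behavior reproduces the observed expert trajectories.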
DOI: 10.1002/rnc.5626
Publisher: Wiley Subscription Services, Inc. (Bognor Regis)
ISSN: 1049-8923
EISSN: 1099-1239
Source: Wiley Online Library All Journals
Subjects:
adaptive control
Algorithms
Dynamical systems
integral reinforcement learning
inverse optimal control
inverse reinforcement learning
Machine learning
Neural networks
Nonlinear dynamics
Nonlinear systems
Optimal control
System dynamics
Trajectory control
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T18%3A57%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20inverse%20reinforcement%20learning%20for%20nonlinear%20systems%20with%20adversarial%20attacks&rft.jtitle=International%20journal%20of%20robust%20and%20nonlinear%20control&rft.au=Lian,%20Bosen&rft.date=2021-09-25&rft.volume=31&rft.issue=14&rft.spage=6646&rft.epage=6667&rft.pages=6646-6667&rft.issn=1049-8923&rft.eissn=1099-1239&rft_id=info:doi/10.1002/rnc.5626&rft_dat=%3Cproquest_cross%3E2559629301%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2559629301&rft_id=info:pmid/&rfr_iscdi=true