Online inverse reinforcement learning for nonlinear systems with adversarial attacks
In the inverse reinforcement learning (RL) problem, there are two agents. A learner agent seeks to mimic another expert agent's state and control input behavior trajectories by observing the expert's behavior trajectories. These observations are used to reconstruct the unknown expert's performance objective.
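For orientation, one generic continuous-time formulation of this setting is sketched below; the control-affine dynamics, the attack channel, and the quadratic cost structure are illustrative assumptions in the spirit of the abstract, not the article's exact model.

```latex
% Learner/expert dynamics with control input u and adversarial input d
% (assumed control-affine form)
\dot{x} = f(x) + g(x)\,u + k(x)\,d
% Unknown expert performance objective to be reconstructed from observed
% (x, u) trajectories: state penalty q(x), control weight R, and an
% attenuation level \gamma on the adversarial input
J(u, d) = \int_{0}^{\infty} \left( q(x) + u^{\top} R\, u - \gamma^{2} \lVert d \rVert^{2} \right) \mathrm{d}t
```

In this reading, "reconstructing the performance objective" amounts to recovering the state penalty q(·), which the article approximates with a dedicated state penalty NN, from the observed expert behavior.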
Saved in:
Published in: | International journal of robust and nonlinear control 2021-09, Vol.31 (14), p.6646-6667 |
---|---|
Main authors: | Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Chai, Tianyou |
Format: | Article |
Language: | eng |
Subjects: | adaptive control; Algorithms; Dynamical systems; integral reinforcement learning; inverse optimal control; inverse reinforcement learning; Machine learning; Neural networks; Nonlinear dynamics; Nonlinear systems; Optimal control; System dynamics; Trajectory control |
Online access: | Full text |
container_end_page | 6667 |
---|---|
container_issue | 14 |
container_start_page | 6646 |
container_title | International journal of robust and nonlinear control |
container_volume | 31 |
creator | Lian, Bosen; Xue, Wenqian; Lewis, Frank L.; Chai, Tianyou |
description | In the inverse reinforcement learning (RL) problem, there are two agents. A learner agent seeks to mimic another expert agent's state and control input behavior trajectories by observing the expert's behavior trajectories. These observations are used to reconstruct the unknown expert's performance objective. This article develops novel inverse RL algorithms to solve the inverse RL problem in which both agents suffer from adversarial attacks and have continuous-time nonlinear dynamics. We first propose an offline inverse RL algorithm for the learner to reconstruct the unknown expert's performance objective. This offline inverse RL algorithm is based on the technique of integral RL (IRL) and only needs partial knowledge of the system dynamics. The algorithm has two learning stages: first an optimal control learning stage, and then a second learning stage based on inverse optimal control. Then, based on the offline algorithm, an online inverse RL algorithm is further developed to solve the inverse RL problem in real time without knowing the system drift dynamics. This online adaptive learning method consists of simultaneous adaptation of four neural networks (NNs): a critic NN, an actor NN, an adversary NN, and a state penalty NN. Convergence of the algorithms as well as the stability of the learner system and the synchronous tuning NNs are guaranteed. Simulation examples verify the effectiveness of the online method. (An illustrative sketch of this four-network adaptation follows the record fields below.) |
doi_str_mv | 10.1002/rnc.5626 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1049-8923 |
ispartof | International journal of robust and nonlinear control, 2021-09, Vol.31 (14), p.6646-6667 |
issn | 1049-8923; 1099-1239 |
language | eng |
recordid | cdi_proquest_journals_2559629301 |
source | Wiley Online Library All Journals |
subjects | adaptive control; Algorithms; Dynamical systems; integral reinforcement learning; inverse optimal control; inverse reinforcement learning; Machine learning; Neural networks; Nonlinear dynamics; Nonlinear systems; Optimal control; System dynamics; Trajectory control |
title | Online inverse reinforcement learning for nonlinear systems with adversarial attacks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T18%3A57%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20inverse%20reinforcement%20learning%20for%20nonlinear%20systems%20with%20adversarial%20attacks&rft.jtitle=International%20journal%20of%20robust%20and%20nonlinear%20control&rft.au=Lian,%20Bosen&rft.date=2021-09-25&rft.volume=31&rft.issue=14&rft.spage=6646&rft.epage=6667&rft.pages=6646-6667&rft.issn=1049-8923&rft.eissn=1099-1239&rft_id=info:doi/10.1002/rnc.5626&rft_dat=%3Cproquest_cross%3E2559629301%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2559629301&rft_id=info:pmid/&rfr_iscdi=true |
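The description field above notes that the online method simultaneously adapts four NN approximators: a critic, an actor, an adversary, and a state penalty network. As a rough, self-contained illustration of what "simultaneous adaptation of four approximators" can look like, here is a toy Python loop. The quadratic basis, the stable example dynamics, the gains, and the update rules are all assumptions made for demonstration; they are not the article's tuning laws or its convergence-guaranteed design, and the sketch does not use expert demonstrations as the article's inverse-RL stage does.

```python
# Toy illustration: four weight vectors (critic, actor, adversary, state penalty)
# tuned in one loop. All structural choices below are assumed for demonstration.
import numpy as np

def basis(x):
    """Quadratic basis for a 2-state system (assumed approximator structure)."""
    x1, x2 = x
    return np.array([x1 * x1, x1 * x2, x2 * x2])

def grad_basis(x):
    """Jacobian of the basis with respect to the state."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

# One weight vector per network, all tuned simultaneously.
w_critic = np.zeros(3)     # value-function approximator
w_actor = np.zeros(3)      # control-policy approximator
w_adversary = np.zeros(3)  # attack-policy approximator
w_penalty = np.ones(3)     # reconstructed state penalty q(x) ~ w_penalty @ basis(x)

alpha, dt = 0.05, 0.01     # learning rate and integration step (assumed)
R, gamma2 = 1.0, 4.0       # control weight and attenuation level (assumed)

x = np.array([1.0, -0.5])
for _ in range(2000):
    phi, dphi = basis(x), grad_basis(x)

    # Policies implied by the actor/adversary weights (assumed linear-in-basis
    # forms, both acting through the second state channel).
    u = -0.5 / R * float(w_actor @ dphi[:, 1])
    d = 0.5 / gamma2 * float(w_adversary @ dphi[:, 1])

    # Assumed stable example dynamics so the sketch stays bounded.
    x_dot = np.array([x[1], -x[0] - 0.5 * x[1] + u + d])
    x_next = x + dt * x_dot

    # Bellman-like residual built from the *reconstructed* penalty.
    running_cost = float(w_penalty @ phi) + R * u * u - gamma2 * d * d
    e = float(w_critic @ (basis(x_next) - phi)) + dt * running_cost

    # Simultaneous updates of all four weight vectors.
    w_critic -= alpha * e * (basis(x_next) - phi)    # gradient step on the residual
    w_actor -= alpha * (w_actor - w_critic)          # actor tracks the critic
    w_adversary -= alpha * (w_adversary - w_critic)  # adversary tracks the critic
    w_penalty -= alpha * e * dt * phi                # penalty adjusted via the same residual

    x = x_next

print("critic weights:", w_critic)
print("reconstructed penalty weights:", w_penalty)
```

In the article itself, the state penalty network would be driven by the mismatch between the learner's behavior and the observed expert trajectories; the point of this sketch is only the structure of four approximators sharing one adaptation loop.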