UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning
Saved in:
Published in: | Electronics (Basel), 2020, Vol. 9 (7), p. 1121 |
---|---|
Main authors: | Kong, Weiren; Zhou, Deyun; Yang, Zhen; Zhao, Yiyang; Zhang, Kai |
Format: | Article |
Language: | English |
Online access: | Full text |
container_issue | 7 |
container_start_page | 1121 |
container_title | Electronics (Basel) |
container_volume | 9 |
creator | Kong, Weiren; Zhou, Deyun; Yang, Zhen; Zhao, Yiyang; Zhang, Kai |
description | With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) has been limited by simplifying assumptions, limited robustness, and neglect of sensor errors. In this paper, to account for aircraft sensor errors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations; these perturbations do not alter the environment directly but can mislead the agent into making suboptimal decisions. We then propose a novel high-performance, high-robustness autonomous aerial combat maneuver strategy generation algorithm based on the state-adversarial deep deterministic policy gradient algorithm (SA-DDPG), which adds to the actor network a robustness regularizer related to an upper bound on performance loss. In addition, a reward shaping method based on maximum-entropy (MaxEnt) inverse reinforcement learning (IRL) is proposed to improve the efficiency of the aerial combat strategy generation algorithm. Finally, the efficiency of the strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy are verified by simulation experiments. Our main contributions are threefold. First, to account for the observation errors of the UAV, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce a regularizer into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG. |
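The description names two concrete mechanisms: a robustness regularizer added to the DDPG actor objective, and a dense shaping reward recovered with MaxEnt IRL. The PyTorch sketch below illustrates both under stated assumptions: the `actor`, `critic`, and potential `phi` modules and the hyperparameters `epsilon` and `lambda_reg` are hypothetical stand-ins rather than the authors' code, and the sampled-noise penalty is only a cheap surrogate for the paper's upper bound on performance loss.

```python
import torch

def sa_ddpg_actor_loss(actor, critic, states,
                       epsilon=0.05, lambda_reg=1.0, n_samples=4):
    """DDPG actor objective plus a state-adversarial robustness term.

    The extra term penalizes how far the policy's action drifts when the
    observed state is perturbed within an epsilon-ball, approximating the
    regularizer SA-DDPG derives from an upper bound on performance loss.
    Assumes critic(states, actions) returns Q-values.
    """
    actions = actor(states)
    ddpg_term = -critic(states, actions).mean()  # maximize Q(s, pi(s))
    drift = 0.0
    for _ in range(n_samples):
        # Sampled observation noise stands in for the worst-case perturbation.
        noise = torch.empty_like(states).uniform_(-epsilon, epsilon)
        drift = drift + ((actor(states + noise) - actions) ** 2).mean()
    return ddpg_term + lambda_reg * drift / n_samples

def shaped_reward(r, state, next_state, phi, gamma=0.99):
    """Potential-based shaping for the sparse combat reward.

    phi would be a potential recovered via MaxEnt IRL from expert maneuver
    demonstrations; the bonus gamma*phi(s') - phi(s) densifies the reward
    signal along expert-like trajectories.
    """
    return r + gamma * phi(next_state) - phi(state)
```

Potential-based shaping of this form is known to leave the optimal policy unchanged (Ng et al., 1999), which is why it can safely accelerate SA-DDPG's convergence on the sparse air-combat reward.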
doi | 10.3390/electronics9071121 |
format | Article |
identifier | ISSN: 2079-9292 |
ispartof | Electronics (Basel), 2020, Vol.9 (7), p.1121 |
issn | 2079-9292 (print and electronic) |
language | eng |
source | MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals |
subjects | Air combat; Airborne observation; Aircraft; Algorithms; Artificial intelligence; Combat aircraft; Computer simulation; Decision making; Game theory; Ground stations; Machine learning; Markov analysis; Markov processes; Maximum entropy; Neural networks; Robustness; Sensors; Simulation; Strategy; Teaching methods; Unmanned aerial vehicles; Upper bounds |
title | UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning |