UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning

With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) has limitations due to simplifying assumptions, limited robustness, and neglect of sensor errors. In this paper, to account for the errors of the aircraft's sensors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations; these perturbations do not alter the environment directly, but can mislead the agent into making suboptimal decisions. We then propose a novel autonomous aerial combat maneuver strategy generation algorithm with high performance and high robustness, based on the state-adversarial deep deterministic policy gradient (SA-DDPG) algorithm, which adds to the actor network a robustness regularizer related to an upper bound on the performance loss. In addition, a reward-shaping method based on maximum-entropy (MaxEnt) inverse reinforcement learning (IRL) is proposed to improve the efficiency of the aerial combat strategy generation algorithm. Finally, the efficiency of the strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy are verified by simulation experiments. Our main contributions are three-fold. First, to account for the UAV's observation errors, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG.
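The key idea behind the SA-DDPG regularizer is to penalize how much the actor's output can change within a small perturbation ball around each observation, which is what ties the regularizer to an upper bound on the performance loss under bounded observation error. The PyTorch sketch below illustrates that idea only; it is not the authors' implementation, and the names (`smoothness_regularizer`, `EPSILON`, `LAMBDA`) and the one-step FGSM-style inner maximization are assumptions for illustration.

```python
# Minimal sketch of an SA-DDPG-style actor update: standard DDPG objective
# plus a state-adversarial smoothness regularizer. EPSILON and LAMBDA are
# assumed values, not taken from the paper.
import torch
import torch.nn as nn

EPSILON = 0.05   # radius of the L-inf perturbation ball B(s) (assumed)
LAMBDA  = 0.1    # weight of the robustness regularizer (assumed)

def smoothness_regularizer(actor: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Approximate max_{s' in B(s)} ||pi(s') - pi(s)||^2 with a one-step
    FGSM-style attack on the observation."""
    with torch.no_grad():
        clean_action = actor(obs)
    delta = torch.zeros_like(obs, requires_grad=True)
    attack_loss = ((actor(obs + delta) - clean_action) ** 2).sum()
    # Note: this backward also accumulates into the actor's parameter
    # grads; zero them (optimizer.zero_grad()) before the real update.
    attack_loss.backward()
    # Worst-case perturbation lies on the boundary of the L-inf ball.
    worst_obs = (obs + EPSILON * delta.grad.sign()).detach()
    return ((actor(worst_obs) - clean_action) ** 2).mean()

def actor_loss(actor: nn.Module, critic: nn.Module, obs: torch.Tensor):
    # DDPG term (maximize Q) plus the robustness term that bounds the
    # policy's sensitivity to adversarial observation perturbations.
    q_term = -critic(obs, actor(obs)).mean()
    return q_term + LAMBDA * smoothness_regularizer(actor, obs)
```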

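The MaxEnt IRL reward-shaping step can be sketched in a similarly hedged way: learn a dense reward from expert demonstrations by matching feature expectations, then add it to the sparse combat reward so SA-DDPG converges faster. The NumPy sketch below assumes a linear reward over hand-crafted features; `phi_expert`, `phi_policy`, and the feature map are placeholders, not the paper's actual design.

```python
# Minimal sketch of maximum-entropy IRL (Ziebart et al., 2008) used to fit
# a linear shaping reward r(s) = w . phi(s) from expert trajectories.
import numpy as np

def maxent_irl(phi_expert, phi_policy, n_features, lr=0.01, iters=500):
    """Fit reward weights so the learner's feature expectations match
    the expert's.

    phi_expert: (N_e, d) array of features of expert-visited states
    phi_policy: callable(w) -> (N_p, d) features of states visited by the
                current soft-optimal policy under reward w (e.g., rollouts)
    """
    w = np.zeros(n_features)
    f_expert = phi_expert.mean(axis=0)          # expert feature expectation
    for _ in range(iters):
        f_learner = phi_policy(w).mean(axis=0)  # learner feature expectation
        w += lr * (f_expert - f_learner)        # MaxEnt gradient ascent step
    return w

def shaped_reward(w, phi_s, sparse_r):
    # Dense shaping term added to the sparse air-combat reward.
    return sparse_r + w @ phi_s
```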
Bibliographic Details

Published in: Electronics (Basel), 2020, Vol. 9 (7), p. 1121
Authors: Kong, Weiren; Zhou, Deyun; Yang, Zhen; Zhao, Yiyang; Zhang, Kai
Format: Article
Language: English
ISSN / EISSN: 2079-9292
DOI: 10.3390/electronics9071121
Publisher: Basel: MDPI AG
Online access: Full text
Subjects:
Air combat
Airborne observation
Aircraft
Algorithms
Artificial intelligence
Combat aircraft
Computer simulation
Decision making
Game theory
Ground stations
Machine learning
Markov analysis
Markov processes
Maximum entropy
Neural networks
Robustness
Sensors
Simulation
Strategy
Teaching methods
Unmanned aerial vehicles
Upper bounds