UAV Autonomous Aerial Combat Maneuver Strategy Generation with Observation Error Based on State-Adversarial Deep Deterministic Policy Gradient and Inverse Reinforcement Learning

With the development of unmanned aerial vehicle (UAV) and artificial intelligence (AI) technology, intelligent UAVs will be widely used in future autonomous aerial combat. Previous research on autonomous aerial combat within visual range (WVR) has limitations due to simplifying assumptions, limited robustness, and neglect of sensor errors. In this paper, to account for the errors of the aircraft's sensors, we model WVR aerial combat as a state-adversarial Markov decision process (SA-MDP), which introduces small adversarial perturbations on state observations; these perturbations do not alter the environment directly, but can mislead the agent into making suboptimal decisions. We then propose a novel autonomous aerial combat maneuver strategy generation algorithm with high performance and high robustness, based on the state-adversarial deep deterministic policy gradient (SA-DDPG) algorithm, which adds to the actor network a robustness regularizer related to an upper bound on the performance loss. In addition, a reward-shaping method based on maximum-entropy (MaxEnt) inverse reinforcement learning (IRL) is proposed to improve the efficiency of the aerial combat strategy generation algorithm. Finally, the efficiency of the strategy generation algorithm and the performance and robustness of the resulting aerial combat strategy are verified by simulation experiments. Our main contributions are three-fold. First, to account for the UAV's observation errors, we model air combat as an SA-MDP. Second, to make the air combat maneuver strategy network more robust in the presence of observation errors, we introduce regularizers into the policy gradient. Third, to address the sparsity of the air combat reward function, we use MaxEnt IRL to design a shaping reward that accelerates the convergence of SA-DDPG.
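The key idea behind the SA-DDPG regularizer is to penalize how much the actor's output can change within a small perturbation ball around each observation, which is what ties the regularizer to an upper bound on the performance loss under bounded observation error. The PyTorch sketch below illustrates that idea only; it is not the authors' implementation, and the names (`smoothness_regularizer`, `EPSILON`, `LAMBDA`) and the one-step FGSM-style inner maximization are assumptions for illustration.

```python
# Minimal sketch of an SA-DDPG-style actor update: standard DDPG objective
# plus a state-adversarial smoothness regularizer. EPSILON and LAMBDA are
# assumed values, not taken from the paper.
import torch
import torch.nn as nn

EPSILON = 0.05   # radius of the L-inf perturbation ball B(s) (assumed)
LAMBDA  = 0.1    # weight of the robustness regularizer (assumed)

def smoothness_regularizer(actor: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """Approximate max_{s' in B(s)} ||pi(s') - pi(s)||^2 with a one-step
    FGSM-style attack on the observation."""
    with torch.no_grad():
        clean_action = actor(obs)
    delta = torch.zeros_like(obs, requires_grad=True)
    attack_loss = ((actor(obs + delta) - clean_action) ** 2).sum()
    # Note: this backward also accumulates into the actor's parameter
    # grads; zero them (optimizer.zero_grad()) before the real update.
    attack_loss.backward()
    # Worst-case perturbation lies on the boundary of the L-inf ball.
    worst_obs = (obs + EPSILON * delta.grad.sign()).detach()
    return ((actor(worst_obs) - clean_action) ** 2).mean()

def actor_loss(actor: nn.Module, critic: nn.Module, obs: torch.Tensor):
    # DDPG term (maximize Q) plus the robustness term that bounds the
    # policy's sensitivity to adversarial observation perturbations.
    q_term = -critic(obs, actor(obs)).mean()
    return q_term + LAMBDA * smoothness_regularizer(actor, obs)
```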

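The MaxEnt IRL reward-shaping step can be sketched in a similarly hedged way: learn a dense reward from expert demonstrations by matching feature expectations, then add it to the sparse combat reward so SA-DDPG converges faster. The NumPy sketch below assumes a linear reward over hand-crafted features; `phi_expert`, `phi_policy`, and the feature map are placeholders, not the paper's actual design.

```python
# Minimal sketch of maximum-entropy IRL (Ziebart et al., 2008) used to fit
# a linear shaping reward r(s) = w . phi(s) from expert trajectories.
import numpy as np

def maxent_irl(phi_expert, phi_policy, n_features, lr=0.01, iters=500):
    """Fit reward weights so the learner's feature expectations match
    the expert's.

    phi_expert: (N_e, d) array of features of expert-visited states
    phi_policy: callable(w) -> (N_p, d) features of states visited by the
                current soft-optimal policy under reward w (e.g., rollouts)
    """
    w = np.zeros(n_features)
    f_expert = phi_expert.mean(axis=0)          # expert feature expectation
    for _ in range(iters):
        f_learner = phi_policy(w).mean(axis=0)  # learner feature expectation
        w += lr * (f_expert - f_learner)        # MaxEnt gradient ascent step
    return w

def shaped_reward(w, phi_s, sparse_r):
    # Dense shaping term added to the sparse air-combat reward.
    return sparse_r + w @ phi_s
```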
Bibliographic Details

Published in: Electronics (Basel), 2020, Vol. 9 (7), p. 1121
Authors: Kong, Weiren; Zhou, Deyun; Yang, Zhen; Zhao, Yiyang; Zhang, Kai
Format: Article
Language: English
ISSN / EISSN: 2079-9292
DOI: 10.3390/electronics9071121
Publisher: Basel: MDPI AG
Online access: Full text
Subjects:
Air combat
Airborne observation
Aircraft
Algorithms
Artificial intelligence
Combat aircraft
Computer simulation
Decision making
Game theory
Ground stations
Machine learning
Markov analysis
Markov processes
Maximum entropy
Neural networks
Robustness
Sensors
Simulation
Strategy
Teaching methods
Unmanned aerial vehicles
Upper bounds