Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Guidance commands of flight vehicles can be regarded as a series of data sets with fixed time intervals, so guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for applying the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique, while the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. The evasion distance is defined as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, defined as the EFV's velocity at the instant the evasion distance occurs, subject to the constraint imposed by the given evasion distance. This yields an irregular, extremely large-scale dynamic max-min problem, in which the time instant at which the optimal solution (i.e., the maximum residual velocity satisfying the evasion-distance constraint) is attained is uncertain, and the optimum depends on all the intermediate guidance commands generated beforehand. To solve this challenging problem, a two-step strategy is conceived. In the first step, the proximal policy optimization (PPO) algorithm is used to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, an evolution strategy (ES) based algorithm, which takes the PPO result as its initial value, is invoked to further improve the solution by searching in the local space. Extensive simulation results demonstrate that the guidance design method relying on the proposed ES-enhanced PPO algorithm is highly effective.
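The abstract pins down three computable ingredients: the pursuer's proportional-navigation law, the evasion-distance/residual-velocity objective, and an ES local search seeded with the PPO result. The Python sketch below shows how these pieces could fit together on a deliberately simplified planar model; the dynamics, initial states, navigation gain, penalty weight, and names such as pn_command, rollout and es_refine are illustrative assumptions, not the authors' simulation environment or their specific ES variant.

```python
import numpy as np

def pn_command(r_rel, v_rel, N=4.0):
    """True proportional navigation in 2-D (assumed gain N): commanded
    lateral acceleration a = N * Vc * (LOS rate)."""
    r2 = float(np.dot(r_rel, r_rel))
    los_rate = (r_rel[0] * v_rel[1] - r_rel[1] * v_rel[0]) / r2   # d(lambda)/dt
    closing_speed = -float(np.dot(r_rel, v_rel)) / np.sqrt(r2)    # Vc
    return N * closing_speed * los_rate

def rollout(evader_cmds, dt=0.1):
    """Simulate the pursuit and return (evasion_distance, residual_velocity):
    the minimum EFV-PFV separation and the EFV speed at that instant."""
    pe, ve = np.array([0.0, 0.0]), np.array([900.0, 0.0])              # EFV state (assumed)
    pp, vp = np.array([20000.0, 3000.0]), np.array([-1100.0, -100.0])  # PFV state (assumed)
    evasion_d, residual_v = np.inf, 0.0
    for a_e in evader_cmds:
        # EFV: lateral command applied normal to velocity, plus a simple speed bleed
        t = ve / np.linalg.norm(ve)
        ve = ve + (a_e * np.array([-t[1], t[0]]) - 0.01 * ve) * dt
        pe = pe + ve * dt
        # PFV: proportional-navigation lateral command, saturated (toy limit)
        a_p = np.clip(pn_command(pe - pp, ve - vp), -300.0, 300.0)
        t = vp / np.linalg.norm(vp)
        vp = vp + a_p * np.array([-t[1], t[0]]) * dt
        pp = pp + vp * dt
        d = float(np.linalg.norm(pe - pp))
        if d < evasion_d:                       # closest approach so far
            evasion_d, residual_v = d, float(np.linalg.norm(ve))
    return evasion_d, residual_v

def fitness(cmds, d_required=200.0):
    """Residual velocity, penalised when the evasion-distance constraint
    is violated (one common way to scalarise the constrained objective)."""
    d, v = rollout(cmds)
    return v - 1e3 * max(0.0, d_required - d)

def es_refine(seed_cmds, sigma=5.0, iters=2000, rng_seed=0):
    """(1+1)-ES style local search around a PPO-produced command sequence."""
    rng = np.random.default_rng(rng_seed)
    best, best_f = seed_cmds.copy(), fitness(seed_cmds)
    for _ in range(iters):
        cand = best + sigma * rng.standard_normal(best.shape)
        f = fitness(cand)
        if f > best_f:
            best, best_f = cand, f              # keep only improvements
        else:
            sigma *= 0.999                      # slowly contract the search radius
    return best, best_f

# Usage: refine a stand-in for the PPO policy's coarse command sequence.
ppo_cmds = np.zeros(300)   # placeholder; a trained PPO policy would supply this
refined_cmds, refined_f = es_refine(ppo_cmds)
```

The penalty term in fitness is one common way of folding the evasion-distance constraint into a scalar score, and the (1+1)-ES loop mirrors the paper's division of labour: PPO supplies a coarse global solution, and small Gaussian perturbations around it search the local space for a higher residual velocity.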


Bibliographic Details
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Main authors: Hu, Xiao; Wang, Tianshu; Gong, Min; Yang, Shaoshi
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 12
creator Hu, Xiao
Wang, Tianshu
Gong, Min
Yang, Shaoshi
description Guidance commands of flight vehicles can be regarded as a series of data sets having fixed time intervals, thus guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for using the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. Evasion distance is described as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, which is described as the EFV's velocity when the evasion distance occurs, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large-scale is formulated. In this problem, the time instant when the optimal solution (i.e., the maximum residual velocity satisfying the evasion distance constraint) can be attained is uncertain and the optimum solution is dependent on all the intermediate guidance commands generated before. For solving this challenging problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, despite the fact that the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Extensive simulation results demonstrate that the guidance design method relying on the proposed ES-enhanced PPO algorithm is highly effective.
doi_str_mv 10.1109/ACCESS.2024.3383322
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024-01, Vol.12, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2024_3383322
source DOAJ Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library
subjects Algorithms
Commands
Deep learning
Deep reinforcement learning
Differential games
Earth
Evolution
evolution strategy (ES)
Evolutionary computation
Flight
Flight vehicles
guidance design
Machine learning
max-min problem
Minimax techniques
Navigation
Neural networks
Optimization
Proportional navigation
proximal policy optimization (PPO)
Real-time systems
Vectors
Velocity
title Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A03%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Guidance%20Design%20for%20Escape%20Flight%20Vehicle%20Using%20Evolution%20Strategy%20Enhanced%20Deep%20Reinforcement%20Learning&rft.jtitle=IEEE%20access&rft.au=Hu,%20Xiao&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3383322&rft_dat=%3Cproquest_cross%3E3033619251%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3033619251&rft_id=info:pmid/&rft_ieee_id=10485410&rft_doaj_id=oai_doaj_org_article_747423c0bf2a427f95cef6a10c059f71&rfr_iscdi=true