Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

Guidance commands of flight vehicles can be regarded as a series of data sets with fixed time intervals, so guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for applying the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique, while the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. The evasion distance is defined as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, defined as the EFV's velocity at the instant the evasion distance occurs, subject to the constraint imposed by the given evasion distance. This yields an irregular, extremely large-scale dynamic max-min problem, in which the time instant at which the optimal solution (i.e., the maximum residual velocity satisfying the evasion-distance constraint) is attained is uncertain, and the optimum depends on all the intermediate guidance commands generated beforehand. To solve this challenging problem, a two-step strategy is conceived. In the first step, the proximal policy optimization (PPO) algorithm is used to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, an evolution strategy (ES) based algorithm, which takes the PPO result as its initial value, is invoked to further improve the solution by searching in the local space. Extensive simulation results demonstrate that the guidance design method relying on the proposed ES-enhanced PPO algorithm is highly effective.
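The abstract pins down three computable ingredients: the pursuer's proportional-navigation law, the evasion-distance/residual-velocity objective, and an ES local search seeded with the PPO result. The Python sketch below shows how these pieces could fit together on a deliberately simplified planar model; the dynamics, initial states, navigation gain, penalty weight, and names such as pn_command, rollout and es_refine are illustrative assumptions, not the authors' simulation environment or their specific ES variant.

```python
import numpy as np

def pn_command(r_rel, v_rel, N=4.0):
    """True proportional navigation in 2-D (assumed gain N): commanded
    lateral acceleration a = N * Vc * (LOS rate)."""
    r2 = float(np.dot(r_rel, r_rel))
    los_rate = (r_rel[0] * v_rel[1] - r_rel[1] * v_rel[0]) / r2   # d(lambda)/dt
    closing_speed = -float(np.dot(r_rel, v_rel)) / np.sqrt(r2)    # Vc
    return N * closing_speed * los_rate

def rollout(evader_cmds, dt=0.1):
    """Simulate the pursuit and return (evasion_distance, residual_velocity):
    the minimum EFV-PFV separation and the EFV speed at that instant."""
    pe, ve = np.array([0.0, 0.0]), np.array([900.0, 0.0])              # EFV state (assumed)
    pp, vp = np.array([20000.0, 3000.0]), np.array([-1100.0, -100.0])  # PFV state (assumed)
    evasion_d, residual_v = np.inf, 0.0
    for a_e in evader_cmds:
        # EFV: lateral command applied normal to velocity, plus a simple speed bleed
        t = ve / np.linalg.norm(ve)
        ve = ve + (a_e * np.array([-t[1], t[0]]) - 0.01 * ve) * dt
        pe = pe + ve * dt
        # PFV: proportional-navigation lateral command, saturated (toy limit)
        a_p = np.clip(pn_command(pe - pp, ve - vp), -300.0, 300.0)
        t = vp / np.linalg.norm(vp)
        vp = vp + a_p * np.array([-t[1], t[0]]) * dt
        pp = pp + vp * dt
        d = float(np.linalg.norm(pe - pp))
        if d < evasion_d:                       # closest approach so far
            evasion_d, residual_v = d, float(np.linalg.norm(ve))
    return evasion_d, residual_v

def fitness(cmds, d_required=200.0):
    """Residual velocity, penalised when the evasion-distance constraint
    is violated (one common way to scalarise the constrained objective)."""
    d, v = rollout(cmds)
    return v - 1e3 * max(0.0, d_required - d)

def es_refine(seed_cmds, sigma=5.0, iters=2000, rng_seed=0):
    """(1+1)-ES style local search around a PPO-produced command sequence."""
    rng = np.random.default_rng(rng_seed)
    best, best_f = seed_cmds.copy(), fitness(seed_cmds)
    for _ in range(iters):
        cand = best + sigma * rng.standard_normal(best.shape)
        f = fitness(cand)
        if f > best_f:
            best, best_f = cand, f              # keep only improvements
        else:
            sigma *= 0.999                      # slowly contract the search radius
    return best, best_f

# Usage: refine a stand-in for the PPO policy's coarse command sequence.
ppo_cmds = np.zeros(300)   # placeholder; a trained PPO policy would supply this
refined_cmds, refined_f = es_refine(ppo_cmds)
```

The penalty term in fitness is one common way of folding the evasion-distance constraint into a scalar score, and the (1+1)-ES loop mirrors the paper's division of labour: PPO supplies a coarse global solution, and small Gaussian perturbations around it search the local space for a higher residual velocity.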


Bibliographic Details
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Main authors: Hu, Xiao; Wang, Tianshu; Gong, Min; Yang, Shaoshi
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE access
container_volume 12
creator Hu, Xiao
Wang, Tianshu
Gong, Min
Yang, Shaoshi
description Guidance commands of flight vehicles can be regarded as a series of data sets having fixed time intervals, thus guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for using the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. Evasion distance is described as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design entails progressively maximizing the residual velocity, which is described as the EFV's velocity when the evasion distance occurs, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large-scale is formulated. In this problem, the time instant when the optimal solution (i.e., the maximum residual velocity satisfying the evasion distance constraint) can be attained is uncertain and the optimum solution is dependent on all the intermediate guidance commands generated before. For solving this challenging problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, despite the fact that the reward function, the neural network parameters and the learning rate are designed elaborately. Therefore, in the second step, we propose to invoke the evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Extensive simulation results demonstrate that the guidance design method relying on the proposed ES-enhanced PPO algorithm is highly effective.
doi_str_mv 10.1109/ACCESS.2024.3383322
format Article
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2024-01, Vol.12, p.1-1
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2024_3383322
source DOAJ Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library
subjects Algorithms
Commands
Deep learning
Deep reinforcement learning
Differential games
Earth
Evolution
evolution strategy (ES)
Evolutionary computation
Flight
Flight vehicles
guidance design
Machine learning
max-min problem
Minimax techniques
Navigation
Neural networks
Optimization
Proportional navigation
proximal policy optimization (PPO)
Real-time systems
Vectors
Velocity
title Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A03%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Guidance%20Design%20for%20Escape%20Flight%20Vehicle%20Using%20Evolution%20Strategy%20Enhanced%20Deep%20Reinforcement%20Learning&rft.jtitle=IEEE%20access&rft.au=Hu,%20Xiao&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3383322&rft_dat=%3Cproquest_cross%3E3033619251%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3033619251&rft_id=info:pmid/&rft_ieee_id=10485410&rft_doaj_id=oai_doaj_org_article_747423c0bf2a427f95cef6a10c059f71&rfr_iscdi=true