Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning
Guidance commands of flight vehicles can be regarded as a series of data sets having fixed time intervals, thus guidance design constitutes a typical sequential decision problem and satisfies the basic conditions for using the deep reinforcement learning (DRL) technique. In this paper, we consider t...
Published in: | IEEE access 2024-01, Vol.12, p.1-1 |
---|---|
Main authors: | Hu, Xiao; Wang, Tianshu; Gong, Min; Yang, Shaoshi |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE access |
container_volume | 12 |
creator | Hu, Xiao; Wang, Tianshu; Gong, Min; Yang, Shaoshi |
description | Guidance commands of flight vehicles can be regarded as a series of data sets with fixed time intervals; guidance design therefore constitutes a typical sequential decision problem and satisfies the basic conditions for applying the deep reinforcement learning (DRL) technique. In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on the DRL technique and the pursuit flight vehicle (PFV) generates guidance commands based on the proportional navigation method. The evasion distance is defined as the minimum distance between the EFV and the PFV during the escape-and-pursuit process. For the EFV, the objective of the guidance design is to progressively maximize the residual velocity, defined as the EFV's velocity at the instant when the evasion distance occurs, subject to the constraint imposed by the given evasion distance. Thus an irregular dynamic max-min problem of extremely large scale is formulated. In this problem, the time instant at which the optimal solution (i.e., the maximum residual velocity satisfying the evasion distance constraint) is attained is uncertain, and the optimal solution depends on all the intermediate guidance commands generated before it. To solve this challenging problem, a two-step strategy is conceived. In the first step, we use the proximal policy optimization (PPO) algorithm to generate the guidance commands of the EFV. The results obtained by PPO in the global search space are coarse, even though the reward function, the neural network parameters and the learning rate are carefully designed. Therefore, in the second step, we propose to invoke an evolution strategy (ES) based algorithm, which uses the result of PPO as the initial value, to further improve the quality of the solution by searching in the local space. Extensive simulation results demonstrate that the guidance design method relying on the proposed ES-enhanced PPO algorithm is highly effective. |
doi_str_mv | 10.1109/ACCESS.2024.3383322 |
format | Article |
fullrecord | Record type: article. Source record ID: 3033619251. IEEE ID: 10485410. DOAJ ID: oai_doaj_org_article_747423c0bf2a427f95cef6a10c059f71. Publisher: Piscataway: IEEE. CODEN: IAECCG. EISSN: 2169-3536. Date: 2024-01-01. Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024. Peer reviewed; open access (free to read). ORCID: https://orcid.org/0000-0003-2395-1637; https://orcid.org/0009-0007-3011-7858. Link: https://ieeexplore.ieee.org/document/10485410. Collections: IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE Xplore Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library Online; CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Engineered Materials Abstracts; METADEX; Technology Research Database; Materials Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; DOAJ Directory of Open Access Journals |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024-01, Vol.12, p.1-1 |
issn | 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2024_3383322 |
source | DOAJ Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library |
subjects | Algorithms; Commands; Deep learning; Deep reinforcement learning; Differential games; Earth; Evolution; evolution strategy (ES); Evolutionary computation; Flight; Flight vehicles; guidance design; Machine learning; max-min problem; Minimax techniques; Navigation; Neural networks; Optimization; Proportional navigation; proximal policy optimization (PPO); Real-time systems; Vectors; Velocity |
title | Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T09%3A03%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Guidance%20Design%20for%20Escape%20Flight%20Vehicle%20Using%20Evolution%20Strategy%20Enhanced%20Deep%20Reinforcement%20Learning&rft.jtitle=IEEE%20access&rft.au=Hu,%20Xiao&rft.date=2024-01-01&rft.volume=12&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3383322&rft_dat=%3Cproquest_cross%3E3033619251%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3033619251&rft_id=info:pmid/&rft_ieee_id=10485410&rft_doaj_id=oai_doaj_org_article_747423c0bf2a427f95cef6a10c059f71&rfr_iscdi=true |
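
The second step described in the abstract, local refinement of a coarse solution by an evolution strategy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_rollout` is a hypothetical stand-in for the escape-and-pursuit simulation, the penalty form in `fitness` is one simple way to handle the evasion distance constraint, and `es_refine` is a plain elitist (1+λ) evolution strategy with a fixed mutation step. The list passed in as the starting point plays the role of the PPO result.

```python
import random
from typing import Callable, List, Sequence, Tuple


def toy_rollout(commands: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in for the escape-and-pursuit simulation.

    Returns (evasion_distance, residual_velocity). In this toy model,
    stronger maneuvering increases the evasion distance but costs velocity.
    It is NOT the dynamics model used in the paper.
    """
    effort = sum(c * c for c in commands)
    evasion_distance = 10.0 * effort ** 0.5
    residual_velocity = 100.0 - effort
    return evasion_distance, residual_velocity


def fitness(commands: Sequence[float], required_distance: float,
            penalty: float = 100.0) -> float:
    """Residual velocity minus a penalty for missing the required distance.

    The penalty weighting is an illustrative assumption.
    """
    evasion_distance, residual_velocity = toy_rollout(commands)
    shortfall = max(0.0, required_distance - evasion_distance)
    return residual_velocity - penalty * shortfall


def es_refine(initial: Sequence[float],
              score: Callable[[Sequence[float]], float],
              sigma: float = 0.05,
              offspring: int = 20,
              generations: int = 50,
              seed: int = 0) -> Tuple[List[float], float]:
    """Elitist (1+lambda) evolution strategy searching near `initial`."""
    rng = random.Random(seed)
    parent = list(initial)
    parent_score = score(parent)
    for _ in range(generations):
        # Mutate the parent with small Gaussian noise to stay in its neighborhood.
        children = [[c + rng.gauss(0.0, sigma) for c in parent]
                    for _ in range(offspring)]
        best_child = max(children, key=score)
        best_score = score(best_child)
        # Elitism: keep the parent unless a child is strictly better.
        if best_score > parent_score:
            parent, parent_score = best_child, best_score
    return parent, parent_score


if __name__ == "__main__":
    # Hypothetical coarse command sequence standing in for the PPO output.
    coarse = [0.6, -0.5, 0.4, -0.3]
    score = lambda c: fitness(c, required_distance=8.0)
    refined, refined_score = es_refine(coarse, score)
    print("coarse score: ", score(coarse))
    print("refined score:", refined_score)
```

Because the strategy is elitist, the refined score can never fall below the score of the starting point, which matches the role of a local polishing step. A small `sigma` keeps the search close to the initial solution; a larger value would trade that locality for wider exploration.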