Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem
Multi-agent differential games usually involve tracking policies and escaping policies. To obtain proper policies in unknown environments, agents can learn through reinforcement learning, but this typically requires a large amount of interaction with the environment, which is time-consuming and inefficient. If an estimated model can instead be built from prior knowledge, a control policy can be derived from that suboptimal knowledge. Although there is an error between the estimated model and the real environment, the suboptimal guided policy avoids unnecessary exploration, so the learning process can be significantly accelerated. To address tracking-policy optimization for multiple pursuers, this study proposes a fuzzy actor–critic learning algorithm based on suboptimal knowledge (SK-FACL). In the SK-FACL, the available information about the environment is abstracted into an estimated model, and the suboptimal guided policy is calculated from it using the Apollonius circle. The guided policy is combined with the fuzzy actor–critic learner, improving learning efficiency. In a ground game of two pursuers and one evader, experiments verified the advantages of the SK-FACL in reducing tracking error, adapting to model error, and adapting to sudden changes by the evader, compared with pure knowledge-based control and a pure fuzzy actor–critic learner.
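The guided policy described in the abstract is built on the Apollonius circle: for a pursuer and an evader moving at constant speeds, the circle bounds the set of points the evader can reach strictly before the pursuer, which makes it a natural basis for a tracking policy. A minimal sketch of that construction follows; the function name and the NumPy usage are illustrative, not taken from the paper.

```python
import numpy as np

def apollonius_circle(pursuer, evader, v_pursuer, v_evader):
    """Center and radius of the Apollonius circle for a pursuit pair.

    The circle is the locus of points X with
    |X - pursuer| / |X - evader| = k, where k = v_pursuer / v_evader.
    For a faster pursuer (k > 1), the interior of the circle is the
    set of points the evader can reach strictly before the pursuer.
    """
    p = np.asarray(pursuer, dtype=float)
    e = np.asarray(evader, dtype=float)
    k = v_pursuer / v_evader
    if np.isclose(k, 1.0):
        raise ValueError("equal speeds: the locus degenerates to a line")
    k2 = k * k
    center = (p - k2 * e) / (1.0 - k2)  # standard Apollonius-circle formula
    radius = k * np.linalg.norm(p - e) / abs(1.0 - k2)
    return center, radius

# Example: a pursuer at the origin, twice as fast as the evader at (1, 0).
center, radius = apollonius_circle([0.0, 0.0], [1.0, 0.0], 2.0, 1.0)
print(center, radius)  # [1.3333... 0.] 0.6666...
```

A common guided rule steers the pursuer toward the point where the evader's current heading crosses this circle; the exact guided policy used in SK-FACL is specified in the full article, not in this record.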
Published in: Electronics (Basel), 2023-04, Vol. 12 (8), p. 1852
Main authors: Wang, Xiao; Ma, Zhe; Mao, Lei; Sun, Kewu; Huang, Xuhui; Fan, Changchao; Li, Jiake
Format: Article
Language: English
Online access: Full text
container_issue | 8 |
container_start_page | 1852 |
container_title | Electronics (Basel) |
container_volume | 12 |
creator | Wang, Xiao; Ma, Zhe; Mao, Lei; Sun, Kewu; Huang, Xuhui; Fan, Changchao; Li, Jiake |
description | Multi-agent differential games usually involve tracking policies and escaping policies. To obtain proper policies in unknown environments, agents can learn through reinforcement learning, but this typically requires a large amount of interaction with the environment, which is time-consuming and inefficient. If an estimated model can instead be built from prior knowledge, a control policy can be derived from that suboptimal knowledge. Although there is an error between the estimated model and the real environment, the suboptimal guided policy avoids unnecessary exploration, so the learning process can be significantly accelerated. To address tracking-policy optimization for multiple pursuers, this study proposes a fuzzy actor–critic learning algorithm based on suboptimal knowledge (SK-FACL). In the SK-FACL, the available information about the environment is abstracted into an estimated model, and the suboptimal guided policy is calculated from it using the Apollonius circle. The guided policy is combined with the fuzzy actor–critic learner, improving learning efficiency. In a ground game of two pursuers and one evader, experiments verified the advantages of the SK-FACL in reducing tracking error, adapting to model error, and adapting to sudden changes by the evader, compared with pure knowledge-based control and a pure fuzzy actor–critic learner. |
doi_str_mv | 10.3390/electronics12081852 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2079-9292 |
ispartof | Electronics (Basel), 2023-04, Vol.12 (8), p.1852 |
issn | 2079-9292 (ISSN); 2079-9292 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2806536879 |
source | MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms; Data mining; Decision making; Deep learning; Differential games; Disadvantages; Error reduction; Fuzzy control; Game theory; Knowledge; Machine learning; Mathematical optimization; Methods; Multi-agent systems; Multiagent systems; Optimization; Policies; Reinforcement learning (Machine learning); Telecommunications systems; Tracking errors; Tracking problem; Unknown environments; Unmanned aerial vehicles |
title | Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A30%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20Fuzzy%20Actor%E2%80%93Critic%20Learning%20via%20Suboptimal%20Knowledge%20for%20a%20Multi-Agent%20Tracking%20Problem&rft.jtitle=Electronics%20(Basel)&rft.au=Wang,%20Xiao&rft.date=2023-04-01&rft.volume=12&rft.issue=8&rft.spage=1852&rft.pages=1852-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics12081852&rft_dat=%3Cgale_proqu%3EA747443740%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2806536879&rft_id=info:pmid/&rft_galeid=A747443740&rfr_iscdi=true |
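The description states that the Apollonius-circle guided policy is combined with the fuzzy actor–critic learner, but this record does not say how. A minimal sketch, assuming an annealed mixture in which the guided policy dominates early training; the weighting scheme, decay rate, and function names are hypothetical, not from the paper.

```python
import numpy as np

def blended_action(actor_action, guided_action, episode, decay=0.99):
    """Hypothetical annealed blend of a learned and a guided action.

    Assumption (not from the record): trust the model-based guided
    policy early and shift weight to the learned fuzzy actor-critic
    policy as training proceeds.
    """
    w = decay ** episode  # weight on the guided policy; decays toward 0
    return w * np.asarray(guided_action) + (1.0 - w) * np.asarray(actor_action)
```

An early bias toward the guided policy matches the abstract's claim that suboptimal knowledge prunes unnecessary exploration, while the decaying weight lets the learner correct the residual error of the estimated model.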