Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem

Bibliographic Details
Published in: Electronics (Basel), 2023-04, Vol. 12 (8), p. 1852
Authors: Wang, Xiao; Ma, Zhe; Mao, Lei; Sun, Kewu; Huang, Xuhui; Fan, Changchao; Li, Jiake
Format: Article
Language: English
Online access: Full text
Abstract: Multi-agent differential games typically involve tracking policies and escaping policies. To obtain suitable policies in unknown environments, agents can learn through reinforcement learning, but this requires extensive interaction with the environment and is therefore time-consuming and inefficient. However, if an estimated model can be built from prior knowledge, a suboptimal control policy can be derived from it. Even though the estimated model deviates from the true environment, the suboptimal guided policy avoids unnecessary exploration, so the learning process can be significantly accelerated. Addressing tracking-policy optimization for multiple pursuers, this study proposes a fuzzy actor–critic learning algorithm based on suboptimal knowledge (SK-FACL). In the SK-FACL, the available information about the environment is abstracted into an estimated model, and the suboptimal guided policy is computed from the Apollonius circle. The guided policy is combined with the fuzzy actor–critic learning algorithm, improving learning efficiency. In a ground pursuit game with two pursuers and one evader, experiments verified the advantages of the SK-FACL over pure knowledge control and pure fuzzy actor–critic learning in reducing tracking error, tolerating model error, and adapting to sudden changes by the evader.
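The abstract's central device is the Apollonius circle: when the pursuer is faster than the evader and both move at constant speeds, the set of points the two agents can reach at the same moment forms a circle, and steering toward a point on it gives a simple interception rule that can serve as the suboptimal guided policy. The record does not spell out the paper's construction, so the following is only a minimal Python sketch under stated assumptions: a 2D ground game, known constant speeds with v_p > v_e, and an interception point chosen where the evader's current heading crosses the circle. The function names and the convex blend with the actor's output are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def apollonius_circle(p_pos, e_pos, v_p, v_e):
    """Locus of points X with |X - e_pos| / |X - p_pos| = v_e / v_p:
    the points that a pursuer (speed v_p) and an evader (speed v_e < v_p)
    reach at the same time. Returns (center, radius)."""
    k = v_e / v_p                                    # speed ratio, < 1
    center = (e_pos - k**2 * p_pos) / (1.0 - k**2)
    radius = k * np.linalg.norm(e_pos - p_pos) / (1.0 - k**2)
    return center, radius

def guided_heading(p_pos, e_pos, e_heading, v_p, v_e):
    """Illustrative guided policy: steer toward the point where the
    evader's current heading exits the Apollonius circle -- the first
    point on its path the pursuer can reach no later than the evader."""
    center, radius = apollonius_circle(p_pos, e_pos, v_p, v_e)
    u = np.array([np.cos(e_heading), np.sin(e_heading)])
    # Ray/circle intersection: |(e_pos + t*u) - center|^2 = radius^2
    # reduces to t^2 + 2*b*t + c = 0, with the evader inside (c < 0).
    w = e_pos - center
    b, c = w @ u, w @ w - radius**2
    t = -b + np.sqrt(max(b * b - c, 0.0))            # forward root
    aim = e_pos + t * u - p_pos
    return np.arctan2(aim[1], aim[0])                # pursuer heading command

def blended_action(a_guided, a_actor, w_actor):
    """One plausible way to combine the guided policy with the fuzzy
    actor's output (assumption): a convex blend, annealing w_actor from
    0 toward 1 so control shifts to the learned policy over training."""
    return (1.0 - w_actor) * a_guided + w_actor * a_actor
```

Because the guided heading already encodes the interception geometry, exploration starts near sensible actions even when the estimated speeds are imperfect, which is consistent with the acceleration and model-error tolerance the abstract reports.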
DOI: 10.3390/electronics12081852
ISSN: 2079-9292
EISSN: 2079-9292
Source: MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
Subjects:
Algorithms
Data mining
Decision making
Deep learning
Differential games
Disadvantages
Error reduction
Fuzzy control
Game theory
Knowledge
Machine learning
Mathematical optimization
Methods
Multi-agent systems
Multiagent systems
Optimization
Policies
Reinforcement learning (Machine learning)
Telecommunications systems
Tracking errors
Tracking problem
Unknown environments
Unmanned aerial vehicles
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A30%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20Fuzzy%20Actor%E2%80%93Critic%20Learning%20via%20Suboptimal%20Knowledge%20for%20a%20Multi-Agent%20Tracking%20Problem&rft.jtitle=Electronics%20(Basel)&rft.au=Wang,%20Xiao&rft.date=2023-04-01&rft.volume=12&rft.issue=8&rft.spage=1852&rft.pages=1852-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics12081852&rft_dat=%3Cgale_proqu%3EA747443740%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2806536879&rft_id=info:pmid/&rft_galeid=A747443740&rfr_iscdi=true