Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem
Multi-agent differential games usually involve tracking policies and escaping policies. To obtain proper policies in unknown environments, agents can learn through reinforcement learning, but this typically requires a large amount of interaction with the environment, which is time-consuming and inefficient. If an estimated model can instead be built from prior knowledge, a control policy can be derived from that suboptimal knowledge. Although there is an error between the estimated model and the real environment, the suboptimal guided policy avoids unnecessary exploration, so the learning process can be significantly accelerated. To address tracking-policy optimization for multiple pursuers, this study proposes a fuzzy actor–critic learning algorithm based on suboptimal knowledge (SK-FACL). In the SK-FACL, the available information about the environment is abstracted into an estimated model, and the suboptimal guided policy is calculated from it using the Apollonius circle. The guided policy is combined with the fuzzy actor–critic learner, improving learning efficiency. In a ground game of two pursuers and one evader, experiments verified the advantages of the SK-FACL in reducing tracking error, adapting to model error, and adapting to sudden changes by the evader, compared with pure knowledge-based control and a pure fuzzy actor–critic learner.
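The guided policy described in the abstract is built on the Apollonius circle: for a pursuer and an evader moving at constant speeds, the circle bounds the set of points the evader can reach strictly before the pursuer, which makes it a natural basis for a tracking policy. A minimal sketch of that construction follows; the function name and the NumPy usage are illustrative, not taken from the paper.

```python
import numpy as np

def apollonius_circle(pursuer, evader, v_pursuer, v_evader):
    """Center and radius of the Apollonius circle for a pursuit pair.

    The circle is the locus of points X with
    |X - pursuer| / |X - evader| = k, where k = v_pursuer / v_evader.
    For a faster pursuer (k > 1), the interior of the circle is the
    set of points the evader can reach strictly before the pursuer.
    """
    p = np.asarray(pursuer, dtype=float)
    e = np.asarray(evader, dtype=float)
    k = v_pursuer / v_evader
    if np.isclose(k, 1.0):
        raise ValueError("equal speeds: the locus degenerates to a line")
    k2 = k * k
    center = (p - k2 * e) / (1.0 - k2)  # standard Apollonius-circle formula
    radius = k * np.linalg.norm(p - e) / abs(1.0 - k2)
    return center, radius

# Example: a pursuer at the origin, twice as fast as the evader at (1, 0).
center, radius = apollonius_circle([0.0, 0.0], [1.0, 0.0], 2.0, 1.0)
print(center, radius)  # [1.3333... 0.] 0.6666...
```

A common guided rule steers the pursuer toward the point where the evader's current heading crosses this circle; the exact guided policy used in SK-FACL is specified in the full article, not in this record.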
Published in: Electronics (Basel), 2023-04, Vol. 12 (8), p. 1852
Main authors: Wang, Xiao; Ma, Zhe; Mao, Lei; Sun, Kewu; Huang, Xuhui; Fan, Changchao; Li, Jiake
Format: Article
Language: English
Online access: Full text
container_issue | 8 |
container_start_page | 1852 |
container_title | Electronics (Basel) |
container_volume | 12 |
creator | Wang, Xiao; Ma, Zhe; Mao, Lei; Sun, Kewu; Huang, Xuhui; Fan, Changchao; Li, Jiake |
description | Multi-agent differential games usually involve tracking policies and escaping policies. To obtain proper policies in unknown environments, agents can learn through reinforcement learning, but this typically requires a large amount of interaction with the environment, which is time-consuming and inefficient. If an estimated model can instead be built from prior knowledge, a control policy can be derived from that suboptimal knowledge. Although there is an error between the estimated model and the real environment, the suboptimal guided policy avoids unnecessary exploration, so the learning process can be significantly accelerated. To address tracking-policy optimization for multiple pursuers, this study proposes a fuzzy actor–critic learning algorithm based on suboptimal knowledge (SK-FACL). In the SK-FACL, the available information about the environment is abstracted into an estimated model, and the suboptimal guided policy is calculated from it using the Apollonius circle. The guided policy is combined with the fuzzy actor–critic learner, improving learning efficiency. In a ground game of two pursuers and one evader, experiments verified the advantages of the SK-FACL in reducing tracking error, adapting to model error, and adapting to sudden changes by the evader, compared with pure knowledge-based control and a pure fuzzy actor–critic learner. |
doi_str_mv | 10.3390/electronics12081852 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2079-9292 |
ispartof | Electronics (Basel), 2023-04, Vol.12 (8), p.1852 |
issn | 2079-9292 (ISSN); 2079-9292 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2806536879 |
source | MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms; Data mining; Decision making; Deep learning; Differential games; Disadvantages; Error reduction; Fuzzy control; Game theory; Knowledge; Machine learning; Mathematical optimization; Methods; Multi-agent systems; Multiagent systems; Optimization; Policies; Reinforcement learning (Machine learning); Telecommunications systems; Tracking errors; Tracking problem; Unknown environments; Unmanned aerial vehicles |
title | Accelerating Fuzzy Actor–Critic Learning via Suboptimal Knowledge for a Multi-Agent Tracking Problem |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T07%3A30%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20Fuzzy%20Actor%E2%80%93Critic%20Learning%20via%20Suboptimal%20Knowledge%20for%20a%20Multi-Agent%20Tracking%20Problem&rft.jtitle=Electronics%20(Basel)&rft.au=Wang,%20Xiao&rft.date=2023-04-01&rft.volume=12&rft.issue=8&rft.spage=1852&rft.pages=1852-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics12081852&rft_dat=%3Cgale_proqu%3EA747443740%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2806536879&rft_id=info:pmid/&rft_galeid=A747443740&rfr_iscdi=true |
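The description states that the Apollonius-circle guided policy is combined with the fuzzy actor–critic learner, but this record does not say how. A minimal sketch, assuming an annealed mixture in which the guided policy dominates early training; the weighting scheme, decay rate, and function names are hypothetical, not from the paper.

```python
import numpy as np

def blended_action(actor_action, guided_action, episode, decay=0.99):
    """Hypothetical annealed blend of a learned and a guided action.

    Assumption (not from the record): trust the model-based guided
    policy early and shift weight to the learned fuzzy actor-critic
    policy as training proceeds.
    """
    w = decay ** episode  # weight on the guided policy; decays toward 0
    return w * np.asarray(guided_action) + (1.0 - w) * np.asarray(actor_action)
```

An early bias toward the guided policy matches the abstract's claim that suboptimal knowledge prunes unnecessary exploration, while the decaying weight lets the learner correct the residual error of the estimated model.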