A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios

Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm in multi-agent systems, it also has critical problems such as poor training stability and low policy robustness, which significantly limit the capability and application of the algorithm. So th...

Detailed description

Bibliographic details
Published in:	Transactions of the Institute of Measurement and Control 2022-08, Vol.44 (12), p.2378-2395
Main authors:	Sun, Yu, Lai, Jun, Cao, Lei, Chen, Xiliang, Xu, Zhixiong, Lian, Zhen, Fan, Huijin
Format:	Article
Language:	eng
Subjects:	Algorithms Deep learning Game theory Machine learning Multiagent systems Perturbation methods Policies Robustness Stability Training
Online access:	Full text

container_end_page	2395
container_issue	12
container_start_page	2378
container_title	Transactions of the Institute of Measurement and Control
container_volume	44
creator	Sun, Yu Lai, Jun Cao, Lei Chen, Xiliang Xu, Zhixiong Lian, Zhen Fan, Huijin
description	Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm for multi-agent systems, it has critical problems such as poor training stability and low policy robustness, which significantly limit its capability and application. This article therefore proposes an improved algorithm, friend-or-foe multi-agent deep deterministic policy gradient, to address these problems. The main innovations are as follows: (1) inspired by the concept of friend-or-foe game theory, we modified the framework of the original multi-agent deep deterministic policy gradient by using two identical training networks that take the agents’ optimal and worst actions as input, which improves the robustness of the trained policies, and (2) we propose an action perturbation technique based on gradient descent to expand the selection range of actions, thereby improving the training stability of the proposed algorithm. Finally, we conducted multiple sets of comparative experiments between our friend-or-foe multi-agent deep deterministic policy gradient and the original algorithm in four authoritative mixed cooperative–competitive scenarios. The results show that the improved algorithm simultaneously improves the training stability and the robustness of the agents’ generated policies in different complicated environments.
doi_str_mv	10.1177/01423312221077755
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2682737933</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_01423312221077755</sage_id><sourcerecordid>2682737933</sourcerecordid><originalsourceid>FETCH-LOGICAL-c264t-703cf4693c40a8d45c3eba473f3356e9d6d0a8553a0c46b5ae59a66fb1a85dda3</originalsourceid><addsrcrecordid>eNp1UMtKAzEUDaJgrX6Au4Drqcnk1VmWYlUouNH1kGbulNSZZEymanf9B__QLzFjBRfi6nKeFw5Cl5RMKFXqmlCeM0bzPKdEKSXEERpRrlRGmCyO0WjQs8Fwis5i3BBCOJd8hPYzvAgWXJX5kC084DroFt58eMa1D7jdNr3N9BpcjwNYlzgD7YAa0MFZt8adb6zZ4WSBoHvrHbYOt_Z90Iz33Tf7Cp_7D-PbDno7IBwNOB2sj-fopNZNhIufO0ZPi5vH-V22fLi9n8-Wmckl7zNFmKm5LJjhRE8rLgyDleaK1YwJCUUlq8QLwTQxXK6EBlFoKesVTWxVaTZGV4feLviXLcS-3PhtcOllmctprpgqGEsuenCZ4GMMUJddsK0Ou5KSchi6_DN0ykwOmZh2-m39P_AFaXKBOQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2682737933</pqid></control><display><type>article</type><title>A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios</title><source>SAGE Complete</source><creator>Sun, Yu ; Lai, Jun ; Cao, Lei ; Chen, Xiliang ; Xu, Zhixiong ; Lian, Zhen ; Fan, Huijin</creator><creatorcontrib>Sun, Yu ; Lai, Jun ; Cao, Lei ; Chen, Xiliang ; Xu, Zhixiong ; Lian, Zhen ; Fan, Huijin</creatorcontrib><description>Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm in multi-agent systems. It also has critical problems such as poor training stability and low policy robustness, which significantly limit the capability and application of the algorithm. So this article proposes an improved algorithm called friend-or-foe multi-agent deep deterministic policy gradient for solving the above problems. 
The main innovations are as follows: (1) inspired by the concept of friend-or-foe game theory, we modified the framework of the original multi-agent deep deterministic policy gradient by using two identical training networks with agents’ optimal and worst actions input, which improves the robustness of training policies, and (2) we propose an action perturbation technique based on gradient-descent to expand the selection range of actions, thereby improving training stability of our proposing algorithm. Finally, we conducted multiple sets of comparative experiments between our friend-or-foe multi-agent deep deterministic policy gradient and original one in four authoritative mixed cooperative–competitive scenarios. The results show that our improving algorithm can simultaneously improve the training stability and the robustness of agents’ generating policies in different complicated environments.</description><identifier>ISSN: 0142-3312</identifier><identifier>EISSN: 1477-0369</identifier><identifier>DOI: 10.1177/01423312221077755</identifier><language>eng</language><publisher>London, England: SAGE Publications</publisher><subject>Algorithms ; Deep learning ; Game theory ; Machine learning ; Multiagent systems ; Perturbation methods ; Policies ; Robustness ; Stability ; Training</subject><ispartof>Transactions of the Institute of Measurement and Control, 2022-08, Vol.44 (12), p.2378-2395</ispartof><rights>The Author(s) 
2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c264t-703cf4693c40a8d45c3eba473f3356e9d6d0a8553a0c46b5ae59a66fb1a85dda3</cites><orcidid>0000-0002-8218-3909</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/01423312221077755$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/01423312221077755$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>314,776,780,21800,27903,27904,43599,43600</link.rule.ids></links><search><creatorcontrib>Sun, Yu</creatorcontrib><creatorcontrib>Lai, Jun</creatorcontrib><creatorcontrib>Cao, Lei</creatorcontrib><creatorcontrib>Chen, Xiliang</creatorcontrib><creatorcontrib>Xu, Zhixiong</creatorcontrib><creatorcontrib>Lian, Zhen</creatorcontrib><creatorcontrib>Fan, Huijin</creatorcontrib><title>A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios</title><title>Transactions of the Institute of Measurement and Control</title><description>Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm in multi-agent systems. It also has critical problems such as poor training stability and low policy robustness, which significantly limit the capability and application of the algorithm. So this article proposes an improved algorithm called friend-or-foe multi-agent deep deterministic policy gradient for solving the above problems. 
The main innovations are as follows: (1) inspired by the concept of friend-or-foe game theory, we modified the framework of the original multi-agent deep deterministic policy gradient by using two identical training networks with agents’ optimal and worst actions input, which improves the robustness of training policies, and (2) we propose an action perturbation technique based on gradient-descent to expand the selection range of actions, thereby improving training stability of our proposing algorithm. Finally, we conducted multiple sets of comparative experiments between our friend-or-foe multi-agent deep deterministic policy gradient and original one in four authoritative mixed cooperative–competitive scenarios. The results show that our improving algorithm can simultaneously improve the training stability and the robustness of agents’ generating policies in different complicated environments.</description><subject>Algorithms</subject><subject>Deep learning</subject><subject>Game theory</subject><subject>Machine learning</subject><subject>Multiagent systems</subject><subject>Perturbation methods</subject><subject>Policies</subject><subject>Robustness</subject><subject>Stability</subject><subject>Training</subject><issn>0142-3312</issn><issn>1477-0369</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1UMtKAzEUDaJgrX6Au4Drqcnk1VmWYlUouNH1kGbulNSZZEymanf9B__QLzFjBRfi6nKeFw5Cl5RMKFXqmlCeM0bzPKdEKSXEERpRrlRGmCyO0WjQs8Fwis5i3BBCOJd8hPYzvAgWXJX5kC084DroFt58eMa1D7jdNr3N9BpcjwNYlzgD7YAa0MFZt8adb6zZ4WSBoHvrHbYOt_Z90Iz33Tf7Cp_7D-PbDno7IBwNOB2sj-fopNZNhIufO0ZPi5vH-V22fLi9n8-Wmckl7zNFmKm5LJjhRE8rLgyDleaK1YwJCUUlq8QLwTQxXK6EBlFoKesVTWxVaTZGV4feLviXLcS-3PhtcOllmctprpgqGEsuenCZ4GMMUJddsK0Ou5KSchi6_DN0ykwOmZh2-m39P_AFaXKBOQ</recordid><startdate>202208</startdate><enddate>202208</enddate><creator>Sun, Yu</creator><creator>Lai, Jun</creator><creator>Cao, Lei</creator><creator>Chen, 
Xiliang</creator><creator>Xu, Zhixiong</creator><creator>Lian, Zhen</creator><creator>Fan, Huijin</creator><general>SAGE Publications</general><general>Sage Publications Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>7U5</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-8218-3909</orcidid></search><sort><creationdate>202208</creationdate><title>A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios</title><author>Sun, Yu ; Lai, Jun ; Cao, Lei ; Chen, Xiliang ; Xu, Zhixiong ; Lian, Zhen ; Fan, Huijin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c264t-703cf4693c40a8d45c3eba473f3356e9d6d0a8553a0c46b5ae59a66fb1a85dda3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Deep learning</topic><topic>Game theory</topic><topic>Machine learning</topic><topic>Multiagent systems</topic><topic>Perturbation methods</topic><topic>Policies</topic><topic>Robustness</topic><topic>Stability</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Yu</creatorcontrib><creatorcontrib>Lai, Jun</creatorcontrib><creatorcontrib>Cao, Lei</creatorcontrib><creatorcontrib>Chen, Xiliang</creatorcontrib><creatorcontrib>Xu, Zhixiong</creatorcontrib><creatorcontrib>Lian, Zhen</creatorcontrib><creatorcontrib>Fan, Huijin</creatorcontrib><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Advanced Technologies Database with 
Aerospace</collection><jtitle>Transactions of the Institute of Measurement and Control</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Yu</au><au>Lai, Jun</au><au>Cao, Lei</au><au>Chen, Xiliang</au><au>Xu, Zhixiong</au><au>Lian, Zhen</au><au>Fan, Huijin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios</atitle><jtitle>Transactions of the Institute of Measurement and Control</jtitle><date>2022-08</date><risdate>2022</risdate><volume>44</volume><issue>12</issue><spage>2378</spage><epage>2395</epage><pages>2378-2395</pages><issn>0142-3312</issn><eissn>1477-0369</eissn><abstract>Although multi-agent deep deterministic policy gradient is a classic deep reinforcement learning algorithm in multi-agent systems. It also has critical problems such as poor training stability and low policy robustness, which significantly limit the capability and application of the algorithm. So this article proposes an improved algorithm called friend-or-foe multi-agent deep deterministic policy gradient for solving the above problems. The main innovations are as follows: (1) inspired by the concept of friend-or-foe game theory, we modified the framework of the original multi-agent deep deterministic policy gradient by using two identical training networks with agents’ optimal and worst actions input, which improves the robustness of training policies, and (2) we propose an action perturbation technique based on gradient-descent to expand the selection range of actions, thereby improving training stability of our proposing algorithm. Finally, we conducted multiple sets of comparative experiments between our friend-or-foe multi-agent deep deterministic policy gradient and original one in four authoritative mixed cooperative–competitive scenarios. 
The results show that our improving algorithm can simultaneously improve the training stability and the robustness of agents’ generating policies in different complicated environments.</abstract><cop>London, England</cop><pub>SAGE Publications</pub><doi>10.1177/01423312221077755</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-8218-3909</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0142-3312
ispartof	Transactions of the Institute of Measurement and Control, 2022-08, Vol.44 (12), p.2378-2395
issn	0142-3312 1477-0369
language	eng
recordid	cdi_proquest_journals_2682737933
source	SAGE Complete
subjects	Algorithms Deep learning Game theory Machine learning Multiagent systems Perturbation methods Policies Robustness Stability Training
title	A Friend-or-Foe framework for multi-agent reinforcement learning policy generation in mixing cooperative–competitive scenarios
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T09%3A30%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Friend-or-Foe%20framework%20for%20multi-agent%20reinforcement%20learning%20policy%20generation%20in%20mixing%20cooperative%E2%80%93competitive%20scenarios&rft.jtitle=Transactions%20of%20the%20Institute%20of%20Measurement%20and%20Control&rft.au=Sun,%20Yu&rft.date=2022-08&rft.volume=44&rft.issue=12&rft.spage=2378&rft.epage=2395&rft.pages=2378-2395&rft.issn=0142-3312&rft.eissn=1477-0369&rft_id=info:doi/10.1177/01423312221077755&rft_dat=%3Cproquest_cross%3E2682737933%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2682737933&rft_id=info:pmid/&rft_sage_id=10.1177_01423312221077755&rfr_iscdi=true