Multi-Agent Reinforcement Learning for Dynamic Topology Optimization of Mesh Wireless Networks
In Mesh Wireless Networks (MWNs), network coverage is extended by connecting Access Points (APs) in a mesh topology, where frames forwarded over multi-hop routes must sustain performance metrics such as end-to-end (E2E) delay and channel efficiency. Several recent studies have focused on minimizing E2E delay, but these methods cannot adapt to the dynamic nature of MWNs. Reinforcement-learning-based methods offer better adaptability to such dynamics but suffer from high-dimensional action spaces, which slow convergence. In this paper, we propose a multi-agent actor-critic reinforcement learning (MACRL) algorithm that optimizes multiple objectives, specifically minimizing E2E delay and enhancing channel efficiency. First, to reduce the action space and speed up convergence in the dynamic optimization process, a centralized-critic-distributed-actor scheme is proposed. Then, a multi-objective reward balancing method is designed to dynamically balance the MWN's performance between E2E delay and channel efficiency. Finally, the trained MACRL algorithm is deployed in the QaulNet simulator to verify its effectiveness.
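The centralized-critic-distributed-actor scheme named in the abstract follows the common centralized-training, decentralized-execution (CTDE) pattern. Below is a minimal sketch of that pattern in PyTorch; the agent count, observation and action dimensions, and network shapes are illustrative assumptions, not the paper's MACRL implementation.

```python
# Minimal centralized-critic / distributed-actor sketch in PyTorch.
# Illustrative assumptions only (agent count, dimensions, wiring);
# this is not the paper's MACRL implementation.
import torch
import torch.nn as nn

N_AGENTS = 4   # hypothetical number of mesh APs acting as agents
OBS_DIM = 8    # per-agent local observation (e.g., link qualities)
ACT_DIM = 3    # per-agent discrete action (e.g., candidate parent links)

class Actor(nn.Module):
    """Each AP runs its own small policy over its local observation,
    so the per-agent action space stays low-dimensional."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, ACT_DIM))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    """One critic scores the joint observation-action vector of all
    agents, which is what makes the training 'centralized'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, all_obs, all_act_onehot):
        joint = torch.cat([all_obs.flatten(), all_act_onehot.flatten()])
        return self.net(joint)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# One decentralized decision step: each actor sees only its local state.
obs = torch.randn(N_AGENTS, OBS_DIM)
actions = torch.stack([actors[i](obs[i]).sample() for i in range(N_AGENTS)])
value = critic(obs, nn.functional.one_hot(actions, ACT_DIM).float())
print(actions.tolist(), value.item())
```

Because each actor conditions only on local observations, execution stays fully distributed even though the critic required global state during training, which is what lets the per-agent action space stay small.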
Published in: | IEEE transactions on wireless communications 2024-09, Vol.23 (9), p.10501-10513 |
---|---|
Main authors: | Sun, Wei; Lv, Qiushuo; Xiao, Yang; Liu, Zhi; Tang, Qingwei; Li, Qiyue; Mu, Daoming |
Format: | Article |
Language: | eng |
Subjects: | Actor-critic; ad hoc wireless network; Algorithms; Convergence; Delay; Delays; Efficiency; Logic gates; Machine learning; mesh wireless network; Multiagent systems; Network topologies; Network topology; reinforcement learning; Topology; Topology optimization; Trajectory; Vectors; Wireless networks |
Online access: | Order full text |
container_end_page | 10513 |
---|---|
container_issue | 9 |
container_start_page | 10501 |
container_title | IEEE transactions on wireless communications |
container_volume | 23 |
creator | Sun, Wei; Lv, Qiushuo; Xiao, Yang; Liu, Zhi; Tang, Qingwei; Li, Qiyue; Mu, Daoming |
description | In Mesh Wireless Networks (MWNs), network coverage is extended by connecting Access Points (APs) in a mesh topology, where frames forwarded over multi-hop routes must sustain performance metrics such as end-to-end (E2E) delay and channel efficiency. Several recent studies have focused on minimizing E2E delay, but these methods cannot adapt to the dynamic nature of MWNs. Reinforcement-learning-based methods offer better adaptability to such dynamics but suffer from high-dimensional action spaces, which slow convergence. In this paper, we propose a multi-agent actor-critic reinforcement learning (MACRL) algorithm that optimizes multiple objectives, specifically minimizing E2E delay and enhancing channel efficiency. First, to reduce the action space and speed up convergence in the dynamic optimization process, a centralized-critic-distributed-actor scheme is proposed. Then, a multi-objective reward balancing method is designed to dynamically balance the MWN's performance between E2E delay and channel efficiency (an illustrative sketch of this balancing step follows at the end of this record). Finally, the trained MACRL algorithm is deployed in the QaulNet simulator to verify its effectiveness. |
doi_str_mv | 10.1109/TWC.2024.3372694 |
format | Article |
eissn | 1558-2248 |
coden | ITWCAX |
publisher | New York: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
orcidid | 0000-0003-0537-4522; 0000-0001-8549-6794; 0000-0003-4075-0597; 0000-0002-9399-8759; 0000-0002-1692-3796 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1536-1276 |
ispartof | IEEE transactions on wireless communications, 2024-09, Vol.23 (9), p.10501-10513 |
issn | 1536-1276; 1558-2248 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TWC_2024_3372694 |
source | IEEE Electronic Library (IEL) |
subjects | Actor-critic; ad hoc wireless network; Algorithms; Convergence; Delay; Delays; Efficiency; Logic gates; Machine learning; mesh wireless network; Multiagent systems; Network topologies; Network topology; reinforcement learning; Topology; Topology optimization; Trajectory; Vectors; Wireless networks |
title | Multi-Agent Reinforcement Learning for Dynamic Topology Optimization of Mesh Wireless Networks |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T05%3A55%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-Agent%20Reinforcement%20Learning%20for%20Dynamic%20Topology%20Optimization%20of%20Mesh%20Wireless%20Networks&rft.jtitle=IEEE%20transactions%20on%20wireless%20communications&rft.au=Sun,%20Wei&rft.date=2024-09-01&rft.volume=23&rft.issue=9&rft.spage=10501&rft.epage=10513&rft.pages=10501-10513&rft.issn=1536-1276&rft.eissn=1558-2248&rft.coden=ITWCAX&rft_id=info:doi/10.1109/TWC.2024.3372694&rft_dat=%3Cproquest_RIE%3E3102974480%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3102974480&rft_id=info:pmid/&rft_ieee_id=10466475&rfr_iscdi=true |
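The multi-objective reward balancing referenced in the abstract and description fields above can be illustrated as a weighted scalarization of the two objectives, with the weight shifted toward whichever objective currently misses its target. The update rule, the 20 ms delay target, and the function names below are generic assumptions for illustration, not the paper's specific balancing method.

```python
# Sketch of balancing an E2E-delay objective against channel
# efficiency via an adaptive weighted-sum reward. The weight-update
# rule and the delay target are assumptions, not the paper's method.

def balanced_reward(e2e_delay_ms, channel_eff, w_delay, target_ms=20.0):
    """Scalarize the two objectives: lower delay is better (negated and
    normalized by the target), higher channel efficiency (0..1) is better."""
    delay_score = -e2e_delay_ms / target_ms
    return w_delay * delay_score + (1.0 - w_delay) * channel_eff

def update_weight(w_delay, e2e_delay_ms, target_ms=20.0, step=0.05):
    """Shift emphasis toward the violated objective: raise the delay
    weight while measured delay exceeds its target, lower it otherwise."""
    if e2e_delay_ms > target_ms:
        return min(1.0, w_delay + step)
    return max(0.0, w_delay - step)

# Example: delay above target pulls the weight toward delay reduction.
w = 0.5
for delay, eff in [(35.0, 0.60), (28.0, 0.65), (18.0, 0.70)]:
    r = balanced_reward(delay, eff, w)
    w = update_weight(w, delay)
    print(f"delay={delay:5.1f} ms  eff={eff:.2f}  reward={r:+.3f}  w_delay={w:.2f}")
```

A dynamically adjusted weight like this lets a single scalar reward track shifting network conditions, which matches the abstract's stated goal of balancing the two performance metrics at run time.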