Deep Multi-User Reinforcement Learning for Distributed Dynamic Spectrum Access

We consider the problem of dynamic spectrum access for network utility maximization in multichannel wireless networks. The shared bandwidth is divided into K orthogonal channels. At the beginning of each time slot, each user selects a channel and transmits a packet with a certain transmission probability. After each time slot, each user that has transmitted a packet receives a local observation indicating whether its packet was successfully delivered (i.e., an ACK signal). The objective is to find a multi-user strategy for accessing the spectrum that maximizes a certain network utility in a distributed manner, without online coordination or message exchanges between users. Obtaining an optimal solution to the spectrum access problem is computationally expensive in general, due to the large state space and partial observability of the states. To tackle this problem, we develop a novel distributed dynamic spectrum access algorithm based on deep multi-user reinforcement learning. Specifically, at each time slot, each user maps its current state to spectrum access actions based on a trained deep Q-network used to maximize the objective function. A game-theoretic analysis of the system dynamics is developed to establish design principles for the implementation of the algorithm. Experimental results demonstrate the strong performance of the algorithm.
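The protocol the abstract describes is concrete enough to sketch. Below is a minimal, hypothetical Python/PyTorch illustration of one time slot under the stated setting: N users, K orthogonal channels, and a per-user deep Q-network that maps the user's local observation (its last action and ACK) to one of K+1 actions (stay idle, or transmit on a channel). The network shape, the epsilon-greedy rule, the state encoding, and the collision model (a packet is delivered iff exactly one user transmits on a channel) are illustrative assumptions, not the paper's exact design.

import numpy as np
import torch
import torch.nn as nn

K, N = 4, 6  # channels and users (assumed values for illustration)

class SpectrumDQN(nn.Module):
    # Per-user Q-network over K+1 actions: 0 = stay idle, 1..K = transmit on channel k.
    def __init__(self, k, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(k + 2, hidden),  # input: one-hot of last action (k+1) + last ACK bit
            nn.ReLU(),
            nn.Linear(hidden, k + 1),
        )

    def forward(self, x):
        return self.net(x)

def encode_state(k, last_action, last_ack):
    # Each user observes only its own previous action and ACK: a partial observation.
    s = np.zeros(k + 2, dtype=np.float32)
    s[last_action] = 1.0
    s[k + 1] = float(last_ack)
    return torch.from_numpy(s)

def select_action(q_net, state, eps=0.05):
    # Epsilon-greedy choice over the K+1 spectrum access actions.
    if np.random.rand() < eps:
        return np.random.randint(K + 1)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

# One time slot, fully distributed: no coordination or message exchange between users.
nets = [SpectrumDQN(K) for _ in range(N)]
prev = [(0, 0)] * N  # per-user (last_action, last_ack)
actions = [select_action(nets[i], encode_state(K, *prev[i])) for i in range(N)]

# ACK feedback: success iff exactly one user transmitted on that channel.
acks = [int(a != 0 and sum(1 for b in actions if b == a) == 1) for a in actions]
prev = list(zip(actions, acks))  # becomes each user's next local state

The point of the sketch is the information structure the abstract emphasizes: each user acts on purely local ACK feedback, so the joint state is only partially observed, which is what makes the exact problem computationally hard and motivates approximating the policy with a trained deep Q-network.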

Bibliographic Details
Published in: IEEE Transactions on Wireless Communications, 2019-01, Vol. 18 (1), pp. 310-323
Main authors: Naparstek, Oshri; Cohen, Kobi
Format: Article
Language: English
DOI: 10.1109/TWC.2018.2879433
ISSN: 1536-1276
EISSN: 1558-2248
Publisher: IEEE, New York
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithms
Bandwidths
deep reinforcement learning
Dynamic spectrum access
Game theory
Games
Heuristic algorithms
medium access control (MAC) protocols
multi-agent learning
Multichannel communication
Observability (systems)
Optimization
Prediction algorithms
Spectrum allocation
System dynamics
Training
Wireless communication
Wireless networks