Social-aware D2MD user grouping based on game theory and deep Q-learning

Bibliographic details
Published in: Peer-to-peer networking and applications 2023-03, Vol.16 (2), p.606-628
Main authors: Liu, Jianlong; Wen, Jiaye; Xie, Yuhang; Lin, Lixia; Zhou, Wen’an
Format: Article
Language: English
Online access: Full text
Description
Abstract: Device-to-Multi-Device (D2MD) communication offers significant advantages for maximizing Base Station (BS) traffic offloading. Most existing work relies on mathematical model optimization, namely a spatial distribution model and a content request model of Content Request Users (CRUs), together with a social relationship intensity model. These models are used to estimate D2MD transmission performance so that optimizing the D2MD user grouping reduces BS traffic as far as possible. In real-world systems, however, the three models are complex and difficult to describe in realistic scenarios. In addition, Seed Users (SUs) are inherently selfish, i.e., they want to communicate with lower transmission power, while CRUs want to obtain content with the shortest possible waiting time. Hence, reducing BS traffic through D2MD user grouping while incentivizing SUs to contribute transmission power is a difficult problem. To address it, we describe the D2MD user grouping and transmission power control processes in this scenario and model the problem as a joint game and long-term optimization problem, which we solve with a matching-Stackelberg hierarchical game and a Q-learning algorithm. Specifically, we first propose a matching and incentive-based power control method that maximizes the myopic offloading of the BS with lower SU transmission power, i.e., higher energy efficiency. Second, we design a D2MD user grouping algorithm based on multi-agent deep Q-learning to maximize the long-term average offloading of the BS. Finally, simulation results show that the proposed algorithm maximizes the long-term average offloading of the BS while serving more CRUs with guaranteed quality of service, and that it also maximizes the energy efficiency of SUs.
ISSN: 1936-6442, 1936-6450
DOI:10.1007/s12083-022-01411-7