Fast-DRD: Fast decentralized reinforcement distillation for deadline-aware edge computing

Bibliographic Details
Published in: Information processing & management 2022-03, Vol.59 (2), p.102850, Article 102850
Main Authors: Song, Shinan; Fang, Zhiyi; Jiang, Jingyan
Format: Article
Language: English
Description
Summary: Edge computing has recently gained momentum as it provides computing services for mobile devices through high-speed networks. In edge computing system optimization, deep reinforcement learning (DRL) enhances the quality of service (QoS) and shortens the age of information (AoI). However, loosely coupled edge servers saturate a noisy data space for DRL exploration, and learning a reasonable solution is enormously costly. Most existing works assume that the edge is an exact observation system and harvests well-labeled data for the pretraining of DRL neural networks. However, this assumption stands in opposition to the motivation of driving DRL to explore unknown information, and it increases the scheduling and computing costs in large-scale dynamic systems. This article leverages DRL with a distillation module to drive learning efficiency for edge computing with partial observation. We formulate the deadline-aware offloading problem as a decentralized partially observable Markov decision process (Dec-POMDP) with distillation, called fast decentralized reinforcement distillation (Fast-DRD). Each edge server makes offloading decisions in accordance with its own observations and learning strategies in a decentralized manner. By defining trajectory observation history (TOH) distillation and trust distillation to avoid overfitting, Fast-DRD learns a suitable offloading model in a noisy, partially observed edge system and reduces the cost of communication among servers. Finally, experimental simulations are presented to evaluate and compare the effectiveness and complexity of Fast-DRD.
• As far as we know, Fast-DRD is the first to investigate Dec-POMDP for modeling the deadline-aware offloading problem. Fast-DRD drives distributed offloading and decentralized learning for loosely coupled edge servers with lower synchronization requirements, especially in an unknown data space or under poor communication with the central cloud.
• Random exploration gives rise to a non-iid data space and to barriers to DRL efficiency at the edge. In combination with Dec-POMDP, we put forward the concept of trajectory observation history (TOH) as the basic distillation unit. TOH decomposes the optimization goal into ephemeral estimated rewards and accumulated real rewards for harvesting valuable knowledge and filtering out the noise in DRL.
• We conduct simulation experiments for multi-server edge computing offloading. The results show that, compared with naive Policy Distillation, Fast-DRD’s two-stage distillation
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2021.102850