Collaborative Computation Offloading and Resource Allocation in Multi-UAV-Assisted IoT Networks: A Deep Reinforcement Learning Approach
Published in: IEEE Internet of Things Journal, 2021-08, Vol. 8 (15), p. 12203-12218
Main authors: , , , , ,
Format: Article
Language: English
Subjects:
Abstract: In fifth-generation (5G) wireless networks, Edge-Internet-of-Things (EIoT) devices are envisioned to generate huge amounts of data. Owing to the limited computation capacity and battery life of these devices, not all tasks can be processed locally. Mobile-edge computing (MEC) is a promising solution that enables offloading tasks to nearby MEC servers to improve quality of service. Moreover, during emergencies in areas where the network has failed, unmanned aerial vehicles (UAVs) can be deployed to restore the network by acting as aerial base stations and computational nodes for the edge network. In this article, we consider a central network controller that trains on observations and broadcasts the trained data to a multi-UAV cluster network. Each UAV cluster head acts as an agent and autonomously allocates resources to EIoT devices in a decentralized fashion. We propose a model-free deep reinforcement learning (DRL)-based collaborative computation offloading and resource allocation (CCORA-DRL) scheme for an aerial-to-ground (A2G) network in emergency situations, which can handle a continuous action space. Each agent independently learns efficient computation offloading policies and checks the status of the UAVs through Jain's fairness index. The objective is to minimize task execution delay and energy consumption while acquiring an efficient solution by adaptive learning from the dynamic A2G network. Simulation results reveal that our scheme, based on the deep deterministic policy gradient (DDPG), effectively learns the optimal policy, outperforming A3C, deep Q-network, and greedy-based offloading with local computation in stochastic dynamic environments.
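As a rough illustration of two quantities named in the abstract, the minimal Python sketch below computes Jain's fairness index over per-UAV loads and a hypothetical weighted delay-energy cost whose negative could serve as a per-step reward for a DDPG agent. The weighting scheme, function names, and parameters here are assumptions for illustration only; the paper's exact reward formulation is not given in this record.

```python
import numpy as np

def jains_fairness_index(loads):
    """Jain's fairness index over per-UAV loads: (sum x)^2 / (n * sum x^2).
    Returns 1.0 when all UAVs are equally loaded and approaches 1/n when a
    single UAV carries the entire load."""
    loads = np.asarray(loads, dtype=float)
    n = len(loads)
    denom = n * np.sum(loads ** 2)
    return (np.sum(loads) ** 2) / denom if denom > 0 else 1.0

def offloading_cost(delay_s, energy_j, w_delay=0.5, w_energy=0.5):
    """Hypothetical weighted delay-energy cost (weights are assumed, not
    taken from the paper); its negative could act as a DDPG reward."""
    return w_delay * delay_s + w_energy * energy_j

# Example: three UAV cluster heads with balanced vs. imbalanced task loads.
print(jains_fairness_index([4.0, 4.0, 4.0]))       # 1.0  (perfectly fair)
print(jains_fairness_index([10.0, 1.0, 1.0]))      # ~0.49 (imbalanced)
print(-offloading_cost(delay_s=0.12, energy_j=0.8))  # candidate reward value
```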
ISSN: 2327-4662
DOI: 10.1109/JIOT.2021.3063188