Multi-Agent Reinforcement Learning-Based Computation Offloading for Unmanned Aerial Vehicle Post-Disaster Rescue

Natural disasters cause significant losses. Unmanned aerial vehicles (UAVs) are valuable in rescue missions but need to offload tasks to edge servers due to their limited computing power and battery life. This study proposes a task offloading decision algorithm called the multi-agent deep determinis...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Sensors (Basel, Switzerland) Switzerland), 2024-12, Vol.24 (24), p.8014
Hauptverfasser:	Wang, Lixing, Jiao, Huirong
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Batteries Collaboration Communication computation offloading Cooperation Decision making Deep learning Drone aircraft Edge computing Efficiency Energy consumption Genetic algorithms mobile edge computing multi-agent reinforcement learning Natural disasters Optimization post-disaster rescue Teaching methods unmanned aerial vehicle Unmanned aerial vehicles Vegetation mapping Vehicles
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Natural disasters cause significant losses. Unmanned aerial vehicles (UAVs) are valuable in rescue missions but need to offload tasks to edge servers due to their limited computing power and battery life. This study proposes a task offloading decision algorithm called the multi-agent deep deterministic policy gradient with cooperation and experience replay (CER-MADDPG), which is based on multi-agent reinforcement learning for UAV computation offloading. CER-MADDPG emphasizes collaboration between UAVs and uses historical UAV experiences to classify and obtain optimal strategies. It enables collaboration among edge devices through the design of the 'critic' network. Additionally, by defining good and bad experiences for UAVs, experiences are classified into two separate buffers, allowing UAVs to learn from them, seek benefits, avoid harm, and reduce system overhead. The performance of CER-MADDPG was verified through simulations in two aspects. First, the influence of key hyperparameters on performance was examined, and the optimal values were determined. Second, CER-MADDPG was compared with other baseline algorithms. The results show that compared with MADDPG and stochastic game-based resource allocation with prioritized experience replay, CER-MADDPG achieves the lowest system overhead and superior stability and scalability.
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s24248014