Sample-efficient multi-agent reinforcement learning with masked reconstruction

Bibliographic Details
Published in: PLoS ONE 2023-09, Vol. 18 (9), p. e0291545
Authors: Kim, Jung In; Lee, Young Jae; Heo, Jongkook; Park, Jinhyeok; Kim, Jaehoon; Lim, Sae Rin; Jeong, Jinyong; Kim, Seoung Bum
Format: Article
Language: English
Online access: Full text
Description
Abstract: Deep reinforcement learning (DRL) is a powerful approach that combines reinforcement learning (RL) and deep learning to address complex decision-making problems in high-dimensional environments. Although DRL has been remarkably successful, its low sample efficiency necessitates extensive training times and large amounts of data to learn optimal policies. These limitations are more pronounced in the context of multi-agent reinforcement learning (MARL). To address them, various studies have been conducted to improve DRL. In this study, we propose an approach that combines a masked reconstruction task with QMIX (M-QMIX). By introducing masked reconstruction as an auxiliary task, we aim to improve the sample efficiency that fundamentally limits RL in multi-agent systems. Experiments were conducted on the StarCraft II micromanagement benchmark to validate the effectiveness of the proposed method, using 11 scenarios comprising five easy, three hard, and three very hard scenarios. We deliberately limited the number of training time steps in each scenario to demonstrate the improved sample efficiency. The proposed method outperforms QMIX in eight of the 11 scenarios. These results provide strong evidence that the proposed method is more sample-efficient than QMIX, demonstrating that it effectively addresses the limitations of DRL in multi-agent systems.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0291545
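
The abstract describes adding a masked reconstruction auxiliary task to QMIX (M-QMIX). As a rough illustration of that idea only, the Python sketch below shows how a masked-reconstruction loss can be computed alongside a TD objective; the module and parameter names (ObsEncoder, ReconHead, mask_ratio, recon_coef) are illustrative assumptions, not the authors' implementation or the exact M-QMIX architecture.

    # Hypothetical sketch, not the authors' code: a masked-reconstruction
    # auxiliary loss combined with a QMIX-style TD loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ObsEncoder(nn.Module):
        """Per-agent observation encoder (assumed architecture)."""
        def __init__(self, obs_dim, hidden_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )

        def forward(self, obs):
            return self.net(obs)

    class ReconHead(nn.Module):
        """Predicts the original observation from the encoding."""
        def __init__(self, hidden_dim, obs_dim):
            super().__init__()
            self.net = nn.Linear(hidden_dim, obs_dim)

        def forward(self, z):
            return self.net(z)

    def masked_reconstruction_loss(encoder, recon_head, obs, mask_ratio=0.3):
        """Zero out a random fraction of observation features, encode the
        masked observation, and reconstruct the unmasked original."""
        mask = (torch.rand_like(obs) > mask_ratio).float()
        z = encoder(obs * mask)
        recon = recon_head(z)
        return F.mse_loss(recon, obs)

    def total_loss(td_loss, encoder, recon_head, obs_batch, recon_coef=0.1):
        """Combined objective: TD loss plus weighted auxiliary loss."""
        aux = masked_reconstruction_loss(encoder, recon_head, obs_batch)
        return td_loss + recon_coef * aux

In a full QMIX training loop, total_loss would stand in for the plain TD loss, so that the agents' observation encoders are updated by both the value-learning signal and the self-supervised reconstruction signal; the weighting coefficient and masking ratio would need to be tuned per environment.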