A Contrastive-Enhanced Ensemble Framework for Efficient Multi-Agent Reinforcement Learning
Published in: Expert Systems with Applications, 2024-07, Vol. 245, Article 123158
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Multi-agent reinforcement learning is promising for real-world applications because it encourages agents to perceive and interact with their environment autonomously. However, sample efficiency remains a concern that prevents multi-agent reinforcement learning from being applied in practice. A well-performing agent typically needs abundant interaction data for training, yet collecting such data in a trial-and-error manner is usually prohibitively expensive, or even infeasible, in real-world tasks. In this paper, we propose a data-efficient framework, the Contrastive-Enhanced Ensemble framework for Multi-Agent Reinforcement Learning (C2E-MARL), which aims to train better-performing agents in a multi-agent system with less interaction data. Specifically, the proposed framework deploys an ensemble of centralized critic networks for action-value estimation, i.e., it combines the outputs of multiple critic networks to estimate the action value. This exploits the data from multiple perspectives to reduce estimation error, which supports efficient policy updates. Moreover, contrastive learning, a prevalent self-supervised technique, is employed to enhance the learning efficiency of the submodels in C2E-MARL by augmenting the interaction data. Extensive experiments against state-of-the-art methods on three multi-agent benchmark scenarios demonstrate the superiority of C2E-MARL in both efficiency and performance.
Highlights:
• Propose C2E-MARL to improve sample efficiency for multi-agent reinforcement learning.
• Learn from data generated by contrastive learning to reduce the demand for samples.
• Ensemble Q-networks to provide better-generalized Q-estimates for efficient training.
• Achieve superior sample efficiency and performance in multi-agent scenarios.
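The abstract describes combining the outputs of several centralized critic networks into a single action-value estimate. The sketch below is one plausible reading of that idea, not the authors' code: the class name `CriticEnsemble`, the layer sizes, the number of critics, and the mean combination rule are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): an ensemble
# of centralized critics whose outputs are averaged to reduce Q-estimation error.
import torch
import torch.nn as nn

class CriticEnsemble(nn.Module):
    def __init__(self, state_dim: int, joint_action_dim: int,
                 n_critics: int = 3, hidden: int = 64):
        super().__init__()
        # Each member critic maps (global state, joint action) -> scalar Q-value.
        self.critics = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + joint_action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_critics)
        )

    def forward(self, state: torch.Tensor, joint_action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, joint_action], dim=-1)
        qs = torch.stack([c(x) for c in self.critics], dim=0)  # (n_critics, batch, 1)
        return qs.mean(dim=0)  # combine member estimates into one Q-value

ensemble = CriticEnsemble(state_dim=8, joint_action_dim=4)
q = ensemble(torch.randn(32, 8), torch.randn(32, 4))  # -> shape (32, 1)
```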
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2024.123158
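The abstract also mentions contrastive learning to augment the interaction data for the submodels. Below is a minimal, hypothetical sketch of an InfoNCE-style contrastive loss on two augmented views of the same observations; the noise-based `augment` function, the linear encoder, and the temperature are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's exact method): a contrastive,
# InfoNCE-style loss on two augmented views of the same observation batch.
import torch
import torch.nn.functional as F

def augment(obs: torch.Tensor) -> torch.Tensor:
    # Illustrative augmentation: additive Gaussian noise on observations.
    return obs + 0.1 * torch.randn_like(obs)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature   # pairwise similarities between views
    labels = torch.arange(z1.size(0))    # matching pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(8, 16)         # stand-in for a submodel's encoder
obs = torch.randn(32, 8)
loss = info_nce(encoder(augment(obs)), encoder(augment(obs)))
loss.backward()                           # gradients flow into the encoder
```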