Robust cooperative multi-agent reinforcement learning via multi-view message certification

Many multi-agent scenarios require message sharing among agents to promote coordination, hastening the robustness of multi-agent communication when policies are deployed in a message perturbation environment. Major relevant studies tackle this issue under specific assumptions, like a limited number...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Science China. Information sciences 2024-04, Vol.67 (4), p.142102, Article 142102
Hauptverfasser:	Yuan, Lei, Jiang, Tao, Li, Lihe, Chen, Feng, Zhang, Zongzhang, Yu, Yang
Format:	Artikel
Sprache:	eng
Schlagworte:	Certification Communication Computer Science Decision making Information Systems and Communication Service Lower bounds Messages Multiagent systems Perturbation Representations Research Paper Robustness
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Many multi-agent scenarios require message sharing among agents to promote coordination, hastening the robustness of multi-agent communication when policies are deployed in a message perturbation environment. Major relevant studies tackle this issue under specific assumptions, like a limited number of message channels would sustain perturbations, limiting the efficiency in complex scenarios. In this paper, we take a further step in addressing this issue by learning a robust cooperative multi-agent reinforcement learning via multi-view message certification, dubbed CroMAC. Agents trained under CroMAC can obtain guaranteed lower bounds on state-action values to identify and choose the optimal action under a worst-case deviation when the received messages are perturbed. Concretely, we first model multi-agent communication as a multi-view problem, where every message stands for a view of the state. Then we extract a certificated joint message representation by a multi-view variational autoencoder (MVAE) that uses a product-of-experts inference network. For the optimization phase, we do perturbations in the latent space of the state for a certificate guarantee. Then the learned joint message representation is used to approximate the certificated state representation during training. Extensive experiments in several cooperative multi-agent benchmarks validate the effectiveness of the proposed CroMAC.
ISSN:	1674-733X 1869-1919
DOI:	10.1007/s11432-023-3853-y