LDR: Learning Discrete Representation to Improve Noise Robustness in Multiagent Tasks


Bibliographic details
Published in: IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025-01, Vol. 55 (1), p. 513-525
Main authors: Fu, Yuqian; Zhu, Yuanheng; Chai, Jiajun; Zhao, Dongbin
Format: Article
Language: English
Description
Summary: In real-world applications of multiagent reinforcement learning (MARL), agents often face inaccurate environments due to unavoidable noise, presenting a challenge to their robustness. However, limited prior work focuses on addressing such noise in observations, hindering the deployment of multiagent systems. In this article, we propose a method named learning discrete representation (LDR) to improve robustness against noise in multiagent tasks. Specifically, LDR employs a quantization module with a segment mechanism to encode observations and teammate actions, generating discrete representations from learnable codebooks. These representations are subsequently processed via a combiner for decision-making. Through discretization, LDR is able to mitigate the impact of minor noise on decision-making. To enhance the learning efficiency, we incorporate a set-input block that treats the joint observations of agents as a permutation-invariant set, thereby reducing the complexity of the joint observation space. Additionally, we theoretically analyze the expressiveness of discrete representation and the boundedness of discrete distortion. We evaluate the proposed method on StarCraft II micromanagement tasks and multiagent MuJoCo with noisy observations. Empirical results demonstrate that LDR outperforms existing algorithms, improving robustness in noisy cooperative MARL tasks while maintaining superior performance in clean observations.
ISSN: 2168-2216, 2168-2232
DOI: 10.1109/TSMC.2024.3487535
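
Illustrative note: the summary states that discretization mitigates the impact of minor observation noise. The sketch below is not the authors' implementation; it is a minimal nearest-codeword quantizer, assuming a small fixed codebook (LDR learns its codebooks), equal-length segments, and a hypothetical function name `quantize`. It only shows why a slightly perturbed vector can map to the same discrete codes as the clean one.

```python
import random


def quantize(vec, codebook, num_segments):
    """Split vec into equal-length segments and return, for each segment,
    the index of the nearest codeword (squared Euclidean distance)."""
    seg_len = len(vec) // num_segments
    codes = []
    for s in range(num_segments):
        seg = vec[s * seg_len:(s + 1) * seg_len]
        dists = [sum((a - b) ** 2 for a, b in zip(seg, cw)) for cw in codebook]
        codes.append(dists.index(min(dists)))
    return codes


# Assumption: a fixed 2-D codebook chosen by hand for illustration only.
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

obs = [0.1, 0.05, 0.9, 0.95]  # one observation, two segments of length 2
rng = random.Random(0)
noisy = [x + rng.uniform(-0.1, 0.1) for x in obs]  # minor bounded noise

clean_codes = quantize(obs, CODEBOOK, 2)
noisy_codes = quantize(noisy, CODEBOOK, 2)
print(clean_codes, noisy_codes)
```

With this codebook, any perturbation that keeps every component on the same side of 0.5 leaves the codes unchanged, whereas a larger shift (for example a first component of 0.6) selects a different codeword.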