Learning to Communicate Using Counterfactual Reasoning
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Learning to communicate in order to share state information is an active problem in the area of multi-agent reinforcement learning (MARL). The credit assignment problem, the non-stationarity of the communication environment, and the creation of influenceable agents are major challenges within this research field that need to be overcome in order to learn a valid communication protocol. This paper introduces the novel multi-agent counterfactual communication learning (MACC) method, which adapts counterfactual reasoning to overcome the credit assignment problem for communicating agents. Second, the non-stationarity of the communication environment while learning the communication Q-function is overcome by constructing the communication Q-function from the action policies of the other agents and the Q-function of the action environment. Additionally, a social loss function is introduced to create influenceable agents, which is required to learn a valid communication protocol. Our experiments show that MACC outperforms the state-of-the-art baselines in four different scenarios in the Particle environment.
DOI: 10.48550/arxiv.2006.07200
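The abstract's central idea, counterfactual credit assignment for communication actions, can be illustrated with a minimal sketch. The code below is not the paper's implementation; it only shows a COMA-style counterfactual baseline applied to a discrete message choice, under the assumption that a communication Q-function over messages and the agent's own message policy are available. All names (counterfactual_comm_advantage, q_comm, message_policy) are hypothetical.

```python
# Hedged sketch: counterfactual advantage for a sent message, assuming a
# discrete message space. The baseline marginalises the communication
# Q-values over all messages the agent could have sent, weighted by its
# own message policy, so the advantage isolates the contribution of the
# message that was actually sent.
import numpy as np

def counterfactual_comm_advantage(q_comm, message_policy, sent_message):
    """Return the counterfactual advantage of one communication action.

    q_comm         : (n_messages,) estimated communication Q-values for
                     the current state, one entry per possible message
    message_policy : (n_messages,) probabilities the agent assigns to
                     each message in the current state
    sent_message   : index of the message actually sent
    """
    baseline = float(np.dot(message_policy, q_comm))  # expected value over messages
    return float(q_comm[sent_message]) - baseline     # credit for the chosen message

# Toy usage: three possible messages, message 1 was sent.
q_comm = np.array([0.2, 1.0, 0.4])
message_policy = np.array([0.3, 0.4, 0.3])
adv = counterfactual_comm_advantage(q_comm, message_policy, sent_message=1)
print(f"Counterfactual advantage of message 1: {adv:.3f}")
```

In this toy example the baseline is 0.58, so sending message 1 (value 1.0) receives a positive advantage of 0.42; a message with value below the baseline would be penalised, which is how a counterfactual baseline assigns credit to individual messages without changing the expected gradient.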