Classes of Dilemma Problems and Their Multi-Agent Reinforcement Learning Method
Published in: | IEEE Access 2024, Vol. 12, p. 107353-107367 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Multi-agent systems appear in a wide variety of fields, and there have been several studies on multi-agent reinforcement learning. Dilemma problems are a typical class of multi-agent problems. In these problems, the best policy for each individual agent differs from the best policy for the group of agents, which makes them difficult to solve. The purpose of this paper is to discuss multi-agent reinforcement learning methods for dilemma problems. First, we propose definitions of classes of dilemma problems in a general framework of reinforcement learning. We also discuss the relationship between our definitions and existing definitions and show the generality of the proposed definitions. Second, we propose a reinforcement learning method that can acquire cooperative policies for dilemma problems. In this method, each agent assumes the policies that the other agents would take and learns by maximizing its own return, expecting them to follow the assumed policies. We apply the proposed method to the n-person iterative Prisoner's dilemma (NIPD) and the Tragedy of the Commons, which are typical examples of dilemma problems, and investigate its performance. The experiments show that the proposed method learns cooperative policies more reliably than existing methods and outperforms them. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3438937 |
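
The summary characterizes a dilemma problem as one where each agent's individually best policy differs from the policy that is best for the group. The following is a minimal sketch of that property in a one-shot n-person game, not the paper's definitions or its learning method. The payoff rule (a public-goods form in which each cooperator pays a cost of 1 and a multiplied pot is shared equally), the constants `N_AGENTS` and `MULTIPLIER`, and all function names are assumptions chosen for illustration.

```python
from itertools import product

N_AGENTS = 4       # assumed group size (illustrative)
MULTIPLIER = 2.0   # assumed; a dilemma arises when 1 < MULTIPLIER < N_AGENTS


def payoffs(actions):
    """Per-agent payoffs for a joint action (True = cooperate, False = defect).

    Each cooperator pays a cost of 1; the contributions are multiplied
    and the resulting pot is shared equally among all agents.
    """
    share = MULTIPLIER * sum(actions) / len(actions)
    return [share - (1.0 if a else 0.0) for a in actions]


def defection_dominates():
    """Check that switching from cooperate to defect always raises the
    switching agent's own payoff, whatever the other agents do."""
    for actions in product([True, False], repeat=N_AGENTS):
        for i in range(N_AGENTS):
            if actions[i]:
                deviated = actions[:i] + (False,) + actions[i + 1:]
                if payoffs(deviated)[i] <= payoffs(actions)[i]:
                    return False
    return True


def best_group_profile():
    """Joint action that maximizes the sum of all agents' payoffs."""
    return max(product([True, False], repeat=N_AGENTS),
               key=lambda a: sum(payoffs(a)))


if __name__ == "__main__":
    print("defection dominates:", defection_dominates())
    print("best joint action for the group:", best_group_profile())
    print("payoffs if all cooperate:", payoffs((True,) * N_AGENTS))
    print("payoffs if all defect:", payoffs((False,) * N_AGENTS))
```

Under these assumed payoffs, defecting is always better for the individual, yet universal cooperation gives every agent more than universal defection. This is the gap that a method for acquiring cooperative policies has to close.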