Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games
Format: Article
Language: English
Abstract: Classical multi-agent reinforcement learning (MARL) assumes risk neutrality and complete objectivity for agents. However, in settings where agents need to consider or model human economic or social preferences, a notion of risk must be incorporated into the RL optimization problem. This becomes even more important in MARL, where other human or non-human agents are involved, possibly with their own risk-sensitive policies. In this work, we consider risk-sensitive, non-cooperative MARL with cumulative prospect theory (CPT), a non-convex risk measure that generalizes coherent measures of risk. CPT can explain loss aversion in humans and their tendency to overweight small probabilities and underweight large ones. We propose a distributed sampling-based actor-critic (AC) algorithm with CPT risk for network aggregative Markov games (NAMGs), which we call Distributed Nested CPT-AC. Under a set of assumptions, we prove that the algorithm converges to a subjective notion of Markov perfect Nash equilibrium in NAMGs. The experimental results show that the subjective CPT policies obtained by our algorithm can differ from the risk-neutral ones, and that agents with higher loss aversion are more inclined to socially isolate themselves in an NAMG.
DOI: 10.48550/arxiv.2402.05906
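For readers unfamiliar with CPT, the following is a minimal sketch of the CPT functional the abstract refers to, in the standard Tversky-Kahneman form: an S-shaped value function with loss aversion and an inverse-S probability weighting that overweights small probabilities and underweights large ones, estimated from return samples as a sampling-based critic might do. The parameter values and the estimator below are illustrative assumptions, not the paper's Distributed Nested CPT-AC implementation.

```python
# Sketch of the cumulative prospect theory (CPT) functional used in
# risk-sensitive RL. Parameter names and defaults (alpha, beta, lam, gamma)
# follow the standard Tversky-Kahneman (1992) form and are illustrative only.
import numpy as np

def utility(x, alpha=0.88, beta=0.88, lam=2.25):
    """S-shaped CPT value function: concave for gains, convex and steeper
    for losses (loss aversion, lam > 1), relative to a reference point of 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)

def weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small probabilities and
    underweights large ones."""
    p = np.asarray(p, dtype=float)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1.0 / gamma)

def cpt_value(samples, gamma_gain=0.61, gamma_loss=0.69, **util_kw):
    """Estimate the CPT value of a random return from i.i.d. samples using
    rank-dependent (cumulative) probability weighting."""
    u = np.sort(utility(samples, **util_kw))    # u_(1) <= ... <= u_(n)
    n = len(u)
    tail = np.arange(n, 0, -1) / n              # empirical P(U >= u_(i))
    head = np.arange(1, n + 1) / n              # empirical P(U <= u_(i))
    gains = u > 0
    losses = ~gains
    # Decision weights: differences of weighted cumulative probabilities,
    # taken from the right tail for gains and from the left tail for losses.
    w_gain = weight(tail, gamma_gain) - weight(np.append(tail[1:], 0.0), gamma_gain)
    w_loss = weight(head, gamma_loss) - weight(np.append(0.0, head[:-1]), gamma_loss)
    return float(np.sum(u[gains] * w_gain[gains]) + np.sum(u[losses] * w_loss[losses]))

# Example: both prospects have expected value 0, so a risk-neutral agent is
# indifferent, but the CPT value of the risky one is negative (loss aversion).
rng = np.random.default_rng(0)
risky = rng.choice([-10.0, 10.0], size=10_000)  # fair +/-10 coin flip
print(cpt_value(np.zeros(10_000)), cpt_value(risky))
```

In a risk-sensitive actor-critic scheme, a sample-based estimate of this kind would stand in for the usual expected return in the critic, which is why the resulting "subjective" policies can deviate from risk-neutral ones.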