Generalized Beliefs for Cooperative AI
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Self-play is a common paradigm for constructing solutions in Markov games
that can yield optimal policies in collaborative settings. However, these
policies often adopt highly specialized conventions that make playing with a
novel partner difficult. To address this, recent approaches rely on encoding
symmetry and convention-awareness into policy training, but these require
strong environmental assumptions and can complicate policy training. We
therefore propose moving the learning of conventions to the belief space.
Specifically, we propose a belief learning model that can maintain beliefs over
rollouts of policies not seen at training time, and can thus decode and adapt
to novel conventions at test time. We show how to leverage this model for both
search and training of a best response over various pools of policies to
greatly improve ad-hoc teamplay. We also show how our setup promotes
explainability and interpretability of nuanced agent conventions. |
---|---|
DOI: | 10.48550/arxiv.2206.12765 |