Generating and Adapting to Diverse Ad Hoc Partners in Hanabi

Detailed Description

Bibliographic Details
Published in:	IEEE Transactions on Games 2023-06, Vol.15 (2), p.228-241
Main authors:	Canaan, Rodrigo; Gao, Xianbo; Togelius, Julian; Nealen, Andy; Menzel, Stefan
Format:	Article
Language:	English
Subjects:	Adaptive systems; Artificial intelligence; Color; Computational and artificial intelligence - Evolutionary computation; Games; Hypotheses; Learning (artificial intelligence) - Naive Bayes methods; Players; Policies; Sociology; Statistics; Training

Description
Abstract:	Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage preestablished conventions to great effect. In this article, we focus on ad hoc settings with no previous coordination between partners. We introduce a "Bayesian Meta-Agent" that maintains a belief distribution over hypotheses of partner policies. The policies that serve as initial hypotheses are generated using MAP-Elites, to ensure behavioral diversity. We evaluate an "Adaptive" version of the agent, which selects a response policy based on the updated belief distribution, and a "Generalist" version, which selects a response based on the uniform prior. In short episodes of ten games with a consistent partner, the "Adaptive" version outperforms the "Generalist" when the training and evaluation populations are the same. This presents a first step toward an agent that can model its partner and adapt within a time frame that is compatible with human interaction.
ISSN:	2475-1502, 2475-1510
DOI:	10.1109/TG.2022.3169168
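
Illustrative sketch:	The abstract describes an agent that keeps a belief distribution over hypothesized partner policies and picks a response from it. The Python below is a minimal sketch of that general idea, not the authors' implementation. The function names (update_belief, best_response), the per-action likelihoods, and the payoff table are all hypothetical values made up for illustration. Multiplying likelihoods across observations assumes the partner's actions are conditionally independent given its policy, which is an assumption of this sketch.

```python
def update_belief(belief, likelihoods):
    """One Bayes step: posterior is proportional to prior times likelihood.

    belief[h]      -- current probability of partner hypothesis h
    likelihoods[h] -- probability of the observed partner action under h
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation impossible under every hypothesis: keep the old belief.
        return list(belief)
    return [u / total for u in unnormalized]


def best_response(belief, payoff):
    """Index of the response policy with the highest expected score.

    payoff[r][h] -- assumed score of response r when paired with hypothesis h
    """
    expected = [sum(b * s for b, s in zip(belief, row)) for row in payoff]
    return max(range(len(payoff)), key=lambda r: expected[r])


# Hypothetical numbers: three partner hypotheses, three candidate responses.
payoff = [
    [18.0, 10.0, 9.0],   # response 0: strong only with hypothesis 0
    [9.0, 19.0, 10.0],   # response 1: strong only with hypothesis 1
    [14.0, 14.0, 14.0],  # response 2: middling with every hypothesis
]

prior = [1.0 / 3.0] * 3  # uniform prior over hypotheses

# "Generalist"-style choice: respond to the uniform prior, never update.
generalist_choice = best_response(prior, payoff)

# "Adaptive"-style choice: update the belief on observed actions first.
# Each entry holds the made-up likelihood of one observed partner action
# under hypotheses 0, 1 and 2.
observations = [[0.1, 0.6, 0.3], [0.1, 0.6, 0.3]]
belief = prior
for likelihoods in observations:
    belief = update_belief(belief, likelihoods)
adaptive_choice = best_response(belief, payoff)

print("generalist picks response", generalist_choice)
print("adaptive picks response", adaptive_choice)
```

With these made-up numbers the uniform prior favors the middling response, while the updated belief concentrates on hypothesis 1 and favors the response tailored to it. This mirrors the Adaptive versus Generalist contrast in the abstract only in spirit.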