Equivariant Networks for Zero-Shot Coordination
Format: Article
Language: English
Abstract: Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. These cases commonly arise from partial observability, e.g. waving your right hand vs. your left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that effectively leverages environmental symmetry to improve zero-shot coordination, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" on generic, pre-trained policies, and thus may be applied at test time in conjunction with any self-play algorithm. We provide theoretical guarantees for our method and evaluate it on the AI benchmark task of Hanabi, where we show that it outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show that our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.
DOI: 10.48550/arxiv.2210.12124
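
The abstract's central mechanism, an equivariant policy that doubles as a test-time "coordination-improvement operator" for pre-trained policies, can be illustrated by group-averaging (symmetrization). The sketch below is a minimal illustration, not the authors' implementation: the toy symmetry group, the permutation encoding, the `symmetrize` function, and the stand-in policy are all hypothetical, assuming only a finite group that acts by permutations on observations and actions (as colour relabelings do in Hanabi).

```python
import numpy as np

# Hypothetical finite symmetry group G acting by permutations on both the
# observation space and the action space. In a colour-symmetric game such
# as Hanabi, each g would permute colour indices consistently in
# observations and in colour-hint actions. These permutations are
# illustrative placeholders, not the paper's actual Hanabi encoding.
OBS_PERMS = [np.array([0, 1, 2, 3]),   # identity
             np.array([1, 0, 3, 2])]   # swap two symmetric features
ACT_PERMS = [np.array([0, 1, 2]),      # identity
             np.array([1, 0, 2])]      # swap the two symmetric actions


def symmetrize(policy, obs):
    """Group-average a pre-trained policy so the result is G-equivariant.

    pi_bar(. | obs) = (1/|G|) * sum_g  g^{-1} . pi(g . obs):
    transform the observation by g, query the base policy, map the action
    distribution back through g^{-1}, and average over the group.
    """
    probs = np.zeros(len(ACT_PERMS[0]))
    for g_obs, g_act in zip(OBS_PERMS, ACT_PERMS):
        transformed = obs[g_obs]        # g . obs
        p = policy(transformed)        # pi(. | g . obs)
        inv = np.argsort(g_act)        # inverse permutation on actions
        probs += p[inv]                # g^{-1} . pi(. | g . obs)
    return probs / len(OBS_PERMS)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))        # stand-in "pre-trained" policy

    def policy(obs):
        logits = W @ obs
        e = np.exp(logits - logits.max())
        return e / e.sum()

    obs = rng.normal(size=4)
    base = symmetrize(policy, obs)
    # Equivariance check: acting with g on the input permutes the output.
    g_obs, g_act = OBS_PERMS[1], ACT_PERMS[1]
    assert np.allclose(symmetrize(policy, obs[g_obs]), base[g_act])
    print(base)
```

Averaging over the group makes the output distribution equivariant by construction, which is what allows such an operator to be applied after training to any self-play policy; the final assert verifies the equivariance property numerically on the toy group.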