Configurable Mirror Descent: Towards a Unification of Decision Making
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Decision-making problems, categorized as single-agent (e.g., Atari),
cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em
poker), and mixed cooperative and competitive (e.g., football), are ubiquitous in
the real world. Various methods have been proposed to address specific
decision-making problems. Despite their successes in specific categories, these
methods typically evolve independently and cannot generalize to other
categories. Therefore, a fundamental question for decision-making is: Can
we develop a single algorithm to tackle ALL categories of
decision-making problems? There are several main challenges in addressing this
question: i) different decision-making categories involve different numbers of
agents and different relationships between agents, ii) different categories
have different solution concepts and evaluation measures, and iii) there is no
comprehensive benchmark covering all the categories. This work presents a
preliminary attempt to address the question with three main contributions. i)
We propose generalized mirror descent (GMD), a generalization of mirror
descent (MD) variants, which considers multiple historical policies and works
with a broader class of Bregman divergences. ii) We propose configurable mirror
descent (CMD), in which a meta-controller dynamically adjusts the
hyper-parameters in GMD conditional on the evaluation measures. iii) We
construct GameBench, comprising 15 academic-friendly games across
different decision-making categories. Extensive experiments demonstrate that
CMD achieves empirically competitive or better outcomes compared to baselines
while providing the capability of exploring diverse dimensions of decision
making. |
---|---|
DOI: | 10.48550/arxiv.2405.11746 |