Beyond Nash Equilibrium: Achieving Bayesian Perfect Equilibrium with Belief Update Fictitious Play

In the domain of machine learning and game theory, the quest for Nash Equilibrium (NE) in extensive-form games with incomplete information is challenging yet crucial for enhancing AI's decision-making support under varied scenarios. Traditional Counterfactual Regret Minimization (CFR) technique...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ju, Qi, Fang, Zhemei, Luo, Yunfeng
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In the domain of machine learning and game theory, the quest for Nash Equilibrium (NE) in extensive-form games with incomplete information is challenging yet crucial for enhancing AI's decision-making support under varied scenarios. Traditional Counterfactual Regret Minimization (CFR) techniques excel in navigating towards NE, focusing on scenarios where opponents deploy optimal strategies. However, the essence of machine learning in strategic game play extends beyond reacting to optimal moves; it encompasses aiding human decision-making in all circumstances. This includes not only crafting responses to optimal strategies but also recovering from suboptimal decisions and capitalizing on opponents' errors. Herein lies the significance of transitioning from NE to Bayesian Perfect Equilibrium (BPE), which accounts for every possible condition, including the irrationality of opponents. To bridge this gap, we propose Belief Update Fictitious Play (BUFP), which innovatively blends fictitious play with belief to target BPE, a more comprehensive equilibrium concept than NE. Specifically, through adjusting iteration stepsizes, BUFP allows for strategic convergence to both NE and BPE. For instance, in our experiments, BUFP(EF) leverages the stepsize of Extensive Form Fictitious Play (EFFP) to achieve BPE, outperforming traditional CFR by securing a 48.53\% increase in benefits in scenarios characterized by dominated strategies.
DOI:10.48550/arxiv.2409.02706