Time is Budget: A Heuristic for Reducing the Risk of Ruin in Multi-armed Gambler Bandits

Bibliographic details
Main authors: Perotto, Filipo Studzinski; Pucel, Xavier; Farges, Jean-Loup
Format: Book chapter
Language: English
Online access: Full text
Description
Summary: This paper considers Multi-Armed Gambler Bandits (MAGB), a stochastic process in which an agent performs successive actions and, after each one, either loses 1 unit of its budget on a failure or earns 1 unit on a success. This constitutes a survival problem in which the risk of ruin must be taken into account: the agent’s budget starts at an initial value, evolves over time with the rewards received, and must remain positive throughout the process. The paper's contribution is an original heuristic that aims to improve the probability of survival in a MAGB by replacing time with the budget as the factor that regulates exploration in UCB-like methods. The proposed strategy is then compared experimentally with standard algorithms and shows good results.
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-031-21441-7_29
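
The idea in the summary, letting the budget rather than the elapsed time regulate exploration, can be illustrated with a minimal sketch. The record does not give the paper's exact formula, so the index below is an assumption: it takes the classic UCB1 bonus sqrt(2 ln t / n) and substitutes the current budget b for the round number t. The function names (`ucb_index`, `run_magb`, `survival_rate`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_index(successes, pulls, regulator):
    """UCB1-style index for one arm.

    `regulator` is the round number t (classic UCB1) or the current
    budget b (assumed budget-regulated variant, not the paper's formula).
    """
    if pulls == 0:
        return math.inf  # force one initial pull of every arm
    bonus = math.sqrt(2.0 * math.log(max(regulator, 1)) / pulls)
    return successes / pulls + bonus


def run_magb(probs, initial_budget, horizon, use_budget, rng):
    """Simulate one MAGB episode; return (survived, final_budget)."""
    k = len(probs)
    successes = [0] * k
    pulls = [0] * k
    budget = initial_budget
    for t in range(1, horizon + 1):
        if budget <= 0:
            return False, 0  # ruin: the process stops
        regulator = budget if use_budget else t
        arm = max(
            range(k),
            key=lambda i: ucb_index(successes[i], pulls[i], regulator),
        )
        win = rng.random() < probs[arm]
        pulls[arm] += 1
        successes[arm] += int(win)
        budget += 1 if win else -1  # earn 1 on success, lose 1 on failure
    return budget > 0, budget


def survival_rate(probs, initial_budget, horizon, use_budget,
                  episodes=200, seed=0):
    """Fraction of simulated episodes that end without ruin."""
    rng = random.Random(seed)
    survived = sum(
        run_magb(probs, initial_budget, horizon, use_budget, rng)[0]
        for _ in range(episodes)
    )
    return survived / episodes


if __name__ == "__main__":
    arms = [0.45, 0.50, 0.55]  # hypothetical success probabilities
    print("time-regulated  :", survival_rate(arms, 10, 500, use_budget=False))
    print("budget-regulated:", survival_rate(arms, 10, 500, use_budget=True))
```

In this sketch the exploration bonus shrinks as the budget approaches 1, so an agent close to ruin leans toward the arm with the best observed success rate. Whether this matches the behaviour of the heuristic in the paper would need to be checked against the full text.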