Time is Budget: A Heuristic for Reducing the Risk of Ruin in Multi-armed Gambler Bandits
Main authors: | , , |
---|---|
Format: | Book chapter |
Language: | English |
Online access: | Full text |
Abstract: | In this paper we consider Multi-Armed Gambler Bandits (MAGB), a stochastic process in which an agent performs successive actions and either loses 1 unit of its budget after observing a failure or gains 1 unit after a success. This constitutes a survival problem in which the risk of ruin must be taken into account: the agent starts with an initial budget that evolves over time with the rewards received, and that budget must remain positive throughout the process. The contribution of this paper is an original heuristic that aims to improve the probability of survival in a MAGB by replacing time with budget as the factor that regulates exploration in UCB-like methods. The proposed strategy is then compared experimentally with standard algorithms, with good results. |
---|---|
ISSN: | 0302-9743 (print), 1611-3349 (electronic) |
DOI: | 10.1007/978-3-031-21441-7_29 |
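The abstract describes the MAGB process (each action wins or loses 1 budget unit, and ruin occurs when the budget hits zero) and the idea of letting the budget, rather than elapsed time, regulate exploration in a UCB-like index. The following is a minimal Python sketch under stated assumptions: Bernoulli arms with ±1 payoffs, the standard UCB1 bonus sqrt(2 ln(clock) / n_i), and a *hypothetical* variant that simply puts the current budget where UCB1 uses the time step t. The names `play_magb`, `ucb_index` and `survival_rate`, and the arm probabilities, are illustrative only; the paper's actual heuristic may define its budget-based term differently.

```python
import math
import random


def ucb_index(successes, pulls, clock):
    """UCB1-style index: empirical success rate plus an exploration bonus
    that grows with log(clock) and shrinks as the arm is pulled more."""
    mean = successes / pulls
    return mean + math.sqrt(2.0 * math.log(clock) / pulls)


def play_magb(probs, initial_budget, horizon, use_budget_clock, seed=0):
    """Simulate one Multi-Armed Gambler Bandit episode.

    Pulling arm i succeeds with probability probs[i]; a success adds 1 unit
    to the budget and a failure removes 1 unit. The episode stops early
    (ruin) as soon as the budget reaches 0.

    use_budget_clock=False: the bonus uses the time step t (classic UCB1).
    use_budget_clock=True: the bonus uses the current budget instead. This is
    a hypothetical reading of "budget replaces time", not the paper's rule.

    Returns (survived, final_budget, steps_played).
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    successes = [0] * k
    budget = initial_budget
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # While alive the budget is >= 1, so log(clock) is always defined.
            clock = budget if use_budget_clock else t
            arm = max(range(k), key=lambda i: ucb_index(successes[i], pulls[i], clock))
        pulls[arm] += 1
        if rng.random() < probs[arm]:
            successes[arm] += 1
            budget += 1  # success: earn 1 unit
        else:
            budget -= 1  # failure: lose 1 unit
        if budget == 0:
            return False, budget, t  # ruin
    return True, budget, horizon


def survival_rate(probs, initial_budget, horizon, use_budget_clock, runs=500):
    """Monte Carlo estimate of the probability of avoiding ruin."""
    alive = sum(
        play_magb(probs, initial_budget, horizon, use_budget_clock, seed=r)[0]
        for r in range(runs)
    )
    return alive / runs


if __name__ == "__main__":
    arms = [0.45, 0.55, 0.6]  # illustrative success probabilities, not from the paper
    for flag in (False, True):
        label = "budget clock" if flag else "time clock"
        rate = survival_rate(arms, initial_budget=5, horizon=300, use_budget_clock=flag)
        print(f"{label}: estimated survival {rate:.3f}")
```

With the budget clock, a poorer agent explores less: at a budget of 1 the bonus vanishes (ln 1 = 0) and the agent plays its empirically best arm, which is one intuitive way a budget-driven schedule could lower the risk of ruin. The script only estimates survival rates for both variants; it makes no claim about which one wins.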