Controlling a Markov Decision Process with an Abrupt Change in the Transition Kernel
Published in: | arXiv.org 2022-10 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | We consider the control of a Markov decision process (MDP) that undergoes an abrupt change in its transition kernel (mode). We formulate the problem of minimizing regret under control-switching based on mode change detection, compared to a mode-observing controller, as an optimal stopping problem. Using a sequence of approximations, we reduce it to a quickest change detection (QCD) problem with Markovian data, for which we characterize a state-dependent threshold-type optimal change detection policy. Numerical experiments illustrate various properties of our control-switching policy. |
---|---|
ISSN: | 2331-8422 |
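The abstract reduces control-switching to a quickest change detection (QCD) problem with Markovian data and a state-dependent threshold-type stopping rule. The sketch below illustrates what such a rule can look like. It is only an illustration under assumptions that do not come from the record. It assumes a Bayesian (Shiryaev-style) setting with a geometric prior of parameter `rho` on the change time and known pre- and post-change kernels `P0` and `P1`. It also uses hand-picked per-state `thresholds`, which are hypothetical and are not the optimal thresholds characterized in the paper. The function names `posterior_update`, `detect_change` and `simulate` are likewise hypothetical, and the regret and control aspects are omitted.

```python
import random


def posterior_update(p_prev, x_prev, x_curr, P0, P1, rho):
    """One step of a Shiryaev-style posterior recursion (assumed setting):
    p_t = P(change has occurred by t | x_0..x_t), geometric prior rho.
    P0/P1 are the pre-/post-change transition matrices (lists of rows)."""
    # Predict: the change may occur at this step with probability rho.
    p_pred = p_prev + (1.0 - p_prev) * rho
    # Correct: weight each hypothesis by the likelihood of the observed transition.
    l1 = p_pred * P1[x_prev][x_curr]
    l0 = (1.0 - p_pred) * P0[x_prev][x_curr]
    return l1 / (l1 + l0)


def detect_change(path, P0, P1, rho, thresholds):
    """Stop at the first t where the posterior reaches the threshold attached
    to the current state path[t] (a state-dependent threshold rule).
    Returns the stopping time, or None if the rule never stops on this path."""
    p = 0.0  # assume no change has occurred at time 0
    for t in range(1, len(path)):
        p = posterior_update(p, path[t - 1], path[t], P0, P1, rho)
        if p >= thresholds[path[t]]:
            return t
    return None


def simulate(P0, P1, change_time, n_steps, x0=0, seed=0):
    """Sample a path whose transitions use P0 before change_time and P1 from
    change_time onward (the transition into x_t uses P1 when t >= change_time)."""
    rng = random.Random(seed)
    path = [x0]
    for t in range(1, n_steps):
        P = P0 if t < change_time else P1
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


if __name__ == "__main__":
    # Illustrative two-state kernels: "sticky" before the change, "switching" after.
    P0 = [[0.9, 0.1], [0.1, 0.9]]
    P1 = [[0.2, 0.8], [0.8, 0.2]]
    path = simulate(P0, P1, change_time=50, n_steps=200, seed=1)
    tau = detect_change(path, P0, P1, rho=0.01, thresholds=[0.9, 0.95])
    print("true change at 50, detected at", tau)
```

Making the threshold depend on the current state lets the rule be more or less conservative depending on how informative the next transition is expected to be from that state. In the paper this dependence comes out of the optimal stopping analysis. Here it is simply supplied by hand.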