Probabilistic majorization of partially observable Markov decision processes

Bibliographic Details
Main Author: Lefebvre, Tom
Format: Conference Proceedings
Language: English
Description
Summary: Markov Decision Processes (MDPs) are wielded by the Reinforcement Learning and control community as a framework to bestow artificial agents with the ability to make autonomous decisions. Control as Inference (CaI) is a tangent research direction that aims to recast optimal decision making as an instance of probabilistic inference, with the dual hope of inciting exploration and simplifying calculations. Active Inference (AIF) is a sibling theory conforming to similar directives. Notably, AIF also entertains a procedure for perception and proprioception, which is currently lacking from the CaI theory. Recent work has established an explicit connection between CaI and MDPs. In particular, it was shown that the CaI policy can be iterated recursively, ultimately retrieving the associated MDP policy. In this work, such results are generalized to Partially Observable Markov Decision Processes, which, apart from a procedure to make optimal decisions, now also entertain a procedure for model-based perception and proprioception. By extending the theory of CaI to the context of optimal decision making under partial observability, we aim to further our understanding of, and illuminate the relationship between, these different frameworks.
ISSN: 1865-0937, 1865-0929