Reinforcement learning of non-Markov decision processes

Bibliographic Details
Published in: Artificial Intelligence 1995-02, Vol. 73 (1), p. 271-306
Authors: Whitehead, Steven D.; Lin, Long-Ji
Format: Article
Language: English
Online access: Full text
Description
Abstract: Techniques based on reinforcement learning (RL) have been used to build systems that learn to perform nontrivial sequential decision tasks. To date, most of this work has focused on learning tasks that can be described as Markov decision processes. While this formalism is useful for modeling a wide range of control problems, there are important tasks that are inherently non-Markov. We refer to these as hidden state tasks since they arise when information relevant to identifying the state of the environment is hidden (or missing) from the agent's immediate sensation. Two important types of control problems that resist Markov modeling are those in which (1) the system has a high degree of control over the information collected by its sensors (e.g., as in active vision), or (2) the system has a limited set of sensors that do not always provide adequate information about the current state of the environment. Existing RL algorithms perform unreliably on hidden state tasks. This article examines two general approaches to extending reinforcement learning to hidden state tasks. The Consistent Representation (CR) Method unifies recent approaches such as the Lion algorithm, the G-algorithm, and CS-QL. The method is useful for learning tasks that require the agent to control its sensory inputs. However, it assumes that, by appropriate control of perception, the external states can be identified at each point in time from the immediate sensory inputs. A second, more general set of algorithms in which the agent maintains internal state over time is also considered. These stored-state algorithms, though quite different in detail, share the common feature that each derives its internal representation by combining immediate sensory inputs with internal state which is maintained over time. The relative merits of these methods are considered and conditions for their useful application are discussed.
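
The stored-state idea described in the abstract, deriving an internal representation by combining immediate sensory input with internal state maintained over time, can be illustrated with a minimal sketch. The sketch below is not from the paper: the class name, the fixed-length observation window, and all hyperparameters are assumptions chosen only to show how a tabular Q-learner might key its values on memory plus the current observation rather than on the observation alone.

# Illustrative sketch (assumption, not the paper's algorithm): tabular
# Q-learning over an internal state formed by a short window of remembered
# observations combined with the immediate observation.
import random
from collections import defaultdict, deque

class StoredStateQLearner:
    def __init__(self, actions, history_len=2, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions                   # available actions
        self.memory = deque(maxlen=history_len)  # internal state: recent observations
        self.q = defaultdict(float)              # Q-values keyed by (internal state, action)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _state(self, obs):
        # Internal representation = remembered observations + immediate observation.
        return tuple(self.memory) + (obs,)

    def act(self, obs):
        # Epsilon-greedy action selection over the combined internal state.
        s = self._state(obs)
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning update, applied to the stored-state key.
        s = self._state(obs)
        self.memory.append(obs)                  # advance the internal state over time
        s_next = self._state(next_obs)
        best_next = max(self.q[(s_next, a)] for a in self.actions)
        self.q[(s, action)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, action)])

Because the Q-table is indexed by the observation window rather than the raw observation, two situations that look identical to the sensors can still map to different internal states if their recent histories differ, which is the property that lets such agents cope with hidden state.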
ISSN: 0004-3702, 1872-7921
DOI: 10.1016/0004-3702(94)00012-P