Reinforcement learning of non-Markov decision processes
Published in: Artificial Intelligence, 1995-02, Vol. 73 (1), pp. 271-306
Authors: Steven D. Whitehead, Long-Ji Lin
Format: Article
Language: English
Online access: Full text
Abstract: Techniques based on reinforcement learning (RL) have been used to build systems that learn to perform nontrivial sequential decision tasks. To date, most of this work has focused on learning tasks that can be described as Markov decision processes. While this formalism is useful for modeling a wide range of control problems, there are important tasks that are inherently non-Markov. We refer to these as hidden state tasks, since they arise when information relevant to identifying the state of the environment is hidden (or missing) from the agent's immediate sensation. Two important types of control problems that resist Markov modeling are those in which (1) the system has a high degree of control over the information collected by its sensors (e.g., as in active vision), or (2) the system has a limited set of sensors that do not always provide adequate information about the current state of the environment. Existing RL algorithms perform unreliably on hidden state tasks.

This article examines two general approaches to extending reinforcement learning to hidden state tasks. The Consistent Representation (CR) Method unifies recent approaches such as the Lion algorithm, the G-algorithm, and CS-QL. The method is useful for learning tasks that require the agent to control its sensory inputs. However, it assumes that, by appropriate control of perception, the external states can be identified at each point in time from the immediate sensory inputs. A second, more general set of algorithms, in which the agent maintains internal state over time, is also considered. These stored-state algorithms, though quite different in detail, share the common feature that each derives its internal representation by combining immediate sensory inputs with internal state that is maintained over time. The relative merits of these methods are considered, and conditions for their useful application are discussed.
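To make the abstract's distinction concrete, here is a minimal sketch, not taken from the article, of a hidden state task and of one simple stored-state learner: a T-maze-style environment in which a cue visible only at the start determines the rewarded side, and a tabular Q-learner whose internal representation is a sliding window of recent observations. The environment, the window mechanism, and all hyperparameters are illustrative assumptions rather than the article's own algorithms.

```python
# Illustrative sketch (not code from the paper): a tiny hidden state task plus
# a tabular Q-learner whose internal state is a sliding window of recent
# observations -- one simple form of combining immediate sensory inputs with
# internal state kept over time. All names and parameters are assumptions.
import random
from collections import defaultdict

CORRIDOR_LEN = 2                      # steps between the cue and the junction
ACTIONS = ["forward", "left", "right"]

class TMaze:
    """At position 0 the agent sees which side will be rewarded; in the
    corridor and at the junction that cue is hidden, so the immediate
    observation alone is not a Markov state."""

    def _obs(self):
        if self.pos == 0:
            return f"cue-{self.goal}"
        return "corridor" if self.pos < CORRIDOR_LEN else "junction"

    def reset(self):
        self.goal = random.choice(["left", "right"])
        self.pos = 0
        return self._obs()

    def step(self, action):
        if self.pos < CORRIDOR_LEN:               # still in the corridor
            if action == "forward":
                self.pos += 1
            return self._obs(), -0.05, False      # small cost per step
        # At the junction the episode ends; payoff depends on the hidden cue.
        reward = 1.0 if action == self.goal else -1.0
        return "terminal", reward, True

def run(window, episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning where the 'state' is the last `window` observations."""
    Q = defaultdict(float)
    env, returns = TMaze(), []
    for _ in range(episodes):
        obs = env.reset()
        memory = (obs,) * window                  # internal state kept over time
        done, total = False, 0.0
        while not done:
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(memory, a)])
            obs, reward, done = env.step(action)
            nxt = (memory + (obs,))[-window:]
            target = reward if done else reward + gamma * max(
                Q[(nxt, a)] for a in ACTIONS)
            Q[(memory, action)] += alpha * (target - Q[(memory, action)])
            memory, total = nxt, total + reward
        returns.append(total)
    return sum(returns[-500:]) / 500              # average return late in training

if __name__ == "__main__":
    random.seed(0)
    # window=1 is ordinary Q-learning on immediate observations: both goals
    # look identical at the junction, so it cannot do better than guessing.
    print("window = 1:", run(window=1))
    # A window long enough to still contain the cue makes the internal
    # representation Markov again, and the task becomes learnable.
    print("window =", CORRIDOR_LEN + 1, ":", run(window=CORRIDOR_LEN + 1))
```

A fixed observation window is only the simplest way to carry state forward; the stored-state framing described in the abstract also covers richer memory mechanisms that differ considerably in detail.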
ISSN: 0004-3702, 1872-7921
DOI: 10.1016/0004-3702(94)00012-P