A reinforcement learning neural network for adaptive control of Markov chains

In this paper we consider the problem of reinforcement learning in a dynamically changing environment. In this context, we study the problem of adaptive control of finite-state Markov chains with a finite number of controls. The transition and payoff structures are unknown. The objective is to find...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on systems, man and cybernetics. Part A, Systems and humans man and cybernetics. Part A, Systems and humans, 1997-09, Vol.27 (5), p.588-600
Hauptverfasser: Santharam, G., Sastry, P.S.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In this paper we consider the problem of reinforcement learning in a dynamically changing environment. In this context, we study the problem of adaptive control of finite-state Markov chains with a finite number of controls. The transition and payoff structures are unknown. The objective is to find an optimal policy which maximizes the expected total discounted payoff over the infinite horizon. A stochastic neural network model is suggested for the controller. The parameters of the neural net, which determine a random control strategy, are updated at each instant using a simple learning scheme. This learning scheme involves estimation of some relevant parameters using an adaptive critic. It is proved that the controller asymptotically chooses an optimal action in each state of the Markov chain with a high probability.
ISSN:1083-4427
1558-2426
DOI:10.1109/3468.618258