Comparison of reinforcement algorithms on discrete functions: learnability, time complexity, and scaling
Saved in:
Main Authors: | , |
---|---|
Format: | Conference Proceedings |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Abstract: | The authors compare the performance of a variety of algorithms in a reinforcement learning paradigm, including A_R-P, A_R-I, reinforcement comparison (plus a new variation), and backpropagation of the reinforcement gradient through a forward model. The task domain is discrete multioutput functions. Performance is measured in terms of learnability, training time, and scaling. A_R-P outperforms all others and scales well relative to supervised backpropagation. An ergodic variant of reinforcement comparison approaches A_R-P performance. For the tasks studied, total training time (including model and controller) for the forward-model algorithm is 1 to 2 orders of magnitude more costly than for A_R-P, and the controller's success is sensitive to forward-model accuracy. Distortions of the reinforcement gradient predicted by an inaccurate forward model cause the controller's failures. |
---|---|
DOI: | 10.1109/IJCNN.1992.287080 |
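The A_R-P (associative reward-penalty) algorithm the abstract highlights is the Barto–Anandan rule for a stochastic binary unit: after a reward the unit's firing probability is nudged toward the action it just emitted; after a penalty it is nudged, more weakly (by a factor λ), toward the opposite action. A minimal sketch follows, assuming a single sigmoid unit trained on a toy 2-input AND function; the hyperparameters, function names, and task are illustrative choices, not values or code from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def arp_train(patterns, targets, rho=0.1, lam=0.05, trials=20000, seed=0):
    """Train one stochastic binary unit with the associative
    reward-penalty (A_R-P) rule. Hyperparameters here are
    illustrative assumptions, not values reported in the paper."""
    rng = random.Random(seed)
    w = [0.0] * (len(patterns[0]) + 1)  # weights plus a bias term
    for _ in range(trials):
        k = rng.randrange(len(patterns))
        x = list(patterns[k]) + [1.0]                 # input with bias
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        y = 1 if rng.random() < p else 0              # stochastic action
        r = 1 if y == targets[k] else -1              # scalar reinforcement
        if r > 0:
            step = rho * (y - p)                      # reward: reinforce emitted action
        else:
            step = lam * rho * ((1 - y) - p)          # penalty: push toward other action
        w = [wi + step * xi for wi, xi in zip(w, x)]
    return w

# Greedy evaluation on 2-input AND (a toy stand-in for the paper's
# discrete multioutput task domain).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 0, 1]
w = arp_train(X, T)
preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))) > 0.5 else 0
         for x in X]
```

Setting λ = 0 recovers the A_R-I (reward-inaction) variant the paper also compares: penalties then produce no weight change at all.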