Performance Loss Bounds for Approximate Value Iteration with State Aggregation

Bibliographic details
Published in:	Mathematics of operations research 2006-05, Vol.31 (2), p.234-244
Main author:	Van Roy, Benjamin
Format:	Article
Language:	English
Subjects:	Aggregation; Analysis; approximate value iteration; Approximate values; Approximation; Approximations; Difference equations; Dynamic programming; Iterative solutions; Machine learning; Markov processes; Mathematical aptitude; Mathematical functions; Mathematical theorems; Operations research; Optimization algorithms; state aggregation; Studies; temporal-difference learning
Online access:	Full text

Description
Abstract:	We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
ISSN:	0364-765X, 1526-5471
DOI:	10.1287/moor.1060.0188
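
Illustration:	The abstract describes approximate value iteration in which each partition of the state space gets a single constant estimate, obtained by a weighted projection. The sketch below is one minimal way to read that description for a discounted-cost problem; it is not taken from the article. The toy MDP (`P`, `g`), the discount factor `alpha`, the uniform `weights`, and all function names are assumptions chosen for illustration. The abstract points to invariant distributions of appropriate policies as the beneficial choice of projection weights; uniform weights are used here only to keep the example short.

```python
def bellman(J, P, g, alpha):
    """Apply the discounted-cost Bellman operator to the cost-to-go vector J.

    P[a][x][y] is the probability of moving from x to y under action a,
    g[a][x] is the one-stage cost of taking action a in state x.
    """
    n = len(J)
    return [
        min(
            g[a][x] + alpha * sum(P[a][x][y] * J[y] for y in range(n))
            for a in range(len(P))
        )
        for x in range(n)
    ]


def project(J, partition, weights, num_parts):
    """Weighted projection onto piecewise-constant functions.

    Each partition receives the weighted average of J over its states.
    Assumes every partition has positive total weight.
    """
    num = [0.0] * num_parts
    den = [0.0] * num_parts
    for x, k in enumerate(partition):
        num[k] += weights[x] * J[x]
        den[k] += weights[x]
    return [num[k] / den[k] for k in range(num_parts)]


def aggregated_value_iteration(P, g, alpha, partition, weights, iters=500):
    """Iterate r <- project(bellman(expand(r))) from r = 0."""
    num_parts = max(partition) + 1
    r = [0.0] * num_parts
    for _ in range(iters):
        J = [r[k] for k in partition]  # expand to a piecewise-constant vector
        r = project(bellman(J, P, g, alpha), partition, weights, num_parts)
    return r


# Hypothetical 4-state, 2-action example: action 0 stays put, action 1 moves
# to the next state cyclically.
P = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]],
]
g = [[1.0, 2.0, 0.0, 3.0], [0.5, 0.5, 0.5, 0.5]]
alpha = 0.9
partition = [0, 0, 1, 1]           # states {0, 1} and {2, 3} are aggregated
weights = [0.25, 0.25, 0.25, 0.25]  # uniform weights, an assumption

r = aggregated_value_iteration(P, g, alpha, partition, weights)
```

Because the weighted average is a nonexpansion in the maximum norm and the discounted Bellman operator is a contraction, the composed iteration converges to a fixed point in this discounted setting; `r` holds one constant per partition, and a greedy policy with respect to the expanded vector is the kind of policy whose performance loss the article bounds.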