Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs

Regularized MDPs serve as a smooth version of original MDPs. However, biased optimal policy always exists for regularized MDPs. Instead of making the coefficient{\lambda}of regularized term sufficiently small, we propose an adaptive reduction scheme for {\lambda} to approximate optimal policy of the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2020-10
Hauptverfasser:	Yang, Wenhao, Li, Xiang, Xie, Guangzeng, Zhang, Zhihua
Format:	Artikel
Sprache:	eng
Schlagworte:	Adaptive algorithms Reduction Regularization
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Regularized MDPs serve as a smooth version of original MDPs. However, biased optimal policy always exists for regularized MDPs. Instead of making the coefficient{\lambda}of regularized term sufficiently small, we propose an adaptive reduction scheme for {\lambda} to approximate optimal policy of the original MDP. It is shown that the iteration complexity for obtaining an{\epsilon}-optimal policy could be reduced in comparison with setting sufficiently small{\lambda}. In addition, there exists strong duality connection between the reduction method and solving the original MDP directly, from which we can derive more adaptive reduction method for certain algorithms.
ISSN:	2331-8422