A general Markov decision process formalism for action-state entropy-regularized reward maximization
Main authors:
Format: Article
Language: English
Online access: Order full text
Abstract: Previous work has separately addressed different forms of action, state, and action-state entropy regularization, pure exploration, and space occupation. These problems have become extremely relevant for regularization, generalization, speeding up learning, and providing robust solutions at unprecedented levels. However, existing solutions are heterogeneous, ranging from convex to non-convex and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases of pure action entropy and pure state entropy are understood as limits of the mixture.
DOI: 10.48550/arxiv.2302.01098
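To make the setting concrete, the following is a minimal sketch of one way a mixed action-state entropy-regularized objective can be posed over a discounted state-action occupancy measure; the notation ($\rho$, $\alpha$, $\beta$, $\gamma$, $\mu_0$) is assumed here for illustration and is not taken from the paper.

% Hedged sketch: reward maximization with a mixture of action and state
% entropy regularizers over an occupancy measure rho(s,a); alpha, beta >= 0
% weight the two entropy terms (illustrative notation, not the paper's).
\begin{align}
  \max_{\rho \ge 0} \quad & \sum_{s,a} \rho(s,a)\, r(s,a)
    \;+\; \alpha\, H_{\mathrm{A}}(\rho) \;+\; \beta\, H_{\mathrm{S}}(\rho) \\
  \text{s.t.} \quad & \sum_{a} \rho(s',a)
    \;=\; (1-\gamma)\,\mu_0(s') \;+\; \gamma \sum_{s,a} P(s' \mid s,a)\,\rho(s,a)
    \qquad \forall s',
\end{align}
% with the conditional action entropy and the state-marginal entropy
\begin{align}
  H_{\mathrm{A}}(\rho) &= -\sum_{s,a} \rho(s,a)\,
    \log \frac{\rho(s,a)}{\sum_{a'} \rho(s,a')}, \\
  H_{\mathrm{S}}(\rho) &= -\sum_{s} \Big(\sum_{a} \rho(s,a)\Big)
    \log \Big(\sum_{a} \rho(s,a)\Big).
\end{align}

Under this formulation both entropy terms are concave in $\rho$ and the flow constraints are linear, so introducing one Lagrange multiplier per state yields a dual function that is convex in the multipliers; minimizing it gives an unconstrained convex problem, consistent with the kind of transformation described in the abstract. Pure action-entropy regularization ($\beta \to 0$) and pure state-entropy regularization ($\alpha \to 0$) are recovered as limits of the mixture.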