A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering
Format: | Article |
---|---|
Language: | English |
Abstract: | This paper proposes a safety modulator actor-critic (SMAC) method that addresses both safety constraints and Q-value overestimation in model-free safe reinforcement learning (RL). A safety modulator is developed to satisfy safety constraints by modulating actions, which allows the policy to ignore the safety constraints and focus on maximizing reward. In addition, a distributional critic with a theoretical update rule is proposed for SMAC to mitigate the overestimation of Q-values under safety constraints. Experiments on Unmanned Aerial Vehicle (UAV) hovering, in both simulation and real-world scenarios, confirm that SMAC can effectively maintain safety constraints and outperform mainstream baseline algorithms. |
---|---|
DOI: | 10.48550/arxiv.2410.06847 |
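The abstract describes the safety modulator only at a high level: the policy proposes an action while ignoring the safety constraint, and a separate modulator adjusts that action so the constraint holds. The following is a minimal sketch of that division of labor under loudly stated assumptions. The constraint is assumed to be a single linear inequality on the action, n · a ≤ b, and the modulator is a closed-form Euclidean projection, swapped in for illustration. It is not the modulator used in SMAC, whose form the abstract does not specify. The function names `dot` and `modulate` and the toy constraint are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def modulate(policy_action: Vector, normal: Vector, bound: float) -> Tuple[Vector, Vector]:
    """Hypothetical safety modulator (an assumption, NOT the SMAC modulator).

    Assumes the safety constraint is linear in the action: dot(normal, a) <= bound.
    Returns (executed_action, modulation), where executed = policy_action + modulation.
    If the proposed action already satisfies the constraint, the modulation is zero.
    Otherwise the action is projected onto the constraint boundary, which is the
    smallest Euclidean correction that restores feasibility.
    """
    violation = dot(normal, policy_action) - bound
    if violation <= 0.0:
        # Proposed action is already safe: leave it untouched.
        modulation = [0.0] * len(policy_action)
    else:
        # Move back along the constraint normal just far enough to reach the boundary.
        scale = violation / dot(normal, normal)
        modulation = [-scale * n for n in normal]
    executed = [a + m for a, m in zip(policy_action, modulation)]
    return executed, modulation


if __name__ == "__main__":
    # Toy 2-D action with the assumed constraint a[0] + a[1] <= 1.0.
    normal, bound = [1.0, 1.0], 1.0
    for proposed in ([0.3, 0.2], [0.8, 0.6]):
        executed, modulation = modulate(proposed, normal, bound)
        print(proposed, "->", executed, "modulation:", modulation)
```

Here the first proposal passes through unchanged, and the second is shifted to approximately [0.6, 0.4]. Keeping the modulation as an explicit additive term mirrors the abstract's point that the policy can concentrate on reward while another component enforces the constraint. A practical modulator would also have to respect actuator limits and nonlinear constraints, which this sketch ignores.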