Safe MPC Alignment with Human Directional Feedback
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | In safety-critical robot planning or control, manually specifying safety
constraints or learning them from demonstrations can be challenging. In this
paper, we propose a certifiable alignment method for a robot to learn a safety
constraint in its model predictive control (MPC) policy with human online
directional feedback. To our knowledge, it is the first method to learn safety
constraints from human feedback. The proposed method is based on an empirical
observation: human directional feedback, when available, tends to guide the
robot toward safer regions. The method only requires the direction of human
feedback to update the learning hypothesis space. It is certifiable, providing
an upper bound on the total number of human feedback instances in the case of successful
learning of safety constraints, or declaring the misspecification of the
hypothesis space, i.e., the true implicit safety constraint cannot be found
within the specified hypothesis space. We evaluated the proposed method using
numerical examples and user studies in two developed simulation games.
Additionally, we implemented and tested the proposed method on a real-world
Franka robot arm performing mobile water-pouring tasks in a user study. The
simulation and experimental results demonstrate the efficacy and efficiency of
our method, showing that it enables a robot to successfully learn safety
constraints with a small handful (tens) of human directional corrections. |
DOI: | 10.48550/arxiv.2407.04216 |