Moral reinforcement learning using actual causation
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Reinforcement learning systems will increasingly make
decisions that significantly impact the well-being of humans, and it is
therefore essential that these systems make decisions that conform to our
expectations of morally good behavior. The morally good is often defined in
causal terms, as in whether one's actions have in fact caused a particular
outcome, and whether the outcome could have been anticipated. We propose an
online reinforcement learning method that learns a policy under the constraint
that the agent should not be the cause of harm. This is accomplished by
defining cause using the theory of actual causation and assigning blame to the
agent when its actions are the actual cause of an undesirable outcome. We
conduct experiments on a toy ethical dilemma in which a natural choice of
reward function leads to clearly undesirable behavior, but our method learns a
policy that avoids being the cause of harmful behavior, demonstrating the
soundness of our approach. Allowing an agent to learn while observing causal
moral distinctions such as blame opens the possibility of learning policies
that better conform to our moral judgments. |
DOI: | 10.48550/arxiv.2205.08192 |
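
The abstract describes assigning blame when the agent's action is the actual cause of an undesirable outcome. The sketch below illustrates that idea only loosely and is not the paper's method: it swaps in a plain but-for (counterfactual) test, which is a simplification of full actual-causation definitions, and subtracts a fixed penalty from the reward when the test succeeds. The function names, the two toy worlds, the action set, and the penalty value are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical action set for a toy dilemma; not taken from the paper.
ACTIONS = ("push", "wait")


def is_harm(outcome: str) -> bool:
    """Assumed harm predicate: an outcome is harmful iff it is labeled 'harm'."""
    return outcome == "harm"


def avoidable_world(action: str) -> str:
    """Toy world in which harm occurs only if the agent pushes."""
    return "harm" if action == "push" else "safe"


def doomed_world(action: str) -> str:
    """Toy world in which harm occurs whatever the agent does."""
    return "harm"


def is_but_for_cause(
    action: str,
    alternatives: Iterable[str],
    outcome_fn: Callable[[str], str],
    harm_fn: Callable[[str], bool],
) -> bool:
    """But-for test: harm happened, and some other action would have avoided it."""
    if not harm_fn(outcome_fn(action)):
        return False  # no harm occurred, so there is nothing to be blamed for
    return any(
        not harm_fn(outcome_fn(alt)) for alt in alternatives if alt != action
    )


def blamed_reward(
    reward: float,
    action: str,
    alternatives: Iterable[str],
    outcome_fn: Callable[[str], str],
    harm_fn: Callable[[str], bool],
    penalty: float = 10.0,  # assumed penalty size, chosen for illustration
) -> float:
    """Subtract a penalty from the reward when the action is a but-for cause of harm."""
    if is_but_for_cause(action, alternatives, outcome_fn, harm_fn):
        return reward - penalty
    return reward
```

Under these assumptions, pushing in `avoidable_world` is penalized, while pushing in `doomed_world` is not, because no alternative action would have prevented the harm there. A learner could use `blamed_reward` in place of the raw reward, although a penalty is a softer mechanism than the constraint the abstract refers to.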