Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
Format: Article
Language: English
Online access: Order full text
Abstract: In social psychology, Social Value Orientation (SVO) describes an
individual's propensity to allocate resources between themself and others. In
reinforcement learning, SVO has been instantiated as an intrinsic motivation
that remaps an agent's rewards based on particular target distributions of
group reward. Prior studies show that groups of agents endowed with
heterogeneous SVO learn diverse policies in settings that resemble the
incentive structure of the Prisoner's Dilemma. Our work extends this body of
results and demonstrates that (1) heterogeneous SVO leads to meaningfully
diverse policies across a range of incentive structures in sequential social
dilemmas, as measured by task-specific diversity metrics; and (2) learning a
best response to such policy diversity leads to better zero-shot generalization
in some situations. We show that these best-response agents learn policies that
are conditioned on their co-players, which we posit is the reason for improved
zero-shot generalization results.
DOI: 10.48550/arxiv.2305.00768
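The abstract describes SVO as an intrinsic motivation that remaps an agent's rewards toward a target distribution of group reward, but does not spell out the exact remapping. As a rough illustration only, the sketch below implements the common angle-based (ring-measure) SVO utility used in related mixed-motive reinforcement learning work, where a target angle trades off an agent's own reward against the mean reward of its co-players; the function name, angles, and example rewards are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def svo_shaped_reward(own_reward, coplayer_rewards, theta_deg):
    """Remap a reward with an angle-based SVO utility (a common formulation).

    theta_deg = 0 corresponds to a purely selfish agent; theta_deg = 45
    weights the agent's own reward and the mean co-player reward equally
    (prosocial); theta_deg = 90 cares only about co-players.
    """
    theta = np.deg2rad(theta_deg)
    mean_others = np.mean(coplayer_rewards)
    return np.cos(theta) * own_reward + np.sin(theta) * mean_others


# Heterogeneous SVO: each agent in the group samples its own target angle.
rng = np.random.default_rng(0)
svo_angles = rng.uniform(0.0, 90.0, size=5)    # one target angle per agent
rewards = np.array([1.0, 0.2, 0.5, 0.0, 0.8])  # per-step environment rewards

shaped = [
    svo_shaped_reward(rewards[i], np.delete(rewards, i), svo_angles[i])
    for i in range(len(rewards))
]
print(shaped)
```

Training each agent on its own shaped reward, with angles drawn from different distributions per agent, is one way such a population could end up with the diverse policies the abstract refers to.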