A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | How do people navigate the exploration-exploitation (EE) trade-off when
making repeated choices with unknown rewards? We study this question through
the lens of multi-armed bandit problems and introduce a novel behavioral model,
Quantal Choice with Adaptive Reduction of Exploration (QCARE). It generalizes
Thompson Sampling, allowing for a principled way to quantify the EE trade-off
and reflect human decision-making patterns. The model adaptively reduces
exploration as information accumulates, with the reduction rate serving as a
parameter to quantify the EE trade-off dynamics. We theoretically analyze how
varying reduction rates influence decision quality, shedding light on the
effects of "over-exploration" and "under-exploration." Empirically, we
validate QCARE through experiments collecting behavioral data from human
participants. QCARE not only captures critical behavioral patterns in the EE
trade-off but also outperforms alternative models in predictive power. Our
analysis reveals a behavioral tendency toward over-exploration. |
---|---|
DOI: | 10.48550/arxiv.2207.01028 |