Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits
Format: Article
Language: eng
Online access: Order full text
Summary: Conducting randomized experiments in education settings raises the
question of how we can use machine learning techniques to improve educational
interventions. Using Multi-Armed Bandit (MAB) algorithms like Thompson
Sampling (TS) in adaptive experiments can increase students' chances of
obtaining better outcomes by increasing the probability of assignment to the
optimal condition (arm), even before an intervention completes. This is an
advantage over traditional A/B testing, which may allocate an equal number of
students to both optimal and non-optimal conditions. The problem is the
exploration-exploitation trade-off: even though adaptive policies aim to
collect enough information to allocate more students to better arms reliably,
past work shows that this may not be enough exploration to draw reliable
conclusions about whether arms differ. Hence, it is of interest to provide
additional uniform random (UR) exploration throughout the experiment. This
paper presents a real-world adaptive experiment on how students engage with
instructors' weekly email reminders intended to build their time management
habits. Our metric of interest is the email open rate, which tracks the arms
represented by different subject lines. These are delivered following
different allocation algorithms: UR, TS, and what we identified as TS†, which
combines both TS and UR rewards to update its priors. We highlight problems
with these adaptive algorithms, such as possible exploitation of an arm when
there is no significant difference between arms, and address their causes and
consequences. Future directions include studying situations where the early
choice of the optimal arm is not ideal and how adaptive algorithms can
address them.
DOI: 10.48550/arxiv.2208.05090
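
The record carries no code, so the following is a minimal Python sketch (not
the authors' implementation) of the three allocation policies the abstract
names: UR, TS, and TS†. It assumes Bernoulli rewards (email opened = 1, not
opened = 0) and Beta(1, 1) priors per arm; the uniform-random share and the
true open rates are illustrative assumptions, not values from the paper.

```python
import random

class BetaArm:
    """Beta posterior over one subject line's email open rate."""
    def __init__(self):
        self.alpha = 1  # 1 + observed opens (Beta(1, 1) = uniform prior)
        self.beta = 1   # 1 + observed non-opens

    def sample(self):
        # One draw from the current posterior over the open rate.
        return random.betavariate(self.alpha, self.beta)

    def update(self, opened):
        # Conjugate Beta-Bernoulli posterior update.
        if opened:
            self.alpha += 1
        else:
            self.beta += 1

def assign_ur(arms):
    """Uniform random: every subject line is equally likely."""
    return random.randrange(len(arms))

def assign_ts(arms):
    """Thompson Sampling: sample each posterior once and send the
    subject line whose sampled open rate is highest."""
    samples = [arm.sample() for arm in arms]
    return max(range(len(arms)), key=lambda i: samples[i])

def run_ts_dagger(n_students=1000, ur_share=0.2, true_rates=(0.30, 0.35)):
    """TS† as the abstract describes it: part of the traffic is assigned
    by UR for extra exploration, but rewards from *both* UR and TS
    assignments update the same priors. `ur_share` and `true_rates`
    are hypothetical values for this sketch."""
    arms = [BetaArm() for _ in true_rates]
    for _ in range(n_students):
        if random.random() < ur_share:
            i = assign_ur(arms)
        else:
            i = assign_ts(arms)
        opened = random.random() < true_rates[i]  # simulated open event
        arms[i].update(opened)
    return [(a.alpha, a.beta) for a in arms]

if __name__ == "__main__":
    # Posterior counts concentrate on the arm with the higher true rate.
    print(run_ts_dagger())
```

Under this reading, plain TS is the special case `ur_share = 0` and a pure UR
experiment is `ur_share = 1`; the abstract's motivation for adding UR traffic
is that this middle ground restores some of the exploration needed to draw
reliable conclusions about whether the arms differ.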