Jump Starting Bandits with LLM-Generated Prior Knowledge
Format: Article
Language: English
Abstract: We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a contextual multi-armed bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human behaviour well enough to jump-start contextual multi-armed bandits and thereby reduce online learning regret. We propose an initialization algorithm for contextual bandits that prompts LLMs to produce a pre-training dataset of approximate human preferences for the bandit, which significantly reduces online learning regret and the data-gathering costs of training such models. Our approach is validated empirically through two sets of experiments with different bandit setups: one in which LLMs serve as an oracle, and a real-world experiment using data from a conjoint survey experiment.
DOI: 10.48550/arxiv.2406.19317
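
The abstract outlines the jump-start idea without specifying the paper's exact algorithm. As a rough illustration only, the sketch below warm-starts a standard LinUCB bandit on LLM-simulated preference data before any online interaction. The `simulate_llm_preference` helper, the choice of LinUCB, and all parameters are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def simulate_llm_preference(context: np.ndarray, arm: int) -> float:
    """Hypothetical stand-in for prompting an LLM.

    In practice this would describe the user context and the arm to an LLM
    and parse a scalar preference score; here it is a deterministic dummy.
    """
    seed = hash((tuple(np.round(context, 3)), arm)) % 2**32
    return float(np.random.default_rng(seed).uniform(0.0, 1.0))

class LinUCB:
    """Standard per-arm LinUCB contextual bandit."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def update(self, arm: int, context: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

    def select(self, context: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge-regression estimate of arm parameters
            scores.append(theta @ context
                          + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

def jump_start(bandit: LinUCB, n_arms: int, dim: int,
               n_synthetic: int = 500) -> None:
    """Pre-train the bandit on LLM-generated (context, arm, preference)
    triples so online learning starts from an informed prior."""
    rng = np.random.default_rng(0)
    for _ in range(n_synthetic):
        context = rng.normal(size=dim)
        for arm in range(n_arms):
            bandit.update(arm, context, simulate_llm_preference(context, arm))

bandit = LinUCB(n_arms=5, dim=8)
jump_start(bandit, n_arms=5, dim=8)  # offline warm start from simulated data
context = np.random.default_rng(1).normal(size=8)
print("first online pick:", bandit.select(context))
```

In this sketch the synthetic triples simply seed the per-arm Gram matrices and reward vectors, so the bandit's first online selections already reflect the LLM's approximate preferences rather than starting from a cold, uninformed prior.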