Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Real-world applications of contextual bandits often exhibit non-stationarity
due to seasonality, serendipity, and evolving social trends. While a number of
non-stationary contextual bandit learning algorithms have been proposed in the
literature, they explore excessively because they do not prioritize
information of enduring value, or are designed in ways that do not scale to
modern applications with high-dimensional user-specific features and large
action sets, or both. In this paper, we introduce a novel non-stationary
contextual bandit algorithm that addresses these concerns. It combines a
scalable, deep-neural-network-based architecture with a carefully designed
exploration mechanism that strategically prioritizes collecting information
with the most lasting value in a non-stationary environment. Through empirical
evaluations on two real-world recommendation datasets, which exhibit pronounced
non-stationarity, we demonstrate that our approach significantly outperforms
the state-of-the-art baselines. |
---|---|
DOI: | 10.48550/arxiv.2310.07786 |