Localized exploration in contextual dynamic pricing achieves dimension-free regret
We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm which starts with a pure exploration stage, followed by a refinement stage that explores near the learned optimal pricing policy, and finally enters a...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We study the problem of contextual dynamic pricing with a linear demand
model. We propose a novel localized exploration-then-commit (LetC) algorithm
which starts with a pure exploration stage, followed by a refinement stage that
explores near the learned optimal pricing policy, and finally enters a pure
exploitation stage. The algorithm is shown to achieve a minimax optimal,
dimension-free regret bound when the time horizon exceeds a polynomial of the
covariate dimension. Furthermore, we provide a general theoretical framework
that encompasses the entire time spectrum, demonstrating how to balance
exploration and exploitation when the horizon is limited. The analysis is
powered by a novel critical inequality that depicts the
exploration-exploitation trade-off in dynamic pricing, mirroring its existing
counterpart for the bias-variance trade-off in regularized regression. Our
theoretical results are validated by extensive experiments on synthetic and
real-world data. |
---|---|
DOI: | 10.48550/arxiv.2412.19252 |