Learning to Bid in Contextual First Price Auctions
In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time $t$, the learner observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on historica...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this paper, we investigate the problem about how to bid in repeated
contextual first price auctions. We consider a single bidder (learner) who
repeatedly bids in the first price auctions: at each time $t$, the learner
observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on
historical information and $x_t$. We assume a structured linear model of the
maximum bid of all the others $m_t = \alpha_0\cdot x_t + z_t$, where
$\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly
sampled from a noise distribution $\mathcal{F}$ with log-concave density
function $f$. We consider both \emph{binary feedback} (the learner can only
observe whether she wins or not) and \emph{full information feedback} (the
learner can observe $m_t$) at the end of each time $t$. For binary feedback,
when the noise distribution $\mathcal{F}$ is known, we propose a bidding
algorithm, by using maximum likelihood estimation (MLE) method to achieve at
most $\widetilde{O}(\sqrt{\log(d) T})$ regret. Moreover, we generalize this
algorithm to the setting with binary feedback and the noise distribution is
unknown but belongs to a parametrized family of distributions. For the full
information feedback with \emph{unknown} noise distribution, we provide an
algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach
combines an estimator for log-concave density functions and then MLE method to
learn the noise distribution $\mathcal{F}$ and linear weight $\alpha_0$
simultaneously. We also provide a lower bound result such that any bidding
policy in a broad class must achieve regret at least $\Omega(\sqrt{T})$, even
when the learner receives the full information feedback and $\mathcal{F}$ is
known. |
---|---|
DOI: | 10.48550/arxiv.2109.03173 |