Logistic Regression Regret: What's the Catch?
Author: Shamir, Gil I.
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: We address the problem of achievable regret rates in online logistic
regression. We derive lower bounds with logarithmic regret under $L_1$, $L_2$,
and $L_\infty$ constraints on the parameter values. The bounds are dominated by
$(d/2) \log T$, where $T$ is the horizon and $d$ is the dimensionality of the
parameter space. We show that these bounds are achievable for $d = o(T^{1/3})$
in all these cases with Bayesian methods, which attain them up to an additional
$(d/2) \log d$ term. Interestingly, different behaviors emerge for larger
dimensionality. Specifically, on the negative side, if $d = \Omega(\sqrt{T})$,
any algorithm is guaranteed regret of $\Omega(d \log T)$ (greater than
$\Omega(\sqrt{T})$) under $L_\infty$ constraints on the parameters (and on the
example features). On the positive side, under $L_1$ constraints on the
parameters, there exist algorithms whose regret is sub-linear in $d$ for
asymptotically larger values of $d$. For $L_2$ constraints, we show that for
large enough $d$ the regret remains linear in $d$ but is no longer logarithmic
in $T$. Adapting the redundancy-capacity theorem from information theory, we
demonstrate a principled methodology based on grids of parameters for deriving
lower bounds; grids are also utilized to derive some of the upper bounds. Our
results strengthen the upper bounds of Kakade and Ng (2005) and Foster et al.
(2018) for this problem, introduce novel lower bounds, and adapt a methodology
that can be used to obtain such bounds for other related problems. They also
give a novel characterization of the asymptotic behavior when the dimension of
the parameter space is allowed to grow with $T$. They additionally establish
connections to the information theory literature, demonstrating that the actual
regret for logistic regression depends on the richness of the parameter class:
even within this problem, richer classes lead to greater regret.
DOI: 10.48550/arxiv.2002.02950
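
To make the quantities in the summary concrete, the following is a standard formalization of online logistic regression regret. This is a sketch of the usual setup; the symbols $w_t$, $x_t$, $y_t$, and the constraint set $W$ are conventional notation assumed here, not taken from the record itself.

```latex
% At round t = 1..T the learner sees features x_t, commits to parameters
% w_t, observes the label y_t in {-1,+1}, and pays the logistic log loss:
\[
  \ell(w_t; x_t, y_t) = \log\bigl(1 + e^{-y_t \langle w_t, x_t \rangle}\bigr).
\]
% Regret compares the learner's cumulative loss to the best fixed
% parameter vector in a constraint set W (an L_1, L_2, or L_infty ball):
\[
  \mathrm{Regret}_T = \sum_{t=1}^{T} \ell(w_t; x_t, y_t)
    - \min_{w \in W} \sum_{t=1}^{T} \ell(w; x_t, y_t).
\]
% The summary's dominant term states that for d = o(T^{1/3}) the optimal
% Regret_T behaves like (d/2) log T, with Bayesian methods attaining it
% up to an extra (d/2) log d.
```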
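The summary credits the upper bounds to Bayesian methods and the lower-bound methodology to grids of parameters. Below is a minimal, hypothetical sketch combining the two ideas for $d = 1$: a Bayes-mixture predictor over a finite parameter grid. It is an illustration only, not the paper's algorithm; the function name, grid choice, and synthetic data are invented for the demo.

```python
import numpy as np

def bayes_mixture_logistic(xs, ys, grid):
    """Online Bayes-mixture predictor for 1-D logistic regression.

    Maintains a posterior over a finite grid of parameter values and
    predicts each label with the posterior-mixture probability, paying
    log loss. A toy sketch, not the paper's exact construction.
    """
    log_post = np.zeros(len(grid))       # uniform prior over the grid
    total_loss = 0.0
    for x, y in zip(xs, ys):             # y in {-1, +1}
        # Probability of label y at each grid point w: 1/(1+exp(-y*w*x)).
        p = 1.0 / (1.0 + np.exp(-y * grid * x))
        # Predict with the posterior-weighted mixture, then pay log loss.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        total_loss += -np.log(float(post @ p))
        # Bayes update: reweight each grid point by its likelihood of y.
        log_post += np.log(p)
    return total_loss

# Demo: T rounds drawn from a fixed parameter w* = 0.5. With a fine grid,
# the mixture's loss minus the best grid point's loss approximates the
# regret against the best fixed parameter, which for d = 1 should stay
# on the order of (1/2) log T.
rng = np.random.default_rng(0)
T = 2000
xs = rng.uniform(-1.0, 1.0, size=T)
ys = np.where(rng.uniform(size=T) < 1.0 / (1.0 + np.exp(-0.5 * xs)), 1.0, -1.0)
grid = np.linspace(-1.0, 1.0, 201)

mix_loss = bayes_mixture_logistic(xs, ys, grid)
best_loss = min(np.sum(np.log1p(np.exp(-ys * w * xs))) for w in grid)
print(f"regret vs. best grid point: {mix_loss - best_loss:.2f}")
```

The grid plays the same dual role as in the paper's methodology: a uniform mixture over $N$ grid points has regret at most $\log N$ against the best grid point, and choosing the grid resolution as a function of $T$ trades that term against the approximation error to the continuous optimum.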