Variance Reduced Online Gradient Descent for Kernelized Pairwise Learning with Limited Memory
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Pairwise learning is essential in machine learning, especially for problems
involving loss functions defined on pairs of training examples. Online gradient
descent (OGD) algorithms have been proposed to handle online pairwise learning,
where data arrives sequentially. However, the pairwise nature of the problem
makes scalability challenging, as the gradient computation for a new sample
involves all past samples. Recent advancements in OGD algorithms have aimed to
reduce the complexity of calculating online gradients, achieving complexities
less than $O(T)$ and even as low as $O(1)$. However, these approaches are
primarily limited to linear models and introduce additional variance into the online gradients. In this study, we
propose a limited memory OGD algorithm that extends to kernel online pairwise
learning while improving the sublinear regret. Specifically, we establish a
clear connection between the variance of online gradients and the regret, and
construct online gradients using the most recent stratified samples with a
limited buffer of size $s$ representing all past data. The resulting algorithm
has a complexity of $O(sT)$ and employs $O(\sqrt{T}\log{T})$ random Fourier
features for kernel approximation. Importantly, our theoretical results demonstrate that
the variance-reduced online gradients lead to an improved sublinear regret
bound. The experiments on real-world datasets demonstrate the superiority of
our algorithm over both kernelized and linear online pairwise learning
algorithms. |
---|---|
DOI: | 10.48550/arxiv.2310.06483 |
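
The abstract describes online gradients built from a buffer of size $s$ together with random Fourier features for kernel approximation. The following is a minimal sketch of that general idea, not the authors' algorithm. Everything named here is an assumption made for illustration: the function names `make_rff` and `buffered_pairwise_ogd`, the RBF kernel, the pairwise hinge loss over opposite-label pairs, the $1/\sqrt{t}$ step size, and the use of one bounded FIFO queue per class as a simple stand-in for the stratified sampling mentioned in the abstract. The number of features is a fixed parameter here rather than the $O(\sqrt{T}\log{T})$ used in the paper.

```python
import math
import random
from collections import deque


def make_rff(dim, n_features, gamma, rng):
    """Random Fourier feature map for k(x, y) = exp(-gamma * ||x - y||^2)."""
    # Frequencies ~ N(0, 2*gamma*I), phases ~ U[0, 2*pi) (assumed RBF kernel).
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(f * xi for f, xi in zip(freq, x)) + p)
                for freq, p in zip(freqs, phases)]

    return phi


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def buffered_pairwise_ogd(stream, dim, n_features=64, buffer_size=8,
                          gamma=1.0, lr=0.1, seed=0):
    """Pairwise OGD on labels in {+1, -1}, pairing each new example
    only with buffered examples of the opposite label."""
    rng = random.Random(seed)
    phi = make_rff(dim, n_features, gamma, rng)
    w = [0.0] * n_features
    # Hypothetical buffering rule: keep the most recent examples per class.
    per_class = max(1, buffer_size // 2)
    buffers = {1: deque(maxlen=per_class), -1: deque(maxlen=per_class)}
    for t, (x, y) in enumerate(stream, start=1):
        z = phi(x)
        opposite = buffers[-y]
        if opposite:
            grad = [0.0] * n_features
            for z_old in opposite:
                diff = [a - b for a, b in zip(z, z_old)]
                # Pairwise hinge loss: max(0, 1 - y * <w, phi(x) - phi(x')>).
                if y * dot(w, diff) < 1.0:
                    for j, d in enumerate(diff):
                        grad[j] -= y * d
            step = lr / math.sqrt(t)
            # Average over the buffered pairs, then take a gradient step.
            w = [wi - step * g / len(opposite) for wi, g in zip(w, grad)]
        buffers[y].append(z)  # store features, so memory stays O(s * n_features)
    return w, phi
```

Each step touches at most `buffer_size // 2` stored examples, so the number of pair evaluations over a stream of length $T$ grows as $O(sT)$ rather than quadratically in $T$. A point `x` can then be scored with `dot(w, phi(x))`.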