Online Pairwise Learning Algorithms

Bibliographic details
Published in:	Neural Computation 2016-04, Vol. 28 (4), p. 743-777
Main authors:	Ying, Yiming; Zhou, Ding-Xuan
Format:	Article
Language:	English
Subjects:	Algorithms; Convergence; Distance learning; Hilbert space; Kernels; Learning; Letters; Mathematical analysis; Mathematical models; Probability; Random variables
Online access:	Full text

Description
Abstract:	Pairwise learning usually refers to a learning task that involves a loss function depending on pairs of examples, among which the most notable ones are bipartite ranking, metric learning, and AUC maximization. In this letter we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS), which we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works (Kar, Sriperumbudur, Jain, & Karnick; Wang, Khardon, Pechyony, & Jones), which require that the iterates be restricted to a bounded domain or that the loss function be strongly convex, OPERA is associated with a non-strongly convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem that guarantees the almost sure convergence of the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property for a family of widely used kernels in the setting of pairwise learning and illustrate the convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs using their associated integral operators and on probability inequalities for random variables with values in a Hilbert space.
ISSN:	0899-7667, 1530-888X
DOI:	10.1162/NECO_a_00817
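
Illustrative sketch: The abstract describes an online update for a pairwise least-square loss in an RKHS with polynomially decaying step sizes, but this record does not reproduce the update rule itself. The Python below is therefore only a minimal sketch of generic online gradient descent on such a loss, not the authors' exact algorithm. Everything named here is an assumption made for illustration: the class `OnlinePairwiseLS`, the Gaussian kernel on concatenated pairs in `pair_kernel`, the averaging over all earlier examples, the step size `gamma0 * t ** (-theta)`, and the synthetic data in `demo`.

```python
import math
import random


def pair_kernel(p, q, sigma=0.5):
    """Gaussian kernel between two pairs p = (x, x') and q = (u, u').

    Assumption: each pair is treated as one concatenated vector.
    """
    a = p[0] + p[1]  # tuple concatenation
    b = q[0] + q[1]
    d2 = sum((s - t) ** 2 for s, t in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


class OnlinePairwiseLS:
    """Hypothetical online learner for f(x, x') with target y - y'."""

    def __init__(self, gamma0=1.0, theta=0.5, sigma=0.5):
        self.gamma0 = gamma0  # initial step size (assumed value)
        self.theta = theta    # polynomial decay exponent (assumed value)
        self.sigma = sigma    # kernel width (assumed value)
        self.seen = []        # past examples (x, y)
        self.support = []     # pairs (x_t, x_j) carrying a coefficient
        self.coef = []        # one coefficient per support pair

    def predict(self, x, x_prime):
        """Evaluate the kernel expansion f(x, x')."""
        p = (x, x_prime)
        return sum(c * pair_kernel(p, q, self.sigma)
                   for c, q in zip(self.coef, self.support))

    def partial_fit(self, x, y):
        """One online step: pair the new example with every earlier one."""
        t = len(self.seen) + 1
        if t > 1:
            gamma = self.gamma0 * t ** (-self.theta)
            # All residuals use the current iterate, before any update.
            residuals = [self.predict(x, xj) - (y - yj)
                         for xj, yj in self.seen]
            scale = gamma / (t - 1)
            for (xj, _), r in zip(self.seen, residuals):
                self.support.append((x, xj))
                self.coef.append(-scale * r)
        self.seen.append((x, y))


def pairwise_sq_loss(model, data):
    """Mean squared pairwise residual over all ordered pairs i != j."""
    total, count = 0.0, 0
    for i, (xi, yi) in enumerate(data):
        for j, (xj, yj) in enumerate(data):
            if i != j:
                total += (model.predict(xi, xj) - (yi - yj)) ** 2
                count += 1
    return total / count


def demo(n=30, seed=0):
    """Train on synthetic 1-D data with y = x; return model and losses."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append(((x,), x))
    model = OnlinePairwiseLS()
    loss_before = pairwise_sq_loss(model, data)
    for x, y in data:
        model.partial_fit(x, y)
    loss_after = pairwise_sq_loss(model, data)
    return model, loss_before, loss_after
```

Calling `demo()` returns the fitted model together with the pairwise loss of the zero function and of the final iterate, so the two can be compared. The sketch stores one coefficient per pair, so memory grows quadratically with the number of examples; it is meant to make the structure of the update concrete, not to be efficient.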