Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence


Bibliographic details
Published in: IEEE Transactions on Information Theory, 2014-09, Vol. 60 (9), pp. 5716-5735
Main authors: Tarres, Pierre; Yuan Yao
Format: Article
Language: English
Abstract
In this paper, an online learning algorithm is proposed as a sequential stochastic approximation of a regularization path converging to the regression function in reproducing kernel Hilbert spaces (RKHSs). We show that it is possible to attain the best known strong (RKHS norm) convergence rate of batch learning through a careful choice of the gain, or step size, sequences, depending on regularity assumptions on the regression function. The corresponding weak (mean square distance) convergence rate is optimal in the sense that it reaches the minimax and individual lower rates established in this paper. In both cases, we deduce almost sure convergence, using Bernstein-type inequalities for martingales in Hilbert spaces. To achieve this, we develop a bias-variance decomposition similar to that of the batch learning setting: the bias consists of the approximation and drift errors along the regularization path, which display the same rates of convergence, and the variance arises from the sample error, analyzed as a (reverse) martingale difference sequence. The rates above are obtained by an optimal trade-off between the bias and the variance.
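The abstract describes an online algorithm that tracks a regularization path in an RKHS via stochastic-approximation updates with decaying gain and regularization sequences. As a rough illustration only, the sketch below implements a generic online regularized kernel least-squares update of the form f_{t+1} = f_t - gamma_t [ (f_t(x_t) - y_t) K_{x_t} + lambda_t f_t ], with polynomially decaying gamma_t and lambda_t. The kernel, the exponent theta, and all constants are illustrative assumptions, not the specific schedules analyzed in the paper, where the optimal choices depend on the regularity of the regression function.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=0.5):
    """Gaussian (RBF) kernel K(x, z); the bandwidth is an illustrative choice."""
    return np.exp(-np.abs(x - z) ** 2 / (2 * sigma ** 2))

def online_rkhs_regression(X, Y, gamma0=0.5, lam0=0.1, theta=2.0 / 3.0):
    """
    Online regularized least squares in an RKHS, with f_t stored as a kernel
    expansion f_t = sum_i alpha_i K(c_i, .).  Illustrative update at step t:
        f_{t+1} = f_t - gamma_t * [ (f_t(x_t) - y_t) * K_{x_t} + lambda_t * f_t ]
    where gamma_t and lambda_t decay polynomially in t (assumed schedules).
    """
    centers, alphas = [], []
    for t, (x_t, y_t) in enumerate(zip(X, Y), start=1):
        gamma_t = gamma0 / t ** theta          # gain (step size) schedule
        lam_t = lam0 / t ** (1.0 - theta)      # regularization schedule
        # current prediction f_t(x_t)
        pred = sum(a * gaussian_kernel(c, x_t) for c, a in zip(centers, alphas))
        # shrink existing coefficients: the -gamma_t * lambda_t * f_t term
        alphas = [(1.0 - gamma_t * lam_t) * a for a in alphas]
        # add a new center: the -gamma_t * (f_t(x_t) - y_t) * K_{x_t} term
        centers.append(x_t)
        alphas.append(-gamma_t * (pred - y_t))
    return centers, alphas

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=500)
    Y = np.sin(np.pi * X) + 0.1 * rng.standard_normal(500)   # noisy regression data
    centers, alphas = online_rkhs_regression(X, Y)
    # evaluate the learned function at a few test points
    for x in (-0.5, 0.0, 0.5):
        fx = sum(a * gaussian_kernel(c, x) for c, a in zip(centers, alphas))
        print(f"f({x:+.1f}) = {fx:+.3f}   (noise-free target {np.sin(np.pi * x):+.3f})")
```

The coefficient shrinkage by (1 - gamma_t * lambda_t) plays the role of the regularization term, while the appended coefficient is the stochastic gradient of the squared loss at the new sample; the paper's bias-variance analysis concerns how such a sequence tracks the regularization path as gamma_t and lambda_t decay.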
ISSN: 0018-9448, 1557-9654
DOI: 10.1109/TIT.2014.2332531