Last iterate convergence of SGD for Least-Squares in the Interpolation regime

Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits perfectly inputs and outputs $\langle \theta_* , \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is two fold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of SGD final iterate for a non-strongly convex problem with constant step-size whereas usual results use some form of average and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than $O(1/T)$. The link with reproducing kernel Hilbert spaces is established.

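The setting described in the abstract can be illustrated with a short simulation. The sketch below is not taken from the paper: it assumes plain NumPy, finite-dimensional isotropic Gaussian features, and an illustrative step-size, and simply runs single-sample constant step-size SGD on noiseless least-squares data while monitoring the excess risk of the last iterate.

```python
# Minimal illustrative sketch (assumptions: NumPy, finite-dimensional Gaussian
# features, illustrative step-size) of the abstract's setup: noiseless labels
# y = <theta_*, x>, solved by constant step-size SGD, tracking the LAST iterate.
import numpy as np

rng = np.random.default_rng(0)
d, T = 50, 5000            # feature dimension, number of SGD steps
gamma = 0.01               # constant step-size (chosen well below 2 / E||x||^2)

theta_star = rng.normal(size=d) / np.sqrt(d)   # interpolating predictor
theta = np.zeros(d)                            # SGD iterate, theta_0 = 0

for _ in range(T):
    x = rng.normal(size=d)                     # fresh streaming sample
    y = x @ theta_star                         # noiseless label: exact interpolation
    theta -= gamma * (x @ theta - y) * x       # stochastic gradient step on (x, y)

# Excess risk of the last iterate; for isotropic Gaussian inputs it reduces
# (up to a factor 1/2) to the squared distance to theta_*.
print("last-iterate excess risk:", 0.5 * np.sum((theta - theta_star) ** 2))
```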
Bibliographic Details

Main authors: Varre, Aditya; Pillaud-Vivien, Loucas; Flammarion, Nicolas
Format: Article
Language: English
Date: 2021-02-05
Subjects: Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
DOI: 10.48550/arxiv.2102.03183
Source: arXiv.org
Online access: https://arxiv.org/abs/2102.03183