Last iterate convergence of SGD for Least-Squares in the Interpolation regime
Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits the inputs and outputs perfectly, $\langle \theta_* , \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of the SGD final iterate for a non-strongly convex problem with constant step-size, whereas usual results use some form of averaging, and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than $O(1/T)$. The link with reproducing kernel Hilbert spaces is established.
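The abstract above describes constant step-size SGD applied to a noiseless least-squares model in which the optimum predictor interpolates the data. Below is a minimal simulation sketch, not taken from the paper, that illustrates the mechanics: with labels satisfying $\langle \theta_* , \phi(X) \rangle = Y$ exactly, the last iterate of single-sample SGD converges without any averaging. The Gaussian feature distribution, dimension, horizon, and step size are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiment):
# constant step-size SGD on a noiseless least-squares problem (interpolation regime).
import numpy as np

rng = np.random.default_rng(0)
d, T = 50, 5_000                               # feature dimension and number of steps (assumed)
theta_star = rng.normal(size=d) / np.sqrt(d)   # optimum predictor theta_*
step_size = 0.1                                # constant step size

theta = np.zeros(d)                            # SGD initialization
for _ in range(T):
    x = rng.normal(size=d) / np.sqrt(d)        # fresh feature sample phi(X), E||x||^2 = 1
    y = x @ theta_star                         # noiseless label: <theta_*, phi(X)> = Y
    theta -= step_size * (x @ theta - y) * x   # single-sample gradient step on the squared loss

# Excess risk of the *last* iterate; for this isotropic feature distribution
# E[(<theta - theta_*, X>)^2] = ||theta - theta_*||^2 / d.
print(f"excess risk of last iterate: {np.sum((theta - theta_star) ** 2) / d:.3e}")
```

This toy problem is finite-dimensional and well conditioned, so it only illustrates that interpolation removes the gradient noise at the optimum that normally forces averaging or decaying step-sizes; the paper's rates concern the over-parameterized, possibly infinite-dimensional setting.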
Saved in:
Main authors: | Varre, Aditya; Pillaud-Vivien, Loucas; Flammarion, Nicolas |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning |
Online access: | Full text at https://arxiv.org/abs/2102.03183 |
Identifier: | DOI: 10.48550/arxiv.2102.03183 |
Date: | 2021-02-05 |
Source: | arXiv.org |