Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impact of stochastic gradient methods on generalization error for non-convex learning problems not only has important theoretical consequences, but is also critical to the generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed via non-asymptotic, discrete-time analysis, using stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n}L\sqrt{\beta T_k}\right)$, where $L$ is the uniform Lipschitz parameter, $\beta$ is the inverse temperature, and $T_k$ is the aggregated step size. For the PAC-Bayesian theory, though the bound has a slower $O(1/\sqrt{n})$ rate, the contribution of each step is shown with an exponentially decaying factor by imposing $\ell^2$ regularization, and the uniform Lipschitz constant is also replaced by the actual norms of gradients along the trajectory. Our bounds have no implicit dependence on dimensions, norms, or other capacity measures of the parameter, which elegantly characterizes the phenomenon of "Fast Training Guarantees Generalization" in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
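
To make the quantities in the abstract concrete, the following minimal Python sketch (not taken from the paper) implements the generic SGLD update $w \leftarrow w - \eta\,\nabla f(w) + \sqrt{2\eta/\beta}\,\xi$ and evaluates the stability-based rate $\frac{1}{n}L\sqrt{\beta T_k}$ up to constants. The toy objective, the choices beta = 10, n = 1000, L = 1, and the helper names sgld_step and stability_bound are illustrative assumptions, not notation from the paper.

    import numpy as np

    def sgld_step(w, grad, step_size, beta, rng):
        """One SGLD update: a gradient step plus Gaussian noise whose
        variance scales with step_size / beta (larger beta -> less noise)."""
        noise = rng.normal(size=w.shape) * np.sqrt(2.0 * step_size / beta)
        return w - step_size * grad(w) + noise

    def stability_bound(n, L, beta, step_sizes):
        """Stability-based generalization rate (1/n) * L * sqrt(beta * T_k),
        where T_k is the aggregated (summed) step size; constants omitted."""
        T_k = float(np.sum(step_sizes))
        return (L / n) * np.sqrt(beta * T_k)

    # Toy non-convex objective f(w) = (w^2 - 1)^2 with gradient 4w(w^2 - 1).
    rng = np.random.default_rng(0)
    grad = lambda w: 4.0 * w * (w ** 2 - 1.0)

    w = np.array([2.0])
    step_sizes = np.full(200, 1e-2)
    for eta in step_sizes:
        w = sgld_step(w, grad, eta, beta=10.0, rng=rng)

    print("final iterate:", w)
    print("stability rate:", stability_bound(n=1000, L=1.0, beta=10.0, step_sizes=step_sizes))

As the abstract states, the bound shrinks when the aggregated step size $T_k$ is small relative to the sample size $n$, which is the sense in which fast training guarantees generalization; the PAC-Bayesian bound instead replaces the uniform Lipschitz constant $L$ with the actual gradient norms observed along the trajectory.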

Bibliographic Details
Main authors: Mou, Wenlong; Wang, Liwei; Zhai, Xiyu; Zheng, Kai
Format: Article
Language: English
Published: 2017-07-19
Subjects: Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
DOI: 10.48550/arxiv.1707.05947
Online access: Full text at https://arxiv.org/abs/1707.05947
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read)