Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impact of stochastic gradient methods on generalization error for non-convex learning problems not only has important theoretical consequences, but is also critical to the generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed via non-asymptotic, discrete-time analysis, using stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n}L\sqrt{\beta T_k}\right)$, where $L$ is the uniform Lipschitz parameter, $\beta$ is the inverse temperature, and $T_k$ is the aggregated step size. For the PAC-Bayesian theory, though the bound has a slower $O(1/\sqrt{n})$ rate, the contribution of each step is shown with an exponentially decaying factor by imposing $\ell^2$ regularization, and the uniform Lipschitz constant is also replaced by the actual norms of gradients along the trajectory. Our bounds have no implicit dependence on dimensions, norms, or other capacity measures of the parameter, which elegantly characterizes the phenomenon of "Fast Training Guarantees Generalization" in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.
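
To make the quantities in the abstract concrete, the following minimal Python sketch (not taken from the paper) implements the generic SGLD update $w \leftarrow w - \eta\,\nabla f(w) + \sqrt{2\eta/\beta}\,\xi$ and evaluates the stability-based rate $\frac{1}{n}L\sqrt{\beta T_k}$ up to constants. The toy objective, the choices beta = 10, n = 1000, L = 1, and the helper names sgld_step and stability_bound are illustrative assumptions, not notation from the paper.

    import numpy as np

    def sgld_step(w, grad, step_size, beta, rng):
        """One SGLD update: a gradient step plus Gaussian noise whose
        variance scales with step_size / beta (larger beta -> less noise)."""
        noise = rng.normal(size=w.shape) * np.sqrt(2.0 * step_size / beta)
        return w - step_size * grad(w) + noise

    def stability_bound(n, L, beta, step_sizes):
        """Stability-based generalization rate (1/n) * L * sqrt(beta * T_k),
        where T_k is the aggregated (summed) step size; constants omitted."""
        T_k = float(np.sum(step_sizes))
        return (L / n) * np.sqrt(beta * T_k)

    # Toy non-convex objective f(w) = (w^2 - 1)^2 with gradient 4w(w^2 - 1).
    rng = np.random.default_rng(0)
    grad = lambda w: 4.0 * w * (w ** 2 - 1.0)

    w = np.array([2.0])
    step_sizes = np.full(200, 1e-2)
    for eta in step_sizes:
        w = sgld_step(w, grad, eta, beta=10.0, rng=rng)

    print("final iterate:", w)
    print("stability rate:", stability_bound(n=1000, L=1.0, beta=10.0, step_sizes=step_sizes))

As the abstract states, the bound shrinks when the aggregated step size $T_k$ is small relative to the sample size $n$, which is the sense in which fast training guarantees generalization; the PAC-Bayesian bound instead replaces the uniform Lipschitz constant $L$ with the actual gradient norms observed along the trajectory.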

Bibliographic Details
Main authors: Mou, Wenlong; Wang, Liwei; Zhai, Xiyu; Zheng, Kai
Format: Article
Language: English
Published: 2017-07-19
Subjects: Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
DOI: 10.48550/arxiv.1707.05947
Online access: Full text at https://arxiv.org/abs/1707.05947
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read)