Black-Box Generalization: Stability of Zeroth-Order Learning
creator | Nikolakakis, Konstantinos E ; Haddadpour, Farzin ; Kalogerias, Dionysios S ; Karbasi, Amin |
description | We provide the first generalization error analysis for black-box learning
through derivative-free optimization. Under the assumption of a Lipschitz and
smooth unknown loss, we consider the Zeroth-order Stochastic Search (ZoSS)
algorithm, which updates a $d$-dimensional model by replacing stochastic
gradient directions with stochastic differences of $K+1$ perturbed loss
evaluations per dataset (example) query. For both unbounded and bounded,
possibly nonconvex, losses, we present the first generalization bounds for the
ZoSS algorithm. These bounds coincide with those for SGD and, rather
surprisingly, are independent of $d$, $K$, and the batch size $m$, under
appropriate choices of a slightly decreased learning rate. For bounded
nonconvex losses and a batch size $m=1$, we additionally show that both the
generalization error and the learning rate are independent of $d$ and $K$, and
remain essentially the same as for SGD, even with only two function
evaluations. Our results extensively extend, and consistently recover,
established results for SGD in prior work, on both generalization bounds and
the corresponding learning rates. If additionally $m=n$, where $n$ is the
dataset size, we derive generalization guarantees for full-batch GD as well. |
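
The description above specifies the ZoSS update only at a high level (stochastic gradient directions replaced by stochastic differences of $K+1$ perturbed loss evaluations per example query). The sketch below is included for orientation only: it shows a minimal zeroth-order step of this general flavor, assuming Gaussian perturbation directions and a finite-difference estimate; the names `loss_fn`, `mu`, and `lr` are illustrative choices, not taken from the paper.

```python
import numpy as np

def zoss_style_step(w, loss_fn, example, K, mu=1e-3, lr=1e-2, rng=None):
    """Illustrative zeroth-order step: build a descent direction for the
    d-dimensional model w from K+1 black-box loss evaluations on a single
    example, then take an SGD-like step without any gradient access."""
    rng = np.random.default_rng() if rng is None else rng
    base = loss_fn(w, example)            # one evaluation at the current point
    g_hat = np.zeros_like(w)
    for _ in range(K):                    # K additional perturbed evaluations
        u = rng.standard_normal(w.shape)  # random perturbation direction
        g_hat += (loss_fn(w + mu * u, example) - base) / mu * u
    g_hat /= K                            # average the finite-difference estimates
    return w - lr * g_hat                 # gradient direction replaced by g_hat
```

With $K=1$ this uses only two function evaluations per query, the regime in which the description notes that both the generalization error and the learning rate remain essentially the same as for SGD.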
doi_str_mv | 10.48550/arxiv.2202.06880 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2202.06880 |
language | eng |
recordid | cdi_arxiv_primary_2202_06880 |
source | arXiv.org |
subjects | Computer Science - Learning ; Mathematics - Optimization and Control ; Statistics - Machine Learning |
title | Black-Box Generalization: Stability of Zeroth-Order Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T13%3A45%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Black-Box%20Generalization:%20Stability%20of%20Zeroth-Order%20Learning&rft.au=Nikolakakis,%20Konstantinos%20E&rft.date=2022-02-14&rft_id=info:doi/10.48550/arxiv.2202.06880&rft_dat=%3Carxiv_GOX%3E2202_06880%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |