Distributed Learning in Non-Convex Environments-Part I: Agreement at a Linear Rate

Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. Most available works on distributed non-convex optimization problems focus on the deterministic setting where exact gradients are available at each agent. In this work and its Part II, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning strategy continues to yield meaningful estimates in non-convex scenarios, in the sense that the iterates generated by the individual agents cluster in a small region around the network centroid. We use this insight to motivate a short-term model for network evolution over a finite horizon. In Part II of this work, we leverage this model to establish descent of the diffusion strategy through saddle points in O(1/μ) steps, where μ denotes the step-size, and the return of approximately second-order stationary points in a polynomial number of iterations.
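The diffusion strategy referenced in the abstract is, in its adapt-then-combine form, a two-step update: each agent first takes a local stochastic-gradient step and then averages its intermediate iterate with those of its neighbors through a combination matrix. The sketch below is a minimal illustration of that structure, not the authors' code; the quadratic local costs, the ring-network combination matrix A, and all parameter values (K, d, mu, noise level, iteration count) are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d = 10, 2          # number of agents, dimension of the parameter vector
mu = 0.01             # step-size (the "mu" in the O(1/mu) statement)

# Hypothetical doubly-stochastic combination matrix over a ring network.
A = np.zeros((K, K))
for k in range(K):
    A[k, k] = 0.5
    A[k, (k - 1) % K] = 0.25
    A[k, (k + 1) % K] = 0.25

# Hypothetical local costs J_k(w) = 0.5 * ||w - w_k||^2 with noisy gradients,
# standing in for the stochastic gradient approximations in the abstract.
targets = rng.normal(size=(K, d))

def stochastic_grad(w, k, noise_std=0.1):
    """Exact gradient of the assumed quadratic cost plus additive gradient noise."""
    return (w - targets[k]) + noise_std * rng.normal(size=d)

# Adapt-then-combine (ATC) diffusion iteration.
W = rng.normal(size=(K, d))          # one row per agent
for _ in range(2000):
    psi = np.array([W[k] - mu * stochastic_grad(W[k], k) for k in range(K)])  # adapt
    W = A @ psi                                                               # combine

centroid = W.mean(axis=0)
print("max distance of any agent from the network centroid:",
      np.max(np.linalg.norm(W - centroid, axis=1)))
```

For small step-sizes, the printed dispersion stays small relative to the motion of the centroid itself, which is the agreement ("clustering around the network centroid") behavior that Part I quantifies.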

Detailed description

Bibliographic details
Published in: IEEE transactions on signal processing, 2021, Vol. 69, p. 1242-1256
Main authors: Vlaski, Stefan; Sayed, Ali H.
Format: Article
Language: English
Subjects:
Online access: Order full text
container_end_page 1256
container_issue
container_start_page 1242
container_title IEEE transactions on signal processing
container_volume 69
creator Vlaski, Stefan
Sayed, Ali H.
description Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. Most available works on distributed non-convex optimization problems focus on the deterministic setting where exact gradients are available at each agent. In this work and its Part II, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning strategy continues to yield meaningful estimates in non-convex scenarios, in the sense that the iterates generated by the individual agents cluster in a small region around the network centroid. We use this insight to motivate a short-term model for network evolution over a finite horizon. In Part II of this work, we leverage this model to establish descent of the diffusion strategy through saddle points in O(1/μ) steps, where μ denotes the step-size, and the return of approximately second-order stationary points in a polynomial number of iterations.
doi_str_mv 10.1109/TSP.2021.3050858
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1053-587X
ispartof IEEE transactions on signal processing, 2021, Vol.69, p.1242-1256
issn 1053-587X
1941-0476
language eng
recordid cdi_proquest_journals_2493601095
source IEEE Electronic Library (IEL)
subjects adaptation
Aggregates
Algorithms
Annealing
Centroids
Computational geometry
Convexity
Cost function
Descent
diffusion learning
distributed optimization
Eigenvalues and eigenfunctions
gradient noise
Heuristic algorithms
Machine learning
non-convex cost
Optimization
Polynomials
Saddle points
Signal processing
Signal processing algorithms
stationary points
Stochastic optimization
Stochastic processes
title Distributed Learning in Non-Convex Environments-Part I: Agreement at a Linear Rate