Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
creator | Chen, Xiangyi; Chen, Tiancong; Sun, Haoran; Wu, Zhiwei Steven; Hong, Mingyi |
description | Recently, there has been growing interest in the study of median-based
algorithms for distributed non-convex optimization. Two prominent such
algorithms are signSGD with majority vote, an effective approach for
communication reduction via 1-bit compression of the local gradients, and
medianSGD, an algorithm recently proposed to ensure robustness against
Byzantine workers. The convergence analyses for these algorithms critically
rely on the assumption that all the distributed data are drawn iid from the
same distribution. However, in applications such as Federated Learning, the
data across different nodes or machines can be inherently heterogeneous,
which violates this iid assumption. This work analyzes signSGD and medianSGD
in distributed settings with heterogeneous data. We show that these
algorithms are non-convergent whenever there is some disparity between the
expected median and mean of the local gradients. To overcome this gap, we
provide a novel gradient correction mechanism that perturbs the local
gradients with noise, together with a series of results that provably close
the gap between the mean and median of the gradients. The proposed methods
largely preserve the nice properties of the original algorithms, such as the
low per-iteration communication complexity of signSGD, and further enjoy
global convergence to stationary solutions. Our perturbation technique can be
of independent interest when one wishes to estimate the mean through a median
estimator. |
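The two ingredients the abstract describes, 1-bit majority-vote aggregation and noise
perturbation of the local gradients, can be illustrated in a few lines. The sketch below
is a minimal NumPy toy under assumed details: the uniform perturbation noise, the
3-worker scalar example, and the helper name `sign_sgd_majority_vote` are all
illustrative choices, not the paper's actual algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sgd_majority_vote(local_grads, noise_scale=0.0):
    """One aggregation round: 1-bit compression + coordinate-wise majority vote.

    local_grads: (num_workers, dim) array of local stochastic gradients.
    noise_scale: if > 0, each worker adds uniform noise before taking the
        sign; in expectation this biases the vote toward the sign of the
        mean gradient rather than the median (illustrative choice of noise).
    """
    if noise_scale > 0.0:
        local_grads = local_grads + rng.uniform(
            -noise_scale, noise_scale, size=local_grads.shape)
    bits = np.sign(local_grads)       # each worker sends 1 bit per coordinate
    return np.sign(bits.sum(axis=0))  # vote = sign of the median (odd worker count)

# Heterogeneous data: the median and mean of the local gradients disagree.
grads = np.array([[0.1], [0.2], [-4.0]])  # median = 0.1 > 0, mean ~ -1.23 < 0

print(sign_sgd_majority_vote(grads))  # [1.]: the vote follows the median, not the mean

# With perturbation, the average vote over many rounds tracks the mean's sign.
votes = [sign_sgd_majority_vote(grads, noise_scale=8.0) for _ in range(2000)]
print(np.mean(votes))  # roughly -0.23: negative, like the mean gradient
```

Without noise, the vote is stuck at +1 even though the mean gradient points the other
way, which is exactly the median/mean gap the paper identifies; perturbation makes the
expected vote track the mean, at the cost of extra variance per round.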
doi_str_mv | 10.48550/arxiv.1906.01736 |
format | Article |
creationdate | 2019-06-04 |
oa | free_for_read |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1906.01736 |
language | eng |
recordid | cdi_arxiv_primary_1906_01736 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning |
title | Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T23%3A24%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Distributed%20Training%20with%20Heterogeneous%20Data:%20Bridging%20Median-%20and%20Mean-Based%20Algorithms&rft.au=Chen,%20Xiangyi&rft.date=2019-06-04&rft_id=info:doi/10.48550/arxiv.1906.01736&rft_dat=%3Carxiv_GOX%3E1906_01736%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |