Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
creator | Chen, Xiangyi; Chen, Tiancong; Sun, Haoran; Wu, Zhiwei Steven; Hong, Mingyi |
description | Recently, there has been growing interest in the study of median-based
algorithms for distributed non-convex optimization. Two prominent such
algorithms are signSGD with majority vote, an effective approach for
communication reduction via 1-bit compression of the local gradients, and
medianSGD, an algorithm recently proposed to ensure robustness against
Byzantine workers. The convergence analyses for these algorithms critically
rely on the assumption that all the distributed data are drawn iid from the
same distribution. However, in applications such as Federated Learning, the
data across different nodes or machines can be inherently heterogeneous,
which violates this iid assumption. This work analyzes signSGD and medianSGD
in distributed settings with heterogeneous data. We show that these
algorithms are non-convergent whenever there is some disparity between the
expected median and mean of the local gradients. To overcome this gap, we
provide a novel gradient correction mechanism that perturbs the local
gradients with noise, together with a series of results that provably close
the gap between the mean and median of the gradients. The proposed methods
largely preserve the nice properties of the original algorithms, such as the
low per-iteration communication complexity of signSGD, and further enjoy
global convergence to stationary solutions. Our perturbation technique can be
of independent interest when one wishes to estimate the mean through a median
estimator. |
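The two ingredients the abstract describes, 1-bit majority-vote aggregation and noise
perturbation of the local gradients, can be illustrated in a few lines. The sketch below
is a minimal NumPy toy under assumed details: the uniform perturbation noise, the
3-worker scalar example, and the helper name `sign_sgd_majority_vote` are all
illustrative choices, not the paper's actual algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sgd_majority_vote(local_grads, noise_scale=0.0):
    """One aggregation round: 1-bit compression + coordinate-wise majority vote.

    local_grads: (num_workers, dim) array of local stochastic gradients.
    noise_scale: if > 0, each worker adds uniform noise before taking the
        sign; in expectation this biases the vote toward the sign of the
        mean gradient rather than the median (illustrative choice of noise).
    """
    if noise_scale > 0.0:
        local_grads = local_grads + rng.uniform(
            -noise_scale, noise_scale, size=local_grads.shape)
    bits = np.sign(local_grads)       # each worker sends 1 bit per coordinate
    return np.sign(bits.sum(axis=0))  # vote = sign of the median (odd worker count)

# Heterogeneous data: the median and mean of the local gradients disagree.
grads = np.array([[0.1], [0.2], [-4.0]])  # median = 0.1 > 0, mean ~ -1.23 < 0

print(sign_sgd_majority_vote(grads))  # [1.]: the vote follows the median, not the mean

# With perturbation, the average vote over many rounds tracks the mean's sign.
votes = [sign_sgd_majority_vote(grads, noise_scale=8.0) for _ in range(2000)]
print(np.mean(votes))  # roughly -0.23: negative, like the mean gradient
```

Without noise, the vote is stuck at +1 even though the mean gradient points the other
way, which is exactly the median/mean gap the paper identifies; perturbation makes the
expected vote track the mean, at the cost of extra variance per round.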
doi_str_mv | 10.48550/arxiv.1906.01736 |
format | Article |
creationdate | 2019-06-04 |
oa | free_for_read |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.1906.01736 |
language | eng |
recordid | cdi_arxiv_primary_1906_01736 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning; Mathematics - Optimization and Control; Statistics - Machine Learning |
title | Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T23%3A24%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Distributed%20Training%20with%20Heterogeneous%20Data:%20Bridging%20Median-%20and%20Mean-Based%20Algorithms&rft.au=Chen,%20Xiangyi&rft.date=2019-06-04&rft_id=info:doi/10.48550/arxiv.1906.01736&rft_dat=%3Carxiv_GOX%3E1906_01736%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |