Distributed Inference for Linear Support Vector Machine

The growing size of modern data brings many new challenges to existing statistical inference methodologies and theories, and calls for the development of distributed inferential approaches. This paper studies distributed inference for the linear support vector machine (SVM) in the binary classification task. Despite a vast literature on SVM, much less is known about its inferential properties, especially in a distributed setting. In this paper, we propose a multi-round distributed linear-type (MDL) estimator for conducting inference for linear SVM. The proposed estimator is computationally efficient: it only requires an initial SVM estimator and then successively refines it by solving a simple weighted least squares problem. Theoretically, we establish the Bahadur representation of the estimator. Based on this representation, the asymptotic normality is further derived, which shows that the MDL estimator achieves the optimal statistical efficiency, i.e., the same efficiency as the classical linear SVM applied to the entire data set in a single-machine setup. Moreover, our asymptotic result avoids conditions on the number of machines or data batches, which are commonly assumed in the distributed estimation literature, and allows the case of diverging dimension. We provide simulation studies to demonstrate the performance of the proposed MDL estimator.
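
To make the refinement step concrete, the sketch below illustrates the general recipe the abstract describes: start from a pilot SVM fit, then repeatedly take a Newton-type step on a smoothed hinge loss, which amounts to solving one weighted least-squares-type linear system per round. This is a minimal single-machine sketch, not the authors' implementation: the logistic smoother, the bandwidth h, the ridge term lam, and the function names initial_svm and mdl_round are all illustrative assumptions, and the paper's exact weighting scheme and distributed protocol differ.

```python
# Illustrative sketch only: multi-round refinement of a linear-SVM estimator
# via weighted least-squares-type Newton steps on a smoothed hinge loss.
import numpy as np

def initial_svm(X, y, lam=1e-3, epochs=200):
    """Pilot estimator: plain subgradient descent on the regularized hinge
    loss.  Any consistent initial linear-SVM fit would play this role."""
    n, d = X.shape
    beta = np.zeros(d)
    for t in range(1, epochs + 1):
        margin = y * (X @ beta)
        active = margin < 1.0                       # points violating the margin
        grad = lam * beta - (X[active] * y[active][:, None]).sum(axis=0) / n
        beta -= grad / t                            # decaying step size
    return beta

def mdl_round(X, y, beta, lam=1e-3, h=0.5):
    """One refinement round: a Newton step on a softplus-smoothed hinge loss,
    i.e. solve the weighted linear system H beta_new = H beta - g.  In a
    distributed run, each machine would ship only its local d x d weighted
    Gram matrix and d-vector gradient contribution to the master."""
    n, d = X.shape
    r = 1.0 - y * (X @ beta)                        # hinge residuals
    s = 1.0 / (1.0 + np.exp(-r / h))                # smoothed indicator 1{r > 0}
    w = s * (1.0 - s) / h                           # curvature weights, peaked near the margin
    g = lam * beta - (X * (s * y)[:, None]).mean(axis=0)
    H = (X * w[:, None]).T @ X / n + lam * np.eye(d)
    return beta - np.linalg.solve(H, g)

# Toy demo: recover the direction of a separating hyperplane.
rng = np.random.default_rng(0)
n, d = 5000, 5
X = rng.normal(size=(n, d))
beta_star = np.ones(d) / np.sqrt(d)
y = np.where(X @ beta_star + 0.3 * rng.normal(size=n) > 0, 1.0, -1.0)

beta = initial_svm(X, y)
for _ in range(3):                                  # a few refinement rounds
    beta = mdl_round(X, y, beta)
print(np.round(beta / np.linalg.norm(beta), 3))     # should roughly align with beta_star
```

The key communication property, under these assumptions, is that each round only moves O(d^2) summary statistics per machine rather than raw data, and the per-round cost on the master is a single d x d linear solve.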

Bibliographic Details

Published in: arXiv.org, 2019-09
Main authors: Wang, Xiaozhou; Yang, Zhuoyi; Chen, Xi; Liu, Weidong
Format: Article
Language: English
Identifier: EISSN 2331-8422
Source: Free E-Journals
Subjects: Asymptotic properties; Computer simulation; Least squares method; Normality; Representations; Statistical inference; Support vector machines
Online access: Full text