Fast SVM training using data reconstruction for classification of very large datasets

This paper proposes a fast support vector machine (SVM) training method for the classification of very large datasets using data reconstruction. The idea is to scale down the training data by removing the samples that have low probability to become support vectors (SVs) in the feature space. For thi...

Bibliographic details
Published in:	IEEJ transactions on electrical and electronic engineering, 2020-03, Vol.15 (3), p.372-381
Main authors:	Liang, Peifeng; Li, Weite; Hu, Jinglu
Format:	Article
Language:	eng
Subjects:	Classification; Classifiers; Computer simulation; Datasets; fast SVM training; Iterative algorithms; Iterative methods; Kernels; large datasets; Mapping; quasi‐linear kernel; Reconstruction; Separation; support vector machine; Support vector machines; Training; training data reconstruction
Online access:	Full text

container_end_page	381
container_issue	3
container_start_page	372
container_title	IEEJ transactions on electrical and electronic engineering
container_volume	15
creator	Liang, Peifeng; Li, Weite; Hu, Jinglu
description	This paper proposes a fast support vector machine (SVM) training method for the classification of very large datasets using data reconstruction. The idea is to scale down the training data by removing the samples that have low probability to become support vectors (SVs) in the feature space. For this purpose, it applies a series of gradually refined rough SVM classifiers with a quasi‐linear kernel to build rough separation boundaries and remove those samples that are far away from the boundary. In order to make the proposed algorithm efficient for both low‐dimensional and high‐dimensional datasets, efforts are made on three aspects. The first one is to compose a quasi‐linear kernel using the information of data manifold and potential separation boundary such that the samples mapped to feature space keep a sparse distribution, especially in the direction perpendicular to the separation boundary. The second one is to avoid computing Euclidean distances between samples, which may lose its effect on very high dimensional datasets when mapping the samples to feature space and selecting the samples for training data reconstruction. The third one is to design a sophisticated iterative algorithm to gradually refine the rough SVM classifier so as to remove non‐SVs efficiently. The proposed fast SVM training method is applied to different real‐world large datasets and compared with different methods, and simulation results confirm the effectiveness of the proposed method, especially for very high dimensional datasets. © 2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
doi_str_mv	10.1002/tee.23065
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2350265721</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2350265721</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3635-5bd76aff6badf20dd8db84d223f9ff9c327b30c908cbdfe7ab3ddbbf4e4d34b53</originalsourceid><addsrcrecordid>eNp1kEtLAzEUhYMoWKsL_0HAlYtpk9x5LqXUB1RcWN2GPEvKOKlJRum_d9oRd27uvRy-cw8chK4pmVFC2DwZM2NAyuIETWgDNMubmp7-3RWco4sYt4TkJdT1BL3di5jw6_szTkG4znUb3MfD1CIJHIzyXUyhV8n5DlsfsGpFjM46JY6St_jLhD1uRdiYoymaFC_RmRVtNFe_ezrkLNeLx2z18vC0uFtlCkooskLqqhTWllJoy4jWtZZ1rhkD21jbKGCVBKIaUiupramEBK2ltLnJNeSygCm6Gf_ugv_sTUx86_vQDZGcQUFYWVSMDtTtSKngYwzG8l1wHyLsOSX80BofWuPH1gZ2PrLfrjX7_0G-Xi5Hxw9kKXB2</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2350265721</pqid></control><display><type>article</type><title>Fast SVM training using data reconstruction for classification of very large datasets</title><source>Access via Wiley Online Library</source><creator>Liang, Peifeng ; Li, Weite ; Hu, Jinglu</creator><creatorcontrib>Liang, Peifeng ; Li, Weite ; Hu, Jinglu</creatorcontrib><description>This paper proposes a fast support vector machine (SVM) training method for the classification of very large datasets using data reconstruction. The idea is to scale down the training data by removing the samples that have low probability to become support vectors (SVs) in the feature space. For this purpose, it applies a series of gradually refined rough SVM classifiers with a quasi‐linear kernel to build rough separation boundaries and remove those samples that are far away from the boundary. In order to make the proposed algorithm efficient for both low‐dimensional and high‐dimensional datasets, efforts are made on three aspects. 
The first one is to compose a quasi‐linear kernel using the information of data manifold and potential separation boundary such that the samples mapped to feature space keep a sparse distribution, especially in the direction perpendicular to the separation boundary. The second one is to avoid computing Euclidean distances between samples, which may lose its effect on very high dimensional datasets when mapping the samples to feature space and selecting the samples for training data reconstruction. The third one is to design a sophisticated iterative algorithm to gradually refine the rough SVM classifier so as to remove non‐SVs efficiently. The proposed fast SVM training method is applied to different real‐world large datasets and compared with different methods, and simulation results confirm the effectiveness of the proposed method, especially for very high dimensional datasets. © 2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</description><identifier>ISSN: 1931-4973</identifier><identifier>EISSN: 1931-4981</identifier><identifier>DOI: 10.1002/tee.23065</identifier><language>eng</language><publisher>Hoboken, USA: John Wiley & Sons, Inc</publisher><subject>Classification ; Classifiers ; Computer simulation ; Datasets ; fast SVM training ; Iterative algorithms ; Iterative methods ; Kernels ; large datasets ; Mapping ; quasi‐linear kernel ; Reconstruction ; Separation ; support vector machine ; Support vector machines ; Training ; training data reconstruction</subject><ispartof>IEEJ transactions on electrical and electronic engineering, 2020-03, Vol.15 (3), p.372-381</ispartof><rights>2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</rights><rights>Copyright © 2020 Institute of Electrical Engineers of Japan. 
Published by John Wiley & Sons, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3635-5bd76aff6badf20dd8db84d223f9ff9c327b30c908cbdfe7ab3ddbbf4e4d34b53</citedby><cites>FETCH-LOGICAL-c3635-5bd76aff6badf20dd8db84d223f9ff9c327b30c908cbdfe7ab3ddbbf4e4d34b53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Ftee.23065$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Ftee.23065$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids></links><search><creatorcontrib>Liang, Peifeng</creatorcontrib><creatorcontrib>Li, Weite</creatorcontrib><creatorcontrib>Hu, Jinglu</creatorcontrib><title>Fast SVM training using data reconstruction for classification of very large datasets</title><title>IEEJ transactions on electrical and electronic engineering</title><description>This paper proposes a fast support vector machine (SVM) training method for the classification of very large datasets using data reconstruction. The idea is to scale down the training data by removing the samples that have low probability to become support vectors (SVs) in the feature space. For this purpose, it applies a series of gradually refined rough SVM classifiers with a quasi‐linear kernel to build rough separation boundaries and remove those samples that are far away from the boundary. In order to make the proposed algorithm efficient for both low‐dimensional and high‐dimensional datasets, efforts are made on three aspects. 
The first one is to compose a quasi‐linear kernel using the information of data manifold and potential separation boundary such that the samples mapped to feature space keep a sparse distribution, especially in the direction perpendicular to the separation boundary. The second one is to avoid computing Euclidean distances between samples, which may lose its effect on very high dimensional datasets when mapping the samples to feature space and selecting the samples for training data reconstruction. The third one is to design a sophisticated iterative algorithm to gradually refine the rough SVM classifier so as to remove non‐SVs efficiently. The proposed fast SVM training method is applied to different real‐world large datasets and compared with different methods, and simulation results confirm the effectiveness of the proposed method, especially for very high dimensional datasets. © 2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</description><subject>Classification</subject><subject>Classifiers</subject><subject>Computer simulation</subject><subject>Datasets</subject><subject>fast SVM training</subject><subject>Iterative algorithms</subject><subject>Iterative methods</subject><subject>Kernels</subject><subject>large datasets</subject><subject>Mapping</subject><subject>quasi‐linear kernel</subject><subject>Reconstruction</subject><subject>Separation</subject><subject>support vector machine</subject><subject>Support vector machines</subject><subject>Training</subject><subject>training data 
reconstruction</subject><issn>1931-4973</issn><issn>1931-4981</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp1kEtLAzEUhYMoWKsL_0HAlYtpk9x5LqXUB1RcWN2GPEvKOKlJRum_d9oRd27uvRy-cw8chK4pmVFC2DwZM2NAyuIETWgDNMubmp7-3RWco4sYt4TkJdT1BL3di5jw6_szTkG4znUb3MfD1CIJHIzyXUyhV8n5DlsfsGpFjM46JY6St_jLhD1uRdiYoymaFC_RmRVtNFe_ezrkLNeLx2z18vC0uFtlCkooskLqqhTWllJoy4jWtZZ1rhkD21jbKGCVBKIaUiupramEBK2ltLnJNeSygCm6Gf_ugv_sTUx86_vQDZGcQUFYWVSMDtTtSKngYwzG8l1wHyLsOSX80BofWuPH1gZ2PrLfrjX7_0G-Xi5Hxw9kKXB2</recordid><startdate>202003</startdate><enddate>202003</enddate><creator>Liang, Peifeng</creator><creator>Li, Weite</creator><creator>Hu, Jinglu</creator><general>John Wiley & Sons, Inc</general><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope></search><sort><creationdate>202003</creationdate><title>Fast SVM training using data reconstruction for classification of very large datasets</title><author>Liang, Peifeng ; Li, Weite ; Hu, Jinglu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3635-5bd76aff6badf20dd8db84d223f9ff9c327b30c908cbdfe7ab3ddbbf4e4d34b53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Classification</topic><topic>Classifiers</topic><topic>Computer simulation</topic><topic>Datasets</topic><topic>fast SVM training</topic><topic>Iterative algorithms</topic><topic>Iterative methods</topic><topic>Kernels</topic><topic>large datasets</topic><topic>Mapping</topic><topic>quasi‐linear kernel</topic><topic>Reconstruction</topic><topic>Separation</topic><topic>support vector machine</topic><topic>Support vector machines</topic><topic>Training</topic><topic>training data reconstruction</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liang, 
Peifeng</creatorcontrib><creatorcontrib>Li, Weite</creatorcontrib><creatorcontrib>Hu, Jinglu</creatorcontrib><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEJ transactions on electrical and electronic engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liang, Peifeng</au><au>Li, Weite</au><au>Hu, Jinglu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fast SVM training using data reconstruction for classification of very large datasets</atitle><jtitle>IEEJ transactions on electrical and electronic engineering</jtitle><date>2020-03</date><risdate>2020</risdate><volume>15</volume><issue>3</issue><spage>372</spage><epage>381</epage><pages>372-381</pages><issn>1931-4973</issn><eissn>1931-4981</eissn><abstract>This paper proposes a fast support vector machine (SVM) training method for the classification of very large datasets using data reconstruction. The idea is to scale down the training data by removing the samples that have low probability to become support vectors (SVs) in the feature space. For this purpose, it applies a series of gradually refined rough SVM classifiers with a quasi‐linear kernel to build rough separation boundaries and remove those samples that are far away from the boundary. In order to make the proposed algorithm efficient for both low‐dimensional and high‐dimensional datasets, efforts are made on three aspects. The first one is to compose a quasi‐linear kernel using the information of data manifold and potential separation boundary such that the samples mapped to feature space keep a sparse distribution, especially in the direction perpendicular to the separation boundary. 
The second one is to avoid computing Euclidean distances between samples, which may lose its effect on very high dimensional datasets when mapping the samples to feature space and selecting the samples for training data reconstruction. The third one is to design a sophisticated iterative algorithm to gradually refine the rough SVM classifier so as to remove non‐SVs efficiently. The proposed fast SVM training method is applied to different real‐world large datasets and compared with different methods, and simulation results confirm the effectiveness of the proposed method, especially for very high dimensional datasets. © 2019 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.</abstract><cop>Hoboken, USA</cop><pub>John Wiley & Sons, Inc</pub><doi>10.1002/tee.23065</doi><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1931-4973
ispartof	IEEJ transactions on electrical and electronic engineering, 2020-03, Vol.15 (3), p.372-381
issn	1931-4973; 1931-4981
language	eng
recordid	cdi_proquest_journals_2350265721
source	Access via Wiley Online Library
subjects	Classification; Classifiers; Computer simulation; Datasets; fast SVM training; Iterative algorithms; Iterative methods; Kernels; large datasets; Mapping; quasi‐linear kernel; Reconstruction; Separation; support vector machine; Support vector machines; Training; training data reconstruction
title	Fast SVM training using data reconstruction for classification of very large datasets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T01%3A10%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fast%20SVM%20training%20using%20data%20reconstruction%20for%20classification%20of%20very%20large%20datasets&rft.jtitle=IEEJ%20transactions%20on%20electrical%20and%20electronic%20engineering&rft.au=Liang,%20Peifeng&rft.date=2020-03&rft.volume=15&rft.issue=3&rft.spage=372&rft.epage=381&rft.pages=372-381&rft.issn=1931-4973&rft.eissn=1931-4981&rft_id=info:doi/10.1002/tee.23065&rft_dat=%3Cproquest_cross%3E2350265721%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2350265721&rft_id=info:pmid/&rfr_iscdi=true