Online Feature Selection of Class Imbalance via PA Algorithm
Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques...
Published in: | Journal of computer science and technology 2016-07, Vol.31 (4), p.673-682 |
---|---|
Main authors: | Han, Chao ; Tan, Yun-Kun ; Zhu, Jin-Hui ; Guo, Yong ; Chen, Jian ; Wu, Qing-Yao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 682 |
---|---|
container_issue | 4 |
container_start_page | 673 |
container_title | Journal of computer science and technology |
container_volume | 31 |
creator | Han, Chao ; Tan, Yun-Kun ; Zhu, Jin-Hui ; Guo, Yong ; Chen, Jian ; Wu, Qing-Yao |
description | Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for the high-dimensional classification task in a manner which greatly improves the classification performance and the computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such online learner into an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms based on several real-world datasets, and our experimental results have demonstrated the effectiveness of the proposed algorithms in comparison with the baselines. |
doi_str_mv | 10.1007/s11390-016-1656-0 |
format | Article |
fullrecord | <record><control><sourceid>wanfang_jour_proqu</sourceid><recordid>TN_cdi_wanfang_journals_jsjkxjsxb_e201604005</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><cqvip_id>669375802</cqvip_id><wanfj_id>jsjkxjsxb_e201604005</wanfj_id><sourcerecordid>jsjkxjsxb_e201604005</sourcerecordid><originalsourceid>FETCH-LOGICAL-c413t-89a9dc3f46c1537b48b588c034c504194da773c0707d9b85fd2392ab0d4885813</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhCMEEs8fwC2CCwcC6_gtcakqCpWQigScLcdx2oTUoXbK49_jKhVCHLis9_DNjHeS5BTBFQLg1wEhLCEDxDLEKMtgJzlAgkFGOJG7cQeATMaxnxyG0ABgDoQcJDcz19bOphOr-7W36ZNtrenrzqVdlY5bHUI6XRa61c7Y9L3W6eMoHbXzztf9Ynmc7FW6DfZk-x4lL5Pb5_F99jC7m45HD5khCPeZkFqWBleEGUQxL4goqBAGMDEUCJKk1JxjAxx4KQtBqzLHMtcFlEQIKhA-Si4H3w_tKu3mqunW3sVE1YTm9bMJn4WyebwdCACN-MWAv_lutbahV8s6GNvGI2y3DgqJnFLKmcARPf-D_lgjARFkFG0oNFDGdyF4W6k3Xy-1_1II1KZ-NdSv4hfUpn4FUZMPmhBZN7f-l_M_orNt0KJz81XU_SQxJjGnAnL8DZioj2I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1801826513</pqid></control><display><type>article</type><title>Online Feature Selection of Class Imbalance via PA Algorithm</title><source>SpringerLink Journals - AutoHoldings</source><creator>Han, Chao ; Tan, Yun-Kun ; Zhu, Jin-Hui ; Guo, Yong ; Chen, Jian ; Wu, Qing-Yao</creator><creatorcontrib>Han, Chao ; Tan, Yun-Kun ; Zhu, Jin-Hui ; Guo, Yong ; Chen, Jian ; Wu, Qing-Yao</creatorcontrib><description>Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for the high-dimensional classification task in a manner which greatly improves the classification performance and the computational efficiency. 
However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such online learner into an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms based on several real-world datasets, and our experimental results have demonstrated the effectiveness of the proposed algorithms in comparison with the baselines.</description><identifier>ISSN: 1000-9000</identifier><identifier>EISSN: 1860-4749</identifier><identifier>DOI: 10.1007/s11390-016-1656-0</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Artificial Intelligence ; Classification ; Computational efficiency ; Computer Science ; Construction ; Data mining ; Data Structures and Information Theory ; Datasets ; Distance learning ; Feature selection ; Information management ; Information Systems Applications (incl.Internet) ; Machine learning ; Methods ; Online ; Optimization ; Passive-aggressive behavior ; Performance evaluation ; Regular Paper ; Sampling techniques ; Software ; Software Engineering ; Studies ; Tasks ; Theory of Computation ; Training ; Websites ; 不平衡数据 ; 分类技术 ; 分类问题 ; 功放 ; 在线学习算法 ; 机器学习 ; 特征选择 ; 计算效率</subject><ispartof>Journal of computer science and technology, 2016-07, Vol.31 (4), p.673-682</ispartof><rights>Springer 
Science+Business Media New York 2016</rights><rights>Springer Science+Business Media New York 2016.</rights><rights>Copyright © Wanfang Data Co. Ltd. All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c413t-89a9dc3f46c1537b48b588c034c504194da773c0707d9b85fd2392ab0d4885813</citedby><cites>FETCH-LOGICAL-c413t-89a9dc3f46c1537b48b588c034c504194da773c0707d9b85fd2392ab0d4885813</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://image.cqvip.com/vip1000/qk/85226X/85226X.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11390-016-1656-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11390-016-1656-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,777,781,27905,27906,41469,42538,51300</link.rule.ids></links><search><creatorcontrib>Han, Chao</creatorcontrib><creatorcontrib>Tan, Yun-Kun</creatorcontrib><creatorcontrib>Zhu, Jin-Hui</creatorcontrib><creatorcontrib>Guo, Yong</creatorcontrib><creatorcontrib>Chen, Jian</creatorcontrib><creatorcontrib>Wu, Qing-Yao</creatorcontrib><title>Online Feature Selection of Class Imbalance via PA Algorithm</title><title>Journal of computer science and technology</title><addtitle>J. Comput. Sci. Technol</addtitle><addtitle>Journal of Computer Science and Technology</addtitle><description>Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for the high-dimensional classification task in a manner which greatly improves the classification performance and the computational efficiency. 
However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such online learner into an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms based on several real-world datasets, and our experimental results have demonstrated the effectiveness of the proposed algorithms in comparison with the baselines.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Classification</subject><subject>Computational efficiency</subject><subject>Computer Science</subject><subject>Construction</subject><subject>Data mining</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Distance learning</subject><subject>Feature selection</subject><subject>Information management</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Online</subject><subject>Optimization</subject><subject>Passive-aggressive behavior</subject><subject>Performance evaluation</subject><subject>Regular Paper</subject><subject>Sampling techniques</subject><subject>Software</subject><subject>Software Engineering</subject><subject>Studies</subject><subject>Tasks</subject><subject>Theory of 
Computation</subject><subject>Training</subject><subject>Websites</subject><subject>不平衡数据</subject><subject>分类技术</subject><subject>分类问题</subject><subject>功放</subject><subject>在线学习算法</subject><subject>机器学习</subject><subject>特征选择</subject><subject>计算效率</subject><issn>1000-9000</issn><issn>1860-4749</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kEtPwzAQhCMEEs8fwC2CCwcC6_gtcakqCpWQigScLcdx2oTUoXbK49_jKhVCHLis9_DNjHeS5BTBFQLg1wEhLCEDxDLEKMtgJzlAgkFGOJG7cQeATMaxnxyG0ABgDoQcJDcz19bOphOr-7W36ZNtrenrzqVdlY5bHUI6XRa61c7Y9L3W6eMoHbXzztf9Ynmc7FW6DfZk-x4lL5Pb5_F99jC7m45HD5khCPeZkFqWBleEGUQxL4goqBAGMDEUCJKk1JxjAxx4KQtBqzLHMtcFlEQIKhA-Si4H3w_tKu3mqunW3sVE1YTm9bMJn4WyebwdCACN-MWAv_lutbahV8s6GNvGI2y3DgqJnFLKmcARPf-D_lgjARFkFG0oNFDGdyF4W6k3Xy-1_1II1KZ-NdSv4hfUpn4FUZMPmhBZN7f-l_M_orNt0KJz81XU_SQxJjGnAnL8DZioj2I</recordid><startdate>20160701</startdate><enddate>20160701</enddate><creator>Han, Chao</creator><creator>Tan, Yun-Kun</creator><creator>Zhu, Jin-Hui</creator><creator>Guo, Yong</creator><creator>Chen, Jian</creator><creator>Wu, Qing-Yao</creator><general>Springer US</general><general>Springer Nature B.V</general><general>School of Software Engineering, South China University of Technology, Guangzhou 510000, 
China</general><scope>2RA</scope><scope>92L</scope><scope>CQIGP</scope><scope>W92</scope><scope>~WA</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope></search><sort><creationdate>20160701</creationdate><title>Online Feature Selection of Class Imbalance via PA Algorithm</title><author>Han, Chao ; Tan, Yun-Kun ; Zhu, Jin-Hui ; Guo, Yong ; Chen, Jian ; Wu, Qing-Yao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c413t-89a9dc3f46c1537b48b588c034c504194da773c0707d9b85fd2392ab0d4885813</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Classification</topic><topic>Computational efficiency</topic><topic>Computer Science</topic><topic>Construction</topic><topic>Data mining</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Distance learning</topic><topic>Feature 
selection</topic><topic>Information management</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Online</topic><topic>Optimization</topic><topic>Passive-aggressive behavior</topic><topic>Performance evaluation</topic><topic>Regular Paper</topic><topic>Sampling techniques</topic><topic>Software</topic><topic>Software Engineering</topic><topic>Studies</topic><topic>Tasks</topic><topic>Theory of Computation</topic><topic>Training</topic><topic>Websites</topic><topic>不平衡数据</topic><topic>分类技术</topic><topic>分类问题</topic><topic>功放</topic><topic>在线学习算法</topic><topic>机器学习</topic><topic>特征选择</topic><topic>计算效率</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Chao</creatorcontrib><creatorcontrib>Tan, Yun-Kun</creatorcontrib><creatorcontrib>Zhu, Jin-Hui</creatorcontrib><creatorcontrib>Guo, Yong</creatorcontrib><creatorcontrib>Chen, Jian</creatorcontrib><creatorcontrib>Wu, Qing-Yao</creatorcontrib><collection>中文科技期刊数据库</collection><collection>中文科技期刊数据库-CALIS站点</collection><collection>中文科技期刊数据库-7.0平台</collection><collection>中文科技期刊数据库-工程技术</collection><collection>中文科技期刊数据库- 镜像站点</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science & 
Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><collection>Wanfang Data Journals - Hong 
Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><jtitle>Journal of computer science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Chao</au><au>Tan, Yun-Kun</au><au>Zhu, Jin-Hui</au><au>Guo, Yong</au><au>Chen, Jian</au><au>Wu, Qing-Yao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Online Feature Selection of Class Imbalance via PA Algorithm</atitle><jtitle>Journal of computer science and technology</jtitle><stitle>J. Comput. Sci. Technol</stitle><addtitle>Journal of Computer Science and Technology</addtitle><date>2016-07-01</date><risdate>2016</risdate><volume>31</volume><issue>4</issue><spage>673</spage><epage>682</epage><pages>673-682</pages><issn>1000-9000</issn><eissn>1860-4749</eissn><abstract>Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for the high-dimensional classification task in a manner which greatly improves the classification performance and the computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. 
In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such online learner into an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms based on several real-world datasets, and our experimental results have demonstrated the effectiveness of the proposed algorithms in comparison with the baselines.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11390-016-1656-0</doi><tpages>10</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1000-9000 |
ispartof | Journal of computer science and technology, 2016-07, Vol.31 (4), p.673-682 |
issn | 1000-9000 ; 1860-4749 |
language | eng |
recordid | cdi_wanfang_journals_jsjkxjsxb_e201604005 |
source | SpringerLink Journals - AutoHoldings |
subjects | Accuracy ; Algorithms ; Artificial Intelligence ; Classification ; Computational efficiency ; Computer Science ; Construction ; Data mining ; Data Structures and Information Theory ; Datasets ; Distance learning ; Feature selection ; Information management ; Information Systems Applications (incl.Internet) ; Machine learning ; Methods ; Online ; Optimization ; Passive-aggressive behavior ; Performance evaluation ; Regular Paper ; Sampling techniques ; Software ; Software Engineering ; Studies ; Tasks ; Theory of Computation ; Training ; Websites ; 不平衡数据 ; 分类技术 ; 分类问题 ; 功放 ; 在线学习算法 ; 机器学习 ; 特征选择 ; 计算效率 |
title | Online Feature Selection of Class Imbalance via PA Algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T20%3A13%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Online%20Feature%20Selection%20of%20Class%20Imbalance%20via%20PA%20Algorithm&rft.jtitle=Journal%20of%20computer%20science%20and%20technology&rft.au=Han,%20Chao&rft.date=2016-07-01&rft.volume=31&rft.issue=4&rft.spage=673&rft.epage=682&rft.pages=673-682&rft.issn=1000-9000&rft.eissn=1860-4749&rft_id=info:doi/10.1007/s11390-016-1656-0&rft_dat=%3Cwanfang_jour_proqu%3Ejsjkxjsxb_e201604005%3C/wanfang_jour_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1801826513&rft_id=info:pmid/&rft_cqvip_id=669375802&rft_wanfj_id=jsjkxjsxb_e201604005&rfr_iscdi=true |
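
Illustrative note on the abstract: the description mentions an online classifier restricted to a small, fixed number of features and trained with passive-aggressive (PA) updates. The record does not give the paper's actual update rules, so the following is only a minimal sketch of the general idea: a standard PA-I step followed by hard truncation to the k largest-magnitude weights. The function names, the `pos_weight` parameter (a simple way to update more aggressively on one class), and the truncation rule are assumptions made for illustration, not the authors' algorithms.

```python
def truncate_top_k(w, k):
    """Keep the k largest-magnitude weights and zero out the rest."""
    if k >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def pa_update(w, x, y, k, C=1.0, pos_weight=1.0):
    """One PA-I step on example (x, y) with y in {-1, +1}, then top-k truncation.

    pos_weight is a hypothetical knob: it raises the step-size cap for
    y = +1 examples so that one class can be weighted more heavily.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)           # hinge loss
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)                      # passive: margin already satisfied
    cap = C * (pos_weight if y > 0 else 1.0)
    tau = min(cap, loss / sq_norm)          # aggressive: PA-I step size
    w_new = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return truncate_top_k(w_new, k)


# Usage: start from zero weights and allow a single active feature.
w = pa_update([0.0, 0.0, 0.0], [1.0, 0.5, 0.0], y=1, k=1)
```

In this sketch the weight vector never has more than k nonzero entries after an update, which is the property the abstract describes; how the paper selects features and handles imbalance may differ.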