KNN weighted reduced universum twin SVM for class imbalance learning
In real-world problems, imbalanced data poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and hence need attention to avoid bias towards a particular class. Th...
Published in: | Knowledge-based systems 2022-06, Vol.245, p.108578, Article 108578 |
---|---|
Main authors: | Ganaie, M.A. ; Tanveer, M. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 108578 |
container_title | Knowledge-based systems |
container_volume | 245 |
creator | Ganaie, M.A. ; Tanveer, M. |
description | In real-world problems, imbalanced data poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and hence need attention to avoid bias towards a particular class. Classification models like support vector machines (SVM) become biased towards the majority class samples, which results in misclassification of the minority class samples. SVM suffers because no prior information about the data is involved in the generation of the hyperplanes. Also, SVM ignores the local neighbourhood information of the samples and thus treats each sample equally when generating the hyperplanes. However, the data points may be contaminated and may mislead the generation of the hyperplanes. Inspired by the idea of prior data information and local neighbourhood information, we propose a K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL embodies the local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the corresponding constraints of the objective functions to exploit the interclass information. Oversampling and undersampling approaches are followed to balance the data in class imbalance problems. Universum data gives prior information about the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may lead to overfitting. However, the proposed KWRUTSVM-CIL model embodies a regularization term to maximize the margin and implements the structural risk minimization principle, which is the marrow of statistical learning and overcomes the issue of overfitting. 
Experimental results and the statistical analysis signify that the generalization ability of the proposed KWRUTSVM-CIL model is superior in comparison to other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer. The proposed KWRUTSVM-CIL model showed better generalization performance compared to other twin SVM based models on biomedical datasets.
•To incorporate the local neighbourhood information, K nearest neighbour based weights are used in the pr |
doi_str_mv | 10.1016/j.knosys.2022.108578 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2667853790</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705122002581</els_id><sourcerecordid>2667853790</sourcerecordid><originalsourceid>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EEqXwDzhY4pyydh52LkioPAWUA4-r5Tib4pAmxU5a9d_jKpw5jbSamdV8hJwzmDFg2WU9-247v_MzDpyHk0yFPCATJgWPRAL5IZlAnkIkIGXH5MT7GiA4mZyQm6fFgm7RLr96LKnDcjBBh9Zu0PlhRfutbenb5wutOkdNo72ndlXoRrcGaYPatbZdnpKjSjcez_50Sj7ubt_nD9Hz6_3j_Po5MrGEPsrzlEuJYJIEY15gmUnNdFLKJIdESi1AZhUvpWCMQ6a5zlghC85EXOk4RR5PycXYu3bdz4C-V3U3uDa8VDzLhExjkUNwJaPLuM57h5VaO7vSbqcYqD0vVauRl9rzUiOvELsaYxgWbCw65Y3FMLO0Dk2vys7-X_ALrgN0IA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2667853790</pqid></control><display><type>article</type><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><source>Elsevier ScienceDirect Journals</source><creator>Ganaie, M.A. ; Tanveer, M.</creator><creatorcontrib>Ganaie, M.A. ; Tanveer, M. ; Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><description>In real world problems, imbalance of data samples poses major challenge for the classification problems as the data samples of a particular class are dominating. Problems like fault and disease detection involve imbalance data and hence need attention to avoid the bias towards a particular class. The classification models like support vector machines (SVM) get biased to majority class samples and hence results in misclassification of the minority class samples. SVM suffers as no prior information related to the data is involved in the generation of hyperplanes. Also, local information of the neighbourhood is ignored in SVM samples and thus treats each sample equally for generating the hyperplanes. 
However, the data points may be contaminated and may mislead the generation of hyperplanes. Inspired by the idea of prior data information and local neighbourhood information, we propose K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL embodies the local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via weight matrix in the objective function. In proposed KWRUTSVM-CIL model, weight vectors are used in the corresponding constraints of the objective functions to exploit the interclass information. The oversampling and undersampling approaches are followed to balance the data in class imbalance problems. Universum data gives prior information of the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement empirical risk minimization principle and thus may lead to overfitting. However, the proposed KWRUTSVM-CIL model embodies regularization term to maximize the margin and implement the structural risk minimization principle which is the marrow of statistical learning and overcomes the issues of overfitting. Experimental results and the statistical analysis signify that the generalization ability of proposed KWRUTSVM-CIL model is superior in comparison to other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer disease. The proposed KWRUTSVM-CIL model showed better generalization performance compared to other twin SVM based models in biomedical datasets.
•To incorporate the local neighbourhood information, K nearest neighbourbased weights are used in the proposed KWRUTSVM-CIL.•Unlike RUTSVM-CIL, UTSVM, TSVM and FTWSVM models which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle.•Similar to RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem.•The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, while as the matrices in the Wolfe dual of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are positive semi-definite.•Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2022.108578</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Alzheimer's disease ; Class imbalance ; Classification ; Data points ; Empirical analysis ; Hyperplanes ; Imbalance ratio ; KNN weighted ; Learning ; Mathematical analysis ; Neighborhoods ; Optimization ; Oversampling ; Principles ; Rectangular kernel ; Regularization ; Statistical analysis ; Statistical methods ; Support vector machines ; Twin support vector machine ; Universum</subject><ispartof>Knowledge-based systems, 2022-06, Vol.245, p.108578, Article 108578</ispartof><rights>2022 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Jun 7, 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</citedby><cites>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</cites><orcidid>0000-0002-5727-3697</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0950705122002581$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Ganaie, M.A.</creatorcontrib><creatorcontrib>Tanveer, M.</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><title>Knowledge-based systems</title><description>In real world problems, imbalance of data samples poses major challenge for the classification problems as the data samples of a particular class are dominating. Problems like fault and disease detection involve imbalance data and hence need attention to avoid the bias towards a particular class. The classification models like support vector machines (SVM) get biased to majority class samples and hence results in misclassification of the minority class samples. SVM suffers as no prior information related to the data is involved in the generation of hyperplanes. Also, local information of the neighbourhood is ignored in SVM samples and thus treats each sample equally for generating the hyperplanes. However, the data points may be contaminated and may mislead the generation of hyperplanes. 
Inspired by the idea of prior data information and local neighbourhood information, we propose K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL embodies the local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via weight matrix in the objective function. In proposed KWRUTSVM-CIL model, weight vectors are used in the corresponding constraints of the objective functions to exploit the interclass information. The oversampling and undersampling approaches are followed to balance the data in class imbalance problems. Universum data gives prior information of the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement empirical risk minimization principle and thus may lead to overfitting. However, the proposed KWRUTSVM-CIL model embodies regularization term to maximize the margin and implement the structural risk minimization principle which is the marrow of statistical learning and overcomes the issues of overfitting. Experimental results and the statistical analysis signify that the generalization ability of proposed KWRUTSVM-CIL model is superior in comparison to other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer disease. The proposed KWRUTSVM-CIL model showed better generalization performance compared to other twin SVM based models in biomedical datasets.
•To incorporate the local neighbourhood information, K nearest neighbourbased weights are used in the proposed KWRUTSVM-CIL.•Unlike RUTSVM-CIL, UTSVM, TSVM and FTWSVM models which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle.•Similar to RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem.•The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, while as the matrices in the Wolfe dual of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are positive semi-definite.•Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</description><subject>Alzheimer's disease</subject><subject>Class imbalance</subject><subject>Classification</subject><subject>Data points</subject><subject>Empirical analysis</subject><subject>Hyperplanes</subject><subject>Imbalance ratio</subject><subject>KNN weighted</subject><subject>Learning</subject><subject>Mathematical analysis</subject><subject>Neighborhoods</subject><subject>Optimization</subject><subject>Oversampling</subject><subject>Principles</subject><subject>Rectangular kernel</subject><subject>Regularization</subject><subject>Statistical analysis</subject><subject>Statistical methods</subject><subject>Support vector machines</subject><subject>Twin support vector 
machine</subject><subject>Universum</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kEtPwzAQhC0EEqXwDzhY4pyydh52LkioPAWUA4-r5Tib4pAmxU5a9d_jKpw5jbSamdV8hJwzmDFg2WU9-247v_MzDpyHk0yFPCATJgWPRAL5IZlAnkIkIGXH5MT7GiA4mZyQm6fFgm7RLr96LKnDcjBBh9Zu0PlhRfutbenb5wutOkdNo72ndlXoRrcGaYPatbZdnpKjSjcez_50Sj7ubt_nD9Hz6_3j_Po5MrGEPsrzlEuJYJIEY15gmUnNdFLKJIdESi1AZhUvpWCMQ6a5zlghC85EXOk4RR5PycXYu3bdz4C-V3U3uDa8VDzLhExjkUNwJaPLuM57h5VaO7vSbqcYqD0vVauRl9rzUiOvELsaYxgWbCw65Y3FMLO0Dk2vys7-X_ALrgN0IA</recordid><startdate>20220607</startdate><enddate>20220607</enddate><creator>Ganaie, M.A.</creator><creator>Tanveer, M.</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5727-3697</orcidid></search><sort><creationdate>20220607</creationdate><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><author>Ganaie, M.A. 
; Tanveer, M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Alzheimer's disease</topic><topic>Class imbalance</topic><topic>Classification</topic><topic>Data points</topic><topic>Empirical analysis</topic><topic>Hyperplanes</topic><topic>Imbalance ratio</topic><topic>KNN weighted</topic><topic>Learning</topic><topic>Mathematical analysis</topic><topic>Neighborhoods</topic><topic>Optimization</topic><topic>Oversampling</topic><topic>Principles</topic><topic>Rectangular kernel</topic><topic>Regularization</topic><topic>Statistical analysis</topic><topic>Statistical methods</topic><topic>Support vector machines</topic><topic>Twin support vector machine</topic><topic>Universum</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ganaie, M.A.</creatorcontrib><creatorcontrib>Tanveer, M.</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library & Information Sciences Abstracts (LISA)</collection><collection>Library & Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ganaie, M.A.</au><au>Tanveer, M.</au><aucorp>Alzheimer’s Disease Neuroimaging 
Initiative</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>KNN weighted reduced universum twin SVM for class imbalance learning</atitle><jtitle>Knowledge-based systems</jtitle><date>2022-06-07</date><risdate>2022</risdate><volume>245</volume><spage>108578</spage><pages>108578-</pages><artnum>108578</artnum><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>In real world problems, imbalance of data samples poses major challenge for the classification problems as the data samples of a particular class are dominating. Problems like fault and disease detection involve imbalance data and hence need attention to avoid the bias towards a particular class. The classification models like support vector machines (SVM) get biased to majority class samples and hence results in misclassification of the minority class samples. SVM suffers as no prior information related to the data is involved in the generation of hyperplanes. Also, local information of the neighbourhood is ignored in SVM samples and thus treats each sample equally for generating the hyperplanes. However, the data points may be contaminated and may mislead the generation of hyperplanes. Inspired by the idea of prior data information and local neighbourhood information, we propose K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL embodies the local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via weight matrix in the objective function. In proposed KWRUTSVM-CIL model, weight vectors are used in the corresponding constraints of the objective functions to exploit the interclass information. The oversampling and undersampling approaches are followed to balance the data in class imbalance problems. Universum data gives prior information of the data. 
Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement empirical risk minimization principle and thus may lead to overfitting. However, the proposed KWRUTSVM-CIL model embodies regularization term to maximize the margin and implement the structural risk minimization principle which is the marrow of statistical learning and overcomes the issues of overfitting. Experimental results and the statistical analysis signify that the generalization ability of proposed KWRUTSVM-CIL model is superior in comparison to other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer disease. The proposed KWRUTSVM-CIL model showed better generalization performance compared to other twin SVM based models in biomedical datasets.
•To incorporate the local neighbourhood information, K nearest neighbourbased weights are used in the proposed KWRUTSVM-CIL.•Unlike RUTSVM-CIL, UTSVM, TSVM and FTWSVM models which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle.•Similar to RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem.•The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, while as the matrices in the Wolfe dual of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are positive semi-definite.•Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2022.108578</doi><orcidid>https://orcid.org/0000-0002-5727-3697</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0950-7051 |
ispartof | Knowledge-based systems, 2022-06, Vol.245, p.108578, Article 108578 |
issn | 0950-7051 ; 1872-7409 |
language | eng |
recordid | cdi_proquest_journals_2667853790 |
source | Elsevier ScienceDirect Journals |
subjects | Alzheimer's disease ; Class imbalance ; Classification ; Data points ; Empirical analysis ; Hyperplanes ; Imbalance ratio ; KNN weighted ; Learning ; Mathematical analysis ; Neighborhoods ; Optimization ; Oversampling ; Principles ; Rectangular kernel ; Regularization ; Statistical analysis ; Statistical methods ; Support vector machines ; Twin support vector machine ; Universum |
title | KNN weighted reduced universum twin SVM for class imbalance learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T03%3A46%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=KNN%20weighted%20reduced%20universum%20twin%20SVM%20for%20class%20imbalance%20learning&rft.jtitle=Knowledge-based%20systems&rft.au=Ganaie,%20M.A.&rft.aucorp=Alzheimer%E2%80%99s%20Disease%20Neuroimaging%20Initiative&rft.date=2022-06-07&rft.volume=245&rft.spage=108578&rft.pages=108578-&rft.artnum=108578&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2022.108578&rft_dat=%3Cproquest_cross%3E2667853790%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2667853790&rft_id=info:pmid/&rft_els_id=S0950705122002581&rfr_iscdi=true |
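
The abstract says that local neighbourhood information enters the model through K-nearest-neighbour based weights, but this record does not give the weighting formula. The following is a minimal Python sketch of one common scheme from the weighted twin SVM literature, offered as an assumption rather than the authors' method: a sample's weight is the number of same-class samples that count it among their k nearest neighbours. The function name `knn_intra_class_weights` and the parameter `k` are hypothetical.

```python
import numpy as np


def knn_intra_class_weights(X, k):
    """Illustrative KNN weighting (an assumption, not taken from the paper).

    X holds the samples of ONE class, shape (n_samples, n_features).
    Returns (W, d) where W[i, j] = 1 if X[j] is among the k nearest
    neighbours of X[i], and d[j] = sum_i W[i, j] is the weight of X[j].
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n_samples")

    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A sample must not be counted as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Column indices of the k nearest neighbours of every sample.
    nearest = np.argsort(dist, axis=1)[:, :k]

    # Binary adjacency matrix: row i marks the neighbours of sample i.
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0

    # Weight of sample j = number of samples that list j as a neighbour.
    d = W.sum(axis=0)
    return W, d
```

For the one-dimensional samples `[[0.0], [1.0], [2.5], [10.0]]` with `k=1`, this sketch returns the weights `[1, 2, 1, 0]`: the isolated point at 10 is nobody's nearest neighbour and receives weight zero, which is the kind of down-weighting of possibly contaminated points that the abstract motivates.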