KNN weighted reduced universum twin SVM for class imbalance learning

Bibliographic details
Published in: Knowledge-based systems 2022-06, Vol.245, p.108578, Article 108578
Main authors: Ganaie, M.A., Tanveer, M.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page 108578
container_title Knowledge-based systems
container_volume 245
creator Ganaie, M.A.
Tanveer, M.
description In real-world problems, data imbalance poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and therefore need attention to avoid bias towards a particular class. Classification models such as support vector machines (SVM) become biased towards majority class samples and hence misclassify minority class samples. Moreover, SVM uses no prior information about the data when generating its hyperplanes. It also ignores local neighbourhood information and treats every sample equally when generating the hyperplanes. Since data points may be contaminated, treating them equally can mislead the generation of the hyperplanes. Inspired by the ideas of prior data information and local neighbourhood information, we propose the K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL incorporates local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the constraints of the corresponding optimization problems to exploit interclass information. Oversampling and undersampling are used to balance the data in class imbalance problems. Universum data provide prior information about the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may overfit. In contrast, the proposed KWRUTSVM-CIL model includes a regularization term that maximizes the margin and implements the structural risk minimization principle, the core of statistical learning theory, which helps avoid overfitting.
Experimental results and statistical analysis indicate that the proposed KWRUTSVM-CIL model generalizes better than other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer. On these biomedical datasets, the proposed KWRUTSVM-CIL model also showed better generalization performance than other twin SVM based models. •To incorporate local neighbourhood information, K-nearest neighbour based weights are used in the proposed KWRUTSVM-CIL. •Unlike the RUTSVM-CIL, UTSVM, TSVM and FTWSVM models, which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle. •Like RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem. •The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, whereas those in the Wolfe duals of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are only positive semi-definite. •Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.
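The description says that local neighbourhood information enters KWRUTSVM-CIL through KNN-based weights and that universum data help balance the classes. The sketch below illustrates both ideas under stated assumptions and is not the authors' formulation: it assumes a simple intra-class weighting (each sample is weighted by how many of its k nearest neighbours share its label) and assumes universum points are generated as midpoints of randomly paired minority and majority samples. The helper names `knn_intraclass_weights` and `midpoint_universum` are hypothetical.

```python
import numpy as np


def knn_intraclass_weights(X, y, k=5):
    """Hypothetical helper (assumed scheme, not necessarily the paper's).

    Weights each sample by the number of its k nearest neighbours
    (Euclidean distance, the sample itself excluded) that carry the same
    label. Samples surrounded by the other class, e.g. contaminated
    points, receive small weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    return (y[neighbours] == y[:, None]).sum(axis=1)


def midpoint_universum(X_minority, X_majority, n_points, seed=None):
    """Hypothetical helper: universum points as midpoints of randomly
    paired minority and majority samples (an assumed construction)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    X_majority = np.asarray(X_majority, dtype=float)
    i = rng.integers(0, len(X_minority), size=n_points)
    j = rng.integers(0, len(X_majority), size=n_points)
    return 0.5 * (X_minority[i] + X_majority[j])


# Toy data: a +1 cluster near the origin, a -1 cluster near (5, 5),
# and one -1 point (index 6) lying inside the +1 cluster.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0.3, 0.4]])
y = np.array([1, 1, 1, -1, -1, -1, -1])

w = knn_intraclass_weights(X, y, k=2)
print(w)  # [1 1 1 2 2 2 0]

U = midpoint_universum(X[y == 1], X[y == -1], n_points=4, seed=0)
print(U.shape)  # (4, 2)
```

With `k=2`, the -1 point placed inside the +1 cluster gets weight 0, so a weighted objective would largely discount it, while each genuine +1 sample loses one unit of weight because that point is among its neighbours. The interclass weight vectors that the description places in the constraints are not modelled here.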
doi_str_mv 10.1016/j.knosys.2022.108578
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2667853790</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705122002581</els_id><sourcerecordid>2667853790</sourcerecordid><originalsourceid>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</originalsourceid><addsrcrecordid>eNp9kEtPwzAQhC0EEqXwDzhY4pyydh52LkioPAWUA4-r5Tib4pAmxU5a9d_jKpw5jbSamdV8hJwzmDFg2WU9-247v_MzDpyHk0yFPCATJgWPRAL5IZlAnkIkIGXH5MT7GiA4mZyQm6fFgm7RLr96LKnDcjBBh9Zu0PlhRfutbenb5wutOkdNo72ndlXoRrcGaYPatbZdnpKjSjcez_50Sj7ubt_nD9Hz6_3j_Po5MrGEPsrzlEuJYJIEY15gmUnNdFLKJIdESi1AZhUvpWCMQ6a5zlghC85EXOk4RR5PycXYu3bdz4C-V3U3uDa8VDzLhExjkUNwJaPLuM57h5VaO7vSbqcYqD0vVauRl9rzUiOvELsaYxgWbCw65Y3FMLO0Dk2vys7-X_ALrgN0IA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2667853790</pqid></control><display><type>article</type><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><source>Elsevier ScienceDirect Journals</source><creator>Ganaie, M.A. ; Tanveer, M.</creator><creatorcontrib>Ganaie, M.A. ; Tanveer, M. ; Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><description>In real-world problems, data imbalance poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and therefore need attention to avoid bias towards a particular class. Classification models such as support vector machines (SVM) become biased towards majority class samples and hence misclassify minority class samples. Moreover, SVM uses no prior information about the data when generating its hyperplanes. It also ignores local neighbourhood information and treats every sample equally when generating the hyperplanes.
Since data points may be contaminated, treating them equally can mislead the generation of the hyperplanes. Inspired by the ideas of prior data information and local neighbourhood information, we propose the K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL incorporates local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the constraints of the corresponding optimization problems to exploit interclass information. Oversampling and undersampling are used to balance the data in class imbalance problems. Universum data provide prior information about the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may overfit. In contrast, the proposed KWRUTSVM-CIL model includes a regularization term that maximizes the margin and implements the structural risk minimization principle, the core of statistical learning theory, which helps avoid overfitting. Experimental results and statistical analysis indicate that the proposed KWRUTSVM-CIL model generalizes better than other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer. On these biomedical datasets, the proposed KWRUTSVM-CIL model also showed better generalization performance than other twin SVM based models.
•To incorporate local neighbourhood information, K-nearest neighbour based weights are used in the proposed KWRUTSVM-CIL. •Unlike the RUTSVM-CIL, UTSVM, TSVM and FTWSVM models, which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle. •Like RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem. •The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, whereas those in the Wolfe duals of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are only positive semi-definite. •Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2022.108578</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Alzheimer's disease ; Class imbalance ; Classification ; Data points ; Empirical analysis ; Hyperplanes ; Imbalance ratio ; KNN weighted ; Learning ; Mathematical analysis ; Neighborhoods ; Optimization ; Oversampling ; Principles ; Rectangular kernel ; Regularization ; Statistical analysis ; Statistical methods ; Support vector machines ; Twin support vector machine ; Universum</subject><ispartof>Knowledge-based systems, 2022-06, Vol.245, p.108578, Article 108578</ispartof><rights>2022 Elsevier B.V.</rights>
<rights>Copyright Elsevier Science Ltd. Jun 7, 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</citedby><cites>FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</cites><orcidid>0000-0002-5727-3697</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0950705122002581$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Ganaie, M.A.</creatorcontrib><creatorcontrib>Tanveer, M.</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><title>Knowledge-based systems</title><description>In real-world problems, data imbalance poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and therefore need attention to avoid bias towards a particular class. Classification models such as support vector machines (SVM) become biased towards majority class samples and hence misclassify minority class samples. Moreover, SVM uses no prior information about the data when generating its hyperplanes. It also ignores local neighbourhood information and treats every sample equally when generating the hyperplanes. Since data points may be contaminated, treating them equally can mislead the generation of the hyperplanes.
Inspired by the ideas of prior data information and local neighbourhood information, we propose the K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL incorporates local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the constraints of the corresponding optimization problems to exploit interclass information. Oversampling and undersampling are used to balance the data in class imbalance problems. Universum data provide prior information about the data. Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may overfit. In contrast, the proposed KWRUTSVM-CIL model includes a regularization term that maximizes the margin and implements the structural risk minimization principle, the core of statistical learning theory, which helps avoid overfitting. Experimental results and statistical analysis indicate that the proposed KWRUTSVM-CIL model generalizes better than other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer. On these biomedical datasets, the proposed KWRUTSVM-CIL model also showed better generalization performance than other twin SVM based models.
•To incorporate local neighbourhood information, K-nearest neighbour based weights are used in the proposed KWRUTSVM-CIL. •Unlike the RUTSVM-CIL, UTSVM, TSVM and FTWSVM models, which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle. •Like RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem. •The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, whereas those in the Wolfe duals of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are only positive semi-definite. •Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</description><subject>Alzheimer's disease</subject><subject>Class imbalance</subject><subject>Classification</subject><subject>Data points</subject><subject>Empirical analysis</subject><subject>Hyperplanes</subject><subject>Imbalance ratio</subject><subject>KNN weighted</subject><subject>Learning</subject><subject>Mathematical analysis</subject><subject>Neighborhoods</subject><subject>Optimization</subject><subject>Oversampling</subject><subject>Principles</subject><subject>Rectangular kernel</subject><subject>Regularization</subject><subject>Statistical analysis</subject><subject>Statistical methods</subject><subject>Support vector machines</subject><subject>Twin support vector machine</subject><subject>Universum</subject>
<issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kEtPwzAQhC0EEqXwDzhY4pyydh52LkioPAWUA4-r5Tib4pAmxU5a9d_jKpw5jbSamdV8hJwzmDFg2WU9-247v_MzDpyHk0yFPCATJgWPRAL5IZlAnkIkIGXH5MT7GiA4mZyQm6fFgm7RLr96LKnDcjBBh9Zu0PlhRfutbenb5wutOkdNo72ndlXoRrcGaYPatbZdnpKjSjcez_50Sj7ubt_nD9Hz6_3j_Po5MrGEPsrzlEuJYJIEY15gmUnNdFLKJIdESi1AZhUvpWCMQ6a5zlghC85EXOk4RR5PycXYu3bdz4C-V3U3uDa8VDzLhExjkUNwJaPLuM57h5VaO7vSbqcYqD0vVauRl9rzUiOvELsaYxgWbCw65Y3FMLO0Dk2vys7-X_ALrgN0IA</recordid><startdate>20220607</startdate><enddate>20220607</enddate><creator>Ganaie, M.A.</creator><creator>Tanveer, M.</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-5727-3697</orcidid></search><sort><creationdate>20220607</creationdate><title>KNN weighted reduced universum twin SVM for class imbalance learning</title><author>Ganaie, M.A. ; Tanveer, M.</author></sort>
<facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c380t-995288e0c44e32bed68a1a4d8490488a7086f2d8711206a2a61b8b2173fa35e23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Alzheimer's disease</topic><topic>Class imbalance</topic><topic>Classification</topic><topic>Data points</topic><topic>Empirical analysis</topic><topic>Hyperplanes</topic><topic>Imbalance ratio</topic><topic>KNN weighted</topic><topic>Learning</topic><topic>Mathematical analysis</topic><topic>Neighborhoods</topic><topic>Optimization</topic><topic>Oversampling</topic><topic>Principles</topic><topic>Rectangular kernel</topic><topic>Regularization</topic><topic>Statistical analysis</topic><topic>Statistical methods</topic><topic>Support vector machines</topic><topic>Twin support vector machine</topic><topic>Universum</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ganaie, M.A.</creatorcontrib><creatorcontrib>Tanveer, M.</creatorcontrib><creatorcontrib>Alzheimer’s Disease Neuroimaging Initiative</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ganaie, M.A.</au><au>Tanveer, M.</au><aucorp>Alzheimer’s Disease Neuroimaging Initiative</aucorp>
<format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>KNN weighted reduced universum twin SVM for class imbalance learning</atitle><jtitle>Knowledge-based systems</jtitle><date>2022-06-07</date><risdate>2022</risdate><volume>245</volume><spage>108578</spage><pages>108578-</pages><artnum>108578</artnum><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>In real-world problems, data imbalance poses a major challenge for classification because the samples of one class dominate. Problems such as fault and disease detection involve imbalanced data and therefore need attention to avoid bias towards a particular class. Classification models such as support vector machines (SVM) become biased towards majority class samples and hence misclassify minority class samples. Moreover, SVM uses no prior information about the data when generating its hyperplanes. It also ignores local neighbourhood information and treats every sample equally when generating the hyperplanes. Since data points may be contaminated, treating them equally can mislead the generation of the hyperplanes. Inspired by the ideas of prior data information and local neighbourhood information, we propose the K-nearest neighbour based weighted reduced universum twin SVM for class imbalance learning (KWRUTSVM-CIL). The proposed KWRUTSVM-CIL incorporates local neighbourhood information and uses universum data to balance the classes in class imbalance problems. Local neighbourhood information is incorporated via a weight matrix in the objective function. In the proposed KWRUTSVM-CIL model, weight vectors are used in the constraints of the corresponding optimization problems to exploit interclass information. Oversampling and undersampling are used to balance the data in class imbalance problems. Universum data provide prior information about the data.
Twin SVM, universum twin SVM, and reduced universum twin SVM for class imbalance implement the empirical risk minimization principle and thus may overfit. In contrast, the proposed KWRUTSVM-CIL model includes a regularization term that maximizes the margin and implements the structural risk minimization principle, the core of statistical learning theory, which helps avoid overfitting. Experimental results and statistical analysis indicate that the proposed KWRUTSVM-CIL model generalizes better than other twin SVM based models. As an application, we use the proposed KWRUTSVM-CIL model for the diagnosis of Alzheimer’s disease and breast cancer. On these biomedical datasets, the proposed KWRUTSVM-CIL model also showed better generalization performance than other twin SVM based models. •To incorporate local neighbourhood information, K-nearest neighbour based weights are used in the proposed KWRUTSVM-CIL. •Unlike the RUTSVM-CIL, UTSVM, TSVM and FTWSVM models, which implement the empirical risk minimization principle, the proposed KWRUTSVM-CIL model implements the structural risk minimization principle. •Like RUTSVM-CIL, the proposed KWRUTSVM-CIL model incorporates prior information about the data (universum data) to handle the class imbalance problem. •The matrices appearing in the Wolfe dual of the proposed KWRUTSVM-CIL are positive definite, whereas those in the Wolfe duals of RUTSVM-CIL, UTSVM, TSVM and FTWSVM are only positive semi-definite. •Experimental results and statistical analysis show the efficacy of the proposed KWRUTSVM-CIL model. As an application, we use the proposed KWRUTSVM-CIL model for the classification of Alzheimer’s disease and breast cancer subjects.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2022.108578</doi><orcidid>https://orcid.org/0000-0002-5727-3697</orcidid><oa>free_for_read</oa></addata></record>
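The record's highlights state that the matrices in the Wolfe dual of KWRUTSVM-CIL are positive definite, whereas those of TSVM-type models are only positive semi-definite. In TSVM-style duals the matrix to be inverted has the generic form H^T H, with H = [A e] (one class's samples augmented by a column of ones); it is singular whenever H is rank-deficient, for example when there are fewer samples than columns. A regularization term of the structural-risk kind adds c·I to it. The sketch below assumes this generic form rather than the paper's exact matrices, and the value c = 0.1 is arbitrary: it shows numerically that H^T H has a zero eigenvalue while H^T H + c·I has smallest eigenvalue c.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: 3 samples of one class with 5 features.
A = rng.normal(size=(3, 5))
e = np.ones((3, 1))
H = np.hstack([A, e])      # augmented matrix [A e], shape (3, 6)

HtH = H.T @ H              # 6 x 6 with rank at most 3: singular, only PSD
c = 0.1                    # arbitrary regularization parameter
HtH_reg = HtH + c * np.eye(HtH.shape[0])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
min_eig = np.linalg.eigvalsh(HtH)[0]
min_eig_reg = np.linalg.eigvalsh(HtH_reg)[0]

print(np.linalg.matrix_rank(HtH))  # 3
print(min_eig)       # zero up to floating-point rounding
print(min_eig_reg)   # approximately c

# Cholesky factorization succeeds only for positive definite matrices,
# so it serves as a check that the regularized matrix is PD.
L = np.linalg.cholesky(HtH_reg)
```

Adding c·I shifts every eigenvalue up by c, which is why a regularization term of this kind turns a positive semi-definite matrix into a positive definite one, so its inverse exists without an extra perturbation term.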
fulltext fulltext
identifier ISSN: 0950-7051
ispartof Knowledge-based systems, 2022-06, Vol.245, p.108578, Article 108578
issn 0950-7051
1872-7409
language eng
recordid cdi_proquest_journals_2667853790
source Elsevier ScienceDirect Journals
subjects Alzheimer's disease
Class imbalance
Classification
Data points
Empirical analysis
Hyperplanes
Imbalance ratio
KNN weighted
Learning
Mathematical analysis
Neighborhoods
Optimization
Oversampling
Principles
Rectangular kernel
Regularization
Statistical analysis
Statistical methods
Support vector machines
Twin support vector machine
Universum
title KNN weighted reduced universum twin SVM for class imbalance learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T03%3A46%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=KNN%20weighted%20reduced%20universum%20twin%20SVM%20for%20class%20imbalance%20learning&rft.jtitle=Knowledge-based%20systems&rft.au=Ganaie,%20M.A.&rft.aucorp=Alzheimer%E2%80%99s%20Disease%20Neuroimaging%20Initiative&rft.date=2022-06-07&rft.volume=245&rft.spage=108578&rft.pages=108578-&rft.artnum=108578&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2022.108578&rft_dat=%3Cproquest_cross%3E2667853790%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2667853790&rft_id=info:pmid/&rft_els_id=S0950705122002581&rfr_iscdi=true