Improving cancer prediction using feature selection in spark environment

Cancer prediction from microarray‐based gene expression data has been the subject of much research in recent years. Because such data have a vast number of features and relatively small sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics...

Bibliographic details
Published in:	Concurrency and computation 2024-01, Vol.36 (2), p.n/a
Main authors:	Longkumer, Imtisenla; Hussain Mazumder, Dilwar
Format:	Article
Language:	eng
Subjects:	big data; Cancer; cancer prediction; Classification; Classifiers; Decision trees; Feature selection; Gene expression; machine learning; Selectors; Support vector machines
Online access:	Full text

container_end_page	n/a
container_issue	2
container_start_page
container_title	Concurrency and computation
container_volume	36
creator	Longkumer, Imtisenla; Hussain Mazumder, Dilwar
description	Cancer prediction from microarray‐based gene expression data has been the subject of much research in recent years. Because such data have a vast number of features and relatively small sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics of this malignant condition may often vary, providing a significant amount of data that requires additional time and resources to process. This research work proposes an Apache Spark‐based feature selection approach for microarray cancer classification. The first aim is to select only the optimal and necessary features using two feature selectors: information gain (IG) and correlation‐based feature selection (CFS). The second is to employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, the approach is evaluated in terms of accuracy, precision, recall, and area under the ROC curve (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). The results reveal that the NB classifier outperformed the other classifiers in all cases when IG was used as the feature selector.
doi_str_mv	10.1002/cpe.7903
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2906077435</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2906077435</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2883-1fb3749fd1bc0ef4fbf2ff69449c154958ab668badd15c067ebd067b2d649f433</originalsourceid><addsrcrecordid>eNp1kE9LAzEQxYMoWKvgR1jw4mVr_m129yiltoWCHvQckuxEUtvsmuxW-u1NXfHmZWZ4_Gbe8BC6JXhGMKYPpoNZWWN2hiakYDTHgvHzv5mKS3QV4xZjQjAjE7Ra77vQHpx_z4zyBkLWBWic6V3rsyGedAuqHwJkEXYw6s5nsVPhIwN_cKH1e_D9Nbqwahfh5rdP0dvT4nW-yjfPy_X8cZMbWlUsJ1azkte2IdpgsNxqS60VNee1IQWvi0ppISqtmoYUBosSdJOqpo1IW5yxKbob76a3PweIvdy2Q_DJUtIaC1yWnBWJuh8pE9oYA1jZBbdX4SgJlqecZMpJnnJKaD6iX24Hx385OX9Z_PDf7Etp0g</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2906077435</pqid></control><display><type>article</type><title>Improving cancer prediction using feature selection in spark environment</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Longkumer, Imtisenla ; Hussain Mazumder, Dilwar</creator><creatorcontrib>Longkumer, Imtisenla ; Hussain Mazumder, Dilwar</creatorcontrib><description>Cancer prediction from microarray‐based gene expression data has been subject to much research in recent years. Because of its vast number of features and relatively smaller sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics of this malignant condition may often vary, providing a significant amount of data that requires additional time and resources to process. This research work proposes an Apache Spark‐based feature selection for microarray cancer classification. The first aim is to select only the optimal and necessary features obtained by the feature selector(information gain [IG] and correlation‐based feature selection [CFS]). 
Secondly, employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, we evaluated our approach in terms of accuracy, precision, recall and ROC (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). The results reveal that the NB classifier outperformed in all the cases with IG as a feature selector.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.7903</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc</publisher><subject>big data ; Cancer ; cancer prediction ; Classification ; Classifiers ; Decision trees ; Feature selection ; Gene expression ; machine learning ; Selectors ; Support vector machines</subject><ispartof>Concurrency and computation, 2024-01, Vol.36 (2), p.n/a</ispartof><rights>2023 John Wiley & Sons Ltd.</rights><rights>2024 John Wiley & Sons, Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2883-1fb3749fd1bc0ef4fbf2ff69449c154958ab668badd15c067ebd067b2d649f433</cites><orcidid>0000-0001-5925-4708</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcpe.7903$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fcpe.7903$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Longkumer, Imtisenla</creatorcontrib><creatorcontrib>Hussain Mazumder, Dilwar</creatorcontrib><title>Improving cancer prediction using feature selection in spark environment</title><title>Concurrency and computation</title><description>Cancer prediction from 
microarray‐based gene expression data has been subject to much research in recent years. Because of its vast number of features and relatively smaller sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics of this malignant condition may often vary, providing a significant amount of data that requires additional time and resources to process. This research work proposes an Apache Spark‐based feature selection for microarray cancer classification. The first aim is to select only the optimal and necessary features obtained by the feature selector(information gain [IG] and correlation‐based feature selection [CFS]). Secondly, employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, we evaluated our approach in terms of accuracy, precision, recall and ROC (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). The results reveal that the NB classifier outperformed in all the cases with IG as a feature selector.</description><subject>big data</subject><subject>Cancer</subject><subject>cancer prediction</subject><subject>Classification</subject><subject>Classifiers</subject><subject>Decision trees</subject><subject>Feature selection</subject><subject>Gene expression</subject><subject>machine learning</subject><subject>Selectors</subject><subject>Support vector 
machines</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp1kE9LAzEQxYMoWKvgR1jw4mVr_m129yiltoWCHvQckuxEUtvsmuxW-u1NXfHmZWZ4_Gbe8BC6JXhGMKYPpoNZWWN2hiakYDTHgvHzv5mKS3QV4xZjQjAjE7Ra77vQHpx_z4zyBkLWBWic6V3rsyGedAuqHwJkEXYw6s5nsVPhIwN_cKH1e_D9Nbqwahfh5rdP0dvT4nW-yjfPy_X8cZMbWlUsJ1azkte2IdpgsNxqS60VNee1IQWvi0ppISqtmoYUBosSdJOqpo1IW5yxKbob76a3PweIvdy2Q_DJUtIaC1yWnBWJuh8pE9oYA1jZBbdX4SgJlqecZMpJnnJKaD6iX24Hx385OX9Z_PDf7Etp0g</recordid><startdate>20240125</startdate><enddate>20240125</enddate><creator>Longkumer, Imtisenla</creator><creator>Hussain Mazumder, Dilwar</creator><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-5925-4708</orcidid></search><sort><creationdate>20240125</creationdate><title>Improving cancer prediction using feature selection in spark environment</title><author>Longkumer, Imtisenla ; Hussain Mazumder, Dilwar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2883-1fb3749fd1bc0ef4fbf2ff69449c154958ab668badd15c067ebd067b2d649f433</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>big data</topic><topic>Cancer</topic><topic>cancer prediction</topic><topic>Classification</topic><topic>Classifiers</topic><topic>Decision trees</topic><topic>Feature selection</topic><topic>Gene expression</topic><topic>machine learning</topic><topic>Selectors</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Longkumer, Imtisenla</creatorcontrib><creatorcontrib>Hussain Mazumder, Dilwar</creatorcontrib><collection>CrossRef</collection><collection>Computer and 
Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Longkumer, Imtisenla</au><au>Hussain Mazumder, Dilwar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving cancer prediction using feature selection in spark environment</atitle><jtitle>Concurrency and computation</jtitle><date>2024-01-25</date><risdate>2024</risdate><volume>36</volume><issue>2</issue><epage>n/a</epage><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Cancer prediction from microarray‐based gene expression data has been subject to much research in recent years. Because of its vast number of features and relatively smaller sample sizes, feature selection becomes necessary for improving classification performance. Additionally, the characteristics of this malignant condition may often vary, providing a significant amount of data that requires additional time and resources to process. This research work proposes an Apache Spark‐based feature selection for microarray cancer classification. The first aim is to select only the optimal and necessary features obtained by the feature selector(information gain [IG] and correlation‐based feature selection [CFS]). Secondly, employ a distributed framework and observe the efficiency of the different feature selectors for classification. Finally, we evaluated our approach in terms of accuracy, precision, recall and ROC (AUC) using three classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). 
The results reveal that the NB classifier outperformed in all the cases with IG as a feature selector.</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/cpe.7903</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-5925-4708</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1532-0626
ispartof	Concurrency and computation, 2024-01, Vol.36 (2), p.n/a
issn	1532-0626 1532-0634
language	eng
recordid	cdi_proquest_journals_2906077435
source	Wiley Online Library Journals Frontfile Complete
subjects	big data; Cancer; cancer prediction; Classification; Classifiers; Decision trees; Feature selection; Gene expression; machine learning; Selectors; Support vector machines
title	Improving cancer prediction using feature selection in spark environment
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T01%3A24%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20cancer%20prediction%20using%20feature%20selection%20in%20spark%20environment&rft.jtitle=Concurrency%20and%20computation&rft.au=Longkumer,%20Imtisenla&rft.date=2024-01-25&rft.volume=36&rft.issue=2&rft.epage=n/a&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.7903&rft_dat=%3Cproquest_cross%3E2906077435%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2906077435&rft_id=info:pmid/&rfr_iscdi=true
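
The information gain (IG) selector named in the description ranks each feature by how much knowing its value reduces the entropy of the class label, IG(Y; X) = H(Y) - H(Y | X), and keeps the highest-scoring features. The following is a minimal single-machine sketch of that idea in plain Python. It is not the authors' Spark implementation; the gene names, the discretized "high"/"low" expression levels, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # H(Y | X): entropy within each feature value, weighted by its frequency.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (hypothetical): discretized expression levels for three genes.
columns = {
    "gene_a": ["high", "high", "low", "low"],   # separates the classes exactly
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
    "gene_c": ["high", "high", "high", "low"],  # partially informative
}
labels = ["tumor", "tumor", "normal", "normal"]

top_features = select_top_k(columns, labels, k=2)
print(top_features)
```

Real microarray values are continuous, so a discretization step (assumed here to have already happened) would normally precede the IG computation. In a distributed setting the per-feature scores are independent of one another, which is what makes this kind of ranking straightforward to parallelize.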