Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data

Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature...

Bibliographic details
Published in: Computers in biology and medicine 2022-03, Vol.142, p.105208-105208, Article 105208
Main authors: Wang, Aiguo; Liu, Huancheng; Yang, Jing; Chen, Guilin
Format: Article
Language: eng
Online access: Full text
container_end_page 105208
container_issue
container_start_page 105208
container_title Computers in biology and medicine
container_volume 142
creator Wang, Aiguo
Liu, Huancheng
Yang, Jing
Chen, Guilin
description Microarray technology enables the simultaneous measurement of the expression of tens of thousands of genes, making it possible to study cancers and tumors at the molecular level. Because microarray data typically combine small sample sizes with high dimensionality, accurate and stable feature selection is of fundamental importance to diagnostic accuracy and to a deeper understanding of disease mechanisms. This study therefore presents an ensemble feature selection framework that improves the discrimination and stability of the finally selected features. Specifically, sampling techniques are used to obtain multiple sampled datasets, and a base feature selector selects a subset of features from each of them. Two aggregation strategies are then developed to combine the multiple feature subsets into one set. Finally, comparative experiments on four publicly available microarray datasets, covering both binary and multi-class cases, evaluate classification accuracy and three stability metrics. The results show that the proposed method obtains better stability scores and achieves classification performance comparable to, and in some cases better than, that of its competitors.
• An ensemble feature selection framework towards stable gene selection is proposed.
• We present two aggregation strategies to combine multiple feature subsets into one.
• Experimental results show its effectiveness in terms of stability and accuracy.
• We conducted time complexity analysis of the proposed model.
doi_str_mv 10.1016/j.compbiomed.2021.105208
format Article
eissn 1879-0534
pmid 35016102
publisher United States: Elsevier Ltd
rights Copyright © 2022 Elsevier Ltd. All rights reserved.
orcidid https://orcid.org/0000-0001-6150-8068
lds50 peer_reviewed
creationdate 2022-03
fulltext fulltext
identifier ISSN: 0010-4825
ispartof Computers in biology and medicine, 2022-03, Vol.142, p.105208-105208, Article 105208
issn 0010-4825
1879-0534
language eng
recordid cdi_proquest_miscellaneous_2619208556
source MEDLINE; Elsevier ScienceDirect Journals
subjects Accuracy
Algorithms
Biomarkers
Cancer
Classification
Datasets
Ensemble learning
Experiments
Feature selection
Gene expression
Gene expression profiles
Humans
Medical diagnosis
Methods
Neoplasms - diagnosis
Neoplasms - genetics
Oligonucleotide Array Sequence Analysis - methods
Sampling methods
Stability
Tumors
title Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A06%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ensemble%20feature%20selection%20for%20stable%20biomarker%20identification%20and%20cancer%20classification%20from%20microarray%20expression%20data&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Wang,%20Aiguo&rft.date=2022-03&rft.volume=142&rft.spage=105208&rft.epage=105208&rft.pages=105208-105208&rft.artnum=105208&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2021.105208&rft_dat=%3Cproquest_cross%3E2627122114%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2627122114&rft_id=info:pmid/35016102&rft_els_id=S0010482521010027&rfr_iscdi=true
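
The description above outlines a three-step pipeline: draw several sampled datasets, run a base feature selector on each, and aggregate the resulting subsets into one. The record does not say which sampling technique, base selector, aggregation strategies, or stability metrics the authors use, so the following is only a minimal sketch under assumed choices: bootstrap resampling, a toy class-mean-difference score as the base selector, frequency voting as the aggregation rule, and mean pairwise Jaccard similarity as a stability measure. All function names are hypothetical and the data are synthetic.

```python
import random
from collections import Counter
from itertools import combinations


def make_toy_data(n_per_class=10, n_noise=4, seed=42):
    """Synthetic binary data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    y = [0] * n_per_class + [1] * n_per_class
    X = []
    for label in y:
        informative = [10 * label + rng.random(), 10 * label + rng.random()]
        noise = [rng.random() for _ in range(n_noise)]
        X.append(informative + noise)
    return X, y


def select_top_k(X, y, k):
    """Assumed base selector: rank features by |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def bootstrap_sample(X, y, rng):
    """Resample with replacement; redraw until both classes are present."""
    n = len(X)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        if {y[i] for i in idx} == {0, 1}:
            return [X[i] for i in idx], [y[i] for i in idx]


def ensemble_select(X, y, k, n_rounds=20, seed=0):
    """Select k features on each bootstrap sample, then aggregate by vote count."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_rounds):
        Xb, yb = bootstrap_sample(X, y, rng)
        subsets.append(select_top_k(Xb, yb, k))
    votes = Counter(j for subset in subsets for j in subset)
    # Assumed aggregation: keep the k most frequently selected features,
    # breaking ties by the lower feature index.
    ranked = sorted(votes, key=lambda j: (-votes[j], j))
    return set(ranked[:k]), subsets


def mean_jaccard(subsets):
    """Stability as the mean pairwise Jaccard similarity of the selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


X, y = make_toy_data()
selected, subsets = ensemble_select(X, y, k=2, n_rounds=10, seed=1)
print("aggregated features:", sorted(selected))
print("stability (mean Jaccard):", mean_jaccard(subsets))
```

On real microarray data the base selector would usually be a standard filter or embedded method, and the stability score would be reported alongside classification accuracy, as the description indicates. The voting rule shown here is one common aggregation choice and is not claimed to match either of the two strategies developed in the article.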