Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data
Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature...
Published in: | Computers in biology and medicine 2022-03, Vol.142, p.105208-105208, Article 105208 |
---|---|
Main authors: | Wang, Aiguo; Liu, Huancheng; Yang, Jing; Chen, Guilin |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 105208 |
---|---|
container_issue | |
container_start_page | 105208 |
container_title | Computers in biology and medicine |
container_volume | 142 |
creator | Wang, Aiguo Liu, Huancheng Yang, Jing Chen, Guilin |
description | Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature selection is thus of fundamental importance to the diagnostic accuracy and deep understanding of disease mechanism. Hence, we in this study present an ensemble feature selection framework to improve the discrimination and stability of finally selected features. Specifically, we utilize sampling techniques to obtain multiple sampled datasets, from each of which we use a base feature selector to select a subset of features. Afterwards, we develop two aggregation strategies to combine multiple feature subsets into one set. Finally, comparative experiments are conducted on four publicly available microarray datasets covering both binary and multi-class cases in terms of classification accuracy and three stability metrics. Results show that the proposed method obtains better stability scores and achieves comparable to and even better classification performance than its competitors.
• An ensemble feature selection framework towards stable gene selection is proposed. • We present two aggregation strategies to combine multiple feature subsets into one. • Experimental results show its effectiveness in terms of stability and accuracy. • We conducted time complexity analysis of the proposed model. |
doi_str_mv | 10.1016/j.compbiomed.2021.105208 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2619208556</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0010482521010027</els_id><sourcerecordid>2627122114</sourcerecordid><originalsourceid>FETCH-LOGICAL-c402t-d1d8e98cb1d3d56dbe44432724fe41b27c050040ac0196f8750262d45d19082d3</originalsourceid><addsrcrecordid>eNqFkUtr3DAUhUVJaCZp_0IRdNONp_fKkh_LJkzaQKCbdC1k6Ro0ta2pZIfk30fOJAS6yUqg893XOYxxhC0CVt_3WxvGQ-fDSG4rQGD-VgKaD2yDTd0WoEp5wjYACIVshDpj5yntAUBCCR_ZWalyFwSxYQ-7KdHYDcR7MvMSiScayM4-TLwPkafZrOI6ysS_FLl3NM2-99Y8M2Zy3JrJZsUOJqU3pY9h5KO3MZgYzSOnh0OkDGTJmdl8Yqe9GRJ9fnkv2J_r3d3Vr-L298-bqx-3hZUg5sKha6htbIeudKpyHUkpS1EL2ZPETtQW1HqWsYBt1Te1AlEJJ5XDFhrhygv27dj3EMO_hdKsR58sDYOZKCxJiwrb7JxSVUa__ofuwxKnvF2mRI1CIMpMNUcqX5ZSpF4fos_mPGoEvaaj9_otHb2mo4_p5NIvLwOWbtVeC1_jyMDlEaDsyL2nqJP1lN11PuZQtAv-_SlPsbGmmA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2627122114</pqid></control><display><type>article</type><title>Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Wang, Aiguo ; Liu, Huancheng ; Yang, Jing ; Chen, Guilin</creator><creatorcontrib>Wang, Aiguo ; Liu, Huancheng ; Yang, Jing ; Chen, Guilin</creatorcontrib><description>Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature selection is thus of fundamental importance to the diagnostic accuracy and deep understanding of disease mechanism. 
Hence, we in this study present an ensemble feature selection framework to improve the discrimination and stability of finally selected features. Specifically, we utilize sampling techniques to obtain multiple sampled datasets, from each of which we use a base feature selector to select a subset of features. Afterwards, we develop two aggregation strategies to combine multiple feature subsets into one set. Finally, comparative experiments are conducted on four publicly available microarray datasets covering both binary and multi-class cases in terms of classification accuracy and three stability metrics. Results show that the proposed method obtains better stability scores and achieves comparable to and even better classification performance than its competitors.
•An ensemble feature selection framework towards stable gene selection is proposed.•We present two aggregation strategies to combine multiple feature subsets into one.•Experimental results show its effectiveness in terms of stability and accuracy.•We conducted time complexity analysis of the proposed model.</description><identifier>ISSN: 0010-4825</identifier><identifier>EISSN: 1879-0534</identifier><identifier>DOI: 10.1016/j.compbiomed.2021.105208</identifier><identifier>PMID: 35016102</identifier><language>eng</language><publisher>United States: Elsevier Ltd</publisher><subject>Accuracy ; Algorithms ; Biomarkers ; Cancer ; Classification ; Datasets ; Ensemble learning ; Experiments ; Feature selection ; Gene expression ; Gene expression profiles ; Humans ; Medical diagnosis ; Methods ; Neoplasms - diagnosis ; Neoplasms - genetics ; Oligonucleotide Array Sequence Analysis - methods ; Sampling methods ; Stability ; Tumors</subject><ispartof>Computers in biology and medicine, 2022-03, Vol.142, p.105208-105208, Article 105208</ispartof><rights>2022 Elsevier Ltd</rights><rights>Copyright © 2022 Elsevier Ltd. All rights reserved.</rights><rights>2022. 
Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c402t-d1d8e98cb1d3d56dbe44432724fe41b27c050040ac0196f8750262d45d19082d3</citedby><cites>FETCH-LOGICAL-c402t-d1d8e98cb1d3d56dbe44432724fe41b27c050040ac0196f8750262d45d19082d3</cites><orcidid>0000-0001-6150-8068</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0010482521010027$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35016102$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Aiguo</creatorcontrib><creatorcontrib>Liu, Huancheng</creatorcontrib><creatorcontrib>Yang, Jing</creatorcontrib><creatorcontrib>Chen, Guilin</creatorcontrib><title>Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data</title><title>Computers in biology and medicine</title><addtitle>Comput Biol Med</addtitle><description>Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature selection is thus of fundamental importance to the diagnostic accuracy and deep understanding of disease mechanism. Hence, we in this study present an ensemble feature selection framework to improve the discrimination and stability of finally selected features. Specifically, we utilize sampling techniques to obtain multiple sampled datasets, from each of which we use a base feature selector to select a subset of features. 
Afterwards, we develop two aggregation strategies to combine multiple feature subsets into one set. Finally, comparative experiments are conducted on four publicly available microarray datasets covering both binary and multi-class cases in terms of classification accuracy and three stability metrics. Results show that the proposed method obtains better stability scores and achieves comparable to and even better classification performance than its competitors.
•An ensemble feature selection framework towards stable gene selection is proposed.•We present two aggregation strategies to combine multiple feature subsets into one.•Experimental results show its effectiveness in terms of stability and accuracy.•We conducted time complexity analysis of the proposed model.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Biomarkers</subject><subject>Cancer</subject><subject>Classification</subject><subject>Datasets</subject><subject>Ensemble learning</subject><subject>Experiments</subject><subject>Feature selection</subject><subject>Gene expression</subject><subject>Gene expression profiles</subject><subject>Humans</subject><subject>Medical diagnosis</subject><subject>Methods</subject><subject>Neoplasms - diagnosis</subject><subject>Neoplasms - genetics</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Sampling methods</subject><subject>Stability</subject><subject>Tumors</subject><issn>0010-4825</issn><issn>1879-0534</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNqFkUtr3DAUhUVJaCZp_0IRdNONp_fKkh_LJkzaQKCbdC1k6Ro0ta2pZIfk30fOJAS6yUqg893XOYxxhC0CVt_3WxvGQ-fDSG4rQGD-VgKaD2yDTd0WoEp5wjYACIVshDpj5yntAUBCCR_ZWalyFwSxYQ-7KdHYDcR7MvMSiScayM4-TLwPkafZrOI6ysS_FLl3NM2-99Y8M2Zy3JrJZsUOJqU3pY9h5KO3MZgYzSOnh0OkDGTJmdl8Yqe9GRJ9fnkv2J_r3d3Vr-L298-bqx-3hZUg5sKha6htbIeudKpyHUkpS1EL2ZPETtQW1HqWsYBt1Te1AlEJJ5XDFhrhygv27dj3EMO_hdKsR58sDYOZKCxJiwrb7JxSVUa__ofuwxKnvF2mRI1CIMpMNUcqX5ZSpF4fos_mPGoEvaaj9_otHb2mo4_p5NIvLwOWbtVeC1_jyMDlEaDsyL2nqJP1lN11PuZQtAv-_SlPsbGmmA</recordid><startdate>202203</startdate><enddate>202203</enddate><creator>Wang, Aiguo</creator><creator>Liu, Huancheng</creator><creator>Yang, Jing</creator><creator>Chen, Guilin</creator><general>Elsevier 
Ltd</general><general>Elsevier Limited</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7RV</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>KB0</scope><scope>LK8</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>M7P</scope><scope>M7Z</scope><scope>MBDVC</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6150-8068</orcidid></search><sort><creationdate>202203</creationdate><title>Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data</title><author>Wang, Aiguo ; Liu, Huancheng ; Yang, Jing ; Chen, Guilin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c402t-d1d8e98cb1d3d56dbe44432724fe41b27c050040ac0196f8750262d45d19082d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Biomarkers</topic><topic>Cancer</topic><topic>Classification</topic><topic>Datasets</topic><topic>Ensemble learning</topic><topic>Experiments</topic><topic>Feature 
selection</topic><topic>Gene expression</topic><topic>Gene expression profiles</topic><topic>Humans</topic><topic>Medical diagnosis</topic><topic>Methods</topic><topic>Neoplasms - diagnosis</topic><topic>Neoplasms - genetics</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Sampling methods</topic><topic>Stability</topic><topic>Tumors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Aiguo</creatorcontrib><creatorcontrib>Liu, Huancheng</creatorcontrib><creatorcontrib>Yang, Jing</creatorcontrib><creatorcontrib>Chen, Guilin</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Nursing and Allied Health Journals</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science 
Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Biological Sciences</collection><collection>Computing Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>ProQuest research library</collection><collection>Biological Science Database</collection><collection>Biochemistry Abstracts 1</collection><collection>Research Library (Corporate)</collection><collection>Nursing & Allied Health Premium</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><jtitle>Computers in biology and medicine</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Aiguo</au><au>Liu, Huancheng</au><au>Yang, Jing</au><au>Chen, Guilin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data</atitle><jtitle>Computers in biology and medicine</jtitle><addtitle>Comput Biol Med</addtitle><date>2022-03</date><risdate>2022</risdate><volume>142</volume><spage>105208</spage><epage>105208</epage><pages>105208-105208</pages><artnum>105208</artnum><issn>0010-4825</issn><eissn>1879-0534</eissn><abstract>Microarray technology facilitates the simultaneous measurement of expression of tens of thousands of genes and enables us to study cancers and tumors at the molecular level. Because microarray data are typically characterized by small sample size and high dimensionality, accurate and stable feature selection is thus of fundamental importance to the diagnostic accuracy and deep understanding of disease mechanism. Hence, we in this study present an ensemble feature selection framework to improve the discrimination and stability of finally selected features. Specifically, we utilize sampling techniques to obtain multiple sampled datasets, from each of which we use a base feature selector to select a subset of features. Afterwards, we develop two aggregation strategies to combine multiple feature subsets into one set. Finally, comparative experiments are conducted on four publicly available microarray datasets covering both binary and multi-class cases in terms of classification accuracy and three stability metrics. Results show that the proposed method obtains better stability scores and achieves comparable to and even better classification performance than its competitors.
•An ensemble feature selection framework towards stable gene selection is proposed.•We present two aggregation strategies to combine multiple feature subsets into one.•Experimental results show its effectiveness in terms of stability and accuracy.•We conducted time complexity analysis of the proposed model.</abstract><cop>United States</cop><pub>Elsevier Ltd</pub><pmid>35016102</pmid><doi>10.1016/j.compbiomed.2021.105208</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-6150-8068</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0010-4825 |
ispartof | Computers in biology and medicine, 2022-03, Vol.142, p.105208-105208, Article 105208 |
issn | 0010-4825 1879-0534 |
language | eng |
recordid | cdi_proquest_miscellaneous_2619208556 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Accuracy Algorithms Biomarkers Cancer Classification Datasets Ensemble learning Experiments Feature selection Gene expression Gene expression profiles Humans Medical diagnosis Methods Neoplasms - diagnosis Neoplasms - genetics Oligonucleotide Array Sequence Analysis - methods Sampling methods Stability Tumors |
title | Ensemble feature selection for stable biomarker identification and cancer classification from microarray expression data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A06%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ensemble%20feature%20selection%20for%20stable%20biomarker%20identification%20and%20cancer%20classification%20from%20microarray%20expression%20data&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Wang,%20Aiguo&rft.date=2022-03&rft.volume=142&rft.spage=105208&rft.epage=105208&rft.pages=105208-105208&rft.artnum=105208&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2021.105208&rft_dat=%3Cproquest_cross%3E2627122114%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2627122114&rft_id=info:pmid/35016102&rft_els_id=S0010482521010027&rfr_iscdi=true |
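
The pipeline summarized in the description field (resample the data, run a base feature selector on each sample, aggregate the resulting subsets into one set) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither the base selector, the two aggregation strategies, nor the three stability metrics. The sketch assumes bootstrap resampling, a simple between-class/within-class variance ratio as the base selector, selection-frequency voting as the aggregation rule, and mean pairwise Jaccard similarity as a stability measure. All function names (`score_features`, `top_k`, `ensemble_select`, `mean_jaccard`) are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def score_features(X, y):
    """Assumed base selector: between-class / within-class variance ratio per feature."""
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall = mean(column)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            mu = mean(vals)
            between += len(vals) * (mu - overall) ** 2
            within += sum((v - mu) ** 2 for v in vals)
        # Small constant avoids division by zero when a feature is constant within classes.
        scores.append(between / (within + 1e-12))
    return scores


def top_k(X, y, k):
    """Indices of the k highest-scoring features."""
    scores = score_features(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def ensemble_select(X, y, k, n_rounds=20, seed=0):
    """Bootstrap the samples, select k features per round, keep the k most frequent."""
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    subsets = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip degenerate samples that contain a single class
        subset = top_k(Xb, yb, k)
        subsets.append(subset)
        counts.update(subset)
    # Assumed aggregation: rank by selection frequency, break ties by feature index.
    ranked = sorted(counts, key=lambda j: (-counts[j], j))
    return ranked[:k], subsets


def mean_jaccard(subsets):
    """Assumed stability measure: average Jaccard similarity over all subset pairs."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return mean(len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in pairs)
```

Here `X` is a list of samples (each a list of expression values) and `y` the matching class labels; `ensemble_select(X, y, k=50)` would return the aggregated feature indices together with the per-round subsets, which can then be passed to `mean_jaccard` to gauge how consistently features are chosen across resamples.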