Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data

• A study on applicability and effect of filter ranking methods on microarray data.
• 10 microarray datasets (both binary class and multi-class) with varying dimension.
• Concluded that out of all the methods Mutual Information (MI) gives the best results.
• An informed choice about selecting an appropriat...

Full description

Bibliographic details
Published in:	Expert systems with applications 2021-05, Vol.169, p.114485, Article 114485
Main authors:	Kanti Ghosh, Kushal; Begum, Shemim; Sardar, Aritra; Adhikary, Sukdev; Ghosh, Manosij; Kumar, Munish; Sarkar, Ram
Format:	Article
Language:	eng
Subjects:	Algorithms; Cancer classification; Chi-square test; Classification; Datasets; Deoxyribonucleic acid; DNA; DNA chips; Empirical analysis; Entropy; Feature selection; Filter method; Gene expression; Genes; Machine learning; Microarray data; Multilayers; Ranking; Redundancy; Similarity; Statistical methods; Support vector machines
Online access:	Full text

container_end_page
container_issue
container_start_page	114485
container_title	Expert systems with applications
container_volume	169
creator	Kanti Ghosh, Kushal; Begum, Shemim; Sardar, Aritra; Adhikary, Sukdev; Ghosh, Manosij; Kumar, Munish; Sarkar, Ram
description	• A study on applicability and effect of filter ranking methods on microarray data. • 10 microarray datasets (both binary class and multi-class) with varying dimension. • Concluded that out of all the methods Mutual Information (MI) gives the best results. • An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Although these expression values are useful in disease classification, only a few genes contribute to this classification. In this context, feature selection algorithms can be beneficial, as their main goal is to identify the relevant features (here, genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevance and redundancy of features using various evaluation criteria. An important type of feature selection technique is feature ranking, which does not use any learning algorithm but instead assigns an importance value or weight to each feature. In this paper, we provide an extensive study of 10 popular filter ranking methods. We applied the methods to 10 microarray datasets (both binary-class and multi-class) and tested the accuracies using three well-known classifiers, namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We conducted a wide variety of tests to assess the strengths and weaknesses of the various filter methods. This study provides a comparison among different filter methods, helping researchers make an informed choice when selecting an appropriate filter method for their work. Three categories of filter methods are tested, namely entropy-based, similarity-based and statistics-based. 
The experiments show that, out of all the methods, Mutual Information (MI) gives the best results (and is also the best among the entropy-based methods). Among the similarity-based methods, ReliefF performs best, and among the statistics-based methods, Chi-square performs best. For bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.
doi_str_mv	10.1016/j.eswa.2020.114485
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2501860605</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417420311325</els_id><sourcerecordid>2501860605</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</originalsourceid><addsrcrecordid>eNp9kEtPAzEMhCMEEqXwBzhF4rzFyb4Rl6qUh1TBBc6Rd-PQlO5uSVKg_55U5czJsjWfNTOMXQqYCBDF9WpC_hsnEmQ8iCyr8iM2ElWZJkVZp8dsBHVeJpkos1N25v0KQJQA5Yh9vi5pcBRsi2uOvebUbaz723C989bzwXBj14Ecd9h_2P6ddxSWg_Y3fP6zIWc76kMEfNjqHR963lDfLjt0H_zueco727oBncMd1xjwnJ0YXHu6-Jtj9nY_f509JouXh6fZdJG0qaxCQkh5LbWBEtq0MSaTAE2GotKk60IXVVEaU-hMUg4kMy0a1CnKRleUYw0yHbOrw9-NGz635INaDVsXM3klcxBVAQXkUSUPqujRe0dGbWIedDslQO2rVSu1r1btq1WHaiN0e4Ao-v-y5JRvbcxM2jpqg9KD_Q__BZsDhEM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2501860605</pqid></control><display><type>article</type><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</title><source>Access via ScienceDirect (Elsevier)</source><creator>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</creator><creatorcontrib>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</creatorcontrib><description>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. 
Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. 
In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2020.114485</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Cancer classification ; Chi-square test ; Classification ; Datasets ; Deoxyribonucleic acid ; DNA ; DNA chips ; Empirical analysis ; Entropy ; Feature selection ; Filter method ; Gene expression ; Genes ; Machine learning ; Microarray data ; Multilayers ; Ranking ; Redundancy ; Similarity ; Statistical methods ; Support vector machines</subject><ispartof>Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485</ispartof><rights>2020 Elsevier Ltd</rights><rights>Copyright Elsevier BV May 1, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</citedby><cites>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2020.114485$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Kanti Ghosh, Kushal</creatorcontrib><creatorcontrib>Begum, Shemim</creatorcontrib><creatorcontrib>Sardar, Aritra</creatorcontrib><creatorcontrib>Adhikary, Sukdev</creatorcontrib><creatorcontrib>Ghosh, Manosij</creatorcontrib><creatorcontrib>Kumar, Munish</creatorcontrib><creatorcontrib>Sarkar, Ram</creatorcontrib><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray 
data</title><title>Expert systems with applications</title><description>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. 
The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</description><subject>Algorithms</subject><subject>Cancer classification</subject><subject>Chi-square test</subject><subject>Classification</subject><subject>Datasets</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA chips</subject><subject>Empirical analysis</subject><subject>Entropy</subject><subject>Feature selection</subject><subject>Filter method</subject><subject>Gene expression</subject><subject>Genes</subject><subject>Machine learning</subject><subject>Microarray data</subject><subject>Multilayers</subject><subject>Ranking</subject><subject>Redundancy</subject><subject>Similarity</subject><subject>Statistical methods</subject><subject>Support vector machines</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kEtPAzEMhCMEEqXwBzhF4rzFyb4Rl6qUh1TBBc6Rd-PQlO5uSVKg_55U5czJsjWfNTOMXQqYCBDF9WpC_hsnEmQ8iCyr8iM2ElWZJkVZp8dsBHVeJpkos1N25v0KQJQA5Yh9vi5pcBRsi2uOvebUbaz723C989bzwXBj14Ecd9h_2P6ddxSWg_Y3fP6zIWc76kMEfNjqHR963lDfLjt0H_zueco727oBncMd1xjwnJ0YXHu6-Jtj9nY_f509JouXh6fZdJG0qaxCQkh5LbWBEtq0MSaTAE2GotKk60IXVVEaU-hMUg4kMy0a1CnKRleUYw0yHbOrw9-NGz635INaDVsXM3klcxBVAQXkUSUPqujRe0dGbWIedDslQO2rVSu1r1btq1WHaiN0e4Ao-v-y5JRvbcxM2jpqg9KD_Q__BZsDhEM</recordid><startdate>20210501</startdate><enddate>20210501</enddate><creator>Kanti Ghosh, Kushal</creator><creator>Begum, Shemim</creator><creator>Sardar, Aritra</creator><creator>Adhikary, Sukdev</creator><creator>Ghosh, Manosij</creator><creator>Kumar, 
Munish</creator><creator>Sarkar, Ram</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20210501</creationdate><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</title><author>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Cancer classification</topic><topic>Chi-square test</topic><topic>Classification</topic><topic>Datasets</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA chips</topic><topic>Empirical analysis</topic><topic>Entropy</topic><topic>Feature selection</topic><topic>Filter method</topic><topic>Gene expression</topic><topic>Genes</topic><topic>Machine learning</topic><topic>Microarray data</topic><topic>Multilayers</topic><topic>Ranking</topic><topic>Redundancy</topic><topic>Similarity</topic><topic>Statistical methods</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kanti Ghosh, Kushal</creatorcontrib><creatorcontrib>Begum, Shemim</creatorcontrib><creatorcontrib>Sardar, Aritra</creatorcontrib><creatorcontrib>Adhikary, Sukdev</creatorcontrib><creatorcontrib>Ghosh, Manosij</creatorcontrib><creatorcontrib>Kumar, Munish</creatorcontrib><creatorcontrib>Sarkar, Ram</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kanti Ghosh, Kushal</au><au>Begum, Shemim</au><au>Sardar, Aritra</au><au>Adhikary, Sukdev</au><au>Ghosh, Manosij</au><au>Kumar, Munish</au><au>Sarkar, Ram</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</atitle><jtitle>Expert systems with applications</jtitle><date>2021-05-01</date><risdate>2021</risdate><volume>169</volume><spage>114485</spage><pages>114485-</pages><artnum>114485</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. 
In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2020.114485</doi></addata></record>
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_2501860605
source	Access via ScienceDirect (Elsevier)
subjects	Algorithms; Cancer classification; Chi-square test; Classification; Datasets; Deoxyribonucleic acid; DNA; DNA chips; Empirical analysis; Entropy; Feature selection; Filter method; Gene expression; Genes; Machine learning; Microarray data; Multilayers; Ranking; Redundancy; Similarity; Statistical methods; Support vector machines
title	Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A16%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Theoretical%20and%20empirical%20analysis%20of%20filter%20ranking%20methods:%20Experimental%20study%20on%20benchmark%20DNA%20microarray%20data&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Kanti%20Ghosh,%20Kushal&rft.date=2021-05-01&rft.volume=169&rft.spage=114485&rft.pages=114485-&rft.artnum=114485&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2020.114485&rft_dat=%3Cproquest_cross%3E2501860605%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2501860605&rft_id=info:pmid/&rft_els_id=S0957417420311325&rfr_iscdi=true
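
Illustrative note (not part of the catalog record or the article): the description above says that a filter ranking method scores each feature without training any classifier, and names Mutual Information (MI) as one such score. The sketch below shows the general idea only, under stated assumptions: features are already discretised, the data are a made-up toy example, and the helper names `mutual_information` and `rank_features` are hypothetical. It is not the authors' implementation; real microarray values are continuous and would first need discretisation or a continuous MI estimator.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information (in bits) between one discrete feature and the class labels."""
    n = len(labels)
    count_x = Counter(feature)
    count_y = Counter(labels)
    count_xy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest MI, plus the scores."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy, made-up data: three discretised "genes" (0 = low, 1 = high), six samples.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "gene_a": [0, 0, 0, 1, 1, 1],  # tracks the class exactly
    "gene_b": [0, 1, 0, 1, 0, 1],  # only weakly related to the class
    "gene_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}

ranking, scores = rank_features(columns, labels)
print(ranking)
```

In a filter pipeline of the kind the description outlines, the top-ranked features would then be passed to a separate classifier (for example SVM or KNN); the ranking step itself never consults the classifier.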