Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data

•A study on applicability and effect of filter ranking methods on microarray data.
•10 microarray datasets (both binary class and multi-class) with varying dimension.
•Concluded that out of all the methods Mutual Information (MI) gives the best results.
•An informed choice about selecting an appropriat...

Bibliographic details
Published in: Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485
Main authors: Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue
container_start_page 114485
container_title Expert systems with applications
container_volume 169
creator Kanti Ghosh, Kushal
Begum, Shemim
Sardar, Aritra
Adhikary, Sukdev
Ghosh, Manosij
Kumar, Munish
Sarkar, Ram
description •A study on the applicability and effect of filter ranking methods on microarray data.
•10 microarray datasets (both binary-class and multi-class) of varying dimension.
•Concluded that, of all the methods, Mutual Information (MI) gives the best results.
•An informed choice about selecting an appropriate filtering method for their work.
DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Although these expression values are useful in disease classification, only a few genes contribute to the classification. In this context, feature selection algorithms can be beneficial, as their main goal is to identify the relevant features (here, genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevance and redundancy of features using various evaluation criteria. An important type of feature selection technique is feature ranking, which does not use any learning algorithm but instead assigns an importance value or weight to each feature. In this paper, we provide an extensive study of 10 popular filter ranking methods. We applied the methods to 10 microarray datasets (both binary-class and multi-class) and tested the accuracies using three well-known classifiers, namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We conducted a wide variety of tests to assess the strengths and weaknesses of the filter methods. This study provides a comparison of the different filter methods, helping researchers make an informed choice of an appropriate filter method for their work. Three categories of filtering methods are tested: entropy-based, similarity-based and statistics-based.
The experiments show that, of all the methods, Mutual Information (MI) gives the best results (and is also the best among the entropy-based methods). Among the similarity-based methods ReliefF performs best, and among the statistics-based methods Chi-square performs best. For bi-class datasets Chi-square would be the better choice, while for multi-class datasets MI gives better results.
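The idea of filter ranking described in the abstract, scoring each feature without training a classifier and then sorting by score, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes expression values have already been discretized (here into 0 = low, 1 = high), estimates mutual information from simple relative counts, and uses hypothetical names (`mutual_information`, `rank_features`) and made-up toy data.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Estimate I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # counts of each feature value
    py = Counter(labels)                   # counts of each class label
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), probabilities as relative counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature indices sorted by descending MI score, list of scores)."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data (invented for illustration): 3 discretized "genes" over 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the class
    [0, 0, 0, 0, 1, 1, 1, 1],  # matches the class exactly
    [0, 0, 0, 1, 1, 1, 1, 1],  # matches the class in 7 of 8 samples
]
order, scores = rank_features(genes, labels)
print(order)  # feature indices, most informative first
```

In a workflow like the one the abstract outlines, the top-ranked features would then be passed to a classifier such as MLP, SVM or KNN to measure accuracy; that step is omitted here.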
doi_str_mv 10.1016/j.eswa.2020.114485
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2501860605</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417420311325</els_id><sourcerecordid>2501860605</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</originalsourceid><addsrcrecordid>eNp9kEtPAzEMhCMEEqXwBzhF4rzFyb4Rl6qUh1TBBc6Rd-PQlO5uSVKg_55U5czJsjWfNTOMXQqYCBDF9WpC_hsnEmQ8iCyr8iM2ElWZJkVZp8dsBHVeJpkos1N25v0KQJQA5Yh9vi5pcBRsi2uOvebUbaz723C989bzwXBj14Ecd9h_2P6ddxSWg_Y3fP6zIWc76kMEfNjqHR963lDfLjt0H_zueco727oBncMd1xjwnJ0YXHu6-Jtj9nY_f509JouXh6fZdJG0qaxCQkh5LbWBEtq0MSaTAE2GotKk60IXVVEaU-hMUg4kMy0a1CnKRleUYw0yHbOrw9-NGz635INaDVsXM3klcxBVAQXkUSUPqujRe0dGbWIedDslQO2rVSu1r1btq1WHaiN0e4Ao-v-y5JRvbcxM2jpqg9KD_Q__BZsDhEM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2501860605</pqid></control><display><type>article</type><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</title><source>Access via ScienceDirect (Elsevier)</source><creator>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</creator><creatorcontrib>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</creatorcontrib><description>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. 
Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. 
In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2020.114485</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Cancer classification ; Chi-square test ; Classification ; Datasets ; Deoxyribonucleic acid ; DNA ; DNA chips ; Empirical analysis ; Entropy ; Feature selection ; Filter method ; Gene expression ; Genes ; Machine learning ; Microarray data ; Multilayers ; Ranking ; Redundancy ; Similarity ; Statistical methods ; Support vector machines</subject><ispartof>Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485</ispartof><rights>2020 Elsevier Ltd</rights><rights>Copyright Elsevier BV May 1, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</citedby><cites>FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.eswa.2020.114485$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids></links><search><creatorcontrib>Kanti Ghosh, Kushal</creatorcontrib><creatorcontrib>Begum, Shemim</creatorcontrib><creatorcontrib>Sardar, Aritra</creatorcontrib><creatorcontrib>Adhikary, Sukdev</creatorcontrib><creatorcontrib>Ghosh, Manosij</creatorcontrib><creatorcontrib>Kumar, Munish</creatorcontrib><creatorcontrib>Sarkar, Ram</creatorcontrib><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray 
data</title><title>Expert systems with applications</title><description>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. 
The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</description><subject>Algorithms</subject><subject>Cancer classification</subject><subject>Chi-square test</subject><subject>Classification</subject><subject>Datasets</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA chips</subject><subject>Empirical analysis</subject><subject>Entropy</subject><subject>Feature selection</subject><subject>Filter method</subject><subject>Gene expression</subject><subject>Genes</subject><subject>Machine learning</subject><subject>Microarray data</subject><subject>Multilayers</subject><subject>Ranking</subject><subject>Redundancy</subject><subject>Similarity</subject><subject>Statistical methods</subject><subject>Support vector machines</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kEtPAzEMhCMEEqXwBzhF4rzFyb4Rl6qUh1TBBc6Rd-PQlO5uSVKg_55U5czJsjWfNTOMXQqYCBDF9WpC_hsnEmQ8iCyr8iM2ElWZJkVZp8dsBHVeJpkos1N25v0KQJQA5Yh9vi5pcBRsi2uOvebUbaz723C989bzwXBj14Ecd9h_2P6ddxSWg_Y3fP6zIWc76kMEfNjqHR963lDfLjt0H_zueco727oBncMd1xjwnJ0YXHu6-Jtj9nY_f509JouXh6fZdJG0qaxCQkh5LbWBEtq0MSaTAE2GotKk60IXVVEaU-hMUg4kMy0a1CnKRleUYw0yHbOrw9-NGz635INaDVsXM3klcxBVAQXkUSUPqujRe0dGbWIedDslQO2rVSu1r1btq1WHaiN0e4Ao-v-y5JRvbcxM2jpqg9KD_Q__BZsDhEM</recordid><startdate>20210501</startdate><enddate>20210501</enddate><creator>Kanti Ghosh, Kushal</creator><creator>Begum, Shemim</creator><creator>Sardar, Aritra</creator><creator>Adhikary, Sukdev</creator><creator>Ghosh, Manosij</creator><creator>Kumar, 
Munish</creator><creator>Sarkar, Ram</creator><general>Elsevier Ltd</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20210501</creationdate><title>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</title><author>Kanti Ghosh, Kushal ; Begum, Shemim ; Sardar, Aritra ; Adhikary, Sukdev ; Ghosh, Manosij ; Kumar, Munish ; Sarkar, Ram</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-eae592df070c3bff4200b4a18ded96d6867ff6d42e50e24d1bad3a2bd8e5a9023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Cancer classification</topic><topic>Chi-square test</topic><topic>Classification</topic><topic>Datasets</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA chips</topic><topic>Empirical analysis</topic><topic>Entropy</topic><topic>Feature selection</topic><topic>Filter method</topic><topic>Gene expression</topic><topic>Genes</topic><topic>Machine learning</topic><topic>Microarray data</topic><topic>Multilayers</topic><topic>Ranking</topic><topic>Redundancy</topic><topic>Similarity</topic><topic>Statistical methods</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kanti Ghosh, Kushal</creatorcontrib><creatorcontrib>Begum, Shemim</creatorcontrib><creatorcontrib>Sardar, Aritra</creatorcontrib><creatorcontrib>Adhikary, Sukdev</creatorcontrib><creatorcontrib>Ghosh, Manosij</creatorcontrib><creatorcontrib>Kumar, Munish</creatorcontrib><creatorcontrib>Sarkar, Ram</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kanti Ghosh, Kushal</au><au>Begum, Shemim</au><au>Sardar, Aritra</au><au>Adhikary, Sukdev</au><au>Ghosh, Manosij</au><au>Kumar, Munish</au><au>Sarkar, Ram</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data</atitle><jtitle>Expert systems with applications</jtitle><date>2021-05-01</date><risdate>2021</risdate><volume>169</volume><spage>114485</spage><pages>114485-</pages><artnum>114485</artnum><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•A study on applicability and effect of filter ranking methods on microarray data.•10 microarray datasets (both binary class and multi-class) with varying dimension.•Concluded that out of all the methods Mutual Information (MI) gives the best results.•An informed choice about selecting an appropriate filtering method for their work. DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Though these expressive values are useful in disease classification, however, only a few genes contribute towards this classification. In this context, usage of feature selection algorithms can be beneficial, as the main goal of feature selection algorithms is to identify the relevant features (here genes) efficiently. 
In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of the features using various evaluation criteria. An important type of feature selection techniques is feature ranking, which does not use any learning algorithm, rather assigns an important value or weight to a feature. In this paper, we provide an extensive study on 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary class and multi-class) and tested the accuracies using three well-known classifiers namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strength and weakness of various filter methods. This vast study provides a comparison amongst different filter methods helping researchers make an informed choice about selecting an appropriate filter method for their work. Three categories of filtering methods are tested, namely, Entropy based, Similarity based and Statistics based. The experiments show that out of all the methods Mutual Information (MI) gives the best results (also best among Entropy based methods). In the category of Similarity based methods ReliefF performs best and Chi-square performs best in the category of Statistics based methods. In case of bi-class datasets, Chi-square would be the better choice, while for multi-class datasets, MI gives better results.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2020.114485</doi></addata></record>
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2501860605
source Access via ScienceDirect (Elsevier)
subjects Algorithms
Cancer classification
Chi-square test
Classification
Datasets
Deoxyribonucleic acid
DNA
DNA chips
Empirical analysis
Entropy
Feature selection
Filter method
Gene expression
Genes
Machine learning
Microarray data
Multilayers
Ranking
Redundancy
Similarity
Statistical methods
Support vector machines
title Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A16%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Theoretical%20and%20empirical%20analysis%20of%20filter%20ranking%20methods:%20Experimental%20study%20on%20benchmark%20DNA%20microarray%20data&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Kanti%20Ghosh,%20Kushal&rft.date=2021-05-01&rft.volume=169&rft.spage=114485&rft.pages=114485-&rft.artnum=114485&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2020.114485&rft_dat=%3Cproquest_cross%3E2501860605%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2501860605&rft_id=info:pmid/&rft_els_id=S0957417420311325&rfr_iscdi=true