Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data
• A study on applicability and effect of filter ranking methods on microarray data. • 10 microarray datasets (both binary class and multi-class) with varying dimension. • Concluded that out of all the methods Mutual Information (MI) gives the best results. • An informed choice about selecting an appropriat...
Published in: | Expert systems with applications 2021-05, Vol.169, p.114485, Article 114485 |
---|---|
Main authors: | Kanti Ghosh, Kushal; Begum, Shemim; Sardar, Aritra; Adhikary, Sukdev; Ghosh, Manosij; Kumar, Munish; Sarkar, Ram |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 114485 |
container_title | Expert systems with applications |
container_volume | 169 |
creator | Kanti Ghosh, Kushal Begum, Shemim Sardar, Aritra Adhikary, Sukdev Ghosh, Manosij Kumar, Munish Sarkar, Ram |
description | • A study on the applicability and effect of filter ranking methods on microarray data. • 10 microarray datasets (both binary-class and multi-class) of varying dimension. • Concludes that, of all the methods, Mutual Information (MI) gives the best results. • Helps researchers make an informed choice of an appropriate filtering method for their work.
DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Although these expression values are useful in disease classification, only a few genes contribute to this classification. In this context, feature selection algorithms can be beneficial, as their main goal is to identify the relevant features (here, genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of features using various evaluation criteria. An important type of feature selection technique is feature ranking, which does not use any learning algorithm but instead assigns an importance value or weight to each feature. In this paper, we provide an extensive study of 10 popular filter ranking methods. We applied the methods to 10 microarray datasets (both binary-class and multi-class) and tested the accuracies using three well-known classifiers, namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We conducted a wide variety of tests to assess the strengths and weaknesses of the various filter methods. This study provides a comparison among different filter methods, helping researchers make an informed choice of an appropriate filter method for their work. Three categories of filtering methods are tested, namely entropy-based, similarity-based and statistics-based. The experiments show that, of all the methods, Mutual Information (MI) gives the best results (and is also the best among the entropy-based methods). Among the similarity-based methods ReliefF performs best, and among the statistics-based methods Chi-square performs best. For bi-class datasets, Chi-square would be the better choice, while for multi-class datasets MI gives better results. |
doi_str_mv | 10.1016/j.eswa.2020.114485 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0957-4174 |
ispartof | Expert systems with applications, 2021-05, Vol.169, p.114485, Article 114485 |
issn | 0957-4174 1873-6793 |
language | eng |
recordid | cdi_proquest_journals_2501860605 |
source | Access via ScienceDirect (Elsevier) |
subjects | Algorithms Cancer classification Chi-square test Classification Datasets Deoxyribonucleic acid DNA DNA chips Empirical analysis Entropy Feature selection Filter method Gene expression Genes Machine learning Microarray data Multilayers Ranking Redundancy Similarity Statistical methods Support vector machines |
title | Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A16%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Theoretical%20and%20empirical%20analysis%20of%20filter%20ranking%20methods:%20Experimental%20study%20on%20benchmark%20DNA%20microarray%20data&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Kanti%20Ghosh,%20Kushal&rft.date=2021-05-01&rft.volume=169&rft.spage=114485&rft.pages=114485-&rft.artnum=114485&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2020.114485&rft_dat=%3Cproquest_cross%3E2501860605%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2501860605&rft_id=info:pmid/&rft_els_id=S0957417420311325&rfr_iscdi=true |
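
The abstract describes filter ranking as scoring each feature (gene) without a learning algorithm and names Mutual Information (MI) as one of the criteria. The following is a minimal sketch of that idea, not the authors' implementation: it assumes expression values have already been discretized (real microarray values are continuous and would need binning or a continuous MI estimator), and the function names `mutual_information` and `rank_features` as well as the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y), in nats, between a discrete feature and class labels."""
    n = len(labels)
    count_x = Counter(feature)
    count_y = Counter(labels)
    count_xy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels, score=mutual_information):
    """Return feature indices ordered from highest to lowest score."""
    scores = [score(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


# Toy data (made up): three "genes" discretized to low/high (0/1) over six samples.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [0, 1, 0, 1, 0, 1],  # gene 0: almost unrelated to the class
    [0, 0, 0, 1, 1, 1],  # gene 1: separates the two classes perfectly
    [0, 0, 1, 1, 1, 1],  # gene 2: partially informative
]

ranking = rank_features(genes, labels)
top_k = ranking[:2]  # keep the two highest-ranked genes for a downstream classifier
print(ranking, top_k)
```

Because the score is a plug-in argument, another filter criterion (for example a chi-square statistic) could be passed to `rank_features` in the same way; the selected columns would then be handed to a classifier such as KNN, SVM or MLP for evaluation.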