Theoretical and empirical analysis of filter ranking methods: Experimental study on benchmark DNA microarray data

Bibliographic details
Published in: Expert Systems with Applications, 2021-05, Vol. 169, p. 114485, Article 114485
Main authors: Kanti Ghosh, Kushal; Begum, Shemim; Sardar, Aritra; Adhikary, Sukdev; Ghosh, Manosij; Kumar, Munish; Sarkar, Ram
Format: Article
Language: English
Description
Abstract:
• A study of the applicability and effect of filter ranking methods on microarray data.
• 10 microarray datasets (both binary-class and multi-class) of varying dimension.
• Of all the methods, Mutual Information (MI) gives the best results.
• Helps researchers make an informed choice of an appropriate filter method for their work.

DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Although these expression values are useful in disease classification, only a few genes contribute to the classification. In this context, feature selection algorithms can be beneficial, as their main goal is to identify the relevant features (here, genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of features using various evaluation criteria. An important type of feature selection technique is feature ranking, which does not use any learning algorithm but instead assigns an importance value or weight to each feature. In this paper, we provide an extensive study of 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary-class and multi-class) and tested the accuracies using three well-known classifiers, namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strengths and weaknesses of the various filter methods. This study provides a comparison among different filter methods, helping researchers make an informed choice of an appropriate filter method for their work. Three categories of filter methods are tested, namely Entropy based, Similarity based and Statistics based.
The experiments show that, of all the methods, Mutual Information (MI) gives the best results (and is therefore also the best among the Entropy based methods). Among the Similarity based methods, ReliefF performs best, and among the Statistics based methods, Chi-square performs best. For binary-class datasets, Chi-square would be the better choice, while for multi-class datasets MI gives better results.
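
As a rough illustration of what a filter ranking method does, the sketch below scores each feature by its mutual information with the class label and sorts the features by that score, without training any classifier. It is a minimal sketch under stated assumptions, not the authors' implementation: this record does not say how the paper estimates MI or discretizes expression values, so the plug-in (count-based) estimator, the binary low/high discretization, the toy data, and the names `mutual_information` and `rank_features` are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y) in bits for one discrete feature."""
    n = len(labels)
    count_x = Counter(feature)
    count_y = Counter(labels)
    count_xy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by descending MI, per-feature scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Invented toy data: 8 samples x 3 discretized "genes" (0 = low, 1 = high).
# Gene 0 separates the classes perfectly, gene 1 only partly, gene 2 not at all.
rows = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],  # tumor samples
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0],  # normal samples
]
labels = ["tumor"] * 4 + ["normal"] * 4

order, scores = rank_features(rows, labels)
top_k = order[:2]  # keep the two highest-ranked genes for a downstream classifier
print(order, [round(s, 4) for s in scores])
```

In a setup like the one the abstract describes, the retained columns (`top_k` here) would then be passed to a classifier such as KNN, SVM or MLP to measure accuracy. Real expression values are continuous, so they would need either discretization or a continuous MI estimator before a ranking like this could be applied.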
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2020.114485