Gene selection for tumor classification using neighborhood rough sets and entropy measures
Published in: | Journal of biomedical informatics 2017-03, Vol.67, p.59-68 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: |
• We extend the neighborhood rough set model to deal with real-valued gene expression data sets.
• We propose an entropy measure to evaluate neighborhood classes.
• We propose an efficient entropy-based gene selection algorithm for searching a compact gene subset.
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Because a gene expression data set often contains thousands of genes but only a small number of samples, gene selection is a key step in tumor classification. Attribute reduction based on rough sets has been applied successfully to gene selection because it is data-driven and requires no additional information. However, the traditional rough set method handles discrete data only. Gene expression data that contain real-valued or noisy measurements are usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data while preserving the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets to handle the uncertainty and noise in gene expression data. Using this measure leads to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection method is effective in improving the accuracy of tumor classification. |
---|---|
ISSN: | 1532-0464 1532-0480 |
DOI: | 10.1016/j.jbi.2017.02.007 |
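
Illustrative note: the summary above names three ingredients (neighborhood granules over real-valued expression values, an entropy measure on those granules, and a gene selection algorithm built on both) but this record does not give their definitions. The Python sketch below is only one plausible reading, not the paper's algorithm. It assumes a Euclidean neighborhood of radius `delta`, a commonly used neighborhood conditional entropy, and a greedy forward search. The function names, `delta`, `tol`, and `max_genes` are all hypothetical.

```python
import numpy as np


def neighborhood_mask(X, genes, delta):
    """Boolean n x n matrix: entry (i, j) is True when sample j lies within
    Euclidean distance delta of sample i, measured on the chosen genes only."""
    sub = X[:, genes]
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta


def neighborhood_conditional_entropy(X, y, genes, delta):
    """Assumed measure (not taken from the paper):
    -mean_i log(|N(x_i) with the same label as x_i| / |N(x_i)|).
    It is 0 when every neighborhood is class-pure and grows as classes mix."""
    mask = neighborhood_mask(X, genes, delta)
    same_class = y[:, None] == y[None, :]
    agree = (mask & same_class).sum(axis=1)  # >= 1: a sample is its own neighbor
    total = mask.sum(axis=1)
    return float(-np.mean(np.log(agree / total)))


def greedy_gene_selection(X, y, delta=0.2, tol=1e-6, max_genes=10):
    """Forward search (an assumption): repeatedly add the gene that lowers the
    entropy most, and stop once the improvement is no larger than tol."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = float("inf")
    while remaining and len(selected) < max_genes:
        score, gene = min(
            (neighborhood_conditional_entropy(X, y, selected + [g], delta), g)
            for g in remaining
        )
        if best - score <= tol:
            break
        selected.append(gene)
        remaining.remove(gene)
        best = score
    return selected, best
```

Calling `greedy_gene_selection(X, y)` on a samples-by-genes array `X` (scaled to a comparable range, since `delta` is an absolute distance) and a label vector `y` returns the chosen gene indices and the final entropy. No discretization step is needed because the neighborhoods are computed directly on the real-valued data.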