Joint neighborhood entropy-based gene selection method with Fisher score for tumor classification
Published in: | Applied Intelligence (Dordrecht, Netherlands), 2019-04, Vol. 49 (4), p. 1245-1259 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Tumor classification is one of the most vital technologies for cancer diagnosis. Because gene expression data are high-dimensional, gene selection (finding a small, closely related set of genes that classifies tumors accurately) is an important step for improving classification performance. The traditional rough set model, a classical attribute reduction method, handles discrete data only. Gene expression data containing real-valued or noisy values are therefore usually discretized in preprocessing, which may result in poor classification accuracy. In this paper, a novel gene selection method for tumor classification based on neighborhood rough sets, entropy measures, and the Fisher score is proposed; it can handle real-valued data while preserving the original gene classification information. First, the Fisher score method is employed to eliminate irrelevant genes, significantly reducing computational complexity. Next, several neighborhood entropy-based uncertainty measures are investigated for handling the uncertainty and noise of gene expression data. Moreover, some of their properties are derived and the relationships among these measures are established. Finally, a joint neighborhood entropy-based gene selection algorithm with the Fisher score is presented to improve the classification performance of gene expression data. Experimental results on an illustrative instance and several public gene expression data sets show that the proposed method is effective at selecting the most relevant genes with high classification accuracy. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-018-1320-1 |
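
The Fisher score pre-filtering step mentioned in the summary can be illustrated with a minimal sketch. It assumes the commonly used definition of the Fisher score (class-size-weighted spread of class means divided by class-size-weighted within-class variance); the article may use a different variant, and the names `fisher_scores` and `top_k_genes` are hypothetical, not taken from the article. The neighborhood entropy measures are not sketched here because the record does not define them.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per gene (column) of X.

    X: list of samples, each a list of real-valued expression levels.
    y: list of class labels, one per sample.
    Assumes the common textbook definition, not necessarily the article's.
    """
    n_genes = len(X[0])
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    scores = []
    for j in range(n_genes):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted within-class variance
        for idx in by_class.values():
            values = [column[i] for i in idx]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within == 0.0:
            # Gene is constant inside every class: perfectly separating
            # if the class means differ, uninformative otherwise.
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_genes(X, y, k):
    """Indices of the k genes with the highest Fisher score (hypothetical helper)."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Under these assumptions, a gene whose class means lie far apart relative to its within-class variance receives a high score, so keeping only the top-ranked genes shrinks the candidate set before any more expensive selection step runs.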