A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information


Bibliographic details
Published in: Journal of chromatography. B, Analytical technologies in the biomedical and life sciences, 2012-12, Vol.910, p.149-155
Main authors: Lin, Xiaohui, Yang, Fufang, Zhou, Lina, Yin, Peiyuan, Kong, Hongwei, Xing, Wenbin, Lu, Xin, Jia, Lewen, Wang, Quancai, Xu, Guowang
Format: Article
Language: English
Online access: Full text
Description
Summary:
► A new method was proposed to select discriminative variables from high-dimensional metabolome data.
► The method filters out noise and non-informative variables by means of artificial variables and mutual information.
► The discriminative variables were selected by SVM-RFE after removing noise.
► A better accuracy was obtained in distinguishing among three liver diseases.
► 17 differential metabolites were identified to distinguish 3 liver diseases and the control.

Filtering the discriminative metabolites from high-dimensional metabolome data is very important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. Because SVM-RFE measures the weights of the features according to the support vectors, noise and non-informative variables in high-dimensional data may affect the hyperplane of the SVM learning model. Hence we proposed a mutual information (MI)-SVM-RFE method, which filters out noise and non-informative variables by means of artificial variables and MI and then conducts SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography–mass spectrometry (LC–MS), was used to demonstrate the validity of our method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% from the original SVM-RFE. Thirty-four ion features were defined to distinguish among the control and the 3 liver diseases; 17 of them were identified.
ISSN: 1570-0232, 1873-376X
DOI:10.1016/j.jchromb.2012.05.020
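
The two-stage idea in the summary (an MI filter calibrated against artificial contrast variables, followed by SVM-RFE) can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation. The record does not say how the artificial variables are generated or how the MI cut-off is chosen, so the permuted-copy contrasts, the quantile threshold, the standardization step, and the names `mi_svm_rfe`, `n_select` and `contrast_quantile` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mi_svm_rfe(X, y, n_select=5, contrast_quantile=0.95, seed=0):
    """Filter features by MI against contrast variables, then run SVM-RFE.

    Returns the column indices (into X) of the selected features.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Assumption: each artificial contrast variable is a shuffled copy of a
    # real feature, keeping its distribution but breaking any link to y.
    contrasts = np.column_stack([rng.permutation(col) for col in X.T])

    # Mutual information between every real or contrast variable and y.
    mi = mutual_info_classif(np.hstack([X, contrasts]), y, random_state=seed)
    mi_real, mi_contrast = mi[:n_features], mi[n_features:]

    # Assumption: a real feature survives the filter only if its MI exceeds
    # a high quantile of the MI values reached by the contrast variables.
    threshold = np.quantile(mi_contrast, contrast_quantile)
    kept = np.flatnonzero(mi_real > threshold)
    if kept.size == 0:
        raise ValueError("no feature passed the mutual information filter")

    # SVM-RFE on the survivors: repeatedly fit a linear SVM and drop the
    # feature with the smallest squared weight. Standardizing first is an
    # assumption that makes the weights comparable across features.
    X_kept = StandardScaler().fit_transform(X[:, kept])
    rfe = RFE(SVC(kernel="linear"),
              n_features_to_select=min(n_select, kept.size), step=1)
    rfe.fit(X_kept, y)
    return kept[rfe.support_]
```

Called as `mi_svm_rfe(X, y, n_select=3)` on a samples-by-features matrix `X` with class labels `y`, the function returns the indices of the retained columns. Filtering before the elimination loop means the linear SVM is never fitted on variables that carry no more label information than shuffled noise, which is the motivation the summary gives for the MI step.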