A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information
Published in: | Journal of chromatography. B, Analytical technologies in the biomedical and life sciences, 2012-12, Vol.910, p.149-155 |
---|---|
Format: | Article |
Language: | English |
Abstract: | ► A new method was proposed to select discriminative variables from high-dimensional metabolome data. ► The method filters out noise and non-informative variables by means of artificial variables and mutual information. ► The discriminative variables were then selected by SVM-RFE after noise removal. ► Better accuracy was obtained in distinguishing among three liver diseases. ► Seventeen differential metabolites were identified that distinguish the three liver diseases and the control group.
Filtering the discriminative metabolites from high-dimensional metabolome data is important in metabolomics studies. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique and has shown promising applications in the analysis of metabolome data. Because SVM-RFE weights features according to the support vectors, noise and non-informative variables in high-dimensional data may distort the hyperplane of the SVM learning model. We therefore proposed a mutual information (MI)-SVM-RFE method, which first filters out noise and non-informative variables by means of artificial variables and MI and then applies SVM-RFE to select the most discriminative features. A serum metabolomics data set from patients with chronic hepatitis B, cirrhosis and hepatocellular carcinoma, analyzed by liquid chromatography–mass spectrometry (LC–MS), was used to validate the method. An accuracy of 74.33±2.98% in distinguishing among the three liver diseases was obtained, better than the 72.00±4.15% achieved by the original SVM-RFE. Thirty-four ion features were selected to distinguish among the control group and the three liver diseases, and 17 of them were identified. |
---|---|
ISSN: | 1570-0232 1873-376X |
DOI: | 10.1016/j.jchromb.2012.05.020 |
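The abstract describes MI-SVM-RFE only at a high level, so the following Python sketch illustrates the two stages under several assumptions that are not taken from the paper. Artificial contrast variables are built here by randomly permuting each real feature; mutual information is estimated by equal-width binning; a feature passes the filter only if its MI exceeds the largest MI of any contrast variable (a hypothetical threshold rule); and SVM-RFE ranks the surviving features by their squared weights summed over one-vs-rest linear SVMs, trained with a Pegasos-style subgradient solver swapped in for a library SVM so the example has no dependencies. All function names are hypothetical, and the demo uses synthetic data, not the LC–MS serum data set.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning so MI can be estimated from counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in MI estimate (nats) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def standardize(X):
    """Z-score each column so SVM weights are comparable across features."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def mi_contrast_filter(X, y, n_contrasts=5, seed=0):
    """Keep features whose MI with y beats every artificial contrast variable.

    Assumption: a contrast is a randomly permuted copy of a real feature,
    which keeps its value distribution but destroys any link to the labels.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*X)]
    real_mi = [mutual_information(discretize(c), y) for c in columns]
    contrast_mi = []
    for col in columns:
        for _ in range(n_contrasts):
            shuffled = col[:]
            rng.shuffle(shuffled)
            contrast_mi.append(mutual_information(discretize(shuffled), y))
    threshold = max(contrast_mi)  # hypothetical rule: beat the best contrast
    return [j for j, mi in enumerate(real_mi) if mi > threshold]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be -1/+1; no bias term (a simplification), so inputs
    should be centered, e.g. by standardize().
    """
    rng = random.Random(seed)
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization step
            if margin < 1.0:  # hinge loss active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, features, n_keep):
    """Drop the feature with the smallest summed squared weight over
    one-vs-rest linear SVMs, one per iteration, until n_keep remain."""
    remaining = list(features)
    while len(remaining) > n_keep:
        Xs = [[row[j] for j in remaining] for row in X]
        score = [0.0] * len(remaining)
        for c in sorted(set(y)):
            w = train_linear_svm(Xs, [1 if lab == c else -1 for lab in y])
            score = [s + wj * wj for s, wj in zip(score, w)]
        remaining.pop(score.index(min(score)))
    return remaining


def mi_svm_rfe(X, y, n_keep):
    """MI/contrast filter first, then SVM-RFE on the surviving features."""
    Xz = standardize(X)
    kept = mi_contrast_filter(Xz, y)
    return kept if len(kept) <= n_keep else svm_rfe(Xz, y, kept, n_keep)


if __name__ == "__main__":
    # Synthetic two-class data: features 0 and 1 carry signal, 2 to 7 are noise.
    rng = random.Random(1)
    y = [k % 2 for k in range(40)]
    X = [[2.0 * lab + rng.gauss(0, 0.3), -1.5 * lab + rng.gauss(0, 0.3)]
         + [rng.gauss(0, 1) for _ in range(6)] for lab in y]
    print(mi_svm_rfe(X, y, n_keep=2))
```

Running the file prints the indices of the features that survive both stages. Removing one feature per iteration is the simplest RFE schedule; with thousands of LC–MS ion features, a fixed fraction is usually eliminated per step to keep the cost manageable, and a library solver such as scikit-learn's `LinearSVC` would normally replace the hand-written one.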