Support vector classification of proteomic profile spectra based on feature extraction with the bi-orthogonal discrete wavelet transform

Automatic classification of high-resolution mass spectrometry data has increasing potential to support physicians in diagnosis of diseases like cancer. The proteomic data exhibit variations among different disease states. A precise and reliable classification of mass spectra is essential for a succe...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computing and visualization in science 2009-04, Vol.12 (4), p.189-199
Hauptverfasser: Schleif, Frank-Michael, Lindemann, Mathias, Diaz, Mario, Maaß, Peter, Decker, Jens, Elssner, Thomas, Kuhn, Michael, Thiele, Herbert
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Automatic classification of high-resolution mass spectrometry data has increasing potential to support physicians in diagnosis of diseases like cancer. The proteomic data exhibit variations among different disease states. A precise and reliable classification of mass spectra is essential for a successful diagnosis and treatment. The underlying process to obtain such reliable classification results is a crucial point. In this paper such a method is explained and a corresponding semi automatic parameterization procedure is derived. Thereby a simple straightforward classification procedure to assign mass spectra to a particular disease state is derived. The method is based on an initial preprocessing stage of the whole set of spectra followed by the bi-orthogonal discrete wavelet transform (DWT) for feature extraction. The approximation coefficients calculated from the scaling function exhibit a high peak pattern matching property and feature a denoising of the spectrum. The discriminating coefficients, selected by the Kolmogorov–Smirnov test are finally used as features for training and testing a support vector machine with both a linear and a radial basis kernel. For comparison the peak areas obtained with the it ClinProt-System 1 [33] were analyzed using the same support vector machines. The introduced approach was evaluated on clinical MALDI-MS data sets with two classes each originating from cancer studies. The cross validated error rates using the wavelet coefficients where better than those obtained from the peak areas 2 .
ISSN:1432-9360
1433-0369
DOI:10.1007/s00791-008-0087-z