Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation

Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection...


Bibliographic details
Published in: Bioinformatics 2009-03, Vol.25 (5), p.643-649
Main authors: Alexandrov, Theodore; Decker, Jens; Mertens, Bart; Deelder, Andre M.; Tollenaar, Rob A. E. M.; Maass, Peter; Thiele, Herbert
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 649
container_issue 5
container_start_page 643
container_title Bioinformatics
container_volume 25
creator Alexandrov, Theodore
Decker, Jens
Mertens, Bart
Deelder, Andre M.
Tollenaar, Rob A. E. M.
Maass, Peter
Thiele, Herbert
description Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test and (iii) building and evaluating a support vector machine classifier by double cross-validation with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of spectra which discriminate cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls. Results: Our procedure provided a high recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly represent the peaks expressing mean differences between the cancer and control spectra. However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparison of the mean spectra. The obtained classifiers have high generalization power as measured by the number of support vectors. This prevents overfitting and contributes to the reproducibility of the results, which is required to find biomarkers differentiating cancer patients from healthy individuals. Availability: The data and scripts used in this study are available at http://www.math.uni-bremen.de/~theodore/MALDIDWT. Contact: theodore@math.uni-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.
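The first two steps named in the description (wavelet transformation of each spectrum, then selection of discriminative coefficients by a statistical test) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which wavelet or which test was used, so the Haar wavelet and Welch's t-statistic are assumptions chosen for simplicity, the function names are hypothetical, and the spectra are synthetic. The support vector machine and the double cross-validation of step (iii) are not reproduced here; only the three evaluation measures are shown, computed from confusion counts.

```python
import math
import random
from statistics import mean, variance


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT (assumed wavelet, not from the paper).

    Returns the final approximation followed by the detail coefficients,
    coarsest level first, so the output has the same length as the input.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs = approx
    for level in reversed(details):
        coeffs = coeffs + level
    return coeffs


def welch_t(xs, ys):
    """Welch's two-sample t-statistic (assumed test, not from the paper)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se if se > 0 else 0.0


def select_coefficients(cancer, control, k):
    """Indices of the k coefficients with the largest |t| between the groups."""
    n = len(cancer[0])
    scores = [
        abs(welch_t([row[j] for row in cancer], [row[j] for row in control]))
        for j in range(n)
    ]
    return sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]


def rates(tp, fn, tn, fp):
    """Total recognition rate, sensitivity and specificity from confusion counts."""
    total = tp + fn + tn + fp
    return (tp + tn) / total, tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    random.seed(0)

    def synthetic_spectrum(shift):
        # Noise plus an optional two-sample "peak" at positions 4 and 5.
        s = [random.gauss(0.0, 1.0) for _ in range(16)]
        s[4] += shift
        s[5] += shift
        return s

    cancer = [haar_dwt(synthetic_spectrum(3.0), 2) for _ in range(20)]
    control = [haar_dwt(synthetic_spectrum(0.0), 2) for _ in range(20)]
    print("selected coefficient indices:", select_coefficients(cancer, control, 3))
```

In a full pipeline the selected coefficients would be the input features of the classifier, and the selection itself would have to be repeated inside each cross-validation fold so that test spectra never influence which coefficients are kept.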
doi_str_mv 10.1093/bioinformatics/btn662
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2647828</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btn662</oup_id><sourcerecordid>20773974</sourcerecordid><originalsourceid>FETCH-LOGICAL-c673t-ca78455d8daf74a37944bcaafd08e020951c01b5c9ab333673f9a54598e60c183</originalsourceid><addsrcrecordid>eNqNkstu1DAUhiMEoqXwCKAICXahvjveILWF0opBZVGgQkKW4zkpLok92MlA3x4PiQbKpqyOZX__8bn8RfEYoxcYKbrfuOB8G2JvBmfTfjN4IcidYhczgSqCuLqbz1TIitWI7hQPUrpCiGPG2P1iByvCGFVot_hy6EJv4jeI5dIlG9YQr0vny3cHi1en1fnZcZkgjn25imGAfJ9j6zpI5Zicv_ytiTBA-cOsoYOhHKLxaS4r-IfFvdZ0CR7Nca_4cPz6_OikWpy9OT06WFRWSDpU1siacb6sl6aVzFCpGGusMe0S1YAIUhxbhBtulWkozU3RVhnOuKpBIItrule8nPKuxqaHpQWf6-j0Krrc27UOxumbL9591ZdhrYlgsiabBM_nBDF8HyENus-dQdcZD2FMWgglkCTyVpAySXLF7FaQICmpkhvw6T_gVRijz-PSWNVCMoRphvgE2RhSitBue8NIb_ygb_pBT37Iuid_D-aPajZABp7NgEnWdG1en3VpyxGM83zEhkMTF8bVf_9dTRKXBvi5FWWz6bxByfXJxWf96b26eMsPP2pOfwGzEOV-</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>198674013</pqid></control><display><type>article</type><title>Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Alexandrov, Theodore ; Decker, Jens ; Mertens, Bart ; Deelder, Andre M. ; Tollenaar, Rob A. E. M. ; Maass, Peter ; Thiele, Herbert</creator><creatorcontrib>Alexandrov, Theodore ; Decker, Jens ; Mertens, Bart ; Deelder, Andre M. ; Tollenaar, Rob A. E. M. 
; Maass, Peter ; Thiele, Herbert</creatorcontrib><description>Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test and (iii) building and evaluating a support vector machine classifier by double cross-validation with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of spectra which discriminate cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls. Results: Our procedure provided a high recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly represent the peaks expressing mean differences between the cancer and control spectra. However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparison of the mean spectra. The obtained classifiers have high generalization power as measured by the number of support vectors. This prevents overfitting and contributes to the reproducibility of the results, which is required to find biomarkers differentiating cancer patients from healthy individuals. Availability: The data and scripts used in this study are available at http://www.math.uni-bremen.de/~theodore/MALDIDWT. 
Contact: theodore@math.uni-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btn662</identifier><identifier>PMID: 19244390</identifier><identifier>CODEN: BOINFP</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Biological and medical sciences ; Biomarkers - blood ; Blood Proteins - analysis ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) ; Original Papers ; Proteomics - methods ; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization - methods</subject><ispartof>Bioinformatics, 2009-03, Vol.25 (5), p.643-649</ispartof><rights>2009 The Author(s) 2009</rights><rights>2009 INIST-CNRS</rights><rights>2009 The Author(s)</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c673t-ca78455d8daf74a37944bcaafd08e020951c01b5c9ab333673f9a54598e60c183</citedby><cites>FETCH-LOGICAL-c673t-ca78455d8daf74a37944bcaafd08e020951c01b5c9ab333673f9a54598e60c183</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647828/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647828/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,1598,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=21178260$$DView 
record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/19244390$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Alexandrov, Theodore</creatorcontrib><creatorcontrib>Decker, Jens</creatorcontrib><creatorcontrib>Mertens, Bart</creatorcontrib><creatorcontrib>Deelder, Andre M.</creatorcontrib><creatorcontrib>Tollenaar, Rob A. E. M.</creatorcontrib><creatorcontrib>Maass, Peter</creatorcontrib><creatorcontrib>Thiele, Herbert</creatorcontrib><title>Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test and (iii) building and evaluating a support vector machine classifier by double cross-validation with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of spectra which discriminate cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls. Results: Our procedure provided a high recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly represent the peaks expressing mean differences between the cancer and control spectra. 
However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparison of the mean spectra. The obtained classifiers have high generalization power as measured by the number of support vectors. This prevents overfitting and contributes to the reproducibility of the results, which is required to find biomarkers differentiating cancer patients from healthy individuals. Availability: The data and scripts used in this study are available at http://www.math.uni-bremen.de/~theodore/MALDIDWT. Contact: theodore@math.uni-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>Biological and medical sciences</subject><subject>Biomarkers - blood</subject><subject>Blood Proteins - analysis</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>Gene Expression Profiling</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</subject><subject>Original Papers</subject><subject>Proteomics - methods</subject><subject>Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization - methods</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkstu1DAUhiMEoqXwCKAICXahvjveILWF0opBZVGgQkKW4zkpLok92MlA3x4PiQbKpqyOZX__8bn8RfEYoxcYKbrfuOB8G2JvBmfTfjN4IcidYhczgSqCuLqbz1TIitWI7hQPUrpCiGPG2P1iByvCGFVot_hy6EJv4jeI5dIlG9YQr0vny3cHi1en1fnZcZkgjn25imGAfJ9j6zpI5Zicv_ytiTBA-cOsoYOhHKLxaS4r-IfFvdZ0CR7Nca_4cPz6_OikWpy9OT06WFRWSDpU1siacb6sl6aVzFCpGGusMe0S1YAIUhxbhBtulWkozU3RVhnOuKpBIItrule8nPKuxqaHpQWf6-j0Krrc27UOxumbL9591ZdhrYlgsiabBM_nBDF8HyENus-dQdcZD2FMWgglkCTyVpAySXLF7FaQICmpkhvw6T_gVRijz-PSWNVCMoRphvgE2RhSitBue8NIb_ygb_pBT37Iuid_D-aPajZABp7NgEnWdG1en3VpyxGM83zEhkMTF8bVf_9dTRKXBvi5FWWz6bxByfXJxWf96b26eMsPP2pOfwGzEOV-</recordid><startdate>20090301</startdate><enddate>20090301</enddate><creator>Alexandrov, Theodore</creator><creator>Decker, Jens</creator><creator>Mertens, Bart</creator><creator>Deelder, Andre M.</creator><creator>Tollenaar, Rob A. E. 
M.</creator><creator>Maass, Peter</creator><creator>Thiele, Herbert</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>TOX</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7TO</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>K9.</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20090301</creationdate><title>Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation</title><author>Alexandrov, Theodore ; Decker, Jens ; Mertens, Bart ; Deelder, Andre M. ; Tollenaar, Rob A. E. M. ; Maass, Peter ; Thiele, Herbert</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c673t-ca78455d8daf74a37944bcaafd08e020951c01b5c9ab333673f9a54598e60c183</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Biological and medical sciences</topic><topic>Biomarkers - blood</topic><topic>Blood Proteins - analysis</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects)</topic><topic>Original Papers</topic><topic>Proteomics - methods</topic><topic>Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization - methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Alexandrov, Theodore</creatorcontrib><creatorcontrib>Decker, Jens</creatorcontrib><creatorcontrib>Mertens, Bart</creatorcontrib><creatorcontrib>Deelder, Andre M.</creatorcontrib><creatorcontrib>Tollenaar, Rob A. E. M.</creatorcontrib><creatorcontrib>Maass, Peter</creatorcontrib><creatorcontrib>Thiele, Herbert</creatorcontrib><collection>Istex</collection><collection>Oxford Journals Open Access Collection</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference 
Library</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Alexandrov, Theodore</au><au>Decker, Jens</au><au>Mertens, Bart</au><au>Deelder, Andre M.</au><au>Tollenaar, Rob A. E. M.</au><au>Maass, Peter</au><au>Thiele, Herbert</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2009-03-01</date><risdate>2009</risdate><volume>25</volume><issue>5</issue><spage>643</spage><epage>649</epage><pages>643-649</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><coden>BOINFP</coden><abstract>Motivation: Automatic classification of high-resolution mass spectrometry proteomic data has increasing potential in the early diagnosis of cancer. 
We propose a new procedure of biomarker discovery in serum protein profiles based on: (i) discrete wavelet transformation of the spectra; (ii) selection of discriminative wavelet coefficients by a statistical test and (iii) building and evaluating a support vector machine classifier by double cross-validation with attention to the generalizability of the results. In addition to the evaluation results (total recognition rate, sensitivity and specificity), the procedure provides the biomarker patterns, i.e. the parts of spectra which discriminate cancer and control individuals. The evaluation was performed on matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) serum protein profiles of 66 colorectal cancer patients and 50 controls. Results: Our procedure provided a high recognition rate (97.3%), sensitivity (98.4%) and specificity (95.8%). The extracted biomarker patterns mostly represent the peaks expressing mean differences between the cancer and control spectra. However, we showed that the discriminative power of a peak is not simply expressed by its mean height and cannot be derived by comparison of the mean spectra. The obtained classifiers have high generalization power as measured by the number of support vectors. This prevents overfitting and contributes to the reproducibility of the results, which is required to find biomarkers differentiating cancer patients from healthy individuals. Availability: The data and scripts used in this study are available at http://www.math.uni-bremen.de/~theodore/MALDIDWT. Contact: theodore@math.uni-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>19244390</pmid><doi>10.1093/bioinformatics/btn662</doi><tpages>7</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2009-03, Vol.25 (5), p.643-649
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2647828
source Oxford Journals Open Access Collection; MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects Biological and medical sciences
Biomarkers - blood
Blood Proteins - analysis
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Original Papers
Proteomics - methods
Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization - methods
title Biomarker discovery in MALDI-TOF serum protein profiles using discrete wavelet transformation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T02%3A04%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Biomarker%20discovery%20in%20MALDI-TOF%20serum%20protein%20profiles%20using%20discrete%20wavelet%20transformation&rft.jtitle=Bioinformatics&rft.au=Alexandrov,%20Theodore&rft.date=2009-03-01&rft.volume=25&rft.issue=5&rft.spage=643&rft.epage=649&rft.pages=643-649&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=BOINFP&rft_id=info:doi/10.1093/bioinformatics/btn662&rft_dat=%3Cproquest_pubme%3E20773974%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=198674013&rft_id=info:pmid/19244390&rft_oup_id=10.1093/bioinformatics/btn662&rfr_iscdi=true