Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorptio...
Published in: | Journal of computational biology 2003-01, Vol.10 (6), p.925-946 |
---|---|
Main authors: | Lilien, Ryan H; Farid, Hany; Donald, Bruce R |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 946 |
---|---|
container_issue | 6 |
container_start_page | 925 |
container_title | Journal of computational biology |
container_volume | 10 |
creator | Lilien, Ryan H; Farid, Hany; Donald, Bruce R |
description | We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides. |
doi_str_mv | 10.1089/106652703322756159 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1066-5277 |
ispartof | Journal of computational biology, 2003-01, Vol.10 (6), p.925-946 |
issn | 1066-5277 (print); 1557-8666 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_71535784 |
source | Mary Ann Liebert Online Subscription; MEDLINE |
subjects | Algorithms; Blood Proteins - analysis; Blood Proteins - chemistry; Databases, Protein; Diagnosis, Differential; Discriminant Analysis; Female; Gene Expression Regulation, Neoplastic; Humans; Male; Ovarian Neoplasms - chemistry; Ovarian Neoplasms - classification; Ovarian Neoplasms - diagnosis; Pattern Recognition, Automated; Principal Component Analysis; Probability; Prostatic Neoplasms - chemistry; Prostatic Neoplasms - classification; Prostatic Neoplasms - diagnosis; Proteome - analysis; Proteome - chemistry; Proteomics; Serum - chemistry; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization |
title | Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T21%3A16%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Probabilistic%20disease%20classification%20of%20expression-dependent%20proteomic%20data%20from%20mass%20spectrometry%20of%20human%20serum&rft.jtitle=Journal%20of%20computational%20biology&rft.au=Lilien,%20Ryan%20H&rft.date=2003-01-01&rft.volume=10&rft.issue=6&rft.spage=925&rft.epage=946&rft.pages=925-946&rft.issn=1066-5277&rft.eissn=1557-8666&rft_id=info:doi/10.1089/106652703322756159&rft_dat=%3Cproquest_cross%3E17709414%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=17709414&rft_id=info:pmid/14980018&rfr_iscdi=true |
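
The abstract describes PCA followed by a closed-form linear discriminant, a probabilistic class assignment, and evaluation by sensitivity, specificity, and positive predictive value. The following is a minimal sketch of that general pipeline, not the authors' Q5 implementation. The function names (`fit_pca_lda`, `predict_proba`, `diagnostic_metrics`, `make_synthetic`), the small ridge term, the one-dimensional Gaussian model on the projected scores, and the synthetic data are all assumptions made for illustration; the record does not specify how Q5 computes its probabilities.

```python
import numpy as np


def fit_pca_lda(X, y, n_components):
    """Fit PCA, then a two-class Fisher discriminant in the reduced space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                      # (n_features, n_components)
    Z = Xc @ W
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter; the ridge term is an assumption for stability.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    Sw += 1e-8 * np.eye(n_components)
    d = np.linalg.solve(Sw, m1 - m0)             # closed-form, noniterative
    # Assumed probabilistic step: one Gaussian per class on the 1-D scores.
    s0, s1 = Z0 @ d, Z1 @ d
    return {
        "mean": mean, "W": W, "d": d,
        "mu": (s0.mean(), s1.mean()),
        "sd": (s0.std(ddof=1), s1.std(ddof=1)),
        "prior": (len(s0) / len(y), len(s1) / len(y)),
    }


def predict_proba(model, X):
    """Posterior probability of class 1 (disease) for each row of X."""
    s = (X - model["mean"]) @ model["W"] @ model["d"]
    logp = []
    for k in (0, 1):
        mu, sd = model["mu"][k], model["sd"][k]
        logp.append(np.log(model["prior"][k]) - np.log(sd)
                    - 0.5 * ((s - mu) / sd) ** 2)
    # Work in log space and clip to avoid overflow in exp.
    diff = np.clip(logp[0] - logp[1], -700.0, 700.0)
    return 1.0 / (1.0 + np.exp(diff))


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and PPV; assumes no denominator is zero."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def make_synthetic(seed=0, n=80, n_features=200):
    """Hypothetical stand-in for spectra: class 1 is shifted in 10 features."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2
    X = rng.normal(size=(n, n_features))
    X[y == 1, :10] += 3.0
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    model = fit_pca_lda(X[:60], y[:60], n_components=5)
    p = predict_proba(model, X[60:])
    print(diagnostic_metrics(y[60:], (p > 0.5).astype(int)))
```

Reducing the dimension first keeps the within-class scatter matrix small and invertible even when there are far more spectral features than samples, which is why the discriminant can be obtained by a single linear solve. Repeating the fit over several different training/testing splits, as the abstract describes, would amount to calling `fit_pca_lda` and `diagnostic_metrics` in a loop over shuffled indices.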