Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments

Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughl...
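
The difference between pooling all matches (global FDR estimation) and estimating the FDR separately for known and novel peptides can be illustrated with a minimal Python sketch. It assumes the common target-decoy convention of estimating FDR as the number of accepted decoy matches divided by the number of accepted target matches; the abstract does not state the exact estimator the authors used. The names `PSM`, `estimate_fdr`, `global_fdr`, and `separate_fdr` are hypothetical, and the toy counts are invented for illustration only, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical structure for illustration)."""
    score: float    # assumed: higher score means a better match
    is_decoy: bool  # True if the match is to a decoy sequence
    is_novel: bool  # True if the match is to the novel part of the database


def estimate_fdr(psms, threshold):
    """Assumed estimator: accepted decoys / accepted targets."""
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(not p.is_decoy for p in accepted)
    decoys = sum(p.is_decoy for p in accepted)
    return decoys / targets if targets else 0.0


def global_fdr(psms, threshold):
    """Global estimation: known and novel matches are pooled."""
    return estimate_fdr(psms, threshold)


def separate_fdr(psms, threshold):
    """Separate estimation: one FDR per class (known vs. novel)."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return {
        "known": estimate_fdr(known, threshold),
        "novel": estimate_fdr(novel, threshold),
    }


# Toy data (invented): 18 known targets, 1 known decoy,
# 2 novel targets, 1 novel decoy, all with the same score.
psms = (
    [PSM(5.0, False, False)] * 18
    + [PSM(5.0, True, False)]
    + [PSM(5.0, False, True)] * 2
    + [PSM(5.0, True, True)]
)

print(global_fdr(psms, 1.0))
print(separate_fdr(psms, 1.0))
```

On this toy input the pooled estimate is 2/20 = 0.1, while the separate estimate for the novel class is 1/2 = 0.5: a pooled threshold can hide a much higher error rate among novel identifications, which is the usual motivation for separate or multistage estimation.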

Detailed description

Bibliographic details
Published in: Journal of proteome research 2017-06, Vol.16 (6), p.2231-2239
Main authors: Li, Honglan; Park, Jonghun; Kim, Hyunwoo; Hwang, Kyu-Baek; Paek, Eunok
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2239
container_issue 6
container_start_page 2231
container_title Journal of proteome research
container_volume 16
creator Li, Honglan
Park, Jonghun
Kim, Hyunwoo
Hwang, Kyu-Baek
Paek, Eunok
description Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methods (global, separate, and multistage FDR estimation, respectively, coupled to a target-decoy search and a mixture model-based method) on novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.
doi_str_mv 10.1021/acs.jproteome.7b00033
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1893547964</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1893547964</sourcerecordid><originalsourceid>FETCH-LOGICAL-a351t-5c7eea912c1b3484eb581de912453c98f5fca12e3e74e24e6d84c287942598f23</originalsourceid><addsrcrecordid>eNqFkMtOwzAQRS0EoqXwCaAs2aT42SRLFFpAqgSidB057qSkJHGwHdT-Pe5zy8rWzL13Zg5CtwQPCabkQSo7XLVGO9A1DKMcY8zYGeoTwUTIEhydH_9xwnroytoVxkREmF2iHo25oDwWfbSebayDWrpSBamuW2lKq5tAF8FEVhbCp9Iq_QtmE35IB2GqG2d0VZXNMpg540vLEmxQaBO873ZZQqNrnzUDadRXMLc7ZVt-Q1g2wXjdgilraJy9RhfFdsLN4R2g-WT8mb6E07fn1_RxGkomiAuFigBkQqgiOeMxh1zEZAG-wAVTSVyIQklCgUHEgXIYLWKuaBwlnArfpWyA7ve5ntVPB9ZltT8Jqko2oDubEY9H8CgZcS8Ve6ky2loDRdb6ZaXZZARnW-iZh56doGcH6N53dxjR5TUsTq4jZS8ge8HOrzvT-Iv_Cf0DzW6Upw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1893547964</pqid></control><display><type>article</type><title>Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments</title><source>American Chemical Society</source><source>MEDLINE</source><creator>Li, Honglan ; Park, Jonghun ; Kim, Hyunwoo ; Hwang, Kyu-Baek ; Paek, Eunok</creator><creatorcontrib>Li, Honglan ; Park, Jonghun ; Kim, Hyunwoo ; Hwang, Kyu-Baek ; Paek, Eunok</creatorcontrib><description>Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. 
After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methodsglobal, separate, and multistage FDR estimation, respectively, coupled to a target-decoy search and a mixture model-based methodon novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.</description><identifier>ISSN: 1535-3893</identifier><identifier>EISSN: 1535-3907</identifier><identifier>DOI: 10.1021/acs.jproteome.7b00033</identifier><identifier>PMID: 28452485</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Cell Line ; Computer Simulation ; False Positive Reactions ; Humans ; Methods ; Models, Theoretical ; Peptides - analysis ; Proteogenomics - methods ; Tandem Mass Spectrometry</subject><ispartof>Journal of proteome research, 2017-06, Vol.16 (6), p.2231-2239</ispartof><rights>Copyright © 2017 American Chemical Society</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a351t-5c7eea912c1b3484eb581de912453c98f5fca12e3e74e24e6d84c287942598f23</citedby><cites>FETCH-LOGICAL-a351t-5c7eea912c1b3484eb581de912453c98f5fca12e3e74e24e6d84c287942598f23</cites><orcidid>0000-0001-6785-7760 ; 0000-0002-0001-0554 ; 0000-0003-3655-9749 ; 
0000-0003-2652-5326</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.7b00033$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jproteome.7b00033$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>315,781,785,2766,27081,27929,27930,56743,56793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28452485$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Honglan</creatorcontrib><creatorcontrib>Park, Jonghun</creatorcontrib><creatorcontrib>Kim, Hyunwoo</creatorcontrib><creatorcontrib>Hwang, Kyu-Baek</creatorcontrib><creatorcontrib>Paek, Eunok</creatorcontrib><title>Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments</title><title>Journal of proteome research</title><addtitle>J. Proteome Res</addtitle><description>Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methodsglobal, separate, and multistage FDR estimation, respectively, coupled to a target-decoy search and a mixture model-based methodon novel peptide identification. 
The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.</description><subject>Cell Line</subject><subject>Computer Simulation</subject><subject>False Positive Reactions</subject><subject>Humans</subject><subject>Methods</subject><subject>Models, Theoretical</subject><subject>Peptides - analysis</subject><subject>Proteogenomics - methods</subject><subject>Tandem Mass Spectrometry</subject><issn>1535-3893</issn><issn>1535-3907</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqFkMtOwzAQRS0EoqXwCaAs2aT42SRLFFpAqgSidB057qSkJHGwHdT-Pe5zy8rWzL13Zg5CtwQPCabkQSo7XLVGO9A1DKMcY8zYGeoTwUTIEhydH_9xwnroytoVxkREmF2iHo25oDwWfbSebayDWrpSBamuW2lKq5tAF8FEVhbCp9Iq_QtmE35IB2GqG2d0VZXNMpg540vLEmxQaBO873ZZQqNrnzUDadRXMLc7ZVt-Q1g2wXjdgilraJy9RhfFdsLN4R2g-WT8mb6E07fn1_RxGkomiAuFigBkQqgiOeMxh1zEZAG-wAVTSVyIQklCgUHEgXIYLWKuaBwlnArfpWyA7ve5ntVPB9ZltT8Jqko2oDubEY9H8CgZcS8Ve6ky2loDRdb6ZaXZZARnW-iZh56doGcH6N53dxjR5TUsTq4jZS8ge8HOrzvT-Iv_Cf0DzW6Upw</recordid><startdate>20170602</startdate><enddate>20170602</enddate><creator>Li, Honglan</creator><creator>Park, Jonghun</creator><creator>Kim, Hyunwoo</creator><creator>Hwang, Kyu-Baek</creator><creator>Paek, Eunok</creator><general>American Chemical 
Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6785-7760</orcidid><orcidid>https://orcid.org/0000-0002-0001-0554</orcidid><orcidid>https://orcid.org/0000-0003-3655-9749</orcidid><orcidid>https://orcid.org/0000-0003-2652-5326</orcidid></search><sort><creationdate>20170602</creationdate><title>Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments</title><author>Li, Honglan ; Park, Jonghun ; Kim, Hyunwoo ; Hwang, Kyu-Baek ; Paek, Eunok</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a351t-5c7eea912c1b3484eb581de912453c98f5fca12e3e74e24e6d84c287942598f23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Cell Line</topic><topic>Computer Simulation</topic><topic>False Positive Reactions</topic><topic>Humans</topic><topic>Methods</topic><topic>Models, Theoretical</topic><topic>Peptides - analysis</topic><topic>Proteogenomics - methods</topic><topic>Tandem Mass Spectrometry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Honglan</creatorcontrib><creatorcontrib>Park, Jonghun</creatorcontrib><creatorcontrib>Kim, Hyunwoo</creatorcontrib><creatorcontrib>Hwang, Kyu-Baek</creatorcontrib><creatorcontrib>Paek, Eunok</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of proteome research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, 
Honglan</au><au>Park, Jonghun</au><au>Kim, Hyunwoo</au><au>Hwang, Kyu-Baek</au><au>Paek, Eunok</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments</atitle><jtitle>Journal of proteome research</jtitle><addtitle>J. Proteome Res</addtitle><date>2017-06-02</date><risdate>2017</risdate><volume>16</volume><issue>6</issue><spage>2231</spage><epage>2239</epage><pages>2231-2239</pages><issn>1535-3893</issn><eissn>1535-3907</eissn><abstract>Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methodsglobal, separate, and multistage FDR estimation, respectively, coupled to a target-decoy search and a mixture model-based methodon novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. 
Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>28452485</pmid><doi>10.1021/acs.jproteome.7b00033</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0001-6785-7760</orcidid><orcidid>https://orcid.org/0000-0002-0001-0554</orcidid><orcidid>https://orcid.org/0000-0003-3655-9749</orcidid><orcidid>https://orcid.org/0000-0003-2652-5326</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1535-3893
ispartof Journal of proteome research, 2017-06, Vol.16 (6), p.2231-2239
issn 1535-3893
1535-3907
language eng
recordid cdi_proquest_miscellaneous_1893547964
source American Chemical Society; MEDLINE
subjects Cell Line
Computer Simulation
False Positive Reactions
Humans
Methods
Models, Theoretical
Peptides - analysis
Proteogenomics - methods
Tandem Mass Spectrometry
title Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-13T23%3A44%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Systematic%20Comparison%20of%20False-Discovery-Rate-Controlling%20Strategies%20for%20Proteogenomic%20Search%20Using%20Spike-in%20Experiments&rft.jtitle=Journal%20of%20proteome%20research&rft.au=Li,%20Honglan&rft.date=2017-06-02&rft.volume=16&rft.issue=6&rft.spage=2231&rft.epage=2239&rft.pages=2231-2239&rft.issn=1535-3893&rft.eissn=1535-3907&rft_id=info:doi/10.1021/acs.jproteome.7b00033&rft_dat=%3Cproquest_cross%3E1893547964%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1893547964&rft_id=info:pmid/28452485&rfr_iscdi=true