A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics

Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the...

Bibliographic details
Published in: Analytical chemistry (Washington) 2004-07, Vol.76 (14), p.4193-4201
Main authors: Liu, Hongbin; Sadygov, Rovshan G; Yates, John R
Format: Article
Language: English
Online access: Full text
container_end_page 4201
container_issue 14
container_start_page 4193
container_title Analytical chemistry (Washington)
container_volume 76
creator Liu, Hongbin
Sadygov, Rovshan G
Yates, John R
description Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.
doi_str_mv 10.1021/ac0498563
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_66707683</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>66707683</sourcerecordid><originalsourceid>FETCH-LOGICAL-a503t-a8ffe9feb172936c483f9e075a62920c5a3338ef42f615fdd32c1bb5992c36e53</originalsourceid><addsrcrecordid>eNqF0U1vEzEQBmALgWgoHPgDyEICqYcF2xN_7DGKCkVqS5SUK5bXa5ctu3awdxH8exwlaio4cLKtefTKM4PQS0reUcLoe2PJvFZcwCM0o5yRSijFHqMZIQQqJgk5Qc9yviOEUkLFU3RSEAchYIa-LvBVbF2PfUx4bUIbB7wxw7bvwi0uT3yex24wYxcDjh6vXV_uPx1epTi6LuBFM4XWBOtweWy-xfF2CvtiHDqbn6Mn3vTZvTicp-jLh_Ob5UV1-fnjp-XisjKcwFgZ5b2rvWuoZDUIO1fga0ckN4LVjFhuAEA5P2deUO7bFpilTcPrmlkQjsMpervP3ab4Y3J51EOXret7E1ycshZCEikU_BdSCYRJukt8_Re8i1MKpQnNqFQKFOzQ2R7ZFHNOzuttKtNKvzUlercafb-aYl8dAqdmcO1RHnZRwJsDMNma3qcy1i4_cHWJkaq4au-6PLpf93WTvmshQXJ9s9ro9bWai6sl6NUx19h8bOLfD_4B1HWuyQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>217883835</pqid></control><display><type>article</type><title>A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics</title><source>MEDLINE</source><source>ACS Publications</source><creator>Liu, Hongbin ; Sadygov, Rovshan G ; Yates, John R</creator><creatorcontrib>Liu, Hongbin ; Sadygov, Rovshan G ; Yates, John R</creatorcontrib><description>Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. 
We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.</description><identifier>ISSN: 0003-2700</identifier><identifier>EISSN: 1520-6882</identifier><identifier>DOI: 10.1021/ac0498563</identifier><identifier>PMID: 15253663</identifier><identifier>CODEN: ANCHAM</identifier><language>eng</language><publisher>Washington, DC: American Chemical Society</publisher><subject>Analytical biochemistry: general aspects, technics, instrumentation ; Analytical chemistry ; Analytical, structural and metabolic biochemistry ; Biological and medical sciences ; Chemistry ; Data Collection - methods ; Exact sciences and technology ; Fundamental and applied biological sciences. 
Psychology ; Models, Statistical ; Proteins ; Proteins - analysis ; Proteomics - methods ; Proteomics - statistics &amp; numerical data ; Sampling techniques ; Spectrometric and optical methods ; Spectrum analysis</subject><ispartof>Analytical chemistry (Washington), 2004-07, Vol.76 (14), p.4193-4201</ispartof><rights>Copyright © 2004 American Chemical Society</rights><rights>2005 INIST-CNRS</rights><rights>Copyright American Chemical Society Jul 15, 2004</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a503t-a8ffe9feb172936c483f9e075a62920c5a3338ef42f615fdd32c1bb5992c36e53</citedby><cites>FETCH-LOGICAL-a503t-a8ffe9feb172936c483f9e075a62920c5a3338ef42f615fdd32c1bb5992c36e53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/ac0498563$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/ac0498563$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,777,781,2752,27057,27905,27906,56719,56769</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=15956378$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/15253663$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Liu, Hongbin</creatorcontrib><creatorcontrib>Sadygov, Rovshan G</creatorcontrib><creatorcontrib>Yates, John R</creatorcontrib><title>A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics</title><title>Analytical chemistry (Washington)</title><addtitle>Anal. 
Chem</addtitle><description>Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.</description><subject>Analytical biochemistry: general aspects, technics, instrumentation</subject><subject>Analytical chemistry</subject><subject>Analytical, structural and metabolic biochemistry</subject><subject>Biological and medical sciences</subject><subject>Chemistry</subject><subject>Data Collection - methods</subject><subject>Exact sciences and technology</subject><subject>Fundamental and applied biological sciences. 
Psychology</subject><subject>Models, Statistical</subject><subject>Proteins</subject><subject>Proteins - analysis</subject><subject>Proteomics - methods</subject><subject>Proteomics - statistics &amp; numerical data</subject><subject>Sampling techniques</subject><subject>Spectrometric and optical methods</subject><subject>Spectrum analysis</subject><issn>0003-2700</issn><issn>1520-6882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2004</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqF0U1vEzEQBmALgWgoHPgDyEICqYcF2xN_7DGKCkVqS5SUK5bXa5ctu3awdxH8exwlaio4cLKtefTKM4PQS0reUcLoe2PJvFZcwCM0o5yRSijFHqMZIQQqJgk5Qc9yviOEUkLFU3RSEAchYIa-LvBVbF2PfUx4bUIbB7wxw7bvwi0uT3yex24wYxcDjh6vXV_uPx1epTi6LuBFM4XWBOtweWy-xfF2CvtiHDqbn6Mn3vTZvTicp-jLh_Ob5UV1-fnjp-XisjKcwFgZ5b2rvWuoZDUIO1fga0ckN4LVjFhuAEA5P2deUO7bFpilTcPrmlkQjsMpervP3ab4Y3J51EOXret7E1ycshZCEikU_BdSCYRJukt8_Re8i1MKpQnNqFQKFOzQ2R7ZFHNOzuttKtNKvzUlercafb-aYl8dAqdmcO1RHnZRwJsDMNma3qcy1i4_cHWJkaq4au-6PLpf93WTvmshQXJ9s9ro9bWai6sl6NUx19h8bOLfD_4B1HWuyQ</recordid><startdate>20040715</startdate><enddate>20040715</enddate><creator>Liu, Hongbin</creator><creator>Sadygov, Rovshan G</creator><creator>Yates, John R</creator><general>American Chemical 
Society</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7TM</scope><scope>7U5</scope><scope>7U7</scope><scope>7U9</scope><scope>8BQ</scope><scope>8FD</scope><scope>C1K</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>H8G</scope><scope>H94</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope></search><sort><creationdate>20040715</creationdate><title>A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics</title><author>Liu, Hongbin ; Sadygov, Rovshan G ; Yates, John R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a503t-a8ffe9feb172936c483f9e075a62920c5a3338ef42f615fdd32c1bb5992c36e53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Analytical biochemistry: general aspects, technics, instrumentation</topic><topic>Analytical chemistry</topic><topic>Analytical, structural and metabolic biochemistry</topic><topic>Biological and medical sciences</topic><topic>Chemistry</topic><topic>Data Collection - methods</topic><topic>Exact sciences and technology</topic><topic>Fundamental and applied biological sciences. 
Psychology</topic><topic>Models, Statistical</topic><topic>Proteins</topic><topic>Proteins - analysis</topic><topic>Proteomics - methods</topic><topic>Proteomics - statistics &amp; numerical data</topic><topic>Sampling techniques</topic><topic>Spectrometric and optical methods</topic><topic>Spectrum analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Hongbin</creatorcontrib><creatorcontrib>Sadygov, Rovshan G</creatorcontrib><creatorcontrib>Yates, John R</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Toxicology Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Copper Technical Reference Library</collection><collection>AIDS and Cancer Research 
Abstracts</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Analytical chemistry (Washington)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Hongbin</au><au>Sadygov, Rovshan G</au><au>Yates, John R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics</atitle><jtitle>Analytical chemistry (Washington)</jtitle><addtitle>Anal. Chem</addtitle><date>2004-07-15</date><risdate>2004</risdate><volume>76</volume><issue>14</issue><spage>4193</spage><epage>4201</epage><pages>4193-4201</pages><issn>0003-2700</issn><eissn>1520-6882</eissn><coden>ANCHAM</coden><abstract>Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. 
The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.</abstract><cop>Washington, DC</cop><pub>American Chemical Society</pub><pmid>15253663</pmid><doi>10.1021/ac0498563</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0003-2700
ispartof Analytical chemistry (Washington), 2004-07, Vol.76 (14), p.4193-4201
issn 0003-2700
1520-6882
language eng
recordid cdi_proquest_miscellaneous_66707683
source MEDLINE; ACS Publications
subjects Analytical biochemistry: general aspects, technics, instrumentation
Analytical chemistry
Analytical, structural and metabolic biochemistry
Biological and medical sciences
Chemistry
Data Collection - methods
Exact sciences and technology
Fundamental and applied biological sciences. Psychology
Models, Statistical
Proteins
Proteins - analysis
Proteomics - methods
Proteomics - statistics & numerical data
Sampling techniques
Spectrometric and optical methods
Spectrum analysis
title A Model for Random Sampling and Estimation of Relative Protein Abundance in Shotgun Proteomics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T20%3A11%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Model%20for%20Random%20Sampling%20and%20Estimation%20of%20Relative%20Protein%20Abundance%20in%20Shotgun%20Proteomics&rft.jtitle=Analytical%20chemistry%20(Washington)&rft.au=Liu,%20Hongbin&rft.date=2004-07-15&rft.volume=76&rft.issue=14&rft.spage=4193&rft.epage=4201&rft.pages=4193-4201&rft.issn=0003-2700&rft.eissn=1520-6882&rft.coden=ANCHAM&rft_id=info:doi/10.1021/ac0498563&rft_dat=%3Cproquest_cross%3E66707683%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=217883835&rft_id=info:pmid/15253663&rfr_iscdi=true