The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)

Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which e...

Bibliographic details
Main authors: O'Brien, Jonathon, Gunawardena, Harsha, Chen, Xian, Ibrahim, Joseph, Qaqish, Bahjat
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator O'Brien, Jonathon
Gunawardena, Harsha
Chen, Xian
Ibrahim, Joseph
Qaqish, Bahjat
description Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. Typically this structure is exploited through the estimation of a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing data bias which has led to the development of more sophisticated average intensity models. Here we develop the first statistical model that accounts for nonignorable missingness while utilizing peptide level matched pairs across samples. Our simulation analysis shows that models which fail to utilize peptide level ratios, suffer astonishing losses to accuracy with basic ANOVA estimates having an average MSE 371% higher than median ratio estimates. In turn, median ratio estimates have an average MSE 35% higher than our model estimates. An analysis of breast cancer data reinforces these relationships and shows that our model is capable of increasing the number of proteins estimated by 22%.
doi_str_mv 10.48550/arxiv.1507.06907
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1507_06907</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1507_06907</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-2adda55ad747c7a2f339235ee78c338bf97b30cd3fe00dc7e8f29a6699bb96d3</originalsourceid><addsrcrecordid>eNotkMFOwkAQhnvxYNAH8OQc4VBcurTbekMUNaERA_dmujtrN9Au2V0FnsWXtaKnf_In883ki6KbCRtP8zRld-iO5ms8SZkYs6xg4jL63jQEpVF7a7rQD0dSUFpFOziY0AD2lfem--jIeyhJNtgZ38KwTEf3MIOl2dLONNaq-AF9v7tw2NLBui1o6-D9E7tgtJEYjO3Aaiix56z3JIOzLQV3gpWzgWxrpIdHDAjDlaO9678ZXUUXGneerv9zEK0XT5v5S7x8e36dz5YxZkLECSqFaYpKTIUUmGjOi4SnRCKXnOe1LkTNmVRcE2NKCsp1UmCWFUVdF5nig-j2j3qWU_WnW3Sn6ldSdZbEfwBj_2Rh</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)</title><source>arXiv.org</source><creator>O'Brien, Jonathon ; Gunawardena, Harsha ; Chen, Xian ; Ibrahim, Joseph ; Qaqish, Bahjat</creator><creatorcontrib>O'Brien, Jonathon ; Gunawardena, Harsha ; Chen, Xian ; Ibrahim, Joseph ; Qaqish, Bahjat</creatorcontrib><description>Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. Typically this structure is exploited through the estimation of a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing data bias which has led to the development of more sophisticated average intensity models. 
Here we develop the first statistical model that accounts for nonignorable missingness while utilizing peptide level matched pairs across samples. Our simulation analysis shows that models which fail to utilize peptide level ratios, suffer astonishing losses to accuracy with basic ANOVA estimates having an average MSE 371% higher than median ratio estimates. In turn, median ratio estimates have an average MSE 35% higher than our model estimates. An analysis of breast cancer data reinforces these relationships and shows that our model is capable of increasing the number of proteins estimated by 22%.</description><identifier>DOI: 10.48550/arxiv.1507.06907</identifier><language>eng</language><subject>Statistics - Applications</subject><creationdate>2015-07</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1507.06907$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1507.06907$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>O'Brien, Jonathon</creatorcontrib><creatorcontrib>Gunawardena, Harsha</creatorcontrib><creatorcontrib>Chen, Xian</creatorcontrib><creatorcontrib>Ibrahim, Joseph</creatorcontrib><creatorcontrib>Qaqish, Bahjat</creatorcontrib><title>The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)</title><description>Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the 
average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. Typically this structure is exploited through the estimation of a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing data bias which has led to the development of more sophisticated average intensity models. Here we develop the first statistical model that accounts for nonignorable missingness while utilizing peptide level matched pairs across samples. Our simulation analysis shows that models which fail to utilize peptide level ratios, suffer astonishing losses to accuracy with basic ANOVA estimates having an average MSE 371% higher than median ratio estimates. In turn, median ratio estimates have an average MSE 35% higher than our model estimates. An analysis of breast cancer data reinforces these relationships and shows that our model is capable of increasing the number of proteins estimated by 22%.</description><subject>Statistics - Applications</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotkMFOwkAQhnvxYNAH8OQc4VBcurTbekMUNaERA_dmujtrN9Au2V0FnsWXtaKnf_In883ki6KbCRtP8zRld-iO5ms8SZkYs6xg4jL63jQEpVF7a7rQD0dSUFpFOziY0AD2lfem--jIeyhJNtgZ38KwTEf3MIOl2dLONNaq-AF9v7tw2NLBui1o6-D9E7tgtJEYjO3Aaiix56z3JIOzLQV3gpWzgWxrpIdHDAjDlaO9678ZXUUXGneerv9zEK0XT5v5S7x8e36dz5YxZkLECSqFaYpKTIUUmGjOi4SnRCKXnOe1LkTNmVRcE2NKCsp1UmCWFUVdF5nig-j2j3qWU_WnW3Sn6ldSdZbEfwBj_2Rh</recordid><startdate>20150724</startdate><enddate>20150724</enddate><creator>O'Brien, Jonathon</creator><creator>Gunawardena, Harsha</creator><creator>Chen, Xian</creator><creator>Ibrahim, Joseph</creator><creator>Qaqish, 
Bahjat</creator><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20150724</creationdate><title>The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)</title><author>O'Brien, Jonathon ; Gunawardena, Harsha ; Chen, Xian ; Ibrahim, Joseph ; Qaqish, Bahjat</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-2adda55ad747c7a2f339235ee78c338bf97b30cd3fe00dc7e8f29a6699bb96d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Statistics - Applications</topic><toplevel>online_resources</toplevel><creatorcontrib>O'Brien, Jonathon</creatorcontrib><creatorcontrib>Gunawardena, Harsha</creatorcontrib><creatorcontrib>Chen, Xian</creatorcontrib><creatorcontrib>Ibrahim, Joseph</creatorcontrib><creatorcontrib>Qaqish, Bahjat</creatorcontrib><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>O'Brien, Jonathon</au><au>Gunawardena, Harsha</au><au>Chen, Xian</au><au>Ibrahim, Joseph</au><au>Qaqish, Bahjat</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)</atitle><date>2015-07-24</date><risdate>2015</risdate><abstract>Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. 
Typically this structure is exploited through the estimation of a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing data bias which has led to the development of more sophisticated average intensity models. Here we develop the first statistical model that accounts for nonignorable missingness while utilizing peptide level matched pairs across samples. Our simulation analysis shows that models which fail to utilize peptide level ratios, suffer astonishing losses to accuracy with basic ANOVA estimates having an average MSE 371% higher than median ratio estimates. In turn, median ratio estimates have an average MSE 35% higher than our model estimates. An analysis of breast cancer data reinforces these relationships and shows that our model is capable of increasing the number of proteins estimated by 22%.</abstract><doi>10.48550/arxiv.1507.06907</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1507.06907
ispartof
issn
language eng
recordid cdi_arxiv_primary_1507_06907
source arXiv.org
subjects Statistics - Applications
title The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A44%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Midpoint%20Mixed%20Model%20with%20a%20Missingness%20Mechanism%20(M5):%20A%20Likelihood-Based%20Framework%20for%20Quantification%20of%20Mass%20Spectrometry%20Proteomics%20Data%20(Preprint)&rft.au=O'Brien,%20Jonathon&rft.date=2015-07-24&rft_id=info:doi/10.48550/arxiv.1507.06907&rft_dat=%3Carxiv_GOX%3E1507_06907%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
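
note The description contrasts two simple protein fold-change estimators: the ratio of average peptide intensities and the median ratio of matched peptide intensities. The sketch below only illustrates that contrast on made-up numbers; it is not the M5 model from the paper and does not model the missingness mechanism. The function names, the sample values, and the use of None for a missing intensity are assumptions introduced for illustration.

```python
from statistics import mean, median


def average_intensity_ratio(a, b):
    """Ratio of mean intensities, using every observed value in each sample."""
    a_obs = [x for x in a if x is not None]
    b_obs = [y for y in b if y is not None]
    return mean(a_obs) / mean(b_obs)


def median_matched_ratio(a, b):
    """Median of per-peptide ratios, restricted to peptides observed in both samples."""
    ratios = [x / y for x, y in zip(a, b) if x is not None and y is not None]
    return median(ratios)


# Hypothetical intensities for four peptides of one protein.
# None marks a peptide that was not observed in that sample.
sample_a = [200.0, 40.0, 1000.0, 8.0]
sample_b = [100.0, 20.0, 500.0, None]

avg = average_intensity_ratio(sample_a, sample_b)
med = median_matched_ratio(sample_a, sample_b)
print(f"average intensity ratio: {avg:.3f}")
print(f"median matched ratio:    {med:.3f}")
```

In this toy input every matched peptide has a ratio of exactly 2, so the median matched ratio recovers 2, while the average intensity ratio is pulled away from 2 because the low-intensity peptide is missing from only one sample. Neither estimator corrects for the nonignorable missingness that the paper's model is described as addressing.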