The Midpoint Mixed Model with a Missingness Mechanism (M5): A Likelihood-Based Framework for Quantification of Mass Spectrometry Proteomics Data (Preprint)

Bibliographic details
Main authors: O'Brien, Jonathon; Gunawardena, Harsha; Chen, Xian; Ibrahim, Joseph; Qaqish, Bahjat
Format: Article
Language: English
Description
Abstract: Statistical models for proteomics data often estimate protein fold changes between two samples, A and B, as the average peptide intensity from sample A divided by the average peptide intensity from sample B. Such average intensity ratios fail to take full advantage of the experimental design, which eliminates unseen confounding variables by processing peptides from both samples under identical conditions. Typically, this structure is exploited by estimating a protein ratio as the median ratio of matched peptide intensities. This simple solution fails to account for a substantial missing-data bias, which has led to the development of more sophisticated average intensity models. Here we develop the first statistical model that accounts for nonignorable missingness while using peptide-level matched pairs across samples. Our simulation analysis shows that models that fail to use peptide-level ratios suffer astonishing losses in accuracy: basic ANOVA estimates have an average mean squared error (MSE) 371% higher than median ratio estimates. In turn, median ratio estimates have an average MSE 35% higher than our model's estimates. An analysis of breast cancer data reinforces these relationships and shows that our model can increase the number of proteins estimated by 22%.
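The two simple estimators contrasted in the abstract can be illustrated with a toy example. The sketch below is not the M5 model. It only compares a ratio of average intensities with the median of matched peptide ratios on hypothetical data: four peptides with very different ionization efficiencies, a true A/B fold change of 2, and the lowest-intensity peptide missing from sample B. The function names and all numbers are illustrative assumptions.

```python
import statistics

def average_intensity_ratio(a, b):
    """Ratio of mean intensities, mean(A) / mean(B), ignoring peptide pairing.
    Missing values (None) are dropped independently within each sample."""
    obs_a = [x for x in a if x is not None]
    obs_b = [y for y in b if y is not None]
    return statistics.mean(obs_a) / statistics.mean(obs_b)

def median_matched_ratio(a, b):
    """Median of per-peptide ratios A_i / B_i, using only peptides observed in both samples.
    Incomplete pairs are simply discarded; missingness is not modeled."""
    ratios = [x / y for x, y in zip(a, b) if x is not None and y is not None]
    return statistics.median(ratios)

# Hypothetical intensities: each peptide has its own scale, and A = 2 * B for every peptide.
# The low-intensity peptide is unobserved in B (abundance-dependent missingness).
A = [20.0, 200.0, 2000.0, 20000.0]
B = [None, 100.0, 1000.0, 10000.0]

print(round(average_intensity_ratio(A, B), 3))  # → 1.501
print(median_matched_ratio(A, B))               # → 2.0
```

Because each peptide's intensity scale cancels within its own A/B ratio, the matched-pair median recovers the fold change here, while the ratio of averages is pulled away from it once a different set of peptides is observed in each sample. The median still handles missing values only by dropping incomplete pairs, which is the missing-data bias the abstract says motivates a model of nonignorable missingness.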
DOI: 10.48550/arxiv.1507.06907