Assembling the Community-Scale Discoverable Human Proteome

Bibliographic Details
Published in:	Cell systems 2018-10, Vol.7 (4), p.412-421.e5
Main authors:	Wang, Mingxun; Wang, Jian; Carver, Jeremy; Pullman, Benjamin S.; Cha, Seong Won; Bandeira, Nuno
Format:	Article
Language:	English
Subjects:	algorithms; big data; knowledge base; proteomics; repositories; spectral libraries; tandem mass spectrometry
Online access:	Full text

Description
Summary:	The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in a mass spectrometry interactive virtual environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, helps identify and quantify new data, and provides tools for community construction of specialized spectral libraries.

• Reprocessed 31 TB of human proteomics data
• MassIVE-KB spectral library including 2.1 million precursors (>4-fold increase)
• 55% of all human proteome amino acids are covered (2-fold increase)
• 430 new proteins observed with previously missing proteomics evidence

Wang et al. introduce MassIVE-KB, a program designed to distill the entire community’s mass spectrometry data into reusable spectral library resources. As a result, the statistically significant discovery of a peptide or protein in a single researcher’s data will be made available to the whole community to support its identification (in shotgun experiments) or quantitative detection (in targeted experiments) in all future analyses.
ISSN:	2405-4712, 2405-4720
DOI:	10.1016/j.cels.2018.08.004