Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets

Bibliographic details
Published in: Nature methods 2016-08, Vol.13 (8), p.651-656
Main authors: Griss, Johannes; Perez-Riverol, Yasset; Lewis, Steve; Tabb, David L; Dianes, José A; del-Toro, Noemi; Rurik, Marc; Walzer, Mathias; Kohlbacher, Oliver; Hermjakob, Henning; Wang, Rui; Vizcaíno, Juan Antonio
Format: Article
Language: English
Description
Abstract: A newly developed algorithm enabled clustering of all 256 million (66 million identified and 190 million unidentified) peptide MS/MS spectra available in the PRIDE Archive database, allowing the detection of millions of consistently unidentified spectra across different data sets, of which roughly 20% could be identified using multiple complementary analysis tools. Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average, 75% of spectra analyzed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The Proteomics Identifications (PRIDE) Database Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in the PRIDE Archive, coming from hundreds of data sets, we were able to consistently characterize spectra into three distinct groups: (1) incorrectly identified, (2) correctly identified but below the set scoring threshold, and (3) truly unidentified. Using multiple complementary analysis approaches, we were able to identify ∼20% of the consistently unidentified spectra. The complete spectrum-clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
ISSN: 1548-7091, 1548-7105
DOI: 10.1038/nmeth.3902
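
The three groups named in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or the PRIDE Cluster implementation: the `Spectrum` type, the `classify_cluster` function, the label strings, and the `min_size` and `min_ratio` thresholds are all hypothetical, and the clustering step itself is assumed to have already been done. The sketch only shows how a dominant peptide within a cluster could be used to flag spectra that disagree with it or that lack an identification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Spectrum:
    """Hypothetical record for one MS/MS spectrum in a cluster."""
    spectrum_id: str
    peptide: Optional[str]  # submitted identification, None if unidentified


def classify_cluster(
    spectra: List[Spectrum],
    min_size: int = 3,       # assumed minimum cluster size, illustrative only
    min_ratio: float = 0.7,  # assumed share of the dominant peptide, illustrative only
) -> Dict[str, str]:
    """Label every spectrum of one cluster according to the cluster consensus."""
    identified = [s.peptide for s in spectra if s.peptide is not None]

    # No identification at all: a candidate "consistently unidentified" cluster.
    if not identified:
        tag = "consistently_unidentified" if len(spectra) >= min_size else "unclassified"
        return {s.spectrum_id: tag for s in spectra}

    # The dominant peptide is the most frequent identification in the cluster.
    top_peptide, top_count = Counter(identified).most_common(1)[0]
    reliable = len(spectra) >= min_size and top_count / len(identified) >= min_ratio

    labels: Dict[str, str] = {}
    for s in spectra:
        if not reliable:
            labels[s.spectrum_id] = "unclassified"
        elif s.peptide is None:
            # Unidentified spectrum in a reliably identified cluster.
            labels[s.spectrum_id] = "likely_below_threshold"
        elif s.peptide != top_peptide:
            # Identification disagrees with the cluster consensus.
            labels[s.spectrum_id] = "likely_incorrect"
        else:
            labels[s.spectrum_id] = "consistent"
    return labels


mixed = [
    Spectrum("a", "PEPTIDEK"),
    Spectrum("b", "PEPTIDEK"),
    Spectrum("c", "PEPTIDEK"),
    Spectrum("d", None),
    Spectrum("e", "OTHERK"),
]
unknown = [Spectrum("x", None), Spectrum("y", None), Spectrum("z", None)]

mixed_labels = classify_cluster(mixed)
unknown_labels = classify_cluster(unknown)
```

In this sketch the consensus ratio is computed over identified spectra only, so unidentified members do not dilute an otherwise consistent cluster; other definitions are equally plausible.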