Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) dataset with cloud computing reveals abundant post-translational modifications and protein sequence variants
Published in: | Cancers 2021-10, Vol.13 (20) |
---|---|
Format: | Article |
Language: | eng |
Summary: | Simple Summary: We reanalyzed a publicly available breast cancer proteomics dataset consisting of 122 human tumor samples using a scalable cloud computing workflow. By doing so, we were able to search these files against millions of known human sequence variants and hundreds of common post-translational protein modifications (PTMs), thereby demonstrating the power of cloud computing to address proteomic data in a true biological context. We identified thousands of relevant sequence variants and PTMs, indicating that the original studies may have only scratched the surface of the true value of the CPTAC studies completed to date. We present the results of this reanalysis in a searchable web interface for community analysis and validation. The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, the majority of proteomic data analysis is still performed using software housed on desktop computers, which limits the number of sequence variants and post-translational modifications that can be considered. The original CPTAC studies limited the search for PTMs to samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we utilized unbiased protein databases containing millions of human sequence variants in conjunction with hundreds of common post-translational modifications. Using these tools, we identified tens of thousands of high-confidence PTMs and sequence variants. We identified 4132 phosphorylated peptides in nonenriched samples, 93% of which were confirmed in the samples that were chemically enriched for phosphopeptides. In addition, our results cover 90% of the high-confidence variants reported by the original proteogenomics study, without the need for sample-specific next-generation sequencing. Finally, we report fivefold more somatic and germline variants that have independent evidence at the peptide level, including mutations in ERBB2 and BCAS1. In this reanalysis of CPTAC proteomic data with cloud computing, we present an openly available and searchable web resource of the highest-coverage proteomic profiling of human tumors described to date. |
---|---|
DOI: | 10.3390/cancers13205034 |