A Regression-Based Analysis of Ribosome-Profiling Data Reveals a Conserved Complexity to Mammalian Translation

Bibliographic details
Published in: Molecular Cell 2015-12, Vol.60 (5), p.816-827
Main authors: Fields, Alexander P., Rodriguez, Edwin H., Jovanovic, Marko, Stern-Ginossar, Noam, Haas, Brian J., Mertins, Philipp, Raychowdhury, Raktima, Hacohen, Nir, Carr, Steven A., Ingolia, Nicholas T., Regev, Aviv, Weissman, Jonathan S.
Format: Article
Language: English
Description
Summary: A fundamental goal of genomics is to identify the complete set of expressed proteins. Automated annotation strategies rely on assumptions about protein-coding sequences (CDSs), e.g., they are conserved, do not overlap, and exceed a minimum length. However, an increasing number of newly discovered proteins violate these rules. Here we present an experimental and analytical framework, based on ribosome profiling and linear regression, for systematic identification and quantification of translation. Application of this approach to lipopolysaccharide-stimulated mouse dendritic cells and HCMV-infected human fibroblasts identifies thousands of novel CDSs, including micropeptides and variants of known proteins, that bear the hallmarks of canonical translation and exhibit translation levels and dynamics comparable to that of annotated CDSs. Remarkably, many translation events are identified in both mouse and human cells even when the peptide sequence is not conserved. Our work thus reveals an unexpected complexity to mammalian translation suited to provide both conserved regulatory or protein-based functions.
• ORF-RATER robustly identifies and quantifies translation from ribosome profiling data
• ORF-RATER reveals thousands of novel micropeptides and variants of mammalian proteins
• Hundreds of novel CDSs show evidence of protein-coding conservation among mammals
• Many ORFs are translated in both mice and humans but lack protein-coding conservation
Fields et al. describe a ribosome profiling-based approach for empirical annotation of protein-coding regions of the genome. Of the thousands of previously unknown translated ORFs they identify in mouse and human, many are conserved or dynamically regulated. Surprisingly, a considerable subset is translated in both species despite weak sequence conservation.
ISSN: 1097-2765, 1097-4164
DOI: 10.1016/j.molcel.2015.11.013