Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping

Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technic...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Nature communications 2020-07, Vol.11 (1), p.3551-3551, Article 3551
Hauptverfasser: Höllerer, Simon, Papaxanthos, Laetitia, Gumpinger, Anja Cathrin, Fischer, Katrin, Beisel, Christian, Borgwardt, Karsten, Benenson, Yaakov, Jeschek, Markus
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Predicting effects of gene regulatory elements (GREs) is a longstanding challenge in biology. Machine learning may address this, but requires large datasets linking GREs to their quantitative function. However, experimental methods to generate such datasets are either application-specific or technically complex and error-prone. Here, we introduce DNA-based phenotypic recording as a widely applicable, practicable approach to generate large-scale sequence-function datasets. We use a site-specific recombinase to directly record a GRE’s effect in DNA, enabling readout of both sequence and quantitative function for extremely large GRE-sets via next-generation sequencing. We record translation kinetics of over 300,000 bacterial ribosome binding sites (RBSs) in >2.7 million sequence-function pairs in a single experiment. Further, we introduce a deep learning approach employing ensembling and uncertainty modelling that predicts RBS function with high accuracy, outperforming state-of-the-art methods. DNA-based phenotypic recording combined with deep learning represents a major advance in our ability to predict function from genetic sequence. Current methods to generate sequence-function data at large scale are either technically complex or limited to specific applications. Here the authors introduce DNA-based phenotypic recording to overcome these limitations and enable deep learning for accurate prediction of function from sequence.
ISSN:2041-1723
2041-1723
DOI:10.1038/s41467-020-17222-4