Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Bibliographic details
Published in: Nature methods 2020-08, Vol.17 (8), p.807-814
Main authors: Sethi, Anurag, Gu, Mengting, Gumusgoz, Emrah, Chan, Landon, Yan, Koon-Kiu, Rozowsky, Joel, Barozzi, Iros, Afzal, Veena, Akiyama, Jennifer A., Plajzer-Frick, Ingrid, Yan, Chengfei, Novak, Catherine S., Kato, Momoe, Garvin, Tyler H., Pham, Quan, Harrington, Anne, Mannion, Brandon J., Lee, Elizabeth A., Fukuda-Yuzawa, Yoko, Visel, Axel, Dickel, Diane E., Yip, Kevin Y., Sutton, Richard, Pennacchio, Len A., Gerstein, Mark
Format: Article
Language: English
Online access: Full text
Description
Summary: Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters. Supervised machine-learning models trained using Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.
ISSN: 1548-7091, 1548-7105
DOI: 10.1038/s41592-020-0907-8
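
Illustrative note: the summary mentions shape-matching filters built from meta-profiles of epigenetic features. The following is a minimal sketch of that general idea (a generic matched filter), not the authors' implementation. The function names, the window size, the double-peak shape, and the synthetic signal are all assumptions made for illustration; the record itself gives no such details.

```python
import numpy as np


def build_meta_profile(signal, centers, half_width):
    """Average fixed-width windows of `signal` centered on known sites.

    Hypothetical helper: the averaged window serves as the meta-profile.
    """
    windows = [
        signal[c - half_width : c + half_width + 1]
        for c in centers
        if c - half_width >= 0 and c + half_width + 1 <= len(signal)
    ]
    if not windows:
        raise ValueError("no window fits inside the signal")
    return np.mean(windows, axis=0)


def matched_filter_scores(signal, profile):
    """Slide a zero-mean, unit-norm copy of `profile` along `signal`.

    With an odd-length profile, mode="same" aligns each score with the
    position at the center of the window (edges are zero-padded).
    """
    template = profile - profile.mean()
    norm = np.linalg.norm(template)
    if norm == 0:
        raise ValueError("profile is flat; nothing to match")
    return np.correlate(signal, template / norm, mode="same")


# Synthetic data (assumption): a double-peak shape placed at three sites
# on top of low-level noise. Two sites are treated as known positives.
rng = np.random.default_rng(0)
half_width = 10
offsets = np.arange(-half_width, half_width + 1)
shape = np.exp(-((offsets - 5) ** 2) / 8.0) + np.exp(-((offsets + 5) ** 2) / 8.0)

signal = rng.normal(0.0, 0.02, size=200)
for site in (40, 100, 160):
    signal[site - half_width : site + half_width + 1] += shape

profile = build_meta_profile(signal, centers=[40, 100], half_width=half_width)
scores = matched_filter_scores(signal, profile)

# Highest-scoring position in a region that was not used to build the profile.
best = 130 + int(np.argmax(scores[130:190]))
print("best match near held-out site:", best)
```

In a setup like the one the summary describes, scores of this kind from several epigenetic tracks would presumably be fed as features into a supervised classifier; that step is omitted here because the record does not specify the features or the model.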