Representation learning of genomic sequence motifs with convolutional neural networks

Bibliographic details
Published in:	PLoS computational biology 2019-12, Vol.15 (12), p.e1007560-e1007560
Main authors:	Koo, Peter K; Eddy, Sean R
Format:	Article
Language:	eng
Subjects:	Amino Acid Motifs; Artificial neural networks; Binding sites; Binding Sites - genetics; Biology and Life Sciences; Cellular biology; Computational Biology; Computer and Information Sciences; Computer applications; Computer Simulation; Databases, Genetic - statistics & numerical data; Deep Learning - statistics & numerical data; DNA - genetics; Genome, Human; Genomics; Genomics - statistics & numerical data; Humans; Learning; Neural networks; Neural Networks, Computer; Neurons; Representations; Research and Analysis Methods; Transcription Factors - chemistry; Transcription Factors - genetics; Transcription Factors - metabolism
Online access:	Full text

Description
Abstract:	Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first-layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs (assembling partial features into whole features in deeper layers) tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1007560
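
The abstract names two architectural choices, convolutional filter size and max-pooling, as the factors that shape what first-layer filters learn. The sketch below is only an illustration of those two operations on a one-hot DNA sequence, written in plain Python; it is not the authors' code, and the motif, sequence, bias, and all function names are hypothetical. It shows how a filter as wide as a whole motif responds at motif positions, and how a larger pooling window discards the positional detail that deeper layers would need to combine partial features.

```python
# Illustrative sketch only; not the authors' implementation.
# The motif, sequence, bias, and all function names are assumptions.

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d_relu(x, filt, bias=0.0):
    """Slide one filter along the sequence (valid cross-correlation), then apply ReLU."""
    width = len(filt)
    out = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * filt[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        out.append(max(0.0, score + bias))
    return out


def max_pool(activations, pool_size):
    """Non-overlapping max-pooling; a trailing partial window is kept."""
    return [
        max(activations[i:i + pool_size])
        for i in range(0, len(activations), pool_size)
    ]


# A hand-set filter that matches the whole (hypothetical) motif "GATA".
# With bias -3, only a perfect 4-of-4 match survives the ReLU.
motif = "GATA"
whole_motif_filter = one_hot(motif)

sequence = "ACGATACCTTGATAGC"  # contains "GATA" at positions 2 and 10
acts = conv1d_relu(one_hot(sequence), whole_motif_filter, bias=-3.0)

# Small pooling window: coarse positions of both motif hits are preserved.
pooled_small = max_pool(acts, 4)

# Pooling over the full length: only "motif present somewhere" remains.
pooled_global = max_pool(acts, len(acts))

print(acts)
print(pooled_small)
print(pooled_global)
```

Here the filter is set by hand rather than learned, so the example says nothing about which representation a trained network would actually converge to; it only makes concrete what "filter size" and "max-pooling" control at the first layer.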