DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions

Abstract Motivation Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification o...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics 2022-09, Vol.38 (17), p.4053-4061
Hauptverfasser: Liu, Quanzhong, Fang, Honglin, Wang, Xiao, Wang, Miao, Li, Shuqin, Coin, Lachlan J M, Li, Fuyi, Song, Jiangning
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. Results In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. Availability and implementation The webserver and source code are freely available at http://bigdata.biocie.cn/deepgengrep/home and Github (https://github.com/wx-cie/DeepGenGrep/). Supplementary information Supplementary data are available at Bioinformatics online.
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/btac454