LncRNAnet: long non-coding RNA identification using deep learning

Abstract Motivation Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-ge...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bioinformatics 2018-11, Vol.34 (22), p.3889-3897
Hauptverfasser: Baek, Junghwan, Lee, Byunghan, Kwon, Sunyoung, Yoon, Sungroh
Format: Artikel
Sprache:eng
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Motivation Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, the experimental verification of numerous transcriptomes is time consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the transcriptomes. Results We present a deep learning-based approach, lncRNAnet, to identify lncRNAs that incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks for detecting stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools for sequences of short lengths, on which most lncRNAs are distributed. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively. Availability and implementation Data and codes are available in http://data.snu.ac.kr/pub/lncRNAnet.
ISSN:1367-4803
1460-2059
1367-4811
DOI:10.1093/bioinformatics/bty418