An Improved Method for Identification of Pre-miRNA in Drosophila

Bibliographic details
Published in: IEEE Access 2020, Vol. 8, pp. 52173-52180
Main authors: Yu, Tieying; Chen, Min; Wang, Chunde
Format: Article
Language: English
Description
Abstract: Identification of microRNAs is important in studies of the regulation of gene expression in many biological processes. In this study, we developed an improved method for identification of microRNAs in Drosophila. We used the iLearn, PyFeat, and Pse-in-One methods to extract features, then used Max-Relevance-Max-Distance (MRMD2.0) and t-Distributed Stochastic Neighbour Embedding (t-SNE) to reduce the dimensionality of the features, and finally used the random forest classifier in Weka to identify miRNAs. With this method, we found that the discriminative features for identification of pre-miRNAs were, in Drosophila melanogaster, the occurrences of G_GUG and C_AGU when the value of the feature vector was greater than 2, and, in Drosophila pseudoobscura, the 4-tuple nucleotide composition and the occurrence of 4-length neighbouring nucleic acids when the value of the feature vector was less than 0.02. These vectors covered all compositional information or the frequency of bases. The classification accuracy was 95.7% and 93.6%, the precision was 95.8% and 93.6%, and the recall was 95.7% and 93.6% in Drosophila melanogaster and Drosophila pseudoobscura, respectively, which are higher than the values reported in previous studies.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2980897
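
The 4-tuple nucleotide composition named in the abstract can be illustrated with a minimal Python sketch. This is not the iLearn, PyFeat, or Pse-in-One implementation; the function name kmer_composition, the normalization by the number of windows, and the toy sequence are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=4, alphabet="ACGU"):
    """Return normalized k-mer frequencies of an RNA sequence as a dict.

    Hypothetical helper: one entry per possible k-mer (4**k entries for RNA),
    with each value being that k-mer's share of all valid sliding windows.
    """
    # Accept DNA-style input by mapping T to U (an assumption of this sketch).
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of length k over the sequence, one base at a time.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing characters outside the alphabet (e.g. N).
    counts = Counter(w for w in windows if set(w) <= set(alphabet))
    total = sum(counts.values())
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


# Toy sequence invented for this example, not taken from the paper's data.
features = kmer_composition("GGUGAGUGGUGC")
print(len(features))       # number of 4-mer features
print(features["GGUG"])    # relative frequency of the 4-mer GGUG
```

A vector like this has 256 entries for k = 4. In the workflow the abstract describes, such features would then pass through feature selection and dimensionality reduction before classification.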