ORFhunteR: An accurate approach to the automatic identification and annotation of open reading frames in human mRNA molecules


Bibliographic details
Published in: Software Impacts 2022-05, Vol. 12, p. 100268, Article 100268
Main authors: Grinev, Vasily V., Yatskou, Mikalai M., Skakun, Victor V., Chepeleva, Maryna K., Nazarov, Petr V.
Format: Article
Language: English
Description
Summary: The coding potential of RNA molecules can be estimated using algorithms that find open reading frames (ORFs). However, previously developed algorithms show limited performance. We developed a computational approach dedicated to the automatic identification of ORFs in large sets of human mRNA molecules. It is based on the vectorization of nucleotide sequences followed by classification using a random forest. The predictive model was validated on human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The method is implemented in the R/Bioconductor package ORFhunteR.
• ORFhunteR is an R/Bioconductor package aimed at automatic identification of ORFs in large sets of human mRNA molecules.
• The approach is based on vectorization of nucleotide sequences into features, followed by random forest classification.
• The predictive model was validated on human mRNA molecules from the NCBI RefSeq and Ensembl databases.
• ORFhunteR is available as a package in R/Bioconductor and as an online tool (https://orfhunter.bsu.by/).
ISSN: 2665-9638
DOI: 10.1016/j.simpa.2022.100268
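
The summary describes a two-step idea: enumerate candidate ORFs in an mRNA sequence, then turn each candidate into a numeric feature vector for a classifier. The sketch below illustrates only the first two pieces in plain Python. It is not ORFhunteR code and does not use its API: the function names `find_candidate_orfs` and `kmer_frequencies` are hypothetical, the choice of k-mer frequencies as features is an assumption made for illustration (the record does not list the package's actual features), and the random forest step is omitted.

```python
from collections import Counter
from itertools import product

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(seq):
    """Return (start, end) pairs for every ATG that has an in-frame stop codon.

    Coordinates are 0-based, `end` is exclusive and includes the stop codon.
    Hypothetical helper for illustration only.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != START_CODON:
            continue
        # Walk codon by codon in the frame defined by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break  # keep only the first in-frame stop
    return orfs


def kmer_frequencies(seq, k=3):
    """Vectorize a sequence as relative frequencies of all 4**k k-mers.

    The feature set is an assumption, not the one used by ORFhunteR.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    mrna = "CCATGAAATAGGG"  # toy sequence, not real data
    for start, end in find_candidate_orfs(mrna):
        features = kmer_frequencies(mrna[start:end], k=3)
        print(start, end, len(features))
```

In a full pipeline, feature vectors like these would be labeled as true or false ORFs and passed to a random forest classifier; that part is left out here to keep the sketch dependency-free.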