ORFhunteR: An accurate approach to the automatic identification and annotation of open reading frames in human mRNA molecules
Published in: | Software Impacts 2022-05, Vol.12, p.100268, Article 100268 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The coding potential of RNA molecules can be estimated using algorithms that find open reading frames (ORFs). However, previously developed algorithms show limited performance. We developed a computational approach for the automatic identification of ORFs in large sets of human mRNA molecules. It is based on the vectorization of nucleotide sequences followed by classification using a random forest. The predictive model was validated on human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. Our method is implemented in the R/Bioconductor package ORFhunteR.
• ORFhunteR is an R/Bioconductor package aimed at automatic identification of ORFs in large sets of human mRNA molecules.
• The approach is based on vectorization of nucleotide sequences into features, followed by a random forest classification.
• The predictive model was validated on human mRNA molecules from the NCBI RefSeq and Ensembl databases.
• ORFhunteR is available as a package in R/Bioconductor and as an online tool (https://orfhunter.bsu.by/). |
---|---|
ISSN: | 2665-9638 |
DOI: | 10.1016/j.simpa.2022.100268 |
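
The summary describes a pipeline of candidate ORF detection, vectorization of nucleotide sequences, and random forest classification. The first two steps can be illustrated with a minimal Python sketch. This is not the ORFhunteR API: the function names `find_candidate_orfs` and `kmer_vector` are hypothetical, and k-mer frequencies are assumed here as one plausible vectorization, since the record does not specify the actual feature set. The random forest step is omitted; the resulting vectors would serve as input to any such classifier.

```python
from collections import Counter
from itertools import product

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(seq):
    """Return (start, end) index pairs (0-based, end exclusive) for every
    ATG that is followed by an in-frame stop codon.

    Hypothetical helper for illustration; not part of ORFhunteR.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != START_CODON:
            continue
        # Walk codon by codon in the frame defined by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


def kmer_vector(orf_seq, k=3):
    """Return relative frequencies of all 4**k k-mers in a fixed
    alphabetical order (an assumed feature scheme, for illustration).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(orf_seq[i:i + k] for i in range(len(orf_seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)  # avoid division by zero
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    mrna = "CCATGAAATGCCCTAAGG"
    for start, end in find_candidate_orfs(mrna):
        features = kmer_vector(mrna[start:end])
        print(start, end, len(features))
```

In the example sequence, the first ATG has no in-frame stop codon and is skipped, while the second yields one candidate ORF. Each candidate is turned into a fixed-length numeric vector, which is the form a random forest classifier expects.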