Voting-ac4C: Pre-trained large RNA language model enhances RNA N4-acetylcytidine site prediction
Published in: | International Journal of Biological Macromolecules, 2024-12, Vol. 282 (Pt 3), Article 136940 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | RNA N4-acetylcytidine (ac4C) modification plays a crucial role in regulating gene expression. However, existing prediction methods are limited in how well they capture RNA sequence features, particularly sequence complexity and long-range dependencies. To improve the accuracy of ac4C site prediction, this study is the first to apply the transformer-based pre-trained model RNAErnie, which extracts deep semantic information from RNA sequences. RNAErnie features are combined with those from six traditional feature extraction methods (e.g., One-hot and ENAC) to form a multidimensional feature set. On this basis, we propose Voting-ac4C, which uses a deep neural network for feature selection and feeds the selected features into a soft voting ensemble that combines the strengths of several machine learning algorithms to predict ac4C sites. Experimental results show that, compared with state-of-the-art methods, Voting-ac4C achieves significant improvements across multiple metrics, including AUC, sensitivity (SN), specificity (SP), accuracy (ACC), and Matthews correlation coefficient (MCC). This study provides a new approach to RNA modification site prediction and highlights the potential of pre-trained models in biological sequence analysis. |
---|---|
ISSN: | 0141-8130; 1879-0003 |
DOI: | 10.1016/j.ijbiomac.2024.136940 |
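The abstract names One-hot and ENAC among the traditional encodings combined with RNAErnie features. The following is a minimal sketch of those two encodings using their common textbook definitions, not the authors' code. Assumptions not taken from the paper: the alphabet is ACGU, T is mapped to U, unknown bases such as N encode as all zeros, and ENAC defaults to a sliding window of 5 nucleotides.

```python
"""Sketch of two classic RNA encodings: One-hot and ENAC.

Assumptions (not from the paper): alphabet ACGU, T is treated as U,
unknown bases such as N encode as all zeros, and ENAC defaults to a
sliding window of 5 nucleotides.
"""
from typing import List

ALPHABET = "ACGU"


def _normalize(seq: str) -> str:
    # Upper-case the sequence and map DNA thymine to RNA uracil.
    return seq.upper().replace("T", "U")


def one_hot(seq: str) -> List[int]:
    """Flattened one-hot encoding: 4 binary values per nucleotide."""
    features: List[int] = []
    for base in _normalize(seq):
        features.extend(1 if base == b else 0 for b in ALPHABET)
    return features


def enac(seq: str, window: int = 5) -> List[float]:
    """ENAC: A/C/G/U frequencies inside each window sliding from 5' to 3'."""
    seq = _normalize(seq)
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    features: List[float] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        features.extend(chunk.count(b) / window for b in ALPHABET)
    return features


if __name__ == "__main__":
    print(one_hot("ACGU"))          # 16 values, exactly one 1 per nucleotide
    print(enac("ACGUC", window=3))  # 3 windows x 4 frequencies = 12 values
```

For a sequence of length L, One-hot yields 4L values and ENAC yields 4(L - w + 1) values for window size w. Concatenating such vectors with the RNAErnie features is one plausible way to build the multidimensional feature set; the abstract does not say exactly how the features are combined.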
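The abstract says the selected features are fed into a soft voting ensemble. Soft voting averages the class probabilities of several base classifiers instead of counting their hard votes. The sketch below shows only that combination step. The base models, their probabilities, the optional weights and the 0.5 threshold are illustrative assumptions, because the abstract does not name the learners Voting-ac4C uses.

```python
"""Minimal soft-voting sketch: average the positive-class probabilities
produced by several already-trained base classifiers, then threshold.

Assumptions (not from the paper): the base models, their probabilities,
the optional weights and the 0.5 decision threshold are all illustrative.
"""
from typing import List, Optional, Sequence


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean probability per sample.

    probas[m][i] is base model m's probability that sample i is an ac4C site.
    """
    if not probas:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(probas)
    if len(weights) != len(probas):
        raise ValueError("need exactly one weight per model")
    n_samples = len(probas[0])
    if any(len(p) != n_samples for p in probas):
        raise ValueError("all models must score the same samples")
    total = float(sum(weights))
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total
            for i in range(n_samples)]


def predict(probas: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None,
            threshold: float = 0.5) -> List[int]:
    """Label a sample 1 (ac4C site) when its averaged probability >= threshold."""
    return [1 if p >= threshold else 0 for p in soft_vote(probas, weights)]


if __name__ == "__main__":
    # One sample scored by three hypothetical base models.
    # Hard majority voting would give 0 (two of three models are below 0.5),
    # but the confident first model lifts the average to about 0.617.
    print(predict([[0.95], [0.45], [0.45]]))  # → [1]
```

The usage example shows why soft voting can outperform majority voting: a confident model is not simply outvoted by hesitant ones. scikit-learn offers the same idea as `sklearn.ensemble.VotingClassifier(estimators=..., voting="soft")` for estimators that implement `predict_proba`. The abstract does not state whether the authors used it.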
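The abstract reports results in AUC, SN, SP, ACC and MCC. The sketch below computes these metrics with their standard definitions so that the abbreviations are concrete. It assumes labels of 1 for an ac4C site and 0 otherwise. It is not the authors' evaluation code, and the function names are hypothetical.

```python
"""Sketch of the evaluation metrics named in the abstract, using their
standard definitions: SN (sensitivity), SP (specificity), ACC, MCC and AUC.

Assumption: labels are 1 for an ac4C site and 0 otherwise.
"""
import math
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """SN, SP, ACC and MCC computed from the confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(y_true) if y_true else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when any confusion-matrix margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(threshold_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # → 0.75
```

SN, SP, ACC and MCC depend on a decision threshold, but AUC is computed from the raw scores. The pairwise AUC shown here is quadratic in the number of samples. For large datasets, a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.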