An efficient circRNA-miRNA interaction prediction model by combining biological text mining and wavelet diffusion-based sparse network structure embedding

Bibliographic details
Published in: Computers in biology and medicine 2023-10, Vol.165, p.107421-107421, Article 107421
Main authors: Wang, Xin-Fei; Yu, Chang-Qing; You, Zhu-Hong; Qiao, Yan; Li, Zheng-Wei; Huang, Wen-Zhun
Format: Article
Language: English
Description
Abstract: Accumulating clinical evidence shows that circular RNA (circRNA) plays an important regulatory role in the occurrence and development of human diseases, and is expected to offer a new perspective on the diagnosis and treatment of related diseases. Computational methods can preselect high-probability candidates for wet-lab experiments and so save resources. However, because sparse biological networks lack neighborhood structure, models based on network embedding and graph embedding struggle to achieve ideal results. In this paper, we propose BioDGW-CMI, which combines biological text mining with wavelet diffusion-based sparse network structure embedding to predict circRNA-miRNA interactions (CMIs). In detail, BioDGW-CMI first uses Bidirectional Encoder Representations from Transformers (BERT) for biological text mining to extract hidden features from RNA sequences. It then constructs a CMI network and obtains the topological structure embedding of each node through heat wavelet diffusion patterns. Next, a denoising autoencoder combines the structural features with Gaussian kernel similarity. Finally, the resulting features are passed to LightGBM for training and prediction. BioDGW-CMI achieves the highest prediction performance on all three datasets in the field of CMI prediction. In the case study, all 8 CMI pairs based on circ-ITCH were successfully predicted. The data and source code can be found at https://github.com/1axin/BioDGW-CMI-model.
• BioDGW-CMI uses wavelet diffusion as an energy signal to capture the topology of nodes.
• BioDGW-CMI uses BioBERT to extract sequence features of biological significance.
• BioDGW-CMI achieves the most competitive performance on all known datasets.
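The heat wavelet diffusion step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes a GraphWave-style formulation: the heat kernel exp(-sL) of the graph Laplacian L gives each node a vector of wavelet coefficients, and the node embedding samples the empirical characteristic function of those coefficients. The toy network, the scale s, the sample points, and all function names are hypothetical, and the truncated Taylor series is only suitable for very small graphs.

```python
import cmath


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def matmul(A, B):
    """Dense product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(L, s, terms=40):
    """Approximate exp(-s * L) by a truncated Taylor series (tiny graphs only)."""
    n = len(L)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (-s L)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[-s * x / k for x in row] for row in matmul(term, L)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def wavelet_embedding(psi, node, sample_points):
    """Sample the empirical characteristic function of one node's wavelet
    coefficients and return its real and imaginary parts as features."""
    n = len(psi)
    coeffs = [psi[m][node] for m in range(n)]
    features = []
    for t in sample_points:
        phi = sum(cmath.exp(1j * t * c) for c in coeffs) / n
        features.extend([phi.real, phi.imag])
    return features


# Hypothetical toy CMI network: circRNAs are nodes 0-1, miRNAs are nodes 2-4.
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]
psi = heat_kernel(laplacian(5, edges), s=1.0)
sample_points = [0.5 * k for k in range(1, 9)]  # assumed sampling grid
embeddings = [wavelet_embedding(psi, v, sample_points) for v in range(5)]
```

In this toy network, nodes 2 and 4 occupy mirror-image positions, so their embeddings coincide even though they share no neighbor. This kind of structural signal remains available when direct neighborhood information is sparse, which is the limitation the abstract attributes to other embedding models.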
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2023.107421