LDAEXC: LncRNA–Disease Associations Prediction with Deep Autoencoder and XGBoost Classifier
Published in: | Interdisciplinary sciences : computational life sciences 2023-09, Vol.15 (3), p.439-451 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Substantial scientific evidence has revealed that long non-coding RNAs (lncRNAs) are involved in the progression of complex human diseases and in biological activities. Identifying novel, potentially disease-related lncRNAs is therefore helpful for the diagnosis, prognosis and therapy of many complex human diseases. Since traditional laboratory experiments are costly and time-consuming, a large number of computational algorithms have been proposed for predicting the relationships between lncRNAs and diseases. However, there is still much room for improvement. In this paper, we introduce an accurate framework named LDAEXC to infer LncRNA–Disease Associations with a deep autoencoder and an XGBoost Classifier. LDAEXC utilizes different similarity views of lncRNAs and human diseases to construct features for each data source. Reduced features are then obtained by feeding the constructed feature vectors into a deep autoencoder, and finally an XGBoost classifier is used to calculate the latent lncRNA–disease association scores from the reduced features. Fivefold cross-validation experiments on four datasets showed that LDAEXC reached AUC scores of 0.9676 ± 0.0043, 0.9449 ± 0.022, 0.9375 ± 0.0331 and 0.9556 ± 0.0134, respectively, significantly higher than those of other advanced methods of this kind. Extensive experimental results and case studies of two complex diseases (colon and breast cancers) further indicated the practicability and excellent prediction performance of LDAEXC in inferring unknown lncRNA–disease associations.
Graphical Abstract
LDAEXC utilizes disease semantic similarity, lncRNA expression similarity, and Gaussian interaction profile kernel similarity of lncRNAs and diseases for feature construction. The constructed features are fed to a deep autoencoder to extract reduced features, and an XGBoost classifier is used to predict lncRNA–disease associations based on the reduced features. Fivefold and tenfold cross-validation experiments on a benchmark dataset showed that LDAEXC achieved AUC scores of 0.9676 and 0.9682, respectively, significantly higher than those of other state-of-the-art methods of this kind. |
---|---|
ISSN: | 1913-2751 1867-1462 |
DOI: | 10.1007/s12539-023-00573-z |
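
The abstract names Gaussian interaction profile (GIP) kernel similarity as one of the similarity views but does not give its formula. As a minimal sketch, the widely used definition computes K(i, j) = exp(-γ · ||p_i - p_j||²), where p_i is the binary association profile of entity i and γ is a bandwidth divided by the mean squared norm of all profiles. The function name `gip_kernel`, the default bandwidth of 1.0, and the toy matrix are assumptions for illustration only, not details taken from the paper.

```python
import math

def gip_kernel(profiles, bandwidth=1.0):
    """GIP kernel similarity between the rows of `profiles` (hypothetical helper)."""
    n = len(profiles)
    # The mean squared norm of the profiles normalises the kernel width.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = bandwidth / mean_sq_norm  # bandwidth = 1.0 is an assumed default
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * sq_dist)
    return sim

# Toy association matrix (made-up data): 3 lncRNAs (rows) x 4 diseases (columns).
associations = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
lnc_sim = gip_kernel(associations)
# Disease similarity uses the columns of the same matrix as profiles.
disease_sim = gip_kernel([list(col) for col in zip(*associations)])
```

In this sketch, lncRNAs that share known disease associations receive a higher similarity than lncRNAs with disjoint profiles; how such a matrix is combined with the other similarity views before the autoencoder is not specified in this record.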