Recurrent neural network-based prediction of O-GlcNAcylation sites in mammalian proteins

Bibliographic details
Published in:	Computers & Chemical Engineering, 2024-10, Vol. 189, Article 108818
Main authors:	Seber, Pedro; Braatz, Richard D.
Format:	Article
Language:	English
Subjects:	Computational biology; Deep learning; Glycosylation; Machine learning; O-GlcNAcylation; Recurrent neural networks

Description
Abstract:	O-GlcNAcylation has the potential to be an important target for therapeutics, but a motif or an algorithm to reliably predict O-GlcNAcylation sites is not available. Current predictive models are insufficient as they fail to generalize, and many are no longer available. This article constructs recurrent neural network models to predict O-GlcNAcylation sites based on protein sequences. Different datasets are evaluated separately and assessed in terms of strengths and issues. Within a given dataset, results are robust to changes in cross-validation and test data as determined by nested validation. The best model achieves an F1 score of 36% (more than 3.5-fold greater than the previous best model) and a Matthews Correlation Coefficient of 35% (more than 4.5-fold greater than the previous best model); its F1 score is also 7.6-fold higher than when not using any model. Shapley values are used to interpret the model’s predictions and provide biological insight into O-GlcNAcylation.
Highlights:
• O-GlcNAcylation has the potential to be an important target for therapeutics.
• Recurrent neural networks predict O-GlcNAcylation sites based on protein sequences.
• The model achieves 3.5-fold improvement in the F1 score over published models.
• The model achieves 4.5-fold improvement in the Matthews Correlation Coefficient.
• Shapley coefficients provide interpretability and insight on the model predictions.
ISSN:	0098-1354
DOI:	10.1016/j.compchemeng.2024.108818
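
The abstract reports results as an F1 score and a Matthews Correlation Coefficient (MCC). As a reading aid, the sketch below shows the standard definitions of both metrics computed from confusion-matrix counts. It is not taken from the article: the function names and the example counts are hypothetical and are chosen only to illustrate a dataset in which positive sites are rare.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives and no positive predictions.
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Standard Matthews Correlation Coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (NOT from the article): 90 true sites among 10,000 candidates.
tp, fp, fn, tn = 30, 50, 60, 9860
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"MCC = {mcc(tp, fp, fn, tn):.3f}")
```

Unlike accuracy, both metrics stay low when a classifier simply labels every candidate as negative, which is presumably why they suit a task where true sites are a small fraction of candidates.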