ZFP-CanPred: Predicting the effect of mutations in zinc-finger proteins in cancers using protein language models

Bibliographic details
Published in: Methods (San Diego, Calif.), 2025-03, Vol. 235, p. 55-63
Main authors: Phogat, Amit; Krishnan, Sowmya Ramaswamy; Pandey, Medha; Gromiha, M. Michael
Format: Article
Language: English
Description
Summary:
• Developed a deep learning model to predict cancer driver mutations in zinc finger proteins using protein language models.
• Representations derived from ESM-2 performed better than those from ProteinBERT in mutation classification.
• Achieved an accuracy of 0.72, F1-score of 0.79, and AU-ROC of 0.74 on an independent test set.
• Outperformed existing prediction tools with the highest AU-ROC of 0.74 using an unbiased dataset of 357 mutations.

Zinc-finger proteins (ZNFs) constitute the largest family of transcription factors and play crucial roles in various cellular processes. Missense mutations in ZNFs significantly alter protein-DNA interactions, potentially leading to the development of various types of cancer. This study presents ZFP-CanPred, a novel deep learning-based model for predicting cancer-associated driver mutations in ZNFs. Representations derived from protein language models (PLMs) for the structural neighbourhood of mutated sites were used to train ZFP-CanPred to differentiate between cancer-causing and neutral mutations. ZFP-CanPred achieved an accuracy of 0.72, an F1-score of 0.79, and an area under the receiver operating characteristic curve (AU-ROC) of 0.74 on an independent test set. In a comparative analysis against 11 existing prediction tools using a curated dataset of 331 mutations, ZFP-CanPred demonstrated the highest AU-ROC of 0.74, outperforming both generic and cancer-specific methods. The model's balanced performance across specificity and sensitivity addresses a significant limitation of current methodologies. The source code and other related files are available on GitHub at https://github.com/amitphogat/ZFP-CanPred.git. We envisage that the present study contributes to understanding oncogenic processes and to developing targeted therapeutic strategies.
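For readers unfamiliar with the metrics quoted in the summary, the following is a minimal sketch of how accuracy, F1-score, sensitivity, specificity, and AU-ROC are conventionally computed for a binary driver/neutral classifier. It is not taken from the ZFP-CanPred repository; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold-based metrics; label 1 = driver, 0 = neutral (assumed encoding)."""
    # The 0.5 cut-off is an illustrative assumption, not a value from the paper.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on drivers
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on neutrals
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def au_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AU-ROC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data invented for this example; not results from the study.
    labels = [1, 1, 1, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
    print(classification_metrics(labels, scores))
    print("AU-ROC:", au_roc(labels, scores))
```

Accuracy, F1-score, sensitivity, and specificity depend on the chosen threshold, whereas AU-ROC summarizes ranking quality across all thresholds, which is why it is commonly used to compare tools with different score scales.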
ISSN: 1046-2023, 1095-9130
DOI:10.1016/j.ymeth.2025.01.020