ZFP-CanPred: Predicting the effect of mutations in zinc-finger proteins in cancers using protein language models
Published in: | Methods (San Diego, Calif.), 2025-03, Vol.235, p.55-63 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • Developed a deep learning model to predict cancer driver mutations in zinc finger proteins using protein language models.
• Representations derived from ESM-2 performed better than ProteinBERT in mutation classification.
• Achieved an accuracy of 0.72, F1-score of 0.79, and AU-ROC of 0.74 on an independent test set.
• Outperformed existing prediction tools with the highest AU-ROC of 0.74 using an unbiased dataset of 357 mutations.
Zinc-finger proteins (ZNFs) constitute the largest family of transcription factors and play crucial roles in various cellular processes. Missense mutations in ZNFs significantly alter protein-DNA interactions, potentially leading to the development of various types of cancer. This study presents ZFP-CanPred, a novel deep learning-based model for predicting cancer-associated driver mutations in ZNFs. Representations derived from protein language models (PLMs) for the structural neighbourhood of mutated sites were used to train ZFP-CanPred to differentiate between cancer-causing and neutral mutations. ZFP-CanPred achieved an accuracy of 0.72, an F1-score of 0.79, and an area under the receiver operating characteristic curve (AU-ROC) of 0.74 on an independent test set. In a comparative analysis against 11 existing prediction tools using a curated dataset of 331 mutations, ZFP-CanPred demonstrated the highest AU-ROC of 0.74, outperforming both generic and cancer-specific methods. The model's balanced performance across specificity and sensitivity addresses a significant limitation of current methodologies. The source code and other related files are available on GitHub at https://github.com/amitphogat/ZFP-CanPred.git. We envisage that the present study contributes to understanding oncogenic processes and developing targeted therapeutic strategies. |
---|---|
ISSN: | 1046-2023, 1095-9130 |
DOI: | 10.1016/j.ymeth.2025.01.020 |
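
The summary reports three metrics for a binary driver-versus-neutral classifier: accuracy, F1-score, and AU-ROC. As a minimal sketch of how such metrics are defined, the following pure-Python functions compute them from labels and predicted scores. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the ZFP-CanPred code or data, and the numbers they produce are unrelated to the reported results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AU-ROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 1 = driver mutation, 0 = neutral mutation.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.7, 0.4, 0.3]
# Assumed decision threshold of 0.5 for turning scores into class labels.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print("accuracy:", round(accuracy(toy_labels, toy_preds), 3))
print("F1-score:", round(f1_score(toy_labels, toy_preds), 3))
print("AU-ROC:  ", round(roc_auc(toy_labels, toy_scores), 3))
```

Accuracy and F1 depend on the chosen threshold, whereas AU-ROC is computed from the raw scores and is threshold-independent, which is one reason it is commonly used when comparing tools that output scores on different scales.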