TransPPMP: Predicting pathogenicity of frameshift and nonsense mutations by a transformer based on protein features
Saved in:
Published in: Bioinformatics (Oxford, England), 2022-03
Main authors: , , , ,
Format: Article
Language: English
Online access: Full text
Summary: Protein structure can be severely disrupted by frameshift and nonsense mutations at specific positions in the protein sequence, yet such mutations also occur in healthy individuals. A method to distinguish neutral from potentially disease-associated frameshift and nonsense mutations is therefore of practical and fundamental importance: it would allow researchers to rapidly screen potentially pathogenic sites from a large number of mutated genes and then use these sites as drug targets, speeding up diagnosis and improving access to treatment. How to make this distinction, however, remains under-researched.
We built a Transformer-based neural network model, named TransPPMP, to predict the pathogenicity of frameshift and nonsense mutations from protein features. The feature matrix of contextual sequences computed by the ESM pre-trained model, the type of the mutated residue, and auxiliary features including structure and function information are combined as input, and a focal loss function is used to address the sample imbalance problem during training. In 10-fold cross-validation and on an independent blind test set, TransPPMP showed robust performance and a clear advantage on all evaluation metrics over four other advanced methods, namely ENTPRISE-X, VEST-indel, DDIG-in and CADD. In addition, we demonstrate the usefulness of the multi-head attention mechanism in the Transformer for predicting the pathogenicity of mutations: not only can multiple self-attention heads learn local and global interactions, but attention focus can also capture functional sites with a large influence on the mutated residue. These could offer useful clues to the pathogenicity mechanisms of complex human diseases, for which traditional machine learning methods fall short.
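The abstract mentions focal loss only at a high level. As a rough illustration of the idea, here is a standard binary focal loss (Lin et al., 2017) in plain Python; the γ and α values used by TransPPMP are not given in this abstract, so the defaults below are assumptions, not the authors' settings.

```python
import math

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified (easy) examples so the
    minority class (here, pathogenic mutations) contributes more to training.

    probs  : predicted probability of the positive class, one value per sample
    labels : 0/1 ground-truth labels
    gamma  : focusing parameter; gamma=0 recovers alpha-weighted cross-entropy
    alpha  : class-balancing weight for the positive class
    """
    eps = 1e-7  # avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)
```

The (1 - p_t)^γ factor is what tackles imbalance: confident, correct predictions are scaled toward zero, so the gradient is dominated by the hard, rare examples rather than the abundant neutral class.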
TransPPMP is available at https://github.com/lennylv/TransPPMP.
Supplementary data are available at Bioinformatics online.
ISSN: 1367-4811