DR-BERT: A protein language model to annotate disordered regions
Published in: | Structure (London) 2024-08, Vol.32 (8), p.1260-1268.e3 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT’s ability to use contextual information.
• DR-BERT, a protein language model, makes accurate predictions of disordered regions
• Performance is due to pretraining and DR-BERT’s ability to use contextual information
• DR-BERT does not require sequence alignments or biophysical properties as an input
Nambiar et al. present DR-BERT, a lightweight protein language model that outperforms many existing methods in predicting intrinsically disordered protein regions. Leveraging contextual information, DR-BERT’s pretraining-based approach offers a computationally efficient and accurate means of IDR annotation, making computational annotation of IDRs more accessible. |
---|---|
ISSN: | 0969-2126, 1878-4186 |
DOI: | 10.1016/j.str.2024.04.010 |