PRECOGx: exploring GPCR signaling mechanisms with deep protein representations

Bibliographic details
Published in: Nucleic Acids Research, July 2022, Vol. 50 (W1), pp. W598-W610
Main authors: Matic, Marin; Singh, Gurdeep; Carli, Francesco; De Oliveira Rosa, Natalia; Miglionico, Pasquale; Magni, Lorenzo; Gutkind, J Silvio; Russell, Robert B; Inoue, Asuka; Raimondi, Francesco
Format: Article
Language: English
Description
Abstract: In this study we show that protein language models can encode structural and functional information of GPCR sequences that can be used to predict their signaling and functional repertoire. We used the ESM1b protein embeddings as features, together with the binding information known from publicly available studies, to develop PRECOGx, a machine learning predictor to explore GPCR interactions with G protein and β-arrestin, which we made available through a new webserver (https://precogx.bioinfolab.sns.it/). PRECOGx outperformed its predecessor, PRECOG, in predicting GPCR-transducer couplings, and it can also handle all GPCR classes. The webserver also provides new functionalities, such as the projection of input sequences onto a low-dimensional space describing essential features of the human GPCRome, which is used as a reference to track GPCR variants. Additionally, it allows inspection of the sequence and structural determinants responsible for coupling, both through the analysis of the most important attention maps used by the models and through predicted intramolecular contacts. We demonstrate applications of PRECOGx by predicting the impact of disease variants (ClinVar) and of alternative splice forms from healthy tissues (GTEx) of human GPCRs, revealing its power to dissect system bias mechanisms in both health and disease.

Graphical Abstract: Protein language models can encode structural and functional information of GPCR sequences, which is used to predict their signaling and functional mechanisms, such as coupling with G protein and β-arrestin, as well as the underlying sequence and structural determinants. The input sequence is projected onto a low-dimensional space (PCA) that describes the essential features of the human GPCRome.
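The abstract describes using ESM1b embeddings as features for a GPCR-transducer coupling predictor. The following is only a minimal sketch of that general idea under stated assumptions, not the published PRECOGx model: per-residue embedding vectors are mean-pooled into one fixed-length vector per receptor, and a logistic regression trained by batch gradient descent scores coupling to a single transducer. The pooling choice, the classifier, the helper names (`mean_pool`, `train_logistic`, `predict_proba`) and the toy data are all hypothetical; real ESM1b per-residue vectors have 1280 dimensions rather than the 3 used here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average an L x D matrix of per-residue vectors into one D-dim vector.
    (Assumption: mean pooling; other pooling schemes are possible.)"""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[j] for vec in residue_embeddings) / n for j in range(dim)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Vector, b: float, x: Vector) -> float:
    """Hypothetical coupling probability for one pooled receptor vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: Sequence[Vector], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[Vector, float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy per-residue "embeddings" (2 residues x 3 dims) with made-up labels:
# 1 = couples to the transducer, 0 = does not.
toy_receptors = [
    ([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]], 1),
    ([[1.0, 0.0, 0.2], [0.7, 0.1, 0.0]], 1),
    ([[0.1, 0.9, 0.3], [0.0, 0.8, 0.2]], 0),
    ([[0.2, 0.7, 0.1], [0.1, 1.0, 0.0]], 0),
]
X = [mean_pool(res) for res, _ in toy_receptors]
y = [label for _, label in toy_receptors]
w, b = train_logistic(X, y)

p_pos = predict_proba(w, b, mean_pool([[0.85, 0.15, 0.1], [0.9, 0.05, 0.0]]))
p_neg = predict_proba(w, b, mean_pool([[0.1, 0.9, 0.1], [0.1, 0.9, 0.1]]))
print(round(p_pos, 2), round(p_neg, 2))
```

Training one such binary model per transducer is one simple way to frame multi-transducer prediction; the abstract does not say whether PRECOGx is organized this way.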
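The abstract also mentions projecting input sequences onto a low-dimensional PCA space built from the human GPCRome, which then serves as a reference for tracking variants. The sketch below shows that general technique in pure Python (principal axes found by power iteration with deflation), assuming each receptor is already summarized as one fixed-length embedding vector. `fit_pca` and `project` are hypothetical helpers, not the webserver's code, and the reference data are invented; a real analysis would normally use a linear-algebra library.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def covariance(X: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Return column means and the sample covariance matrix of X (n x d)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for j in range(d)] for i in range(d)]
    return means, cov


def power_iteration(C: Matrix, iters: int = 300, seed: int = 0) -> Tuple[Vector, float]:
    """Approximate the dominant eigenvector/eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in C]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left along any direction
            return v, 0.0
        v = [x / norm for x in w]
    Cv = [sum(cij * vj for cij, vj in zip(row, v)) for row in C]
    return v, sum(a * b for a, b in zip(v, Cv))


def fit_pca(X: Sequence[Vector], k: int = 2) -> Tuple[Vector, List[Vector]]:
    """Top-k principal axes via power iteration plus deflation."""
    means, C = covariance(X)
    d = len(means)
    components = []
    for _ in range(k):
        v, lam = power_iteration(C)
        components.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next axis.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components


def project(x: Vector, means: Vector, components: List[Vector]) -> Vector:
    """Coordinates of x in the PCA space: center, then dot with each axis."""
    centered = [xi - mi for xi, mi in zip(x, means)]
    return [sum(c * z for c, z in zip(comp, centered)) for comp in components]


# Hypothetical reference set (one 2-D "embedding" per receptor) and a variant.
reference = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
means, axes = fit_pca(reference, k=2)
variant_coords = project([3.0, 0.5], means, axes)
print(variant_coords)
```

Principal axes are only defined up to sign, so coordinates from different runs or tools should be compared after fixing a sign convention.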
ISSN: 0305-1048; 1362-4962
DOI: 10.1093/nar/gkac426