Undersampling and the inference of coevolution in proteins
Published in: | Cell Systems 2023-03, Vol. 14 (3), p. 210-219.e7 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | Protein structure, function, and evolution depend on local and collective epistatic interactions between amino acids. A powerful approach to defining these interactions is to construct models of couplings between amino acids that reproduce the empirical statistics (frequencies and correlations) observed in sequences comprising a protein family. The top couplings are then interpreted. Here, we show that as currently implemented, this inference unequally represents epistatic interactions, a problem that fundamentally arises from limited sampling of sequences in the context of distinct scales at which epistasis occurs in proteins. We show that these issues explain the ability of current approaches to predict tertiary contacts between amino acids and the inability to obviously expose larger networks of functionally relevant, collectively evolving residues called sectors. This work provides a necessary foundation for more deeply understanding and improving evolution-based models of proteins.

Highlights:
• Direct coupling analysis models unequally represent epistatic patterns within proteins
• Model inference is typically done in the limit of extreme undersampling of input data
• We show why epistatic features of different sizes and strengths are unequally inferred
• Findings are recapitulated in experimental data

A current approach for understanding and designing proteins is to make models of epistatic interactions between amino acids from available sequence data comprising a protein family. This work shows that as currently implemented, these models unequally represent the pattern of these interactions. These insights provide a basis for improving next-generation models. |
---|---|
ISSN: | 2405-4712, 2405-4720 |
DOI: | 10.1016/j.cels.2022.12.013 |
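The abstract's central quantities are the single-site frequencies f_i(a) and pairwise connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b) that coupling models are fit to reproduce. The following is a minimal sketch, using only the Python standard library, that computes these statistics from a toy alignment. It also illustrates the sampling problem in its simplest form: when columns are truly independent, the estimated correlations are still nonzero, and this noise shrinks only slowly (roughly as 1/√M) with the number of sequences M. The two-letter alphabet, the helper names, and the random alignments are illustrative assumptions. This is not the authors' pipeline and does not fit a coupling model. Real analyses use 20 amino acids plus a gap symbol, and typically apply pseudocounts and sequence reweighting.

```python
import random
from itertools import combinations

# Hypothetical two-letter alphabet to keep the toy example small
# (real alignments use 20 amino acids plus a gap symbol).
ALPHABET = "AC"


def site_frequencies(msa, i):
    """f_i(a): fraction of sequences carrying letter a at column i."""
    m = len(msa)
    return {a: sum(seq[i] == a for seq in msa) / m for a in ALPHABET}


def pair_frequencies(msa, i, j):
    """f_ij(a, b): fraction of sequences with a at column i and b at column j."""
    m = len(msa)
    return {(a, b): sum(seq[i] == a and seq[j] == b for seq in msa) / m
            for a in ALPHABET for b in ALPHABET}


def connected_correlation(msa, i, j, a, b):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), with no pseudocounts."""
    fi = site_frequencies(msa, i)
    fj = site_frequencies(msa, j)
    fij = pair_frequencies(msa, i, j)
    return fij[(a, b)] - fi[a] * fj[b]


def random_msa(n_seqs, length, rng):
    """Toy alignment whose columns are drawn independently,
    so every true correlation is exactly zero."""
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(n_seqs)]


def mean_abs_correlation(msa):
    """Average |C_ij(A, A)| over all column pairs. For independent
    columns, any nonzero value is pure sampling noise."""
    length = len(msa[0])
    pairs = list(combinations(range(length), 2))
    return sum(abs(connected_correlation(msa, i, j, "A", "A"))
               for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    for n_seqs in (20, 200, 2000):
        msa = random_msa(n_seqs, length=10, rng=rng)
        print(n_seqs, round(mean_abs_correlation(msa), 4))
```

Running the script prints the average spurious correlation for 20, 200, and 2000 sequences. The values shrink as M grows but never reach zero. Weak genuine couplings must therefore be distinguished from noise of this size, which is one intuition for why limited sampling affects how well different epistatic features can be inferred.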