Unattained geometric configurations of secondary structure elements in protein structural space

Bibliographic details
Published in: Journal of structural biology 2022-09, Vol.214 (3), p.107870-107870, Article 107870
Main authors: Sykes, Janan; Holland, Barbara; Charleston, Michael
Format: Article
Language: English
Online access: Full text
Description
Summary:
• Folds and families exhibit considerable structural diversity at an SSE level.
• The space of helix-strand-strand SSE substructures shows evidence for a gap.
• Many SSE sequences are not found in the PDB.

Discovery of new folds in the Protein Data Bank (PDB) has all but ceased. This could be viewed as evidence that all existing protein folds have been documented. Sampling bias has, however, been presented as an alternative explanation. Furthermore, although we may know of all protein folds that do exist, we may not have documented all protein folds that could exist. While addressing completeness in the context of entire protein structures is extremely difficult, the structures can be simplified in a number of ways. One such simplification is presented: considering protein structures as a series of α helices and β sheets and analysing the geometric relationships between these successive secondary structure elements (SSEs) through torsion angles, lengths and distances. We aimed to find out whether all substructures that could be formed by triplets of these successive SSEs were represented in the PDB. When SSEs were defined with the assignment program Promotif, a gap was identified in the represented torsion angles of helix-strand-strand substructures. This gap was not present when SSEs were defined with DSSP, an alternative assignment program with a smaller minimum SSE length. We also looked at representing proteins as one-dimensional sequences of SSE types and searched for underrepresented motifs. Completely absent motifs occurred more often than expected at random. If a gap in SSE substructure space exists that could be filled, or if a physically possible SSE motif is absent, associated gaps in protein structure space are implied, meaning that the PDB as we know it may not be complete.
ISSN: 1047-8477, 1095-8657
DOI:10.1016/j.jsb.2022.107870
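
The summary mentions representing proteins as one-dimensional sequences of SSE types and searching for completely absent motifs. The following is a minimal sketch of that idea only, not the authors' method: it assumes each protein is written as a string over a two-letter alphabet (H for helix, E for strand, a hypothetical encoding chosen here), and it lists every motif of length k that occurs in none of the strings. The function name `absent_motifs` and the toy input are illustrative assumptions, and the comparison against what would be expected at random is not covered.

```python
from itertools import product


def absent_motifs(sse_strings, k, alphabet="HE"):
    """Return, sorted, the length-k motifs over `alphabet` found in no input string.

    `sse_strings` is an iterable of strings, one per protein, where each
    character is an SSE type (assumed encoding: H = helix, E = strand).
    """
    observed = set()
    for s in sse_strings:
        # Slide a window of width k along the SSE string.
        for i in range(len(s) - k + 1):
            observed.add(s[i:i + k])
    # Every motif of length k that the alphabet allows.
    all_motifs = {"".join(p) for p in product(alphabet, repeat=k)}
    return sorted(all_motifs - observed)


# Toy, made-up SSE strings (not taken from the PDB).
toy = ["HEEH", "EEHE", "HHEE"]
print(absent_motifs(toy, 3))  # prints ['EEE', 'EHH', 'HEH', 'HHH']
```

With real data the strings would come from an assignment program's output, and the count of absent motifs would still need to be compared with a suitable null model before any conclusion about gaps could be drawn.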