Flexible Protein-Protein Docking Benchmark (FD1.0)

Bibliographic Details
Main Author:	Qin, Ming
Format:	Dataset
Language:	eng
Subjects:	Protein-protein Flexible Docking

Description
Summary:	To assess how well different methods perform at flexible protein-protein docking, a docking dataset must include not only the structure of each heterodimer but also the structures of its unbound monomers. Existing datasets such as DB5.5 and AB-Benchmark, while useful, are relatively limited in scale. In contrast, the Database of Interacting Protein Structures (DIPS) contains up to 42,826 binary protein complex structures but lacks the unbound-state structures of the monomers. This limitation restricts its applicability to evaluating rigid docking models rather than flexible ones. Consequently, the impact of large-scale docking datasets on methods for flexible protein-protein docking has not been thoroughly explored. To address this gap, we introduce the Flexible Protein-Protein Docking Benchmark (FD1.0), which, to our knowledge, is currently the largest dataset dedicated to flexible protein-protein docking. By providing a large and well-characterized dataset, FD1.0 aims to foster innovation in the development of flexible docking algorithms. It allows researchers to rigorously test and refine their methods, facilitating more accurate predictions of protein interactions, which are essential for understanding biological functions and designing therapeutic interventions.
In our analysis of the DIPS dataset, we identified several critical issues: (1) Multiple three-dimensional structures correspond to a single protein sequence, which introduces substantial noise and undermines fair comparison among baselines, especially for models that rely on 3D structural data. (2) The DIPS training set consists primarily of homo-multimers and therefore fails to fully capture the diversity of interface types. Moreover, protein-protein docking predictions are most valuable for elucidating the mechanisms of protein-protein interactions (PPIs), which predominantly involve heterodimers. Homomers, often synthesized directly rather than through docking, do not accurately represent typical PPI scenarios. (3) A significant number of docking cases in DIPS involve the interaction of one polymeric protein with another, which further complicates the dataset.
A flexible docking dataset fundamentally requires the structures of protein monomers in their unbound state. These can be obtained through protein structure prediction methods such as AlphaFold2 and by aggregating structural data from sources including electron microscopy.
DOI:	10.5281/zenodo.14004827