Clustering one million molecular structures on GPU within seconds


Bibliographic details
Published in: Journal of computational chemistry 2024-12, Vol.45 (32), p.2710-2718
Main authors: Gao, Junyong, Wu, Mincong, Liao, Jun, Meng, Fanjun, Chen, Changjun
Format: Article
Language: English
Description
Summary: Structure clustering is a common but time-consuming task in the life sciences. To date, most published tools do not support clustering analysis on graphics processing units (GPUs) with the root mean square deviation (RMSD) metric. In this work, we wrote dedicated code for this purpose. It supports multiple threads on multiple GPUs. To demonstrate its performance, we apply the program to a 33-residue fragment of a Pin1 WW domain mutant. The dataset contains 1,400,000 snapshots, which were extracted from an enhanced sampling simulation and are widely distributed in conformational space. The tests show that the program is efficient. In particular, with two NVIDIA RTX4090 GPUs and the single-precision data type, the clustering calculation on 1 million snapshots completes in a few seconds (including the time to upload the data from host memory to the GPU, but excluding the time to read it from hard disk). This is hundreds of times faster than on a central processing unit (CPU). The program could be a powerful tool for quickly extracting the representative states of a molecule from thousands to millions of candidate structures. In this paper, we implement a GPU-accelerated molecular structure clustering module in FSATOOL (https://github.com/fsatool/fsatool.github.io). It supports the K-medoids method, the best-fit RMSD metric, multi-GPU platforms, and single- or double-precision data types. Clustering quality is measured by the Davies-Bouldin index (DBI) and the Residue-Similarity index (RSI). Performance tests show that the program can complete the clustering calculation for one million snapshots of a 33-residue protein in a few seconds, hundreds of times faster than on a CPU. The program can be used as a high-throughput analysis tool for long molecular dynamics simulation trajectories.
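The two ingredients named in the summary, a best-fit RMSD metric and K-medoids clustering, can be illustrated with a small CPU sketch in Python with NumPy. This is not the FSATOOL implementation. The function names, the use of the Kabsch superposition for the best fit, the alternating assign/update variant of K-medoids, and the optional `init` parameter are all assumptions made for illustration. The sketch also builds a full pairwise distance matrix, which is only practical for toy data and would not scale to a million snapshots.

```python
import numpy as np

def best_fit_rmsd(a, b):
    """RMSD of two (n_atoms, 3) arrays after optimal superposition (Kabsch, assumed here)."""
    a = a - a.mean(axis=0)            # remove translation
    b = b - b.mean(axis=0)
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    u, _, vt = np.linalg.svd(a.T @ b)
    # If the optimal orthogonal transform would be a reflection, flip the
    # smallest singular value so that only proper rotations are allowed.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return np.sqrt(max(msd, 0.0))     # clamp tiny negative round-off

def pairwise_rmsd(frames):
    """Full symmetric distance matrix; feasible only for small toy datasets."""
    n = len(frames)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = best_fit_rmsd(frames[i], frames[j])
    return dist

def k_medoids(dist, k, n_iter=50, seed=0, init=None):
    """Alternating K-medoids on a precomputed distance matrix (illustrative variant)."""
    n = dist.shape[0]
    if init is not None:
        medoids = np.array(init)
    else:
        medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each snapshot joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue              # keep the old medoid for an empty cluster
            # Update step: the member with the smallest summed distance to the others.
            cost = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(cost)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data: five noisy copies each of a straight chain and a helix-like chain.
rng = np.random.default_rng(1)
line = np.array([[i, 0.0, 0.0] for i in range(10)])
helix = np.array([[np.cos(i), np.sin(i), 0.3 * i] for i in range(10)])
frames = [s + rng.normal(scale=0.01, size=s.shape) for s in [line] * 5 + [helix] * 5]
medoids, labels = k_medoids(pairwise_rmsd(frames), k=2, init=[0, 9])
print(medoids, labels)
```

Each medoid is an actual snapshot, so it can serve directly as the representative structure of its cluster. A GPU implementation would presumably avoid the full pairwise matrix and compute only snapshot-to-medoid distances in parallel, but that is a guess about the design and not a statement from the record.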
ISSN:0192-8651
1096-987X
DOI:10.1002/jcc.27470