Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search
Main Authors: | , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Abstract: | Molecular similarity search is widely used in drug discovery to
rapidly identify structurally similar compounds in large molecular databases.
With the increasing size of chemical libraries, there is growing interest in
the efficient acceleration of large-scale similarity search. Existing work
mainly focuses on CPUs and GPUs to accelerate the computation of the Tanimoto
coefficient, which measures the pairwise similarity between molecular
fingerprints. In this paper, we propose and optimize an FPGA-based accelerator
design for exhaustive and approximate search algorithms. For exhaustive search
using BitBound & folding, we analyze how the similarity cutoff and folding
level affect search speedup and accuracy, and propose a scalable
on-the-fly query engine on FPGAs to reduce the resource utilization and
pipeline interval. We achieve a processing throughput of 450 million compounds
per second for a single query engine. For approximate search using hierarchical
navigable small world (HNSW), a popular algorithm with high recall and query
speed, we propose an FPGA-based graph traversal engine that uses a
high-throughput register-array-based priority queue and a fine-grained distance
calculation engine to increase the processing capability. Experimental results
show that the proposed FPGA-based HNSW implementation achieves 103,385 queries
per second (QPS) on the ChEMBL database with 0.92 recall, a 35x speedup over
the existing CPU implementation on average. To the best of our knowledge,
our FPGA-based implementation is the first attempt to accelerate molecular
similarity search algorithms on FPGAs and has the highest performance among
existing approaches. |
---|---|
DOI: | 10.48550/arxiv.2109.06355 |
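
The exhaustive search described in the abstract combines three ideas: the Tanimoto coefficient, BitBound pruning, and fingerprint folding. The following is a minimal software-only sketch of those ideas, not the paper's FPGA design. It assumes fingerprints are stored as Python integers used as bit vectors, and all function names (`tanimoto`, `fold`, `bitbound_search`) are hypothetical. The pruning step relies on the standard bound that the Tanimoto coefficient of two fingerprints cannot exceed the ratio of the smaller to the larger bit count.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |a AND b| / |a OR b| of two bit-vector fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(a & b) / union


def fold(fp: int, n_bits: int, level: int) -> int:
    """Fold a fingerprint `level` times by OR-ing its upper half onto its lower half.

    Each fold halves the length, trading accuracy for less work per comparison.
    """
    for _ in range(level):
        n_bits //= 2
        fp = (fp >> n_bits) | (fp & ((1 << n_bits) - 1))
    return fp


def bitbound_search(query: int, database: list[int], cutoff: float) -> list[tuple[int, float]]:
    """Return (index, similarity) for every entry with similarity >= cutoff.

    BitBound pruning: since T(a, b) <= min(|a|, |b|) / max(|a|, |b|), only
    fingerprints whose bit count lies in [cutoff * |q|, |q| / cutoff] can pass.
    """
    q_count = popcount(query)
    hits = []
    for index, fp in enumerate(database):
        count = popcount(fp)
        if count < cutoff * q_count or count * cutoff > q_count:
            continue  # cannot reach the cutoff, so skip the full comparison
        similarity = tanimoto(query, fp)
        if similarity >= cutoff:
            hits.append((index, similarity))
    return hits


if __name__ == "__main__":
    query = 0b11110000
    database = [0b11110000, 0b11100000, 0b00001111, 0b10000000]
    print(bitbound_search(query, database, cutoff=0.7))
```

This sketch scans the whole list and filters by bit count. A practical implementation would typically pre-sort the database by bit count so that the admissible range is a contiguous slice.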