Tree and Hashing Data Structures to Speed up Chemical Searches: Analysis and Experiments
Published in: | Molecular Informatics, 2011-09, Vol. 30 (9), pp. 791-800 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | In many large chemoinformatics database systems, molecules are represented by long binary fingerprint vectors whose components record the presence or absence of particular functional groups or combinatorial features. For a given query molecule, one is interested in retrieving all the molecules in the database with a similarity to the query above a certain threshold. Here we describe a method for speeding up chemical searches in these large databases of small molecules by combining previously developed tree and hashing data structures to prune the search space without any false negatives. More importantly, we provide a mathematical analysis that allows one to predict the level of pruning, and validate the quality of the predictions of the method through simulation experiments. |
---|---|
ISSN: | 1868-1743 1868-1751 |
DOI: | 10.1002/minf.201100089 |
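
The abstract does not say which similarity measure or which tree and hashing structures the article combines, so the following is only an illustrative sketch of threshold search with pruning and no false negatives, not the article's method. It assumes Tanimoto similarity and a plain table keyed by the number of set bits, and relies on the standard bound that two fingerprints with `a` and `b` set bits cannot have a Tanimoto similarity above `min(a, b) / max(a, b)`. Fingerprints are modelled as Python integers, and all function names are hypothetical.

```python
from collections import defaultdict
from math import ceil, floor


def popcount(fp: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: |A and B| / |A or B| (defined as 1.0 if both are empty)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def build_index(fingerprints):
    """Assumed structure: bucket fingerprints by their bit count."""
    index = defaultdict(list)
    for fp in fingerprints:
        index[popcount(fp)].append(fp)
    return index


def search(index, query: int, threshold: float):
    """Return (hits, n_compared) for Tanimoto similarity >= threshold.

    Buckets whose bit count b lies outside [a * t, a / t] are skipped,
    because min(a, b) / max(a, b) < t there, so no hit can be lost.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    a = popcount(query)
    eps = 1e-9  # widen the window slightly so rounding never drops a bucket
    lo = ceil(a * threshold - eps)
    hi = floor(a / threshold + eps)
    hits, compared = [], 0
    for count, bucket in index.items():
        if lo <= count <= hi:
            for fp in bucket:
                compared += 1
                if tanimoto(query, fp) >= threshold:
                    hits.append(fp)
    return hits, compared
```

Calling `search(build_index(db), query, 0.7)` returns the same hits as a linear scan of `db`, while the second return value reports how many fingerprints were actually compared, which is the kind of pruning level the abstract says the analysis predicts.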