Tree and Hashing Data Structures to Speed up Chemical Searches: Analysis and Experiments

In many large chemoinformatics database systems, molecules are represented by long binary fingerprint vectors whose components record the presence or absence of particular functional groups or combinatorial features. For a given query molecule, one is interested in retrieving all the molecules in th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Molecular informatics 2011-09, Vol.30 (9), p.791-800
Hauptverfasser: Nasr, Ramzi, Kristensen, Thomas, Baldi, Pierre
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In many large chemoinformatics database systems, molecules are represented by long binary fingerprint vectors whose components record the presence or absence of particular functional groups or combinatorial features. For a given query molecule, one is interested in retrieving all the molecules in the database with a similarity to the query above a certain threshold. Here we describe a method for speeding up chemical searches in these large databases of small molecules by combining previously developed tree and hashing data structures to prune the search space without any false negatives. More importantly, we provide a mathematical analysis that allows one to predict the level of pruning, and validate the quality of the predictions of the method through simulation experiments.
ISSN:1868-1743
1868-1751
DOI:10.1002/minf.201100089