Using Inverted Indices for Accelerating LINGO Calculations

Bibliographic details
Published in:	Journal of Chemical Information and Modeling, 2011-03, Vol. 51 (3), p. 597-600
Main authors:	Kristensen, Thomas G.; Nielsen, Jesper; Pedersen, Christian N. S.
Format:	Article
Language:	English
Subjects:	Algorithms; Chemicals; Chemistry; Computational Biology; Computational Chemistry; Databases, Chemical; Exact sciences and technology; Fingerprinting; General and physical chemistry; General. Nomenclature, chemical documentation, computer chemistry; Matrix; Molecules; Theory of reactions, general kinetics. Catalysis. Nomenclature, chemical documentation, computer chemistry
Online access:	Full text

Description
Abstract:	The ever-growing size of chemical databases calls for the development of novel methods for representing and comparing molecules. One such method, called LINGO, is based on fragmenting the SMILES string representation of molecules. Molecules can then be compared by calculating the Tanimoto coefficient, which is called LINGOsim when applied to LINGO multisets. This paper introduces a verbose representation for storing LINGO multisets, which makes it possible to transform them into sparse fingerprints so that fingerprint data structures and algorithms can be used to accelerate queries. The previous best method for rapidly calculating the LINGOsim similarity matrix required specialized hardware to yield a significant speedup over existing methods. By representing LINGO multisets in the verbose representation and using inverted indices, it is possible to calculate LINGOsim similarity matrices roughly 2.6 times faster than existing methods without relying on specialized hardware.
ISSN:	1549-9596 1549-960X
DOI:	10.1021/ci100437e
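
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the record itself does not state: LINGOs are taken to be overlapping substrings of length 4 (the usual choice in the LINGO literature), any SMILES preprocessing is omitted, and the "verbose representation" is read as numbering repeated LINGOs so that a multiset becomes an ordinary set. All function names (`lingos`, `verbose`, `lingosim`, `build_index`, `query`) are hypothetical.

```python
from collections import Counter, defaultdict

Q = 4  # assumed LINGO length; not stated in the record


def lingos(smiles, q=Q):
    """Multiset of all overlapping length-q substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def verbose(multiset):
    """Assumed reading of the 'verbose representation': the k-th occurrence
    of a LINGO becomes the distinct set element (lingo, k)."""
    return {(lingo, k) for lingo, n in multiset.items() for k in range(1, n + 1)}


def lingosim(a, b):
    """Tanimoto coefficient on multisets: sum of min counts / sum of max counts."""
    inter = sum((a & b).values())  # Counter & keeps the minimum count
    union = sum((a | b).values())  # Counter | keeps the maximum count
    return inter / union if union else 0.0


def build_index(fingerprints):
    """Inverted index: feature -> list of ids of molecules containing it."""
    index = defaultdict(list)
    for mol_id, fp in enumerate(fingerprints):
        for feature in fp:
            index[feature].append(mol_id)
    return index


def query(index, sizes, fp):
    """Tanimoto score against every indexed molecule sharing >= 1 feature."""
    shared = Counter()
    for feature in fp:
        for mol_id in index.get(feature, ()):
            shared[mol_id] += 1
    # |A ∩ B| / (|A| + |B| - |A ∩ B|); molecules with no shared feature
    # never appear in `shared` and implicitly score 0.
    return {m: c / (sizes[m] + len(fp) - c) for m, c in shared.items()}


smiles = ["CCCCO", "CCCCN", "c1ccccc1"]
fps = [verbose(lingos(s)) for s in smiles]
index = build_index(fps)
sizes = [len(fp) for fp in fps]
scores = query(index, sizes, fps[0])
```

Because the set intersection of two verbose fingerprints has the same size as the multiset intersection, `query` returns the same values as `lingosim` while only visiting molecules that share at least one feature with the query. That is where an inverted index saves work on sparse fingerprints.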