Fast screening of large databases using clustering and PCA based on structure fragments

Bibliographic details
Published in: Journal of Chemometrics, 1996-09, Vol. 10 (5-6), p. 385-398
Main authors: Nouwen, Johan; Lindgren, Fredrik; Hansen, Bjorn; Karcher, Walter; Verhaar, Henk J. M.; Hermens, Joop L. M.
Format: Article
Language: English
Description
Abstract: Jarvis-Patrick clustering based on structural fragments, with the Tanimoto coefficient as the similarity measure, provides a fast tool for classifying large numbers of chemicals. This clustering technique was applied to chemicals in relation to their acute fish toxicity (LC50). Correlation analysis with log LC50 as the response variable and log Kow as the predictor variable resulted in good models for several clusters. Benzylic chemicals were not recognized as separate clusters. Including them in the training set resulted in models without any predictive capability. They were rejected on statistical and chemical criteria, which improved the final model substantially. The toxicological response of phenols and some organophosphates was found to fit well into one model. The clustering resulted in smaller groupings than those listed by Verhaar et al., but the two classifications disagreed for only a minority of chemicals. PCA allowed a quick visual inspection of the application limits of the models for the HPVCs and the EINECS. The models performed well for the HPVCs but could only be used to estimate a fraction of the EINECS. PCA showed that in some cases subclusters were present. © 1996 by John Wiley & Sons, Ltd.
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/(SICI)1099-128X(199609)10:5/6<385::AID-CEM439>3.0.CO;2-5
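
Note on the method: the abstract names Jarvis-Patrick clustering with the Tanimoto coefficient but does not give implementation details. The following is a minimal Python sketch of one common variant, not the authors' implementation. It assumes each chemical is represented as a set of fragment labels, that a chemical is not counted in its own neighbour list, and that two chemicals are merged when each appears in the other's k-nearest-neighbour list and the two lists share at least `k_min` members. The function names, the parameters `k` and `k_min`, and the toy fragment sets are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fragment labels."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def jarvis_patrick(fragments, k=3, k_min=2):
    """Cluster items with a common Jarvis-Patrick variant (assumed, see text).

    fragments: dict mapping item name -> set of fragment labels.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    names = sorted(fragments)

    # k nearest neighbours by descending similarity; ties broken by name.
    neighbours = {}
    for name in names:
        others = [n for n in names if n != name]
        others.sort(key=lambda n: (-tanimoto(fragments[name], fragments[n]), n))
        neighbours[name] = set(others[:k])

    # Union-find structure used to merge items that satisfy the criterion.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        mutual = a in neighbours[b] and b in neighbours[a]
        shared = len(neighbours[a] & neighbours[b])
        if mutual and shared >= k_min:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Toy fragment sets: illustrative labels only, not the fragments used in the paper.
toy = {
    "phenol": {"aromatic_ring", "hydroxyl"},
    "cresol": {"aromatic_ring", "hydroxyl", "methyl"},
    "chlorophenol": {"aromatic_ring", "hydroxyl", "chloro"},
    "methyl_acetate": {"ester", "methyl"},
    "ethyl_acetate": {"ester", "methyl", "ethyl"},
    "propyl_acetate": {"ester", "methyl", "propyl"},
}

clusters = jarvis_patrick(toy, k=2, k_min=1)
for cluster in clusters:
    print(cluster)
```

With these toy inputs the three phenol-like entries and the three ester-like entries fall into two separate clusters. Because the merge test uses only neighbour lists rather than a global distance threshold, the method scales to large inventories once the neighbour lists are computed, which is consistent with the "fast screening" use described in the abstract.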