Predicting the Genotoxicity of Thiophene Derivatives from Molecular Structure

Bibliographic details
Published in: Chemical Research in Toxicology, 2003-06, Vol. 16 (6), p. 721-732
Main authors: Mosier, Philip D; Jurs, Peter C; Custer, Laura L; Durham, Stephen K; Pearl, Greg M
Format: Article
Language: English
Description
Abstract: We report several binary classification models that directly link the genetic toxicity of a series of 140 thiophene derivatives with information derived from the compounds' molecular structure. Genetic toxicity was measured using an SOS Chromotest. IMAX (maximal SOS induction factor) values were recorded for each of the 140 compounds both in the presence and in the absence of S9 rat liver homogenate. Compounds were classified as genotoxic if IMAX ≥ 1.5 in either test or nongenotoxic if IMAX < 1.5 for both tests. The molecular structures were represented by numerical descriptors that encoded the topological, geometric, electronic, and polar surface area properties of the thiophene derivatives. The classification models used were linear discriminant analysis (LDA), k-nearest neighbor classification (k-NN), and the probabilistic neural network (PNN). These were used in conjunction with either a genetic algorithm or generalized simulated annealing to find optimal subsets of descriptors for each classifier. The quality of the resulting models was determined by the number of misclassified compounds, with preference given to models that produced fewer false negative classifications. Model sizes ranged from seven descriptors for LDA to three descriptors for k-NN and PNN. Very good classification results were obtained with all three classifiers. Classification rates for the LDA, k-NN, and PNN models were 80, 85, and 85%, respectively, for the prediction set compounds. Additionally, a consensus model was generated that incorporated all three of the basic model types. This consensus model correctly predicted the genotoxicity of 95% of the prediction set compounds.
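The labeling rule stated in the abstract can be sketched in a few lines of Python. The 1.5 threshold and the "either test" condition come from the abstract. The function names are hypothetical, and the majority-vote combiner is only an assumption for illustration: the abstract does not say how the three classifiers were combined into the consensus model.

```python
IMAX_THRESHOLD = 1.5  # cutoff stated in the abstract


def label_genotoxic(imax_with_s9: float, imax_without_s9: float) -> bool:
    """Return True (genotoxic) if IMAX >= 1.5 in either test.

    A compound is nongenotoxic only if IMAX < 1.5 both with and
    without S9 rat liver homogenate.
    """
    return imax_with_s9 >= IMAX_THRESHOLD or imax_without_s9 >= IMAX_THRESHOLD


def majority_vote(predictions: list[bool]) -> bool:
    """ASSUMPTION: combine classifier outputs by simple majority.

    This is an illustrative stand-in, not the consensus scheme of the
    paper, which the abstract does not describe.
    """
    return 2 * sum(predictions) > len(predictions)


# Hypothetical IMAX values, not data from the study.
print(label_genotoxic(imax_with_s9=1.8, imax_without_s9=1.1))
# Hypothetical outputs of three classifiers (e.g. LDA, k-NN, PNN).
print(majority_vote([True, False, True]))
```

With three voters a majority always exists, so no tie-breaking rule is needed in this sketch.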
ISSN: 0893-228X, 1520-5010
DOI:10.1021/tx020104i