In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods
Published in: | Chemosphere (Oxford) 2011-03, Vol.82 (11), p.1636-1643 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Robust classification models with high predictive accuracy for Tetrahymena pyriformis toxicity were developed using substructure pattern recognition and several machine learning methods, providing a useful strategy for evaluating the toxicological properties of industrial chemicals in environmental hazard assessment.
► A total of 1571 diverse, unique chemicals were collected from the literature, forming the largest diverse data set for Tetrahymena pyriformis toxicity. ► Robust classification models with high predictive accuracy for T. pyriformis toxicity were developed using substructure pattern recognition and several machine learning methods. ► Several useful substructure patterns for characterizing T. pyriformis toxicity were also identified via information gain analysis.
There is an increasing need for rapid safety assessment of chemicals by both industry and regulatory agencies throughout the world. In silico techniques are practical alternatives in environmental hazard assessment, especially for addressing the persistence, bioaccumulation and toxicity potential of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxicity endpoint. In this study, 1571 diverse, unique chemicals were collected from the literature, forming the largest diverse data set for T. pyriformis toxicity. Classification models of T. pyriformis toxicity were developed using substructure pattern recognition and several machine learning methods: support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of 5-fold cross-validation showed that the SVM method performed better than the other algorithms. The overall predictive accuracy of the SVM classification model with a radial basis function kernel was 92.2% for the 5-fold cross-validation and 92.6% for the external validation set. Furthermore, several representative substructure patterns for characterizing T. pyriformis toxicity were identified via information gain analysis. |
---|---|
ISSN: | 0045-6535, 1879-1298 |
DOI: | 10.1016/j.chemosphere.2010.11.043 |
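
The abstract mentions ranking substructure patterns by information gain. As a rough illustration of that idea, the sketch below computes the textbook Shannon-entropy information gain of binary substructure bits with respect to a toxic/non-toxic label. It is not the authors' procedure: the record does not say which fingerprints or software were used, and the `fingerprints` and `toxic` toy data, as well as the function names, are hypothetical.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Entropy reduction in `labels` after splitting on a binary (0/1) feature."""
    n = len(labels)
    present = [y for x, y in zip(feature, labels) if x == 1]
    absent = [y for x, y in zip(feature, labels) if x == 0]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy data (assumption, not from the study):
# each row is one chemical, each column one substructure bit (1 = present).
fingerprints = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
toxic = [1, 1, 1, 0, 0, 0]  # 1 = toxic, 0 = non-toxic

n_bits = len(fingerprints[0])
gains = [
    information_gain([row[j] for row in fingerprints], toxic)
    for j in range(n_bits)
]
# Substructure indices ordered from most to least informative.
ranking = sorted(range(n_bits), key=lambda j: gains[j], reverse=True)
```

In this toy set the first bit separates the two classes perfectly, so it receives the maximum gain of 1 bit and heads `ranking`; the other two bits carry almost no information about the label. On real data the same ranking would be applied to the columns of a substructure fingerprint matrix.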