Improving the prediction of organism-level toxicity through integration of chemical, protein target and cytotoxicity qHTS data†
†Electronic supplementary information (ESI) available. See DOI: 10.1039/c5tx00406c
Published in: | Toxicology research (Cambridge) 2016-03, Vol.5 (3), p.883-894 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Using three descriptor domains – encoding complementary bioactivity data – enhances the predictive power, applicability, and interpretability of rat acute-toxicity classifiers.
Prediction of compound toxicity is essential because covering the vast chemical space requiring safety assessment using traditional experimentally based, resource-intensive techniques is impossible. However, such prediction is nontrivial due to the complex causal relationship between compound structure and in vivo harm. Protein target annotations and in vitro experimental outcomes encode relevant bioactivity information complementary to chemicals’ structures. This work tests the hypothesis that utilizing three complementary types of data will afford predictive models that outperform traditional models built using fewer data types. A tripartite, heterogeneous descriptor set for 367 compounds comprised (a) chemical descriptors, (b) protein target descriptors generated using an algorithm trained on 190 000 ligand–protein interactions from ChEMBL, and (c) descriptors derived from in vitro cell cytotoxicity dose–response data from a panel of human cell lines. One hundred random forest classification models for predicting rat LD50 were built using every combination of descriptors. Successive integration of data types improved predictive performance; models built using the full dataset had an average external correct classification rate of 0.82, compared to 0.73–0.80 for models built using two data types and 0.67–0.78 for models built using one. Pairwise comparisons of models trained on the same data showed that including a third data domain on top of chemistry improved average correct classification rate by 1.4–2.4 points, with p-values |
---|---|
ISSN: | 2045-452X 2045-4538 |
DOI: | 10.1039/c5tx00406c |
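
The abstract describes building models on every combination of the three descriptor domains and comparing them by correct classification rate. The following is a minimal sketch of that bookkeeping, not the authors' implementation. The domain labels, the function names, and the definition of correct classification rate as the mean of per-class recall are all assumptions made for illustration; the record does not define the metric. Model training itself (for example with a random forest classifier from a library such as scikit-learn) is left out so the sketch runs with the standard library alone.

```python
from itertools import combinations

# Illustrative labels for the three descriptor domains named in the abstract.
DOMAINS = ("chemical", "protein_target", "cytotoxicity")


def descriptor_combinations(domains=DOMAINS):
    """Return every non-empty combination of descriptor domains."""
    return [
        combo
        for size in range(1, len(domains) + 1)
        for combo in combinations(domains, size)
    ]


def concatenate_blocks(blocks, combo):
    """Join the per-compound descriptor rows of the chosen domains side by side.

    `blocks` maps a domain name to a list of rows, one row per compound,
    with compounds in the same order in every domain.
    """
    n_compounds = len(next(iter(blocks.values())))
    return [
        sum((list(blocks[domain][i]) for domain in combo), [])
        for i in range(n_compounds)
    ]


def correct_classification_rate(y_true, y_pred):
    """Mean of per-class recall for binary labels (0 = nontoxic, 1 = toxic).

    Assumption: the metric is taken here as balanced accuracy.
    """
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"class {cls} absent from y_true")
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

With three domains the enumeration yields seven descriptor sets: three single-domain, three two-domain, and one full set, which matches the one-type, two-type, and full-dataset groups the abstract compares.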