The Effect of Resampling on Data‐imbalanced Conditions for Prediction towards Nuclear Receptor Profiling Using Deep Learning

Bibliographic details
Published in: Molecular informatics 2020-08, Vol.39 (8), p.n/a
Main authors: Lee, Yong Oh; Kim, Young Jun
Format: Article
Language: English
Description
Summary: In toxicity evaluation based on the nuclear receptor signalling pathway, in silico prediction tools are used to detect the early stages of long‐term toxicities, to prioritize newly synthesized chemicals, and to attain selectivity and sensitivity. Computational prediction models are a promising tool for toxicity screening of chemical‐protein interactions, as deep learning has improved prediction accuracy. However, data‐imbalanced conditions, in which the toxic chemical compound dataset is much smaller than the nontoxic dataset, result in low prediction accuracy on the toxic dataset, which is the one that provides valid information about toxicity hazard. In this paper, we examine the effect of data imbalance in the toxicity assessment data of the nuclear receptors AR (LBD), ER (LBD), AhR, and PPAR, and identify a severe imbalance between the prediction of the toxic and nontoxic datasets. Because assessing toxicity hazards requires balanced selectivity and sensitivity, we investigate data resampling methods to reduce the bias problem in binary classification for toxicity hazard profiling of nuclear receptors. With simple resampling methods, the experiments achieved a sensitivity of 0.714 and a specificity of 0.787, with an overall accuracy of 0.829 and a ROC‐AUC of 0.822.
ISSN: 1868-1743, 1868-1751
DOI: 10.1002/minf.201900131
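
The summary mentions "simple resampling methods" and reports sensitivity and specificity, but the record does not say which methods were used. The sketch below is only an illustration under assumptions: it uses random oversampling of the minority (toxic) class as one common simple resampling technique, and the function names `random_oversample` and `sensitivity_specificity` and the toy data are hypothetical, not taken from the paper.

```python
import random


def random_oversample(features, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Assumption: binary labels, with `minority_label` marking the rarer (toxic) class.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Nothing to do if there is no minority sample or the data is already balanced.
    if not minority or len(minority) >= len(majority):
        return list(features), list(labels)
    # Draw minority indices with replacement to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [features[i] for i in keep], [labels[i] for i in keep]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): five nontoxic samples (0) and one toxic sample (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)

# Metrics on a small hand-made prediction vector.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Resampling of this kind should be applied to the training split only, so that the reported sensitivity and specificity are measured on data with the original class distribution.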