QSAR modelling study of the bioconcentration factor and toxicity of organic compounds to aquatic organisms using machine learning and ensemble methods
Published in: | Ecotoxicology and Environmental Safety 2019-09, Vol.179, p.71-78 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Bioconcentration factors and median lethal concentrations (LC50s) are important when assessing the risks that organic pollutants pose to aquatic ecosystems. Various quantitative structure–activity relationship (QSAR) models have been developed to predict bioconcentration factors and classify acute toxicity. In this study, we developed a regression model using the recursive feature elimination (RFE) method combined with the support vector machine (SVM) algorithm, based on 2D molecular descriptors calculated for a dataset of 450 diverse chemicals. For the classification models, we calculated 12 molecular fingerprints for a dataset of 400 diverse chemicals and built three ensemble models using three machine learning algorithms. The regression model had an R² of 0.860 and a predictive R² (R²pred) of 0.757. Other parameters indicated that the regression model made good predictions and could efficiently predict a new set of compounds according to the standards set by Golbraikh, Tropsha, and Roy. Among the classification models, the ensemble-SVM model gave an overall accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of 92.2%, 95.1%, 86.0%, and 0.965, respectively, in five-fold cross-validation and of 87.3%, 92.6%, 76.0%, and 0.940, respectively, in external validation. These parameters indicated that our ensemble-SVM model was more stable and gave more accurate predictions than previous models. The model could therefore be used to predict aquatic toxicity and to assess risks posed to aquatic ecosystems. From the predictions made by the two types of models, we identified several structures most relevant to acute aquatic toxicity; this information may be useful for aquatic toxicology experiments and aquatic system management.
• Ensemble methods could effectively improve the classification model of acute aquatic toxicity.
• Analysis of multiple parameters would make a more comprehensive contribution to the assessment of aquatic systems.
• The important structures identified by the model could be considered in aquatic toxicology experiments and risk assessment.
• The RFE-SVM model and ensemble-SVM model could be used as effective tools for assessing risks posed to aquatic ecosystems. |
---|---|
ISSN: | 0147-6513 1090-2414 |
DOI: | 10.1016/j.ecoenv.2019.04.035 |
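
The summary reports accuracy, sensitivity, specificity, and area under the ROC curve for the classification models. The sketch below shows how these four figures are conventionally computed from binary labels and prediction scores. It is not the authors' code: the function names `classification_metrics` and `roc_auc`, the label coding (1 = toxic, 0 = non-toxic), the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity and specificity, each in percent.

    Assumed label coding: 1 = toxic (positive), 0 = non-toxic (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, predicted toxic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-toxic, predicted non-toxic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-toxic, predicted toxic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, predicted non-toxic
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),  # true-positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true-negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the positive
    compound receives the higher score; ties count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from the study).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

In a five-fold cross-validation, metrics like these would typically be computed on each held-out fold and then aggregated; how the study aggregated them is not stated in the summary.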