Application of Random Forest and Multiple Linear Regression Techniques to QSPR Prediction of an Aqueous Solubility for Military Compounds
Published in: | Molecular Informatics 2010-05, Vol. 29 (5), p. 394-406 |
---|---|
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | The relationship between the aqueous solubility of more than two thousand eight hundred organic compounds and their structures was investigated using a QSPR approach based on the Simplex Representation of Molecular Structure (SiRMS). The training set consists of 2537 diverse organic compounds. Multiple Linear Regression (MLR) and Random Forest (RF) methods were used for statistical modeling at the 2D level of representation of molecular structure. The statistical characteristics of the best models are good (MLR method: R² = 0.85, Q² = 0.83; RF method: R² = 0.99, R²(oob) = 0.88). An external validation set of 301 compounds (including 47 nitro‐, nitroso‐ and nitrogen‐rich compounds of military interest), which were not included in the training set or the modeling process, was used to evaluate the models' predictivity. Well‐fitted and robust models (R²(test) = 0.76 for MLR and 0.82 for RF) were thus obtained with both statistical techniques using descriptors based on topological structural information only. The predicted solubility values for the military compounds are in good agreement with the experimental ones. The developed QSPR models represent a powerful and easy‐to‐use virtual screening tool that can be recommended for the prediction of aqueous solubility. |
---|---|
ISSN: | 1868-1743 1868-1751 |
DOI: | 10.1002/minf.201000001 |
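
The abstract reports a fitted R2 and a cross-validated Q2 for the MLR model. As a minimal sketch of how these two statistics differ, the Python below computes both for a one-descriptor least-squares fit. It assumes Q2 is a leave-one-out estimate, which the record does not state. The data values and the names `descriptor` and `log_s` are made up for illustration and are not taken from the paper. The Random Forest out-of-bag estimate is not covered here.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a single descriptor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def q_squared_loo(x, y):
    """Leave-one-out Q2: each point is predicted by a model fitted without it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return r_squared(y, preds)


# Hypothetical toy data: one structural descriptor and a solubility value.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
log_s = [-0.4, -1.1, -1.4, -2.2, -2.4, -3.1]

a, b = fit_line(descriptor, log_s)
r2 = r_squared(log_s, [a + b * xi for xi in descriptor])
q2 = q_squared_loo(descriptor, log_s)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

For an ordinary least-squares fit, the leave-one-out Q2 cannot exceed the fitted R2, which matches the ordering of the MLR values quoted in the abstract (0.83 versus 0.85).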