Nonlinear Multivariate Regression Outperforms Several Concisely Designed Neural Networks on Three QSPR Data Sets

Bibliographic details
Published in: Journal of Chemical Information and Computer Sciences 2000-03, Vol. 40 (2), p. 403-413
Main authors: Lucic, B, Amic, D, Trinajstic, N
Format: Article
Language: English
Online access: Full text
Description
Abstract: Neural networks (NNs) are accepted as the most powerful nonlinear technique in QSAR and QSPR modeling. However, the NN models are often very robust, containing a large number of parameters optimized during the training procedure. We have recently found (J. Chem. Inf. Comput. Sci. 1999, 39, 121−132) that the simpler nonlinear multiregression (MR) models are significantly better than the robust NNs, according to the same statistical parameters. In the present paper we investigated whether the nonlinear MR models are also better than the concisely designed NN models. Nonlinear MR models were generated in the following way. First, nonlinear terms, the 2-fold and 3-fold cross-products of initial descriptors, were calculated and added to initial descriptors. Then, the combination of two powerful techniques for descriptor selection (CROMRsel for “the best” selection and CROMRiisel for approximative, “i by i” stepwise selection) were used to detect the most important descriptors in MR models. For boiling points (BPs) of 150 alkanes the 20-descriptor MR model produced the cross-validated (CV) standard error of 2.88 K, and the best NN model (with 70−80 adjusted weights) had 3.60 K. Prediction of BPs of 50 compounds using the 17-descriptor MR model (obtained on 100 compounds) gave the standard error of 3.58 K. In the case of modeling of 243 chemical shifts CV standard errors were (in ppm) 0.89 and 1.19 with 15- and 9-descriptor MR models, respectively. The best NN models adjusted 60−90 weights and achieved 1.42 ppm. The standard error in predicting the 83 chemical shifts using the 10-descriptor MR model obtained on 160 samples was 1.25 ppm. It is also shown in this data set that the model quality depends on the scaling procedure used for transformation of the initial descriptors. In modeling the sublimation enthalpy the CV correlation coefficient was 0.97 using the best 4-descriptor MR model versus 0.93 obtained using NN with ∼50 adjusted weights. The CV correlation coefficient in predicting the sublimation enthalpies for 21 compounds using the 4-descriptor MR model was 0.98. This is, to our knowledge, the first unambiguous result which shows a way for obtaining nonlinear MR models having better fitted, cross-validated, and predictive performances than the corresponding NN models. Moreover, the nonlinear MR models are significantly simpler than the NN models, which allows one to establish the functional relationships between the modeled property/activity and descript
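
The model-building procedure summarized in the abstract (expand the descriptors with 2-fold and 3-fold cross-products, select a small subset, judge it by a cross-validated standard error) can be illustrated with a minimal Python sketch. This is not the authors' CROMRsel or CROMRiisel code: the function names are hypothetical, a plain greedy forward selection stands in for their selection algorithms, products with repeated descriptors (such as squares) are included as an assumption, and the CV standard error is assumed to be the root mean square of leave-one-out residuals.

```python
from itertools import combinations_with_replacement

import numpy as np


def add_cross_products(X, max_order=3):
    """Return X extended with all 2-fold .. max_order-fold descriptor products.

    `terms[k]` lists the original column indices multiplied to form column k.
    Assumption: repeated indices (e.g. x0*x0) are allowed.
    """
    p = X.shape[1]
    cols = [X[:, j] for j in range(p)]
    terms = [(j,) for j in range(p)]
    for order in range(2, max_order + 1):
        for idx in combinations_with_replacement(range(p), order):
            cols.append(np.prod(X[:, list(idx)], axis=1))
            terms.append(idx)
    return np.column_stack(cols), terms


def loo_cv_standard_error(X, y):
    """Leave-one-out CV error of a linear least-squares fit with intercept.

    Uses the identity e_loo_i = e_i / (1 - h_ii), where h_ii are the
    leverages, so the model is fitted only once.
    Assumption: error = sqrt(mean(e_loo**2)).
    """
    A = np.column_stack([np.ones(len(y)), X])
    Q, _ = np.linalg.qr(A)            # reduced QR of the design matrix
    leverage = np.sum(Q**2, axis=1)   # diagonal of the hat matrix
    residual = y - Q @ (Q.T @ y)
    loo_residual = residual / (1.0 - leverage)
    return float(np.sqrt(np.mean(loo_residual**2)))


def forward_select(X, y, n_select):
    """Greedy stepwise selection: add the column that lowers the CV error most.

    A simple stand-in for the selection step, not the CROMRsel algorithm.
    """
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(
            remaining,
            key=lambda j: loo_cv_standard_error(X[:, chosen + [j]], y),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Synthetic data only; none of the paper's data sets are used here.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 3))
    y = 2.0 * X[:, 0] * X[:, 1] + X[:, 2] + 0.01 * rng.standard_normal(60)
    X_ext, terms = add_cross_products(X)
    picked = forward_select(X_ext, y, n_select=2)
    print([terms[j] for j in picked], loo_cv_standard_error(X_ext[:, picked], y))
```

Because the selected model remains linear in its (possibly nonlinear) terms, each chosen term can be read directly as a product of original descriptors, which is the kind of interpretability the abstract contrasts with NN weights.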
ISSN: 0095-2338, 1549-960X
DOI: 10.1021/ci990061k