An improvement on the prediction power of the 3D-QSAR CoMFA models using a hybrid of statistical and machine learning methods: a case study on γ‑secretase modulators of Alzheimer’s disease
Published in: | Medicinal chemistry research 2017-06, Vol.26 (6), p.1184-1200 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | A comparative molecular field analysis has been developed to study the three-dimensional quantitative structure–activity relationship of a series of triterpene-based γ-secretase modulators. We applied a genetic algorithm to a large set of comparative molecular field analysis fields to select the fields that contribute most to the inhibitory activities of these compounds against Alzheimer’s disease. The genetic algorithm-selected comparative molecular field analysis fields were introduced into partial least squares and principal component analysis to reduce the dimensionality of the input features. The extracted partial least squares components were used as inputs to build partial least squares regression (genetic algorithm-partial least squares regression), and the extracted principal components were used as inputs for principal component regression (genetic algorithm-principal component regression) and support vector regression (genetic algorithm-principal component analysis-support vector regression). The classic three-dimensional quantitative structure–activity relationship comparative molecular field analysis (partial least squares regression) was also carried out for the sake of comparison. The results show that among the constructed models, in terms of root mean square error (RMSE) and leave-one-out cross-validated $R^2$ ($q^2$), the combination of principal component analysis and support vector machine can effectively improve the prediction performance ($\mathrm{RMSE}_{\mathrm{train}} = 0.231$, $\mathrm{RMSE}_{\mathrm{test}} = 0.360$, and $q^2 = 0.638$) compared with PLSR ($\mathrm{RMSE}_{\mathrm{train}} = 0.415$, $\mathrm{RMSE}_{\mathrm{test}} = 0.680$, and $q^2 = 0.311$). The performances of the genetic algorithm-principal component regression and genetic algorithm-partial least squares regression were also comparable but less powerful than genetic algorithm-principal component analysis-support vector regression. Finally, based on the information derived from the comparative molecular field analysis contour map, some key features for increasing the activity of γ-secretase modulators have been identified to design new triterpene-based Alzheimer’s disease drugs. |
---|---|
ISSN: | 1054-2523 1554-8120 |
DOI: | 10.1007/s00044-017-1828-7 |
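
The abstract compares models by RMSE and by the leave-one-out cross-validated $q^2$. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' implementation. It assumes the usual definition $q^2 = 1 - \mathrm{PRESS}/\mathrm{SS}$, with SS taken around the mean of all observed activities. The names `rmse`, `fit_line`, and `loo_q2` are hypothetical, and the one-descriptor least-squares line is only a stand-in for the regression models named in the abstract (PLSR, PCR, SVR).

```python
import math
from typing import Callable, List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between observed and predicted activities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def fit_line(x: List[float], y: List[float]) -> Callable[[float], float]:
    """Ordinary least squares on a single descriptor (stand-in model).

    Returns a function that predicts the activity for a new descriptor value.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda v: intercept + slope * v


def loo_q2(x: List[float], y: List[float], fit=fit_line) -> float:
    """Leave-one-out q^2 = 1 - PRESS / SS.

    Each compound is held out once, the model is refitted on the rest,
    and the squared error on the held-out compound is added to PRESS.
    Assumption: SS is computed around the mean of all observed values.
    """
    x, y = list(x), list(y)
    press = 0.0
    for i in range(len(x)):
        model = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - model(x[i])) ** 2
    mean_y = sum(y) / len(y)
    ss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss


if __name__ == "__main__":
    # Made-up illustrative values, not data from the paper.
    descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
    activity = [1.1, 2.9, 5.2, 6.8, 9.1]
    model = fit_line(descriptor, activity)
    fitted = [model(v) for v in descriptor]
    print("RMSE (train):", rmse(activity, fitted))
    print("LOO q2:", loo_q2(descriptor, activity))
```

Because `loo_q2` accepts any `fit` callable that returns a predictor, a different model could be passed in its place. A $q^2$ well below the training-set fit quality is the usual sign of overfitting.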