QSAR–old and new directions

Bibliographic details
Format: Book chapter
Language: English
Online access: Full text
Description
Summary: Regression analysis has recently faced increasing doubt concerning its predictivity. A series of studies has questioned the reliability of the underlying approach, which can yield elusive models that show significant correlations for the training data but disappointing results for external test sets. The performance of QSAR (quantitative structure-activity relationship) predictions depends on a series of issues, comprising the choice of descriptors, the compound set, the mathematical methods, the quality of the experimental data, and ultimately common sense. A further problem concerns the interpretability of descriptors. The vast number of computable molecular features makes a preselection mandatory, particularly for use in neural networks and support vector regression. Corresponding strategies comprise principal component analysis and the removal of collinear descriptors. The issues involved with the latter approach can lead to a preference for highly specific variables over more generally applicable and more meaningful descriptors. Examples are provided where the resulting models are questionable despite seemingly sound statistical proof. Therefore, selection criteria and general guidelines are discussed which facilitate the choice of interpretable descriptors, e.g. for lipophilicity and hydrogen-bonding capacity. Reasons for errors and outliers in prediction models are summarized with respect to cross-validation methods such as leave-one-out. Furthermore, some case studies are discussed which make use of support vector regression, an emerging technique in QSAR.
ISSN: 1472-0965
1472-0973
DOI: 10.1039/B812893F
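
Illustrative note: the summary names two procedures that are easy to sketch, the removal of collinear descriptors and leave-one-out cross-validation. The following minimal Python sketch is not taken from the chapter. The descriptor names (`logP`, `HBD`, `fragment_count`), the synthetic data, the 0.9 correlation threshold, and the helper functions `drop_collinear` and `loo_q2` are all assumptions made for illustration. It shows how a simple greedy collinearity filter can keep either a general descriptor or a highly specific one depending only on column order, which is one way the preference described in the summary could arise.

```python
import numpy as np

def drop_collinear(X, names, threshold=0.9):
    """Greedily keep each descriptor whose |Pearson r| with every
    already-kept descriptor does not exceed `threshold` (hypothetical helper)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 for ordinary least squares
    with an intercept: 1 - PRESS / total sum of squares."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        # Fit on all compounds except i, then predict compound i.
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic data (assumption): 30 compounds, three descriptors.
rng = np.random.default_rng(0)
logp = rng.normal(size=30)                 # general lipophilicity descriptor
hbd = rng.normal(size=30)                  # hydrogen-bond donor descriptor
frag = logp + 0.05 * rng.normal(size=30)   # specific descriptor, nearly collinear with logP
X = np.column_stack([logp, hbd, frag])
names = ["logP", "HBD", "fragment_count"]
y = 1.5 * logp - 0.8 * hbd + 0.1 * rng.normal(size=30)

# The same filter keeps different descriptors depending on column order.
kept = drop_collinear(X, names)
kept_reversed = drop_collinear(X[:, ::-1], names[::-1])

# Leave-one-out q^2 for the model built on the two interpretable descriptors.
q2 = loo_q2(X[:, :2], y)
print(kept, kept_reversed, round(q2, 3))
```

With the columns in their original order the filter retains `logP`; with the order reversed it retains `fragment_count` instead, although both runs use the same threshold and data. A high leave-one-out q² on such data says nothing about which of the two choices generalizes better to an external test set.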