Sample-distance partial least squares: PLS optimized for many variables, with application to CoMFA

Three-dimensional molecular modeling can provide an unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choi...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of computer-aided molecular design 1993-10, Vol.7 (5), p.587-619
Hauptverfasser: Bush, B L, Nachbar, Jr, R B
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Three-dimensional molecular modeling can provide an unlimited number m of structural properties. Comparative Molecular Field Analysis (CoMFA), for example, may calculate thousands of field values for each model structure. When m is large, partial least squares (PLS) is the statistical method of choice for fitting and predicting biological responses. Yet PLS is usually implemented in a property-based fashion which is optimal only for small m. We describe here a sample-based formulation of PLS which can be used to fit any single response (bioactivity). SAMPLS reduces all explanatory data to the pairwise 'distances' among n samples (molecules), or equivalently to an n-by-n covariance matrix C. This matrix, unmodified, can be used to fit all PLS components. Furthermore, SAMPLS will validate the model by modern resampling techniques, at a cost independent of m. We have implemented SAMPLS as a Fortran program and have reproduced conventional and cross-validated PLS analyses of data from two published studies. Full (leave-each-out) cross-validation of a typical CoMFA takes 0.2 CPU s. SAMPLS is thus ideally suited to structure-activity analysis based on CoMFA fields or bonded topology. The sample-distance formulation also relates PLS to methods like cluster analysis and nonlinear mapping, and shows how drastically PLS simplifies the information in CoMFA fields.
ISSN:0920-654X
1573-4951
DOI:10.1007/BF00124364