Genetic algorithms and self-organizing maps: a powerful combination for modeling complex QSAR and QSPR problems

Bibliographic details
Published in: Journal of Computer-Aided Molecular Design, 2004-07, Vol. 18 (7-9), pp. 483-493
Main authors: Bayram, Ersin; Santago, 2nd, Peter; Harris, Rebecca; Xiao, Yun-De; Clauset, Aaron J; Schmitt, Jeffrey D
Format: Article
Language: English
Description
Summary: Modeling non-linear descriptor-target activity/property relationships with many dependent descriptors has been a long-standing challenge in the design of biologically active molecules. In an effort to address this problem, we couple the supervised self-organizing map with the genetic algorithm. Although self-organizing maps are non-linear and topology-preserving techniques that hold great potential for modeling and decoding relationships, the large number of descriptors in typical quantitative structure-activity relationship or quantitative structure-property relationship analysis may lead to spurious correlation(s) and/or difficulty in the interpretation of resulting models. To reduce the number of descriptors to a manageable size, we chose the genetic algorithm for descriptor selection because of its flexibility and efficiency in solving complex problems. Feasibility studies were conducted using six different datasets of moderate-to-large size and moderate-to-great diversity, each with a different biological endpoint. Since favorable training set statistics do not necessarily indicate a highly predictive model, the quality of all models was confirmed by withholding a portion of each dataset for external validation. We also address the variability introduced into modeling through dataset partitioning and through the stochastic nature of the combined genetic algorithm and supervised self-organizing map method, using the z-score and other tests. Experiments show that the combined method provides accuracy comparable to that of the supervised self-organizing map alone while using significantly fewer descriptors in the models generated. We observed consistently better results than with partial least squares models. We conclude that the combination of genetic algorithms with the supervised self-organizing map shows great potential as a quantitative structure-activity/property relationship modeling tool.
ISSN: 0920-654X, 1573-4951
DOI: 10.1007/s10822-004-5321-2
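
The summary describes genetic-algorithm descriptor selection wrapped around a non-linear model and checked on withheld data. The sketch below illustrates that general pattern only and is not the authors' implementation. Several things in it are assumptions made for illustration: a 1-nearest-neighbour predictor stands in for the supervised self-organizing map, the data are synthetic, and all function names, the per-descriptor penalty, and the GA settings (population size, generations, mutation rate) are hypothetical.

```python
import random

def make_data(rng, n_samples=50, n_descriptors=12):
    """Synthetic data (assumption): only descriptors 0 and 1 drive the target."""
    X = [[rng.uniform(-1, 1) for _ in range(n_descriptors)] for _ in range(n_samples)]
    y = [row[0] * row[0] + row[1] for row in X]  # non-linear in descriptor 0
    return X, y

def loo_error(X, y, mask):
    """Leave-one-out mean squared error of a 1-nearest-neighbour predictor
    (a stand-in for the supervised SOM) using only descriptors enabled in mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty descriptor set is never acceptable
    total = 0.0
    for i in range(len(X)):
        best_d, best_y = float("inf"), 0.0
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_y = d, y[k]
        total += (y[i] - best_y) ** 2
    return total / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Lower is better: training error plus a small cost per descriptor used."""
    return loo_error(X, y, mask) + penalty * sum(mask)

def select_descriptors(X, y, rng, pop_size=16, generations=10, p_mut=0.05):
    """Simple elitist GA over bit masks (1 = descriptor kept)."""
    n = len(X[0])
    # Seed the population with the full descriptor set as a baseline individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = sorted(pop, key=lambda m: fitness(X, y, m))[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children                 # elite survives unchanged
    return min(pop, key=lambda m: fitness(X, y, m))

def external_error(X_tr, y_tr, X_te, y_te, mask):
    """Mean squared error on withheld samples, predicted from the training set."""
    cols = [j for j, bit in enumerate(mask) if bit]
    total = 0.0
    for xi, yi in zip(X_te, y_te):
        nearest = min(range(len(X_tr)),
                      key=lambda k: sum((xi[j] - X_tr[k][j]) ** 2 for j in cols))
        total += (yi - y_tr[nearest]) ** 2
    return total / len(X_te)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]  # withhold 10 samples
    best = select_descriptors(X_tr, y_tr, rng)
    print("descriptors kept:", [j for j, bit in enumerate(best) if bit])
    print("external MSE:", external_error(X_tr, y_tr, X_te, y_te, best))
```

Because the result depends on both the random partition and the random search, repeating the run over several seeds and splits gives a rough picture of the variability that the summary says was assessed with the z-score and other tests.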