Block‐wise Exploration of Molecular Descriptors with Multi‐block Orthogonal Component Analysis (MOCA)

Bibliographic details
Published in: Molecular informatics 2022-05, Vol.41 (5), p.e2100165-n/a
Main authors: Schmidt, Sebastian; Schindler, Michael; Eriksson, Lennart
Format: Article
Language: English
Description
Abstract: Data tables for machine learning and structure‐activity relationship modelling (QSAR) are often naturally organized in blocks of data, where multiple molecular representations or sets of descriptors form the blocks. Multi‐block Orthogonal Component Analysis (MOCA), a new analytical tool, can be used to explore such data structures in a single model, identifying principal components that are unique to a single block or joint over multiple blocks. We applied MOCA to two sets of 550 and 300 molecules and up to 9213 molecular descriptors organized in 11 blocks. The MOCA models reveal relationships between the blocks and overarching trends across the whole dataset. Based on the MOCA joint components, we propose a quantitative metric for the redundancy of blocks, useful for a priori block‐wise feature selection or evaluation of new molecular representations. The second data set includes 7 ecotoxicological study endpoints for crop protection chemicals, for which we (re‐)discovered some general trends and linked them to molecular properties. Using a single MOCA model we estimated the predictive potential of each block and the model‐ability of the target block.
ISSN: 1868-1743, 1868-1751
DOI: 10.1002/minf.202100165