When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?
In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom type) molecular descriptors (MDs) as a generalization of the classical approach of molecular encoding using the sum of the atomic (or fragment) contributions. These AOs are implemented in...
Gespeichert in:
Veröffentlicht in: | Molecular diversity 2020-11, Vol.24 (4), p.913-932 |
---|---|
Hauptverfasser: | , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom type) molecular descriptors (MDs) as a generalization of the classical approach of molecular encoding using the sum of the atomic (or fragment) contributions. These AOs are implemented in a new and free software denominated
MD-LOVIs
(
http://tomocomd.com/md-lovis
), which allows for the calculation of MDs from atomic weights vector and LOVIs (local vertex invariants). This software was developed in Java programming language and employed the Chemical Development Kit (CDK) library for handling chemical structures and the calculation of atomic weights. An analysis of the complexities of the algorithms presented herein demonstrates that these aspects were efficiently implemented. The calculation speed experiments show that the
MD-LOVIs
software has satisfactory behavior when compared to software such as Padel, CDKDescriptor, DRAGON and Bluecal software. Shannon’s entropy (SE)-based variability studies demonstrate that
MD-LOVIs
yields indices with greater information content when compared to those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, Padel and Mold2 software, as a result of the several generalizations in
MD-LOVIs
not used in other programs. Lastly, three QSARs were built using multiple linear regression with genetic algorithms, and the statistical parameters of these models demonstrate that the
MD-LOVIs
indices obtained with AOs yield better performance than those obtained when the summation operator is used exclusively. Moreover, it is also revealed that the
MD-LOVIs
indices yield models with comparable to superior performance when compared to other QSAR methodologies reported in the literature, despite their simplicity. The studies performed herein collectively demonstrated that
MD-LOVIs
software generates indices as simple as possible, but not simpler and that use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.
Graphic abstract |
---|---|
ISSN: | 1381-1991 1573-501X |
DOI: | 10.1007/s11030-019-10002-3 |