Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes
The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or end-to-end machine learning. However, a looming question is that how do these approaches compare with the critical eye of trained chemists. The CAS f...
Gespeichert in:
Veröffentlicht in: | Journal of chemical information and modeling 2020-10, Vol.60 (10), p.4449-4456 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The development of molecular descriptors is a central challenge in cheminformatics. Most approaches use algorithms that extract atomic environments or end-to-end machine learning. However, a looming question is that how do these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs, which they deem could influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints using a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a different source of information compared to other commonly used fingerprints. These results suggest that anthropomorphic insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning. |
---|---|
ISSN: | 1549-9596 1549-960X |
DOI: | 10.1021/acs.jcim.0c00193 |