Reusability report: Learning the language of synthetic methods used in medicinal chemistry
Published in: | Nature Machine Intelligence 2021-07, Vol.3 (7), p.572-575 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Heteroatom alkylation and arylation; acylation and related processes; C–C bond formation. A heterogeneous distribution of class labels was observed, leading us to drop reaction classes with fewer than 50 instances from the prediction tasks, which avoids issues with exotic reaction classes that have limited representation. Filled white squares indicate reactions with 3-pyridinylboronic acid as a reactant (inset A), and filled green circles indicate reactions with 1-methyl-1H-pyrazole-4-boronic acid as a reactant (inset B). d, Superposition of all reaction classes for the SCH27k dataset only; a green circle highlights an example of a reaction just outside the SCH27k convex hull. Using rxnfp for classification of AZ-ELN reactions: To assess the utility of these methods for classifying AZ-ELN data, we calculated continuous embeddings with the three pretrained transformer models distributed with the original paper (ref. 1): pretrained only (trained on the Pistachio set in an unsupervised manner), rxnfp (PST), and rxnfp (SCH10k) (trained on the 10k subset of the Schneider set with a classification model). [...] We evaluated the robustness of the rxnfp (PST) model by modifying the input text representations. |
---|---|
ISSN: | 2522-5839 |
DOI: | 10.1038/s42256-021-00367-2 |
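
The record's summary mentions dropping sparsely populated reaction classes (fewer than 50 instances) before the prediction tasks. A minimal sketch of such a filter is given below. It is not the authors' code: the data layout (a list of `(reaction_smiles, class_label)` pairs), the function name `drop_rare_classes`, and the toy data are all assumptions made for illustration. Only the threshold of 50 comes from the summary, and the sketch reads "reactions with fewer than 50 instances" as "reaction classes with fewer than 50 instances".

```python
from collections import Counter

# Threshold taken from the summary: classes with fewer than 50 instances are dropped.
MIN_INSTANCES = 50


def drop_rare_classes(reactions, min_instances=MIN_INSTANCES):
    """Keep only reactions whose class label occurs at least `min_instances` times.

    `reactions` is a list of (reaction_smiles, class_label) pairs. This layout
    is a hypothetical one chosen for the example, not the format of the
    original datasets.
    """
    # Count how many reactions carry each class label.
    counts = Counter(label for _, label in reactions)
    # Retain a reaction only if its class is sufficiently represented.
    return [(smi, label) for smi, label in reactions if counts[label] >= min_instances]


# Toy data (fabricated placeholders, not real dataset entries):
# one well-populated class and one sparsely populated class.
toy = [("CC>>CC", "acylation")] * 60 + [("CO>>CO", "exotic")] * 3
kept = drop_rare_classes(toy)
```

With the toy data, only the 60 reactions of the well-populated class remain in `kept`. Filtering on class counts before any train/test split keeps classes that would otherwise have too few examples to evaluate out of both splits.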