Retrosynthesis prediction with an iterative string editing model

Bibliographic details
Published in: Nature Communications, 2024-07, Vol. 15 (1), p. 6404-16, Article 6404
Main authors: Han, Yuqiang; Xu, Xiaoyang; Hsieh, Chang-Yu; Ding, Keyan; Xu, Hongxia; Xu, Renjun; Hou, Tingjun; Zhang, Qiang; Chen, Huajun
Format: Article
Language: English
Online access: Full text
Description
Summary: Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial intelligence (AI) is increasingly employed to expedite the process. However, existing approaches employ token-by-token decoding methods to translate target molecule strings into corresponding precursors, exhibiting unsatisfactory performance and limited diversity. As chemical reactions typically induce local molecular changes, reactants and products often overlap significantly. Inspired by this fact, we propose reframing single-step retrosynthesis prediction as a molecular string editing task, iteratively refining target molecule strings to generate precursor compounds. Our proposed approach involves a fragment-based generative editing model that uses explicit sequence editing operations. Additionally, we design an inference module with reposition sampling and sequence augmentation to enhance both prediction accuracy and diversity. Extensive experiments demonstrate that our model generates high-quality and diverse results, achieving superior performance with a promising top-1 accuracy of 60.8% on the standard benchmark dataset USPTO-50K.

Editor's summary: Retrosynthesis aims to identify synthesis solutions for compounds in drug discovery. Here, the authors frame it as a molecular string editing task and utilize an iterative string editing model to provide high-quality and diverse solutions.
ISSN: 2041-1723
DOI: 10.1038/s41467-024-50617-1
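
The summary's observation that reactants and products overlap heavily, and can therefore be related by a few string edits, can be illustrated with a small sketch. This is not the authors' fragment-based editing model: it is a plain character-level diff built on Python's standard `difflib`, and the function names and the example reaction (ethyl acetate from acetic acid and ethanol, written as SMILES) are assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical example pair (not taken from the paper or from USPTO-50K):
# product = ethyl acetate, reactants = acetic acid + ethanol.
PRODUCT = "CC(=O)OCC"
REACTANTS = "CC(=O)O.OCC"


def edit_ops(source: str, target: str):
    """List the non-equal edit operations that turn source into target."""
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    return [
        (tag, source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_edits(source: str, target: str) -> str:
    """Rebuild target by copying unchanged spans from source and
    taking inserted or replaced spans from target."""
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        pieces.append(source[i1:i2] if tag == "equal" else target[j1:j2])
    return "".join(pieces)


def overlap_ratio(source: str, target: str) -> float:
    """Similarity in [0, 1]; higher means more shared characters."""
    return SequenceMatcher(a=source, b=target, autojunk=False).ratio()


if __name__ == "__main__":
    print(edit_ops(PRODUCT, REACTANTS))
    print(overlap_ratio(PRODUCT, REACTANTS))
```

For this pair the only non-equal operation is the insertion of `.O`, which splits the ester into two precursor strings while every other character of the product is reused. A character-level diff knows nothing about chemical validity, so it only shows why an editing formulation can be compact, not how the published model chooses its edits.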