Leveraging infrared spectroscopy for automated structure elucidation
Published in: | Communications Chemistry 2024-11, Vol.7 (1), p.268-11, Article 268 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | The application of machine learning models in chemistry has made remarkable strides in recent years. While analytical chemistry has received considerable interest from machine learning practitioners, its adoption into everyday use remains limited. Among the available analytical methods, infrared (IR) spectroscopy stands out in terms of affordability, simplicity, and accessibility. However, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. We present a transformer model that enables chemists to leverage the complete information contained within an IR spectrum to directly predict the molecular structure. To cover a large chemical space, we pretrain the model using 634,585 simulated IR spectra and fine-tune it on 3,453 experimental spectra. Our approach achieves a top-1 accuracy of 44.4% and a top-10 accuracy of 69.8% on compounds containing 6 to 13 heavy atoms. When solely predicting scaffolds, the model ranks the correct scaffold first in 84.5% of cases and within the top 10 in 93.0% of cases.
Infrared spectroscopy stands out as an analytical tool for its affordability, simplicity, and accessibility; however, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. Here, the authors use a transformer model that enables chemists to leverage all information contained within an IR spectrum to directly predict the molecular structure. |
---|---|
ISSN: | 2399-3669 |
DOI: | 10.1038/s42004-024-01341-w |
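
The summary reports top-1 and top-10 accuracies. As a reading aid, the sketch below shows how such a top-k figure is commonly computed: a prediction counts as a hit when the reference structure appears among the first k ranked candidates. This is an illustration only, not the authors' evaluation code. The function name `top_k_accuracy`, the toy SMILES strings, and the plain string comparison are all assumptions; a real evaluation would presumably compare canonicalized structures rather than raw strings.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of targets found among the first k ranked candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("need exactly one ranked candidate list per target")
    if not targets:
        raise ValueError("no targets given")
    # Assumption: structures are compared as exact strings. In practice both
    # sides would be canonicalized first so equivalent notations match.
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Hypothetical toy data: three spectra, two ranked candidates each.
ranked = [
    ["CCO", "COC"],   # reference structure is ranked first
    ["CC=O", "CCO"],  # reference structure is ranked second
    ["CCC", "CCN"],   # reference structure is not among the candidates
]
targets = ["CCO", "CCO", "CCOC"]

print(top_k_accuracy(ranked, targets, k=1))
print(top_k_accuracy(ranked, targets, k=2))
```

Because the first k candidates always include the first k-1, top-k accuracy can only stay equal or rise as k grows, which is consistent with the top-10 figures in the summary exceeding the top-1 figures.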