MSNovelist: de novo structure generation from mass spectra

Bibliographic details
Published in: Nature Methods, 2022-07, Vol. 19 (7), pp. 865-870
Main authors: Stravs, Michael A.; Dührkop, Kai; Böcker, Sebastian; Zamboni, Nicola
Format: Article
Language: English
Description
Abstract: Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS²) spectra. In an evaluation with 3,863 MS² spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS² dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds. MSNovelist combines fingerprint prediction with an encoder–decoder neural network for de novo structure generation of small molecules from mass spectra.
ISSN: 1548-7091, 1548-7105
DOI: 10.1038/s41592-022-01486-3