MIST-CF: Chemical Formula Inference from Tandem Mass Spectra

Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametr...

Bibliographic details
Published in: Journal of chemical information and modeling, 2024-04, Vol. 64 (7), p. 2421-2431
Main authors: Goldman, Samuel; Xin, Jiayi; Provenzano, Joules; Coley, Connor W.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2431
container_issue 7
container_start_page 2421
container_title Journal of chemical information and modeling
container_volume 64
creator Goldman, Samuel
Xin, Jiayi
Provenzano, Joules
Coley, Connor W.
description Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% top 1 accuracy over other neural network architectures. We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning.
doi_str_mv 10.1021/acs.jcim.3c01082
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2866759550</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2866759550</sourcerecordid><originalsourceid>FETCH-LOGICAL-a364t-d319644cd0c59d01f6ece772dbedda428954ffec0155a28bdc8a9ce148a300c73</originalsourceid><addsrcrecordid>eNp1kD1PwzAURS0EoqWwM6FILAyk-LsxYkERhUqtGFokNsu1X0SqOCl2M_DvSWnLgMT0PJx73_NB6JLgIcGU3Bkbhytb-iGzmOCMHqE-EVylSuL348NbKNlDZzGuMGZMSXqKemw0ooLJrI8eZpP5Is3H90n-Ab60pkrGTfBtZZJJXUCA2kJShMYnC1M78MnMxJjM12A3wZyjk8JUES72c4Dexk-L_CWdvj5P8sdpapjkm9QxoiTn1mErlMOkkGChu8AtwTnDaaYELwroviCEodnS2cwoC4RnhmFsR2yAbna969B8thA32pfRQlWZGpo2appJORJKCNyh13_QVdOGurtOM8wUF1TibSHeUTY0MQYo9DqU3oQvTbDemtWdWb01q_dmu8jVvrhdenC_gYPKDrjdAT_Rw9J_-74BzKKCdA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3039452607</pqid></control><display><type>article</type><title>MIST-CF: Chemical Formula Inference from Tandem Mass Spectra</title><source>MEDLINE</source><source>ACS Publications</source><creator>Goldman, Samuel ; Xin, Jiayi ; Provenzano, Joules ; Coley, Connor W.</creator><creatorcontrib>Goldman, Samuel ; Xin, Jiayi ; Provenzano, Joules ; Coley, Connor W.</creatorcontrib><description>Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. 
Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% top 1 accuracy over other neural network architectures. We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning.</description><identifier>ISSN: 1549-9596</identifier><identifier>ISSN: 1549-960X</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/acs.jcim.3c01082</identifier><identifier>PMID: 37725368</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Annotations ; Databases, Factual ; Fragmentation ; Inference ; Learning ; Machine Learning and Deep Learning ; Mass spectra ; Mass spectrometry ; Metabolites ; Neural networks ; Neural Networks, Computer ; Tandem Mass Spectrometry - methods ; Transformers</subject><ispartof>Journal of chemical information and modeling, 2024-04, Vol.64 (7), p.2421-2431</ispartof><rights>2023 American Chemical Society</rights><rights>Copyright American Chemical Society Apr 8, 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a364t-d319644cd0c59d01f6ece772dbedda428954ffec0155a28bdc8a9ce148a300c73</citedby><cites>FETCH-LOGICAL-a364t-d319644cd0c59d01f6ece772dbedda428954ffec0155a28bdc8a9ce148a300c73</cites><orcidid>0000-0002-8271-8723 ; 0000-0002-3928-6873 ; 
0000-0003-3693-3809</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jcim.3c01082$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jcim.3c01082$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2763,27075,27923,27924,56737,56787</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37725368$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Goldman, Samuel</creatorcontrib><creatorcontrib>Xin, Jiayi</creatorcontrib><creatorcontrib>Provenzano, Joules</creatorcontrib><creatorcontrib>Coley, Connor W.</creatorcontrib><title>MIST-CF: Chemical Formula Inference from Tandem Mass Spectra</title><title>Journal of chemical information and modeling</title><addtitle>J. Chem. Inf. Model</addtitle><description>Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% top 1 accuracy over other neural network architectures. 
We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning.</description><subject>Annotations</subject><subject>Databases, Factual</subject><subject>Fragmentation</subject><subject>Inference</subject><subject>Learning</subject><subject>Machine Learning and Deep Learning</subject><subject>Mass spectra</subject><subject>Mass spectrometry</subject><subject>Metabolites</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Tandem Mass Spectrometry - methods</subject><subject>Transformers</subject><issn>1549-9596</issn><issn>1549-960X</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kD1PwzAURS0EoqWwM6FILAyk-LsxYkERhUqtGFokNsu1X0SqOCl2M_DvSWnLgMT0PJx73_NB6JLgIcGU3Bkbhytb-iGzmOCMHqE-EVylSuL348NbKNlDZzGuMGZMSXqKemw0ooLJrI8eZpP5Is3H90n-Ab60pkrGTfBtZZJJXUCA2kJShMYnC1M78MnMxJjM12A3wZyjk8JUES72c4Dexk-L_CWdvj5P8sdpapjkm9QxoiTn1mErlMOkkGChu8AtwTnDaaYELwroviCEodnS2cwoC4RnhmFsR2yAbna969B8thA32pfRQlWZGpo2appJORJKCNyh13_QVdOGurtOM8wUF1TibSHeUTY0MQYo9DqU3oQvTbDemtWdWb01q_dmu8jVvrhdenC_gYPKDrjdAT_Rw9J_-74BzKKCdA</recordid><startdate>20240408</startdate><enddate>20240408</enddate><creator>Goldman, Samuel</creator><creator>Xin, Jiayi</creator><creator>Provenzano, Joules</creator><creator>Coley, Connor W.</creator><general>American Chemical 
Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-8271-8723</orcidid><orcidid>https://orcid.org/0000-0002-3928-6873</orcidid><orcidid>https://orcid.org/0000-0003-3693-3809</orcidid></search><sort><creationdate>20240408</creationdate><title>MIST-CF: Chemical Formula Inference from Tandem Mass Spectra</title><author>Goldman, Samuel ; Xin, Jiayi ; Provenzano, Joules ; Coley, Connor W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a364t-d319644cd0c59d01f6ece772dbedda428954ffec0155a28bdc8a9ce148a300c73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Databases, Factual</topic><topic>Fragmentation</topic><topic>Inference</topic><topic>Learning</topic><topic>Machine Learning and Deep Learning</topic><topic>Mass spectra</topic><topic>Mass spectrometry</topic><topic>Metabolites</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Tandem Mass Spectrometry - methods</topic><topic>Transformers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Goldman, Samuel</creatorcontrib><creatorcontrib>Xin, Jiayi</creatorcontrib><creatorcontrib>Provenzano, Joules</creatorcontrib><creatorcontrib>Coley, Connor W.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems 
Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of chemical information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Goldman, Samuel</au><au>Xin, Jiayi</au><au>Provenzano, Joules</au><au>Coley, Connor W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MIST-CF: Chemical Formula Inference from Tandem Mass Spectra</atitle><jtitle>Journal of chemical information and modeling</jtitle><addtitle>J. Chem. Inf. Model</addtitle><date>2024-04-08</date><risdate>2024</risdate><volume>64</volume><issue>7</issue><spage>2421</spage><epage>2431</epage><pages>2421-2431</pages><issn>1549-9596</issn><issn>1549-960X</issn><eissn>1549-960X</eissn><abstract>Chemical formula annotation for tandem mass spectrometry (MS/MS) data is the first step toward structurally elucidating unknown metabolites. While great strides have been made toward solving this problem, the current state-of-the-art method depends on time-intensive, proprietary, and expert-parametrized fragmentation tree construction and scoring. 
In this work, we extend our previous spectrum Transformer methodology into an energy-based modeling framework, MIST-CF: Metabolite Inference with Spectrum Transformers for Chemical Formula prediction, for learning to rank chemical formula and adduct assignments given an unannotated MS/MS spectrum. Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction. We train and evaluate our model on a large open-access database, showing an absolute improvement of 10% top 1 accuracy over other neural network architectures. We further validate our approach on the CASMI2022 challenge data set, achieving nearly equivalent performance to the winning entry within the positive mode category without any manual curation or postprocessing of our results. These results demonstrate an exciting strategy to more powerfully leverage MS2 fragment peaks for predicting MS1 precursor chemical formulas with data-driven learning.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>37725368</pmid><doi>10.1021/acs.jcim.3c01082</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-8271-8723</orcidid><orcidid>https://orcid.org/0000-0002-3928-6873</orcidid><orcidid>https://orcid.org/0000-0003-3693-3809</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2024-04, Vol.64 (7), p.2421-2431
issn 1549-9596
1549-960X
1549-960X
language eng
recordid cdi_proquest_miscellaneous_2866759550
source MEDLINE; ACS Publications
subjects Annotations
Databases, Factual
Fragmentation
Inference
Learning
Machine Learning and Deep Learning
Mass spectra
Mass spectrometry
Metabolites
Neural networks
Neural Networks, Computer
Tandem Mass Spectrometry - methods
Transformers
title MIST-CF: Chemical Formula Inference from Tandem Mass Spectra
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T02%3A51%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MIST-CF:%20Chemical%20Formula%20Inference%20from%20Tandem%20Mass%20Spectra&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Goldman,%20Samuel&rft.date=2024-04-08&rft.volume=64&rft.issue=7&rft.spage=2421&rft.epage=2431&rft.pages=2421-2431&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.3c01082&rft_dat=%3Cproquest_cross%3E2866759550%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3039452607&rft_id=info:pmid/37725368&rfr_iscdi=true