MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions

Synchrotron X-ray techniques are essential for studies of the intrinsic relationship between synthesis, structure, and properties of materials. Modern synchrotrons can produce up to 1 petabyte of data per day. Such amounts of data can speed up materials development, but also come with a staggering...

Bibliographic details
Published in: Digital discovery 2024-05, Vol.3 (5), p.908-918
Main authors: Kjær, Emil T. S, Anker, Andy S, Kirsch, Andrea, Lajer, Joakim, Aalling-Frederiksen, Olivia, Billinge, Simon J. L, Jensen, Kirsten M. Ø
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 918
container_issue 5
container_start_page 908
container_title Digital discovery
container_volume 3
creator Kjær, Emil T. S
Anker, Andy S
Kirsch, Andrea
Lajer, Joakim
Aalling-Frederiksen, Olivia
Billinge, Simon J. L
Jensen, Kirsten M. Ø
description Synchrotron X-ray techniques are essential for studies of the intrinsic relationship between synthesis, structure, and properties of materials. Modern synchrotrons can produce up to 1 petabyte of data per day. Such amounts of data can speed up materials development, but also come with a staggering growth in workload, as the data generated must be stored and analyzed. We present an approach for quickly identifying an atomic structure model from pair distribution function (PDF) data from (nano)crystalline materials. Our model, MLstructureMining, uses a tree-based machine learning (ML) classifier. MLstructureMining has been trained to classify chemical structures from a PDF and gives a top-3 accuracy of 99% on simulated PDFs not seen during training, with a total of 6062 possible classes. We also demonstrate that MLstructureMining can identify the chemical structure from experimental PDFs from nanoparticles of CoFe2O4 and CeO2, and we show how it can be used to treat an in situ PDF series collected during Bi2Fe4O9 formation. Additionally, we show how MLstructureMining can be used in combination with the well-known methods principal component analysis (PCA) and non-negative matrix factorization (NMF) to analyze data from in situ experiments. MLstructureMining thus allows for real-time structure characterization by screening vast quantities of crystallographic information files in seconds. We present MLstructureMining, a machine learning tool that identifies a structural model from an experimental pair distribution function. We show how the method can be used for structure analysis of both crystalline and nanocrystalline materials.
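The description reports a top-3 accuracy, meaning a prediction counts as correct when the true class is among the three highest-scoring classes. The following is a minimal sketch of how such a metric can be computed, not the authors' code: the function name `top_k_accuracy`, the toy probability matrix, and the four-class setup are all assumptions made for illustration, and the record does not state how the authors evaluated their classifier.

```python
import numpy as np

def top_k_accuracy(probs, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    probs  : (n_samples, n_classes) array of class scores (hypothetical input)
    labels : (n_samples,) array of integer class indices
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (order within the k is arbitrary).
    top_k = np.argpartition(probs, -k, axis=1)[:, -k:]
    # A sample is a hit if its true label appears anywhere in its top-k set.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy data: 4 samples, 4 classes (illustrative only, not from the paper).
probs = np.array([
    [0.50, 0.30, 0.15, 0.05],  # true class 0 is ranked 1st -> hit
    [0.10, 0.20, 0.30, 0.40],  # true class 1 is ranked 3rd -> hit
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked 4th -> miss
    [0.05, 0.60, 0.25, 0.10],  # true class 2 is ranked 2nd -> hit
])
labels = np.array([0, 1, 3, 2])

score = top_k_accuracy(probs, labels, k=3)
print(score)  # 0.75
```

With thousands of candidate classes, as in the 6062-class setting described above, a top-3 metric reflects a workflow in which the classifier proposes a short list of candidate structures that can then be checked by other means.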
doi_str_mv 10.1039/d4dd00001c
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11094694</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3056666697</sourcerecordid><originalsourceid>FETCH-LOGICAL-c387t-9717ef99c8ec856563b48cc71b33b33d28ea04058ce5ef64a2c461fa3a7eba203</originalsourceid><addsrcrecordid>eNpVksFrFDEUxoNYbFl78a4ETyKMTSYzyYwXkd1WC1t6UegtZN686UZmkjXJCP3vzTp1rY-QPPJ-fPngCyGvOPvAmWgv-qrvWS4Oz8hZKUVdsLa5e_6kPyXnMf7ISKkU50K-IKeiUbUsy_qMuJttTGGGNAe8sc66-4_U0MnAzjqkI5pwuKPJ-5EOPtAjTG2PLtnBgknWOzoEP9G7IpgHujc20N5m1HbzMpwdHJr4kpwMZox4_niuyPery2_rr8X29sv1-vO2gGwtFa3iCoe2hQahqWUtRVc1AIp3QuTVlw0aVrG6AaxxkJUpoZJ8MMIo7EzJxIp8WnT3czdhD9lqMKPeBzuZ8KC9sfr_ibM7fe9_ac5ZW8m2ygpvFwUfk9URbELYgXcOIelSCKHytiLvHp8J_ueMMenJRsBxNA79HLVgtTxUqzL6fkEh-BgDDkcznOlDknpTbTZ_klxn-M1T-0f0b24ZeL0AIcJx-u8riN_Y76Wm</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3056666697</pqid></control><display><type>article</type><title>MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions</title><source>DOAJ Directory of Open Access Journals</source><creator>Kjær, Emil T. S ; Anker, Andy S ; Kirsch, Andrea ; Lajer, Joakim ; Aalling-Frederiksen, Olivia ; Billinge, Simon J. L ; Jensen, Kirsten M. Ø</creator><creatorcontrib>Kjær, Emil T. S ; Anker, Andy S ; Kirsch, Andrea ; Lajer, Joakim ; Aalling-Frederiksen, Olivia ; Billinge, Simon J. L ; Jensen, Kirsten M. Ø</creatorcontrib><description>Synchrotron X-ray techniques are essential for studies of the intrinsic relationship between synthesis, structure, and properties of materials. Modern synchrotrons can produce up to 1 petabyte of data per day. Such amounts of data can speed up materials development, but also comes with a staggering growth in workload, as the data generated must be stored and analyzed. 
We present an approach for quickly identifying an atomic structure model from pair distribution function (PDF) data from (nano)crystalline materials. Our model, MLstructureMining, uses a tree-based machine learning (ML) classifier. MLstructureMining has been trained to classify chemical structures from a PDF and gives a top-3 accuracy of 99% on simulated PDFs not seen during training, with a total of 6062 possible classes. We also demonstrate that MLstructureMining can identify the chemical structure from experimental PDFs from nanoparticles of CoFe2O4 and CeO2, and we show how it can be used to treat an in situ PDF series collected during Bi2Fe4O9 formation. Additionally, we show how MLstructureMining can be used in combination with the well-known methods, principal component analysis (PCA) and non-negative matrix factorization (NMF) to analyze data from in situ experiments. MLstructureMining thus allows for real-time structure characterization by screening vast quantities of crystallographic information files in seconds. We present MLstructureMining, a machine learning tool that identifies a structural model from an experimental pair distribution function. 
We show how the method can be used for structure analysis of both crystalline and nanocrystalline materials.</description><identifier>ISSN: 2635-098X</identifier><identifier>EISSN: 2635-098X</identifier><identifier>DOI: 10.1039/d4dd00001c</identifier><identifier>PMID: 38756225</identifier><language>eng</language><publisher>England: Royal Society of Chemistry (RSC)</publisher><subject>Chemistry</subject><ispartof>Digital discovery, 2024-05, Vol.3 (5), p.98-918</ispartof><rights>This journal is © The Royal Society of Chemistry.</rights><rights>This journal is © The Royal Society of Chemistry 2024 RSC</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c387t-9717ef99c8ec856563b48cc71b33b33d28ea04058ce5ef64a2c461fa3a7eba203</cites><orcidid>0000-0002-7403-6642 ; 0000-0002-9734-4998 ; 0000-0002-0298-6016 ; 0000-0003-2602-7415 ; 0000-0003-0291-217X ; 0000000274036642 ; 0000000326027415 ; 0000000297344998 ; 000000030291217X ; 0000000202986016</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,864,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38756225$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/2333723$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Kjær, Emil T. S</creatorcontrib><creatorcontrib>Anker, Andy S</creatorcontrib><creatorcontrib>Kirsch, Andrea</creatorcontrib><creatorcontrib>Lajer, Joakim</creatorcontrib><creatorcontrib>Aalling-Frederiksen, Olivia</creatorcontrib><creatorcontrib>Billinge, Simon J. L</creatorcontrib><creatorcontrib>Jensen, Kirsten M. 
Ø</creatorcontrib><title>MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions</title><title>Digital discovery</title><addtitle>Digit Discov</addtitle><description>Synchrotron X-ray techniques are essential for studies of the intrinsic relationship between synthesis, structure, and properties of materials. Modern synchrotrons can produce up to 1 petabyte of data per day. Such amounts of data can speed up materials development, but also come with a staggering growth in workload, as the data generated must be stored and analyzed. We present an approach for quickly identifying an atomic structure model from pair distribution function (PDF) data from (nano)crystalline materials. Our model, MLstructureMining, uses a tree-based machine learning (ML) classifier. MLstructureMining has been trained to classify chemical structures from a PDF and gives a top-3 accuracy of 99% on simulated PDFs not seen during training, with a total of 6062 possible classes. We also demonstrate that MLstructureMining can identify the chemical structure from experimental PDFs from nanoparticles of CoFe2O4 and CeO2, and we show how it can be used to treat an in situ PDF series collected during Bi2Fe4O9 formation. Additionally, we show how MLstructureMining can be used in combination with the well-known methods, principal component analysis (PCA) and non-negative matrix factorization (NMF) to analyze data from in situ experiments. MLstructureMining thus allows for real-time structure characterization by screening vast quantities of crystallographic information files in seconds. We present MLstructureMining, a machine learning tool that identifies a structural model from an experimental pair distribution function. 
We show how the method can be used for structure analysis of both crystalline and nanocrystalline materials.</description><subject>Chemistry</subject><issn>2635-098X</issn><issn>2635-098X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpVksFrFDEUxoNYbFl78a4ETyKMTSYzyYwXkd1WC1t6UegtZN686UZmkjXJCP3vzTp1rY-QPPJ-fPngCyGvOPvAmWgv-qrvWS4Oz8hZKUVdsLa5e_6kPyXnMf7ISKkU50K-IKeiUbUsy_qMuJttTGGGNAe8sc66-4_U0MnAzjqkI5pwuKPJ-5EOPtAjTG2PLtnBgknWOzoEP9G7IpgHujc20N5m1HbzMpwdHJr4kpwMZox4_niuyPery2_rr8X29sv1-vO2gGwtFa3iCoe2hQahqWUtRVc1AIp3QuTVlw0aVrG6AaxxkJUpoZJ8MMIo7EzJxIp8WnT3czdhD9lqMKPeBzuZ8KC9sfr_ibM7fe9_ac5ZW8m2ygpvFwUfk9URbELYgXcOIelSCKHytiLvHp8J_ueMMenJRsBxNA79HLVgtTxUqzL6fkEh-BgDDkcznOlDknpTbTZ_klxn-M1T-0f0b24ZeL0AIcJx-u8riN_Y76Wm</recordid><startdate>20240515</startdate><enddate>20240515</enddate><creator>Kjær, Emil T. S</creator><creator>Anker, Andy S</creator><creator>Kirsch, Andrea</creator><creator>Lajer, Joakim</creator><creator>Aalling-Frederiksen, Olivia</creator><creator>Billinge, Simon J. L</creator><creator>Jensen, Kirsten M. 
Ø</creator><general>Royal Society of Chemistry (RSC)</general><general>RSC</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-7403-6642</orcidid><orcidid>https://orcid.org/0000-0002-9734-4998</orcidid><orcidid>https://orcid.org/0000-0002-0298-6016</orcidid><orcidid>https://orcid.org/0000-0003-2602-7415</orcidid><orcidid>https://orcid.org/0000-0003-0291-217X</orcidid><orcidid>https://orcid.org/0000000274036642</orcidid><orcidid>https://orcid.org/0000000326027415</orcidid><orcidid>https://orcid.org/0000000297344998</orcidid><orcidid>https://orcid.org/000000030291217X</orcidid><orcidid>https://orcid.org/0000000202986016</orcidid></search><sort><creationdate>20240515</creationdate><title>MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions</title><author>Kjær, Emil T. S ; Anker, Andy S ; Kirsch, Andrea ; Lajer, Joakim ; Aalling-Frederiksen, Olivia ; Billinge, Simon J. L ; Jensen, Kirsten M. Ø</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c387t-9717ef99c8ec856563b48cc71b33b33d28ea04058ce5ef64a2c461fa3a7eba203</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Chemistry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kjær, Emil T. S</creatorcontrib><creatorcontrib>Anker, Andy S</creatorcontrib><creatorcontrib>Kirsch, Andrea</creatorcontrib><creatorcontrib>Lajer, Joakim</creatorcontrib><creatorcontrib>Aalling-Frederiksen, Olivia</creatorcontrib><creatorcontrib>Billinge, Simon J. L</creatorcontrib><creatorcontrib>Jensen, Kirsten M. 
Ø</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Digital discovery</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kjær, Emil T. S</au><au>Anker, Andy S</au><au>Kirsch, Andrea</au><au>Lajer, Joakim</au><au>Aalling-Frederiksen, Olivia</au><au>Billinge, Simon J. L</au><au>Jensen, Kirsten M. Ø</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions</atitle><jtitle>Digital discovery</jtitle><addtitle>Digit Discov</addtitle><date>2024-05-15</date><risdate>2024</risdate><volume>3</volume><issue>5</issue><spage>98</spage><epage>918</epage><pages>98-918</pages><issn>2635-098X</issn><eissn>2635-098X</eissn><abstract>Synchrotron X-ray techniques are essential for studies of the intrinsic relationship between synthesis, structure, and properties of materials. Modern synchrotrons can produce up to 1 petabyte of data per day. Such amounts of data can speed up materials development, but also comes with a staggering growth in workload, as the data generated must be stored and analyzed. We present an approach for quickly identifying an atomic structure model from pair distribution function (PDF) data from (nano)crystalline materials. Our model, MLstructureMining, uses a tree-based machine learning (ML) classifier. MLstructureMining has been trained to classify chemical structures from a PDF and gives a top-3 accuracy of 99% on simulated PDFs not seen during training, with a total of 6062 possible classes. 
We also demonstrate that MLstructureMining can identify the chemical structure from experimental PDFs from nanoparticles of CoFe2O4 and CeO2, and we show how it can be used to treat an in situ PDF series collected during Bi2Fe4O9 formation. Additionally, we show how MLstructureMining can be used in combination with the well-known methods, principal component analysis (PCA) and non-negative matrix factorization (NMF) to analyze data from in situ experiments. MLstructureMining thus allows for real-time structure characterization by screening vast quantities of crystallographic information files in seconds. We present MLstructureMining, a machine learning tool that identifies a structural model from an experimental pair distribution function. We show how the method can be used for structure analysis of both crystalline and nanocrystalline materials.</abstract><cop>England</cop><pub>Royal Society of Chemistry (RSC)</pub><pmid>38756225</pmid><doi>10.1039/d4dd00001c</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-7403-6642</orcidid><orcidid>https://orcid.org/0000-0002-9734-4998</orcidid><orcidid>https://orcid.org/0000-0002-0298-6016</orcidid><orcidid>https://orcid.org/0000-0003-2602-7415</orcidid><orcidid>https://orcid.org/0000-0003-0291-217X</orcidid><orcidid>https://orcid.org/0000000274036642</orcidid><orcidid>https://orcid.org/0000000326027415</orcidid><orcidid>https://orcid.org/0000000297344998</orcidid><orcidid>https://orcid.org/000000030291217X</orcidid><orcidid>https://orcid.org/0000000202986016</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2635-098X
ispartof Digital discovery, 2024-05, Vol.3 (5), p.908-918
issn 2635-098X
2635-098X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11094694
source DOAJ Directory of Open Access Journals
subjects Chemistry
title MLstructureMining: a machine learning tool for structure identification from X-ray pair distribution functions
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T16%3A39%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MLstructureMining:%20a%20machine%20learning%20tool%20for%20structure%20identification%20from%20X-ray%20pair%20distribution%20functions&rft.jtitle=Digital%20discovery&rft.au=Kj%C3%A6r,%20Emil%20T.%20S&rft.date=2024-05-15&rft.volume=3&rft.issue=5&rft.spage=98&rft.epage=918&rft.pages=98-918&rft.issn=2635-098X&rft.eissn=2635-098X&rft_id=info:doi/10.1039/d4dd00001c&rft_dat=%3Cproquest_pubme%3E3056666697%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3056666697&rft_id=info:pmid/38756225&rfr_iscdi=true