Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques

A consistent challenge for both new and expert practitioners of small‐angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for class...

Full description

Bibliographic details
Published in: Journal of applied crystallography 2020-04, Vol.53 (2), p.326-334
Main authors: Archibald, Richard K., Doucet, Mathieu, Johnston, Travis, Young, Steven R., Yang, Erika, Heller, William T.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 334
container_issue 2
container_start_page 326
container_title Journal of applied crystallography
container_volume 53
creator Archibald, Richard K.
Doucet, Mathieu
Johnston, Travis
Young, Steven R.
Yang, Erika
Heller, William T.
description A consistent challenge for both new and expert practitioners of small‐angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post‐processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS. It is demonstrated how k nearest neighbor machine learning methods can be used to classify small‐angle scattering data for the most appropriate model to use for data analysis. The results show the promise of machine learning for helping small‐angle scattering practitioners translate measured data into scientific results.
doi_str_mv 10.1107/S1600576720000552
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1649508</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2386136591</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3988-e62e19790688ec3e43e04ffa58c4ef1a4aaeababd6455166a0a21ffd2ed3b1383</originalsourceid><addsrcrecordid>eNqFkc1KAzEQxxdR8PMBvC16rmY2m3T3KMVPBMGva5hmZ9voNqtJaqknH8Fn9ElMqAfBg4Ewk__8_sOQybJ9YEcAbHh8B5IxMZTDgsUjRLGWbSVpkLT1X_lmtu39E2MQ0WIrext16L1pl8ZOcrRNvNgt39PLz7Drvj4-0U46yr3GEMilQoMB87lP6YLMZBqoyZ9zS-jIhxijNO6dz2eop8ZS3sWKTXQgPbXmdU5-N9tosfO09xN3soez0_vRxeD65vxydHI90LyuqgHJgqAe1kxWFWlOJSdWti2KSpfUApaIhGMcN7IUAqREhgW0bVNQw8fAK76THaz69j4Y5bVJI-jeWtJBgSxrwRJ0uIJeXJ-GC-qpn7v4D14VvJLApaghUrCitOu9d9SqF2dm6JYKmEo7UH92ED31yrMwHS3_N6ir0W3xeC4AKv4NAmSMoA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2386136591</pqid></control><display><type>article</type><title>Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques</title><source>Wiley Online Library Journals Frontfile Complete</source><source>Alma/SFX Local Collection</source><creator>Archibald, Richard K. ; Doucet, Mathieu ; Johnston, Travis ; Young, Steven R. ; Yang, Erika ; Heller, William T.</creator><creatorcontrib>Archibald, Richard K. ; Doucet, Mathieu ; Johnston, Travis ; Young, Steven R. ; Yang, Erika ; Heller, William T. ; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)</creatorcontrib><description>A consistent challenge for both new and expert practitioners of small‐angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. 
Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post‐processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS. It is demonstrated how k nearest neighbor machine learning methods can be used to classify small‐angle scattering data for the most appropriate model to use for data analysis. 
The results show the promise of machine learning for helping small‐angle scattering practitioners translate measured data into scientific results.</description><identifier>ISSN: 1600-5767</identifier><identifier>ISSN: 0021-8898</identifier><identifier>EISSN: 1600-5767</identifier><identifier>DOI: 10.1107/S1600576720000552</identifier><language>eng</language><publisher>5 Abbey Square, Chester, Cheshire CH1 2HU, England: International Union of Crystallography</publisher><subject>Classification ; Data analysis ; Gaussian process ; Learning algorithms ; Machine learning ; MATHEMATICS AND COMPUTING ; modeling ; SasView ; Scattering ; small-angle scattering data</subject><ispartof>Journal of applied crystallography, 2020-04, Vol.53 (2), p.326-334</ispartof><rights>2020 Richard K. Archibald et al. published by IUCr Journals.</rights><rights>Copyright Blackwell Publishing Ltd. Apr 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3988-e62e19790688ec3e43e04ffa58c4ef1a4aaeababd6455166a0a21ffd2ed3b1383</citedby><cites>FETCH-LOGICAL-c3988-e62e19790688ec3e43e04ffa58c4ef1a4aaeababd6455166a0a21ffd2ed3b1383</cites><orcidid>0000-0002-5560-6478 ; 0000-0002-4538-9780 ; 0000-0001-9935-6864 ; 0000-0001-6456-2975 ; 0000000245389780 ; 0000000164562975 ; 0000000171081934 ; 0000000255606478 ; 0000000305914330</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1107%2FS1600576720000552$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1107%2FS1600576720000552$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>230,314,776,780,881,1411,27901,27902,45550,45551</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1649508$$D View this record in 
Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Archibald, Richard K.</creatorcontrib><creatorcontrib>Doucet, Mathieu</creatorcontrib><creatorcontrib>Johnston, Travis</creatorcontrib><creatorcontrib>Young, Steven R.</creatorcontrib><creatorcontrib>Yang, Erika</creatorcontrib><creatorcontrib>Heller, William T.</creatorcontrib><creatorcontrib>Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)</creatorcontrib><title>Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques</title><title>Journal of applied crystallography</title><description>A consistent challenge for both new and expert practitioners of small‐angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. 
Further, by integrating a stochastic gradient descent method during post‐processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS. It is demonstrated how k nearest neighbor machine learning methods can be used to classify small‐angle scattering data for the most appropriate model to use for data analysis. The results show the promise of machine learning for helping small‐angle scattering practitioners translate measured data into scientific results.</description><subject>Classification</subject><subject>Data analysis</subject><subject>Gaussian process</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>modeling</subject><subject>SasView</subject><subject>Scattering</subject><subject>small-angle scattering data</subject><issn>1600-5767</issn><issn>0021-8898</issn><issn>1600-5767</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNqFkc1KAzEQxxdR8PMBvC16rmY2m3T3KMVPBMGva5hmZ9voNqtJaqknH8Fn9ElMqAfBg4Ewk__8_sOQybJ9YEcAbHh8B5IxMZTDgsUjRLGWbSVpkLT1X_lmtu39E2MQ0WIrext16L1pl8ZOcrRNvNgt39PLz7Drvj4-0U46yr3GEMilQoMB87lP6YLMZBqoyZ9zS-jIhxijNO6dz2eop8ZS3sWKTXQgPbXmdU5-N9tosfO09xN3soez0_vRxeD65vxydHI90LyuqgHJgqAe1kxWFWlOJSdWti2KSpfUApaIhGMcN7IUAqREhgW0bVNQw8fAK76THaz69j4Y5bVJI-jeWtJBgSxrwRJ0uIJeXJ-GC-qpn7v4D14VvJLApaghUrCitOu9d9SqF2dm6JYKmEo7UH92ED31yrMwHS3_N6ir0W3xeC4AKv4NAmSMoA</recordid><startdate>202004</startdate><enddate>202004</enddate><creator>Archibald, Richard K.</creator><creator>Doucet, Mathieu</creator><creator>Johnston, Travis</creator><creator>Young, Steven R.</creator><creator>Yang, 
Erika</creator><creator>Heller, William T.</creator><general>International Union of Crystallography</general><general>Blackwell Publishing Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>L7M</scope><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-5560-6478</orcidid><orcidid>https://orcid.org/0000-0002-4538-9780</orcidid><orcidid>https://orcid.org/0000-0001-9935-6864</orcidid><orcidid>https://orcid.org/0000-0001-6456-2975</orcidid><orcidid>https://orcid.org/0000000245389780</orcidid><orcidid>https://orcid.org/0000000164562975</orcidid><orcidid>https://orcid.org/0000000171081934</orcidid><orcidid>https://orcid.org/0000000255606478</orcidid><orcidid>https://orcid.org/0000000305914330</orcidid></search><sort><creationdate>202004</creationdate><title>Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques</title><author>Archibald, Richard K. ; Doucet, Mathieu ; Johnston, Travis ; Young, Steven R. 
; Yang, Erika ; Heller, William T.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3988-e62e19790688ec3e43e04ffa58c4ef1a4aaeababd6455166a0a21ffd2ed3b1383</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Classification</topic><topic>Data analysis</topic><topic>Gaussian process</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>modeling</topic><topic>SasView</topic><topic>Scattering</topic><topic>small-angle scattering data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Archibald, Richard K.</creatorcontrib><creatorcontrib>Doucet, Mathieu</creatorcontrib><creatorcontrib>Johnston, Travis</creatorcontrib><creatorcontrib>Young, Steven R.</creatorcontrib><creatorcontrib>Yang, Erika</creatorcontrib><creatorcontrib>Heller, William T.</creatorcontrib><creatorcontrib>Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)</creatorcontrib><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Journal of applied crystallography</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Archibald, Richard K.</au><au>Doucet, Mathieu</au><au>Johnston, Travis</au><au>Young, Steven R.</au><au>Yang, Erika</au><au>Heller, William T.</au><aucorp>Oak Ridge National Lab. 
(ORNL), Oak Ridge, TN (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques</atitle><jtitle>Journal of applied crystallography</jtitle><date>2020-04</date><risdate>2020</risdate><volume>53</volume><issue>2</issue><spage>326</spage><epage>334</epage><pages>326-334</pages><issn>1600-5767</issn><issn>0021-8898</issn><eissn>1600-5767</eissn><abstract>A consistent challenge for both new and expert practitioners of small‐angle scattering (SAS) lies in determining how to analyze the data, given the limited information content of said data and the large number of models that can be employed. Machine learning (ML) methods are powerful tools for classifying data that have found diverse applications in many fields of science. Here, ML methods are applied to the problem of classifying SAS data for the most appropriate model to use for data analysis. The approach employed is built around the method of weighted k nearest neighbors (wKNN), and utilizes a subset of the models implemented in the SasView package (https://www.sasview.org/) for generating a well defined set of training and testing data. The prediction rate of the wKNN method implemented here using a subset of SasView models is reasonably good for many of the models, but has difficulty with others, notably those based on spherical structures. A novel expansion of the wKNN method was also developed, which uses Gaussian processes to produce local surrogate models for the classification, and this significantly improves the classification accuracy. Further, by integrating a stochastic gradient descent method during post‐processing, it is possible to leverage the local surrogate model both to classify the SAS data with high accuracy and to predict the structural parameters that best describe the data. 
The linking of data classification and model fitting has the potential to facilitate the translation of measured data into results for both novice and expert practitioners of SAS. It is demonstrated how k nearest neighbor machine learning methods can be used to classify small‐angle scattering data for the most appropriate model to use for data analysis. The results show the promise of machine learning for helping small‐angle scattering practitioners translate measured data into scientific results.</abstract><cop>5 Abbey Square, Chester, Cheshire CH1 2HU, England</cop><pub>International Union of Crystallography</pub><doi>10.1107/S1600576720000552</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-5560-6478</orcidid><orcidid>https://orcid.org/0000-0002-4538-9780</orcidid><orcidid>https://orcid.org/0000-0001-9935-6864</orcidid><orcidid>https://orcid.org/0000-0001-6456-2975</orcidid><orcidid>https://orcid.org/0000000245389780</orcidid><orcidid>https://orcid.org/0000000164562975</orcidid><orcidid>https://orcid.org/0000000171081934</orcidid><orcidid>https://orcid.org/0000000255606478</orcidid><orcidid>https://orcid.org/0000000305914330</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1600-5767
ispartof Journal of applied crystallography, 2020-04, Vol.53 (2), p.326-334
issn 1600-5767
0021-8898
1600-5767
language eng
recordid cdi_osti_scitechconnect_1649508
source Wiley Online Library Journals Frontfile Complete; Alma/SFX Local Collection
subjects Classification
Data analysis
Gaussian process
Learning algorithms
Machine learning
MATHEMATICS AND COMPUTING
modeling
SasView
Scattering
small-angle scattering data
title Classifying and analyzing small‐angle scattering data using weighted k nearest neighbors machine learning techniques
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T22%3A37%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Classifying%20and%20analyzing%20small%E2%80%90angle%20scattering%20data%20using%20weighted%20k%20nearest%20neighbors%20machine%20learning%20techniques&rft.jtitle=Journal%20of%20applied%20crystallography&rft.au=Archibald,%20Richard%20K.&rft.aucorp=Oak%20Ridge%20National%20Lab.%20(ORNL),%20Oak%20Ridge,%20TN%20(United%20States)&rft.date=2020-04&rft.volume=53&rft.issue=2&rft.spage=326&rft.epage=334&rft.pages=326-334&rft.issn=1600-5767&rft.eissn=1600-5767&rft_id=info:doi/10.1107/S1600576720000552&rft_dat=%3Cproquest_osti_%3E2386136591%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2386136591&rft_id=info:pmid/&rfr_iscdi=true
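
The abstract in this record describes classifying scattering curves with weighted k nearest neighbors (wKNN). As a rough illustration of that general idea only, the following is a minimal sketch, not the authors' implementation: the function name `wknn_predict`, the inverse-distance weighting, the Euclidean metric, and the toy feature vectors are all assumptions made for this example, and nothing here uses SasView or the Gaussian process surrogate mentioned in the abstract.

```python
import numpy as np


def wknn_predict(train_x, train_y, query, k=5, eps=1e-12):
    """Hypothetical helper: label `query` by an inverse-distance-weighted
    vote among its k nearest training samples (assumed weighting scheme)."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)

    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(train_x - query, axis=1)

    # Indices of the k closest samples (fewer if the training set is small).
    nearest = np.argsort(dists)[:k]

    # Closer neighbors get larger weights; eps avoids division by zero.
    votes = {}
    for i in nearest:
        label = train_y[i]
        votes[label] = votes.get(label, 0.0) + 1.0 / (dists[i] + eps)

    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy two-feature data standing in for scattering curves (illustrative only).
    x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = ["sphere", "sphere", "sphere", "cylinder", "cylinder", "cylinder"]
    print(wknn_predict(x, y, [0.2, 0.1], k=3))
```

The weighting is what distinguishes this from a plain majority vote: a single very close neighbor can outweigh several distant ones, which matters when classes overlap in feature space.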