Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures

Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate...

Bibliographic details
Published in: Modelling and simulation in materials science and engineering 2019-01, Vol.27 (2), p.24002
Main authors: Jha, Anurag, Chandrasekaran, Anand, Kim, Chiho, Ramprasad, Rampi
Format: Article
Language: eng
Subjects: glass transition temperature; machine learning; polymers
Online access: Full text
container_end_page
container_issue 2
container_start_page 24002
container_title Modelling and simulation in materials science and engineering
container_volume 27
creator Jha, Anurag
Chandrasekaran, Anand
Kim, Chiho
Ramprasad, Rampi
description Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate models offering the promise of rapid and accurate predictions of materials' properties based solely on their structure or composition. Such machine learning (ML) models are trained on available past data and are thus susceptible to the intrinsic uncertainties/errors associated with these past measurements. The glass transition temperature (Tg) of polymers, a property of paramount interest in polymer science, is one strong example of a material property that can show widespread variation in the final reported value as a result of a variety of intrinsic and extrinsic factors that occur during the experimental measurement process. In the current work, we curate a large database of Tg measurements from a variety of data sources and proceed to investigate the statistical nature of the inherent uncertainties in the database. Through the partitioning of the dataset using statistically relevant measures, we investigate the effect of variations in the dataset on the performance of the final ML model. We demonstrate that the median, as a measure of central tendency, is a valid approximation when dealing with multiple reported values of Tg for the same polymeric material. Moreover, the Bayesian model noise/uncertainty that emerges from our machine-learning pipeline is able to represent quantitatively the underlying noise/uncertainties in the experimental measurement of Tg.
doi_str_mv 10.1088/1361-651X/aaf8ca
format Article
fullrecord XML source record (sourceid iop_cross, recordid TN_cdi_iop_journals_10_1088_1361_651X_aaf8ca, sourcerecordid msmsaaf8ca); fields not listed elsewhere in this entry: type article; publisher IOP Publishing; rights 2019 IOP Publishing Ltd; peer reviewed; EISSN 1361-651X; DOI 10.1088/1361-651X/aaf8ca; CODEN MSMEEU; abbreviated title Modelling Simul. Mater. Sci. Eng (MSMS); publication date 2019-01-17; total pages 9; ORCID https://orcid.org/0000-0002-2794-3717; PDF https://iopscience.iop.org/article/10.1088/1361-651X/aaf8ca/pdf; collection CrossRef; source type Aggregation Database
fulltext fulltext
identifier ISSN: 0965-0393
ispartof Modelling and simulation in materials science and engineering, 2019-01, Vol.27 (2), p.24002
issn 0965-0393
1361-651X
language eng
recordid cdi_iop_journals_10_1088_1361_651X_aaf8ca
source IOP Publishing Journals; Institute of Physics (IOP) Journals - HEAL-Link
subjects glass transition temperature
machine learning
polymers
title Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A58%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-iop_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Impact%20of%20dataset%20uncertainties%20on%20machine%20learning%20model%20predictions:%20the%20example%20of%20polymer%20glass%20transition%20temperatures&rft.jtitle=Modelling%20and%20simulation%20in%20materials%20science%20and%20engineering&rft.au=Jha,%20Anurag&rft.date=2019-01-17&rft.volume=27&rft.issue=2&rft.spage=24002&rft.pages=24002-&rft.issn=0965-0393&rft.eissn=1361-651X&rft.coden=MSMEEU&rft_id=info:doi/10.1088/1361-651X/aaf8ca&rft_dat=%3Ciop_cross%3Emsmsaaf8ca%3C/iop_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true