Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures
Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate...
Published in: | Modelling and simulation in materials science and engineering 2019-01, Vol.27 (2), p.24002 |
---|---|
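Main authors: | Jha, Anurag ; Chandrasekaran, Anand ; Kim, Chiho ; Ramprasad, Rampi |
Format: | Article |
Language: | eng |
Subjects: | glass transition temperature ; machine learning ; polymers |
Online access: | Full text |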
container_end_page | |
---|---|
container_issue | 2 |
container_start_page | 24002 |
container_title | Modelling and simulation in materials science and engineering |
container_volume | 27 |
creator | Jha, Anurag ; Chandrasekaran, Anand ; Kim, Chiho ; Ramprasad, Rampi |
description | Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate models offering the promise of rapid and accurate predictions of materials' properties based solely on their structure or composition. Such machine learning (ML) models are trained on available past data and are thus susceptible to the intrinsic uncertainties/errors associated with these past measurements. The glass transition temperature (Tg) of polymers, a property of paramount interest in polymer science, is one strong example of a material property that can show widespread variation in the final reported value as a result of a variety of intrinsic and extrinsic factors that occur during the experimental measurement process. In the current work, we curate a large database of Tg measurements from a variety of data sources and proceed to investigate the statistical nature of the inherent uncertainties in the database. Through the partitioning of the dataset using statistically relevant measures, we investigate the effect of variations in the dataset on the performance of the final ML model. We demonstrate that the median, as a measure of central tendency, is a valid approximation when dealing with multiple reported values of Tg for the same polymeric material. Moreover, the Bayesian model noise/uncertainty that emerges from our machine-learning pipeline is able to represent quantitatively the underlying noise/uncertainties in the experimental measurement of Tg. |
doi_str_mv | 10.1088/1361-651X/aaf8ca |
format | Article |
fullrecord | <record><control><sourceid>iop_cross</sourceid><recordid>TN_cdi_iop_journals_10_1088_1361_651X_aaf8ca</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>msmsaaf8ca</sourcerecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures</title><source>IOP Publishing Journals</source><source>Institute of Physics (IOP) Journals - HEAL-Link</source><creator>Jha, Anurag ; Chandrasekaran, Anand ; Kim, Chiho ; Ramprasad, Rampi</creator><description>Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate models offering the promise of rapid and accurate predictions of materials' properties based solely on their structure or composition. Such machine learning (ML) models are trained on available past data and are thus susceptible to the intrinsic uncertainties/errors associated with these past measurements. The glass transition temperature (Tg) of polymers, a property of paramount interest in polymer science, is one strong example of a material property that can show widespread variation in the final reported value as a result of a variety of intrinsic and extrinsic factors that occur during the experimental measurement process. In the current work, we curate a large database of Tg measurements from a variety of data sources and proceed to investigate the statistical nature of the inherent uncertainties in the database. Through the partitioning of the dataset using statistically relevant measures, we investigate the effect of variations in the dataset on the performance of the final ML model. We demonstrate that the median, as a measure of central tendency, is a valid approximation when dealing with multiple reported values of Tg for the same polymeric material. Moreover, the Bayesian model noise/uncertainty that emerges from our machine-learning pipeline is able to represent quantitatively the underlying noise/uncertainties in the experimental measurement of Tg.</description><identifier>ISSN: 0965-0393</identifier><identifier>EISSN: 1361-651X</identifier><identifier>DOI: 10.1088/1361-651X/aaf8ca</identifier><identifier>CODEN: MSMEEU</identifier><language>eng</language><publisher>IOP Publishing</publisher><subject>glass transition temperature ; machine learning ; polymers</subject><ispartof>Modelling and simulation in materials science and engineering, 2019-01, Vol.27 (2), p.24002</ispartof><rights>2019 IOP Publishing Ltd</rights><lds50>peer_reviewed</lds50><orcidid>0000-0002-2794-3717</orcidid></display><links><linktopdf>$$Uhttps://iopscience.iop.org/article/10.1088/1361-651X/aaf8ca/pdf$$EPDF$$P50$$Giop$$H</linktopdf></links><addata><au>Jha, Anurag</au><au>Chandrasekaran, Anand</au><au>Kim, Chiho</au><au>Ramprasad, Rampi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures</atitle><jtitle>Modelling and simulation in materials science and engineering</jtitle><stitle>MSMS</stitle><addtitle>Modelling Simul. Mater. Sci. Eng</addtitle><date>2019-01-17</date><risdate>2019</risdate><volume>27</volume><issue>2</issue><spage>24002</spage><pages>24002-</pages><issn>0965-0393</issn><eissn>1361-651X</eissn><coden>MSMEEU</coden><pub>IOP Publishing</pub><doi>10.1088/1361-651X/aaf8ca</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-2794-3717</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0965-0393 |
ispartof | Modelling and simulation in materials science and engineering, 2019-01, Vol.27 (2), p.24002 |
issn | 0965-0393 ; 1361-651X |
language | eng |
recordid | cdi_iop_journals_10_1088_1361_651X_aaf8ca |
source | IOP Publishing Journals; Institute of Physics (IOP) Journals - HEAL-Link |
subjects | glass transition temperature ; machine learning ; polymers |
title | Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A58%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-iop_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Impact%20of%20dataset%20uncertainties%20on%20machine%20learning%20model%20predictions:%20the%20example%20of%20polymer%20glass%20transition%20temperatures&rft.jtitle=Modelling%20and%20simulation%20in%20materials%20science%20and%20engineering&rft.au=Jha,%20Anurag&rft.date=2019-01-17&rft.volume=27&rft.issue=2&rft.spage=24002&rft.pages=24002-&rft.issn=0965-0393&rft.eissn=1361-651X&rft.coden=MSMEEU&rft_id=info:doi/10.1088/1361-651X/aaf8ca&rft_dat=%3Ciop_cross%3Emsmsaaf8ca%3C/iop_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
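
The abstract's statement that the median is a valid stand-in for multiple reported Tg values of the same polymer can be illustrated with a minimal sketch. It assumes the records arrive as (polymer identifier, Tg in kelvin) pairs. The function name `aggregate_tg_by_median`, the identifiers `polymer_A` and `polymer_B`, and all numeric values are hypothetical and are not taken from the paper or its database. The sketch covers only the aggregation step, not the authors' ML pipeline.

```python
from collections import defaultdict
from statistics import median


def aggregate_tg_by_median(records):
    """Collapse repeated Tg reports for the same polymer into one median value.

    `records` is an iterable of (polymer_id, tg_kelvin) pairs (assumed layout).
    Returns a dict mapping each polymer_id to the median of its reported values.
    """
    grouped = defaultdict(list)
    for polymer_id, tg_kelvin in records:
        grouped[polymer_id].append(tg_kelvin)
    return {polymer_id: median(values) for polymer_id, values in grouped.items()}


# Made-up illustrative values, NOT taken from the paper's database.
reports = [
    ("polymer_A", 370.0),
    ("polymer_A", 378.0),
    ("polymer_A", 400.0),
    ("polymer_B", 250.0),
    ("polymer_B", 260.0),
]

tg_by_polymer = aggregate_tg_by_median(reports)
print(tg_by_polymer)
```

In this toy input, the single high report of 400.0 for `polymer_A` pulls the mean above 382 while the median stays at 378.0, which is one reason a median can be a robust choice when reported values scatter.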