Theory-Guided Machine Learning in Materials Science

Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusion...

Detailed description

Bibliographic details
Published in: Frontiers in materials 2016-06, Vol.3
Main authors: Wagner, Nicholas; Rondinelli, James M.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_title Frontiers in materials
container_volume 3
creator Wagner, Nicholas
Rondinelli, James M.
description Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.
doi_str_mv 10.3389/fmats.2016.00028
format Article
fullrecord <record><control><sourceid>crossref_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1466646</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_3389_fmats_2016_00028</sourcerecordid><originalsourceid>FETCH-LOGICAL-c378t-7a56d7be83f51aea7b26b1e2228ca9f685b047ef95c43c7983a766748f9e1ff93</originalsourceid><addsrcrecordid>eNpNkDtPwzAUhS0EElXpzhixJ_gVP0ZUQakUxECZLce5JkbUQbYZ-u9JWwame-7RpzN8CN0S3DCm9L3f25IbioloMMZUXaAFpVrUam4u_-VrtMr5c0YIoy0ndIHYboQpHerNTxhgqF6sG0OEqgObYogfVYhzVyAF-5WrNxcgOrhBV35-YfV3l-j96XG3fq671812_dDVjklVamlbMcgeFPMtsWBlT0VPgFKqnNVeqLbHXILXrePMSa2YlUJIrrwG4r1mS3R33p1yCSa7UMCNbooRXDGECyG4mCF8hlyack7gzXcKe5sOhmBzlGNOcsxRjjnJYb9bO1c1</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Theory-Guided Machine Learning in Materials Science</title><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Wagner, Nicholas ; Rondinelli, James M.</creator><creatorcontrib>Wagner, Nicholas ; Rondinelli, James M. ; Northwestern Univ., Evanston, IL (United States)</creatorcontrib><description>Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. 
We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.</description><identifier>ISSN: 2296-8016</identifier><identifier>EISSN: 2296-8016</identifier><identifier>DOI: 10.3389/fmats.2016.00028</identifier><language>eng</language><publisher>United States: Frontiers Research Foundation</publisher><subject>descriptor selection ; machine learning ; materials informatics ; MATERIALS SCIENCE ; overfitting ; theory</subject><ispartof>Frontiers in materials, 2016-06, Vol.3</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c378t-7a56d7be83f51aea7b26b1e2228ca9f685b047ef95c43c7983a766748f9e1ff93</citedby><cites>FETCH-LOGICAL-c378t-7a56d7be83f51aea7b26b1e2228ca9f685b047ef95c43c7983a766748f9e1ff93</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,864,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1466646$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Wagner, Nicholas</creatorcontrib><creatorcontrib>Rondinelli, James M.</creatorcontrib><creatorcontrib>Northwestern Univ., Evanston, IL (United States)</creatorcontrib><title>Theory-Guided Machine Learning in Materials Science</title><title>Frontiers in materials</title><description>Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. 
Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.</description><subject>descriptor selection</subject><subject>machine learning</subject><subject>materials informatics</subject><subject>MATERIALS SCIENCE</subject><subject>overfitting</subject><subject>theory</subject><issn>2296-8016</issn><issn>2296-8016</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNpNkDtPwzAUhS0EElXpzhixJ_gVP0ZUQakUxECZLce5JkbUQbYZ-u9JWwame-7RpzN8CN0S3DCm9L3f25IbioloMMZUXaAFpVrUam4u_-VrtMr5c0YIoy0ndIHYboQpHerNTxhgqF6sG0OEqgObYogfVYhzVyAF-5WrNxcgOrhBV35-YfV3l-j96XG3fq671812_dDVjklVamlbMcgeFPMtsWBlT0VPgFKqnNVeqLbHXILXrePMSa2YlUJIrrwG4r1mS3R33p1yCSa7UMCNbooRXDGECyG4mCF8hlyack7gzXcKe5sOhmBzlGNOcsxRjjnJYb9bO1c1</recordid><startdate>20160627</startdate><enddate>20160627</enddate><creator>Wagner, Nicholas</creator><creator>Rondinelli, James M.</creator><general>Frontiers Research Foundation</general><scope>AAYXX</scope><scope>CITATION</scope><scope>OIOZB</scope><scope>OTOTI</scope></search><sort><creationdate>20160627</creationdate><title>Theory-Guided Machine Learning in Materials Science</title><author>Wagner, Nicholas ; 
Rondinelli, James M.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c378t-7a56d7be83f51aea7b26b1e2228ca9f685b047ef95c43c7983a766748f9e1ff93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>descriptor selection</topic><topic>machine learning</topic><topic>materials informatics</topic><topic>MATERIALS SCIENCE</topic><topic>overfitting</topic><topic>theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wagner, Nicholas</creatorcontrib><creatorcontrib>Rondinelli, James M.</creatorcontrib><creatorcontrib>Northwestern Univ., Evanston, IL (United States)</creatorcontrib><collection>CrossRef</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Frontiers in materials</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wagner, Nicholas</au><au>Rondinelli, James M.</au><aucorp>Northwestern Univ., Evanston, IL (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Theory-Guided Machine Learning in Materials Science</atitle><jtitle>Frontiers in materials</jtitle><date>2016-06-27</date><risdate>2016</risdate><volume>3</volume><issn>2296-8016</issn><eissn>2296-8016</eissn><abstract>Materials scientists are increasingly adopting the use of machine learning tools to discover hidden trends in data and make predictions. Applying concepts from data science without foreknowledge of their limitations and the unique qualities of materials data, however, could lead to errant conclusions. The differences that exist between various kinds of experimental and calculated data require careful choices of data processing and machine learning methods. Here, we outline potential pitfalls involved in using machine learning without robust protocols. 
We address some problems of overfitting to training data using decision trees as an example, rational descriptor selection in the field of perovskites, and preserving physical interpretability in the application of dimensionality reducing techniques such as principal component analysis. We show how proceeding without the guidance of domain knowledge can lead to both quantitatively and qualitatively incorrect predictive models.</abstract><cop>United States</cop><pub>Frontiers Research Foundation</pub><doi>10.3389/fmats.2016.00028</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2296-8016
ispartof Frontiers in materials, 2016-06, Vol.3
issn 2296-8016
2296-8016
language eng
recordid cdi_osti_scitechconnect_1466646
source DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects descriptor selection
machine learning
materials informatics
MATERIALS SCIENCE
overfitting
theory
title Theory-Guided Machine Learning in Materials Science
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T11%3A59%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Theory-Guided%20Machine%20Learning%20in%20Materials%20Science&rft.jtitle=Frontiers%20in%20materials&rft.au=Wagner,%20Nicholas&rft.aucorp=Northwestern%20Univ.,%20Evanston,%20IL%20(United%20States)&rft.date=2016-06-27&rft.volume=3&rft.issn=2296-8016&rft.eissn=2296-8016&rft_id=info:doi/10.3389/fmats.2016.00028&rft_dat=%3Ccrossref_osti_%3E10_3389_fmats_2016_00028%3C/crossref_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true