Representation of compounds for machine-learning prediction of physical properties

The representations of a compound, called “descriptors” or “features”, play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a set of descriptors from simple elemental and structural representations. First, it i...
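The abstract describes generating descriptors from simple elemental and structural representations. The sketch below illustrates only the elemental part. It turns a chemical composition into a fixed-length vector by taking atomic-fraction-weighted means and standard deviations of per-element properties. Several things here are assumptions for illustration and are not the paper's descriptor set: the small property table (atomic number and Pauling electronegativity for four elements), the name `compound_descriptor`, and the restriction to two statistics. Structural representations are not covered.

```python
import numpy as np

# Hypothetical, deliberately tiny elemental table (assumption): each element maps
# to (atomic number, Pauling electronegativity). A real descriptor set would use
# many more elemental properties and many more elements.
ELEMENTAL = {
    "Na": (11.0, 0.93),
    "Cl": (17.0, 3.16),
    "Mg": (12.0, 1.31),
    "O": (8.0, 3.44),
}


def compound_descriptor(composition):
    """Return [means..., standard deviations...] of the elemental properties.

    composition: mapping of element symbol -> number of atoms in the formula
    unit, e.g. {"Mg": 1, "O": 1}. Each element is weighted by its atomic fraction.
    """
    symbols = list(composition)
    props = np.array([ELEMENTAL[s] for s in symbols])            # (n_elements, n_props)
    weights = np.array([composition[s] for s in symbols], dtype=float)
    weights /= weights.sum()                                      # atomic fractions
    mean = weights @ props                                        # weighted mean per property
    var = weights @ (props - mean) ** 2                           # weighted variance per property
    return np.concatenate([mean, np.sqrt(var)])
```

Because the output length depends only on the number of elemental properties, compounds with different numbers of species (for example NaCl and Na2O) map to vectors of the same length, which is what a kernel regression model needs as input.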


Bibliographic details
Published in: Physical review. B 2017-04, Vol.95 (14), p.144110, Article 144110
Main authors: Seko, Atsuto; Hayashi, Hiroyuki; Nakayama, Keita; Takahashi, Akira; Tanaka, Isao
Format: Article
Language: eng
container_end_page
container_issue 14
container_start_page 144110
container_title Physical review. B
container_volume 95
creator Seko, Atsuto
Hayashi, Hiroyuki
Nakayama, Keita
Takahashi, Akira
Tanaka, Isao
description The representations of a compound, called “descriptors” or “features”, play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a set of descriptors from simple elemental and structural representations. First, the procedure is applied to a large data set of the cohesive energies of about 18 000 compounds computed by density functional theory (DFT) calculations. As a result, we obtain a kernel ridge prediction model with a prediction error of 0.041 eV/atom, which is close to the “chemical accuracy” of 1 kcal/mol (0.043 eV/atom). For the normalized prototype structures, we obtain a model that predicts the cohesive energy with an error of 0.071 eV/atom; this model can be used for the practical purpose of searching for as-yet-unknown structures. The procedure is also applied to two smaller data sets: the lattice thermal conductivity of 110 compounds computed by DFT calculations and the experimental melting temperature of 248 compounds. In addition to the accuracy of the kernel ridge regression models, we examine the effect of the descriptor sets on the efficiency of Bayesian optimization. The resulting models exhibit good predictive performance.
doi_str_mv 10.1103/PhysRevB.95.144110
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2125765284</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2125765284</sourcerecordid><originalsourceid>FETCH-LOGICAL-c385t-78f9f00cfd319a451f0c99e1c9c9b6dcd5b8ccc3865c8bde8fd5f10d99ff693c3</originalsourceid><addsrcrecordid>eNo9kE1LAzEURYMoWGr_gKsB11OTmUlm3lKLX1BQiq5D5iWxKW0yJlOh_95Irav3uBzuhUPINaNzxmh9-7Y-pJX5vp8Dn7OmydkZmVSNgBJAwPn_z-klmaW0oZQyQaGlMCGrlRmiScaPanTBF8EWGHZD2HudChtisVO4dt6UW6Oid_6zyLh2eIKHvO1QbXMcBhNHZ9IVubBqm8zs707Jx-PD--K5XL4-vSzuliXWHR_LtrNgKUWrawaq4cxSBDAMAaEXGjXvO8TMCo5dr01nNbeMagBrBdRYT8nNsTdPf-1NGuUm7KPPk7JiFW8Fr7omU9WRwhhSisbKIbqdigfJqPzVJ0_6JHB51Ff_APMuZy8</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2125765284</pqid></control><display><type>article</type><title>Representation of compounds for machine-learning prediction of physical properties</title><source>American Physical Society Journals</source><creator>Seko, Atsuto ; Hayashi, Hiroyuki ; Nakayama, Keita ; Takahashi, Akira ; Tanaka, Isao</creator><creatorcontrib>Seko, Atsuto ; Hayashi, Hiroyuki ; Nakayama, Keita ; Takahashi, Akira ; Tanaka, Isao</creatorcontrib><description>The representations of a compound, called “descriptors” or “features”, play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a set of descriptors from simple elemental and structural representations. First, it is applied to a large data set composed of the cohesive energy for about 18 000 compounds computed by density functional theory calculation. As a result, we obtain a kernel ridge prediction model with a prediction error of 0.041 eV/atom, which is close to the “chemical accuracy” of 1 kcal/mol (0.043 eV/atom). 
A prediction model with an error of 0.071 eV/atom of the cohesive energy is obtained for the normalized prototype structures, which can be used for the practical purpose of searching for as-yet-unknown structures. The procedure is also applied to two smaller data sets, i.e., a data set of the lattice thermal conductivity for 110 compounds computed by density functional theory calculation and a data set of the experimental melting temperature for 248 compounds. We examine the effect of the descriptor sets on the efficiency of Bayesian optimization in addition to the accuracy of the kernel ridge regression models. They exhibit good predictive performances.</description><identifier>ISSN: 2469-9950</identifier><identifier>EISSN: 2469-9969</identifier><identifier>DOI: 10.1103/PhysRevB.95.144110</identifier><language>eng</language><publisher>College Park: American Physical Society</publisher><subject>Bayesian analysis ; Computation ; Datasets ; Density functional theory ; Machine learning ; Melt temperature ; Model accuracy ; Organic chemistry ; Performance prediction ; Physical properties ; Regression models ; Representations ; Thermal conductivity</subject><ispartof>Physical review. 
B, 2017-04, Vol.95 (14), p.144110, Article 144110</ispartof><rights>Copyright American Physical Society Apr 1, 2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c385t-78f9f00cfd319a451f0c99e1c9c9b6dcd5b8ccc3865c8bde8fd5f10d99ff693c3</citedby><cites>FETCH-LOGICAL-c385t-78f9f00cfd319a451f0c99e1c9c9b6dcd5b8ccc3865c8bde8fd5f10d99ff693c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,2863,2864,27901,27902</link.rule.ids></links><search><creatorcontrib>Seko, Atsuto</creatorcontrib><creatorcontrib>Hayashi, Hiroyuki</creatorcontrib><creatorcontrib>Nakayama, Keita</creatorcontrib><creatorcontrib>Takahashi, Akira</creatorcontrib><creatorcontrib>Tanaka, Isao</creatorcontrib><title>Representation of compounds for machine-learning prediction of physical properties</title><title>Physical review. B</title><description>The representations of a compound, called “descriptors” or “features”, play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a set of descriptors from simple elemental and structural representations. First, it is applied to a large data set composed of the cohesive energy for about 18 000 compounds computed by density functional theory calculation. As a result, we obtain a kernel ridge prediction model with a prediction error of 0.041 eV/atom, which is close to the “chemical accuracy” of 1 kcal/mol (0.043 eV/atom). A prediction model with an error of 0.071 eV/atom of the cohesive energy is obtained for the normalized prototype structures, which can be used for the practical purpose of searching for as-yet-unknown structures. 
The procedure is also applied to two smaller data sets, i.e., a data set of the lattice thermal conductivity for 110 compounds computed by density functional theory calculation and a data set of the experimental melting temperature for 248 compounds. We examine the effect of the descriptor sets on the efficiency of Bayesian optimization in addition to the accuracy of the kernel ridge regression models. They exhibit good predictive performances.</description><subject>Bayesian analysis</subject><subject>Computation</subject><subject>Datasets</subject><subject>Density functional theory</subject><subject>Machine learning</subject><subject>Melt temperature</subject><subject>Model accuracy</subject><subject>Organic chemistry</subject><subject>Performance prediction</subject><subject>Physical properties</subject><subject>Regression models</subject><subject>Representations</subject><subject>Thermal conductivity</subject><issn>2469-9950</issn><issn>2469-9969</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNo9kE1LAzEURYMoWGr_gKsB11OTmUlm3lKLX1BQiq5D5iWxKW0yJlOh_95Irav3uBzuhUPINaNzxmh9-7Y-pJX5vp8Dn7OmydkZmVSNgBJAwPn_z-klmaW0oZQyQaGlMCGrlRmiScaPanTBF8EWGHZD2HudChtisVO4dt6UW6Oid_6zyLh2eIKHvO1QbXMcBhNHZ9IVubBqm8zs707Jx-PD--K5XL4-vSzuliXWHR_LtrNgKUWrawaq4cxSBDAMAaEXGjXvO8TMCo5dr01nNbeMagBrBdRYT8nNsTdPf-1NGuUm7KPPk7JiFW8Fr7omU9WRwhhSisbKIbqdigfJqPzVJ0_6JHB51Ff_APMuZy8</recordid><startdate>20170419</startdate><enddate>20170419</enddate><creator>Seko, Atsuto</creator><creator>Hayashi, Hiroyuki</creator><creator>Nakayama, Keita</creator><creator>Takahashi, Akira</creator><creator>Tanaka, Isao</creator><general>American Physical Society</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>H8D</scope><scope>JG9</scope><scope>L7M</scope></search><sort><creationdate>20170419</creationdate><title>Representation of compounds 
for machine-learning prediction of physical properties</title><author>Seko, Atsuto ; Hayashi, Hiroyuki ; Nakayama, Keita ; Takahashi, Akira ; Tanaka, Isao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c385t-78f9f00cfd319a451f0c99e1c9c9b6dcd5b8ccc3865c8bde8fd5f10d99ff693c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bayesian analysis</topic><topic>Computation</topic><topic>Datasets</topic><topic>Density functional theory</topic><topic>Machine learning</topic><topic>Melt temperature</topic><topic>Model accuracy</topic><topic>Organic chemistry</topic><topic>Performance prediction</topic><topic>Physical properties</topic><topic>Regression models</topic><topic>Representations</topic><topic>Thermal conductivity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Seko, Atsuto</creatorcontrib><creatorcontrib>Hayashi, Hiroyuki</creatorcontrib><creatorcontrib>Nakayama, Keita</creatorcontrib><creatorcontrib>Takahashi, Akira</creatorcontrib><creatorcontrib>Tanaka, Isao</creatorcontrib><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>Physical review. 
B</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Seko, Atsuto</au><au>Hayashi, Hiroyuki</au><au>Nakayama, Keita</au><au>Takahashi, Akira</au><au>Tanaka, Isao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Representation of compounds for machine-learning prediction of physical properties</atitle><jtitle>Physical review. B</jtitle><date>2017-04-19</date><risdate>2017</risdate><volume>95</volume><issue>14</issue><spage>144110</spage><pages>144110-</pages><artnum>144110</artnum><issn>2469-9950</issn><eissn>2469-9969</eissn><abstract>The representations of a compound, called “descriptors” or “features”, play an essential role in constructing a machine-learning model of its physical properties. In this study, we adopt a procedure for generating a set of descriptors from simple elemental and structural representations. First, it is applied to a large data set composed of the cohesive energy for about 18 000 compounds computed by density functional theory calculation. As a result, we obtain a kernel ridge prediction model with a prediction error of 0.041 eV/atom, which is close to the “chemical accuracy” of 1 kcal/mol (0.043 eV/atom). A prediction model with an error of 0.071 eV/atom of the cohesive energy is obtained for the normalized prototype structures, which can be used for the practical purpose of searching for as-yet-unknown structures. The procedure is also applied to two smaller data sets, i.e., a data set of the lattice thermal conductivity for 110 compounds computed by density functional theory calculation and a data set of the experimental melting temperature for 248 compounds. We examine the effect of the descriptor sets on the efficiency of Bayesian optimization in addition to the accuracy of the kernel ridge regression models. 
They exhibit good predictive performances.</abstract><cop>College Park</cop><pub>American Physical Society</pub><doi>10.1103/PhysRevB.95.144110</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2469-9950
ispartof Physical review. B, 2017-04, Vol.95 (14), p.144110, Article 144110
issn 2469-9950
2469-9969
language eng
recordid cdi_proquest_journals_2125765284
source American Physical Society Journals
subjects Bayesian analysis
Computation
Datasets
Density functional theory
Machine learning
Melt temperature
Model accuracy
Organic chemistry
Performance prediction
Physical properties
Regression models
Representations
Thermal conductivity
title Representation of compounds for machine-learning prediction of physical properties
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T14%3A28%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Representation%20of%20compounds%20for%20machine-learning%20prediction%20of%20physical%20properties&rft.jtitle=Physical%20review.%20B&rft.au=Seko,%20Atsuto&rft.date=2017-04-19&rft.volume=95&rft.issue=14&rft.spage=144110&rft.pages=144110-&rft.artnum=144110&rft.issn=2469-9950&rft.eissn=2469-9969&rft_id=info:doi/10.1103/PhysRevB.95.144110&rft_dat=%3Cproquest_cross%3E2125765284%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2125765284&rft_id=info:pmid/&rfr_iscdi=true
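The description field reports a kernel ridge regression model for the cohesive energy whose error (0.041 eV/atom) is compared with the “chemical accuracy” of 1 kcal/mol (about 0.043 eV/atom). The sketch below shows Gaussian-kernel ridge regression written directly in NumPy, together with that unit conversion, to make both ideas concrete. Several choices are assumptions made for illustration: the Gaussian kernel, the hyperparameters `gamma` and `lam`, and the names `GaussianKRR` and `rmse`. The record does not state the paper's kernel or how its hyperparameters were chosen. In practice, `sklearn.kernel_ridge.KernelRidge(kernel="rbf")` provides the same model.

```python
import numpy as np

# 1 kcal/mol expressed in eV per particle: 4184 J/mol / (N_A * e), about 0.04336 eV.
EV_PER_KCAL_PER_MOL = 4184.0 / (6.02214076e23 * 1.602176634e-19)


def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


class GaussianKRR:
    """Minimal kernel ridge regression (hypothetical helper, not the paper's code)."""

    def __init__(self, gamma=1.0, lam=1e-3):
        self.gamma = gamma  # kernel width parameter (assumed value)
        self.lam = lam      # ridge regularization strength (assumed value)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        K = gaussian_kernel(self.X_, self.X_, self.gamma)
        # Solve (K + lam * I) alpha = y rather than forming the inverse explicitly.
        self.alpha_ = np.linalg.solve(K + self.lam * np.eye(len(K)), np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        # Prediction is a kernel-weighted sum over the training points.
        return gaussian_kernel(np.asarray(X, dtype=float), self.X_, self.gamma) @ self.alpha_


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (e.g. eV/atom)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic smooth target standing in for descriptor -> property data (not real data).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 3))
    y = np.sin(X).sum(axis=1)
    model = GaussianKRR(gamma=1.0, lam=1e-3).fit(X[:150], y[:150])
    print("test RMSE:", rmse(y[150:], model.predict(X[150:])))
    print("1 kcal/mol in eV:", EV_PER_KCAL_PER_MOL)
```

Descriptors with very different scales (for example atomic number versus electronegativity) would normally be standardized before entering a distance-based kernel like this one. The regularization `lam` trades training-set fit against smoothness, and it is usually chosen by cross-validation.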