Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing

Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density...

Bibliographic details
Published in: Journal of chemical theory and computation 2022-01, Vol.18 (1), p.441-447
Main authors: Kovács, Péter; Tran, Fabien; Hanbury, Allan; Madsen, Georg K. H
Format: Article
Language: eng
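Subjects: Clustering; Condensed Matter, Interfaces, and Materials; Datasets; Flux density; Functional testing; Kinetic energy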
Online access: Full text
container_end_page 447
container_issue 1
container_start_page 441
container_title Journal of chemical theory and computation
container_volume 18
creator Kovács, Péter
Tran, Fabien
Hanbury, Allan
Madsen, Georg K. H
description Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.
doi_str_mv 10.1021/acs.jctc.1c00536
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8757462</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2623041897</sourcerecordid><originalsourceid>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</originalsourceid><addsrcrecordid>eNp1kc1rGzEUxEVpaNK0957KQi891I6-Vl5dAsVtmkCgEKdnVdI-uTJryZW0gfz3lWPHJIHqIoF-M28eg9AHgqcEU3KmbZ6ubLFTYjFumXiFTkjL5UQKKl4f3qQ7Rm9zXmHMGKfsDTpmXBLJpDhBvxd-7QedfLlv5sOYCyQflo2LqbmBTYIMoeji76BZQMlNdM1ViGmpg7fNIg6-zw_sNwh5a3ExBlt8DHpobiGXavUOHTk9ZHi_v0_Rr4vvt_PLyfXPH1fzr9cTzQUpE-k6AlhrbYQz1kqMDbS8N5wb42rsVktiBTUzyi11EsDJDgsjQErS9qZnp-h857sZzRp6W3MnPahN8mud7lXUXj3_Cf6PWsY71c3aGRe0GnzeG6T4d6zh1dpnC8OgA8QxKyoIEfV0rKKfXqCrOKa69JaiDHPSyVml8I6yKeacwB3CEKy29alan9rWp_b1VcnHp0scBI99VeDLDniQPg79r98_apOphA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2623041897</pqid></control><display><type>article</type><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><source>ACS Publications</source><creator>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</creator><creatorcontrib>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</creatorcontrib><description>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</description><identifier>ISSN: 1549-9618</identifier><identifier>EISSN: 1549-9626</identifier><identifier>DOI: 10.1021/acs.jctc.1c00536</identifier><identifier>PMID: 34919396</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Clustering ; Condensed Matter, Interfaces, and Materials ; Datasets ; Flux density ; Functional testing ; Kinetic energy</subject><ispartof>Journal of chemical theory and computation, 2022-01, Vol.18 (1), p.441-447</ispartof><rights>2021 The Authors. Published by American Chemical Society</rights><rights>Copyright American Chemical Society Jan 11, 2022</rights><rights>2021 The Authors. 
Published by American Chemical Society 2021 The Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</citedby><cites>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</cites><orcidid>0000-0003-4673-1987 ; 0000-0001-9844-9145</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jctc.1c00536$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jctc.1c00536$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>230,314,776,780,881,2752,27053,27901,27902,56713,56763</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34919396$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kovács, Péter</creatorcontrib><creatorcontrib>Tran, Fabien</creatorcontrib><creatorcontrib>Hanbury, Allan</creatorcontrib><creatorcontrib>Madsen, Georg K. H</creatorcontrib><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><title>Journal of chemical theory and computation</title><addtitle>J. Chem. Theory Comput</addtitle><description>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</description><subject>Clustering</subject><subject>Condensed Matter, Interfaces, and Materials</subject><subject>Datasets</subject><subject>Flux density</subject><subject>Functional testing</subject><subject>Kinetic energy</subject><issn>1549-9618</issn><issn>1549-9626</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1kc1rGzEUxEVpaNK0957KQi891I6-Vl5dAsVtmkCgEKdnVdI-uTJryZW0gfz3lWPHJIHqIoF-M28eg9AHgqcEU3KmbZ6ubLFTYjFumXiFTkjL5UQKKl4f3qQ7Rm9zXmHMGKfsDTpmXBLJpDhBvxd-7QedfLlv5sOYCyQflo2LqbmBTYIMoeji76BZQMlNdM1ViGmpg7fNIg6-zw_sNwh5a3ExBlt8DHpobiGXavUOHTk9ZHi_v0_Rr4vvt_PLyfXPH1fzr9cTzQUpE-k6AlhrbYQz1kqMDbS8N5wb42rsVktiBTUzyi11EsDJDgsjQErS9qZnp-h857sZzRp6W3MnPahN8mud7lXUXj3_Cf6PWsY71c3aGRe0GnzeG6T4d6zh1dpnC8OgA8QxKyoIEfV0rKKfXqCrOKa69JaiDHPSyVml8I6yKeacwB3CEKy29alan9rWp_b1VcnHp0scBI99VeDLDniQPg79r98_apOphA</recordid><startdate>20220111</startdate><enddate>20220111</enddate><creator>Kovács, Péter</creator><creator>Tran, Fabien</creator><creator>Hanbury, Allan</creator><creator>Madsen, Georg K. 
H</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4673-1987</orcidid><orcidid>https://orcid.org/0000-0001-9844-9145</orcidid></search><sort><creationdate>20220111</creationdate><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><author>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Clustering</topic><topic>Condensed Matter, Interfaces, and Materials</topic><topic>Datasets</topic><topic>Flux density</topic><topic>Functional testing</topic><topic>Kinetic energy</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kovács, Péter</creatorcontrib><creatorcontrib>Tran, Fabien</creatorcontrib><creatorcontrib>Hanbury, Allan</creatorcontrib><creatorcontrib>Madsen, Georg K. 
H</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of chemical theory and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kovács, Péter</au><au>Tran, Fabien</au><au>Hanbury, Allan</au><au>Madsen, Georg K. H</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</atitle><jtitle>Journal of chemical theory and computation</jtitle><addtitle>J. Chem. Theory Comput</addtitle><date>2022-01-11</date><risdate>2022</risdate><volume>18</volume><issue>1</issue><spage>441</spage><epage>447</epage><pages>441-447</pages><issn>1549-9618</issn><eissn>1549-9626</eissn><abstract>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>34919396</pmid><doi>10.1021/acs.jctc.1c00536</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0003-4673-1987</orcidid><orcidid>https://orcid.org/0000-0001-9844-9145</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1549-9618
ispartof Journal of chemical theory and computation, 2022-01, Vol.18 (1), p.441-447
issn 1549-9618
1549-9626
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8757462
source ACS Publications
subjects Clustering
Condensed Matter, Interfaces, and Materials
Datasets
Flux density
Functional testing
Kinetic energy
title Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T16%3A41%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Similarity%20Clustering%20for%20Representative%20Sets%20of%20Inorganic%20Solids%20for%20Density%20Functional%20Testing&rft.jtitle=Journal%20of%20chemical%20theory%20and%20computation&rft.au=Kova%CC%81cs,%20Pe%CC%81ter&rft.date=2022-01-11&rft.volume=18&rft.issue=1&rft.spage=441&rft.epage=447&rft.pages=441-447&rft.issn=1549-9618&rft.eissn=1549-9626&rft_id=info:doi/10.1021/acs.jctc.1c00536&rft_dat=%3Cproquest_pubme%3E2623041897%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623041897&rft_id=info:pmid/34919396&rfr_iscdi=true