Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing
Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density...
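Published in: | Journal of chemical theory and computation 2022-01, Vol.18 (1), p.441-447 |
---|---|
Main authors: | Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H |
Format: | Article |
Language: | eng |
Subjects: | Clustering ; Condensed Matter, Interfaces, and Materials ; Datasets ; Flux density ; Functional testing ; Kinetic energy |
Online access: | Full text |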
container_end_page | 447 |
---|---|
container_issue | 1 |
container_start_page | 441 |
container_title | Journal of chemical theory and computation |
container_volume | 18 |
creator | Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H |
description | Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals. |
doi_str_mv | 10.1021/acs.jctc.1c00536 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8757462</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2623041897</sourcerecordid><originalsourceid>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</originalsourceid><addsrcrecordid>eNp1kc1rGzEUxEVpaNK0957KQi891I6-Vl5dAsVtmkCgEKdnVdI-uTJryZW0gfz3lWPHJIHqIoF-M28eg9AHgqcEU3KmbZ6ubLFTYjFumXiFTkjL5UQKKl4f3qQ7Rm9zXmHMGKfsDTpmXBLJpDhBvxd-7QedfLlv5sOYCyQflo2LqbmBTYIMoeji76BZQMlNdM1ViGmpg7fNIg6-zw_sNwh5a3ExBlt8DHpobiGXavUOHTk9ZHi_v0_Rr4vvt_PLyfXPH1fzr9cTzQUpE-k6AlhrbYQz1kqMDbS8N5wb42rsVktiBTUzyi11EsDJDgsjQErS9qZnp-h857sZzRp6W3MnPahN8mud7lXUXj3_Cf6PWsY71c3aGRe0GnzeG6T4d6zh1dpnC8OgA8QxKyoIEfV0rKKfXqCrOKa69JaiDHPSyVml8I6yKeacwB3CEKy29alan9rWp_b1VcnHp0scBI99VeDLDniQPg79r98_apOphA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2623041897</pqid></control><display><type>article</type><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><source>ACS Publications</source><creator>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</creator><creatorcontrib>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</creatorcontrib><description>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</description><identifier>ISSN: 1549-9618</identifier><identifier>EISSN: 1549-9626</identifier><identifier>DOI: 10.1021/acs.jctc.1c00536</identifier><identifier>PMID: 34919396</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Clustering ; Condensed Matter, Interfaces, and Materials ; Datasets ; Flux density ; Functional testing ; Kinetic energy</subject><ispartof>Journal of chemical theory and computation, 2022-01, Vol.18 (1), p.441-447</ispartof><rights>2021 The Authors. Published by American Chemical Society</rights><rights>Copyright American Chemical Society Jan 11, 2022</rights><rights>2021 The Authors. 
Published by American Chemical Society 2021 The Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</citedby><cites>FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</cites><orcidid>0000-0003-4673-1987 ; 0000-0001-9844-9145</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jctc.1c00536$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jctc.1c00536$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>230,314,776,780,881,2752,27053,27901,27902,56713,56763</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34919396$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kovács, Péter</creatorcontrib><creatorcontrib>Tran, Fabien</creatorcontrib><creatorcontrib>Hanbury, Allan</creatorcontrib><creatorcontrib>Madsen, Georg K. H</creatorcontrib><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><title>Journal of chemical theory and computation</title><addtitle>J. Chem. Theory Comput</addtitle><description>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</description><subject>Clustering</subject><subject>Condensed Matter, Interfaces, and Materials</subject><subject>Datasets</subject><subject>Flux density</subject><subject>Functional testing</subject><subject>Kinetic energy</subject><issn>1549-9618</issn><issn>1549-9626</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1kc1rGzEUxEVpaNK0957KQi891I6-Vl5dAsVtmkCgEKdnVdI-uTJryZW0gfz3lWPHJIHqIoF-M28eg9AHgqcEU3KmbZ6ubLFTYjFumXiFTkjL5UQKKl4f3qQ7Rm9zXmHMGKfsDTpmXBLJpDhBvxd-7QedfLlv5sOYCyQflo2LqbmBTYIMoeji76BZQMlNdM1ViGmpg7fNIg6-zw_sNwh5a3ExBlt8DHpobiGXavUOHTk9ZHi_v0_Rr4vvt_PLyfXPH1fzr9cTzQUpE-k6AlhrbYQz1kqMDbS8N5wb42rsVktiBTUzyi11EsDJDgsjQErS9qZnp-h857sZzRp6W3MnPahN8mud7lXUXj3_Cf6PWsY71c3aGRe0GnzeG6T4d6zh1dpnC8OgA8QxKyoIEfV0rKKfXqCrOKa69JaiDHPSyVml8I6yKeacwB3CEKy29alan9rWp_b1VcnHp0scBI99VeDLDniQPg79r98_apOphA</recordid><startdate>20220111</startdate><enddate>20220111</enddate><creator>Kovács, Péter</creator><creator>Tran, Fabien</creator><creator>Hanbury, Allan</creator><creator>Madsen, Georg K. 
H</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4673-1987</orcidid><orcidid>https://orcid.org/0000-0001-9844-9145</orcidid></search><sort><creationdate>20220111</creationdate><title>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</title><author>Kovács, Péter ; Tran, Fabien ; Hanbury, Allan ; Madsen, Georg K. H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a461t-9f81e0aaab6fbcc900be54db44bbf3345a91c62b724c2f9eef9806b6e9915dbd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Clustering</topic><topic>Condensed Matter, Interfaces, and Materials</topic><topic>Datasets</topic><topic>Flux density</topic><topic>Functional testing</topic><topic>Kinetic energy</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kovács, Péter</creatorcontrib><creatorcontrib>Tran, Fabien</creatorcontrib><creatorcontrib>Hanbury, Allan</creatorcontrib><creatorcontrib>Madsen, Georg K. 
H</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of chemical theory and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kovács, Péter</au><au>Tran, Fabien</au><au>Hanbury, Allan</au><au>Madsen, Georg K. H</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing</atitle><jtitle>Journal of chemical theory and computation</jtitle><addtitle>J. Chem. Theory Comput</addtitle><date>2022-01-11</date><risdate>2022</risdate><volume>18</volume><issue>1</issue><spage>441</spage><epage>447</epage><pages>441-447</pages><issn>1549-9618</issn><eissn>1549-9626</eissn><abstract>Benchmarking DFT functionals is complicated since the results highly depend on which properties and materials were used in the process. Unwanted biases can be introduced if a data set contains too many examples of very similar materials. We show that a clustering based on the distribution of density gradient and kinetic energy density is able to identify groups of chemically distinct solids. 
We then propose a method to create smaller data sets or rebalance existing data sets in a way that no region of the meta-GGA descriptor space is overrepresented, yet the new data set reproduces average errors of the original set as closely as possible. We apply the method to an existing set of 44 inorganic solids and suggest a representative set of seven solids. The representative sets generated with this method can be used to make more general benchmarks or to train new functionals.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>34919396</pmid><doi>10.1021/acs.jctc.1c00536</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0003-4673-1987</orcidid><orcidid>https://orcid.org/0000-0001-9844-9145</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1549-9618 |
ispartof | Journal of chemical theory and computation, 2022-01, Vol.18 (1), p.441-447 |
issn | 1549-9618 ; 1549-9626 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8757462 |
source | ACS Publications |
subjects | Clustering ; Condensed Matter, Interfaces, and Materials ; Datasets ; Flux density ; Functional testing ; Kinetic energy |
title | Similarity Clustering for Representative Sets of Inorganic Solids for Density Functional Testing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T16%3A41%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Similarity%20Clustering%20for%20Representative%20Sets%20of%20Inorganic%20Solids%20for%20Density%20Functional%20Testing&rft.jtitle=Journal%20of%20chemical%20theory%20and%20computation&rft.au=Kova%CC%81cs,%20Pe%CC%81ter&rft.date=2022-01-11&rft.volume=18&rft.issue=1&rft.spage=441&rft.epage=447&rft.pages=441-447&rft.issn=1549-9618&rft.eissn=1549-9626&rft_id=info:doi/10.1021/acs.jctc.1c00536&rft_dat=%3Cproquest_pubme%3E2623041897%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623041897&rft_id=info:pmid/34919396&rfr_iscdi=true |
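
The selection idea in the abstract (no region of descriptor space is overrepresented, while the reduced set reproduces the average error of the full set) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that cluster assignments are already available, that exactly one solid is taken from each cluster, and that a brute-force search over all combinations is affordable. The function name `pick_representatives`, the solid labels, and the error values are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean


def pick_representatives(clusters, errors):
    """Pick one solid per cluster so the subset mean error is closest
    to the mean error of the full set.

    clusters: dict mapping a cluster id to a list of solid names
              (assumed to come from a prior clustering step).
    errors:   dict mapping each solid name to a signed error of one
              functional (hypothetical numbers in this sketch).
    Returns (best_combination, gap), where gap is the absolute
    difference between the subset mean and the full-set mean.
    """
    target = mean(list(errors.values()))
    best, best_gap = None, float("inf")
    # Enumerate every way of choosing exactly one member per cluster.
    for combo in product(*clusters.values()):
        gap = abs(mean([errors[s] for s in combo]) - target)
        if gap < best_gap:
            best, best_gap = combo, gap
    return best, best_gap


# Hypothetical toy data: two clusters, four solids.
clusters = {"A": ["s1", "s2", "s3"], "B": ["s4"]}
errors = {"s1": 1.0, "s2": 2.0, "s3": 6.0, "s4": 3.0}

best, gap = pick_representatives(clusters, errors)
print(best, gap)
```

The number of combinations is the product of the cluster sizes, so exhaustive search of this kind is only practical for small data sets. Matching several functionals or properties at once would need a combined error measure, which this sketch does not attempt.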