k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials
In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and radial distribution function (RDF). This approach improves the accuracy of fi...
Saved in:
Published in: | Journal of chemical theory and computation 2024-12, Vol.20 (23), p.10676-10683 |
---|---|
Main authors: | Lebeda, Miroslav; Drahokoupil, Jan; Löbel, Ludvík; Vlčák, Petr |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | 10683 |
---|---|
container_issue | 23 |
container_start_page | 10676 |
container_title | Journal of chemical theory and computation |
container_volume | 20 |
creator | Lebeda, Miroslav; Drahokoupil, Jan; Löbel, Ludvík; Vlčák, Petr |
description | In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using various configurational sizes from an initial set of 1800 configurations. The k-means clustering consistently achieves better precision and lower standard deviations for a smaller number of configurations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes well the full set of 1800 configurations in terms of energies and forces. Additionally, t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints into 2D space, and it revealed an overlap between two configuration subsets with and without Ti vacancy, indicating similar atomic environments. This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed similar precision to those when they were included. This indicates that the overlapping configurations in the 2D t-SNE space indeed imply potential information redundancy among the atomistic configurations. |
doi_str_mv | 10.1021/acs.jctc.4c01225 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_3130828226</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3130828226</sourcerecordid><originalsourceid>FETCH-LOGICAL-a247t-4794738df5dccf4d5be1bfd8725fa663310b6fe9de7d9639868a96351d60d6473</originalsourceid><addsrcrecordid>eNp1kTtPwzAUhS0EoqWwM6FILAyk-BUnGSGiUKkIJGCOXD-qlNQutjOw8Rf4i_wS3AcdkJjOtfSd46t7ADhFcIggRldc-OFcBDGkAiKMsz3QRxkt05Jhtr-bUdEDR97PISSEYnIIeqTMGMKI9YF8-_78elDc-KRqOx-Ua8wsaUwyiqrcMj5DesO9kklljW5mneOhsSZ5Vq0S60lbF-kQVsaxiQk82EUjkicblAkNb_0xONBR1MlWB-B1dPtS3aeTx7txdT1JOaZ5SGle0pwUUmdSCE1lNlVoqmWR40xzxghBcMq0KqXKZclIWbCCR82QZFCyaB2Ai03u0tn3TvlQLxovVNtyo2zna4IILHCBMYvo-R90bjtn4naRohlGlNEyUnBDCWe9d0rX8SAL7j5qBOtVA3VsoF41UG8biJazbXA3XSi5M_yePAKXG2Bt_f3037wflseTUA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3145214649</pqid></control><display><type>article</type><title>k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials</title><source>ACS Publications</source><creator>Lebeda, Miroslav ; Drahokoupil, Jan ; Löbel, Ludvík ; Vlčák, Petr</creator><creatorcontrib>Lebeda, Miroslav ; Drahokoupil, Jan ; Löbel, Ludvík ; Vlčák, Petr</creatorcontrib><description>In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. 
We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using various configurational sizes from an initial set of 1800 configurations. The k-means clustering consistently achieves better precision and lower standard deviations for a smaller number of configurations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes well the full set of 1800 configurations in terms of energies and forces. Additionally, t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints into 2D space, and it revealed an overlap between two configuration subsets with and without Ti vacancy, indicating similar atomic environments. This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed similar precision to those when they were included. 
This indicates that the overlapping configurations in the 2D t-SNE space indeed imply potential information redundancy among the atomistic configurations.</description><identifier>ISSN: 1549-9618</identifier><identifier>ISSN: 1549-9626</identifier><identifier>EISSN: 1549-9626</identifier><identifier>DOI: 10.1021/acs.jctc.4c01225</identifier><identifier>PMID: 39561216</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Algorithms ; Cluster analysis ; Clustering ; Condensed Matter, Interfaces, and Materials ; Configurations ; Density functional theory ; Distribution functions ; Embedded atom method ; Embedding ; Fingerprints ; Molecular dynamics ; Radial distribution ; Redundancy ; Vector quantization</subject><ispartof>Journal of chemical theory and computation, 2024-12, Vol.20 (23), p.10676-10683</ispartof><rights>2024 American Chemical Society</rights><rights>Copyright American Chemical Society Dec 10, 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a247t-4794738df5dccf4d5be1bfd8725fa663310b6fe9de7d9639868a96351d60d6473</cites><orcidid>0000-0002-4872-2681</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jctc.4c01225$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jctc.4c01225$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,776,780,2751,27055,27903,27904,56716,56766</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39561216$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lebeda, Miroslav</creatorcontrib><creatorcontrib>Drahokoupil, Jan</creatorcontrib><creatorcontrib>Löbel, Ludvík</creatorcontrib><creatorcontrib>Vlčák, Petr</creatorcontrib><title>k‑Means Clustering 
in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials</title><title>Journal of chemical theory and computation</title><addtitle>J. Chem. Theory Comput</addtitle><description>In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using various configurational sizes from an initial set of 1800 configurations. The k-means clustering consistently achieves better precision and lower standard deviations for a smaller number of configurations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes well the full set of 1800 configurations in terms of energies and forces. Additionally, t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints into 2D space, and it revealed an overlap between two configuration subsets with and without Ti vacancy, indicating similar atomic environments. This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed similar precision to those when they were included. 
This indicates that the overlapping configurations in the 2D t-SNE space indeed imply potential information redundancy among the atomistic configurations.</description><subject>Algorithms</subject><subject>Cluster analysis</subject><subject>Clustering</subject><subject>Condensed Matter, Interfaces, and Materials</subject><subject>Configurations</subject><subject>Density functional theory</subject><subject>Distribution functions</subject><subject>Embedded atom method</subject><subject>Embedding</subject><subject>Fingerprints</subject><subject>Molecular dynamics</subject><subject>Radial distribution</subject><subject>Redundancy</subject><subject>Vector quantization</subject><issn>1549-9618</issn><issn>1549-9626</issn><issn>1549-9626</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp1kTtPwzAUhS0EoqWwM6FILAyk-BUnGSGiUKkIJGCOXD-qlNQutjOw8Rf4i_wS3AcdkJjOtfSd46t7ADhFcIggRldc-OFcBDGkAiKMsz3QRxkt05Jhtr-bUdEDR97PISSEYnIIeqTMGMKI9YF8-_78elDc-KRqOx-Ua8wsaUwyiqrcMj5DesO9kklljW5mneOhsSZ5Vq0S60lbF-kQVsaxiQk82EUjkicblAkNb_0xONBR1MlWB-B1dPtS3aeTx7txdT1JOaZ5SGle0pwUUmdSCE1lNlVoqmWR40xzxghBcMq0KqXKZclIWbCCR82QZFCyaB2Ai03u0tn3TvlQLxovVNtyo2zna4IILHCBMYvo-R90bjtn4naRohlGlNEyUnBDCWe9d0rX8SAL7j5qBOtVA3VsoF41UG8biJazbXA3XSi5M_yePAKXG2Bt_f3037wflseTUA</recordid><startdate>20241210</startdate><enddate>20241210</enddate><creator>Lebeda, Miroslav</creator><creator>Drahokoupil, Jan</creator><creator>Löbel, Ludvík</creator><creator>Vlčák, Petr</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4872-2681</orcidid></search><sort><creationdate>20241210</creationdate><title>k‑Means Clustering in Fingerprint-Based 
Configuration Selection for Fitting Interatomic Potentials</title><author>Lebeda, Miroslav ; Drahokoupil, Jan ; Löbel, Ludvík ; Vlčák, Petr</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a247t-4794738df5dccf4d5be1bfd8725fa663310b6fe9de7d9639868a96351d60d6473</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Cluster analysis</topic><topic>Clustering</topic><topic>Condensed Matter, Interfaces, and Materials</topic><topic>Configurations</topic><topic>Density functional theory</topic><topic>Distribution functions</topic><topic>Embedded atom method</topic><topic>Embedding</topic><topic>Fingerprints</topic><topic>Molecular dynamics</topic><topic>Radial distribution</topic><topic>Redundancy</topic><topic>Vector quantization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lebeda, Miroslav</creatorcontrib><creatorcontrib>Drahokoupil, Jan</creatorcontrib><creatorcontrib>Löbel, Ludvík</creatorcontrib><creatorcontrib>Vlčák, Petr</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of chemical theory and computation</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Lebeda, Miroslav</au><au>Drahokoupil, Jan</au><au>Löbel, Ludvík</au><au>Vlčák, Petr</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials</atitle><jtitle>Journal of chemical theory and computation</jtitle><addtitle>J. Chem. Theory Comput</addtitle><date>2024-12-10</date><risdate>2024</risdate><volume>20</volume><issue>23</issue><spage>10676</spage><epage>10683</epage><pages>10676-10683</pages><issn>1549-9618</issn><issn>1549-9626</issn><eissn>1549-9626</eissn><abstract>In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using various configurational sizes from an initial set of 1800 configurations. The k-means clustering consistently achieves better precision and lower standard deviations for a smaller number of configurations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes well the full set of 1800 configurations in terms of energies and forces. Additionally, t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints into 2D space, and it revealed an overlap between two configuration subsets with and without Ti vacancy, indicating similar atomic environments. 
This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed similar precision to those when they were included. This indicates that the overlapping configurations in the 2D t-SNE space indeed imply potential information redundancy among the atomistic configurations.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>39561216</pmid><doi>10.1021/acs.jctc.4c01225</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-4872-2681</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1549-9618 |
ispartof | Journal of chemical theory and computation, 2024-12, Vol.20 (23), p.10676-10683 |
issn | 1549-9618 1549-9626 1549-9626 |
language | eng |
recordid | cdi_proquest_miscellaneous_3130828226 |
source | ACS Publications |
subjects | Algorithms; Cluster analysis; Clustering; Condensed Matter, Interfaces, and Materials; Configurations; Density functional theory; Distribution functions; Embedded atom method; Embedding; Fingerprints; Molecular dynamics; Radial distribution; Redundancy; Vector quantization |
title | k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T07%3A04%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=k%E2%80%91Means%20Clustering%20in%20Fingerprint-Based%20Configuration%20Selection%20for%20Fitting%20Interatomic%20Potentials&rft.jtitle=Journal%20of%20chemical%20theory%20and%20computation&rft.au=Lebeda,%20Miroslav&rft.date=2024-12-10&rft.volume=20&rft.issue=23&rft.spage=10676&rft.epage=10683&rft.pages=10676-10683&rft.issn=1549-9618&rft.eissn=1549-9626&rft_id=info:doi/10.1021/acs.jctc.4c01225&rft_dat=%3Cproquest_cross%3E3130828226%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3145214649&rft_id=info:pmid/39561216&rfr_iscdi=true |