k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials

Bibliographic details
Published in: Journal of chemical theory and computation, 2024-12, Vol. 20 (23), pp. 10676-10683
Main authors: Lebeda, Miroslav; Drahokoupil, Jan; Löbel, Ludvík; Vlčák, Petr
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 10683
container_issue 23
container_start_page 10676
container_title Journal of chemical theory and computation
container_volume 20
creator Lebeda, Miroslav
Drahokoupil, Jan
Löbel, Ludvík
Vlčák, Petr
description In this study, we present a method for selecting an arbitrary number of distinct configurations from a larger data set by applying k-means clustering to atomistic configuration fingerprints based on the CrystalNN model and the radial distribution function (RDF). This approach improves the accuracy of fitting classical molecular dynamics interatomic potentials to density functional theory (DFT) data for both energies and forces while requiring fewer configurations than random selection. We demonstrate this improvement by fitting an embedded-atom method (EAM) potential for titanium, using training sets of various sizes selected from an initial set of 1800 configurations. For small numbers of configurations, k-means clustering consistently achieves better precision and lower standard deviations than random selection. The results also suggest that only about 30 configurations are sufficient to obtain an EAM model that describes the full set of 1800 configurations well in terms of energies and forces. Additionally, the t-distributed stochastic neighbor embedding (t-SNE) method was used to reduce the configuration fingerprints to a 2D space; it revealed an overlap between two configuration subsets, with and without a Ti vacancy, indicating similar atomic environments. This similarity is captured by k-means clustering but not by random selection. Furthermore, when the overlapping configurations with vacancies were excluded from the k-means algorithm and used only as a test set, their energy and force predictions showed precision similar to that obtained when they were included. This indicates that overlap in the 2D t-SNE space indeed signals potential information redundancy among the atomistic configurations.
doi_str_mv 10.1021/acs.jctc.4c01225
format Article
publisher American Chemical Society
place United States
date 2024-12-10
abbreviated_title J. Chem. Theory Comput.
pmid 39561216
orcid https://orcid.org/0000-0002-4872-2681
rights 2024 American Chemical Society
pages_total 8
peer_reviewed true
fulltext fulltext
identifier ISSN: 1549-9618
ispartof Journal of chemical theory and computation, 2024-12, Vol.20 (23), p.10676-10683
issn 1549-9618 (print)
1549-9626 (electronic)
language eng
recordid cdi_proquest_miscellaneous_3130828226
source ACS Publications
subjects Algorithms
Cluster analysis
Clustering
Condensed Matter, Interfaces, and Materials
Configurations
Density functional theory
Distribution functions
Embedded atom method
Embedding
Fingerprints
Molecular dynamics
Radial distribution
Redundancy
Vector quantization
title k‑Means Clustering in Fingerprint-Based Configuration Selection for Fitting Interatomic Potentials
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T07%3A04%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=k%E2%80%91Means%20Clustering%20in%20Fingerprint-Based%20Configuration%20Selection%20for%20Fitting%20Interatomic%20Potentials&rft.jtitle=Journal%20of%20chemical%20theory%20and%20computation&rft.au=Lebeda,%20Miroslav&rft.date=2024-12-10&rft.volume=20&rft.issue=23&rft.spage=10676&rft.epage=10683&rft.pages=10676-10683&rft.issn=1549-9618&rft.eissn=1549-9626&rft_id=info:doi/10.1021/acs.jctc.4c01225&rft_dat=%3Cproquest_cross%3E3130828226%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3145214649&rft_id=info:pmid/39561216&rfr_iscdi=true
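The configuration-selection step summarized in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes that each configuration's fingerprint (for example, CrystalNN-based and RDF features, which pymatgen can help compute) is already available as a fixed-length numeric vector. It uses plain Lloyd's k-means with k-means++ seeding, with k equal to the number of configurations to keep, and then takes from each cluster the member closest to the cluster mean. That representative rule and the function names `kmeans` and `select_configurations` are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two fingerprint vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty collection of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points: Sequence[Vector], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of configurations")
    rng = random.Random(seed)
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to its squared distance from the nearest existing centroid.
    centroids = [list(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        d2 = [min(_sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:  # fewer distinct fingerprints than k
            break
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))

    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: attach every configuration to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = _mean(members)
    return labels


def select_configurations(fingerprints: Sequence[Vector], n_select: int, seed: int = 0) -> List[int]:
    """Indices of up to n_select configurations, one per k-means cluster.

    Representative rule (an assumption of this sketch): the cluster member
    whose fingerprint lies closest to the cluster mean.
    """
    labels = kmeans(fingerprints, n_select, seed=seed)
    chosen = []
    for j in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == j]
        centre = _mean([fingerprints[i] for i in members])
        chosen.append(min(members, key=lambda i: _sq_dist(fingerprints[i], centre)))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D stand-ins for real fingerprints: three well-separated groups of 20.
    gen = random.Random(1)
    fps = [
        [cx + gen.gauss(0, 0.05), cy + gen.gauss(0, 0.05)]
        for cx, cy in [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
        for _ in range(20)
    ]
    print(select_configurations(fps, 3))
```

In this sketch, setting `n_select` to the desired training-set size (about 30 in the titanium example from the abstract) returns the indices of the configurations to fit on, and the rest can be kept as a test set. Choosing one representative per cluster is what spreads the selection across distinct regions of fingerprint space, in contrast to random selection, which can draw several near-duplicate configurations.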