A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations

The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization o...

Bibliographic details
Published in: Cell reports methods, 2023-01, Vol.3 (1), p.100390-100390, Article 100390
Main authors: Roca, Carlos P., Burton, Oliver T., Neumann, Julika, Tareen, Samar, Whyte, Carly E., Gergelits, Vaclav, Veiga, Rafael V., Humblet-Baron, Stéphanie, Liston, Adrian
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 100390
container_issue 1
container_start_page 100390
container_title Cell reports methods
container_volume 3
creator Roca, Carlos P.
Burton, Oliver T.
Neumann, Julika
Tareen, Samar
Whyte, Carly E.
Gergelits, Vaclav
Veiga, Rafael V.
Humblet-Baron, Stéphanie
Liston, Adrian
description The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
• A cross entropy test enables evaluation of differences between t-SNE and UMAP projections
• The cross entropy test can distinguish biological variation from technical variation
• The cross entropy test can quantify differences between multiple samples
• Full code and instructions are given for applying the test to single cell datasets
Dimensionality-reduction tools, such as t-SNE and UMAP, are frequently used to visualize highly complex single-cell datasets in single-cell sequencing, flow cytometry, and mass cytometry. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single-cell datasets, t-SNE and UMAP have largely remained data visualization tools, with a lack of robust statistical approaches available. We sought to fulfill the need for a statistical test to evaluate the difference between dimensionality-reduced datasets and provide a quantification of differences between multiple datasets.
Dimensionality-reduction tools such as t-SNE and UMAP allow visualizations of single-cell datasets. Roca et al. develop and validate the cross entropy test for robust comparison of dimensionality-reduced datasets in flow cytometry, mass cytometry, and single-cell sequencing. The test allows statistical significance assessment and quantification of differences.
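The idea in the description (a Kolmogorov-Smirnov test on per-cell cross entropy distributions) can be illustrated with a minimal sketch. This is not the authors' published code. It assumes a fixed-bandwidth Gaussian kernel for high-dimensional similarities (t-SNE itself calibrates bandwidths by perplexity), a Student-t kernel for the embedding, an asymptotic p-value, and a toy stand-in for the embedding. All function names, the `sigma` parameter, and the simulated data are hypothetical.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def point_cross_entropy(high, low, sigma=1.0):
    """Per-cell cross entropy H_i = -sum_j p_ij * log(q_ij).

    p_ij: Gaussian similarities in the original space, normalised per cell
          (assumption: one fixed bandwidth `sigma`, no perplexity search).
    q_ij: Student-t similarities in the embedding, normalised over all pairs.
    """
    n = len(high)
    w_low = [[0.0 if i == j else 1.0 / (1.0 + sqdist(low[i], low[j]))
              for j in range(n)] for i in range(n)]
    z = sum(sum(row) for row in w_low)
    entropies = []
    for i in range(n):
        w_high = [0.0 if i == j else
                  math.exp(-sqdist(high[i], high[j]) / (2.0 * sigma ** 2))
                  for j in range(n)]
        total = sum(w_high)
        h = 0.0
        for j in range(n):
            if w_high[j] > 0.0:
                h -= (w_high[j] / total) * math.log(w_low[i][j] / z)
        entropies.append(h)
    return entropies


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.3:
        # The alternating series is unreliable near zero,
        # where the p-value is essentially 1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


# Toy data: two simulated samples of 20 "cells" with 5 markers each.
random.seed(0)
sample_a = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
sample_b = [[random.gauss(1.5, 1.0) for _ in range(5)] for _ in range(20)]
pooled = sample_a + sample_b
# Stand-in for a real t-SNE/UMAP embedding of the pooled cells (assumption).
embedding = [cell[:2] for cell in pooled]

entropies = point_cross_entropy(pooled, embedding)
d = ks_statistic(entropies[:20], entropies[20:])
print("KS statistic:", d, "p-value:", ks_pvalue(d, 20, 20))
```

Because the KS statistic is a distance between two distributions, the same `ks_statistic` call applied to every pair of samples would give a distance matrix that a hierarchical clustering routine could turn into a dendrogram, in the spirit of the multi-sample comparison the description mentions.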
doi_str_mv 10.1016/j.crmeth.2022.100390
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9939422</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S2667237522002958</els_id><sourcerecordid>2779347298</sourcerecordid><originalsourceid>FETCH-LOGICAL-c463t-1f03c06f31f2aa3c2a59fb2d930f3244b1df5286684a388d6cdb21f48a62a0f53</originalsourceid><addsrcrecordid>eNp9kU9P3DAQxa2KqiDKN6iQj1yy2GPHiS-VVogWJPpHajlbs44NXiVxsL1b8e2bZSmll57Gst-88bwfIR84W3DG1fl6YdPgyv0CGMB8xYRmb8gRKNVUIJr64NX5kJzkvGaMQc2F0PwdORSq5bIVzRHBJbUp5kzdWFKcHmlxuVDs-_gr04cNjiUULGHraN7VXILFnto4TJhCjiONnpbqx9dLimNHb78sv9PkpuTy7Dfr45jfk7ce--xOnusxuf10-fPiqrr59vn6YnlTWalEqbhnwjLlBfeAKCxgrf0KOi2YFyDline-hlapVqJo207ZbgXcyxYVIPO1OCYf977TZjW4zu4Wwt5MKQyYHk3EYP59GcO9uYtbo7XQEmA2OHs2SPFhM8dghpCt63scXdxkA02jhWxAt7NU7qVP2SXnX8ZwZnaAzNrsAZkdILMHNLedvv7iS9MfHH93cHNQ2-CSyTa40bouJGeL6WL4_4Tfs5alcQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2779347298</pqid></control><display><type>article</type><title>A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations</title><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Roca, Carlos P. ; Burton, Oliver T. ; Neumann, Julika ; Tareen, Samar ; Whyte, Carly E. ; Gergelits, Vaclav ; Veiga, Rafael V. ; Humblet-Baron, Stéphanie ; Liston, Adrian</creator><creatorcontrib>Roca, Carlos P. ; Burton, Oliver T. ; Neumann, Julika ; Tareen, Samar ; Whyte, Carly E. ; Gergelits, Vaclav ; Veiga, Rafael V. ; Humblet-Baron, Stéphanie ; Liston, Adrian</creatorcontrib><description>The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. 
t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization. [Display omitted] •A cross entropy test enables evaluation of differences between t-SNE and UMAP projections•The cross entropy test can distinguish biological variation from technical variation•The cross entropy test can quantify differences between multiple samples•Full code and instructions are given for applying the test to single cell datasets Dimensionality-reduction tools, such as t-SNE and UMAP, are frequently used to visualize highly complex single-cell datasets in single-cell sequencing, flow cytometry, and mass cytometry. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single-cell datasets, t-SNE and UMAP have largely remained data visualization tools, with a lack of robust statistical approaches available. 
We sought to fulfill the need for a statistical test to evaluate the difference between dimensionality-reduced datasets and provide a quantification of differences between multiple datasets. Dimensionality-reduction tools such as t-SNE and UMAP allow visualizations of single-cell datasets. Roca et al. develop and validate the cross entropy test for robust comparison of dimensionality-reduced datasets in flow cytometry, mass cytometry, and single-cell sequencing. The test allows statistical significance assessment and quantification of differences.</description><identifier>ISSN: 2667-2375</identifier><identifier>EISSN: 2667-2375</identifier><identifier>DOI: 10.1016/j.crmeth.2022.100390</identifier><identifier>PMID: 36814837</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>dimensionality reduction ; flow cytometry ; mass cytometry ; single cell sequencing ; t-SNE ; UMAP</subject><ispartof>Cell reports methods, 2023-01, Vol.3 (1), p.100390-100390, Article 100390</ispartof><rights>2022 The Author(s)</rights><rights>2022 The Author(s).</rights><rights>2022 The Author(s) 
2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c463t-1f03c06f31f2aa3c2a59fb2d930f3244b1df5286684a388d6cdb21f48a62a0f53</citedby><cites>FETCH-LOGICAL-c463t-1f03c06f31f2aa3c2a59fb2d930f3244b1df5286684a388d6cdb21f48a62a0f53</cites><orcidid>0000-0002-6272-4085</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9939422/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9939422/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27923,27924,53790,53792</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36814837$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Roca, Carlos P.</creatorcontrib><creatorcontrib>Burton, Oliver T.</creatorcontrib><creatorcontrib>Neumann, Julika</creatorcontrib><creatorcontrib>Tareen, Samar</creatorcontrib><creatorcontrib>Whyte, Carly E.</creatorcontrib><creatorcontrib>Gergelits, Vaclav</creatorcontrib><creatorcontrib>Veiga, Rafael V.</creatorcontrib><creatorcontrib>Humblet-Baron, Stéphanie</creatorcontrib><creatorcontrib>Liston, Adrian</creatorcontrib><title>A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations</title><title>Cell reports methods</title><addtitle>Cell Rep Methods</addtitle><description>The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. 
Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization. [Display omitted] •A cross entropy test enables evaluation of differences between t-SNE and UMAP projections•The cross entropy test can distinguish biological variation from technical variation•The cross entropy test can quantify differences between multiple samples•Full code and instructions are given for applying the test to single cell datasets Dimensionality-reduction tools, such as t-SNE and UMAP, are frequently used to visualize highly complex single-cell datasets in single-cell sequencing, flow cytometry, and mass cytometry. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single-cell datasets, t-SNE and UMAP have largely remained data visualization tools, with a lack of robust statistical approaches available. We sought to fulfill the need for a statistical test to evaluate the difference between dimensionality-reduced datasets and provide a quantification of differences between multiple datasets. Dimensionality-reduction tools such as t-SNE and UMAP allow visualizations of single-cell datasets. Roca et al. 
develop and validate the cross entropy test for robust comparison of dimensionality-reduced datasets in flow cytometry, mass cytometry, and single-cell sequencing. The test allows statistical significance assessment and quantification of differences.</description><subject>dimensionality reduction</subject><subject>flow cytometry</subject><subject>mass cytometry</subject><subject>single cell sequencing</subject><subject>t-SNE</subject><subject>UMAP</subject><issn>2667-2375</issn><issn>2667-2375</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kU9P3DAQxa2KqiDKN6iQj1yy2GPHiS-VVogWJPpHajlbs44NXiVxsL1b8e2bZSmll57Gst-88bwfIR84W3DG1fl6YdPgyv0CGMB8xYRmb8gRKNVUIJr64NX5kJzkvGaMQc2F0PwdORSq5bIVzRHBJbUp5kzdWFKcHmlxuVDs-_gr04cNjiUULGHraN7VXILFnto4TJhCjiONnpbqx9dLimNHb78sv9PkpuTy7Dfr45jfk7ce--xOnusxuf10-fPiqrr59vn6YnlTWalEqbhnwjLlBfeAKCxgrf0KOi2YFyDline-hlapVqJo207ZbgXcyxYVIPO1OCYf977TZjW4zu4Wwt5MKQyYHk3EYP59GcO9uYtbo7XQEmA2OHs2SPFhM8dghpCt63scXdxkA02jhWxAt7NU7qVP2SXnX8ZwZnaAzNrsAZkdILMHNLedvv7iS9MfHH93cHNQ2-CSyTa40bouJGeL6WL4_4Tfs5alcQ</recordid><startdate>20230123</startdate><enddate>20230123</enddate><creator>Roca, Carlos P.</creator><creator>Burton, Oliver T.</creator><creator>Neumann, Julika</creator><creator>Tareen, Samar</creator><creator>Whyte, Carly E.</creator><creator>Gergelits, Vaclav</creator><creator>Veiga, Rafael V.</creator><creator>Humblet-Baron, Stéphanie</creator><creator>Liston, Adrian</creator><general>Elsevier Inc</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-6272-4085</orcidid></search><sort><creationdate>20230123</creationdate><title>A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations</title><author>Roca, Carlos P. ; Burton, Oliver T. 
; Neumann, Julika ; Tareen, Samar ; Whyte, Carly E. ; Gergelits, Vaclav ; Veiga, Rafael V. ; Humblet-Baron, Stéphanie ; Liston, Adrian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c463t-1f03c06f31f2aa3c2a59fb2d930f3244b1df5286684a388d6cdb21f48a62a0f53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>dimensionality reduction</topic><topic>flow cytometry</topic><topic>mass cytometry</topic><topic>single cell sequencing</topic><topic>t-SNE</topic><topic>UMAP</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Roca, Carlos P.</creatorcontrib><creatorcontrib>Burton, Oliver T.</creatorcontrib><creatorcontrib>Neumann, Julika</creatorcontrib><creatorcontrib>Tareen, Samar</creatorcontrib><creatorcontrib>Whyte, Carly E.</creatorcontrib><creatorcontrib>Gergelits, Vaclav</creatorcontrib><creatorcontrib>Veiga, Rafael V.</creatorcontrib><creatorcontrib>Humblet-Baron, Stéphanie</creatorcontrib><creatorcontrib>Liston, Adrian</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Cell reports methods</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Roca, Carlos P.</au><au>Burton, Oliver T.</au><au>Neumann, Julika</au><au>Tareen, Samar</au><au>Whyte, Carly E.</au><au>Gergelits, Vaclav</au><au>Veiga, Rafael V.</au><au>Humblet-Baron, Stéphanie</au><au>Liston, Adrian</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations</atitle><jtitle>Cell reports 
methods</jtitle><addtitle>Cell Rep Methods</addtitle><date>2023-01-23</date><risdate>2023</risdate><volume>3</volume><issue>1</issue><spage>100390</spage><epage>100390</epage><pages>100390-100390</pages><artnum>100390</artnum><issn>2667-2375</issn><eissn>2667-2375</eissn><abstract>The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization. 
[Display omitted] •A cross entropy test enables evaluation of differences between t-SNE and UMAP projections•The cross entropy test can distinguish biological variation from technical variation•The cross entropy test can quantify differences between multiple samples•Full code and instructions are given for applying the test to single cell datasets Dimensionality-reduction tools, such as t-SNE and UMAP, are frequently used to visualize highly complex single-cell datasets in single-cell sequencing, flow cytometry, and mass cytometry. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single-cell datasets, t-SNE and UMAP have largely remained data visualization tools, with a lack of robust statistical approaches available. We sought to fulfill the need for a statistical test to evaluate the difference between dimensionality-reduced datasets and provide a quantification of differences between multiple datasets. Dimensionality-reduction tools such as t-SNE and UMAP allow visualizations of single-cell datasets. Roca et al. develop and validate the cross entropy test for robust comparison of dimensionality-reduced datasets in flow cytometry, mass cytometry, and single-cell sequencing. The test allows statistical significance assessment and quantification of differences.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>36814837</pmid><doi>10.1016/j.crmeth.2022.100390</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-6272-4085</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2667-2375
ispartof Cell reports methods, 2023-01, Vol.3 (1), p.100390-100390, Article 100390
issn 2667-2375
2667-2375
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9939422
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects dimensionality reduction
flow cytometry
mass cytometry
single cell sequencing
t-SNE
UMAP
title A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T03%3A18%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20cross%20entropy%20test%20allows%20quantitative%20statistical%20comparison%20of%20t-SNE%20and%20UMAP%20representations&rft.jtitle=Cell%20reports%20methods&rft.au=Roca,%20Carlos%20P.&rft.date=2023-01-23&rft.volume=3&rft.issue=1&rft.spage=100390&rft.epage=100390&rft.pages=100390-100390&rft.artnum=100390&rft.issn=2667-2375&rft.eissn=2667-2375&rft_id=info:doi/10.1016/j.crmeth.2022.100390&rft_dat=%3Cproquest_pubme%3E2779347298%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2779347298&rft_id=info:pmid/36814837&rft_els_id=S2667237522002958&rfr_iscdi=true