A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations
Published in: | Cell Reports Methods 2023-01, Vol.3 (1), p.100390-100390, Article 100390 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
• A cross entropy test enables evaluation of differences between t-SNE and UMAP projections
• The cross entropy test can distinguish biological variation from technical variation
• The cross entropy test can quantify differences between multiple samples
• Full code and instructions are given for applying the test to single-cell datasets
Dimensionality-reduction tools, such as t-SNE and UMAP, are frequently used to visualize highly complex single-cell datasets in single-cell sequencing, flow cytometry, and mass cytometry. Despite the ubiquity of these approaches and the clear need for quantitative comparison of single-cell datasets, t-SNE and UMAP have largely remained data visualization tools, with a lack of robust statistical approaches available. We sought to fulfill the need for a statistical test to evaluate the difference between dimensionality-reduced datasets and provide a quantification of differences between multiple datasets.
Dimensionality-reduction tools such as t-SNE and UMAP allow visualizations of single-cell datasets. Roca et al. develop and validate the cross entropy test for robust comparison of dimensionality-reduced datasets in flow cytometry, mass cytometry, and single-cell sequencing. The test allows statistical significance assessment and quantific |
---|---|
ISSN: | 2667-2375 |
DOI: | 10.1016/j.crmeth.2022.100390 |
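
The abstract describes the test as a Kolmogorov-Smirnov comparison of per-cell cross entropy distributions. The following is a minimal Python sketch of that idea, not the authors' published code. It makes several loudly labeled assumptions: the two samples are embedded jointly, a single shared Gaussian bandwidth replaces t-SNE's per-cell perplexity calibration, both neighbor distributions are row-normalized, and the function names `per_cell_cross_entropy` and `cross_entropy_test` are hypothetical. The demo data are synthetic, and the first two input columns stand in for a real t-SNE or UMAP embedding.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import ks_2samp


def per_cell_cross_entropy(high, low, eps=1e-12):
    """Cross entropy between each cell's high-dim (P) and embedded (Q) neighbor distributions."""
    d_high = cdist(high, high, "sqeuclidean")
    d_low = cdist(low, low, "sqeuclidean")

    # Assumption: one shared Gaussian bandwidth (median squared distance)
    # instead of the per-cell perplexity calibration that t-SNE performs.
    sigma2 = np.median(d_high[d_high > 0])
    p = np.exp(-d_high / (2.0 * sigma2))
    q = 1.0 / (1.0 + d_low)  # Student-t kernel, as used for t-SNE embeddings

    # A cell is not counted as its own neighbor.
    np.fill_diagonal(p, 0.0)
    np.fill_diagonal(q, 0.0)

    # Assumption: row-normalize so every cell carries a probability distribution.
    p /= p.sum(axis=1, keepdims=True)
    q /= q.sum(axis=1, keepdims=True)

    return -(p * np.log(q + eps)).sum(axis=1)


def cross_entropy_test(high, low, labels):
    """Two-sample KS test on per-cell cross entropies, split by sample label."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if groups.size != 2:
        raise ValueError("this sketch compares exactly two samples")
    ce = per_cell_cross_entropy(high, low)
    result = ks_2samp(ce[labels == groups[0]], ce[labels == groups[1]])
    return result.statistic, result.pvalue


# Synthetic demo: two samples of 150 cells with 10 markers each;
# sample "b" is shifted in its first three markers.
rng = np.random.default_rng(0)
sample_a = rng.normal(size=(150, 10))
sample_b = rng.normal(size=(150, 10))
sample_b[:, :3] += 2.0

high = np.vstack([sample_a, sample_b])
labels = np.array(["a"] * 150 + ["b"] * 150)
low = high[:, :2]  # stand-in for a joint t-SNE/UMAP embedding of both samples

statistic, pvalue = cross_entropy_test(high, low, labels)
print(f"KS statistic = {statistic:.3f}, p-value = {pvalue:.3g}")
```

Because the KS statistic is bounded between 0 and 1, the pairwise statistics from several samples could in principle be collected into a distance matrix and passed to a hierarchical clustering routine to build the dendrogram the abstract mentions; that step is omitted here.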