Implications of Peak Selection in the Interpretation of Unsupervised Mass Spectrometry Imaging Data Analyses

Bibliographic details
Published in:	Analytical chemistry (Washington) 2021-02, Vol.93 (4), p.2309-2316
Main authors:	Murta, Teresa; Steven, Rory T; Nikula, Chelsea J; Thomas, Spencer A; Zeiger, Lucas B; Dexter, Alex; Elia, Efstathios A; Yan, Bin; Campbell, Andrew D; Goodwin, Richard J. A; Takáts, Zoltan; Sansom, Owen J; Bunch, Josephine
Format:	Article
Language:	English
Subjects:	Animal models; Chemistry; Cluster analysis; Clustering; Colorectal cancer; Colorectal carcinoma; Complexity; Data analysis; Data processing; Embedding; Genetic engineering; Hypotheses; Image segmentation; Ions; Learning algorithms; Life Sciences; Machine learning; Mass spectrometry; Mass spectroscopy; Scientific imaging; Spatial data; Spectroscopy; Vector quantization
Online access:	Full text

Description
Abstract:	Mass spectrometry imaging can produce large amounts of complex spectral and spatial data. Such data sets are often analyzed with unsupervised machine learning approaches, which aim at reducing their complexity and facilitating their interpretation. However, choices made during data processing can impact the overall interpretation of these analyses. This work investigates the impact of the choices made at the peak selection step, which often occurs early in the data processing pipeline. The discussion is done in terms of visualization and interpretation of the results of two commonly used unsupervised approaches: t-distributed stochastic neighbor embedding and k-means clustering, which differ in nature and complexity. Criteria considered for peak selection include those based on hypotheses (exemplified herein in the analysis of metabolic alterations in genetically engineered mouse models of human colorectal cancer), particular molecular classes, and ion intensity. The results suggest that the choices made at the peak selection step have a significant impact on the visual interpretation of the results of either dimensionality reduction or clustering techniques and consequently on any downstream analysis that relies on these. Of particular significance, the results of this work show that while using the most abundant ions can result in interesting structure-related segmentation patterns that correlate well with histological features, using a smaller number of ions specifically selected based on prior knowledge about the biochemistry of the tissues under investigation can result in an easier-to-interpret, potentially more valuable, hypothesis-confirming result. Findings presented will help researchers understand and better utilize unsupervised machine learning approaches to mine high-dimensionality data.
ISSN:	0003-2700; 1520-6882
DOI:	10.1021/acs.analchem.0c04179
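
The abstract's central point, that the peak selection step can change what a clustering shows, can be illustrated with a small sketch. The example below is not the authors' pipeline: it uses a synthetic pixel-by-ion intensity matrix, made-up m/z values, hypothetical helper names (select_top_intensity, select_by_mz, kmeans, agreement), and a plain NumPy implementation of Lloyd's k-means in place of whatever software the study used. The t-SNE step is omitted. In the synthetic data, two abundant ions follow a "structure" pattern and two low-abundance ions follow an unrelated "state" pattern, so the two selection criteria should lead to different segmentations.

```python
import numpy as np

def select_top_intensity(X, n):
    """Column indices of the n ions with the highest mean intensity."""
    return np.argsort(X.mean(axis=0))[::-1][:n]

def select_by_mz(mz, targets, tol=0.01):
    """Column indices of ions whose m/z lies within tol of any target value."""
    mz = np.asarray(mz, dtype=float)
    targets = np.asarray(targets, dtype=float)
    hits = (np.abs(mz[:, None] - targets[None, :]) <= tol).any(axis=1)
    return np.flatnonzero(hits)

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def agreement(a, b):
    """Fraction of pixels where two 2-cluster labelings agree, ignoring label order."""
    same = float(np.mean(np.asarray(a) == np.asarray(b)))
    return max(same, 1.0 - same)

# Synthetic data (assumption): 40 pixels, 4 ions with hypothetical m/z values.
mz = [100.0, 200.0, 300.0, 400.0]
structure = np.array([0] * 20 + [1] * 20)   # e.g. two histological regions
state = np.array([0, 1] * 20)               # e.g. two metabolic states
X = np.column_stack([
    1000 + 500 * structure,   # abundant ion, follows structure
    900 - 400 * structure,    # abundant ion, follows structure
    10 + 8 * state,           # low-abundance ion, follows state
    12 - 6 * state,           # low-abundance ion, follows state
]).astype(float)
X += np.random.default_rng(0).normal(0.0, 0.5, size=X.shape)

# Criterion 1: keep the most abundant ions.
top_idx = select_top_intensity(X, 2)
labels_top = kmeans(X[:, top_idx], k=2)

# Criterion 2: keep ions named by a (hypothetical) prior hypothesis.
target_idx = select_by_mz(mz, targets=[300.0, 400.0])
labels_target = kmeans(X[:, target_idx], k=2)

print("top-intensity clusters vs structure:", agreement(labels_top, structure))
print("top-intensity clusters vs state:    ", agreement(labels_top, state))
print("targeted clusters vs state:         ", agreement(labels_target, state))
```

In this toy setting the intensity-based selection recovers the structural split and is uninformative about the state split, while the targeted selection recovers the state split. Real data are far noisier, and intensities would normally be normalized before clustering, so the sketch only shows the direction of the effect.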