Exploring the scores: Procrustes analysis for comprehensive exploration of multivariate data
Published in: | Chemometrics and intelligent laboratory systems 2023-07, Vol.238, p.104841, Article 104841 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Exploratory data analysis (EDA) through the projection of multivariate data into spaces of low dimensionality, using methods such as principal components analysis (PCA), is at the core of chemometric applications in many fields, including metabolomics, biomarker discovery, food authentication, and many others. In addition to revealing underlying class structures in the data, unsupervised EDA methods have become a de facto method of confirmatory analysis for object classification (e.g., by health status or provenance), especially for small sample sizes, because they are not plagued by the overfitting problems that often accompany supervised methods. However, the characteristics of the scores plots for EDA projection methods are often highly dependent on the data analysis options chosen, such as the type of preprocessing used, the projection method employed, the variables selected and (in the case of multiblock data) how the data are combined. The combinations of these parameters can lead to hundreds of different scores plots that need to be manually assessed for results that are interesting to the researcher. The present work is intended to expedite this process through a relational analysis of multiple results, using Procrustes analysis to compare projections and applying hierarchical clustering to summarize the results in the form of a dendrogram. The software developed, ScorXplor, allows projections to be quickly assessed for their similarity and quality, with interactive display of scores plots for visual evaluation. Moreover, the approach provides a better understanding of the roles of, and relationships among, the various analysis options (preprocessing, analysis tools, etc.). The method is demonstrated using multiblock spectral data (UV–visible, near-infrared, mid-infrared) for flavored olive oils from different regions of Italy, implementing different preprocessing and fusion options, and applying PCA and maximum likelihood PCA as projection methods.
• Interactive visualization tool for multiple results of multivariate data. • Procrustes rotation as a relational analysis to compare the quality of projections. • Quick quality evaluation of various analysis options. • Simple dendrogram interpretation and interactive viewing of scores plots. |
---|---|
ISSN: | 0169-7439 1873-3239 |
DOI: | 10.1016/j.chemolab.2023.104841 |
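
The workflow described in the summary (project the data under several analysis options, compare the resulting scores with Procrustes analysis, then cluster the comparisons) can be illustrated with a minimal Python sketch. This is not the ScorXplor implementation: the synthetic data, the three preprocessing options, the helper names `pca_scores` and `procrustes_distance_matrix`, and the choice of average linkage are all assumptions made for illustration. The disparity is taken from SciPy's `scipy.spatial.procrustes`, which centers and scales both configurations before finding the best rotation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial import procrustes
from scipy.spatial.distance import squareform


def pca_scores(X, n_components=2):
    """Return PCA scores of the column-centered matrix X, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def procrustes_distance_matrix(score_sets):
    """Pairwise Procrustes disparity between scores matrices (hypothetical helper)."""
    n = len(score_sets)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            _, _, disparity = procrustes(score_sets[i], score_sets[j])
            D[i, j] = D[j, i] = disparity
    return D


# Synthetic data (assumption): 30 samples in 3 classes, 50 variables.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
centers = rng.normal(scale=3.0, size=(3, 50))
X = centers[labels] + rng.normal(size=(30, 50)) + 10.0

# Illustrative preprocessing options; pca_scores mean-centers in every case.
options = {
    "mean-center": X,
    "autoscale": X / X.std(axis=0, ddof=1),
    "snv": (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True),
}
scores = {name: pca_scores(Xp) for name, Xp in options.items()}

# A rotated, scaled and shifted copy of one projection: Procrustes analysis
# should treat it as equivalent to the original (disparity close to zero).
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
scores["mean-center (rotated)"] = 2.5 * scores["mean-center"] @ R + 4.0

names = list(scores)
D = procrustes_distance_matrix([scores[k] for k in names])

# Hierarchical clustering of the projections from their pairwise disparities.
Z = linkage(squareform(D), method="average")
groups = fcluster(Z, t=2, criterion="maxclust")

for name, row, g in zip(names, D, groups):
    print(f"{name:24s} cluster {g}  disparities {np.round(row, 3)}")
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram(Z, labels=names)` to draw the tree (plotting requires matplotlib). Projections that differ only by rotation, scaling or translation merge at a height near zero, whereas options that change the relative positions of the samples join higher up.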