Statistical considerations and database limitations in NMR-based metabolic profiling studies
Introduction Interpretation and analysis of NMR-based metabolic profiling studies is limited by substantially incomplete commercial and academic databases. Statistical significance tests, including p-values, VIP scores, AUC values and FC values, can be largely inconsistent. Data normalization prior...
Gespeichert in:
Veröffentlicht in: | Metabolomics 2023-06, Vol.19 (7), p.64, Article 64 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Introduction
Interpretation and analysis of NMR-based metabolic profiling studies is limited by substantially incomplete commercial and academic databases. Statistical significance tests, including p-values, VIP scores, AUC values and FC values, can be largely inconsistent. Data normalization prior to statistical analysis can cause erroneous outcomes.
Objectives
The objectives were (1) to quantitatively assess consistency among p-values, VIP scores, AUC values and FC values in representative NMR-based metabolic profiling datasets, (2) to assess how data normalization can impact statistical significance outcomes, (3) to determine resonance peak assignment completion potential using commonly used databases and (4) to analyze intersection and uniqueness of metabolite space in these databases.
Methods
P-values, VIP scores, AUC values and FC values, and their dependence on data normalization, were determined in orthotopic mouse model of pancreatic cancer and two human pancreatic cancer cell lines. Completeness of resonance assignments were evaluated using Chenomx, the human metabolite database (HMDB) and the COLMAR database. The intersection and uniqueness of the databases was quantified.
Results
P-values and AUC values were strongly correlated compared to VIP or FC values. Distributions of statistically significant bins depended strongly on whether or not datasets were normalized. 40–45% of peaks had either no or ambiguous database matches. 9–22% of metabolites were unique to each database.
Conclusions
Lack of consistency in statistical analyses of metabolomics data can lead to misleading or inconsistent interpretation. Data normalization can have large effects on statistical analysis and should be justified. About 40% of peak assignments remain ambiguous or impossible with current databases. 1D and 2D databases should be made consistent to maximize metabolite assignment confidence and validation. |
---|---|
ISSN: | 1573-3890 1573-3882 1573-3890 |
DOI: | 10.1007/s11306-023-02027-5 |