Performance evaluation of different chemometrics formalisms used for prostate cancer diagnosis by NMR-based metabolomics

Bibliographic details
Published in: Metabolomics 2023-12, Vol. 20 (1), Article 8
Main authors: Oliveira, Márcio Felipe; de Albuquerque Neto, Moacir Cavalcante; Leite, Thiago Siqueira; Alves, Paulo André Araújo; Lima, Salvador Vilar Correia; Silva, Ricardo Oliveira
Format: Article
Language: English
Description
Abstract:
Introduction: In general, two characteristics are always present in NMR-based metabolomics studies: (1) they are assays that aim to classify samples into different groups, and (2) the number of samples is smaller than the number of features (chemical shifts). Imbalanced datasets are also common, owing to the sampling method and/or the inclusion criteria. These conditions can cause overfitting, which appropriate feature selection and classification methods can help to avoid.
Objectives: To investigate the performance of metabolomics models built by combining feature selectors (or no feature selection) with classification algorithms, and to use the best-performing model as an NMR-based metabolomic method for prostate cancer diagnosis.
Methods: We evaluated the performance of NMR-based metabolomics models for prostate cancer diagnosis using seven feature selectors and five classification formalisms. We also built metabolomics models without feature selection. Thirty-eight volunteers with a positive diagnosis of prostate cancer and twenty-three healthy volunteers were enrolled in this study.
Results: The thirty-eight models obtained were evaluated using AUROC, accuracy, sensitivity, specificity, and kappa index values. The best result was obtained when a Genetic Algorithm was used with Linear Discriminant Analysis, giving 0.92 sensitivity, 0.83 specificity, and 0.88 accuracy.
Conclusion: The results show that choosing a proper feature selection method, classification model, and resampling method can avoid overfitting in a small metabolomic dataset. Furthermore, this approach would decrease the number of biopsies and optimize patient follow-up. ¹H NMR-based metabolomics promises to be a non-invasive tool for prostate cancer diagnosis.
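The abstract ranks models by AUROC, accuracy, sensitivity, specificity, and the kappa index. As a minimal sketch, and not the authors' code, the snippet below shows how these metrics are conventionally computed for a binary cancer/healthy classifier, assuming "kappa index" refers to Cohen's kappa. The names `Confusion`, `metrics`, and `auroc` are hypothetical, and the counts and scores in the usage example are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    """Binary confusion matrix (hypothetical helper); positive = prostate cancer, negative = healthy."""
    tp: int  # cancer samples classified as cancer
    fn: int  # cancer samples classified as healthy
    fp: int  # healthy samples classified as cancer
    tn: int  # healthy samples classified as healthy


def metrics(c: Confusion) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Cohen's kappa (assumed meaning of 'kappa index')."""
    n = c.tp + c.fn + c.fp + c.tn
    sensitivity = c.tp / (c.tp + c.fn)  # true-positive rate
    specificity = c.tn / (c.tn + c.fp)  # true-negative rate
    accuracy = (c.tp + c.tn) / n        # observed agreement p_o
    # Chance agreement p_e: predicted marginal times actual marginal, summed over both classes
    p_e = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts and scores, NOT the study's results
    print(metrics(Confusion(tp=9, fn=1, fp=2, tn=8)))
    print(auroc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

With the hypothetical counts above, sensitivity is 0.9, specificity 0.8, accuracy 0.85, and kappa is approximately 0.70; the AUROC of the toy scores is 5/6. Kappa is reported alongside accuracy because, on an imbalanced dataset, a classifier that favours the majority class can reach a high accuracy while its chance-corrected agreement stays low.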
ISSN: 1573-3890, 1573-3882
DOI: 10.1007/s11306-023-02067-x