Machine learning prediction for constructing a universal multidimensional information library of Panax saponins (ginsenosides)

Bibliographic details
Published in: Food chemistry, 2024-05, Vol. 439, Article 138106
Main authors: Wang, Hongda, Zhang, Lin, Li, Xiaohang, Sun, Mengxiao, Jiang, Meiting, Shi, Xiaojian, Xu, Xiaoyan, Ding, Mengxiang, Chen, Boxue, Yu, Heshui, Li, Zheng, Guo, Dean, Yang, Wenzhi
Format: Article
Language: English
Online access: Full text
Description
Summary:
• A multidimensional information library (GinMIL) was established based on a GBM model.
• GinMIL included 4D structure information (tR, MS, MS/MS, CCS) for 579 ginsenosides.
• GinMIL improved the accuracy of isomer identification to ca. 88%.

Accurate characterization of ginsenosides in Panax herbs is challenging because of the many isomers and the lack of sufficient reference compounds. More structural information could help differentiate ginsenosides and their isomers, enabling more accurate identification. Based on the Vion™ ion-mobility high-resolution LC-MS platform, a multidimensional information library for ginsenosides, namely GinMIL, was established by predicting retention time (tR) and collision cross section (CCS) through machine learning. Robustness validation experiments showed that tR and CCS were suitable for database construction. Among the three machine learning models tested, the gradient boosting machine (GBM) exhibited the best prediction performance. GinMIL included multidimensional information (m/z, molecular formula, tR, CCS, and some MS/MS fragments) for 579 known ginsenosides. Accuracy in identifying ginsenosides from diverse ginseng products was greatly improved by a unique LC-MS approach combined with searching GinMIL, demonstrating a universal Panax saponins library constructed on a hierarchical design. GinMIL could improve the accuracy of isomer identification by approximately 88%.
ISSN: 0308-8146, 1873-7072
DOI: 10.1016/j.foodchem.2023.138106