Machine learning framework for gut microbiome biomarkers discovery and modulation analysis in large-scale obese population

The gut microbiome has proven to be an important factor affecting obesity; however, it remains a challenge to identify consistent biomarkers across geographic locations and perform precisely targeted modulation for obese individuals. This study proposed a systematic machine learning framework and ap...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:BMC genomics 2022-12, Vol.23 (1), p.850-850, Article 850
Hauptverfasser: Liu, Yaoliang, Zhu, Jinlin, Wang, Hongchao, Lu, Wenwei, Lee, Yuan Kun, Zhao, Jianxin, Zhang, Hao
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The gut microbiome has proven to be an important factor affecting obesity; however, it remains a challenge to identify consistent biomarkers across geographic locations and perform precisely targeted modulation for obese individuals. This study proposed a systematic machine learning framework and applied it to 870 human stool metagenomes across five countries to obtain comprehensive regional shared biomarkers and conduct a personalized modulation analysis. In our pipeline, a heterogeneous ensemble feature selection diagram is first developed to determine an optimal subset of biomarkers through the aggregation of multiple techniques. Subsequently, a deep reinforcement learning method was established to alter the targeted composition to the desired healthy target. In this manner, we can realize personalized modulation by counterfactual inference. Consequently, a total of 42 species were identified as regional shared biomarkers, and they showed good performance in distinguishing obese people from the healthy group (area under curve (AUC) =0.85) when demonstrated on validation datasets. In addition, by pooling all counterfactual explanations, we found that Akkermansia muciniphila, Faecalibacterium prausnitzii, Prevotella copri, Bacteroides dorei, Bacteroides eggerthii, Alistipes finegoldii, Alistipes shahii, Eubacterium sp. _CAG_180, and Roseburia hominis may be potential broad-spectrum targets with consistent modulation in the multi-regional obese population. This article shows that based on our proposed machine-learning framework, we can obtain more comprehensive and accurate biomarkers and provide modulation analysis for the obese population. Moreover, our machine-learning framework will also be very useful for other researchers to further obtain biomarkers and perform counterfactual modulation analysis in different diseases.
ISSN:1471-2164
1471-2164
DOI:10.1186/s12864-022-09087-2