A Multioutput Regression Model using Stacked Ensemble Machine Learning Framework for Predicting Disease-Specific Microbial Characteristics


Bibliographic details
Main authors: Siddalingappa, Rashmi, Lopes S, Bruno, Gornale, Shivanand, S, Deepa, Kothandan, Gugan
Format: Dataset
Language: English
Description
Summary: Omics methods such as metagenomics and metabolomics have shown the importance of the human gut microbiome in whole-body health and disease. These methods produce large volumes of data that require machine learning techniques to process. However, limitations such as small sample sizes, inconsistent protocols, and uneven label distributions hinder reproducibility and clinical translation, and quite often result in skewed or false associations between microbes and disease. To deal with these issues, we developed a microbial characteristics-based framework for disease prediction using the GMrepo dataset, which included 70335 samples. The dataset includes demographic details of the patients, such as age, BMI, country, and gender; these, together with the name of the targeted disease, were used as input features for the model, while the predicted bacteria and their relative abundance were the outputs. A stacked ensemble multi-model regressor was created using Random Forest and XGBoost as base and meta learners, respectively. The results were evaluated using regression metrics, namely Mean Squared Error (MSE), Mean Absolute Error (MAE), R squared (R2), Mean Absolute Percentage Error (MAPE), and Symmetric MAPE (sMAPE), obtaining satisfactory results. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were applied to balance model fit against complexity. This approach makes it possible to fully automate the process of finding microbial features for disease prediction by overcoming the critical issues of reproducibility and scalability. Moving microbiome research from observational association studies to efficacy studies can further the clinical applicability of this research and contribute to the revolution in precision medicine.
DOI:10.5281/zenodo.14562393
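
The evaluation metrics named in the summary can be illustrated with a minimal, dependency-free sketch. It is not taken from the dataset or the authors' code. The function name `regression_metrics`, the `n_params` argument, the sMAPE variant (absolute error divided by the mean of the absolute true and predicted values), and the least-squares forms of AIC and BIC (which assume Gaussian errors and drop additive constants) are all assumptions made here for illustration; the record does not state which definitions were used.

```python
import math


def regression_metrics(y_true, y_pred, n_params):
    """Compute MSE, MAE, R2, MAPE, sMAPE, AIC and BIC for one output column.

    Assumptions (not taken from the source record):
      * y_true contains no zeros (MAPE is undefined otherwise),
      * y_true is not constant (R2 is undefined otherwise),
      * predictions are not perfect (the log of a zero residual sum fails),
      * AIC/BIC use the Gaussian least-squares form n*ln(RSS/n) + penalty,
      * n_params is a hypothetical count of fitted model parameters.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    rss = sum(e * e for e in errors)            # residual sum of squares
    mse = rss / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    tss = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - rss / tss

    # Percentage errors, reported on a 0-100 scale.
    mape = 100.0 / n * sum(abs(e) / abs(t) for e, t in zip(errors, y_true))
    smape = 100.0 / n * sum(
        abs(t - p) / ((abs(t) + abs(p)) / 2.0)
        for t, p in zip(y_true, y_pred)
    )

    # Information criteria: lower values indicate a better fit/complexity trade-off.
    aic = n * math.log(rss / n) + 2 * n_params
    bic = n * math.log(rss / n) + n_params * math.log(n)

    return {"mse": mse, "mae": mae, "r2": r2, "mape": mape,
            "smape": smape, "aic": aic, "bic": bic}


if __name__ == "__main__":
    # Made-up numbers for illustration only; not values from the dataset.
    observed = [2.0, 4.0, 6.0, 8.0]
    predicted = [1.0, 5.0, 5.0, 9.0]
    for name, value in regression_metrics(observed, predicted, n_params=2).items():
        print(f"{name}: {value:.4f}")
```

For a multioutput model such as the one described, a function like this would be applied once per output column (one per predicted bacterium) and the results averaged or reported separately. Relative abundances that are exactly zero would need separate handling before MAPE is computed.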