GEMimp: An Accurate and Robust Imputation Method for Microbiome Data Using Graph Embedding Neural Network
Published in: | Journal of molecular biology 2024-12, Vol.436 (23), p.168841, Article 168841 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: |
• The analysis of microbiome data is frequently compromised by inherent sparsity issues, characterized by a substantial presence of observed zeros.
• We introduce GEMimp, an innovative imputation method designed to infuse robustness into microbiome data analysis.
• GEMimp leverages the node2vec algorithm, which incorporates both Breadth-First Search (BFS) and Depth-First Search (DFS) strategies in its random-walk sampling process.
• GEMimp shows notable proficiency in identifying significant taxa, enhancing the detection of disease-related taxa.
• These findings collectively highlight the strong effectiveness of GEMimp, allowing for better analysis of microbial data.
Microbiome research has increasingly underscored the link between microbial composition and human health, with numerous studies establishing a strong correlation between microbiome characteristics and various diseases. However, the analysis of microbiome data is frequently compromised by inherent sparsity, characterized by a substantial presence of observed zeros. These zeros not only skew the abundance distribution of microbial species but also undermine the reliability of scientific conclusions drawn from such data. To address this challenge, we introduce GEMimp, an imputation method designed to make microbiome data analysis more robust. GEMimp leverages the node2vec algorithm, which incorporates both Breadth-First Search (BFS) and Depth-First Search (DFS) strategies in its random-walk sampling process. This approach enables GEMimp to learn low-dimensional representations of each taxonomic unit, from which their similarity networks can be reconstructed accurately.
We compare GEMimp against state-of-the-art imputation methods including SAVER, MAGIC and mbImpute. The results show that GEMimp outperforms these methods, achieving the highest Pearson correlation coefficient with the original raw dataset. Furthermore, GEMimp shows notable proficiency in identifying significant taxa, enhancing the detection of disease-related taxa and effectively mitigating the impact of sparsity on both simulated and real-world datasets, such as those pertaining to Type 2 Diabetes (T2D) and Colorectal Cancer (CRC). These findings collectively highlight the strong effectiveness of GEMimp, allowing for better analysis of microbial data. With alleviation of sparsity issues, it could be gr |
---|---|
ISSN: | 0022-2836 1089-8638 |
DOI: | 10.1016/j.jmb.2024.168841 |
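The abstract notes that node2vec blends BFS-like and DFS-like exploration in its random walks. The sketch below illustrates that general idea on a toy, unweighted similarity graph. It is not GEMimp's implementation: the function names, the taxon labels, and the values chosen for the return parameter `p` and in-out parameter `q` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order biased walk in the style of node2vec.

    Small p favours stepping back to the previous node; small q favours
    moving away from it (DFS-like), while large q keeps the walk close
    to the previous node's neighbourhood (BFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility with a seed
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay near prev (BFS-like)
            else:
                weights.append(1.0 / q)   # move outward (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical taxon similarity graph, for illustration only.
toy_edges = [("taxon_A", "taxon_B"), ("taxon_B", "taxon_C"),
             ("taxon_C", "taxon_A"), ("taxon_C", "taxon_D")]
toy_adj = build_graph(toy_edges)
print(node2vec_walk(toy_adj, "taxon_A", length=8, p=0.5, q=2.0,
                    rng=random.Random(0)))
```

In a full node2vec pipeline, many such walks per node would be fed to a skip-gram model to obtain the low-dimensional embeddings; that training step is omitted here.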
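The abstract reports imputation quality as a Pearson correlation coefficient against the original raw dataset. The record does not describe the exact evaluation protocol, so the following is only a sketch under the assumption that both abundance tables are flattened and compared entry by entry; `pearson_r` and `flatten` are hypothetical helper names, and the small matrices are made-up placeholders rather than data from the paper.

```python
import math


def flatten(matrix):
    """Flatten a samples-by-taxa table (list of rows) into one list."""
    return [value for row in matrix for value in row]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


# Placeholder tables: rows are samples, columns are taxa.
reference = [[5.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
imputed = [[4.8, 0.4, 2.1], [3.2, 0.9, 0.3]]
print(pearson_r(flatten(reference), flatten(imputed)))
```

A value closer to 1 would indicate that the imputed table tracks the reference more closely under this assumed comparison.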