GEMimp: An Accurate and Robust Imputation Method for Microbiome Data Using Graph Embedding Neural Network

Bibliographic details
Published in: Journal of molecular biology 2024-12, Vol. 436 (23), p. 168841, Article 168841
Main authors: Sun, Ziwei; Song, Kai
Format: Article
Language: English
Description
Summary:
• The analysis of microbiome data is frequently compromised by inherent sparsity issues, characterized by a substantial presence of observed zeros.
• We introduce GEMimp, an innovative imputation method designed to infuse robustness into microbiome data analysis.
• GEMimp leverages the node2vec algorithm, which incorporates both Breadth-First Search (BFS) and Depth-First Search (DFS) strategies in its random walks sampling process.
• GEMimp shows notable proficiency in identifying significant taxa, enhancing the detection of disease-related taxa.
• These findings collectively highlight the strong effectiveness of GEMimp, allowing for better analysis on microbial data.

Microbiome research has increasingly underscored the profound link between microbial compositions and human health, with numerous studies establishing a strong correlation between microbiome characteristics and various diseases. However, the analysis of microbiome data is frequently compromised by inherent sparsity issues, characterized by a substantial presence of observed zeros. These zeros not only skew the abundance distribution of microbial species but also undermine the reliability of scientific conclusions drawn from such data. Addressing this challenge, we introduce GEMimp, an innovative imputation method designed to infuse robustness into microbiome data analysis. GEMimp leverages the node2vec algorithm, which incorporates both Breadth-First Search (BFS) and Depth-First Search (DFS) strategies in its random walks sampling process. This approach enables GEMimp to learn nuanced, low-dimensional representations of each taxonomic unit, facilitating the reconstruction of their similarity networks with unprecedented accuracy. Our comparative analysis pits GEMimp against state-of-the-art imputation methods including SAVER, MAGIC and mbImpute. The results unequivocally demonstrate that GEMimp outperforms its counterparts by achieving the highest Pearson correlation coefficient when compared to the original raw dataset. Furthermore, GEMimp shows notable proficiency in identifying significant taxa, enhancing the detection of disease-related taxa and effectively mitigating the impact of sparsity on both simulated and real-world datasets, such as those pertaining to Type 2 Diabetes (T2D) and Colorectal Cancer (CRC). These findings collectively highlight the strong effectiveness of GEMimp, allowing for better analysis on microbial data. With alleviation of sparsity issues, it could be gr
ISSN: 0022-2836, 1089-8638
DOI: 10.1016/j.jmb.2024.168841
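
Illustrative note: the summary states that GEMimp relies on node2vec, whose random walks interpolate between BFS-like and DFS-like exploration. The sketch below shows how such a second-order biased walk is commonly written. It is not the authors' implementation. The function `biased_walk`, the graph `toy_graph`, the taxon names, and the edge weights are all hypothetical, and the record does not say how GEMimp constructs its taxon graph. The return parameter `p` and the in-out parameter `q` follow the usual node2vec convention.

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style second-order random walk.

    graph  : dict mapping node -> {neighbor: edge weight} (undirected)
    p      : return parameter; a large p discourages stepping straight back
    q      : in-out parameter; q > 1 keeps the walk local (BFS-like),
             q < 1 pushes it outward (DFS-like)
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so use plain edge weights.
            weights = [graph[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbors:
                w = graph[cur][x]
                if x == prev:             # distance 0 from prev: return
                    weights.append(w / p)
                elif x in graph[prev]:    # distance 1 from prev: stay close
                    weights.append(w)
                else:                     # distance 2 from prev: move outward
                    weights.append(w / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Hypothetical toy similarity graph between five taxa (weights are made up).
toy_graph = {
    "taxon_A": {"taxon_B": 0.9, "taxon_C": 0.4},
    "taxon_B": {"taxon_A": 0.9, "taxon_C": 0.7},
    "taxon_C": {"taxon_A": 0.4, "taxon_B": 0.7, "taxon_D": 0.5},
    "taxon_D": {"taxon_C": 0.5, "taxon_E": 0.8},
    "taxon_E": {"taxon_D": 0.8},
}

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(biased_walk(toy_graph, "taxon_A", 6, p=1.0, q=0.5, rng=rng))
```

In standard node2vec, many such walks are collected per node and passed to a skip-gram model to learn the low-dimensional embeddings. Whether GEMimp follows exactly this pipeline is not stated in the summary.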