A posterior probability based Bayesian method for single-cell RNA-seq data imputation

Bibliographic details
Published in: Methods (San Diego, Calif.), 2023-08, Vol. 216, p. 21-38
Main authors: Chen, Siqi; Zheng, Ruiqing; Tian, Luyi; Wu, Fang-Xiang; Li, Min
Format: Article
Language: English
Description
Summary:
• We propose BayesImpute, a novel statistical algorithm to impute scRNA-seq data. BayesImpute first identifies likely dropouts and then imputes only these values, which preserves the true biological signal and reduces the introduction of unwanted bias during imputation.
• Unlike other statistical imputation methods, the identification process of BayesImpute is straightforward and avoids parameter estimation, which increases its efficiency and user-friendliness in practice. In addition, BayesImpute takes advantage of cell-to-cell relationships and employs the Bayes approach to recover true biological signals, making the obtained estimates interpretable.
• Using simulated and real scRNA-seq datasets, we demonstrate that BayesImpute can effectively identify dropouts, reduce the introduction of false positive signals, better recover the missing biological signals, and improve the reliability of some downstream analyses. More importantly, BayesImpute outperforms other statistical model-based methods in terms of computational running time, memory usage, and scalability.

Single-cell RNA-sequencing (scRNA-seq) data contain a large proportion of zeros. Many of these zeros are dropout events, which impede downstream data analyses. We propose BayesImpute to infer and impute dropouts in scRNA-seq data. Using the expression rate and coefficient of variation of the genes within each cell subpopulation, BayesImpute first determines likely dropouts, then constructs the posterior distribution for each gene and uses the posterior mean to impute the dropout values. Experiments on simulated and real datasets show that BayesImpute can effectively identify dropout events and reduce the introduction of false positive signals. Additionally, BayesImpute successfully recovers the true expression levels of missing values, restores the gene-to-gene and cell-to-cell correlation coefficients, and preserves the biological information found in bulk RNA-seq data. Furthermore, BayesImpute improves the clustering and visualization of cell subpopulations and the identification of differentially expressed genes. We further demonstrate that, in comparison to other statistical imputation methods, BayesImpute is scalable and fast with minimal memory usage.
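The two-step idea described in the summary (flag likely dropouts from per-gene expression rate and coefficient of variation within one cell subpopulation, then fill only those entries with a posterior mean) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `impute_subpopulation`, the thresholds `rate_min` and `cv_max`, and the Gamma-Poisson conjugate model with its `prior_shape` and `prior_rate` parameters are all assumptions chosen for illustration; the summary does not state the actual decision rule or prior.

```python
import numpy as np

def impute_subpopulation(counts, rate_min=0.5, cv_max=1.0,
                         prior_shape=1.0, prior_rate=1.0):
    """Illustrative sketch only; thresholds and prior are assumed values.

    counts: genes x cells array of non-negative counts for ONE cell subpopulation.
    Returns the imputed matrix and a boolean mask of the entries that were imputed.
    """
    counts = np.asarray(counts, dtype=float)

    # Per-gene expression rate: fraction of cells with a non-zero count.
    expr_rate = (counts > 0).mean(axis=1)

    # Per-gene coefficient of variation (std / mean); infinite when the mean is 0.
    mean = counts.mean(axis=1)
    std = counts.std(axis=1)
    cv = np.divide(std, mean, out=np.full_like(mean, np.inf), where=mean > 0)

    # Assumed rule: zeros in genes that are widely and stably expressed
    # in this subpopulation are treated as likely dropouts.
    likely_gene = (expr_rate >= rate_min) & (cv <= cv_max)
    dropout_mask = likely_gene[:, None] & (counts == 0)

    # Assumed model: Poisson counts with a Gamma(prior_shape, prior_rate) prior
    # on the gene's rate, updated with the non-zero observations only.
    n_obs = (counts > 0).sum(axis=1)
    sum_obs = counts.sum(axis=1)  # zeros add nothing to the sum
    post_mean = (prior_shape + sum_obs) / (prior_rate + n_obs)

    # Only the flagged entries are replaced; all other values are kept as observed.
    imputed = np.where(dropout_mask, post_mean[:, None], counts)
    return imputed, dropout_mask

# Toy subpopulation: 3 genes x 4 cells.
demo = np.array([[5, 6, 0, 5],
                 [0, 0, 0, 9],
                 [0, 0, 0, 0]])
imputed, mask = impute_subpopulation(demo)
```

In this toy input only the single zero of the first gene is flagged, because that gene is expressed in most cells with low variability; the zeros of the rarely expressed second gene and the silent third gene are left untouched, which mirrors the stated goal of imputing only likely dropouts.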
ISSN: 1046-2023, 1095-9130
DOI: 10.1016/j.ymeth.2023.06.004