Demystifying "drop-outs" in single-cell UMI data

Bibliographic details

Published in:	Genome Biology, 2020-08, Vol. 21 (1), p. 196, Article 196
Authors:	Kim, Tae Hyun, Zhou, Xiang, Chen, Mengjie
Format:	Article
Language:	English
Keywords:	B cells; Bar codes; Binomial distribution; Clustering; Cytotoxicity; Data analysis; data collection; Datasets; Feature selection; Genes; genome; Genomics; Molecular Typing - methods; Noise; pipelines; Population; Sequence Analysis, RNA; Single-Cell Analysis; Software; Statistics

Description
Abstract:	Many existing pipelines for scRNA-seq data apply pre-processing steps such as normalization or imputation to account for excessive zeros or "drop-outs." Here, we extensively analyze diverse UMI data sets to show that clustering should be the foremost step of the workflow. We observe that most drop-outs disappear once cell-type heterogeneity is resolved, while imputing or normalizing heterogeneous data can introduce unwanted noise. We propose a novel framework HIPPO (Heterogeneity-Inspired Pre-Processing tOol) that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering. HIPPO leads to downstream analysis with greater flexibility and interpretability compared to alternatives.
ISSN:	1474-760X, 1474-7596
DOI:	10.1186/s13059-020-02096-y
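The abstract's central observation, that most excess zeros disappear once cell-type heterogeneity is resolved, can be illustrated with a small simulation. The sketch below assumes that UMI counts for one gene within a single cell type are roughly Poisson, so a population with mean mu is expected to have a zero fraction of exp(-mu); pooling cell types with different means pushes the observed zero fraction above that value. The helper names (`simulate_gene`, `zero_proportion_excess`), the parameter values, and the data are all hypothetical, and this is not HIPPO's implementation or its actual test statistic.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_gene(lambdas, cells_per_type, seed=0):
    """Hypothetical helper: simulated UMI counts for one gene, with each cell
    type drawn from its own Poisson mean, concatenated type by type."""
    rng = random.Random(seed)
    counts = []
    for lam in lambdas:
        counts.extend(poisson_sample(lam, rng) for _ in range(cells_per_type))
    return counts


def zero_proportion_excess(counts):
    """Hypothetical helper: return (observed zero fraction, zero fraction expected
    under a single Poisson with the same mean, i.e. exp(-mean), and their difference)."""
    n = len(counts)
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-sum(counts) / n)
    return observed, expected, observed - expected


# One homogeneous population vs. a 50/50 mix of a low- and a high-expressing cell type.
homogeneous = simulate_gene([2.0], 2000, seed=1)
mixed = simulate_gene([0.1, 4.0], 1000, seed=2)

for name, counts in [("homogeneous", homogeneous), ("mixed", mixed)]:
    obs, poisson_zero, excess = zero_proportion_excess(counts)
    print(f"{name:12s} observed={obs:.3f} poisson={poisson_zero:.3f} excess={excess:+.3f}")

# "Resolving heterogeneity": score each cell type separately. The labels are known
# here because the data are simulated; on real data they would have to be estimated.
for i, cluster in enumerate([mixed[:1000], mixed[1000:]]):
    obs, poisson_zero, excess = zero_proportion_excess(cluster)
    print(f"cluster {i}    observed={obs:.3f} poisson={poisson_zero:.3f} excess={excess:+.3f}")
```

In this toy setting the mixed population shows a large positive excess of zeros, while the homogeneous population and each cell type scored on its own stay close to the Poisson expectation. That contrast is the intuition behind using zero proportions to flag heterogeneous genes; how HIPPO turns it into feature selection and iterative clustering is described in the article itself.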