Model-based analysis of sample index hopping reveals its widespread artifacts in multiplexed single-cell RNA-sequencing

Index hopping is the main cause of incorrect sample assignment of sequencing reads in multiplexed pooled libraries. We introduce a statistical model for estimating the sample index-hopping rate in multiplexed droplet-based single-cell RNA-seq data and for probabilistic inference of the true sample o...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Nature communications 2020-06, Vol.11 (1), p.2704-2704, Article 2704
Hauptverfasser: Farouni, Rick, Djambazian, Haig, Ferri, Lorenzo E., Ragoussis, Jiannis, Najafabadi, Hamed S.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Index hopping is the main cause of incorrect sample assignment of sequencing reads in multiplexed pooled libraries. We introduce a statistical model for estimating the sample index-hopping rate in multiplexed droplet-based single-cell RNA-seq data and for probabilistic inference of the true sample of origin of hopped reads. We analyze several datasets and estimate the sample index hopping probability to range between 0.003–0.009, a small number that counter-intuitively gives rise to a large fraction of phantom molecules — the fraction of phantom molecules exceeds 8% in more than 25% of samples and reaches as high as 85% in low-complexity samples. Phantom molecules lead to widespread complications in downstream analyses, including transcriptome mixing across cells, emergence of phantom copies of cells from other samples, and misclassification of empty droplets as cells. We demonstrate that our approach can correct for these artifacts by accurately purging the majority of phantom molecules from the data. Sample index hopping results in various artefacts in multiplexed scRNA-seq experiments. Here, the authors introduce a statistical model to estimate sample index hopping rate in droplet-based scRNA-seq data and show that artifacts can be corrected by purging phantom molecules from the data.
ISSN:2041-1723
2041-1723
DOI:10.1038/s41467-020-16522-z