Barcode identification for single cell genomics

Bibliographic details
Published in:	BMC Bioinformatics 2019-01, Vol. 20 (1), p. 32, Article 32
Main authors:	Tambe, Akshay; Pachter, Lior
Format:	Article
Language:	English
Subjects:	Bar codes; Barcode identification; Barcodes; Base sequence; Bioinformatics; Circularity; Circularization; Clonal deletion; Computational biology; de Bruijn graph; Deoxyribonucleic acid; DNA; DNA barcoding; DNA sequencing; Error correction; Experiments; Gene expression; Genetic research; Genomes; Genomics; Identification; Identification and classification; Insertion; K-mer counting; Ribonucleic acid; RNA; RNA sequencing; Science; Single-cell; Software; Stem cells

Description
Abstract:	Single-cell sequencing experiments use short DNA barcode 'tags' to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult because of the high rates of mismatch and deletion errors that can afflict barcodes. Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows reads to be assigned to consensus fingerprints constructed from k-mers. We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when the number of errors per read is high. The approach is robust to the type of error (mismatch, insertion, deletion) as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
ISSN:	1471-2105
DOI:	10.1186/s12859-019-2612-0
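
The circularization observation in the abstract can be illustrated with a short sketch. This is not the Sircel implementation: the function names, the 12-base barcode, and the choice k = 8 are hypothetical, picked only to show why wrapping a barcode around its end can leave some k-mers error-free when k is large relative to the barcode length.

```python
def linear_kmers(barcode, k):
    """All k-mers of the barcode read as an ordinary linear string."""
    return [barcode[i:i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode, k):
    """All k-mers of the barcode read as a circular string (hypothetical helper)."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Append the first k-1 bases so that windows can wrap around the end.
    extended = barcode + barcode[:k - 1]
    return [extended[i:i + k] for i in range(len(barcode))]


# Hypothetical barcode and an observed copy with one mismatch at index 6 (G -> T).
true_barcode = "ACGTACGGTCAT"
read_barcode = "ACGTACTGTCAT"
k = 8

# Every linear 8-mer of a 12-base barcode covers index 6, so none survive the error.
shared_linear = set(linear_kmers(true_barcode, k)) & set(linear_kmers(read_barcode, k))

# Circular 8-mers starting at indices 7..10 wrap around and avoid index 6.
shared_circular = set(circular_kmers(true_barcode, k)) & set(circular_kmers(read_barcode, k))

print(len(shared_linear), len(shared_circular))  # 0 4
```

In this toy case the erroneous read shares no linear 8-mers with the true barcode but still shares four circular 8-mers, which is the kind of signal that k-mer counting and graph traversal can then build on.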