Counting unique molecular identifiers in sequencing using a multi-type branching process with immigration

Bibliographic details
Published in:	Journal of theoretical biology 2023-02, Vol.558, p.111365-111365, Article 111365
Main authors:	Sagitov, Serik; Ståhlberg, Anders
Format:	Article
Language:	English
Subjects:	Biological Sciences; Biologiska vetenskaper; DNA; Emigration and Immigration; Growing immigration; High-Throughput Nucleotide Sequencing - methods; PCR amplification rate; PCR branching process; Polymerase Chain Reaction - methods; Sequence Analysis, DNA - methods; Sequencing; Tree-bookkeeping; Unique Molecular Identifier
Online access:	Full text

Description
Abstract:	Detection of extremely rare variant alleles, such as tumor DNA, within a complex mixture of DNA molecules is experimentally challenging due to sequencing errors. Barcoding of target DNA molecules in library construction for next-generation sequencing provides a way to identify and bioinformatically remove polymerase-induced errors. During the barcoding procedure involving t consecutive PCR cycles, the DNA molecules become barcoded by Unique Molecular Identifiers (UMIs). Different library construction protocols utilize different values of t. The effect of a larger t and imperfect PCR amplifications in relation to UMI cluster sizes is poorly described. This paper proposes a branching process with growing immigration as a model describing the random outcome of t cycles of PCR barcoding. Our model discriminates between five different amplification rates r_1, r_2, r_3, r_4, r for different types of molecules associated with the PCR barcoding procedure. We study this model by focussing on C_t, the number of clusters of molecules sharing the same UMI, as well as C_t(m), the number of UMI clusters of size m. Our main finding is a remarkable asymptotic pattern valid for moderately large t. It turns out that E(C_t(m))/E(C_t) ≈ 2^{-m} for m = 1, 2, …, regardless of the underlying parameters (r_1, r_2, r_3, r_4, r). The knowledge of the quantities C_t and C_t(m) as functions of the experimental parameters t and (r_1, r_2, r_3, r_4, r) will help the users to draw more adequate conclusions from the outcomes of different sequencing protocols.
• The use of Unique Molecular Identifiers (UMIs) enables error-free sequencing.
• We propose tree-bookkeeping for UMI clusters in PCR-based library construction.
• Our branching process model discriminates between five PCR-amplification rates.
• The number of UMI clusters of size m decreases geometrically with m.
ISSN:	0022-5193 1095-8541
DOI:	10.1016/j.jtbi.2022.111365
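
As a numerical reading of the asymptotic pattern stated in the abstract, the following Python sketch tabulates the approximate shares 2^{-m} of UMI clusters of size m and converts them into expected counts for a given total number of clusters. It only evaluates the stated ratio E(C_t(m))/E(C_t) ≈ 2^{-m}; it does not implement the authors' branching process or the five amplification rates. The function names and the total of 10,000 clusters are hypothetical, chosen for illustration.

```python
def expected_cluster_shares(max_size):
    """Approximate share of UMI clusters of size m, for m = 1..max_size.

    Uses only the ratio quoted in the abstract, E(C_t(m)) / E(C_t) ~ 2^(-m),
    which is described there as valid for moderately large t.
    """
    return {m: 2.0 ** (-m) for m in range(1, max_size + 1)}


def expected_cluster_counts(total_clusters, max_size):
    """Scale the approximate shares by an assumed total number of clusters."""
    shares = expected_cluster_shares(max_size)
    return {m: total_clusters * share for m, share in shares.items()}


if __name__ == "__main__":
    total = 10_000  # hypothetical number of UMI clusters, not from the paper
    for m, count in expected_cluster_counts(total, 6).items():
        print(f"size {m}: share {2.0 ** (-m):.4f}, expected count {count:.1f}")
```

Under this approximation the shares form a geometric sequence with ratio 1/2 that sums to 1 over m = 1, 2, …, so roughly half of all clusters would be singletons, a quarter would have size 2, and so on.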