Characterization of background noise in MiSeq MPS data when sequencing human mitochondrial DNA from various sample sources and library preparation methods
Published in: | Mitochondrion, 2020-05, Vol. 52, pp. 40–55 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | • Distinguishing minor variants from error is essential in reporting mtDNA heteroplasmy. • MiSeq has a low error rate, and most of the error is produced randomly. • Increased error rates were observed with damaged DNA and low-quantity templates. • A large proportion of error positions were observed after a sequence of identical bases. • Error profiles were observed to shift across sample types and library preparations. • An easily implemented bioinformatic pipeline for error estimation is provided.
Improved resolution of massively parallel sequencing (MPS) allows mitochondrial (mt) DNA heteroplasmy to be characterized at levels previously unattainable with traditional sequencing approaches. An essential criterion for reporting heteroplasmy is the ability of the MPS method to distinguish minor sequence variants (MSVs) from system noise, or error. An assessment of the background noise in the MPS method is therefore desirable to identify the point at which reliable data can be reported. Substitution error and sequence-specific error (SSE) were evaluated for a variety of sample types and two library preparation methods. Substitution error rates ranged from 0.18 to 0.49 per 100 nucleotides, with C positions generally having the highest rate of misincorporation. Comparison of error rates across sample types indicated a significant increase for samples with damaged DNA. The positions of error varied across datasets (pairwise concordance 0–68%) but were more consistent within the damaged samples (80–96%). The most commonly observed motif preceding error in forward reads was CCG, while GGT was most common in reverse reads; both are consistent with previous findings. These findings indicate that for datasets containing samples with damaged DNA, reporting thresholds for heteroplasmy may need to be modified, and individual sites with error levels exceeding thresholds should be scrutinized. Collectively, the shifting error profiles observed across the various sample types and library preparation methods demonstrate the need to assess error under each of these conditions. Characterizing the applicable background noise will help ensure that thresholds are set reliably for the detection of true MSVs. |
---|---|
ISSN: | 1567-7249, 1872-8278 |
DOI: | 10.1016/j.mito.2020.02.005 |
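The abstract reports three kinds of measurement: a substitution error rate expressed per 100 nucleotides, a pairwise concordance of error positions between datasets, and the motif immediately preceding error positions. The following is a minimal Python sketch of how such quantities could be computed from per-position base counts. It is not the pipeline distributed with the article. It assumes a single-source sample with no true heteroplasmy, so every non-reference base call counts as error. The function names, the intersection-over-union definition of concordance, the fraction cutoff and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def substitution_error_rate(counts, reference):
    """Mismatched base calls per 100 nucleotides sequenced.

    counts: list of dicts (base -> read count), one per reference position.
    reference: reference/consensus string of the same length.
    ASSUMPTION: every non-reference call is error (no true heteroplasmy).
    """
    total = mismatches = 0
    for ref_base, pos_counts in zip(reference, counts):
        depth = sum(pos_counts.values())
        total += depth
        mismatches += depth - pos_counts.get(ref_base, 0)
    return 100.0 * mismatches / total if total else 0.0


def error_positions(counts, reference, min_fraction):
    """0-based positions whose non-reference fraction exceeds min_fraction."""
    positions = set()
    for i, (ref_base, pos_counts) in enumerate(zip(reference, counts)):
        depth = sum(pos_counts.values())
        if depth and (depth - pos_counts.get(ref_base, 0)) / depth > min_fraction:
            positions.add(i)
    return positions


def pairwise_concordance(position_sets):
    """Percent overlap (intersection over union) for every pair of datasets.

    ASSUMPTION: the article's exact definition of concordance may differ.
    """
    result = {}
    for (name_a, a), (name_b, b) in combinations(position_sets.items(), 2):
        union = a | b
        result[(name_a, name_b)] = 100.0 * len(a & b) / len(union) if union else 100.0
    return result


def preceding_motifs(reference, positions, k=3):
    """Count the k reference bases immediately 5' of each error position.

    Forward strand only; reverse-read motifs would need the reverse complement.
    """
    motifs = Counter()
    for pos in positions:
        if pos >= k:
            motifs[reference[pos - k:pos]] += 1
    return motifs


if __name__ == "__main__":
    # Toy data: 100x coverage, two positions carrying non-reference calls.
    reference = "ACCGTACCGA"
    counts = [{base: 100} for base in reference]
    counts[4] = {"T": 95, "C": 5}
    counts[9] = {"A": 98, "G": 2}

    print(f"{substitution_error_rate(counts, reference):.2f}")  # 0.70
    errors = error_positions(counts, reference, min_fraction=0.01)
    print(sorted(errors))  # [4, 9]
    print(preceding_motifs(reference, errors)["CCG"])  # 2
    print(pairwise_concordance({"run1": errors, "run2": {4}}))
```

In practice the per-position counts would come from a pileup of aligned reads, and the strand of each read would be tracked separately. That separation matters because the abstract reports different preceding motifs for forward reads (CCG) and reverse reads (GGT).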