Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome

Bibliographic details
Published in:	Bioinformatics, 2022-03, Vol. 38 (7), pp. 1781-1787
Main authors:	Marin, Maximillian; Vargas, Roger; Harris, Michael; Jeffrey, Brendan; Epperson, L Elaine; Durbin, David; Strong, Michael; Salfinger, Max; Iqbal, Zamin; Akhundova, Irada; Vashakidze, Sergo; Crudu, Valeriu; Rosenthal, Alex; Farhat, Maha Reda
Format:	Article
Language:	English
Subject headings:	Benchmarking; High-Throughput Nucleotide Sequencing - methods; Humans; Mycobacterium tuberculosis - genetics; Original Papers; Sequence Analysis, DNA - methods; Software; Tuberculosis

Description
Abstract:	Motivation: Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. Results: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (
ISSN:	1367-4803, 1460-2059, 1367-4811
DOI:	10.1093/bioinformatics/btac023
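The abstract reports short-read variant-calling accuracy as recall and precision. The Python below is a minimal sketch of how those two metrics are computed. It assumes that short-read calls are compared against a truth set derived from the matched PacBio long reads. That is an assumption, because the truncated abstract does not describe how the truth set was built. It also assumes each variant is reduced to a hypothetical `(chrom, pos, ref, alt)` tuple that is matched exactly. The `Variant` type and the `benchmark_calls` function are illustrative names, not the authors' code. The F1 score is included only as a common companion metric. Real benchmarks usually normalize variant representation first, for example with haplotype-aware comparison tools such as RTG vcfeval. This sketch does not do that normalization.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record; real pipelines read these from VCF."""
    chrom: str  # reference sequence name, e.g. "NC_000962.3" (H37Rv)
    pos: int    # 1-based position on the reference
    ref: str    # reference allele
    alt: str    # alternate allele


def benchmark_calls(calls: Iterable[Variant],
                    truth: Iterable[Variant]) -> Dict[str, float]:
    """Score query calls against a truth set by exact (chrom, pos, ref, alt) match.

    Assumption: no left-alignment or representation normalization is done,
    so equivalent but differently written indels count as mismatches.
    """
    call_set = set(calls)
    truth_set = set(truth)
    tp = len(call_set & truth_set)   # called and present in the truth set
    fp = len(call_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - call_set)   # present in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0   # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy positions only; not data from the study.
    truth = [Variant("NC_000962.3", p, "A", "G") for p in (100, 200, 300, 400)]
    calls = [Variant("NC_000962.3", p, "A", "G") for p in (100, 200, 300, 500)]
    m = benchmark_calls(calls, truth)
    print(m["TP"], m["FP"], m["FN"])     # 3 1 1
    print(m["precision"], m["recall"])   # 0.75 0.75
```

In this toy example, 3 of the 4 calls are correct and 3 of the 4 true variants are recovered, so precision and recall are both 0.75. A caller that adds many spurious calls lowers precision. A caller that misses variants in repetitive or divergent regions lowers recall.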