Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome

Bibliographic details
Published in: Bioinformatics 2022-03, Vol.38 (7), p.1781-1787
Main authors: Marin, Maximillian; Vargas, Roger; Harris, Michael; Jeffrey, Brendan; Epperson, L Elaine; Durbin, David; Strong, Michael; Salfinger, Max; Iqbal, Zamin; Akhundova, Irada; Vashakidze, Sergo; Crudu, Valeriu; Rosenthal, Alex; Farhat, Maha Reda
Format: Article
Language: English
Description
Summary: Motivation: Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. Results: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/btac023