A synthetic-diploid benchmark for accurate variant-calling evaluation

Bibliographic details
Published in: Nature Methods 2018-08, Vol. 15 (8), p. 595-597
Main authors: Li, Heng; Bloom, Jonathan M.; Farjoun, Yossi; Fleharty, Mark; Gauthier, Laura; Neale, Benjamin; MacArthur, Daniel
Format: Article
Language: English
Description
Summary: Existing benchmark datasets for use in evaluating variant-calling accuracy are constructed from a consensus of known short-variant callers, and they are thus biased toward easy regions that are accessible by these algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two fully homozygous human cell lines, which provides a relatively more accurate and less biased estimate of small-variant-calling error rates in a realistic context. The synthetic-diploid (Syndip) benchmark dataset, constructed from two fully homozygous long-read assemblies, provides more accurate assessments of error rates in small-variant-calling algorithms than existing benchmarks.
ISSN: 1548-7091, 1548-7105
DOI: 10.1038/s41592-018-0054-7