Calling Structural Variants with Confidence from Short-Read Data in Wild Bird Populations

Abstract Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a po...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Genome biology and evolution 2024-04, Vol.16 (4)
Hauptverfasser: David, Gabriel, Bertolotti, Alicia, Layer, Ryan, Scofield, Douglas, Hayward, Alexander, Baril, Tobias, Burnett, Hamish A, Gudmunds, Erik, Jensen, Henrik, Husby, Arild
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Abstract Comprehensive characterization of structural variation in natural populations has only become feasible in the last decade. To investigate the population genomic nature of structural variation, reproducible and high-confidence structural variation callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrows (Passer domesticus). To produce a consensus callset across all samples using short-read data, we compare heuristic-based quality filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of structural variants is important for reducing putative false positives and that the time invested in this step outweighs the potential costs of analyzing short-read–discovered structural variation data sets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the expected population structure from SNPs, whereas variants passing curation did. Combining heuristic-based quality filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence structural variation callsets.
ISSN:1759-6653
1759-6653
DOI:10.1093/gbe/evae049