Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans

Bibliographic details
Published in:	PloS one 2022-12, Vol. 17 (12), p. e0278424
Main authors:	Lesack, Kyle; Mariene, Grace M; Andersen, Erik C; Wasmuth, James D
Format:	Article
Language:	English
Subjects:	Accuracy; Analysis; Animals; Benchmarks; Biology and Life Sciences; Caenorhabditis elegans; Caenorhabditis elegans - genetics; Datasets; DNA sequencing; Evolution; Gene expression; Gene sequencing; Genetic aspects; Genome, Human; Genomes; Genomics; Health aspects; High-Throughput Nucleotide Sequencing; Humans; Nematodes; Nucleotide sequencing; Performance evaluation; Research and Analysis Methods; Structural analysis; Whole Genome Sequencing; Worms

Description
Abstract:	The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of popular tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To evaluate the use of long-read data for the validation of short-read structural variant calls, the agreement between predictions from a short-read ensemble learning method and long-read tools was compared using real and simulated data from Caenorhabditis elegans. The results obtained from simulated data indicate that the best-performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as 'ground truth' in benchmarks.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0278424