Long-read sequence and assembly of segmental duplications


Bibliographic details
Published in: Nature Methods 2019-01, Vol. 16 (1), p. 88-94
Main authors: Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E.
Format: Article
Language: English
Description
Summary: We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level. Segmental Duplication Assembler (SDA) uses long sequence reads to resolve segmental duplications that are collapsed in current genome assemblies. These assemblies correspond in total to the length of an average human chromosome.
ISSN: 1548-7091, 1548-7105
DOI: 10.1038/s41592-018-0236-3
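
Illustration: The graph described in the summary, in which paralogous sequence variants (PSVs) are nodes and reads contribute attraction and repulsion edges, can be sketched on toy data as follows. This is a simplified illustration and not the SDA implementation. The function names (`build_edges`, `cluster_nodes`, `partition_reads`), the input layout (read ID mapped to the base observed at each candidate PSV site), and the toy reads are all hypothetical. The clustering step is a plain union-find merge over net-attractive edges, swapped in for brevity; the record does not state which clustering method SDA uses.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(reads):
    """Score every pair of PSV nodes; a node is a (position, base) tuple.

    reads maps a read ID to {position: base} at candidate PSV sites
    (hypothetical input layout). Returns a dict mapping (node_a, node_b)
    to attraction count minus repulsion count.
    """
    nodes = sorted({item for calls in reads.values() for item in calls.items()})
    score = defaultdict(int)
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue  # two bases at one site are alternatives, never linked
        for calls in reads.values():
            if a[0] not in calls or b[0] not in calls:
                continue  # this read does not span both sites
            has_a = calls[a[0]] == a[1]
            has_b = calls[b[0]] == b[1]
            if has_a and has_b:
                score[(a, b)] += 1  # attraction: read carries both variants
            elif has_a or has_b:
                score[(a, b)] -= 1  # repulsion: read carries exactly one
    return score


def cluster_nodes(score):
    """Merge nodes joined by a net-attractive edge (union-find; an assumption)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), s in score.items():
        root_a, root_b = find(a), find(b)
        if s > 0:
            parent[root_a] = root_b
    return {node: find(node) for node in list(parent)}


def partition_reads(reads):
    """Assign each read to the node cluster that most of its variants belong to."""
    label = cluster_nodes(build_edges(reads))
    groups = defaultdict(set)
    for read_id, calls in reads.items():
        votes = defaultdict(int)
        for node in calls.items():
            if node in label:
                votes[label[node]] += 1
        if votes:
            groups[max(votes, key=votes.get)].add(read_id)
    return sorted(groups.values(), key=sorted)


# Toy data: two paralogs that differ at sites 10, 20 and 30.
toy_reads = {
    "r1": {10: "A", 20: "C"},
    "r2": {20: "C", 30: "G"},
    "r3": {10: "A", 20: "C", 30: "G"},
    "r4": {10: "T", 20: "G"},
    "r5": {20: "G", 30: "A"},
    "r6": {10: "T", 30: "A"},
}

for group in partition_reads(toy_reads):
    print(sorted(group))
# ['r1', 'r2', 'r3']
# ['r4', 'r5', 'r6']
```

Each resulting group of reads would then be assembled separately to yield one paralog. Real long-read data carry sequencing errors, so a practical version would need thresholds on edge weights rather than the simple sign test used here.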