De novo assembly and phasing of a Korean human genome

Bibliographic details
Published in: Nature (London) 2016-10, Vol. 538 (7624), p. 243-247
Main authors: Seo, Jeong-Sun, Rhie, Arang, Kim, Junsoo, Lee, Sangjin, Sohn, Min-Hwan, Kim, Chang-Uk, Hastie, Alex, Cao, Han, Yun, Ji-Young, Kim, Jihye, Kuk, Junho, Park, Gun Hwa, Kim, Juhyeok, Ryu, Hanna, Kim, Jongbum, Roh, Mira, Baek, Jeonghun, Hunkapiller, Michael W., Korlach, Jonas, Shin, Jong-Yeon, Kim, Changhoon
Format: Article
Language: English
Description
Abstract: De novo assembly and phasing of the genome of an individual from Korea using a combination of different sequencing approaches provides a useful population-specific reference genome and represents the most contiguous human genome assembly so far.

A Korean human genome
Jeong-Sun Seo and colleagues report de novo assembly and phasing of the genome of an individual from Korea using a combination of PacBio long-read sequencing, Illumina short-read sequencing, 10X Genomics linked reads, bacterial artificial chromosome (BAC) sequencing and BioNano Genomics optical mapping. This provides a useful population-specific reference genome and represents the most contiguous human genome assembly to date. The authors use this to close gaps in the human reference genome and map structural variation.

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing², next-generation mapping³, microfluidics-based linked reads⁴, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly.

We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The hapl...
ISSN: 0028-0836, 1476-4687
DOI: 10.1038/nature20098
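The contig, scaffold and phased-block N50 values quoted in the abstract (17.9 Mb, 44.8 Mb and 11.6 Mb) rely on the standard N50 statistic. The Python sketch below shows only that general definition. It is not the authors' pipeline, and the function name `n50` and the toy lengths are hypothetical illustrations, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    After sorting lengths from longest to shortest, N50 is the length of
    the sequence at which the running total first reaches at least half
    of the total summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running against total so odd totals need no float division.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical toy contig lengths (arbitrary units), not data from the paper.
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(toy_contigs))  # prints 8
```

Passing scaffold or phased-block lengths to the same function yields the scaffold N50 or phased-block N50. Some assembly reports use NG50 instead, which replaces the assembly total with an estimated genome size.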