Sequencing and de novo assembly of 150 genomes from Denmark as a population reference
Published in: | Nature (London), 2017-08, vol. 548, no. 7665, pp. 87–91 |
---|---|
Main authors: | |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | A report of high-depth, short-read sequencing and de novo assemblies for 150 individuals from 50 parent–offspring trios as part of establishing a population reference genome for the GenomeDenmark project.
Sequencing the genome of Denmark
Genome sequencing of individuals across a population is an important component of precision medicine initiatives, and can be used both to characterize genetic variation and for association mapping of diseases and complex traits. Mikkel Schierup and colleagues report efforts to establish a population reference genome for the Danish population as part of the GenomeDenmark project. The authors report high-depth short-read sequencing and de novo assemblies for 150 individuals from 50 parent–offspring trios. They demonstrate that this approach yields quality metrics similar to those of long-read approaches and helps to resolve structural variation and complex genomic regions. This provides a cost-effective way to establish a population reference genome that will be useful for association mapping and precision medicine initiatives.
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits [1–4]. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly [2, 5–7]. However, these approaches are biased against the discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to that of assemblies obtained using the more expensive long-read technology [4, 8–13]. We use the assemblies to identify a rich set of structural variants, including many novel insertions, and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the w… |
---|---|
ISSN: | 0028-0836 (print), 1476-4687 (online) |
DOI: | 10.1038/nature23264 |