Characterization and visualization of tandem repeats at genome scale


Bibliographic details
Published in: Nature biotechnology 2024-10, Vol.42 (10), p.1606-1614
Main authors: Dolzhenko, Egor, English, Adam, Dashnow, Harriet, De Sena Brandine, Guilherme, Mokveld, Tom, Rowell, William J., Karniski, Caitlin, Kronenberg, Zev, Danzi, Matt C., Cheung, Warren A., Bi, Chengpeng, Farrow, Emily, Wenger, Aaron, Chua, Khi Pin, Martínez-Cerdeño, Verónica, Bartley, Trevor D., Jin, Peng, Nelson, David L., Zuchner, Stephan, Pastinen, Tomi, Quinlan, Aaron R., Sedlazeck, Fritz J., Eberle, Michael A.
Format: Article
Language: English
Online access: Full text
Description
Summary: Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes. A set of tools maps tandem repeats across complete genomes.
ISSN:1087-0156
1546-1696
DOI:10.1038/s41587-023-02057-3