Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes
Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, the characterization of repetitive sequences by reconstructing genomic structures at high resolution solely from long reads remains diffic...
Gespeichert in:
Veröffentlicht in: | Human Genomics 2023-03, Vol.17 (1), p.21-21, Article 21 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, the characterization of repetitive sequences by reconstructing genomic structures at high resolution solely from long reads remains difficult. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads.
We developed LoMA by combining minimap2, MAFFT, and our algorithm, which classifies diploid haplotypes based on structural variants and CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of the human insertion solely from the long-read data.
The assessment of LoMA showed a high accuracy of CSs (error rate 8%) and superiority to a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5516 and 6542 insertions (≥ 100 bp), respectively. Most insertions (~ 80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Finally, our analysis suggested that short tandem duplications are associated with gene expression and transposons.
Our analysis showed that LoMA constructs high-quality sequences from long reads with substantial errors. This study revealed the true structures of the insertions with high accuracy and inferred the mechanisms for the insertions, thus contributing to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma . |
---|---|
ISSN: | 1479-7364 1473-9542 1479-7364 |
DOI: | 10.1186/s40246-023-00467-7 |