De novo diploid genome assembly using long noisy reads

Bibliographic details
Published in: Nature Communications 2024-04, Vol. 15 (1), p. 2964, Article 2964
Main authors: Nie, Fan; Ni, Peng; Huang, Neng; Zhang, Jun; Wang, Zhenyu; Xiao, Chuanle; Luo, Feng; Wang, Jianxin
Format: Article
Language: English
Description
Abstract: The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygous alleles while correcting sequencing errors. We combine a corrected-read SNP caller and a raw-read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using only Nanopore R9, PacBio CLR or Nanopore R10 reads. PECAT generates more contiguous haplotype-specific contigs than other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison×Simmental) using Nanopore R9 reads and a phase block NG50 of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
ISSN:2041-1723
DOI:10.1038/s41467-024-47349-7