bsgenova: an accurate, robust, and fast genotype caller for bisulfite-sequencing data

Bisulfite sequencing (BS-Seq) is a fundamental technique for characterizing DNA methylation profiles. Genotype calling from bisulfite-converted BS-Seq data allows allele-specific methylation analysis and the concurrent exploration of genetic and epigenetic profiles. Despite various methods have been...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:BMC bioinformatics 2024-06, Vol.25 (1), p.206-12
Hauptverfasser: Feng, Yance, Gao, Fei
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Bisulfite sequencing (BS-Seq) is a fundamental technique for characterizing DNA methylation profiles. Genotype calling from bisulfite-converted BS-Seq data allows allele-specific methylation analysis and the concurrent exploration of genetic and epigenetic profiles. Despite various methods have been proposed, single nucleotide polymorphisms (SNPs) calling from BS-Seq data, particularly for SNPs on chromosome X and in the presence of contaminative data, poses ongoing challenges. We introduce bsgenova, a novel SNP caller tailored for bisulfite sequencing data, employing a Bayesian multinomial model. The performance of bsgenova is assessed by comparing SNPs called from real-world BS-Seq data with those from corresponding whole-genome sequencing (WGS) data across three human cell lines. bsgenova is both sensitive and precise, especially for chromosome X, compared with three existing methods. Moreover, in the presence of low-quality reads, bsgenova outperforms other methods notably. In addition, bsgenova is meticulously implemented, leveraging matrix imputation and multi-process parallelization. Compared to existing methods, bsgenova stands out for its speed and efficiency in memory and disk usage. Furthermore, bsgenova integrates bsextractor, a methylation extractor, enhancing its flexibility and expanding its utility. We introduce bsgenova for SNP calling from bisulfite-sequencing data. The source code is available at https://github.com/hippo-yf/bsgenova under license GPL-3.0.
ISSN:1471-2105
1471-2105
DOI:10.1186/s12859-024-05821-7