Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs

Bibliographic details
Published in:	Bioinformatics 2020-03, Vol.36 (5), p.1374-1381
Main authors:	Limasset, Antoine; Flot, Jean-François; Peterlongo, Pierre
Format:	Article
Language:	English
Subjects:	Biochemical Research Methods; Biochemistry & Molecular Biology; Bioinformatics; Biotechnology & Applied Microbiology; Computer Science; Computer Science, Interdisciplinary Applications; Life Sciences & Biomedicine; Mathematical & Computational Biology; Mathematics; Physical Sciences; Science & Technology; Statistics & Probability; Technology

Description
Summary:	Motivation: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large datasets or consider reads as mere suites of k-mers, without taking into account their full-length sequence information. Results: We propose a new method to correct short reads using de Bruijn graphs and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance, then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond. Availability and implementation: The implementation is open source, available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package. Supplementary information: Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/btz102
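
The abundance-based filtering mentioned in the summary can be illustrated with a toy sketch. This is not Bcool's implementation: the function names, the threshold of 2, and the example reads are all assumptions made for illustration, reverse complements are ignored, and the unitig compaction, unitig filtering, and read-mapping steps are left out. It only shows how k-mers seen too rarely (likely sequencing errors) drop out of a de Bruijn graph built from the reads.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_abundance):
    """Keep k-mers seen at least min_abundance times (hypothetical threshold)."""
    return {kmer for kmer, c in counts.items() if c >= min_abundance}

def successors(kmer, kmers):
    """de Bruijn edges: k-mers in the set that overlap kmer by k-1 characters."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmers]

# Three identical reads plus one read carrying a single substitution (A -> T).
reads = ["ACGTACGA", "ACGTACGA", "ACGTACGA", "ACGTTCGA"]
counts = count_kmers(reads, k=4)
solid = solid_kmers(counts, min_abundance=2)
```

Here the four k-mers that span the substitution (`CGTT`, `GTTC`, `TTCG`, `TCGA`) each occur once and are discarded, while the k-mers of the error-free sequence remain and still form a connected path through `successors`.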