MuffinEc: Error correction for de Novo assembly via greedy partitioning and sequence alignment
Published in: Information Sciences, Vol. 329, pp. 206-219 (February 2016)
Format: Article
Language: English
Abstract:

Highlights:
• We present an error correction method based on grouping reads and multiple sequence alignment (MSA).
• The method supports any sequencing technology because it handles all types of errors, including indels.
• PacBio datasets can be corrected without a reference genome or a helper dataset with fewer errors.
• Our method is faster and uses less memory than state-of-the-art tools.
• MuffinEc obtains better sensitivity, specificity, and gain in most of our experiments.
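The highlights' point about indels can be made concrete with a small comparison. The sketch below is illustrative only and is not taken from MuffinEc: it contrasts Hamming distance, which roughly corresponds to how a mismatch-only corrector compares sequences, with Levenshtein (edit) distance, which also counts insertions and deletions. The read `GATACAGGGCTA` is a hypothetical example containing one deleted `T` and one inserted `G` relative to the reference window.

```python
def hamming(a: str, b: str) -> int:
    # Position-by-position mismatch count; only defined for equal-length strings,
    # so it cannot "see" an insertion or deletion, only the shift it causes.
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance: substitutions, insertions and deletions each cost 1.
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


reference = "GATTACAGGCTA"
read = "GATACAGGGCTA"  # hypothetical read: one T deleted, one G inserted
print(hamming(reference, read))        # 4
print(edit_distance(reference, read))  # 2
```

A single deletion followed by a compensating insertion shifts the bases between them, so the position-by-position comparison reports four mismatches, whereas the edit distance reports the two edits that actually occurred.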
Error correction is typically the first step of de novo genome assembly from next-generation sequencing (NGS) data, and it has a significant impact on both the quality and the speed of the assembly process. However, most available stand-alone error correction tools can only detect and correct mismatches (substitutions). In practice, this restricts them to reads generated by Illumina sequencers, whose errors are predominantly substitutions. Several tools do handle insertions and deletions (indels) and work with multiple sequencing technologies, but they are limited in correction performance and resource consumption. In this paper, we introduce MuffinEc, an indel-aware, multi-technology error correction method for NGS data. MuffinEc uses a greedy approach to partition the reads into groups and then corrects the reads in each group using the group's consensus sequence. It outperforms existing solutions, achieving better correction ratios across multiple technologies. MuffinEc also exploits parallel processing via OpenMP and uses fewer computational resources than comparable programs, which allows it to handle large datasets. MuffinEc is open source and freely available at http://muffinec.sourceforge.net.
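The greedy grouping and consensus correction described in the abstract can be illustrated with a toy Python sketch. It is a simplified stand-in under stated assumptions, not MuffinEc's actual algorithm: the grouping criterion (a seed read claims every unassigned read that shares at least `min_shared` k-mers with it), the defaults `k=4` and `min_shared=3`, and all function names are hypothetical. In place of a full MSA, the consensus uses a star alignment: each group member is globally aligned to the seed with Needleman-Wunsch, then bases, deletions and insertions are decided by majority vote per seed position. It also assumes that all reads in a group cover the same region of the genome.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_groups(reads: list[str], k: int = 4, min_shared: int = 3) -> list[list[int]]:
    # Hypothetical criterion: the first unassigned read seeds a group and claims
    # every other unassigned read sharing >= min_shared k-mers with the seed.
    profiles = [kmers(r, k) for r in reads]
    unassigned, groups = list(range(len(reads))), []
    while unassigned:
        seed, rest = unassigned[0], []
        group = [seed]
        for idx in unassigned[1:]:
            if len(profiles[seed] & profiles[idx]) >= min_shared:
                group.append(idx)
            else:
                rest.append(idx)
        groups.append(group)
        unassigned = rest
    return groups


def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    # Needleman-Wunsch global alignment; gaps are written as '-'.
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          s[i - 1][j] + gap, s[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback from the bottom-right cell
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ca, cb, i, j = a[i - 1], b[j - 1], i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            ca, cb, i = a[i - 1], "-", i - 1
        else:
            ca, cb, j = "-", b[j - 1], j - 1
        out_a.append(ca)
        out_b.append(cb)
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def consensus(seed: str, members: list[str]) -> str:
    # Star alignment: the seed votes for itself, so ties favour the seed.
    base_votes = [Counter({c: 1}) for c in seed]
    ins_votes = [Counter({"": 1}) for _ in range(len(seed) + 1)]
    for read in members:
        a_seed, a_read = align(seed, read)
        p, inserted = 0, ""
        for cs, cr in zip(a_seed, a_read):
            if cs == "-":               # read carries a base the seed lacks
                inserted += cr
            else:
                ins_votes[p][inserted] += 1
                base_votes[p][cr] += 1  # cr == '-' means the read deletes this base
                p, inserted = p + 1, ""
        ins_votes[p][inserted] += 1     # bases after the last seed position
    out = []
    for p, votes in enumerate(base_votes):
        out.append(ins_votes[p].most_common(1)[0][0])
        base = votes.most_common(1)[0][0]
        if base != "-":
            out.append(base)
    out.append(ins_votes[-1].most_common(1)[0][0])
    return "".join(out)


def correct_reads(reads: list[str], k: int = 4, min_shared: int = 3) -> list[str]:
    corrected = list(reads)
    for group in greedy_groups(reads, k, min_shared):
        cons = consensus(reads[group[0]], [reads[i] for i in group[1:]])
        for i in group:  # assumption: all group members span the same locus
            corrected[i] = cons
    return corrected


true_seq = "GATTACAGGCTAGCCT"
reads = [
    "GATTATAGGCTAGCCT",   # substitution C->T at position 5 (seeds the group)
    "GATTACAGGCTAGCCT",   # error-free
    "GATTACAGGCAGCCT",    # deletion of the T at position 10
    "GATTACAGGCTAGACCT",  # insertion of an A after position 12
    "GATTACAGGCTAGCCT",   # error-free
    "TTTTCCCCGGGGAAAA",   # unrelated read, ends up in its own group
]
print(greedy_groups(reads))                        # [[0, 1, 2, 3, 4], [5]]
print(correct_reads(reads)[:5] == [true_seq] * 5)  # True
```

Because groups are corrected independently of one another, the per-group loop in `correct_reads` could be parallelized directly; in a C or C++ implementation, this is the kind of loop OpenMP is typically used for.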
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2015.09.012