Long read alignment based on maximal exact match seeds

Bibliographic details
Published in:	Bioinformatics 2012-09, Vol.28 (18), p.i318-i324
Main authors:	Liu, Yongchao; Schmidt, Bertil
Format:	Article
Language:	eng
Subjects:	Algorithms; Chromosome Mapping; Genome, Human; Genomics - methods; High-Throughput Nucleotide Sequencing; Humans; Original Papers; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Software
Online access:	Full text

Description
Abstract:	The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of both alignment quality and execution speed. As high-throughput sequencing technologies continue to progress, read lengths keep increasing, and many existing aligners become inefficient on longer reads. We present CUSHAW2, a parallelized, accurate and memory-efficient long-read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We evaluated CUSHAW2 against three other long-read aligners (BWA-SW, Bowtie2 and GASSST) by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads. Availability: CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net. Contact: liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de. Supplementary information: Supplementary data are available at Bioinformatics online.
ISSN:	1367-4803, 1367-4811, 1460-2059
DOI:	10.1093/bioinformatics/bts414
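
Illustration: the abstract describes a seed-and-extend approach that uses maximal exact matches (MEMs) as seeds for gapped alignment. The Python sketch below is only a toy illustration of that general idea, not CUSHAW2's implementation (which is written in C++ and whose index structure and scoring are not described in this record). It assumes a brute-force MEM search, a plain Smith-Waterman local alignment with a linear gap penalty, and hypothetical names and parameters (`maximal_exact_matches`, `local_alignment_score`, `align_read`, `min_len`, `pad`) chosen purely for the example.

```python
def maximal_exact_matches(read, ref, min_len=4):
    """Return (read_pos, ref_pos, length) for each exact match that
    cannot be extended to the left or to the right (brute force)."""
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Not left-maximal: this match is part of one starting earlier.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty
    (scoring values are assumptions made for this example)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def align_read(read, ref, min_len=4, pad=2):
    """Seed with the longest MEM, then run a gapped alignment of the read
    against the reference window implied by that seed."""
    mems = maximal_exact_matches(read, ref, min_len)
    if not mems:
        return None
    read_pos, ref_pos, _ = max(mems, key=lambda m: m[2])
    start = max(0, ref_pos - read_pos - pad)
    end = min(len(ref), ref_pos - read_pos + len(read) + pad)
    return start, end, local_alignment_score(read, ref[start:end])


if __name__ == "__main__":
    reference = "TTGACCGTAAGCTTGGCAT"
    read = "CCGTAAGTTGG"  # reference[4:16] with one base deleted
    print(maximal_exact_matches(read, reference))
    print(align_read(read, reference))
```

In this toy case the read splits into two MEMs around the deleted base, and the extension step recovers a single gapped alignment spanning both. Restricting the gapped alignment to a window around the seed is what keeps the costly dynamic-programming step small relative to the full reference.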