Gerbil: a fast and memory-efficient k-mer counter with GPU-support
Published in: | Algorithms for molecular biology 2017-03, Vol.12 (1), p.9-9, Article 9 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In a second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k. While Gerbil's performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k. |
---|---|
ISSN: | 1748-7188 |
DOI: | 10.1186/s13015-017-0097-9 |
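
The two-step approach described in the summary can be illustrated with a small sketch. This is not Gerbil's implementation: the function names, the CRC32-based choice of bin, and the one-k-mer-per-line file layout are all assumptions made for illustration, and the record above does not say how Gerbil actually assigns data to its temporary files. The sketch only shows why splitting first and counting second works: equal k-mers always land in the same file, so each file can be counted independently with a hash table that is much smaller than one table for the whole input.

```python
import tempfile
import zlib
from collections import Counter
from pathlib import Path


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def distribute(reads, k, num_bins, workdir):
    """Step 1: write each k-mer to one of num_bins temporary files.

    The bin is chosen with a deterministic hash (an assumption of this
    sketch), so equal k-mers always end up in the same file.
    """
    paths = [Path(workdir) / f"bin_{b}.txt" for b in range(num_bins)]
    handles = [p.open("w") for p in paths]
    try:
        for read in reads:
            for kmer in kmers(read, k):
                b = zlib.crc32(kmer.encode("ascii")) % num_bins
                handles[b].write(kmer + "\n")
    finally:
        for h in handles:
            h.close()
    return paths


def count_bins(paths):
    """Step 2: count the k-mers of each file with a hash table."""
    totals = {}
    for path in paths:
        with path.open() as fh:
            table = Counter(line.strip() for line in fh)
        # Bins hold disjoint sets of k-mers, so a plain merge is enough.
        totals.update(table)
    return totals


def count_kmers(reads, k, num_bins=4):
    """Run both steps inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        return count_bins(distribute(reads, k, num_bins, workdir))
```

Calling `count_kmers(["ACGTACGT"], 4)` counts the four distinct 4-mers of that read. Because only one bin's table is held in memory while it is being built, the peak table size in step 2 depends on the largest bin rather than on the full data set, which is the motivation for the disk-based first step.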