Gerbil: a fast and memory-efficient k-mer counter with GPU-support

Bibliographic details
Published in: Algorithms for molecular biology 2017-03, Vol.12 (1), p.9-9, Article 9
Main authors: Erbert, Marius; Rechner, Steffen; Müller-Hannemann, Matthias
Format: Article
Language: English
Description
Summary: A basic task in bioinformatics is the counting of k-mers in genome sequences. Existing k-mer counting tools are most often optimized for small k < 32 and suffer from excessive memory resource consumption or degrading performance for large k. However, given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. We present the open source k-mer counting software Gerbil, which has been designed for the efficient counting of k-mers for k ≥ 32. Our software is the result of an intensive process of algorithm engineering. It implements a two-step approach. In the first step, genome reads are loaded from disk and redistributed to temporary files. In the second step, the k-mers of each temporary file are counted via a hash table approach. In addition to its basic functionality, Gerbil can optionally use GPUs to accelerate the counting step. In a set of experiments with real-world genome data sets, we show that Gerbil is able to efficiently support both small and large k. While Gerbil's performance is comparable to existing state-of-the-art open source k-mer counting tools for small k < 32, it vastly outperforms its competitors for large k, thereby enabling new applications which require large values of k.
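The two-step approach described in the summary can be illustrated with a minimal Python sketch. This is not Gerbil's implementation: the function names (`kmers`, `partition`, `count_kmers`), the in-memory buckets standing in for temporary files, and the partitioning rule (CRC32 of the k-mer modulo the bucket count) are all assumptions made for illustration, since the summary does not say how reads are redistributed. The sketch also treats each k-mer as a plain string.

```python
import zlib
from collections import Counter, defaultdict


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partition(reads, k, num_buckets):
    """Step 1: distribute k-mers into buckets (stand-ins for temporary files)."""
    buckets = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            # Assumed rule: identical k-mers always land in the same bucket.
            buckets[zlib.crc32(kmer.encode()) % num_buckets].append(kmer)
    return buckets


def count_kmers(reads, k, num_buckets=4):
    """Step 2: count each bucket independently with a hash table."""
    totals = {}
    for bucket in partition(reads, k, num_buckets).values():
        # Buckets hold disjoint sets of k-mers, so no key is overwritten.
        totals.update(Counter(bucket))
    return totals


counts = count_kmers(["ACGTACGT", "CGTACG"], k=4)
print(counts)
```

Because every occurrence of a given k-mer falls into the same bucket, each bucket can be counted on its own and only one bucket's hash table has to fit in memory at a time, which is the point of splitting the work into two steps.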
ISSN:1748-7188
DOI:10.1186/s13015-017-0097-9