A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

Recently, Marcus et al. (Bioinformatics 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an \(O(n \log g)\) time algorithm called splitMEM that constructs this graph di...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2016-02
Hauptverfasser:	Beller, Timo, Ohlebusch, Enno
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Bioinformatics Genomes Graphical representations Suffix trees
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Recently, Marcus et al. (Bioinformatics 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an \(O(n \log g)\) time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where \(n\) is the total length of the genomes and \(g\) is the length of the longest genome. In this paper, we present a construction algorithm that outperforms their algorithm in theory and in practice. Moreover, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele - a variant form of a gene) within the pan-genome.
ISSN:	2331-8422