A Multiple Genome Sequence Matching Based on Skipping Tree

Bibliographic details
Published in: International Journal of Machine Learning and Computing, February 2015, Vol. 5, No. 1, pp. 78-85
Authors: Xu, Zihuan; Cheng, Kewei; Ding, Yi; Tian, Ziqiang; Zhao, Hui
Format: Article
Language: English
Description
Abstract: In this paper, a new algorithm, the skipping suffix algorithm, is proposed. It is based on a new encoding scheme for genome sequences and aims to accelerate multiple genome sequence matching. Introducing binary coding noticeably improves the efficiency of gene sequence alignment. In addition, we determine the maximal number of bits to skip by constructing a skipping tree. A comparative evaluation of the computational efficiency of the KMP algorithm, the suffix array, and the skipping suffix algorithm shows that preprocessing in the skipping suffix algorithm is more than 12 times faster than in the suffix array approach. Moreover, multiple genome sequence matching based on the suffix array is more than 50 times faster than with KMP. In short, the skipping suffix algorithm successfully strikes a balance between preprocessing and search, which makes it better suited to large-scale genetic data matching.
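The abstract does not specify the binary encoding or how the skipping tree is built, so the following is only a minimal sketch of two ingredients it mentions: a 2-bit encoding of nucleotides and a suffix-array search (one of the baselines compared). It assumes the code assignment A=00, C=01, G=10, T=11, which may differ from the paper's, and it does not implement the skipping tree or the skipping suffix algorithm itself. The function names `encode`, `pack`, `build_suffix_array` and `find_all` are hypothetical.

```python
# Hypothetical 2-bit code per nucleotide (assumption; the paper's actual
# encoding is not given in the abstract). A<C<G<T is preserved, so comparing
# code lists gives the same order as comparing the original strings.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq: str) -> list[int]:
    """Map each base to its 2-bit code; raises KeyError on non-ACGT symbols such as N."""
    return [CODE[base] for base in seq.upper()]


def pack(codes: list[int]) -> int:
    """Pack codes into one integer, 2 bits per base, first base in the most significant bits."""
    value = 0
    for c in codes:
        value = (value << 2) | c
    return value


def build_suffix_array(codes: list[int]) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order (O(n^2 log n))."""
    return sorted(range(len(codes)), key=lambda i: codes[i:])


def _bound(codes: list[int], sa: list[int], pat: list[int], inclusive: bool) -> int:
    """Binary search over the suffix array, comparing each suffix's first len(pat) codes.

    inclusive=False -> first index whose prefix >= pat (lower bound)
    inclusive=True  -> first index whose prefix >  pat (upper bound)
    """
    lo, hi = 0, len(sa)
    m = len(pat)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = codes[sa[mid]:sa[mid] + m]
        if prefix < pat or (inclusive and prefix == pat):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(codes: list[int], sa: list[int], pat: list[int]) -> list[int]:
    """Return the sorted start positions of every occurrence of pat in codes."""
    lo = _bound(codes, sa, pat, inclusive=False)
    hi = _bound(codes, sa, pat, inclusive=True)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    text = encode("ACGTACGTGA")
    sa = build_suffix_array(text)
    print(find_all(text, sa, encode("ACG")))  # [0, 4]
    print(pack(encode("ACGT")))               # 27 (0b00011011)
```

Because the suffix array is sorted once, each query is a pair of binary searches, which is the preprocessing-versus-search trade-off the abstract refers to. The naive construction here is only for illustration; large genomes would need a linear-time construction such as SA-IS.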
ISSN: 2010-3700
DOI: 10.7763/IJMLC.2015.V5.487