A Multiple Genome Sequence Matching Based on Skipping Tree
Published in: | International Journal of Machine Learning and Computing 2015-02, Vol.5 (1), p.78-85 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | This paper proposes a new algorithm, the skipping suffix algorithm, based on a new encoding scheme for genome sequences and aimed at accelerating multiple genome sequence matching. Introducing binary coding markedly improves the efficiency of gene sequence alignment. In addition, the maximal number of bits to skip is determined by constructing a skipping tree. A comparative evaluation of the computational efficiency of the KMP algorithm, the suffix array, and the skipping suffix algorithm shows that preprocessing for the skipping suffix algorithm is more than 12 times faster than for the suffix array. Moreover, multiple genome sequence matching based on the suffix array is more than 50 times faster than with KMP. In short, the skipping suffix algorithm strikes a balance between preprocessing and search, which makes it better suited to large-scale genetic data matching. |
ISSN: | 2010-3700 |
DOI: | 10.7763/IJMLC.2015.V5.487 |
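
The abstract mentions binary coding of genome sequences and a suffix array baseline, but this record gives no details of either. As a rough illustration only, the sketch below assumes a common 2-bit-per-base code (A=00, C=01, G=10, T=11), which may well differ from the paper's encoding, together with a naive suffix array search. The names `encode_2bit`, `build_suffix_array`, and `find_all` are hypothetical, and the skipping tree itself is not implemented here.

```python
from typing import List

# Assumed 2-bit code per base; the paper's actual encoding may differ.
BASE_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base.

    Leading 'A' bases become leading zero bits, so a caller that needs to
    decode must keep the sequence length alongside the integer.
    """
    value = 0
    for base in seq:
        value = (value << 2) | BASE_CODE[base]
    return value


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return sorted start positions of pattern using binary search on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    # Upper bound: first suffix whose m-character prefix is > pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

A call such as `find_all(text, build_suffix_array(text), "ACG")` pays the sorting cost once and then answers each query with two binary searches, which is the preprocessing versus search trade-off the abstract refers to.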