New Refinement Techniques for Longest Common Subsequence Algorithms

Bibliographic details
Main authors:	Bergroth, Lasse; Hakonen, Harri; Väisänen, Juri
Format:	Conference proceedings
Language:	English
Subjects:	Algorithmics. Computability. Computer arithmetics; Applied sciences; Computer science, control theory, systems; Exact sciences and technology; heuristic algorithms; Information systems. Data bases; longest common subsequence; Memory organisation. Data processing; Software; string algorithms; Theoretical computing

Description
Abstract:	Certain properties of the input strings have a dominating influence on the running time of an algorithm selected to solve the longest common subsequence (lcs) problem for two input strings. It has turned out to be difficult, both theoretically and practically, to develop an lcs algorithm that would be superior for all problem instances. Furthermore, implementing the most advanced lcs algorithms presented recently is laborious. This paper shows that it is still beneficial to refine the traditional lcs algorithms to obtain new algorithm variants that, in certain problem instances, are competitive in practice with the modern lcs methods. We present and analyse a general-purpose algorithm, NKY-MODIF, which has moderate time and space efficiency and can easily be implemented correctly. The algorithm is based on the so-called diagonal-wise method of Nakatsu, Kambayashi and Yajima (NKY). The NKY algorithm was selected for further consideration because it is algorithmically independent of the size of the input alphabet and has a light pre-processing phase. NKY-MODIF refines the NKY method in three essential ways: by reducing unnecessary scanning over the input sequences, by storing intermediate results more locally, and by utilizing lower- and upper-bound knowledge about the lcs. To demonstrate that some of the presented ideas are not specific to NKY, we apply lower-bound information to two lcs algorithms whose processing approach differs from that of NKY. This introduces a new way to solve the lcs problem. The lcs problem has two variants: calculating only the length of the lcs, and also determining the symbols belonging to one instance of the lcs. We verify the presented ideas for both problem types by extensive test runs.
ISSN:	0302-9743 (print); 1611-3349 (electronic)
DOI:	10.1007/978-3-540-39984-1_22
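The abstract describes the diagonal-wise NKY method only in words. Below is a minimal Python sketch of a diagonal-wise lcs-length computation, written from the standard threshold-table formulation under our own assumptions; it is not the authors' NKY or NKY-MODIF code. Each diagonal depends only on the previous one, the scan over the longer string never moves backwards along a diagonal, and the loop stops once no remaining diagonal can be longer than the best one found. As a hedged illustration of "upper bound knowledge", the hypothetical helper `lcs_upper_bound` uses the simple symbol-count bound, which is not necessarily the bound used in the paper; `lcs_length_diagonal` is likewise a hypothetical name.

```python
from collections import Counter


def lcs_upper_bound(a: str, b: str) -> int:
    """Symbol-count bound (an illustrative choice, not taken from the paper):
    a symbol can occur in a common subsequence at most min(count in a, count in b) times."""
    cb = Counter(b)
    return sum(min(cnt, cb[ch]) for ch, cnt in Counter(a).items())


def lcs_length_diagonal(a: str, b: str) -> int:
    """Length of an lcs of a and b, computed diagonal by diagonal.

    T(i, k) is the smallest j such that a[:i] and b[:j] have a common
    subsequence of length k. Diagonal s holds T(s, 1), T(s+1, 2), ...;
    each diagonal needs only the previous diagonal, and the lcs length
    equals the number of finite entries on the longest diagonal.
    """
    if len(a) > len(b):
        a, b = b, a                       # let a be the shorter string
    m, n = len(a), len(b)
    INF = n + 1                           # marks "no such j"
    upper = lcs_upper_bound(a, b)         # hypothetical bound used for early exit
    prev: list[int] = []                  # finite entries of diagonal s-1
    best = 0
    for s in range(1, m + 1):
        # diagonal s has at most m-s+1 entries; best can never exceed upper
        if best >= m - s + 1 or best == upper:
            break                         # no later diagonal can do better
        cur: list[int] = []
        j = 0                             # T(i-1, k-1), with T(., 0) = 0
        for k in range(1, m - s + 2):
            i = s + k - 1                 # 1-based row in a
            limit = prev[k - 1] if k <= len(prev) else INF  # T(i-1, k)
            pos = j + 1
            # scan b forward for a[i-1], never past T(i-1, k)
            while pos < limit and b[pos - 1] != a[i - 1]:
                pos += 1
            if pos == INF:
                break                     # rest of this diagonal is infinite
            cur.append(pos)               # T(i, k) = min(next match, T(i-1, k))
            j = pos
        best = max(best, len(cur))
        prev = cur
    return best
```

For example, `lcs_length_diagonal("ABCBDAB", "BDCABA")` returns 4. The sketch covers only the length variant of the problem; recovering the symbols of one lcs would require retaining more of the table than the single previous diagonal kept here.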