Learning string transformations from examples

"Robert" and "Bob" refer to the same first name but are textually far apart. Traditional string similarity functions do not allow a flexible way to account for such synonyms, abbreviations and aliases. Recently, string transformations have been proposed as a mechanism to make mat...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the VLDB Endowment 2009-08, Vol.2 (1), p.514-525
Hauptverfasser: Arasu, Arvind, Chaudhuri, Surajit, Kaushik, Raghav
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:"Robert" and "Bob" refer to the same first name but are textually far apart. Traditional string similarity functions do not allow a flexible way to account for such synonyms, abbreviations and aliases. Recently, string transformations have been proposed as a mechanism to make matching robust to such variations. However, in many domains, identifying an appropriate set of transformations is challenging as the space of possible transformations is large. In this paper, we investigate the problem of leveraging examples of matching strings to learn string transformations. We formulate an optimization problem where we are required to learn a concise set of transformations that explain most of the differences. We propose a greedy approximation algorithm for this NP-hard problem. Our experiments over real-life data illustrate the benefits of our approach.
ISSN:2150-8097
2150-8097
DOI:10.14778/1687627.1687686