Supporting Uncertain Predicates in DBMS Using Approximate String Matching and Probabilistic Databases
Current relational database systems are deterministic in nature and lack the support for approximate matching. The result of approximate matching would be the tuples annotated with the percentage of similarity but the existing relational database system can not process these similarity scores furthe...
Gespeichert in:
Veröffentlicht in: | IEEE access 2020, Vol.8, p.169070-169081 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Current relational database systems are deterministic in nature and lack the support for approximate matching. The result of approximate matching would be the tuples annotated with the percentage of similarity but the existing relational database system can not process these similarity scores further. In this paper, we propose a system to support approximate matching in the DBMS field. We introduce a ' \approx ' (uncertain predicate operator) for approximate matching and devise a novel formula to calculate the similarity scores. Instead of returning an empty answer set in case of no match, our system gives ranked results thereby providing a glance at existing tuples closely matching with the queried literals. Two variants of the ' \approx ' operator are also introduced for numeric data: ' \approx + ' for higher-the-better and ' \approx - ' for lower-the-better cases. Efficient approximate string matching methods are proposed for matching string-type data whereas numeric closeness is used for other types of data (date, time, and number). We also provide results of our system taken over several sample queries that illustrate the significance of our system. All experiments are performed using the MySQL database, whereas the IMDb movie database and European Football database are used as sample datasets. |
---|---|
ISSN: | 2169-3536 2169-3536 |
DOI: | 10.1109/ACCESS.2020.3021945 |