Automatic Threshold Selection using PSO for GA based Duplicate Record Detection

Bibliographic details
Published in: International journal of computer applications, 2013-01, Vol. 62 (4), p. 22-27
Main authors: Deepa, K; Rangarajan, R; Selvi, M Senthamil
Format: Article
Language: English
Description
Summary: Setting the threshold is an important issue in applications that use similarity functions, and it usually relies on human intervention. The proposed work addresses two issues: first, it finds the optimal equation using a Genetic Algorithm (GA); next, it adopts an intelligent algorithm, Particle Swarm Optimization (PSO), to obtain the optimal threshold, which detects duplicate records more accurately and reduces human intervention. The Restaurant and CORA data repositories are used to analyze the proposed algorithm, and its performance is compared against the Marlin method and genetic programming using evaluation metrics.
ISSN:0975-8887
DOI:10.5120/10068-4674