Automatic Threshold Selection using PSO for GA based Duplicate Record Detection
Published in: | International Journal of Computer Applications, 2013-01, Vol. 62 (4), pp. 22-27 |
---|---|
Authors: | , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | Setting the threshold is an important issue in applications that use similarity functions, and it usually depends heavily on human intervention. The proposed work addresses two issues: first, finding the optimal equation using a Genetic Algorithm (GA); second, using a swarm intelligence algorithm, Particle Swarm Optimization (PSO), to find the optimal threshold, so that duplicate records are detected more accurately and human intervention is reduced. The Restaurant and CORA data repositories are used to analyze the proposed algorithm, and its performance is compared against the MARLIN method and genetic programming using evaluation metrics. |
ISSN: | 0975-8887 |
DOI: | 10.5120/10068-4674 |
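The threshold-selection step in the abstract can be illustrated with a minimal PSO sketch. It assumes that the GA stage has already reduced each candidate record pair to a single similarity score in [0, 1], and that a labelled set of duplicate and non-duplicate pairs is available. The fitness function (F1 on the labelled pairs), the inertia and acceleration constants, and the names `f1_at_threshold` and `pso_threshold` are illustrative assumptions, not the paper's exact formulation.

```python
import random


def f1_at_threshold(scores, labels, threshold):
    """F1 score when pairs with score >= threshold are predicted duplicates."""
    tp = fp = fn = 0
    for s, is_dup in zip(scores, labels):
        predicted = s >= threshold
        if predicted and is_dup:
            tp += 1
        elif predicted and not is_dup:
            fp += 1
        elif not predicted and is_dup:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pso_threshold(scores, labels, n_particles=20, n_iters=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search a 1-D threshold in [0, 1] with standard global-best PSO.

    Assumption: similarity scores are normalised to [0, 1], so particle
    positions are clamped to that interval.
    """
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n_particles)]
    vel = [0.0] * n_particles
    # Personal bests start at the initial positions.
    pbest = pos[:]
    pbest_fit = [f1_at_threshold(scores, labels, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull to personal best + pull to global best.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))
            fit = f1_at_threshold(scores, labels, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    # Hypothetical toy data: GA-combined similarity scores and duplicate labels.
    scores = [0.95, 0.9, 0.85, 0.4, 0.3, 0.2]
    labels = [True, True, True, False, False, False]
    print(pso_threshold(scores, labels))
```

Any fitness that rewards separating duplicates from non-duplicates (for example precision, recall, or another metric) could replace F1 here. A single-dimensional PSO is used only because the threshold is one scalar.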