Implementation of an extended Fellegi-Sunter probabilistic record linkage method using the Jaro-Winkler string comparator

Bibliographic details
Main authors: Li, Xinran; Guttmann, Aline; Cipiere, Sebastien; Maigne, Lydia; Demongeot, Jacques; Boire, Jean-Yves; Ouchchane, Lemlih
Format: Conference proceedings
Language: English
Description
Summary: Record linkage is the task of identifying which records from one or more data sources refer to the same person. Records often lack a common key and may contain typographical variations in identifier fields; in such cases, Fellegi-Sunter probabilistic record linkage is a commonly used method. In this method, a weight is assigned to each pair of records, and record pairs with weights above a given threshold are considered matches. Winkler introduced an extension of the Fellegi-Sunter method that takes field similarity into account when calculating the weight, and showed that it outperforms the original method. Implementations of the Fellegi-Sunter method are frequently presented in the literature; however, the application of the Winkler method is rarely mentioned. This paper presents brief background on these two record linkage methods and describes in detail how to implement the Winkler method. We formalized and then estimated the required parameters of the Winkler method using the expectation-maximization (EM) algorithm. Simulated data sets, with known true match status, were used to assess parameter estimation and to compare the Winkler and Fellegi-Sunter methods regarding their ability to reduce the rates of false matches and false non-matches.
ISSN: 2168-2194, 2168-2208
DOI: 10.1109/BHI.2014.6864381
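
The weighting idea described in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation. The field names, the m and u probabilities, and the threshold are hypothetical values chosen for illustration, and the graded weight uses a simple linear interpolation between the disagreement and agreement weights, which is an assumption standing in for the Winkler adjustment (the record does not give the paper's formula). The similarity scores are taken as inputs; in practice they might come from a Jaro-Winkler comparator.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the paper):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def agreement_weight(m, u):
    """Classic Fellegi-Sunter log-likelihood-ratio weight for an agreeing field."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Classic Fellegi-Sunter weight for a disagreeing field."""
    return math.log2((1 - m) / (1 - u))


def pair_weight(similarities, graded=False):
    """Sum field weights for one record pair.

    similarities maps field name -> similarity in [0, 1], where 1.0 means identical.
    graded=False: binary agree/disagree weights (exact agreement only).
    graded=True:  ASSUMED linear interpolation between the disagreement and
                  agreement weights, as a stand-in for a similarity-aware weight.
    """
    total = 0.0
    for field, (m, u) in M_U.items():
        s = similarities[field]
        w_agree = agreement_weight(m, u)
        w_dis = disagreement_weight(m, u)
        if graded:
            total += w_dis + s * (w_agree - w_dis)
        else:
            total += w_agree if s == 1.0 else w_dis
    return total


def classify(weight, threshold=5.0):
    """Pairs whose weight exceeds the (hypothetical) threshold are matches."""
    return "match" if weight > threshold else "non-match"


if __name__ == "__main__":
    # A pair whose last names differ by a small typo (similarity 0.9, assumed).
    near = {"last_name": 0.9, "first_name": 1.0, "birth_year": 1.0}
    print(classify(pair_weight(near, graded=False)))
    print(classify(pair_weight(near, graded=True)))
```

With binary weights, a single typo in a field is penalized as a full disagreement; the graded variant keeps most of the agreement weight for near-identical values, which is the intuition behind using a string comparator in the weight calculation.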