Estimating Precision and Recall for Deterministic and Probabilistic Record Linkage

Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of l...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:International statistical review 2018-08, Vol.86 (2), p.219-236
Hauptverfasser: Chipperfield, James, Hansen, Noel, Rossiter, Peter
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Linking administrative, survey and census files to enhance dimensions such as time and breadth or depth of detail is now common. Because a unique person identifier is often not available, records belonging to two different units (e.g. people) may be incorrectly linked. Estimating the proportion of links that are correct, called Precision, is difficult because, even after clerical review, there will remain uncertainty about whether a link is in fact correct or incorrect. Measures of Precision are useful when deciding whether or not it is worthwhile linking two files, when comparing alternative linking strategies and as a quality measure for estimates based on the linked file. This paper proposes an estimator of Precision for a linked file that has been created by either deterministic (or rules-based) or probabilistic (where evidence for a link being a match is weighted against the evidence that it is not a match) linkage, both of which are widely used in practice. This paper shows that the proposed estimators perform well.
ISSN:0306-7734
1751-5823
DOI:10.1111/insr.12246