A prior for record linkage based on allelic partitions
Published in: | Computational statistics & data analysis 2022-08, Vol.172, p.107474, Article 107474 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | In database management, record linkage aims to identify multiple records that correspond to the same individual. Record linkage can be treated as a clustering problem in which one or more noisy database records are associated with a unique latent entity. In contrast to traditional clustering applications, a large number of clusters with only a few observations per cluster is expected in this context. Hence, a new class of prior distributions based on allelic partitions is proposed for the small-cluster setting of record linkage. The proposed prior facilitates the introduction of information about the cluster size distribution at different scales, and naturally enforces sublinear growth of the maximum cluster size, known as the microclustering property. In addition, a set of novel microclustering conditions is introduced in order to impose further constraints on the cluster sizes a priori. The performance of the proposed class of priors is evaluated using simulated data and three official statistics data sets. Moreover, different loss functions for optimal point estimation of the partitions are compared using decision-theoretic approaches recently proposed in the literature. |
---|---|
ISSN: | 0167-9473 1872-7352 |
DOI: | 10.1016/j.csda.2022.107474 |
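
The summary refers to allelic partitions without defining them. In the standard combinatorial sense, the allelic partition of a clustering of n records is the vector (r_1, ..., r_n), where r_j counts the clusters that contain exactly j records, so that the sum of j * r_j over all j equals n. The following minimal Python sketch illustrates that summary for a toy set of cluster labels. The function names `allelic_partition` and `max_cluster_size` and the example labels are hypothetical and chosen for illustration only; this is not the authors' implementation and it does not reproduce the proposed prior.

```python
from collections import Counter


def allelic_partition(labels):
    """Return the list r where r[j - 1] is the number of clusters of size j.

    `labels` holds one latent-entity label per record (illustrative input).
    """
    n = len(labels)
    cluster_sizes = Counter(labels)                 # entity label -> cluster size
    size_counts = Counter(cluster_sizes.values())   # cluster size -> number of clusters
    return [size_counts.get(j, 0) for j in range(1, n + 1)]


def max_cluster_size(r):
    """Largest j with r_j > 0, i.e. the size of the biggest cluster."""
    sizes = [j for j, count in enumerate(r, start=1) if count > 0]
    return max(sizes) if sizes else 0


# Toy example: 7 records linked to 5 latent entities.
labels = ["a", "b", "a", "c", "d", "d", "e"]
r = allelic_partition(labels)

n_records = sum(j * count for j, count in enumerate(r, start=1))
n_clusters = sum(r)
print(r, n_records, n_clusters, max_cluster_size(r))
```

In this toy case there are three singleton clusters and two clusters of size two, which matches the small-cluster regime the summary describes: many clusters, each with few records.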