Making statistical inferences about linkage errors

Bibliographic details
Published in: Japanese journal of statistics and data science 2024-06, Vol.7 (1), p.17-56
Main authors: Dasylva, Abel; Goussanou, Arthur
Format: Article
Language: English
Description
Abstract: Record linkage aims to identify records that are from the same unit, in one or many sources. Sometimes, it is imperfect because the available identifying information is limited and erroneous. In such cases, it is important to report the linkage accuracy, which may be measured according to one of many proposed statistical models. These models offer clear advantages over clerical reviews, in terms of costs and timeliness. They also apply where clerical reviews are impossible, e.g., when two parties need to link their respective data sets, such that neither party can see the record pairs in the clear. For obvious reasons, these models must be validated before they are used, by performing goodness-of-fit tests. Unfortunately, this is currently difficult because all existing models rely on observations that are correlated. Thus, the Chi-squared and likelihood ratio tests are biased. In fact, it is challenging to perform any kind of statistical inference about these models or their parameters. In this work, this long-standing problem is addressed when modeling the linkage errors through the number of links of a record. The proposed solution bases the inferences on a subset of observations that are approximately independent.
ISSN: 2520-8756, 2520-8764
DOI: 10.1007/s42081-023-00228-9