Assessment of the discriminating power of identifiers for record linkage
Published in: | Revue d'épidémiologie et de santé publique 2004-10, Vol.52 (5), p.431-440 |
---|---|
Main authors: | , , , , , , , , , |
Format: | Article |
Language: | fre |
Subjects: | |
Online access: | Full text |
Summary: | To reconstruct a patient's medical history, one often has to combine information from different sources, whether the context is an epidemiological study or health care. Because linkage using less informative identifiers can lead to linkage errors, it is essential to quantify the information associated with each identifier.
The aim of this study was to estimate the discriminating power of different identifiers that could be used in a record linkage process, using the likelihood ratio proposed by Jaro for probabilistic record linkage. Six identifiers were considered: date of birth, maiden name, usual last name, first and second Christian names, and gender. Two types of phonetic treatment were taken into account: Soundex and a phonetic treatment adapted to the French language. Three situations were considered: 1) and 2) linkage of data collected during two consecutive years in a university hospital (CHU de Dijon; 100,000 × 100,000 records) and in a Paris hospital (50,000 × 50,000 records); 3) linkage of two files obtained through a healthcare network (Burgundy Perinatal Network; 200 × 2,500 records).
Whatever the situation, this work showed the value of three identifiers when linking data concerning the same patient. Date of birth had the best discriminating power, followed by the first and last names. Including a poorly discriminating identifier such as gender did not improve the results. Moreover, adding the second Christian name, which was often missing, increased linkage errors. In contrast, using a phonetic treatment adapted to the French language appeared to slightly improve linkage results compared with Soundex.
Whatever the method used, it seems necessary to improve the quality of identifier collection, in particular for the date of birth and the first and last names, as this could make it easier to link data obtained from different sources. Further research is needed to estimate the discriminating power of other identifiers (place of birth and parents' identifiers). |
---|---|
ISSN: | 0398-7620 |
DOI: | 10.1016/S0398-7620(04)99079-7 |
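
The summary measures discriminating power through a likelihood ratio. As a rough illustration only, the sketch below assumes the standard probabilistic-linkage form of that ratio, m/u, where m is the probability that an identifier agrees for two records of the same patient and u the probability that it agrees for records of different patients. The function names and every m and u value are hypothetical and chosen for illustration; they are not results from the study.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    """log2 of the likelihood ratio m/u, applied when an identifier agrees."""
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    """log2 of (1 - m)/(1 - u), applied when an identifier disagrees."""
    return math.log2((1 - m) / (1 - u))

# Hypothetical (m, u) pairs, NOT taken from the study:
# m = P(agree | same patient), u = P(agree | different patients).
identifiers = {
    "date_of_birth": (0.95, 0.0001),
    "first_name": (0.90, 0.01),
    "last_name": (0.90, 0.005),
    "gender": (0.98, 0.5),
}

# A larger agreement weight means agreement on that identifier is
# stronger evidence that two records belong to the same patient.
weights = {name: agreement_weight(m, u) for name, (m, u) in identifiers.items()}
ranking = sorted(weights, key=weights.get, reverse=True)

for name in ranking:
    print(f"{name}: {weights[name]:.2f}")
```

With made-up values of this kind, an identifier that agrees by chance about half the time (such as gender) receives a weight near 1, while a rarely coinciding identifier (such as date of birth) receives a much larger one, which is the intuition behind ranking identifiers by discriminating power.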