Hausdorff Distance-Based Record Linkage for Improved Matching of Households and Individuals in Different Databases
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Matching households and individuals across different databases poses
challenges due to the lack of unique identifiers, typographical errors, and
changes in attributes over time. Record linkage tools play a crucial role in
overcoming these difficulties. This paper presents a multi-step record linkage
procedure that incorporates household information to enhance the
entity-matching process across multiple databases. Our approach utilizes the
Hausdorff distance to estimate the probability of a match between households in
multiple files. Subsequently, probabilities of matching individuals within
these households are computed using a logistic regression model based on
attribute-level distances. These estimated probabilities are then employed in a
linear programming optimization framework to infer one-to-one matches between
individuals. To assess the efficacy of our method, we apply it to link data
from the Italian Survey of Household Income and Wealth across different years.
Through internal and external validation procedures, the proposed method is
shown to provide a significant enhancement in the quality of the individual
matching process, thanks to the incorporation of household information. A
comparison with a standard record linkage approach based on direct matching of
individuals, which neglects household information, underscores the advantages
of accounting for such information. |
---|---|
DOI: | 10.48550/arxiv.2404.05566 |
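
The summary above names two building blocks that a small example may make more concrete: a Hausdorff distance between households, and a one-to-one assignment of individuals driven by match probabilities. The Python sketch below is illustrative only and is not the authors' implementation. The record fields (`sex`, `birth_year`), the `record_distance` function, and its scaling are hypothetical choices made for the example. The assignment step uses a brute-force search over permutations as a simple stand-in for the linear programming formulation mentioned in the summary, and the probability matrix is made up rather than produced by a logistic regression.

```python
from itertools import permutations


def record_distance(a, b):
    """Hypothetical distance between two individual records.

    Adds a 0/1 sex mismatch to a birth-year gap scaled to [0, 1].
    """
    sex_gap = 0.0 if a["sex"] == b["sex"] else 1.0
    year_gap = min(abs(a["birth_year"] - b["birth_year"]) / 10.0, 1.0)
    return sex_gap + year_gap


def hausdorff(house_a, house_b, dist=record_distance):
    """Symmetric Hausdorff distance between two non-empty households."""

    def directed(xs, ys):
        # Largest distance from a member of xs to its nearest member of ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(house_a, house_b), directed(house_b, house_a))


def best_one_to_one(prob):
    """Brute-force one-to-one assignment maximizing total match probability.

    prob[i][j] is an assumed probability that person i in the first file
    matches person j in the second file. Requires len(prob) <= len(prob[0]).
    Only practical for small households.
    """
    n_a, n_b = len(prob), len(prob[0])
    best = max(
        permutations(range(n_b), n_a),
        key=lambda p: sum(prob[i][p[i]] for i in range(n_a)),
    )
    return list(enumerate(best))


# Made-up households observed in two survey waves.
house_2010 = [{"sex": "F", "birth_year": 1960}, {"sex": "M", "birth_year": 1958}]
house_2012 = [{"sex": "M", "birth_year": 1958}, {"sex": "F", "birth_year": 1961}]
other_house = [{"sex": "M", "birth_year": 1990}]

d_same = hausdorff(house_2010, house_2012)    # small: likely the same household
d_other = hausdorff(house_2010, other_house)  # large: likely different households

# Made-up individual match probabilities within the matched household pair.
pairs = best_one_to_one([[0.1, 0.9], [0.8, 0.3]])
```

With these toy inputs, the two waves of the same household are much closer in Hausdorff distance than the unrelated household, and the assignment pairs each person with the counterpart of the same sex and a similar birth year. For larger problems, an assignment or linear programming solver would replace the permutation search.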