Data integration using statistical matching techniques: A review

Bibliographic details
Published in: Statistical Journal of the IAOS, 2021, Vol. 37 (4), pp. 1391-1410
Main authors: Lewaa, Israa; Hafez, Mai Sherif; Ismail, Mohamed Ali
Format: Article
Language: English
Online access: Full text
Description
Summary: In the era of the data revolution, the availability of data is a great asset that should be put to use. Instead of conducting new surveys, researchers can draw on data that already exist. As enormous amounts of data become available, it is becoming essential to undertake research that integrates data from multiple sources in order to make the best use of them. Statistical Data Integration (SDI) is the statistical tool for addressing this issue. Depending on the input data, SDI can integrate data files that share common units, and it can also merge unrelated files that share no common units. The appropriate integration method is determined by the nature of the input data. SDI has two main methods: Record Linkage (RL) and Statistical Matching (SM). SM techniques typically aim to produce a complete data file from different sources that do not contain the same units. This paper gives a comprehensive overview of existing SM methods, both classical and recent, in order to provide a unified summary of the various SM techniques along with their drawbacks. Points for future research are suggested at the end of the paper.
ISSN: 1874-7655, 1875-9254
DOI: 10.3233/SJI-210835