Inter-relationships between geographical scale, socio-economic data suppression and population homogeneity

Bibliographic details
Published in: Applied spatial analysis and policy 2022-12, Vol.15 (4), p.1075-1091
Main authors: Mills, Oliver; Shackleton, Nichola; Colbert, Jessie; Zhao, Jinfeng; Norman, Paul; Exeter, Daniel J.
Format: Article
Language: English
Description
Abstract: Over time, technology has greatly enhanced access to vast amounts of public data in government datasets. At the same time, there has been an increase in ‘neighbourhood’ level research, in which researchers typically select an administrative unit for their analysis. As the demand for data-driven insights and decision making continues to rise, researchers face a tradeoff between data suppression (to protect the privacy of citizens) and homogeneity (the similarity of individuals within an area unit for given characteristics). In this paper, we explore the extent to which different scales of geography affect data suppression and spatial homogeneity, using the intra-class correlation and the D-Statistic. We use age, sex, ethnicity, education and income data from the 2013 New Zealand Census to assess a) the extent to which data are suppressed, and b) the spatial homogeneity of these variables across 5 scales of ‘small area’ geography available to researchers in NZ. The data used for this paper were accessed via the Integrated Data Infrastructure (IDI), a large data repository of de-identified, linked microdata obtained from government agencies and nationally representative surveys. The scales used in this study are the 2013 Meshblock, Statistical Area 1, Data Zone, Statistical Area 2 and Area Unit, each of which can be used to analyse patterns at the ‘neighbourhood’ scale. We found that Data Zones are a suitable choice for undertaking analyses of census data as they represent a ‘medium’ scale geography designed to reduce data suppression while maintaining reasonable levels of population homogeneity. The policy implications of this research relate to zone design and decisions relating to the definition of ‘a small cell count’ for data dissemination for different users of sociodemographic data.
ISSN: 1874-463X, 1874-4621
DOI: 10.1007/s12061-021-09430-2