Implementing Automated Data Validation for Canadian Political Datasets


Bibliographic details
Published in: arXiv.org 2023-09
Main authors: Katz, Lindsay; Moore, Callandra
Format: Article
Language: English
Online access: Full text
Description
Summary: This paper describes a series of automated data validation tests for datasets detailing charity financial information, political donations, and government lobbying in Canada. We motivate and document a series of 200 tests that check the validity, internal consistency, and external consistency of these datasets. We present preliminary findings after application of these tests to the political donations (\(\approx 10.1\) million observations) and lobbying (\(\approx 711{,}200\) observations) datasets, and to a sample of \(\approx 380{,}880\) observations from the charities datasets. We conclude with areas for future work and lessons learnt for others looking to implement automated data validation in their own workflows.
ISSN:2331-8422
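
The summary names three kinds of test: validity, internal consistency, and external consistency. The following is a minimal sketch of what one test of each kind might look like, based on a general reading of those terms and not on the paper's own test suite. The records, field names (`donor`, `amount`, `date`, `party`, `year`), the reference list of parties, and all function names are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical donation records; field names and values are illustrative only.
donations = [
    {"donor": "A. Tremblay", "amount": 250.0, "date": date(2021, 5, 3),
     "party": "Party X", "year": 2021},
    {"donor": "B. Singh", "amount": -40.0, "date": date(2020, 11, 17),
     "party": "Party Y", "year": 2020},
    {"donor": "C. Roy", "amount": 100.0, "date": date(2019, 2, 9),
     "party": "Party Q", "year": 2018},
]

# Hypothetical external reference list of registered parties.
registered_parties = {"Party X", "Party Y"}


def check_validity(row):
    """Validity: a single value must fall within an allowed range."""
    return row["amount"] > 0


def check_internal_consistency(row):
    """Internal consistency: two fields of the same record must agree."""
    return row["date"].year == row["year"]


def check_external_consistency(row, reference):
    """External consistency: a value must appear in an outside reference."""
    return row["party"] in reference


def run_tests(rows, reference):
    """Return a mapping from test name to the indices of failing rows."""
    failures = {"validity": [], "internal": [], "external": []}
    for i, row in enumerate(rows):
        if not check_validity(row):
            failures["validity"].append(i)
        if not check_internal_consistency(row):
            failures["internal"].append(i)
        if not check_external_consistency(row, reference):
            failures["external"].append(i)
    return failures


report = run_tests(donations, registered_parties)
print(report)
```

Reporting the indices of failing rows, instead of stopping at the first failure, is one possible design choice: it lets a full pass over a large dataset finish and leaves the flagged records for later review.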