Detecting Quality Problems in Research Data: A Model-Driven Approach
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | As scientific progress depends heavily on the quality of research data, there
are strict requirements for data quality coming from the scientific community.
A major challenge in data quality assurance is to localise quality problems
that are inherent to data. Due to the dynamic digitalisation in specific
scientific fields, especially the humanities, different database technologies
and data formats may be used for rather short periods to gain experience. We
present a model-driven approach to analyse the quality of research data. It
allows abstracting from the underlying database technology. Based on the
observation that many quality problems show anti-patterns, a data engineer
formulates analysis patterns that are generic concerning the database format
and technology. A domain expert chooses a pattern that has been adapted to a
specific database technology and concretises it for a domain-specific database
format. The resulting concrete patterns are used by data analysts to locate
quality problems in their databases. As proof of concept, we implemented tool
support that realises this approach for XML databases. We evaluated our
approach concerning expressiveness and performance in the domain of cultural
heritage based on a qualitative study on quality problems occurring in cultural
heritage data. |
---|---|
DOI: | 10.48550/arxiv.2007.11298 |