A pilot study for "Linked Open Research Data" (LORDpilot): a LOD-based Concept Registry for social science research data
Main authors: | , , , , , , , , , , |
---|---|
Format: | Report |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Reusing research data is an important part of research practice in the social and economic sciences. To find suitable data, researchers need functional search options. However, a comprehensive search for data is hampered by inconsistent or missing semantic indexing because different survey programs use their own terminology for documentation. In most cases, there is no link between the measured theoretical concepts and the variables. From the user's perspective, the fragmentation of data documentation hampers data retrieval and thus limits the research potential of existing data. The challenge, therefore, lies in the concept-oriented indexing of data. Since semantic modelling for content indexing is still lacking, a process and a technology for a uniform semantic indexing of research data are needed. The LORD infrastructure aims to close this gap.
The LORDpilot project aimed to test the feasibility of a concept registry for the social sciences. To this end, the pilot project developed a data model and a user-friendly interface to link (i.e., annotate) questions and variables with theoretical concepts for a selection of measurement instruments from three large surveys (ALLBUS, Nacaps, SOEP). We used Semantic Web standards for the technical implementation. By linking the concepts with descriptors from the SKOS-compliant "Thesaurus Social Sciences" (TheSoz), the search in the concept database is supported, and the concept vocabulary is linked to the Linked Open Data (LOD) Cloud. The links were created as RDF triples and made available in a triple store with a SPARQL endpoint.
To evaluate our approach, selected measurement instruments of the three surveys were annotated (i.e., questions and variables were described with concepts) by each of the project partners involved, and then the fit between the measurement and the concept was assessed by domain experts. The evaluation of these test annotations shows that (1) the annotations of different annotators show a high degree of agreement, (2) the topical experts predominantly rate the concepts as matching the measurement intention, and (3) conceptual correlations across the data sets become visible via the assigned concepts. However, the analysis also shows non-substantive heterogeneity in the concept vocabulary across annotators.
The pilot study has shown that the infrastructure outlined in the application is feasible if the redundancy in the concept vocabulary is limited, e.g. by suggesting appropriate existing |
---|---|
DOI: | 10.5281/zenodo.11047522 |
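
The summary describes linking questions and variables to concepts, and concepts to thesaurus descriptors, as RDF triples. The following is a minimal sketch of that idea, not the project's actual data model: the `http://example.org/lord/` namespace, the `measures` property, the variable and concept identifiers, and the thesaurus descriptor URI are all hypothetical placeholders. Only the SKOS namespace and the `skos:prefLabel` and `skos:closeMatch` properties are taken from the SKOS standard. It uses plain Python with no RDF library.

```python
# Minimal sketch of concept annotations as RDF triples (N-Triples syntax).
# ASSUMPTION: the EX namespace, the "measures" property, and all identifiers
# below are hypothetical; they are not taken from the LORDpilot data model.
SKOS = "http://www.w3.org/2004/02/skos/core#"   # standard SKOS namespace
EX = "http://example.org/lord/"                 # hypothetical namespace


def uri(namespace: str, local: str) -> str:
    """Return an IRI in N-Triples angle-bracket form."""
    return f"<{namespace}{local}>"


def literal(text: str, lang: str) -> str:
    """Return a language-tagged literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


CONCEPT = uri(EX, "concept/life-satisfaction")

triples = [
    # The concept carries a human-readable label.
    (CONCEPT, uri(SKOS, "prefLabel"), literal("Life satisfaction", "en")),
    # Variables from two different (hypothetical) surveys share one concept.
    (uri(EX, "variable/survey-a/v1"), uri(EX, "measures"), CONCEPT),
    (uri(EX, "variable/survey-b/q7"), uri(EX, "measures"), CONCEPT),
    # The concept is mapped to a (hypothetical) thesaurus descriptor.
    (CONCEPT, uri(SKOS, "closeMatch"), uri(EX, "thesaurus/descriptor-0001")),
]


def to_ntriples(rows) -> str:
    """Serialize (subject, predicate, object) rows, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


def variables_for(concept: str, rows) -> list:
    """List all variables annotated with the given concept, sorted by IRI."""
    measures = uri(EX, "measures")
    return sorted(s for s, p, o in rows if p == measures and o == concept)


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(variables_for(CONCEPT, triples))
```

Looking up `variables_for(CONCEPT, triples)` returns variables from both hypothetical surveys, which illustrates how a shared concept can make connections across data sets visible. In a triple store, the same lookup would typically be expressed as a SPARQL query against the endpoint.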