Ontology-based integration and querying of heterogeneous rare disease data sources — POLVAS perspective

Bibliographic details
Published in: Computers in biology and medicine 2024-12, Vol.185, p.109452, Article 109452
Main authors: Palacz, Wojciech, Lichołai, Sabina, Musiał, Jacek, Wawrzycka-Adamczyk, Katarzyna, Ślusarczyk, Grażyna, Strug, Barbara, Yaman, Beyza, Tesi, Michelangelo, Gisslander, Karl, O’Sullivan, Declan, Vaglio, Augusto, Emmi, Giacomo, Little, Mark A., Wójcik, Krzysztof
Format: Article
Language: English
Description
Abstract: The integration of rare disease medical databases belonging to different countries is an important problem, as a large number of observations are required for reliable statistical inference from patient data in order to facilitate clinical research. Such integration of national registry data, which requires harmonization of the heterogeneous data sets into a unified view, is facilitated in the European FAIRVASC project by developing a domain-specific ontology. The FAIRVASC project is dedicated to the rare disease of anti-neutrophil cytoplasmic antibody (ANCA) associated vasculitis (AAV). This paper focuses on the practical issues and challenges encountered during the process of integrating the Polish national database POLVAS into the federated database within the FAIRVASC project. It discusses the use of ontology-based methods for data integration and the importance of ensuring patient privacy and data protection. It addresses the problem of missing information in POLVAS, which can be obtained by aggregating other data available within the database, the incompatibility of data types and formats, and the mapping of Polish data names into the common vocabulary. Modifications of the mappings used to ‘uplift’ national data into the Resource Description Framework (RDF) triplestore are also proposed. The described methods allow for integrating the Polish national database into the European network over which federated queries are performed.
• The FAIRVASC project integrates seven registries concerned with the rare disease of anti-neutrophil cytoplasmic antibody (ANCA) associated vasculitis (AAV).
• Heterogeneous data from registries are converted into a unified form defined by an AAV-specific FAIRVASC ontology.
• This article describes the conversion process of the POLVAS registry, which uses a Python preprocessor in addition to an R2RML mapping.
• Each registry has its own SPARQL server with its ‘uplifted’ data. To combine these datasets, federated queries are used.
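The abstract names three preprocessing concerns: mapping Polish data names into a common vocabulary, reconciling incompatible data formats, and deriving missing values from other data in the registry. The following is a minimal Python sketch of what such a preprocessing step could look like. It is not the preprocessor described in the article. Every column name, the `DD.MM.YYYY` input date format, the sex codes, and the derived `age_at_diagnosis` field are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime

# Hypothetical mapping from Polish column names to common-vocabulary names.
# The real POLVAS schema is not given in the abstract.
COLUMN_MAP = {
    "data_urodzenia": "birth_date",
    "data_rozpoznania": "diagnosis_date",
    "plec": "sex",
}

# Hypothetical sex codes (K = kobieta, M = mezczyzna).
SEX_CODES = {"K": "female", "M": "male"}


def to_iso_date(value):
    """Convert an assumed DD.MM.YYYY string to ISO 8601; empty values become None."""
    if not value:
        return None
    return datetime.strptime(value, "%d.%m.%Y").date().isoformat()


def age_at_diagnosis(birth_iso, diagnosis_iso):
    """Derive a value that is not stored directly from two dates that are."""
    if birth_iso is None or diagnosis_iso is None:
        return None
    birth = date.fromisoformat(birth_iso)
    diagnosis = date.fromisoformat(diagnosis_iso)
    years = diagnosis.year - birth.year
    # Subtract one year if the birthday had not yet occurred in the diagnosis year.
    if (diagnosis.month, diagnosis.day) < (birth.month, birth.day):
        years -= 1
    return years


def preprocess(row):
    """Rename columns, normalize formats, and add the derived field for one record."""
    out = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
    out["birth_date"] = to_iso_date(out.get("birth_date"))
    out["diagnosis_date"] = to_iso_date(out.get("diagnosis_date"))
    out["sex"] = SEX_CODES.get(out.get("sex"))  # unknown codes become None
    out["age_at_diagnosis"] = age_at_diagnosis(
        out["birth_date"], out["diagnosis_date"]
    )
    return out


record = preprocess(
    {"data_urodzenia": "15.03.1960", "data_rozpoznania": "01.02.2015", "plec": "K"}
)
print(record)
```

In a pipeline of the kind the abstract outlines, a step like this would run before the R2RML mapping, so that the mapping only has to deal with consistently named and typed fields.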
ISSN: 0010-4825, 1879-0534
DOI:10.1016/j.compbiomed.2024.109452