SCALPEL3: A scalable open-source library for healthcare claims databases

Bibliographic details
Published in: International Journal of Medical Informatics (Shannon, Ireland), 2020-09, Vol. 141, p. 104203, Article 104203
Main authors: Bacry, Emmanuel; Gaïffas, Stéphane; Leroy, Fanny; Morel, Maryan; Nguyen, Dinh-Phong; Sebiat, Youcef; Sun, Dian
Format: Article
Language: English
Description
Abstract:
• The difficulty of extracting and manipulating concepts from large claims databases hinders research.
• The SCALPEL3 framework eases these tasks on SNDS, a large French claims database.
• It leverages distributed computing, data denormalization, and columnar storage.
• It is implemented in Scala and Python on top of Spark, and is free and open-source.
• Tested code and knowledge encapsulation foster code reuse and reproducibility.

This article introduces SCALPEL3 (Scalable Pipeline for Health Data), a scalable open-source framework for studies involving Large Observational Databases (LODs). It focuses on scalable medical concept extraction, easy interactive analysis, and helpers for data flow analysis to accelerate studies performed on LODs. Inspired by web analytics, SCALPEL3 relies on distributed computing, data denormalization, and columnar storage. It was compared with the existing SAS-Oracle SNDS infrastructure by running several queries on a dataset containing three years of healthcare claims for 13.7 million patients. SCALPEL3's horizontal scalability allows it to handle large tasks faster than the existing infrastructure, while its performance is comparable when only a few executors are used. SCALPEL3 provides fine-grained interactive control of data processing through legible code, which helps build fully reproducible studies and improves the maintainability and auditability of studies performed on LODs. SCALPEL3 makes studies based on SNDS much easier and more scalable than the existing framework [1]. It is now used at the agency collecting SNDS data and at the French Ministry of Health, and will soon be used at the National Health Data Hub in France [2].
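The abstract names data denormalization and columnar storage as the basis for fast concept extraction. As a rough illustration only, the pure-Python sketch below mimics that idea on toy in-memory data. A normalized claims table that references a drug table by key is joined once into a flat, column-oriented table. After that, extracting a medical concept (here, antidiabetic dispensations, identified by the ATC prefix A10, "drugs used in diabetes") is a single filter with no join. The table layouts, field names, and the helpers `denormalize`, `Event`, and `extract_drug_events` are hypothetical: they do not reflect the SNDS schema or the SCALPEL3 API. In a Spark setting, the equivalent steps would be a join whose output is written once to a columnar format (Parquet is a common choice with Spark) and then filtered by later queries.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical toy data, NOT the real SNDS schema: claims reference a drug
# dimension table by key, as in a normalized relational layout.
CLAIMS = [
    {"patient_id": "P1", "date": date(2015, 1, 10), "drug_key": 10},
    {"patient_id": "P1", "date": date(2015, 2, 12), "drug_key": 20},
    {"patient_id": "P2", "date": date(2015, 3, 5), "drug_key": 10},
    {"patient_id": "P3", "date": date(2016, 7, 1), "drug_key": 20},
]
DRUGS = {
    10: {"atc": "A10BA02", "name": "metformin"},
    20: {"atc": "C09AA02", "name": "enalapril"},
}


def denormalize(claims, drugs):
    """Resolve the claim -> drug join once and store the result column by column.

    Each key holds one column as a list, a toy stand-in for a columnar file format.
    """
    columns = {"patient_id": [], "date": [], "atc": [], "drug_name": []}
    for claim in claims:
        drug = drugs[claim["drug_key"]]  # the join happens here, only once
        columns["patient_id"].append(claim["patient_id"])
        columns["date"].append(claim["date"])
        columns["atc"].append(drug["atc"])
        columns["drug_name"].append(drug["name"])
    return columns


@dataclass(frozen=True)
class Event:
    """A hypothetical extracted concept occurrence: who, what, and when."""
    patient_id: str
    concept: str
    start: date


def extract_drug_events(flat, atc_prefix, concept):
    """Return one event per dispensation whose ATC code starts with atc_prefix.

    No join is needed at query time. The filter scans only the "atc" column;
    "patient_id" and "date" are indexed only for matching rows, and
    "drug_name" is never read.
    """
    patients, dates = flat["patient_id"], flat["date"]
    return [
        Event(patients[i], concept, dates[i])
        for i, code in enumerate(flat["atc"])
        if code.startswith(atc_prefix)
    ]


flat = denormalize(CLAIMS, DRUGS)
events = extract_drug_events(flat, "A10", "antidiabetic")
for event in events:
    print(event)
```

Paying the join cost once up front trades extra storage for simpler and cheaper queries, which is worthwhile when many studies extract concepts from the same data.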
ISSN: 1386-5056, 1872-8243
DOI: 10.1016/j.ijmedinf.2020.104203