EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records

Bibliographic details
Published in: Computer methods and programs in biomedicine, 2025-02, Vol. 259, p. 108521, Article 108521
Main authors: Luo, Jiawei; Huang, Shixin; Lan, Lan; Yang, Shu; Cao, Tingqian; Yin, Jin; Qiu, Jiajun; Yang, Xiaoyan; Guo, Yingqiang; Zhou, Xiaobo
Format: Article
Language: English
Online access: Full text
Description
Summary:
• EMR-LIP offers a preprocessing workflow for longitudinal irregular data that is more aligned with clinical practice than previous pipelines.
• EMR-LIP provides automated preprocessing tools for longitudinal irregular data, which are universally applicable across EMR databases.
• Across multiple large databases, data processed by EMR-LIP has demonstrated optimal performance in several benchmark clinical prediction tasks.

Longitudinal data from Electronic Medical Records (EMRs) are increasingly used to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, the preprocessing of irregular and intricate EMR data differs significantly across studies because no universally accepted tools or standardization methods exist.

This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability. EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner and designs specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several well-known deep learning models on a range of commonly used benchmark tasks.

In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups.

EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.
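
As an illustration of what resampling irregular longitudinal data with type-specific handling can look like, here is a minimal sketch in plain Python. It is not EMR-LIP's code or API: the function name `resample`, the three variable types, and the aggregation and gap-filling rules are all assumptions chosen for the example. It bins irregularly timed measurements into fixed-width intervals, aggregates each variable according to its assumed type, and fills empty intervals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-type aggregation rules (assumptions, not EMR-LIP's actual rules).
AGGREGATORS = {
    "vital": mean,                   # e.g. heart rate: average within the interval
    "cumulative": sum,               # e.g. urine output: total within the interval
    "categorical": lambda v: v[-1],  # e.g. a score: keep the latest value
}


def resample(events, var_types, interval_h, n_bins):
    """Bin irregular (hour, variable, value) events into fixed intervals.

    Returns {variable: [one value per interval]}. Empty intervals are
    forward-filled for non-cumulative variables (None before the first
    observation) and set to 0 for cumulative ones.
    """
    binned = defaultdict(lambda: defaultdict(list))
    # Sort by time so "latest value" is well defined within each interval.
    for hour, name, value in sorted(events, key=lambda e: e[0]):
        idx = int(hour // interval_h)
        if 0 <= idx < n_bins:
            binned[name][idx].append(value)

    grid = {}
    for name, kind in var_types.items():
        series, last = [], None
        for idx in range(n_bins):
            values = binned[name].get(idx)
            if values:
                last = AGGREGATORS[kind](values)
                series.append(last)
            elif kind == "cumulative":
                series.append(0)     # nothing recorded in this interval
            else:
                series.append(last)  # forward fill
        grid[name] = series
    return grid


# Illustrative input: hours since admission, variable name, value.
events = [
    (0.5, "heart_rate", 80), (1.5, "heart_rate", 90), (5.0, "heart_rate", 100),
    (0.2, "urine_ml", 50), (1.0, "urine_ml", 30),
    (3.0, "gcs", 14),
]
var_types = {"heart_rate": "vital", "urine_ml": "cumulative", "gcs": "categorical"}
grid = resample(events, var_types, interval_h=2, n_bins=3)
```

Changing `interval_h` changes the resampling interval, which is the setting the summary reports model performance to be stable across.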
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2024.108521