Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance

Bibliographic details
Published in:	Journal of Biomedical Informatics, 2021-05, Vol. 117, Article 103719
Main authors:	Malec, Scott A., Wei, Peng, Bernstam, Elmer V., Boyce, Richard D., Cohen, Trevor
Format:	Article
Language:	English
Subjects:	Bias; Causal inference; Causality; Confounder selection; Confounding bias; Electronic health records; Models, Theoretical; Pharmacovigilance; Reproducibility of Results
Online access:	Full text

Description
Abstract:	Highlights:
• Drug safety research asks causal questions but must rely on observational data.
• We use literature-derived computable knowledge to elucidate confounders.
• We search a knowledge base for common causes relative to a drug and an adverse event.
• We test modeling and inference methods on health data to detect adverse drug events.
• Models informed with confounders from vector-based search performed best.

Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders, which affect both the exposure and the outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data. We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control for, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying semantic constraint search for indications that are treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety that contains labeled pairwise relationships between drugs and adverse events, and we attempt to rediscover these relationships from a corpus of 2.2 million NLP-processed free-text clinical notes.

We employ standard adjustment and causal inference procedures to predict and estimate causal effects by informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ² and reporting odds ratio) and with each other. We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders ...
ISSN:	1532-0464, 1532-0480
DOI:	10.1016/j.jbi.2021.103719
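
Illustrative sketch (not part of the catalog record or the article): the abstract describes searching for confounder candidates as indications that the drug treats and that are also known to cause the adverse event, and it names the χ² statistic and the reporting odds ratio as naive baselines. The Python below is a minimal sketch of both ideas under stated assumptions. The triples, the concept names (`drug_x`, `event_y`, `condition_a`, ...), and all function names are hypothetical; the simple set intersection stands in for a string-style lookup and is not the authors' semantic vector-based method, nor does it query SemMedDB itself.

```python
# Hypothetical toy predications in (subject, predicate, object) form.
# They are illustrative only and are NOT taken from SemMedDB.
PREDICATIONS = [
    ("drug_x", "TREATS", "condition_a"),
    ("drug_x", "TREATS", "condition_b"),
    ("condition_a", "CAUSES", "event_y"),
    ("condition_c", "CAUSES", "event_y"),
]


def confounder_candidates(drug, event, predications):
    """Return concepts that the drug treats and that are asserted to cause the event."""
    treated = {o for s, p, o in predications if s == drug and p == "TREATS"}
    causes = {s for s, p, o in predications if o == event and p == "CAUSES"}
    return sorted(treated & causes)


def reporting_odds_ratio(a, b, c, d):
    """Reporting odds ratio for a 2x2 table.

    a: exposed, event      b: exposed, no event
    c: unexposed, event    d: unexposed, no event
    """
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


if __name__ == "__main__":
    # Made-up counts, used only to exercise the functions.
    print(confounder_candidates("drug_x", "event_y", PREDICATIONS))
    print(reporting_odds_ratio(20, 80, 10, 90))
    print(chi_square(20, 80, 10, 90))
```

In this sketch only `condition_a` qualifies as a candidate, because it is the only concept that appears both as something the drug treats and as a cause of the event. Neither baseline measure adjusts for such a variable, which is the gap the confounder-informed models in the abstract are meant to address.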