Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping

Bibliographic details
Published in:	International journal of medical informatics (Shannon, Ireland), 2019-04, Vol.124, p.90-96
Authors:	Kagawa, Rina, Shinohara, Emiko, Imai, Takeshi, Kawazoe, Yoshimasa, Ohe, Kazuhiko
Format:	Article
Language:	English

Description
Abstract:
• Our manual review of 487,300 clinical notes for 10 diseases revealed, for every target disease except diabetic nephropathy, disease mentions that do not denote the patient's diagnosis even though they carry no syntactic cue to that effect.
• If extracting disease mentions from clinical notes is adopted as a simple and robust electronic health record-based phenotyping algorithm, the bias caused by disease mentions that incorrectly signify a patient's diagnosis yields an average precision of 78.1% for free text in progress notes.
• The following five categories of physicians' intentions in writing such disease mentions were also formulated: (1) differential diagnosis, (2) misinterpretation of meanings, (3) possibility of suffering from the disease in the future, (4) screening, pre-surgery screening, or general meanings, and (5) family history or diagnosis of another person.

Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulty achieving high performance because clinical notes include disease mentions that ultimately signify something other than the patient's diagnosis (such as a differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. Physicians manually reviewed whether the disease mentions in 487,300 clinical notes of 4,430 patients indicated the patients' diseases. Particular focus was placed on disease mentions that did not signify the patient's diagnosis even though the same sentence contained no syntactic modifier or indicator. Patients were then classified according to whether their clinical notes included such disease mentions.
Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion whose disease mentions signified their own diagnosis was 78.1% on average. This value can be interpreted as the bias that disease mentions not signifying the patient's diagnosis introduce into the precision of EHR-based phenotyping that extracts disease mentions from clinical notes. This study quantified, for four dataset types, the bias in the precision of EHR-based phenotyping arising from disease mentions that incorrectly signify a patient's diagnosis. The results of this study will help researchers in diverse research envi
ISSN:	1386-5056 1872-8243
DOI:	10.1016/j.ijmedinf.2018.12.004
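The mention-extraction phenotyping whose precision the abstract evaluates can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue list (`CUE_PATTERN`), the helper names (`phenotype`, `precision`), the naive sentence splitting, and the three toy notes are all hypothetical. A patient is flagged if any sentence mentions the disease without a cue, and precision is the share of flagged patients whose diagnosis manual review confirms.

```python
import re
from dataclasses import dataclass

# Hypothetical modifier/indicator cues; NOT the set used in the study.
CUE_PATTERN = re.compile(
    r"\b(no|not|denies|rule out|r/o|suspected|possible|family history|screening)\b",
    re.IGNORECASE,
)


@dataclass
class Note:
    patient_id: str
    text: str


def has_unmodified_mention(sentence: str, disease: str) -> bool:
    """True if the sentence names the disease and contains none of the hypothetical cues."""
    return disease.lower() in sentence.lower() and not CUE_PATTERN.search(sentence)


def phenotype(notes: list[Note], disease: str) -> set[str]:
    """Flag patients with at least one unmodified disease mention in any note."""
    flagged: set[str] = set()
    for note in notes:
        # Naive sentence split on '.', '!' and '?' (an assumption for illustration).
        for sentence in re.split(r"[.!?]", note.text):
            if has_unmodified_mention(sentence, disease):
                flagged.add(note.patient_id)
                break
    return flagged


def precision(flagged: set[str], truly_diagnosed: set[str]) -> float:
    """Share of flagged patients whose diagnosis is confirmed by manual review."""
    if not flagged:
        return float("nan")
    return len(flagged & truly_diagnosed) / len(flagged)


# Toy notes (hypothetical data).
notes = [
    Note("p1", "Diabetes mellitus, on metformin."),
    Note("p2", "Rule out diabetes mellitus."),
    Note("p3", "Diabetes mellitus explained to the patient's daughter."),
]
flagged = phenotype(notes, "diabetes mellitus")
print(sorted(flagged), precision(flagged, {"p1"}))  # prints ['p1', 'p3'] 0.5
```

Patient p2 is excluded because its sentence contains a cue, but p3 is flagged even though the mention concerns another person (category 5 in the abstract). Mentions like this, which carry no syntactic cue yet do not denote the patient's diagnosis, are the ones whose effect on precision the study measures.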