Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus


Bibliographic details
Published in: Journal of biomedical informatics, 2015-12, Vol. 58 (Suppl), p. S20-S29
Main authors: Stubbs, Amber; Uzuner, Özlem
Format: Article
Language: English
Description
Summary:
• De-identification shared task for longitudinal clinical records.
• Protected Health Information in records replaced with realistic surrogates.
• First corpus of its kind available for distribution.
• Used for Track 1 of the 2014 i2b2/UTHealth NLP shared task.

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was replaced with realistic surrogates automatically and then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2015.07.020
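
The summary reports entity-based micro-averaged F-measures. As a rough illustration of what micro-averaging means, the sketch below pools true positives, false positives, and false negatives across all records before computing precision, recall, and F1. It is not the shared task's official evaluation script: the span representation `(start, end, phi_type)`, the exact-match criterion, and the function name `micro_prf` are assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical span representation: (start_offset, end_offset, phi_type).
Span = Tuple[int, int, str]


def micro_prf(
    gold: Dict[str, Set[Span]],
    pred: Dict[str, Set[Span]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all records.

    A predicted entity counts as correct only if its offsets and type
    exactly match a gold entity in the same record (an assumption of
    this sketch).
    """
    tp = fp = fn = 0
    for record_id in gold.keys() | pred.keys():
        g = gold.get(record_id, set())
        p = pred.get(record_id, set())
        tp += len(g & p)   # entities found with matching span and type
        fp += len(p - g)   # predicted entities absent from the gold standard
        fn += len(g - p)   # gold entities the system missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for illustration; not taken from the corpus.
gold = {
    "rec1": {(0, 5, "NAME"), (10, 20, "DATE")},
    "rec2": {(3, 8, "HOSPITAL")},
}
pred = {
    "rec1": {(0, 5, "NAME"), (10, 18, "DATE")},  # DATE span boundary is off
    "rec2": {(3, 8, "HOSPITAL")},
}
precision, recall, f1 = micro_prf(gold, pred)
```

Because the counts are pooled before dividing, records with many entities weigh more heavily than records with few, which is the practical difference from a macro average taken per record.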