Building gold standard corpora for medical natural language processing tasks

Bibliographic details
Published in:	AMIA ... Annual Symposium proceedings 2012, Vol.2012, p.144-153
Main authors:	Deleger, Louise; Li, Qi; Lingren, Todd; Kaiser, Megan; Molnar, Katalin; Stoutenborough, Laura; Kouril, Michal; Marsolo, Keith; Solti, Imre
Format:	Article
Language:	English
Subjects:	Clinical Trials as Topic; Drug Labeling; Medical Records; Natural Language Processing; Software; United States; United States Food and Drug Administration

Description
Summary:	We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. Clinical notes from the medical record, clinical trial announcements, and FDA drug labels are annotated. We report high inter-annotator agreement (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released. To facilitate translational NLP tasks that require cross-corpora interoperability (e.g., clinical trial eligibility screening), their annotation schemas are aligned with a large-scale, NIH-funded clinical text annotation project.
ISSN:	1559-4076
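
Note on the agreement metric: the summary reports inter-annotator agreement as an F-measure. The sketch below shows one common way to compute such a figure for two annotators, treating one annotator's spans as the reference and the other's as the response. It is only an illustration under stated assumptions: the record does not say which matching criterion the authors used, so exact matching on offsets and label is assumed here, and the names `Span` and `pairwise_f_measure` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """One annotated entity: character offsets plus a label (hypothetical schema)."""
    start: int
    end: int
    label: str


def pairwise_f_measure(reference: Iterable[Span], response: Iterable[Span]) -> float:
    """F-measure between two annotators, assuming exact span-and-label matching."""
    ref_set, resp_set = set(reference), set(response)
    if not ref_set and not resp_set:
        return 1.0  # neither annotator marked anything: treated as full agreement
    matches = len(ref_set & resp_set)
    precision = matches / len(resp_set) if resp_set else 0.0
    recall = matches / len(ref_set) if ref_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up offsets and labels (not data from the corpora).
annotator_a = [
    Span(0, 7, "MEDICATION"),
    Span(20, 28, "DISEASE"),
    Span(40, 45, "SYMPTOM"),
]
annotator_b = [
    Span(0, 7, "MEDICATION"),
    Span(20, 28, "DISEASE"),
    Span(50, 55, "SYMPTOM"),
    Span(60, 66, "MEDICATION"),
]

agreement = pairwise_f_measure(annotator_a, annotator_b)
print(f"{agreement:.4f}")
```

With exact matching the F-measure is symmetric, so swapping which annotator serves as the reference does not change the result. A relaxed criterion such as overlapping spans would generally give higher values.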