Tailoring vocabularies for NLP in sub-domains: a method to detect unused word sense
We developed a method to help tailor a comprehensive vocabulary system (e.g. the UMLS) for a sub-domain (e.g. clinical reports) in support of natural language processing (NLP). The method detects unused sense in a sub-domain by comparing the relational neighborhood of a word/term in the vocabulary w...
Gespeichert in:
Veröffentlicht in: | AMIA ... Annual Symposium proceedings 2009-11, Vol.2009, p.188-192 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We developed a method to help tailor a comprehensive vocabulary system (e.g. the UMLS) for a sub-domain (e.g. clinical reports) in support of natural language processing (NLP). The method detects unused sense in a sub-domain by comparing the relational neighborhood of a word/term in the vocabulary with the semantic neighborhood of the word/term in the sub-domain. The semantic neighborhood of the word/term in the sub-domain is determined using latent semantic analysis (LSA). We trained and tested the unused sense detection on two clinical text corpora: one contains discharge summaries and the other outpatient visit notes. We were able to detect unused senses with precision from 79% to 87%, recall from 48% to 74%, and an area under receiver operation curve (AUC) of 72% to 87%. |
---|---|
ISSN: | 1559-4076 |