Profile of a dictionary compiled from scanning over one million words of surgical pathology narrative text


Bibliographic details
Published in: Computers and Biomedical Research, 1980-08, Vol. 13 (4), pp. 382-398
Authors: Wong, Ruth L.; Reno, James D.; Hain, Timothy C.; Platt, Robert C.; Gaynon, Paul S.; Joseph, David M.
Format: Article
Language: English
Description
Abstract: An anatomic pathology natural language dictionary (LEXICON) has evolved over a nine-year period as a result of scanning over one million words of narrative text from tissue examination request forms and surgical pathology reports. The text is parsed into individual words, which are looked up in LEXICON and flagged by action codes that determine how each word is used in constructing a KWIC (keyword-in-context) index file and an on-line database retrievable by keywords. LEXICON now resides on an IBM 370/168 system and has survived several transfers between computer systems. After each batch of narrative text is scanned, an update program is run to modify LEXICON. LEXICON now contains 24,228 medical and nonmedical terms: 24.8% are errors (misspellings), 45.9% are keywords retrievable on and off line, and 52.2% are cross-referenced to a supplementary word. A preliminary study shows that many of the “nonmedical” terms in LEXICON carry significant medical information, and that there is considerable overlap of medical words among LEXICON, SNOMED, and ICDA-8. Our LEXICON appears to be an intermediate step toward evolving an algorithm capable of “understanding” medical narrative text.
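The lookup-and-index pipeline summarized in the abstract (parse text into words, look each word up in the dictionary, let its action code decide whether and how it is indexed, then update the dictionary after each batch) can be sketched as follows. This is a minimal modern illustration, not the authors' IBM 370/168 implementation: the action codes `K`, `I`, and `E`, the `Entry`, `build_kwic`, and `update_lexicon` names, the context width, and the sample report are all hypothetical, since the abstract does not describe the actual code set, record layout, or KWIC format.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical action codes (assumed for illustration; not the paper's code set).
KEYWORD = "K"      # index the word and make it retrievable by keyword
IGNORE = "I"       # common word: look it up, but do not index it
MISSPELLING = "E"  # known error: index it under its cross-referenced correct form


@dataclass
class Entry:
    action: str
    xref: Optional[str] = None  # supplementary (cross-referenced) word, if any


# (index key, left context, word as written, right context, report id)
KwicLine = Tuple[str, str, str, str, str]


def tokenize(text: str) -> List[str]:
    """Split narrative text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_kwic(reports: Dict[str, str], lexicon: Dict[str, Entry],
               width: int = 3) -> Tuple[List[KwicLine], Set[str]]:
    """Look up every word, index it per its action code, and collect unknown words."""
    index: List[KwicLine] = []
    unknown: Set[str] = set()
    for report_id, text in reports.items():
        words = tokenize(text)
        for i, word in enumerate(words):
            entry = lexicon.get(word)
            if entry is None:
                unknown.add(word)  # left for the next dictionary update
                continue
            if entry.action == IGNORE:
                continue
            # Misspellings are filed under the correct form they point to.
            key = entry.xref if entry.action == MISSPELLING and entry.xref else word
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            index.append((key, left, word, right, report_id))
    index.sort()  # alphabetical by index key, as in a printed KWIC listing
    return index, unknown


def update_lexicon(lexicon: Dict[str, Entry], reviewed: Dict[str, Entry]) -> None:
    """Merge entries assigned to previously unknown words after a batch is scanned."""
    lexicon.update(reviewed)


if __name__ == "__main__":
    lexicon = {
        "biopsy": Entry(KEYWORD),
        "colon": Entry(KEYWORD),
        "adenocarcinoma": Entry(KEYWORD),
        "adenocarcnoma": Entry(MISSPELLING, xref="adenocarcinoma"),
        "of": Entry(IGNORE),
        "the": Entry(IGNORE),
    }
    reports = {"S80-1": "Biopsy of the colon: adenocarcnoma, moderately differentiated."}
    index, unknown = build_kwic(reports, lexicon, width=2)
    for key, left, word, right, rid in index:
        print(f"{key:<15} {left:>25} [{word}] {right}  ({rid})")
    print("unknown:", sorted(unknown))
```

In this sketch, words missing from the dictionary are returned rather than guessed at, so they can be classified and merged with `update_lexicon` before the next batch; this is one plausible reading of the update step the abstract mentions, not a description of the original program.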
ISSN: 0010-4809, 1090-2368
DOI: 10.1016/0010-4809(80)90029-4