Providing Machine Tractable Dictionary Tools
Published in: | Machine Translation, 1990-06, Vol. 5 (2), pp. 99-154 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | Machine readable dictionaries (MRDs) contain knowledge about language and the world that is essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a form that allows MRDs to be used directly for NLP tasks. What is badly needed are machine tractable dictionaries (MTDs): MRDs transformed into a format usable for NLP. This paper discusses three different but related large-scale computational methods to transform MRDs into MTDs. The MRD used is "The Longman Dictionary of Contemporary English" (LDOCE). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some handcoding of initial information but are largely automatic. Method I, a statistical approach, uses the least handcoding. It generates "relatedness" networks for words in LDOCE and presents a method for doing partial word sense disambiguation. Method II employs the most handcoding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an MTD containing highly structured semantic information. Method III requires the handcoding of a grammar and the semantic patterns used by its parser, but not the handcoding of any lexical material. This is because the method builds up lexical material from sources wholly within LDOCE. The information extracted is a set of sources of information, individually weak, but which can be combined to give a strong and determinate linguistic data base. |
---|---|
ISSN: | 0922-6567, 1573-0573 |
DOI: | 10.1007/bf00393758 |
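The abstract describes Method I only in outline: relatedness networks derived statistically from dictionary text, then used for partial word sense disambiguation. As a rough illustration of that general idea, here is a minimal Python sketch. It is not the paper's algorithm. The toy definition strings (invented, not taken from LDOCE), the stopword list, the use of positive pointwise mutual information over co-occurrence within a definition as the relatedness score, and all names (`cooccurrence`, `relatedness_network`, `disambiguate`, `BANK_SENSES`) are hypothetical assumptions chosen for the example.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stopword list (assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "that", "and", "or"}

# Toy definition texts standing in for dictionary definitions (invented, not LDOCE).
CORPUS = [
    "land beside a river",
    "place that keeps money",
    "river water flowing to sea",
    "fish living in river water",
    "money paid for work",
    "place to keep money safe",
]

# Hypothetical sense inventory for the headword "bank".
BANK_SENSES = {
    "bank_1": "land beside a river",
    "bank_2": "place that keeps money",
}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def cooccurrence(definitions):
    """Count how many definitions each word, and each unordered word pair, occurs in."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in definitions:
        words = content_words(text)
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(definitions)


def relatedness(w1, w2, word_counts, pair_counts, n):
    """PMI of w1 and w2 appearing in the same definition; 0.0 if they never do."""
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0:
        return 0.0
    return math.log(joint * n / (word_counts[w1] * word_counts[w2]))


def relatedness_network(word_counts, pair_counts, n, threshold=0.0):
    """Undirected weighted graph keeping only pairs whose PMI exceeds the threshold."""
    network = {}
    for pair in pair_counts:
        w1, w2 = sorted(pair)
        score = relatedness(w1, w2, word_counts, pair_counts, n)
        if score > threshold:
            network.setdefault(w1, {})[w2] = score
            network.setdefault(w2, {})[w1] = score
    return network


def disambiguate(senses, context, network):
    """Pick the sense whose definition words are most related to the context.

    Returns None (no decision) when no sense gets positive evidence or the
    best score is tied; this is what makes the disambiguation only partial.
    """
    context_words = content_words(context)
    scores = {
        sense: sum(
            network.get(d, {}).get(c, 0.0)
            for d in content_words(text)
            for c in context_words
        )
        for sense, text in senses.items()
    }
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


if __name__ == "__main__":
    network = relatedness_network(*cooccurrence(CORPUS))
    print(disambiguate(BANK_SENSES, "the fish swam in the water", network))  # bank_1
    print(disambiguate(BANK_SENSES, "paid for work", network))  # bank_2
    print(disambiguate(BANK_SENSES, "something unknown", network))  # None
```

In this sketch, "fish" and "water" co-occur with "river" in the toy definitions, so the context "the fish swam in the water" favors `bank_1`, while a context with no links into the network yields no decision. Identical words get no self-relatedness score here; a fuller version might also reward direct word overlap between context and definition, as Lesk-style methods do.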