Providing Machine Tractable Dictionary Tools

Machine readable dictionaries (MRDs) contain knowledge about language and the world essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a manner for MRDs to be used directly for NLP tasks....

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Machine translation 1990-06, Vol.5 (2), p.99-154
Hauptverfasser: Wilks, Yorick, Fass, Dan, Guo, Cheng-Ming, McDonald, James E., Plate, Tony, Slator, Brian M.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Machine readable dictionaries (MRDs) contain knowledge about language and the world essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a manner for MRDs to be used directly for NLP tasks. What is badly needed are machine tractable dictionaries (MTDs): MRDs transformed into a format usable for NLP. This paper discusses three different but related large-scale computational methods to transform MRDs into MTDs. The MRD used is "The Longman Dictionary of Contemporary English" (LDOCE). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some handcoding of initial information but are largely automatic. Method I, a statistical approach, uses the least handcoding. It generates "relatedness" networks for words in LDOCE and presents a method for doing partial word sense disambiguation. Method II employs the most handcoding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an MTD containing highly structured semantic information. Method III requires the handcoding of a grammar and the semantic patterns used by its parser, but not the handcoding of any lexical material. This is because the method builds up lexical material from sources wholly within LDOCE. The information extracted is a set of sources of information, individually weak, but which can be combined to give a strong and determinate linguistic data base.
ISSN:0922-6567
1573-0573
DOI:10.1007/bf00393758