Development and Evaluation of Rich Linguistic Resources for Automatic Indexing of Agricultural Literature

Bibliographic details
Published in: Agricultural Information Research 2010, Vol. 19(1), pp. 10-15
Main authors: Takezaki, Akane; Hosobami, Takashi; Horyu, Daisuke; Kiura, Takuji
Format: Article
Language: English; Japanese
Online access: Full text
Description
Summary: We compiled rich linguistic resources (such as a morpheme dictionary and a stop list) for automatic indexing of agricultural literature. Terms from agricultural dictionaries, registered plant variety names, and new terms extracted from records in the Japan Agricultural Science Index (JASI) were incorporated into an agricultural morpheme dictionary. Adding new terms identified by morpheme analysis of JASI records decreased the number of unknown words in subsequent analyses. Combining the general and enriched agricultural morpheme dictionaries left fewer unknown words after morpheme analysis than using the general morpheme dictionary alone. Single Roman letters with the exception of atomic symbols, SI units, reference terms, Indo-Arabic numerals, and numerals were chosen as stop words. Two-thirds of manually indexed terms corresponded completely or partially to automatically indexed terms when both the enriched morpheme dictionary and the stop list were used. These results suggest that the compiled linguistic resources can improve morpheme analysis and automatic indexing.
ISSN: 0916-9482; 1881-5219
DOI: 10.3173/air.19.10