Development and Evaluation of Rich Linguistic Resources for Automatic Indexing of Agricultural Literature
Published in: | Agricultural Information Research 2010, Vol.19(1), pp.10-15 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng ; jpn |
Subjects: | |
Online access: | Full text |
Summary: | We compiled rich linguistic resources (such as a morpheme dictionary and a stop list) for automatic indexing of agricultural literature. Terms from agricultural dictionaries, registered plant variety names, and new terms extracted from records in the Japan Agricultural Science Index (JASI) were incorporated into an agricultural morpheme dictionary. The addition of new terms identified by morpheme analysis of JASI records decreased the number of unknown words in subsequent analyses. Combining the general and enriched agricultural morpheme dictionaries left fewer unknown words after morpheme analysis than using the general morpheme dictionary alone. Single Roman letters with the exception of atomic symbols, SI units, reference terms, Indo-Arabic numerals, and numerals were chosen as stop words. Two-thirds of manually indexed terms corresponded completely or partially to automatically indexed terms when both the enriched morpheme dictionary and stop list were used. These results suggest that the compiled linguistic resources can improve morpheme analysis and automatic indexing. |
---|---|
ISSN: | 0916-9482 1881-5219 |
DOI: | 10.3173/air.19.10 |
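
The stop-word rule described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one reading of the summary: single Roman letters are stop words unless they are one-letter atomic symbols, and SI units, reference terms, and numerals are stop words as well. The names `SI_UNITS`, `REFERENCE_TERMS`, `is_stop_word`, and `filter_terms` are hypothetical, and the sample set entries are placeholders, since the record does not list the actual stop list.

```python
import re

# One-letter chemical element symbols; under the reading assumed here,
# these are kept as candidate index terms.
ATOMIC_SYMBOLS = set("HBCNOFPSKVYIWU")

# Hypothetical sample entries (assumption): the real stop list is not
# reproduced in the catalog record.
SI_UNITS = {"kg", "mol", "cm", "mm"}
REFERENCE_TERMS = {"Fig", "Table"}

# Matches integers and simple decimals written with ASCII digits only.
NUMERAL = re.compile(r"[0-9]+(\.[0-9]+)?")


def is_stop_word(token: str) -> bool:
    """Return True if the token should be dropped before indexing."""
    # Single Roman letters are stop words, except atomic symbols.
    if len(token) == 1 and token.isascii() and token.isalpha():
        return token not in ATOMIC_SYMBOLS
    # Units and reference terms from the (placeholder) lists above.
    if token in SI_UNITS or token in REFERENCE_TERMS:
        return True
    # Numerals.
    return NUMERAL.fullmatch(token) is not None


def filter_terms(tokens):
    """Keep only the tokens that are not stop words, preserving order."""
    return [t for t in tokens if not is_stop_word(t)]


if __name__ == "__main__":
    sample = ["N", "x", "12", "kg", "rice", "Fig", "K"]
    print(filter_terms(sample))
```

In this sketch the filter would run on the output of a morpheme analyzer, so that only the surviving tokens are considered as index terms. Numerals written in other scripts are not handled.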