Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus

Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Nakazawa, Toshiaki, Kawahara, Daisuke, Kurohashi, Sadao
Format:	Tagungsbericht
Sprache:	eng
Schlagworte:	Applied sciences Artificial intelligence Automatic Acquisition Compound Noun Compound Word Computer science control theory systems Exact sciences and technology Input Word Speech and sound recognition and synthesis. Linguistics Tomato Sauce
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to construct precise and concise Katakana word dictionary automatically, given only a medium or large size of Japanese corpus of some domain.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/11562214_60