MORPHEME ANALYSIS MODEL GENERATION DEVICE, MORPHEME ANALYSIS MODEL GENERATION METHOD, AND PROGRAM

PROBLEM TO BE SOLVED: To provide a morpheme analysis model generation device capable of estimating word segmentation and parts of speech by unsupervised learning. SOLUTION: A morpheme analysis model generation device 1 includes a learning data storage part 18 for storing a plurality of sentences as dat...

Bibliographic details
Main authors: UCHIUMI KEI, TSUKAHARA YASUSHI
Format: Patent
Language: eng; jpn
Description
Summary: PROBLEM TO BE SOLVED: To provide a morpheme analysis model generation device capable of estimating word segmentation and parts of speech by unsupervised learning.
SOLUTION: A morpheme analysis model generation device 1 includes a learning data storage part 18 that stores a plurality of sentences as learning data, and a learning part 11 that repeatedly, until a predetermined convergence condition is satisfied, updates the parameters of a morpheme analysis model while sampling a word segmentation of each sentence and an assignment of a part of speech to each word candidate in that segmentation. The morpheme analysis model is a non-parametric Bayesian model built on a hidden semi-Markov model in which a character string is the observed value and the word boundaries in the character string and the parts of speech are hidden classes; a stochastic process is applied to the prior distribution of the per-part-of-speech word n-gram probability and to the prior distribution of the part-of-speech n-gram probability, which serves as the transition probability.
SELECTED DRAWING: Figure 2
【課題】 教師なし学習によって、分かち書きと品詞推定を行うことが可能な形態素解析モデル生成装置を提供する。【解決手段】 形態素解析モデル生成装置1は、学習用のデータとして複数の文を記憶した学習データ記憶部18と、各文の分かち書きと当該分かち書きを構成する各単語候補に対する品詞の対応付けのサンプリングを行いながら、形態素解析モデルのパラメータの更新を行う処理を、所定の収束条件を満たすまで繰り返し行う学習部11とを備えている。形態素解析モデルは、文字列を観測値とし、文字列における単語境界及び品詞を隠れクラスとする隠れセミマルコフモデルにおいて、品詞ごとの単語n−gram確率の事前分布、及び、遷移確率である品詞n−gram確率の事前分布に確率過程を適用したノンパラメトリックベイズモデルである。【選択図】 図2
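The sampling step named in the summary (drawing a word segmentation and a part-of-speech assignment for one sentence) can be illustrated with a minimal sketch. It is not the patented method: it assumes a bigram part-of-speech transition and uses forward filtering with backward sampling, a common way to sample from a hidden semi-Markov model. The names `sample_segmentation`, `toy_word_prob`, `toy_trans_prob`, `BOS`, and `max_len` are hypothetical, and the two toy probability functions are fixed placeholders standing in for the non-parametric Bayesian word and part-of-speech n-gram models, which are not implemented here.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

BOS = "<s>"  # hypothetical start-of-sentence tag (an assumption of this sketch)


def sample_segmentation(
    sentence: str,
    tags: List[str],
    word_prob: Callable[[str, str], float],
    trans_prob: Callable[[str, str], float],
    max_len: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Sample (word, tag) pairs covering `sentence` from a semi-Markov model.

    word_prob(word, tag) and trans_prob(prev_tag, tag) must return positive
    weights. No end-of-sentence transition is modeled in this sketch.
    """
    rng = rng or random.Random()
    n = len(sentence)
    # alpha[t][(k, z)]: weight of sentence[:t] whose last word has
    # length k and part-of-speech tag z (forward filtering).
    alpha: List[Dict[Tuple[int, str], float]] = [dict() for _ in range(n + 1)]
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            word = sentence[t - k:t]
            for z in tags:
                if t - k == 0:
                    prev = trans_prob(BOS, z)
                else:
                    prev = sum(a * trans_prob(pz, z)
                               for (_, pz), a in alpha[t - k].items())
                alpha[t][(k, z)] = word_prob(word, z) * prev

    # Backward sampling: draw the last word and tag, then walk left,
    # conditioning each draw on the tag of the word to its right.
    result: List[Tuple[str, str]] = []
    t, next_tag = n, None
    while t > 0:
        states = list(alpha[t])
        weights = [
            alpha[t][s] * (trans_prob(s[1], next_tag) if next_tag else 1.0)
            for s in states
        ]
        k, z = rng.choices(states, weights=weights, k=1)[0]
        result.append((sentence[t - k:t], z))
        t, next_tag = t - k, z
    result.reverse()
    return result


def toy_word_prob(word: str, tag: str) -> float:
    # Placeholder emission weight: shorter words are favored.
    return (0.6 if tag == "A" else 0.4) * 0.5 ** len(word)


def toy_trans_prob(prev: str, cur: str) -> float:
    # Placeholder transition weight: tags tend to alternate.
    if prev == BOS:
        return 0.5
    return 0.3 if prev == cur else 0.7


if __name__ == "__main__":
    print(sample_segmentation("abcdefg", ["A", "B"], toy_word_prob,
                              toy_trans_prob, max_len=3,
                              rng=random.Random(0)))
```

In a full learner of the kind the summary describes, a draw like this would be repeated for every sentence, with the model parameters updated after each draw, until a convergence condition is met. The sketch multiplies raw probabilities, so long sentences would need log-space arithmetic or rescaling to avoid underflow.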