How to utilize syllable distribution patterns as the input of LSTM for Korean morphological analysis


Bibliographic details
Published in:	Pattern Recognition Letters, 2019-04, Vol. 120, pp. 39-45
Main authors:	Kim, Hyemin; Yang, Seon; Ko, Youngjoong
Format:	Article
Language:	English
Subjects:	Bi-LSTM-CRF; Embedding; Long short-term memory; Machine learning; Morpheme distribution; Morphological analysis; Morphology; POS tagging; Restoration; Syllable distribution pattern; Syllable embedding
Online access:	Full text

Description
Abstract:	This paper proposes the use of syllable distribution patterns as deep learning inputs for morphological analysis. The proposed syllable distribution pattern comprises two parts: a distributed syllable embedding vector and a morpheme syllable-level distribution pattern. As a learning method, we utilize bidirectional long short-term memory with a conditional random field layer (Bi-LSTM-CRF) for Korean part-of-speech tagging tasks. After syllable-level outputs are generated by Bi-LSTM-CRF, a morpheme restoration process is performed utilizing pre-analyzed dictionaries that were automatically created from a training corpus. Experimental results reveal outstanding performance for the proposed method with an F1-score of 98.65%.
ISSN:	0167-8655 1872-7344
DOI:	10.1016/j.patrec.2018.12.019
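
Illustrative sketch: The abstract mentions a morpheme restoration step that applies pre-analyzed dictionaries, built automatically from a training corpus, to the syllable-level output of the tagger. The Python below is a minimal sketch of that idea only and is not the authors' implementation. The tag scheme (B-/I- prefixes, with "+" joining the tags of a contracted syllable), the dictionary key format, and the function names build_restoration_dict and restore are all assumptions made for illustration; the record does not describe them.

```python
from collections import Counter, defaultdict


def build_restoration_dict(training_pairs):
    """Build a pre-analyzed dictionary from training examples.

    training_pairs: iterable of ((syllable, compound_tag), analysis), where
    analysis is a sequence of (morpheme, tag) pairs. For each key, the most
    frequent analysis seen in training is kept. (Hypothetical format.)
    """
    counts = defaultdict(Counter)
    for key, analysis in training_pairs:
        counts[key][tuple(analysis)] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def restore(syllables, tags, restoration_dict):
    """Turn syllable-level tags into a list of (morpheme, tag) pairs.

    Tags are assumed to look like "B-NNG" or "I-NNG". A tag such as
    "B-VV+EP" marks a syllable that contracts several morphemes, which is
    expanded through the dictionary when an entry exists.
    """
    morphemes = []
    for syllable, tag in zip(syllables, tags):
        prefix, _, pos = tag.partition("-")
        if "+" in pos:
            # Contracted syllable: look up its pre-analyzed expansion,
            # falling back to the unexpanded syllable if it is unknown.
            analysis = restoration_dict.get((syllable, pos), ((syllable, pos),))
            morphemes.extend(analysis)
        elif prefix == "I" and morphemes and morphemes[-1][1] == pos:
            # Continuation of the previous morpheme: append the syllable.
            previous, _ = morphemes[-1]
            morphemes[-1] = (previous + syllable, pos)
        else:
            morphemes.append((syllable, pos))
    return morphemes


if __name__ == "__main__":
    # Toy training data, invented for this example.
    train = [
        (("갔", "VV+EP"), [("가", "VV"), ("았", "EP")]),
        (("갔", "VV+EP"), [("가", "VV"), ("았", "EP")]),
    ]
    dictionary = build_restoration_dict(train)
    tagged = ["B-NNG", "I-NNG", "B-JKB", "B-VV+EP", "B-EF"]
    print(restore(list("학교에갔다"), tagged, dictionary))
```

In this sketch the tags are supplied by hand; in the approach the abstract describes they would come from the Bi-LSTM-CRF tagger. Keeping only the most frequent analysis per key is one simple design choice among several possible ones.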