Conditional Random Fields combined FSM stemming method for Uyghur
This paper presents the generation of Uyghur noun suffix DFA combined with conditional random fields (CRF) for stemming algorithm. Because of the agglutinative nature of Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional s...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Tagungsbericht |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | This paper presents the generation of Uyghur noun suffix DFA combined with conditional random fields (CRF) for stemming algorithm. Because of the agglutinative nature of Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional suffixes finite state machines (FSMs) by using the morphotactic rules in reverse order. But there are eight suffixes which is similar to the ending part of some words. These suffixes make the FSM ambiguous. We apply the CRF model to resolve ambiguity of the FSM. This paper describes the steps of generating the FSM, building the CRF suffix identifying model and combination of CRF with FSM. |
---|---|
DOI: | 10.1109/ICCSIT.2009.5234727 |