Conditional Random Fields combined FSM stemming method for Uyghur

This paper presents the generation of a Uyghur noun suffix DFA combined with conditional random fields (CRF) for a stemming algorithm. Because of the agglutinative nature of the Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional s...

Bibliographic Details
Main Authors: Wumaier, A., Yibulayin, T., Zaokere Kadeer, Shengwei Tian
Format: Conference Proceeding
Language: eng
container_end_page 299
container_issue
container_start_page 295
container_title
container_volume
creator Wumaier, A.
Yibulayin, T.
Zaokere Kadeer
Shengwei Tian
description This paper presents the generation of a Uyghur noun suffix DFA combined with conditional random fields (CRF) for a stemming algorithm. Because of the agglutinative nature of the Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the endings of some words, and these suffixes make the FSM ambiguous. We apply the CRF model to resolve the ambiguity of the FSM. This paper describes the steps of generating the FSM, building the CRF suffix-identification model, and combining the CRF with the FSM.
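The reverse-order suffix FSM mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slot order (stem + plural + possessive + case), the Latin-transliterated suffix forms, the `min_stem` threshold, and the names `SLOTS` and `candidate_stems` are all assumptions made for illustration. The sketch reads a word from right to left, strips at most one suffix per slot, and returns every split it accepts. Choosing among the candidates is the job the record assigns to the CRF model, which is not implemented here.

```python
# Toy suffix classes in morphotactic order: stem + plural + possessive + case.
# Assumption: simplified, transliterated placeholders, not the paper's inventory.
SLOTS = [
    ("PLURAL", ["lar", "ler"]),
    ("POSSESSIVE", ["im", "ing", "i"]),
    ("CASE", ["ning", "ni", "da", "de", "din"]),
]


def candidate_stems(word, min_stem=2):
    """Return every (stem, suffixes) split accepted by the reverse-order FSM."""
    results = []

    def walk(rest, state, stripped):
        # Every state is accepting: whatever remains may be the stem.
        # `stripped` was collected right to left, so reverse it for display.
        results.append((rest, list(reversed(stripped))))
        # From state `state`, only slots further to the left may still be read.
        for slot in range(state - 1, -1, -1):
            name, forms = SLOTS[slot]
            for form in forms:
                if rest.endswith(form) and len(rest) - len(form) >= min_stem:
                    walk(rest[: -len(form)], slot, stripped + [(name, form)])

    # Start state: no suffix read yet, so every slot is still available.
    walk(word, len(SLOTS), [])
    return results


if __name__ == "__main__":
    # "mekteplerning" is used here only as a test string.
    for stem, suffixes in candidate_stems("mekteplerning"):
        print(stem, suffixes)
```

For this test string the sketch yields several competing splits, because the ending can be read either as the case form `ning` or as the possessive form `ing`. This is the kind of ambiguity the description says a statistical model is needed to resolve.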
doi_str_mv 10.1109/ICCSIT.2009.5234727
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5234727</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5234727</ieee_id><sourcerecordid>5234727</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-f255d99f0b206d5f665be6b89d089fb7903ec9bd0041899b1d862950a12746343</originalsourceid><addsrcrecordid>eNotkN1KwzAcxSMy0M0-wW7yAq1Jmo_-L0exWpgIrl6PZkm2SNNIWy_29lbWc3M48ONwOAhtKckoJfBcl-WhbjJGCGSC5VwxdYcSUAXljHMuGGH3aL0ECnSF1v8skFwBfUDJOH6TWTMIij-iXRl74ycf-7bDn21vYsCVt50Z8SkG7XtrcHV4x-NkQ_D9GQc7XaLBLg7463q-_A5PaOXabrTJ4hvUVC9N-ZbuP17rcrdPPZApdUwIA-CIZkQa4aQU2kpdgCEFOK3mffYE2szLaAGgqSkkA0FayhSXOc83aHur9dba48_gQztcj8sD-R9mC0vn</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Conditional Random Fields combined FSM stemming method for Uyghur</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Wumaier, A. ; Yibulayin, T. ; Zaokere Kadeer ; Shengwei Tian</creator><creatorcontrib>Wumaier, A. ; Yibulayin, T. ; Zaokere Kadeer ; Shengwei Tian</creatorcontrib><description>This paper presents the generation of Uyghur noun suffix DFA combined with conditional random fields (CRF) for stemming algorithm. Because of the agglutinative nature of Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional suffixes finite state machines (FSMs) by using the morphotactic rules in reverse order. But there are eight suffixes which is similar to the ending part of some words. These suffixes make the FSM ambiguous. We apply the CRF model to resolve ambiguity of the FSM. 
This paper describes the steps of generating the FSM, building the CRF suffix identifying model and combination of CRF with FSM.</description><identifier>ISBN: 1424445191</identifier><identifier>ISBN: 9781424445196</identifier><identifier>EISBN: 9781424445202</identifier><identifier>EISBN: 1424445205</identifier><identifier>DOI: 10.1109/ICCSIT.2009.5234727</identifier><identifier>LCCN: 2009903791</identifier><language>eng</language><publisher>IEEE</publisher><subject>Algorithm design and analysis ; Ambiguous FSM ; Automata ; Buildings ; CRF ; Dictionaries ; Doped fiber amplifiers ; Information science ; Morphology ; Natural language processing ; Natural languages ; Statistical analysis ; stemming ; Uyghur</subject><ispartof>2009 2nd IEEE International Conference on Computer Science and Information Technology, 2009, p.295-299</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5234727$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,777,781,786,787,2052,27906,54901</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5234727$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Wumaier, A.</creatorcontrib><creatorcontrib>Yibulayin, T.</creatorcontrib><creatorcontrib>Zaokere Kadeer</creatorcontrib><creatorcontrib>Shengwei Tian</creatorcontrib><title>Conditional Random Fields combined FSM stemming method for Uyghur</title><title>2009 2nd IEEE International Conference on Computer Science and Information Technology</title><addtitle>ICCSIT</addtitle><description>This paper presents the generation of Uyghur noun suffix DFA combined with conditional random fields (CRF) for stemming algorithm. 
Because of the agglutinative nature of Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional suffixes finite state machines (FSMs) by using the morphotactic rules in reverse order. But there are eight suffixes which is similar to the ending part of some words. These suffixes make the FSM ambiguous. We apply the CRF model to resolve ambiguity of the FSM. This paper describes the steps of generating the FSM, building the CRF suffix identifying model and combination of CRF with FSM.</description><subject>Algorithm design and analysis</subject><subject>Ambiguous FSM</subject><subject>Automata</subject><subject>Buildings</subject><subject>CRF</subject><subject>Dictionaries</subject><subject>Doped fiber amplifiers</subject><subject>Information science</subject><subject>Morphology</subject><subject>Natural language processing</subject><subject>Natural languages</subject><subject>Statistical analysis</subject><subject>stemming</subject><subject>Uyghur</subject><isbn>1424445191</isbn><isbn>9781424445196</isbn><isbn>9781424445202</isbn><isbn>1424445205</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkN1KwzAcxSMy0M0-wW7yAq1Jmo_-L0exWpgIrl6PZkm2SNNIWy_29lbWc3M48ONwOAhtKckoJfBcl-WhbjJGCGSC5VwxdYcSUAXljHMuGGH3aL0ECnSF1v8skFwBfUDJOH6TWTMIij-iXRl74ycf-7bDn21vYsCVt50Z8SkG7XtrcHV4x-NkQ_D9GQc7XaLBLg7463q-_A5PaOXabrTJ4hvUVC9N-ZbuP17rcrdPPZApdUwIA-CIZkQa4aQU2kpdgCEFOK3mffYE2szLaAGgqSkkA0FayhSXOc83aHur9dba48_gQztcj8sD-R9mC0vn</recordid><startdate>200908</startdate><enddate>200908</enddate><creator>Wumaier, A.</creator><creator>Yibulayin, T.</creator><creator>Zaokere Kadeer</creator><creator>Shengwei 
Tian</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200908</creationdate><title>Conditional Random Fields combined FSM stemming method for Uyghur</title><author>Wumaier, A. ; Yibulayin, T. ; Zaokere Kadeer ; Shengwei Tian</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-f255d99f0b206d5f665be6b89d089fb7903ec9bd0041899b1d862950a12746343</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Algorithm design and analysis</topic><topic>Ambiguous FSM</topic><topic>Automata</topic><topic>Buildings</topic><topic>CRF</topic><topic>Dictionaries</topic><topic>Doped fiber amplifiers</topic><topic>Information science</topic><topic>Morphology</topic><topic>Natural language processing</topic><topic>Natural languages</topic><topic>Statistical analysis</topic><topic>stemming</topic><topic>Uyghur</topic><toplevel>online_resources</toplevel><creatorcontrib>Wumaier, A.</creatorcontrib><creatorcontrib>Yibulayin, T.</creatorcontrib><creatorcontrib>Zaokere Kadeer</creatorcontrib><creatorcontrib>Shengwei Tian</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wumaier, A.</au><au>Yibulayin, T.</au><au>Zaokere Kadeer</au><au>Shengwei Tian</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Conditional Random Fields combined FSM stemming method for 
Uyghur</atitle><btitle>2009 2nd IEEE International Conference on Computer Science and Information Technology</btitle><stitle>ICCSIT</stitle><date>2009-08</date><risdate>2009</risdate><spage>295</spage><epage>299</epage><pages>295-299</pages><isbn>1424445191</isbn><isbn>9781424445196</isbn><eisbn>9781424445202</eisbn><eisbn>1424445205</eisbn><abstract>This paper presents the generation of Uyghur noun suffix DFA combined with conditional random fields (CRF) for stemming algorithm. Because of the agglutinative nature of Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional suffixes finite state machines (FSMs) by using the morphotactic rules in reverse order. But there are eight suffixes which is similar to the ending part of some words. These suffixes make the FSM ambiguous. We apply the CRF model to resolve ambiguity of the FSM. This paper describes the steps of generating the FSM, building the CRF suffix identifying model and combination of CRF with FSM.</abstract><pub>IEEE</pub><doi>10.1109/ICCSIT.2009.5234727</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 1424445191
ispartof 2009 2nd IEEE International Conference on Computer Science and Information Technology, 2009, p.295-299
issn
language eng
recordid cdi_ieee_primary_5234727
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Algorithm design and analysis
Ambiguous FSM
Automata
Buildings
CRF
Dictionaries
Doped fiber amplifiers
Information science
Morphology
Natural language processing
Natural languages
Statistical analysis
stemming
Uyghur
title Conditional Random Fields combined FSM stemming method for Uyghur
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T06%3A55%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Conditional%20Random%20Fields%20combined%20FSM%20stemming%20method%20for%20Uyghur&rft.btitle=2009%202nd%20IEEE%20International%20Conference%20on%20Computer%20Science%20and%20Information%20Technology&rft.au=Wumaier,%20A.&rft.date=2009-08&rft.spage=295&rft.epage=299&rft.pages=295-299&rft.isbn=1424445191&rft.isbn_list=9781424445196&rft_id=info:doi/10.1109/ICCSIT.2009.5234727&rft_dat=%3Cieee_6IE%3E5234727%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424445202&rft.eisbn_list=1424445205&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5234727&rfr_iscdi=true