Conditional Random Fields combined FSM stemming method for Uyghur
This paper presents the generation of a Uyghur noun suffix DFA combined with conditional random fields (CRF) for a stemming algorithm. Because of the agglutinative nature of the Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate Uyghur noun inflectional s...
Format: | Conference proceedings |
---|---|
Language: | eng |
container_end_page | 299 |
---|---|
container_issue | |
container_start_page | 295 |
container_title | |
container_volume | |
creator | Wumaier, A. ; Yibulayin, T. ; Zaokere Kadeer ; Shengwei Tian |
description | This paper presents the generation of a Uyghur noun suffix DFA combined with conditional random fields (CRF) for a stemming algorithm. Because of the agglutinative nature of the Uyghur language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the ending part of some words, and these suffixes make the FSM ambiguous. We apply a CRF model to resolve the ambiguity of the FSM. This paper describes the steps of generating the FSM, building the CRF suffix identification model, and combining the CRF with the FSM. |
doi_str_mv | 10.1109/ICCSIT.2009.5234727 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
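identifier | ISBN: 1424445191 ; ISBN: 9781424445196 ; EISBN: 9781424445202 ; EISBN: 1424445205 ; DOI: 10.1109/ICCSIT.2009.5234727 ; LCCN: 2009903791 |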
ispartof | 2009 2nd IEEE International Conference on Computer Science and Information Technology, 2009, p.295-299 |
issn | |
language | eng |
recordid | cdi_ieee_primary_5234727 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Algorithm design and analysis ; Ambiguous FSM ; Automata ; Buildings ; CRF ; Dictionaries ; Doped fiber amplifiers ; Information science ; Morphology ; Natural language processing ; Natural languages ; Statistical analysis ; stemming ; Uyghur |
title | Conditional Random Fields combined FSM stemming method for Uyghur |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T06%3A55%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Conditional%20Random%20Fields%20combined%20FSM%20stemming%20method%20for%20Uyghur&rft.btitle=2009%202nd%20IEEE%20International%20Conference%20on%20Computer%20Science%20and%20Information%20Technology&rft.au=Wumaier,%20A.&rft.date=2009-08&rft.spage=295&rft.epage=299&rft.pages=295-299&rft.isbn=1424445191&rft.isbn_list=9781424445196&rft_id=info:doi/10.1109/ICCSIT.2009.5234727&rft_dat=%3Cieee_6IE%3E5234727%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424445202&rft.eisbn_list=1424445205&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5234727&rfr_iscdi=true |
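
The description above mentions building a suffix FSM by applying morphotactic rules in reverse order, and notes that some analyses are ambiguous. The following is a minimal sketch of that general idea, not the authors' implementation. The suffix inventory, the state names, the `candidate_stems` function and the `min_stem` parameter are all assumptions made for illustration; the suffixes are a tiny, simplified set in Latin transliteration that ignores vowel harmony and other sound changes.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified suffix classes (Latin transliteration).
# The automaton reads the word from its end, so the order is reversed:
# case suffixes first, then possessive, then plural.
TRANSITIONS: Dict[str, List[Tuple[str, List[str]]]] = {
    "END": [("CASE", ["ni", "da", "din"]), ("POSS", ["im"]), ("PLURAL", ["lar"])],
    "CASE": [("POSS", ["im"]), ("PLURAL", ["lar"])],
    "POSS": [("PLURAL", ["lar"])],
    "PLURAL": [],
}


def candidate_stems(
    word: str, state: str = "END", min_stem: int = 2
) -> List[Tuple[str, List[str]]]:
    """Return every (stem, suffixes) analysis accepted by the reverse-order FSM."""
    # Stopping in the current state is always allowed, so the unsplit
    # remainder is itself a candidate stem.
    results: List[Tuple[str, List[str]]] = [(word, [])]
    for next_state, suffixes in TRANSITIONS[state]:
        for suffix in suffixes:
            # Only strip a suffix if a stem of at least min_stem letters remains.
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                remainder = word[: -len(suffix)]
                for stem, inner in candidate_stems(remainder, next_state, min_stem):
                    # Inner suffixes sit closer to the stem, so they come first.
                    results.append((stem, inner + [suffix]))
    return results


if __name__ == "__main__":
    for stem, suffixes in candidate_stems("atlarni"):
        print(stem, suffixes)
```

The sketch returns all candidate analyses rather than a single stem, because a word whose ending merely resembles a suffix yields more than one candidate. In the approach described above, a CRF model chooses among such candidates; that step is not shown here.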