Error analysis in Croatian morphosyntactic tagging
In this paper, we provide detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger assigning MULTEXT-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 8...
Main authors: | Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko |
---|---|
Format: | Conference proceeding |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 526 |
---|---|
container_issue | |
container_start_page | 521 |
container_title | |
container_volume | |
creator | Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko |
description | In this paper, we provide detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger assigning MULTEXT-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies that could be integrated into the tagging module, creating a hybrid procedure to raise overall tagging accuracy for Croatian. |
doi_str_mv | 10.1109/ITI.2009.5196140 |
format | Conference Proceeding |
fullrecord | <record><control><sourceid>pascalfrancis_6IE</sourceid><recordid>TN_cdi_pascalfrancis_primary_22470743</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5196140</ieee_id><sourcerecordid>22470743</sourcerecordid><originalsourceid>FETCH-LOGICAL-i247t-8cf74171218d94408df5969dbeae2f5f67f3797c5c8021ba773f1617160953b73</originalsourceid><addsrcrecordid>eNo9UL9rhDAYDbSFHlf3QheXjtrvS4xJxiLXVjjocp2Pz2hswFNJXPzvK9zR6Q3vJ4-xZ4QcEcxbfapzDmByiabEAu5YYpQ2UigUGqW-ZzsUAjIE5I8sidE3ABy0EQZ3jB9CmEJKIw1r9DH1Y1qFiRZPY3qZwvw7xXVcyC7epgv1vR_7J_bgaIhdcsM9-_k4nKqv7Pj9WVfvx8zzQi2Ztk4VqJCjbk1RgG6dNKVpm4467qQrlRPKKCutBo4NKSUclpuhhG18o8SevV5zZ4qWBhdotD6e5-AvFNYz31pAFWLTvVx1vuu6f_r2hvgDq_pRJQ</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Error analysis in Croatian morphosyntactic tagging</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</creator><creatorcontrib>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</creatorcontrib><description>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. 
On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</description><identifier>ISSN: 1330-1012</identifier><identifier>ISBN: 9789537138158</identifier><identifier>ISBN: 9537138151</identifier><identifier>DOI: 10.1109/ITI.2009.5196140</identifier><language>eng</language><publisher>Zagreb: IEEE</publisher><subject>Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Croatian language ; Error analysis ; error distribution ; Exact sciences and technology ; Hidden Markov models ; Humans ; hybrid tagging ; Morphosyntactic tagging ; Natural language processing ; Natural languages ; part-of-speech tagging ; Smoothing methods ; Speech ; Speech and sound recognition and synthesis. Linguistics ; Stochastic processes ; Stochastic systems ; Tagging</subject><ispartof>Information technology interfaces, 2009, p.521-526</ispartof><rights>2015 INIST-CNRS</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5196140$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5196140$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=22470743$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Tadic, Marko</creatorcontrib><creatorcontrib>Dovedan, Zdravko</creatorcontrib><title>Error analysis in Croatian morphosyntactic tagging</title><title>Information technology 
interfaces</title><addtitle>ITI</addtitle><description>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Croatian language</subject><subject>Error analysis</subject><subject>error distribution</subject><subject>Exact sciences and technology</subject><subject>Hidden Markov models</subject><subject>Humans</subject><subject>hybrid tagging</subject><subject>Morphosyntactic tagging</subject><subject>Natural language processing</subject><subject>Natural languages</subject><subject>part-of-speech tagging</subject><subject>Smoothing methods</subject><subject>Speech</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Stochastic processes</subject><subject>Stochastic systems</subject><subject>Tagging</subject><issn>1330-1012</issn><isbn>9789537138158</isbn><isbn>9537138151</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo9UL9rhDAYDbSFHlf3QheXjtrvS4xJxiLXVjjocp2Pz2hswFNJXPzvK9zR6Q3vJ4-xZ4QcEcxbfapzDmByiabEAu5YYpQ2UigUGqW-ZzsUAjIE5I8sidE3ABy0EQZ3jB9CmEJKIw1r9DH1Y1qFiRZPY3qZwvw7xXVcyC7epgv1vR_7J_bgaIhdcsM9-_k4nKqv7Pj9WVfvx8zzQi2Ztk4VqJCjbk1RgG6dNKVpm4467qQrlRPKKCutBo4NKSUclpuhhG18o8SevV5zZ4qWBhdotD6e5-AvFNYz31pAFWLTvVx1vuu6f_r2hvgDq_pRJQ</recordid><startdate>20090101</startdate><enddate>20090101</enddate><creator>Agic, Zeljko</creator><creator>Tadic, Marko</creator><creator>Dovedan, Zdravko</creator><general>IEEE</general><general>University Computing Centre</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>IQODW</scope></search><sort><creationdate>20090101</creationdate><title>Error analysis in Croatian morphosyntactic tagging</title><author>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i247t-8cf74171218d94408df5969dbeae2f5f67f3797c5c8021ba773f1617160953b73</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Croatian language</topic><topic>Error analysis</topic><topic>error distribution</topic><topic>Exact sciences and technology</topic><topic>Hidden Markov models</topic><topic>Humans</topic><topic>hybrid tagging</topic><topic>Morphosyntactic tagging</topic><topic>Natural language processing</topic><topic>Natural 
languages</topic><topic>part-of-speech tagging</topic><topic>Smoothing methods</topic><topic>Speech</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Stochastic processes</topic><topic>Stochastic systems</topic><topic>Tagging</topic><toplevel>online_resources</toplevel><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Tadic, Marko</creatorcontrib><creatorcontrib>Dovedan, Zdravko</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Agic, Zeljko</au><au>Tadic, Marko</au><au>Dovedan, Zdravko</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Error analysis in Croatian morphosyntactic tagging</atitle><btitle>Information technology interfaces</btitle><stitle>ITI</stitle><date>2009-01-01</date><risdate>2009</risdate><spage>521</spage><epage>526</epage><pages>521-526</pages><issn>1330-1012</issn><isbn>9789537138158</isbn><isbn>9537138151</isbn><abstract>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. 
We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</abstract><cop>Zagreb</cop><pub>IEEE</pub><doi>10.1109/ITI.2009.5196140</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1330-1012 |
ispartof | Information technology interfaces, 2009, p.521-526 |
issn | 1330-1012 |
language | eng |
recordid | cdi_pascalfrancis_primary_22470743 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Croatian language ; Error analysis ; error distribution ; Exact sciences and technology ; Hidden Markov models ; Humans ; hybrid tagging ; Morphosyntactic tagging ; Natural language processing ; Natural languages ; part-of-speech tagging ; Smoothing methods ; Speech ; Speech and sound recognition and synthesis. Linguistics ; Stochastic processes ; Stochastic systems ; Tagging |
title | Error analysis in Croatian morphosyntactic tagging |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A47%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Error%20analysis%20in%20Croatian%20morphosyntactic%20tagging&rft.btitle=Information%20technology%20interfaces&rft.au=Agic,%20Zeljko&rft.date=2009-01-01&rft.spage=521&rft.epage=526&rft.pages=521-526&rft.issn=1330-1012&rft.isbn=9789537138158&rft.isbn_list=9537138151&rft_id=info:doi/10.1109/ITI.2009.5196140&rft_dat=%3Cpascalfrancis_6IE%3E22470743%3C/pascalfrancis_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5196140&rfr_iscdi=true |
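
The error breakdown summarized in the description field (errors per part of speech, and the share of those caused by a wrong case value) can be illustrated with a minimal Python sketch. It is not the authors' procedure or part of CroTag. The function name `error_distribution`, the `CASE_INDEX` table, and the sample tags are hypothetical, and the assumed position of the case attribute inside each morphosyntactic description should be checked against the MULTEXT-East specification for Croatian before any real use.

```python
from collections import Counter

# ASSUMPTION: index of the case attribute inside an MSD string, keyed by the
# category letter (N = noun, A = adjective, P = pronoun). Illustrative only.
CASE_INDEX = {"N": 4, "A": 5, "P": 5}


def error_distribution(gold, predicted):
    """Count tagging errors per gold category and how many are case errors."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    errors_by_pos = Counter()
    case_errors = 0
    for g, p in zip(gold, predicted):
        if g == p:
            continue  # correctly tagged token
        pos = g[0]  # first character of an MSD is the category
        errors_by_pos[pos] += 1
        idx = CASE_INDEX.get(pos)
        # A case error: same category, both tags long enough, case value differs.
        if (idx is not None and p[:1] == pos
                and len(g) > idx and len(p) > idx and g[idx] != p[idx]):
            case_errors += 1
    return errors_by_pos, case_errors


# Hypothetical gold and predicted tags for five tokens.
gold = ["Ncmsn", "Vmip3s", "Afpmsny", "Ncfsa", "Sl"]
predicted = ["Ncmsa", "Vmip3s", "Afpmsay", "Ncfsa", "Sa"]

errors_by_pos, case_errors = error_distribution(gold, predicted)
total_errors = sum(errors_by_pos.values())
print(dict(errors_by_pos), case_errors, total_errors)
```

Dividing `case_errors` by the number of errors on the inflected categories gives the kind of proportion the abstract reports. The sample data here is invented and says nothing about the paper's actual figures.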