Error analysis in Croatian morphosyntactic tagging

Bibliographic details
Main authors: Agic, Zeljko; Tadic, Marko; Dovedan, Zdravko
Format: Conference proceeding
Language: eng
container_end_page 526
container_issue
container_start_page 521
container_title
container_volume
creator Agic, Zeljko
Tadic, Marko
Dovedan, Zdravko
description In this paper, we provide detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger assigning MULTEXT-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated into the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.
doi_str_mv 10.1109/ITI.2009.5196140
format Conference Proceeding
fullrecord <record><control><sourceid>pascalfrancis_6IE</sourceid><recordid>TN_cdi_pascalfrancis_primary_22470743</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5196140</ieee_id><sourcerecordid>22470743</sourcerecordid><originalsourceid>FETCH-LOGICAL-i247t-8cf74171218d94408df5969dbeae2f5f67f3797c5c8021ba773f1617160953b73</originalsourceid><addsrcrecordid>eNo9UL9rhDAYDbSFHlf3QheXjtrvS4xJxiLXVjjocp2Pz2hswFNJXPzvK9zR6Q3vJ4-xZ4QcEcxbfapzDmByiabEAu5YYpQ2UigUGqW-ZzsUAjIE5I8sidE3ABy0EQZ3jB9CmEJKIw1r9DH1Y1qFiRZPY3qZwvw7xXVcyC7epgv1vR_7J_bgaIhdcsM9-_k4nKqv7Pj9WVfvx8zzQi2Ztk4VqJCjbk1RgG6dNKVpm4467qQrlRPKKCutBo4NKSUclpuhhG18o8SevV5zZ4qWBhdotD6e5-AvFNYz31pAFWLTvVx1vuu6f_r2hvgDq_pRJQ</addsrcrecordid><sourcetype>Index Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Error analysis in Croatian morphosyntactic tagging</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</creator><creatorcontrib>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</creatorcontrib><description>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. 
On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</description><identifier>ISSN: 1330-1012</identifier><identifier>ISBN: 9789537138158</identifier><identifier>ISBN: 9537138151</identifier><identifier>DOI: 10.1109/ITI.2009.5196140</identifier><language>eng</language><publisher>Zagreb: IEEE</publisher><subject>Applied sciences ; Artificial intelligence ; Computer science; control theory; systems ; Croatian language ; Error analysis ; error distribution ; Exact sciences and technology ; Hidden Markov models ; Humans ; hybrid tagging ; Morphosyntactic tagging ; Natural language processing ; Natural languages ; part-of-speech tagging ; Smoothing methods ; Speech ; Speech and sound recognition and synthesis. Linguistics ; Stochastic processes ; Stochastic systems ; Tagging</subject><ispartof>Information technology interfaces, 2009, p.521-526</ispartof><rights>2015 INIST-CNRS</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5196140$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2052,27902,54895</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5196140$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=22470743$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Tadic, Marko</creatorcontrib><creatorcontrib>Dovedan, Zdravko</creatorcontrib><title>Error analysis in Croatian morphosyntactic tagging</title><title>Information technology 
interfaces</title><addtitle>ITI</addtitle><description>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</description><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Computer science; control theory; systems</subject><subject>Croatian language</subject><subject>Error analysis</subject><subject>error distribution</subject><subject>Exact sciences and technology</subject><subject>Hidden Markov models</subject><subject>Humans</subject><subject>hybrid tagging</subject><subject>Morphosyntactic tagging</subject><subject>Natural language processing</subject><subject>Natural languages</subject><subject>part-of-speech tagging</subject><subject>Smoothing methods</subject><subject>Speech</subject><subject>Speech and sound recognition and synthesis. 
Linguistics</subject><subject>Stochastic processes</subject><subject>Stochastic systems</subject><subject>Tagging</subject><issn>1330-1012</issn><isbn>9789537138158</isbn><isbn>9537138151</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNo9UL9rhDAYDbSFHlf3QheXjtrvS4xJxiLXVjjocp2Pz2hswFNJXPzvK9zR6Q3vJ4-xZ4QcEcxbfapzDmByiabEAu5YYpQ2UigUGqW-ZzsUAjIE5I8sidE3ABy0EQZ3jB9CmEJKIw1r9DH1Y1qFiRZPY3qZwvw7xXVcyC7epgv1vR_7J_bgaIhdcsM9-_k4nKqv7Pj9WVfvx8zzQi2Ztk4VqJCjbk1RgG6dNKVpm4467qQrlRPKKCutBo4NKSUclpuhhG18o8SevV5zZ4qWBhdotD6e5-AvFNYz31pAFWLTvVx1vuu6f_r2hvgDq_pRJQ</recordid><startdate>20090101</startdate><enddate>20090101</enddate><creator>Agic, Zeljko</creator><creator>Tadic, Marko</creator><creator>Dovedan, Zdravko</creator><general>IEEE</general><general>University Computing Centre</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>IQODW</scope></search><sort><creationdate>20090101</creationdate><title>Error analysis in Croatian morphosyntactic tagging</title><author>Agic, Zeljko ; Tadic, Marko ; Dovedan, Zdravko</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i247t-8cf74171218d94408df5969dbeae2f5f67f3797c5c8021ba773f1617160953b73</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Computer science; control theory; systems</topic><topic>Croatian language</topic><topic>Error analysis</topic><topic>error distribution</topic><topic>Exact sciences and technology</topic><topic>Hidden Markov models</topic><topic>Humans</topic><topic>hybrid tagging</topic><topic>Morphosyntactic tagging</topic><topic>Natural language processing</topic><topic>Natural 
languages</topic><topic>part-of-speech tagging</topic><topic>Smoothing methods</topic><topic>Speech</topic><topic>Speech and sound recognition and synthesis. Linguistics</topic><topic>Stochastic processes</topic><topic>Stochastic systems</topic><topic>Tagging</topic><toplevel>online_resources</toplevel><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Tadic, Marko</creatorcontrib><creatorcontrib>Dovedan, Zdravko</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Pascal-Francis</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Agic, Zeljko</au><au>Tadic, Marko</au><au>Dovedan, Zdravko</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Error analysis in Croatian morphosyntactic tagging</atitle><btitle>Information technology interfaces</btitle><stitle>ITI</stitle><date>2009-01-01</date><risdate>2009</risdate><spage>521</spage><epage>526</epage><pages>521-526</pages><issn>1330-1012</issn><isbn>9789537138158</isbn><isbn>9537138151</isbn><abstract>In this paper, we provide detailed insight on properties of errors generated by a stochastic morphosyntactic tagger assigning multext-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus by the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. 
We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies which could be integrated in the tagging module, creating a hybrid procedure in order to raise overall tagging accuracy for Croatian.</abstract><cop>Zagreb</cop><pub>IEEE</pub><doi>10.1109/ITI.2009.5196140</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1330-1012
ispartof Information technology interfaces, 2009, p.521-526
issn 1330-1012
language eng
recordid cdi_pascalfrancis_primary_22470743
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Applied sciences
Artificial intelligence
Computer science; control theory; systems
Croatian language
Error analysis
error distribution
Exact sciences and technology
Hidden Markov models
Humans
hybrid tagging
Morphosyntactic tagging
Natural language processing
Natural languages
part-of-speech tagging
Smoothing methods
Speech
Speech and sound recognition and synthesis. Linguistics
Stochastic processes
Stochastic systems
Tagging
title Error analysis in Croatian morphosyntactic tagging
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A47%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pascalfrancis_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Error%20analysis%20in%20Croatian%20morphosyntactic%20tagging&rft.btitle=Information%20technology%20interfaces&rft.au=Agic,%20Zeljko&rft.date=2009-01-01&rft.spage=521&rft.epage=526&rft.pages=521-526&rft.issn=1330-1012&rft.isbn=9789537138158&rft.isbn_list=9537138151&rft_id=info:doi/10.1109/ITI.2009.5196140&rft_dat=%3Cpascalfrancis_6IE%3E22470743%3C/pascalfrancis_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5196140&rfr_iscdi=true