Domain-aware Evaluation of Named Entity Recognition Systems for Croatian
We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this goal, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper tex...
Published in: | Journal of computing and information technology 2013-09, Vol.21 (3), p.195 |
---|---|
Main authors: | Agic, Zeljko ; Bekavac, Bozo |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 3 |
container_start_page | 195 |
container_title | Journal of computing and information technology |
container_volume | 21 |
creator | Agic, Zeljko ; Bekavac, Bozo |
description | We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tag set denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations. |
doi_str_mv | 10.2498/cit.1002190 |
format | Article |
fullrecord | <record><control><sourceid>gale_hrcak</sourceid><recordid>TN_cdi_hrcak_primary_oai_hrcak_srce_hr_110027</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A361943259</galeid><sourcerecordid>A361943259</sourcerecordid><originalsourceid>FETCH-LOGICAL-c245t-f754cd2ac143ed13405f0930d4bb89ebc07d45b08e25ce21d75cc7a660a31ecf3</originalsourceid><addsrcrecordid>eNptkU9LAzEQxYMoWGpPfoEFTyJbk032T46lVlsoCq2ewzSb1Gh3I0mq9tubukUoODnMY_J7Q8hD6JLgYcZ4dStNGBKMM8LxCeqRihUp5bg6jZpSnBJCi3M08P4Nx6K8KBjpoemdbcC0KXyBU8nkEzZbCMa2idXJIzSqTiZtMGGXLJS069b83i13PqjGJ9q6ZOxsNEB7gc40bLwaHHofvdxPnsfTdP70MBuP5qnMWB5SXeZM1hlIwqiqCWU415hTXLPVquJqJXFZs3yFK5XlUmWkLnMpSygKDJQoqWkfpd3eVyfhXXw404DbCQtGdBPvpIpSkP1flJG_6vg1bJQwrbbBgWyMl2JEC8IZzXIeqeE_VDy1aoy0rdImzo8M10eGyAT1Hdaw9V7Mlotj9qZjpbPeO6X_Xk2w2EcnYnTiEB39AaGHiiE</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Domain-aware Evaluation of Named Entity Recognition Systems for Croatian</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Alma/SFX Local Collection</source><creator>Agic, Zeljko ; Bekavac, Bozo</creator><creatorcontrib>Agic, Zeljko ; Bekavac, Bozo</creatorcontrib><description>We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this goal, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tag set--denoting personal names, locations and organizations. We give insight to feature selection, domain sensitivity and effects of increase in training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. 
We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an [F.sub.1]-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations.</description><identifier>ISSN: 1330-1136</identifier><identifier>EISSN: 1846-3908</identifier><identifier>DOI: 10.2498/cit.1002190</identifier><identifier>CODEN: CJCTEM</identifier><language>eng</language><publisher>Sveuciliste U Zagrebu</publisher><subject>Computational linguistics ; Croatian language ; domain dependence ; evaluation ; Language processing ; Methods ; named entity recognition ; Natural language interfaces ; Serbo-Croatian language ; text domain ; Text processing</subject><ispartof>Journal of computing and information technology, 2013-09, Vol.21 (3), p.195</ispartof><rights>COPYRIGHT 2013 Sveuciliste U Zagrebu</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://hrcak.srce.hr/logo_broj/8996.jpg</thumbnail><link.rule.ids>230,314,776,780,881,27903,27904</link.rule.ids></links><search><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Bekavac, Bozo</creatorcontrib><title>Domain-aware Evaluation of Named Entity Recognition Systems for Croatian</title><title>Journal of computing and information technology</title><description>We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. 
To this goal, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tag set--denoting personal names, locations and organizations. We give insight to feature selection, domain sensitivity and effects of increase in training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an [F.sub.1]-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations.</description><subject>Computational linguistics</subject><subject>Croatian language</subject><subject>domain dependence</subject><subject>evaluation</subject><subject>Language processing</subject><subject>Methods</subject><subject>named entity recognition</subject><subject>Natural language interfaces</subject><subject>Serbo-Croatian language</subject><subject>text domain</subject><subject>Text 
processing</subject><issn>1330-1136</issn><issn>1846-3908</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNptkU9LAzEQxYMoWGpPfoEFTyJbk032T46lVlsoCq2ewzSb1Gh3I0mq9tubukUoODnMY_J7Q8hD6JLgYcZ4dStNGBKMM8LxCeqRihUp5bg6jZpSnBJCi3M08P4Nx6K8KBjpoemdbcC0KXyBU8nkEzZbCMa2idXJIzSqTiZtMGGXLJS069b83i13PqjGJ9q6ZOxsNEB7gc40bLwaHHofvdxPnsfTdP70MBuP5qnMWB5SXeZM1hlIwqiqCWU415hTXLPVquJqJXFZs3yFK5XlUmWkLnMpSygKDJQoqWkfpd3eVyfhXXw404DbCQtGdBPvpIpSkP1flJG_6vg1bJQwrbbBgWyMl2JEC8IZzXIeqeE_VDy1aoy0rdImzo8M10eGyAT1Hdaw9V7Mlotj9qZjpbPeO6X_Xk2w2EcnYnTiEB39AaGHiiE</recordid><startdate>20130901</startdate><enddate>20130901</enddate><creator>Agic, Zeljko</creator><creator>Bekavac, Bozo</creator><general>Sveuciliste U Zagrebu</general><general>Fakultet elektrotehnike i računarstva Sveučilišta u Zagrebu</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>VP8</scope></search><sort><creationdate>20130901</creationdate><title>Domain-aware Evaluation of Named Entity Recognition Systems for Croatian</title><author>Agic, Zeljko ; Bekavac, Bozo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c245t-f754cd2ac143ed13405f0930d4bb89ebc07d45b08e25ce21d75cc7a660a31ecf3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Computational linguistics</topic><topic>Croatian language</topic><topic>domain dependence</topic><topic>evaluation</topic><topic>Language processing</topic><topic>Methods</topic><topic>named entity recognition</topic><topic>Natural language interfaces</topic><topic>Serbo-Croatian language</topic><topic>text domain</topic><topic>Text processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Agic, Zeljko</creatorcontrib><creatorcontrib>Bekavac, Bozo</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: 
Science</collection><collection>Hrcak: Portal of scientific journals of Croatia</collection><jtitle>Journal of computing and information technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Agic, Zeljko</au><au>Bekavac, Bozo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Domain-aware Evaluation of Named Entity Recognition Systems for Croatian</atitle><jtitle>Journal of computing and information technology</jtitle><date>2013-09-01</date><risdate>2013</risdate><volume>21</volume><issue>3</issue><spage>195</spage><pages>195-</pages><issn>1330-1136</issn><eissn>1846-3908</eissn><coden>CJCTEM</coden><abstract>We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this goal, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tag set--denoting personal names, locations and organizations. We give insight to feature selection, domain sensitivity and effects of increase in training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian considering domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an [F.sub.1]-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistency in state-of-the-art scores for detecting names of persons, locations and organizations.</abstract><pub>Sveuciliste U Zagrebu</pub><doi>10.2498/cit.1002190</doi><tpages>15</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1330-1136 |
ispartof | Journal of computing and information technology, 2013-09, Vol.21 (3), p.195 |
issn | 1330-1136 ; 1846-3908 |
language | eng |
recordid | cdi_hrcak_primary_oai_hrcak_srce_hr_110027 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Alma/SFX Local Collection |
subjects | Computational linguistics ; Croatian language ; domain dependence ; evaluation ; Language processing ; Methods ; named entity recognition ; Natural language interfaces ; Serbo-Croatian language ; text domain ; Text processing |
title | Domain-aware Evaluation of Named Entity Recognition Systems for Croatian |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T07%3A26%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_hrcak&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Domain-aware%20Evaluation%20of%20Named%20Entity%20Recognition%20Systems%20for%20Croatian&rft.jtitle=Journal%20of%20computing%20and%20information%20technology&rft.au=Agic,%20Zeljko&rft.date=2013-09-01&rft.volume=21&rft.issue=3&rft.spage=195&rft.pages=195-&rft.issn=1330-1136&rft.eissn=1846-3908&rft.coden=CJCTEM&rft_id=info:doi/10.2498/cit.1002190&rft_dat=%3Cgale_hrcak%3EA361943259%3C/gale_hrcak%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_galeid=A361943259&rfr_iscdi=true |