Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues

Bibliographic details
Published in: Corpus Linguistics and Linguistic Theory, 2009-01, Vol. 5 (2), pp. 241-269
Main authors: Aldezabal, Izaskun, Aranzabe, Maria Jesus, Arriola, Jose Mari, de Ilarraza, Arantza Diaz
Format: Article
Language: English
Subjects: dependency grammar; Syntactic annotation; treebank
Online access: Full text
container_end_page 269
container_issue 2
container_start_page 241
container_title Corpus linguistics and linguistic theory
container_volume 5
creator Aldezabal, Izaskun
Aranzabe, Maria Jesus
Arriola, Jose Mari
de Ilarraza, Arantza Diaz
description In this paper, we describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT), the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000-word corpus of standard written Basque intended as a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus of Basque tagged at the syntactic level. We also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during the design of the annotation hierarchy are based on the description of Basque grammar by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we treat lexical units as syntactic heads, which opens the way for us to work with semantics.
doi_str_mv 10.1515/CLLT.2009.010
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_85701743</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>85701743</sourcerecordid><originalsourceid>FETCH-LOGICAL-c306t-14ec66edd0da9fe01b2888b9f937f19f58f4726030325e8cdd62dbc030f14a013</originalsourceid><addsrcrecordid>eNpFkD1PwzAQhiMEEqUwsntCMKSc4zgfbBC1gFSJlIbZcp0zDaRxsBOJ_ntSisp09-oevTo9nndJYUI55bfZfF5MAoB0AhSOvBGNKPNjYPz4sAfxqXfm3AcAj2hKR55dbptOqq5SRDaN6WRXmYZUDenWSF5Ro8VGIcmMbXtHtLG_h9wahc5VzTsxmjxI99UjuZ7m0-zmjhRrNBaHRlkPnSVp7W__kCrnenTn3omWtcOLvzn23mbTInvy5y-Pz9n93FcMos6nIaoowrKEUqYaga6CJElWqU5ZrGmqeaLDOIiAAQs4Jqoso6BcqSFrGkqgbOxd7Xtba4b_XCc2lVNY17JB0zuR8BhoHLIB9PegssY5i1q0ttpIuxUUxM6s2JkVO7NiMPvPV67D7wMs7aeIYhZzsShCsZwFfJFDLgr2A7miezs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>85701743</pqid></control><display><type>article</type><title>Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues</title><source>De Gruyter journals</source><creator>Aldezabal, Izaskun ; Aranzabe, Maria Jesus ; Arriola, Jose Mari ; de Ilarraza, Arantza Diaz</creator><creatorcontrib>Aldezabal, Izaskun ; Aranzabe, Maria Jesus ; Arriola, Jose Mari ; de Ilarraza, Arantza Diaz</creatorcontrib><description>In this paper, we will describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000 word corpus of standard written Basque whose purpose is to be a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at syntactic level. We will also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. 
Decisions made during design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.</description><identifier>ISSN: 1613-7027</identifier><identifier>EISSN: 1613-7035</identifier><identifier>DOI: 10.1515/CLLT.2009.010</identifier><language>eng</language><publisher>Walter de Gruyter GmbH &amp; Co. KG</publisher><subject>dependency grammar ; Syntactic annotation ; treebank</subject><ispartof>Corpus linguistics and linguistic theory, 2009-01, Vol.5 (2), p.241-269</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c306t-14ec66edd0da9fe01b2888b9f937f19f58f4726030325e8cdd62dbc030f14a013</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>Aldezabal, Izaskun</creatorcontrib><creatorcontrib>Aranzabe, Maria Jesus</creatorcontrib><creatorcontrib>Arriola, Jose Mari</creatorcontrib><creatorcontrib>de Ilarraza, Arantza Diaz</creatorcontrib><title>Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues</title><title>Corpus linguistics and linguistic theory</title><addtitle>Corpus Linguistics and Linguistic Theory</addtitle><description>In this paper, we will describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). 
EPEC is a 300,000 word corpus of standard written Basque whose purpose is to be a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at syntactic level. We will also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.</description><subject>dependency grammar</subject><subject>Syntactic annotation</subject><subject>treebank</subject><issn>1613-7027</issn><issn>1613-7035</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNpFkD1PwzAQhiMEEqUwsntCMKSc4zgfbBC1gFSJlIbZcp0zDaRxsBOJ_ntSisp09-oevTo9nndJYUI55bfZfF5MAoB0AhSOvBGNKPNjYPz4sAfxqXfm3AcAj2hKR55dbptOqq5SRDaN6WRXmYZUDenWSF5Ro8VGIcmMbXtHtLG_h9wahc5VzTsxmjxI99UjuZ7m0-zmjhRrNBaHRlkPnSVp7W__kCrnenTn3omWtcOLvzn23mbTInvy5y-Pz9n93FcMos6nIaoowrKEUqYaga6CJElWqU5ZrGmqeaLDOIiAAQs4Jqoso6BcqSFrGkqgbOxd7Xtba4b_XCc2lVNY17JB0zuR8BhoHLIB9PegssY5i1q0ttpIuxUUxM6s2JkVO7NiMPvPV67D7wMs7aeIYhZzsShCsZwFfJFDLgr2A7miezs</recordid><startdate>20090101</startdate><enddate>20090101</enddate><creator>Aldezabal, Izaskun</creator><creator>Aranzabe, Maria Jesus</creator><creator>Arriola, Jose Mari</creator><creator>de Ilarraza, Arantza Diaz</creator><general>Walter de Gruyter GmbH &amp; Co. 
KG</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7T9</scope></search><sort><creationdate>20090101</creationdate><title>Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues</title><author>Aldezabal, Izaskun ; Aranzabe, Maria Jesus ; Arriola, Jose Mari ; de Ilarraza, Arantza Diaz</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c306t-14ec66edd0da9fe01b2888b9f937f19f58f4726030325e8cdd62dbc030f14a013</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>dependency grammar</topic><topic>Syntactic annotation</topic><topic>treebank</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Aldezabal, Izaskun</creatorcontrib><creatorcontrib>Aranzabe, Maria Jesus</creatorcontrib><creatorcontrib>Arriola, Jose Mari</creatorcontrib><creatorcontrib>de Ilarraza, Arantza Diaz</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><jtitle>Corpus linguistics and linguistic theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Aldezabal, Izaskun</au><au>Aranzabe, Maria Jesus</au><au>Arriola, Jose Mari</au><au>de Ilarraza, Arantza Diaz</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues</atitle><jtitle>Corpus linguistics and linguistic theory</jtitle><addtitle>Corpus Linguistics and Linguistic Theory</addtitle><date>2009-01-01</date><risdate>2009</risdate><volume>5</volume><issue>2</issue><spage>241</spage><epage>269</epage><pages>241-269</pages><issn>1613-7027</issn><eissn>1613-7035</eissn><abstract>In this paper, we will 
describe some theoretical and practical issues raised during the construction of the Basque Dependency Treebank (BDT): the syntactic annotation of EPEC (Reference Corpus for the Processing of Basque). EPEC is a 300,000 word corpus of standard written Basque whose purpose is to be a training corpus for the development and improvement of several NLP (Natural Language Processing) tools for Basque. BDT will be the first corpus for the Basque language tagged at syntactic level. We will also present the dependency-based annotation hierarchy that we have established for the syntactic tagging. Decisions made during design of the annotation hierarchy are based on the description of Basque grammar made by Euskaltzaindia (Academy for the Basque Language). When describing dependency relations, we consider lexical units as syntactic heads. This will open up a way for us to work with semantics.</abstract><pub>Walter de Gruyter GmbH &amp; Co. KG</pub><doi>10.1515/CLLT.2009.010</doi><tpages>29</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1613-7027
ispartof Corpus linguistics and linguistic theory, 2009-01, Vol.5 (2), p.241-269
issn 1613-7027
1613-7035
language eng
recordid cdi_proquest_miscellaneous_85701743
source De Gruyter journals
subjects dependency grammar
Syntactic annotation
treebank
title Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T20%3A30%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Syntactic%20annotation%20in%20the%20Reference%20Corpus%20for%20the%20Processing%20of%20Basque%20(EPEC):%20Theoretical%20and%20practical%20issues&rft.jtitle=Corpus%20linguistics%20and%20linguistic%20theory&rft.au=Aldezabal,%20Izaskun&rft.date=2009-01-01&rft.volume=5&rft.issue=2&rft.spage=241&rft.epage=269&rft.pages=241-269&rft.issn=1613-7027&rft.eissn=1613-7035&rft_id=info:doi/10.1515/CLLT.2009.010&rft_dat=%3Cproquest_cross%3E85701743%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=85701743&rft_id=info:pmid/&rfr_iscdi=true