The corpus of Basque simplified texts (CBST)


Bibliographic details
Published in: Language resources and evaluation 2018-03, Vol.52 (1), p.217-247
Main authors: Gonzalez-Dios, Itziar, Aranzabe, María Jesús, Díaz de Ilarraza, Arantza
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 247
container_issue 1
container_start_page 217
container_title Language resources and evaluation
container_volume 52
creator Gonzalez-Dios, Itziar
Aranzabe, María Jesús
Díaz de Ilarraza, Arantza
description In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences of science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
doi_str_mv 10.1007/s10579-017-9407-6
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1965735409</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1965735409</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-46a37d535819e00651f470fbf35893877ff3c5c81760a0456512fc8822e683d53</originalsourceid><addsrcrecordid>eNp1kE1LAzEQhoMoWKs_wNuCFwWjk-_kaBe_oODBFbyFNU10S9tdk12o_96UFfHiaYbheWeGB6FTAlcEQF0nAkIZDERhw0FhuYcmRCiOgRK9_9vD6yE6SmkJwClXeoIuqw9fuDZ2QyraUMzq9Dn4IjXrbtWExi-K3m_7VJyXs-fq4hgdhHqV_MlPnaKXu9uqfMDzp_vH8maOHROmx1zWTC0EE5oYDyAFCVxBeAt5YphWKgTmhNNESaiBiwzQ4LSm1EvNcnCKzsa9XWzzO6m3y3aIm3zSEiOFYoKDyRQZKRfblKIPtovNuo5floDdSbGjFJul2J0UK3OGjpmU2c27j382_xv6Bg6BYKk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1965735409</pqid></control><display><type>article</type><title>The corpus of Basque simplified texts (CBST)</title><source>JSTOR Archive Collection A-Z Listing</source><source>SpringerLink Journals - AutoHoldings</source><creator>Gonzalez-Dios, Itziar ; Aranzabe, María Jesús ; Díaz de Ilarraza, Arantza</creator><creatorcontrib>Gonzalez-Dios, Itziar ; Aranzabe, María Jesús ; Díaz de Ilarraza, Arantza</creatorcontrib><description>In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences of science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. 
These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.</description><identifier>ISSN: 1574-020X</identifier><identifier>EISSN: 1574-0218</identifier><identifier>DOI: 10.1007/s10579-017-9407-6</identifier><language>eng</language><publisher>Dordrecht: Springer Netherlands</publisher><subject>Annotations ; Basque language ; Computational Linguistics ; Computer Science ; Corpus analysis ; Corpus linguistics ; Language and Literature ; Linguistics ; Machine learning ; Sentences ; Simplified language ; Social Sciences ; Texts ; Translators</subject><ispartof>Language resources and evaluation, 2018-03, Vol.52 (1), p.217-247</ispartof><rights>The Author(s) 2017</rights><rights>Language Resources and Evaluation is a copyright of Springer, (2017). All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c359t-46a37d535819e00651f470fbf35893877ff3c5c81760a0456512fc8822e683d53</citedby><cites>FETCH-LOGICAL-c359t-46a37d535819e00651f470fbf35893877ff3c5c81760a0456512fc8822e683d53</cites><orcidid>0000-0002-0401-1087 ; 0000-0003-3317-8561 ; 0000-0003-1048-5403</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10579-017-9407-6$$EPDF$$P50$$Gspringer$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10579-017-9407-6$$EHTML$$P50$$Gspringer$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Gonzalez-Dios, Itziar</creatorcontrib><creatorcontrib>Aranzabe, María 
Jesús</creatorcontrib><creatorcontrib>Díaz de Ilarraza, Arantza</creatorcontrib><title>The corpus of Basque simplified texts (CBST)</title><title>Language resources and evaluation</title><addtitle>Lang Resources &amp; Evaluation</addtitle><description>In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences of science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. 
This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.</description><subject>Annotations</subject><subject>Basque language</subject><subject>Computational Linguistics</subject><subject>Computer Science</subject><subject>Corpus analysis</subject><subject>Corpus linguistics</subject><subject>Language and Literature</subject><subject>Linguistics</subject><subject>Machine learning</subject><subject>Sentences</subject><subject>Simplified language</subject><subject>Social Sciences</subject><subject>Texts</subject><subject>Translators</subject><issn>1574-020X</issn><issn>1574-0218</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AIMQZ</sourceid><sourceid>AVQMV</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>K50</sourceid><sourceid>M1D</sourceid><sourceid>M2O</sourceid><recordid>eNp1kE1LAzEQhoMoWKs_wNuCFwWjk-_kaBe_oODBFbyFNU10S9tdk12o_96UFfHiaYbheWeGB6FTAlcEQF0nAkIZDERhw0FhuYcmRCiOgRK9_9vD6yE6SmkJwClXeoIuqw9fuDZ2QyraUMzq9Dn4IjXrbtWExi-K3m_7VJyXs-fq4hgdhHqV_MlPnaKXu9uqfMDzp_vH8maOHROmx1zWTC0EE5oYDyAFCVxBeAt5YphWKgTmhNNESaiBiwzQ4LSm1EvNcnCKzsa9XWzzO6m3y3aIm3zSEiOFYoKDyRQZKRfblKIPtovNuo5floDdSbGjFJul2J0UK3OGjpmU2c27j382_xv6Bg6BYKk</recordid><startdate>20180301</startdate><enddate>20180301</enddate><creator>Gonzalez-Dios, Itziar</creator><creator>Aranzabe, María Jesús</creator><creator>Díaz de Ilarraza, Arantza</creator><general>Springer Netherlands</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7T9</scope><scope>7XB</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AIMQZ</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AVQMV</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>CPGLG</scope><scope>CRLPW</scope><scope>DWQXO</scope><scope>GB0</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K50</scope><scope>K7-</scope><scope>L7M</scope><scope>LIQON</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M1D</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0002-0401-1087</orcidid><orcidid>https://orcid.org/0000-0003-3317-8561</orcidid><orcidid>https://orcid.org/0000-0003-1048-5403</orcidid></search><sort><creationdate>20180301</creationdate><title>The corpus of Basque simplified texts (CBST)</title><author>Gonzalez-Dios, Itziar ; Aranzabe, María Jesús ; Díaz de Ilarraza, Arantza</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-46a37d535819e00651f470fbf35893877ff3c5c81760a0456512fc8822e683d53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Annotations</topic><topic>Basque language</topic><topic>Computational Linguistics</topic><topic>Computer Science</topic><topic>Corpus analysis</topic><topic>Corpus linguistics</topic><topic>Language and Literature</topic><topic>Linguistics</topic><topic>Machine learning</topic><topic>Sentences</topic><topic>Simplified language</topic><topic>Social 
Sciences</topic><topic>Texts</topic><topic>Translators</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gonzalez-Dios, Itziar</creatorcontrib><creatorcontrib>Aranzabe, María Jesús</creatorcontrib><creatorcontrib>Díaz de Ilarraza, Arantza</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest One Literature</collection><collection>Social Science Premium Collection</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>Arts Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Linguistics Collection</collection><collection>Linguistics Database</collection><collection>ProQuest Central Korea</collection><collection>DELNET Social Sciences &amp; Humanities Collection</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Art, Design &amp; Architecture 
Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest One Literature - U.S. Customers Only</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Arts &amp; Humanities Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Language resources and evaluation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gonzalez-Dios, Itziar</au><au>Aranzabe, María Jesús</au><au>Díaz de Ilarraza, Arantza</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The corpus of Basque simplified texts (CBST)</atitle><jtitle>Language resources and evaluation</jtitle><stitle>Lang Resources &amp; Evaluation</stitle><date>2018-03-01</date><risdate>2018</risdate><volume>52</volume><issue>1</issue><spage>217</spage><epage>247</epage><pages>217-247</pages><issn>1574-020X</issn><eissn>1574-0218</eissn><abstract>In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences of science popularisation domain and two simplified versions of each sentence. 
The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.</abstract><cop>Dordrecht</cop><pub>Springer Netherlands</pub><doi>10.1007/s10579-017-9407-6</doi><tpages>31</tpages><orcidid>https://orcid.org/0000-0002-0401-1087</orcidid><orcidid>https://orcid.org/0000-0003-3317-8561</orcidid><orcidid>https://orcid.org/0000-0003-1048-5403</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1574-020X
ispartof Language resources and evaluation, 2018-03, Vol.52 (1), p.217-247
issn 1574-020X
1574-0218
language eng
recordid cdi_proquest_journals_1965735409
source JSTOR Archive Collection A-Z Listing; SpringerLink Journals - AutoHoldings
subjects Annotations
Basque language
Computational Linguistics
Computer Science
Corpus analysis
Corpus linguistics
Language and Literature
Linguistics
Machine learning
Sentences
Simplified language
Social Sciences
Texts
Translators
title The corpus of Basque simplified texts (CBST)
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T21%3A05%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20corpus%20of%20Basque%20simplified%20texts%20(CBST)&rft.jtitle=Language%20resources%20and%20evaluation&rft.au=Gonzalez-Dios,%20Itziar&rft.date=2018-03-01&rft.volume=52&rft.issue=1&rft.spage=217&rft.epage=247&rft.pages=217-247&rft.issn=1574-020X&rft.eissn=1574-0218&rft_id=info:doi/10.1007/s10579-017-9407-6&rft_dat=%3Cproquest_cross%3E1965735409%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1965735409&rft_id=info:pmid/&rfr_iscdi=true
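The url field above is an OpenURL whose query string carries the citation as key/value pairs (the `rft.*` keys, plus `rft_id` for identifiers). As a minimal sketch, assuming Python's standard `urllib.parse` is an acceptable way to read it, the citation fields can be recovered as follows. The helper names `citation_fields` and `doi` are hypothetical, and `OPENURL` is a shortened excerpt of the record's URL, not the full string.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened excerpt of the record's OpenURL; only some of its keys are kept.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=The%20corpus%20of%20Basque%20simplified%20texts%20(CBST)"
    "&rft.jtitle=Language%20resources%20and%20evaluation"
    "&rft.date=2018-03-01&rft.volume=52&rft.issue=1"
    "&rft.spage=217&rft.epage=247"
    "&rft_id=info:doi/10.1007/s10579-017-9407-6"
)


def citation_fields(openurl):
    """Return the rft.* keys of the query string as a flat dict (hypothetical helper)."""
    query = parse_qs(urlsplit(openurl).query)
    prefix = "rft."
    # parse_qs maps every key to a list of values; keep the first value per key.
    return {
        key[len(prefix):]: values[0]
        for key, values in query.items()
        if key.startswith(prefix)
    }


def doi(openurl):
    """Return the DOI carried in an rft_id=info:doi/... pair, or None (hypothetical helper)."""
    query = parse_qs(urlsplit(openurl).query)
    marker = "info:doi/"
    for identifier in query.get("rft_id", []):
        if identifier.startswith(marker):
            return identifier[len(marker):]
    return None


if __name__ == "__main__":
    for name, value in sorted(citation_fields(OPENURL).items()):
        print(name, "=", value)
    print("doi =", doi(OPENURL))
```

`parse_qs` also percent-decodes the values, so `%20` in the title comes back as a space. In the full URL `rft_id` occurs several times, which is why `doi` scans the whole list rather than taking the first entry.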