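Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)|í]cí z hlediska morfologického značkování) [Corpora as data sources for improving tools for automatic morphological analysis (Word-formation variants of adjectives in [(ou)|í]cí from the perspective of morphological tagging)]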

Adjectives ending in -oucí/-ící are regularly derived from verbs and are therefore not usually listed in Czech monolingual dictionaries. At the level of automatic morphological analysis of Czech (the dictionary), they should be generated from verbal roots and tagged as verbal adjectives (pos...

Detailed description

Bibliographic details
Published in:	Časopis pro moderní filologii, 2015, Vol.97 (2), p.136-145
Main author:	Osolsobě, Klára
Format:	Article
Language:	cze
Subjects:	Adjectives; Computational linguistics; Corpus linguistics; Czech language; Derivation (Morphology); Dictionaries; Form classes; Internet; Language and Literature Studies; Morphology; Word formation
Online access:	Full text

container_end_page	145
container_issue	2
container_start_page	136
container_title	Časopis pro moderní filologii
container_volume	97
creator	Osolsobě, Klára
description	Adjectives ending in -oucí/-ící are regularly derived from verbs and are therefore not usually listed in Czech monolingual dictionaries. At the level of automatic morphological analysis of Czech (the dictionary), they should be generated from verbal roots and tagged as verbal adjectives (POS tag: AG.*). Data from Czech corpora reveal a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both shortcomings is the existence of variants among the verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives ending in -oucí/-ící, which can be important for both a) automatic morphological analysis of Czech and b) the theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives in -oucí/-ící. The link between the inflectional and the word-formation variants is demonstrated using material from the SYN corpus (2.6 billion tokens of written Czech) and the large web corpus czTenTen12 (5.2 billion tokens of Czech text from the Internet, cleaned and deduplicated).
format	Article
fullrecord	<record><control><sourceid>ceeol_proqu</sourceid><recordid>TN_cdi_proquest_journals_2224459647</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ceeol_id>303668</ceeol_id><sourcerecordid>303668</sourcerecordid><originalsourceid>FETCH-LOGICAL-c119t-cbf77078eb61b13b27fe49d49ad3f83c0d9d6693a357f2a6583d6f5969c484f93</originalsourceid><addsrcrecordid>eNpVkM1Kw0AUhYMoWLRv4GLATbsIJJlkfpZS_MOCC3UlUm4yE5ufZmoyCaT4Cj5El4LFveBq9L0cqCCuLufej8M5d8cZBBgTl0Tc33UGnucxl2JG9p1h0-RW-iRglAcD5-1K1cu26VEOhUIrUatcIgEaLWuFzMeyhq5HlVk32l6-3xG0Wi1AZ0lhXtFC1akq1eNWQQWl-Vz1aHRTqk7pTtWVXXdQZ1DpHoHIZaGzDlWA7keqHT-bzUNiNmiF5qUUWVPAf8e5DVTB10uhOrOuzGZ86OylUDZy-DsPnLuz09vJhTu9Pr-cnEzdxPe5dpM4pdSjTMbEj30cBzSVIRchB4FThhNPcEEIx4AjmgZAIoYFSSNOeBKyMOX4wDne-tonPLWy0bNctbWt18yCIAhDi4bUUkdbKpFSlX8I9jAhDP8AnRGCdw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2224459647</pqid></control><display><type>article</type><title>Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)\|í]cí z hlediska morfologického značkování)</title><source>DOAJ Directory of Open Access Journals</source><creator>Osolsobě, Klára</creator><creatorcontrib>Osolsobě, Klára</creatorcontrib><description>Adjectives ending with -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. On the level of automatic morphological analysis (the dictionary) of Czech they should be generated from verbal roots and tagged as verbal adjectives (pos tag: AG.). The data from Czech corpora prove a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of verbal forms from which the verbal adjectives are potentially derived. 
Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives with endings -oucí/-ící that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives with -oucí/-ící. The link between the inflected and the word-formation variants will be demonstrated using material from the SYN corpus (2,6 billion tokens of written Czech) and the large web corpus czTenTen12 (5,2 billion tokens of Czech text from the Internet — cleaned and deduplicated).</description><identifier>ISSN: 0008-7386</identifier><identifier>EISSN: 2336-6591</identifier><language>cze</language><publisher>Prague: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství</publisher><subject>Adjectives ; Computational linguistics ; Corpus linguistics ; Czech language ; Derivation (Morphology) ; Dictionaries ; Form classes ; Internet ; Language and Literature Studies ; Morphology ; Word formation</subject><ispartof>Časopis pro moderní filologii, 2015, Vol.97 (2), p.136-145</ispartof><rights>2015. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/2.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttps://www.ceeol.com//api/image/getissuecoverimage?id=picture_2015_21562.jpg</thumbnail><link.rule.ids>314,776,780,4010</link.rule.ids></links><search><creatorcontrib>Osolsobě, Klára</creatorcontrib><title>Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)\|í]cí z hlediska morfologického značkování)</title><title>Časopis pro moderní filologii</title><addtitle>Journal for Modern Philology</addtitle><description>Adjectives ending with -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. On the level of automatic morphological analysis (the dictionary) of Czech they should be generated from verbal roots and tagged as verbal adjectives (pos tag: AG.). The data from Czech corpora prove a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives with endings -oucí/-ící that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives with -oucí/-ící. 
The link between the inflected and the word-formation variants will be demonstrated using material from the SYN corpus (2,6 billion tokens of written Czech) and the large web corpus czTenTen12 (5,2 billion tokens of Czech text from the Internet — cleaned and deduplicated).</description><subject>Adjectives</subject><subject>Computational linguistics</subject><subject>Corpus linguistics</subject><subject>Czech language</subject><subject>Derivation (Morphology)</subject><subject>Dictionaries</subject><subject>Form classes</subject><subject>Internet</subject><subject>Language and Literature Studies</subject><subject>Morphology</subject><subject>Word formation</subject><issn>0008-7386</issn><issn>2336-6591</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>REL</sourceid><sourceid>BENPR</sourceid><recordid>eNpVkM1Kw0AUhYMoWLRv4GLATbsIJJlkfpZS_MOCC3UlUm4yE5ufZmoyCaT4Cj5El4LFveBq9L0cqCCuLufej8M5d8cZBBgTl0Tc33UGnucxl2JG9p1h0-RW-iRglAcD5-1K1cu26VEOhUIrUatcIgEaLWuFzMeyhq5HlVk32l6-3xG0Wi1AZ0lhXtFC1akq1eNWQQWl-Vz1aHRTqk7pTtWVXXdQZ1DpHoHIZaGzDlWA7keqHT-bzUNiNmiF5qUUWVPAf8e5DVTB10uhOrOuzGZ86OylUDZy-DsPnLuz09vJhTu9Pr-cnEzdxPe5dpM4pdSjTMbEj30cBzSVIRchB4FThhNPcEEIx4AjmgZAIoYFSSNOeBKyMOX4wDne-tonPLWy0bNctbWt18yCIAhDi4bUUkdbKpFSlX8I9jAhDP8AnRGCdw</recordid><startdate>2015</startdate><enddate>2015</enddate><creator>Osolsobě, Klára</creator><general>Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství</general><general>Charles University in Prague - Faculty of Arts Press</general><general>Charles University, Faculty of 
Arts</general><scope>AE2</scope><scope>BIXPP</scope><scope>REL</scope><scope>7T9</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BYOGL</scope><scope>CCPQU</scope><scope>CPGLG</scope><scope>CRLPW</scope><scope>DWQXO</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope></search><sort><creationdate>2015</creationdate><title>Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)\|í]cí z hlediska morfologického značkování)</title><author>Osolsobě, Klára</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c119t-cbf77078eb61b13b27fe49d49ad3f83c0d9d6693a357f2a6583d6f5969c484f93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>cze</language><creationdate>2015</creationdate><topic>Adjectives</topic><topic>Computational linguistics</topic><topic>Corpus linguistics</topic><topic>Czech language</topic><topic>Derivation (Morphology)</topic><topic>Dictionaries</topic><topic>Form classes</topic><topic>Internet</topic><topic>Language and Literature Studies</topic><topic>Morphology</topic><topic>Word formation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Osolsobě, Klára</creatorcontrib><collection>Central and Eastern European Online Library (C.E.E.O.L.) 
(DFG Nationallizenzen)</collection><collection>CEEOL: Open Access</collection><collection>Central and Eastern European Online Library</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Social Science Premium Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>East Europe, Central Europe Database</collection><collection>ProQuest One Community College</collection><collection>Linguistics Collection</collection><collection>Linguistics Database</collection><collection>ProQuest Central Korea</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Časopis pro moderní filologii</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Osolsobě, Klára</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)\|í]cí z hlediska morfologického značkování)</atitle><jtitle>Časopis pro moderní filologii</jtitle><addtitle>Journal for Modern Philology</addtitle><date>2015</date><risdate>2015</risdate><volume>97</volume><issue>2</issue><spage>136</spage><epage>145</epage><pages>136-145</pages><issn>0008-7386</issn><eissn>2336-6591</eissn><abstract>Adjectives ending with -oucí/-ící are regularly derived from verbs and hence are not usually listed in any of the Czech monolingual dictionaries. 
On the level of automatic morphological analysis (the dictionary) of Czech they should be generated from verbal roots and tagged as verbal adjectives (pos tag: AG.*). The data from Czech corpora prove a) inconsistencies in tagging and b) gaps in the dictionary. The main cause of both kinds of insufficiency is the existence of variants on the level of verbal forms from which the verbal adjectives are potentially derived. Consequently, text corpora are a significant source of knowledge about the formation and use of adjectives with endings -oucí/-ící that can be important for both a) automatic morphological analysis of Czech and b) theoretical description of Czech grammar (derivational morphology). Our goal is to present a corpus-based study of the Czech gerund, i.e. verbal adjectives with -oucí/-ící. The link between the inflected and the word-formation variants will be demonstrated using material from the SYN corpus (2,6 billion tokens of written Czech) and the large web corpus czTenTen12 (5,2 billion tokens of Czech text from the Internet — cleaned and deduplicated).</abstract><cop>Prague</cop><pub>Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství</pub><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 0008-7386
ispartof	Časopis pro moderní filologii, 2015, Vol.97 (2), p.136-145
issn	0008-7386 2336-6591
language	cze
recordid	cdi_proquest_journals_2224459647
source	DOAJ Directory of Open Access Journals
subjects	Adjectives; Computational linguistics; Corpus linguistics; Czech language; Derivation (Morphology); Dictionaries; Form classes; Internet; Language and Literature Studies; Morphology; Word formation
title	Korpusy jako zdroje dat pro úpravy nástrojů automatické morfologické analýzy (Slovotvorné varianty adjektiv na [(ou)|í]cí z hlediska morfologického značkování)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T03%3A40%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ceeol_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Korpusy%20jako%20zdroje%20dat%20pro%20%C3%BApravy%20n%C3%A1stroj%C5%AF%20automatick%C3%A9%20morfologick%C3%A9%20anal%C3%BDzy%20(Slovotvorn%C3%A9%20varianty%20adjektiv%20na%20%5B(ou)%7C%C3%AD%5Dc%C3%AD%20z%20hlediska%20morfologick%C3%A9ho%20zna%C4%8Dkov%C3%A1n%C3%AD)&rft.jtitle=%C4%8Casopis%20pro%20modern%C3%AD%20filologii&rft.au=Osolsob%C4%9B,%20Kl%C3%A1ra&rft.date=2015&rft.volume=97&rft.issue=2&rft.spage=136&rft.epage=145&rft.pages=136-145&rft.issn=0008-7386&rft.eissn=2336-6591&rft_id=info:doi/&rft_dat=%3Cceeol_proqu%3E303668%3C/ceeol_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2224459647&rft_id=info:pmid/&rft_ceeol_id=303668&rfr_iscdi=true