Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures

Generating coherent, grammatically correct, and meaningful text is very challenging, yet it is crucial to many modern NLP systems. So far, research has mostly focused on English; for other languages, both standardized datasets and experiments with state-of-the-art models are ra...

Bibliographic details
Main authors:	Shaheen, Zein; Wohlgenannt, Gerhard; Zaity, Bassel; Mouromtsev, Dmitry; Pak, Vadim
Format:	Article
Language:	eng
Subjects:	Computer Science - Computation and Language; Computer Science - Learning

creator	Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim
description	Generating coherent, grammatically correct, and meaningful text is very challenging, yet it is crucial to many modern NLP systems. So far, research has mostly focused on English; for other languages, both standardized datasets and experiments with state-of-the-art models are rare. In this work, we i) provide a novel reference dataset for Russian language modeling and ii) experiment with popular modern methods for text generation, namely variational autoencoders and generative adversarial networks, which we trained on the new dataset. We evaluate the generated text with metrics such as perplexity, grammatical correctness, and lexical diversity.
doi_str_mv	10.48550/arxiv.2005.02470
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2005_02470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2005_02470</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-f02c7231ff9a2bd8b4491a4604b29f4e6133807aecdbbd64032bb6c2cbea5ec83</originalsourceid><addsrcrecordid>eNpFkL9OwzAYxL0woMIDMOEXSHBs5x9bFUpBCiCh7tFn53NqKTjIcQoMvDvFRWK6G3660x0hVxlLZZXn7Ab8pz2knLE8ZVyW7Jx8vy7zbMHRZwiLh5G24IYFBqRbdOgh2Mnd0sZjdHQyFP6Rp6nHcbRuoHcQYMZAwfV0c4BxOeEfNuwj5Y8FGPPXXu9tQH1sw_mCnBkYZ7z80xXZ3W92zUPSvmwfm3WbQFGyxDCuSy4yY2rgqq-UlHUGsmBS8dpILDIhKlYC6l6pvpBMcKUKzbVCyFFXYkWuT7Fxf_fu7Rv4r-73hy7-IH4AAQdasQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><source>arXiv.org</source><creator>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</creator><creatorcontrib>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</creatorcontrib><description>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. 
We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</description><identifier>DOI: 10.48550/arxiv.2005.02470</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2020-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2005.02470$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2005.02470$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Shaheen, Zein</creatorcontrib><creatorcontrib>Wohlgenannt, Gerhard</creatorcontrib><creatorcontrib>Zaity, Bassel</creatorcontrib><creatorcontrib>Mouromtsev, Dmitry</creatorcontrib><creatorcontrib>Pak, Vadim</creatorcontrib><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><description>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. 
We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpFkL9OwzAYxL0woMIDMOEXSHBs5x9bFUpBCiCh7tFn53NqKTjIcQoMvDvFRWK6G3660x0hVxlLZZXn7Ab8pz2knLE8ZVyW7Jx8vy7zbMHRZwiLh5G24IYFBqRbdOgh2Mnd0sZjdHQyFP6Rp6nHcbRuoHcQYMZAwfV0c4BxOeEfNuwj5Y8FGPPXXu9tQH1sw_mCnBkYZ7z80xXZ3W92zUPSvmwfm3WbQFGyxDCuSy4yY2rgqq-UlHUGsmBS8dpILDIhKlYC6l6pvpBMcKUKzbVCyFFXYkWuT7Fxf_fu7Rv4r-73hy7-IH4AAQdasQ</recordid><startdate>20200505</startdate><enddate>20200505</enddate><creator>Shaheen, Zein</creator><creator>Wohlgenannt, Gerhard</creator><creator>Zaity, Bassel</creator><creator>Mouromtsev, Dmitry</creator><creator>Pak, Vadim</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200505</creationdate><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><author>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-f02c7231ff9a2bd8b4491a4604b29f4e6133807aecdbbd64032bb6c2cbea5ec83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Shaheen, Zein</creatorcontrib><creatorcontrib>Wohlgenannt, Gerhard</creatorcontrib><creatorcontrib>Zaity, Bassel</creatorcontrib><creatorcontrib>Mouromtsev, Dmitry</creatorcontrib><creatorcontrib>Pak, Vadim</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shaheen, Zein</au><au>Wohlgenannt, Gerhard</au><au>Zaity, Bassel</au><au>Mouromtsev, Dmitry</au><au>Pak, Vadim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</atitle><date>2020-05-05</date><risdate>2020</risdate><abstract>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</abstract><doi>10.48550/arxiv.2005.02470</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2005.02470
language	eng
recordid	cdi_arxiv_primary_2005_02470
source	arXiv.org
subjects	Computer Science - Computation and Language ; Computer Science - Learning
title	Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T16%3A32%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Russian%20Natural%20Language%20Generation:%20Creation%20of%20a%20Language%20Modelling%20Dataset%20and%20Evaluation%20with%20Modern%20Neural%20Architectures&rft.au=Shaheen,%20Zein&rft.date=2020-05-05&rft_id=info:doi/10.48550/arxiv.2005.02470&rft_dat=%3Carxiv_GOX%3E2005_02470%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
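
The description above names perplexity and lexical diversity among the evaluation metrics but does not define them in this record. As a rough illustration only, the sketch below uses the common textbook definition of perplexity (the exponential of the mean negative log-likelihood per token) and a distinct-n ratio as one possible proxy for lexical diversity. The function names `perplexity` and `distinct_n` and the sample sentence are hypothetical, and the paper's actual formulas may differ.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    Assumed definition: exp(-(1/N) * sum(log p(token_i))).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def distinct_n(tokens: Sequence[str], n: int = 1) -> float:
    """Share of unique n-grams among all n-grams (a diversity proxy).

    Returns 0.0 when the sequence is shorter than n.
    """
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    unique_ngrams = {tuple(tokens[i:i + n]) for i in range(total)}
    return len(unique_ngrams) / total


if __name__ == "__main__":
    # Hypothetical model output: every token was assigned probability 0.25.
    log_probs = [math.log(0.25)] * 4
    print("perplexity:", perplexity(log_probs))

    # Hypothetical generated sentence, split on whitespace.
    sample = "я иду домой и я иду в парк".split()
    print("distinct-1:", distinct_n(sample, 1))
    print("distinct-2:", distinct_n(sample, 2))
```

Lower perplexity means the model assigns higher probability to the evaluated text, while a distinct-n value near 1.0 indicates little repetition. Neither number says anything about grammatical correctness, which needs a separate check.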