Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures

Generating coherent, grammatically correct, and meaningful text is very challenging, yet it is crucial to many modern NLP systems. So far, research has mostly focused on the English language; for other languages, both standardized datasets and experiments with state-of-the-art models are ra...

Bibliographic details
Main authors: Shaheen, Zein; Wohlgenannt, Gerhard; Zaity, Bassel; Mouromtsev, Dmitry; Pak, Vadim
Format: Article
Language: eng
Online access: Order full text
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Shaheen, Zein
Wohlgenannt, Gerhard
Zaity, Bassel
Mouromtsev, Dmitry
Pak, Vadim
description Generating coherent, grammatically correct, and meaningful text is very challenging, yet it is crucial to many modern NLP systems. So far, research has mostly focused on the English language; for other languages, both standardized datasets and experiments with state-of-the-art models are rare. In this work, we (i) provide a novel reference dataset for Russian language modeling and (ii) experiment with popular modern methods for text generation, namely variational autoencoders and generative adversarial networks, which we trained on the new dataset. We evaluate the generated text using metrics such as perplexity, grammatical correctness, and lexical diversity.
doi_str_mv 10.48550/arxiv.2005.02470
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2005_02470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2005_02470</sourcerecordid><originalsourceid>FETCH-LOGICAL-a670-f02c7231ff9a2bd8b4491a4604b29f4e6133807aecdbbd64032bb6c2cbea5ec83</originalsourceid><addsrcrecordid>eNpFkL9OwzAYxL0woMIDMOEXSHBs5x9bFUpBCiCh7tFn53NqKTjIcQoMvDvFRWK6G3660x0hVxlLZZXn7Ab8pz2knLE8ZVyW7Jx8vy7zbMHRZwiLh5G24IYFBqRbdOgh2Mnd0sZjdHQyFP6Rp6nHcbRuoHcQYMZAwfV0c4BxOeEfNuwj5Y8FGPPXXu9tQH1sw_mCnBkYZ7z80xXZ3W92zUPSvmwfm3WbQFGyxDCuSy4yY2rgqq-UlHUGsmBS8dpILDIhKlYC6l6pvpBMcKUKzbVCyFFXYkWuT7Fxf_fu7Rv4r-73hy7-IH4AAQdasQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><source>arXiv.org</source><creator>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</creator><creatorcontrib>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</creatorcontrib><description>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. 
We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</description><identifier>DOI: 10.48550/arxiv.2005.02470</identifier><language>eng</language><subject>Computer Science - Computation and Language ; Computer Science - Learning</subject><creationdate>2020-05</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2005.02470$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2005.02470$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Shaheen, Zein</creatorcontrib><creatorcontrib>Wohlgenannt, Gerhard</creatorcontrib><creatorcontrib>Zaity, Bassel</creatorcontrib><creatorcontrib>Mouromtsev, Dmitry</creatorcontrib><creatorcontrib>Pak, Vadim</creatorcontrib><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><description>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. 
We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</description><subject>Computer Science - Computation and Language</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpFkL9OwzAYxL0woMIDMOEXSHBs5x9bFUpBCiCh7tFn53NqKTjIcQoMvDvFRWK6G3660x0hVxlLZZXn7Ab8pz2knLE8ZVyW7Jx8vy7zbMHRZwiLh5G24IYFBqRbdOgh2Mnd0sZjdHQyFP6Rp6nHcbRuoHcQYMZAwfV0c4BxOeEfNuwj5Y8FGPPXXu9tQH1sw_mCnBkYZ7z80xXZ3W92zUPSvmwfm3WbQFGyxDCuSy4yY2rgqq-UlHUGsmBS8dpILDIhKlYC6l6pvpBMcKUKzbVCyFFXYkWuT7Fxf_fu7Rv4r-73hy7-IH4AAQdasQ</recordid><startdate>20200505</startdate><enddate>20200505</enddate><creator>Shaheen, Zein</creator><creator>Wohlgenannt, Gerhard</creator><creator>Zaity, Bassel</creator><creator>Mouromtsev, Dmitry</creator><creator>Pak, Vadim</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200505</creationdate><title>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</title><author>Shaheen, Zein ; Wohlgenannt, Gerhard ; Zaity, Bassel ; Mouromtsev, Dmitry ; Pak, Vadim</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a670-f02c7231ff9a2bd8b4491a4604b29f4e6133807aecdbbd64032bb6c2cbea5ec83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Computation and Language</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Shaheen, Zein</creatorcontrib><creatorcontrib>Wohlgenannt, Gerhard</creatorcontrib><creatorcontrib>Zaity, Bassel</creatorcontrib><creatorcontrib>Mouromtsev, Dmitry</creatorcontrib><creatorcontrib>Pak, Vadim</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Shaheen, Zein</au><au>Wohlgenannt, Gerhard</au><au>Zaity, Bassel</au><au>Mouromtsev, Dmitry</au><au>Pak, Vadim</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures</atitle><date>2020-05-05</date><risdate>2020</risdate><abstract>Generating coherent, grammatically correct, and meaningful text is very challenging, however, it is crucial to many modern NLP systems. So far, research has mostly focused on English language, for other languages both standardized datasets, as well as experiments with state-of-the-art models, are rare. In this work, we i) provide a novel reference dataset for Russian language modeling, ii) experiment with popular modern methods for text generation, namely variational autoencoders, and generative adversarial networks, which we trained on the new dataset. We evaluate the generated text regarding metrics such as perplexity, grammatical correctness and lexical diversity.</abstract><doi>10.48550/arxiv.2005.02470</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2005.02470
ispartof
issn
language eng
recordid cdi_arxiv_primary_2005_02470
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning
title Russian Natural Language Generation: Creation of a Language Modelling Dataset and Evaluation with Modern Neural Architectures
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T16%3A32%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Russian%20Natural%20Language%20Generation:%20Creation%20of%20a%20Language%20Modelling%20Dataset%20and%20Evaluation%20with%20Modern%20Neural%20Architectures&rft.au=Shaheen,%20Zein&rft.date=2020-05-05&rft_id=info:doi/10.48550/arxiv.2005.02470&rft_dat=%3Carxiv_GOX%3E2005_02470%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true