Texts in, meaning out: neural language models in semantic similarity task for Russian

Distributed vector representations for natural language vocabulary receive a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of the Russian Semantic Similarity Evaluation track, where our models took from the 2nd to the 5th position, depending on the task. We introduce the tools and corpora used, comment on the nature of the shared task, and describe the results achieved. It was found that Continuous Skip-gram and Continuous Bag-of-Words models, previously applied successfully to English material, can be used for semantic modeling of Russian as well. Moreover, we show that texts in the Russian National Corpus (RNC) provide excellent training material for such models, outperforming other, much larger corpora. This is especially true for semantic relatedness tasks (although stacking models trained on larger corpora on top of RNC models improves performance even more). High-quality semantic vectors learned in this way can be used in a variety of linguistic tasks and promise an exciting field for further study.
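A minimal sketch of the approach the abstract describes: training a Continuous Skip-gram and a Continuous Bag-of-Words model on tokenized text and scoring word pairs by cosine similarity between their vectors. The gensim library, the toy corpus, the hyperparameters, and the score-averaging used to mimic "stacking" two models are illustrative assumptions, not the authors' exact setup.

```python
from gensim.models import Word2Vec

# Toy stand-in for a lemmatized Russian training corpus (the paper uses the
# Russian National Corpus and larger web corpora); one token list per sentence.
corpus = [
    ["кошка", "ловит", "мышь"],
    ["собака", "ловит", "мяч"],
    ["кошка", "и", "собака", "спят"],
] * 200  # repeat so the toy vocabulary gets enough training updates

# sg=1 selects Continuous Skip-gram; sg=0 selects Continuous Bag-of-Words.
skipgram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=5)
cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0, epochs=5)

w1, w2 = "кошка", "собака"

# Cosine similarity between the learned vectors of the two words.
sim_sg = skipgram.wv.similarity(w1, w2)
sim_cbow = cbow.wv.similarity(w1, w2)

# A hypothetical way to combine ("stack") models trained on different corpora:
# average their similarity scores. The paper does not spell out its exact
# combination scheme, so this step is an assumption for illustration only.
combined = (sim_sg + sim_cbow) / 2
print(f"skip-gram: {sim_sg:.3f}  CBOW: {sim_cbow:.3f}  combined: {combined:.3f}")
```

In a shared-task setting such as the one described, scores like these would typically be computed for every word pair in the evaluation set and compared against human similarity judgments.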

Bibliographic Details
Main Authors: Kutuzov, Andrey; Andreev, Igor
Format: Article
Language: English
Subjects: Computer Science - Computation and Language
DOI: 10.48550/arxiv.1504.08183
Source: arXiv.org
Online Access: Order full text