Texts in, meaning out: neural language models in semantic similarity task for Russian
Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of the Russian Semantic Similarity Evaluation track, where our models took from the 2nd to the 5th position, depending on the task. We introduce the tools and corpora used, comment on the nature of the shared task and describe the achieved results. It was found that Continuous Skip-gram and Continuous Bag-of-words models, previously successfully applied to English material, can be used for semantic modeling of Russian as well. Moreover, we show that texts in the Russian National Corpus (RNC) provide excellent training material for such models, outperforming other, much larger corpora. This is especially true for semantic relatedness tasks (although stacking models trained on larger corpora on top of RNC models improves performance even more). High-quality semantic vectors learned in such a way can be used in a variety of linguistic tasks and promise an exciting field for further study.
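The abstract describes the approach only at a high level. As a rough illustration of the kind of models it names, the sketch below trains a Continuous Skip-gram model and queries it for word similarity using the gensim library; the corpus file name, hyperparameters, example word pair, and the score-averaging "stacking" helper are all illustrative assumptions, not the authors' reported setup.

```python
# Minimal sketch: training Continuous Skip-gram / CBOW word vectors and
# computing semantic similarity, in the spirit of the models described in
# the abstract. Corpus path and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec


class Corpus:
    """Stream a preprocessed corpus: one tokenized sentence per line.

    A restartable iterable, since gensim passes over the corpus once per
    training epoch rather than loading it into memory.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.split()


sentences = Corpus("rnc.txt")  # hypothetical dump of RNC texts

# sg=1 trains Continuous Skip-gram; sg=0 trains Continuous Bag-of-words.
model = Word2Vec(sentences, vector_size=300, window=5, sg=1, min_count=5)

# Semantic similarity here is cosine similarity between the two word vectors.
print(model.wv.similarity("врач", "доктор"))  # 'physician' vs 'doctor'


def stacked_similarity(models, w1, w2):
    """One naive way to 'stack' models trained on different corpora:
    average their similarity scores. The paper does not specify its
    exact combination scheme; this helper is an assumption."""
    scores = [m.wv.similarity(w1, w2)
              for m in models if w1 in m.wv and w2 in m.wv]
    return sum(scores) / len(scores) if scores else None
```

Streaming the corpus rather than reading it into memory is the usual pattern for corpora of this size; swapping sg=0 into the same call would train the Continuous Bag-of-words variant instead.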
Saved in:
Published in: | arXiv.org, 2015-04 |
---|---|
Main authors: | Kutuzov, Andrey; Andreev, Igor |
Format: | Article |
Language: | English |
EISSN: | 2331-8422 |
Subjects: | Linguistics; Neural networks; Performance enhancement; Semantics; Similarity; Texts |
Online access: | Full text |