Texts in, meaning out: neural language models in semantic similarity task for Russian
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Distributed vector representations for natural language vocabulary get a lot
of attention in contemporary computational linguistics. This paper summarizes
the experience of applying neural network language models to the task of
calculating semantic similarity for Russian. The experiments were performed in
the course of the Russian Semantic Similarity Evaluation track, where our models
took from the 2nd to the 5th position, depending on the task.
We introduce the tools and corpora used, comment on the nature of the shared
task and describe the achieved results. It was found that Continuous
Skip-gram and Continuous Bag-of-words models, previously successfully applied
to English material, can be used for semantic modeling of Russian as well.
Moreover, we show that texts in the Russian National Corpus (RNC) provide
excellent training material for such models, outperforming other, much larger
corpora. This is especially true for semantic relatedness tasks (although
stacking models trained on larger corpora on top of RNC models improves
performance even more).
High-quality semantic vectors learned in this way can be used in a variety
of linguistic tasks and promise an exciting field for further study.
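As a concrete illustration of the kind of similarity queries such Skip-gram or CBOW vectors support, here is a minimal sketch using the gensim library; the choice of gensim and the model file name `rnc_model.bin` are assumptions for illustration, not the exact setup described in the paper.

```python
# Minimal sketch: querying a pre-trained word2vec-style model for word similarity.
# Assumes gensim is installed and "rnc_model.bin" is a placeholder for a
# word2vec-format model trained on a Russian corpus (e.g. the RNC).
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format("rnc_model.bin", binary=True)

# Cosine similarity between two words (higher means more semantically similar).
print(model.similarity("человек", "личность"))

# Ten nearest neighbours of a word in the vector space.
for word, score in model.most_similar("компьютер", topn=10):
    print(f"{word}\t{score:.3f}")
```

In an evaluation setting like the one described, such pairwise cosine similarities would typically be computed for every word pair in the test set and compared against human judgements.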
DOI: 10.48550/arxiv.1504.08183