Deep transfer learning baselines for sentiment analysis in Russian
Published in: | Information processing & management 2021-05, Vol.58 (3), p.102484, Article 102484 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Recently, transfer learning from pre-trained language models has proven effective in a variety of natural language processing tasks, including sentiment analysis. This paper aims to identify deep transfer learning baselines for sentiment analysis in Russian. First, we identified the most widely used publicly available sentiment analysis datasets in Russian and recent language models that officially support the Russian language. Second, we fine-tuned Multilingual Bidirectional Encoder Representations from Transformers (BERT), RuBERT, and two versions of the Multilingual Universal Sentence Encoder and obtained strong, or even new state-of-the-art, results on seven sentiment datasets in Russian: SentiRuEval-2016, SentiRuEval-2015, RuTweetCorp, RuSentiment, LINIS Crowd, Kaggle Russian News Dataset, and RuReviews. Lastly, we made the fine-tuned models publicly available to the research community.
• We identified the most commonly used sentiment analysis datasets of Russian-language texts.
• We fine-tuned Multilingual BERT, RuBERT, and two versions of the Multilingual USE on seven sentiment analysis datasets.
• Fine-tuned RuBERT achieved new state-of-the-art results on Russian sentiment datasets.
• Compared with existing approaches, sentiment analysis of Russian-language texts based on language models outperforms rule-based and basic machine-learning approaches in classification quality. |
---|---|
ISSN: | 0306-4573 1873-5371 |
DOI: | 10.1016/j.ipm.2020.102484 |