Deep transfer learning baselines for sentiment analysis in Russian

Bibliographic details
Published in: Information Processing & Management, 2021-05, Vol. 58 (3), p. 102484, Article 102484
Main authors: Smetanin, Sergey; Komarov, Mikhail
Format: Article
Language: English
Description
Abstract: Recently, transfer learning from pre-trained language models has proven to be effective in a variety of natural language processing tasks, including sentiment analysis. This paper aims at identifying deep transfer learning baselines for sentiment analysis in Russian. Firstly, we identified the most used publicly available sentiment analysis datasets in Russian and recent language models which officially support the Russian language. Secondly, we fine-tuned Multilingual Bidirectional Encoder Representations from Transformers (BERT), RuBERT, and two versions of the Multilingual Universal Sentence Encoder, and obtained strong, or even new, state-of-the-art results on seven sentiment datasets in Russian: SentiRuEval-2016, SentiRuEval-2015, RuTweetCorp, RuSentiment, LINIS Crowd, Kaggle Russian News Dataset, and RuReviews. Lastly, we made the fine-tuned models publicly available for the research community.
Highlights:
• We identified the most commonly used sentiment analysis datasets of Russian-language texts.
• We fine-tuned Multilingual BERT, RuBERT, and two versions of the Multilingual USE on seven sentiment analysis datasets.
• Fine-tuned RuBERT achieved new state-of-the-art results on Russian sentiment datasets.
• In the context of existing approaches, sentiment analysis of Russian-language texts based on language models outperforms rule-based and basic machine learning-based approaches in terms of classification quality.
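For readers who want to reproduce this kind of baseline, the sketch below shows how fine-tuning a pre-trained Russian BERT for sentence-level sentiment classification typically looks with the Hugging Face transformers library. It is an illustrative sketch only, not the authors' released code: the base checkpoint name (DeepPavlov/rubert-base-cased), the three-class label scheme, the toy training examples, and the hyperparameters are assumptions standing in for whichever dataset (e.g. RuSentiment) and configuration is actually used.

# Minimal fine-tuning sketch with Hugging Face transformers.
# Assumptions: base checkpoint, label scheme, toy data, hyperparameters.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "DeepPavlov/rubert-base-cased"  # assumed base checkpoint
NUM_LABELS = 3                               # e.g. negative / neutral / positive

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS)

class SentimentDataset(Dataset):
    """Wraps raw texts and integer labels into tokenized model inputs."""
    def __init__(self, texts, labels, max_length=128):
        self.encodings = tokenizer(texts, truncation=True, padding=True,
                                   max_length=max_length)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Toy examples; in practice this would be one of the Russian sentiment
# datasets listed in the abstract (e.g. RuSentiment or RuReviews).
train_ds = SentimentDataset(["Отличный фильм!", "Ужасный сервис."], [2, 0])

args = TrainingArguments(output_dir="rubert-sentiment",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()

The same pattern applies to Multilingual BERT by swapping the checkpoint name; the Multilingual Universal Sentence Encoder variants described in the paper are TensorFlow Hub models and follow a different loading path.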
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2020.102484