Clustering Word Embeddings with Self-Organizing Maps. Application on LaRoSeDa -- A Large Romanian Sentiment Data Set

Romanian is one of the understudied languages in computational linguistics, with few resources available for the development of natural language processing tools. In this paper, we introduce LaRoSeDa, a Large Romanian Sentiment Data Set, which is composed of 15,000 positive and negative reviews coll...

Bibliographic details
Published in:	arXiv.org 2021-01
Main authors:	Tache, Anca Maria; Gaman, Mihaela; Ionescu, Radu Tudor
Format:	Article
Language:	eng
Subjects:	Algorithms; Cluster analysis; Clustering; Datasets; Linguistics; Natural language; Natural language processing; Self organizing maps; Vector quantization; Words (language); Zipf's Law
Online access:	Full text