Dictionary extension for topic analysis of Swedish online texts

Bibliographic details
Main author: Lee, Sinae
Format: Dissertation
Language: English
Description
Summary: In Natural Language Processing, topic analysis is the task of automatically detecting and extracting keywords from a large collection of text data in order to identify topics. On some social media platforms, such as Twitter, hashtags can be used to identify the corresponding topics. However, only some tweets contain hashtags. To improve topic detection, the existing hashtags can be extended with keywords that are representative of specific topics. In this project, hashtag extensions were extracted from tweets written by Swedish politicians. Three approaches to generating the hashtag extensions were chosen and compared in terms of precision and consideration of context: traditional information retrieval, a Twitter-specific approach based on co-occurring words and hashtags, and a neural-network-based approach called Word2Vec. The data set of tweets was preprocessed, and experiments with the three approaches were run on the preprocessed data. In terms of precision, Word2Vec performed best, with a precision score of 40%. Only the neural network approach, Word2Vec, generated useful hashtag extensions that took contextual interpretation into account.
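To illustrate the co-occurrence idea mentioned in the summary, the following is a minimal Python sketch, not the dissertation's implementation. It assumes that a hashtag extension is simply one of the words that most often appear in the same tweet as the hashtag, and that precision is the fraction of proposed words judged relevant. The function names, the tokenization rule, and the stopword handling are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: a token is a word, optionally prefixed with '#'.
TOKEN = re.compile(r"#?\w+")


def cooccurrence_extensions(tweets, top_k=3, stopwords=frozenset()):
    """Map each hashtag to the top_k words that share a tweet with it most often."""
    counts = defaultdict(Counter)
    for text in tweets:
        tokens = [t.lower() for t in TOKEN.findall(text)]
        hashtags = {t for t in tokens if t.startswith("#")}
        # Count each word at most once per tweet, skipping stopwords.
        words = {t for t in tokens
                 if not t.startswith("#") and t not in stopwords}
        for tag in hashtags:
            counts[tag].update(words)
    return {tag: [w for w, _ in c.most_common(top_k)]
            for tag, c in counts.items()}


def precision(candidates, relevant):
    """Fraction of candidate extension words that appear in the relevant set."""
    if not candidates:
        return 0.0
    return sum(1 for w in candidates if w in relevant) / len(candidates)
```

Called on a list of tweet strings, `cooccurrence_extensions` returns a dictionary from each hashtag to its candidate extension words, and `precision` can then score those candidates against a manually labeled set of relevant words. A Word2Vec-based variant would instead rank candidates by embedding similarity to the hashtag, which is what would let it reflect context rather than raw co-occurrence counts.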