Analysing political events on Twitter: topic modelling and user community classification

Recently, political events, such as elections, have raised a lot of discussions on social media networks, in particular, Twitter. This brings new opportunities for social scientists to address social science tasks, such as understanding what communities said or identifying whether a community has an...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:SIGIR forum 2019-06, Vol.53 (1), p.38-39
1. Verfasser: Fang, Anjie
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recently, political events, such as elections, have raised a lot of discussions on social media networks, in particular, Twitter. This brings new opportunities for social scientists to address social science tasks, such as understanding what communities said or identifying whether a community has an influence on another. However, identifying these communities and extracting what they said from social media data are challenging and non-trivial tasks. We aim to make progress towards understanding 'who' (i.e. communities) said 'what' (i.e. discussed topics) and 'when' (i.e. time) during political events on Twitter. While identifying the 'who' can benefit from Twitter user community classification approaches, 'what' they said and 'when' can be effectively addressed on Twitter by extracting their discussed topics using topic modelling approaches that also account for the importance of time on Twitter. To evaluate the quality of these topics, it is necessary to investigate how coherent these topics are to humans. Accordingly, we propose a series of approaches in this thesis. First, we investigate how to effectively evaluate the coherence of the topics generated using a topic modelling approach. The topic coherence metric evaluates the topical coherence by examining the semantic similarity among words in a topic. We argue that the semantic similarity of words in tweets can be effectively captured by using word embeddings trained using a Twitter background dataset. Through a user study, we demonstrate that our proposed word embedding-based topic coherence metric can assess the coherence of topics like humans [1, 2]. In addition, inspired by the precision at k metric, we propose to evaluate the coherence of a topic model (containing many topics) by averaging the top-ranked topics within the topic model [3]. Our proposed metrics can not only evaluate the coherence of topics and topic models, but also can help users to choose the most coherent topics. Second, we aim to extract topics with a high coherence from Twitter data. Such topics can be easily interpreted by humans and they can assist to examine 'what' has been discussed and 'when'. Indeed, we argue that topics can be discussed in different time periods (see [4]) and therefore can be effectively identified and distinguished by considering their time periods. Hence, we propose an effective time-sensitive topic modelling approach by integrating the time dimension of tweets (i.e. 'when') [5]. We show that the time
ISSN:0163-5840
DOI:10.1145/3458537.3458542