Characterization of citizens using word2vec and latent topic analysis in a large set of tweets



Bibliographic details
Published in: Cities, September 2019, Vol. 92, pp. 187-196
Main authors: Vargas-Calderón, Vladimir; Camargo, Jorge E.
Format: Article
Language: English
Description
Abstract: With the increasing use of the Internet and mobile devices, social networks are becoming the most widely used media for citizens to communicate their ideas and thoughts. This information is very useful for identifying communities with common ideas based on what they publish in the network. This paper presents a method to automatically detect city communities by applying machine learning techniques to a set of tweets from citizens of Bogotá. The analysis was performed on a collection of 2,634,176 tweets gathered from Twitter over a period of six months. The results show that the proposed method is a promising tool for characterizing a city's population using machine learning methods and text analytics.
• This paper proposes a method to automatically detect communities in the Twitter social network.
• We collected a data set of tweets from citizens of Bogotá, Colombia, over a period of six months.
• We represent the complete tweet collection using the Word2Vec model and natural language processing techniques.
• We extract communities using a clustering algorithm to detect latent topics.
• Each citizen is projected onto a 2D visualization in which the latent topics obtained are shown in different colors.
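The abstract names the main steps (a Word2Vec representation of the tweets, clustering into latent topics, and a 2D projection of citizens) but does not state how tweets are aggregated per citizen, which clustering algorithm is used, or which projection produces the 2D view. The Python sketch below shows one way these steps can fit together under explicit assumptions: each citizen is represented by the mean of the word vectors in their tweets, k-means stands in for the clustering algorithm, and PCA stands in for the projection. The toy vocabulary, tweets, and function names are all hypothetical; in practice the vectors would come from a Word2Vec model trained on the tweet corpus (for example with gensim's `Word2Vec` class) rather than being written by hand.

```python
import numpy as np

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
WORD_VECTORS = {
    "bus": np.array([1.0, 0.0, 0.0]),
    "traffic": np.array([0.9, 0.1, 0.0]),
    "metro": np.array([1.0, 0.1, 0.0]),
    "goal": np.array([0.0, 1.0, 0.0]),
    "match": np.array([0.1, 0.9, 0.0]),
    "team": np.array([0.0, 1.0, 0.1]),
}

# Hypothetical tweets, grouped by citizen (one inner list per citizen).
USER_TWEETS = [
    ["bus traffic again", "metro is late"],
    ["traffic on the bus"],
    ["great goal", "what a match"],
    ["team won the match"],
]


def user_vector(tweets, word_vectors):
    """Average the vectors of every in-vocabulary token in a citizen's tweets."""
    dim = len(next(iter(word_vectors.values())))
    tokens = [t for tweet in tweets for t in tweet.lower().split() if t in word_vectors]
    if not tokens:
        return np.zeros(dim)
    return np.mean([word_vectors[t] for t in tokens], axis=0)


def assign(X, centers):
    """Index of the nearest center (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Plain k-means (assumed clustering step) with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from each point to its closest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its members; keep empty clusters in place.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def project_2d(X):
    """PCA via SVD (assumed projection): mean-centre, keep the top two components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


X = np.array([user_vector(tweets, WORD_VECTORS) for tweets in USER_TWEETS])
labels, centers = kmeans(X, k=2)
coords = project_2d(X)
for i, (label, (x, y)) in enumerate(zip(labels, coords)):
    print(f"citizen {i}: topic {label}, position ({x:.2f}, {y:.2f})")
```

Averaging word vectors is a simple baseline for turning a variable number of tweets into one fixed-length vector per citizen, and the farthest-point initialization keeps this toy run deterministic; the resulting topic labels are what would color each citizen's point in a 2D plot.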
ISSN: 0264-2751, 1873-6084
DOI: 10.1016/j.cities.2019.03.019