Developing a Twitter-based traffic event detection model using deep learning architectures

Bibliographic details
Published in: Expert Systems with Applications, 2019-03, Vol. 118, p. 425-439
Main authors: Dabiri, Sina; Heaslip, Kevin
Format: Article
Language: English
Description
Summary:
• Tweets are mapped into numerical feature vectors using word-embedding models.
• Tweets are classified into non-traffic, traffic incident, and traffic information.
• Classification task is performed using convolutional and recurrent neural networks.
• 51,100 tweets are collected, labeled, and publicly released for future research.
• Models’ superiority is demonstrated through several evaluation steps.

In recent years, several studies have harnessed Twitter data for detecting traffic incidents and monitoring traffic conditions. Researchers have utilized the bag-of-words representation for converting tweets into numerical feature vectors. However, the bag-of-words representation not only ignores the order of words in a tweet but also suffers from the curse of dimensionality and sparsity. A common approach in the literature for dimensionality reduction is to build the bag-of-words on top of pre-defined traffic keywords. The immediate criticisms of such a strategy are that the pre-defined set of keywords may not include all traffic keywords and that the language of tweets is subject to change over time. To address these shortcomings, we utilize the power of deep-learning architectures both for representing tweets as numerical vectors and for classifying them into three categories: 1) non-traffic, 2) traffic incident, and 3) traffic information and condition. First, we map tweets into a low-dimensional vector space through word-embedding tools, which are also capable of measuring the semantic relationship between words. Supervised deep-learning algorithms, including a convolutional neural network (CNN) and a recurrent neural network (RNN), are then deployed on top of the word-embedding models for detecting traffic events. For training and testing our proposed model, a large volume of traffic tweets is collected through Twitter API endpoints and labeled through an efficient strategy. Experimental results on our labeled dataset show that the proposed approach achieves clear improvements over state-of-the-art methods.
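The contrast the summary draws between bag-of-words and word embeddings can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 3-dimensional vectors in `TOY_EMBEDDINGS`, the vocabulary, the example tweets, and the function names are all hypothetical and chosen by hand for illustration. A real system would load pre-trained embeddings and feed the resulting sequence matrix to a CNN or RNN classifier.

```python
from collections import Counter

import numpy as np

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# They are not taken from any trained word-embedding model.
TOY_EMBEDDINGS = {
    "crash": np.array([0.9, 0.1, 0.0]),
    "accident": np.array([0.8, 0.2, 0.1]),
    "blocks": np.array([0.2, 0.7, 0.1]),
    "lane": np.array([0.5, 0.5, 0.0]),
    "coffee": np.array([0.0, 0.1, 0.9]),
}
VOCAB = sorted(TOY_EMBEDDINGS)


def bag_of_words(tokens, vocab=VOCAB):
    """Count vector over the vocabulary; word order is discarded."""
    counts = Counter(tokens)
    return np.array([counts[w] for w in vocab], dtype=float)


def embed_sequence(tokens, table=TOY_EMBEDDINGS, max_len=4):
    """Stack one embedding per token, in order, zero-padded to max_len rows."""
    dim = len(next(iter(table.values())))
    out = np.zeros((max_len, dim))
    for i, tok in enumerate(tokens[:max_len]):
        if tok in table:  # unknown tokens stay as zero rows
            out[i] = table[tok]
    return out


def cosine(u, v):
    """Cosine similarity, a common measure of closeness between word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Two toy "tweets" with the same words in a different order.
tweet_a = "crash blocks lane".split()
tweet_b = "lane blocks crash".split()

bow_a, bow_b = bag_of_words(tweet_a), bag_of_words(tweet_b)
seq_a, seq_b = embed_sequence(tweet_a), embed_sequence(tweet_b)

sim_related = cosine(TOY_EMBEDDINGS["crash"], TOY_EMBEDDINGS["accident"])
sim_unrelated = cosine(TOY_EMBEDDINGS["crash"], TOY_EMBEDDINGS["coffee"])
```

In this sketch, `bow_a` and `bow_b` are identical because the count vector ignores word order, while `seq_a` and `seq_b` differ row by row. The matrix has a fixed width equal to the embedding dimension, whereas a bag-of-words vector grows with the vocabulary.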
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2018.10.017