Traffic accident duration prediction using text mining and ensemble learning on expressways

Bibliographic details
Published in: Scientific Reports 2022-12, Vol. 12 (1), p. 21478-21478, Article 21478
Main authors: Chen, Jiaona; Tao, Weijun
Format: Article
Language: English
Online access: Full text
Description
Summary: Predicting traffic accident duration is necessary for ensuring traffic safety. Several attempts have been made to achieve high prediction accuracy, but researchers have not considered traffic accident text data in much detail. The limited text data of the first report on an incident describes the characteristics of an accident that are initially available. This paper uses text data fusion and ensemble learning algorithms to build a model to predict an accident's duration, and a preprocessing scheme for accident duration text data is established. Next, the random forest (RF) algorithm is applied to select the text feature variables related to traffic incident duration. Last, a text feature vector is introduced into models such as decision tree, k-nearest neighbors, support vector regression, random forest, gradient boosting decision tree, and eXtreme Gradient Boosting (XGBoost). Our results show that the improved RF model achieves good prediction accuracy as measured by RMSE, MAPE, and R². From this, the textual factors important in determining the duration of an accident are identified. Further, we found that features covering a cumulative importance of 60% are sufficient for predicting traffic accident duration from text data. These results provide insights into minimizing accident-related traffic congestion and contribute to input optimization in text-based prediction.
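The cumulative-importance cutoff and the three error metrics named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example feature names, and the importance scores are hypothetical, and the scores are assumed to come from an already fitted model such as a random forest.

```python
import math


def select_by_cumulative_importance(importances, threshold=0.60):
    """Pick the top-ranked features whose normalized importances
    add up to at least `threshold` (0.60 mirrors the 60% in the summary)."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold - 1e-12:  # small tolerance for float error
            break
    return selected


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical text-derived features with made-up importance scores.
    scores = {"rollover": 0.35, "truck": 0.30, "rain": 0.20, "night": 0.15}
    print(select_by_cumulative_importance(scores))
    # Hypothetical durations in minutes: observed vs. predicted.
    observed, predicted = [100.0, 200.0, 50.0], [90.0, 220.0, 55.0]
    print(rmse(observed, predicted), mape(observed, predicted), r2(observed, predicted))
```

Ranking before accumulating means the cutoff keeps the smallest set of top-ranked features that reaches the threshold, which is one common reading of "cumulative importance".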
ISSN:2045-2322
DOI:10.1038/s41598-022-25988-4