Construction-Accident Narrative Classification Using Shallow and Deep Learning

AbstractIt is crucial to extract knowledge from past accidents to prevent future ones. To this end, narrative classification is required in text mining. This autocoding process can be seen as a multiclass classification problem with an imbalanced data set. We evaluated the performance of several sta...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of construction engineering and management 2022-09, Vol.148 (9)
Hauptverfasser: Qiao, Jianfeng, Wang, Changfeng, Guan, Shuang, Shuran, Lv
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:AbstractIt is crucial to extract knowledge from past accidents to prevent future ones. To this end, narrative classification is required in text mining. This autocoding process can be seen as a multiclass classification problem with an imbalanced data set. We evaluated the performance of several state-of-the-art machine learning methods, including 10 shallow learning methods (Rocchio, k-nearest neighbors, linear regression, naive Bayes, decision tree, random forest, gradient boosting, bootstrap aggregating, support vector machine (SVM), and shallow neural network), and five deep learning methods [deep neural network, convolutional neural network (CNN), recurrent neural network with long short-term memory, and a gated recurrent unit, and recurrent CNN]. The input data set contained 4,770 construction accident reports from the Occupational Safety and Health Administration (OSHA). After the narratives were relabeled based on the Occupational Injury and Illness Classification System (OIICS), the accuracy of all shallow classifiers was significantly improved compared with that reported in previous studies. SVM and CNN achieved the highest accuracy of 0.91 and 0.90 among the shallow and deep learning methods, respectively. Misclassifications occur because training data sets lack rich diversity for minority classes, some cases belong to multiple classes, and some divisions have the same key feature words. In the future, when a new data set is available, we can use learned patterns to classify them with high accuracy in practice.
ISSN:0733-9364
1943-7862
DOI:10.1061/(ASCE)CO.1943-7862.0002354