Construction-Accident Narrative Classification Using Shallow and Deep Learning
AbstractIt is crucial to extract knowledge from past accidents to prevent future ones. To this end, narrative classification is required in text mining. This autocoding process can be seen as a multiclass classification problem with an imbalanced data set. We evaluated the performance of several sta...
Gespeichert in:
Veröffentlicht in: | Journal of construction engineering and management 2022-09, Vol.148 (9) |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | AbstractIt is crucial to extract knowledge from past accidents to prevent future ones. To this end, narrative classification is required in text mining. This autocoding process can be seen as a multiclass classification problem with an imbalanced data set. We evaluated the performance of several state-of-the-art machine learning methods, including 10 shallow learning methods (Rocchio, k-nearest neighbors, linear regression, naive Bayes, decision tree, random forest, gradient boosting, bootstrap aggregating, support vector machine (SVM), and shallow neural network), and five deep learning methods [deep neural network, convolutional neural network (CNN), recurrent neural network with long short-term memory, and a gated recurrent unit, and recurrent CNN]. The input data set contained 4,770 construction accident reports from the Occupational Safety and Health Administration (OSHA). After the narratives were relabeled based on the Occupational Injury and Illness Classification System (OIICS), the accuracy of all shallow classifiers was significantly improved compared with that reported in previous studies. SVM and CNN achieved the highest accuracy of 0.91 and 0.90 among the shallow and deep learning methods, respectively. Misclassifications occur because training data sets lack rich diversity for minority classes, some cases belong to multiple classes, and some divisions have the same key feature words. In the future, when a new data set is available, we can use learned patterns to classify them with high accuracy in practice. |
---|---|
ISSN: | 0733-9364 1943-7862 |
DOI: | 10.1061/(ASCE)CO.1943-7862.0002354 |