A random forests approach to prioritize Highway Safety Manual (HSM) variables for data collection


Bibliographic details
Published in: Journal of advanced transportation, 2016-06, Vol. 50 (4), p. 522-540
Main authors: Saha, Dibakar; Alluri, Priyanka; Gan, Albert
Format: Article
Language: English
Description
Summary: The Highway Safety Manual (HSM) recommends using the empirical Bayes method with locally derived calibration factors to predict an agency's safety performance. The data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of these data variables are currently unavailable in most of the agencies' databases. Furthermore, it is not economically feasible to collect and maintain all the HSM data variables. This study aims to prioritize the HSM calibration variables based on their impact on crash predictions. Prioritization would help to identify influential variables for which data could be collected and maintained for continued updates, and thereby reduce intensive data collection efforts. Data were first collected for all the HSM variables from over 2400 miles of urban and suburban arterial road networks in Florida. Using 5 years (2008–2012) of crash data, a random forests data mining approach was then applied to measure the importance of each variable in crash frequency predictions for five different urban and suburban arterial facilities including two‐lane undivided, three‐lane with a two‐way left‐turn lane, four‐lane undivided, four‐lane divided, and five‐lane with a two‐way left‐turn lane. Two heuristic approaches were adopted to prioritize the variables: (i) simple ranking based on individual relative influence of variables; and (ii) clustering based on relative influence of variables within a specific range. Traffic volume was found as the most influential variable. Roadside object density, minor commercial driveway density, and minor residential driveway density variables were the other variables with significant influence on crash predictions. Copyright © 2015 John Wiley & Sons, Ltd.
ISSN: 0197-6729, 2042-3195
DOI: 10.1002/atr.1358
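
Illustration: the summary names two heuristics for prioritizing variables once importance scores are available, (i) simple ranking by relative influence and (ii) clustering variables whose relative influence falls within a given range. The sketch below shows one possible reading of those two steps in plain Python. It is not the authors' implementation: the function names, the normalization to percentages, the `band_width` rule for forming clusters, and every numeric score are assumptions made for illustration only. The scores are invented and are not results from the paper.

```python
from typing import Dict, List, Tuple


def relative_influence(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale raw importance scores so they sum to 100 (percent)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: 100.0 * score / total for name, score in raw.items()}


def rank_variables(rel: Dict[str, float]) -> List[Tuple[str, float]]:
    """Heuristic (i): order variables from most to least influential."""
    return sorted(rel.items(), key=lambda item: item[1], reverse=True)


def cluster_by_range(rel: Dict[str, float], band_width: float) -> List[List[str]]:
    """Heuristic (ii), as assumed here: walk down the ranking and open a new
    cluster whenever a variable lies more than band_width percentage points
    below the first (highest) variable of the current cluster."""
    clusters: List[List[str]] = []
    cluster_top = None
    for name, value in rank_variables(rel):
        if cluster_top is None or cluster_top - value > band_width:
            clusters.append([name])
            cluster_top = value
        else:
            clusters[-1].append(name)
    return clusters


# Invented raw importance scores (hypothetical, NOT taken from the paper).
# "other_variable" is a placeholder name, not an HSM variable.
raw_scores = {
    "traffic_volume": 50.0,
    "roadside_object_density": 20.0,
    "minor_commercial_driveway_density": 15.0,
    "minor_residential_driveway_density": 10.0,
    "other_variable": 5.0,
}

rel = relative_influence(raw_scores)
ranking = rank_variables(rel)
clusters = cluster_by_range(rel, band_width=10.0)

for tier, members in enumerate(clusters, start=1):
    print(f"priority tier {tier}: {', '.join(members)}")
```

In practice the raw scores would come from a fitted random forest model rather than a hand-written dictionary, and the width of the range would be a judgment call made by the analyst.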