The Empirical Bayes Method for Safety Estimation
Doug Harwood
MRIGlobal
Kansas City, MO

Key Reference
Hauer, E., D.W. Harwood, F.M. Council, and M.S. Griffith. “The Empirical Bayes method for estimating safety: A tutorial.” Transportation Research Record 1784, pp. 126-131. National Academies Press, Washington, D.C., 2002. http://www.ctre.iastate.edu/educweb/CE552/docs/Bayes_tutor_hauer.pdf

The Problem
You are a safety engineer for a highway agency. Next year, the agency plans to implement a countermeasure that is expected to reduce crashes by 35% over the following three years. To estimate the benefits of this countermeasure, what safety measure will you multiply by 0.35?

What Do We Need To Know?
You need to know (or, rather, estimate) what would be expected to happen in the future if no action is taken.
Then, you can apply crash modification factors (CMFs) for the known effects of planned actions to estimate those effects quantitatively.

Common Approach: Use Last 3 Years of Crash Data
Year                2008  2009  2010
Observed crashes      30    19    21

More Data Gives a Different Result
Year                2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
Observed crashes      22    23    16    16     9    14    17    30    19    21

RTM Example with Average Observed Crashes
[Figure: Crashes per year (0 to 7) vs. year (1993 to 2005), showing the 3-year average (Xa), the long-term average (m), and random error.]

“True Safety Impact of a Measure”
[Figure: Crashes per year (0 to 7) vs. year (1993 to 2007), showing the 3-year average ‘before’ (Xa), the long-term average (m), the 3-year average ‘after’, the observed safety effect, and the true safety effect.]

Regression to the mean problem…
High-crash locations are chosen for one reason (a high number of crashes!)
– The crash count might be truly high, or it might be just random variation.
Even with no treatment, we would expect this high crash frequency to decrease, on average.
This needs to be accounted for, but often is not; e.g., crash reductions after treatment are often reported by comparing before and after frequencies over short periods.

The “imprecision” problem…
Assume 100 crashes per year and 3 years of data (300 crashes). We can reliably estimate the number of crashes per year, with a (Poisson) standard deviation of about $\sqrt{300}/3 \approx 5.8$ crashes per year, or about 5.8% of the mean.
However, if there are relatively few crashes per time period (say, 1 crash per 10 years, or 0.3 crashes expected in 3 years), the estimate varies greatly: the standard deviation is about $\sqrt{0.3}/3 \approx 0.18$ crashes per year, or about 180% of the mean!

Things change…
BEWARE of assuming that everything will remain the same.
Future conditions will not be identical to past conditions.
In particular, traffic volumes will likely change.
Past trends can help forecast future volume changes.

Focus on Crash Frequency vs. AADT Relationships: Use of Crash Rates May Be Misleading
[Figure: Crash frequency (0 to 30) vs. AADT (0 to 30,000) for sites F1, F2, F3, R1, C1, C2, E1, and E2, with before and after conditions marked.]

The Empirical Bayes Approach
Empirical Bayes: an approach to estimating what crashes will occur in the future if no countermeasure is implemented (or what would have occurred had no countermeasure been implemented).
Simply assuming that what occurred in a recent short-term “before period” will happen again in the future is naïve and potentially very inaccurate.
Yet this assumption has been the norm for many years.

The Empirical Bayes Approach
The observed crash history for the site being analyzed is one useful and important piece of information.
What other information do we have available?
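To make the regression-to-the-mean and imprecision points concrete, here is a minimal Python sketch that reuses the 2001 to 2010 crash counts tabulated earlier and the Poisson assumption stated above. The helper names (`mean_per_year`, `poisson_cv`) are illustrative only and do not come from any safety software.

```python
import math

# Observed crashes by year, from the "More Data Gives a Different Result" table.
observed = {
    2001: 22, 2002: 23, 2003: 16, 2004: 16, 2005: 9,
    2006: 14, 2007: 17, 2008: 30, 2009: 19, 2010: 21,
}


def mean_per_year(counts):
    """Simple average crash frequency (crashes/year)."""
    return sum(counts) / len(counts)


def poisson_cv(rate_per_year, years):
    """Relative standard deviation (SD / mean) of an annual crash-frequency
    estimate, assuming the total count over `years` is Poisson distributed.
    SD and mean of the total are both divided by `years`, so the ratio is
    sqrt(N) / N = 1 / sqrt(N), where N is the expected total count."""
    expected_total = rate_per_year * years
    return math.sqrt(expected_total) / expected_total


short_avg = mean_per_year([observed[y] for y in (2008, 2009, 2010)])  # 70 crashes / 3 years
long_avg = mean_per_year(list(observed.values()))                     # 187 crashes / 10 years

print(f"3-year average:  {short_avg:.1f} crashes/year")
print(f"10-year average: {long_avg:.1f} crashes/year")
print(f"Relative SD, 100 crashes/year, 3 years: {poisson_cv(100, 3):.1%}")
print(f"Relative SD, 1 crash per 10 years, 3 years: {poisson_cv(0.1, 3):.0%}")
```

The recent 3-year average (about 23.3 crashes per year) sits well above the 10-year average (18.7), which is the pattern regression to the mean warns about, and the relative standard deviation grows sharply as the expected count shrinks.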
The Empirical Bayes Approach
We know the short-term crash history for the site.
The long-term average crash history for that site would be even better, BUT…
– Long-term crash records may not be available.
– If the average crash frequency is low, even the long-term average crash frequency may be imprecise.
– Geometrics, traffic control, lane use, and other site conditions change over time.
We can get the crash history for other similar sites, referred to as a REFERENCE GROUP.

Empirical Bayes
– Increases precision
– Reduces RTM bias
– Uses information from the site, plus information from other, similar sites

Safety Performance Functions
SPF = mathematical relationship between crash frequency per unit of time (and road length) and traffic volume (AADT)
[Figure: Example SPF, crash frequency (0 to 30) vs. AADT (0 to 30,000).]

How Are SPFs Derived?
– SPFs are developed using negative binomial regression analysis.
– SPFs are based on several years of crash data.
– SPFs are specific to a given reference group of sites and severity level:
  – Different road types = different SPFs
  – Different severity levels = different SPFs

The Overdispersion Parameter
The negative binomial is a generalization of the Poisson in which the variance is larger than the mean (overdispersed).
The “standard deviation-type” parameter of the negative binomial is the overdispersion parameter, $\phi$:
$\mathrm{Var} = \eta\left[1 + \frac{\eta}{\phi L}\right]$
where
$\mu$ = average crashes per km-year (or per year for intersections)
$\eta = \mu Y L$ (or $\mu Y$ for intersections) = expected number of crashes over the $Y$-year period
$Y$ = number of years; $L$ = segment length (km)
$\phi$ = estimated by the regression (its units must be consistent with $L$; for intersections, $L$ is taken as 1)

SPF Example
Regression model for total crashes at rural four-leg intersections with minor-road STOP control:
$N_p = \exp(-8.69 + 0.65 \ln ADT_1 + 0.47 \ln ADT_2)$
where
$N_p$ = predicted number of intersection-related crashes per year within 250 ft of the intersection
$ADT_1$ = major-road traffic flow (veh/day)
$ADT_2$ = minor-road traffic flow (veh/day)

Calculating the Long-Term Average Expected Crash Frequency
The estimate
of expected crash frequency, $N_e$, is a weighted average of the predicted and observed crash frequencies:
$N_e = w\,N_p + (1 - w)\,N_o$
where
$N_e$ = expected crash frequency
$N_p$ = predicted crash frequency (from the SPF)
$N_o$ = observed crash frequency
The weight ($w$; $0 < w < 1$) is calculated from the overdispersion parameter.

Weight (w) Used in EB Computations
$w = \frac{1}{1 + k\,N_p}$
where
$w$ = weight
$k$ = overdispersion parameter for the SPF
$N_p$ = predicted crash frequency for the site (over the same period as $N_o$)

Graphical Representation of the EB Method
[Figure: Graphical representation of the EB method.]

Predicting Future Safety Levels from Past Safety Performance
$N_e(\text{future}) = N_e(\text{past}) \times \frac{N_p(\text{future})}{N_p(\text{past})}$
where
$N_e$ = expected crash frequency
$N_p$ = predicted crash frequency
The $N_p(\text{future})/N_p(\text{past})$ ratio can reflect changes in:
– traffic volume
– countermeasures (based on CMFs)

CMFs: How to Use Them
CMFs are expressed as a decimal factor:
– A CMF of 0.80 indicates a 20% crash reduction.
– A CMF of 1.20 indicates a 20% crash increase.
Expected crash frequencies and CMFs can be multiplied together:
$N_e(\text{with}) = N_e(\text{without}) \times CMF$
$\text{Crashes reduced} = N_e(\text{without}) - N_e(\text{with})$

CMFs: Single Factor
CMF for shoulder rumble strips on rural freeways ($CMF_{TOT} = 0.79$):
$N_e(\text{with}) = N_e(\text{without}) \times 0.79$

CMF Functions
[Figure: CMFs for lane width on two-lane rural roads (Harwood et al., 2000).]

CMFs for Combined Countermeasures
CMFs can be multiplied together if their effects are independent:
$N_e(\text{with}) = N_e(\text{without}) \times CMF_1 \times CMF_2$
Are countermeasure effects independent?
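The calculation chain above (SPF prediction, EB weight, EB estimate, projection to future conditions, and CMF application) can be sketched in Python as follows. This is a minimal illustration, not the HSM, IHSDM, or Safety Analyst implementation: the overdispersion parameter k = 0.24, the traffic volumes, and the observed crash count are hypothetical values chosen only for illustration, and the sketch assumes $N_p$ and $N_o$ are both totals over the same 3-year before period. The CMF of 0.65 corresponds to the 35% reduction in the opening problem.

```python
import math


def predicted_crashes_per_year(adt_major: float, adt_minor: float) -> float:
    """SPF example from the slides: total crashes/year at rural four-leg,
    minor-road STOP-controlled intersections."""
    return math.exp(-8.69 + 0.65 * math.log(adt_major) + 0.47 * math.log(adt_minor))


def eb_weight(np_period: float, k: float) -> float:
    """w = 1 / (1 + k * Np), with Np summed over the before period."""
    return 1.0 / (1.0 + k * np_period)


def eb_expected(np_period: float, no_period: float, k: float) -> float:
    """Ne = w * Np + (1 - w) * No, with Np and No over the same period."""
    w = eb_weight(np_period, k)
    return w * np_period + (1.0 - w) * no_period


def project_to_future(ne_past: float, np_past: float, np_future: float) -> float:
    """Ne(future) = Ne(past) * Np(future) / Np(past)."""
    return ne_past * (np_future / np_past)


def apply_cmfs(ne_without: float, *cmfs: float) -> float:
    """Ne(with) = Ne(without) * CMF1 * CMF2 * ... (assumes independent effects)."""
    ne_with = ne_without
    for cmf in cmfs:
        ne_with *= cmf
    return ne_with


# --- Hypothetical inputs, for illustration only ---
YEARS = 3                  # before and future periods both assumed to be 3 years long
K = 0.24                   # hypothetical overdispersion parameter for this SPF
OBSERVED_BEFORE = 9        # hypothetical crashes observed in the 3 before years
ADT_MAJOR_PAST, ADT_MINOR = 8000, 1500
ADT_MAJOR_FUTURE = 9000    # hypothetical traffic growth on the major road

np_past = YEARS * predicted_crashes_per_year(ADT_MAJOR_PAST, ADT_MINOR)
np_future = YEARS * predicted_crashes_per_year(ADT_MAJOR_FUTURE, ADT_MINOR)

ne_past = eb_expected(np_past, OBSERVED_BEFORE, K)
ne_future_without = project_to_future(ne_past, np_past, np_future)
ne_future_with = apply_cmfs(ne_future_without, 0.65)  # a 35% crash reduction
crashes_reduced = ne_future_without - ne_future_with

print(f"Np(past) = {np_past:.2f}, w = {eb_weight(np_past, K):.2f}, Ne(past) = {ne_past:.2f}")
print(f"Ne(future, without countermeasure) = {ne_future_without:.2f}")
print(f"Crashes reduced over {YEARS} years = {crashes_reduced:.2f}")
```

In this framing, the 0.35 from the opening problem multiplies the EB-based expected future crash frequency without the countermeasure, rather than the observed 3-year crash count.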
EB Applications
– HSM (Highway Safety Manual)
– IHSDM (Interactive Highway Safety Design Model)
– Safety Analyst

EB Applications
HSM Part C
– Estimate long-term expected crash frequency for a location under current conditions
– Estimate long-term expected crash frequency for a location under future conditions
– Estimate long-term expected crash frequency for a location under future conditions with one or more countermeasures in place
HSM Part B
– Evaluate countermeasure effectiveness using before and after data

EB Applications
Site-Specific EB Method
– Based on the equations in this presentation
Project-Level EB Method
– If a project is made up of components with different SPFs, there is no single value of k, the overdispersion parameter
EB Before-After Effectiveness Evaluation
– See Chapter 9 in HSM Part B

Questions?