Workshop Objectives:
a. Learn how to fit an SPF to data
b. Understand what SPFs can and cannot do

CH1. What is what
CH2. A simple SPF
CH3. EDA
CH4. Curve fitting
CH5. A first SPF
CH6. Which fit is fitter
CH7. Choosing the objective function
CH8. Theoretical stuff (skip)
CH9. Adding variables
CH10. Choosing a model equation

SPF workshop UBCO February 2014

What is what.
1. What are SPFs?
2. What information do (should) they give us?
3. What is that information used for?

Loosely speaking, SPFs are tools that give information about the safety of units such as road segments, intersections, ramps, grade crossings …
What is this?

What is Safety?
Here is a count of injury accidents for a Freeway Segment in Colorado. What is its SAFETY?
Here is a (monthly) count of accidents for an Intersection in Toronto. What is its SAFETY?
[Figures: segment of urban freeway in Denver; intersection in Toronto]

… “what is its safety?” implies that SAFETY is a property of UNITS.
What is a ‘Unit’?
A Unit can be a road segment, an intersection, Mr. C.J. Smith, heavy trucks on the 401, etc.

What is the Safety of a unit?
Had I defined Safety = Accident Counts, that would mean that safety improved from 1986 to 1987, deteriorated from 1987 to 1988, etc.
Such a definition is not useful for safety management because under it safety changes even if there is no change in safety-relevant traits (exposure, traffic control, physical features, user demography, etc.).
[Figure: 1.9 mile long segment of 6-lane urban freeway in Denver, Colorado]

We need a definition of the safety of a unit such that, as long as the ‘safety-relevant’ traits of the unit do not change, its ‘safety’ does not change.
[Figures: three-period running averages, Freeway Segment, Colorado; thirteen-period running averages, Intersection, Toronto]
One can rightly imagine that behind the fluctuations there is a gradually changing safety property that is some kind of average.

[Figure: thirteen-period running averages, Intersection, Toronto]
There are three elements in the graph:
1. Observed values (●)
2. The invisible (unknown) safety property (μ)
3. Our estimate of the unknown property (○)

What is the ‘safety of a unit’?
We are now ready.
Definition: The safety property of a unit is the number of accidents by type and severity, expected to occur on it in a specified period of time.
It will always be denoted by μ and its estimate by μ̂.
[Table: μ by accident type (Rear-end, Angle, Single-vehicle, Pedestrian) and accident severity (PDO, Injury, Fatal); values as extracted: 3.10 1.40 0.30 1.70 0.90 0.10 0.20 0.10 0.02 0.05 0.03]

The ‘safety’ of a unit depends on its ‘traits’
We are gradually assembling the elements needed to say with clarity what an SPF is. Eventually it will be a function of ‘variables’. What is the link between safety and variables?

Traits & Safety

S-R traits
Definition: A trait is ‘safety-related’ if, when it changes, μ changes.
Consequence: Units with the same s-r traits have the same μ.
Corollary: Units that differ in some s-r traits differ in their μ’s.

Populations
Units that share some traits form a population of units.
Example: (1) rural, (2) two-lane road segments in (3) flat terrain of (4) Colorado.
Because only some traits are common, the units differ in many s-r traits and therefore differ in their μ’s.
We will describe the safety of a population by:
Mean of μ’s, E{μ}, and
Standard deviation of μ’s, σ{μ}

Populations: real and imagined
Example: segments of rural two-lane roads in Colorado form a population.
Their shared traits are: (1) State: Colorado, (2) Road Type: two-lane, (3) Setting: rural.
A new population (subset): (1) & (2) & (3) & (4) Terrain: flat.

The more traits the fewer units.
Colorado data:
(1) & (2) & (3): 5323 segments
Their shared traits are: (1) State: Colorado, (2) Road Type: two-lane, (3) Setting: rural.
Add 2.5 < Segment Length < 3.5 miles: 597 segments
Add 1000 < AADT < 2000 vpd: 119 segments
If the bin is 2400 < AADT < 2420 there are no units, even in this rich data. But the SPF will still provide an estimate of E{μ} for a population, albeit an ‘imagined’ one.

Finally: “What is an SPF?”
A Safety Performance Function is a tool which for a multitude of populations provides estimates of:
1. The mean of the μ’s in populations - E{μ} and
2. The standard deviation of the μ’s in these populations - σ{μ}.

Notational conventions to remember
μ - the expected number of crashes for a unit.
μ̂ - estimate of μ. Caret above always means: estimate of ...
E{μ} - average of μ’s in a population of units. E{.} always means ‘average or expected value of whatever the dot stands for.’
σ{μ} - standard deviation of μ’s in a population of units. σ{.} always means standard deviation of whatever the dot stands for.

The information we get from an SPF is not about units; it is always about a population of units.
When we use the SPF information to estimate the safety of a specific unit we argue as follows:
“This unit has the same traits as the units in the population.
Therefore my best guess of its μ is E{μ}.”

In interim summary
We needed to be clear about what an SPF is.
To get there we had to say what we mean by ‘safety of a unit’ and that it depends on its safety-relevant traits.
Further, we had to mention that units that share some safety-relevant traits form populations of units.
The safety of a population of units can be described by E{μ} and σ{μ}.
These are necessary for practical applications.
An SPF provides estimates of E{μ} and σ{μ} for many populations.

What are Ê{μ} and σ̂{μ} needed for?
Two groups of applications:
Group I: We really need the E{μ}. Examples:
(a) To judge what is deviant we have to know what is ‘normal’.
(b) How different are the E{μ}’s of segments with and without (say) paved shoulders?
Group II: We really need the μ of a specific unit and E{μ} helps us to estimate it. Examples:
(a) Is this road segment a ‘blackspot’?
(b) How did the μ of this unit change from ‘before’ treatment to ‘after’ treatment?

Group I: We need the E{μ} of a population
  What is normal for a unit?
  How different are the means of two populations?
  To answer: Ê{μ} and σ̂{Ê{μ}}
Group II: We need the μ of a unit
  Is this unit a ‘blackspot’?
  What might be the safety benefit of treating it?
  What was the safety benefit of treating it?
  To answer: Ê{μ}, σ̂{Ê{μ}} and σ̂{μ}

Is there a Group III?
Some believe that we want to know the function linking E{μ} and traits in order to be able to say how a change in the level of a trait will affect the E{μ} of units.
Opinions differ on whether such a use of an SPF can be trusted. I do not think so, and will give my reasons in Session 5.
I hope that by the end of the workshop there will be more CMF skeptics.

What are Ê{μ} and σ̂{μ} used for?
A sequence of simple illustrations.
1. How many units are deviant?
2. How well will my screen work?
3. What will be the accident savings of a treatment?
4. How effective was the treatment?
Go to ‘Spreadsheets to accompany PowerPoints.’ Open Spreadsheet #1, ‘Connecticut Drivers’, on the ‘1. Data’ workbook.

Preliminaries: Get Ê{μ} and σ̂{μ}
Data: Connecticut drivers (1931-1936)

Crashes, k      0      1     2    3    4   5   6  7   Total
Drivers, n(k)   23881  4503  936  160  33  14  3  1   29531

Open workbook 2. ‘Mean and variance estimates’ (of #1)

A     B       C        D       E
k     n(k)    B/B$11   A*C     (A-D$11)^2*C
0     23881   0.8087   0.000   0.047
1     4503    0.1525   0.152   0.088
2     936     0.0317   0.063   0.098
3     160     0.0054   0.016   0.041
4     33      0.0011   0.004   0.016
5     14      0.0005   0.002   0.011
6     3       0.0001   0.001   0.003
7     1       0.0000   0.000   0.002
Sum   29531            0.240   0.306

Computing sample mean and variance: the sum of column D is the sample mean (0.240) and the sum of column E is the sample variance (0.306).

Stay on workbook 2. ‘Mean and variance estimates’ (of #1)
For Poisson-distributed counts the variance of the counts exceeds their mean by V{μ}, so the estimate of V{μ} is 0.306 - 0.240 = 0.066 and σ̂{μ} = √0.066 ≈ 0.26.
Naturally σ{μ} > 0. Even if we used age, gender and exposure as traits, there still would be differences.

Use Ê{μ} and σ̂{μ} for: Screening.
Question: What % of these drivers have a μ that is, say, more than 5 times the mean? (μ > 5*0.24 = 1.2 acc. in six years)
Open workbook 3. ‘How many High mu drivers’ (of #1)
GAMMADIST(μ, b, 1/a, TRUE)

[Figure: P(μ<1.20)]
Answer:
1. Assume that the μ’s are Gamma distributed.
2. Compute the parameters a and b of the Gamma distribution.
3. Use the Excel function GAMMADIST(μ, b, 1/a, TRUE).
4. P(μ<1.20) = 0.99
5. There are ≈ 29,531*0.01 = 295 such (5×) drivers.

Use Ê{μ} and σ̂{μ} for: Screen Performance
Question: If we decide to ‘treat’ those 51 (out of 29,531) who had 4 or more accidents, how will such a screen do?
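The worksheet steps for the screening illustration (sample mean and variance, V{μ}, the Gamma parameters, and the GAMMADIST call) can be sketched outside Excel. This is a minimal sketch, not the workshop's spreadsheet: it assumes Poisson counts, so that V{μ} is estimated as sample variance minus sample mean, and it assumes the method-of-moments relations a = Ê{μ}/V̂{μ} and b = Ê{μ}²/V̂{μ}, which the slides do not spell out. The function names `moment_estimates` and `gamma_cdf` are hypothetical.

```python
import math

# Connecticut drivers (1931-1936): crashes k -> number of drivers n(k), from the slides.
counts = {0: 23881, 1: 4503, 2: 936, 3: 160, 4: 33, 5: 14, 6: 3, 7: 1}

def moment_estimates(counts):
    """Return (mean, variance) of the crash counts, weighting each k by n(k)."""
    n = sum(counts.values())
    mean = sum(k * nk for k, nk in counts.items()) / n
    var = sum((k - mean) ** 2 * nk for k, nk in counts.items()) / n
    return mean, var

def gamma_cdf(x, shape, rate):
    """P(mu <= x) for a Gamma(shape, rate) variable, using the power series of
    the regularized lower incomplete gamma function (stand-in for GAMMADIST)."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

mean, var = moment_estimates(counts)
v_mu = var - mean              # assumption: Poisson counts, Var(k) = E{mu} + V{mu}
sigma_mu = math.sqrt(v_mu)
a = mean / v_mu                # assumed method-of-moments rate parameter
b = mean ** 2 / v_mu           # assumed method-of-moments shape parameter
threshold = 5 * mean           # 'high mu' means more than 5 times the mean
p_below = gamma_cdf(threshold, b, a)

print(f"mean={mean:.3f} variance={var:.3f} sigma_mu={sigma_mu:.2f}")
print(f"a={a:.2f} b={b:.2f} P(mu<{threshold:.2f})={p_below:.3f}")
```

Because the sketch works from unrounded moments, its a and b differ slightly from the rounded a = 3.55 and b = 0.85 used in the worked examples, and the implied number of high-μ drivers differs a little from the 295 obtained with the rounded 0.99.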
To answer the screen performance question we have to determine how many of those drivers with 4, 5, 6 or 7 crashes are truly ‘high μ’.
Open workbook 4. ‘Gamma with k=4, 5, 6, 7’ (of #1)
If in a population of units μ is Gamma distributed, then the μ’s of those units with k crashes are also Gamma distributed (EB), with b replaced by k+b and a replaced by a+1.

Modify the formula in B7 and copy down.
First answer: Amongst those who recorded 4 crashes, 66% have μ<1.2.
Do the same for k=5, 6, and 7. Record.

Use Ê{μ} and σ̂{μ} for: Screen Performance

k      n(k)   P(μ≤1.2)   False Positives   Correct Positives
4      33     0.66       22                11
5      14     0.49       7                 7
6      3      0.33       1                 2
7      1      0.20       0                 1
Sums   51                30                21

Answer: Of the 295 drivers with μ>1.2, 21 are correctly identified and the remaining 274 are missed; 30 of the 51 drivers flagged by the screen are incorrectly identified (false positives).

Use Ê{μ} and σ̂{μ} for: Anticipating benefit
Preliminaries:
CMF ≡ (expected accidents ‘with’) / (expected accidents ‘without’)
Reduction in accidents = μ(1-CMF)
Question: How many accidents will be saved if a treatment with CMF=0.95 is administered to Connecticut drivers with k≥4?

Recall that (EB): E{μ|k} = (k+b)/(a+1)
Thus, e.g., for k=4, (4+0.85)/(3.55+1) = 1.07 crashes.
Open workbook 5. ‘Anticipating benefit’ workpage (of #1)

k     n(k)   (k+b)/(a+1)   n(k)*(k+b)/(a+1)
4     33     1.07          35.2
5     14     1.29          18.0
6     3      1.51          4.5
7     1      1.73          1.7
Sum                        59.4

Expected reduction = 59.4×(1-0.95) = 2.97 acc. in six years.

Use Ê{μ} and σ̂{μ} for: Research about CMF
The 51 drivers with k≥4 received some treatment.
Question: If the treatment had no effect, and nothing else changed, how many crashes are they expected to have in a 6-year ‘after treatment’ period?
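The EB arithmetic behind the screen-performance and anticipating-benefit slides can be reproduced in a few lines. This is a minimal sketch under the slides' own numbers (a = 3.55, b = 0.85, threshold 1.2, CMF = 0.95). It assumes that the μ's of drivers with k crashes follow a Gamma distribution with shape k+b and rate a+1, which is what E{μ|k} = (k+b)/(a+1) implies. The helper `gamma_cdf` is a hypothetical stand-in for Excel's GAMMADIST.

```python
import math

# Values taken from the slides.
a, b = 3.55, 0.85                      # Gamma rate and shape for the population of drivers
flagged = {4: 33, 5: 14, 6: 3, 7: 1}   # k -> n(k) for the drivers with k >= 4
threshold = 1.2                        # 'high mu' means mu > 1.2 accidents in six years
cmf = 0.95

def gamma_cdf(x, shape, rate):
    """P(mu <= x) for a Gamma(shape, rate) variable, using the power series of
    the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

rows = []
for k, n_k in flagged.items():
    post_shape, post_rate = k + b, a + 1       # assumed EB posterior given k crashes
    p_low = gamma_cdf(threshold, post_shape, post_rate)
    eb_mean = post_shape / post_rate           # E{mu|k} = (k+b)/(a+1)
    rows.append((k, n_k, p_low, eb_mean))

false_pos = sum(n_k * p_low for _, n_k, p_low, _ in rows)
correct_pos = sum(n_k * (1 - p_low) for _, n_k, p_low, _ in rows)
expected_total = sum(n_k * eb_mean for _, n_k, _, eb_mean in rows)
expected_saving = expected_total * (1 - cmf)

for k, n_k, p_low, eb_mean in rows:
    print(f"k={k} n(k)={n_k} P(mu<=1.2)={p_low:.2f} E(mu|k)={eb_mean:.2f}")
print(f"false positives={false_pos:.1f} correct positives={correct_pos:.1f}")
print(f"expected crashes={expected_total:.1f} expected saving={expected_saving:.2f}")
```

The same `expected_total` is also the number of crashes to expect in an equally long ‘after’ period if a treatment has no effect, since an ineffective treatment leaves each driver's μ unchanged.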
Just as before: E{μ|k} = (k+b)/(a+1)

k     n(k)   (k+b)/(a+1)   n(k)*(k+b)/(a+1)
4     33     1.07          35.2
5     14     1.29          18.0
6     3      1.51          4.5
7     1      1.73          1.7
Sum                        59.4

How come drivers with 227 accidents are expected to have only 59.4?
Before: 4*33+5*14+6*3+7*1 = 227 crashes in six years.
If ineffective, Expected After = 59 crashes in six years.
227-59 = 168: Regression to mean!

Summary of illustrations:
We used estimates of E{μ} and VAR{μ} to:
• Estimate how many deviant units are in a population;
• Estimate how many deviants are in subpopulations of units with many crashes (correct and false positives and negatives);
• Estimate how many crashes will be saved and how many to expect after an ineffective treatment.

Two perspectives on SPF
E{μ} and σ{μ} = f(Traits, parameters)
Applications centered perspective; Cause and effect centered perspective.
The perspective determines how modeling is done.

Applications centered perspective
E{μ} and σ{μ} = f(Traits, parameters)
Here the question is: “How to do modeling to get good estimates of E{μ} and σ{μ}?”

Cause and effect centered perspective
E{μ} and σ{μ} = f(Traits, parameters)
Here the question is: “How to do modeling to get the right ‘f’ and parameters so that I can compute the change in E{μ} caused by a change in a trait?”

Summary of 1.
1. We defined ‘safety’;
2. The safety of a unit is determined by its s-r traits;
3. Units that share some traits form a population;
4. The safety of a population is described by E{μ} and σ{μ};
5. The SPF is ...
A Safety Performance Function is a tool which for a multitude of populations provides estimates of:
  1. The mean of the μ’s in populations - E{μ} and its accuracy;
  2. The standard deviation of the μ’s in these populations - σ{μ}.
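The regression-to-mean illustration in this chapter (227 crashes ‘before’ but only about 59 expected ‘after’ for the drivers with k≥4) can also be checked by simulation. This is a minimal sketch with simulated, not real, drivers. It assumes that μ is Gamma distributed with the slides' a = 3.55 and b = 0.85, that counts are Poisson, and that each driver's μ is the same in both periods. The function names, the seed, and the number of simulated drivers are hypothetical choices.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) count by Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate(n_units, a, b, min_before, rng):
    """Simulate 'before' and 'after' counts with an unchanged mu for each unit.

    Returns (number selected, mean 'before' count, mean 'after' count) for the
    units whose 'before' count is at least min_before.
    """
    selected = before_total = after_total = 0
    for _ in range(n_units):
        mu = rng.gammavariate(b, 1.0 / a)      # shape b, scale 1/a
        before = sample_poisson(mu, rng)
        if before >= min_before:
            selected += 1
            before_total += before
            after_total += sample_poisson(mu, rng)
    return selected, before_total / selected, after_total / selected

rng = random.Random(2014)                      # fixed seed so the run is repeatable
selected, mean_before, mean_after = simulate(200_000, 3.55, 0.85, 4, rng)
print(f"selected={selected} mean before={mean_before:.2f} mean after={mean_after:.2f}")
```

For comparison, the slides' figures correspond to 227/51 ≈ 4.45 crashes per flagged driver ‘before’ and 59.4/51 ≈ 1.16 expected ‘after’. The simulated averages should fall near these values even though no treatment is applied in the simulation.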