5. A first parametric SPF

CH1. What is what
CH2. A simple SPF
CH3. EDA
CH4. Curve fitting
CH5. A first parametric SPF
CH6. Which fit is fitter
CH7. Choosing the objective function
CH8. Theoretical stuff
CH9. Adding variables
CH10. Choosing a model equation

In this session:
1. How to build a C-F spreadsheet;
2. How to estimate σ{μ};
3. Can one get CMFs?
4. How accurate are the parameters?

The one-through approach to modeling:
Assemble data
Postulate a model like: $E\{\mu\} = \beta_0 L e^{\beta_1 X_1} e^{\beta_2 X_2} e^{\beta_3 X_3} \cdots$
Estimate the β’s and check significance
Remove (add) variables, re-estimate parameters
Report

SPF workshop February 2014, UBCO

The snakes and ladders view of modeling

Gradualism - its merits
1. Didactic
2. Model equation emerges gradually
3. Usable model at each stage

Simple model equation
Begin with X1 ≡ Segment Length. Add X2 ≡ AADT, X3 ... later.

The first variable: Segment Length
Which function? Why not linear? EDA hinted at non-linearity.
Possible reasons: Longer segments have fewer driveways/mile, higher speed, are further from trauma centers, …
But all this is unknown. We can choose only by:
parameter parsimony
goodness-of-fit
quality of prediction

If fitting is only by parameter parsimony, goodness-of-fit, and quality of prediction, can it be a source of CMFs?

Recall: Two perspectives on SPF
E{μ} and σ{μ} = f(Traits, parameters)
Applications-centered perspective
Cause-and-effect-centered perspective
The perspective determines how modeling is done. If you want CMFs you do it one way; if you want E{μ} and σ{μ} for use in applications you do it differently.

The cause-effect issue: Illustration 1
Do ‘A’ or ‘B’?
Designer asks: How many crashes on
Tangent of ‘A’ vs. tangent of ‘B’
Curve ‘A’ vs. curve ‘B’
Sum

Suppose we have the model equation $\hat{E}\{\mu\} = \beta_0 X_1^{\beta_1}$ and estimates β0 = 1.67, β1 = 0.86 (as will happen soon).

                 A              B
Tangent length   1 mile         0.5 miles
E{μ}             1.67 crashes   0.92 crashes

For ‘B’: 1.67 × 0.5^0.86 ≈ 0.92.
By choosing ‘B’ we save 1.67 − 0.92 = 0.75 crashes on the tangent.
Trouble: Why, by choosing ‘B’ and eliminating 0.5 miles, do we save 0.75 crashes when on the remaining (identical) 0.5 miles we expect 0.92 crashes?

The explanation …
The $E\{\mu\} = 1.67 X_1^{0.86}$ is a fit to 5323 segments of varying length.
For segments found to be 1 mile long E{μ} = 1.67 crashes;
For segments found to be 0.5 miles long E{μ} = 0.92 crashes.
The reason why E{μ | 1 mile} ≠ 2 × E{μ | 0.5 mile} is that segments found to be 1 mile long differ from segments found to be 0.5 miles long in many traits other than length. (Were it not so, β1 would be 1.)

The question was: Why, by eliminating 0.5 miles, do we save 0.75 crashes if on the remaining (identical) 0.5 miles we expect 0.92 crashes?
The answer: The designer’s 0.5 mile and 1 mile tangents are identical in traits, while the model predicts for segments that differ in traits.
Conclusion: The model cannot be used by the designer.
In general: If we had data about all safety-relevant traits, and if we knew the function by which they combine, then models might be trusted to predict the effect of design changes (manipulations). But, as it is, ….

The cause-effect issue: Illustration 2
Garber & Gadiraju, 1988
This relationship is ‘regular’ and a smooth function could be fitted to it. Does this mean that increasing the average speed on a road reduces the accident rate?
No. Roads in population A differ from roads in population B by many traits, not only in average speed.

Would accounting for many variables help?
Hauer, et al. (2004), “Safety Models for Urban Four-lane Undivided Road Segments.” Transportation Research Record 1897.
After accounting for (1) Traffic flow, (2) Segment Length, (3) Percent trucks, (4) Degree of curve, (5) Lane width, (6) Shoulder traits, (7) Driveways

Why “has the enterprise not been successful”?
Here is how to establish ‘Hooke’s Law’:
Take a spring.
Hang a weight and measure the elongation.
Increase the weight and measure again, …
Plot weight against elongation.
If the plot is a straight line, regress to find the ‘spring constant’.
Your data is experimental. You can predict the effect of weight on elongation.

In contrast, imagine a roomful of springs with different weights. You can measure the length of the springs and their weights, the way you found them. Now your data is observational…
You could still find Hooke’s Law and predict the effect of weight on length if all springs were identical. But if they are not, the task is difficult*, perhaps impossible.
* It would be particularly difficult if, say, heavy weights tended to go with stiff springs.

Opinions differ
My opinion:
In road safety there are no identical springs.
A model equation does not allow one to say: “If I change predictor variable X by ΔX then E{μ} will change by Δμ.”
CMFs obtained from SPFs based on cross-section data cannot be trusted for use in practice.

The first variable
Begin with X1 ≡ Segment Length.

The first C-F Spreadsheet
Open: Spreadsheet #6 ‘OLS without constraint’ on ‘Data’.
Data (AADT used later)

Proceed to ‘Add fitted Values’ worksheet
Model equation: $E\{\mu\} = \beta_0 X_1^{\beta_1}$
1. Add ‘initial guesses’
2. Add formula and copy down

Use this initial guess. Check correspondence graph. OK?
Better initial guess

Complete C-F spreadsheet
1. Add ‘squared difference’ formula and copy down
2. Sum of SD’s (squared differences)
3. Sum of ‘Observed’ and of ‘Fitted’

The four parts of a C-F Spreadsheet
1. Data
2. Parameters
3. Fitted Values
4. Objective Function

Click on ‘Data’ tab and then on ‘Solver’

These make the sum of squared differences smallest

How reliable are parameter estimates?
Scale vs. Shape
Does this mean that E{μ} increases with segment length less than linearly?

Parameter values are uncertain for several reasons:
Statistical inaccuracy:
1. The estimate of β1 (= 0.866) would be different if the accident counts (or, later, the AADT) were a bit different.
Non-statistical inaccuracy:
2. The estimate of β1 will change as new variables are added to the model equation*.
3. The estimate of β1 will depend on the function chosen to represent the new variables and on what objective function is minimized or maximized.
* The ‘omitted variable bias’

Non-statistical inaccuracy (item 2): New variable
When AADT is added, β1 changes to 1.078. When Terrain is added, it changes to 0.986.
β1 will change as other (correlated) variables are added.
In every model there are ‘omitted variables’. Were these in the model, parameter estimates would be different. By how much? We do not know!
Conclusion: Parameter estimates depend on which variables the modeller puts into the model equation.

Non-statistical inaccuracy (item 3): Which objective function?
When may one estimate parameters by OLS? When the variances of observed values are equal (and the distribution is symmetrical).
The problem of unequal variances is easy to correct by WLS. With weighting β1 changes from 0.866 to 0.860.
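How weighting shifts the estimate of β1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workshop's spreadsheet: the ten segments are invented, the weights 1/length reflect an assumed variance model (variance proportional to segment length), and the helper names `profile_b0`, `wssd`, and `fit` are hypothetical. For a fixed β1 the best β0 has a closed form, so the search runs over β1 only.

```python
# Hypothetical data, invented for illustration (not the workshop data set).
lengths = [0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0]  # miles
counts = [0, 1, 0, 1, 2, 1, 4, 2, 6, 5]                       # crashes


def profile_b0(b1, weights):
    """Best b0 for a fixed b1 under weighted least squares (closed form)."""
    num = sum(w * y * x ** b1 for w, x, y in zip(weights, lengths, counts))
    den = sum(w * x ** (2 * b1) for w, x in zip(weights, lengths))
    return num / den


def wssd(b0, b1, weights):
    """Weighted sum of squared differences between observed and fitted."""
    return sum(w * (y - b0 * x ** b1) ** 2
               for w, x, y in zip(weights, lengths, counts))


def fit(weights, lo=0.0, hi=3.0, iters=60):
    """Golden-section search over b1, with b0 profiled out at each step."""
    def f(b1):
        return wssd(profile_b0(b1, weights), b1, weights)

    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    b1 = (a + b) / 2
    return profile_b0(b1, weights), b1


ols_w = [1.0] * len(lengths)         # OLS: every row weighted equally
wls_w = [1.0 / x for x in lengths]   # WLS: assumed variance proportional to length

b0_ols, b1_ols = fit(ols_w)
b0_wls, b1_wls = fit(wls_w)
print("OLS:", round(b0_ols, 3), round(b1_ols, 3))
print("WLS:", round(b0_wls, 3), round(b1_wls, 3))
```

The two runs use the same data and the same model equation; only the objective function differs, and so do the estimates.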
Method                            β1
Conventional
  OLS                             0.866
  Poisson Likelihood              0.860
  Negative Binomial Likelihood    0.871
Unconventional
  Absolute Differences            0.911
  χ2                              0.737
  Total Absolute Bias             0.882

Conclusion: Parameter estimates depend on what the modeller chooses to be the objective function.

[Figure labels: χ2; +AADT; AADT & Terrain; Reported!]

Modelers report only on the ‘statistical accuracy’. As if:
a. There were no omitted variables;
b. The functional form was the right one;
c. The objective function used was the only choice.
All are untrue to some extent.
Conclusion: The estimated parameters are always less accurate than what is reported; by how much less accurate cannot be said.

Implications
For parameter-based CMFs;
For accuracy of model prediction when predicated on accuracy of model parameters;
One can begin to trust them after they stop changing.

Estimating σ{μ}
SPF ... many populations ... estimates of E{μ} and σ{μ}
A general method
[Figure: N-W (non-parametric) slope ≅ 0.5; OLS slope ≅ 0.7]

Summary for section 5. (A first parametric SPF)
1. The merits of gradualism;
2. There are no grounds for choosing a model equation other than parsimony and goodness of fit;
3. Detour. Can SPFs be used to determine the effect of change?
4. The four parts of a C-F spreadsheet;
5. How the C-F spreadsheet is used to estimate model equation parameters;
6. How accurate are the parameters? What is reported is an exaggeration;
7. How to estimate σ{μ}.
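The four parts of a C-F spreadsheet (data, parameters, fitted values, objective function) map directly onto a few lines of Python. The sketch below is an illustration under stated assumptions, not a copy of Spreadsheet #6: the segment lengths are invented, the 'observed' values are generated noise-free from 1.67 × X1^0.86 so that the fit has a known answer, and `solve` is a hypothetical stand-in for Solver (a simple compass search), not the algorithm Solver actually uses.

```python
# Part 1: Data. Hypothetical segment lengths (miles); the "observed" column is
# generated without noise from E{mu} = 1.67 * X1**0.86 (an assumption made so
# that the correct answer is known in advance).
lengths = [0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
observed = [1.67 * x ** 0.86 for x in lengths]

# Part 2: Parameters, starting from initial guesses.
beta0_guess, beta1_guess = 1.0, 1.0


def fitted_values(b0, b1):
    """Part 3: Fitted values, one per data row (the 'copy down' formula)."""
    return [b0 * x ** b1 for x in lengths]


def objective(b0, b1):
    """Part 4: Objective function, the sum of squared differences."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted_values(b0, b1)))


def solve(b0, b1, step=0.5, tol=1e-10):
    """Stand-in for Solver: move one parameter at a time by +/- step,
    keep any move that lowers the objective, halve the step when none does."""
    best = objective(b0, b1)
    while step > tol:
        improved = False
        for d0, d1 in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = objective(b0 + d0, b1 + d1)
            if trial < best:
                b0, b1, best = b0 + d0, b1 + d1, trial
                improved = True
        if not improved:
            step /= 2
    return b0, b1, best


b0_hat, b1_hat, ssd = solve(beta0_guess, beta1_guess)
sum_observed = sum(observed)
sum_fitted = sum(fitted_values(b0_hat, b1_hat))
print("beta0 =", round(b0_hat, 4), "beta1 =", round(b1_hat, 4))
print("sum observed =", round(sum_observed, 4), "sum fitted =", round(sum_fitted, 4))
```

Replacing `objective` with another function of the observed and fitted columns (for example a weighted sum of squares or a likelihood) changes the objective function without touching the other three parts.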