5. A first parametric SPF
CH1. What is what
CH2. A simple SPF
CH3. EDA
CH4. Curve fitting
CH5. A first parametric SPF
CH6. Which fit is fitter
CH7. Choosing the objective function
CH8. Theoretical stuff
CH9. Adding variables
CH10. Choosing a model equation
In this session:
1. How to build a curve-fitting (C-F) spreadsheet;
2. How to estimate σ{μ};
3. Can one get CMFs?
4. How accurate are the parameters?
The once-through approach to modeling:
Assemble data
Postulate a model like: E{μ} = β0 · L · e^(β1 X1) · e^(β2 X2) · e^(β3 X3) …
Estimate the β’s and check significance
Remove (add) variables,
Re-estimate parameters
Report
SPF workshop February 2014, UBCO
The snakes and ladders view of modeling
Gradualism - its merits
1. Didactic
2. Model equation emerges gradually
3. Usable model at each stage
Simple model equation
Begin with X1≡ Segment Length. Add X2≡ AADT, X3 ... later.
The first variable: Segment Length
Which function?
Why not linear?
EDA hinted at non-linearity
Possible reasons: longer segments have
fewer driveways per mile and higher speeds, and
are further from trauma centers, …
But all this is unknown.
We can choose only by:
parameter parsimony
goodness-of-fit
quality of prediction
If fitting is only by:
parameter parsimony
goodness-of-fit
quality of prediction
can it be a source of CMFs?
Recall:
Two perspectives on SPF
E{μ} and σ{μ} = f(Traits, parameters)
Applications-centered perspective
Cause-and-effect-centered perspective
The perspective determines how modeling is done.
If you want CMFs you do it one way;
If you want E{μ} and σ{μ} for use in applications you
do it differently.
The cause-effect issue: Illustration 1
Do ‘A’ or ‘B’?
Designer asks: How many crashes on
• Tangent of ‘A’ vs. tangent of ‘B’
• Curve ‘A’ vs. curve ‘B’
Sum
Suppose we have the model equation Ê{μ} = β0 X1^β1 ...
and estimates β0 = 1.67, β1 = 0.86 (as will happen soon).

Tangent   Length      E{μ}
A         1 mile      1.67 crashes
B         0.5 miles   0.92 crashes

By choosing ‘B’ we save 1.67 - 0.92 = 0.75 crashes on the tangent.
Trouble
Why, by choosing ‘B’ and eliminating 0.5 miles,
do we save 0.75 crashes when on the remaining
(identical) 0.5 miles we
expect 0.92 crashes?
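The arithmetic behind the tangent comparison can be checked in a few lines of Python. This is only a sketch: the function name expected_crashes is made up for illustration, and the parameter values are the estimates quoted on the slide.

```python
# Illustrative check of the tangent example; the function name is hypothetical.
BETA0 = 1.67  # scale parameter quoted on the slide
BETA1 = 0.86  # shape parameter quoted on the slide


def expected_crashes(length_miles, beta0=BETA0, beta1=BETA1):
    """Fitted E{mu} = beta0 * length ** beta1 for a segment of given length."""
    return beta0 * length_miles ** beta1


tangent_a = expected_crashes(1.0)   # design 'A': 1 mile tangent
tangent_b = expected_crashes(0.5)   # design 'B': 0.5 mile tangent
apparent_saving = tangent_a - tangent_b

print(f"A: {tangent_a:.2f}  B: {tangent_b:.2f}  saving: {apparent_saving:.2f}")
```

The apparent saving (about 0.75) is smaller than the crashes expected on the remaining half mile (about 0.92), which is the puzzle the slide raises.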
The explanation …
The fit E{μ} = 1.67 X1^0.86 is based on 5323 segments of
varying length.
For segments found to be 1 mile long, E{μ} = 1.67 crashes;
for segments found to be 0.5 miles long, E{μ} = 0.92 crashes.
The reason why E{μ | 1 mile} ≠ 2 × E{μ | 0.5 mile}
is that segments found to be 1 mile long differ from
segments found to be 0.5 miles long in many traits other
than length.
(Were it not so, β1 would be 1.)
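The closing remark about β1 can be spelled out. The short derivation below uses only the fitted equation and the two estimates quoted earlier; writing X1 for segment length in miles is the notation assumed here.

```latex
\frac{E\{\mu \mid X_1 = 1\}}{E\{\mu \mid X_1 = 0.5\}}
  = \frac{\beta_0 \cdot 1^{\beta_1}}{\beta_0 \cdot 0.5^{\beta_1}}
  = 2^{\beta_1}
  \approx 2^{0.86}
  \approx 1.82
```

The ratio equals 2 only when β1 = 1, that is, only when expected crashes are proportional to length.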
The question was:
Why, by eliminating 0.5 miles,
do we save 0.75 crashes if on the
remaining (identical) 0.5 miles
we expect 0.92 crashes?
The answer:
The designer’s 0.5-mile and 1-mile tangents are identical in
traits, while the model predicts for segments that differ in
traits.
Conclusion: The model cannot be used by the designer.
In general: If we had data about all safety-relevant traits,
and if we knew the function by which they combine, then
models might be trusted to predict the effect of design
changes (manipulations).
But, as it is, ….
The cause-effect issue: Illustration 2
Garber & Gadiraju, 1988
This relationship is
‘regular’ and a smooth
function could be fitted
to it.
Does this mean that
increasing the average
speed on a road reduces
the accident rate?
No. Roads in population A differ from roads in
population B by many traits, not only in average speed.
Would accounting for many variables help?
Hauer, et al. (2004), “Safety Models for Urban Four-lane Undivided Road Segments.” Transportation Research Record 1897.
After accounting for (1) traffic flow, (2) segment length,
(3) percent trucks, (4) degree of curve, (5) lane width,
(6) shoulder traits, (7) driveways
Why “has the enterprise not been successful”?
Here is how to establish ‘Hooke’s Law’:
Take a spring.
Hang weight and measure elongation
Increase weight and measure again,…
Plot weight against elongation
If straight line, regress to find ‘Spring constant’
Your data is experimental.
You can predict the effect of weight on
elongation.
In contrast, imagine a roomful of
springs with different weights.
You can measure the length of
the springs and their weights, the
way you found them.
Now your data is observational…
You could still find Hooke’s Law and predict the effect
of weight on length if all springs were identical. But if
they are not, the task is difficult*, perhaps impossible.
* It would be particularly difficult if, say, heavy weights
tended to go with stiff springs.
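The roomful-of-springs problem can be made concrete with a toy calculation. Everything in the sketch below is invented for illustration: one spring of stiffness 2.0 for the experiment, and ten different springs whose stiffness is assumed to grow with the weight found hanging on them.

```python
# Toy illustration of experimental vs. observational data (all numbers invented).

def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


weights = [float(w) for w in range(1, 11)]

# Experiment: ONE spring (stiffness 2.0); the weight is manipulated.
# Hooke's Law: elongation = weight / stiffness.
experimental = [w / 2.0 for w in weights]
slope_experiment = ols_slope(weights, experimental)

# Observation: TEN different springs, measured the way they were found.
# Assumption: heavier weights happen to hang on stiffer springs.
stiffness = [1.0 + 0.5 * w for w in weights]
observational = [w / k for w, k in zip(weights, stiffness)]
slope_observed = ols_slope(weights, observational)

# True response of each individual spring to one more unit of weight.
true_responses = [1.0 / k for k in stiffness]

print(f"experimental slope:     {slope_experiment:.3f}")
print(f"observational slope:    {slope_observed:.3f}")
print(f"smallest true response: {min(true_responses):.3f}")
```

In this toy setup the experimental slope recovers the spring's response (1/stiffness), while the slope fitted to the observational data is smaller than the true response of every single spring in the room.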
Opinions differ
My opinion:
• In road safety there are no identical springs.
• A model equation does not allow one to say:
“If I change predictor variable X by ΔX then E{μ}
will change by Δμ.”
• CMFs obtained from SPFs based on cross-section data cannot be trusted for use in practice.
The first variable
Begin with X1≡ Segment Length.
The first C-F Spreadsheet
Open: Spreadsheet #6 ‘OLS without constraint’ on ‘Data’.
Data
(AADT used later)
Proceed to ‘Add fitted Values’ worksheet
1. Add ‘initial guesses’ for β0 and β1
2. Add the formula E{μ} = β0 X1^β1 and copy down
Use this initial guess.
Check correspondence graph.
OK?
Better initial guess
Complete C-F spreadsheet
1. Add ‘squared difference’ (SD) formula and copy down
2. Sum of SD’s
3. Sum of ‘Observed’ and of ‘Fitted’
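For readers who prefer code to cells, the spreadsheet steps can be mimicked with Python lists. This is a sketch under stated assumptions: the five data rows and the initial guesses are invented, not taken from the workshop spreadsheet, and the variable names are hypothetical.

```python
# Sketch of a curve-fitting (C-F) spreadsheet as Python lists.
# The data rows are invented for illustration; they are NOT the workshop data.
segment_length = [0.2, 0.5, 1.0, 1.5, 2.0]   # miles (Data)
observed = [0, 1, 2, 2, 4]                    # crash counts (Data)

beta0, beta1 = 1.5, 1.0                       # initial guesses (Parameters)

# Fitted Values: the model equation "copied down" every row.
fitted = [beta0 * x ** beta1 for x in segment_length]

# Squared difference for every row, then their sum (Objective Function).
squared_diff = [(o - f) ** 2 for o, f in zip(observed, fitted)]
ssd = sum(squared_diff)

# Column totals used as a quick check of the fit.
sum_observed = sum(observed)
sum_fitted = sum(fitted)

print(f"SSD = {ssd:.4f}, observed = {sum_observed}, fitted = {sum_fitted:.2f}")
```

Changing beta0 or beta1 and re-running plays the role of typing a new guess into the parameter cells and watching the sum of squared differences respond.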
The four parts of a C-F Spreadsheet
1. Data
2. Parameters
3. Fitted Values
4. Objective Function
Click on ‘Data’ tab and then on ‘Solver’
These parameter values make the sum of
squared differences smallest
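What Solver does here can be imitated outside Excel. The sketch below does not reproduce Solver's own algorithm; it swaps in a golden-section search over β1, using the fact that for a fixed β1 the best β0 has a closed form. The data are synthetic, generated without noise from β0 = 1.67 and β1 = 0.86, and all function names are hypothetical.

```python
# Minimal stand-in for Excel's Solver on the model E{mu} = beta0 * x**beta1.
# Data are synthetic (generated from beta0=1.67, beta1=0.86, no noise).
import math

x = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
y = [1.67 * xi ** 0.86 for xi in x]


def best_beta0(beta1):
    """For a fixed beta1 the SSD-minimizing beta0 has a closed form."""
    num = sum(yi * xi ** beta1 for xi, yi in zip(x, y))
    den = sum(xi ** (2 * beta1) for xi in x)
    return num / den


def ssd(beta1):
    """Sum of squared differences at the best beta0 for this beta1."""
    b0 = best_beta0(beta1)
    return sum((yi - b0 * xi ** beta1) ** 2 for xi, yi in zip(x, y))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


beta1_hat = golden_section_min(ssd, 0.0, 2.0)
beta0_hat = best_beta0(beta1_hat)
print(f"beta0 = {beta0_hat:.3f}, beta1 = {beta1_hat:.3f}")
```

Because the synthetic data contain no noise, the search should land back on the generating values; with real crash counts the minimum would sit wherever the data put it.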
How reliable are parameter estimates?
Scale vs. Shape
Does this mean that E{μ} increases
with segment length less than linearly?
Parameter values are uncertain for several reasons:
Statistical inaccuracy:
1. The estimate of β1 (= 0.866) would be
different if the accident counts (or,
later, the AADT) were a bit different.
Non-statistical inaccuracy:
2. The estimate of β1 will change as new
variables are added to the model equation*.
3. The estimate of β1 will depend on the
function chosen to represent the new
variables and on what objective function is
minimized or maximized.
* The ‘omitted variable bias’
Non-statistical inaccuracy (item 2): New variable
When AADT is added, β1 will change to 1.078.
When Terrain is added, it will change to 0.986.
β1 will change as other (correlated) variables are added.
In every model there are ‘omitted variables’. Were these in
the model, parameter estimates would be different.
By how much? We do not know!
Conclusion: Parameter estimates depend
on which variables the modeller puts
into the model equation.
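A toy calculation shows the mechanism. The sketch below is not the workshop's fitting procedure: it assumes a log-linear relation with invented coefficients and five invented rows, and uses ordinary least squares on the log scale only because that keeps the arithmetic transparent. AADT is assumed to be lower on longer segments.

```python
# Toy illustration of omitted-variable bias on a log scale (invented numbers).
# Assumed "true" relation: ln(y) = 0.5 + 1.0*ln(length) + 0.8*ln(aadt).

log_length = [-2.0, -1.0, 0.0, 1.0, 2.0]
log_aadt = [10.5, 9.5, 10.0, 9.0, 9.5]   # tends to be lower on long segments
log_y = [0.5 + 1.0 * ll + 0.8 * la for ll, la in zip(log_length, log_aadt)]


def centered(values):
    """Subtract the mean so that the intercept drops out of the algebra."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


len_c, aadt_c, y_c = centered(log_length), centered(log_aadt), centered(log_y)

# Model with length only: simple least-squares slope.
beta1_length_only = dot(len_c, y_c) / dot(len_c, len_c)

# Model with length and AADT: solve the 2x2 normal equations.
s_ll, s_la, s_aa = dot(len_c, len_c), dot(len_c, aadt_c), dot(aadt_c, aadt_c)
s_ly, s_ay = dot(len_c, y_c), dot(aadt_c, y_c)
det = s_ll * s_aa - s_la ** 2
beta1_with_aadt = (s_ly * s_aa - s_la * s_ay) / det

print(f"length only: {beta1_length_only:.3f}, with AADT: {beta1_with_aadt:.3f}")
```

With AADT left out, the length exponent absorbs part of the AADT effect and comes out at 0.8; once AADT is in the model it returns to the assumed 1.0. How far real estimates would move when a still-missing variable is added cannot be computed this way, because that variable is not in the data.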
Non-statistical inaccuracy (item 3): Which objective function?
When may one estimate parameters by OLS?
When the variances of observed values are equal
(and the distribution is symmetrical).
The problem of unequal variances is easy to correct by WLS.
With weighting, β1 changes from 0.866 to 0.860.

Methods are listed from conventional (top) to unconventional (bottom).

Method                          β1
OLS                             0.866
Poisson Likelihood              0.860
Negative Binomial Likelihood    0.871
Absolute Differences            0.911
χ2                              0.737
Total Absolute Bias             0.882

Conclusion: Parameter estimates depend on what
the modeller chooses to be the objective function.
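To see why the choice matters, it helps to compute several objective functions on the same pair of ‘Observed’ and ‘Fitted’ columns. The sketch below uses common textbook forms of four of the listed objectives; these forms are assumptions, since the slides do not spell out the formulas, and the four data rows are invented. Total Absolute Bias and the Negative Binomial likelihood are left out because their definitions are not given here.

```python
# Several objective functions evaluated on the same observed/fitted columns.
# The formulas are common textbook forms, assumed here; the workshop
# spreadsheets may define them differently.
import math


def sum_squared_diff(observed, fitted):
    """OLS objective: to be minimized."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def sum_abs_diff(observed, fitted):
    """Absolute-differences objective: to be minimized."""
    return sum(abs(o - f) for o, f in zip(observed, fitted))


def chi_square(observed, fitted):
    """Chi-square objective: squared differences scaled by fitted values."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))


def poisson_log_likelihood(observed, fitted):
    """Poisson log-likelihood: to be maximized (fitted values must be > 0)."""
    return sum(o * math.log(f) - f - math.lgamma(o + 1)
               for o, f in zip(observed, fitted))


observed = [0, 1, 3, 2]            # invented crash counts
fitted = [0.5, 1.0, 2.0, 2.5]      # invented fitted values

ssd_value = sum_squared_diff(observed, fitted)
abs_value = sum_abs_diff(observed, fitted)
chi_value = chi_square(observed, fitted)
loglik_value = poisson_log_likelihood(observed, fitted)

print(ssd_value, abs_value, chi_value, loglik_value)
```

Each function weighs the same differences in its own way (equally, by absolute size, relative to the fitted value, or through a count distribution), so the parameter values that optimize one need not optimize another.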
[Figure labels: χ2; +AADT; AADT & Terrain; Reported!]
Modelers report only on the ‘statistical accuracy’.
As if:
a. There were no omitted variables;
b. The functional form was the right one;
c. The objective function used was the only choice.
All are untrue to some extent.
Conclusion:
The estimated parameters are always less
accurate than what is reported;
By how much less accurate cannot be said.
Implications
• For parameter-based CMFs;
• For accuracy of model prediction when predicated
on accuracy of model parameters;
• One can begin to trust the parameter estimates after they stop changing.
Estimating σ{μ}
SPF ... many populations ... estimates of E{μ} and σ{μ}
A general method
[Figure annotations: N-W (non-parametric) slope ≈ 0.5; OLS slope ≈ 0.7]
Summary for section 5. (A first parametric SPF)
1. The merits of gradualism;
2. There are no grounds for choosing a model equation
other than parsimony and goodness of fit;
3. Detour. Can SPFs be used to determine the effect of
change?
4. The four parts of a C-F spreadsheet;
5. How the C-F spreadsheet is used to estimate model
equation parameters;
6. How accurate are the parameters? What is reported
is an exaggeration;
7. How to estimate σ{μ}.