Prediction

Confidence Intervals, Cross-validation, and Predictor Selection
Skill Set
• Why is the confidence interval for an individual point larger than for the regression line?
• Describe the steps in forward (backward, stepwise, blockwise, all possible regressions) predictor selection.
• What is cross-validation? Why is it important?
• What are the main problems, as far as R-square and prediction are concerned, with forward (backward, stepwise, blockwise, all possible regressions) predictor selection?
Prediction v. Explanation
• Prediction is important for practice
– WWII pilot training. Candidate predictors included:
• Ability tests, e.g., eye-hand coordination
• Built an airplane that flew
• Fear of heights
• Favorite flavor of ice cream
– Age and driving accidents
• Explanation is crucial for theory. Highly correlated variables may not help predict, but may help explain, e.g., team outcomes as a function of team resources and team backup.
Confidence Intervals
[Figure: data from the partial correlation example. Scatterplot of GPA (1 to 5) against GRE (400 to 700) with two sets of bands: "Confidence Interval for Regression Line (Mean)" and "Confidence Interval for Individual Predicted Values." Note the curved shape of the bands.]

CI for the line, i.e., the mean score:

$$S_{\bar{Y}'} = \sqrt{S_{y.x}^2\left[\frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right]}$$

$$CI = Y' \pm t_{(\alpha/2,\,df)}\,S_{\bar{Y}'}$$
CI for a single person's score:

$$S_{Y'} = \sqrt{S_{y.x}^2\left[1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right]}$$

$$CI = Y' \pm t_{(\alpha/2,\,df)}\,S_{Y'}$$

$S_{y.x}^2$ = MS(residual). N = sample size. The df are those for MS(residual), the variance of the residuals.
Computing Confidence Intervals

Suppose: $N = 20$; $S_{y.x}^2 = 5.983$; $\bar{X} = 3$; $\sum x^2 = 40$; $Y' = 5.05 + .75X$.

Find the CI for the line (mean) at X = 1.

$$S_{\bar{Y}'} = \sqrt{S_{y.x}^2\left[\frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right]} = \sqrt{5.983\left[\frac{1}{20} + \frac{(1-3)^2}{40}\right]} = .947$$

df = N − k − 1 = 20 − 1 − 1 = 18. $Y' = 5.05 + .75(1) = 5.80$

$$CI = Y' \pm t_{(\alpha/2,\,df)}\,S_{\bar{Y}'} = 5.80 \pm (2.101)(.947)$$

CI = 3.81 to 7.79.
For an individual at X = 1, what is the CI?

$$S_{Y'} = \sqrt{S_{y.x}^2\left[1 + \frac{1}{N} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right]} = \sqrt{5.983\left[1 + \frac{1}{20} + \frac{(1-3)^2}{40}\right]} = 2.623$$

$$CI = Y' \pm t_{(\alpha/2,\,df)}\,S_{Y'} = 5.80 \pm (2.101)(2.623)$$

CI = .29 to 11.31.
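To make the arithmetic concrete, here is a minimal Python sketch (variable names are mine; the inputs come from the example above) that reproduces both intervals:

```python
from math import sqrt
from scipy import stats

# Inputs from the worked example
N = 20             # sample size
s2_yx = 5.983      # variance of residuals, S^2_{y.x} (MS residual)
x_bar = 3.0        # mean of X
ss_x = 40.0        # sum of squared deviations of X
a, b = 5.05, 0.75  # regression equation: Y' = 5.05 + .75X

X_i = 1.0
y_hat = a + b * X_i              # predicted value: 5.80
df = N - 1 - 1                   # N - k - 1 with k = 1 predictor
t_crit = stats.t.ppf(0.975, df)  # two-tailed, alpha = .05 (about 2.101)

# Standard error for the regression line (mean) at X_i
se_mean = sqrt(s2_yx * (1 / N + (X_i - x_bar) ** 2 / ss_x))       # about .947
# Standard error for an individual's score at X_i (note the extra 1)
se_indiv = sqrt(s2_yx * (1 + 1 / N + (X_i - x_bar) ** 2 / ss_x))  # about 2.623

print(f"Line (mean): {y_hat - t_crit * se_mean:.2f} to {y_hat + t_crit * se_mean:.2f}")    # 3.81 to 7.79
print(f"Individual:  {y_hat - t_crit * se_indiv:.2f} to {y_hat + t_crit * se_indiv:.2f}")  # 0.29 to 11.31
```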
Review
Why is the confidence interval for the individual wider than the corresponding interval for the regression line?
Why are the confidence interval bands around the regression line curved instead of being straight?
Shrinkage
R² is biased (the sample value is too large) because it capitalizes on chance to minimize SSe in the sample.
If the population value of R² is zero, the expected value in the sample is R² = k/(N−1), where k is the number of predictors and N is the number of people in the sample. If you have many predictors, you can make R² as large as you want.
What is the expected value of R-square if N = 101 and k = 10? (By the formula, 10/100 = .10.) There is an ethical issue here.
Common adjustment or shrinkage formula:

$$\hat{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-k-1}$$

This is reported by SAS (PROC REG) under 'Adj R-Sq.' It adjusts for k, N, and the size of the initial R².
Shrinkage Examples
Suppose R² is .6405 with k = 4 predictors and a sample size of 30. Then

$$\hat{R}^2 = 1 - (1 - .6405)\frac{30-1}{30-4-1} = .583$$

For R² = .6405 (k = 4):

N      Adj R²
15     .497
30     .583
100    .625

For R² = .30 (k = 4):

N      Adj R²
15     .020
30     .188
100    .271

Note that a small N means lots of shrinkage, but a smaller initial R² also shrinks more.
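A minimal Python sketch of the shrinkage formula (the function name is mine) that reproduces the table above:

```python
def adj_r2(r2: float, n: int, k: int) -> float:
    """Shrinkage-adjusted R^2: 1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

for r2 in (0.6405, 0.30):
    for n in (15, 30, 100):
        print(f"R2 = {r2}, N = {n:3d}: Adj R2 = {adj_r2(r2, n, k=4):.3f}")
# R2 = .6405 -> .497, .583, .625; R2 = .30 -> .020, .188, .271
```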
Cross-Validation
• Compute a and b(s) (can have one or more IVs) on an initial sample.
• Find a new sample; do not re-estimate a and b, but use them to find Y'.
• Compute the correlation between Y and Y' in the new sample and square it. Ta da! Cross-validation R². (See the sketch below.)
• Cross-validation R² does not capitalize on chance and estimates the operational R².
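A minimal sketch of the procedure in Python (the data are synthetic and all names are mine; any OLS routine would do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: Y depends on two IVs plus noise
def draw_sample(n):
    X = rng.normal(size=(n, 2))
    y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)
    return X, y

# Step 1: estimate a and b(s) on the initial sample
X1, y1 = draw_sample(30)
A = np.column_stack([np.ones(len(X1)), X1])
coefs, *_ = np.linalg.lstsq(A, y1, rcond=None)  # [a, b1, b2]

# Step 2: new sample -- use the OLD a and b to form Y'
X2, y2 = draw_sample(30)
y_pred = np.column_stack([np.ones(len(X2)), X2]) @ coefs

# Step 3: correlate Y and Y' in the new sample, then square
r = np.corrcoef(y2, y_pred)[0, 1]
print(f"cross-validation R^2 = {r**2:.3f}")
```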
Cross-validation (2)
• Double cross-validation
• Data splitting
• Expert judgment weights (don’t try this
at home)
• Math estimates:

Fixed predictors:

$$\hat{R}^2_{CV} = 1 - (1 - R^2)\left(\frac{N-1}{N-k-1}\right)\left(\frac{N+k+1}{N}\right)$$

For the running example: $\hat{R}^2_{CV} = 1 - (1 - .6405)\left(\frac{30-1}{30-4-1}\right)\left(\frac{30+4+1}{30}\right) = .513$

Random predictors:

$$\hat{R}^2_{CV} = 1 - (1 - R^2)\left(\frac{N-1}{N-k-1}\right)\left(\frac{N-2}{N-k-2}\right)\left(\frac{N+1}{N}\right)$$
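Both formulas in a short Python sketch (function names are mine), checked against the fixed-predictor example above:

```python
def r2cv_fixed(r2: float, n: int, k: int) -> float:
    """Estimated cross-validity R^2 for fixed predictors."""
    return 1 - (1 - r2) * ((n - 1) / (n - k - 1)) * ((n + k + 1) / n)

def r2cv_random(r2: float, n: int, k: int) -> float:
    """Estimated cross-validity R^2 for random predictors."""
    return 1 - (1 - r2) * ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)

print(round(r2cv_fixed(0.6405, 30, 4), 3))   # 0.513, as on the slide
print(round(r2cv_random(0.6405, 30, 4), 3))  # 0.497, smaller still
```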
Review
• What is shrinkage in the context of
multiple regression? What are the
things that affect the expected amount
of shrinkage?
• What is cross-validation? Why is it
important?
Predictor Selection
• Widely misunderstood and widely misused.
• Algorithms labeled forward, backward,
stepwise, etc.
• NEVER use for work involving theory or
explanation (hint: this clearly means your
thesis and dissertation).
• NEVER use for estimating importance of
variables.
• Use SOLELY for economy (toss predictors).
All Possible Regressions
Data from Pedhazur example.

         GPA (Y)  GREQ     GREV     MAT     AR
GPA (Y)  1
GREQ     .611     1
GREV     .581     .468     1
MAT      .604     .267     .426     1
AR       .621     .508     .405     .525    1
Mean     3.313    565.333  575.333  67.00   3.567
S.D.     .600     48.618   83.03    9.248   .838

GPA is grade point average. GREQ is Graduate Record Exam, Quantitative. GREV is GRE Verbal. MAT is the Miller Analogies Test. AR is an Arithmetic Reasoning test.
All Possible Regressions (2)
Note how easy it is to choose the model with the highest R² for any given number of predictors. In predictor selection, you also need to worry about cost: you get both V and Q GRE in one test. Also consider what a change in R² means in practice, e.g., accuracy in predicting dropout.

k    R²     Variables in Model
1    .385   AR
1    .384   GREQ
1    .365   MAT
1    .338   GREV
2    .583   GREQ MAT
2    .515   GREV AR
2    .503   GREQ AR
2    .493   GREV MAT
2    .492   MAT AR
2    .485   GREQ GREV
3    .617   GREQ GREV MAT
3    .610   GREQ MAT AR
3    .572   GREV MAT AR
3    .572   GREQ GREV AR
4    .640   GREQ GREV MAT AR
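Because R² can be computed from correlations alone ($R^2 = r_{yx}'R_{xx}^{-1}r_{yx}$), the table can be reproduced directly from the matrix above. A Python sketch (variable names are mine):

```python
from itertools import combinations
import numpy as np

preds = ["GREQ", "GREV", "MAT", "AR"]
r_y = np.array([0.611, 0.581, 0.604, 0.621])  # correlations with GPA (Y)
R_xx = np.array([                             # predictor intercorrelations
    [1.000, 0.468, 0.267, 0.508],
    [0.468, 1.000, 0.426, 0.405],
    [0.267, 0.426, 1.000, 0.525],
    [0.508, 0.405, 0.525, 1.000],
])

for k in range(1, 5):
    for idx in map(list, combinations(range(4), k)):
        r2 = r_y[idx] @ np.linalg.solve(R_xx[np.ix_(idx, idx)], r_y[idx])
        print(k, f"{r2:.3f}", " ".join(preds[i] for i in idx))
# e.g., k = 2, GREQ MAT -> .583 and k = 4 -> .640, matching the table
```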
Predictor Selection Algorithms
• Forward – build up from the start using p values. End when no variables meet PIN (the p-to-enter criterion). May include duds.
• Backward – start with all variables and remove them using POUT (the p-to-remove criterion). May lose gems.
• Stepwise – start forward, check backward at each step. Not guaranteed to give the best R². (A forward-selection sketch follows this list.)
• Blockwise – not used much. Forward by blocks, then any method (e.g., stepwise) within each block to choose the best predictors.
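A minimal sketch of p-value-based forward selection (Python, synthetic data; the PIN value and all names are mine, and packaged routines differ in details):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, PIN = 100, 0.05

# Synthetic data: y depends on x0 and x2; x1 and x3 are duds
X = rng.normal(size=(n, 4))
y = 0.6 * X[:, 0] + 0.4 * X[:, 2] + rng.normal(size=n)

def r2_of(cols):
    """R^2 from OLS of y on an intercept plus the given columns of X."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

chosen = []
while True:
    best = None
    r2_old = r2_of(chosen)
    for c in set(range(4)) - set(chosen):
        r2_new = r2_of(chosen + [c])
        df2 = n - len(chosen) - 2          # N - k - 1 after adding one predictor
        F = (r2_new - r2_old) * df2 / (1 - r2_new)
        p = stats.f.sf(F, 1, df2)          # p value for F-to-enter
        if p < PIN and (best is None or r2_new > best[1]):
            best = (c, r2_new)
    if best is None:
        break                              # no remaining candidate meets PIN
    chosen.append(best[0])
print("entered:", chosen)                  # with these effects, x0 and x2 should enter
```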
Things to Consider in PS
• Algorithms consider statistical significance, but you also have to consider practical significance and cost; i.e., the algorithms alone don't work well.
• Surviving variables are often there by chance. Do the analysis again and you would choose a different set. That is OK for prediction.
• The value of correlated variables is quite different when they are considered in path analysis and SEM.
Hierarchical Regression
• Alternative to predictor selection
algorithms
• Theory-based (a priori) tests of increments to R-square
Example of Hierarchical Reg
Does personality increase prediction of med school success beyond that afforded by cognitive ability? Collect data on 250 med students for the first two years.

Model 1: $MedGPA = a + b_1 UgGPA + b_2 MAT$; R² = .10, p < .05

Model 2: $MedGPA = a + b_1 UgGPA + b_2 MAT + b_3 Consc + b_4 NA$; R² = .13, p < .05

Model test:

$$F = \frac{(R_L^2 - R_S^2)/(df_L - df_S)}{(1 - R_L^2)/(N - k_L - 1)} = \frac{(.13 - .10)/(4 - 2)}{(1 - .13)/(250 - 4 - 1)}$$

F(2, 245) = 4.22, p < .05
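The model-comparison F in a short Python sketch (the function name is mine):

```python
from scipy import stats

def r2_increment_test(r2_small, r2_large, k_small, k_large, n):
    """F test for the increase in R^2 from the smaller to the larger model."""
    df1 = k_large - k_small
    df2 = n - k_large - 1
    F = ((r2_large - r2_small) / df1) / ((1 - r2_large) / df2)
    return F, df1, df2, stats.f.sf(F, df1, df2)

F, df1, df2, p = r2_increment_test(0.10, 0.13, 2, 4, 250)
print(f"F({df1},{df2}) = {F:.2f}, p = {p:.4f}")  # F(2,245) = 4.22, p < .05
```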
Review
• Describe the steps in forward (backward, stepwise, blockwise, all possible regressions) predictor selection.
• What are the main problems, as far as R-square and prediction are concerned, with forward (backward, stepwise, blockwise, all possible regressions)?
• Why avoid predictor selection algorithms when doing substantive research (when you want to explain variance in the DV)?