Multiple Regression
Extension of Simple Linear Regression – using multiple predictors
Each predictor could help predict or explain additional variability in the response/criterion variable
However: what should be the effect of using any additional predictors?
Multiple Regression
What should be the effect of using additional predictors?
Logically, unless its correlation with the DV is 0, each predictor will improve prediction (explain additional variance in the DV)
So just adding variables as predictors at random will usually “improve” the model
This creates potential for misuse of the strategy
IDEALLY
each predictor should be
- correlated with the DV
- uncorrelated with other predictors (r over .8 is undesirable)
each predictor should explain some unique variability in the DV
each predictor should make sense!
Best situation
CLEAR THEORY or LOGIC determines the predictors selected
Examples
Relationship Commitment
satisfaction with outcomes (+)
investments in relationship (+)
attractiveness of available alternatives (-)
Job Satisfaction
salary
physical conditions
social conditions
Simple linear regression
Yp = a + bX (+ residuals)
Multiple regression
Yp = a + b1X1 + b2X2 (+ residuals)
a is the value of Y when all X = 0 (the regression constant)
b’s are ‘partial regression coefficients’ – the slope for each predictor when the other predictors are held constant
Graph of relationship when two predictors are used:
now we try to fit a plane rather than a line – to minimize the errors of prediction
Multiple regression
Yp = a + b1X1 + b2X2 + b3X3 (+ residuals)
Commitment = a + b1 (satisfaction) + b2 (investments) – b3 (alternatives) (+ residuals)
A weighted linear combination of predictors
Comparison to ANOVA – a main-effects-only model
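To make the weighted linear combination concrete, here is a minimal sketch in Python with statsmodels (the slides use SPSS; the data and the names x1, x2, y below are made up for illustration):

```python
# Minimal sketch: fit Yp = a + b1*X1 + b2*X2 by OLS
# (hypothetical data; x1, x2, y are illustrative names only)
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)                            # predictor 1
x2 = rng.normal(size=100)                            # predictor 2
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=100)   # criterion

X = sm.add_constant(np.column_stack([x1, x2]))       # column of 1s = intercept a
model = sm.OLS(y, X).fit()                           # minimizes squared residuals
print(model.params)     # a, b1, b2 (partial regression coefficients)
print(model.rsquared)   # proportion of variance in y explained
```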
• Let’s return to the question of predicting Exam 2 grades – using multiple predictors
• Undergraduate GPA (0-4 scale)
• GRE Verbal (200-800 scale)
• GRE Quantitative (200-800 scale)
• Exam 1 grade (0-100 scale)
• Mean Homework grade (0-10 scale)
Note the variety of scales for the predictors; the weights (partial regression coefficients) will vary to take those into account
Ideally, all predictors are related to the criterion, and unrelated to each other
Variables Entered/Removed(b)

Model  Variables Entered                        Variables Removed  Method
1      homework, grev, gpatot, greq, exam1(a)   .                  Enter

a. All requested variables entered.
b. Dependent Variable: exam2

Using just Exam 1 score, the correlation between Exam 1 and Exam 2 was r = .637, r2 = .406
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .735a  .540      .518               4.61879

a. Predictors: (Constant), homework, grev, gpatot, greq, exam1

Now the R, between the set of predictors and Exam 2, is .735, and R2 = .540
ANOVA(b)

Model 1     Sum of Squares  df   Mean Square  F       Sig.
Regression  2609.715        5    521.943      24.466  .000a
Residual    2218.658        104  21.333
Total       4828.373        109

a. Predictors: (Constant), homework, grev, gpatot, greq, exam1
b. Dependent Variable: exam2
Coefficientsa

             Unstandardized Coefficients  Standardized                    Correlations
Model 1      B           Std. Error       Beta    t       Sig.    Zero-order  Partial  Part
(Constant)   16.059      8.406                    1.911   .059
gpatot       .192        1.226            .011    .157    .876    .127        .015     .010
grev         -7.4E-005   .006             -.001   -.011   .991    .137        -.001    -.001
greq         .001        .007             .006    .086    .932    .078        .008     .006
exam1        .440        .067             .500    6.538   .000    .637        .540     .435
homework     3.673       .681             .390    5.394   .000    .564        .468     .359

a. Dependent Variable: exam2
Exam 2 (Pred) = 16.06 + .19(gpa) - .00(grev) + .00(greq) + .44(exam1) + 3.67(homework)
Since gpatot, grev, and greq were all not significant, should they be excluded from the equation?
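One way to answer this question is to compare the full model to a reduced model that drops the three non-significant predictors, using a partial F test on the change in explained variance. A sketch in Python with statsmodels, using simulated stand-in data with the same column names as the SPSS example (not the actual course data):

```python
# Sketch: partial F test of full vs. reduced model
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated stand-in for the exam-grades data (same column names)
rng = np.random.default_rng(1)
n = 110
df = pd.DataFrame({
    "gpatot": rng.uniform(2, 4, n),
    "grev": rng.uniform(200, 800, n),
    "greq": rng.uniform(200, 800, n),
    "exam1": rng.uniform(50, 100, n),
    "homework": rng.uniform(0, 10, n),
})
df["exam2"] = 16 + 0.44 * df["exam1"] + 3.67 * df["homework"] + rng.normal(0, 4.6, n)

full = smf.ols("exam2 ~ gpatot + grev + greq + exam1 + homework", data=df).fit()
reduced = smf.ols("exam2 ~ exam1 + homework", data=df).fit()

# Non-significant F: the dropped predictors add no reliable explained variance
print(anova_lm(reduced, full))
```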
Assumptions – never likely to satisfy all
Essentially the same as for r, but at a multivariate level
Independent Observations
Interval/Ratio Data – or at least pretend
Normality – all Predictors (X’s) and Response (Y)
- errors of prediction are normally distributed
Linearity – all X’s have a linear relationship with Y
- errors of prediction/predicted scores are linear
Equality of Variances (Homoscedasticity)
- variability of the errors of Y is the same at all values of X
Assumptions can be evaluated within SPSS at the multivariate level. In the Regression window, choose Plots and request zresid (Y) and zpred (X). The tables at the right demonstrate the patterns that would indicate each violation, although deciding when there is ‘enough’ discrepancy is still subjective.
From Tabachnick & Fidell (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon.
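The same diagnostic plot can be built by hand. A sketch with matplotlib, standardizing the predicted values and residuals from the fitted result `full` in the earlier sketch (a shapeless cloud around 0 is the hoped-for pattern; curves suggest non-linearity, funnels suggest heteroscedasticity):

```python
# Sketch: ZRESID (Y) by ZPRED (X) plot, as SPSS's Plots option produces
import matplotlib.pyplot as plt

zpred = (full.fittedvalues - full.fittedvalues.mean()) / full.fittedvalues.std()
zresid = full.resid / full.resid.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predicted value (ZPRED)")
plt.ylabel("Standardized residual (ZRESID)")
plt.show()
```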
Example to follow
Predicting Rated Distress – (1) none to (9) extreme – when a partner is emotionally unfaithful, using Age and Rated Distress over Sexual Infidelity as predictors.
All 3 variables are skewed.
Other Considerations in Multiple Regression
-- Truncated Range – same as with r, can lead to a poor assessment of the ‘real’ R
-- Outliers due to multivariate deviation
Discrepancy (distance) – outlier on the criterion
Leverage – outlier on the predictors
Influence – combines D & L to assess influence on the solution (change in regression coefficients if the case is deleted)
How these would appear in a simple linear regression situation:
From Tabachnick & Fidell (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon.
Other Considerations in Multiple Regression
-- Outliers due to multivariate deviation
A simple diagnostic for Influence is to request the Cook’s Distance statistic in the Regression window, Save option. Values over 1 would suggest potentially strong influence.
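The same statistic is available outside SPSS; a sketch using the fitted result `full` from the earlier statsmodels example:

```python
# Sketch: Cook's distance for every case, flagging values over 1
influence = full.get_influence()
cooks_d = influence.cooks_distance[0]   # first element: the distances
print((cooks_d > 1).sum(), "case(s) with Cook's D > 1")
```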

[Scatterplots of pubs vs. years, showing the fitted linear regression line with the outlier included and then excluded. Note that the residual for the outlier is not great, but it has strong influence on the solution: Cook’s distance = 92.6.]
Other Considerations in Multiple Regression
– Sample Size – if too small, you may get good but meaningless prediction – too little variability
Minimum sample sizes recommended (to detect moderate effect sizes, 13%, with power of approximately .80)
(Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.)
• For a test of the model: n = 50 + 8p
• For a test of individual predictors in the model: n = 104 + p
(p = number of predictors)
Can also conduct a power analysis based on the effect size you desire to select your sample size
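Green’s two rules of thumb are easy to encode as a quick check (a trivial sketch; the function name is ours, not Green’s):

```python
# Sketch: Green's (1991) minimum-n rules of thumb
def green_min_n(p):
    """p = number of predictors."""
    return {"test of model": 50 + 8 * p,
            "test of individual predictors": 104 + p}

print(green_min_n(5))   # {'test of model': 90, 'test of individual predictors': 109}
```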
Other Considerations in Multiple Regression
– Multicollinearity or Singularity
• Singularity – when one predictor is a
combination of other predictors included
• Multicollinearity - when other predictors can
account for a high degree of variability in a
predictor
Other Considerations in Multiple Regression
– Diagnostics for Multicollinearity or Singularity
Tolerance is used as a diagnostic statistic:
if the other predictors are used to predict a predictor, what variance is shared?
But it is reported as 1 - R2, so closer to 1 is better; less than .2 indicates a problem
Variance Inflation Factor (VIF) is also used.
It is the reciprocal of Tolerance, so it can range from 1 up. It reflects the degree to which the standard error of b is increased due to correlations among predictors.
A value of 4 is cause for some concern; a value of 10 is a serious problem
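Both diagnostics can be computed directly; a sketch using the simulated exam-grades DataFrame `df` from the earlier example:

```python
# Sketch: VIF (and Tolerance = 1/VIF) for each predictor
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = df[["gpatot", "grev", "greq", "exam1", "homework"]].assign(const=1.0)
for i, name in enumerate(X.columns[:-1]):           # skip the constant column
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")
```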
Assessing the Outcome
Testing the Overall Model as a single outcome
How well does the set of predictors (X’s) predict the criterion (Y)?
Ho: all b’s = 0, all partial regression coefficients = 0
Or
Ho: R = 0, the Multiple Correlation Coefficient = 0
R = correlation of the actual Y with the weighted linear combination of predictors (X’s)
Or – since the weighted linear combination leads to predicted scores –
R = correlation of the actual Y with the predicted Yp
Reminder: Partitioning the Variability in Y
SSTotal = Sum (Y - Mean Y)2
variability of the Y scores from the mean
Separated into
SSregression = Sum (Yp - Mean Y)2
improvement in predictions when using X (variability in Y explained by X), rather than assuming everyone gets the Mean
SSresidual = Sum (Y - Yp)2
degree to which predictions do not match the actual scores (prediction errors that have been minimized)
Example from Simple Linear Regression
[Scatterplot of iq vs. gpa with the fitted line iq = 53.05 + 16.99 * gpa, R-Square = 0.69. The vertical distance from each point to the prediction line is its residual (much greater for some cases than others); the distance from the mean line to the prediction line is the improvement in prediction using GPA. Mean GPA = 3.06; Mean IQ = 105 – the mean would be your best ‘guess’ for every person if you had no useful predictor.]
Test using F – similar to simple linear regression
Partition SST into
• SSregression (explained by the weighted combination)
• SSresidual (unexplained)

F = (SSregression / df regression) / (SSresidual / df residual)

F = MSregression / MSresidual

df regression = p: the number of parameters in the model is the predictors plus the intercept (p + 1), and since there is always only one a, df = p + 1 - 1 = p
df residual = n - p - 1

F = explained (systematic + unsystematic) / unexplained (unsystematic)

Was R reliably different from 0? Yes, if F is significant
Recall: Standard Error of the Estimate = SQRT (MS residual)
R2 = SSregression / SSTotal = explained variability / total variability
% of variance accounted for by the model – can use R2 for describing a sample
(see next slide for an ANOVA example)
Adjusted R2
For a better estimate for the population, adjusted based on the number of predictors and sample size:
Adjusted R2 = 1 - ((1 - R2)(n - 1) / (n - p - 1))
So it is lower if the sample is small but there are many predictors
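As a worked check of these formulas, the sketch below recomputes F, its p value, R2, Adjusted R2, and the SEE from the sums of squares in the exam-grades ANOVA table shown earlier:

```python
# Worked check against the exam-grades output (n = 110, p = 5 predictors)
from scipy import stats

ss_reg, ss_res, n, p = 2609.715, 2218.658, 110, 5
ms_reg = ss_reg / p                  # 521.943
ms_res = ss_res / (n - p - 1)        # 21.333
F = ms_reg / ms_res                  # 24.466
p_value = stats.f.sf(F, p, n - p - 1)

r2 = ss_reg / (ss_reg + ss_res)                  # .540
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)    # .518
see = ms_res ** 0.5                              # 4.619
print(F, p_value, r2, adj_r2, see)
```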
Tests of Between-Subjects Effects
Dependent Variable: Sensitive

Source           Type III Sum of Squares  df   Mean Square  F         Sig.  Partial Eta Squared
Corrected Model  83.200a                  7    11.886       3.290     .003  .132
Intercept        6451.600                 1    6451.600     1785.585  .000  .922
GENDER           14.400                   1    14.400       3.985     .048  .026
RELATE           44.600                   3    14.867       4.115     .008  .075
GENDER * RELATE  24.200                   3    8.067        2.233     .087  .042
Error            549.200                  152  3.613
Total            7084.000                 160
Corrected Total  632.400                  159

a. R Squared = .132 (Adjusted R Squared = .092)
Test of a model in which 3 predictors (GENDER, RELATE, and their interaction) are used to predict the rating on “Sensitive”, the DV
Example from Handout Packet, page approx. 47
In some cases, the purpose of the regression analysis is simply to see if the Model “works”.
Does it explain variance in the criterion?
Can it be used to make predictions?
Thus, the overall test of the model is all you need, and you can interpret the R2 or R2adj – and the SEE, if you plan predictions
In other cases, you might want to know how the individual predictors contributed to the overall model.
Assessing the contribution of individual predictors
Dependent upon the set of predictors included!
Partial regression coefficient – can test to see if b = 0
Is b = 0 (slope = 0) when the other predictors are held constant?
Tested using a t test with df = n - p - 1
Beta – the partial regression coefficient when all variables are standardized (standardized slope)
If b is significant, so is beta
The test of a Partial Regression Coefficient is like a typical statistical test of significance – it is or is not significant, and is influenced by sample size
Can also evaluate predictors based on “effect size” measures (practical significance)
These would be “significant” if b is significant
Partial correlation (pr) – as described in the simple covariation section
correlation of a predictor (X1) with the DV (Y) after removing the variance in both explained by the other predictors
So both X1 and Y are adjusted before the correlation is calculated
All other X’s are ‘partialed’ out of X1 and Y
pr2 – shared variance within context –
what % of the variability in Y does X1 explain after the other variables’ contributions to explaining both are removed?
There is less than 100% of the variability of Y left for X1 to explain
Semi-partial (part) correlation (sr) –
correlation of a predictor (X1) with the DV (Y) after removing the variance of X1 shared with the other predictors
So X1 is adjusted by removing variance shared with the other X’s
But all variability in Y is left to be explained
Assesses the ‘unique’ contribution of X1 to explaining Y
There is 100% of the variability in Y to explain for each X in the model
sr2 is considered the best measure of individual predictor importance (practical significance)
R2 will be lowered by sr2 for a predictor when it is removed from the model
(BOTH pr AND sr ARE STILL DEPENDENT ON THE MODEL USED)
WHY?
[Venn diagram: overlapping circles for the variability of the DV (Y), IV 1 (X1), and IV 2 (X2):
a = variability in Y shared uniquely with X1
b = variability in Y shared by both X1 and X2
c = variability in Y shared uniquely with X2
d = variability in Y explained by neither predictor]
Partial correlation squared, pr2 (X1) = a / (a + d)
Semi-partial correlation squared, sr2 (X1) = a / (a + b + c + d)
Types of Multiple Regression
Standard –
all predictors entered together
contribution of each depends on others
in the group
Assumes other variables would usually be
there and/or are relevant
Four Humor Styles
Investment Model Variables
Big Five Personality Dimensions
Hierarchical Regression – enter in a planned sequence
Can enter individual predictors one at a time
Or enter groups of variables at separate steps
As new predictors are added, each one can only explain variability that is left
Assess the change in R2 at each step (did it increase significantly?) and the overall model when done (see the sketch after the example list below)
Predicting adult IQ
Parental IQ
Prenatal experience
Early infant experience
Education
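A sketch of hierarchical entry as a sequence of nested models, reusing the simulated exam-grades DataFrame `df` from earlier (the entry order here is purely illustrative, not a theoretical plan):

```python
# Sketch: hierarchical regression, testing the R2 change at each step
step1 = smf.ols("exam2 ~ exam1", data=df).fit()
step2 = smf.ols("exam2 ~ exam1 + homework", data=df).fit()

print("R2 change:", step2.rsquared - step1.rsquared)
print(anova_lm(step1, step2))   # F test: did step 2 improve the model?
```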
Statistical Methods –
Let the data determine inclusion in the model, not a logical or theoretical ‘plan’
Assess each step by evaluating the change in R or R2
Usually an exploratory tool in possible model building
Requires a larger sample to have confidence (40 cases per predictor)
Stepwise
Begins with the single best predictor
Adds the next best, and assesses whether the model is better
At each step, each variable is reassessed, and might be kept or removed
Stops when adding additional variables does not significantly improve the model (R)
Forward inclusion
Begins with the single best predictor
Adds the next best, and assesses the improvement
Once in, variables stay in – but a new variable is only kept if it improves the model (R)
Backward exclusion
Begins with the full model
Removes the weakest contributor and assesses the loss
Keeps removing unless there is a significant drop in R
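A compact sketch of forward inclusion, again on the simulated exam-grades data (exploratory only; the helper below is our own illustration, not a standard library routine):

```python
# Sketch: forward inclusion by R2 improvement, keeping a new predictor
# only while its coefficient is significant (alpha = .05)
def forward_select(data, criterion, candidates, alpha=0.05):
    selected = []
    while candidates:
        best = None
        for c in candidates:
            rhs = " + ".join(selected + [c])
            fit = smf.ols(f"{criterion} ~ {rhs}", data=data).fit()
            if best is None or fit.rsquared > best[1].rsquared:
                best = (c, fit)
        if best[1].pvalues[best[0]] >= alpha:   # no significant improvement
            break
        selected.append(best[0])
        candidates.remove(best[0])
    return selected

print(forward_select(df, "exam2",
                     ["gpatot", "grev", "greq", "exam1", "homework"]))
```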
Research questions using Multiple Regression
Assess Overall Model
Assess individual predictors
Effects of adding or changing predictors
on overall model
on other individual predictors
Predictions in new sample
Other Multiple Regression issues/applications
Suppressor variables –
variables that improve the model due to correlations with other predictors, not the criterion. They ‘suppress’ variance in another predictor that is ‘noise’
Evident if the simple r with the criterion is very low but the variable contributes to the model (sr is higher); can also produce a change in sign from r to b (i.e., positive r but negative b)
Other issues/applications
Mediation Models
The relationship of X to Y is mediated by some other variable

[Path diagram:
c (direct path): Positive use of Humor for self → Perceived Stress
a: Positive use of Humor for self (high self-enhancing/low self-defeating) → Positive Personality (optimistic, hopeful, happy)
b: Positive Personality → Perceived Stress
c′: Positive use of Humor for self → Perceived Stress, with Positive Personality in the model]

Humor use (H) predicts Perceived Stress (c) – the direct path
Humor use predicts Positive Personality (PP) (a)
Positive Personality predicts Perceived Stress, with Humor in the model (b)
In a Hierarchical Model, enter PP first, then H; if PP mediates H, H no longer ‘contributes’ to the model – the c′ path is not significant
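A sketch of the three regressions behind the paths, with simulated stand-in data and hypothetical column names (humor, pos_pers, stress) echoing the example above:

```python
# Sketch: mediation paths a, b, c, c' as separate OLS models
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
humor = rng.normal(size=n)
pos_pers = 0.6 * humor + rng.normal(size=n)      # a path built into the data
stress = -0.5 * pos_pers + rng.normal(size=n)    # b path built into the data
rel = pd.DataFrame({"humor": humor, "pos_pers": pos_pers, "stress": stress})

c_path = smf.ols("stress ~ humor", data=rel).fit()               # c (direct)
a_path = smf.ols("pos_pers ~ humor", data=rel).fit()             # a
b_cprime = smf.ols("stress ~ pos_pers + humor", data=rel).fit()  # b and c'

# Mediation is suggested if humor's c is significant but its c' is not
print(c_path.pvalues["humor"], b_cprime.pvalues["humor"])
```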
Other issues/applications
Moderator Models – the relationship of a predictor with the criterion depends upon some other variable (just like an interaction in ANOVA)
Yp = a + b1X1 + b2X2 + b3(X1X2) (+ residuals)
Main effects, plus an interaction term added to the equation
Often requires some modification of the data prior to the analysis:
- centering variables to avoid multicollinearity (if predictors do not have true 0 scores)
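A sketch of the centering step and the interaction term (simulated data; x1, x2, y are illustrative names):

```python
# Sketch: moderated regression with centered predictors
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
dat = pd.DataFrame({"x1": rng.normal(10, 2, 150),
                    "x2": rng.normal(50, 5, 150)})
dat["y"] = dat["x1"] + 0.3 * dat["x1"] * dat["x2"] + rng.normal(0, 5, 150)

# Center predictors so the product term is less correlated with them
dat["x1c"] = dat["x1"] - dat["x1"].mean()
dat["x2c"] = dat["x2"] - dat["x2"].mean()

mod = smf.ols("y ~ x1c + x2c + x1c:x2c", data=dat).fit()
print(mod.params)   # b3 (x1c:x2c) tests the moderation/interaction
```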
Best situation
CLEAR THEORY to be tested
Relationship Commitment (low 8 – 72 high)
satisfaction with outcomes (+) (low 3 – 21 high)
investments in relationship (+) (Subjective: low 6 – 54 high; Objective: none 0 – ?? lots)
attractiveness of available alternatives (-) (low 6 – 48 high)
In Handout Packet
Begin by examining the individual variables for normality and outliers, etc.
Can request Cook’s D to assess outlier influence
Can check assumptions using the plots from the regression analysis
Then look at the simple correlations (r)
Expect predictors to correlate with the criterion, but not a lot with each other
Correlations (N = 75 for every pair; Sig. 1-tailed in parentheses)

                        Global        Global        Global        Objective     Subjective
                        Commitment    Satisfaction  alternatives  Investments   Investments
Global Commitment       1.000         .310 (.003)   -.422 (.000)  .395 (.000)   .551 (.000)
Global Satisfaction     .310 (.003)   1.000          -.237 (.020)  .157 (.089)   .339 (.001)
Global alternatives     -.422 (.000)  -.237 (.020)  1.000         -.257 (.013)  -.440 (.000)
Objective Investments   .395 (.000)   .157 (.089)   -.257 (.013)  1.000         .408 (.000)
Subjective Investments  .551 (.000)   .339 (.001)   -.440 (.000)  .408 (.000)   1.000
Check to see how well the model worked
R and R2, and the test of significance
Standard error of the estimate

Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .619a  .384      .348               6.34228

a. Predictors: (Constant), Global alternatives, Global Satisfaction, Objective Investments, Subjective Investments

(R Square describes the sample; Adjusted R Square is used to generalize to the population; the Std. Error of the Estimate is the typical residual.)
ANOVA(b)

Model 1     Sum of Squares  df  Mean Square  F       Sig.
Regression  1751.967        4   437.992      10.889  .000a
Residual    2815.713        70  40.224
Total       4567.680        74

a. Predictors: (Constant), Subjective Investments, Global Satisfaction, Objective Investments, Global alternatives
b. Dependent Variable: Global Commitment
Now we can look at the individual predictors
check collinearity
see which predictors are individually significant
look at individual contributions (semi-partial, or part, r2)
Coefficientsa

                        Unstandardized             Standardized                   95% Confidence Interval for B
Model 1                 B          Std. Error      Beta     t       Sig.    Lower Bound  Upper Bound
(Constant)              19.247     6.372                    3.021   .004    6.539        31.955
Global Satisfaction     .247       .214            .116     1.153   .253    -.180        .674
Global alternatives     -.179      .098            -.192    -1.823  .073    -.376        .017
Objective Investments   3.390E-02  .019            .183     1.773   .081    -.004        .072
Subjective Investments  .346       .113            .353     3.071   .003    .121         .571

                        Correlations                     Collinearity Statistics
Model 1                 Zero-order  Partial  Part        Tolerance  VIF
Global Satisfaction     .310        .137     .108        .875       1.143
Global alternatives     -.422       -.213    -.171       .791       1.264
Objective Investments   .395        .207     .166        .826       1.211
Subjective Investments  .551        .345     .288        .668       1.496

a. Dependent Variable: Global Commitment
• Go through example in SPSS
• Look at G*Power
• Stepwise example in Handouts