Bivariate Linear Regression

advertisement
Bivariate Linear Regression
Linear Function
•Y = a + bX +e
Sources of Error
• Error in the measurement of X and or Y or
in the manipulation of X.
• The influence upon Y of variables other
than X (extraneous variables), including
variables that interact with X.
• Any nonlinear influence of X upon Y.
The Regression Line
Yˆ  a  bX
Zˆ y  rZ x
• r2 < 1  Predicted Y regresses towards
mean Y
2
• Least Squares Criterion:
 Y Yˆ


Our Beer and Burger Data
Subject
X
Y
1
2
3
4
5
Sum
Mean
St. Dev.
5
4
3
2
1
15
3
1.581
8
10
4
6
2
30
6
3.162
sy
SSCP COVxy
b

r
2
SS x
sx
sx
Yˆ
9.2
7.6
6.0
4.4
2.8
30
(Y  Yˆ )
-1.2
2.4
-2.0
1.6
-0.8
0.0
(Y  Yˆ ) 2
1.44
5.76
4.00
2.56
0.64
14.40
(Yˆ  Y ) 2
10.24
2.56
0.00
2.56
10.24
25.60
SSY  4(3.162) 2  40
 3.162 
b  .8
  1 .6
 1.581 
a  Y  bX  6 - 1.6(3)  1.2
Pearson r is a Standardized Slope
• Pearson r is the number of standard
deviations that predicted Y changes for
each one standard deviation change in X.
br
sy
sx
Error Variance
SSE  (Y  Yˆ )2  (1  r 2 )(SSy )  (1  .64)( 40)  14.4
SSE 14.4
MSE 

 4 .8
n2
3
• What is SSE if r = 0?
Y  Y

SSE   Y  Y

2
 SSy
• If r2 > 0, we can do better than just predicting that Yi is
mean Y.
Standard Error of Estimate
• Get back to the original units of
measurement.
s y  yˆ  MSE  s y
1 r 
2
n 1
n2
sy yˆ  4.8  2.191
Regression Variance
• Variance in Y “due to” X
SSregr

  Yˆ  Y

2
 SSy  SSE  r 2SSy  40 - 14.4  25.6
MSregr 
SSregr
p
• p is number of predictors.
• p = 1 for bivariate regression.
Coefficient of Determination
• The proportion of variance in Y explained
by the linear model.
r 
2
SS regr
SS y
Coefficient of Alienation
• The proportion of variance in Y that is not
explained by the linear model.
SSerror
1 r 
SSy
2
Testing Hypotheses
• H: b = 0
F
MSregr
MSE
t
MSregr
MSE
, df  n  2
, df  p, n  p  1
• F = t2
• One-tailed p from F = two-tailed p from t
Source Table
Source
Regression
Error
Total
SS
25.6
14.4
40.0
df
1
3
4
MS
25.6
4.8
10.0
F
5.33
p
.104
• MStotal is nothing more than the sample variance of the dependent
variable Y. It is usually omitted from the table.
Power Analysis
• Using Steiger & Fouladi’s R2.exe
Power = 13%
G*Power = 15%
Summary Statement
The linear regression between my
friends’ burger consumption and their
beer consumption fell short of
statistical significance, r = .8,
beers = 1.2 + 1.6 burgers, F(1, 3) = 5.33,
p = .10. The most likely explanation of
the nonsiginficant result is that it
represents a Type II error. Given our
small sample size (N = 5), power was
only 13% even for a large effect (ρ = .5).
The Regression Line is
Similar to a Mean
Subject
X
Y
1
2
3
4
5
Sum
Mean
St. Dev.
5
4
3
2
1
15
3
1.581
8
10
4
6
2
30
6
3.162
Yˆ
9.2
7.6
6.0
4.4
2.8
30
(Y  Yˆ )
-1.2
2.4
-2.0
1.6
-0.8
0.0
 Y  Yˆ   0
(Y  Yˆ ) 2
1.44
5.76
4.00
2.56
0.64
14.40
 Y  Yˆ 
2
 Yˆ  Y 
2
 SSregression
(Yˆ  Y ) 2
10.24
2.56
0.00
2.56
10.24
25.60
 SSerror
Increase n to 10
• Same value of F
• r2 = SSregression  SStotal, = 25.6/64.0 = .4 (down
from .64).
Source
Regression
Error
Total
SS
25.6
38.4
64.0
df
1
8
9
MS
25.6
4.8
7.1
F
5.33
p
.05
Power Analysis
Power = 33%
G*Power = 36%
Increase n to 10
A .05 criterion of statistical significant was
employed for all tests. An a priori power
analysis indicated that my sample size (N = 10)
would yield power of only 33% even for a large
effect (ρ = .5). Despite this low power, the
analysis yielded a statistically significant result.
Among my friends, beer consumption increased
significantly with burger consumption, r = .632,
beers = 1.2 + 1.6 burgers, F(1, 8) = 5.33,
p = .05.
Testing Directional Hypotheses
• H: b  0
H1: b > 0
.632 8
t
 2.31, one - tailed p  .025
1  .40
• For F, one-tailed p = .05
• half-tailed p = .025.
• P(AB) = P(A)P(B) = .5(.05) = .025
Assumptions
•
•
•
•
•
•
To test H: b = 0 or construct a CI
Homoscedasticity across Y|X
Normality of Y|X
Normality of Y ignoring X
No assumptions about X
No assumptions for descriptive statistics
(not using t or F)
Placing Confidence Limits on
Predicted Values of Mean Y|X
• To predict the mean value of Y for all
subjects who have some particular score
on X:
•

1 XX
ˆ
CI  Y  t  s y  yˆ 

n
SS x

2
Placing Confidence Limits on
Predicted Values of Individual Y|X

1 XX
ˆ
CI  Y  t  s y  yˆ  1  
n
SS x

2
Bowed Confidence Intervals
Testing Other Hypotheses
• Is the correlation between X and Y the
same in one population as in another?
• Is the slope for predicting Y from X the
same in one population as in another?
• Is the intercept for predicting Y from X the
same in one population as in another.
Can differ on r but not slope
or slope but not r.
Download