Bivariate Linear Regression

Linear Function
• $Y = a + bX + e$

Sources of Error
• Error in the measurement of X and/or Y, or in the manipulation of X.
• The influence upon Y of variables other than X (extraneous variables), including variables that interact with X.
• Any nonlinear influence of X upon Y.

The Regression Line
• $\hat{Y} = a + bX$
• $\hat{Z}_y = r Z_x$
• When $r^2 < 1$, predicted Y regresses towards mean Y.
• Least Squares Criterion: minimize $\sum (Y - \hat{Y})^2$

Our Beer and Burger Data

Subject      X       Y       Ŷ     Y - Ŷ   (Y - Ŷ)²   (Ŷ - Ȳ)²
1            5       8      9.2    -1.2      1.44      10.24
2            4      10      7.6     2.4      5.76       2.56
3            3       4      6.0    -2.0      4.00       0.00
4            2       6      4.4     1.6      2.56       2.56
5            1       2      2.8    -0.8      0.64      10.24
Sum         15      30     30       0.0     14.40      25.60
Mean         3       6
St. Dev.   1.581   3.162

• $b = \frac{SSCP}{SS_x} = \frac{COV_{xy}}{s_x^2} = r\,\frac{s_y}{s_x}$
• $SS_y = 4(3.162)^2 = 40$
• $b = .8\left(\frac{3.162}{1.581}\right) = 1.6$
• $a = \bar{Y} - b\bar{X} = 6 - 1.6(3) = 1.2$

Pearson r is a Standardized Slope
• Pearson r is the number of standard deviations that predicted Y changes for each one standard deviation change in X.
• $b = r\,\frac{s_y}{s_x}$

Error Variance
• $SSE = \sum (Y - \hat{Y})^2 = (1 - r^2)(SS_y) = (1 - .64)(40) = 14.4$
• $MSE = \frac{SSE}{n - 2} = \frac{14.4}{3} = 4.8$
• What is SSE if r = 0? Then $\hat{Y} = \bar{Y}$, so $SSE = \sum (Y - \bar{Y})^2 = SS_y$.
• If $r^2 > 0$, we can do better than just predicting that $Y_i$ is mean Y.

Standard Error of Estimate
• Get back to the original units of measurement.
• $s_{y - \hat{y}} = \sqrt{MSE} = s_y \sqrt{\frac{(1 - r^2)(n - 1)}{n - 2}}$
• $s_{y - \hat{y}} = \sqrt{4.8} = 2.191$

Regression Variance
• Variance in Y "due to" X
• $SS_{regr} = \sum (\hat{Y} - \bar{Y})^2 = SS_y - SSE = r^2 SS_y = 40 - 14.4 = 25.6$
• $MS_{regr} = \frac{SS_{regr}}{p}$
• p is the number of predictors.
• p = 1 for bivariate regression.

Coefficient of Determination
• The proportion of variance in Y explained by the linear model.
• $r^2 = \frac{SS_{regr}}{SS_y}$

Coefficient of Alienation
• The proportion of variance in Y that is not explained by the linear model.
• $1 - r^2 = \frac{SS_{error}}{SS_y}$

Testing Hypotheses
• $H_0: b = 0$
• $F = \frac{MS_{regr}}{MSE}$, df = p, n - p - 1
• $t = \sqrt{\frac{MS_{regr}}{MSE}}$, df = n - 2
• $F = t^2$
• One-tailed p from F = two-tailed p from t

Source Table

Source        SS     df    MS      F      p
Regression   25.6    1    25.6    5.33   .104
Error        14.4    3     4.8
Total        40.0    4    10.0

• $MS_{total}$ is nothing more than the sample variance of the dependent variable Y. It is usually omitted from the table.
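The arithmetic behind the slope, intercept, sums of squares, and F ratio above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `bivariate_regression` and the dictionary keys are illustrative choices, not part of the original slides. It does not compute the p value, since that needs an F distribution that the standard library does not provide.

```python
import math

# Beer and burger data from the slides: X = burgers, Y = beers.
X = [5, 4, 3, 2, 1]
Y = [8, 10, 4, 6, 2]


def bivariate_regression(x, y):
    """Least-squares fit of Y = a + bX plus the sums of squares used above.

    The function name and the dictionary keys are illustrative choices.
    """
    n = len(x)
    p = 1  # one predictor in bivariate regression
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sscp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sscp / ss_x                    # slope
    a = mean_y - b * mean_x            # intercept
    r = sscp / math.sqrt(ss_x * ss_y)  # Pearson r

    # Partition SS_y into error and regression components.
    y_hat = [a + b * xi for xi in x]
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_regr = sum((yh - mean_y) ** 2 for yh in y_hat)

    mse = sse / (n - p - 1)
    ms_regr = ss_regr / p
    return {
        "a": a, "b": b, "r": r,
        "SS_y": ss_y, "SSE": sse, "SS_regr": ss_regr,
        "MSE": mse, "MS_regr": ms_regr,
        "F": ms_regr / mse,
        "se_estimate": math.sqrt(mse),
    }


if __name__ == "__main__":
    for name, value in bivariate_regression(X, Y).items():
        print(f"{name} = {value:.3f}")
```

Run on the five data points, the results should agree with the slide values up to rounding: b = 1.6, a = 1.2, r = .8, SSE = 14.4, SS_regr = 25.6, MSE = 4.8, F = 5.33, and a standard error of estimate of 2.191.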
Power Analysis
• Using Steiger & Fouladi's R2.exe: power = 13%
• Using G*Power: power = 15%

Summary Statement
The linear regression between my friends' burger consumption and their beer consumption fell short of statistical significance, r = .8, beers = 1.2 + 1.6 burgers, F(1, 3) = 5.33, p = .10. The most likely explanation of the nonsignificant result is that it represents a Type II error. Given our small sample size (N = 5), power was only 13% even for a large effect (ρ = .5).

The Regression Line is Similar to a Mean

Subject      X       Y       Ŷ     Y - Ŷ   (Y - Ŷ)²   (Ŷ - Ȳ)²
1            5       8      9.2    -1.2      1.44      10.24
2            4      10      7.6     2.4      5.76       2.56
3            3       4      6.0    -2.0      4.00       0.00
4            2       6      4.4     1.6      2.56       2.56
5            1       2      2.8    -0.8      0.64      10.24
Sum         15      30     30       0.0     14.40      25.60
Mean         3       6
St. Dev.   1.581   3.162

• $\sum (Y - \hat{Y}) = 0$
• $\sum (Y - \hat{Y})^2 = SS_{error}$
• $\sum (\hat{Y} - \bar{Y})^2 = SS_{regression}$

Increase n to 10
• Same value of F
• $r^2 = \frac{SS_{regression}}{SS_{total}} = \frac{25.6}{64.0} = .4$ (down from .64).

Source        SS     df    MS      F      p
Regression   25.6    1    25.6    5.33   .05
Error        38.4    8     4.8
Total        64.0    9     7.1

Power Analysis
• Power = 33%
• G*Power = 36%

Increase n to 10
A .05 criterion of statistical significance was employed for all tests. An a priori power analysis indicated that my sample size (N = 10) would yield power of only 33% even for a large effect (ρ = .5). Despite this low power, the analysis yielded a statistically significant result. Among my friends, beer consumption increased significantly with burger consumption, r = .632, beers = 1.2 + 1.6 burgers, F(1, 8) = 5.33, p = .05.

Testing Directional Hypotheses
• $H_0: b \le 0$; $H_1: b > 0$
• $t = \frac{.632\sqrt{8}}{\sqrt{1 - .40}} = 2.31$, one-tailed p = .025
• For F, one-tailed p = .05
• half-tailed p = .025.
• $P(A \cap B) = P(A)P(B) = .5(.05) = .025$

Assumptions
• To test $H_0: b = 0$ or construct a CI:
    • Homoscedasticity across Y|X
    • Normality of Y|X
    • Normality of Y ignoring X
    • No assumptions about X
• No assumptions for descriptive statistics (not using t or F)

Placing Confidence Limits on Predicted Values of Mean Y|X
• To predict the mean value of Y for all subjects who have some particular score on X:
• $CI = \hat{Y} \pm t \cdot s_{y - \hat{y}} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{SS_x}}$

Placing Confidence Limits on Predicted Values of Individual Y|X
• $CI = \hat{Y} \pm t \cdot s_{y - \hat{y}} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{SS_x}}$

Bowed Confidence Intervals

Testing Other Hypotheses
• Is the correlation between X and Y the same in one population as in another?
• Is the slope for predicting Y from X the same in one population as in another?
• Is the intercept for predicting Y from X the same in one population as in another?
• Populations can differ on r but not on slope, or on slope but not on r.
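The two confidence-limit formulas given earlier (for mean Y|X and for individual Y|X) can be tried out on the N = 5 beer and burger data. This is a minimal sketch: the function name `confidence_limits` and its parameters are hypothetical, and the critical value 3.182 (two-tailed .05, df = 3) is taken from a standard t table rather than from the slides.

```python
import math

# Values for the N = 5 beer and burger data, as computed on the earlier slides.
N = 5
MEAN_X = 3.0
SS_X = 10.0          # 4 * 1.581**2, rounded
A, B = 1.2, 1.6      # intercept and slope
SE_ESTIMATE = math.sqrt(4.8)  # standard error of estimate, sqrt(MSE)

# Assumption: two-tailed .05 critical t for df = n - 2 = 3, from a t table.
T_CRIT_DF3 = 3.182


def confidence_limits(x0, individual=False, t_crit=T_CRIT_DF3):
    """Return (lower, upper) limits for predicted Y at X = x0.

    individual=False gives limits on mean Y|X;
    individual=True adds 1 under the radical for a single new case.
    """
    y_hat = A + B * x0
    under_radical = 1 / N + (x0 - MEAN_X) ** 2 / SS_X
    if individual:
        under_radical += 1
    half_width = t_crit * SE_ESTIMATE * math.sqrt(under_radical)
    return y_hat - half_width, y_hat + half_width


if __name__ == "__main__":
    for x0 in (1, 3, 5):
        lo_m, hi_m = confidence_limits(x0)
        lo_i, hi_i = confidence_limits(x0, individual=True)
        print(f"X = {x0}: mean Y|X [{lo_m:.2f}, {hi_m:.2f}], "
              f"individual Y|X [{lo_i:.2f}, {hi_i:.2f}]")
```

Because of the $(X - \bar{X})^2$ term, the interval is narrowest at mean X and widens as X moves away from it, which is what produces the bowed shape; the interval for an individual score is always wider than the interval for the mean.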