Bivariate Linear Regression

Linear Function
• $Y = a + bX + e$

Sources of Error
• Error in the measurement of X and/or Y, or in the manipulation of X.
• The influence upon Y of variables other than X (extraneous variables), including variables that interact with X.
• Any nonlinear influence of X upon Y.

The Regression Line
• $\hat{Y} = a + bX$
• $\hat{Z}_y = r Z_x$
• $r^2 < 1 \Rightarrow$ predicted Y regresses towards mean Y
• Least Squares Criterion: minimize $\sum (Y - \hat{Y})^2$

Our Beer and Burger Data

Subject     X       Y       $\hat{Y}$   $(Y - \hat{Y})$   $(Y - \hat{Y})^2$   $(\hat{Y} - \bar{Y})^2$
1           5       8       9.2         -1.2              1.44                10.24
2           4       10      7.6          2.4              5.76                 2.56
3           3       4       6.0         -2.0              4.00                 0.00
4           2       6       4.4          1.6              2.56                 2.56
5           1       2       2.8         -0.8              0.64                10.24
Sum         15      30      30           0.0              14.40               25.60
Mean        3       6
St. Dev.    1.581   3.162

• $b = \frac{SSCP}{SS_x} = \frac{COV_{xy}}{s_x^2} = r\,\frac{s_y}{s_x}$
• $SS_y = (n - 1)s_y^2 = 4(3.162)^2 = 40$
• $b = .8\left(\frac{3.162}{1.581}\right) = 1.6$
• $a = \bar{Y} - b\bar{X} = 6 - 1.6(3) = 1.2$

Pearson r is a Standardized Slope
• Pearson r is the number of standard deviations that predicted Y changes for each one standard deviation change in X.
• $b = r\,\frac{s_y}{s_x}$

Error Variance
• $SSE = \sum (Y - \hat{Y})^2 = (1 - r^2)(SS_y) = (1 - .64)(40) = 14.4$
• $MSE = \frac{SSE}{n - 2} = \frac{14.4}{3} = 4.8$
• What is SSE if r = 0? Then $\hat{Y} = \bar{Y}$, so $SSE = \sum (Y - \bar{Y})^2 = SS_y$.
• If $r^2 > 0$, we can do better than just predicting that $Y_i$ is mean Y.

Standard Error of Estimate
• Get back to the original units of measurement.
• $s_{y - \hat{y}} = \sqrt{MSE} = s_y \sqrt{(1 - r^2)\,\frac{n - 1}{n - 2}}$
• $s_{y - \hat{y}} = \sqrt{4.8} = 2.191$

Regression Variance
• Variance in Y "due to" X
• $SS_{regr} = \sum (\hat{Y} - \bar{Y})^2 = r^2 SS_y = SS_y - SSE = 40 - 14.4 = 25.6$
• $MS_{regr} = \frac{SS_{regr}}{p}$
• p is the number of predictors.
• p = 1 for bivariate regression.

Coefficient of Determination
• The proportion of variance in Y explained by the linear model.
• $r^2 = \frac{SS_{regr}}{SS_y}$

Coefficient of Alienation
• The proportion of variance in Y that is not explained by the linear model.
SSerror 1 r  SSy 2 Testing Hypotheses • H: b = 0 F MSregr MSE t MSregr MSE , df  n  2 , df  p, n  p  1 • F = t2 • One-tailed p from F = two-tailed p from t Source Table Source Regression Error Total SS 25.6 14.4 40.0 df 1 3 4 MS 25.6 4.8 10.0 F 5.33 p .104 • MStotal is nothing more than the sample variance of the dependent variable Y. It is usually omitted from the table. Power Analysis • Using Steiger & Fouladi’s R2.exe Power = 13% G*Power = 15% Summary Statement The linear regression between my friends’ burger consumption and their beer consumption fell short of statistical significance, r = .8, beers = 1.2 + 1.6 burgers, F(1, 3) = 5.33, p = .10. The most likely explanation of the nonsiginficant result is that it represents a Type II error. Given our small sample size (N = 5), power was only 13% even for a large effect (ρ = .5). The Regression Line is Similar to a Mean Subject X Y 1 2 3 4 5 Sum Mean St. Dev. 5 4 3 2 1 15 3 1.581 8 10 4 6 2 30 6 3.162 Yˆ 9.2 7.6 6.0 4.4 2.8 30 (Y  Yˆ ) -1.2 2.4 -2.0 1.6 -0.8 0.0  Y  Yˆ   0 (Y  Yˆ ) 2 1.44 5.76 4.00 2.56 0.64 14.40  Y  Yˆ  2  Yˆ  Y  2  SSregression (Yˆ  Y ) 2 10.24 2.56 0.00 2.56 10.24 25.60  SSerror Increase n to 10 • Same value of F • r2 = SSregression  SStotal, = 25.6/64.0 = .4 (down from .64). Source Regression Error Total SS 25.6 38.4 64.0 df 1 8 9 MS 25.6 4.8 7.1 F 5.33 p .05 Power Analysis Power = 33% G*Power = 36% Increase n to 10 A .05 criterion of statistical significant was employed for all tests. An a priori power analysis indicated that my sample size (N = 10) would yield power of only 33% even for a large effect (ρ = .5). Despite this low power, the analysis yielded a statistically significant result. Among my friends, beer consumption increased significantly with burger consumption, r = .632, beers = 1.2 + 1.6 burgers, F(1, 8) = 5.33, p = .05. 
Testing Directional Hypotheses
• $H_0$: b ≤ 0; $H_1$: b > 0
• $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{.632\sqrt{8}}{\sqrt{1 - .40}} = 2.31$, one-tailed p = .025
• For F, one-tailed p = .05
• Half-tailed p = .025.
• $P(A \cap B) = P(A)P(B) = .5(.05) = .025$

Assumptions
• To test $H_0$: b = 0 or construct a CI
• Homoscedasticity across Y|X
• Normality of Y|X
• Normality of Y ignoring X
• No assumptions about X
• No assumptions for descriptive statistics (not using t or F)

Placing Confidence Limits on Predicted Values of Mean Y|X
• To predict the mean value of Y for all subjects who have some particular score on X:
• $CI = \hat{Y} \pm t \cdot s_{y - \hat{y}} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{SS_x}}$

Placing Confidence Limits on Predicted Values of Individual Y|X
• $CI = \hat{Y} \pm t \cdot s_{y - \hat{y}} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{SS_x}}$

Bowed Confidence Intervals
[Figure: bowed confidence intervals]

Testing Other Hypotheses
• Is the correlation between X and Y the same in one population as in another?
• Is the slope for predicting Y from X the same in one population as in another?
• Is the intercept for predicting Y from X the same in one population as in another?
• Populations can differ on r but not on slope, or on slope but not on r.
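The two confidence-limit formulas can be applied to the five-subject beer and burger data to show why the intervals bow outward away from mean X. This is a sketch under stated assumptions: the function name `confidence_limits` is hypothetical, and the critical value 3.182 (two-tailed .05, df = 3) is taken from a standard t table, not from the slides.

```python
import math

# Beer and burger data from the slides: X = burgers, Y = beers.
x = [5, 4, 3, 2, 1]
y = [8, 10, 4, 6, 2]

# Assumption: two-tailed .05 critical value of t for df = n - 2 = 3,
# from a standard t table (not given in the slides).
T_CRIT = 3.182


def confidence_limits(x_new, x, y, t_crit, individual=False):
    """Confidence limits for predicted Y at x_new.

    individual=False: limits on the mean of Y given X.
    individual=True:  limits on an individual Y given X.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    sscp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sscp / ss_x
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_est = math.sqrt(sse / (n - 2))  # standard error of estimate

    y_hat = a + b * x_new
    extra = 1.0 if individual else 0.0
    half_width = t_crit * s_est * math.sqrt(
        extra + 1.0 / n + (x_new - mean_x) ** 2 / ss_x
    )
    return y_hat - half_width, y_hat + half_width


mean_at_3 = confidence_limits(3, x, y, T_CRIT)
mean_at_5 = confidence_limits(5, x, y, T_CRIT)
indiv_at_3 = confidence_limits(3, x, y, T_CRIT, individual=True)

print("mean Y|X=3:", mean_at_3)
print("mean Y|X=5:", mean_at_5)
print("individual Y|X=3:", indiv_at_3)
```

The interval for mean Y is narrowest at X = 3 (mean X) and wider at X = 5, because the $(X - \bar{X})^2 / SS_x$ term grows with distance from the mean. The interval for an individual Y is wider still at every X because of the added 1 under the square root.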
