Chapter 11 Slides (PPT)

Chapter 11 Linear Regression and Correlation Linear Regression and Correlation • Explanatory and Response Variables are Numeric • Relationship between the mean of the response variable and the level of the explanatory variable assumed to be approximately linear (straight line) • Model: Y   0  1 x    ~ N (0,  ) • 1 > 0  Positive Association • 1 < 0  Negative Association • 1 = 0  No Association 2 e Least Squares Estimation of 0, 1  0  Mean response when x=0 (y-intercept)  1  Change in mean response when x increases by 1 unit (slope) • 0, 1 are unknown parameters (like m) • 0+1x  Mean response when explanatory variable takes on the value x • Goal: Choose values (estimates) that minimize the sum of squared errors (SSE) of observed values to the straight-line: ^ ^ ^ y   0 1 x n      SSE  i 1  yi  y i   i 1  yi    0   1 xi        n ^ 2 ^ ^ 2 Example - Pharmacodynamics of LSD • Response (y) - Math score (mean among 5 volunteers) • Predictor (x) - LSD tissue concentration (mean of 5 volunteers) • Raw Data and scatterplot of Score vs LSD concentration: 80 70 60 LSD Conc (x) 1.17 2.97 3.26 4.69 5.83 6.00 6.41 50 40 SCORE Score (y) 78.93 58.20 67.47 37.47 45.65 32.92 29.97 30 20 1 2 LSD_CONC Source: Wagner, et al (1968) 3 4 5 6 7 Least Squares Computations Parameter Estimates Summary Calculations     x  x y  y    y  y  S xx   x  x S xy S yy 2 2  2 SSE  S yy  S xy S xx  ^ 1  x  x y  y  S    S   x  x  xy 2 ^ xx ^  0  y  1 x 2    yi  y i   SSE  2 i 1  se   n2 n2 n ^ Example - Pharmacodynamics of LSD Score (y) 78.93 58.20 67.47 37.47 45.65 32.92 29.97 350.61 LSD Conc (x) 1.17 2.97 3.26 4.69 5.83 6.00 6.41 30.33 x-xbar -3.163 -1.363 -1.073 0.357 1.497 1.667 2.077 -0.001 y-ybar 28.843 8.113 17.383 -12.617 -4.437 -17.167 -20.117 0.001 Sxx 10.004569 1.857769 1.151329 0.127449 2.241009 2.778889 4.313929 22.474943 Sxy -91.230409 -11.058019 -18.651959 -4.504269 -6.642189 -28.617389 -41.783009 -202.487243 Syy 831.918649 65.820769 302.168689 159.188689 19.686969 294.705889 404.693689 2078.183343 (Column totals given in bottom row of table) 350.61 30.33 y  50.087 x  4.333 7 7 ^ ^ ^  202.4872 1   9.01  0  y   1 x  50.09  (9.01)( 4.33)  89.10 22.4749 ^ y  89.10  9.01x se2  50.72 SPSS Output and Plot of Equation Coefficientsa Model 1 Uns tandardized Coefficients B Std. Error 89.124 7.048 -9.009 1.503 (Cons tant) LSD_CONC Standardized Coefficients Beta -.937 t 12.646 -5.994 a. Dependent Variable: SCORE Math Score vs LSD Concentration (SPSS) 80.00  Linear Regression 70.00  60.00 score  50.00  40.00    30.00 1.00 2.00 score = 89.12 + -9.01 * lsd_conc R-Square4.00 = 0.88 5.00 3.00 6.00 lsd_conc Sig. .000 .002 Inference Concerning the Slope (1) • Parameter: Slope in the population model (1) ^ • Estimator: Least squares estimate:  1 • Estimated standard error: SE ^  se 1 S xx 1 • Methods of making inference regarding population: – Hypothesis tests (2-sided or 1-sided) – Confidence Intervals Hypothesis Test for 1 • 2-Sided Test – H0: 1 = 0 – HA: 1  0 • 1-sided Test – H0: 1 = 0 – HA+: 1 > 0 or – HA-: 1 < 0 ^ 1 T .S . : tobs  se 1 S xx 1 T .S . : tobs  se 1 S xx R.R. : | tobs |  t / 2,n  2 R.R. : tobs  t ,n 2 P  value : 2 P(t | tobs |) P  val  : P(t  tobs ) P  val  : P(t  tobs ) ^ R.R. : tobs   t ,n  2 (1-)100% Confidence Interval for 1 ^ ^  1  t / 2 SE   1  t / 2 se ^ 1 1 S xx • Conclude positive association if entire interval above 0 • Conclude negative association if entire interval below 0 • Cannot conclude an association if interval contains 0 • Conclusion based on interval is same as 2-sided hypothesis test Example - Pharmacodynamics of LSD ^ n  7  1  9.01 se  50.72  7.12 S xx  22.475 SE ^ 1 1  7.12  1.50 22.475 • Testing H0: 1 = 0 vs HA: 1  0  9.01 T .S . : tobs   6.01 1.50 R.R. :| tobs | t.025,5  2.571 • 95% Confidence Interval for 1 :  9.01  2.571(1.50)   9.01  3.86  (12.87,5.15) Confidence Interval for Mean When x=x* • Mean Response at a specific level x* is E ( y | x*)  m y   0  1 x * • Estimated Mean response and standard error (replacing unknown 0 and 1 with estimates): ^ ^ ^ m y   0 1 x* SE ^  se m  1 x * x  n S xx  2 • Confidence Interval for Mean Response: ^ m y  t / 2,n2SE ^ ^ m  m y  t / 2,n 2 se  1 x * x  n S xx  2 Prediction Interval of Future Response @ x=x* • Response at a specific level x* is y x*  m y     0  1 x *  • Estimated response and standard error (replacing unknown 0 and 1 with estimates): ^ ^ ^ y   0 1 x*  1 x * x SE ^  se 1   y n S xx  2 • Prediction Interval for Future Response: ^ y  t / 2,n 2SE ^ y  1 x * x  y  t / 2,n 2 se 1   n S xx ^  2 Correlation Coefficient • Measures the strength of the linear association between two variables • Takes on the same sign as the slope estimate from the linear regression • Not effected by linear transformations of y or x • Does not distinguish between dependent and independent variable (e.g. height and weight) • Population Parameter: ryx • Pearson’s Correlation Coefficient: ryx  S xy S xx S yy 1  r  1 Correlation Coefficient • Values close to 1 in absolute value  strong linear • • • • association, positive or negative from sign Values close to 0 imply little or no association If data contain outliers (are non-normal), Spearman’s coefficient of correlation can be computed based on the ranks of the x and y values Test of H0:ryx = 0 is equivalent to test of H0:1=0 Coefficient of Determination (ryx2) - Proportion of variation in y “explained” by the regression on x: r  (ryx )  2 yx 2 S yy  SSE S yy SS (Total )  SS (Residual )  SS (Total ) 0  r2 1 Example - Pharmacodynamics of LSD S xx  22.475 S xy  202.487 S yy  2078.183 SSE  253.89  202.487  0.94 (22.475)( 2078.183) ryx  2078.183  253.89 r   0.88  (0.94) 2 2078.183 2 yx Syy 80.00  Mean 70.00 80.00  60.00  Linear Regression 70.00  score score SSE 50.00 Mean = 50.09  40.00 60.00   50.00  40.00    1.00   30.00 30.00 2.00 3.00 4.00 lsd_conc 5.00 6.00 1.00  score = 89.12 + -9.01 * lsd_conc 2.00 R-Square 3.00 = 0.88 4.00 5.00 6.00 lsd_conc Example - SPSS Output Pearson’s and Spearman’s Measures l a O C S P 1 7 * S 2 . N 7 7 L P 7 1 * S 2 . N 7 7 * C l a O C S S C 0 9 * S 3 . N 7 7 L C 9 0 * S 3 . N 7 7 * C Hypothesis Test for ryx • 2-Sided Test – H0: ryx = 0 – HA: ryx  0 T .S . : tobs  ryx • 1-sided Test – H0: ryx = 0 – HA+: ryx > 0 or – HA-: ryx < 0 n2 1  ryx2 T .S . : tobs  ryx n2 1  ryx2 R.R. : | tobs |  t / 2,n  2 R.R. : tobs  t ,n  2 P  value : 2 P(t | tobs |) P  val  : P(t  tobs ) P  val  : P(t  tobs ) R.R. : tobs   t ,n  2 Analysis of Variance in Regression • Goal: Partition the total variation in y into variation “explained” by x and random variation ^ ^ ( yi  y )  ( yi  y i )  ( y i  y ) 2 ^ ^  ( y  y)   ( y  y )   ( y  y) 2 i i i 2 i • These three sums of squares and degrees of freedom are: •Total (TSS) DFT = n-1 • Error (SSE) DFE = n-2 • Model (SSR) DFR = 1 Analysis of Variance for Regression Source of Variation Model Error Total Sum of Squares SSR SSE TSS Degrees of Freedom 1 n-2 n-1 • Analysis of Variance - F-test • H0: 1 = 0 T .S . : Fobs R.R. : Fobs HA: 1  0 MSR  MSE  F ,1, n  2 P  value : P ( F  Fobs ) Mean Square MSR = SSR/1 MSE = SSE/(n-2) F F = MSR/MSE Example - Pharmacodynamics of LSD • Total Sum of squares: TSS   ( yi  y) 2  2078.183 DFT  7  1  6 • Error Sum of squares: ^ SSE   ( yi  y i ) 2  253.890 DFE  7  2  5 • Model Sum of Squares: ^ SSR   ( y i  y) 2  2078.183  253.890  1824.293 DFR  1 Example - Pharmacodynamics of LSD Source of Variation Model Error Total Sum of Squares 1824.293 253.890 2078.183 Degrees of Freedom 1 5 6 Mean Square 1824.293 50.778 •Analysis of Variance - F-test • H0: 1 = 0 T .S . : Fobs R.R. : Fobs HA: 1  0 MSR   35.93 MSE  F.05,1, 5  6.61 P  val : P ( F  35.93) F 35.93 Example - SPSS Output b O m d F S M i a g f a 1 R 2 1 2 8 2 R 1 5 6 T 3 6 a P b D Linearity of Regression (SLR) F -Test for Lack-of-Fit (n j observations at c distinct levels of "X") H 0 : E Yi    0  1 X i H A : E Yi   mi   0  1 X i Compute fitted value Y j and sample mean Y j for each distinct X level c nj  Lack-of-Fit: SS  LF     Y j  Y j j 1 i 1 c nj  Pure Error: SS  PE    Yij  Y j j 1 i 1   2 2 df LF  c  2 df PE  n  c SS ( LF )  c  2   MS ( LF )    ~ MS ( PE ) SS ( PE ) n  c     H0 Test Statistic: FLOF Reject H 0 if FLOF  F 1   ; c  2, n  c  Fc  2,n c

Chapter 11 Slides (PPT)

Related documents

Products

Support

Chapter 11 Slides (PPT)

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib