Chapter 12 Simple Regression & Correlation Analysis
© 2002 Thomson / South-Western

Learning Objectives
• Compute the equation of a simple regression line from a sample of data, and interpret the slope and intercept of the equation.
• Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
• Compute a standard error of the estimate and interpret its meaning.
• Compute a coefficient of determination and interpret it.
• Test hypotheses about the slope of the regression model and interpret the results.
• Estimate values of Y using the regression model.
• Compute a coefficient of correlation and interpret it.

Correlation and Regression
• Correlation is a measure of the degree of relatedness of two variables.
• Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable from another variable.
Simple Regression Analysis
• Bivariate (two-variable) linear regression is the most elementary regression model.
  – The dependent variable, the variable to be predicted, is usually called Y.
  – The independent variable, the predictor or explanatory variable, is usually called X.

Airline Cost Data
X (number of passengers)   Y (cost, $1,000)
61                         4.28
63                         4.08
67                         4.42
69                         4.17
70                         4.48
74                         4.30
76                         4.82
81                         4.70
86                         5.11
91                         5.13
95                         5.64
97                         5.56

Scatter Plot of Airline Cost Data
[Figure: scatter plot of cost ($1,000) against number of passengers]

Regression Models
• Deterministic regression model: $Y = \beta_0 + \beta_1 X$
• Probabilistic regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
• $\beta_0$ and $\beta_1$ are population parameters; they are estimated by the sample statistics $b_0$ and $b_1$.

Equation of the Simple Regression Line
$$\hat{Y} = b_0 + b_1 X$$
where $b_0$ = the sample intercept, $b_1$ = the sample slope, and $\hat{Y}$ = the predicted value of Y.

Slope and Y Intercept of the Regression Line
$$b_1 = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
$$b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}$$

Least Squares Analysis
$$SS_{XY} = \sum (X-\bar{X})(Y-\bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS_{XX} = \sum (X-\bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}}, \qquad b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}$$

Airline Cost Example: Solving for Slope and Y Intercept of the Regression Line (Part 1)
X     Y      X²       XY
61    4.28   3,721    261.08
63    4.08   3,969    257.04
67    4.42   4,489    296.14
69    4.17   4,761    287.73
70    4.48   4,900    313.60
74    4.30   5,476    318.20
76    4.82   5,776    366.32
81    4.70   6,561    380.70
86    5.11   7,396    439.46
91    5.13   8,281    466.83
95    5.64   9,025    535.80
97    5.56   9,409    539.32
ΣX = 930   ΣY = 56.69   ΣX² = 73,764   ΣXY = 4,462.22
(X = number of passengers, Y = cost in $1,000)

Airline Cost Example: Solving for Slope and Y Intercept of the Regression Line (Part 2)
$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1{,}689$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1{,}689} = .0407$$
$$b_0 = \frac{\sum Y}{n} - b_1\frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$
$$\hat{Y} = 1.57 + .0407X$$

Graph of Regression Line for the Airline Cost Example
[Figure: regression line drawn through the scatter plot of cost ($1,000) against number of passengers]

Residual Analysis
• A residual is the difference between the actual Y value and the Y value predicted by the regression model, $Y - \hat{Y}$.
• It is the error of the regression model in predicting each value of the dependent variable.

Airline Cost Example: Residual Analysis
X     Y      Ŷ        Y − Ŷ
61    4.28   4.053     .227
63    4.08   4.134    −.054
67    4.42   4.297     .123
69    4.17   4.378    −.208
70    4.48   4.419     .061
74    4.30   4.582    −.282
76    4.82   4.663     .157
81    4.70   4.867    −.167
86    5.11   5.070     .040
91    5.13   5.274    −.144
95    5.64   5.436     .204
97    5.56   5.518     .042
Σ(Y − Ŷ) = −.001

Airline Cost Example: Excel Graph of Residuals
[Figure: residuals plotted against number of passengers]

Nonlinear Residual Plot
[Figure: residuals plotted against X]

Nonconstant Error Variance
[Figure: two panels of residuals plotted against X]

Graphs of Nonindependent Error Terms
[Figure: two panels of residuals plotted against X]

Healthy Residual Plot
[Figure: residuals plotted against X]

Standard Error of the Estimate
Sum of squares error:
$$SSE = \sum (Y-\hat{Y})^2 = \sum Y^2 - b_0\sum Y - b_1\sum XY$$
Standard error of the estimate:
$$S_e = \sqrt{\frac{SSE}{n-2}}$$

Airline Cost Example: Determining SSE
X     Y      Y − Ŷ     (Y − Ŷ)²
61    4.28    .227     .05153
63    4.08   −.054     .00292
67    4.42    .123     .01513
69    4.17   −.208     .04326
70    4.48    .061     .00372
74    4.30   −.282     .07952
76    4.82    .157     .02465
81    4.70   −.167     .02789
86    5.11    .040     .00160
91    5.13   −.144     .02074
95    5.64    .204     .04162
97    5.56    .042     .00176
Σ(Y − Ŷ) = −.001   Σ(Y − Ŷ)² = .31434
Sum of squares of error = SSE = .31434

Airline Cost Example: Standard Error of the Estimate
$$SSE = \sum (Y-\hat{Y})^2 = .31434$$
$$S_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{.31434}{10}} = .1773$$

Coefficient of Determination
• The proportion of variability of the dependent variable accounted for or explained by the independent variable in a regression model.
$$SS_{YY} = \sum (Y-\bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$
$$SS_{YY} = \text{explained variation} + \text{unexplained variation} = SSR + SSE$$
$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$
$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}}, \qquad 0 \le r^2 \le 1$$

Airline Cost Example: Coefficient of Determination
$$SSE = .31434$$
$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$
$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$
89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
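The least-squares computation for the airline example (SS_XY, SS_XX, b1 = SS_XY / SS_XX, b0 = ΣY/n − b1 ΣX/n) can be checked with a short Python sketch. It uses only the standard library; the helper name `fit_line` is hypothetical and not taken from the slides.

```python
# Least-squares fit of Y-hat = b0 + b1*X using the computational formulas
#   SSXY = sum(XY) - sum(X)*sum(Y)/n
#   SSXX = sum(X^2) - (sum(X))^2/n

# Airline cost data: X = number of passengers, Y = cost ($1,000)
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def fit_line(x, y):
    """Return (b0, b1, ss_xy, ss_xx) for the least-squares line of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b1 = ss_xy / ss_xx                # sample slope
    b0 = sum_y / n - b1 * sum_x / n   # sample intercept
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = fit_line(passengers, cost)
print(f"SSXY = {ss_xy:.3f}, SSXX = {ss_xx:.0f}")  # SSXY = 68.745, SSXX = 1689
print(f"Y-hat = {b0:.2f} + {b1:.4f}X")             # Y-hat = 1.57 + 0.0407X
```

Run on the airline data, it reproduces the hand calculation: SSXY = 68.745, SSXX = 1,689, and Ŷ = 1.57 + .0407X.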
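The residual, SSE, standard error and r² steps can be sketched the same way. This sketch assumes the rounded slide equation Ŷ = 1.57 + .0407X instead of refitting the line, and `residual_analysis` is a hypothetical helper name.

```python
import math

# Airline cost data: X = number of passengers, Y = cost ($1,000)
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
cost = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

# Fitted line as rounded on the slides (assumption: not refitted here)
B0, B1 = 1.57, 0.0407


def residual_analysis(x, y, b0, b1):
    """Return residuals, SSE, Se and r^2 for the line y-hat = b0 + b1*x."""
    n = len(x)
    predicted = [b0 + b1 * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, predicted)]  # Y - Y-hat
    sse = sum(e * e for e in residuals)                    # sum of squares of error
    se = math.sqrt(sse / (n - 2))                          # standard error of the estimate
    ss_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n     # total variation of Y
    r2 = 1 - sse / ss_yy                                   # coefficient of determination
    return residuals, sse, se, r2


residuals, sse, se, r2 = residual_analysis(passengers, cost, B0, B1)
for xi, e in zip(passengers, residuals):
    print(f"X = {xi:3d}  residual = {e:+.3f}")
print(f"sum of residuals = {sum(residuals):+.3f}")
print(f"SSE = {sse:.4f}, Se = {se:.4f}, r^2 = {r2:.3f}")
```

With unrounded residuals the sketch gives SSE ≈ .3141 and Se ≈ .1772, slightly below the slide values .31434 and .1773, because the slide squares residuals that were first rounded to three decimals. The coefficient of determination still comes out at .899, and the residuals still sum to −.001.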
Hypothesis Tests for the Slope of the Regression Model
• Two-tailed test: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$
• One-tailed tests: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 > 0$, or $H_0: \beta_1 = 0$ versus $H_1: \beta_1 < 0$
$$t = \frac{b_1 - \beta_1}{S_b}, \qquad df = n - 2$$
where
$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}, \qquad S_e = \sqrt{\frac{SSE}{n-2}}, \qquad SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
and $\beta_1$ is the hypothesized slope.

Airline Cost Example: Point Estimation
$$\hat{Y} = 1.57 + .0407X$$
For X = 73:
$$\hat{Y} = 1.57 + .0407(73) = 4.5411$$
or $4,541.10.

Airline Cost Example: Confidence Interval to Estimate the Conditional Mean of Y
$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0-\bar{X})^2}{SS_{XX}}}$$
where $X_0$ = a particular value of X and $SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$.
For $X_0 = 73$ and a 95% confidence level:
$$4.5411 \pm (2.228)(.1773)\sqrt{\frac{1}{12} + \frac{(73-77.5)^2}{73{,}764 - \frac{(930)^2}{12}}} = 4.5411 \pm .1220$$
$$4.4191 \le E(Y_{73}) \le 4.6631$$

Airline Cost Example: Confidence Interval to Estimate the Average Value of Y for Some Values of X
X     Confidence interval     Lower to upper
62    4.0934 ± .1876          3.9058 to 4.2810
68    4.3376 ± .1461          4.1915 to 4.4837
73    4.5411 ± .1220          4.4191 to 4.6631
85    5.0295 ± .1349          4.8946 to 5.1644
90    5.2330 ± .1656          5.0674 to 5.3986

Prediction Interval to Estimate Y for a Given Value of X
$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0-\bar{X})^2}{SS_{XX}}}$$
where $X_0$ = a particular value of X and $SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$.

Confidence Intervals for Estimation
[Figure: regression plot of cost against number of passengers, showing the regression line with 95% CI and 95% PI bands]

Pearson Product-Moment Correlation Coefficient
$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$
A correlation measure used to determine the degree of relatedness of two variables that are at least of interval level.
$$-1 \le r \le 1$$

Three Degrees of Correlation
[Figure: three scatter plots labeled r < 0, r > 0, and r = 0]

Economics Example: Computation of r (Part 1)
Day   Interest X   Futures Index Y   X²        Y²        XY
1     7.43         221               55.205    48,841    1,642.03
2     7.48         222               55.950    49,284    1,660.56
3     8.00         226               64.000    51,076    1,808.00
4     7.75         225               60.063    50,625    1,743.75
5     7.60         224               57.760    50,176    1,702.40
6     7.63         223               58.217    49,729    1,701.49
7     7.68         223               58.982    49,729    1,712.64
8     7.67         226               58.829    51,076    1,733.42
9     7.59         226               57.608    51,076    1,715.34
10    8.07         235               65.125    55,225    1,896.45
11    8.03         233               64.481    54,289    1,870.99
12    8.00         241               64.000    58,081    1,928.00
Summations   92.93   2,725   720.220   619,207   21,115.07

Economics Example: Computation of r (Part 2)
$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2{,}725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2{,}725)^2}{12}\right]}} = .815$$

Economics Example: Scatter Plot and Correlation Matrix
[Figure: scatter plot of futures index against interest]
Correlation matrix:
                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
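Returning to the airline example, the slope t test and the confidence and prediction interval formulas can be sketched in Python. This minimal sketch plugs in the slide values b0 = 1.57, b1 = .0407 and Se = .1773; the critical value 2.228 is t(.025, 10) as used on the slides, hard-coded rather than computed, and the helper names `slope_t_statistic` and `interval_half_widths` are hypothetical.

```python
import math

# Airline cost data: X = number of passengers (only X is needed here)
passengers = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]

# Slide values (assumption: taken as given, not recomputed)
B0, B1, SE = 1.57, 0.0407, 0.1773
T_CRIT = 2.228  # t_{.025, 10}: 95% level, df = n - 2 = 10


def ss_xx_of(x):
    """SSXX = sum(X^2) - (sum(X))^2 / n."""
    return sum(xi * xi for xi in x) - sum(x) ** 2 / len(x)


def slope_t_statistic(x, b1, se, beta1_hyp=0.0):
    """t = (b1 - beta1) / Sb with Sb = Se / sqrt(SSXX)."""
    s_b = se / math.sqrt(ss_xx_of(x))
    return (b1 - beta1_hyp) / s_b


def interval_half_widths(x, x0, se, t_crit):
    """Half-widths of the CI for E(Y | X0) and the PI for a single Y at X0."""
    n = len(x)
    x_bar = sum(x) / n
    core = 1 / n + (x0 - x_bar) ** 2 / ss_xx_of(x)
    ci = t_crit * se * math.sqrt(core)
    pi = t_crit * se * math.sqrt(1 + core)  # extra 1 for an individual Y
    return ci, pi


t = slope_t_statistic(passengers, B1, SE)
print(f"t = {t:.2f}; reject H0: beta1 = 0 at alpha = .05: {abs(t) > T_CRIT}")
for x0 in (62, 68, 73, 85, 90):
    y_hat = B0 + B1 * x0
    ci, pi = interval_half_widths(passengers, x0, SE, T_CRIT)
    print(f"X0 = {x0}: Y-hat = {y_hat:.4f}, CI +/- {ci:.4f}, PI +/- {pi:.4f}")
```

The confidence half-widths it prints agree with the interval table to four decimals (±.1876, ±.1461, ±.1220, ±.1349, ±.1656), and for H0: β1 = 0 it gives t ≈ 9.43, well beyond 2.228. Each prediction interval is wider than the matching confidence interval because of the extra 1 under the square root, which accounts for the scatter of an individual Y around its conditional mean.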
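The Pearson coefficient for the economics example can be computed with a short Python sketch. It uses the deviation form Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² Σ(Y − Ȳ)²), which is algebraically the same as the computational formula on the slide; `pearson_r` is a hypothetical helper name.

```python
import math

# Economics example: interest rate (X) and futures index (Y) over 12 days
interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures_index = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]


def pearson_r(x, y):
    """Pearson product-moment correlation, r = SSXY / sqrt(SSXX * SSYY)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(interest, futures_index)
print(f"r = {r:.3f}")  # r = 0.815
```

It returns r ≈ .815, in agreement with the correlation matrix value 0.815254. The deviation form avoids subtracting large, nearly equal sums such as 720.22 and (92.93)²/12 ≈ 719.67. On Python 3.10 and later, `statistics.correlation(x, y)` in the standard library computes the same Pearson coefficient.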