Chapter Thirteen

13-1 Linear Regression and Correlation
GOALS
When you have completed this chapter, you will be able to:
ONE Draw a scatter diagram.
TWO Understand and interpret the terms dependent variable and independent variable.
THREE Calculate the least squares regression line and interpret the slope and intercept values.
FOUR Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.

13-2 GOALS (continued)
FIVE Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.

13-3 Relationship between two variables
Relationship between family income and expenditure.
Relationship between advertising spending and total sales volume.
Relationship between car speed and mileage.

13-4 Regression Analysis
Regression analysis is the mathematical model by which we determine the relationship between dependent and independent variables.

13-5 Example: A corporation owns several companies. The strategic planner for the corporation believes dollars spent on advertising can to some extent be a predictor of total sales dollars. As an aid in long-term planning, she gathers the following sales and advertising information from several of the companies for 2015 (in $ millions).

ADVERTISING   SALES
12.5          148
 3.7           55
21.6          338
60.0          994
37.6          541
 6.1           89
16.8          126
41.2          379

Develop the equation of the simple regression line to predict sales from advertising expenditures using these data.

13-6 Scatter Diagram

13-8 Regression Equation
\[ \hat{y} = a + bx \]
where
\[ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x} \]

13-9 Example: An economist is interested in the relationship between the disposable income of a family and the amount of money spent annually on food.
For a preliminary study, the economist takes a random sample of eight middle-income families of the same size (father, mother, two children). The results are as follows, where x denotes disposable income, in thousands of dollars, and y denotes food expenditure, in hundreds of dollars.

x    y
30   55
36   60
27   42
20   40
16   37
24   26
19   39
25   43

13-10
a) Identify the predictor and response variables.
b) Graph the regression equation and the data points.
c) Determine the regression equation for the data. Describe the apparent relationship between disposable income and annual food expenditure.
d) What does the slope of the regression line represent in terms of disposable income and annual food expenditure?
e) Use the regression equation to predict the annual food expenditure of a family with a disposable income of $33,000.

13-12
x     y     xy      x^2
30    55    1,650     900
36    60    2,160   1,296
27    42    1,134     729
20    40      800     400
16    37      592     256
24    26      624     576
19    39      741     361
25    43    1,075     625

Totals: \( \sum x = 197 \), \( \sum y = 342 \), \( \sum xy = 8{,}776 \), \( \sum x^2 = 5{,}143 \)

b = 1.2137 and a = 12.8625, so the regression equation is \( \hat{y} = 12.8625 + 1.2137x \). At x = 33, \( \hat{y} = 12.8625 + 1.2137(33) = 52.9146 \) hundreds of dollars, a predicted annual food expenditure of about $5,291.46.

13-13 Standard Error of Estimate
How well does the regression line fit the data?
\[ S_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} \]

13-14 Computing Standard Error: Method 1
x     y     y-hat      y - y-hat    (y - y-hat)^2
30    55    49.2735      5.7265       32.7928
36    60    56.5557      3.4443       11.8632
27    42    45.6324     -3.6324       13.1943
20    40    37.1365      2.8635        8.1996
16    37    32.2817      4.7183       22.2623
24    26    41.9913    -15.9913      255.7217
19    39    35.9228      3.0772        9.4691
25    43    43.2050     -0.2050        0.0420
Totals                   0.0011      353.5450

\[ S_e = \sqrt{\frac{353.5450}{8 - 2}} = 7.6762 \]

13-15 How is the standard error used?
If a residual (the error term \( y - \hat{y} \)) does not lie within \( \pm 2S_e \) or \( \pm 3S_e \), the corresponding data point can be regarded as an outlier. Here \( 2S_e = 15.3524 \), so the residual of -15.9913 for the family with x = 24 and y = 26 falls outside the \( \pm 2S_e \) band, although it is still inside \( \pm 3S_e = \pm 23.0286 \).
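The least squares and standard error calculations for the income and food-expenditure example can be checked with a short script. This is only a minimal sketch in plain Python: the function names `fit_line` and `standard_error` are illustrative and do not come from the slides, and the cutoff of two standard errors is just one of the two thresholds mentioned above.

```python
import math

# Data from the income / food-expenditure example.
# x = disposable income (thousands of dollars)
# y = annual food expenditure (hundreds of dollars)
x = [30, 36, 27, 20, 16, 24, 19, 25]
y = [55, 60, 42, 40, 37, 26, 39, 43]


def fit_line(x, y):
    """Return intercept a and slope b of the least squares line y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


def standard_error(x, y, a, b):
    """Return (Se, residuals), with Se = sqrt(sum of squared residuals / (n - 2))."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (len(x) - 2)), residuals


a, b = fit_line(x, y)
se, residuals = standard_error(x, y, a, b)

# Predicted food expenditure (hundreds of dollars) for an income of $33,000.
y_hat_33 = a + b * 33

# Points whose residual lies outside plus or minus 2*Se (illustrative cutoff).
flagged = [(xi, yi) for xi, yi, e in zip(x, y, residuals) if abs(e) > 2 * se]

print(f"a = {a:.4f}, b = {b:.4f}, Se = {se:.4f}")
print(f"prediction at x = 33: {y_hat_33:.4f}")
print("possible outliers:", flagged)
```

Run as written, the script should agree with the slide values to four decimal places for a, b and Se. The prediction differs from the slide figure only in the last digits, because the script does not round a and b before using them.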
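13-16 A simpler formula to calculate Se
\[ S_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} \]

13-17 Computing Standard Error: Method 2
x      y      xy      y^2
30     55     1,650    3,025
36     60     2,160    3,600
27     42     1,134    1,764
20     40       800    1,600
16     37       592    1,369
24     26       624      676
19     39       741    1,521
25     43     1,075    1,849
197    342    8,776   15,404   (totals)

\[ S_e = \sqrt{\frac{15404 - 12.8625(342) - 1.2137(8776)}{8 - 2}} = \sqrt{\frac{353.5938}{6}} = 7.6767 \]
This differs slightly from the Method 1 value of 7.6762 because a and b are rounded to four decimal places.

13-18 Hypothesis test for the slope of regression line
\[ t = \frac{b}{S_e / \sqrt{S_{xx}}}, \qquad \text{where } S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \]
Example: Income and expenditure. At the 5% significance level, do the data provide sufficient evidence to conclude that income is useful as a predictor of expenditure?

13-19 Hypothesis Test for \( \beta \)
\( H_0: \beta = 0 \), \( H_a: \beta \neq 0 \)
Critical value: \( t_{0.025,\,6} = 2.447 \)
\[ t = \frac{1.2137}{7.6767 \Big/ \sqrt{5143 - \dfrac{197 \times 197}{8}}} = \frac{1.2137}{7.6767 / \sqrt{5143 - 4851.125}} = \frac{1.2137}{7.6767 / 17.0843} = \frac{1.2137}{0.4493} = 2.7013 \]
Since 2.7013 > 2.447, reject \( H_0 \): at the 5% significance level the data provide sufficient evidence that income is useful as a predictor of expenditure.

13-20 Coefficient of Determination
This measures the proportion of the variation in the observed values of the dependent variable that is explained by the regression, usually quoted as a percentage.

13-21 Coefficient of Determination
\[ r^2 = 1 - \frac{SSE}{SST} \]
\[ SST = \sum y^2 - \frac{(\sum y)^2}{n} \]
\[ SSE = \sum (y - \hat{y})^2 = SST - \frac{\left( \sum xy - \dfrac{\sum x \sum y}{n} \right)^2}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \]

13-22 Coefficient of determination
\[ SST = 15404 - \frac{342^2}{8} = 15404 - 14620.5 = 783.5 \]
\[ SSE = 783.5 - \frac{\left( 8776 - \dfrac{197 \times 342}{8} \right)^2}{5143 - \dfrac{197 \times 197}{8}} = 783.5 - \frac{(8776 - 8421.75)^2}{5143 - 4851.125} = 783.5 - \frac{125493.06}{291.875} = 783.5 - 429.9548 = 353.5452 \]
\[ r^2 = 1 - \frac{353.5452}{783.5} = 1 - 0.4512 = 0.5488 \]

13-23 Correlation Coefficient
[Figure: scale of the correlation coefficient r from -1.0 to +1.0. r = -1.0 is perfect negative correlation, values between -1.0 and -0.50 are strong negative, values near -0.50 are moderate negative, values between -0.50 and 0 are weak negative, r = 0 is no correlation, values between 0 and 0.50 are weak positive, values near 0.50 are moderate positive, values between 0.50 and 1.0 are strong positive, and r = +1.0 is perfect positive correlation.]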
13-24 Perfect Positive Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, r = +1]

13-25 Perfect Negative Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, r = -1]

13-26 Strong Positive Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, 0.5 < r < 1.0]

13-27 Zero Correlation
[Figure: scatter plot of y against x, both axes from 0 to 10, r = 0]

13-28 Formula of correlation coefficient
\[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left( \sum x^2 - \dfrac{(\sum x)^2}{n} \right) \left( \sum y^2 - \dfrac{(\sum y)^2}{n} \right)}} \]
For the income and expenditure data:
\[ r = \frac{8776 - 8421.75}{\sqrt{(5143 - 4851.125)(15404 - 14620.5)}} = \frac{354.25}{\sqrt{291.875 \times 783.5}} = \frac{354.25}{478.21} = 0.7408 \]
Another formula for the coefficient of determination:
Coefficient of determination = (correlation coefficient)^2 = \( r^2 \), here \( 0.7408^2 = 0.5488 \).

13-30 Correlation Coefficient Test
Step 1: State the null hypothesis, \( H_0: \rho = 0 \). State the alternative hypothesis as \( H_a: \rho \neq 0 \), or \( H_a: \rho < 0 \), or \( H_a: \rho > 0 \).
Step 2: Decide on the significance level, \( \alpha \).
Step 3: Find the critical value with n - 2 degrees of freedom: \( \pm t_{\alpha/2} \) (two-tailed test), \( -t_{\alpha} \) (left-tailed test), or \( +t_{\alpha} \) (right-tailed test).
Step 4: Compute the value of the test statistic
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

13-31 Correlation Coefficient test
\( H_0: \rho = 0 \), \( H_a: \rho > 0 \)
Critical value with n - 2 = 6 degrees of freedom: \( t_{0.05,\,6} = 1.943 \)
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.7408}{\sqrt{\dfrac{1 - 0.5488}{6}}} = \frac{0.7408}{0.2742} \approx 2.70 \]
Since 2.70 > 1.943, reject \( H_0 \).

13-33 Self Test
Many studies have been done that indicate the maximum heart rate an individual can reach during intensive exercise decreases with age. A physician decided to do his own study and recorded the ages and peak heart rates of 10 randomly selected people. The results are shown in the following table, where x denotes age, in years, and y denotes peak heart rate.

x    30   38   41   38   29   39   46   41   42   24
y   186  183  171  177  191  177  175  176  171  196

13-34
Graph the regression equation and the data points.
Determine the regression equation for the data. Describe the apparent relationship between age and peak heart rate.
What does the slope of the regression line represent in terms of age and peak heart rate?
Use the regression equation to predict the peak heart rate of a 28-year-old person.
Identify the predictor and response variables.
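As a cross-check on the correlation formulas, the following sketch recomputes r, r squared and the two t statistics for the earlier income and food-expenditure data. It is a rough illustration in plain Python, not part of the original slides: the function name `correlation_summary` and the dictionary keys are assumptions made here, and the critical value 2.447 is simply copied from the slope test slide (two-tailed, 5% level, 6 degrees of freedom). In simple linear regression the t statistic for the slope and the t statistic for the correlation coefficient are algebraically the same quantity, which the script confirms numerically.

```python
import math


def correlation_summary(x, y):
    """Compute r, r^2 and the t statistics for one predictor and one response."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n            # sum x^2 - (sum x)^2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n            # SST
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

    r = s_xy / math.sqrt(s_xx * s_yy)                          # correlation coefficient
    sse = s_yy - s_xy ** 2 / s_xx                              # unexplained variation
    r_squared = 1 - sse / s_yy                                 # coefficient of determination

    # t statistic for H0: rho = 0
    t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))

    # t statistic for H0: beta = 0, from the slope and the standard error of estimate
    b = s_xy / s_xx
    se = math.sqrt(sse / (n - 2))
    t_slope = b / (se / math.sqrt(s_xx))

    return {"r": r, "r_squared": r_squared, "t_corr": t_corr, "t_slope": t_slope}


# Income (thousands of dollars) and food expenditure (hundreds of dollars).
income = [30, 36, 27, 20, 16, 24, 19, 25]
food = [55, 60, 42, 40, 37, 26, 39, 43]

result = correlation_summary(income, food)

# Two-tailed critical value at the 5% level with n - 2 = 6 degrees of freedom,
# taken from the slope test slide rather than computed here.
T_CRIT = 2.447
reject_h0 = abs(result["t_corr"]) > T_CRIT

for name, value in result.items():
    print(f"{name} = {value:.4f}")
print("reject H0:", reject_h0)
```

The same function can be called with the Self Test ages and peak heart rates in place of `income` and `food` to check a hand calculation.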