ELEMENTARY STATISTICS, EIGHTH EDITION
MARIO F. TRIOLA
Chapter 9 Correlation and Regression

Chapter 9. Section 9-1 and 9-2. Triola, Elementary Statistics, Eighth Edition. Copyright 2001. Addison Wesley Longman

Chapter 9 Correlation and Regression
9-1 Overview
9-2 Correlation
9-3 Regression
9-4 Variation and Prediction Intervals
9-5 Multiple Regression
9-6 Modeling

9-1 Overview
Paired Data
Is there a relationship?
If so, what is the equation?
Use the equation for prediction.

9-2 Correlation

Definition
Correlation exists between two variables when one of them is related to the other in some way.

Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.

Definition
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

[Figure: Scatter Diagram of Paired Data]

Positive Linear Correlation
[Figure 9-1, Scatter Plots: (a) Positive, (b) Strong positive, (c) Perfect positive; horizontal axis x, vertical axis y]
Negative Linear Correlation
[Figure 9-1, Scatter Plots: (d) Negative, (e) Strong negative, (f) Perfect negative; horizontal axis x, vertical axis y]

No Linear Correlation
[Figure 9-1, Scatter Plots: (g) No correlation, (h) Nonlinear correlation; horizontal axis x, vertical axis y]

Definition
The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.

Formula 9-1:
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\;\sqrt{n(\sum y^2) - (\sum y)^2}} $$

Calculators can compute r.
ρ (rho) is the linear correlation coefficient for all paired data in the population.

Notation for the Linear Correlation Coefficient
n = number of pairs of data presented.
Σ denotes the addition of the items indicated.
Σx denotes the sum of all x values.
Σx² indicates that each x score should be squared and then those squares added.
(Σx)² indicates that the x scores should be added and the total then squared.
Σxy indicates that each x score should first be multiplied by its corresponding y score. After obtaining all such products, find their sum.
r represents the linear correlation coefficient for a sample.
ρ represents the linear correlation coefficient for a population.

Rounding the Linear Correlation Coefficient r
Round r to three decimal places so that it can be compared to the critical values in Table A-6.
Use a calculator or computer if possible.

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.
Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.

TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

n      α = .05    α = .01
4      .950       .999
5      .878       .959
6      .811       .917
7      .754       .875
8      .707       .834
9      .666       .798
10     .632       .765
11     .602       .735
12     .576       .708
13     .553       .684
14     .532       .661
15     .514       .641
16     .497       .623
17     .482       .606
18     .468       .590
19     .456       .575
20     .444       .561
25     .396       .505
30     .361       .463
35     .335       .430
40     .312       .402
45     .294       .378
50     .279       .361
60     .254       .330
70     .236       .305
80     .220       .286
90     .207       .269
100    .196       .256

Properties of the Linear Correlation Coefficient r
1. −1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.

Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

[Figure 9-2: Scatterplot of Distance above Ground and Time for Object Thrown Upward; horizontal axis Time (seconds), 0 to 8; vertical axis Distance (feet), 0 to 250]

Formal Hypothesis Test
Purpose: to determine whether there is a significant linear correlation between two variables.
There are two methods. Both methods let
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)

Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
Critical values: use Table A-3 with degrees of freedom = n − 2.
[Figure 9-4]

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: refer to Table A-6 (no degrees of freedom).
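The two methods above can be sketched in Python as a rough illustration. The function names, the sample values r = 0.600 and n = 10, and the hard-coded t critical value 2.306 (two-tailed, α = 0.05, 8 degrees of freedom, as read from a standard t table) are assumptions made for this sketch and do not come from the slides. The r critical values are entries copied from Table A-6.

```python
import math

# A few Table A-6 critical values of r for alpha = 0.05 (subset only).
TABLE_A6_ALPHA_05 = {6: 0.811, 8: 0.707, 10: 0.632}


def t_statistic(r: float, n: int) -> float:
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r**2) / (n - 2))


def reject_by_t(r: float, n: int, t_critical: float) -> bool:
    """Method 1: reject H0 (rho = 0) when |t| exceeds the critical t value."""
    return abs(t_statistic(r, n)) > t_critical


def reject_by_r(r: float, n: int) -> bool:
    """Method 2: reject H0 (rho = 0) when |r| exceeds the Table A-6 value."""
    return abs(r) > TABLE_A6_ALPHA_05[n]


# Hypothetical sample result, chosen only to exercise both methods.
r, n = 0.600, 10
print(round(t_statistic(r, n), 3))   # t is about 2.121
print(reject_by_t(r, n, 2.306))      # False: 2.121 does not exceed 2.306
print(reject_by_r(r, n))             # False: 0.600 does not exceed 0.632
```

Both methods reach the same decision here, which is expected because they test the same hypothesis with different forms of the test statistic.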
[Figure 9-5, Method 2: scale of r from −1 to 1 with critical values r = −0.811 and r = 0.811. Reject ρ = 0 in either tail; fail to reject ρ = 0 between the critical values. Sample data: r = 0.828]

FIGURE 9-3 Testing for a Linear Correlation
Start
1. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method.
   METHOD 1: The test statistic is
   $$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
   Critical values of t are from Table A-3 with n − 2 degrees of freedom.
   METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical value, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.

Is there a significant linear correlation?
Data from the Garbage Project

x Plastic (lb)      0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y Household size    2      3      3      6      4      2      1      5

n = 8
α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842.
Critical values are r = −0.707 and r = 0.707 (Table A-6 with n = 8 and α = 0.05).

[Figure: scale of r from −1 to 1 with critical values r = −0.707 and r = 0.707. Reject ρ = 0 in either tail; fail to reject ρ = 0 between the critical values. Sample data: r = 0.842]

Because 0.842 > 0.707, the test statistic falls within the critical region.
Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that there is a significant linear correlation between the weights of discarded plastic and household size.

Justification for r Formula
Formula 9-1 is developed from
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$
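As a quick check, the deviation form of r and Formula 9-1 can be compared on the Garbage Project data from the example above. This is a minimal sketch, not the textbook's procedure: the function and variable names are chosen here for illustration, and s_x and s_y are taken to be the sample standard deviations (divisor n − 1), computed with Python's `statistics.stdev`.

```python
import math
import statistics

# Garbage Project data from the example: plastic (lb) and household size.
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]   # x
household = [2, 3, 3, 6, 4, 2, 1, 5]                          # y


def r_formula_9_1(x, y):
    """Formula 9-1: the computational form built from sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x**2)
                   * math.sqrt(n * sum_y2 - sum_y**2))
    return numerator / denominator


def r_from_deviations(x, y):
    """Deviation form: sum of (x - x_bar)(y - y_bar) over (n - 1) s_x s_y."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    products = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return products / ((n - 1) * s_x * s_y)


print(round(r_formula_9_1(plastic, household), 3))      # 0.842
print(round(r_from_deviations(plastic, household), 3))  # 0.842
```

Both forms give r = 0.842 after rounding to three decimal places, matching the test statistic reported in the example.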
(x̄, ȳ) is the centroid of the sample points.

[FIGURE 9-6: scatterplot of sample points with the centroid (x̄, ȳ) at x̄ = 3 and ȳ = 11; lines through the centroid divide the plot into Quadrant 1, Quadrant 2, Quadrant 3, and Quadrant 4. For the labeled point (7, 23): x − x̄ = 7 − 3 = 4 and y − ȳ = 23 − 11 = 12. Horizontal axis x, 0 to 7; vertical axis y, 0 to 24]
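As a worked illustration of why the products (x − x̄)(y − ȳ) carry the direction of the relationship, the labeled point (7, 23) in Figure 9-6 can be evaluated directly. The sign summary by quadrant is a supplement added here, not the slide's own wording, and it assumes the usual numbering with Quadrant 1 at the upper right of the centroid and the others following counterclockwise.

```latex
\[
(x - \bar{x})(y - \bar{y}) = (7 - 3)(23 - 11) = 4 \cdot 12 = 48 > 0
\]
\[
(x - \bar{x})(y - \bar{y})
\begin{cases}
> 0 & \text{in Quadrants 1 and 3 (both deviations have the same sign)} \\
< 0 & \text{in Quadrants 2 and 4 (the deviations have opposite signs)}
\end{cases}
\]
```

Points clustered in Quadrants 1 and 3 therefore push the sum, and hence r, in the positive direction, while points clustered in Quadrants 2 and 4 push it in the negative direction.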