Chapter 9
Correlation and Regression

Chapter 9, Sections 9-1 and 9-2. Triola, Essentials of Statistics, Second Edition. Copyright 2004 Pearson/Addison Wesley.

Chapter 9 Correlation and Regression
  9-1 Overview
  9-2 Correlation
  9-3 Regression

Overview
Paired data:
  Is there a relationship?
  If so, what is the equation?
  Use the equation for prediction.

Example: Lengths and Weights of Male Bears

  x Length (in.)   53.0   67.5   72.0   72.0   73.5   68.5   73.0   37.0
  y Weight (lb)      80    344    416    348    262    360    332     34

9-2 Correlation

Definition
Correlation exists between two variables when one of them is related to the other in some way.

Definition
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Example: Lengths and Weights of Male Bears
(x, y) = (Length, Weight): (53.0, 80), (67.5, 344), (72.0, 416), etc.

Scatter Diagram of Paired Data
[Figure: Lengths and Weights of Male Bears. Horizontal axis: Length (in.), 35 to 75. Vertical axis: Weight (lb), 0 to 500. Plotted points: (37, 34), (53, 80), (67.5, 344), (68.5, 360), (72, 348), (72, 416), (73, 332), (73.5, 262).]

Positive Linear Correlation
[Figure 9-2, scatter plots: (a) Positive, (b) Strong positive, (c) Perfect positive]

Negative Linear Correlation
[Figure 9-2, scatter plots: (d) Negative, (e) Strong negative, (f) Perfect negative]

No Linear Correlation
[Figure 9-2, scatter plots: (g) No correlation, (h) Nonlinear correlation]

Definition
The linear correlation coefficient r measures the strength of the linear relationship between paired x- and y-quantitative values in a sample.

Definition
The linear correlation coefficient r is sometimes referred to as the Pearson product moment correlation coefficient.

Assumptions
1. The sample of paired (x, y) data is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.

Notation for the Linear Correlation Coefficient
  n       number of pairs of data presented.
  Σ       denotes the addition of the items indicated.
  Σx      denotes the sum of all x values.
  Σx²     indicates that each x score should be squared and then those squares added.
  (Σx)²   indicates that the x scores should be added and the total then squared.
  Σxy     indicates that each x score should first be multiplied by its corresponding y score. After obtaining all such products, find their sum.
  r       represents the linear correlation coefficient for a sample.
  ρ       represents the linear correlation coefficient for a population.

Definition
Linear Correlation Coefficient r (Formula 9-1):

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$

Calculators can compute r.
ρ (rho) is the linear correlation coefficient for all paired data in the population.

Rounding the Linear Correlation Coefficient r
Round r to three decimal places so that it can be compared to critical values in Table A-5.
Use a calculator or computer if possible.
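
As a rough illustration of Formula 9-1, the sketch below computes r for the bear lengths and weights from the running example, using only the sums defined in the notation. The function name linear_correlation and the variable names are illustrative choices, not part of the slides.

```python
import math

def linear_correlation(xs, ys):
    """Linear correlation coefficient r computed with Formula 9-1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)              # each x squared, then added
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # products of paired values, then added
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Lengths (in.) and weights (lb) of the eight male bears in the example.
lengths = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]
weights = [80, 344, 416, 348, 262, 360, 332, 34]

r_bears = linear_correlation(lengths, weights)
print(round(r_bears, 3))  # 0.897, rounded to three decimal places
```

Rounding to three decimal places at the end, rather than at intermediate steps, keeps the result comparable with the critical values in Table A-5.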

Example: Lengths and Weights of Male Bears

  x Length (in.)   53.0   67.5   72.0   72.0   73.5   68.5   73.0   37.0
  y Weight (lb)      80    344    416    348    262    360    332     34

r = 0.897 using your calculator

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-5, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

TABLE A-5 Critical Values of the Pearson Correlation Coefficient r

    n    α = .05   α = .01
    4     .950      .999
    5     .878      .959
    6     .811      .917
    7     .754      .875
    8     .707      .834
    9     .666      .798
   10     .632      .765
   11     .602      .735
   12     .576      .708
   13     .553      .684
   14     .532      .661
   15     .514      .641
   16     .497      .623
   17     .482      .606
   18     .468      .590
   19     .456      .575
   20     .444      .561
   25     .396      .505
   30     .361      .463
   35     .335      .430
   40     .312      .402
   45     .294      .378
   50     .279      .361
   60     .254      .330
   70     .236      .305
   80     .220      .286
   90     .207      .269
  100     .196      .256

Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.

Common Errors Involving Correlation
1. Causation: It is incorrect to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.

Common Errors Involving Correlation
[Figure: Scatterplot of Distance above Ground and Time for Object Thrown Upward. Horizontal axis: Time (seconds), 0 to 8. Vertical axis: Distance (feet), 0 to 250.]

Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables.
Two methods are available. Both methods let
  H0: ρ = 0 (no significant linear correlation)
  H1: ρ ≠ 0 (significant linear correlation)

Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values: use Table A-3 with degrees of freedom = n - 2

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: refer to Table A-5 (no degrees of freedom)
[Figure: r scale from -1 to 1. Reject ρ = 0 to the left of r = -0.811 and to the right of r = 0.811; fail to reject ρ = 0 between them. Sample data: r = 0.828.]
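
Method 1 can be sketched in a few lines. The example below plugs the bear-data result (r = 0.897, n = 8) into the t test statistic. The critical value 2.447 is the standard two-tailed Student t value for α = 0.05 with 6 degrees of freedom; it is assumed here to match the entry in Table A-3, which is not reproduced in these slides. The names t_statistic and T_CRITICAL_DF6 are illustrative.

```python
import math

def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Bear example: r = 0.897 from n = 8 pairs, so degrees of freedom = n - 2 = 6.
r_sample = 0.897
n_pairs = 8
t_value = t_statistic(r_sample, n_pairs)

# Assumption: standard two-tailed critical t for alpha = 0.05 and df = 6,
# taken from a general t table rather than from the slides.
T_CRITICAL_DF6 = 2.447

reject_h0 = abs(t_value) > T_CRITICAL_DF6
print(round(t_value, 2), reject_h0)
```

Because the alternative hypothesis is ρ ≠ 0, the comparison uses the absolute value of t, so a strongly negative r leads to rejection as well.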

Testing for a Linear Correlation
[Figure; surviving label: Table A-5]

Is there a significant linear correlation?
Data from the Garbage Project

  x Plastic (lb)       0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
  y Household size        2      3      3      6      4      2      1      5

n = 8
α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Critical values are r = -0.707 and 0.707 (Table A-5 with n = 8 and α = 0.05)

Is there a significant linear correlation?
0.842 > 0.707
The test statistic does fall within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: r scale from -1 to 1. Reject ρ = 0 to the left of r = -0.707 and to the right of r = 0.707; fail to reject ρ = 0 between them. Sample data: r = 0.842.]
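
The Garbage Project test can be reproduced with Method 2 as a minimal sketch. It assumes the plastic weights and household sizes pair up in the order listed (0.27 with 2, 1.41 with 3, and so on), and it copies only the first few rows of the α = .05 column of Table A-5. The names CRITICAL_R_05 and linear_correlation are illustrative.

```python
import math

# First rows of Table A-5, alpha = .05 column, keyed by the number of pairs n.
CRITICAL_R_05 = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
                 8: 0.707, 9: 0.666, 10: 0.632}

def linear_correlation(xs, ys):
    """Linear correlation coefficient r computed with Formula 9-1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Assumed pairing: values taken in the order they are listed.
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # lb
household = [2, 3, 3, 6, 4, 2, 1, 5]                        # household size

r_garbage = linear_correlation(plastic, household)
critical = CRITICAL_R_05[len(plastic)]   # n = 8 gives 0.707
significant = abs(r_garbage) > critical  # True means reject H0: rho = 0

print(round(r_garbage, 3), critical, significant)  # 0.842 0.707 True
```

With this pairing the computed r agrees with the test statistic r = 0.842 quoted above, and the comparison against 0.707 leads to the same decision to reject H0.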

Justification for r Formula
Formula 9-1 is developed from

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} $$

where (x̄, ȳ) is the centroid of the sample points.
[Figure: sample points plotted with the centroid (x̄, ȳ) = (3, 11), with lines x̄ = 3 and ȳ = 11 dividing the plane into Quadrants 1 to 4. For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.]
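
One way to see that the deviation form and Formula 9-1 agree is to evaluate both on the same data. The sketch below does this for the bear lengths and weights, using the sample standard deviation (divisor n - 1) from Python's statistics module for s_x and s_y. The function names r_from_deviations and r_from_sums are illustrative.

```python
import math
import statistics

def r_from_deviations(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    s_y = statistics.stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

def r_from_sums(xs, ys):
    """Formula 9-1, written with the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

lengths = [53.0, 67.5, 72.0, 72.0, 73.5, 68.5, 73.0, 37.0]
weights = [80, 344, 416, 348, 262, 360, 332, 34]

r_dev = r_from_deviations(lengths, weights)
r_sum = r_from_sums(lengths, weights)
print(round(r_dev, 3), round(r_sum, 3), math.isclose(r_dev, r_sum))
```

In the deviation form, a point such as (7, 23) with centroid (3, 11) contributes the product 4 × 12 = 48, so points lying above and to the right of the centroid (or below and to the left) push r toward positive values.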