Correlation

• This chapter is on correlation
• We will look at patterns in data on a scatter graph
• We will look at how to calculate the variance and covariance of variables
• We will see how to numerically measure the strength of correlation between two variables

Scatter Graphs (6A)

Scatter graphs are a way of representing two sets of data. It is then possible to see whether they are related.

• Positive correlation: as one variable increases, so does the other
• Negative correlation: as one variable increases, the other decreases
• No correlation: there seems to be no pattern linking the two variables

In a study of a city, the population density, in people/hectare, and the distance from the city centre, in km, were investigated by choosing sample areas. The results are as follows:

Area                           A     B     C     D     E     F     G     H     I     J
Distance (km)                  0.6   3.8   2.4   3.0   2.0   1.5   1.8   3.4   4.0   0.9
Pop. density (people/hectare)  50    22    14    20    33    47    25    8     16    38

Plot a scatter graph and describe the correlation. Interpret what the correlation means.

[Scatter graph: population density (people/hectare), axis from 0 to 50, against distance from centre (km), axis from 0 to 4]

The correlation is negative, which means that as we get further from the city centre, the population density decreases.
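As a rough numerical companion to the scatter graph, the sketch below sorts the ten sample areas by distance and prints a crude text bar for each density, so the downward trend can be read off without plotting. The variable names (`areas`, `by_distance`) and the bar scale of one `#` per 2 people/hectare are illustrative choices, not part of the original exercise.

```python
# Sample areas from the city study:
# (area, distance from centre in km, population density in people/hectare)
areas = [
    ("A", 0.6, 50), ("B", 3.8, 22), ("C", 2.4, 14), ("D", 3.0, 20), ("E", 2.0, 33),
    ("F", 1.5, 47), ("G", 1.8, 25), ("H", 3.4, 8), ("I", 4.0, 16), ("J", 0.9, 38),
]

# Sort by distance so the trend in density can be read down the column
by_distance = sorted(areas, key=lambda row: row[1])

for name, distance, density in by_distance:
    # One '#' per 2 people/hectare (an arbitrary scale chosen for display)
    print(f"{name}  {distance:>3.1f} km  {density:>2}  {'#' * (density // 2)}")
```

The bars generally shorten as the distance grows, although not at every step, which is what a negative but imperfect correlation looks like.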
Variability of Bivariate Data (6B/C)

We learnt in chapter 3 that (although remember that this formula was changed to make it easier to use):

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$

In correlation:

$$S_{xx} = \sum (x - \bar{x})^2$$   'How x varies'

Similarly for y:

$$S_{yy} = \sum (y - \bar{y})^2$$   'How y varies'

You can also calculate the covariance of both variables:

$$\text{Covariance} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$

$$S_{xy} = \sum (x - \bar{x})(y - \bar{y})$$   'How x and y vary together'

Like in chapter 3, we can use a formula which will make calculations easier. The variance is $\frac{\sum (x - \bar{x})^2}{n}$, BUT $S_{xx} = \sum (x - \bar{x})^2$, so:

$$\text{Variance} = \frac{S_{xx}}{n}$$

Using the easier formula for variance from chapter 3:

$$\frac{S_{xx}}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

For the second fraction, square the top and bottom separately:

$$\frac{S_{xx}}{n} = \frac{\sum x^2}{n} - \frac{(\sum x)^2}{n^2}$$

Multiply both sides by n. Multiplying both fractions by n will cancel a 'divide by n' from each of them:

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

These are the formulae for Sxx, Syy and Sxy. You are given these in the formula booklet. You do not need to know how to derive them (like we just did!)

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$$

Calculate Sxx, Syy and Sxy, based on the following information.

$n = 12$, $\sum x = 155$, $\sum x^2 = 2031$, $\sum y = 198$, $\sum y^2 = 3904$, $\sum xy = 2732$

$$S_{xx} = 2031 - \frac{(155)^2}{12} = 28.92$$

$$S_{yy} = 3904 - \frac{(198)^2}{12} = 637$$

$$S_{xy} = 2732 - \frac{155 \times 198}{12} = 174.5$$

The following table shows babies' head circumferences (cm) and the gestation period (weeks) for 6 newborn babies. Calculate Sxx, Syy and Sxy.
We need n, and the totals of x, y, $x^2$, $y^2$ and xy:

Baby                   A      B      C      D      E      F      Total
Head size (x)          31     33     30     31     35     30     190
Gestation period (y)   36     37     38     38     40     40     229
$x^2$                  961    1089   900    961    1225   900    6036
$y^2$                  1296   1369   1444   1444   1600   1600   8753
xy                     1116   1221   1140   1178   1400   1200   7255

$n = 6$, $\sum x = 190$, $\sum x^2 = 6036$, $\sum y = 229$, $\sum y^2 = 8753$, $\sum xy = 7255$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 6036 - \frac{(190)^2}{6} = 19.33$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 8753 - \frac{(229)^2}{6} = 12.83$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 7255 - \frac{190 \times 229}{6} = 3.33$$

Product Moment Correlation Coefficient

We can test the correlation of data by calculating the Product Moment Correlation Coefficient (PMCC). This uses Sxx, Syy and Sxy.

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

The value of this number tells you what the correlation is and how strong it is.

  -1                      0                        1
  Negative correlation    No linear correlation    Positive correlation

The closer to 1, the stronger the positive correlation. The same applies for -1 and negative correlation. A value close to 0 implies no linear correlation.

Given the following data, calculate the Product Moment Correlation Coefficient.

$S_{xx} = 74$, $S_{yy} = 150$, $S_{xy} = 102$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{102}{\sqrt{74 \times 150}} = 0.97$$

There is positive correlation: as x increases, y does as well.

Limitations of the Product Moment Correlation Coefficient

Sometimes it may indicate correlation between unrelated variables.

• The number of cars on a particular street has increased, as have the sales of DVDs in town. The PMCC would indicate positive correlation where the two are most likely not linked.
• The speed of computers has increased, as has life expectancy amongst people. These are not directly linked, but are both due to scientific developments.

Using Coding with the PMCC

Calculating the PMCC from this table.
x        102      103      102      103      104      103      Total 617
y        320      335      345      355      360      380      Total 2095
$x^2$    10404    10609    10404    10609    10816    10609    Total 63451
$y^2$    102400   112225   119025   126025   129600   144400   Total 733675
xy       32640    34505    35190    36565    37440    39140    Total 215480

$n = 6$, $\sum x = 617$, $\sum y = 2095$, $\sum x^2 = 63451$, $\sum y^2 = 733675$, $\sum xy = 215480$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 63451 - \frac{(617)^2}{6} = 2.83$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 733675 - \frac{(2095)^2}{6} = 2170.83$$

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 215480 - \frac{617 \times 2095}{6} = 44.17$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{44.17}{\sqrt{2.83 \times 2170.83}} = 0.563$$

Calculating the PMCC from this table, using coding (6D)

$$p = x - 100 \qquad q = \frac{y - 300}{5}$$

x        102   103   102   103   104   103
y        320   335   345   355   360   380
p        2     3     2     3     4     3      Total 17
q        4     7     9     11    12    16     Total 59
$p^2$    4     9     4     9     16    9      Total 51
$q^2$    16    49    81    121   144   256    Total 667
pq       8     21    18    33    48    48     Total 176

$n = 6$, $\sum p = 17$, $\sum q = 59$, $\sum p^2 = 51$, $\sum q^2 = 667$, $\sum pq = 176$

$$S_{pp} = \sum p^2 - \frac{(\sum p)^2}{n} = 51 - \frac{(17)^2}{6} = 2.83$$

$$S_{qq} = \sum q^2 - \frac{(\sum q)^2}{n} = 667 - \frac{(59)^2}{6} = 86.83$$

$$S_{pq} = \sum pq - \frac{\sum p \sum q}{n} = 176 - \frac{17 \times 59}{6} = 8.83$$

$$r = \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} = \frac{8.83}{\sqrt{2.83 \times 86.83}} = 0.563$$

So coding will not affect the PMCC!

Summary

• We have looked at plotting scatter graphs
• We have looked at calculating measures of variance, Sxx, Syy and Sxy
• We have also seen types of correlation and how to recognise them on a graph
• We have calculated the Product Moment Correlation Coefficient, and interpreted it. It is a numerical measure of correlation.
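The Sxx, Syy, Sxy and PMCC calculations from the coding example can be checked with a short Python sketch. It is only an illustration: the helper names `s_xy` and `pmcc` are made up here, and the code applies the formula-booklet versions of the formulae to the raw and coded data.

```python
from math import sqrt, isclose


def s_xy(xs, ys):
    """Sxy = sum(xy) - sum(x) * sum(y) / n. Pass the same list twice to get Sxx."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def pmcc(xs, ys):
    """Product Moment Correlation Coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy(xs, ys) / sqrt(s_xy(xs, xs) * s_xy(ys, ys))


# Data from the coding example
x = [102, 103, 102, 103, 104, 103]
y = [320, 335, 345, 355, 360, 380]

# Coding used in the example: p = x - 100, q = (y - 300) / 5
p = [xi - 100 for xi in x]
q = [(yi - 300) / 5 for yi in y]

r_raw = pmcc(x, y)
r_coded = pmcc(p, q)

print(f"Sxx = {s_xy(x, x):.2f}, Syy = {s_xy(y, y):.2f}, Sxy = {s_xy(x, y):.2f}")
print(f"Spp = {s_xy(p, p):.2f}, Sqq = {s_xy(q, q):.2f}, Spq = {s_xy(p, q):.2f}")
print(f"r (raw) = {r_raw:.3f}, r (coded) = {r_coded:.3f}")
```

Both values of r round to 0.563, matching the worked example. Subtracting a constant leaves every deviation from the mean unchanged, and dividing by a positive constant scales the top and bottom of the PMCC formula by the same factor, which is why this kind of coding has no effect on r.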