COR-STAT1202 Topic 7: Correlation and regression
Statistics for Business and Economics, Chapter 12

Correlation

• Correlation is concerned with the strength of the linear relationship between two variables.

Visually, the relationship between two variables can be seen using scatter diagrams.

[Scatter diagram: Results against Age]
There is a fairly strong positive linear relationship between age and results (maybe ρ is about 0.7).

[Scatter diagram: Results against Age]
There is a fairly strong negative linear relationship between age and results (maybe ρ is about -0.8).

[Scatter diagram: Results against Age]
There is a fairly weak negative linear relationship between age and results (maybe ρ is about -0.3).

[Scatter diagram: Results against Age]
There is no relationship between age and results (r would be close to zero).

Can the correlation coefficient, as a summary statistic, replace an individual examination of the data?

[Four scatter plots of y against x]
The four y variables have the same mean (7.5), variance (4.12) and correlation with x (0.81). However, as can be seen on the plots, the distributions of the variables are very different.

You can calculate the correlation coefficient, also called the product moment correlation or Pearson correlation coefficient. The population value is ρ. The correlation between two variables is often estimated by the sample correlation, r.
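For reference, the usual textbook definition of the sample correlation r can be written as follows. The notation (sample means x̄ and ȳ, the sums S_xy, S_xx and S_yy, and the sample size n) is assumed here rather than taken from the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```

Because the numerator and denominator are both built from deviations about the means, r always lies between -1 and 1.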
Some points about correlation
• It is independent of the scale of measurement
• It is independent of the origin of measurement
• It is symmetric (the correlation between x and y is the same as between y and x)

Example: A test has been designed to examine a prospective salesman’s ability to sell. Some experienced salesmen sit the test and their scores are compared with their actual productivity. Calculate the correlation between test score and productivity.

Score (x) (mark out of 50): 41, 34, 35, 40, 33, 42, 37, 42, 40, 43, 38, 38, 46, 36, 32, 43, 42, 30, 41, 45
Productivity (y) (number sold): 32, 35, 20, 24, 27, 28, 31, 33, 26, 41, 29, 33, 36, 23, 22, 38, 26, 20, 30, 30

[Calculation of r]
The correlation between x and y is strong.

Making sense of correlations

Spurious Correlation
• “Spurious Correlation” is defined as a situation in which measures of two or more variables are statistically related but are not in fact causally linked, usually because the statistical relation is caused by a third variable.

Think!!!
Studies have shown repeatedly, for example, that children with longer arms reason better than those with shorter arms. Yes, there is a correlation between the two. But common sense tells us there is no CAUSAL relationship between the two. Children with longer arms reason better because they’re older!

[Diagram: Age of children drives both Reasoning ability and Long arms. The correlation between reasoning ability and long arms is very strong, but there is no causal relation between them.]

So what have we learnt:
• Correlation does not imply causation.
• Causation does suggest correlation.
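Returning to the test score and productivity example earlier in this topic, the calculation of r can be sketched in a few lines of Python. This is a minimal illustration, not the method used on the slides: the function name `pearson_r` and the variable names are chosen here for convenience, and the data are the 20 pairs listed in the example. The same sketch also checks two of the "points about correlation": symmetry, and independence of scale and origin.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations about the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Data from the salesman example (score out of 50, number sold)
score = [41, 34, 35, 40, 33, 42, 37, 42, 40, 43,
         38, 38, 46, 36, 32, 43, 42, 30, 41, 45]
productivity = [32, 35, 20, 24, 27, 28, 31, 33, 26, 41,
                29, 33, 36, 23, 22, 38, 26, 20, 30, 30]

r = pearson_r(score, productivity)

# Symmetry: swapping x and y leaves r unchanged
r_swapped = pearson_r(productivity, score)

# Scale and origin: a positive rescaling and a shift of x leave r unchanged
# (a negative scale factor would flip the sign of r)
r_rescaled = pearson_r([2 * s + 10 for s in score], productivity)

print(f"r = {r:.3f}")
print(f"r with x and y swapped = {r_swapped:.3f}")
print(f"r after rescaling and shifting x = {r_rescaled:.3f}")
```

All three printed values should agree, which is what the symmetry and scale/origin properties predict.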
Rank Correlation

The rank correlation coefficient (also known as Spearman’s rank correlation coefficient) is another way to measure the strength of correlation between two variables:

$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$

where $d_i$ are the differences in the ranks between $x_i$ and $y_i$, and $n$ is the number of pairs. It looks at ranks, not actual variable values. Therefore it is less affected by extreme observations in the sample.

Illustration: The following figures give examination and project results (in %) for eight students. Find Spearman’s rank correlation coefficient for the data.

Student’s examination and project marks

Student    1   2   3   4   5   6   7   8
Exam      95  80  70  40  30  73  85  50
Project   65  60  55  50  40  80  75  70

Ranking each set of marks (1 = lowest) and taking differences:

Student    1   2   3   4   5   6   7   8
Exam      95  80  70  40  30  73  85  50
Rank (E)   8   6   4   2   1   5   7   3
Project   65  60  55  50  40  80  75  70
Rank (P)   5   4   3   2   1   8   7   6
d          3   2   1   0   0  -3   0  -3
d²         9   4   1   0   0   9   0   9

The squared differences sum to 32, so $r_s = 1 - \dfrac{6 \times 32}{8(8^2 - 1)} = 1 - \dfrac{192}{504} \approx 0.62$.

Demonstration and Practice
Use the datafile ‘satisfaction_retention’ for this exercise.
• What is the strength of the linear relationship between employee satisfaction levels and employee engagement?
• What is the strength of the linear relationship between employee engagement and customer satisfaction?
• What conclusion can you draw from these findings?

Regression

In regression, we find a way of representing the linear relationship between variables. We need to know the dependent variable y and the independent variable x.
The relationship is given as a straight line. The best-fitted line through a set of data is represented as

y = a + bx

[Scatter diagram with fitted line: satisfaction against Service quality perception]

The line so formed is known as the sample regression line of y on x.

[Two scatter diagrams: satisfaction against Service quality perception]
These two graphs show datasets with different correlations.

[Two scatter diagrams: satisfaction against Service quality perception]
These two graphs show datasets with different regression weights.

For datasets that have a high r (in absolute value), there is a strong linear connection between x and y; the points will be close to the line of best fit. In contrast, a low r means the points are scattered widely around the line.

CORRELATION r AND REGRESSION WEIGHT b MEASURE TWO DIFFERENT THINGS

Illustration: A study was made by a retailer to determine the relation between weekly advertising expenditure and sales (in thousands of dollars). Find the equation of a regression line to predict weekly sales from advertising. Estimate weekly sales when advertising costs are $35,000.

Adv costs (‘000)   40   20   25   20   30   50   40   20   50   40   25   50
Sales (‘000)      385  400  395  365  475  440  490  420  560  525  480  510

So, sales = 343.70 + 3.22 × Adv. costs
With advertising costs of 35 (i.e. $35,000), sales = 343.70 + 3.22 × 35 = 456.4, i.e. $456,400.

Demonstration and Practice
Use the Excel file ‘satisfaction-retention’ for this exercise.
• What is the impact of employee engagement on customer satisfaction?
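The advertising illustration above can be reproduced with a short least-squares sketch. This is only one way to do the calculation, under the usual textbook formulas b = Sxy / Sxx and a = ȳ - b x̄; the function name `fit_line` and the variable names are illustrative and do not come from the slides.

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    s_xx = sum((u - mean_x) ** 2 for u in x)
    b = s_xy / s_xx            # regression weight (slope)
    a = mean_y - b * mean_x    # intercept
    return a, b


# Weekly advertising costs and sales, both in thousands
adv = [40, 20, 25, 20, 30, 50, 40, 20, 50, 40, 25, 50]
sales = [385, 400, 395, 365, 475, 440, 490, 420, 560, 525, 480, 510]

a, b = fit_line(adv, sales)
predicted = a + b * 35  # advertising costs of 35 (thousand)

print(f"sales = {a:.2f} + {b:.2f} * adv")
print(f"predicted sales at adv = 35: {predicted:.1f} (thousand)")
```

The fitted intercept and slope should agree with the equation quoted above up to rounding in the last decimal place.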
END OF COURSE
CONGRATULATIONS!