Stat 301 – Lecture 10

Correlation
A measure of the strength of the linear association between two numerical variables.

Sample Covariance
A measure of the co-variability between two numerical variables.
$$\frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$

Sample Correlation
The sample covariance scaled to account for the variation in the x’s and y’s.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

Properties
The value of the correlation coefficient, r, is always between –1 and +1.
  –1: a perfect negative linear relationship
  +1: a perfect positive linear relationship
r = 0: There is no linear relationship between the two numerical variables.
  The plot may show random scatter.
  There could still be a relationship, but not one that is linear.
The correlation coefficient, r, does not have any units.
Changing the scales of the numerical variables will not change the value of the correlation coefficient.

CO2 and Temperature
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{63.68808}{19\,(16.32108)(0.22878)} = 0.8977$$
There is a strong, positive linear association between the carbon dioxide concentration and the temperature.
Is the linear association between the carbon dioxide concentration and temperature statistically significant?

Step 1: Hypotheses
$H_0: \rho = 0$ (no linear association)
$H_A: \rho \neq 0$ (linear association)

Step 2: Test statistic
$$t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.8977}{\sqrt{\dfrac{1 - 0.8059}{18}}} = 8.64$$
P-value < 0.0001

Step 3: Decision
Reject the null hypothesis because the P-value is so small.

Step 4: Conclusion
There is a statistically significant linear association between carbon dioxide concentration and temperature.

Connection
The test for the statistical significance of correlation is exactly the same as the test for the statistical significance of the estimated slope.
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = r\,\frac{s_y}{s_x}$$
$$R^2 = r^2$$
$$R^2 = (0.8977)^2 = 0.8059$$

Difference
$R^2$ can be interpreted as the % of the total variation in y that is explained by the linear relationship with x.
r has a sign (+/–) that matches the direction of the association and cannot be interpreted as a %.

JMP
Analyze – Multivariate Methods – Multivariate
Y, Columns: CO2, Temp
Multivariate – Pairwise Correlations

Multivariate
Correlations
          CO2      Temp
  CO2     1.0000   0.8977
  Temp    0.8977   1.0000

[Scatterplot Matrix: CO2 (about 310 to 370) plotted against Temp (about 13.8 to 14.6)]

Pairwise Correlations
  Variable  by Variable  Correlation  Count  Lower 95%  Upper 95%  Signif Prob
  Temp      CO2          0.8977       20     0.7552     0.9592     <.0001*

JMP Output
JMP does not give you the value of the test statistic.
JMP does give a 95% confidence interval for the population correlation coefficient.

95% Confidence Interval
r = 0.8977
95% confidence interval for ρ: 0.7552 to 0.9592
Note that this interval is not symmetric around the value of r.
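
The calculations in this lecture can be checked outside JMP with a short Python sketch. It uses only the summary numbers quoted on the slides (63.68808, 16.32108, 0.22878, r = 0.8977, n = 20), because the raw CO2 and temperature data are not reproduced here. The function names are illustrative, not part of any package. The interval assumes the Fisher z-transformation; the slides do not say which method JMP uses, although this choice appears to reproduce the limits shown above.

```python
import math
from statistics import NormalDist, mean, stdev


def correlation(x, y):
    """Sample correlation: sum((x - xbar)(y - ybar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    cross = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return cross / ((n - 1) * stdev(x) * stdev(y))


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return (r - 0) / math.sqrt((1 - r**2) / (n - 2))


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for rho (assumed Fisher z method)."""
    z = math.atanh(r)                  # transform r to the z scale
    se = 1 / math.sqrt(n - 3)          # approximate standard error of z
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)
    # Transform the limits back to the correlation scale.
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)


# Summary values quoted on the slides (n = 20, so n - 1 = 19).
r_slides = 63.68808 / (19 * 16.32108 * 0.22878)
t = t_statistic(0.8977, 20)
lower, upper = fisher_interval(0.8977, 20)

print(f"r = {r_slides:.4f}, t = {t:.2f}, 95% CI: {lower:.4f} to {upper:.4f}")
```

Because the interval is built on the z scale and then transformed back with tanh, it is not symmetric around r, which matches the note on the last slide. With raw data in two lists, `correlation(co2, temp)` would give r directly; `co2` and `temp` are hypothetical variable names.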