Correlation • Correlation is used to describe the linear relationship between two continuous variables (e.g., height and weight). In general, correlation tends to be used when there is no identified response variable. It measures the strength (qualitatively) and direction of the linear relationship between two or more variables. • The Pearson correlation coefficient measures the strength of the linear association between two variables. It can be used to estimate the population correlation, ρ. p. 1 Example Correlating Two Lung Function Meters (Wright vs. Mini-Wright) Peak expiratory flow rate (PERF) can be measured using either the “Wright Peak Flow Meter” or the “Mini-Wright Peak Flow Meter.” Do their measurements “correlate”? r = 0.943 700 PERF by Mini Meter 600 500 400 300 200 100 200 300 400 500 600 700 PERF by Large Meter Note that this is a very different question than do their measurements “agree”. p. 2 Properties of Correlations • A correlation can range in value between –1 and 1: - r is a dimensionless quantity; that is, r is independent of the units of measurement of X and Y. - If the correlation is greater than 0, then as X increases Y increases and the two variables are said to be positively correlated. An r = 1 is perfect positive correlation. - If the correlation is less than 0, then as X increases Y decreases and the two variables are said to be negatively correlated. An r = -1 is perfect negative correlation. - If the correlation is 0 then there is no linear relationship between X and Y. The two variables are said to be uncorrelated. The correlation is 0 when the covariance of X and Y is 0. • The correlation coefficient is a measure of the strength of the linear trend relative to the variability of the data around that trend. Thus, it is dependent both on the magnitude of the trend and the magnitude of the variability in the data. p. 3 70 r = 1.0 60 60 50 50 40 40 Y Y 70 30 30 20 20 10 20 r = -1.0 10 30 40 50 60 70 80 20 30 40 X 160 150 50 60 70 80 X 160 r = 0.9 r = -0.9 150 130 130 Y 140 Y 140 120 120 110 110 100 100 90 90 30 40 50 60 70 80 20 30 X 40 50 60 70 80 X 150 r≈0 140 130 Y 20 120 110 100 90 20 30 40 50 60 70 80 X p. 4 Correlation is not a measure of the magnitude of the slope of the regression line. 9 8 7 6 Y 5 4 3 2 1 0 0 2 4 6 8 10 6 8 10 X 9 8 7 6 Y 5 4 3 2 1 0 0 2 4 X Both plots above have r = 1, but very different slopes. Note: In linear regression we can (and do) estimate the amount Y increases/decreases with a one-unit change in X. p. 5 However, the slope does depend on the units of measurement of X and Y, the correlation is dimensionless. p. 6 • Correlation is not a measure of the appropriateness of the straight-line model. 60 60 50 50 40 40 30 30 Y Y Examples where r = 0 20 20 10 10 0 0 0 1 2 3 4 5 6 7 0 8 1 2 3 4 5 6 7 8 5 6 7 8 X X 80 80 70 70 60 60 50 50 40 40 Y Y Examples when r is high 30 30 20 20 10 10 0 0 0 1 2 3 4 X r = -0.766 5 6 7 8 0 1 2 3 4 X r = -0.767 p. 7