Correlation

 A measure of the strength of the linear association between two numerical variables.

1

Sample Covariance

 Measure of the co-variability between two numerical variables.

 \[ \mathrm{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \]
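 As a minimal sketch, the sample covariance (the sum of the products of the deviations from the two means, divided by n - 1) could be computed as below. The function name `sample_covariance` and the data values are illustrative assumptions, not part of the original slides.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Made-up values for illustration only (not the CO2 and temperature data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
cov_xy = sample_covariance(xs, ys)
print(cov_xy)
```

 A positive result indicates that x and y tend to sit above (or below) their means together. The size of the value depends on the units of both variables.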

2

Sample Correlation

 The sample covariance scaled to account for variation in the x’s and y’s.

 \[ r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) \]
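 The scaling can be sketched in code: each deviation is divided by the sample standard deviation of its own variable before the products are averaged over n - 1. The helper names and the data are illustrative assumptions.

```python
import math


def sample_sd(values):
    """Sample standard deviation, using n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def sample_correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of standardized x times standardized y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up values for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(xs, ys)
print(round(r, 4))
```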

3

Properties

 The value of the correlation coefficient, r, is always between –1 and +1.

–1: a perfect negative linear relationship

 +1: a perfect positive linear relationship

4

Properties

 r = 0: There is no linear relationship between the two numerical variables.

 Random scatter

 There could be a relationship, but not one that is linear.
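 A small sketch of that last point: when y depends on x exactly through a symmetric curve, r can still be 0. The data are made up for illustration, and `correlation` is an assumed helper that uses an algebraically equivalent form of the sample correlation.

```python
import math


def correlation(xs, ys):
    """Sample correlation written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y is completely determined by x, but the relationship is a parabola.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]
r_parabola = correlation(xs, ys)
print(abs(r_parabola) < 1e-12)
```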

5

Properties

 The correlation coefficient, r, does not have any units.

 Changing the units or scale of either numerical variable (for example, degrees Celsius to degrees Fahrenheit) will not change the value of the correlation coefficient.
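 A hedged illustration of this property: rescaling both variables leaves r unchanged up to floating-point rounding. The values are made up and are not the data behind the slides; `correlation` is an assumed helper name.

```python
import math


def correlation(xs, ys):
    """Sample correlation written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only.
co2_ppm = [315.0, 330.0, 345.0, 360.0]
temp_c = [13.9, 14.0, 14.3, 14.5]

# Rescale both variables: ppm to ppb, Celsius to Fahrenheit.
co2_ppb = [1000.0 * x for x in co2_ppm]
temp_f = [9.0 / 5.0 * c + 32.0 for c in temp_c]

r_original = correlation(co2_ppm, temp_c)
r_rescaled = correlation(co2_ppb, temp_f)
print(math.isclose(r_original, r_rescaled, rel_tol=1e-9))
```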

6

CO2 and Temperature

 \[ r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) \]

 \[ r = \frac{1}{19} \cdot \frac{63.68808}{(16.32108)(0.22878)} = 0.8977 \]
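 The arithmetic on this slide can be checked with a short sketch. It reads the slide's figures as a sum of cross-products of 63.68808, s_x = 16.32108, s_y = 0.22878 and n = 20. That pairing of the digits is an assumption, chosen because it reproduces r = 0.8977. The variable names are illustrative.

```python
# Summary figures as read from the slide (the pairing is an assumption).
n = 20                # number of observations, so n - 1 = 19
sum_cross = 63.68808  # sum of (x - x_bar)(y - y_bar)
s_x = 16.32108        # sample standard deviation of CO2
s_y = 0.22878         # sample standard deviation of temperature

r = sum_cross / ((n - 1) * s_x * s_y)
print(round(r, 4))
```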

7

CO2 and Temperature

 There is a strong, positive linear association between the carbon dioxide concentration and the temperature.

8

CO2 and Temperature

 Is the linear association between the carbon dioxide concentration and temperature statistically significant?

9

Step 1: Hypotheses

 \[ H_0: \rho = 0 \quad \text{(no linear association)} \]

 \[ H_A: \rho \neq 0 \quad \text{(linear association)} \]

10

Step 2: Test statistic

 \[ t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.8977 - 0}{\sqrt{\dfrac{1 - 0.8059}{18}}} = 8.64 \]

 \[ P\text{-value} < 0.0001 \]
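 A minimal sketch of this step, using only r = 0.8977 and n = 20: the test statistic is r divided by the square root of (1 - r^2) / (n - 2). The two-sided P-value comes from a t distribution with n - 2 = 18 degrees of freedom. The P-value here is approximated by numerically integrating the t density with Simpson's rule, which is an implementation choice for this sketch and not necessarily how JMP computes it. All function names are illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t = (r - 0) / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=20000):
    """Two-sided P-value, 1 - P(-|t| < T < |t|), by Simpson's rule (steps even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area_zero_to_t)


r, n = 0.8977, 20
t = correlation_t_statistic(r, n)
p = two_sided_p_value(t, n - 2)
print(round(t, 2), p < 0.0001)
```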

11

Step 3: Decision

 Reject the null hypothesis because the P-value (less than 0.0001) is so small.

12

Step 4: Conclusion

 There is a statistically significant linear association between carbon dioxide concentration and temperature.

13

Connection

 The test for the statistical significance of the correlation is equivalent to the test for the statistical significance of the estimated slope in simple linear regression: both give the same t statistic and P-value.
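 The connection can be checked numerically. The sketch below fits a least-squares line to made-up data, computes the t statistic for the slope (slope divided by its standard error), and compares it with the t statistic built from r. The function name and data are illustrative assumptions.

```python
import math


def slope_and_correlation_t(xs, ys):
    """Return (t for the least-squares slope, t for the correlation)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least-squares slope and its standard error.
    slope = sxy / sxx
    sse = syy - slope * sxy  # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope

    # Correlation and its t statistic.
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r / math.sqrt((1.0 - r * r) / (n - 2))
    return t_slope, t_corr


# Made-up values for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
t_slope, t_corr = slope_and_correlation_t(xs, ys)
print(math.isclose(t_slope, t_corr, rel_tol=1e-9))
```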

14

Connection

 \[ R^2 = r^2 \]

 \[ R^2 = (0.8977)^2 = 0.8059 \]
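 As a hedged check of this identity, the sketch below computes R² from a fitted least-squares line as 1 - SSE/SST and compares it with the square of r. It uses made-up data, and the function names are illustrative. It also confirms the slide's arithmetic.

```python
import math


def r_squared_from_fit(xs, ys):
    """R^2 = 1 - SSE / SST for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse / sst


def correlation(xs, ys):
    """Sample correlation written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
big_r2 = r_squared_from_fit(xs, ys)
little_r = correlation(xs, ys)
print(math.isclose(big_r2, little_r ** 2, rel_tol=1e-9))

# The slide's own arithmetic: 0.8977 squared, rounded to four decimals.
slide_r2 = round(0.8977 ** 2, 4)
print(slide_r2)
```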

15

Difference

 R² can be interpreted as a percentage: the percent of the total variation in the response that is explained by the linear relationship.

 r has a sign (+/–) that matches the direction of the association and cannot be interpreted as a percentage.

16

JMP

 Analyze – Multivariate Methods – Multivariate

 Y, Columns: CO2, Temp

 Multivariate

 Pairwise correlations
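 Outside JMP, a table of pairwise correlations like the one this platform reports could be sketched as follows. The column values are made up (the deck's data set is not reproduced here), and `pairwise_correlations` is a hypothetical helper, not a JMP or library function.

```python
import math


def correlation(xs, ys):
    """Sample correlation written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    return {(a, b): correlation(columns[a], columns[b])
            for a in columns for b in columns}


# Made-up values for illustration only.
columns = {
    "CO2": [315.0, 330.0, 345.0, 360.0],
    "Temp": [13.9, 14.0, 14.3, 14.5],
}
matrix = pairwise_correlations(columns)
for a in columns:
    print(a, [round(matrix[(a, b)], 4) for b in columns])
```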

17

Multivariate

Correlations

        CO2      Temp
CO2     1.0000   0.8977
Temp    0.8977   1.0000

Scatterplot Matrix

[Scatterplot matrix of CO2 (axis from 310 to 370) and Temp (axis from 13.8 to 14.6)]

Pairwise Correlations

Variable   by Variable   Correlation   Count   Signif Prob
Temp       CO2           0.8977        20      <.0001*


18
