Stat 301 – Lecture 10

Correlation
A measure of the strength of
the linear association
between two numerical
variables.
Sample Covariance
Measure of the co-variability
between two numerical
variables.
$\text{sample covariance} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$
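The covariance formula can be sketched in a few lines of Python. The data values and the function name `sample_covariance` are hypothetical, chosen only for illustration; they are not from the lecture.

```python
def sample_covariance(xs, ys):
    """Sum of cross-deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)

# Hypothetical data, not the CO2 and temperature values from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
cov_xy = sample_covariance(xs, ys)
print(cov_xy)
```

A positive value says that above-average x's tend to pair with above-average y's, but its size depends on the units of x and y.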
Sample Correlation
The sample covariance scaled
to account for variation in the
x’s and y’s.
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$
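As a rough sketch, the sample correlation can be computed directly from this definition. The helper name `sample_correlation` and the data are assumptions made for illustration only.

```python
import math

def sample_correlation(xs, ys):
    """Cross-deviation sum divided by (n - 1) * s_x * s_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sample standard deviations of x and y.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cross / ((n - 1) * s_x * s_y)

# Hypothetical data, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(xs, ys)
print(round(r, 4))
```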
Properties
The value of the correlation
coefficient, r, is always
between –1 and +1.
–1: a perfect negative linear
relationship
+1: a perfect positive linear
relationship
Properties
r = 0: There is no linear
relationship between the two
numerical variables.
The plot may show random scatter.
There could also be a relationship,
but not one that is linear.
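A small illustration of the last point, using made-up data: here y is determined exactly by x (y = x squared), yet the correlation coefficient is zero because the relationship is not linear. The helper `sample_correlation` is a hypothetical name.

```python
import math

def sample_correlation(xs, ys):
    """Pearson correlation computed from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect quadratic relationship, symmetric about x = 0.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]
r = sample_correlation(xs, ys)
print(r)
```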
Properties
The correlation coefficient, r,
does not have any units.
Changing the scales of the
numerical variables will not
change the value of the
correlation coefficient.
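The following sketch checks this property on hypothetical data: one variable is multiplied by 1000 and the other is converted from degrees Celsius to degrees Fahrenheit, and r is computed before and after. All names and values are assumptions for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Pearson correlation computed from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, not the lecture's data set.
xs = [315.0, 325.0, 338.0, 351.0, 366.0]
ys_celsius = [13.9, 14.0, 14.2, 14.3, 14.5]

# Rescale both variables with positive linear changes of units.
xs_rescaled = [1000.0 * x for x in xs]
ys_fahrenheit = [1.8 * y + 32.0 for y in ys_celsius]

r_original = sample_correlation(xs, ys_celsius)
r_rescaled = sample_correlation(xs_rescaled, ys_fahrenheit)
print(r_original, r_rescaled)
```

Both changes of units here use positive multipliers; multiplying one variable by a negative number would flip the sign of r.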
CO2 and Temperature
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{63.68808}{(19)(16.32108)(0.22878)}$
$r = 0.8977$
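The arithmetic on this slide can be checked with a few lines of Python. The variable names are hypothetical; the numbers are read off the slide, taking n = 20 (so n – 1 = 19), 16.32108 as s_x, and 0.22878 as s_y.

```python
# Values as read from the slide; the assignment of 16.32108 to s_x and
# 0.22878 to s_y follows the order of the formula and is an assumption.
sum_cross = 63.68808   # sum of (x - x_bar) * (y - y_bar)
n = 20
s_x = 16.32108
s_y = 0.22878

r = sum_cross / ((n - 1) * s_x * s_y)
print(round(r, 4))
```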
CO2 and Temperature
There is a strong, positive
linear association between
the carbon dioxide
concentration and the
temperature.
CO2 and Temperature
Is the linear association
between the carbon dioxide
concentration and
temperature statistically
significant?
Step 1: Hypotheses
$H_0: \rho = 0$ (no linear association)
$H_A: \rho \neq 0$ (linear association)
Step 2: Test statistic
$t = \frac{r - 0}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.8977}{\sqrt{\frac{1 - 0.8059}{18}}} = 8.64$
P-value < 0.0001
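A minimal sketch of this calculation, using only the Python standard library. The two-sided P-value is approximated by numerically integrating the t density with n – 2 = 18 degrees of freedom; the helper names `t_density` and `upper_tail` and the integration settings are assumptions made for illustration, not part of the lecture.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, width=200.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, t + width]."""
    h = width / steps
    total = t_density(t, df) + t_density(t + width, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(t + i * h, df)
    return total * h / 3

r = 0.8977
n = 20
t_stat = (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))
p_value = 2 * upper_tail(abs(t_stat), n - 2)  # two-sided alternative

print(round(t_stat, 2))
print(p_value < 0.0001)
```

In practice a statistics package would supply the t distribution directly; the integration here only avoids an extra dependency.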
Step 3: Decision
Reject the null hypothesis
because the P-value is so
small.
Step 4: Conclusion
There is a statistically
significant linear association
between carbon dioxide
concentration and
temperature.
Connection
The test for the statistical
significance of correlation is
exactly the same as the test
for the statistical significance
of the estimated slope.
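This equivalence can be checked numerically. The sketch below fits a least-squares line to hypothetical data, computes the t statistic for the slope (estimate divided by its standard error), and compares it with the t statistic for the correlation. Variable names and data are illustrative assumptions.

```python
import math

# Hypothetical data, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the slope from the residual sum of squares.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b1 / se_b1

# Test statistic for the correlation.
r = sxy / math.sqrt(sxx * syy)
t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))

print(t_slope, t_corr)
```

The two statistics agree up to floating-point rounding, and both are compared with a t distribution on n – 2 degrees of freedom.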
Connection
[Equations on this slide were not recoverable from the extracted text.]
Connection
$R^2 = (r)^2$
$R^2 = (0.8977)^2 = 0.8059$
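As a quick check, the sketch below squares the lecture's value of r and also confirms, on hypothetical data, that the regression R² (one minus the ratio of residual to total variation) matches the squared correlation. The data and names are assumptions for illustration.

```python
import math

# The lecture's value of r, squared.
r_lecture = 0.8977
print(round(r_lecture ** 2, 4))

# Hypothetical data for comparing R^2 from a fitted line with r squared.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_fit = 1 - sse / syy
r = sxy / math.sqrt(sxx * syy)
print(r_squared_from_fit, r ** 2)
```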
Difference
R² can be interpreted as a percentage
of the total variation in the response
that is explained by the linear relationship.
r has a sign (+/–) that
matches the direction of the
association and cannot be
interpreted as a percentage.
JMP
Analyze – Multivariate
Methods – Multivariate
Y, Columns: CO2, Temp
Multivariate
Pairwise correlations
Multivariate
Correlations
        CO2     Temp
CO2     1.0000  0.8977
Temp    0.8977  1.0000
[Scatterplot Matrix: CO2 (axis 310 to 370) versus Temp (axis 13.8 to 14.6)]
Pairwise Correlations
Variable  by Variable  Correlation  Count  Lower 95%  Upper 95%  Signif Prob
Temp      CO2          0.8977       20     0.7552     0.9592     <.0001*
JMP Output
- JMP does not give you the value of the test statistic.
- JMP does give a 95% confidence interval for the population correlation coefficient.
95% Confidence Interval
r = 0.8977
95% confidence interval on ρ:
0.7552 to 0.9592
Note that this interval is not
symmetric around the value
of r.
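One common way to build such an interval is the Fisher z transformation, which is symmetric on the transformed scale and becomes asymmetric once transformed back. The lecture does not say how JMP computes its interval, so the sketch below is an assumption; with r = 0.8977 and n = 20 it gives limits that agree with the JMP output to about three decimal places.

```python
import math

r = 0.8977
n = 20

# Fisher z transformation of r and its approximate standard error.
z = math.atanh(r)
se = 1 / math.sqrt(n - 3)

# 97.5th percentile of the standard normal distribution.
z_crit = 1.959964

# Build the interval on the z scale, then transform back to the r scale.
lower = math.tanh(z - z_crit * se)
upper = math.tanh(z + z_crit * se)

print(round(lower, 3), round(upper, 3))
```

The lower limit sits farther from r than the upper limit does, because values of r are bounded above by +1.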