Correlation
A measure of the strength of
the linear association
between two numerical
variables.
1
Sample Covariance
Measure of the co-variability
between two numerical
variables.
\[ \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \]
2
Sample Correlation
The sample covariance scaled
to account for variation in the
x’s and y’s.
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \]
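A minimal sketch of the two formulas above, using a small made-up data set (not the CO2 and temperature data from these slides). The variable names are illustrative assumptions.

```python
import math

# Made-up illustrative data (NOT the CO2 and temperature data from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of cross-products of deviations from the means.
cross_sum = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample covariance: divide the cross-product sum by n - 1.
cov_xy = cross_sum / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation: the covariance scaled by both standard deviations.
r = cross_sum / ((n - 1) * s_x * s_y)

print(f"covariance = {cov_xy:.4f}, r = {r:.4f}")
```

The covariance carries the units of x times the units of y, while r is the same quantity divided by s_x and s_y.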
3
Properties
The value of the correlation
coefficient, r, is always
between –1 and +1.
–1: a perfect negative linear
relationship
+1: a perfect positive linear
relationship
4
Properties
r = 0: There is no linear
relationship between the two
numerical variables.
The plot may show random scatter.
There could also be a relationship,
but not one that is linear.
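To illustrate the last point, here is a small sketch with made-up data in which y is completely determined by x (y = x squared), yet r is 0 because the relationship is not linear. The helper name `pearson_r` is an assumption for illustration.

```python
import math

def pearson_r(x, y):
    """Sample correlation: cross-product sum over (n - 1) * s_x * s_y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cross / ((n - 1) * s_x * s_y)

# Made-up data: x is symmetric around 0 and y is an exact function of x.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.4f}")  # no linear association, despite a perfect curved one
```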
5
Properties
The correlation coefficient, r,
does not have any units.
Changing the scales of the
numerical variables will not
change the value of the
correlation coefficient.
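A short sketch of this property, assuming made-up values and a positive change of scale (ppm to ppb, Celsius to Fahrenheit). The data and the helper name `pearson_r` are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Sample correlation: cross-product sum over (n - 1) * s_x * s_y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cross / ((n - 1) * s_x * s_y)

# Made-up values, for illustration only.
co2_ppm = [315.0, 330.0, 345.0, 360.0, 375.0]
temp_c = [13.9, 14.0, 14.2, 14.3, 14.6]

# Change the scale of both variables.
co2_ppb = [1000 * v for v in co2_ppm]        # ppm -> ppb
temp_f = [c * 9 / 5 + 32 for c in temp_c]    # Celsius -> Fahrenheit

r_original = pearson_r(co2_ppm, temp_c)
r_rescaled = pearson_r(co2_ppb, temp_f)

print(f"original units: r = {r_original:.6f}")
print(f"rescaled units: r = {r_rescaled:.6f}")
```

Note that multiplying one variable by a negative constant would flip the sign of r while leaving its magnitude unchanged.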
6
CO2 and Temperature
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \]
\[ r = \frac{63.68808}{19\,(16.32108)(0.22878)} \]
\[ r = 0.8977 \]
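The arithmetic on this slide can be checked directly from the summary values shown. The variable names are assumptions. The raw data are not given in the slides, so only the summary numbers are used.

```python
# Summary values as they appear on the slide.
cross_sum = 63.68808   # sum of (x - x_bar)(y - y_bar)
n_minus_1 = 19         # n - 1
s_x = 16.32108         # standard deviation of x
s_y = 0.22878          # standard deviation of y

r = cross_sum / (n_minus_1 * s_x * s_y)
print(round(r, 4))
```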
7
CO2 and Temperature
There is a strong, positive
linear association between
the carbon dioxide
concentration and the
temperature.
8
CO2 and Temperature
Is the linear association
between the carbon dioxide
concentration and
temperature statistically
significant?
9
Step 1: Hypotheses
H0: ρ = 0 (no linear association)
HA: ρ ≠ 0 (linear association)
10
Step 2: Test statistic
\[ t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.8977}{\sqrt{\dfrac{1 - 0.8059}{18}}} \]
\[ t = 8.64 \]
P-value < 0.0001
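A sketch that recomputes the test statistic and approximates the two-sided P-value. It assumes n = 20 (implied by n − 2 = 18 on the slide) and approximates the t-distribution area with Simpson's rule, which is a choice made here for a dependency-free example, not necessarily how the slide's P-value was obtained.

```python
import math

r = 0.8977
n = 20            # assumption: implied by n - 2 = 18 on the slide
df = n - 2

# Test statistic for H0: rho = 0.
t = (r - 0) / math.sqrt((1 - r ** 2) / df)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=20000):
    """Two-sided P-value: 2 * (0.5 - area under the density from 0 to |t|).

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 2 * (0.5 - area))

p_value = two_sided_p(t, df)
print(f"t = {t:.2f}, P-value = {p_value:.2e}")
```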
11
Step 3: Decision
Reject the null hypothesis
because the P-value is so
small.
12
Step 4: Conclusion
There is a statistically
significant linear association
between carbon dioxide
concentration and
temperature.
13
Connection
The test for the statistical
significance of correlation is
exactly the same as the test
for the statistical significance
of the estimated slope.
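This equivalence can be checked numerically. The sketch below uses made-up data (not the CO2 data), fits a least-squares line, and compares the t statistic for the slope with the t statistic for the correlation. All names are illustrative assumptions.

```python
import math

# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, 5.2, 8.3, 8.9, 12.5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Correlation test: t = r / sqrt((1 - r^2) / (n - 2)).
# sxy / sqrt(sxx * syy) equals the cross-product sum over (n - 1) * s_x * s_y.
r = sxy / math.sqrt(sxx * syy)
t_corr = r / math.sqrt((1 - r ** 2) / (n - 2))

# Slope test: t = b1 / SE(b1) for the least-squares line y = b0 + b1 * x.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b1 / se_b1

print(f"t (correlation) = {t_corr:.6f}")
print(f"t (slope)       = {t_slope:.6f}")
```

Both statistics have n − 2 degrees of freedom, so the P-values agree as well.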
14
Connection
\[ R^2 = (r)^2 \]
\[ R^2 = (0.8977)^2 = 0.8059 \]
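A brief sketch of this identity: the slide's arithmetic, followed by a check on made-up data that R² computed from a least-squares fit (1 − SSE/SST) matches the squared correlation. The data and names are illustrative assumptions.

```python
import math

# The slide's arithmetic.
slide_r2 = round(0.8977 ** 2, 4)
print(slide_r2)

# Made-up illustrative data for the general check.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, 5.2, 8.3, 8.9, 12.5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sst = sum((b - y_bar) ** 2 for b in y)   # total variation in y
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

r = sxy / math.sqrt(sxx * sst)

# Least-squares fit and the variation it leaves unexplained.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

r_squared_fit = 1 - sse / sst
print(f"R^2 from fit = {r_squared_fit:.6f}, r^2 = {r ** 2:.6f}")
```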
16
Difference
R² can be interpreted as a %
of total variation.
r has a sign (+/–) that
matches the direction of the
association and cannot be
interpreted as a %.
17
JMP
Analyze – Multivariate
Methods – Multivariate
Y, Columns: CO2, Temp
Multivariate
Pairwise correlations
18
JMP Output
 JMP does not give you the
value of the test statistic.
 JMP does give a 95%
confidence interval for the
population correlation
coefficient.
20
95% Confidence Interval
r = 0.8977
95% confidence interval on ρ:
0.7552 to 0.9592
Note that this interval is not
symmetric around the value
of r.
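One common way to build such an interval is the Fisher z-transformation, which also explains the asymmetry: the interval is symmetric on the transformed scale but not after transforming back. The sketch below assumes n = 20 (from n − 2 = 18 earlier) and assumes this method. The slides do not state which method JMP uses, although these inputs give values that round to the interval shown.

```python
import math
from statistics import NormalDist

r = 0.8977
n = 20   # assumption: implied by n - 2 = 18 in the test statistic

# Fisher z-transformation of r and its approximate standard error.
z = math.atanh(r)
se_z = 1 / math.sqrt(n - 3)

# Standard normal critical value for 95% confidence (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# The interval is symmetric on the z scale; transform the endpoints back to the r scale.
lower = math.tanh(z - z_crit * se_z)
upper = math.tanh(z + z_crit * se_z)

print(f"95% CI for rho: {lower:.4f} to {upper:.4f}")
print(f"distance below r: {r - lower:.4f}, distance above r: {upper - r:.4f}")
```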
21