CORRELATION

Statistics: Continuous Methods
STAT452/652, Fall 2008
Computer Lab 4
Thursday, October 16, 2008
Ansari Business Building, 610
11:00AM-12:15PM
CORRELATION
with
Instructor: Ilya Zaliapin
STAT452/652, Fall 2008, Computer Lab 4
Page 1
Topic: Correlation coefficients
Goals:
• Learn how to compute the Pearson’s and Spearman’s
correlation coefficients in Minitab
• Learn how to find and interpret statistical significance of the
correlation
• Learn how to interpret results of the correlation analysis
Assignments:
1) Download the data set correlation.MTW from the course web
site to your Minitab session; it contains four samples: W, X, Y,
and Z.
2) Decide which samples are from the Normal distribution (you
can use the probability plot option with the Anderson-Darling
test).
3) Perform a visual correlation analysis of the data, decide which
pairs of variables (out of 6 possible pairs) have association, and
discuss whether the association is monotone or linear.
4) Compute Pearson’s correlation for each pair.
5) Compute significance (P-value) for the Pearson’s correlation;
discuss which P-values can be used for analysis, and which
should be dismissed (use the results of 2).
6) Compute Spearman’s correlation for each pair.
7) Compare the results of 3)-6), discuss, and make conclusions.
Report:
A printed report for this Lab is due on Thursday, October 23 in class.
BW printouts are OK. Reports will not be accepted by mail.
1. Introduction
Applied research often focuses on relationships (associations) between different
processes or phenomena. In the probabilistic context, one deals with
relationships (associations, dependence) between random variables. In practice,
establishing dependence (or independence), which is a property of the entire
joint distribution of the random variables, is commonly replaced with a simpler
task of establishing correlation, which only reflects some partial properties of the
joint distribution. A coefficient of correlation is a scalar measure of association
between paired observations (Xi,Yi), i = 1,…, n. Different coefficients of correlation
capture different notions of association, which leads to different results
of correlation analysis and different interpretations of those results. In class, we have
discussed three correlation coefficients: Kendall’s τ, Pearson’s r, and
Spearman’s ρ. The latter two are considered in this Lab.
2. Pearson coefficient of correlation
Definition. Consider a paired sample (Xi,Yi), i=1,…, n. The Pearson’s coefficient
of correlation is defined as
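$$ r(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
where \bar{X} and \bar{Y} denote the sample means of the Xi and the Yi.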
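For readers who want to check the definition outside Minitab, here is a minimal Python sketch that evaluates Pearson’s r term by term. The function name pearson_r and the two small data sets are illustrative assumptions; they are not taken from correlation.MTW.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the definition (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of products of deviations from the sample means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data for illustration only (not the lab data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_linear = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly y = 2x + 1
y_noisy = [2.0, 1.0, 4.0, 3.0, 5.0]    # increasing trend with some scatter

print(pearson_r(x, y_linear))
print(pearson_r(x, y_noisy))
```

An exactly linear increasing relationship gives r = 1, while scatter around the line pulls r below 1. Note that this sketch fails with a division by zero if either sample is constant.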
2.1 Calculation
Recall that the Pearson’s r measures linear relationships between observations.
In Minitab, Pearson’s r can be computed using the menu
Stat/Basic Statistics/Correlation
The correlation sub-window allows choosing variables for analysis and providing
other parameters. Notice that if you choose the Store matrix option, NOTHING
will be displayed.
Analysis results (values of r and the corresponding P-values) are shown in the
Session window:
2.2 Significance
The P-values computed by Minitab correspond to testing the hypothesis
H0: “The population correlation is 0” versus
Ha: “The population correlation is not 0”.
The test statistic is
$$ U = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, $$
which, when X and Y are independent and jointly Normal, has the Student t distribution
with (n-2) degrees of freedom.
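The test can be sketched in Python using only the standard library. The sketch below computes U from r and n, then approximates the two-sided P-value by integrating the Student t density numerically with Simpson’s rule. The function names and the integration scheme are assumptions made for illustration; this is not a description of how Minitab computes its P-values. The values of r and n = 100 are those of the example in Section 2.3.

```python
import math

def t_density(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(u, df, steps=2000):
    """Approximate P(|T| >= |u|) with Simpson's rule (steps must be even)."""
    a = abs(u)
    if a == 0:
        return 1.0
    h = 2 * a / steps
    total = t_density(-a, df) + t_density(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(-a + k * h, df)
    # 1 minus the probability mass inside [-|u|, |u|]; clamp rounding error
    return max(0.0, 1.0 - total * h / 3)

def pearson_test(r, n):
    """Return (U, two-sided P-value) for H0: population correlation is 0."""
    u = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return u, two_sided_p(u, n - 2)

# Correlations reported in Table 1 of the example, each based on n = 100
for pair, r in [("X,Y", -0.129), ("X,Z", 0.883), ("Y,Z", 0.350)]:
    u, p = pearson_test(r, 100)
    print(f"r({pair}) = {r:+.3f}   U = {u:+.2f}   P = {p:.4f}")
```

With these inputs the P-value for (X,Y) comes out close to 0.2 and the other two fall below 0.001, in line with Table 1.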
2.3 Example
In this example, we perform Pearson correlation analysis for three samples, X, Y,
and Z, each of length 100.
First, we construct a scatterplot to get an idea of possible relationships among
the observations (Fig. 1).
[Matrix plot of X, Y, Z: pairwise scatterplots, with all axes running from about -2 to 2]
Figure 1: Scatterplot of X, Y, and Z
The scatterplot suggests that the pair (X,Z) has a strong positive association
(positive correlation), the pair (X,Y) does not seem to have any association, and the
pair (Y,Z) might have a slight positive association.
Next, we proceed with a formal correlation analysis. The results are summarized
in Table 1, where the cells above the diagonal display the values of Pearson’s r
and the cells below the diagonal display the corresponding P-values. We see that our
visual impression is confirmed by the formal analysis: the correlations for the pairs
(X,Z) and (Y,Z) are significant (the corresponding P-values are less than 0.001),
while the correlation for the pair (X,Y) is not significant (the P-value is 0.2).
Table 1: Results of the correlation analysis for observations X, Y, and Z
(values above the diagonal: Pearson’s r; values below the diagonal: P-values)

        X         Y         Z
X       1         -0.129    0.883
Y       0.2       1         0.350
Z       <0.001    <0.001    1
Notice that although both correlations r(X,Z) and r(Y,Z) are significant, the first
one (0.883) reflects a strong association and the second one (0.350) reflects a weak
association between the random variables. Significance alone does not indicate the
strength of an association; the strength is given by the absolute value of r.
3. Spearman coefficient of correlation
Definition. Consider a paired sample (Xi,Yi), i=1,…, n. Let (RXi, RYi) be the
corresponding ranks for the sample. The Spearman’s coefficient of correlation is
defined as the Pearson’s correlation between the ranks:
ρ(X,Y) = r(RXi, RYi)
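The definition translates directly into a two-step computation: rank each sample, then apply Pearson’s r to the ranks. The Python sketch below illustrates this. The helper names are hypothetical, the data are made up, and assigning tied values the average of their ranks is an assumption of this sketch.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from the definition (repeated here to keep the sketch self-contained)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation between the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y increases with x, but not linearly
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [math.exp(2 * v) for v in x]

print(pearson_r(x, y))     # below 1: the relationship is not linear
print(spearman_rho(x, y))  # equal to 1: the relationship is monotone
```

The example shows why the two coefficients can disagree: Spearman’s ρ responds to any monotone relationship, while Pearson’s r responds only to the linear part of it.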
3.1 Calculation
Minitab has no built-in routine for Spearman’s correlation. However, you can
easily compute it by computing the ranks and then applying the Pearson’s
correlation to them.
The observation ranks are computed using the menu Data/Ranks:
Notice that, since you are now working with ranks, the P-values reported by the
Pearson’s correlation routine will not be valid.