4.2 Numerical measures

4.2 Numerical Measures of Association:
There are several numerical measures of association. We first introduce the covariance
of two variables.
(I) Covariance:
Suppose we have two populations,
population 1: $y_1, y_2, \ldots, y_N$
and population 2: $w_1, w_2, \ldots, w_N$.
Also, suppose
sample 1: $x_1, x_2, \ldots, x_n$
and sample 2: $z_1, z_2, \ldots, z_n$
are drawn from population 1 and population 2, respectively.
Let $\mu_y$ and $\mu_w$ be the population means of populations 1 and 2, respectively.
Let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}$$
be the sample means of samples 1 and 2, respectively.
Then, the population covariance is
$$\sigma_{yw} = \frac{\sum_{i=1}^{N} (y_i - \mu_y)(w_i - \mu_w)}{N},$$
while the sample covariance is
$$s_{xz} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(z_i - \bar{z})}{n - 1} = \frac{\sum_{i=1}^{n} x_i z_i - n \bar{x} \bar{z}}{n - 1}.$$
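The second form of the sample covariance follows from expanding the product in the numerator and using $\sum_{i=1}^{n} x_i = n\bar{x}$ and $\sum_{i=1}^{n} z_i = n\bar{z}$. A short derivation, written out here for convenience:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})(z_i - \bar{z})
&= \sum_{i=1}^{n} x_i z_i - \bar{z} \sum_{i=1}^{n} x_i - \bar{x} \sum_{i=1}^{n} z_i + n \bar{x} \bar{z} \\
&= \sum_{i=1}^{n} x_i z_i - n \bar{x} \bar{z} - n \bar{x} \bar{z} + n \bar{x} \bar{z} \\
&= \sum_{i=1}^{n} x_i z_i - n \bar{x} \bar{z}.
\end{aligned}
$$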
Intuitively, $s_{xz}$ is large and positive when the paired observations in the two samples tend to lie above, or below, their respective sample means at the same time. That is, the observations are positively correlated. On the other hand, $s_{xz}$ is large and negative when the observations in one sample tend to lie above their sample mean while the paired observations in the other sample lie below theirs. In that case, the observations are negatively correlated. Finally, $s_{xz}$ is close to 0 when observations above the sample mean in one sample are paired sometimes with larger and sometimes with smaller than average observations in the other, i.e., the observations in the two samples are not correlated.
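The three cases described above can be checked numerically. The following is a minimal Python sketch; the helper name `sample_covariance` and the three small data sets are made up for illustration and do not come from these notes.

```python
def sample_covariance(x, z):
    """Sample covariance s_xz, using the n - 1 denominator."""
    n = len(x)
    if n != len(z) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    return sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / (n - 1)


# Hypothetical data, chosen only to illustrate the sign of s_xz.
x = [1, 2, 3, 4, 5]
z_together = [2, 4, 6, 8, 10]  # above or below its mean together with x
z_opposite = [10, 8, 6, 4, 2]  # above its mean when x is below, and vice versa
z_mixed = [4, 1, 0, 1, 4]      # no consistent direction relative to x

cov_together = sample_covariance(x, z_together)  # positive
cov_opposite = sample_covariance(x, z_opposite)  # negative
cov_mixed = sample_covariance(x, z_mixed)        # zero for this data set

print(cov_together, cov_opposite, cov_mixed)
```

Note that `z_mixed` is clearly related to `x` (it is symmetric around the middle value), yet its covariance is zero: the covariance only picks up a linear tendency.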
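Example:
Let $x_i$ be the total amount of money spent on advertising for some product and $z_i$ be the sales volume (1 unit = 100 packs). Here $\bar{x} = 3$ and $\bar{z} = 51$.

| $i$ | $x_i$ | $z_i$ | $(x_i - \bar{x})(z_i - \bar{z})$ |
|-----|-------|-------|----------------------------------|
| 1   | 2     | 50    | 1                                |
| 2   | 5     | 57    | 12                               |
| 3   | 1     | 41    | 20                               |
| 4   | 3     | 54    | 0                                |
| 5   | 4     | 54    | 3                                |
| 6   | 1     | 38    | 26                               |
| 7   | 5     | 63    | 24                               |
| 8   | 3     | 48    | 0                                |
| 9   | 4     | 59    | 8                                |
| 10  | 2     | 46    | 5                                |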
Then,
$$s_{xz} = \frac{\sum_{i=1}^{10} (x_i - \bar{x})(z_i - \bar{z})}{10 - 1} = \frac{99}{10 - 1} = 11.$$
Note: $s_{xz}$ is not scale-invariant. For example, if the sales volume in the above example is measured with 1 unit = 1 pack, then the $z_i$ would be 5000, 5700, 4100, 5400, 5400, 3800, 6300, 4800, 5900, 4600. Thus, $s_{xz}$ would be 1100, which is 100 times larger than the original value. This is undesirable, since the strength of the association between the money spent on advertising and the sales volume should not change when the measurement unit changes. The quantity introduced next is scale-invariant and can be used to measure the correlation of two populations.
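The example can be reproduced with a few lines of Python. This is only a sketch: the function names are chosen here for illustration, and both forms of the sample covariance formula are computed so they can be compared.

```python
# Data from the advertising example above.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]            # money spent on advertising
z = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # sales volume, 1 unit = 100 packs


def sample_covariance(x, z):
    """Definition form: sum of products of deviations over n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    return sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / (n - 1)


def sample_covariance_shortcut(x, z):
    """Shortcut form: (sum of x_i * z_i minus n * x_bar * z_bar) over n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    return (sum(xi * zi for xi, zi in zip(x, z)) - n * x_bar * z_bar) / (n - 1)


# Same sales figures expressed with 1 unit = 1 pack.
z_packs = [100 * zi for zi in z]

s_xz = sample_covariance(x, z)
s_xz_shortcut = sample_covariance_shortcut(x, z)
s_xz_packs = sample_covariance(x, z_packs)

print(s_xz, s_xz_shortcut, s_xz_packs)
```

Rescaling one variable by a constant rescales the covariance by the same constant, which is why the value jumps from 11 to 1100 when the unit changes.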
(II) Correlation Coefficient:
Let
$\sigma_y$: population standard deviation for $y_1, y_2, \ldots, y_N$,
$\sigma_w$: population standard deviation for $w_1, w_2, \ldots, w_N$,
$s_x$: sample standard deviation for $x_1, x_2, \ldots, x_n$,
$s_z$: sample standard deviation for $z_1, z_2, \ldots, z_n$.
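Then, the population correlation coefficient is
$$\rho_{yw} = \frac{\sigma_{yw}}{\sigma_y \sigma_w},$$
while the sample correlation coefficient is
$$r_{xz} = \frac{s_{xz}}{s_x s_z} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(z_i - \bar{z})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (z_i - \bar{z})^2}}.$$
Note: $|\rho_{yw}| \le 1$ and $|r_{xz}| \le 1$.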
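One way to see why the bound on the sample correlation coefficient holds is the Cauchy-Schwarz inequality. Writing $a_i = x_i - \bar{x}$ and $b_i = z_i - \bar{z}$ (shorthand introduced here, not used elsewhere in these notes), a sketch of the argument is:

$$
\left( \sum_{i=1}^{n} a_i b_i \right)^2 \le \left( \sum_{i=1}^{n} a_i^2 \right) \left( \sum_{i=1}^{n} b_i^2 \right)
\quad \Longrightarrow \quad
|r_{xz}| = \frac{\left| \sum_{i=1}^{n} a_i b_i \right|}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}} \le 1.
$$

The same argument applied to $y_i - \mu_y$ and $w_i - \mu_w$ gives the bound for the population correlation coefficient.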
Example (continued):
$$s_x^2 = \frac{\sum_{i=1}^{10} (x_i - \bar{x})^2}{10 - 1} = 1.4907^2 \quad \text{and} \quad s_z^2 = \frac{\sum_{i=1}^{10} (z_i - \bar{z})^2}{10 - 1} = 7.9303^2.$$
Then,
$$r_{xz} = \frac{s_{xz}}{s_x s_z} = \frac{11}{1.4907 \times 7.9303} \approx 0.93.$$
Note: $r_{xz}$ is scale-invariant. For example, even if the sales volume is measured with 1 unit = 1 pack, the value of $r_{xz}$ is still the same, 0.93.
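The value 0.93 and the scale invariance can be verified with a short Python sketch. The function name `sample_correlation` is an illustrative choice; the data are those of the advertising example.

```python
import math


def sample_correlation(x, z):
    """Sample correlation r_xz = s_xz / (s_x * s_z).

    The n - 1 factors in s_xz, s_x and s_z cancel, so only the sums of
    squared deviations and cross-products are needed.
    """
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sum_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    sum_zz = sum((zi - z_bar) ** 2 for zi in z)
    return sum_xz / math.sqrt(sum_xx * sum_zz)


x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]            # money spent on advertising
z = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # sales volume, 1 unit = 100 packs
z_packs = [100 * zi for zi in z]              # sales volume, 1 unit = 1 pack

r = sample_correlation(x, z)
r_packs = sample_correlation(x, z_packs)

# Both values agree up to floating-point rounding and round to 0.93.
print(round(r, 2), round(r_packs, 2))
```

Multiplying $z_i$ by 100 multiplies both $s_{xz}$ and $s_z$ by 100, so the factor cancels in the ratio.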
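Example:
Let $z_i = 2 x_i$, $i = 1, 2, 3, 4, 5$.

| $x_i$ | 1 | 2 | 3 | 4 | 5  |
|-------|---|---|---|---|----|
| $z_i$ | 2 | 4 | 6 | 8 | 10 |

Then,
$$\bar{x} = 3, \quad \bar{z} = 6, \quad s_x = \sqrt{\frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1}} = \sqrt{\frac{5}{2}}, \quad s_z = \sqrt{\frac{\sum_{i=1}^{5} (z_i - \bar{z})^2}{5 - 1}} = \sqrt{10},$$
$$s_{xz} = \frac{\sum_{i=1}^{5} (x_i - \bar{x})(z_i - \bar{z})}{5 - 1} = 5.$$
Thus,
$$r_{xz} = \frac{s_{xz}}{s_x s_z} = \frac{5}{\sqrt{\frac{5}{2}} \, \sqrt{10}} = 1.$$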
Note: when there is a perfect positive linear relationship between the variables $x$ and $z$, then $r_{xz} = 1$. A value of $r_{xz}$ close to 1 might indicate a positive linear relationship.
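A small Python sketch can illustrate both statements in the note. The exact line $z_i = 2 x_i$ is the example above; the second data set, `z_noisy`, is hypothetical and was made up here to show a value close to but below 1.

```python
import math


def sample_correlation(x, z):
    """Sample correlation r_xz; the n - 1 factors cancel in the ratio."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sum_xz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    sum_zz = sum((zi - z_bar) ** 2 for zi in z)
    return sum_xz / math.sqrt(sum_xx * sum_zz)


x = [1, 2, 3, 4, 5]
z_exact = [2 * xi for xi in x]  # perfect positive linear relationship
z_noisy = [2, 5, 5, 9, 10]      # hypothetical: roughly linear, not exact

r_exact = sample_correlation(x, z_exact)  # equals 1 up to rounding
r_noisy = sample_correlation(x, z_noisy)  # close to, but below, 1

print(r_exact, r_noisy)
```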
Online Exercise:
Exercise 4.2.1