13.3 Hypothesis test: multinomial population

Motivating example:
Objective:
We want to determine whether a new product from company C has changed the market shares.
$p_A$: market share for company A
$p_B$: market share for company B
$p_C$: market share for company C
We want to test
$$H_0: p_A = 0.3,\ p_B = 0.5,\ p_C = 0.2$$
vs.
$$H_a: \text{the population proportions are not } p_A = 0.3,\ p_B = 0.5,\ p_C = 0.2$$
with $\alpha = 0.05$. We also have the following information:
total sample size: $n = 200$
observed number (company A): $f_1 = 48$
observed number (company B): $f_2 = 98$
observed number (company C): $f_3 = 54$
$$f_1 + f_2 + f_3 = 48 + 98 + 54 = 200 = n$$
In addition, if $H_0$ is true, the expected numbers for the three companies are
expected number (company A): $e_1 = n p_A = 200 \times 0.3 = 60$
expected number (company B): $e_2 = n p_B = 200 \times 0.5 = 100$
expected number (company C): $e_3 = n p_C = 200 \times 0.2 = 40$.
Intuitively, if the differences between $f_i$ and $e_i$, $i = 1, 2, 3$, are small, that suggests $H_0$ may be true, since the observed numbers are then close to the numbers expected under $H_0$. On the other hand, if the differences between $f_i$ and $e_i$, $i = 1, 2, 3$, are large, that suggests $H_0$ may not be true: the expected numbers under $H_0$ would then differ substantially from the "true" expected numbers, producing large differences between the observed and expected numbers. The following statistic measures the difference between the observed numbers and the expected numbers:
$$\chi^2 = \sum_{i=1}^{3} \frac{(f_i - e_i)^2}{e_i} = \frac{(f_1 - e_1)^2}{e_1} + \frac{(f_2 - e_2)^2}{e_2} + \frac{(f_3 - e_3)^2}{e_3}$$
$$= \frac{(48 - 60)^2}{60} + \frac{(98 - 100)^2}{100} + \frac{(54 - 40)^2}{40} = 2.4 + 0.04 + 4.9 = 7.34$$
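The statistic is a single sum over the categories. A short Python sketch of the same computation (the helper name `chi_square_statistic` is hypothetical), applied to the motivating example:

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (f_i - e_i)^2 / e_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Motivating example: observed f = (48, 98, 54), expected e = (60, 100, 40)
stat = chi_square_statistic([48, 98, 54], [60, 100, 40])
print(round(stat, 2))  # 7.34
```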
General Case:
Suppose the population is divided into $k$ categories. We want to test
$$H_0: p_1 = a_1,\ p_2 = a_2,\ \ldots,\ p_k = a_k$$
vs.
$$H_a: \text{the population proportions are not } p_1 = a_1,\ p_2 = a_2,\ \ldots,\ p_k = a_k$$
where
$$\sum_{i=1}^{k} a_i = a_1 + a_2 + \cdots + a_k = 1.$$
We also have the following information:
total sample size: $n_T$
observed numbers: $f_i$, $i = 1, 2, \ldots, k$.
In addition, if $H_0$ is true, the expected numbers are
expected numbers: $e_i = n_T a_i$, $i = 1, 2, \ldots, k$.
The test statistic is
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(f_1 - e_1)^2}{e_1} + \frac{(f_2 - e_2)^2}{e_2} + \cdots + \frac{(f_k - e_k)^2}{e_k}.$$
Next question: how large must $\chi^2$ be to reject $H_0$?
Chi-Square Distribution:
$\chi^2_n$: the random variable distributed as a chi-square distribution with $n$ degrees of freedom.
Example:
$$P\left(\chi^2_{15} \ge x\right) = 0.1 \Rightarrow x = 22.307$$
$$P\left(\chi^2_{9} \ge x\right) = 0.9 \Rightarrow x = 4.168$$
$$P\left(\chi^2_{3} \ge x\right) = 0.05 \Rightarrow x = 7.815$$
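Such critical values usually come from a table or a statistics library. As a self-contained alternative, the sketch below uses only Python's standard library: it computes the upper tail $P(\chi^2_n \ge x)$ from the standard series for the regularized lower incomplete gamma function (with $s = n/2$, $y = x/2$), then finds the critical value by bisection. The function names `chi2_sf` and `chi2_critical` are our own, and the search interval $[0, 100]$ is an assumption that suits the small degrees of freedom in this section.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df >= x).

    Uses P(s, y) = y^s e^{-y} / Gamma(s) * sum_{n>=0} y^n / (s (s+1) ... (s+n))
    with s = df/2, y = x/2; adequate for the moderate values used here.
    """
    if x <= 0:
        return 1.0
    s, y = df / 2.0, x / 2.0
    term = 1.0 / s          # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= y / (s + n)  # next term: multiply by y / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(y) - y - math.lgamma(s)) * total
    return 1.0 - lower

def chi2_critical(alpha, df, hi=100.0):
    """Find c with P(chi2_df >= c) = alpha by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail still too heavy: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2.0

for df, alpha in [(15, 0.1), (9, 0.9), (3, 0.05)]:
    print(df, alpha, round(chi2_critical(alpha, df), 3))
```

Bisection works because the upper-tail probability decreases as $x$ grows; in practice a library routine (for example SciPy's chi-square distribution) would be the usual choice.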
Chi-Square Test:
Let
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(f_1 - e_1)^2}{e_1} + \frac{(f_2 - e_2)^2}{e_2} + \cdots + \frac{(f_k - e_k)^2}{e_k}.$$
The chi-square test with level of significance $\alpha$ for
$$H_0: p_1 = a_1,\ p_2 = a_2,\ \ldots,\ p_k = a_k$$
vs.
$$H_a: \text{the population proportions are not } p_1 = a_1,\ p_2 = a_2,\ \ldots,\ p_k = a_k$$
is to
reject $H_0$ if $\chi^2 > \chi^2_{k-1,\alpha}$;
not reject $H_0$ if $\chi^2 \le \chi^2_{k-1,\alpha}$,
where $\chi^2_{k-1,\alpha}$ can be obtained from
$$P\left(\chi^2_{k-1} \ge \chi^2_{k-1,\alpha}\right) = \alpha.$$
In addition,
$$\text{p-value} = P\left(\chi^2_{k-1} \ge \chi^2\right).$$
Note:
If $H_0$ is true, the random variable whose sample value is $\chi^2$ is approximately distributed as $\chi^2_{k-1}$.
$\chi^2$: the sample statistic.
$\chi^2_{k-1}$: the random variable distributed as a chi-square distribution with $k-1$ degrees of freedom; under $H_0$, $\chi^2$ is its sample value.
$\chi^2_{k-1,\alpha}$: the critical value satisfying $P\left(\chi^2_{k-1} \ge \chi^2_{k-1,\alpha}\right) = \alpha$.
Motivating Example (continue):
Since $k = 3$,
$$\chi^2 = 7.34 > 5.99 = \chi^2_{2,0.05} = \chi^2_{k-1,\alpha},$$
we reject $H_0$.
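With $k - 1 = 2$ degrees of freedom the chi-square upper tail has the closed form $P(\chi^2_2 \ge x) = e^{-x/2}$, so both the critical value ($-2\ln\alpha$) and the p-value can be computed directly. A sketch of the whole test for the motivating example; the closed form is specific to 2 degrees of freedom and the name `chi2_sf_df2` is hypothetical:

```python
import math

# For 2 degrees of freedom, P(chi2_2 >= x) = exp(-x / 2).
# This shortcut does NOT hold for other degrees of freedom.
def chi2_sf_df2(x):
    return math.exp(-x / 2.0)

alpha = 0.05
observed = [48, 98, 54]
expected = [60, 100, 40]
stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

critical = -2.0 * math.log(alpha)   # solves exp(-c/2) = alpha; about 5.99
p_value = chi2_sf_df2(stat)         # about 0.0255
reject = stat > critical
print(f"chi2 = {stat:.2f}, critical = {critical:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value being below $\alpha = 0.05$ agrees with the decision to reject $H_0$.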
Example:
The following data are the frequencies of the outcomes of throwing a die 120 times:
Point        1    2    3    4    5    6
Frequency   13   24   18   22   19   24
Please test whether the die is fair (i.e., $H_0: p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$) with $\alpha = 0.05$.
[solution:]
$$e_i = 120 \times \frac{1}{6} = 20,\quad i = 1, 2, \ldots, 6.$$
Then,
$$\chi^2 = \sum_{i=1}^{6} \frac{(f_i - e_i)^2}{e_i} = \frac{(13 - 20)^2}{20} + \frac{(24 - 20)^2}{20} + \frac{(18 - 20)^2}{20} + \frac{(22 - 20)^2}{20} + \frac{(19 - 20)^2}{20} + \frac{(24 - 20)^2}{20} = 4.5$$
Since
$$\chi^2 = 4.5 < 11.0705 = \chi^2_{5,0.05} = \chi^2_{k-1,\alpha},$$
we do not reject $H_0$.
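The die example follows the same recipe with equal hypothesized proportions. A minimal Python sketch; the critical value $\chi^2_{5,0.05} = 11.0705$ is taken from the text rather than computed:

```python
# Fair-die test: 120 throws, H0: each face has probability 1/6.
observed = [13, 24, 18, 22, 19, 24]
n = sum(observed)                  # 120 throws
expected = [n / 6] * 6             # 20 per face under H0

stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1             # k - 1 = 5

# chi2_{5, 0.05}, copied from the text (not computed here)
critical = 11.0705
print(round(stat, 2), df, "reject H0" if stat > critical else "do not reject H0")
```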
Example:
The following table gives the number of students with each number of wrong answers:
Number of wrong answers    0    1    2    3
Number of students        21   31   12    0
Suppose $X$ is the random variable representing the number of wrong answers. Please test whether $X$ is distributed as Binomial(3, 0.25) with $\alpha = 0.05$.
(Note: the probability function of Binomial(3, 0.25) is
$$f(x) = \binom{3}{x} 0.25^x\, 0.75^{3-x},\quad x = 0, 1, 2, 3.)$$
[solutions:]
If $H_0$ is true, the distribution of the number of wrong answers is
$$p_1 = P(X = 0) = \binom{3}{0} 0.25^0\, 0.75^3 = \frac{27}{64},$$
$$p_2 = P(X = 1) = \binom{3}{1} 0.25^1\, 0.75^2 = \frac{27}{64},$$
$$p_3 = P(X = 2) = \binom{3}{2} 0.25^2\, 0.75^1 = \frac{9}{64},$$
$$p_4 = P(X = 3) = \binom{3}{3} 0.25^3\, 0.75^0 = \frac{1}{64}.$$
Since the sample size is $n = 21 + 31 + 12 + 0 = 64$, under $H_0$ the expected numbers are
$$e_1 = n p_1 = 64 \times \frac{27}{64} = 27,\quad e_2 = n p_2 = 64 \times \frac{27}{64} = 27,$$
$$e_3 = n p_3 = 64 \times \frac{9}{64} = 9,\quad e_4 = n p_4 = 64 \times \frac{1}{64} = 1.$$
Therefore,
$$\chi^2 = \sum_{i=1}^{4} \frac{(f_i - e_i)^2}{e_i} = \frac{(21 - 27)^2}{27} + \frac{(31 - 27)^2}{27} + \frac{(12 - 9)^2}{9} + \frac{(0 - 1)^2}{1} = \frac{106}{27} \approx 3.93$$
Since
$$\chi^2 = 3.93 < 7.81 = \chi^2_{3,0.05} = \chi^2_{k-1,\alpha},$$
we do not reject $H_0$.
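For 3 degrees of freedom the upper tail also has a closed form, $P(\chi^2_3 \ge x) = \operatorname{erfc}\left(\sqrt{x/2}\right) + \sqrt{2x/\pi}\, e^{-x/2}$, which gives the p-value of the binomial example with the standard library alone. A sketch under that identity (valid only for 3 degrees of freedom; the name `chi2_sf_df3` is hypothetical); the critical value 7.81 is copied from the text:

```python
import math
from fractions import Fraction

observed = [21, 31, 12, 0]
expected = [27, 27, 9, 1]   # n * p_i under H0: Binomial(3, 0.25), n = 64

# Exact statistic: 36/27 + 16/27 + 9/9 + 1/1 = 106/27
stat = sum(Fraction((f - e) ** 2, e) for f, e in zip(observed, expected))

def chi2_sf_df3(x):
    """P(chi2_3 >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2); 3 df only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

x = float(stat)
p_value = chi2_sf_df3(x)
critical = 7.81             # chi2_{3, 0.05} as quoted in the text
print(stat, round(x, 2), round(p_value, 2), x > critical)
```

The p-value is well above $\alpha = 0.05$, consistent with not rejecting $H_0$.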
Online Exercise:
Exercise 13.3.1
Exercise 13.3.2