Statistical Analysis
Professor Lynne Stokes
Department of Statistical Science
Lecture #2
Chi-square Tests for Homogeneity,
Chi-square Goodness of Fit Test,

Chi-square Tests
1. Tests for independence in contingency tables
2. Tests for homogeneity

Binomial Samples
(Product Binomial Sampling)

Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

Sample       1     2     3     4     5     6     7     8   Total
Wrinkled    60   108    80   118   165   106   105    90     832
Smooth      40    92   100    90   135    76   125   110     768
Total      100   200   180   208   300   182   230   200    1600

Assumptions: 8 samples, mutually independent counts

Hypothesis #1: Is pW = 0.5?
• Binomial inference on p
• Equivalently, overall goodness of fit (known p)
Hypothesis #2: Are all the pW equal?
• Test for homogeneity (equal but unknown p)
Hypothesis #3: Is each pW = 0.5?
• Goodness of fit (8 samples, known p)

Test of Homogeneity of k Binomial Samples, Specified p
Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j

Sample          1      2      3      4      5      6      7      8   Total
Wrinkled       60    108     80    118    165    106    105     90     832
Smooth         40     92    100     90    135     76    125    110     768
Total         100    200    180    208    300    182    230    200    1600
Chi-square   4.00   1.28   2.22   3.77   3.00   4.95   1.74   2.00   22.96

$\chi^2(k) = \sum_{j=1}^{k} \chi^2_j(1)$
Does not assume homogeneity (see below)
X² = 22.96, df = 8, p = 0.003

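The per-sample chi-square values and their sum can be checked with a short sketch. It assumes only the counts in the table and the specified p = 0.5; the helper name `chi2_sf_even_df` is hypothetical, and it uses a closed form for the chi-square upper tail that holds only for even degrees of freedom (df = 8 here).

```python
import math

# Wrinkled / smooth counts for the 8 samples (from the table above).
wrinkled = [60, 108, 80, 118, 165, 106, 105, 90]
smooth = [40, 92, 100, 90, 135, 76, 125, 110]
p0 = 0.5  # specified probability of a wrinkled seed under Ho


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# One chi-square value (1 df each) per sample, using the known p0.
per_sample = []
for w, s in zip(wrinkled, smooth):
    n = w + s
    e_w, e_s = n * p0, n * (1 - p0)
    per_sample.append((w - e_w) ** 2 / e_w + (s - e_s) ** 2 / e_s)

x2_total = sum(per_sample)   # sum of k independent 1-df chi-squares
df = len(per_sample)         # k = 8, since no parameter is estimated
p_value = chi2_sf_even_df(x2_total, df)

print([round(v, 2) for v in per_sample])
print(round(x2_total, 2), df, round(p_value, 3))
```

Because p is specified rather than estimated, each sample contributes a full degree of freedom, which is why df = k and not k - 1.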
Test of Homogeneity of k Binomial Samples: Unspecified p
Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j,k)

Sample       1     2     3     4     5     6     7     8   Total
Wrinkled    60   108    80   118   165   106   105    90     832
Smooth      40    92   100    90   135    76   125   110     768
Total      100   200   180   208   300   182   230   200    1600

Expected (Wrinkled, Smooth):
$E_{ij} = C_j \hat{p}_i$, $\hat{p}_i = \frac{R_i}{n}$, where $R_i$ = ith row total and $C_j$ = jth column total, so
$E_{ij} = \frac{R_i C_j}{n}$

Test of Homogeneity of k Binomial Samples: Unspecified p
Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j,k)

Observed
Sample          1       2       3       4       5       6       7       8   Total
Wrinkled       60     108      80     118     165     106     105      90     832
Smooth         40      92     100      90     135      76     125     110     768
Total         100     200     180     208     300     182     230     200    1600

Expected
Wrinkled    52.00  104.00   93.60  108.16  156.00   94.64  119.60  104.00     832
Smooth      48.00   96.00   86.40   99.84  144.00   87.36  110.40   96.00     768

Chi-square
Wrinkled     1.23    0.15    1.98    0.90    0.52    1.36    1.78    1.88
Smooth       1.33    0.17    2.14    0.97    0.56    1.48    1.93    2.04
Total                                                                       20.43

$\hat{p} = 0.52$
X² = 20.43, df = 7, p = 0.005
Note: df = 8 − 1 = 7 (only one parameter is estimated, since $\hat{p}_2 = 1 - \hat{p}_1$)
Note: Only one of each pair of expected values is independently estimated (k = 8, not 16)

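As a rough check of the expected counts and the statistic above, the following sketch applies E_ij = R_i C_j / n to the table. It is an illustration only: the helper `chi2_sf_odd_df` is a hypothetical name, and it relies on a closed form for the chi-square upper tail that holds for odd degrees of freedom (df = 7 here).

```python
import math

# Observed counts per sample (columns of the table above).
wrinkled = [60, 108, 80, 118, 165, 106, 105, 90]
smooth = [40, 92, 100, 90, 135, 76, 125, 110]

sizes = [w + s for w, s in zip(wrinkled, smooth)]  # column totals C_j
n = sum(sizes)
p_hat = sum(wrinkled) / n                          # pooled estimate R_1 / n


def chi2_sf_odd_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for odd df only."""
    if df % 2 != 1:
        raise ValueError("this closed form requires an odd df")
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))              # 2 * P(Z > sqrt(x))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term                                   # root^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2.0 * density * series


# Expected counts under homogeneity: sample size times the pooled proportion.
expected_w = [c * p_hat for c in sizes]
expected_s = [c * (1 - p_hat) for c in sizes]

x2 = sum((o - e) ** 2 / e for o, e in zip(wrinkled, expected_w))
x2 += sum((o - e) ** 2 / e for o, e in zip(smooth, expected_s))

df = len(sizes) - 1   # 8 samples, one estimated parameter
p_value = chi2_sf_odd_df(x2, df)
print(round(p_hat, 2), round(x2, 2), df, round(p_value, 3))
```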
Chi-square Tests
1. Tests for independence in contingency tables
2. Tests for homogeneity
3. Goodness of fit tests

Chi-square Goodness of Fit Test: Specified Probabilities
Assumptions
• n independent observations
• k mutually exclusive possible outcomes
• pj = Pr(outcome j) is the same on every trial
Sample size condition
• All npj ≥ 1
• At least 80% of the npj ≥ 5

Goodness of Fit Test: Specified Probabilities
Sample size: n
Observed count for outcome j: Oj
Expected count for outcome j: Ej = npj
Ho: Pr(outcome j) = pj for j = 1, ..., k
Ha: Pr(outcome j) ≠ pj for at least one j
$X^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$
Reject Ho if $X^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ is the chi-square critical value with df = k - 1

Cognitive Learning

Path Chosen         A    B    C    D   Total
Number of rats      4    5    8   15      32
Expected number     8    8    8    8      32

$H_0: p_j = \frac{1}{4},\ j = 1, 2, 3, 4$ vs. $H_a: p_j \neq \frac{1}{4}$ for some j

$X^2 = \frac{(4 - 8)^2}{8} + \frac{(5 - 8)^2}{8} + \frac{(8 - 8)^2}{8} + \frac{(15 - 8)^2}{8} = 2.00 + 1.125 + 0.00 + 6.125 = 9.25$
p = 0.026

Sufficient Evidence of Cognitive Learning?
Using a significance level of α = 0.05, there is sufficient evidence (p = 0.026) to reject the hypothesis that rats choose the 4 doors with equal probability.

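The goodness of fit calculation for the rat data can be sketched as follows. The function names `gof_statistic` and `chi2_sf_df3` are hypothetical helpers, and the p-value uses a closed form that applies only to 3 degrees of freedom (k - 1 with k = 4 paths).

```python
import math


def gof_statistic(observed, probs):
    """Return (X^2, expected counts) for specified class probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


rats = [4, 5, 8, 15]              # paths A, B, C, D
x2, expected = gof_statistic(rats, [0.25] * 4)
p_value = chi2_sf_df3(x2)
print(expected, round(x2, 2), round(p_value, 3))
```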
Mendelian Inheritance
Do the genotypes of a cross-breeding occur in the ratio 9:3:3:1?
$H_0: p_1 = \frac{9}{16},\ p_2 = p_3 = \frac{3}{16},\ p_4 = \frac{1}{16}$
Ha: Some probabilities differ

Genotype   Observed   Expected
1               150        144
2                46         48
3                40         48
4                20         16
Total           256        256

Reject Ho if X² > 7.815 (α = 0.05)

Mendelian Inheritance

Genotype   Observed   Expected   (Oj - Ej)²/Ej
1               150        144            0.25
2                46         48            0.08
3                40         48            1.33
4                20         16            1.00
Total           256        256

X² = 0.25 + 0.08 + 1.33 + 1.00 = 2.66 (2.67 if the terms are not rounded first)
There is insufficient evidence (p > 0.10) at a significance level of 0.05 to conclude that the genotypes from this type of cross-breeding occur in proportions that differ from those predicted by Mendelian inheritance theory.
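A minimal sketch of the Mendelian calculation, assuming the observed counts above and the 9:3:3:1 ratio. It derives the expected counts from the ratio and compares the statistic with the critical value 7.815 quoted on the slide; the variable names are illustrative.

```python
ratio = [9, 3, 3, 1]                # hypothesized genotype ratio
observed = [150, 46, 40, 20]

n = sum(observed)
probs = [r / sum(ratio) for r in ratio]   # 9/16, 3/16, 3/16, 1/16
expected = [n * p for p in probs]

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

critical_value = 7.815              # chi-square, df = 3, alpha = 0.05 (from the slide)
reject = x2 > critical_value
print(expected, [round(c, 2) for c in contributions], round(x2, 2), reject)
```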
Chi-Square Goodness of Fit Test: Unknown Parameters
• Estimate the parameters of the distribution
• Divide range of data values into mutually exclusive and exhaustive classes
   • Discrete data: often use the values themselves
   • Continuous data: use k = n^(1/2) or k = log(n) classes
• Estimate the probability of being in each class
• Compare the observed (Oi) counts in each class with the estimated expected (Ei) counts

$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k - r - 1)$, r = # estimated parameters

Chi-Square Goodness of Fit Test for the Poisson Distribution
Number of senders (automated telephone equipment) in use at a given time
H0: number ~ Poisson
Ha: number not Poisson
Reject if $X^2 > \chi^2_{0.05}(20) = 31.4$
df: 22 – 1 (mutually exclusive & exhaustive) – 1 (estimated parameter) = 20
$\hat{\lambda} = 10.4378$, $X^2 = 43.16$

Number    Observed    Estimated     Expected    (Obs - Exp)²
in Use    Frequency   Probability   Frequency       / Exp
 0              0       0.0000          0.11
 1              5       0.0003          1.15        11.16
 2             14       0.0016          5.98        10.74
 3             24       0.0055         20.82         0.49
 4             57       0.0145         54.34         0.13
 5            111       0.0302        113.46         0.05
 6            197       0.0526        197.41         0.00
 7            278       0.0784        294.42         0.92
 8            378       0.1023        384.21         0.10
 9            418       0.1187        445.67         1.72
10            461       0.1239        465.27         0.04
11            433       0.1176        441.58         0.17
12            413       0.1023        384.16         2.16
13            358       0.0822        308.51         7.94
14            219       0.0613        230.05         0.53
15            145       0.0427        160.11         1.43
16            109       0.0278        104.47         0.20
17             57       0.0171         64.16         0.80
18             43       0.0099         37.21         0.90
19             16       0.0054         20.45         0.97
20              7       0.0028         10.67         1.26
21              8       0.0014          5.31         1.37
22              3       0.0007          2.52         0.09
Total        3754       0.9995          3752        43.16

23 – 1 = 22 categories (the 11.16 entry corresponds to classes 0 and 1 combined)
p = 0.002

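The Poisson fit can be sketched from the observed frequencies alone. This is an illustration under stated assumptions: the Poisson mean is estimated by the sample mean of the tabulated counts (about 10.44), classes 0 and 1 are combined (an inference from the 22 chi-square terms in the table), and `chi2_sf_even_df` is a hypothetical helper using a closed form valid for even df.

```python
import math

# Observed frequencies for 0, 1, ..., 22 senders in use (from the table above).
observed = [0, 5, 14, 24, 57, 111, 197, 278, 378, 418, 461, 433,
            413, 358, 219, 145, 109, 57, 43, 16, 7, 8, 3]

n = sum(observed)
lam = sum(k * o for k, o in enumerate(observed)) / n   # estimated Poisson mean

# Estimated class probabilities and expected frequencies.
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(len(observed))]
expected = [n * p for p in probs]

# Assumption: combine classes 0 and 1, leaving 22 categories.
obs_c = [observed[0] + observed[1]] + observed[2:]
exp_c = [expected[0] + expected[1]] + expected[2:]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
df = len(obs_c) - 1 - 1        # categories - 1 - one estimated parameter


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(x2, df)
print(round(lam, 4), round(x2, 2), df, round(p_value, 3), x2 > 31.4)
```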
Chi-Square Goodness of Fit Test for the Normal Distribution
• Divide the data into mutually exclusive and exhaustive (contiguous) classes
   • First and last classes are open ended
   • $(-\infty, U_1), (L_2, U_2), (L_3, U_3), \ldots, (L_k, \infty)$ with $L_j = U_{j-1}$
• Estimate the mean and standard deviation
• Calculate z-scores for the limits of each class
   • $z_{L_j} = \frac{L_j - \bar{y}}{s_y}$, $z_{U_j} = \frac{U_j - \bar{y}}{s_y}$

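One way to turn the class limits into expected counts is sketched below. The class limits, mean, standard deviation, and sample size are hypothetical illustration values, not data from the lecture, and `normal_class_probabilities` is an assumed helper name.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_class_probabilities(upper_limits, mean, sd):
    """Class probabilities for (-inf, U1], (U1, U2], ..., (U_last, +inf)."""
    z_scores = [(u - mean) / sd for u in upper_limits]   # z-score of each class limit
    cdf = [0.0] + [normal_cdf(z) for z in z_scores] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical values for illustration only.
upper_limits = [40.0, 50.0, 60.0]     # gives k = 4 classes, first and last open ended
y_bar, s_y, n = 50.0, 10.0, 200       # estimated mean, estimated sd, sample size

probs = normal_class_probabilities(upper_limits, y_bar, s_y)
expected = [n * p for p in probs]
df = len(probs) - 2 - 1               # k - r - 1 with r = 2 estimated parameters
print([round(p, 4) for p in probs], [round(e, 1) for e in expected], df)
```

With the expected counts in hand, the statistic is the same sum of (O - E)²/E used for the other goodness of fit tests.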
Chi-Square Goodness of Fit Test
• Can be applied to any discrete or continuous probability distribution; only probabilities need be specified: Ei = npi
• Asymptotic chi-square distribution
   • All Ei > 1 and at least 80% of the Ei > 5
• Does not have the highest power for specific distributions, against specific alternatives
• Degrees of freedom (k classes)
   • If each class represents an independent sample (i.e., k replicate samples) and all parameters are known (i.e., known probabilities), df = k
   • If the classes represent mutually exclusive and exhaustive categories (i.e., expected frequencies must sum to n), data are independent and from a single sample
      • All parameters are known: df = k – 1
      • r parameters are estimated: df = k – r – 1
        e.g., $(n - 1)s^2/\sigma^2 \sim \chi^2(n - 1)$

Goodness of Fit to the Binomial, Known p
• Normal theory approximation
• Chi-square tests

Binomial Sample, Specified p: Normal Theory Approximation
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

Sample       1     2     3     4     5     6     7     8   Total
Wrinkled    60   108    80   118   165   106   105    90     832
Smooth      40    92   100    90   135    76   125   110     768
Total      100   200   180   208   300   182   230   200    1600

Greater Power by Combining Samples (Assuming Homogeneity)
$z = \frac{832 - 800}{\sqrt{1600(0.5)(0.5)}} = 1.600$
p = 0.110

Alternative to the Binomial Test: Chi-square Goodness of Fit, Specified p
Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

            Wrinkled   Smooth   Total
Observed         832      768    1600
Expected         800      800    1600

$X^2 = \frac{(832 - 800)^2}{800} + \frac{(768 - 800)^2}{800} = 2.56$
p = 0.110
$z^2 = (1.60)^2 = 2.56$

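The agreement between the normal theory test and the 1-df chi-square test can be verified numerically. A minimal sketch using the pooled counts; the two-sided p-value comes from the standard normal tail via `math.erfc`.

```python
import math

wrinkled, smooth, p0 = 832, 768, 0.5
n = wrinkled + smooth

# Normal theory approximation to the binomial test.
z = (wrinkled - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # 2 * P(Z > |z|)

# Chi-square goodness of fit with two categories (df = 1).
e_w, e_s = n * p0, n * (1 - p0)
x2 = (wrinkled - e_w) ** 2 / e_w + (smooth - e_s) ** 2 / e_s

print(round(z, 3), round(x2, 2), round(p_two_sided, 3))
```

Since a chi-square variable with 1 df is the square of a standard normal, the two tests give the same p-value.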
Overall Binomial Test vs. Test of Homogeneity, Specified p
Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j

Sample           1       2       3       4       5       6       7       8   Total
Wrinkled        60     108      80     118     165     106     105      90     832
Smooth          40      92     100      90     135      76     125     110     768
Total          100     200     180     208     300     182     230     200    1600
Chi-square    4.00    1.28    2.22    3.77    3.00    4.95    1.74    2.00   22.96
p-value      0.046   0.258   0.136   0.052   0.083   0.026   0.187   0.157   0.003

Overall test: X² = 2.56, df = 1, p = 0.110 (Greater Power if Homogeneous)
Homogeneity test: X² = 22.96, df = 8, p = 0.003 (Greater Power if Not Homogeneous)
Note: 5 of the $\hat{p} > 0.5$ and 3 of the $\hat{p} < 0.5$

Binomial Samples

Sample       1     2     3     4     5     6     7     8   Total
Wrinkled    60   108    80   118   165   106   105    90     832
Smooth      40    92   100    90   135    76   125   110     768
Total      100   200   180   208   300   182   230   200    1600

Homogeneity Test of $H_0: p_j = 0.5$: $\chi^2 = 22.96$
Overall Test of $H_0: p_W = 0.5$: $\chi^2 = 2.56$
Homogeneity Test of $H_0: p_j = p_W$ ($p_W$ unspecified): $\chi^2 = 20.43$
Note: $p_j = p_W\ \forall j \iff p_{ij} = p_i p_j\ \forall (i, j)$
More Common: $H_0: p_{ij} = p_i p_j$ vs. $H_a: p_{ij} \neq p_i p_j$
Homogeneity, unspecified p equivalent to independence

Some Goodness of Fit Tests
• Chi-square goodness-of-fit test
   • Very general, can have little power
• Kolmogorov-Smirnov goodness-of-fit test
   • Good general test, especially for continuous random variables
• Shapiro-Wilk test for normality
   • Regarded as the best test for normality

Comparing Odds Ratios Across Categories

Race and Death Penalty Punishment Across Aggravation Levels

                 Death Penalty
Victim's Race    Yes     No   Total
White             45     85     130
Black             14    218     232
Total             59    303     362

Expected Frequencies
                 Death Penalty
Victim's Race        Yes         No   Total
White            21.1878   108.8122     130
Black            37.8122   194.1878     232
Total                 59        303     362

Column Percentages
                 Death Penalty
Victim's Race    Yes     No   Total
White           76.3   28.1    35.9
Black           23.7   71.9    64.1
Total            100    100     100

Cell Chi-square Values
                 Death Penalty
Victim's Race         Yes        No      Total
White             25.6494    4.9944    30.6439
Black             14.3725    2.7986   17.17115
Total            40.02199   7.79306   47.81504

Chi-square Value: 47.82, p-Value: < 0.0001
Are the results consistent across aggravation levels?

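The expected frequencies and the overall chi-square value for the pooled 2 x 2 table can be reproduced with a short sketch. One assumption is flagged: the cell values shown appear to match a continuity-corrected statistic, (|O - E| - 0.5)²/E. That reading is inferred from the numbers and is not stated on the slide, so both versions are computed.

```python
# Rows: victim's race (White, Black); columns: death penalty (Yes, No).
table = [[45, 85],
         [14, 218]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2_uncorrected = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Assumption: continuity correction of 0.5 applied to each |O - E|.
x2_corrected = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

print([[round(e, 4) for e in row] for row in expected])
print(round(x2_uncorrected, 2), round(x2_corrected, 2))
```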
Mantel-Haenszel Test
• Several 2 x 2 tables
• Assuming a common odds ratio, test that the odds ratio = 1

Aggravation Level = 1
                 Death Penalty
Victim's Race    Yes     No   Total
White              2     60      62
Black              1    181     182
Total              3    241     244

Aggravation Level = 2
                 Death Penalty
Victim's Race    Yes     No   Total
White              2     15      17
Black              1     21      22
Total              3     36      39

Aggravation Level = 3
                 Death Penalty
Victim's Race    Yes     No   Total
White              6      7      13
Black              2      9      11
Total              8     16      24

Aggravation Level = 4
                 Death Penalty
Victim's Race    Yes     No   Total
White              9      3      12
Black              2      4       6
Total             11      7      18

Aggravation Level = 5
                 Death Penalty
Victim's Race    Yes     No   Total
White              9      0       9
Black              4      3       7
Total             13      3      16

Aggravation Level = 6
                 Death Penalty
Victim's Race    Yes     No   Total
White             17      0      17
Black              4      0       4
Total             21      0      21

Race and Death Penalty Punishment
Expected frequencies for chi-square test of independence

Aggravation Level = 1
                 Death Penalty
Victim's Race       Yes         No   Total
White            0.7623    61.2377      62
Black            2.2377   179.7623     182
Total                 3        241     244

Aggravation Level = 2
                 Death Penalty
Victim's Race       Yes         No   Total
White            1.3077    15.6923      17
Black            1.6923    20.3077      22
Total                 3         36      39

Aggravation Level = 3
                 Death Penalty
Victim's Race       Yes         No   Total
White            4.3333     8.6667      13
Black            3.6667     7.3333      11
Total                 8         16      24

Aggravation Level = 4
                 Death Penalty
Victim's Race       Yes         No   Total
White            7.3333     4.6667      12
Black            3.6667     2.3333       6
Total                11          7      18

Aggravation Level = 5
                 Death Penalty
Victim's Race       Yes         No   Total
White            7.3125     1.6875       9
Black            5.6875     1.3125       7
Total                13          3      16

Aggravation Level = 6
                 Death Penalty
Victim's Race       Yes         No   Total
White                17          0      17
Black                 4          0       4
Total                21          0      21

Note: None have sufficient sample sizes for tests of independence

Mantel-Haenszel Test
1. Select one cell; e.g., upper-left
2. Calculate the excess for each table
   • Excess = Observed – Expected
   • e.g., Excess = O11 – E11
3. Calculate the variances of the excesses
   • Variance = R1R2C1C2/[n²(n-1)]
4. $z = \frac{\sum \text{Excesses across tables}}{\sqrt{\sum \text{Variances across tables}}}$

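The four steps can be sketched in code using the six 2 x 2 tables by aggravation level. This is an illustration, not a library implementation: `excess_and_variance` is a hypothetical helper, and the one-sided p-value is taken from the standard normal upper tail.

```python
import math

# One tuple per aggravation level:
# (white-victim yes, white-victim no, black-victim yes, black-victim no)
tables = [
    (2, 60, 1, 181),
    (2, 15, 1, 21),
    (6, 7, 2, 9),
    (9, 3, 2, 4),
    (9, 0, 4, 3),
    (17, 0, 4, 0),
]


def excess_and_variance(a, b, c, d):
    """Excess and variance for the upper-left cell of one 2 x 2 table."""
    r1, r2 = a + b, c + d            # row totals
    c1, c2 = a + c, b + d            # column totals
    n = r1 + r2
    expected = r1 * c1 / n
    variance = r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return a - expected, variance


results = [excess_and_variance(*t) for t in tables]
total_excess = sum(e for e, _ in results)
total_variance = sum(v for _, v in results)

z = total_excess / math.sqrt(total_variance)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z > z)

for level, (e, v) in enumerate(results, start=1):
    print(level, round(e, 3), round(v, 3))
print(round(total_excess, 3), round(total_variance, 3), round(z, 3), round(p_one_sided, 4))
```

Totals computed without intermediate rounding can differ from the slide's totals in the third decimal place, because the slide sums values already rounded to three decimals.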
Race and Death Penalty Punishment

Aggravation    White Victim    Black Victim
Level           Yes     No      Yes     No     Excess   Variance
1                 2     60        1    181      1.238      0.564
2                 2     15        1     21      0.692      0.699
3                 6      7        2      9      1.667      1.382
4                 9      3        2      4      1.667      1.007
5                 9      0        4      3      1.688      0.640
6                17      0        4      0      0.000      0.000
Total                                           6.952      4.292

z-Score: 3.356, p-Value: 0.0004

Conclusion:
Nearly 7 more white-victim murderers received the death penalty
than would be expected if the odds were the same for
white- and black-victim murderers

Estimating the Common Odds Ratio
$\hat{\omega} = \frac{\sum n_{11} n_{22} / T \text{ over all the tables}}{\sum n_{12} n_{21} / T \text{ over all the tables}}$

Death Penalty and Race: $\hat{\omega} = 5.49$

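The common odds ratio estimate can be checked against the six aggravation-level tables. A minimal sketch, assuming T denotes each table's total count and that rows are victim's race (White, Black) and columns are death penalty (Yes, No); the variable names are illustrative.

```python
# One tuple per aggravation level: (n11, n12, n21, n22)
# n11 = white-victim yes, n12 = white-victim no,
# n21 = black-victim yes, n22 = black-victim no
tables = [
    (2, 60, 1, 181),
    (2, 15, 1, 21),
    (6, 7, 2, 9),
    (9, 3, 2, 4),
    (9, 0, 4, 3),
    (17, 0, 4, 0),
]

numerator = 0.0
denominator = 0.0
for n11, n12, n21, n22 in tables:
    total = n11 + n12 + n21 + n22      # T for this table
    numerator += n11 * n22 / total
    denominator += n12 * n21 / total

common_odds_ratio = numerator / denominator
print(round(common_odds_ratio, 2))
```

Tables with a zero cell (levels 5 and 6) still contribute to the sums without causing a division by zero, which is one practical advantage of pooling the products rather than averaging per-table odds ratios.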