Stat 653 HW4
Divya Nair
Exercise 1 (2.18). Table 2.13 shows data from the 2002 General Social Survey cross-classifying a person's perceived happiness with their family income. The table displays the observed and expected cell counts and the standardized residuals for testing independence.
Table 2.13. Each cell lists the observed count, the expected count, and the standardized residual.

                                  Happiness
Income           Not Too Happy    Pretty Happy    Very Happy
Above Average    21               159             110
                 35.8             166.1           88.1
                 −2.973           −0.947          3.144
Average          53               372             221
                 79.7             370.0           196.4
                 −4.403           0.224           2.907
Below Average    94               249             83
                 52.5             244.0           129.5
                 7.368            0.595           −5.907
a. Show how to obtain the estimated expected cell count of 35.8 for the first cell.
Solution. The table below reproduces the observed cell counts from Table 2.13 together with the row and column totals needed for the calculation.

                                  Happiness
Income           Not Too Happy    Pretty Happy    Very Happy    Total
Above Average    21               159             110           290
Average          53               372             221           646
Below Average    94               249             83            426
Total            168              780             414           1362

Under independence, the estimated expected cell count for row $i$ and column $j$ is $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$. Thus, for the first cell, $\hat{\mu}_{11} = \frac{290 \times 168}{1362} = 35.8$.
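The expected-count formula can be checked numerically. The following is a minimal sketch in plain Python; the names `observed` and `expected_counts` are illustrative choices, not part of the exercise.

```python
# Observed counts from Table 2.13 (rows: income; columns: happiness).
observed = [
    [21, 159, 110],   # Above Average
    [53, 372, 221],   # Average
    [94, 249, 83],    # Below Average
]

def expected_counts(table):
    """Return the expected counts n_{i+} * n_{+j} / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
print(round(expected[0][0], 1))  # first cell: 290 * 168 / 1362 -> 35.8
```

The same function returns every expected count in the table, so the remaining cells can be compared against Table 2.13 in the same way.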
b. For testing independence, $X^2 = 73.4$. Report the df value and the p-value, and interpret.
Solution. The degrees of freedom are $df = (I - 1)(J - 1) = (3 - 1)(3 - 1) = 4$. This chi-squared distribution has a mean of $df = 4$ and a standard deviation of $\sqrt{2\,df} = 2\sqrt{2} \approx 2.83$. So, a value of 73.4 is far out in the right-hand tail and the p-value is approximately 0. Thus, we reject the null hypothesis of independence and conclude that happiness and income are associated.
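To see how small the p-value is, the right-tail probability can be evaluated directly. The sketch below assumes an even number of degrees of freedom, for which the chi-squared tail has a simple closed form; the function name `chi2_upper_tail_even_df` is hypothetical, and a statistics library would normally be used instead.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X^2 >= x) for a chi-squared variable with even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which holds only when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires even, positive df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

p_value = chi2_upper_tail_even_df(73.4, 4)
print(p_value)  # a tiny positive number, effectively 0
```

Because the closed form is restricted to even df, this sketch covers the df = 4 and df = 2 cases in this homework but is not a general replacement for a chi-squared distribution routine.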
c. Interpret the standardized residuals in the corner cells having counts 21 and 83.
Solution. Table 2.13 has large negative standardized residuals for subjects who have above average income and are not too happy (−2.973), and for subjects who have below average income and are very happy (−5.907). Thus, there were fewer subjects of these types than the hypothesis of independence predicts.
d. Interpret the standardized residuals in the corner cells having counts 110 and 94.
Solution. Table 2.13 has large positive standardized residuals for subjects who have above average income and are very happy (3.144), and for subjects who have below average income and are not too happy (7.368). Thus, there were more subjects of these types than the hypothesis of independence predicts.
Exercise 2 (2.19(a)). Table 2.14 was taken from the 2002 General Social Survey. Test the null hypothesis of independence between party identification and race. Interpret.

                     Party Identification
Race     Democrat    Independent    Republican    Total
White    871         444            873           2188
Black    302         80             43            425
Total    1173        524            916           2613

Solution. The null hypothesis is that the two response variables are independent, that is, $H_0: \pi_{ij} = \pi_{i+} \pi_{+j}$ for all $i, j$. The alternative hypothesis is that the two response variables are dependent. Recall that the Pearson chi-squared statistic for testing $H_0$ is

$$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}},$$

where the estimate of the expected frequency is $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$. The estimated expected frequencies for each cell are given below.

$\hat{\mu}_{11} = \frac{2188 \times 1173}{2613} = 982.2135$
$\hat{\mu}_{12} = \frac{2188 \times 524}{2613} = 438.7723$
$\hat{\mu}_{13} = \frac{2188 \times 916}{2613} = 767.0142$
$\hat{\mu}_{21} = \frac{425 \times 1173}{2613} = 190.7865$
$\hat{\mu}_{22} = \frac{425 \times 524}{2613} = 85.2277$
$\hat{\mu}_{23} = \frac{425 \times 916}{2613} = 148.9858$

The Pearson chi-squared statistic is

$$X^2 = \frac{(871 - 982.2135)^2}{982.2135} + \frac{(444 - 438.7723)^2}{438.7723} + \frac{(873 - 767.0142)^2}{767.0142} + \frac{(302 - 190.7865)^2}{190.7865} + \frac{(80 - 85.2277)^2}{85.2277} + \frac{(43 - 148.9858)^2}{148.9858} \approx 167.85.$$

The degrees of freedom are $(I - 1)(J - 1) = (2 - 1)(3 - 1) = 2$. Since this chi-squared distribution has a mean of $df = 2$ and a standard deviation of $\sqrt{2\,df} = 2$, a value of 167.85 is far out in the right tail. Thus, the p-value is approximately 0. Consequently, we reject the null hypothesis and conclude that party identification and race are associated.
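The Pearson statistic for Table 2.14 can be recomputed directly from the observed counts, which avoids carrying rounded expected frequencies through the sum. This is a minimal sketch; `pearson_chi2` is an illustrative name rather than a library function.

```python
# Observed counts from Table 2.14 (rows: race; columns: party identification).
observed = [
    [871, 444, 873],  # White
    [302, 80, 43],    # Black
]

def pearson_chi2(table):
    """Return (X^2, df) for the Pearson chi-squared test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (n_ij - mu_ij) ** 2 / mu_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

x2, df = pearson_chi2(observed)
print(round(x2, 2), df)
```

Working from the raw table also guards against transcription slips in any single expected frequency, since each one is rebuilt from the margins.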
Exercise 3 (2.21). Each subject in a sample of 100 men and 100 women is asked to indicate which of the following factors (one or more) are responsible for increases in teenage crime: A, the increasing gap in income between the rich and poor; B, the increase in the percentage of single-parent families; C, insufficient time spent by parents with their children. A cross-classification of the responses by gender is

Gender    A     B     C
Men       60    81    75
Women     75    87    86

a. Is it valid to apply the chi-squared test of independence to this 2 × 3 table? Explain.
Solution. No. The test of independence requires each subject to fall in exactly one cell of the table. Here a subject can select several factors, so the same subject may be counted in more than one column: the six cells sum to 464 responses from only 200 subjects. The three columns are therefore not mutually exclusive categories of a single response variable, and the test cannot be applied.
b. Explain how this table actually provides information needed to cross classify gender with each of three variables. Construct the contingency table relating gender to opinion about whether factor A is responsible for increases in teenage crime.
Solution. Each factor defines its own binary variable (responsible: yes or no). Because each gender group has 100 subjects, the number answering "no" for a factor is 100 minus the number answering "yes", so there is one 2 × 2 table of gender versus each factor. For example, for factor A:

                  A
Gender    Yes    No
Men       60     100 − 60 = 40
Women     75     100 − 75 = 25
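The construction of a separate gender-by-factor table for each of A, B, and C can be sketched as follows. The dictionary layout and the helper name `gender_by_factor` are assumptions made for illustration.

```python
# "Yes" counts from the exercise; each gender group has 100 subjects.
n_per_gender = {"Men": 100, "Women": 100}
yes_counts = {
    "Men": {"A": 60, "B": 81, "C": 75},
    "Women": {"A": 75, "B": 87, "C": 86},
}

def gender_by_factor(factor):
    """Return {gender: (yes, no)} for one factor; 'no' is n minus 'yes'."""
    return {
        gender: (yes_counts[gender][factor], n - yes_counts[gender][factor])
        for gender, n in n_per_gender.items()
    }

table_a = gender_by_factor("A")
print(table_a)

# The original 2 x 3 layout counts responses, not subjects.
total_responses = sum(sum(counts.values()) for counts in yes_counts.values())
print(total_responses)  # 464, more than the 200 subjects
```

Each of the three resulting 2 × 2 tables places every subject in exactly one cell, which is what the chi-squared test of independence requires.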
Exercise 4 (2.22). Table 2.15 classifies a sample of psychiatric patients by their diagnosis and by whether their treatment prescribed drugs.

Diagnosis               Drugs    No Drugs    Total
Schizophrenia           105      8           113
Affective disorder      12       2           14
Neurosis                18       19          37
Personality disorder    47       52          99
Special symptoms        0        13          13
Total                   182      94          276
a. Conduct a test of independence, and interpret the p-value.
Solution. $H_0: \pi_{ij} = \pi_{i+} \pi_{+j}$ for all $i, j$ and $H_a: \pi_{ij} \neq \pi_{i+} \pi_{+j}$ for some $i, j$. The Pearson chi-squared statistic is $X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = 84.1888$. The degrees of freedom are $(I - 1)(J - 1) = (5 - 1)(2 - 1) = 4$. Since this chi-squared distribution has a mean of $df = 4$ and a standard deviation of $\sqrt{2\,df} = 2\sqrt{2}$, a value of 84.1888 is far out in the right tail and the p-value is approximately 0. Consequently, we reject the null hypothesis and conclude that the diagnosis of the psychiatric patients is associated with whether their treatment prescribed drugs.
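b. Obtain standardized residuals, and interpret.
Solution. The standardized residuals are given by

$$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij} (1 - p_{i+}) (1 - p_{+j})}},$$

where $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$, $p_{i+} = n_{i+}/n$, and $p_{+j} = n_{+j}/n$. The calculations for the row and column marginal proportions and estimated expected frequencies are shown below.

$p_{1+} = \frac{113}{276} = 0.4094$
$p_{2+} = \frac{14}{276} = 0.0507$
$p_{3+} = \frac{37}{276} = 0.1341$
$p_{4+} = \frac{99}{276} = 0.3587$
$p_{5+} = \frac{13}{276} = 0.0471$
$p_{+1} = \frac{182}{276} = 0.6594$
$p_{+2} = \frac{94}{276} = 0.3406$

$n_{11} = 105$, $\hat{\mu}_{11} = \frac{113 \times 182}{276} = 74.5145$
$n_{12} = 8$, $\hat{\mu}_{12} = \frac{113 \times 94}{276} = 38.4855$
$n_{21} = 12$, $\hat{\mu}_{21} = \frac{14 \times 182}{276} = 9.2319$
$n_{22} = 2$, $\hat{\mu}_{22} = \frac{14 \times 94}{276} = 4.7681$
$n_{31} = 18$, $\hat{\mu}_{31} = \frac{37 \times 182}{276} = 24.3986$
$n_{32} = 19$, $\hat{\mu}_{32} = \frac{37 \times 94}{276} = 12.6014$
$n_{41} = 47$, $\hat{\mu}_{41} = \frac{99 \times 182}{276} = 65.2826$
$n_{42} = 52$, $\hat{\mu}_{42} = \frac{99 \times 94}{276} = 33.7174$
$n_{51} = 0$, $\hat{\mu}_{51} = \frac{13 \times 182}{276} = 8.5725$
$n_{52} = 13$, $\hat{\mu}_{52} = \frac{13 \times 94}{276} = 4.4275$

The table below lists the standardized residuals for each $i, j$.

Diagnosis               Drugs      No Drugs
Schizophrenia           7.8742     −7.8745
Affective disorder      1.6022     −1.6023
Neurosis                −2.3853    2.3852
Personality disorder    −4.8416    4.8418
Special symptoms        −5.1393    5.1396

As an example of the calculation, the standardized residual for the first cell is

$$\frac{105 - 74.5145}{\sqrt{74.5145 \times (1 - 0.4094) \times (1 - 0.6594)}} = 7.8742.$$

The standardized residuals for the other cells are calculated similarly. In a table with two columns, the two residuals in each row are equal in magnitude and opposite in sign; the differences in the fourth decimal place above come from rounding the intermediate quantities.

Apart from the affective disorder row, where the residuals are about 1.60 in absolute value, every cell has a standardized residual larger than 2 in absolute value, a greater discrepancy between $n_{ij}$ and $\hat{\mu}_{ij}$ than we would expect if the variables were truly independent. Patients with schizophrenia were prescribed drugs more often than independence predicts, whereas patients with neurosis, personality disorder, or special symptoms were prescribed drugs less often.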
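The standardized residuals for Table 2.15 can be reproduced with a short script. This is a sketch in plain Python that works from unrounded margins, so its output may differ from hand calculations in the last decimal place; `standardized_residuals` is an illustrative name.

```python
import math

# Observed counts from Table 2.15 (columns: Drugs, No Drugs).
observed = [
    [105, 8],   # Schizophrenia
    [12, 2],    # Affective disorder
    [18, 19],   # Neurosis
    [47, 52],   # Personality disorder
    [0, 13],    # Special symptoms
]

def standardized_residuals(table):
    """Return (n_ij - mu_ij) / sqrt(mu_ij * (1 - p_i+) * (1 - p_+j)) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        p_row = row_totals[i] / n
        out = []
        for j, n_ij in enumerate(row):
            p_col = col_totals[j] / n
            mu_ij = row_totals[i] * col_totals[j] / n
            se = math.sqrt(mu_ij * (1 - p_row) * (1 - p_col))
            out.append((n_ij - mu_ij) / se)
        residuals.append(out)
    return residuals

resid = standardized_residuals(observed)
for row in resid:
    print([round(r, 3) for r in row])
```

A common rule of thumb treats absolute values above roughly 2 or 3 as evidence of lack of fit in that cell, which is how the printed rows can be read.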
Exercise 5 (2.25). For tests of $H_0$: independence, $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$.
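a. Show that $\{\hat{\mu}_{ij}\}$ have the same row and column totals as $\{n_{ij}\}$.
Solution. We know that $\hat{\mu}_{ij} = n_{i+} n_{+j} / n$. For each fixed $i$, summing over $j$ gives

$$\sum_j \hat{\mu}_{ij} = \sum_j \frac{n_{i+} n_{+j}}{n} = \frac{n_{i+}}{n} \sum_j n_{+j} = \frac{n_{i+}}{n} \cdot n = n_{i+}.$$

Hence $\sum_j \hat{\mu}_{ij} = n_{i+}$ for each $i$. Similarly, for a fixed $j$, summing over $i$ gives

$$\sum_i \hat{\mu}_{ij} = \sum_i \frac{n_{i+} n_{+j}}{n} = \frac{n_{+j}}{n} \sum_i n_{i+} = \frac{n_{+j}}{n} \cdot n = n_{+j}$$

for each $j$. Thus, $\{\hat{\mu}_{ij}\}$ have the same row totals $n_{i+}$ and column totals $n_{+j}$ as $\{n_{ij}\}$.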
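b. For 2 × 2 tables, show that $\frac{\hat{\mu}_{11} \hat{\mu}_{22}}{\hat{\mu}_{12} \hat{\mu}_{21}} = 1.0$. Hence, $\{\hat{\mu}_{ij}\}$ satisfy $H_0$.
Solution.

$$\frac{\hat{\mu}_{11} \hat{\mu}_{22}}{\hat{\mu}_{12} \hat{\mu}_{21}} = \frac{\frac{n_{1+} n_{+1}}{n} \cdot \frac{n_{2+} n_{+2}}{n}}{\frac{n_{1+} n_{+2}}{n} \cdot \frac{n_{2+} n_{+1}}{n}} = 1,$$

since all terms cancel.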
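Both results of this exercise can be checked exactly on a concrete table. The sketch below uses the counts of Table 2.17 purely as a convenient 2 × 2 example and relies on exact rational arithmetic, so the equalities hold without rounding error; the variable names are illustrative.

```python
from fractions import Fraction

# Any 2 x 2 table of counts works; these are the counts from Table 2.17.
table = [[21, 2], [15, 3]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Fitted values under independence, kept as exact fractions.
mu = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

fitted_row_totals = [sum(row) for row in mu]
fitted_col_totals = [sum(col) for col in zip(*mu)]
odds_ratio = (mu[0][0] * mu[1][1]) / (mu[0][1] * mu[1][0])

print(fitted_row_totals == row_totals)  # part a, rows
print(fitted_col_totals == col_totals)  # part a, columns
print(odds_ratio == 1)                  # part b
```

A numerical check on one table is not a proof, but it is a quick way to catch an algebra slip in the derivation.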
Exercise 6 (2.30). Table 2.17 contains results of a study comparing radiation therapy with surgery in treating cancer of the larynx. Use Fisher's exact test to test $H_0: \theta = 1$ against $H_a: \theta > 1$. Interpret results.

                     Cancer Controlled    Cancer Not Controlled    Total
Surgery              21                   2                        23
Radiation Therapy    15                   3                        18
Total                36                   5                        41
Solution. For $H_a: \theta > 1$, the p-value is the right-tail hypergeometric probability that $n_{11}$ is at least as large as the observed value; that is, $p = P(21) + P(22) + P(23)$. The probability of a particular value $n_{11}$ equals

$$P(n_{11}) = \frac{\binom{n_{1+}}{n_{11}} \binom{n_{2+}}{n_{+1} - n_{11}}}{\binom{n}{n_{+1}}}.$$

Thus, the p-value is

$$p = P(21) + P(22) + P(23) = \frac{\binom{23}{21}\binom{18}{15}}{\binom{41}{36}} + \frac{\binom{23}{22}\binom{18}{14}}{\binom{41}{36}} + \frac{\binom{23}{23}\binom{18}{13}}{\binom{41}{36}} = 0.3808.$$

Since the p-value of 0.3808 is large, we fail to reject the null hypothesis. The data do not provide evidence that the odds of controlling the cancer are higher with surgery than with radiation therapy; this is consistent with, but does not prove, independence of treatment and outcome.
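The right-tail hypergeometric sum for Fisher's exact test can be evaluated with the standard library. This is a minimal sketch assuming Python 3.8 or later for `math.comb`; the helper name `hypergeom_prob` and its argument names are illustrative.

```python
from math import comb

def hypergeom_prob(n11, n1_plus, n2_plus, n_plus1):
    """P(n11) for a 2 x 2 table with fixed margins n1+, n2+, and n+1."""
    n = n1_plus + n2_plus
    return comb(n1_plus, n11) * comb(n2_plus, n_plus1 - n11) / comb(n, n_plus1)

# Margins from Table 2.17: n1+ = 23, n2+ = 18, n+1 = 36; observed n11 = 21.
# n11 cannot exceed the first row total, 23.
p_value = sum(hypergeom_prob(k, 23, 18, 36) for k in range(21, 24))
print(round(p_value, 4))  # 0.3808
```

The three terms in the sum correspond to $P(21)$, $P(22)$, and $P(23)$, the only tables at least as favorable to $H_a: \theta > 1$ as the observed one.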