MIDTERM EXAM
STAT 557
FALL 1996

NAME ____________________

Instructions:
Show your answers in the space provided on this exam. If you need more space
you can use the back of the page or additional sheets of paper, but clearly
indicate where this is done. Additional paper will be made available by the
instructor. You may use only your pencils, calculator and the formula sheet
attached to this exam. Do not waste time by trying to simplify matrix formulas.

1. Consider the following 3×3 contingency table:

            j=1    j=2    j=3    TOTALS
i=1          12      7     11        30
i=2          22     15     13        50
i=3           6      8      6        20
TOTALS       40     30     30       100

A. Circle any of the following statistics that are exactly zero for this table:
(a) Pearson X² test for independence
(b) Kappa measure of agreement
(c) Lambda for predicting the row category from the column category
(d) Lambda for predicting the column category from the row category

B. The degrees of freedom for the test of marginal homogeneity in this 3×3
table against the general alternative are ___________.

2. A company that tests market potential of new products obtained a random sample of n=998
respondents to participate in a study comparing two laundry detergents, called Brand M and
Brand S. Each respondent washed clothes with both detergents and reported a preference
for one of the two detergents at the end of the trial. The company also obtained information
from each respondent on brands of detergents that they purchased in the last two years, and
the temperature and level of softness of water used to wash clothes. "Softer" water has
lower concentrations of certain minerals and other substances. Each respondent was
classified into a contingency table with respect to the four variables:

Variable                        Levels
A: previous use of Brand M      1 = no, 2 = yes
B: wash water temperature       1 = warm, 2 = hot
C: brand preference             1 = Brand M, 2 = Brand S
D: wash water softness          1 = hard, 2 = medium, 3 = soft

In this study Brand M was a detergent currently for sale at local grocery stores and Brand S
was a new detergent, but the respondents were not told this. Nevertheless, the respondents
could be classified with respect to whether or not they previously used Brand M.
The observed counts are shown below.

                                      Previous user of Brand M (A)
                                     No                          Yes
Water Temperature (B)       Warm (j=1)     Hot (j=2)    Warm (j=1)     Hot (j=2)
Preference (C), Brand         M      S      M      S      M      S      M      S
Water Softness (D)
  hard (l=1)                 42     68     20     46     51     27     45     25
  medium (l=2)               50     61     33     38     55     29     47     23
  soft (l=3)                 63     50     45     29     52     26     46     27

A. The complete independence model can be written as

$$\log(m_{ijkl}) = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l}.$$

What are the degrees of freedom for testing the fit of this model against the saturated
model?
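
For readers working through the data away from the exam, the observed counts can be held in a small lookup keyed by (i, j, k, l). The following is only a minimal sketch in plain Python, which is not part of the exam; the names `rows`, `counts` and `margin` are hypothetical, and the cell layout assumes the table is read with A = no before A = yes, warm before hot, and Brand M before Brand S.

```python
# Observed counts keyed by (i, j, k, l):
#   i: previous use of Brand M (1 = no, 2 = yes)
#   j: wash water temperature  (1 = warm, 2 = hot)
#   k: brand preference        (1 = Brand M, 2 = Brand S)
#   l: wash water softness     (1 = hard, 2 = medium, 3 = soft)
# Each tuple lists the counts for l = 1, 2, 3 at a fixed (i, j, k).
rows = {
    (1, 1, 1): (42, 50, 63),
    (1, 1, 2): (68, 61, 50),
    (1, 2, 1): (20, 33, 45),
    (1, 2, 2): (46, 38, 29),
    (2, 1, 1): (51, 55, 52),
    (2, 1, 2): (27, 29, 26),
    (2, 2, 1): (45, 47, 46),
    (2, 2, 2): (25, 23, 27),
}
counts = {ijk + (l,): row[l - 1] for ijk, row in rows.items() for l in (1, 2, 3)}


def margin(table, axes):
    """Sum a table over every index position not listed in `axes`."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0) + value
    return out


n = sum(counts.values())
print(n)                     # total number of respondents
print(margin(counts, (0,)))  # one-way margin for variable A
```

A quick check that the total equals the stated n = 998 guards against transcription errors before any model is fitted.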

B. If you computed the test in part A with CATMOD in SAS or LOGLINEAR in SPSS,
what does the software assume about the distribution of the observed counts? Make
this assumption for the rest of this problem.

C. Using the notation established in part A, write the formula for the largest log-linear
model that satisfies the null hypothesis that Brand preference is conditionally
independent of previous use of Brand M, given the temperature and softness categories
of the wash water.
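
D. A log-linear model that fits the data well (G² = 2.36 with p-value = .992) is

$$\log(m_{ijkl}) = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{CD}_{kl} + \lambda^{ACD}_{ikl}.$$
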
Describe what this model implies about independence or conditional independence of
the four variables used to construct the contingency table.
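
As a study aid, the fit of the part D model can be reproduced approximately with iterative proportional fitting (IPF), a standard way to obtain fitted counts for hierarchical log-linear models. The sketch below is plain Python and is not the CATMOD or LOGLINEAR procedure; the names `observed`, `TERMS`, `margin` and `ipf` are hypothetical, the fixed number of cycles stands in for a proper convergence check, and the cell layout is the same assumed reading of the observed counts table.

```python
import math

# Counts for l = 1, 2, 3 (hard, medium, soft) at each (i, j, k):
# i = previous use (1 no, 2 yes), j = temperature (1 warm, 2 hot),
# k = preference (1 Brand M, 2 Brand S).
rows = {
    (1, 1, 1): (42, 50, 63), (1, 1, 2): (68, 61, 50),
    (1, 2, 1): (20, 33, 45), (1, 2, 2): (46, 38, 29),
    (2, 1, 1): (51, 55, 52), (2, 1, 2): (27, 29, 26),
    (2, 2, 1): (45, 47, 46), (2, 2, 2): (25, 23, 27),
}
observed = {ijk + (l,): row[l - 1] for ijk, row in rows.items() for l in (1, 2, 3)}

# Interaction terms written in the part D formula, as index positions
# (0 = A, 1 = B, 2 = C, 3 = D). Main effects are implied by these margins.
TERMS = [(0, 1), (0, 2), (0, 3), (2, 3), (0, 2, 3)]


def margin(table, axes):
    """Sum a table over every index position not listed in `axes`."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out


def ipf(table, terms, cycles=25):
    """Rescale fitted counts until they match the observed margin of each term."""
    fitted = {cell: 1.0 for cell in table}
    for _ in range(cycles):  # fixed cycle count: an assumption, not a convergence test
        for axes in terms:
            target = margin(table, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= target[key] / current[key]
    return fitted


fitted = ipf(observed, TERMS)
# Likelihood-ratio statistic against the saturated model.
g2 = 2.0 * sum(x * math.log(x / fitted[cell]) for cell, x in observed.items())
print(round(g2, 2))  # compare with the G² reported in part D
```

The printed statistic can be compared with the G² value quoted in part D as a check on the data entry.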

E. What are the minimal sufficient statistics needed to estimate the parameters in the
model defined in part D of this problem?

F. Maximum likelihood estimates of the λ-parameters in the model in part D are listed
below with their standard errors.

Estimate                                   Standard Error
$\hat{\lambda}^{A}_{1} = 0.095$            .033
$\hat{\lambda}^{B}_{1} = 0.148$            .032
$\hat{\lambda}^{C}_{1} = 0.119$            .033
$\hat{\lambda}^{D}_{1} = -0.037$           .047
$\hat{\lambda}^{D}_{2} = 0.016$            .046
$\hat{\lambda}^{AB}_{11} = .085$           .032
$\hat{\lambda}^{AC}_{11} = -.198$          .033
$\hat{\lambda}^{AD}_{11} = -.020$          .0447
$\hat{\lambda}^{AD}_{12} = .003$           .046
$\hat{\lambda}^{CD}_{11} = -.118$          .047
$\hat{\lambda}^{CD}_{12} = .006$           .046
$\hat{\lambda}^{ACD}_{111} = -.108$        .047
$\hat{\lambda}^{ACD}_{112} = -.015$        .046

Use these results to describe associations between Brand preference and level of water
softness.
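
One common first step when reading a table of estimates and standard errors is to form the ratio of each estimate to its standard error. The sketch below does this in plain Python for the terms that involve both C and D; it is an illustration only, not part of the exam, and the dictionary keys and the helper `wald_z` are hypothetical names.

```python
# (estimate, standard error) pairs copied from the part F table
# for the terms involving both brand preference (C) and softness (D).
estimates = {
    "CD_11": (-0.118, 0.047),
    "CD_12": (0.006, 0.046),
    "ACD_111": (-0.108, 0.047),
    "ACD_112": (-0.015, 0.046),
}


def wald_z(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error


z = {name: wald_z(*pair) for name, pair in estimates.items()}
for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
```

Ratios that are large in absolute value point to the terms that carry most of the association; interpreting them still requires the parameter constraints used by the fitting software.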

3. Given the results from problem 2, do you think it would be a good idea to use a
Mantel-Haenszel estimator to describe the association between Brand preference and
previous use of Brand M? Explain.

4.
A. Let $\hat{\underset{\sim}{m}} = (\hat{m}_{1111}, \hat{m}_{1112}, \hat{m}_{1113}, \hat{m}_{1121}, \ldots, \hat{m}_{2223})$ denote the vector of maximum
likelihood estimates of the expected counts for the model in part D of problem 2. Using
matrix notation, display an approximate formula for $\hat{\underset{\sim}{m}}$ as a linear function of
$\underset{\sim}{p} = n^{-1} \underset{\sim}{X}$, where $\underset{\sim}{X} = (X_{1111}, X_{1112}, X_{1113}, X_{1121}, \ldots, X_{2223})$ is the vector of
observed counts.

B. Using the result from part A, show how to derive a formula for the asymptotic
distribution of

$$\log(\hat{\alpha}) = \log\left( \frac{\hat{m}_{1111}\, \hat{m}_{2121}}{\hat{m}_{1121}\, \hat{m}_{2111}} \right).$$

EXAM SCORE __________