Exam I Solutions Stat 557 Fall 2000

A point value for each part of each problem is indicated below. These values sum to 100. You
should have a corresponding number on your graded exam for each part of each question.
If you received no credit for a response you should have a zero recorded on your paper for
that part of the question. Contact the instructor if no point value is indicated for any part
of any problem. Some of the solutions are presented here in more detail than was expected
of student responses. There are reasonable ways to approach some problems that are not
addressed in this set of solutions.
Problem 1
(a) (6 points) To do a prospective study you would need to recruit male volunteers between
the ages of 40 and 55 who meet any other criteria to enter the study. These volunteers
would be randomly divided into two groups of equal size. One group would be randomly
selected to receive the daily dose of aspirin, and men in the other group would receive
a placebo. The volunteers would not be told whether they are taking the aspirin or the
placebo. The volunteers would be followed for a specified period of time (at least 5
years) and the number who experienced at least one heart attack would be recorded
for each group. The sample proportions would be used to test the null hypothesis that
the incidence of heart attacks is the same for both groups.
(b) (6 points) One way in which a retrospective study could be done is to take a simple
random sample of patient records from a population of 40 to 55 year old males who have
been treated for heart attacks. Take an independent simple random sample of patient
records from a population of 40 to 55 year old males with no history of heart disease.
For each sample, classify the patients as to whether or not they consumed aspirin on
a long term daily basis. Construct a 2x2 table as shown in part (c), and test the null
hypothesis of independence or construct a confidence interval for an odds ratio. The
solution to part (d) assumes that the table in part (c) was obtained in this way.
Alternatively, you could take a simple random sample of patient records from a population
of 40-55 year old males who received long term aspirin treatment prior to any
history of heart disease and classify them with respect to their current heart disease
status. Take an independent simple random sample of patient records from a population
of 40-55 year old males who never received a long term aspirin treatment and
classify these patients with respect to current heart disease status. A third approach
would take a simple random sample of patient records from some population of 40-55
year old males and classify each patient in the sample with respect to current heart
disease status and whether or not the patient received long term aspirin treatment
prior to the onset of heart problems. If heart disease incidence is rather low in 40-55
year old males these methods could require relatively large sample sizes to observe a
substantial number of patients who experienced heart attacks. An appropriate answer
to part (d) depends on which sampling plan was proposed in part (b).
(c) (12 points) Estimate the odds ratio as
\[ \hat{\theta} = \frac{(15)(173)}{(27)(185)} = 0.5195 . \]
Use the large sample normal approximation to the distribution of $\log(\hat{\theta})$ to compute
an approximate 95 percent confidence interval for $\log(\theta)$:
\[ \log(\hat{\theta}) \pm (1.96)\sqrt{\frac{1}{15} + \frac{1}{185} + \frac{1}{27} + \frac{1}{173}}
   \;\Rightarrow\; -0.6548 \pm (1.96)(0.33895) \;\Rightarrow\; (-1.3192,\ 0.0096) \]
Then an approximate 95 percent confidence interval for the odds ratio is
\[ (\exp(-1.3192),\ \exp(0.0096)) \;\Rightarrow\; (0.27,\ 1.01) . \]
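The arithmetic in part (c) can be checked with a short script. This is only a sketch: the variable names are illustrative, and the four counts are taken directly from the odds ratio formula above.

```python
import math

# Cell counts as they appear in the odds ratio formula of part (c).
a, b, c, d = 15, 185, 27, 173

odds_ratio = (a * d) / (c * b)                      # (15)(173) / ((27)(185))
log_or = math.log(odds_ratio)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96                                            # normal critical value for 95 percent
log_ci = (log_or - z * se_log_or, log_or + z * se_log_or)
ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))     # back-transform to the odds ratio scale

print(f"odds ratio = {odds_ratio:.4f}, SE of log odds ratio = {se_log_or:.5f}")
print(f"interval for the odds ratio: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric about the estimated odds ratio.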
(d) (8 points) From the data in part (c), we can directly estimate the odds ratio
\[ \frac{\Pr\{\text{Aspirin user} \mid \text{Heart attack}\} \,/\, \Pr\{\text{Aspirin non-user} \mid \text{Heart attack}\}}
        {\Pr\{\text{Aspirin user} \mid \text{No heart attack}\} \,/\, \Pr\{\text{Aspirin non-user} \mid \text{No heart attack}\}} \]
Using Bayes Theorem, this becomes
\[ \frac{\Pr\{\text{Heart attack} \mid \text{Aspirin user}\} \,/\, \Pr\{\text{No heart attack} \mid \text{Aspirin user}\}}
        {\Pr\{\text{Heart attack} \mid \text{Aspirin non-user}\} \,/\, \Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}}
   = \left[ \text{Relative risk of heart attack} \right]
     \times \frac{\Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}}{\Pr\{\text{No heart attack} \mid \text{Aspirin user}\}} \]
This will be an accurate approximation to the relative risk of heart attack for the
aspirin versus non-aspirin users if the incidence of heart attacks is low for both users
and non-users, i.e.,
\[ \frac{\Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}}{\Pr\{\text{No heart attack} \mid \text{Aspirin user}\}} \]
is close to 1.0.
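The relationship between the odds ratio and the relative risk in part (d) can be illustrated numerically. The sketch below uses made-up heart attack probabilities (they are not from the exam data) to show that the correction factor is near 1 when both incidences are low and far from 1 when they are not.

```python
def odds_ratio_and_relative_risk(p_user, p_nonuser):
    """Return (odds ratio, relative risk, correction factor) for two event probabilities."""
    relative_risk = p_user / p_nonuser
    # Ratio of "no heart attack" probabilities: non-users over users.
    correction = (1 - p_nonuser) / (1 - p_user)
    odds_ratio = (p_user / (1 - p_user)) / (p_nonuser / (1 - p_nonuser))
    return odds_ratio, relative_risk, correction

# Hypothetical heart attack probabilities, chosen only for illustration.
for p_user, p_nonuser in [(0.01, 0.02), (0.30, 0.60)]:
    or_, rr, corr = odds_ratio_and_relative_risk(p_user, p_nonuser)
    print(f"p_user={p_user}, p_nonuser={p_nonuser}: "
          f"OR={or_:.3f}, RR={rr:.3f}, correction={corr:.3f}")
```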
Problem 2 (12 points) The results for a student and her mother should be cross-classified
into a 3x3 table, with one count for each student/mother pair, as shown below. By treating
the student/mother pair as the sampling unit and recording one count in the table for
each pair, any correlation between the responses given by a student and her mother is
automatically taken into account.
                     Student's Response
Mother's Response    Yes    No     Unsure
Yes                  Y11    Y12    Y13
No                   Y21    Y22    Y23
Unsure               Y31    Y32    Y33
The corresponding population probabilities are
                     Student's Response
Mother's Response    Yes          No           Unsure
Yes                  $\pi_{11}$   $\pi_{12}$   $\pi_{13}$
No                   $\pi_{21}$   $\pi_{22}$   $\pi_{23}$
Unsure               $\pi_{31}$   $\pi_{32}$   $\pi_{33}$
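Let $\pi = (\pi_{11}, \pi_{21}, \pi_{31}, \pi_{12}, \pi_{22}, \pi_{32}, \pi_{13}, \pi_{23}, \pi_{33})'$. Then the null hypothesis of marginal
homogeneity can be expressed as
\[ H_0 : \; 0 = \begin{bmatrix} \pi_{1+} \\ \pi_{2+} \\ \pi_{3+} \end{bmatrix}
              - \begin{bmatrix} \pi_{+1} \\ \pi_{+2} \\ \pi_{+3} \end{bmatrix} = C\pi \]
where
\[ C = \begin{bmatrix}
0 & -1 & -1 &  1 & 0 &  0 &  1 &  0 & 0 \\
0 &  1 &  0 & -1 & 0 & -1 &  0 &  1 & 0 \\
0 &  0 &  1 &  0 & 0 &  1 & -1 & -1 & 0
\end{bmatrix} \]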
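The alternative is that the null hypothesis is incorrect. A Wald test with an approximate
.05 Type I error (significance) level is obtained by rejecting the null hypothesis if
\[ X^2 = n \, p' C' \left( C \hat{\Sigma}_p C' \right)^{-} C p \]
exceeds the upper .05 percentile of a central chi-square distribution with 2 degrees of freedom.
Here $p$ is the vector of sample proportions, $\hat{\Sigma}_p = \mathrm{diag}(p) - pp'$ is the estimated
multinomial covariance matrix for a single pair, and the superscript $-$ denotes a generalized
inverse. The C matrix used here has rank 2. The choice of the C matrix is not unique. The
same test statistic is obtained, for example, using a C matrix consisting of any two rows of
the C matrix given above.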
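A minimal sketch of this Wald test in plain Python follows. It uses only the first two rows of the C matrix, which the solution notes gives the same statistic, so an ordinary 2x2 inverse suffices. The function name and the table of counts are hypothetical; rows are the mother's response and columns are the student's response.

```python
def wald_marginal_homogeneity(table):
    """Wald statistic for marginal homogeneity in a 3x3 table of paired counts."""
    n = sum(sum(row) for row in table)
    # Sample proportions stacked column by column: p11, p21, p31, p12, ..., p33.
    p = [table[i][j] / n for j in range(3) for i in range(3)]
    # First two rows of the C matrix from the solution.
    C = [[0, -1, -1, 1, 0, 0, 1, 0, 0],
         [0, 1, 0, -1, 0, -1, 0, 1, 0]]
    d = [sum(c * x for c, x in zip(row, p)) for row in C]      # C p
    # S = C (diag(p) - p p') C'
    S = [[sum(ra[k] * rb[k] * p[k] for k in range(9)) - d[a] * d[b]
          for b, rb in enumerate(C)] for a, ra in enumerate(C)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # Quadratic form d' S^{-1} d using the closed-form inverse of a 2x2 matrix.
    quad = (S[1][1] * d[0] ** 2 - 2 * S[0][1] * d[0] * d[1] + S[0][0] * d[1] ** 2) / det
    return n * quad

# Hypothetical counts, for illustration only.
example_table = [[20, 5, 5],
                 [10, 30, 5],
                 [5, 5, 15]]
statistic = wald_marginal_homogeneity(example_table)
critical_value = 5.991      # upper .05 percentile of chi-square with 2 degrees of freedom
print(f"X^2 = {statistic:.4f}, reject H0: {statistic > critical_value}")
```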
Problem 3 (8 points) The null hypothesis is that the success rates are the same for the two
surgical procedures within each hospital. This is a null hypothesis of conditional independence.
The alternative is that the success rates are different for the two surgical procedures within
some hospitals.
(i) A testing procedure that does not require the assumption of homogeneous odds ratios
within hospitals is to reject the null hypothesis if the value of the Cochran-Mantel-Haenszel
statistic exceeds the upper $\alpha$ percentile of a central chi-square distribution
with one degree of freedom. This test would have good relative power if one surgery
procedure was consistently better than the other.
(ii) A second approach would first use the $T_4$ test to test the null hypothesis of homogeneous
conditional odds ratios within hospitals. If this null hypothesis is not rejected, one
could proceed to evaluate the Mantel-Haenszel estimator and construct a confidence
interval for the common odds ratio. An advantage of this procedure is that it provides
an estimate of how much better (or worse) the first procedure is relative to the second
procedure. A disadvantage is that it requires homogeneous conditional odds ratios
within hospitals. What would you do if the null hypothesis of homogeneous conditional
odds ratios was rejected by the $T_4$ test?
(iii) You could use loglinear or logistic regression models to analyze these data, but these
approaches were not part of the material covered prior to this exam.
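The two quantities named in (i) and (ii) can be sketched in a few lines. The version below computes the Cochran-Mantel-Haenszel statistic without a continuity correction and the Mantel-Haenszel common odds ratio estimate. The hospital tables are invented for illustration, with rows as surgical procedures and columns as success/failure; the function names are likewise hypothetical.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction) for a list of 2x2 tables."""
    diff = 0.0
    var = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n                        # observed minus expected count
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return diff ** 2 / var

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel estimate of a common odds ratio across 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den

# Hypothetical (success, failure) counts for procedures 1 and 2 in three hospitals.
hospital_tables = [((30, 10), (20, 20)),
                   ((25, 15), (18, 22)),
                   ((40, 10), (30, 20))]
print(f"CMH statistic = {cmh_statistic(hospital_tables):.3f}")
print(f"MH odds ratio = {mantel_haenszel_odds_ratio(hospital_tables):.3f}")
```

The CMH statistic sums the differences before squaring, which is why it has good power when one procedure is consistently better and poor power when the direction of the effect varies across hospitals.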
Problem 4
(a) (6 points) For these data
C = 804 = twice the number of concordant pairs
D = 212 = twice the number of discordant pairs
and
\[ \hat{\gamma} = \frac{C - D}{C + D} = 0.5827 \]
(b) (6 points)
\[ \hat{\kappa} = \frac{(1 - .42) - (1 - (.16 + .32 + .10))}{1 - .42} = 0.2759 \]
(c) (4 points) This application does not call for a measure of agreement. The categories
for the row and column variables (student preparation and teacher job satisfaction) are
not the same, and there is no reason to focus only on the main diagonal of the table.
Consequently, the gamma measure of association is more appropriate than the Kappa
measure of agreement in this situation.
Some students did not consider the nature of this particular study and tried to give
some general argument for preferring one measure to the other.
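Both measures in Problem 4 are simple ratios, as the sketch below shows. It assumes that .16 + .32 + .10 is the observed agreement on the main diagonal and that .42 is the agreement expected by chance, which is how the numbers in part (b) appear to be used. The small table passed to `gamma_statistic` is hypothetical; the exam's data table is not reproduced in these solutions.

```python
def gamma_statistic(table):
    """Goodman-Kruskal gamma for a two-way table with ordered rows and columns."""
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    # Each pair of cells is visited once, so these are half the C and D of the solution;
    # the ratio is unaffected by that factor of two.
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for l in range(cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

def kappa_statistic(observed_agreement, chance_agreement):
    """Cohen's kappa from observed and chance-expected agreement proportions."""
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)

gamma_from_solution = (804 - 212) / (804 + 212)                    # values of C and D from part (a)
kappa_from_solution = kappa_statistic(0.16 + 0.32 + 0.10, 0.42)    # values from part (b)
print(f"gamma = {gamma_from_solution:.4f}, kappa = {kappa_from_solution:.4f}")

# Hypothetical 2x2 table with perfect positive association.
print(gamma_statistic([[2, 0], [0, 2]]))
```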
Problem 5
(a) (8 points) Let $X_i$ denote the number of nests in which $i - 1$ eggs hatched. The
log-likelihood function is
\[ \ell(\pi, \lambda) = X_1 \log\left[ \pi + (1 - \pi) e^{-\lambda} \right]
   + \left[ \log(1 - \pi) - \lambda \right] \sum_{k=1}^{10} X_{k+1}
   + \log(\lambda) \sum_{k=1}^{10} k X_{k+1}
   - \sum_{k=1}^{10} X_{k+1} \log(k!) \]
Some students converted to a multinomial model by combining all nests in which the
number of hatched eggs exceeded a certain bound into a single category. This received
slightly less than full credit because it produces a slightly less efficient estimator
for $(\pi, \lambda)$. Many students failed to establish a useful notation.
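The log-likelihood in part (a) has the form of a Poisson model mixed with an extra point mass at zero, and it can be written as a short function. In this sketch the parameter names `pi` (extra-zero probability) and `lam` (Poisson mean) are assumed labels for the two parameters, and the nest counts are hypothetical.

```python
import math

def zip_log_likelihood(pi, lam, counts):
    """Log-likelihood for the zero-inflated Poisson form used in part (a).

    counts[k] is the number of nests in which k eggs hatched
    (X_{k+1} in the notation of the solution).
    """
    # Zero class: either an extra zero or a Poisson zero.
    total = counts[0] * math.log(pi + (1 - pi) * math.exp(-lam))
    # Positive classes: Poisson probability scaled by (1 - pi).
    for k in range(1, len(counts)):
        total += counts[k] * (math.log(1 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1))
    return total

# Hypothetical counts of nests with 0, 1, ..., 5 hatched eggs.
example_counts = [30, 10, 20, 25, 10, 5]
print(zip_log_likelihood(0.2, 2.0, example_counts))
```

`math.lgamma(k + 1)` is used for `log(k!)` so that large counts do not overflow.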
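(b) (12 points) Use the delta method. Let $g(\pi, \lambda) = (1 - \pi)\lambda$. The first partial derivatives
are
\[ G = \begin{bmatrix} \dfrac{\partial g(\pi, \lambda)}{\partial \pi} \\[2ex] \dfrac{\partial g(\pi, \lambda)}{\partial \lambda} \end{bmatrix}
     = \begin{bmatrix} -\lambda \\ 1 - \pi \end{bmatrix} \]
Since the regularity conditions are satisfied, the maximum likelihood estimator $(\hat{\pi}, \hat{\lambda})'$
has an approximate normal distribution with mean vector $(\pi, \lambda)'$ and covariance matrix
equal to the inverse of the Fisher information matrix, which is estimated by $V$. Then,
by the delta method, $g(\hat{\pi}, \hat{\lambda}) = (1 - \hat{\pi})\hat{\lambda}$ is approximately distributed as a normal
random variable with mean $(1 - \pi)\lambda$ and estimated variance
\[ S^2 = \hat{G}' V \hat{G} = 0.0475947 \]
Here, $\hat{G}$ is obtained by substituting $(\hat{\pi}, \hat{\lambda})'$ for $(\pi, \lambda)'$ in $G$. Then, an approximate
95 percent confidence interval for $(1 - \pi)\lambda$ is
\[ (1 - \hat{\pi})\hat{\lambda} \pm (1.96) S \;\Rightarrow\; 2.7893 \pm (1.96)(0.21816) \;\Rightarrow\; (2.36,\ 3.22) . \]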
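The delta method calculation in part (b) can be sketched as follows. The estimates and covariance matrix passed to the function are hypothetical, since the solution reports only the resulting estimate 2.7893 and variance 0.0475947; the last lines reproduce the reported interval from those two numbers.

```python
import math

def delta_method_interval(pi_hat, lam_hat, V, z=1.96):
    """Approximate interval for g(pi, lambda) = (1 - pi) * lambda via the delta method."""
    estimate = (1 - pi_hat) * lam_hat
    grad = [-lam_hat, 1 - pi_hat]                      # (dg/dpi, dg/dlambda)
    # Quadratic form G' V G with a 2x2 covariance matrix V.
    variance = sum(grad[i] * V[i][j] * grad[j] for i in range(2) for j in range(2))
    half_width = z * math.sqrt(variance)
    return estimate, variance, (estimate - half_width, estimate + half_width)

# Hypothetical estimates and covariance matrix, for illustration only.
est, var, interval = delta_method_interval(0.2, 3.5, [[0.002, 0.001], [0.001, 0.05]])
print(est, var, interval)

# Interval from the values reported in the solution.
s = math.sqrt(0.0475947)
reported = (2.7893 - 1.96 * s, 2.7893 + 1.96 * s)
print(f"S = {s:.5f}, interval = ({reported[0]:.2f}, {reported[1]:.2f})")
```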
(c) (6 points) The 100 observations in the current study provide an estimate of $(1 - \pi)\lambda$
with standard error $S = 0.21816$. Since the inverse of the Fisher information matrix
used to compute $S$ is inversely proportional to the sample size, we will need the ratio of the new
sample size to the previous sample size to be
\[ \frac{n}{100} = \frac{S^2}{(.10)^2} . \]
Consequently, we will need a new sample of about
\[ n = \frac{100\, S^2}{(.10)^2} = 10000\, S^2 \;\Rightarrow\; 476 \]
nests to achieve the desired precision.
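The sample size arithmetic in part (c) is a one-line scaling of the variance. The sketch below assumes, as the formula indicates, that .10 is the target standard error; the variable names are illustrative.

```python
import math

s_squared = 0.0475947      # estimated variance from part (b), based on 100 nests
current_n = 100
target_se = 0.10           # desired standard error

# The variance scales as 1/n, so n_new / current_n = s_squared / target_se**2.
required_n = math.ceil(current_n * s_squared / target_se ** 2)
print(required_n)
```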
(d) (4 points) degrees of freedom = (8-1)-2 = 5.
Final numerical answers were not required to receive full credit for solutions to parts (b) and
(c) of problem 5.
Scores: Here is a stem-leaf display of the scores for this exam.
9 | 14
8 | 87
8 | 0244
7 | 56677788899
7 | 1222233
6 | 5667899
6 | 0334
5 | 9
5 | 0