Stat 557 Fall 2000 Exam I Solutions

A point value for each part of each problem is indicated below. These values sum to 100. You should have a corresponding number on your graded exam for each part of each question. If you received no credit for a response, you should have a zero recorded on your paper for that part of the question. Contact the instructor if no point value is indicated for any part of any problem. Some of the solutions are presented here in more detail than was expected of student responses. There are reasonable ways to approach some problems that are not addressed in this set of solutions.

Problem 1

(a) (6 points) To do a prospective study you would need to recruit male volunteers between the ages of 40 and 55 who meet any other criteria to enter the study. These volunteers would be randomly divided into two groups of equal size. One group would be randomly selected to receive the daily dose of aspirin, and men in the other group would receive a placebo. The volunteers would not be told whether they are taking the aspirin or the placebo. The volunteers would be followed for a specified period of time (at least 5 years), and the number who experienced at least one heart attack would be recorded for each group. The sample proportions would be used to test the null hypothesis that the incidence of heart attacks is the same for both groups.

(b) (6 points) One way in which a retrospective study could be done is to take a simple random sample of patient records from a population of 40 to 55 year old males who have been treated for heart attacks. Take an independent simple random sample of patient records from a population of 40 to 55 year old males with no history of heart disease. For each sample, classify the patients as to whether or not they consumed aspirin on a long term daily basis. Construct a 2x2 table as shown in part (c), and test the null hypothesis of independence or construct a confidence interval for an odds ratio.
The solution to part (d) assumes that the table in part (c) was obtained in this way.

Alternatively, you could take a simple random sample of patient records from a population of 40-55 year old males who received long term aspirin treatment prior to any history of heart disease and classify them with respect to their current heart disease status. Take an independent simple random sample of patient records from a population of 40-55 year old males who never received a long term aspirin treatment and classify these patients with respect to current heart disease status.

A third approach would take a simple random sample of patient records from some population of 40-55 year old males and classify each patient in the sample with respect to current heart disease status and whether or not the patient received long term aspirin treatment prior to the onset of heart problems.

If heart disease incidence is rather low in 40-55 year old males, these methods could require relatively large sample sizes to observe a substantial number of patients who experienced heart attacks. An appropriate answer to part (d) depends on which sampling plan was proposed in part (b).

(c) (12 points) Estimate the odds ratio as

$$\hat{\theta} = \frac{(15)(173)}{(27)(185)} = .5195$$

Use the large sample normal approximation to the distribution of $\log(\hat{\theta})$ to compute an approximate 95 percent confidence interval for $\log(\theta)$:
$$\log(\hat{\theta}) \pm (1.96)\sqrt{\frac{1}{15} + \frac{1}{185} + \frac{1}{27} + \frac{1}{173}} \;\Rightarrow\; -.6548 \pm (1.96)(.33895) \;\Rightarrow\; (-1.3192,\ .0096)$$

Then an approximate 95 percent confidence interval for the odds ratio is

$$(\exp(-1.3192),\ \exp(.0096)) \;\Rightarrow\; (0.27,\ 1.01).$$

(d) (8 points) From the data in part (c), we can directly estimate the odds ratio

$$\frac{\Pr\{\text{Aspirin user} \mid \text{Heart attack}\} \,/\, \Pr\{\text{Aspirin non-user} \mid \text{Heart attack}\}}{\Pr\{\text{Aspirin user} \mid \text{No heart attack}\} \,/\, \Pr\{\text{Aspirin non-user} \mid \text{No heart attack}\}}$$

Using Bayes Theorem, this becomes

$$\frac{\Pr\{\text{Heart attack} \mid \text{Aspirin user}\} \,/\, \Pr\{\text{No heart attack} \mid \text{Aspirin user}\}}{\Pr\{\text{Heart attack} \mid \text{Aspirin non-user}\} \,/\, \Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}} = \Big[\text{Relative risk of heart attack}\Big] \left[\frac{\Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}}{\Pr\{\text{No heart attack} \mid \text{Aspirin user}\}}\right]$$

This will be an accurate approximation to the relative risk of heart attack for the aspirin versus non-aspirin users if the incidence of heart attacks is low for both users and non-users, i.e., if

$$\frac{\Pr\{\text{No heart attack} \mid \text{Aspirin non-user}\}}{\Pr\{\text{No heart attack} \mid \text{Aspirin user}\}}$$

is close to 1.0.

Problem 2

(12 points) The results for a student and her mother should be cross-classified into a 3x3 table, with one count for each student/mother pair, as shown below. By treating the student/mother pair as the sampling unit and recording one count in the table for each pair, any correlation between the responses given by a student and her mother is automatically taken into account.

                        Student's Response
Mother's Response       Yes         No          Unsure
Yes                     $Y_{11}$    $Y_{12}$    $Y_{13}$
No                      $Y_{21}$    $Y_{22}$    $Y_{23}$
Unsure                  $Y_{31}$    $Y_{32}$    $Y_{33}$

The corresponding population probabilities are

                        Student's Response
Mother's Response       Yes           No            Unsure
Yes                     $\pi_{11}$    $\pi_{12}$    $\pi_{13}$
No                      $\pi_{21}$    $\pi_{22}$    $\pi_{23}$
Unsure                  $\pi_{31}$    $\pi_{32}$    $\pi_{33}$

Let $\pi = (\pi_{11}\ \pi_{21}\ \pi_{31}\ \pi_{12}\ \pi_{22}\ \pi_{32}\ \pi_{13}\ \pi_{23}\ \pi_{33})'$. Then the null hypothesis of marginal homogeneity can be expressed as

$$H_0:\ \mathbf{0} = \begin{bmatrix} \pi_{1+} \\ \pi_{2+} \\ \pi_{3+} \end{bmatrix} - \begin{bmatrix} \pi_{+1} \\ \pi_{+2} \\ \pi_{+3} \end{bmatrix} = C\pi$$

where

$$C = \begin{bmatrix} 0 & -1 & -1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & -1 & -1 & 0 \end{bmatrix}$$

The alternative is that the null hypothesis is incorrect. A Wald test with an approximate .05 Type I error (significance) level is obtained by rejecting the null hypothesis if

$$X^2 = n\, p' C' (C \hat{\Sigma}_p C')^{-} C p$$

exceeds the upper .05 percentile of a central chi-square distribution with 2 degrees of freedom. Here $p$ is the vector of sample proportions $Y_{ij}/n$, ordered as in $\pi$, $n$ is the number of student/mother pairs, $\hat{\Sigma}_p = \operatorname{diag}(p) - pp'$ estimates the covariance matrix of $\sqrt{n}\,p$, and the superscript $-$ denotes a generalized inverse. The C matrix used here has rank 2. The choice of the C matrix is not unique. The same test statistic is obtained, for example, using a C matrix consisting of any two rows of the C matrix given above.

Problem 3

(8 points) The null hypothesis is that the success rates are the same for the two surgical procedures within each hospital. This is a null hypothesis of conditional independence. The alternative is that the success rates are different for the two surgical procedures within some hospitals.

(i) A testing procedure that does not require the assumption of homogeneous odds ratios within hospitals is to reject the null hypothesis if the value of the Cochran-Mantel-Haenszel statistic exceeds the upper $\alpha$ percentile of a central chi-square distribution with one degree of freedom. This test would have good relative power if one surgical procedure was consistently better than the other.

(ii) A second approach would first use the $T_4$ test to test the null hypothesis of homogeneous conditional odds ratios within hospitals. If this null hypothesis is not rejected, one could proceed to evaluate the Mantel-Haenszel estimator and construct a confidence interval for the common odds ratio. An advantage of this procedure is that it provides an estimate of how much better (or worse) the first procedure is relative to the second procedure. A disadvantage is that it requires homogeneous conditional odds ratios within hospitals.
What would you do if the null hypothesis of homogeneous conditional odds ratios was rejected by the $T_4$ test?

(iii) You could use loglinear or logistic regression models to analyze these data, but these approaches were not part of the material covered prior to this exam.

Problem 4

(a) (6 points) For these data

C = 804 = twice the number of concordant pairs
D = 212 = twice the number of discordant pairs

and

$$\hat{\gamma} = \frac{C - D}{C + D} = 0.5827$$

(b) (6 points)

$$\hat{\kappa} = \frac{(1 - .42) - (1 - (.16 + .32 + .10))}{1 - .42} = 0.2759$$

(c) (4 points) This application does not call for a measure of agreement. The categories for the row and column variables (student preparation and teacher job satisfaction) are not the same, and there is no reason to focus only on the main diagonal of the table. Consequently, the gamma measure of association is more appropriate than the Kappa measure of agreement in this situation.

Some students did not consider the nature of this particular study and tried to give some general argument for preferring one measure to the other.

Problem 5

(a) (8 points) Let $X_i$ denote the number of nests in which $i - 1$ eggs hatched. The log-likelihood function is

$$\ell(\pi, \lambda) = X_1 \log\left[\pi + (1 - \pi)e^{-\lambda}\right] + \left[\log(1 - \pi) - \lambda\right] \sum_{k=1}^{10} X_{k+1} + \log(\lambda) \sum_{k=1}^{10} k X_{k+1} - \sum_{k=1}^{10} X_{k+1} \log(k!)$$

Some students converted to a multinomial model by combining all nests in which the number of hatched eggs exceeded a certain bound into a single category. This received slightly less than full credit because it produces a slightly less efficient estimator for $(\pi, \lambda)$. Many students failed to establish a useful notation.

(b) (12 points) Use the delta method. Let $g(\pi, \lambda) = (1 - \pi)(\lambda)$. The first partial derivatives are

$$G = \begin{bmatrix} \dfrac{\partial g(\pi, \lambda)}{\partial \pi} \\[2ex] \dfrac{\partial g(\pi, \lambda)}{\partial \lambda} \end{bmatrix} = \begin{bmatrix} -\lambda \\ 1 - \pi \end{bmatrix}$$

Since the regularity conditions are satisfied, the maximum likelihood estimator $(\hat{\pi}, \hat{\lambda})'$ has an approximate normal distribution with mean vector $(\pi, \lambda)'$ and covariance matrix equal to the inverse of the Fisher information matrix, which is estimated by V.
Then, by the delta method, $g(\hat{\pi}, \hat{\lambda}) = (1 - \hat{\pi})\hat{\lambda}$ is approximately distributed as a normal random variable with mean $(1 - \pi)(\lambda)$ and estimated variance

$$S^2 = \hat{G}' V \hat{G} = 0.0475947$$

Here, $\hat{G}$ is obtained by substituting $(\hat{\pi}, \hat{\lambda})'$ for $(\pi, \lambda)'$ in G. Then, an approximate 95 percent confidence interval for $(1 - \pi)(\lambda)$ is

$$(1 - \hat{\pi})\hat{\lambda} \pm (1.96)S \;\Rightarrow\; (2.7893) \pm (1.96)(.21816) \;\Rightarrow\; (2.36,\ 3.22).$$

(c) (6 points) The 100 observations in the current study provide an estimate of $(1 - \pi)(\lambda)$ with standard error $S = 0.21816$. Since the inverse of the Fisher information matrix used to compute $S$ is inversely proportional to the sample size, we need the ratio of the new sample size to the previous sample size to be

$$\frac{n}{100} = \frac{S^2}{(.10)^2}.$$

Consequently, we will need a new sample of about

$$n = \frac{100 S^2}{(.10)^2} = 10000 S^2 \;\Rightarrow\; 476$$

nests to achieve the desired precision.

(d) (4 points) degrees of freedom = (8-1)-2 = 5.

Final numerical answers were not required to receive full credit for solutions to parts (b) and (c) of problem 5.

Scores: Here is a stem-and-leaf display of the scores for this exam.

9 8 8 7 7 6 6 5 5 14 87 0244 56677788899 1222233 5667899 0334 9 0 7
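
Several of the numerical answers in Problems 1(c), 4(a), 5(b) and 5(c) can be rechecked from the quantities quoted in the solutions. The following Python sketch is one way to do so. It is not part of the original solutions: the function names are illustrative, the inputs are the values stated in the text, and the target standard error of .10 in 5(c) is read off the formula given there.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Estimate the odds ratio (a*d)/(b*c) and a Wald interval built on the log scale."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower, upper = log_or - z * se, log_or + z * se
    return math.exp(log_or), (math.exp(lower), math.exp(upper))


def gamma_hat(concordant, discordant):
    """Gamma measure of association, (C - D) / (C + D)."""
    return (concordant - discordant) / (concordant + discordant)


def wald_ci(estimate, variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def required_n(n_old, var_old, target_se):
    """Sample size giving target_se, assuming the variance scales as 1/n."""
    return n_old * var_old / target_se ** 2


# Problem 1(c): counts arranged as in the formula (15)(173) / ((27)(185)).
theta_hat, theta_ci = odds_ratio_ci(15, 27, 185, 173)

# Problem 4(a): C = 804 and D = 212.
gamma = gamma_hat(804, 212)

# Problem 5(b) and 5(c): estimate 2.7893 with S^2 = 0.0475947 from 100 nests.
mean_ci = wald_ci(2.7893, 0.0475947)
n_new = required_n(100, 0.0475947, 0.10)

print(round(theta_hat, 4), [round(x, 2) for x in theta_ci])
print(round(gamma, 4))
print([round(x, 2) for x in mean_ci], round(n_new))
```

Rounded to the precision used in the solutions, these computations agree with the values reported above: .5195 and (0.27, 1.01) for the odds ratio, 0.5827 for gamma, (2.36, 3.22) for the interval in 5(b), and about 476 nests in 5(c).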