STAT 557                         EXAM I                         FALL 2000

NAME ________________

Instructions: You may use a calculator and the formula sheets you brought to this exam. No other notes or books are allowed. Write your answers in the spaces provided below. If you need more space, use the back of the page or attach additional sheets of paper, but clearly indicate where this is done. You need not complete numerical computations; you will receive complete credit by showing that you know how to solve the problem. Be sure to define any notation you use that is not defined in the statement of a problem.

1. Medical researchers want to investigate the claim that long term consumption of a low dose of aspirin (in this case 250 mg per day for at least five years) reduces the risk of experiencing a heart attack in middle aged males (men between 40 and 55 years old).

(a) Describe how a prospective study could be done to investigate this claim.

(b) Describe how a retrospective study could be done to investigate this claim.

(c) Suppose the study you described in Part (b) produced the following table of counts.

                                  Long term consumption of
                                  250 mg of aspirin per day
    Experienced a heart attack        Yes          No
    Yes                                15          27
    No                                185         173

Compute an approximate 95% confidence interval for the odds ratio corresponding to the odds that long term aspirin users experience a heart attack divided by the odds that non-aspirin users experience a heart attack.

(d) Explain why the odds ratio in Part (c) can be used as an approximate measure of relative risk of heart attack. Be sure to describe the situations in which this approximation would be most accurate.

2.
Each member of a simple random sample of 400 female high school students in Iowa was asked the question: “Should teenagers be allowed to purchase birth control pills without the consent of their parents?” Each respondent was classified into one of three categories:

    (1) Yes
    (2) No
    (3) Unsure/No opinion

A response to the same question was obtained from the mother of each student in the sample. Show how you would use the data from this survey to test the null hypothesis that the distribution of opinions across the three response categories is the same for female high school students and mothers of female high school students. Give a formula for your test statistic, degrees of freedom, and the critical value for a .05 Type I error level test.

3. In a study of two surgical procedures (procedure A and procedure B) for correcting a heart defect, patients with the heart defect will be recruited at 18 different hospitals. The number of patients recruited at the various hospitals will vary from as few as 6 to as many as 16 at a single hospital. Within each hospital, half of the patients will be randomly assigned to procedure A and the other half will be treated with procedure B. It is well known that success rates for these types of surgical procedures vary from hospital to hospital, depending on the training and policies of the staff at different hospitals and features of the populations of patients served by different hospitals. Nevertheless, the researchers hope to find that one procedure is consistently better than the other.

Describe how you would analyze the data from this experiment. In particular, identify the null hypotheses to be tested and the alternative hypothesis. Report a formula for a test statistic and describe how it would be used to establish a conclusion.

4.
A simple random sample of 50 teaching assistants at a large public university were cross classified into a 3x3 contingency table with respect to their perception of how well students were prepared to take the course they were teaching and their level of job satisfaction.

                                  Job Satisfaction
    Student Preparation      Low     Moderate     High
    Poor                       8         6          3
    Moderately Good            1        16          4
    Very Good                  0         7          5

(a) Estimate the gamma measure of association.

(b) Estimate $\lambda_{R|C}$.

(c) The estimate of kappa is 0.34. Is it better to use gamma or kappa to describe the relationship between level of teacher job satisfaction and level of student preparation? Explain.

5. Using wild ducks that were captured and fitted with radio transmitters in the previous summer, researchers were able to locate 100 nesting pairs of ducks in the subsequent spring. The number of eggs hatched in each of the 100 nests was recorded, and the counts are shown below.

    Number of eggs that hatched     Number of nests
                0                         26
                1                          8
                2                         12
                3                         11
                4                         18
                5                         14
                6                          8
                7                          2
                8                          0
                9                          0
               10                          1

This table indicates, for example, that 10 eggs hatched in one nest and no eggs hatched in 26 nests. If we use $Y_1, Y_2, \ldots, Y_{100}$ to represent the number of eggs that hatched in the 100 nests, then this table indicates that 26 of the $Y_i$ values are zero, 8 of the $Y_i$ values are one, etc. The relatively large number of zero counts requires a probability model that can give more probability to a zero outcome than a Poisson distribution. Consider the model where $Y_1, Y_2, \ldots, Y_{100}$ are independent random variables with

$$\Pr\{Y_i = 0\} = \theta + (1 - \theta)\, e^{-\lambda}$$

$$\Pr\{Y_i = k\} = (1 - \theta)\, \frac{\lambda^k}{k!}\, e^{-\lambda}, \qquad k = 1, 2, \ldots$$

(a) Write out the log-likelihood function for this model. Define any notation you use that has not been defined in the statement of this problem.
(b) The log-likelihood from Part (a) was maximized to obtain maximum likelihood estimates

$$\hat{\theta} = 0.241 \quad \text{and} \quad \hat{\lambda} = 3.675$$

The covariance matrix for $(\hat{\theta}, \hat{\lambda})'$ was reported as

$$V = \begin{bmatrix} .002178 & .000835 \\ .000835 & .039643 \end{bmatrix}$$

Assuming that the proposed probability model is correct, use this information to construct an approximate 95% confidence interval for $(1 - \theta)\lambda$, the mean number of eggs that hatch per nest.

(c) Suppose the researchers want to do a larger study to estimate $(1 - \theta)\lambda$ more precisely. They want the standard error of their estimator to be smaller than 0.10. Using the information from the current study, what is your recommendation for the number of nests that they should monitor in the new study? Show how you arrived at your answer.

(d) To determine if the observed data are inconsistent with the proposed probability model, the following table of expected counts was constructed.

    Number of        Observed number     Expected number
    hatched eggs     of nests            of nests
    0                     26                 26.00
    1                      8                  7.08
    2                     12                 13.00
    3                     11                 15.92
    4                     18                 14.63
    5                     14                  6.59
    6                      8                  5.59
    7 or more              3                  6.04

The value of the Pearson statistic is $X^2 = 5.32$. What are the degrees of freedom for this test?
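For readers who want to evaluate the probability model of Problem 5 numerically, the following is a minimal Python sketch, not part of the exam. It codes the two probability formulas from the problem statement and plugs in the estimates reported in Part (b). The function name `zip_pmf`, the variable names, and the truncation of the support at 60 are illustrative assumptions.

```python
import math

def zip_pmf(k, theta, lam):
    """Pr{Y = k} under the model stated in Problem 5 (hypothetical helper)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # Extra probability mass theta is placed on a zero count.
        return theta + (1.0 - theta) * poisson
    return (1.0 - theta) * poisson

# Estimates reported in Part (b) and the number of nests in the study.
theta_hat, lam_hat = 0.241, 3.675
n_nests = 100

# Assumption: truncating the support at 60 leaves a negligible tail.
probs = [zip_pmf(k, theta_hat, lam_hat) for k in range(60)]

total_prob = sum(probs)                                 # should be close to 1
model_mean = sum(k * p for k, p in enumerate(probs))    # close to (1 - theta) * lambda
expected_zero = n_nests * probs[0]                      # expected number of empty nests

print(f"total probability    = {total_prob:.6f}")
print(f"model mean           = {model_mean:.4f}")
print(f"expected zero counts = {expected_zero:.2f}")
```

Expected counts for other outcomes follow the same pattern: multiply `zip_pmf(k, theta_hat, lam_hat)` by the number of nests. Because the estimates are rounded to three decimals, values computed this way can differ slightly from tabulated ones.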