STAT 557 Fall 1998 Midterm Exam

NAME _____________

INSTRUCTIONS: Show your work and write your answers in the space provided on this exam. Use the back of the page or attach additional sheets of paper if more space is needed, but clearly indicate where this is done. You may use a calculator, pencils, erasers, and the formula sheet attached to this exam. No other materials are allowed.

1. The following table displays information on the number of accidents during a one-year period experienced by a sample of 400 workers from the automobile industry. The table also presents maximum likelihood estimates of expected counts for a model that assumes that the numbers of accidents for different workers are i.i.d. negative binomial random variables.

Number of accidents in one year      0       1       2       3       4    5 or more
Observed number of workers         282      74      26       8       4       6
Estimate of expected counts        282.2    72.5    26.6    10.7     4.5     3.5

(a) Write out a formula for the log-likelihood that was maximized to get the estimates of the expected counts shown in the table.

(b) Using the observed counts and the estimates of the expected counts from the table, the value of the Pearson statistic for testing the fit of the negative binomial model against the general alternative is $X^2 = 2.60$. What are the degrees of freedom for this test?

(c) The maximum likelihood estimates for the parameters in the negative binomial model are $\hat{\pi} = 0.53$ and $\hat{\beta} = 0.54$, and the estimate of the large-sample covariance matrix for $(\hat{\pi}, \hat{\beta})$ is

$$\begin{pmatrix} s^2_{\hat{\pi}} & s_{\hat{\pi},\hat{\beta}} \\ s_{\hat{\pi},\hat{\beta}} & s^2_{\hat{\beta}} \end{pmatrix} = \begin{pmatrix} .0035 & .0062 \\ .0062 & .0342 \end{pmatrix}$$

Show how this information can be used to construct an approximate 95% confidence interval for $\pi^{\beta} = \exp(\beta \log(\pi))$, the proportion of the population of workers that will have no accidents in a one-year period. (Display required formulas. You do not have to finish numerical computations.)

2.
One way to measure the effectiveness of an allergy treatment is to monitor use of "over-the-counter" remedies, such as decongestants and nasal sprays, that can be purchased without a doctor's prescription. Each of 400 subjects enrolled in a study of a particular allergy treatment was monitored for use of over-the-counter remedies during a four-week period before the treatment was first administered. Each subject was classified into one of the following three categories for level of use of over-the-counter remedies:

1. no use of over-the-counter remedies
2. moderate use
3. heavy use

Then, each subject was given the treatment. During the treatment period, the subjects could use any over-the-counter remedies that they wanted to use. Each subject was monitored during the last four weeks of the treatment period and classified into one of the three categories for use of over-the-counter remedies listed above.

Show how to test the null hypothesis that use of over-the-counter remedies was not affected by the new allergy treatment. Give a formula for your test statistic and report the associated degrees of freedom for your test.

3. In a study of smoking habits of college students, observations from 2317 students were cross-classified into a 2×2×3×3 contingency table with respect to the following variables. Each student in the study had at most one older sibling.

Variable   Description                        Levels
A          Smoking habit of the respondent    i=1 for smoker; i=2 for non-smoker
B          Sex of respondent                  j=1 for female; j=2 for male
C          Older sibling as a role model      k=1 older sibling smokes; k=2 older sibling does not smoke; k=3 no older sibling
D          Parents as role models             l=1 both parents smoke; l=2 one parent smokes; l=3 neither parent smokes

(a) Write down the largest log-linear model that satisfies the following statement: Given the parents' smoking status (D), the smoking status of the respondent (A) is conditionally independent of both sex of the respondent (B) and the status of a possible older sibling (C).
(b) The following log-linear model was fit to the data:

$$\log m_{ijkl} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} + \lambda^{ABD}_{ijl}$$

The value of the Pearson statistic for testing the fit of this model against the general alternative is $X^2 = 22.74$. What are the degrees of freedom associated with this test?

(c) How should data be collected from students if one wishes to accurately use a chi-square distribution for the $X^2$ test in part (b)?

(d) Assume that the model shown in part (b) is the correct model. Describe what this model implies about associations of the other three variables with the respondent's smoking status. Maximum likelihood estimates of model parameters are shown below. (These estimates satisfy the restrictions that the sum across the levels of any single variable is zero.) Use these estimates to interpret the size and direction of significant associations identified by this model. More space for your answer is available on the next page.

Effect   Subscripts         Estimate   Standard Error   ChiSquare   Prob
A        i=1                −0.2292    0.0394            33.91      0.0000
B        j=1                −0.0197    0.0369             0.28      0.5938
A*B      i=1, j=1           −0.1463    0.0369            15.72      0.0001
D        l=1                 0.7216    0.0522           190.83      0.0000
D        l=2                 0.9156    0.0504           329.60      0.0000
A*D      i=1, l=1            0.3522    0.0419            70.50      0.0000
A*D      i=1, l=2           −0.1925    0.0431            19.91      0.0000
B*D      j=1, l=1            0.0469    0.0411             1.30      0.2543
B*D      j=1, l=2           −0.00004   0.0415             0.00      0.9993
A*B*D    i=1, j=1, l=1       0.0988    0.0411             5.77      0.0163
A*B*D    i=1, j=1, l=2       0.0179    0.0415             0.19      0.6655
C        k=1                 0.5934    0.0530           125.16      0.0000
C        k=2                 0.1080    0.0580             3.47      0.0625
A*C      i=1, k=1            0.2949    0.0313            88.59      0.0000
A*C      i=1, k=2            0.0247    0.0335             0.54      0.4605
C*D      k=1, l=1            0.2000    0.0598            11.18      0.0008
C*D      k=1, l=2           −0.5167    0.0584            78.27      0.0000
C*D      k=2, l=1            0.1037    0.0651             2.54      0.1112
C*D      k=2, l=2           −0.1864    0.0627             8.83      0.0030

EXAM SCORE
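As a reading aid for the table of maximum likelihood estimates in problem 3, the sketch below checks how its columns relate. It assumes, and the exam does not state this, that the ChiSquare column is a Wald statistic, that is, the squared ratio of each estimate to its standard error. It also assumes that the last C*D row refers to k=2, l=2. The names ROWS, wald_chi_square, and check_rows are illustrative only.

```python
# Rows transcribed from the problem 3 table:
# (effect, subscripts, estimate, standard error, reported chi-square).
# Assumption: the final C*D row is k=2, l=2.
ROWS = [
    ("A", "i=1", -0.2292, 0.0394, 33.91),
    ("B", "j=1", -0.0197, 0.0369, 0.28),
    ("A*B", "i=1, j=1", -0.1463, 0.0369, 15.72),
    ("D", "l=1", 0.7216, 0.0522, 190.83),
    ("D", "l=2", 0.9156, 0.0504, 329.60),
    ("A*D", "i=1, l=1", 0.3522, 0.0419, 70.50),
    ("A*D", "i=1, l=2", -0.1925, 0.0431, 19.91),
    ("B*D", "j=1, l=1", 0.0469, 0.0411, 1.30),
    ("B*D", "j=1, l=2", -0.00004, 0.0415, 0.00),
    ("A*B*D", "i=1, j=1, l=1", 0.0988, 0.0411, 5.77),
    ("A*B*D", "i=1, j=1, l=2", 0.0179, 0.0415, 0.19),
    ("C", "k=1", 0.5934, 0.0530, 125.16),
    ("C", "k=2", 0.1080, 0.0580, 3.47),
    ("A*C", "i=1, k=1", 0.2949, 0.0313, 88.59),
    ("A*C", "i=1, k=2", 0.0247, 0.0335, 0.54),
    ("C*D", "k=1, l=1", 0.2000, 0.0598, 11.18),
    ("C*D", "k=1, l=2", -0.5167, 0.0584, 78.27),
    ("C*D", "k=2, l=1", 0.1037, 0.0651, 2.54),
    ("C*D", "k=2, l=2", -0.1864, 0.0627, 8.83),
]


def wald_chi_square(estimate, std_error):
    """Squared z-ratio; assumed (not stated in the exam) to be the ChiSquare column."""
    return (estimate / std_error) ** 2


def check_rows(rows, rel_tol=0.01, abs_tol=0.01):
    """Compare the recomputed statistic with the reported one for every row.

    The tolerances allow for the rounding of the printed estimates and
    standard errors to four decimal places.
    """
    results = []
    for effect, subs, est, se, reported in rows:
        computed = wald_chi_square(est, se)
        ok = abs(computed - reported) <= rel_tol * reported + abs_tol
        results.append((effect, subs, computed, reported, ok))
    return results


if __name__ == "__main__":
    for effect, subs, computed, reported, ok in check_rows(ROWS):
        print(f"{effect:6s} {subs:15s} computed={computed:8.2f} "
              f"reported={reported:8.2f} match={ok}")
```

Under these assumptions each recomputed value should agree with the printed ChiSquare entry up to rounding, which also confirms how the flattened columns of the table line up row by row.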