MIDTERM EXAM
STAT 557
FALL 1996

NAME ____________________

Instructions: Show your answers in the space provided on this exam. If you need more space, you can use the back of the page or additional sheets of paper, but clearly indicate where this is done. Additional paper will be made available by the instructor. You may use only your pencils, calculator, and the formula sheet attached to this exam. Do not waste time by trying to simplify matrix formulas.

1. Consider the following 3×3 contingency table:

             i=1    i=2    i=3    TOTALS
   j=1        12     22      6      40
   j=2         7     15      8      30
   j=3        11     13      6      30
   TOTALS     30     50     20

A. Circle any of the following statistics that are exactly zero for this table:

   (a) Pearson X^2 test for independence
   (b) Kappa measure of agreement
   (c) Lambda for predicting the row category from the column category
   (d) Lambda for predicting the column category from the row category

B. The degrees of freedom for the test of marginal homogeneity in this 3×3 table against the general alternative are ___________.

2. A company that tests the market potential of new products obtained a random sample of n=998 respondents to participate in a study comparing two laundry detergents, called Brand M and Brand S. Each respondent washed clothes with both detergents and reported a preference for one of the two detergents at the end of the trial. The company also obtained information from each respondent on the brands of detergent that they had purchased in the last two years, and on the temperature and level of softness of the water used to wash clothes. "Softer" water has lower concentrations of certain minerals and other substances.
Each respondent was classified into a contingency table with respect to the four variables:

   Variable                         Levels
   A: previous use of Brand M       1 = no, 2 = yes
   B: wash water temperature        1 = warm, 2 = hot
   C: brand preference              1 = Brand M, 2 = Brand S
   D: wash water softness           1 = hard, 2 = medium, 3 = soft

In this study Brand M was a detergent currently for sale at local grocery stores and Brand S was a new detergent, but the respondents were not told this. Nevertheless, the respondents could be classified with respect to whether or not they previously used Brand M. The observed counts are shown below.

Previous user of Brand M (A)     No                          Yes
Water temperature (B)            Warm (j=1)    Hot (j=2)     Warm (j=1)    Hot (j=2)
Preference (C)                   Brand  Brand  Brand  Brand  Brand  Brand  Brand  Brand
                                   M      S      M      S      M      S      M      S
Water softness (D)
  hard (l=1)                      42     68     20     46     51     27     45     25
  medium (l=2)                    50     61     33     38     55     29     47     23
  soft (l=3)                      63     50     45     29     52     26     46     27

A. The complete independence model can be written as

   $\log(m_{ijkl}) = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l}$.

What are the degrees of freedom for testing the fit of this model against the saturated model?

B. If you computed the test in part A with CATMOD in SAS or LOGLINEAR in SPSS, what does the software assume about the distribution of the observed counts? Make this assumption for the rest of this problem.

C. Using the notation established in part A, write the formula for the largest log-linear model that satisfies the null hypothesis that Brand preference is conditionally independent of previous use of Brand M, given the temperature and softness categories of the wash water.

D. A log-linear model that fits the data well ($G^2 = 2.36$ with p-value = .992) is

   $\log(m_{ijkl}) = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{CD}_{kl} + \lambda^{ACD}_{ikl}$.

Describe what this model implies about independence or conditional independence of the four variables used to construct the contingency table.

E.
What are the minimal sufficient statistics needed to estimate the parameters in the model defined in part D of this problem?

F. Maximum likelihood estimates of the λ-parameters in the model in part D are listed below with their standard errors.

   Parameter                       Estimate    Standard Error
   $\hat\lambda^{A}_{1}$             0.095        .033
   $\hat\lambda^{B}_{1}$             0.148        .032
   $\hat\lambda^{C}_{1}$             0.119        .033
   $\hat\lambda^{D}_{1}$            -0.037        .047
   $\hat\lambda^{D}_{2}$             0.016        .046
   $\hat\lambda^{AB}_{11}$            .085        .032
   $\hat\lambda^{AC}_{11}$           -.198        .033
   $\hat\lambda^{AD}_{11}$           -.020        .0447
   $\hat\lambda^{AD}_{12}$            .003        .046
   $\hat\lambda^{CD}_{11}$           -.118        .047
   $\hat\lambda^{CD}_{12}$            .006        .046
   $\hat\lambda^{ACD}_{111}$         -.108        .047
   $\hat\lambda^{ACD}_{112}$         -.015        .046

Use these results to describe associations between Brand preference and level of water softness.

3. Given the results from problem 2, do you think it would be a good idea to use a Mantel-Haenszel estimator to describe the association between Brand preference and previous use of Brand M? Explain.

4. A. Let $\hat{m} = (\hat{m}_{1111}, \hat{m}_{1112}, \hat{m}_{1113}, \hat{m}_{1121}, \ldots, \hat{m}_{2223})$ denote the vector of maximum likelihood estimates of the expected counts for the model in part D of problem 2. Using matrix notation, display an approximate formula for $\hat{m}$ as a linear function of $p = n^{-1} X$, where $X = (X_{1111}, X_{1112}, X_{1113}, X_{1121}, \ldots, X_{2223})$ is the vector of observed counts.

B. Using the result from part A, show how to derive a formula for the asymptotic distribution of

   $\log(\hat{\alpha}) = \log\left( \frac{\hat{m}_{1111}\, \hat{m}_{2121}}{\hat{m}_{1121}\, \hat{m}_{2111}} \right)$.

EXAM SCORE __________
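
Supplementary sketch for problem 1. The four statistics listed in part A can be checked numerically from the cell counts of the 3×3 table. The Python below is a minimal sketch, not part of the original exam. It assumes that "Kappa" means Cohen's kappa and that "Lambda" means the Goodman-Kruskal lambda, and it treats j as the row index and i as the column index, as in the table layout. All function names are illustrative. Exact rational arithmetic is used so that "exactly zero" can be tested without rounding error.

```python
from fractions import Fraction

# Cell counts from the 3x3 table in problem 1.
# Rows are j = 1..3, columns are i = 1..3 (assumed orientation).
table = [
    [12, 22, 6],
    [7, 15, 8],
    [11, 13, 6],
]


def margins(t):
    """Return (row totals, column totals, grand total)."""
    row_totals = [sum(row) for row in t]
    col_totals = [sum(col) for col in zip(*t)]
    return row_totals, col_totals, sum(row_totals)


def pearson_x2(t):
    """Pearson X^2 statistic for independence of rows and columns."""
    rows, cols, n = margins(t)
    total = Fraction(0)
    for j, row in enumerate(t):
        for i, observed in enumerate(row):
            expected = Fraction(rows[j] * cols[i], n)
            total += (observed - expected) ** 2 / expected
    return total


def kappa(t):
    """Cohen's kappa (assumed definition) for a square agreement table."""
    rows, cols, n = margins(t)
    size = len(t)
    p_observed = Fraction(sum(t[d][d] for d in range(size)), n)
    p_chance = Fraction(sum(rows[d] * cols[d] for d in range(size)), n * n)
    return (p_observed - p_chance) / (1 - p_chance)


def lambda_row_given_col(t):
    """Goodman-Kruskal lambda (assumed) for predicting row from column."""
    rows, _, n = margins(t)
    best_within_cols = sum(max(col) for col in zip(*t))
    return Fraction(best_within_cols - max(rows), n - max(rows))


def lambda_col_given_row(t):
    """Goodman-Kruskal lambda (assumed) for predicting column from row."""
    _, cols, n = margins(t)
    best_within_rows = sum(max(row) for row in t)
    return Fraction(best_within_rows - max(cols), n - max(cols))


if __name__ == "__main__":
    for name, func in [
        ("Pearson X^2", pearson_x2),
        ("Kappa", kappa),
        ("Lambda (row from column)", lambda_row_given_col),
        ("Lambda (column from row)", lambda_col_given_row),
    ]:
        value = func(table)
        print(f"{name}: {value} (exactly zero: {value == 0})")
```

Working with `Fraction` rather than floating point is a deliberate choice here: the question asks which statistics are exactly zero, and a float comparison could mask or invent a tiny nonzero value.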