Test 2 Math 327

1. The following table shows results of a three-center clinical trial to compare a drug to a placebo for curing an infection. At each center, the subjects were randomly assigned to treatment groups.

                        Response
Center   Treatment   Success   Failure
1        Drug           11        25
1        Control        10        27
2        Drug           16         4
2        Control        22        10
3        Drug           14         5
3        Control         7        12

a). Fit a logistic regression model using Center and Treatment as predictors. Based on your output, controlling for center, how do the odds of having a success change if a subject switches from the Control treatment to the Drug treatment? Also obtain a 95% confidence interval for this effect.

> infection <- read.table("http://educ.jmu.edu/~chen3lx/math327/infection.txt",header=T)
> infection
  Center Treatment Success Failure
1      1      Drug      11      25
2      1   Control      10      27
3      2      Drug      16       4
4      2   Control      22      10
5      3      Drug      14       5
6      3   Control       7      12
> infection$Center <- factor(infection$Center)
> out <- glm(cbind(Success,Failure)~Center+Treatment,infection,family=binomial)
> summary(out)

Call:
glm(formula = cbind(Success, Failure) ~ Center + Treatment, family = binomial, data = infection)

Deviance Residuals:
       1         2         3         4         5         6
-0.63760   0.69979  -0.08129   0.05481   0.95423  -0.90506

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -1.2578     0.3286  -3.828 0.000129 ***
Center2         2.0254     0.4195   4.828 1.38e-06 ***
Center3         1.1428     0.4226   2.704 0.006841 **
TreatmentDrug   0.6644     0.3529   1.882 0.059786 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31.7394  on 5  degrees of freedom
Residual deviance:  2.6355  on 2  degrees of freedom
AIC: 31.729

exp(0.6644) = 1.94, so the estimated odds of having a success in the Drug group are 1.94 times the odds in the Control group, controlling for Center. A 95% confidence interval for the log odds ratio is 0.6644 ± 1.96 ∗ 0.3529 = (−0.0273, 1.3561), and exponentiating the endpoints gives (e^−0.0273, e^1.3561) = (0.97, 3.88).
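The interval arithmetic can be checked in R. This is a minimal sketch, assuming the estimate and standard error are typed in by hand from the TreatmentDrug row of the summary output; the variable names est, se, or and ci are illustrative and not part of the original solution.

```r
# Estimate and standard error copied from the TreatmentDrug row of summary(out)
est <- 0.6644
se  <- 0.3529

# Normal critical value for a 95% Wald interval (about 1.96)
z <- qnorm(0.975)

# Odds ratio and Wald interval, computed on the log-odds scale and then exponentiated
or <- exp(est)
ci <- exp(est + c(-1, 1) * z * se)

print(round(c(odds_ratio = or, lower = ci[1], upper = ci[2]), 2))
```

The interval is built on the log-odds scale because that is the scale on which the Wald normal approximation is applied; only the endpoints are exponentiated.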
We are 95% confident that the odds of having a success in the Drug group are between 0.97 and 3.88 times the odds in the Control group, controlling for Center. Since the interval contains 1, the treatment effect is not significant at the 0.05 level, which agrees with the Wald p-value of 0.06.

b). Based on your output, controlling for treatment, how do the odds of having a success change if a subject switches from Center 1 to Center 2? Which center has the highest odds of success controlling for treatment?

exp(2.0254) = 7.58, so the odds of having a success at Center 2 are 7.58 times the odds at Center 1, controlling for treatment. Center 1 is the baseline level (coefficient 0), and the estimates for Center 2 (2.0254) and Center 3 (1.1428) are both positive with Center 2 the larger, so Center 2 has the highest odds of success, controlling for treatment.

c). Compute the sample odds ratios within each center. Based on the results, is it reasonable to assume a common conditional odds ratio between Response and Treatment?

The three sample odds ratios within each center are:

θ̂1 = (11 ∗ 27)/(25 ∗ 10) = 1.188,
θ̂2 = (16 ∗ 10)/(22 ∗ 4) = 1.818,
θ̂3 = (14 ∗ 12)/(7 ∗ 5) = 4.8.

Center 3 has a much higher odds ratio than Center 1 and Center 2, so a common conditional odds ratio is questionable. It is probably more reasonable to fit a model that allows a different odds ratio within each center. This can be accomplished by adding a Center by Treatment interaction term to the model.

2. A study is performed to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). The response variable is low (1 for weight < 2500g, 0 for not). The four predictors considered are race (1=white, 2=black, 3=other), smoke (smoking status of mother during pregnancy, 1=yes, 0=no), lwt (weight in pounds right before pregnancy), and ptl (history of premature labor, taking values 0, 1, 2, 3 etc.). The following is the R output from glm fits of the models low ∼ lwt + race + smoke + ptl and low ∼ lwt, respectively.

Call:
glm(formula = low ~ lwt + race + smoke + ptl, family = binomial, data = birthwt)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7432  -0.8520  -0.5669   1.1667   2.0614

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.35023    0.90242  -0.388   0.6979
lwt         -0.01194    0.00637  -1.874   0.0609 .
race2        1.29788    0.51263   2.532   0.0113 *
race3        0.94423    0.41716   2.263   0.0236 *
smoke1       0.94017    0.38611   2.435   0.0149 *
ptl          0.60550    0.33103   1.829   0.0674 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 211.55  on 183  degrees of freedom
AIC: 223.55

Call:
glm(formula = low ~ lwt, family = binomial, data = birthwt)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0951  -0.9022  -0.8018   1.3609   1.9821

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.99831    0.78529   1.271   0.2036
lwt         -0.01406    0.00617  -2.279   0.0227 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 228.69  on 187  degrees of freedom
AIC: 232.69

Number of Fisher Scoring iterations: 4

1). Based on the above output, compute the odds ratio between mothers of black race (race = 2) and mothers of other race (race = 3), controlling for lwt, smoking status and ptl. You just need to give a point estimate. No confidence interval is needed.

If we use other race as the baseline level, then the coefficient for black would be 1.2979 − 0.9442 = 0.3537, and exp(0.3537) = 1.42. The estimated odds of having a low birth weight baby are 42% higher for a black mother than for a mother of other race, controlling for all the other variables in the model.

2). Based on the output, do more previous premature labors increase or decrease the odds of having a low birth weight baby? Quantify this effect using a 95% confidence interval.

The coefficient of ptl is positive, so, controlling for all the other variables, more previous premature labors increase the estimated odds of having a low birth weight baby. exp(0.6055) = 1.83. Controlling for all the other variables, the estimated odds are multiplied by 1.83, an 83% increase, with each additional premature labor.
A 95% CI for the coefficient of ptl is 0.6055 ± 1.96 ∗ 0.3310 = (−0.0433, 1.2543), so the 95% CI for the odds ratio is (e^−0.0433, e^1.2543) = (0.96, 3.51). The interval contains 1, so the increase is not significant at the 0.05 level.

3). Estimate the probability of having a low birth weight baby for a black, nonsmoking mother who weighed 150 pounds before pregnancy and had 2 premature labors.

The smoke term drops out for a nonsmoker, so

π̂ = exp(−0.3502 − 0.0119 ∗ 150 + 1.2979 + 0.6055 ∗ 2) / [1 + exp(−0.3502 − 0.0119 ∗ 150 + 1.2979 + 0.6055 ∗ 2)] = 0.59.

4). Is the (conditional) effect of lwt significant after controlling for the effects of race, smoke and ptl? Perform a test to answer this question.

The Wald test of H0 : β1 = 0 vs Ha : β1 ≠ 0 has z test statistic −1.874 and p-value 0.06. The conditional effect of lwt is not significant at the 0.05 level (it is only marginally significant, at the 0.10 level) after controlling for the effects of the other variables.

5). Is the marginal effect of lwt significant? Perform a test to answer this question.

The Wald test for the marginal effect of lwt, from the model low ∼ lwt, has z test statistic −2.279 and p-value 0.02. The effect is significant at the 0.05 level.

6). Perform a test to answer whether at least one of the following predictors is significant after lwt is included in the model: race, smoke and ptl.

H0 : β2 = β3 = β4 = β5 = 0,
Ha : at least one of them is not 0.

Here β2 and β3 are the coefficients of race2 and race3, β4 is the coefficient of smoke1, and β5 is the coefficient of ptl. To perform a likelihood ratio test, the test statistic can be obtained as the difference between the residual deviances from fitting the models low ∼ lwt and low ∼ lwt + race + smoke + ptl:

−2(L0 − L1) = 228.69 − 211.55 = 17.14, d.f. = 187 − 183 = 4.

Note: the residual deviance given in the R output for a model M is −2(LM − LS), where LS is the maximized log-likelihood for the saturated model, so LS cancels when the two deviances are subtracted.

p-value = P(χ² > 17.14) = 0.002, where χ² has 4 degrees of freedom. Reject H0: at least one of race, smoke and ptl is significant after lwt is included in the model.
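The likelihood ratio test above can be reproduced in R from the printed deviances alone. This is a minimal sketch, assuming the residual deviances and degrees of freedom are copied by hand from the two summaries; the variable names are illustrative and not part of the original solution.

```r
# Residual deviances and residual degrees of freedom copied from the two outputs
dev_reduced <- 228.69   # model: low ~ lwt
df_reduced  <- 187
dev_full    <- 211.55   # model: low ~ lwt + race + smoke + ptl
df_full     <- 183

# Likelihood ratio statistic: difference in residual deviances
lrt_stat <- dev_reduced - dev_full

# Degrees of freedom: number of extra parameters in the full model
lrt_df <- df_reduced - df_full

# Upper-tail chi-square probability
p_value <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

print(c(statistic = lrt_stat, df = lrt_df, p.value = p_value))
```

If the two fitted glm objects were available (say under the hypothetical names fit_reduced and fit_full), anova(fit_reduced, fit_full, test = "Chisq") would carry out the same comparison directly.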