Homework for Logistic Regression

1. Logistic regression for the combined data on incubation temperature and number of male and female turtles from eggs collected in Illinois is presented in Examples 5.5-5.8. The original data is given below.

Temp    27.2  27.2  27.2  27.7  27.7  27.7  28.3  28.3  28.3
male       1     0     1     7     4     6    13     6     7
female     9     8     8     3     2     2     0     3     1
% male   10%    0%   11%   70%   67%   75%  100%   67%   88%

Temp    28.4  28.4  28.4  29.9  29.9  29.9
male       7     5     7    10     8     9
female     3     3     2     1     0     0
% male   70%   63%   78%   91%  100%  100%

(a) Use the complete data, 3 observed proportions for each temperature, to fit a logistic regression model. You can fit the model in either of two ways:

  • Use the proportion male as the response and use the number of turtles as weights.
  • Use cbind to create a two column response containing the number of males and the number of females.

How does this fit compare to that of the combined data? Look at the residual deviance as well as the fitted equation.

(b) Is temperature significant in the logistic regression model using the complete data? Justify your answer statistically.

(c) What is the incubation temperature that would give a predicted proportion of males of 50%?

2. There is also data on the relationship between the number of male turtles and incubation temperature for turtle eggs from New Mexico. The turtles are the same species as those from Illinois.

Temp    27.20  27.20  27.20  28.30  28.30  28.30  29.90  29.90  29.90
male        0      0      0      6      2      0      4      1      3
female      5      3      2      1      0      3      1      1      0
% male     0%     0%     0%    86%   100%     0%    80%    50%   100%

(a) Use logistic regression to analyze these data. You can either use the proportion male as the response with the number of turtles as weights or you can use cbind to create a two column response containing the number of males and number of females. You do NOT have to use both ways, only one. Turn in the summary of the logistic regression fit. Give the equation, comment on the residual deviance and what it indicates, and test to see if temperature is significant.
(b) What is the temperature at which you would get a 50:50 split of males to females?

(c) Turn in a plot of the data with the logistic regression curve superimposed. Make sure your plot has appropriate labels and a title.

(d) How do the New Mexico turtles compare to the Illinois turtles in terms of the effect of temperature on the sex of the turtles?

(e) How would you analyze the Illinois and New Mexico data together? You do not have to do this analysis; simply tell me what variables you would include in your model and what procedure in S+ you would use to fit the model.

3. A study was conducted to see the effect of coupons on the purchasing habits of potential customers. In the study, 1000 homes were selected, and a coupon and advertising material for a particular product were sent to each home. The advertising material was the same, but the amount of the discount on the coupon varied from 5% to 30%. The number of coupons redeemed was counted. Below are the data.

Price Reduction X_i          5     10     15     20     30
Number of Coupons n_i      200    200    200    200    200
Number Redeemed Y_i         32     51     70    103    148
Proportion Redeemed p_i  0.160  0.255  0.350  0.515  0.740

(a) Fit a simple linear regression to the observed proportions. Use this regression to estimate the proportion redeemed. Is there a significant linear relationship between proportion redeemed and price reduction? According to this regression, at what price reduction will you get a 25% redemption rate?

(b) Compute the logits for the observed proportions at each price reduction level.

(c) Fit a simple linear regression of the logit transformed proportions on the price reduction. Is there a significant linear relationship between the logit and the price reduction? Use this regression to estimate the proportion redeemed. According to this regression, at what price reduction will you get a 25% redemption rate?

(d) Use the generalized linear model (glm) function in S+ to fit a logistic regression of the proportion redeemed on the price reduction.
Comment on the residual deviance and what this says about the adequacy of the fit of the model. Is price reduction a significant predictor in this logistic regression model? Use this regression to estimate the proportion redeemed. According to this regression, at what price reduction will you get a 25% redemption rate?

(e) Compare the three regression equations and the price reductions needed to get a 25% redemption rate.

(f) Create plots that show the data and the fits.

4. Kyphosis is a spinal deformity found in young children who have corrective spinal surgery. The incidence of spinal deformities following corrective spinal surgery (kyp=1 if deformity is present, kyp=0 if there is no deformity present) is thought to be related to the Age (in months) at the time of surgery, Start (the starting vertebra for the surgery) and Num (the number of vertebrae involved in the surgery). Each column below is one child.

Age    71 158 128   2   1   1  61  37 113  59  82 148  18   1 168   1  78 175  80  27  22 105  96 131  15   9   8
Start   5  14   5   1  15  16  17  16  16  12  14  16   2  12  18  16  15  13  16   9  16   5  12   3   2  13   6
Num     3   3   4   5   4   2   2   3   2   6   5   3   5   4   3   3   6   5   5   4   2   6   3   2   7   5   3
Kyp     0   0   1   0   0   0   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   1   1   0   1   0   0

Age   100   4 151  31 125 130 112 140  93   1  52  20  91  73  35 143  61  97 139 136 131 121 177  68   9 139   2
Start  14  16  16  11  11  13  16  11  16   9   6   9  12   1  13   3   1  16  10  15  13   3  14  10  17   6  17
Num     3   3   2   3   2   5   3   5   3   3   5   6   5   5   3   9   4   3   3   4   5   3   2   5   2  10   2
Kyp     0   0   0   0   0   0   0   0   0   0   1   0   1   1   0   0   0   0   1   0   0   1   0   0   0   1   0

Age   140  72   2 120  51 102 130 114  81 118 118  17 195 159  18  15 158 127  87 206  11 178 157  26 120  42  36
Start  15  15  13   8   9  13   1   8   1  16  16  10  17  13  11  16  14  12  16  10  15  15  13  13  13   6  13
Num     4   5   3   5   7   3   4   7   4   3   4   4   2   4   4   5   5   4   4   4   3   4   3   7   2   7   4
Kyp     0   0   0   1   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0

(a) Plot the binary response for the incidence of Kyphosis versus the age of the child. Fit a logistic regression of incidence of Kyphosis on Age. Examine the fit and the significance of Age, and look at the residuals.
(b) Fit a quadratic logistic regression model in Age. You will need to create a new variable AgeSq = Age*Age. Examine the fit and the significance of Age and AgeSq, and look at the residuals.

(c) Repeat part (a) with the explanatory variable Num.

(d) Fit a full quadratic logistic regression model in Age and Num; that is, include the variables Age, Num, Age*Num, AgeSq and NumSq. Examine the fit and the significance of each of the variables.

(e) Give a final model that includes only those explanatory variables from the full quadratic logistic regression model that are significant.
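
A note on the arithmetic behind Problem 3(b) and 3(c): the logit transform and the step of solving a fitted line for a target proportion can be sketched as follows. This is a rough illustration in Python rather than S+, using only the coupon counts from the Problem 3 table. The helper names (logit, inv_logit, least_squares, x_at_proportion) are made up for this sketch, and the fit is an ordinary unweighted least-squares line through the empirical logits, not the maximum-likelihood glm fit that part (d) asks for.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

def least_squares(x, y):
    """Unweighted simple linear regression; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def x_at_proportion(intercept, slope, p):
    """Solve intercept + slope * x = logit(p) for x."""
    return (logit(p) - intercept) / slope

# Coupon data copied from the Problem 3 table.
reduction = [5, 10, 15, 20, 30]
redeemed = [32, 51, 70, 103, 148]
sent = [200, 200, 200, 200, 200]

props = [y / n for y, n in zip(redeemed, sent)]
logits = [logit(p) for p in props]

# Line through the empirical logits (assumption: unweighted fit).
b0, b1 = least_squares(reduction, logits)
x25 = x_at_proportion(b0, b1, 0.25)

print("logits:", [round(v, 3) for v in logits])
print("intercept, slope:", round(b0, 4), round(b1, 4))
print("price reduction for 25% redemption:", round(x25, 2))
```

The same inversion covers the 50% questions in Problems 1(c) and 2(b): logit(0.5) is 0, so x_at_proportion reduces to -intercept/slope. The coefficients from a glm fit will generally differ from this least-squares line, because glm weights each proportion by its binomial variance.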