Homework for Logistic Regression

1. Logistic regression for the combined data on incubation temperature and number of male and
female turtles from eggs collected in Illinois is presented in Examples 5.5-5.8. The original
(uncombined) data, with three observations at each temperature, are given below.
Temp   male   female   % male
27.2      1        9      10%
27.2      0        8       0%
27.2      1        8      11%
27.7      7        3      70%
27.7      4        2      67%
27.7      6        2      75%
28.3     13        0     100%
28.3      6        3      67%
28.3      7        1      88%
28.4      7        3      70%
28.4      5        3      63%
28.4      7        2      78%
29.9     10        1      91%
29.9      8        0     100%
29.9      9        0     100%
(a) Use the complete data, with 3 observed proportions for each temperature, to fit a logistic regression
model. You can fit the model in either of two ways:
• Use the proportion male as the response and the number of turtles as weights.
• Use cbind to create a two-column response containing the number of males and the
number of females.
How does this fit compare to that of the combined data? Look at the residual deviance as
well as the fitted equation.
(b) Is temperature significant in the logistic regression model using the complete data? Justify
your answer statistically.
(c) What is the incubation temperature that would give a predicted proportion of males of 50%?
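The assignment expects these fits to be done with glm in S+ (for example with a cbind(male, female) response and a binomial family). As a rough cross-check outside S+, the Python sketch below maximizes the same binomial likelihood for logit(p) = b0 + b1*Temp by Newton-Raphson, using only the Illinois counts in the table. It computes the residual deviance, the drop in deviance from the intercept-only model as a likelihood-ratio test of temperature, and the 50% temperature -b0/b1. The function and variable names are illustrative, and the numbers it prints are not a substitute for the S+ summary output.

```python
import math

# Illinois data from the table above: (temperature, males, females), three rows per temperature.
ILLINOIS = [
    (27.2, 1, 9), (27.2, 0, 8), (27.2, 1, 8),
    (27.7, 7, 3), (27.7, 4, 2), (27.7, 6, 2),
    (28.3, 13, 0), (28.3, 6, 3), (28.3, 7, 1),
    (28.4, 7, 3), (28.4, 5, 3), (28.4, 7, 2),
    (29.9, 10, 1), (29.9, 8, 0), (29.9, 9, 0),
]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_lik(rows, b0, b1):
    """Binomial log-likelihood of logit(p) = b0 + b1*x, dropping constant terms."""
    return sum(y * (b0 + b1 * x) - (y + f) * softplus(b0 + b1 * x) for x, y, f in rows)


def fit_logistic(rows, max_iter=100, tol=1e-9):
    """Newton-Raphson with step-halving for rows of (x, successes, failures)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for x, y, f in rows:
            n = y + f
            p = math.exp(-softplus(-(b0 + b1 * x)))  # fitted proportion
            r, w = y - n * p, n * p * (1.0 - p)
            g0, g1 = g0 + r, g1 + r * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        s0, s1 = (h11 * g0 - h01 * g1) / det, (h00 * g1 - h01 * g0) / det
        old, t = log_lik(rows, b0, b1), 1.0
        while log_lik(rows, b0 + t * s0, b1 + t * s1) < old - 1e-12 and t > 1e-8:
            t /= 2.0  # halve the step if the log-likelihood would drop
        b0, b1 = b0 + t * s0, b1 + t * s1
        if abs(t * s0) + abs(t * s1) < tol:
            break
    return b0, b1


def deviance(rows, b0, b1):
    """Residual deviance, 2 * (saturated - fitted log-likelihood), with 0*log(0) = 0."""
    sat = sum((y * math.log(y / (y + f)) if y else 0.0)
              + (f * math.log(f / (y + f)) if f else 0.0) for _, y, f in rows)
    return 2.0 * (sat - log_lik(rows, b0, b1))


def pool(rows):
    """Add up males and females at each temperature (the combined data)."""
    totals = {}
    for x, y, f in rows:
        ty, tf = totals.get(x, (0, 0))
        totals[x] = (ty + y, tf + f)
    return [(x, y, f) for x, (y, f) in sorted(totals.items())]


b0, b1 = fit_logistic(ILLINOIS)
resid_dev = deviance(ILLINOIS, b0, b1)                       # 15 - 2 = 13 df
p_all = sum(y for _, y, f in ILLINOIS) / sum(y + f for _, y, f in ILLINOIS)
null_dev = deviance(ILLINOIS, math.log(p_all / (1.0 - p_all)), 0.0)
g_stat = max(null_dev - resid_dev, 0.0)                      # drop in deviance, 1 df
p_value = math.erfc(math.sqrt(g_stat / 2.0))                 # P(chi-square_1 > g_stat)
temp_50 = -b0 / b1                                           # where logit(p) = 0

combined = pool(ILLINOIS)
b0_c, b1_c = fit_logistic(combined)
print(f"complete: logit(p) = {b0:.3f} + {b1:.3f}*Temp, deviance {resid_dev:.3f} on 13 df")
print(f"combined: logit(p) = {b0_c:.3f} + {b1_c:.3f}*Temp, deviance "
      f"{deviance(combined, b0_c, b1_c):.3f} on {len(combined) - 2} df")
print(f"LR test for Temp: G = {g_stat:.2f}, p = {p_value:.3g}; 50% male at {temp_50:.2f}")
```

Because the binomial log-likelihood depends on the counts only through the totals at each temperature, the complete and pooled rows give the same b0 and b1 (up to rounding); pooling changes only the saturated model, and therefore the residual deviance and its degrees of freedom.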
2. There are also data on the relationship between the number of male turtles and incubation temperature for turtle eggs from New Mexico. The turtles are the same species as those from Illinois.
Temp    male   female   % male
27.20      0        5       0%
27.20      0        3       0%
27.20      0        2       0%
28.30      6        1      86%
28.30      2        0     100%
28.30      0        3       0%
29.90      4        1      80%
29.90      1        1      50%
29.90      3        0     100%
(a) Use logistic regression to analyze these data. You can either use the proportion male as the
response with the number of turtles as weights or you can use cbind to create a two column
response containing the number of males and number of females. You do NOT have to use
both ways, only one. Turn in the summary of the logistic regression fit. Give the equation,
comment on the residual deviance and what it indicates, and test to see if temperature is
significant.
(b) What is the temperature at which you would get a 50:50 split of males to females?
(c) Turn in a plot of the data with the logistic regression curve superimposed. Make sure your
plot has appropriate labels and a title.
(d) How do the New Mexico turtles compare to the Illinois turtles in terms of the effect of
temperature on the sex of the turtles?
(e) How would you analyze the Illinois and New Mexico data together? You do not have to
do this analysis; simply tell me what variables you would include in your model and what
procedure you would use to fit the model.
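One way to carry out part (e), offered as an assumption rather than the required answer, is a single binomial glm on all 24 rows with Temp, a location indicator Loc (0 = Illinois, 1 = New Mexico) and the interaction Temp*Loc. Dropping the interaction tests whether the temperature effect differs between the two locations; dropping both Loc terms tests whether one curve fits both. The Python sketch below fits the three nested models by Newton-Raphson and compares them by likelihood-ratio (drop-in-deviance) tests. The indicator name Loc and the helper functions are hypothetical.

```python
import math

# Counts from the two tables: (temperature, males, females).
IL = [(27.2, 1, 9), (27.2, 0, 8), (27.2, 1, 8), (27.7, 7, 3), (27.7, 4, 2),
      (27.7, 6, 2), (28.3, 13, 0), (28.3, 6, 3), (28.3, 7, 1), (28.4, 7, 3),
      (28.4, 5, 3), (28.4, 7, 2), (29.9, 10, 1), (29.9, 8, 0), (29.9, 9, 0)]
NM = [(27.2, 0, 5), (27.2, 0, 3), (27.2, 0, 2), (28.3, 6, 1), (28.3, 2, 0),
      (28.3, 0, 3), (29.9, 4, 1), (29.9, 1, 1), (29.9, 3, 0)]
# Hypothetical location indicator: Loc = 0 for Illinois, 1 for New Mexico.
DATA = [(t, m, f, 0) for t, m, f in IL] + [(t, m, f, 1) for t, m, f in NM]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [v - factor * w for v, w in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit(design, ys, ns, max_iter=200, tol=1e-9):
    """Binomial logit model by Newton-Raphson with step-halving -> (beta, log-likelihood)."""
    k = len(design[0])

    def loglik(b):
        etas = [sum(bj * xj for bj, xj in zip(b, row)) for row in design]
        return sum(y * e - n * softplus(e) for e, y, n in zip(etas, ys, ns))

    beta = [0.0] * k
    for _ in range(max_iter):
        grad, info = [0.0] * k, [[0.0] * k for _ in range(k)]
        for row, y, n in zip(design, ys, ns):
            p = math.exp(-softplus(-sum(bj * xj for bj, xj in zip(beta, row))))
            for i in range(k):
                grad[i] += (y - n * p) * row[i]
                for j in range(k):
                    info[i][j] += n * p * (1.0 - p) * row[i] * row[j]
        step, old, t = solve(info, grad), loglik(beta), 1.0
        while loglik([b + t * s for b, s in zip(beta, step)]) < old - 1e-12 and t > 1e-8:
            t /= 2.0  # step-halving keeps the log-likelihood from decreasing
        beta = [b + t * s for b, s in zip(beta, step)]
        if max(abs(t * s) for s in step) < tol:
            break
    return beta, loglik(beta)


ys = [m for _, m, _, _ in DATA]
ns = [m + f for _, m, f, _ in DATA]
full = [[1.0, t, loc, t * loc] for t, _, _, loc in DATA]  # separate line for each site
same_slope = [row[:3] for row in full]                     # common temperature effect
one_curve = [row[:2] for row in full]                      # location ignored

beta_full, ll_full = fit(full, ys, ns)
ll_slope = fit(same_slope, ys, ns)[1]
ll_one = fit(one_curve, ys, ns)[1]
g_int = max(2.0 * (ll_full - ll_slope), 0.0)   # 1 df: does the slope differ by site?
g_loc = max(2.0 * (ll_full - ll_one), 0.0)     # 2 df: do the sites differ at all?
p_int = math.erfc(math.sqrt(g_int / 2.0))      # chi-square upper tail, 1 df
p_loc = math.exp(-g_loc / 2.0)                 # chi-square upper tail, 2 df
print("full model (1, Temp, Loc, Temp*Loc):", [round(b, 3) for b in beta_full])
print(f"Temp*Loc: G = {g_int:.2f}, p = {p_int:.3g};  Loc terms: G = {g_loc:.2f}, p = {p_loc:.3g}")
```

In S+ or R the same comparison would typically be a glm with a binomial family followed by a drop-in-deviance comparison of the nested fits; the sketch only mirrors that logic.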
3. A study was conducted to see the effect of coupons on purchasing habits of potential customers.
In the study, 1000 homes were selected, and a coupon and advertising material for a particular
product were sent to each home. The advertising material was the same, but the amount of the
discount on the coupon varied from 5% to 30%. The number of coupons redeemed was counted.
Below are the data.
Price Reduction   Number of      Number         Proportion
X_i (%)           Coupons n_i    Redeemed Y_i   Redeemed p_i
 5                200             32            0.160
10                200             51            0.255
15                200             70            0.350
20                200            103            0.515
30                200            148            0.740
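The last column is p_i = Y_i / n_i, and the logits in part (b) are logit(p_i) = ln[p_i / (1 - p_i)]. A minimal Python sketch of that arithmetic, using only the table values:

```python
import math

X = [5, 10, 15, 20, 30]          # price reduction (%)
N = [200, 200, 200, 200, 200]    # coupons mailed
Y = [32, 51, 70, 103, 148]       # coupons redeemed

props = [y / n for y, n in zip(Y, N)]               # observed proportions p_i
logits = [math.log(p / (1.0 - p)) for p in props]   # logit(p_i) = ln(p_i / (1 - p_i))

for x, p, lg in zip(X, props, logits):
    print(f"X = {x:2d}%   p = {p:.3f}   logit(p) = {lg:+.3f}")
```

A proportion of one half has a logit of 0, so the logits change sign between the 15% and 20% levels, where the observed proportion crosses 0.5.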
(a) Fit a simple linear regression to the observed proportions. Use this regression to estimate the
proportion redeemed. Is there a significant linear relationship between proportion redeemed
and price reduction? According to this regression, at what price reduction will you get a 25%
redemption rate?
(b) Compute the logits for the observed proportions at each price reduction level.
(c) Fit a simple linear regression of the logit-transformed proportions on the price reduction. Is
there a significant linear relationship between the logit and the price reduction? Use this
regression to estimate the proportion redeemed. According to this regression, at what price
reduction will you get a 25% redemption rate?
(d) Use the generalized linear model (glm) function in S+ to fit a logistic regression of the proportion
redeemed on the price reduction. Comment on the residual deviance and what this says about
the adequacy of the fit of the model. Is price reduction a significant predictor in this logistic
regression model? Use this regression to estimate the proportion redeemed. According to this
regression, at what price reduction will you get a 25% redemption rate?
(e) Compare the three regression equations and the price reductions they give for a 25% redemption rate.
(f) Create plots that show the data and the fits.
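Parts (a), (c) and (d) differ in what is fitted: a least-squares line for p, a least-squares line for logit(p), and a maximum-likelihood binomial logistic regression with n_i = 200 coupons per level (the model glm fits with a binomial family). In each case the price reduction for a 25% redemption rate comes from solving the fitted equation for p = 0.25, using logit(0.25) = ln(1/3) on the logit scale. The plain-Python sketch below is illustrative; the helper names are assumptions, and the t statistics it prints should be compared with a t distribution on 3 degrees of freedom.

```python
import math

X = [5, 10, 15, 20, 30]          # price reduction (%)
N = [200] * 5                    # coupons mailed at each level
Y = [32, 51, 70, 103, 148]       # coupons redeemed
P = [y / n for y, n in zip(Y, N)]


def logit(p):
    return math.log(p / (1.0 - p))


def ols(xs, ys):
    """Least-squares line ys = a + b*xs; returns (a, b, t statistic for b on len(xs) - 2 df)."""
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    mse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (m - 2)
    return a, b, b / math.sqrt(mse / sxx)


def fit_glm(xs, ys, ns, iters=25):
    """Binomial logistic regression by Newton-Raphson, started from the empirical-logit line."""
    a, b, _ = ols(xs, [logit(y / n) for y, n in zip(ys, ns)])
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(xs, ys, ns):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r, w = y - n * p, n * p * (1.0 - p)
            g0, g1 = g0 + r, g1 + r * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        a, b = a + (h11 * g0 - h01 * g1) / det, b + (h00 * g1 - h01 * g0) / det
    return a, b


def deviance(xs, ys, ns, a, b):
    """Binomial residual deviance; every count in this table is strictly between 0 and n."""
    total = 0.0
    for x, y, n in zip(xs, ys, ns):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += y * math.log(y / (n * p)) + (n - y) * math.log((n - y) / (n * (1.0 - p)))
    return 2.0 * total


a1, b1, t1 = ols(X, P)                         # (a) p = a1 + b1*X
a2, b2, t2 = ols(X, [logit(p) for p in P])     # (c) logit(p) = a2 + b2*X
a3, b3 = fit_glm(X, Y, N)                      # (d) maximum-likelihood logistic fit
dev3 = deviance(X, Y, N, a3, b3)               # residual deviance on 5 - 2 = 3 df
null_dev = deviance(X, Y, N, logit(sum(Y) / sum(N)), 0.0)  # intercept-only model, 4 df

x25 = {"linear": (0.25 - a1) / b1,             # solve each fitted equation for p = 0.25
       "logit LS": (logit(0.25) - a2) / b2,
       "logistic ML": (logit(0.25) - a3) / b3}
print(f"(a) p = {a1:.4f} + {b1:.5f} X, t = {t1:.2f} on 3 df")
print(f"(c) logit(p) = {a2:.4f} + {b2:.5f} X, t = {t2:.2f} on 3 df")
print(f"(d) logit(p) = {a3:.4f} + {b3:.5f} X, deviance {dev3:.3f} on 3 df "
      f"(drop from null: {null_dev - dev3:.1f} on 1 df)")
for name, x in x25.items():
    print(f"25% redemption ({name}): X = {x:.2f}%")
```

Starting Newton-Raphson from the empirical-logit line of part (c) is a design choice: it is already close to the maximum-likelihood estimate, so no step control is needed for these data.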
4. Kyphosis is a spinal deformity found in young children who have corrective spinal surgery. The
incidence of spinal deformities following corrective spinal surgery (Kyp = 1 if a deformity is present,
Kyp = 0 if no deformity is present) is thought to be related to the Age (in months) at the
time of surgery, Start (the starting vertebra for the surgery) and Num (the number of vertebrae
involved in the surgery).
Age  Start  Num  Kyp     Age  Start  Num  Kyp     Age  Start  Num  Kyp
 71      5    3    0     100     14    3    0     140     15    4    0
158     14    3    0       4     16    3    0      72     15    5    0
128      5    4    1     151     16    2    0       2     13    3    0
  2      1    5    0      31     11    3    0     120      8    5    1
  1     15    4    0     125     11    2    0      51      9    7    0
  1     16    2    0     130     13    5    0     102     13    3    0
 61     17    2    0     112     16    3    0     130      1    4    1
 37     16    3    0     140     11    5    0     114      8    7    1
113     16    2    0      93     16    3    0      81      1    4    0
 59     12    6    1       1      9    3    0     118     16    3    0
 82     14    5    1      52      6    5    1     118     16    4    0
148     16    3    0      20      9    6    0      17     10    4    0
 18      2    5    0      91     12    5    1     195     17    2    0
  1     12    4    0      73      1    5    1     159     13    4    0
168     18    3    0      35     13    3    0      18     11    4    0
  1     16    3    0     143      3    9    0      15     16    5    0
 78     15    6    0      61      1    4    0     158     14    5    0
175     13    5    0      97     16    3    0     127     12    4    0
 80     16    5    0     139     10    3    1      87     16    4    0
 27      9    4    0     136     15    4    0     206     10    4    0
 22     16    2    0     131     13    5    0      11     15    3    0
105      5    6    1     121      3    3    1     178     15    4    0
 96     12    3    1     177     14    2    0     157     13    3    1
131      3    2    0      68     10    5    0      26     13    7    0
 15      2    7    1       9     17    2    0     120     13    2    0
  9     13    5    0     139      6   10    1      42      6    7    1
  8      6    3    0       2     17    2    0      36     13    4    0
(a) Plot the binary response for the incidence of Kyphosis versus the age of the child. Fit a
logistic regression of incidence of Kyphosis on Age. Examine the fit and the significance of Age, and
look at the residuals.
(b) Fit a quadratic logistic regression model in Age. You will need to create a new variable
AgeSq = Age*Age. Examine the fit and the significance of Age and AgeSq, and look at the residuals.
(c) Repeat part (a) with the explanatory variable Num.
(d) Fit a full quadratic logistic regression model in Age and Num; that is, include the variables Age,
Num, Age*Num, AgeSq and NumSq. Examine the fit and the significance of each of the variables.
(e) Give a final model that includes only those explanatory variables from the full quadratic
logistic regression model that are significant.
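Because Kyp is a 0/1 response, the saturated log-likelihood is 0 and the residual deviance is -2 times the maximized log-likelihood. Nested models, such as Age versus Age + AgeSq, or the full quadratic model versus Age + AgeSq, can then be compared by the drop in deviance, referred to a chi-square distribution with degrees of freedom equal to the number of terms dropped. The Python sketch below uses this likelihood-ratio approach in place of the Wald z tests that a glm summary reports. It fits the models of parts (a) through (d) by Newton-Raphson and computes deviance residuals, sign(y - p) * sqrt(-2 log P(observed outcome)), for the quadratic-in-Age model. Rescaling each column inside the hypothetical helper fit_binary is an assumed numerical convenience; it leaves the fitted probabilities and deviances unchanged.

```python
import math

# Age (months), Num (vertebrae involved) and Kyp (1 = deformity) from the table above.
AGE = [71, 158, 128, 2, 1, 1, 61, 37, 113, 59, 82, 148, 18, 1, 168, 1, 78, 175,
       80, 27, 22, 105, 96, 131, 15, 9, 8,
       100, 4, 151, 31, 125, 130, 112, 140, 93, 1, 52, 20, 91, 73, 35, 143, 61,
       97, 139, 136, 131, 121, 177, 68, 9, 139, 2,
       140, 72, 2, 120, 51, 102, 130, 114, 81, 118, 118, 17, 195, 159, 18, 15,
       158, 127, 87, 206, 11, 178, 157, 26, 120, 42, 36]
NUM = [3, 3, 4, 5, 4, 2, 2, 3, 2, 6, 5, 3, 5, 4, 3, 3, 6, 5, 5, 4, 2, 6, 3, 2, 7, 5, 3,
       3, 3, 2, 3, 2, 5, 3, 5, 3, 3, 5, 6, 5, 5, 3, 9, 4, 3, 3, 4, 5, 3, 2, 5, 2, 10, 2,
       4, 5, 3, 5, 7, 3, 4, 7, 4, 3, 4, 4, 2, 4, 4, 5, 5, 4, 4, 4, 3, 4, 3, 7, 2, 7, 4]
KYP = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0]


def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^z)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [v - factor * w for v, w in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def fit_binary(columns, y, max_iter=200, tol=1e-9):
    """Logistic regression of 0/1 y on an intercept plus columns -> (coefs, probs, deviance).

    Columns are divided by their largest |value| for conditioning; coefs are mapped back.
    """
    scales = [1.0] + [max(abs(v) for v in col) for col in columns]
    rows = [[1.0] + [col[i] / s for col, s in zip(columns, scales[1:])] for i in range(len(y))]
    k = len(scales)

    def loglik(b):
        etas = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
        return sum(yi * e - softplus(e) for yi, e in zip(y, etas))

    beta = [0.0] * k
    for _ in range(max_iter):
        grad, info = [0.0] * k, [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = math.exp(-softplus(-sum(bj * xj for bj, xj in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step, old, t = solve(info, grad), loglik(beta), 1.0
        while loglik([b + t * s for b, s in zip(beta, step)]) < old - 1e-12 and t > 1e-8:
            t /= 2.0  # step-halving keeps the log-likelihood from decreasing
        beta = [b + t * s for b, s in zip(beta, step)]
        if max(abs(t * s) for s in step) < tol:
            break
    probs = [math.exp(-softplus(-sum(bj * xj for bj, xj in zip(beta, r)))) for r in rows]
    return [b / s for b, s in zip(beta, scales)], probs, -2.0 * loglik(beta)


def chi2_sf(x, df):
    """P(chi-square_df > x) from Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    x = max(x, 0.0)
    q, k = (math.erfc(math.sqrt(x / 2.0)), 1) if df % 2 else (0.0, 0)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


age_sq = [a * a for a in AGE]
age_num = [a * n for a, n in zip(AGE, NUM)]
num_sq = [n * n for n in NUM]
dev_null = fit_binary([], KYP)[2]
dev_age = fit_binary([AGE], KYP)[2]                                    # part (a)
_, p_quad, dev_quad = fit_binary([AGE, age_sq], KYP)                   # part (b)
dev_num = fit_binary([NUM], KYP)[2]                                    # part (c)
dev_full = fit_binary([AGE, NUM, age_num, age_sq, num_sq], KYP)[2]     # part (d)
# Deviance residuals: sign(y - p) * sqrt(-2 log P(observed outcome)).
res_quad = [math.copysign(math.sqrt(-2.0 * math.log(p if yi else 1.0 - p)), yi - p)
            for yi, p in zip(KYP, p_quad)]

print(f"Age:        deviance {dev_age:.2f} (79 df), LR p = {chi2_sf(dev_null - dev_age, 1):.3g}")
print(f"+ AgeSq:    deviance {dev_quad:.2f} (78 df), LR p = {chi2_sf(dev_age - dev_quad, 1):.3g}")
print(f"Num:        deviance {dev_num:.2f} (79 df), LR p = {chi2_sf(dev_null - dev_num, 1):.3g}")
print(f"full quad.: deviance {dev_full:.2f} (75 df), vs Age + AgeSq p = "
      f"{chi2_sf(dev_quad - dev_full, 3):.3g}")
print("largest |deviance residuals| in the Age + AgeSq model:", sorted(res_quad, key=abs)[-3:])
```

The squared deviance residuals add up to the residual deviance, which is a quick check that the residuals and the fit agree.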