Statistics 410/510 Logistic Regression Lab Due: Thursday, March 13.

The data for this lab are in the Sleuth2 package for R. To install it, use the Packages pull-down menu in R and
choose “Install Packages”. After choosing a CRAN mirror site (CA-1 is Berkeley), select and install Sleuth2.
Installation is needed only once, but the package must be loaded in every R session, either through the
Packages pull-down menu or with the command library(Sleuth2). Once the package is loaded, you can access
its datasets by name. To learn more about the dataset for this lab, type help(case2001).
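The same setup can also be done from the R console instead of the menus. This is a minimal sketch: it assumes an internet connection for the one-time install and uses the cloud.r-project.org CRAN mirror in place of the CA-1 mirror mentioned above (any mirror works). The str() and levels() calls are added only to show how the variables are stored.

```r
# One-time install; the repos mirror is an assumed choice, any CRAN mirror works
if (!requireNamespace("Sleuth2", quietly = TRUE)) {
  install.packages("Sleuth2", repos = "https://cloud.r-project.org")
}

# Load the package in each new R session so its datasets are available
library(Sleuth2)

# Documentation and a quick look at how the variables are stored
help(case2001)
str(case2001)
levels(case2001$Sex)     # the first level listed is the reference level in glm()
levels(case2001$Status)  # glm() treats the first level of the response as "failure"
```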
1. Make a scatter plot of the data with age on the x-axis and status (0 = dead, 1 = alive) on the y-axis.
Use different symbols or colors to distinguish females and males, and include a legend.
2. Fit a logistic regression model with age and gender as predictors and no interaction, using the command
fit1 <- glm(Status ~ Age + Sex, family = binomial, data = case2001). Show a table of the coefficients and
their standard errors.
3. Calculate the log odds of survival for a 40-year-old female. (This is the linear combination of the
coefficients evaluated at Age = 40 and Sex = Female; I will call it eta in later problems.) Is it greater than 0?
Do this using R as a calculator. Confirm that you can reproduce this answer with predict(fit1) by looking at the
2nd observation, or with predict(fit1, newdata = data.frame(Age = 40, Sex = "Female")).
4. Calculate the probability that a 40-year-old female survives. (Use the inverse of the logit link function,
i.e., exp(eta)/(1 + exp(eta)).) Is the probability greater than 0.5? Confirm that predict() gives the same
probability when you use the option type = "response".
5. Using the probability rule for complements, P(not A) = 1 - P(A), calculate the probability of death for a
40-year-old female.
6. In our example, the odds of survival are P(survival) / [1 - P(survival)] = P(survival) / P(death). Using your
results from problems 4 and 5, calculate the odds of a 40-year-old woman surviving. Are the odds better or
worse than “even”; that is, are they greater than 1? Note that in epidemiology, P(disease) would be in the
numerator.
7. Reproduce your answer to problem 6 by calculating the exponential of eta, exp(eta).
8. Calculate the 95% confidence interval for the gender coefficient (R labels it SexMale, because Female is
the default reference level). Assuming approximate normality, use 1.96 as the multiplier. (Hint: Use the
coefficient estimate and the SE provided by R.) Is zero inside this interval? A coefficient of 0 would suggest
that gender is not associated with survival after accounting for age.
9. Calculate the probability that a 40-year-old male survives. This is similar to what you did for females in
problem 4.
10. Calculate the odds of survival for a 40-year-old male. This is similar to what you did for females in
problem 6.
11. For 40-year-olds, calculate the ratio of the odds for a female (numerator) relative to the odds for a male
(denominator). This is called an odds ratio (O.R.). An O.R. of 1 suggests that the two variables, gender and
survival, are not associated. Who had a better chance of survival, males or females?
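12. Confirm that your result in problem 11 matches the exponential of the gender coefficient. (Because R's
coefficient compares males with females, exponentiate its negative to get the female-to-male odds ratio.)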
13. For 50-year-olds, calculate the ratio of the odds for a female (numerator) relative to the odds for a male
(denominator). This O.R. should be the same as the one calculated in problem 11. Using college algebra,
show why this works out. (Hint: See the Logistic Regression talk slides.) Note that this would not hold if
the model included an interaction between age and gender.
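14. Exponentiate the endpoints of the confidence interval you calculated in problem 8. This gives a 95%
confidence interval for the O.R. of survival comparing the sexes (males versus females with R's default
coding; take reciprocals of the endpoints for females versus males). Is 1 inside this interval?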
15. Relative risk (R.R.) is the risk of the event in group 1 divided by the risk of the event in group 2,
where risk means probability. Using death as the event, what was the risk of death for 40-year-old women
relative to that for 40-year-old men? How many times as likely as a 40-year-old man was a 40-year-old
woman to die?
16. The O.R. and R.R. depend on which group is placed in the denominator. Put women in the denominator
and recalculate the O.R. and R.R. from problems 11 and 15. Interpret the R.R.
17. Reproduce the plot from problem 1, adding two fitted logistic curves, one for females and one for males,
that show the estimated probability of survival by age.
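For problems 1 and 17, here is a minimal sketch of the plot in base R graphics. It assumes Status is stored as a factor with levels Died and Survived (check with levels(case2001$Status)), so it first recodes Status to 0/1. The colors, symbols, and helper names such as survived and ages are illustrative choices. The curves come from the additive model fit1 of problem 2.

```r
library(Sleuth2)

# Recode the factor response as 0 = died, 1 = survived (assumes level "Survived")
survived  <- as.numeric(case2001$Status == "Survived")
is_female <- case2001$Sex == "Female"

# Problem 1: scatter plot with a different symbol and color for each sex
plot(case2001$Age, survived,
     pch = ifelse(is_female, 16, 1),
     col = ifelse(is_female, "red", "blue"),
     xlab = "Age (years)", ylab = "Status (0 = died, 1 = survived)")
legend("right", legend = c("Female", "Male"),
       pch = c(16, 1), col = c("red", "blue"))

# Problem 17: overlay the fitted survival curve for each sex from the additive model
fit1 <- glm(Status ~ Age + Sex, family = binomial, data = case2001)
ages <- seq(min(case2001$Age), max(case2001$Age), length.out = 200)
for (s in c("Female", "Male")) {
  p <- predict(fit1, newdata = data.frame(Age = ages, Sex = s), type = "response")
  lines(ages, p, col = if (s == "Female") "red" else "blue")
}
```

Because the model has no interaction, the two curves have the same shape and are shifted horizontally relative to each other.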
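Problems 2 through 7 can be checked with a few lines of R. This sketch assumes R's default coding, in which Female is the reference level of Sex, so the gender coefficient is labelled SexMale. Because glm() treats the first level of a factor response as failure, the model describes the probability of Survived. Names such as eta_f and p_f are illustrative.

```r
library(Sleuth2)

# Problem 2: additive logistic model and its coefficient table
fit1 <- glm(Status ~ Age + Sex, family = binomial, data = case2001)
summary(fit1)$coefficients[, 1:2]   # estimates and standard errors
b <- coef(fit1)

# Problem 3: eta for a 40-year-old female (the male indicator is 0)
eta_f <- unname(b["(Intercept)"] + b["Age"] * 40)
eta_f > 0
predict(fit1)[2]                                      # the 2nd observation
predict(fit1, newdata = data.frame(Age = 40, Sex = "Female"))

# Problem 4: inverse logit turns eta into a probability of survival
p_f <- exp(eta_f) / (1 + exp(eta_f))
predict(fit1, newdata = data.frame(Age = 40, Sex = "Female"), type = "response")

# Problem 5: probability of death by the complement rule
q_f <- 1 - p_f

# Problems 6 and 7: odds of survival computed two ways
odds_f <- p_f / q_f
odds_f
exp(eta_f)
```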
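For problems 8 and 14, here is a sketch of the Wald interval. It assumes the gender coefficient is the third row of the coefficient table (SexMale under R's default coding), so exponentiating the interval gives an O.R. comparing males with females. The last step takes reciprocals, swapping the endpoints, to get the female-versus-male interval.

```r
library(Sleuth2)

fit1 <- glm(Status ~ Age + Sex, family = binomial, data = case2001)
est  <- summary(fit1)$coefficients

# Gender coefficient and its standard error (third row under default coding)
b_sex  <- est[3, "Estimate"]
se_sex <- est[3, "Std. Error"]

# Problem 8: 95% Wald interval on the log-odds scale
ci_logodds <- b_sex + c(-1, 1) * 1.96 * se_sex
ci_logodds
ci_logodds[1] < 0 & 0 < ci_logodds[2]   # is zero inside?

# Problem 14: exponentiate the endpoints to get an interval for the O.R.
ci_or_male_vs_female <- exp(ci_logodds)
ci_or_female_vs_male <- rev(1 / ci_or_male_vs_female)  # reciprocal, endpoints swapped
ci_or_female_vs_male
```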
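For problems 9 through 12, this sketch repeats the same arithmetic for a 40-year-old male and then forms the odds ratio. It again assumes R's default coding, in which the third coefficient is the male indicator. Under that coding, the female-to-male odds ratio equals the exponential of the negative of that coefficient.

```r
library(Sleuth2)

fit1 <- glm(Status ~ Age + Sex, family = binomial, data = case2001)
b <- coef(fit1)   # intercept, Age, and the male indicator (default coding)

# Problem 9: survival probability for a 40-year-old male (indicator = 1)
eta_m <- unname(b[1] + b[2] * 40 + b[3])
p_m   <- exp(eta_m) / (1 + exp(eta_m))

# Problem 10: odds of survival for that male
odds_m <- p_m / (1 - p_m)

# Problem 11: odds ratio with the female in the numerator
eta_f     <- unname(b[1] + b[2] * 40)
odds_f    <- exp(eta_f)
or_f_vs_m <- odds_f / odds_m
or_f_vs_m

# Problem 12: the coefficient is for males, so negate it before exponentiating
exp(-b[3])
```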
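For problem 13, the algebra can be sketched as follows. Here p is the probability of survival, Male is the 0/1 indicator R creates for Sex, a is age, and beta_0, beta_1, beta_2 are the intercept, age, and gender coefficients. These symbols are chosen for illustration.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 a + \beta_2\,\mathrm{Male}
\quad\Longrightarrow\quad
\text{odds}(a, \mathrm{Male}) = e^{\beta_0 + \beta_1 a + \beta_2\,\mathrm{Male}}

\mathrm{O.R.}_{F/M}(a)
  = \frac{e^{\beta_0 + \beta_1 a}}{e^{\beta_0 + \beta_1 a + \beta_2}}
  = e^{(\beta_0 + \beta_1 a) - (\beta_0 + \beta_1 a + \beta_2)}
  = e^{-\beta_2}

% With an interaction term \beta_3\, a\, \mathrm{Male} added to the model:
\mathrm{O.R.}_{F/M}(a) = e^{-(\beta_2 + \beta_3 a)}
```

The age terms cancel in the additive model, so the O.R. is the same at a = 40 and a = 50. With the interaction term, the ratio depends on a.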
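For problems 15 and 16, this sketch gets both fitted probabilities from predict() and then forms the ratios. Risk here is the fitted probability of death. The object names are illustrative.

```r
library(Sleuth2)

fit1  <- glm(Status ~ Age + Sex, family = binomial, data = case2001)
new40 <- data.frame(Age = c(40, 40), Sex = c("Female", "Male"))

# Fitted survival probabilities (row 1 = female, row 2 = male) and risks of death
p_surv  <- predict(fit1, newdata = new40, type = "response")
p_death <- 1 - p_surv

# Problem 15: relative risk of death, women (numerator) vs men (denominator)
rr_f_vs_m <- unname(p_death[1] / p_death[2])
rr_f_vs_m

# Problem 16: women in the denominator instead
rr_m_vs_f <- 1 / rr_f_vs_m
rr_m_vs_f
odds_surv <- p_surv / (1 - p_surv)
or_m_vs_f <- unname(odds_surv[2] / odds_surv[1])  # odds of survival, men vs women
or_m_vs_f
```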