Say good things, think good thoughts, and do good deeds 1 Categorical Data Analysis Chapter 5 (I): Logistic Regression for Quantitative Factors 2 Logistic Regression • Binary response variable: Y ~ Bernoulli(p) • k quantitative/ordinal factors: x1, … , xk • model: p ( x) 1 x1 ... k xk log 1 p ( x) SAS textbook Sec 8.5 3 Interpretation (for Only One Factor) • (Multiplicative effect on the odds) Increasing x by one unit is estimated to give the odds of response a increase by a factor of exp()— Not easy for investigators to understand 4 Interpretation (for Only One Factor) • Interpretation of the effect of x on Y in terms of risk (or called response rate): – The bigger (the effect of X on Y) is, the bigger the slope of a tangent line of the fitting curve (with respect to X) is: p ( x) p ( x)(1 p ( x)) x how the risk changes instantly at x 5 LD 50 • LD50 (LD = lethal dose)= the dose level at which toxicity rate p(dose) is 50% • In the logistic regression with dose being the only x, LD50= -/ • The instant change rate of risk (p) at LD50 is /4 6 Example: Insecticide vs Beetles dosage 1.691 # of beetles # of dead exposed beetles 59 6 … … … 1.884 60 60 See handout for SAS code and output 7 Only One Factor • The estimate of , 34.27, can be interpreted as: Increasing dose by one unit is estimated to give the odds of death a increase by a factor of exp(34.27) • Interpret the effect on the risk of death at dose 1.70 8 More than 1 Factors • The estimate of is 34.27 can be interpreted as: Increasing x by one unit, keeping other factors fixed, is estimated to give the odds of death a increase by a factor of exp(34.27) 1 is called the logistic regression coefficient for x1 adjusted for other factors 9 Confidence Intervals Based on Fisher information matrix and asymptotical results of mle • Wald C.I. for i : found by SAS • Wald C. I. for p(x): found by SAS PROC GENMOD with the OBSTATS option 10 Significance Tests • H0: 10 vs. H1: 1 is not zero • Wald test • LR test 11 Examining the Fit of the Logit Model • Plot fitted and observed rates on the same plot • Residuals for logit models 12 Grouped Data • Grouping data makes overall goodness of fit test sensible and possible • Example: Crab data grouped by the width – Ungrouped: deviance=191.7, df=165, p-value=.076 – Grouped: deviance=6.25, df=6, p-value=.40 13