Logistic and Probit Regression

Logistic Regression
Chongming Yang
Research Support Center
FHSS College
Rules of Logarithm

Log (uv) = Log (u) + Log (v)

Log (u/v) = Log (u) - Log (v)

Log (u^v) = v Log (u)
Rules of Exponentiation
(0<a<1)

a^m a^n = a^(m+n)

a^m / a^n = a^(m-n)

(a^m)^n = a^(mn)
Exponential & Logarithmic

Inverse of One Another

y = a^x

x = Log_a(y)
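The rules on these slides can be checked numerically. The sketch below uses illustrative values for u, v, a, m, n, and x (all assumptions, not from the slides) and only the standard `math` module.

```python
import math

# Illustrative values (assumptions); a satisfies 0 < a < 1
u, v, a = 8.0, 2.0, 0.5
m, n = 3.0, 2.0

# Rules of logarithm
assert math.isclose(math.log(u * v), math.log(u) + math.log(v))
assert math.isclose(math.log(u / v), math.log(u) - math.log(v))
assert math.isclose(math.log(u ** v), v * math.log(u))

# Rules of exponentiation
assert math.isclose(a ** m * a ** n, a ** (m + n))
assert math.isclose(a ** m / a ** n, a ** (m - n))
assert math.isclose((a ** m) ** n, a ** (m * n))

# Exponential and logarithm with the same base undo each other
x = 3.0
y = a ** x
x_back = math.log(y, a)   # logarithm of y in base a
print(x, y, x_back)
```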
Assumptions of Linear Regression






Yi = α + βXi + εi
Yi continuous & unbounded
Expected value (mean) of εi = 0
εi normally distributed
εi not correlated with predictors
Absence of perfect multicollinearity
No measurement error in all variables
Violation of LR Assumptions

Dichotomous Dependent Variable (DV)

Unordered Categorical (Nominal) DV

Ordered Categorical (Ordinal) DV
Natural Logarithmic Transformation
(Binary DV)

Let p = probability of an event
Logit Model
Rearranged Logit Model
Logistic Model
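For reference, the three forms named in these slide titles can be written out in standard notation. This is a sketch for a single predictor X; the intercept α and slope β are symbols assumed here.

```latex
\begin{align*}
\text{Logit model:} \quad & \ln\left(\frac{p}{1-p}\right) = \alpha + \beta X \\
\text{Rearranged logit model:} \quad & \frac{p}{1-p} = e^{\alpha + \beta X} \\
\text{Logistic model:} \quad & p = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}}
  = \frac{1}{1 + e^{-(\alpha + \beta X)}}
\end{align*}
```

Each line follows from the previous one: exponentiating the logit gives the odds, and solving the odds for p gives the logistic curve.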
Odds Ratio
$$OR = \frac{p(1) / [1 - p(1)]}{p(0) / [1 - p(0)]} = e^{B}$$
Interpretation of Coefficients
(odds ratio)

Dichotomous predictor X1:



The predicted odds of a positive response for group A
are ? times the odds for group B.
The odds of a positive response for group A are ?%
higher than the odds for group B.
Continuous predictor X2:

A one-unit increase in X2 is associated with a ?% increase in
the predicted odds of a positive response.
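The "?" placeholders above are filled in from the estimated coefficient B. A minimal sketch, assuming hypothetical coefficient values (`b_group` and `b_continuous` are made up for illustration):

```python
import math

def odds_ratio(b):
    """Odds ratio implied by a logit coefficient b (e to the power b)."""
    return math.exp(b)

def percent_change_in_odds(b):
    """Percent change in the predicted odds per one-unit increase."""
    return 100.0 * (math.exp(b) - 1.0)

# Hypothetical coefficients, not taken from any real output
b_group = 0.80        # dichotomous predictor X1 (group A = 1, group B = 0)
b_continuous = 0.05   # continuous predictor X2

print(f"Odds for group A are {odds_ratio(b_group):.2f} times the odds for group B")
print(f"That is {percent_change_in_odds(b_group):.1f}% higher")
print(f"One unit of X2: {percent_change_in_odds(b_continuous):.1f}% increase in odds")
```

A negative coefficient gives an odds ratio below 1 and a negative percent change, which reads as a decrease in the odds.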
Interpretation

See Handout
Interpretation of Interaction

Definition:


The effect of a covariate depends on the level of
another covariate.
Interpretation:



Plug in some values of two variables
Plot estimated logit
Interpret the interaction effect only when
the main effects are present
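The "plug in some values" step can be sketched as follows. The model form (two covariates plus their product) and every coefficient value are assumptions chosen for illustration; the printed rows are the values one would plot.

```python
def estimated_logit(x1, x2, b0, b1, b2, b3):
    """Estimated logit with main effects b1, b2 and interaction b3."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_of_x1(x2, b1, b3):
    """Effect of x1 on the logit at a given level of x2."""
    return b1 + b3 * x2

# Hypothetical coefficients
coef = dict(b0=-1.0, b1=0.5, b2=0.8, b3=-0.3)

# One row of estimated logits per level of x2, across several values of x1
for x2 in (0, 1):
    row = [round(estimated_logit(x1, x2, **coef), 2) for x1 in (0, 1, 2, 3)]
    print(f"x2={x2}: logits={row}, slope of x1={slope_of_x1(x2, coef['b1'], coef['b3']):.2f}")
```

The two rows have different slopes, which is what "the effect of a covariate depends on the level of another covariate" means on the logit scale.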
Likelihood at value of X
(left side of equation)
$$L = \prod_{i=1}^{n} \left( \frac{p_i}{1 - p_i} \right)^{y_i} (1 - p_i)$$
Log Likelihood
(left side of equation)
Log Logit Model
(right side of equation)
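The two slide titles above correspond to the two ways of writing the log of the likelihood $L = \prod_i p_i^{y_i}(1-p_i)^{1-y_i}$. A sketch of the derivation, assuming a single predictor with $\log[p_i/(1-p_i)] = \alpha + \beta x_i$ (α and β are notation assumed here):

```latex
\begin{align*}
\log L &= \sum_{i=1}^{n} y_i \log\left(\frac{p_i}{1-p_i}\right)
          + \sum_{i=1}^{n} \log(1-p_i)
          && \text{(left side: in terms of } p_i\text{)} \\
       &= \sum_{i=1}^{n} y_i(\alpha + \beta x_i)
          - \sum_{i=1}^{n} \log\left(1 + e^{\alpha + \beta x_i}\right)
          && \text{(right side: in terms of the logit)}
\end{align*}
```

The second line uses $1 - p_i = 1/(1 + e^{\alpha + \beta x_i})$, which follows from the logistic model.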
Maximum Likelihood Estimation
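Maximum likelihood picks the coefficients that make the log likelihood as large as possible. The sketch below uses Newton-Raphson for a one-predictor model; the algorithm choice, the function name `fit_logistic`, and the toy data are all assumptions for illustration, and statistical packages may use different numerical details.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for logit(p) = a + b*x. Returns (a, b, log_likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score: first derivatives of the log likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative second derivatives), 2x2 symmetric
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system directly
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    log_lik = sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
                  for yi, pi in zip(y, p))
    return a, b, log_lik

# Toy data (made up); the outcome is not perfectly separated by x
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat, ll = fit_logistic(x_demo, y_demo)
print(round(a_hat, 3), round(b_hat, 3), round(ll, 3))
```

If the outcome were perfectly separated by x, the estimates would grow without bound, which is the "extremely large parameter estimates" problem that shows up in practice with sparse data.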
Likelihood Ratio Test of β0, β1…

Likelihood Ratio Test =
Deviance = -2log (likelihood of fitted model /
likelihood of Saturated model)

likelihood of Saturated model=1

Deviance = -2log (likelihood of fitted model)

χ² Test of β0, β1…
1. χ² = -2 Ln[(likelihood of model without x) /
(likelihood of model with x)]
2. Degrees of Freedom = j - (p+1)
where j = (# of categories) + (# of continuous variables),
p = # of parameters
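A minimal sketch of the test for a single added predictor x, working from log likelihoods. It assumes one degree of freedom (one extra parameter), for which the chi-square tail probability can be computed with `math.erfc`; the log-likelihood values in the usage line are hypothetical.

```python
import math

def likelihood_ratio_test(loglik_without_x, loglik_with_x):
    """Return (chi-square, p-value) for one added parameter (df = 1 assumed)."""
    # -2 Ln(L_without / L_with) = -2 (logL_without - logL_with)
    chi_sq = max(-2.0 * (loglik_without_x - loglik_with_x), 0.0)
    # For df = 1, the upper tail of chi-square equals erfc(sqrt(chi_sq / 2))
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# Hypothetical log likelihoods of two nested models
chi_sq, p_value = likelihood_ratio_test(-250.0, -245.5)
print(f"chi-square = {chi_sq:.2f}, p = {p_value:.4f}")
```

For more than one added parameter the chi-square distribution with the matching degrees of freedom is needed, which the standard library does not provide.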
Hosmer-Lemeshow Test (χ²)
(grouping by percentile of estimated p)
$$\hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{p}_k)^2}{n'_k \bar{p}_k (1 - \bar{p}_k)}$$
Where
$$o_k = \sum_{j=1}^{c_k} y_j, \qquad \bar{p}_k = \sum_{j=1}^{c_k} \frac{m_j \hat{p}_j}{n'_k}$$
g = 10, k = 1..10, n'_k = number of subjects in kth group, c_k = # of covariate
patterns, p̄_k = average estimated probability, df = g-2
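Observed vs. estimated frequencies by group (cell counts N1, N2, N3, N4, …):

        Group 1 (10% prob.)    Group 2 (20% prob.)    …    Group 10 (100% prob.)
y=1     Observed, Estimated    Observed, Estimated    …    Observed, Estimated
y=0     Observed, Estimated    Observed, Estimated    …    Observed, Estimated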
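The grouping and the statistic can be sketched in a few lines. This version assumes one record per subject, equal-size groups formed by rank of the estimated probability, at least g subjects, and group averages strictly between 0 and 1; the function name and the toy data are illustrative.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Return (C_hat, df) using g equal-size groups ordered by estimated p."""
    pairs = sorted(zip(p_hat, y))      # order subjects by estimated probability
    n = len(pairs)
    c_hat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        n_k = len(group)                               # subjects in group k
        o_k = sum(yi for _, yi in group)               # observed events (y = 1)
        p_bar = sum(pi for pi, _ in group) / n_k       # average estimated p
        c_hat += (o_k - n_k * p_bar) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return c_hat, g - 2

# Toy data (made up): two groups, estimated p of 0.25 and 0.75
y_toy = [0, 0, 0, 1, 0, 1, 1, 1]
p_toy = [0.25] * 4 + [0.75] * 4
print(hosmer_lemeshow(y_toy, p_toy, g=2))
```

A small statistic means observed and estimated counts agree within groups; it is compared with a chi-square distribution on g-2 degrees of freedom.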
Wald Test of β0, β1…

W = β / se(β)

Normal Distribution test
(se = standard error)
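A minimal sketch of the Wald test, treating W as a standard normal statistic and reporting a two-sided p-value. The coefficient and standard error in the usage line are hypothetical.

```python
import math

def wald_test(beta, se):
    """Return (W, two-sided p-value) with W = beta / se compared to N(0, 1)."""
    w = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|w|)) = erfc(|w| / sqrt(2))
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value

# Hypothetical estimate and standard error
w, p = wald_test(0.804, 0.332)
print(f"W = {w:.3f}, p = {p:.4f}")
```

Some packages report W squared and compare it with a chi-square distribution on one degree of freedom, which gives the same p-value.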
Multinomial Logistic Regression
(non-ordered categorical DV)


P = probability of a response category
Pi1 + Pi2 + Pi3 = 1
$$\log\left(\frac{p_{i1}}{p_{i3}}\right) = B_1 X$$
$$\log\left(\frac{p_{i2}}{p_{i3}}\right) = B_2 X$$
$$\log\left(\frac{p_{i1}}{p_{i2}}\right) = B_3 X$$
Multinomial Logistic Regression
$$p(i = K) = \frac{1}{1 + \sum_{k=1}^{K-1} e^{B_k x}}$$
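The category probabilities implied by the multinomial model can be computed directly. This sketch assumes a single predictor x, no intercepts, and the last category as the reference; the coefficient values are hypothetical.

```python
import math

def multinomial_probs(x, coefs):
    """Probabilities for K categories given K-1 coefficients B_k.

    The reference category comes last and has probability 1 / (1 + sum of e^(B_k x)).
    """
    exps = [math.exp(b * x) for b in coefs]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical coefficients for categories 1 and 2 versus reference category 3
probs = multinomial_probs(2.0, [0.5, -0.25])
print([round(p, 3) for p in probs], round(sum(probs), 3))
print(round(math.log(probs[0] / probs[2]), 3))   # recovers B_1 * x
```

The log of any non-reference probability over the reference probability returns B_k x, matching the pairwise logit equations.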
Interpretation

See handout
Ordinal Logistic Models

Adjacent Category Model

Compare two adjacent categories
Adjacent Categories Model

Let j be an ordinal scale



j = 1…
j & j+1 = two adjacent categories
Model
$$\log\left(\frac{p_{ij}}{p_{i,j+1}}\right) = a_j + B_j x_j$$

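To see what the adjacent categories model implies for the category probabilities, the adjacent logits can be chained together. This is a sketch assuming a single predictor value x and hypothetical values for the a_j and B_j terms.

```python
import math

def adjacent_category_probs(x, a, b):
    """Category probabilities from adjacent logits.

    a[j], b[j] give log(p_j / p_{j+1}) = a_j + B_j * x for j = 1..J-1.
    Returns the J probabilities in category order.
    """
    weights = [1.0]                       # unnormalized weight of the last category
    for a_j, b_j in zip(reversed(a), reversed(b)):
        # p_j = p_{j+1} * exp(a_j + B_j x), built from the last category backwards
        weights.append(weights[-1] * math.exp(a_j + b_j * x))
    weights.reverse()
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for a 3-category ordinal scale
probs = adjacent_category_probs(1.0, a=[0.2, -0.4], b=[0.5, 0.5])
print([round(p, 3) for p in probs])
```

Each adjacent pair of probabilities reproduces its own logit, and the probabilities sum to 1.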
Practice

Run Logistic Regression Using ‘binary.sav’

DV = Admit

IV = gre, gpa, rank

Annotated output:
http://www.ats.ucla.edu/stat/spss/dae/logit.htm
Pseudo R-squared
(based on Likelihood)



Explained Variability
Improvement from null model to fitted
model
Square of correlation (predicted and
observed)
Pseudo R-Squared

Cox & Snell

Improvement of full model over intercept model

Nagelkerke

Improvement of full model over intercept model

McFadden

Analogous to adjusted R-squared in OLS,
penalizing a model with too many predictors
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm
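The three measures can be computed from the log likelihoods of the intercept-only and full models. The sketch below uses the commonly cited formulas; the function name, the argument `k` (number of predictors, used only for the adjusted McFadden version), and the input values are assumptions for illustration.

```python
import math

def pseudo_r_squared(ll_null, ll_full, n, k=0):
    """Pseudo R-squared measures from log likelihoods.

    ll_null: log likelihood of the intercept-only model
    ll_full: log likelihood of the fitted model
    n: sample size, k: number of predictors (for the adjusted McFadden version)
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox & Snell
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,        # rescaled to reach 1
        "mcfadden": 1.0 - ll_full / ll_null,
        "mcfadden_adjusted": 1.0 - (ll_full - k) / ll_null,
    }

# Hypothetical log likelihoods and sample size
for name, value in pseudo_r_squared(-100.0, -80.0, n=200, k=3).items():
    print(f"{name}: {value:.3f}")
```

All four are zero when the full model does no better than the intercept-only model; the adjusted McFadden value can go negative when added predictors contribute little.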
Practice
(continued)

Run Multinomial Logistic Regression Using
‘mlogit.sav’

DV= Brand

IV = female, age

Annotated output:
http://www.ats.ucla.edu/stat/spss/dae/mlogit.htm
Practice
(continued)

Run Ordinal Logistic Regression Using ologit.sav

DV= admit

IV = gre, gpa, topnotch

Annotated output:
http://www.ats.ucla.edu/stat/SPSS/dae/ologit.htm
Practical Issues
1. Low Ratio of Cases to Variables

Problem:


Extremely large parameter estimates and standard
errors
Solution:
Collapse categories
Delete the offending category
Delete discrete predictors

Practical Issues
2. Inadequacy of Expected Frequencies &
Power

Problems:


Lower power with small frequency cells
Solution:



Accept low power
Collapse categories or delete discrete predictors
Evaluate model fit with χ²
Practical Issues
3. Presence of multicollinearity

Problem:


Large standard errors or parameter estimates
Solution:
Run multiway frequency tables to identify
collinear categorical variables
Run correlations to identify collinear continuous variables
Delete theoretically less important predictors or
combine with other procedures

Practical Issues

Rare events may be better suited to
Poisson regression or negative binomial
regression.
References
1. Allison, P. D. Logistic regression using the SAS system. Cary, NC: SAS Institute, Inc.
2. Hosmer, D. W. & Lemeshow, S. (2000). Applied logistic regression. New York: John Wiley & Sons, Inc.
3. Menard, S. (1994). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications, Inc.
4. Liao, T. F. (1994). Interpreting probability models: Logit, probit, and other generalized linear models. Thousand Oaks, CA: Sage Publications, Inc.
5. Long, J. S. & Freese, J. (2006). Regression models for categorical dependent variables using Stata. College Station, TX: Stata Press.