Statistical Learning–Classification
Chapter 2. Logistic Regression (Part I)
Instructor: Yeying Zhu
Department of Statistics & Actuarial Science
University of Waterloo
Reference
• James, G., Witten, D., Hastie, T., & Tibshirani, R. An Introduction to Statistical Learning: with Applications in R (2nd ed., 2021). Springer: Chapter 4.3
• STAT 431 course notes
Outline
• Logistic regression
• Interpretation & Estimation
• Statistical Inference
• Extension and Case Study
Background
• Appropriate for classification into 2 classes, e.g., presence or absence of an event/medical condition
• Can be extended to the case of > 2 classes: multinomial logistic regression
• The two classes will be coded as 0 (absence) and 1 (presence)
• Define the probability that the event occurs
P(Y = 1) = π
Then Y ∼ Bernoulli(π) and E (Y ) = π.
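As a quick check of E(Y) = π, a minimal Python sketch (NumPy; the value of π is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    pi = 0.3                                 # P(Y = 1), arbitrary choice
    y = rng.binomial(1, pi, size=100_000)    # draws of Y ~ Bernoulli(pi)
    print(y.mean())                          # sample mean approximates E(Y) = pi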
Logistic Regression
• We would like to use a linear function to model π as a function of the predictors (X-variables). We define π(X) = Pr(Y = 1|X) and would like to model a function of π(X) as

f (π(X)) = β0 + β1X1 + · · · + βpXp

• The identity function, f (π) = π, is problematic. Why?

π(X) = β0 + β1X1 + · · · + βpXp

(The right-hand side can take any real value, but a probability must lie in [0, 1].)
Link Function
• The RHS of the model (the linear component) can take values on the real line
• Define f (·) such that f (π) covers the real line; in a GLM, f is called the link function
• One such function is the logit function:

logit(π) = log(π/(1 − π))
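A minimal Python sketch (NumPy; the function names are ours) showing that the logit maps (0, 1) onto the whole real line and is inverted by the expit:

    import numpy as np

    def logit(p):
        return np.log(p / (1 - p))           # log odds

    def expit(eta):
        return 1 / (1 + np.exp(-eta))        # inverse of the logit

    p = np.array([0.001, 0.5, 0.999])
    print(logit(p))                          # approx [-6.91, 0.00, 6.91]: unbounded
    print(expit(logit(p)))                   # recovers [0.001, 0.5, 0.999]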
Odds Ratio
The odds have a formal statistical definition:

odds = π/(1 − π)

• If the odds of winning are 3:2, then odds = 1.5
• Conversely, if the odds equal 1, what is π? (Solving π/(1 − π) = 1 gives π = 0.5.)
Link Functions
• Some link functions we might consider:

Identity:                 f (π) = π
Log-log:                  f (π) = log(− log(π))
Complementary log-log:    f (π) = log(− log(1 − π))
Probit†:                  f (π) = Φ⁻¹(π)
Logit:                    f (π) = log(π/(1 − π))

† Φ is the cdf for a standard normal random variable.
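Each of these links can be evaluated directly; a sketch (NumPy and SciPy assumed, names ours):

    import numpy as np
    from scipy.stats import norm

    p = np.array([0.1, 0.5, 0.9])
    links = {
        "identity": p,
        "log-log": np.log(-np.log(p)),
        "complementary log-log": np.log(-np.log(1 - p)),
        "probit": norm.ppf(p),               # inverse standard normal cdf
        "logit": np.log(p / (1 - p)),
    }
    for name, value in links.items():
        print(name, np.round(value, 3))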
Link Functions

[Figure: plot of the candidate link functions f (π) over π ∈ (0, 1)]
Logistic Regression
• In logistic regression, we use the logit function as the link function:

log(π(X)/(1 − π(X))) = logit(π(X)) = β0 + β1X1 + · · · + βpXp

Equivalently,

π(X) = e^(β0 + β1X1 + ··· + βpXp) / (1 + e^(β0 + β1X1 + ··· + βpXp))
• Unlike in linear regression, there is no error term
• The parameters β’s are unknown, but the functional form
(linearity) is known.
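A small Python sketch (NumPy; the coefficient values are made up for illustration) showing the two equivalent forms of the model:

    import numpy as np

    beta = np.array([-1.0, 0.5, 2.0])        # (beta_0, beta_1, beta_2), illustrative only
    x = np.array([1.0, 3.0, -0.5])           # leading 1 corresponds to the intercept
    eta = x @ beta                           # beta_0 + beta_1 x_1 + beta_2 x_2 = -0.5
    pi = np.exp(eta) / (1 + np.exp(eta))     # pi(X), about 0.378
    print(np.log(pi / (1 - pi)))             # logit(pi(X)) recovers eta = -0.5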
Interpretation
Interpretation of the regression coefficients
• β0 : log odds at the baseline (when x1 = · · · = xp = 0)
• βj :
  • for continuous Xj : the log odds ratio associated with a one-unit increase in Xj , controlling for the other predictors
  • for binary Xj : the log odds ratio comparing Xj = 1 versus Xj = 0, controlling for the other predictors
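Exponentiating a coefficient converts the log odds ratio into an odds ratio. A one-line Python illustration (the coefficient value is hypothetical):

    import numpy as np

    beta_j = 0.4             # hypothetical fitted coefficient for X_j
    print(np.exp(beta_j))    # ~1.49: a one-unit increase in X_j multiplies the odds by ~1.49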
Notations
Let xij represent the value of the jth predictor, or input, for
observation i, where i = 1, 2, . . . , n and j = 1, 2, . . . , p.
Correspondingly, let yi represent the response value for the ith
observation, either 0 or 1. Then our training data consist of
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ).
• Explanatory variables: xi = (xi0 , xi1 , . . . , xip )T
• Regression parameters: β = (β0 , β1 , . . . , βp )T
Estimation: Maximum Likelihood Estimation
• We aim to maximize the probability (likelihood) of observing
the training data w.r.t. the unknown parameters.
• The likelihood can be written as:

L(β) = ∏_{i=1}^{n} [π(xi; β)]^{yi} [1 − π(xi; β)]^{1−yi}

• The log-likelihood can be written as:

ℓ(β) = ∑_{i=1}^{n} {yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)]}
Estimation
ℓ(β) = ∑_{i=1}^{n} {yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)]}

     = ∑_{i=1}^{n} {yi xiᵀβ − yi log(1 + e^{xiᵀβ}) + (1 − yi) log[1/(1 + e^{xiᵀβ})]}

     = ∑_{i=1}^{n} {yi xiᵀβ − yi log(1 + e^{xiᵀβ}) − (1 − yi) log(1 + e^{xiᵀβ})}

     = ∑_{i=1}^{n} {yi xiᵀβ − log(1 + e^{xiᵀβ})}
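The simplification can be verified numerically; a Python sketch (NumPy; simulated data, all names ours):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept column plus 2 predictors
    beta = np.array([0.5, -1.0, 2.0])
    eta = X @ beta                                 # x_i^T beta for each i
    pi = 1 / (1 + np.exp(-eta))                    # pi(x_i; beta)
    y = rng.binomial(1, pi)

    ll_original = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
    ll_simplified = np.sum(y * eta - np.log(1 + np.exp(eta)))
    print(np.isclose(ll_original, ll_simplified))  # True: the two forms agree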
Estimation
• To find a maximum, we take the derivative w.r.t. the vector β and set it to zero. The score function for β is:

S(β) = ∂ℓ(β)/∂β = ∑_{i=1}^{n} xi(yi − π(xi; β)) = 0

where π(xi; β) = e^{xiᵀβ}/(1 + e^{xiᵀβ})

• These are non-linear equations in the β's
• Use a numerical optimization technique such as Newton–Raphson to find an approximate solution.
Newton-Raphson
• Update in the Newton–Raphson algorithm:

β_new = β_old − (∂²ℓ(β)/∂β∂βᵀ)⁻¹ ∂ℓ(β)/∂β

where the second derivative is:

∂²ℓ(β)/∂β∂βᵀ = − ∑_{i=1}^{n} xixiᵀ π(xi; β)(1 − π(xi; β))

• The information matrix is:

I(β) = − ∂²ℓ(β)/∂β∂βᵀ

• Since β̂ is an MLE, asymptotically we have

β̂ ∼ MVN(β, I⁻¹(β̂))
Newton-Raphson
Newton–Raphson algorithm for finding the MLE
We wish to maximize the function ℓ(β) by solving S(β) = 0
• Begin with an initial estimate β^(0)
• Iteratively obtain estimates β^(1), β^(2), β^(3), . . . using

β^(i+1) = β^(i) + I⁻¹(β^(i)) S(β^(i))

• Iteration should continue until β^(i+1) ≈ β^(i) (i.e., |β^(i+1) − β^(i)| is within a specified tolerance)
• Then set β̂ = β^(i+1)
• To determine whether this is a maximum of ℓ(β), check that I(β̂) > 0 (i.e., I(β̂) is positive definite).
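Putting the score, the information matrix, and the update rule together, a self-contained Python sketch (NumPy; simulated data, and all names are ours rather than a library API):

    import numpy as np

    def newton_raphson_logistic(X, y, tol=1e-8, max_iter=25):
        """Iterate beta <- beta + I(beta)^{-1} S(beta) until convergence."""
        beta = np.zeros(X.shape[1])                 # initial estimate beta^(0)
        for _ in range(max_iter):
            pi = 1 / (1 + np.exp(-(X @ beta)))      # pi(x_i; beta)
            score = X.T @ (y - pi)                  # S(beta) = sum x_i (y_i - pi_i)
            info = X.T @ (X * (pi * (1 - pi))[:, None])  # I(beta) = sum x_i x_i^T pi_i (1 - pi_i)
            step = np.linalg.solve(info, score)     # I^{-1}(beta) S(beta)
            beta = beta + step
            if np.max(np.abs(step)) < tol:          # stop once updates are negligible
                break
        return beta, np.linalg.inv(info)            # MLE and I^{-1}, evaluated just before the final tiny step

    rng = np.random.default_rng(2)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([-0.5, 1.0, -2.0])
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))
    beta_hat, cov = newton_raphson_logistic(X, y)
    print(beta_hat)                                  # close to beta_true

The returned inverse information matrix is what supplies the standard errors used for the Wald inference later in the handout.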
Prediction from Logistic Regression
• Sometimes we may be interested in the parameter π itself
• In a logistic regression model, the fitted value for the probability of response π is

π̂i = π̂(xi) = P̂r(Y = 1|X = xi) = e^(β̂0 + β̂1xi1 + ··· + β̂pxip) / (1 + e^(β̂0 + β̂1xi1 + ··· + β̂pxip))

• More generally, we can write

π̂i = π̂(xi) = exp(xiᵀβ̂)/(1 + exp(xiᵀβ̂)) = expit(xiᵀβ̂)
Prediction from Logistic Regression
• Sometimes we may be interested in predicting Y
• If we use 0.5 as the threshold (Bayes Classifier), the decision
rule should be: if π̂i > 0.5, Ŷi = 1; if π̂i < 0.5, Ŷi = 0; if
π̂i = 0.5, we can make a random draw.
• However, in some situations, we may want to use a threshold
different from 0.5.
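The 0.5-threshold rule in Python (pi_hat is a hypothetical vector of fitted probabilities; ties at exactly 0.5 are sent to class 0 here rather than randomized):

    import numpy as np

    pi_hat = np.array([0.12, 0.55, 0.50, 0.91])   # hypothetical fitted probabilities
    y_hat = (pi_hat > 0.5).astype(int)            # strict inequality: 0.50 maps to class 0
    print(y_hat)                                  # [0 1 0 1]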
Hypothesis test for βk
• Sometimes, we may test the significance of a single predictor,
say Xk , k = 1, . . . , p
• We may conduct the following test:

H0 : βk = βk0   versus   Ha : βk ≠ βk0

Note: in most cases, βk0 = 0.
• Recall that β̂ ∼ MVN(β, I⁻¹(β̂)), approximately (asymptotically).
• The general Wald result for the scalar βk is, approximately,

(β̂k − βk0)/se(β̂k) ∼ N(0, 1)
Hypothesis test for βk
• We can find the p-value of the test using:

p ≈ 2 P[Z > |β̂k − βk0|/se(β̂k)],   where Z ∼ N(0, 1)

and se(β̂k) is the estimated standard error based on the inverse of the information matrix:

se(β̂k) = ([I⁻¹(β̂)]_{kk})^{1/2} = (I^{kk}(β̂))^{1/2}

• The Wald-based confidence interval for βk is:

β̂k ± 1.96 se(β̂k)
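A sketch of the Wald test and interval in Python (NumPy/SciPy; beta_hat and cov stand in for the MLE and the inverse information matrix, with hypothetical values):

    import numpy as np
    from scipy.stats import norm

    beta_hat = np.array([-0.48, 0.95, -1.90])   # hypothetical MLE
    cov = np.diag([0.010, 0.012, 0.020])        # hypothetical I^{-1}(beta_hat)

    k = 1                                       # test H0: beta_1 = 0
    se = np.sqrt(cov[k, k])                     # se(beta_hat_k) = [I^{-1}]_{kk}^{1/2}
    z = beta_hat[k] / se                        # Wald statistic (here beta_k0 = 0)
    p_value = 2 * norm.sf(abs(z))               # two-sided p-value
    ci = (beta_hat[k] - 1.96 * se, beta_hat[k] + 1.96 * se)
    print(z, p_value, ci)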
Confidence Intervals for πi
• Sometimes, we may be interested in providing a 95%
confidence interval for the probability of an event happening
given the predictors.
• Step I: We can use the fact that, approximately,

xiᵀβ̂ ∼ N(xiᵀβ, xiᵀI⁻¹(β̂)xi)

and

(xiᵀβ̂ − xiᵀβ) / √(xiᵀI⁻¹(β̂)xi) ∼ N(0, 1)
Confidence Intervals for πi
• Step II: An approximate 95% CI for ηi = xiᵀβ is then given by

xiᵀβ̂ ± 1.96 √(xiᵀI⁻¹(β̂)xi) = (η̂L, η̂U)

• Step III: An approximate 95% CI for πi = exp(xiᵀβ)/(1 + exp(xiᵀβ)) is

(exp{η̂L}/(1 + exp{η̂L}), exp{η̂U}/(1 + exp{η̂U}))
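The three steps in Python (NumPy; x_i, beta_hat, and cov are hypothetical stand-ins for the covariate vector, the MLE, and I⁻¹(β̂)):

    import numpy as np

    x_i = np.array([1.0, 0.8, -1.2])            # hypothetical covariates (leading 1 for the intercept)
    beta_hat = np.array([-0.48, 0.95, -1.90])   # hypothetical MLE
    cov = np.diag([0.010, 0.012, 0.020])        # hypothetical I^{-1}(beta_hat)

    eta_hat = x_i @ beta_hat                    # Step I: point estimate of eta_i = x_i^T beta
    se_eta = np.sqrt(x_i @ cov @ x_i)           # sqrt(x_i^T I^{-1}(beta_hat) x_i)
    eta_L, eta_U = eta_hat - 1.96 * se_eta, eta_hat + 1.96 * se_eta   # Step II
    ci_pi = (np.exp(eta_L) / (1 + np.exp(eta_L)),
             np.exp(eta_U) / (1 + np.exp(eta_U)))                     # Step III
    print(ci_pi)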
Multinomial Logistic Regression: Model
• We can extend the two-class logistic regression to the setting
of K > 2 classes.
• Assume for k = 1, . . . , K − 1,

Pr(Y = k|X = x) = e^(βk0 + βk1x1 + ··· + βkpxp) / (1 + ∑_{l=1}^{K−1} e^(βl0 + βl1x1 + ··· + βlpxp))

and

Pr(Y = K|X = x) = 1 / (1 + ∑_{l=1}^{K−1} e^(βl0 + βl1x1 + ··· + βlpxp))

Here, class K is treated as the reference group.
Multinomial Logistic Regression: Interpretation
• For k = 1, . . . , K − 1, we can show:

log[Pr(Y = k|X = x) / Pr(Y = K|X = x)] = βk0 + βk1x1 + · · · + βkpxp

• Interpretation of βk0 : log odds of class k versus class K at x1 = · · · = xp = 0
• Interpretation of βkj : log odds ratio of class k versus class K associated with a one-unit increase in Xj when the other predictors are held constant.
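A sketch of the multinomial class probabilities with class K as the reference (NumPy; K = 3 and the coefficients are hypothetical):

    import numpy as np

    B = np.array([[0.2, 1.0, -0.5],     # (beta_k0, beta_k1, beta_k2) for k = 1
                  [-0.3, 0.4, 0.8]])    # and k = 2; class K = 3 is the reference
    x = np.array([1.0, 0.5, -1.0])      # (1, x_1, x_2)

    eta = B @ x                         # linear predictors for k = 1, ..., K - 1
    denom = 1 + np.exp(eta).sum()
    probs = np.append(np.exp(eta) / denom, 1 / denom)   # Pr(Y = k | x) for k = 1, ..., K
    print(probs, probs.sum())           # the K probabilities sum to 1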