Chapter 2. Logistic Regression (Part I)
Statistical Learning – Classification
Instructor: Yeying Zhu
Department of Statistics & Actuarial Science, University of Waterloo

Reference
• James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer, second edition: Chapter 4.3.
• STAT 431 course notes

Outline
• Logistic regression
• Interpretation & estimation
• Statistical inference
• Extension and case study

Background
• Logistic regression is appropriate for classification into 2 classes, e.g., presence or absence of an event or medical condition.
• It can be extended to the case of > 2 classes: multinomial logistic regression.
• The two classes will be coded as 0 (absence) and 1 (presence).
• Define the probability that the event occurs as P(Y = 1) = π. Then Y ∼ Bernoulli(π) and E(Y) = π.

Logistic Regression
• We would like to use a linear function to model π as a function of the predictors (X-variables). We define π(X) = Pr(Y = 1 | X) and would like to model a function of π(X) as
  f(π(X)) = β0 + β1 X1 + · · · + βp Xp
• The identity function, f(π) = π, is problematic. Why?
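To see the problem numerically before looking at the model: with an identity link, a linear function used directly as a probability can fall outside [0, 1]. The coefficient values below are made up purely for illustration.

```python
# Hypothetical identity-link "model": pi(x) = beta0 + beta1 * x.
beta0, beta1 = 0.2, 0.3
for x in [-2.0, 0.0, 1.0, 3.0]:
    pi = beta0 + beta1 * x
    print(x, pi)  # x = -2 gives -0.4 and x = 3 gives 1.1: not valid probabilities
```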
  Under the identity link, the model would be
  π(X) = β0 + β1 X1 + · · · + βp Xp,
  but the right-hand side is not constrained to lie in [0, 1].

Link Function
• The RHS of the model (the linear component) can take any value on the real line.
• We define f(·) so that f(π) also covers the real line; in a GLM, f is called the link function.
• One such function is the logit function:
  logit(π) = log[π / (1 − π)]

Odds
• Odds has a formal statistical definition: odds = π / (1 − π).
• If the odds of winning are 3:2, then the odds = 1.5.
• Conversely, if the odds are 1, what is π? (Setting π/(1 − π) = 1 gives π = 0.5.)

Link Functions
• Some link functions we might consider:
  Identity: f(π) = π
  Log-log: f(π) = log(−log(π))
  Complementary log-log: f(π) = log(−log(1 − π))
  Probit: f(π) = Φ⁻¹(π), where Φ is the cdf of a standard normal random variable
  Logit: f(π) = log[π / (1 − π)]
[Slide figure comparing the link functions omitted.]

Logistic Regression
• In logistic regression, we use the logit function as the link function:
  logit(π(X)) = log[π(X) / (1 − π(X))] = β0 + β1 X1 + · · · + βp Xp
  Equivalently,
  π(X) = e^(β0 + β1 X1 + · · · + βp Xp) / (1 + e^(β0 + β1 X1 + · · · + βp Xp))
• Unlike in linear regression, there is no error term.
• The parameters β are unknown, but the functional form (linearity on the logit scale) is known.

Interpretation of the Regression Coefficients
• β0: the log odds at the baseline (when x1 = · · · = xp = 0).
• βj for continuous Xj: the log odds ratio associated with a one-unit increase in Xj, controlling for the other predictors.
• βj for binary Xj: the log odds ratio comparing Xj = 1 versus Xj = 0, controlling for the other predictors.
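As a quick numerical companion to the logit link and the odds-ratio interpretation, here is a small sketch; the coefficient values are invented for illustration and are not from the notes.

```python
import math

def expit(eta):
    """Inverse logit: map a linear predictor on the real line to (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """Log odds: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

# Odds of 1 correspond to pi = 0.5, and expit inverts logit.
print(expit(logit(0.5)))   # round-trip check: 0.5
print(0.5 / (1.0 - 0.5))   # odds = 1.0

# With hypothetical coefficients beta0 = -1, beta1 = 0.8, a one-unit increase
# in X1 multiplies the odds by exp(beta1), regardless of the value of X1.
beta0, beta1 = -1.0, 0.8
odds = lambda x1: expit(beta0 + beta1 * x1) / (1.0 - expit(beta0 + beta1 * x1))
print(odds(2.0) / odds(1.0))  # ≈ exp(0.8) ≈ 2.2255
```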
Notation
Let xij represent the value of the jth predictor, or input, for observation i, where i = 1, 2, . . . , n and j = 1, 2, . . . , p. Correspondingly, let yi represent the response value for the ith observation, either 0 or 1. Then our training data consist of (x1, y1), (x2, y2), . . . , (xn, yn).
• Explanatory variables: xi = (xi0, xi1, . . . , xip)ᵀ, with xi0 = 1
• Regression parameters: β = (β0, β1, . . . , βp)ᵀ

Estimation: Maximum Likelihood
• We aim to maximize the probability (likelihood) of observing the training data with respect to the unknown parameters.
• The likelihood can be written as
  L(β) = ∏_{i=1}^n [π(xi; β)]^{yi} [1 − π(xi; β)]^{1−yi}
• The log-likelihood can be written as
  ℓ(β) = Σ_{i=1}^n { yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)] }

Simplifying the Log-Likelihood
  ℓ(β) = Σ_{i=1}^n { yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)] }
       = Σ_{i=1}^n { yi xiᵀβ − yi log(1 + e^(xiᵀβ)) + (1 − yi) log[1 / (1 + e^(xiᵀβ))] }
       = Σ_{i=1}^n { yi xiᵀβ − yi log(1 + e^(xiᵀβ)) − (1 − yi) log(1 + e^(xiᵀβ)) }
       = Σ_{i=1}^n { yi xiᵀβ − log(1 + e^(xiᵀβ)) }

Score Equations
• To find a maximum, take the derivative with respect to the vector β and set it to zero. The score function for β is
  S(β) = ∂ℓ(β)/∂β = Σ_{i=1}^n xi (yi − π(xi; β)) = 0,
  where π(xi; β) = e^(xiᵀβ) / (1 + e^(xiᵀβ)).
• These equations are non-linear in the β's.
• Use a numerical optimization technique such as Newton–Raphson to find an approximate solution.
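The Newton–Raphson idea mentioned in the last bullet can be sketched directly from the score equations. The following is a minimal NumPy illustration on simulated data; the sample size, random seed, and true coefficients are assumptions made for this example only.

```python
import numpy as np

def fit_logistic(X, y, tol=1e-8, max_iter=50):
    """Solve the score equations S(beta) = X^T (y - pi) = 0 by Newton-Raphson.

    X is n x (p+1) with a leading column of ones. Returns (beta_hat, cov),
    where cov = I(beta_hat)^{-1}, the inverse of the information matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # pi(x_i; beta)
        score = X.T @ (y - pi)                   # S(beta)
        w = pi * (1.0 - pi)
        info = X.T @ (X * w[:, None])            # I(beta) = sum x_i x_i^T pi (1 - pi)
        step = np.linalg.solve(info, score)
        beta = beta + step                       # beta_new = beta_old + I^{-1} S(beta_old)
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(info)

# Simulated data with (assumed) true beta = (-1, 2)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
pi_true = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = (rng.uniform(size=n) < pi_true).astype(float)

beta_hat, cov = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))
print("beta_hat:", beta_hat)   # should be near (-1, 2)
print("Wald 95% CI for beta_1:", beta_hat[1] - 1.96 * se[1], beta_hat[1] + 1.96 * se[1])
```

The same inverse information matrix also supplies the standard errors used for the Wald tests and confidence intervals discussed below.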
Newton–Raphson
• Update in the Newton–Raphson algorithm:
  β_new = β_old − (∂²ℓ(β)/∂β∂βᵀ)⁻¹ ∂ℓ(β)/∂β,
  where the second derivative is
  ∂²ℓ(β)/∂β∂βᵀ = − Σ_{i=1}^n xi xiᵀ π(xi; β)(1 − π(xi; β))
• The information matrix is I(β) = −∂²ℓ(β)/∂β∂βᵀ.
• Since β̂ is an MLE, asymptotically we have
  β̂ ∼ MVN(β, I⁻¹(β̂))

Newton–Raphson Algorithm for Finding the MLE
We wish to maximize the function ℓ(β) by solving S(β) = 0.
• Begin with an initial estimate β⁰.
• Iteratively obtain estimates β¹, β², β³, . . . using
  β^(i+1) = β^i + I⁻¹(β^i) S(β^i)
• Continue iterating until β^(i+1) ≈ β^i (i.e., |β^(i+1) − β^i| is within a specified tolerance).
• Then set β̂ = β^(i+1).
• To confirm that β̂ is a maximum of ℓ(β), check that I(β̂) > 0 (positive definite).

Prediction from Logistic Regression
• Sometimes we may be interested in the parameter π itself.
• In a logistic regression model, the fitted value for the probability of response is
  π̂i = π̂(xi) = P̂r(Y = 1 | X = xi) = e^(β̂0 + β̂1 xi1 + · · · + β̂p xip) / (1 + e^(β̂0 + β̂1 xi1 + · · · + β̂p xip))
• More generally, we can write
  π̂i = π̂(xi) = exp(xiᵀβ̂) / (1 + exp(xiᵀβ̂)) = expit(xiᵀβ̂)

• Sometimes we may be interested in predicting Y.
• If we use 0.5 as the threshold (the Bayes classifier), the decision rule is: if π̂i > 0.5, set Ŷi = 1; if π̂i < 0.5, set Ŷi = 0; if π̂i = 0.5, make a random draw.
• However, in some situations we may want to use a threshold different from 0.5.

Hypothesis Test for βk
• Sometimes we may test the significance of a single predictor, say Xk, k = 1, . . . , p.
• We may conduct the following test: H0: βk = βk0 versus Ha: βk ≠ βk0. Note: in most cases, βk0 = 0.
• Recall that β̂ ∼ MVN(β, I⁻¹(β̂)), approximately (asymptotically).
• The general Wald result for the scalar βk is, approximately,
  (β̂k − βk0) / se(β̂k) ∼ N(0, 1)
• We can find the p-value of the test using
  p ≈ 2 P(Z > |β̂k − βk0| / se(β̂k)), where Z ∼ N(0, 1)
  and se(β̂k) = {[I⁻¹(β̂)]kk}^(1/2) is the estimated standard error based on the inverse of the information matrix.
• The Wald-based 95% confidence interval for βk is β̂k ± 1.96 se(β̂k).

Confidence Intervals for πi
• Sometimes we may be interested in providing a 95% confidence interval for the probability of an event happening given the predictors.
• Step I: Use the fact that, approximately,
  xiᵀβ̂ ∼ N(xiᵀβ, xiᵀ I⁻¹(β̂) xi) and (xiᵀβ̂ − xiᵀβ) / sqrt(xiᵀ I⁻¹(β̂) xi) ∼ N(0, 1)
• Step II: An approximate 95% CI for ηi = xiᵀβ is then
  xiᵀβ̂ ± 1.96 sqrt(xiᵀ I⁻¹(β̂) xi) = (η̂L, η̂U)
• Step III: An approximate 95% CI for πi = exp(xiᵀβ) / (1 + exp(xiᵀβ)) is
  (exp(η̂L) / (1 + exp(η̂L)), exp(η̂U) / (1 + exp(η̂U)))

Multinomial Logistic Regression: Model
• We can extend the two-class logistic regression to the setting of K > 2 classes.
• Assume, for k = 1, . . . , K − 1,
  Pr(Y = k | X = x) = e^(βk0 + βk1 x1 + · · · + βkp xp) / (1 + Σ_{l=1}^{K−1} e^(βl0 + βl1 x1 + · · · + βlp xp)),
  and
  Pr(Y = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} e^(βl0 + βl1 x1 + · · · + βlp xp))
• Here, class K is treated as the reference group.

Multinomial Logistic Regression: Interpretation
• For k = 1, . . .
, K − 1, we can show:
  log[Pr(Y = k | X = x) / Pr(Y = K | X = x)] = βk0 + βk1 x1 + · · · + βkp xp
• Interpretation of βk0: the log odds of class k versus class K when x1 = · · · = xp = 0.
• Interpretation of βkj: the log odds ratio of class k versus class K associated with a one-unit increase in Xj when the other predictors are held constant.
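To make the multinomial model concrete, the sketch below computes the class probabilities for a hypothetical fitted model with K = 3 classes (all coefficient values are made up), and checks that the log odds of class k versus the reference class K recovers the linear predictor.

```python
import math

def multinomial_probs(x, betas):
    """Class probabilities with class K as the reference group.

    betas: list of K-1 coefficient vectors (beta_k0, beta_k1, ..., beta_kp);
    the reference class has its linear predictor fixed at 0.
    """
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference class K
    return probs

betas = [(0.5, -1.0), (-0.2, 0.3)]   # K - 1 = 2 classes vs. the reference
p = multinomial_probs([1.0], betas)
print(p, sum(p))                     # probabilities sum to 1

# Log odds of class 1 vs. class K equals its linear predictor:
print(math.log(p[0] / p[2]))         # = 0.5 - 1.0 * 1.0 = -0.5
```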