Chapter 2. Logistic Regression (Part I)
Statistical Learning – Classification
Instructor: Yeying Zhu
Department of Statistics & Actuarial Science, University of Waterloo

Reference
• James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer, second edition: Chapter 4.3.
• STAT 431 course notes

Outline
• Logistic regression
• Interpretation & estimation
• Statistical inference
• Extension and case study

Background
• Logistic regression is appropriate for classification into 2 classes, e.g., presence or absence of an event or medical condition.
• It can be extended to the case of > 2 classes: multinomial logistic regression.
• The two classes will be coded as 0 (absence) and 1 (presence).
• Define the probability that the event occurs as P(Y = 1) = π. Then Y ∼ Bernoulli(π) and E(Y) = π.

Logistic Regression
• We would like to use a linear function to model π as a function of the predictors (X-variables). We define π(X) = Pr(Y = 1 | X) and would like to model a function of π(X) as
  f(π(X)) = β0 + β1 X1 + · · · + βp Xp
• The identity function, f(π) = π, is problematic. Why?
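To see the problem numerically before looking at the model: with an identity link, a linear function used directly as a probability can fall outside [0, 1]. The coefficient values below are made up purely for illustration.

```python
# Hypothetical identity-link "model": pi(x) = beta0 + beta1 * x.
beta0, beta1 = 0.2, 0.3
for x in [-2.0, 0.0, 1.0, 3.0]:
    pi = beta0 + beta1 * x
    print(x, pi)  # x = -2 gives -0.4 and x = 3 gives 1.1: not valid probabilities
```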
  Under the identity link, the model would be
  π(X) = β0 + β1 X1 + · · · + βp Xp,
  but the right-hand side is not constrained to lie in [0, 1].

Link Function
• The RHS of the model (the linear component) can take any value on the real line.
• We define f(·) so that f(π) also covers the real line; in a GLM, f is called the link function.
• One such function is the logit function:
  logit(π) = log[π / (1 − π)]

Odds
• Odds has a formal statistical definition: odds = π / (1 − π).
• If the odds of winning are 3:2, then the odds = 1.5.
• Conversely, if the odds are 1, what is π? (Setting π/(1 − π) = 1 gives π = 0.5.)

Link Functions
• Some link functions we might consider:
  Identity: f(π) = π
  Log-log: f(π) = log(−log(π))
  Complementary log-log: f(π) = log(−log(1 − π))
  Probit: f(π) = Φ⁻¹(π), where Φ is the cdf of a standard normal random variable
  Logit: f(π) = log[π / (1 − π)]
[Slide figure comparing the link functions omitted.]

Logistic Regression
• In logistic regression, we use the logit function as the link function:
  logit(π(X)) = log[π(X) / (1 − π(X))] = β0 + β1 X1 + · · · + βp Xp
  Equivalently,
  π(X) = e^(β0 + β1 X1 + · · · + βp Xp) / (1 + e^(β0 + β1 X1 + · · · + βp Xp))
• Unlike in linear regression, there is no error term.
• The parameters β are unknown, but the functional form (linearity on the logit scale) is known.

Interpretation of the Regression Coefficients
• β0: the log odds at the baseline (when x1 = · · · = xp = 0).
• βj for continuous Xj: the log odds ratio associated with a one-unit increase in Xj, controlling for the other predictors.
• βj for binary Xj: the log odds ratio comparing Xj = 1 versus Xj = 0, controlling for the other predictors.
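As a quick numerical companion to the logit link and the odds-ratio interpretation, here is a small sketch; the coefficient values are invented for illustration and are not from the notes.

```python
import math

def expit(eta):
    """Inverse logit: map a linear predictor on the real line to (0, 1)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """Log odds: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

# Odds of 1 correspond to pi = 0.5, and expit inverts logit.
print(expit(logit(0.5)))   # round-trip check: 0.5
print(0.5 / (1.0 - 0.5))   # odds = 1.0

# With hypothetical coefficients beta0 = -1, beta1 = 0.8, a one-unit increase
# in X1 multiplies the odds by exp(beta1), regardless of the value of X1.
beta0, beta1 = -1.0, 0.8
odds = lambda x1: expit(beta0 + beta1 * x1) / (1.0 - expit(beta0 + beta1 * x1))
print(odds(2.0) / odds(1.0))  # ≈ exp(0.8) ≈ 2.2255
```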
Notation
Let xij represent the value of the jth predictor, or input, for observation i, where i = 1, 2, . . . , n and j = 1, 2, . . . , p. Correspondingly, let yi represent the response value for the ith observation, either 0 or 1. Then our training data consist of (x1, y1), (x2, y2), . . . , (xn, yn).
• Explanatory variables: xi = (xi0, xi1, . . . , xip)ᵀ, with xi0 = 1
• Regression parameters: β = (β0, β1, . . . , βp)ᵀ

Estimation: Maximum Likelihood
• We aim to maximize the probability (likelihood) of observing the training data with respect to the unknown parameters.
• The likelihood can be written as
  L(β) = ∏_{i=1}^n [π(xi; β)]^{yi} [1 − π(xi; β)]^{1−yi}
• The log-likelihood can be written as
  ℓ(β) = Σ_{i=1}^n { yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)] }

Simplifying the Log-Likelihood
  ℓ(β) = Σ_{i=1}^n { yi log π(xi; β) + (1 − yi) log[1 − π(xi; β)] }
       = Σ_{i=1}^n { yi xiᵀβ − yi log(1 + e^(xiᵀβ)) + (1 − yi) log[1 / (1 + e^(xiᵀβ))] }
       = Σ_{i=1}^n { yi xiᵀβ − yi log(1 + e^(xiᵀβ)) − (1 − yi) log(1 + e^(xiᵀβ)) }
       = Σ_{i=1}^n { yi xiᵀβ − log(1 + e^(xiᵀβ)) }

Score Equations
• To find a maximum, take the derivative with respect to the vector β and set it to zero. The score function for β is
  S(β) = ∂ℓ(β)/∂β = Σ_{i=1}^n xi (yi − π(xi; β)) = 0,
  where π(xi; β) = e^(xiᵀβ) / (1 + e^(xiᵀβ)).
• These equations are non-linear in the β's.
• Use a numerical optimization technique such as Newton–Raphson to find an approximate solution.
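The Newton–Raphson idea mentioned in the last bullet can be sketched directly from the score equations. The following is a minimal NumPy illustration on simulated data; the sample size, random seed, and true coefficients are assumptions made for this example only.

```python
import numpy as np

def fit_logistic(X, y, tol=1e-8, max_iter=50):
    """Solve the score equations S(beta) = X^T (y - pi) = 0 by Newton-Raphson.

    X is n x (p+1) with a leading column of ones. Returns (beta_hat, cov),
    where cov = I(beta_hat)^{-1}, the inverse of the information matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # pi(x_i; beta)
        score = X.T @ (y - pi)                   # S(beta)
        w = pi * (1.0 - pi)
        info = X.T @ (X * w[:, None])            # I(beta) = sum x_i x_i^T pi (1 - pi)
        step = np.linalg.solve(info, score)
        beta = beta + step                       # beta_new = beta_old + I^{-1} S(beta_old)
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(info)

# Simulated data with (assumed) true beta = (-1, 2)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
pi_true = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = (rng.uniform(size=n) < pi_true).astype(float)

beta_hat, cov = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))
print("beta_hat:", beta_hat)   # should be near (-1, 2)
print("Wald 95% CI for beta_1:", beta_hat[1] - 1.96 * se[1], beta_hat[1] + 1.96 * se[1])
```

The same inverse information matrix also supplies the standard errors used for the Wald tests and confidence intervals discussed below.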
Newton–Raphson
• Update in the Newton–Raphson algorithm:
  β_new = β_old − (∂²ℓ(β)/∂β∂βᵀ)⁻¹ ∂ℓ(β)/∂β,
  where the second derivative is
  ∂²ℓ(β)/∂β∂βᵀ = − Σ_{i=1}^n xi xiᵀ π(xi; β)(1 − π(xi; β))
• The information matrix is I(β) = −∂²ℓ(β)/∂β∂βᵀ.
• Since β̂ is an MLE, asymptotically we have
  β̂ ∼ MVN(β, I⁻¹(β̂))

Newton–Raphson Algorithm for Finding the MLE
We wish to maximize the function ℓ(β) by solving S(β) = 0.
• Begin with an initial estimate β⁰.
• Iteratively obtain estimates β¹, β², β³, . . . using
  β^(i+1) = β^i + I⁻¹(β^i) S(β^i)
• Continue iterating until β^(i+1) ≈ β^i (i.e., |β^(i+1) − β^i| is within a specified tolerance).
• Then set β̂ = β^(i+1).
• To confirm that β̂ is a maximum of ℓ(β), check that I(β̂) > 0 (positive definite).

Prediction from Logistic Regression
• Sometimes we may be interested in the parameter π itself.
• In a logistic regression model, the fitted value for the probability of response is
  π̂i = π̂(xi) = P̂r(Y = 1 | X = xi) = e^(β̂0 + β̂1 xi1 + · · · + β̂p xip) / (1 + e^(β̂0 + β̂1 xi1 + · · · + β̂p xip))
• More generally, we can write
  π̂i = π̂(xi) = exp(xiᵀβ̂) / (1 + exp(xiᵀβ̂)) = expit(xiᵀβ̂)

• Sometimes we may be interested in predicting Y.
• If we use 0.5 as the threshold (the Bayes classifier), the decision rule is: if π̂i > 0.5, set Ŷi = 1; if π̂i < 0.5, set Ŷi = 0; if π̂i = 0.5, make a random draw.
• However, in some situations we may want to use a threshold different from 0.5.

Hypothesis Test for βk
• Sometimes we may test the significance of a single predictor, say Xk, k = 1, . . . , p.
• We may conduct the following test: H0: βk = βk0 versus Ha: βk ≠ βk0. Note: in most cases, βk0 = 0.
• Recall that β̂ ∼ MVN(β, I⁻¹(β̂)), approximately (asymptotically).
• The general Wald result for the scalar βk is, approximately,
  (β̂k − βk0) / se(β̂k) ∼ N(0, 1)
• We can find the p-value of the test using
  p ≈ 2 P(Z > |β̂k − βk0| / se(β̂k)), where Z ∼ N(0, 1)
  and se(β̂k) = {[I⁻¹(β̂)]kk}^(1/2) is the estimated standard error based on the inverse of the information matrix.
• The Wald-based 95% confidence interval for βk is β̂k ± 1.96 se(β̂k).

Confidence Intervals for πi
• Sometimes we may be interested in providing a 95% confidence interval for the probability of an event happening given the predictors.
• Step I: Use the fact that, approximately,
  xiᵀβ̂ ∼ N(xiᵀβ, xiᵀ I⁻¹(β̂) xi) and (xiᵀβ̂ − xiᵀβ) / sqrt(xiᵀ I⁻¹(β̂) xi) ∼ N(0, 1)
• Step II: An approximate 95% CI for ηi = xiᵀβ is then
  xiᵀβ̂ ± 1.96 sqrt(xiᵀ I⁻¹(β̂) xi) = (η̂L, η̂U)
• Step III: An approximate 95% CI for πi = exp(xiᵀβ) / (1 + exp(xiᵀβ)) is
  (exp(η̂L) / (1 + exp(η̂L)), exp(η̂U) / (1 + exp(η̂U)))

Multinomial Logistic Regression: Model
• We can extend the two-class logistic regression to the setting of K > 2 classes.
• Assume, for k = 1, . . . , K − 1,
  Pr(Y = k | X = x) = e^(βk0 + βk1 x1 + · · · + βkp xp) / (1 + Σ_{l=1}^{K−1} e^(βl0 + βl1 x1 + · · · + βlp xp)),
  and
  Pr(Y = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} e^(βl0 + βl1 x1 + · · · + βlp xp))
• Here, class K is treated as the reference group.

Multinomial Logistic Regression: Interpretation
• For k = 1, . . .
, K − 1, we can show:
  log[Pr(Y = k | X = x) / Pr(Y = K | X = x)] = βk0 + βk1 x1 + · · · + βkp xp
• Interpretation of βk0: the log odds of class k versus class K when x1 = · · · = xp = 0.
• Interpretation of βkj: the log odds ratio of class k versus class K associated with a one-unit increase in Xj when the other predictors are held constant.
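To make the multinomial model concrete, the sketch below computes the class probabilities for a hypothetical fitted model with K = 3 classes (all coefficient values are made up), and checks that the log odds of class k versus the reference class K recovers the linear predictor.

```python
import math

def multinomial_probs(x, betas):
    """Class probabilities with class K as the reference group.

    betas: list of K-1 coefficient vectors (beta_k0, beta_k1, ..., beta_kp);
    the reference class has its linear predictor fixed at 0.
    """
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference class K
    return probs

betas = [(0.5, -1.0), (-0.2, 0.3)]   # K - 1 = 2 classes vs. the reference
p = multinomial_probs([1.0], betas)
print(p, sum(p))                     # probabilities sum to 1

# Log odds of class 1 vs. class K equals its linear predictor:
print(math.log(p[0] / p[2]))         # = 0.5 - 1.0 * 1.0 = -0.5
```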