Logistic regression models
Example 1: do dogs look like their owners?

▶ Some people believe that dogs look like their owners. Is this true?

▶ To test this hypothesis, The New York Times conducted an online quiz. A group of dogs and their owners were photographed by Fred Conrad.

▶ For each dog, four possible owners are shown in the quiz. Choose the owner for each dog: http://www.nytimes.com/interactive/2015/02/16/sports/westminster-dog-show-quiz.html?_r=0
Example 2: breast cancer data set

▶ Consider the data set collected by Richardson et al. (2006) as an example. The study aims to find genes that are associated with sporadic basal-like cancers (BLC), a distinct class of human breast cancers.

▶ In this example, the response variable Yi is the type of breast cancer. For instance, we use Yi = 0 to represent the non-BLC type and Yi = 1 to represent the BLC type.

▶ The predictors in this example are the gene expression data or the SNP data. For example, we could consider the gene CSF2RA as one of the candidate genes.
Logistic regression models

▶ Consider Yi to be a Bernoulli distributed response. For example, Yi could be failure or success, or could indicate different treatment groups. Assume Yi ∼ Bernoulli(pi) and Yi is associated with the covariates Xi.

▶ Model the conditional expectation of Yi. Recall that, in linear models, we assume E(Yi | Xi) = XiT β, and in non-linear models, E(Yi | Xi) = f(Xi; β).

▶ In the logistic regression model, we assume that E(Yi | Xi) = pi depends on Xi. Namely, E(Yi | Xi) = p(Xi) with 0 ≤ p(Xi) ≤ 1.
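To make the setup concrete, here is a small simulation sketch (the design matrix, coefficients, sample size, and seed are all made up for illustration; assumes NumPy) that draws Bernoulli responses whose conditional mean p(Xi) follows the logit form discussed on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, intercept plus 2 covariates.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([-0.5, 1.0, 2.0])   # assumed "true" coefficients

# p(X_i) = exp(X_i^T beta) / {1 + exp(X_i^T beta)} lies in (0, 1).
p = 1.0 / (1.0 + np.exp(-(X @ beta)))

# Y_i ~ Bernoulli(p_i): a 0/1 response with E(Y_i | X_i) = p(X_i).
Y = rng.binomial(1, p)
```

Note that, unlike a linear model, the mean p(Xi) is automatically constrained to (0, 1).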
Link functions

In general, we assume that E(Yi | Xi) = h(XiT β). Here h−1(·) is the link function, which links E(Yi | Xi) with a linear function of Xi. Three commonly used link functions: the logit link, the probit link, and the complementary log-log link.

▶ (Logit link) If p = h(z) = exp(z)/{1 + exp(z)}, then h−1(p) = log{p/(1 − p)}.

▶ (Probit link) If p = h(z) = Φ(z), where Φ(·) is the CDF of a standard normal, then h−1(p) = Φ−1(p).

▶ (Complementary log-log link) If p = h(z) = 1 − exp{−exp(z)}, then h−1(p) = log{−log(1 − p)}.
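The three links and their inverses can be written down directly; the following sketch (assuming NumPy and SciPy for the normal CDF Φ) verifies the h−1(h(z)) = z relationship numerically:

```python
import numpy as np
from scipy.stats import norm

def logit_inv(z):          # h(z) = exp(z) / {1 + exp(z)}
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):              # h^{-1}(p) = log{p / (1 - p)}
    return np.log(p / (1.0 - p))

def probit_inv(z):         # h(z) = Phi(z), standard normal CDF
    return norm.cdf(z)

def probit(p):             # h^{-1}(p) = Phi^{-1}(p)
    return norm.ppf(p)

def cloglog_inv(z):        # h(z) = 1 - exp{-exp(z)}
    return 1.0 - np.exp(-np.exp(z))

def cloglog(p):            # h^{-1}(p) = log{-log(1 - p)}
    return np.log(-np.log(1.0 - p))

# Each inverse link undoes its link: h^{-1}(h(z)) = z.
z = np.linspace(-3, 3, 7)
roundtrips = [np.allclose(hinv(h(z)), z)
              for h, hinv in [(logit_inv, logit),
                              (probit_inv, probit),
                              (cloglog_inv, cloglog)]]
```

All three map the real line onto (0, 1); they differ in how quickly the tails approach 0 and 1.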
A logistic regression model

▶ Response: Bernoulli distributed random variable Yi ∼ Bernoulli(pi), i = 1, · · · , n.

▶ Systematic component: ηi = Σ_{j=1}^{p} Xij βj.

▶ Link function: h(ηi) = pi.
Estimation of β

The estimate of β can be obtained by the maximum likelihood method. The likelihood function for β is

L(β) = Π_{i=1}^{n} pi^{Yi} (1 − pi)^{1−Yi}.

The log-likelihood function for β is

ℓ(β) = log L(β) = Σ_{i=1}^{n} Yi log{pi/(1 − pi)} + Σ_{i=1}^{n} log(1 − pi)
                = Σ_{i=1}^{n} Yi XiT β − Σ_{i=1}^{n} log{1 + exp(XiT β)}.
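The two forms of ℓ(β) above are algebraically identical, which is easy to check numerically; a minimal sketch (simulated data and coefficients are hypothetical, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
beta = np.array([0.5, -1.0, 0.25])
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
Y = rng.binomial(1, p)

# First form: sum Y_i log{p_i/(1 - p_i)} + sum log(1 - p_i)
ll1 = np.sum(Y * np.log(p / (1 - p))) + np.sum(np.log(1 - p))

# Second form: sum Y_i X_i^T beta - sum log{1 + exp(X_i^T beta)}
eta = X @ beta
ll2 = np.sum(Y * eta) - np.sum(np.logaddexp(0.0, eta))
```

The second form is the one used in practice: it avoids computing pi explicitly, and np.logaddexp evaluates log{1 + exp(η)} stably for large |η|.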
MLE of β

▶ The MLE of β is

β̂ = arg max_β ℓ(β),

where ℓ(β) is the log-likelihood function of β.

▶ We do not have a closed-form solution for β̂. But ℓ(β) is a concave function of β, which is relatively easy to optimize.
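Because ℓ(β) is concave, a generic optimizer applied to −ℓ(β) finds the MLE reliably; a sketch (simulated data, hypothetical coefficients, assuming NumPy and SciPy):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 1.0, -0.7])
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))

def neg_loglik(beta):
    # -l(beta) = -[sum Y_i X_i^T beta - sum log{1 + exp(X_i^T beta)}]
    eta = X @ beta
    return -(Y @ eta - np.sum(np.logaddexp(0.0, eta)))

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_hat = res.x
```

With n = 1000 observations, beta_hat should land close to beta_true; the next slide derives the score and Hessian, which allow a faster Newton-type iteration instead of a generic optimizer.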
Score function and Hessian matrix

▶ The score function of β is

∂ℓ(β)/∂β = Σ_{i=1}^{n} Xi Yi − Σ_{i=1}^{n} Xi exp(XiT β)/{1 + exp(XiT β)} = Σ_{i=1}^{n} Xi (Yi − pi).

▶ The Hessian matrix of β is

∂²ℓ(β)/∂β∂βT = −Σ_{i=1}^{n} Xi XiT [exp(XiT β)/{1 + exp(XiT β)}] [1 − exp(XiT β)/{1 + exp(XiT β)}]
             = −XT V X,

where V = diag{p1(1 − p1), · · · , pn(1 − pn)} and X = (X1, · · · , Xn)T. Here pi = exp(XiT β)/{1 + exp(XiT β)}.
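With the score XT(Y − p) and Hessian −XT V X in hand, the MLE can be computed by Newton–Raphson; a minimal sketch (simulated data are hypothetical, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, 0.8, -0.5])))))

beta = np.zeros(X.shape[1])          # start from beta = 0
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    score = X.T @ (Y - p)            # sum X_i (Y_i - p_i)
    if np.max(np.abs(score)) < 1e-8: # score = 0 at the MLE
        break
    V = p * (1.0 - p)                # diagonal entries of V
    hessian = -(X.T * V) @ X         # -X^T V X
    # Newton step: beta <- beta - H^{-1} score
    beta = beta - np.linalg.solve(hessian, score)
```

Each step solves a weighted least-squares system, which is why this iteration is also known as iteratively reweighted least squares (IRLS); it typically converges in a handful of steps.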
Extension to Binomial distributed data

Suppose we observe a Binomial distributed response Si ∼ Binomial(ni, pi), where ni is known. We would like to study the association between the response Si and some covariates Xi. A corresponding logistic regression model is

Si ∼ Binomial(ni, pi),
log{pi/(1 − pi)} = XiT β,

for i = 1, · · · , m.
Estimation of β

We can still apply the maximum likelihood method to estimate β. The likelihood function for β is

L(β) = Π_{i=1}^{m} (ni choose Si) pi^{Si} (1 − pi)^{ni−Si}.

The log-likelihood function for β is

ℓ(β) = log L(β) = C + Σ_{i=1}^{m} Si log{pi/(1 − pi)} + Σ_{i=1}^{m} ni log(1 − pi)
                = C + Σ_{i=1}^{m} Si XiT β − Σ_{i=1}^{m} ni log{1 + exp(XiT β)},

where C is a constant that does not depend on β.
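As in the Bernoulli case, the two expressions for ℓ(β) − C agree term by term; a quick numerical check (simulated ni, Si, and covariates are hypothetical, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
m = 100
X = rng.normal(size=(m, 2))
beta = np.array([0.4, -0.9])
p = 1.0 / (1.0 + np.exp(-(X @ beta)))
n_i = rng.integers(1, 20, size=m)    # known trial counts n_i
S = rng.binomial(n_i, p)             # S_i ~ Binomial(n_i, p_i)

eta = X @ beta
# First form (dropping C): sum S_i log{p_i/(1-p_i)} + sum n_i log(1-p_i)
ll1 = np.sum(S * np.log(p / (1 - p))) + np.sum(n_i * np.log(1 - p))
# Second form (dropping C): sum S_i X_i^T beta - sum n_i log{1 + exp(X_i^T beta)}
ll2 = np.sum(S * eta) - np.sum(n_i * np.logaddexp(0.0, eta))
```

Since C does not involve β, it can be dropped entirely for the purpose of maximization.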
Score function and Hessian matrix

▶ The score function of β is

∂ℓ(β)/∂β = Σ_{i=1}^{m} Xi Si − Σ_{i=1}^{m} ni Xi exp(XiT β)/{1 + exp(XiT β)} = Σ_{i=1}^{m} Xi (Si − ni pi).

▶ The Hessian matrix of β is

∂²ℓ(β)/∂β∂βT = −Σ_{i=1}^{m} ni Xi XiT [exp(XiT β)/{1 + exp(XiT β)}] [1 − exp(XiT β)/{1 + exp(XiT β)}]
             = −XT V X,

where V = diag{n1 p1(1 − p1), · · · , nm pm(1 − pm)} and X = (X1, · · · , Xm)T. Here pi = exp(XiT β)/{1 + exp(XiT β)}.
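These expressions plug into the same Newton–Raphson iteration as before, with Yi − pi replaced by Si − ni pi and V rescaled by ni; a minimal sketch (simulated data are hypothetical, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)
m = 200
X = np.column_stack([np.ones(m), rng.normal(size=m)])
n_i = rng.integers(5, 30, size=m)    # known trial counts
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([0.1, 0.6]))))
S = rng.binomial(n_i, p_true)

beta = np.zeros(X.shape[1])
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    score = X.T @ (S - n_i * p)      # sum X_i (S_i - n_i p_i)
    if np.max(np.abs(score)) < 1e-8:
        break
    V = n_i * p * (1.0 - p)          # diagonal of V: n_i p_i (1 - p_i)
    hessian = -(X.T * V) @ X         # -X^T V X
    beta = beta - np.linalg.solve(hessian, score)  # Newton step
```

The Bernoulli model is the special case ni = 1 for all i, which recovers the earlier formulas exactly.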