Logistic regression models

Logistic regression model

Suppose we observe binomially distributed data $S_1, \dots, S_m$, which are, respectively, the numbers of successes among $n_1, \dots, n_m$ trials. Assume that $S_i \sim \text{Binomial}(n_i, p_i)$ and
\[
\log\frac{p_i}{1 - p_i} = X_i^T \beta,
\]
where $X_i$ is a $p$-dimensional covariate vector.

Asymptotic normality

Using the large-sample property of the maximum likelihood estimator, we know that, asymptotically,
\[
\hat\beta - \beta \sim N\big(0, (X^T V X)^{-1}\big),
\]
where $V = \mathrm{diag}\{n_1 p_1 (1 - p_1), \dots, n_m p_m (1 - p_m)\}$ and $X = (X_1, \dots, X_m)^T$ is the $m \times p$ design matrix.

Wald-type inference

A $1 - \alpha$ confidence interval for $\beta_j$ ($j = 1, \dots, p$) is
\[
\hat\beta_j \pm z_{\alpha/2} \sqrt{[(X^T V X)^{-1}]_{jj}},
\]
where $[(X^T V X)^{-1}]_{jj}$ is the $j$-th diagonal element of $(X^T V X)^{-1}$ and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.

Log-likelihood of $\beta$

Recall that the log-likelihood of $\beta$ is
\[
\ell(\beta) = \sum_{i=1}^m \log\binom{n_i}{S_i} + \sum_{i=1}^m S_i \log\Big(\frac{p_i}{1 - p_i}\Big) + \sum_{i=1}^m n_i \log(1 - p_i)
= \sum_{i=1}^m \Big[\log\binom{n_i}{S_i} + S_i \log(p_i) + (n_i - S_i) \log(1 - p_i)\Big].
\]

Testing the significance of the predictors

- Consider the covariates $X_i = (1, X_{i1}, \dots, X_{i(p-1)})^T$. The first component is set to 1, which corresponds to the intercept in the logistic regression model.
- If none of the predictors is significant, then the reduced model is
\[
p_i = \frac{\exp(\beta_1)}{1 + \exp(\beta_1)},
\]
which is a constant-probability model.
- To test the significance of the predictors $X_{i1}, \dots, X_{i(p-1)}$, we test the significance of $\beta_2, \dots, \beta_p$.

To test the significance of the predictors, we test the following hypothesis:
\[
H_0: \beta_2 = \cdots = \beta_p = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0 \text{ for some } j \in \{2, \dots, p\}.
\]

Log-likelihood ratio

- Let $\hat p_i$ be the maximum likelihood estimator of $p_i$ under the logistic regression model. If $\hat\beta$ is the MLE of $\beta$, then $\hat p_i = \exp(\hat\beta^T X_i)/\{1 + \exp(\hat\beta^T X_i)\}$.
- The maximized log-likelihood under the alternative is
\[
\ell(\hat\beta) = \sum_{i=1}^m \Big[\log\binom{n_i}{S_i} + S_i \log(\hat p_i) + (n_i - S_i) \log(1 - \hat p_i)\Big].
\]
- Under the null hypothesis, the maximized log-likelihood is
\[
\sum_{i=1}^m \Big[\log\binom{n_i}{S_i} + S_i \log(\hat p) + (n_i - S_i) \log(1 - \hat p)\Big],
\]
where $\hat p = S/n$, $S = \sum_{i=1}^m S_i$, and $n = \sum_{i=1}^m n_i$.

Rejection region of the likelihood ratio test

For testing the significance of the predictors, $H_0: \beta_2 = \cdots = \beta_p = 0$ vs $H_1: \beta_j \neq 0$ for some $j \in \{2, \dots, p\}$, an $\alpha$-level rejection region of the likelihood ratio test is
\[
\mathrm{LR} = 2\sum_{i=1}^m \Big[S_i \log(\hat p_i) + (n_i - S_i) \log(1 - \hat p_i)\Big] - 2S\log(S) - 2(n - S)\log(n - S) + 2n\log(n) > \chi^2_{p-1,\alpha},
\]
where $\chi^2_{p-1,\alpha}$ is the upper $\alpha$ quantile of the $\chi^2$ distribution with $p - 1$ degrees of freedom.

Diagnostic tools: goodness-of-fit test

- To test the goodness of fit of the full model, we compare the full model (with $p$ covariates) against a saturated model.
- A saturated model has $m$ unknown parameters; it is a nonparametric model. Specifically, $S_i \sim \text{Binomial}(n_i, p_i)$, and the $p_i$ are all unknown parameters. Note that $m$ is the largest possible number of unknown parameters.
- It is clear that in the saturated model, the maximum likelihood estimator of $p_i$ is $S_i/n_i$.

Deviance

- The log-likelihood ratio statistic for testing goodness of fit is
\[
D = 2\sum_{i=1}^m \Big[S_i \log(S_i/n_i) + (n_i - S_i) \log(1 - S_i/n_i)\Big] - 2\sum_{i=1}^m \Big[S_i \log(\hat p_i) + (n_i - S_i) \log(1 - \hat p_i)\Big].
\]
- The above log-likelihood ratio statistic is called the deviance.
- The deviance $D$ approximately follows a $\chi^2_{m-p}$ distribution under the null hypothesis that the full model (with $p$ covariates) is adequate for the data. The fit, the Wald interval, the LR test, and the deviance are illustrated in the two code sketches below.
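As a concrete illustration of the estimator and the Wald interval above, here is a minimal numpy sketch that computes the binomial logistic MLE by Fisher scoring (one standard way to maximize the log-likelihood; the slides do not prescribe a particular algorithm). The helper name fit_logistic and the simulated data are hypothetical, chosen only for this example.

```python
import numpy as np
from scipy import stats

def fit_logistic(X, S, n, tol=1e-10, max_iter=100):
    """Fisher scoring for the binomial logistic MLE (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # p_i = expit(X_i^T beta)
        V = np.diag(n * p * (1.0 - p))               # V = diag{n_i p_i (1 - p_i)}
        # Newton step: (X^T V X)^{-1} times the score X^T (S - n p)
        step = np.linalg.solve(X.T @ V @ X, X.T @ (S - n * p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ np.diag(n * p * (1.0 - p)) @ X)  # (X^T V X)^{-1}
    return beta, cov

# Simulated binomial data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
m = 20
X = np.column_stack([np.ones(m), rng.normal(size=m)])  # intercept + one predictor
n = np.full(m, 30)
S = rng.binomial(n, 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * X[:, 1]))))

beta_hat, cov = fit_logistic(X, S, n)
z = stats.norm.ppf(1 - 0.05 / 2)                     # z_{alpha/2} for alpha = 0.05
se = np.sqrt(np.diag(cov))                           # sqrt of diagonal of (X^T V X)^{-1}
for j in range(len(beta_hat)):
    lo, hi = beta_hat[j] - z * se[j], beta_hat[j] + z * se[j]
    print(f"beta_{j + 1}: {beta_hat[j]:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```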
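Continuing the same sketch, the LR statistic of the rejection-region slide and the deviance can be computed directly from the formulas above; this block reuses X, S, n, m, and beta_hat from the previous block and is again only an illustration.

```python
def loglik_kernel(S, n, p):
    """Binomial log-likelihood without the log-binomial-coefficient term,
    which cancels in every likelihood ratio used here."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(S * np.log(p) + (n - S) * np.log(1 - p))

p_hat = 1.0 / (1.0 + np.exp(-X @ beta_hat))   # fitted p_i under the full model
S_tot, n_tot = S.sum(), n.sum()

# LR = 2 sum[S_i log p_hat_i + (n_i - S_i) log(1 - p_hat_i)]
#      - 2 S log S - 2 (n - S) log(n - S) + 2 n log n
LR = (2 * loglik_kernel(S, n, p_hat)
      - 2 * S_tot * np.log(S_tot)
      - 2 * (n_tot - S_tot) * np.log(n_tot - S_tot)
      + 2 * n_tot * np.log(n_tot))
p_dim = X.shape[1]
print("LR =", LR, "reject H0 at 5%:", LR > stats.chi2.ppf(0.95, p_dim - 1))

# Deviance: saturated model (p_i = S_i / n_i) against the full model
D = 2 * (loglik_kernel(S, n, S / n) - loglik_kernel(S, n, p_hat))
print("D =", D, "lack of fit at 5%:", D > stats.chi2.ppf(0.95, m - p_dim))
```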
Diagnostic tools: checking residuals

- Ordinary residual: $e_i = S_i - \hat S_i = S_i - n_i \hat p_i$ for $i = 1, \dots, m$.
- Deviance residual:
\[
d_i = \mathrm{sign}(e_i)\Big\{2\Big[S_i \log\Big(\frac{S_i}{n_i \hat p_i}\Big) + (n_i - S_i)\log\Big(\frac{n_i - S_i}{n_i(1 - \hat p_i)}\Big)\Big]\Big\}^{1/2}
\]
for $i = 1, \dots, m$.
- Pearson residual:
\[
r_i = \frac{S_i - n_i \hat p_i}{\sqrt{n_i \hat p_i (1 - \hat p_i)}}.
\]
- Standardized Pearson residual:
\[
sr_i = \frac{r_i}{\sqrt{1 - h_{ii}}},
\]
where $h_{ii} = (H)_{ii}$ and $H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2}$.
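All four residuals can be computed in a few lines; this final block continues the earlier sketch (X, S, n, m, and p_hat are carried over) and is a sketch under those assumptions, not a definitive implementation.

```python
e = S - n * p_hat                                # ordinary residuals

# Deviance residuals; the where() guards the S_i = 0 and S_i = n_i cases
with np.errstate(divide="ignore", invalid="ignore"):
    t1 = np.where(S > 0, S * np.log(S / (n * p_hat)), 0.0)
    t2 = np.where(S < n, (n - S) * np.log((n - S) / (n * (1 - p_hat))), 0.0)
d = np.sign(e) * np.sqrt(2 * (t1 + t2))

r = e / np.sqrt(n * p_hat * (1 - p_hat))         # Pearson residuals

# Hat matrix H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2} with V evaluated at p_hat
v = n * p_hat * (1 - p_hat)
W = np.sqrt(v)[:, None] * X                      # rows of V^{1/2} X
h = np.diag(W @ np.linalg.solve(X.T @ (v[:, None] * X), W.T))
sr = r / np.sqrt(1 - h)                          # standardized Pearson residuals

print(np.column_stack([e, d, r, sr])[:3])        # first few rows of each residual
```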