lect16

Lecture 16: Logistic Regression: Goodness of Fit, Information Criteria, ROC Analysis
BMTRY 701 Biostatistical Methods II

Goodness of Fit

• A test of how well the model explains the data
• Applies to linear models and generalized linear models
• How to do it?
• It is simply a comparison of the “current” model to a perfect model
    • What would the estimated likelihood function be in a perfect model?
    • What would the estimated log-likelihood function be in a perfect model?

Set up as a hypothesis test

• H0: current model
• H1: perfect model
• Recall the G2 statistic comparing models: G2 = Dev(0) - Dev(1)
• How many parameters are there in the null model?
• How many parameters are there in the perfect model?

Goodness of Fit test

• Perfect model: assumed to be ‘saturated’ in most cases
• That is, there is a parameter for each combination of predictors
• In our model? That is likely to be close to N due to the number of continuous variables
• Define c = number of parameters in the saturated model
• Deviance goodness of fit: Dev(0)

Goodness of Fit test

• Deviance goodness of fit: Dev(0)
• If Dev(H0) < χ2(c-p),1-α, conclude H0
• If Dev(H0) > χ2(c-p),1-α, conclude H1
• Why aren't we subtracting deviances?

GoF test for Prostate Cancer Model

    > mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
    +              family=binomial)
    > mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
    > mreg1

    Coefficients:
       (Intercept)         gleason        log(psa)             vol
          -8.31383         0.93147         0.53422        -0.01507
    factor(dpros)2  factor(dpros)3  factor(dpros)4
           0.76840         1.55109         1.44743

    Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
      (1 observation deleted due to missingness)
    Null Deviance:      511.3
    Residual Deviance:  377.1        AIC: 391.1

Test statistic: 377.1 ~ χ2(380 - 7)
Threshold: χ2(373),1-α = 419.0339
p-value = 0.43
(The R output above reports 372 residual degrees of freedom because 1 of the 380 observations was dropped; the calculation here uses 380 - 7 = 373.)

More Goodness of Fit

• There are a lot of options!
• Deviance GoF is just one
    • Pearson Chi-square
    • Hosmer-Lemeshow
    • etc.
• Principles, however, are essentially the same
• GoF is not that commonly seen in medical research because it is rarely very important

Information Criteria

• An information criterion (IC) is a measure of the goodness of fit of an estimated statistical model.
• It is grounded in the concept of entropy:
    • offers a relative measure of the information lost
    • describes the tradeoff between precision and complexity of the model
• An IC is not a test on the model in the sense of hypothesis testing; it is a tool for model selection.
• Given a data set, several competing models may be ranked according to their IC
• The model with the lowest IC is chosen as the “best”

Information Criteria

• IC rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters.
• This penalty discourages overfitting.
• The IC methodology attempts to find the model that best explains the data with a minimum of free parameters.
• More traditional approaches such as the LRT start from a null hypothesis.
• IC judges a model by how close its fitted values tend to be to the true values.
• The AIC value assigned to a model is only meant to rank competing models and tell you which is the best among the given alternatives.

Akaike Information Criterion (AIC)

AIC = -2 log Lik + 2p

Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716–723.

Bayesian Information Criterion (BIC)

BIC = -2 log Lik + p ln(N)

Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2): 461–464.

AIC versus BIC

2p vs. p ln(N)

• BIC and AIC are similar
• Different penalty for number of parameters
• The BIC penalizes free parameters more strongly than does the AIC (whenever ln(N) > 2, i.e., N ≥ 8).
• Implications: BIC tends to choose smaller models
• The larger the N, the more likely that AIC and BIC will disagree on model selection

Prostate cancer models

• We looked at different forms for volume:
    A: volume as continuous
    B: volume as binary (detectable vs. undetectable)
    C: 4 categories of volume
    D: 3 categories of volume
    E: linear + squared term for volume

AIC vs. BIC (N=380)

                     p    -2logLik      AIC       BIC
    A: continuous    8      376.0      392.0     423.5
    B: binary        8      375.2      391.2     422.7
    C: 4 categories 10      373.6      393.6     433.0
    D: 3 categories  9      375.2      393.2     428.6
    E: quadratic     9      376.0      394.0     429.4

AIC vs. BIC if N is multiplied by 10 (N=3800)

                     p    -2logLik      AIC       BIC
    A: continuous    8     3760.0     3776.0    3825.9
    B: binary        8     3752.0     3768.0    3817.9
    C: 4 categories 10     3736.0     3756.0    3818.4
    D: 3 categories  9     3751.9     3769.9    3826.1
    E: quadratic     9     3760.0     3778.0    3834.2

ROC curve analysis

• Receiver Operating Characteristic Curve Analysis
• Traditionally, looks at the sensitivity and specificity of a ‘model’ for predicting an outcome
• Question: based on our model, can we accurately predict if a prostate cancer patient has capsular penetration?

ROC curve analysis

• Associations between predictors and outcomes are not enough
• Need a ‘stronger’ relationship
• Classic interpretation of sens and spec
    • a binary test and a binary outcome
    • sensitivity = P(test + | true disease)
    • specificity = P(test - | true no disease)
• What is test + in our dataset?
• What does the model provide for us?

[Figure: ROC curve analysis. Sensitivity and Specificity plotted against Probability cutoff, both axes from 0.00 to 1.00]

Fitted probabilities

• The fitted probabilities are the probability that a NEW patient with the same ‘covariate profile’ will be a “case” (e.g., capsular penetration, disease, etc.)
• We select a probability ‘threshold’ to determine whether a patient is defined as a case or not
• Some options:
    • high sensitivity (e.g., cancer screens)
    • high specificity (e.g., PPD skin test for TB)
    • maximize the sum of sens and spec

ROC curve
    xi: logit capsule i.dpros detected gleason logpsa
    i.dpros           _Idpros_1-4         (naturally coded; _Idpros_1 omitted)

    Iteration 0:   log likelihood = -255.62831
    Iteration 1:   log likelihood = -193.51543
    Iteration 2:   log likelihood = -188.23598
    Iteration 3:   log likelihood = -188.04747
    Iteration 4:   log likelihood =  -188.0471

    Logistic regression                               Number of obs   =        379
                                                      LR chi2(6)      =     135.16
                                                      Prob > chi2     =     0.0000
    Log likelihood = -188.0471                        Pseudo R2       =     0.2644

    ------------------------------------------------------------------------------
         capsule |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       _Idpros_2 |   .7801903   .3573241     2.18   0.029      .079848    1.480533
       _Idpros_3 |   1.606646   .3744828     4.29   0.000     .8726729    2.340618
       _Idpros_4 |   1.504732   .4495287     3.35   0.001     .6236723    2.385793
        detected |  -.5719155   .2570359    -2.23   0.026    -1.075697   -.0681344
         gleason |   .9418179   .1648245     5.71   0.000     .6187677    1.264868
          logpsa |   .5152153   .1547649     3.33   0.001     .2118817    .8185488
           _cons |  -8.275811   1.056036    -7.84   0.000     -10.3456   -6.206018
    ------------------------------------------------------------------------------

[Figure: ROC curve, plotted against 1 - Specificity, axes from 0.00 to 1.00. Area under ROC curve = 0.8295]

How to interpret?

• Every point represents a patient(s) in the dataset
• Question: if we use that person’s fitted probability as the threshold, what are the sens and spec values?
• Empirically driven based on the fitted probabilities
• Choosing the threshold:
    • high sens or spec
    • maximize both? The point on the ROC curve closest to the upper left corner

AUC of ROC curve

• AUC = Area Under the Curve
• 0.5 ≤ AUC ≤ 1
• AUC = 1 if the model is perfect
• AUC = 0.50 if the model is no better than chance
• “Good” AUC?
    • context specific
    • for some outcomes, there are already good diagnostic measures, so AUC would need to be very high
    • for others, if there is very little, even an AUC of 0.70 would be useful
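As a minimal sketch of the ideas on the two slides above (each observed fitted probability is used in turn as the cutoff, and the AUC is the area under the resulting points), the Python snippet below builds an empirical ROC curve and integrates it with the trapezoid rule. The labels and fitted probabilities are made-up toy values, not the prostate cancer data, and the function names `roc_points` and `auc` are illustrative only.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs for the empirical ROC curve.

    labels: 1 for a true case, 0 for a non-case.
    scores: fitted probabilities from some model (toy values here).
    Each distinct score is used once as the cutoff: classify + if score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is classified +
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC points by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (assumed for illustration): 3 cases and 3 non-cases.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

toy_points = roc_points(toy_labels, toy_scores)
print(f"AUC = {auc(toy_points):.4f}")  # 8/9, i.e. AUC = 0.8889
```

In the toy data, 8 of the 9 case/non-case pairs have the case ranked higher, which matches the area of 8/9; a model that separates the two groups completely would give an area of 1.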
Utility in model selection

• If the goal of the modeling is prediction, AUC can be used to determine the ‘best’ model
• A variable may be associated with the outcome, but not add much in terms of prediction
• Example:
    • Model 1: gleason + logPSA + detectable + dpros
    • Model 2: gleason + logPSA + detectable
    • Model 3: gleason + logPSA

[Figure: ROC curve of models 1, 2, and 3. x-axis: False positive rate; y-axis: True positive rate. Model 1: AUC=0.83; Model 2: AUC=0.80; Model 3: AUC=0.79]

Sensitivity and Specificity

• For ‘true’ use, you need to choose a cutoff.
• The AUC of the ROC curve tells you about the predictive ability of the model
• But it is not directly translatable into the ‘accuracy’ of a given threshold

phat = 0.50 cutoff

    Logistic model for capsule

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |       100            39  |        139
         -     |        53           187  |        240
    -----------+--------------------------+-----------
       Total   |       153           226  |        379

    Classified + if predicted Pr(D) >= .5
    True D defined as capsule != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   65.36%
    Specificity                     Pr( -|~D)   82.74%
    Positive predictive value       Pr( D| +)   71.94%
    Negative predictive value       Pr(~D| -)   77.92%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   17.26%
    False - rate for true D         Pr( -| D)   34.64%
    False + rate for classified +   Pr(~D| +)   28.06%
    False - rate for classified -   Pr( D| -)   22.08%
    --------------------------------------------------
    Correctly classified                        75.73%
    --------------------------------------------------

phat = 0.25 cutoff

    Logistic model for capsule

                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |       137            96  |        233
         -     |        16           130  |        146
    -----------+--------------------------+-----------
       Total   |       153           226  |        379

    Classified + if predicted Pr(D) >= .25
    True D defined as capsule != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   89.54%
    Specificity                     Pr( -|~D)   57.52%
    Positive predictive value       Pr( D| +)   58.80%
    Negative predictive value       Pr(~D| -)   89.04%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   42.48%
    False - rate for true D         Pr( -| D)   10.46%
    False + rate for classified +   Pr(~D| +)   41.20%
    False - rate for classified -   Pr( D| -)   10.96%
    --------------------------------------------------
    Correctly classified                        70.45%
    --------------------------------------------------
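The percentages in the two classification tables above follow directly from the four cell counts. As a small sketch, the Python snippet below recomputes them from the counts shown on the slides; the function name `classification_summary` and its argument names are illustrative choices, not part of the Stata output.

```python
def classification_summary(tp, fp, fn, tn):
    """Summary measures for a 2x2 table of classified (+/-) by true status (D/~D).

    tp: classified + and true D      fp: classified + and true ~D
    fn: classified - and true D      tn: classified - and true ~D
    """
    n_disease = tp + fn      # column total for D
    n_no_disease = fp + tn   # column total for ~D
    return {
        "sensitivity": tp / n_disease,       # Pr(+ | D)
        "specificity": tn / n_no_disease,    # Pr(- | ~D)
        "ppv": tp / (tp + fp),               # Pr(D | +)
        "npv": tn / (tn + fn),               # Pr(~D | -)
        "accuracy": (tp + tn) / (n_disease + n_no_disease),
    }


# Cell counts copied from the two tables above.
cutoff_050 = classification_summary(tp=100, fp=39, fn=53, tn=187)
cutoff_025 = classification_summary(tp=137, fp=96, fn=16, tn=130)

for name, summary in [("phat = 0.50", cutoff_050), ("phat = 0.25", cutoff_025)]:
    print(name)
    for measure, value in summary.items():
        print(f"  {measure:<12} {100 * value:6.2f}%")
```

Lowering the cutoff from 0.50 to 0.25 raises sensitivity (65.36% to 89.54%) at the cost of specificity (82.74% to 57.52%), which is the tradeoff the ROC curve traces out across all cutoffs.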
