What You See May Not Be What You Get: A Primer on Regression Artifacts
Michael A. Babyak, PhD
Duke University Medical Center

Topics to Cover
1. Models: what and why?
2. Preliminaries—requirements for a good model
3. Dichotomizing a graded or continuous variable is dumb
4. Using degrees of freedom wisely
5. Covariate selection
6. Transformations and smoothing techniques for non-linear effects
7. Resampling as a superior method of model validation

What is a model?
Y = f(x1, x2, x3, … xn)
Y = a + b1x1 + b2x2 + … + bnxn
Y = e^(a + b1x1 + b2x2 + … + bnxn)

Why Model? (instead of test)
1. Can capture a theoretical/predictive system
2. Gives estimates of population parameters
3. Allows prediction as well as hypothesis testing
4. More information for replication

Preliminaries
1. Correct model
2. Measure well and don't throw information away
3. Adequate sample size

Correct Model
• Gaussian outcome: General Linear Model
  - Multiple linear regression
• Binary (or ordinal) outcome: Generalized Linear Model
  - Logistic regression
  - Proportional odds/ordinal logistic regression
• Time to event:
  - Cox regression
• The distribution of the predictors is generally not important

Measure well and don't throw information away
• Use reliable, interpretable measures
• Use all the information about the variables of interest
• Don't create "clinical cutpoints" before modeling
• Model with ALL the data first, then use the model's predictions to make decisions about cutpoints

Dichotomizing for Convenience Can Destroy a Model
[Figure: a depression score scale (0-44) split into "not depressed" vs. "depressed"; the marked cases A, B, and C illustrate the implausible measurement assumption that everyone on the same side of the cutpoint is alike, while people close together but on opposite sides of it are treated as completely different.]

Dichotomization, by definition, reduces power by a minimum of about 30%
http://psych.colorado.edu/~mcclella/MedianSplit/

Dear Project Officer,
In order to facilitate analysis and interpretation, we have decided to throw away about 30% of our data. Even though this will waste about three or four hundred thousand dollars' worth of subject recruitment and testing money, we are confident that you will understand.
Sincerely,
Dick O. Tomi, PhD
Prof. Richard Obediah Tomi, PhD

Examples from the WCGS Study: Correlations with CHD Mortality (n = 750)

Variable                   Continuous       Dichotomized at median   Reduction in r2
                           r      r2        r      r2
Systolic Blood Pressure    .15    .023      .12    .014              -39%
Hostility                  .15    .023      .08    .006              -74%

Dichotomizing does not reduce measurement error
Gustafson, P. and Le, N.D. (2001). A comparison of continuous and discrete measurement error: is it wise to dichotomize imprecise covariates? Submitted. Available at http://www.stat.ubc.ca/people/gustaf.

Simulation: Dichotomizing makes matters worse when the measure is unreliable
• True model: X1 (continuous) → Y, with b1 = .4
• Same model with X1 dichotomized
• Models fit with the reliability of X1 manipulated (reliability = .65, .75, .85, 1.00), entering X1 either continuous or dichotomized
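To make the design of this simulation concrete, here is a minimal sketch of the same idea in R; the sample size, the number of replications, and the way unreliability is induced are illustrative assumptions, not values taken from the slides.

# Sketch: power for a continuous vs. a median-split predictor when X1 is
# measured with varying reliability. True model: Y = .4*X1 + error.
# n, n.sims, and the error model are illustrative assumptions.
set.seed(42)
n      <- 100
n.sims <- 1000

power.sim <- function(reliability, dichotomize = FALSE) {
  rejections <- logical(n.sims)
  for (i in 1:n.sims) {
    x.true <- rnorm(n)
    # observed X1: true score plus noise, so that cor(x.obs, x.true)^2 = reliability
    x.obs  <- sqrt(reliability) * x.true + sqrt(1 - reliability) * rnorm(n)
    y      <- .4 * x.true + rnorm(n)
    if (dichotomize) x.obs <- as.numeric(x.obs > median(x.obs))   # median split
    p <- summary(lm(y ~ x.obs))$coefficients["x.obs", "Pr(>|t|)"]
    rejections[i] <- p < .05
  }
  mean(rejections)   # proportion of correct rejections of the null
}

for (rel in c(1.00, .85, .75, .65))
  cat("reliability", rel, ": continuous", power.sim(rel, FALSE),
      " dichotomized", power.sim(rel, TRUE), "\n")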
[Figure: % correct rejections of the null hypothesis (power) for a continuous x as the reliability of x falls from 1.00 to .85, .75, and .65; y = .4x + e.]

[Figure: the same plot with a dichotomized x added; power is lower for the dichotomized x, and the gap remains as reliability declines.]

Dichotomizing will obscure non-linearity
[Figure: percent with wall motion abnormality (WMA) for "not depressed" vs. "depressed," with the CES-D dichotomized at the median (CES-D = 7).]

Dichotomizing will obscure non-linearity
[Figure: probability of WMA on at least 1 task as a function of the full CES-D score (0-40), fit with a cubic spline; the relation is non-linear, which the median split cannot reveal.]

Simulation 2: Dichotomizing a continuous predictor that is correlated with another predictor
• True model: X1 → Y with b1 = .4, and X2 → Y with b2 = .0; X1 and X2 continuous
• Same model with X1 dichotomized
• X1 dichotomized and the correlation between the predictors manipulated: r12 = .0, .4, .7

[Figure: % incorrect rejections of b2 = 0 (Type I error for X2) when X1 and X2 are both continuous; the rate stays near or below the nominal 5% level at r12 = 0, .4, and .7.]

[Figure: the same plot comparing "both continuous" with "X1 dichotomous, X2 continuous"; with X1 dichotomized, incorrect rejections of b2 = 0 climb sharply as r12 increases (the y-axis runs to 30%).]

Is it ever a good idea to categorize quantitatively measured variables?
• Yes:
  - when the variable is truly categorical
  - for descriptive/presentational purposes
  - for hypothesis testing, if enough categories are made
• However, using many categories can lead to problems of multiple significance tests and still runs the risk of misclassification

CONCLUSIONS
• Cutting:
  - doesn't always make measurement sense
  - almost always reduces power
  - can fool you with too much power in some instances
  - can completely miss important features of the underlying function
• Modern computing/statistical packages can "handle" continuous variables
• Want to make good clinical cutpoints? Model first, cut later.

Clinical Events and LVEF Change during Mental Stress: 5-Year Follow-up
Model first, cut later
[Figure: clinical events over 5-year follow-up plotted against maximum change in LVEF (%) during mental stress; the continuous relation is modeled first and any cutpoint chosen afterward.]

Requirements: Sample Size
• Linear regression: minimum of N = 50 + 8 per predictor (Green, 1990)
• Logistic regression: minimum of N = 10-15 per predictor among the smallest outcome group (Peduzzi et al., 1990a)
• Survival analysis: minimum of N = 10-15 events per predictor (Peduzzi et al., 1990b)

Concept of Simulation
Y = bX + error
Draw many samples from this known model, estimate b in each sample (bs1, bs2, bs3, …, bsk), and then evaluate the behavior of those estimates.

Simulation Example
Y = .4X + error
Estimate b in each of k simulated samples (bs1, bs2, …, bsk) and examine the results.
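A minimal sketch of this simulation idea in R; the sample size and number of replications are arbitrary choices.

# Sketch: generate many samples from a known model (Y = .4*X + error),
# estimate b in each, and examine the collection of estimates.
set.seed(1)
n      <- 100
n.sims <- 5000

b.hat <- replicate(n.sims, {
  x <- rnorm(n)
  y <- .4 * x + rnorm(n)
  coef(lm(y ~ x))["x"]          # estimated slope in this simulated sample
})

mean(b.hat)    # centers near the true value, .4
sd(b.hat)      # sampling variability of the estimate
hist(b.hat, xlab = "Value of beta for x1", main = "True model: Y = .4*x1 + e")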
[Figure: histogram of the estimated beta for x1 across simulated samples under the true model Y = .4*x1 + e; the estimates center on .4 and range from roughly .2 to .6.]

All-noise, but good fit
[Figure: density of the R-square from the full model when every predictor is pure noise, for n/p ≈ 3, 6.6, 10, and 13.3; the smaller the n/p ratio, the larger the R-square a noise-only model can produce.]

Simulation: number of events/predictor ratio
Y = .5*x1 + 0*x2 + .2*x3 + 0*x4, where r(x1, x4) = .4
N/p = 3, 5, 10, 20, 50

Parameter stability and n/p ratio
[Figure: densities of the parameter estimates for x1-x4 at n/p = 3, 5, 10, 20, and 50; at small n/p the estimates scatter widely (roughly -2 to 2), and the distributions tighten steadily as n/p grows.]

Peduzzi's Simulation: number of events/predictor ratio
P(survival) = a + b1*NYHA + b2*CHF + b3*VES + b4*DM + b5*STD + b6*HTN + b7*LVC
Events/p = 2, 5, 10, 15, 20, 25
% relative bias = ((estimated b - true b)/true b)*100

Simulation results: number of events/predictor ratio
[Figure: % relative bias for each coefficient (NYHA, CHF, VES, DM, STD, HTN, LVC) by events per variable (2, 5, 10, 15, 20, 25); bias is largest with few events per variable and shrinks toward zero as the ratio increases.]

Simulation results: number of events/predictor ratio
[Figure: proportion of coefficients with bias > 100% by events per variable; the proportion is highest at 2 events per variable and falls as events per variable increase.]

Predictor (covariate) selection
1. Theory, substantive knowledge, prior models
2. Testing for confounding
3. Univariate testing
4. Last (and least), automated methods, a.k.a. stepwise and best-subset regression

Searching for Confounders
• Fundamental tension between underfitting and overfitting
• Underfitting = not adjusting for important confounders
• Overfitting = capitalizing on chance relations (sample fluctuation)

Covariate selection
• Overfitting has been studied extensively
• The "scariest" study is by Faraway (1992), which showed that any pre-modeling strategy costs a df over and above the df used later in the modeling itself
• The pre-modeling strategies included variable selection, outlier detection, linearity tests, and residual analysis

Covariate selection
• Therefore, if you transform, select, etc., you must include those df in (i.e., penalize) the "final model"

Covariate selection: Univariate Testing
• Non-significant tests also cost a df
• Variables may not behave the same way in a multivariable model; a variable that is "not significant" in a univariate test may be very important in the presence of other variables

Covariate selection
• Despite the convention, testing for confounding has not been systematically studied; it likely leads to overadjustment and an underestimate of the true effect of the variable of interest
• At the very least, pulling variables in and out of models inflates the Type I error rate, sometimes dramatically (see the sketch below)
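As a purely illustrative sketch of these last points (and of the stepwise problems listed next), fitting and stepwise-selecting among pure-noise candidate predictors shows both the "all-noise, but good fit" phenomenon and the inflated apparent significance that data-driven selection produces; the sample size and number of candidates below are assumptions, not the values used in the slides' simulations.

# Sketch: 20 pure-noise candidate predictors, outcome unrelated to all of them.
# The full model still "fits" (n/p = 5), and forward stepwise selection retains
# noise variables that look significant.
set.seed(7)
n <- 100; p <- 20
x <- as.data.frame(matrix(rnorm(n * p), n, p))
names(x) <- paste0("x", 1:p)
x$y <- rnorm(n)                         # outcome is pure noise

full <- lm(y ~ ., data = x)
summary(full)$r.squared                 # sizable R-square from noise alone

step.fit <- step(lm(y ~ 1, data = x), scope = formula(full),
                 direction = "forward", trace = 0)
summary(step.fit)                       # noise variables retained as "significant"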
SOME of the problems with stepwise variable selection:
1. It yields R-squared values that are badly biased high.
2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
3. The method yields confidence intervals for effects and predicted values that are falsely narrow (see Altman and Andersen, Statistics in Medicine).
4. It yields P-values that do not have the proper meaning, and the proper correction for them is a very difficult problem.
5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani, 1996).
6. It has severe problems in the presence of collinearity.
7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test pre-specified hypotheses.
8. Increasing the sample size doesn't help very much (see Derksen and Keselman).
9. It allows us to not think about the problem.
10. It uses a lot of paper.

"I now wish I had never written the stepwise selection code for SAS."
--Frank Harrell, author of the forward and backward selection algorithm for SAS PROC REG

Automated Selection: Derksen and Keselman (1992) Simulation Study
• Studied backward and forward selection
• Some authentic variables and some noise variables among the candidate variables
• Manipulated the correlation among candidate predictors
• Manipulated sample size

Automated Selection: Derksen and Keselman (1992) Simulation Study
• "The degree of correlation between candidate predictors affected the frequency with which the authentic predictors found their way into the model."
• "The greater the number of candidate predictors, the greater the number of noise variables were included in the model."
• "Sample size was of little practical importance in determining the number of authentic variables contained in the final model."

Simulation results: Number of Noise Variables Included
[Figure: % of samples by the number of noise variables in the final model (0-7), for sample sizes of 100, 200, 500, 1000, and 10000; 20 candidate predictors, 100 samples. Noise variables enter the model at every sample size.]

Simulation results: R-Square From Noise Variables
[Figure: % of samples by the % of variance explained by noise variables (0, 0-5, 5-10, 10-15, 15-20, 20-25, >25), for the same sample sizes; 20 candidate predictors, 100 samples.]

Simulation results: R-Square From Noise Variables
[Figure: R-square attributable to noise variables (up to about 0.3) across samples (deciles), by sample size (100 to 10,000); 20 candidate predictors, 100 samples. The smallest samples yield the largest noise R-square.]

Variable Selection
• Pick variables a priori
• Stick with them
• Penalize appropriately for any data-driven decision about how to model a variable

Spending DF wisely
• Select the variables of most importance
• Use df to assess non-linearity using a flexible curve approach (more about this later)
• If there is not enough N per predictor, combine covariates using techniques that do not look at Y in the sample: PCA, FA, conceptual clustering, collapsing, scoring, established indexes, propensity scores

Can use data to determine where to spend DF
• Use Spearman's rho to test "importance"
• This is not peeking, because we have chosen to include the term in the model regardless of its relation to Y
• Use more df for non-linearity

Example: Predict survival from age, gender, and fare on the Titanic
If you have already decided to include them (and promise to keep them in the model), you can peek at the predictors in order to see where to add complexity.
[Figure: Spearman test of adjusted rho^2 for each predictor: age (N = 1046, df = 1), fare (N = 1308, df = 1), sex (N = 1309, df = 1), plotted on a 0.0-0.25 scale.]

Non-linearity using splines

Linear Spline (piecewise regression)
Y = a + b1(x < 10) + b2(10 < x < 20) + b3(x > 20)
[Figure: a piecewise-linear fit of Y on X with breakpoints (knots) at X = 10 and X = 20.]

Cubic Spline (non-linear piecewise regression)
[Figure: a smooth cubic-spline fit of Y on X, with the knot locations marked.]

Logistic regression model (spline with 3 knots for fare):
fitfare <- lrm(survived ~ (rcs(fare,3) + age + sex)^2, x=T, y=T)
anova(fitfare)

Wald Statistics          Response: survived

Factor                                         Chi-Square   d.f.   P
fare (Factor+Higher Order Factors)                   55.1      6   <.0001
  All Interactions                                   13.8      4   0.0079
  Nonlinear (Factor+Higher Order Factors)            21.9      3   0.0001
age (Factor+Higher Order Factors)                    22.2      4   0.0002
  All Interactions                                   16.7      3   0.0008
sex (Factor+Higher Order Factors)                   208.7      4   <.0001
  All Interactions                                   20.2      3   0.0002
fare * age (Factor+Higher Order Factors)              8.5      2   0.0142
  Nonlinear                                           8.5      1   0.0036
  Nonlinear Interaction : f(A,B) vs. AB               8.5      1   0.0036
fare * sex (Factor+Higher Order Factors)              6.4      2   0.0401
  Nonlinear                                           1.5      1   0.2153
  Nonlinear Interaction : f(A,B) vs. AB               1.5      1   0.2153
age * sex (Factor+Higher Order Factors)               9.9      1   0.0016
TOTAL NONLINEAR                                      21.9      3   0.0001
TOTAL INTERACTION                                    24.9      5   0.0001
TOTAL NONLINEAR + INTERACTION                        38.3      6   <.0001
TOTAL                                               245.3      9   <.0001
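The plots that follow show predictions from this fitted model; here is a hedged sketch of how such predictions can be obtained from the same fit (the grid of fare values and the plotting details are illustrative choices; the adjust-to values age = 28 and sex = male come from the slides).

# Sketch: predicted survival probabilities from fitfare over a grid of fare,
# separately by sex, holding age at 28. For an lrm fit, predict(..., type="fitted")
# returns predicted probabilities. Assumes sex is coded with levels female/male.
grid <- expand.grid(fare = seq(0, 300, by = 10),
                    age  = 28,
                    sex  = factor(c("female", "male")))
grid$p.survive <- predict(fitfare, newdata = grid, type = "fitted")

# one curve per sex
plot(grid$fare[grid$sex == "female"], grid$p.survive[grid$sex == "female"],
     type = "l", ylim = c(0, 1), xlab = "Fare", ylab = "Prob. of Survival")
lines(grid$fare[grid$sex == "male"], grid$p.survive[grid$sex == "male"], lty = 2)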
Predictors of Survival on Titanic
[Figure: effect-summary dot chart on the odds-ratio scale (axis roughly 0.50 to 12.00), with 0.95 confidence intervals, for fare (31 vs. 7.9), sex (female vs. male), and age (39 vs. 21); adjusted to fare = 14, age = 28, sex = male.]
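A dot chart like the one just described is what the summary method in the Design/rms library produces for an lrm fit; the sketch below assumes the data frame used to fit the model is available (its name here is made up), since summary() and plot() need a datadist object.

# Sketch: effect summary (odds ratios and confidence intervals) comparing
# fare 7.9 vs. 31 and age 21 vs. 39; "titanic" stands in for the data frame
# the model was fit on.
dd <- datadist(titanic); options(datadist = "dd")

s <- summary(fitfare, fare = c(7.9, 31), age = c(21, 39))
s        # printed effects, odds ratios, and confidence limits
plot(s)  # dot chart of the effects, as in the figure above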
Fare and Age Interaction
[Figure: 3-D surface of the predicted probability of survival (0 to 1) as a joint function of age (about 10 to 60) and fare (0 to 250); adjusted to sex = male.]

Fare and Gender Interaction
[Figure: predicted probability of survival versus fare (0 to 300), plotted separately for females and males; adjusted to age = 28. The effect of fare differs by sex.]

Validation
• Apparent
  - too optimistic
• Internal
  - cross-validation, bootstrap
  - honest estimate of model performance
  - provides an upper limit to what would be found on external validation
• External
  - replication with a new sample, under different circumstances

Validation
• Steyerberg et al. (1999) compared validation methods
• Found that split-half validation was far too conservative
• The bootstrap was equal or superior to all other techniques

Bootstrap
Draw repeated samples of the same size from my sample, WITH REPLACEMENT; estimate the quantity of interest in each resample; then evaluate the collection of estimates.
For example, from the sample 1, 3, 4, 5, 7, 10, one resample drawn with replacement might be 7, 1, 1, 4, 5, 10: some values appear more than once and others not at all.

Bootstrap Validation

Index        Training   Corrected
Dxy            0.6565       0.646
R2             0.4273       0.407
Intercept      0.0000      -0.011
Slope          1.0000       0.952

Summary
• Think about your model
• Collect enough data

Summary
• Measure well
• Don't destroy what you've measured

Summary
• Pick your variables ahead of time and collect enough data to test the model you want
• Keep all your variables in the model unless extremely unimportant

Summary
• Use more df on important variables, fewer df on "nuisance" variables
• Don't peek at Y to combine, discard, or transform variables

Summary
• Estimate validity and shrinkage with the bootstrap (a brief sketch follows the web links below)

Summary
• By all means, tinker with the model later, but be aware of the costs of tinkering
• Don't forget to say you tinkered
• Go collect more data

Web links for references, software, and more
• Harrell's regression modeling text
  - http://hesweb1.med.virginia.edu/biostat/rms/
• SAS macros for spline estimation
  - http://hesweb1.med.virginia.edu/biostat/SAS/survrisk.txt
• Some results comparing validation methods
  - http://hesweb1.med.virginia.edu/biostat/reports/logistic.val.pdf
• SAS code for bootstrap
  - ftp://ftp.sas.com/pub/neural/jackboot.sas
• S-Plus home page
  - insightful.com
• Mike Babyak's e-mail
  - michael.babyak@duke.edu
• This presentation
  - http://www.duke.edu/~mbabyak
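Finally, the "Corrected" column in the Bootstrap Validation table above is the kind of output the validate function in Design/rms reports for a fit made with x=T, y=T; a minimal sketch follows (the number of bootstrap replications is an arbitrary choice).

# Sketch: bootstrap validation of the Titanic model fit above.
# Reports training (apparent) and bias-corrected indexes, including Dxy, R2,
# Intercept, and Slope. B = 200 resamples is an arbitrary choice.
set.seed(2)
validate(fitfare, method = "boot", B = 200)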