Regression Models for Quantitative (Numeric) and Qualitative (Categorical) Predictors KNNL – Chapter 8 Polynomial Regression Models • Useful in 2 Settings: – True relation between response and predictor is polynomial – True relation is complex nonlinear function that can be approximated by polynomial in specific range of X-levels • Models with 1 Predictor: Including p polynomial terms in model, creates p-1 “bends” – 2nd order Model: E{Y} = b0 + b1x + b2x2 (x = centered X) – 3rd order Model: E{Y} = b0 + b1x + b2x2 + b3x3 • Response Surfaces with 2 (or more) predictors – 2nd order model with 2 Predictors: E Y b0 b1 x1 b2 x2 b11 x12 b22 x22 b12 x1 x2 x1 X1 X 1 x2 X 2 X 2 Modeling Strategies • Use Extra Sums of Squares and General Linear Tests to compare models of increasing complexity (higher order) • Use coding in fitting models (centered/scaled) predictors to reduce multicollinearity when conducting testing. • Fit models in original units, or back-transform for plotting on original scale* (see below for quadratic) • For Response Surfaces, include multiple replicates at center point for goodness-of-fit tests ^ Centered variables: Y b0 b1 x b11 x 2 b0 b1 X X b11 X X 2 b0 b1 X b1 X b11 X 2 2b11 X X b11 X b0 b1 X b11 X 2 2 2 b1 2b11 X X b11 X bo' b1' X b2' X 2 Regression Models with Interaction Term(s) • Interaction Effect (Slope) of one predictor variable depends on the level other predictor variable(s) • Formulated by including cross-product term(s) among predictor variables • 2 Variable Models: E{Y} = b0 + b1X1 + b2X2 + b3X1X2 X 2 0 E Y b0 b1 X 1 b 2 0 b3 X 1 (0) b 0 b1 X 1 X 2 10 E Y b0 b1 X 1 b 2 10 b3 X 1 (10) b 0 b1 X 1 10b 2 10b3 X 1 b0 10b 2 b1 10b3 X 1 Testing Hypothesis of no interaction: H 0 : b3 0 H A : b3 0 Qualitative Predictors • Often, we wish to include categorical variables as predictors (e.g. gender, region of country, …) • Trick: Create dummy (indicator) variable(s) to represent effects of levels of the categorical variables on response • Problem: If variable has c categories, and we create c dummy variables, the model is not full rank when we include intercept • Solution: Create c – 1 dummy variables, leaving one level as the control/baseline/reference category Example – Salary vs Experience by Region State has 3 regions, sample 2 people per region, Y =salary, X 1 experience 1 if Region 1 X2 0 otherwise 1 1 1 X 1 1 1 X 11 X 21 X 31 X 41 X 51 X 61 1 1 0 0 0 0 0 0 1 1 0 0 1 if Region 2 X3 0 otherwise 0 0 0 0 1 1 1 if Region 3 X4 0 otherwise 6 6 X i1 X'X i 1 2 2 2 6 X i1 2 2 X 11 X 21 X 31 X 41 2 0 0 0 2 0 i 1 6 X i 1 2 i1 X 11 X 21 X 31 X 41 X 51 X 61 X 51 X 61 0 0 2 2 Solution, just use the Region 1 dummy (X2) and the region 2 dummy (X3), making Region 3 the “reference” region (Note: it is arbitrary which region is the reference) Example – Salary vs Experience by Region 3 regions, Y =salary, X 1 experience 1 if Region 1 X2 0 otherwise 1 if Region 2 X3 0 otherwise E Y b 0 b1 X 1 b 2 X 2 b3 X 3 Region 1: X 2 1, X 3 0 E Y b 0 b1 X 1 b 2 (1) b3 (0) b 0 b 2 b1 X 1 Region 2: X 2 0, X 3 1 E Y b 0 b1 X 1 b 2 (0) b3 (1) b 0 b3 b1 X 1 Region 3: X 2 0, X 3 0 E Y b 0 b1 X 1 b 2 (0) b3 (0) b 0 b1 X 1 b 2 Difference between Regions 1 and 3, controlling for experience b3 Difference between Regions 2 and 3, controlling for experience b 2 b3 Difference between Regions 1 and 2, controlling for experience b 2 b3 0 No differences among Regions 1,2,3 wrt Salary, Controlling for Experience Interactions Between Qualitative and Quantitative Predictors • We can allow the slope wrt to a Quantitative Predictor to differ across levels of the Categorical Predictor • Trick: Create cross-product terms between Quantitative Predictor and each of the c-1 dummy variables • Can conduct General Linear Test to determine whether slopes differ (or t-test when qualitative predictor has c=2 levels) • These models generalize to any number of quantitative and qualitative predictors Salary (Y ), Expediture (X 1 ), and regions (X 2 , X 3 ): Additive Model: E Y b 0 b1 X 1 b 2 X 2 b3 X 3 Interaction Model: E Y b 0 b1 X 1 b 2 X 2 b3 X 3 b 4 X 1 X 2 b5 X 1 X 3 Region 1: X 2 1, X 3 0 : E Y b 0 b1 X 1 b 2 (1) b3 (0) b 4 X 1 (1) b5 X 1 (0) b 0 b 2 b1 b 4 X 1 Region 2: X 2 0, X 3 1 : E Y b 0 b3 b1 b5 X 1 Region 3: X 2 0, X 3 0 : E Y b 0 b1 X 1