Chapter 8 Course Notes

advertisement
Regression Models for Quantitative
(Numeric) and Qualitative
(Categorical) Predictors
KNNL – Chapter 8
Polynomial Regression Models
• Useful in 2 Settings:
– True relation between response and predictor is
polynomial
– True relation is complex nonlinear function that can be
approximated by polynomial in specific range of X-levels
• Models with 1 Predictor: Including p polynomial
terms in model, creates p-1 “bends”
– 2nd order Model: E{Y} = b0 + b1x + b2x2
(x = centered X)
– 3rd order Model: E{Y} = b0 + b1x + b2x2 + b3x3
• Response Surfaces with 2 (or more) predictors
– 2nd order model with 2 Predictors:
E Y   b0  b1 x1  b2 x2  b11 x12  b22 x22  b12 x1 x2
x1  X1  X 1 x2  X 2  X 2
Modeling Strategies
• Use Extra Sums of Squares and General Linear Tests to
compare models of increasing complexity (higher order)
• Use coding in fitting models (centered/scaled)
predictors to reduce multicollinearity when conducting
testing.
• Fit models in original units, or back-transform for
plotting on original scale* (see below for quadratic)
• For Response Surfaces, include multiple replicates at
center point for goodness-of-fit tests
^



Centered variables: Y  b0  b1 x  b11 x 2  b0  b1 X  X  b11 X  X
2

 b0  b1 X  b1 X  b11 X 2  2b11 X X  b11 X  b0  b1 X  b11 X
2



2

2
 b1  2b11 X X  b11 X  bo'  b1' X  b2' X 2
Regression Models with Interaction Term(s)
• Interaction  Effect (Slope) of one predictor variable
depends on the level other predictor variable(s)
• Formulated by including cross-product term(s) among
predictor variables
• 2 Variable Models: E{Y} = b0 + b1X1 + b2X2 + b3X1X2
X 2  0  E Y   b0  b1 X 1  b 2  0   b3  X 1 (0)   b 0  b1 X 1
X 2  10  E Y   b0  b1 X 1  b 2 10   b3  X 1 (10)   b 0  b1 X 1  10b 2  10b3 X 1 
  b0  10b 2    b1  10b3  X 1
Testing Hypothesis of no interaction: H 0 : b3  0 H A : b3  0
Qualitative Predictors
• Often, we wish to include categorical variables as
predictors (e.g. gender, region of country, …)
• Trick: Create dummy (indicator) variable(s) to
represent effects of levels of the categorical variables
on response
• Problem: If variable has c categories, and we create c
dummy variables, the model is not full rank when we
include intercept
• Solution: Create c – 1 dummy variables, leaving one
level as the control/baseline/reference category
Example – Salary vs Experience by Region
State has 3 regions, sample 2 people per region, Y =salary, X 1  experience
1 if Region 1
X2  
 0 otherwise
1
1

1
X
1
1

1
X 11
X 21
X 31
X 41
X 51
X 61
1
1
0
0
0
0
0
0
1
1
0
0
1 if Region 2
X3  
 0 otherwise
0
0 
0

0
1

1 
1 if Region 3
X4  
 0 otherwise

 6

 6
  X i1
X'X   i 1
 2

 2
 2
6
 X i1
2
2
X 11  X 21
X 31  X 41
2
0
0
0
2
0
i 1
6
X
i 1
2
i1
X 11  X 21
X 31  X 41
X 51  X 61




X 51  X 61 


0

0


2
2
Solution, just use the Region 1 dummy (X2) and the region 2 dummy (X3), making
Region 3 the “reference” region (Note: it is arbitrary which region is the reference)
Example – Salary vs Experience by Region
3 regions, Y =salary, X 1  experience
1 if Region 1
X2  
 0 otherwise
1 if Region 2
X3  
 0 otherwise
E Y   b 0  b1 X 1  b 2 X 2  b3 X 3
Region 1: X 2  1, X 3  0  E Y   b 0  b1 X 1  b 2 (1)  b3 (0)   b 0  b 2   b1 X 1
Region 2: X 2  0, X 3  1  E Y   b 0  b1 X 1  b 2 (0)  b3 (1)   b 0  b3   b1 X 1
Region 3: X 2  0, X 3  0  E Y   b 0  b1 X 1  b 2 (0)  b3 (0)  b 0  b1 X 1
b 2  Difference between Regions 1 and 3, controlling for experience
b3  Difference between Regions 2 and 3, controlling for experience
b 2  b3  Difference between Regions 1 and 2, controlling for experience
b 2  b3  0  No differences among Regions 1,2,3 wrt Salary, Controlling for Experience
Interactions Between Qualitative and Quantitative Predictors
• We can allow the slope wrt to a Quantitative Predictor to
differ across levels of the Categorical Predictor
• Trick: Create cross-product terms between Quantitative
Predictor and each of the c-1 dummy variables
• Can conduct General Linear Test to determine whether
slopes differ (or t-test when qualitative predictor has c=2
levels)
• These models generalize to any number of quantitative
and qualitative predictors
Salary (Y ), Expediture (X 1 ), and regions (X 2 , X 3 ):
Additive Model: E Y   b 0  b1 X 1  b 2 X 2  b3 X 3
Interaction Model: E Y   b 0  b1 X 1  b 2 X 2  b3 X 3  b 4 X 1 X 2  b5 X 1 X 3
Region 1: X 2  1, X 3  0  : E Y   b 0  b1 X 1  b 2 (1)  b3 (0)  b 4 X 1 (1)  b5 X 1 (0)   b 0  b 2    b1  b 4  X 1
Region 2: X 2  0, X 3  1 : E Y    b 0  b3    b1  b5  X 1
Region 3: X 2  0, X 3  0  : E Y   b 0  b1 X 1
Download