Topic 17: Interaction Models

Interaction Models
• With several explanatory variables, we
need to consider the possibility that the
effect of one variable depends on the
value of another variable
• Special cases
– One binary variable (Y/N) and one
continuous variable
– Two continuous variables
One binary variable and
one continuous variable
• X1 takes values 0 and 1 corresponding to
two different groups
• X2 is a continuous variable
• Model: Y = β0 + β1X1 + β2X2 + β3X1X2 + e
• When X1 = 0 : Y = β0 + β2X2 + e
• When X1 = 1 : Y = (β0 + β1)+ (β2 + β3) X2 + e
One binary and one
continuous
• β0 is the intercept for the X1 = 0 group
• β0 + β1 is the intercept for the X1 = 1 group
• Similarly for slopes: β2 is the slope for the X1 = 0 group and β2 + β3 is the slope for the X1 = 1 group
• H0: β1 = β3 = 0 tests the hypothesis that the
regression lines are the same
• H0: β1 = 0 tests equal intercepts
• H0: β3 = 0 tests equal slopes
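As a short worked restatement of the model above (same symbols, nothing new assumed beyond E(e) = 0), the two group mean functions and their difference can be written out. The difference shows why β1 and β3 together describe how the groups differ:

```latex
\begin{aligned}
E(Y \mid X_1 = 0) &= \beta_0 + \beta_2 X_2 \\
E(Y \mid X_1 = 1) &= (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X_2 \\
E(Y \mid X_1 = 1) - E(Y \mid X_1 = 0) &= \beta_1 + \beta_3 X_2
\end{aligned}
```

If β3 = 0 the gap between the groups is the constant β1 (parallel lines); if β1 = β3 = 0 the gap is zero at every X2 (one common line).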
KNNL Example p316
• Y is number of months for an insurance
company to adopt an innovation
• X1 is the size of the firm (a continuous
variable)
• X2 is the type of firm (a qualitative or
categorical variable)
The question
• X2 takes the value 0 if it is a mutual
firm and 1 if it is a stock firm
• We ask whether stock firms adopt the
innovation more slowly or more quickly
than mutual firms
• We ask the question across all firms,
regardless of size
Plot the data
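* Plotting symbols: M for stock=0 and S for stock=1, with a smoothed curve (i=sm70) per group;
symbol1 v=M i=sm70 c=black l=1;
symbol2 v=S i=sm70 c=black l=3;
proc sort data=a1;
by stock size;
run;
proc gplot data=a1;
plot months*size=stock;
run;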
Two symbols on plot
[Figure: scatterplot of months (vertical axis, 0 to 40) against size (horizontal axis, 0 to 400), with a smoothed curve for each group. Legend: stock = 0 plotted as M, stock = 1 plotted as S.]
Interaction effects
• Interaction expresses the idea that
the effect of one explanatory variable
on the response depends on another
explanatory variable
• In the KNNL example, this would
mean that the slope of the line
depends on the type of firm
Are both lines the same?
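• From the scatterplot, the intercepts look
different, but we can use the test
statement for a formal assessment
* Add the size-by-stock product term to the data set;
data a1; set a1;
sizestock=size*stock;
run;
* Fit the interaction model and test H0: the stock and sizestock coefficients are both zero;
proc reg data=a1;
model months=size stock sizestock;
test stock, sizestock;
run;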
Output
Test 1 Results for Dependent Variable months

Source         DF    Mean Square    F Value    Pr > F
Numerator       2      158.12584      14.34    0.0003
Denominator    16       11.02381

Reject H0. There is a difference
in the linear relationship
across groups
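The F value in this output can be checked by hand. The sketch below assumes the usual general linear test form, where the full model (F) contains size, stock, and sizestock and the reduced model (R) contains size only; the SSE and df symbols are notation introduced here, while the numbers are taken from the table above:

```latex
F^{*} = \frac{\bigl(SSE(R) - SSE(F)\bigr) / (df_R - df_F)}{SSE(F) / df_F}
      = \frac{158.12584}{11.02381}
      \approx 14.34,
\qquad \text{with } (2,\ 16) \text{ degrees of freedom}
```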
Output
• How are they different?

Parameter Estimates

                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|
Intercept     1      33.83837       2.44065      13.86      <.0001
size          1      -0.10153       0.01305      -7.78      <.0001
stock         1       8.13125       3.65405       2.23      0.0408
sizestock     1   -0.00041714       0.01833      -0.02      0.9821
1. No evidence that the slopes differ, allowing for
different intercepts (P = 0.9821)
2. Some evidence that the intercepts differ, allowing
for different slopes (P = 0.0408)
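To see what these estimates imply, the two fitted lines from the interaction model can be written out. This is just arithmetic on the parameter estimates reported above, with stock = 0 for mutual firms and stock = 1 for stock firms:

```latex
\begin{aligned}
\text{Mutual (stock = 0):}\quad \widehat{\text{months}} &= 33.83837 - 0.10153\,\text{size} \\
\text{Stock (stock = 1):}\quad \widehat{\text{months}} &= (33.83837 + 8.13125) + (-0.10153 - 0.00041714)\,\text{size} \\
 &= 41.96962 - 0.10195\,\text{size}
\end{aligned}
```

The slopes are nearly identical, while the intercepts differ by about 8 months.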
Two parallel lines?
proc reg data=a1;
model months=size stock;
run;
Output
Analysis of Variance

                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model               2   1504.4133     752.2066      72.50    <.0001
Error              17   176.38667     10.37569
Corrected Total    19   1680.8000

Root MSE           3.22113    R-Square    0.8951
Dependent Mean    19.40000    Adj R-Sq    0.8827
Coeff Var         16.60377
Output
Parameter Estimates

                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1    33.87407     1.81386      18.68      <.0001
size          1    -0.10174     0.00889     -11.44      <.0001
stock         1     8.05547     1.45911       5.52      <.0001
Intercept for stock firms is
33.87 + 8.06 = 41.93
Common slope is -0.10
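Written as equations, the parallel-lines fit gives the following two fitted lines. The coefficients come straight from the parameter estimates for the model with size and stock only:

```latex
\begin{aligned}
\text{Mutual (stock = 0):}\quad \widehat{\text{months}} &= 33.87407 - 0.10174\,\text{size} \\
\text{Stock (stock = 1):}\quad \widehat{\text{months}} &= (33.87407 + 8.05547) - 0.10174\,\text{size}
  = 41.92954 - 0.10174\,\text{size}
\end{aligned}
```

Under this model, a stock firm is estimated to take about 8.06 months longer than a mutual firm of the same size, whatever that size is.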
Plot the two fitted lines
symbol1 v=M i=rl c=black l=1;
symbol2 v=S i=rl c=black l=3;
proc gplot data=a1;
plot months*size=stock;
run;
The plot
[Figure: scatterplot of months (vertical axis, 0 to 40) against size (horizontal axis, 0 to 400), with a fitted regression line for each group. Legend: stock = 0 plotted as M, stock = 1 plotted as S.]
Two continuous variables
• Y = β0 + β1X1 + β2X2 + β3X1X2 + e
• Can be rewritten as follows
• Y = β0 + (β1 + β3X2)X1 + β2X2 + e
• Y = β0 + β1X1 + (β2 + β3X1)X2 + e
• The coefficient of one explanatory
variable depends on the value of the
other explanatory variable
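One way to make the last point concrete is to look at how the mean response changes with each variable. The derivatives below follow directly from the model above, assuming only that E(e) = 0:

```latex
\begin{aligned}
E(Y) &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \\
\frac{\partial E(Y)}{\partial X_1} &= \beta_1 + \beta_3 X_2,
\qquad
\frac{\partial E(Y)}{\partial X_2} = \beta_2 + \beta_3 X_1
\end{aligned}
```

So β1 is the effect of X1 only when X2 = 0, and β2 is the effect of X2 only when X1 = 0. When β3 = 0 the two effects no longer depend on each other.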
Last slide
• We went over KNNL 8.2 – 8.7
• We used the program Topic17.sas to
generate the output for today