Topic 17: Interaction Models

Interaction Models
• With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable
• Special cases
  – One binary variable (Y/N) and one continuous variable
  – Two continuous variables

One binary variable and one continuous variable
• X1 takes the values 0 and 1, corresponding to two different groups
• X2 is a continuous variable
• Model: Y = β0 + β1X1 + β2X2 + β3X1X2 + e
• When X1 = 0: Y = β0 + β2X2 + e
• When X1 = 1: Y = (β0 + β1) + (β2 + β3)X2 + e

One binary and one continuous
• β0 is the intercept for Group 1 (X1 = 0)
• β0 + β1 is the intercept for Group 2 (X1 = 1)
• Similarly for slopes: β2 is the slope for Group 1 and β2 + β3 is the slope for Group 2
• H0: β1 = β3 = 0 tests the hypothesis that the regression lines are the same
• H0: β1 = 0 tests equal intercepts
• H0: β3 = 0 tests equal slopes

KNNL Example p316
• Y is the number of months for an insurance company to adopt an innovation
• X1 is the size of the firm (a continuous variable)
• X2 is the type of firm (a qualitative or categorical variable)
• Note that here the continuous variable is X1 and the binary variable is X2, the reverse of the labeling on the previous slides

The question
• X2 takes the value 0 for a mutual firm and 1 for a stock firm
• We ask whether stock firms adopt the innovation more slowly or more quickly than mutual firms
• We ask the question across all firms, regardless of size

Plot the data
    * M = mutual firms (stock=0), S = stock firms (stock=1);
    * i=sm70 draws a smoothed curve through each group;
    symbol1 v=M i=sm70 c=black l=1;
    symbol2 v=S i=sm70 c=black l=3;
    proc sort data=a1; by stock size;
    proc gplot data=a1;
      plot months*size=stock;
    run;

Two symbols on plot
[Figure: scatterplot of months (vertical axis, 0 to 40) against size (horizontal axis, 0 to 400). Mutual firms (stock = 0) are plotted as M and stock firms (stock = 1) as S, each group with its own smoothed curve.]

Interaction effects
• Interaction expresses the idea that the effect of one explanatory variable on the response depends on another explanatory variable
• In the KNNL example, this would mean that the slope of the line depends on the type of firm

Are both lines the same?
• From the scatterplot, the intercepts look different, but the test statement in proc reg gives a formal assessment

    * create the product (interaction) variable;
    data a1; set a1;
      sizestock=size*stock;
    proc reg data=a1;
      model months=size stock sizestock;
      * joint test that the stock and sizestock coefficients are both 0;
      test stock, sizestock;
    run;

Output
Test 1 Results for Dependent Variable months

    Source         DF    Mean Square    F Value    Pr > F
    Numerator       2      158.12584      14.34    0.0003
    Denominator    16       11.02381

Reject H0. There is a difference in the linear relationship across groups.

Output
• How are they different?

Parameter Estimates

    Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept     1              33.83837           2.44065      13.86      <.0001
    size          1              -0.10153           0.01305      -7.78      <.0001
    stock         1               8.13125           3.65405       2.23      0.0408
    sizestock     1           -0.00041714           0.01833      -0.02      0.9821

1. No evidence of a difference in slopes, allowing for different intercepts (P = 0.9821)
2. Potentially different intercepts, allowing for different slopes (P = 0.0408)

Two parallel lines?
    * drop the interaction term: common slope, separate intercepts;
    proc reg data=a1;
      model months=size stock;
    run;

Output
Analysis of Variance

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               2         1504.4133       752.2066      72.50    <.0001
    Error              17         176.38667       10.37569
    Corrected Total    19         1680.8000

    Root MSE           3.22113    R-Square    0.8951
    Dependent Mean    19.40000    Adj R-Sq    0.8827
    Coeff Var         16.60377

Output
Parameter Estimates

    Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept     1              33.87407           1.81386      18.68      <.0001
    size          1              -0.10174           0.00889     -11.44      <.0001
    stock         1               8.05547           1.45911       5.52      <.0001

• Intercept for mutual firms is 33.87
• Intercept for stock firms is 33.87 + 8.06 = 41.93
• Common slope is -0.10

Plot the two fitted lines
    * i=rl draws a separate least-squares line for each group;
    symbol1 v=M i=rl c=black l=1;
    symbol2 v=S i=rl c=black l=3;
    proc gplot data=a1;
      plot months*size=stock;
    run;

The plot
[Figure: scatterplot of months (vertical axis, 0 to 40) against size (horizontal axis, 0 to 400), with M for mutual firms (stock = 0) and S for stock firms (stock = 1), and a fitted regression line for each group.]

Two continuous variables
• Y = β0 + β1X1 + β2X2 + β3X1X2 + e
• Can be rewritten as follows
• Y = β0 + (β1 + β3X2)X1 + β2X2 + e
• Y = β0 + β1X1 + (β2 + β3X1)X2 + e
• The coefficient of one explanatory variable depends on the value of the other explanatory variable

Last slide
• We went over KNNL 8.2 to 8.7
• We used the program Topic17.sas to generate the output for today
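
Supplementary sketch: all three tests in one run
The slides list three hypotheses for the two-line model (same lines, equal intercepts, equal slopes), but the program shown runs only the joint test. The following is a minimal sketch of how all three might be requested together. It assumes the data set a1 already contains months, size, stock and the product variable sizestock created earlier. The labels SameLines, SameInt and SameSlope are hypothetical names chosen here for illustration and are not part of Topic17.sas.

```sas
/* Sketch only: assumes a1 holds months, size, stock (0/1),        */
/* and sizestock = size*stock, as created earlier in these notes.  */
proc reg data=a1;
  model months = size stock sizestock;
  /* H0: both group-difference coefficients are 0 (same line) */
  SameLines: test stock = 0, sizestock = 0;
  /* H0: no intercept difference between firm types */
  SameInt:   test stock = 0;
  /* H0: no slope difference between firm types */
  SameSlope: test sizestock = 0;
run;
quit;
```

Each single-coefficient test has 1 numerator degree of freedom, and its F statistic is the square of the corresponding t statistic in the Parameter Estimates table. The P-values for SameInt and SameSlope should therefore agree with the 0.0408 and 0.9821 reported for stock and sizestock.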