Thursday, October 7: Interactions and Ordered Categorical Variables

I. Interpreting Interaction Effects (also see Greene, pp. 123-124)

A. The Concept of an Interaction. Remember that interactions measure how much changes in one variable affect the effect of another variable. Put another way, they reveal whether or not the effect of a variable changes in different contexts. For instance, increases in ethnolinguistic fractionalization could lead to significant declines in political stability under authoritarian regimes, but lead to increases in the stability of democracies. If this is true, there will be a strong interaction between type of political system and ethnolinguistic fractionalization.

B. Measuring an Interaction Effect. To test the hypothesis that variable x1 influences the effect of variable x2 (and vice versa), your model should include an interaction term that is the product of these two variables. If the effect of one variable is altered by the other, the coefficient on this interaction term (x1x2) will be significant. In ordinary least squares regression, evaluating the effect of x1 (ELF fractionalization in this example) at any particular level of x2 (type of political system) is straightforward: you add the coefficient on x1 to the product of the coefficient on x1x2 and the particular value of x2 in which you are interested.

For the model:

y = β0 + β1x1 + β2x2 + β3x1x2 + e

we can get a predicted value by:

yhat = β0 + β1x1 + β2x2 + β3x1x2

And we can obtain the first difference for x1, the change in yhat brought by a change in x1 from x1low to x1high while all other factors are held constant, by:

Δy = (β0 + β1x1high + β2x2 + β3x1highx2) - (β0 + β1x1low + β2x2 + β3x1lowx2)
Δy = β1x1high - β1x1low + β3x1highx2 - β3x1lowx2
Δy = β1(x1high - x1low) + β3x2(x1high - x1low)

For a one-unit change, where x1high - x1low = 1, this reduces to:

Δy = β1 + β3x2

C. Interaction Effects in Maximum Likelihood Models. If only life were so simple in the ML world.
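Before leaving OLS, the algebra above can be checked numerically. A small Python sketch with made-up coefficients (every value here is hypothetical, chosen only to illustrate the reduction):

```python
# Made-up coefficients for y = b0 + b1*x1 + b2*x2 + b3*x1*x2 (hypothetical).
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.2

def yhat(x1, x2):
    """Predicted value from the linear model with an interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# First difference for a one-unit shift in x1, holding x2 fixed at 4:
x2 = 4.0
fd = yhat(3.0, x2) - yhat(2.0, x2)

# Per the derivation, this equals b1 + b3*x2 no matter what b0, b2,
# or the starting level of x1 are.
print(fd, b1 + b3 * x2)
```

Changing b0, b2, or the starting level of x1 leaves the first difference untouched, which is exactly why those terms drop out of the derivation.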
In OLS, this expression reduces so nicely because the predicted value yhat for a given observation is just a linear function of the coefficients and explanatory variables. The contribution of β2x2 to yhat does not vary across observations, so it can drop out of the expression. But in ML models with more complex systematic components, β2x2 and all of the other terms will be put through a transformation such as the logit transformation. In a model with an interaction term, the effect of x2 will depend both upon x1 and upon the levels of all the other variables in the equation (which determine where the observation sits on the logit curve when the shift in x1 begins). Just as we did when we were finding simple first differences, we will have to pick some value at which to hold all the other variables constant.

One pitfall to avoid: when you tell Clarify to setx mean, it will set the interaction term constant at the mean of the interaction term (the mean of the product of x1 and x2). This is not very substantively meaningful, since the interaction term isn't a real variable. Instead of setting it constant at the mean of the product of the two variables, we should set it constant at the product of the means of x1 and x2. Then, to find the effect of a one-unit shift in x1 at some particular value of x2, we simulate the first difference brought by both a one-unit change in x1 (analogous to β1) and the change in the interaction term that results from a one-unit change in x1 (analogous to β3x2).

D. Syntax for Interactions in Clarify. Suppose I want to predict whether Senate committees in the 50 states are required to report legislation that they are assigned. I begin with a model that attempts to explain the presence of a reporting rule (senrepor=1) by using the three components of legislative professionalism (the house's session length, staffing levels, and salaries), along with a measure of the "moralism" of the state's political culture.
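The setx mean pitfall is easy to see numerically: the mean of the product and the product of the means differ by exactly the sample covariance of x1 and x2, so they diverge whenever the two variables are correlated. A Python sketch with simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): x2 is a 0/1 indicator, and x1 tends to be
# higher when x2 = 1, so the two variables are correlated.
x2 = rng.integers(0, 2, size=500)
x1 = rng.normal(loc=1.0 + 2.0 * x2, scale=1.0)

interaction = x1 * x2

# What "setx mean" would hold the interaction term at:
mean_of_product = interaction.mean()

# What we actually want to hold it at:
product_of_means = x1.mean() * x2.mean()

# The two differ by exactly the sample covariance of x1 and x2,
# since mean(x1*x2) = mean(x1)*mean(x2) + cov(x1, x2).
print(mean_of_product, product_of_means)
```

With uncorrelated x1 and x2 the gap shrinks toward zero, which is why the pitfall is easy to miss in some datasets and serious in others.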
I run the following logit model, using Clarify.

. estsimp logit senrepor totalday salary staffup moral

Iteration 0:  log likelihood = -26.345398
Iteration 1:  log likelihood = -20.849193
Iteration 2:  log likelihood = -20.011675
Iteration 3:  log likelihood = -19.917045
Iteration 4:  log likelihood = -19.914806
Iteration 5:  log likelihood = -19.914805

Logit estimates                                   Number of obs   =         50
                                                  LR chi2(4)      =      12.86
                                                  Prob > chi2     =     0.0120
Log likelihood = -19.914805                       Pseudo R2       =     0.2441

------------------------------------------------------------------------------
    senrepor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    totalday |  -.0141113   .0099893    -1.41   0.158    -.0336899    .0054674
      salary |   .0000921   .0000477     1.93   0.053    -1.33e-06    .0001856
     staffup |  -2.492502   1.392717    -1.79   0.074    -5.222177    .2371723
       moral |   2.617587   1.120754     2.34   0.020     .4209504    4.814224
       _cons |  -2.046383   .9856179    -2.08   0.038    -3.978159   -.1146075
------------------------------------------------------------------------------
Simulating main parameters.  Please wait....
 % of simulations completed:  20%  40%  60%  80%  100%
Number of simulations : 1000
Names of new variables : b1 b2 b3 b4 b5

Now I can look at the effect of increasing the salary level from Maine's value ($12,900) to California's ($99,250), holding other variables constant at their means:

. setx mean
. simqi fd(pr) changex(salary 12900 99250)

First Difference: salary 12900 99250
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(senrepor = 0)  |  -.842876     .2485586     -.9939116   -.0548535
        dPr(senrepor = 1)  |   .842876     .2485586      .0548535    .9939116

This tells me that moving from Maine's salary to California's salary, all other factors being equal, increases the chances of requiring Senate committees to report bills by 0.84, with a confidence interval of (0.05, 0.99).
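What estsimp and simqi are doing under the hood can be sketched by hand: draw coefficient vectors from a multivariate normal centered on the estimates, then take the first difference of predicted probabilities across the draws. A rough Python illustration (the x values are invented "mean" settings, and I build a diagonal covariance from the reported standard errors as a simplification; Clarify draws from the full variance-covariance matrix of the estimates):

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic(z):
    """Inverse logit: Pr(y = 1) given the linear predictor z."""
    return 1.0 / (1.0 + np.exp(-z))

# Point estimates from the logit above: totalday, salary, staffup, moral, _cons.
beta_hat = np.array([-0.0141113, 0.0000921, -2.492502, 2.617587, -2.046383])

# Diagonal covariance from the reported std. errors -- a simplification;
# the real procedure uses the full variance-covariance matrix.
vcov = np.diag(np.array([0.0099893, 0.0000477, 1.392717, 1.120754, 0.9856179]) ** 2)

# estsimp: simulate 1000 coefficient vectors.
sims = rng.multivariate_normal(beta_hat, vcov, size=1000)

# setx: hold the other variables at illustrative values (invented here).
x_low  = np.array([60.0, 12900.0, 1.0, 0.5, 1.0])  # salary at Maine's $12,900
x_high = np.array([60.0, 99250.0, 1.0, 0.5, 1.0])  # salary at California's $99,250

# simqi fd(pr): first difference in Pr(senrepor = 1), draw by draw.
fd = logistic(sims @ x_high) - logistic(sims @ x_low)
print(fd.mean(), np.percentile(fd, [2.5, 97.5]))
```

The mean of fd plays the role of simqi's point estimate, and the 2.5th and 97.5th percentiles of the draws give the simulated confidence interval.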
Now suppose I want to test whether the effect of salary is different in moralistic and nonmoralistic states, and estimate those two context-bound effects. I create the interaction variable salary_moral (the product of the two, using the Stata command gen salary_moral=salary*moral) and estimate another logit model with this interaction term included. But first, I have to get rid of the simulated parameters from the last model.

. drop b*
. estsimp logit senrepor totalday salary staffup moral salary_moral

Logit estimates                                   Number of obs   =         50
                                                  LR chi2(5)      =      14.40
                                                  Prob > chi2     =     0.0132
Log likelihood = -19.144086                       Pseudo R2       =     0.2733

------------------------------------------------------------------------------
    senrepor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    totalday |  -.0093382   .0097495    -0.96   0.338    -.0284469    .0097705
      salary |   .0001326   .0000596     2.22   0.026     .0000157    .0002495
     staffup |  -2.422675   1.377301    -1.76   0.079    -5.122136    .2767861
       moral |   4.598178   2.281156     2.02   0.044     .1271951    9.069161
salary_moral |  -.0000673   .0000575    -1.17   0.242    -.0001801    .0000454
       _cons |  -4.184283   2.295119    -1.82   0.068    -8.682634    .3140674
------------------------------------------------------------------------------
Simulating main parameters.  Please wait....
 % of simulations completed:  16%  33%  50%  66%  83%  100%
Number of simulations : 1000
Names of new variables : b1 b2 b3 b4 b5 b6

Although it is not statistically significant, this interaction term tells me that salary has a smaller effect on committee reporting requirements in states with moralistic political cultures. To get a sense of the scale of the "slope shift," I can look at first differences. Maybe I wasn't listening to my own lectures, and I set all of the variables at their means.
This sets the interaction term at the mean of the product of the two variables:

. setx mean
. simqi fd(pr) changex(salary 12900 99250)

First Difference: salary 12900 99250
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(senrepor = 0)  |  -.917242     .1888489     -.9980687   -.2803625
        dPr(senrepor = 1)  |   .917242     .1888489      .2803625    .9980687

But now I know that I need to hold the interaction term constant at the product of their means.

. sum salary if moral!=.

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |        50    25376.64    19829.32        200      99250

. sum moral

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       moral |        50          .5    .5050763          0          1

. setx mean
. setx salary_moral 25376*0.5

This would be the syntax I needed in order to correctly estimate the effect of some other variable, like staffing levels. But if I want to look at the effect of one of the interacted terms, like salary, I have to look at the first difference brought by both its own change and the change that it brings to the interaction term.

. simqi fd(pr) changex(salary 12900 99250 salary_moral 12900*.5 99250*.5)

First Difference: salary 12900 99250 salary_moral 12900*.5 99250*.5
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(senrepor = 0)  |  -.7968219    .2824022     -.9988686   -.0267144
        dPr(senrepor = 1)  |   .7968219    .2824022      .0267144    .9988686

This is the effect of salary when moralism is held constant at its mean of 0.5.

. setx moral 0
. simqi fd(pr) changex(salary 12900 99250 salary_moral 0 0)

First Difference: salary 12900 99250 salary_moral 0 0
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(senrepor = 0)  |  -.915604     .2083988     -.9998101   -.1976774
        dPr(senrepor = 1)  |   .915604     .2083988      .1976775    .9998101

This is the effect of salary in nonmoralistic states.

. setx moral 1
. simqi fd(pr) changex(salary 12900 99250 salary_moral 12900 99250)

First Difference: salary 12900 99250 salary_moral 12900 99250
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(senrepor = 0)  |  -.6433052    .3541569     -.9467836    .3529614
        dPr(senrepor = 1)  |   .6433052    .3541569     -.3529614    .9467836

This is the effect of salary in moralistic states.

II. Ordered Categorical Variables

A. When is an ordered probit model appropriate? When your measurement strategy results in an ordinal dependent variable that takes on three or more discrete values. This means that you can specify a clear ordering of the outcomes ("3" means more of the variable than "2," which means more than "1") but you may not be ready to assume that the distance between each of the categories is constant. Ordered probit is often used to analyze things like Likert scales on polls or subjective scales assigned by field experts. Since we don't assume a constant difference between each category, we will allow for the possibility that it takes a bigger change in an independent variable to get over the "threshold" into one category than it takes to get into the next category. An ordered probit model estimates both the effects of the independent variables (through the systematic component) and the thresholds of the dependent variable (through the stochastic component) at the same time.

B. Some Notation. We will use the stylized normal distribution, the special case of the Normal distribution where σ2 = 1.
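As a computational aside, the stylized normal needs nothing beyond the error function. A short Python sketch (the function names f_stn and F_stn are mine, mirroring the notation used in these notes; the μ and threshold values are invented):

```python
import math

# The stylized normal: a Normal distribution with variance fixed at 1.

def f_stn(y, mu):
    """Stylized normal pdf: density of y around mu with sigma^2 = 1."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def F_stn(y, mu):
    """Stylized normal cdf, Pr(Y <= y), computed via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / math.sqrt(2.0)))

# Probability that a latent variable centered at mu = 0.7 lands between
# two (invented) thresholds at 1.0 and 2.0:
prob = F_stn(2.0, 0.7) - F_stn(1.0, 0.7)
print(prob)
```

That difference of two cdf values is exactly the building block the ordered probit model is assembled from below.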
The probability density function of the distribution is denoted by fstn(yi|μi), and Fstn(yi|μi) denotes the cumulative distribution function, which tells us the probability of observing some value of y up to and including yi.

C. The Univariate PDF. The idea behind the ordered probit model is that even though your measure is ordinal, the latent concept that you are trying to measure is continuous. We assume that the random variable is distributed around μi according to the stylized normal distribution, but that we only observe which of m categories it fell into, with the category boundaries set by thresholds T0, T1, …, Tm. The latent continuous variable can be denoted as yi*. So that every possible value of yi* fits into a category, we assume T0 = -∞ and Tm = ∞. Thus, yi* is observed to fall into category j iff:

Tj-1 < yi* < Tj    for j = 1…m

Now our observed realization yji is a set of dichotomous variables that tell us whether or not observation i fell into each particular category j:

yji = 1 if Tj-1 < yi* < Tj
yji = 0 otherwise

For one particular category j and one observation i, we can derive a univariate pdf:

Pr(Yji = 1) = Pr(Tj-1 < yi* < Tj)
            = ∫ from Tj-1 to Tj of fstn(yi*|μi) dyi*
            = FN(Tj|μi,1) - FN(Tj-1|μi,1)
            = FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)

D. Building a Likelihood Function. All we've done so far is write out the univariate pdf for one category. To get the full univariate pdf for all categories (while still only looking at one case), we need the chances that our unobserved variable fell into each of the categories. We do that by taking the product of the above expression across all m categories. We raise the expression to the power yji so that it takes on a value of 1 for all of the categories that yi* doesn't fall into, but takes on the category-specific pdf for the category that it does fall into. Then, to turn this into a joint pdf across all observations, we take the product of that product across all n observations.
The likelihood and log-likelihood functions become:

L(T, β | y) = Π(i=1 to n) Π(j=1 to m) [FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)]^yji

ln L(T, β | y) = Σ(i=1 to n) Σ(j=1 to m) yji ln[FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)]

E. Estimating and Interpreting Results.

. estsimp oprobit delivery termlim termnext salary totalday staffper income pop edcoll urban govrace senhear init, table

Ordered probit estimates                          Number of obs   =         50
                                                  LR chi2(12)     =      23.45
                                                  Prob > chi2     =     0.0241
Log likelihood = -40.969525                       Pseudo R2       =     0.2225

------------------------------------------------------------------------------
    delivery |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     termlim |  -3.634462   1.941232    -1.87   0.061    -7.439206    .1702822
    termnext |   2.139961   1.021766     2.09   0.036     .1373353    4.142586
      salary |  -6.49e-06   .0000182    -0.36   0.721    -.0000421    .0000291
    totalday |   .0116516   .0041179     2.83   0.005     .0035808    .0197225
    staffper |  -.0323126   .1144711    -0.28   0.778    -.2566718    .1920467
      income |   -.000027    .000101    -0.27   0.789    -.0002249    .0001709
         pop |  -3.57e-07   .0000764    -0.00   0.996    -.0001501    .0001494
      edcoll |    .011924   .0627344     0.19   0.849    -.1110333    .1348812
       urban |   .0172823   .0173222     1.00   0.318    -.0166687    .0512333
     govrace |  -.0892512   .3308387    -0.27   0.787     -.737683    .5591806
     senhear |     1.3727   .5114623     2.68   0.007     .3702525    2.375148
        init |  -.8800094   .5167325    -1.70   0.089    -1.892787    .1327677
-------------+----------------------------------------------------------------
       _cut1 |   1.936859   1.699538          (Ancillary parameters)
       _cut2 |   2.642842   1.711786
------------------------------------------------------------------------------

    delivery |    Probability               Observed
-------------+--------------------------------------
           0 |    Pr( xb+u<_cut1)             0.4200
           1 |    Pr(_cut1<xb+u<_cut2)        0.2000
           2 |    Pr(_cut2<xb+u)              0.3800

. setx mean
. simqi fd(pr) changex(termlim 0 1)

First Difference: termlim 0 1
      Quantity of Interest |     Mean       Std. Err.    [95% Conf. Interval]
---------------------------+--------------------------------------------------
        dPr(delivery = 0)  |   .6302705    .2017743     -.034686     .841227
        dPr(delivery = 1)  |  -.2250201    .0921626     -.3875622   -.0081882
        dPr(delivery = 2)  |  -.4052504    .1566423     -.6185045    .0443655
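The univariate pdf and the likelihood above translate directly into code. A minimal Python sketch (not the Stata workflow used in these notes; the function names and the example xb values are mine, while the cutpoints are the _cut1 and _cut2 estimates reported above):

```python
import math

def F_stn(y, mu):
    """Stylized normal cdf (variance 1), via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / math.sqrt(2.0)))

def category_probs(xb, cuts):
    """Pr(y = j) for each category, given xb and the interior cutpoints.
    Pads the cutpoints with -inf and +inf so the probabilities sum to 1."""
    edges = [-math.inf] + list(cuts) + [math.inf]
    return [F_stn(edges[j + 1], xb) - F_stn(edges[j], xb)
            for j in range(len(edges) - 1)]

def log_likelihood(xb_list, y_list, cuts):
    """Ordered probit log likelihood: ln of the probability of each
    observation's realized category, summed across observations."""
    return sum(math.log(category_probs(xb, cuts)[y])
               for xb, y in zip(xb_list, y_list))

# The two interior cutpoints reported as _cut1 and _cut2 above;
# the xb value below is invented for illustration.
cuts = (1.936859, 2.642842)
p = category_probs(1.5, cuts)
print(p, sum(p))
```

Because the edges are padded with ±∞, the category probabilities always sum to one, and a first difference like the termlim example above is just category_probs at one setting of xb minus category_probs at another, so the three dPr values necessarily sum to zero.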