Gender Economics Courses in Liberal Arts Colleges: Comment

William H. Greene

Abstract

Burnett (1997) proposes a model of the joint determination of two binary choice variables, presence or absence of a gender economics course and presence or absence of a women’s studies program. The econometric techniques used in estimation of her model are not consistent with the model, and will not produce consistent estimates of the parameters of the model. This note reestimates her bivariate probit model using maximum likelihood procedures. We also present some related results on specification and estimation of a model in which two binary variables are jointly (simultaneously) determined and on computation of marginal effects in a bivariate probit model.

Keywords: bivariate probit, marginal effects, gender economics

JEL classification: C34, C51

In ‘Gender Economics Courses in Liberal Arts Colleges,’ Burnett (1997) proposes a model for explaining the presence of a gender economics course in a liberal arts college’s course offerings. The model is based on covariates such as the academic reputation of the college, whether the college has a religious affiliation, region of the country, and other variables relating to the number of women on the faculty of the college and in the economics department.

Two interesting aspects of her model are, first, the fact that the ‘response variable’ of interest, presence of a gender economics course, is a binary variable and, second, that one of the important covariates, presence of a women’s studies program, is likely to be jointly determined with the gender economics indicator, and is a binary variable as well. The first of these mandates use of an appropriate technique for binary choice modeling, such as a probit model. The second greatly complicates the analysis in that it calls for something resembling a simultaneous equations treatment.
The major difficulty is that this is a nonlinear model, for which familiar simultaneous equations techniques are inappropriate. The author formulates probit models of sorts for both binary choice variables, and fits them sequentially using an iterative least squares technique. In brief, an equation for the presence of the women’s studies program is fit, then results from this equation are used ‘downstream’ in an equation for the gender economics variable. The estimator is, therefore, a variant of two step (not ‘two stage’) least squares. Binary choice models are not generally estimated by least squares, linear or nonlinear. The method of choice in most studies is maximum likelihood.

The purpose of this note is to suggest an alternative model specification and estimator that is at once more appropriate for the theoretical specification presented in her article and, surprisingly enough, much simpler to implement. In addition, the discussion in Burnett’s article raises two interesting issues in binary choice modeling which, it seems, are not very widely known: first, an appropriate treatment for two simultaneously determined binary variables, and, second, the computation of marginal, or partial, effects in a bivariate probit model. The discussion below will also include some useful results on these subjects. Lastly, having reestimated the model in full, we revisit the empirical results. We find that our results are broadly consistent with Burnett’s, but, we would argue, a bit sharper.

THE MODEL

Burnett gives extensive discussion to motivate the equations of the model, so we will merely lay out the estimating equations here. The reader is referred to her paper for the background details.
Using our own notation (slightly different from the author’s), the model is as follows:

Equations:

GndrEcon = f1(AcRep, WomStud, EconFac, PctWEcn, Relig),
WomStud  = f2(AcRep, PctWfac, Relig, Sou, Mid, Nor, West).

Variable definitions:

GndrEcon = 1 if gender economics class is offered, 0 otherwise,
WomStud  = 1 if the college includes a women’s studies program, 0 otherwise,
AcRep    = academic reputation, lower means better reputation, 1 is best,
EconFac  = number of full time economics faculty,
PctWEcn  = percentage of female economics faculty,
PctWfac  = percentage of women on the faculty of the college,
Relig    = 1 if the college has a religious affiliation, 0 otherwise,
Sou      = regional dummy variable, South,
Mid      = regional dummy variable, Midwest,
Nor      = regional dummy variable, North,
West     = regional dummy variable, West.

(Burnett included an overall constant in her second equation, rather than the West dummy variable.)

ESTIMATION OF A RECURSIVE, SIMULTANEOUS BINARY CHOICE MODEL

Given the obvious relationship between women’s studies programs and gender economics courses, the preceding certainly appears to be a recursive simultaneous equations model. If the two dependent variables were continuous measures amenable to linear types of regression modeling, and if the omitted factors (determinants) in the two equations (i.e., the counterparts to regression disturbances) were uncorrelated, then the parameters of the two equation system could be estimated consistently, indeed, efficiently, by ordinary least squares. However, the first condition is violated by construction: these are binary choices. As we shall see, the evidence is at least suggestive that the second condition is not violated, but the failure of the first is sufficient to render the simplest techniques inappropriate. Moreover, in spite of the lack of correlation (to be discussed below), this remains a simultaneous equations model.

To develop the estimators, it is convenient to represent the model in simpler terms.
Let

$y_1$ = GndrEcon,
$x_1$ = all variables on the right hand side of the first equation except WomStud,
$y_2$ = WomStud,
$x_2$ = all variables on the right hand side of the second equation.

Burnett obtained parameter estimates for the two equations by formulating the following two equation model:

$$f_2(y_2) = \Phi(\alpha' x_2)^{y_2}\,[1 - \Phi(\alpha' x_2)]^{1 - y_2}$$
$$u_2 = y_2 - f_2(y_2)$$
$$f_1(y_1) = \Phi(\beta' x_1 + \gamma u_2)^{y_1}\,[1 - \Phi(\beta' x_1 + \gamma u_2)]^{1 - y_1},$$

where $\Phi(\cdot)$ is the distribution function of the standard normal distribution. Estimates of $[\alpha, \beta, \gamma]$ were obtained by using “an iterated, seemingly unrelated Marquardt-Levenberg technique” (Burnett, 1997, 372). Note the use of the ‘residual’ from the second equation to mitigate the simultaneity of the WomStud variable in the GndrEcon equation. There is an ambiguity in the text at this point; the author states that a “seemingly unrelated” estimator was used, which suggests joint estimation of the two equations, whereas later in the paragraph, she states that she “substitut[ed] the residuals from the second equation to form the 2SCE,” which suggests single equation, sequential estimation. Because the Rivers and Vuong (1988) study on which Burnett’s estimator is based proposed a two step approach, we assume the latter is the case.

The two step procedure represents an attempt to deal with the obvious simultaneity problem. However, the approach has several shortcomings. First, neither right hand side specification given above is a conditional mean function. Neglecting the possible correlation across the two equations, the conditional means in the two cases would be the respective $\Phi(\cdot)$ functions as they appear above. As such, the use of least squares with the functions shown above as the regression function is questionable. On the other hand, with the appropriate conditional mean function, one can consistently estimate the parameters of binary choice models using nonlinear least squares. But nonlinear least squares is inefficient. The preferred technique is maximum likelihood.
Indeed, the terms in the likelihood function (for a single equation approach, anyway) are precisely those given by Burnett above. As such, had she maximized the sum of the logs of the right hand sides as given above, rather than minimized the sum of squared deviations, she would (abstracting from $u_2$, which remains to be considered) have obtained maximum likelihood estimates of the model parameters.

There remains the question of using the residual, $u_2$, from the second equation. The two step procedure suggested by Rivers and Vuong (1988) is proposed for a probit model in which one or more of the right hand side variables are endogenous, but continuous. Thus, Burnett’s use of the residual from the second equation would be appropriate if WomStud were a continuous variable, but not a binary outcome. Finally, the Rivers and Vuong technique dictates conditional maximum likelihood estimation at the second step, not nonlinear least squares.

All of this suggests a fairly straightforward way to proceed, based essentially on Burnett’s two step approach. First, the initial estimation of the WomStud ($y_2$) equation can (should) be done with the familiar maximum likelihood probit analysis. At the second step, rather than insert a ‘residual’ in the GndrEcon equation in place of WomStud, simply insert the predicted values from the WomStud equation. These would be the sample estimates of $\Phi(\alpha' x_2)$ (i.e., the sample estimates of $E[y_2|x_2]$). The second step consists of maximum likelihood probit estimation of the $y_1$ equation, with right hand side variables $x_1$ and the sample estimates of $\Phi(\alpha' x_2)$ obtained earlier. The estimated asymptotic covariance matrix at this step must be corrected. This is a direct application of Murphy and Topel’s (1985) results for two step estimation. (An application that is close to this one, and which could be modified to match it exactly, appears in Greene (1997, 142-144).)
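The revised two step procedure just described can be sketched in a few lines. The following is only an illustration on simulated data: the data generating process, the variable names, and the small Newton routine `probit_mle` are assumptions introduced here, not part of Burnett’s or our computations, and the second step standard errors (which would need the Murphy and Topel correction) are not computed.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf is Phi, .pdf is phi


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def probit_mle(X, y, max_iter=100, tol=1e-10):
    """Maximum likelihood probit coefficients by Newton's method."""
    k = len(X[0])
    coef = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            index = sum(c * v for c, v in zip(coef, row))
            q = 2 * yi - 1
            # lam is the derivative of ln Phi(q * index) with respect to index
            lam = q * STD_NORMAL.pdf(index) / max(STD_NORMAL.cdf(q * index), 1e-300)
            weight = lam * (lam + index)  # minus the second derivative
            for r in range(k):
                grad[r] += lam * row[r]
                for c in range(k):
                    neg_hess[r][c] += weight * row[r] * row[c]
        step = solve(neg_hess, grad)
        coef = [c + s for c, s in zip(coef, step)]
        if max(abs(s) for s in step) < tol:
            break
    return coef


# Simulated data (hypothetical): w enters both equations, v only the y2
# equation, s only the y1 equation.
random.seed(12345)
x1_rows, x2_rows, y1, y2 = [], [], [], []
for _ in range(500):
    w, v, s = (random.gauss(0, 1) for _ in range(3))
    d2 = 1 if 0.2 + 1.0 * v - 0.5 * w + random.gauss(0, 1) > 0 else 0
    d1 = 1 if -0.5 + 0.8 * s - 0.4 * w + 1.0 * d2 + random.gauss(0, 1) > 0 else 0
    x2_rows.append([1.0, v, w])
    x1_rows.append([1.0, s, w])
    y2.append(d2)
    y1.append(d1)

# Step 1: probit for y2 by maximum likelihood, then predicted probabilities.
alpha_hat = probit_mle(x2_rows, y2)
p_hat = [STD_NORMAL.cdf(sum(a * v for a, v in zip(alpha_hat, row))) for row in x2_rows]

# Step 2: probit for y1 on x1 and the predicted probability of y2 = 1.
stage2_rows = [row + [p] for row, p in zip(x1_rows, p_hat)]
second_step = probit_mle(stage2_rows, y1)
print("first step:", alpha_hat)
print("second step:", second_step)
```

The last element of `second_step` is the coefficient on the predicted probability; the point of the sketch is only the order of operations, not the particular numbers it prints.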
The two step procedure suggested above gives consistent estimators and appropriate asymptotic standard errors. However, it is potentially inefficient insofar as it does not account for the possible correlation between the (implicit) disturbances in the two equations. Surprisingly enough, a consistent, fully efficient estimator for this model is actually much simpler to apply than even the revised two step estimator suggested earlier. Based on the relationships among joint, conditional, and marginal probabilities,

$$\text{Prob}[y_1 = 1, y_2 = 1] = \text{Prob}[y_1 = 1 \mid y_2 = 1]\,\text{Prob}[y_2 = 1] = \{\text{BVN}(y_1 = 1, y_2 = 1) / \text{Prob}[y_2 = 1]\}\,\text{Prob}[y_2 = 1],$$

where BVN is used to denote the cumulative distribution function of the bivariate normal distribution. The result seems trivial, but it has an important implication for this model. Inserting the variables of the model, and introducing the two parameter vectors as in Burnett’s equations, we have

$$\text{Prob}[y_1 = 1, y_2 = 1] = [\text{BVN}(\beta' x_1 + \gamma y_2,\ \alpha' x_2,\ \rho) / \Phi(\alpha' x_2)]\,\Phi(\alpha' x_2)$$

(remember, $y_2 = 1$), in which $\Phi(\alpha' x_2) = \text{Prob}[y_2 = 1]$. This is the probability from a univariate probit model for $y_2$ and, in this context, is also the marginal probability that $y_2$ equals 1. After cancelling like terms in the numerator and denominator, we are left with just the bivariate probability,

$$\text{Prob}[y_1 = 1, y_2 = 1] = \text{BVN}(\beta' x_1 + \gamma,\ \alpha' x_2,\ \rho).$$

There are four cases, or cells, for this bivariate probit model; the remaining three can be constructed analogously to the first. For the remaining cells, we have

$$\text{Prob}[y_1 = 0, y_2 = 1] = \text{BVN}(-\beta' x_1 - \gamma,\ \alpha' x_2,\ -\rho),$$
$$\text{Prob}[y_1 = 1, y_2 = 0] = \text{BVN}(\beta' x_1,\ -\alpha' x_2,\ -\rho),$$
$$\text{Prob}[y_1 = 0, y_2 = 0] = \text{BVN}(-\beta' x_1,\ -\alpha' x_2,\ \rho).$$

But these are just the terms that enter the usual likelihood function for the bivariate probit model.
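As a minimal sketch of how the four cell probabilities enter the log likelihood, the following evaluates them for given index values. The function names, the Simpson’s rule approximation to the bivariate normal distribution function, and the numerical inputs are all assumptions made for illustration (a production program would use a library routine for the bivariate normal integral), and the correlation must lie strictly between -1 and 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf is Phi, .pdf is phi


def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """Approximate Prob[Z1 <= a, Z2 <= b] for a standard bivariate normal
    with correlation rho, using Simpson's rule on
    integral over t <= a of phi(t) * Phi((b - rho * t) / sqrt(1 - rho^2))."""
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        t = lower + i * h
        f = STD_NORMAL.pdf(t) * STD_NORMAL.cdf((b - rho * t) / scale)
        total += f * (1 if i in (0, n) else 4 if i % 2 else 2)
    return total * h / 3.0


def cell_probability(y1, y2, xb1, xa2, gamma, rho):
    """Prob[y1, y2], where xb1 = beta'x1 (excluding y2) and xa2 = alpha'x2."""
    q1, q2 = 2 * y1 - 1, 2 * y2 - 1  # sign switches for the four cells
    return bvn_cdf(q1 * (xb1 + gamma * y2), q2 * xa2, q1 * q2 * rho)


def log_likelihood(observations, gamma, rho):
    """observations: iterable of (y1, y2, xb1, xa2) tuples."""
    return sum(math.log(cell_probability(y1, y2, xb1, xa2, gamma, rho))
               for y1, y2, xb1, xa2 in observations)


# Illustrative values only: one observation's indices and parameters.
cells = {(a, b): cell_probability(a, b, xb1=-0.3, xa2=0.4, gamma=1.1, rho=0.3)
         for a in (0, 1) for b in (0, 1)}
print(cells, sum(cells.values()))
```

The sign switches `q1` and `q2` reproduce the four expressions above in one line, and setting `rho` to zero makes each cell the product of two univariate probit probabilities.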
The counterintuitive result is that in the bivariate probit model, unlike in the linear simultaneous equations model, if the two dependent variables are jointly determined, we just put each on the right hand side of the other equation (or, in our case, one of them) and proceed as if there were no simultaneity problem. This result is not new. One form of it appears (unfortunately with a couple of sign errors) on page 123 of Maddala’s now classic 1983 monograph on limited dependent variable models. The complications that appear throughout Maddala’s discussion are bred of his inclusion of structural, latent variables. Our model is much simpler; the observed responses are what appear on the right hand sides. On the other hand, even abstracting from that simplification, Burnett’s model is precisely Maddala’s ‘Model 6’ on his page 122.

We suggest, then, that the appropriate estimator for Burnett’s model is the bivariate probit model, which can be fit easily by maximum likelihood using any of several econometrics computer programs. (We used LIMDEP.)

THE DATA

Data used in this study were generously provided by Professor Burnett. She gives summary statistics for the data used in her estimation in her Table 1. We encountered a few small discrepancies in assembling the data for our regressions. It appears that in the group of 35 national colleges, one of the 6 midwestern colleges was classified as a northern college in the data set provided to us. A few other minor differences appear in the summary statistics below, but there is no obvious explanation. For completeness, our counterpart to Burnett’s Table 1 of summary statistics is given below. Where they differ from ours, Burnett’s reported statistics are shown in parentheses below ours.

Table 1. Descriptive Statistics

          All 132 Colleges        35 National Colleges    97 Regional Colleges
Variable  Mean        Std.Dev.    Mean        Std.Dev.    Mean        Std.Dev.
--------  ----------  ----------  ----------  ----------  ----------  ----------
GndrEcon  0.234848    0.425519    0.714286    0.458349    0.0618557   0.242145
                      (0.4329)                                        (0.2409)
WomStud   0.439394    0.498204    0.942857    0.235504    0.257732    0.439658
                      (0.4963)                                        (0.4374)
AcRep     119.242     62.1201     18.6857     12.5435     155.526     13.9045
          (119.0984)  (62.1290)                           (155.3298)  (19.7094)
PctWEcn   0.247866    0.217745    0.220143    0.133794    0.257869    0.240786
          (0.2471)    (0.2179)                            (0.2568)    (0.2410)
EconFac   6.74242     3.747       10.3429     3.82671     5.4433      2.74609
          (6.7500)    (3.7627)                            (5.4536)    (2.7800)
PctWfac   0.35772     0.155059    0.324569    0.0967468   0.369682    0.170144
          (0.3579)    (0.1554)                            (0.3699)    (0.1706)
Relig     0.575758    0.49611     0.114286    0.322803    0.742268    0.439658
                      (0.4942)                                        (0.4374)
Nor       0.333333    0.4732      0.714286    0.458349    0.195876    0.398935
          (0.3257)    (0.4686)    (0.6857)    (0.4710)
Sou       0.242424    0.430182    0.0857143   0.284029    0.298969    0.460184
                      (0.4286)                                        (0.4578)
Mid       0.272727    0.447058    0.0857143   0.284029    0.340206    0.47624
          (0.2803)    (0.44915)   (0.1113)    (0.3228)                (0.4738)
West      0.151515    0.359916    0.114286    0.322803    0.164948    0.373062

Cross Tabulation of GndrEcon vs. WomStud

            GndrEcon=0  GndrEcon=1  Total
WomStud=0   72          2           74
WomStud=1   29          29          58
Total       101         31          132

ESTIMATES OF A BIVARIATE PROBIT MODEL

Full information maximum likelihood estimates of the parameters of the bivariate probit model are given below. In a first pass, the correlation between the two structural disturbances, $\rho$, was allowed to vary freely. The estimated value was 0.13594. The t ratio on this coefficient of only 0.108 suggests that the effects may well be uncorrelated. The second set of results, our preferred set, was computed with $\rho$ fixed at zero. (This is equivalent to fitting two separate probit equations, with WomStud simply included as a covariate in the GndrEcon equation - the probit counterpart of fitting a fully recursive simultaneous equations model equation by equation, using ordinary least squares in spite of the simultaneity.)
Using the two sets of results, we also carried out a likelihood ratio test of the null hypothesis that $\rho$ equals zero against the alternative that $\rho$ does not equal zero. The test statistic, LRT = -2(-85.64578 - (-85.63172)) = 0.02812, is distributed as chi-squared with one degree of freedom under the null hypothesis. The value of 0.02812 is virtually nil, so the null hypothesis is not rejected. Based on these results, the model was reestimated with $\rho$ constrained to equal zero. These are the results given in Table 2. We emphasize that the resulting model is the first one discussed at the beginning of this article, with the two equations estimated by simple, univariate, binomial probit methods, using maximum likelihood as the estimation criterion.

The results are broadly consistent with Burnett’s. As she did, we find that academic reputation is the most significant variable in either equation. The sign is consistent, negative as expected. Another statistically significant variable in the gender economics equation is the percentage of women on the economics faculty. The negative effect of religious affiliation that Burnett observed in her results is preserved in ours, although, unlike her result, ours shows no statistical significance. The size of the economics faculty is not significant in the bivariate probit model either. The regional differences that she observes are replicated. We observe, as she did, that women’s studies programs are more likely in the north and midwest than in the south and west. (We replaced the constant term in the second equation with the WEST dummy variable so as to display this result more conveniently.) In the main, in terms of their magnitude and their statistical significance, the unambiguous result of the regressions in Table 2 is that the primary determinant of a women’s studies program or a gender economics course is the strong (in terms of numbers) presence of women on the faculty, a result that seems unsurprising.
On the other hand, the apparent influence of a large economics faculty and of religious affiliation both diminish dramatically in our model, compared to hers. Finally, as might be expected, we find that a significant determinant of the presence of a gender economics course is the presence of a women’s studies program.

Table 2. Estimated Bivariate Probit Model

Number of observations            132
Iterations completed              19
Log likelihood function           -85.64578
Log likelihood with rho not zero  -85.63172

Variable    Coefficient         Standard Error    b/St.Er.   P[|Z|>z]   Mean of X
--------    -----------         --------------    --------   --------   ---------
Index equation for GNDRECON
Constant    -1.417628571        .80692136         -1.757     .0789
ACREP       -.1143415362E-01    .40805763E-02     -2.802     .0051      119.24242
WOMSTUD     1.109512483         .56741514         1.955      .0505      .43939394
ECONFAC     .6730283502E-01     .68736793E-01     .979       .3275      6.7424242
PCTWECN     2.539164644         .98690669         2.573      .0101      .24786591
RELIG       -.3482478834        .49841969         -.699      .4847      .57575758
Index equation for WOMSTUD
ACREP       -.1956572288E-01    .55235099E-02     -3.542     .0004      119.24242
PCTWFAC     1.942934611         .84349502         2.303      .0213      .35772046
RELIG       -.4493737886        .33313246         -1.349     .1774      .57575758
SOU         1.359691119         .65941139         2.062      .0392      .24242424
MID         2.338647544         .81043972         2.886      .0039      .27272727
NOR         1.886695657         .82039949         2.300      .0215      .33333333
WEST        1.824807716         .87231488         2.092      .0364      .15151515
Disturbance correlation
RHO(1,2)    .0000000000         (Fixed Parameter)
Unrestricted value 0.135938

MARGINAL EFFECTS

As Burnett suggests (1997, 374), the coefficients in a binary choice model can be misleading. Because the model is actually of a probability, the absolute scale of the coefficients gives a distorted picture of the response of the dependent variable to a change in one of the stimuli. Thus, for example, it seems improbable that the isolated effect of religious affiliation on the expected value of Gender Economics would be -.35. Gender economics courses appear in only 31 of the 132 colleges (about 23 percent) in total.
As has been widely documented (Greene (1997)), in order to compute marginal effects in a binary choice model, one must scale the coefficients. In a simple binary probit model, we would have

$$E[y_1|x_1] = \Phi(\beta' x_1) = \text{Prob}[y_1 = 1]$$

so that, for a continuous variable, $z_i$,

$$\partial E[y_1|x_1]/\partial z_i = \partial \Phi(\beta' x_1)/\partial z_i = \phi(\beta' x_1)\,\beta_z,$$

where $\phi(\cdot)$ is the density function of the standard normal distribution and $\beta_z$ is the coefficient on that variable. If $z_i$ is a binary variable, then the appropriate way to measure the marginal effect is to use

$$\text{Effect on } E[y_1|x_1] = E[y_1|x_1, z = 1] - E[y_1|x_1, z = 0].$$

The approach in the general bivariate probit model is essentially as shown above, but the computations are quite a bit more involved (Greene (1996)). The computations involve the preceding derivatives, as well as, for example, for a variable $z$ that might appear in either equation,

$$\frac{\partial\,\text{BVN}(\beta' x_1 + \gamma,\ \alpha' x_2,\ \rho)}{\partial z_k} = \left\{\phi(\beta' x_1 + \gamma)\,\Phi\!\left[\frac{\alpha' x_2 - \rho(\beta' x_1 + \gamma)}{\sqrt{1 - \rho^2}}\right]\right\}\beta_z + \left\{\phi(\alpha' x_2)\,\Phi\!\left[\frac{(\beta' x_1 + \gamma) - \rho(\alpha' x_2)}{\sqrt{1 - \rho^2}}\right]\right\}\alpha_z.$$

The results are much simpler in our model because $\rho$ equals zero. The result of this is that the joint probability is the product of the marginals. The conditional mean function in the model analyzed here is

$$E[y_1|x_1,x_2] = E_{y_2} E[y_1 \mid x_1,x_2,y_2]$$
$$= \text{Prob}[y_2 = 1]\,E[y_1 \mid x_1,x_2,y_2 = 1] + \text{Prob}[y_2 = 0]\,E[y_1 \mid x_1,x_2,y_2 = 0]$$
$$= \Phi(\alpha' x_2)\,\Phi(\beta' x_1 + \gamma) + \Phi(-\alpha' x_2)\,\Phi(\beta' x_1),$$

where we have used the symmetry of the normal distribution in $\Phi(-\alpha' x_2) = 1 - \Phi(\alpha' x_2)$. For $y_2$, $E[y_2|x_2] = \Phi(\alpha' x_2)$. Therefore, the marginal effects in the women’s studies equation are exactly those shown above for the univariate probit model. But the marginal effect of a change in a variable in the gender economics equation will be a sum of terms. One will account for the direct effect of a change in that variable on the probability that $y_1$ equals one, and the other will measure the indirect effect of the change in this variable on the probability that $y_2$ equals 1 in the women’s studies equation which, in turn, affects the probability that $y_1$ equals one.
Thus:

(1) For a continuous variable, $z$, which might appear in $x_1$ and/or $x_2$, we have

$$\begin{aligned}
\partial E[y_1|x_1,x_2]/\partial z &= [\Phi(\alpha' x_2)\,\phi(\beta' x_1 + \gamma) + \Phi(-\alpha' x_2)\,\phi(\beta' x_1)]\,\beta_z \quad \text{(direct effect)} \\
&\quad + [\phi(\alpha' x_2)\,\Phi(\beta' x_1 + \gamma) - \phi(-\alpha' x_2)\,\Phi(\beta' x_1)]\,\alpha_z \quad \text{(indirect effect)}
\end{aligned}$$

where $\beta_z$ and $\alpha_z$ are the coefficients on $z$ in the two equations. Note that either of these may be zero. (The minus sign in the indirect effect arises because $\partial \Phi(-\alpha' x_2)/\partial z = -\phi(-\alpha' x_2)\,\alpha_z$.)

(2) For a binary variable, $q$, which might appear in $x_1$ and/or $x_2$, we have

$$E[y_1 \mid x_1,x_2,q=1] - E[y_1 \mid x_1,x_2,q=0] = [\Phi(\alpha' x_2)\,\Phi(\beta' x_1 + \gamma) + \Phi(-\alpha' x_2)\,\Phi(\beta' x_1)]\big|_{q=1} - [\Phi(\alpha' x_2)\,\Phi(\beta' x_1 + \gamma) + \Phi(-\alpha' x_2)\,\Phi(\beta' x_1)]\big|_{q=0}.$$

(3) For the second, endogenous binary variable, $y_2$, we have

$$E[y_1 \mid x_1,x_2,y_2 = 1] - E[y_1 \mid x_1,x_2,y_2 = 0] = \Phi(\beta' x_1 + \gamma) - \Phi(\beta' x_1).$$

In all cases, standard errors for these are computed using the delta method. Let $k(\beta,\gamma,\alpha,\text{data})$ equal the marginal effect, which is computed using the parameter estimates and some configuration of the data; in the computations below, the sample means are used, with dummy variables set to zero and one. Let

$$g_b = \partial k(\beta,\gamma,\alpha,\text{data})/\partial[\beta,\gamma],$$
$$g_a = \partial k(\beta,\gamma,\alpha,\text{data})/\partial\alpha.$$

Since $\rho$ is zero, our model is estimated using separate probit equations. Let $V_b$ denote the estimated asymptotic covariance matrix of the estimates in the first probit equation (gender economics) and $V_a$ denote the same for the women’s studies equation. Then, an estimate of the asymptotic variance for the estimated marginal effect is

$$\text{Asy.Var}[k] = g_b' V_b\, g_b + g_a' V_a\, g_a.$$

The square root gives the estimated standard error for the estimator. (If $\rho$ were not zero, these two terms would be combined into a single vector and a single asymptotic covariance matrix which would apply to all parameters in the model. The partitioning into two terms is possible when $\rho$ equals zero in the bivariate model, because the asymptotic covariance matrix for the full set of estimators becomes block diagonal.)

The estimated marginal effects of the variables in the two equations are given in Table 3. At this point, our results depart sharply from Burnett’s. The primary reason is that the implicit scaling in the probit model differs from that in her regression model.
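The three kinds of marginal effect listed above can be illustrated with a short computation. The sketch below uses the coefficients and sample means printed in Table 2 (rounded), with the disturbance correlation set to zero. Evaluating every regressor, the dummy variables included, at its sample mean is an assumption made here for brevity, so the printed values need not coincide with those in Table 3; the function and dictionary names are likewise introduced only for this illustration.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal: .cdf is Phi, .pdf is phi

# Coefficients and sample means as printed in Table 2, rounded.
BETA = {"Constant": -1.417629, "AcRep": -0.01143415, "EconFac": 0.06730284,
        "PctWEcn": 2.539165, "Relig": -0.3482479}
GAMMA = 1.109512  # coefficient on WomStud in the GndrEcon equation
ALPHA = {"AcRep": -0.01956572, "PctWfac": 1.942935, "Relig": -0.4493738,
         "Sou": 1.359691, "Mid": 2.338648, "Nor": 1.886696, "West": 1.824808}
MEANS = {"Constant": 1.0, "AcRep": 119.24242, "EconFac": 6.7424242,
         "PctWEcn": 0.24786591, "PctWfac": 0.35772046, "Relig": 0.57575758,
         "Sou": 0.24242424, "Mid": 0.27272727, "Nor": 0.33333333,
         "West": 0.15151515}


def index(coefs, values):
    """Linear index: sum of coefficient times regressor value."""
    return sum(c * values[name] for name, c in coefs.items())


def conditional_mean(values):
    """E[y1 | x1, x2] when the disturbance correlation is zero."""
    xb1, xa2 = index(BETA, values), index(ALPHA, values)
    return N01.cdf(xa2) * N01.cdf(xb1 + GAMMA) + N01.cdf(-xa2) * N01.cdf(xb1)


def continuous_effect(name, values):
    """(direct, indirect, total) effect of a continuous regressor on E[y1]."""
    xb1, xa2 = index(BETA, values), index(ALPHA, values)
    direct = (N01.cdf(xa2) * N01.pdf(xb1 + GAMMA)
              + N01.cdf(-xa2) * N01.pdf(xb1)) * BETA.get(name, 0.0)
    indirect = (N01.pdf(xa2) * (N01.cdf(xb1 + GAMMA) - N01.cdf(xb1))
                * ALPHA.get(name, 0.0))
    return direct, indirect, direct + indirect


def binary_effect(name, values):
    """Difference in E[y1] with the dummy variable set to one and to zero."""
    return (conditional_mean({**values, name: 1.0})
            - conditional_mean({**values, name: 0.0}))


def endogenous_effect(values):
    """Effect of switching y2 (WomStud) from zero to one."""
    xb1 = index(BETA, values)
    return N01.cdf(xb1 + GAMMA) - N01.cdf(xb1)


for name in ("AcRep", "PctWEcn", "EconFac", "PctWfac"):
    print(name, continuous_effect(name, MEANS))
print("Relig", binary_effect("Relig", MEANS))
print("WomStud", endogenous_effect(MEANS))
```

A variable that appears in only one equation automatically receives a zero direct or indirect effect, because its coefficient in the other equation is taken to be zero.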
However, beyond that, we find some substantive differences in the estimated effects. In terms of its marginal effect, academic reputation is far less influential in the bivariate probit model and, qualitatively, far less important than Burnett’s results suggest. In terms of its effect on gender economics, nearly all of the effect of academic reputation is the direct effect on the probability itself, as opposed to the indirect effect of academic reputation on the probability that the college has a women’s studies program. By far, the most substantial effect, at the margin, on the probability that the college will have a gender economics course is exerted by a large female presence on the economics faculty. The existence of a women’s studies program has a lesser, but also substantial, effect, similar to that of the percentage of women on the faculty of the college. (Note that the entire effect of this latter variable is indirect, but that this effect is still fairly large.) Finally, the effect of religious affiliation is quantitatively smaller in our results, but, in contrast to Burnett, it is more important in women’s studies than in gender economics.

Table 3. Estimated Marginal Effects

Variable   Direct      Indirect    Total       (Std. error, t-ratio)   (Type of variable)
--------   ------      --------    -----       ---------------------   ------------------
Gender Economics Equation
AcRep      -0.002022   -0.001453   -0.003476   (.00126, -3.087)        (Continuous)
PctWEcn    +0.4491                 +0.4491     (.1568, 2.864)          (Continuous)
EconFac    +0.01190                +0.01190    (.01292, 0.922)         (Continuous)
Relig      -0.07049    -0.03227    -0.1028     (.1055, -.974)          (Binary)
WomStud    +0.1863                 +0.1863     (.0868, 2.146)          (Endogenous)
PctWfac                +0.13951    +0.13951    (.08916, 1.565)         (Continuous)
Women’s Studies Equation
AcRep      -0.00754                -0.00754    (.002187, -3.448)       (Continuous)
PctWfac    +0.13789                +0.13789    (.01002, 13.76)         (Continuous)
Relig      -0.13265                -0.13266    (.18803, -.706)         (Binary)

CONCLUSIONS

The preceding has suggested a number of aspects of modeling in a recursive, simultaneous equations model involving two binary choice variables.
We suggested an alternative to the estimation technique used by Burnett in her study, and, in the process, were able to apply her data and model to some useful, though apparently not widely used, computations in this setting. In spite of her misspecifications of the conditional means at the outset, our base results do not differ radically from hers; most of the qualitative results are replicated. However, the results do differ enough to conclude that the use of the more appropriate techniques is clearly warranted.

REFERENCES

Burnett, N. 1997. Gender economics courses in liberal arts colleges. Journal of Economic Education 28 (Fall): 369-377.

Greene, W. 1996. Marginal effects in the bivariate probit model. Working paper number 96-11. Department of Economics, Stern School of Business, New York University.

Greene, W. 1997. Econometric Analysis (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.

LIMDEP. 1998. Econometric Software, Inc.: Plainview, New York.

Maddala, G. 1983. Limited dependent and qualitative variables in econometrics. Cambridge: Cambridge University Press.

Murphy, K. and R. Topel. 1985. Estimation and inference in two step econometric models. Journal of Business and Economic Statistics 3 (Fall): 370-379.

Rivers, D. and Q. Vuong. 1988. Limited information estimators and exogeneity tests for simultaneous probit models. Journal of Econometrics 39 (November): 347-366.