There have recently been a number of Statalist postings concerning the heteroskedastic probit model. hetprob.ado represents code that James Hardin and I (<jhardin@stata.com> and <wgould@stata.com>) *THINK* estimates such models correctly. I want to emphasize the word *THINK*. Neither James nor I have read the articles we should and, in fact, most of what follows comes from James and me hearing the words "heteroskedastic probit" and then thinking to ourselves, "what could that mean?" While neither James nor I would ever distribute anything -- even informally -- if we thought there was much chance it was completely misguided, we seek reassurance. In particular, our concerns have to do with assumed normalizations, which might make the coefficients we estimate multiplicatively different from those estimated by some other package. Before providing the code, let us reveal our thinking. The sections below are:

    Development of probit model
    The heteroskedastic case
    Derivation of a heteroskedastic-corrected probit model
    The -hetprob- command
    Options
    An example using -hetprob-
    Comment on estimated standard errors and likelihood-ratio tests
    Calculating predictions from the -hetprob- estimates
    Request for comments
    (signature)
    The -hetprob- ado-files

Development of probit model
---------------------------

There are lots of ways to think about and so derive the probit model, but here is one: We have a continuous variable y_j, j=1, ..., N, and y_j is given by the linear model

    y_j = a + X_j*b + u_j,    u_j distributed N(0, s^2)

If we observed y_j and X_j, we could estimate b using linear regression. We, however, do not observe y_j.
Instead, we observe

    d_j = 1   if y_j > c
        = 0   if y_j < c

Thus,

    Pr(d_j==1) = Pr(y_j > c)
               = Pr(a + X_j*b + u_j > c)
               = Pr(u_j > c - a - X_j*b)
               = Pr(u_j/s > (c-a)/s - X_j*(b/s))
               = Pr(-u_j/s < (a-c)/s + X_j*(b/s))
               = F( (a-c)/s + X_j*(b/s) )

Thus, when we estimate a probit model

    Pr(outcome) = F( a' + X_j*b' )

the estimates we obtain are

    a' = (a-c)/s
    b' = b/s

In words, this means that we cannot identify the scale of the unobserved y.

The heteroskedastic case
------------------------

Let us now consider heteroskedasticity. Let us start with a simple case and then we will generalize beyond it. Pretend we have two groups of data and each group is, itself, homoskedastic.

    y_j = a + X_j*b + u_j,    u_j distributed N(0, s1^2)   for group 1
    y_j = a + X_j*b + v_j,    v_j distributed N(0, s2^2)   for group 2

Note that we assume the coefficients (a,b) are the same for both groups. This means that, if we observed y, we could estimate each model separately and we would expect to estimate similar coefficients. In the probit case, however, something surprising happens. Estimating on each of the groups separately, we would obtain:

        group 1              group 2
    --------------       --------------
    a1' = (a-c)/s1       a2' = (a-c)/s2
    b1' = b/s1           b2' = b/s2

The probit coefficients are different in each group, but related!

    a2'/a1' = [(a-c)/s2] / [(a-c)/s1] = s1/s2
    b2'/b1' = [   b /s2] / [   b /s1] = s1/s2

This is a very un-linear-regression-like result but hardly surprising. We do not observe y, we observe whether y>c or y<c, and if the variance of the process increases, our prediction of Pr(y>c) must move toward .5; coefficients must move toward zero. This issue is *NOT* addressed by the Huber/White/Robust correction to the standard errors.

Derivation of a heteroskedastic-corrected probit model
------------------------------------------------------

Let us assume

    y_j = a + X_j*b + u_j,    u_j distributed N(0, s_j^2)

where s_j^2 is given by some function of Z_j which we will specify later.
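The attenuation result above -- rising variance pulls Pr(y>c) toward .5 and the implied probit coefficients toward zero -- can be checked numerically. Below is a minimal sketch in Python rather than Stata (the values of a, b, c, and X_j are made up purely for illustration):

```python
import math

def F(z):
    """Standard normal CDF, the F(.) in the derivation above."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Made-up values for the latent model y_j = a + X_j*b + u_j with cutoff c.
a, b, c, Xj = 1.0, 0.5, 0.0, 2.0
index = (a - c) + Xj * b   # numerator of the probit index

# Pr(y_j > c) = F(((a-c) + X_j*b)/s): as s grows, the probability
# moves toward .5, i.e., the implied coefficients (a-c)/s and b/s shrink.
for s in (1.0, 2.0, 4.0):
    print("s =", s, " Pr(y > c) =", round(F(index / s), 4))
```

Each doubling of s here moves the predicted probability visibly closer to .5, which is exactly why the Huber/White/Robust correction (which only adjusts standard errors, not the point estimates) cannot address the problem.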
Then:

    Pr(y_j > c) = Pr(a + X_j*b + u_j > c)
                = Pr(u_j > c - a - X_j*b)
                = Pr(u_j/s_j > (c-a)/s_j - X_j*(b/s_j))
                = Pr(-u_j/s_j < (a-c)/s_j + X_j*(b/s_j))
                = F( (a-c)/s_j + X_j*(b/s_j) )

so

    a' = (a-c)/s_j
    b' = b/s_j

Let us now specify s_j^2 = exp(s0 + Z_j*g). Then

    b' = b/s_j = b/[exp(s0/2)*exp(Z_j*g/2)]

and there is obviously a lack-of-identification problem. We will identify the coefficients by (arbitrarily) setting s0=0. Then the model is

    Pr(outcome) = F( a'/s_j + X_j*b'/s_j ) = F( (a' + X_j*b') / s_j )

where

    s_j^2 = exp(Z_j*g)

and a', b', and g are to be estimated.

The -hetprob- command
---------------------

-hetprob- has syntax

    hetprob depvar [indepvars] [if exp] [in range], variance(varlist)
            [ nochi level(#) <maximize-options> ]

and, as with all estimation commands, -hetprob- typed without arguments redisplays estimation results. For instance, if I type

    . hetprob outcome bp age, v(group1 group2)

I am estimating the model

    Pr(outcome) = F( (b0 + b1*bp + b2*age)/s ),   where s^2 = exp(g1*group1 + g2*group2)
                = F( [b0+b1*bp+b2*age] / exp[(g1*group1 + g2*group2)/2] )

The variance() variables are not required to be discrete variables. If I type

    . hetprob works educ sex, v(age)

I am assuming s^2 = exp(g1*age). The same variables can even appear among the standard explanatory variables and the explanatory variables for the variance, but realize that you are pushing things.

    . hetprob works age, v(age)

amounts to estimating

    Pr(works) = F( (b0 + b1*age) / exp(g1*age/2) )

and obviously coefficients g1 and b1 will be highly correlated.

Options
-------

variance(varlist) is not optional; it specifies the variables on which the variance is assumed to depend.

nochi suppresses the calculation of the model chi-squared statistic -- the likelihood-ratio test against the constant-only model. Specifying this option speeds execution considerably.

level(#) specifies the confidence level, in percent, for the confidence intervals of the coefficients; see help level.
maximize_options control the maximization process; see [R] maximize. You should never have to specify them.

An example using -hetprob-
--------------------------

. hetprob foreign mpg weight, v(price)

Estimating constant-only model:
Iteration 0:  Log Likelihood =  -45.03321
Iteration 1:  Log Likelihood = -44.836587
  <output omitted>
Iteration 8:  Log Likelihood =  -44.66722

Estimating full model:
Iteration 0:  Log Likelihood = -26.844189  (unproductive step attempted)
Iteration 1:  Log Likelihood = -26.572242
  <output omitted>
Iteration 7:  Log Likelihood = -24.833819

                                                  Number of obs =        74
                                                  Model chi2(2) =     39.67
                                                  Prob > chi2   =    0.0000
Log Likelihood = -24.8338194

------------------------------------------------------------------------------
 foreign |      Coef.   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
foreign  |
     mpg |  -.1651034   .1257984     -1.312   0.189      -.4116637     .081457
  weight |  -.0057221   .0031172     -1.836   0.066      -.0118317    .0003876
   _cons |   17.44533   9.500341      1.836   0.066      -1.174997    36.06566
---------+--------------------------------------------------------------------
ln_var   |
   price |   .0003066   .0001685      1.819   0.069      -.0000237    .0006369
------------------------------------------------------------------------------
note, LR test for ln_var equation: chi2(1) = 4.02, Prob > Chi2 = 0.0449

Comment on estimated standard errors and likelihood-ratio tests
---------------------------------------------------------------

If you look at the output above, the Wald tests (tests based on the conventionally estimated standard errors) and the likelihood-ratio tests yield considerably different results. At the top of the output, the model chi-squared is reported as chi2(2) = 39.67 (p=.0000). The corresponding Wald test is

. test mpg weight

 ( 1)  [foreign]mpg = 0.0
 ( 2)  [foreign]weight = 0.0

           chi2(  2) =    3.40
         Prob > chi2 =   0.1831

(The tests for coefficients in the ln_var equation, at least in this case, are in more agreement.
The reported z statistic is 1.819 (meaning chi2 = 1.819^2 = 3.31) and the corresponding likelihood-ratio test is 4.02.)

James and I took a simple example to explore how different the results might be. In our example, we simulated data from the true model

    y_j = 2*x1_j + 3*x2_j + u_j

    u_j distributed N(0,1) for group 1 and N(0,4) for group 2

    d_j = 1 if y_j > average(y_j in the sample)

Here is what we obtained in four particular runs:

    Sample-size        L.R. test    Wald test
    for each group       chi^2        chi^2
    -----------------------------------------
       50 +   50         27.48        13.11
      100 +  100         58.10        33.40
     1000 + 1000        556.42       297.35
     5000 + 5000       2812.04      1478.40

We do not want to make too big a deal about this, but we do recommend some caution in interpreting Wald-test results. We would check results against the likelihood-ratio test before reporting them.

Calculating predictions from the -hetprob- results
--------------------------------------------------

Here is how you obtain predicted probabilities from the -hetprob- results:

    . predict i1
    . predict s, eq(ln_var)
    . replace s = exp(s/2)
    . gen p = normprob(i1/s)

Request for comments
--------------------

We seek comments and, in particular, we seek comparison of results from this command with other implementations.

-- Bill                             -- James
   wgould@stata.com                    jhardin@stata.com
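For anyone wanting to compare -hetprob- against another implementation, the likelihood being maximized can be sketched outside Stata. Below is a minimal Python version under the s0=0 normalization described above; the data and coefficient values are invented purely so the function has something to evaluate:

```python
import math

def F(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hetprob_loglik(b, g, X, Z, d):
    """Log likelihood for Pr(d_j==1) = F( X_j*b / exp(Z_j*g/2) ).

    X rows include a 1 for the constant; Z rows do not
    (the s0=0 normalization removes the variance constant)."""
    ll = 0.0
    for Xj, Zj, dj in zip(X, Z, d):
        xb = sum(x * bi for x, bi in zip(Xj, b))
        s = math.exp(sum(z * gi for z, gi in zip(Zj, g)) / 2.0)
        p = F(xb / s)
        ll += math.log(p) if dj == 1 else math.log(1.0 - p)
    return ll

# Invented data: a constant plus one covariate, one variance variable.
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.7]]
Z = [[0.0], [1.0], [2.0]]
d = [1, 0, 1]
print(hetprob_loglik([0.5, 1.0], [0.3], X, Z, d))
```

With g fixed at zero this reduces to the ordinary probit log likelihood, which is a convenient sanity check when comparing implementations.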