Probit models with dummy endogenous regressors

Jacob Nielsen Arendt (a) and Anders Holm (b)

(a) Institute of Public Health – Health Economics, University of Southern Denmark, Odense.
(b) Department of Sociology, University of Copenhagen.

Abstract

This study considers heckit-type approximations useful for a number of different trivariate probit models. They are simple to use and, unlike full maximum likelihood, have no convergence problems. Simulations show that a heckit and a least squares approximation perform as well as the trivariate probit estimator in small samples when the degree of endogeneity is not too severe. A simple double-heckit and a heteroskedasticity corrected heckit approximation seem particularly robust and promising for testing exogeneity. The methods are used to estimate the impact of physician advice on physical activity, where the heckit approximations work as well as full maximum likelihood.

JEL codes: C25, C35, I12, I18

Keywords: Multivariate Probit, Two-step estimation, Endogeneity, Selection, Medical advice.

Corresponding author: Jacob Nielsen Arendt, Institute of Public Health – Health Economics, University of Southern Denmark, J. B. Winsløwsvej 9B, 5000 Odense C, Denmark. Phone: +45 6550 3843. Fax: +45 6550 3880. Email: jna@sam.sdu.dk

1. Introduction

Estimation of binary regression models with dummy endogenous variables is a difficult issue in econometrics. In contrast to models with continuous endogenous variables, no simple consistent two-stage methods exist, and it is only recently that more advanced semi-parametric alternatives have appeared, see e.g. Vytlacil and Yildiz (2006). Therefore, problems of endogeneity or sample selection are often ignored.
When not ignored, maximum likelihood estimation (MLE) remains the preferred method, the most popular model among economists probably being the multivariate probit and its variants (Zellner and Lee, 1965; Ashford and Sowden, 1970; Heckman, 1978; Poirier, 1980; Wynand and van Praag, 1981; Abowd and Farber, 1982; O’Higgins, 1995; Evans and Schwab, 1995; Carrasco, 2001). In the case of several endogenous dummy variables MLE may however become impractical or even impossible to obtain, see e.g. Nawata (1994) in the case of sample selection models or Lechner et al. (2005) in the case of panel data models, who discuss alternative simulation estimators. In this paper we focus on commonly applied multivariate probit models and propose an alternative way to estimate the effect of several endogenous dummy variables in a binary response model by simple two-stage methods. These are computationally less demanding than MLE and they always converge.

In general the problem of dummy endogenous variables in binary response models is of great importance in most social science disciplines, since these frequently rely on non-experimental data. It is well known, especially from linear models, that failing to take endogeneity into account may result in substantially biased results. Yatchew and Griliches (1985) derive the approximate bias in a probit model with continuous endogenous regressors and it is natural to assume that such bias extends to non-linear models. There are therefore several reasons to focus on models with binary variables: 1) less attention has been paid to these models than to models where either dependent or independent endogenous variables are continuous, 2) one frequently encounters binary variables in social science applications, and 3) when both dependent and independent variables are binary, correct modelling requires more complex models than if either is continuous.
Specifically, simple two-stage methods exist which account for endogeneity in models where either the dependent or the independent endogenous variable is continuous (see e.g. Alvarez and Glasgow, 2000; Bollen et al., 1995; Mroz, 1999, for a comparison of methods and Bhattacharya et al., 2006, for a brief review), whereas such procedures are not consistent with dummy endogenous variables. Although consistent estimates can be obtained by multivariate modelling, this gives rise to several difficulties both with respect to estimation and with respect to making such models readily understandable (and applicable) to a wider audience. Consequently, we think that it is of great value to consider simpler models that approximate the true effects.

We consider two types of approximations: a heckit type and a least-squares type. These are defined below. Although the former may seem complicated to some, we provide GAUSS and Stata code for the implementation, and given this the estimators are simple to obtain. Nicoletti and Perracchi (2001) consider how the heckit approximation performs in a very similar case: a binary model with sample selection. The main contribution of this paper is to extend the heckit approximation to the case with two dummy endogenous explanatory variables, or to other models where full information MLE involves a variant of the trivariate probit. Bhattacharya et al. (2006) compare other approximations to bi- and trivariate probit. Our study differs from theirs in a number of ways: first, we consider other approximate estimators; second, we look at smaller samples (where MLE may have more bias); and third, we vary simulation scenarios along different dimensions.

Needless to say, the higher the dimension of the model, the more fruitful a simple approximation may be. This is not only because multivariate models become more cumbersome but also because the approximations can be applied to a range of different models, e.g. combining sample selection and endogeneity or with switching regressions.

We provide simulation results that illustrate the bias of this approximation as well as of a simpler least-squares-based approximation in different settings. We find that what we refer to as the tri-heckit approximation and the least-squares approximation work well and that they overall perform as well as full maximum likelihood estimation in small samples when endogeneity is not too severe. These two estimators are however often severely biased in terms of estimation of correlation coefficients, which are the parameters of interest when assessing whether endogeneity is a problem in the first place. We find that taking the heteroskedasticity inherent in the heckit approximation into account yields an overall more robust estimator which often outperforms full MLE with respect to estimation of correlation coefficients.

To illustrate empirically how the approximations work with real data, we apply them to estimate the influence of physician advice about exercise on physical activity for people with hypertension, accounting for endogeneity of receiving advice and for selectivity due to the restricted sample of individuals with hypertension. We show that taking endogeneity and selection into account severely affects the results, implying that simple estimates underestimate the impact of physician advice. Moreover, the heckit approximations perform as well (or as badly) as maximum likelihood estimators. Caution should be taken in interpreting these results, as the standard errors are very high, leaving most coefficients insignificant. This illustrates that it is a data-demanding task to account for both endogeneity and selectivity, something which however appears to affect both the approximations and full maximum likelihood estimation.

The paper is organized as follows: the next section introduces the binary model with one endogenous regressor.
Section three extends the model to two dummy endogenous variables and shows how the approximations can be used in related models with sample selection or switching regressions, where the full MLE also relies on trivariate normal probabilities. In section four, simulation evidence for the estimators is presented. Section five presents the empirical application on the impact of physician advice on physical activity and section six provides some concluding remarks.

2. A model with one endogenous regression variable

In this section we present the case with two dummy variables, $y_1$ and $y_2$, where $y_2$ may have a causal effect on $y_1$, but where the variables are also spuriously related due to observed as well as unobserved independent variables. This situation is illustrated in the following fully parametric model:

$$
\begin{aligned}
y_1 &= 1(\gamma y_2 + x_1\beta_1 + \varepsilon_1 > 0),\\
y_2 &= 1(x_2\beta_2 + \varepsilon_2 > 0),\\
(\varepsilon_1, \varepsilon_2 \mid x_1, x_2) &\sim N(0, 0, 1, 1, \rho),
\end{aligned}
\tag{1}
$$

where $1(\cdot)$ is the indicator function taking the value one if the statement in the brackets is true and zero otherwise, $\gamma, \beta_1, \beta_2$ are regression coefficients, and $N(\cdot,\cdot,\cdot,\cdot,\rho)$ indicates the standard bivariate normal distribution with correlation coefficient $\rho$, where variances have been normalized to one without loss of generality. When $\rho$ is zero, the model for $y_1$ reduces to the standard probit model. Basically, the model states three reasons why we might observe $y_1$ and $y_2$ to be correlated: 1) a causal relation due to the influence of $y_2$ on $y_1$ through the parameter $\gamma$, 2) $y_2$ and $y_1$ may depend on correlated observed variables (the $x$'s) and 3) $y_2$ and $y_1$ may depend on correlated unobserved variables (the $\varepsilon$'s).

Consistent and asymptotically efficient parameter estimates are obtained by maximum likelihood estimation of the bivariate probit model. This is based on a likelihood function consisting of a product of individual contributions of the type:

$$
L_i(\gamma, \beta_1, \beta_2, \rho \mid y_{1i}, y_{2i}, x_{1i}, x_{2i}) = P(y_{1i}, y_{2i} \mid x_{1i}, x_{2i}) = P(y_{1i} \mid y_{2i}, x_{1i})\, P(y_{2i} \mid x_{2i}).
\tag{2}
$$

The second factor of the last expression is simply a probit for $y_2$. The first factor of the individual likelihood contributions is given as (see e.g. Wooldridge, 2002, p. 478):

$$
P(y_{1i} = 1 \mid y_{2i} = 1, x_{1i}) = P(\gamma + x_{1i}\beta_1 + \varepsilon_{1i} > 0 \mid \varepsilon_{2i} > -x_{2i}\beta_2)
= \frac{1}{\Phi(x_{2i}\beta_2)} \int_{-x_{2i}\beta_2}^{\infty} \Phi\!\left(\frac{\gamma + x_{1i}\beta_1 + \rho\,\varepsilon_{2i}}{\sqrt{1-\rho^2}}\right) \phi(\varepsilon_{2i})\, d\varepsilon_{2i}.
\tag{3}
$$

Even though rather precise procedures for evaluation of (2) exist, they are often time-consuming in an iterative optimization context. Furthermore, when $\rho$ approaches one it can be seen from (3) that the integral blows up numerically (the denominator $\sqrt{1-\rho^2}$ tends to zero) and estimation becomes imprecise.

2.1 A heckit approximation

If $y_1$ were continuous we could rely on the ingenious work by Heckman (1974; 1976) to obtain consistent parameter estimates:

$$
\begin{aligned}
E(y_1^* \mid y_2^* > 0) &= \gamma y_{2i} + x_{1i}\beta_1 + \rho\, E(\varepsilon_2 \mid \varepsilon_2 > -x_{2i}\beta_2)
= \gamma y_{2i} + x_{1i}\beta_1 + \rho\, \frac{\phi(x_{2i}\beta_2)}{\Phi(x_{2i}\beta_2)},\\
y_1^* &= \gamma y_{2i} + x_{1i}\beta_1 + \varepsilon_{1i}, \qquad y_2^* = x_{2i}\beta_2 + \varepsilon_{2i},
\end{aligned}
\tag{4}
$$

where the correction factor, the ratio $\phi/\Phi$, is the inverse Mills ratio. Note that this applies to continuous variables, not to a binary variable; hence applying a similar correction procedure to a binary model gives only an approximation:

$$
P(\gamma y_{2i} + x_{1i}\beta_1 + \varepsilon_{1i} > 0 \mid \varepsilon_{2i} > -x_{2i}\beta_2) \approx \Phi\!\left(\gamma y_{2i} + x_{1i}\beta_1 + \rho\, \frac{\phi(x_{2i}\beta_2)}{\Phi(x_{2i}\beta_2)}\right).
\tag{5}
$$

It is therefore not a consistent method. Nevertheless, replacing the probabilities given in (3) in the likelihood function by the approximated probabilities, estimates of the parameters of interest can be obtained by the following two-stage procedure: first estimate $\beta_2$ in a probit model for $y_2$. Then calculate the correction factor and estimate (1) in a probit model with the correction factor as an additional explanatory variable. This procedure circumvents both drawbacks of the full MLE mentioned above: it is fast and the likelihood does not blow up when $\rho$ approaches one. As shown by Nicoletti and Perracchi (2001), the approximation and the probability on which it is based (that is, (3)) have equal first-order Taylor expansions, and the approximation is exact for $\rho = 0$ (where both are equal to the standard univariate normal cdf without correction term).
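To make the comparison between the exact probability (3) and the heckit approximation (5) concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' GAUSS or Stata code. The names `a` (the index for y1, i.e. the coefficient on y2 plus x1 times beta1), `b` (the index x2 times beta2), `heckit_prob` and `exact_prob` are hypothetical, and the integral is evaluated with Simpson's rule on a truncated range, which is an assumption made here for simplicity.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def inverse_mills(b, y2=1):
    """E(eps2 | y2) for y2 = 1(b + eps2 > 0), where b = x2 * beta2."""
    if y2 == 1:
        return N.pdf(b) / N.cdf(b)            # E(eps2 | eps2 > -b)
    return -N.pdf(b) / (1.0 - N.cdf(b))       # E(eps2 | eps2 <= -b)


def heckit_prob(a, b, rho):
    """Heckit approximation to P(y1 = 1 | y2 = 1), as in (5)."""
    return N.cdf(a + rho * inverse_mills(b, 1))


def exact_prob(a, b, rho, steps=2000, upper=8.0):
    """P(y1 = 1 | y2 = 1) as in (3), integrated by Simpson's rule on [-b, upper].

    Requires |rho| < 1 and an even number of steps.
    """
    s = sqrt(1.0 - rho * rho)

    def integrand(e):
        return N.cdf((a + rho * e) / s) * N.pdf(e)

    lo = -b
    h = (upper - lo) / steps
    total = integrand(lo) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0 / N.cdf(b)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.8):
        print(rho, heckit_prob(0.3, 0.2, rho), exact_prob(0.3, 0.2, rho))
```

At `rho = 0` both functions reduce to the univariate normal cdf evaluated at `a`, which is the exactness property noted above; for other values the printed pairs give a rough impression of the approximation error at one illustrative point only.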
The approximation is moreover rather precise for values of $\rho$ even as high as 0.8. The heckit correction works particularly well if the heteroskedasticity inherent in the correction is taken into account.

2.2 A least squares approximation

An even simpler approximation than the heckit correction would be to use simple least squares residuals as corrections rather than inverse Mills ratios. This would be valid if $y_2$ were continuous, and may again be a good approximation when $y_2$ is discrete:

$$
y_{2i} = x_{2i}\beta_2 + \varepsilon_{2i}, \qquad
P(y_{1i} = 1 \mid y_{2i}) \approx \Phi(\gamma y_{2i} + x_{1i}\beta_1 + \rho\,\hat{\varepsilon}_{2i}),
\tag{6}
$$

where $\hat{\varepsilon}_{2i}$ is the least squares residual. Again, however, this is only an approximation when $y_2$ is binary and it therefore does not provide consistent estimates (given that the joint normal model is true). We note that under the null that $\rho$ equals zero, both the least squares approximation and the heckit approximation are identical to the probit model; thus the test that $\rho$ equals zero is a valid test of exogeneity of $y_2$.

2.3 Other approximations

Bhattacharya et al. (2006) and Angrist (2001) provide related comparisons. Angrist (2001) argues that the common use of parametric models overly complicates inference when the statistic of interest is a causal effect, and suggests that 2SLS performs as well as parametric estimators in a labour-supply model. But as noted by, among others, Moffitt (2001), the “good properties” of 2SLS depend crucially on where in the sample we look for causal effects. Due to the constant effect assumption of 2SLS, 2SLS and probit are bound to give different answers if we do not consider average effects or if we consider individuals with “extreme characteristics”, i.e. where $x_{1i}\hat{\beta}$ is not near zero.

Bhattacharya et al. (2006) conduct an analysis which shares some features with the current study. They compare the simple and bivariate probit to 2SLS and a two-step probit estimator using Monte Carlo analysis. The two-stage probit estimator augments the second stage for $y_1$ with the predicted probability of $y_2$ from an initial probit regression of $y_2$.
They also consider the case with two endogenous dummy variables. Their analysis differs from ours in that they consider other approximations and larger samples (5000 observations), and in that they change the treatment effects for given values of the correlations. We focus on smaller samples and change the correlations for given treatment effects. Bhattacharya et al. (2006) confirm that with one dummy endogenous regressor 2SLS approximates average treatment effects well, whereas the two-stage probit may have larger bias for large treatment effects. Both 2SLS and two-stage probit estimators become more biased when the mean of the outcome is near 0 or 1. In the case with two dummy endogenous regressors they find that 2SLS has a larger bias than the two-stage probit estimator, especially for large treatment effects and, surprisingly, also for misspecified models. For this reason we do not consider 2SLS any further.

3. A model with two dummy endogenous regression variables

In this section we explore situations where an approximation may be even more fruitful, namely when the dimension of the endogeneity problem increases. We focus on the case of a binary response model with two dummy endogenous variables. A simple extension of the model in the previous section that includes one more dummy endogenous regressor would be:

$$
\begin{aligned}
y_1 &= 1(\gamma_2 y_2 + \gamma_3 y_3 + x_1\beta_1 + \varepsilon_1 > 0),\\
y_2 &= 1(x_2\beta_2 + \varepsilon_2 > 0),\\
y_3 &= 1(x_3\beta_3 + \varepsilon_3 > 0),\\
(\varepsilon_1, \varepsilon_2, \varepsilon_3 \mid x_1, x_2, x_3) &\sim N(0, 0, 0, 1, 1, 1, \rho_{12}, \rho_{13}, \rho_{23}).
\end{aligned}
\tag{7}
$$

Full maximum likelihood estimation requires estimation of a trivariate probit model, which is consistent and asymptotically efficient.
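As an illustration of the data-generating process in (7), the sketch below draws observations with correlated standard normal errors. It is a simplified, hypothetical setup rather than the authors' simulation code: each equation has a single standard normal regressor with a common slope `beta` and no constant, the coefficients on y2 and y3 are called `gamma2` and `gamma3`, and the correlated errors are built from a hand-written Cholesky factor.

```python
import random
from math import sqrt


def cholesky3(r12, r13, r23):
    """Lower-triangular Cholesky factor of the 3x3 error correlation matrix.

    math.sqrt raises ValueError when the matrix is not positive definite.
    """
    l21 = r12
    l22 = sqrt(1.0 - l21 ** 2)
    l31 = r13
    l32 = (r23 - l31 * l21) / l22
    l33 = sqrt(1.0 - l31 ** 2 - l32 ** 2)
    return (1.0, 0.0, 0.0), (l21, l22, 0.0), (l31, l32, l33)


def simulate(n, gamma2, gamma3, beta, r12, r13, r23, seed=0):
    """Draw n triples (y1, y2, y3) from a one-regressor version of model (7)."""
    rng = random.Random(seed)
    L = cholesky3(r12, r13, r23)
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlated errors e = L z, so that corr(e) has entries r12, r13, r23.
        e1, e2, e3 = (sum(L[i][j] * z[j] for j in range(3)) for i in range(3))
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        y2 = int(beta * x2 + e2 > 0)
        y3 = int(beta * x3 + e3 > 0)
        y1 = int(gamma2 * y2 + gamma3 * y3 + beta * x1 + e1 > 0)
        data.append((y1, y2, y3))
    return data


if __name__ == "__main__":
    sample = simulate(500, 0.5, -0.5, 0.5, 0.3, 0.3, 0.3)
    print([sum(row[k] for row in sample) / len(sample) for k in range(3)])
```

The parameter values in the usage line are illustrative choices; y2 and y3 are endogenous in the y1 equation whenever `r12` or `r13` is non-zero.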
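The likelihood function of the trivariate probit is based on the trivariate joint normal probabilities:

$$
P(y_{1i} = 1, y_{2i} = 1, y_{3i} = 1 \mid x_{1i}, x_{2i}, x_{3i}) = P(\gamma_2 y_{2i} + \gamma_3 y_{3i} + x_{1i}\beta_1 + \varepsilon_{1i} > 0,\; \varepsilon_{2i} > -x_{2i}\beta_2,\; \varepsilon_{3i} > -x_{3i}\beta_3).
\tag{8}
$$

Writing $P_{hjk}$ for $P(y_{1i} = h, y_{2i} = j, y_{3i} = k \mid x_{1i}, x_{2i}, x_{3i})$ and $\Phi_3$ for the trivariate standard normal cdf, we can write the different likelihood contributions under normality as:

$$
\begin{aligned}
P_{000} &= \Phi_3(c_{1i}, c_{2i}, c_{3i};\; \rho_{12}, \rho_{13}, \rho_{23}),\\
P_{001} &= \Phi_3(c_{1i}, c_{2i}, -c_{3i};\; \rho_{12}, -\rho_{13}, -\rho_{23}),\\
P_{010} &= \Phi_3(c_{1i}, -c_{2i}, c_{3i};\; -\rho_{12}, \rho_{13}, -\rho_{23}),\\
P_{100} &= \Phi_3(-c_{1i}, c_{2i}, c_{3i};\; -\rho_{12}, -\rho_{13}, \rho_{23}),\\
P_{011} &= \Phi_3(c_{1i}, -c_{2i}, -c_{3i};\; -\rho_{12}, -\rho_{13}, \rho_{23}),\\
P_{101} &= \Phi_3(-c_{1i}, c_{2i}, -c_{3i};\; -\rho_{12}, \rho_{13}, -\rho_{23}),\\
P_{110} &= \Phi_3(-c_{1i}, -c_{2i}, c_{3i};\; \rho_{12}, -\rho_{13}, -\rho_{23}),\\
P_{111} &= \Phi_3(-c_{1i}, -c_{2i}, -c_{3i};\; \rho_{12}, \rho_{13}, \rho_{23}),\\
c_{1i} &= -(\gamma_2 y_{2i} + \gamma_3 y_{3i} + x_{1i}\beta_1), \qquad c_{2i} = -x_{2i}\beta_2, \qquad c_{3i} = -x_{3i}\beta_3.
\end{aligned}
\tag{9}
$$

The trivariate probit estimates can be obtained using numerical integration or simulation techniques. The most common simulation estimator is probably the GHK simulated maximum likelihood estimator of Geweke (1991), Hajivassiliou (1990), and Keane (1994), which is available e.g. in the statistical program packages Stata (triprobit or mvprobit, see Cappellari and Jenkins, 2003) and LIMDEP. The GHK simulator of multivariate normal probabilities has been shown to generally outperform other simulation methods (e.g. Hajivassiliou, McFadden and Ruud, 1996).

3.1 A trivariate heckit approximation

Just as in the bivariate case, we may encounter several practical problems with the multivariate probit model. An empirically tractable alternative may therefore be to use a multivariate heckit type of approximation:

$$
P(\gamma_2 y_{2i} + \gamma_3 y_{3i} + x_{1i}\beta_1 + \varepsilon_{1i} > 0 \mid \varepsilon_{2i} > -x_{2i}\beta_2,\; \varepsilon_{3i} > -x_{3i}\beta_3)
\approx \Phi\big(\gamma_2 y_{2i} + \gamma_3 y_{3i} + x_{1i}\beta_1 + E(\varepsilon_{1i} \mid \varepsilon_{2i} > -x_{2i}\beta_2,\; \varepsilon_{3i} > -x_{3i}\beta_3)\big).
\tag{10}
$$

To obtain this approximation we need the first moment in a trivariate truncated normal distribution. It simplifies greatly if we assume $\rho_{23} = 0$. Then we just get a double Heckman correction:

$$
E(\varepsilon_1 \mid \varepsilon_2 > -x_{2i}\beta_2,\; \varepsilon_3 > -x_{3i}\beta_3) = \rho_{12}\,\frac{\phi(x_{2i}\beta_2)}{\Phi(x_{2i}\beta_2)} + \rho_{13}\,\frac{\phi(x_{3i}\beta_3)}{\Phi(x_{3i}\beta_3)}.
\tag{11}
$$

We refer to this as the bivariate heckit approximation.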
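The double correction in (10) and (11) can be sketched as follows, again with the Python standard library only. The function names and the arguments `idx1`, `idx2`, `idx3` (the three linear indices) are hypothetical. The case y = 0 uses the standard truncated normal mean from below, which the text defers to the appendix, so treat that branch as an assumption about the omitted formulas.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def mills(index, y):
    """E(eps | y) when y = 1(index + eps > 0) and eps is standard normal."""
    if y == 1:
        return N.pdf(index) / N.cdf(index)          # truncation from below at -index
    return -N.pdf(index) / (1.0 - N.cdf(index))     # truncation from above at -index


def double_heckit_prob(idx1, idx2, idx3, y2, y3, rho12, rho13):
    """Approximate P(y1 = 1 | y2, y3) under rho23 = 0, following (10)-(11).

    idx1 is the full index of the y1 equation (including the y2 and y3 terms),
    idx2 = x2 * beta2 and idx3 = x3 * beta3.
    """
    correction = rho12 * mills(idx2, y2) + rho13 * mills(idx3, y3)
    return N.cdf(idx1 + correction)


if __name__ == "__main__":
    for y2 in (0, 1):
        for y3 in (0, 1):
            print(y2, y3, double_heckit_prob(0.2, 0.1, -0.3, y2, y3, 0.3, 0.3))
```

In a two-step estimation the two `mills` terms would enter the second-stage probit as additional regressors, so that their coefficients estimate the two correlations; the function above only evaluates the implied probability for given parameter values.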
In the general case, where $\rho_{23} \neq 0$, the correction terms become more complicated. They were applied by Fishe et al. (1981) in a model where $y_1$ is continuous and are found e.g. in Maddala (1983), p. 282:

$$
E(\varepsilon_1 \mid \varepsilon_2 > h, \varepsilon_3 > k) = \rho_{12} M_{23} + \rho_{13} M_{32}; \qquad
M_{ij} = (1 - \rho_{23}^2)^{-1}(P_i - \rho_{23} P_j), \qquad
P_i = E(\varepsilon_i \mid \varepsilon_2 > h, \varepsilon_3 > k), \quad i = 2, 3.
\tag{12}
$$

We refer to this as the trivariate heckit correction. Fishe et al. (1981) evaluated these using numerical approximations. They can however be simplified using results found in Maddala (1983), p. 368:

$$
P(x > h, y > k)\, E(x \mid x > h, y > k) = \phi(h)\,[1 - \Phi(k^*)] + \rho\, \phi(k)\,[1 - \Phi(h^*)], \qquad
h^* = \frac{h - \rho k}{\sqrt{1 - \rho^2}}, \quad k^* = \frac{k - \rho h}{\sqrt{1 - \rho^2}}, \quad \rho = \mathrm{cov}(x, y).
\tag{13}
$$

The formulas for the correction terms in the four cases of combinations of $Y_2$ and $Y_3$ being 0 or 1 are presented in the appendix. In order to calculate the two correction terms, $M_{23}$ and $M_{32}$, we need initial estimates of $\beta_2$, $\beta_3$ and $\rho_{23}$. These can be obtained from a bivariate probit for $Y_2$ and $Y_3$. Alternatively we may use two linear probability models (LP) for $Y_2$ and $Y_3$ (see footnote 1). The model therefore involves several steps:

1. Perform estimations for $Y_2$ and $Y_3$ and calculate the correlation between errors (using bivariate probit or linear probability models).
2. Calculate the correction terms.
3. Perform a probit estimation for $Y_1$ adding the correction terms as additional covariates.

Footnote 1: Using the LP estimates of $\beta_2$ and $\beta_3$ as initial estimates, we need to rescale them, as the LP estimates of $\beta_2$ and $\beta_3$ are not from the normal model. Scaling the linear probability coefficients by 2.5 (and subtracting 1.25 from the constant) has been shown to work well (Maddala, 1983, p. 23). The initial estimate of $\rho_{23}$ is obtained from the LP models as the correlation between the residuals.

An issue is the heteroskedasticity inherent in the heckit correction. Recall that Nicoletti and Perracchi (2001) found it useful to account for heteroskedasticity in the case with one endogenous regressor. This is consistent with findings by Horowitz (1993), who showed that heteroskedasticity may severely bias standard probit models.
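Step 2 of the procedure, the calculation of the correction terms from (12) and (13), can be sketched as below for the cell where Y2 = Y3 = 1. This is an illustrative standard-library implementation, not the authors' code: the function names are hypothetical, the bivariate orthant probability is obtained by Simpson's rule on a truncated range (an assumption made to avoid external libraries), and the other three cells, which the text defers to the appendix, are not covered.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def upper_orthant(h, k, rho, upper=8.0):
    """P(x > h, y > k) for a standard bivariate normal with correlation rho."""
    s = sqrt(1.0 - rho * rho)
    return simpson(lambda x: N.pdf(x) * (1.0 - N.cdf((k - rho * x) / s)), h, upper)


def truncated_mean(h, k, rho):
    """E(x | x > h, y > k), using the closed-form numerator in (13)."""
    s = sqrt(1.0 - rho * rho)
    h_star = (h - rho * k) / s
    k_star = (k - rho * h) / s
    numerator = (N.pdf(h) * (1.0 - N.cdf(k_star))
                 + rho * N.pdf(k) * (1.0 - N.cdf(h_star)))
    return numerator / upper_orthant(h, k, rho)


def correction_terms(idx2, idx3, rho23):
    """M23 and M32 from (12) for Y2 = Y3 = 1, where h = -idx2 and k = -idx3."""
    h, k = -idx2, -idx3
    p2 = truncated_mean(h, k, rho23)   # E(eps2 | eps2 > h, eps3 > k)
    p3 = truncated_mean(k, h, rho23)   # E(eps3 | eps2 > h, eps3 > k), by symmetry
    d = 1.0 - rho23 ** 2
    return (p2 - rho23 * p3) / d, (p3 - rho23 * p2) / d


if __name__ == "__main__":
    print(correction_terms(0.2, -0.1, 0.0))   # reduces to two inverse Mills ratios
    print(correction_terms(0.2, -0.1, 0.5))
```

With `rho23 = 0` the two terms collapse to the inverse Mills ratios of the double heckit correction, which gives a simple consistency check on the formulas.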
The variance formula needed for the heteroskedasticity correction with two dummy endogenous variables is found in the appendix. Given an expression for this variance, $V$, the heteroskedasticity-corrected model is:

$$
P(Y_1 = 1 \mid Y_2 = 1, Y_3 = 1) \approx \Phi\!\left(\frac{\gamma_2 + \gamma_3 + x_{1i}\beta_1 + \rho_{12} M_{23} + \rho_{13} M_{32}}{\sqrt{V}}\right).
\tag{14}
$$

The variance depends upon $\beta_2$, $\beta_3$, $\rho_{23}$ and $\rho_{12}$, $\rho_{13}$. While the three former can be obtained by least squares as above, the latter two can be obtained from initial trivariate heckit estimates without the heteroskedasticity correction.

As a first-hand impression of how the approximation performs, we evaluate how well the trivariate heckit probabilities approximate the trivariate normal probabilities using graphical illustrations and Taylor expansions around $(\rho_{12}, \rho_{13}) = (0, 0)$. Both the bivariate heckit correction and the trivariate normal probability have the same first-order Taylor approximation if $\rho_{23} = 0$:

$$
P(Y_1 = 1 \mid Y_2 = 0, Y_3 = 0) \approx \Phi(x_1\beta_1) - \rho_{12}\,\frac{\phi(x_2\beta_2)}{\Phi(-x_2\beta_2)}\,\phi(x_1\beta_1) - \rho_{13}\,\frac{\phi(x_3\beta_3)}{\Phi(-x_3\beta_3)}\,\phi(x_1\beta_1).
\tag{15}
$$

The trivariate heckit (with $\rho_{23} \neq 0$) has a similar first-order Taylor expansion where one replaces the inverse Mills ratios by $M_{23}$ and $M_{32}$ found in (12) and (13). It is easily seen that the heteroskedasticity-corrected trivariate heckit has the same first-order expansion, since the truncated variance reduces to one when $(\rho_{12}, \rho_{13}) = (0, 0)$.

We simulate and plot the heckit probabilities along with the true probabilities (using the cdfmvn GAUSS code due to Jon Breslaw), where $\rho_{12}$ is varied from -1 to 1, for different combinations of high, low, positive and negative values of the other correlations and the $x$'s. Selected figures are shown in the appendix. It should be noted that correlation matrices are restricted to be positive definite, and whenever they are not, the true trivariate normal probability is set to 0, as seen in the figures.
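The positive-definiteness restriction matters when one correlation is varied from -1 to 1 with the other two held fixed, because only part of that range is admissible. A small sketch of the check, with hypothetical function names, is given below; it uses Sylvester's criterion for a 3x3 matrix with unit diagonal and the equivalent closed-form interval for the first correlation.

```python
from math import sqrt


def is_positive_definite(r12, r13, r23):
    """Sylvester's criterion for the 3x3 correlation matrix with unit diagonal."""
    minor2 = 1.0 - r12 ** 2
    det = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return minor2 > 0.0 and det > 0.0


def rho12_range(r13, r23):
    """Open interval of r12 values giving a positive definite matrix.

    Follows from det > 0, which is equivalent to
    (r12 - r13*r23)^2 < (1 - r13^2) * (1 - r23^2).
    """
    centre = r13 * r23
    half_width = sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    print(rho12_range(0.8, 0.8))
    print(is_positive_definite(0.2, 0.8, 0.8))
```

For example, with the other two correlations both at 0.8, values of the first correlation below roughly 0.28 do not give a valid correlation matrix, which is the kind of region where a plotted true probability would be set to 0.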
The figures show that especially the trivariate heckit approximates the true normal probability rather well even for high correlation coefficients, whereas the bivariate heckit is often badly behaved (when $\rho_{23} \neq 0$).

3.2 Variants of the multivariate probit model

It is not only discrete models with dummy endogenous regressors that give rise to multivariate probit models. Variants of the multivariate probit arise in a number of empirically relevant models with e.g. sample selection or switching regressions. In fact, any combination of models with dummy endogenous regressors, sample selection or switching regressions yields new higher order multivariate models. Since this highlights the potential usefulness of approximations to the trivariate probit we summarize some of these models:

1. Two sample selection restrictions (censoring)

If we only observe $y_1$ if $y_2 = 1$ and $y_3 = 1$, we have a bivariate sample selection problem. Therefore not all 8 combinations of outcomes are observed. The likelihood will be based on the observed probabilities $P_{011}$, $P_{111}$ and $P_{00}$, where the latter is the probability that either $y_2$ or $y_3$ is zero. This is also called a partial or censored probit and was examined in the bivariate case in different variants by, among others, Poirier (1980) and Wynand and van Praag (1981).

2. One trivariate endogenous regressor variable

If one regressor variable is endogenous with three outcomes, it can be modelled as two separate binary outcomes (e.g. $z$ = yes, no, no response; then, for instance, $y_2 = 1(z = \text{no})$ and $y_3 = 1(z = \text{no response})$). This also gives rise to partial observability since $y_2$ and $y_3$ can never be 1 at the same time. Therefore the likelihood is based on the following probabilities: $P_{000}$, $P_{001}$, $P_{010}$, $P_{100}$, $P_{101}$ and $P_{110}$ (see footnote 2).

3. A switching regression with one dummy endogenous regressor

With a switching regression, the relationship between $y_1$ and the regressors depends upon the outcome of another binary variable, which in general gives rise to a bivariate probit model.
This is the discrete version of the Roy model, which has become very popular in the treatment evaluation literature. An application is given in Carrasco (2001), studying the impact of having given birth to a child recently ($y_2$) on female labour force participation ($y_1$). If in addition one of the regressors is a dummy endogenous variable we need a trivariate probit model. With one endogenous regressor the model could be defined as:

$$
y_{1i} = 1(\gamma_j y_{2i} + x_{ij}\beta_j + \varepsilon_j > 0) \quad \text{if } y_{3i} = j, \quad j = 0, 1.
\tag{16}
$$

If, instead of a dummy endogenous variable, one encounters a sample selection model, it gives rise to the next model.

Footnote 2: If the endogenous variable is an ordered qualitative variable, we could make use of the ordering and hence increase efficiency by specifying $y_2 = \sum_{j=1}^{J} 1(y_2^* > c_j)$, where the $c_j$ are unobserved thresholds. This gives rise to a bivariate model with the following heckit corrections:

$$
E(\varepsilon_1 \mid y_2 = j) = \rho_{12}\, E(\varepsilon_2 \mid c_{j-1} < x_2\beta_2 + \varepsilon_2 < c_j) = \rho_{12}\, \frac{\phi(\mu_{j-1}) - \phi(\mu_j)}{\Phi(\mu_j) - \Phi(\mu_{j-1})}, \qquad \mu_j = c_j - x_2\beta_2.
$$

In relation to these types of models, Orme and Weeks (1999) show how the bivariate probit model, and Nagakura (2004) how the sequential and the ordered probit models, are all constrained versions of the multinomial probit.

4. A switching regression with sample selection

This model could be written as:

$$
\begin{aligned}
y_{1i} &= 1(x_i\beta_j + \varepsilon_j > 0) \quad \text{if } y_{2i} = j, \quad j = 0, 1,\\
y_{1i} &\text{ observed if } y_{3i} = 1.
\end{aligned}
\tag{17}
$$

As in the case with double selection, there is partial observability, so the likelihood depends upon $P_{001}$, $P_{011}$, $P_{101}$, $P_{111}$ and $P_0$, where the latter is the marginal probability of $y_3 = 0$. The trivariate probabilities here and in case 3 are slightly different from those described in (9), since they are derived from an underlying four-dimensional normal distribution of $(\varepsilon_0, \varepsilon_1, \varepsilon_2, \varepsilon_3)$ with e.g.:

$$
\begin{aligned}
P(y_{1i} = 1, y_{2i} = 1, y_{3i} = 1) &= \Phi_3(-c_1, -c_2, -c_3;\; \rho_{12}, \rho_{13}, \rho_{23}),\\
P(y_{1i} = 1, y_{2i} = 0, y_{3i} = 1) &= \Phi_3(-c_0, c_2, -c_3;\; -\rho_{02}, \rho_{03}, -\rho_{23}),
\end{aligned}
\tag{18}
$$

where the $c$'s are the negatives of the corresponding linear indices, as in (9), and $c_0$ is defined analogously to $c_1$ for the regime $j = 0$. Note that the correlation $\rho_{01}$ is not identified.

5. One dummy endogenous variable and sample selection
We are interested in the effect of $y_2$ on $y_1$ but only observe the latter if $y_3 = 1$. This model could be stated as:

$$
\begin{aligned}
y_{1i} &= 1(\gamma y_{2i} + x_i\beta_1 + \varepsilon_1 > 0),\\
y_{1i} &\text{ observed if } y_{3i} = 1.
\end{aligned}
\tag{19}
$$

The likelihood consists of the same probabilities as in case 4. This is the model that we apply in the application below.

4. Simulations

In this section we report simulation results demonstrating the performance of the three heckit approximations and the least squares approximation described above (which in the tables are labelled in short-hand as bi-heckit, tri-heckit homo (without heteroskedasticity correction), tri-heckit hetero (with heteroskedasticity correction) and OLS). The model consists of the following three endogenous variables:

$$
\begin{aligned}
y_1^* &= \beta_{10} + \beta_{11} x_{11} + \beta_{12} x_{12} + \gamma_1 y_2 + \gamma_2 y_3 + e_1,\\
y_2^* &= \beta_{20} + \beta_{21} x_{21} + \beta_{22} x_{22} + e_2,\\
y_3^* &= \beta_{30} + \beta_{31} x_{31} + \beta_{32} x_{32} + e_3,
\end{aligned}
\tag{20}
$$

with $(\beta_{j0}, \beta_{j1}, \beta_{j2}) = (0, 0.5, 0.5)$, $j = 1, 2, 3$, $(\gamma_1, \gamma_2) = (0.5, -0.5)$ and

$$
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix} \right), \qquad
y_j = \begin{cases} 1 & \text{if } y_j^* > 0 \\ 0 & \text{otherwise} \end{cases}, \quad j = 1, 2, 3,
\tag{21}
$$

where $y_2$ and $y_3$ are endogenous to $y_1$ if $e_2$ or $e_3$ is correlated with $e_1$. Furthermore, if $e_2$ and $e_3$ are correlated a full trivariate model is required. This is where the trivariate heckit approximation is expected to be most relevant. The regressors are drawn as independent standard normal variables with 500 independent draws in each simulation. Therefore $y_1$ has mean 0.5. The trivariate heckit is based on initial scaled (see footnote 1) OLS estimates of the parameters in the equations for $y_2$ and $y_3$, and so is the estimate of $\rho_{23}$.

For comparison we have also simulated the naïve and the trivariate probit. For the latter we use the GHK simulated MLE (see footnote 3). We apply the rule of thumb (see Cappellari and Jenkins, 2003) that the number of draws made by the GHK estimator for each simulation is the square root of the number of observations, here 23. Experimenting with the number of simulations shows that results do not change when altering the number of Monte Carlo simulations from 100 to 1000. To obtain a higher precision we use 200 in most simulations and 100 in some additional simulations.

Footnote 3: This is done using Stata version 9.0 and the mvprobit procedure written by Cappellari and Jenkins. The other simulations are conducted in GAUSS. The trivariate heckit correction is available upon request from the authors in both GAUSS and Stata code.

4.1 The basic scenarios

Table 1 shows average estimation results with various combinations of values of correlations between the three error terms. To save space, we only show results for the effects of $y_2$ and $y_3$ in the $y_1$ equation. We show additional results as well as results for correlation coefficients in the appendix.

- Table 1 about here -

From the table we find that all estimators are unbiased when all rho's are zero. The multivariate methods however have a larger mean squared error (MSE) than the naïve probit estimator. The MSEs of the multivariate approaches are reasonably close, with OLS and trivariate probit performing slightly better than the three heckit approaches, which show similar performance. The higher MSE is the cost of using a multivariate procedure when it is not needed. The gain is that one has an assessment of the degree of endogeneity in the form of estimates of $\rho_{12}$ and $\rho_{13}$, and the test that these parameters are zero is a test of exogeneity of $y_2$ and $y_3$. We will consider the estimates of the correlations below.

Reading from left to right in table 1 we gradually increase the correlations. First of all, we see that whenever $\rho_{12}$ and $\rho_{13}$ are different from zero, the single equation probit estimator that completely ignores endogeneity has a large bias. From the table it is seen that with low but positive correlations (0.15) all approximate estimators are almost as good as full MLE in terms of bias. In fact, in all of the first four cases, when correlations are at most 0.30, MLE never has the lowest bias. This reflects a small-sample bias of MLE.
The least squares estimator has the lowest bias in most cases, although the homoskedastic heckit approximations are close to it. In terms of MSE, full MLE is slightly better in most cases. Note that biheckit is as good as triheckit when $\rho_{12}$ and $\rho_{13}$ are 0.30 but $\rho_{23}$ is very high, which is unexpected given that biheckit ignores the correlation $\rho_{23}$. In the latter case, full MLE does not converge in 5 percent of the simulations, whereas the approximations converge in all cases.

When either of the correlation coefficients is of size 0.60, we see some non-negligible biases in the heckit estimators (up to size 0.3) but also for full MLE (up to size 0.15), whereas the least squares estimator performs better in most cases. The heteroskedasticity corrected heckit overall seems to perform worse than the other approximate estimators when correlations are 0.60 or lower, and the least squares estimator performs best, with the homoskedastic triheckit lagging just behind.

With really high correlations (0.8) it seems that MLE outperforms most of the approximations. The biases of the approximations are of magnitude 0.10-0.20 in the presented cases (i.e. up to 40 percent of the true effects), with the heteroskedasticity corrected heckit and least squares performing best.

Finally, in general it seems as if the naïve probit and full MLE have a positive (small-sample) bias and that the approximate estimators have a negative bias when correlations are positive, and vice versa when correlations are negative. Furthermore, even when correlations are all equal we sometimes encounter large differences in the bias of $\gamma_1$ and $\gamma_2$. This seems peculiar as the model should be symmetric in these parameters. For the approximations it could reflect biases due to multicollinearity between the correction terms and the endogenous regressors (Nawata, 1994), but it also occurs to some extent for full MLE. It may also reflect an impact of the chosen design, where the effects of $y_2$ and $y_3$ have opposite signs.
We therefore conducted similar simulations where the effects of $y_2$ and $y_3$ had equal signs (not reported). We still found the mentioned asymmetries. Overall the same results are found, but it seems that the heteroskedastic triheckit performs even better in this scenario with very high correlation coefficients.

With respect to the correlation coefficients (cf. table A1 in the appendix, which presents results for $\rho_{12}$ and $\rho_{13}$, which are those of key interest) it seems that the approximations are rather biased, whereas full MLE is far closer to the true values. The least squares approximation in particular is very bad at recovering the correlations. Two exceptions are the biheckit and the heteroskedasticity corrected heckit. These estimators often have smaller bias than full MLE for the correlation coefficients. Similar results are obtained when the effects of $y_2$ and $y_3$ have equal signs. Therefore, the approximations give a good indication of the extent to which endogeneity or sample selection matters, by looking at whether predicted effects change, but they are not good for testing exogeneity or the absence of sample selection through tests on correlations being zero. For doing this one could use the biheckit or heteroskedastic heckit approximations as start values for a full MLE.

It is also worth noting that in all simulations the estimated coefficients for the exogenous explanatory variables (except the constant term) seem to be well estimated in the single equation probit model, irrespective of the severity of the endogeneity of $y_2$ and $y_3$. This is surprising (as bias in one parameter should run through to other parameters in nonlinear models) but may be due to the fact that all regressors are uncorrelated.

4.2 Alternative scenarios

In order to see how our simulation results are influenced by our set-up we have carried out further simulations where we have used larger samples and skewed distributions of the endogenous variable of interest, $y_1$.
We have used larger sample sizes to infer whether the bias of the MLE would eventually become negligible compared to the inconsistency of the approximations. Some results are outlined in tables A2 and A3. They do seem to suggest that in large samples the MLE tends to outperform the approximate estimators, although, in terms of bias of the parameters of interest, the least squares estimator in all cases, and the homoskedastic heckit estimators in the cases where ρ12 and ρ13 are of opposite sign, are not too far behind. The MLE of the correlation coefficients is much better than the estimates from the homoskedastic triheckit and the least squares approximations. It is interesting to note, however, that in large samples (as in small samples) the heteroskedasticity corrected trivariate heckit and the bivariate heckit again get quite close to the MLE for the estimates of the correlations. All multivariate estimators are far better than the naïve probit.

We also carried out simulations with a skewed distribution of y1 to see how the assumption of normality affects the estimates. More skewed samples are samples with a less symmetric distribution of y1, and they are obtained by setting the constant term in the generating equation for y1 to one rather than zero as in the above simulations. In this case the mean of y1 rises to 0.78. The results are reported in tables A4 and A5. Table A4 shows that all estimators are unbiased in skewed samples when all correlations are zero. In table A5 all correlations are 0.5. The results in this case are not clear-cut. For one of the parameters of interest, both the homoskedastic heckit estimators and the least squares estimator have a lower bias than MLE, whereas for the other parameters the heckit estimators are far more biased than the least squares estimator and MLE.
Finally we have plotted the bias of the various estimators against the condition number of the correlation matrix of the error terms in the simulations. This is done to illustrate the behaviour of the approximations as the correlations increase. We have left out results for the naïve probit estimator, as it always has a much larger bias than any of the other estimators. The relationship between the bias of the estimators and the condition number is illustrated in figures 1 and 2 below:

- figure 1 about here –

- figure 2 about here –

First note that the condition number may be high although the biases or inconsistencies of all estimators are quite low. This happens when the condition number is high due to a high correlation between e2 and e3 while e1 is uncorrelated with both e2 and e3. For relatively small condition numbers the approximate estimators perform as well as the MLE or better. For large condition numbers the MLE outperforms most of the approximate estimators, which become quite unreliable. This suggests that our approximations are quite reasonable to use when the correlations are low or modest, while they are more unreliable for high correlations. This confirms the overall previous result. Note however, that the heteroskedasticity corrected trivariate heckit and the bivariate heckit estimator do not become more unreliable as the condition number increases. This might suggest that these estimators could be useful for providing starting values for the MLE. Their usefulness is further supported by the fact that they also replicate the correlation coefficients quite well, even when correlations are high. Finally it is worth observing that in some simulations the MLE failed to converge, and hence no MLE estimates could be obtained. In contrast, our approximate estimators are always able to produce an estimate, although sometimes of questionable reliability.
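As a rough illustration of the condition number used in this comparison, the sketch below computes it for a 3x3 error correlation matrix. It assumes that the condition number is the ratio of the largest to the smallest eigenvalue (the text does not state which definition was used), and the function names are hypothetical. The eigenvalues are obtained from the standard trigonometric closed form for symmetric 3x3 matrices.

```python
import math

def correlation_eigenvalues(r12, r13, r23):
    """Eigenvalues (largest first) of [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]."""
    p1 = r12 ** 2 + r13 ** 2 + r23 ** 2
    if p1 == 0.0:
        return (1.0, 1.0, 1.0)   # identity matrix
    q = 1.0                       # trace / 3, since the diagonal is all ones
    p = math.sqrt(p1 / 3.0)       # scale of the off-diagonal part
    # Half the determinant of (A - q*I) / p; that determinant is 2*r12*r13*r23 / p**3.
    r = (r12 * r13 * r23) / p ** 3
    r = max(-1.0, min(1.0, r))    # guard against rounding just outside [-1, 1]
    angle = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(angle)
    smallest = q + 2.0 * p * math.cos(angle + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return (largest, middle, smallest)

def condition_number(r12, r13, r23):
    """Assumed definition: largest eigenvalue divided by smallest eigenvalue."""
    largest, _, smallest = correlation_eigenvalues(r12, r13, r23)
    return largest / smallest

# Equal correlations of 0.3 and 0.6, and a case where only e2 and e3 are correlated.
for rhos in [(0.3, 0.3, 0.3), (0.6, 0.6, 0.6), (0.0, 0.0, 0.95)]:
    print(rhos, round(condition_number(*rhos), 2))
```

Under this definition the last case has by far the largest condition number of the three even though e1 is uncorrelated with e2 and e3, which is the situation described above where a high condition number need not go together with a large bias.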
In sum our simulations indicate that in small samples the approximate estimators might prove useful when the correlations are small or moderately high, given the difficulties of obtaining the MLE. When correlations are very high the performance of the MLE becomes clearly better than that of most of the approximate estimators, but some still seem able to provide useful starting values for the MLE and information on whether endogeneity is present. Further, in some cases it is not possible to obtain the MLE, while this is always possible for the approximate estimators.

5. An application

In order to gain insight into how the approximations work with real data we consider whether a physician's advice to exercise more has any impact on the amount of physical activity that people undertake. We seek to control for potential endogeneity of receiving advice as well as for the fact that we have a selected sample. Our sample only includes information on physician advice for people having hypertension. The application therefore illustrates how to use the approximations in another type of model than that considered in the simulations (namely case five considered in section three: a probit model with a dummy endogenous variable and sample selection). The example is inspired by Kenkel and Terza (2001), who consider the importance of taking the endogeneity of physician advice into account with respect to alcohol consumption. They mention that sample selection might be a problem, but argue that it is likely a minor one as health problems are not the main reason people do not drink. With respect to exercise, we think it more likely that individuals start to exercise because of health problems, i.e. that a selection problem occurs; that is at least what we seek to control for. The data used for the application are an update of the data used by Kenkel and Terza (2001). They used the NHIS 1990 data, and we use NHIS 2003.
This was the latest NHIS survey available when this project was initiated. Lack of exercise is a serious public health problem that affects longevity and physical and psychological well-being, and it imposes a tremendous cost on society in the form of potentially lost productivity and increased treatment expenses due to e.g. obesity and related diseases (Erlichman, Kerbey & James, 2002a; 2002b; Eden et al., 2002). In contrast to many other health risk factors such as drinking and smoking, exercise has been targeted in relatively few ways, physician advice being one of them (public information campaigns are another possibility). Although economic support for doing exercise is becoming more widespread, countries do not tax lack of exercise or punish individuals in other ways, as we can by taxing alcohol or banning drunk driving. Physician advice is relatively costless at the individual level and, as opposed to public information campaigns, it is a very direct way to improve individual medical information. General research on the impact of medical information suggests that it plays an important role for good health behaviour, with a probably larger effect for the more highly educated (e.g. Kenkel, 1991; Glied and Lleras-Muney, 2003). However, existing evidence about whether physician advice really matters is sparse. The U.S. Preventive Services Task Force report concludes that the existing evidence is not sufficient to support unambiguously that doctors should spend more time on advising patients with respect to exercise (Eden et al., 2002). One problem is that randomized controlled trials focus on the impact of additional training in giving advice, and that it is hard to control for advice-giving in the control group or for previous training. In addition, one might question whether behaviour induced through clinical trials replicates the actual behaviour of individuals who are not being monitored.
Therefore, observational data are useful as a complementary source of evidence. However, to our knowledge no such studies exist. Now why may receipt of (or seeking) advice be endogenous to exercise? The reason is, of course, that individuals receive advice for a reason, i.e. they may in general be in poorer health than those who do not receive advice. However, another possibility is that those who receive advice exercise more than those who do not, because they are generally more cautious. Finally, there may be other factors that influence both whether an individual sees a physician and whether he exercises (e.g. being a health fanatic), creating a spurious correlation.

We have two measures of whether individuals exercise. We use the one based upon the question: “How often do you do VIGOROUS activities for AT LEAST 10 MINUTES that cause HEAVY sweating or LARGE increases in breathing or heart rate?” The answer is given as the number of times per week. The other measure relates to light or moderate activities. This includes some daily activities, and it may be more difficult to distinguish changes in behaviour from this than from the former question (Erlichman, Kerbey & James, 2002b).

5.1 Descriptive statistics

We use a sample of white men between the ages of 30 and 60. They are in a risk group for hypertension and still not too old to view vigorous exercise as a means to improve health. Initially we examined the impact of advice for both men and women, but as the instruments used for women are insignificant, we chose to focus on men. The sample consists of 7371 observations. However, only 1623 report they received physician advice about exercise because they have hypertension. The dependent variable is a dummy for exercising vigorously at least once per week. We do not include all the explanatory variables used in Kenkel and Terza (2001), since variables like income, work status and marital status are likely to be endogenous as well.
Therefore we only include age dummies, region of residence and dummies for being black and for being neither black nor white. Kenkel and Terza (2001) used dummies for having bought health insurance or being on Medicare as instruments for seeking advice. However, these are invalid if health insurance status affects exercise, e.g. through economic means or socio-economic position, something we find likely. Therefore we use other instruments, namely whether the individual within the last 12 months refrained from seeing a doctor because of too long a telephone queue or too long a waiting time at the clinic. We need an additional instrument, namely for the selection equation of having hypertension. For this we use a dummy taking the value one if anyone in the family has hypertension, stroke, diabetes or heart disease.

- Table 2 about here –

Descriptive statistics for our selected sample are presented in table 2. Relatively many, namely 66 percent, have received the advice that they ought to exercise more. On the other hand, given that the selected sample consists of people who have already been told they have hypertension, and that exercise is known to prevent hypertension, this number seems rather low, perhaps reflecting the reluctance of doctors to interfere in their patients' personal lives or a disbelief that the advice will be followed. We also see that 42 percent exercise vigorously at least once a week. The first three sets of columns show means and standard deviations for those who have not received advice, those who have received advice and those with missing information on advice. We see that those who received advice exercise more (39 versus 31 percent), but that those with missing information exercise even more frequently (44 percent).
On the one hand, this points towards a positive relationship between exercise and advice, reflecting either that advice works or an endogeneity effect; on the other hand, it suggests that this association may be subject to a sample selection problem.

- Table 3 about here –

5.2 Results

Table 3 presents the estimation results for the impact of advice on exercise from various probit models controlling for age, region and ethnicity. The naïve probit results in column two reveal a positive association between advice and exercise after adjustment for these covariates. The next four pairs of columns show results for three corrections that each account for potential endogeneity of advice and for sample selection. As described above, the bivariate heckit and the least squares approximations do not allow for correlation between unobservables, which the triheckit estimators do. Finally, only the hetero-heckit estimator takes heteroskedasticity into account. Before looking at the results, note that table 4 presents the auxiliary regressions used to calculate the correction terms. To keep things simple these are linear probability models for receiving advice and for having hypertension. The important point to note is whether the instruments have significant explanatory power. As seen, the instrument in the selection equation, family members having high blood pressure, is significant and its F-value is about 6. The instruments for advice are also jointly significant.

- Table 4 about here –

All the approximate estimators show that when correcting for sample selection and endogeneity, the coefficient on advice is greatly magnified. Note that the standard errors blow up by the same magnitude as the coefficients, leaving nearly all coefficients insignificant at conventional significance levels. We do note, though, that the selection correction term is significant. The many insignificant coefficients may reflect that the models have low power.
This is often observed in instrumental variable studies where the explanatory power of the instruments is weak. In our case, however, the explanatory power is not alarmingly low, with F-values for the instrumental variables above five (Staiger & Stock, 1997). On the basis of these results we cannot really judge whether there is truly an endogeneity problem or whether the results reveal a fundamental problem with the estimator. A comparison with maximum likelihood estimators will help to judge this. Table 5 therefore presents three maximum likelihood estimates, assuming joint normality. The first only accounts for endogeneity of advice, the second only accounts for sample selection, while the third is the full maximum likelihood estimator accounting for both endogeneity and sample selection. The model and the full likelihood function are presented in the appendix.

- Table 5 about here –

The bivariate probit estimates are very similar to the triheckit estimates, with very high coefficients on advice. Importantly, even though the maximum likelihood estimator is asymptotically efficient, the standard errors of the bivariate probit estimates are very high, as was the case for the heckit estimates. The full maximum likelihood estimates also have a much higher coefficient on advice than the naïve probit. Only the selection model has a smaller coefficient. If the selection model is estimated using only a simple heckit correction, similar results are found (low coefficient and low standard errors). We note that the models accounting for endogeneity and selection separately can be estimated with Stata using the “biprobit” and “heckprob” commands. As far as we are aware, no statistical package contains the full MLE for this model, and it is therefore estimated using maxlik in GAUSS 6.0. We experienced a couple of problems with the full MLE. First of all, it converged but the Hessian could not be inverted.
Therefore standard errors are not presented. Second, when changing the start parameters it converged to another local optimum. The likelihood function deviated from the first only in the fifth decimal, and one may question whether the accuracy of GAUSS is sufficient to distinguish between these cases. The estimate of the coefficient on advice is a bit smaller in the second optimum.

The probit coefficients do not yield evidence about the magnitude of the impact of receiving advice. We therefore calculate the average treatment effect (ATE) as:

$$\mathrm{ATE} = \frac{1}{n}\sum_{i=1}^{n}\left[\Phi(X_i\beta + \delta) - \Phi(X_i\beta)\right] \qquad (22)$$

where n is the size of the selected sample, δ is the coefficient on advice and X contains the other exogenous regressors. The ATE is presented for each estimator in the last row. The naïve probit estimator predicts that those who receive advice have an 8.5 percentage point higher probability of exercising. In comparison, the ATEs of the biheckit, triheckit, hetero-triheckit and least squares approximations are 38.8, 49.4, 30.3 and 22.8, respectively, while the ATEs of the maximum likelihood estimators biprobit, selection and full MLE are 37.1, 2.2 and 32.9, respectively (the ATE for the heckit estimator accounting only for selection is 0.086). The MLE estimates that account for endogeneity therefore agree with the approximate estimates that the naïve probit underestimates the effect of advice. One lesson from this application is that it is a data demanding task to account for both selection and endogeneity in discrete choice problems. This cannot be resolved by using the full maximum likelihood estimates. On the contrary, both the heckit estimates and the maximum likelihood estimates were uncertain, with high standard errors. They agree, however, that the naïve probit underestimated the effect of advice, and if anything, the heckit estimator circumvents the problem of a demanding likelihood function which may be more difficult to optimize.
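The ATE calculation in equation (22) can be sketched in a few lines. This is only an illustration under stated assumptions: the index values and the advice coefficient below are made-up numbers, not estimates from the paper, and the function names are hypothetical. For each individual the sketch takes the difference between the probit probability with the advice coefficient added to the index and the probability without it, and then averages over the sample.

```python
import math

def norm_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def average_treatment_effect(index_without_advice, advice_coef):
    """Mean over individuals of Phi(index_i + advice_coef) - Phi(index_i)."""
    n = len(index_without_advice)
    total = 0.0
    for xb in index_without_advice:
        total += norm_cdf(xb + advice_coef) - norm_cdf(xb)
    return total / n

# Hypothetical fitted indices X_i * beta for five individuals (made-up numbers).
xb = [-0.8, -0.3, 0.0, 0.4, 0.9]
ate = average_treatment_effect(xb, advice_coef=0.5)
print("ATE in percentage points:", round(100.0 * ate, 1))
```

Because the normal distribution function is nonlinear, the same coefficient gives a different probability change for each individual, which is why the effect is averaged over the selected sample rather than read off the coefficient.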
Therefore, the application did not reveal any substantial problems with the heckit approximations.

6. Conclusion

We have introduced several approximations of a binary normal model with two dummy endogenous regressors as an alternative to the more complex trivariate probit model. We considered the small sample properties of the approximations and showed that a standard probit model that does not account for endogeneity is severely biased in the presence of even moderate endogeneity. The approximations are less biased. This is particularly so for the heckit approximation and the least squares approximation when the degree of endogeneity is not too severe. When endogeneity is severe, the bias of both the least squares and the heckit approximations is not negligible, with the exception that a heteroskedasticity corrected heckit performs almost as well as full MLE. The heteroskedasticity corrected heckit also has a very small bias for the estimated correlation coefficients, where other heckit estimators and in particular the least squares estimator are severely biased. The heteroskedasticity corrected heckit approximation therefore provides a good starting point for tests of exogeneity or absence of sample selection. The heteroskedasticity corrected heckit also seems more robust to cases with a high degree of multicollinearity, defined as a high condition number for the correlation matrix. Through our application we show the importance of taking endogeneity of dummy variables into account and demonstrate that in this case the trivariate heckit estimator is a more useful tool for doing so than MLE. Taken all together, we view the heckit approximations as a good initial working tool, e.g. for initial estimates before doing full MLE.
In future research it would be beneficial to extend the current methods to even higher dimensions, perhaps to relate them more closely to the multinomial probit, or to examine the power of the test for exogeneity using the heteroskedasticity corrected heckit estimator.

Acknowledgements

We would like to thank Mads Meier Jæger and participants at the 28th Symposium for Applied Statistics in Copenhagen, 2006, and at the Health Economics Seminar, University of Southern Denmark, for useful comments.

References

Alvarez, R. M. and G. Glasgow (2000). Two-Stage Estimation of Nonrecursive Choice Models. Political Analysis 8 (2), 11-24.
Angrist, J. D. (2001). Estimation Of Limited Dependent Variable Models With Dummy Endogenous Regressors: Simple Strategies For Empirical Practice. Journal of Business and Economic Statistics 19 (1), 2-15.
Ashford, J. R. and R. R. Sowden (1970). Multivariate probit analysis. Biometrics 26, 535-546.
Bhattacharya, J., Goldman, D. and D. McCaffrey (2006). Estimating probit models with self-selected treatments. Statistics in Medicine 25, 389-413.
Bollen, K. A., Guilkey, D. K. and T. A. Mroz (1995). Binary Outcomes and Endogenous Explanatory Variables: Tests and Solutions with an Application to the Demand for Contraceptive Use in Tunisia. Demography 32 (1), 111-131.
Cappellari, L. and S. P. Jenkins (2003). Multivariate probit regression using simulated maximum likelihood. Stata Journal 3 (3), 278-294.
Carrasco, R. (2001). Binary choice with binary endogenous regressors in panel data: Estimating the effect of fertility on Female Labor Participation. Journal of Business & Economic Statistics 19 (4), 385-394.
Eden, K., T. Orleans, Mulrow, C. D., Pender, N. J., Teutsch, S. M. (2002). Does Counseling by Clinicians Improve Physical Activity? A Summary of the Evidence for the U.S. Preventive Services Task Force. Annals of Internal Medicine 137 (3), 208-215.
Erlichman, J., A. L. Kerbey & W. P. T. James (2002a).
Physical activity and its impact on health outcomes. Paper 1: the impact of physical activity on cardiovascular disease and all-cause mortality: an historical perspective. Obesity Reviews 3 (4), 257-271.
Erlichman, J., A. L. Kerbey & W. P. T. James (2002b). Physical activity and its impact on health outcomes. Paper 2: prevention of unhealthy weight gain and obesity by physical activity: an analysis of the evidence. Obesity Reviews 3 (4), 273-287.
Evans, W. N. and R. Schwab (1995). Finishing High School and Starting College: Do Catholic Schools Make a Difference? Quarterly Journal of Economics 110 (4), 941-974.
Fishe, R. P. H., R. P. Trost and P. Lurie (1981). Labor force earnings and college choice of young women: An examination of selectivity bias and comparative advantage. Economics of Education Review 1 (2), 169-191.
Fleming, M. F., M. P. Mundt, M. T. French, B. K. Lawton, M. L. Baier and E. A. Stauffacher (2001). Benefit-cost analysis of brief physician advice with problem drinkers in primary care practices. Medical Care 38 (1), 7-18.
Geweke, J. (1991). Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints. Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface, 571-578.
Glied, S. and A. Lleras-Muney (2003). Health inequality, education and medical innovation. NBER Working Paper 9738.
Hajivassiliou, V. (1990). Smooth Simulation Estimation of Panel Data LDV Models. Unpublished Manuscript.
Hajivassiliou, V., D. McFadden and P. Ruud (1996). Simulation of multivariate normal rectangle probabilities and their derivatives. Theoretical and computational results. Journal of Econometrics 72, 85-134.
Heckman, J. J. (1974). Shadow Prices, Market Wages, and Labor Supply. Econometrica 42 (4), 679-694.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models.
Annals of Economic and Social Measurement 15, 475-492.
Heckman, J. J. (1978). Dummy endogenous variables in a simultaneous equation system. Econometrica 46 (6), 931-959.
Horowitz, J. L. (1993). Semiparametric and Nonparametric Estimation of Quantal Response Models. In G. S. Maddala, C. R. Rao and H. D. Vinod (Eds.): Handbook of Statistics vol. 11, 45-72.
Keane, M. P. (1994). A Computationally Practical Simulation Estimator for Panel Data. Econometrica 62 (1), 95-116.
Lechner, M., Lollivier, S. and T. Magnac (2005). Parametric binary choice models. Chapter prepared for Matyas and Sevestre (Eds.): The Econometrics of Panel Data.
Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Econometric Society Monographs. Cambridge University Press.
Moffitt, R. A. (2001). Comment on Angrist: Estimation Of Limited Dependent Variable Models With Dummy Endogenous Regressors: Simple Strategies For Empirical Practice. Journal of Business and Economic Statistics 19 (1), 20-23.
Mroz, T. (1999). Discrete factor approximations in simultaneous equation models: Estimating the impact of a dummy endogenous variable on a continuous outcome. Journal of Econometrics 92, 233-274.
Nagakura, D. (2004). A note on the relationship of the ordered and sequential probit models to the multinomial probit model. Economic Bulletin 3 (40), 1-7.
Nawata, K. (1994). Estimation of sample selection bias models by the maximum likelihood estimator and Heckman's two-step estimator. Economics Letters 45, 33-40.
Nicoletti, C. and F. Peracchi (2001). Two-step estimation of binary response models with sample selection. Unpublished working paper.
O’Higgins, N. (1994). YTS, employment, and sample selection bias. Oxford Economic Papers 46, 605-628.
Poirier, D. J. (1980). Partial observability in bivariate probit models. Journal of Econometrics 12 (2), 209-217.
Staiger, D. and J. H. Stock (1997). Instrumental variables regression with weak instruments. Econometrica 65 (3), 557-586.
U.S.
Department of Health and Human Services (1996). Physical Activity and Health: A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion.
Vytlacil, E. and N. Yildiz (2006). Dummy endogenous variables in weakly separable models. Mimeo. Columbia University.
Weeks, M. and C. Orme (1996). The statistical relationship between bivariate and multinomial choice models. Mimeo. Cambridge University.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge MA: MIT Press.
Wynand, P. and B. van Praag (1981). The demand for deductibles in private health insurance: A probit model with sample selection. Journal of Econometrics 17, 229-252.
Yatchew, A. and Z. Griliches (1985). Specification Error in Probit Models. The Review of Economics and Statistics 67 (1), 134-139.
Zellner, A. and T. H. Lee (1965). Joint estimation of relationships involving discrete random variables. Econometrica 33 (2), 382-394.

Appendix 1. The correction terms in the trivariate heckit

The formulas needed for the trivariate Heckman correction are derived. Two formulas from Maddala (1983) are used repeatedly. They are:

$$E(\varepsilon_1 \mid \varepsilon_2 > h, \varepsilon_3 > k) = \rho_{12} M_{23} + \rho_{13} M_{32}, \qquad (*)$$
$$M_{ij} = (1-\rho_{23}^2)^{-1}(P_i - \rho_{23} P_j), \quad P_i = E(\varepsilon_i \mid \varepsilon_2 > h, \varepsilon_3 > k), \quad i = 2,3$$

and:

$$P(x>h, y>k)\,E(x \mid x>h, y>k) = \phi(h)\,[1-\Phi(k^*)] + \rho\,\phi(k)\,[1-\Phi(h^*)], \qquad (**)$$
$$h^* = \frac{h-\rho k}{\sqrt{1-\rho^2}}, \quad k^* = \frac{k-\rho h}{\sqrt{1-\rho^2}}, \quad \rho = \mathrm{cov}(x,y).$$

There are four terms in the correction, corresponding to the four combinations of $Y_2$ and $Y_3$. To derive these, the following change of variables is used: $z = -\varepsilon_2$, $v = -\varepsilon_3$, i.e.:

$$E(\varepsilon_1 \mid Y_2=1, Y_3=1) = E(\varepsilon_1 \mid \varepsilon_2 > -x_2\beta_2,\ \varepsilon_3 > -x_3\beta_3) = E(\varepsilon_1 \mid z < x_2\beta_2,\ v < x_3\beta_3)$$

We can now use (*) to get:

$$E(\varepsilon_1 \mid z < x_2\beta_2,\ v < x_3\beta_3) = -\rho_{12} M_{23} - \rho_{13} M_{32}, \quad M_{ij} = (1-\rho_{23}^2)^{-1}(P_i - \rho_{23} P_j),$$
$$P_2 = E(z \mid z < x_2\beta_2,\ v < x_3\beta_3), \quad P_3 = E(v \mid z < x_2\beta_2,\ v < x_3\beta_3).$$

In order to obtain the latter parts, we need to rearrange (**), here with $z = -x$ and $v = -y$:

$$P(x>h, y>k)\,E(x \mid x>h, y>k) = -P(z<-h, v<-k)\,E(z \mid z<-h, v<-k).$$
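We therefore get an adjusted version of (**) that can be used directly to obtain the $P_i$s in (*):

$$E(x \mid x<h, y<k) = -\frac{\phi(h)\left[1-\Phi\!\left(-\frac{k-\rho h}{\sqrt{1-\rho^2}}\right)\right] + \rho\,\phi(k)\left[1-\Phi\!\left(-\frac{h-\rho k}{\sqrt{1-\rho^2}}\right)\right]}{\Phi_2(h, k; \rho)}, \quad \rho = \mathrm{corr}(x,y)$$

Therefore:

$$P_2 = E(z \mid z < x_2\beta_2, v < x_3\beta_3) = -\frac{\phi(x_2\beta_2)\left[1-\Phi\!\left(-\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right] + \rho_{23}\,\phi(x_3\beta_3)\left[1-\Phi\!\left(-\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(x_2\beta_2, x_3\beta_3; \rho_{23})}$$

$P_3$ can be obtained from this expression by interchanging $x_2\beta_2$ and $x_3\beta_3$. Proceeding in the same fashion, we get:

$$E(\varepsilon_1 \mid Y_2=0, Y_3=1) = E(\varepsilon_1 \mid \varepsilon_2 < -x_2\beta_2,\ v < x_3\beta_3) = \rho_{12} M_{23} - \rho_{13} M_{32}, \quad M_{ij} = (1-\rho_{23}^2)^{-1}(P_i + \rho_{23} P_j),$$
$$P_2 = E(\varepsilon_2 \mid \varepsilon_2 < -x_2\beta_2,\ v < x_3\beta_3), \quad P_3 = E(v \mid \varepsilon_2 < -x_2\beta_2,\ v < x_3\beta_3).$$

Again the adjusted (**)-formula gives us, using $\mathrm{corr}(\varepsilon_2, v) = -\rho_{23}$:

$$P_2 = -\frac{\phi(x_2\beta_2)\left[1-\Phi\!\left(-\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right] - \rho_{23}\,\phi(x_3\beta_3)\left[1-\Phi\!\left(\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(-x_2\beta_2, x_3\beta_3; -\rho_{23})}$$

$$P_3 = -\frac{\phi(x_3\beta_3)\left[1-\Phi\!\left(\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right] - \rho_{23}\,\phi(x_2\beta_2)\left[1-\Phi\!\left(-\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(-x_2\beta_2, x_3\beta_3; -\rho_{23})}$$

The third correction is:

$$E(\varepsilon_1 \mid Y_2=1, Y_3=0) = E(\varepsilon_1 \mid z < x_2\beta_2,\ \varepsilon_3 < -x_3\beta_3) = -\rho_{12} M_{23} + \rho_{13} M_{32}, \quad M_{ij} = (1-\rho_{23}^2)^{-1}(P_i + \rho_{23} P_j),$$
$$P_2 = E(z \mid z < x_2\beta_2,\ \varepsilon_3 < -x_3\beta_3), \quad P_3 = E(\varepsilon_3 \mid z < x_2\beta_2,\ \varepsilon_3 < -x_3\beta_3).$$

The adjusted (**)-formula now gives us:

$$P_2 = -\frac{\phi(x_2\beta_2)\left[1-\Phi\!\left(\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right] - \rho_{23}\,\phi(x_3\beta_3)\left[1-\Phi\!\left(-\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(x_2\beta_2, -x_3\beta_3; -\rho_{23})}$$

$$P_3 = -\frac{\phi(x_3\beta_3)\left[1-\Phi\!\left(-\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right] - \rho_{23}\,\phi(x_2\beta_2)\left[1-\Phi\!\left(\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(x_2\beta_2, -x_3\beta_3; -\rho_{23})}$$

Finally, the last correction terms are:

$$E(\varepsilon_1 \mid Y_2=0, Y_3=0) = E(\varepsilon_1 \mid \varepsilon_2 < -x_2\beta_2,\ \varepsilon_3 < -x_3\beta_3) = \rho_{12} M_{23} + \rho_{13} M_{32}, \quad M_{ij} = (1-\rho_{23}^2)^{-1}(P_i - \rho_{23} P_j),$$
$$P_i = E(\varepsilon_i \mid \varepsilon_2 < -x_2\beta_2,\ \varepsilon_3 < -x_3\beta_3), \quad i = 2,3$$

where:

$$P_2 = -\frac{\phi(x_2\beta_2)\left[1-\Phi\!\left(\frac{x_3\beta_3-\rho_{23}x_2\beta_2}{\sqrt{1-\rho_{23}^2}}\right)\right] + \rho_{23}\,\phi(x_3\beta_3)\left[1-\Phi\!\left(\frac{x_2\beta_2-\rho_{23}x_3\beta_3}{\sqrt{1-\rho_{23}^2}}\right)\right]}{\Phi_2(-x_2\beta_2, -x_3\beta_3; \rho_{23})}$$

and $P_3$ follows by interchanging $x_2\beta_2$ and $x_3\beta_3$.

Appendix 2. Heteroskedasticity correction

To account for the heteroskedasticity inherent in Heckman corrections we derive the truncated variance in a trivariate normal distribution. Assuming joint normality of the errors:

$$(\varepsilon_1, \varepsilon_2, \varepsilon_3 \mid x_1, x_2, x_3) \sim N(0, 0, 0, 1, 1, 1, \rho_{12}, \rho_{13}, \rho_{23}).$$

Then we know $E(\varepsilon_1 \mid \varepsilon_2 > x_2, \varepsilon_3 > x_3)$ from the formulas in Maddala (1983), p. 282.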
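The truncated-mean formula (**) from Appendix 1, on which these results build, can be checked numerically. The following is a minimal sketch and not part of the original derivation: the function names and the chosen values of h, k and the correlation are hypothetical. It compares the closed form with a direct numerical integration of x times the standard normal density times P(y > k | x) over x > h.

```python
import math

def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def closed_form(h, k, rho):
    """Right-hand side of (**): P(x>h, y>k) * E(x | x>h, y>k)."""
    s = math.sqrt(1.0 - rho * rho)
    h_star = (h - rho * k) / s
    k_star = (k - rho * h) / s
    return pdf(h) * (1.0 - cdf(k_star)) + rho * pdf(k) * (1.0 - cdf(h_star))

def by_integration(h, k, rho, upper=12.0, steps=4000):
    """Same quantity by Simpson's rule over the interval [h, h + upper]."""
    s = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # x * phi(x) * P(y > k | x) for a standard bivariate normal pair
        return x * pdf(x) * (1.0 - cdf((k - rho * x) / s))

    width = upper / steps
    total = integrand(h) + integrand(h + upper)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(h + j * width)
    return total * width / 3.0

h, k, rho = 0.3, -0.5, 0.4
gap = abs(closed_form(h, k, rho) - by_integration(h, k, rho))
print("difference between closed form and integration:", gap)
```

With a zero correlation the closed form divided by P(x>h)P(y>k) reduces to the usual inverse Mills ratio, which is why the bivariate and trivariate corrections coincide when the two auxiliary errors are uncorrelated.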
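To find the truncated variance we use properties of the multivariate normal distribution and write $\varepsilon_1$ as a regression function of $\varepsilon_2$ and $\varepsilon_3$:

$$\varepsilon_1 = b_2\varepsilon_2 + b_3\varepsilon_3 + v, \quad v \sim N\!\left(0,\ \sigma_1^2 - \frac{\sigma_{12}^2\sigma_3^2 + \sigma_{13}^2\sigma_2^2 - 2\sigma_{12}\sigma_{13}\sigma_{23}}{\sigma_2^2\sigma_3^2 - \sigma_{23}^2}\right), \quad v \perp \varepsilon_2, \varepsilon_3$$
$$b_2 = \frac{\sigma_3^2\sigma_{12} - \sigma_{23}\sigma_{13}}{\sigma_2^2\sigma_3^2 - \sigma_{23}^2}, \quad b_3 = \frac{\sigma_2^2\sigma_{13} - \sigma_{23}\sigma_{12}}{\sigma_2^2\sigma_3^2 - \sigma_{23}^2}$$

Because we assumed unit variances, this simplifies to:

$$\varepsilon_1 = \frac{\rho_{12} - \rho_{23}\rho_{13}}{1-\rho_{23}^2}\,\varepsilon_2 + \frac{\rho_{13} - \rho_{23}\rho_{12}}{1-\rho_{23}^2}\,\varepsilon_3 + v, \quad v \sim N\!\left(0,\ 1 - \frac{\rho_{12}^2 + \rho_{13}^2 - 2\rho_{12}\rho_{13}\rho_{23}}{1-\rho_{23}^2}\right),$$
$$\mathrm{cov}(\varepsilon_2, v) = \mathrm{cov}(\varepsilon_3, v) = 0$$

It is easy to see that we obtain the truncated mean given in Maddala (1983) when inserting this in $E(\varepsilon_1 \mid \varepsilon_2 > x_2, \varepsilon_3 > x_3)$:

$$\begin{aligned}
E(\varepsilon_1 \mid \varepsilon_2 > x_2, \varepsilon_3 > x_3) &= E\!\left(\frac{\rho_{12} - \rho_{23}\rho_{13}}{1-\rho_{23}^2}\,\varepsilon_2 + \frac{\rho_{13} - \rho_{23}\rho_{12}}{1-\rho_{23}^2}\,\varepsilon_3 + v \,\Big|\, \varepsilon_2 > x_2, \varepsilon_3 > x_3\right) \\
&= \frac{\rho_{12} - \rho_{23}\rho_{13}}{1-\rho_{23}^2}\,P_2 + \frac{\rho_{13} - \rho_{23}\rho_{12}}{1-\rho_{23}^2}\,P_3 \\
&= \rho_{12}\,\frac{P_2 - \rho_{23}P_3}{1-\rho_{23}^2} + \rho_{13}\,\frac{P_3 - \rho_{23}P_2}{1-\rho_{23}^2} \\
&= \rho_{12}M_{23} + \rho_{13}M_{32}
\end{aligned}$$

where $P_2$ and $P_3$ are defined in appendix 1. We can obtain the variance in the same way:

$$\begin{aligned}
V(\varepsilon_1 \mid A) &= E(\varepsilon_1^2 \mid A) - E(\varepsilon_1 \mid A)^2 \\
&= E((b_2\varepsilon_2 + b_3\varepsilon_3 + v)^2 \mid A) - E(b_2\varepsilon_2 + b_3\varepsilon_3 + v \mid A)^2 \\
&= E(b_2^2\varepsilon_2^2 + b_3^2\varepsilon_3^2 + v^2 + 2b_2b_3\varepsilon_2\varepsilon_3 + 2b_2\varepsilon_2 v + 2b_3\varepsilon_3 v \mid A) - E(b_2\varepsilon_2 + b_3\varepsilon_3 + v \mid A)^2 \\
&= b_2^2E(\varepsilon_2^2 \mid A) + b_3^2E(\varepsilon_3^2 \mid A) + 2b_2b_3E(\varepsilon_2\varepsilon_3 \mid A) + E(v^2) \\
&\quad - b_2^2E(\varepsilon_2 \mid A)^2 - b_3^2E(\varepsilon_3 \mid A)^2 - 2b_2b_3E(\varepsilon_2 \mid A)E(\varepsilon_3 \mid A) \\
&= b_2^2V(\varepsilon_2 \mid A) + b_3^2V(\varepsilon_3 \mid A) + 2b_2b_3\,\mathrm{cov}(\varepsilon_2, \varepsilon_3 \mid A) + E(v^2)
\end{aligned}$$

where $b_2$ and $b_3$ are defined above. With this in hand, we can use Rosenbaum’s formulas (Maddala, 1983, p. XXX) to obtain the final result. For instance with $A = \{\varepsilon_2 > x_2, \varepsilon_3 > x_3\}$, we get:

$$E(\varepsilon_2 \mid A) = \left[\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\,\phi(x_3)(1-\Phi(x_2^*))\right] / F$$
$$E(\varepsilon_2^2 \mid A) = 1 + \left[x_2\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}^2\,x_3\phi(x_3)(1-\Phi(x_2^*)) + \rho_{23}\sqrt{\frac{1-\rho_{23}^2}{2\pi}}\,\phi(x_{23}^*)\right] / F$$
$$E(\varepsilon_2\varepsilon_3 \mid A) = \rho_{23} + \left[\rho_{23}\,x_2\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\,x_3\phi(x_3)(1-\Phi(x_2^*)) + \sqrt{\frac{1-\rho_{23}^2}{2\pi}}\,\phi(x_{23}^*)\right] / F$$
$$x_2^* = \frac{x_2 - \rho_{23}x_3}{\sqrt{1-\rho_{23}^2}}, \quad x_3^* = \frac{x_3 - \rho_{23}x_2}{\sqrt{1-\rho_{23}^2}}, \quad x_{23}^* = \sqrt{\frac{x_2^2 + x_3^2 - 2\rho_{23}x_2x_3}{1-\rho_{23}^2}}, \quad F = P(\varepsilon_2 > x_2, \varepsilon_3 > x_3) = \Phi_2(-x_2, -x_3; \rho_{23})$$

The conditional first and second moments of $\varepsilon_3$ are obtained from the first two formulas by interchanging $x_2$ and $x_3$.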
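Gathering results:

$$\begin{aligned}
V(\varepsilon_1 \mid \varepsilon_2 > x_2, \varepsilon_3 > x_3) &= b_2^2E(\varepsilon_2^2 \mid A) + b_3^2E(\varepsilon_3^2 \mid A) + 2b_2b_3E(\varepsilon_2\varepsilon_3 \mid A) + E(v^2) \\
&\quad - b_2^2E(\varepsilon_2 \mid A)^2 - b_3^2E(\varepsilon_3 \mid A)^2 - 2b_2b_3E(\varepsilon_2 \mid A)E(\varepsilon_3 \mid A) \\
&= b_2^2\left\{1 + \frac{1}{F}\left[x_2\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}^2\,x_3\phi(x_3)(1-\Phi(x_2^*)) + \rho_{23}\sqrt{\frac{1-\rho_{23}^2}{2\pi}}\,\phi(x_{23}^*)\right]\right\} \\
&\quad + b_3^2\left\{1 + \frac{1}{F}\left[x_3\phi(x_3)(1-\Phi(x_2^*)) + \rho_{23}^2\,x_2\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\sqrt{\frac{1-\rho_{23}^2}{2\pi}}\,\phi(x_{23}^*)\right]\right\} \\
&\quad + 2b_2b_3\left\{\rho_{23} + \frac{1}{F}\left[\rho_{23}\,x_2\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\,x_3\phi(x_3)(1-\Phi(x_2^*)) + \sqrt{\frac{1-\rho_{23}^2}{2\pi}}\,\phi(x_{23}^*)\right]\right\} \\
&\quad - b_2^2\left\{\frac{1}{F}\left[\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\,\phi(x_3)(1-\Phi(x_2^*))\right]\right\}^2 \\
&\quad - b_3^2\left\{\frac{1}{F}\left[\phi(x_3)(1-\Phi(x_2^*)) + \rho_{23}\,\phi(x_2)(1-\Phi(x_3^*))\right]\right\}^2 \\
&\quad - 2b_2b_3\,\frac{1}{F^2}\left[\phi(x_2)(1-\Phi(x_3^*)) + \rho_{23}\,\phi(x_3)(1-\Phi(x_2^*))\right]\left[\phi(x_3)(1-\Phi(x_2^*)) + \rho_{23}\,\phi(x_2)(1-\Phi(x_3^*))\right] \\
&\quad + 1 - \frac{\rho_{12}^2 + \rho_{13}^2 - 2\rho_{12}\rho_{13}\rho_{23}}{1-\rho_{23}^2}
\end{aligned}$$

Other conditional sets are derived in a manner similar to that outlined in appendix 1.

Appendix 3. Taylor expansions

We show that the bivariate heckit correction has the same first order Taylor expansion around $(\rho_{12}, \rho_{13}) = (0,0)$ as the trivariate conditional probability $P(Y_1=1 \mid Y_2=0, Y_3=0)$ of the multivariate probit under the assumption that $\rho_{23} = 0$. We also derive the first order Taylor expansion of the trivariate heckit. Starting with the latter:

$$\begin{aligned}
P(Y_1=1 \mid Y_2=0, Y_3=0) &= \Phi(x_1\beta_1 + \rho_{12}M_{23} + \rho_{13}M_{32}) \\
&\approx \Phi(x_1\beta_1) + \rho_{12}\,\frac{\partial\Phi(x_1\beta_1 + \rho_{12}M_{23} + \rho_{13}M_{32})}{\partial\rho_{12}}\Big|_{(\rho_{12},\rho_{13})=(0,0)} + \rho_{13}\,\frac{\partial\Phi(x_1\beta_1 + \rho_{12}M_{23} + \rho_{13}M_{32})}{\partial\rho_{13}}\Big|_{(\rho_{12},\rho_{13})=(0,0)} \\
&= \Phi(x_1\beta_1) + \rho_{12}M_{23}\,\phi(x_1\beta_1) + \rho_{13}M_{32}\,\phi(x_1\beta_1)
\end{aligned}$$

It is clear that if $\rho_{23} = 0$ (i.e. for the bivariate heckit correction), the same equation is obtained with the M-functions replaced by the standard inverse Mills ratios:

$$P(Y_1=1 \mid Y_2=0, Y_3=0) = \Phi(x_1\beta_1 + \rho_{12}\lambda_2 + \rho_{13}\lambda_3) \approx \Phi(x_1\beta_1) + \rho_{12}\,\lambda(x_2\beta_2)\,\phi(x_1\beta_1) + \rho_{13}\,\lambda(x_3\beta_3)\,\phi(x_1\beta_1).$$

Next we look at the trivariate multivariate probit probabilities. For simplicity we have only found the first order Taylor expansion under the assumption that $\rho_{23} = 0$. Using Bayes’ formula:

$$P(Y_1, Y_2, Y_3) = P(Y_2, Y_3 \mid Y_1)P(Y_1) = P(Y_2 \mid Y_1)P(Y_3 \mid Y_1)P(Y_1) = P(Y_1, Y_2)P(Y_1, Y_3)/P(Y_1)$$

i.e.

$$P(Y_1=1 \mid Y_2=0, Y_3=0) = \frac{P(Y_1=1, Y_2=0)\,P(Y_1=1, Y_3=0)}{P(Y_1=1)\,P(Y_2=0)\,P(Y_3=0)}.$$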
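The first order Taylor expansion of the bivariate heckit probability can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the index values are made-up, the function names are hypothetical, and the inverse Mills ratio for an outcome of zero is taken to be the mean of a standard normal error truncated from above at minus the index, which is one common sign convention and may differ from the one used in the paper. The sketch compares the exact probability with the linear approximation and shows that the gap shrinks roughly with the square of the correlations.

```python
import math

def pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def mills_y0(index):
    """Assumed convention: E(e | e <= -index) for a standard normal error e."""
    return -pdf(index) / cdf(-index)

def heckit_prob(a1, lam2, lam3, r12, r13):
    """Bivariate heckit probability Phi(a1 + r12*lam2 + r13*lam3)."""
    return cdf(a1 + r12 * lam2 + r13 * lam3)

def taylor_prob(a1, lam2, lam3, r12, r13):
    """First order expansion around (r12, r13) = (0, 0)."""
    return cdf(a1) + (r12 * lam2 + r13 * lam3) * pdf(a1)

def approximation_error(rho, a1=0.2, a2=0.5, a3=-0.3):
    """Absolute gap between the two when both correlations equal rho."""
    lam2, lam3 = mills_y0(a2), mills_y0(a3)
    exact = heckit_prob(a1, lam2, lam3, rho, rho)
    linear = taylor_prob(a1, lam2, lam3, rho, rho)
    return abs(exact - linear)

for rho in (0.01, 0.02, 0.04):
    print(rho, approximation_error(rho))
```

Doubling the correlations roughly quadruples the gap, as expected for a first order approximation, which is consistent with the approximations being reliable for low or moderate correlations only.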
With the latent variable structure we can write these probabilities in the usual way with the normal cdf evaluated at appropriate indices, which we for simplicity denote by a's here. The Taylor expansion of this is:

$$\begin{aligned}
P(Y_1 = 1 \mid Y_2 = 0, Y_3 = 0) &= \frac{\Phi(a_1, a_2)\,\Phi(a_1, a_3)}{\Phi(a_1)\Phi(a_2)\Phi(a_3)} \\
&\approx \frac{\Phi(a_1)\Phi(a_2)\,\Phi(a_1)\Phi(a_3)}{\Phi(a_1)\Phi(a_2)\Phi(a_3)} + \rho_{12}\left[\frac{\partial\Phi(a_1, a_2)}{\partial\rho_{12}}\,\frac{\Phi(a_1, a_3)}{\Phi(a_1)\Phi(a_2)\Phi(a_3)}\right]_{(\rho_{12}, \rho_{13}) = (0,0)} + \rho_{13}\left[\frac{\partial\Phi(a_1, a_3)}{\partial\rho_{13}}\,\frac{\Phi(a_1, a_2)}{\Phi(a_1)\Phi(a_2)\Phi(a_3)}\right]_{(\rho_{12}, \rho_{13}) = (0,0)}
\end{aligned}$$

Noting that the derivative of the bivariate distribution function with respect to the correlation is just the bivariate density, we get:

$$P(Y_1 = 1 \mid Y_2 = 0, Y_3 = 0) \approx \Phi(a_1) + \rho_{12}\,\phi(a_1)\frac{\phi(a_2)}{\Phi(a_2)} + \rho_{13}\,\phi(a_1)\frac{\phi(a_3)}{\Phi(a_3)} = \Phi(a_1) + \rho_{12}\,\phi(a_1)\lambda(a_2) + \rho_{13}\,\lambda(a_3)\phi(a_1),$$

i.e. the same as for the bivariate heckit.

We have plotted the Taylor approximation along with the true trivariate probabilities and the bivariate and trivariate heckit approximations. These are shown for three cases in appendix 4, figures 3-5. In most scenarios the trivariate heckit works well, but this is not the case for the bivariate heckit. Figure 3 shows a case where all are alike. Figure 4 shows a case where the bivariate heckit does not work well whereas the trivariate does (since $\rho_{23}$ is not zero), and finally figure 5 shows a case where the trivariate heckit does not work well. But of course this is only selective evidence.

Appendix 4. Figures of simulated probabilities.

Figure 3. Example where approximations work well.

Figure 4. Example where triheckit works well, but biheckit does not work well.

Figure 5. Example where triheckit does not work well.

Appendix 5. Additional Simulations.

Table A1. Simulation results. Correlation coefficients.
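Low correlations. Left panel: $\rho_{12} = 0;\ \rho_{13} = 0;\ \rho_{23} = 0$. Right panel: $\rho_{12} = .15;\ \rho_{13} = .15;\ \rho_{23} = .15$.

| Estimator | Left Bias $\rho_{12}$ | Left Bias $\rho_{13}$ | Left MSE $\rho_{12}$ | Left MSE $\rho_{13}$ | Right Bias $\rho_{12}$ | Right Bias $\rho_{13}$ | Right MSE $\rho_{12}$ | Right MSE $\rho_{13}$ |
|---|---|---|---|---|---|---|---|---|
| Tri-Heckit homo | 0.0002 | 0.0015 | 0.0500 | 0.0467 | 0.0255 | 0.0391 | 0.0509 | 0.0514 |
| Tri-Heckit hetero | 0.0034 | -0.0006 | 0.0440 | 0.0411 | 0.0095 | 0.0151 | 0.0396 | 0.0374 |
| OLS | 0.0055 | 0.0016 | 0.1012 | 0.0911 | 0.0814 | 0.0865 | 0.1056 | 0.0965 |
| Bivar heckit | 0.0024 | 0.0002 | 0.0476 | 0.0446 | 0.0137 | 0.0209 | 0.0472 | 0.0432 |
| MLE | 0.0050 | 0.0264 | 0.0350 | 0.0291 | -0.0470 | -0.0420 | 0.0288 | 0.0199 |

Small correlations. Left panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .3$. Right panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .3$.

| Estimator | Left Bias $\rho_{12}$ | Left Bias $\rho_{13}$ | Left MSE $\rho_{12}$ | Left MSE $\rho_{13}$ | Right Bias $\rho_{12}$ | Right Bias $\rho_{13}$ | Right MSE $\rho_{12}$ | Right MSE $\rho_{13}$ |
|---|---|---|---|---|---|---|---|---|
| Tri-Heckit homo | 0.0562 | 0.0741 | 0.0588 | 0.0560 | 0.0591 | -0.0533 | 0.0608 | 0.0587 |
| Tri-Heckit hetero | -0.0009 | 0.0035 | 0.0340 | 0.0299 | 0.0046 | 0.0006 | 0.0379 | 0.0355 |
| OLS | 0.1375 | 0.1425 | 0.1153 | 0.0992 | 0.1470 | -0.1341 | 0.1201 | 0.1179 |
| Bivar heckit | 0.0126 | 0.0165 | 0.0470 | 0.0377 | 0.0228 | -0.0160 | 0.0500 | 0.0515 |
| MLE | -0.0526 | -0.0604 | 0.0389 | 0.0286 | -0.0466 | 0.0303 | 0.0238 | 0.0217 |

Medium and mixed correlations. Left panel: $\rho_{12} = .6;\ \rho_{13} = 0;\ \rho_{23} = .3$. Right panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .95$.

| Estimator | Left Bias $\rho_{12}$ | Left Bias $\rho_{13}$ | Left MSE $\rho_{12}$ | Left MSE $\rho_{13}$ | Right Bias $\rho_{12}$ | Right Bias $\rho_{13}$ | Right MSE $\rho_{12}$ | Right MSE $\rho_{13}$ |
|---|---|---|---|---|---|---|---|---|
| Tri-Heckit homo | 0.0432 | 0.0540 | 0.0688 | 0.0629 | 0.2276 | -0.0614 | 0.0830 | 0.0574 |
| Tri-Heckit hetero | -0.0129 | -0.0097 | 0.0418 | 0.0346 | 0.1445 | -0.0597 | 0.0650 | 0.0404 |
| OLS | 0.0304 | 0.0294 | 0.1035 | 0.0905 | 0.3444 | -0.1950 | 0.1313 | 0.1333 |
| Bivar heckit | -0.0675 | -0.0605 | 0.0538 | 0.0463 | 0.1927 | -0.1554 | 0.0626 | 0.0707 |
| MLE | -0.0412* | -0.0242* | 0.0609* | 0.0648* | 0.0073 | -0.0061 | 0.0501 | 0.0289 |

Medium High correlations. Left panel: $\rho_{12} = .6;\ \rho_{13} = .6;\ \rho_{23} = .6$. Right panel: $\rho_{12} = .6;\ \rho_{13} = .6;\ \rho_{23} = .6$.

| Estimator | Left Bias $\rho_{12}$ | Left Bias $\rho_{13}$ | Left MSE $\rho_{12}$ | Left MSE $\rho_{13}$ | Right Bias $\rho_{12}$ | Right Bias $\rho_{13}$ | Right MSE $\rho_{12}$ | Right MSE $\rho_{13}$ |
|---|---|---|---|---|---|---|---|---|
| Tri-Heckit homo | 0.1706 | 0.1742 | 0.0677 | 0.0721 | 0.1859 | -0.1782 | 0.0748 | 0.0714 |
| Tri-Heckit hetero | 0.0392 | 0.0034 | 0.0572 | 0.0573 | 0.0371 | -0.0369 | 0.0611 | 0.0657 |
| OLS | 0.2170 | 0.1697 | 0.0922 | 0.0837 | 0.2406 | -0.2209 | 0.1024 | 0.0971 |
| Bivar heckit | 0.0215 | -0.0412 | 0.0383 | 0.0362 | 0.0609 | -0.0567 | 0.0479 | 0.0527 |
| MLE | 0.0287 | 0.0127 | 0.0388 | 0.0361 | -0.0037 | -0.0369 | 0.0336 | 0.0180 |

High correlations. Left panel: $\rho_{12} = .8;\ \rho_{13} = .8;\ \rho_{23} = .8$. Right panel: $\rho_{12} = .8;\ \rho_{13} = .8;\ \rho_{23} = .8$.

| Estimator | Left Bias $\rho_{12}$ | Left Bias $\rho_{13}$ | Left MSE $\rho_{12}$ | Left MSE $\rho_{13}$ | Right Bias $\rho_{12}$ | Right Bias $\rho_{13}$ | Right MSE $\rho_{12}$ | Right MSE $\rho_{13}$ |
|---|---|---|---|---|---|---|---|---|
| Tri-Heckit homo | 0.1763 | 0.1659 | 0.0354 | 0.0346 | 0.1675 | -0.1708 | 0.0354 | 0.0379 |
| Tri-Heckit hetero | 0.1490 | 0.0931 | 0.0347 | 0.0321 | 0.1242 | -0.1193 | 0.0361 | 0.0334 |
| OLS | 0.1809 | 0.1514 | 0.0378 | 0.0386 | 0.1768 | -0.1731 | 0.0381 | 0.0423 |
| Bivar heckit | 0.0744 | -0.0141 | 0.0264 | 0.0334 | 0.1102 | -0.0995 | 0.0341 | 0.0348 |
| MLE | 0.1437 | 0.1550 | 0.1586 | 0.1425 | 0.1713 | -0.1837 | 0.1387 | 0.1262 |

Notes: Repetitions = 200; n = 500; true coefficients on the two endogenous variables: 0.5 and -0.5.

Table A2. Simulation results for the coefficients on the two endogenous variables. Large samples.

Left panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$. Right panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 0.6327 | 0.5406 | 0.4031 | 0.2956 | 0.6299 | -0.5335 | 0.3999 | 0.2885 |
| Tri-Heckit homo | -0.1080 | -0.1112 | 0.0368 | 0.0406 | -0.1211 | 0.2102 | 0.0369 | 0.0699 |
| Tri-Heckit hetero | -0.2790 | -0.2882 | 0.0927 | 0.0972 | -0.3009 | -0.1118 | 0.1059 | 0.0186 |
| OLS | -0.1429 | 0.1385 | 0.0401 | 0.0328 | 0.1508 | -0.1623 | 0.0386 | 0.0315 |
| Bivar heckit | -0.2287 | -0.2309 | 0.0790 | 0.0764 | -0.2262 | 0.1082 | 0.0664 | 0.0376 |
| MLE | 0.0392 | 0.0132 | 0.0120 | 0.0062 | 0.0548 | -0.0834 | 0.0149 | 0.0184 |

Notes: Repetitions = 100; n = 2500; true coefficients on the two endogenous variables: 0.5 and -0.5.

Table A3. Simulation results. Correlation coefficients. Large samples.

Left panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$. Right panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$.

| Correlation | Estimator | Left Bias | Left MSE | Right Bias | Right MSE |
|---|---|---|---|---|---|
| $\rho_{12}$ | tri-Heckit homo | 0.1494 | 0.0353 | 0.1543 | 0.0356 |
| $\rho_{13}$ | tri-Heckit homo | 0.1477 | 0.0334 | -0.1419 | 0.0313 |
| $\rho_{12}$ | tri-Heckit hetero | -0.0596 | 0.0100 | -0.0487 | 0.0062 |
| $\rho_{13}$ | tri-Heckit hetero | -0.0488 | 0.0066 | 0.0921 | 0.0114 |
| $\rho_{12}$ | Bi-Heckit | 0.0527 | 0.0148 | 0.0722 | 0.0177 |
| $\rho_{13}$ | Bi-Heckit | 0.0550 | 0.0139 | -0.0557 | 0.0150 |
| $\rho_{12}$ | OLS | 0.2531 | 0.0821 | 0.2760 | 0.0959 |
| $\rho_{13}$ | OLS | 0.2577 | 0.0865 | -0.2561 | 0.0864 |
| $\rho_{12}$ | MLE | 0.0084 | 0.0060 | -0.0126 | 0.0075 |
| $\rho_{13}$ | MLE | 0.0245 | 0.0050 | 0.0204 | 0.0081 |
| $\rho_{23}$ | MLE | 0.0334 | 0.0028 | -0.0375 | 0.0030 |
| $\rho_{23}$ | OLS residuals | -0.2241 | 0.0506 | 0.2234 | 0.0501 |

Notes: Repetitions = 100; n = 2500.

Table A4. Simulation results for the coefficients on the two endogenous variables. Skew samples.

Left panel: $\rho_{12} = 0;\ \rho_{13} = 0;\ \rho_{23} = 0$. Right panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | -0.0047 | -0.0074 | 0.0224 | 0.0233 | 0.6507 | 0.5496 | 0.4525 | 0.3269 |
| Tri-Heckit homo | -0.0154 | -0.0326 | 0.1176 | 0.1572 | -0.0763 | -0.2480 | 0.1342 | 0.1920 |
| Tri-Heckit hetero | -0.0461 | -0.0852 | 0.1433 | 0.1385 | -0.2282 | -0.1971 | 0.1629 | 0.1550 |
| OLS | -0.0062 | -0.0317 | 0.0902 | 0.1179 | 0.0971 | -0.0681 | 0.0852 | 0.0864 |
| Bivar heckit | -0.0151 | -0.0328 | 0.1174 | 0.1569 | -0.0794 | -0.2479 | 0.1393 | 0.2010 |
| MLE | -0.0545 | -0.0121 | 0.1022 | 0.0721 | 0.1470 | 0.0603 | 0.1246 | 0.0682 |

Notes: Repetitions = 100; n = 500; true coefficients on the two endogenous variables: 0.5 and -0.5.

Table A5. Simulation results. Correlation coefficients. Skew samples.

Left panel: $\rho_{12} = 0;\ \rho_{13} = 0;\ \rho_{23} = 0$. Right panel: $\rho_{12} = 0.5;\ \rho_{13} = 0.5;\ \rho_{23} = 0.5$.

| Correlation | Estimator | Left Bias | Left MSE | Right Bias | Right MSE |
|---|---|---|---|---|---|
| $\rho_{12}$ | tri-Heckit homo | 0.0064 | 0.0488 | 0.1352 | 0.0701 |
| $\rho_{13}$ | tri-Heckit homo | 0.0173 | 0.0654 | 0.1617 | 0.0757 |
| $\rho_{12}$ | tri-Heckit hetero | 0.0035 | 0.0709 | -0.0139 | 0.0618 |
| $\rho_{13}$ | tri-Heckit hetero | 0.0322 | 0.0735 | 0.0026 | 0.0604 |
| $\rho_{12}$ | Bi-Heckit | 0.0065 | 0.0484 | 0.0475 | 0.0565 |
| $\rho_{13}$ | Bi-Heckit | 0.0178 | 0.0651 | 0.0758 | 0.0610 |
| $\rho_{12}$ | OLS | 0.0018 | 0.0983 | 0.2022 | 0.1108 |
| $\rho_{13}$ | OLS | 0.0293 | 0.1270 | 0.2531 | 0.1378 |
| $\rho_{12}$ | MLE | 0.0483 | 0.0381 | -0.0575 | 0.0422 |
| $\rho_{13}$ | MLE | 0.0125 | 0.0481 | -0.0290 | 0.0407 |
| $\rho_{23}$ | MLE | 0.0076 | 0.0050 | 0.0639 | 0.0113 |
| $\rho_{23}$ | OLS residuals | 0.0038 | 0.0027 | -0.2217 | 0.0508 |

Notes: Repetitions = 100; n = 500.

Appendix 6. The likelihood for the model of exercise, advice and sample selection

With $y_1$ being an indicator for vigorous exercise, $y_2$ an indicator for receiving advice on more exercise and $y_3$ an indicator for being in the sample (i.e. having hypertension), the full model is:

$$\begin{aligned}
y_1 &= 1(\delta y_2 + x_1\beta_1 + \varepsilon_1 > 0) \\
y_2 &= 1(x_2\beta_2 + \varepsilon_2 > 0) \\
y_3 &= 1(x_3\beta_3 + \varepsilon_3 > 0) \\
y_2 &\ \text{observed if } y_3 = 1
\end{aligned}$$

where constants have been included in the x's for convenience and $\delta$ denotes the coefficient on advice. The full likelihood for this model is:

$$L = \prod_{i:\,y_{3i} = 0} P(y_{3i} = 0)\ \prod_{h = 0,1}\ \prod_{k = 0,1}\ \prod_{i:\,y_{1i} = h,\ y_{2i} = k,\ y_{3i} = 1} P(y_{1i} = h,\ y_{2i} = k,\ y_{3i} = 1)$$

Under joint normality, the latter probabilities are given by the trivariate normal distribution function.

Table 1. Simulation results for the coefficients on the two endogenous variables.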
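Low correlations. Left panel: $\rho_{12} = 0;\ \rho_{13} = 0;\ \rho_{23} = 0$. Right panel: $\rho_{12} = .15;\ \rho_{13} = .15;\ \rho_{23} = .15$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 0.0040 | -0.0172 | 0.0161 | 0.0160 | 0.1934 | 0.1671 | 0.0525 | 0.0436 |
| Tri-Heckit homo | 0.0063 | -0.0220 | 0.1120 | 0.1188 | -0.0282 | -0.0802 | 0.1118 | 0.1292 |
| Tri-Heckit hetero | -0.0307 | 0.0155 | 0.0905 | 0.0959 | -0.0694 | 0.0066 | 0.1014 | 0.0744 |
| OLS | -0.0164 | -0.0021 | 0.0723 | 0.0785 | -0.0151 | 0.0213 | 0.0756 | 0.0644 |
| Bivar heckit | -0.0149 | -0.0010 | 0.0942 | 0.1011 | -0.0469 | -0.0163 | 0.0988 | 0.0813 |
| MLE | -0.0107 | -0.0437 | 0.0664 | 0.0640 | 0.0384 | 0.0301 | 0.0870 | 0.0407 |

Small correlations. Left panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .3$. Right panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .3$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 0.3817 | 0.3250 | 0.1606 | 0.1231 | 0.3785 | -0.3988 | 0.1572 | 0.1791 |
| Tri-Heckit homo | -0.0507 | -0.1473 | 0.1202 | 0.1417 | -0.0612 | 0.0312 | 0.1249 | 0.1268 |
| Tri-Heckit hetero | -0.1327 | 0.0301 | 0.1170 | 0.0534 | -0.1319 | 0.1129 | 0.1184 | 0.1178 |
| OLS | -0.0299 | 0.0589 | 0.0773 | 0.0498 | -0.0317 | 0.0054 | 0.0736 | 0.0820 |
| Bivar heckit | -0.0894 | -0.0039 | 0.1020 | 0.0583 | -0.0960 | 0.0719 | 0.1022 | 0.1060 |
| MLE | 0.0735 | 0.0638 | 0.0999 | 0.0500 | 0.0722 | -0.0540 | 0.0717 | 0.0406 |

Medium and mixed correlations. Left panel: $\rho_{12} = .6;\ \rho_{13} = 0;\ \rho_{23} = .3$. Right panel: $\rho_{12} = .3;\ \rho_{13} = .3;\ \rho_{23} = .95$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 0.3075 | 0.2409 | 0.1142 | 0.0809 | 0.9222 | -0.1844 | 0.8705 | 0.0549 |
| Tri-Heckit homo | -0.0494 | -0.1423 | 0.1261 | 0.1406 | -0.1374 | -0.0565 | 0.1004 | 0.1295 |
| Tri-Heckit hetero | -0.1179 | 0.0214 | 0.1182 | 0.0620 | -0.3546 | 0.1781 | 0.1869 | 0.0923 |
| OLS | -0.0378 | 0.0498 | 0.0833 | 0.0505 | 0.0329 | 0.0682 | 0.0347 | 0.0616 |
| Bivar heckit | -0.0829 | -0.0009 | 0.1056 | 0.0596 | -0.2036 | 0.0931 | 0.0959 | 0.0781 |
| MLE | 0.0802* | -0.006* | 0.0874* | 0.0549* | 0.1516 | -0.0642 | 0.1136 | 0.0409 |

Medium High correlations. Left panel: $\rho_{12} = .6;\ \rho_{13} = .6;\ \rho_{23} = .6$. Right panel: $\rho_{12} = .6;\ \rho_{13} = .6;\ \rho_{23} = .6$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 0.7978 | 0.6112 | 0.6570 | 0.3920 | 0.7961 | -0.8134 | 0.6551 | 0.6865 |
| Tri-Heckit homo | -0.0519 | -0.2968 | 0.0917 | 0.1911 | -0.0768 | 0.0471 | 0.0975 | 0.0970 |
| Tri-Heckit hetero | -0.2923 | 0.0567 | 0.1449 | 0.0707 | -0.2555 | 0.2403 | 0.1322 | 0.1266 |
| OLS | -0.0792 | 0.1175 | 0.0447 | 0.0583 | -0.0403 | 0.0184 | 0.0373 | 0.0394 |
| Bivar heckit | -0.2285 | -0.0062 | 0.1158 | 0.0752 | -0.2150 | 0.1997 | 0.1036 | 0.1013 |
| MLE | 0.0955 | 0.0464 | 0.0751 | 0.0358 | 0.1408 | -0.0809 | 0.0766 | 0.0358 |

High correlations. Left panel: $\rho_{12} = .8;\ \rho_{13} = .8;\ \rho_{23} = .8$. Right panel: $\rho_{12} = .8;\ \rho_{13} = .8;\ \rho_{23} = .8$.

| Estimator | Left Bias coef. 1 | Left Bias coef. 2 | Left MSE coef. 1 | Left MSE coef. 2 | Right Bias coef. 1 | Right Bias coef. 2 | Right MSE coef. 1 | Right MSE coef. 2 |
|---|---|---|---|---|---|---|---|---|
| Probit | 1.1761 | 0.7865 | 1.4074 | 0.6416 | 1.1756 | -1.1922 | 1.4122 | 1.4493 |
| Tri-Heckit homo | 0.1842 | -0.2804 | 0.0661 | 0.1210 | 0.1878 | -0.1983 | 0.0846 | 0.0900 |
| Tri-Heckit hetero | -0.1687 | 0.0106 | 0.0490 | 0.0489 | -0.1307 | 0.1100 | 0.0585 | 0.0643 |
| OLS | 0.1098 | 0.1256 | 0.0315 | 0.0419 | 0.2057 | -0.2203 | 0.0637 | 0.0739 |
| Bivar heckit | -0.2343 | -0.1849 | 0.0965 | 0.1049 | -0.1631 | 0.1415 | 0.0682 | 0.0680 |
| MLE | 0.0252 | 0.0843 | 0.0816 | 0.0762 | 0.0290 | 0.0572 | -0.0708 | 0.0827 |

Note: Each simulation consists of 500 draws from the data generating process. Estimates and standard deviations reported in the table are based on 200 simulations. True coefficients on the two endogenous variables are 0.5 and -0.5. Those marked with bold are the smallest for a given set of correlation coefficients. *: 5 % of the optimisations with MLE failed to converge.

Table 2. Descriptive statistics for entire sample and according to advice status.

| Variable | A = 0 Mean | A = 0 Std.dev | A = 1 Mean | A = 1 Std.dev | A = missing Mean | A = missing Std.dev | Total Mean | Total Std.dev |
|---|---|---|---|---|---|---|---|---|
| EXERCISE | 0.311 | 0.463 | 0.395 | 0.489 | 0.442 | 0.497 | 0.426 | 0.494 |
| ADVICE | 0.000 | 0.000 | 1.000 | 0.000 | | | 0.663 | 0.473 |
| 30 < AGE < 41 | 0.218 | 0.413 | 0.205 | 0.404 | 0.426 | 0.495 | 0.378 | 0.485 |
| 40 < AGE < 51 | 0.309 | 0.462 | 0.356 | 0.479 | 0.371 | 0.483 | 0.364 | 0.481 |
| 50 < AGE < 61 | 0.473 | 0.500 | 0.439 | 0.496 | 0.203 | 0.402 | 0.257 | 0.437 |
| BLACK | 0.161 | 0.368 | 0.190 | 0.392 | 0.111 | 0.314 | 0.126 | 0.332 |
| NOT WHITE/BLACK | 0.029 | 0.169 | 0.038 | 0.192 | 0.046 | 0.209 | 0.043 | 0.204 |
| NORTHEAST | 0.146 | 0.354 | 0.177 | 0.381 | 0.177 | 0.381 | 0.174 | 0.380 |
| MIDWEST | 0.274 | 0.447 | 0.244 | 0.430 | 0.239 | 0.426 | 0.242 | 0.428 |
| SOUTH | 0.410 | 0.492 | 0.405 | 0.491 | 0.347 | 0.476 | 0.360 | 0.480 |
| PHONE QUEUE | 0.020 | 0.141 | 0.031 | 0.172 | 0.014 | 0.119 | 0.017 | 0.130 |
| WAIT LONG | 0.055 | 0.228 | 0.052 | 0.222 | 0.034 | 0.181 | 0.038 | 0.192 |
| FAMILY HBP | 0.029 | 0.169 | 0.034 | 0.182 | 0.019 | 0.136 | 0.022 | 0.147 |
| N | 547 | | 1076 | | 5748 | | 7371 | |

Notes: A = 0 denotes those who did not receive advice and A = 1 those who did. In the total column, the ADVICE statistics are based on the 1623 observations for which advice is observed.

Table 3. Estimation results from various models of exercise.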
| Variable | PROBIT Estimate | PROBIT std.err | BIHECKIT Estimate | BIHECKIT std.err | TRIHECKIT Estimate | TRIHECKIT std.err | HET-HECKIT Estimate | HET-HECKIT std.err | OLS Estimate | OLS std.err |
|---|---|---|---|---|---|---|---|---|---|---|
| ADVICE | 0.233 | 0.069 | 1.351 | 1.403 | 3.612 | 3.636 | 2.620 | 2.853 | 0.666 | 1.171 |
| 40 < AGE < 51 | -0.255 | 0.088 | 0.231 | 0.242 | 0.216 | 0.241 | 0.172 | 0.184 | 0.189 | 0.231 |
| 50 < AGE < 61 | -0.346 | 0.084 | 1.277 | 0.671 | 1.256 | 0.671 | 0.987 | 0.511 | 1.133 | 0.662 |
| NORTHEAST | 0.114 | 0.110 | 0.218 | 0.135 | 0.211 | 0.134 | 0.148 | 0.103 | 0.224 | 0.132 |
| MIDWEST | -0.133 | 0.101 | 0.158 | 0.152 | 0.158 | 0.151 | 0.115 | 0.116 | 0.117 | 0.150 |
| SOUTH | -0.168 | 0.095 | 0.176 | 0.168 | 0.174 | 0.168 | 0.136 | 0.128 | 0.139 | 0.166 |
| BLACK | -0.202 | 0.087 | 0.352 | 0.269 | 0.343 | 0.270 | 0.273 | 0.205 | 0.338 | 0.270 |
| NOT WHITE/BLACK | -0.257 | 0.183 | -0.388 | 0.202 | -0.394 | 0.200 | -0.285 | 0.153 | -0.345 | 0.201 |
| CONSTANT | -0.132 | 0.110 | -6.074 | 2.312 | -10.618 | 4.231 | -8.126 | 3.265 | -5.637 | 2.434 |
| Correction1 | | | -0.684 | 0.860 | -2.000 | 2.152 | -1.401 | 1.689 | -0.433 | 1.172 |
| Correction2 | | | 3.350 | 1.372 | 10.701 | 4.443 | 8.327 | 3.385 | 5.673 | 2.520 |
| ATE | 0.085 | | 0.388 | | 0.494 | | 0.303 | | 0.228 | |

Notes: The estimators are described in the text. Correction1 and Correction2 are the generalized inverse Mills ratios for the heckit estimators and the OLS residuals for the OLS estimator. ATE is the average treatment effect.

Table 4. Auxiliary regressions used for the correction terms.

| Variable | Dependent: HBP, Estimate | Dependent: HBP, std.err | Dependent: ADVICE, Estimate | Dependent: ADVICE, std.err |
|---|---|---|---|---|
| FAMILY HBP | 0.078 | 0.032 | | |
| PHONE QUEUE | | | 0.195 | 0.090 |
| WAIT LONG | | | 0.009 | 0.058 |
| WAIT LONG × PHONE QUEUE | | | -0.299 | 0.159 |
| 40 < AGE < 51 | 0.082 | 0.011 | 0.044 | 0.033 |
| 50 < AGE < 61 | 0.261 | 0.012 | 0.000 | 0.031 |
| NORTHEAST | 0.022 | 0.015 | 0.039 | 0.041 |
| MIDWEST | 0.042 | 0.014 | -0.027 | 0.037 |
| SOUTH | 0.054 | 0.013 | -0.009 | 0.035 |
| BLACK | 0.098 | 0.014 | 0.048 | 0.031 |
| NOT WHITE/BLACK | 0.076 | 0.023 | -0.009 | 0.065 |
| CONSTANT | 0.068 | 0.012 | 0.638 | 0.037 |
| N | 7371 | | 1623 | |

Notes: HBP stands for high blood pressure. FAMILY HBP indicates whether someone in the family other than the respondent has high blood pressure.

Table 5. Maximum likelihood estimates for models of exercise.
| Variable | BIPROBIT Estimate | BIPROBIT std.err | PROBIT SELECTION Estimate | PROBIT SELECTION std.err | FULL MLE Estimate | FULL MLE std.err |
|---|---|---|---|---|---|---|
| ADVICE | 1.104 | 0.788 | 0.136 | 0.050 | 1.033 | na |
| 40 < AGE < 51 | -0.272 | 0.085 | 0.124 | 0.087 | -0.185 | na |
| 50 < AGE < 61 | -0.311 | 0.110 | 0.419 | 0.110 | -0.588 | na |
| NORTHEAST | 0.066 | 0.123 | 0.131 | 0.074 | 1.093 | na |
| MIDWEST | -0.093 | 0.111 | 0.034 | 0.071 | 0.216 | na |
| SOUTH | -0.142 | 0.102 | 0.038 | 0.069 | 0.135 | na |
| BLACK | -0.227 | 0.083 | 0.097 | 0.075 | -0.448 | na |
| NOT WHITE/BLACK | -0.293 | 0.175 | -0.161 | 0.127 | -0.373 | na |
| CONSTANT | -0.694 | 0.525 | -1.707 | 0.124 | -0.832 | na |
| rho12 | -0.565 | 0.559 | | | 0.892 | na |
| rho13 | | | 0.947 | 0.101 | -0.733 | na |
| ATE | 0.371 | | 0.022 | | 0.329 | |

Notes: "Biprobit" is a bivariate probit obtained with the biprobit command in Stata. "Probit Selection" is a probit model with selection obtained with the heckprob command in Stata. Full MLE is the model described in appendix 6 and is estimated using Gauss. The Hessian could not be inverted, so standard errors could not be obtained.

Figure 1. Bias of the coefficient on the first endogenous variable for the approximate estimators and MLE, plotted against the condition number of the correlation matrix. Series shown: trivariate heckit, heteroskedasticity-corrected trivariate heckit, bivariate heckit, OLS and MLE. Repetitions = 100; n = 500; true coefficients on the two endogenous variables: 0.5 and -0.5.

Figure 2. Bias of the coefficient on the second endogenous variable for the approximate estimators and MLE, plotted against the condition number of the correlation matrix. Series shown: trivariate heckit, heteroskedasticity-corrected trivariate heckit, bivariate heckit, OLS and MLE. Repetitions = 100; n = 500; true coefficients on the two endogenous variables: 0.5 and -0.5.
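The horizontal axis of Figures 1 and 2 is the condition number of the error correlation matrix. The text does not state which definition is used, so the sketch below assumes the usual 2-norm condition number, i.e. the ratio of the largest to the smallest eigenvalue of the 3x3 correlation matrix. The function name is illustrative, and the eigenvalues are obtained with the standard trigonometric formula for symmetric 3x3 matrices.

```python
import math


def correlation_condition_number(r12, r13, r23):
    """Ratio of largest to smallest eigenvalue of the 3x3 correlation matrix
    with off-diagonal elements r12, r13, r23 (assumed 2-norm definition)."""
    p1 = r12 * r12 + r13 * r13 + r23 * r23
    if p1 == 0.0:
        return 1.0  # identity matrix: all eigenvalues equal one
    # The trace is 3, so the eigenvalues are 1 + 2*p*cos(angle + 2*pi*k/3).
    p = math.sqrt(p1 / 3.0)
    r = (r12 * r13 * r23) / p ** 3  # equals det((R - I) / p) / 2
    r = max(-1.0, min(1.0, r))      # guard against rounding outside [-1, 1]
    angle = math.acos(r) / 3.0
    lam_max = 1.0 + 2.0 * p * math.cos(angle)
    lam_min = 1.0 + 2.0 * p * math.cos(angle + 2.0 * math.pi / 3.0)
    if lam_min <= 0.0:
        raise ValueError("not a positive definite correlation matrix")
    return lam_max / lam_min
```

For equal correlations of 0.5 the eigenvalues are 2 and 0.5, giving a condition number of 4, whereas a setting such as (.3, .3, .95) gives a value of about 42, which illustrates why settings with one correlation close to one sit far to the right in such plots.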