Testing procedures for robust estimators with non-parametric convergence rate

Yulia Kotlyarova
McGill University

May 12, 2003

Abstract

The logit estimator is consistent and efficient in the binary choice model under the appropriate assumptions; however, those assumptions may not be realistic in many applications, and their violation, such as some form of heteroskedasticity, will result in inconsistency of the logit estimator. It is thus important to verify whether the logit estimator is valid. This paper proposes a few formal tests for equivalence of the logit and its semiparametric counterparts, the original and smoothed maximum score estimators (Manski 1975, 1985; Horowitz 1992), whose rates of convergence are slower than $n^{-1/2}$. Rejection of the null hypothesis signals substantial, probably heteroskedastic, misspecification of the logit model. The test procedures can be grouped into Hausman-type tests (with the covariance matrix modified because of the difference in convergence rates) and Monte Carlo tests. The results of a simulation study confirm that computationally simple Hausman tests perform well in large data sets, but the Monte Carlo tests are needed to ensure the correct size of the test in smaller samples. We also consider the problem of self-selection biases in female labour supply models (Mroz 1987).

1 Introduction

For an applied econometrician, two obvious methods to estimate a linear-in-parameters binary choice model are logit and probit. Despite different underlying assumptions on the distribution of the error terms, these two estimators produce similar results in moderate-size applications; therefore the choice of method is often made on the basis of computational convenience. If the presence of heteroskedasticity is suspected, one modifies the covariance matrix of the logit/probit estimator according to White's (1982) formula, keeping in mind that the obtained maximum-likelihood estimator may be inconsistent. The willingness of econometricians to deal with a probably inconsistent estimator instead of utilizing numerous robust techniques, which yield consistent estimators under much less restrictive assumptions, can be explained by the analytical clarity, precision and computational ease of logit and probit. There exist, of course, misspecification tests for these fully parametric estimators. Davidson and MacKinnon (1984) use artificial linear regressions to test for omission of specified variables and heteroskedasticity of a known form. White (1982) performs the information matrix test to detect inconsistency of the standard covariance matrix estimators. However, these tests either target a narrow set of alternatives or do not suggest any alternative in case of misspecification. In this paper we propose several testing procedures which assess the appropriateness of the standard logit/probit estimators under the assumption of arbitrary heteroskedasticity in a cross-sectional framework. We compare the logit estimator to its semiparametric counterpart, the maximum score estimator, which is robust to heteroskedastic misspecification. A large discrepancy between the two estimators signals that the logit estimates are biased and unreliable. If the null hypothesis is not rejected, this does not imply the absence of misspecifications in the logit model. It rather indicates that existing (if any) deviations from the logit assumptions do not significantly affect the logit estimators for a given data set; therefore, one may proceed to work with the analytically simple logit model.
The maximum score (MS) estimator is a robust binary-response estimator that can accommodate an arbitrary form of heteroskedasticity. Its basic assumption concerns the conditional median of the error term, which must be equal to zero. The original MS estimator (Manski 1975, 1985) has a slow convergence rate, $n^{-1/3}$, and a non-normal asymptotic distribution (Kim and Pollard 1990). The smoothed MS estimator (Horowitz 1992, Kotlyarova and Zinde-Walsh 2002) has a faster convergence rate and is asymptotically normal. Several Monte Carlo studies (Manski and Thompson 1986, Horowitz 1992) indicated that the logit estimates have smaller mean-squared errors when the disturbances are homoskedastic (although they are inconsistent for non-logistic distributions), whereas under heteroskedastic errors the maximum score estimates are more reliable and the logit estimates can be badly biased. Horowitz (1993) and Charlier et al. (1995), among others, compared informally the estimates of the logit model and of the semiparametric maximum score estimator to detect heteroskedasticity, but no formal direct test has been used. In this paper, we compare the logit estimator, which is consistent and efficient under the null (the logit specification) but inconsistent under any alternative, and the maximum score estimator, which is inefficient but consistent for any model whose conditional median of the disturbance is zero.

We propose an array of testing procedures that can be grouped into Hausman-type tests (Hausman 1978) and Monte Carlo simulation tests (Dufour 1995). The idea of using the Hausman specification test to compare fully parametric and semiparametric estimators is not new. Newey (1987), for instance, applied it to the tobit and the censored LAD estimators. The distinctive feature of our test is that it compares estimators with different convergence rates, which leads to a significant simplification of the asymptotic covariance matrix of the difference of the estimators. The second group of tests utilizes Monte Carlo simulations to construct an exact finite-sample distribution of a similar minimum-distance statistic under the null. Both the original MS and the smoothed MS can be used. The simulation tests are computationally intensive but yield more reliable finite-sample results.

Section 2 describes the binary choice model, provides a brief discussion of the asymptotic behavior of the logit and the MS estimators together with the corresponding assumptions, and justifies our choice of normalization. Section 3 presents the Hausman-type tests and discusses different methods of covariance matrix estimation, such as asymptotic formulas and the bootstrap. The exact Monte Carlo testing procedures are the topic of Section 4. The results of a simulation study are presented in Section 5. In Section 6 we consider the problem of self-selection biases in female labour supply models (see Charlier et al. (1995), Kordas (2002)). We estimate and compare some parametric and semiparametric models of the labour force participation decision of women using the data set from Mroz (1987). Section 7 concludes.

2 Asymptotic properties of the estimators

Consider the binary-response linear-in-parameters model

$y_i = I(x_i'\beta + u_i), \quad i = 1, \ldots, n,$

where $I(z) = 1$ if $z > 0$ and $0$ otherwise, $x_i \in R^k$ is a random vector of explanatory exogenous variables, and $u_i$ is a scalar error term. We provide a summary of the asymptotic behavior and the corresponding assumptions of the three estimators considered: the logit, the original MS and the smoothed MS. The choice of normalization for the MS estimators is also discussed.
2.1 The logit model

Under the logit assumptions, the error terms are iid with the logistic cumulative distribution function $F(u) = e^u/(1 + e^u)$. The logit estimator is the maximum likelihood estimator; it is consistent and asymptotically efficient under correct specification:

$\sqrt{n}\,(b_{\log} - \beta_0) \to^d N(0, J^{-1}),$

where $J$ is the information matrix. Any deviation from the logistic homoskedastic distribution of the error term leads to inconsistency (Johnston and DiNardo 1997). Regularity conditions for the logit estimator include the existence and non-singularity of $E(xx')$.

2.2 The original maximum score estimator

The MS estimator maximizes the score function

$S_n(b) \equiv \frac{1}{n}\sum_{i=1}^{n} (2y_i - 1)\cdot \mathrm{sgn}(x_i'b), \qquad \hat{b}_{ms} = \arg\max_b S_n(b).$

The estimate $\hat{b}$ of $\beta$ can be identified only up to scale, and Manski (1975) uses the normalization $b'b = 1$. For $\hat{b}$ to be strongly consistent, the following assumptions are made (Manski 1985):

A1. There exists a unique $\beta \in B \subset R^k$ (where $B$ is a compact set) such that $\mathrm{median}(u|x) = 0$;

A2. (a) The support of the marginal distribution of $x$, $F_x$, is not contained in any proper linear subspace of $R^k$; (b) $0 < \Pr[y_i^* \ge 0\,|\,x] < 1$, a.e. $F_x$; (c) There exists at least one $j$ such that $\beta_j \ne 0$ and such that for almost every value of $x$ the distribution of $x_j$ conditional on all other regressors has everywhere positive Lebesgue density.

A3. $(y_i, x_i)$, $i = 1, \ldots, n$, is a random sample of $(y, x)$.

Under the given assumptions $\hat{b}_{ms}$ is consistent; the estimator converges at the rate $n^{-1/3}$ to the random variable that maximizes a certain Gaussian process with a quadratic drift (Kim and Pollard 1990). There are some indications that the traditional bootstrap does not work properly in this case (Delgado et al. 2001, Manski and Thompson 1986). However, Delgado et al. (2001) showed that valid inference for the maximum score estimator can be obtained using subsampling.

2.3 The smoothed MS estimator

Assume that the smoothing function $\psi$ is a continuously differentiable function with support in $[-1, 1]$. Let
(a) $\int \psi(w)\,dw > 0$;
(b) $\int w\,\psi(w)\,dw = 0$;
(c) the bandwidth parameter $\sigma_n \to 0$ and $\sigma_n n^{1/3} \to \infty$.
Now consider the smoothed MS estimator under a smoothing scheme $\psi$:

$\hat{b}_\psi = \arg\max_b S_n(b, \sigma_n)$, where $S_n(b, \sigma_n) \equiv \frac{1}{n}\sum_i \int (2y_i - 1)\cdot \mathrm{sgn}(x_i'b - v)\,\psi\!\left(\frac{v}{\sigma_n}\right) dv. \qquad (1)$

The estimator was proposed in 1992 by Horowitz (with a slightly different approach to smoothing), who obtained a faster than $n^{-1/3}$ convergence rate by imposing some smoothness assumptions on the distributions of $x$ and $u$. These assumptions have been relaxed in Kotlyarova and Zinde-Walsh (2002) to fit more general models. Under the assumptions of Kotlyarova and Zinde-Walsh (2002), or with undersmoothing under Horowitz's (1992) conditions, we obtain the following asymptotic distribution: $\sqrt{n\sigma_n}\,(\hat{b}_\psi - \beta_0) \to^d N(0, \Lambda)$, where $\Lambda$ is a finite positive definite matrix. A complete set of assumptions is given in the Appendix.

Since the semiparametric estimators are determined only up to scale, a normalization is needed to compare them to the logit estimators. One obvious choice is to set one of the coefficients to $\pm 1$ (Horowitz 1992). This kind of normalization can be problematic if the preset coefficient turns out to be insignificant (see, for example, Charlier et al. (1995)). An alternative normalization, in which the vector of coefficients has a unit norm, avoids this particular problem and will be used in our test procedures.
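To make (1) concrete, here is a minimal numerical sketch of evaluating and maximizing the smoothed score under the unit-norm normalization for k = 2. It is not the implementation used in this paper (the empirical work in Section 6 relies on simulated annealing); the fourth-order kernel of Section 5 is assumed, and the score is divided by $\sigma_n$, which does not change the maximizer.

```python
import numpy as np

def psi(w):
    """Fourth-order kernel with support [-1, 1] (the kernel used in Section 5)."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) <= 1,
                    (105 / 64) * (1 - 5 * w**2 + 7 * w**4 - 3 * w**6), 0.0)

def smoothed_score(b, y, X, sigma):
    """S_n(b, sigma)/sigma: average of (2y - 1) times the smoothed sign of x'b.
    The integral over v in (1) is evaluated numerically on the support of psi."""
    v = np.linspace(-sigma, sigma, 201)
    dv = v[1] - v[0]
    weights = psi(v / sigma) / sigma               # integrates to approximately 1
    t = X @ b
    smoothed_sign = (np.sign(t[:, None] - v[None, :]) * weights).sum(axis=1) * dv
    return np.mean((2 * y - 1) * smoothed_sign)

def sms_estimate(y, X, sigma, n_grid=2000):
    """Maximize the smoothed score over the unit circle ||b|| = 1 (k = 2 only)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    candidates = np.column_stack([np.cos(angles), np.sin(angles)])
    scores = [smoothed_score(b, y, X, sigma) for b in candidates]
    return candidates[int(np.argmax(scores))]
```

A grid over angles is only feasible for k = 2; in higher dimensions a random search of the kind described in Section 6 is needed.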
This normalization requires certain modifications of the logit asymptotics: we will work with the asymptotic distribution of the orthogonal transformation $\sqrt{n}\left(\frac{b_{\log}}{\|b_{\log}\|} - \frac{\beta_0}{\|\beta_0\|}\right)$ instead of $\sqrt{n}\,(b_{\log} - \beta_0)$. As long as $\|b\|$ is bounded away from 0 (Manski's 1985 assumption), $\frac{b}{\|b\|}$ is a continuous function and we can approximate the asymptotic distribution of $\sqrt{n}\left(\frac{b_{\log}}{\|b_{\log}\|} - \frac{\beta_0}{\|\beta_0\|}\right)$ using the delta method:

$\sqrt{n}\left(\frac{b_{\log}}{\|b_{\log}\|} - \frac{\beta_0}{\|\beta_0\|}\right) = \sqrt{n}\,\left.\frac{\partial}{\partial b'}\!\left(\frac{b}{\|b\|}\right)\right|_{\beta_0} (b_{\log} - \beta_0) + O(n^{-1/2}).$

Therefore,

$\sqrt{n}\left(\frac{b_{\log}}{\|b_{\log}\|} - \frac{\beta_0}{\|\beta_0\|}\right) \to^d N\!\left(0,\ \left.\frac{\partial}{\partial b'}\!\left(\frac{b}{\|b\|}\right)\right|_{\beta_0} J^{-1} \left[\left.\frac{\partial}{\partial b'}\!\left(\frac{b}{\|b\|}\right)\right|_{\beta_0}\right]'\right),$ where $\frac{\partial}{\partial b'}\!\left(\frac{b}{\|b\|}\right) = \frac{1}{\|b\|} I_k - \frac{1}{\|b\|^3}\, b b'.$

3 Hausman-type test

The null hypothesis states that $u_i$ is logistic, with cumulative distribution function $F(u) = e^u/(1 + e^u)$. Under the alternative $u_i$ is not logistic and possibly heteroskedastic; $\mathrm{median}(u_i|x_i) = 0$ for almost every $x_i$ under the alternative as well as under the null. If the errors are correctly specified as logistic and homoskedastic, the maximum likelihood estimator $b_{\log}$ is consistent and asymptotically efficient, whereas the SMS estimator is consistent but inefficient. Under the alternative, which represents possible deviations from the $H_0$ specification of the disturbances, the SMS is still consistent but the logit is not. Both the logit and the smoothed MS estimators are asymptotically normally distributed.

Let $\tilde{b}_\psi$ and $\tilde{b}_{\log}$ be the projections of the unit-normalized MS and logit estimators on the subspace orthogonal to $\beta_0$. A test of Hausman type can be based on the statistic

$(\tilde{b}_\psi - \tilde{b}_{\log})' \left[\mathrm{Var}(\tilde{b}_\psi - \tilde{b}_{\log})\right]^{-1} (\tilde{b}_\psi - \tilde{b}_{\log}),$

which is distributed as $\chi^2_{k-1}$ under the null. Since the smoothed MS estimator has a slower convergence rate than the logit estimator, the asymptotic variance of the difference, $\mathrm{Var}(\tilde{b}_\psi - \tilde{b}_{\log})$, will be equal to the asymptotic variance of the smoothed MS estimator. Although the standard result for the asymptotic variance of the difference of the estimators under the null still holds, i.e. $\mathrm{Var}(\tilde{b}_\psi - \tilde{b}_{\log}) = \mathrm{Var}(\tilde{b}_\psi) - \mathrm{Var}(\tilde{b}_{\log})$, one cannot expect this formula to provide asymptotic refinements over the simpler formula $\mathrm{Var}(\tilde{b}_\psi)$, because there is no guarantee that the second term of the linear expansion of the finite-sample variance of the smoothed MS estimator converges to zero faster than the asymptotic logit variance. Moreover, the relationship $\mathrm{Var}(\tilde{b}_\psi - \tilde{b}_{\log}) = \mathrm{Var}(\tilde{b}_\psi)$ ensures that the estimated variance of the difference is positive definite. Obviously, if the misspecification of the error term is minor (i.e. the errors are homoskedastic but not quite logistic, or the degree of heteroskedasticity is not significant), then the power of the test will be low. However, if the test does reject the null hypothesis, it should be considered an indication of a serious misspecification. So, the simplest test looks as follows:

Reject $H_0$ if $(\tilde{b}_\psi - \tilde{b}_{\log})' \left[\mathrm{Var}(\tilde{b}_\psi)\right]^{-1} (\tilde{b}_\psi - \tilde{b}_{\log}) > \chi^2_{k-1,1-\alpha}$. (Test 1)

In fact, it corresponds to testing whether the true value of the parameters is equal to the logit estimates. The asymptotic variance of the SMS estimator can be estimated using the formulas from Horowitz (1992), modified for our normalization (see Appendix). Although we may expect this test to be reasonably accurate for a large sample size (e.g., 10,000 observations for k = 2), its finite-sample performance is not reliable because of two related issues: the choice of bandwidth $\sigma_n$ for the SMS estimator and the accuracy of the SMS variance estimation.
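For reference, Test 1 can be sketched as follows. The function name, the use of the normalized logit estimate in place of $\beta_0$ when projecting, and the pseudo-inverse (which sidesteps an explicit change of basis to the $(k-1)$-dimensional subspace) are choices of this sketch, not of the paper.

```python
import numpy as np
from scipy import stats

def hausman_type_test(b_sms, b_logit, V_sms, alpha=0.05):
    """Test 1: quadratic form in the difference of the unit-normalized SMS and
    logit estimates, standardized by an estimate V_sms of the SMS variance.
    The difference is projected on the space orthogonal to the logit direction,
    so the projected variance has rank k - 1 and a pseudo-inverse is used."""
    b_s = b_sms / np.linalg.norm(b_sms)
    b_l = b_logit / np.linalg.norm(b_logit)
    k = b_l.size
    M = np.eye(k) - np.outer(b_l, b_l)        # projector orthogonal to b_l
    d = M @ (b_s - b_l)
    V = M @ V_sms @ M.T
    stat = float(d @ np.linalg.pinv(V) @ d)
    critical = stats.chi2.ppf(1 - alpha, df=k - 1)
    return stat, critical, stat > critical
```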
It has been demonstrated in the Monte Carlo study of Horowitz (1992) that the size of the t-test on the coefficients is very sensitive to the choice of bandwidth, and even with an asymptotically optimal bandwidth it differs from the nominal level for moderate sample sizes. The estimated variance tends to be too small, and that leads to overrejection of the null hypothesis.

We can bootstrap the SMS. Horowitz (1996) has proved that bootstrapping the SMS estimator provides asymptotic refinements under a very restrictive set of assumptions (additional smoothness of the x's; a higher-order smoothing function), but the bootstrap yields more accurate results in our setup as well. The idea is to sample with replacement from the original data set $(Y, X)$, each time estimating $(b^i_\psi - b_\psi)' \left[\mathrm{Var}(b^i_\psi)\right]^{-1} (b^i_\psi - b_\psi)$. By sorting these new statistics in ascending order and finding the $(1-\alpha)$th percentile, we obtain a corrected critical value for the original statistic.

Reject $H_0$ if $(\tilde{b}_\psi - \tilde{b}_{\log})' \left[\mathrm{Var}(\tilde{b}_\psi)\right]^{-1} (\tilde{b}_\psi - \tilde{b}_{\log}) > Q^{bootstrap}_{1-\alpha}$. (Test 2)

The Hausman-type statistic in our case reduces to testing whether the linear coefficients in the SMS model are equal to a given $\beta_0$, with $\beta_0 = b_{\log}$. Both testing procedures are not logit-specific. They can be applied to any binary choice, linear-in-parameters estimator with the following properties under correct specification: consistency; a parametric convergence rate; the model being nested within the smoothed maximum score framework.

The power of the test depends on the magnitude of the logit bias under the alternative hypothesis and on the variances of the logit and the SMS estimators. The power will be high if the variances of both the logit and the SMS are small relative to the bias of the logit estimator. The test will also have some power if, under the alternative, the variance of the smoothed estimator is much smaller than the variance of the logit. The test will not be informative if the bias and the variance of the logit estimator are small relative to the SMS variance.

In another version of the Hausman-type test the bootstrap is used to evaluate the sample variance-covariance matrix. By sampling pairs $(Y, X)$ with replacement we accumulate $m$ estimates $b^*_\psi$ and find the bootstrapped variance $\mathrm{Var}_{bootstrap}(\tilde{b}_\psi) = m^{-1}\sum_{i=1}^m (\tilde{b}^*_{\psi i} - \tilde{b}_\psi)(\tilde{b}^*_{\psi i} - \tilde{b}_\psi)'$. Then the normalized difference of the estimators is compared to the standard $\chi^2$ distribution:

Reject $H_0$ if $(\tilde{b}_\psi - \tilde{b}_{\log})' \left[\mathrm{Var}_{bootstrap}(\tilde{b}_\psi)\right]^{-1} (\tilde{b}_\psi - \tilde{b}_{\log}) > \chi^2_{k-1,1-\alpha}$. (Test 3)

Since the bootstrapped statistic is not pivotal, there are no asymptotic refinements relative to Test 1, and there is an obvious disadvantage of longer computational time. Nevertheless, the simulation results show that the bootstrapped variance is much closer to the true value of the covariance matrix than the estimate based on the asymptotic formula. The importance of this distinction will be demonstrated in the next section.

Until now, we have been discussing test procedures that use only point estimators of the logit model. This information will suffice for a very large data set, but not for a sample size of several hundred observations, where deviations from the asymptotic distribution are usually quite substantial. One attempt to account for finite-sample specifics was made in Test 2, in which we estimate the entire distribution of the standardized smoothed MS estimator on the basis of $(Y, X)$ pairs, without any knowledge of the hypothesized null distribution. We should be able to develop more precise tests if we fully use the information that the errors are logistic under the null; this is done via Monte Carlo simulation tests.
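Before turning to those, the two bootstrap ingredients used in Tests 2 and 3 (and reused in Test 5 below) can be sketched as follows; the interfaces (a user-supplied statistic(y, X) callable for Test 2 and the sms_estimate sketch above for Test 3) are assumptions of this illustration rather than code from the paper.

```python
import numpy as np

def bootstrap_critical_value(y, X, statistic, B=99, alpha=0.05, seed=0):
    """Test 2: resample (Y, X) pairs with replacement, recompute the quadratic-form
    statistic centred at the original SMS estimate, and use the (1 - alpha)
    percentile of the B bootstrap statistics as the corrected critical value."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        draws.append(statistic(y[idx], X[idx]))
    return np.percentile(draws, 100 * (1 - alpha))

def bootstrap_variance(y, X, b_sms, sigma, m=99, seed=0):
    """Test 3 (and the inner cycle of Test 5): non-parametric bootstrap covariance
    of the normalized SMS estimator, centred at the original estimate b_sms."""
    rng = np.random.default_rng(seed)
    n = len(y)
    deviations = []
    for _ in range(m):
        idx = rng.integers(0, n, n)
        deviations.append(sms_estimate(y[idx], X[idx], sigma) - b_sms)
    deviations = np.asarray(deviations)
    return deviations.T @ deviations / m
```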
4 Monte Carlo test

The MC simulation tests are exact tests in the sense that we can control the actual size of the test. The general approach is to construct the entire distribution of the statistic under the null and to determine the critical value from the constructed distribution. We consider the same statistic, $S = (\tilde{b}_\psi - \tilde{b}_{\log})' \Theta^{-1} (\tilde{b}_\psi - \tilde{b}_{\log})$, where $\Theta$ is an estimate of $\mathrm{Var}(\tilde{b}_\psi - \tilde{b}_{\log})$. In all versions of the Monte Carlo tests proposed in this paper we estimate the distribution of the statistic $S$ under the null by generating new samples with the original independent variables $X$ and new logistic disturbances $u$, and calculating the value of $S$ for each of these samples. The null distribution of the statistic depends on the unknown parameter $b$, since $Y^{new}$ is equal to $I(X^{old} b + u^{new})$. The only estimator of $b$ which is appropriate for this procedure is the logit estimator before normalization. Since the nuisance parameter is replaced by its consistent (under the null) estimator, Dufour (1995) classifies this test as a local Monte Carlo test, which ensures the correct significance level as the sample size $n \to \infty$. If the local Monte Carlo test fails to reject, then the exact test based on the true value of $b$ will not reject either. The procedure is similar to parametric bootstrapping with one important exception: the Monte Carlo test does not require the number of replications to go to infinity in order to be valid, whereas bootstrap tests do (Dufour and Khalaf 2001). The Monte Carlo procedure includes the following steps (a code sketch of this loop is given below):

1. generate N replications of the statistic S;
2. find the number K of simulated statistics that are greater than or equal to the original statistic $S_0$;
3. calculate the p-value of $S_0$ as $p(S_0) = \frac{K+1}{N+1}$;
4. reject the null if $p(S_0) \le \alpha$, the significance level.

An increase in the number of replications improves the power of the test but is not crucial for the size of the test (Dufour 1995).

The simplest Monte Carlo test (Test 4) would use the estimator of the asymptotic variance of the smoothed MS estimator as its standardizing matrix $\Theta$. However, as our simulation results have shown, this test may not have any power for some heteroskedastic alternatives with sample sizes of 250-1000. It has been known (see Horowitz 1992) that the asymptotic estimator of the semiparametric variance is too small relative to the true variance even in medium-size applications. That is why the asymptotic Test 1 in the previous section tends to overreject the null. This can be easily corrected by the non-parametric bootstrap in Test 2, when the pairs $(X, Y)$ are sampled with replacement. The problem arises when the asymptotic variance estimator under some heteroskedastic alternative is close to the true variance, whereas the asymptotic estimator in the generated logistic samples significantly (say, 5 times) underestimates the actual value. This underestimation will cause the critical values of the Monte Carlo test to be extremely large, so that the discrepancy between the logit and the SMS under some heteroskedastic misspecification will not be detected.
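A minimal sketch of the outer Monte Carlo cycle shared by Tests 4-6; the compute_S callable, which hides the choice of $\Theta$, is an assumed interface.

```python
import numpy as np

def local_mc_pvalue(y, X, b_logit_raw, compute_S, N=99, seed=0):
    """Local Monte Carlo test: simulate the null distribution of S by keeping X
    fixed, drawing new standard logistic errors (the null CDF e^u / (1 + e^u)),
    and setting y_new = 1{X b_logit_raw + u_new > 0}, where b_logit_raw is the
    un-normalized logit estimate standing in for the nuisance parameter b."""
    rng = np.random.default_rng(seed)
    S0 = compute_S(y, X)
    index = X @ b_logit_raw
    K = 0
    for _ in range(N):
        u_new = rng.logistic(loc=0.0, scale=1.0, size=len(y))
        y_new = (index + u_new > 0).astype(int)
        K += compute_S(y_new, X) >= S0
    return (K + 1) / (N + 1)          # reject the null if this p-value is <= alpha
```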
The more reliable Test 5 procedure uses a bootstrapped version of the SMS variance, the same as in Test 3. In order to find the variance $\Theta$, the data set $(X, Y)$ is sampled with replacement to form $m$ new samples of the original size. The estimator of the variance is $\Theta = m^{-1}\sum_{i=1}^m (\tilde{b}^*_{\psi i} - \tilde{b}_\psi)(\tilde{b}^*_{\psi i} - \tilde{b}_\psi)'$. The performance of this estimator is not as sensitive to the distribution of the disturbances as that of the asymptotic estimator. Thus, this Monte Carlo test consists of two layers of simulations: (i) the outer cycle, which generates new dependent variables for a given set of regressors $X$ by introducing new logistic disturbances, and (ii) the inner cycle of sampling with replacement of both dependent and independent variables. In the outer cycle new values of the statistic $S$ are generated, while the inner cycle estimates the variance $\Theta$. The test is very computationally intensive, since it requires $m \times (N+1)$ estimations of the SMS instead of one estimation in Test 1. At the same time, it is more reliable for small data sets.

The Monte Carlo technique does not require any knowledge of the asymptotic properties of the statistic in question (Dufour 1995); therefore the same test (Test 6), with some modifications, can be applied to compare the logit estimates with the original Manski estimator. There are indications that the bootstrap does not work in its traditional form when applied to the maximum score estimator. Delgado et al. (2001) have demonstrated that the subsampling procedure is valid. Let $d$ ($d < n$) denote the size of the subsets of the original data set $(X, Y)$, sampled without replacement. As long as $d/n \to 0$ and $d \to \infty$ when $n \to \infty$, the bootstrap estimation based on these subsamples is valid for the estimation of the MS variance. Although the authors recommend an additional layer of bootstrapping to determine the optimal size of the subsamples for test procedures, it does not seem to be necessary for the first-order approximation of the variance that we need in this case. The procedure of the outer cycle is the same as in Test 5.

Table 1 summarizes our discussion of the various test procedures for the statistic $S = (\tilde{b}_{semipar} - \tilde{b}_{\log})' \Theta^{-1} (\tilde{b}_{semipar} - \tilde{b}_{\log})$:

Table 1. Summary of testing procedures

Type          Name     Estimator   Θ = Var(b̃_semipar)      Critical point
Hausman:      Test 1   SMS         asymptotic               χ²(1−α)
              Test 2   SMS         asymptotic               non-param. bootstrap
              Test 3   SMS         non-param. bootstrap     χ²(1−α)
Monte Carlo:  Test 4   SMS         asymptotic               MC simulation
              Test 5   SMS         non-param. bootstrap     MC simulation
              Test 6   MS          subsampling              MC simulation

There exists another problem: the finite-sample bias of the SMS estimator and the sensitivity of the SMS estimates to the choice of bandwidth. Usually, bootstrapping helps to reduce the bandwidth sensitivity. Although undersmoothing is supposed to eliminate the asymptotic bias, finite-sample bias may remain. One way of reducing it is to use a smoothing function of a higher order. However, this will not work if a number of smoothness conditions on the distribution of $u$ and $x$ are not met. Another option is to apply a combination of orthogonal low-order smoothed estimates (Kotlyarova and Zinde-Walsh 2002): the advantages include weaker distributional assumptions, reduced bandwidth sensitivity and improved accuracy.

In the next section we study the finite-sample properties of Tests 1-6 in a Monte Carlo study.
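Before doing so, note that since Test 6 only needs a first-order approximation of the MS variance, a simple subsampling estimator might look as follows; the rescaling of the subsample deviations by $(d/n)^{2/3}$, which reflects the cube-root rate, as well as the function names, are assumptions of this sketch rather than formulas stated in the paper.

```python
import numpy as np

def subsampling_variance(y, X, b_ms, ms_estimate, n_sub=200, seed=0):
    """Covariance of the normalized original MS estimator via subsampling:
    draw n_sub subsets of size d = n**0.8 without replacement (the subsample
    size used in Section 5) and rescale the deviations from the full-sample
    estimate b_ms according to the n^{-1/3} convergence rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    d = int(n ** 0.8)
    deviations = []
    for _ in range(n_sub):
        idx = rng.choice(n, size=d, replace=False)
        deviations.append(ms_estimate(y[idx], X[idx]) - b_ms)
    deviations = np.asarray(deviations)
    return (d / n) ** (2 / 3) * (deviations.T @ deviations) / n_sub
```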
5 Simulation results

We evaluate the performance of the testing procedures in a small-scale Monte Carlo study. Similarly to Horowitz (1992), we work with the model

$y = 1$ if $\beta_1 x_1 + \beta_2 x_2 + u \ge 0$, and $y = 0$ otherwise,

where the true value of $\beta$ is $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, $x_1 \sim N(0, 1)$, and $x_2 \sim N(1, 1)$. Two conditional distributions of the error term $u$ are considered:

Distribution L: $u \sim$ logistic with median 0 and variance 1/2, and
Distribution H: $u = 0.25(1 + 2z^2 + z^4)v$, where $z = x_1 + x_2$ and $v \sim L$.

We used a standard fourth-order kernel $\psi(x) = \frac{105}{64}(1 - 5x^2 + 7x^4 - 3x^6)$ to estimate the SMS parameters.

Table 2 contains the percentage of rejections of the null hypothesis $H_0: \beta_0 = b_{\log}$ under distribution L; it describes the actual size of Tests 1-6. In Table 3 we report the rejection level of the null hypothesis under distribution H, that is, the power of the tests against the heteroskedastic alternative. The sample sizes in the experiment are 250, 500 and 1000. We perform 1000 replications per distribution and sample size. The size and power are reported for three significance levels $\alpha$: 10%, 5% and 1%. Following Davidson and MacKinnon (1998), we also construct empirical distributions of the p-values of the statistics; these describe the global accuracy of the tests. The 95% confidence intervals of the results were approximated according to the formula in Johnston and DiNardo (1997): $CI = 3.92\sqrt{\alpha(1-\alpha)/N}$, where $N$ is the number of replications and $\alpha$ is the actual significance level. Thus, the interval is 0.037 for $\alpha = 0.1$, 0.027 for significance level 0.05, and 0.012 for $\alpha = 0.01$.

Additional parameters of the testing procedures are as follows:
Test 2: the bootstrap critical values are determined by sampling with replacement from the original data set and computing the corresponding statistic 99 times;
Test 3: the variance of the SMS estimator is computed as the sample variance of 99 bootstrapped (non-parametrically) estimators;
Test 4: the p-value of the statistic is its ranking among 99 other statistics that were generated using the same values of the regressors and new logistic disturbances;
Test 5: the variance of the SMS is evaluated as in Test 3; the p-value is determined as in Test 4;
Test 6: the variance of the original MS estimator is found by subsampling 200 times without replacement; the size of the subsample is $n^{0.8}$; the method for the p-values is the same as in Test 4.

Size of the tests (Table 2). The Monte Carlo study of the size of the tests confirms the theoretical findings. The simplest asymptotic Test 1 is the least accurate and significantly overrejects the true hypothesis for all sample sizes. It follows from Table 2a, which reports, among other things, the bias, root mean squared error and estimated standard deviation of the SMS estimator, that the mean estimated variance is 1.5 times smaller than the actual variance for n = 250 but approaches the true one as n increases. Thus, the underestimation of the SMS variance, as well as any finite-sample deviation from the normal distribution, contributes to the overrejection by the asymptotic test. The bootstrapped critical values in Test 2 correct most of the size distortion but still require at least n = 1000 to ensure the correct significance level. Test 3, where we bootstrap the variance instead of the asymptotically pivotal statistic S, significantly underrejects for small sample sizes (the bootstrapped SMS variance exceeds the true variance for all sample sizes). This confirms that an asymptotically pivotal statistic is needed to achieve asymptotic refinements via bootstrapping. We do not recommend using Test 3 on its own, since it is as time consuming as Test 2 but less accurate. Nevertheless, its behaviour is of interest because Test 5 consists of repeated applications of Test 3 to newly generated logistic samples.
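For completeness, the two error designs described at the beginning of this section can be generated as follows; the logistic scale is chosen to match "variance 1/2", and the function name is illustrative.

```python
import numpy as np

def simulate_design(n, design="L", seed=0):
    """One sample from the Section 5 design: y = 1{x'beta + u >= 0} with
    beta = (1, 1)/sqrt(2), x1 ~ N(0, 1), x2 ~ N(1, 1).
    Design L: logistic errors with median 0 and variance 1/2.
    Design H: u = 0.25 (1 + 2 z^2 + z^4) v with z = x1 + x2, v from design L."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, n)
    x2 = rng.normal(1.0, 1.0, n)
    X = np.column_stack([x1, x2])
    beta = np.array([1.0, 1.0]) / np.sqrt(2.0)
    scale = np.sqrt(1.5) / np.pi      # logistic variance (pi * scale)^2 / 3 = 1/2
    v = rng.logistic(0.0, scale, n)
    u = 0.25 * (1 + 2 * (x1 + x2) ** 2 + (x1 + x2) ** 4) * v if design == "H" else v
    y = (X @ beta + u >= 0).astype(int)
    return y, X
```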
Tests 4-6 are different versions of local Monte Carlo tests; all of them demonstrate much better accuracy in terms of the test size. As was mentioned earlier, the Monte Carlo tests do not require the number of replications to go to infinity in order to be exact. We have obtained plausible results with just 99 replications, which means that the decision to reject the null hypothesis at the 1% significance level is reduced to comparing the actual statistic with the maximum of the generated statistics. Although Test 4 is much faster than Test 5 for the smoothed MS estimator, it is not reliable for small sample sizes, as shown in Table 3. Test 6 is the only procedure which is applicable to the original MS estimator. Some deviations of the actual size of the test from the nominal one may be the result of a non-optimal size of the subsamples. For small sample sizes it is recommended to add an extra layer of bootstrapping to determine an appropriate subsample size (Delgado et al. 2001). However, this would imply three layers of simulations per test and would be extremely computationally intensive.

Power of the tests (Table 3). We study the power of the tests against a heteroskedastic alternative. Table 3a shows that the smoothed maximum score estimates are very precise; moreover, the estimated asymptotic and bootstrapped SMS variances are close to the true value for n = 500 and 1000. Although the logit estimator demonstrates a large bias, even at n = 1000 the bias is not significant because of the large variance; thus, we should not expect our testing procedures to have high power for the given sample sizes. Tests 1-3 reject around 80% of the logit estimators at the 10% significance level. Tests 1 and 3 perform somewhat better than Test 2, but the choice of any particular test should be determined by considerations of size accuracy and computation time. In fact, our ranking of the tests is not entirely accurate, since the power of the tests is reported for given nominal significance levels, not the actual ones. When we take into account that Test 1 grossly overrejects under the null, its performance under the alternative becomes less impressive. At the same time, Test 1 is many times faster than Tests 2 and 3.

Test 4 does not have any power for small sample sizes but improves at n = 1000. The problem lies in the finite-sample performance of the estimator of the asymptotic variance $\Theta$. In this Monte Carlo test data samples are generated by adding new logistic disturbances to the product $X^{old} b_{\log}$. While estimating the new asymptotic variances, we encounter the same problem as in Test 1 under the null: the true variance may be several times larger than its estimate. This implies that the distribution of the newly generated statistics will be overstretched to the right. At the same time, the original estimator of the variance is greater than or equal to the true variance. Thus, the original statistic will be below the critical region most of the time and will not be rejected. At n = 1000 the estimates of the SMS variance become more precise and 40% of the logit estimators are rejected at $\alpha$ = 10%.

Test 5 avoids the pitfalls of Test 4 by using bootstrapping to evaluate the variance. The bootstrapped variance behaves in a similar way under the null and under the alternative (it slightly overestimates the true value) and does not create a distortion in the distribution of the statistics. Test 5 performs better than Tests 1-3 at significance levels 10% and 5%.
The power of this test at the 1% level can be improved by increasing the number of replications from 99 to, say, 499. Test 6 has lower power. This is not surprising, since the variance of the original maximum score estimator is much higher than that of the SMS. Therefore, a higher fraction of the logit estimates will be within the confidence region for the MS estimator.

Global analysis of size and power. Davidson and MacKinnon (1998) suggested a simple and informative way of analyzing the finite-sample properties of test procedures. By plotting the empirical distribution of the actual p-values of the statistics, one can easily determine the overall behaviour of a particular test. The first set of graphs (Figures 1a-1c) presents actual p-values as functions of nominal p-values under the null hypothesis (logistic disturbances). Ideally, the nominal and actual size of the test should coincide. Thus, any deviation from the diagonal signals over- or under-rejection by the test procedure. As could be expected from the test design, Tests 4-6 are quite close to the 45-degree line (they are exact). Test 1 significantly overrejects for all nominal p-values, while Tests 2 and 3 slightly underreject. The behaviour of the test procedures improves as the sample size increases. To study the exact Tests 4-6 more closely, we plot the deviations of the actual p-values from the nominal ones (Fig. 2a-2c). It is interesting that the deviations of the smoothed MS Tests 4 and 5 are usually negatively correlated with the deviations of the original MS Test 6. The Kolmogorov-Smirnov criterion can be used to verify whether these fluctuations are purely random. In the third group of graphs (Fig. 3a-3c) the power of the six tests is plotted as a function of the actual size of the tests. This is achieved by evaluating the actual size and power for 100 increasing nominal size levels and plotting the obtained pairs in consecutive order. Tests 3 and 5 have the highest power and are followed by Test 2, Test 6 and Test 1.

Ranking of the tests. Test 1 tends to overreject the null hypothesis but is the fastest and the easiest to implement (just one replication). Test 2 has better size properties but requires one layer of bootstrapping. Tests 3 and 4 are as computationally intensive as Test 2. However, Test 4 has no power for small sample sizes, whereas Test 3 is dominated in terms of size and power by Test 2. Test 5 improves upon Test 2 in size and power and includes two layers of bootstrapping, although the external layer need not be very large. As we can see, there are no obvious advantages in using the unsmoothed MS estimator over the smoothed one. However, it is worth considering when there are serious doubts about whether the conditional density of the error term is continuous in a neighborhood of 0. In such a case it is safer to apply the original maximum score estimator.
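The Davidson-MacKinnon graphs underlying Figures 1-3 are straightforward to reproduce; the following is a generic sketch in which the plotting details are mine, not the paper's.

```python
import numpy as np
import matplotlib.pyplot as plt

def pvalue_plot(pvalues, label):
    """Plot the empirical distribution function of simulated p-values against
    nominal levels; under the null an exact test should track the 45-degree line,
    and under the alternative the curve is read as power against actual size."""
    grid = np.linspace(0.0, 1.0, 101)
    empirical = [(np.asarray(pvalues) <= g).mean() for g in grid]
    plt.plot(grid, empirical, label=label)
    plt.plot(grid, grid, linestyle="--", color="grey")    # 45-degree reference
    plt.xlabel("nominal p-value")
    plt.ylabel("actual rejection frequency")
    plt.legend()
```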
6 The problem of self-selection biases in female labour supply models

The proposed set of testing procedures enables us to perform specification analysis for more general sets of models. The problem considered in this section was motivated by the work of T. Mroz. In his 1987 paper, Mroz provides a thorough evaluation of various econometric models of married women's labour supply. After considering several "first generation" models that use the subsample of working women, he analyzes more sophisticated specifications which control for self-selection into the labour force. In these models a distinction is made between the decision to work and the actual hours of work. This allows one to capture possible differences in the characteristics of working and non-working married women. All such models introduce an unobservable measure I of the utility of working as opposed to not working. A decision to work then corresponds to positive values of the utility measure, which is a linear-in-parameters function.

The author studies, among other things, the sensitivity of the estimated structural responses to the distributional assumption concerning the disturbance of the utility measure I. He experiments with several homoskedastic distributions, namely the normal, the logistic and the log-normal distributions. The estimation results being similar in all three cases, Mroz continues to work under the assumption of normality. The importance of the distributional assumption cannot be fully assessed by comparing the estimates from three homoskedastic distributions, two of which (normal and logistic) are known to yield very similar results in moderate-size samples. These specifications cannot properly reflect possible heteroskedasticity of the error term. Heteroskedasticity may be caused by the fact that women with a higher level of education or higher husband's income may have a greater variety of working and non-working options, which will increase the variance of the disturbance. The comparison of the logit (or probit) estimators with the MS estimators will either confirm the suitability of a standard homoskedastic specification or will indicate a serious misspecification and offer a robust alternative set of estimators (the original or smoothed MS).

The data set is based on the University of Michigan Panel Study of Income Dynamics (PSID) for 1975. The sample consists of 753 married women, 57% of whom worked during that year. A detailed description of the data is provided in Mroz (1987). Both the original and the smoothed maximum score estimators are found using the simulated annealing algorithm for continuous variables proposed by Corana et al. in 1987 (see also Goffe, Ferrier, and Rogers 1994). This is a random search algorithm that works well for multimodal functions. Although there is no guarantee that the obtained solution corresponds to a global maximum, it is one of the largest local maxima.

The labour participation function depends on all exogenous variables in the model. The vector of exogenous variables usually includes the wife's age, her education (in years), the number of children under the age of six and the number of children between ages 6 and 18. It may also contain the non-wife income (although it is sometimes treated as endogenous) and some background variables such as the county unemployment rate, the SMSA dummy, and the wife's mother's and father's educational attainment. Quadratic and cubic terms in the wife's age and education may also be included. Since the simulated annealing algorithm involves repetitive random searches along each coordinate until the desired precision level is achieved, the estimation time becomes increasingly longer with each additional independent variable. The initial model included four regressors that were highly significant in the logit specification: the woman's age and education, the number of children under the age of 6, and the other income, defined as the logarithm of the household's total income minus the woman's labour income. The logit estimates imply that the participation probability increases with the education level and decreases with age, higher non-wife income and the number of small children.
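To illustrate the kind of random search used to maximize the MS and SMS score functions, here is a greatly simplified stand-in for the Corana et al. (1987) algorithm (which, unlike this sketch, adapts step lengths coordinate by coordinate); the cooling schedule and acceptance rule are my own simplifications.

```python
import numpy as np

def random_search_unit_sphere(score, k, n_iter=20000, temp0=1.0, seed=0):
    """Annealing-style maximization of a (possibly multimodal) score over the
    unit sphere {b : ||b|| = 1}: propose a perturbed direction, always accept
    improvements, and accept deteriorations with a temperature-driven probability."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=k)
    b /= np.linalg.norm(b)
    current = best = score(b)
    best_b = b
    for t in range(n_iter):
        temp = temp0 * (1 - t / n_iter) + 1e-6        # simple linear cooling
        cand = b + temp * rng.normal(size=k)
        cand /= np.linalg.norm(cand)
        s = score(cand)
        if s > current or rng.random() < np.exp((s - current) / temp):
            b, current = cand, s
        if s > best:
            best_b, best = cand, s
    return best_b, best
```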
All variables were standardized before the optimization procedures. The results of the model estimation are summarized in Table 4. For the smoothed MS estimator we provide both the standard errors based on Horowitz's asymptotic formulas and the bootstrapped estimates. For the original maximum score we find the standard errors using subsampling. The smoothed MS bootstrapped standard errors are 35-100% larger than the corresponding asymptotic errors. As expected, the accuracy of the original Manski estimator is lower than that of the smoothed version.

For this particular choice of regressors the results of the logit and the two semiparametric estimators are very close and confirm Mroz's findings that different distributional assumptions on the error term do not affect the estimates. The only discrepancy is in the values of the intercepts. The p-value of the asymptotic Test 1 is 0.02. However, Test 1 tends to overreject under the null, and this is confirmed by the p-value of 0.42 of Test 2, which is based on non-parametric bootstrapping of the entire statistic. Test 3 uses the bootstrapped covariance matrix but assumes the $\chi^2$ distribution of the statistic. Its p-value of 0.26 also does not reject the null hypothesis of equivalence between the logit and the smoothed MS estimators. The original Manski estimator has practically the same point estimates and larger standard errors than the smoothed MS with the bootstrapped covariance matrix. Test 6 also does not reject the null. Several other specifications with up to 10 regressors have been tried. The differences between the logit and the MS estimates were of the same magnitude as the differences between the original and smoothed MS estimators.

The PSID data for 1975 are not the only set for which the estimates of the logit and the MS models coincide. Charlier, Melenberg and van Soest (1995) report similar results for the panel data (1984-1988) of married Dutch women. The estimators differ only in their coefficients on the number of children under 18, where the models predict opposite significant effects. However, the coefficients on the dummy for small children are the same. Charlier et al. also report that the normalization $\|b\| = 1$ has an advantage over Horowitz's normalization $b_k = 1$ in terms of estimation and comparison of estimators. Kordas (2002) finds that a homoskedastic model predicts the labour participation decision as well as the MS estimators do. The study is based on the 1999 March Supplement to the US Current Population Survey.

7 Conclusion

In this paper we propose several tests to verify the validity of the logit estimator in the presence of heteroskedasticity. The tests are based on a comparison between the logit estimator and its semiparametric alternative, the maximum score estimator, which is robust against heteroskedasticity. A modification of the Hausman test has been used. The problem of finding an appropriate estimator for the variance of the difference between the logit and the MS is easily resolved thanks to the different convergence rates of the two estimators. We consider an array of tests with increasingly accurate significance levels. There is a trade-off between the size and power of the test and the length of the computations: higher precision is achieved by additional layers of various bootstrap/Monte Carlo simulations.

8 Appendix

The Appendix is based on the paper by Kotlyarova and Zinde-Walsh (2002) and summarizes the properties of the smoothed maximum score estimator.
8.1 Normalization and definitions for the smoothed maximum score estimator

The normalization used in this paper sets the norm of the estimator, $\|b\|$, to one. Consider the orthogonal projections P and M for a given $\beta \in R^k$:

$P_\beta = \frac{\beta\beta'}{\|\beta\|^2}, \qquad M_\beta = I - P_\beta.$

Then any $x \in R^k$ is uniquely represented as $x = P_\beta x + M_\beta x$, and therefore $b = P_\beta b + M_\beta b = \frac{\beta' b}{\|\beta\|^2}\,\beta + M_\beta b$. We are partitioning a $k \times 1$ vector into its component that projects onto $\beta$ and the ones orthogonal to it. Let $\tilde{\beta}$ be normalized so that $|\tilde{\beta}' b| = \|\tilde{\beta}\|^2$. The advantage of this normalization is that in the case when $\tilde{\beta}' b > 0$ the projection $P$ of the difference $b - \tilde{\beta}$ is zero and the projection $M_\beta (b - \tilde{\beta})$ belongs to the $R^{k-1}$ space orthogonal to $\beta$.

Let $g = M_\beta (b - \tilde{\beta})$, $z_i = X_i'\beta_0$, $\xi_i = P_\beta X_i$, and $V_i = M_\beta X_i$. Note that for the vector $g = M_\beta b$ its product $X_i' g$ equals $V_i' g$. Using the new notation: $X_i = \xi_i + V_i$, where $\xi$ and $V$ are orthogonal; $b = \tilde{\beta} + g$, so $g$ is the discrepancy between $\tilde{\beta}$ and its estimate $b$; $X_i' b = z_i + V_i' g$.

Define the scalar constants $\delta_\psi \equiv \int \psi^2(w)\,dw$ and $\alpha_\psi = \int \psi(w)\,dw$; they determine the dependence of the asymptotic variance of the smoothed estimator on the smoothing function. Matrices D and Q characterize the asymptotic distribution of the SMS estimator. Matrix D represents the normalized asymptotic variance of the first derivative of the smoothed score function, while matrix Q is the probability limit of the second derivative:

$D \equiv E\left[f_{z|V}(0)\,V V'\right] \qquad \text{and} \qquad Q \equiv 2E\left[f_{u|z=0,V}(0)\,f_{z|V}(0)\,V V'\right].$

8.2 Assumptions for the smoothed maximum score estimator

Assumption 1 (median regression). For almost every $x_i$, $\mathrm{med}(u_i|x_i) = 0$.

Assumption 2 (identifiability of $\hat{b}$). Let $F_x$ be the k-variate marginal distribution of x. (a) The support of $F_x$ is not contained in any proper linear subspace of $R^k$. (b) $0 < \Pr[y \ge 0|x] < 1$, for almost every x. (c) The distribution of at least one of the regressors, $x_j$, conditional on $(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k)$ has everywhere positive Lebesgue density. The corresponding coefficient $\beta_j \ne 0$. (d) $\beta_0 = \beta/\|\beta\|$ is uniquely defined in the model with Assumption 1.

Assumption 3 (random sampling). $(y_i, x_i)$, $i = 1, \ldots, n$, is a random sample of $(y, x)$.

Assumption 4 (smoothing scheme). (a) The smoothing function $\psi$ is a continuously differentiable function with support in $[-1, 1]$; (b) $\int \psi(w)\,dw = 1$; (c) $\int w^i \psi(w)\,dw = 0$ if $0 < i < h$ and $\ne 0$ for $i = h$; (d) the bandwidth parameter $\sigma_n \to 0$ and (d') $\sigma_n n^{1/3} \to \infty$.

Assumption 5 (smoothness of the model). (a) The conditional densities $f_{u|z,V}(u)$ and $f_{z|V}(z)$ exist and are continuous in a neighborhood of 0. (b) The components of V and of the matrices $VV'$ and $VV'VV'$ have finite first absolute moments.

Assumption 6 (stronger alternative to Assumption 5a). The conditional density $f_{u|z,V}(u)$ satisfies a Lipschitz condition in a neighborhood of 0.

Assumption 7. The $(k-1) \times (k-1)$ matrix Q has full rank.

8.3 Asymptotic distribution of the SMS estimator

Theorem 1. Under Assumptions 1-5 the solution to (1) exists in probability and is such that as $n \to \infty$, $\sigma_n \to 0$ at the appropriate rate,

$\left\|P_{\beta_0} b_\psi\right\| - 1 = O_p(n^{-1/2}\sigma_n^{-1/2})$ and $n^{1/2}\sigma_n^{1/2}\,M_{\beta_0}(b_\psi - \beta_0) \to^d N\!\left(0,\ \frac{\delta_\psi}{\alpha_\psi^2}\,Q^{-1} D Q^{-1}\right).$

The rate $\sigma_n \to 0$ satisfies $\sigma_n^2\, n^{1/3}\,\zeta_1(\sigma_n) \to 0$, with $\zeta_1(\sigma_n) \to 0$ no faster than $\left|f_{u|z=0,V}(\sigma_n) - f_{u|z=0,V}(0)\right| \to 0$; thus convergence of the estimator is marginally faster than $n^{-1/3}$. If additionally Assumption 6 holds, then $\sigma_n$ can be selected to satisfy $\sigma_n = o(n^{-1/5})$ and the convergence rate is marginally slower than $n^{-2/5}$.
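To fix the notation of Section 8.1 numerically, here is a small sketch; the numbers are purely illustrative and $\tilde{\beta}$ is taken equal to $\beta$.

```python
import numpy as np

def projections(beta):
    """P_beta projects onto beta; M_beta = I - P_beta projects onto the
    (k-1)-dimensional subspace orthogonal to beta (Section 8.1)."""
    beta = np.asarray(beta, dtype=float)
    P = np.outer(beta, beta) / (beta @ beta)
    M = np.eye(beta.size) - P
    return P, M

beta = np.array([1.0, 1.0]) / np.sqrt(2.0)
b = np.array([0.80, 0.62])              # an estimate of beta, for illustration only
P, M = projections(beta)
g = M @ (b - beta)                       # discrepancy orthogonal to beta
print(np.allclose(P @ g, 0.0))           # True: g has no component along beta
```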
Define $\tilde{\eta}(\psi) \equiv (n\sigma_n)^{1/2}\,M_{\beta_0}(\hat{b}_\psi - \beta_0)$. Let $\delta = \left(\frac{\delta_{\psi_1}}{\alpha_{\psi_1}^2}, \ldots, \frac{\delta_{\psi_l}}{\alpha_{\psi_l}^2}\right)$ and $\tau_{\psi_i\psi_j} \equiv \frac{\int \psi_i(w)\psi_j(w)\,dw}{\alpha_{\psi_i}\alpha_{\psi_j}}$.

Theorem 2. Under the conditions of Theorem 1, for each set of linearly independent functions $\{\psi_1, \ldots, \psi_l\}$ and for $\sigma_n$ of Theorem 1, the joint distribution converges:

$(\tilde{\eta}(\psi_1)', \ldots, \tilde{\eta}(\psi_l)')' \to^d N\!\left(0,\ \Psi \otimes Q^{-1} D Q^{-1}\right),$

where the $l \times l$ matrix $\Psi$ has elements $\{\Psi\}_{ij} = \tau_{\psi_i\psi_j}$ and $\otimes$ denotes the Kronecker product. If the functions $\{\psi_1, \ldots, \psi_l\}$ are mutually orthogonal, then $\Psi = \mathrm{diag}(\delta)$.

8.4 Computational formulas for covariance matrix and bandwidth

One way of estimating the asymptotic characteristics of the SMS estimator under the normalization suggested in this paper is to switch to a local system of coordinates so that the estimated vector $\hat{b}$ becomes one of the new basis directions. Then we may use the Horowitz (1992, 1996) formulas to evaluate the covariance matrix $\frac{\delta_\psi}{\alpha_\psi^2}Q^{-1}DQ^{-1}$, or $\delta_\psi Q^{-1}DQ^{-1}$, since $\alpha_\psi$ is usually normalized to one. More formally, let W denote the matrix whose columns are orthonormal eigenvectors of $M_{\hat{b}}$, starting with $\hat{b}$. Then the coordinates of any point in the local system, $x^*$, are related to the initial coordinates, $x$, through the expression $x^* = W^{-1}x$; and $\tilde{x}$ in the formulas given by Horowitz corresponds to $(x_2^*, \ldots, x_k^*)$.

The matrix

$\hat{Q} = n^{-1}\sigma_n^{-2}\sum_i (2y_i - 1)\,\psi'\!\left(\frac{\hat{b}'x_i}{\sigma_n}\right)\tilde{x}_i\tilde{x}_i'$

converges in probability to Q. The parameter $\frac{\delta_\psi}{\alpha_\psi^2}D$ has two consistent alternative estimators:

$\hat{D}_1 = (n\sigma_n)^{-1}\sum_i \psi^2\!\left(\frac{\hat{b}'x_i}{\sigma_n}\right)\tilde{x}_i\tilde{x}_i'$ with a bias of size $O(\sigma_n^2)$, and

$\hat{D}_2 = \int \psi^2(w)\,dw \cdot (n\sigma_n)^{-1}\sum_i \psi\!\left(\frac{\hat{b}'x_i}{\sigma_n}\right)\tilde{x}_i\tilde{x}_i'$ with an $O(\sigma_n^h)$ bias.

The first estimator is always positive semidefinite. However, its bias is too large to ensure asymptotic refinements from the use of bootstrapping. Therefore, we mostly use the second alternative and switch to the first method only if $\hat{D}_2$ is not positive definite.

The procedure for determining the bandwidth $\sigma_n$ includes several steps:

• find the unsmoothed MS estimator $b_{ms}$ and choose the 10th percentile of the empirical distribution of $x_i'b_{ms}$ as the initial value of the bandwidth $\sigma_n$;

• perform a grid search to determine $b_{smoothed}$ and estimate the optimal bandwidth $\sigma_n^{opt}$;

• introduce undersmoothing by making the bandwidth smaller than optimal, $(\sigma_n^{opt})^{1.3}$, and redo the search to find $b_{und}$.

The estimator of the optimal bandwidth $\sigma_n^{opt}$ is $\left(\frac{\lambda^{opt}}{n}\right)^{\frac{1}{2h+1}}$, where $\lambda^{opt}$ is obtained according to the formula $\left[\mathrm{tr}(Q^{-1}\Phi Q^{-1}D)\right] / \left(2h\,A'Q^{-1}\Phi Q^{-1}A\right)$ (see Horowitz 1992). The estimates of Q and D are given above; $\Phi$ is any positive semidefinite matrix such that $A'Q^{-1}\Phi Q^{-1}A \ne 0$, and A is the asymptotic bias of the SMS estimator when the order of the kernel, h, corresponds to the smoothness level of the model. The vector A is estimated by

$\hat{A} = n^{-1}(\sigma_n^*)^{-(h+1)}\sum_i (2y_i - 1)\,\psi\!\left(\frac{\hat{b}'x_i}{\sigma_n^*}\right)\tilde{x}_i,$

where $\sigma_n^*$ is larger than, and converges to zero more slowly than, $\sigma_n$. The estimate of A also requires a small-sample correction: $\hat{A}^{corrected} = \hat{A}\left[1 - \left(\frac{\sigma_n}{\sigma_n^*}\right)^h\right]^{-1}$.
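A sketch of how these sample analogues might be coded, following the reconstruction above; the change of local coordinates via a QR factorization and the numerical evaluation of $\int \psi^2(w)\,dw$ are implementation choices of this sketch, not formulas from the paper.

```python
import numpy as np

def psi4(w):
    """Fourth-order kernel of Section 5 (support [-1, 1]) and its derivative."""
    w = np.asarray(w, dtype=float)
    inside = (np.abs(w) <= 1).astype(float)
    value = (105 / 64) * (1 - 5 * w**2 + 7 * w**4 - 3 * w**6) * inside
    deriv = (105 / 64) * (-10 * w + 28 * w**3 - 18 * w**5) * inside
    return value, deriv

def Q_D_estimates(y, X, b_hat, sigma):
    """Sample analogues of Q, D1 and D2 from Section 8.4.  x_tilde are the local
    coordinates of x orthogonal to b_hat, obtained from an orthonormal basis
    whose first vector is proportional to b_hat."""
    n, k = X.shape
    W, _ = np.linalg.qr(np.column_stack([b_hat, np.eye(k)]))
    x_tilde = (X @ W)[:, 1:]                  # drop the component along b_hat
    value, deriv = psi4((X @ b_hat) / sigma)
    s = 2 * y - 1
    Q = x_tilde.T @ ((s * deriv)[:, None] * x_tilde) / (n * sigma**2)
    D1 = x_tilde.T @ ((value**2)[:, None] * x_tilde) / (n * sigma)   # always psd
    w = np.linspace(-1.0, 1.0, 2001)
    int_psi_sq = (psi4(w)[0] ** 2).sum() * (w[1] - w[0])   # delta_psi numerically
    D2 = int_psi_sq * (x_tilde.T @ (value[:, None] * x_tilde)) / (n * sigma)
    return Q, D1, D2
```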
References

[1] Charlier, E., B. Melenberg, and A. H. O. van Soest (1995) A smoothed maximum score estimator for the binary choice panel data model with an application to labour force participation, Statistica Neerlandica 49, 324-342.

[2] Corana, A., M. Marchesi, C. Martini, and S. Ridella (1987) Minimizing multimodal functions of continuous variables with the "simulated annealing" algorithm, ACM Transactions on Mathematical Software 13, 262-280.

[3] Davidson, R. and J. MacKinnon (1984) Convenient specification tests for logit and probit models, Journal of Econometrics 25, 241-262.

[4] Davidson, R. and J. MacKinnon (1998) Graphical methods for investigating the size and power of test statistics, The Manchester School 66, 1-26.

[5] Delgado, M. A., J. M. Rodriguez-Poo, and M. Wolf (2001) Subsampling inference in cube root asymptotics with an application to Manski's maximum score estimator, Economics Letters 73, 241-250.

[6] Dufour, J.-M. (1995) Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics in econometrics, Technical report, C.R.D.E., Université de Montréal.

[7] Dufour, J.-M. and L. Khalaf (2001) Monte Carlo test methods in econometrics, in: B. H. Baltagi (ed.), A Companion to Theoretical Econometrics, Malden, Mass.; Oxford: Blackwell Publishers.

[8] Goffe, W. L., G. D. Ferrier, and J. Rogers (1994) Global optimization of statistical functions with simulated annealing, Journal of Econometrics 60, 65-99.

[9] Hausman, J. A. (1978) Specification tests in econometrics, Econometrica 46, 1251-1272.

[10] Horowitz, J. L. (1992) A smoothed maximum score estimator for the binary response model, Econometrica 60, 505-531.

[11] Horowitz, J. L. (1993) Semiparametric estimation of a work-trip mode choice model, Journal of Econometrics 58, 49-70.

[12] Horowitz, J. L. (1996) Bootstrap critical values for tests based on the smoothed maximum score estimator, University of Iowa, unpublished manuscript.

[13] Johnston, J. and J. DiNardo (1997) Econometric Methods, McGraw-Hill.

[14] Kim, J. and D. Pollard (1990) Cube root asymptotics, The Annals of Statistics 18, 191-219.

[15] Kordas, G. (2002) Smoothed binary regression quantiles, working paper, University of Pennsylvania.

[16] Kotlyarova, Y. and V. Zinde-Walsh (2002) Improving the efficiency of the maximum score estimator, working paper, McGill University.

[17] Manski, C. F. (1975) Maximum score estimation of the stochastic utility model of choice, Journal of Econometrics 3, 205-228.

[18] Manski, C. F. (1985) Semiparametric analysis of discrete response: asymptotic properties of the maximum score estimator, Journal of Econometrics 27, 313-333.

[19] Manski, C. F. and T. S. Thompson (1986) Operational characteristics of maximum score estimation, Journal of Econometrics 32, 85-108.

[20] Mroz, T. A. (1987) The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions, Econometrica 55, 765-799.

[21] Newey, W. K. (1987) Specification tests for distributional assumptions in the tobit model, Journal of Econometrics 34, 125-145.

[22] White, H. (1982) Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.

Table 2. Rejection rate: logistic distribution

         significance level   n=250   n=500   n=1000
Test 1   a=0.10               0.297   0.255   0.223
         a=0.05               0.244   0.200   0.167
         a=0.01               0.159   0.124   0.101
Test 2   a=0.10               0.066   0.065   0.085
         a=0.05               0.029   0.030   0.040
         a=0.01               0.003   0.005   0.008
Test 3   a=0.10               0.041   0.052   0.075
         a=0.05               0.023   0.026   0.038
         a=0.01               0.005   0.008   0.012
Test 4   a=0.10               0.102   0.119   0.124
         a=0.05               0.054   0.059   0.068
         a=0.01               0.007   0.015   0.008
Test 5   a=0.10               0.108   0.102   0.124
         a=0.05               0.060   0.052   0.072
         a=0.01               0.010   0.009   0.012
Test 6   a=0.10               0.099   0.079   0.096
         a=0.05               0.055   0.034   0.048
         a=0.01               0.015   0.004   0.008
Table 2a. Sample characteristics: logistic distribution

                                        n=250     n=500     n=1000
Logit  bias                             -0.0016    0.0000   -0.0013
       root mean squared error           0.0624    0.0438    0.0302
       root mean standard variance       0.0632    0.0445    0.0314
       root mean White's variance        0.0627    0.0445    0.0313
SMS    bias                             -0.0083   -0.0053   -0.0039
       root mean squared error           0.1143    0.0877    0.0636
       root mean asympt. variance        0.1025    0.0793    0.0676
       root median asympt. variance      0.0592    0.0520    0.0421
       root mean bootstr. variance       0.1229    0.0926    0.0663
       root median bootstr. variance     0.1091    0.0819    0.0575
MS     bias                             -0.0057   -0.0015   -0.0033
       root mean squared error           0.1264    0.1016    0.0788
       root mean subsampl. variance      0.1262    0.0996    0.0754
       root median subsampl. variance    0.1204    0.0959    0.0732

Table 3. Rejection rate: heteroskedastic distribution

         significance level   n=250   n=500   n=1000
Test 1   a=0.10               0.777   0.780   0.835
         a=0.05               0.734   0.748   0.793
         a=0.01               0.675   0.683   0.747
Test 2   a=0.10               0.700   0.733   0.805
         a=0.05               0.628   0.676   0.767
         a=0.01               0.442   0.528   0.675
Test 3   a=0.10               0.744   0.773   0.831
         a=0.05               0.700   0.735   0.798
         a=0.01               0.626   0.672   0.738
Test 4   a=0.10               0.153   0.231   0.402
         a=0.05               0.079   0.140   0.267
         a=0.01               0.015   0.027   0.070
Test 5   a=0.10               0.809   0.827   0.860
         a=0.05               0.770   0.781   0.834
         a=0.01               0.645   0.687   0.748
Test 6   a=0.10               0.658   0.719   0.767
         a=0.05               0.582   0.672   0.733
         a=0.01               0.429   0.583   0.667

Table 3a. Sample characteristics: heteroskedastic distribution

                                        n=250     n=500     n=1000
Logit  bias                             -0.1199   -0.1123   -0.1088
       root mean squared error           0.2775    0.2088    0.1638
       root mean standard variance       0.2721    0.1835    0.1256
       root mean White's variance        0.2733    0.1832    0.1250
SMS    bias                             -0.0035   -0.0040   -0.0029
       root mean squared error           0.0453    0.0299    0.0213
       root mean asympt. variance        0.0488    0.0333    0.0213
       root median asympt. variance      0.0361    0.0283    0.0196
       root mean bootstr. variance       0.0548    0.0330    0.0218
       root median bootstr. variance     0.0436    0.0300    0.0202
MS     bias                              0.0040   -0.0015    0.0009
       root mean squared error           0.0598    0.0438    0.0319
       root mean subsampl. variance      0.0884    0.0492    0.0340
       root median subsampl. variance    0.0725    0.0458    0.0328

Table 4. Labour participation estimates

                   logit (s.e.)      SMS (as.s.e.) [boot.s.e.]    MS (s.e.)
intercept           0.257 (0.064)     0.105 (0.056) [0.086]        0.091 (0.124)
kids < 6           -0.628 (0.055)    -0.630 (0.047) [0.079]       -0.649 (0.097)
age                -0.342 (0.065)    -0.380 (0.044) [0.062]       -0.371 (0.100)
education           0.497 (0.060)     0.540 (0.060) [0.081]        0.547 (0.112)
ln other income    -0.418 (0.073)    -0.394 (0.051) [0.102]       -0.366 (0.135)

Figure 1: Size of Tests 1-6

Figure 2: Size deviation: Tests 4-6

Figure 3: Power of Tests 1-6