Moment and IV Selection Approaches: A Comparative Simulation Study

Mehmet Caner∗  Esfandiar Maasoumi†  Juan Andrés Riquelme‡

August 7, 2014

Abstract

We compare three moment selection approaches, followed by post selection estimation strategies. The first is the adaptive lasso of Zou (2006), recently extended by Liao (2013) to possibly invalid moments in gmm. In this method, we select the valid instruments with the adaptive lasso. The second method is based on the J test, as in Andrews and Lu (2001). The third uses a Continuous Updating Objective (cue) function. This last approach is based on Hong et al. (2003), who propose a penalized generalized empirical likelihood based function to pick the valid moments. They use empirical likelihood and exponential tilting in their simulations. However, the J test based approach of Andrews and Lu (2001) generally provides better moment selection results than empirical likelihood and exponential tilting, as can be seen in Hong et al. (2003). In this article, we examine penalized cue as a third way of selecting valid moments. Following the determination of valid moments, we run unpenalized gmm, cue, and the model averaging technique of Okui (2011) to see which one has the best post selection estimator performance for the structural parameters. The simulations address the following questions: which moment selection criterion better selects the valid moments and eliminates the invalid ones? Given the instruments chosen in the first stage, which strategy delivers the best finite sample performance? We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage coefficient estimators.

Keywords and phrases: Shrinkage, Monte Carlo, Averaging.

∗ North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC 27695. Email: mcaner@ncsu.edu. † Emory University, Department of Economics, Atlanta, GA.
Email: esfandiar.maasoumi@emory.edu. ‡ North Carolina State University, Department of Economics. Email: jariquel@ncsu.edu.

1 Introduction

It is not uncommon to encounter a large number of instruments or moment conditions in applications of instrumental variables (iv) or Generalized Method of Moments (gmm) estimators. Some ivs or moments may be invalid, but the researcher does not know a priori which ones. This problem may be adjudicated statistically with the J test, which indicates whether the overidentifying restrictions are valid. If the null is rejected, the researcher needs a moment selection technique that allows distinguishing between the valid and invalid moment conditions. A few techniques have been proposed, each with advantages (for example, consistency) and disadvantages (such as overwhelming computational demand). In this paper we focus on information-based methods and review three moment selection criteria (msc) used in the current literature: (i) the shrinkage procedure as in Liao (2013), (ii) the information-based criteria with gmm in Andrews (1999), and (iii) the information-based criterion using generalized empirical likelihood of Hong et al. (2003). Using Monte Carlo simulations, we compare the performance of these methods in selecting valid moments in linear settings under several relevant scenarios: small and large sample sizes, fixed and increasing numbers of moment conditions, weak and strong identification, local-to-zero moment conditions, and homoskedastic and heteroskedastic errors. The contribution of our study is a fairly comprehensive comparison of these multistep approaches with each other. The choice of methods in this study was motivated by the following considerations: the adaptive lasso is heavily used in statistics and has computational advantages in large scale problems; the penalized methods in Andrews (1999) and Hong et al.
(2003) are not computationally advantageous, but are used by econometricians due to the need to determine valid instruments. Further, these three methods have reasonably strong theoretical underpinnings. We analyze second stage estimation performance, considering the finite sample properties of the structural parameter estimators. To this end, we employ the Okui (2011) model averaging technique to obtain smaller mean squared error and smaller bias for the structural parameters. We then compare Okui (2011) to unpenalized gmm and cue estimation, following selection of valid instruments in the first stage. We find that the adaptive lasso in the model selection stage, coupled with either unpenalized gmm or the moment averaging of Okui, generally delivers the smallest rmse for the second stage estimation.

There is a large and rich literature on moment selection techniques. Smith (1992) proposes a procedure to compare competing non-nested gmm estimations, allowing for heteroskedasticity and serial correlation. Again in the gmm context, Andrews (1999) proposes a moment selection procedure using information criteria based on a J statistic corrected for the number of moment conditions. This is analogous to the use of the Akaike (aic), Bayesian (bic) and Hannan-Quinn (hqic) information criteria in model selection. He shows that the proposed methods are consistent under suitable assumptions, and also formalizes the downward and upward testing procedures. Downward testing consists of iterative J tests starting from the largest set of moment conditions and proceeding to fewer moment conditions at each iteration, until the null is not rejected. Upward testing works in the opposite order. Andrews and Lu (2001) extend these methods to model selection in dynamic panel data structures. Hong et al. (2003) propose a similar approach, using the generalized empirical likelihood defined by Newey and Smith (2000) instead of the J statistic.
A relatively new class of moment selection methods is based on shrinkage procedures. One of the advantages of shrinkage is its computational efficiency, which is consequential especially in high-dimensional contexts. In a brief comparison, Hastie et al. (2009, section 3.6) conclude that the shrinkage method performs better than alternative model selection techniques in reducing estimation error. Liao (2013) shows that gmm shrinkage procedures have the oracle property in selecting the moment conditions, and that adding additional valid moments improves efficiency for strongly identified parameters. Cheng and Liao (2012) use a similar approach and propose a weighted tuning parameter that allows shrinking both invalid and redundant moments. We chose the three moment selection criteria based on their optimality properties, such as the oracle property, and their good finite sample performance. For model selection, assuming valid instruments, Belloni, Chernozhukov and Hansen (2011) utilize lasso-type estimators in the many iv case and provide conditions under which the iv approach is asymptotically oracle-efficient. Caner (2009) and Caner and Zhang (2013) also use shrinkage methods for model selection in a gmm context. Canay (2010) proposes the use of trapezoidal kernel weights to shrink the first stage estimators. Kuersteiner and Okui (2010) point out that, despite the advantages of kernel shrinkage estimation, kernel methods cannot completely reduce the estimation bias and are inflexible once a particular kernel is chosen. They also propose a moment average estimator using the method in Hansen (2007) to construct optimal instruments. Okui (2011) develops a shrinkage method that minimizes the asymptotic mean squared error. An important concern in gmm estimation is the presence of weak ivs (Hausman et al. (2005); Andrews and Stock (2007)).
The results in Cheng and Liao (2012) suggest that shrinkage estimation is robust in discarding invalid ivs, but tends to include redundant ivs when identification is weak.

The rest of the paper is as follows: in section 2 we review the msc approaches under comparison; in section 3 we present the details of our Monte Carlo simulation setups; in section 4 the main results of our simulation exercises are presented; section 5 concludes. Standard notation is used for the projection operator P_A = A(A′A)⁻¹A′, where A is a matrix.

2 Theoretical Framework

2.1 Moment Selection Methods

Consider a sequence of random variables {Zi}, i = 1, . . . , n, drawn from an unknown probability distribution. The moment selection problem consists of selecting the r valid moments from a set of q candidates. A minimum set of s ≥ p valid moment conditions is required in order to identify the structural parameter vector θ, where p ≡ dim(θ). The set of q candidate moments can be separated into two subsets, and the model is, for i = 1, . . . , n,

E[gS(Zi, θ0)] = 0,   S = {1, . . . , s},   (1)
E[gS^c(Zi, θ0)] ?= 0,   S^c = {s + 1, . . . , q},   (2)

where the sign ?= means that the relationship may not hold for some of the indexes in S^c. We see that r = s + sv, where s represents the number of moments in (1) (i.e. those deemed to be valid), and sv represents the number of valid moments in the second set of q − s total moments in (2). S^c represents the moments that may or may not be valid. Thus 0 ≤ sv ≤ q − s. Our framework assumes that the researcher knows a priori that s instruments are valid, and that they can identify the p parameters. The question is whether to include the rest of the instruments for efficiency considerations. This is the framework used recently by Liao (2013). Note that θ0 represents the true structural parameter vector of dimension p.
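As a small numerical illustration of the distinction between (1) and (2) (this sketch is ours, not the authors' code, and the variable names and the 0.2 covariance are illustrative): a candidate moment is valid when the instrument is uncorrelated with the structural error, so its sample moment converges to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
eps = rng.normal(size=n)                    # structural error
z_valid = rng.normal(size=n)                # E[z*eps] = 0   -> valid moment
z_invalid = rng.normal(size=n) + 0.2 * eps  # E[z*eps] = 0.2 -> invalid moment

m_valid = np.mean(z_valid * eps)            # sample analogue of E[z*eps]
m_invalid = np.mean(z_invalid * eps)
print(m_valid, m_invalid)                   # roughly 0.0 and 0.2
```

The selection problem is that with finite n and many candidates, separating the two cases requires a criterion that trades off the J statistic (or the slackness parameters below) against the number of moments retained.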
The standard gmm estimator of θ0, denoted by θ̂n, is

θ̂n ≡ argmin_{θ∈Θ} J(θ, W̄n),

where W̄n is a q × q symmetric and positive definite weight matrix and the objective function (Hansen, 1982) is defined as

J(θ, W̄n) ≡ n · gn(θ)′ W̄n gn(θ),   (3)

with gn(θ) = n⁻¹ Σ_{i=1}^n g(Zi, θ), and Θ a compact subset of R^p. For ease of notation let g(Zi, θ) = gi(θ). Throughout the paper we consider the following linear model because of its computational advantages, and conduct the comparative examination in this widely used setup:

y = Y θ0 + ε,   (4)
Y = Z π0 + u,   (5)

where y is an n × 1 vector, Y is an n × p matrix of endogenous variables, Z is an n × q matrix of instruments, and ε and u are unobserved random errors with constant second moments that are correlated with each other. We do not deal with control variables in the simulations; this makes no difference in a simulation setup. We make the following diversion from standard gmm in (3). The set of instruments is divided into the valid ones, Zi1 (s × 1), and the set that we suspect may contain invalid instruments, Zi2 ((q − s) × 1). The sample moment conditions are defined by

gn(θ, β) = n⁻¹ Σ_{i=1}^n gi(θ, β),

where gi(θ, β) = (gi1(θ)′, gi2(θ, β)′)′ with

gi1(θ) = Zi1(yi − Yi θ),   gi2(θ, β) = Zi2(yi − Yi θ) − β.

The weight matrix for our nonstandard case is calculated as

Wn = n⁻¹ Σ_{i=1}^n gi(θ̃, β̃) gi(θ̃, β̃)′,

where θ̃, β̃ are the first step gmm estimators with Iq as the weight matrix. The first method we discuss is the adaptive gmm shrinkage estimation method (Liao, 2013). This method has the advantage of selecting the valid moments and estimating θ in a single step. It consists of adding a slackness parameter vector β0 to the moment conditions in (2), so the model is

E[(gi1(θ0)′, gi2(θ0, β0)′)′] = 0,

and the validity of the moment conditions is verified by inference on whether β0 = 0 or not. A moment condition j is valid only if β0j = 0, for j = 1, . . . , q − s.
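The first step estimators θ̃, β̃ and the weight matrix Wn above have a convenient closed form in the single-endogenous-regressor case: with the identity weight, β can absorb the suspect moment means exactly, so θ̃ reduces to iv estimation on Z1 alone and β̃ is the mean of the suspect moments at θ̃. A sketch (ours, with an illustrative data-generating design, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, q = 5000, 3, 6
theta0 = 0.5
Z = rng.normal(size=(n, q))
u = rng.normal(size=n)
eps = 0.5 * u + rng.normal(size=n)      # structural error, correlated with u
Z[:, -1] += 0.2 * eps                   # make the last instrument invalid
Y = Z.sum(axis=1) + u                   # first stage with pi0 = 1
y = Y * theta0 + eps

Z1, Z2 = Z[:, :s], Z[:, s:]
# With W = I_q, minimizing over beta zeroes the suspect block, so theta
# solves a least squares problem in the s known-valid sample moments:
A, b = Z1.T @ Y / n, Z1.T @ y / n
theta_t = (A @ b) / (A @ A)
resid = y - Y * theta_t
beta_t = Z2.T @ resid / n               # slackness estimates for suspect moments

# Stack g_i = (Z1_i e_i, Z2_i e_i - beta_t) and form the weight matrix W_n
g = np.hstack([Z1 * resid[:, None], Z2 * resid[:, None] - beta_t])
W_n = g.T @ g / n
print(theta_t, beta_t)                  # beta_t for the invalid moment is far from 0
```

The element of β̃ attached to the invalid instrument is pulled away from zero, which is what the adaptive weights ω̂j = 1/|β̃j| in the penalty below exploit: valid moments get large weights and are shrunk to exactly zero.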
The adaptive lasso estimators are defined as

(θ̂n^alasso, β̂n^alasso) = argmin_{(θ,β)∈Θ×B} [ gn(θ, β)′ Wn gn(θ, β) + λn Σ_{j=1}^{q−s} ω̂j |βj| ],   (6)

where Θ × B is the parameter space for (θ, β), and the ω̂j are weights with ω̂j = 1/|β̃j|, where β̃j is the jth element of the unpenalized standard gmm estimator using all q moments. The adaptive lasso (alasso) estimator penalizes the slackness parameters by their l1 norm. This penalty is usually preferred because it has the oracle property (β0j is shrunk to zero for the valid moments) and because the problem can be solved with the lars algorithm (Efron et al., 2004), which represents a great computational advantage. Liao (2013) also considered alternatives to the adaptive lasso, such as the bridge and smoothly clipped absolute deviation penalties, but we focus only on the adaptive lasso estimator because the penalty is convex and easy to estimate compared with the others. The degree of shrinkage is governed by the tuning parameter λn ≥ 0: large values shrink more, and λn = 0 corresponds to the gmm solution. λn is chosen to differentiate between valid and invalid moments.

The second msc that we analyze is by Andrews (1999), extended in Andrews and Lu (2001). It consists of a penalization of the J statistic (Hansen, 1982) in equation (3). Following the notation of Andrews (1999), let c ∈ R^{q−s} denote a moment selection vector of zeros and ones such that if the jth candidate moment condition is selected, the jth element of c is one. Let |c| = Σ_{j=1}^{q−s} cj denote the number of moments selected by c, and let Zic be the vector Zi from which the jth element is deleted if the corresponding jth element of c is zero. The corresponding weight matrix W̄nc is of dimension (s + |c|) × (s + |c|). The msc estimator objective function has the following general form:

mscn(c) = Jc(θ, W̄nc) − h(|c|) κn,   (7)

where Jc(θ, W̄nc) = n · gn(θ)′ W̄nc gn(θ) uses the s + |c| moments in the gmm objective function; gn(θ) is defined immediately below equation (3).
In other words, in (7) we have W̄nc based on n⁻¹ Σ_{i=1}^n Zic Zic′ ε̄i², where ε̄i = yi − Yi θ̄, and θ̄ is estimated through inefficient gmm with the identity weight matrix, using the instruments Zic. The algorithm works as follows. For each instrument combination, we compute the first step inefficient gmm estimator with the identity weight matrix; given the inefficient gmm estimates, we set up the new weight matrix as described above and obtain the parameter estimates from the second stage efficient gmm. We then form (7) for each instrument combination and pick the combination that minimizes (7); the corresponding efficient gmm estimates are the ones used. To be specific, say we have two potentially valid instruments, Z1 and Z2. The possible combinations are Z1 only, Z2 only, and Z1 and Z2 together. First, for Z1 only, we obtain the inefficient gmm estimates and use them to form the weight matrix for the second stage, yielding the efficient gmm estimates for Z1. We repeat the same steps for Z2, and then for Z1 and Z2 together. We thus have three sets of efficient gmm estimates, and we choose the one that minimizes (7). The choices of the function h(·) and the sequence {κn}, n ≥ 1, lead to different msc. Andrews (1999) uses h(|c|) = |c| − p and three different choices of κn, which lead to three moment selection criteria:

gmmbic: msc_{bic,n}(c) = Jc(θ, W̄nc) − (|c| − p) ln n,
gmmaic: msc_{aic,n}(c) = Jc(θ, W̄nc) − 2 (|c| − p),
gmmhqic: msc_{hqic,n}(c) = Jc(θ, W̄nc) − 2.1 (|c| − p) ln ln n,

where the value 2.1 in gmmhqic is chosen in light of the results in Andrews (1997). For consistency among the methods we analyze the gmmbic method in this paper; the bic-based penalty gives selection consistency in both the adaptive lasso and Andrews and Lu (2001). The results for the aic and hqic cases are available on request. The third method is by Hong et al. (2003).
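The combination search for (7) with the bic penalty can be sketched as follows (our illustration for one endogenous regressor, with a made-up design of three known-valid instruments and two candidates, one invalid; not the authors' code):

```python
import itertools
import numpy as np

def two_step_gmm(Zc, Y, y):
    """Two-step efficient GMM for y = Y*theta + eps with instruments Zc."""
    n = len(y)
    A, b = Zc.T @ Y / n, Zc.T @ y / n
    theta1 = (A @ b) / (A @ A)                          # step 1: identity weight
    e1 = y - Y * theta1
    W = np.linalg.inv((Zc * e1[:, None] ** 2).T @ Zc / n)  # efficient weight
    theta2 = (A @ W @ b) / (A @ W @ A)                  # step 2: efficient GMM
    g = b - A * theta2
    return theta2, n * g @ W @ g                        # estimate and J statistic

rng = np.random.default_rng(2)
n, p = 2000, 1
Z = rng.normal(size=(n, 5))                # columns 0-2 known valid, 3-4 candidates
u = rng.normal(size=n)
eps = 0.5 * u + rng.normal(size=n)
Z[:, 4] += 0.3 * eps                       # second candidate is invalid
Y = Z[:, :4].sum(axis=1) + u
y = 0.5 * Y + eps

best = None
for c in itertools.chain.from_iterable(
        itertools.combinations([3, 4], k) for k in range(3)):
    Zc = Z[:, [0, 1, 2] + list(c)]         # always keep the known-valid block
    theta, J = two_step_gmm(Zc, Y, y)
    msc = J - (len(c) - p) * np.log(n)     # bic-type penalty from the text
    if best is None or msc < best[0]:
        best = (msc, c, theta)
print(best[1], best[2])                    # selected candidates and estimate
```

Any combination containing the invalid candidate produces a J statistic of order n, which the ln n bonus for extra moments cannot offset, so the search discards it.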
Their method is analogous to Andrews and Lu (2001), but the J function is estimated using generalized empirical likelihood and exponential tilting statistics. However, we only use a cue-based objective function, as described in the introduction, due to the poor performance of empirical likelihood and exponential tilting shown in Hong et al. (2003). The objective function is the same as (7), but the weight matrix is updated continuously together with the parameters until convergence. In this third method, the weight matrix is based on Wn,cue = n⁻¹ Σ_{i=1}^n Zic Zic′ εi(θ)², where εi(θ) = yi − Yi θ.

2.2 Parameter Estimation

There are three methods that we examine in the second stage of parameter estimation for θ. The first two are unpenalized gmm and unpenalized cue; given the valid instruments, these two methods deliver estimates of the structural parameters. An alternative approach for parameter estimation after moment selection is the method proposed by Okui (2011): the shrinkage two stage least squares (stsls) estimator. This method shrinks toward zero some of the weights on the sample moment conditions and requires a minimum set of moments known to be valid before estimation. The stsls estimation is as follows: for a shrinkage parameter m we define P^m = P_{ZI} + m P_{ZII} and the stsls estimator as

θ̂n,stsls = (Y′ P^m Y)⁻¹ Y′ P^m y,

where ZI represents the s valid moments from the first set, and ZII represents the moments, from the second set of q − s moments/instruments, that are selected as valid by a criterion such as alasso or penalized gmm. The shrinkage parameter m is chosen to minimize a Nagar (1959)-type approximation of the mean squared error.
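Given a value of m, the stsls estimator is a one-line computation. A sketch for the scalar-θ case (our illustrative design; here m is taken as given rather than estimated by the Nagar approximation, and all instruments are valid so both endpoints m = 0 and m = 1 are consistent):

```python
import numpy as np

def proj(Z):
    """Projection matrix P_Z = Z (Z'Z)^{-1} Z'."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def stsls(y, Y, ZI, ZII, m):
    """Shrinkage 2SLS: theta = (Y' P^m Y)^{-1} Y' P^m y, P^m = P_ZI + m*P_ZII."""
    Pm = proj(ZI) + m * proj(ZII)
    return (Y @ Pm @ y) / (Y @ Pm @ Y)

rng = np.random.default_rng(3)
n = 1000
Z = rng.normal(size=(n, 6))                # first 3 columns = Z_I, rest = Z_II
u = rng.normal(size=n)
eps = 0.5 * u + rng.normal(size=n)
Y = Z.sum(axis=1) + u
y = 0.5 * Y + eps

theta_m0 = stsls(y, Y, Z[:, :3], Z[:, 3:], 0.0)  # uses the known-valid block only
theta_m1 = stsls(y, Y, Z[:, :3], Z[:, 3:], 1.0)  # full weight on the extra moments
print(theta_m0, theta_m1)
```

Intermediate values of m interpolate between the two, which is how the estimator trades off the bias from the extra moments against the variance reduction they offer.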
When there is only one endogenous variable, as in our simulation setup, the estimate of the optimal shrinkage parameter is

m̂∗ = (Ŷ′ P_{ZI} Ŷ / n) / [ (σ̂εu² / σ̂ε²)(r / n) + Ŷ′ P_{ZI} Ŷ / n ],

where σ̂ε² and σ̂εu are the estimates of σε² = E(εi²) and σεu = E(εi ui) obtained from a preliminary estimation as described in Okui (2011), and Ŷ is the prediction of Y based on a least squares regression on the selected instruments. Okui (2011) works only under homoskedasticity, and m∗ is valid only under homoskedasticity.

3 Monte Carlo Simulations

The purpose of the Monte Carlo simulations is to compare the previously described msc approaches in two respects: first, effectiveness in selecting the correct moment conditions, and second, performance of the post selection estimators. We use the data generating process in equations (4) and (5). We have only one endogenous variable and set the true θ0 = 0.5. We employ (Z, ε, u) ∼ N(0, Σ), where

Σ = ( σzz² Iq   σZε    0q
      σZε′      σε²    σεu
      0q′       σεu    σu² )

is a (q + 2) × (q + 2) symmetric matrix, σzz² is the variance of the instruments, Iq is an identity matrix of order q, σZε is a q × 1 vector of covariances between the instruments and the structural error, 0q is a q × 1 vector of zeros, and σεu, σε² and σu² are scalars. We impose a heteroskedastic error structure of the form εi∗ = εi ‖Zi‖, with ‖Zi‖ = (Zi1² + · · · + Ziq²)^{1/2}. A moment is valid if E[g(Zi, θ0)] = E[Zi(yi − Yi θ0)] = E[Zi εi] = σZε = 0. We generate invalid moments by constructing σZε vectors in two ways: (1) constant correlation D ≠ 0 between the instrument and the structural error, and (2) local to zero correlation of the form 1/n, 1/√n and 1/∛n, to explore different convergence rates. The homoskedastic case is when εi∗ = εi. In all setups we have q total moments, and s of them are known by the researcher a priori to be valid. However, there are a total of r = s + sv valid moments, so we have to select the sv valid moments among the remaining q − s.
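The data generating process above can be sketched as follows for the constant-correlation design (our code, following the parameter values stated in the text: D = 0.2, σε² = 1, σu² = 0.5, σεu = 0.5, with q = 11 and 7 valid, 4 invalid moments as in Setup 1):

```python
import numpy as np

q, n, D = 11, 250, 0.2
s_Ze = np.r_[np.zeros(7), D * np.ones(4)]       # 7 valid, 4 invalid moments
Sigma = np.zeros((q + 2, q + 2))
Sigma[:q, :q] = np.eye(q)                       # sigma_zz^2 = 1 case
Sigma[:q, q] = Sigma[q, :q] = s_Ze              # cov(Z, eps) = sigma_Zeps
Sigma[q, q] = 1.0                               # sigma_eps^2
Sigma[q, q + 1] = Sigma[q + 1, q] = 0.5         # sigma_eps_u
Sigma[q + 1, q + 1] = 0.5                       # sigma_u^2

rng = np.random.default_rng(4)
draw = rng.multivariate_normal(np.zeros(q + 2), Sigma, size=n)
Z, eps, u = draw[:, :q], draw[:, q], draw[:, q + 1]
eps_het = eps * np.linalg.norm(Z, axis=1)       # heteroskedastic variant eps*||Z||
pi0 = 2 * np.ones(q)                            # strong identification
Y = Z @ pi0 + u
y = 0.5 * Y + eps
print(np.mean(Z * eps[:, None], axis=0))        # last 4 sample moments drift toward D
```

Note that Σ must remain positive definite: with D = 0.2 the conditional variance of ε given the four invalid instruments is 1 − 4D² = 0.84 > 0, so the joint normal draw is well defined.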
The number of valid and invalid moment conditions is generated in two ways. In the first setup we simulate data with a fixed number of moments: q = 11, s = 3 and r = 7. That is, there are 11 moments and we know that 3 of them are valid; we have to select among the other 8, of which we set 4 as valid and 4 as invalid. The errors in this setup are homoskedastic. In the second setup we allow the number of valid moments to increase with the sample size: q = √n, s = √q and sv = (q − s)/2; that is, we have to choose among q − s candidates and we set half of them as valid. The errors in this setup are heteroskedastic. These are denoted Setup 1 and Setup 2, respectively. In Setup 1, Σ is a 13 × 13 matrix constructed as follows. We simulate Z ∈ R¹¹ divided into three categories: the first set of instruments is known to be strong and valid (s = 3), as required by the mscs described in the previous section. As mentioned before, the remaining instruments are divided into two categories: the first four are valid (sv = 4) and the last q − r = 4 are invalid. The last elements of Σ are σZε = (0, 0, 0, 0, 0, 0, 0, D, D, D, D) in the constant correlation case and σZε = (0, 0, 0, 0, 0, 0, 0, h/n, h/√n, h/∛n, h/n) in the local to zero scenario. Note that we use three rates for the local to zero moments, which are recycled as needed. We set σε² = 1, σu² = 0.5, D = 0.2 and h = 1. For each correlation structure we investigate weak and strong identification scenarios by changing π0 in equation (5). In the strong identification scenario we have π0 = 2 · 1₁₁ and in the weak identification case we have π0 = (2 · 1₃, 0.2 · 1₈), with 1ℓ being a row vector of ones of length ℓ. The second setup is constructed in an analogous manner. We set the variance of the instruments σzz² = {0.5, 1.0} and the covariance between the structural and reduced form errors σεu = 0.5.
This gives us two cases: in Case 1, σzz² = 0.5 and σεu = 0.5; in Case 2, σzz² = 1.0 and σεu = 0.5. We have estimated many other cases for the covariance matrix: in Case 3, σzz² = 1 and σεu = 0.5; in Case 4, σzz² = 1 and σεu = 0.9; in Case 5, σzz² = 2 and σεu = 0.5; and in Case 6, σzz² = 2 and σεu = 0.9. These cases and the local-to-zero ones are available on request. The simulated sample sizes are n = {50, 100, 250}. All the results in the next section are based on 1000 repetitions.

4 Results

We focus only on the most relevant and salient results of our simulation exercises: Cases 1 and 2 for Setups 1 and 2 using invalid moments with constant correlation with the structural error. We do not present all of our simulated scenarios, for economy of space and because the general results presented here hold across all the alternative setups.¹ We focus on the weak and strong identification cases with σzz² = 0.5 and σzz² = 1. The analysis of the results addresses two questions: how good are the msc selection procedures, and which technique gives the best estimation of the structural parameter θ0? The R² of the first stage regression is presented in Table 7; it ranges from 0.533 to 0.944 depending on the strength of identification and the number of observations. The moment selection methods are the adaptive lasso (alasso), penalized gmm (gmmpen) and penalized cue (cuepen). We have nine post selection structural parameter estimators: alasso–ma is the estimator obtained by selecting the moments using the adaptive lasso method in the first stage and then using the moment averaging estimator of Okui (2011) in the second stage. Setup 1 is the homoskedastic setup, and we use the optimal m∗ of Okui (2011), which works only under homoskedasticity. Setup 2 allows heteroskedasticity.
The Okui (2011) method is not designed for heteroskedasticity, and the m∗ of Section 2.2 is valid only under homoskedasticity; we nevertheless use m∗ in the Setup 2 simulations to see how the method fares under heteroskedasticity.¹ alasso–gmm and alasso–cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use those moments in unpenalized efficient gmm and cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmmpen–ma uses the penalized gmm estimator of Andrews and Lu (2001) for model selection and then Okui's moment averaging estimator in the second stage; gmmpen–gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments, cuepen–ma is the estimator obtained by selecting the moments using the penalized cue criterion and using these moments in the moment averaging estimator, and cuepen–cue selects the moments using penalized cue and estimates θ0 using the cue estimator. A summary of our results is presented in Tables 1 and 2 for model selection and post selection estimation performance, respectively. In Table 1 we present the average ranking of each method on the probability of selecting the exact valid moments in Tables 3 and 4, for each sample size and strength of identification. In case of a tie the methods get the same ranking (so we can have two first or two second places). From these tables we can see that the adaptive lasso method is the best at perfect moment selection. In Table 2 we present the performance of the post selection estimation methods.

¹ We have extensive results, available on request, for all the moment selection techniques discussed in section 2, for fixed and local to zero correlation between the instruments and the structural error.
The performance is assessed by the rmse. The rankings are based on the relative performance in Tables 5a to 6b, presented by sample size and strength of identification. The estimator with the smallest value acquires the rank of 1; two estimators with the same rmse are given the same rank. The Average Ranking ranges from 1 to 9 and the frequency of being in the Top Three from 0 to 12.

Table 1: Summary of the Performance of the Moment Selection Techniques

           alasso   gmmpen   cuepen
Setup 1    1.25     2.58     1.83
Setup 2    1.33     2.67     1.92

Figures correspond to the average ranking of each method based on the probability of selecting the exact valid moments. The latter are in Tables 3 and 4, by sample size and strength of identification. In case of a tie the methods get the same ranking (we can have two first or two second places). alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue, respectively.

From Table 2 we conclude that the best estimator is obtained using the adaptive lasso method to select the moments, followed by the moment averaging procedure (alasso–ma). The moment averaging procedure improves estimation for all three moment selection techniques. The worst estimators are obtained with the cue method. In the heteroskedastic setup (Setup 2) alasso–ma (moment averaging in the second stage) is still the best in terms of rmse, but not as good as in the homoskedastic case (Setup 1). In the next sections we present the detailed analysis of the moment selection and post selection estimation methods.

4.1 Model Selection

We analyze three msc methods: the adaptive lasso in equation (6), the penalized efficient gmm, and the penalized continuously updated gmm in equation (7). In all cases we adopt the bic criterion.
For each method we measure its performance by the probability of three events: (1) the method selects the true number of valid moments and none of the invalid ones (perfect selection); (2) it selects only valid moments, but strictly fewer or more than the true number of valid ones, and does not select any invalid moment at the same time; and (3),

Table 2: Summary of the Performance of the Post Selection Techniques

              Setup 1                    Setup 2
              Average    Times at       Average    Times at
              Ranking    the top three  Ranking    the top three
alasso–ma     1.17       12             1.75       11
alasso–gmm    2.00       12             2.25       11
alasso–cue    4.83       4              4.08       5
gmm           4.33       3              5.67       1
gmmpen–ma     1.67       12             2.50       9
gmmpen–gmm    3.00       8              3.00       7
cue           6.33       0              7.08       0
cuepen–ma     1.92       11             2.83       7
cuepen–cue    4.25       5              5.08       2

The performance is analyzed in terms of the rmse. The rankings are based on the relative performance in Tables 5a to 6b. The estimator with the smallest value takes the rank of 1; in case of a tie the estimators are given the same rank. The average ranking ranges from 1 to 9 and the times at the top three from 0 to 12. alasso–ma is the estimator obtained by selecting the moments using the adaptive lasso method in the first stage and then using Okui's moment averaging estimator in the second stage. alasso–gmm and alasso–cue are the estimators that use the adaptive lasso to select the valid moments in the first stage and then use them in unpenalized efficient gmm and cue, respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments, gmmpen–ma uses the penalized gmm estimator in Andrews and Lu (2001) for model selection and then Okui's moment averaging estimator in the second stage. gmmpen–gmm selects the moments in the same way as the previous method but then the structural parameter is estimated using efficient gmm.
In the same way, cue denotes the cue estimator using the full set of moments, cuepen–ma is the estimator obtained by selecting the moments using the penalized cue criterion and using these moments in the moment averaging estimator, and cuepen–cue selects the moments using penalized cue and estimates θ0 using the cue estimator.

it selects at least one invalid moment. The first probability is the probability of being "perfect". The second event is second best: we do not choose the correct number of valid moments, but we still choose only valid ones, which can benefit the second stage structural parameter estimation in gmm. The third probability shows how badly a moment selection criterion can behave; since invalid moments can badly affect the finite sample bias of the second stage structural parameter estimate, we would prefer this probability to be low. The results are presented in Tables 3 and 4 for, respectively, the first setup (fixed number of moments, homoskedastic case) and the second setup (increasing number of moments, heteroskedastic case), for the weak and strong identification cases and instrument variances σzz² = 0.5 and σzz² = 1. In Table 3, Setup 1, for σzz² = 0.5 we find that in the smallest sample, n = 50, all three msc approaches behave poorly, particularly gmmpen, which selects invalid moments with probability 0.999 in the weak identification case and 1 in the strong identification case. The best method is the alasso, which selects invalid moments with probability 0.676 and 0.733 in the weak and strong identification cases, respectively. The performance of the three methods improves when the sample size increases to n = 100, but their relative positions remain the same: alasso dominates the penalized methods.
In this case the alasso selects invalid moments with probability 0.383 and 0.517 in the weak and strong identification cases, whereas the penalized methods do so with probabilities above 0.928. The performance ranking changes when the sample size is increased to n = 250: the alasso still selects invalid moments with the smallest probability, but the penalized cue method selects perfectly with probability 0.737 and 0.678 in the weak and strong identification cases, compared with 0.355 and 0.278 for the alasso. However, if the objective is to avoid selecting any invalid moments, then the alasso still dominates, selecting an invalid moment with probability 0.043. Since the alasso and the penalized methods are all selection consistent, we can take this as evidence of differences in convergence rates, with the penalized methods converging faster in this case; the differences in performance between the two penalized methods are negligible. This is not true for the next case. In Table 3, Setup 1, when σzz² = 1 the relative performance of the methods is the same as in the previous case, but with the alasso dominating in all cases and criteria. The penalized methods catch up with the performance of the alasso as the sample size increases, with cuepen slightly dominating its counterpart gmmpen in all cases. In all cases the methods behave poorly compared to the setup with σzz² = 0.5. The conclusions for Setup 2 are the same as those for Setup 1. It is noteworthy that the alasso moves smoothly between the three performance measures (perfect, only valid and invalid selection), whereas the penalized methods jump from selecting invalid instruments to perfect selection as the sample size increases, with undesirably small probabilities of selecting only valid (but not perfect) sets of moments under all our setups.
4.2 Post Selection Performance

In this section we analyze the post selection performance of the msc in terms of the bias, standard deviation and rmse of the estimator θ̂. For each of the three methods we estimate the structural parameter using efficient and continuously updated gmm, and the moment averaging method in Okui (2011). The results for Setups 1 and 2 are presented in Tables 5a, 5b and 6a, 6b respectively.

Table 3: Probabilities: Moment Selection Criteria. Setup 1

                          Weak Identification                 Strong Identification
                    Perfect   Only Valid  Any Invalid   Perfect   Only Valid  Any Invalid
σ²zz = 0.5 · Iq
n = 50    alasso     0.028      0.296       0.676        0.025      0.242       0.733
          gmmpen     0.001      0.000       0.999        0.000      0.000       1.000
          cuepen     0.003      0.000       0.997        0.003      0.000       0.997
n = 100   alasso     0.136      0.481       0.383        0.063      0.420       0.517
          gmmpen     0.056      0.004       0.940        0.045      0.003       0.952
          cuepen     0.066      0.006       0.928        0.052      0.005       0.943
n = 250   alasso     0.355      0.602       0.043        0.278      0.613       0.109
          gmmpen     0.737      0.052       0.211        0.673      0.047       0.280
          cuepen     0.737      0.052       0.211        0.678      0.047       0.275
σ²zz = 1 · Iq
n = 50    alasso     0.008      0.192       0.800        0.010      0.152       0.838
          gmmpen     0.000      0.000       1.000        0.000      0.000       1.000
          cuepen     0.000      0.000       1.000        0.000      0.000       1.000
n = 100   alasso     0.037      0.315       0.648        0.013      0.264       0.723
          gmmpen     0.003      0.002       0.995        0.001      0.001       0.998
          cuepen     0.006      0.002       0.992        0.002      0.000       0.998
n = 250   alasso     0.199      0.543       0.258        0.127      0.508       0.365
          gmmpen     0.155      0.004       0.841        0.118      0.007       0.875
          cuepen     0.158      0.004       0.838        0.121      0.008       0.871

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue respectively. The entries are the probabilities of (i) Perfect selection: the method selects exactly the valid moments; (ii) Only Valid: the method selects only valid moments, but not the complete valid set; (iii) Any Invalid: the method selects at least one invalid moment. Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. π = (2 · 1_3, 0.2 · 1_8) in the weak identification case and π = 2 · 1_11 in the strong identification case.

Table 4: Probabilities: Moment Selection Criteria. Setup 2

                          Weak Identification                 Strong Identification
                    Perfect   Only Valid  Any Invalid   Perfect   Only Valid  Any Invalid
σ²zz = 0.5 · Iq
n = 50    alasso     0.085      0.301       0.614        0.058      0.247       0.695
          gmmpen     0.011      0.000       0.989        0.007      0.000       0.993
          cuepen     0.017      0.000       0.983        0.012      0.000       0.988
n = 100   alasso     0.196      0.397       0.407        0.136      0.367       0.497
          gmmpen     0.098      0.005       0.897        0.078      0.004       0.918
          cuepen     0.104      0.005       0.891        0.094      0.006       0.900
n = 250   alasso     0.266      0.662       0.072        0.192      0.639       0.169
          gmmpen     0.518      0.045       0.437        0.493      0.039       0.468
          cuepen     0.522      0.051       0.427        0.510      0.050       0.440
σ²zz = 1 · Iq
n = 50    alasso     0.044      0.198       0.758        0.032      0.182       0.786
          gmmpen     0.003      0.000       0.997        0.001      0.000       0.999
          cuepen     0.003      0.000       0.997        0.003      0.000       0.997
n = 100   alasso     0.080      0.264       0.656        0.055      0.221       0.724
          gmmpen     0.014      0.000       0.986        0.009      0.000       0.991
          cuepen     0.018      0.000       0.982        0.008      0.000       0.992
n = 250   alasso     0.109      0.515       0.376        0.068      0.433       0.499
          gmmpen     0.038      0.005       0.957        0.031      0.000       0.969
          cuepen     0.041      0.005       0.954        0.038      0.001       0.961

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue respectively. The entries are the probabilities of (i) Perfect selection: the method selects exactly the valid moments; (ii) Only Valid: the method selects only valid moments, but not the complete valid set; (iii) Any Invalid: the method selects at least one invalid moment. Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = √n moments, √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2 · 1_s, 0.2 · 1_{q−s}) in the weak identification case and π = 2 · 1_q in the strong identification case.
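All of the summary columns in Tables 5a–6b except the coverage can be reproduced from the Monte Carlo draws of θ̂ with a short helper. The sketch below uses assumed names; the true value θ0 = 0.5 is inferred from the tables, where the Bias column equals Mean minus 0.5 throughout.

```python
import numpy as np

def mc_summary(theta_hat, theta0=0.5):
    """Mean, standard deviation, bias and rmse of Monte Carlo estimates.

    theta_hat: vector of estimates across replications.
    theta0:    true structural parameter (0.5 is inferred from the tables).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    mean = theta_hat.mean()
    bias = mean - theta0
    sd = theta_hat.std(ddof=1)                          # sample standard deviation
    rmse = np.sqrt(np.mean((theta_hat - theta0) ** 2))  # root mean squared error
    return mean, sd, bias, rmse

# rmse is approximately sqrt(bias**2 + sd**2), the decomposition visible in
# the tables, e.g. sqrt(0.066**2 + 0.008**2) ≈ 0.066 for alasso–ma at n = 50.
print(mc_summary([0.4, 0.5, 0.6]))
```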
In Table 6a, Setup 2, with weak identification and σ²zz = 0.5 · Iq we find that with sample size n = 50 the best estimator is obtained by using alasso–ma, with an rmse of 0.143. The worst method is the cue, with an rmse of 0.297. Note that when using the full set of instruments the gmm estimator performs better than the cue in terms of rmse, but cue has a smaller bias in the weak identification case. With sample size n = 50 the adaptive lasso based methods are also the best in rmse in the strong identification case. As the sample size increases all the estimators converge to the true value. With σ²zz = 1 · Iq the relative performance remains the same, but the rmse and the standard deviations are smaller. In terms of coverage, alasso–gmm performs the best among the specifications considered. In all setups and specifications, we see that alasso–gmm comes close to 95% coverage, whereas the other methods cannot replicate this behavior. In terms of bias, in the more relevant Setup 2 (Tables 6a–b), we see that the adaptive lasso based methods do very well, but so does penalized cue in the first stage followed by cue in the second stage.

5 Conclusion

We have studied the relative performance of several moment selection techniques in selecting the correct moments and in estimating the structural parameter. Our simulations suggest that using adaptive lasso in the first stage to obtain valid instruments, followed by gmm or moment averaging, delivers the most satisfactory rmse for the structural parameter in both the homoskedastic and heteroskedastic cases. This approach has important computational benefits because estimation can be based on the lars algorithm, which makes it a good practical choice when the number of instruments grows large.

Table 5a: Monte Carlo results for θ̂. Setup 1.
(Part 1)

σ²zz = 0.5 · Iq
                        Weak Identification                    Strong Identification
               Mean     Sd     Bias   rmse   95%c      Mean     Sd     Bias   rmse   95%c
n = 50
alasso–ma     0.508   0.066   0.008  0.066  0.966     0.541   0.057   0.041  0.070  0.686
alasso–gmm    0.515   0.071   0.015  0.073  0.917     0.541   0.058   0.041  0.071  0.894
alasso–cue    0.497   0.142  -0.003  0.142  0.763     0.532   0.152   0.032  0.155  0.559
gmm           0.541   0.069   0.041  0.080  0.842     0.576   0.035   0.076  0.084  0.310
gmmpen–ma     0.516   0.064   0.016  0.066  0.963     0.565   0.039   0.065  0.076  0.509
gmmpen–gmm    0.528   0.070   0.028  0.076  0.811     0.564   0.044   0.064  0.078  0.374
cue           0.486   0.264  -0.014  0.264  0.591     0.555   0.246   0.055  0.252  0.391
cuepen–ma     0.518   0.066   0.018  0.068  0.964     0.567   0.037   0.067  0.077  0.494
cuepen–cue    0.492   0.182  -0.008  0.182  0.901     0.554   0.108   0.054  0.121  0.830
n = 100
alasso–ma     0.503   0.048   0.003  0.048  0.945     0.528   0.048   0.028  0.055  0.663
alasso–gmm    0.506   0.049   0.006  0.050  0.927     0.528   0.048   0.028  0.056  0.929
alasso–cue    0.497   0.077  -0.003  0.077  0.855     0.524   0.085   0.024  0.089  0.644
gmm           0.533   0.051   0.033  0.061  0.937     0.575   0.024   0.075  0.079  0.230
gmmpen–ma     0.506   0.047   0.006  0.048  0.943     0.549   0.038   0.049  0.063  0.452
gmmpen–gmm    0.513   0.050   0.013  0.052  0.891     0.549   0.040   0.049  0.063  0.455
cue           0.487   0.172  -0.013  0.173  0.791     0.560   0.143   0.060  0.155  0.462
cuepen–ma     0.506   0.047   0.006  0.048  0.944     0.551   0.037   0.051  0.063  0.433
cuepen–cue    0.496   0.071  -0.004  0.071  0.920     0.540   0.048   0.040  0.063  0.872
n = 250
alasso–ma     0.501   0.028   0.001  0.028  0.950     0.508   0.031   0.008  0.032  0.857
alasso–gmm    0.502   0.028   0.002  0.028  0.945     0.508   0.031   0.008  0.032  0.943
alasso–cue    0.500   0.029   0.000  0.029  0.932     0.506   0.032   0.006  0.033  0.842
gmm           0.529   0.032   0.029  0.044  0.993     0.573   0.015   0.073  0.074  0.122
gmmpen–ma     0.501   0.028   0.001  0.028  0.951     0.507   0.024   0.007  0.025  0.878
gmmpen–gmm    0.503   0.028   0.003  0.029  0.899     0.507   0.024   0.007  0.025  0.828
cue           0.496   0.076  -0.004  0.077  0.931     0.568   0.032   0.068  0.075  0.861
cuepen–ma     0.501   0.028   0.001  0.028  0.951     0.507   0.024   0.007  0.025  0.878
cuepen–cue    0.500   0.029   0.000  0.029  0.941     0.504   0.025   0.004  0.025  0.909

Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. π = (2 · 1_3, 0.2 · 1_8) in the weak identification case and π = 2 · 1_11 in the strong identification case. alasso–ma is the estimator obtained by selecting the moments with the adaptive lasso in the first stage and then using Okui's moment averaging estimator in the second stage. alasso–gmm and alasso–cue use the adaptive lasso to select the valid moments in the first stage and then use them in the unpenalized efficient gmm and cue respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the full set of moments; gmmpen–ma uses the penalized gmm criterion of Andrews and Lu (2001) for moment selection in the first stage and then Okui's moment averaging estimator in the second stage; gmmpen–gmm selects the moments in the same way but then estimates the structural parameter by efficient gmm. In the same way, cue denotes the cue estimator using the full set of moments; cuepen–ma selects the moments with the penalized cue criterion and uses them in the moment averaging estimator; cuepen–cue selects the moments with penalized cue and estimates θ0 with the cue estimator. 95%c is the coverage of the empirical 95% confidence intervals.

Table 5b: Monte Carlo results for θ̂. Setup 1.
(Part 2)

σ²zz = 1 · Iq
                        Weak Identification                    Strong Identification
               Mean     Sd     Bias   rmse   95%c      Mean     Sd     Bias   rmse   95%c
n = 50
alasso–ma     0.505   0.046   0.005  0.047  0.971     0.523   0.037   0.023  0.043  0.808
alasso–gmm    0.509   0.049   0.009  0.050  0.924     0.523   0.038   0.023  0.044  0.902
alasso–cue    0.497   0.121  -0.003  0.121  0.779     0.516   0.149   0.016  0.149  0.603
gmm           0.521   0.049   0.021  0.053  0.850     0.538   0.025   0.038  0.046  0.528
gmmpen–ma     0.510   0.045   0.010  0.046  0.973     0.534   0.026   0.034  0.043  0.727
gmmpen–gmm    0.515   0.049   0.015  0.051  0.810     0.534   0.029   0.034  0.044  0.522
cue           0.500   0.202   0.000  0.202  0.683     0.524   0.219   0.024  0.220  0.513
cuepen–ma     0.510   0.045   0.010  0.047  0.973     0.535   0.025   0.035  0.043  0.731
cuepen–cue    0.499   0.106  -0.001  0.106  0.911     0.528   0.066   0.028  0.072  0.821
n = 100
alasso–ma     0.503   0.033   0.003  0.033  0.948     0.517   0.031   0.017  0.036  0.707
alasso–gmm    0.504   0.035   0.004  0.035  0.921     0.517   0.032   0.017  0.036  0.927
alasso–cue    0.500   0.052   0.000  0.052  0.867     0.515   0.062   0.015  0.064  0.636
gmm           0.517   0.035   0.017  0.039  0.928     0.538   0.018   0.038  0.041  0.425
gmmpen–ma     0.505   0.033   0.005  0.034  0.951     0.532   0.021   0.032  0.038  0.530
gmmpen–gmm    0.510   0.035   0.010  0.036  0.885     0.532   0.023   0.032  0.039  0.472
cue           0.508   0.098   0.008  0.099  0.819     0.532   0.113   0.032  0.117  0.468
cuepen–ma     0.506   0.033   0.006  0.034  0.951     0.532   0.021   0.032  0.038  0.526
cuepen–cue    0.503   0.042   0.003  0.042  0.919     0.529   0.027   0.029  0.040  0.868
n = 250
alasso–ma     0.501   0.020   0.001  0.020  0.949     0.508   0.022   0.008  0.023  0.760
alasso–gmm    0.502   0.020   0.002  0.020  0.946     0.508   0.022   0.008  0.024  0.944
alasso–cue    0.501   0.021   0.001  0.021  0.928     0.507   0.022   0.007  0.024  0.756
gmm           0.515   0.022   0.015  0.026  0.988     0.536   0.011   0.036  0.038  0.325
gmmpen–ma     0.502   0.020   0.002  0.020  0.951     0.519   0.020   0.019  0.028  0.514
gmmpen–gmm    0.505   0.021   0.005  0.021  0.944     0.519   0.021   0.019  0.028  0.575
cue           0.507   0.029   0.007  0.030  0.921     0.535   0.013   0.035  0.037  0.520
cuepen–ma     0.502   0.020   0.002  0.020  0.950     0.520   0.020   0.020  0.028  0.512
cuepen–cue    0.502   0.021   0.002  0.021  0.943     0.518   0.022   0.018  0.028  0.896

Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. π = (2 · 1_3, 0.2 · 1_8) in the weak identification case and π = 2 · 1_11 in the strong identification case. The estimators are defined in the note to Table 5a. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6a: Monte Carlo results for θ̂. Setup 2.
(Part 1)

σ²zz = 0.5 · Iq
                        Weak Identification                    Strong Identification
               Mean     Sd     Bias   rmse   95%c      Mean     Sd     Bias   rmse   95%c
n = 50
alasso–ma     0.516   0.142   0.016  0.143  0.944     0.576   0.146   0.076  0.164  0.726
alasso–gmm    0.518   0.145   0.018  0.146  0.920     0.571   0.145   0.071  0.161  0.916
alasso–cue    0.497   0.174  -0.003  0.174  0.853     0.552   0.165   0.052  0.173  0.699
gmm           0.563   0.146   0.063  0.159  0.916     0.665   0.084   0.165  0.185  0.468
gmmpen–ma     0.529   0.141   0.029  0.143  0.942     0.639   0.108   0.139  0.176  0.568
gmmpen–gmm    0.541   0.145   0.041  0.151  0.868     0.632   0.112   0.132  0.173  0.494
cue           0.491   0.297  -0.009  0.297  0.730     0.631   0.181   0.131  0.223  0.517
cuepen–ma     0.531   0.140   0.031  0.144  0.943     0.642   0.105   0.142  0.176  0.562
cuepen–cue    0.497   0.234  -0.003  0.234  0.907     0.602   0.160   0.102  0.189  0.880
n = 100
alasso–ma     0.509   0.097   0.009  0.097  0.944     0.550   0.098   0.050  0.110  0.738
alasso–gmm    0.511   0.100   0.011  0.101  0.928     0.548   0.098   0.048  0.109  0.918
alasso–cue    0.498   0.112  -0.002  0.112  0.879     0.536   0.105   0.036  0.111  0.760
gmm           0.549   0.104   0.049  0.115  0.953     0.642   0.063   0.142  0.155  0.467
gmmpen–ma     0.514   0.097   0.014  0.098  0.941     0.587   0.087   0.087  0.123  0.595
gmmpen–gmm    0.521   0.102   0.021  0.104  0.896     0.582   0.089   0.082  0.121  0.618
cue           0.505   0.188   0.005  0.188  0.824     0.622   0.115   0.122  0.168  0.620
cuepen–ma     0.514   0.097   0.014  0.098  0.941     0.590   0.085   0.090  0.124  0.588
cuepen–cue    0.499   0.128  -0.001  0.128  0.916     0.565   0.105   0.065  0.123  0.886
n = 250
alasso–ma     0.502   0.079   0.002  0.079  0.924     0.538   0.091   0.038  0.099  0.770
alasso–gmm    0.504   0.079   0.004  0.080  0.934     0.538   0.090   0.038  0.098  0.927
alasso–cue    0.496   0.087  -0.004  0.087  0.913     0.529   0.092   0.029  0.096  0.773
gmm           0.589   0.084   0.089  0.122  0.979     0.717   0.039   0.217  0.220  0.012
gmmpen–ma     0.504   0.077   0.004  0.078  0.928     0.534   0.069   0.034  0.077  0.778
gmmpen–gmm    0.508   0.078   0.008  0.078  0.903     0.532   0.069   0.032  0.076  0.779
cue           0.475   0.267  -0.025  0.268  0.905     0.686   0.095   0.186  0.209  0.789
cuepen–ma     0.504   0.078   0.004  0.078  0.927     0.536   0.070   0.036  0.079  0.758
cuepen–cue    0.494   0.083  -0.006  0.083  0.932     0.516   0.071   0.016  0.073  0.903

Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = √n moments, √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2 · 1_s, 0.2 · 1_{q−s}) in the weak identification case and π = 2 · 1_q in the strong identification case. The estimators are defined in the note to Table 5a. 95%c is the coverage of the empirical 95% confidence intervals.

Table 6b: Monte Carlo results for θ̂. Setup 2.
(Part 2)

σ²zz = 1 · Iq
                        Weak Identification                    Strong Identification
               Mean     Sd     Bias   rmse   95%c      Mean     Sd     Bias   rmse   95%c
n = 50
alasso–ma     0.516   0.142   0.016  0.142  0.950     0.558   0.133   0.058  0.145  0.797
alasso–gmm    0.516   0.145   0.016  0.146  0.907     0.554   0.135   0.054  0.145  0.906
alasso–cue    0.500   0.168   0.000  0.168  0.854     0.540   0.150   0.040  0.155  0.728
gmm           0.546   0.144   0.046  0.152  0.897     0.620   0.087   0.120  0.148  0.618
gmmpen–ma     0.528   0.139   0.028  0.142  0.951     0.606   0.100   0.106  0.146  0.719
gmmpen–gmm    0.534   0.144   0.034  0.148  0.861     0.602   0.104   0.102  0.145  0.598
cue           0.512   0.239   0.012  0.239  0.757     0.601   0.146   0.101  0.178  0.613
cuepen–ma     0.530   0.139   0.030  0.142  0.951     0.606   0.099   0.106  0.145  0.717
cuepen–cue    0.513   0.206   0.013  0.207  0.899     0.584   0.136   0.084  0.160  0.885
n = 100
alasso–ma     0.509   0.096   0.009  0.096  0.938     0.545   0.090   0.045  0.101  0.775
alasso–gmm    0.511   0.099   0.011  0.100  0.923     0.543   0.091   0.043  0.101  0.917
alasso–cue    0.502   0.110   0.002  0.110  0.882     0.537   0.097   0.037  0.104  0.741
gmm           0.535   0.101   0.035  0.107  0.934     0.601   0.063   0.101  0.119  0.636
gmmpen–ma     0.516   0.095   0.016  0.097  0.940     0.580   0.074   0.080  0.109  0.652
gmmpen–gmm    0.522   0.100   0.022  0.103  0.891     0.577   0.077   0.077  0.109  0.647
cue           0.516   0.143   0.016  0.144  0.846     0.591   0.087   0.091  0.126  0.610
cuepen–ma     0.517   0.096   0.017  0.097  0.938     0.582   0.074   0.082  0.110  0.643
cuepen–cue    0.507   0.119   0.007  0.119  0.918     0.569   0.091   0.069  0.114  0.890
n = 250
alasso–ma     0.506   0.079   0.006  0.079  0.921     0.549   0.084   0.049  0.097  0.635
alasso–gmm    0.507   0.079   0.007  0.080  0.930     0.548   0.084   0.048  0.097  0.923
alasso–cue    0.500   0.082   0.000  0.082  0.905     0.542   0.085   0.042  0.094  0.640
gmm           0.563   0.081   0.063  0.103  0.974     0.654   0.040   0.154  0.159  0.110
gmmpen–ma     0.518   0.077   0.018  0.079  0.921     0.603   0.070   0.103  0.125  0.322
gmmpen–gmm    0.521   0.080   0.021  0.082  0.934     0.599   0.073   0.099  0.123  0.401
cue           0.526   0.135   0.026  0.137  0.882     0.645   0.055   0.145  0.155  0.377
cuepen–ma     0.518   0.078   0.018  0.080  0.917     0.605   0.069   0.105  0.125  0.307
cuepen–cue    0.505   0.088   0.005  0.088  0.931     0.590   0.079   0.090  0.120  0.898

Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = √n moments, √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2 · 1_s, 0.2 · 1_{q−s}) in the weak identification case and π = 2 · 1_q in the strong identification case. The estimators are defined in the note to Table 5a. 95%c is the coverage of the empirical 95% confidence intervals.

References

Andrews, Donald W. K. (1999) "Consistent moment selection procedures for generalized method of moments estimation," Econometrica, Vol. 67, No. 3, pp. 543–564.

Andrews, Donald W. K. and Biao Lu (2001) "Consistent model and moment selection procedures for GMM estimation with application to dynamic panel data models," Journal of Econometrics, Vol. 101, No. 1, pp. 123–164.

Andrews, Donald W.K. (1997) "Consistent Moment Selection Procedures for Generalized Method of Moments Estimation," Cowles Foundation Discussion Papers 1146R, Cowles Foundation for Research in Economics, Yale University.

Andrews, Donald W.K. and James H. Stock (2007) "Testing with many weak instruments," Journal of Econometrics, Vol. 138, No. 1, pp.
24–46.

Belloni, A., V. Chernozhukov, and C.B. Hansen (2011) "Lasso Methods for Gaussian Instrumental Variables Models," working paper, Massachusetts Institute of Technology, Department of Economics.

Canay, Ivan A. (2010) "Simultaneous selection and weighting of moments in GMM using a trapezoidal kernel," Journal of Econometrics, Vol. 156, No. 2, pp. 284–303.

Caner, M. and H. Zhang (2013) "Adaptive Elastic Net GMM with Diverging Number of Moments," Journal of Business and Economic Statistics, Forthcoming.

Caner, Mehmet (2009) "Lasso-Type GMM Estimator," Econometric Theory, Vol. 25, pp. 270–290.

Cheng, Xu and Zhipeng Liao (2012) "Select the Valid and Relevant Moments: A One-Step Procedure for GMM with Many Moments," PIER Working Paper Archive 12–045, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.

Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004) "Least angle regression," Annals of Statistics, Vol. 32, pp. 407–499.

Hansen, Bruce E. (2007) "Least Squares Model Averaging," Econometrica, Vol. 75, No. 4, pp. 1175–1189.

Hansen, Lars Peter (1982) "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Vol. 50, No. 4, pp. 1029–1054.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction: Springer, corrected edition.

Hausman, Jerry, James H. Stock, and Motohiro Yogo (2005) "Asymptotic properties of the Hahn–Hausman test for weak instruments," Economics Letters, Vol. 89, No. 3, pp. 333–342.

Hong, Han, Bruce Preston, and Matthew Shum (2003) "Generalized Empirical Likelihood Based Model Selection Criteria for Moment Condition Models," Econometric Theory, Vol. 19, No. 6, pp. 923–943.

Kuersteiner, Guido and Ryo Okui (2010) "Constructing Optimal Instruments by First-Stage Prediction Averaging," Econometrica, Vol. 78, No. 2, pp. 697–718.
Liao, Zhipeng (2013) "Adaptive GMM shrinkage estimation with consistent moment selection," Econometric Theory, Vol. FirstView, pp. 1–48.

Nagar, A. L. (1959) "The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations," Econometrica, Vol. 27, No. 4, pp. 575–595.

Newey, W.K. and R.J. Smith (2000) "Asymptotic Bias and Equivalence of GMM and GEL Estimators," Discussion Paper 01/517, University of Bristol, Department of Economics.

Okui, Ryo (2011) "Instrumental variable estimation in the presence of many moment conditions," Journal of Econometrics, Vol. 165, No. 1, pp. 70–86.

Smith, Richard J. (1992) "Non-Nested Tests for Competing Models Estimated by Generalized Method of Moments," Econometrica, Vol. 60, No. 4, pp. 973–980.

Zou, Hui (2006) "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, Vol. 101, pp. 1418–1429.

Zou, Hui, Trevor Hastie, and Robert Tibshirani (2007) "On the 'degrees of freedom' of the lasso," The Annals of Statistics, Vol. 35, No. 5, pp. 2173–2192.

6 Additional Table

Table 7: R² of the First Stage Estimation for Each Method

                                Setup 1               Setup 2
σ²zz    n      Method        Weak    Strong        Weak    Strong
0.5     50     alasso        0.865   0.719         0.861   0.698
               gmmpen        0.879   0.906         0.871   0.877
               cuepen        0.881   0.913         0.872   0.887
        100    alasso        0.855   0.638         0.888   0.700
               gmmpen        0.864   0.837         0.893   0.844
               cuepen        0.864   0.845         0.893   0.851
        250    alasso        0.849   0.533         0.879   0.564
               gmmpen        0.853   0.638         0.883   0.655
               cuepen        0.853   0.639         0.883   0.660
1.0     50     alasso        0.924   0.763         0.922   0.743
               gmmpen        0.935   0.941         0.930   0.925
               cuepen        0.935   0.940         0.930   0.930
        100    alasso        0.918   0.706         0.939   0.763
               gmmpen        0.927   0.921         0.943   0.919
               cuepen        0.927   0.923         0.944   0.923
        250    alasso        0.912   0.596         0.929   0.636
               gmmpen        0.919   0.818         0.937   0.853
               cuepen        0.919   0.819         0.937   0.859

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue respectively. Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. π = (2 · 1_3, 0.2 · 1_8) in the weak identification case and π = 2 · 1_11 in the strong identification case. Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are q = √n moments, √q known to be valid and q − s unknown; of these, (q − s)/2 are valid and (q − s)/2 are invalid. π = (2 · 1_s, 0.2 · 1_{q−s}) in the weak identification case and π = 2 · 1_q in the strong identification case.