Moment and IV Selection Approaches: A
Comparative Simulation Study
Mehmet Caner∗
Esfandiar Maasoumi†
Juan Andrés Riquelme‡
August 7, 2014
Abstract
We compare three moment selection approaches, followed by post selection estimation
strategies. The first is adaptive lasso of Zou (2006), recently extended by Liao (2013)
to possibly invalid moments in gmm. In this method, we select the valid instruments
with adaptive lasso. The second method is based on the J test, as in Andrews and
Lu (2001). The third one uses a Continuous Updating Objective (cue) function. This
last approach is based on Hong et al. (2003), who propose a penalized generalized
empirical likelihood based function to select valid moments. They use empirical
likelihood and exponential tilting in their simulations. However, the J test based
approach of Andrews and Lu (2001) generally provides better moment selection results
than empirical likelihood and exponential tilting, as can be seen in Hong et al.
(2003). In this article, we examine penalized cue as a third way of selecting valid
moments.
Following a determination of valid moments, we run unpenalized gmm and cue
and the model averaging technique of Okui (2011) to see which one has better
post-selection estimator performance for the structural parameters. The simulations are
aimed at the following questions: which moment selection criterion can better select
the valid moments and eliminate the invalid ones? Given the chosen instruments in the
first stage, which strategy delivers the best finite sample performance?
We find that the Adaptive Lasso in the model selection stage, coupled with either
unpenalized gmm or the moment averaging of Okui, generally delivers the smallest
rmse for the second stage coefficient estimators.
Keywords and phrases: Shrinkage, Monte Carlo, Averaging.
∗ North Carolina State University, Department of Economics, 4168 Nelson Hall, Raleigh, NC 27695. Email: mcaner@ncsu.edu.
† Emory University, Department of Economics, Atlanta, GA. Email: esfandiar.maasoumi@emory.edu.
‡ North Carolina State University, Department of Economics. Email: jariquel@ncsu.edu.
1 Introduction
It is not uncommon to encounter a large number of instruments or moment conditions
in the applications of instrumental variables (iv) or the Generalized Method of
Moments (gmm) estimators. Some IVs or moments may be invalid, but the researcher
does not know a priori which ones. This problem may be adjudicated statistically
with the J-test, which indicates whether the overidentifying restrictions are valid. If
the null is rejected, the researcher would need a moment selection technique that
would allow distinguishing between the valid and invalid moment conditions. A few
techniques have been proposed, each one with advantages (for example, consistency)
and disadvantages (such as overwhelming computational demand). In this paper we
focus on information-based methods and review three of the moment selection criteria
(msc) used in the current literature: (i) the shrinkage procedure as in Liao (2013),
(ii) the information-based criteria with gmm in Andrews (1999), and (iii) the
information-based criterion using the generalized empirical likelihood of Hong et al. (2003).
By using Monte Carlo simulations we compare these methods in their performance in
selecting valid moments in linear settings under several relevant scenarios: small and
large sample sizes, fixed and increasing number of moment conditions, weak and strong
identification, local-to-zero moment conditions, homoskedastic and heteroskedastic
errors. The contribution of our study is the comparison of these multistep approaches
with each other in a fairly comprehensive manner.
The choice of methods in this study was motivated by the following considerations:
adaptive lasso is heavily used in statistics and has computational advantages in large
scale problems; penalized methods in Andrews (1999) and Hong et al. (2003) are not
computationally advantageous, but are used by econometricians due to the need to
determine valid instruments. Further, these three methods have reasonably strong
theoretical underpinnings.
We analyze second stage estimation performance, considering the finite sample
properties of structural parameter estimators. To this end, we employ Okui's (2011)
model averaging technique to get better mean squared error and smaller bias for the
structural parameters. We then compare Okui (2011) to unpenalized gmm and cue
estimation, following selection of valid instruments in the first stage. We find that the
Adaptive Lasso in the model selection stage, coupled with either unpenalized gmm or
moment averaging of Okui, generally delivers the smallest rmse for the second stage
estimation.
There is a large and rich literature on Moment Selection Techniques. Smith (1992)
proposes a procedure to compare competing non-nested gmm estimations, allowing
for heteroskedasticity and serial correlation. Again in the GMM context, Andrews
(1999) proposes a moment selection procedure using information criteria based on
a J-statistic corrected for the number of moment conditions. This is analogous
to the use of Akaike (aic), Bayesian (bic) and Hannan-Quinn (hqic) information
criteria in model selection. He shows that the proposed methods are consistent
under suitable assumptions, and also formalizes the downward and upward testing
procedures. The downward testing consists of iterative J-tests starting from the
largest set of moment conditions and proceeding to fewer moment conditions at each
iteration, until the null is not rejected. The upward testing works in the opposite
order. Andrews and Lu (2001) extend these methods to model selection in dynamic
panel data structures. Hong et al. (2003) propose a similar approach by using the
generalized empirical likelihood defined by Newey and Smith (2000) instead of the
J-statistic.
A relatively new class of moment selection methods is based on shrinkage
procedures. One of the advantages of shrinkage is its computational efficiency,
which is consequential, especially in high-dimensional contexts. In a brief comparison,
Hastie et al. (2009, section 3.6) conclude that the shrinkage method performs better
than alternative model selection techniques in reducing the estimation error. Liao
(2013) shows that gmm shrinkage procedures have the oracle property in selecting
the moment conditions, and adding additional valid moments improves efficiency for
strongly identified parameters. Cheng and Liao (2012) use a similar approach and
propose a weighted tuning parameter that allows one to shrink invalid and redundant
moments. We chose the three moment selection criteria based on their optimality
properties, such as the oracle property, and good finite sample performance.
For model selection, assuming valid instruments, Belloni, Chernozhukov and
Hansen (2011) utilize lasso-type estimators in the many iv case and provide conditions under which the iv approach is asymptotically oracle-efficient. Caner (2009)
and Caner and Zhang (2013) also use shrinkage methods for model selection in a
gmm context. Canay (2010) proposes the use of trapezoidal kernel weights to shrink
the first stage estimators. Kuersteiner and Okui (2010) point out that, despite their
advantages, kernel shrinkage estimators cannot completely reduce the estimation bias
and are inflexible once a particular kernel is chosen. They also propose
a moment average estimator using the method in Hansen (2007) to construct optimal
instruments. Okui (2011) develops a shrinkage method that minimizes the asymptotic
mean squared error.
An important concern in the gmm estimation is the presence of weak ivs (Hausman
et al. (2005); Andrews and Stock (2007)). The results in Cheng and Liao (2012)
suggest that shrinkage estimation is robust in discarding invalid ivs, but tends to
include redundant ivs when identification is weak.
The rest of the paper is organized as follows: in section 2 we review the msc
approaches under comparison; in section 3 we present the details of our Monte Carlo
simulation setups; in section 4 the main results of our simulation exercises are
presented; section 5 concludes. Standard notation is used for the projection operator
$P_A = A(A'A)^{-1}A'$, where A is a matrix.
2 Theoretical Framework

2.1 Moment Selection Methods
Consider a sequence of random variables $\{Z_i\}_{i=1}^n$ drawn from an unknown probability
distribution. The moment selection problem consists of selecting the r valid moments
from a set of q candidates. A minimum set of $s \ge p$ valid moment conditions is
required in order to identify the structural parameter vector θ, where $p \equiv \dim(\theta)$.
The set of q candidate moments can be separated into two subsets, and the model is,
for $i = 1, \dots, n$,

$$E[g_S(Z_i, \theta_0)] = 0, \qquad S = \{1, \dots, s\}, \tag{1}$$

$$E[g_{S^c}(Z_i, \theta_0)] \stackrel{?}{=} 0, \qquad S^c = \{s+1, \dots, q\}, \tag{2}$$

where the sign $\stackrel{?}{=}$ means that the relationship may not hold for some of the indexes
in $S^c$. We see that $r = s + s_v$, where s represents the number of moments in (1)
(i.e. those deemed to be valid) and $s_v$ represents the number of valid moments in
the second set of $q - s$ total moments in (2). $S^c$ represents the moments that may
or may not be valid; thus $0 \le s_v \le q - s$. Our framework assumes that the researcher
knows a priori that s instruments are valid and that they can identify the p parameters. The
question is whether to include the rest of the instruments for efficiency considerations.
This is the framework used recently by Liao (2013). Note that θ0 represents the true
structural parameter vector of dimension p.
The standard gmm estimator of $\theta_0$, denoted by $\hat\theta_n$, is

$$\hat\theta_n \equiv \operatorname*{argmin}_{\theta \in \Theta} J(\theta, \bar W_n),$$

where $\bar W_n$ is a $q \times q$ symmetric and positive definite weight matrix and the objective
function (Hansen, 1982) is defined as

$$J(\theta, \bar W_n) \equiv n \cdot g_n(\theta)' \bar W_n g_n(\theta), \tag{3}$$

with $g_n(\theta) = n^{-1} \sum_{i=1}^n g(Z_i, \theta)$, and Θ is a compact subset of $\mathbb{R}^p$. For ease of notation
let $g(Z_i, \theta) = g_i(\theta)$.
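For concreteness, the objective translates into a few lines of NumPy. This is a minimal sketch under our own naming conventions; the moment function `g` and its (n, q) return shape are illustrative assumptions, not code from the paper:

```python
import numpy as np

def gmm_objective(theta, g, W):
    """J(theta, W) = n * g_n(theta)' W g_n(theta), where g_n(theta) is the
    sample mean of the moments. `g(theta)` must return an (n, q) array
    whose i-th row is g(Z_i, theta)."""
    gi = g(theta)              # (n, q) moment evaluations
    n = gi.shape[0]
    gbar = gi.mean(axis=0)     # g_n(theta), the sample moment vector
    return n * gbar @ W @ gbar
```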
Throughout the paper we consider the following linear model because of its
computational advantages, and conduct the comparative examination in this widely
used setup:
$$y = Y\theta_0 + \varepsilon \tag{4}$$

$$Y = Z\pi_0 + u \tag{5}$$

where y is an n × 1 vector, Y is an n × p matrix of endogenous variables, Z is an n × q
matrix of instruments, and ε and u are unobserved random errors with constant second
moments and correlated with each other. We do not deal with control variables in the
simulations; this makes no essential difference in a simulation setup. We make the following
diversion from standard gmm in (3). The set of instruments is divided into the valid
ones, $Z_{i1}$ (s × 1), and the set that we suspect may contain invalid instruments, $Z_{i2}$
((q − s) × 1).
The sample moment conditions are defined by

$$g_n(\theta, \beta) = \frac{1}{n}\sum_{i=1}^n g_i(\theta, \beta),$$

where $g_i(\theta, \beta) = (g_{i1}(\theta)', g_{i2}(\theta, \beta)')'$ with

$$g_{i1}(\theta) = Z_{i1}(y_i - Y_i\theta), \qquad g_{i2}(\theta, \beta) = Z_{i2}(y_i - Y_i\theta) - \beta.$$

The weight matrix for our nonstandard case is calculated as

$$W_n = \left[\frac{1}{n}\sum_{i=1}^n g_i(\tilde\theta, \tilde\beta)\, g_i(\tilde\theta, \tilde\beta)'\right]^{-1},$$

where $\tilde\theta, \tilde\beta$ are the first step gmm estimators with $I_q$ as the weight matrix. The first
method we discuss is the adaptive gmm shrinkage estimation method (Liao, 2013).
This method has the advantage of selecting the valid moments and estimating θ in
a single step. It consists of adding a slackness parameter vector $\beta_0$ to the moment
conditions in (2), so the model is

$$E\begin{bmatrix} g_{i1}(\theta_0) \\ g_{i2}(\theta_0, \beta_0) \end{bmatrix} = 0,$$

and the validity of the moment conditions is verified by inference on whether $\beta_0 = 0$
or not. A moment condition j is valid only if $\beta_{0j} = 0$, for $j = 1, \dots, q - s$.
The adaptive lasso estimators are defined as

$$(\hat\theta_n^{alasso}, \hat\beta_n^{alasso}) = \operatorname*{argmin}_{(\theta,\beta)\in\Theta\times B}\left[g_n(\theta,\beta)' W_n g_n(\theta,\beta) + \lambda_n \sum_{j=1}^{q-s}\hat\omega_j|\beta_j|\right], \tag{6}$$

where $\Theta \times B$ is the parameter space for $(\theta, \beta)$, $\hat\omega_j = 1/|\tilde\beta_j|$ are the adaptive
weights, and $\tilde\beta_j$ is the unpenalized standard gmm estimator using all q moments. The adaptive
lasso (alasso) estimator penalizes the slackness parameter by its $\ell_1$ norm. This
penalty is usually preferred because it has the oracle property ($\beta_{0j}$ is shrunk to zero
for the valid moments) and because the problem can be solved using the lars algorithm
(Efron et al., 2004), which represents a great computational advantage.

Liao (2013) also considered alternatives to the adaptive lasso, such as the bridge and
smoothly clipped absolute deviation penalties, but we focus only on the adaptive
lasso estimator because its penalty is convex and easy to estimate compared with the
others.
The degree of shrinkage is given by the tuning parameter λn ≥ 0: large values
shrink more, and λn = 0 corresponds to the gmm solution. λn is chosen to differentiate
between valid and invalid moments.
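A minimal sketch of the penalized problem in (6) for the linear model follows. It is illustrative only: it uses a generic derivative-free optimizer rather than the lars algorithm that makes the method fast in practice, and all names (`alasso_gmm`, `lam`, and so on) are ours:

```python
import numpy as np
from scipy.optimize import minimize

def alasso_gmm(y, Y, Z1, Z2, lam):
    """Adaptive lasso GMM of equation (6). Z1: known-valid instruments
    (n, s); Z2: suspect instruments (n, q - s). Returns (theta, beta);
    suspect moments j with beta[j] shrunk to (near) zero are selected
    as valid -- in practice one thresholds |beta_j| at a tolerance."""
    n, p = Y.shape
    qs = Z2.shape[1]

    def gbar(theta, beta):
        e = y - Y @ theta                  # structural residual
        g1 = Z1 * e[:, None]               # g_i1(theta)
        g2 = Z2 * e[:, None] - beta        # g_i2(theta, beta)
        return np.hstack([g1, g2]).mean(axis=0)

    # first step: unpenalized GMM with identity weight gives
    # (theta_tilde, beta_tilde), the weight matrix W_n, and the
    # adaptive weights 1 / |beta_tilde_j|
    obj0 = lambda par: n * gbar(par[:p], par[p:]) @ gbar(par[:p], par[p:])
    par0 = minimize(obj0, np.zeros(p + qs), method="Powell").x
    e0 = y - Y @ par0[:p]
    gi0 = np.hstack([Z1 * e0[:, None], Z2 * e0[:, None] - par0[p:]])
    Wn = np.linalg.inv(gi0.T @ gi0 / n)
    w = 1.0 / np.abs(par0[p:])             # adaptive weights

    def obj(par):
        gb = gbar(par[:p], par[p:])
        return gb @ Wn @ gb + lam * np.sum(w * np.abs(par[p:]))

    par = minimize(obj, par0, method="Powell").x
    return par[:p], par[p:]
```

Here `lam` plays the role of $\lambda_n$; a bic-type choice would be in line with the bic penalties used elsewhere in the paper.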
The second msc that we analyze is by Andrews (1999), extended in Andrews
and Lu (2001). It consists of a penalization of the J-statistic (Hansen, 1982) in equation
(3). Following the notation of Andrews (1999), let $c \in \mathbb{R}^{q-s}$ denote a moment selection vector
of zeros and ones such that if the jth moment condition is valid, the jth element of c
is one. Let $|c| = \sum_{j=1}^{q-s} c_j$ denote the number of moments selected by c, and let $Z_{ic}$ be the
vector $Z_i$ from which the jth element is deleted if the corresponding jth element in c
is zero. The corresponding weight matrix $\bar W_{nc}$ is of dimension $(s + |c|) \times (s + |c|)$. The
msc estimator objective function has the following general form:

$$\mathrm{msc}_n(c) = J_c(\theta, \bar W_{nc}) - h(|c|)\kappa_n, \tag{7}$$

where $J_c(\theta, \bar W_{nc}) = n\, g_n(\theta)' \bar W_{nc}\, g_n(\theta)$ uses the $s + |c|$ selected moments in the gmm objective function;
$g_n(\theta)$ is defined immediately below equation (3). In other words, in (7)
we have $\bar W_{nc} = \left[n^{-1}\sum_{i=1}^n Z_{ic} Z_{ic}' \bar\varepsilon_i^2\right]^{-1}$, where $\bar\varepsilon_i = y_i - Y_i\bar\theta$, and $\bar\theta$ is estimated through
inefficient gmm with the identity matrix as the weight matrix and with instruments
$Z_{ic}$. The algorithm works as follows. For each instrument combination,
we calculate first step inefficient gmm with the identity as the weight matrix; then,
given the inefficient gmm estimates, we set up the new weight matrix as described above and
get the parameter estimates for the second stage efficient gmm. We then form (7) for
each instrument combination and pick the instrument combination that minimizes
(7). The corresponding efficient gmm estimates are the ones that will be used.
To be specific, say we have two potentially valid instruments, Z1 and Z2. The possible
combinations are Z1 only, Z2 only, and Z1 and Z2 together. First, for Z1 only, we get
inefficient gmm estimates and use them to build the weight matrix for the second stage,
and then get efficient gmm estimates for Z1. We repeat the same analysis for Z2,
and then for Z1 and Z2 together. We now have three sets of efficient gmm estimates, and we
choose the one that minimizes (7). The choices of the function h(·) and $\{\kappa_n\}_{n\ge1}$ lead
to different msc. Andrews (1999) uses $h(|c|) = |c| - p$ and three different choices of
$\kappa_n$ that lead to three moment selection criteria (aic, bic, Hannan-Quinn):

gmmbic: $\mathrm{msc}_{bic,n}(c) = J_c(\theta, \bar W_{nc}) - (|c| - p)\ln n$
gmmaic: $\mathrm{msc}_{aic,n}(c) = J_c(\theta, \bar W_{nc}) - 2(|c| - p)$
gmmhqic: $\mathrm{msc}_{hqic,n}(c) = J_c(\theta, \bar W_{nc}) - 2.1(|c| - p)\ln\ln n$

where the value 2.1 in gmmhqic is chosen in light of the results in Andrews (1997).
For consistency among the methods, we will analyze the gmmbic method in this
paper; the bic based penalty gives selection consistency in both the adaptive lasso and
Andrews and Lu (2001). The results for the aic and hqic cases are available on
request.
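The search just described can be sketched as an exhaustive enumeration over subsets of the suspect instruments; `two_step_gmm` and all other names below are ours, and the enumeration is only practical for a small number of suspects. The penalty uses $s + |c| - p$, the degrees of freedom of the J-test, which differs from $(|c| - p)$ only by a constant that does not depend on c and therefore yields the same minimizer:

```python
import numpy as np
from itertools import combinations

def two_step_gmm(y, Y, Z):
    """Two-step linear GMM of y = Y theta + eps with instruments Z:
    identity weight in the first step, then the inverse of
    (1/n) sum_i Z_i Z_i' e_i^2 built from first-step residuals.
    Returns (theta, J) with J the Hansen statistic."""
    n = len(y)
    A, b = Z.T @ Y / n, Z.T @ y / n            # gbar(theta) = b - A theta
    theta = np.linalg.solve(A.T @ A, A.T @ b)  # first step, W = I
    e = y - Y @ theta
    W = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)  # efficient step
    gb = b - A @ theta
    return theta, n * gb @ W @ gb

def gmm_bic_select(y, Y, Z1, Z2):
    """Minimize msc(c) = J_c - (s + |c| - p) ln n over all subsets c of
    the suspect instruments Z2, as in (7) with the gmmbic penalty."""
    n, p, s = len(y), Y.shape[1], Z1.shape[1]
    best, best_c = np.inf, ()
    for k in range(Z2.shape[1] + 1):
        for c in combinations(range(Z2.shape[1]), k):
            Z = np.hstack([Z1, Z2[:, list(c)]]) if c else Z1
            _, J = two_step_gmm(y, Y, Z)
            msc = J - (s + k - p) * np.log(n)
            if msc < best:
                best, best_c = msc, c
    return best_c   # indices of Z2 columns selected as valid
```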
The third method is by Hong et al. (2003). Their method is analogous to Andrews
and Lu (2001), but the J function is estimated using generalized empirical likelihood
and exponential tilting statistics. However, we only use the cue based objective function,
as described in the introduction, due to the poor performance of empirical likelihood and
exponential tilting shown in Hong et al. (2003). The objective function is the same
as (7), but the weight matrix is updated continuously together with the parameters until
convergence. In this third method, the weight matrix is $W_{n,cue} = \left[n^{-1}\sum_{i=1}^n Z_{ic} Z_{ic}' \varepsilon_i^2\right]^{-1}$,
where $\varepsilon_i = y_i - Y_i\theta$.
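A sketch of this continuously updated objective for a fixed instrument set follows (names are ours; a careful implementation would also guard against a near-singular variance matrix):

```python
import numpy as np
from scipy.optimize import minimize

def cue_estimate(y, Y, Z):
    """Continuously updated GMM: the variance estimate in the weight
    matrix is re-evaluated at the same theta as the sample moments."""
    n = len(y)

    def j_cue(theta):
        e = y - Y @ theta
        gbar = Z.T @ e / n                       # sample moments at theta
        Omega = (Z * e[:, None] ** 2).T @ Z / n  # variance estimate at theta
        return n * gbar @ np.linalg.solve(Omega, gbar)

    # 2SLS-type starting value, then derivative-free minimization
    theta0 = np.linalg.lstsq(Z.T @ Y, Z.T @ y, rcond=None)[0]
    return minimize(j_cue, theta0, method="Nelder-Mead").x
```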
2.2 Parameter Estimation
There are three methods that we will examine in the second stage of parameter
estimation for θ. The first two are unpenalized gmm and unpenalized cue. Given
valid instruments, these two methods produce parameter estimates for the structural
parameters. An alternative approach for parameter estimation after the moment
selection has been done is the method proposed by Okui (2011): the shrinkage two
stage least squares (stsls) estimator. This method shrinks to zero some
of the weights on the sample moment conditions and requires a minimum set of
moments known to be valid before estimation. The stsls estimator is as follows:
for a shrinkage parameter m we define $P^m = P_{Z_I} + m P_{Z_{II}}$ and the stsls as

$$\hat\theta_n^{stsls} = (Y' P^m Y)^{-1} Y' P^m y,$$

where $Z_I$ represents the s valid moments from the first set, and $Z_{II}$ represents the
moments that may be selected as valid by an information criterion such as alasso or
gmmbic from the second set of $q - s$ moments/instruments. The shrinkage parameter
m is chosen to minimize a Nagar (1959)-type approximation of the mean squared
error. When there is only one endogenous variable, as in our simulation setup, the
estimate of the optimal shrinkage parameter is

$$\hat m^* = \frac{\hat Y' P_{Z_I} \hat Y / n}{\dfrac{\hat\sigma_{\varepsilon u}^2}{\hat\sigma_\varepsilon^2}\,\dfrac{r}{n} + \hat Y' P_{Z_I} \hat Y / n},$$

where $\hat\sigma_\varepsilon^2$ and $\hat\sigma_{\varepsilon u}$ are the estimates of $\sigma_\varepsilon^2 = E(\varepsilon_i^2)$ and $\sigma_{\varepsilon u} = E(\varepsilon_i u_i)$ obtained from
a preliminary estimation as described in Okui (2011), and $\hat Y$ is the prediction of Y
based on a least squares regression with the selected instruments. Okui (2011) works
only under homoskedasticity, and $\hat m^*$ is valid only under homoskedasticity.
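A minimal sketch of the stsls estimator with the plug-in $\hat m^*$ follows, assuming one endogenous regressor and homoskedastic errors. Two ingredients are our own reading rather than the paper's: we take r to be the number of instruments in $Z_{II}$, and we form $\hat Y$ from a pilot first stage on all selected instruments; all names are illustrative.

```python
import numpy as np

def stsls(y, Y, ZI, ZII):
    """Okui-style shrinkage 2SLS sketch for one endogenous regressor:
    theta = (Y' P^m Y)^{-1} Y' P^m y with P^m = P_ZI + m P_ZII and m set
    to the Nagar-type plug-in m* from the text (homoskedastic case)."""
    n = len(y)
    proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)  # projection P_A
    PI, PII = proj(ZI), proj(ZII)

    # pilot 2SLS with all selected instruments to estimate the variances
    Z = np.hstack([ZI, ZII])
    Yhat = proj(Z) @ Y                        # first-stage fitted values
    theta_p = (Yhat[:, 0] @ y) / (Yhat[:, 0] @ Y[:, 0])
    e = y - Y[:, 0] * theta_p                 # structural residuals
    u = Y[:, 0] - Yhat[:, 0]                  # first-stage residuals
    s2e = e @ e / n                           # sigma_eps^2 hat
    seu = e @ u / n                           # sigma_eps_u hat

    a = Yhat[:, 0] @ PI @ Yhat[:, 0] / n      # Yhat' P_ZI Yhat / n
    r = ZII.shape[1]                          # our reading of r (assumption)
    m = a / ((seu ** 2 / s2e) * (r / n) + a)  # plug-in m*

    Pm = PI + m * PII
    return (Y[:, 0] @ Pm @ y) / (Y[:, 0] @ Pm @ Y[:, 0])
```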
3 Monte Carlo Simulations
The purpose of the Monte Carlo simulations is to compare the previously described
msc approaches in two respects: first, their effectiveness in selecting the correct moment
conditions; and second, the performance of the post selection estimators. We use the data
generating process in equations (4) and (5). We have only one endogenous variable
and set the true $\theta_0 = 0.5$. We employ $(Z, \varepsilon, u) \sim N(0, \Sigma)$, where

$$\Sigma = \begin{pmatrix} \sigma_{zz}^2 I_q & \sigma_{Z\varepsilon} & 0_q \\ \sigma_{Z\varepsilon}' & \sigma_\varepsilon^2 & \sigma_{\varepsilon u} \\ 0_q' & \sigma_{\varepsilon u} & \sigma_u^2 \end{pmatrix}$$

is a $(q+2)\times(q+2)$ symmetric matrix, $\sigma_{zz}^2$ is the variance of the instruments, $I_q$
is an identity matrix of order q, $\sigma_{Z\varepsilon}$ is a $q\times1$ vector of correlations between the
instruments and the structural error, $0_q$ is a $q\times1$ vector of zeros, and $\sigma_{\varepsilon u}$, $\sigma_\varepsilon^2$ and $\sigma_u^2$ are
scalars. We impose a heteroskedastic error structure of the form

$$\varepsilon_i^* = \varepsilon_i\|Z_i\|, \qquad \text{with } \|Z_i\| = \sqrt{Z_{i1}^2 + \cdots + Z_{iq}^2}.$$

A moment is valid if $E[g(Z_i, \theta_0)] = E[Z_i'(y - Y\theta_0)] = E[Z_i'\varepsilon] = \sigma_{Z\varepsilon} = 0$. We
generate invalid moments by constructing $\sigma_{Z\varepsilon}$ vectors in two ways: (1) constant
correlation $D \neq 0$ between the instrument and the structural error, and (2) local-to-zero
correlation of the form $1/n$, $1/\sqrt{n}$ and $1/\sqrt[3]{n}$ to explore different convergence
rates. The homoskedastic case is when $\varepsilon_i^* = \varepsilon_i$.
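The constant-correlation design translates directly into a simulation routine; this sketch uses our own parameter names, with defaults matching Setup 1 where the text states them:

```python
import numpy as np

def simulate(n, q=11, s=3, sv=4, D=0.2, szz2=0.5, s2e=1.0, s2u=0.5,
             seu=0.5, theta0=0.5, pi0=None, hetero=False, seed=0):
    """Draw (Z, eps, u) ~ N(0, Sigma) with Sigma as in the text and
    build y, Y from (4)-(5). The last q - s - sv instruments are made
    invalid via covariance D with the structural error."""
    rng = np.random.default_rng(seed)
    sZe = np.zeros(q)
    sZe[s + sv:] = D                          # invalid moments
    Sigma = np.zeros((q + 2, q + 2))
    Sigma[:q, :q] = szz2 * np.eye(q)          # Var(Z)
    Sigma[:q, q] = Sigma[q, :q] = sZe         # Cov(Z, eps); Cov(Z, u) = 0
    Sigma[q, q], Sigma[q + 1, q + 1] = s2e, s2u
    Sigma[q, q + 1] = Sigma[q + 1, q] = seu   # Cov(eps, u)
    W = rng.multivariate_normal(np.zeros(q + 2), Sigma, size=n)
    Z, eps, u = W[:, :q], W[:, q], W[:, q + 1]
    if hetero:
        eps = eps * np.linalg.norm(Z, axis=1)  # eps*_i = eps_i ||Z_i||
    pi0 = np.full(q, 2.0) if pi0 is None else pi0  # strong identification
    Yv = Z @ pi0 + u                          # reduced form (5)
    y = theta0 * Yv + eps                     # structural equation (4)
    return y, Yv[:, None], Z
```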
In all setups we have q total moments, s of which are known a priori by the researcher
to be valid. However, there are a total of $r = s + s_v$ valid moments, so we
have to select the $s_v$ valid moments among the remaining $q - s$.
The number of valid and invalid moment conditions is generated in two ways: in
the first setup we simulate data from a fixed number of moments: q = 11, s = 3 and
r = 7. That is, there are 11 moments and we know that 3 of them are valid. We
have to select among the other 8. We set 4 of them valid and 4 of them invalid. The
errors for this setup are homoskedastic.
In the second setup we allow the number of valid moments to increase with the
sample size: $q = \sqrt{n}$, $s = \sqrt{q}$ and $s_v = (q - s)/2$; that is, we have to choose among
$q - s$ candidates and we set half of them as valid. The errors for this setup are
heteroskedastic. These are denoted as Setup 1 and Setup 2, respectively.
In our Setup 1, Σ is a 13 × 13 matrix constructed as follows: we simulate $Z \in \mathbb{R}^{11}$
divided into three categories: the first set of instruments is known to be strong
and valid (s = 3), as required by the mscs described in the previous section. As
mentioned before, the next set of instruments is divided into two categories: the
first four instruments are valid ($s_v = 4$) and the last $q - r = 4$ are invalid. The last
elements of Σ are $\sigma_{Z\varepsilon} = (0, 0, 0, 0, 0, 0, 0, D, D, D, D)$ in the constant correlation case
and $\sigma_{Z\varepsilon} = (0, 0, 0, 0, 0, 0, 0, \frac{h}{n}, \frac{h}{\sqrt{n}}, \frac{h}{\sqrt[3]{n}}, \frac{h}{n})$ in the local-to-zero scenario. Note that we
use three rates for the local-to-zero moments, which are recycled as needed. We set
$\sigma_\varepsilon^2 = 1$, $\sigma_u^2 = 0.5$, $D = 0.2$ and $h = 1$.
For each correlation structure we investigate weak and strong identification scenarios by changing $\pi_0$ in equation (5). In the strong identification scenario we have
$\pi_0 = 2 \cdot 1_{11}$ and in the weak identification case we have $\pi_0 = (2 \cdot 1_3, 0.2 \cdot 1_8)$, with
$1_\ell$ being a row vector of ones of length ℓ. The second setup is constructed in an
analogous manner.
We set the variance of the instruments to $\sigma_{zz}^2 \cdot I_q$ with $\sigma_{zz}^2 \in \{0.5, 1.0\}$ and the covariance
between the structural and reduced form errors to $\sigma_{\varepsilon u} = 0.5$. This gives us two cases:
Case 1 has $\sigma_{zz}^2 = 0.5 \cdot I_q$ and $\sigma_{\varepsilon u} = 0.5$, and Case 2 has $\sigma_{zz}^2 = 1.0 \cdot I_q$ and
$\sigma_{\varepsilon u} = 0.5$.
We have estimated many other cases for the covariance matrix: Case 3 has
$\sigma_{zz}^2 = I_q$ and $\sigma_{\varepsilon u} = 0.5$; Case 4 has $\sigma_{zz}^2 = I_q$ and $\sigma_{\varepsilon u} = 0.9$; Case 5 has
$\sigma_{zz}^2 = 2 \cdot I_q$ and $\sigma_{\varepsilon u} = 0.5$; and Case 6 has $\sigma_{zz}^2 = 2 \cdot I_q$ and $\sigma_{\varepsilon u} = 0.9$. These
cases and the local-to-zero ones are available on request.
The simulated sample sizes are n = {50, 100, 250}. All the results in the next
section are based on 1000 repetitions.
4 Results
We focus only on the most relevant and salient results of our simulation exercises:
Cases 1 and 2 for Setups 1 and 2 using invalid moments with constant correlation
with the structural equation. We do not present all our simulated scenarios, for
economy of space and because the general results presented here hold across all the
alternative setups.¹ We focus on the weak and strong identification cases with
$\sigma_{zz}^2 = 0.5 \cdot I_q$ and $\sigma_{zz}^2 = 1 \cdot I_q$. The analysis of the results is done with reference to two
questions: how good are the msc selection procedures, and which technique gives the
best estimation of the structural parameter $\theta_0$. The $R^2$ of the first stage regression is
presented in Table 7; it ranges from 0.533 to 0.944 depending on the strength of the
identification and the number of observations.
The moment selection methods are the adaptive lasso (alasso), penalized gmm
(gmmpen) and penalized cue (cuepen). We have nine post selection structural
parameter estimators: alasso–ma is the estimator obtained by selecting the moments
using the adaptive lasso method in the first stage and then using Okui's (2011)
moment averaging estimator in the second stage. Setup 1 is the homoskedastic setup,
and we use the optimal $m^*$ in Okui's (2011) ma method, which works only under
homoskedasticity. Setup 2 allows heteroskedasticity, for which Okui's (2011) method is not
designed, and the $m^*$ in Section 2.2 is only designed for homoskedasticity; we nevertheless
use $m^*$ in the Setup 2 simulations to see how this method fares under heteroskedasticity.

¹We have extensive results for all the moment selection techniques discussed in section 2, for
fixed correlation and local-to-zero correlation between the instruments and the structural error,
available on request.
alasso–gmm and alasso–cue are the estimates using adaptive lasso to select
the valid moments in the first stage and then using them in unpenalized efficient
gmm and cue, respectively. For the efficient gmm we have three estimators: gmm
is the gmm estimator using the full set of moments; gmmpen–ma uses the penalized
gmm estimator in Andrews and Lu (2001) for model selection and then uses Okui's
moment averaging estimator in the second stage; gmmpen–gmm selects the moments
in the same way but then the structural parameter is estimated using efficient gmm.
In the same way, cue denotes the cue estimator using the full set of moments,
cuepen–ma is the estimator obtained by selecting the moments using the penalized cue
criterion and using these moments in the moment averaging estimator, and
cuepen–cue selects the moments using penalized cue and estimates $\theta_0$ using the cue estimator.
A summary of our results is presented in Tables 1 and 2 for model selection and
post selection estimation performance respectively.
In Table 1 we present the average ranking of each method on the probability of
selecting the exact valid moments, reported in Tables 3 and 4 for each sample size and
strength of identification. In case of a tie the methods get the same ranking (so we can
have two first or two second places). From these tables we can see that the adaptive
lasso method is the best in perfect moment selection.

In Table 2 we present the performance of the post selection estimation methods.
The performance is assessed by the rmse. The rankings are based on the relative
performance in Tables 5a to 6b, presented by sample size and strength of identification.
The estimator with the smallest value acquires the rank of 1. Two estimators with
the same rmse are given the same rank. The Average Ranking ranges from 1 to 9
and the frequency of being in the Top Three from 0 to 12.
Table 1: Summary of the Performance of the Moment Selection Techniques

          Setup 1   Setup 2
alasso      1.25      1.33
gmmpen      2.58      2.67
cuepen      1.83      1.92

Figures correspond to the average ranking of each method based on the probability of selecting the
exact valid moments. The latter are in Tables 3 and 4, by sample size and strength of identification.
In case of a tie the methods get the same ranking (we can have two first or two second places). alasso,
gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue, respectively.
From Table 2 we conclude that the best estimator is obtained by using the adaptive
lasso method to select the moments, followed by the moment averaging procedure
(alasso–ma). The moment averaging procedure improves estimation for all three
moment selection techniques. The worst estimators are obtained with the cue method.
In the heteroskedastic setup (Setup 2) alasso–ma (moment averaging in the
second stage) is still the best in terms of rmse, but not as good as in the
homoskedastic case (Setup 1). In the next sections we present the detailed analysis
of the moment selection and post selection estimation methods.
4.1 Model Selection
We analyze three msc methods: the adaptive lasso in equation (6), the penalized
efficient gmm and the penalized continuously updated gmm in equation (7). In all
cases we adopt the bic criterion.

For each method we measure its performance by the probability of three events:
(1) the method selects the true number of valid moments and none of the invalid ones
(perfect selection); (2) it selects only valid moments, but strictly fewer or more than the
true number of valid ones, and does not select any invalid moment at the same time; and (3)
it selects any invalid moments.
Table 2: Summary of the Performance of the Post Selection Techniques

                       Setup 1                  Setup 2
               Average    Times at      Average    Times at
               Ranking    top three     Ranking    top three
alasso–ma        1.17        12           1.75        11
alasso–gmm       2.00        12           2.25        11
alasso–cue       4.83         4           4.08         5
gmm              4.33         3           5.67         1
gmmpen–ma        1.67        12           2.50         9
gmmpen–gmm       3.00         8           3.00         7
cue              6.33         0           7.08         0
cuepen–ma        1.92        11           2.83         7
cuepen–cue       4.25         5           5.08         2

The performance is analyzed in terms of the rmse. The rankings are based on the relative performance
in Tables 5a to 6b. The estimator with the smallest value takes the rank of 1. If there is a tie the
estimators are given the same rank. The average ranking ranges from 1 to 9 and the times at the
top three from 0 to 12. alasso–ma is the estimator obtained by selecting the moments using the
adaptive lasso method in the first stage and then using Okui's moment averaging estimator in the
second stage. alasso–gmm and alasso–cue are the estimates using adaptive lasso to select the
valid moments in the first stage and then using them in unpenalized efficient gmm and cue,
respectively. For the efficient gmm we have three estimators: gmm is the gmm estimator using the
full set of moments; gmmpen–ma uses the penalized gmm estimator in Andrews and Lu (2001) for
model selection and then uses Okui's moment averaging estimator in the second stage; gmmpen–gmm
selects the moments in the same way but estimates the structural parameter using efficient gmm.
In the same way, cue denotes the cue estimator using the full set of moments, cuepen–ma is the
estimator obtained by selecting the moments using the penalized cue criterion and using these
moments in the moment averaging estimator, and cuepen–cue selects the moments using penalized
cue and estimates $\theta_0$ using the cue estimator.
The first probability is the probability of being "perfect". The second event
is second best: we do not choose the correct number of valid moments, but
we still choose only valid ones, and one can benefit from this in the second stage of
structural parameter estimation in gmm. The third probability shows how badly
the moment selection criteria can behave: since invalid moments can badly affect the finite
sample bias of the second stage structural parameter estimate, we would prefer
this probability to be low.
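Classifying a single Monte Carlo draw into these three events is mechanical; a short sketch follows (names ours). Averaging the indicator of each outcome over the 1000 repetitions yields the probabilities reported in Tables 3 and 4.

```python
def classify_selection(selected, valid):
    """Classify a selected set of suspect-moment indices into the three
    events reported in Tables 3 and 4."""
    sel, val = set(selected), set(valid)
    if sel == val:
        return "perfect"
    if sel <= val:           # only valid moments, but not exactly all
        return "only_valid"
    return "any_invalid"     # at least one invalid moment selected
```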
The results are presented in Tables 3 and 4 for, respectively, the first setup (fixed
number of moments, homoskedastic case) and the second setup (increasing number
of moments, heteroskedastic case), for the weak and strong identification cases and
instrument variances $\sigma_{zz}^2 = 0.5 \cdot I_q$ and $\sigma_{zz}^2 = I_q$.
In Table 3, Setup 1, for $\sigma_{zz}^2 = 0.5 \cdot I_q$, we find that in the smallest sample case,
n = 50, all three msc approaches behave poorly, particularly the gmmpen, selecting
invalid moments with a probability of 0.999 in the weak identification case and 1 in
the strong identification case. The best method is the alasso, which selects invalid
moments with a probability of 0.676 and 0.733 in the weak and strong identification
cases, respectively. The performance of the three methods improves when the sample
size increases to n = 100, but their relative positions remain the same: alasso
dominates the penalized methods. In this case the alasso selects invalid moments
with probability 0.383 and 0.517 in the weak and strong identification cases, whereas
the penalized methods do so with probabilities above 0.928. The performance ranking
changes when the sample size is increased to n = 250: the alasso still selects invalid
moments with the smallest probability, but the penalized cue method selects perfectly
with probability 0.737 and 0.678 in the weak and strong identification cases, compared with
0.355 and 0.278 for the alasso. However, if the objective is to avoid selecting any
invalid moments, then the alasso still dominates, selecting any invalid moment with
probability 0.043.
Since the alasso and the penalized methods are selection consistent, we can take
this as evidence of differences in convergence rates, with the penalized methods
converging faster in this case; the differences in the performance of the two penalized
methods are negligible. This is not true for the next case.
In Table 3, Setup 1, when $\sigma_{zz}^2 = 1 \cdot I_q$, the relative performance of the methods is
the same as in the previous case, but with alasso dominating in all cases and
criteria. However, the penalized methods catch up with the performance of alasso
as the sample size increases, with cuepen slightly dominating its counterpart
gmmpen in all cases. Also, in all cases the methods behave poorly compared
to the setup with $\sigma_{zz}^2 = 0.5 \cdot I_q$.
The conclusions for Setup 2 are the same as those for Setup 1.
It is noteworthy that the alasso moves smoothly between the three performance
measures (perfect, only valid and any invalid selection), whereas the penalized methods jump
from selecting invalid instruments to perfect selection as the sample size increases,
with undesirably small probabilities of selecting only valid moments (but not perfectly)
under all our setups.
4.2 Post Selection Performance
In this section we analyze the post selection performance of the msc in terms of the
bias, standard deviation and rmse of the estimator $\hat\theta$. For each of the three methods
we estimate the structural parameter using efficient gmm, continuously updated gmm,
and the moment averaging method in Okui (2011). The results for Setups 1 and 2 are
presented in Tables 5a and 5b and in Tables 6a and 6b, respectively.
Table 3: Probabilities: Moment Selection Criteria. Setup 1

                      Weak Identification              Strong Identification
                  Perfect     Only      Any        Perfect     Only      Any
                 selection    Valid    Invalid    selection    Valid    Invalid
$\sigma_{zz}^2 = 0.5 \cdot I_q$
n = 50
  alasso           0.028      0.296     0.676       0.025      0.242     0.733
  gmmpen           0.001      0.000     0.999       0.000      0.000     1.000
  cuepen           0.003      0.000     0.997       0.003      0.000     0.997
n = 100
  alasso           0.136      0.481     0.383       0.063      0.420     0.517
  gmmpen           0.056      0.004     0.940       0.045      0.003     0.952
  cuepen           0.066      0.006     0.928       0.052      0.005     0.943
n = 250
  alasso           0.355      0.602     0.043       0.278      0.613     0.109
  gmmpen           0.737      0.052     0.211       0.673      0.047     0.280
  cuepen           0.737      0.052     0.211       0.678      0.047     0.275
$\sigma_{zz}^2 = 1 \cdot I_q$
n = 50
  alasso           0.008      0.192     0.800       0.010      0.152     0.838
  gmmpen           0.000      0.000     1.000       0.000      0.000     1.000
  cuepen           0.000      0.000     1.000       0.000      0.000     1.000
n = 100
  alasso           0.037      0.315     0.648       0.013      0.264     0.723
  gmmpen           0.003      0.002     0.995       0.001      0.001     0.998
  cuepen           0.006      0.002     0.992       0.002      0.000     0.998
n = 250
  alasso           0.199      0.543     0.258       0.127      0.508     0.365
  gmmpen           0.155      0.004     0.841       0.118      0.007     0.875
  cuepen           0.158      0.004     0.838       0.121      0.008     0.871

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue,
respectively. Presented numbers are the probabilities of (i) Perfect selection: the method selects
exactly the valid moments; (ii) Only Valid: the method does not choose any invalid moment; and (iii)
Any Invalid: the method selects at least one invalid moment. Setup 1 consists of a fixed number of moments
and homoskedastic errors. There are 11 moments, 3 known to be valid and 8 unknown; among
these, 4 are valid and 4 are invalid. $\pi = (2 \cdot 1_3, 0.2 \cdot 1_8)$ in the weak identification case and $\pi = 2 \cdot 1_{11}$
in the strong identification case.
Table 4: Probabilities: Moment Selection Criteria. Setup 2

                      Weak Identification              Strong Identification
                  Perfect     Only      Any        Perfect     Only      Any
                 selection    Valid    Invalid    selection    Valid    Invalid
$\sigma_{zz}^2 = 0.5 \cdot I_q$
n = 50
  alasso           0.085      0.301     0.614       0.058      0.247     0.695
  gmmpen           0.011      0.000     0.989       0.007      0.000     0.993
  cuepen           0.017      0.000     0.983       0.012      0.000     0.988
n = 100
  alasso           0.196      0.397     0.407       0.136      0.367     0.497
  gmmpen           0.098      0.005     0.897       0.078      0.004     0.918
  cuepen           0.104      0.005     0.891       0.094      0.006     0.900
n = 250
  alasso           0.266      0.662     0.072       0.192      0.639     0.169
  gmmpen           0.518      0.045     0.437       0.493      0.039     0.468
  cuepen           0.522      0.051     0.427       0.510      0.050     0.440
$\sigma_{zz}^2 = 1 \cdot I_q$
n = 50
  alasso           0.044      0.198     0.758       0.032      0.182     0.786
  gmmpen           0.003      0.000     0.997       0.001      0.000     0.999
  cuepen           0.003      0.000     0.997       0.003      0.000     0.997
n = 100
  alasso           0.080      0.264     0.656       0.055      0.221     0.724
  gmmpen           0.014      0.000     0.986       0.009      0.000     0.991
  cuepen           0.018      0.000     0.982       0.008      0.000     0.992
n = 250
  alasso           0.109      0.515     0.376       0.068      0.433     0.499
  gmmpen           0.038      0.005     0.957       0.031      0.000     0.969
  cuepen           0.041      0.005     0.954       0.038      0.001     0.961

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue,
respectively. Presented numbers are the probabilities of (i) Perfect selection: the method selects
exactly the valid moments; (ii) Only Valid: the method does not choose any invalid moment; and (iii)
Any Invalid: the method selects at least one invalid moment. Setup 2 consists of an increasing number of
moments and heteroskedastic errors. There are $q = \sqrt{n}$ moments, $\sqrt{q}$ known to be valid and $q - s$
unknown. Of these, $(q-s)/2$ are valid and $(q-s)/2$ are invalid. $\pi = (2 \cdot 1_s, 0.2 \cdot 1_{q-s})$ in the
weak identification case and $\pi = 2 \cdot 1_q$ in the strong identification case.
In Table 6a, Setup 2, with weak identification and $\sigma_{zz}^2 = 0.5 \cdot I_q$, we find that with
sample size n = 50 the best estimator is obtained by using alasso–ma, with an rmse of
0.143. The worst method is the cue, with an rmse of 0.297. Note that when using the
full set of instruments the gmm estimator performs better than the cue in terms of
rmse, but cue has a smaller bias in the weak identification case. With the sample
size n = 50 the adaptive lasso based methods are also the best in rmse in the strong
identification case. As the sample size increases, all the estimators converge to the
true value. With $\sigma_{zz}^2 = 1 \cdot I_q$ the relative performance remains the same, but the
rmse and the standard deviations are smaller.
In terms of coverage, alasso–gmm performs the best among the specifications considered:
in all setups and specifications, alasso–gmm comes close to 95% coverage,
whereas the other methods cannot replicate this behavior. In terms of bias, in the more
relevant Setup 2 (Tables 6a-6b), we see that the adaptive lasso based methods do very
well, but so does penalized cue in the first stage followed by cue in the second stage.
5 Conclusion
We have studied the relative performance of several moment selection techniques
in selecting the correct moments and in estimating the structural parameter. Our
simulations suggest that using adaptive lasso in the first stage to obtain valid
instruments, followed by gmm or moment averaging, delivers the most satisfactory
rmse for the structural parameter in both the homoskedastic and heteroskedastic
cases. This approach also has important computational benefits due to the possibility
of estimation based on the lars algorithm, which makes it a good practical choice
when the number of instruments grows large.
Table 5a: Monte Carlo results for θ̂. Setup 1. (PART 1)

$\sigma_{zz}^2 = 0.5 \cdot I_q$

                        Weak Identification                   Strong Identification
              Mean     Sd     Bias    rmse   95%c     Mean     Sd     Bias    rmse   95%c
n = 50
alasso–ma    0.508   0.066   0.008   0.066  0.966    0.541   0.057   0.041   0.070  0.686
alasso–gmm   0.515   0.071   0.015   0.073  0.917    0.541   0.058   0.041   0.071  0.894
alasso–cue   0.497   0.142  -0.003   0.142  0.763    0.532   0.152   0.032   0.155  0.559
gmm          0.541   0.069   0.041   0.080  0.842    0.576   0.035   0.076   0.084  0.310
gmmpen–ma    0.516   0.064   0.016   0.066  0.963    0.565   0.039   0.065   0.076  0.509
gmmpen–gmm   0.528   0.070   0.028   0.076  0.811    0.564   0.044   0.064   0.078  0.374
cue          0.486   0.264  -0.014   0.264  0.591    0.555   0.246   0.055   0.252  0.391
cuepen–ma    0.518   0.066   0.018   0.068  0.964    0.567   0.037   0.067   0.077  0.494
cuepen–cue   0.492   0.182  -0.008   0.182  0.901    0.554   0.108   0.054   0.121  0.830
n = 100
alasso–ma    0.503   0.048   0.003   0.048  0.945    0.528   0.048   0.028   0.055  0.663
alasso–gmm   0.506   0.049   0.006   0.050  0.927    0.528   0.048   0.028   0.056  0.929
alasso–cue   0.497   0.077  -0.003   0.077  0.855    0.524   0.085   0.024   0.089  0.644
gmm          0.533   0.051   0.033   0.061  0.937    0.575   0.024   0.075   0.079  0.230
gmmpen–ma    0.506   0.047   0.006   0.048  0.943    0.549   0.038   0.049   0.063  0.452
gmmpen–gmm   0.513   0.050   0.013   0.052  0.891    0.549   0.040   0.049   0.063  0.455
cue          0.487   0.172  -0.013   0.173  0.791    0.560   0.143   0.060   0.155  0.462
cuepen–ma    0.506   0.047   0.006   0.048  0.944    0.551   0.037   0.051   0.063  0.433
cuepen–cue   0.496   0.071  -0.004   0.071  0.920    0.540   0.048   0.040   0.063  0.872
n = 250
alasso–ma    0.501   0.028   0.001   0.028  0.950    0.508   0.031   0.008   0.032  0.857
alasso–gmm   0.502   0.028   0.002   0.028  0.945    0.508   0.031   0.008   0.032  0.943
alasso–cue   0.500   0.029   0.000   0.029  0.932    0.506   0.032   0.006   0.033  0.842
gmm          0.529   0.032   0.029   0.044  0.993    0.573   0.015   0.073   0.074  0.122
gmmpen–ma    0.501   0.028   0.001   0.028  0.951    0.507   0.024   0.007   0.025  0.878
gmmpen–gmm   0.503   0.028   0.003   0.029  0.899    0.507   0.024   0.007   0.025  0.828
cue          0.496   0.076  -0.004   0.077  0.931    0.568   0.032   0.068   0.075  0.861
cuepen–ma    0.501   0.028   0.001   0.028  0.951    0.507   0.024   0.007   0.025  0.878
cuepen–cue   0.500   0.029   0.000   0.029  0.941    0.504   0.025   0.004   0.025  0.909

Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3
known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. $\pi = (2 \cdot 1_3, 0.2 \cdot 1_8)$
in the weak identification case and $\pi = 2 \cdot 1_{11}$ in the strong identification case. alasso–ma is the
estimator obtained by selecting the moments using the adaptive lasso method in the first stage and
then using Okui's moment averaging estimator in the second stage. alasso–gmm and alasso–cue
are the estimates using adaptive lasso to select the valid moments in the first stage and then using
them in unpenalized efficient gmm and cue, respectively. For the efficient gmm we have three
estimators: gmm is the gmm estimator using the full set of moments; gmmpen–ma uses the penalized
gmm estimator in Andrews and Lu (2001) for model selection and then uses Okui's moment averaging
estimator in the second stage; gmmpen–gmm selects the moments in the same way but then the
structural parameter is estimated using efficient gmm. In the same way, cue denotes the cue estimator
using the full set of moments, cuepen–ma is the estimator obtained by selecting the moments using
the penalized cue criterion and using these moments in the moment averaging estimator, and
cuepen–cue selects the moments using penalized cue and estimates $\theta_0$ using the cue estimator.
95%c is the coverage of the empirical 95% confidence intervals.
Table 5b: Monte Carlo results for θ̂. Setup 1. (PART 2)

$\sigma_{zz}^2 = 1 \cdot I_q$

                        Weak Identification                   Strong Identification
              Mean     Sd     Bias    rmse   95%c     Mean     Sd     Bias    rmse   95%c
n = 50
alasso–ma    0.505   0.046   0.005   0.047  0.971    0.523   0.037   0.023   0.043  0.808
alasso–gmm   0.509   0.049   0.009   0.050  0.924    0.523   0.038   0.023   0.044  0.902
alasso–cue   0.497   0.121  -0.003   0.121  0.779    0.516   0.149   0.016   0.149  0.603
gmm          0.521   0.049   0.021   0.053  0.850    0.538   0.025   0.038   0.046  0.528
gmmpen–ma    0.510   0.045   0.010   0.046  0.973    0.534   0.026   0.034   0.043  0.727
gmmpen–gmm   0.515   0.049   0.015   0.051  0.810    0.534   0.029   0.034   0.044  0.522
cue          0.500   0.202   0.000   0.202  0.683    0.524   0.219   0.024   0.220  0.513
cuepen–ma    0.510   0.045   0.010   0.047  0.973    0.535   0.025   0.035   0.043  0.731
cuepen–cue   0.499   0.106  -0.001   0.106  0.911    0.528   0.066   0.028   0.072  0.821
n = 100
alasso–ma    0.503   0.033   0.003   0.033  0.948    0.517   0.031   0.017   0.036  0.707
alasso–gmm   0.504   0.035   0.004   0.035  0.921    0.517   0.032   0.017   0.036  0.927
alasso–cue   0.500   0.052   0.000   0.052  0.867    0.515   0.062   0.015   0.064  0.636
gmm          0.517   0.035   0.017   0.039  0.928    0.538   0.018   0.038   0.041  0.425
gmmpen–ma    0.505   0.033   0.005   0.034  0.951    0.532   0.021   0.032   0.038  0.530
gmmpen–gmm   0.510   0.035   0.010   0.036  0.885    0.532   0.023   0.032   0.039  0.472
cue          0.508   0.098   0.008   0.099  0.819    0.532   0.113   0.032   0.117  0.468
cuepen–ma    0.506   0.033   0.006   0.034  0.951    0.532   0.021   0.032   0.038  0.526
cuepen–cue   0.503   0.042   0.003   0.042  0.919    0.529   0.027   0.029   0.040  0.868
n = 250
alasso–ma    0.501   0.020   0.001   0.020  0.949    0.508   0.022   0.008   0.023  0.760
alasso–gmm   0.502   0.020   0.002   0.020  0.946    0.508   0.022   0.008   0.024  0.944
alasso–cue   0.501   0.021   0.001   0.021  0.928    0.507   0.022   0.007   0.024  0.756
gmm          0.515   0.022   0.015   0.026  0.988    0.536   0.011   0.036   0.038  0.325
gmmpen–ma    0.502   0.020   0.002   0.020  0.951    0.519   0.020   0.019   0.028  0.514
gmmpen–gmm   0.505   0.021   0.005   0.021  0.944    0.519   0.021   0.019   0.028  0.575
cue          0.507   0.029   0.007   0.030  0.921    0.535   0.013   0.035   0.037  0.520
cuepen–ma    0.502   0.020   0.002   0.020  0.950    0.520   0.020   0.020   0.028  0.512
cuepen–cue   0.502   0.021   0.002   0.021  0.943    0.518   0.022   0.018   0.028  0.896

Setup 1 consists of a fixed number of moments and homoskedastic errors. There are 11 moments, 3
known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid. $\pi = (2 \cdot 1_3, 0.2 \cdot 1_8)$
in the weak identification case and $\pi = 2 \cdot 1_{11}$ in the strong identification case. The estimators are
defined as in the note to Table 5a. 95%c is the coverage of the empirical 95% confidence intervals.
Table 6a: Monte Carlo results for θ̂. Setup 2. (PART 1)

$\sigma_{zz}^2 = 0.5 \cdot I_q$

                        Weak Identification                   Strong Identification
              Mean     Sd     Bias    rmse   95%c     Mean     Sd     Bias    rmse   95%c
n = 50
alasso–ma    0.516   0.142   0.016   0.143  0.944    0.576   0.146   0.076   0.164  0.726
alasso–gmm   0.518   0.145   0.018   0.146  0.920    0.571   0.145   0.071   0.161  0.916
alasso–cue   0.497   0.174  -0.003   0.174  0.853    0.552   0.165   0.052   0.173  0.699
gmm          0.563   0.146   0.063   0.159  0.916    0.665   0.084   0.165   0.185  0.468
gmmpen–ma    0.529   0.141   0.029   0.143  0.942    0.639   0.108   0.139   0.176  0.568
gmmpen–gmm   0.541   0.145   0.041   0.151  0.868    0.632   0.112   0.132   0.173  0.494
cue          0.491   0.297  -0.009   0.297  0.730    0.631   0.181   0.131   0.223  0.517
cuepen–ma    0.531   0.140   0.031   0.144  0.943    0.642   0.105   0.142   0.176  0.562
cuepen–cue   0.497   0.234  -0.003   0.234  0.907    0.602   0.160   0.102   0.189  0.880
n = 100
alasso–ma    0.509   0.097   0.009   0.097  0.944    0.550   0.098   0.050   0.110  0.738
alasso–gmm   0.511   0.100   0.011   0.101  0.928    0.548   0.098   0.048   0.109  0.918
alasso–cue   0.498   0.112  -0.002   0.112  0.879    0.536   0.105   0.036   0.111  0.760
gmm          0.549   0.104   0.049   0.115  0.953    0.642   0.063   0.142   0.155  0.467
gmmpen–ma    0.514   0.097   0.014   0.098  0.941    0.587   0.087   0.087   0.123  0.595
gmmpen–gmm   0.521   0.102   0.021   0.104  0.896    0.582   0.089   0.082   0.121  0.618
cue          0.505   0.188   0.005   0.188  0.824    0.622   0.115   0.122   0.168  0.620
cuepen–ma    0.514   0.097   0.014   0.098  0.941    0.590   0.085   0.090   0.124  0.588
cuepen–cue   0.499   0.128  -0.001   0.128  0.916    0.565   0.105   0.065   0.123  0.886
n = 250
alasso–ma    0.502   0.079   0.002   0.079  0.924    0.538   0.091   0.038   0.099  0.770
alasso–gmm   0.504   0.079   0.004   0.080  0.934    0.538   0.090   0.038   0.098  0.927
alasso–cue   0.496   0.087  -0.004   0.087  0.913    0.529   0.092   0.029   0.096  0.773
gmm          0.589   0.084   0.089   0.122  0.979    0.717   0.039   0.217   0.220  0.012
gmmpen–ma    0.504   0.077   0.004   0.078  0.928    0.534   0.069   0.034   0.077  0.778
gmmpen–gmm   0.508   0.078   0.008   0.078  0.903    0.532   0.069   0.032   0.076  0.779
cue          0.475   0.267  -0.025   0.268  0.905    0.686   0.095   0.186   0.209  0.789
cuepen–ma    0.504   0.078   0.004   0.078  0.927    0.536   0.070   0.036   0.079  0.758
cuepen–cue   0.494   0.083  -0.006   0.083  0.932    0.516   0.071   0.016   0.073  0.903

Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are $q = \sqrt{n}$
moments, $\sqrt{q}$ known to be valid and $q - s$ unknown. Of these, $(q-s)/2$ are valid and $(q-s)/2$ are
invalid. $\pi = (2 \cdot 1_s, 0.2 \cdot 1_{q-s})$ in the weak identification case and $\pi = 2 \cdot 1_q$ in the strong identification
case. alasso–ma is the estimator obtained by selecting the moments using the adaptive lasso
method and then using Okui's moment averaging estimator; alasso–gmm and alasso–cue are the
estimates using adaptive lasso to select the valid moments and then using them in unpenalized
efficient gmm and cue, respectively; gmm is the gmm estimator using the full set of moments;
gmmpen–ma uses the penalized gmm for model selection and then the moment averaging estimator;
gmmpen–gmm selects the moments in the same way and estimates the structural parameter using
efficient gmm; cue denotes the cue estimator using the full set of moments; cuepen–ma selects
the moments using the penalized cue criterion and uses these moments in the moment averaging
estimator; cuepen–cue selects the moments using penalized cue and estimates $\theta_0$ using the cue
estimator. 95%c is the coverage of the empirical 95% confidence intervals.
Table 6b: Monte Carlo results for θ̂. Setup 2. (PART 2)

$\sigma_{zz}^2 = 1 \cdot I_q$

                        Weak Identification                   Strong Identification
              Mean     Sd     Bias    rmse   95%c     Mean     Sd     Bias    rmse   95%c
n = 50
alasso–ma    0.516   0.142   0.016   0.142  0.950    0.558   0.133   0.058   0.145  0.797
alasso–gmm   0.516   0.145   0.016   0.146  0.907    0.554   0.135   0.054   0.145  0.906
alasso–cue   0.500   0.168   0.000   0.168  0.854    0.540   0.150   0.040   0.155  0.728
gmm          0.546   0.144   0.046   0.152  0.897    0.620   0.087   0.120   0.148  0.618
gmmpen–ma    0.528   0.139   0.028   0.142  0.951    0.606   0.100   0.106   0.146  0.719
gmmpen–gmm   0.534   0.144   0.034   0.148  0.861    0.602   0.104   0.102   0.145  0.598
cue          0.512   0.239   0.012   0.239  0.757    0.601   0.146   0.101   0.178  0.613
cuepen–ma    0.530   0.139   0.030   0.142  0.951    0.606   0.099   0.106   0.145  0.717
cuepen–cue   0.513   0.206   0.013   0.207  0.899    0.584   0.136   0.084   0.160  0.885
n = 100
alasso–ma    0.509   0.096   0.009   0.096  0.938    0.545   0.090   0.045   0.101  0.775
alasso–gmm   0.511   0.099   0.011   0.100  0.923    0.543   0.091   0.043   0.101  0.917
alasso–cue   0.502   0.110   0.002   0.110  0.882    0.537   0.097   0.037   0.104  0.741
gmm          0.535   0.101   0.035   0.107  0.934    0.601   0.063   0.101   0.119  0.636
gmmpen–ma    0.516   0.095   0.016   0.097  0.940    0.580   0.074   0.080   0.109  0.652
gmmpen–gmm   0.522   0.100   0.022   0.103  0.891    0.577   0.077   0.077   0.109  0.647
cue          0.516   0.143   0.016   0.144  0.846    0.591   0.087   0.091   0.126  0.610
cuepen–ma    0.517   0.096   0.017   0.097  0.938    0.582   0.074   0.082   0.110  0.643
cuepen–cue   0.507   0.119   0.007   0.119  0.918    0.569   0.091   0.069   0.114  0.890
n = 250
alasso–ma    0.506   0.079   0.006   0.079  0.921    0.549   0.084   0.049   0.097  0.635
alasso–gmm   0.507   0.079   0.007   0.080  0.930    0.548   0.084   0.048   0.097  0.923
alasso–cue   0.500   0.082   0.000   0.082  0.905    0.542   0.085   0.042   0.094  0.640
gmm          0.563   0.081   0.063   0.103  0.974    0.654   0.040   0.154   0.159  0.110
gmmpen–ma    0.518   0.077   0.018   0.079  0.921    0.603   0.070   0.103   0.125  0.322
gmmpen–gmm   0.521   0.080   0.021   0.082  0.934    0.599   0.073   0.099   0.123  0.401
cue          0.526   0.135   0.026   0.137  0.882    0.645   0.055   0.145   0.155  0.377
cuepen–ma    0.518   0.078   0.018   0.080  0.917    0.605   0.069   0.105   0.125  0.307
cuepen–cue   0.505   0.088   0.005   0.088  0.931    0.590   0.079   0.090   0.120  0.898

Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are $q = \sqrt{n}$
moments, $\sqrt{q}$ known to be valid and $q - s$ unknown. Of these, $(q-s)/2$ are valid and $(q-s)/2$ are
invalid. $\pi = (2 \cdot 1_s, 0.2 \cdot 1_{q-s})$ in the weak identification case and $\pi = 2 \cdot 1_q$ in the strong identification
case. The estimators are defined as in the note to Table 6a. 95%c is the coverage of the empirical
95% confidence intervals.
References
Andrews, Donald W. K. (1999) “Consistent moment selection procedures for generalized method of moments estimation,” Econometrica, Vol. 67, No. 3, pp. 543–564.
Andrews, Donald W. K. and Biao Lu (2001) “Consistent model and moment selection
procedures for GMM estimation with application to dynamic panel data models,”
Journal of Econometrics, Vol. 101, No. 1, pp. 123–164.
Andrews, Donald W.K. (1997) “Consistent Moment Selection Procedures for Generalized Method of Moments Estimation,” Cowles Foundation Discussion Papers
1146R, Cowles Foundation for Research in Economics, Yale University.
Andrews, Donald W.K. and James H. Stock (2007) “Testing with many weak instruments,” Journal of Econometrics, Vol. 138, No. 1, pp. 24–46.
Belloni, A., V. Chernozhukov, and C.B. Hansen (2011) “Lasso Methods for Gaussian Instrumental Variables Models,” working paper, Massachusetts Institute of
Technology, Department of Economics.
Canay, Ivan A. (2010) “Simultaneous selection and weighting of moments in GMM
using a trapezoidal kernel,” Journal of Econometrics, Vol. 156, No. 2, pp. 284–303.
Caner, M. and H. Zhang (2013) “Adaptive Elastic Net GMM with Diverging Number
of Moments,” Journal of Business and Economics Statistics, Forthcoming.
Caner, Mehmet (2009) “Lasso-Type GMM Estimator,” Econometric Theory, Vol. 25,
pp. 270–290.
Cheng, Xu and Zhipeng Liao (2012) “Select the Valid and Relevant Moments: A
one-step procedure for GMM with many moments,” PIER Working Paper Archive
12-045, Penn Institute for Economic Research, Department of Economics, University
of Pennsylvania.
Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004) “Least
angle regression,” Annals of Statistics, Vol. 32, pp. 407–499.
Hansen, Bruce E. (2007) “Least Squares Model Averaging,” Econometrica, Vol. 75,
No. 4, pp. 1175–1189.
Hansen, Lars Peter (1982) “Large Sample Properties of Generalized Method of
Moments Estimators,” Econometrica, Vol. 50, No. 4, pp. 1029–1054.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009) The Elements of
Statistical Learning: Data Mining, Inference, and Prediction: Springer, corrected
edition.
Hausman, Jerry, James H. Stock, and Motohiro Yogo (2005) “Asymptotic properties
of the Hahn-Hausman test for weak-instruments,” Economics Letters, Vol. 89, No.
3, pp. 333–342.
Hong, Han, Bruce Preston, and Matthew Shum (2003) “Generalized Empirical Likelihood Based Model Selection Criteria For Moment Condition Models,” Econometric
Theory, Vol. 19, No. 06, pp. 923–943.
Kuersteiner, Guido and Ryo Okui (2010) “Constructing Optimal Instruments by
First-Stage Prediction Averaging,” Econometrica, Vol. 78, No. 2, pp. 697–718.
Liao, Zhipeng (2013) “Adaptive GMM shrinkage estimation with consistent moment
selection,” Econometric Theory, Vol. FirstView, pp. 1–48.
Nagar, A. L. (1959) “The Bias and Moment Matrix of the General k-Class Estimators
of the Parameters in Simultaneous Equations,” Econometrica, Vol. 27, No. 4, pp.
575–595.
Newey, W.K. and R.J. Smith (2000) “Asymptotic Bias and Equivalence of GMM and
GEL Estimators,” Discussion paper 01/517, University of Bristol, Department of
Economics.
Okui, Ryo (2011) “Instrumental variable estimation in the presence of many moment
conditions,” Journal of Econometrics, Vol. 165, No. 1, pp. 70–86.
Smith, Richard J. (1992) “Non-Nested Tests for Competing Models Estimated by
Generalized Method of Moments,” Econometrica, Vol. 60, No. 4, pp. 973–980.
Zou, Hui (2006) “The adaptive lasso and its oracle properties,” Journal of the
American Statistical Association, Vol. 101, pp. 1418–1429.
Zou, Hui, Trevor Hastie, and Robert Tibshirani (2007) “On the “degrees of freedom”
of the lasso,” The Annals of Statistics, Vol. 35, No. 5, pp. 2173–2192.
6 Additional Table
Table 7: $R^2$ of the First Stage Estimation for Each Method

                              Setup 1             Setup 2
$\sigma_{zz}^2$   n   Method   Weak   Strong      Weak   Strong
0.5         50   alasso      0.865   0.719       0.861   0.698
                 gmmpen      0.879   0.906       0.871   0.877
                 cuepen      0.881   0.913       0.872   0.887
           100   alasso      0.855   0.638       0.888   0.700
                 gmmpen      0.864   0.837       0.893   0.844
                 cuepen      0.864   0.845       0.893   0.851
           250   alasso      0.849   0.533       0.879   0.564
                 gmmpen      0.853   0.638       0.883   0.655
                 cuepen      0.853   0.639       0.883   0.660
1.0         50   alasso      0.924   0.763       0.922   0.743
                 gmmpen      0.935   0.941       0.930   0.925
                 cuepen      0.935   0.940       0.930   0.930
           100   alasso      0.918   0.706       0.939   0.763
                 gmmpen      0.927   0.921       0.943   0.919
                 cuepen      0.927   0.923       0.944   0.923
           250   alasso      0.912   0.596       0.929   0.636
                 gmmpen      0.919   0.818       0.937   0.853
                 cuepen      0.919   0.819       0.937   0.859

Note: alasso, gmmpen and cuepen stand for adaptive lasso, penalized gmm and penalized cue,
respectively. Setup 1 consists of a fixed number of moments and homoskedastic errors. There are
11 moments, 3 known to be valid and 8 unknown; among these, 4 are valid and 4 are invalid.
$\pi = (2 \cdot 1_3, 0.2 \cdot 1_8)$ in the weak identification case and $\pi = 2 \cdot 1_{11}$ in the strong identification case.
Setup 2 consists of an increasing number of moments and heteroskedastic errors. There are $q = \sqrt{n}$
moments, $\sqrt{q}$ known to be valid and $q - s$ unknown. Of these, $(q-s)/2$ are valid and $(q-s)/2$
are invalid. $\pi = (2 \cdot 1_s, 0.2 \cdot 1_{q-s})$ in the weak identification case and $\pi = 2 \cdot 1_q$ in the strong
identification case.