Identification Robust Confidence Sets for Inference on Parameter
Ratios with Application to Discrete Choice Models1
Denis Bolduc 2
Université Laval
Lynda Khalaf3
Université Laval
Clément Yélou4
Université Laval
November 13, 2005
1 We would like to thank Jean-Marie Dufour, Mohamed Taamouti, Gary Zerbe and Paravastu Swamy
for useful comments. This work was supported by the Canada Research Chair Program (Chair in Environment, Université Laval), the Institut de Finance Mathématique de Montréal (IFM2), the Canadian
Network of Centres of Excellence [program on Mathematics of Information Technology and Complex
Systems (MITACS)], the Social Sciences and Humanities Research Council of Canada, the Fonds de
recherche sur la société et la culture (Québec), and the Chair on the Economics of Electric Energy
(Université Laval).
2 Groupe de recherche en économie de l’énergie, de l’environnement et des ressources naturelles
[GREEN], Université Laval. Mailing address: Pavillon J.-A.-De Sève, Ste-Foy, Québec, Canada, G1K
7P4. TEL: (418) 656-5427; FAX: (418) 656-2707; Email: denis.bolduc@ecn.ulaval.ca.
3 Canada Research Chair Holder (Environment). Département d’économique and Groupe de recherche
en économie de l’énergie, de l’environnement et des ressources naturelles [GREEN], Université Laval,
and Centre interuniversitaire de recherche en économie quantitative (CIREQ), Université de Montréal.
Mailing address: GREEN, Université Laval, Pavillon J.-A.-De Sève, Ste-Foy, Québec, Canada, G1K 7P4.
TEL: (418) 656 2131-2409; FAX: (418) 656 7412; email: lynda.khalaf@ecn.ulaval.ca.
4 Groupe de recherche en économie de l’énergie, de l’environnement et des ressources naturelles
[GREEN], Université Laval. Mailing address: Pavillon J.-A.-De Sève, Ste-Foy, Québec, Canada, G1K
7P4. Email: cyelou@ecn.ulaval.ca.
Abstract
We study the problem of building confidence sets for ratios of parameters, from an identification robust perspective. In particular, we address the simultaneous confidence set estimation
of a finite number of ratios. Results apply to a wide class of models suitable for estimation by
consistent asymptotically normal procedures. Conventional methods (e.g. the delta method)
derived by excluding the parameter discontinuity regions entailed by the ratio functions and
which typically yield bounded confidence limits, break down even if the sample size is large
[Dufour (1997)]. One solution to this problem, which we take in this paper, is to use variants
of Fieller (1940, 1954)’s method. By inverting a joint test that does not require identifying the
ratios, Fieller-based confidence regions are formed for the full set of ratios. Simultaneous confidence sets for individual ratios are then derived applying projection techniques, which allow for
possibly unbounded outcomes. In this paper, we provide simple explicit closed-form analytical
solutions for projection-based simultaneous confidence sets, in the case of linear transformations
of ratios. Our solution further provides a formal proof for the expressions in Zerbe, Laska,
Meisner and Kushner (1982) pertaining to individual ratios. We apply the geometry of quadrics
as introduced by Dufour and Taamouti (2005a, 2005b), in a different although related context.
The confidence sets so obtained are exact if the inverted test statistic admits a tractable exact
distribution, for instance in the normal linear regression context. The proposed procedures are
applied and assessed via illustrative Monte Carlo and empirical examples, with a focus on discrete choice models estimated by exact or simulation-based maximum likelihood. Our results
underscore the superiority of Fieller-based methods.
Key words: confidence set; generalized Fieller’s theorem; delta method; weak identification;
parameter transformation; discrete choice; simulated maximum likelihood.
Journal of Economic Literature classification: C10, C35, R40.
Contents
1 Introduction
2 Statistical Framework
3 Confidence Set methods for One Ratio
4 Simultaneous Confidence Sets for Multiple Ratios
5 Simulation based and empirical illustrations
5.1 Simulation studies
5.2 Empirical Application: discrete choice models of travel demand
6 Conclusion
A The Fieller-type solution for one parameter ratio
B Proof of Theorem 1
C Projections for individual ratios
List of Tables
1 Empirical coverage rates for the delta method and the Fieller method based confidence sets for a parameter ratio in a simple binary probit model.
2 Empirical coverage rates for the delta method and the Fieller method based confidence sets for a parameter ratio in a multinomial probit model with a logit kernel.
3 Simultaneous confidence sets for values of total travel time and of out-of-vehicle time from Ben-Akiva and Lerman (1985)’s trinomial logit model; 95% nominal level.
4 Simultaneous confidence sets for values of time as percentage of net personal income.
5 Simultaneous confidence sets for values of time as percentage of net personal income (continued).
1 Introduction
The problem of constructing confidence set (CS) estimates for parameter ratios arises in a
variety of econometrics contexts. Important examples include estimation of price and income
elasticities in demand systems [see e.g. Deaton and Muellbauer (1980); Banks, Blundell and
Lewbel (1997)], and inference on value of time in discrete-choice models for travel demand
[Ben-Akiva and Lerman (1985), Ben-Akiva, Bolduc and Bradley (1993), Bolduc (1999)].
The delta Wald-type method is commonly applied to construct confidence intervals (CI) for
ratios of parameters or ratios of linear combinations of parameters. In view of its Wald-type
form, the method is justified asymptotically for a wide class of models suitable for estimation by
consistent asymptotically normal procedures. However, even when the model under consideration is identifiable, parameter ratios involve a possibly discontinuous parameter transformation.
More precisely, the ratio is locally almost unidentified (LAU) i.e. is weakly identified over a
subset of the parameter space. In such contexts, Wald-type CS methods can have arbitrarily
poor coverage, as shown by Dufour (1997). Alternative methods based on generalizing Fieller’s
theorem [Fieller (1940, 1954)] have recently recaptured the attention of theoretical and applied
econometricians [see e.g. the recent surveys on weak identification in econometrics by Dufour
(2003) and Stock, Wright and Yogo (2002)]. In this paper, we consider Fieller-type simultaneous
confidence sets for multiple ratio functions.
Fieller (1940, 1954)’s original Theorem proposes a procedure based on inverting a pivotal
statistic to obtain an exact CS for the ratio of two means of normal variates. Scheffé (1970)
proposes a modification of Fieller’s procedure, which avoids trivial CSs, i.e. CSs which cover
the entire real line. Zerbe et al. (1982) [see also Zerbe (1978) and Dufour (1997)] extend Fieller’s
theorem in two directions. First, they focus on ratios of parameters in the normal linear regression model. Secondly, they construct multiple confidence regions and simultaneous CSs
for several ratios of linear combinations of parameters. In this case, normality still guarantees
exact confidence levels. Extensions to ratios of asymptotically normal variates have also been
considered [see e.g. Young, Zerbe and Hay (1997)], leading to a generalized Fieller solution.1
As with the delta method, the generalized Fieller approach is based on a consistent asymptotically normal estimator of the parameters whose ratios are under consideration. Yet both
methods exploit the latter asymptotic result in fundamentally different ways. In contrast to the
delta method which is derived by excluding the parameter discontinuity regions entailed by the
ratio functions and which typically yields bounded confidence limits, Fieller-based confidence
regions are formed by inverting a test that does not require identifying the ratios. The geometry
of inverting test statistics typically leads to possibly unbounded solutions, a pre-requisite for
ensuring reliable coverage [Dufour (1997)].
1 Applications are found more frequently in statistics than econometrics. See e.g. Darby (1980), Selwyn and
Hall (1984), Buonaccorsi (1985), Bucephala and Gatsonis (1988), Zerbe (1978), Zerbe et al. (1982), Young et al.
(1997). Young et al. (1997) apply Zerbe et al. (1982)’s results to the asymptotic context of linear and nonlinear
mixed-effects models.
Applications of Fieller’s method in econometrics are scarce. However, related results can
be found in the so called weak instruments literature [which is now considerable; see Dufour
and Jasiak (2001), Moreira (2003), Kleibergen (2005) and the surveys by Stock et al. (2002)
and Dufour (2003)]. The weak instruments problem relates to the problem of estimating ratios
through LAU difficulties. As is evident from the above cited surveys, recent work on instrumental
models has focused on pivotal (exact or asymptotic) statistics aimed at being robust (invariant)
to identification status. This property underlies Fieller’s approach, which we study here in the
case of parameter ratios.
Our contributions are twofold. First, we address the simultaneous CS estimation of a finite
number of ratios, in a generalized Fieller setting, i.e. given a general asymptotically normal parameter estimate. Simultaneity [for definitions and references, see Savin (1984), Dufour (1989),
and Scheffé (1959)] implies controlling joint coverage for all CSs, or more formally, controlling
the probability that all the confidence expressions made hold jointly.
Fieller’s procedure for simultaneous inference starts from a joint confidence region for the
full set of ratios, obtained through the inversion of an identification-robust test [as in e.g. Zerbe
et al. (1982)]. Simultaneous CSs for individual ratios or for transformations of ratios are then
derived from the latter joint region using projection techniques. Such techniques may however
raise non-trivial analytical complications. In this paper, we provide simple explicit closed-form
projection-based simultaneous CS formulas for linear transformations of ratios. Results hold
exactly in the normal linear regression model. Our general solution further provides a formal
proof for the expressions in Zerbe et al. (1982) pertaining to individual ratios. The CSs so
obtained are not necessarily bounded. We analyze the unbounded and the trivial outcome cases
and provide recommendations for practical applications.
Our method of proof uses quadric mathematical tools as introduced by Dufour and Taamouti
(2005a, 2005b) for inference in instrumental regressions under weak instruments. The latter
approach has not been considered (to the best of our knowledge) in the literature on estimating
ratios, although as will become clear from our presentation, simple formulae obtain conveniently
despite the complicated geometric surfaces under consideration.
Secondly, we illustrate our theoretical results with focus on discrete choice models estimated
by exact or simulated maximum likelihood (SML). We analyze illustrative Monte Carlo and
empirical examples. In discrete choice models, Fieller based approaches hold asymptotically, so
it is important to assess their performance in finite samples.2 We study a simple binary probit
model, and a multinomial probit model with a logit kernel [see Ben-Akiva, Bolduc and Walker
(2001), Bolduc (1999)]. Our simulation results can be summarized as follows. As expected,
we find that the delta method based CSs have very poor coverage, even in the simplest design
considered. In contrast, Fieller’s method performs extremely well, even in the most complicated
design considered. We also revisit two empirical studies from Ben-Akiva and Lerman (1985)
and Bolduc (1999) on transportation demand. We show that the Fieller and delta methods can
lead to dramatically different empirical implications, even with very large samples. Our results
underscore the superiority of Fieller-based methods.
2 Finite sample problems have been documented in some standard discrete choice settings even with linear
hypothesis tests; see e.g. Davidson and MacKinnon (1999); see also Savin and Würtz (1999) and Savin and
Würtz (2001).
The paper is organized as follows. Section 2 defines our statistical framework. In section
3, to set focus, we discuss the delta and Fieller’s methods in the case of a single parameter
ratio. In section 4, we consider the multiple ratio case. Section 5 presents several empirical and
simulation based examples and applications. Section 6 concludes.
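2 Statistical Framework
Consider the general model
\[
(\mathcal{Y}, \{P_\theta : \theta \in \Theta\}), \quad \Theta \subset \mathbb{R}^p, \quad p \geq 1 \tag{2.1}
\]
where $\mathcal{Y}$ is the sample space and $P_\theta$ is a probability distribution over $\mathcal{Y}$ indexed by $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$. Given a sample of size $T$, we estimate $\theta$ by
\[
\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p)' \overset{asy}{\sim} N(\theta, \Sigma_\theta) \tag{2.2}
\]
where the symbol $\overset{asy}{\sim}$ refers to the estimator’s asymptotic distribution, and $\Sigma_\theta$ is estimated consistently by $\hat\Sigma_\theta$. Parameters of interest include $\rho = (\rho_1, \rho_2, \ldots, \rho_s)'$ where
\[
\rho_i = h_i(\theta) = L_i'\theta / K'\theta, \quad i = 1, \ldots, s, \quad s \leq p - 1 \tag{2.3}
\]
and $\{L_1, L_2, \ldots, L_s, K\}$ is a linearly independent set of fixed (nonstochastic) $p \times 1$ vectors.3 These $s$ ratio functions have the same discontinuity set
\[
D_K = \{\theta \in \Theta : K'\theta = 0\} \tag{2.4}
\]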
which is clearly non-empty. Ratios with the same denominator are encountered in many econometric applications; these include long run elasticities in dynamic demand models, and the
economic value of time for several use-specific portions of travel time in transportation research.
In this context, marginal Wald-type CIs each with asymptotic level $1 - \alpha$ can be obtained for each one of the ratios applying the following result, usually known as the delta method:
\[
h_i(\hat\theta) \overset{asy}{\sim} N\left( h_i(\theta),\; \frac{\partial h_i(\hat\theta)}{\partial\theta'} \, \Sigma_\theta \, \frac{\partial h_i(\hat\theta)}{\partial\theta} \right), \quad i \in \{1, \ldots, s\}. \tag{2.5}
\]
For the same problem, Fieller’s method [see e.g. Zerbe et al. (1982)] inverts a Wald-type test associated with the hypothesis
\[
L_i'\theta - \rho_i K'\theta = 0, \quad i \in \{1, \ldots, s\}.
\]
3 Observe that if $s \geq p$, then $\{L_1, L_2, \ldots, L_s, K\}$ are linearly dependent. Indeed, if $s > p$, then it is always possible to express at least $s - p$ elements of the set $\{L_1, L_2, \ldots, L_s\}$ as a linear combination of the others, and if $s = p$, then $K$ is expressible as a linear combination of $L_1, \ldots, L_s$.
Inverting a test with respect to a parameter means that we collect all the values of this parameter
for which the test is not significant. We discuss this motivational case in section 3, with emphasis on
two fundamental issues: (i) the null distribution of the inverted test holds without assuming the
ratio at hand is identified, and (ii) the associated solution is not necessarily an interval.
The Fieller and delta methods discussed so far produce non independent CSs with nominal
(asymptotic) level 1 − α, for each ratio individually. A joint inference procedure based on
combining such CSs has level not smaller than 1 − sα. Here, our aim is to construct alternative
simultaneous CSs which ensure overall 1 − α level control, for the vector of ratios ρ, as well as
for any linear combination of $\rho$:
\[
l_w(\rho) = w'\rho, \tag{2.6}
\]
where $w = (w_1, w_2, \ldots, w_s)'$ is any known fixed $s \times 1$ vector. This case is discussed in section 4,
where we proceed as follows.
First, we derive a joint confidence region for the vector ρ whose validity, once again, does
not require identifying any of the ratios. Formally, we obtain a subset of $\mathbb{R}^s$, denoted $CS(\rho; \alpha)$, such that (asymptotically)
\[
\Pr[\rho \in CS(\rho; \alpha)] \geq 1 - \alpha \tag{2.7}
\]
for all $\theta \in \Theta$ [i.e. without excluding the discontinuity subset (2.4)], by inverting a joint Wald-type test associated with the joint hypothesis
\[
L_i'\theta - \rho_i K'\theta = 0, \quad i = 1, \ldots, s.
\]
We next derive CSs for any transformation of ρ of the form (2.6), by projection techniques
applied to CS (ρ; α); of course, by convenient choices for the vector w, our result covers the
individual ratios case. To do this, we write CS (ρ; α) in the quadrics form, which allows the
application of Dufour and Taamouti (2005b)’s results. The CSs so obtained are simultaneous in
the following sense.
For any set of $m$ continuous real valued functions of $\rho$, $g_i(\rho) \in \mathbb{R}$, $i = 1, \ldots, m$, let $g_i(CS(\rho; \alpha))$ denote the image of $CS(\rho; \alpha)$ by the function $g_i$. Clearly, $\rho \in CS(\rho; \alpha) \Rightarrow g_i(\rho) \in g_i(CS(\rho; \alpha))$, $i = 1, \ldots, m$, hence
\[
\Pr[g_i(\rho) \in g_i(CS(\rho; \alpha)),\; i = 1, \ldots, m] \geq \Pr[\rho \in CS(\rho; \alpha)]. \tag{2.8}
\]
Then equation (2.7) implies that (asymptotically)
\[
\Pr[g_i(\rho) \in g_i(CS(\rho; \alpha)),\; i = 1, \ldots, m] \geq 1 - \alpha, \quad \forall \theta \in \Theta, \tag{2.9}
\]
which means that the sets gi (CS (ρ; α)) are: (i) simultaneous [by the definition of simultaneity,
see Miller (1981), Dufour (1989), Abdelkhalek and Dufour (1998)], and (ii) identification robust,
because (2.9) does not exclude the discontinuity set (2.4). If a tractable procedure is available
to derive gi (CS (ρ; α)), then valid CS inference on any arbitrary number of transformations of
ρ is feasible ensuring overall level control. Here we provide simple explicit solutions for the case
where the functions gi are of the linear form.
Throughout the paper, we use the following notation: Is is the s-dimensional identity matrix,
and 0s refers to an s-dimensional vector of zeros.
3 Confidence Set methods for One Ratio
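In this section, we consider a special case which serves to present the delta and Fieller methods in their simplest form, as they apply to the ratio function $\delta(\theta) = \theta_1/\theta_2$. Let
\[
\hat\Sigma_{12} = \begin{bmatrix} \hat v_1 & \hat v_{12} \\ \hat v_{12} & \hat v_2 \end{bmatrix}
\]
denote the submatrix of $\hat\Sigma_\theta$ that corresponds to $\hat\theta_1$ and $\hat\theta_2$. For this problem, the delta method leads to the usual Wald-type $1 - \alpha$ level confidence interval (CI):
\[
DCS(\delta; \alpha) = \left[ \hat\theta_1/\hat\theta_2 - z_{\alpha/2} \hat\Sigma_\delta^{1/2},\; \hat\theta_1/\hat\theta_2 + z_{\alpha/2} \hat\Sigma_\delta^{1/2} \right], \tag{3.1}
\]
\[
\hat\Sigma_\delta = \hat G' \hat\Sigma_{12} \hat G, \quad \hat G = \left( 1/\hat\theta_2,\; -\hat\theta_1/\hat\theta_2^2 \right)',
\]
where $z_{\alpha/2}$ refers to the two-tailed $\alpha$-level standard normal cut-off point.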
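As an illustration of (3.1), the following is a minimal sketch of the delta-method interval. The function name `delta_ci` and its argument names are hypothetical; the inputs are assumed to be the two point estimates and the entries of $\hat\Sigma_{12}$.

```python
import math
from statistics import NormalDist


def delta_ci(theta1, theta2, v1, v2, v12, alpha=0.05):
    """Delta-method (Wald-type) interval for delta = theta1 / theta2, as in (3.1).

    theta1, theta2: point estimates; v1, v2, v12: entries of Sigma12-hat.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed cut-off z_{alpha/2}
    ratio = theta1 / theta2
    g1, g2 = 1.0 / theta2, -theta1 / theta2**2   # gradient G-hat of theta1/theta2
    var = g1 * g1 * v1 + 2 * g1 * g2 * v12 + g2 * g2 * v2   # G' Sigma12 G
    half = z * math.sqrt(var)
    return ratio - half, ratio + half


# Example: estimates (1, 2) with variances (0.04, 0.09) and zero covariance.
lo, hi = delta_ci(1.0, 2.0, 0.04, 0.09, 0.0)
```

By construction the interval is always bounded and centred at the point estimate of the ratio, whatever the value of the estimated denominator.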
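Applying Fieller’s theorem to the context at hand, we are led to consider the restriction
\[
\theta_1 - \delta\theta_2 = 0, \tag{3.2}
\]
and its associated t-test statistic
\[
t(\delta) = \frac{\hat\theta_1 - \delta\hat\theta_2}{\left( \delta^2 \hat v_2 - 2\delta \hat v_{12} + \hat v_1 \right)^{1/2}} \overset{asy}{\sim} N(0, 1) \tag{3.3}
\]
under (3.2). The $1 - \alpha$ level CS which inverts $t(\delta)$ corresponds to the values of $\delta_0$ such that $|t(\delta_0)| \leq z_{\alpha/2}$, or alternatively to the set:
\[
FCS(\delta; \alpha) = \left\{ \delta_0 : \left( \hat\theta_1 - \delta_0 \hat\theta_2 \right)^2 \leq z_{\alpha/2}^2 \left( \hat v_1 + \delta_0^2 \hat v_2 - 2\delta_0 \hat v_{12} \right) \right\}. \tag{3.4}
\]
This requires solving the following second degree polynomial inequality for $\delta_0$:
\[
A\delta_0^2 + 2B\delta_0 + C \leq 0, \tag{3.5}
\]
\[
A = \hat\theta_2^2 - z_{\alpha/2}^2 \hat v_2, \quad B = -\hat\theta_1 \hat\theta_2 + z_{\alpha/2}^2 \hat v_{12}, \quad C = \hat\theta_1^2 - z_{\alpha/2}^2 \hat v_1. \tag{3.6}
\]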
In appendix A, we discuss the solutions to the latter inequality [see Scheffé (1970), Zerbe et al. (1982) and Dufour (1997)]. Two properties are worth noting. First, $FCS(\delta; \alpha)$ cannot be an empty set. Second, $FCS(\delta; \alpha)$ is either a bounded interval, an unbounded interval, or the entire real line $]-\infty, +\infty[$, where the unbounded solution occurs only when $\left| \hat\theta_2 / (\hat v_2)^{1/2} \right| < z_{\alpha/2}$, i.e. when the Student’s t-test of the hypothesis $\theta_2 = 0$ is not significant at level $\alpha$.
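The inequality (3.5)-(3.6) can be solved directly. The sketch below is one possible implementation, not the authors’ code: the name `fieller_cs` and the convention of returning the set as a list of `(lower, upper)` pairs are assumptions made for illustration, and the sign-based case split simply follows the elementary algebra of a quadratic inequality.

```python
import math
from statistics import NormalDist


def fieller_cs(theta1, theta2, v1, v2, v12, alpha=0.05):
    """Fieller set (3.4) for theta1/theta2, returned as a list of (lower, upper) pairs."""
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    A = theta2**2 - z2 * v2              # coefficients (3.6)
    B = -theta1 * theta2 + z2 * v12
    C = theta1**2 - z2 * v1
    disc = B * B - A * C                 # discriminant of A*d^2 + 2*B*d + C
    inf = math.inf
    if A > 0:                            # denominator significant: bounded interval
        r = math.sqrt(max(disc, 0.0))
        return [((-B - r) / A, (-B + r) / A)]
    if A < 0:
        if disc <= 0:                    # inequality holds everywhere: real line
            return [(-inf, inf)]
        r = math.sqrt(disc)
        lo, hi = sorted(((-B - r) / A, (-B + r) / A))
        return [(-inf, lo), (hi, inf)]   # complement of an open interval
    # Knife-edge A == 0: the inequality is linear, 2*B*d + C <= 0.
    if B > 0:
        return [(-inf, -C / (2 * B))]
    if B < 0:
        return [(-C / (2 * B), inf)]
    return [(-inf, inf)]


bounded = fieller_cs(1.0, 2.0, 0.04, 0.09, 0.0)    # well-identified denominator
unbounded = fieller_cs(1.0, 0.1, 0.04, 0.09, 0.0)  # denominator close to zero
```

In the second call the denominator estimate is not significantly different from zero, so the returned set is the union of two half-lines rather than an interval.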
4 Simultaneous Confidence Sets for Multiple Ratios
In this section, we consider the joint estimation of several ratios of the form (2.3). As proposed in section 2, we invert a Wald test of the following linear hypothesis associated with the $s$ ratios under consideration:
\[
RH\theta = 0 \;\Leftrightarrow\; L_i'\theta - \rho_i K'\theta = 0, \quad i = 1, \ldots, s, \tag{4.1}
\]
\[
H = \begin{bmatrix} L_1 & \cdots & L_s & K \end{bmatrix}', \quad R = \begin{bmatrix} I_s \\ -\rho_1 \; \cdots \; -\rho_s \end{bmatrix}'. \tag{4.2}
\]
The $s \times (s + 1)$ matrix $R$ has full row rank for any $\rho_i$, $i = 1, \ldots, s$; and since the $s + 1$ vectors $\{L_1, L_2, \ldots, L_s, K\}$ are linearly independent, the matrix $H$ has full row rank. Let $W_{RH}$ denote the Wald statistic associated with (4.1):
\[
W_{RH} = \left( RH\hat\theta \right)' \left( RH\hat\Sigma_\theta H'R' \right)^{-1} \left( RH\hat\theta \right) \overset{asy}{\sim} \chi^2(s)
\]
under (4.1). This leads to the following joint CS for the components of $\rho$:
\[
CS(\rho; \alpha) = \{\rho \in \mathbb{R}^s : W_{RH} \leq c_{s,\alpha}\} \tag{4.3}
\]
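where $c_{s,\alpha}$ refers to the $1 - \alpha$ percentile point of the $\chi^2(s)$ distribution. On observing that $W_{RH}$ can be decomposed as follows [see Zerbe et al. (1982)],
\[
\begin{aligned}
W_{RH} = {} & \left( H\hat\theta \right)' \left( H\hat\Sigma_\theta H' \right)^{-1} \left( H\hat\theta \right) \\
& - \left[ (\rho', 1) \left( H\hat\Sigma_\theta H' \right)^{-1} H\hat\theta \right]' \left[ (\rho', 1) \left( H\hat\Sigma_\theta H' \right)^{-1} (\rho', 1)' \right]^{-1} \left[ (\rho', 1) \left( H\hat\Sigma_\theta H' \right)^{-1} H\hat\theta \right],
\end{aligned} \tag{4.4}
\]
we can rewrite (4.3) as:
\[
CS(\rho; \alpha) = \left\{ \rho \in \mathbb{R}^s : \rho_*' M \rho_* \leq 0, \; \rho_* = (\rho', 1)' \right\}, \tag{4.5}
\]
where
\[
M = c \left( H\hat\Sigma_\theta H' \right)^{-1} - \left[ \left( H\hat\Sigma_\theta H' \right)^{-1} H\hat\theta \right] \left[ \left( H\hat\Sigma_\theta H' \right)^{-1} H\hat\theta \right]', \tag{4.6}
\]
\[
c = \left( H\hat\theta \right)' \left( H\hat\Sigma_\theta H' \right)^{-1} \left( H\hat\theta \right) - c_{s,\alpha}. \tag{4.7}
\]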
Applying the projection technique to the latter set is, at first sight, not a trivial problem.
Zerbe et al. (1982) provide a solution without proof (based on adapting, informally, Scheffé
(1953, 1959)’s simultaneous confidence limits in ANOVA contexts) which does not cover linear
transformations of ρ [of the form (2.6)]. Here we solve the latter problem, which also of course
provides a formal proof for Zerbe et al. (1982)’s results on the individual ratios.
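To make the step from (4.3) to (4.5) concrete, the sketch below builds $M$ from (4.6)-(4.7) and evaluates the Wald statistic of (4.1) at a candidate $\rho$. The function names are hypothetical, NumPy is assumed to be available, and the critical value $c_{s,\alpha}$ is passed in as a number (for instance 5.991 for $s = 2$ at the 95% level) rather than computed.

```python
import numpy as np


def quadric_matrix(H, theta_hat, Sigma_hat, c_crit):
    """Matrix M of (4.6), with c as in (4.7); c_crit stands for c_{s,alpha}."""
    y = H @ theta_hat
    V_inv = np.linalg.inv(H @ Sigma_hat @ H.T)
    u = V_inv @ y
    c = y @ u - c_crit
    return c * V_inv - np.outer(u, u)


def wald_stat(rho, H, theta_hat, Sigma_hat):
    """Wald statistic W_RH for the hypothesis (4.1) at a candidate rho."""
    s = len(rho)
    R = np.hstack([np.eye(s), -np.asarray(rho, float).reshape(s, 1)])  # R = [I_s, -rho]
    RH = R @ H
    r = RH @ theta_hat
    return r @ np.linalg.solve(RH @ Sigma_hat @ RH.T, r)


# Illustrative numbers only: s = 2 ratios theta1/theta3 and theta2/theta3.
H = np.eye(3)
theta_hat = np.array([1.0, 2.0, 4.0])
Sigma_hat = np.array([[0.10, 0.02, 0.01],
                      [0.02, 0.20, 0.03],
                      [0.01, 0.03, 0.30]])
M = quadric_matrix(H, theta_hat, Sigma_hat, 5.991)
rho_star = np.array([0.3, 0.4, 1.0])
in_set = rho_star @ M @ rho_star <= 0   # membership test (4.5)
```

Under the decomposition (4.4), $\rho_*' M \rho_*$ equals $(W_{RH} - c_{s,\alpha})$ multiplied by the positive scalar $\rho_*'(H\hat\Sigma_\theta H')^{-1}\rho_*$, so the two membership tests agree.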
To do this, we use Dufour and Taamouti (2005a, 2005b)’s results on the geometry of quadrics.
The set of points that satisfy an equation of the form $\rho'\Gamma\rho + \Phi'\rho + \gamma = 0$, where $\Gamma$ is a symmetric $s \times s$ matrix, $\Phi$ is an $s \times 1$ vector and $\gamma$ is a scalar, constitutes a quadric surface. A confidence set for $\rho$ of the form
\[
C_\rho = \left\{ \rho_0 : \rho_0'\Gamma\rho_0 + \Phi'\rho_0 + \gamma \leq 0 \right\}
\]
is a quadric confidence set (Dufour and Taamouti (2005a, 2005b)). Depending on the values of
Γ, Φ, and γ, it may take several forms, including ellipsoids, paraboloids and hyperboloids. So we
proceed to express the CS (4.5) in the quadrics form. To the best of our knowledge, the latter
approach has not been considered to date in solving (4.5).
Partition the matrix $M$ according to $\rho_* = (\rho', 1)'$ in the form:
\[
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}, \tag{4.8}
\]
where $M_{11}$ is an $s \times s$ matrix, $M_{12}$ is an $s \times 1$ vector, $M_{21} = M_{12}'$, and $M_{22}$ is a scalar. Let
\[
S_1 = \begin{bmatrix} I_s & 0_s \end{bmatrix}, \quad S_2 = \left( 0_s', 1 \right), \tag{4.9}
\]
so $\rho = S_1\rho_*$, $M_{11} = S_1 M S_1'$, $M_{22} = S_2 M S_2'$; then
\[
\rho_*' M \rho_* = \rho' M_{11} \rho + 2M_{12}'\rho + M_{22}. \tag{4.10}
\]
Hence (4.5) can be written as the following quadric CS:
\[
CS(\rho; \alpha) = \left\{ \rho \in \mathbb{R}^s : \rho' M_{11} \rho + 2M_{12}'\rho + M_{22} \leq 0 \right\}. \tag{4.11}
\]
Further, if we consider scalar linear transformations of $\rho$, then the sets
\[
CS(w'\rho; \alpha) = \left\{ w'\rho_0 : \rho_0' M_{11} \rho_0 + 2M_{12}'\rho_0 + M_{22} \leq 0 \right\} \tag{4.12}
\]
are simultaneous confidence sets for $w'\rho$, $w \in \mathbb{R}^s \setminus \{0\}$. Using Dufour and Taamouti (2005b)’s
general expressions, we can now characterize the explicit form of (4.12). As shown by Dufour
and Taamouti (2005b), the result depends on whether $M_{11}$ is singular or not.
Theorem 1 Let $M_{11}$, $M_{12}$, and $M_{22}$ be defined by (4.6)-(4.8). Then $M_{11}$ is nonsingular; specifically, $s - 1$ eigenvalues of $M_{11}$ have the same sign as $c$ [defined in (4.7)] and the remaining eigenvalue has the same sign as
\[
a = \left( K'\hat\theta \right)' \left( K'\hat\Sigma_\theta K \right)^{-1} \left( K'\hat\theta \right) - c_{s,\alpha}. \tag{4.13}
\]
In addition, if we define
\[
d = M_{12}' M_{11}^{-1} M_{12} - M_{22}, \tag{4.14}
\]
then $d > 0$ if and only if $M_{11}$ is a positive definite or a negative definite matrix.
This Theorem is proved in Appendix B, where we exploit several useful derivations from
Zerbe et al. (1982, Appendix C). Now applying Theorem 1 to Theorems 5.1-5.3 from Dufour
and Taamouti (2005b), we obtain the following CS solution.
Theorem 2 Let $M_{11}$, $M_{12}$, $M_{22}$ and $d$ be defined by (4.6)-(4.8) and (4.14); let $w \in \mathbb{R}^s \setminus \{0\}$ and
\[
W_{11} = w' M_{11}^{-1} w, \quad f = -M_{11}^{-1} M_{12} \tag{4.15}
\]
and let $CS(w'\rho; \alpha)$ refer to the projection-based confidence set for $w'\rho$ defined by (4.12). If all the eigenvalues of $M_{11}$ are positive, then
\[
CS(w'\rho; \alpha) = \left[ w'f - (dW_{11})^{1/2},\; w'f + (dW_{11})^{1/2} \right].
\]
If $M_{11}$ has at least two negative eigenvalues, then $CS(w'\rho; \alpha) = \mathbb{R}$. If $M_{11}$ has exactly one negative eigenvalue, then the following obtains: (i) if $w' M_{11}^{-1} w > 0$, then $CS(w'\rho; \alpha) = \mathbb{R}$; (ii) if $w' M_{11}^{-1} w = 0$, then $CS(w'\rho; \alpha) = \mathbb{R} \setminus \{w'f\}$; (iii) if $w' M_{11}^{-1} w < 0$,
\[
CS(w'\rho; \alpha) = \left] -\infty,\; w'f - (dW_{11})^{1/2} \right] \cup \left[ w'f + (dW_{11})^{1/2},\; +\infty \right[.
\]
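The case analysis of Theorem 2 translates into a few lines of linear algebra. The sketch below is illustrative only: the name `projection_cs` and the list-of-intervals return convention are assumptions, NumPy is assumed to be available, and the matrix $M$ of (4.6) is taken as given (the example builds it for $H = I_3$ and $\hat\Sigma_\theta = 0.01 I_3$ with made-up estimates).

```python
import numpy as np


def projection_cs(M, w):
    """Projection-based set for w'rho following Theorem 2.

    M is the (s+1) x (s+1) matrix of (4.6); the result is a list of
    (lower, upper) pairs, with infinite endpoints for unbounded pieces.
    """
    M, w = np.asarray(M, float), np.asarray(w, float)
    s = len(w)
    M11, M12, M22 = M[:s, :s], M[:s, s], M[s, s]
    M11_inv = np.linalg.inv(M11)
    f = -M11_inv @ M12                    # (4.15)
    d = M12 @ M11_inv @ M12 - M22         # (4.14)
    W11 = w @ M11_inv @ w                 # (4.15)
    n_neg = int(np.sum(np.linalg.eigvalsh(M11) < 0))
    centre = w @ f
    if n_neg == 0:                        # all eigenvalues positive: bounded interval
        half = np.sqrt(max(d * W11, 0.0))
        return [(centre - half, centre + half)]
    if n_neg >= 2 or W11 > 0:             # trivial set: the whole real line
        return [(-np.inf, np.inf)]
    if W11 == 0:                          # knife-edge case (ii): real line minus w'f
        return [(-np.inf, centre), (centre, np.inf)]
    half = np.sqrt(max(d * W11, 0.0))     # case (iii): two half-lines
    return [(-np.inf, centre - half), (centre + half, np.inf)]


def example_M(theta_hat, c_crit=5.991):
    """M of (4.6) for the illustrative case H = I_3, Sigma-hat = 0.01 * I_3."""
    V_inv = 100.0 * np.eye(3)
    u = V_inv @ theta_hat
    return (theta_hat @ u - c_crit) * V_inv - np.outer(u, u)


strong = projection_cs(example_M(np.array([1.0, 2.0, 4.0])), [1.0, 0.0])
weak = projection_cs(example_M(np.array([1.0, 2.0, 0.1])), [1.0, 0.0])
```

With the denominator estimate at 4.0 the set for the first ratio is a bounded interval around 0.25; with the denominator estimate at 0.1 it becomes a union of two half-lines.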
We thus see that unbounded CSs occur when $M_{11}$ has negative eigenvalues, which depends on the sign of $a$ [defined in (4.13)]. Interestingly, observe that the sign of $a$ can be linked to a Wald test on the denominator of the ratios. Indeed
\[
\left( K'\hat\theta \right)' \left( K'\hat\Sigma_\theta K \right)^{-1} \left( K'\hat\theta \right) < c_{1,\alpha} \;\Rightarrow\; \left( K'\hat\theta \right)' \left( K'\hat\Sigma_\theta K \right)^{-1} \left( K'\hat\theta \right) < c_{s,\alpha} \tag{4.16}
\]
where $c_{1,\alpha}$ denotes the $(1 - \alpha)$ percentile point of the $\chi^2(1)$ distribution. In other words, if a Wald test of $K'\theta = 0$ is not significant at level $\alpha$, then $a < 0$ [as defined in (4.13)]. So the case $c > 0$ coupled with a non-significant (at level $\alpha$) Wald test on the denominator term implies that $M_{11}$ has exactly one negative eigenvalue, leading to unbounded or trivial CSs.
Theorem 2 yields CSs for individual ratios by convenient choices for $w$. We prove in Appendix C that for any individual ratio $\rho_i$, $i = 1, 2, \ldots, s$, the confidence limits we obtain coincide numerically with the solutions of the following quadratic inequality:
\[
A_i \rho_{i0}^2 + 2B_i \rho_{i0} + C_i \leq 0, \tag{4.17}
\]
where
\[
A_i = \left( K'\hat\theta \right)^2 - c_{s,\alpha} K'\hat\Sigma_\theta K, \quad B_i = c_{s,\alpha} L_i'\hat\Sigma_\theta K - \left( L_i'\hat\theta \right) \left( K'\hat\theta \right), \quad C_i = \left( L_i'\hat\theta \right)^2 - c_{s,\alpha} L_i'\hat\Sigma_\theta L_i.
\]
This is exactly the same solution informally suggested by Zerbe et al. (1982).
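For reference, the coefficients of (4.17) can be computed as follows. This is a small sketch with a hypothetical function name, assuming NumPy and a user-supplied critical value `c_crit` standing for $c_{s,\alpha}$; the numbers in the example are made up.

```python
import math

import numpy as np


def ratio_quadratic_coeffs(L, K, theta_hat, Sigma_hat, c_crit):
    """Coefficients (A_i, B_i, C_i) of the quadratic inequality (4.17)."""
    L, K = np.asarray(L, float), np.asarray(K, float)
    num, den = L @ theta_hat, K @ theta_hat
    A = den**2 - c_crit * (K @ Sigma_hat @ K)
    B = c_crit * (L @ Sigma_hat @ K) - num * den
    C = num**2 - c_crit * (L @ Sigma_hat @ L)
    return A, B, C


# Illustrative numbers: ratio theta1/theta3, s = 2 ratios in total (c_crit = 5.991).
theta_hat = np.array([1.0, 2.0, 4.0])
Sigma_hat = 0.01 * np.eye(3)
A, B, C = ratio_quadratic_coeffs([1, 0, 0], [0, 0, 1], theta_hat, Sigma_hat, 5.991)
if A > 0:  # bounded case: the set is the interval between the two roots
    r = math.sqrt(B * B - A * C)
    limits = ((-B - r) / A, (-B + r) / A)
```

Compared with (3.6), the only change is that the squared normal cut-off $z_{\alpha/2}^2$ is replaced by the $\chi^2(s)$ critical value, which is what makes the resulting sets simultaneous.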
We clearly see that computing the latter CSs does not require any extra cost compared to
the delta method. To conclude, observe that Theorem 1 implies, in conjunction with Dufour
and Taamouti’s results, that the projection based CS for any linear transformation w0 ρ cannot
be empty. Indeed, as may be seen from section 5 in Dufour and Taamouti (2005b), the only
case where the CS is empty corresponds to a positive definite M11 with d < 0. Here Theorem 1
conveniently rules out this case.
5 Simulation based and empirical illustrations
We focus on a general discrete choice model: the multinomial probit with a logit kernel. Since the
properties of standard asymptotics in this class of models are little documented, it is important
to assess the performance of both confidence set procedures within this framework. The model
can be described as follows. Assuming that each decision maker $n$ ($n = 1, \ldots, T$) faces $J$ discrete alternatives, usual random utility formulations imply:
\[
\varsigma_{in} = \begin{cases} 1 & \text{if } U_{in} \geq U_{jn} \text{ for } j = 1, 2, \ldots, J \\ 0 & \text{otherwise,} \end{cases} \tag{5.1}
\]
\[
U_{in} = X_{in}\beta + \varepsilon_{in}, \tag{5.2}
\]
$n = 1, \ldots, T$, $i = 1, 2, \ldots, J$ where $U_{in}$ is the unobservable utility individual $n$ derives from alternative $i$, $\varsigma_{in}$ designates the choice of individual $n$, $X_{in}$ is a $(1 \times K)$ vector of observable covariates, and the $(K \times 1)$ vector $\beta$ constitutes the parameter of interest. In this context, the choice probability associated with the alternative $i$ chosen by individual $n$ is defined by:
\[
P_n(i) = P(U_{in} \geq U_{jn} \text{ for } j = 1, 2, \ldots, J). \tag{5.3}
\]
For convenience, we write (5.2) in the following compact form:
\[
U_n = X_n\beta + \varepsilon_n,
\]
where $U_n = (U_{1n}, U_{2n}, \ldots, U_{Jn})'$ and $\varepsilon_n = (\varepsilon_{1n}, \varepsilon_{2n}, \ldots, \varepsilon_{Jn})'$ are $J \times 1$ vectors, and $X_n$ is the $(J \times K)$ matrix with rows $X_{in}$, $i = 1, 2, \ldots, J$.
The model formulation (5.2) is very general. The assumptions regarding the error term $\varepsilon_n$ allow one to define several classes of sub-models. For example, assuming $\varepsilon_n \overset{i.i.d.}{\sim} N(0, \Sigma)$ gives the Multinomial Probit [MNP] model. In this case $P_n(i)$ requires the evaluation of multidimensional integrals, which may be analytically intractable for large choice sets; in particular, when the choice set involves four or more alternatives, the choice probabilities are usually simulated. Assuming $\varepsilon_n \overset{i.i.d.}{\sim}$ Gumbel leads to the Multinomial Logit [MNL] model, in which case $P_n(i)$ are analytically tractable. We consider the logit kernel formulation that results from a convenient combination of MNP and MNL (see Ben-Akiva et al. (2001)):
\[
U_n = X_n\beta + \varepsilon_n \tag{5.4}
\]
\[
\varepsilon_n = W\xi_n + \mu\nu_n, \quad W = FG, \tag{5.5}
\]
where $\xi_n \overset{i.i.d.}{\sim} N(0, I_J)$, $\nu_n \overset{i.i.d.}{\sim}$ Gumbel, $G$ is a diagonal matrix of dimension $(J \times J)$ with non-negative entries on its main diagonal, the matrix $F$ captures the correlation structure among the error terms (see e.g. Bolduc (1999)) and $\mu$ is a selection parameter with values 0 or 1: if $\mu = 0$, we have an MNP specification; on the other hand if $\mu = 1$ and $G = 0$ (or alternatively, if $\xi_n$ is non-random) we get the MNL model. Assuming that $G \neq 0$ and $\mu = 1$ yields the mixed logit model [MXL], also known as the multinomial probit with a logit kernel.
The MXL formulation is attractive because $P_n(i)$ can be conveniently specified, conditionally on $\xi_n$, as:
\[
P_n(i \mid \xi_n) = \frac{e^{X_{in}\beta + W_i\xi_n}}{\sum_{j=1}^{J} e^{X_{jn}\beta + W_j\xi_n}}, \tag{5.6}
\]
where $W_j$ denotes the $j$th row of $W$, so $P_n(i)$ obtains by integrating $P_n(i \mid \xi_n)$ over the domain of $\xi_n$, which leads to a $J$-dimensional unbounded integral. To circumvent the curse of dimensionality, the SML approach (see McFadden (1989)) is typically considered where the multivariate integral is replaced by an approximation obtained by simulation. Using $S$ independent draws [$\xi_n^r$, $r = 1, \ldots, S$] from the distribution of $\xi_n$, the empirical mean
\[
\hat P_n(i) = \frac{1}{S} \sum_{r=1}^{S} P_n(i \mid \xi_n^r), \tag{5.7}
\]
provides a valid estimator for the choice probability $P_n(i)$, which allows one to define a simulated
likelihood function suitable for estimation applying standard algorithms.
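The simulator (5.6)-(5.7) can be sketched for a single decision maker as below. The function name and signature are hypothetical, NumPy is assumed, and the draws are plain pseudo-random normals; nothing here is meant to reproduce the authors’ estimation code.

```python
import numpy as np


def simulated_choice_prob(X, beta, W, S=50, rng=None):
    """Simulated logit-kernel choice probabilities for one decision maker.

    X: (J x K) covariate matrix X_n; beta: length-K vector; W: (J x J) matrix.
    Returns the length-J vector of P-hat_n(i) from (5.7).
    """
    rng = np.random.default_rng() if rng is None else rng
    J = X.shape[0]
    v = X @ beta                                 # systematic utilities X_n beta
    xi = rng.standard_normal((S, J))             # S draws of xi_n ~ N(0, I_J)
    util = v + xi @ W.T                          # row r holds X_n beta + W xi^r
    util -= util.max(axis=1, keepdims=True)      # guard against overflow in exp
    p = np.exp(util)
    p /= p.sum(axis=1, keepdims=True)            # conditional logit probabilities (5.6)
    return p.mean(axis=0)                        # average over draws (5.7)


# Illustrative call with J = 3 alternatives and K = 3 covariates.
X = np.eye(3)
beta = np.array([0.0, 1.0, 2.0])
p_hat = simulated_choice_prob(X, beta, np.eye(3), S=200, rng=np.random.default_rng(1))
```

Setting `W` to a zero matrix removes the normal component, in which case every draw returns the same multinomial logit probabilities.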
Our simulation studies and empirical applications are conducted for specific cases of this general discrete choice model. Conforming with our general notational framework [see section 2], the vector of unknown parameters which describes the model is denoted by $\theta = (\beta', \bar\beta')'$, where the sub-vector $\beta$ is the parameter of interest as defined in (5.2), and $\bar\beta$ contains the nuisance parameters associated with the various hypotheses on the model’s error term. We will also refer (as set up in section 2) to the components of $\theta$ as $\theta_1, \theta_2, \ldots, \theta_p$.
5.1 Simulation studies
We first consider, for motivational purposes, a simulation study of a simple binary probit model,
which corresponds to (5.1)-(5.5) with $\mu = 0$, $J = 2$ and $W = I_2$. The matrix of covariates
includes, in addition to a constant [whose coefficient is denoted θ1 ], two regressors [the coefficients
of which constitute our parameters of interest denoted θ2 , θ3 ] drawn independently as standard
normal. The parameters are set as follows: θ1 = 1, θ2 = 3.3 and θ3 varies from 2 to 0.0001;
the sample size is set to T = 100, 250, 1000, 5000, and 10000. We construct 95%-level CSs for
δ = θ2 /θ3 ; based on 10000 replications, we compute the empirical coverage rate for both delta
method and Fieller based procedures. Results are shown in Table 1.
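The mechanics of such a coverage experiment can be sketched in a much simpler setting. The code below is a stand-in, not the probit design described above: it assumes two independent estimators that are exactly normal with unit variances, so that the Fieller statistic (3.3) is exactly $N(0, 1)$. The function name, the number of replications and the seed are arbitrary choices for illustration.

```python
import numpy as np


def coverage(theta1, theta2, reps=20000, z=1.959964, seed=0):
    """Empirical coverage of 95% delta-method and Fieller sets for theta1/theta2.

    Simplifying assumption: theta1-hat and theta2-hat are independent N(theta_i, 1),
    i.e. v1 = v2 = 1 and v12 = 0 are known.
    """
    rng = np.random.default_rng(seed)
    x1 = theta1 + rng.standard_normal(reps)
    x2 = theta2 + rng.standard_normal(reps)
    delta = theta1 / theta2
    # Delta method (3.1): |ratio - delta| <= z * sqrt(G' Sigma G)
    var = 1.0 / x2**2 + x1**2 / x2**4
    delta_hit = np.abs(x1 / x2 - delta) <= z * np.sqrt(var)
    # Fieller (3.4): the true delta belongs to the set iff |t(delta)| <= z
    fieller_hit = (x1 - delta * x2) ** 2 <= z**2 * (1.0 + delta**2)
    return delta_hit.mean(), fieller_hit.mean()


dcs_weak, fcs_weak = coverage(1.0, 0.1)       # denominator close to zero
dcs_strong, fcs_strong = coverage(1.0, 10.0)  # denominator far from zero
```

In this stylized design the Fieller coverage stays at the nominal level whatever the denominator, while the delta-method coverage falls well below it when the denominator is close to zero, which is the qualitative pattern the experiment is designed to detect.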
These results show that the empirical coverage rate of the delta method based CS deteriorates
rapidly as the denominator becomes close to zero, no matter how large the sample size. When
the denominator value is lower than 0.1, the empirical coverage rate deviates markedly from
the nominal level. In contrast, the Fieller-type method, although it is approximate in our
application, does not suffer from such problems. These results particularly regarding the delta
method are noteworthy, in view of the very simple design considered here. Of course, the same
arguments call for studying a more complicated choice model, to assess the Fieller case.
So we next consider a MXL model as specified by (5.4) and (5.5) with J = 3,

0 0 0
G =  0 1 √0  ,
2
0 0



−1
0 0 0
F = I3 − .6  0 0 1  ,
0 1 0
and µ = 1. Xn is composed of K = 5 variables [with coefficient β = (θ1 , ..., θ5 )0 ] that are
drawn as 1.5 times a uniform [0 1] distribution. The parameters are set as follows: θi = 3, for
10
Table 1: Empirical coverage rates (%) for the delta method and the Fieller method based confidence
sets for a parameter ratio in a simple binary probit model.

          T = 100         T = 250         T = 1000        T = 5000        T = 10000
θ3        DCS    FCS      DCS    FCS      DCS    FCS      DCS    FCS      DCS    FCS
2         93.30  95.33    94.71  95.29    95.09  95.38    95.05  95.11    94.84  94.93
1         90.97  95.82    94.01  95.19    94.97  95.13    95.08  95.06    94.84  95.01
0.5       88.58  95.50    91.11  95.33    94.38  94.87    94.86  94.92    95.14  94.96
0.4       89.00  95.48    90.65  94.90    93.91  94.93    94.68  94.69    94.81  94.88
0.3       82.11  95.34    89.93  94.80    92.81  95.09    94.98  95.01    94.77  94.75
0.2       79.15  95.77    85.48  95.12    91.36  94.97    94.07  94.61    95.10  95.07
0.1       61.59  95.79    72.97  95.08    85.82  95.13    91.81  94.99    93.16  95.22
10^-2     22.03  95.66    30.44  94.89    41.07  95.50    56.86  95.09    64.47  94.96
10^-3      6.79  95.69    10.17  94.99    13.90  94.89    19.56  94.54    24.15  94.92
10^-4      2.42  95.67     2.98  95.19     4.29  94.79     6.38  95.41     7.71  95.34

Note: Numbers reported are empirical coverage rates for the confidence set based on the delta
method [in the columns titled “DCS”] and for the one based on Fieller’s method [in the
columns titled “FCS”]. θ3 is the denominator of the ratio and T is the sample size. The nominal
confidence level is 95%.
i = 1, 2, 4, 5 in all experiments, whereas θ3 varies from 3 to 0.0001. The sample size is set to
T = 1000, 5000, and 10000. We estimate the model by SML, using (5.7) with S = 50 draws.4
We construct 95%-level CSs for δ∗ = θ2/θ3, and we compute empirical coverage rates based
on 1000 replications. Results are reported in Table 2.
Our results here are very similar to the simple binary probit case: the Fieller-type CSs still
perform very well, whereas empirical coverage rates for the delta method deviate markedly from
the nominal level when the denominator value is lower than 0.1. Coverage of the delta-method
CSs improves somewhat with sample size, yet remains well below the nominal level for every
sample size considered. Given the complexity of the model considered, these results illustrate the usefulness
of Fieller-based inference.
5.2
Empirical Application: discrete choice models of travel demand
Discrete choice techniques are commonly applied to analyze transportation related problems.
Here we consider two empirical examples from this literature. Following our simulation studies,
we consider a relatively simple set-up where usual maximum likelihood is readily feasible, and
a more complicated setting which requires SML.
We first consider the three-alternative logit transportation model analyzed in Ben-Akiva and
Lerman (1985, Chapters 3, 5 and 7). The model corresponds to (5.1)–(5.5) where G = 0,
4
See Walker (2001) and Bolduc (1999) for recommendations on identifying MXL models and for guidelines on
the choice for S. Our choice of 50 draws is compatible with the latter guidelines, given our framework.
Table 2: Empirical coverage rates (%) for the delta method and the Fieller method based confidence
sets for a parameter ratio in a multinomial probit model with a logit kernel.

          T = 1000       T = 5000       T = 10000
θ3        DCS   FCS      DCS   FCS      DCS   FCS
3         94.3  95.1     95.2  94.6     94.0  93.7
2         94.8  95.4     95.0  95.5     94.8  94.9
1         93.3  93.7     94.4  94.5     94.5  94.2
0.5       90.5  93.9     95.1  94.1     94.3  93.8
0.3       87.8  94.7     91.6  94.9     93.6  93.4
0.2       82.2  94.1     90.6  95.1     91.8  94.3
0.1       69.2  93.5     84.5  94.1     86.8  94.8
10^-2     25.9  93.4     37.4  94.4     45.7  93.7
10^-3      7.6  93.9     13.1  94.7     15.8  94.2
10^-4      2.6  94.0      4.6  94.8      5.4  93.9

Note: See notes for Table 1.
µ = 1 and J = 3. The variables in Xn include two alternative-specific constants, three generic
attributes of the travel modes [the coefficients of which constitute our parameters of interest,
denoted θ3, θ4, θ5]: (i) round trip travel time [the sum of in-vehicle and out-of-vehicle times], (ii)
round trip out-of-vehicle time/one-way distance, (iii) round trip travel cost/household income,
as well as seven alternative-specific socioeconomic and locational characteristics of worker n.
This model is estimated by maximum likelihood using data for a sample of 1136 workers taken
from a 1968 survey in the Washington D.C. metropolitan area.
In this setting, the economic value of travel time can be defined as the marginal rate of
substitution between the time and cost variables.5 In particular, since round trip travel time is
the sum of in-vehicle and out-of-vehicle times, the value of total travel time is equal to that of
in-vehicle time and is given by
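\[
\delta_{tot} = \frac{\theta_3}{\theta_5} \times \text{household income}. \tag{5.8}
\]
Similarly, the value of out-of-vehicle time is
\[
\delta_{out} = \left[ \frac{\theta_3}{\theta_5} + \frac{\theta_4}{\theta_5 \times (\text{one-way distance})} \right] \times \text{household income}. \tag{5.9}
\]
5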
Although various theories of time allocation reveal that the value of travel time can be perceived in different
ways, most empirical studies refer to the value of travel time as the amount of money the traveler agrees to pay
in order to save one unit of the total duration of his travel [Ashton (1947), De Vany (1974), Truong and Hensher
(1985), Bates (1987), Ben-Akiva et al. (1993)]. In a discrete choice framework, when the traveler’s utility function
is specified as a linear function of travel cost, travel time and other variables, his evaluation of the value of travel
time is, up to a scalar constant, equal to the ratio of the coefficient of the time variable over the coefficient of the
cost variable [Truong and Hensher (1985), Bates (1987)].
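As a minimal sketch of how (5.8) and (5.9) map coefficients into values of time, the two functions below evaluate the formulas directly. The function names and the numerical inputs in the example are hypothetical and chosen only for illustration; they are not estimates from Ben-Akiva and Lerman (1985).

```python
def value_of_total_time(theta3, theta5, income):
    """Equation (5.8): (theta3 / theta5) * household income."""
    return theta3 / theta5 * income


def value_of_out_of_vehicle_time(theta3, theta4, theta5, income, distance):
    """Equation (5.9): [theta3/theta5 + theta4/(theta5 * distance)] * income."""
    return (theta3 / theta5 + theta4 / (theta5 * distance)) * income


if __name__ == "__main__":
    # Hypothetical coefficient values, for illustration only.
    theta3, theta4, theta5 = -0.02, -0.1, -20.0
    income, distance = 10000.0, 10.0
    print(value_of_total_time(theta3, theta5, income))
    print(value_of_out_of_vehicle_time(theta3, theta4, theta5, income, distance))
```

Both functions divide by θ5, so a cost coefficient close to zero makes the computed values of time arbitrarily large in absolute value.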
Table 3: Simultaneous confidence sets for values of total travel time and of out-of-vehicle time
from Ben-Akiva and Lerman (1985)’s trinomial logit model; 95% nominal level.

Type of travel time            Delta method         Fieller method
Total travel time (δtot)       [-2.655, 30.253]     ]−∞, −40.843] ∪ [3.918, +∞[
Out-of-vehicle time (δout)     [-4.055, 46.639]     ]−∞, −66.877] ∪ [6.760, +∞[

Note: The delta method-based confidence intervals are not simultaneous.
Ben-Akiva and Lerman (1985) compute point estimates for the two parameter functions δtot
and δout. Conforming with our notation set in section 2, let h1(θ) = θ3/θ5 and h2(θ) = θ4/θ5.
If θ5 is close to zero, then the functions δtot and δout will be weakly identified. Here, using
the estimation results from Ben-Akiva and Lerman (1985, Table 7.1 and Figure 7.1), we obtain
95%-level CSs for the ratios h1(θ) and h2(θ). The delta method yields
\[
\begin{aligned}
DCS(h_1(\theta); .95) &= [-.0002089,\ .0023483], \\
DCS(h_2(\theta); .95) &= [-.1974734,\ 1.1382400],
\end{aligned}
\tag{5.10}
\]
whereas the Fieller-type method (refer to section 3) gives
\[
\begin{aligned}
FCS(h_1(\theta); .95) &= \,]-\infty,\ -.0151209] \cup [.0003947,\ +\infty[\,, \\
FCS(h_2(\theta); .95) &= \,]-\infty,\ -7.2190631] \cup [.0826500,\ +\infty[\,.
\end{aligned}
\tag{5.11}
\]
Clearly, the Fieller-type CS, which is unbounded, is in serious conflict with the delta-method-based
CS. Note that the t-statistic for the round trip travel cost/household income variable
[the denominator] is 1.8 (see Ben-Akiva and Lerman (1985, p. 158)), which concurs with the
unbounded CS result. However, one important result deserves notice: whereas both delta-method-based
CSs cover zero [which suggests that travel time has a non-significant economic
value], we see that the Fieller-type CSs exclude this (counter-intuitive) case.
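A short worked step makes the link between the t-statistic and the unbounded set explicit. It assumes that the individual CSs in (5.11) use the two-sided 5% normal critical value z = 1.96 and that the reported t-statistic is taken in absolute value; the symbols v̂5 (estimated variance of θ̂5) and t5 are our shorthand. In the notation of Appendix A, with θ̂5 playing the role of the denominator estimate, the leading coefficient of the Fieller quadratic is

```latex
\[
A = \hat\theta_5^2 - z_{\alpha/2}^2\,\hat v_5
  = \hat v_5\left(t_5^2 - z_{\alpha/2}^2\right),
\qquad t_5 = \frac{\hat\theta_5}{\sqrt{\hat v_5}},
\]
\[
t_5^2 = 1.8^2 = 3.24 \;<\; 3.8416 = 1.96^2
\quad\Longrightarrow\quad A < 0 .
\]
```

Under these assumptions A < 0, which is the case in which the Fieller-type set is unbounded: the denominator is not significantly different from zero at the 5% level.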
We next construct simultaneous CSs for the value of total travel time δtot and the value of
out-of-vehicle time δout [defined in (5.8) and (5.9) respectively]. For comparison purposes, we
also compute the delta-method CSs, which are not simultaneous. We use sample average values
for annual household income (equal to $12900/year) and for one-way distance (equal to 810
centimiles). Our results are reported in Table 3.
Once again, the CSs are in serious conflict; the Fieller-based CSs are unbounded, yet in
contrast with the delta-method, they still exclude the zero value. In view of our simulation
studies which assess the relative worth of both methods, our results suggest that (despite a
close-to-zero denominator) the economic value of time is significant at the 5% level for this data
set.
We now turn to a more complicated setting. We consider the MNP model with correlated
utilities analyzed by Bolduc (1999). Nine alternatives are considered, along with a first-order
Generalized Autoregressive error process [Bolduc (1992)]. In our notation, the model obtains
from (5.1)–(5.5) with µ = 0 and various choices for F [refer to the footnotes of Tables 4 and
5]. Given the dimensionality and error structure, the model is estimated by SML with 50 or
Table 4: Simultaneous confidence sets for values of time as percentage of net personal income,
95% nominal level.

Type of            Confidence   MNP i.i.d.            SML MNP, R = 50       SML MNP, R = 50
travel time        set                                homoscedastic         unconstrained
In vehicle time    delta        [117.90, 285.52]      [122.07, 300.46]      [102.69, 265.37]
                   Fieller      [95.77, 352.52]       [101.77, 382.24]      [61.88, 411.03]
Walking time       delta        [240.79, 450.66]      [239.39, 468.58]      [178.65, 397.82]
                   Fieller      [222.08, 548.30]      [223.13, 590.17]      [141.48, 631.17]
Waiting time       delta        [453, 1093.37]        [507.58, 1201.34]     [286.99, 830.65]
                   Fieller      [370.12, 1350.40]     [437.36, 1533.36]     [178.70, 1373.47]

Note: See notes at the end of Table 5.
250 draws,6 given a data bank on the choice of transportation modes for the morning peak
journey to work in the central business district of Santiago. The covariates considered include
mode-specific dummies, a sex dummy, a dummy for no cars/no permit holders, cost/income,
and three specific uses of travel time [the coefficients of which are our parameters of interest]:
walking time, in-vehicle time, and waiting time. The ratio of the latter coefficients with respect
to the coefficient of the cost/income variable yields three specific values of time which we aim
to estimate.
From the estimation results reported in Bolduc (1999), we obtain simultaneous CSs for the
three ratios. For comparison purposes, we also compute the delta method CSs which are not
simultaneous. Results are summarized in Tables 4 and 5.
Here, both procedures yield bounded CSs, which signals no identification problems. As
expected, the three components of travel time considered have significant economic values. In
all cases, the Fieller-type CSs are wider and cover the ones based on the delta method. Recall
of course that the Fieller-type CSs for the three values of time hold simultaneously, which
guarantees joint level control.
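The containment claim can be checked mechanically against the reported limits. The sketch below transcribes the delta and Fieller limits from Tables 4 and 5 and verifies that each Fieller-type set covers the corresponding delta-method interval; the dictionary layout and the helper name `covers` are ours.

```python
# (delta interval, Fieller set) pairs transcribed from Tables 4 and 5,
# ordered as in-vehicle, walking, waiting time within each column.
TABLES = {
    "MNP i.i.d.": [
        ((117.90, 285.52), (95.77, 352.52)),
        ((240.79, 450.66), (222.08, 548.30)),
        ((453.00, 1093.37), (370.12, 1350.40)),
    ],
    "SML R=50 homoscedastic": [
        ((122.07, 300.46), (101.77, 382.24)),
        ((239.39, 468.58), (223.13, 590.17)),
        ((507.58, 1201.34), (437.36, 1533.36)),
    ],
    "SML R=50 unconstrained": [
        ((102.69, 265.37), (61.88, 411.03)),
        ((178.65, 397.82), (141.48, 631.17)),
        ((286.99, 830.65), (178.70, 1373.47)),
    ],
    "SML R=250 unconstrained": [
        ((110.32, 286.98), (65.89, 457.18)),
        ((182.09, 413.40), (143.11, 678.24)),
        ((300.35, 879.82), (189.56, 1512.05)),
    ],
    "SML R=250 constrained": [
        ((121.00, 307.96), (76.41, 489.12)),
        ((192.22, 439.45), (151.55, 719.44)),
        ((310.43, 902.24), (205.82, 1555.02)),
    ],
}


def covers(outer, inner):
    """True when the interval `outer` contains the interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    for name, rows in TABLES.items():
        ok = all(covers(fieller, delta) for delta, fieller in rows)
        print(f"{name}: Fieller covers delta in all rows = {ok}")
```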
6
Conclusion
This paper considers the problem of constructing CS estimates for parameter ratios and proposes
easy-to-compute simultaneous Fieller method-based confidence limits that hold asymptotically
under mild regularity conditions. Even in identifiable econometric models, parameter ratios
involve a possibly discontinuous parameter transformation that leads to the failure of standard
Wald-type based CI estimation methods. We document this problem in discrete choice models,
and show that our alternative Fieller-based CSs perform quite well despite the possibility of weak
6
The estimation method combines the SML and the Geweke-Hajivassiliou-Keane (GHK) choice probability
simulator based on analytically computed scores.
Table 5: Simultaneous confidence sets for values of time as percentage of net personal income,
95% nominal level (continued).

Type of            Confidence   SML MNP, R = 250      SML MNP, R = 250
travel time        set          unconstrained         constrained
In vehicle time    delta        [110.32, 286.98]      [121.00, 307.96]
                   Fieller      [65.89, 457.18]       [76.41, 489.12]
Walking time       delta        [182.09, 413.40]      [192.22, 439.45]
                   Fieller      [143.11, 678.24]      [151.55, 719.44]
Waiting time       delta        [300.35, 879.82]      [310.43, 902.24]
                   Fieller      [189.56, 1512.05]     [205.82, 1555.02]

Note: The delta method-based CIs are not simultaneous. The CSs are obtained from estimating a
MNP model with first-order Generalized Autoregressive errors, assuming several error correlation
structures. The column entitled “MNP i.i.d.” corresponds to the independent probit model with
homoscedastic errors. The column entitled “SML R=50 homoscedastic” corresponds to the
MNP with cross-correlated and homoscedastic errors; SML R=50 or 250 refers to the number of
draws underlying the SML method. The columns entitled “SML R=50 unconstrained” and “SML
R=250 unconstrained” refer to a specification where the specific errors are not constrained
to have an equal variance (heteroscedasticity). The column entitled “SML R=250 constrained”
corresponds to a specification where homoscedasticity is assumed within groups of alternatives (see
Bolduc (1999, p. 75)).
identification. Since the formulas we provide are as simple to calculate as the usual Wald-type
ones, our results indicate that the delta method should be avoided in favour of the Fieller one.
Unbounded CSs occur when the denominator of the ratios at hand is not significantly different
from zero; this might suggest a pre-test sequential inference procedure. Our results show that
the Fieller approach integrates this pre-test within the CS estimation procedure, ensuring
global level control.
Our methods of proof illustrate the usefulness of the geometric approach to projections introduced by Dufour and Taamouti (2005a, 2005b). Indeed, we have exploited the latter approach
to solve the problem of estimating elasticities in commonly used econometric settings. The recent literature on weak-identification (references are provided in the introduction) suggests that
projection techniques such as the one we apply here will be gaining popularity in econometric
practice.
Of course, the results discussed in this paper remain asymptotic beyond the normal regression
framework. The development of exact inference methods in discrete choice models is an appealing
research objective.
Appendix
A
The Fieller-type solution for one parameter ratio
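This appendix characterizes the Fieller-type confidence set for one parameter ratio; see
Scheffé (1970) and Zerbe et al. (1982). We solve the inequality
\[
A\delta_0^2 + 2B\delta_0 + C \le 0,
\]
\[
A = \hat\theta_2^2 - z_{\alpha/2}^2\,\hat v_2, \qquad
B = -\hat\theta_1\hat\theta_2 + z_{\alpha/2}^2\,\hat v_{12}, \qquad
C = \hat\theta_1^2 - z_{\alpha/2}^2\,\hat v_1,
\]
for real solutions δ0. Except for a set of measure zero, A ≠ 0. Similarly, except for a set of
measure zero, ∆ = B² − AC ≠ 0. Real roots
\[
\delta_{01} = \frac{-B - \sqrt{\Delta}}{A}, \qquad \delta_{02} = \frac{-B + \sqrt{\Delta}}{A}
\]
exist if and only if ∆ > 0, so
\[
FCS(\delta; 1-\alpha) =
\begin{cases}
[\delta_{01},\ \delta_{02}] & \text{if } A > 0, \\
]-\infty,\ \delta_{01}] \cup [\delta_{02},\ +\infty[ & \text{if } A < 0.
\end{cases}
\]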
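The case analysis can be sketched in a few lines of code. The function below is an illustrative implementation under our own naming (`fieller_set` and its return convention are not from the paper); it takes the two estimates, their estimated variances and covariance, and a normal critical value. The two roots are sorted explicitly, because dividing by a negative A reverses their order, and the case ∆ < 0 (no real roots, every δ0 accepted) is returned as the whole real line.

```python
import math


def fieller_set(t1, t2, v1, v2, v12=0.0, z=1.959963984540054):
    """Fieller-type confidence set for delta = theta1 / theta2.

    t1, t2 : estimates of the numerator and the denominator
    v1, v2 : their estimated variances; v12 : estimated covariance
    Returns ("bounded", lo, hi)       for [lo, hi],
            ("unbounded", lo, hi)     for ]-inf, lo] U [hi, +inf[,
            ("real_line", None, None) when every delta0 is accepted.
    """
    a = t2**2 - z**2 * v2
    b = -t1 * t2 + z**2 * v12
    c = t1**2 - z**2 * v1
    disc = b**2 - a * c
    if a == 0.0 or disc == 0.0:
        raise ValueError("degenerate (measure-zero) case not handled")
    if disc < 0.0:
        # No real roots: the quadratic is negative everywhere.
        return ("real_line", None, None)
    r1 = (-b - math.sqrt(disc)) / a
    r2 = (-b + math.sqrt(disc)) / a
    lo, hi = min(r1, r2), max(r1, r2)
    return ("bounded", lo, hi) if a > 0.0 else ("unbounded", lo, hi)


if __name__ == "__main__":
    # Hypothetical inputs: a well-determined and a poorly determined denominator.
    print(fieller_set(1.0, 2.0, 0.04, 0.04, z=2.0))
    print(fieller_set(1.0, 0.2, 0.04, 0.04, z=2.0))
    print(fieller_set(0.1, 0.1, 0.04, 0.04, z=2.0))
```

With a denominator that is large relative to its standard error the set is a bounded interval; when the denominator is insignificant the set is the complement of an interval or the whole real line.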
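If ∆ < 0, then A < 0 ⇒ ∀δ0 ∈ R, (Aδ0² + 2Bδ0 + C) < 0 and FCS(δ; 1 − α) = R. Indeed,
\[
\Delta = \left(\hat v_{12}^2 - \hat v_1 \hat v_2\right) z_{\alpha/2}^4
       + \left(\hat\theta_1^2 \hat v_2 + \hat\theta_2^2 \hat v_1 - 2\hat\theta_1\hat\theta_2 \hat v_{12}\right) z_{\alpha/2}^2 .
\]
Since $\hat v_{12}^2 - \hat v_1 \hat v_2 < 0$ (by the Cauchy-Schwarz inequality), ∆ is negative if and only if
\[
z^{*} = \frac{\hat\theta_1^2 \hat v_2 + \hat\theta_2^2 \hat v_1 - 2\hat\theta_1\hat\theta_2 \hat v_{12}}{\hat v_1 \hat v_2 - \hat v_{12}^2} < z_{\alpha/2}^2 . \tag{A.1}
\]
Furthermore,
\[
\frac{\hat\theta_2^2}{\hat v_2} - z^{*} = -\frac{\left(\hat\theta_2 \hat v_{12} - \hat\theta_1 \hat v_2\right)^2}{\hat v_2 \left(\hat v_1 \hat v_2 - \hat v_{12}^2\right)} < 0 .
\]
So ∆ < 0 ⇒ $\hat\theta_2^2/\hat v_2 < z^{*} < z_{\alpha/2}^2$, which establishes that ∆ < 0 ⇒ A < 0.
B
Proof of Theorem 1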
To prove Lemma 1, we need the following result, known as Sylvester’s law of inertia; refer to
Lancaster and Tismenetsky (1985, pp. 184-205) for a proof of this result.
Lemma 1 (Sylvester’s law of inertia) Let Π1 and Π2 be any p × p symmetric matrices of the
same rank r ≤ p. If Π1 = N Π2 N′ for some matrix N, then Π1 and Π2 have the same number
of positive eigenvalues.
Our framework supposes that $\hat\Sigma_\theta$ is symmetric and positive definite. Then, since H [defined
in (4.2)] has full row rank, it follows that $(H\hat\Sigma_\theta H')^{-1}$ is symmetric and positive definite. Similarly,
since S1 [defined in (4.9)] has full row rank, $Q = S_1 (H\hat\Sigma_\theta H')^{-1} S_1'$ is symmetric and positive
definite. Then, there exists a nonsingular matrix P such that P′QP = Is. Using Lemma 1,
the two matrices P′M11P and M11 have the same number of positive eigenvalues and the same
number of negative eigenvalues. Focusing on the matrix P′M11P, we have:
\[
\begin{aligned}
P'M_{11}P &= P'\left(S_1 M S_1'\right)P \\
&= P'S_1\left( c\,(H\hat\Sigma_\theta H')^{-1}
   - \left[(H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]\left[(H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]' \right) S_1'P \\
&= c\,P'S_1 (H\hat\Sigma_\theta H')^{-1} S_1'P
   - P'S_1\left[(H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]\left[(H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]' S_1'P \\
&= c\,I_s - \left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]\left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]' .
\end{aligned}
\]
The last expression shows that P′M11P is a patterned matrix of the type discussed in Graybill
(1983, p. 206). Thus, P′M11P has s − 1 eigenvalues equal to c and one eigenvalue equal to
\[
c - \left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]'\left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right] .
\]
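The eigenvalue count can be verified directly. Writing x for the s × 1 vector $P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta$ (a shorthand introduced here only for this step, and assuming x ≠ 0), the patterned matrix is $cI_s - xx'$, and

```latex
\[
\left(cI_s - xx'\right)x = cx - x\,(x'x) = \left(c - x'x\right)x ,
\]
\[
\left(cI_s - xx'\right)y = cy - x\,(x'y) = cy
\qquad \text{for every } y \text{ with } x'y = 0 .
\]
```

The vectors orthogonal to x span a subspace of dimension s − 1, which accounts for the s − 1 eigenvalues equal to c; the remaining eigenvalue, associated with x itself, is c − x′x.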
Zerbe et al. (1982) show that in fact
\[
c - \left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right]'\left[P'S_1 (H\hat\Sigma_\theta H')^{-1}H\hat\theta\right] = a
\]
where a is defined in (4.13). Except for a set of values for θ̂ of measure zero, c ≠ 0 and a ≠ 0;
so, zero is not an eigenvalue of M11 and M11 is nonsingular. The sign of det(M11) is the same
as the sign of $\det(P'M_{11}P) = a\,c^{s-1}$.
Applying the same arguments for the matrix M, the sign of det(M) is the same as that of
$-c_{s,\alpha}\,c^{s}$. In addition, using the block matrix inversion formula, and since $M_{22} - M_{21}M_{11}^{-1}M_{12}$ is a
scalar, we get:
\[
\begin{aligned}
\det(M) &= \det(M_{11})\,\det\!\left(M_{22} - M_{21}M_{11}^{-1}M_{12}\right) \\
        &= -\det(M_{11})\left(M_{21}M_{11}^{-1}M_{12} - M_{22}\right) \\
        &= -\det(M_{11})\,d .
\end{aligned}
\]
This implies that d has the same sign as $c_{s,\alpha}\,c/a$. So, we have the following results:
If a > 0, then c > 0: all the eigenvalues of M11 are positive, and M11 is positive definite. If
c < 0, then a < 0: all the eigenvalues of M11 are negative, and M11 is negative definite. Clearly,
we have d > 0 in these two cases.
On the other hand, if (c > 0 and a < 0), we have d < 0; then M11 has at least one positive
eigenvalue and at least one negative eigenvalue; thus, M11 is neither positive definite nor negative
definite. Lemma 1 is then proved.
C
Projections for individual ratios
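Here we use two well known results on matrix inversion from Rao (1973, page 33), which we
reproduce for convenience.
Lemma 2 When Π11 and Π22 are symmetric matrices, let
\[
\Pi = \begin{bmatrix} \Pi_{11} & \Pi_{12} \\ \Pi_{12}' & \Pi_{22} \end{bmatrix} .
\]
Then (assuming the inverses in the latter expression exist):
\[
\Pi^{-1} = \begin{bmatrix}
\Pi_{11}^{-1} + \Pi_{11}^{-1}\Pi_{12}\left(\Pi_{22} - \Pi_{12}'\Pi_{11}^{-1}\Pi_{12}\right)^{-1}\Pi_{12}'\Pi_{11}^{-1}
 & -\Pi_{11}^{-1}\Pi_{12}\left(\Pi_{22} - \Pi_{12}'\Pi_{11}^{-1}\Pi_{12}\right)^{-1} \\
-\left(\Pi_{22} - \Pi_{12}'\Pi_{11}^{-1}\Pi_{12}\right)^{-1}\Pi_{12}'\Pi_{11}^{-1}
 & \left(\Pi_{22} - \Pi_{12}'\Pi_{11}^{-1}\Pi_{12}\right)^{-1}
\end{bmatrix} .
\]
Lemma 3 Let Π be a nonsingular matrix and U and V two column vectors. Then
\[
\left(\Pi + UV'\right)^{-1} = \Pi^{-1} - \frac{\left(\Pi^{-1}U\right)\left(V'\Pi^{-1}\right)}{1 + V'\Pi^{-1}U} .
\]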
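Lemma 3 is easy to check numerically. The sketch below applies the formula with plain Python lists to a small 2 × 2 example of our own choosing and compares the result with a direct inverse; the helper names are ours, and the example matrices are arbitrary.

```python
def rank_one_update_inverse(pi_inv, u, v):
    """Return (Pi + u v')^{-1} computed from Pi^{-1} as in Lemma 3."""
    n = len(u)
    pu = [sum(pi_inv[i][k] * u[k] for k in range(n)) for i in range(n)]  # Pi^{-1} u
    vp = [sum(v[k] * pi_inv[k][j] for k in range(n)) for j in range(n)]  # v' Pi^{-1}
    denom = 1.0 + sum(v[k] * pu[k] for k in range(n))                    # 1 + v' Pi^{-1} u
    if denom == 0.0:
        raise ValueError("Pi + u v' is singular")
    return [[pi_inv[i][j] - pu[i] * vp[j] / denom for j in range(n)]
            for i in range(n)]


def inverse_2x2(m):
    """Direct inverse of a 2 x 2 matrix, used only as a cross-check."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


if __name__ == "__main__":
    pi = [[2.0, 0.0], [0.0, 4.0]]
    u, v = [1.0, 1.0], [1.0, 2.0]
    updated = [[pi[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]
    print(rank_one_update_inverse(inverse_2x2(pi), u, v))
    print(inverse_2x2(updated))
```

Both calls should print the same matrix, since the second inverts Π + UV′ directly.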
First observe that our CS limits from Theorem 2 correspond to the solutions of a quadratic
equation where the sum and the product of the roots are:
\[
\tilde S = 2w'f, \qquad \tilde P = (w'f)^2 - d\left(w'M_{11}^{-1}w\right) .
\]
Applying Lemma 3 to the matrix M (defined in (4.6)), we have
\begin{align}
M^{-1} &= \frac{H\hat\Sigma_\theta H'}{c}
        + \frac{H\hat\theta\hat\theta'H'/c}{c - \hat\theta'H'\left(H\hat\Sigma_\theta H'\right)^{-1}H\hat\theta} \tag{C.1} \\
       &= \frac{H\hat\Sigma_\theta H'}{c} - \frac{H\hat\theta\hat\theta'H'/c}{c_{s,\alpha}} \tag{C.2}
\end{align}
using (4.6). Now applying Lemma 2 to the matrix M as partitioned in (4.8), we have
\[
M^{-1} = \begin{bmatrix} M_{11}^{-1} - ff'/d & -f/d \\ -f'/d & -1/d \end{bmatrix}
\]
which implies that
\[
\begin{aligned}
(w', 0)\,M^{-1}\,(w', 0)' &= w'M_{11}^{-1}w - (w'f)^2/d = -\tilde P/d, \\
(0_s', 1)\,M^{-1}\,(w', 0)' &= -w'f/d = -\tilde S/(2d), \\
(0_s', 1)\,M^{-1}\,(0_s', 1)' &= -1/d,
\end{aligned}
\]
so
\[
\tilde P = \frac{(w', 0)\,M^{-1}\,(w', 0)'}{(0_s', 1)\,M^{-1}\,(0_s', 1)'}, \qquad
\tilde S = \frac{2\,(0_s', 1)\,M^{-1}\,(w', 0)'}{(0_s', 1)\,M^{-1}\,(0_s', 1)'} .
\]
Applying the latter expressions to (C.2), and assuming that w is the selection vector with one
at the ith position and zeros elsewhere (i.e. the vector which selects the ratio with numerator
defined by Li), we obtain
\[
\tilde S = \frac{2\left(c_{s,\alpha}\,K'\hat\Sigma_\theta L_i - K'\hat\theta\hat\theta'L_i\right)}{c_{s,\alpha}\,K'\hat\Sigma_\theta K - K'\hat\theta\hat\theta'K} = -\frac{2B_i}{A_i},
\]
\[
\tilde P = \frac{c_{s,\alpha}\,L_i'\hat\Sigma_\theta L_i - L_i'\hat\theta\hat\theta'L_i}{c_{s,\alpha}\,K'\hat\Sigma_\theta K - K'\hat\theta\hat\theta'K} = \frac{C_i}{A_i},
\]
which gives exactly the sum and product of the roots of the quadratic inequality (4.17). This
shows that both solutions are identical.
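For completeness, the step from the sum and product to the limits themselves can be written out. The sketch below assumes that (4.17) has the same form as the single-ratio inequality of Appendix A, that is $A_i\delta^2 + 2B_i\delta + C_i \le 0$; this form is an assumption here, since (4.17) is stated elsewhere in the paper. A quadratic with root sum $\tilde S$ and root product $\tilde P$ is $\delta^2 - \tilde S\,\delta + \tilde P = 0$, so its roots are

```latex
\[
\frac{\tilde S}{2} \pm \sqrt{\frac{\tilde S^{2}}{4} - \tilde P}
= -\frac{B_i}{A_i} \pm \sqrt{\frac{B_i^{2}}{A_i^{2}} - \frac{C_i}{A_i}}
= \frac{-B_i \pm \sqrt{B_i^{2} - A_i C_i}}{A_i},
\]
```

where the last equality holds up to the ordering of the two roots when $A_i < 0$. These are the roots of $A_i\delta^2 + 2B_i\delta + C_i = 0$, in the same form as δ01 and δ02 in Appendix A.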
References
Abdelkhalek, T. and Dufour, J.-M. (1998), ‘Statistical inference for computable general equilibrium models, with application to a model of the Moroccan economy’, Review of Economics
and Statistics LXXX, 520—534.
Ashton, H. (1947), ‘The time element in transportation’, American Economic Review
37(2), 423—440.
Banks, J., Blundell, R. and Lewbel, A. (1997), ‘Quadratic engel curves and consumer demand’,
The Review of Economics and Statistics 79(4), 527—539.
Bates, J. J. (1987), ‘Measuring travel time values with a discrete choice model: A note’, The
Economic Journal 97(386), 493—498.
Ben-Akiva, M., Bolduc, D. and Bradley, M. (1993), ‘Estimation of travel choice models with
randomly distributed values of time’, Transportation Research Records 1413, 88—97.
Ben-Akiva, M., Bolduc, D. and Walker, J. (2001), Specification, identification, and estimation of
the logit kernel (or Continuous Mixed Logit) model, Technical report, MIT working paper.
Ben-Akiva, M. and Lerman, S. R. (1985), Discrete Choice Analysis: Theory And Application to
Travel Demand, The MIT Press, Cambridge, MA.
Bolduc, D. (1992), ‘Generalized autoregressive errors in the multinomial probit model’, Transportation Research Part B 26B(2), 155—170.
Bolduc, D. (1999), ‘A practical technique to estimate multinomial probit models in transportation’, Transportation Research Part B 33, 63—79.
Buonaccorsi, J. P. and Gatsonis, C. A. (1988), ‘Bayesian inference for ratios of coefficients in a
linear model’, Biometrics 44(1), 87—101.
Buonaccorsi, J. P. (1985), ‘Ratios of linear combinations in the general linear model’, Communications in Statistics, Theory and Methods 14, 635—650.
Darby, S. C. (1980), ‘A bayesian approach to parallel-line bioassay’, Biometrika 3, 607—612.
Davidson, R. and MacKinnon, J. G. (1999), ‘Bootstrap testing in nonlinear models’, International Economic Review 40, 487—508.
De Vany, A. (1974), ‘The revealed value of time in air travel’, Review of Economics and Statistics
56(1), 77—82.
Deaton, A. S. and Muellbauer, J. (1980), ‘An almost ideal demand system’, American Economic
Review 70, 312—326.
Dufour, J.-M. (1989), ‘Nonlinear hypotheses, inequality restrictions, and non-nested hypotheses:
Exact simultaneous tests in linear regressions’, Econometrica 57, 335—355.
Dufour, J.-M. (1997), ‘Some impossibility theorems in econometrics, with applications to structural and dynamic models’, Econometrica 65, 1365—1389.
Dufour, J.-M. (2003), ‘Identification, weak instruments and statistical inference in econometrics’,
Canadian Journal of Economics 36(4), 767—808.
Dufour, J.-M. and Jasiak, J. (2001), ‘Finite sample limited information inference methods for
structural equations and models with generated regressors’, International Economic Review
42, 815—843.
Dufour, J.-M. and Taamouti, M. (2005a), ‘Further results on projection-based inference in IV
regressions with weak, collinear or missing instruments’, Journal of Econometrics forthcoming.
Dufour, J.-M. and Taamouti, M. (2005b), ‘Projection-based statistical inference in linear structural models with possibly weak instruments’, Econometrica 73(4), 1351—1365.
Fieller, E. C. (1940), ‘The biological standardization of insulin’, Journal of the Royal Statistical
Society (Supplement) 7(1), 1—64.
Fieller, E. C. (1954), ‘Some problems in interval estimation’, Journal of the Royal Statistical
Society, Series B 16(2), 175—185.
Graybill, Franklin, A. (1983), Matrices with applications in statistics, Belmont, Calif. :
Wadsworth International Group, Belmont.
Kleibergen, F. (2005), ‘Testing parameters in GMM without assuming that they are identified’,
Econometrica 73(4), 1103—1123.
Lancaster, P. and Tismenetsky, M. (1985), The Theory of Matrices, second edition with applications, Academic Press Inc., Orlando, Florida.
McFadden, D. (1989), ‘A method of simulated moments for estimation of discrete response
models without numerical integration’, Econometrica 57, 995—1026.
Miller, Jr., R. G. (1981), Simultaneous Statistical Inference, second edn, Springer-Verlag, New
York.
Moreira, M. J. (2003), ‘A conditional likelihood ratio test for structural models’, Econometrica
71(4), 1027—1048.
Rao, C. R. (1973), Linear Statistical Inference and its Applications, second edn, John Wiley &
Sons, New York.
Savin, N. E. (1984), Multiple hypothesis testing, in Z. Griliches and M. D. Intrilligator, eds,
‘Handbook of Econometrics, Volume 2’, North-Holland, Amsterdam, chapter 14, pp. 827—
879.
Savin, N. E. and Würtz, A. H. (1999), ‘Power of tests in binary response models’, Econometrica
67(2), 413—421.
Savin, N. E. and Würtz, A. H. (2001), Empirical relevant power comparisons for limited dependent variable models, in C. Hsiao, K. Morimune and J. L. Powell, eds, ‘Nonlinear
statistical modeling : proceedings of the thirteenth International Symposium in Economic
Theory and Econometrics : essays in honor of Takeshi Amemiya’, Cambridge University
Press, New York, chapter 2, pp. 47—70.
Scheffé, H. (1953), ‘A method for judging all contrasts in the analysis of variance’, Biometrika
40, 87—104.
Scheffé, H. (1959), The Analysis of Variance, first edn, John Wiley & Sons, New York.
Scheffé, H. (1970), ‘Multiple testing versus multiple estimation in proper confidence sets, estimation of directions and ratios’, Annals of Mathematical Statistics 41, 1—29.
Selwyn, M. R. and Hall, N. R. (1984), ‘On bayesian methods in bioequivalence’, Biometrics
40, 1103—1108.
Stock, J. H., Wright, J. H. and Yogo, M. (2002), ‘A survey of weak instruments and weak identification in generalized method of moments’, Journal of Business and Economic Statistics
20(4), 518—529.
Truong, P. T. and Hensher, D. A. (1985), ‘Measurement of travel time values and opportunities
cost from a discrete choice model’, The Economic Journal 95(378), 439—451.
Walker, J. (2001), Extended discrete choice models: Integrated framework, flexible error structures and latent variables, Technical report, Ph.D thesis, Massachusetts Institute of Technology.
Young, D. A., Zerbe, G. O. and Hay, W. W. (1997), ‘Fieller’s theorem, Scheffé simultaneous
confidence intervals, and ratios of parameters of linear and nonlinear mixed-effects models’,
Biometrics 53(3), 838—847.
Zerbe, G. O. (1978), ‘On Fieller’s theorem and the general linear model’, The American Statistician 32(3), 103—105.
Zerbe, G. O., Laska, E., Meisner, M. and Kushner, H. B. (1982), ‘On multivariate confidence
regions and simultaneous confidence limits for ratios’, Communications in Statistics, Theory
and Methods 11(21), 2401—2425.