Robust kernel estimator for densities of unknown smoothness

First draft. Preliminary and incomplete.
By Yulia Kotlyarova and Victoria Zinde-Walsh∗
Department of Economics,
McGill University and CIREQ
February 26, 2004
Abstract
Results on nonparametric kernel estimators of density differ according to the assumed degree of density smoothness. While most
asymptotic normality results assume that the density function is smooth
there are cases where non-smooth density functions may be of interest.
We provide asymptotic results for kernel estimation of a non-smooth
continuous density and derive a new result: the limit joint distribution of kernel density estimators corresponding to different bandwidths
and kernel functions. Using the results on the joint distribution of kernel density estimators we construct an estimator that optimally combines estimators for different bandwidth/kernel combinations to protect against the negative consequences of errors in assumptions about
order of smoothness. The results of a Monte Carlo experiment confirm
the usefulness of the combined estimator. We demonstrate that while
in the standard normal case the combined estimator has a higher
MSE than the standard kernel estimator, both estimators are
highly accurate. For a non-smooth density, on the other hand, where
the MSE becomes very large, the combined estimator provides uniformly
better results than the standard estimator.
∗The support of the Social Sciences and Humanities Research Council of Canada (SSHRC) and of the Fonds québécois de la recherche sur la société et la culture (FRQSC) is gratefully acknowledged.
Corresponding author:
Victoria Zinde-Walsh
Department of Economics
McGill University,
855 Sherbrooke Street West,
Montreal, Quebec H3A 2T7
Canada
Email: victoria.zinde-walsh@mcgill.ca
1 Introduction
Results on nonparametric kernel estimators of density differ according to
the assumed degree of density smoothness. Continuity of density implies
that kernel estimators are consistent; finite sample biases and variances are
known. While most asymptotic normality results assume that the density
function is smooth there are cases where non-smooth density functions may
be of interest. We provide asymptotic results for kernel estimation of a
non-smooth continuous density and derive a new result: the limit joint distribution of kernel density estimators corresponding to different bandwidths
and kernel functions. Because of the nonparametric rates of convergence,
each such estimator may provide additional information. The examination
of the joint distribution is similar to theorems about the joint distributions of
other estimators that converge at nonparametric rates, such as the smoothed
maximum score estimator (Kotlyarova and Zinde-Walsh, 2003) or the smoothed
least median of squares estimator (Zinde-Walsh, 2002).
If second-order or higher-order derivatives of the density exist, a bandwidth that provides an optimal convergence rate can be found for a kernel
of sufficiently high order. If, however, there is no certainty that the smoothness assumptions hold, under- and especially over-smoothing is likely. Over-smoothing leads to asymptotic bias and makes the estimator concentrate
around the wrong value. Under-smoothing yields a consistent estimator but
increases the mean squared error. If there are no grounds on which to assume
smoothness of the density, the chosen rate for the bandwidth may be in error
and the estimator will suffer from the problems associated with under- or
over-smoothing.
Using the results on the joint distribution of kernel density estimators
we construct an estimator that optimally combines estimators for different
bandwidth/kernel combinations to protect against the negative consequences
of errors in assumptions about order of smoothness. The results of a Monte
Carlo experiment confirm the usefulness of the combined estimator in finite
samples. We demonstrate that while in the standard normal case the combined
estimator has relatively larger MSE than the standard kernel estimator, both
estimators are highly accurate. On the other hand, for a non-smooth density
where the MSE gets very large, the combined estimator provides uniformly
better results than the standard estimator.
The paper is organized as follows. Section 2 provides the definitions,
assumptions and known results for the kernel density estimator. Section
3 provides asymptotic results under weak (only continuity, no smoothness)
assumptions for the kernel density estimator, as well as for the joint limit
assumptions for the kernel density estimator, as well as for the joint limit
process for several estimators. The new combined estimator is defined in
Section 4, where we discuss how to construct it (selection of bandwidths,
smoothing kernels, estimation of the MSE of a linear combination) and evaluate its performance in a Monte Carlo experiment. Appendix A provides the
proofs of the results in Section 3.
2 Definitions, notation, assumptions, known results

Consider a random variable $x$ and the corresponding distribution function $F(x)$. If a density $f(x)$ exists, then $F(x) = \int_{-\infty}^{x} f(t)\,dt$.
Assumption 1.
(a) (xi ), i = 1, ..., n, is a random sample of x.
(b) a density function f (x) exists and is continuous at x.
To estimate the density we utilize kernel functions but do not restrict kernels
to symmetric or nonnegative density functions; as will become clear later, this may
give us some extra flexibility.
Assumption 2.
(a) The kernel smoothing function $\psi$ is a continuously differentiable function;
(b) $\int \psi(w)\,dw = 1$;
(c) The bandwidth parameter $\sigma_n \to 0$ and $n\sigma_n \to \infty$.
The kernel density estimator is defined as
$$\hat f(x) = \frac{1}{n\sigma_n}\sum_{i=1}^{n}\psi\!\left(\frac{x - x_i}{\sigma_n}\right). \qquad (1)$$
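For concreteness, estimator (1) can be sketched in code (illustrative, not part of the formal results). The Gaussian smoothing function is the kernel used later in the Monte Carlo experiment; any $\psi$ satisfying Assumption 2 could be passed in instead.

```python
import numpy as np

def gaussian_psi(w):
    """Gaussian smoothing function; any psi integrating to 1 works in (1)."""
    return np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)

def kernel_density(x, sample, sigma, psi=gaussian_psi):
    """Kernel density estimator (1): (1/(n*sigma)) * sum_i psi((x - x_i)/sigma)."""
    sample = np.asarray(sample, dtype=float)
    return psi((x - sample) / sigma).sum() / (len(sample) * sigma)

# Illustration on a simulated standard normal sample
rng = np.random.default_rng(0)
xs = rng.standard_normal(2000)
est = kernel_density(0.0, xs, sigma=0.3)   # should be near phi(0)
```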
We review here some of the known results; up to differences in notation
we follow the summary in Pagan and Ullah (1999). Our main aim is to
emphasize the dependence of various results on the smoothness properties of the
density function and to highlight results that do not require more than our
Assumptions 1 and 2.
The expressions for bias and variance in Theorem 2.1 of Pagan and Ullah
hold (in our notation):
$$\mathrm{Bias}(\hat f) = \int \psi(w)\left[f(\sigma w + x) - f(x)\right]dw; \qquad (2)$$
$$\mathrm{Var}(\hat f) = (n\sigma)^{-1}\int \psi(w)^{2} f(\sigma w + x)\,dw - n^{-1}\left[\int \psi(w) f(\sigma w + x)\,dw\right]^{2}. \qquad (3)$$
Theorem 2.5 holds and implies that $\mathrm{MSE}(\hat f) \to 0$. If additionally $\int |f(x)|\,dx < \infty$, the kernel estimator is uniformly asymptotically unbiased (Theorem 2.4),
and with $n\sigma(\log n)^{-1} \to \infty$ it is strongly consistent (Theorem 2.7). With
uniform continuity of the density (plus assumptions A12, A14) Theorem 2.8 establishes weak uniform consistency of the estimator. Theorem 2.9 holds
under Assumptions 1 and 2 and provides
$$(n\sigma)^{\frac12}\left(\hat f - E\hat f\right) \to_d N\!\left(0,\; f(x)\int \psi(w)^{2}\,dw\right). \qquad (4)$$
For the asymptotic normality result in Theorem 2.10 the existence of
continuous second-order derivatives of the density function is assumed; then
the sharp rate of bandwidth $\sigma = cn^{-1/5}$ is optimal for a second-order kernel,
and the problem of efficient estimation is to find an appropriate $c$. If higher-order derivatives of the density exist, further improvements in efficiency can be
obtained by using a higher-order kernel to reduce the bias. For a non-smooth
density precise rate results are not available, but one can still establish asymptotic normality given a suitable rate for the bandwidth; we provide that in
the next section.
3 Asymptotic results for the kernel density estimator

3.1 Asymptotic results for the density estimator when density is continuous
Without differentiability of the density the sharp conditions on the optimal
rate of the estimator do not hold. To express the conditions under which we
can state asymptotic results we define
$$A(\psi, \sigma_n, x) = \int \left[f(\sigma_n w + x) - f(x)\right]\psi(w)\,dw; \qquad (5)$$
this is of course just the expression for the bias of the kernel density estimator.
Under Assumption 1(b), $A(\psi, \sigma_n, x)$ converges to 0. Under more stringent
differentiability assumptions a sharp rate for $A(\psi, \sigma_n, x)$ could be determined,
but we do not make such assumptions.
Theorem 1. Under Assumptions 1-2, if $\sigma_n$ is such that as $n \to \infty$
(a) $n^{1/2}\sigma_n^{1/2} A(\psi, \sigma_n, x) \to 0$, then
$$n^{1/2}\sigma_n^{1/2}\left(\hat f(x) - f(x)\right) \to_d N\!\left(0,\; f(x)\int \psi(w)^{2}\,dw\right);$$
(b) $n^{1/2}\sigma_n^{1/2} A(\psi, \sigma_n, x) \to A(\psi)$, where $0 < |A(\psi)| < \infty$, then
$$n^{1/2}\sigma_n^{1/2}\left(\hat f(x) - f(x)\right) \to_d N\!\left(A(\psi),\; f(x)\int \psi(w)^{2}\,dw\right);$$
(c) $n^{1/2}\sigma_n^{1/2} |A(\psi, \sigma_n, x)| \to \infty$, then
$$|A(\psi, \sigma_n, x)|^{-1}\left[\hat f(x) - f(x) - A(\psi, \sigma_n, x)\right] = o_p(1).$$
Proof in Appendix A.
Thus for case (a) (undersmoothing) we obtain a limit normal distribution,
while for (b) and (c) the estimator is asymptotically biased. Without
assumptions about the degree of smoothness of the density, all that is known is
that for some rate of $\sigma_n \to 0$ there is undersmoothing (no asymptotic bias and
a limiting Gaussian distribution) and for some slower convergence rate of $\sigma_n$
there is oversmoothing. Existence of an optimal rate depends on convergence
properties of $A(\psi, \sigma_n, x)$ that cannot be asserted without strengthening the
assumptions.
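The behaviour of $A(\psi, \sigma_n, x)$ is easy to examine numerically. As an illustration (under assumptions deliberately not made in this section: a standard normal $f$ and a Gaussian $\psi$, evaluated at $x = 0$), $A$ shrinks at the $O(\sigma^2)$ rate appropriate for a twice-differentiable density:

```python
import numpy as np

def bias_term(sigma, x=0.0):
    """Numerically evaluate A(psi, sigma, x) = int [f(sigma*w + x) - f(x)] psi(w) dw
    for standard normal f and Gaussian psi (an illustrative smooth case)."""
    phi = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
    w = np.linspace(-8.0, 8.0, 16001)
    integrand = (phi(sigma * w + x) - phi(x)) * phi(w)
    return np.sum(integrand) * (w[1] - w[0])

# For this smooth density, halving sigma shrinks |A| roughly fourfold (O(sigma^2)).
ratio = bias_term(0.2) / bias_term(0.1)
```

For a density that is merely continuous, no such rate for $A$ is available, which is precisely the difficulty discussed above.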
3.2 The joint limit process for density estimators
Let $\hat f(x, (\sigma_n, \psi))$ denote the estimator when the function $\psi$ and
bandwidth $\sigma_n$ are utilized. Consider a number of bandwidths $\sigma_n$: $\sigma_{n1} <
\sigma_{n2} < \dots < \sigma_{nm}$. Assume that $\sigma_{ni}$ for $i \le m'$ corresponds to undersmoothing
(part (a) of Theorem 1), while $\sigma_{ni}$ for $i$ such that $m' < m'' < i \le m$ corresponds to oversmoothing (part (c) of Theorem 1). If an optimal rate exists
(e.g., for an $h$ times continuously differentiable density and an $h$-order
kernel the optimal rate is $O(n^{-\frac{1}{2h+1}})$; see, e.g., Pagan and Ullah, p. 30), then
one could have $m'' > m' + 1$, with $\sigma_{ni}$ for $i = m'+1, \dots, m''$ corresponding to
the optimal rate.
We combine each $\sigma_{ni}$ with each smoothing function $\psi_j$ from some set
of functions that satisfy Assumption 2, $j = 1, \dots, l$. Consider $A(\psi_j, \sigma_i, x)$.
Define
$$\eta(\sigma_i, \psi_j) = \begin{cases} n^{1/2}\sigma_i^{1/2}\left(\hat f(x) - f(x)\right) & \text{for } i = 1, \dots, m', \\ n^{1/2}\sigma_i^{1/2}\left(\left(\hat f(x) - f(x)\right) - A(\psi_j, \sigma_i, x)\right) & \text{for } i = m'+1, \dots, m'', \\ \left|A(\psi_j, \sigma_i, x)\right|^{-1}\left[\hat f(x) - f(x) - A(\psi_j, \sigma_i, x)\right] & \text{for } i = m''+1, \dots, m. \end{cases}$$
Let $\tau_{\psi_i\psi_j} \equiv \int \psi_i(w)\,\psi_j(w)\,dw$ and $\delta_{\psi_i} \equiv \tau_{\psi_i\psi_i}^{1/2}$.
Theorem 2. Suppose that Assumptions 1, 2 hold for each bandwidth
$\sigma_i$, $1 \le i \le m$, and for each $\psi_j$, $1 \le j \le l$, and that the functions $\{\psi_j\}_{j=1}^{l}$
form a linearly independent set.¹
(a) If each $\sigma_1, \dots, \sigma_{m'}$ ($m' \le m$) satisfies condition (a) of Theorem 1, then
$$\left(\eta(\sigma_1, \psi_1), \dots, \eta(\sigma_1, \psi_l), \dots, \eta(\sigma_{m'}, \psi_1), \dots, \eta(\sigma_{m'}, \psi_l)\right)' \to_d N(0, \Psi f(x)),$$
where the $lm' \times lm'$ matrix $\Psi$ has elements
$$\{\Psi\}_{ij} = \begin{cases} \tau_{\psi_i\psi_j} & \text{if } \sigma_i = \sigma_j, \\ \sqrt{d}\int \psi_i(w)\,\psi_j(dw)\,dw & \text{if } \sigma_i/\sigma_j = d < \infty, \\ 0 & \text{if } \sigma_i/\sigma_j \to 0 \text{ or } \sigma_i/\sigma_j \to \infty; \end{cases}$$
(b) If each $\sigma_{m'+1}, \dots, \sigma_{m''}$ ($m' \le m'' \le m$) satisfies condition (b) of Theorem 1, then
$$\left(\eta(\sigma_{m'+1}, \psi_1), \dots, \eta(\sigma_{m'+1}, \psi_l), \dots, \eta(\sigma_{m''}, \psi_1), \dots, \eta(\sigma_{m''}, \psi_l)\right)' \to_d N(0, \Psi f(x)),$$
where the $l(m''-m') \times l(m''-m')$ matrix $\Psi$ has elements $\{\Psi\}_{ij} = \sqrt{d}\int \psi_i(w)\,\psi_j(dw)\,dw$,
with $\sigma_i/\sigma_j = d < \infty$;
(c) If each $\sigma_{m''+1}, \dots, \sigma_m$ ($m'' \le m$) satisfies condition (c) of Theorem 1,
then
$$\left(\eta(\sigma_{m''+1}, \psi_1), \dots, \eta(\sigma_{m''+1}, \psi_l), \dots, \eta(\sigma_m, \psi_1), \dots, \eta(\sigma_m, \psi_l)\right)' \to_p 0;$$
(d) $\mathrm{Cov}\left(\eta(\sigma_{i_1}, \psi_{j_1}), \eta(\sigma_{i_2}, \psi_{j_2})\right) \to 0$ for $1 \le i_1 \le m''$ and $m'' + 1 \le i_2 \le m$, and any $j_1, j_2$.

¹If some linear combination of the smoothing kernels $\psi_j$ is zero, the joint distribution is degenerate.
Thus, if the bandwidths approach 0 at different rates or $\int \psi_i(w)\psi_j(w)\,dw = 0$,
the corresponding estimators $\hat f(x, (\sigma, \psi))$ are asymptotically independent.
This is a consequence of the fact that only a small fraction of observations
has any effect on the estimator; reweighting observations with different kernel functions can therefore produce estimators with independent limit processes.
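The covariance structure in Theorem 2 is easy to evaluate numerically for concrete kernels. A sketch for the Gaussian kernel (an illustrative choice): the entry $\sqrt{d}\int \psi_i(w)\psi_j(dw)\,dw$ reduces to $\tau_{\psi_i\psi_j}$ when $d = 1$ and decays toward 0 as the bandwidth ratio $d$ grows, matching the asymptotic-independence statement above.

```python
import numpy as np

phi = lambda w: np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
w = np.linspace(-10.0, 10.0, 8001)
dw = w[1] - w[0]

def psi_entry(d):
    """Entry of Psi for bandwidth ratio d: sqrt(d) * int psi_i(w) psi_j(d*w) dw
    (both kernels Gaussian here)."""
    return np.sqrt(d) * np.sum(phi(w) * phi(d * w)) * dw

tau = np.sum(phi(w) ** 2) * dw   # tau_{psi psi} = 1 / (2 sqrt(pi)) for the Gaussian
```

For two Gaussian kernels the entry has the closed form $\sqrt{d}/\sqrt{2\pi(1+d^2)}$, which equals $1/(2\sqrt{\pi}) \approx 0.2821$ at $d = 1$ and vanishes as $d \to \infty$.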
3.3 Extension to multivariate density estimation
Results similar to Theorems 1 and 2 can be obtained for kernel estimators
of multivariate density functions.
TO BE COMPLETED
4 The combined estimator
As the results in Section 3 show, finding an optimal rate for a density estimator
may be problematic. Here we use the results of Theorem 2 to construct
a new combined estimator that optimally combines several bandwidths and
smoothing functions in the sample instead of focusing on a single bandwidth.
Although efficiency may suffer in straightforward cases when an optimal rate
can be found, the Monte Carlo experiments show that the combined estimator
provides remarkably robust performance over a variety of cases. Section 4.1
defines the combined estimator. Section 4.2 addresses practical issues of
construction of the combined estimator. Section 4.3 discusses performance
in a Monte Carlo experiment.
4.1 Definition of the combined estimator
Suppose that bandwidths $\sigma_{n1} < \sigma_{n2} < \dots < \sigma_{nm}$ represent sequences of
rates, where $\sigma_{n1}$ corresponds to undersmoothing and $\sigma_{nm}$ to oversmoothing;
some optimal rate may or may not exist. For a set of smoothing functions
$\psi_1, \dots, \psi_l$, Theorem 2 indicates the structure of the joint limit distribution of
the $\hat f(\sigma_{ni}, \psi_j)$.
Consider a linear combination $\hat f(\{a_{ij}\}) = \sum a_{ij}\,\hat f(\sigma_{ni}, \psi_j)$ with $\sum a_{ij} = 1$. Assume that the biases, variances and covariances of all the $\hat f(\sigma_{ni}, \psi_j)$ are known.
Then one could find weights $\{a_{ij}\}$ that minimize the mean squared error
$\mathrm{MSE}(\hat f(\{a_{ij}\}))$. Each individual $\hat f(\sigma_{ni}, \psi_j)$ is included, thus the minimized
MSE cannot be above the MSE for any individual $(\sigma_{ni}, \psi_j)$.
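The minimization over $\{a_{ij}\}$ subject to $\sum a_{ij} = 1$ is a quadratic program with a closed-form solution: stacking the estimators into a vector with bias vector $b$ and covariance matrix $C$, and writing $M = bb' + C$, the optimal weights are $a = M^{-1}\iota/(\iota' M^{-1}\iota)$, where $\iota$ is a vector of ones. A sketch (the toy bias and covariance numbers are illustrative, not from the paper):

```python
import numpy as np

def combined_weights(bias, cov):
    """Weights minimizing a'(b b' + C) a subject to sum(a) = 1:
    a = M^{-1} 1 / (1' M^{-1} 1) with M = b b' + C."""
    bias = np.asarray(bias, dtype=float)
    M = np.outer(bias, bias) + np.asarray(cov, dtype=float)
    a = np.linalg.solve(M, np.ones(len(bias)))
    return a / a.sum()

# Toy example: an unbiased but noisy estimator vs a slightly biased, stable one
a = combined_weights([0.0, 0.05], np.diag([0.04, 0.01]))
```

The solution puts more weight on the stable estimator as long as its bias is small relative to the noise of the unbiased one.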
To determine the weights in practice we need to estimate the biases and
covariances of all the $\hat f(\sigma_i, \psi_j)$; denote the estimated biases and covariances by "hats". Then
$$\widehat{\mathrm{MSE}}(\hat f(\{a_{ij}\})) = \sum a_{i_1 j_1} a_{i_2 j_2}\left\{\widehat{\mathrm{bias}}(\hat f(\sigma_{i_1}, \psi_{j_1}))\,\widehat{\mathrm{bias}}(\hat f(\sigma_{i_2}, \psi_{j_2})) + \widehat{\mathrm{Cov}}(\hat f(\sigma_{i_1}, \psi_{j_1}), \hat f(\sigma_{i_2}, \psi_{j_2}))\right\},$$
and the combined estimator is
$$\hat f_c = \hat f(\{\hat a_{ij}\}), \quad \text{where } \{\hat a_{ij}\} = \arg\min \widehat{\mathrm{MSE}}(\hat f(\{a_{ij}\})). \qquad (6)$$

4.2 Construction of the combined estimator

4.2.1 Estimation of variances and biases
Consistent estimators of the biases and covariances can be obtained by various procedures, e.g. by bootstrap. Note that for $i = 1$ and $j = 1, \dots, l$ all the
$\hat f(\sigma_{n1}, \psi_j)$ are "undersmoothed" and thus asymptotically unbiased: $E\hat f(\sigma_{n1}, \psi_j) = f$; we can write
$$\mathrm{bias}(\hat f(\sigma_{ni}, \psi_j)) = E(\hat f(\sigma_{ni}, \psi_j)) - E(\hat f(\sigma_{n1}, \psi_j)).$$
Then by bootstrap
$$\widehat{\mathrm{bias}}(\hat f(\sigma_{ni}, \psi_j)) = B^{-1}\sum_{s=1}^{B}\hat f_s(\sigma_{ni}, \psi_j) - l^{-1}B^{-1}\sum_{j=1}^{l}\sum_{s=1}^{B}\hat f_s(\sigma_{n1}, \psi_j),$$
where $B$ is the number of bootstrap samples.
Analogously to Hall's (1992) bootstrap estimator of the variance we construct
an estimator of the covariance,
$$\widehat{\mathrm{Cov}}(\hat f(\sigma_{i_1}, \psi_{j_1}), \hat f(\sigma_{i_2}, \psi_{j_2})) = B^{-1}\sum_{s=1}^{B}\left(\hat f_s(\sigma_{i_1}, \psi_{j_1}) - B^{-1}\sum_{s}\hat f_s(\sigma_{i_1}, \psi_{j_1})\right)\left(\hat f_s(\sigma_{i_2}, \psi_{j_2}) - B^{-1}\sum_{s}\hat f_s(\sigma_{i_2}, \psi_{j_2})\right).$$
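The two bootstrap formulas can be sketched together. The helper below resamples the data $B$ times and, at a fixed evaluation point, applies the bias rule (smallest bandwidth taken as the unbiased benchmark) and a Hall-type covariance; the single Gaussian kernel and the evaluation point $x = 0$ are simplifying assumptions, not the authors' code.

```python
import numpy as np

def bootstrap_bias_cov(sample, sigmas, x=0.0, B=200, seed=0):
    """Bootstrap bias and covariance estimates for kernel density estimators
    at one point, for several bandwidths (sigmas sorted ascending) and a
    single Gaussian kernel."""
    rng = np.random.default_rng(seed)
    phi = lambda w: np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
    fhat = lambda xs, s: phi((x - xs) / s).mean() / s
    est = np.array([[fhat(rng.choice(sample, size=len(sample)), s)
                     for s in sigmas] for _ in range(B)])
    means = est.mean(axis=0)
    bias = means - means[0]        # smallest bandwidth treated as unbiased
    cov = np.cov(est, rowvar=False)
    return bias, cov

rng = np.random.default_rng(1)
bias, cov = bootstrap_bias_cov(rng.standard_normal(500), [0.1, 0.3, 0.8])
```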
In our Monte Carlo experiment we used less computationally intensive estimators. We estimate the variances using the asymptotic formula
$$\widehat{\mathrm{Var}}(\psi, \sigma) = \frac{\hat f(x)\int \psi(w)^{2}\,dw}{n\sigma}$$
and then approximate the covariances by
$$\mathrm{Cov}(\hat f(\psi_i, \sigma_j), \hat f(\psi_s, \sigma_t)) \approx \frac{\sqrt{d_{jt}\,\mathrm{Var}(\psi_i, \sigma_j)\,\mathrm{Var}(\psi_s, \sigma_t)}}{\delta_{\psi_i}\delta_{\psi_s}}\int \psi_i(w)\,\psi_s(d_{jt}w)\,dw,$$
where $d_{jt} = \sigma_j/\sigma_t$. This result follows from Theorem 2.
The estimators with the smallest bandwidth (undersmoothing) are asymptotically unbiased. To find the individual biases, we subtract the average of the
estimators with the smallest bandwidth from the actual estimators: $\widehat{\mathrm{bias}}(\hat f(\psi, \sigma_j)) = \hat f(\psi, \sigma_j) - \bar{\hat f}(\cdot, \sigma_1)$.
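Under my reading of the (garbled in this draft) display above, with $\delta_\psi = \tau_{\psi\psi}^{1/2}$, the plug-in approximations look as follows for a Gaussian kernel (an illustrative choice). A useful sanity check is that the covariance formula collapses to the variance when $\sigma_j = \sigma_t$ and $\psi_i = \psi_s$.

```python
import numpy as np

TAU = 1 / (2 * np.sqrt(np.pi))     # int psi(w)^2 dw for the Gaussian kernel

def asym_var(f_hat_x, n, sigma):
    """Asymptotic variance: f_hat(x) * int psi^2 / (n * sigma)."""
    return f_hat_x * TAU / (n * sigma)

def asym_cov(f_hat_x, n, sigma_j, sigma_t):
    """Covariance approximation sqrt(d * Var_j * Var_t) / (delta_i * delta_s)
    * int psi(w) psi(d*w) dw, with d = sigma_j / sigma_t and delta = sqrt(TAU)."""
    d = sigma_j / sigma_t
    cross = 1 / np.sqrt(2 * np.pi * (1 + d ** 2))   # closed form for Gaussians
    vj = asym_var(f_hat_x, n, sigma_j)
    vt = asym_var(f_hat_x, n, sigma_t)
    return np.sqrt(d * vj * vt) / TAU * cross

v = asym_var(0.4, 2000, 0.3)
c_same = asym_cov(0.4, 2000, 0.3, 0.3)   # collapses to the variance
```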
4.2.2 Selection of functions and bandwidths

In our Monte Carlo experiment we use the Gaussian kernel $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, which
corresponds to $h = 2$. We could utilize even lower-order kernels, since if the
density is not differentiable there is nothing to be gained from using kernels
of order 2; on the other hand, if the density satisfies a Lipschitz condition, a
second-order kernel does provide an advantage (this follows from the fact
that then $A(\psi, \sigma, x) = o(\sigma)$).
The largest bandwidth in the set is a standard "optimal" bandwidth.
Following Pagan and Ullah (1999) we set it to $0.79Rn^{-1/5}$, where $R$
denotes the interquartile range. If this bandwidth belongs to a truly optimal
function/bandwidth combination, then as the sample size increases it should yield
the fastest convergence rate; otherwise, it will correspond to oversmoothing.
The smallest bandwidth represents undersmoothing. It is chosen on the
basis of the empirical distribution of $|x_i - x|$: we find the quantile level $\alpha$ at
which this distribution equals the largest bandwidth, and obtain the other
bandwidths as the $m_i\alpha$-th quantiles of $|x_i - x|$, $i = 1, \dots, m$.
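A sketch of this selection rule (the grid of multipliers $m_i$ below is illustrative; the paper does not list the values it used):

```python
import numpy as np

def bandwidth_grid(sample, x, fractions=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Bandwidth grid: the largest bandwidth is the standard rule
    0.79 * R * n^(-1/5) with R the interquartile range; the others are
    quantiles of |x_i - x| at levels m_i * alpha, where the alpha-quantile
    of |x_i - x| matches the largest bandwidth. The fractions m_i are
    illustrative, not the paper's values."""
    sample = np.asarray(sample, dtype=float)
    q75, q25 = np.percentile(sample, [75, 25])
    h_max = 0.79 * (q75 - q25) * len(sample) ** (-0.2)
    dist = np.abs(sample - x)
    alpha = (dist <= h_max).mean()          # empirical level matching h_max
    return [float(np.quantile(dist, f * alpha)) for f in fractions]

rng = np.random.default_rng(2)
bws = bandwidth_grid(rng.standard_normal(2000), 0.0)
```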
4.2.3 Estimation procedure for the combined estimator
The entire procedure for a combined estimator includes the following steps:
(i) find the density estimator using a standard bandwidth; (ii) construct the
empirical distribution of |xi − x| and determine m bandwidths; (iii) find the
density estimators for all smoothing functions and bandwidths; (iv) estimate
the biases and the covariance matrix; and (v) find the optimal weights for
the linear combination and compute (6).
4.3 Performance of the combined estimator
If an optimal function/bandwidth pair is included among the $(\sigma_{ni}, \psi_j)$ and
all the biases and variances are consistently estimated, the true MSE of
the combined estimator will at worst approach the MSE of the
optimal estimator; it will eventually be smaller when the "optimal" procedure
actually selects an inappropriately large $\sigma$. Our Monte Carlo study provides
finite-sample confirmation of these relations.
4.3.1 DGP and estimation of the density and combined estimator
We consider three different density functions.
For the first model we use the standard normal distribution: $f_1(x) = \phi(x)$. This density is infinitely differentiable and very smooth; thus, the density
estimator evaluated at the "optimal" bandwidth should be a good choice.
The properties of kernel density estimators in this case are well established,
and it is important to see how the combined estimator performs.
In the second model we consider the normal mean mixture $f_2(x) = 0.5\phi(x + 1.5) + 0.5\phi(x - 1.5)$ (see Sheather and Jones, 1991). This density is also infinitely differentiable; however, it is bimodal and more wiggly
than the standard normal density.
The third model contains a non-smooth density that satisfies the Lipschitz condition everywhere except at $x = -2$. Thus, we expect the combined
estimator to perform well outside a small neighbourhood of $-2$, where the
density does not satisfy Assumption 1.
$$f_3(x) = \begin{cases} 5.25 - 5x & \text{if } x \in [0.95, 1.05], \\ 0.5 & \text{if } x \in [0, 0.95), \\ 0.5 + 5x & \text{if } x \in [-0.1, 0), \\ -\frac{1}{38} - \frac{10}{38}x & \text{if } x \in [-2, -0.1), \\ 0 & \text{otherwise.} \end{cases}$$
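The piecewise-linear segments of $f_3$ integrate exactly to one, which is a useful check on the coefficients $-\frac{1}{38}$ and $-\frac{10}{38}$ on $[-2, -0.1)$; the density can be coded and verified directly:

```python
import numpy as np

def f3(x):
    """Piecewise-linear density of the third model: Lipschitz everywhere
    except the jump at x = -2."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    out = np.where((x >= 0.95) & (x <= 1.05), 5.25 - 5 * x, out)
    out = np.where((x >= 0) & (x < 0.95), 0.5, out)
    out = np.where((x >= -0.1) & (x < 0), 0.5 + 5 * x, out)
    out = np.where((x >= -2) & (x < -0.1), -1 / 38 - (10 / 38) * x, out)
    return out

grid = np.linspace(-3.0, 2.0, 500001)
total_mass = np.sum(f3(grid)) * (grid[1] - grid[0])   # should be close to 1
```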
The sample size used in the experiments is $N = 2000$, with 5000 replications per model. We work with five bandwidths and one
smoothing kernel, the standard normal density. The combined estimator is
constructed as described in Section 4.2.
4.3.2 Summary of the results
The results are presented in Fig. 1-3. For each model, we plot the MSE of a
simple estimator with the standard bandwidth and the MSE of the combined
estimator over the range of values of $x$ with increments $\Delta x = 0.05$. In Fig. 1
(the standard normal density) the simple estimator performs uniformly better
than the combined one. On average, the MSE of the combined estimator is four
times larger than the MSE of the standard estimator. Both MSEs, however,
are small in absolute terms.
In Fig. 2 (the mixed normal density) we observe that the MSE of the standard estimator has a very definite oscillating pattern. The peaks correspond
to the points of local extrema of the density function. The smallest values
of the MSE are achieved on the segments of the density function that can
be well approximated by a straight line. The combined estimator, on the
contrary, has a relatively stable MSE, positioned between the two
extremes of the standard estimator's MSE.
In Fig. 3 (the non-smooth density) the combined estimator clearly dominates the standard estimator in precision. For both estimators, the points
where the density is not smooth ($x = -2, -0.1, 0, 0.95, 1.05$) cause substantial increases in the MSE, but non-smoothness affects the standard estimator
over larger intervals. At $x = -2$ the density is discontinuous; therefore the
combined estimator is not expected to perform well there either.
The Monte Carlo experiments demonstrate that in the absence of information about the smoothness of the density the combined estimator provides
more reliable results than the standard "optimal" estimator. When using the
combined estimator, we may lose some efficiency in cases of smooth densities; since such densities are usually estimated very precisely, the difference
in the MSEs of the two estimators is not large in absolute terms. At the same
time, when the density is not smooth, the standard estimator can be seriously biased and the improvement offered by the combined estimator is very
significant.
Appendix A. Proofs of the Theorems

Proof of Theorem 1. From definition (5) we have
$$(n\sigma)^{\frac12}\left[\hat f(x) - f(x)\right] - (n\sigma)^{\frac12}\left[\hat f(x) - E\hat f(x)\right] = (n\sigma)^{\frac12}A(\psi, \sigma_n, x).$$
Under condition (a) of the Theorem the right-hand side is $o(1)$; thus using (4)
proves (a).
From (b),
$$(n\sigma)^{\frac12}\left[\hat f(x) - f(x)\right] - (n\sigma)^{\frac12}\left[\hat f(x) - E\hat f(x)\right] - A(\psi) = o(1),$$
and (b) follows from (4).
For (c), note that $\hat f(x) - f(x) - A(\psi, \sigma_n, x) = \hat f(x) - E\hat f(x)$, so from (4)
$$|A(\psi, \sigma_n, x)|^{-1}\left[\hat f(x) - f(x) - A(\psi, \sigma_n, x)\right] = \left|(n\sigma)^{\frac12}A(\psi, \sigma_n, x)\right|^{-1}(n\sigma)^{\frac12}\left[\hat f(x) - E\hat f(x)\right] = o_p(1),$$
and (c) obtains since $(n\sigma)^{\frac12}|A(\psi, \sigma_n, x)| \to \infty$. $\blacksquare$
Proof of Theorem 2.
To prove Theorem 2, all that is required in addition to the results of
Theorem 1 is to consider the covariances between the $\eta(\sigma, \psi)$.
For (a) recall that $E\eta(\sigma_i, \psi_j) \to 0$, and
$$E\left(\eta(\sigma_{i_1}, \psi_{j_1})\eta(\sigma_{i_2}, \psi_{j_2})\right) = n(\sigma_{i_1}\sigma_{i_2})^{\frac12}E\left[\left(\frac{1}{n\sigma_{i_1}}\sum \psi_{j_1}\!\left(\frac{x_i - x}{\sigma_{i_1}}\right) - f(x)\right)\left(\frac{1}{n\sigma_{i_2}}\sum \psi_{j_2}\!\left(\frac{x_i - x}{\sigma_{i_2}}\right) - f(x)\right)\right]$$
$$= n(\sigma_{i_1}\sigma_{i_2})^{\frac12}\frac{1}{n^2\sigma_{i_1}\sigma_{i_2}}\sum E\,\psi_{j_1}\!\left(\frac{x_i - x}{\sigma_{i_1}}\right)\psi_{j_2}\!\left(\frac{x_i - x}{\sigma_{i_2}}\right)$$
$$\quad + n(\sigma_{i_1}\sigma_{i_2})^{\frac12}\left[\frac{1}{n^2\sigma_{i_1}\sigma_{i_2}}\sum_{l \ne m} E\,\psi_{j_1}\!\left(\frac{x_l - x}{\sigma_{i_1}}\right)E\,\psi_{j_2}\!\left(\frac{x_m - x}{\sigma_{i_2}}\right)\right.$$
$$\quad \left. - f(x)\frac{1}{n\sigma_{i_1}}\sum E\,\psi_{j_1}\!\left(\frac{x_l - x}{\sigma_{i_1}}\right) - f(x)\frac{1}{n\sigma_{i_2}}\sum E\,\psi_{j_2}\!\left(\frac{x_l - x}{\sigma_{i_2}}\right) + f(x)^2\right]$$
$$= (\sigma_{i_1}\sigma_{i_2})^{-\frac12}\int \psi_{j_1}\!\left(\frac{w - x}{\sigma_{i_1}}\right)\psi_{j_2}\!\left(\frac{w - x}{\sigma_{i_2}}\right)f(w)\,dw + o(1),$$
where the last equality follows from the fact that $\sigma_i^{-1}E\,\psi_j\!\left(\frac{x_l - x}{\sigma_i}\right) - f(x) = o(1)$
for (a). Finally, substituting $\sigma_{i_2}/\sigma_{i_1} = d$ and $v = \frac{w - x}{\sigma_{i_2}}$ in the last line,
$$E\left(\eta(\sigma_{i_1}, \psi_{j_1})\eta(\sigma_{i_2}, \psi_{j_2})\right) = d^{1/2}\int \psi_{j_1}(dv)\,\psi_{j_2}(v)\,dv \cdot f(x) + o(1).$$
Part (a) follows.
Part (b) is obtained similarly by noting that condition (b) implies that
$0 < \sigma_{i_2}/\sigma_{i_1} = d < \infty$ when $m' < i_1, i_2 \le m''$. Part (c) follows from (c) of
Theorem 1. For (d) the covariances are zero in the limit because $\sigma_{i_1}/\sigma_{i_2} \to 0$. $\blacksquare$
5 References

[1] Kotlyarova, Y. and V. Zinde-Walsh (2003) Improving the Efficiency and Robustness of the Smoothed Maximum Score Estimator, McGill University and CIREQ working paper.
[2] Pagan, A. and A. Ullah (1999) Nonparametric Econometrics, Cambridge
University Press.
[3] Sheather, S. and M. Jones (1991) A Reliable Data-Based Bandwidth
Selection Method for Kernel Density Estimation, Journal of the Royal
Statistical Society, Series B, 53, 683-690.
[4] Zinde-Walsh, V. (2002) Asymptotic Theory for some High Breakdown
Point Estimators. Econometric Theory 18, 1172-1196.