A new computational procedure for treating ANOVA panel data models with grouped or missed observations and log-concave errors

Carmen Anido (U. Autónoma de Madrid), Carlos Rivero and Teofilo Valdes* (U. Complutense de Madrid, Spain)
* Corresponding author. E-mail: teofilo_valdes@mat.ucm.es.
Summary. We present a new computational procedure suitable for treating ANOVA panel
data models with errors within the general class of log-concave distributions when some
observations are either given in grouped form or missing.
Keywords: Panel data, estimation algorithms, ANOVA with grouped data, asymptotics.
1 Introduction
The procedure presented here focuses on the ANOVA panel data model
yit = μ + αi + βt + ηit, (i=1,…,N; t=1,…,T). The sub-indices i and t represent individuals and time periods, respectively, yit denotes the datum of individual i at time t,
and the ηit´s are random errors. The remaining terms are parameters with the usual
constraints α1+…+αN=0 and β1+…+βT=0. Our intention is to test for constant effects under the following two assumptions: a) some of the values yit are
given in grouped form (meaning that their exact values are lost, although we know
their grouping intervals) or are missing (a special case of grouped data); and b) the
errors follow an arbitrary distribution. The usual statistical methods become inapplicable under these assumptions. As an example, suppose that we intend to test that the time effects are constant (that is, β1=…=βT). The test implicitly means that T is fixed; thus, if we intend to apply asymptotic techniques, only
N must tend to infinity. However, as N increases, the number of parameters also
increases due to the existence of individual-terms. Thus, the search for the asymptotic distribution, as N→∞, of all the parameters of the model becomes pointless.
In spite of this, the search for the asymptotic distribution of the time-dependent
parameters is pertinent, as its dimension is assumed to be fixed. It is known (see
Hsiao 2003 or Baltagi 1995) that several transformations of the aforementioned
model may eliminate the individual-terms (e.g., through
yit − yi. = βt + (ηit − ηi.), which could be estimated by generalized least squares
in the case of normal errors). However, the existence of grouped data may result in
the majority, or even all, of the values yit − yi . becoming grouped. Therefore, the
transformation cited above becomes inapplicable. Additionally, if the independent
errors ηit do not follow a normal distribution, then the transformed model errors
ηit − ηi . are dependent and their distribution may be difficult to determine.
Assuming the existence of grouped data, our procedure is valid for estimating the
time- (individual-) dependent parameters of model (2.1) and also for testing for constant time (individual) effects in this model from the asymptotic distribution of
these parameters as N→∞ (T→∞). Assuming that N and T are fixed, the estimation of the N+T−1 independent parameters of model (2.1) may be tackled through
the EM algorithm (see Dempster, Laird and Rubin 1977, McLachlan and Krishnan
1997, Lange 1999, for instance). The rationale of our proposal will be explained in
detail later. For the moment, it is sufficient to make the following general remarks:
the embryo of our proposal was the EM algorithm, assuming normal errors; in
spite of this, our procedure is applicable for the wide class of log-concave (or
strongly unimodal, see An 1998) distributions; and, finally, testing for constant effects is accomplished through the estimation of the asymptotic covariance matrix
of the time (individual) parameter estimates.
Briefly: in section 2 we describe the inner workings of the estimation algorithm; section
3 is devoted to the computational procedure for hypothesis testing; and, finally, its
performance is considered in section 4.
2 Notation and the rationale of the algorithm
Let us consider the panel data model
yit = μ + αi + βt + ηit, (i=1,…,N; t=1,…,T) (2.1)
on which we intend to test for constant time effects (β1=…=βT), as testing for
constant individual effects is similar. The error terms ηit are i.i.d. with a
known density function f>0, assumed to be symmetric and strongly unimodal.
Model (2.1) can be rewritten in the forms:
yi = eTμ + Ziα + Xβ + ηi, (i=1,…,N) (2.2)
Y = Qδ + Η. (2.3)
In (2.2), yi=(yi1,…,yiT)´, ηi=(ηi1,…,ηiT)´, eT=(1,…,1)´ of order T, α=(α1,…,αN-1)´
and β=(β1,…,βT-1)´ are the vectors of independent individual and time parameters;
Zi=(0T,…,0T,eT,0T,…,0T)′, the eT in the i-th column, if i<N, otherwise
ZN = −eT ⊗ e′N−1, where ⊗ denotes the Kronecker product and 0T is the null
vector of order T; finally, X=(IT−1, −eT−1)′, IT−1 being the identity matrix of order T−1.
In (2.3), Y=(y1,…,yN)′, δ=(μ,α′,β′)′, Η=(η1,…,ηN)′, and Q = (eT ⊗ 1N, Z, X ⊗ eN),
where Z=(Z1,…,ZN)′. Let us define ȳ = N⁻¹T⁻¹(y11+…+yNT), ȳi. = T⁻¹(yi1+…+yiT) and ȳ.t in a similar
way. As N→∞, it is known that, in the absence of grouped data, μ and β can be
unbiasedly and consistently estimated from OLS by means of
μ̂ = ȳ, and (2.4)
β̂ = AT−1⁻¹(ȳ.1 − ȳ.T, …, ȳ.T−1 − ȳ.T)′ = (ȳ.1 − ȳ, …, ȳ.T−1 − ȳ)′,
where AT−1 = IT−1 + 1T−1 and 1T−1 is the square matrix of order T−1 with all of its elements equal to one. The OLS estimate of α is unbiased and agrees with
α̂ = AN−1⁻¹(ȳ1. − ȳN., …, ȳN−1. − ȳN.)′ = (ȳ1. − ȳ, …, ȳN−1. − ȳ)′. (2.5)
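In the complete-data case, (2.4) and (2.5) are simply differences of group means. The following minimal sketch illustrates these closed forms (the function name is ours, for illustration only):

```python
import numpy as np

def ols_anova_estimates(y):
    """Complete-data OLS estimates (2.4)-(2.5) for the ANOVA panel model.

    y : (N, T) array of fully observed responses y_it.
    Returns (mu_hat, alpha_hat, beta_hat), the latter two holding the
    N-1 and T-1 independent individual and time parameters.
    """
    y_bar = y.mean()                # overall mean
    y_i = y.mean(axis=1)            # individual means  y_i.
    y_t = y.mean(axis=0)            # time-period means y_.t
    mu_hat = y_bar
    alpha_hat = y_i[:-1] - y_bar    # alpha_i = y_i. - y_bar, i = 1,...,N-1
    beta_hat = y_t[:-1] - y_bar     # beta_t  = y_.t - y_bar, t = 1,...,T-1
    return mu_hat, alpha_hat, beta_hat
```

With noise-free data generated from model (2.1) under the zero-sum constraints, the sketch recovers μ, α and β exactly.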
From this point on, we will assume that some of the values yit are recorded in
grouped form. In this case, yit is lost, although an interval (lit,uit] is known such
that −∞≤lit<yit≤uit≤∞. If lit=−∞ and uit=∞, then yit is missing. Thus, for every it one of
the following options holds: either the datum yit is observed, or yit is grouped;
in this latter case, there exist some known classification intervals, given by their
extremes −∞=cit0<cit1<…<citr=∞, and the former (lit,uit] agrees with one of the
intervals (cits, cits+1] mentioned above. The existence of grouped data makes the
use of (2.4) and (2.5) impracticable. Assuming that the number of individuals and
time periods are fixed, let us partition the set I={it | i=1,…,N, t=1,…,T} into the two
sets Io={it | yit is observed} and Ig=I−Io={it | yit is grouped}. We propose to estimate the true vector parameter δ by means
of the following algorithm:
INITIALIZATION: Let δ0=(μ0,α0,β0)´ be an initial estimate of δ.
ITERATION: Assuming that δp is known, the loop is given by:
Step 1 (Conditional expectation imputation of the grouped data given δp):
For every double sub-index it, let us define
yit(δp) = yit I(it∈Io)(it) + ŷit I(it∈Ig)(it), (2.6)
where
ŷit = E(yit | lit < yit ≤ uit, δp) = μp + αip + βtp + η̂it, and
η̂it = E(ηit | lit − μp − αip − βtp < ηit ≤ uit − μp − αip − βtp).
Step 2 (OLS estimates from the imputed values): Let us update the current estimate of δ by means of the expressions
μ̂p+1 = ȳ(δp), (2.7)
α̂ip+1 = ȳi.(δp) − ȳ(δp), (i=1,…,N), and (2.8)
β̂tp+1 = ȳ.t(δp) − ȳ(δp), (t=1,…,T). (2.9)
Step 3: p←p+1 and return to Step 1, until convergence is achieved.
Algorithm proposed to estimate the parameters of the panel data model
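As a concrete illustration, the loop may be sketched for the special case of standard normal errors, where the conditional expectation in Step 1 has the classical truncated-normal closed form E(η | a < η ≤ b) = (φ(a) − φ(b))/(Φ(b) − Φ(a)). The function names, the crude initialization of δ0 and the stopping rule below are our own illustrative choices, not prescribed by the paper:

```python
import math
import numpy as np

def trunc_std_normal_mean(a, b):
    """E(eta | a < eta <= b) for a standard normal error; a, b may be +-inf."""
    phi = lambda x: 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))
    return (phi(a) - phi(b)) / (Phi(b) - Phi(a))

def impute_and_update(y_obs, lower, upper, n_iter=200, tol=1e-8):
    """Sketch of the imputation/OLS loop, assuming standard normal errors.

    y_obs : (N, T) array with np.nan at grouped positions.
    lower, upper : (N, T) arrays with the grouping bounds l_it, u_it
                   (ignored where y_it is observed); a missing datum has
                   lower = -inf, upper = +inf.
    """
    N, T = y_obs.shape
    grouped = np.isnan(y_obs)
    y = np.where(grouped, 0.0, y_obs)          # crude initial imputation (delta^0)
    mu = y.mean(); alpha = y.mean(axis=1) - mu; beta = y.mean(axis=0) - mu
    for _ in range(n_iter):
        # Step 1: conditional-expectation imputation given delta^p, as in (2.6)
        for i in range(N):
            for t in range(T):
                if grouped[i, t]:
                    m = mu + alpha[i] + beta[t]
                    eta = trunc_std_normal_mean(lower[i, t] - m, upper[i, t] - m)
                    y[i, t] = m + eta
        # Step 2: OLS update from the imputed values, as in (2.7)-(2.9)
        mu_new = y.mean()
        alpha_new = y.mean(axis=1) - mu_new
        beta_new = y.mean(axis=0) - mu_new
        shift = abs(mu_new - mu) + np.abs(alpha_new - alpha).max() + np.abs(beta_new - beta).max()
        mu, alpha, beta = mu_new, alpha_new, beta_new
        if shift < tol:                        # Step 3: stop at convergence
            break
    return mu, alpha, beta
```

For non-normal log-concave errors, only `trunc_std_normal_mean` would change: the truncated conditional mean must be computed (analytically or numerically) under the density f.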
As said, this algorithm can be applied when the error distribution falls within
the wide class of symmetric and strongly unimodal distributions. If the errors follow a normal distribution, the proposed algorithm agrees with the EM algorithm;
thus its limit point is the maximum likelihood estimate of δ given the existence of
grouped data. With non-normal errors, the proposed algorithm differs from the
EM. In spite of this: (a) the algorithm converges, independently of the initial point
δ0, to a unique point denoted by δ̂ = (μ̂, α̂, β̂)′, which will define our estimate of
δ; and (b) under a weak condition, the asymptotic distribution of β̂ given α, as
N→∞, is a multivariate normal centered on β. We will start by precisely stating
the weak condition mentioned above.
Let us denote the rows of Q in (2.3) by qit′ = (1, uit′, vit′), where
uit=(0,…,0,1,0,…,0)′ is of order N−1, the 1 in the i-th position, if i<N, and otherwise uNt=−eN−1; the vectors vit, of order T−1, are defined in a similar way. When the
existence of grouped data is assumed, we can write
Σit vitvit′ = Σit∈Io vitvit′ + Σit∈Ig vitvit′.
Basic hypothesis: Assuming T to be fixed, ξN > ξ > 0 for all N ≥ No, where ξN is the
minimum eigenvalue of the matrix N⁻¹T⁻¹ Σit∈Io vitvit′.
If the data grouping mechanism is random, this hypothesis needs to be fulfilled
almost everywhere (a.e.). The basic hypothesis holds, for example, in the following simple cases: (a) each value yit has probabilities p>0 and 1−p of being observed
and of being given in grouped form, respectively; and (b) the values yit (t=1,…,T)
of individual i have probabilities pi>0 and 1−pi of being observed or being
given in grouped form, respectively, and we assume that N⁻¹(p1+…+pN) → p > 0
a.e., as N→∞.
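For a concrete observation pattern, the basic hypothesis can be checked numerically. The sketch below (function name ours) builds the time-design vectors vit and returns the minimum eigenvalue of N⁻¹T⁻¹ Σit∈Io vitvit′; for a fully observed panel with T=4 this value is 1/4:

```python
import numpy as np

def min_eigenvalue_observed(observed):
    """Minimum eigenvalue of N^-1 T^-1 * sum over observed it of v_it v_it'.

    observed : (N, T) boolean array, True where y_it is observed.
    The vectors v_it are the time-design rows of Q: the t-th unit vector
    of order T-1 for t < T, and -e_{T-1} for t = T.
    """
    N, T = observed.shape
    S = np.zeros((T - 1, T - 1))
    for i in range(N):
        for t in range(T):
            if observed[i, t]:
                v = np.zeros(T - 1)
                if t < T - 1:
                    v[t] = 1.0
                else:
                    v[:] = -1.0          # v_iT = -e_{T-1}
                S += np.outer(v, v)
    return np.linalg.eigvalsh(S / (N * T)).min()
```

Under case (a) above, with observation probability p, the eigenvalue stays bounded away from zero as N grows, in accordance with the hypothesis.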
Theorem 1
i) Assuming that N and T are fixed, the sequence {δp=(μp,αp,βp)′} generated by
the proposed algorithm converges, as p→∞, to a unique point, δ̂ = (μ̂, α̂, β̂)′, for
any initial point δ0=(μ0,α0,β0)′.
If the basic hypothesis holds, then
ii) (Time-dependent part: T fixed) As N→∞, β̂ and μ̂ are consistent estimates of
the true parameters β and μ. Let L(√(NT)(β̂ − β) / α) and L(√(NT)(μ̂ − μ) / α)
denote the distribution laws of √(NT)(β̂ − β) and √(NT)(μ̂ − μ), respectively,
assuming α to be known. It holds that, as N→∞, L(√(NT)(β̂ − β) / α) and
L(√(NT)(μ̂ − μ) / α) weakly converge to some normal distributions, N(0,Λ(α))
and N(0,σ(α)), respectively, for some non-null covariance matrix Λ(α) and some
σ(α)>0. Additionally, Λ(α) and σ(α) can be consistently estimated by a certain
matrix and value, Λ̂(α) and σ̂(α).
A similar result can be obtained for the individual-dependent parameters. It
suffices to interchange N and T and to substitute vit by uit.
The form of the estimate Λ̂(α) may be intuited from (3.1). These results will be
taken into account in the next sections.
3 Computational procedure for testing the effects
Let us test for null time effects (H0) at level 0.05. From ii) of Theorem 1,
β̂ ≈ N(β, N⁻¹T⁻¹Λ̂(α)) if N is sufficiently large. Thus, under H0,
NTβ̂′Λ̂(α)⁻¹β̂ approximately follows a chi-square distribution with T−1 degrees
of freedom (denoted by χ²T−1). If Pr(χ²T−1 > χ²T−1(0.05)) = 0.05, then
Pr(NTβ̂′Λ̂(α)⁻¹β̂ > χ²T−1(0.05)) = 0.05; thus we use as our critical region
R0.05 = {β̂ | NTβ̂′Λ̂(α)⁻¹β̂ > χ²T−1(0.05)}. To test the null hypothesis at level l
(0<l<1), the value 0.05 must be substituted by l. Until this point, we have supposed α to be known. If the individual parameters are unknown, the limit point
δ̂ = (μ̂, α̂, β̂)′ satisfies α̂i = ȳi.(δ̂) − ȳ(δ̂), where the values yit(δ̂) are the
result of imputing the grouped data using δ̂, as indicated in (2.6). If the proposed
algorithm is run assuming that the individual effects are known and equal to
ȳi.(δ̂) − ȳ(δ̂), then the estimates of the time effects and the constant term agree
with β̂ and μ̂, respectively. If T is sufficiently large, from the consistency of α̂
in Theorem 1, we propose to test null time effects through the steps 1-3 below:
1. Run the former algorithm to estimate the complete parameter δ̂ = (μ̂, α̂, β̂)′;
2. Calculate the matrix
Λ̂(δ̂) = NT (ΣiΣt m*it vitvit′)⁻¹ (ΣiΣt Var(ηit(δ̂)) vitvit′) (ΣiΣt m*it vitvit′)⁻¹, (3.1)
with the sums running over i=1,…,N and t=1,…,T, where m*it = 1 if it∈Io and, if yit is grouped,
m*it = (d/dz) E(ηit | lit + z < ηit ≤ uit + z), evaluated at z = −μ̂ − α̂i − β̂t;
ηit(δ̂) = ηit if it∈Io and, otherwise,
ηit(δ̂) = E(ηit | −μ̂ − α̂i − β̂t + cits < ηit ≤ −μ̂ − α̂i − β̂t + cit(s+1)),
with probability P(−μ̂ − α̂i − β̂t + cits < ηit ≤ −μ̂ − α̂i − β̂t + cit(s+1)), for
s=0,…,r−1; and, finally, Var(ηit(δ̂)) = Var(ηit) if it∈Io and, otherwise,
Var(ηit(δ̂)) agrees with the "between" variance of ηit given the classification intervals of yit (thus, the "within" variance is now excluded);
3. For testing H0: β=0 at the level l, let us take as the critical region
R̂l = {β̂ | NTβ̂′Λ̂(δ̂)⁻¹β̂ > χ²T−1(l)}; (3.2)
thus, H0 must be rejected if β̂ ∈ R̂l, and accepted otherwise.
Computational procedure to test for null time effects
4 Simulations and real life case studies
We have analyzed the performance of our procedure by (a) several simulations,
and (b) some real life case studies.
(a) Simulations: We have tested for constant time effects in model (2.1). Let us
denote φi = μ + αi (i=1,…,N); thus μ = φ̄ and αi = φi − φ̄. We have fixed T=4,
and the values φi were selected uniformly on the interval (−5,5) for N=15,25,50.
Over 200 sets of values were assigned to (β1,β2,β3,β4); the first was (0,0,0,0) and
the remaining ones increasingly differ from the null vector, the intention being to
evaluate the test power function at these points. Then, the errors were generated
from the following distributions: a) Laplace(1), with density f(u) = (1/2)exp(−|u|); b) Logistic, with density f(u) = exp(−u)/(1+exp(−u))²; and c) N(0,1). Once the values yit were generated, each was grouped with probability 0.5, in which case the grouping intervals
were (−∞,−5], (−5,0], and (0,∞). With the resulting data, (1) we have run the estimating algorithm to obtain δ̂, using the STOP condition ||δp−δp−1||² ≤ 10⁻⁵; (2) Λ̂(δ̂)
was computed from (3.1); and (3) we have calculated NTβ̂′Λ̂(δ̂)⁻¹β̂ and determined the critical region R̂l given in (3.2). This was repeated 300 times, and we
recorded the δ̂(r) and R̂0.05(r) obtained in replication r. With the δ̂(r), we
have calculated the empirical multivariate mean square error of β̂, showing that it
is unbiased to three decimal places. With the R̂0.05(r), we have tested the hypothesis
H0: β=0 and computed the number, M, of replications in which H0 was
rejected at level 0.05. The fraction M/300 represents the empirical test power
function, φ(·), at the true value of β. Figure 1 depicts the cross-section (assuming
β3=0) of this function.
Figure 1 Cross-section of the test power function, φ(β1,β2,0), when N=25: (a) Laplacian errors; (b) normal errors; (c) logistic errors
(b) Real life case studies: We have used data from the European Community
Household Panel (ECHP), which is an inter-state, multi-purpose annual longitudinal survey coordinated by Eurostat, the Statistical Office of the European Community. The Spanish part of the ECHP has been used in this section. For each household i, we have taken the variable yit of model (2.1) to be the standardized log of
the overall income of the household's members in year t. The data available to us corresponds to the years 1998 to 2001; thus T=4. The number of different households
sampled was 6233. However, not all of these were observed in all four years,
since certain households are created and destroyed at every period. Our data is,
thus, an incomplete panel, and we have treated missing data as grouped on the
whole line. The number of missing observations was 4194, which represents
16.8% of the 24932 possible observations (6233 households × 4 years). Finally,
we have assumed in this study that the error terms of model (2.1) are normal. Our
goal has been to test the hypothesis H0: β1=β2=β3=0 against H1: not H0, through
the proposed algorithm. The value of the test statistic NTβ̂′Λ̂(δ̂)⁻¹β̂ was
149.71 (p value equal to 0.0000). Thus, there is very strong evidence that the
time parameters of model (2.1) differ from one year to another. All of the former
computations were repeated with two special sub-samples: one-person households
(1048 sampled units, with 13.55% of missing data) and households composed of
a couple without dependent children (2212 observations, with 24.82% of missing
data). For the first sub-sample the p value was 0.3195; thus, the hypothesis of null time effects is accepted at any level of significance lower than this. In
contrast, for the second sub-sample the same hypothesis is rejected at both the
5% and 1% levels of significance, as the value 14.33 of the test statistic corresponds to a p value equal to 0.0025.
Complete results and the source program are available on request.
Acknowledgements
This paper springs from research partially funded by MEC (grant MTM2004-05776) and EUROSTAT (Contract No 9.242.010).
References
An, M. Y. (1998) Logconcavity versus logconvexity: a complete characterization, Journal of Economic Theory, 80, 350-369.
Baltagi, B. H. (1995) Econometric Analysis of Panel Data, Wiley.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39, 1-22.
Hsiao, C. (2003) Analysis of Panel Data, Cambridge University Press.
Lange, K. (1999) Numerical Analysis for Statisticians, Springer.
McLachlan, G. J., Krishnan, T. (1997) The EM Algorithm and Extensions, Wiley.