MEMORANDUM

No 6/2001
HOW IS GENERALIZED LEAST SQUARES RELATED TO
WITHIN AND BETWEEN ESTIMATORS IN UNBALANCED
PANEL DATA?
By
Erik Biørn
ISSN: 0801-1117
Department of Economics
University of Oslo
This series is published by the
University of Oslo
Department of Economics
P.O. Box 1095 Blindern
N-0317 OSLO Norway
Telephone: +47 22855127
Fax: +47 22855035
Internet: http://www.oekonomi.uio.no/
e-mail: econdep@econ.uio.no

In co-operation with
The Frisch Centre for Economic Research
Gaustadalléen 21
N-0371 OSLO Norway
Telephone: +47 22 95 88 20
Fax: +47 22 95 88 25
Internet: http://www.frisch.uio.no/
e-mail: frisch@frisch.uio.no

List of the last 10 Memoranda:
No 05  By Atle Seierstad: NECESSARY CONDITIONS AND SUFFICIENT CONDITIONS FOR OPTIMAL CONTROL OF PIECEWISE DETERMINISTIC CONTROL SYSTEMS. 39 p.
No 04  By Pål Longva: Out-Migration of Immigrants: Implications for Assimilation Analysis. 48 p.
No 03  By Øystein Kravdal: The Importance of Education for Fertility in Sub-Saharan Africa is Substantially Underestimated When Community Effects are Ignored. 33 p.
No 02  By Geir B. Asheim and Martin L. Weitzman: DOES NNP GROWTH INDICATE WELFARE IMPROVEMENT? 8 p.
No 01  By Tore Schweder and Nils Lid Hjort: Confidence and Likelihood. 35 p.
No 43  By Mads Greaker: Strategic Environmental Policy when the Governments are Threatened by Relocation. 22 p.
No 42  By Øystein Kravdal: The Impact of Individual and Aggregate Unemployment on Fertility in Norway. 36 p.
No 41  By Michael Hoel and Erik Magnus Sæther: Private health care as a supplement to a public health system with waiting time for treatment: 47 p.
No 40  By Geir B. Asheim, Anne Wenche Emblem and Tore Nilssen: Health Insurance: Treatment vs. Compensation. 15 p.
No 39  By Diderik Lund and Tore Nilssen: Cream Skimming, Dregs Skimming, and Pooling: On the Dynamics of Competitive Screening. 21 p.
A complete list of this memo-series is available in a PDF® format at:
http://www.oekonomi.uio.no/memo/

HOW IS GENERALIZED LEAST SQUARES
RELATED TO WITHIN AND BETWEEN ESTIMATORS
IN UNBALANCED PANEL DATA?*
by
ERIK BIØRN
ABSTRACT
For a random effects regression model with unbalanced panel data, we demonstrate that the Generalized Least Squares (GLS) estimator can be expressed as a (matrix) weighted average of estimators which utilize the within individual and the between individual variation in the data set. We thus generalize a relationship familiar for balanced panel data. Specific attention must be given to the intercept of the regression. We also define an estimator containing the GLS, the within individual, and the between individual estimators for balanced and unbalanced data as special cases.
Keywords: Panel Data. Unbalanced panels. Missing observations. Random effects. Generalized Least Squares. Within estimation. Between estimation.
JEL classification: C13, C23
* I thank Terje Skjerpen for valuable comments.
1 Introduction
It is well known from textbook expositions of fixed and random effects regression models with balanced panel data that the Ordinary (OLS) and the Generalized Least Squares (GLS) estimators of the coefficient vector can be interpreted as (matrix) weighted averages of the estimators which utilize only the within individual and only the between individual variation in the data set, often denoted as within and between estimators [see Maddala (1977, chapter 14-3) and Hsiao (1986, section 3.3.2)]. Unbalanced situations, however, are more common in practice than balanced ones, in particular when using micro data, due to entry or exit of respondents, non-response, rotation designs, etc. From a practical point of view, the interest of the weighting relationship for balanced data is therefore somewhat limited. There exists a growing literature on GLS estimators of random individual effects models in unbalanced situations [see, e.g., Biørn (1981) and Baltagi (1985)]. The question of whether, and possibly how, this estimator can be related to estimators which can be interpreted as within and between estimators has not been addressed in this literature.
Our focus in this note is on the latter question. We demonstrate that a weighting relationship for GLS with unbalanced panel data and random individual effects similar to that in the balanced case exists, provided that the within and the between variation in the data are defined in a suitable way. In deriving this estimator, we show that specific attention must be given to the intercept term of the equation. Finally, we present a general, and easily implementable, estimator which contains the GLS, the OLS, the within individual, and the between individual estimators for balanced and unbalanced situations as special cases.
2 Model and estimators
Consider a one-way error components regression model for unbalanced panel data in which individual $i$ ($i = 1, \dots, N$) is observed in $T_i$ periods (not all equal), and let $t$ denote the observation number (which differs from the calendar period if the starting periods of the individuals differ or if gaps occur in the time series of some of them):

$$
\begin{aligned}
&y_{it} = x_{it}\beta + k + \epsilon_{it}, \qquad \epsilon_{it} = \alpha_i + u_{it}, \\
&\alpha_i \sim \mathrm{IID}(0, \sigma_\alpha^2), \qquad u_{it} \sim \mathrm{IID}(0, \sigma^2), \\
&\alpha_i,\ u_{it},\ x_{it}\ \text{are independent for all } i, t, \\
&i = 1, \dots, N; \quad t = 1, \dots, T_i,
\end{aligned}
\tag{1}
$$

where $x_{it}$ is a (row) vector of regressors, $\beta$ its (column) vector of coefficients, $\alpha_i$ an individual specific random effect, and $u_{it}$ a disturbance. Let $y_i = (y_{i1}, \dots, y_{iT_i})'$, $y = (y_1', \dots, y_N')'$, $X_i = (x_{i1}', \dots, x_{iT_i}')'$, $X = (X_1', \dots, X_N')'$, etc., and let $I_m$ be the $m$-dimensional identity matrix and $e_m$ the $(m \times 1)$ vector of ones. The number of observations, i.e., the number of rows in $y$ and $X$, is $n = \sum_{i=1}^{N} T_i$. Compactly, the model can then be written

$$
\begin{aligned}
&y = X\beta + e_n k + \epsilon, \qquad \epsilon = [e_{T_1}'\alpha_1, \dots, e_{T_N}'\alpha_N]' + u, \\
&E(\epsilon) = 0_{n,1}, \qquad E(\epsilon\epsilon') = \Omega,
\end{aligned}
\tag{2}
$$
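where

$$
\Omega = \mathrm{diag}(\Omega_1, \dots, \Omega_N),
\tag{3}
$$

$$
\Omega_i = E(\epsilon_i\epsilon_i') = \sigma_\alpha^2 e_{T_i}e_{T_i}' + \sigma^2 I_{T_i} = \sigma^2 K_{T_i} + (\sigma^2 + T_i\sigma_\alpha^2) J_{T_i}, \qquad i = 1, \dots, N,
\tag{4}
$$

`diag' denoting a block-diagonal matrix, $J_m = (e_m e_m')/m$, and $K_m = I_m - J_m$, $m = 1, 2, \dots$. Since the latter two matrices are idempotent and have orthogonal columns, we simply have

$$
\Omega_i^{-1} = \frac{1}{\sigma^2}\,(K_{T_i} + \theta_i J_{T_i}),
\tag{5}
$$

where

$$
\theta_i = \frac{\sigma^2}{\sigma^2 + T_i\sigma_\alpha^2}, \qquad i = 1, \dots, N.
\tag{6}
$$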
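The inverse in (5) can be checked numerically for a single individual. The following is a minimal sketch, assuming NumPy is available; the function name `omega_blocks` and the values chosen for $T_i$, $\sigma^2$ and $\sigma_\alpha^2$ are illustrative assumptions, not part of the note.

```python
import numpy as np

def omega_blocks(T_i, sigma2, sigma2_alpha):
    """Return Omega_i from (4) and its closed-form inverse from (5)."""
    e = np.ones((T_i, 1))
    J = (e @ e.T) / T_i                      # J_m = e_m e_m' / m (averaging)
    K = np.eye(T_i) - J                      # K_m = I_m - J_m (demeaning)
    omega = sigma2_alpha * (e @ e.T) + sigma2 * np.eye(T_i)
    theta = sigma2 / (sigma2 + T_i * sigma2_alpha)   # equation (6)
    omega_inv = (K + theta * J) / sigma2             # equation (5)
    return omega, omega_inv

# Illustrative values only (hypothetical, not taken from the note).
omega, omega_inv = omega_blocks(T_i=4, sigma2=1.5, sigma2_alpha=0.7)
inverse_ok = np.allclose(omega @ omega_inv, np.eye(4))
print(inverse_ok)
```

The check works because $K_{T_i}$ and $J_{T_i}$ are idempotent and mutually orthogonal, so each component of (4) can be inverted separately.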
We use the following notation for the within individual, the between individual, and the total covariation in arbitrary matrices, $Z$ and $Q$, constructed in the same way as $X$ above:

$$
\begin{aligned}
W_{ZQ} &= \sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar z_i)'(q_{it} - \bar q_i) = Z'\,\mathrm{diag}(K_{T_1}, \dots, K_{T_N})\,Q, \\
B_{ZQ} &= \sum_{i=1}^{N} T_i(\bar z_i - \bar z)'(\bar q_i - \bar q) = Z'\,\mathrm{diag}(J_{T_1}, \dots, J_{T_N})\,Q - Z'J_nQ, \\
T_{ZQ} &= \sum_{i=1}^{N}\sum_{t=1}^{T_i} (z_{it} - \bar z)'(q_{it} - \bar q) = W_{ZQ} + B_{ZQ} = Z'(I_n - J_n)Q,
\end{aligned}
$$

where $\bar z_i = T_i^{-1}\sum_{t=1}^{T_i} z_{it}$ and $\bar z = n^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T_i} z_{it} = n^{-1}\sum_{i=1}^{N} T_i\bar z_i$.
Let $\tilde X_i = (X_i \,\vdots\, e_{T_i})$ and $\tilde X = (\tilde X_1', \dots, \tilde X_N')'$. In the following we do not, however, include the intercept term and the ones attached to it in the coefficient vectors and regressor matrices, as in, e.g., Baltagi (1995, section 9.2), but specify them explicitly in the formulae. This is essential in defining between estimators and decomposing the GLS estimator into within and between estimators for the unbalanced case.
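The within, between and total covariation matrices can be computed directly from their matrix expressions. The sketch below, assuming NumPy and randomly generated data, builds the block-diagonal matrices for an unbalanced panel and compares the between term with its summation definition. The helper names (`block_diag_of`, `covariation`) and the panel sizes are hypothetical.

```python
import numpy as np

def J(m):
    """Averaging matrix J_m = e_m e_m' / m."""
    return np.ones((m, m)) / m

def K(m):
    """Demeaning matrix K_m = I_m - J_m."""
    return np.eye(m) - J(m)

def block_diag_of(make_block, T):
    """Block-diagonal n x n matrix with one block per individual."""
    n = sum(T)
    out = np.zeros((n, n))
    start = 0
    for T_i in T:
        out[start:start + T_i, start:start + T_i] = make_block(T_i)
        start += T_i
    return out

def covariation(Z, Q, T):
    """Within, between and total covariation of Z and Q."""
    n = sum(T)
    W = Z.T @ block_diag_of(K, T) @ Q
    B = Z.T @ block_diag_of(J, T) @ Q - Z.T @ J(n) @ Q
    Tot = Z.T @ (np.eye(n) - J(n)) @ Q
    return W, B, Tot

rng = np.random.default_rng(0)
T = [2, 5, 3]                                # unbalanced: T_i differs across i
Z = rng.normal(size=(sum(T), 2))
Q = rng.normal(size=(sum(T), 1))
W, B, Tot = covariation(Z, Q, T)

# Between covariation again, this time from the summation definition.
bounds = np.cumsum([0] + T)
z_bar = Z.mean(axis=0, keepdims=True)        # global mean over all n rows
q_bar = Q.mean(axis=0, keepdims=True)
B_sum = sum(
    T_i * (Z[a:b].mean(axis=0, keepdims=True) - z_bar).T
    @ (Q[a:b].mean(axis=0, keepdims=True) - q_bar)
    for T_i, a, b in zip(T, bounds[:-1], bounds[1:])
)
print(np.allclose(W + B, Tot), np.allclose(B, B_sum))
```

Forming the full $n \times n$ matrices is only sensible for small illustrations; for real data the sums over individual means would normally be used instead.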
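The OLS and GLS estimators of $(\beta', k)'$ are

$$
\begin{aligned}
\begin{bmatrix} \hat\beta_{OLS} \\ \hat k_{OLS} \end{bmatrix}
&= (\tilde X'\tilde X)^{-1}(\tilde X'y)
= \begin{bmatrix} \sum_i X_i'X_i & \sum_i X_i'e_{T_i} \\ \sum_i e_{T_i}'X_i & \sum_i e_{T_i}'e_{T_i} \end{bmatrix}^{-1}
\begin{bmatrix} \sum_i X_i'y_i \\ \sum_i e_{T_i}'y_i \end{bmatrix} \\
&= \begin{bmatrix} W_{XX} + \sum_i T_i\bar x_i'\bar x_i & \sum_i T_i\bar x_i' \\ \sum_i T_i\bar x_i & n \end{bmatrix}^{-1}
\begin{bmatrix} W_{XY} + \sum_i T_i\bar x_i'\bar y_i \\ \sum_i T_i\bar y_i \end{bmatrix}
\end{aligned}
\tag{7}
$$

and

$$
\begin{aligned}
\begin{bmatrix} \hat\beta_{GLS} \\ \hat k_{GLS} \end{bmatrix}
&= (\tilde X'\Omega^{-1}\tilde X)^{-1}(\tilde X'\Omega^{-1}y)
= \begin{bmatrix} \sum_i X_i'\Omega_i^{-1}X_i & \sum_i X_i'\Omega_i^{-1}e_{T_i} \\ \sum_i e_{T_i}'\Omega_i^{-1}X_i & \sum_i e_{T_i}'\Omega_i^{-1}e_{T_i} \end{bmatrix}^{-1}
\begin{bmatrix} \sum_i X_i'\Omega_i^{-1}y_i \\ \sum_i e_{T_i}'\Omega_i^{-1}y_i \end{bmatrix} \\
&= \begin{bmatrix} W_{XX} + \sum_i \theta_iT_i\bar x_i'\bar x_i & \sum_i \theta_iT_i\bar x_i' \\ \sum_i \theta_iT_i\bar x_i & \sum_i \theta_iT_i \end{bmatrix}^{-1}
\begin{bmatrix} W_{XY} + \sum_i \theta_iT_i\bar x_i'\bar y_i \\ \sum_i \theta_iT_i\bar y_i \end{bmatrix},
\end{aligned}
\tag{8}
$$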
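respectively, the last equality following from (5). Since the formula for the partitioned inverse [see, e.g., Lütkepohl (1996, section 3.5.3)] implies

$$
\begin{bmatrix} A & b' \\ b & c \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1}\dfrac{b'}{c} \\[1ex] -\dfrac{b}{c}Q^{-1} & \dfrac{b}{c}Q^{-1}\dfrac{b'}{c} + \dfrac{1}{c} \end{bmatrix},
\qquad Q = A - \frac{b'b}{c},
\tag{9}
$$

where $A$ is a symmetric matrix, $b$ a row vector, and $c$ a scalar, (7) and (8) can be written as

$$
\begin{aligned}
\hat\beta_{OLS} &= \Big[W_{XX} + \sum_i T_i(\bar x_i - \bar x)'(\bar x_i - \bar x)\Big]^{-1}
\Big[W_{XY} + \sum_i T_i(\bar x_i - \bar x)'(\bar y_i - \bar y)\Big], \\
\hat k_{OLS} &= \bar y - \bar x\hat\beta_{OLS},
\end{aligned}
\tag{10}
$$

and

$$
\begin{aligned}
\hat\beta_{GLS} &= \Big[W_{XX} + \sum_i \theta_iT_i(\bar x_i - \tilde x)'(\bar x_i - \tilde x)\Big]^{-1}
\Big[W_{XY} + \sum_i \theta_iT_i(\bar x_i - \tilde x)'(\bar y_i - \tilde y)\Big], \\
\hat k_{GLS} &= \tilde y - \tilde x\hat\beta_{GLS},
\end{aligned}
\tag{11}
$$

where

$$
\bar x = \frac{\sum_i T_i\bar x_i}{\sum_i T_i}, \qquad
\bar y = \frac{\sum_i T_i\bar y_i}{\sum_i T_i}, \qquad
\tilde x = \frac{\sum_i \theta_iT_i\bar x_i}{\sum_i \theta_iT_i}, \qquad
\tilde y = \frac{\sum_i \theta_iT_i\bar y_i}{\sum_i \theta_iT_i}.
\tag{12}
$$

Note that the global means occurring in the definitions of the OLS and the GLS estimators differ when $T_i$ depends on $i$.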
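Formulae (10) and (11) can be compared with direct computation of (7) and (8) on simulated data. The following is a minimal sketch, assuming NumPy, known variance components, and an arbitrary data-generating process; the helper `from_formula` and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = np.array([2, 4, 3, 6])                      # unbalanced: T_i varies with i
N, n = len(T), T.sum()
ids = np.repeat(np.arange(N), T)                # individual index of each row
sigma2, sigma2_alpha = 1.0, 2.0                 # variance components, taken as known
theta = sigma2 / (sigma2 + T * sigma2_alpha)    # equation (6)

X = rng.normal(size=(n, 2))
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=N)
u = rng.normal(scale=np.sqrt(sigma2), size=n)
y = X @ np.array([1.0, -0.5]) + 3.0 + alpha[ids] + u

# Direct estimators (7) and (8) on the regressor matrix extended with ones.
Xt = np.column_stack([X, np.ones(n)])
Omega = sigma2 * np.eye(n) + sigma2_alpha * (ids[:, None] == ids[None, :])
Oinv = np.linalg.inv(Omega)
ols_direct = np.linalg.solve(Xt.T @ Xt, Xt.T @ y)
gls_direct = np.linalg.solve(Xt.T @ Oinv @ Xt, Xt.T @ Oinv @ y)

# Individual means and within covariation.
xbar_i = np.vstack([X[ids == i].mean(axis=0) for i in range(N)])
ybar_i = np.array([y[ids == i].mean() for i in range(N)])
Xw, yw = X - xbar_i[ids], y - ybar_i[ids]
W_XX, W_XY = Xw.T @ Xw, Xw.T @ yw

def from_formula(w):
    """(10) when w = T, (11) when w = theta * T; returns (beta', k)."""
    x_glob = w @ xbar_i / w.sum()               # global means, equation (12)
    y_glob = w @ ybar_i / w.sum()
    dx, dy = xbar_i - x_glob, ybar_i - y_glob
    beta = np.linalg.solve(W_XX + (dx.T * w) @ dx, W_XY + (dx.T * w) @ dy)
    return np.append(beta, y_glob - x_glob @ beta)

ols_formula = from_formula(T)
gls_formula = from_formula(theta * T)
print(np.allclose(ols_direct, ols_formula), np.allclose(gls_direct, gls_formula))
```

The only difference between the two calls is the weight attached to each individual mean, $T_i$ for OLS and $\theta_iT_i$ for GLS, which also changes the global means around which the individual means are centred.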
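The between and within estimators corresponding to $\hat\beta_{OLS}$ and $\hat k_{OLS}$, obtained by running OLS on

$$
\sqrt{T_i}\,\bar y_i = \sqrt{T_i}\,\bar x_i\beta + \sqrt{T_i}\,k + \sqrt{T_i}\,\bar\epsilon_i, \qquad i = 1, \dots, N,
\tag{13}
$$

and on (1) with the $k + \alpha_i$'s considered as $N$ unknown constants, are, respectively,

$$
\begin{aligned}
\hat\beta_B &= \Big[\sum_i T_i(\bar x_i - \bar x)'(\bar x_i - \bar x)\Big]^{-1}\Big[\sum_i T_i(\bar x_i - \bar x)'(\bar y_i - \bar y)\Big] = B_{XX}^{-1}B_{XY}, \\
\hat k_B &= \bar y - \bar x\hat\beta_B,
\end{aligned}
\tag{14}
$$

and

$$
\hat\beta_W = W_{XX}^{-1}W_{XY}.
\tag{15}
$$

Note that the disturbances in (13), $\sqrt{T_i}\,\bar\epsilon_i$, are homoskedastic when no individual effects occur ($\sigma_\alpha^2 = 0$), and heteroskedastic otherwise.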
3 The relationship between the estimators
We next consider the relationships between the estimators (10), (11), (14), and (15). From (10), (14), and (15) we have

$$
\hat\beta_{OLS} = (W_{XX} + B_{XX})^{-1}(W_{XX}\hat\beta_W + B_{XX}\hat\beta_B),
\tag{16}
$$

regardless of whether the panel data set is balanced or unbalanced. In the balanced case, where $T_i = T$ and $\theta_i = \theta = \sigma^2/(\sigma^2 + T\sigma_\alpha^2)$ for all $i$, it follows from (11), (14) and (15) that

$$
\hat\beta_{GLS} = (W_{XX} + \theta B_{XX})^{-1}(W_{XX}\hat\beta_W + \theta B_{XX}\hat\beta_B).
\tag{17}
$$

We will now derive a relationship for the unbalanced case similar to the latter.
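As a numerical illustration of (16) and (17), the sketch below checks the OLS decomposition on an unbalanced panel and the GLS decomposition on a balanced one. It assumes NumPy and simulated data with known variance components; `panel_stats` and `simulate` are hypothetical helper names.

```python
import numpy as np

def panel_stats(X, y, T):
    """Within/between covariation and the estimators (14)-(15)."""
    ids = np.repeat(np.arange(len(T)), T)
    xbar_i = np.vstack([X[ids == i].mean(axis=0) for i in range(len(T))])
    ybar_i = np.array([y[ids == i].mean() for i in range(len(T))])
    Xw, yw = X - xbar_i[ids], y - ybar_i[ids]
    dx, dy = xbar_i - X.mean(axis=0), ybar_i - y.mean()
    W_XX, W_XY = Xw.T @ Xw, Xw.T @ yw
    B_XX, B_XY = (dx.T * T) @ dx, (dx.T * T) @ dy
    beta_W = np.linalg.solve(W_XX, W_XY)
    beta_B = np.linalg.solve(B_XX, B_XY)
    return ids, W_XX, B_XX, beta_W, beta_B

def simulate(T, rng):
    """Draw (X, y) from model (1) with unit variance components."""
    ids = np.repeat(np.arange(len(T)), T)
    X = rng.normal(size=(T.sum(), 2))
    y = (X @ np.array([1.0, -0.5]) + 3.0
         + rng.normal(size=len(T))[ids] + rng.normal(size=T.sum()))
    return X, y

rng = np.random.default_rng(2)

# (16): OLS as a matrix weighted mean, unbalanced panel.
T = np.array([2, 4, 3, 6, 5])
X, y = simulate(T, rng)
ids, W_XX, B_XX, beta_W, beta_B = panel_stats(X, y, T)
Xt = np.column_stack([X, np.ones(T.sum())])
beta_ols = np.linalg.lstsq(Xt, y, rcond=None)[0][:2]
ols_mix = np.linalg.solve(W_XX + B_XX, W_XX @ beta_W + B_XX @ beta_B)

# (17): GLS as a matrix weighted mean, balanced panel with common theta.
T = np.full(5, 4)
X, y = simulate(T, rng)
ids, W_XX, B_XX, beta_W, beta_B = panel_stats(X, y, T)
sigma2, sigma2_alpha = 1.0, 1.0
theta = sigma2 / (sigma2 + T[0] * sigma2_alpha)
Xt = np.column_stack([X, np.ones(T.sum())])
Omega = sigma2 * np.eye(T.sum()) + sigma2_alpha * (ids[:, None] == ids[None, :])
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(Xt.T @ Oinv @ Xt, Xt.T @ Oinv @ y)[:2]
gls_mix = np.linalg.solve(W_XX + theta * B_XX,
                          W_XX @ beta_W + theta * B_XX @ beta_B)
print(np.allclose(beta_ols, ols_mix), np.allclose(beta_gls, gls_mix))
```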
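Let $v_i$ ($i = 1, \dots, N$) be an arbitrary weight for individual $i$ and multiply its equation in individual means by the square root of this weight, which generalizes (13) to

$$
\sqrt{v_i}\,\bar y_i = \sqrt{v_i}\,\bar x_i\beta + \sqrt{v_i}\,k + \sqrt{v_i}\,\bar\epsilon_i, \qquad i = 1, \dots, N.
\tag{18}
$$

Running OLS on this equation, we obtain generalized between estimators of $\beta$ and $k$,

$$
\begin{bmatrix} \tilde\beta_B(v) \\ \tilde k_B(v) \end{bmatrix}
= \begin{bmatrix} \sum_i v_i\bar x_i'\bar x_i & \sum_i v_i\bar x_i' \\ \sum_i v_i\bar x_i & \sum_i v_i \end{bmatrix}^{-1}
\begin{bmatrix} \sum_i v_i\bar x_i'\bar y_i \\ \sum_i v_i\bar y_i \end{bmatrix},
\tag{19}
$$

where $v = (v_1, \dots, v_N)$, which, when we again use (9), leads to

$$
\begin{aligned}
\tilde\beta_B = \tilde\beta_B(v) &= \Big[\sum_i v_i(\bar x_i - \tilde x(v))'(\bar x_i - \tilde x(v))\Big]^{-1}\Big[\sum_i v_i(\bar x_i - \tilde x(v))'(\bar y_i - \tilde y(v))\Big], \\
\tilde k_B = \tilde k_B(v) &= \tilde y(v) - \tilde x(v)\tilde\beta_B(v),
\end{aligned}
\tag{20}
$$

where

$$
\tilde x(v) = \frac{\sum_i v_i\bar x_i}{\sum_i v_i}, \qquad
\tilde y(v) = \frac{\sum_i v_i\bar y_i}{\sum_i v_i}.
\tag{21}
$$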
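The step from (19) to (20) can be illustrated by running least squares on the scaled equation (18) and comparing the result with the closed form. This is a sketch only, assuming NumPy; the individual means and the weights $v_i$ are drawn at random purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
xbar_i = rng.normal(size=(N, 2))                 # individual means of the regressors
ybar_i = xbar_i @ np.array([1.0, -0.5]) + 3.0 + rng.normal(size=N)
v = rng.uniform(0.5, 2.0, size=N)                # arbitrary positive weights v_i

# OLS on (18): every individual-mean equation is scaled by sqrt(v_i).
s = np.sqrt(v)
A = np.column_stack([xbar_i, np.ones(N)]) * s[:, None]
coef = np.linalg.lstsq(A, s * ybar_i, rcond=None)[0]     # (beta', k) as in (19)

# Closed form (20) with the weighted global means (21).
x_t = v @ xbar_i / v.sum()
y_t = v @ ybar_i / v.sum()
dx, dy = xbar_i - x_t, ybar_i - y_t
beta_B = np.linalg.solve((dx.T * v) @ dx, (dx.T * v) @ dy)
k_B = y_t - x_t @ beta_B
print(np.allclose(coef, np.append(beta_B, k_B)))
```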
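This brings us to the main result in this note: The estimators $\hat\beta_{OLS}$, $\hat\beta_W$, $\hat\beta_B$, $\tilde\beta_B$, and $\hat\beta_{GLS}$ are all matrix weighted means of $\hat\beta_W$ and $\tilde\beta_B(v)$ for a suitable choice of $v$, since they all belong to the class

$$
\hat\beta(\lambda_W, \lambda_B, v) = [\lambda_W W_{XX} + \lambda_B\tilde B_{XX}(v)]^{-1}[\lambda_W W_{XX}\hat\beta_W + \lambda_B\tilde B_{XX}(v)\tilde\beta_B(v)],
\tag{22}
$$

where $\lambda_W$ and $\lambda_B$ are arbitrary scalar constants and

$$
\tilde B_{XX}(v) = \sum_i v_i[\bar x_i - \tilde x(v)]'[\bar x_i - \tilde x(v)].
$$

Note that $\tilde B_{XX}(v)$ and $\tilde\beta_B(v)$ are homogeneous in $v$ of degrees one and zero, respectively, while $\hat\beta(\lambda_W, \lambda_B, v)$ is homogeneous in $(\lambda_W, \lambda_B)$ of degree zero. In particular, we have

$$
\begin{aligned}
\hat\beta_{OLS} &= \hat\beta(1, 1, \mathbf{T}), \\
\hat\beta_W &= \hat\beta(1, 0, \mathbf{T}) = \hat\beta(1, 0, \theta\mathbf{T}), \\
\hat\beta_B &= \tilde\beta_B(\mathbf{T}) = \hat\beta(0, 1, \mathbf{T}), \\
\tilde\beta_B(\theta\mathbf{T}) &= \hat\beta(0, 1, \theta\mathbf{T}), \\
\hat\beta_{GLS} &= \hat\beta(1, 1, \theta\mathbf{T}),
\end{aligned}
$$

where $\mathbf{T} = (T_1, \dots, T_N)$ and $\theta\mathbf{T} = (\theta_1T_1, \dots, \theta_NT_N)$.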
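The class (22) is straightforward to implement. The sketch below, assuming NumPy and simulated data with known variance components, defines a hypothetical function `beta_class` and checks some of the special cases listed above against directly computed OLS and GLS estimates.

```python
import numpy as np

def beta_class(lam_W, lam_B, v, X, y, ids):
    """Estimator (22) for scalar weights lam_W, lam_B and weight vector v."""
    N = len(v)
    xbar_i = np.vstack([X[ids == i].mean(axis=0) for i in range(N)])
    ybar_i = np.array([y[ids == i].mean() for i in range(N)])
    Xw, yw = X - xbar_i[ids], y - ybar_i[ids]
    dx = xbar_i - v @ xbar_i / v.sum()           # deviations from x~(v), eq. (21)
    dy = ybar_i - v @ ybar_i / v.sum()
    # W_XX b_W equals W_XY and B~_XX(v) b~_B(v) equals B~_XY(v), so the two
    # component estimators never need to be formed separately.
    lhs = lam_W * Xw.T @ Xw + lam_B * (dx.T * v) @ dx
    rhs = lam_W * Xw.T @ yw + lam_B * (dx.T * v) @ dy
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(3)
T = np.array([2, 4, 3, 6, 5])                    # unbalanced panel
N, n = len(T), T.sum()
ids = np.repeat(np.arange(N), T)
sigma2, sigma2_alpha = 1.0, 2.0                  # taken as known here
theta = sigma2 / (sigma2 + T * sigma2_alpha)
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, -0.5]) + 3.0
     + rng.normal(scale=np.sqrt(sigma2_alpha), size=N)[ids]
     + rng.normal(scale=np.sqrt(sigma2), size=n))

# Direct OLS and GLS slope estimates for comparison.
Xt = np.column_stack([X, np.ones(n)])
Omega = sigma2 * np.eye(n) + sigma2_alpha * (ids[:, None] == ids[None, :])
Oinv = np.linalg.inv(Omega)
ols = np.linalg.solve(Xt.T @ Xt, Xt.T @ y)[:2]
gls = np.linalg.solve(Xt.T @ Oinv @ Xt, Xt.T @ Oinv @ y)[:2]

checks = {
    "OLS = beta(1, 1, T)": np.allclose(ols, beta_class(1, 1, T, X, y, ids)),
    "GLS = beta(1, 1, theta*T)": np.allclose(gls, beta_class(1, 1, theta * T, X, y, ids)),
    "within is free of v": np.allclose(beta_class(1, 0, T, X, y, ids),
                                       beta_class(1, 0, theta * T, X, y, ids)),
    "degree zero in (lam_W, lam_B)": np.allclose(beta_class(2, 2, T, X, y, ids),
                                                 beta_class(1, 1, T, X, y, ids)),
}
print(checks)
```

In applications the $\theta_i$'s would be replaced by estimates, so the GLS member of the class becomes a feasible GLS estimator.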
In practical applications, the $\theta_i$'s have to be estimated, which requires estimation of $\sigma_\alpha^2$ and $\sigma^2$. This problem, for unbalanced panel data, is discussed in Searle, Casella, and McCulloch (1992, section 3.6) and Biørn (1999, section 3).
4 Conclusion
Our conclusions then are the following:
1. If we define a modified between estimator of $\beta$, (20), by choosing the weight $v_i$ such that the weighted equation in individual means, (18), has disturbances which are homoskedastic with variance $\sigma^2$, we obtain the between estimator for the unbalanced panel data set, $\tilde\beta_B(\theta\mathbf{T})$. Since $\mathrm{var}(\bar\epsilon_i) = \sigma_\alpha^2 + \sigma^2/T_i$, this choice implies $v_i = \theta_iT_i$ ($i = 1, \dots, N$).
2. The GLS estimator for the unbalanced case can be interpreted as a matrix weighted mean of $\hat\beta_W$ and $\tilde\beta_B(\theta\mathbf{T})$, with weights depending on $X$. Unlike the OLS estimator for the unbalanced case, it cannot, however, in general be interpreted as a matrix weighted mean of $\hat\beta_W$ and $\hat\beta_B$.
3. In the balanced case, in which $\mathbf{T} = (T, \dots, T)$, we have $\tilde B_{XX}(\theta\mathbf{T}) = \theta B_{XX}$ and $\tilde\beta_B(\theta\mathbf{T}) = \tilde\beta_B(\mathbf{T}) = \hat\beta_B$. This gives the familiar decomposition formula (17).
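The weight choice in the first conclusion can be verified with a few lines of arithmetic. The following sketch, assuming NumPy and arbitrary illustrative values for $T_i$, $\sigma^2$ and $\sigma_\alpha^2$, compares the disturbance variance of the weighted equation (18) under $v_i = T_i$ and under $v_i = \theta_iT_i$.

```python
import numpy as np

T = np.array([2, 4, 3, 6])                   # illustrative T_i (hypothetical)
sigma2, sigma2_alpha = 1.5, 0.7              # illustrative variance components
theta = sigma2 / (sigma2 + T * sigma2_alpha) # equation (6)

var_mean = sigma2_alpha + sigma2 / T         # variance of the mean disturbance
var_T_weights = T * var_mean                 # v_i = T_i, as in (13)
var_thetaT_weights = theta * T * var_mean    # v_i = theta_i * T_i
print(var_T_weights)
print(var_thetaT_weights)
```

With $v_i = T_i$ the variance equals $\sigma^2 + T_i\sigma_\alpha^2$ and so varies with $T_i$, whereas with $v_i = \theta_iT_i$ it equals $\sigma^2$ for every individual.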
References
Baltagi, B.H. (1985): Pooling Cross-Sections with Unequal Time-Series Lengths. Economics Letters, 18 (1985), 133–136.
Baltagi, B.H. (1995): Econometric Analysis of Panel Data. Chichester: Wiley, 1995.
Biørn, E. (1981): Estimating Economic Relations from Incomplete Cross-Section/Time-Series Data. Journal of Econometrics, 16 (1981), 221–236.
Biørn, E. (1999): Estimating Regression Systems from Unbalanced Panel Data: A Stepwise Maximum Likelihood Procedure. Department of Economics, University of Oslo, Memorandum No. 20/1999.
Hsiao, C. (1986): Analysis of Panel Data. Cambridge: Cambridge University Press, 1986.
Lütkepohl, H. (1996): Handbook of Matrices. Chichester: Wiley, 1996.
Maddala, G.S. (1977): Econometrics. Auckland: McGraw-Hill, 1977.
Searle, S.R., Casella, G., and McCulloch, C.E. (1992): Variance Components. New York: Wiley, 1992.