Small Sample Performance of Dynamic Panel Data Estimators: A Monte Carlo Study on the Basis of Growth Data

Nazrul Islam

Department of Economics

Emory University

Current Draft: August, 1998

---------------------------------------------------------------------

Initial versions of this paper were presented in seminars at Harvard University and Emory University. I would like to thank the participants of these seminars, whose comments helped improve this paper. All remaining errors are mine.

1. Introduction

This paper investigates the small sample properties of dynamic panel data estimators as applied to estimation of the growth-convergence equation using the Summers-Heston (1988, 1991) data set. It shows that, among the various estimators used for this purpose, those that do not use the lagged dependent variable as an instrument perform better than those that do. This result is useful in assessing the estimates of the convergence equation presented recently by different authors using different dynamic panel data estimators.

One of the issues around which the new growth literature has evolved is that of convergence. Statistically, convergence has been interpreted as a negative correlation between the initial level of income and the subsequent growth rate. Hence, the popular method for testing the convergence hypothesis has been to run growth-initial level regressions. For a long time, these regressions were estimated using cross-section data only. However, in the early nineties, it was observed that the cross-section approach suffers from certain important limitations in this regard. This led to the panel approach, and since then the use of panel data in estimating the convergence equation has become quite common.

Close scrutiny shows that the growth-initial level regression is actually a dynamic panel data model. This provides the rationale for using dynamic panel data estimators. The resulting panel estimates generally differ from the cross-section estimates. However, they also differ among themselves. This poses a problem because the known properties of the dynamic panel data estimators are mostly asymptotic, and hence their small sample properties are not known a priori. It is therefore difficult to judge which of the different panel estimates are more reasonable.

The Monte Carlo study presented in this paper is a response to this difficulty. It is customized to the data set and specification that are generally used in growth and convergence studies. The estimators included in this study are the instrumental variable (IV) estimators of Anderson and Hsiao (1981, 1982), the multivariate estimators, including the Minimum Distance estimator, suggested by Chamberlain (1982, 1983), and the generalized method of moments (GMM) estimators proposed by Arellano (1989a, 1989b) and Arellano and Bond (1991). In addition, we include conventional estimators like ordinary least squares (OLS), least squares with dummy variables (LSDV), and a few other simultaneous equations estimators.

Two things emerge from this exercise. First, wide variation is actually observed in the results from different dynamic panel estimators. Second, estimators that rely on further lagged values of the dependent variable as instruments perform worse than the estimators that do not. These two results together suggest that significant caution should accompany the presentation of results from any specific panel estimator. In fact, it seems desirable that researchers check the reliability of their panel results by conducting their own Monte Carlo experiments.

2. Previous Monte Carlo Studies

The issue of small sample properties of dynamic panel data estimators is not entirely new. There have been earlier attempts to investigate it. For example, Nerlove (1967) considers a simple autoregressive model with no exogenous variable, and compares the performance of OLS, LSDV, MLE, and several variants of GLS in estimating the model. In Nerlove (1971), the dynamic panel data model is extended to include an exogenous variable. This allows consideration of an instrumental variable (IV) estimator with lagged values of the exogenous variable as instruments. It also allows another variant of two-stage GLS. Overall, Nerlove’s Monte Carlo results favor GLS estimators over other estimators.

Since Nerlove’s work, there have been significant developments in the field of dynamic panel data estimators. Among these has been the introduction of the Anderson and Hsiao IV estimators, which use further lagged values of the dependent variable as instruments. Arellano has carried this idea forward and has proposed using all possible further lagged variables as instruments within a GMM framework. Arellano and Bond (1991) perform a Monte Carlo study to compare primarily the small sample properties of their GMM estimators with the corresponding properties of the Anderson-Hsiao estimators. According to their results, the GMM estimators perform better than the Anderson-Hsiao IV estimators, though not so much in terms of bias as in terms of dispersion.

3. Objectives of the Present Study

Given these earlier studies, what can be the motivation for the Monte Carlo study presented in this paper? There are several. First, Monte Carlo results have generally been found to be more useful when they are customized to the particular model, data set, and range of parameter values that pertain to the actual problem to which application of panel data estimators is being considered. Thus, in view of the controversy regarding convergence parameter estimates, it is useful to have Monte Carlo results that are specific to the growth-convergence equation, to the Summers-Heston growth data set, and to the pertinent range of parameter values. This is exactly what is done in this exercise. Second, the previous Monte Carlo studies have generally focused on panel data models with random individual effects. However, the individual effect that arises in the growth-convergence equation is of a correlated nature. There have been no Monte Carlo results, as far as we know, on models with correlated effects. Third, several important dynamic panel estimators do not find a place in the earlier Monte Carlo studies. Among these are the estimators that use the multivariate approach, like three stage least squares (3SLS) and generalized three stage least squares (G3SLS). In addition, Chamberlain’s MD estimator also remains outside the purview of the previous Monte Carlo studies. The MD estimator using the optimal weighting matrix, sometimes referred to as the Optimal Minimum Distance Estimator (OMDE), has been used as the benchmark for judging the theoretical efficiency of many of the GMM estimators proposed by Arellano. So it is of some interest that a Monte Carlo study cover these estimators as well. The present study takes a comprehensive view and includes almost all the known dynamic panel data estimators for evaluation. Fourth, the present study does not limit itself to only one particular generating mechanism of the transitory error term. Instead, it considers three different generating schemes, namely uncorrelated, autoregressive, and moving average. Finally, the study also investigates the impact of sample size on the relative performance of the estimators. This is very relevant for the problem at hand. The convergence equation has generally been estimated in the context of two different samples of countries, one small and the other large. It is therefore necessary to see how different estimators perform in samples of different size.

Model and Parameter Values

The Model

The dynamic panel data model that arises in the convergence literature is as follows:

$$y_{it} = \alpha y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + \nu_{it}. \tag{1}$$

Here $y_{it}$ represents the log of per capita GDP of country $i$ at time $t$, $y_{i,t-1}$ is the same lagged by one period, and $x_{i,t-1}$ is the difference in the log of the investment and population growth variables of country $i$ at time $t-1$. Finally, $\mu_i$ and $\eta_t$ are the individual and time effect terms, and $\nu_{it}$ is the transitory error, which varies across both individuals and time.

The most important issue for a panel model is specification of the individual effect term $\mu_i$. Growth theory implies that $\mu_i$ is correlated with the included explanatory variable $x_{i,t-1}$. This means that the random-effects assumption is not appropriate for this model. The most appropriate model is that of correlated effects. However, there are different ways to specify this correlation. Of the linear specifications, the simplest is the one proposed by Mundlak (1971), whereby the individual effect term is assumed to be a linear function of the mean (over time) of the exogenous variable for the individual concerned.¹

Chamberlain suggests a more general specification whereby the individual effect term is a linear function of the exogenous variable for all the time periods, with varying coefficients. This still leaves out the possibility that the specification is non-linear, but interpreted as a linear predictor it does not involve any restriction. According to this more general specification, we have

$$\mu_i = \lambda_0 + \lambda_1 x_{i0} + \lambda_2 x_{i1} + \cdots + \lambda_T x_{i,T-1} + \omega_i, \tag{2}$$

with $\omega_i$ distributed as $N(0, \sigma_\omega^2)$.²
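As a minimal sketch of how equations (1) and (2) can be turned into a data-generating process, the Python code below draws a correlated individual effect from the exogenous series and then builds $y_{it}$ recursively. The function name `simulate_panel`, the randomly drawn exogenous series `x`, the initial values `y0`, and all parameter values in the usage lines are illustrative assumptions rather than the settings of this study, and the transitory error is taken to be serially uncorrelated.

```python
import numpy as np

def simulate_panel(x, alpha, beta, lam0, lam, eta, sigma_omega, sigma_nu, y0, rng):
    """Generate y from equations (1) and (2); returns (y, mu).

    x   : (N, T) array holding x_{i0}, ..., x_{i,T-1} (treated as exogenous)
    lam : length-T array of lambda_1, ..., lambda_T
    eta : length-T array of time effects eta_1, ..., eta_T
    y0  : length-N array of initial values y_{i0}
    y   : (N, T+1) array with columns y_{i0}, ..., y_{iT}
    """
    n, t_len = x.shape
    # Equation (2): mu_i is linear in all the x's, plus omega_i.
    omega = rng.normal(0.0, sigma_omega, size=n)
    mu = lam0 + x @ lam + omega
    # Equation (1): recursive construction with serially uncorrelated nu_it.
    y = np.empty((n, t_len + 1))
    y[:, 0] = y0
    for t in range(1, t_len + 1):
        nu = rng.normal(0.0, sigma_nu, size=n)
        y[:, t] = alpha * y[:, t - 1] + beta * x[:, t - 1] + mu + eta[t - 1] + nu
    return y, mu

# Hypothetical usage: 96 units, five periods after the initial observation.
rng = np.random.default_rng(0)
N, T = 96, 5
x = rng.normal(size=(N, T))
y, mu = simulate_panel(x, alpha=0.79, beta=0.16, lam0=1.3,
                       lam=np.full(T, 0.1), eta=np.zeros(T),
                       sigma_omega=0.1, sigma_nu=0.1,
                       y0=rng.normal(size=N), rng=rng)
```

Because $\mu_i$ is built from the same `x` that enters equation (1), the individual effect is correlated with the regressor by construction, which is the feature that rules out the random-effects assumption.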

¹ However, if the transitory error term $\nu_{it}$ is serially uncorrelated, then this specification of the individual effect renders the random-effects model equivalent to the fixed-effects model.

² Another issue that arises here is that of dealing with the time effect term $\eta_t$. One possible approach in this regard is running a residual regression, which in this case would mean subtracting out the cross-section means from the variables included in the equation. The other approach is to include an appropriate set of time dummies.

Parameter Values

Considered in full, the model presented in equations (1) and (2) has three sets of parameters. Of foremost importance is, of course, the autoregressive parameter $\alpha$, followed by the slope parameter $\beta$ attached to the exogenous variable $x_{i,t-1}$. The second group consists of $\lambda_0, \lambda_1, \ldots, \lambda_T$, the parameters that arise from specification of the individual effect term $\mu_i$. Finally, the third set of parameters are those governing the error terms $\nu_{it}$ and $\omega_i$. Their number increases further when $\nu_{it}$ is considered to be serially correlated. For $\nu_{it}$, the following three processes are considered:

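1. Serially uncorrelated, with $\nu_{it} \sim N(0, \sigma_\nu^2)$.

2. MA(1) process: $\nu_{it} = \varepsilon_{it} + \theta \varepsilon_{i,t-1}$, with $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$.

3. AR(1) process: $\nu_{it} = \phi \nu_{i,t-1} + \varepsilon_{it}$, with $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$.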
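The three schemes can be sketched in a few lines of Python. The function name `draw_nu` and its arguments are hypothetical, and starting the AR(1) recursion from its stationary distribution is an assumption of this sketch; the text does not say how the first value is initialized.

```python
import numpy as np

def draw_nu(n, t_len, scheme, rng, sigma=1.0, theta=0.0, phi=0.0):
    """Draw an (n, t_len) array of transitory errors nu_it.

    scheme "iid": nu_it ~ N(0, sigma^2), so sigma plays the role of sigma_nu
    scheme "ma1": nu_it = eps_it + theta * eps_{i,t-1}, sigma is sigma_eps
    scheme "ar1": nu_it = phi * nu_{i,t-1} + eps_it,    sigma is sigma_eps
    """
    if scheme == "iid":
        return rng.normal(0.0, sigma, size=(n, t_len))
    if scheme == "ma1":
        # One extra column supplies eps_{i,0} for the first period.
        eps = rng.normal(0.0, sigma, size=(n, t_len + 1))
        return eps[:, 1:] + theta * eps[:, :-1]
    if scheme == "ar1":
        if abs(phi) >= 1:
            raise ValueError("a stationary start requires |phi| < 1")
        eps = rng.normal(0.0, sigma, size=(n, t_len))
        nu = np.empty((n, t_len))
        # Assumption: the first value is drawn from the stationary distribution.
        nu[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi ** 2)
        for t in range(1, t_len):
            nu[:, t] = phi * nu[:, t - 1] + eps[:, t]
        return nu
    raise ValueError(f"unknown scheme: {scheme}")
```

Under these definitions the variance of $\nu_{it}$ is $(1+\theta^2)\sigma_\varepsilon^2$ in the MA(1) case and $\sigma_\varepsilon^2/(1-\phi^2)$ in the stationary AR(1) case, which is why $\sigma_\nu$ and $\sigma_\varepsilon$ are separate parameters.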

The values of these parameters for simulation are chosen in a data-dependent manner. A three-step procedure is employed for this purpose. In the first step, we obtain consistent estimates of $\alpha$ and $\beta$ using the $x_{it}$’s as instruments. These are used to compute the composite residual $(\eta_t + \mu_i + \nu_{it})$. In the second step, these residuals are regressed on the $x_{it}$’s to get estimates of the $\lambda$’s and of the time dummies. In the third step, the residuals from this second regression are used to get the parameters of the different generating mechanisms of $\nu_{it}$.³ Table-1 gives the values of those parameters of the model that stay the same under all the different generating schemes of $\nu_{it}$.
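The text does not spell out how the third step is carried out. One simple possibility, shown here purely as an assumption, is to fit the AR(1) coefficient by a pooled regression of each residual on its own lag and to back out the MA(1) coefficient from the first-order autocorrelation $\rho_1 = \theta/(1+\theta^2)$. The function names below are hypothetical, and `resid` stands for an $N \times T$ array of second-step residuals.

```python
import numpy as np

def ar1_phi(resid):
    """Pooled least-squares slope of r_it on r_{i,t-1}, without intercept."""
    lagged, current = resid[:, :-1].ravel(), resid[:, 1:].ravel()
    return float(lagged @ current / (lagged @ lagged))

def first_autocorr(resid):
    """Pooled first-order autocorrelation of the residual panel."""
    lagged, current = resid[:, :-1].ravel(), resid[:, 1:].ravel()
    return float(np.corrcoef(lagged, current)[0, 1])

def ma1_theta(rho1):
    """Invertible root of rho1 = theta / (1 + theta^2)."""
    if rho1 == 0.0:
        return 0.0
    if abs(rho1) > 0.5:
        raise ValueError("an MA(1) process requires |rho1| <= 0.5")
    return float((1.0 - np.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1))
```

With only a handful of time periods per country these moment estimates are themselves noisy, so they are best read as rough calibration values.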

In growth-convergence studies, three different samples have been frequently used. Following Mankiw et al. (1992), these may be named NONOIL, INTER, and OECD. Of these, OECD is the smallest and consists of 22 OECD countries. NONOIL is the largest and consists of most of the sizable countries of the world for which oil extraction is not the dominant economic activity. This sample consists of 96 countries. Finally, INTER is an intermediate sample comprising all the countries included in NONOIL except those for which data quality is not satisfactory. This sample has 74 countries.

³ Details regarding these steps can be obtained from the author upon request.

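Table-1
Common Parameter Values

Parameter      NONOIL    INTER     OECD
$\alpha$       0.7886    0.7925    0.6294
$\beta$        0.1641    0.1732    0.0954
$\lambda_0$    1.3334    1.3588    2.8986

Rows $\lambda_1$ to $\lambda_5$ and $\eta_{70}$, $\eta_{75}$, $\eta_{80}$, $\eta_{85}$: the values are listed in source order, since their row and column assignment cannot be recovered from the flattened layout.

-0.0028   0.1927   0.5863   0.1200  -0.1098  -0.6354  -0.1243  -0.1644  -0.0702
 0.0267   0.2277   0.1286   0.6355   0.1715  -0.3484   0.0171   0.0093  -0.0156
-0.0015   0.0680   0.0827  -0.0067   0.0218  -0.0669  -0.0523   0.1295   0.1238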

Some notable aspects of these parameter values are as follows. First, there seems to be some agreement across samples regarding the direction in which the $x_{it}$’s for different years relate to the individual effect term $\mu_i$. However, this agreement is not complete. Second, the way different time periods affect the growth process differs across samples. There are some differences in this regard between the NONOIL and INTER samples, but more significant is the difference between these two samples, on the one hand, and OECD, on the other.

Table-2 presents the parameter values for the three different generating mechanisms of $\nu_{it}$. Notable aspects of this set of parameters are as follows. First, any serial dependence that $\nu_{it}$ may have in the actual data is of fairly low order. Second, the variance of the individual country effect term remains quite stable under the alternative generating schemes of $\nu_{it}$ for all the different samples. Finally, the estimate of the variance of $\nu_{it}$ also remains very similar. These suggest that the problem of serial dependence in the data is not serious, and the relative performance of different estimators may not vary widely across the different ways of modeling $\nu_{it}$. We assume that all the disturbance terms are normally distributed.

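Table-2
Parameter Values for Different Generating Mechanisms of $\nu_{it}$

Columns: NONOIL, INTER, OECD.
Parameters reported: $\sigma_\nu$, $\sigma_\omega$ (uncorrelated); $\theta$, $\sigma_\nu$, $\sigma_\omega$, $\sigma_\varepsilon$ (MA(1)); $\phi$, $\sigma_\nu$, $\sigma_\omega$, $\sigma_\varepsilon$ (AR(1)).
Within each block the values are listed in source order, since their row and column assignment cannot be recovered from the flattened layout.

Uncorrelated $\nu_{it}$:
0.1054  0.0872  0.0300  0.0139  0.0762  0.1281

MA(1) $\nu_{it}$:
0.2037  0.1179  0.1225  0.1250  0.1125  0.0990
0.0302  0.1010  0.0742  0.0980  0.0300  0.1153

AR(1) $\nu_{it}$:
0.2994  0.1227  0.1183  0.1171  0.1787  0.1394
0.0943  0.0319  0.0995  0.0742  0.0927  0.0316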

Estimators Considered and Related Issues

Estimators Included in the Study

The following is a complete list of the estimators included in this Monte Carlo study, together with the abbreviations used to refer to them in the remainder of the paper.

1. Ordinary Least Squares (OLS)

2. Least Squares with Dummy Variables (LSDV)

3. Anderson-Hsiao Instrumental Variable Estimator in Level Form (AH(l))

4. Anderson-Hsiao Instrumental Variable Estimator in Difference Form (AH(d))

5. Arellano GMM Estimator, One Step (AGMM1)

6. Arellano GMM Estimator, Two Step (AGMM2)

7. Two Stage Least Squares Estimator (2SLS)

8. Three Stage Least Squares Estimator (3SLS)

9. Generalized Three Stage Least Squares Estimator (G3SLS)

10. Minimum Distance Estimator (MD)

A brief description of these estimators, as applied to the model considered in this study, is provided in the Appendix. We can, therefore, proceed directly to the presentation of the Monte Carlo work.

Sample Size and Related Issues

Given a certain number of cross-sections available (i.e., given $T$), different panel data estimators can make use of different numbers of these in the final stage of estimation. In simulation, therefore, it is possible to adopt two different approaches. First, it is possible to keep the number of cross-sections actually used by the estimators the same by generating a varying number of cross-sections for different estimators. Second, the number of cross-sections generated may be kept the same, and the number of cross-sections actually used in the final stage of estimation by different estimators may be allowed to vary. It is the second situation that a researcher faces in actual practice. She confronts a given data set and then has to choose from among different estimators depending on their relative merits. In order to conform to this real situation, this study adopts the second approach. This also conforms to the fact that this Monte Carlo study is based on a given data set, namely the Summers-Heston growth data set. The panel based on this data set is considered at five-year intervals. Thus, the data available are for 1960, 1965, 1970, 1975, 1980, and 1985. Since this is a dynamic model as given by equation (1), the effective $T$ equals five.

Issues Particular to Individual Estimators

OLS:

OLS, the simplest of all the estimators considered, is applied to the equation in level form. Since the initial values of $y_{it}$ (i.e., those of 1960) are known, OLS can use all of the $T$ cross-sections in actual estimation. OLS, however, is not geared to providing estimates of either the $\lambda_t$’s or the $\mu_i$’s. It gives an estimate of the composite error term $u_{it}$ but cannot decompose it further into its component parts, $\mu_i$ and $\nu_{it}$. Also, OLS is inconsistent even with $T$ going to infinity.

LSDV:

LSDV is also applied to the equation in level form and, for the same reasons as with OLS, all of the $T$ cross-sections can be used in actual estimation. It can provide estimates of the $\mu_i$’s (viewed as parameters to be estimated) and hence of the $\lambda_t$’s, from a subsequent regression of the estimated $\mu_i$ on the $x_{it}$’s. Also, it can give estimates of the variances of the two error components separately. The between-group variation or, more directly, the variance of the estimated $\mu_i$’s can be used to get an estimate of $\sigma_\mu^2$. On the other hand, $\sigma_\nu^2$ may be obtained from the within-group variation. The LSDV estimator, though not consistent in the direction of $N$, is consistent when $T$ goes to infinity.
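The two LSDV steps described above can be sketched as follows. This is an illustration under simplifying assumptions, not the code used in the study: time effects are ignored, the country dummies are handled through the numerically equivalent within (demeaning) transformation, and the function names `lsdv` and `lambdas_from_mu` are hypothetical.

```python
import numpy as np

def lsdv(y, x):
    """LSDV estimates of (alpha, beta) and of the individual effects mu_i.

    y : (N, T+1) array, columns y_{i0}, ..., y_{iT}
    x : (N, T) array, columns x_{i0}, ..., x_{i,T-1}
    """
    dep, lag = y[:, 1:], y[:, :-1]

    def demean(a):
        # Subtracting country means is equivalent to one dummy per country.
        return a - a.mean(axis=1, keepdims=True)

    regressors = np.column_stack([demean(lag).ravel(), demean(x).ravel()])
    coef, *_ = np.linalg.lstsq(regressors, demean(dep).ravel(), rcond=None)
    # Each mu_i is recovered as the country mean of the level residual.
    mu_hat = (dep - coef[0] * lag - coef[1] * x).mean(axis=1)
    return coef, mu_hat

def lambdas_from_mu(mu_hat, x):
    """Second-step regression of the estimated mu_i on a constant and all x's."""
    design = np.column_stack([np.ones(len(mu_hat)), x])
    lam, *_ = np.linalg.lstsq(design, mu_hat, rcond=None)
    return lam  # lambda_0, lambda_1, ..., lambda_T
```

The second step needs more countries than there are $\lambda$ coefficients, which is worth keeping in mind for the smallest sample.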

AH(l) and AH(d):

Both these IV estimators apply to the model in first-differenced form. This results in the ‘loss’ of one cross-section from the actual estimation exercise. Furthermore, since in AH(l) $y_{i,t-2}$ is used as the instrument for $(y_{i,t-1} - y_{i,t-2})$, one more cross-section is lost in the process. Thus, if $T$ cross-sections are available, the number of cross-sections actually used is $(T-2)$. AH(l) is mainly geared to consistent estimation of the slope parameters $\alpha$ and $\beta$. It can also provide an estimate of $\sigma_\nu^2$. However, the other parameters of the model cannot be recovered using this estimator. AH(d) has features similar to those of AH(l). However, since AH(d) uses $(y_{i,t-2} - y_{i,t-3})$ as the instrument for $(y_{i,t-1} - y_{i,t-2})$, two cross-sections are lost in the process. Hence, with $T$ cross-sections available, only $(T-3)$ are used. When $T$ is small, this may be an important consideration.
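A compact sketch of the two Anderson-Hsiao estimators is given below. It is an illustration under stated assumptions rather than the implementation used in the study: time effects are ignored, $\Delta x_{i,t-1}$ is treated as exogenous and serves as its own instrument, and the estimator is the just-identified IV formula $(Z'X)^{-1}Z'\Delta y$. The function name `anderson_hsiao` is hypothetical, and the sketch pools every period for which the instrument is observed, which need not coincide with the accounting of cross-sections given in the text.

```python
import numpy as np

def anderson_hsiao(y, x, instrument="level"):
    """IV estimates of (alpha, beta) from
    dy_t = alpha * dy_{t-1} + beta * dx_{t-1} + dnu_t.

    y : (N, T+1) array, columns y_{i0}, ..., y_{iT}
    x : (N, T) array, columns x_{i0}, ..., x_{i,T-1}
    instrument="level": y_{i,t-2} instruments dy_{t-1}            (AH(l))
    instrument="diff" : y_{i,t-2} - y_{i,t-3} instruments dy_{t-1} (AH(d))
    """
    if instrument not in ("level", "diff"):
        raise ValueError("instrument must be 'level' or 'diff'")
    dy = np.diff(y, axis=1)   # dy[:, k] = y_{k+1} - y_k
    dx = np.diff(x, axis=1)   # dx[:, k] = x_{k+1} - x_k
    first = 2 if instrument == "level" else 3   # first period with an instrument
    t_last = y.shape[1] - 1
    dep, endog, exog, inst = [], [], [], []
    for t in range(first, t_last + 1):
        dep.append(dy[:, t - 1])      # y_t - y_{t-1}
        endog.append(dy[:, t - 2])    # y_{t-1} - y_{t-2}
        exog.append(dx[:, t - 2])     # x_{t-1} - x_{t-2}
        inst.append(y[:, t - 2] if instrument == "level" else dy[:, t - 3])
    dep = np.concatenate(dep)
    X = np.column_stack([np.concatenate(endog), np.concatenate(exog)])
    Z = np.column_stack([np.concatenate(inst), np.concatenate(exog)])
    return np.linalg.solve(Z.T @ X, Z.T @ dep)
```

Calling `anderson_hsiao(y, x, "level")` and `anderson_hsiao(y, x, "diff")` on the same simulated panel gives a direct comparison of the two instrument choices.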

AGMM1 and AGMM2:

Like the AH estimators, both AGMM1 and AGMM2 work with the model in first-differenced form. This results in the loss of one cross-section from the actual estimation process. AGMM1 is based on an arbitrary weighting matrix, whereas in AGMM2 the weighting matrix is constructed using the residuals from AGMM1 estimation. The details of the construction of the instrument matrix are sketched in the Appendix. Like the AH estimators, these estimators are also geared to estimation of $\alpha$ and $\beta$, and do not provide estimates of the remaining parameters.

2SLS, 3SLS, and G3SLS:

This set of estimators may be commonly referred to as Simultaneous Equation (SE) estimators. All of them apply to the model in first-differenced form. Thus, if $T$ is the number of cross-sections available, the number of equations used in estimation is $(T-1)$. These estimators are also geared to estimation of $\alpha$ and $\beta$.⁴ One issue in G3SLS estimation is which residuals to use for construction of the covariance matrix. The residuals can be obtained from either 2SLS or 3SLS or even from a generalized 2SLS. However, this does not change the asymptotic properties of the estimator. In this study, residuals from 2SLS are used for both 3SLS and G3SLS.

MD:

In contrast with the AH, AGMM, and SE estimators, MD works with the model in levels. Hence, all the cross-sections available can be used in estimation. Instead of eliminating $\mu_i$ through first differencing, as the other estimators do, MD specifies $\mu_i$ in terms of its correlation with the $x_{it}$’s, and uses this specification to substitute for $\mu_i$. Hence, this estimator can provide estimates not only of the slope parameters of the main dynamic equation but also of the $\lambda_t$’s. However, this also makes MD a non-linear estimation procedure. An important issue in MD estimation, therefore, is that of setting initial values of the parameters for the non-linear iteration to proceed. The task is made difficult by the fact that, in a simulation exercise, the true parameter values are known! A related question is whether to use the same set of initial values to start the iteration for each replication or to change it over replications. In this study, we resolve these issues in the following manner.

⁴ However, in principle it is possible to add the equation for $\mu_i$ to the system. This raises the number of equations to $T$ and allows estimates of the $\lambda_t$’s to be obtained. However, inclusion of the equation for $\mu_i$ in the system does not affect the asymptotic properties of the estimates of $\alpha$ and $\beta$ because there are no across-equation restrictions between the equation for $\mu_i$ and those for the $y_t$’s.

It is impossible to set ‘arbitrary’ initial values that are free from prior knowledge of the true parameter values because they are known! Hence, it is preferable to have these initial values result from a routine, for example, from a preliminary estimation procedure. This also resolves the other question and allows the initial values to be data-dependent and hence different across replications. However, the question that arises is which particular estimator to use for generating the initial parameter values. It is clear that LSDV is the most likely candidate because it is the only estimator other than MD that provides estimates not only of $\alpha$ and $\beta$ but also of the $\lambda_t$’s.

In MD estimation, there is also the issue of choosing the weighting matrix. There are several options in this regard and the optimality of the estimates depends on the choice. This study uses the optimal weighting matrix, which takes full account of possible heteroskedasticity across individuals.
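The cross-section counts stated for each estimator in this section can be collected in one place. The small Python helper below simply tabulates those statements; the function name `cross_sections_used` is hypothetical, and the AGMM and SE entries follow the text in counting the $(T-1)$ first-differenced equations.

```python
def cross_sections_used(T):
    """Number of cross-sections (or equations) each estimator uses, given T."""
    return {
        "OLS": T, "LSDV": T, "MD": T,         # applied in levels
        "AH(l)": T - 2, "AH(d)": T - 3,       # differencing plus lagged instrument
        "AGMM1": T - 1, "AGMM2": T - 1,       # first-differenced form
        "2SLS": T - 1, "3SLS": T - 1, "G3SLS": T - 1,
    }

years = [1960, 1965, 1970, 1975, 1980, 1985]
T = len(years) - 1            # the 1960 values serve as initial conditions
counts = cross_sections_used(T)
print(counts)
```

With the six Summers-Heston dates the effective $T$ is five, so AH(d) is left with only two cross-sections, which illustrates why the loss matters when $T$ is small.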

Simulation Results and Discussion

It is apparent from the discussion above that not all the panel estimators are geared to provide estimates of all the parameters of the model. Because of this, and also in order not to clutter the presentation with too many numerical results, we focus here on results concerning only $\alpha$ and $\beta$, the main parameters of the model.⁵

The simulation results presented in this paper are based on one thousand replications. In most cases, the Monte Carlo distributions had already stabilized with one hundred replications. Hence, increasing the number of replications any further was not necessary.

The detailed results regarding the Monte Carlo distributions obtained from the study are provided in three sets of tables in the Appendix, one for each of the three generating mechanisms of $\nu_{it}$. Tables A1 to A3 show results for the different samples when $\nu_{it}$ is serially uncorrelated. Similarly, Tables A4 to A6 give results when $\nu_{it}$ follows an MA(1) process. Finally, Tables A7 to A9 give results when $\nu_{it}$ obeys an AR(1) scheme.

The two criteria that are usually used in judging the performance of an estimator are bias and mean square error (MSE). In order to make assessment of the performance of the estimators easier, we present tables showing relative bias and relative magnitude of root mean square error. Tables 3 and 4 provide relative magnitudes of bias, and Tables 5 and 6 show relative magnitudes of root mean square error for the estimates of $\alpha$ and $\beta$, respectively. The following general features emerge from the simulation results.

⁵ Hence, we do not report the results regarding the error variances.
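For concreteness, the two criteria can be computed from a vector of replicated estimates as in the Python sketch below. The function name `bias_and_rmse` is hypothetical. The normalization used for the relative magnitudes (division by the absolute true value) is an assumption made here for illustration; the text does not state how the relative figures in Tables 3 to 6 are defined.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Bias and root mean square error of a Monte Carlo distribution."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return float(bias), float(rmse)

def relative(value, true_value):
    """Assumed normalization: express a magnitude relative to |true value|."""
    return value / abs(true_value)

# Hypothetical replications of an estimate of alpha whose true value is 0.8.
draws = np.array([0.70, 0.75, 0.85, 0.90])
bias, rmse = bias_and_rmse(draws, 0.8)
print(relative(bias, 0.8), relative(rmse, 0.8))
```

The identity MSE = bias² + variance (with the variance computed using divisor $n$) is what allows the root MSE to summarize the trade-off between bias and dispersion.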

First, a notable aspect of the results is the unsatisfactory performance of AH(l). The point estimates produced by this estimator fluctuate very widely, which shows up in the large standard deviations of the Monte Carlo distributions obtained for this estimator. The unsatisfactory performance of AH(l) contrasts sharply with the better performance of AH(d). Both these estimators rely on the assumption of orthogonality of lagged $y_{it}$ to $\nu_{it}$. This assumption holds only when $\nu_{it}$ is serially uncorrelated. Therefore, one would expect both estimators to perform well when $\nu_{it}$ is serially uncorrelated, and both to perform poorly when $\nu_{it}$ is either AR(1) or MA(1). However, as can be seen from the relevant tables, AH(d) performs relatively well under all the different generating mechanisms of $\nu_{it}$ and for all samples, while the performance of AH(l) is found to be unsatisfactory under all the different generating mechanisms of $\nu_{it}$, and particularly for the NONOIL and INTER samples. The explanation, as it turns out, lies in the difference in the degree to which the instruments are correlated with the explanatory variable. It is found that $(y_{i,t-2} - y_{i,t-3})$, the instrument used by AH(d), is strongly correlated with the explanatory variable $(y_{i,t-1} - y_{i,t-2})$, while $y_{i,t-2}$, the instrument used by AH(l), is very poorly correlated with $(y_{i,t-1} - y_{i,t-2})$. This poor correlation is reflected in astronomically large values of the standard error estimates for AH(l).⁶ These results testify to the general rule that, in order to be successful, it is not sufficient for an instrument to be uncorrelated with the error term; it also has to be adequately correlated with the explanatory variable for which it acts as an instrument.
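The relevance of the two instruments can be checked directly on any panel of $y_{it}$, real or simulated. The sketch below computes the pooled correlation of each instrument with $(y_{i,t-1} - y_{i,t-2})$ over the periods for which both instruments are observed. The function name `instrument_correlations` and the pooling over periods are assumptions of this illustration, not a description of how the correlations reported above were obtained.

```python
import numpy as np

def instrument_correlations(y):
    """Pooled correlation of each AH instrument with dy_{t-1} = y_{t-1} - y_{t-2}.

    y : (N, T+1) array, columns y_{i0}, ..., y_{iT}
    """
    dy = np.diff(y, axis=1)               # dy[:, k] = y_{k+1} - y_k
    t_last = y.shape[1] - 1
    periods = range(3, t_last + 1)        # both instruments exist from t = 3
    target = np.concatenate([dy[:, t - 2] for t in periods])
    level = np.concatenate([y[:, t - 2] for t in periods])    # AH(l) instrument
    diff = np.concatenate([dy[:, t - 3] for t in periods])    # AH(d) instrument
    return {
        "AH(l)": float(np.corrcoef(level, target)[0, 1]),
        "AH(d)": float(np.corrcoef(diff, target)[0, 1]),
    }
```

A correlation close to zero for one of the instruments is a warning sign that the corresponding IV estimate will be erratic, whatever its asymptotic justification.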

Second, as we noticed, the degree of serial correlation in $\nu_{it}$ is very mild. As suspected on the basis of a priori reasoning, this mildness of serial correlation imparts a general pattern to the simulation results. For most of the estimators, the performance does not vary to a great extent with respect to the alternative generating schemes of $\nu_{it}$. We have already seen this in the context of the AH estimators. Going by the results on $\alpha$, this is true even for AGMM1 and AGMM2. This is somewhat surprising because these estimators depend rather heavily for their validity on the orthogonality of lagged values of $y_{it}$ to $\nu_{it}$, and this orthogonality is violated when $\nu_{it}$ follows either an AR or an MA scheme.⁷

⁶ Computation of these standard errors involves the inverse of this correlation matrix.

Third, the performance of many of the estimators shows considerable variation with respect to the sample considered. However, it is difficult to establish a single pattern in this regard. For example, going by the results on the bias of estimated $\alpha$, the performance of OLS deteriorates for OECD when compared with that for either the NONOIL or INTER samples. However, in the case of LSDV and MD, the opposite is true. The AGMM and SEM estimators show yet a different kind of contrast. The performance of the AGMM estimators deteriorates for the INTER sample in comparison with that for either NONOIL or OECD. In the case of the SEM estimators, the opposite is true. The contrasting performance of the AGMM and SEM estimators may not be entirely surprising in view of the fact that, while the former depend on lagged $y_{it}$’s as instruments, the SEM estimators rely entirely on the $x_{it}$’s.

Fourth, comparison of the performance of 3SLS and G3SLS with that of 2SLS yields mixed results. In the NONOIL sample, regardless of the generating mechanism for $\nu_{it}$, results from the 3SLS and G3SLS estimators seem to be better than those from 2SLS. For the INTER sample, however, 2SLS seems to perform better than either 3SLS or G3SLS. In the case of the OECD sample, the situation is less clear cut. In terms of the mean of the Monte Carlo distribution, 3SLS and G3SLS fare better than 2SLS, though not in terms of dispersion. On the other hand, in the OECD sample, the Monte Carlo distributions for 2SLS have very large standard deviations.

One reason for the deterioration of the performance of the 3SLS and G3SLS estimators in the INTER and OECD samples, when compared with that in the NONOIL sample, may lie in sample size. The INTER and OECD samples are smaller than NONOIL. Since the superiority of 3SLS and G3SLS over 2SLS is an asymptotic result, a larger sample size may help this result to surface.

⁷ To be accurate, it needs to be mentioned that AH(d) does show some expected sensitivity with respect to the generating scheme of $\nu_{it}$. Why similar sensitivity is not found in the performance of the AGMM estimators is an interesting question.

Fifth, the MD estimator performs better than the AGMM estimators. With regard to $\alpha$, this is particularly true for the INTER and OECD samples. For $\beta$, this is true for all the different samples. In terms of bias, some of the SEM estimators perform slightly better than MD in some of the samples. Thus, for example, with regard to $\alpha$, 3SLS and G3SLS show less bias than MD in the NONOIL and INTER samples. However, in the case of OECD, the bias involved with the former estimators is much larger than with the latter. Regarding $\beta$, the MD estimator shows less bias than the SEM estimators for the INTER sample as well. Even for the NONOIL sample, the bias involved with MD is similar to, and sometimes less than, that with the SEM estimators.

The bias of the Monte Carlo distributions needs to be evaluated in conjunction with the corresponding dispersion. Mean Square Error (MSE) is the popular measure used to take into account the possible trade-off between bias and variance. Looking at the values of root MSE (presented in Tables 5 and 6) of the Monte Carlo distributions of the parameter estimates, we see that the MD estimator proves to be uniformly superior to the AGMM and SEM estimators. The only exception in this regard is the performance of 2SLS in estimating $\beta$ in the INTER sample. We earlier noted the somewhat erratic performance of AH(d). In terms of bias, it performs better than MD in some isolated cases. However, in terms of root MSE, with one lone exception, MD outperforms AH(d).

Thus, overall, among the $N$-consistent estimators, MD appears to be a more dependable estimator for estimating the growth convergence equation using the Summers-Heston data set. One problem noticed with the MD estimator for the problem at hand is its sensitivity with respect to the starting values of the iteration procedure. However, in the case of a single estimation exercise, it is possible to explore this sensitivity by looking at the minimum distance statistic of the converged results obtained from different starting values. Applying this procedure, it can be ensured that a global minimum is reached, and not a local one.

One surprising aspect of the results was the relatively better performance of LSDV. In terms of both bias and root MSE, and in the estimation of both $\alpha$ and $\beta$, LSDV proves to be a relatively superior estimator for the problem at hand. This is somewhat surprising given that theoretically LSDV is consistent only in the direction of $T$, and the data used in this study have a small $T$. What this shows is that small sample properties of estimators may be very specific to the model and data considered.

From another point of view, it is also interesting to note that simpler estimators such as LSDV and 2SLS (and in some cases, AH(d)) outperform more sophisticated estimators such as AGMM2 and some of the SEM estimators. This helps draw attention to the following important fact. The optimal properties of sophisticated estimators often depend on the use of an optimal weighting matrix. Unfortunately, however, these weighting matrices have to be estimated, and as a result they pick up, along with signal, noise present in the data. This noise gets transmitted to the final parameter estimates when these are produced using the estimated weighting matrices. To the extent that simpler estimators do not have to use estimated weighting matrices, they are spared this potential source of additional noise. This explains why sophisticated estimators may actually perform worse than simpler estimators, though this does not necessarily have to be the case.

Conclusions

Two kinds of results emerge from this investigation of small sample properties of dynamic panel estimators in the context of the growth convergence equation and Summers-Heston data. On the one hand, there are specific results concerning the problem at hand. On the other hand, there are more general aspects of these specific results. Among the specific results, the following stand out. First, parameter estimates produced by different dynamic panel data estimators are indeed found to vary widely. This should act as a caution for researchers who are increasingly turning to panel data methods in investigating growth and convergence issues. Use of panel data presents advantages that are not available within the confines of cross-section data. However, these advantages come with some problems. The possibility of small sample bias is one of them. It seems desirable that researchers using panel estimators check the small sample properties of the estimators in their particular contexts. Second, for estimation of the convergence equation using Summers-Heston data, it seems that, in general, estimators not relying on further lagged dependent variables as instruments perform better than those which do rely on them. Third, least squares with dummy variables proves very satisfactory for this particular model and particular data set.

Among the general aspects of these concrete results, the following may be noted. First, asymptotic equivalence of properties may indeed be misleading. Monte Carlo investigations help give some idea of the small sample behavior of the estimators. Second, small sample properties may sometimes be quite at variance with those suggested by the theoretical properties. This is exemplified by the performance of LSDV in this exercise. Third, although sophisticated estimators generally have more desirable theoretical properties, in practice their implementation requires use of estimated weighting matrices. Sampling variability and other noise picked up in estimating the weighting matrices may often cause the sophisticated estimators to perform worse than their simpler counterparts. Fourth, the results confirm the general rule that, to be successful, instruments should not only be uncorrelated with the error term but also sufficiently correlated with the variables for which they serve as instruments.


Table-3

Bias as Percentage of True Parameter Value: $\alpha$

Estimator  NONOIL  NONOIL  NONOIL  INTER   INTER   INTER   OECD    OECD    OECD
           UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)
OLS        14.8    14.6    14.8    15.2    15.2    15.4    21.5    20.9    21.2
LSDV       -8.0    -8.2    -7.9    -8.4    -9.3    -8.0    -1.6    -1.7    -1.4
AH(l)      n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.
AH(d)      0.4     -14.5   -15.9   0.2     -9.5    -10.0   0.6     -1.2    -1.6
AGMM1      -10.7   -10.4   -10.6   -44.4   -49.5   -43.4   -9.5    -8.6    -8.3
AGMM2      -9.7    -10.1   -10.2   -47.3   -49.5   -44.4   -8.6    -8.2    -8.5
2SLS       -9.3    -9.3    -8.6    -3.1    -3.1    -2.8    -18.8   -15.8   -17.1
3SLS       -4.5    -3.3    -6.5    -5.8    -7.9    -5.3    -12.7   -12.2   -12.2
G3SLS      -6.0    -5.4    -5.2    -8.3    -10.1   -8.8    -19.9   -16.5   -13.4
MD         -6.7    -6.9    -6.4    -6.9    -7.9    -6.7    -1.3    -1.1    -1.2

Table-4

Bias as Percentage of True Parameter Value: $\beta$

Estimator  NONOIL  NONOIL  NONOIL  INTER   INTER   INTER   OECD    OECD    OECD
           UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)
OLS        31.4    32.1    31.6    11.8    11.7    11.1    100.0   99.9    100.5
LSDV       1.0     -0.7    -0.3    0.5     1.4     1.1     -1.5    -0.7    -2.1
AH(l)      n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.
AH(d)      0.4     -2.2    -4.0    -0.6    -1.1    -1.0    -1.7    -2.5    -0.4
AGMM1      13.7    14.4    14.5    -7.5    16.3    26.4    -7.3    -5.4    -2.6
AGMM2      3.9     5.2     14.7    3.1     22.1    34.5    -17.3   -19.9   1.5
2SLS       -2.3    -2.7    -1.9    2.7     2.0     2.5     -0.8    -3.9    -3.9
3SLS       -0.2    -0.2    -2.0    -2.0    -2.3    9.3     -5.1    4.5     -8.9
G3SLS      -1.3    -2.4    -1.7    2.0     2.2     8.7     -14.2   -8.1    2.0
MD         0.2     -0.7    -0.8    1.0     0.5     0.0     -0.6    -0.6    1.3

Table-5

Root MSE as Percentage of True Parameter Value: $\alpha$

Estimator  NONOIL  NONOIL  NONOIL  INTER   INTER   INTER   OECD    OECD    OECD
           UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)
OLS        15.0    14.8    14.9    15.3    15.3    15.3    22.3    21.7    22.0
LSDV       8.5     8.7     8.5     8.9     9.9     8.7     3.5     3.6     3.6
AH(l)      n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.
AH(d)      8.3     16.6    17.6    5.4     13.9    13.3    7.3     7.2     7.5
AGMM1      27.7    27.5    26.7    64.8    70.4    65.9    24.3    21.9    23.7
AGMM2      29.6    29.3    28.9    79.7    84.9    77.1    32.9    29.1    31.0
2SLS       12.1    12.6    12.0    5.1     5.4     5.0     24.3    21.3    23.0
3SLS       8.5     14.7    10.4    9.6     11.1    8.9     28.4    28.4    23.6
G3SLS      10.0    18.1    8.7     11.9    13.8    12.6    40.9    37.6    29.3
MD         7.4     7.8     7.4     7.6     8.7     7.6     3.0     3.1     3.2

Table-6

Root MSE as Percentage of True Parameter Value: $\beta$

Estimator  NONOIL  NONOIL  NONOIL  INTER   INTER   INTER   OECD    OECD    OECD
           UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)
OLS        34.6    35.2    34.7    18.8    18.1    17.7    117.4   116.6   116.0
LSDV       12.8    15.3    15.4    12.4    14.5    14.3    40.1    44.9    43.8
AH(l)      n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.    n.c.
AH(d)      19.9    20.1    19.1    20.3    21.2    18.1    64.9    60.2    62.1
AGMM1      147.0   151.9   145.3   148.0   153.8   143.6   243.5   237.9   226.5
AGMM2      169.6   169.7   165.1   187.7   205.2   181.9   306.8   284.6   284.1
2SLS       17.1    18.4    17.5    19.5    20.8    19.3    58.3    54.7    57.9
3SLS       13.7    21.9    16.1    16.5    15.6    17.8    67.2    64.9    82.5
G3SLS      17.5    28.2    15.4    17.8    18.5    23.5    119.4   111.1   149.6
MD         13.3    15.4    15.8    12.6    15.1    14.4    40.5    44.6    45.6
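The percentage figures in Tables 3 to 6 can be related to the Monte Carlo means and standard deviations reported in the appendix tables. The sketch below assumes that bias is the mean parameter estimate minus the true value, and that root MSE is the square root of squared bias plus the variance of the estimates across replications. The function names are illustrative and do not come from the paper.

```python
import math


def bias_pct(mean_estimate, true_value):
    """Bias of the Monte Carlo mean as a percentage of the true value."""
    return 100.0 * (mean_estimate - true_value) / true_value


def rmse_pct(mean_estimate, sd_estimate, true_value):
    """Root MSE, sqrt(bias^2 + variance), as a percentage of the true value."""
    bias = mean_estimate - true_value
    return 100.0 * math.sqrt(bias ** 2 + sd_estimate ** 2) / true_value


# OLS estimate of alpha, NONOIL sample, uncorrelated errors (Table-A1):
# mean of PE = 0.9054, standard deviation of PE = 0.0149, true alpha = 0.7884.
TRUE_ALPHA = 0.7884
print(round(bias_pct(0.9054, TRUE_ALPHA), 1))          # 14.8, the Table-3 entry
print(round(rmse_pct(0.9054, 0.0149, TRUE_ALPHA), 1))  # 15.0, the Table-5 entry
```

Under these assumptions the two printed numbers agree with the OLS entries for NONOIL with uncorrelated errors in Table-3 and Table-5.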

Appendix on the Estimators

The Appendix provides a brief description of the estimators included in the Monte Carlo study presented in this paper. The details can be found in the appropriate references. In the following, we ignore the time effects, $\eta_t$, which can be dealt with in a fairly straightforward manner, as described earlier in the paper.

OLS

In OLS estimation, the unobserved individual effect term is completely ignored and estimation is carried out on the basis of the equation in levels. Let

$$y = (y_{11}, \ldots, y_{N1}, \ldots, y_{1T}, \ldots, y_{NT})',$$

$$y_{-1} = (y_{10}, \ldots, y_{N0}, \ldots, y_{1,T-1}, \ldots, y_{N,T-1})',$$

$$x = (x_{10}, \ldots, x_{N0}, \ldots, x_{1,T-1}, \ldots, x_{N,T-1})'.$$

Also, let $W = [\, y_{-1} \;\; x \,]$. Then the OLS estimator of the parameter vector $\gamma = (\alpha \;\; \beta)'$ is given by

$$\hat{\gamma} = (W'W)^{-1} W'y.$$

The standard errors under homoskedasticity are obtained from $Var(\hat{\gamma}) = s^2 (W'W)^{-1}$, with $s^2 = e'e/(NT-2)$, where $e = (y - W\hat{\gamma})$. The general heteroskedasticity consistent standard errors are obtained from $(W'W)^{-1} W'(ee')W (W'W)^{-1}$. Since $Cov(y_{i,t-1}, \mu_i) \neq 0$, the OLS estimator is biased. It is also inconsistent in the direction of both N and T.

LSDV

In LSDV, the assumption is that the individual effects are fixed. Hence the straightforward way of implementing LSDV is to insert the appropriate dummies and then apply OLS to the enlarged model, hence the name Least Squares with Dummy Variables. However, computationally, it is simpler to obtain LSDV through within estimation. Averaging equation (1) over time and subtracting it from the original equation yields

$$y_{it} - \bar{y}_i = \alpha (y_{i,t-1} - \bar{y}_{i,-1}) + \beta (x_{i,t-1} - \bar{x}_i) + (v_{it} - \bar{v}_i),$$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$, $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{i,t-1}$, and $\bar{v}_i = \frac{1}{T}\sum_{t=1}^{T} v_{it}$. LSDV estimation of the original model is equivalent to OLS estimation of the above equation.

Denote $\tilde{y}_{it} = (y_{it} - \bar{y}_i)$, $\tilde{y}_{i,t-1} = (y_{i,t-1} - \bar{y}_{i,-1})$, and $\tilde{x}_{i,t-1} = (x_{i,t-1} - \bar{x}_i)$. Also, denote

$$\tilde{y} = (\tilde{y}_{11}, \ldots, \tilde{y}_{N1}, \ldots, \tilde{y}_{1T}, \ldots, \tilde{y}_{NT})',$$

$$\tilde{y}_{-1} = (\tilde{y}_{10}, \ldots, \tilde{y}_{N0}, \ldots, \tilde{y}_{1,T-1}, \ldots, \tilde{y}_{N,T-1})',$$

$$\tilde{x} = (\tilde{x}_{10}, \ldots, \tilde{x}_{N0}, \ldots, \tilde{x}_{1,T-1}, \ldots, \tilde{x}_{N,T-1})'.$$

Now, if $\tilde{W} = [\, \tilde{y}_{-1} \;\; \tilde{x} \,]$, then the LSDV estimator is given by

$$\hat{\gamma} = (\tilde{W}'\tilde{W})^{-1} \tilde{W}'\tilde{y}.$$

Clearly, $Cov[(y_{i,t-1} - \bar{y}_{i,-1}), (v_{it} - \bar{v}_i)]$ is not equal to zero. This makes the LSDV estimator biased. It is also inconsistent in the direction of N. However, Amemiya (1967) shows that LSDV for this model is consistent as $T \to \infty$, and, if the errors are normally distributed, LSDV is asymptotically equivalent to MLE. The standard errors of the LSDV estimates under the assumption of homoskedasticity are obtained from $Var(\hat{\gamma}) = \tilde{s}^2 (\tilde{W}'\tilde{W})^{-1}$, with $\tilde{s}^2 = \tilde{e}'\tilde{e}/(NT - N - 2)$, where $\tilde{e} = (\tilde{y} - \tilde{W}\hat{\gamma})$.
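The within transformation described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the code used in the study: the data generator `simulate_panel`, the helper `solve_2x2`, and all parameter values are assumptions made for the example. Each individual's series is stored as `y[i] = [y_i0, ..., y_iT]` and `x[i] = [x_i0, ..., x_i,T-1]`.

```python
import random


def solve_2x2(a, b):
    """Solve the 2x2 linear system a g = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    g0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    g1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return g0, g1


def simulate_panel(n, periods, alpha, beta, sigma_v, seed=0):
    """Hypothetical generator: y_it = alpha*y_{i,t-1} + beta*x_{i,t-1} + mu_i + v_it."""
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)                             # individual effect
        xi = [rng.gauss(0.0, 1.0) for _ in range(periods)]   # x_i0 .. x_i,T-1
        yi = [rng.gauss(0.0, 1.0)]                           # initial value y_i0
        for t in range(1, periods + 1):
            v = sigma_v * rng.gauss(0.0, 1.0)
            yi.append(alpha * yi[t - 1] + beta * xi[t - 1] + mu + v)
        y.append(yi)
        x.append(xi)
    return y, x


def lsdv_within(y, x):
    """LSDV via the within transformation; returns (alpha_hat, beta_hat)."""
    dep, lag, exo = [], [], []
    for yi, xi in zip(y, x):
        periods = len(xi)
        y_bar = sum(yi[1:]) / periods       # mean of y_i1 .. y_iT
        ylag_bar = sum(yi[:-1]) / periods   # mean of y_i0 .. y_i,T-1
        x_bar = sum(xi) / periods           # mean of x_i0 .. x_i,T-1
        for t in range(1, periods + 1):
            dep.append(yi[t] - y_bar)
            lag.append(yi[t - 1] - ylag_bar)
            exo.append(xi[t - 1] - x_bar)
    # Normal equations (W'W) gamma = W'y with W = [lag, exo].
    cross = sum(a * b for a, b in zip(lag, exo))
    ww = [[sum(a * a for a in lag), cross],
          [cross, sum(b * b for b in exo)]]
    wy = [sum(a * d for a, d in zip(lag, dep)),
          sum(b * d for b, d in zip(exo, dep))]
    return solve_2x2(ww, wy)


y, x = simulate_panel(n=100, periods=5, alpha=0.8, beta=0.2, sigma_v=0.1, seed=1)
print(lsdv_within(y, x))
```

With the error variance set to zero the within regression recovers the parameters exactly, which is a convenient check of the transformation. With noise and a small number of periods the estimate of $\alpha$ is subject to the bias discussed above.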

Anderson and Hsiao Instrumental Variable (IV) Estimators

The two IV estimators proposed by Anderson and Hsiao (1981, 1982) both start by differencing equation (1) to eliminate the individual effect term $\mu_i$. This yields

$$y_{it} - y_{i,t-1} = \alpha (y_{i,t-1} - y_{i,t-2}) + \beta (x_{i,t-1} - x_{i,t-2}) + (v_{it} - v_{i,t-1}).$$

Because of correlation between $y_{i,t-1}$ and $v_{i,t-1}$, instruments are needed. Provided $v_{it}$ is serially uncorrelated, further lagged values of $y_{it}$ can act as necessary instruments.

AH(l)

Anderson and Hsiao (level) estimator uses $y_{i,t-2}$ as instrument. Denote $(y_{it} - y_{i,t-1}) = \tilde{y}_{it}$, $(y_{i,t-1} - y_{i,t-2}) = \tilde{y}_{i,t-1}$, and $(x_{i,t-1} - x_{i,t-2}) = \tilde{x}_{i,t-1}$. Further, let

$$vec(\tilde{y}_{it})_l = [\tilde{y}_{12}, \ldots, \tilde{y}_{N2}, \ldots, \tilde{y}_{1T}, \ldots, \tilde{y}_{NT}]',$$

$$vec(\tilde{x}_{i,t-1})_l = [\tilde{x}_{11}, \ldots, \tilde{x}_{N1}, \ldots, \tilde{x}_{1,T-1}, \ldots, \tilde{x}_{N,T-1}]',$$

$$vec(\tilde{y}_{i,t-1})_l = [\tilde{y}_{11}, \ldots, \tilde{y}_{N1}, \ldots, \tilde{y}_{1,T-1}, \ldots, \tilde{y}_{N,T-1}]',$$

$$vec(y_{i,t-2})_l = [y_{10}, \ldots, y_{N0}, \ldots, y_{1,T-2}, \ldots, y_{N,T-2}]'.$$

Define $W_l = [\, vec(\tilde{y}_{i,t-1})_l \;\; vec(\tilde{x}_{i,t-1})_l \,]$, and $Z_l = [\, vec(y_{i,t-2})_l \;\; vec(\tilde{x}_{i,t-1})_l \,]$. Then the AH(l) estimator of $\gamma$ is given by

$$\hat{\gamma} = (Z_l'W_l)^{-1} Z_l'\, vec(\tilde{y}_{it})_l.$$

The asymptotic standard errors under the assumption of homoskedasticity are obtained from the formula $Var(\hat{\gamma}) = s_l^2 (Z_l'W_l)^{-1} Z_l'Z_l (W_l'Z_l)^{-1}$, where $s_l^2 = (e_l'e_l)/[N(T-1) - 2]$ with $e_l = vec(\tilde{y}_{it})_l - W_l\hat{\gamma}$. The general heteroskedasticity consistent standard errors are obtained from $(Z_l'W_l)^{-1} Z_l'(e_l e_l')Z_l (W_l'Z_l)^{-1}$.

AH(d)

In AH(d), the proposed instrument is $(y_{i,t-2} - y_{i,t-3})$ instead of $y_{i,t-2}$. Continuing with the notations above, let

$$vec(\tilde{y}_{it})_d = [\tilde{y}_{13}, \ldots, \tilde{y}_{N3}, \ldots, \tilde{y}_{1T}, \ldots, \tilde{y}_{NT}]',$$

$$vec(\tilde{x}_{i,t-1})_d = [\tilde{x}_{12}, \ldots, \tilde{x}_{N2}, \ldots, \tilde{x}_{1,T-1}, \ldots, \tilde{x}_{N,T-1}]',$$

$$vec(\tilde{y}_{i,t-1})_d = [\tilde{y}_{12}, \ldots, \tilde{y}_{N2}, \ldots, \tilde{y}_{1,T-1}, \ldots, \tilde{y}_{N,T-1}]',$$

$$vec(y_{i,t-2})_d = [y_{11}, \ldots, y_{N1}, \ldots, y_{1,T-2}, \ldots, y_{N,T-2}]',$$

$$vec(y_{i,t-3})_d = [y_{10}, \ldots, y_{N0}, \ldots, y_{1,T-3}, \ldots, y_{N,T-3}]'.$$

Define $W_d = [\, vec(\tilde{y}_{i,t-1})_d \;\; vec(\tilde{x}_{i,t-1})_d \,]$, and $Z_d = [\, (vec(y_{i,t-2})_d - vec(y_{i,t-3})_d) \;\; vec(\tilde{x}_{i,t-1})_d \,]$. Then the AH(d) estimator of $\gamma$ is given by

$$\hat{\gamma} = (Z_d'W_d)^{-1} Z_d'\, vec(\tilde{y}_{it})_d.$$

The asymptotic standard errors under the assumption of homoskedasticity are obtained from $Var(\hat{\gamma}) = s_d^2 (Z_d'W_d)^{-1} Z_d'Z_d (W_d'Z_d)^{-1}$, where $s_d^2 = (e_d'e_d)/[N(T-2) - 2]$ with $e_d = vec(\tilde{y}_{it})_d - W_d\hat{\gamma}$. The general heteroskedasticity consistent standard errors are obtained from $(Z_d'W_d)^{-1} Z_d'(e_d e_d')Z_d (W_d'Z_d)^{-1}$.

It is apparent from the above details of construction of instrument matrices that if the total number of available cross-sections is T , and the initial values of the dependent variable are known or can be estimated, then the numbers of cross-sections actually used by AH(l) and AH(d) are (T-2) and (T-3) respectively. Thus one cross-section is ‘lost’ when AH(d) is chosen over AH(l). When the number of available cross-sections is small, this may be an important issue.
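The two Anderson-Hsiao estimators and the loss of one cross-section under AH(d) can be illustrated with a short sketch. It is written under the assumption that each individual's data are stored as `y[i] = [y_i0, ..., y_iT]` and `x[i] = [x_i0, ..., x_i,T-1]`, so that the number of available cross-sections is `len(y[i])`. The function names and the small noiseless demonstration panel are hypothetical and serve only to check the algebra.

```python
import random


def dot(u, w):
    """Inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, w))


def anderson_hsiao(y, x, variant="level"):
    """Just-identified IV on the differenced equation.

    variant="level" uses y_{i,t-2} as instrument (AH(l));
    any other value uses (y_{i,t-2} - y_{i,t-3}) (AH(d)).
    Returns (alpha_hat, beta_hat, number_of_observations_used).
    """
    first_t = 2 if variant == "level" else 3
    dep, endog, exog, instr = [], [], [], []
    for yi, xi in zip(y, x):
        last_t = len(yi) - 1
        for t in range(first_t, last_t + 1):
            dep.append(yi[t] - yi[t - 1])          # differenced dependent variable
            endog.append(yi[t - 1] - yi[t - 2])    # differenced lagged y
            exog.append(xi[t - 1] - xi[t - 2])     # differenced x, its own instrument
            if variant == "level":
                instr.append(yi[t - 2])
            else:
                instr.append(yi[t - 2] - yi[t - 3])
    # gamma_hat = (Z'W)^{-1} Z'y with W = [endog, exog] and Z = [instr, exog].
    zw = [[dot(instr, endog), dot(instr, exog)],
          [dot(exog, endog), dot(exog, exog)]]
    zy = [dot(instr, dep), dot(exog, dep)]
    det = zw[0][0] * zw[1][1] - zw[0][1] * zw[1][0]
    alpha_hat = (zy[0] * zw[1][1] - zw[0][1] * zy[1]) / det
    beta_hat = (zw[0][0] * zy[1] - zw[1][0] * zy[0]) / det
    return alpha_hat, beta_hat, len(dep)


def demo_panel(n=30, periods=5, alpha=0.8, beta=0.2, seed=3):
    """Hypothetical noiseless panel, used only to check the estimator algebra."""
    rng = random.Random(seed)
    y, x = [], []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)
        xi = [rng.gauss(0.0, 1.0) for _ in range(periods)]
        yi = [rng.gauss(0.0, 1.0)]
        for t in range(1, periods + 1):
            yi.append(alpha * yi[t - 1] + beta * xi[t - 1] + mu)
        y.append(yi)
        x.append(xi)
    return y, x


y, x = demo_panel()
print(anderson_hsiao(y, x, "level"))
print(anderson_hsiao(y, x, "difference"))
```

In this layout a panel with six cross-sections (`y_i0` to `y_i5`) contributes four differenced equations per individual to AH(l) and three to AH(d), which is the one-cross-section loss noted above.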

Anderson and Hsiao show that both these IV estimators have desirable asymptotic properties in the directions of both $N \to \infty$ and $T \to \infty$. If $v_{it}$ is normally distributed, then these estimators are asymptotically equivalent to the ML estimators. Further, these desirable asymptotic properties of the AH estimators do not depend on the assumptions regarding the initial values of $y_{it}$.

3SLS and G3SLS

Dynamic panel data models can be viewed as a simultaneous system of T equations with equations distinguished by t. Thus, conventional simultaneous equations estimators can be applied to the model, and Chamberlain (1982, 1983) suggests use of a generalized version of 3SLS. Like Anderson-Hsiao estimators, simultaneous equations estimators also begin by first differencing the model to eliminate the individual effect term $\mu_i$. The system therefore looks as follows.

$$
\begin{aligned}
y_{iT} - y_{i,T-1} &= \alpha (y_{i,T-1} - y_{i,T-2}) + \beta (x_{i,T-1} - x_{i,T-2}) + (v_{iT} - v_{i,T-1}) \\
&\;\;\vdots \\
y_{i2} - y_{i1} &= \alpha (y_{i1} - y_{i0}) + \beta (x_{i1} - x_{i0}) + (v_{i2} - v_{i1})
\end{aligned}
$$

However, unlike AH and Arellano estimators, simultaneous equations estimators use only $x_{it}$'s as instruments. If observations on $y_{i0}$'s are not available, then the system may be completed by adding a specification of $y_{i0}$ in terms of $x_{it}$'s as follows.

$$y_{i0} = \phi_0 + \phi_1 x_{i0} + \cdots + \phi_T x_{i,T-1} + \upsilon_i$$

3SLS estimation can proceed the usual way with the weighting variance-covariance matrix estimated from the residuals from initial 2SLS estimation. For generalized 3SLS, the required weighting matrix is given by $E(v_i^0 v_i^{0\prime} \otimes x_i x_i')$, where $v_i^0$'s are residuals obtained using true values of the parameters. In actual implementation, this is approximated by its sample analog $N^{-1}\sum_{i=1}^{N} (\hat{v}_i \hat{v}_i' \otimes x_i x_i')$, where $\hat{v}_i$'s are residuals from initial 2SLS or 3SLS estimation. Thus, the G3SLS estimator makes a heteroskedasticity correction to the conventional 3SLS estimator.

Minimum Distance (Chamberlain)

Most of the dynamic panel data estimators begin by eliminating $\mu_i$ through first differencing the model. In this regard, the MD estimator is distinctive because instead of eliminating or ignoring $\mu_i$, it tries to specify it. This allows the MD estimator to work with the model in its level form.

One simple specification of $\mu_i$ is that suggested by Mundlak (1971), whereby $\mu_i$ is taken to be a linear function of $\bar{x}_i$, the mean of the exogenous variable for individual i. The purpose of Mundlak’s specification was however to show that under this specification of $\mu_i$, the GLS estimator under random effects assumption becomes equivalent to the LSDV estimator under fixed effects assumption.

Noting that Mundlak’s suggested specification is overly restrictive, Chamberlain suggests a more general specification whereby $\mu_i$ depends linearly on $x_{it}$ for all time periods for individual i with varying coefficients. According to this specification,

$$\mu_i = \lambda_0 + \lambda_1 x_{i0} + \lambda_2 x_{i1} + \cdots + \lambda_T x_{i,T-1} + \omega_i,$$

with $\omega_i$ being the uncorrelated error term. If the initial values, $y_{i0}$'s, are not available, a similar specification, as the following, can be used for these as well.

$$y_{i0} = \phi_0 + \phi_1 x_{i0} + \phi_2 x_{i1} + \cdots + \phi_T x_{i,T-1} + \upsilon_i$$

The procedure starts by recursive substitution for the lagged dependent variable, $y_{i,t-1}$. With T = 5, as is the case with the concrete problem studied in this paper, this results in the following system of equations:

$$
\begin{aligned}
y_1 &= \beta x_0 + \alpha y_0 + \mu + v_1 \\
y_2 &= \alpha\beta x_0 + \beta x_1 + \alpha^2 y_0 + (\mu + \alpha\mu) + (v_2 + \alpha v_1) \\
y_3 &= \alpha^2\beta x_0 + \alpha\beta x_1 + \beta x_2 + \alpha^3 y_0 + (\mu + \alpha\mu + \alpha^2\mu) + (v_3 + \alpha v_2 + \alpha^2 v_1) \\
y_4 &= \alpha^3\beta x_0 + \alpha^2\beta x_1 + \alpha\beta x_2 + \beta x_3 + \alpha^4 y_0 + (\mu + \alpha\mu + \alpha^2\mu + \alpha^3\mu) + (v_4 + \alpha v_3 + \alpha^2 v_2 + \alpha^3 v_1) \\
y_5 &= \alpha^4\beta x_0 + \alpha^3\beta x_1 + \alpha^2\beta x_2 + \alpha\beta x_3 + \beta x_4 + \alpha^5 y_0 + (\mu + \alpha\mu + \alpha^2\mu + \alpha^3\mu + \alpha^4\mu) + (v_5 + \alpha v_4 + \alpha^2 v_3 + \alpha^3 v_2 + \alpha^4 v_1)
\end{aligned}
$$

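In matrix form, this system of equations may be expressed as follows:

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix}
=
\begin{bmatrix}
\beta & 0 & 0 & 0 & 0 \\
\alpha\beta & \beta & 0 & 0 & 0 \\
\alpha^2\beta & \alpha\beta & \beta & 0 & 0 \\
\alpha^3\beta & \alpha^2\beta & \alpha\beta & \beta & 0 \\
\alpha^4\beta & \alpha^3\beta & \alpha^2\beta & \alpha\beta & \beta
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix} \alpha \\ \alpha^2 \\ \alpha^3 \\ \alpha^4 \\ \alpha^5 \end{bmatrix} y_0
+
\begin{bmatrix} 1 \\ 1 + \alpha \\ 1 + \alpha + \alpha^2 \\ 1 + \alpha + \alpha^2 + \alpha^3 \\ 1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4 \end{bmatrix} \mu
+
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}
$$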

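In this system, the right side variables include, other than $y_0$ and $\mu$, only $x_t$'s. The $u_t$'s are composite error terms consisting of $v_t$'s. It is now possible to substitute for $y_0$ and $\mu$, using their respective specifications above. This yields the following matrix of the reduced form coefficients:

$$
\Pi =
\begin{bmatrix}
\beta & 0 & 0 & 0 & 0 \\
\alpha\beta & \beta & 0 & 0 & 0 \\
\alpha^2\beta & \alpha\beta & \beta & 0 & 0 \\
\alpha^3\beta & \alpha^2\beta & \alpha\beta & \beta & 0 \\
\alpha^4\beta & \alpha^3\beta & \alpha^2\beta & \alpha\beta & \beta
\end{bmatrix}
+
\begin{bmatrix} \alpha \\ \alpha^2 \\ \alpha^3 \\ \alpha^4 \\ \alpha^5 \end{bmatrix} \phi'
+
\begin{bmatrix} 1 \\ 1 + \alpha \\ 1 + \alpha + \alpha^2 \\ 1 + \alpha + \alpha^2 + \alpha^3 \\ 1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4 \end{bmatrix} \lambda',
$$

where $\phi' = (\phi_0 \; \phi_1 \; \phi_2 \; \phi_3 \; \phi_4 \; \phi_5)$, and $\lambda' = (\lambda_0 \; \lambda_1 \; \lambda_2 \; \lambda_3 \; \lambda_4 \; \lambda_5)$. If $x_t$'s are strictly exogenous, then they are also uncorrelated with the $u_t$'s, and the reduced form equations can be estimated by applying OLS.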
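The mapping from the structural coefficients to the reduced form matrix can be checked numerically. The sketch below is an illustration only: it ignores the intercepts $\phi_0$ and $\lambda_0$, stores the remaining coefficients as lists `phi = [phi_1, ..., phi_T]` and `lam = [lambda_1, ..., lambda_T]`, and uses hypothetical function names. It builds $\Pi$ element by element and compares $\Pi x$ with the values obtained by running the recursion directly with all error terms set to zero.

```python
def reduced_form_matrix(alpha, beta, phi, lam):
    """Pi = B + a phi' + c lam' for T = len(phi) periods (intercepts ignored)."""
    periods = len(phi)
    pi = []
    for t in range(1, periods + 1):                 # row for the y_t equation
        a_t = alpha ** t                            # coefficient on y_0
        c_t = sum(alpha ** k for k in range(t))     # coefficient on mu
        row = []
        for s in range(periods):                    # column for x_s
            b_ts = beta * alpha ** (t - 1 - s) if s <= t - 1 else 0.0
            row.append(b_ts + a_t * phi[s] + c_t * lam[s])
        pi.append(row)
    return pi


def recursive_y(alpha, beta, phi, lam, x):
    """y_1..y_T from the recursion, with y_0 = phi'x, mu = lam'x and no errors."""
    y_prev = sum(p * v for p, v in zip(phi, x))
    mu = sum(l * v for l, v in zip(lam, x))
    ys = []
    for t in range(1, len(x) + 1):
        y_prev = alpha * y_prev + beta * x[t - 1] + mu
        ys.append(y_prev)
    return ys


# Arbitrary illustrative values, not estimates from the paper.
phi = [0.3, 0.1, 0.0, -0.2, 0.4]
lam = [0.5, -0.1, 0.2, 0.0, 0.1]
x = [1.0, -2.0, 0.5, 3.0, 1.5]
pi = reduced_form_matrix(0.8, 0.2, phi, lam)
via_pi = [sum(p * v for p, v in zip(row, x)) for row in pi]
via_recursion = recursive_y(0.8, 0.2, phi, lam, x)
print(max(abs(a - b) for a, b in zip(via_pi, via_recursion)))  # numerically zero
```

With T = 5 the function returns a 5 by 5 matrix, that is, twenty-five reduced form coefficients generated by twelve underlying parameters.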

Ignoring the intercept term, the $\Pi$-matrix above contains twenty-five elements which are non-linear functions of twelve underlying coefficients, namely $(\alpha \; \beta \; \lambda_1 \; \lambda_2 \; \lambda_3 \; \lambda_4 \; \lambda_5 \; \phi_1 \; \phi_2 \; \phi_3 \; \phi_4 \; \phi_5)$, which may be denoted by vector $\theta'$. The task of MD estimation is to squeeze out an estimate of $\theta$ from $\hat{\Pi}$ through the following minimization:

$$\hat{\theta} = \arg\min \, (vec\,\hat{\Pi} - g(\theta))' A_N (vec\,\hat{\Pi} - g(\theta)).$$

Here $g(\theta)$ is the vector-valued function mapping the elements of $\theta$ into $vec(\Pi)$. Chamberlain shows that the optimal choice for the weighting matrix $A_N$ is the inverse of

$$\Omega = E[(y_i - \Pi^0 x_i)(y_i - \Pi^0 x_i)' \otimes \Phi_x^{-1}(x_i x_i')\Phi_x^{-1}],$$

where $\Pi^0$ is the matrix of true coefficients, and $\Phi_x = E(x_i x_i')$. $\Omega$ is a heteroskedasticity consistent weighting matrix. In actual implementation, it is replaced by its consistent sample analog

$$\hat{\Omega} = N^{-1}\sum_{i=1}^{N} [(y_i - \hat{\Pi} x_i)(y_i - \hat{\Pi} x_i)' \otimes S_x^{-1}(x_i x_i')S_x^{-1}],$$

where $S_x = \sum_{i=1}^{N} x_i x_i'/N$. Minimization can be conducted through an iterative routine. In the current study, the modified Gauss-Newton algorithm was used for the purpose. Note that if $y_{i0}$'s are known then the $\phi$'s do not appear in the system, and the dimension of the estimation problem decreases.

AGMM1 and AGMM2

Extending Anderson-Hsiao’s idea, Arellano proposes using as instruments all further lagged values of $y_{it}$ that qualify as instruments in view of serial uncorrelatedness of $v_{it}$. The number of instruments then differs depending on the time period for which the equation is considered. Arellano uses the GMM framework to adopt a multi-equation approach and use all the qualified instruments for each equation. The resulting block diagonal matrix of instruments, $Z_i$, can be seen as follows:

$$
Z_i = \left[\begin{array}{cccccccccc|c}
y_{i0} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & x_{i1} - x_{i0} \\
0 & y_{i0} & y_{i1} & 0 & 0 & 0 & \cdots & 0 & \cdots & 0 & x_{i2} - x_{i1} \\
0 & 0 & 0 & y_{i0} & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 & x_{i3} - x_{i2} \\
\vdots & & & & & & \ddots & & & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & y_{i0} & \cdots & y_{i,T-2} & x_{i,T-1} - x_{i,T-2}
\end{array}\right]
$$

Let $v$ denote the $N(T-1) \times 1$ stacked vector of differenced error term $(v_{it} - v_{i,t-1})$, and $Z$ denote the $N(T-1) \times m$ stacked matrix of the instrument matrices $Z_i$'s, where m is the number of columns in the instrument matrix $Z_i$. Then the GMM estimator is given by

$$\hat{\gamma} = \arg\min \, (v'Z) A_N (Z'v),$$

where $A_N$ is an appropriate weighting matrix. If $y$ denotes the stacked vector $(y_{it} - y_{i,t-1})$, and $W$ denotes the stacked matrix $[(y_{i,t-1} - y_{i,t-2}) \;\; (x_{i,t-1} - x_{i,t-2})]$, then the formula for the GMM estimator from the above minimization is given by

$$\hat{\gamma} = (W'Z A_N Z'W)^{-1} W'Z A_N Z'y.$$

The one-step GMM estimator (AGMM1) is obtained by setting $A_N = (N^{-1}\sum_i Z_i' H Z_i)^{-1}$, where H is an arbitrary square matrix of dimension (T-2).⁸ In AGMM2, the weighting matrix $A_N$ is replaced by $\hat{V}_{N1} = (N^{-1}\sum_i Z_i' \hat{v}_i \hat{v}_i' Z_i)^{-1}$, where $\hat{v}_i$ are residuals from AGMM1 estimation. Thus AGMM2 introduces heteroskedasticity correction to the weighting matrix.

The general heteroskedasticity consistent standard errors of the estimates are obtained from $(W'Z \hat{V}_N Z'W)^{-1}$, where $\hat{V}_N$ is estimated from the corresponding residuals. The standard errors for the AGMM1 estimates under the assumption of homoskedasticity are given by

$$(W'Z A_N Z'W)^{-1} (W'Z A_N \hat{V}_{N1}^{-1} A_N Z'W) (W'Z A_N Z'W)^{-1},$$

where $\hat{V}_{N1}$ is estimated using residuals from AGMM1 estimation.

⁸ Arellano suggests an H which has 2’s in the main diagonal, –1 in the first sub-diagonals, and zeros elsewhere. Note that (T-2), the dimension of this matrix, is the number of time periods for which the equation can be estimated.
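The structure of the instrument matrix $Z_i$ and of the matrix H can be made concrete with a short sketch. The function names below are hypothetical, and the dimension of H is simply taken to be the number of rows of $Z_i$ (the number of differenced equations available for individual i), which is an assumption about how the periods are counted. The last function computes the summand $Z_i' H Z_i$ that enters the one-step weighting matrix.

```python
def arellano_h(n):
    """n x n matrix with 2 on the main diagonal, -1 on the adjacent diagonals."""
    return [[2 if r == c else -1 if abs(r - c) == 1 else 0
             for c in range(n)] for r in range(n)]


def instrument_matrix(y_levels, x_levels):
    """Z_i for one individual.

    y_levels = [y_i0, ..., y_i,T-2], x_levels = [x_i0, ..., x_i,T-1].
    The row for the equation in period t (t = 2..T) holds y_i0..y_i,t-2 in its
    own block of columns, followed by x_i,t-1 - x_i,t-2 in the last column.
    """
    n_rows = len(x_levels) - 1
    n_y_cols = n_rows * (n_rows + 1) // 2
    z, start = [], 0
    for r in range(n_rows):                 # r = 0 is the equation for t = 2
        row = [0.0] * (n_y_cols + 1)
        for k in range(r + 1):              # instruments y_i0 .. y_ir
            row[start + k] = y_levels[k]
        row[-1] = x_levels[r + 1] - x_levels[r]
        z.append(row)
        start += r + 1
    return z


def z_h_z(z, h):
    """Z_i' H Z_i, the summand of the one-step weighting matrix."""
    rows, cols = len(z), len(z[0])
    return [[sum(z[r][a] * h[r][s] * z[s][b]
                 for r in range(rows) for s in range(rows))
             for b in range(cols)] for a in range(cols)]


z_i = instrument_matrix([1.0, 2.0, 3.0], [10.0, 11.0, 13.0, 16.0])
for row in z_i:
    print(row)
print(arellano_h(len(z_i)))
```

Averaging `z_h_z(z_i, h)` over individuals and inverting the result would give the AGMM1 weighting matrix $A_N$ described above. The inversion is left out to keep the sketch free of external libraries.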


Appendix

Tables Showing Details of the Monte Carlo Distributions

Table-A1

Simulation Results for NONOIL with Uncorrelated $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7884 and $\beta$ = 0.1641

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9054      0.0149           0.0117          0.0006
           $\beta$    0.2157      0.0235           0.0181          0.0009
LSDV       $\alpha$   0.7251      0.0207           0.0202          0.0013
           $\beta$    0.1624      0.0209           0.0209          0.0008
AH(l)      $\alpha$   0.9555      11.535           492.05          142092
           $\beta$    0.1636      0.4714           15.368          407.01
AH(d)      $\alpha$   0.7917      0.0652           0.0844          0.0117
           $\beta$    0.1647      0.0326           0.0338          0.0023
AGMM1      $\alpha$   0.7041      0.2017           0.0324          0.0169
           $\beta$    0.1866      0.2401           0.0381          0.0178
AGMM2      $\alpha$   0.7123      0.2206           0.0335          0.0216
           $\beta$    0.1705      0.2783           0.0397          0.0220
2SLS       $\alpha$   0.7150      0.0614           0.0960          0.0114
           $\beta$    0.1603      0.0278           0.0319          0.0015
3SLS       $\alpha$   0.7532      0.0572           0.0481          0.0107
           $\beta$    0.1638      0.0225           0.0212          0.0017
G3SLS      $\alpha$   0.7412      0.0627           0.0412          0.0090
           $\beta$    0.1620      0.0286           0.0172          0.0024
MD         $\alpha$   0.7354      0.0240           0.0358          0.0088
           $\beta$    0.1644      0.0218           0.0202          0.0023

Notes:
1. PE = Parameter Estimate; SE = Standard Error of Estimate
2. No. of replications = 1000

Table-A2

Simulation Results for INTER with Uncorrelated $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7925 and $\beta$ = 0.1732

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9131      0.0161           0.0114          0.0007
           $\beta$    0.1936      0.0253           0.0184          0.0019
LSDV       $\alpha$   0.7262      0.0245           0.0231          0.0017
           $\beta$    0.1741      0.0215           0.0215          0.0009
AH(l)      $\alpha$   0.2001      10.576           492.04          7704.5
           $\beta$    0.1405      0.4778           10.086          167.80
AH(d)      $\alpha$   0.7942      0.0428           0.0584          0.0060
           $\beta$    0.1721      0.0351           0.0356          0.0023
AGMM1      $\alpha$   0.4406      0.3741           0.0568          0.0341
           $\beta$    0.1603      0.2560           0.0340          0.0169
AGMM2      $\alpha$   0.4179      0.5081           0.0650          0.0581
           $\beta$    0.1785      0.3251           0.0382          0.0325
2SLS       $\alpha$   0.7680      0.0324           0.0525          0.0042
           $\beta$    0.1778      0.0334           0.0346          0.0016
3SLS       $\alpha$   0.7464      0.0606           0.0532          0.0098
           $\beta$    0.1766      0.0283           0.0217          0.0022
G3SLS      $\alpha$   0.7269      0.0677           0.0425          0.0095
           $\beta$    0.1766      0.0307           0.0165          0.0030
MD         $\alpha$   0.7380      0.0247           0.0318          0.0091
           $\beta$    0.1749      0.0217           0.0189          0.0027

Table-A3

Simulation Results for OECD with Uncorrelated $\nu_{it}$

True Parameter Values: $\alpha$ = 0.6294 and $\beta$ = 0.0954

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.7644      0.0377           0.0223          0.0028
           $\beta$    0.1908      0.0587           0.0350          0.0045
LSDV       $\alpha$   0.6195      0.0193           0.0188          0.0022
           $\beta$    0.0940      0.0382           0.0389          0.0030
AH(l)      $\alpha$   0.6281      0.0330           0.0567          0.0093
           $\beta$    0.0927      0.0573           0.0581          0.0052
AH(d)      $\alpha$   0.6329      0.0456           0.0673          0.0106
           $\beta$    0.0938      0.0619           0.0648          0.0069
AGMM1      $\alpha$   0.5659      0.1405           0.0358          0.0216
           $\beta$    0.0884      0.2322           0.0455          0.0280
AGMM2      $\alpha$   0.5755      0.1998           0.0399          0.0282
           $\beta$    0.0789      0.2922           0.0510          0.0394
2SLS       $\alpha$   0.5110      0.0966           0.2089          0.0496
           $\beta$    0.0946      0.0556           0.1104          0.0094
3SLS       $\alpha$   0.5492      0.1601           0.0731          0.0343
           $\beta$    0.0905      0.0639           0.0432          0.0077
G3SLS      $\alpha$   0.5044      0.2254           0.0276          0.0179
           $\beta$    0.0818      0.1131           0.0153          0.0071
MD         $\alpha$   0.6214      0.0172           0.0114          0.0167
           $\beta$    0.0948      0.0384           0.0184          0.0134

Table-A4

Simulation Results for NONOIL with MA(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7884 and $\beta$ = 0.1641

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9038      0.0156           0.0120          0.0006
           $\beta$    0.2167      0.0239           0.0186          0.0009
LSDV       $\alpha$   0.7241      0.0241           0.0215          0.0013
           $\beta$    0.1630      0.0251           0.0224          0.0009
AH(l)      $\alpha$   0.2783      5.1662           41.247          184.55
           $\beta$    0.0918      0.3746           2.3581          14.054
AH(d)      $\alpha$   0.6743      0.0636           0.0776          0.0093
           $\beta$    0.1605      0.0328           0.0322          0.0017
AGMM1      $\alpha$   0.7068      0.2011           0.0310          0.0137
           $\beta$    0.1878      0.2481           0.0389          0.0209
AGMM2      $\alpha$   0.7085      0.2164           0.0317          0.0144
           $\beta$    0.1727      0.2783           0.0404          0.0279
2SLS       $\alpha$   0.7150      0.0663           0.0990          0.0125
           $\beta$    0.1597      0.0299           0.0329          0.0015
3SLS       $\alpha$   0.7621      0.0562           0.0543          0.0103
           $\beta$    0.1638      0.0249           0.0238          0.0018
G3SLS      $\alpha$   0.7456      0.0684           0.0461          0.0091
           $\beta$    0.1601      0.0264           0.0189          0.0023
MD         $\alpha$   0.7341      0.0280           0.0415          0.0098
           $\beta$    0.1630      0.0253           0.0225          0.0026

Notes:
1. PE = Parameter Estimate; SE = Standard Error of Estimate

Table-A5

Simulation Results for INTER with MA(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7925 and $\beta$ = 0.1732

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9129      0.0161           0.0114          0.0007
           $\beta$    0.1935      0.0253           0.0184          0.0019
LSDV       $\alpha$   0.7192      0.0245           0.0231          0.0017
           $\beta$    0.1756      0.0215           0.0215          0.0009
AH(l)      $\alpha$   0.9703      1.8658           8.7633          24.020
           $\beta$    0.1717      0.0945           0.4108          1.1130
AH(d)      $\alpha$   0.7169      0.0805           0.0981          0.0152
           $\beta$    0.1713      0.0367           0.0360          0.0023
AGMM1      $\alpha$   0.3999      0.3968           0.0635          0.0404
           $\beta$    0.2015      0.2649           0.0417          0.0255
AGMM2      $\alpha$   0.4000      0.5466           0.0734          0.0672
           $\beta$    0.2114      0.3534           0.0479          0.0426
2SLS       $\alpha$   0.7679      0.0357           0.0551          0.0045
           $\beta$    0.1767      0.0358           0.0363          0.0018
3SLS       $\alpha$   0.7295      0.0617           0.0606          0.0125
           $\beta$    0.1771      0.0268           0.0245          0.0022
G3SLS      $\alpha$   0.7128      0.0750           0.0496          0.0120
           $\beta$    0.1770      0.0318           0.0185          0.0028
MD         $\alpha$   0.7296      0.0283           0.0379          0.0112
           $\beta$    0.1740      0.0262           0.0215          0.0030

Table-A6

Simulation Results for OECD with MA(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.6294 and $\beta$ = 0.0954

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.7606      0.0375           0.0221          0.0028
           $\beta$    0.1907      0.0573           0.0345          0.0045
LSDV       $\alpha$   0.6189      0.0200           0.0196          0.0023
           $\beta$    0.0947      0.0428           0.0405          0.0032
AH(l)      $\alpha$   0.6294      0.0333           0.0562          0.0089
           $\beta$    0.0995      0.0580           0.0579          0.0042
AH(d)      $\alpha$   0.6221      0.0444           0.0629          0.0096
           $\beta$    0.0930      0.0574           0.0609          0.0065
AGMM1      $\alpha$   0.5754      0.1268           0.0347          0.0181
           $\beta$    0.0903      0.2269           0.0457          0.0288
AGMM2      $\alpha$   0.5780      0.1756           0.0380          0.0272
           $\beta$    0.0764      0.2708           0.0499          0.0352
2SLS       $\alpha$   0.5297      0.0895           0.2102          0.0503
           $\beta$    0.0917      0.0521           0.1096          0.0087
3SLS       $\alpha$   0.5525      0.0614           0.0746          0.0293
           $\beta$    0.0997      0.0618           0.0442          0.0068
G3SLS      $\alpha$   0.5256      0.2128           0.0260          0.0290
           $\beta$    0.0877      0.1057           0.0154          0.0103
MD         $\alpha$   0.6222      0.0184           0.0120          0.0140
           $\beta$    0.0948      0.0425           0.0188          0.0200

Table-A7

Simulation Results for NONOIL with AR(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7884 and $\beta$ = 0.1641

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9047      0.0152           0.0119          0.0006
           $\beta$    0.2160      0.0234           0.0185          0.0009
LSDV       $\alpha$   0.7260      0.0246           0.0214          0.0013
           $\beta$    0.1636      0.0253           0.0224          0.0009
AH(l)      $\alpha$   0.6399      38.928           1998.8          465000
           $\beta$    0.1916      2.6155           125.10          3333.0
AH(d)      $\alpha$   0.6628      0.0595           0.0724          0.0078
           $\beta$    0.1576      0.0307           0.0312          0.0015
AGMM1      $\alpha$   0.7050      0.1933           0.0306          0.0143
           $\beta$    0.1879      0.2372           0.0377          0.0178
AGMM2      $\alpha$   0.7082      0.2131           0.0314          0.0170
           $\beta$    0.1882      0.2699           0.0393          0.0209
2SLS       $\alpha$   0.7206      0.0664           0.0985          0.0120
           $\beta$    0.1610      0.0286           0.0329          0.0015
3SLS       $\alpha$   0.7621      0.0562           0.0543          0.0103
           $\beta$    0.1638      0.0249           0.0238          0.0018
G3SLS      $\alpha$   0.7456      0.0684           0.0461          0.0091
           $\beta$    0.1601      0.0264           0.0189          0.0023
MD         $\alpha$   0.7377      0.0288           0.0429          0.0111
           $\beta$    0.1628      0.0259           0.0224          0.0026

Table-A8

Simulation Results for INTER with AR(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.7925 and $\beta$ = 0.1732

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.9146      0.0154           0.0115          0.0006
           $\beta$    0.1925      0.0237           0.0186          0.0010
LSDV       $\alpha$   0.7288      0.0272           0.0238          0.0017
           $\beta$    0.1751      0.0246           0.0223          0.0009
AH(l)      $\alpha$   0.6981      5.7315           87.777          784.71
           $\beta$    0.1629      0.3521           3.5420          43.393
AH(d)      $\alpha$   0.7134      0.0702           0.0871          0.0126
           $\beta$    0.1714      0.0313           0.0332          0.0020
AGMM1      $\alpha$   0.4487      0.3929           0.0604          0.0423
           $\beta$    0.2189      0.2444           0.0387          0.0299
AGMM2      $\alpha$   0.4407      0.4996           0.0665          0.0541
           $\beta$    0.2330      0.3093           0.0428          0.0333
2SLS       $\alpha$   0.7704      0.0333           0.0532          0.0041
           $\beta$    0.1776      0.0331           0.0350          0.0016
3SLS       $\alpha$   0.7507      0.0576           0.0591          0.0155
           $\beta$    0.1893      0.0263           0.0221          0.0020
G3SLS      $\alpha$   0.7230      0.0715           0.0488          0.0175
           $\beta$    0.1882      0.0378           0.0165          0.0032
MD         $\alpha$   0.7392      0.0277           0.0361          0.0107
           $\beta$    0.1732      0.0250           0.0202          0.0029

Notes:
1. PE = Parameter Estimate; SE = Standard Error of Estimate

Table-A9

Simulation Results for OECD with AR(1) $\nu_{it}$

True Parameter Values: $\alpha$ = 0.6294 and $\beta$ = 0.0954

Estimator  Parameter  Mean of PE  Std. Dev. of PE  Mean of SE(PE)  Std. Dev. of SE(PE)
OLS        $\alpha$   0.7629      0.0378           0.0221          0.0028
           $\beta$    0.1913      0.0553           0.0346          0.0044
LSDV       $\alpha$   0.6208      0.0208           0.0192          0.0022
           $\beta$    0.0934      0.0417           0.0400          0.0032
AH(l)      $\alpha$   0.6319      0.0337           0.0555          0.0087
           $\beta$    0.0965      0.0586           0.0571          0.0049
AH(d)      $\alpha$   0.6192      0.0459           0.0649          0.0095
           $\beta$    0.0950      0.0593           0.0630          0.0061
AGMM1      $\alpha$   0.5770      0.1399           0.0350          0.0200
           $\beta$    0.0929      0.2161           0.0453          0.0288
AGMM2      $\alpha$   0.5762      0.1877           0.0378          0.0251
           $\beta$    0.0968      0.2710           0.0495          0.0384
2SLS       $\alpha$   0.5215      0.0964           0.2136          0.0525
           $\beta$    0.0917      0.0551           0.1105          0.0090
3SLS       $\alpha$   0.5529      0.1275           0.0778          0.0337
           $\beta$    0.0869      0.0782           0.0446          0.0061
G3SLS      $\alpha$   0.5449      0.1643           0.0274          0.0157
           $\beta$    0.0973      0.1427           0.0165          0.0079
MD         $\alpha$   0.6219      0.0185           0.0120          0.0238
           $\beta$    0.0966      0.0435           0.0191          0.0161

References

Amemiya, T. (1967), “A Note on the Estimation of Balestra-Nerlove Models,” Technical

Report No. 4, Institute of Mathematical Studies in Social Sciences, Stanford

University.

Amemiya, T. (1971), “Estimation of the Variance in a Variance-Component Model,”

International Economic Review , 12:1-13.

Anderson, T. W. and C. Hsiao (1981), “Estimation of Dynamic Models with Error

Components,” Journal of American Statistical Association , 76:598-606.

Anderson, T. W. and C. Hsiao (1982), “Formulation and Estimation of Dynamic Models

Using Panel Data,” Journal of Econometrics , 18:47-82.

Arellano, M. (1989a), “On the Efficient Estimation of Simultaneous Equations with

Covariance Restrictions,” Journal of Econometrics , 42:247-265.

Arellano, M. (1989b), “An Efficient GLS Estimation of Triangular Models with

Covariance Restrictions,” Journal of Econometrics , 42:267-273.

Arellano, M. and S. Bond (1991), “Some Tests of Specification for Panel Data: Monte

Carlo Evidence and an Application to Employment Equations,” The Review of

Economic Studies , 58:277-297.

Balestra, P. and M. Nerlove (1966), “Pooling Cross-section and Time Series Data in the

Estimation of a Dynamic Model: The Demand of Natural Gas,” Econometrica ,

34:585-612.

Bhargava, A. and J. D. Sargan (1983), “Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods,” Econometrica , 51:1635-1659.

37

Chamberlain, G. (1982), “Multivariate Regression Models for Panel Data,” Journal of Econometrics, 18:5-46.

Chamberlain, G. (1983), “Panel Data,” in Z. Griliches and M. Intriligator (editors), Handbook of Econometrics, Vol. II, 1247-1318.

Mankiw, N. G., D. Romer, and D. Weil (1992), “A Contribution to the Empirics of Growth,” Quarterly Journal of Economics, CVII: 407-437.

Nerlove, M. (1967), “Experimental Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections,” Economic Studies Quarterly, 18:42-74.

Nerlove, M. (1971), “Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-sections,” Econometrica, 39:383-396.

Nickell, S. (1981), “Biases in Dynamic Models with Fixed Effects,” Econometrica, 49:1417-1426.

Sevestre, P. and A. Trognon (1982), “A Note on Autoregressive Error Components Models,” 8209, Ecole Nationale de la Statistique et de l’Administration Economique et Unite de Recherche.

Summers, R. and A. Heston (1988), “A New Set of International Comparisons of Real Product and Price Levels Estimates for 130 Countries, 1950-85,” Review of Income and Wealth, XXXIV: 1-26.

Summers, R. and A. Heston (1991), “The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950-1988,” Quarterly Journal of Economics, 106: 327-368.

Abstract

This paper investigates the small sample properties of dynamic panel data estimators as applied to estimation of the growth-convergence equation using the Summers-Heston data set. The Monte Carlo results show, first, that estimates from different estimators do vary. Second, those dynamic panel estimators that do not use further lags of the dependent variable as instruments perform better than the ones that do. Together, these results suggest that investigators using panel estimators should check for possible small sample bias in their results. (JEL classification: C3, O4)
