PANEL DATA MODELS
INTRODUCTION
Fixed-effects and random-effects regression models are used to analyze panel data (also called
longitudinal data). Panel data is a combination of cross-section and time series data. To collect
panel data you collect data on the same units for two or more time periods. For example, you
might collect data on the same individuals, firms, school districts, cities, states, nations, etc. for
each year over the period 1995 to 1999.
There are two major benefits from using panel data. 1) Panel data allows you to get more
reliable estimates of the parameters of a model. There are several possible reasons for this. a)
Panel data allows you to control for unobservable factors that vary across units but not over
time, and unobservable factors that vary over time but not across units. This can substantially
reduce estimation bias. b) There is usually more variation in panel data than in cross-section or
time-series data. The greater the variation in the explanatory variables, the more precise the
estimates. c) There is usually less multicollinearity among explanatory variables when using
panel data than time-series or cross-section data alone. This also results in more precise
parameter estimates. 2) Panel data allows you to identify and measure effects that cannot be
identified and measured using cross-sectional data or time-series data. For example, suppose
that your objective is to estimate a production function to obtain separate estimates of
economies of scale and technological change for a particular industry. If you have cross-section
data, you can obtain an estimate of economies of scale, but you can’t obtain an estimate of
technological change. If you have time-series data you cannot separate economies of scale
from technological change. To attempt to separate economies of scale from technological
change, past time-series studies have assumed constant returns to scale; however, this is a very
dubious procedure. If you have panel data, you can identify and measure both economies of
scale and technological change.
Example
Your objective is to use a sample of data on working-age adults to obtain an unbiased estimate
of the effect of education on the wage. You believe that the most important variables that
affect the wage are education, work experience, and innate ability. Experience is an observable
confounding variable. Because you can obtain data for experience, you can control for it by
including it as an explanatory variable in your model. Innate ability is an unobservable
confounding variable. Because you can’t observe innate ability and collect data for it, you can’t
control for it by including it as an explanatory variable. However, you believe that innate ability
differs across working-age adults, but is constant over time. Therefore, if you can collect panel
data on wage, education, and experience, you can specify a fixed-effects model and statistically
control for innate ability.
Example
Your objective is to analyze the relationship between income, health insurance
coverage, and health care spending in the U.S. You want to determine if income and health
insurance have an effect on health care spending, and if so the direction and size of the effects.
You have data on healthcare spending per capita, income per capita, and the percent of the
population with health insurance for 50 states for the years 1991 to 2000. You believe there are
many factors that affect health care spending and are correlated with income and health
insurance. Using panel data, you can control for any such factors that vary across states but not
over time, and vary over time but not across states.
FIXED-EFFECTS REGRESSION MODEL
Specification
Consider an economic relationship that involves a dependent variable, Y, two observable
explanatory variables, X1 and X2, and one or more unobservable confounding variables. You
have panel data for Y, X1, and X2. The panel data consists of N-units and T-time periods, and
therefore you have N times T observations. The classical linear regression model without an
intercept is given by
Yit = β1Xit1 + β2Xit2 + μit
for i = 1, 2, …, N and t = 1, 2, …, T
where Yit is the value of Y for the ith unit for the tth time period; Xit1 is the value of X1 for the ith
unit for the tth time period, Xit2 is the value of X2 for the ith unit for the tth time period, and μit
is the error for the ith unit for the tth time period.
The fixed effects regression model, which is an extension of the classical linear
regression model, is given by
Yit = β1Xit1 + β2Xit2 + νi + εit
where μit = νi + εit. The error term for the classical linear regression model is decomposed into
two components. The component νi represents all unobserved factors that vary across units but
are constant over time. The component εit represents all unobserved factors that vary across
units and time.
It is assumed that the net effect on Y of unobservable factors for the ith unit that are
constant over time is a fixed parameter, designated αi. Therefore, the fixed effects model can
be rewritten as
Yit = β1Xit1 + β2Xit2 + αi + εit
The unobserved error component νi has been replaced with a set of fixed parameters, α1, α2,
…, αN, one parameter for each of the N units in the sample. These parameters are called
unobserved effects and represent unobserved heterogeneity. For example, α1 represents the
net effect on Y of unobservable factors that are constant over time for unit one, α2 for unit two,
…, αN for unit N. Therefore, in the fixed-effects model each unit in the sample has its own
intercept. These N intercepts control for the net effects of all unobservable factors that differ
across units but are constant over time.
Example
For the wage determination model, Yit is the wage for the ith working adult for the tth time
period; Xit1 is education for the ith working adult for the tth time period, Xit2 is experience for
the ith working adult for the tth time period, and αi is the effect of innate ability on wage for
the ith working adult, assuming that innate ability is the only unobservable factor that affects
wage that differs across working adults, but is constant over time. Suppose you have a sample
of N = 1,000 working-age adults for T = 3 years. Therefore, you have N×T = 3,000 observations.
This fixed-effects model has 1,002 regression coefficients (1,000 fixed-effects intercepts and 2
slope coefficients), and therefore 3,000 – 1,002 = 1,998 degrees of freedom.
Example
For the health care spending model, Yit is healthcare spending per capita for the ith state for the
tth year; Xit1 is income per capita for the ith state for the tth year, Xit2 is health insurance
coverage for the ith state for the tth year, and αi is the fixed effect for the ith state. You have a
sample of N = 50 states for T = 10 years. Therefore, you have N×T = 500 observations. The fixed-effects model has 52 regression coefficients (50 fixed-effects intercepts and 2 slope coefficients), and therefore 500 – 52 = 448 degrees of freedom.
Estimation of the Fixed Effects Model
Two alternative but equivalent estimators can be used to estimate the parameters of the fixed
effects model. 1) Least squares dummy variable estimator. 2) Fixed effects estimator.
Least Squares Dummy Variable Estimator
The least squares dummy variable estimator involves two steps. In step #1, create a dummy
variable for each of the N units in the sample. These N dummy variables are defined as follows.
Dkit = 1 if k = i
Dkit = 0 if k ≠ i
In step #2, run a regression of the dependent variable on the N dummy variables and the
explanatory variables using the OLS estimator. For a model with N units and two explanatory
variables, the step #2 regression equation without an intercept is
Yit = α1D1it + α2D2it + … + αNDNit + β1Xit1 + β2Xit2 + εit
or with an intercept is
Yit = α1 + α2D2it + … + αNDNit + β1Xit1 + β2Xit2 + εit
The least-squares dummy variable regression can be run with dummy variables for all units and
no intercept, or dummy variables for N-1 units with an intercept. This yields estimates of the N
fixed-effects intercept parameters and the slope parameters.
The estimates of the intercept and slope parameters are unbiased in small samples. The
estimates of the slope parameters β1 and β2 are consistent in large samples with a fixed T as
N → ∞. However, the estimates of the intercept parameters are not consistent with a fixed T as
N → ∞. This is because as we add each additional cross-section unit we add a new parameter. In
general, the larger T, the better the estimates of the intercept parameters. Because of this,
when T is small many researchers view the intercept parameters as controls, and ignore the
actual estimates.
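The two steps of the least squares dummy variable estimator can be sketched in Python using only NumPy. The data are simulated, and the sample sizes and variable names are hypothetical; the point is that a single OLS regression on the N dummies plus the regressors recovers both the slopes and the N intercepts.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 5                        # hypothetical panel dimensions
unit = np.repeat(np.arange(N), T)   # unit index for each of the N*T rows

# Simulated panel: a unit effect a_i plus two observable regressors
a = rng.normal(0, 1, N)                  # unobserved fixed effects
X1 = rng.normal(0, 1, N * T) + a[unit]   # X1 correlated with the effects
X2 = rng.normal(0, 1, N * T)
Y = 2.0 * X1 - 1.0 * X2 + a[unit] + rng.normal(0, 0.5, N * T)

# Step 1: one dummy variable per unit (no intercept)
D = (unit[:, None] == np.arange(N)[None, :]).astype(float)

# Step 2: OLS of Y on the N dummies and the explanatory variables
Z = np.hstack([D, X1[:, None], X2[:, None]])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
alpha_hat = coef[:N]   # estimates of the N fixed-effects intercepts
beta_hat = coef[N:]    # slope estimates, close to the true (2.0, -1.0)
```

Note that even though X1 is correlated with the unobserved effect, the slope estimates are close to the truth, because the dummies absorb the unit effects.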
Fixed Effects Estimator
When N is large, using the least squares dummy variable estimator is cumbersome or impossible. For
example, suppose that we want to estimate a fixed effects model of wage determination. We have N =
1,000 working-age adults. To use the least squares dummy variable estimator, we would have to create
1,000 dummy variables and run an OLS regression on more than 1,000 variables. In this type of
situation, it is better to use the fixed effects estimator.
The fixed effects estimator involves two steps. In step #1, you transform the original
data to time-demeaned data. This is called the within transformation. This transformation for
each variable is given as follows,
yit = Yit – YiBar
xit1 = Xit1 – Xi1Bar
xit2 = Xit2 – Xi2Bar
eit = εit – εiBar
where YiBar is the average value of Y for the ith unit over the T periods; Xi1Bar and Xi2Bar are the
average values of X1 and X2 for the ith unit over the T periods; εiBar is the average value of ε for
the ith unit over the T periods; Yit, Xit1, Xit2, and εit are the actual values, and yit, xit1, xit2, and
eit are the deviations from the time means.
In step #2, run a regression of yit on xit1 and xit2 using the OLS estimator. That is,
estimate the following equation using OLS,
yit = β1xit1 + β2xit2 + eit
Notice that this regression does not include any intercept terms. If you want to obtain
estimates of the N intercept terms that represent the fixed effects, you can recover these
estimates by using the following formula,
αi^ = YiBar – β1^Xi1Bar – β2^Xi2Bar
for i = 1, 2, …, N
The fixed effects estimator yields exactly the same estimates as the least squares
dummy variable estimator. The fixed effects estimator also has the same properties as the
least squares dummy variable estimator. It should be noted that the degrees of freedom are
the same for the least squares dummy variable estimator and the fixed effects estimator. That
is, you lose one degree of freedom for each “fixed-effect” in the model.
The logic of the fixed-effects estimator is as follows. To estimate the independent causal
effects of X1 and X2 on Y, the fixed effects estimator uses variation in X1, X2, and Y over time, i.e.,
xit = Xit – XiBar and yit = Yit – YiBar. Let Zi denote an unobserved variable that differs across units
but is constant over time, and therefore included in the error term. Because Zi does not change
over time, i.e., zi = Zi – ZiBar = 0, it cannot cause any change in yit = Yit – YiBar; that is, because Zi
does not vary over time it cannot explain any of the variation in Yit over time. Therefore, the
fixed effects estimator eliminates the effect of Zi on Yit by using data on the changes or variation
in Yit over time.
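The within transformation, and its numerical equivalence with the least squares dummy variable estimator, can be illustrated as follows. This is a sketch on simulated data with hypothetical names; the assertion at the end is exact, not approximate, because the two estimators are algebraically identical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 40, 6
unit = np.repeat(np.arange(N), T)
a = rng.normal(size=N)
X1 = rng.normal(size=N * T) + a[unit]
X2 = rng.normal(size=N * T)
Y = 2.0 * X1 - 1.0 * X2 + a[unit] + rng.normal(0, 0.5, N * T)

def demean(v):
    """Subtract each unit's time mean (the within transformation)."""
    means = np.bincount(unit, weights=v) / T
    return v - means[unit]

# Step 1: time-demean the data; step 2: OLS with no intercepts
Xw = np.column_stack([demean(X1), demean(X2)])
beta_fe, *_ = np.linalg.lstsq(Xw, demean(Y), rcond=None)

# Least squares dummy variable estimates for comparison
D = (unit[:, None] == np.arange(N)).astype(float)
coef, *_ = np.linalg.lstsq(np.hstack([D, np.column_stack([X1, X2])]), Y,
                           rcond=None)
beta_lsdv, alpha_lsdv = coef[N:], coef[:N]

# Recover the N intercepts: alpha_i = YiBar - b1*X1iBar - b2*X2iBar
Ybar = np.bincount(unit, weights=Y) / T
X1bar = np.bincount(unit, weights=X1) / T
X2bar = np.bincount(unit, weights=X2) / T
alpha_hat = Ybar - beta_fe[0] * X1bar - beta_fe[1] * X2bar
# beta_fe matches beta_lsdv, and alpha_hat matches alpha_lsdv exactly
```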
Unit Dependent Observed Factors in the Fixed-Effects Model
The fixed-effects parameters, αi, capture the net effects of all factors, both observable and
unobservable, that differ across units but are constant over time. Therefore, in the fixed-effects
model you can’t include any observable factor that differs across units but is constant over
time. By examining the fixed-effects estimator you can see why this is. Let Xi be an observable
variable that differs across units but is constant over time. For this variable, xi = Xi – XiBar = 0.
Because there is no variation in Xi over time, the fixed effects estimator eliminates the effect of
Xi, and therefore we can’t obtain an estimate of its independent causal effect on Y. That is,
because Xi doesn’t vary over time, it can’t explain any of the variation in Yit over time.
Example
You can’t include variables such as gender and race as explanatory variables in a fixed-effects
wage determination model, because these variables differ across working-age adults but are
constant over time. If your sample only includes working adults who have completed their
schooling, then education differs across working adults but is constant over time. In this case,
you can’t use the fixed-effects model to obtain an estimate of the effect of education on wage.
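A quick numerical check of this point: applying the within transformation to a time-constant variable (here a made-up gender indicator) produces a column of zeros, so there is no variation left from which to estimate its coefficient.

```python
import numpy as np

N, T = 5, 4
unit = np.repeat(np.arange(N), T)
# Hypothetical time-constant variable: one gender indicator per unit
gender = np.array([0.0, 1.0, 1.0, 0.0, 1.0])[unit]
means = np.bincount(unit, weights=gender) / T
x = gender - means[unit]     # the within transformation
print(np.allclose(x, 0.0))   # True: no within variation left to use
```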
Heteroscedasticity in the Fixed-Effects Model
Since the fixed effects model is an extension of the classical linear regression model, you can
account for heteroscedasticity in the same way as you do in the classical linear regression
model. If you don’t know the structure of the heteroscedasticity, you can correct the variance-covariance matrix of estimates using White robust standard errors. If you know the structure of
the heteroscedasticity, you can obtain more efficient estimates by using weighted least
squares.
Autocorrelation in the Fixed-Effects Model
When you have panel data with a relatively large number of cross section units (e.g., 10 or
more) and a relatively small number of time periods (e.g., 10 or less), considerations of
autocorrelation are usually ignored. These types of data sets are usually not long enough to
analyze the underlying process that generated the disturbances. However, if you have enough
time periods, you can easily account for autocorrelation in the fixed effects model. The most
common way to do this is to assume that the disturbances for each cross-section unit over time
follow an AR(1) process.
Two-Stage Least Squares for the Fixed-Effects Model
If one or more of the right-hand side variables are endogenous, you can apply the two-stage
least squares estimator to the fixed effects model. To do so, you proceed as follows.
Stage #1: Regress each endogenous right-hand side variable on all exogenous variables and all
dummy variables. Save the fitted values for the endogenous variables.
Stage #2: Replace each endogenous right-hand side variable with its fitted value variable.
Estimate the fixed effects model as usual. Correct the standard errors in the usual way.
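The two stages can be sketched as follows on simulated data. Here X1 is made endogenous through a common shock v, and Zinst is a hypothetical instrument; all names and sizes are illustrative, and the standard-error correction mentioned above is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 50, 10
unit = np.repeat(np.arange(N), T)
a = rng.normal(0, 1, N)                    # unit fixed effects
Zinst = rng.normal(size=N * T)             # hypothetical exogenous instrument
v = rng.normal(size=N * T)                 # common shock making X1 endogenous
X1 = Zinst + v + a[unit]
X2 = rng.normal(size=N * T)
eps = 0.8 * v + rng.normal(0, 0.5, N * T)  # correlated with X1 through v
Y = 2.0 * X1 - 1.0 * X2 + a[unit] + eps

D = (unit[:, None] == np.arange(N)).astype(float)

def ols_coef(Z, y):
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

# Stage 1: regress endogenous X1 on all exogenous variables and all
# dummy variables, and save the fitted values
Z1 = np.hstack([D, Zinst[:, None], X2[:, None]])
X1_hat = Z1 @ ols_coef(Z1, X1)

# Stage 2: LSDV regression with X1 replaced by its fitted value
beta_2sls = ols_coef(np.hstack([D, X1_hat[:, None], X2[:, None]]), Y)[N:]

# Plain LSDV on the endogenous X1, for comparison (biased upward here)
beta_lsdv = ols_coef(np.hstack([D, X1[:, None], X2[:, None]]), Y)[N:]
```

The two-stage estimates land much closer to the true slopes (2.0, -1.0) than the plain fixed-effects estimates do.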
Hypothesis Testing in the Fixed-Effects Model
To test hypotheses in the fixed-effects model, you can use either small sample tests (t-test,
F-test) or large sample tests (asymptotic t-test, likelihood ratio test, Wald test, Lagrange
multiplier test).
Specification Testing in the Fixed-Effects Model
A specification test that is often performed is to test whether the classical linear regression
model or the fixed-effects regression model is the appropriate model.
For example, consider a fixed-effects model with N-units and two explanatory variables.
Yit = αi + β1Xit1 + β2Xit2 + εit, for i = 1, 2, …, N
The null hypothesis of no fixed effects (classical linear regression model is the appropriate
model) against the alternative hypothesis of fixed effects (fixed-effects model is the appropriate
model) is specified as follows.
H0: α1 = α2 = … = αN = α (Classical linear regression model is appropriate)
H1: At least one of the αi is not equal to the constant α (Fixed-effects model is
appropriate)
To test this hypothesis, you can use an F-test. The unrestricted model is the fixed-effects model.
The restricted model is the classical linear regression model.
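This F-test can be computed directly from the restricted and unrestricted sums of squared residuals. The sketch below simulates a panel in which fixed effects truly are present, so the statistic comes out large; the data and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 30, 5
unit = np.repeat(np.arange(N), T)
a = rng.normal(0, 1, N)              # true fixed effects are present
X1 = rng.normal(size=N * T)
X2 = rng.normal(size=N * T)
Y = 2.0 * X1 - 1.0 * X2 + a[unit] + rng.normal(0, 0.5, N * T)

def ssr(Z, y):
    """Sum of squared residuals from an OLS fit."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid

X = np.column_stack([X1, X2])
D = (unit[:, None] == np.arange(N)).astype(float)
ssr_r = ssr(np.hstack([np.ones((N * T, 1)), X]), Y)  # restricted: one intercept
ssr_u = ssr(np.hstack([D, X]), Y)                    # unrestricted: N intercepts

k = X.shape[1]
F = ((ssr_r - ssr_u) / (N - 1)) / (ssr_u / (N * T - N - k))
# F has an F(N-1, NT-N-k) distribution under H0; a large value rejects H0
```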
FIXED-EFFECTS MODEL WITH TIME EFFECTS
The fixed-effects model can be extended to account for unobserved factors that vary over time
but not across units.
The fixed-effects regression model with time effects is given by
Yit = β1Xit1 + β2Xit2 + νi + λt + εit
where μit = νi + λt + εit. The error term for the classical linear regression model is decomposed
into three components. The component λt represents all unobserved factors that vary over time
but not across units.
It is assumed that the net effect on Y of unobservable factors for the tth time period
that are constant across units is a fixed parameter, designated γt. Therefore, the fixed-effects
model can be rewritten as
Yit = β1Xit1 + β2Xit2 + αi + γt + εit
The unobserved error component νi has been replaced with a set of fixed parameters, α1, α2,
…, αN, one for each of the N units in the sample. The unobserved error component λt has been
replaced with a set of fixed parameters, γ1, γ2, …, γT, one for each of the T time periods in the
sample. The T parameters control for the net effects of all unobservable factors that differ over
time but not across units.
Estimation of the Fixed-Effects Model with Time Effects
Two alternative but equivalent estimators can be used to estimate the parameters of the fixed
effects model. 1) Least squares dummy variable estimator. 2) Fixed effects estimator.
Least Squares Dummy Variable Estimator
The least squares dummy variable estimator involves two steps. In step #1, create N-1 dummy
variables for the units and T-1 dummy variables for the time periods. In step #2, run a
regression of the dependent variable on the N-1 unit dummy variables, T-1 time dummy
variables, and the explanatory variables using the OLS estimator, including an intercept. One
unit and time period are dropped to avoid perfect multicollinearity.
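As a sketch of the two-way dummy variable regression on simulated data (hypothetical names and sizes), with one unit dummy and one time dummy dropped and an intercept included:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 40, 8
unit = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)
a = rng.normal(0, 1, N)        # unit effects
g = rng.normal(0, 1, T)        # time effects
X1 = rng.normal(size=N * T)
X2 = rng.normal(size=N * T)
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + a[unit] + g[time] + rng.normal(0, 0.5, N * T)

# N-1 unit dummies and T-1 time dummies (one of each dropped), plus intercept
Du = (unit[:, None] == np.arange(1, N)).astype(float)
Dt = (time[:, None] == np.arange(1, T)).astype(float)
Z = np.hstack([np.ones((N * T, 1)), Du, Dt, X1[:, None], X2[:, None]])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
beta_hat = coef[-2:]   # slope estimates, close to the true (2.0, -1.0)
```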
Fixed-Effects Estimator with Time Dummy Variables
The fixed-effects estimator with time dummy variables involves estimating the model using the
fixed-effects estimator including T-1 time dummy variables and an intercept.
Time Dependent Observed Factors in the Fixed-Effects Model
You can’t include any observable factors that differ over time but are constant across units. If
you do, you will get perfect multicollinearity.
Specification Testing in the Fixed-Effects Model with Time Effects
Consider the following fixed-effects model with time effects and two explanatory variables.
Yit = φ + α1D1it + α2D2it + … + αN-1DN-1,it + γ1Di1 + γ2Di2 + … + γT-1DiT-1 + β1Xit1 + β2Xit2 + εit
The following three specification tests are often performed using an F-test.
1. No fixed-effects or time-effects.
H0: γ1 = γ2 = … = γT-1 = α1 = α2 = … = αN-1 = 0
H1: At least one α or γ ≠ 0
2. No time effects.
H0: γ1 = γ2 = … = γT-1 = 0
H1: At least one γ ≠ 0
3. No fixed-effects.
H0: α1 = α2 = … = αN-1 = 0
H1: At least one α ≠ 0
RANDOM EFFECTS MODEL
Specification
Consider an economic relationship that involves a dependent variable, Y, and two observable
explanatory variables, X1 and X2. You have panel data for Y, X1, and X2. The panel data consists
of N-units and T-time periods, and therefore you have N times T observations. The random
effects model can be written as
Yit = β1Xit1 + β2Xit2 + νi + εit
for i = 1, 2, …, N and t = 1, 2, …, T
where the classical error term is decomposed into two components. The component νi
represents all unobserved factors that vary across units but are constant over time. The
component εit represents all unobserved factors that vary across units and time. It is assumed
that νi is given by
νi = α0 + ωi
for i = 1, 2, …, N
where νi is decomposed into two components: 1) a deterministic component α0, and 2) a
random component ωi. Once again, each of the N units has its own intercept. However, in this
model the N intercepts are not fixed parameters; rather, they are random variables. The
deterministic component α0 is interpreted as the population mean intercept. The disturbance
ωi is the difference between the population mean intercept and the intercept for the ith unit. It
is assumed that the ωi for each unit is drawn from an independent probability distribution with
mean zero and constant variance; that is,
E(ωi) = 0
Var(ωi) = σ2ω
Cov(ωi, ωs) = 0 for i ≠ s
The N random variables νi are called random effects.
The random effects model can be rewritten equivalently as
Yit = α0 + β1Xit1 + β2Xit2 + μit
where μit = ωi + εit. An important assumption underlying the random effects model is that the
error term μit is not correlated with any of the explanatory variables.
Because the error component ωi is in the error term μit for each unit for each time
period, the error term μit has autocorrelation. The correlation coefficient for the error term for
the ith unit for any two time periods t and s is given by
Corr(μit, μis) = σ2ω / (σ2ω + σ2ε)
where σ2ω is the variance of ωi, and σ2ε is the variance of εit. Since this correlation coefficient
must be positive the autocorrelation is positive.
Estimation of the Random Effects Model
If you estimate the random effects model using the OLS estimator, you would obtain parameter
estimates that are unbiased but inefficient. In addition, the OLS estimates of the standard
errors and hence t-statistics are incorrect. This is because the OLS estimator ignores the
autocorrelation in the error term μit. To obtain unbiased and efficient estimates, you can use a
Feasible GLS (FGLS) estimator that takes into account the autocorrelated disturbances. The
FGLS estimator is called the random effects estimator.
The random-effects estimator involves two steps. In step #1, you transform the original
data to weighted time-demeaned data. The transformation for each variable is given as follows,
yit = Yit – wYiBar
xit1 = Xit1 – wXi1Bar
xit2 = Xit2 – wXi2Bar
uit = μit – wμiBar
where YiBar is the average value of Y for the ith unit over the T periods; Xi1Bar and Xi2Bar are the
average values of X1 and X2 for the ith unit over the T periods; μiBar is the average value of μ for
the ith unit over the T periods; Yit, Xit1, Xit2, and μit are the actual values, and yit, xit1, xit2, and
uit are the deviations from the weighted time means. The weight, w, is given by
w = 1 - √[ σ2ε / (σ2ε + Tσ2ω)]
The variances that comprise the weight are estimated using either the OLS or fixed-effects
residuals.
In step #2, run a regression of yit on (1 – w), xit1, and xit2 using the OLS estimator. That is,
estimate the following equation using OLS,
yit = α0(1 – w) + β1xit1 + β2xit2 + uit
Note that if w = 0 then the random effects model reduces to the classical linear regression
model. If w = 1 then the random effects model reduces to the fixed-effects model.
The random-effects estimator is not necessarily unbiased in small samples; however,
the FGLS estimator is consistent in large samples with a fixed T as N → ∞. The properties of
the random-effects estimator when N is small and T is large are unknown; however, some
researchers have used it in these situations.
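The two steps of the random-effects (FGLS) estimator can be sketched as follows. The data are simulated, and for clarity the true variance components are plugged into the weight w rather than estimated from OLS or fixed-effects residuals, as they would be in practice.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 5
unit = np.repeat(np.arange(N), T)
sig_w, sig_e = 1.0, 0.5                 # sd of omega_i and epsilon_it
omega = rng.normal(0, sig_w, N)         # random effects, uncorrelated with X
X1 = rng.normal(size=N * T)
X2 = rng.normal(size=N * T)
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + omega[unit] + rng.normal(0, sig_e, N * T)

# FGLS weight; the true variance components are plugged in for clarity --
# in practice they are estimated from OLS or fixed-effects residuals
w = 1 - np.sqrt(sig_e**2 / (sig_e**2 + T * sig_w**2))

def quasi_demean(v):
    """Subtract the fraction w of each unit's time mean."""
    means = np.bincount(unit, weights=v) / T
    return v - w * means[unit]

# Step 2: OLS of quasi-demeaned Y on (1 - w) and the quasi-demeaned X's
Z = np.column_stack([np.full(N * T, 1 - w),
                     quasi_demean(X1), quasi_demean(X2)])
coef, *_ = np.linalg.lstsq(Z, quasi_demean(Y), rcond=None)
# coef is close to the true parameters (1.0, 2.0, -1.0)
```

Setting w = 0 in this code reproduces pooled OLS, and w = 1 reproduces the within (fixed-effects) estimator, matching the note above.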
Unit Dependent Observed Factors in the Random-Effects Model
Unit dependent observed factors can be included in the random-effects model.
Example
You can include variables such as gender and race as explanatory variables in a random-effects
wage determination model.
Heteroscedasticity in the Random-Effects Model
If you believe you have heteroscedasticity, you can correct the variance-covariance matrix of
estimates using White robust standard errors.
Autocorrelation in the Random-Effects Model
You can account for autocorrelation in the error component εit, but doing so is more complex
than for the fixed-effects model.
Two-Stage Least Squares for the Random-Effects Model
If one or more of the right-hand side variables are endogenous, you can apply the two-stage
least squares estimator to the random effects model, but doing so is more complex than for the
fixed-effects model.
Hypothesis Testing in the Random Effects Model
To test hypotheses in the random-effects model, you cannot use the small sample tests (t-test,
F-test). You must use large sample tests (asymptotic t-test, likelihood ratio test, Wald test,
Lagrange multiplier test).
Specification Testing in the Random Effects Model
A hypothesis that is often tested when estimating a random-effects model is the null
hypothesis of no random effects (classical linear regression model is the appropriate model)
against the alternative hypothesis of random effects (random-effects model is the appropriate
model). This hypothesis is specified as follows
H0: ω2 = 0 (Classical linear regression model is appropriate)
H1: ω2  0 (Random-effects model is appropriate)
Note that if ω2 = 0 then each unit has the same intercept, and therefore the classical linear
regression model is the appropriate model. If ω2  0 then different units have different
intercepts, and therefore the random-effects model is the appropriate model. This null and
alternative hypothesis can be tested using a Breusch-Pagan Lagrange multiplier test.
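One common form of the Breusch-Pagan LM statistic, as given in standard econometrics texts, is LM = [NT / 2(T – 1)] × [Σi(Σt êit)² / ΣiΣt êit² – 1]², where the êit are pooled OLS residuals; under H0 it is chi-square with 1 degree of freedom. A sketch on simulated data in which random effects are present:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 100, 5
unit = np.repeat(np.arange(N), T)
omega = rng.normal(0, 1.0, N)        # random effects are present
X1 = rng.normal(size=N * T)
X2 = rng.normal(size=N * T)
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + omega[unit] + rng.normal(0, 0.5, N * T)

# Pooled OLS residuals
Z = np.column_stack([np.ones(N * T), X1, X2])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
e = Y - Z @ coef

# Breusch-Pagan LM statistic; chi-square with 1 df under H0
unit_sums = np.bincount(unit, weights=e)
LM = (N * T / (2 * (T - 1))) * (np.sum(unit_sums**2) / np.sum(e**2) - 1) ** 2
# LM far exceeds the 5% chi-square(1) critical value of 3.84, rejecting H0
```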
RANDOM-EFFECTS MODEL WITH TIME EFFECTS
The random-effects model can be extended to account for unobserved factors that vary over
time but not across units. The random-effects regression model with time effects is given by
Yit = α0 + α1Di1 + α2Di2 + … + αT-1DiT-1 + β1Xit1 + β2Xit2 + μit
where μit = ωi + εit, and Di1, Di2, …, DiT-1 are dummy variables for T-1 time periods.
Estimation of the Random-Effects Model with Time Effects
The random-effects estimator extended to include dummy variables for T-1 time periods is used
to estimate the random-effects model with time effects.
WHICH MODEL IS APPROPRIATE: FIXED-EFFECTS OR RANDOM-EFFECTS MODEL?
When analyzing panel data, which model is most appropriate: the fixed-effects model, or the
random effects model? To decide which model is most appropriate, many economists use the
following criterion.
If the unit dependent unobserved effects (νi) are correlated with one or more of the
explanatory variables, then the correct model is the fixed-effects model. If the unit dependent
unobserved effects (νi) are not correlated with any of the explanatory variables, and if
they can be viewed as outcomes of a random variable, then the correct model is the random-effects
model. Why? The random-effects model assumes that the unit dependent unobserved
effects are not correlated with the explanatory variables. If this assumption is violated, then the
random-effects estimator will yield biased and inconsistent estimates, while the fixed-effects
estimator will still yield unbiased and consistent estimates.
Hausman Test for Fixed and Random Effects Models
In many situations, you may be uncertain whether the unit dependent unobserved effects (νi)
are correlated with one or more of the explanatory variables, and therefore uncertain whether
the fixed-effects model or random-effects model is most appropriate. In these situations, you
can use a Hausman test to test whether the unit dependent unobserved effects (νi) are
correlated with the explanatory variables. For the Hausman test, the null and alternative
hypotheses are as follows.
H0: νi is not correlated with Xit (random-effects model is appropriate)
H1: νi is correlated with Xit (fixed-effects model is appropriate)
To test the null hypothesis, you compare the estimates from the random-effects estimator and
the fixed-effects estimator. The random-effects estimator is consistent under the null
hypothesis, but inconsistent under the alternative hypothesis. The fixed-effects estimator is
consistent under both the null and alternative hypotheses. If the estimates for the random-effects
estimator are not significantly different from the estimates for the fixed-effects
estimator, then we do not reject the null hypothesis and conclude that νi is not correlated with Xit,
and therefore the random-effects model is the appropriate model. If the estimates for the
random-effects estimator are significantly different from the estimates for the fixed-effects
estimator, then we reject the null and conclude that νi is correlated with Xit, and therefore the
fixed-effects model is the appropriate model. The Hausman test statistic has an approximate
chi-square distribution with k degrees of freedom, where k is the number of slope parameters
in the model.
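A common form of the Hausman statistic compares the slope vectors, H = d′[V(bFE) – V(bRE)]⁻¹d with d = bFE – bRE. The sketch below simulates data in which the unit effects are correlated with X1, so the test rejects. The data, sizes, and the shortcut of plugging the true variance components into the FGLS weight are all simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 200, 5
unit = np.repeat(np.arange(N), T)
a = rng.normal(0, 1.0, N)      # unit effects, correlated with X1 below
b = rng.normal(0, 1.0, N)      # harmless extra between-unit variation
c = rng.normal(0, 1.0, N)      # harmless extra between-unit variation
X1 = rng.normal(size=N * T) + 1.5 * a[unit] + 2.0 * b[unit]
X2 = rng.normal(size=N * T) + 2.0 * c[unit]
Y = 2.0 * X1 - 1.0 * X2 + a[unit] + rng.normal(0, 0.5, N * T)

def ols(Z, y):
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, y - Z @ coef

def demean(v, frac=1.0):
    means = np.bincount(unit, weights=v) / T
    return v - frac * means[unit]

# Fixed-effects (within) slopes and their covariance matrix
Xw = np.column_stack([demean(X1), demean(X2)])
b_fe, r_fe = ols(Xw, demean(Y))
V_fe = (r_fe @ r_fe / (N * T - N - 2)) * np.linalg.inv(Xw.T @ Xw)

# Random-effects (FGLS) slopes and covariance; the true variance components
# (0.25 and 1.0) are plugged into the weight for simplicity
w = 1 - np.sqrt(0.25 / (0.25 + T * 1.0))
Zr = np.column_stack([np.full(N * T, 1 - w), demean(X1, w), demean(X2, w)])
b_re, r_re = ols(Zr, demean(Y, w))
V_re = ((r_re @ r_re / (N * T - 3)) * np.linalg.inv(Zr.T @ Zr))[1:, 1:]

# Hausman statistic: approximately chi-square with k = 2 df under H0
d = b_fe - b_re[1:]
H = d @ np.linalg.inv(V_fe - V_re) @ d
# H exceeds the 5% chi-square(2) critical value of 5.99: reject H0, so the
# fixed-effects model is the appropriate model for this simulated data
```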