Hausman test
From Wikipedia, the free encyclopedia
The Hausman test (also called the Wu–Hausman test, Hausman specification test, and
Durbin–Wu–Hausman test) is a statistical hypothesis test in econometrics named after De-Min
Wu[1] and Jerry A. Hausman.[2] The test evaluates the consistency of an estimator when
compared to an alternative, less efficient estimator that is already known to be consistent. It
helps one evaluate if a statistical model corresponds to the data.
Contents

1 Details
2 Panel data
3 See also
4 References
Details
Consider the linear model y = Xb + e, where y is univariate, X is a vector of regressors, b is a
vector of coefficients and e is the error term. We have two estimators for b: b0 and b1. Under the
null hypothesis, both of these estimators are consistent, but b1 is efficient (has the smallest
asymptotic variance), at least in the class of estimators containing b0. Under the alternative
hypothesis, b0 is consistent, whereas b1 is not.

Then the Wu–Hausman statistic is:[3]

H = (b1 − b0)′ [Var(b0) − Var(b1)]† (b1 − b0),

where † denotes the Moore–Penrose pseudoinverse. This statistic has asymptotically the
chi-squared distribution with the number of degrees of freedom equal to the rank of the matrix
Var(b0) − Var(b1).
If we reject the null hypothesis, one or both of the estimators is inconsistent. This test can be
used to check for the endogeneity of a variable (by comparing instrumental variable (IV)
estimates to ordinary least squares (OLS) estimates). It can also be used to check the validity of
extra instruments by comparing IV estimates using a full set of instruments Z to IV estimates that
use a proper subset of Z. Note that in order for the test to work in the latter case, we must be
certain of the validity of the subset of Z and that subset must have enough instruments to identify
the parameters of the equation.
Hausman also showed that the covariance between an efficient estimator and the difference of an
efficient and inefficient estimator is zero.
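As a sketch of how the statistic can be computed, the following Python snippet implements H = (b1 − b0)′[Var(b0) − Var(b1)]†(b1 − b0) with NumPy and applies it to the OLS-versus-IV comparison described above. The data are simulated and all variable names are illustrative, not from any real study.

```python
import numpy as np
from scipy import stats

def hausman(b0, b1, V0, V1):
    """Wu-Hausman statistic: b0 is consistent under both hypotheses,
    b1 is efficient (smaller asymptotic variance) under the null."""
    d = b1 - b0
    Vd = V0 - V1                              # Var(b0) - Var(b1)
    H = float(d @ np.linalg.pinv(Vd) @ d)     # Moore-Penrose pseudoinverse
    df = int(np.linalg.matrix_rank(Vd))       # df = rank of Var(b0) - Var(b1)
    return H, df, stats.chi2.sf(H, df)

# Illustration on simulated data: x is endogenous, so OLS (efficient under
# the null of exogeneity) is inconsistent, while IV with instrument z is not.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # regressor correlated with u
y = 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)      # just-identified IV estimator

s2_ols = np.sum((y - X @ b_ols) ** 2) / n
V_ols = s2_ols * np.linalg.inv(X.T @ X)
s2_iv = np.sum((y - X @ b_iv) ** 2) / n
V_iv = s2_iv * np.linalg.inv(Z.T @ X) @ (Z.T @ Z) @ np.linalg.inv(X.T @ Z)

H, df, p = hausman(b_iv, b_ols, V_iv, V_ols)  # large H, small p: reject H0
```

With the endogeneity built into this simulation, the test rejects the null and correctly flags OLS as inconsistent.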
Panel data
The Hausman test can also be used to differentiate between the fixed effects model and the
random effects model in panel data. In this case, random effects (RE) is preferred under the null
hypothesis due to higher efficiency, while under the alternative fixed effects (FE) is at least
consistent and thus preferred.
                    If H0 is true              If H1 is true
b1 (RE estimator)   Consistent, Efficient      Inconsistent
b0 (FE estimator)   Consistent, Inefficient    Consistent
See also

Regression model validation
Fixed effects model
In econometrics and statistics, a fixed effects model is a statistical model that represents the
observed quantities in terms of explanatory variables that are treated as if the quantities were
non-random. This is in contrast to random effects models and mixed models in which either all
or some of the explanatory variables are treated as if they arise from random causes. Note that
the biostatistics definitions differ, as biostatisticians refer to the population-average and subject-
specific effects as "fixed" and "random" effects respectively.[1][2][3] Often the same structure of
model, which is usually a linear regression model, can be treated as any of the three types
depending on the analyst's viewpoint, although there may be a natural choice in any given
situation.
In panel data analysis, the term fixed effects estimator (also known as the within estimator) is
used to refer to an estimator for the coefficients in the regression model. If we assume fixed
effects, we impose time independent effects for each entity that are possibly correlated with the
regressors.
Contents

1 Qualitative description
2 Formal description
3 Equality of Fixed Effects (FE) and First Differences (FD) estimators when T=2
4 Hausman–Taylor method
5 Testing FE vs. RE
6 Steps in Fixed Effects Model for sample data
7 See also
8 Notes
9 References
10 External links
Qualitative description
Such models assist in controlling for unobserved heterogeneity when this heterogeneity is
constant over time and correlated with independent variables. This constant can be removed from
the data through differencing, for example by taking a first difference which will remove any
time invariant components of the model.
There are two common assumptions made about the individual specific effect, the random effects
assumption and the fixed effects assumption. The random effects assumption (made in a random
effects model) is that the individual specific effects are uncorrelated with the independent
variables. The fixed effect assumption is that the individual specific effect is correlated with the
independent variables. If the random effects assumption holds, the random effects model is more
efficient than the fixed effects model. However, if this assumption does not hold, the random
effects model is not consistent; a Hausman test can be used to check which assumption is
appropriate.
Formal description
Consider the linear unobserved effects model for N observations and T time periods:

y_it = X_it β + α_i + u_it    for t = 1, …, T and i = 1, …, N,

where y_it is the dependent variable observed for individual i at time t, X_it is the time-variant
regressor matrix, α_i is the unobserved time-invariant individual effect and u_it is the error
term. Unlike X_it, α_i cannot be observed by the econometrician. Common examples for
time-invariant effects are innate ability for individuals or historical and institutional factors for
countries.

Unlike the Random effects (RE) model, where the unobserved α_i is independent of X_it for all
t = 1, …, T, the FE model allows α_i to be correlated with the regressor matrix X_it. Strict
exogeneity, however, is still required.

Since α_i is not observable, it cannot be directly controlled for. The FE model eliminates α_i by
demeaning the variables using the within transformation:

y_it − ȳ_i = (X_it − X̄_i)β + (α_i − ᾱ_i) + (u_it − ū_i),

where ȳ_i = (1/T) Σ_t y_it, X̄_i = (1/T) Σ_t X_it and ū_i = (1/T) Σ_t u_it. Since α_i is constant,
ᾱ_i = α_i and hence the effect is eliminated. The FE estimator β̂_FE is then obtained by an OLS
regression of the demeaned ÿ_it = y_it − ȳ_i on Ẍ_it = X_it − X̄_i.
Another alternative to the within transformation is to add a dummy variable for each individual i
(the least squares dummy variable approach). This is numerically, but not computationally,
equivalent to the fixed effect model and only works well if the number of time observations per
individual, T, is much larger than the number of individuals in the panel.
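The within transformation can be sketched in a few lines of NumPy. The panel below is simulated and all names are illustrative: the individual effect is built to be correlated with the regressor, so pooled OLS is biased while demeaning each individual's series removes the effect.

```python
import numpy as np

# Simulated panel: alpha_i is correlated with x, so pooled OLS is biased
# while the within (FE) estimator is not.
rng = np.random.default_rng(1)
N, T = 200, 6
alpha = rng.normal(size=N)                     # unobserved individual effects
x = rng.normal(size=(N, T)) + alpha[:, None]   # regressor correlated with alpha
u = rng.normal(scale=0.5, size=(N, T))         # idiosyncratic error
beta = 1.5
y = beta * x + alpha[:, None] + u

# Within transformation: subtract each individual's time average
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)

# FE estimator: OLS of demeaned y on demeaned x (alpha has been eliminated)
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Pooled OLS ignoring alpha picks up the correlation and is biased upward
xc = x - x.mean()
beta_pooled = (xc * (y - y.mean())).sum() / (xc ** 2).sum()
```

Here beta_fe recovers the true coefficient 1.5, while beta_pooled absorbs the correlation between x and the individual effect.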
Equality of Fixed Effects (FE) and First Differences (FD) estimators when T=2

For the special two period case (T = 2), the FE estimator and the FD estimator are numerically
equivalent. To see this, establish that the fixed effects estimator is:

β̂_FE = [ Σ_i (x_i1 − x̄_i)(y_i1 − ȳ_i) + (x_i2 − x̄_i)(y_i2 − ȳ_i) ] / [ Σ_i (x_i1 − x̄_i)² + (x_i2 − x̄_i)² ].

Since each (x_i1 − x̄_i) can be re-written as x_i1 − (x_i1 + x_i2)/2 = (x_i1 − x_i2)/2, and
similarly for the other demeaned terms, we'll re-write the line as:

β̂_FE = [ Σ_i (x_i2 − x_i1)(y_i2 − y_i1) ] / [ Σ_i (x_i2 − x_i1)² ] = β̂_FD.
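A quick numerical check of this equivalence, on simulated data with illustrative names:

```python
import numpy as np

# Simulated two-period panel
rng = np.random.default_rng(2)
N = 500
alpha = rng.normal(size=N)
x = rng.normal(size=(N, 2)) + alpha[:, None]
y = 0.7 * x + alpha[:, None] + rng.normal(size=(N, 2))

# FE (within) estimator
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
beta_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# FD estimator on the single available first difference
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_fd = (dx * dy).sum() / (dx ** 2).sum()
# beta_fe and beta_fd agree to floating-point precision when T = 2
```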
Hausman–Taylor method
Need more than one time-variant regressor (X) and time-invariant regressor (Z), and at least one
X and one Z that are uncorrelated with the individual effect α_i.

Partition the X and Z variables such that X = [X1  X2] and Z = [Z1  Z2], where X1 and Z1 are
uncorrelated with α_i. We need at least as many exogenous time-variant regressors X1 as
endogenous time-invariant regressors Z2.

Estimating the coefficients by instrumental variables, using X1 and Z1 as instruments for the
endogenous variables, yields a consistent estimate.
Testing FE vs. RE
We can test whether a fixed or random effects model is appropriate using a Hausman test.

H0: α_i is uncorrelated with X_it (random effects is appropriate)
H1: α_i is correlated with X_it (fixed effects is appropriate)

If H0 is true, both β̂_RE and β̂_FE are consistent, but only β̂_RE is efficient. If H1 is true,
β̂_FE is consistent and β̂_RE is not. The test statistic is

Ĥ = (β̂_RE − β̂_FE)′ [Var(β̂_FE) − Var(β̂_RE)]† (β̂_RE − β̂_FE),

which is asymptotically chi-squared distributed with degrees of freedom equal to the rank of
Var(β̂_FE) − Var(β̂_RE).

The Hausman test is a specification test, so a large test statistic might be an indication that
there are errors in variables (EIV) or that the model is misspecified. If the FE assumption is
true, the FE and FD estimators should give similar estimates; a simple heuristic is that if they
differ noticeably, there could be EIV.
Steps in Fixed Effects Model for sample data
1. Calculate group and grand means
2. Calculate k = number of groups, n = number of observations per group, N = total number of
observations (k × n)
3. Calculate SS-total (or total variance) as: (each score − grand mean)², summed
4. Calculate SS-treat (or treatment effect) as: (each group mean − grand mean)², summed, × n
5. Calculate SS-error (or error effect) as: (each score − its group mean)², summed
6. Calculate df-total: N − 1, df-treat: k − 1, and df-error: k(n − 1)
7. Calculate mean squares MS-treat: SS-treat/df-treat, then MS-error: SS-error/df-error
8. Calculate the obtained F value: MS-treat/MS-error
9. Use an F-table or probability function to look up the critical F value at a certain
significance level
10. Conclude as to whether the treatment effect significantly affects the variable of interest
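The ten steps above can be sketched directly in Python with NumPy and SciPy; the group scores below are made-up sample data for illustration:

```python
import numpy as np
from scipy import stats

# hypothetical sample data: k = 3 treatment groups, n = 4 scores per group
groups = np.array([
    [6.0, 8.0, 4.0, 5.0],
    [8.0, 12.0, 9.0, 11.0],
    [13.0, 9.0, 11.0, 8.0],
])
k, n = groups.shape                      # step 2: k groups, n per group
N = k * n                                # total observations

grand_mean = groups.mean()               # step 1
group_means = groups.mean(axis=1)

ss_total = ((groups - grand_mean) ** 2).sum()              # step 3
ss_treat = n * ((group_means - grand_mean) ** 2).sum()     # step 4
ss_error = ((groups - group_means[:, None]) ** 2).sum()    # step 5

df_treat, df_error = k - 1, k * (n - 1)                    # step 6
ms_treat = ss_treat / df_treat                             # step 7
ms_error = ss_error / df_error
F = ms_treat / ms_error                                    # step 8
p = stats.f.sf(F, df_treat, df_error)                      # step 9
# for these made-up data: F = 6.87, p = 0.015 (significant at the 5% level)
```

Note that ss_total = ss_treat + ss_error, which is a useful arithmetic check on steps 3 through 5.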
Random effects model
In statistics, a random effects model, also called a variance components model, is a kind of
hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of
different populations whose differences relate to that hierarchy. In econometrics, random effects
models are used in the analysis of hierarchical or panel data when one assumes no fixed effects
(i.e. no individual effects). The fixed effects model is a special case of the random effects model.
Note that the biostatistics definitions differ, as biostatisticians respectively refer to the
population-average and subject-specific effects as "fixed" and "random" effects.[1][2][3]
Contents

1 Simple example
2 Variance components
3 Unbiasedness
4 See also
5 Notes
6 References
7 Bibliography
Simple example
Suppose m large elementary schools are chosen randomly from among thousands in a large
country. Suppose also that n pupils of the same age are chosen randomly at each selected school.
Their scores on a standard aptitude test are ascertained. Let Yij be the score of the jth pupil at
the ith school. A simple way to model the relationships of these quantities is

Yij = μ + Ui + Wij,

where μ is the average test score for the entire population. In this model Ui is the school-specific
random effect: it measures the difference between the average score at school i and the average
score in the entire country and it is "random" because the school has been randomly selected
from a larger population of schools. The term, Wij is the individual-specific error. That is, it is the
deviation of the j-th pupil’s score from the average for the i-th school. Again this is regarded as
random because of the random selection of pupils within the school, even though it is a fixed
quantity for any given pupil.
The model can be augmented by including additional explanatory variables, which would
capture differences in scores among different groups. For example:

Yij = μ + β1 Sexij + β2 Raceij + β3 ParentsEducij + Ui + Wij,

where Sexij is the dummy variable for boys/girls, Raceij is the dummy variable for white/black
pupils, and ParentsEducij records the average education level of a child's parents. This is a
mixed model, not a purely random effects model.
Variance components
The variance of Yij is the sum of the variances τ² and σ² of Ui and Wij respectively. Let

Ȳi· = (1/n) Σj Yij

be the average, not of all scores at the ith school, but of those at the ith school that are included
in the random sample. Let

Ȳ·· = (1/(mn)) Σi Σj Yij

be the "grand average". Let

SSW = Σi Σj (Yij − Ȳi·)²,
SSB = n Σi (Ȳi· − Ȳ··)²

be respectively the sum of squares due to differences within groups and the sum of squares due
to differences between groups. Then it can be shown that

E[SSW / (m(n − 1))] = σ²

and

E[SSB / (m − 1)] = σ² + nτ².

These "expected mean squares" can be used as the basis for estimation of the "variance
components" σ² and τ².
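These moment relations give a simple method-of-moments estimator for the variance components. The NumPy sketch below simulates the school example with invented parameter values and recovers σ² and τ² from the mean squares:

```python
import numpy as np

# Simulated school data (all parameter values invented for illustration)
rng = np.random.default_rng(3)
m, n = 300, 20                   # m schools, n sampled pupils per school
mu, tau, sigma = 50.0, 4.0, 10.0
U = rng.normal(scale=tau, size=m)             # school effects, variance tau^2
W = rng.normal(scale=sigma, size=(m, n))      # pupil errors, variance sigma^2
Y = mu + U[:, None] + W

school_means = Y.mean(axis=1)
grand_mean = Y.mean()

ssw = ((Y - school_means[:, None]) ** 2).sum()      # within-group sum of squares
ssb = n * ((school_means - grand_mean) ** 2).sum()  # between-group sum of squares

msw = ssw / (m * (n - 1))        # E[MSW] = sigma^2
msb = ssb / (m - 1)              # E[MSB] = sigma^2 + n * tau^2

sigma2_hat = msw                 # estimate of sigma^2 (true value: 100)
tau2_hat = (msb - msw) / n       # estimate of tau^2  (true value: 16)
```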
Unbiasedness
In general, the random effects estimator is efficient, and should be used (over fixed effects) if
the assumptions underlying it are believed to be satisfied. For RE to work in the school example,
it is necessary that the school-specific effects be orthogonal to the other covariates of the
model. This can be tested by running random effects, then fixed effects, and performing a
Hausman specification test. If the test rejects, then random effects is biased and fixed effects is
the correct estimation procedure.
See also

Bühlmann model
Hierarchical linear modeling
Fixed effects
MINQUE
Notes
1. ^ Peter J. Diggle, Patrick Heagerty, Kung-Yee Liang, and Scott L. Zeger (2002). Analysis of
Longitudinal Data, 2nd ed. Oxford University Press, pp. 169–171.
2. ^ Garrett M. Fitzmaurice, Nan M. Laird, and James H. Ware (2004). Applied Longitudinal
Analysis. John Wiley & Sons, pp. 326–328.
3. ^ Nan M. Laird and James H. Ware (1982). "Random-Effects Models for Longitudinal Data".
Biometrics 38 (4), pp. 963–974.
Fixed and random effects models
- When you have repeated observations per individual this is a problem and an advantage:
  o the observations are not independent
  o we can use the repetition to get better parameter estimates
- If we pooled the observations and used e.g. OLS we would have biased estimates.
- If we fit fixed-effect or random-effect models which take account of the repetition we can
  control for fixed or random individual differences.
- In the econometrics literature these models are called `cross-sectional time-series' models,
  because we have time-series of observations at individual rather than aggregate level.
- If we have a small number of individuals, we can simply fit a dummy for each individual.
  This can be considered a `fixed-effects' model because the regression line is raised or lowered
  by a fixed amount for each individual.
- If there are many individuals this cannot be done directly, but there are mathematically
  equivalent models which achieve the same effect.

- This model is appropriate where we consider each individual to have a fixed effect shifting the
  regression line up or down.
- We may prefer to consider the individual differences as random disturbances drawn from some
  specified distribution.
- This has the advantage of using fewer degrees of freedom, and that individual differences are
  considered random rather than fixed and estimable.
- It has the disadvantage of requiring no correlation between the regressors and the individual
  effects: there are tests for this assumption (Hausman test).
The xt series of commands provide tools for analyzing cross-sectional time-series (panel)
datasets:

help xtdes      Describe pattern of xt data
help xtsum      Summarize xt data
help xttab      Tabulate xt data
help xtreg      Fixed-, between- and random-effects, and population-averaged linear models
help xtdata     Faster specification searches with xt data
help xtlogit    Fixed-effects, random-effects, & population-averaged logit models
help xtprobit   Random-effects and population-averaged probit models
help xttobit    Random-effects tobit models
help xtpois     Fixed-effects, random-effects, & population-averaged Poisson models
help xtnbreg    Fixed-effects, random-effects, & population-averaged negative binomial models
help xtclog     Random-effects and population-averaged cloglog models
help xtintreg   Random-effects interval data regression models
help xtrchh     Hildreth-Houck random coefficients models
help xtgls      Panel-data models using GLS
help xtgee      Population-averaged panel-data models using GEE

- Fitting these models in Stata is easy:
  o With data in long format, one record per individual per wave:

    . xtreg yvar x1 x2, fe i(pid)
    . xtreg yvar x1 x2, re i(pid)
    . xtlogit yvar x1 x2, re i(pid)
Reference: William Greene, Econometric Analysis, Maxwell Macmillan 1991, Ch 16 section 4.
http://www.pitt.edu/~super1/lecture/lec1171/012.htm