Hausman test

The Hausman test (also called the Wu–Hausman test, the Hausman specification test, or the Durbin–Wu–Hausman test) is a statistical hypothesis test in econometrics named after James Durbin, De-Min Wu[1] and Jerry A. Hausman.[2] The test evaluates the consistency of an estimator when compared to an alternative, less efficient estimator that is already known to be consistent. It helps one evaluate whether a statistical model corresponds to the data.

Details

Consider the linear model y = Xb + e, where y is univariate, X is a vector of regressors, b is a vector of coefficients and e is the error term. We have two estimators for b: b0 and b1. Under the null hypothesis both estimators are consistent, but b1 is efficient (has the smallest asymptotic variance), at least in the class of estimators containing b0. Under the alternative hypothesis, b0 is consistent whereas b1 is not. The Wu–Hausman statistic is:[3]

    H = (b1 − b0)' [Var(b0) − Var(b1)]† (b1 − b0),

where † denotes the Moore–Penrose pseudoinverse. This statistic has asymptotically the chi-squared distribution, with degrees of freedom equal to the rank of the matrix Var(b0) − Var(b1). If we reject the null hypothesis, at least one of the estimators is inconsistent.

This test can be used to check for the endogeneity of a variable, by comparing instrumental variable (IV) estimates to ordinary least squares (OLS) estimates. It can also be used to check the validity of extra instruments, by comparing IV estimates using a full set of instruments Z to IV estimates using a proper subset of Z. For the test to work in the latter case, we must be certain of the validity of the subset of Z, and that subset must have enough instruments to identify the parameters of the equation.

Hausman also showed that the covariance between an efficient estimator and the difference between an efficient and an inefficient estimator is zero.
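As a minimal numerical sketch of the statistic above (the function name and example numbers are illustrative, not from the article), the † operation can be carried out with NumPy's pseudoinverse:

```python
import numpy as np

def hausman_statistic(b0, b1, V0, V1):
    """Wu-Hausman statistic H = (b1-b0)' [Var(b0)-Var(b1)]^† (b1-b0).

    b0: estimator consistent under both hypotheses, with covariance V0.
    b1: estimator efficient under the null, with covariance V1.
    Returns the statistic and its degrees of freedom, the rank of V0 - V1.
    """
    d = np.asarray(b1, dtype=float) - np.asarray(b0, dtype=float)
    dV = np.asarray(V0, dtype=float) - np.asarray(V1, dtype=float)
    H = float(d @ np.linalg.pinv(dV) @ d)  # Moore-Penrose pseudoinverse
    df = int(np.linalg.matrix_rank(dV))
    return H, df

# Toy example: estimates differ slightly; Var(b0) - Var(b1) = diag(0.1, 0.1)
H, df = hausman_statistic(np.array([1.0, 2.0]), np.array([1.1, 2.2]),
                          np.diag([0.2, 0.2]), np.diag([0.1, 0.1]))
```

The resulting H would then be compared against a chi-squared critical value with df degrees of freedom.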
Panel data

The Hausman test can also be used to differentiate between the fixed effects model and the random effects model in panel data. In this case, random effects (RE) is preferred under the null hypothesis due to its higher efficiency, while under the alternative fixed effects (FE) is at least consistent and thus preferred.

                           If H0 is true              If H1 is true
    b1 (RE estimator)      Consistent, Efficient      Inconsistent
    b0 (FE estimator)      Consistent, Inefficient    Consistent

See also

Regression model validation

Fixed effects model

In econometrics and statistics, a fixed effects model is a statistical model that represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random. This is in contrast to random effects models and mixed models, in which either all or some of the explanatory variables are treated as if they arose from random causes.
Note that the biostatistics definitions differ, as biostatisticians refer to the population-average and subject-specific effects as "fixed" and "random" effects respectively.[1][2][3] Often the same structure of model, which is usually a linear regression model, can be treated as any of the three types depending on the analyst's viewpoint, although there may be a natural choice in any given situation.

In panel data analysis, the term fixed effects estimator (also known as the within estimator) is used to refer to an estimator for the coefficients in the regression model. If we assume fixed effects, we impose time-independent effects for each entity that are possibly correlated with the regressors.

Qualitative description

Such models assist in controlling for unobserved heterogeneity when this heterogeneity is constant over time and correlated with the independent variables. This constant can be removed from the data through differencing, for example by taking a first difference, which will remove any time-invariant components of the model.

There are two common assumptions made about the individual-specific effect: the random effects assumption and the fixed effects assumption. The random effects assumption (made in a random effects model) is that the individual-specific effects are uncorrelated with the independent variables. The fixed effects assumption is that the individual-specific effect is correlated with the independent variables. If the random effects assumption holds, the random effects model is more efficient than the fixed effects model. However, if this assumption does not hold (which can be checked with a Durbin–Wu–Hausman test), the random effects model is not consistent.
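The differencing idea can be illustrated with a minimal simulated sketch (all numbers invented; noise is omitted from the outcome for clarity): first-differencing removes the time-invariant individual effect, so OLS on the differences recovers the slope even though pooled OLS is biased by the omitted effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_t, beta = 50, 4, 2.0
a = rng.normal(size=n_ind)                       # time-invariant individual effects
x = rng.normal(size=(n_ind, n_t)) + a[:, None]   # regressor correlated with a_i
y = a[:, None] + beta * x                        # noise-free outcome for clarity

# Pooled OLS slope: biased upward because x is correlated with the omitted a_i
xc, yc = x.ravel() - x.mean(), y.ravel() - y.mean()
beta_pooled = (xc @ yc) / (xc @ xc)

# First-difference slope: a_i cancels in y_it - y_i,t-1
dx, dy = np.diff(x, axis=1).ravel(), np.diff(y, axis=1).ravel()
beta_fd = (dx @ dy) / (dx @ dx)                  # recovers beta here
```

Since the outcome has no idiosyncratic noise in this sketch, the first-difference slope equals the true beta exactly, while the pooled slope absorbs the correlation between x and the individual effects.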
Formal description

Consider the linear unobserved effects model for N observations and T time periods:

    y_it = X_it β + α_i + u_it,    for t = 1, …, T and i = 1, …, N,

where y_it is the dependent variable observed for individual i at time t, X_it is the time-variant regressor vector, β is the vector of coefficients, α_i is the unobserved time-invariant individual effect and u_it is the error term. Unlike X_it, α_i cannot be observed by the econometrician. Common examples of time-invariant effects are innate ability for individuals or historical and institutional factors for countries.

Unlike the random effects (RE) model, where the unobserved α_i is independent of X_it for all t, the FE model allows α_i to be correlated with the regressor matrix. Strict exogeneity of the error term u_it, however, is still required.

Since α_i is not observable, it cannot be directly controlled for. The FE model eliminates α_i by demeaning the variables using the within transformation:

    y_it − ȳ_i = (X_it − X̄_i) β + (u_it − ū_i),

where ȳ_i = (1/T) Σ_t y_it, X̄_i = (1/T) Σ_t X_it and ū_i = (1/T) Σ_t u_it. Since α_i is constant over time, its mean ᾱ_i equals α_i and hence the effect is eliminated. The FE estimator β̂_FE is then obtained by an OLS regression of the demeaned ÿ_it = y_it − ȳ_i on the demeaned Ẍ_it = X_it − X̄_i.

An alternative to the within transformation is to add a dummy variable for each individual i. This is numerically, but not computationally, equivalent to the fixed effects model, and only works if the number of time observations per individual, T, is much larger than the number of individuals in the panel.

Equality of Fixed Effects (FE) and First Differences (FD) estimators when T = 2

For the special two-period case (T = 2), the FE estimator and the FD estimator are numerically equivalent. To see this, note that the fixed effects estimator is:

    β̂_FE = (Σ_i Σ_t Ẍ_it' Ẍ_it)⁻¹ Σ_i Σ_t Ẍ_it' ÿ_it.

Since each demeaned regressor can be re-written as X_i1 − X̄_i = X_i1 − (X_i1 + X_i2)/2 = −(X_i2 − X_i1)/2 = −ΔX_i/2, and similarly X_i2 − X̄_i = ΔX_i/2, the sums over t reduce to (1/2) Σ_i ΔX_i' ΔX_i and (1/2) Σ_i ΔX_i' Δy_i, so:

    β̂_FE = (Σ_i ΔX_i' ΔX_i)⁻¹ Σ_i ΔX_i' Δy_i = β̂_FD.

Hausman–Taylor method

The Hausman–Taylor method requires more than one time-variant regressor (X) and time-invariant regressor (Z), and at least one X and one Z that are uncorrelated with α_i. Partition the X and Z variables such that X = [X1, X2] and Z = [Z1, Z2], where X1 and Z1 are uncorrelated with α_i; identification requires at least as many variables in X1 as in Z2. Estimating the coefficients on the time-invariant regressors, using X1 and Z1 as instruments, yields a consistent estimate.

Testing FE vs. RE

We can test whether a fixed or random effects model is appropriate using a Hausman test.
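The within transformation and the T = 2 equivalence can be checked numerically; the following is a hedged sketch on simulated data (all parameter values and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta = 40, 1.5                                 # T = 2 periods
alpha = rng.normal(size=N)                        # unobserved individual effects
x = rng.normal(size=(N, 2)) + alpha[:, None]      # regressor correlated with alpha
y = alpha[:, None] + beta * x + rng.normal(scale=0.1, size=(N, 2))

# Within (FE) estimator: demean each individual's series, then OLS
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_fe = (xw @ yw) / (xw @ xw)

# First-difference (FD) estimator
dx, dy = np.diff(x, axis=1).ravel(), np.diff(y, axis=1).ravel()
beta_fd = (dx @ dy) / (dx @ dx)
# With T = 2, beta_fe and beta_fd agree up to floating-point rounding
```

Both estimators are close to the true beta despite alpha being correlated with x, which is exactly the setting where pooled OLS would fail.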
    H0: α_i is uncorrelated with X_it (random effects is appropriate)
    H1: α_i is correlated with X_it (fixed effects is appropriate)

If H0 is true, both β̂_RE and β̂_FE are consistent, but only β̂_RE is efficient. If H1 is true, β̂_FE is consistent and β̂_RE is not. The test statistic is

    Ĥ = (β̂_RE − β̂_FE)' [Var(β̂_FE) − Var(β̂_RE)]⁻¹ (β̂_RE − β̂_FE) ~ χ²(K),

where K is the dimension of β. The Hausman test is a specification test, so a large test statistic might be an indication that there are errors in variables (EIV) or that our model is misspecified. If the FE assumption is true, we should find that the fixed effects and first difference estimates are approximately equal. A simple heuristic is that a large discrepancy between them suggests there could be EIV.

Steps in Fixed Effects Model for sample data

1. Calculate group means and the grand mean.
2. Calculate k = number of groups, n = number of observations per group, N = total number of observations (k × n).
3. Calculate SS-total (total variance) as: (each score − grand mean)², summed.
4. Calculate SS-treat (treatment effect) as: (each group mean − grand mean)², summed, × n.
5. Calculate SS-error (error effect) as: (each score − its group mean)², summed.
6. Calculate df-total: N − 1, df-treat: k − 1, and df-error: k(n − 1).
7. Calculate the mean squares MS-treat: SS-treat/df-treat, and MS-error: SS-error/df-error.
8. Calculate the obtained F value: MS-treat/MS-error.
9. Use an F-table or probability function to look up the critical F value at a chosen significance level.
10.
Conclude as to whether the treatment effect significantly affects the variable of interest.

Random effects model

In statistics, a random effects model, also called a variance components model, is a kind of hierarchical linear model. It assumes that the dataset being analysed consists of a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (i.e., no individual effects). The fixed effects model is a special case of the random effects model.
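The ten fixed-effects calculation steps listed above (before the random effects section) can be sketched on made-up scores; the data here are purely illustrative:

```python
import numpy as np

# k groups of n scores each (invented example data)
scores = np.array([[6.,  8.,  4.,  5., 3.,  4.],    # group 1
                   [8., 12.,  9., 11., 6.,  8.],    # group 2
                   [13., 9., 11.,  8., 7., 12.]])   # group 3
k, n = scores.shape
N = k * n
grand_mean = scores.mean()
group_means = scores.mean(axis=1)

# Sums of squares (steps 3-5)
ss_total = ((scores - grand_mean) ** 2).sum()
ss_treat = n * ((group_means - grand_mean) ** 2).sum()
ss_error = ((scores - group_means[:, None]) ** 2).sum()

# Degrees of freedom and mean squares (steps 6-7)
df_treat, df_error = k - 1, k * (n - 1)
ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error

# Obtained F value (step 8), to be compared against a critical value
f_obtained = ms_treat / ms_error
```

As a sanity check, SS-total decomposes exactly into SS-treat + SS-error, which is what makes the F ratio well defined.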
Note that the biostatistics definitions differ, as biostatisticians respectively refer to the population-average and subject-specific effects as "fixed" and "random" effects.[1][2][3]

Simple example

Suppose m large elementary schools are chosen randomly from among thousands in a large country. Suppose also that n pupils of the same age are chosen randomly at each selected school. Their scores on a standard aptitude test are ascertained. Let Yij be the score of the j-th pupil at the i-th school. A simple way to model the relationships of these quantities is

    Yij = μ + Ui + Wij,

where μ is the average test score for the entire population. In this model Ui is the school-specific random effect: it measures the difference between the average score at school i and the average score in the entire country, and it is "random" because the school has been randomly selected from a larger population of schools. The term Wij is the individual-specific error, that is, the deviation of the j-th pupil's score from the average for the i-th school. Again this is regarded as random because of the random selection of pupils within the school, even though it is a fixed quantity for any given pupil.

The model can be augmented by including additional explanatory variables, which would capture differences in scores among different groups. For example:

    Yij = μ + β1 Sexij + β2 Raceij + β3 ParentsEducij + Ui + Wij,

where Sexij is the dummy variable for boys/girls, Raceij is the dummy variable for white/black pupils, and ParentsEducij records the average education level of a child's parents. This is a mixed model, not a purely random effects model.

Variance components

The variance of Yij is the sum of the variances τ² and σ² of Ui and Wij respectively. Let Ȳi· be the average, not of all scores at the i-th school, but of those at the i-th school that are included in the random sample. Let Ȳ·· be the grand average.
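A hedged simulation sketch of the school model (all parameter values invented for illustration): with many simulated schools, the sample variance of the Yij should be close to the sum τ² + σ² of the two variance components.

```python
import numpy as np

# Y_ij = mu + U_i + W_ij, with Var(U_i) = tau2 and Var(W_ij) = sigma2
rng = np.random.default_rng(42)
m, n = 2000, 10                       # many schools, so moments are near truth
mu, tau2, sigma2 = 50.0, 9.0, 16.0
U = rng.normal(scale=np.sqrt(tau2), size=m)          # school-specific effects
W = rng.normal(scale=np.sqrt(sigma2), size=(m, n))   # pupil-specific errors
Y = mu + U[:, None] + W

total_var = Y.var()                   # should be close to tau2 + sigma2 = 25
```

The decomposition holds because U and W are independent by construction, mirroring the model's assumptions.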
Let

    SW = Σi Σj (Yij − Ȳi·)²    and    SB = n Σi (Ȳi· − Ȳ··)²

be respectively the sum of squares due to differences within groups and the sum of squares due to differences between groups. Then it can be shown that

    E[SW / (m(n − 1))] = σ²    and    E[SB / (m − 1)] = σ² + nτ².

These "expected mean squares" can be used as the basis for estimation of the "variance components" σ² and τ².

Unbiasedness

In general, random effects is efficient, and should be used (over fixed effects) if the assumptions underlying it are believed to be satisfied. For RE to work in the school example, it is necessary that the school-specific effects be orthogonal to the other covariates of the model. This can be tested by running random effects, then fixed effects, and doing a Hausman specification test. If the test rejects, then random effects is biased and fixed effects is the correct estimation procedure.

See also

Bühlmann model
Hierarchical linear modeling
Fixed effects
MINQUE

Notes

1. ^ Peter J. Diggle, Patrick Heagerty, Kung-Yee Liang, and Scott L. Zeger (2002). Analysis of Longitudinal Data, 2nd ed. Oxford University Press, pp. 169–171.
2. ^ Garrett M. Fitzmaurice, Nan M. Laird, and James H. Ware (2004). Applied Longitudinal Analysis. John Wiley & Sons, pp. 326–328.
3. ^ Nan M. Laird and James H. Ware (1982). "Random-Effects Models for Longitudinal Data". Biometrics 38 (4): 963–974.

Fixed and random effects models

When you have repeated observations per individual, this is both a problem and an advantage:
  o the observations are not independent
  o we can use the repetition to get better parameter estimates
If we pooled the observations and used, e.g., OLS, we would have biased estimates. If we fit fixed-effects or random-effects models which take account of the repetition, we can control for fixed or random individual differences. In the econometrics literature these models are called 'cross-sectional time-series' models, because we have time series of observations at the individual rather than the aggregate level.
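The expected mean squares suggest simple method-of-moments ("ANOVA") estimators for the variance components. A sketch on simulated data (all numbers illustrative): solve E[SW/(m(n−1))] = σ² and E[SB/(m−1)] = σ² + nτ² for σ² and τ².

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3000, 8
tau2, sigma2 = 4.0, 9.0               # true variance components
Y = (50.0
     + rng.normal(scale=np.sqrt(tau2), size=(m, 1))    # school effects U_i
     + rng.normal(scale=np.sqrt(sigma2), size=(m, n))) # pupil errors W_ij

school_means = Y.mean(axis=1)
grand_mean = Y.mean()

# Within- and between-group sums of squares
S_W = ((Y - school_means[:, None]) ** 2).sum()
S_B = n * ((school_means - grand_mean) ** 2).sum()

# Moment estimators from the expected mean squares
sigma2_hat = S_W / (m * (n - 1))
tau2_hat = (S_B / (m - 1) - sigma2_hat) / n
```

With many groups these estimates land close to the true τ² and σ²; in small samples, τ2_hat can even come out negative, which is a known quirk of the ANOVA estimator.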
If we have a small number of individuals, we can simply fit a dummy for each individual:

    y_it = a_i + b·x_it + e_it

This can be considered a 'fixed-effects' model because the regression line is raised or lowered by a fixed amount a_i for each individual. If there are many individuals this cannot be done directly, but there are mathematically equivalent models which achieve the same effect. This model is appropriate where we consider each individual to have a fixed effect shifting the regression line up or down.

We may prefer to consider the individual differences as random disturbances drawn from some specified distribution:

    y_it = a + b·x_it + u_i + e_it,    u_i drawn from a specified distribution

This has the advantage of using fewer degrees of freedom, and that individual differences are considered random rather than fixed and estimable. It has the disadvantage of requiring no correlation between the regressors (the x's) and the individual effects (the u_i); there are tests for this assumption (the Hausman test).

The xt series of commands provide tools for analyzing cross-sectional time-series (panel) datasets:

    help xtdes      Describe pattern of xt data
    help xtsum      Summarize xt data
    help xttab      Tabulate xt data
    help xtreg      Fixed-, between- and random-effects, and population-averaged linear models
    help xtdata     Faster specification searches with xt data
    help xtlogit    Fixed-effects, random-effects, & population-averaged logit models
    help xtprobit   Random-effects and population-averaged probit models
    help xttobit    Random-effects tobit models
    help xtpois     Fixed-effects, random-effects, & population-averaged Poisson models
    help xtnbreg    Fixed-effects, random-effects, & population-averaged negative binomial models
    help xtclog     Random-effects and population-averaged cloglog models
    help xtintreg   Random-effects interval data regression models
    help xtrchh     Hildreth–Houck random coefficients models
    help xtgls      Panel-data models using GLS
    help xtgee      Population-averaged panel-data models using GEE

Fitting these models in Stata is easy, with data in long format (one record per individual per wave):

    . xtreg yvar x1 x2, fe i(pid)
    . xtreg yvar x1 x2, re i(pid)
    . xtlogit yvar x1 x2, re i(pid)

Reference: William Greene, Econometric Analysis, Maxwell Macmillan 1991, Ch. 16, Section 4.

http://www.pitt.edu/~super1/lecture/lec1171/012.htm