
SEEMINGLY UNRELATED REGRESSIONS MODEL
EQUATION SYSTEMS
Oftentimes it makes sense to view two or more equations as a system of equations that are
related to one another in some particular way. There are four major types of equation systems:
1. Seemingly unrelated equation system
2. Simultaneous equation system
3. Recursive equation system
4. Block recursive system
You can use information about how the equations are related to obtain better estimates of the
parameters of the model you are estimating.
INTRODUCTION TO SEEMINGLY UNRELATED EQUATIONS SYSTEMS
In a seemingly unrelated equation system, the equations are related in one or both of the
following ways.
1. The error terms in the different equations are related.
The error terms are correlated if there are common unobserved factors that
influence the dependent variables in the equations.
2. The parameters in the different equations are related.
This occurs if the same parameter(s) appears in more than one equation, or if a
parameter(s) in one equation is a linear or nonlinear function of the parameters in the
other equations.
There are many economic processes that are best described by a seemingly unrelated equation
system. Some examples are as follows. 1) Investment demand equations for firms in the same
industry. 2) Consumer demand equations implied by utility maximizing behavior. 3) Input
demand equations implied by cost minimizing and profit maximizing behavior.
SPECIFICATION OF THE SUR MODEL
Assume that you have M-equations that are related because the error terms are correlated.
This system of M seemingly unrelated regression equations can be written in matrix format as
follows.
y1 = X11 + 1
y2 = X22 + 2
y3 = X33 + 3
.
.
.
yM = XMM + M
Using more concise notation, this system of M-equations can be written as
yi = Xiβi + εi
for i = 1, 2, …, M
Where yi is a Tx1 column vector of observations on the ith dependent variable; Xi is a TxK
matrix of observations for the K-1 explanatory variables and a column vector of 1’s for the ith
equation (i.e., the data matrix for the ith equation); βi is the Kx1 column vector of parameters
for the ith equation; and εi is the Tx1 column vector of disturbances for the ith equation.
You can view this system of M-equations as one single large equation to be estimated. To
combine the M-equations into one single large equation, you stack the vectors and matrices as
follows.
| y1 |   | X1  0   0  ….  0  | | β1 |   | ε1 |
| y2 |   | 0   X2  0  ….  0  | | β2 |   | ε2 |
| y3 | = | 0   0   X3 ….  0  | | β3 | + | ε3 |
| .  |   | .   .   .       . | | .  |   | .  |
| .  |   | .   .   .       . | | .  |   | .  |
| yM |   | 0   0   0  ….  XM | | βM |   | εM |

(MT)x1       (MT)x(MK)         (MK)x1    (MT)x1
This single large equation (henceforth called the “big equation”) can be written more concisely as
y = Xβ + ε
Where y is a (MT)x1 column vector of observations on the dependent variables for the M-equations; X
is a (MT)x(MK) matrix of observations on the explanatory variables, with the columns of 1’s, for the
M-equations; β is a (MK)x1 column vector of parameters for the M-equations; and ε is a (MT)x1 column
vector of disturbances for the M-equations. The specification of the SUR model is defined by the
following set of assumptions.
Assumptions
1. The functional form of the big equation is linear in parameters.
y = Xβ + ε
2. The error term in the big equation has mean zero.
E() = 0
3. The errors in the big equation are nonspherical and satisfy the following assumptions:
a. The error variance for each individual equation is constant (no heteroscedasticity).
b. The error variance may differ across individual equations.
c. The errors for each individual equation are uncorrelated (no autocorrelation).
d. The errors for different individual equations are contemporaneously correlated.
i) For time series data, the errors in different equations in the same time period are
correlated. The errors in different equations for different time periods are not
correlated.
ii) For cross-section data, the errors in different equations for the same decision-making
unit are correlated. The errors in different equations for different decision-making
units are not correlated.
Assumptions 3a through 3d imply the following variance-covariance matrix of errors
Cov(ε) = E(εεT) = W = Σ ⊗ I
4. The error term for the big equation has a normal distribution.
ε ~ N(0, Σ ⊗ I)
5. The error term in the big equation is uncorrelated with each explanatory variable in the big
equation.
Cov (,X) = 0
The Variance-Covariance Matrix of Errors
The SUR model assumes that the variance-covariance matrix of disturbances for the big
equation has the following structure.
W = Σ ⊗ I
The sigma matrix, Σ, is an MxM matrix of variances and covariances for the M individual
equations.
Σ = | σ11 σ12 …….. σ1M |
    | σ21 σ22 …….. σ2M |
    |  .   .          . |
    |  .   .          . |
    | σM1 σM2 …….. σMM |  MxM
where 11 is the variance of the errors in equation 1, 22 is the variance of the errors in
equation 2, etc; 12 is the covariance of the errors in equation 1 and equation 2, etc. The
identity matrix, I, is a TxT matrix with ones on the principal diagonal and zeros off the principal
diagonal,
 1 0 …….. 0 
I
==
| 0 1 …….. 0 |
| . .
.|
| . .
.|
| . .
. |
 0 0 ……… 1

TxT
The symbol ⊗ is an operator called the Kronecker product. It tells you to multiply each element
in the matrix Σ by the matrix I. The result of the Kronecker product is the (MT)x(MT)
variance-covariance matrix of disturbances for the big equation
W = Σ ⊗ I = | σ11I σ12I …….. σ1MI |
            | σ21I σ22I …….. σ2MI |
            | σ31I σ32I …….. σ3MI |
            |  .     .          .  |
            | σM1I σM2I …….. σMMI |  (MT)x(MT)
Seemingly Unrelated Regression Model Concisely Stated in Matrix Format
The sample of MT multivariate observations are generated by a process described as follows.
y = X + ,
 ~ N(0, I )
or alternatively
y ~ N(X, I )
ESTIMATION
Choosing an Estimator
To obtain estimates of the parameters of the SUR model, you need to choose an estimator. We
will consider the following four estimators:
1. Ordinary least squares (OLS) estimator
2. Generalized least squares (GLS) estimator
3. Feasible generalized least squares (FGLS) estimator
4. Iterated feasible generalized least squares (IFGLS) estimator
Ordinary Least Squares (OLS) Estimator
The OLS estimator is given by the rule β^ = (XTX)-1XTy. This rule can be used to directly estimate the
(MK)x1 vector of parameters, β, in the single big equation. However, because the data matrix for the
big equation is block diagonal, this is equivalent to estimating each of the M-equations separately by
OLS.
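This equivalence can be sketched numerically with simulated data (all variable names and parameter values below are illustrative, not from the text):

```python
import numpy as np

# Sketch: OLS on the stacked "big equation" equals equation-by-equation OLS
# when the data matrix X is block diagonal. M = 2 equations, T = 50, K = 2.
rng = np.random.default_rng(0)
T, K = 50, 2
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=T)   # illustrative parameters
y2 = X2 @ np.array([-0.5, 1.5]) + rng.normal(size=T)

# Stack: y is (MT)x1, X is (MT)x(MK) block diagonal
y = np.concatenate([y1, y2])
X = np.block([[X1, np.zeros((T, K))],
              [np.zeros((T, K)), X2]])

beta_big = np.linalg.solve(X.T @ X, X.T @ y)          # OLS on the big equation
beta_sep = np.concatenate([np.linalg.lstsq(X1, y1, rcond=None)[0],
                           np.linalg.lstsq(X2, y2, rcond=None)[0]])
print(np.allclose(beta_big, beta_sep))  # the two estimates coincide
```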
Properties of the OLS Estimator
If the sample data are generated by the SUR regression model, then the OLS estimator is
unbiased but inefficient. The reason OLS is inefficient is because it wastes information. This is
because the errors in the big equation are nonspherical and OLS does not use this information
to obtain estimates of the parameters. Thus, there exists an alternative estimator that uses the
information about the nonspherical errors to obtain more precise estimates. Note also that the
OLS estimator does not produce maximum likelihood estimates.
Generalized Least Squares (GLS) Estimator
The GLS estimator is given by the rule:
β^GLS = (XTW-1X)-1XTW-1y, or equivalently
β^GLS = [XT(Σ-1 ⊗ I)X]-1XT(Σ-1 ⊗ I)y
This rule can be applied directly to the big equation, where X is the (MT)x(MK) data matrix, y
is the (MT)x1 vector of observations on the dependent variable, and W is the (MT)x(MT)
variance-covariance matrix of disturbances for the big equation.
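The two forms of the GLS rule can be checked numerically. The sketch below uses an arbitrary Σ and arbitrary data, since only the algebraic identity (Σ ⊗ I)-1 = Σ-1 ⊗ I is being verified; all values are illustrative:

```python
import numpy as np

# Sketch of the GLS rule under a known Sigma, exploiting W = Sigma ⊗ I.
rng = np.random.default_rng(1)
T, K = 40, 2
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
X = np.block([[X1, np.zeros((T, K))], [np.zeros((T, K)), X2]])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])         # assumed known here
W = np.kron(Sigma, np.eye(T))                      # (MT)x(MT)
y = rng.normal(size=2 * T)                         # arbitrary; only algebra is checked

# Form 1: invert Sigma (MxM), then take the Kronecker product with I
Winv = np.kron(np.linalg.inv(Sigma), np.eye(T))
beta_gls = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)

# Form 2: invert the full (MT)x(MT) matrix W directly
beta_gls2 = np.linalg.solve(X.T @ np.linalg.inv(W) @ X,
                            X.T @ np.linalg.inv(W) @ y)
print(np.allclose(beta_gls, beta_gls2))
```

Form 1 is the one used in practice: inverting the MxM matrix Σ is far cheaper than inverting the (MT)x(MT) matrix W.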
Properties of the GLS Estimator
If the sample data are generated by the SUR regression model, then the GLS estimator is
unbiased, efficient, and the maximum likelihood estimator. The reason the GLS estimator is
more precise than the OLS estimator is that it uses the information about the nonspherical
disturbances contained in W to obtain estimates of the parameters.
Major Shortcoming of the GLS Estimator
The GLS estimator is not a feasible estimator, because you don’t know the elements of the
variance-covariance matrix of disturbances, W, for the big equation.
Feasible Generalized Least Squares (FGLS) Estimator
To make the GLS estimator a feasible estimator, you can use the sample of data to obtain an
estimate of W. When you replace true W with its estimate W^ you get the FGLS estimator. The
FGLS estimator is given by the rule:
β^FGLS = (XTW^-1X)-1XTW^-1y, or equivalently, β^FGLS = [XT(Σ^-1 ⊗ I)X]-1XT(Σ^-1 ⊗ I)y
Estimating W
The most often used method for estimating W is Zellner’s method. When Zellner’s method is used to
estimate W the FGLS estimator is called Zellner’s SUR estimator. To obtain an estimate of W using
Zellner’s method you proceed as follows.
1. Estimate each of the M-equations separately using OLS.
2. Use the residuals from the OLS regressions to obtain estimates of the variances and covariances of
the disturbances for the M-equations. The estimators are:
σii^ = (εi^)T(εi^) / T
and
σij^ = (εi^)T(εj^) / T
Where σii^ is the estimate of the error variance for the ith equation; σij^ is the estimate of the
covariance of errors for the ith and jth equations; εi^ is the vector of residuals for the ith equation;
εj^ is the vector of residuals for the jth equation; and T is the sample size.
3. Use the estimates of the variances and covariances from step 2 to form an estimate of the MxM
matrix Σ.
4. Construct the TxT identity matrix I.
5. Apply the formula W^ = Σ^ ⊗ I to obtain an estimate of the variance-covariance matrix of
disturbances for the big equation.
Once you have the estimate of W, you can use the sample data and the rule β^FGLS = (XTW^-1X)-1XTW^-1y
to obtain estimates of the parameters.
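Steps 1 through 5 can be sketched in a few lines of code. The data below are simulated with contemporaneously correlated errors; all variable names and parameter values are illustrative assumptions, not part of the text:

```python
import numpy as np

# Minimal sketch of Zellner's SUR (FGLS) estimator for M = 2 equations.
rng = np.random.default_rng(2)
T, K = 200, 2
Sigma_true = np.array([[1.0, 0.8], [0.8, 1.0]])    # contemporaneous correlation
errs = rng.multivariate_normal(np.zeros(2), Sigma_true, size=T)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]        # illustrative parameters
y2 = X2 @ np.array([0.5, -1.0]) + errs[:, 1]

# Step 1: estimate each equation separately by OLS
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]

# Steps 2-3: OLS residuals -> estimated variances/covariances -> Sigma_hat
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
Sigma_hat = (E.T @ E) / T

# Steps 4-5: W^ = Sigma_hat ⊗ I, then apply the FGLS rule to the big equation
X = np.block([[X1, np.zeros((T, K))], [np.zeros((T, K)), X2]])
y = np.concatenate([y1, y2])
Winv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
beta_fgls = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
print(beta_fgls.round(2))
```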
Properties of the SUR Estimator
If the sample data are generated by the SUR regression model, then Zellner’s SUR estimator is
asymptotically equivalent to the GLS estimator and is a maximum likelihood estimator.
Therefore, it is asymptotically unbiased, efficient, and consistent. The small sample properties
of Zellner’s SUR estimator are unknown, but Monte Carlo studies suggest it is unbiased and has
a smaller variance than the OLS estimator.
Iterated Feasible Generalized Least Squares (IFGLS) Estimator
An alternative FGLS estimator is the iterated FGLS estimator. The IFGLS estimator used most
often is Zellner’s iterated SUR (ISUR) estimator. The steps involved in using the ISUR estimator
are as follows.
1. Estimate the parameters of the big equation using Zellner’s SUR estimator described above.
2. Use the parameter estimates from this regression to compute the residuals for each of the
M-equations.
3. Use the residuals to obtain new estimates of the variances and covariances of the
disturbances for the M-equations, and therefore a new estimate of  and W.
4. Use the new estimate of W to repeat step 1 and obtain new parameter estimates.
5. Repeat steps 2, 3, and 4. (Each time you obtain new parameter estimates this completes an
iteration).
6. Continue to iterate until convergence is achieved. Convergence is achieved when the
change in the parameter estimates is very small. Very small is defined by a predetermined
criterion. This last set of parameter estimates are the ISUR estimates.
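The iteration loop above can be sketched as follows, again on simulated data with illustrative names and values; the tolerance of 1e-8 is an arbitrary choice of the predetermined convergence criterion:

```python
import numpy as np

# Sketch of the iterated SUR (ISUR) loop: re-estimate Sigma and beta until the
# parameter estimates stop changing.
rng = np.random.default_rng(3)
T, K = 150, 2
Sigma_true = np.array([[1.0, 0.7], [0.7, 1.5]])
errs = rng.multivariate_normal(np.zeros(2), Sigma_true, size=T)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, -2.0]) + errs[:, 0]       # illustrative parameters
y2 = X2 @ np.array([0.0, 3.0]) + errs[:, 1]
X = np.block([[X1, np.zeros((T, K))], [np.zeros((T, K)), X2]])
y = np.concatenate([y1, y2])

# Start from OLS on the big equation, then iterate the FGLS steps
beta = np.linalg.lstsq(X, y, rcond=None)[0]
for it in range(100):
    resid = (y - X @ beta).reshape(2, T)           # rows: per-equation residuals
    Sigma_hat = (resid @ resid.T) / T              # new estimate of Sigma
    Winv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    beta_new = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
    if np.max(np.abs(beta_new - beta)) < 1e-8:     # convergence criterion
        beta = beta_new
        break
    beta = beta_new
print("converged after", it + 1, "iterations")
```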
Properties of the ISUR Estimator
The ISUR estimator has the same asymptotic properties as the SUR estimator. However, there
is an ongoing debate about whether the ISUR or SUR estimator yields better estimates in small
samples. Most econometricians seem to prefer the ISUR estimator. One reason for this is given
below.
Singular Seemingly Unrelated Regressions Models
For some types of seemingly unrelated regressions models (e.g., consumer demand equations
implied by utility maximizing behavior; input demand equations implied by cost minimizing and
profit maximizing behavior) the variance-covariance matrix of disturbances W for the big
equation is singular, and therefore the entire system of M-equations cannot be estimated
jointly. These are called singular SUR models. To solve the singularity problem, you drop one
of the M-equations and estimate the remaining M – 1 equations jointly. The ISUR parameter
estimates are invariant to the equation dropped; that is, you will always get the same
parameter estimates regardless of the equation you eliminate. This is not true for the SUR
parameter estimates. Thus, when estimating the parameters of a singular SUR model you
should use the ISUR estimator.
Common Properties of the SUR and ISUR Estimators
1. If the error terms across equations are not contemporaneously correlated, then the SUR
and ISUR estimators collapse to the OLS estimator and there are no efficiency gains.
2. If each of the M-equations has the same data matrix, X1 = X2 = … = XM, then the SUR and
ISUR estimators collapse to the OLS estimator. This occurs if each of the M-equations has
identical explanatory variables with identical observations. In this case, there are no
efficiency gains from using the SUR or ISUR estimator.
3. If there are cross equation restrictions, then there are efficiency gains from using the SUR or
ISUR estimator, even if the error terms across equations are not correlated or the data
matrix for each of the M-equations is the same.
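Property 2 is easy to verify numerically: with identical data matrices, the FGLS estimate coincides with equation-by-equation OLS even when the errors are strongly correlated. All names and values below are illustrative:

```python
import numpy as np

# Check: with X1 = X2, the SUR (FGLS) estimator collapses to OLS.
rng = np.random.default_rng(6)
T, K = 80, 2
Xc = np.column_stack([np.ones(T), rng.normal(size=T)])  # same X in both equations
errs = rng.multivariate_normal(np.zeros(2),
                               [[1.0, 0.9], [0.9, 1.0]], size=T)
y1 = Xc @ np.array([1.0, 2.0]) + errs[:, 0]             # illustrative parameters
y2 = Xc @ np.array([-1.0, 0.5]) + errs[:, 1]

# Equation-by-equation OLS and the residual-based Sigma_hat
b1 = np.linalg.lstsq(Xc, y1, rcond=None)[0]
b2 = np.linalg.lstsq(Xc, y2, rcond=None)[0]
E = np.column_stack([y1 - Xc @ b1, y2 - Xc @ b2])
Sigma_hat = (E.T @ E) / T

# FGLS on the big equation gives the same answer, whatever Sigma_hat is
X = np.block([[Xc, np.zeros((T, K))], [np.zeros((T, K)), Xc]])
y = np.concatenate([y1, y2])
Winv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
beta_fgls = np.linalg.solve(X.T @ Winv @ X, X.T @ Winv @ y)
print(np.allclose(beta_fgls, np.concatenate([b1, b2])))
```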
SPECIFICATION TESTING
A specification test tests an assumption that defines the specification of a statistical model. An
often used specification test for the SUR model is the Breusch-Pagan Test of Independent
Errors.
Breusch-Pagan Test of Independent Errors
The Breusch-Pagan test is used to test the assumption that the errors across equations are
contemporaneously correlated. The null hypothesis is no contemporaneous correlation. The
alternative hypothesis is contemporaneous correlation. For a two-equation SUR model, the test
statistic is the following Lagrange multiplier statistic, which has a chi-square distribution with
M(M-1)/2 degrees of freedom.
LM = T·r12^2 ~ χ2(M(M-1)/2),
where r12^2 = (σ12^)2 / (σ11^ σ22^)
Where T is the sample size, (σ12^)2 is the square of the sample covariance of the errors for the
two equations, and σ11^ and σ22^ are the sample error variances for the two equations. This
test statistic can be generalized to more than two equations.
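For the two-equation case, the LM statistic can be computed directly from the OLS residuals. The sketch below simulates data with independent errors, so the null hypothesis is true; all names and values are illustrative:

```python
import numpy as np

# Breusch-Pagan LM statistic for M = 2 equations; under H0 (no contemporaneous
# correlation) LM is asymptotically chi-square with M(M-1)/2 = 1 df.
rng = np.random.default_rng(4)
T = 100
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + rng.normal(size=T)    # independent errors
y2 = X2 @ np.array([0.5, -1.0]) + rng.normal(size=T)

# OLS residuals equation by equation
e1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
e2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]

s11, s22 = e1 @ e1 / T, e2 @ e2 / T        # sample error variances
s12 = e1 @ e2 / T                          # sample error covariance
r12_sq = s12**2 / (s11 * s22)
LM = T * r12_sq                            # compare to chi-square(1) critical value
print(round(LM, 3))
```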
HYPOTHESIS TESTING
The following statistical tests can be used to test hypotheses in the SUR model. 1) Asymptotic t-test. 2) Approximate F-test. 3) Wald test. 4) Likelihood ratio test. 5) Lagrange multiplier test.
You must choose the appropriate test for the hypothesis in which you are interested. Note
the following. 1) The small sample t-test and F-test cannot be used. This is because if the
sample data are generated by the SUR model we don’t know the sampling distribution of the t-statistic or the F-statistic. 2) Each of these tests is applied to the big equation.
Cross-Equation Restrictions
Economic theory and other sources of prior information oftentimes imply that the values of
two or more parameters in two or more equations are identical, or a parameter in one
equation is a linear or nonlinear function of one or more parameters in one or more other
equations. These are called cross equation restrictions. For example, in a system of M
consumer demand equations implied by utility maximizing behavior the same parameters
appear in different demand equations. These cross-equation restrictions can be easily tested
and/or imposed in the context of the SUR model.
GOODNESS-OF-FIT
The R2 statistic that is used to measure the goodness-of-fit of a classical linear regression model is not
appropriate for the SUR regression model. Many statistical programs will report an R2 statistic for each
individual equation for the SUR model, but these R2 statistics have little if any meaning. They do not
measure the proportion of the variation in the dependent variable which is explained by variation in the
explanatory variables for the individual equation, and they can take values less than zero or greater
than one.
Generalized R2 Statistic
The generalized R2 statistic is a single number that measures the goodness of fit for the big equation,
and therefore the entire system of individual equations. Calculating this statistic involves the following
steps.
1. Construct a TxM matrix, denoted Y, which contains the observations on the dependent variables for
the M-equations.
Y = [y1 y2 … yM]TxM
The M-columns of this matrix are the Tx1 vectors of observations on dependent variables for the M-equations.
2. Construct a TxM matrix, denoted YMean, which contains the sample means of the dependent
variables for the M-equations.
YMean = [y1Mean y2Mean … yMMean]TxM
The M-columns of this matrix are the Tx1 vectors of the constant sample mean of the dependent
variable for each of the M-equations.
3. Construct a TxM mean deviation matrix, denoted YDev, as follows
YDev = Y – YMean
4. Construct an MxM mean deviation cross products matrix, denoted A, as follows
A = YDevT YDev
5. Estimate the big equation using an estimator that yields maximum likelihood estimates (e.g., SUR,
ISUR, direct maximum likelihood). Obtain the residual cross products matrix, denoted S, from this
regression.
6. Calculate the generalized R2 statistic as follows
R2~ = 1 – (|S| / |A|)
Where |S| is the determinant of the residual cross products matrix, and |A| is the determinant of
the mean deviation cross products matrix.
The R2~ statistic measures the proportion of the variation in the vector of observations on the
dependent variable for the big equation that is explained by the variation in the explanatory variables in
the big equation.
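The steps above can be sketched as follows. The data are simulated, and OLS residuals are used here as a stand-in for the maximum-likelihood residuals the text calls for, so the resulting number is purely illustrative:

```python
import numpy as np

# Sketch of the generalized R^2 = 1 - |S|/|A| for a two-equation system.
rng = np.random.default_rng(5)
T = 120
Sigma_true = np.array([[1.0, 0.5], [0.5, 1.0]])
errs = rng.multivariate_normal(np.zeros(2), Sigma_true, size=T)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]        # illustrative parameters
y2 = X2 @ np.array([-1.0, 1.5]) + errs[:, 1]

# Steps 1-4: mean-deviation cross products matrix A
Y = np.column_stack([y1, y2])                      # TxM matrix of dependent vars
Ydev = Y - Y.mean(axis=0)                          # subtract each column's mean
A = Ydev.T @ Ydev

# Step 5: residual cross products matrix S (OLS residuals as a stand-in)
e1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
e2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]
Edev = np.column_stack([e1, e2])
S = Edev.T @ Edev

# Step 6: generalized R^2 from the two determinants
R2_gen = 1 - np.linalg.det(S) / np.linalg.det(A)
print(round(R2_gen, 3))
```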