SEEMINGLY UNRELATED REGRESSIONS MODEL

EQUATION SYSTEMS

Oftentimes it makes sense to view two or more equations as a system of equations that are related to one another in some particular way. There are four major types of equation systems:

1. Seemingly unrelated equation system
2. Simultaneous equation system
3. Recursive equation system
4. Block recursive equation system

You can use information about how the equations are related to obtain better estimates of the parameters of the model you are estimating.

INTRODUCTION TO SEEMINGLY UNRELATED EQUATION SYSTEMS

In a seemingly unrelated equation system, the equations are related in one or both of the following ways.

1. The error terms in the different equations are related. The error terms are correlated if there are common unobserved factors that influence the dependent variables in the equations.

2. The parameters in the different equations are related. This occurs if the same parameter(s) appears in more than one equation, or if a parameter(s) in one equation is a linear or nonlinear function of the parameters in the other equations.

There are many economic processes that are best described by a seemingly unrelated equation system. Some examples are as follows.

1) Investment demand equations for firms in the same industry.
2) Consumer demand equations implied by utility maximizing behavior.
3) Input demand equations implied by cost minimizing and profit maximizing behavior.

SPECIFICATION OF THE SUR MODEL

Assume that you have M equations that are related because the error terms are correlated. This system of M seemingly unrelated regression equations can be written in matrix format as follows.

$y_1 = X_1 \beta_1 + \varepsilon_1$
$y_2 = X_2 \beta_2 + \varepsilon_2$
$y_3 = X_3 \beta_3 + \varepsilon_3$
$\vdots$
$y_M = X_M \beta_M + \varepsilon_M$

Using more concise notation, this system of M equations can be written as

$y_i = X_i \beta_i + \varepsilon_i$ for $i = 1, 2, \ldots, M$

where $y_i$ is a $T \times 1$ column vector of observations on the ith dependent variable; $X_i$ is a $T \times K$ matrix containing the observations on the $K - 1$ explanatory variables and a column vector of 1's for the ith equation (i.e., the data matrix for the ith equation); $\beta_i$ is the $K \times 1$ column vector of parameters for the ith equation; and $\varepsilon_i$ is the $T \times 1$ column vector of disturbances for the ith equation.

You can view this system of M equations as one single large equation to be estimated. To combine the M equations into one single large equation, you stack the vectors and matrices as follows.

$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_M \end{bmatrix}_{(MT) \times 1}
=
\begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_M \end{bmatrix}_{(MT) \times (MK)}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_M \end{bmatrix}_{(MK) \times 1}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_M \end{bmatrix}_{(MT) \times 1}
$$

This single large equation (henceforth called the "big equation") can be written more concisely as

$y = X\beta + \varepsilon$

where $y$ is a $(MT) \times 1$ column vector of observations on the dependent variables for the M equations; $X$ is a $(MT) \times (MK)$ block diagonal matrix of observations on the explanatory variables, with the columns of 1's, for the M equations; $\beta$ is a $(MK) \times 1$ column vector of parameters for the M equations; and $\varepsilon$ is a $(MT) \times 1$ column vector of disturbances for the M equations. A code sketch of this stacking is given below.
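To make the stacking concrete, here is a minimal NumPy sketch for M = 2 equations. The array names (X1, X2, y1, y2) and dimensions are hypothetical placeholders, not from the notes, and SciPy is assumed available for the block diagonal construction.

```python
import numpy as np
from scipy.linalg import block_diag

T, K = 5, 3  # T observations per equation, K parameters per equation

rng = np.random.default_rng(0)
# Data matrix for each equation: a column of 1's plus K - 1 explanatory variables.
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
X2 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
y1 = rng.normal(size=T)  # placeholder dependent variables
y2 = rng.normal(size=T)

# Stack the M = 2 equations into the big equation:
# y is (MT)x1 and X is (MT)x(MK) block diagonal.
y = np.concatenate([y1, y2])
X = block_diag(X1, X2)
print(y.shape, X.shape)  # (10,) (10, 6)
```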
The specification of the SUR model is defined by the following set of assumptions.

Assumptions

1. The functional form of the big equation is linear in parameters.
   $y = X\beta + \varepsilon$

2. The error term in the big equation has mean zero.
   $E(\varepsilon) = 0$

3. The errors in the big equation are nonspherical and satisfy the following assumptions:
   a. The error variance for each individual equation is constant (no heteroscedasticity).
   b. The error variance may differ for different individual equations.
   c. The errors for each individual equation are uncorrelated (no autocorrelation).
   d. The errors for different individual equations are contemporaneously correlated.
      i) For time series data, the errors in different equations in the same time period are correlated. The errors in different equations for different time periods are not correlated.
      ii) For cross section data, the errors in different equations for the same decision making unit are correlated. The errors in different equations for different decision making units are not correlated.

   Assumptions 3a through 3d imply the following variance-covariance matrix of errors:
   $Cov(\varepsilon) = E(\varepsilon \varepsilon^T) = W = \Sigma \otimes I$

4. The error term for the big equation has a normal distribution.
   $\varepsilon \sim N(0, \Sigma \otimes I)$

5. The error term in the big equation is uncorrelated with each explanatory variable in the big equation.
   $Cov(\varepsilon, X) = 0$

The Variance-Covariance Matrix of Errors

The SUR model assumes that the variance-covariance matrix of disturbances for the big equation has the following structure.

$W = \Sigma \otimes I$

The sigma matrix, $\Sigma$, is an $M \times M$ matrix of variances and covariances for the M individual equations:

$$
\Sigma =
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{M1} & \sigma_{M2} & \cdots & \sigma_{MM}
\end{bmatrix}_{M \times M}
$$

where $\sigma_{11}$ is the variance of the errors in equation 1, $\sigma_{22}$ is the variance of the errors in equation 2, and so on; $\sigma_{12}$ is the covariance of the errors in equation 1 and equation 2, and so on.

The identity matrix, $I$, is a $T \times T$ matrix with ones on the principal diagonal and zeros off the principal diagonal:

$$
I =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}_{T \times T}
$$

The symbol $\otimes$ is an operator called the Kronecker product. It tells you to multiply each element in the matrix $\Sigma$ by the matrix $I$. The result of the Kronecker product is the $(MT) \times (MT)$ variance-covariance matrix of disturbances for the big equation:

$$
W = \Sigma \otimes I =
\begin{bmatrix}
\sigma_{11} I & \sigma_{12} I & \cdots & \sigma_{1M} I \\
\sigma_{21} I & \sigma_{22} I & \cdots & \sigma_{2M} I \\
\sigma_{31} I & \sigma_{32} I & \cdots & \sigma_{3M} I \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{M1} I & \sigma_{M2} I & \cdots & \sigma_{MM} I
\end{bmatrix}_{(MT) \times (MT)}
$$

Seemingly Unrelated Regression Model Concisely Stated in Matrix Format

The sample of MT multivariate observations is generated by a process described as follows:

$y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \Sigma \otimes I)$

or alternatively

$y \sim N(X\beta, \Sigma \otimes I)$

ESTIMATION

Choosing an Estimator

To obtain estimates of the parameters of the SUR model, you need to choose an estimator. We will consider the following four estimators:

1. Ordinary least squares (OLS) estimator
2. Generalized least squares (GLS) estimator
3. Feasible generalized least squares (FGLS) estimator
4. Iterated feasible generalized least squares (IFGLS) estimator

Ordinary Least Squares (OLS) Estimator

The OLS estimator is given by the rule

$\hat\beta = (X^T X)^{-1} X^T y$

This rule can be used to directly estimate the $(MK) \times 1$ vector of parameters, $\beta$, in the single big equation. However, because the data matrix for the big equation is block diagonal, this is equivalent to estimating each of the M equations separately by OLS.

Properties of the OLS Estimator

If the sample data are generated by the SUR regression model, then the OLS estimator is unbiased but inefficient. The reason OLS is inefficient is that it wastes information: the errors in the big equation are nonspherical, and OLS does not use this information to obtain estimates of the parameters. Thus, there exists an alternative estimator that uses the information about the nonspherical errors to obtain more precise estimates. Note also that the OLS estimator does not produce maximum likelihood estimates.
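The equivalence between big-equation OLS and equation-by-equation OLS is easy to verify numerically. A minimal sketch on simulated data follows; all names, dimensions, and parameter values are illustrative assumptions, not from the notes.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
T, K = 100, 3
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
X2 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
# Contemporaneously correlated errors across the two equations.
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 2.0]], size=T)
y1 = X1 @ np.array([1.0, 0.5, -0.2]) + E[:, 0]
y2 = X2 @ np.array([2.0, -1.0, 0.3]) + E[:, 1]

# OLS on the big equation with block-diagonal X ...
X, y = block_diag(X1, X2), np.concatenate([y1, y2])
b_big = np.linalg.solve(X.T @ X, X.T @ y)

# ... equals equation-by-equation OLS.
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
print(np.allclose(b_big, np.concatenate([b1, b2])))  # True
```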
Generalized Least Squares (GLS) Estimator

The GLS estimator is given by the rule

$\hat\beta_{GLS} = (X^T W^{-1} X)^{-1} X^T W^{-1} y$

or equivalently

$\hat\beta_{GLS} = [X^T (\Sigma^{-1} \otimes I) X]^{-1} X^T (\Sigma^{-1} \otimes I) y$

This rule can be applied directly to the big equation, where $X$ is the $(MT) \times (MK)$ data matrix, $y$ is the $(MT) \times 1$ vector of observations on the dependent variable, and $W$ is the $(MT) \times (MT)$ variance-covariance matrix of disturbances for the big equation.

Properties of the GLS Estimator

If the sample data are generated by the SUR regression model, then the GLS estimator is unbiased, efficient, and the maximum likelihood estimator. The reason the GLS estimator is more precise than the OLS estimator is that it uses the information about the nonspherical disturbances contained in $W$ to obtain estimates of the parameters.

Major Shortcoming of the GLS Estimator

The GLS estimator is not a feasible estimator, because you don't know the elements of the variance-covariance matrix of disturbances, $W$, for the big equation.

Feasible Generalized Least Squares (FGLS) Estimator

To make the GLS estimator feasible, you can use the sample of data to obtain an estimate of $W$. When you replace the true $W$ with its estimate $\hat W$, you get the FGLS estimator. The FGLS estimator is given by the rule

$\hat\beta_{FGLS} = (X^T \hat W^{-1} X)^{-1} X^T \hat W^{-1} y$

or equivalently

$\hat\beta_{FGLS} = [X^T (\hat\Sigma^{-1} \otimes I) X]^{-1} X^T (\hat\Sigma^{-1} \otimes I) y$

Estimating W

The most often used method for estimating $W$ is Zellner's method. When Zellner's method is used to estimate $W$, the FGLS estimator is called Zellner's SUR estimator. To obtain an estimate of $W$ using Zellner's method, you proceed as follows (a code sketch of these steps appears after the property discussion below).

1. Estimate each of the M equations separately using OLS.

2. Use the residuals from the OLS regressions to obtain estimates of the variances and covariances of the disturbances for the M equations. The estimators are

   $\hat\sigma_{ii} = \dfrac{\hat\varepsilon_i^T \hat\varepsilon_i}{T}$ and $\hat\sigma_{ij} = \dfrac{\hat\varepsilon_i^T \hat\varepsilon_j}{T}$

   where $\hat\sigma_{ii}$ is the estimate of the error variance for the ith equation; $\hat\sigma_{ij}$ is the estimate of the covariance of the errors for the ith and jth equations; $\hat\varepsilon_i$ is the vector of residuals for the ith equation; $\hat\varepsilon_j$ is the vector of residuals for the jth equation; and T is the sample size.

3. Use the estimates of the variances and covariances from step 2 to form an estimate of the $M \times M$ matrix $\Sigma$.

4. Construct the $T \times T$ identity matrix $I$.

5. Apply the formula $\hat W = \hat\Sigma \otimes I$ to obtain an estimate of the variance-covariance matrix of disturbances for the big equation.

Once you have the estimate of $W$, you can use the sample data and the rule $\hat\beta_{FGLS} = (X^T \hat W^{-1} X)^{-1} X^T \hat W^{-1} y$ to obtain estimates of the parameters.

Properties of the SUR Estimator

If the sample data are generated by the SUR regression model, then Zellner's SUR estimator is asymptotically equivalent to the GLS estimator and is a maximum likelihood estimator. Therefore, it is asymptotically unbiased, efficient, and consistent. The small sample properties of Zellner's SUR estimator are unknown, but Monte Carlo studies suggest it is unbiased and has a smaller variance than the OLS estimator.
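Here is a minimal sketch of Zellner's method and the FGLS rule on simulated two-equation data. All names and numbers are illustrative assumptions; forming the full $(MT) \times (MT)$ matrix is wasteful for large systems, but it keeps the code close to the formulas above.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
T, K = 200, 3
X1 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
X2 = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 2.0]], size=T)
y1 = X1 @ np.array([1.0, 0.5, -0.2]) + E[:, 0]
y2 = X2 @ np.array([2.0, -1.0, 0.3]) + E[:, 1]

# Step 1: OLS on each equation; collect the residuals.
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])  # T x M

# Steps 2-3: Sigma_hat with elements sigma_ij = e_i'e_j / T.
Sigma_hat = resid.T @ resid / T

# Steps 4-5: W_hat = Sigma_hat (kron) I_T, then the FGLS rule.
X, y = block_diag(X1, X2), np.concatenate([y1, y2])
W_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))  # (Sigma_hat kron I)^-1
b_fgls = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)
print(b_fgls.round(3))
```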
Iterated Feasible Generalized Least Squares (IFGLS) Estimator

An alternative FGLS estimator is the iterated FGLS estimator. The IFGLS estimator used most often is Zellner's iterated SUR (ISUR) estimator. The steps involved in using the ISUR estimator are as follows.

1. Estimate the parameters of the big equation using Zellner's SUR estimator described above.
2. Use the parameter estimates from this regression to compute the residuals for each of the M equations.
3. Use the residuals to obtain new estimates of the variances and covariances of the disturbances for the M equations, and therefore a new estimate of $\Sigma$ and $W$.
4. Use the new estimate of $W$ to repeat step 1 and obtain new parameter estimates.
5. Repeat steps 2, 3, and 4. (Each time you obtain new parameter estimates, this completes an iteration.)
6. Continue to iterate until convergence is achieved. Convergence is achieved when the change in the parameter estimates is very small, where "very small" is defined by a predetermined criterion. This last set of parameter estimates is the ISUR estimates.

Properties of the ISUR Estimator

The ISUR estimator has the same asymptotic properties as the SUR estimator. However, there is an ongoing debate about whether the ISUR or SUR estimator yields better estimates in small samples. Most econometricians seem to prefer the ISUR estimator. One reason for this is given below.

Singular Seemingly Unrelated Regressions Models

For some types of seemingly unrelated regressions models (e.g., consumer demand equations implied by utility maximizing behavior; input demand equations implied by cost minimizing and profit maximizing behavior) the variance-covariance matrix of disturbances $W$ for the big equation is singular, and therefore the entire system of M equations cannot be estimated jointly. These are called singular SUR models. To solve the singularity problem, you drop one of the M equations and estimate the remaining M - 1 equations jointly. The ISUR parameter estimates are invariant to the equation dropped; that is, you will always get the same parameter estimates regardless of the equation you eliminate. This is not true for the SUR parameter estimates. Thus, when estimating the parameters of a singular SUR model you should use the ISUR estimator.

Common Properties of the SUR and ISUR Estimators

1. If the error terms across equations are not contemporaneously correlated, then the SUR and ISUR estimators collapse to the OLS estimator and there are no efficiency gains.
2. If each of the M equations has the same data matrix, $X_1 = X_2 = \dots = X_M$, then the SUR and ISUR estimators collapse to the OLS estimator. This occurs if each of the M equations has identical explanatory variables with identical observations. In this case, there are no efficiency gains from using the SUR or ISUR estimator.
3. If there are cross-equation restrictions, then there are efficiency gains from using the SUR or ISUR estimator, even if the error terms across equations are not correlated or the data matrix for each of the M equations is the same.

SPECIFICATION TESTING

A specification test tests an assumption that defines the specification of a statistical model. An often used specification test for the SUR model is the Breusch-Pagan test of independent errors.

Breusch-Pagan Test of Independent Errors

The Breusch-Pagan test is used to test whether the errors across equations are contemporaneously correlated. The null hypothesis is no contemporaneous correlation. The alternative hypothesis is contemporaneous correlation. For a two-equation SUR model, the test statistic is the following Lagrange multiplier statistic, which has a chi-square distribution with $M(M-1)/2$ degrees of freedom:

$LM = T r_{12}^2 \sim \chi^2\big(M(M-1)/2\big)$, where $r_{12}^2 = \dfrac{\hat\sigma_{12}^2}{\hat\sigma_{11} \hat\sigma_{22}}$

where T is the sample size, $\hat\sigma_{12}^2$ is the square of the sample covariance of the errors for the two equations, and $\hat\sigma_{11}$ and $\hat\sigma_{22}$ are the sample error variances for the two equations. This test statistic can be generalized to more than two equations.
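For the two-equation case, the LM statistic is straightforward to compute from the OLS residuals. A minimal sketch, with simulated draws standing in for the residual vectors $\hat\varepsilon_1$ and $\hat\varepsilon_2$ (names and numbers are illustrative):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
T = 200
# e1, e2 stand in for the OLS residual vectors of a two-equation system.
E = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.5]], size=T)
e1, e2 = E[:, 0], E[:, 1]

# Sample variances and covariance with divisor T, as in Zellner's method.
s11, s22, s12 = e1 @ e1 / T, e2 @ e2 / T, e1 @ e2 / T

r2_12 = s12**2 / (s11 * s22)  # squared residual correlation
LM = T * r2_12                # ~ chi-square with M(M-1)/2 = 1 df under H0
print(LM, chi2.sf(LM, df=1))  # statistic and p-value
```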
HYPOTHESIS TESTING

The following statistical tests can be used to test hypotheses in the SUR model.

1) Asymptotic t-test
2) Approximate F-test
3) Wald test
4) Likelihood ratio test
5) Lagrange multiplier test

You must choose the appropriate test for the hypothesis in which you are interested. Note the following.

1) The small sample t-test and F-test cannot be used. This is because if the sample data are generated by the SUR model, we don't know the sampling distribution of the t-statistic or the F-statistic.
2) Each of these tests is applied to the big equation.

Cross-Equation Restrictions

Economic theory and other sources of prior information oftentimes imply that the values of two or more parameters in two or more equations are identical, or that a parameter in one equation is a linear or nonlinear function of one or more parameters in one or more other equations. These are called cross-equation restrictions. For example, in a system of M consumer demand equations implied by utility maximizing behavior, the same parameters appear in different demand equations. These cross-equation restrictions can be easily tested and/or imposed in the context of the SUR model.

GOODNESS-OF-FIT

The R² statistic that is used to measure the goodness-of-fit of a classical linear regression model is not appropriate for the SUR regression model. Many statistical programs will report an R² statistic for each individual equation of the SUR model, but these R² statistics have little if any meaning. They do not measure the proportion of the variation in the dependent variable that is explained by variation in the explanatory variables for the individual equation, and they can take values of less than zero and greater than one.

Generalized R² Statistic

The generalized R² statistic is a single number that measures the goodness of fit for the big equation, and therefore for the entire system of individual equations. Calculating this statistic involves the following steps (a code sketch appears at the end of this section).

1. Construct a $T \times M$ matrix, denoted $Y$, which contains the observations on the dependent variables for the M equations:

   $Y = [y_1 \; y_2 \; \cdots \; y_M]_{T \times M}$

   The M columns of this matrix are the $T \times 1$ vectors of observations on the dependent variables for the M equations.

2. Construct a $T \times M$ matrix, denoted $\bar Y$, which contains the sample means of the dependent variables for the M equations:

   $\bar Y = [\bar y_1 \; \bar y_2 \; \cdots \; \bar y_M]_{T \times M}$

   The M columns of this matrix are $T \times 1$ vectors whose elements all equal the sample mean of the dependent variable for the corresponding equation.

3. Construct a $T \times M$ mean deviation matrix, denoted $Y_{Dev}$, as follows:

   $Y_{Dev} = Y - \bar Y$

4. Construct an $M \times M$ mean deviation cross products matrix, denoted $A$, as follows:

   $A = Y_{Dev}^T Y_{Dev}$

5. Estimate the big equation using an estimator that yields maximum likelihood estimates (e.g., SUR, ISUR, direct maximum likelihood). Obtain the residual cross products matrix, denoted $S$, from this regression.

6. Calculate the generalized R² statistic as follows:

   $\tilde R^2 = 1 - \dfrac{|S|}{|A|}$

   where $|S|$ is the determinant of the residual cross products matrix, and $|A|$ is the determinant of the mean deviation cross products matrix.

The $\tilde R^2$ statistic measures the proportion of the variation in the vector of observations on the dependent variable for the big equation that is explained by the variation in the explanatory variables in the big equation.
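A minimal sketch of the generalized R² computation. The inputs Y and resid here are hypothetical placeholders; in practice resid would be the residual matrix from a SUR, ISUR, or direct maximum likelihood fit rather than simulated draws.

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 200, 2
# Placeholder inputs: Y holds the dependent variables (step 1); resid holds
# the T x M residuals from an estimator with ML properties (step 5).
Y = rng.normal(size=(T, M))
resid = 0.5 * rng.normal(size=(T, M))

Ydev = Y - Y.mean(axis=0)  # mean deviation matrix (steps 2-3)
A = Ydev.T @ Ydev          # mean deviation cross products matrix (step 4)
S = resid.T @ resid        # residual cross products matrix

R2_gen = 1.0 - np.linalg.det(S) / np.linalg.det(A)  # step 6
print(R2_gen)
```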