15. Applications of Feasible GLS (Two Step)

Econometrics I
Professor William Greene
Stern School of Business
Department of Economics
Part 15 – Generalized Regression Applications
Leading Applications of the GR Model
  • Heteroscedasticity and Weighted Least Squares
  • Autocorrelation in Time Series Models
  • SUR Models for Production and Cost
  • VAR Models in Macroeconomics and Finance
Two Step Estimation of the Generalized Regression Model
Use the Aitken (Generalized Least Squares - GLS) estimator with an estimate of Ω:
1. Ω is parameterized by a few estimable parameters. Example: the heteroscedastic model.
2. Use least squares residuals to estimate the variance functions.
3. Use the estimated Ω in GLS - Feasible GLS, or FGLS.
General Result for Estimation When Ω Is Estimated
True GLS uses [X′Ω⁻¹X]⁻¹X′Ω⁻¹y, which converges in probability to β.
We seek a vector which converges to the same thing that this does. Call it "feasible" GLS, FGLS, based on [X′Ω̂⁻¹X]⁻¹X′Ω̂⁻¹y.
FGLS
Feasible GLS is based on finding an estimator which has the same properties as the true GLS.
Example: Var[εi] = σ²[exp(γ′zi)]².
True GLS would regress yi/[σ exp(γ′zi)] on the same transformation of xi.
With a consistent estimator of [σ,γ], say [s,c], we do the same computation with our estimates (a sketch appears below).
So long as plim [s,c] = [σ,γ], FGLS is as "good" as true GLS:
  • Consistent
  • Same asymptotic variance
  • Same asymptotic normal distribution
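A minimal numpy sketch of the FGLS computation for this example, taking the consistent estimate c of γ as given (the function and argument names here are illustrative, not from the text):

```python
import numpy as np

def fgls_exp_variance(y, X, z, c):
    """FGLS for Var[e_i] = sigma^2 * [exp(c'z_i)]^2, given a consistent estimate c
    of gamma.  sigma cancels out of the weighted regression, so only c is needed."""
    w = np.exp(z @ c)                      # proportional to the sd of eps_i
    Xw, yw = X / w[:, None], y / w         # transform each observation
    return np.linalg.lstsq(Xw, yw, rcond=None)[0]
```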
FGLS vs. Full GLS
VVIR (Theorem 9.6): To achieve full efficiency, we do not need an efficient estimate of the parameters in Ω, only a consistent one.
Heteroscedasticity
Setting: The regression disturbances have unequal variances, but are still not correlated with each other: classical regression with hetero- (different) scedastic (variance) disturbances.
  yi = β′xi + εi,  E[εi] = 0,  Var[εi] = σ²ωi,  ωi > 0.
The classical model arises if ωi = 1.
A normalization: Σi ωi = n. Not a restriction, just a scaling that is absorbed into σ².
A characterization of the heteroscedasticity: Well defined estimators and methods for testing hypotheses will be obtainable if the heteroscedasticity is "well behaved" in the sense that no single observation becomes dominant.
Behavior of OLS
Implications for conventional estimation technique and hypothesis testing:
1. b is still unbiased. The proof of unbiasedness did not rely on homoscedasticity.
2. Consistent? We need the more general proof. Not difficult.
3. If plim b = β, then plim s² = σ² (with the normalization).
Inference Based on OLS
What of s²(X′X)⁻¹? Depends on X′ΩX vs. X′X. If they are nearly the same, the OLS covariance matrix is OK.
When will they be nearly the same? Relates to an interesting property of weighted averages. Suppose ωi is randomly drawn from a distribution with E[ωi] = 1. Then (1/n)Σi xi² → E[x²] and (1/n)Σi ωi xi² → E[x²] as well.
This is the crux of the discussion in your text.
Inference Based on OLS
VIR: For the heteroscedasticity to be substantive wrt estimation and inference by LS, the weights must be correlated with x and/or x². (Text, page 272.)
If the heteroscedasticity is important, then b is inefficient.
The White estimator: ROBUST estimation of the variance of b.
Implication for testing hypotheses: We will use Wald tests. Why? (ROBUST TEST STATISTICS)
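A compact sketch of the White estimator itself (plain numpy; the function name is ours):

```python
import numpy as np

def white_cov(X, e):
    """White heteroscedasticity-robust estimate of Var[b] for OLS:
    (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}, with e the OLS residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e**2)[:, None]).T @ X     # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv
```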
Finding Heteroscedasticity
The central issue is whether E[ε²] = σ²ωi is related to the xs or their squares in the model.
This suggests an obvious strategy: use the residuals to estimate the disturbances and look for relationships between ei² and xi and/or xi², for example, regressions of squared residuals on the xs and their squares.
Procedures
White's general test: nR² in the regression of ei² on all unique xs, squares, and cross products. Chi-squared[P].
Breusch and Pagan's Lagrange multiplier test: regress {[ei²/(e′e/n)] – 1} on Z (may be X). Chi-squared; is nR² with degrees of freedom equal to the rank of Z. (Very elegant.)
Others described in the text for other purposes, e.g., groupwise heteroscedasticity. Wald, LM, and LR tests all examine the dispersion of group-specific least squares residual variances.
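The first two tests are available in statsmodels; a sketch, assuming y is the response vector and x the regressor matrix without the constant:

```python
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

def heteroscedasticity_tests(y, x):
    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()
    bp_lm, bp_pval, _, _ = het_breuschpagan(res.resid, X)   # Breusch-Pagan LM
    w_lm, w_pval, _, _ = het_white(res.resid, X)            # White's general test
    return {"Breusch-Pagan LM": (bp_lm, bp_pval), "White nR2": (w_lm, w_pval)}
```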
Estimation: WLS Form of GLS
General result - mechanics of weighted least squares.
Generalized least squares - efficient estimation, assuming the weights are known.
Two step generalized least squares:
  • Step 1: Use least squares, then the residuals to estimate the weights.
  • Step 2: Weighted least squares using the estimated weights.
  • (Iteration: After step 2, recompute residuals and return to step 1. Exit when the coefficient vector stops changing; see the sketch below.)
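A sketch of the two steps with the optional iteration, modeling Var[εi] as exp(α′zi); the variance function, names, and convergence rule are illustrative assumptions:

```python
import numpy as np

def iterated_wls(y, X, Z, tol=1e-8, max_iter=50):
    """Two-step WLS, iterated to convergence.  Z: variance-function regressors
    (include a constant).  Var[e_i] is modeled as exp(a'Z_i), up to scale."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]                  # step 1: OLS
    for _ in range(max_iter):
        e = y - X @ b
        a = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]   # estimate the weights
        w = np.sqrt(np.exp(Z @ a))                            # estimated sd of e_i
        b_new = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)[0]  # step 2: WLS
        if np.max(np.abs(b_new - b)) < tol:                   # b stopped changing
            return b_new
        b = b_new
    return b
```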
Autocorrelation
The analysis of “autocorrelation” in the narrow sense of correlation
of the disturbances across time largely parallels the discussions
we’ve already done for the GR model in general and for
heteroscedasticity in particular. One difference is that the relatively
crisp results for the model of heteroscedasticity are replaced with
relatively fuzzy, somewhat imprecise results here. The reason is
that it is much more difficult to characterize meaningfully “well
behaved” data in a time series context. Thus, for example, in contrast to the sharp result that produces the White robust estimator, the theory underlying the Newey-West robust estimator rests on a somewhat imprecise requirement about “how far one must go back in time until correlation becomes unimportant.”
The Familiar AR(1) Model
  εt = ρεt-1 + ut,  |ρ| < 1.
This characterizes the disturbances, not the regressors.
  • A general characterization of the mechanism producing ε: history + current innovations.
  • Analysis of this model in particular: the mean, variance, and autocovariance (summarized below).
  • Stationarity. Time series analysis.
  • Implication: the form of σ²Ω; Var[ε] vs. Var[u].
  • Other models for autocorrelation are used less frequently; AR(1) is the workhorse.
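For reference, the standard moments implied by this process, with Var[ut] = σu² (the familiar results, not a new derivation):

```latex
\begin{align*}
E[\varepsilon_t] &= 0, \qquad
\operatorname{Var}[\varepsilon_t] = \frac{\sigma_u^2}{1-\rho^2}, \qquad
\operatorname{Cov}[\varepsilon_t,\varepsilon_{t-s}] = \rho^{s}\,\frac{\sigma_u^2}{1-\rho^2},\\[4pt]
\sigma^2\Omega &= \frac{\sigma_u^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^{2} & \cdots & \rho^{T-1}\\
\rho & 1 & \rho & \cdots & \rho^{T-2}\\
\vdots & & & \ddots & \vdots\\
\rho^{T-1} & \rho^{T-2} & \cdots & \rho & 1
\end{pmatrix}.
\end{align*}
```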
Building the Model
Prior view: A feature of the data.
  • "Account for autocorrelation in the data."
  • Different models, different estimators.
Contemporary view: Why is there autocorrelation?
  • What is missing from the model?
  • Build in appropriate dynamic structures.
  • Autocorrelation should be "built out" of the model.
  • Use robust procedures (Newey-West) instead of elaborate models specifically for the autocorrelation.
Model Misspecification
Implications for Least Squares
Familiar results: consistent, unbiased, inefficient, asymptotically normal.
The inefficiency of least squares:
  • Difficult to characterize generally. It is worst in "low frequency," i.e., long period (year), slowly evolving data.
  • Can be extremely bad. GLS vs. OLS efficiency ratios can be 3 or more.
A very important exception - the lagged dependent variable:
  yt = β′xt + γyt-1 + εt,   εt = ρεt-1 + ut.
Obviously, Cov[yt-1, εt] ≠ 0 because of the form of εt.
  • How to estimate? IV.
  • Should the model be fit in this form? Is something missing?
Robust estimation of the covariance matrix - the Newey-West estimator (a sketch follows below).
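In statsmodels, the Newey-West covariance matrix is the HAC option; a sketch (the lag length is a user choice):

```python
import statsmodels.api as sm

def ols_with_newey_west(y, x, lags=4):
    """OLS point estimates with conventional and Newey-West (HAC) standard errors."""
    X = sm.add_constant(x)
    conventional = sm.OLS(y, X).fit()
    robust = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": lags})
    return conventional.bse, robust.bse    # same coefficients, different std. errors
```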
GLS and FGLS
Theoretical result for known Ω - i.e., known ρ.
Prais-Winsten vs. Cochrane-Orcutt.
FGLS estimation: How to estimate ρ? OLS residuals as usual - the first autocorrelation.
Many variations, all based on the correlation of et and et-1.
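A numpy sketch of the two-step calculation (the function name and the simple estimator of ρ are illustrative):

```python
import numpy as np

def ar1_fgls(y, X, prais_winsten=True):
    """Two-step FGLS for AR(1) disturbances.  Step 1: OLS residuals give r, the
    first autocorrelation.  Step 2: quasi-difference the data and run OLS
    (Cochrane-Orcutt); Prais-Winsten also retains the first observation."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r = (e[1:] @ e[:-1]) / (e @ e)                       # estimate of rho
    y_star, X_star = y[1:] - r * y[:-1], X[1:] - r * X[:-1]
    if prais_winsten:                                    # keep observation 1
        s = np.sqrt(1.0 - r**2)
        y_star = np.concatenate([[s * y[0]], y_star])
        X_star = np.vstack([s * X[0], X_star])
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]
```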
Testing for Autocorrelation
A general proposition: There are several tests. All are functions of the simple autocorrelation of the least squares residuals. Two used generally: Durbin-Watson and the Lagrange Multiplier.
The Durbin-Watson test: d ≈ 2(1 - r). Small values of d lead to rejection of NO AUTOCORRELATION. Why are the bounds necessary?
Godfrey's LM test: regression of et on et-1 and xt. Uses a "partial correlation." Both are sketched below.
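Both tests are available in statsmodels; a sketch:

```python
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

def autocorrelation_tests(y, x, nlags=1):
    res = sm.OLS(y, sm.add_constant(x)).fit()
    d = durbin_watson(res.resid)                                  # approx. 2(1 - r)
    lm, lm_pval, _, _ = acorr_breusch_godfrey(res, nlags=nlags)   # Godfrey's LM test
    return {"Durbin-Watson": d, "Breusch-Godfrey LM": (lm, lm_pval)}
```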
Consumption "Function"
Log real consumption vs. log real disposable income
(Aggregate U.S. data, 1950I – 2000IV. Table F5.2 from the text.)

Ordinary least squares regression
  LHS = LOGC        Mean                =    7.88005
                    Standard deviation  =     .51572
                    Number of observs.  =        204
  Model size        Parameters          =          2
                    Degrees of freedom  =        202
  Residuals         Sum of squares      =     .09521
                    Standard error of e =     .02171
  Fit               R-squared           =     .99824  <<<***
                    Adjusted R-squared  =     .99823
  Model test        F[1, 202] (prob)    = 114351.2 (.0000)

  Variable | Coefficient  Standard Error  t-ratio  P[|T|>t]  Mean of X
  ---------+---------------------------------------------------------
  Constant |  -.13526***      .02375       -5.695    .0000
  LOGY     |  1.00306***      .00297      338.159    .0000    7.99083
Least Squares Residuals: r = .91
Conventional vs. Newey-West
  Variable | Coefficient  Standard Error  t-ratio  P[|T|>t]  Mean of X
  ---------+---------------------------------------------------------
  Constant | -.13525584      .02375149     -5.695    .0000
  LOGY     | 1.00306313      .00296625    338.159    .0000   7.99083133

  Newey-West Robust Covariance Matrix
  Variable | Coefficient  Standard Error  t-ratio  P[|T|>t]  Mean of X
  ---------+---------------------------------------------------------
  Constant | -.13525584      .07257279     -1.864    .0638
  LOGY     | 1.00306313      .00938791    106.846    .0000   7.99083133
FGLS
  AR(1) Model:  e(t) = rho * e(t-1) + u(t)
  Initial value of rho     =    .90693  <<<***
  Maximum iterations       =       100
  Method = Prais - Winsten
  Iter = 1, SS = .017, Log-L = 666.519353
  Iter = 2, SS = .017, Log-L = 666.573544
  Final value of rho       =   .910496  <<<***
  Durbin-Watson: e(t)      =   .179008
  Std. Deviation: e(t)     =   .022308
  Std. Deviation: u(t)     =   .009225
  Durbin-Watson: u(t)      =  2.512611
  Autocorrelation: u(t)    =  -.256306
  N[0,1] used for significance levels

  Variable | Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
  ---------+----------------------------------------------------------
  Constant | -.08791441      .09678008      -.908     .3637
  LOGY     |  .99749200      .01208806     82.519     .0000   7.99083133
  RHO      |  .91049600      .02902326     31.371     .0000
Seemingly Unrelated Regressions
The classical regression model, yi = Xiβi + εi, applies to each of M equations and T observations. Familiar example: the capital asset pricing model,
  (rm - rf) = αm + βm(rmarket – rf) + εm.
Not quite the same as a panel data model. M is usually small - say 3 or 4. (The CAPM might have M in the thousands, but it is a special case for other reasons.)
Formulation
Consider an extension of the groupwise heteroscedastic model. We had
  yi = Xiβ + εi with E[εi|X] = 0, Var[εi|X] = σi²I.
Now, allow two extensions:
  • Different coefficient vectors for each group,
  • Correlation across the observations at each specific point in time (think about the CAPM above: variation in excess returns is affected both by firm specific factors and by the economy as a whole).
Stack the equations to obtain a GR model.
SUR Model
Two Equation System
  y1 = X1 β1 + ε1,   y2 = X2 β2 + ε2
or, stacked,
  [ y1 ]   [ X1  0  ] [ β1 ]   [ ε1 ]
  [ y2 ] = [ 0   X2 ] [ β2 ] + [ ε2 ],    i.e.,  y = Xβ + ε,
with
  E[ε | X] = 0,
  E[εε′ | X] = [ σ11 I   σ12 I ]
               [ σ12 I   σ22 I ]  =  Σ ⊗ I.
OLS and GLS
Each equation can be fit by OLS ignoring all others. Why do GLS? Efficiency improvement.
Gains to GLS:
  • None if identical regressors - NOTE THE CAPM ABOVE! Identical regressors imply that GLS is the same as OLS. This is an application of a strange special case of the GR model: "If the K columns of X are linear combinations of K characteristic vectors of Ω, in the GR model, then OLS is algebraically identical to GLS." We will forego our opportunity to prove this theorem. This is our only application. (Kruskal's Theorem)
  • Efficiency gains increase as the cross equation correlation increases (of course!).
The Identical X Case
Suppose the equations involve the same X matrices. (Not just the same variables, the same data.) Then GLS is the same as equation by equation OLS.
Grunfeld's investment data are not an example - each firm has its own data matrix.
The 3 equation model on page 313 with Berndt and Wood's data gives an example. The three share equations all have the constant and logs of the price ratios on the RHS. Same variables, same years.
The CAPM is also an example.
(Note: because of the constraints in the B&W system (the same δ parameters appear in more than one equation), the OLS result for identical Xs does not apply.)
Estimation by FGLS
Two step FGLS is essentially the same as the groupwise heteroscedastic model.
(1) OLS for each equation produces residuals ei.
(2) Sij = (1/n) ei′ej, then do FGLS.
Maximum likelihood estimation for normally distributed disturbances: just iterate FGLS.
(This is an application of the Oberhofer-Kmenta result.)
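A numpy sketch of the two steps for a small system (a real implementation would exploit the Kronecker structure instead of forming the full Ω̂⁻¹):

```python
import numpy as np

def sur_fgls(ys, Xs):
    """Two-step FGLS for a SUR system.  ys: list of M response vectors, each (T,);
    Xs: list of M regressor matrices.  Step 1: OLS residuals give S_ij = (1/T)e_i'e_j.
    Step 2: GLS on the stacked system with Omega-hat = S kron I_T."""
    M, T = len(ys), len(ys[0])
    E = np.column_stack([y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
                         for y, X in zip(ys, Xs)])       # (T, M) OLS residuals
    S = (E.T @ E) / T
    y_stack = np.concatenate(ys)
    X_stack = np.zeros((M * T, sum(X.shape[1] for X in Xs)))
    col = 0
    for i, X in enumerate(Xs):                           # block-diagonal X
        X_stack[i * T:(i + 1) * T, col:col + X.shape[1]] = X
        col += X.shape[1]
    W = np.kron(np.linalg.inv(S), np.eye(T))             # Omega-hat^{-1}
    b = np.linalg.solve(X_stack.T @ W @ X_stack, X_stack.T @ W @ y_stack)
    return b, S
```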
Inference About the Coefficient Vectors
Usually based on Wald statistics.
If the estimator is maximum likelihood, the LR statistic
  T (log|Srestricted| - log|Sunrestricted|)
is a chi-squared statistic with degrees of freedom equal to the number of restrictions.
Equality of the coefficient vectors: (Historical note: Arnold Zellner, the original developer of this model and estimation technique: "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests of Aggregation Bias" (my emphasis). JASA, 1962, pp. 500-509.)
  • What did he have in mind by "aggregation bias?"
  • How to test the hypothesis?
Application
A translog demand system for a 3 factor process. (To bypass a transition in the notation, we proceed directly to the application.)
Electricity, Y, is produced using fuel, F, capital, K, and labor, L.
Theory: The production function is Y = f(K,L,F). If it is smooth, has continuous first and second derivatives, and if (1) factor prices are determined in a market and (2) producers seek to minimize costs (maximize profits), then there is a "cost function"
  C = C(Y, PK, PL, PF).
Shephard's Lemma states that the cost minimizing factor demands are given by
  Xm = ∂C(…)/∂Pm.
Taking logs gives the factor share equations,
  ∂logC(…)/∂logPm = (Pm/C) ∂C(…)/∂Pm = PmXm/C,
which is the proportion of total cost spent on factor m.
Translog
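For reference, one standard specification of the translog cost function and the implied share equations (the exact parameterization and normalization used in the text may differ):

```latex
\begin{align*}
\ln C &= \beta_0 + \beta_Y \ln Y + \tfrac{1}{2}\gamma_{YY}(\ln Y)^2
       + \sum_{m}\beta_m \ln P_m
       + \tfrac{1}{2}\sum_{m}\sum_{k}\gamma_{mk}\ln P_m \ln P_k
       + \sum_{m}\gamma_{Ym}\ln Y \ln P_m,\\
s_m &= \frac{\partial \ln C}{\partial \ln P_m}
     = \beta_m + \sum_{k}\gamma_{mk}\ln P_k + \gamma_{Ym}\ln Y,
     \qquad m = K, L, F,\\
&\text{with } \gamma_{mk}=\gamma_{km},\quad \sum_m \beta_m = 1,\quad
 \sum_m \gamma_{mk} = 0,\quad \sum_m \gamma_{Ym} = 0
 \ \text{(homogeneity of degree one in prices).}
\end{align*}
```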
Restrictions
Data – C&G, N=123
Least Squares Estimate of Cost Function

Ordinary least squares regression
  LHS = C           Mean                =   -.38339
                    Standard deviation  =   1.53847
                    Number of observs.  =       123
  Model size        Parameters          =        10
                    Degrees of freedom  =       113
  Residuals         Sum of squares      =   2.32363
                    Standard error of e =    .14340
  Fit               R-squared           =    .99195
                    Adjusted R-squared  =    .99131
  Model test        F[9, 113] (prob)    = 1547.7 (.0000)

  Variable | Coefficient  Standard Error  t-ratio  P[|T|>t]  Mean of X
  ---------+---------------------------------------------------------
  Constant | -7.79653        6.28338       -1.241    .2172
  Y        |  .42610***       .14318        2.976    .0036    8.17947
  YY       |  .05606***       .00623        8.993    .0000    35.1125
  PK       | 2.80754         2.11625        1.327    .1873     .88666
  PL       | -.02630 (!)     2.54421        -.010    .9918    5.58088
  PKK      |  .69161          .43475        1.591    .1144     .43747
  PLL      |  .10325          .51197         .202    .8405    15.6101
  PKL      | -.48223          .41018       -1.176    .2422    5.00507
  YK       | -.07676**        .03659       -2.098    .0381    7.25281
  YL       |  .01473          .02888         .510    .6110    45.6830
FGLS
Criterion function for GLS is the log-likelihood.
  Iteration 0, GLS = 514.2530
  Iteration 1, GLS = 519.8472
  Iteration 2, GLS = 519.9199

Estimates for equation: C
Generalized least squares regression
  LHS = C           Mean                =   -.38339
  Residuals         Sum of squares      =   2.24766
                    Standard error of e =    .14103
  Fit               R-squared           =    .99153
                    Adjusted R-squared  =    .99085
  Model test        F[9, 113] (prob)    = 1469.3 (.0000)

  Variable | Coefficient  Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
  ---------+----------------------------------------------------------
  Constant | -9.51337**      4.26900       -2.228     .0258
  Y        |  .48204***       .09725        4.956     .0000    8.17947
  YY       |  .04449***       .00423       10.521     .0000    35.1125
  PK       | 2.48099*        1.43621        1.727     .0841     .88666
  PL       |  .61358         1.72652         .355     .7223    5.58088
  PKK      |  .65620**        .29491        2.225     .0261     .43747
  PLL      | -.03048          .34730        -.088     .9301    15.6101
  PKL      | -.42610          .27824       -1.531     .1257    5.00507
  YK       | -.06761***       .02482       -2.724     .0064    7.25281
  YL       |  .01779          .01959         .908     .3640    45.6830
Maximum Likelihood Estimates

Constrained MLE for Multivariate Regression Model
First iteration: 0   F =  -48.2305   log|W| =  -7.72939   gtinv(H)g = 2.0977
Last iteration:  5   F =  508.8056   log|W| = -16.78689   gtinv(H)g =  .0000
Number of observations used in estimation = 123

Model (parameters appearing in each equation):
  Regressors:   ONE  PK  PL  PKK  PLL  PKL  Y   YY   YK   YL
  C  equation:  B0   BK  BL  CKK  CLL  CKL  CY  CYY  CYK  CYL
  SK equation:  BK, CKK, CKL, CYK
  SL equation:  BL, CKL, CLL, CYL

  Variable | Coefficient  Standard Error  b/St.Er.  P[|Z|>z]    (FGLS)      (OLS)
  ---------+-----------------------------------------------------------------------
  B0       | -6.71218***     .21594       -31.084     .0000    -9.51337   -7.79653
  CY       |  .58239***      .02737        21.282     .0000      .48204     .42610
  CYY      |  .05016***      .00371        13.528     .0000      .04449     .05606
  BK       |  .22965***      .06757         3.399     .0007     2.48099    2.80754
  BL       | -.13562*        .07948        -1.706     .0879      .61358    -.02630
  CKK      |  .11603***      .01817         6.385     .0000      .65620     .69161
  CLL      |  .07801***      .01563         4.991     .0000     -.03048     .10325
  CKL      | -.01200         .01343         -.894     .3713     -.42610    -.48223
  CYK      | -.00473*        .00250        -1.891     .0586     -.06761    -.07676
  CYL      | -.01792***      .00211        -8.477     .0000      .01779     .01473
Vector Autoregression
The vector autoregression (VAR) model is one of the most successful, flexible,
and easy to use models for the analysis of multivariate time series. It is
a natural extension of the univariate autoregressive model to dynamic multivariate
time series. The VAR model has proven to be especially useful for
describing the dynamic behavior of economic and financial time series and
for forecasting. It often provides superior forecasts to those from univariate
time series models and elaborate theory-based simultaneous equations
models. Forecasts from VAR models are quite flexible because they can be
made conditional on the potential future paths of specified variables in the
model.
In addition to data description and forecasting, the VAR model is also
used for structural inference and policy analysis. In structural analysis, certain
assumptions about the causal structure of the data under investigation
are imposed, and the resulting causal impacts of unexpected shocks or
innovations to specified variables on the variables in the model are summarized.
These causal impacts are usually summarized with impulse response
functions and forecast error variance decompositions.
Eric Zivot: http://faculty.washington.edu/ezivot/econ584/notes/varModels.pdf
VAR
  y1(t) = γ11 y1(t-1) + γ12 y2(t-1) + γ13 y3(t-1) + δ1 x(t) + ε1(t)
  y2(t) = γ21 y1(t-1) + γ22 y2(t-1) + γ23 y3(t-1) + δ2 x(t) + ε2(t)
  y3(t) = γ31 y1(t-1) + γ32 y2(t-1) + γ33 y3(t-1) + δ3 x(t) + ε3(t)
(In Zivot's examples:
  1. y(t) = exchange rates
  2. y(t) = stock returns, interest rates, indexes of industrial production, rate of inflation.)
VAR Formulation
  y(t) = Γ y(t-1) + δ x(t) + ε(t)
SUR with identical regressors.
Granger Causality: nonzero off diagonal elements in Γ.
  y1(t) = γ11 y1(t-1) + γ12 y2(t-1) + γ13 y3(t-1) + δ1 x(t) + ε1(t)
  y2(t) = γ21 y1(t-1) + γ22 y2(t-1) + γ23 y3(t-1) + δ2 x(t) + ε2(t)
  y3(t) = γ31 y1(t-1) + γ32 y2(t-1) + γ33 y3(t-1) + δ3 x(t) + ε3(t)
Hypothesis: y2 does not Granger cause y1: γ12 = 0.
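Because every equation has the same right hand side, equation-by-equation OLS is efficient here; a numpy sketch for a VAR(1) (the names and single-lag restriction are illustrative):

```python
import numpy as np

def var1_ols(Y, x=None):
    """OLS for y(t) = c + G y(t-1) + d x(t) + e(t), fit one equation at a time.
    Y: (T, M) array of the M series; x: optional (T,) exogenous variable."""
    T, M = Y.shape
    regs = [np.ones(T - 1)] + [Y[:-1, j] for j in range(M)]
    if x is not None:
        regs.append(x[1:])
    Z = np.column_stack(regs)                            # identical regressors
    coefs = np.linalg.lstsq(Z, Y[1:], rcond=None)[0]     # one column per equation
    G = coefs[1:1 + M].T                                 # estimated Gamma (M, M)
    return coefs, G
```

A test that y2 does not Granger cause y1 is then a test that the (1,2) element of the estimated Γ is zero.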
Impulse Response
  y(t) = Γ y(t-1) + δ x(t) + ε(t)
By backward substitution or using the lag operator (text, 943),
  y(t) = δ x(t) + Γδ x(t-1) + Γ²δ x(t-2) + ... (ad infinitum)
         + ε(t) + Γ ε(t-1) + Γ² ε(t-2) + ...
[Γ^P must converge to 0 as P increases: roots inside the unit circle.]
Consider a one time shock (impulse) in the system, ε2 = σ2 in period t, and its effect on y1(s), s = t, t+1, ...
The effect in period t is 0: ε2 is not in the y1 equation.
ε2 affects y2 in period t, which affects y1 in period t+1. The effect is γ12 σ2.
In period t+2, the effect from 2 periods back is (Γ²)12 σ2.
... and so on.
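A sketch of the calculation: the response of the system at horizon s to a one-time shock is Γ^s times the shock vector (orthogonalized impulses would transform the shock first).

```python
import numpy as np

def impulse_response(G, shock, horizon=10):
    """Responses to a one-time shock e(t) = shock for y(t) = G y(t-1) + e(t).
    Row s of the result is the effect of the shock s periods later (G^s @ shock)."""
    effect = np.asarray(shock, dtype=float)
    out = []
    for _ in range(horizon + 1):
        out.append(effect.copy())
        effect = G @ effect
    return np.array(out)

# Example: a one-time shock to variable 2 alone in a 3-variable system
# impulse_response(G, [0.0, 0.01, 0.0])
```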
Zivot’s Data
Impulse Responses