An Overview of Econometrics using B34S, MATLAB, Stata and SAS

Econometric Notes*
30 December 2014
Houston H. Stokes
Department of Economics
University of Illinois at Chicago
hhstokes@uic.edu
An Overview of Econometrics*
Objective of Notes
1. Purpose of statistics
2. Role of statistics
3. Basic Statistics
4. More complex setup to illustrate B34S Matrix Approach
5. Review of Linear Algebra and Introduction to Programming Regression Calculations
   Figure 5.1 X'X for a random Matrix X
   Figure 5.2 3D plot of 50 by 50 X'X matrix where X is a random matrix
6. A Sample Multiple Input Regression Model Dataset
   Figure 6.1 2 D Plots of Textile Data
   Figure 6.2 3-D Plot of Theil (1971) Textile Data
7. Advanced Regression analysis
   Figure 7.1 Analysis of residuals of the YMA model
   Figure 7.2 Recursively estimated X1 and X3 coefficients for X1 Sorted Data
   Figure 7.3 CUSUM test of model estimated with sorted data
   Figure 7.4 CUMSQ test of y model estimated with sorted data
   Figure 7.5 Quandt Likelihood Ratio tests of y model estimated with sorted data
8. Advanced concepts
9. Summary
Objective of Notes
The objective of these notes is to introduce students to the basics of applied regression calculation using Stata setups of a number of very simple models. Computer code is shown to allow students to "get going" ASAP. More advanced sections show MATLAB code to make the calculations. The notes are organized around the estimation of regression models and the use of basic statistical concepts. The textbooks Introduction to Econometrics by Christopher Dougherty (4th Edition, Oxford, 2011) and Introductory Econometrics: A Modern Approach by Jeffrey Wooldridge (5th Edition, South-Western Cengage, 2013) can be used to provide added information. A number of examples from these books will be shown. Statistical analysis will be treated both as a means by which the data can be summarized and as a means by which it is possible to accept or reject a specific hypothesis. Four simple datasets are initially discussed:
- The Price vs Age of Cars dataset illustrates a simple 2 variable OLS model where graphics and correlation analysis can be used to detect relationships.
- The Theil (1971) Textile dataset illustrates the use of log transformations and contrasts 2D and 3D graphic analysis of data. A variable with a low correlation is shown to enter an OLS model only in the presence of another variable.
- The Brownlee (1965) Stack Loss dataset illustrates how, in a multiple regression context, variables with "significant" correlation may not enter a full model.
- The Brownlee (1965) Stress dataset illustrates the dangers of relying on correlation analysis.
Finally, a number of statistical problems and procedures that might be used are discussed.
1. Purpose of statistics
- Summarize data
- Test models
- Allow one to generalize from a sample to the wider population.
2. Role of statistics
Quote by Stanley (1856) in a presidential address to Section F of the British Association for the Advancement of Science:

"The axiom on which ... (statistics) is based may be stated thus: that the laws by which nature is governed, and more especially those laws which operate on the moral and physical condition of the human race, are consistent, and are, in all cases best discoverable - in some cases only discoverable - by the investigation and comparison of phenomena extending over a very large number of individual instances. In dealing with MAN in the aggregate, results may be calculated with the precision and accuracy of a mathematical problem... This then is the first characteristic of statistics as a science: that it proceeds wholly by the accumulation and comparison of registered facts; - that from these facts alone, properly classified, it seeks to deduce general principles, and that it rejects all a priori reasoning, employing hypothesis, if at all, only in a tentative manner, and subject to future verification."
(Note: underlining entered by H. H. Stokes)
3. Basic Statistics
Key concepts:

- Mean: $\bar{x} = \sum_i x_i / N = \mu_x$
- Median = middle data value
- Mode = data value with most cases
- Population Variance = $\sigma_x^2$
- Sample Variance = $s_x^2$
- Population Standard Deviation = $\sigma_x$
- Sample Standard Deviation = $s_x$
- Confidence Interval with k% => a range of data values
- Correlation = $\rho_{xy}$
- Regression: $y = \alpha + \sum_{i=1}^{k} \beta_i X_i + e$, where the $X_i$ are the columns of an N by K matrix of explanatory variables
- Percentile
- Quartile
- Z score
- t test
- SE of the mean
- Central Limit Theorem
Statistics attempts to generalize about a population from a sample. For the purposes of this discussion assume the population of men in the US. A 1/1000 sample from this population would be a randomly selected sample of men such that the sample contained only one male for every 1000 in the population. The task of statistics is to be able to draw meaningful generalizations from the sample about the population. It is costly, and often impossible, to examine all the measurements in the population of interest. A sample must be selected in such a manner that it is representative of the population.

In a famous example of the potential for problems in sample selection, during the depression in the 1932 presidential election the Literary Digest attempted to sample the electorate. A staff was selected and numbers to call were randomly selected from the phone book in New York. In each call the question was asked "Who will you vote for, Mr. Roosevelt or President Hoover?" Those called, for the most part, supported President Hoover being re-elected. When Mr. Roosevelt won the election, the question was asked: what went wrong in the sampling process? The problem was the assumption that those who had phones were a correct characterization of the population of voters.
Those without phones in that period disproportionally went for Mr. Roosevelt biasing the results of
the study.
In summary, statistics allows us to use the information contained in a representative sample
to correctly make inferences about the population. For example if one were interested in ascertaining
how long the light bulbs produced by a certain company last, one could hardly test them all.
Sampling would be necessary. The bootstrap can be used to test the distribution of statistics
estimated from a sample whose distribution is not known.
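As a minimal sketch of the bootstrap idea (an illustration added here, not part of the original notes), the following base MATLAB fragment resamples a vector with replacement and uses the spread of the resampled means as an estimate of the standard error of the mean:

% A minimal bootstrap sketch: resample with replacement many times and
% look at the distribution of the recomputed statistic. Only base
% MATLAB (randi) is used; the data vector is an arbitrary example.
x = [1 3 6 10 5 2]';                 % any sample vector
B = 5000;                            % number of bootstrap replications
m = zeros(B,1);
for b = 1:B
    idx  = randi(length(x), length(x), 1);   % draw N indices with replacement
    m(b) = mean(x(idx));                     % statistic for this resample
end
disp([mean(m) std(m)])               % bootstrap mean and bootstrap SE of the mean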
In addition to sampling correctly, it is important to be able to detect a shift in the underlying
population. The usual practice is to draw a sample from the population to be able to make inferences
about the underlying population. If the population is shifting, such samples will give biased
information. For example assume a reservoir. If a rain comes and adds to and stirs up the water in
the reservoir, samples of water would have to be taken more frequently than if there had been no rain
and there was no change in water usage. The interesting question is how do you know when to start
increasing the sampling rate? A possible approach would be to increase the sampling rate when the
water quality of previous samples begins to fall outside normal ranges for the focus variable. In this
example, it is not possible to use the population of all the water in the reservoir to test the water. A
number of key concepts are listed next.
Measures of Central Tendency. The mean is a measure of central tendency. Assume a vector x containing N observations. The mean is defined as

$\bar{x} = \sum_{i=1}^{N} x_i / N$     (3-1)

Assuming $x_i$ = (1 2 3 4 5 6 7 8 9), then N = 9 and $\bar{x} = 5$. The mean is often written as $\mu_x$ or E(x), the expected value of x. The problem with the mean as a measure of central tendency is that it is affected by all observations. If instead of making $x_9 = 9$ we make $x_9 = 99$, then $\bar{x} = (45 + 90)/9 = 15$, which is bigger than all $x_i$ values except $x_9$. The median, defined as the middle term of an odd number of terms or the average of the two middle terms when the terms have been arranged in increasing order, is not affected by outlier terms. In the above example the median is 5 no matter whether $x_9 = 9$ or $x_9 = 99$. The final measure of central tendency is the mode, the value which has the highest frequency. The mode may not be unique. In the above example, it does not exist.
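A minimal MATLAB illustration of the outlier example above (added here for clarity, not from the original notes):

% The mean is pulled by a single outlier; the median is not.
x = [1 2 3 4 5 6 7 8 9];
disp([mean(x) median(x)])    % both are 5
x(9) = 99;                   % replace x9 = 9 with x9 = 99
disp([mean(x) median(x)])    % mean jumps to 15, median stays 5
% mode(x) returns 1 here only because MATLAB breaks ties at the smallest
% value; no value occurs more than once, so no unique mode exists.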
Variation. It has been reported that a poor statistician once drowned in a stream with a mean depth of 6 inches. How could this occur? To summarize the data, we also need to check on variation, something that can be done by looking at the standard deviation and variance. The population variance of a vector x is defined as

$\sigma_x^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 / N$     (3-2)

while the sample variance $s_x^2$ is

$s_x^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2 / (N-1)$     (3-3)
The population standard deviation $\sigma_x$ is the square root of the population variance. For the purposes of these notes, the standard deviation will mean the sample standard deviation. There are alternative formulas for these values that may be easier to use. As an alternative to (3-2) and (3-3),

$\sigma_x^2 = (N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2) / N^2$     (3-4)

$s_x^2 = (N \sum_{i=1}^{N} x_i^2 - (\sum_{i=1}^{N} x_i)^2) / (N(N-1))$     (3-5)

For implementing the variance in a computer program, (3-2) is more accurate than (3-4). Why is this the case?
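The following sketch, assuming data with a mean that is huge relative to its spread, suggests one way to see the answer numerically:

% Two-pass formula (3-2) versus one-pass formula (3-4) in floating point.
% In (3-4), N*sum(x.^2) and (sum(x))^2 are two huge, nearly equal numbers;
% subtracting them cancels most of the significant digits (catastrophic
% cancellation), while (3-2) subtracts the mean first and keeps precision.
N = 1e6;
x = 1e8 + randn(N,1);                        % variance near 1, mean huge
v_twopass = sum((x - mean(x)).^2)/N;         % (3-2): close to 1
v_onepass = (N*sum(x.^2) - sum(x)^2)/N^2;    % (3-4): inaccurate, may even be negative
disp([v_twopass v_onepass])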
If $s_x$ is unbiased, a general rule is that $x_i$ will lie 99% of the time within plus or minus 3 standard deviations, 95% of the time within plus or minus 2 standard deviations, and 68% of the time within plus or minus 1 standard deviation. Given a vector of numbers, it is important to determine where a certain number might lie. There are 4 quartile positions of a series. Quartile 1 is the top of the lower 25%; quartile 2 is the top of the lower 50%, or the median; quartile 3 is the top of the lower 75%. The standard deviation gives information concerning where observations lie. Assume $\bar{x}$ = 10, $s_x$ = 5 and N = 300. The question asked is how likely will a value > 14 occur? To answer this question requires putting the data in Z form where

$Z = (x_i - \bar{x}) / s_x$     (3-6)

Think of Z as a normalized deviation. Once we get Z, we can enter tables and determine how likely this will occur. In this case Z = (14-10)/5 = .8.
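As a hedged worked example, the tail area for Z = .8 can be checked in base MATLAB without tables:

% P(x > 14) when xbar = 10 and sx = 5. erfc is in base MATLAB; with the
% Statistics Toolbox the second line could be written 1 - normcdf(z).
z = (14 - 10)/5;                 % z = 0.8
p = 0.5*erfc(z/sqrt(2));         % upper-tail area of the standard normal
disp([z p])                      % roughly 0.8 and 0.2119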
Distribution of the mean. It often is desirable to know how the sample mean $\bar{x}$ is distributed. Assuming a vector has a finite distribution and that each $x_i$ value is mutually independent, the Central Limit Theorem states that if the vector $(x_1, ..., x_N)$ has any distribution with mean $\mu$ and variance $\sigma^2$, then the distribution of $\bar{x}$ approaches the normal distribution with mean $\mu$ and variance $\sigma^2/N$ as sample size N increases. Note that the standard deviation of the mean $\sigma_{\bar{x}}$ is defined as

$\sigma_{\bar{x}} = \sigma / \sqrt{N}$     (3-7)

Given $\bar{x}$ and $\sigma_{\bar{x}}$, the 95% confidence interval around $\bar{x}$ is

$\bar{x} - 2\sigma_{\bar{x}} \le \mu_x \le \bar{x} + 2\sigma_{\bar{x}}$     (3-8)

For small samples (<30) the formula is

$\bar{x} - t_{.025}(s_x/\sqrt{N}) \le \mu_x \le \bar{x} + t_{.025}(s_x/\sqrt{N})$     (3-9)
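A minimal sketch of (3-9) in MATLAB, reusing the car-age data introduced later in these notes (tinv is from the Statistics Toolbox, an assumption of this example):

% 95% small-sample confidence interval for the mean, equation (3-9).
x     = [1 3 6 10 5 2]';
N     = length(x);
se    = std(x)/sqrt(N);              % s_x / sqrt(N)
tcrit = tinv(0.975, N-1);            % t_.025 critical value with N-1 df
ci    = [mean(x) - tcrit*se, mean(x) + tcrit*se]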
Tests of two means. Assume two vectors x and y where we know $\bar{x}$, $\bar{y}$, $s_x^2$ and $s_y^2$. The simplest test of whether the means differ is

$Z = (\bar{x} - \bar{y}) / (s_x^2/N_x + s_y^2/N_y)^{.5}$     (3-10)

where the small sample approximation, assuming the two samples have the same population standard deviation, is

$t = (\bar{x} - \bar{y}) / (s_p^2/N_x + s_p^2/N_y)^{.5}$     (3-11)

$s_p^2 = ((N_x - 1)s_x^2 + (N_y - 1)s_y^2) / (N_x + N_y - 2)$     (3-12)

Note that $s_p^2$ is an estimate of the population variance.
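A sketch of (3-10)-(3-12) for two hypothetical samples (the data here are assumptions made purely for illustration):

% Large- and small-sample tests that two means differ.
x  = randn(40,1) + 0.5;              % hypothetical sample 1
y  = randn(35,1);                    % hypothetical sample 2
Nx = length(x);  Ny = length(y);
Z   = (mean(x)-mean(y)) / sqrt(var(x)/Nx + var(y)/Ny);   % (3-10)
sp2 = ((Nx-1)*var(x) + (Ny-1)*var(y)) / (Nx+Ny-2);       % (3-12) pooled variance
t   = (mean(x)-mean(y)) / sqrt(sp2/Nx + sp2/Ny);         % (3-11)
disp([Z t])    % compare with roughly plus or minus 2 for a 95% test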
Correlation. If two variables are thought to be related, a possible summary measure would be the correlation coefficient $\rho$. Most calculators or statistical computer programs will make the calculation. The standard error of $\rho$ is $(N-1)^{-.5}$ for small samples and $N^{-.5}$ for large samples. This means that $\rho\sqrt{N-1}$ is distributed as a t statistic with asymptotic percentages as given above. The correlation coefficient $\rho$ is defined as

$\rho = (\overline{xy} - \bar{x}\,\bar{y}) / (\sigma_x \sigma_y)$     (3-13)

Perfect positive correlation is 1.0; perfect negative correlation is -1.0. The SE of $\rho$ converges to 0.0 as $N \rightarrow \infty$. If N were 101, the SE of $\rho$ would be 1/10 or .1, and $|\rho|$ would have to be $\ge$ .2 to be significant at or better than the 95% level. Correlation is a major tool of analysis that allows a person to formalize what is shown in an x-y plot of the data. A simple dataset will be used to illustrate these concepts, introduce OLS models, and show the flaws of correlation analysis as a diagnostic tool.
Single Equation OLS Regression Model. Data were obtained on 6 observations of the age and value of cars (from Freund [1960] Modern Elementary Statistics, page 332), two variables that are thought to be related. Table 1 lists this data and gives the means, the correlation between age and value, and a simple regression value = f(age). We expect the relationship to be negative and significant.
Table 1. Age and value of cars

Obs            Age       Value
1                1        1995
2                3         875
3                6         695
4               10         345
5                5         595
6                2        1795

Mean           4.5        1050
Variance      10.7      461750
Correlation        -0.85884
Next we show the Stata command files used to analyze this data. Assume you have a file car_age_data.do:

input double x
0.1E+01
0.3E+01
0.6E+01
0.1E+02
0.5E+01
0.2E+01
end
label variable x "AGE OF CARS"
input double y
0.1995E+04
0.8750E+03
0.6950E+03
0.3450E+03
0.5950E+03
0.1795E+04
end
label variable y "PRICE OF CARS"

The following commands are then given (// marks a comment in Stata):

clear
// run the data file, then analyze
run car_age_data.do
describe
summarize
list
correlate (x y)
regress y x
twoway (scatter y x)
Edited output is:
. clear

. run car_age_data.do

. describe

Contains data
  obs:             6
 vars:             2
 size:            96
-----------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------
x              double  %10.0g                 AGE OF CARS
y              double  %10.0g                 PRICE OF CARS
-----------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |         6         4.5    3.271085          1         10
           y |         6        1050    679.5219        345       1995

. list

     +-----------+
     |  x      y |
     |-----------|
  1. |  1   1995 |
  2. |  3    875 |
  3. |  6    695 |
  4. | 10    345 |
  5. |  5    595 |
     |-----------|
  6. |  2   1795 |
     +-----------+

. correlate (x y)
(obs=6)

             |      x        y
-------------+------------------
           x |  1.0000
           y | -0.8588   1.0000

. regress y x

      Source |       SS       df       MS              Number of obs =        6
-------------+------------------------------           F( 1,     4) =    11.24
       Model | 1702935.05     1  1702935.05             Prob > F      =  0.0285
    Residual | 605814.953     4  151453.738             R-squared     =  0.7376
-------------+------------------------------           Adj R-squared =  0.6720
       Total |    2308750     5      461750             Root MSE      =   389.17

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -178.4112   53.20631    -3.35   0.028    -326.1356   -30.68683
       _cons |    1852.85   287.3469     6.45   0.003     1055.048    2650.653
------------------------------------------------------------------------------

. twoway (scatter y x)

. end of do-file
[Scatter plot: PRICE OF CARS (y, 0-2000) against AGE OF CARS (x, 0-10)]
From the plot we see that the ten year old car appears to have a larger than expected value for its age. For this reason, more variables and observations are needed.

Remark: When there are two series, correlation and plots can be used effectively to determine the model. However, when there are more than two series, plots and correlation analysis are less useful and in many cases can give the wrong impression. This will be illustrated later. In cases where there is more than one explanatory variable, regression is the appropriate approach, although this approach has many problems.
A regression tries to write the dependent variable y as a linear function of the explanatory variables. In this case we have estimated a model of the form

$y = \alpha + \beta x + e$     (3-14)

where y = value is the price of the car in period t, x = age is the age in period t and e is the error term. Regression output produces

value = 1852.8505 - 178.41121*age     (3-15)
         (6.45)      (-3.35)

$R^2$ = .672, SEE = 389.17, e'e = 605814.953

which can be verified from the printout. Note that SEE = $\sqrt{605814.953/4} = \sqrt{e'e/(n-k)}$.
The regression model suggests that for every year older a car gets, its value drops significantly, by $178.41. A car one year old should have a value of 1852.8505 - (1)*178.41121 = 1674.4. In the sample dataset the one year old car in fact had a value of 1995. For this observation the error was 320.56. Using the estimated equation (3-15) we have

Age    Actual Value    Estimated Value      Error
1           1995            1674.4         320.56
3            875            1317.6        -442.62
6            695            782.38        -87.383
10           345            68.738         276.26
5            595            960.79        -365.79
2           1795            1496           298.97
t scores have been placed under the estimated coefficients. Since for both coefficients |t| > 2, we can
state that given the assumptions of the linear regression model, both coefficients are significant.
Before turning to an in-depth discussion of the regression model, we look at a few optional topics.
4. More complex setup to illustrate using MATLAB to estimate the model. Optional Topic.

This optional topic implements the key ideas in Appendix E of Wooldridge (2013) that show how a linear econometric model can be estimated by OLS. As discussed in the text, a linear OLS model selects the coefficients so as to minimize the sum of squared errors. Define X as an N by K matrix where N is the number of observations on K series. The OLS coefficient vector is $\beta = (X'X)^{-1}X'y$ where y is the N element vector of the dependent variable. The error vector is $e = y - X\beta$. Standard errors of the coefficients can be obtained from the square roots of the diagonal elements of $\sigma^2(X'X)^{-1}$ where $\sigma^2 = e'e/(N-K)$. As an alternative to the Stata regress command that was shown above, the self contained MATLAB program that is listed next can be used to estimate the model.
%% Cars Example using Matlab
% Load data
x=[1,1;
   1,3;
   1,6;
   1,10;
   1,5;
   1,2];
y=[1995 875 695 345 595 1795];
y=y';
value=y;
disp('Mean of dependent (Value) and Independent Variable (Age)')
disp([mean(y),mean(x(:,2))])
age=x(:,2);
disp('Small and Large Variances for Age and Value')
disp([var(age,0),var(age,1),var(value,0),var(value,1)])
disp('Correlation using formula and built in function')
cor=(mean(age.*y)-(mean(age)*mean(y)))/(sqrt(var(age,1))*sqrt(var(value,1)))
% using built in function
cor=corr([age,value])
%% Estimate the model
% Logic works for any sized problem!!
% for large # of obs put ; at end of [y,yhat,res] line
beta=inv(x'*x)*x'*y;
yhat=x*beta;
res=y-yhat;
disp('   Value     Yhat      Res')
[y,yhat,res]
ssr=res'*res;
disp('Sum of squared residuals')
disp(ssr)
df=size(x,1)-size(x,2);
se=sqrt(diag((ssr/df)*inv(x'*x)));
disp('   Beta      se        t')
t=beta./se;
[beta,se,t]
plot(res)
% plot(age,y,age,yhat)
disp('Durbin Watson')
i=1:1:5;
dw=((res(i+1)-res(i))'*(res(i+1)-res(i)))/(res'*res);
disp(dw)
Which produces output:

Mean of dependent (Value) and Independent Variable (Age)
        1050          4.5

Small and Large Variances for Age and Value
        10.7       8.9167  4.6175e+005  3.8479e+005

Correlation using formula and built in function
cor =
   -0.85884
cor =
            1     -0.85884
     -0.85884            1

   Value     Yhat      Res
ans =
        1995       1674.4       320.56
         875       1317.6      -442.62
         695       782.38      -87.383
         345       68.738       276.26
         595       960.79      -365.79
        1795         1496       298.97

Sum of squared residuals
  6.0581e+005

   Beta      se        t
ans =
       1852.9       287.35       6.4481
      -178.41       53.206      -3.3532

Durbin Watson
      2.7979
which matches what was produced by the Stata regress command. Pre-programmed commands of this kind can give the user the impression of a "black box"; here each calculation is visible. Our findings indicate that on average, for every additional year of age, the car falls in value by $178.41.

Remark: Econometric calculations can easily be programmed using 4th generation languages without detailed knowledge of Fortran or C. This allows new techniques to be implemented without waiting for software developers to "hard wire" these procedures.
5. Review of Linear Algebra and Introduction to Programming Regression Calculations. Optional topic for those with the right math background.

Assume a problem where there are multiple x variables, all possibly related to y, and there is some relationship between the x variables (multicollinearity). The proposed solution is to fit a linear model of the form:

$y = \alpha + \sum_{i=1}^{k} \beta_i x_i + e$,     (5-1)

where y, $x_i$ and e are N element column vectors, $\beta_i$ is the coefficient of $x_i$ and $\alpha$ is the intercept of the equation. A linear model such as (5-1) can be estimated by OLS (ordinary least squares), which will minimize $e'e$, which is a good measure of the fit of the model. OLS is one of many methods to fit a line, others discussed being L1, which minimizes $\sum_i |e_i|$, and minimax, which minimizes the largest element in e. After the coefficients are calculated, it is a good idea to estimate and report standard errors, which allow significance tests on the estimates of the parameters. OLS models can be estimated using matrix algebra directly or using pre-programmed procedures like the regression command in Excel. There are, however, a number of ways to calculate the estimated parameters. Before this occurs we first illustrate a number of linear algebra calculations that include the LU factorization, eigenvalue analysis, the Cholesky factorization, the QR factorization, the Schur factorization (which always works when eigenvalue analysis may not) and the SVD calculation.
The LU factorization ($LU = Z$) is the appropriate way to invert a general matrix. Eigenvalue analysis decomposes $Z = V \Lambda V^{-1}$ where $\Lambda$ is a diagonal matrix and Z is a general matrix. For the positive definite case $Z = V \Lambda V'$ since here $V^{-1} = V'$. Inspection of the diagonal elements of $\Lambda$ indicates whether $\lim_{k \to \infty} Z^k$ explodes, if we note that $Z^k = V \Lambda^k V^{-1}$. The sum of the diagonal elements of $\Lambda$ is the trace of Z, while their product is $|Z|$. If Z is positive definite (all diagonal elements of $\Lambda$ > 0) the Cholesky factorization writes $Z = R'R$ where R is upper triangular. The Schur factorization writes $Z = USU'$ where U is orthogonal ($UU' = I$) and S is block upper triangular. Unlike the eigenvalue decomposition, all elements of the Schur factorization are real for a general matrix. The QR factorization writes $X = QR$ where Q is orthogonal and R is the Cholesky factor of $X'X$, calculated more accurately since it uses X, not $X'X$. The SVD calculates $X = U \Sigma V'$ where U and V are orthogonal (N by K and K by K respectively) and $\Sigma$ is a K by K diagonal matrix whose elements are the square roots of the eigenvalues of $X'X$. The MATLAB script listed below self-documents these calculations and shows graphically $X'X$ where X was 100 by 50. How would this graph look if X were not a random matrix where by assumption $E(X(:,i)'X(:,j)) = 0$ for $i \ne j$? How might it be used?
%% Linear Algebra Useful for Econometrics in Matlab
disp(' Short course in Math using Matlab(c)')
% 2 December 2006 Version
disp(' Houston H. Stokes')
disp(' All Matlab commands are indented. Cut and paste from this')
disp(' document into Matlab and execute.')
disp(' If ; is left off result will print.')
disp(' Define x as a n by n matrix of random numbers.')
disp('    x     = rand(n)')
disp(' define xn as a n by n matrix of random normal numbers')
disp('    xn    = randn(n)')
disp(' Do a LU factorization and test answer')
disp(' Inverse using LU')
disp('    x     = rand(n)')
disp('    [l u] = lu(x)')
disp('    test  = l*u')
disp('    error = l*u - x')
disp('    ix    = inv(x)')
disp('    ix2   = inv(u)*inv(l)')
disp('    error = ix - ix2')
n=3
x     = rand(n)
[l u] = lu(x)
test  = l*u
error = l*u - x
ix    = inv(x)
ix2   = inv(u)*inv(l)
error = ix - ix2
disp(' Form PD Matrix and look at it.')
disp('    xx    = randn(100,10);')
disp('    xpx   = xx(t)*xx')
disp('    mesh(xpx)')
xx    = randn(100,50);
xpx   = xx'*xx;
mesh(xpx)
disp(' Factor PD matrix into R(t)*R and test')
disp('    xx    = randn(100,n);')
disp('    xpx   = xx(t)*xx')
disp('    r     = chol(xpx)')
disp('    test1 = r(t)*r')
disp('    mesh(r)')
disp('    error = r(t)*r - xpx')
xx    = randn(100,n);
xpx   = xx'*xx
r     = chol(xpx)
test1 = r'*r
error = r'*r - xpx
disp(' Eigen and svd analysis. For pd matrix s = lamda')
disp('    xx    = randn(100,n);')
disp('    xpx   = xx(t)*xx')
disp('    lamda = eig(xpx)')
xx    = randn(100,n);
xpx   = xx'*xx
lamda = eig(xpx)
disp(' show trace = sum eigen')
disp('      det   = prod(e)')
disp('    trace1 = trace(xpx)')
disp('    det1   = det(xpx)')
disp('    trace2 = sum(lamda)')
disp('    det2   = prod(lamda)')
trace1 = trace(xpx)
det1   = det(xpx)
trace2 = sum(lamda)
det2   = prod(lamda)
disp(' Test SVD')
disp('    s        = svd(xpx)')
disp('    [u ss v] = svd(xpx)')
disp('    test     = u*ss*v(t)')
disp('    error    = xpx-test')
s        = svd(xpx)
[u ss v] = svd(xpx)
test     = u*ss*v'
error    = xpx-test
disp(' Does X*V = V*Lamda')
disp('    xx        = randn(100,n);')
disp('    xpx       = xx(t)*xx')
disp('    [v lamda] = eig(xpx)')
disp('    test      = v*lamda*inv(v)')
disp('    error     = xpx-test')
disp('    vpv       = v(t)*v')
disp('    s         = svd(xpx)')
xx        = randn(100,n);
xpx       = xx'*xx
[v lamda] = eig(xpx)
test      = v*lamda*inv(v)
error     = xpx-test
vpv       = v'*v
s         = svd(xpx)
disp(' Schur Factorization X = U S U(t) where U is orthogonal and')
disp(' S is block upper triangular with 1 by 1 and 2 by 2 on the')
disp(' diagonal. All elements of a Schur factorization real')
disp('    xx    = randn(100,n);')
disp('    xpx   = xx(t)*xx')
disp('    [U,S] = schur(xpx)')
disp('    test  = U*S*U(t)')
disp('    error = xpx-test')
xx    = randn(100,n);
xpx   = xx'*xx
[U,S] = schur(xpx)
test  = U*S*U'
error = xpx-test
disp(' Schur Factorization')
disp('    xx    = randn(n,n)')
disp('    [U,S] = schur(xx)')
disp('    test  = U*S*U(t)')
disp('    error = xx-test')
xx    = randn(n,n)
[U,S] = schur(xx)
test  = U*S*U'
error = xx-test
disp(' QR Factorization preserves length and angles and does not magnify')
disp(' errors. We express X = Q*R where Q is orthogonal and R is upper')
disp(' triangular')
disp('    x     = randn(n,n)')
disp('    [Q R] = qr(x)')
disp('    test1 = Q(t)*Q')
disp('    test2 = Q*R')
disp('    error = x - test2')
x     = randn(n,n)
[Q R] = qr(x)
test1 = Q'*Q
test2 = Q*R
error = x - test2
and produces output:

 Short course in Math using Matlab(c)
 Houston H. Stokes
 All Matlab commands are indented. Cut and paste from this
 document into Matlab and execute.
 If ; is left off result will print.
 Define x as a n by n matrix of random numbers.
    x     = rand(n)
 define xn as a n by n matrix of random normal numbers
    xn    = randn(n)
 Do a LU factorization and test answer
 Inverse using LU
    x     = rand(n)
    [l u] = lu(x)
    test  = l*u
    error = l*u - x
    ix    = inv(x)
    ix2   = inv(u)*inv(l)
    error = ix - ix2
n =
     3
x =
      0.84622      0.67214      0.68128
      0.52515      0.83812      0.37948
      0.20265      0.01964       0.8318
l =
            1            0            0
      0.62059            1            0
      0.23947     -0.33568            1
u =
      0.84622      0.67214      0.68128
            0        0.421     -0.04331
            0            0      0.65411
test =
      0.84622      0.67214      0.68128
      0.52515      0.83812      0.37948
      0.20265      0.01964       0.8318
error =
            0            0            0
            0            0            0
            0 -6.9389e-018            0
ix =
       2.9596      -2.3417      -1.3557
      -1.5445       2.4281      0.15727
     -0.68458      0.51318       1.5288
ix2 =
       2.9596      -2.3417      -1.3557
      -1.5445       2.4281      0.15727
     -0.68458      0.51318       1.5288
error =
            0            0            0
            0            0            0
 -1.1102e-016            0            0
 Form PD Matrix and look at it.
    xx    = randn(100,10);
    xpx   = xx(t)*xx
    mesh(xpx)
 Factor PD matrix into R(t)*R and test
    xx    = randn(100,n);
    xpx   = xx(t)*xx
    r     = chol(xpx)
    test1 = r(t)*r
    mesh(r)
    error = r(t)*r - xpx
xpx =
        98.02       17.334      0.14022
       17.334       104.66      -7.2052
      0.14022      -7.2052       114.22
r =
       9.9005       1.7508     0.014163
            0        10.08     -0.71729
            0            0       10.663
test1 =
        98.02       17.334      0.14022
       17.334       104.66      -7.2052
      0.14022      -7.2052       114.22
error =
  1.4211e-014            0            0
            0            0            0
            0            0            0
 Eigen and svd analysis. For pd matrix s = lamda
    xx    = randn(100,n);
    xpx   = xx(t)*xx
    lamda = eig(xpx)
xpx =
       95.217      -3.5453       12.006
      -3.5453       96.003      -3.9312
       12.006      -3.9312       92.989
lamda =
       82.025
       93.783
        108.4
 show trace = sum eigen
      det   = prod(e)
    trace1 = trace(xpx)
    det1   = det(xpx)
    trace2 = sum(lamda)
    det2   = prod(lamda)
trace1 =
       284.21
det1 =
  8.3388e+005
trace2 =
       284.21
det2 =
  8.3388e+005
 Test SVD
    s        = svd(xpx)
    [u ss v] = svd(xpx)
    test     = u*ss*v(t)
    error    = xpx-test
s =
        108.4
       93.783
       82.025
u =
     -0.67492       0.3165      0.66657
      0.39135      0.91936    -0.040277
     -0.62557      0.23368     -0.74435
ss =
        108.4            0            0
            0       93.783            0
            0            0       82.025
v =
     -0.67492       0.3165      0.66657
      0.39135      0.91936    -0.040277
     -0.62557      0.23368     -0.74435
test =
       95.217      -3.5453       12.006
      -3.5453       96.003      -3.9312
       12.006      -3.9312       92.989
error =
 -2.8422e-014  -1.199e-014 -4.0856e-014
 -9.3703e-014 -2.8422e-014  5.9064e-014
 -4.7962e-014 -5.3291e-015  1.4211e-014
 Does X*V = V*Lamda
    xx        = randn(100,n);
    xpx       = xx(t)*xx
    [v lamda] = eig(xpx)
    test      = v*lamda*inv(v)
    error     = xpx-test
    vpv       = v(t)*v
    s         = svd(xpx)
xpx =
       98.321     -0.36605       1.9557
     -0.36605       127.52      -2.4594
       1.9557      -2.4594       112.74
v =
      0.99127       0.1298     0.022941
    0.0013134       0.1643     -0.98641
     -0.13181      0.97783       0.1627
lamda =
       98.061            0            0
            0       112.59            0
            0            0       127.93
test =
       98.321     -0.36605       1.9557
     -0.36605       127.52      -2.4594
       1.9557      -2.4594       112.74
error =
 -1.4211e-014  2.7645e-014  2.2204e-015
  2.9421e-014 -4.2633e-014  5.7732e-015
  1.7764e-015 -1.3323e-015            0
vpv =
            1  2.7756e-017 -2.0817e-017
  2.7756e-017            1  -2.498e-016
 -2.0817e-017  -2.498e-016            1
s =
       127.93
       112.59
       98.061
 Schur Factorization X = U S U(t) where U is orthogonal and
 S is block upper triangular with 1 by 1 and 2 by 2 on the
 diagonal. All elements of a Schur factorization real
    xx    = randn(100,n);
    xpx   = xx(t)*xx
    [U,S] = schur(xpx)
    test  = U*S*U(t)
    error = xpx-test
xpx =
       75.062       11.465      -4.6863
       11.465       135.28       7.6196
      -4.6863       7.6196       87.647
U =
     -0.91599     -0.36457     -0.16747
      0.20355    -0.062606     -0.97706
     -0.34572      0.92907     -0.13156
S =
       70.745            0            0
            0       88.973            0
            0            0       138.27
test =
       75.062       11.465      -4.6863
       11.465       135.28       7.6196
      -4.6863       7.6196       87.647
error =
  1.4211e-014  2.1316e-014 -1.4211e-014
  2.4869e-014 -8.5265e-014  7.1054e-015
 -1.0658e-014  5.3291e-015  1.4211e-014
 Schur Factorization
    xx    = randn(n,n)
    [U,S] = schur(xx)
    test  = U*S*U(t)
    error = xx-test
xx =
        2.095      0.93943     -0.45994
      0.34979    -0.047081      0.64722
       2.0142      -1.4799      -1.8411
U =
     -0.89939     -0.19282      0.39233
     -0.24726     -0.51574     -0.82029
      -0.3605      0.83477     -0.41617
S =
       2.1689       1.4404      -1.1939
            0     -0.98103       2.3141
            0     -0.42339     -0.98103
test =
        2.095      0.93943     -0.45994
      0.34979    -0.047081      0.64722
       2.0142      -1.4799      -1.8411
error =
  8.8818e-016 -2.2204e-016 -1.6653e-016
  1.1102e-016    9.09e-016  8.8818e-016
  4.4409e-016  3.3307e-015  8.8818e-016
 QR Factorization preserves length and angles and does not magnify
 errors. We express X = Q*R where Q is orthogonal and R is upper
 triangular
    x     = randn(n,n)
    [Q R] = qr(x)
    test1 = Q(t)*Q
    test2 = Q*R
    error = x - test2
x =
      -0.9756      0.55997      0.88166
     0.028304      0.62542      0.15174
    -0.050706      0.53695    -0.017682
Q =
     -0.99823    0.0094729    -0.058658
     0.028961     -0.78444     -0.61953
    -0.051883     -0.62013      0.78278
R =
      0.97733     -0.56872     -0.87479
            0     -0.81828    -0.099712
            0            0     -0.15956
test1 =
            1            0  6.9389e-018
            0            1 -1.6653e-016
  6.9389e-018 -1.6653e-016            1
test2 =
      -0.9756      0.55997      0.88166
     0.028304      0.62542      0.15174
    -0.050706      0.53695    -0.017682
error =
            0            0 -4.4409e-016
  3.4694e-018  2.2204e-016  8.3267e-017
            0            0  2.0817e-017
[3-D mesh plot: X'X where X was 100 by 50; diagonal elements near 100, off-diagonal elements near 0]

Figure 5.1 X'X for a random Matrix X
These ideas are illustrated using the Theil dataset discussed in more detail in the next section.
%% Use of Theil Data to Illustrate various ways to get Beta
% For more detail on these calculations see Stokes (200x) Chapter 10
disp('Theil (1971) data on Year CT Income RP')
data=[1923  99.2  96.7 101.0;
      1924  99.0  98.1 100.1;
      1925 100.0 100.0 100.0;
      1926 111.6 104.6  90.6;
      1927 122.2 104.9  86.5;
      1928 117.6 109.5  89.7;
      1929 121.1 110.8  90.6;
      1930 136.0 112.3  82.8;
      1931 154.2 109.3  70.1;
      1932 153.6 105.3  65.4;
      1933 158.5 101.7  61.3;
      1934 140.6  95.4  62.5;
      1935 136.2  96.4  63.6;
      1936 168.0  97.6  52.6;
      1937 154.3 102.4  59.7;
      1938 149.0 101.6  59.5;
      1939 165.5 103.8  61.3]
y=data(:,2);
x=[ones(size(data,1),1),data(:,3),data(:,4)];
disp('Beta using Inverse')
beta1=inv(x'*x)*x'*y
%% QR
disp('Using QR approach')
[q,r]=qr(x,0)
disp('Testing q')
q'*q
beta2=inv(r)*q'*y
yhat=q*q'*y;
resid=y-yhat;
disp('Y        Yhat     Residual')
[y,yhat,resid]
%% Testing R from QR and R from Cholesky
disp('Inverse (xpx) = inv(r)*transpose(inv(r))')
inv(x'*x)
inv(r)*(inv(r))'
r
cholr=chol(x'*x)
%% SVD approach that includes PC Regression
disp('SVD approach')
[u,s,v]=svd(x,0)
pc_coef=u'*y
beta3=inv(v')*inv(s)*pc_coef
Output produced is:

Theil (1971) data on Year CT Income RP
data =
         1923         99.2         96.7          101
         1924           99         98.1        100.1
         1925          100          100          100
         1926        111.6        104.6         90.6
         1927        122.2        104.9         86.5
         1928        117.6        109.5         89.7
         1929        121.1        110.8         90.6
         1930          136        112.3         82.8
         1931        154.2        109.3         70.1
         1932        153.6        105.3         65.4
         1933        158.5        101.7         61.3
         1934        140.6         95.4         62.5
         1935        136.2         96.4         63.6
         1936          168         97.6         52.6
         1937        154.3        102.4         59.7
         1938          149        101.6         59.5
         1939        165.5        103.8         61.3
Beta using Inverse
beta1 =
       130.23
       1.0659
      -1.3822
Using QR approach
q =
     -0.24254      -0.2958     -0.42465
     -0.24254      -0.2297     -0.39928
     -0.24254     -0.13999     -0.38173
     -0.24254     0.077214     -0.20134
     -0.24254     0.091379     -0.13707
     -0.24254      0.30858     -0.14641
     -0.24254      0.36996     -0.14898
     -0.24254      0.44079    -0.018862
     -0.24254      0.29913      0.14704
     -0.24254      0.11027      0.18403
     -0.24254    -0.059716      0.21536
     -0.24254     -0.35718      0.14409
     -0.24254     -0.30997      0.13597
     -0.24254     -0.25331      0.31174
     -0.24254    -0.026664      0.24537
     -0.24254    -0.064438      0.24162
     -0.24254      0.03944       0.2331
r =
      -4.1231      -424.53      -314.64
            0       21.179       11.878
            0            0      -66.411
Testing q
ans =
            1  8.1532e-017  1.3878e-017
  8.1532e-017            1 -1.1796e-016
  1.3878e-017 -1.1796e-016            1
beta2 =
       130.23
       1.0659
      -1.3822
Y        Yhat     Residual
ans =
         99.2       93.704       5.4962
           99        96.44         2.56
          100       98.603       1.3965
        111.6        116.5      -4.8995
        122.2       122.49     -0.28637
        117.6       122.97      -5.3664
        121.1       123.11      -2.0081
          136       135.49      0.51173
        154.2       149.84       4.3553
        153.6       152.08       1.5225
        158.5       153.91       4.5927
        140.6       145.53      -4.9335
        136.2       145.08      -8.8789
          168       161.56       6.4376
        154.3       156.87       -2.565
          149       156.29      -7.2887
        165.5       156.15       9.3542
Inverse (xpx) = inv(r)*transpose(inv(r))
ans =
       23.773      -0.2272   -0.0042094
      -0.2272    0.0023008  -0.00012716
   -0.0042094  -0.00012716   0.00022673
ans =
       23.773      -0.2272   -0.0042094
      -0.2272    0.0023008  -0.00012716
   -0.0042094  -0.00012716   0.00022673
r =
      -4.1231      -424.53      -314.64
            0       21.179       11.878
            0            0      -66.411
cholr =
       4.1231       424.53       314.64
            0       21.179       11.878
            0            0       66.411
SVD approach
u =
      0.26014     -0.42317      0.28267
      0.26123     -0.39389      0.21821
      0.26398     -0.37096      0.12977
      0.26026     -0.17816    -0.076467
      0.25606     -0.11332    -0.086908
      0.26662      -0.1094     -0.30402
      0.26959     -0.10823     -0.36537
      0.26301     0.025612     -0.42853
       0.2441      0.18215     -0.27778
      0.23275      0.20748    -0.087337
      0.22268      0.22834      0.08395
      0.21455      0.13929      0.37648
       0.2173      0.13408      0.32893
      0.20664      0.31251      0.28251
      0.22192      0.26022     0.052713
      0.22049      0.25419     0.090163
      0.22584      0.25202    -0.013903
s =
       530.48            0            0
            0       53.304            0
            0            0      0.20509
v =
    0.0077424    0.0056046      0.99995
        0.799      0.60125   -0.0095564
      0.60128     -0.79904  -0.00017699
pc_coef =
       545.81
       131.94
       26.706
beta3 =
       130.23
       1.0659
      -1.3822
Remark: This section shows how to implement the basic linear algebra relationships that are useful in understanding modern econometric methods and calculations. In many cases these approaches are required for complex and multicollinear datasets.
6. A Sample Multiple Input Regression Model Dataset

In sections 3 and 4 we introduced a small (6 observation) dataset that relates the age of cars to their value. We observed that since there are so few observations in this example, the correlation coefficient must be relatively large to be significant. The small sample standard error of the correlation coefficient, $(N-1)^{-.5}$, is in this case $.4472 = 1/\sqrt{5}$. Since the absolute value of the correlation coefficient (-.85884) is about 2 times the standard error, we can state that at about the 95% level the correlation coefficient is significant. The problem with correlation analysis is that it is hard to make direct predictions. What is wanted is a relationship where, if given only the age of a car, we can make some prediction of its price. To obtain an answer to the prediction problem requires more advanced statistical techniques. Its solution will be discussed further below. As discussed earlier, when more complicated models are deemed appropriate or when predictions are required, the correlation coefficient statistical procedure, which restricts analysis to two variables, is no longer the best way to proceed. In the highly unlikely situation where all the variables influencing y (the x's) were unrelated among themselves (i.e., were orthogonal), correlation analysis would give the correct sign of the relationship between each x variable and y. This situation would occur if the x's were principal components. In a later example, using generated data, some of these possibilities will be illustrated with further examples.
Table Two lists data on the consumption of textiles in the Netherlands from Theil ([1971] Principles of Econometrics, page 102), which was used as an example in the MATLAB code in section 5. This example will be shown to provide a better fit than the previous example and, in addition, illustrates multiple input regression models. (It should be noted that not all economics examples work this well.) Usually time series models have higher $R^2$ than cross section models because of the serial correlation (relationship between the error terms across time) implicit in most time series. In this example from Theil (1971) the consumption of textiles in the Netherlands (CT) between 1923-1939 is modeled as a function of income (Y) and the relative price of textiles (RP). The maintained hypothesis is that as income increases, the consumption of textiles should increase and as the relative price of textiles increases, the consumption of textiles should decrease. Two models are tried, one with the raw data and one with data logged to the base 10. The linear model asserts (6-1) shown after Table Two.
Table Two
Consumption of Textiles in the Netherlands: 1923-1939

Year      CT       Y       RP
1923     99.2     96.7   101.0
1924     99.0     98.1   100.1
1925    100.0    100.0   100.0
1926    111.6    104.6    90.6
1927    122.2    104.9    86.5
1928    117.6    109.5    89.7
1929    121.1    110.8    90.6
1930    136.0    112.3    82.8
1931    154.2    109.3    70.1
1932    153.6    105.3    65.4
1933    158.5    101.7    61.3
1934    140.6     95.4    62.5
1935    136.2     96.4    63.6
1936    168.0     97.6    52.6
1937    154.3    102.4    59.7
1938    149.0    101.6    59.5
1939    165.5    103.8    61.3

CT = consumption of textiles.
Y = income.
RP = relative price of textiles.
$CT = \alpha + \beta_1 Y + \beta_2 RP + e$     (6-1)

while the log form assumes the error is multiplicative, or that

$CT = \alpha Y^{\beta_1} (RP)^{\beta_2} e$     (6-2)

(6-2) can be estimated in log form as

$\log_{10}(CT) = \log_{10}(\alpha) + \beta_1 \log_{10}(Y) + \beta_2 \log_{10}(RP) + \log_{10}(e)$     (6-3)

Actual estimates of the alternative models were

CT = 130.7066 + 1.061710*Y - 1.382985*RP
      (4.824)    (3.981)      (-16.50)

$R^2$ = .94432,  e'e = 433.3
                                                                           (6-4)
log10(CT) = 1.373914 + 1.143156*log10(Y) - .8288375*log10(RP)
             (4.489)    (7.328)            (-22.95)

$R^2$ = .97070,  e'e = .002568
Prior to preliminary estimation, raw correlations and plots were examined. The log transformation was attempted to make the time series data stationary. B34S and SAS commands to analyze this data are shown next. Note that B34S requires the user to explicitly define variables to be built with the gen statements and the build statement when using the B34S data step. This allows for checking of variable names in the gen statements. For SAS the following commands would be used.
data theil;
INPUT CT Y RP;
LABEL CT      = 'CONSUMPTION OF TEXTILES';
LABEL LOG10CT = 'LOG10 OF CONSUMPTION';
LABEL Y       = 'INCOME';
LABEL LOG10Y  = 'LOG10 OF INCOME';
LABEL RP      = 'RELATIVE PRICE OF TEXTILES';
LABEL LOG10RP = 'LOG10 OF RELATIVE PRICE';
LOG10CT = LOG10(CT);
LOG10RP = LOG10(RP);
LOG10Y  = LOG10(Y);
CARDS;
99.2  96.7  101
99    98.1  100.1
100   100   100
111.6 104.6 90.6
122.2 104.9 86.5
117.6 109.5 89.7
121.1 110.8 90.6
136   112.3 82.8
154.2 109.3 70.1
153.6 105.3 65.4
158.5 101.7 61.3
140.6 95.4  62.5
136.2 96.4  63.6
168   97.6  52.6
154.3 102.4 59.7
149   101.6 59.5
165.5 103.8 61.3
;
proc reg;     MODEL CT = Y RP; run;
proc reg;     MODEL LOG10CT = LOG10Y LOG10RP; run;
proc autoreg; MODEL LOG10CT = LOG10Y LOG10RP / nlag=1 method=ml; run;
proc autoreg; MODEL LOG10CT = LOG10Y          / nlag=1 method=ml; run;
Edited output from B34S discussed below is:

B34S Version 8.42e (D:M:Y) 04/01/99 (H:M:S) 16:14:15  DATA STEP

Variable   # Label                        Mean      Std. Dev.     Variance      Maximum   Minimum
CT         1 CONSUMPTION OF TEXTILES      134.506   23.5773       555.891       168.000   99.0000
Y          2 INCOME                       102.982   5.30097       28.1003       112.300   95.4000
RP         3 RELATIVE PRICE OF TEXTILES   76.3118   16.8662       284.470       101.000   52.6000
LOG10CT    4 LOG10 OF CONSUMPTION         2.12214   0.791131E-01  0.625889E-02  2.22531   1.99564
LOG10Y     5 LOG10 OF INCOME              2.01222   0.222587E-01  0.495451E-03  2.05038   1.97955
LOG10RP    6 LOG10 OF RELATIVE PRICE      1.87258   0.961571E-01  0.924619E-02  2.00432   1.72099
CONSTANT   7                              1.00000   0.00000       0.00000       1.00000   1.00000

Data file contains 17 observations on 7 variables. Current missing value code is 0.1000000000000000E+32

Correlation Matrix

Y        Var 2   1  0.61769E-01
RP       Var 3   1 -0.94664       2  0.17885
LOG10CT  Var 4   1  0.99744       2  0.93936E-01   3 -0.94836
LOG10Y   Var 5   1  0.66213E-01   2  0.99973       3  0.17511       4  0.97862E-01
LOG10RP  Var 6   1 -0.93820       2  0.22599       3  0.99750       4 -0.93596       5  0.22212
CONSTANT Var 7   1  0.0000        2  0.0000        3  0.0000        4  0.0000        5  0.0000   6  0.0000

***************
Problem Number      4
Subproblem Number   1

F to enter                0.99999998E-02
F to remove               0.49999999E-02
Tolerance                 0.10000000E-04
Maximum no of steps       3

Dependent variable X( 1). Variable Name CT
Standard Error of Y = 23.577332 for degrees of freedom = 16.
.............
Step Number 3
Variable Entering         2
Multiple R                0.975337
Std Error of Y.X          5.56336
R Square                  0.951282

Analysis of Variance for reduction in SS due to variable entering
Source            DF     SS        MS        F         F Sig.
Due Regression     2     8460.9    4230.5    136.68    1.000000
Dev. from Reg.    14     433.31    30.951
Total             16     8894.2    555.89

Multiple Regression Equation
Variable          Coefficient     Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
CT        =
Y         X- 2     1.061710       0.2666740        3.981    0.99863    0.7287     0.8129
RP        X- 3    -1.382985       0.8381426E-01  -16.50     1.00000   -0.9752    -0.7846
CONSTANT  X- 7     130.7066       27.09429         4.824    0.99973

Adjusted R Square                          0.944321908495049
-2 * ln(Maximum of Likelihood Function)    103.294108058298
Akaike Information Criterion (AIC)         111.294108058298
Scwartz Information Criterion (SIC)        114.626961434523
Akaike (1970) Finite Prediction Error      36.4128553394184
Generalized Cross Validation               37.5832685467569
Hannan & Quinn (1979) HQ                   36.8112662258895
Shibata (1981)                             34.4851159390962
Rice (1984)                                39.3920889580981
Residual Variance                          30.9509270385056

Order of entrance (or deletion) of the variables =  7  3  2

Estimate of computational error in coefficients =
  1 -0.1889E-13   2 -0.2396E-14   3  0.7430E-11

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 2  Y
   0.71115004E-01
Row 2  Variable X- 3  RP
  -0.39974169E-02   0.70248306E-02
Row 3  Variable X- 7  CONSTANT
  -7.0185405       -0.12441382       734.10069

Program terminated. All variables put in.

Residual Statistics for Original Data
Von Neumann Ratio 1 ...   2.14471    Durbin-Watson TEST.....   2.01855
Von Neumann Ratio 2 ...   2.14471
For D. F. 14  t(.9999)= 5.3624, t(.999)= 4.1403, t(.99)= 2.9768, t(.95)= 2.1448, t(.90)= 1.7613, t(.80)= 1.3450
Skewness test (Alpha 3) = -.232914E-01,  Peakedness test (Alpha 4) =  1.37826

Normality Test -- Extended grid cell size = 0.393
Extended grid normality test - Prob of rejecting normality assumption
Chi=  7.118   Chi Prob= 0.4760   F(8, 14)= 0.889706   F Prob = 0.450879
Normality Test -- Small sample grid cell size = 1.70
Small sample normality test - Large grid
Chi=  3.294   Chi Prob= 0.6515   F(3, 14)=  1.09804   F Prob = 0.617396

Autocorrelation function of residuals
 1) -0.1546    2) -0.2529    3)  0.2272    4) -0.3925
F( 6, 6) = 0.3219   1/F = 3.106   Heteroskedasticity at 0.9032 level
Sum of squared residuals   433.3    Mean squared residual   25.49

Gen. Least Squares ended by satisfying tolerance.
***************
Problem Number      4
Subproblem Number   2

F to enter                0.99999998E-02
F to remove               0.49999999E-02
Tolerance                 0.10000000E-04
Maximum no of steps       3

Dependent variable X( 4). Variable Name LOG10CT
Standard Error of Y = 0.79113140E-01 for degrees of freedom = 16.
.............
Step Number 3
Variable Entering         5
Multiple R                0.987097
Std Error of Y.X          0.135425E-01
R Square                  0.974361

Analysis of Variance for reduction in SS due to variable entering
Source            DF     SS             MS             F         F Sig.
Due Regression     2     0.97575E-01    0.48787E-01    266.02    1.000000
Dev. from Reg.    14     0.25676E-02    0.18340E-03
Total             16     0.10014        0.62589E-02

Multiple Regression Equation
Variable          Coefficient     Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
LOG10CT   =
LOG10Y    X- 5     1.143156       0.1560002         7.328    1.00000    0.8906     1.084
LOG10RP   X- 6    -0.8288375      0.3611136E-01   -22.95     1.00000   -0.9870    -0.7314
CONSTANT  X- 7     1.373914       0.3060903         4.489    0.99949

Adjusted R Square                          0.970697895872232
-2 * ln(Maximum of Likelihood Function)    -101.322167384484
Akaike Information Criterion (AIC)         -93.3221673844844
Scwartz Information Criterion (SIC)        -89.9893140082595
Akaike (1970) Finite Prediction Error      0.215763077505479D-003
Generalized Cross Validation               0.222698319282440D-003
Hannan & Quinn (1979) HQ                   0.218123847024249D-003
Shibata (1981)                             0.204340326343424D-003
Rice (1984)                                0.233416420210472D-003
Residual Variance                          0.183398615879657D-003

Order of entrance (or deletion) of the variables =  7  6  5

Estimate of computational error in coefficients =
  1  0.5793E-11   2  0.2356E-12   3  0.2547E-11

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 5  LOG10Y
   0.24336056E-01
Row 2  Variable X- 6  LOG10RP
  -0.12513115E-02   0.13040301E-02
Row 3  Variable X- 7  CONSTANT
  -0.46626424E-01   0.76017246E-04   0.93691270E-01

Program terminated. All variables put in.

Residual Statistics for Original Data
Von Neumann Ratio 1 ...   2.04710    Durbin-Watson TEST.....   1.92669
Von Neumann Ratio 2 ...   2.04710
For D. F. 14  t(.9999)= 5.3624, t(.999)= 4.1403, t(.99)= 2.9768, t(.95)= 2.1448, t(.90)= 1.7613, t(.80)= 1.3450
Skewness test (Alpha 3) = -.159503,  Peakedness test (Alpha 4) =  1.44345

Normality Test -- Extended grid cell size = 0.393
Extended grid normality test - Prob of rejecting normality assumption
Chi=  9.471   Chi Prob= 0.6958   F(8, 14)= 1.18382   F Prob = 0.626481
Normality Test -- Small sample grid cell size = 1.70
Small sample normality test - Large grid
Chi=  3.294   Chi Prob= 0.6515   F(3, 14)=  1.09804   F Prob = 0.617396

Autocorrelation function of residuals
 1) -0.0990    2) -0.1061    3)  0.0862    4) -0.3157
F( 6, 6) = 0.5544   1/F = 1.804   Heteroskedasticity at 0.7544 level
Sum of squared residuals   0.2568E-02    Mean squared residual   0.1510E-03

Gen. Least Squares ended by satisfying tolerance.
We first show plots of the data.
[Two panels: "Log Theil Data" plots LOG10CT, LOG10Y and LOG10RP against OBS (1-17); "Linear Theil Data" plots CT, Y and RP against OBS]

Figure 6.1 2 D Plots of Textile Data
Econometric Notes
Two dimensional plots of this dataset do not capture the full relationships. From the plots in Figure
6.1 it appears that the consumption of textiles increases when the relative price of textiles falls and
that RI has little effect. Figure 6.2, which is based on a three dimensional extrapolation about each
point, gives a better picture of the true relationship. This figure clearly shows that LOG10RP, has the
most effect on LOG10CT, which is on the Z axis, but that LOG10RI does have a positive effect. The
OLS regression model attempts to capture this surface.
Remark: A 2-D plot may lead one to drop a variable that is in fact significant in a multi-dimensional
context. A 3-D plot can help in cases where K=3, but may be less useful for larger problems.
The plots of CT against RP and LOG10CT against LOG10RP suggest a negative relationship, which
is consistent with the economic theory that quantity demanded of a good will increase as its relative
price falls. The correlations between these two sets of variables are negative (-.94664 and -.93596)
and highly significant (at the .0001 level for both correlations). The plot between CT and Y and the
plot between LOG10CT and LOG10Y do not show much of a relationship. The raw correlations are
small (.06177 and .09786, respectively) and not significant. The preliminary finding might be that Y
was not a good variable to use on the right-hand side of a model predicting CT. It will be shown later
that such a conclusion would be wrong.
[3-D surface: LOG10CT on the Z axis plotted against LOG10Y and LOG10RP]

Figure 6.2 3-D Plot of Theil (1971) Textile Data
Remark: The preliminary estimation of a model CT = f(constant, Y, RP) indicates that the coefficients are 1.0617 (t = 3.98) and -1.383 (t = -16.5), respectively. The results support the maintained hypothesis that CT is positively related to income and negatively related to relative price. The Y variable, which was not significantly correlated with CT, was found to be significant when included in a regression controlling for RP. This demonstrates that it is important to go beyond just raw cross correlation analysis. If proposed variables are "prescreened out" by correlation analysis and not tried in regression models, many important variables may be incorrectly dropped from the analysis. It is important not to prematurely drop a proposed, theoretically plausible, variable from a regression model specification, even if in preliminary specifications it does not enter significantly. Later in the paper an example will be presented that illustrates that if other important variables are omitted from an equation, a significant variable that is in the equation may not show up as significant until the other variables enter the equation (omitted variable bias). The preceding discussion suggests that regression analysis requires careful use of diagnostic tests before the results are used in a production environment.
A possible problem with the above formulation is that the error process might potentially have heteroskedasticity, or nonconstant variance, due to the fact that the time series values for CT are increasing over time. If all the variables in the model are transformed into logs (to the base 10), some of the potential for difficulty may be avoided. If heteroskedasticity were present, the estimated standard errors of the coefficients would be biased. In addition, the estimated standard error of the model, $\sigma = 5.5634 = \sqrt{433.31298/(17-3)}$ from equation (6-4), would be misleading, since it would be an average, and, assuming the variance of the error was increasing, would overstate the error at the beginning of the data set and understate the error at the end of the data set.

Log transforms to the base 10 are made and the model is estimated again and reported in the bottom equation of (6-4). The results indicate the log linear form of the model fits better (the adjusted $R^2$ now is .9707) and all coefficients, except for the constant, are more significant. Comparison of the estimated values with the actual values shows surprisingly good results, considering there are only two explanatory variables in the model.
One of the assumptions of an OLS regression is that the error process follows a random normal distribution with no serial correlation or heteroskedasticity (nonconstant variance). If the error process is normal but these assumptions are violated, the estimated coefficients will be unbiased while the standard errors of the estimated coefficients will be biased.

Another important assumption of OLS is that the error terms are not related. If $e_t$ is the error term of the estimated model, $u_t$ is a random error and the model

$e_t = \sum_{i=1}^{p} \rho_i e_{t-i} + u_t$     (6-5)

is estimated, no autocorrelation up to order p implies that $\rho_1, \ldots, \rho_p$ are not significant. First-order serial correlation can be tested by the Durbin-Watson test statistic. If the Durbin-Watson statistic is around 2.0, there is no problem. If it is substantially below (above) 2.0 there is positive (negative) autocorrelation. This can be seen from the formula for the Durbin-Watson statistic,

$d = \sum_{t=2}^{T} (e_t - e_{t-1})^2 / \sum_{t=1}^{T} e_t^2$     (6-6)

If serial correlation is found, the appropriate procedure is generalized least squares (GLS), which involves a transformation of the data. If heteroskedasticity is found, there are other procedures that can be used to remove the problem. To illustrate GLS, assume

$y_t = \alpha + \beta x_t + e_t$     (6-7)

where t refers to the time period of the observation. If p = 1 and model (6-5) is estimated for the residuals of (6-7) and $\rho_1$ is significant, the appropriate procedure is to lag the original equation, multiply through by $\rho_1$, and subtract from the original equation. This gives

$(y_t - \rho_1 y_{t-1}) = \alpha(1 - \rho_1) + \beta(x_t - \rho_1 x_{t-1}) + (e_t - \rho_1 e_{t-1})$     (6-9)

which will give unbiased estimates of $\alpha$ and $\beta$ and their standard errors, since from (6-5) $u_t = e_t - \rho_1 e_{t-1}$ and by assumption $u_t$ does not contain serial correlation.
As a test, a misspecified model (to induce serial correlation) containing only LOG10Y is run in SAS. This model finds LOG10Y not significant and shows evidence of serial correlation, as measured by the low Durbin-Watson test statistic (.241). In the presence of serial correlation, the best course of action is to attempt to add new variables to explain the serial correlation. The B34S reg command output is shown first, followed by the SAS autoreg command output.
REG Command. Version 1 February 1997

Real*8 space available   9000000
Real*8 space used        43

OLS Estimation
Dependent variable               LOG10CT
Adjusted R**2                    -5.645122564288263E-02
Standard Error of Estimate        8.131550225838252E-02
Sum of Squared Residuals          9.918316361299519E-02
Model Sum of Squares              9.590596648930139E-04
Total Sum of Squares              0.1001422232778882
F( 1, 15)                         0.1450437196128150
F Significance                    0.2913428904662156
1/Condition of XPX                1.523468705487359E-05
Number of Observations            17
Durbin-Watson                     0.2414802718079813

Variable        Coefficient    Std. Error    t
LOG10Y   { 0}   0.34782649     0.91329943    0.38084606
CONSTANT { 0}   1.4222303      1.8378693     0.77384737
SAS output next:
The AUTOREG Procedure
Dependent Variable
LOG10CT
LOG10 OF CONSUMPTION
Ordinary Least Squares Estimates
SSE
MSE
SBC
Regress R-Square
Durbin-Watson
0.00256758
0.0001834
-92.822527
0.9744
1.9267
DFE
Root MSE
AIC
Total R-Square
Variable
DF
Estimate
Standard
Error
t Value
Approx
Pr > |t|
Intercept
LOG10Y
LOG10RP
1
1
1
1.3739
1.1432
-0.8288
0.3061
0.1560
0.0361
4.49
7.33
-22.95
0.0005
<.0001
<.0001
35
14
0.01354
-95.322167
0.9744
Variable Label
LOG10 OF INCOME
LOG10 OF RELATIVE PRICE
Econometric Notes
Estimates of Autocorrelations
Lag
Covariance
Correlation
0
1
0.000151
-0.00001
1.000000
-0.093221
-1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
|
|
|********************|
**|
|
Preliminary MSE
0.000150
Estimates of Autoregressive Parameters
Lag
Coefficient
Standard
Error
t Value
1
0.093221
0.276142
0.34
Algorithm converged.
The SAS System
10:37 Wednesday, December 6, 2006
4
The AUTOREG Procedure
Maximum Likelihood Estimates
SSE                0.0025352       DFE                  13
MSE                0.0001950       Root MSE        0.01396
SBC              -90.189374        AIC          -93.522227
Regress R-Square   0.9789          Total R-Square   0.9747
Durbin-Watson      1.7932

Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|   Variable Label
Intercept    1     1.3592           0.2941      4.62            0.0005
LOG10Y       1     1.1487           0.1516      7.58            <.0001   LOG10 OF INCOME
LOG10RP      1    -0.8271           0.0343    -24.09            <.0001   LOG10 OF RELATIVE PRICE
AR1          1     0.1248           0.3186      0.39            0.7017

Autoregressive parameters assumed given.
Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|   Variable Label
Intercept    1     1.3592           0.2875      4.73            0.0004
LOG10Y       1     1.1487           0.1471      7.81            <.0001   LOG10 OF INCOME
LOG10RP      1    -0.8271           0.0338    -24.47            <.0001   LOG10 OF RELATIVE PRICE
The AUTOREG Procedure
Dependent Variable: LOG10CT (LOG10 OF CONSUMPTION)

Ordinary Least Squares Estimates
SSE                0.09918316      DFE                  15
MSE                0.00661         Root MSE        0.08132
SBC              -33.537669        AIC          -35.204096
Regress R-Square   0.0096          Total R-Square   0.0096
Durbin-Watson      0.2415

Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|   Variable Label
Intercept    1     1.4222           1.8379      0.77            0.4510
LOG10Y       1     0.3478           0.9133      0.38            0.7087   LOG10 OF INCOME
Estimates of Autocorrelations
Lag   Covariance   Correlation
0        0.00583      1.000000
1        0.00447      0.765305

Preliminary MSE 0.00242

Estimates of Autoregressive Parameters
Lag   Coefficient   Standard Error   t Value
1       -0.765305         0.172027     -4.45

Algorithm converged.
The AUTOREG Procedure
Maximum Likelihood Estimates
SSE                0.02423721      DFE                  14
MSE                0.00173         Root MSE        0.04161
SBC              -53.034484        AIC          -55.534124
Regress R-Square   0.0564          Total R-Square   0.7580
Durbin-Watson      1.6157

Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|   Variable Label
Intercept    1     0.6643           1.6320      0.41            0.6901
LOG10Y       1     0.7229           0.8167      0.89            0.3910   LOG10 OF INCOME
AR1          1    -0.8961           0.1312     -6.83            <.0001

Autoregressive parameters assumed given.
Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|   Variable Label
Intercept    1     0.6643           1.5885      0.42            0.6821
LOG10Y       1     0.7229           0.7910      0.91            0.3762   LOG10 OF INCOME
SAS first shows the complete model, where the DW was 1.9267, indicating GLS was not needed. If GLS is run anyway, $\hat{\rho} = .1248$ with $t = .39$, and the DW fell to 1.79. For the misspecified equation the DW was .2415 before GLS and 1.6157 after GLS. Here $\hat{\rho} = -.8961$ with $t = -6.83$. For B34S, which uses a two-pass method to do GLS, the results were:
Problem Number                                  1
Subproblem Number                               3
F to enter                                      9.999999776482582E-03
F to remove                                     4.999999888241291E-03
Tolerance (1.-R**2) for including a variable    1.000000000000000E-05
Maximum Number of Variables Allowed             2
Internal Number of dependent variable           4
Dependent Variable                              LOG10CT
Standard Error of Y                             7.911314021618683E-02
Degrees of Freedom                              16

.............
Step Number 2   Variable Entering 5
Multiple R         0.978620E-01
Std Error of Y.X   0.813155E-01
R Square           0.957698E-02

Analysis of Variance for reduction in SS due to variable entering
Source             DF     SS            MS            F         F Sig.
Due Regression      1     0.95906E-03   0.95906E-03   0.14504   0.291343
Dev. from Reg.     15     0.99183E-01   0.66122E-02
Total              16     0.10014       0.62589E-02

Multiple Regression Equation
Variable            Coefficient   Std. Error   T Val.   T Sig.    P. Cor.   Elasticity
LOG10CT =
LOG10Y     X- 5     0.3478265     0.9132994    0.3808   0.29134   0.0979    0.3298
CONSTANT   X- 7     1.422230      1.837869     0.7738   0.54895

Adjusted R Square                          -5.645122564294430E-02
-2 * ln(Maximum of Likelihood Function)    -39.20409573256165
Akaike Information Criterion (AIC)         -33.20409573256165
Scwartz Information Criterion (SIC)        -30.70445570039300
Akaike (1970) Finite Prediction Error       7.390118073123232E-03
Generalized Cross Validation                7.493839028535489E-03
Hannan & Quinn (1979) HQ                    7.454314110419272E-03
Shibata (1981)                              7.207081092983957E-03
Rice (1984)                                 7.629474124074592E-03
Residual Variance                           6.612210907531313E-03

Order of entrance (or deletion) of the variables = 7 5
Estimate of Computational Error in Coefficients:  1 0.00000   2 0.00000

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 5  LOG10Y       0.83411584
Row 2  Variable X- 7  CONSTANT    -1.6784283     3.3777634

Program terminated. All variables put in.

Residual Statistics for Original data
Von Neumann Ratio 1     0.25657      Durbin-Watson TEST     0.24148
Von Neumann Ratio 2     0.25657
For D.F. 15: t(.9999)= 5.2391, t(.999)= 4.0728, t(.99)= 2.9467, t(.95)= 2.1314, t(.90)= 1.7531, t(.80)= 1.3406
Skewness test (Alpha 3) = -.233040,  Peakedness test (Alpha 4) = 1.30008
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 10.65   Chi Prob= 0.7775   F(8, 15)= 1.33088   F Prob = 0.698738
Small sample normality test - Large grid
  Chi= 3.882   Chi Prob= 0.7255   F(3, 15)= 1.29412   F Prob = 0.687249
Autocorrelation function of residuals
  1  0.813137    2  0.658160    3  0.545551    4  0.332369
Heteroskedasticity at 0.7389 level:  F( 6, 6) = 1.730,  1/F = 0.5781
Sum of squared residuals      9.918316361299512E-02
Mean squared residual         5.834303741940890E-03
REGRESSION STEP
Doing Gen. Least Squares using residual Dif. Eq. of order 1
Lag Coefficients:   1   0.842413
Standard Error of Y    0.2288578856184614
Degrees of Freedom     15

.............
Step Number 2   Variable Entering 5
Multiple R         0.105883
Std Error of Y.X   0.235559
R Square           0.112113E-01

Analysis of Variance for reduction in SS due to variable entering
Source             DF     SS            MS            F         F Sig.
Due Regression      1     0.88080E-02   0.88080E-02   0.15874   0.303669
Dev. from Reg.     14     0.77683       0.55488E-01
Total              15     0.78564       0.52376E-01

Multiple Regression Equation
Variable            Coefficient   Std. Error   T Val.   T Sig.    P. Cor.   Elasticity
LOG10CT =
LOG10Y     X- 5     0.2959532     0.7428190    0.3984   0.30367   0.1059    0.2806
CONSTANT   X- 7     1.605192      1.504752     1.067    0.69586

Adjusted R Square                          -5.941647786252678E-02
-2 * ln(Maximum of Likelihood Function)    -2.995906752503885
Akaike Information Criterion (AIC)          3.004093247496115
Scwartz Information Criterion (SIC)         5.321859414215458
Akaike (1970) Finite Prediction Error       6.242391585298818E-02
Generalized Cross Validation                6.341477166017846E-02
Hannan & Quinn (1979) HQ                    6.265098482400155E-02
Shibata (1981)                              6.068991819040517E-02
Rice (1984)                                 6.473591273643219E-02
Residual Variance                           5.548792520265616E-02

Order of entrance (or deletion) of the variables = 7 5

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 5  LOG10Y       0.55178009
Row 2  Variable X- 7  CONSTANT    -1.1169023     2.2642794

Program terminated. All variables put in.

Residual Statistics for Smoothed Original data
For GLS Y and Y estimate scaled by  0.1575867198767030
Von Neumann Ratio 1     2.30745      Durbin-Watson TEST     2.16324
Von Neumann Ratio 2     2.30745
For D.F. 14: t(.9999)= 5.3634, t(.999)= 4.1405, t(.99)= 2.9768, t(.95)= 2.1448, t(.90)= 1.7613, t(.80)= 1.3450
Skewness test (Alpha 3) = 0.512095,  Peakedness test (Alpha 4) = 1.97237
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 7.750   Chi Prob= 0.5417   F(8, 14)= 0.968750   F Prob = 0.503304
Small sample normality test - Large grid
  Chi= 6.500   Chi Prob= 0.9103   F(3, 14)= 2.16667   F Prob = 0.862453
Autocorrelation function of residuals
  1  -0.159923    2  -0.479294    3  0.253850    4  -0.348167E-01
Heteroskedasticity at 0.8371 level:  F( 5, 5) = 0.3909,  1/F = 2.558
Sum of squared residuals      0.7768309528371908
Mean squared residual         4.855193455232443E-02
Gen. Least Squares ended by satisfying tolerance.
Here $\hat{\rho} = .8424$ and after GLS the DW was 2.16, which is higher than found with SAS. Note the change in sign of $\rho$ in the SAS output: although there is positive serial correlation (the autocorrelation was .7653), SAS reports $\hat{\rho} = -.8961$. The insignificant LOG10Y term is now found to be .2959 in place of the .7229 found with SAS, but very close to the OLS .3478 value.
Remarks: What can we conclude from the preceding results? Serial correlation was not the reason
that LOG10Y was not significant (as measured by the low t value) in the OLS equation containing just
LOG10Y on the right-hand side. In this equation, LOG10Y was not significant because of omitted
variable bias. The B34S two-pass GLS procedure was able to remove more serial correlation than the
SAS ML approach. We found that LOG10Y is a significant variable in a properly specified equation.
This problem illustrates how it would be a mistake to remove LOG10Y from consideration as a
potentially important variable just because it does not enter significantly into a serial correlation-free
equation that does not contain all the appropriate variables on the right.
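The logic of this experiment is easy to replicate outside B34S or SAS. A minimal Python sketch (my own; it assumes the Theil textile series are already loaded as numpy arrays under the names used below) computes the Durbin-Watson statistic of (6-6) for the full and the misspecified models:

import numpy as np

def dw(e):
    # Durbin-Watson statistic of equation (6-6)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def ols_resid(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

# Assuming log10ct, log10y, log10rp are numpy arrays of length 17:
# c = np.ones(17)
# e_full  = ols_resid(log10ct, np.column_stack([c, log10y, log10rp]))
# e_short = ols_resid(log10ct, np.column_stack([c, log10y]))
# print(dw(e_full))    # near 2: no GLS needed
# print(dw(e_short))   # near .24: omitted-variable serial correlation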
An example having different problems is illustrated by a dataset from the engineering literature (from Brownlee [1965], Statistical Theory and Methodology, page 454) that is presented in Table Three. Here we have a maintained hypothesis that the stack loss of ingoing ammonia (Y) is related to the operation of a factory that converts ammonia to nitric acid by the process of oxidation. There are data on three variables for 21 days of plant operation: X1 = air flow, X2 = cooling water inlet temperature, X3 = acid concentration and Y = stack loss of ammonia.
____________________________________________________________
Table Three
Brownlee Engineering Stack Loss Data

Obs   X1   X2   X3   Y1
  1   80   27   89   42
  2   80   27   88   37
  3   75   25   90   37
  4   62   24   87   28
  5   62   22   87   18
  6   62   23   87   18
  7   62   24   93   19
  8   62   24   93   20
  9   58   23   87   15
 10   58   18   80   14
 11   58   18   89   14
 12   58   17   88   13
 13   58   18   82   11
 14   58   19   93   12
 15   50   18   89    8
 16   50   18   86    7
 17   50   19   72    8
 18   50   19   79    8
 19   50   20   80    9
 20   56   20   82   15
 21   70   20   91   15

X1 = air flow.
X2 = cooling water inlet temperature.
X3 = acid concentration.
Y1 = stack loss of ammonia.
____________________________________________________________
The following B34S commands will load the data and perform the required analysis.
/$ Sample Data # 3
/$ Data from Brownlee (1965) page 454
b34sexec data corr$
INPUT X1 X2 X3 Y$
LABEL X1 = 'AIR FLOW'$
LABEL X2 = 'COOLING WATER INLET TEMPERATURE'$
LABEL X3 = 'ACID CONCENTRATION'$
LABEL Y = 'STACK LOSS' $
DATACARDS$
80 27 89 42
80 27 88 37
75 25 90 37
62 24 87 28
62 22 87 18
62 23 87 18
62 24 93 19
62 24 93 20
58 23 87 15
58 18 80 14
58 18 89 14
58 17 88 13
58 18 82 11
58 19 93 12
50 18 89 8
50 18 86 7
50 19 72 8
50 19 79 8
50 20 80 9
56 20 82 15
70 20 91 15
b34sreturn$
b34seend$
b34sexec regression maxgls=2 residuala$
MODEL Y = X1 X2 X3 $ b34seend$
The results of the OLS model fit are reported next.
B34S Version 8.42e   (D:M:Y) 04/01/99 (H:M:S) 21:47:54   DATA STEP

Variable   # Label                              Mean      Std. Dev.   Variance   Maximum   Minimum
X1         1 AIR FLOW                           60.4286   9.16827     84.0571    80.0000   50.0000
X2         2 COOLING WATER INLET TEMPERATURE    21.0952   3.16077     9.99048    27.0000   17.0000
X3         3 ACID CONCENTRATION                 86.2857   5.35857     28.7143    93.0000   72.0000
Y          4 STACK LOSS                         17.5238   10.1716     103.462    42.0000   7.00000
CONSTANT   5                                    1.00000   0.00000     0.00000    1.00000   1.00000

Data file contains 21 observations on 5 variables. Current missing value code is 0.1000000000000000E+32
Correlation Matrix
              X1 (Var 1)   X2 (Var 2)   X3 (Var 3)   Y (Var 4)
X2 (Var 2)    0.78185
X3 (Var 3)    0.50014      0.39094
Y  (Var 4)    0.91966      0.87550      0.39983
CONSTANT      0.0000       0.0000       0.0000       0.0000
***************  REGRESSION STEP  ***************
Problem Number                     1
Subproblem Number                  1
F to enter                         0.99999998E-02
F to remove                        0.49999999E-02
Tolerance                          0.10000000E-04
Maximum no of steps                4
Dependent variable X( 4). Variable Name Y
Standard Error of Y =              10.171623 for degrees of freedom = 20.

.............
Step Number 4   Variable Entering 3
Multiple R         0.955812
Std Error of Y.X   3.24336
R Square           0.913577

Analysis of Variance for reduction in SS due to variable entering
Source             DF     SS        MS        F         F Sig.
Due Regression      3     1890.4    630.14    59.902    1.000000
Dev. from Reg.     17     178.83    10.519
Total              20     2069.2    103.46

Multiple Regression Equation
Variable          Coefficient   Std. Error   T Val.    T Sig.    P. Cor.   Elasticity
Y =
X1        X- 1    0.7156402     0.1348582     5.307     0.99994   0.7897    2.468
X2        X- 2    1.295286      0.3680243     3.520     0.99737   0.6492    1.559
X3        X- 3   -0.1521225     0.1562940    -0.9733    0.65595  -0.2297   -0.7490
CONSTANT  X- 5   -39.91967      11.89600     -3.356     0.99625

Adjusted R Square                           0.898325769953741
-2 * ln(Maximum of Likelihood Function)     104.575591004800
Akaike Information Criterion (AIC)          114.575591004800
Scwartz Information Criterion (SIC)         119.798203193417
Akaike (1970) Finite Prediction Error       12.5231065545072
Generalized Cross Validation                12.9945646836181
Hannan & Quinn (1979) HQ                    13.0142387900995
Shibata (1981)                              11.7597933930896
Rice (1984)                                 13.7561508921817
Residual Variance                           10.5194095057860

Order of entrance (or deletion) of the variables = 1 5 2 3
Estimate of computational error in coefficients =
  1 0.3461E-10   2 0.1208E-10   3 0.1959E-09   4 0.1472E-15

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.18186730E-01
Row 2  Variable X- 2  X2          -0.36510675E-01   0.13544186
Row 3  Variable X- 3  X3          -0.71435215E-02   0.10476827E-04   0.24427828E-01
Row 4  Variable X- 5  CONSTANT     0.28758711      -0.65179437      -1.6763208       141.51474

Program terminated.
All variables put in.

Residual Statistics for Original Data
Von Neumann Ratio 1     1.55939      Durbin-Watson TEST     1.48513
Von Neumann Ratio 2     1.55939
For D.F. 17: t(.9999)= 5.0433, t(.999)= 3.9650, t(.99)= 2.8982, t(.95)= 2.1098, t(.90)= 1.7396, t(.80)= 1.3334
Skewness test (Alpha 3) = -.140452,  Peakedness test (Alpha 4) = 2.03637
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 9.952   Chi Prob= 0.7316   F(8, 17)= 1.24405   F Prob = 0.666586
Small sample normality test - Large grid
  Chi= 3.048   Chi Prob= 0.6157   F(3, 17)= 1.01587   F Prob = 0.589919
Autocorrelation function of residuals
  1)  0.0858    2) -0.1149    3) -0.0409    4) -0.0064
Heteroskedasticity at 0.6440 level:  F( 7, 7) = 1.336,  1/F = 0.7485
Sum of squared residuals      178.8
Mean squared residual         8.516
Gen. Least Squares ended by satisfying tolerance.
Y1 = -39.91967 + .7156402 X1 + 1.295286 X2 - .1521225 X3
     (-3.356)    (5.307)       (3.520)      (-.9733)

R2 = .8983,  e'e = 178.83
(6-10)
Two of the three variables (in addition to the constant) are found to be significant (significantly different from zero at or better than the 95% level). The correlations with Y were .9197, .8755 and .3998, respectively. Of the three variables, X3 (acid concentration) was not significant at the 95% or better level, because its t statistic was less than 2 in absolute value. The variable X1 (air flow) was found to be positively related to stack loss, and the variable X2 (cooling water inlet temperature) was also found to be positively related to stack loss. In this model, 89.83% of the variance is explained by the three variables on the right. Clearly, stack loss can be lowered if X1 and X2 are lowered. The X3 variable (acid concentration) was not significant, even though the raw correlations show some relationship (correlation = .39983).¹ The OLS equation was found to have a Durbin-Watson statistic of 1.4851, showing some serial correlation. First-order GLS was attempted but was not executed, since the residual correlation was less than the tolerance.
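The contrast between the raw correlations and the multiple-regression t ratios can be reproduced with a short Python sketch (an illustration only; the B34S output above remains the reference computation). The data are those of Table Three:

import numpy as np

# Brownlee (1965) stack loss data from Table Three
x1 = np.array([80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58,
               58, 58, 58, 50, 50, 50, 50, 50, 56, 70.0])
x2 = np.array([27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18,
               17, 18, 19, 18, 18, 19, 19, 20, 20, 20.0])
x3 = np.array([89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89,
               88, 82, 93, 89, 86, 72, 79, 80, 82, 91.0])
y  = np.array([42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14,
               13, 11, 12,  8,  7,  8,  8,  9, 15, 15.0])

# Raw correlations with y: all three variables look "related"
for name, x in (("x1", x1), ("x2", x2), ("x3", x3)):
    print(name, np.corrcoef(x, y)[0, 1])

# Multiple regression t ratios tell a different story for x3
X = np.column_stack([np.ones_like(y), x1, x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = (e @ e) / (len(y) - X.shape[1])               # residual variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
print("t ratios:", b / se)                         # |t| for x3 is below 2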
Remark: A close to significant correlation is no assurance that the variable will be significant in a more populated model. From an economist's point of view, the results reported in the above paragraph suggest that the tradeoffs of a lower air flow and lower cooling water inlet temperature must be weighed against absorption technology changes that would lower the constant. While engineering considerations are clearly paramount in the decision process, the regression results, which can be readily obtained with a modern PC, can help summarize the data and highlight the relationships between the variables of interest. Of course it is important to select the appropriate data to use in the study. If data on key variables are omitted, the results of the study could be called into question. However, the problem may not be as bad as it seems. If a variable that was important was inadvertently omitted, then unless it was random, its effect should be visible in the error process. In the last analysis, the value of a model lies in how well it works. Inspection of the results is a key aspect of the validation of a model.

¹ Depending on whether the large or small sample SE is used, the value is $1/\sqrt{20} = .22361$ or $1/\sqrt{21} = .21822$.
Table Four shows a data set (taken from Brownlee [1965], op. cit., page 463). Here the number of deaths from heart disease per 100,000 males aged 55-59 years in a number of countries is related to the number of telephones per head (X1) (presumably as a measure of stress and/or income), the percent that fat calories are of total calories (X2), and the percent that animal protein calories are of total calories (X3).
Table Four
Brownlee Health Data

obs   X1    X2   X3   Y
  1   124   33    8   81
  2    49   31    6   55
  3   181   38    8   80
  4     4   17    2   24
  5    22   20    4   78
  6   152   39    6   52
  7    75   30    7   88
  8    54   29    7   45
  9    43   35    6   50
 10    41   31    5   69
 11    17   23    4   66
 12    22   21    3   45
 13    16    8    3   24
 14    10   23    3   43
 15    63   37    6   38
 16   170   40    8   72
 17   125   38    6   41
 18    15   25    4   38
 19   221   39    7   52
 20   171   33    7   52
 21    97   38    6   66
 22   254   39    8   89

X1 = 1000 * telephones per head.
X2 = fat calories as a % of total calories.
X3 = animal protein as a % of total calories.
Y  = 100 * log number of deaths per 1000 males 55-59 years.
The B34S commands to analyze this data are:
/$ Sample Data # 4
/$ From Brownlee (1965) page 463
b34sexec data corr$
INPUT X1 X2 X3 Y$
LABEL X1 = '1000 * TELEPHONES PER HEAD'$
LABEL X2 = ' FAT CALORIES AS A % OF TOTAL CALORIES'$
LABEL X3 = 'ANIMAL PROTEIN AS A % TO TOTAL CALORIES'$
LABEL Y = '100 * LOG # DEATHS PER 1GMALES 55-59'$
DATACARDS$
124 33 8 81
49 31 6 55
181 38 8 80
4 17 2 24
22 20 4 78
152 39 6 52
75 30 7 88
54 29 7 45
43 35 6 50
41 31 5 69
17 23 4 66
22 21 3 45
16 8 3 24
10 23 3 43
63 37 6 38
170 40 8 72
125 38 6 41
15 25 4 38
221 39 7 52
171 33 7 52
97 38 6 66
254 39 8 89
b34sreturn$
b34seend$
b34sexec regression residuala$
MODEL Y = X1 X2 X3 $ b34seend$
The raw correlation results show X1, X2 and X3 positively related to Y, with correlations of .46875, .44628 and .62110, respectively. The variable X3 appears to be the most important.

Y = 23.9306 - .0067849 X1 - .478240 X2 + 8.496616 X3
    (1.499)    (-.0833)     (-.6315)     (2.21)

R2 = .3017,  e'e = 4686
(6-11)
The OLS results indicate that only 30.17% of the variance can be explained and that the animal protein variable (X3) is the only significant variable. Clearly, this finding is interesting, but the large unexplained component suggests that more data need to be collected to improve the model. It may well be the case that the animal protein variable (X3) is related to other unspecified variables, and interpreting it without qualification would be dangerous. This will have to be investigated in future research if more data become available.
Remark: This dataset shows a case where correlation analysis suggested a result that did not stand up in a multiple regression model. This is in contrast to the Theil dataset, where correlation analysis did not suggest a relationship that was only found with a regression model.
B34S Version 8.42e   (D:M:Y) 04/01/99 (H:M:S) 21:58:58   DATA STEP

Variable   # Label                                      Mean      Std. Dev.   Variance   Maximum   Minimum
X1         1 1000 * TELEPHONES PER HEAD                 87.5455   75.4212     5688.35    254.000   4.00000
X2         2 FAT CALORIES AS A % OF TOTAL CALORIES      30.3182   8.68708     75.4654    40.0000   8.00000
X3         3 ANIMAL PROTEIN AS A % TO TOTAL CALORIES    5.63636   1.86562     3.48052    8.00000   2.00000
Y          4 100 * LOG # DEATHS PER 1GMALES 55-59       56.7273   19.3075     372.779    89.0000   24.0000
CONSTANT   5                                            1.00000   0.00000     0.00000    1.00000   1.00000

Data file contains 22 observations on 5 variables. Current missing value code is 0.1000000000000000E+32
Correlation Matrix
              X1 (Var 1)   X2 (Var 2)   X3 (Var 3)   Y (Var 4)
X2 (Var 2)    0.75915
X3 (Var 3)    0.80220      0.83018
Y  (Var 4)    0.46875      0.44628      0.62110
CONSTANT      0.0000       0.0000       0.0000       0.0000
***************  REGRESSION STEP  ***************
Problem Number                     3
Subproblem Number                  1
F to enter                         0.99999998E-02
F to remove                        0.49999999E-02
Tolerance                          0.10000000E-04
Maximum no of steps                4
Dependent variable X( 4). Variable Name Y
Standard Error of Y =              19.307491 for degrees of freedom = 21.

.............
Step Number 4   Variable Entering 1
Multiple R         0.633617
Std Error of Y.X   16.1340
R Square           0.401471

Analysis of Variance for reduction in SS due to variable entering
Source             DF     SS        MS        F         F Sig.
Due Regression      3     3142.9    1047.6    4.0246    0.976444
Dev. from Reg.     18     4685.5    260.31
Total              21     7828.4    372.78

Multiple Regression Equation
Variable          Coefficient       Std. Error       T Val.        T Sig.    P. Cor.   Elasticity
Y =
X1        X- 1   -0.6784908E-02    0.8144097E-01    -0.8331E-01    0.06548   -0.0196   -0.1047E-01
X2        X- 2   -0.4782399        0.7572547        -0.6315        0.46438   -0.1472   -0.2556
X3        X- 3    8.496616         3.844121          2.210         0.95973    0.4620    0.8442
CONSTANT  X- 5    23.93061         15.96606          1.499         0.84875

Adjusted R Square                           0.301715772931645
-2 * ln(Maximum of Likelihood Function)     180.379400452936
Akaike Information Criterion (AIC)          190.379400452936
Scwartz Information Criterion (SIC)         195.834612719728
Akaike (1970) Finite Prediction Error       307.634186421501
Generalized Cross Validation                318.151594504287
Hannan & Quinn (1979) HQ                    321.036004555605
Shibata (1981)                              290.423882286032
Rice (1984)                                 334.678950062951
Residual Variance                           260.305850048962

Order of entrance (or deletion) of the variables = 3 5 2 1
Estimate of computational error in coefficients =
  1 0.2532E-11   2 -0.5376E-11   3 -0.7851E-12   4 -0.9443E-13

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.66326323E-02
Row 2  Variable X- 2  X2          -0.17265284E-01   0.57343470
Row 3  Variable X- 3  X3          -0.14835665      -1.6567911       14.777267
Row 4  Variable X- 5  CONSTANT     0.77898727      -6.5357233      -20.071205      254.91515

Program terminated.
Residual Statistics for...
Original Data
Von Neumann Ratio 1 ...
Von Neumann Ratio 2 ...
For D. F.
2.21784
2.21784
18 t(.9999)=
Infin
1
1.000
1.000
Cell No.
Interval
Act Per
3
1.000
1.000
2.11703
4.9654, t(.999)= 3.9216, t(.99)= 2.8784, t(.95)= 2.1009, t(.90)= 1.7341, t(.80)= 1.3304
Skewness test (Alpha 3) = 0.145227
t Stat
Cell No.
Interval
Act Per
Durbin-Watson TEST.....
,
Peakedness test (Alpha 4)=
1.39268
Normality Test -- Extended grid cell size
1.734
1.330
1.067
0.862
0.688
0.534
2
5
2
1
2
3
0.900
0.800
0.700
0.600
0.500
0.400
0.955
0.864
0.636
0.545
0.500
0.409
=
0.392
4
0.300
0.273
Normality Test -- Small sample grid cell size =
7
3
7
0.800
0.600
0.400
0.864
0.545
0.409
2.20
0.257
0.127
1
1
0.200
0.100
0.091
0.045
4.40
2
0.200
0.091
Extended grid normality test - Prob of rejecting normality assumption
Chi=
8.000
Chi Prob= 0.5665
F(8,
18)= 1.00000
F Prob =0.531050
Small sample normality test - Large grid
Chi=
5.273
Chi Prob= 0.8471
F(3,
18)=
1.75758
F Prob =0.808728
Autocorrelation function of residuals
1) -0.0991
F(
7,
2)
7)
0.1355
=
Sum of squared residuals
1.432
4686.
3) -0.4051
1/F =
0.6985
4)
0.1520
Heteroskedasticity at
Mean squared residual
0.6762
level
213.0
The preceding sections have outlined some of the things that can be done with simple
regression analysis. In the next section of the paper, data will be generated that will better illustrate
problems of omitted variables and "hidden" nonlinearity.
7. Advanced Regression analysis
The B34S code listed below shows how 250 observations on a number of series are generated. The B34S regression model commands are also shown.
/;
/; nonlinearity and serial correlation in generated data
/;
b34sexec data noob=250 maxlag=1
/; corr
;
* b0=1 b1=100 b2=-100 b3=80
$
* generate three output variables with different characteristics$
build x1 x2 x3 y ynlin yma e$
gen x1
= rn()$
gen x2
= x1*x1$
gen x3      = lag1(x1)$
/; gen e = 100.*rn()$
gen e       = rn()$
*                                                 ;
* build three variables                           $
*   y     = f(x1, x2, x3)                         ;
*   ynlin = f(x1, x2)                             ;
*   yma   = f(x1,x2,x3) + theta*lag(et)           ;
*                                                 ;
gen y       = 1.0 + 1.*x1 - 1.*x2 + .8*x3 + e $
gen ynlin   = y - .8*x3 $
* generate an ma model;
gen yma     = y + (-.95*lag1(e));
b34srun$
/;
/; end of data building
/;
/; b34sexec list iend=20$ b34seend$
b34sexec reg$ model y     = x1 x2    $ b34seend$
b34sexec reg$ model y     = x1       $ b34seend$
b34sexec reg$ model ynlin = x1       $ b34seend$
b34sexec reg$ model ynlin = x1 x2    $ b34seend$
b34sexec reg$ model yma   = x1 x2 x3 $ b34seend$
/$ do gls
b34sexec regression residuala maxgls=4$ model yma=x1 x2 x3 $ b34seend$
b34sexec matrix;
call loaddata;
call load(rrplots);
call load(data2acf);
call olsq(yma x1 x2 x3 :print);
call data2acf(%res,'Model yma=f(x1, x2, x3)',12,'yma_res_acf.wmf');
b34srun;
/; sort data by variable we suspect is nonlinear
/; Then do RR analysis
/;
b34sexec sort $ by x1$ b34seend$
/; b34sexec list iend=20$ b34seend$
b34sexec reg$ model y     = x1    $ b34seend$
b34sexec reg$ model y     = x1 x3 $ b34seend$
b34sexec reg$ model ynlin = x1    $ b34seend$
/;
/; recursive residual analysis
/; x2, which is a nonlinear x1 term, is missing.  Can RR detect it?
/;
b34sexec matrix;
call loaddata;
call load(rrplots);
/; call print(rrplots);
call olsq(y x1 x3 :rr 1 :print);
/; call tabulate(%rrobs,%ssr1,%ssr2,%rr,%rrstd,%res);
call print('Sum of squares of std RR ',sumsq(goodrow(%rrstd)):);
call print('Sum of squares of OLS RES ',sumsq(goodrow(%res)):);
/; call print(%rrcoef,%rrcoeft);
/; call rrplots(%rrstd,%rss,%nob,%k,%ssr1,%ssr2,1);
call rrplots(%rrstd,%rss,%nob,%k,%ssr1,%ssr2,0);
/; call names(all);
x1_coef=%rrcoef(,1);
x3_coef=%rrcoef(,2);
call graph(x1_coef,x3_coef
:file 'coef_bias.wmf' :nolabel
:heading 'Omitted Variable Bias x1 and x3 coef');
b34srun;
The above code builds three models.
$y = 1.0 + x1 - x2 + .8\, x3 + e$
(7-1)

$ynlin = 1.0 + x1 - x2 + e$
(7-2)

$yma_t = 1.0 + x1_t - x2_t + .8\, x3_t + e_t - .95\, e_{t-1}$
(7-3)

By construction of the data and ignoring subscripts where there is no confusion:

$x2 = (x1)^2$
(7-4)

$x3_t = x1_{t-1}$
(7-5)
Since x1 is a random variable, there is no correlation between x1 and x3. Because of (7-4) there is
correlation between x1 and x2. The purpose of the generated data set is to illustrate the conditions
under which an omitted variable will and will not cause a bias in the coefficients estimated for an
incomplete model and to show detection strategy. The yma series illustrates the relationship
between AR (autoregressive) and MA (moving average) error processes.
Assume the lag operator $L$ defined such that $x_{t-k} = L^k x_t$. A simple OLS model with an MA process is defined as

$y_t = \alpha + \beta_1 x_t + \theta(L) e_t$,
(7-6)

where $\theta(L)$ is a polynomial in the lag operator $L$. A simple OLS model with an AR process is defined as

$y_t = \alpha + \beta_1 x_t + (1/\phi(L)) e_t$,
(7-7)

where $\phi(L)$ is a polynomial in the lag operator $L$. If we assume further that the maximum order in $\theta(L)$ is 1, i.e.,

$\theta(L) = (1 - \theta_1 L)$,
(7-8)

it can be proved that a first-order MA model (MA(1)) is equal to an infinite order AR model if $|\theta_1| < 1$. This can be seen if we note that

$(1 - \theta_1 L) = \dfrac{1}{1 + \phi_1 L + \phi_2 L^2 + \cdots + \phi_k L^k + \cdots}$,
(7-9)

where $\phi_i = (\theta_1)^i$. The importance of equation (7-9) is that it shows that if equation (7-3) is estimated with GLS, which is implicitly an AR error correction technique, more than first-order GLS will be required to remove the serial correlation in the error term. In a transfer function model of the form

$y_t = \dfrac{\omega(L)}{\delta(L)} x_t + \dfrac{\theta(L)}{\phi(L)} e_t$,
(7-10)

only one MA term (7-8) would be needed and $\phi(L) = 1$. An OLS model is a transfer function model that constrains $\delta(L) = \theta(L) = \phi(L) = 1$. GLS allows $\phi(L) \neq 1$.
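Equation (7-9) can be made concrete with a small simulation (a sketch under my own assumptions, not the B34S run that follows): an MA(1) error is only gradually whitened as the order of a fitted AR approximation grows.

import numpy as np

rng = np.random.default_rng(0)
T = 5000
u = rng.standard_normal(T + 1)
e = u[1:] - 0.95 * u[:-1]            # MA(1) error with theta1 = .95

# Fit AR(p) to e by least squares for increasing p.  The fitted AR
# coefficients are negative and the residual DW improves only slowly
# with p, consistent with the infinite AR representation in (7-9).
for p in (1, 2, 3, 4):
    X = np.column_stack([e[p - 1 - j: T - 1 - j] for j in range(p)])
    yv = e[p:]
    phi = np.linalg.lstsq(X, yv, rcond=None)[0]
    r = yv - X @ phi
    dw = np.sum(np.diff(r) ** 2) / np.sum(r ** 2)
    print(p, np.round(phi, 3), "DW", round(dw, 3))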
The means of the data generated in accordance with equations (7-1) - (7-3) and OLS estimation
of a number of models are given next.
Variable     # Cases   Mean                Std Deviation   Variance        Maximum        Minimum
X1       1   249       0.1508079470        1.047903751     1.098102271     3.422285173    -2.584990017
X2       2   249       1.116435259         1.532920056     2.349843898     11.71203580     0.2973256533E-04
X3       3   249       0.1609627505        1.054029274     1.110977710     3.422285173    -2.584990017
Y        4   249       0.1975673548        2.283588868     5.214778119     5.013192154    -9.876622239
YNLIN    5   249       0.6879715443E-01    2.085180439     4.347977464     4.117746229    -9.568155797
YMA      6   249       0.1594119944        2.399542772     5.757805512     6.108808152    -10.57614307
E        7   249       0.3442446593E-01    1.059199304     1.121903166     2.876329588    -3.278246463
CONSTANT 8   249       1.000000000         0.000000000     0.000000000     1.000000000     1.000000000

Number of observations in data file    249
Current missing variable code          1.000000000000000E+31
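Before turning to the B34S regression output, the omitted-variable logic can be previewed with a hedged Python sketch that regenerates data in the spirit of (7-1) (my own seed and sample size; the numbers will not match the B34S run exactly):

import numpy as np

rng = np.random.default_rng(42)
n = 249
z  = rng.standard_normal(n + 1)
x1 = z[1:]
x3 = z[:-1]                         # x3_t = x1_{t-1}: uncorrelated with x1_t
x2 = x1 ** 2                        # x2 = x1**2: related to x1
e  = rng.standard_normal(n)
y  = 1.0 + x1 - x2 + 0.8 * x3 + e   # equation (7-1)

def ols(yv, *cols):
    X = np.column_stack([np.ones(len(yv)), *cols])
    return np.linalg.lstsq(X, yv, rcond=None)[0]

print(ols(y, x1, x2))   # x3 omitted: x1 and x2 stay near 1 and -1
print(ols(y, x1))       # x2 omitted: constant and x1 drift from 1.0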
The below listed output shows that the coefficients for x1 and x2 are close to their population values of 1.0 and -1.0 even though x3 is missing from the model. This is because the omitted variable x3 is not correlated with any included variable.
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 638

OLS Estimation
Dependent variable                    Y
Adjusted R**2                         0.6416762467329375
Standard Error of Estimate            1.366959717042574
Sum of Squared Residuals              459.6704015322101
Model Sum of Squares                  833.5945719535468
Total Sum of Squares                  1293.264973485757
F( 2,   246)                          223.0557634525042
F Significance                        1.000000000000000
1/Condition of XPX                    0.1334523794627398
Number of Observations                249
Durbin-Watson                         1.889314291485050

Variable          Coefficient     Std. Error         t
X1       { 0}     1.1228155       0.83599150E-01     13.430944
X2       { 0}    -1.0266583       0.57148357E-01    -17.964792
CONSTANT { 0}     1.1744354       0.10731666         10.943645
Since the omitted variable x2 is related to the included variable x1, the estimated coefficient for
x1 is biased.
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 508

OLS Estimation
Dependent variable                    Y
Adjusted R**2                         0.1749360075645998
Standard Error of Estimate            2.074253035297188
Sum of Squared Residuals              1062.723836646581
Model Sum of Squares                  230.5411368391758
Total Sum of Squares                  1293.264973485757
F( 1,   247)                          53.58274542797675
F Significance                        0.9999999999965166
1/Condition of XPX                    0.7121866785207466
Number of Observations                249
Durbin-Watson                         1.944404933534759

Variable          Coefficient        Std. Error    t
X1       { 0}     0.92008294         0.12569399    7.3200236
CONSTANT { 0}     0.58811535E-01     0.13281015    0.44282410
The model for ynlin does not contain x3. Here the omission of x2 shows up as a bias on the included variable x1. The fact that it appears highly significant (t=7.55) may fool the researcher. The task ahead is to investigate model specification in a systematic manner using simple tests.
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 508

OLS Estimation
Dependent variable                    YNLIN
Adjusted R**2                         0.1841644277111127
Standard Error of Estimate            1.883410386043661
Sum of Squared Residuals              876.1669665175114
Model Sum of Squares                  202.1314444376089
Total Sum of Squares                  1078.298410955120
F( 1,   247)                          56.98282254868781
F Significance                        0.9999999999991491
1/Condition of XPX                    0.7121866785207466
Number of Observations                249
Durbin-Watson                         1.922026461982424

Variable          Coefficient         Std. Error     t
X1       { 0}     0.86152861          0.11412945     7.5486967
CONSTANT { 0}    -0.61128207E-01      0.12059089    -0.50690568
Before beginning the analysis, note that the correct model for ynlin shows coefficients close to their population values.
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 638

OLS Estimation
Dependent variable                    YNLIN
Adjusted R**2                         0.7410517802311660
Standard Error of Estimate            1.061084833449131
Sum of Squared Residuals              276.9716518488394
Model Sum of Squares                  801.3267591062809
Total Sum of Squares                  1078.298410955120
F( 2,   246)                          355.8602142571058
F Significance                        1.000000000000000
1/Condition of XPX                    0.1334523794627398
Number of Observations                249
Durbin-Watson                         1.933912998767355

Variable          Coefficient     Std. Error         t
X1       { 0}     1.0636116       0.64892761E-01     16.390297
X2       { 0}    -1.0233690       0.44360674E-01    -23.069283
CONSTANT { 0}     1.0509212       0.83303169E-01     12.615622
The model for yma shows negative serial correlation (DW=2.872) even though all variables are in
the model.
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 771

OLS Estimation
Dependent variable                    YMA
Adjusted R**2                         0.6413612625787487
Standard Error of Estimate            1.437001078383063
Sum of Squared Residuals              505.9181643221510
Model Sum of Squares                  922.0176027460155
Total Sum of Squares                  1427.935767068167
F( 3,   245)                          148.8345537566244
F Significance                        1.000000000000000
1/Condition of XPX                    0.1270036486757258
Number of Observations                249
Durbin-Watson                         2.871725884170242

Variable          Coefficient     Std. Error         t
X1       { 0}     1.0907591       0.88117138E-01     12.378513
X2       { 0}    -0.94687924      0.60077630E-01    -15.760929
X3       { 0}     0.75276712      0.86803876E-01     8.6720450
CONSTANT { 0}     0.93087875      0.11360868         8.1937292
GLS will be attempted
Problem Number                                  1
Subproblem Number                               1
F to enter                                      9.999999776482582E-03
F to remove                                     4.999999888241291E-03
Tolerance (1.-R**2) for including a variable    1.000000000000000E-05
Maximum Number of Variables Allowed             4
Internal Number of dependent variable           6
Dependent Variable                              YMA
Standard Error of Y                             2.399542771523700
Degrees of Freedom                              248

.............
Step Number 4   Variable Entering 8
Multiple R         0.803554
Std Error of Y.X   1.43700
R Square           0.645700

Analysis of Variance for reduction in SS due to variable entering
Source              DF     SS        MS        F         F Sig.
Due Regression        3    922.02    307.34    148.83    1.000000
Dev. from Reg.      245    505.92    2.0650
Total               248    1427.9    5.7578

Multiple Regression Equation
Variable          Coefficient    Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
YMA =
X1        X- 1     1.090759      0.8811714E-01    12.38     1.00000   0.6203    1.032
X2        X- 2    -0.9468792     0.6007763E-01   -15.76     1.00000  -0.7095   -6.631
X3        X- 3     0.7527671     0.8680388E-01    8.672     1.00000   0.4846    0.7601
CONSTANT  X- 8     0.9308788     0.1136087        8.194     1.00000

Adjusted R Square                           0.6413612625787489
-2 * ln(Maximum of Likelihood Function)     883.1529747953530
Akaike Information Criterion (AIC)          893.1529747953530
Scwartz Information Criterion (SIC)         910.7402392776766
Akaike (1970) Finite Prediction Error       2.098144341832704
Generalized Cross Validation                2.098685929466314
Hannan & Quinn (1979) HQ                    2.146406058401069
Shibata (1981)                              2.097078566971383
Rice (1984)                                 2.099245495112659
Residual Variance                           2.064972099274085

Order of entrance (or deletion) of the variables = 1 2 3 8
Estimate of Computational Error in Coefficients:
  1 -0.255313E-15   2 -0.119643E-15   3 0.190567E-16   4 -0.196339E-16

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.77646301E-02
Row 2  Variable X- 2  X2          -0.71499452E-03   0.36093216E-02
Row 3  Variable X- 3  X3          -0.55762009E-03   0.30981359E-04   0.75349130E-02
Row 4  Variable X- 8  CONSTANT    -0.28296676E-03  -0.39267339E-02  -0.11633355E-02   0.12906932E-01

Program terminated. All variables put in.

Residual Statistics for Original data
Von Neumann Ratio 1     2.88331      Durbin-Watson TEST     2.87173
Von Neumann Ratio 2     2.88331
For D.F. 245: t(.9999)= 3.9556, t(.999)= 3.3307, t(.99)= 2.5960, t(.95)= 1.9697, t(.90)= 1.6511, t(.80)= 1.2850
Skewness test (Alpha 3) = 0.113402,  Peakedness test (Alpha 4) = 2.84927
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 3.731   Chi Prob= 0.1195   F(8, 245)= 0.466365   F Prob = 0.120880
Small sample normality test - Large grid
  Chi= 1.703   Chi Prob= 0.3637   F(3, 245)= 0.567604   F Prob = 0.363150
Autocorrelation function of residuals
  1  -0.438023    2  -0.874684E-01    3  0.759865E-01    4  -0.113471    5  0.809694E-01
Heteroskedasticity at 0.6983 level:  F( 83, 83) = 1.121,  1/F = 0.8919
Sum of squared residuals      505.9181643221511
Mean squared residual         2.031799856715466
Note the ACF values of -.438, -.087 for the OLS model. GLS is now attempted:
Doing Gen. Least Squares using residual Dif. Eq. of order 1
Lag Coefficients:   1  -0.436471
Standard Error of Y    1.806380831086763
Degrees of Freedom     247

.............
Step Number 4   Variable Entering 3
Multiple R         0.868408
Std Error of Y.X   0.901185
R Square           0.754132

Analysis of Variance for reduction in SS due to variable entering
Source              DF     SS        MS         F         F Sig.
Due Regression        3    607.80    202.60     249.47    1.000000
Dev. from Reg.      244    198.16    0.81213
Total               247    805.96    3.2630

Multiple Regression Equation
Variable          Coefficient    Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
YMA =
X1        X- 1     1.053314      0.7820969E-01    13.47     1.00000   0.6530    0.9965
X2        X- 2    -0.9569885     0.5047854E-01   -18.96     1.00000  -0.7718   -6.702
X3        X- 3     0.7796353     0.7773981E-01    10.03     1.00000   0.5403    0.7872
CONSTANT  X- 8     0.9431791     0.8032375E-01    11.74     1.00000

Adjusted R Square                           0.7511089134685829
-2 * ln(Maximum of Likelihood Function)     648.1547627678506
Akaike Information Criterion (AIC)          658.1547627678506
Scwartz Information Criterion (SIC)         675.7219064986755
Akaike (1970) Finite Prediction Error       0.8252334731172152
Generalized Cross Validation                0.8254482099043910
Hannan & Quinn (1979) HQ                    0.8442731002246441
Shibata (1981)                              0.8248109265359979
Rice (1984)                                 0.8256701045844729
Residual Variance                           0.8121345290994816

Order of entrance (or deletion) of the variables = 1 2 8 3

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.61167560E-02
Row 2  Variable X- 2  X2          -0.42933638E-03   0.25480832E-02
Row 3  Variable X- 3  X3          -0.26031711E-02  -0.11668626E-03   0.60434773E-02
Row 4  Variable X- 8  CONSTANT    -0.43170371E-04  -0.27722114E-02  -0.41214721E-03   0.64519047E-02

Program terminated. All variables put in.

Residual Statistics for Smoothed Original data
For GLS Y and Y estimate scaled by  1.436471170926504
Von Neumann Ratio 1     2.30772      Durbin-Watson TEST     2.29841
Von Neumann Ratio 2     2.30772
For D.F. 244: t(.9999)= 3.9559, t(.999)= 3.3308, t(.99)= 2.5961, t(.95)= 1.9697, t(.90)= 1.6511, t(.80)= 1.2850
Skewness test (Alpha 3) = 0.110831,  Peakedness test (Alpha 4) = 2.81493
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 8.935   Chi Prob= 0.6522   F(8, 244)= 1.11694   F Prob = 0.647735
Small sample normality test - Large grid
  Chi= 3.089   Chi Prob= 0.6219   F(3, 244)= 1.02957   F Prob = 0.619884
Autocorrelation function of residuals
  1  -0.151211    2  -0.321877    3  -0.413296E-03    4  -0.787568E-01
Heteroskedasticity at 0.5505 level:  F( 83, 83) = 1.028,  1/F = 0.9724
Sum of squared residuals      198.1608251002743
Mean squared residual         0.7990355850817513
The DW, now 2.298, is close to 2.0. The ACF shows a spike at lag 2, so GLS of order 2 is attempted.
Doing Gen. Least Squares using residual Dif. Eq. of order 2
Lag Coefficients:   1  -0.587393    2  -0.343873
Standard Error of Y    1.488647529013689
Degrees of Freedom     246

.............
Step Number 4   Variable Entering 3
Multiple R         0.906985
Std Error of Y.X   0.630820
R Square           0.822623

Analysis of Variance for reduction in SS due to variable entering
Source              DF     SS        MS         F         F Sig.
Due Regression        3    448.46    149.49     375.65    1.000000
Dev. from Reg.      243    96.698    0.39793
Total               246    545.15    2.2161

Multiple Regression Equation
Variable          Coefficient    Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
YMA =
X1        X- 1     1.064045      0.7171594E-01    14.84     1.00000   0.6894    1.007
X2        X- 2    -0.9423114     0.4273665E-01   -22.05     1.00000  -0.8165   -6.599
X3        X- 3     0.7593212     0.7132141E-01    10.65     1.00000   0.5640    0.7667
CONSTANT  X- 8     0.9305895     0.6243077E-01    14.91     1.00000

Adjusted R Square                           0.8204327863250724
-2 * ln(Maximum of Likelihood Function)     469.3198834070857
Akaike Information Criterion (AIC)          479.3198834070857
Scwartz Information Criterion (SIC)         496.8668250902256
Akaike (1970) Finite Prediction Error       0.4043780501040350
Generalized Cross Validation                0.4044841286507808
Hannan & Quinn (1979) HQ                    0.4137361553163259
Shibata (1981)                              0.4041693287529482
Rice (1984)                                 0.4045937579438610
Residual Variance                           0.3979337783892296

Order of entrance (or deletion) of the variables = 1 2 8 3

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.51431761E-02
Row 2  Variable X- 2  X2          -0.30211664E-03   0.18264214E-02
Row 3  Variable X- 3  X3          -0.29466037E-02  -0.27120489E-04   0.50867433E-02
Row 4  Variable X- 8  CONSTANT    -0.60309115E-05  -0.19976715E-02  -0.29290972E-03   0.38976015E-02

Program terminated. All variables put in.

Residual Statistics for Smoothed Original data
For GLS Y and Y estimate scaled by  1.931266064605667
Von Neumann Ratio 1     2.12282      Durbin-Watson TEST     2.11423
Von Neumann Ratio 2     2.12282
For D.F. 243: t(.9999)= 3.9561, t(.999)= 3.3310, t(.99)= 2.5962, t(.95)= 1.9698, t(.90)= 1.6511, t(.80)= 1.2850
Skewness test (Alpha 3) = 0.726358E-01,  Peakedness test (Alpha 4) = 2.77604
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 4.781   Chi Prob= 0.2193   F(8, 243)= 0.597672   F Prob = 0.220557
Small sample normality test - Large grid
  Chi= 2.696   Chi Prob= 0.5592   F(3, 243)= 0.898785   F Prob = 0.557561
Autocorrelation function of residuals
  1  -0.645124E-01    2  -0.148837    3  -0.240427    4  -0.992257E-01    5  0.728148E-01
Heteroskedasticity at 0.7505 level:  F( 82, 82) = 1.162,  1/F = 0.8609
Sum of squared residuals      96.69790814858062
Mean squared residual         0.3914895066744155
The DW is now 2.114. GLS of orders 3 and 4 was attempted with little gain in the DW, but with a slow decline in the GLS $\rho$ values, which would be expected in view of (7-9).
Doing Gen. Least Squares using residual Dif. Eq. of order 3
Lag Coefficients:   1  -0.648299    2  -0.447794    3  -0.176326
Standard Error of Y    1.356570786867690
Degrees of Freedom     245

.............
Step Number 4   Variable Entering 3
Multiple R         0.923144
Std Error of Y.X   0.524763
R Square           0.852194

Analysis of Variance for reduction in SS due to variable entering
Source              DF     SS        MS         F         F Sig.
Due Regression        3    384.23    128.08     465.10    1.000000
Dev. from Reg.      242    66.641    0.27538
Total               245    450.87    1.8403

Multiple Regression Equation
Variable          Coefficient    Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
YMA =
X1        X- 1     1.027797      0.7160766E-01    14.35     1.00000   0.6781    0.9723
X2        X- 2    -0.9485962     0.3945060E-01   -24.05     1.00000  -0.8396   -6.643
X3        X- 3     0.7885052     0.7105830E-01    11.10     1.00000   0.5807    0.7962
CONSTANT  X- 8     0.9428557     0.5552124E-01    16.98     1.00000

Adjusted R Square                           0.8503620346592192
-2 * ln(Maximum of Likelihood Function)     376.8392476702688
Akaike Information Criterion (AIC)          386.8392476702688
Scwartz Information Criterion (SIC)         404.3659053499306
Akaike (1970) Finite Prediction Error       0.2798540632805742
Generalized Cross Validation                0.2799280742725161
Hannan & Quinn (1979) HQ                    0.2863502020183029
Shibata (1981)                              0.2797084481582168
Rice (1984)                                 0.2800045730288931
Residual Variance                           0.2753763982680850

Order of entrance (or deletion) of the variables = 1 2 8 3

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.51276574E-02
Row 2  Variable X- 2  X2          -0.23263544E-03   0.15563500E-02
Row 3  Variable X- 3  X3          -0.33414602E-02   0.11967882E-04   0.50492818E-02
Row 4  Variable X- 8  CONSTANT    -0.35430730E-04  -0.17098083E-02  -0.26379732E-03   0.30826085E-02

Program terminated. All variables put in.

Residual Statistics for Smoothed Original data
For GLS Y and Y estimate scaled by  2.272418943212623
Von Neumann Ratio 1     2.12109      Durbin-Watson TEST     2.11247
Von Neumann Ratio 2     2.12109
For D.F. 242: t(.9999)= 3.9564, t(.999)= 3.3312, t(.99)= 2.5963, t(.95)= 1.9698, t(.90)= 1.6512, t(.80)= 1.2851
Skewness test (Alpha 3) = 0.628738E-01,  Peakedness test (Alpha 4) = 2.66896
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 5.707   Chi Prob= 0.3200   F(8, 242)= 0.713415   F Prob = 0.320389
Small sample normality test - Large grid
  Chi= 1.683   Chi Prob= 0.3593   F(3, 242)= 0.560976   F Prob = 0.358735
Autocorrelation function of residuals
  1  -0.577779E-01    2  -0.116476    3  -0.153224    4  -0.211519    5  0.509156E-01    6  0.347658E-02
Heteroskedasticity at 0.7786 level:  F( 82, 82) = 1.185,  1/F = 0.8435
Sum of squared residuals      66.64108838087667
Mean squared residual         0.2708987332555962

Doing Gen. Least Squares using residual Dif. Eq. of order 4
Lag Coefficients:   1  -0.694654    2  -0.564775    3  -0.345213    4  -0.259654
Standard Error of Y    1.203012334750675
Degrees of Freedom     244

.............
Step Number 4   Variable Entering 3
Multiple R         0.943134
Std Error of Y.X   0.402377
R Square           0.889502

Analysis of Variance for reduction in SS due to variable entering
Source              DF     SS        MS         F         F Sig.
Due Regression        3    314.11    104.70     646.68    1.000000
Dev. from Reg.      241    39.020    0.16191
Total               244    353.13    1.4472

Multiple Regression Equation
Variable          Coefficient    Std. Error       T Val.    T Sig.    P. Cor.   Elasticity
YMA =
X1        X- 1     1.036793      0.6936194E-01    14.95     1.00000   0.6936    0.9808
X2        X- 2    -0.9666933     0.3536745E-01   -27.33     1.00000  -0.8695   -6.770
X3        X- 3     0.7804025     0.6903331E-01    11.30     1.00000   0.5887    0.7880
CONSTANT  X- 8     0.9634053     0.4764657E-01    20.22     1.00000

Adjusted R Square                           0.8881265220663394
-2 * ln(Maximum of Likelihood Function)     245.1681832743955
Akaike Information Criterion (AIC)          255.1681832743955
Scwartz Information Criterion (SIC)         272.6744743271191
Akaike (1970) Finite Prediction Error       0.1645510140428232
Generalized Cross Validation                0.1645948877321811
Hannan & Quinn (1979) HQ                    0.1683823670820125
Shibata (1981)                              0.1644646992743718
Rice (1984)                                 0.1646402423899563
Residual Variance                           0.1619076242590027

Order of entrance (or deletion) of the variables = 1 2 8 3

Covariance Matrix of Regression Coefficients
Row 1  Variable X- 1  X1           0.48110784E-02
Row 2  Variable X- 2  X2          -0.16111651E-03   0.12508569E-02
Row 3  Variable X- 3  X3          -0.35012367E-02   0.74455260E-04   0.47655984E-02
Row 4  Variable X- 8  CONSTANT    -0.40532334E-04  -0.13901832E-02  -0.26943072E-03   0.22701954E-02

Program terminated. All variables put in.

Residual Statistics for Smoothed Original data
For GLS Y and Y estimate scaled by  2.864295541231388
Von Neumann Ratio 1     2.11504      Durbin-Watson TEST     2.10640
Von Neumann Ratio 2     2.11504
For D.F. 241: t(.9999)= 3.9567, t(.999)= 3.3314, t(.99)= 2.5964, t(.95)= 1.9699, t(.90)= 1.6512, t(.80)= 1.2851
Skewness test (Alpha 3) = -.879541E-01,  Peakedness test (Alpha 4) = 2.53942
[Extended and small-sample grid normality test cell tables omitted.]
Extended grid normality test - Prob of rejecting normality assumption
  Chi= 5.408   Chi Prob= 0.2868   F(8, 241)= 0.676020   F Prob = 0.287518
Small sample normality test - Large grid
  Chi= 0.9796   Chi Prob= 0.1938   F(3, 241)= 0.326531   F Prob = 0.193819
Autocorrelation function of residuals
  1  -0.580007E-01    2  -0.654729E-01    3  -0.913505E-01    4  -0.110200    5  -0.165341    6  -0.324259E-01    7  -0.258273E-01
Heteroskedasticity at 0.8097 level:  F( 82, 82) = 1.215,  1/F = 0.8233
Sum of squared residuals      39.01973744641884
Mean squared residual         0.1592642344751790
Gen. Least Squares ended by max. order reached.
The classic MA residual ACF is shown in Figure 7.1. There is one ACF spike but the PACF
suggests a longer AR model which was shown to be captured by the GLS model above.
[Figure 7.1 here: the top panel, "Model yma=f(x1, x2, x3)", plots the residual series against the observation number; the lower panels show the ACF and PACF of the series for lags 1-12.]

Figure 7.1 Analysis of residuals of the YMA model.
Remark: A low-order autoregressive structure in the error term is usually easily captured by a GLS model. However, a simple MA residual structure, such as might occur in an overshooting situation, often requires a high-order GLS model to clean the residual. The problem is that with maximum likelihood GLS the autoregressive parameters are often hard to estimate because they are related, as is seen in (7-9).

Recall that the models $ynlin = f(x1)$, $y = f(x1)$ and $y = f(x1, x3)$ produced biased coefficients for the constant and x1, and for the constant, x1 and x3, respectively. How might one test such models for an excluded variable (x2) that is related to an included variable (x1)? One way to proceed is to sort the data with respect to one variable (x1 in the example to be shown) and inspect the Durbin-Watson statistic. Nonlinearity will be reflected in a low DW. This approach uses time series methods on cross section data. A short sketch of the idea precedes the B34S results, which are shown next.
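A minimal sketch of this sort-and-test strategy in Python (my own illustration; the function and argument names are assumptions):

import numpy as np

def dw(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def sorted_dw(y, X, sort_col):
    # Sort the cross section by one suspect regressor, refit by OLS,
    # and inspect the Durbin-Watson statistic of the sorted residuals.
    # A low DW flags neglected nonlinearity in the sorted variable.
    order = np.argsort(X[:, sort_col])
    Xs, ys = X[order], y[order]
    b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    return dw(ys - Xs @ b)

Applied to the generated data with the sample sorted on x1, this should reproduce the low DW values reported below.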
REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 508

OLS Estimation (x1-sorted data)
Dependent variable                    Y
Adjusted R**2                         0.1749360075645995
Standard Error of Estimate            2.074253035297189
Sum of Squared Residuals              1062.723836646582
Model Sum of Squares                  230.5411368391754
Total Sum of Squares                  1293.264973485757
F( 1,   247)                          53.58274542797663
F Significance                        0.9999999999965213
1/Condition of XPX                    0.7121866785207469
Number of Observations                249
Durbin-Watson                         0.9304383379985703

Variable          Coefficient        Std. Error    t
X1       { 0}     0.92008294         0.12569399    7.3200236
CONSTANT { 0}     0.58811535E-01     0.13281015    0.44282410

REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 638

OLS Estimation (x1-sorted data)
Dependent variable                    Y
Adjusted R**2                         0.3171457852038800
Standard Error of Estimate            1.887043512405973
Sum of Squared Residuals              875.9895715575144
Model Sum of Squares                  417.2754019282424
Total Sum of Squares                  1293.264973485757
F( 2,   246)                          58.59073681198955
F Significance                        1.000000000000000
1/Condition of XPX                    0.6446864807753573
Number of Observations                249
Durbin-Watson                         0.7352270766892632

Variable          Coefficient         Std. Error     t
X1       { 0}     0.85966646          0.11465356     7.4979481
X3       { 0}     0.82544163          0.11398725     7.2415260
CONSTANT { 0}    -0.64942535E-01      0.12202611    -0.53220196

REG Command. Version 1 February 1997
Real*8 space available 8000000, Real*8 space used 508

OLS Estimation (x1-sorted data)
Dependent variable                    YNLIN
Adjusted R**2                         0.1841644277111131
Standard Error of Estimate            1.883410386043660
Sum of Squared Residuals              876.1669665175107
Model Sum of Squares                  202.1314444376096
Total Sum of Squares                  1078.298410955120
F( 1,   247)                          56.98282254868798
F Significance                        0.9999999999991508
1/Condition of XPX                    0.7121866785207469
Number of Observations                249
Durbin-Watson                         0.7345600764854415

Variable          Coefficient         Std. Error     t
X1       { 0}     0.86152861          0.11412945     7.5486967
CONSTANT { 0}    -0.61128207E-01      0.12059089    -0.50690568
The Durbin-Watson tests for the three models were .9304, .7352 and .7346, respectively. The above results show how the Durbin-Watson test, which was developed for time series models, can be used effectively in cross section models to test for equation misspecification. The results suggest that if a nonlinearity is suspected, the data should be sorted against each suspected variable in turn and the recursive coefficients analyzed. The recursively estimated coefficients for x1 and x3 for the model $y = f(x1, x3)$, when the data were sorted against x1, are displayed in Figure 7.2. The omitted variable bias is clearly shown by the movement in the x1 coefficient as higher and higher values of x1 are added to the sample.
[Figure 7.2 here: plot titled "Omitted Variable Bias x1 and x3 coef", showing the recursively estimated X1_COEF and X3_COEF series against the observation number.]

Figure 7.2 Recursively estimated X1 and X3 coefficients for X1 Sorted Data
[Figure 7.3 here: "Plot of Cusum Test", showing the CUSUM T statistic against the observation number with 1%, 5% and 10% upper (U1, U5, U10) and lower (L1, L5, L10) significance bounds.]

Figure 7.3 CUSUM test on Estimated with Sorted Data
Figures 7.3-7.5 show, respectively, the CUSUM, CUSUMSQ and Quandt likelihood ratio tests. Further detail on these tests is contained in Stokes, Specifying and Diagnostically Testing Econometric Models (1997; see also third edition drafts), Chapter 9. Here we only sketch their use.
Brown, Durbin and Evans (1975) proposed the CUSUM test as a summary measure of whether there was parameter stability. The test consists of plotting the quantity

$\Gamma_i = \sum_{j=K+1}^{i} w_j / \hat{\sigma}$,
(7-11)

where $w_j$ is the normalized recursive residual. The CUSUM test is particularly good at detecting systematic departure of the $\beta_i$ coefficients, while the CUSUMSQ test is useful when the departure of the $\beta_i$ coefficients from constancy is haphazard rather than systematic. The CUSUMSQ test involves a plot of $\Gamma_i^*$ defined as

$\Gamma_i^* = \sum_{j=K+1}^{i} w_j^2 \Big/ \sum_{j=K+1}^{T} w_j^2$.
(7-12)

Approximate bounds for $\Gamma_i$ and $\Gamma_i^*$ are given in Brown, Durbin and Evans (1975). Assuming a rectangular plot, the upper-right-hand value is 1.0 and the lower-left-hand value is 0.0. A regression with stable coefficients $\beta_i$ will generate a $\Gamma_i^*$ plot up the diagonal. If the plot lies above the diagonal, the implication is that the regression is tracking poorly in the early subsample in comparison with the total sample. A plot below the diagonal suggests the reverse, namely, that the regression is tracking better in the early subsample than in the complete sample.
The Quandt log-likelihood ratio test involves the calculation of $\lambda_i$, defined as

$\lambda_i = .5\, i \ln(\hat{\sigma}_1^2) + .5\,(T - i) \ln(\hat{\sigma}_2^2) - .5\, T \ln(\hat{\sigma}^2)$,
(7-13)

where $\hat{\sigma}_1^2$, $\hat{\sigma}_2^2$ and $\hat{\sigma}^2$ are the variances of regressions fitted to the first $i$ observations, the last $T - i$ observations and the whole $T$ observations, respectively. The minimum of the plot of $\lambda_i$ can be used to select the "break" in the sample. Although no specific tests are available for $\lambda_i$, the information suggested by the plot can be tested with the multiperiod Chow test, which is discussed next.

If structural change is suspected, a homogeneity test (Chow) of equal segments $n$ can be performed. Given that $S(r, i)$ is the residual sum of squares from a regression calculated from observations $t = r$ to $i$, the appropriate statistic is distributed as $F(kp - k, T - kp)$ and defined as

$(F_1 / F_2)\,((T - kp) / (kp - k))$,
(7-14)

where

$F_1 = S(1, T) - (S(1, n) + S(n+1, 2n) + \cdots + S(pn - 2n + 1, pn - n) + S(pn - n + 1, T))$
(7-15)

and

$F_2 = S(1, n) + S(n+1, 2n) + \cdots + S(pn - n + 1, T)$.
(7-16)
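A hedged Python sketch of (7-11) and (7-12) (my own direct implementation of the recursive residuals, not the B34S RR code used above):

import numpy as np

def cusum_stats(y, X):
    # Recursive residuals w_j for j = K+1,...,T, then the CUSUM of
    # (7-11) and the CUSUMSQ of (7-12).
    T, K = X.shape
    w = np.empty(T - K)
    for j in range(K, T):
        Xj, yj = X[:j], y[:j]                  # data through observation j
        b = np.linalg.lstsq(Xj, yj, rcond=None)[0]
        xj = X[j]
        v = 1.0 + xj @ np.linalg.inv(Xj.T @ Xj) @ xj
        w[j - K] = (y[j] - xj @ b) / np.sqrt(v)
    sigma_hat = w.std(ddof=1)
    cusum   = np.cumsum(w) / sigma_hat              # Gamma_i of (7-11)
    cusumsq = np.cumsum(w ** 2) / np.sum(w ** 2)    # Gamma*_i of (7-12)
    return w, cusum, cusumsq

Plotting cusum and cusumsq against the observation index, with the Brown, Durbin and Evans bounds added, reproduces plots like Figures 7.3 and 7.4.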
[Figure 7.4 here: "Plot of Cusum sq Test", showing the CUSUMSQ statistic (0 to 1) against the observation number.]

Figure 7.4 CUMSQ Test of Model y model estimated with sorted data.
[Figure 7.5 here: "Plot of Quandt Likelihood Ratio", showing the $\lambda_i$ values against the observation number.]

Figure 7.5 Quandt Likelihood Ratio tests of y model estimated with sorted data.
Remark: If an inadvertently excluded variable is correlated with an included variable, substantial bias in the estimated coefficients can occur. In cross section analysis, if the data are sorted by the included variables, what are usually thought of as time series techniques can be used to determine the nature of the problem. For more complex models, "automatic" techniques such as GAM, MARS and ACE can be employed. These are far too complex to discuss in this introductory analysis.
8. Advanced concepts
A problem with simple OLS models is that there may be situations where the estimated
coefficients are biased, or the estimated standard errors are biased. While space precludes a
detailed treatment, some of these problems and their solutions are outlined below.
_________________________________________________________
Table Five
Some Problems and Their Solutions

Problem                                Solution

Y a 0-1 variable                       PROBIT, LOGIT
Y a bounded variable                   TOBIT
X's not independent (i.e., X's not     2SLS, 3SLS, LIML, FIML
  orthogonal to e in the population)
Relationship not linear                Reparameterize model and/or
                                       NLS, MARS, GAM, ACE
Error not random                       GLS, weighted least squares
Coefficients changing from             Recursive residual analysis
  changing population
Time series problems                   ARIMA model, transfer function
                                       model, vector model
Outlier problems                       L1 & MINIMAX estimation
___________________________________________________________
The 0-1 left-hand variable problem arises when there are only two states for Y. For example, if Y is
coded 0 = alive, 1 = dead, then a regression model that predicts more than dead (YHAT > 1) or less
than alive (YHAT < 0) is clearly not using all of the information at hand. While the coefficients of
an OLS model can be interpreted as partial derivatives, in the 0-1 case this interpretation breaks
down. Assume that you have a number of variables x_1, x_2, ..., x_k, and that high values are
associated with a high probability of death before 45 years of age. Clearly, since you cannot be more
than dead, if all variables are high, one additional unit of x_1 will not have the same effect as it
would if all variables were low. For such problems, the appropriate procedure is LOGIT or PROBIT
analysis; a minimal sketch is given below.
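A minimal Stata sketch, assuming a hypothetical 0-1 outcome y and placeholder regressors x1 and x2 (none of these names come from the notes):

logit y x1 x2            // logistic specification for a 0-1 outcome
probit y x1 x2           // probit alternative
margins, dydx(*)         // average marginal effects, the analog of OLS slopes

Unlike OLS, the implied effect of a one-unit change in x1 now depends on the levels of all the regressors, which is exactly the property the dead/alive example requires.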
A left-hand variable can be bounded on the upper or the lower side. Examples of the former
include scores on tests; an example of the latter is money spent on cars. Assume a model where the
score on a test (S) is a function of a number of variables, such as study time (ST), age (A), health (H)
and experience (E). Clearly, one is going to run into diminishing returns regarding study time. If the
number of hours were increased from 200 to 210, the increase in the score would not be the same
as if the hours had been increased from 0 to 10 hours. Such problems require TOBIT procedures,
sketched briefly below. If an OLS model were fit to the above data, the coefficient for the study
time variable would understate the effect of study time on exam scores for relatively low total study
hours and overstate the effect for relatively high total study hours.
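A minimal Stata sketch, assuming the score is censored from above at 100 (the bound and the variable names score, st, age, health and exper are illustrative, not from the notes):

tobit score st age health exper, ul(100)   // upper-censored test scores
* For spending data bounded below at zero, ll(0) would be used instead.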
An important assumption of OLS is that the right-hand variables are independent. By this,
we mean that if y = f(x_1, x_2, ..., x_k), where the x_i are variables, then x_1 can be changed without x_k
changing. On the other hand, if the system is of the form

y_1 = \alpha_1 + \beta_1 x_1 + \beta_2 y_2 + e_1,                    (8-1)

y_2 = \alpha_2 + \beta_3 x_2 + \beta_4 y_1 + e_2,                    (8-2)

then one cannot use \beta_1 as a measure of how y_1 will change for a one-unit change in x_1, since there
will be an effect on y_1 from the change in y_2 in the second equation, which will occur as x_1 changes.
Such problems require two-stage least squares (2SLS) or limited information maximum likelihood
(LIML) estimation. In addition, if the possible relationship between the error terms e_1 and e_2 is
taken into consideration, three-stage least squares (3SLS) or full information maximum
likelihood (FIML) estimation procedures should be used. Beyond the brief sketch below, these more
advanced techniques will not be discussed further here except to say that the appropriate procedures
are available.
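A hedged Stata sketch of 2SLS for equation (8-1), treating y2 as endogenous with x2 from (8-2) serving as its instrument (the variable names simply follow the equations above):

ivregress 2sls y1 x1 (y2 = x2)   // estimate (8-1) by two-stage least squares
estat endogenous                 // test whether y2 is in fact endogenous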
In OLS estimation, there is always the danger that the estimated linear model is being used
to capture a nonlinear process. Over a short data range, a nonlinear process can look like a linear
process. In the preliminary research on how to "kill" live polio virus in order to make a vaccine, a
graph was used to show that increased percentages of the virus were killed as more heat was
applied. A straight line was fit and the appropriate temperature was selected. Much to the surprise
of the researchers, it was later determined that the relationship was not linear; in fact,
proportionately more heat was required the lower the percentage of live polio virus remaining.
Poor statistical methodology resulted in people unexpectedly getting polio from the vaccine.
One way to determine whether the relationship is nonlinear is to put power and interaction terms in
the regression. The problem is that it is easy to exhaust available CPU time and researcher time
before all possibilities have been tested. The recursive residual procedure, which involves starting
from a model estimated on a small sample and recursively re-estimating as observations are added,
provides a way to detect whether there are problems in the initial specification. More detail on this
approach is provided in Stokes (Specifying and Diagnostically Testing Econometric Models, 1997,
Chapter 9). A brief introduction was given in section 7. The essential idea is that if the data set is
sorted against one of the right-hand variables and regressions are run, adding one observation at a
time, a plot or list of the coefficients will indicate whether they are stable over different ranges of the
sorted variable. If other coefficients change, this indicates the need for interaction terms. If the
coefficient of the sorted variable changes, this indicates a nonlinear relationship. If time is the
variable on which the sort is made, changing coefficients suggest that the relationship is shifting
over time. A minimal sketch of this procedure is given below.
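A minimal Stata sketch of the sorted recursive procedure for the y = f(x1, x3) model of section 7; the starting window of 20 observations is an arbitrary assumption:

sort x1                            // sort against the right-hand variable
generate b_x1 = .                  // holds the recursive x1 coefficient
forvalues i = 20/`=_N' {
    quietly regress y x1 x3 in 1/`i'
    quietly replace b_x1 = _b[x1] in `i'
}
generate obs = _n
line b_x1 obs, ytitle("Recursive x1 coefficient")  // drift signals instability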
The OLS model for time series data can be shown to be a special case of the more general
ARIMA, transfer function and vector autoregressive moving average models. A preliminary look
at some of these models and their uses is presented in the paper "The Box-Jenkins Approach-When
Is It a Cost-effective Alternative," which I wrote with Hugh Neuburger in the Columbia Journal
of World Business (Vol. XI, No. 4, Winter 1976). As noted, if we were to write the model as

y_t = \frac{\omega(L)}{\delta(L)} x_t + \frac{\theta(L)}{\phi(L)} e_t                    (8-3)

a more complex lag structure can be modeled than in the simple OLS case. If \omega(L)/\delta(L) = 0, then we
have an ARIMA model and are modeling y_t as a function of past shocks alone. If \theta(L)/\phi(L) = 1, then
we have a rational distributed lag model. If neither restriction holds, we have a transfer function model.
Systems of transfer-function-type models can be estimated using simultaneous transfer function
estimation techniques or vector model estimation procedures. Space limits more comprehensive
discussion of this important and general class of models beyond the brief sketch below.
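A hedged Stata sketch of two special cases of (8-3); the series names y, x and time and the (1,0,1) orders are placeholders chosen only for illustration:

tsset time                       // declare the time index first
arima y, arima(1,0,1)            // pure ARIMA: y_t from its own past shocks
arima y x L.x, arima(1,0,1)      // current and lagged x with ARMA errors,
                                 // a crude transfer-function-style model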
9. Summary
These short notes have attempted to outline the scope of elementary applied statistics.
Students are encouraged to experiment with the sample data sets to perform further analysis.
* Editorial assistance was provided by Diana A. Stokes. Important suggestions were made by
Evelyn Lehrer on a prior draft. I am responsible for any errors or omissions.