UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Simultaneous Equations Models: Indirect Least Squares (ILS) and 2-Stage Least Squares (2SLS)
Simultaneous Equations Models (SEMs) are models in which two or more equations share two or more
variables that link the equations together in a system. In such models, the variables that appear in two or more
equations are said to be “mutually dependent,” or “jointly-determined,” or they are said to have a “simultaneous”
or “two-way” relationship. Such variables affect one another, causing “feedback” relationships between the
equations in the system; that is, if something in one equation changes, the change causes a change in another
equation which, in turn, “feeds back” to cause a further change in the first equation. Such situations are
sometimes referred to as “chicken-and-egg” situations, because it is difficult to determine which came first, a
change in one equation, or a change in a second equation, if the two equations are affecting each other.
The feedback effects in SEMs either spiral out of control or reach an equilibrium of some sort. Usually, they
reach an equilibrium (otherwise, our world would be much more explosive than we observe). Many of the
standard models of Economics and Finance are SEMs that reach equilibrium (well, usually they reach
equilibrium, under typical conditions). Two well-known examples are the Demand and Supply model of
Microeconomics, and the IS-LM model of Macroeconomics.
The Problem with Simultaneous Equation Models—Simultaneous Equations Bias
Although very common in Economics and Finance, SEMs face a potential problem when it comes to
Econometric estimation of the parameters in their equations using OLS regression. Recall that one of the
assumptions of OLS regression is that the error term in the regression equation is independent of the X
variables (the right-hand-side variables) in the equation. Well, SEMs typically violate this assumption, with undesirable results,
econometrically-speaking (Can I say, “econometrically-speaking?” Well, I guess I just did.). To see why this is
so, consider as an example the Demand and Supply model from Microeconomics.
Suppose we have data on the quantity Q of a product traded at various locations when the market is in
equilibrium, the market price PQ in each location, average consumer income I in each location, and the price of
materials PM in each location. We want to estimate the parameters (the β's) in the supply and demand equations:
Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is an error term in the Demand equation,
Supply: QS = β3 + β4·PQ + β5·PM + eS, where "eS" is an error term in the Supply equation,
Equilibrium Condition: QD = QS
Notice first that Demand and Supply form an SEM, because together they are two equations that share two variables
in common, Q and PQ. Now, suppose that something outside the model changes, and this change affects Demand.
This would appear in the model as a change in eD, the error term in the Demand equation. Let’s say that there is
an increase in eD. All else held constant in the Demand equation, this would result in an increase in QD. This, in
turn, would result in an increase in QS in equilibrium, because, in equilibrium, QD = QS. Next, in the Supply
equation, if QS increases, and PM remains constant, and eS is a random error unaffected by the change in QS, then
the only way that the equality in the Supply equation can be maintained is if PQ increases. Now, if PQ in the
Supply equation increases, then PQ in the Demand equation must also increase, because it is the same PQ in both
equations. It is at this point that the Econometric problem occurs: a change in the error term in the Demand
equation, eD, affected an “X” variable in the Demand equation, namely, the variable PQ. This violates the
assumption of the OLS method that the error term in the regression equation must be independent of the X
variables in the equation.
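The same chain of reasoning can be written as a short derivation. Setting QD = QS and collecting the PQ terms gives the market-clearing price. Then, assuming (an added assumption, not stated above) that eD and eS are uncorrelated with each other and with I and PM, the covariance between PQ and the Demand error term follows directly:

$$(\beta_1 - \beta_4)\,\mathit{PQ} = (\beta_3 - \beta_0) + \beta_5\,\mathit{PM} - \beta_2\,I + e_S - e_D$$

$$\operatorname{Cov}(\mathit{PQ},\, e_D) = \frac{-\operatorname{Var}(e_D)}{\beta_1 - \beta_4} = \frac{\operatorname{Var}(e_D)}{\beta_4 - \beta_1}$$

With a downward-sloping demand curve (β1 < 0) and an upward-sloping supply curve (β4 > 0), this covariance is positive rather than zero, so PQ is correlated with the error term of the very equation in which it appears as an X variable.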
As a second example, consider the IS-LM model from Macroeconomics. Suppose we have data on national
output Y, money supply M, and the interest rate r, and we want to estimate the parameters in the IS and LM
equations below:
IS Equation: YIS = β0 + β1·r + eIS, where "eIS" is an error term in the IS equation,
LM Equation: YLM = β2 + β3·M + β4·r + eLM, where "eLM" is an error term in the LM equation,
Equilibrium Condition: YIS = YLM
Notice first that IS and LM form an SEM, because together they are two equations that share two variables in
common, Y and r. Now, suppose that something outside the model changes, and this change affects the IS
equation. This would appear in the model as a change in eIS, the error term in the IS equation. Let’s say that there
is an increase in eIS. All else held constant in the IS equation, this would result in an increase in YIS. This, in turn,
would result in an increase in YLM in equilibrium, because, in equilibrium, YIS = YLM. Next, in the LM equation,
if YLM increases, and M remains constant, and eLM is a random error unaffected by the change in YLM, then the
only way that the equality in the LM equation can be maintained is if r increases. Now, if r in the LM equation
increases, then r in the IS equation must also increase, because it is the same r in both equations. It is at this point
that the Econometric problem occurs: a change in the error term in the IS equation, eIS, affected an “X” variable in
the IS equation, namely, the variable r. This violates the OLS assumption that the error term in the regression
equation is independent of the X variables in the equation.
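The IS-LM case works the same way algebraically. Setting YIS = YLM and solving for r gives the expression below; the sign of the derivative assumes (for illustration) that β1 < 0 and β4 > 0, i.e., output falls with r along the IS curve and rises with r along the LM curve:

$$r = \frac{(\beta_2 - \beta_0) + \beta_3\,M + e_{LM} - e_{IS}}{\beta_1 - \beta_4}, \qquad \frac{\partial r}{\partial e_{IS}} = \frac{1}{\beta_4 - \beta_1} > 0$$

So a positive shock to eIS raises r, exactly as in the verbal argument, and r cannot be independent of eIS.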
Okay, so what? Well, it can be shown that if this assumption of the OLS method is violated, the following
negative consequences occur:
Simultaneous Equations Bias: (1) the estimates of the β's are biased (and, sometimes, we don’t even
know the direction of the bias!), and
(2) the estimates of the β's are inconsistent (that is, a larger sample size
will not make the bias go away; the estimates do not converge to the true β's even as the sample grows)
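To see the bias, and the fact that it does not shrink as the sample grows, here is a minimal Python sketch that simulates the Demand and Supply model above and then runs naive OLS on the Demand equation. All parameter values (β0 = 100, β1 = −2, β2 = 0.5, β3 = 10, β4 = 3, β5 = −1.5), the normal distributions for I, PM, eD and eS, the seed, and the helper names `simulate_market`, `solve` and `ols` are hypothetical choices made for illustration; the helpers use only the Python standard library. Under these assumptions the OLS estimate of β1 should settle near −2 + 5/11 ≈ −1.55 rather than the true −2.

```python
import random

# Hypothetical "true" structural parameters, chosen only for illustration
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: QD = B0 + B1*PQ + B2*I + eD
B3, B4, B5 = 10.0, 3.0, -1.5    # Supply: QS = B3 + B4*PQ + B5*PM + eS


def simulate_market(n, seed=0):
    """Draw n equilibrium observations (Q, PQ, I, PM) from the Demand and Supply SEM."""
    rng = random.Random(seed)
    Q, PQ, I, PM = [], [], [], []
    for _ in range(n):
        inc, pm = rng.gauss(50, 20), rng.gauss(20, 10)
        eD, eS = rng.gauss(0, 5), rng.gauss(0, 5)
        # Equilibrium: set QD = QS and solve for the market-clearing price
        p = (B3 - B0 + B5 * pm - B2 * inc + eS - eD) / (B1 - B4)
        Q.append(B0 + B1 * p + B2 * inc + eD)
        PQ.append(p)
        I.append(inc)
        PM.append(pm)
    return Q, PQ, I, PM


def solve(A, b):
    """Solve the square linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS of y on an intercept plus each list in `regressors`; returns [b0, b1, ...]."""
    X = [[1.0, *row] for row in zip(*regressors)]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


# Naive OLS on the Demand equation, for a small and a large sample
for n in (1_000, 50_000):
    Q, PQ, I, PM = simulate_market(n)
    b = ols(Q, [PQ, I])
    print(f"n = {n:>6}: OLS estimate of B1 = {b[1]:.3f}  (true B1 = {B1})")
```

Adding observations only tightens the estimate around the wrong value, which is what inconsistency means.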
The Identification Problem
The Identification Problem is another problem that is characteristic of SEMs. The Identification Problem is that
it can be difficult to determine (that is, to identify) which equation in an SEM system is being estimated in a
regression analysis.
For example, suppose you have data on the market quantity Q and market price PQ of a product traded in the
market, and suppose you want to use regression analysis to estimate a demand curve for product Q. These data
are plotted twice in the figures below. The same data are plotted in each figure, but two demand curves are drawn
through the data in the figure on the left, and two supply curves are drawn through the (same) data in the figure on
the right.
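[Figure: two panels plotting the same (Q, PQ) data points, with PQ on the vertical axis and Q on the horizontal axis. Left panel, "Is Demand Shifting?": two demand curves drawn through the data. Right panel, "Or, Is Supply Shifting?" (cue spooky music from Halloween): two supply curves drawn through the same data.]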
Look back at the demand and supply equations we considered earlier in this handout. The data points in the
figures above could represent the demand equation, with a change in variable “I” causing the demand curve to
shift. On the other hand, the data points could instead represent the supply equation, with a change in “PM”
causing the supply curve to shift.
The key point: If we have data for only Q and PQ (and no data for I and PM), then it is difficult to “identify” which
equation is represented by the data points. If we nonetheless did a regression analysis with data for only Q and
PQ, then we wouldn’t be sure whether we were actually estimating a demand curve or a supply curve. In this simple
example, the situation you are studying might make it obvious whether the curve is demand or supply, but in more
complicated (i.e., more realistic) situations involving more equations and more variables, it can be very difficult to
identify which of the equations you are actually estimating when you do a regression analysis, unless you are careful . . .
Identifying an Equation Depends on the Variables that Are in the System but NOT in the Equation !!!
Reconsider the problem of identifying whether the demand equation or the supply equation is represented by the
figures above. If we had data on variable I, then as variable I changed its value, the demand curve would shift
along the supply curve, and the data points would show us the location of the supply curve—that is, a change in
the value of a variable in the demand curve allows us to “see,” or “identify,” the supply curve, as illustrated in the
figure on the left below:
[Figure: two panels with PQ on the vertical axis and Q on the horizontal axis. Left panel: "A shifting Demand Curve reveals the location of the Supply Curve." Right panel: "A shifting Supply Curve reveals the location of the Demand Curve."]
Notice in the figure on the left above that identifying the supply curve depends on changes in the value of a
variable that is NOT in the supply curve, namely, the variable I that is in the demand curve. The variable I is in
the demand-supply system of equations, but it is not in the supply curve; this is what allows us to use the variable
I to identify the supply curve.
(Tricky, no? And very Zen master-like, wouldn’t you agree, grasshopper?)
https://www.youtube.com/watch?v=gbNCBVzPYak
Now consider the problem of identifying the demand curve. If we had data on variable PM in the supply curve,
then as variable PM changed its value, the supply curve would shift along the demand curve, and the resulting data
points would show us the location of the demand curve—that is, a change in the value of a variable in the supply
curve allows us to “see,” or “identify,” the demand curve, as illustrated in the figure on the right above.
Identifying the demand curve depends on changes in the value of a variable that is NOT in the demand curve,
namely, the variable PM that is in the supply curve. The variable PM is in the demand-supply system of equations,
but it is not in the demand curve; this is what allows us to use the variable PM to identify the demand curve.
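This idea can be checked numerically. The minimal Python sketch below uses made-up values for the β's, computes the market-clearing (Q, PQ) with no error terms, and runs a simple regression of Q on PQ. When only I changes, the points trace out a line whose slope is the Supply slope β4; when only PM changes, the slope is the Demand slope β1. The function names `equilibrium` and `slope` and all numbers are illustrative assumptions, not part of the handout.

```python
# Hypothetical structural parameters (made up), in the handout's notation
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: Q = B0 + B1*PQ + B2*I
B3, B4, B5 = 10.0, 3.0, -1.5    # Supply: Q = B3 + B4*PQ + B5*PM


def equilibrium(inc, pm):
    """Market-clearing (Q, PQ) for given shifter values I and PM, ignoring error terms."""
    pq = (B3 - B0 + B5 * pm - B2 * inc) / (B1 - B4)   # from QD = QS
    return B0 + B1 * pq + B2 * inc, pq


def slope(ys, xs):
    """Slope of a simple regression of ys on xs: Cov(x, y) / Var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Shift Demand (vary I, hold PM fixed): the equilibrium points lie on the Supply curve
pts = [equilibrium(inc, 20.0) for inc in range(20, 81, 5)]
supply_slope = slope([q for q, _ in pts], [p for _, p in pts])

# Shift Supply (vary PM, hold I fixed): the equilibrium points lie on the Demand curve
pts = [equilibrium(50.0, pm) for pm in range(5, 36, 5)]
demand_slope = slope([q for q, _ in pts], [p for _, p in pts])

print(f"slope traced by shifting Demand: {supply_slope:.3f}")  # 3.000 (= B4)
print(f"slope traced by shifting Supply: {demand_slope:.3f}")  # -2.000 (= B1)
```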
Terminology Involved with Identifying the Equations in SEMs
Endogenous Variables are variables whose values are determined inside the SEM. These are the “Y” variables
that you are using the model to “solve for.” For example, in a model of demand and supply, the endogenous
variables would be Q and PQ. There might be other variables in the demand and supply equations, but we would
need to be given the values of these other variables to plug into the equations in order to solve for the values of
the endogenous variables, Q and PQ.
In contrast to Endogenous Variables, Predetermined Variables are variables whose values are determined
outside the SEM. We must be “given” the values of these variables; we do not solve for them using the model.
The point of Predetermined Variables is to act as “controls” on the relationship between the Endogenous
Variables in the model; that is, the Predetermined Variables act as shifters, shifting around the relationships
among the Endogenous Variables so that we can “see,” or “identify” the relationships among the Endogenous
Variables. There are two sub-types of Predetermined Variables:
1) Exogenous Variables—These are the variables that are not endogenous variables and have never been
endogenous variables. These are the “X” variables in the model; the variables that you are using to help
explain and predict movements in the endogenous “Y” variables.
2) Lagged Endogenous Variables—These are endogenous variables from earlier time periods. If we
know the values of endogenous variables from earlier time periods, sometimes these are helpful in
explaining and predicting the movements of the endogenous variables in the current time period. For
example, if we are trying to predict the value of Y in period t, we might be able to use the value of Y in
period t-1 to help us make a better prediction. If we include the values of Y variables from earlier time
periods in our SEM, then these are considered a type of Predetermined Variable, because we know the
values from earlier time periods (they are “givens”), and we don’t need to solve for them using the model.
These Lagged Endogenous Variables are considered another type of “X” variable in the model, because
they help explain and predict the movements of the endogenous “Y” variables in the current time period.
Structural/Behavioral Equations are the original equations in the SEM that represent structural features of the
economy or behavioral aspects of individuals in the economy. For example, the LM curve from macroeconomics
is a Structural Equation, because it represents the structure of the relationship between output, money supply and
interest rates in the economy. The demand curve from microeconomics is an example of a Behavioral Equation,
because it represents the behavior of consumers in a market. Structural/Behavioral Equations are constructed
from endogenous and predetermined variables. The parameters (the β's) of Structural/Behavioral Equations are
called, perhaps not surprisingly, Structural/Behavioral Parameters.
Reduced Form Equations are derived from Structural/Behavioral Equations and express the endogenous
variables solely as functions of the predetermined variables. The Reduced Form Equations are derived by solving
for the endogenous variables in the Structural/Behavioral Equations of the SEM. Much to the chagrin of creative
people everywhere, the parameters of Reduced Form Equations are named . . . Reduced Form Parameters.
(sigh)
Indirect Least Squares (ILS) Regression Analysis
The point of making the distinction between Structural/Behavioral Equations and Reduced Form Equations is that
Reduced Form Equations do not suffer from the problem of Simultaneous Equations Bias (yea!), because their
right-hand-side variables are all predetermined variables, which are not affected by the error terms. If the
Simultaneous Equation Model (SEM) has enough of the right kinds of variables in the right positions, then we can
use the Indirect Least Squares (ILS) Regression Analysis method to solve for the β's in the
Structural/Behavioral Equations. The ILS method proceeds as follows:
1. Derive the Reduced Form Equations from the Structural/Behavioral Equations,
2. Estimate the β's of the Reduced Form Equations using regression analysis (without Simultaneous
Equations Bias—yea!), and
3. Calculate the β's of the Structural/Behavioral Equations based on the β's from the Reduced Form
Equations (and recall that calculating the β's of the Structural/Behavioral Equations was our original
goal—nice!).
For example, suppose we were working with the demand and supply equations described earlier in this handout:
Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is an error term in the Demand equation,
Supply: QS = β3 + β4·PQ + β5·PM + eS, where "eS" is an error term in the Supply equation,
Equilibrium Condition: QD = QS
These are the Structural/Behavioral Equations of the demand and supply SEM. The endogenous variables are Q
and PQ, and the predetermined variables are I and PM. In this particular example, both predetermined variables are
exogenous variables, and we don’t have any lagged endogenous variables in the system. Now, to derive the
Reduced-Form Equations for this demand and supply SEM, we simply solve for the values of the endogenous
variables (Q and PQ) in the system:
Because of the Equilibrium Condition, we can set Demand equal to Supply . . .
β0 + β1·PQ + β2·I + eD = β3 + β4·PQ + β5·PM + eS
and solve for the endogenous variable PQ . . .
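$$\mathit{PQ} = \left[\frac{\beta_3 - \beta_0}{\beta_1 - \beta_4}\right] + \left[\frac{\beta_5}{\beta_1 - \beta_4}\right] \cdot \mathit{PM} + \left[\frac{-\beta_2}{\beta_1 - \beta_4}\right] \cdot I + \left[\frac{e_S - e_D}{\beta_1 - \beta_4}\right]$$
this is the Reduced-Form Equation for PQ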
Next, plug the Reduced-Form Equation for PQ back into either Demand or Supply, and solve for Q:
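$$Q = \left[\beta_0 + \frac{\beta_1(\beta_3 - \beta_0)}{\beta_1 - \beta_4}\right] + \left[\frac{\beta_1 \beta_5}{\beta_1 - \beta_4}\right] \cdot \mathit{PM} + \left[\beta_2 + \frac{-\beta_1 \beta_2}{\beta_1 - \beta_4}\right] \cdot I + \left[e_D + \frac{\beta_1(e_S - e_D)}{\beta_1 - \beta_4}\right]$$
this is the Reduced-Form Equation for Q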
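Because this algebra is easy to get wrong, here is a small Python check, a sketch with made-up β values and a hypothetical helper `reduced_form`. It computes the bracketed Reduced-Form constants and then confirms, for a few random values of PM, I, eD and eS, that the PQ and Q they produce satisfy both the Demand and the Supply equation.

```python
import random


def reduced_form(b0, b1, b2, b3, b4, b5):
    """Reduced-Form constants (tilde for PQ, hat for Q) implied by the structural β's."""
    d = b1 - b4
    tilde = ((b3 - b0) / d, b5 / d, -b2 / d)
    hat = (b0 + b1 * tilde[0], b1 * tilde[1], b2 + b1 * tilde[2])
    return tilde, hat


# Hypothetical structural β's (made up for this check)
beta = (100.0, -2.0, 0.5, 10.0, 3.0, -1.5)
b0, b1, b2, b3, b4, b5 = beta
tilde, hat = reduced_form(*beta)
print("tilde constants (PQ equation):", tilde)
print("hat constants   (Q equation): ", hat)

# Plug arbitrary PM, I, eD, eS into the reduced forms and confirm that
# QD = QS = Q at the reduced-form price PQ.
rng = random.Random(1)
for _ in range(5):
    pm, inc = rng.uniform(0, 40), rng.uniform(0, 100)
    eD, eS = rng.gauss(0, 5), rng.gauss(0, 5)
    pq = tilde[0] + tilde[1] * pm + tilde[2] * inc + (eS - eD) / (b1 - b4)
    q = hat[0] + hat[1] * pm + hat[2] * inc + eD + b1 * (eS - eD) / (b1 - b4)
    qd = b0 + b1 * pq + b2 * inc + eD   # Demand equation
    qs = b3 + b4 * pq + b5 * pm + eS    # Supply equation
    assert abs(q - qd) < 1e-9 and abs(q - qs) < 1e-9
print("The reduced forms satisfy both the Demand and the Supply equation.")
```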
Now we can do regression analysis on the Reduced-Form Equations, and we won’t have a problem with
Simultaneous Equations Bias.
Notice that the terms in brackets are either collections of constants or collections of constants and error terms.
Each collection of constants acts as a big constant, so we’ll replace each collection of constants with a “mega-constant” (I just invented that term). The “mega-constants” are the Reduced-Form Coefficients. Also, each
collection of constants and error terms acts as a big error term, so we’ll replace each of these collections with a
“mega-error term” (I just invented that term, too). I’ll use tildes (squiggles) to denote the mega terms in the
Reduced-Form equation for PQ, and I’ll use hats to denote the mega terms in the Reduced-Form equation for Q,
like this:
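$$\mathit{PQ} = \tilde{\beta}_0 + \tilde{\beta}_1 \cdot \mathit{PM} + \tilde{\beta}_2 \cdot I + \tilde{e}_{PQ}$$
$$Q = \hat{\beta}_0 + \hat{\beta}_1 \cdot \mathit{PM} + \hat{\beta}_2 \cdot I + \hat{e}_{Q}$$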
The “tilde-β's” and “hat-β's” in the equations above are the Reduced-Form Coefficients. Run a regression
analysis separately on each of the Reduced-Form equations above (the tilde and hat equations) to get numbers for
the tilde-β's and hat-β's. Then, you can set each of the tilde-β's and hat-β's equal to the bracketed collection of β's
that it represents, and, with some tedious algebra, solve for the original β's in the original Structural-Behavioral
Equations!! (ta-da!)
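For this particular model, the "tedious algebra" collapses to a few ratios. From the brackets above, hat-β1 = β1·tilde-β1 and hat-β2 = β2 − β1β2/(β1 − β4) = β4·tilde-β2, so β1 = hat-β1/tilde-β1 and β4 = hat-β2/tilde-β2. The remaining β's then follow from tilde-β1 and tilde-β2, and from the intercepts (plugging the PQ reduced form into Supply instead of Demand gives hat-β0 = β3 + β4·tilde-β0). The Python sketch below applies this to simulated data; the parameter values, distributions, seed and helper names (`solve`, `ols`) are hypothetical, and OLS is computed with the standard library only.

```python
import random

# Hypothetical "true" structural parameters (made up for this sketch)
B0, B1, B2 = 100.0, -2.0, 0.5   # Demand: QD = B0 + B1*PQ + B2*I + eD
B3, B4, B5 = 10.0, 3.0, -1.5    # Supply: QS = B3 + B4*PQ + B5*PM + eS

rng = random.Random(42)
Q, PQ, I, PM = [], [], [], []
for _ in range(50_000):
    inc, pm = rng.gauss(50, 20), rng.gauss(20, 10)
    eD, eS = rng.gauss(0, 5), rng.gauss(0, 5)
    p = (B3 - B0 + B5 * pm - B2 * inc + eS - eD) / (B1 - B4)  # market-clearing PQ
    Q.append(B0 + B1 * p + B2 * inc + eD)
    PQ.append(p)
    I.append(inc)
    PM.append(pm)


def solve(A, b):
    """Solve the square linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS of y on an intercept plus each list in `regressors`; returns [b0, b1, ...]."""
    X = [[1.0, *row] for row in zip(*regressors)]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


# ILS step 2: estimate the Reduced-Form Coefficients by OLS
t = ols(PQ, [PM, I])   # tilde-betas: PQ = t0 + t1*PM + t2*I
h = ols(Q, [PM, I])    # hat-betas:   Q  = h0 + h1*PM + h2*I

# ILS step 3: solve the bracketed expressions for the structural betas
b1 = h[1] / t[1]          # hat-beta1 = beta1 * tilde-beta1
b4 = h[2] / t[2]          # hat-beta2 = beta4 * tilde-beta2
b5 = t[1] * (b1 - b4)     # tilde-beta1 = beta5 / (beta1 - beta4)
b2 = -t[2] * (b1 - b4)    # tilde-beta2 = -beta2 / (beta1 - beta4)
b0 = h[0] - b1 * t[0]     # hat-beta0 = beta0 + beta1 * tilde-beta0
b3 = h[0] - b4 * t[0]     # hat-beta0 = beta3 + beta4 * tilde-beta0

print("ILS estimates (b0..b5):", [round(v, 3) for v in (b0, b1, b2, b3, b4, b5)])
```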
When Does the ILS Regression Method Actually Work--The Rank and Order Conditions
Sadly, when regression analysis is used to estimate the Reduced-Form Coefficients (the “tilde-β's” and “hat-β's”)
in the Reduced-Form Equations, it is not always possible to use these values to solve for the original β's in the
original Structural-Behavioral Equations. (Egad!)
The Rank and Order Conditions are a set of rules that determine whether it is possible to solve for the original
Structural/Behavioral Coefficients from the Reduced-Form Coefficients. To use the Rank and Order Conditions,
we need to define a few more terms:
M = the number of endogenous variables in the SEM system of equations
m = the number of endogenous variables in the equation of interest (the equation for which you want the β's)
K = the number of predetermined variables in the SEM system of equations
k = the number of predetermined variables in the equation of interest (the equation for which you want the β's)
A = the matrix of the β's that the variables excluded from the equation of interest have in the other equations of the system (one row per other equation, one column per excluded variable)
Okay, with the terms above, we can now give the Rank and Order Conditions (drum roll . . .):
If (K – k < m – 1), then the equation of interest is under-identified
If (K – k = m – 1) AND (rank of matrix A = M – 1), then the equation of interest is exactly-identified
If (K – k = m – 1) AND (rank of matrix A < M – 1), then the equation of interest is under-identified
If (K – k > m – 1) AND (rank of matrix A = M – 1), then the equation of interest is over-identified
If (K – k > m – 1) AND (rank of matrix A < M – 1), then the equation of interest is under-identified
(The comparison of K – k with m – 1 is the Order Condition; the requirement that the rank of matrix A equal M – 1 is the Rank Condition. The Order Condition is necessary but not sufficient for identification, while the Rank Condition is both necessary and sufficient.)
Exactly-identified means that you will be able to use the ILS Regression Method to solve for the β's in the
equation of interest (Yea!).
Under-identified means that there are not enough variables in the SEM that are excluded from the equation of
interest to be able to solve for the β's in the equation of interest. So, you will need to go “back to the drawing
board” to change the equations in your SEM or change the variables that are in your SEM until you achieve an
exactly-identified or over-identified SEM. (Sad sigh . . .)
Over-identified means that you will be able to solve for the β's in the equation of interest, but, ironically, there
will be more than one set of solutions for the β's in the equation of interest, and you don’t know which set is the
true set! In this case, there is an extra step, or “stage,” in the analysis that you can use in order to obtain estimates of
the “true” set of β's. Perhaps not surprisingly, the analysis method that involves the extra stage is called (you
can’t make this stuff up) . . . Two-Stage Least Squares regression analysis (affectionately abbreviated 2SLS).
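The rules above are mechanical enough to code. This Python sketch (hypothetical function names `matrix_rank` and `identification`; the β values placed in A are made up) compares K – k with m – 1 for the Order Condition, computes the rank of A by Gaussian elimination for the Rank Condition, and returns the handout's three labels.

```python
def matrix_rank(A, tol=1e-10):
    """Rank of a small matrix via Gaussian elimination, treating |x| <= tol as zero."""
    rows = [[float(x) for x in row] for row in A]
    rank = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if abs(rows[r][c]) > tol), None)
        if pivot is None:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank:
                f = rows[r][c] / rows[rank][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def identification(M, m, K, k, A):
    """Apply the Order Condition (K - k vs. m - 1) and the Rank Condition (rank A = M - 1)."""
    excluded, needed = K - k, m - 1
    if excluded < needed or matrix_rank(A) < M - 1:
        return "under-identified"
    return "exactly-identified" if excluded == needed else "over-identified"


# Demand in the Demand-Supply model: M = 2, K = 2 (I and PM), m = 2, k = 1.
# Demand excludes PM, so A holds PM's coefficient in Supply (beta5, made up as -1.5).
print(identification(M=2, m=2, K=2, k=1, A=[[-1.5]]))             # exactly-identified

# Demand when Supply also contains R and G: K = 4, and A holds the Supply
# coefficients of PM, R and G (made-up values).
print(identification(M=2, m=2, K=4, k=1, A=[[-1.5, 0.8, -0.6]]))  # over-identified
```

Note that if the excluded variable's coefficient were zero (for example, β5 = 0 in the first case), A would have rank 0 and the Demand equation would be under-identified even though the Order Condition holds.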
Two-Stage Least Squares (2SLS) Regression Analysis
Two-Stage Least Squares (2SLS) Regression Analysis is a method of estimating the original β's in the original
Structural/Behavioral Equations of an SEM when the SEM is over-identified. The steps of the method are:
1. Regress each endogenous variable on all of the predetermined variables in the system (this is the first
stage of 2SLS).
2. Use the estimated first-stage equations to predict the values of the endogenous variables. These “predicted variables” are
called Instrumental Variables.
3. Replace any endogenous variables appearing as right-hand-side “X” variables in the equation of interest
with the corresponding Instrumental Variables.
4. Estimate the β's of the original equation of interest (with the Instrumental Variables replacing the
endogenous right-hand-side X variables) using regression analysis (this is the second stage of 2SLS).
For example, suppose we were working with the demand and supply equations described earlier in this handout,
but the supply equation had some additional exogenous variables in it, “R” and “G” (doesn’t matter what they
are):
Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is the error term in Demand
Supply: QS = β3 + β4·PQ + β5·PM + β6·R + β7·G + eS, where "eS" is the error term in Supply
Equilibrium Condition: QD = QS
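If we check the Rank and Order Conditions, the demand equation in the system above is over-identified (K = 4
predetermined variables in the system, k = 1 in the demand equation, and m = 2, so K – k = 3 > m – 1 = 1; and as long
as at least one of β5, β6, β7 is nonzero, the rank of A = 1 = M – 1), so ILS would not give us a single set of β's for it.
However, we could use the 2SLS method to find the β's in the demand equation: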
1. First, regress PQ on I, PM, R and G using OLS (this is the first, “extra,” stage of 2SLS).
2. Use the equation to predict the value of PQ. This predicted variable is the Instrumental Variable for
PQ.
3. Replace the PQ in the demand equation with the Instrumental Variable (the predicted PQ).
4. Estimate the β's of the demand equation (with the Instrumental Variable replacing PQ on the right-hand side) using OLS regression analysis (this is the second stage of 2SLS).
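Here is a minimal Python sketch of the four steps above on simulated data, with naive OLS on the Demand equation for comparison. The β values (including made-up coefficients 0.8 for R and −0.6 for G), the normal distributions, the seed and the helper names `solve` and `ols` are assumptions for illustration only.

```python
import random

# Hypothetical "true" parameters (made up; the R and G coefficients are arbitrary)
B0, B1, B2 = 100.0, -2.0, 0.5                    # Demand: QD = B0 + B1*PQ + B2*I + eD
B3, B4, B5, B6, B7 = 10.0, 3.0, -1.5, 0.8, -0.6  # Supply: QS = B3 + B4*PQ + B5*PM + B6*R + B7*G + eS

rng = random.Random(7)
Q, PQ, I, PM, R, G = [], [], [], [], [], []
for _ in range(20_000):
    inc, pm = rng.gauss(50, 20), rng.gauss(20, 10)
    r, g = rng.gauss(0, 10), rng.gauss(0, 10)
    eD, eS = rng.gauss(0, 5), rng.gauss(0, 5)
    # Market-clearing price from QD = QS
    p = (B3 - B0 + B5 * pm + B6 * r + B7 * g - B2 * inc + eS - eD) / (B1 - B4)
    Q.append(B0 + B1 * p + B2 * inc + eD)
    PQ.append(p)
    I.append(inc)
    PM.append(pm)
    R.append(r)
    G.append(g)


def solve(A, b):
    """Solve the square linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * c for a, c in zip(M[k], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS of y on an intercept plus each list in `regressors`; returns [b0, b1, ...]."""
    X = [[1.0, *row] for row in zip(*regressors)]
    p = len(X[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


# Stage 1: regress PQ on ALL predetermined variables in the system
first = ols(PQ, [I, PM, R, G])
# The fitted values are the Instrumental Variable for PQ
PQ_hat = [first[0] + first[1] * i_ + first[2] * pm_ + first[3] * r_ + first[4] * g_
          for i_, pm_, r_, g_ in zip(I, PM, R, G)]

# Stage 2: OLS on the Demand equation with PQ_hat in place of PQ
b_2sls = ols(Q, [PQ_hat, I])
b_ols = ols(Q, [PQ, I])   # naive OLS, for comparison

print(f"2SLS: B1 = {b_2sls[1]:.3f}, B2 = {b_2sls[2]:.3f}")
print(f"OLS:  B1 = {b_ols[1]:.3f}, B2 = {b_ols[2]:.3f}")
```

One caution: running the second stage as a plain OLS regression reproduces the 2SLS coefficient estimates, but the standard errors OLS would report for that stage are not the correct 2SLS standard errors (they are built from residuals that use the predicted PQ instead of the actual PQ). A dedicated 2SLS routine such as PROC SYSLIN computes them properly.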
Indirect Least Squares (ILS) in SAS
The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS in SAS (for an exactly-identified equation, the 2SLS estimates are identical to the ILS estimates). As an example of ILS from
macroeconomics, suppose we have data on aggregate consumption (c), national income (y) and aggregate
investment (i), and suppose we believe that these variables are related to one another in the following SEM:
c = β1 + β2·y + ec, where "ec" is an error term,
y = c + i + ey, where "ey" is an error term.
(Note: The y equation has no β's, because in macro theory "c + i" adds up to "y" (here, plus an error term).
This kind of equation is called an "Identity" equation.)
These two equations together are a simultaneous equation model (SEM) because there are two or more variables
(in this case, c and y) that are in both equations. The variables c and y are endogenous, because their values are
determined inside the system (the model solves for them), but the variable i is exogenous, because its value is
determined outside the system (it is given). In SAS, you must specify the model
equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables
"instruments."
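proc syslin 2sls data=dataset02;
   model c = y;       /* consumption equation: c = b1 + b2*y + ec */
   model y = c i;     /* income equation (y = c + i in theory) */
   endogenous c y;    /* variables solved for inside the system */
   instruments i;     /* exogenous (predetermined) variables */
run;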
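To see what ILS is doing in this example, here is a hand-rolled version in Python on simulated data. It is a sketch: the values β1 = 20 and β2 = 0.8, the distributions of i, ec and ey, the seed, and the helper name `simple_ols` are all made up. The reduced forms are c and y each regressed on i; the coefficient on i is β2/(1 − β2) in the c equation and 1/(1 − β2) in the y equation, so their ratio recovers β2, and β1 then follows from the intercepts.

```python
import random

# Hypothetical "true" values, made up for this sketch
BETA1, BETA2 = 20.0, 0.8   # consumption function: c = BETA1 + BETA2*y + ec

rng = random.Random(3)
c, y, i = [], [], []
for _ in range(20_000):
    inv = rng.gauss(100, 20)                     # exogenous investment
    ec, ey = rng.gauss(0, 5), rng.gauss(0, 2)    # error terms
    # Solve c = BETA1 + BETA2*y + ec and y = c + i + ey for the endogenous y, then c
    yy = (BETA1 + inv + ec + ey) / (1 - BETA2)
    c.append(yy - inv - ey)
    y.append(yy)
    i.append(inv)


def simple_ols(ys, xs):
    """Intercept and slope from a simple OLS regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (v - my) for x, v in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Reduced forms: each endogenous variable regressed on the exogenous i
c0, c1 = simple_ols(c, i)   # c1 estimates BETA2 / (1 - BETA2)
y0, y1 = simple_ols(y, i)   # y1 estimates 1 / (1 - BETA2)

# Back out the structural (consumption-function) parameters
beta2_ils = c1 / y1
beta1_ils = c0 - beta2_ils * y0
print(f"ILS estimates: beta1 = {beta1_ils:.3f}, beta2 = {beta2_ils:.4f}")
```

Because the consumption equation is exactly identified, this ratio of reduced-form slopes equals the 2SLS estimate of β2 (both reduce to Cov(c, i)/Cov(y, i)), which is why the 2SLS option of SYSLIN can serve for ILS here.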
Two-Stage Least Squares (2SLS) in SAS
The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS. As an example of 2SLS from
microeconomics, let's consider supply and demand for product Q. Suppose we have data on the quantity Q of the
product traded in various locations, the price PQ in each location, average consumer income I in each location, the
price of a substitute product PS in each location, and the price of materials PM in each location. We want to
estimate the supply and demand equations:
Demand: Q = β0 + β1·PQ + β2·I + β3·PS + eD, where "eD" is an error term in the Demand equation,
Supply: Q = β4 + β5·PQ + β6·PM + eS, where "eS" is an error term in the Supply equation.
These two equations together are a simultaneous system because there are two or more variables (in this case, Q
and PQ) that are in both equations. The variables Q and PQ are endogenous, because their values are determined
inside the system (the model solves for them), but the variables I, PS and PM are exogenous, because their values
are determined outside the system (they are given). In SAS, you must specify the
model equations, which variables are endogenous, and which are exogenous. Again, SAS calls the exogenous
variables "instruments."
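proc syslin 2sls data=dataset02;
   model Q = PQ I PS;     /* Demand equation */
   model Q = PQ PM;       /* Supply equation */
   endogenous Q PQ;       /* variables solved for inside the system */
   instruments I PS PM;   /* exogenous (predetermined) variables */
run;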