UNC-Wilmington Department of Economics and Finance ECN 377 Dr. Chris Dumas

Simultaneous Equations Models: Indirect Least Squares (ILS) and 2-Stage Least Squares (2SLS)

Simultaneous Equations Models (SEMs) are models in which two or more equations share two or more variables that link the equations together in a system. In such models, the variables that appear in two or more equations are said to be “mutually dependent,” or “jointly-determined,” or they are said to have a “simultaneous” or “two-way” relationship. Such variables affect one another, causing “feedback” relationships between the equations in the system; that is, if something in one equation changes, the change causes a change in another equation which, in turn, “feeds back” to cause a further change in the first equation. Such situations are sometimes referred to as “chicken and the egg” situations, because it is difficult to determine which came first, a change in one equation, or a change in a second equation, if the two equations are affecting each other.

The feedback effects in SEMs either spiral out of control or reach an equilibrium of some sort. Usually, they reach an equilibrium (otherwise, our world would be much more explosive than we observe). Many of the standard models of Economics and Finance are SEMs that reach equilibrium (well, usually they reach equilibrium, under typical conditions). Two well-known examples are the Demand and Supply model of Microeconomics, and the IS-LM model of Macroeconomics.

The Problem with Simultaneous Equation Models: Simultaneous Equations Bias

Although very common in Economics and Finance, SEMs face a potential problem when it comes to Econometric estimation of the parameters in their equations using OLS regression. Recall that one of the assumptions of OLS regression is that the error term in the regression equation is independent of the X (and Y) variables in the equation.
Well, SEMs typically violate this assumption, with undesirable results, econometrically-speaking (Can I say, “econometrically-speaking?” Well, I guess I just did.). To see why this is so, consider as an example the Demand and Supply model from Microeconomics. Suppose we have data on the quantity Q of a product traded at various locations when the market is in equilibrium, the market price PQ in each location, average consumer income I in each location, and the price of materials PM in each location. We want to estimate the parameters (the β's) in the supply and demand equations:

Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is an error term in the Demand equation,
Supply: QS = β3 + β4·PQ + β5·PM + eS, where "eS" is an error term in the Supply equation,
Equilibrium Condition: QD = QS

Notice first that Demand and Supply are an SEM, because together they are two equations that share two variables in common, Q and PQ. Now, suppose that something outside the model changes, and this change affects Demand. This would appear in the model as a change in eD, the error term in the Demand equation. Let’s say that there is an increase in eD. All else held constant in the Demand equation, this would result in an increase in QD. This, in turn, would result in an increase in QS in equilibrium, because, in equilibrium, QD = QS. Next, in the Supply equation, if QS increases, and PM remains constant, and eS is a random error unaffected by the change in QS, then the only way that the equality in the Supply equation can be maintained is if PQ increases. Now, if PQ in the Supply equation increases, then PQ in the Demand equation must also increase, because it is the same PQ in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the Demand equation, eD, affected an “X” variable in the Demand equation, namely, the variable PQ.
This violates the assumption of the OLS method that the error term in the regression equation must be independent of the X (and Y) variables in the equation.

As a second example, consider the IS-LM model from Macroeconomics. Suppose we have data on national output Y, money supply M, and the interest rate r, and we want to estimate the parameters in the IS and LM equations below:

IS Equation: YIS = β0 + β1·r + eIS, where "eIS" is an error term in the IS equation,
LM Equation: YLM = β2 + β3·M + β4·r + eLM, where "eLM" is an error term in the LM equation,
Equilibrium Condition: YIS = YLM

Notice first that IS and LM are an SEM, because together they are two equations that share two variables in common, Y and r. Now, suppose that something outside the model changes, and this change affects the IS equation. This would appear in the model as a change in eIS, the error term in the IS equation. Let’s say that there is an increase in eIS. All else held constant in the IS equation, this would result in an increase in YIS. This, in turn, would result in an increase in YLM in equilibrium, because, in equilibrium, YIS = YLM. Next, in the LM equation, if YLM increases, and M remains constant, and eLM is a random error unaffected by the change in YLM, then the only way that the equality in the LM equation can be maintained is if r increases. Now, if r in the LM equation increases, then r in the IS equation must also increase, because it is the same r in both equations. It is at this point that the Econometric problem occurs: a change in the error term in the IS equation, eIS, affected an “X” variable in the IS equation, namely, the variable r. This violates the OLS assumption that the error term in the regression equation is independent of the X (and Y) variables in the equation.

Okay, so what?
Well, it can be shown that if this assumption of the OLS method is violated, the following negative consequences occur:

Simultaneous Equations Bias:
(1) the estimates of the β's are biased (and, sometimes, we don’t even know the direction of the bias!)
(2) the estimates of the β's are inconsistent (that is, a larger sample size will not diminish the bias)

The Identification Problem

The Identification Problem is another problem that is characteristic of SEMs. The Identification Problem is that it can be difficult to determine (that is, to identify) which equation in an SEM system is being estimated in a regression analysis. For example, suppose you have data on the market quantity Q and market price PQ of a product traded in the market, and suppose you want to use regression analysis to estimate a demand curve for product Q. These data are plotted twice in the figures below. The same data are plotted in each figure, but two demand curves are drawn through the data in the figure on the left, and two supply curves are drawn through the (same) data in the figure on the right.

[Figure: two panels plotting the same data, with PQ on the vertical axis and Q on the horizontal axis. Left panel: “Is Demand Shifting ?” Right panel: “Or, Is Supply Shifting ?” (cue spooky music from Halloween)]

Look back at the demand and supply equations we considered earlier in this handout. The data points in the figures above could represent the demand equation, with a change in variable “I” causing the demand curve to shift. On the other hand, the data points could instead represent the supply equation, with a change in “PM” causing the supply curve to shift. The key point: If we have data for only Q and PQ (and no data for I and PM), then it is difficult to “identify” which equation is represented by the data points. If we nonetheless did a regression analysis with data for only Q and PQ, then we wouldn’t be sure whether we were actually estimating a demand curve or a supply curve.
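The two problems described so far can be seen in a small simulation. The sketch below is not part of the original handout: it is written in Python with NumPy (rather than SAS), and the parameter values, sample size, and variable distributions are all hypothetical choices made only for illustration. It generates equilibrium data from the Demand and Supply equations used earlier, then runs OLS two ways: Q on PQ alone (the identification problem) and Q on PQ and I (the Demand equation, with Simultaneous Equations Bias).

```python
import numpy as np

rng = np.random.default_rng(377)  # fixed seed so the run is repeatable
n = 20_000                        # number of simulated locations (assumption)

# Hypothetical "true" structural parameters (illustrative, not from the handout)
b0, b1, b2 = 100.0, -2.0, 0.5     # Demand: Q = b0 + b1*PQ + b2*I  + eD
b3, b4, b5 = 10.0, 3.0, -1.0      # Supply: Q = b3 + b4*PQ + b5*PM + eS

# Exogenous variables and error terms (distributions are assumptions)
I = rng.normal(50, 10, n)
PM = rng.normal(20, 5, n)
eD = rng.normal(0, 5, n)
eS = rng.normal(0, 5, n)

# Equilibrium: set Demand equal to Supply, solve for PQ, then get Q from Demand
PQ = ((b3 - b0) + b5 * PM - b2 * I + (eS - eD)) / (b1 - b4)
Q = b0 + b1 * PQ + b2 * I + eD


def ols(y, *xs):
    """Return OLS coefficients [intercept, slope1, ...] of y on the regressors xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# The Demand error term is correlated with the "X" variable PQ
corr_PQ_eD = np.corrcoef(PQ, eD)[0, 1]

# Regression with data on Q and PQ only: is this demand or supply?
slope_Q_on_PQ = ols(Q, PQ)[1]

# OLS on the Demand equation itself: the coefficient on PQ estimates b1
b1_ols = ols(Q, PQ, I)[1]

print("corr(PQ, eD):", corr_PQ_eD)
print("slope of Q on PQ alone:", slope_Q_on_PQ, "(true demand", b1, ", true supply", b4, ")")
print("OLS estimate of b1 in Demand:", b1_ols, "(true value", b1, ")")
```

Under these assumed values, the slope from the PQ-only regression should fall between the true demand slope and the true supply slope, matching neither, and the OLS estimate of β1 in the Demand equation should stay well away from its true value even with a large sample.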
Although it might be obvious from the situation that you are studying whether the curve in this simple example is demand or supply, in more complicated (i.e., more realistic) situations involving more equations and more variables, it can be very difficult to identify which of the equations you are actually estimating when you do a regression analysis, unless you are careful . . .

Identifying an Equation Depends on the Variables that Are in the System but NOT in the Equation !!!

Reconsider the problem of identifying whether the demand equation or the supply equation is represented by the figures above. If we had data on variable I, then as variable I changed its value, the demand curve would shift along the supply curve, and the data points would show us the location of the supply curve. That is, a change in the value of a variable in the demand curve allows us to “see,” or “identify,” the supply curve, as illustrated in the figure on the left below:

[Figure: two panels, with PQ on the vertical axis and Q on the horizontal axis. Left panel: “A shifting Demand Curve reveals the location of the Supply Curve.” Right panel: “A shifting Supply Curve reveals the location of the Demand Curve.”]

Notice in the figure on the left above that identifying the supply curve depends on changes in the value of a variable that is NOT in the supply curve, namely, the variable I that is in the demand curve. The variable I is in the demand-supply system of equations, but it is not in the supply curve; this is what allows us to use the variable I to identify the supply curve. (Tricky, no? And very Zen master-like, wouldn’t you agree, grasshopper?) https://www.youtube.com/watch?v=gbNCBVzPYak

Now consider the problem of identifying the demand curve.
If we had data on variable PM in the supply curve, then as variable PM changed its value, the supply curve would shift along the demand curve, and the resulting data points would show us the location of the demand curve. That is, a change in the value of a variable in the supply curve allows us to “see,” or “identify,” the demand curve, as illustrated in the figure on the right above. Identifying the demand curve depends on changes in the value of a variable that is NOT in the demand curve, namely, the variable PM that is in the supply curve. The variable PM is in the demand-supply system of equations, but it is not in the demand curve; this is what allows us to use the variable PM to identify the demand curve.

Terminology Involved with Identifying the Equations in SEMs

Endogenous Variables are variables whose values are determined inside the SEM. These are the “Y” variables that you are using the model to “solve for.” For example, in a model of demand and supply, the endogenous variables would be Q and PQ. There might be other variables in the demand and supply equations, but we would need to be given the values of these other variables to plug into the equations in order to solve for the values of the endogenous variables, Q and PQ.

In contrast to Endogenous Variables, Predetermined Variables are variables whose values are determined outside the SEM. We must be “given” the values of these variables; we do not solve for them using the model. The point of Predetermined Variables is to act as “controls” on the relationship between the Endogenous Variables in the model; that is, the Predetermined Variables act as shifters, shifting around the relationships among the Endogenous Variables so that we can “see,” or “identify” the relationships among the Endogenous Variables.
There are two sub-types of Predetermined Variables:

1) Exogenous Variables: These are the variables that are not endogenous variables and have never been endogenous variables. These are the “X” variables in the model; the variables that you are using to help explain and predict movements in the endogenous “Y” variables.

2) Lagged Endogenous Variables: These are endogenous variables from earlier time periods. If we know the values of endogenous variables from earlier time periods, sometimes these are helpful in explaining and predicting the movements of the endogenous variables in the current time period. For example, if we are trying to predict the value of Y in period t, we might be able to use the value of Y in period t-1 to help us make a better prediction. If we include the values of Y variables from earlier time periods in our SEM, then these are considered a type of Predetermined Variable, because we know the values from earlier time periods (they are “givens”), and we don’t need to solve for them using the model. These Lagged Endogenous Variables are considered another type of “X” variable in the model, because they help explain and predict the movements of the endogenous “Y” variables in the current time period.

Structural/Behavioral Equations are the original equations in the SEM that represent structural features of the economy or behavioral aspects of individuals in the economy. For example, the LM curve from macroeconomics is a Structural Equation, because it represents the structure of the relationship between output, money supply and interest rates in the economy. The demand curve from microeconomics is an example of a Behavioral Equation, because it represents the behavior of consumers in a market. Structural/Behavioral Equations are constructed from endogenous and predetermined variables. The parameters (the β's) of Structural/Behavioral Equations are called, perhaps not surprisingly, Structural/Behavioral Parameters.
Reduced Form Equations are derived from Structural/Behavioral Equations and express the endogenous variables solely as functions of the predetermined variables. The Reduced Form Equations are derived by solving for the endogenous variables in the Structural/Behavioral Equations of the SEM. Much to the chagrin of creative people everywhere, the parameters of Reduced Form Equations are named . . . Reduced Form Parameters. (sigh)

Indirect Least Squares (ILS) Regression Analysis

The point of making the distinction between Structural/Behavioral Equations and Reduced Form Equations is that Reduced Form Equations do not suffer from the problem of Simultaneous Equations Bias (yea!). If the Simultaneous Equation Model (SEM) has enough of the right kinds of variables in the right positions, then we can use the Indirect Least Squares (ILS) Regression Analysis method to solve for the β's in the Structural/Behavioral Equations. The ILS method proceeds as follows:

1. Derive the Reduced Form Equations from the Structural/Behavioral Equations,
2. Estimate the β's of the Reduced Form Equations using regression analysis (without Simultaneous Equations Bias, yea!), and
3. Calculate the β's of the Structural/Behavioral Equations based on the β's from the Reduced Form Equations (and recall that calculating the β's of the Structural/Behavioral Equations was our original goal, nice!).

For example, suppose we were working with the demand and supply equations described earlier in this handout:

Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is an error term in the Demand equation,
Supply: QS = β3 + β4·PQ + β5·PM + eS, where "eS" is an error term in the Supply equation,
Equilibrium Condition: QD = QS

These are the Structural/Behavioral Equations of the demand and supply SEM. The endogenous variables are Q and PQ, and the predetermined variables are I and PM.
In this particular example, both predetermined variables are exogenous variables, and we don’t have any lagged endogenous variables in the system.

Now, to derive the Reduced-Form Equations for this demand and supply SEM, we simply solve for the values of the endogenous variables (Q and PQ) in the system. Because of the Equilibrium Condition, we can set Demand equal to Supply . . .

β0 + β1·PQ + β2·I + eD = β3 + β4·PQ + β5·PM + eS

and solve for the endogenous variable PQ . . .

$$P_Q = \left[\frac{\beta_3-\beta_0}{\beta_1-\beta_4}\right] + \left[\frac{\beta_5}{\beta_1-\beta_4}\right]\cdot P_M + \left[\frac{-\beta_2}{\beta_1-\beta_4}\right]\cdot I + \left[\frac{e_S-e_D}{\beta_1-\beta_4}\right]$$

This is the Reduced-Form Equation for PQ.

Next, plug the Reduced-Form Equation for PQ back into either Demand or Supply, and solve for Q (Demand is used here):

$$Q = \left[\beta_0 + \frac{\beta_1\cdot(\beta_3-\beta_0)}{\beta_1-\beta_4}\right] + \left[\frac{\beta_1\beta_5}{\beta_1-\beta_4}\right]\cdot P_M + \left[\beta_2 + \frac{-\beta_1\beta_2}{\beta_1-\beta_4}\right]\cdot I + \left[e_D + \frac{\beta_1(e_S-e_D)}{\beta_1-\beta_4}\right]$$

This is the Reduced-Form Equation for Q.

Now we can do regression analysis on the Reduced-Form Equations, and we won’t have a problem with Simultaneous Equations Bias. Notice that the terms in brackets are either collections of constants or collections of constants and error terms. Each collection of constants acts as a big constant, so we’ll replace each collection of constants with a “mega-constant” (I just invented that term). The “mega-constants” are the Reduced-Form Coefficients. Also, each collection of constants and error terms acts as a big error term, so we’ll replace each of these collections with a “mega-error term” (I just invented that term, too). I’ll use tildes (squiggles) to denote the mega terms in the Reduced-Form equation for PQ, and I’ll use hats to denote the mega terms in the Reduced-Form equation for Q, like this:

$$P_Q = \tilde{\beta}_0 + \tilde{\beta}_1\cdot P_M + \tilde{\beta}_2\cdot I + \tilde{e}_{P_Q}$$

$$Q = \hat{\beta}_0 + \hat{\beta}_1\cdot P_M + \hat{\beta}_2\cdot I + \hat{e}_Q$$

The “tilde-β's” and “hat-β's” in the equations above are the Reduced-Form Coefficients.
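To make the link between the mega-constants and the original β's explicit, here is a short worked version of the algebra. It is a sketch derived from the bracketed terms in the two Reduced-Form Equations, not something taken from the handout, and it assumes β2 ≠ 0 and β5 ≠ 0 so that no division by zero occurs. The first block restates what each Reduced-Form Coefficient stands for; the second solves for the Structural/Behavioral β's.

```latex
% What each Reduced-Form Coefficient represents
\begin{aligned}
\tilde{\beta}_0 &= \frac{\beta_3-\beta_0}{\beta_1-\beta_4}, &
\tilde{\beta}_1 &= \frac{\beta_5}{\beta_1-\beta_4}, &
\tilde{\beta}_2 &= \frac{-\beta_2}{\beta_1-\beta_4}, \\
\hat{\beta}_0 &= \beta_0 + \beta_1\tilde{\beta}_0, &
\hat{\beta}_1 &= \beta_1\tilde{\beta}_1, &
\hat{\beta}_2 &= \beta_2 + \beta_1\tilde{\beta}_2 .
\end{aligned}

% Solving for the Demand parameters (uses P_M, which is excluded from Demand)
\beta_1 = \frac{\hat{\beta}_1}{\tilde{\beta}_1}, \qquad
\beta_2 = \hat{\beta}_2 - \beta_1\tilde{\beta}_2, \qquad
\beta_0 = \hat{\beta}_0 - \beta_1\tilde{\beta}_0

% Solving for the Supply parameters (uses I, which is excluded from Supply)
\beta_4 = \frac{\hat{\beta}_2}{\tilde{\beta}_2}, \qquad
\beta_5 = \hat{\beta}_1 - \beta_4\tilde{\beta}_1, \qquad
\beta_3 = \hat{\beta}_0 - \beta_4\tilde{\beta}_0
```

Notice that the Demand slope β1 is recovered from the coefficients on PM, and the Supply slope β4 from the coefficients on I. This is the earlier identification idea in algebraic form: each equation is identified by a variable that is in the system but not in that equation.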
Run a regression analysis separately on each of the Reduced-Form equations above (the tilde and hat equations) to get numbers for the tilde-β's and hat-β's. Then, you can set each of the tilde-β's and hat-β's equal to the bracketed collection of β's that it represents, and, with some tedious algebra, solve for the original β's in the original Structural/Behavioral Equations!! (ta-da!)

When Does the ILS Regression Method Actually Work? The Rank and Order Conditions

Sadly, when regression analysis is used to estimate the Reduced-Form Coefficients (the “tilde-β's” and “hat-β's”) in the Reduced-Form Equations, it is not always possible to use these values to solve for the original β's in the original Structural/Behavioral Equations. (Egad!) The Rank and Order Conditions are a set of rules that determine whether it is possible to solve for the original Structural/Behavioral Coefficients from the Reduced-Form Coefficients. To use the Rank and Order Conditions, we need to define a few more terms:

M = the number of endogenous variables in the SEM system of equations
m = the number of endogenous variables in the equation of interest (the equation for which you want the β's)
K = the number of predetermined variables in the SEM system of equations
k = the number of predetermined variables in the equation of interest (the equation for which you want the β's)
A = the matrix of β's that is constructed from the β's of the variables excluded from the equation of interest

Okay, with the terms above, we can now give the Rank and Order Conditions (drum roll . . .):

If (K – k < m – 1), then the equation of interest is under-identified
If (K – k = m – 1) AND (rank of matrix A = M – 1), then the equation of interest is exactly-identified
If (K – k = m – 1) AND (rank of matrix A < M – 1), then the equation of interest is under-identified
If (K – k > m – 1) AND (rank of matrix A = M – 1), then the equation of interest is over-identified
If (K – k > m – 1) AND (rank of matrix A < M – 1), then the equation of interest is under-identified

Exactly-identified means that you will be able to use the ILS Regression Method to solve for the β's in the equation of interest (Yea!).

Under-identified means that there are not enough variables in the SEM that are excluded from the equation of interest to be able to solve for the β's in the equation of interest. So, you will need to go “back to the drawing board” to change the equations in your SEM or change the variables that are in your SEM until you achieve an exactly-identified or over-identified SEM. (Sad sigh . . .)

Over-identified means that you will be able to solve for the β's in the equation of interest, but, ironically, there will be more than one set of solutions for the β's in the equation of interest, and you don’t know which set is the true set! In this case, there is an extra step, or “stage,” in the analysis that you can perform in order to obtain estimates of the “true” set of β's. Perhaps not surprisingly, the analysis method that involves the extra stage is called (you can’t make this stuff up) . . . Two-Stage Least Squares regression analysis (affectionately abbreviated 2SLS).

Two-Stage Least Squares (2SLS) Regression Analysis

Two-Stage Least Squares (2SLS) Regression Analysis is a method of estimating the original β's in the original Structural/Behavioral Equations of an SEM when the SEM is over-identified. The steps of the method are:

1. Regress each endogenous variable on all of the predetermined variables in the system (this is the first stage of 2SLS).
2. Use the equations to predict the values of the endogenous variables. These “predicted variables” are called Instrumental Variables.
3. Replace any endogenous variables appearing as right-hand-side “X” variables in the equation of interest with the corresponding Instrumental Variables.
4. Estimate the β's of the original equation of interest (with the Instrumental Variables replacing the endogenous right-hand-side X variables) using regression analysis (this is the second stage of 2SLS).

For example, suppose we were working with the demand and supply equations described earlier in this handout, but the supply equation had some additional exogenous variables in it, “R” and “G” (doesn’t matter what they are):

Demand: QD = β0 + β1·PQ + β2·I + eD, where "eD" is the error term in Demand
Supply: QS = β3 + β4·PQ + β5·PM + β6·R + β7·G + eS, where "eS" is the error term in Supply
Equilibrium Condition: QD = QS

If we check the Rank and Order Conditions, the demand equation in the system above would be over-identified (three predetermined variables, PM, R and G, are excluded from demand, so K – k = 3, which is greater than m – 1 = 1), so we could not use the ILS regression method to find its β's. However, we could use the 2SLS method to find the β's in the demand equation:

1. First, regress PQ on I, PM, R and G using OLS (this is the first, “extra,” stage of 2SLS).
2. Use the equation to predict the value of PQ. This predicted variable is the Instrumental Variable for PQ.
3. Replace the PQ in the demand equation with the Instrumental Variable (the predicted PQ).
4. Estimate the β's of the demand equation (with the Instrumental Variable replacing PQ on the right-hand side) using OLS regression analysis (this is the second stage of 2SLS).

Indirect Least Squares (ILS) in SAS

The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS in SAS.
As an example of ILS from macroeconomics, suppose we have data on aggregate consumption (c), national income (y) and aggregate investment (i), and suppose we believe that these variables are related to one another in the following SEM:

c = β1 + β2·y + ec, where "ec" is an error term,
y = c + i + ey, where "ey" is an error term,

(Note: The y equation has no β's, because in macro theory, "c + i" adds exactly to "y", plus error. This kind of equation is called an "Identity" equation.)

These two equations together are a simultaneous equation model (SEM) because there are two or more variables (in this case, c and y) that are in both equations. The variables c and y are endogenous, because they are in both equations, but the variable i is exogenous because it is in one equation only. (The consumption equation is exactly-identified, and for an exactly-identified equation the 2SLS estimates are the same as the ILS estimates, which is why the 2SLS procedure can be used for ILS.) In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. SAS calls the exogenous variables "instruments."

    proc syslin 2sls data=dataset02;
       model c = y;          /* consumption equation */
       model y = c i;        /* income equation */
       endogenous c y;       /* variables determined inside the system */
       instruments i;        /* exogenous variables */
    run;

Two Stage Least Squares (2SLS) in SAS

The "PROC SYSLIN 2SLS" procedure is used for both ILS and 2SLS. As an example of 2SLS from microeconomics, let's consider supply and demand for product Q. Suppose we have data on the quantity Q of the product traded in various locations, the price PQ in each location, average consumer income I in each location, the price of a substitute product PS in each location, and the price of materials PM in each location. We want to estimate the supply and demand equations:

Demand: Q = β0 + β1·PQ + β2·I + β3·PS + eD, where "eD" is an error term in the Demand equation,
Supply: Q = β4 + β5·PQ + β6·PM + eS, where "eS" is an error term in the Supply equation,

These two equations together are a simultaneous system because there are two or more variables (in this case, Q and PQ) that are in both equations.
The variables Q and PQ are endogenous, because they are in both equations, but the variables I, PS and PM are exogenous because each is in only one equation. In SAS, you must specify the model equations, which variables are endogenous, and which are exogenous. Again, SAS calls the exogenous variables "instruments."

    proc syslin 2sls data=dataset02;
       model Q = PQ I PS;      /* demand equation */
       model Q = PQ PM;        /* supply equation */
       endogenous Q PQ;        /* variables determined inside the system */
       instruments I PS PM;    /* exogenous variables */
    run;
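For readers who want to see the two stages carried out by hand, the following sketch repeats the supply and demand example on simulated data. It is written in Python with NumPy rather than SAS, and it is only an illustration under stated assumptions: the "true" β values, the sample size, and the distributions of I, PS, PM and the error terms are all hypothetical. It follows the four 2SLS steps given earlier, and it also computes the ILS estimate of β1 from the Reduced-Form Coefficients for comparison.

```python
import numpy as np

rng = np.random.default_rng(2024)  # fixed seed so the run is repeatable
n = 50_000                         # number of simulated locations (assumption)

# Hypothetical "true" structural parameters (illustrative, not from the handout)
b0, b1, b2, b3 = 100.0, -2.0, 0.5, 1.0  # Demand: Q = b0 + b1*PQ + b2*I + b3*PS + eD
b4, b5, b6 = 10.0, 3.0, -1.0            # Supply: Q = b4 + b5*PQ + b6*PM + eS

# Exogenous variables and error terms (distributions are assumptions)
I = rng.normal(50, 10, n)
PS = rng.normal(30, 5, n)
PM = rng.normal(20, 5, n)
eD = rng.normal(0, 5, n)
eS = rng.normal(0, 5, n)

# Equilibrium: set Demand equal to Supply, solve for PQ, then get Q from Demand
PQ = ((b4 - b0) + b6 * PM - b2 * I - b3 * PS + (eS - eD)) / (b1 - b5)
Q = b0 + b1 * PQ + b2 * I + b3 * PS + eD


def ols(y, *xs):
    """Return OLS coefficients [intercept, slope1, ...] of y on the regressors xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Stage 1: regress the endogenous PQ on ALL exogenous variables, then predict PQ
stage1 = ols(PQ, I, PS, PM)
PQ_hat = stage1[0] + stage1[1] * I + stage1[2] * PS + stage1[3] * PM

# Stage 2: re-estimate each structural equation with PQ_hat in place of PQ
demand_2sls = ols(Q, PQ_hat, I, PS)  # [b0, b1, b2, b3]
supply_2sls = ols(Q, PQ_hat, PM)     # [b4, b5, b6]

# Plain OLS on the Demand equation, for comparison (suffers from the bias)
demand_ols = ols(Q, PQ, I, PS)

# ILS for the Demand slope: ratio of the reduced-form coefficients on PM,
# the one exogenous variable excluded from Demand
rf_Q = ols(Q, I, PS, PM)
ils_b1 = rf_Q[3] / stage1[3]

print("Demand slope on PQ  true:", b1, " OLS:", demand_ols[1],
      " 2SLS:", demand_2sls[1], " ILS:", ils_b1)
print("Supply slope on PQ  true:", b5, " 2SLS:", supply_2sls[1])
```

In this setup the Demand equation is exactly-identified (only PM is excluded from it), so the ILS and 2SLS estimates of β1 should agree, while the Supply equation is over-identified (I and PS are both excluded) and is estimated by 2SLS only. One caution about the hand-rolled approach: the standard errors reported by an ordinary second-stage OLS regression are not the correct 2SLS standard errors, which is one reason to use a dedicated routine such as PROC SYSLIN for real work.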