Universidad Carlos III de Madrid
ME + MIEM
César Alonso

ECONOMETRICS I
THE MULTIPLE LINEAR REGRESSION MODEL

Contents

1 The Multiple Regression Model
  1.1 Assumptions of the Multiple Regression Model
  1.2 Interpretation of coefficients
  1.3 The relation between multiple and simple regression: long vs. short regression
2 Parameter interpretation in the most usual specifications
  2.1 Linear in variables model
  2.2 Semilogarithmic models
    2.2.1 Model with log in the exogenous variable
    2.2.2 Model with log in the endogenous variable
  2.3 Double logarithmic model
  2.4 Model with quadratic terms
  2.5 Other models
    2.5.1 Reciprocal model
    2.5.2 Models with interactions
  2.6 Final comments
3 Estimation in the multiple regression model: OLS
  3.1 Properties of the OLS estimators
  3.2 Estimation of $\sigma^2$
  3.3 Variances of the OLS estimators
  3.4 Goodness of fit measures
4 Inference in the multiple regression model
  4.1 Hypothesis tests on a single coefficient
  4.2 Tests about a linear restriction on several parameters
  4.3 Tests about q linear restrictions
  4.4 Test of joint significance

1 The Multiple Regression Model

In most economic applications more than two variables are involved, since the factors explaining an economic phenomenon are usually multiple. This points out the limitation of the simple regression model for empirical analysis: the extension to several explanatory variables is imperative to address most interesting real-world problems.

We can propose a model that considers a multiple relationship between $Y$ and several other variables $X_1, X_2, \dots, X_K$.

Examples:
– $Y$ = wage; $X_1$ = education, $X_2$ = experience, $X_3$ = gender.
– $Y$ = sales; $X_1$ = advertising expenditure, $X_2$ = prices.

So, for example, we will have:
$$\text{Wage} = \beta_0 + \beta_1\,\text{Education} + \beta_2\,\text{Experience} + \beta_3\,\text{Gender} + \varepsilon$$

– The unobserved error term $\varepsilon$ captures any factors other than Education, Experience or Gender affecting the Wage.
– Notice that, since the multiple regression model accounts for several factors, it will be easier to argue the independence of the observed explanatory variables with respect to the unobserved factors included in $\varepsilon$.
– If we are interested in the effect of Education on Wages keeping other factors constant, in our example we can ensure that we measure the effect of Education for a given Experience and a given Gender. On the contrary, in the simple regression model, the coefficient of education can be interpreted as the effect of education for a given experience and gender only if experience and gender were uncorrelated with education.

1.1 Assumptions of the Multiple Regression Model

Let the multiple linear regression model be:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon$$

The assumptions are very similar to those invoked in the simple regression model.

1. Linearity in parameters.

2.
Zero conditional mean: $E(\varepsilon \mid X_1, X_2, \dots, X_K) = 0$ for any combination of values of $X_1, X_2, \dots, X_K$.
$$\Rightarrow\ E(Y \mid X_1, X_2, \dots, X_K) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K$$

3. Homoskedasticity: $V(\varepsilon \mid X_1, X_2, \dots, X_K) = \sigma^2 \ \Rightarrow\ V(Y \mid X_1, X_2, \dots, X_K) = \sigma^2$.

4. Absence of multicollinearity: no explanatory variable is an exact linear function of the other explanatory variables.

As in the simple regression case, assumption 2 is crucial for the model parameters to have a causal interpretation. But, since we condition on several variables, this assumption is more likely to be fulfilled.

Assumptions 1 and 2 imply that:

– The Conditional Expectation Function (CEF) is linear:
$$E(Y \mid X_1, X_2, \dots, X_K) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K$$
– For each possible combination of $(X_1, X_2, \dots, X_K)$, the CEF yields the mean of $Y$ in the subpopulation with the corresponding values of $X_1, X_2, \dots, X_K$.
– The CEF, as in the simple linear model, coincides with $L(Y \mid X_1, X_2, \dots, X_K)$ and is the best predictor, in the sense that it minimizes $E(\varepsilon^2)$, where
$$\varepsilon = \text{prediction error} = Y - c(X_1, \dots, X_K) = Y - (\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K).$$

Hence, the first order conditions determining the $\beta$'s are:
$$E(\varepsilon) = 0;\quad C(X_1, \varepsilon) = 0;\ \dots;\ C(X_K, \varepsilon) = 0.$$

Example: linear regression model with two explanatory variables:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
where $Y$ = earnings, $X_1$ = education, $X_2$ = gender (1 if woman, 0 if man).

– We have that $E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, so that
$$E(Y \mid X_1, X_2 = 0) = \beta_0 + \beta_1 X_1$$
$$E(Y \mid X_1, X_2 = 1) = (\beta_0 + \beta_2) + \beta_1 X_1$$

Consequently, if $\beta_2 < 0$, $E(Y \mid X_1, X_2 = 0)$ is a line parallel to $E(Y \mid X_1, X_2 = 1)$ and above it.

1.2 Interpretation of coefficients

If all variables except $X_j$ remain constant (other things equal),
$$\Delta E(Y \mid X_1, X_2, \dots, X_K) = \beta_j\, \Delta X_j$$
and therefore
$$\beta_j = \frac{\Delta E(Y \mid X_1, X_2, \dots, X_K)}{\Delta X_j} = \frac{\partial E(Y \mid X_1, X_2, \dots, X_K)}{\partial X_j}.$$

Thus, we interpret $\beta_j$ as: when $X_j$ increases by one unit (all other things constant), $Y$ varies, on average, by $\beta_j$ units of $Y$.
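This partial-effect interpretation can be checked numerically. Below is a minimal sketch with simulated data; the coefficient values (0.8 for education, −2.0 for the woman dummy) are hypothetical, chosen only for illustration:

```python
import numpy as np

# Simulate a wage equation with two regressors; coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 100_000
gender = rng.integers(0, 2, size=n).astype(float)     # X2 = 1 if woman
educ = 12 + rng.normal(0, 2, size=n) - 1.0 * gender   # education correlated with gender
wage = 5.0 + 0.8 * educ - 2.0 * gender + rng.normal(0, 1, size=n)

# Multiple regression: the education slope is the ceteris paribus effect (~0.8).
X_long = np.column_stack([np.ones(n), educ, gender])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]

# Simple regression of wage on education alone: the slope also absorbs part of
# the gender effect, because education and gender are correlated here.
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

print(round(b_long[1], 2), round(b_short[1], 2))
```

The gap between the two slopes is the subject of Section 1.3 on long vs. short regressions.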
This interpretation corresponds to the ceteris paribus notion. Precisely this ability to make ceteris paribus comparisons, estimating the relationship between one variable and another while holding the rest constant, is the value of econometric analysis.

It must be noticed that the multiple regression
$$Y = E(Y \mid X_1, X_2, \dots, X_K) + \varepsilon$$
answers a different question than the simple regressions
$$Y = E(Y \mid X_1) + \varepsilon_1;\ \dots;\ Y = E(Y \mid X_K) + \varepsilon_K.$$

Example: considering earnings, education and gender,
– $E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ = expected earnings for $X_1$ years of education and a given gender $X_2$. $\beta_1$: change in earnings due to an additional year of education, for a given gender.
– $E(Y \mid X_1) = \gamma_0 + \gamma_1 X_1$ = expected earnings for $X_1$ years of education. $\gamma_1$: change in earnings due to an additional year of education, without controlling for gender.

1.3 The relation between multiple and simple regression: long vs. short regression

Consider the simplest multiple linear regression model (population "long regression"):
$$Y = E(Y \mid X_1, X_2) + \varepsilon,\quad \text{where } E(Y \mid X_1, X_2) = L(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2.$$

The parameters $\beta_0$, $\beta_1$ and $\beta_2$ must verify:
$$E(\varepsilon) = 0;\quad C(X_1, \varepsilon) = 0;\quad C(X_2, \varepsilon) = 0.$$
That is,
$$E(\varepsilon) = 0 \ \Rightarrow\ \beta_0 = E(Y) - \beta_1 E(X_1) - \beta_2 E(X_2) \tag{1}$$
$$C(X_1, \varepsilon) = 0 \ \Rightarrow\ \beta_1 V(X_1) + \beta_2 C(X_1, X_2) = C(X_1, Y) \tag{2}$$
$$C(X_2, \varepsilon) = 0 \ \Rightarrow\ \beta_1 C(X_1, X_2) + \beta_2 V(X_2) = C(X_2, Y) \tag{3}$$

From (2) and (3) we have:
$$\beta_1 = \frac{V(X_2)\,C(X_1, Y) - C(X_1, X_2)\,C(X_2, Y)}{V(X_1)V(X_2) - [C(X_1, X_2)]^2}$$
$$\beta_2 = \frac{V(X_1)\,C(X_2, Y) - C(X_1, X_2)\,C(X_1, Y)}{V(X_1)V(X_2) - [C(X_1, X_2)]^2}$$

Note that, if $C(X_1, X_2) = 0$:
$$\beta_1 = \frac{C(X_1, Y)}{V(X_1)} \ (\text{slope of } L(Y \mid X_1)),\qquad \beta_2 = \frac{C(X_2, Y)}{V(X_2)} \ (\text{slope of } L(Y \mid X_2)).$$

Consider now the simple linear regression model (population "short regression"):
$$Y = E(Y \mid X_1) + \varepsilon_1,\quad \text{where } E(Y \mid X_1) = L(Y \mid X_1) = \gamma_0 + \gamma_1 X_1.$$
The parameters $\gamma_0$ and $\gamma_1$ must verify:
$$E(\varepsilon_1) = 0 \ \Rightarrow\ \gamma_0 = E(Y) - \gamma_1 E(X_1) \tag{4}$$
$$C(X_1, \varepsilon_1) = 0 \ \Rightarrow\ \gamma_1 = C(X_1, Y)/V(X_1) \tag{5}$$

From (2) and (5),
$$\gamma_1 = \frac{C(X_1, Y)}{V(X_1)} = \frac{\beta_1 V(X_1) + \beta_2 C(X_1, X_2)}{V(X_1)} = \beta_1 + \beta_2\,\frac{C(X_1, X_2)}{V(X_1)}.$$

Hence:
– $\gamma_1 = \beta_1$ only if either $C(X_1, X_2) = 0$ or $\beta_2 = 0$.
– $C(X_1, X_2)/V(X_1)$ is the slope of $L(X_2 \mid X_1)$: $L(X_2 \mid X_1) = \delta_0 + \delta_1 X_1$.

By the same reasoning, there is always another simple linear regression:
$$E(Y \mid X_2) = L(Y \mid X_2) = \lambda_0 + \lambda_2 X_2$$
where the parameters $\lambda_0$ and $\lambda_2$ must verify:
$$\lambda_0 = E(Y) - \lambda_2 E(X_2) \tag{4'}$$
$$\lambda_2 = C(X_2, Y)/V(X_2) \tag{5'}$$

From (3) and (5'),
$$\lambda_2 = \frac{C(X_2, Y)}{V(X_2)} = \frac{\beta_2 V(X_2) + \beta_1 C(X_1, X_2)}{V(X_2)} = \beta_2 + \beta_1\,\frac{C(X_1, X_2)}{V(X_2)}.$$

Likewise:
– $\lambda_2 = \beta_2$ only if either $C(X_1, X_2) = 0$ or $\beta_1 = 0$.
– $C(X_1, X_2)/V(X_2)$ is the slope of $L(X_1 \mid X_2)$.

2 Parameter interpretation in the most usual specifications

Goldberger: Chapters 7 (7.5) and 13 (13.2). Wooldridge: Chapters 2 (2.4), 3 (3.1) and 6 (6.2).

We have focused on linear relations (both in parameters and in variables) between the dependent variable $Y$ and the explanatory variables $X_1, \dots, X_K$. However, many relations in economics are nonlinear. Provided that the model is linear in parameters, regression analysis allows us to introduce nonlinear relations.

Key point: in general, when we say that the regression model is linear, we mean that the model is linear in parameters. It may well refer to nonlinear transformations of the original variables.

The concept of elasticity is very important in economics: it measures the percentage change in a variable ($Y$) in response to a percentage change in another variable ($X$).
– In general, elasticities are not constant for most specifications. The value will depend on the realized values of the explanatory variable ($X$) and the response variable ($Y$).
– The transformation that we apply to the variables will affect the way in which elasticities are calculated.

We will consider the most usual specifications. For the sake of simplicity, we will concentrate on models with one or two explanatory variables.

Summary of the most usual specifications (causal effect, elasticity and interpretation):

– Linear: $Y = \beta_0 + \beta_1 X + \varepsilon$. Causal effect: $\beta_1 = \Delta E(Y \mid X)/\Delta X$; elasticity: $\beta_1 X / E(Y \mid X)$. As $X$ increases by 1 unit, $Y$ varies on average by $\beta_1$ units.
– Semilog ($X$): $Y = \beta_0 + \beta_1 \ln X + \varepsilon$. Causal effect: $\beta_1 \simeq \Delta E(Y \mid X)/(\Delta X / X)$; elasticity: $\beta_1 / E(Y \mid X)$. As $X$ increases by 1%, $Y$ varies on average by $\beta_1/100$ units.
– Semilog ($Y$): $\ln Y = \beta_0 + \beta_1 X + \varepsilon$. Causal effect: $\beta_1 \simeq E[(\Delta Y / Y) \mid X]/\Delta X$; elasticity: $\beta_1 X$. As $X$ increases by 1 unit, $Y$ varies on average by $(100\,\beta_1)\%$.
– Double log: $\ln Y = \beta_0 + \beta_1 \ln X + \varepsilon$. Causal effect: $\beta_1 \simeq E[(\Delta Y / Y) \mid X]/(\Delta X / X)$; elasticity: $\beta_1$. When $X$ increases by 1%, $Y$ varies on average by $\beta_1\%$.
– Reciprocal: $Y = \beta_0 + \beta_1 (1/X) + \varepsilon$. Causal effect: $\Delta E(Y \mid X)/\Delta X = -\beta_1/X^2$. As $X$ increases by 1 unit, $Y$ varies on average by $-\beta_1 (1/X^2)$ units.
– Interactions (2 or more variables): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$. Causal effect: $\Delta E(Y \mid X_1, X_2)/\Delta X_1 = \beta_1 + \beta_3 X_2$. As $X_1$ increases by 1 unit, $Y$ varies on average by $(\beta_1 + \beta_3 X_2)$ units.

2.1 Linear in variables model

The model is simply $Y = \beta_0 + \beta_1 X + \varepsilon$, where $E(\varepsilon \mid X) = 0 \Rightarrow E(Y \mid X) = \beta_0 + \beta_1 X$.

Interpretation of $\beta_1$:
$$\beta_1 = \frac{\Delta E(Y \mid X)}{\Delta X} \ \Rightarrow\ \text{as } X \text{ increases by 1 unit, } Y \text{ varies on average by } \beta_1 \text{ units of } Y.$$

Elasticity of $E(Y \mid X)$ with respect to $X$:
$$\frac{E[(\Delta Y / Y) \mid X]}{\Delta X / X} = \beta_1\,\frac{X}{E(Y \mid X)}.$$

The elasticity varies with the possible realizations of $X$ and $Y$, so it is not constant. Usually, we calculate elasticities for particular individuals, with particular values of $X$ and $Y$, as $\beta_1 X / Y$.

2.2 Semilogarithmic models

2.2.1 Model with log in the exogenous variable

Sometimes, percentage changes in $X$ lead to constant changes in $Y$:
$$Y = \beta_0 + \beta_1 \ln X + \varepsilon,\quad \text{where } E(\varepsilon \mid X) = 0 \Rightarrow E(Y \mid X) = \beta_0 + \beta_1 \ln X.$$

Interpretation of $\beta_1$:
$$\beta_1 = \frac{\Delta E(Y \mid X)}{\Delta \ln X} \simeq \frac{\Delta E(Y \mid X)}{\Delta X / X}.$$
Notice that $\beta_1$ is a semielasticity.

The elasticity of $E(Y \mid X)$ with respect to $X$ is $\beta_1 / E(Y \mid X)$, which depends on the particular realization of $E(Y \mid X)$. Usually, we calculate elasticities for particular individuals, with a particular value of $Y$, as $\beta_1 / Y$; or we use the sample mean of $Y$, $\bar{Y}$; or we estimate $E(Y \mid X)$ as the predicted value of $Y$ at the sample means of the $X$'s.

Multiplying and dividing by 100 to express the change in $X$ in percentage terms,
$$\frac{\beta_1}{100} \simeq \frac{\Delta E(Y \mid X)}{100\,\Delta X / X} \ \Rightarrow\ \text{as } X \text{ increases by 1\%, } Y \text{ varies on average by } \beta_1/100 \text{ units of } Y.$$

Example: let $Y$ = consumption (in euros), $X$ = income (in euros).
Consider two alternative models:

Model 1: $Y = \beta_0 + \beta_1 X + \varepsilon$.
In this model, $\beta_1 = \Delta E(Y \mid X)/\Delta X \Rightarrow$ if income increases by 1 euro, consumption varies on average by $\beta_1$ euros (the Marginal Propensity to Consume, MPC, is constant).

Model 2: $Y = \beta_0 + \beta_1 \ln X + \varepsilon$.
In this model, $\beta_1/100 \simeq \Delta E(Y \mid X)/(100\,\Delta X/X) \Rightarrow$ if income increases by 1%, consumption varies on average by $\beta_1/100$ euros. (Here the MPC is $\beta_1/X$, which is not constant: it decreases with income.)

2.2.2 Model with log in the endogenous variable

Sometimes, variations in $X$ entail percentage changes in $Y$:
$$\ln Y = \beta_0 + \beta_1 X + \varepsilon,\quad \text{where } E(\varepsilon \mid X) = 0 \Rightarrow E(\ln Y \mid X) = \beta_0 + \beta_1 X.$$
In terms of the original variables, this model can be expressed as $Y = \exp(\beta_0 + \beta_1 X + \varepsilon)$.

Interpretation of $\beta_1$:
$$\beta_1 = \frac{\Delta E(\ln Y \mid X)}{\Delta X} \simeq \frac{E[(\Delta Y / Y) \mid X]}{\Delta X},$$
so that
$$100\,\beta_1 \simeq \frac{E[(100\,\Delta Y / Y) \mid X]}{\Delta X} \ \Rightarrow\ \text{when } X \text{ varies by 1 unit, } Y \text{ varies on average by } (100\,\beta_1)\%.$$
$\beta_1$ is a semielasticity. The elasticity of $E(Y \mid X)$ with respect to $X$ is equal to $\beta_1 X$ (so it varies with the value of $X$).

This specification is very useful to describe curves with exponential growth. In particular, if $X = t$ (time), then $Y = \exp(\beta_0 + \beta_1 t + \varepsilon)$ and, since $\beta_1 = \Delta E(\ln Y \mid X)/\Delta t$, $\beta_1$ captures the average growth rate of $Y$ over time.

Example: let $Y$ = hourly wage (euros), $X$ = education (years).

Model 1: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\beta_1 = \Delta E(Y \mid X)/\Delta X \Rightarrow$ an additional year of education implies an increase in the average wage of $\beta_1$ euros. (The mean wage increases by a constant amount $\beta_1$ for each additional year of education, irrespective of the level of education.)

Model 2: $\ln Y = \beta_0 + \beta_1 X + \varepsilon$, where $100\,\beta_1 \simeq E[(100\,\Delta Y/Y) \mid X]/\Delta X \Rightarrow$ an additional year of education implies a percentage increase in the average wage of $(100\,\beta_1)\%$. (The hourly wage increases by $\beta_1 Y$ euros for each additional year of education, which varies with the wage level.)

2.3 Double logarithmic model

It characterizes situations in which percentage changes in $X$ lead to percentage changes in the mean value of $Y$ $\Rightarrow$ constant elasticity.
Very useful in studies of demand, production, costs, etc. The model can be expressed as
$$\ln Y = \beta_0 + \beta_1 \ln X + \varepsilon,\quad \text{where } E(\varepsilon \mid X) = 0 \Rightarrow E(\ln Y \mid X) = \beta_0 + \beta_1 \ln X.$$

Interpretation of $\beta_1$:
$$\beta_1 = \frac{\Delta E(\ln Y \mid X)}{\Delta \ln X} \simeq \frac{E[(\Delta Y / Y) \mid X]}{\Delta X / X} \ \Rightarrow\ \text{when } X \text{ varies by 1\%, } Y \text{ varies on average by } \beta_1\%.$$
($\beta_1$ is an elasticity.)

Example: let $Y$ = output, $X_1$ = labor and $X_2$ = capital.

Model 1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$.
$$\beta_1 = \frac{\Delta E(Y \mid X_1, X_2)}{\Delta X_1} \ \Rightarrow\ \text{if the labor input is increased by 1 unit (keeping capital constant), output varies on average by } \beta_1 \text{ units of output.}$$
$\Rightarrow$ The elasticity of output with respect to labor is not constant:
$$\frac{\Delta Y / Y}{\Delta X_1 / X_1} = \beta_1\,\frac{X_1}{Y}.$$
Analogously, $\beta_2 = \Delta E(Y \mid X_1, X_2)/\Delta X_2 \Rightarrow$ if the capital input is increased by 1 unit (keeping labor constant), output varies on average by $\beta_2$ units of output. $\Rightarrow$ The elasticity of output with respect to capital is not constant: $(\Delta Y/Y)/(\Delta X_2/X_2) = \beta_2 X_2 / Y$.

Model 2: $\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \varepsilon$.
$$\beta_1 \simeq \frac{E[(\Delta Y / Y) \mid X_1, X_2]}{\Delta X_1 / X_1} \ \Rightarrow\ \text{if the labor input is increased by 1\% (keeping capital constant), output varies on average by } \beta_1\%.$$
$\Rightarrow$ The elasticity of output with respect to labor is constant. Likewise, if the capital input varies by 1% (keeping labor constant), output varies on average by $\beta_2\%$. $\Rightarrow$ The elasticity of output with respect to capital is constant.

– Note that this model has the following representation in terms of the original variables:
$$Y = b_0\, X_1^{\beta_1} X_2^{\beta_2} \exp(\varepsilon) \quad \text{(Cobb–Douglas)}$$

2.4 Model with quadratic terms

This specification allows for increasing or decreasing marginal effects of $X$ on $Y$:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon,\quad \text{where } E(\varepsilon \mid X) = 0 \Rightarrow E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2.$$
It is useful for production technologies or cost functions.

Here,
$$\frac{\Delta E(Y \mid X)}{\Delta X} = \beta_1 + 2\beta_2 X,$$
so that when $X$ varies by 1 unit, $Y$ varies on average by $(\beta_1 + 2\beta_2 X)$ units. Note that $\beta_1$ and $\beta_2$ cannot be interpreted separately.

– The sign of $\beta_2$ determines whether the marginal effect is increasing ($\beta_2 > 0$) or decreasing ($\beta_2 < 0$).
– There is a critical value of $X$ after which the sign of the effect of $X$ on $E(Y \mid X)$ switches.
That critical value is $X^* = -\beta_1 / (2\beta_2)$.

Example: let $Y$ = hourly wage (euros), $X_1$ = education (years), $X_2$ = labor experience (years).

Model 1: $\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$.
If experience increases by 1 year, keeping education constant, the wage varies on average by $(100\,\beta_2)\%$.

Model 2: $\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_2^2 + \varepsilon$.
If experience increases by 1 year, keeping education constant, the wage varies on average by $100\,(\beta_2 + 2\beta_3 X_2)\%$ $\Rightarrow$ the return to an additional year of experience is not constant (it depends on the years of experience).

2.5 Other models

2.5.1 Reciprocal model

$$Y = \beta_0 + \beta_1 \frac{1}{X} + \varepsilon,\quad \text{where } E(\varepsilon \mid X) = 0 \Rightarrow E(Y \mid X) = \beta_0 + \beta_1 \frac{1}{X}.$$

– It implies a hyperbolic curvature.
– It is used to describe nonlinear inverse relationships, such as the Phillips curve (unemployment–inflation tradeoff).
– When $X$ varies by 1 unit, $Y$ varies on average by $-\beta_1 (1/X^2)$ units of $Y$.

2.5.2 Models with interactions

Sometimes, the effect of one explanatory variable depends on the level of another:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon,$$
where $E(\varepsilon \mid X_1, X_2) = 0 \Rightarrow E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$.

– If $X_1$ varies by 1 unit, $Y$ varies on average by $(\beta_1 + \beta_3 X_2)$ units.
– Note that the parameters cannot be interpreted separately.

2.6 Final comments

The different transformations above can be combined in one model, so that we can have logarithmic or semilogarithmic models with interactions, powers, etc.

Example: translogarithmic production function.
– Let $Y$ = output, $X_1$ = labor and $X_2$ = capital:
$$\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \beta_3 (\ln X_1)^2 + \beta_4 (\ln X_2)^2 + \beta_5 (\ln X_1)(\ln X_2) + \varepsilon$$
Here, the elasticities of output with respect to either labor or capital are not constant, despite the variables being in logarithms:
$$\frac{E[(\Delta Y / Y) \mid X_1, X_2]}{\Delta X_1 / X_1} \simeq \beta_1 + 2\beta_3 \ln X_1 + \beta_5 \ln X_2,$$
which depends on the logarithms of both inputs.
– This specification is also useful to model expenditure functions or cost functions.
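The quadratic-experience profile of Model 2 can be sketched numerically. The coefficients below are hypothetical, chosen only to illustrate a positive but diminishing return to experience:

```python
# Hypothetical coefficients for ln(wage) = b0 + b1*educ + b2*exper + b3*exper**2;
# the numbers are illustrative, not estimates from the text.
b2, b3 = 0.04, -0.0008   # assumed: positive return, diminishing with experience

def return_to_experience(exper):
    # Approximate % change in the mean wage from one more year of experience:
    # 100 * (b2 + 2*b3*exper), as derived for the quadratic model.
    return 100 * (b2 + 2 * b3 * exper)

# The marginal return falls with experience and changes sign at X* = -b2/(2*b3).
turning_point = -b2 / (2 * b3)

print(return_to_experience(0))    # highest return at zero experience
print(return_to_experience(20))   # smaller return after 20 years
print(turning_point)              # about 25 years with these assumed values
```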
3 Estimation in the multiple regression model: OLS

Goldberger: Chapters 6 (6.4), 8 (8.2 and 8.3), 9 (9.2 and 9.4), 10 (10.2) and 12 (12.1 and 12.3). Wooldridge: Chapters 2 (2.2, 2.3, 2.5 and 2.6), 3 (3.2–3.5), 5 (5.1 and 5.3).

Estimation follows the same rationale as in the simple regression case. Consider the model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon$$
with the assumptions above. Recall that the analog principle allows us to derive estimators for the $\beta$'s that coincide with the OLS estimators.

We illustrate with the two-variable case, where the population parameters satisfy:
$$\beta_0 = E(Y) - \beta_1 E(X_1) - \beta_2 E(X_2)$$
$$\beta_1 = \frac{V(X_2)\,C(X_1, Y) - C(X_1, X_2)\,C(X_2, Y)}{V(X_1)V(X_2) - [C(X_1, X_2)]^2}$$
$$\beta_2 = \frac{V(X_1)\,C(X_2, Y) - C(X_1, X_2)\,C(X_1, Y)}{V(X_1)V(X_2) - [C(X_1, X_2)]^2}$$
and, applying the analog principle,
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$$
$$\hat{\beta}_1 = \frac{S_2^2 S_{1y} - S_{12} S_{2y}}{S_1^2 S_2^2 - S_{12}^2},\qquad \hat{\beta}_2 = \frac{S_1^2 S_{2y} - S_{12} S_{1y}}{S_1^2 S_2^2 - S_{12}^2}$$
where
$$S_1^2 = \tfrac{1}{n}\textstyle\sum_i (X_{1i} - \bar{X}_1)^2,\quad S_2^2 = \tfrac{1}{n}\sum_i (X_{2i} - \bar{X}_2)^2,$$
$$S_{1y} = \tfrac{1}{n}\textstyle\sum_i (X_{1i} - \bar{X}_1)(Y_i - \bar{Y}),\quad S_{2y} = \tfrac{1}{n}\sum_i (X_{2i} - \bar{X}_2)(Y_i - \bar{Y}),$$
$$S_{12} = \tfrac{1}{n}\textstyle\sum_i (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2) = S_{21}.$$

In the general case, under the least squares criterion the OLS estimator solves the problem
$$\min_{\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_K}\ \frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2,$$
where $\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_K X_{Ki})$ is the residual.

The first order conditions for OLS are:
$$\frac{1}{n}\sum_i \hat{\varepsilon}_i = 0;\quad \frac{1}{n}\sum_i \hat{\varepsilon}_i X_{1i} = 0;\ \dots;\ \frac{1}{n}\sum_i \hat{\varepsilon}_i X_{Ki} = 0,$$
or equivalently
$$\sum_i \hat{\varepsilon}_i = 0;\quad \sum_i \hat{\varepsilon}_i x_{1i} = 0;\ \dots;\ \sum_i \hat{\varepsilon}_i x_{Ki} = 0,$$
where $x_{ji} = X_{ji} - \bar{X}_j$, $j = 1, \dots, K$.

These conditions are simply the sample analogs of the conditions that the $\beta$'s verify in the population:
$$E(\varepsilon) = 0;\quad C(X_1, \varepsilon) = 0;\ \dots;\ C(X_K, \varepsilon) = 0.$$

The first order conditions imply the following system of $(K+1)$ equations in $(K+1)$ unknowns (the $\hat{\beta}$'s):
$$n\hat{\beta}_0 + \hat{\beta}_1 \textstyle\sum_i X_{1i} + \hat{\beta}_2 \sum_i X_{2i} + \dots + \hat{\beta}_K \sum_i X_{Ki} = \sum_i Y_i$$
$$\hat{\beta}_1 \textstyle\sum_i x_{1i}^2 + \hat{\beta}_2 \sum_i x_{2i} x_{1i} + \dots + \hat{\beta}_K \sum_i x_{Ki} x_{1i} = \sum_i y_i x_{1i}$$
$$\vdots$$
$$\hat{\beta}_1 \textstyle\sum_i x_{1i} x_{Ki} + \hat{\beta}_2 \sum_i x_{2i} x_{Ki} + \dots + \hat{\beta}_K \sum_i x_{Ki}^2 = \sum_i y_i x_{Ki}$$

Provided that no explanatory variable is an exact linear combination of the others (i.e., there is no exact multicollinearity), the system has a unique solution.

3.1 Properties of the OLS estimators

As in the simple regression case, the OLS estimators satisfy the properties of:
– Linearity in the observations of $Y$.
– Unbiasedness (given assumptions 1, 2 and 4).
– Gauss–Markov theorem: under assumptions 1 to 4, $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_K$ have the lowest variance among the linear unbiased estimators.
– Consistency.

3.2 Estimation of $\sigma^2$

Similar to the simple regression case, a consistent estimator of $\sigma^2$ is
$$\tilde{\sigma}^2 = \frac{\sum_i \hat{\varepsilon}_i^2}{n}.$$
Since the residuals satisfy $K+1$ linear restrictions,
$$\sum_i \hat{\varepsilon}_i = 0;\quad \sum_i \hat{\varepsilon}_i X_{1i} = 0;\ \dots;\ \sum_i \hat{\varepsilon}_i X_{Ki} = 0,$$
there are only $n - (K+1)$ independent residuals (degrees of freedom). We can then use an unbiased (and also consistent) estimator of $\sigma^2$:
$$\hat{\sigma}^2 = \frac{\sum_i \hat{\varepsilon}_i^2}{n - K - 1}.$$
Under regular conditions, both $\tilde{\sigma}^2$ and $\hat{\sigma}^2$ are consistent estimators of $\sigma^2$, and very similar for moderately large sample sizes.

3.3 Variances of the OLS estimators

In addition to assumptions 1 and 2, we make use of assumption 3 ($V(\varepsilon \mid X_1, \dots, X_K) = \sigma^2$ for any combination of the values of $X_1, \dots, X_K$):
$$V(\hat{\beta}_j) = \frac{\sigma^2}{n S_j^2 (1 - R_j^2)} = \frac{\sigma^2}{\sum_i x_{ji}^2\,(1 - R_j^2)},\qquad j = 1, \dots, K,$$
where
– $S_j^2 = \frac{1}{n}\sum_i x_{ji}^2 = \frac{1}{n}\sum_i (X_{ji} - \bar{X}_j)^2$.
– $R_j^2$ is the $R^2$ of the sample linear projection of $X_j$ on the remaining explanatory variables $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_K$:
$$X_{ji} = \delta_0 + \delta_1 X_{1i} + \dots + \delta_{j-1} X_{(j-1)i} + \delta_{j+1} X_{(j+1)i} + \dots + \delta_K X_{Ki} + u_i.$$

$R_j^2$ measures the fraction of the variation of $X_j$ which can be explained by the remaining explanatory variables. Hence, $1 - R_j^2$ measures the information (not contained in the other variables) that $X_j$ provides in addition to the remaining explanatory variables.
It is not possible that $R_j^2 = 1$, because then $X_j$ would be an exact linear combination of the remaining explanatory variables (ruled out by assumption 4). But if $R_j^2$ were close to 1, $V(\hat{\beta}_j)$ would be very large. On the contrary, if $R_j^2 = 0$ (i.e., the correlation of $X_j$ with the remaining explanatory variables is 0), then $V(\hat{\beta}_j)$ would be the smallest.

Intuitively:
– The higher $S_j^2 = \frac{1}{n}\sum_i x_{ji}^2$, the higher the sample variation in $X_j$, and the better the estimator precision.
– The larger the sample size $n$, the better the estimator precision.
– The higher $R_j^2$, the lower the estimator precision.

The variance of $\hat{\beta}_j$ can then be consistently estimated using a consistent estimator for $\sigma^2$:
$$\hat{V}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{n S_j^2 (1 - R_j^2)},$$
where $S_j^2$ is the sample variance of $X_j$, $S_j^2 = \frac{1}{n}\sum_i (X_{ji} - \bar{X}_j)^2$.

3.4 Goodness of fit measures

The goodness-of-fit measures are similar to the simple regression case. We can use the square root of $\hat{\sigma}^2$, $\hat{\sigma}$, denoted the standard error of the regression. We can also use the $R^2$, with the same interpretation (fraction of the variance of $Y$ explained by the explanatory variables):
$$R^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^n (Y_i - \bar{Y})^2} = 1 - \frac{\text{RSS}}{\text{TSS}},\qquad 0 \le R^2 \le 1.$$

The $R^2$ can be helpful when comparing different models for the same dependent variable $Y$. However, the $R^2$ always increases when adding new regressors, even if they do not add explanatory value. There is a similar measure, $\bar{R}^2$, also called adjusted $R^2$, which avoids this problem:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - K - 1} = 1 - \frac{\text{RSS}/(n - K - 1)}{\text{TSS}/(n - 1)}.$$
In any case, for large sample sizes, $\bar{R}^2 \simeq R^2$.

4 Inference in the multiple regression model

Goldberger: Chapters 7, 10 (10.3), 11 and 12 (12.5 and 12.6). Wooldridge: Chapters 4 and 5 (5.2).

4.1 Hypothesis tests on a single coefficient

We proceed in a similar way as with the simple regression.
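The ingredients of the tests below (OLS fit, $\hat{\sigma}^2$ and estimated standard errors from Section 3.3) can be computed directly. A minimal sketch on simulated data (the coefficients 1.0, 2.0, −1.0 are hypothetical), using the matrix form $\hat{\sigma}^2 (X'X)^{-1}$, whose diagonal entries coincide with $\hat{\sigma}^2 / (n S_j^2 (1 - R_j^2))$:

```python
import numpy as np

# Simulated data with two correlated regressors (so R_j^2 > 0).
rng = np.random.default_rng(1)
n, K = 500, 2
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sigma2_hat = resid @ resid / (n - K - 1)        # unbiased estimator of sigma^2
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # estimated covariance matrix
se = np.sqrt(np.diag(V_hat))                    # standard errors of b0, b1, b2

t_stat = b[1] / se[1]                           # t ratio for H0: beta1 = 0
```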
Suppose we have the following null and alternative hypotheses:
$$H_0: \beta_1 = a \qquad H_1: \beta_1 \neq a$$

We then construct the t statistic
$$t = \frac{\hat{\beta}_1 - a}{\hat{\sigma}_{\hat{\beta}_1}},$$
which tells us how many standard deviations our sample slope is from the slope hypothesized under the null.

Under normality,
$$t = \frac{\hat{\beta}_1 - a}{\hat{\sigma}_{\hat{\beta}_1}} \sim N(0, 1),$$
or, using the asymptotic approximation, $t \overset{a}{\sim} N(0, 1)$.

In general, we reject the null hypothesis at the $100\,\alpha\%$ significance level when
$$|t| = \left|\frac{\hat{\beta}_1 - a}{\hat{\sigma}_{\hat{\beta}_1}}\right| > z_{1 - \alpha/2}.$$

To test the one-sided (upper tail) hypothesis
$$H_0: \beta_1 = a \qquad H_1: \beta_1 > a$$
we decide in favor of the alternative at the $100\,\alpha\%$ significance level when
$$t = \frac{\hat{\beta}_1 - a}{\hat{\sigma}_{\hat{\beta}_1}} > z_{1 - \alpha}.$$

Example: let
$Y$ = logarithm of money demand (M1)
$X_1$ = logarithm of real GDP
$X_2$ = logarithm of the Treasury-bill interest rate

– Using US data, we obtained the following results:
$$\hat{Y} = \underset{(0.2054)}{2.3296} + \underset{(0.0264)}{0.5573}\,X_1 - \underset{(0.0210)}{0.2032}\,X_2$$
$$R^2 = 0.927 \qquad s = 0.048 \qquad n = 38 \qquad \bar{Y} = 6.629$$

– Interpretation:
$\hat{\beta}_1$: estimate of the elasticity of money demand with respect to output (keeping the interest rate constant). If GDP increases by 1% (and the interest rate does not change), money demand increases on average by about 0.6%.
$\hat{\beta}_2$: estimate of the elasticity of money demand with respect to the interest rate (keeping GDP constant). If the interest rate increases by 1% (and GDP does not change), money demand falls on average by about 0.2%.

$H_0: \beta_2 = 0$ (money demand is inelastic to the interest rate) vs. $H_1: \beta_2 \neq 0$. Then, under $H_0$:
$$\frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} \overset{a}{\sim} N(0, 1) \quad \text{and} \quad |t| = \frac{0.2032}{0.021} = 9.676 > z = 1.96$$
$\Rightarrow$ we reject $H_0$ at the 5% significance level.

$H_0: \beta_1 = 1$ (unit elasticity of money demand with respect to output) vs. $H_1: \beta_1 \neq 1$. Then, under $H_0$:
$$\frac{\hat{\beta}_1 - 1}{s_{\hat{\beta}_1}} \overset{a}{\sim} N(0, 1) \quad \text{and} \quad |t| = \frac{|0.5573 - 1|}{0.0264} = 16.769 > z = 1.96$$
$\Rightarrow$ we reject $H_0$ at the 5% significance level.
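The two t statistics above can be reproduced from the reported estimates and standard errors:

```python
# t statistics for the money-demand example, using the reported estimates
# and standard errors from the text.
b1, se1 = 0.5573, 0.0264   # elasticity w.r.t. output
b2, se2 = -0.2032, 0.0210  # elasticity w.r.t. the interest rate

t_b2 = (b2 - 0) / se2      # H0: beta2 = 0
t_b1 = (b1 - 1) / se1      # H0: beta1 = 1

# Both |t| values exceed 1.96, so both nulls are rejected at the 5% level.
print(abs(t_b2), abs(t_b1))
```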
95% confidence interval for $\beta_1$: $\hat{\beta}_1 \pm 1.96\,s_{\hat{\beta}_1} \Rightarrow 0.5573 \pm 1.96 \times 0.0264 \Rightarrow [0.505,\ 0.609]$.

95% confidence interval for $\beta_2$: $\hat{\beta}_2 \pm 1.96\,s_{\hat{\beta}_2} \Rightarrow -0.2032 \pm 1.96 \times 0.0210 \Rightarrow [-0.244,\ -0.162]$.

4.2 Tests about a linear restriction on several parameters

Consider the null hypothesis
$$H_0: \lambda_0 \beta_0 + \lambda_1 \beta_1 + \dots + \lambda_K \beta_K = a,$$
where $\lambda_0, \lambda_1, \dots, \lambda_K, a$ are known constants. Using the asymptotic approximation, we have that, under $H_0$:
$$t = \frac{\lambda_0 \hat{\beta}_0 + \lambda_1 \hat{\beta}_1 + \dots + \lambda_K \hat{\beta}_K - a}{\sqrt{\hat{V}\left(\lambda_0 \hat{\beta}_0 + \lambda_1 \hat{\beta}_1 + \dots + \lambda_K \hat{\beta}_K\right)}} \overset{a}{\sim} N(0, 1).$$

Example: let
$Y$ = logarithm of output
$X_1$ = logarithm of labor input
$X_2$ = logarithm of (physical) capital input

– Using data on 31 companies, we obtained the following results:
$$\hat{Y} = 2.37 + \underset{(0.257)}{0.632}\,X_1 + \underset{(0.219)}{0.452}\,X_2 \qquad \hat{C}(\hat{\beta}_1, \hat{\beta}_2) = 0.055 \qquad n = 31$$

– Interpretation:
$\hat{\beta}_1$: estimate of the elasticity of output with respect to labor (keeping capital constant). When the labor input rises by 1% (and capital does not change), output increases on average by 0.63%.
$\hat{\beta}_2$: estimate of the elasticity of output with respect to capital (keeping labor constant). When the capital input rises by 1% (and labor does not change), output increases on average by 0.45%.

– Consider the hypothesis $H_0: \beta_1 + \beta_2 = 1$ (constant returns to scale) vs. $H_1: \beta_1 + \beta_2 \neq 1$. Then, under $H_0$:
$$t = \frac{\hat{\beta}_1 + \hat{\beta}_2 - 1}{\sqrt{\hat{V}(\hat{\beta}_1 + \hat{\beta}_2)}} \overset{a}{\sim} N(0, 1),$$
where $\hat{V}(\hat{\beta}_1 + \hat{\beta}_2) = \hat{V}(\hat{\beta}_1) + \hat{V}(\hat{\beta}_2) + 2\,\hat{C}(\hat{\beta}_1, \hat{\beta}_2)$, and
$$|t| = \frac{|0.632 + 0.452 - 1|}{\sqrt{(0.257)^2 + (0.219)^2 + 2 \times 0.055}} = 0.177 < z = 1.96.$$
So we cannot reject constant returns to scale.

4.3 Tests about q linear restrictions

How can we test several linear constraints jointly? For example, $q$ linear restrictions such as:
– $H_0: \beta_1 = \beta_2$
– $H_0: \beta_4 = \dots = \beta_K = 0$
– $H_0: \beta_1 + \beta_2 = 1$ and $\beta_3 = 0$

We must form two regressions:
– The one that embodies the null hypothesis: this becomes the restricted model, which is the appropriate model if the null is true.
– The original model, which does not restrict the coefficients in any way: this is denoted as the unrestricted model.
Basic idea: ascertain whether imposing the null hypothesis has much of an impact on how well the model fits the data.

– If the null hypothesis is true, then both models should "fit" the data equally well.
– Of course, even if the null hypothesis is true, the unrestricted model will better capture random variation in the sample (since the constraints will not hold exactly in the sample) and will provide a somewhat better fit.
– What we want to know is whether the fit achieved by the unrestricted model is so much better than the fit achieved by the restricted model that we are willing to reject the null hypothesis.

When estimating each model, we obtain:

Unrestricted: $R_U^2$ and $\text{RSS} = \sum_i \hat{\varepsilon}_{iU}^2 = \text{URSS}$. Restricted: $R_R^2$ and $\sum_i \hat{\varepsilon}_{iR}^2 = \text{RRSS}$.
– It is easy to check that $(\text{RRSS} - \text{URSS}) \ge 0$ and $(R_U^2 - R_R^2) \ge 0$.

Examples:

– Example 1: $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1: \beta_1 \neq 0$ and/or $\beta_2 \neq 0$.
Unrestricted: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$. Restricted: $Y = \beta_0 + \varepsilon$.

– Example 2: $H_0: 2\beta_1 + \beta_2 = 1$ vs. $H_1: 2\beta_1 + \beta_2 \neq 1$.
Unrestricted: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$. Restricted: $Y^* = \beta_0 + \beta_1 X^* + \varepsilon$, where $Y^* = Y - X_2$ and $X^* = X_1 - 2X_2$, since imposing $\beta_2 = 1 - 2\beta_1$ gives
$$Y = \beta_0 + \beta_1 X_1 + (1 - 2\beta_1) X_2 + \varepsilon \ \Rightarrow\ Y - X_2 = \beta_0 + \beta_1 (X_1 - 2X_2) + \varepsilon.$$

Assuming conditional normality, it can be proved that
$$Y \mid X_{1i}, X_{2i}, \dots, X_{Ki} \sim N(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki},\ \sigma^2),$$
so under $H_0$ ("q" linear restrictions),
$$F = \frac{(\text{RRSS} - \text{URSS})/q}{\text{URSS}/(n - K - 1)} \sim F_{q,\ n - K - 1},$$
or equivalently (provided that the dependent variable is unchanged in the restricted model after introducing the constraints),
$$F = \frac{(R_U^2 - R_R^2)/q}{(1 - R_U^2)/(n - K - 1)} \sim F_{q,\ n - K - 1}.$$

Or, using the asymptotic approximation,
$$W_0 = \frac{\text{RRSS} - \text{URSS}}{\text{URSS}}\,(n - K - 1) = qF \overset{a}{\sim} \chi_q^2,$$
or equivalently (under the same proviso),
$$W_0 = \frac{R_U^2 - R_R^2}{1 - R_U^2}\,(n - K - 1) = qF \overset{a}{\sim} \chi_q^2.$$

All previous tests are particular cases of this one.

4.4 Test of joint significance

This test is also known as the "regression test".
It consists of testing whether all the regression slope coefficients are zero:
$$H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \text{ for at least one } j = 1, \dots, K.$$

Unrestricted: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + \varepsilon$, with $R_U^2 > 0$. Restricted: $Y = \beta_0 + \varepsilon$, with $R_R^2 = 0$.

Under $H_0$ (if we assume conditional normality),
$$F = \frac{R^2/K}{(1 - R^2)/(n - K - 1)} \sim F_{K,\ n - K - 1}.$$
Alternatively, using the asymptotic approximation, we have that under $H_0$,
$$W_0 = \frac{R^2}{1 - R^2}\,(n - K - 1) = KF \overset{a}{\sim} \chi_K^2.$$

Example: let
$Y$ = logarithm of money demand (M1)
$X_1$ = logarithm of real GDP
$X_2$ = logarithm of the Treasury-bill interest rate

– Using US data, we obtained the following results:
$$\hat{Y} = \underset{(0.2054)}{2.3296} + \underset{(0.0264)}{0.5573}\,X_1 - \underset{(0.0210)}{0.2032}\,X_2$$
$$R^2 = 0.927 \qquad s = 0.048 \qquad n = 38 \qquad \bar{Y} = 6.629$$

– We want to test whether money demand is insensitive to both output and the interest rate:
$$H_0: \beta_1 = \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0 \text{ and/or } \beta_2 \neq 0.$$

– Then,
$$F = \frac{0.927/2}{(1 - 0.927)/(38 - 2 - 1)} = 222.23 > F_{2,35} = 3.28,$$
or, using the asymptotic test,
$$W_0 = \frac{R^2}{1 - R^2}\,(n - K - 1) = \frac{0.927}{1 - 0.927}\,(38 - 2 - 1) = 444.46 > \chi_2^2 = 5.99$$
$\Rightarrow$ we reject $H_0$.
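Both statistics can be reproduced from the reported $R^2$ alone:

```python
# F and Wald statistics for the joint-significance test, computed from the
# values reported in the example (R^2 = 0.927, n = 38, K = 2, q = K restrictions).
R2, n, K = 0.927, 38, 2
q = K

F = (R2 / q) / ((1 - R2) / (n - K - 1))
W0 = (R2 / (1 - R2)) * (n - K - 1)        # equals q * F; compare with chi2_q

# F is about 222.2 (> 3.28) and W0 about 444.5 (> 5.99), so H0 is rejected.
print(F, W0)
```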