4 Regressions with two explanatory variables.

This chapter contains the following sections:
(4.1) Introduction.
(4.2) The gross versus the partial effect of an explanatory variable.
(4.3) $R^2$ and adjusted $R^2$ ($\bar R^2$).
(4.4) Hypothesis testing in multiple regression.
4.1 Introduction
In the preceding sections we have studied in some detail regressions containing only one explanatory variable. In econometrics, with its multitude of dependencies, the simple regression can only be a showcase. As such it is important and powerful, since the methods applied to this case carry over to multiple regression with only small and obvious modifications. However, some important new problems arise, which we point out in this chapter.
As usual we start with the linear form:
(4.1.1)   $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \quad (i = 1, 2, \ldots, N)$,
where we again assume that the explanatory variables $X_{i1}$ and $X_{i2}$ are deterministic and also that the random disturbances $\varepsilon_i$ have the standard properties:
(4.1.2)   $E(\varepsilon_i) = 0$
(4.1.3)   $\mathrm{Var}(\varepsilon_i) = \sigma^2$
(4.1.4)   $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j)$
The coefficient $\beta_0$ is the intercept coefficient; $\beta_1$ is the slope coefficient of $X_1$, showing the effect on $Y$ of a unit change in $X_1$, holding $X_2$ constant or controlling for $X_2$. Another phrase frequently used is that $\beta_1$ is the partial effect on $Y$ of $X_1$, holding $X_2$ fixed. The interpretation of $\beta_2$ is similar, except that $X_1$ and $X_2$ change roles.
Since the random disturbances are homoskedastic (i.e. they have the same variance), the scene is prepared for least squares regression. Hence, for arbitrary values of the structural parameters $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ the sum of squared residuals is given by:
(4.1.5)   $Q(\hat\beta_0, \hat\beta_1, \hat\beta_2) = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \bigl(Y_i - (\hat\beta_0 + \hat\beta_1 X_{i1} + \hat\beta_2 X_{i2})\bigr)^2$
Minimizing $Q(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ with respect to $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ gives the OLS estimators of $\beta_0$, $\beta_1$ and $\beta_2$. This is a simple optimization problem and we can state the OLS estimators directly:
(4.1.6)   $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$
(4.1.7)   $\hat\beta_1 = \dfrac{(\sum x_{i1} Y_i)(\sum x_{i2}^2) - (\sum x_{i2} Y_i)(\sum x_{i1} x_{i2})}{D}$
(4.1.8)   $\hat\beta_2 = \dfrac{(\sum x_{i2} Y_i)(\sum x_{i1}^2) - (\sum x_{i1} Y_i)(\sum x_{i1} x_{i2})}{D}$
(4.1.9)   $D = (\sum x_{i1}^2)(\sum x_{i2}^2) - (\sum x_{i1} x_{i2})^2$
In these formulas we have used the common notation:
(4.1.10)   $x_{i1} = X_{i1} - \bar X_1, \quad x_{i2} = X_{i2} - \bar X_2$
We have also used the useful formulas:
(4.1.11)   $\sum x_{i1} Y_i = \sum x_{i1} y_i$
(4.1.12)   $\sum x_{i2} Y_i = \sum x_{i2} y_i$
where $y_i = Y_i - \bar Y$.
You should check for yourself that these formulas hold.
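As a quick numerical check of these formulas, the following sketch computes (4.1.6)-(4.1.9) with NumPy on simulated data and compares the result with a standard least-squares solver. The data-generating values (sample size, coefficients, degree of correlation between the regressors) are purely illustrative.

```python
# Minimal numerical sketch of the OLS formulas (4.1.6)-(4.1.9); all data are simulated.
import numpy as np

rng = np.random.default_rng(0)
N = 200
X1 = rng.normal(size=N)
X2 = 0.5 * X1 + rng.normal(size=N)            # correlated regressors
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(size=N)

x1 = X1 - X1.mean()                           # x_{i1} as in (4.1.10)
x2 = X2 - X2.mean()                           # x_{i2} as in (4.1.10)
D = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2                          # (4.1.9)

b1 = (np.sum(x1 * Y) * np.sum(x2**2) - np.sum(x2 * Y) * np.sum(x1 * x2)) / D    # (4.1.7)
b2 = (np.sum(x2 * Y) * np.sum(x1**2) - np.sum(x1 * Y) * np.sum(x1 * x2)) / D    # (4.1.8)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()                                  # (4.1.6)

# Cross-check against a standard least-squares solver.
beta_check, *_ = np.linalg.lstsq(np.column_stack([np.ones(N), X1, X2]), Y, rcond=None)
print(b0, b1, b2)
print(beta_check)
```

The two sets of estimates coincide up to rounding, confirming the closed-form expressions.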
You should also check for yourself that the OLS estimators $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ are unbiased, i.e.
(4.1.13)   $E(\hat\beta_0) = \beta_0$
(4.1.14)   $E(\hat\beta_1) = \beta_1$
(4.1.15)   $E(\hat\beta_2) = \beta_2$
The variances and covariances of these OLS estimators are readily deduced:
(4.1.16)   $\mathrm{Var}(\hat\beta_0) = \dfrac{\sigma^2}{N} + \bar X_1^2\,\mathrm{Var}(\hat\beta_1) + \bar X_2^2\,\mathrm{Var}(\hat\beta_2) + 2\bar X_1 \bar X_2\,\mathrm{Cov}(\hat\beta_1, \hat\beta_2)$
(4.1.17)   $\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2 S_{22}}{S_{11} S_{22} - S_{12}^2}$
(4.1.18)   $\mathrm{Var}(\hat\beta_2) = \dfrac{\sigma^2 S_{11}}{S_{11} S_{22} - S_{12}^2}$
where $S_{11} = \sum x_{i1}^2$, $S_{22} = \sum x_{i2}^2$ and $S_{12} = \sum x_{i1} x_{i2}$, so that $D = S_{11} S_{22} - S_{12}^2$.
Using:
(4.1.19)   $r_{12} = \dfrac{S_{12}}{\sqrt{S_{11} S_{22}}}, \qquad -1 \le r_{12} \le 1$,
the formulas for $\mathrm{Var}(\hat\beta_1)$ and $\mathrm{Var}(\hat\beta_2)$ can be rewritten:
(4.1.20)   $\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2}{S_{11}(1 - r_{12}^2)}$
(4.1.21)   $\mathrm{Var}(\hat\beta_2) = \dfrac{\sigma^2}{S_{22}(1 - r_{12}^2)}$
Similar calculations show that:
(4.1.22)   $\mathrm{Cov}(\hat\beta_1, \hat\beta_2) = -\dfrac{r_{12}^2\,\sigma^2}{S_{12}(1 - r_{12}^2)} = -\dfrac{\sigma^2 S_{12}}{S_{11} S_{22} - S_{12}^2}$
Evidently, many results derived for the simple regression have an immediate extension to multiple regression. But some new points enter of which we have to be aware.
4.2 The gross versus the partial effect of an explanatory variable
In order to get insight into this problem, it is enough to start with the multiple regression (4.1.1). Hence, we specify:
(4.2.1)   $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i \quad (i = 1, 2, \ldots, N)$
To facilitate the discussion we assume, for the time being, that the explanatory variables $X_{i1}$ and $X_{i2}$ are also random variables. In this situation the above regression is by no means the only regression we might think of. For instance, we might also consider:
(4.2.2)   $Y_i = \tilde\beta_0 + \tilde\beta_1 X_{i1} + u_{1i}$
or
(4.2.3)   $Y_i = \tilde\delta_0 + \tilde\delta_1 X_{i2} + u_{2i}$
where $u_{1i}$ and $u_{2i}$ are the random disturbances in the regressions of $Y$ on $X_1$ and of $Y$ on $X_2$, respectively.
In the regressions (4.2.1) and (4.2.2), $\beta_1$ and $\tilde\beta_1$ both show the impact of $X_1$ on $Y$, but obviously $\beta_1$ and $\tilde\beta_1$ can be quite different. In (4.2.1), $\beta_1$ shows the effect of $X_1$ when we control for $X_2$. Hence, $\beta_1$ shows the partial or net effect of $X_1$ on $Y$. In contrast, $\tilde\beta_1$ shows the effect of $X_1$ when we do not control for $X_2$. Intuitively, when $X_2$ is excluded from the regression, $\tilde\beta_1$ will absorb some of the effect of $X_2$ on $Y$, since there will normally be a correlation between $X_1$ and $X_2$. Therefore, we call $\tilde\beta_1$ the gross effect of $X_1$. Of course, similar arguments apply to the comparison of $\beta_2$ and $\tilde\delta_1$ in the regressions (4.2.1) and (4.2.3). As econometricians we often wish to evaluate the effect of an explanatory variable on the dependent variable. Since the gross and partial effects can be quite different, we understand that we have to be cautious and not too bombastic when we interpret the regression parameters. At the same time it is evident that this is a serious problem in any statistical application in the social sciences.
A comprehensive Danish investigation studied the relation between mortality and jogging. The mortality of two groups was compared: one consisted of regular joggers and the other of people not jogging. The researchers found that the mortality rate in the group of regular joggers was considerably lower than in the group consisting of non-joggers. This result may be reasonable and expected, but was the
picture that simple? A closer study showed that the joggers were better educated,
smoked less and almost nobody in this group had weight problems. Could these
factors help to explain the lower mortality rate for the joggers? A further study of this
sample showed that although these factors had systematic influences on the mortality
rate, the jogging activity still reduced the mortality rate.
In econometrics, and in the social sciences in general, similar considerations relate to almost any applied work. Therefore, we should like to shed some specific light on the relations between the gross and partial effects. Intuitively, we understand that the root of this problem is the correlation between the explanatory variables, in our model the correlation between $X_{i1}$ and $X_{i2}$. So when $X_{i2}$ is excluded from the regression, some of the influence of $X_{i2}$ on $Y_i$ is captured by $X_{i1}$.
In order to be concise, let us assume that
(4.2.4)   $X_{i2} = \gamma_0 + \gamma_1 X_{i1} + v_i$
where $v_i$ denotes the disturbance term in this regression. ((4.2.4) shows why it is convenient to assume that $X_1$ and $X_2$ are random variables in this illustration.)
Using (4.2.4) to substitute for $X_{i2}$ in (4.2.1), we obtain:
(4.2.5)   $Y_i = (\beta_0 + \beta_2\gamma_0) + (\beta_1 + \beta_2\gamma_1) X_{i1} + (\varepsilon_i + \beta_2 v_i)$
Comparing this with equation (4.2.2), we obtain:
(4.2.6)   $u_{1i} = \varepsilon_i + \beta_2 v_i$
(4.2.7)   $\tilde\beta_0 = \beta_0 + \beta_2 \gamma_0$
(4.2.8)   $\tilde\beta_1 = \beta_1 + \beta_2 \gamma_1$
We observe immediately that if there is no linear relation between $X_{i1}$ and $X_{i2}$ (i.e. $\gamma_1 = 0$), then the gross effect $\tilde\beta_1$ coincides with the partial (net) effect $\beta_1$, since in this case $\tilde\beta_1 = \beta_1$ (see (4.2.8)).
The intercept $\tilde\beta_0$ will be a mixture of the intercepts $\beta_0$ and $\gamma_0$ and the partial effect of $X_{i2}$, namely $\beta_2$.
Equations (4.2.7)-(4.2.8) show the relations between the structural parameters; we still have to show that the OLS estimators confirm these relations. However, they do! Let us verify this fact for $\tilde\beta_1$ (see (4.2.8)). We know from (4.1.7) and (4.1.8) that:
(4.2.9)   $\hat\beta_1 = \dfrac{S_{01} S_{22} - S_{02} S_{12}}{S_{11} S_{22} - S_{12}^2}$
(4.2.10)   $\hat\beta_2 = \dfrac{S_{02} S_{11} - S_{01} S_{12}}{S_{11} S_{22} - S_{12}^2}$
where $S_{01} = \sum_{i=1}^{N} x_{i1} Y_i$ and $S_{02} = \sum_{i=1}^{N} x_{i2} Y_i$.
From the simple regression (4.2.2) we realize that:
(4.2.11)   $\hat{\tilde\beta}_0 = \bar Y - \hat{\tilde\beta}_1 \bar X_1$
(4.2.12)   $\hat{\tilde\beta}_1 = \dfrac{S_{01}}{S_{11}}$
where $\hat{\tilde\beta}_0$ and $\hat{\tilde\beta}_1$ are obtained by regressing $Y_i$ on $X_{i1}$.
Similarly, by regressing $X_{i2}$ on $X_{i1}$ as in (4.2.4), we obtain:
(4.2.13)   $\hat\gamma_0 = \bar X_2 - \hat\gamma_1 \bar X_1$
(4.2.14)   $\hat\gamma_1 = \dfrac{S_{12}}{S_{11}}$
Piecing the various equations together we obtain:
(4.2.15)   $\hat\beta_1 + \hat\beta_2 \hat\gamma_1 = \dfrac{(S_{01} S_{22} - S_{02} S_{12}) + (S_{02} S_{11} - S_{01} S_{12})(S_{12}/S_{11})}{S_{11} S_{22} - S_{12}^2} = \dfrac{S_{01}(S_{11} S_{22} - S_{12}^2)/S_{11}}{S_{11} S_{22} - S_{12}^2} = \dfrac{S_{01}}{S_{11}} = \hat{\tilde\beta}_1$
Thus, we have confirmed that:
(4.2.16)   $\hat{\tilde\beta}_1 = \hat\beta_1 + \hat\beta_2 \hat\gamma_1$
so that the OLS estimators satisfy (4.2.8). In a similar way we can show that:
(4.2.17)   $\hat{\tilde\beta}_0 = \hat\beta_0 + \hat\beta_2 \hat\gamma_0$
verifying (4.2.7).
We also observe that $S_{12} = 0$ implies that $\hat\gamma_1 = 0$ (see (4.2.14)). By (4.2.16), in this case we have that $\hat{\tilde\beta}_1 = \hat\beta_1$.
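The following simulation sketch illustrates (4.2.16) numerically: the slope from regressing $Y$ on $X_1$ alone (the gross effect) coincides with $\hat\beta_1 + \hat\beta_2\hat\gamma_1$. All data-generating numbers are purely illustrative.

```python
# Simulation sketch of relation (4.2.16): gross slope = beta1_hat + beta2_hat * gamma1_hat.
import numpy as np

rng = np.random.default_rng(1)
N = 500
X1 = rng.normal(size=N)
X2 = 0.3 + 0.8 * X1 + rng.normal(size=N)     # X2 depends on X1 as in (4.2.4)
Y = 1.0 + 2.0 * X1 + 1.5 * X2 + rng.normal(size=N)

def slope(x, y):
    """Simple-regression slope: sum of x-deviations times y-deviations over S_xx."""
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sum(xd**2)

# Multiple regression of Y on X1 and X2 (partial effects).
Xmat = np.column_stack([np.ones(N), X1, X2])
b0, b1, b2 = np.linalg.lstsq(Xmat, Y, rcond=None)[0]

gamma1 = slope(X1, X2)        # auxiliary regression of X2 on X1, as in (4.2.14)
btilde1 = slope(X1, Y)        # gross effect: Y regressed on X1 alone, as in (4.2.12)

print(btilde1, b1 + b2 * gamma1)   # the two numbers coincide, confirming (4.2.16)
```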
Therefore, if $X_{i1}$ and $X_{i2}$ are uncorrelated, then the gross and partial effects of $X_{i1}$ coincide. The specification issue treated in this section is important and interesting, but at the same time challenging, capable of eroding any econometric specification. Many textbooks treat it under the heading "omitted variable bias". In my opinion, treating this as a bias problem is not the proper approach. In order to substantiate this view, we take (4.1.1) as the starting point, but now assume that $X_{i1}$ and $X_{i2}$ are random. Excluding details, we simply assume that the conditional expectation of $Y_i$ given $X_{i1}$ and $X_{i2}$ can be written:
(4.2.18)   $E[Y_i \mid X_{i1}, X_{i2}] = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}$
If our model is incomplete in that $X_{i2}$ has not been included in the specification, then evidently the conditional expectation $E[Y_i \mid X_{i1}]$ can be written:
(4.2.19)   $E[Y_i \mid X_{i1}] = \beta_0 + \beta_1 X_{i1} + \beta_2 E[X_{i2} \mid X_{i1}]$
In general $E[X_{i2} \mid X_{i1}]$ can be an arbitrary function of $X_{i1}$. However, if we stick to the linearity assumption by assuming:
(4.2.20)   $E[X_{i2} \mid X_{i1}] = \gamma_0 + \gamma_1 X_{i1}$,
equation (4.2.19) will lead to the regression function:
(4.2.21)   $E[Y_i \mid X_{i1}] = \tilde\beta_0 + \tilde\beta_1 X_{i1}$
where $\tilde\beta_0$ and $\tilde\beta_1$ are expressed by (4.2.7) and (4.2.8).
The point of this lesson is that (4.2.18) and (4.2.21) are simply two different regression functions, each perfectly legitimate on its own. To say that $\tilde\beta_1$ is in any respect biased is simply a misuse of language.
4.3 $R^2$ and the adjusted $R^2$ ($\bar R^2$)
In section (2.3) we defined the coefficient of determination $R^2$. We remember:
(4.3.1)   $R^2 = \dfrac{\text{Explained sum of squares}}{\text{Total sum of squares}} = 1 - \dfrac{\text{Sum of squared residuals}}{\text{Total sum of squares}}$
where the explained sum of squares is $ESS = \sum_{i=1}^{N} (\hat Y_i - \bar Y)^2$, the total sum of squares is $TSS = \sum_{i=1}^{N} (Y_i - \bar Y)^2$, and the sum of squared residuals is $SSR = \sum_{i=1}^{N} e_i^2$.
Since $R^2$ never decreases when a new variable is added to a regression, an increase in $R^2$ does not imply that adding a new variable actually improves the fit of the model. In this sense $R^2$ gives an inflated estimate of how well the regression fits the data. One way to correct for this is to deflate $R^2$ by a certain factor. The outcome is the so-called adjusted $R^2$, denoted $\bar R^2$.
The $\bar R^2$ is a modified version of $R^2$ that does not necessarily increase when a new variable is added to the regression equation. $\bar R^2$ is defined by:
(4.3.2)   $\bar R^2 = 1 - \dfrac{\hat\sigma_e^2}{\widehat{\mathrm{Var}}(Y)} = 1 - \dfrac{\sum e_i^2/(N-k-1)}{\sum (Y_i - \bar Y)^2/(N-1)} = 1 - \dfrac{(N-1)}{(N-k-1)}\cdot\dfrac{SSR}{TSS}$
where N is the number of observations and k is the number of explanatory variables.
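A minimal sketch of (4.3.1) and (4.3.2) in Python, assuming the residual vector e, the dependent variable Y and the number of explanatory variables k are already available from an estimated regression:

```python
# Sketch of R^2 (4.3.1) and adjusted R^2 (4.3.2); Y is the dependent variable,
# e the residual vector and k the number of explanatory variables (all assumed given).
import numpy as np

def r_squared(Y, e):
    tss = np.sum((Y - Y.mean())**2)          # total sum of squares
    ssr = np.sum(e**2)                       # sum of squared residuals
    return 1.0 - ssr / tss                   # (4.3.1)

def adjusted_r_squared(Y, e, k):
    N = len(Y)
    tss = np.sum((Y - Y.mean())**2)
    ssr = np.sum(e**2)
    return 1.0 - (N - 1) / (N - k - 1) * ssr / tss    # (4.3.2)
```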
There are a few things to be noted about $\bar R^2$. First, the ratio $(N-1)/(N-k-1)$ is always larger than 1, so that $\bar R^2$ is always less than $R^2$. Second, adding a new variable has two opposite effects on $\bar R^2$: on the one hand, the SSR falls, which increases $\bar R^2$; on the other hand, the factor $(N-1)/(N-k-1)$ increases. Whether $\bar R^2$ increases or decreases depends on which of the two effects is the stronger. Third, an increase in $\bar R^2$ does not necessarily mean that the coefficient of the added variable is statistically significant. To find out whether an added variable is statistically significant, we have to perform a statistical test, for example a t-test. Finally, a high $\bar R^2$ does not necessarily mean that we have specified the most appropriate set of explanatory variables. Specifying econometric models is difficult. We face observability and data problems at almost every turn, but, in general, we ought to remember that the specified model should have a sound basis in economic theory.
4.4 Hypothesis testing in multiple regression
We have seen above that adding a second explanatory variable to a regression did not demand any new principles as regards estimation. The OLS estimators could be derived by an immediate extension of the "one explanatory variable" case. Much the same can be said about hypothesis testing. We can, therefore, just as well start with a multiple regression containing k explanatory variables. Hence we specify:
(4.4.1)   $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$
where, as usual, $\varepsilon_i$ denotes the random disturbances.
Suppose we wish to test a simple hypothesis on one of the slope coefficients, for example $\beta_2$. Hence, suppose we wish to test:
(4.4.2)   $H_0: \beta_2 = \beta_2^0$   against one of the alternatives   $H_A^1: \beta_2 > \beta_2^0$, $H_A^2: \beta_2 < \beta_2^0$, $H_A^3: \beta_2 \neq \beta_2^0$
In chapter 3.1 we showed in detail the relevant procedures for testing $H_0$ against these alternatives in the simple regression. Similar procedures can be applied in this case. We start with the test statistic:
(4.4.3)   $T = \dfrac{\hat\beta_2 - \beta_2^0}{\widehat{\mathrm{Std}}(\hat\beta_2)}$,
which is t-distributed with $(N-k-1)$ degrees of freedom when $H_0$ is true. In (4.4.3), $\hat\beta_2$ denotes the OLS estimator and $\widehat{\mathrm{Std}}(\hat\beta_2)$ is an estimator of the standard deviation of $\hat\beta_2$.
In the general case (4.4.1) we only have to remember that in order to get an unbiased estimator of $\sigma^2$ (the variance of the disturbances) we have to divide by $(N-k-1)$. Note that in the simple regression $k = 1$, so that $(N-k-1)$ reduces to $(N-2)$. By similar arguments we deduce that the test statistic T given by (4.4.3) has a t-distribution with $(N-k-1)$ degrees of freedom when the null hypothesis is true. With this modification we can follow the procedures described in chapter (3.1). Hence, we can apply the simple t-tests, but we have to choose the appropriate number of degrees of freedom in the t-distribution; remember that fact!
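As a sketch of how (4.4.3) is used in practice, assume the estimate, its estimated standard error and the hypothesized value are already at hand; the numbers below are purely illustrative and not taken from any regression in this chapter.

```python
# Sketch of the t-test in (4.4.3) with assumed, illustrative inputs.
from scipy import stats

beta2_hat, se_beta2, beta2_0 = 0.054, 0.031, 0.0   # estimate, est. std. error, hypothesized value
N, k = 100, 3

T = (beta2_hat - beta2_0) / se_beta2
df = N - k - 1
p_two_sided = 2 * (1 - stats.t.cdf(abs(T), df))    # for the two-sided alternative
print(T, p_two_sided)
```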
The t-tests are not restricted to testing simple hypotheses on the intercept or the various slope parameters; t-tests can also be used to test hypotheses involving linear combinations of the regression coefficients.
For instance, suppose we wish to test the null hypothesis:
(4.4.4)   $H_0: \beta_1 = \beta_2$   against   $H_A^3: \beta_1 \neq \beta_2$
We realize that this can be done with an ordinary t-test. The point is that these hypotheses are equivalent to the hypotheses:
(4.4.5)   $\tilde H_0: \beta_1 - \beta_2 = 0$   against   $\tilde H_A: \beta_1 - \beta_2 \neq 0$
so that if we reject $\tilde H_0$ we should also reject $H_0$, etc.
In order to test $\tilde H_0$ against $\tilde H_A$, we use the test statistic:
(4.4.6)   $T = \dfrac{(\hat\beta_1 - \hat\beta_2) - (\beta_1 - \beta_2)}{\widehat{\mathrm{Std}}(\hat\beta_1 - \hat\beta_2)} = \dfrac{\hat\beta_1 - \hat\beta_2}{\widehat{\mathrm{Std}}(\hat\beta_1 - \hat\beta_2)}$   (the last equality holding under $H_0$).
When $H_0$ is true, T will have a t-distribution with $(N-k-1)$ degrees of freedom. Note that $\widehat{\mathrm{Std}}(\hat\beta_1 - \hat\beta_2)$ can be computed from the formulas:
(4.4.7)   $\widehat{\mathrm{Std}}(\hat\beta_1 - \hat\beta_2) = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2)}$
where
(4.4.8)   $\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2) = \widehat{\mathrm{Var}}(\hat\beta_1) + \widehat{\mathrm{Var}}(\hat\beta_2) - 2\,\widehat{\mathrm{Cov}}(\hat\beta_1, \hat\beta_2)$
ˆ ˆ ˆ
When estimates of ̂1 and ̂ 2 and Std (1  2 ) are available, we can easily
compute the value of the test statistic T. After that we continue as with the usual
t-tests.
Although the t-tests are not solely restricted to these simple situations, we will quickly face test situations that they cannot handle. As an example we consider a model from labour market economics: suppose that wages $Y_i$ depend on workers' education $X_{i1}$ and experience $X_{i2}$. In order to investigate the dependency of $Y_i$ on $X_{i1}$ and $X_{i2}$, we specify the regression:
(4.4.9)   $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i2}^2 + \varepsilon_i \quad (i = 1, 2, \ldots, N)$
where $\varepsilon_i$ denotes the usual disturbance term.
Note that the presence of the quadratic term $X_{i2}^2$ does not create problems for estimating the regression coefficients; it is hardly even a small hitch. We only have to define the new variable:
(4.4.10)   $X_{i3} = X_{i2}^2$
The regression (4.4.9) becomes:
(4.4.11)   $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \quad (i = 1, 2, \ldots, N)$
Suppose now that we are uncertain whether workers' experience $X_{i2}$ has any effect on the wages $Y_i$. In order to settle this issue we have to test a joint null hypothesis, namely:
(4.4.12)   $H_0: \beta_2 = 0 \text{ and } \beta_3 = 0$   versus   $H_A: \beta_2 \neq 0 \text{ and/or } \beta_3 \neq 0$
In this case the null hypothesis restricts the values of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in (4.4.12) imposes two restrictions on the multiple regression model, namely $\beta_2 = \beta_3 = 0$. In general, a joint hypothesis is a hypothesis which imposes two or more restrictions on the regression coefficients.
It might be tempting to think that we could test the joint hypothesis (4.4.12) by using the usual t-statistics to test the restrictions one at a time. But this testing procedure would be very unreliable. Luckily, there exist test procedures which manage to handle joint hypotheses on the regression coefficients.
So, how can we proceed to test the joint hypothesis (4.4.12)? If the null
hypothesis is true, the regression (4.4.11) becomes:
(4.4.13)   $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i \quad (i = 1, 2, \ldots, N)$
Obviously, we have to investigate two regressions, the one given by (4.4.11) and the other given by (4.4.13). Since there are no restrictions on (4.4.11), it is called the unrestricted form, while (4.4.13) is called the restricted form of the regression. It is very natural to base a test of the joint null hypothesis (4.4.12) on the sums of squared residuals resulting from these two regressions. If $(SSR)_R$ denotes the sum of squared residuals obtained from (4.4.13) and $(SSR)_U$ denotes that obtained from (4.4.11), we will be doubtful about the truth of $H_0$ if $(SSR)_R$ is considerably greater than $(SSR)_U$. If $(SSR)_R$ is only slightly larger than $(SSR)_U$, there is no reason to be doubtful about $H_0$.
Since $(SSR)_R$ stems from the restricted regression (4.4.13), we obviously have:
(4.4.14)   $(SSR)_R \ge (SSR)_U$
In order to test joint hypotheses on the regression coefficients, the standard approach is to use a so-called F-test. In our present example this test is very intuitive. In the general case it is based on the test statistic:
(4.4.15)   $F = \dfrac{\bigl((SSR)_R - (SSR)_U\bigr)/r}{(SSR)_U/(N-k-1)}$
where r denotes the number of restrictions and k denotes the number of explanatory variables. In our example above, r = 2 and k = 3.
If $H_0$ is true, then F has a so-called Fisher distribution with $(r, N-k-1)$ degrees of freedom: the numerator has r degrees of freedom and the denominator $(N-k-1)$. From (4.4.14) it is obvious that the test statistic F is concentrated on the positive axis. Small values of F indicate that $H_0$ is agreeable with the sample data. The principal approach to testing a joint null hypothesis $H_0$ against an alternative $H_A$ proceeds as usual:
(i) Choose a suitable test statistic.
(ii) Choose a level of significance ($\alpha$).
(iii) When $H_0$ is true, the test statistic F will have a Fisher distribution with r and $(N-k-1)$ degrees of freedom.
(iv) The critical value $F^c$ in this distribution is determined from the equation:
(4.4.16)   $P\{F(r, N-k-1) > F^c\} = \alpha$
(v) When the regressions have been performed, we compute the value of the test statistic $\hat F$.
(vi) Decision rule: (a) reject $H_0$ if $\hat F > F^c$; (b) do not reject $H_0$ if $\hat F < F^c$.
Fig. (4.4.1) illustrates this decision rule:
(4.4.17)   $\hat F \in [F^c, \infty)$:  reject $H_0$
(4.4.18)   $\hat F \in [0, F^c)$:  do not reject $H_0$
Applying this test to our null hypothesis (4.4.12) gave the following result:
(4.4.19)   $Y_i = 0.078 + 0.118 X_{i1} + 0.054 X_{i2} + 0.001 X_{i3} + e_i, \qquad (SSR)_U = 18.6085$
(4.4.20)   $Y_i = 0.673 + 0.107 X_{i1} + e_i, \qquad (SSR)_R = 19.5033$
With $N = 100$, $k = 3$, $N - k - 1 = 96$ and $r = 2$, we obtain:
(4.4.21)   $\hat F = \dfrac{(19.5033 - 18.6085)/2}{18.6085/96} = 2.308$
Choosing $\alpha = 0.05$, the critical value $F_{0.05}^c$ is determined from:
(4.4.22)   $P\{F(2, 96) > F_{0.05}^c\} = 0.05$
Tables of the F-distribution show that $F_{0.05}^c \approx 3.1$.
Since $\hat F = 2.308 < 3.1 = F_{0.05}^c$, there is no reason to reject $H_0$. Workers' experience does not seem to have an impact on workers' wages in this sample. We can also compute the P-value for this test in the same way as we have learned above. We observe:
(4.4.23)   $P\text{-value} = P\{F(2, 96) > 2.308\} = 0.105$
A joint hypothesis which at times might interest us all is to find out whether the explanatory variables have an impact at all on the dependent variable. Thus, referring to (4.4.1), we wish to test the null hypothesis:
(4.4.24)   $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$   against   $H_A$: at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, k$
$(SSR)_U$ is computed from the unrestricted regression (4.4.1), while $(SSR)_R$ is computed from the restricted regression:
(4.4.25)   $Y_i = \beta_0 + \varepsilon_i \quad (i = 1, 2, \ldots, N)$
We observe immediately that in the restricted model $\beta_0$ is estimated by:
(4.4.26)   $\hat\beta_0 = \bar Y$ (implying $\hat Y_i = \bar Y$)
so that:
(4.4.27)   $(SSR)_R = \sum (Y_i - \hat Y_i)^2 = \sum (Y_i - \bar Y)^2$
The numerator in the test statistic F becomes:
(4.4.28)   $(SSR)_R - (SSR)_U = \sum (\hat Y_i - \bar Y)^2$
Hence, the test statistic F reduces to:
(4.4.29)   $F = \dfrac{\sum (\hat Y_i - \bar Y)^2 / k}{(SSR)_U/(N-k-1)} = \dfrac{ESS/k}{(SSR)_U/(N-k-1)} = \dfrac{R^2/k}{(1 - R^2)/(N-k-1)}$
(using the deduction in section (4.3)).
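A one-line sketch of (4.4.29), computing the overall F-statistic directly from $R^2$; the example values of $R^2$, N and k are assumed for illustration.

```python
# Overall F-statistic from R^2, as in (4.4.29).
def overall_f(R2, N, k):
    return (R2 / k) / ((1.0 - R2) / (N - k - 1))

# Example with assumed values: R^2 = 0.25, N = 100, k = 3.
print(overall_f(0.25, 100, 3))
```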