Handout 5

Research Method
Lecture 5 (Ch6)
Multiple regression:
Further issues
Effects of data scaling on OLS
statistics.
Consider the following labor supply
equation for married women.
Hours = β0 + β1(non-wife income) + β2(experience) + β3(education) + β4(# kids aged less than 6) + u
Hours: the woman’s annual hours worked
Non-wife income: the wife’s non-labor income
(such as the husband’s income), in $1000.
Estimate the model using MROZ.dta
Question: What is the effect of measuring
non-wife-income in $1 instead of in
$1000?
What is the effect of measuring wife’s
labor supply in days-equivalent (i.e.,
days=hours/8)?
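Both questions can be checked directly in Stata. The sketch below assumes that MROZ.dta is in the current working directory and that the variables carry the names used in the Wooldridge data set (hours, nwifeinc, exper, educ, kidslt6); the new variable names and scalars are made up for illustration. Multiplying a regressor by 1000 should divide its coefficient and standard error by 1000, and dividing the dependent variable by 8 should divide every coefficient and standard error by 8, while t statistics and R-squared stay the same.

```stata
* Assumes MROZ.dta is in the current working directory
use MROZ.dta, clear

* Baseline: non-wife income in $1000, labor supply in hours
reg hours nwifeinc exper educ kidslt6
scalar b_base = _b[nwifeinc]

* Non-wife income in $1 (hypothetical variable name)
gen double nwifeinc_usd = nwifeinc*1000
reg hours nwifeinc_usd exper educ kidslt6
scalar b_usd = _b[nwifeinc_usd]

* Labor supply in 8-hour days (hypothetical variable name)
gen double days = hours/8
reg days nwifeinc exper educ kidslt6
scalar b_days = _b[nwifeinc]

* The three numbers below should coincide
display "baseline coefficient:        " b_base
display "1000 x coefficient in $1:    " 1000*b_usd
display "8 x coefficient in days eq.: " 8*b_days
```

Comparing the three regression tables side by side also shows that the t statistics, p-values and R-squared do not change with the units of measurement.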
Standardizing a variable
The effects of some variables, such as test
scores, are often difficult to interpret. For
example, what does the effect on wage of
increasing a test score by 1 point mean?
In such cases, it is often better to
standardize the variable first and then include
it in the model. That is, include the
following variable instead:
\[ \frac{\text{TestScore} - \overline{\text{TestScore}}}{\text{sample standard deviation of TestScore}} \]
The coefficient on this variable then tells
you the effect of increasing the test score by
one standard deviation.
Example
Using HPRICE2.dta, estimate the effect of the
environment (measured by nox) on
housing price. Standardize nox before
including it in the model. Also include
(crime), (dist), (rooms) and
(stratio) in the model.
. use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\HPRICE2.DTA", clear
.
. egen sd_nox=sd(nox)
. egen mean_nox=mean(nox)
.
. gen nox_standardized=(nox-mean_nox)/sd_nox
.
. reg price nox_standardized crime dist rooms stratio
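      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F(  5,   500) =  174.47
       Model |  2.7223e+10     5  5.4445e+09           Prob > F      =  0.0000
    Residual |  1.5603e+10   500  31205611.6           R-squared     =  0.6357
-------------+------------------------------           Adj R-squared =  0.6320
       Total |  4.2826e+10   505    84803032           Root MSE      =  5586.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
nox_standa~d |  -3135.118   410.1726    -7.64   0.000    -3940.993   -2329.244
       crime |   -153.601   32.92883    -4.66   0.000    -218.2969   -88.90505
        dist |  -1026.806   188.1079    -5.46   0.000    -1396.386    -657.227
       rooms |   6735.498   393.6037    17.11   0.000     5962.177    7508.819
     stratio |  -1149.204   127.4287    -9.02   0.000    -1399.566   -898.8422
       _cons |   5851.014   4075.662     1.44   0.152    -2156.519    13858.55
------------------------------------------------------------------------------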
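The standardization can also be done in one step with egen's std() function, which subtracts the sample mean and divides by the sample standard deviation. The sketch below is an alternative to the three commands in the log, not the method used there; it assumes HPRICE2.dta is in the current working directory, and nox_std is a made-up variable name.

```stata
* Assumes HPRICE2.dta is in the current working directory
use HPRICE2.dta, clear

* One-step standardization: (nox - mean) / sd
egen nox_std = std(nox)

* Check: the mean should be about 0 and the std. dev. about 1
summarize nox_std

* Same model as above; the coefficient on nox_std is the change in
* price associated with a one-standard-deviation increase in nox
reg price nox_std crime dist rooms stratio
```

The regression should reproduce the coefficients reported for nox_standardized, up to rounding.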
Including quadratic terms
Often you would like to capture
‘diminishing marginal returns’. For example,
the effect of experience on wage may exhibit
diminishing marginal returns.
To capture such effects, include a quadratic
term.
Log(wage) = β0 + β1(experience) + β2(experience)² + u
If β2 is negative (with β1 positive), experience exhibits diminishing
marginal returns.
Note that the effect of experience is given by
∂log(wage)/∂(experience) =β1+ 2β2(experience)
So, the effect changes with experience.
Exercise 1: Using MROZ.dta, estimate the above equation. What is
the effect on wage of increasing experience by 1 year,
evaluated at the average experience?
Exercise 2: What is the effect of increasing experience by 1 year
on wage for those with experience equal to 1?
. use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\MROZ.DTA", clear
. reg lwage exper expersq
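      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  2,   425) =    9.32
       Model |  9.38142005     2  4.69071002           Prob > F      =  0.0001
    Residual |  213.946021   425  .503402402           R-squared     =  0.0420
-------------+------------------------------           Adj R-squared =  0.0375
       Total |  223.327441   427  .523015084           Root MSE      =  .70951

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0476388   .0140012     3.40   0.001     .0201185    .0751591
     expersq |  -.0010159   .0004177    -2.43   0.015    -.0018369   -.0001949
       _cons |   .8075371   .1000761     8.07   0.000     .6108314    1.004243
------------------------------------------------------------------------------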
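One possible way to answer the two exercises is to plug the estimates into β1 + 2β2(experience). For experience equal to 1, the reported coefficients give .0476388 + 2(-.0010159)(1), which is about 0.0456, so one more year of experience raises wage by roughly 4.6%. The sketch below does the same calculation in Stata and also evaluates the effect at the average experience in the estimation sample; it assumes MROZ.dta is in the current working directory.

```stata
* Assumes MROZ.dta is in the current working directory
use MROZ.dta, clear
reg lwage exper expersq

* Exercise 2: marginal effect at experience = 1
display _b[exper] + 2*_b[expersq]*1

* Exercise 1: marginal effect at the average experience,
* using only the observations that entered the regression
summarize exper if e(sample)
display _b[exper] + 2*_b[expersq]*r(mean)
```

Restricting summarize to e(sample) matters here because lwage is missing for women who do not work, so the regression uses only 428 observations.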
Interaction terms
Suppose you estimate the following
production function
Log(Q) = β0 + β1log(labor) + β2log(capital) + u
This model assumes that the effect of labor
on output does not depend on the amount of
capital. But workers may be more
productive if they have more capital. So in
reality, there is an interaction effect
between labor and capital.
To capture such interaction effects, you
can include an interaction term, like:
Log(Q) = β0 + β1log(labor) + β2log(capital)
+ β3log(labor)log(capital) + u
The effect of labor on output is now given by
∂log(Q)/∂log(labor) = β1 + β3log(capital)
So the effect of labor depends on the
amount of capital.
Exercise: Using HPRICE1.dta, estimate the
following model
Price = β0 + β1sqrft + β2bdrms + β3(sqrft)(bdrms) + u
Question 1: Is there a positive or negative interaction
effect between the size of the house and the # of
bedrooms?
Question 2: What is the effect of (sqrft) on the price of
a house, evaluated at the average # of bedrooms?
Question 3: What is the effect of (bdrms) on the price
of a house, evaluated at the average size of house?
. use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\HPRICE1.DTA", clear
. gen sqrft_bdrms=sqrft*bdrms
. reg price sqrft bdrms sqrft_bdrms
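      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  3,    84) =   52.90
       Model |  600164.068     3  200054.689           Prob > F      =  0.0000
    Residual |  317690.438    84  3782.02902           R-squared     =  0.6539
-------------+------------------------------           Adj R-squared =  0.6415
       Total |  917854.506    87  10550.0518           Root MSE      =  61.498

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sqrft |   .0326205   .0436415     0.75   0.457    -.0541655    .1194065
       bdrms |  -35.95534   24.01237    -1.50   0.138    -83.70657    11.79589
 sqrft_bdrms |    .023448   .0101573     2.31   0.023     .0032491     .043647
       _cons |   181.6908   92.18885     1.97   0.052    -1.636817    365.0184
------------------------------------------------------------------------------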
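The estimated coefficient on sqrft_bdrms is positive (.023448), which suggests a positive interaction between house size and the number of bedrooms. For Questions 2 and 3, the effect of sqrft is β1 + β3(bdrms) and the effect of bdrms is β2 + β3(sqrft), each evaluated at the sample average of the other variable. A minimal sketch of that calculation, assuming HPRICE1.dta is in the current working directory:

```stata
* Assumes HPRICE1.dta is in the current working directory
use HPRICE1.dta, clear
gen sqrft_bdrms = sqrft*bdrms
reg price sqrft bdrms sqrft_bdrms

* Question 2: effect of sqrft at the average number of bedrooms
summarize bdrms if e(sample)
display _b[sqrft] + _b[sqrft_bdrms]*r(mean)

* Question 3: effect of bdrms at the average house size
summarize sqrft if e(sample)
display _b[bdrms] + _b[sqrft_bdrms]*r(mean)
```

Because the effects depend on the other variable, the coefficients on sqrft and bdrms alone are the effects when the other variable equals zero, which is not a meaningful value here.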
Adjusted R-squared
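The usual R-squared is given by
\[ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{SSR/n}{SST/n} \]
Here SSR/n is almost, but not exactly, \(\hat{\sigma}^2\), and SST/n is almost, but not exactly, \(S_y^2\).
The adjusted R-squared is given by
\[ \bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} \]
Here SSR/(n-k-1) is equal to \(\hat{\sigma}^2\), and SST/(n-1) is equal to \(S_y^2\).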
The usual R-squared never decreases (and
almost always increases) if you add
additional variables, even if it does not
make sense to add them.
The adjusted R-squared imposes a penalty
for adding additional variables because it
divides SSR by (n-k-1).
The adjusted R-squared can also be written as
\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} \]
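As a quick check of this formula, the adjusted R-squared can be recomputed from the R-squared, n and k reported in the regression outputs earlier in this handout. The sketch below uses Stata's display command as a calculator; the inputs are the rounded values printed in those tables, so the results can differ from the reported adjusted R-squared in the last digit.

```stata
* HPRICE2 regression: R-squared = 0.6357, n = 506, k = 5
* (reported Adj R-squared = 0.6320)
display 1 - (1 - 0.6357)*(506 - 1)/(506 - 5 - 1)

* MROZ regression: R-squared = 0.0420, n = 428, k = 2
* (reported Adj R-squared = 0.0375)
display 1 - (1 - 0.0420)*(428 - 1)/(428 - 2 - 1)
```

After any regress command, the same quantities are stored as e(r2), e(r2_a), e(N) and e(df_r), so 1 - (1 - e(r2))*(e(N) - 1)/e(df_r) should match e(r2_a).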
Controlling for too many
factors in regression
After learning about omitted variable bias,
one may be tempted to control for as
many factors as possible.
But this often leads students to control for
factors that should not be controlled for.
The following example illustrates this.
Suppose you would like to see the effect of
beer tax on traffic fatalities. The idea is that
beer tax will reduce beer consumption,
which would lead to fewer fatalities. So you
may estimate
(Fatalities)=β0+β1(beer tax)
+β2(percentage of males in town)
+β3(percentage of young drivers)
+β(other variables)+u
The question is whether you should also
control for beer consumption, as in:
(Fatalities)=β0+β1(beer tax)
+β2(beer consumption)
+β3(percentage of males in town)
+β4(percentage of young drivers)
+β(other variables)+u
The answer to this question is NO. Beer tax
would affect fatalities mainly by reducing beer
consumption. So, it does not make sense to hold
beer consumption constant when you examine
the effect of beer tax on fatalities.
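A small simulation can illustrate the point. The data below are entirely made up and are not from any real data set: beer tax lowers beer consumption, and beer consumption raises fatalities, so by construction the true total effect of the tax on fatalities is -4 × 0.5 = -2. All variable names, coefficients and the seed are assumptions chosen for illustration.

```stata
* Hypothetical simulated data, for illustration only
clear
set seed 12345
set obs 1000
gen beertax    = runiform()
gen beercons   = 10 - 4*beertax + rnormal()
gen fatalities = 2 + 0.5*beercons + rnormal()

* Without beer consumption: picks up the total effect of the tax
reg fatalities beertax
scalar b_total = _b[beertax]

* With beer consumption held constant: the tax has no channel left
reg fatalities beertax beercons
scalar b_ctrl = _b[beertax]

display "tax coefficient, consumption not controlled: " b_total
display "tax coefficient, consumption controlled:     " b_ctrl
```

In this setup the first coefficient should be close to -2 and the second close to 0, even though the tax does reduce fatalities. Controlling for the channel through which the tax works hides the effect of interest.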
Different models serve different
purposes
Suppose that you would like to estimate
the gender salary gap among academic
economists. One possible model
includes a female dummy together with rank
dummies.
Log(salary)=β0+β1(Female)
+β2(FullProf)
+β3(AssocProf)
+(other variables)+u
The above model estimates the gender
salary gap while holding rank constant.
Thus, the female coefficient captures the
gender salary gap within each rank.
However, women may also be discriminated
against in promotions, and this
would indirectly cause a gender salary gap.
If you would like to evaluate the gender
salary gap that is caused by (i)
discrimination in terms of salary and (ii)
discrimination in terms of promotion
combined, it makes sense to drop the rank
variables from the model.
When you drop the rank variables (i.e.,
FullProf and AssocProf), the female
coefficient shows the salary gap that is
caused by salary discrimination and
promotion discrimination combined.