Lecture 11 - BYU Department of Economics

Econ 388 R. Butler 2014 revisions lecture 11
I. Constant term in regressions
A. Case 1 (can’t interpret): the constant is not theoretically important (the usual case)
Example: 401K participation rates regressed on age, number of multi-level
marketers in your ward, and income for business school faculty (not meaningful
to talk about zero values for these variables)
B. Case 2 (can’t interpret): theoretically important, but the data points are far away
from the constant term (so evaluating Y where X=0 means predicting far outside the
sample observations)
Example: sick days regressed on tenure, where your data consist of a sample of
experienced employees (since you haven’t hired in years) but you are considering
opening a new plant now
C. Case 3 (can interpret): theoretically meaningful, and the data points are close to the
Y-axis
Example: professor wage regression: wages on tenure when you have new faculty
in the sample and are going to hire new faculty
D. What happens when you suppress the intercept
II. Slope parameter issues:
A. Scaling through by constants doesn’t change the inferences. Recall the “A”
transformations from lecture 3, where A was a nonsingular (k×k) matrix used to form XA. These
transformations leave the projections unchanged (so when income is changed from whole
dollars to thousands of dollars, the income regressor shrinks by a factor of 1,000 and its
coefficient grows by a factor of 1,000).
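To see the scaling result concretely, here is an illustrative Python sketch (synthetic data and made-up coefficients, not part of the course’s STATA/SAS materials):

```python
import numpy as np

# Illustrative sketch (synthetic data): rescaling a regressor by a nonzero
# constant rescales its OLS coefficient by the inverse of that constant and
# leaves the projections (fitted values) unchanged.
rng = np.random.default_rng(0)
n = 50
income_dollars = rng.uniform(20_000, 120_000, n)
y = 5 + 0.0002 * income_dollars + rng.normal(0, 1, n)

X1 = np.column_stack([np.ones(n), income_dollars])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Same regression with income in thousands of dollars: XA with A nonsingular.
X2 = np.column_stack([np.ones(n), income_dollars / 1000])
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)

print(np.isclose(b2[1], 1000 * b1[1]))   # slope rescaled by 1,000: True
print(np.allclose(X1 @ b1, X2 @ b2))     # identical fitted values: True
```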
B. standardized coefficients instead of coefficients: the Firefighter/MMPI paper
and the HPRICE2.raw example:
1. beta coefficients for HPRICE2 [[[hprice2.do]]]
*** STATA EXAMPLE****;
# delimit ;
*accessing wooldridge data on HPRICE2.RAW, chapter 4 in Wooldridge;
infile price
crime
nox
rooms
dist
radial
proptax
stratio lowstat
lprice
lnox
lproptax using
"e:\classrm_data\wooldridge\hprice2.raw", clear;
regress price nox crime rooms dist stratio, beta;
/* the ", beta" option prints standardized beta coeff */
/*
on rhs: (std.dev.chg Y)/(std.dev.chg Xi)
*/
***SAS EXAMPLE****;
data one;
*accessing wooldridge data on HPRICE2.RAW, chapter 4 in Wooldridge;
infile "e:\classrm_data\wooldridge\hprice2.raw";
input price
crime
nox
rooms
dist
radial
proptax
stratio lowstat
lprice
lnox
lproptax;
run;
proc reg;
model price=nox crime rooms dist stratio/stb; run;
/* the " /stb " option prints standardized beta coeff */
/*
on rhs: (std.dev.chg Y)/(std.dev.chg Xi)
*/
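The standardized (“beta”) coefficient that “, beta” in STATA and “/stb” in SAS report equals the raw slope rescaled by sd(xj)/sd(y). An illustrative Python sketch with synthetic data (not the HPRICE2 file):

```python
import numpy as np

# Illustrative sketch (synthetic data): a standardized ("beta") coefficient is
# the raw OLS slope times sd(x_j)/sd(y), i.e. the std.-dev. change in y per
# std.-dev. change in x_j.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 3, n)
x2 = rng.normal(0, 0.5, n)
y = 1 + 2 * x1 - 4 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hand = b[1:] * np.std(X[:, 1:], axis=0) / np.std(y)

# Same numbers from regressing z-scores on z-scores (intercept drops out).
Z = (X[:, 1:] - X[:, 1:].mean(axis=0)) / np.std(X[:, 1:], axis=0)
zy = (y - y.mean()) / np.std(y)
beta_z, *_ = np.linalg.lstsq(Z, zy, rcond=None)
print(np.allclose(beta_hand, beta_z))   # True
```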
2. log(prices) on dummy variables—interpreting the results HPRICE2
[[[ln_hprice.sha p. 182-184, especially 184]]]
III. Interpreting quadratic results
Often things like age and tenure will not have a constant impact on wages (or
participation in pension programs or receipt of social insurance benefits, or hours of
work, etc.), but will have a concave (a diminishing impact) or convex (increasing impact)
effect. The most common way for capturing this sort of nonlinear behavior is to include
the variable and its square. For example, wages may increase at a decreasing rate with
respect to age. Hence, our model should be specified something like
wage = β0 + β1·age + β2·age² + other regressors
or
ln(wage) = β0 + β1·age + β2·age² + other regressors
We test for “polynomial effects” (the joint significance of the age and age-squared variables
together) with an F-test, as discussed a couple of lectures ago.
Since you change age and age-squared simultaneously, you have to use both coefficients
when interpreting the impact of age on wage (that is, you can’t talk about the partial
effect of age on wage, holding other things constant, by discussing the estimated
magnitude of β1 alone). And since there are different ways to calculate the slope of a
"bowl," there are different ways to calculate the impact of age on wages. We will
mention the two most common. For a model like the one above, these alternative ways
to compute the age-wage effect are:
calculus approximation (the one used in Wooldridge, p. 186): β̂1 + 2·β̂2·age
discrete approximation (not shown in the text): β̂1 + 2·β̂2·age + β̂2
where the calculations are usually made at the mean value of the age variable.
To find where the impact of age is at an extreme (a maximum when the age-squared
coefficient is negative, and a minimum when the age-squared coefficient is positive), just
set the slope = 0 (in the calculus formula) and solve for the resulting value of age:
age (max or min) = −β̂1/(2·β̂2).
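An illustrative Python sketch of these calculations (synthetic data with made-up coefficients, not the Utah CPS file):

```python
import numpy as np

# Illustrative sketch (synthetic data): fit a quadratic in age, evaluate the
# two approximations at mean age, and find the turning point -b1/(2*b2).
rng = np.random.default_rng(2)
n = 2000
age = rng.uniform(20, 65, n)
lnwage = 1.0 + 0.08 * age - 0.0008 * age**2 + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), age, age**2])
b, *_ = np.linalg.lstsq(X, lnwage, rcond=None)

calculus_effect = b[1] + 2 * b[2] * age.mean()         # Wooldridge, p. 186
discrete_effect = b[1] + 2 * b[2] * age.mean() + b[2]  # discrete approximation
turning_point = -b[1] / (2 * b[2])                     # slope = 0 here
print(turning_point)   # near 0.08/(2*0.0008) = 50 by construction
```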
example using utah data set on wages [[[ut_cps_agesq.X ]]]
***STATA CODE INSERT***;
gen lnwage = log(wklywg);
gen agesq =age*age;
regress lnwage age agesq white male no_hi_sc high_sch some_col college
exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
test (age=0) (agesq=0);
**SAS CODE INSERT**;
lnwage = log(wklywg);
agesq =age*age;
run;
proc reg;
model lnwage=age agesq white male no_hi_sc high_sch some_col college
exec tech_sal serv_occ oper_occ ag_cnstr manuf trade pub_admn;
test age,agesq;
run;
IV. Interpreting interaction results
utah data set on wages
**STATA CODE INSERT ****;
gen age_ed=age*educ_att;
regress lnwage age age_ed white male educ_att exec tech_sal serv_occ
oper_occ ag_cnstr manuf trade pub_admn;
**SAS CODE INSERT ****;
age_ed=age*educ_att;
run;
proc reg;
model lnwage=age age_ed white male educ_att exec tech_sal serv_occ
oper_occ ag_cnstr manuf trade pub_admn;
run;
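With an interaction term, the partial effect of age is β̂_age + β̂_inter·educ, so it must be evaluated at a chosen education level, commonly the mean. An illustrative Python sketch (synthetic data, not the Utah file):

```python
import numpy as np

# Illustrative sketch (synthetic data): with an age*educ interaction the
# marginal effect of age is b_age + b_inter * educ, evaluated at mean educ.
rng = np.random.default_rng(3)
n = 1000
age = rng.uniform(20, 65, n)
educ = rng.integers(8, 21, n).astype(float)
lnwage = (0.5 + 0.010 * age + 0.002 * age * educ + 0.05 * educ
          + rng.normal(0, 0.1, n))

X = np.column_stack([np.ones(n), age, age * educ, educ])
b, *_ = np.linalg.lstsq(X, lnwage, rcond=None)

effect_of_age = b[1] + b[2] * educ.mean()   # marginal effect at mean education
print(effect_of_age)   # near 0.010 + 0.002 * mean(educ) by construction
```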
V. confidence intervals and prediction intervals
A. Confidence Intervals for Forecasts
1. Intervals for the line (E(yT)), not a specific value on the line. If the model is
correct, and you are only concerned about predicting the expected value of Y, E(Y),
then the only source of uncertainty is that β̂ was estimated. (That is, since
E(Y) = Xβ + E(ε) = Xβ, we are not concerned in this case with variability due to ε, since
only the confidence interval about the estimated regression line is sought.) Then we want
the variance of
yT = xT β̂ = xT(X'X)⁻¹X'Y
where "T" is the point on the line that we are interested in, corresponding to the
vector of independent-variable observations "xT" (here xT is a row vector for a Tth
observation; it is not a column variable vector). Since β̂ ~ N(β, σ²(X'X)⁻¹), we use
useful theorem #1 (from lecture 6) to get
V(yT) = V(xT β̂) = xT V(β̂) xT′
= xT σ²(X'X)⁻¹ xT′ = σ² xT(X'X)⁻¹ xT′
using the rule that if y ~ N(Xβ, σ²I) then Ay ~ N(AXβ, A(σ²I)A′).
B. Prediction Intervals for forecasts
“Confidence” intervals for a specific value at xT (not just E(yT), but for yT = xTβ + εT, so
the additional variability due to "εT" must be taken into account) are for
"yT" = xT β̂ + εT
so that we have
V("yT") = V(xT β̂) + V(εT)
where the covariance between εT and xT β̂ is zero, because the X's are nonstochastic (or, we
assume X and εT are uncorrelated), and εT is assumed to be independent of εP
for all P ≠ T. Hence
V("yT") = σ² xT(X'X)⁻¹xT′ + σ²
= σ²[xT(X'X)⁻¹xT′ + 1]
These variances are estimated by replacing σ² with s². Hence
Prob[ xT β̂ − t.025,d.f.·√(s²[xT(X'X)⁻¹xT′ + 1]) ≤ yT ≤ xT β̂ + t.025,d.f.·√(s²[xT(X'X)⁻¹xT′ + 1]) ] = .95
where t.025,d.f. is the critical value putting probability .025 in the upper tail
(invttail(d.f., .025) in STATA).
C. Examples of Forecasting from STATA, SAS
Forecasting, and calculating the standard error of the forecast of a particular
observation (but not the regression line), is fairly easy in STATA and SAS. Here is the STATA
code for the forecast in a very simple model:
***STATA CODE for a SIMPLE EXAMPLE***;
# delimit;
input y x; *estimate using first five, then predict for x=6;
3 1;
4 2;
2 3;
6 4;
8 5;
. 6;
end;
regress y x;
predict yhat;
predict stderr_mean, stdp;
predict stderr_ind, stdf;
gen tvalue=invttail(3,.025);
gen lower_ci_mean=yhat - stderr_mean*invttail(3,.025); /* error(xb) */
gen upper_ci_mean=yhat + stderr_mean*invttail(3,.025);
gen lower_ci_ind=yhat - stderr_ind*invttail(3,.025); /* error(xb+e) */
gen upper_ci_ind=yhat + stderr_ind*invttail(3,.025);
list yhat lower_ci_mean upper_ci_mean lower_ci_ind upper_ci_ind tvalue
stderr_mean stderr_ind in 6;
OR, more readily:
# delimit;
input y x pred_trick; *estimate using first five, then predict for x=6;
3 1 0;
4 2 0;
2 3 0;
6 4 0;
8 5 0;
0 6 -1;
end;
regress y x pred_trick;
*the coefficient on pred_trick will be the predicted value for last
observation, and the std err for pred_trick will be = stderr_ind from
the more complex example above;
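Why the pred_trick works: giving the forecast observation y = 0 and a dummy equal to −1 forces that observation’s residual to zero, so the dummy’s coefficient equals the forecast and its reported standard error equals stdf. An illustrative Python check:

```python
import numpy as np

# Illustrative sketch of why the pred_trick works: append the forecast point
# with y = 0 and a dummy equal to -1. OLS sets that observation's residual to
# zero, so the dummy's coefficient is the forecast itself and its reported
# standard error equals the stdf standard error.
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([3., 4., 2., 6., 8., 0.])
trick = np.array([0., 0., 0., 0., 0., -1.])
X = np.column_stack([np.ones(6), x, trick])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# stdf computed directly from the first five observations, as in part B above.
X5 = X[:5, :2]
b5, *_ = np.linalg.lstsq(X5, y[:5], rcond=None)
s2 = np.sum((y[:5] - X5 @ b5) ** 2) / (5 - 2)
xT = np.array([1., 6.])
stdf = np.sqrt(s2 * (xT @ np.linalg.inv(X5.T @ X5) @ xT + 1))

s2_full = np.sum((y - X @ b) ** 2) / (6 - 3)   # same s^2: obs 6 residual is 0
se_trick = np.sqrt(s2_full * np.linalg.inv(X.T @ X)[2, 2])
print(b[2], se_trick)   # the forecast 8.2 and its stdf standard error
```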
**SAS CODE for a simple example***;
data one;
input y x; *estimate using first five, then predict for x=6;
cards;
3 1
4 2
2 3
6 4
8 5
. 6
;
run;
proc reg;
model y=x/p clm cli; /* clm=error(xb) cli=error(xb+e) */
run;
You can do the same pred_trick ploy in SAS as we did in STATA.
VI. Alternative Functional Forms: Unemployment Rates in Interwar Britain
A. different specifications for ur and ben_wg uk_unemp.sha
f(ur) = β0 + β1·g(ben_wg) + β2·trend + β3·log(real Net National Product)
should f(.), g(.) be linear or logarithmic?
Here is the STATA program (including the procedure discussed in chapter 6 of
Wooldridge):
# delimit ;
*uk_unemp_func_form.do -- interwar UK unempl--to show functional form;
input year ur ben_wg nnp;
1920 3.9 .15 3426;
1921 17.0 .24 3242;
1922 14.3 .37 3384;
1923 11.7 .40 3514;
1924 10.3 .42 3622;
1925 11.3 .48 3840;
1926 12.5 .48 3656;
1927 9.7 .48 3927;
1928 10.8 .50 4003;
1929 10.4 .50 4097;
1930 16.1 .53 4082;
1931 21.3 .54 3832;
1932 22.1 .50 3828;
1933 19.9 .51 3899;
1934 16.7 .53 4196;
1935 15.5 .55 4365;
1936 13.1 .57 4498;
1937 10.8 .56 4665;
1938 12.9 .56 4807;
end;
gen lnur = log(ur);
gen lnben_wg = log(ben_wg);
gen ben_wgsq = ben_wg^2;
gen lnnnp = log(nnp);
gen trend = year - 1919;
gen i_ben_wg = 1/ben_wg;
regress ur ben_wg trend lnnnp;
regress ur lnben_wg trend lnnnp;
regress lnur lnben_wg trend lnnnp;
predict lnur1_hat;
gen exp_ln1=exp(lnur1_hat);
regress ur exp_ln1 , noconstant;
gen alpha1=_b[exp_ln1]; /*e(b) returns a vector of all coeff*/;
gen ur1_hat = alpha1*exp_ln1;
regress lnur ben_wg trend lnnnp;
predict lnur2_hat;
gen exp_ln2=exp(lnur2_hat);
regress ur exp_ln2 , noconstant;
gen alpha2=_b[exp_ln2];
gen ur2_hat = alpha2*exp_ln2;
regress ur ben_wg ben_wgsq trend lnnnp;
regress lnur ben_wg ben_wgsq trend lnnnp;
predict lnur3_hat;
gen exp_ln3=exp(lnur3_hat);
regress ur exp_ln3 , noconstant;
gen alpha3=_b[exp_ln3];
gen ur3_hat = alpha3*exp_ln3;
summarize ur ur1_hat ur2_hat ur3_hat;
correlate ur ur1_hat ur2_hat ur3_hat;
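The alpha rescaling steps above (regressing ur on exp(lnur_hat) with no constant, then predicting alpha·exp(lnur_hat)) can be sketched in Python on synthetic data (illustrative only, not the interwar UK series):

```python
import numpy as np

# Illustrative sketch (synthetic data) of the quasi-R^2 procedure: regress
# ln(y) on X, rescale exp(fitted) by alpha from a no-constant regression of y
# on exp(fitted), then square corr(y, prediction).
rng = np.random.default_rng(4)
n = 100
x = rng.uniform(1, 10, n)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.2, n))

X = np.column_stack([np.ones(n), x])
blog, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
yhat_exp = np.exp(X @ blog)

alpha = (yhat_exp @ y) / (yhat_exp @ yhat_exp)   # no-constant OLS slope
yhat = alpha * yhat_exp                          # prediction on the level scale
quasi_r2 = np.corrcoef(y, yhat)[0, 1] ** 2
print(quasi_r2)
```

Note that corr(y, alpha·ŷ) = corr(y, ŷ), so the alpha rescaling matters for putting the prediction on the level scale (compare the means from summarize), not for the quasi-R² itself.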
Specification              R-square   R-square adj   quasi-R sq. (end of chapter 6)
ur on ben/wg                .8571       .8285
ur on ln(ben/wg)            .8591       .8310
ln(ur) on ln(ben/wg)        .8351       .8022        (.89591)^2 = .8026
ln(ur) on ben/wg            .8088       .7706        (.89068)^2 = .7933
ur on b/w, (b/w)^2          .8573       .8165
ln(ur) on b/w, (b/w)^2      .8236       .7732        (.88617)^2 = .7853
The quasi-R square is taken by squaring the relevant elements in the correlation matrix
generated by the very last STATA command in the program:
CORRELATION MATRIX
              UR     UR1_HAT  UR2_HAT  UR3_HAT
UR         1.0000
UR1_HAT    0.89591  1.0000
UR2_HAT    0.89068  0.99731  1.0000
UR3_HAT    0.88617  0.99611  0.98827  1.0000
By these tests, the preferred specification seems to be the second one, with an R-square
value of .8591.
B. Interpreting the specification with ben/wage and its square in it...
The fifth specification picks up the nonlinear relationship between the
unemployment rate (UR) and the benefit/wage ratio by adding the square of the variable
to the equation; in this case, we added the square of the benefit/wage ratio to the linear
specification. We discussed this above; now let’s make sure that we understand the
implications of the estimates: what do the results (which follow) indicate?
. regress ur ben_wg ben_wgsq trend lnnnp;
      Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  4,    14) =   21.03
       Model |   308.22099     4  77.0552475           Prob > F      =  0.0000
    Residual |  51.2990055    14  3.66421468           R-squared     =  0.8573
-------------+------------------------------           Adj R-squared =  0.8165
       Total |  359.519995    18  19.9733331           Root MSE      =  1.9142

------------------------------------------------------------------------------
          ur |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ben_wg |   13.26801   38.73626     0.34   0.737      -69.813    96.34902
    ben_wgsq |   9.995276   64.29326     0.16   0.879       -127.9    147.8906
       trend |   1.593192   .2723781     5.85   0.000        1.009    2.177385
       lnnnp |  -92.60384   12.34338    -7.50   0.000    -119.0778   -66.12993
       _cons |   755.4508   102.1371     7.40   0.000     536.3884    974.5132
------------------------------------------------------------------------------