Principles of Econometrics (721066S), Exercises 5
Santtu Karhinen, TA
Practicalities:
- These exercises are due at 8.15am on Thursday 3.12.2015.
- Return them either by email to santtu.karhinen@oulu.fi or by hand to the return box at Marko’s room TA313 (third floor of the Business School).
- Do the exercises in groups of 2–3 people. Remember to write down the names and student numbers of all group members!
- Your answers don’t need to be perfect; showing good effort earns you points.
- Include estimation outputs from EViews / R in your solutions.
1. Use the dataset CPS08 given in Noppa. In the estimations, use the sample of 26–35-year-olds.
a. Explain average hourly earnings (AHE) with age, gender and education, i.e. run a regression of AHE on age, female and bachelor. If age increases from 26 to 27, how much is AHE expected to change? If age increases from 32 to 33, how much is AHE expected to change?
b. Take the logarithm of AHE, ln(AHE), and run a regression of ln(AHE) on age, female and bachelor. If age increases from 26 to 27, how much is AHE expected to change? If age increases from 32 to 33, how much is AHE expected to change?
c. Now run a regression of the logarithm of AHE, ln(AHE), on ln(Age), female and bachelor. If age increases from 26 to 27, how much is AHE expected to change? If age increases from 32 to 33, how much is AHE expected to change?
d. Now run a regression of the logarithm of AHE, ln(AHE), on age, age², female and bachelor. If age increases from 26 to 27, how much is AHE expected to change? If age increases from 32 to 33, how much is AHE expected to change?
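For parts b) to d), the predicted change in AHE follows from the difference between two fitted values of ln(AHE). The Python sketch below is one way to do that arithmetic once you have your estimates; the function names and any coefficient values you pass in are placeholders of ours, not part of the exercise or of EViews/R output.

```python
import math

# Hypothetical helpers: predicted change in ln(AHE) when age moves from a0 to a1,
# holding female and bachelor fixed. Plug in your own estimated coefficients.

def dlog_loglinear(b_age, a0, a1):
    """Part b): ln(AHE) = b0 + b_age*Age + ...  ->  change = b_age * (a1 - a0)."""
    return b_age * (a1 - a0)

def dlog_loglog(b_lnage, a0, a1):
    """Part c): ln(AHE) = b0 + b_lnage*ln(Age) + ...  ->  change = b_lnage * ln(a1 / a0)."""
    return b_lnage * math.log(a1 / a0)

def dlog_quadratic(b_age, b_age2, a0, a1):
    """Part d): ln(AHE) = b0 + b_age*Age + b_age2*Age^2 + ...  ->  difference of fitted values."""
    return b_age * (a1 - a0) + b_age2 * (a1 ** 2 - a0 ** 2)

def pct_change(dlog):
    """Exact percentage change in AHE implied by a change of dlog in ln(AHE)."""
    return 100.0 * (math.exp(dlog) - 1.0)
```

For example, dlog_quadratic(b_age, b_age2, 26, 27) and dlog_quadratic(b_age, b_age2, 32, 33) differ because the age² term makes the effect depend on the starting age, whereas dlog_loglinear gives the same answer for both increments. Multiplying a change in ln(AHE) by 100 gives the usual approximate percentage change; pct_change gives the exact one.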
e. Is the regression in part c) preferred to the regression in part b)? Why?
f. Is the regression in part d) preferred to the regression in part b)? Why?
g. Is the regression in part d) preferred to the regression in part c)? Why?
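Because the regressions in b), c) and d) share the dependent variable ln(AHE), their adjusted R² values can be compared directly; in addition, d) nests b), so the t-statistic on age² tests b) against d). EViews and R report adjusted R² themselves; the short function below (name ours) only spells out the formula, with k counting the regressors excluding the intercept.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    # Penalise R-squared for the degrees of freedom used by the k regressors
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)
```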
h. Run a regression of ln(AHE) on age, age², female, bachelor, and the interaction term female×bachelor.
i. What does the coefficient on the interaction term measure?
ii. Consider a 28-year-old female (A) with a bachelor’s degree. What is the predicted value of her ln(AHE)?
iii. Consider a 28-year-old female (B) without a bachelor’s degree. What is the predicted value of her ln(AHE)?
iv. What is the predicted difference between the earnings of persons A and B?
v. Consider a 31-year-old male (C) with a bachelor’s degree. What is the predicted value of his ln(AHE)?
vi. Consider a 31-year-old male (D) without a bachelor’s degree. What is the predicted value of his ln(AHE)?
vii. What is the predicted difference between the earnings of persons C and D?
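The predictions in h) are fitted values from the estimated equation. A minimal sketch, assuming you store your estimates in a dict with the hypothetical keys shown below; because A and B differ only in bachelor, and so do C and D, the predicted differences reduce to sums of coefficients.

```python
def predict_ln_ahe(coef, age, female, bachelor):
    """Fitted ln(AHE) from the part h) regression.

    coef: dict with hypothetical keys 'const', 'age', 'age2', 'female',
    'bachelor', 'female_x_bachelor' holding your own estimates.
    """
    return (coef["const"]
            + coef["age"] * age
            + coef["age2"] * age ** 2
            + coef["female"] * female
            + coef["bachelor"] * bachelor
            + coef["female_x_bachelor"] * female * bachelor)
```

Since age and female are the same within each pair, the predicted ln(AHE) difference is coef["bachelor"] + coef["female_x_bachelor"] for A minus B, and coef["bachelor"] alone for C minus D.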
i. Is the effect of age on earnings different for men than for women? Specify and estimate a regression that can be used to answer this.
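One possible specification for i), with coefficient labels of our choosing, interacts female with both age terms; the effect of age then differs by gender if either interaction coefficient is nonzero:

```latex
\begin{aligned}
\ln(AHE_i) &= \beta_0 + \beta_1 Age_i + \beta_2 Age_i^2 + \beta_3 Female_i \\
&\quad + \beta_4 (Female_i \times Age_i) + \beta_5 (Female_i \times Age_i^2) + \beta_6 Bachelor_i + u_i \\
H_0 &: \beta_4 = \beta_5 = 0 \quad \text{vs.} \quad H_1: \text{at least one of } \beta_4, \beta_5 \neq 0
\end{aligned}
```

The joint hypothesis can be tested with a heteroskedasticity-robust F-statistic with q = 2 restrictions.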
j. Is the effect of age on earnings different for high school graduates than for college graduates? Specify and estimate a regression that can be used to answer this.
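For j), an analogous specification interacts bachelor with age and age² and tests whether both interaction coefficients are zero. A heteroskedasticity-robust F-test of that joint hypothesis can be obtained from EViews or R. As a cross-check only, the homoskedasticity-only F-statistic can be computed from the sums of squared residuals of the restricted regression (without the two interactions) and the unrestricted one; it is valid only under homoskedasticity, so the sketch below (function name ours) is not a substitute for the robust test.

```python
def homoskedastic_f(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for q linear restrictions.

    k_unrestricted: number of regressors (excluding the intercept)
    in the unrestricted regression; n: number of observations.
    """
    df = n - k_unrestricted - 1
    # Increase in SSR per restriction, relative to the unrestricted error variance
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df)
```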
2. Causes of terrorism? The variables in the dataset are:
- ftmpop: Number of fatalities from terrorist incidents in the country (1998–2004) per million population
- evmpop: Number of terrorist events in the country (1998–2004) per million population
- gdppc: GDP per capita
- lackpf: Index of the lack of political freedoms on a 1–7 scale (7 = extremely limited political freedoms)
- language: Index of linguistic fractionalization (binary, 0 = no fractionalization)
- ethnic: Index of ethnic fractionalization (binary, 0 = no fractionalization)
- religion: Index of religious fractionalization (binary, 0 = no fractionalization)
- regional dummies: = 1 if the country is in the indicated region, = 0 otherwise
a. Preliminaries
i. Plot a scatterplot of ftmpop vs. gdppc
ii. Generate variables lnftmpop = log(ftmpop) and lngdppc = log(gdppc). Then
produce the scatterplot of lnftmpop vs. lngdppc.
iii. Produce a scatterplot of lnftmpop vs. lackpf.
iv. Look at the scatterplots from i) and ii). Would you suggest using a) ftmpop
and gdppc, or b) lnftmpop and lngdppc for modeling using linear regression?
v. Now look at the scatterplot from iii). Is the relation between lnftmpop and
lackpf linear or nonlinear?
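The transformation in ii) uses the natural logarithm, which is undefined at zero and for negative values, so it is worth checking whether any country has ftmpop = 0 before taking logs; such observations cannot be used in lnftmpop as they stand. A small stdlib-only sketch (function names ours) that flags those cases:

```python
import math

def log_or_none(values):
    """Natural log of each value, or None where the log is undefined (value <= 0)."""
    return [math.log(v) if v > 0 else None for v in values]

def count_undefined(values):
    """Number of observations whose log would be undefined."""
    return sum(1 for v in values if v <= 0)
```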
b. Estimate the following models. Report the heteroskedasticity-robust standard errors in parentheses, and the corresponding p-values in parentheses under the F-statistics. Finally, mark the significance of the coefficients with * at 10%, ** at 5% and *** at 1%. You can report your results in the following table or provide estimation outputs from EViews / R.
Dependent variable: lnftmpop

Regressor                      (1)     (2)     (3)     (4)     (5)
lngdppc                        ( )     ( )     ( )     ( )     ( )
lngdppc²                        –       –      ( )      –       –
lackpf                          –      ( )     ( )     ( )     ( )
lackpf²                         –       –      ( )     ( )     ( )
ethnic                          –       –       –      ( )     ( )
religion                        –       –       –      ( )     ( )
Mideast                         –       –       –       –      ( )
Other regional dummies          No      No      No      No      Yes
(latinam, easteurope,
africa, eastasia)
Intercept                      ( )     ( )     ( )     ( )     ( )

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
lngdppc, lngdppc²               –       –      ( )      –       –
lackpf, lackpf²                 –       –      ( )     ( )     ( )
ethnic, religion                –       –       –      ( )     ( )
Other regional dummies          –       –       –       –      ( )

Regression summary statistics
R-squared
Adjusted R-squared
n
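The significance marks follow the convention stated in b) (* at 10%, ** at 5%, *** at 1%). A tiny helper (name ours), assuming a mark is given when the p-value is strictly below the level:

```python
def stars(p_value):
    """Significance marks: *** below 1%, ** below 5%, * below 10%, else none."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""
```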
c. Interpreting the results:
i. Use regression (1) to test the hypothesis that the coefficient on lngdppc is zero, against the alternative hypothesis that it is nonzero, at the 5% and 10% significance levels. Explain what the coefficient means.
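For i), the large-sample two-sided test rejects H0 when |t| = |β̂ / SE(β̂)| exceeds 1.96 at 5% or 1.645 at 10%, using the heteroskedasticity-robust standard error. Because both lnftmpop and lngdppc are in logs, the coefficient is an elasticity: a 1% increase in GDP per capita is associated with approximately a β̂% change in fatalities per million population. A stdlib-only sketch of the test (function name ours), using the standard normal distribution:

```python
import math

def two_sided_test(coef, se, levels=(0.05, 0.10)):
    """Large-sample t-test of H0: beta = 0 against H1: beta != 0.

    Returns the t-statistic, the two-sided normal p-value and a dict
    {level: True if H0 is rejected at that level}.
    """
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p, {a: p < a for a in levels}
```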
ii. Use regression (3) to test the hypothesis that both the coefficients on lngdppc and lngdppc² are zero, against the alternative hypothesis that one or both coefficients are nonzero, at the 5% significance level.
iii. Why do your conclusions in i) and ii) differ?
iv. Looking at regression (3), can you tell whether the relation between lnftmpop and lngdppc is nonlinear?
v. Looking at regression (3), can you tell whether the relation between lnftmpop and lackpf is nonlinear?
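Assuming regression (3) contains lngdppc, lngdppc², lackpf and lackpf² (coefficient labels ours, other regressors omitted), each slope depends on the level of its regressor only through the squared term. Each relation is therefore nonlinear exactly when the corresponding squared-term coefficient is nonzero, which a t-test on that coefficient checks:

```latex
\begin{aligned}
\widehat{lnftmpop} &= \hat\beta_0 + \hat\beta_1\, lngdppc + \hat\beta_2\, lngdppc^2 + \hat\beta_3\, lackpf + \hat\beta_4\, lackpf^2 \\
\frac{\partial\, \widehat{lnftmpop}}{\partial\, lngdppc} &= \hat\beta_1 + 2\hat\beta_2\, lngdppc,
\qquad
\frac{\partial\, \widehat{lnftmpop}}{\partial\, lackpf} = \hat\beta_3 + 2\hat\beta_4\, lackpf
\end{aligned}
```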
vi. Now use regression (5) to test the null hypothesis, at the 5% significance level, that all coefficients on the other regional dummies are zero, against the alternative hypothesis that at least one is nonzero. Also report the number of restrictions q and the critical value for this test.
vii. Use regression (4) to test the hypothesis that the coefficients on ethnic and religion are both zero, against the alternative hypothesis that at least one of them is nonzero. Explain what hypothesis you tested and what your conclusion is.
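The tests in ii), vi) and vii) are joint F-tests. In ii) and vii) there are q = 2 restrictions; in vi) there are q = 4 (latinam, easteurope, africa, eastasia). In large samples q·F is approximately χ²(q), which gives 5% critical values of 3.00 for q = 2 and 2.37 for q = 4. If your software reports p-values from the F(q, n − k − 1) distribution instead, they will be somewhat larger in a small cross-country sample, so treat the sketch below (function name ours) as a large-sample approximation. Its closed form for the χ² tail holds only for even q, which covers all three tests here.

```python
import math

def f_pvalue_large_sample(f_stat, q):
    """Large-sample p-value of an F-statistic with q restrictions.

    Uses q * F ~ chi-squared(q). For even q = 2m the chi-squared tail is
    exp(-x) * sum_{j=0}^{m-1} x**j / j!  with  x = q * F / 2.
    """
    if q <= 0 or q % 2:
        raise ValueError("this closed form needs a positive even q")
    x = q * f_stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, q // 2):
        term *= x / j          # builds x**j / j! incrementally
        total += term
    return math.exp(-x) * total
```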
d. Create a new binary variable called high_gdppc, which takes the value 1 if gdppc is greater than or equal to its median in the dataset, and 0 otherwise. Also create the interaction variables high_lack = high_gdppc×lackpf and high_lack2 = high_gdppc×lackpf².
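A minimal stdlib-only sketch of the variable construction in d), assuming gdppc and lackpf are plain lists of equal length with no missing values (function name ours). Note that statistics.median averages the two middle values when the number of countries is even.

```python
import statistics

def make_high_gdppc_vars(gdppc, lackpf):
    """Return high_gdppc, high_lack and high_lack2 as lists."""
    med = statistics.median(gdppc)
    high = [1 if g >= med else 0 for g in gdppc]              # 1 at or above the median
    high_lack = [h * l for h, l in zip(high, lackpf)]         # high_gdppc x lackpf
    high_lack2 = [h * l ** 2 for h, l in zip(high, lackpf)]   # high_gdppc x lackpf^2
    return high, high_lack, high_lack2
```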
Then, estimate the following models:
Dependent variable: lnftmpop

Regressor                          (6)     (7)
high_gdppc                         ( )     ( )
lackpf                             ( )     ( )
lackpf²                            ( )     ( )
high_gdppc×lackpf                   –      ( )
high_gdppc×lackpf²                  –      ( )
ethnic                              –      ( )
religion                            –      ( )
high_gdppc×ethnic                   –      ( )
high_gdppc×religion                 –      ( )
Mideast                             –      ( )
Other regional dummies              No      Yes
(latinam, easteurope,
africa, eastasia)
Intercept                          ( )     ( )

R-squared
Adjusted R-squared
n
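In a specification with the interactions high_gdppc×lackpf and high_gdppc×lackpf², the slope of lnftmpop with respect to lackpf is allowed to differ between the two GDP groups. Writing γ for the coefficients (labels ours, other regressors omitted):

```latex
\begin{aligned}
lnftmpop &= \gamma_0 + \gamma_1\, high\_gdppc + \gamma_2\, lackpf + \gamma_3\, lackpf^2
  + \gamma_4\, (high\_gdppc \times lackpf) + \gamma_5\, (high\_gdppc \times lackpf^2) + \dots + u \\
\frac{\partial\, lnftmpop}{\partial\, lackpf} &=
\begin{cases}
\gamma_2 + 2\gamma_3\, lackpf & \text{if } high\_gdppc = 0 \\
(\gamma_2 + \gamma_4) + 2(\gamma_3 + \gamma_5)\, lackpf & \text{if } high\_gdppc = 1
\end{cases}
\end{aligned}
```

The effect of lackpf is the same in both groups under H0: γ4 = γ5 = 0, which can be tested with a heteroskedasticity-robust F-test with q = 2.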