Principles of Econometrics (721066S), Exercises 5
Santtu Karhinen, TA

Practicalities: These exercises are due at 8.15 am on Thursday 3.12.2015. Return them either via email to santtu.karhinen@oulu.fi or by hand to the return box at Marko’s room TA313 (third floor of the Business School). Do the exercises in groups of 2–3 people. Remember to write down the names and student numbers of all group members! Your answers don’t need to be perfect; showing good effort earns you points. Include estimation outputs from EViews / R in your solutions.

1. Use the dataset CPS08 given in Noppa. In the estimations, use the sample consisting of 26–35-year-olds.

a. Explain average hourly earnings (AHE) with age, gender and education. That is, run a regression of AHE on age, gender and education. If age increases from 26 to 27, by how much is AHE expected to change? If age increases from 32 to 33, by how much is AHE expected to change?

b. Take the logarithm of AHE, ln(AHE). Then run a regression of ln(AHE) on age, female and bachelor. If age increases from 26 to 27, by how much is AHE expected to change? If age increases from 32 to 33, by how much is AHE expected to change?

c. Now run a regression of ln(AHE) on ln(age), female and bachelor. If age increases from 26 to 27, by how much is AHE expected to change? If age increases from 32 to 33, by how much is AHE expected to change?

d. Now run a regression of ln(AHE) on age, age^2, female and bachelor. If age increases from 26 to 27, by how much is AHE expected to change? If age increases from 32 to 33, by how much is AHE expected to change?

e. Is the regression in part c) preferred to the regression in part b)? Why?

f. Is the regression in part d) preferred to the regression in part b)? Why?

g. Is the regression in part d) preferred to the regression in part c)? Why?

h. Run a regression of ln(AHE) on age, age^2, female, bachelor, and the interaction term female×bachelor.
   i. What does the coefficient on the interaction term measure?
   ii. Consider a 28-year-old female (A) with a bachelor’s degree. What is the prediction for her value of ln(AHE)?
   iii. Consider a 28-year-old female (B) without a bachelor’s degree. What is the prediction for her value of ln(AHE)?
   iv. What is the predicted difference between the earnings of persons A and B?
   v. Consider a 31-year-old male (C) with a bachelor’s degree. What is the prediction for his value of ln(AHE)?
   vi. Consider a 31-year-old male (D) without a bachelor’s degree. What is the prediction for his value of ln(AHE)?
   vii. What is the predicted difference between the earnings of persons C and D?

i. Is the effect of age on earnings different for men than for women? Specify and estimate a regression that can be used to answer this.

j. Is the effect of age on earnings different for high school graduates than for college graduates? Specify and estimate a regression that can be used to answer this.

2. Causes of terrorism? The variables in the dataset are:

Variable            Definition
ftmpop              Number of fatalities from terrorist incidents in the country (1998–2004) per million population
evmpop              Number of terrorist events in the country (1998–2004) per million population
gdppc               GDP per capita
lackpf              Index of the lack of political freedoms, 1–7 scale, 7 = extremely limited political freedoms
language            Index of linguistic fractionalization (binary, 0 = no fractionalization)
ethnic              Index of ethnic fractionalization (binary, 0 = no fractionalization)
religion            Index of religious fractionalization (binary, 0 = no fractionalization)
regional dummies    = 1 if the country is in the indicated region, = 0 otherwise

a. Preliminaries
   i. Produce a scatterplot of ftmpop vs. gdppc.
   ii. Generate the variables lnftmpop = log(ftmpop) and lngdppc = log(gdppc). Then produce a scatterplot of lnftmpop vs. lngdppc.
   iii. Produce a scatterplot of lnftmpop vs. lackpf.
   iv. Look at the scatterplots from i) and ii). Would you suggest using a) ftmpop and gdppc, or b) lnftmpop and lngdppc for modeling with linear regression?
   v. Now look at the scatterplot from iii).
   Is the relation between lnftmpop and lackpf linear or nonlinear?

b. Estimate the following models. Report the heteroskedasticity-robust standard errors in the parentheses. Also report the corresponding p-values in the parentheses under the F-statistics. Finally, mark the significance levels of the coefficients with * for 10%, ** for 5% and *** for 1%. You can report your results in the following table or provide estimation outputs from EViews / R.

Dependent variable: lnftmpop

Regressor                    (1)     (2)     (3)     (4)     (5)
lngdppc                      ( )     ( )     ( )     ( )     ( )
lngdppc^2                     –       –      ( )      –       –
lackpf                        –      ( )     ( )     ( )     ( )
lackpf^2                      –       –      ( )     ( )     ( )
ethnic                        –       –       –      ( )     ( )
religion                      –       –       –      ( )     ( )
Mideast                       –       –       –       –      ( )
Other regional dummies        No      No      No      No      Yes
(latinam, easteurope, africa, eastasia)
Intercept                    ( )     ( )     ( )     ( )     ( )

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
lngdppc, lngdppc^2            –       –      ( )      –       –
lackpf, lackpf^2              –       –      ( )     ( )     ( )
ethnic, religion              –       –       –      ( )     ( )
Other regional dummies        –       –       –       –      ( )

Regression summary statistics
R-squared
Adjusted R-squared
n

c. Interpreting the results:
   i. Use regression (1) to test the hypothesis that the coefficient on lngdppc is zero, against the alternative hypothesis that it is nonzero, at the 5% and 10% significance levels. Explain what the coefficient means.
   ii. Use regression (3) to test the hypothesis that the coefficients on lngdppc and lngdppc^2 are both zero, against the alternative hypothesis that one or both coefficients are nonzero, at the 5% significance level.
   iii. Why do your conclusions differ in i) and ii)?
   iv. Looking at regression (3), can you say whether the relation between lnftmpop and lngdppc is nonlinear?
   v. Looking at regression (3), can you say whether the relation between lnftmpop and lackpf is nonlinear?
   vi.
   Now use regression (5) to test, at the 5% significance level, the null hypothesis that all coefficients on the other regional dummies are zero against the alternative hypothesis that at least one is nonzero. Also report the number of restrictions q and the critical value used in this test.
   vii. Use regression (4) to test the hypothesis that the coefficients on ethnic and religion are both zero against the alternative hypothesis that one or the other coefficient is nonzero. Explain what hypothesis you tested and what your conclusion is.

d. Create a new binary variable called high_gdppc, which takes the value 1 if gdppc is greater than or equal to the median in the dataset, and zero otherwise. Also create the interaction variables high_lack = high_gdppc×lackpf and high_lack2 = high_gdppc×lackpf^2. Then estimate the following models:

Dependent variable: lnftmpop

Regressor                    (6)     (7)
high_gdppc                   ( )     ( )
lackpf                       ( )     ( )
lackpf^2                     ( )     ( )
high_gdppc×lackpf             –      ( )
high_gdppc×lackpf^2           –      ( )
ethnic                        –      ( )
religion                      –      ( )
high_gdppc×ethnic             –      ( )
high_gdppc×religion           –      ( )
Mideast                       –      ( )
Other regional dummies        No      Yes
(latinam, easteurope, africa, eastasia)
Intercept                    ( )     ( )

R-squared
Adjusted R-squared
n
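
The variable construction described in part d can be sketched as follows. This is only a minimal illustration in plain Python, not the course's EViews / R workflow: the function name add_high_gdppc_terms and the four toy rows are assumptions made up for the example and do not come from the course dataset.

```python
from statistics import median

# Hypothetical toy rows standing in for the real dataset. The values are
# invented purely for illustration and are NOT taken from the course data.
rows = [
    {"gdppc": 1200.0, "lackpf": 6},
    {"gdppc": 8500.0, "lackpf": 3},
    {"gdppc": 30000.0, "lackpf": 1},
    {"gdppc": 4100.0, "lackpf": 5},
]


def add_high_gdppc_terms(rows):
    """Return copies of the rows with high_gdppc, high_lack and high_lack2 added."""
    med = median(r["gdppc"] for r in rows)  # sample median of gdppc
    out = []
    for r in rows:
        # Binary indicator: 1 if gdppc is greater than or equal to the median.
        high = 1 if r["gdppc"] >= med else 0
        new = dict(r)
        new["high_gdppc"] = high
        new["high_lack"] = high * r["lackpf"]        # high_gdppc x lackpf
        new["high_lack2"] = high * r["lackpf"] ** 2  # high_gdppc x lackpf^2
        out.append(new)
    return out


augmented = add_high_gdppc_terms(rows)
for r in augmented:
    print(r)
```

The interaction terms are zero for every country below the median, so in a regression their coefficients capture how the lackpf profile differs for high-GDP countries. The same steps carry over directly to a generated series in EViews or a new column in an R data frame.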