More exercises for seminars, ECON 4150

Problem set 6. Testing simultaneous hypotheses on the regression coefficients: F tests. Solve exercises 8.9, using the file codd.xls, and 8.13, using the data in the file cars.xls.

Problem set 7. This is an exercise in demand analysis. You will find the data in the file table 8-3.xls.

Demand analysis: We are concerned with analysing the demand for beer in a sample of households. Our sample reports observations on 5 variables: the quantity of beer demanded (QB), the price of beer (PB), the price of other liquor (PL), a price index of the remaining goods and services in the households' budgets (PR), and finally the households' income (INC). We are not certain what the appropriate specification of this demand function is, so we will try different forms. We start with the ln-ln functional form:

(1) $$\ln(QB_i) = \beta_0 + \beta_1 \ln(PB_i) + \beta_2 \ln(PL_i) + \beta_3 \ln(PR_i) + \beta_4 \ln(INC_i) + \varepsilon_i$$

where $\varepsilon_i$ denotes the random disturbances.

Question A. How will you interpret the regression parameters in this demand function? Use PcGive and the attached data file table 8-3.xls to estimate this model.

Question B. Do you think the signs of the estimates are reasonable? Substantiate your assertions. Explain how the numbers in the column t-prob (P>|t|) are calculated.

Standard consumer demand theory tells us that if all prices and income increase by the same proportion, we should expect no change in the quantity demanded. Consumers are then said to have no money illusion.

Question C. Show that the assumption of no money illusion, applied to the demand function (1), implies the restriction:

(2) $$\beta_1 + \beta_2 + \beta_3 + \beta_4 = 0$$

Question D. Use results and information from your regression to test the hypothesis

$$H_0: \beta_1 + \beta_2 + \beta_3 + \beta_4 = 0 \quad \text{against} \quad H_a: \beta_1 + \beta_2 + \beta_3 + \beta_4 \neq 0$$

at significance level 0.05.

Now we are told that the ln-ln functional form (1) has certain theoretical drawbacks. In addition to (1), we therefore wish to analyse two forms that are linear in relative prices and real income.
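Returning briefly to Question D: a single linear restriction such as (2) can be tested either with a t statistic for the combination $\beta_1 + \beta_2 + \beta_3 + \beta_4$, which requires the estimated covariances of the coefficients, or with an F test comparing the restricted and unrestricted residual sums of squares. The sketch below illustrates both in Python with numpy instead of PcGive, on simulated (hypothetical) data rather than table 8-3.xls. The restricted model imposes $\beta_3 = -(\beta_1 + \beta_2 + \beta_4)$, which amounts to deflating all prices and income by PR. The p-value uses a normal approximation, which is only reasonable in large samples.

```python
import math
import numpy as np

def ols(X, y):
    """Return OLS coefficients, residual sum of squares and estimated covariance matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    n, k = X.shape
    s2 = rss / (n - k)                      # unbiased estimate of the disturbance variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta, rss, cov

# Hypothetical simulated data, NOT the table 8-3.xls sample.
rng = np.random.default_rng(0)
n = 500
lpb, lpl, lpr, linc = rng.normal(0.0, 0.5, size=(4, n))   # ln PB, ln PL, ln PR, ln INC
true_beta = np.array([1.0, -1.2, 0.3, -0.1, 1.0])          # beta_1..beta_4 sum to zero
X = np.column_stack([np.ones(n), lpb, lpl, lpr, linc])
y = X @ true_beta + rng.normal(0.0, 0.05, size=n)          # ln QB

beta_hat, rss_u, cov = ols(X, y)

# t test of H0: beta_1 + beta_2 + beta_3 + beta_4 = 0, written as r'beta = 0.
r = np.array([0.0, 1.0, 1.0, 1.0, 1.0])
t_stat = float(r @ beta_hat) / math.sqrt(float(r @ cov @ r))
p_approx = math.erfc(abs(t_stat) / math.sqrt(2.0))         # two-sided, normal approximation

# F test: impose beta_3 = -(beta_1 + beta_2 + beta_4), i.e. deflate by PR.
X_r = np.column_stack([np.ones(n), lpb - lpr, lpl - lpr, linc - lpr])
_, rss_r, _ = ols(X_r, y)
f_stat = (rss_r - rss_u) / (rss_u / (n - X.shape[1]))

print(f"t = {t_stat:.3f}, F = {f_stat:.3f}, t^2 = {t_stat**2:.3f}, p (approx.) = {p_approx:.3f}")
```

With one restriction the two statistics coincide algebraically (F = t²), so either can be compared with its 5% critical value.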
(3) $$QB_i = \alpha_1 + \alpha_2 \frac{PL_i}{PB_i} + \alpha_3 \frac{PR_i}{PB_i} + \alpha_4 \frac{INC_i}{PB_i} + u_i$$

(4) $$QB_i = \gamma_0 \frac{1}{PB_i} + \gamma_1 + \gamma_2 \frac{PL_i}{PB_i} + \gamma_3 \frac{PR_i}{PB_i} + \gamma_4 \frac{INC_i}{PB_i} + v_i$$

where $u_i$ and $v_i$ denote random disturbances. Use PcGive to estimate the regression equations (3) and (4).

Question E. Which of the two equations, (3) or (4), do you think is appropriate for testing the assumption of no money illusion? State the reason for your choice and explain how you would test this hypothesis.

Question F. Assume that the number of household members ($N_i$) has been wrongly excluded from the regressions (3) and (4). Choose one of these regressions for further study, and explain formally how this misspecification affects the OLS estimates of the regression coefficients when:
(i) $N_i$ is uncorrelated with the explanatory variables already used in the equation;
(ii) $N_i$ is correlated with the explanatory variables already used in the equation.

Problem set 8. The next exercise illustrates the use of dummy variables in regression analysis.

Gender and Wages

For a sample of Norwegian workers, consisting of 75 women and 75 men, we have observed the variables gender, wage per hour and years of education. It is often said that women are underpaid in the labour market. We wish to investigate this assertion by applying regression analysis to our data set. We use the following variables in our analysis: $W_i$, the wage per hour paid to worker $i$, and $E_i$, worker $i$'s number of years of education. The qualitative variable gender is represented by the dummy variables defined by:

$$DF(i) = \begin{cases} 1 & \text{if worker } i \text{ is a woman} \\ 0 & \text{if worker } i \text{ is a man} \end{cases} \qquad DM(i) = \begin{cases} 1 & \text{if worker } i \text{ is a man} \\ 0 & \text{if worker } i \text{ is a woman} \end{cases}$$

The first part of the regression analysis is based on the equations:

(1) $$W_i = \alpha_F DF(i) + \alpha_M DM(i) + \beta_1 E_i + \varepsilon_i, \quad i = 1, 2, \ldots, 150$$

(2) $$W_i = \alpha_0 + \alpha_1 DF(i) + \beta_1 E_i + \varepsilon_i, \quad i = 1, 2, \ldots, 150$$

Specification (1) includes both dummy variables in addition to education, but excludes the intercept term.
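Why specification (1) has to drop the intercept can be checked numerically: since DF(i) + DM(i) = 1 for every worker, a constant together with both dummies is perfectly collinear. The Python/numpy sketch below uses simulated wages (all numbers hypothetical, not the Norwegian sample) to show the rank deficiency and that (1) and (2) describe the same fitted model in two parametrizations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_f = n_m = 75
df = np.r_[np.ones(n_f), np.zeros(n_m)]          # DF(i): 1 for women
dm = 1.0 - df                                     # DM(i): 1 for men
educ = rng.integers(9, 19, size=n_f + n_m).astype(float)
# Hypothetical wage equation; parameter values are illustrative only.
wage = 120.0 - 40.0 * df + 3.0 * educ + rng.normal(0.0, 50.0, size=df.size)

ones = np.ones_like(df)
X_trap = np.column_stack([ones, df, dm, educ])   # intercept + both dummies
X1 = np.column_stack([df, dm, educ])             # specification (1)
X2 = np.column_stack([ones, df, educ])           # specification (2)

rank_trap = np.linalg.matrix_rank(X_trap)        # 3, not 4, because ones = df + dm
b1, *_ = np.linalg.lstsq(X1, wage, rcond=None)   # (alpha_F, alpha_M, beta_1)
b2, *_ = np.linalg.lstsq(X2, wage, rcond=None)   # (alpha_0, alpha_1, beta_1)

print("rank with intercept and both dummies:", rank_trap)
print("spec (1):", b1.round(4), " spec (2):", b2.round(4))
```

The fitted values from the two specifications are identical, and the coefficients map into each other as $\alpha_M = \alpha_0$ and $\alpha_F = \alpha_0 + \alpha_1$, with the same education coefficient.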
Regression equation (2) includes an intercept term, but excludes the dummy variable for men, DM(i). The results of these regressions applied to our data set are shown in output (1) and output (2).

Question 1. Clarify the relations between the parameters $\alpha_F$, $\alpha_M$ in regression equation (1) and the parameters $\alpha_0$, $\alpha_1$ in regression equation (2). Discuss the empirical results as they are shown in output (1) and output (2).

Question 2. In regression analyses where dummy variables enter the set of explanatory variables, one usually prefers specification (2) to specification (1). Explain why.

To obtain a more complete picture of the importance of gender for wages, we have also run the regressions:

(3) $$W_i = \alpha_0 + \alpha_1 DF(i) + \beta_1 E_i + \beta_2 \big(DF(i) \cdot E_i\big) + \varepsilon_i, \quad i = 1, 2, \ldots, 150$$

where the variable $DF(i) \cdot E_i$ is the product of the variables DF(i) and $E_i$, and

(4) $$W_i = \alpha_0 + \beta_1 E_i + \varepsilon_i, \quad i = 1, 2, \ldots, 150$$

The results of these regressions are shown in output (3) and output (4).

Question 3. Based on the results of these regressions, do you agree or disagree with the assertion that women are underpaid in the Norwegian labour market? Substantiate your answer.

Question 4. Use regression (2) (output (2)) to calculate the annual income of a man with 10 years of education who works 1800 hours per year. Calculate the standard error of the estimate as a measure of its uncertainty. You are informed that the covariance between $\hat\alpha_0$ and $\hat\beta_1$ is $-29.082$.

Question 5. Assume that you exchange the roles of the dummy variables DF(i) and DM(i) in specification (2), so that instead of (2) you run the regression:

(2*) $$W_i = \alpha_0 + \alpha_1 DM(i) + \beta_1 E_i + \varepsilon_i$$

How do you think the output of regression (2*) will compare with the output of regression (2) (output (2))?

In the outputs below, timelønn is the hourly wage $W_i$.

Output 1 (Regression (1))
                 Coefficient   Std.Error   t-value   t-prob
DF                   82.3653       19.28      4.27    0.000
DM                   124.549       19.80      6.29    0.000
E                    2.65070       1.535      1.73    0.086
sigma 49.8728    R^2 0.174338    RSS 365632.519    F(2,147) = 15.52 [0.000]**
no. of observations 150    no. of parameters 3    mean(timelønn) 135.707    var(timelønn) 2952.24

Output 2 (Regression (2))
                 Coefficient   Std.Error   t-value   t-prob
Constant             124.549       19.80      6.29    0.000
DF                  -42.1839       8.163     -5.17    0.000
E                    2.65070       1.535      1.73    0.086
sigma 49.8728    R^2 0.174338    RSS 365632.519    F(2,147) = 15.52 [0.000]**
no. of observations 150    no. of parameters 3    mean(timelønn) 135.707    var(timelønn) 2952.24

Output 3 (Regression (3))
                 Coefficient   Std.Error   t-value   t-prob
Constant             129.266       25.03      5.16    0.000
DF                  -54.0309       39.13     -1.38    0.169
E                    2.26865       1.973      1.15    0.252
DF*E                0.976874       3.155     0.310    0.757
sigma 50.0269    R^2 0.17488    RSS 365392.546    F(3,146) = 10.31 [0.000]**
no. of observations 150    no. of parameters 4    mean(timelønn) 135.707    var(timelønn) 2952.24

Output 4 (Regression (4))
                 Coefficient   Std.Error   t-value   t-prob
Constant             96.9259       20.66      4.69    0.000
E                    3.18752       1.659      1.92    0.057
sigma 54.0306    R^2 0.0243395    RSS 432057.218    F(1,148) = 3.692 [0.057]
no. of observations 150    no. of parameters 2    mean(timelønn) 135.707    var(timelønn) 2952.24

Problem set 9. Households' expenditures on food.

Problem set 10. Exercises with dummy variables. Solve exercises 9.6, using the data file tuna.xls, and 9.8.

Problem set 11. This is an exercise with non-linear models. We are interested in investigating how households' expenditures on food vary with their income. We have observations on the households' expenditures on food (Y) and incomes (R) for 50 Norwegian households; both variables are measured in 1000 kroner. In addition, we have observations on the number of members in each household.

For a given income R, we assume that the expected value of Y, denoted $\tilde{Y}$, is given by the function:

(1) $$\tilde{Y} = \frac{\alpha R}{\beta + R}$$

where α and β denote unknown, positive parameters.

(a) Discuss whether, in your opinion, function (1) gives a good description of the relation between expenditures on food and income.

Since (1) is non-linear in the parameters, we are unable to estimate the parameters α and β by ordinary OLS regression.
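Low-order polynomials can stand in for (1) because, writing $x = R/\beta$, $\alpha R/(\beta + R) = (\alpha/\beta) R/(1 + x) = (\alpha/\beta) R - (\alpha/\beta^2) R^2 + \ldots$ for $R < \beta$, and each truncation is linear in its coefficients. The Python/numpy sketch below compares (1) with its first- and second-order expansions around R = 0 for the illustrative values α = 60 and β = 170; these values are hypothetical, not estimates.

```python
import numpy as np

# Illustrative parameter values (hypothetical, not estimated from the data).
alpha, beta = 60.0, 170.0
R = np.linspace(10.0, 150.0, 8)                  # incomes in 1000 kroner, all below beta

exact = alpha * R / (beta + R)                   # expenditure function (1)
linear = (alpha / beta) * R                      # first-order expansion around R = 0
quadratic = linear - (alpha / beta**2) * R**2    # second-order expansion

err_lin = np.abs(exact - linear)
err_quad = np.abs(exact - quadratic)
for r, e, el, eq in zip(R, exact, err_lin, err_quad):
    print(f"R={r:6.1f}  exact={e:6.2f}  |lin err|={el:5.2f}  |quad err|={eq:5.2f}")
```

Algebraically the quadratic error equals R/β times the linear error, so the quadratic approximation is better for every R < β, while both deteriorate as income grows towards β.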
We shall therefore approximate (1), first with a linear and then with a quadratic function. The linear approximation of $Y_i$ is given by:

(2) $$Y_i = \beta_1 R_i + \varepsilon_i, \quad i = 1, 2, \ldots, 50$$

where $\beta_1 = \alpha/\beta$ and $\varepsilon_i$ denotes random disturbances.

(b) Show that the OLS estimator of $\beta_1$ is given by:

(3) $$\hat\beta_1 = \frac{\sum_{i=1}^{50} Y_i R_i}{\sum_{i=1}^{50} R_i^2}$$

Out-print 1 shows the results of OLS applied to regression (2), together with the histogram of the residuals $\hat\varepsilon_i$.

(c) Comment on this regression, and calculate the Jarque-Bera statistic when the skewness is 0.6332 and the kurtosis is 4.5288.

To improve the approximation of $\tilde{Y}$, we now approximate (1) by a quadratic function. The quadratic approximation of $Y_i$ is given by:

(4) $$Y_i = \beta_1 R_i + \beta_2 R_i^2 + \varepsilon_i, \quad i = 1, 2, \ldots, 50$$

where $\beta_2 = -\alpha/\beta^2$ and $\varepsilon_i$ are random disturbances.

Out-print 2 shows the results of OLS applied to regression (4), together with the histogram of the residuals $\hat\varepsilon_i$. The variable RR in the out-print corresponds to $R_i^2$ in regression equation (4).

(d) Do you think that the results as they appear in this out-print are reasonable? Substantiate your answer.

(e) Explain how you will use these results to deduce estimates of α and β, and calculate $\hat\alpha$ and $\hat\beta$.

(f) Use the expenditure function (1) to derive the Engel elasticity. Use your results above to calculate this elasticity for a household with an income of 100 000 kroner (R = 100).

We suspect that the parameter $\beta_1$ in equation (4) depends on the number of members in the household, S, but we are uncertain how to specify this dependency. There are two proposals:

(5) $$\beta_1 = \gamma_0 + \gamma_1 S$$

or

(6) $$\beta_1 = \gamma_0 + \gamma_1 S + \omega$$

where ω denotes a random disturbance satisfying the usual conditions. Out-print 3 shows the results of the regression:

(7) $$Y_i = \gamma_0 R_i + \gamma_1 (RS)_i + \beta_2 R_i^2 + u_i, \quad i = 1, 2, \ldots, 50$$

where $u_i$ denotes the random disturbances in this regression.

(g) Comment on this out-print.

(h) Explain your opinion about choosing (5) or (6).
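The arithmetic behind (c), (e) and (f) can be sketched in Python as follows. It assumes the standard Jarque-Bera formula $JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right]$, compared with the 5% critical value 5.991 of $\chi^2(2)$, and inverts the approximation relations $\beta_1 = \alpha/\beta$ and $\beta_2 = -\alpha/\beta^2$ behind (4). The coefficient values are those reported for R and RR in Out-print 2, and the elasticity $\beta/(\beta + R)$ follows from differentiating (1).

```python
n = 50
skew, kurt = 0.6332, 4.5288
# Jarque-Bera statistic; approximately chi-squared(2) under normal disturbances.
jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
chi2_2_crit_5pct = 5.991

# Coefficients on R and RR from Out-print 2 (quadratic approximation (4)).
b1_hat, b2_hat = 0.371463, -0.00213237
# Invert b1 = alpha / beta and b2 = -alpha / beta**2.
beta_hat = -b1_hat / b2_hat
alpha_hat = b1_hat * beta_hat

def engel_elasticity(income, beta):
    """Elasticity d ln(Y~) / d ln(R) = beta / (beta + R) implied by (1)."""
    return beta / (beta + income)

elasticity_100 = engel_elasticity(100.0, beta_hat)   # R measured in 1000 kroner
print(f"JB = {jb:.2f} (5% critical value {chi2_2_crit_5pct})")
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"Engel elasticity at R = 100: {elasticity_100:.3f}")
```

The estimates of α and β rest on the approximation in (4), so they inherit its approximation error as well as the sampling uncertainty in Out-print 2.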
Out-print 1
EQ(1) Modelling Y by OLS-CS (using data.eksoppg.h2003.xls)
The estimation sample is: 1 to 50
                 Coefficient   Std.Error   t-value   t-prob
R                   0.233338     0.01760      13.3    0.000
sigma 6.55912    RSS 2108.08312
no. of observations 50    no. of parameters 1    mean(Y) 12.5025    var(Y) 37.0663
[Figure: density histogram of the residuals r:Y, with the N(0,1) density for comparison]

Out-print 2
EQ(2) Modelling Y by OLS-CS (using data.eksoppg.h2003.xls)
The estimation sample is: 1 to 50
                 Coefficient   Std.Error   t-value   t-prob
R                   0.371463     0.05604      6.63    0.000
RR               -0.00213237   0.0008260     -2.58    0.013
sigma 6.20997    RSS 1851.05693
no. of observations 50    no. of parameters 2    mean(Y) 12.5025    var(Y) 37.0663
[Figure: density histogram of the residuals r:Y, with the N(0,1) density for comparison]

Out-print 3
EQ(3) Modelling Y by OLS-CS (using data.eksoppg.h2003.xls)
The estimation sample is: 1 to 50
                 Coefficient   Std.Error   t-value   t-prob
R                   0.170144     0.06098      2.79    0.008
RS                 0.0623471     0.01248      5.00    0.000
RR               -0.00238358   0.0006765     -3.52    0.001
sigma 5.07192    RSS 1209.04536
no. of observations 50    no. of parameters 3    mean(Y) 12.5025    var(Y) 37.0663
[Figure: density histogram of the residuals r:Y, with the N(0,1) density for comparison]

Problem set 12. Regressions when the disturbances might be autocorrelated. Exercises 12.1, 12.5, 12.6 and 12.7.