Homework 4
Diego Garzon
dig5269
100 points

The following two problems will require a lot of calculations in STATA and will generate many pages of output. Here is how you should organize it. The first pages should contain your answers to all the questions, along with any key algebraic equations or explanations you need to use along the way. After that, include a printout of the output from the regressions you executed in support of your answers. Highlight any numbers in this output that you used in the first section. Last, include a copy of the DO file that contains the commands you asked STATA to execute. Be sure you organize these in a way that will be clear to the reader. YOU ARE ENCOURAGED TO SAVE PAPER BY PRINTING YOUR STATA OUTPUT “TWO-UP” AND DOUBLE-SIDED.

1) (50 points total) In this exercise we will study some determinants of housing prices. In particular, we’ll be studying whether being located in the same town as a university is important for the price of housing. Descriptions of the variables are provided with the dataset utown.dta. It’s always a good idea to make sure you understand what the different variables are before you begin. In this case, it is also especially important to note the units that the data are provided in.

a.) (10 points) We’ll start with some basic dummy variable regressions to test for differences between groups of houses:

1) Regress price on utown. Report the coefficients and p-values. Do homes in university towns have statistically different prices than homes in non-university towns?

Coeff.: 61.51   p-value: 0.00
Yes. With a p-value of 0.00 we reject the hypothesis of equal average prices, so homes in university towns have statistically different prices than homes in non-university towns. The coefficient of 61.51 means they are about $61,510 more expensive on average, since price is in units of $1000.

2) Regress price on fplace. Report the coefficients and p-values. Do homes with fireplaces have significantly different prices than homes without fireplaces?

Coeff.: 5.46   p-value: 0.041
Yes, at the 5% significance level. The p-value of 0.041 is below 0.05, so homes with fireplaces have significantly different prices than homes without fireplaces. The difference would not be significant at the 1% level.
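The two dummy-variable regressions in 1) and 2) can be sketched in STATA as follows. This is a minimal sketch, assuming utown.dta sits in the working directory and that price, utown and fplace are named as in the assignment text. The ttest line is an optional cross-check that is not part of the assignment.

```stata
* Sketch only: assumes utown.dta is in the current working directory
use utown.dta, clear

* In each regression, the Coef. on the dummy is the difference in mean price
* between the two groups, and P>|t| tests whether that difference is zero
regress price utown
regress price fplace

* Optional cross-check: the equal-variance two-sample t-test gives the same
* t-statistic (up to sign) as the dummy-variable regression on utown
ttest price, by(utown)
```

Comparing P>|t| with the chosen significance level (for example 0.05) gives the reject or fail-to-reject decision for each question.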
3) Run a final regression to test whether the effect of a fireplace is different for houses in university towns vs non-university towns. Report the coefficients and p-values. Can you reject the hypothesis of no interaction effect?

Variable        Coeff.   p-value
utown            64.27    0.00
fplace            8.64    0.001
utown_fplace     -5.24    0.150

No. The estimated interaction effect is negative (-5.24), but its p-value of 0.150 is above the usual significance levels, so we cannot reject the hypothesis of no interaction effect.

4) What is the average price of a home in a university town without a fireplace?

Using the regression in 3) with utown = 1 and fplace = 0 (so utown_fplace = 0):
Average price = _cons + 64.27 (in units of $1000), where _cons is the intercept of that regression.
The 64.27 on its own is the university-town premium over a non-university home without a fireplace, not the average price itself.

b.) (5 points) Now regress price on age and sqft. This is Model 1. To Model 1, add in a quadratic term for sqft. This is Model 2.

1) Calculate the F-statistic for the hypothesis test that sqft and sqft^2 jointly have no effect on price in Model 2. You can execute this test however you prefer, including using STATA’s built-in “test” command.

Correlation = 0.5947 for sqft and price
Correlation = 0.5962 for sqft^2 and price
These correlations show that both terms move with price, but they do not give the joint F-statistic. That comes from running the test command on sqft and sqft^2 after estimating Model 2.

c.) (15 points) Go back to Model 1 and modify it so that you can determine whether square feet has a different effect in university towns vs non-university towns. This is Model 3.

1) Is there a statistically significant difference in the effect of sqft in university vs. non-university towns?

There is a slight difference in the effect of sqft in university vs. non-university towns. Whether it is statistically significant is judged from the p-value on the interaction term.
Interaction of utown and sqft: -.101
Interaction of utown and sqft_2: -.0001
utown and sqft correlation: 0.0234
utown and sqft_2 correlation: 0.0259

2) A house is located in a non-university town. The owners build an extension that increases the square footage by 500 sqft. What is the change in the predicted value of the house? [NOTE: in the dataset, sqft is in units of 100 sqft, and prices are in units of $1000.]

A 500 sqft extension is an increase of 5 units of sqft, so the change in the predicted value is 5 x (coefficient on sqft) x $1000. The answer of $5000 given here corresponds to a sqft coefficient of 1, that is, $1000 per 100 sqft.

3) An identical house to the above is located in a university town. The owners build a 500 sqft extension.
What is the change in the predicted value of the house?

The change in the predicted value of the house is going to be close to the value predicted above because, as stated previously, there is only a slight difference in the effect of sqft in university vs. non-university towns. In Model 3 the change is 5 x (coefficient on sqft + coefficient on the utown-sqft interaction) x $1000.

d.) (20 points) In practice, price variables are often transformed with logarithms. This is for some practical reasons described in class. Generate a variable called lprice that is the natural log of the price. Run Model 3 with lprice in place of price. This is Model 4. Recall from class that the change in the log of a variable is approximately equal to the percentage change of the variable.

1) According to Model 4, if there are two identical homes, but one is twenty years older than the other, what is the approximate percentage difference in prices?

Model 4 has lprice as the dependent variable:
lprice = b0 + b_age*age + b_sqft*sqft + …
Compare two otherwise identical homes, one with age = 1 and one with age = 21:
lprice(age = 1) = b0 + b_age*1 + b_sqft*sqft + …
lprice(age = 21) = b0 + b_age*21 + b_sqft*sqft + …
The difference in lprice is 20*b_age, so the approximate percentage difference in prices is 100 x 20 x b_age percent, where b_age is the coefficient on age in Model 4.

2) What is the elasticity of price with respect to square footage, when a house is 2500 square feet in size and is not in a university town?

3) What is the elasticity of price with respect to square footage, when a house is 1500 square feet in size and is not in a university town?

4) What is the elasticity of price with respect to square footage, when a house is 2500 square feet in size and is in a university town?

[A NOTE ABOUT MODEL 4. If you are looking closely at the results, you’ll notice a puzzling thing. Compared to Model 3, the coefficient on the interaction term has flipped sign from positive to negative. Why? The only difference between the models is that we’re using log price instead of price as a dependent variable. Using the log of price introduces a non-linearity which seems to be important in this case. With a logarithm, the non-linearity is more pronounced for smaller values of the variable. If there is a difference in the effect of square footage for different types of homes in a college town (cheap rental homes vs.
expensive faculty homes) the non-linear transformation could reveal itself in that way. I’m just speculating on that; it is a strange thing to try and interpret. Which model is better? They have approximately the same $R^2$ (Model 3 is slightly higher), but Model 4 is more precise (higher t-stats) on three of the four variables. It is unclear to me which model is preferred.]

2.) (50 points total) Estimating Cost and Production Functions

The file butter.dta has data downloaded from the NBER-CES manufacturing industry database. The industry I have chosen is Creamery Butter (mmmm…. butter.) The five variable columns needed for this assignment are:

vadd = value added (i.e. output)
pay = total payroll
invest = total capital expenditure
matcost = total expenditure on raw materials
year = year of observation

b) (25 points total) Estimate the cost functions

The total cost function models the total cost of production as a function of output. We first need a measure of total cost, which we will take as the sum of labor, capital and materials costs. This isn’t perfect, but we will measure it as the sum of the three input costs that we have:

    gen tc=pay+invest+matcost

We will model total cost as a cubic function of output. Thus we must generate the squared and cubed elements of the polynomial:

    gen vadd_2=vadd^2
    gen vadd_3=vadd^3

and then estimate the cost function:

    reg tc vadd vadd_2 vadd_3, noc

Note the use of the noc (no constant) option. This forces the intercept term to equal zero.

1) (5 points) What is the economic interpretation of suppressing the constant term (forcing it equal to zero)? Is this a reasonable simplification?

Suppressing the constant forces predicted total cost to be zero when output is zero, which means assuming there are no fixed costs. If fixed costs actually exist, leaving out the intercept can bias the estimated slope coefficients, in the same way an omitted variable does. No, this is not a reasonable simplification.

2) (5 points) Write out the cost function as

$\widehat{TC} = \hat{\beta}_1\,vadd + \hat{\beta}_2\,vadd^2 + \hat{\beta}_3\,vadd^3$

(Use the numbers from the regression you just ran.)
TC = 20.25*vadd - .121*vadd^2 + .0002*vadd^3
(The unrounded coefficients from the regression output are 20.25275, -.1214211 and .0002366.)

3) (5 points) Now use your regression of TC to create predicted values of the average cost function and the marginal cost function. Here’s how you do this in STATA. For average cost this is straightforward.

    predict tcpred
    (This predicts the total cost for each level of vadd)
    gen ac=tcpred/vadd

For marginal cost you need to issue the command

    gen mc=a+2*b*vadd+3*c*vadd^2
    (Where a, b, and c are beta values you have taken from your TC function of part 2.)

[Note: Why is this the marginal cost? Calculus. We have taken the derivative of tc with respect to vadd, and the above formula is the result.]

Report these equations.

Average Cost = tcpred/vadd = 20.25 - .121*vadd + .0002*vadd^2
Marginal Cost = 20.25 + 2*(-.121)*vadd + 3*(.0002)*vadd^2 = 20.25 - .242*vadd + .0006*vadd^2

4) (10 points) Now we are going to graph the marginal and average cost functions.

    sort vadd
    scatter ac mc vadd, connect(l l)

[Note: The sort command lines the data up in the right order so that the graphs look nice. The scatter command will put ac and mc on the vertical axis and output (vadd) on the horizontal, just like in the textbooks, and the connect option will connect the dots (the l l stands for connecting with unbroken lines).]

Print out this graph. Does it look like the textbook depiction of average and marginal cost? Why or why not? That is, did the slope coefficients you estimated above conform to expectations?

They are similar to the textbook depiction of average and marginal cost. The estimated coefficients have the signs a textbook cubic cost function needs (positive on vadd, negative on vadd^2, positive on vadd^3), which give U-shaped average and marginal cost curves, so the slope coefficients conform to my expectations. More data would trace the curves out more clearly. One departure from the textbook picture is that the computed mc falls below zero over part of the output range (its minimum is -4.13 in the summarize output).

c) (25 points total) Estimate the production function

In this section you will estimate a Cobb-Douglas production function, where the log of output is a function of the log of inputs (and time). [Note: You can do the following without really knowing what’s going on with the economics.
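But to understand it, you’ll need to recall Econ 302 and its material on marginal productivity/returns to scale for Cobb-Douglas production functions.]

Generate log versions of the output variable and the three inputs. For example:

    gen lvadd=log(vadd)

and do the same for pay, invest and matcost. Run a regression of lvadd on the three log input variables, and year. Test the hypotheses that: (5 points each)

1) The individual coefficient on each of the inputs =0. Rejection implies that the input is productive.

For lpay (p-value 0.000) we reject the hypothesis, therefore labor is productive. For linvest (p-value 0.832) and lmatcost (p-value 0.117) we fail to reject the hypothesis at the 5% level, so the regression does not show that these inputs are productive.

2) The individual coefficient on each of the inputs =1. Rejection in the direction <1 implies diminishing marginal productivity of the input. [Note: This is asking for a one-sided test.]

lpay = 1.207
linvest = .018
lmatcost = -.253
(year = .019 is not an input; it is tested in 4.)

The test statistic is t = (coefficient - 1)/std. err., with 41 degrees of freedom and a one-sided 5% critical value of about -1.68.
linvest: t = (.0184 - 1)/.0861 = -11.41, reject in the direction <1.
lmatcost: t = (-.2529 - 1)/.1577 = -7.94, reject in the direction <1.
lpay: t = (1.2072 - 1)/.1708 = 1.21, fail to reject.
Therefore we reject the hypothesis for linvest and lmatcost, which implies diminishing marginal productivity for those inputs. lpay has a coefficient that is greater than 1, so there is no evidence in the direction <1 and we fail to reject the null hypothesis. The squares of these t-values match the F-statistics reported by the test command (130.10, 63.08 and 1.47).

3) The sum of the input coefficients=1. You may either do this with a by-hand F-test or use STATA’s test command (which is itself an F-test), and it’s even possible to do this with a t-test, if you really want to try. Rejection implies either increasing or decreasing returns to scale. Failure to reject means constant returns to scale.

The input coefficients sum to 1.207 + .018 - .253 = .973, which is approximately 1. This point estimate is consistent with constant returns to scale, although the formal F-test of the sum is needed to confirm that the null hypothesis is not rejected.

4) The coefficient of year =0. (Rejection (in the positive direction) implies technical progress is being made). [Once again, this is a one-sided test.]

The coefficient of year is .019, which is greater than 0, with t = 6.74 and a two-sided p-value of 0.000. We reject the null hypothesis in the positive direction, which implies that there is technical progress being made.

5) Broadly, do you think this production function conforms to the ideas of economic theory?

Partly. Labor (lpay) is productive, there is evidence of technical progress, and the input coefficients sum to roughly 1. However, capital (linvest) is not statistically significant, materials (lmatcost) has a negative and insignificant coefficient, and the labor coefficient above 1 does not show diminishing marginal productivity. So this production function conforms to the ideas of economic theory in some respects but not all.

Do file and results: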
. do "/var/folders/JP/JPx2zUbcH1Cf5wi6EddkxU+++TI/-Tmp-//SD15349.000000"

. //Do File for HW 4 problem 2
. summarize

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        year |        46      1980.5    13.42262       1958       2003
         pay |        46    46.00652    12.05072       26.3       73.6
     matcost |        46    1059.493     269.601      710.3     1610.8
        vadd |        46    153.9391    67.03739       52.4      329.6
      invest |        46    12.20435    8.926489        3.8         49
-------------+----------------------------------------------------------
          tc |        46    1117.704    275.1079      760.6       1658
      vadd_2 |        46    28093.57    24912.13    2745.76   108636.2
      vadd_3 |        46     5926210     7876054   143877.8   3.58e+07
      tcpred |        46    1108.627    174.0089   761.8911   1956.078
          ac |        46    8.207979    2.633691   4.677323   14.53991
-------------+----------------------------------------------------------
          mc |        46   -.1471265    3.345893   -4.13125   9.216656
       lvadd |        46    4.947083    .4294764   3.958907    5.79788
        lpay |        46    3.796396    .2564117   3.269569   4.298645
     linvest |        46    2.350803    .5025254   1.335001    3.89182
    lmatcost |        46    6.935213    .2466335   6.565687   7.384486

. //b
. //gen tc=pay+invest+matcost
. //gen vadd_2=vadd^2
. //gen vadd_3=vadd^3
. reg tc vadd vadd_2 vadd_3, noc

      Source |       SS       df       MS              Number of obs =      46
-------------+------------------------------           F(  3,    43) =  279.15
       Model |  57898991.1     3  19299663.7           Prob > F      =  0.0000
    Residual |  2972902.86    43  69137.2758           R-squared     =  0.9512
-------------+------------------------------           Adj R-squared =  0.9478
       Total |    60871894    46  1323302.04           Root MSE      =  262.94

------------------------------------------------------------------------------
          tc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        vadd |   20.25275    1.99598    10.15   0.000     16.22747    24.27803
      vadd_2 |  -.1214211    .020406    -5.95   0.000    -.1625738   -.0802684
      vadd_3 |   .0002366    .000048     4.93   0.000     .0001397    .0003335
------------------------------------------------------------------------------

. //b.3
. //predict tcpred
. //gen ac=tcpred/vadd
. //gen mc=20.25+2*-.121*vadd+3*.0002*vadd^2
. //b.4
. sort vadd
. //scatter ac mc vadd, connect(l l)
. //gen lvadd=log(vadd)
. //gen lpay=log(pay)
. //gen linvest=log(invest)
. //gen lmatcost=log(matcost)
. reg lvadd lpay linvest lmatcost year

      Source |       SS       df       MS              Number of obs =      46
-------------+------------------------------           F(  4,    41) =   31.76
       Model |  6.27505303     4  1.56876326           Prob > F      =  0.0000
    Residual |  2.02519469    41  .049394993           R-squared     =  0.7560
-------------+------------------------------           Adj R-squared =  0.7322
       Total |  8.30024772    45  .184449949           Root MSE      =  .22225

------------------------------------------------------------------------------
       lvadd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lpay |   1.207201   .1708107     7.07   0.000      .862242    1.552161
     linvest |   .0184269   .0860557     0.21   0.832    -.1553661    .1922199
    lmatcost |  -.2528764   .1577418    -1.60   0.117    -.5714424    .0656897
        year |   .0189958   .0028168     6.74   0.000     .0133071    .0246844
       _cons |   -35.5466   5.244685    -6.78   0.000    -46.13845   -24.95474
------------------------------------------------------------------------------

. test lpay==1

 ( 1)  lpay = 1

       F(  1,    41) =    1.47
            Prob > F =    0.2321

. test linvest==1

 ( 1)  linvest = 1

       F(  1,    41) =  130.10
            Prob > F =    0.0000

. test lmatcost==1

 ( 1)  lmatcost = 1

       F(  1,    41) =   63.08
            Prob > F =    0.0000

. test year==1

 ( 1)  year = 1

       F(  1,    41) = 1.2e+05
            Prob > F =    0.0000

end of do-file

//Do File for HW 4 problem 2
summarize
//b
//gen tc=pay+invest+matcost
//gen vadd_2=vadd^2
//gen vadd_3=vadd^3
reg tc vadd vadd_2 vadd_3, noc
//b.3
//predict tcpred
//gen ac=tcpred/vadd
//gen mc=20.25+2*-.121*vadd+3*.0002*vadd^2
//b.4
sort vadd
//scatter ac mc vadd, connect(l l)
//c
//gen lvadd=log(vadd)
//gen lpay=log(pay)
//gen linvest=log(invest)
//gen lmatcost=log(matcost)
reg lvadd lpay linvest lmatcost year
test lpay==1
test linvest==1
test lmatcost==1
test year==1
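The do-file computes mc from hand-typed, rounded coefficients (20.25, -.121, .0002). Rounding .0002366 to .0002 changes the cubic coefficient by roughly 15%, which matters once it is multiplied by vadd^2. The following is a minimal sketch of an alternative that reads the coefficients stored by regress instead of retyping them. The names tcpred_full, ac_full and mc_full are hypothetical, chosen so they do not collide with variables already in the dataset.

```stata
* Sketch only: assumes tc, vadd_2 and vadd_3 already exist, as in the do-file
regress tc vadd vadd_2 vadd_3, noconstant

* Average cost from the fitted values (hypothetical variable names)
predict tcpred_full
generate ac_full = tcpred_full / vadd

* Marginal cost: derivative of the fitted cubic, using the stored coefficients
* _b[...] at full precision instead of rounded values typed by hand
generate mc_full = _b[vadd] + 2*_b[vadd_2]*vadd + 3*_b[vadd_3]*vadd^2

sort vadd
scatter ac_full mc_full vadd, connect(l l)
```

Because _b[] is read straight from the last estimation, the curves stay consistent with the regression if the model is re-estimated on different data.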
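The test commands in the do-file are two-sided and do not cover the sum of the input coefficients. The following is a minimal sketch of how the one-sided tests and the returns-to-scale test in part c could be run after the production-function regression. It assumes the log variables exist as generated in the do-file, and the scalar names t_lpay, t_linvest, t_lmatcost and t_year are hypothetical.

```stata
* Sketch only: assumes lvadd, lpay, linvest and lmatcost already exist
regress lvadd lpay linvest lmatcost year

* One-sided test of H0: b = 1 against H1: b < 1 for each input.
* ttail(df, t) is the upper-tail probability P(T > t), so the lower-tail
* p-value needed for H1: b < 1 is 1 - ttail(df, t).
foreach v in lpay linvest lmatcost {
    scalar t_`v' = (_b[`v'] - 1) / _se[`v']
    display "`v': t = " t_`v' "   one-sided p (b < 1) = " 1 - ttail(e(df_r), t_`v')
}

* Returns to scale: H0 is that the three input coefficients sum to 1
test lpay + linvest + lmatcost = 1

* Technical progress: one-sided test of H0: b_year = 0 against H1: b_year > 0
scalar t_year = _b[year] / _se[year]
display "year: t = " t_year "   one-sided p (b > 0) = " ttail(e(df_r), t_year)
```

A small one-sided p-value for an input points to diminishing marginal productivity, and a large p-value from the sum test is consistent with constant returns to scale.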
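No do-file for Problem 1 appears above. The following is a minimal sketch of commands that could produce Models 1 through 4 and the quantities the questions ask about. It rests on several assumptions: utown.dta is in the working directory, Model 3 is read as Model 1 plus utown and a utown-by-sqft interaction, and utown_sqft is a hypothetical variable name. The elasticity lines use the fact that, in a model with log price and sqft in levels, the elasticity is the sqft effect times sqft, with sqft measured in units of 100 sqft.

```stata
* Sketch only: the Model 3 specification and the name utown_sqft are assumptions
use utown.dta, clear

* Model 1 and Model 2, then the joint F-test on sqft and sqft^2
regress price age sqft
generate sqft_2 = sqft^2
regress price age sqft sqft_2
test sqft sqft_2

* Model 3: let the effect of sqft differ in university towns
generate utown_sqft = utown*sqft
regress price age sqft utown utown_sqft
* Change in predicted price (in $1000) from 500 more sqft, i.e. 5 units of sqft
display 5*_b[sqft]
display 5*(_b[sqft] + _b[utown_sqft])

* Model 4: log price as the dependent variable
generate lprice = ln(price)
regress lprice age sqft utown utown_sqft
* Approximate percentage price difference for a home 20 years older
display 100*20*_b[age]
* Elasticity of price with respect to sqft at 2500 and 1500 sqft, non-university town
display 25*_b[sqft]
display 15*_b[sqft]
* Elasticity at 2500 sqft in a university town
display 25*(_b[sqft] + _b[utown_sqft])
```

Each display line has to run directly after the regression it refers to, because _b[] always reads from the most recent estimation.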