Homework 4
Diego Garzon
dig5269
100 points
The following two problems will require a lot of calculations. They will generate many pages of output.
Here is how you should organize it. The first pages should contain your answers to all the questions,
along with showing any key algebraic equations or explanations you need to use along the way.
After that, include a printout of the output from the regressions you executed in support of your answers.
Highlight any numbers in this output that you used in the first section. Last, include a copy of the DO file
that contains the commands you asked STATA to execute. Be sure you organize these in a way that will
be clear to the reader. YOU ARE ENCOURAGED TO SAVE PAPER BY PRINTING YOUR STATA
OUTPUT “TWO-UP” AND DOUBLE-SIDED.
1) (50 points total) In this exercise we will study some determinants of housing prices. In
particular, we’ll be studying whether being located in the same town as a university is important
for the price of housing. Descriptions of the variables are provided with the dataset
utown.dta. It’s always a good idea to make sure you understand what the different variables
are before you begin. In this case, it is also especially important to note the units that the
data is provided in.
a.) (10 points) We’ll start with some basic dummy variable regressions to test for differences
between groups of houses:
1) Regress price on utown. Report the coefficients and p-values. Do homes in university
towns have statistically different prices than homes in non-university towns?
Coeff.: 61.51 p-value: 0.00
Yes. The coefficient on utown is positive and its p-value is below 0.05, so homes in
university towns have statistically different (higher) prices than homes in non-university towns.
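2) Regress price on fplace. Report the coefficients and p-values. Do homes with
fireplaces have significantly different prices than homes without fireplaces?
Coeff.: 5.46 p-value: 0.041
Yes, at the 5% significance level: the p-value of 0.041 is below 0.05, so homes with
fireplaces have significantly different prices than homes without fireplaces. The
difference is not significant at the 1% level.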
3) Run a final regression to test whether the effect of a fireplace is different for houses in
university towns vs non-university towns. Report the coefficients and p-values. Can
you reject the hypothesis of no interaction effect?
                Coeff.    p-value
Utown:           64.27      0.000
Fplace:           8.64      0.001
Utown_fplace:    -5.24      0.150
No, you can’t reject the hypothesis of no interaction effect. The interaction coefficient is
negative, but its p-value of 0.150 is above 0.05, so it is not statistically different from zero.
4) What is the average price of a home in a university town without a fireplace?
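Average price = E(price | utown = 1, fplace = 0) = _cons + coefficient on utown
Average price = _cons + 64.27 (in units of $1000), where _cons is the intercept from the
regression in part 3. The 64.27 on its own is the price difference relative to a
non-university home without a fireplace.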
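The regressions in part a can be sketched in STATA as follows. This is only an illustration under assumptions: it supposes utown.dta is in the working directory, that utown and fplace are 0/1 dummies, and that no variable called utown_fplace exists yet. The lincom line is one way to get the predicted mean price for a university-town home without a fireplace, since it adds the intercept to the utown coefficient.

```stata
* Sketch only: assumes utown.dta is in the working directory and that
* utown and fplace are 0/1 dummy variables.
use utown.dta, clear

regress price utown             // part 1: university-town differential
regress price fplace            // part 2: fireplace differential

* Part 3: the interaction term lets the fireplace effect differ by town type.
generate utown_fplace = utown * fplace
regress price utown fplace utown_fplace

* Part 4: predicted mean price when utown = 1 and fplace = 0.
lincom _cons + utown
```

The p-value on utown_fplace in the last regression is the test of no interaction effect.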
b.) (5 points) Now regress price on age and sqft. This is Model 1. To Model 1, add in a
quadratic term for sqft. This is Model 2.
1) Calculate the F-statistic for the hypothesis test that sqft and sqft^2 jointly have no
effect on price in Model 2. You can execute this test however you prefer,
including using STATA’s built-in “test” command.
Correlation=0.5947 for sqft and price
Correlation=0.5962 for sqft^2 and price
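The question in part b asks for a joint F-statistic rather than correlations. A minimal sketch of how that test could be run with STATA's built-in test command is given here. It assumes utown.dta is loaded from the working directory and that sqft_2 has not been generated yet; the variable name sqft_2 follows the naming used elsewhere in these answers.

```stata
* Sketch only: assumes utown.dta is in the working directory.
use utown.dta, clear
generate sqft_2 = sqft^2

regress price age sqft              // Model 1
regress price age sqft sqft_2       // Model 2

* Joint test of H0: coefficient on sqft = 0 and coefficient on sqft_2 = 0.
* The output reports the F-statistic and its p-value.
test sqft sqft_2
```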
c.) (15 points) Go back to Model 1 and modify it so that you can determine whether square
feet has a different effect in university towns vs non-university. This is Model 3.
1) Is there a statistically significant difference in the effect of sqft in university vs.
non-university towns?
There is a slight difference in the effect of sqft in university vs. non-university towns.
Utown and sqft: -.101
Utown and sqft_2: -.0001
Utown and sqft correlation: 0.0234
Utown and sqft_2 correlation: 0.0259
2) A house is located in a non-university town. The owners build an extension that
increases the square footage by 500 sqft. What is the change in the predicted
value of the house? [NOTE: in the dataset, sqft is in units of 100 sqft, and that
prices are in units of $1000.]
Setting up the proportion 500/x = 100/1000 gives x = 5000, so the change in the
predicted value of the house is $5,000.
3) An identical house to the above is located in a university town. The owners build
a 500 sqft extension. What is the change in the predicted value of the house?
The change in the predicted value of the house is going to be close to the
above predicted value because, as stated previously, there is a slight
difference in the effect of sqft on university vs. non-university towns.
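One way to set up Model 3 and read off the answers to parts 1 to 3 is sketched below. It is an assumption that Model 3 takes the form price on age, sqft, utown and a utown*sqft interaction; the name utown_sqft is hypothetical. Because sqft is in units of 100 sqft, a 500 sqft extension is a 5-unit change, so the predicted change in price (in $1000s) is 5 times the relevant slope.

```stata
* Sketch only: assumes utown.dta is in the working directory.
use utown.dta, clear
generate utown_sqft = utown * sqft

* Model 3 (one possible specification): the slope on sqft may differ by town type.
regress price age sqft utown utown_sqft

* Part 1: the t-statistic and p-value on utown_sqft test for a different sqft effect.

* Part 2: non-university town, 500 sqft = 5 units of sqft.
lincom 5*sqft

* Part 3: university town, the slope is sqft + utown_sqft.
lincom 5*sqft + 5*utown_sqft
```

Each lincom estimate is in units of $1000 and comes with a standard error and confidence interval.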
d.) (20 points) In practice, price variables are often transformed with logarithms. This is for
some practical reasons described in class. Generate a variable called lprice that is the
natural log of the price. Run Model 3 with lprice in place of price. This is Model 4.
Recall from class that the change in a log of a variable is approximately equal to the
percentage change of the variable.
1) According to Model 4, if there are two identical homes, but one is twenty years
older than the other, what is the approximate percentage difference in prices?
Utown= age + lprice + sqft +sqft_2 +…
Utown=1+lprice+sqft+sqft_2+…
Utown=21+lprice+sqft+sqft_2+…
2) What is the elasticity of price with respect to square footage, when a house is
2500 square feet in size and is not in a university town?
3) What is the elasticity of price with respect to square footage, when a house is
1500 square feet in size and is not in a university town?
4) What is the elasticity of price with respect to square footage, when a house is
2500 square feet in size and is in a university town?
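The four questions in part d all follow from the same log-linear algebra, sketched below. The symbols are assumptions made for illustration: Model 4 is taken to have the form shown in the first line, with the betas standing for whatever coefficients the regression produces. No numerical values are filled in because the Model 4 output is not reproduced in this write-up. Recall that sqft is in units of 100 sqft, so 2500 square feet is sqft = 25 and 1500 square feet is sqft = 15.

```latex
\begin{align*}
\ln(price) &= \beta_0 + \beta_1\, age + \beta_2\, sqft + \beta_3\, utown
              + \beta_4\,(utown \times sqft) + u \\[4pt]
\%\Delta price &\approx 100 \cdot \beta_1 \cdot \Delta age
              = 100 \cdot 20 \cdot \beta_1 \quad \text{(twenty-year age gap)} \\[4pt]
\varepsilon &= \frac{\partial \ln(price)}{\partial \ln(sqft)}
             = \frac{\partial \ln(price)}{\partial sqft} \cdot sqft
             = (\beta_2 + \beta_4\, utown) \cdot sqft \\[4pt]
\varepsilon_{2500,\ utown=0} &= 25\,\beta_2, \qquad
\varepsilon_{1500,\ utown=0} = 15\,\beta_2, \qquad
\varepsilon_{2500,\ utown=1} = 25\,(\beta_2 + \beta_4)
\end{align*}
```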
[A NOTE ABOUT MODEL 4. If you are looking closely at the results, you’ll notice a
puzzling thing. Compared to Model 3, the coefficient on the interaction term has flipped
sign from positive to negative. Why? The only difference between the models is that
we’re using log price instead of price as a dependent variable. Using the log of price
introduces a non-linearity which seems to be important in this case. With a logarithm,
the non-linearity is more pronounced for smaller values of the variable. If there is a
difference in the effect of square footage for different types of homes in a college town
(cheap rental homes vs. expensive faculty homes) the non-linear transformation could
reveal itself in that way. I’m just speculating on that, it is a strange thing to try and
interpret. Which model is better? They have approximately the same R2 terms (Model 3
is slightly higher), but Model 4 is more precise (higher t-stats) on three of the four
variables. It is unclear to me which model is preferred.]
2.) (50 points total) Estimating Cost and Production Functions
The file butter.dta has data downloaded from the NBER-CES manufacturing industry database.
The industry I have chosen is Creamery Butter (mmmm…. butter.)
Additionally, the five variable columns needed for this assignment are
vadd= value added (i.e. output)
pay= total payroll
invest=total capital expenditure
matcost=total expenditure on raw materials
year= year of observation
b) (25 points total) Estimate the cost functions
The total cost function models the total cost of production as a function of output. We need first
a measure of total cost, which we will take as the sum of labor, capital and materials costs. This
isn’t perfect, but we will measure this as the sum of the three input costs that we have:
gen tc=pay+invest+matcost
We will model total cost as a cubic function of output. Thus we must generate the squared and
cubed elements of the polynomial:
gen vadd_2=vadd^2
gen vadd_3=vadd^3
and then estimate the cost function:
reg tc vadd vadd_2 vadd_3, noc
Note the use of the noc (no constant) option. This forces the intercept term to equal zero.
1) (5 points) What is the economic interpretation of suppressing the constant term (forcing it
to equal zero)? Is this a reasonable simplification?
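Suppressing the constant forces predicted total cost to be zero when output is zero, that
is, it assumes there are no fixed costs. No, this is not a reasonable simplification: if
fixed costs exist, leaving out the intercept biases the estimated slope coefficients, in the
same way as omitted variable bias.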
2) (5 points) Write out the cost function as $\widehat{TC} = \hat{\beta}_1 vadd + \hat{\beta}_2 vadd^2 + \hat{\beta}_3 vadd^3$ (Use the
numbers from the regression you just ran.)
TC = 20.25*vadd - 0.121*vadd^2 + 0.0002*vadd^3
3) (5 points) Now use your regression of TC to create predicted values of the average cost
function and the marginal cost function.
Here’s how you do this in STATA. For average cost this is straightforward.
predict tcpred (This predicts the total cost for each level of vadd)
gen ac=tcpred/vadd
For marginal cost you need to issue the command
gen mc=a+2*b*vadd+3*c*vadd^2
(Where a, b, and c are beta values you have taken from your TC function of part 2.)
[Note: Why is this the marginal cost? Calculus. We have taken the derivative of tc with
respect to vadd, and the above formula is the result.]
Report these equations.
Average Cost = tcpred/vadd = 20.25 - 0.121*vadd + 0.0002*vadd^2
Marginal Cost = 20.25 + 2*(-0.121)*vadd + 3*(0.0002)*vadd^2 = 20.25 - 0.242*vadd + 0.0006*vadd^2
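The marginal cost formula above uses hand-rounded coefficients. In the regression output the cubic coefficient is .0002366, so rounding it to .0002 shrinks that term by roughly 15 percent. A minimal sketch of an alternative is to let STATA supply the stored coefficients through _b[varname] right after the regression. It assumes butter.dta is loaded and that tc, vadd_2, vadd_3 and mc already exist; the name mc_full is hypothetical.

```stata
* Sketch only: assumes tc, vadd_2, vadd_3 and mc already exist in memory.
regress tc vadd vadd_2 vadd_3, noconstant

* _b[varname] holds the full-precision coefficient from the last regression,
* so nothing has to be retyped or rounded by hand.
generate mc_full = _b[vadd] + 2*_b[vadd_2]*vadd + 3*_b[vadd_3]*vadd^2

* Compare the hand-rounded series with the full-precision one.
summarize mc mc_full
```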
4) (10 points) Now we are going to graph the marginal and average cost functions.
sort vadd
scatter ac mc vadd, connect(l l)
[Note: The sort command lines the data up in the right order so that the graphs look nice.
The scatter command will put ac and mc on the vertical axis and output (vadd) on the
horizontal, just like in the textbooks, and the connect option will connect the dots (the l l
stands for connecting with unbroken lines).]
Print out this graph. Does it look like the textbook depiction of average and marginal
cost? Why or why not? That is, did your slope coefficients you estimated above conform
to expectations?
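Yes, they are similar to the textbook depiction of average and marginal cost. The slope
coefficients conform to expectations: the coefficient on vadd is positive, the one on vadd^2
is negative, and the one on vadd^3 is positive, which is the sign pattern that produces
U-shaped average and marginal cost curves. One departure from the textbook picture is that
the computed marginal cost is negative over part of the output range (its minimum in the
summary table is -4.13).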
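A quick check of the textbook shape is to find where average cost is minimized and confirm that marginal cost equals average cost there. The sketch below uses the coefficients copied by hand from the regression output; the scalar names b1, b2, b3 and q_minac are hypothetical. With no constant term, AC = b1 + b2*q + b3*q^2, which is minimized where b2 + 2*b3*q = 0.

```stata
* Sketch only: coefficients copied by hand from the cost regression output.
scalar b1 = 20.25275
scalar b2 = -.1214211
scalar b3 = .0002366

* Output level that minimizes average cost: dAC/dq = b2 + 2*b3*q = 0.
scalar q_minac = -b2 / (2*b3)
display "AC is minimized at vadd = " q_minac

* At that output level the two curves should cross (MC = AC).
display "AC at the minimum = " (b1 + b2*q_minac + b3*q_minac^2)
display "MC at the minimum = " (b1 + 2*b2*q_minac + 3*b3*q_minac^2)
```

The minimizing output should fall inside the observed range of vadd (52.4 to 329.6), and the average cost there should be close to the minimum of ac reported in the summary table (4.677).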
c) (25 points total) Estimate the production function
In this section you will estimate a Cobb-Douglas production function, where the log of output is
a function of the log of inputs (and time).
[Note: You can do the following without really knowing what’s going on with the
economics. But to understand it, you’ll need to recall Econ 302 and its material on
marginal productivity/ returns to scale for Cobb-Douglas production functions.]
Generate log versions of the output variable and the three inputs. For example:
gen lvadd=log(vadd) and do the same for pay, invest and matcost.
Run a regression of lvadd on the three log input variables, and year. Test the hypotheses that:
(5 points each)
1) The individual coefficient on each of the inputs =0. Rejection implies that the input is
productive.
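We reject the hypothesis for lpay (t = 7.07, p-value 0.000), therefore labor is a productive
input. We fail to reject it for linvest (t = 0.21, p-value 0.832) and for lmatcost (t = -1.60,
p-value 0.117), so the regression does not show that those two inputs are productive.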
2) The individual coefficient on each of the inputs =1. Rejection in the direction <1
implies diminishing marginal productivity of the input. [Note: This is asking for a
one-sided test.]
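Lpay = 1.207
Linvest = 0.018
Lmatcost = -0.253
Year = 0.019
Therefore we reject the hypothesis for linvest (F = 130.10, p-value 0.0000), lmatcost
(F = 63.08, p-value 0.0000) and year (F = 1.2e+05, p-value 0.0000) because their
coefficients are significantly lower than 1. For the two inputs this implies diminishing
marginal productivity. Lpay has a coefficient that is greater than 1 (F = 1.47, p-value
0.2321), therefore we fail to reject the null hypothesis in the direction <1.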
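The test command reports two-sided F-tests, while this part asks for one-sided tests. A minimal sketch of the one-sided version, computed from the coefficients and standard errors in the regression table, is shown here. The scalar names are hypothetical, and 41 is the residual degrees of freedom from the output. As a consistency check, the squares of these t-statistics should reproduce the reported F-statistics (1.47, 130.10 and 63.08).

```stata
* Sketch only: coefficients and standard errors are copied by hand from the
* regression table in the log; 41 is the residual degrees of freedom.
scalar t_lpay     = (1.207201 - 1) / .1708107
scalar t_linvest  = (.0184269 - 1) / .0860557
scalar t_lmatcost = (-.2528764 - 1) / .1577418

* H1: coefficient < 1, so the p-value is the lower tail, P(T < t).
* ttail(df, t) returns the upper tail, hence 1 - ttail(df, t).
display "lpay:     t = " t_lpay     "  one-sided p = " (1 - ttail(41, t_lpay))
display "linvest:  t = " t_linvest  "  one-sided p = " (1 - ttail(41, t_linvest))
display "lmatcost: t = " t_lmatcost "  one-sided p = " (1 - ttail(41, t_lmatcost))
```

A positive t-statistic, as for lpay, can never lead to rejection in the direction <1.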
3) The sum of the input coefficients=1. You may either do this with a by-hand F-test or
use STATA’s test command (which is itself an F-test), and it’s even possible to do
this with a t-test, if you really want to try. Rejection implies either increasing or
decreasing returns to scale. Failure to reject means constant returns to scale.
The sum of the input coefficients is 1.207 + 0.018 - 0.253 = 0.973, which is
approximately 1. We therefore fail to reject the null hypothesis, which implies constant
returns to scale.
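The log further down does not include a formal test of this restriction. A minimal sketch of how it could be run with STATA's test command follows, assuming the log variables have already been generated and the data are in memory. The lincom line is an optional extra that reports the estimated sum minus 1 with its standard error and t-statistic.

```stata
* Sketch only: assumes lvadd, lpay, linvest and lmatcost already exist.
regress lvadd lpay linvest lmatcost year

* F-test of H0: the three input coefficients sum to 1 (constant returns to scale).
test lpay + linvest + lmatcost = 1

* Same restriction as a t-test: estimate of (sum - 1) with its standard error.
lincom lpay + linvest + lmatcost - 1
```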
4) The coefficient of year =0. (Rejection (in the positive direction) implies technical
progress is being made). [Once again, this is a one-sided test.]
The coefficient of year is .019, which is greater than 0, with t = 6.74 and a p-value of
0.000. We reject the null hypothesis in the positive direction, which implies that there is
technical progress being made.
5) Broadly, do you think this production function conforms to the ideas of economic
theory?
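Only partly. Labor (lpay) is a productive input and the positive, significant coefficient on
year indicates technical progress, both of which conform to economic theory. However, the
coefficients on linvest and lmatcost are not statistically different from zero, the
coefficient on lmatcost is negative, and the coefficient on lpay is above 1, so the
estimates do not show diminishing marginal productivity of labor.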
Do file and results:
. do "/var/folders/JP/JPx2zUbcH1Cf5wi6EddkxU+++TI/-Tmp-//SD15349.000000"
. //Do File for HW 4 problem 2
. summarize
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
        year |        46      1980.5    13.42262        1958       2003
         pay |        46    46.00652    12.05072        26.3       73.6
     matcost |        46    1059.493     269.601       710.3     1610.8
        vadd |        46    153.9391    67.03739        52.4      329.6
      invest |        46    12.20435    8.926489         3.8         49
-------------+---------------------------------------------------------
          tc |        46    1117.704    275.1079       760.6       1658
      vadd_2 |        46    28093.57    24912.13     2745.76   108636.2
      vadd_3 |        46     5926210     7876054    143877.8   3.58e+07
      tcpred |        46    1108.627    174.0089    761.8911   1956.078
          ac |        46    8.207979    2.633691    4.677323   14.53991
-------------+---------------------------------------------------------
          mc |        46   -.1471265    3.345893    -4.13125   9.216656
       lvadd |        46    4.947083    .4294764    3.958907    5.79788
        lpay |        46    3.796396    .2564117    3.269569   4.298645
     linvest |        46    2.350803    .5025254    1.335001    3.89182
    lmatcost |        46    6.935213    .2466335    6.565687   7.384486
. //b
. //gen tc=pay+invest+matcost
. //gen vadd_2=vadd^2
. //gen vadd_3=vadd^3
. reg tc vadd vadd_2 vadd_3, noc
      Source |       SS         df       MS          Number of obs =      46
-------------+--------------------------------       F(  3,    43) =  279.15
       Model |  57898991.1       3   19299663.7      Prob > F      =  0.0000
    Residual |  2972902.86      43   69137.2758      R-squared     =  0.9512
-------------+--------------------------------       Adj R-squared =  0.9478
       Total |    60871894      46   1323302.04      Root MSE      =  262.94
------------------------------------------------------------------------------
          tc |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        vadd |   20.25275    1.99598    10.15    0.000    16.22747    24.27803
      vadd_2 |  -.1214211    .020406    -5.95    0.000   -.1625738   -.0802684
      vadd_3 |   .0002366    .000048     4.93    0.000    .0001397    .0003335
------------------------------------------------------------------------------
. //b.3
. //predict tcpred
. //gen ac=tcpred/vadd
. //gen mc=20.25+2*-.121*vadd+3*.0002*vadd^2
. //b.4
. sort vadd
. //scatter ac mc vadd, connect(l l)
. //gen lvadd=log(vadd)
. //gen lpay=log(pay)
. //gen linvest=log(invest)
. //gen lmatcost=log(matcost)
. reg lvadd lpay linvest lmatcost year
      Source |       SS         df       MS          Number of obs =      46
-------------+--------------------------------       F(  4,    41) =   31.76
       Model |  6.27505303       4   1.56876326      Prob > F      =  0.0000
    Residual |  2.02519469      41   .049394993      R-squared     =  0.7560
-------------+--------------------------------       Adj R-squared =  0.7322
       Total |  8.30024772      45   .184449949      Root MSE      =  .22225
------------------------------------------------------------------------------
       lvadd |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lpay |   1.207201   .1708107     7.07    0.000     .862242    1.552161
     linvest |   .0184269   .0860557     0.21    0.832   -.1553661    .1922199
    lmatcost |  -.2528764   .1577418    -1.60    0.117   -.5714424    .0656897
        year |   .0189958   .0028168     6.74    0.000    .0133071    .0246844
       _cons |   -35.5466   5.244685    -6.78    0.000   -46.13845   -24.95474
------------------------------------------------------------------------------
. test lpay==1
( 1) lpay = 1
F( 1, 41) = 1.47
Prob > F = 0.2321
. test linvest==1
( 1) linvest = 1
F( 1, 41) = 130.10
Prob > F = 0.0000
. test lmatcost==1
( 1) lmatcost = 1
F( 1, 41) = 63.08
Prob > F = 0.0000
. test year==1
( 1) year = 1
F( 1, 41) = 1.2e+05
Prob > F = 0.0000
end of do-file
//Do File for HW 4 problem 2
summarize
//b
//gen tc=pay+invest+matcost
//gen vadd_2=vadd^2
//gen vadd_3=vadd^3
reg tc vadd vadd_2 vadd_3, noc
//b.3
//predict tcpred
//gen ac=tcpred/vadd
//gen mc=20.25+2*-.121*vadd+3*.0002*vadd^2
//b.4
sort vadd
//scatter ac mc vadd, connect(l l)
//c
//gen lvadd=log(vadd)
//gen lpay=log(pay)
//gen linvest=log(invest)
//gen lmatcost=log(matcost)
reg lvadd lpay linvest lmatcost year
test lpay==1
test linvest==1
test lmatcost==1
test year==1