Exercises: 1st and 2nd day in PC lab

PC Exercises 1 & 2
First day in PC lab
Introduction to Stata
Copy the folder "stata" from the public U: drive to your own H: drive
Example 1: Food expenditure
• Use "H:\stata\foodexp.dta"
• browse
• describe
• discuss the model: foodexp = beta1 + beta2*income + e
• scatter foodexp income
• check linear fit: twoway (scatter foodexp income) (lfit foodexp income)
• regress foodexp income and interpret the coefficients and confidence intervals
• lecture: Hypothesis testing
• regress foodexp income and interpret t-statistics
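The first-day steps can be collected in a short do-file. This is only a sketch: it assumes the folder has been copied to H:\stata as described above and that the data contain the variables foodexp and income. The scalar names (t_income, t_crit, ci_low, ci_high) are made up for illustration. It rebuilds the t-statistic and the 95% confidence interval for the income coefficient from the stored results, so that both can be compared with the regress table.

```stata
* Sketch only: assumes foodexp.dta has been copied to H:\stata
use "H:\stata\foodexp.dta", clear
regress foodexp income

* t-statistic for H0: beta2 = 0, from the stored coefficient and standard error
scalar t_income = _b[income] / _se[income]

* 95% confidence interval: b2 +/- t_crit * se(b2)
* e(df_r) holds the residual degrees of freedom (T - 2 in this model)
scalar t_crit  = invttail(e(df_r), 0.025)
scalar ci_low  = _b[income] - t_crit * _se[income]
scalar ci_high = _b[income] + t_crit * _se[income]

display "t = " t_income "   95% CI: [" ci_low ", " ci_high "]"
```

The displayed values should agree with the "t" and "[95% conf. interval]" columns of the income row, up to rounding.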
Second day in PC lab
• test income=0 (H0: beta2 = 0, t = (b2-0)/se(b2))
• explain relationship F-distribution – t-distribution: An F random variable with 1
numerator and m denominator DF is equal to the square of a t(m) random variable.
• display sqrt(17….) (square root of the F statistic reported by test) and compare with the t-statistic in the regression output
• test income=0.1 and compare with computer output
• test _cons=0 and compare with computer output
• vce (displays the variance-covariance matrix of the estimates; estat vce in current Stata versions): the square roots of the diagonal elements are the standard errors
• regress foodexp income, nocons (only when you are absolutely sure, from a theoretical
point of view, that beta1 has to be zero; otherwise the estimator for beta2 will be
biased because the assumptions are violated. R2 (see below) should not be
interpreted in a model without a constant)
• regress foodexp income (typing regress without variables always redisplays the last
regression model)
• interpret R2, ANOVA Table
• explain relationship between R2 and Pearson's r (correlation coefficient)
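• correlate foodexp income
• display 0.56^2 (R2 = r^2, the squared correlation coefficient)
• show relationship between SSR and SST
• display SSR/SST (insert the Model and Total sums of squares from the ANOVA table; the ratio equals R2)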
• predict y_roof
• browse
• scatter y_roof foodexp income, c(l) (first variable is connected with line)
• show where to find the relevant information in the Stata output (MS residual)
• explain the Stata output "Root MSE" = root mean squared error = $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$, where $\hat{\sigma}^2 = \dfrac{\sum_t \hat{e}_t^2}{T-2}$ is the MS residual
• show the importance of subject-matter reasoning before setting up a model:
• regress income foodexp (in most cases this model is not supported by theory, but the
regression still runs)
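The identities discussed on the second day can be checked numerically from the stored results. The following is a sketch, not part of the original exercise: it assumes foodexp.dta is in H:\stata, and the scalar and matrix names (t_income, F_income, V, se_income, r2_ss, rmse_hand) are illustrative. It compares F with t^2, the standard error with the square root of the diagonal of the variance-covariance matrix, R2 with the squared correlation, and Root MSE with the square root of MS residual.

```stata
* Sketch only: assumes foodexp.dta is in H:\stata; all scalar names are illustrative
use "H:\stata\foodexp.dta", clear
quietly regress foodexp income

* F(1, m) equals t(m)^2: the F from -test- against the squared t-statistic
scalar t_income = _b[income] / _se[income]
quietly test income = 0
scalar F_income = r(F)
display "F = " F_income "   t^2 = " t_income^2

* Standard errors: square roots of the diagonal of the variance-covariance matrix
* (income is the first regressor, so its variance is element [1,1])
matrix V = e(V)
scalar se_income = sqrt(V[1,1])
display "sqrt(V[1,1]) = " se_income "   reported se = " _se[income]

* R2 as Model SS / Total SS
scalar r2_ss = e(mss) / (e(mss) + e(rss))

* Root MSE as the square root of MS residual = residual SS / (T - 2)
scalar rmse_hand = sqrt(e(rss) / e(df_r))
display "Root MSE by hand = " rmse_hand "   reported = " e(rmse)

* In a simple regression, R2 equals the squared Pearson correlation
quietly correlate foodexp income
display "R2 = " r2_ss "   r^2 = " r(rho)^2
```

Each display line should show two matching numbers. The equality of R2 and r^2 holds only for a regression with one explanatory variable and a constant.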
Scaling the data
• generate foodex100=foodexp/100 (food expenditure expressed in units of $100)
• generate income100=income/100
• regress foodexp income100
• regress foodex100 income
• regress foodex100 income100
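What to look for in the three rescaled regressions can be summarised in a sketch like the one below. It assumes foodexp.dta is in H:\stata; the scalars b2 and t2 are illustrative names, and the variables are created as double (unlike the plain generate above) only so that rounding does not blur the comparison.

```stata
* Sketch only: assumes foodexp.dta is in H:\stata; b2 and t2 are illustrative names
use "H:\stata\foodexp.dta", clear
generate double foodex100 = foodexp/100
generate double income100 = income/100

* Baseline regression in the original units
quietly regress foodexp income
scalar b2 = _b[income]
scalar t2 = _b[income] / _se[income]

* Rescaling x only: the slope is multiplied by 100, the intercept is unchanged
quietly regress foodexp income100
scalar b2_x = _b[income100]
display "x/100:        slope = " b2_x "   100*b2 = " 100*b2

* Rescaling y only: slope and intercept are divided by 100
quietly regress foodex100 income
scalar b2_y = _b[income]
display "y/100:        slope = " b2_y "   b2/100 = " b2/100

* Rescaling both: the slope is unchanged, the intercept is divided by 100
quietly regress foodex100 income100
scalar b2_xy = _b[income100]
scalar t2_xy = _b[income100] / _se[income100]
display "x/100, y/100: slope = " b2_xy "   b2 = " b2
display "t-statistic:  " t2_xy "   baseline t = " t2
```

The t-statistic and R2 are the same in all four regressions: rescaling changes the units of the coefficients, not the fit.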
Choosing functional form
• scatter foodexp income
• gen lnfood=ln(foodexp)
• label var lnfood "natural log of foodexp"
• gen lninc=ln(income)
• label var lninc "natural log of income"
• regress lnfood lninc (constant elasticity of 0.69)
• predict y_roof1
• (optional, since self-explanatory) show the relationship between the transformed and the
original variables:
• gen fit1=exp(y_roof1)
• scatter fit1 foodexp income, c(l)
• twoway (scatter fit1 foodexp income, c(l)) (lfit foodexp income) compares the fit under
the linear assumption with the fit under the log-log assumption
• Optional: a small example of combining economic reasoning with statistical testing:
• test lninc=1 (H0: the income elasticity is equal to 1)
• display sqrt(4.95) calculates the absolute t-value; compare it with the critical t-value at the
5% level -> reject. Engel's law tells us that the elasticity is smaller than 1, so a one-tailed
test would be appropriate. In this case the result is the same (the one-tailed p-value is half
of that from the equality test).
• Engel's law is an observation in economics stating that, with a given set of tastes and
preferences, as income rises, the proportion of income spent on food falls, even if
actual expenditure on food rises. In other words, the income elasticity of demand of
food is less than 1.
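The elasticity test can also be done step by step. The sketch below assumes foodexp.dta is in H:\stata; the scalar names (t_elast, F_elast, p_two, p_one) are illustrative. It computes the signed t-statistic for H0: elasticity = 1, checks that its square is the F reported by test, and derives the one-tailed p-value for the alternative suggested by Engel's law (elasticity < 1).

```stata
* Sketch only: assumes foodexp.dta is in H:\stata; scalar names are illustrative
use "H:\stata\foodexp.dta", clear
generate lnfood = ln(foodexp)
generate lninc  = ln(income)
quietly regress lnfood lninc

* Signed t-statistic for H0: elasticity = 1 (negative if the estimate is below 1)
scalar t_elast = (_b[lninc] - 1) / _se[lninc]

* -test- reports F = t^2 together with the two-tailed p-value
quietly test lninc = 1
scalar F_elast = r(F)
scalar p_two   = r(p)

* One-tailed p-value for H1: elasticity < 1, i.e. the lower tail P(T < t)
* ttail(df, x) returns P(T > x), so by symmetry P(T < t) = ttail(df, -t)
scalar p_one = ttail(e(df_r), -t_elast)

display "t = " t_elast "   F = " F_elast "   t^2 = " t_elast^2
display "two-tailed p = " p_two "   one-tailed p = " p_one
```

When the estimated elasticity is below 1, the one-tailed p-value is exactly half of the two-tailed one.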
Wheat yield example
• use H:\stata\yield.dta
• Wheat yield observations for a region in Western Australia with annual observations
for the period 1950-1997
• describe
• why should yield increase over time? (technology)
• suppose we want to estimate the effect of technological improvements on yield
(important for research policy)
• direct data on technology are not available, but time is a proxy
• scatter yield time (graph, possibly linear)
• $y_t = \beta_1 + \beta_2 x_t + e_t$
• implies that yield increases by the same amount β2 every year
• regress yield time
• predict y_roof
• scatter y_roof yield time, c(l)
• there is a pattern in the residuals: positive residuals at both ends and negative ones in between
• predict residual, residuals
• graph bar residual, over(time)
The linear functional form does not seem to be appropriate
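What could be done? $y = b_1 + b_2 x^2$ or $x^3$:
$y_t = \beta_1 + \beta_2 x_t^2 + e_t$, with $\partial y_t / \partial x_t = 2 \beta_2 x_t$
$y_t = \beta_1 + \beta_2 x_t^3 + e_t$, with $\partial y_t / \partial x_t = 3 \beta_2 x_t^2$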
both have marginal effects that increase with x (for x > 0) if β2 is positive
[Figure: curves of x2, x3 and x4 plotted for x from -8 to 10; vertical axis from -200 to 200]
Test x2 and x3
• gen time2=time^2
• gen time3=time^3
• regress yield time2
Analyze regression result
• predict yhat2
• predict residual2, residuals
• scatter yhat2 yield time, c(l)
• graph bar residual2, over(time)
• same for time3
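The "same for time3" step can be written as a loop over both powers. This is a sketch under the assumption that yield.dta is in H:\stata and contains the variables yield and time; the graph names (fit2, res2, ...) are illustrative and only keep the graphs from overwriting each other.

```stata
* Sketch only: assumes yield.dta is in H:\stata with variables yield and time
use "H:\stata\yield.dta", clear
generate time2 = time^2
generate time3 = time^3

foreach p in 2 3 {
    * Regress yield on time^p and store fitted values and residuals
    quietly regress yield time`p'
    predict yhat`p'
    predict residual`p', residuals
    display "time^`p': R2 = " e(r2) "   Root MSE = " e(rmse)

    * Fitted line against the observations, then the residuals over time
    scatter yhat`p' yield time, c(l) name(fit`p', replace)
    graph bar residual`p', over(time) name(res`p', replace)
}
```

The loop leaves yhat2, yhat3, residual2 and residual3 in the data, so the residual patterns of the two specifications can be compared side by side.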
Test for normality of residuals (desirable because the t- and F-tests rely on e, and
hence y, being normally distributed)
• histogram residual, normal (draws a histogram of the residuals with a normal density overlaid)
• sktest residual (skewness-kurtosis test for normality)
• skewness: how symmetric is the distribution around zero?
• kurtosis refers to the peakedness of the distribution (the normal has a particular
peakedness). If the p-value is bigger than 0.05 (or another chosen significance level),
then H0 (i.e. that the distribution is normal) cannot be rejected.
• H0 is not rejected for residual
• nor for residual2 and residual3
• do not choose between the models by looking for the highest p-value
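To see what the test is based on, the sample skewness and kurtosis of the residuals can be inspected before running sktest. A minimal sketch, assuming yield.dta is in H:\stata with variables yield and time; the scalar names skew_res and kurt_res are illustrative.

```stata
* Sketch only: assumes yield.dta is in H:\stata with variables yield and time
use "H:\stata\yield.dta", clear
quietly regress yield time
predict residual, residuals

* Moments behind the test: a normal distribution has skewness 0 and kurtosis 3
quietly summarize residual, detail
scalar skew_res = r(skewness)
scalar kurt_res = r(kurtosis)
display "skewness = " skew_res "   kurtosis = " kurt_res

* Visual check, then the formal test (H0: the residuals are normally distributed)
histogram residual, normal
sktest residual
```

Values of skewness near 0 and kurtosis near 3 go together with a large p-value in the sktest output.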
Moscow Makkers Example