PC Exercises 1 & 2

First day in PC lab

Introduction to Stata
Copy the folder "stata" from the public U: drive to your own H: drive.

Example 1: Food expenditure
• use "H:\stata\foodexp.dta"
• browse
• describe
• Discuss the model: $\text{foodexp} = \beta_1 + \beta_2\,\text{income} + e$
• scatter foodexp income
• Check the linear fit: twoway (scatter foodexp income) (lfit foodexp income)
• regress foodexp income (interpret the coefficients and confidence intervals)
• Lecture: hypothesis testing
• regress foodexp income (interpret the t-statistics)

Second day in PC lab
• test income=0 ($H_0\colon \beta_2 = 0$, with $t = (b_2 - 0)/\mathrm{se}(b_2)$)
• Explain the relationship between the F- and t-distributions: an F random variable with 1 numerator and m denominator degrees of freedom is distributed as the square of a t(m) random variable. The F-statistic reported by test is therefore the square of the corresponding t-statistic.
• display sqrt(17….) and compare with the t-statistic in the regress output
• test income=0.1 and compare with the t-value computed by hand from the regress output, $t = (b_2 - 0.1)/\mathrm{se}(b_2)$
• test _cons=0 and compare with the t-statistic of _cons in the regress output
• vce (or estat vce) displays the variance-covariance matrix of the estimates; the square roots of its diagonal elements are the standard errors
• regress foodexp income, nocons (use this only when you are certain, on theoretical grounds, that β1 must be zero; otherwise the estimator of β2 is biased because the model assumptions are violated, and the R² (see below) should not be interpreted)
• regress foodexp income (back to the model with a constant; typing regress without variables would only redisplay the results of the last regression, i.e. the model without a constant)
• Interpret R² and the ANOVA table
• Explain the relationship between R² and Pearson's r (correlation coefficient)
• correlate foodexp income
• display 0.56^2 (in simple regression, $R^2 = r^2$)
• Show the relationship between SSR and SST
• display SSR/SST, using the numbers from the ANOVA table (SSR = explained sum of squares, the "Model" row; SST = total sum of squares, the "Total" row); the ratio equals R²
• predict y_roof (fitted values ŷ)
• browse
• scatter y_roof foodexp income, c(l) (only the first variable, y_roof, is connected with a line; the line follows the order of the data, so sort by income first if it zigzags)
• Show where to find the relevant information in the Stata output (e.g. the residual MS, which equals $\hat\sigma^2$)
• Explain "Root MSE" in the Stata output: the root mean squared error, $\hat\sigma = \sqrt{\hat\sigma^2}$ with $\hat\sigma^2 = \sum_t \hat e_t^2 / (T - 2)$
• Show why economic reasoning must come before setting up a model:
  • regress income foodexp (this reverse model is not supported by theory in most cases, but Stata estimates it anyway)

Scaling the data
• generate foodex100=foodexp/100 (food expenditure in units of $100)
• generate income100=income/100
• regress foodexp income100
• regress foodex100 income
• regress foodex100 income100
• Compare the results: dividing income by 100 multiplies b2 and its standard error by 100; dividing foodexp by 100 divides both coefficients and their standard errors by 100; t-statistics and R² do not change.

Choosing functional form
• scatter foodexp income
• gen lnfood=ln(foodexp)
• label var lnfood "natural log of foodexp"
• gen lninc=ln(income)
• label var lninc "natural log of income"
• regress lnfood lninc (the slope is a constant income elasticity, here 0.69)
• predict y_roof1
• (Optional, since self-explanatory) Show the relationship between the transformed and the original variables:
  • gen fit1=exp(y_roof1)
  • scatter fit1 foodexp income, c(l)
  • twoway (scatter fit1 foodexp income, c(l)) (lfit foodexp income) (compares the linear relationship with the log-log relationship)
• Optional: a small example of combining economic reasoning with statistical testing:
  • test lninc=1 (H0: the income elasticity equals 1)
  • display sqrt(4.95) (the absolute t-value implied by the F-statistic); compare it with the critical t-value at the 5% level, so H0 is rejected.
  • Engel's law says the elasticity is smaller than 1, so a one-tailed test (H1: elasticity < 1) is also justified. Here the conclusion is the same: the one-tailed p-value is half the p-value of the two-sided test reported by test.
  • Engel's law is an empirical observation in economics: with a given set of tastes and preferences, as income rises, the proportion of income spent on food falls, even if actual expenditure on food rises. In other words, the income elasticity of demand for food is less than 1.

Wheat yield example
• use H:\stata\yield.dta
• Annual wheat yield observations for a region in Western Australia, 1950-1997
• describe
• Why should yield increase over time?
  Answer: technology.
• Suppose we want to estimate the effect of technological improvements on yield (important for research policy).
• Direct data on technology are not available, but time can serve as a proxy.
• scatter yield time (the relationship could be linear)
• Linear model: $y_t = \beta_1 + \beta_2 x_t + e_t$, where $x_t$ is time
• This implies that yield increases by the same amount, β2, every year.
• regress yield time
• predict y_roof
• scatter y_roof yield time, c(l)
• The residuals show a pattern: positive at both ends of the sample and negative in between.
• predict residual, residuals
• graph bar residual, over(time)

The linear form does not seem appropriate. What could be done? Try $y = \beta_1 + \beta_2 x^2$ or $y = \beta_1 + \beta_2 x^3$:
$y_t = \beta_1 + \beta_2 x_t^2 + e_t, \qquad \frac{\partial y}{\partial x} = 2\beta_2 x_t$
$y_t = \beta_1 + \beta_2 x_t^3 + e_t, \qquad \frac{\partial y}{\partial x} = 3\beta_2 x_t^2$
For positive x (time is positive here), both forms have increasing marginal effects if β2 is positive.
[Figure: x², x³ and x⁴ plotted against x]

Trying x² and x³
• gen time2=time^2
• gen time3=time^3
• regress yield time2

Analyze the regression results
• predict yhat2
• predict residual2, residuals
• scatter yhat2 yield time, c(l)
• graph bar residual2, over(time)
• Repeat the same steps for time3 (regress yield time3, then yhat3 and residual3)

Test for normality of residuals (desirable because the t- and F-tests assume normally distributed errors e, which implies that y, given x, is normally distributed as well)
• histogram residual, normal (histogram with an overlaid normal density, a visual check of normality)
• sktest residual (skewness-kurtosis test for normality)
  • Skewness: how symmetric is the distribution around zero?
  • Kurtosis refers to the peakedness of the distribution (the normal distribution has a kurtosis of 3).
  • If the joint p-value (Prob>chi2) is greater than 0.05 (or another chosen significance level), H0 (the distribution is normal) cannot be rejected.
• H0 is not rejected for residual, nor for residual2 and residual3.
• Do not choose the functional form by looking for the highest p-value.

Moscow Makkers Example