*Lab 1 - 3: Stata Introduction, Tests, Violations of GM, Specification Issues

*Intro
*Open Stata, change the directory to your drive
cd c:\
*open the datafile lab1.dta
use lab1.dta
*open a log-file to save the results:
log using name.log [, replace append]
*Note: at the end of the session you have to close the log file by: log close
*use commands that describe the characteristics of the variables:
descr x1 x2
list x10
sum x1-x10
tab x9 x10
tab1 x8 x9 x10
tabstat x1 x2 x3 x4 x5 x6 y, statistics(min mean median p25 p75 max sd var range)
*generate new variables
gen x_inter=x4*x9
egen x1_mean=mean(x1)
*look at some correlations and covariances
corr x1 x2 x3 x4 x5
corr x1 x2 x3 x4 x5, cov
*use the drop-down menu to look at some histograms and density plots
twoway (scatter x1 x3)
kdensity x1, norm

* T-Tests and Correlation
/*download the file garmit_esspanel1.dta onto your drive; this file contains a
panel of 18 OECD countries from 1961 to 1994. Background reading: the
Garrett-Mitchell paper on government spending*/
*open Stata
*change the working directory to your m drive:
cd "c:\...\..."
*open the data file in Stata:
use garmit_esspanel1.dta
*summarize the main characteristics of the interesting variables:
sum spend unem growthpc depratio left cdem trade lowwage fdi skand
*run some simple t-tests:
ttest spend == 30
ttest spend == 41
/*now test whether government spending is different across regions /
political systems*/
ttest spend, by(skand)
ttest spend, by(skand) unequal
ttest spend, by(angel)
ttest spend, by(angel) unequal
/*or test whether the dispersion of government spending is different in
different regions:*/
sdtest spend, by(skand)
sdtest spend, by(angel)
/*Is the dependent variable "spend" normally distributed (enough)?*/
hist spend
kdensity spend
swilk spend
sktest spend
/*calculate some simple bivariate covariances and correlations and interpret
the results; what is the problem with bivariate correlation?*/
corr spend unem
corr spend left
corr spend unem, cov
corr spend left, cov
/*estimate several bivariate OLS models for government spending and interpret
the regression results:*/
reg spend unem
reg spend left
reg spend growthpc
reg spend trade
reg spend depratio

* OLS and Violations of GM
*summarize the main characteristics of the interesting variables:
sum spend unem growthpc depratio left cdem trade lowwage fdi skand
*estimate a basic model for government spending and interpret the regression
*table: interpret the regression results, coefficients, standard errors,
*confidence intervals, R², F-test
reg spend unem growthpc depratio left cdem trade lowwage fdi
*now change the confidence levels (note the default setting is 95% levels)
set level 99
set level 90
*estimate the same regression again, what has changed, how and why?
*Set the level back to the default:
set level 95
*estimate the model again:
reg spend unem growthpc depratio left cdem trade lowwage fdi
*Now run the same regression but estimate standardized coefficients – how to
*interpret standardized coefficients? Why do we need standardized coefficients?
reg spend unem growthpc depratio left cdem trade lowwage fdi, beta
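/*A minimal cross-check of what the beta option above reports (a sketch, assuming
a standardized coefficient is simply the raw coefficient rescaled by sd(x)/sd(y);
unem is used purely as an illustration):*/
quietly reg spend unem growthpc depratio left cdem trade lowwage fdi
quietly sum unem if e(sample)
scalar sd_unem = r(sd)
quietly sum spend if e(sample)
scalar sd_spend = r(sd)
display "standardized coefficient for unem = " _b[unem]*sd_unem/sd_spend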
*AV (added-variable) plots are for multivariate models and show the ceteris
*paribus effect that a single independent variable has:
avplots
avplot unem
*for linear OLS models the marginal effect of a single independent variable
*is equal for all values of the independent variable:
mfx
mfx compute, at(mean)
mfx compute, at(median)
*__________________________________________________________________________
*Multicollinearity:
*calculate correlation coefficients for all explanatory variables, do you
*find problems of multicollinearity?
corr unem growthpc depratio left cdem trade lowwage fdi skand
*What is the trade-off between multicollinearity and omitted variable bias?
*If there is complete multicollinearity we don't have to worry, Stata does
*the job:
*Linear transformations of a variable are perfectly multicollinear with the
*original variable:
gen unem2=2*unem+3
corr unem unem2
reg spend unem unem2 if year==1986
*What does Stata do in case of perfect multicollinearity?
*Now look at two variables that are highly correlated but not perfectly
*multicollinear, what is the problem?
gen unem3=2*unem+3*invnorm(uniform())
corr unem unem3
reg spend unem unem3 if year==1986
reg spend unem skand if year==1986
reg spend unem unem3 skand if year==1986
*run the original regression again:
reg spend unem growthpc depratio left cdem trade lowwage fdi
*Run a variance inflation factor test for higher-order multicollinearity:
estat vif
estat vif, uncentered
*How to interpret the VIF, what does the VIF measure?
*Principal component analysis:
corr trade lowwage fdi
factor trade lowwage fdi, pcf
predict score, r
corr trade lowwage fdi
*__________________________________________________________________________
*Outliers:
*test for outliers:
reg spend unem growthpc depratio left cdem trade lowwage fdi
dotplot spend
symplot spend
rvfplot
lvr2plot
lvr2plot, mlabel(country)
dfbeta
*Solution: jackknife and bootstrapping:
jackknife _b _se, eclass saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand
bootstrap _b _se, reps(1000) saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand
*__________________________________________________________________________
*Heteroscedasticity and Omitted Variable Bias
*calculate predicted values and residuals:
reg spend unem growthpc depratio left cdem trade lowwage fdi
predict spend_hat
predict spend_resid, resid
*Now create scatterplots for the residuals against some of the explanatory
*variables and the country codes: what can you see? Omitted variable bias,
*heteroskedasticity?
twoway (scatter spend_resid unem)
twoway (scatter spend_resid cc)
twoway (scatter spend_resid growthpc)
*Stata provides built-in tests for omitted variable bias and
*heteroskedasticity: how to interpret the test results?
estat hettest
estat ovtest
estat szroeter unem growthpc depratio left cdem trade lowwage fdi
*What can we do about heteroskedasticity and omitted variable bias?
reg spend unem growthpc depratio left cdem trade lowwage fdi skand
*What can be observed, how to interpret the coefficient for "skand"?
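/*One way to see the interpretation of the dummy (a sketch, assuming Stata 11 or
newer with factor-variable syntax and that skand is coded 0/1): the two adjusted
predictions reported by margins differ exactly by the skand coefficient.*/
reg spend unem growthpc depratio left cdem trade lowwage fdi i.skand
margins skand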
*Now do the same tests again for the new model and interpret the results:
predict spend_hat2
predict spend_resid2, resid
estat hettest
estat ovtest
twoway (scatter spend spend_hat2 if year==1984, mlabel(country))
twoway (scatter spend_resid2 cc)
twoway (scatter spend_resid2 unem)
estat szroeter unem growthpc depratio left cdem trade lowwage fdi skand
*Just treating the standard errors: robust (Huber-White sandwich) SEs:
reg spend unem growthpc depratio left cdem trade lowwage fdi, robust
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(robust)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(cluster cc)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(bootstrap)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(jackknife)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc2)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc3)
*GLS:
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, p(h)

*Panel Heteroskedasticity
cd m:\
use "...path...\greene5.dta", clear
log using "...path...\lab1.log", replace
tsset firm year
*Estimate a groupwise heteroskedastic model and test for heteroskedasticity
reg invest fval sval
display "White test"
predict res, resid
gen res2 = res^2
gen fval2 = fval^2
gen sval2 = sval^2
gen fs = fval * sval
reg res2 fval sval fval2 sval2 fs
ereturn list
scalar white = e(N)*e(r2)
display white
scalar whitesig = chi2tail(e(df_m)-1,white)
display whitesig
display "Lagrange Multiplier test for groupwise heteroskedasticity in panel data"
scalar s2 = e(rss)/e(N)
xtgls invest fval sval, i(firm) p(h) c(i)
matrix list e(Sigma)
matrix sis = (1/s2)*e(Sigma)
matrix list sis
matrix sis1 = sis - I(5)
matrix list sis1
matrix sis2 = vecdiag(sis1)'
matrix list sis2
matrix sis3 = sis2'*sis2
matrix list sis3
matrix lmh = ((e(N)/e(N_g))/2)*sis3
scalar lm = lmh[1,1]
display "Lagrange Multiplier statistic = " lm
scalar lmsig = chi2tail((e(N_g)-1),lm)
display "Significance level = " lmsig
/*Estimate a groupwise heteroskedastic and cross-sectionally correlated model
and test for cross-sectional correlation*/
xtgls invest fval sval, i(firm) p(c) c(i)
xtgls invest fval sval, i(firm) p(h) c(i) igls
display "Lagrange multiplier test for cross-sectional correlation in panel data (MLE)"
predict yfit, xb
gen resgls = invest - yfit
set matsize 100
mkmat resgls, matrix(rgls)
matrix egls1 = rgls[1..20,1]
matrix egls2 = rgls[21..40,1]
matrix egls3 = rgls[41..60,1]
matrix egls4 = rgls[61..80,1]
matrix egls5 = rgls[81..100,1]
matrix egls = egls1, egls2, egls3, egls4, egls5
matrix list egls
matrix vcv = egls' * egls
matrix list vcv
matrix csc = corr(vcv)
matrix list csc
matrix lmm = csc[2...,1]\csc[3...,2]\csc[4...,3]\csc[5,4]
matrix list lmm
matrix lmlm = 10 * (lmm' * lmm)
scalar lml = lmlm[1,1]
display "Lagrange Multiplier statistic = " lml
scalar lmlsig = chi2tail(10,lml)
display "Significance level (alpha) = " lmlsig
display "Likelihood ratio test for MLE for cross-sectional correlation in panel data"
matrix si1 = (egls1' * egls1)/20
matrix si2 = (egls2' * egls2)/20
matrix si3 = (egls3' * egls3)/20
matrix si4 = (egls4' * egls4)/20
matrix si5 = (egls5' * egls5)/20
scalar lsi = ln(si1[1,1]) + ln(si2[1,1]) + ln(si3[1,1]) + ln(si4[1,1]) + ln(si5[1,1])
display lsi
quietly xtgls invest fval sval, i(firm) p(c) c(i) igls
scalar ldets = ln(det(e(Sigma)))
display ldets
scalar llr = 20 * (lsi - ldets)
display "Likelihood ratio statistic = " llr
scalar llrsig = chi2tail(10,llr)
display "Significance level (alpha) = " llrsig
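/*A packaged shortcut for the same cross-sectional correlation idea (a sketch,
assuming the community-contributed xttest2 command by C. Baum is installed,
e.g. via ssc install xttest2; it reports the Breusch-Pagan LM test of
cross-sectional independence after a fixed-effects panel regression):*/
*ssc install xttest2
xtreg invest fval sval, fe
xttest2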
"Significance level (alpha) = " llrsig *Now reestimate the model using the Beck&Katz approach xtpcse invest fval sval, c(i) log close *__________________________________________________________________________ use "...path...\garmit_esspanel1.dta", clear *open log-file: log using "...path...\lab1.log", append *run basic model: OLS linear regression reg spend unem growthpc depratio left cdem trade lowwage fdi *test for heteroskedasticity *check for heteroskedasticity estat hettest estat szroeter unem growthpc depratio left cdem trade lowwage fdi *check for omitted variable bias 7 estat ovtest * run models dealing with panel correlation and heteroskedasticity reg spend unem growthpc depratio left cdem trade lowwage fdi, robust reg spend unem growthpc depratio left cdem trade lowwage fdi, cluster(cc) * compare different models * iid error structure xtgls spend unem growthpc depratio left cdem trade lowwage fdi, panels(iid) * heteroscedastic but uncorrelated error structure xtgls spend unem growthpc depratio left cdem trade lowwage fdi, panels(h) * heteroscedastic and correlated error structure xtgls govcons growthpc depratio left cdem trade lowwage, panels(c) * independent autocorrelation structure xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(i) * AR1 autocorrelation structure xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(ar1) * panel specific AR1 autocorrelation structure xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(psar1) *Parks Kmenta xtgls govcons growthpc depratio left cdem trade lowwage, panels(c) corr(psar1) *xtpcse xtpcse spend unem growthpc depratio left cdem trade lowwage fdi xtpcse spend unem growthpc depratio left cdem trade lowwage fdi, corr(ar1) xtpcse spend unem growthpc depratio left cdem trade lowwage fdi, corr(psar1) *__________________________________________________________________________ *Autocorrelation: *now let’s look at a single time-series, e.g. Germany (or UK) – what has changed, *what is the difference to the above models? reg spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany" *Do the same tests again: what can you observe? estat hettest estat ovtest *Now let's turn to another violation of Gauss-Markov assumptions: autocorrelation, *serial correlation and test for it: Durbin-Watson statistic and Breusch-Godfrey *test, how to interprete the results? estat dwatson estat bgodfrey *Or a simpler test: what can we observe? 8 predict spend_residger, resid gen lagspend_residger=l.spend_residger reg spend_residger lagspend_residger unem growthpc depratio left cdem trade lowwage fdi if country == "Germany" *the simplest way to deal with serial correlation is to include the lagged values *of the dependent variable to the right hand side of the model – the LDV (lagged *dependent variable; BUT with including and excluding variables from the model and *do some more Omitted variable bias and heteroskedasticity tests… *Now do the same for the UK! *… *opens another can of worms… We will talk about this in more detail when we talk *about time series models): How to interpret the coefficient of the LDV? 
reg spend spendl unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"
*And do the tests again:
estat dwatson
estat bgodfrey
*Run a Prais-Winsten model:
prais spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"
*Play around a little with including and excluding variables from the model and
*do some more omitted variable bias and heteroskedasticity tests…
*Now do the same for the UK!
*…
*__________________________________________________________________________________
*test for functional form:
reg spend unem growthpc depratio left cdem trade lowwage fdi
acprplot unem, mspline
acprplot left, mspline
*etc…
gen ln_unem = ln(unem)
reg spend ln_unem growthpc depratio left cdem trade lowwage fdi
gen sqrt_unem=unem^2
*(note: despite its name, sqrt_unem is the square of unem)
reg spend unem sqrt_unem growthpc depratio left cdem trade lowwage fdi
*what effect can we observe? Calculate the "turning point"!
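/*A minimal sketch for the turning point (for a quadratic specification
spend = b0 + b1*unem + b2*unem^2 + ..., the fitted curve turns where
d(spend)/d(unem) = b1 + 2*b2*unem = 0, i.e. at unem = -b1/(2*b2)):*/
display "turning point at unem = " (-_b[unem]/(2*_b[sqrt_unem]))
*or, with a delta-method standard error:
nlcom -_b[unem]/(2*_b[sqrt_unem])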
*__________________________________________________________________________________
*Interpretation of a dummy variable and interaction effects:
reg spend skand if year==1986
reg spend unem skand
gen skand_unem=unem*skand
reg spend unem skand skand_unem
reg spend unem skand skand_unem growthpc depratio left cdem trade lowwage fdi
*Interaction effects:
gen unem_trade=unem*trade
reg spend unem trade growthpc depratio left cdem lowwage fdi
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
ssc install sslope
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade)
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade)
*graphical display of IA effects:
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade) graph
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade) graph
sum trade unem
*Program marginal effects of IA effects:
***********************************************************************************
capture drop MV-lower
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
generate MV=((_n-1)*10)
replace MV=. if _n>17
matrix b=e(b)
matrix V=e(V)
scalar b1=b[1,1]
scalar b2=b[1,2]
scalar b3=b[1,3]
scalar varb1=V[1,1]
scalar varb2=V[2,2]
scalar varb3=V[3,3]
scalar covb1b3=V[1,3]
scalar covb2b3=V[2,3]
scalar list b1 b2 b3 varb1 varb2 varb3 covb1b3 covb2b3
gen conb=b1+b3*MV if _n<=17
gen conse=sqrt(varb1+varb3*(MV^2)+2*covb1b3*MV) if _n<=17
gen a=1.96*conse
gen upper=conb+a
gen lower=conb-a
graph twoway (line conb MV, clwidth(medium) clcolor(blue)) /*
*/ (line upper MV, clpattern(dash) clwidth(thin) clcolor(black)) /*
*/ (line lower MV, clpattern(dash) clwidth(thin) clcolor(black)), /*
*/ xlabel(0 20 40 60 80 100 120 140 160, labsize(2.5)) /*
*/ ylabel(-0.5 0 0.5 1 1.5 2, labsize(2.5)) yscale(noline) xscale(noline) /*
*/ legend(col(1) order(1 2) label(1 "Marginal Effect of Unemployment on Spending") /*
*/ label(2 "95% Confidence Interval") label(3 " ")) yline(0, lcolor(black)) /*
*/ title("Marginal Effect of Unemployment on Spending as Trade Openness changes", size(4)) /*
*/ subtitle(" " "Dependent Variable: Government Spending" " ", size(3)) /*
*/ xtitle(Trade Openness, size(3)) xsca(titlegap(2)) ysca(titlegap(2)) /*
*/ ytitle("Marginal Effect of Unemployment", size(3)) scheme(s2mono) graphregion(fcolor(white))
graph export m:\...\unem_trade1.eps, replace
translate @Graph m:\...\unem_trade1.wmf, replace
capture drop MV-lower
***********************************************************************************
*or much easier:
*download grinter.ado from Fred Boehmke
net from http://myweb.uiowa.edu/fboehmke/stata/grinter
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
grinter unem, inter(unem_trade) const02(trade) depvar(spend) kdensity yline(0)
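/*Another route, in newer Stata versions (a sketch, assuming Stata 12 or later
for marginsplot and factor-variable syntax): margins and marginsplot reproduce
the marginal-effect plot built by hand above without sslope or grinter.*/
reg spend c.unem##c.trade growthpc depratio left cdem lowwage fdi
margins, dydx(unem) at(trade=(0(20)160))
marginsplot, yline(0)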