Lab 1: Introduction to using Stata 7

*Lab 1 - 3: Stata Introduction, Tests, Violations of GM, Specification Issues
*Intro
*Open stata, change the directory to your drive
cd c:\
*open the datafile lab1.dta
use lab1.dta
*open a log-file to save the results:
log using name.log, [replace append]
*Note: at the end of the session you have to close the log file by:
*log close
*use commands that describe the characteristics of the variables:
descr x1 x2
list x10
sum x1-x10
tab x9 x10
tab1 x8 x9 x10
tabstat x1 x2 x3 x4 x5 x6 y, statistics(min mean median p25 p75 max sd var range)
*generate new variables
gen x_inter=x4*x9
egen x1_mean=mean(x1)
*look at some correlations and covariances
corr x1 x2 x3 x4 x5
corr x1 x2 x3 x4 x5, cov
*use the drop down menu to look at some histograms and density plots
twoway (scatter x1 x3)
kdensity x1, norm
* T-Tests and Correlation
/*download the file garmit_esspanel1.dta onto your drive
this file contains a panel of 18 OECD countries from 1961 to 1994.
Background reading: Garrett-Mitchell paper on government spending*/
*open stata
*change the working directory to your m drive:
cd "c:\...\..."
*open the data file in stata:
use garmit_esspanel1.dta
*summarize the main characteristics of the interesting variables:
sum spend unem growthpc depratio left cdem trade lowwage fdi skand
*run some simple t-tests:
ttest spend=30
ttest spend=41
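The one-sample t statistic that `ttest` reports is just the standardized distance of the sample mean from the hypothesized value. A minimal Python sketch of that arithmetic, with made-up spending values (illustrative only, not Stata output):

```python
import math

# t statistic for H0: mean(x) = mu0, the quantity "ttest spend=30" reports
def one_sample_t(xs, mu0):
    n = len(xs)
    mean = sum(xs) / n
    # sample variance with the n-1 (Bessel) correction, as Stata uses
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)

# hypothetical spending values tested against H0: mean = 30
print(one_sample_t([28.0, 31.0, 33.0, 29.0, 34.0], 30.0))
```

Stata additionally turns this statistic into a p-value from the t distribution with n-1 degrees of freedom.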
/*now test whether government spending is different for different
regions/political systems */
ttest spend, by(skand)
ttest spend, by(skand) unequal
ttest spend, by(angel)
ttest spend, by(angel) unequal
/*or test whether the dispersion of government spending is different in
different regions:*/
sdtest spend, by(skand)
sdtest spend, by(angel)
/*Is the dependent variable "spend" normally distributed (enough)? */
hist spend
kdensity spend
swilk spend
sktest spend
/* calculate some simple bivariate covariances and correlations and
interpret the results, what is the problem with bivariate correlation? */
corr spend unem
corr spend left
corr spend unem, cov
corr spend left, cov
/*estimate several bivariate OLS models for government spending and
interpret the regression results:*/
reg spend unem
reg spend left
reg spend growthpc
reg spend trade
reg spend depratio
* OLS and Violations of GM
*summarize the main characteristics of the interesting variables:
sum spend unem growthpc depratio left cdem trade lowwage fdi skand
*estimate a basic model for government spending and interpret the
*regression table: Interpret the regression results, coefficients, standard
*errors, confidence intervals, R², F-Test
reg spend unem growthpc depratio left cdem trade lowwage fdi
*now change the confidence levels (note the default setting is 95% levels)
set level 99
set level 90
*estimate the same regression again, what has changed, how and why?
*Set the level back to default:
set level 95
*estimate the model again:
reg spend unem growthpc depratio left cdem trade lowwage fdi
*Now run the same regression but estimate standardized coefficients – how
*to interpret standardized coefficients? Why do we need standardized
*coefficients?
reg spend unem growthpc depratio left cdem trade lowwage fdi, beta
*AV-plots (added-variable plots) are for multivariate models to show the
*ceteris paribus effect a single independent variable has:
avplots
avplot unem
*for linear OLS models the marginal effect of a single independent variable
*is equal for all values of the independent variable:
mfx
mfx compute, at(mean)
mfx compute, at(median)
*__________________________________________________________________________
*Multicollinearity:
*calculate correlation coefficients for all explanatory variables, do you
*find problems of multi-collinearity?
corr unem growthpc depratio left cdem trade lowwage fdi skand
*What is the trade-off between multi-collinearity and omitted variable bias?
*If there is perfect multicollinearity we don't have to worry, Stata does
*the job:
*Linear transformations of a variable are perfect multicollinear to the
*original variable:
gen unem2=2*unem+3
corr unem unem2
reg spend unem unem2 if year==1986
*What does Stata do in case of perfect multicollinearity?
*Now look at two variables that are highly correlated but not
*multicollinear, what is the problem?
gen unem3=2*unem+3*invnorm(uniform())
corr unem unem3
reg spend unem unem3 if year==1986
reg spend unem skand if year==1986
reg spend unem unem3 skand if year==1986
*run the original regression again:
reg spend unem growthpc depratio left cdem trade lowwage fdi
*Run a variance inflation factor test for higher order multicollinearity:
estat vif
estat vif, uncentered
*How to interpret the vif, what does vif measure?
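For each regressor, `estat vif` reports 1/(1 - R2_j), where R2_j comes from regressing x_j on all the other regressors. A one-line sketch of that definition (illustrative, not Stata output):

```python
def vif(r2_j):
    # variance inflation factor: how much var(b_j) is inflated by the
    # correlation of x_j with the other regressors; 1 means no inflation
    return 1 / (1 - r2_j)

print(vif(0.9))  # about 10; a common rule of thumb flags VIF > 10
```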
* Principal component analysis:
corr trade lowwage fdi
factor trade lowwage fdi, pcf
predict score, r
corr trade lowwage fdi
*__________________________________________________________________________
*Outliers:
*test for outliers:
reg spend unem growthpc depratio left cdem trade lowwage fdi
dotplot spend
symplot spend
rvfplot
lvr2plot
lvr2plot, ml(country)
dfbeta
*Solution: jackknife and bootstrapping:
jackknife _b _se, eclass saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand
bootstrap _b _se, reps(1000) saving(name.dta): reg spend unem growthpc depratio left cdem trade lowwage fdi skand
*__________________________________________________________________________
*Heteroscedasticity and Omitted Variable Bias
*calculate predicted values and residuals:
reg spend unem growthpc depratio left cdem trade lowwage fdi
predict spend_hat
predict spend_resid, resid
*Now create scatterplots for the residuals against some of the explanatory
*variables and the country codes: what can you see? Omitted variable bias,
*heteroskedasticity?
twoway (scatter spend_resid unem)
twoway (scatter spend_resid cc)
twoway (scatter spend_resid growthpc)
*Stata provides built-in tests for omitted variable bias and
*heteroskedasticity: how to interpret the test results?
estat hettest
estat ovtest
estat szroeter unem growthpc depratio left cdem trade lowwage fdi
*What can we do about Heteroskedasticity and omitted variable bias?
reg spend unem growthpc depratio left cdem trade lowwage fdi skand
*What can be observed, how to interpret the coefficient for "skand"?
*Now do the same tests again for the new model and interpret the results:
predict spend_hat2
predict spend_resid2, resid
estat hettest
estat ovtest
twoway (scatter spend spend_hat2 if year==1984, mlabel(country))
twoway (scatter spend_resid2 cc)
twoway (scatter spend_resid2 unem)
estat szroeter unem growthpc depratio left cdem trade lowwage fdi skand
*Just treating the standard errors: robust White SEs:
reg spend unem growthpc depratio left cdem trade lowwage fdi, robust
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(robust)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(cluster cc)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(bootstrap)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(jackknife)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc2)
reg spend unem growthpc depratio left cdem trade lowwage fdi, vce(hc3)
*GLS: robust Huber-White Sandwich Estimator
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, p(h)
*Panel Heteroskedasticity
cd m:\
use "...path...\greene5.dta", clear
log using "...path...\lab1.log", replace
tsset firm year
*Estimate groupwise heteroskedastic model and test for heteroskedasticity
reg invest fval sval
display "White test"
predict res, resid
gen res2 = res^2
gen fval2 = fval^2
gen sval2 = sval^2
gen fs = fval * sval
reg res2 fval sval fval2 sval2 fs
ereturn list
scalar white = e(N)*e(r2)
display white
scal whitesig = chi2tail(e(df_m)-1,white)
display whitesig
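The White statistic computed above is N * R-squared from the auxiliary regression of the squared residuals on the regressors, their squares, and their cross-product. A pure-Python sketch of that arithmetic with made-up numbers (not the greene5.dta results):

```python
# R^2 from observed values and fitted values of the auxiliary regression
def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def white_statistic(n_obs, r2_aux):
    # N * R^2: asymptotically chi-squared under homoskedasticity,
    # df = number of auxiliary regressors
    return n_obs * r2_aux

r2_aux = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(white_statistic(100, r2_aux))
```

Large values reject homoskedasticity; Stata's `chi2tail()` turns the statistic into the p-value.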
display "Lagrange Multiplier test for groupwise heteroskedasticity in panel data"
scalar s2 = e(rss)/e(N)
xtgls invest fval sval, i(firm) p(h) c(i)
matrix list e(Sigma)
matrix sis = (1/s2)*e(Sigma)
matrix list sis
matrix sis1 = sis - I(5)
matrix list sis1
matrix sis2 = vecdiag(sis1)'
matrix list sis2
matrix sis3 = sis2'*sis2
matrix list sis3
matrix lmh = ((e(N)/e(N_g))/2)*sis3
scalar lm = lmh[1,1]
display "Lagrange Multiplier statistic = " lm
scalar lmsig = chi2tail((e(N_g)-1),lm)
display "Significance level = " lmsig
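The matrix algebra above implements the groupwise LM statistic (T/2) * sum over groups of (sigma_i^2 / sigma^2 - 1)^2, chi-squared with N_g - 1 degrees of freedom. The same formula in a short Python sketch, with hypothetical group variances:

```python
def lm_groupwise(group_vars, pooled_var, t_per_group):
    # (T/2) * sum over groups of (sigma_i^2 / sigma^2 - 1)^2:
    # large when group residual variances differ from the pooled variance
    return (t_per_group / 2) * sum((s / pooled_var - 1) ** 2 for s in group_vars)

# e.g. five firms observed over 20 years, hypothetical residual variances:
print(lm_groupwise([2.0, 1.5, 1.0, 0.5, 0.25], 1.05, 20))
```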
/*Estimate groupwise heteroskedastic and cross-sectionally correlated model
and test for cross-sectional correlation*/
xtgls invest fval sval, i(firm) p(c) c(i)
xtgls invest fval sval, i(firm) p(h) c(i) igls
display "Lagrange multiplier test for cross-sectional correlation in panel data (MLE)"
predict yfit , xb
gen resgls = invest - yfit
set matsize 100
mkmat resgls, matrix(rgls)
matrix egls1 = rgls[1..20,1]
matrix egls2 = rgls[21..40,1]
matrix egls3 = rgls[41..60,1]
matrix egls4 = rgls[61..80,1]
matrix egls5 = rgls[81..100,1]
matrix egls = egls1, egls2, egls3, egls4, egls5
matrix list egls
matrix vcv = egls' * egls
matrix list vcv
matrix csc = corr(vcv)
matrix list csc
matrix lmm = csc[2...,1]\csc[3...,2]\csc[4...,3]\csc[5,4]
matrix list lmm
matrix lmlm = 10 * (lmm' * lmm)
scalar lml = lmlm[1,1]
display "Lagrange Multiplier statistic = " lml
scalar lmlsig = chi2tail(10,lml)
display "Significance level (alpha) = " lmlsig
display "Likelihood ratio test for MLE for cross-sectional correlation in panel data"
matrix si1 = (egls1' * egls1)/20
matrix si2 = (egls2' * egls2)/20
matrix si3 = (egls3' * egls3)/20
matrix si4 = (egls4' * egls4)/20
matrix si5 = (egls5' * egls5)/20
scalar lsi = ln(si1[1,1]) + ln(si2[1,1]) + ln(si3[1,1]) + ln(si4[1,1]) + ln(si5[1,1])
display lsi
quietly xtgls invest fval sval, i(firm) p(c) c(i) igls
scalar ldets = ln(det(e(Sigma)))
display ldets
scalar llr = 20 * (lsi - ldets)
display "Likelihood ratio statistic = " llr
scalar llrsig = chi2tail(10,llr)
display "Significance level (alpha) = " llrsig
*Now reestimate the model using the Beck & Katz approach:
xtpcse invest fval sval, c(i)
log close
*__________________________________________________________________________
use "...path...\garmit_esspanel1.dta", clear
*open log-file:
log using "...path...\lab1.log", append
*run basic model: OLS linear regression
reg spend unem growthpc depratio left cdem trade lowwage fdi
*test for heteroskedasticity
*check for heteroskedasticity
estat hettest
estat szroeter unem growthpc depratio left cdem trade lowwage fdi
*check for omitted variable bias
estat ovtest
* run models dealing with panel correlation and heteroskedasticity
reg spend unem growthpc depratio left cdem trade lowwage fdi, robust
reg spend unem growthpc depratio left cdem trade lowwage fdi, cluster(cc)
* compare different models
* iid error structure
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, panels(iid)
* heteroscedastic but uncorrelated error structure
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, panels(h)
* heteroscedastic and correlated error structure
xtgls govcons growthpc depratio left cdem trade lowwage, panels(c)
* independent autocorrelation structure
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(i)
* AR1 autocorrelation structure
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(ar1)
* panel specific AR1 autocorrelation structure
xtgls spend unem growthpc depratio left cdem trade lowwage fdi, corr(psar1)
*Parks Kmenta
xtgls govcons growthpc depratio left cdem trade lowwage, panels(c) corr(psar1)
*xtpcse
xtpcse spend unem growthpc depratio left cdem trade lowwage fdi
xtpcse spend unem growthpc depratio left cdem trade lowwage fdi, corr(ar1)
xtpcse spend unem growthpc depratio left cdem trade lowwage fdi, corr(psar1)
*__________________________________________________________________________
*Autocorrelation:
*now let's look at a single time-series, e.g. Germany (or the UK) - what has
*changed, what is the difference from the models above?
reg spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"
*Do the same tests again: what can you observe?
estat hettest
estat ovtest
*Now let's turn to another violation of Gauss-Markov assumptions: autocorrelation,
*serial correlation and test for it: Durbin-Watson statistic and Breusch-Godfrey
*test, how to interpret the results?
estat dwatson
estat bgodfrey
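The Durbin-Watson statistic that `estat dwatson` reports is the sum of squared first differences of the residuals over their sum of squares. Values near 2 suggest no first-order serial correlation, values near 0 positive, values near 4 negative autocorrelation. A quick sketch on made-up residuals:

```python
def durbin_watson(resids):
    # d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, ranging from 0 to 4
    num = sum((resids[t] - resids[t - 1]) ** 2 for t in range(1, len(resids)))
    den = sum(e ** 2 for e in resids)
    return num / den

print(durbin_watson([1.0, 1.2, 0.9, 1.1]))    # smooth residuals: d well below 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs push d above 2
```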
*Or a simpler test: what can we observe?
predict spend_residger, resid
gen lagspend_residger=l.spend_residger
reg spend_residger lagspend_residger unem growthpc depratio left cdem trade lowwage fdi if country == "Germany"
*the simplest way to deal with serial correlation is to include the lagged
*value of the dependent variable on the right hand side of the model - the
*LDV (lagged dependent variable; BUT this opens another can of worms... We
*will talk about this in more detail when we talk about time series models):
*How to interpret the coefficient of the LDV?
*Play around with including and excluding variables from the model and do
*some more omitted variable bias and heteroskedasticity tests...
*Now do the same for the UK!
reg spend spendl unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"
*And do the tests again:
estat dwatson
estat bgodfrey
*Run a Prais-Winsten model:
prais spend unem growthpc depratio left cdem trade lowwage fdi if country=="Germany"
*Play around a little
*__________________________________________________________________________________
*test for functional form:
reg spend unem growthpc depratio left cdem trade lowwage fdi
acprplot unem, mspline
acprplot left, mspline
*etc…
gen ln_unem = ln(unem)
reg spend ln_unem growthpc depratio left cdem trade lowwage fdi
gen unem_sq=unem^2
reg spend unem unem_sq growthpc depratio left cdem trade lowwage fdi
*what effect can we observe? Calculate the “turning point”!
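With a linear and a squared term the marginal effect is b1 + 2*b2*unem, so the turning point sits where that expression crosses zero, at -b1/(2*b2). A tiny sketch with hypothetical coefficients (not estimates from this dataset):

```python
def turning_point(b_linear, b_quadratic):
    # vertex of b_linear*x + b_quadratic*x^2: where the marginal effect
    # b_linear + 2*b_quadratic*x changes sign
    return -b_linear / (2 * b_quadratic)

# hypothetical coefficients on unem and its square:
print(turning_point(1.8, -0.06))  # about 15: effect peaks near 15% unemployment
```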
*__________________________________________________________________________________
*Interpretation of a dummy variable and interaction effects:
reg spend skand if year==1986
reg spend unem skand
gen skand_unem=unem*skand
reg spend unem skand skand_unem
reg spend unem skand skand_unem growthpc depratio left cdem trade lowwage fdi
*Interaction effects:
gen unem_trade=unem*trade
reg spend unem trade growthpc depratio left cdem lowwage fdi
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
ssc install sslope
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade)
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade)
*graphical display of IA effects:
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(unem trade unem_trade) graph
sslope spend unem trade unem_trade growthpc depratio left cdem lowwage fdi, i(trade unem unem_trade) graph
sum trade unem
*Program marginal effects of IA effects:
***********************************************************************************
capture drop MV-lower
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
generate MV=((_n-1)*10)
replace MV=. if _n>17
matrix b=e(b)
matrix V=e(V)
scalar b1=b[1,1]
scalar b2=b[1,2]
scalar b3=b[1,3]
scalar varb1=V[1,1]
scalar varb2=V[2,2]
scalar varb3=V[3,3]
scalar covb1b3=V[1,3]
scalar covb2b3=V[2,3]
scalar list b1 b2 b3 varb1 varb2 varb3 covb1b3 covb2b3
gen conb=b1+b3*MV if _n<=17
gen conse=sqrt(varb1+varb3*(MV^2)+2*covb1b3*MV) if _n<=17
gen a=1.96*conse
gen upper=conb+a
gen lower=conb-a
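The block above computes, for each moderator value MV, the conditional marginal effect b1 + b3*MV and its standard error sqrt(var(b1) + MV^2*var(b3) + 2*MV*cov(b1,b3)). The same arithmetic in compact Python, with hypothetical coefficient estimates (not output from this model):

```python
import math

def conditional_effect(b1, b3, mv, var_b1, var_b3, cov_b1b3):
    # marginal effect of x1 in y = b1*x1 + b2*z + b3*(x1*z) + ...,
    # evaluated at moderator value z = mv, with a 95% confidence band
    effect = b1 + b3 * mv
    se = math.sqrt(var_b1 + var_b3 * mv ** 2 + 2 * cov_b1b3 * mv)
    return effect, effect - 1.96 * se, effect + 1.96 * se

# hypothetical estimates: b_unem = 0.5, b_interaction = 0.01
for mv in (0, 50, 100):
    print(mv, conditional_effect(0.5, 0.01, mv, 0.04, 0.00001, -0.0004))
```

Plotting the three returned series against MV gives exactly the kind of marginal-effect graph the `graph twoway` call below produces.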
graph twoway (line conb MV, clwidth(medium) clcolor(black))/*
*/ (line upper MV, clpattern(dash) clwidth(thin) clcolor(black))/*
*/ (line lower MV, clpattern(dash) clwidth(thin) clcolor(black)), /*
*/ xlabel(0 20 40 60 80 100 120 140 160, labsize(2.5)) /*
*/ ylabel(-0.5 0 0.5 1 1.5 2, labsize(2.5)) yscale(noline) xscale(noline) /*
*/ legend(col(1) order(1 2) label(1 "Marginal Effect of Unemployment on Spending") /*
*/ label(2 "95% Confidence Interval") label(3 " ")) yline(0, lcolor(black)) /*
*/ title("Marginal Effect of Unemployment on Spending as Trade Openness changes", size(4)) /*
*/ subtitle(" " "Dependent Variable: Government Spending" " ", size(3)) /*
*/ xtitle("Trade Openness", size(3)) xsca(titlegap(2)) ysca(titlegap(2)) /*
*/ ytitle("Marginal Effect of Unemployment", size(3)) scheme(s2mono) graphregion(fcolor(white))
graph export m:\...\unem_trade1.eps, replace
translate @Graph m:\...\unem_trade1.wmf, replace
capture drop MV-lower
***********************************************************************************
*or much easier:
*download grinter.ado from Fred Boehmke
net from http://myweb.uiowa.edu/fboehmke/stata/grinter
reg spend unem trade unem_trade growthpc depratio left cdem lowwage fdi
grinter unem, inter(unem_trade) const02(trade) depvar(spend) kdensity yline(0)