STATA. AND DO READ TO THE END
In Stata anything beginning with a * is a comment line. In what follows, every line that is not a Stata command is part of the explanation. I have extracted the commands from one of my programs, but you will not be running this program.
set mem 9000
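Above sets the memory available to Stata (here 9000 kilobytes) in case the default is too little. In Stata 12 and later, memory is managed automatically and set mem is ignored.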
log using "e:\aidvol\aidan3A.smcl"
Above opens a log file in which your commands and output are recorded.
infile aidpcgni growth grdyo gdpculc colony trend id gdppc gdpcuus gdpconus lhaiddy lhgrdy emerg aidtotus area inf earthq floodetc faminedr gendisas pop popden m2 conuscon govuscon oecdgdp totdis lhaconus lhaconlc invsh impsh expsh using "http://staff.bath.ac.uk/hssjrh/stat5a.raw"
This reads the data. When you get to the end of a line, carry on typing: STATA seems able to read a line no matter how long, i.e. do not press Return until the command has been finished. First type infile, then the variable names (the data are in columns, i.e. one row per observation), then, after "using", where the data file is to be found.
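A self-contained sketch of infile, assuming a tiny made-up data file demo.raw (hypothetical name and values) that is first written with Stata's file commands. It also shows ///, which in a do-file tells Stata that a long command continues on the next line; this works in do-files but not when typing in the Command window.

```stata
clear
* write a tiny raw data file: one observation per row, variables in columns
capture file close fh
file open fh using "demo.raw", write text replace
file write fh "1 1 250" _n "1 2 260" _n "2 1 400" _n
file close fh
* /// continues the command on the next line (do-files only)
infile id trend gdppc ///
    using "demo.raw"
list
```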
replace lhaiddy=lhaconlc
generate trend1=trend
There are two commands to create variables in STATA. If the variable is new, i.e. not already defined, use generate (or just gen). If the variable has already been defined, use replace; that is why the first line above uses replace, since lhaiddy was already read in by infile. The second line generates a new variable, trend1, equal to the existing variable trend, i.e. the two are identical. Trend in this case takes the values 1, 2, 3, 4, …, n until the end of the sample. Because this is panel data, the sequence is repeated for every country.
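A minimal sketch with made-up data (the ids and years are hypothetical) showing how a trend that restarts for each country can be built, and why a later change to an existing variable needs replace rather than generate.

```stata
clear
input id year
1 2001
1 2002
1 2003
2 2001
2 2002
end
* trend = 1, 2, 3, ... restarting for each country
bysort id (year): generate trend = _n
* trend1 is new, so generate is used
generate trend1 = trend
* trend1 now exists; generate trend1 = ... would give an error, so use replace
replace trend1 = trend1 - 1
list
```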
iis id
tis trend1
The above two commands identify the structure of the data. id takes the value 1 for the first country, 2 for the second and so on. They are used with panel data, as the computer needs to be able to identify which country and which time period each observation belongs to. The iis statement declares the country identifier; tis declares the time identifier. I believe you always need to specify tis whenever you use time-series data. In Stata 10 and later, the two commands are replaced by a single one: xtset id trend1.
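A minimal sketch of the modern xtset declaration on made-up data (values are hypothetical). Once the panel is declared, the lag operator L. automatically gives a missing value at the start of each country, which does the same job as the id==id[_n-1] checks used below.

```stata
clear
input id trend1 gdppc
1 1 100
1 2 110
1 3 121
2 1 50
2 2 55
end
* declares the panel structure: the modern equivalent of iis id and tis trend1
xtset id trend1
* L. is the lag operator; it is missing for each country's first period
generate gdppcl1 = L.gdppc
list
```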
generate missy=aidpcgni==-999 | gdppc == -999 | id!=id[_n-1] | aidtotus ==-999 | gdpcuus ==-999 | gdpcuus[_n-1] ==-999
The above generates a variable missy which takes the value 1 if aidpcgni equals -999 [that's what the == does] OR [that's what the | does] gdppc equals -999, and so on; otherwise it is 0. -999 is commonly used in data sets as a missing-value code, i.e. the data are not available. I will later specify that actions such as regressions are carried out on a data set which excludes these observations. id!=id[_n-1] means that the value of id [the country identifier] is not equal [!=] to id in the previous observation in the data set. If id and id[_n-1] differ, the current observation is the first one for a country (country 6, say) and the previous observation is the last one for the preceding country (country 5). Taking lagged values, which I use in the regressions, is then not valid, hence the first observation of each country needs to be excluded.
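A small sketch of the missy logic on made-up data (values are hypothetical), showing which observations end up flagged. Note that id[_n-1] is missing for the very first observation, so id!=id[_n-1] is true there as well.

```stata
clear
input id aidpcgni gdppc
1 5 100
1 -999 110
1 7 120
2 3 -999
2 4 60
end
* 1 if either variable has the -999 code, or if this is a country's first observation
generate missy = aidpcgni==-999 | gdppc==-999 | id!=id[_n-1]
list
* the usable observations are those with missy==0
count if missy==0
```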
Examples of data generating statements
generate lpop=log(pop)
generate disastl1 =disast[_n-1] if id==id[_n-1]
generate disast1m=max(floodetc1,earthq1,faminedr1,gendisas1)
summarize earthq1 faminedr1 gendisas1 floodetc1
This reports summary statistics (number of observations, mean, standard deviation, minimum and maximum) for the variables specified.
*
generate disast=max(floodetc,earthq,faminedr,gendisas)
For each observation, disast equals the biggest of the variables specified, i.e. max stands for maximum. Missing values are ignored unless all of the variables are missing.
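A quick check of the row-by-row max() function on made-up values (hypothetical), including observations with missing values.

```stata
clear
input floodetc earthq faminedr gendisas
0 2 1 .
3 . . .
end
* row-by-row maximum; missing values are ignored
generate disast = max(floodetc, earthq, faminedr, gendisas)
list
```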
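generate ldisast1=0
replace ldisast1=log(disast1) if disast1>0
This creates ldisast1 as the log of disast1 where disast1 is positive; otherwise it remains zero. replace only works on a variable that already exists, which is why ldisast1 is first set to zero with generate.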
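A minimal sketch of the log-if-positive pattern on two made-up values (hypothetical data): the zero observation keeps the value 0, the positive one gets its log.

```stata
clear
input disast1
0
5
end
* start at zero, then overwrite with the log only where disast1 is positive
generate ldisast1 = 0
replace ldisast1 = log(disast1) if disast1 > 0
list
```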
generate totdis02 =totdis+totdis[_n-1]+totdis[_n-2] if id==id[_n-2]
This creates the sum of disasters in current and previous two years, if
the previous two observations relate to the same country.
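A sketch of the three-year sum on made-up data (hypothetical values), assuming the data are sorted by country and then year. The sum is only computed from the third observation of each country onwards; earlier observations are left missing.

```stata
clear
input id totdis
1 1
1 2
1 3
2 10
2 20
2 30
end
* current plus previous two years, only if all three belong to the same country
generate totdis02 = totdis + totdis[_n-1] + totdis[_n-2] if id==id[_n-2]
list
```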
generate aidpcyal1=aidpcya[_n-1] if id==id[_n-2]
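This creates a one-period lag of aidpcya, but only if the observation two periods before the current one is for the same country. I am not sure why I checked TWO periods back: id==id[_n-1] is enough for a one-period lag, and the stricter condition also sets the lag to missing for the second observation of each country.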
generate lhaiddypl1=lhaiddyp[_n-1] if id==id[_n-1]
This creates the one-period lag of lhaiddyp, but only where the previous observation is for the same country [note the use of ==].
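A small comparison on made-up data (hypothetical variable x) of the two lag conditions above: checking one observation back versus two. Only the second observation of each country differs.

```stata
clear
input id x
1 10
1 11
1 12
2 20
2 21
end
* lag checked against the previous observation
generate xl1_a = x[_n-1] if id==id[_n-1]
* same lag, but checked against the observation two back (as for aidpcyal1)
generate xl1_b = x[_n-1] if id==id[_n-2]
list
```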
generate ssa =area==1
Creates a dummy variable equal to 1 if area equals 1 and 0 otherwise [note the use of ==]. ssa stands for sub-Saharan Africa; in the data set, area identifies which part of the world each country comes from.
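A sketch on made-up data (hypothetical values) of one detail worth knowing: an observation with a missing area gets ssa = 0, not missing. Adding an if condition keeps it missing instead.

```stata
clear
input area
1
2
.
end
* area==1 is true (1) or false (0); a missing area counts as false
generate ssa = area==1
* with the if condition, ssa2 stays missing where area is missing
generate ssa2 = area==1 if !missing(area)
list
```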
generate lgdprat=log(gdpconus/gdpconus[_n-1]) if gdpconus>0 & id==id[_n-1]
This variable is created only where both gdpconus>0 and id==id[_n-1] are true [& means AND]; elsewhere it is missing.
xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disast1 lpopden wgrowth if missy1in==0,fe
This does a panel data regression with fixed effects. xtreg indicates that panel-data techniques are to be used. The dependent variable comes first (invsh) and the right-hand-side variables follow. The regression uses only the observations where missy1in equals 0, and it is estimated with fixed effects [fe].
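A self-contained sketch of a fixed-effects regression restricted with an if condition. The panel (10 countries, 10 periods) and the names y, x and missy are simulated and hypothetical; missy here flags only the first observation of each country, mimicking the id!=id[_n-1] part of the original.

```stata
clear
set seed 12345
set obs 100
* hypothetical panel: 10 countries observed for 10 periods each
generate id = ceil(_n/10)
bysort id: generate t = _n
xtset id t
generate x = rnormal()
generate y = 1 + 2*x + id/10 + rnormal()
* flag the first observation of each country
generate missy = id != id[_n-1]
* fixed-effects regression on the observations with missy==0
xtreg y x if missy==0, fe
```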
The sequence of commands below does the Hausman test of random versus fixed effects. I searched the web for hausman, fixed and stata and came up with a number of references, including: http://www.stata.com/help.cgi?hausman
Look it up: it helps explain the Hausman test and may give greater clarity to what we did in the notes.
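xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 disasl12 lpopden wgrowth if missy2in==0,fe
est store fixed
xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 disasl12 lpopden wgrowth if missy2in==0,re
xttest0
hausman fixed
est store fixed saves the fixed-effects results under the name fixed; the second regression is estimated with random effects [re]. xttest0, run after the random-effects regression, gives the Breusch-Pagan Lagrange multiplier test for random effects, and hausman fixed compares the stored fixed-effects estimates with the random-effects estimates just obtained.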
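A self-contained sketch of the same fixed/random-effects sequence on simulated data. The panel and the names y and x are hypothetical; the country effect is built to be unrelated to x, so random effects should be acceptable here.

```stata
clear
set seed 2024
set obs 200
* hypothetical panel: 10 countries, 20 periods each
generate id = ceil(_n/20)
bysort id: generate t = _n
xtset id t
generate x = rnormal()
generate y = 1 + 2*x + id/20 + rnormal()
xtreg y x, fe
estimates store fixed
xtreg y x, re
* Breusch-Pagan LM test for random effects (run after xtreg, re)
xttest0
* compare stored fixed-effects results with the most recent (random-effects) results
hausman fixed .
```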
The sequence of commands below does the Ramsey RESET test. After the fixed-effects regression, predict obtains the residuals, stored in conprs. I subtract these residuals from congdpr to obtain predicted values, which I then square, i.e. conprs ends up being the square of the predicted value, and it is then added to the regression. An alternative would be predict conprs, xb, which gives fitted values directly; note, however, that after a fixed-effects regression xb excludes the country fixed effects, whereas congdpr minus the residual includes them.
xtreg congdpr trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disasl12 lpopden wgrowth if missy2con==0,fe
*Ramsey Reset
* after xtreg, the e option gives the residuals
predict conprs, e
replace conprs=congdpr-conprs
replace conprs=conprs*conprs
xtreg congdpr trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disasl12 lpopden wgrowth conprs if missy2con==0,fe
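A self-contained sketch of the RESET steps on simulated data, ending with an explicit test that the coefficient on the squared fitted value is zero. The panel and the names y, x, ehat and yhat2 are hypothetical; the data are generated with a quadratic term so that the linear model is deliberately misspecified.

```stata
clear
set seed 7
set obs 200
* hypothetical panel: 10 countries, 20 periods each
generate id = ceil(_n/20)
bysort id: generate t = _n
xtset id t
generate x = rnormal()
* the true relationship is quadratic, so a linear model is misspecified
generate y = 1 + 2*x + x^2 + id/20 + rnormal()
xtreg y x, fe
* e = residual after xtreg; y - e is the fitted value including the country effect
predict ehat, e
generate yhat2 = (y - ehat)^2
* add the squared fitted value and test whether its coefficient is zero
xtreg y x yhat2, fe
test yhat2
```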
Below does an OLS regression. The option robust uses White's heteroskedasticity-consistent estimator to correct the standard errors and t statistics for heteroskedasticity; the coefficients themselves are unchanged. It's an option: you do not have to use it.
regress invsh trend aidpcya lhaiddyp lhaiddyn lhaiddypl1 lgdppcl2 asia samerica ssa disast1 lpopden wgrowth if missy1in==0,robust
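A quick way to see what robust changes, using the auto data set that ships with Stata (the choice of price, mpg and weight is purely illustrative): the coefficients are identical in both regressions, only the standard errors and t statistics differ.

```stata
sysuse auto, clear
* ordinary OLS standard errors
regress price mpg weight
* same coefficients, heteroskedasticity-robust (White) standard errors
regress price mpg weight, robust
```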
EXAMPLES
Copy the following block and run it in STATA:
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
regress api00 acs_k3 meals full
describe
codebook api00 acs_k3 meals full yr_rnd
summarize api00 acs_k3 meals full
histogram acs_k3
graph matrix api00 acs_k3 meals full, half
twoway (scatter api00 enroll) (lfit api00 enroll)
and compare the results with:
http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter1/statareg1.htm
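To record a session, start a log with one of:
log using "record.smcl"
log using "record.log"
The first writes the log in Stata's own formatted (SMCL) format, the second as plain text. Only one such log can be open at a time, so close it when you have finished with:
log close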
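A minimal sketch of a complete logging cycle, assuming you want a plain-text log called record.log in the current directory: the text option forces plain text, and replace overwrites a log left over from an earlier run.

```stata
* close any log left open from an earlier run
capture log close
* text gives a plain-text log; replace overwrites an existing file
log using "record.log", text replace
display "this line is recorded in record.log"
log close
```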