STATA. AND DO READ TO THE END

In Stata anything beginning with a * is a comment line. In what follows the Stata commands are interleaved with explanation (shown in blue in the original handout). I have extracted this from one of my programs, but you will not be running this program.

    set mem 9000

The above sets the memory in case the default is too little. (Stata 12 and later manage memory automatically and ignore this command.)

    log using "e:\aidvol\aidan3A.smcl"

The above creates a log in which your commands and output are recorded.

    infile aidpcgni growth grdyo gdpculc colony trend id gdppc gdpcuus gdpconus lhaiddy lhgrdy emerg aidtotus area inf earthq floodetc faminedr gendisas pop popden m2 conuscon govuscon oecdgdp totdis lhaconus lhaconlc invsh impsh expsh using "http://staff.bath.ac.uk/hssjrh/stat5a.raw"

This reads the data. When you get to the end of a line on screen, carry on typing: Stata seems able to read the line no matter how long it is, i.e. do not press return until the command has been finished. You first type infile, then the variable names (the variables are in columns, i.e. the file is read observation by observation), then where the data file is to be found with “using”.

    replace lhaiddy=lhaconlc
    generate trend1=trend

There are two commands for assigning values to variables in Stata. If the variable is new, i.e. not already defined, use generate (or just gen). If the variable has already been defined, use replace. Hence in the above we generate a new variable, trend1, equal to the existing variable trend, i.e. the two are identical. Trend in this case takes the values 1, 2, 3, 4, …, n up to the end of the sample. As this is panel data, this is repeated for every [in this case] country.

    iis id
    tis trend1

The above two commands identify the structure of the data. id takes a value of 1 for the first country, 2 for the second and so on. They are used with panel data, as the computer needs to be able to identify to which country and time period each observation belongs. The iis statement does the first of these tasks; the tis statement does the second.
I have an idea that you always need to specify tis whenever you use time series data.

    generate missy=aidpcgni==-999 | gdppc == -999 | id!=id[_n-1] | aidtotus ==-999 | gdpcuus ==-999 | gdpcuus[_n-1] ==-999

The above generates a variable ‘missy’ which takes a value of 1 if aidpcgni equals -999 [that’s what the == does] OR [that’s what the | does] gdppc equals -999, etc., and 0 otherwise. -999 is commonly used in data sets as a missing value code, i.e. the data are not available. I will later specify that actions, e.g. regressions, are carried out on a sample which excludes missing observations. id!=id[_n-1] means the value of id [the country identifier] is not equal [!=] to id in the previous observation in the data set. If id and id[_n-1] are not equal, then the current observation is the first for country 6 [e.g.] and the previous one is the final observation for country 5. Hence taking lagged values, which I use in the regressions, is not valid there, and the first observation of each country needs to be excluded.

Examples of data generating statements

    generate lpop=log(pop)
    generate disastl1 =disast[_n-1] if id==id[_n-1]
    generate disast1m=max(floodetc1,earthq1,faminedr1,gendisas1)
    summarize earthq1 faminedr1 gendisas1 floodetc1

The last command summarizes information, mean etc., on the variables specified.

    * generate disast=max(floodetc,earthq,faminedr,gendisas)

disast = the biggest of the variables specified, i.e. max stands for maximum.

    replace ldisast1=log(disast1) if disast1>0

This sets ldisast1 to the log of disast1 if disast1 is positive; if not, ldisast1 remains at zero.

    generate totdis02 =totdis+totdis[_n-1]+totdis[_n-2] if id==id[_n-2]

This creates the sum of disasters in the current and previous two years, if the previous two observations relate to the same country.

    generate aidpcyal1=aidpcya[_n-1] if id==id[_n-2]

This creates a lagged variable if the observation two periods prior to the current one is for the same country (not sure why TWO periods prior).
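The guarded lags and the -999 missing value code discussed above can be tried on a small made-up panel. This is only a sketch: the variable aid and all the numbers are invented for illustration, and mvdecode, xtset and the L. lag operator are alternatives available in more recent versions of Stata rather than what my program uses. As far as I know, xtset does the job of iis and tis in one command.

```stata
* Toy panel: two countries (id) observed for three years (trend).
* The variable aid and all values are made up for illustration.
clear
input id trend aid
1 1 10
1 2 -999
1 3 12
2 1 20
2 2 21
2 3 22
end

* Turn the -999 code into Stata's own missing value (.)
mvdecode aid, mv(-999)

* Declare the panel structure: id is the country, trend the time period
xtset id trend

* Lag by hand, guarded so it never reaches back into the previous country
sort id trend
generate aidl1 = aid[_n-1] if id==id[_n-1]

* The same lag with the lag operator, which respects the panel structure
generate aidl1b = L.aid

list, sepby(id)
```

In the listing the two lagged variables should agree, and both should be missing for the first observation of each country. Once -999 has been converted to missing, regressions drop those observations automatically, so a flag such as missy is needed only for the other exclusions.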
    generate lhaiddypl1=lhaiddyp[_n-1] if id==id[_n-1]

This creates the lag if the previous observation is for the same country [note the use of ==].

    generate ssa =area==1

This creates a dummy variable equal to 1 if area equals 1 [note the use of ==] and 0 otherwise. ssa stands for sub-Saharan Africa, and in the data set area uniquely identifies which part of the world the country comes from.

    generate lgdprat=log(gdpconus/gdpconus[_n-1]) if gdpconus>0 & id==id[_n-1]

This variable is created only if both gdpconus>0 and id==id[_n-1] are true [& means AND].

    xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disast1 lpopden wgrowth if missy1in==0,fe

This does a panel data regression with fixed effects. xtreg indicates that panel data techniques are to be used. The dependent variable comes first, invsh, and the right-hand-side variables follow. The regression is run on the sample where missy1in equals 0, and it is done with fixed effects [fe].

The sequence of commands below does the Hausman test for random vs fixed effects. I did a search on the web for hausman fixed and stata and came up with a number of references, including:

http://www.stata.com/help.cgi?hausman

Look it up: it helps explain the Hausman test and perhaps gives greater clarity to what we did in the notes.

    xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 disasl12 lpopden wgrowth if missy2in==0,fe
    est store fixed
    xtreg invsh trend aidpcya lhaiddyp lhaiddyn lgdppcl2 disasl12 lpopden wgrowth if missy2in==0,re
    xttest0
    hausman fixed

est store fixed saves the fixed effects results under the name fixed, and hausman fixed then compares them with the random effects results just estimated. xttest0 reports the Breusch and Pagan Lagrange multiplier test for random effects.

The sequence of commands below does the Ramsey RESET test. It obtains predicted values for the left-hand-side variable and residuals, which I call res. I subtract these residuals from congdpr to obtain predicted values, which I then square (i.e. conprs ends up being the square of the predicted value, which is then used in the regression). For some reason I did not like the conprs from the statement ‘predict conprs, res’, which should result in conprs giving predicted values.
    xtreg congdpr trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disasl12 lpopden wgrowth if missy2con==0,fe
    *Ramsey Reset
    predict conprs, res
    replace conprs=congdpr-res
    replace conprs=conprs*conprs
    xtreg congdpr trend aidpcya lhaiddyp lhaiddyn lgdppcl2 asia samerica ssa disasl12 lpopden wgrowth conprs if missy2con==0,fe

The command below does an OLS regression. ‘robust’ uses White’s heteroskedasticity-consistent estimator to correct the standard errors and t statistics for heteroskedasticity. It is an option; you do not have to have it.

    regress invsh trend aidpcya lhaiddyp lhaiddyn lhaiddypl1 lgdppcl2 asia samerica ssa disast1 lpopden wgrowth if missy1in==0,robust

EXAMPLES

Block copy the following and run in Stata:

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
    regress api00 acs_k3 meals full
    describe
    codebook api00 acs_k3 meals full yr_rnd
    summarize api00 acs_k3 meals full
    histogram acs_k3
    graph matrix api00 acs_k3 meals full, half
    twoway (scatter api00 enroll) (lfit api00 enroll)

and compare with:

http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter1/statareg1.htm

    log using "record.smcl"
    log using "record.log"
    log close
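If you want to try the Hausman sequence described earlier from start to finish, the following sketch may help. It assumes you have internet access and uses nlswork, one of Stata's own example data sets, not the aid data. The choice of regressors (age, tenure, hours) and the log file name are arbitrary and only for illustration.

```stata
* Close any log left open, then start a plain-text log
capture log close
log using "hausman_sketch.log", replace text

* Example panel data supplied by Stata (not the aid data set)
webuse nlswork, clear
xtset idcode year

* Fixed effects first, and store the results under the name fixed
xtreg ln_wage age tenure hours, fe
estimates store fixed

* Random effects, then the Breusch-Pagan LM test for random effects
xtreg ln_wage age tenure hours, re
xttest0

* Hausman test: stored fixed effects vs the current (random effects) results
hausman fixed .

log close
```

The full stop in hausman fixed . stands for the most recent estimation results; leaving it out, as in my program, should do the same thing. A small p-value from the Hausman test is usually read as evidence in favour of fixed effects.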