Programs and Simulation in Stata (for ch. 2)

First version: make lognormal data, and summarize the data. This defines a program, then runs it.

    program define lognormaldata
        drop _all
        set obs 100
        gen mynewvar = exp(rnormal(0,1))
        summarize mynewvar
    end
    lognormaldata

Note on local variables: in Stata, a local variable (Stata's own term is "local macro") is a string of text, or a number, used in a program but forgotten when the program ends. An example is below. Each local variable has a name, such as localname. The value of the variable is plugged in wherever the text `localname' appears. Note the difference between the left and right quotes around the name: the left quote is a backtick (`) and the right quote is a straight apostrophe (').

    program define usealocal
        local i = 3
        display `i'+7
    end
    usealocal

Second version: add optional parameters for the number of observations, mu, and sigma. The syntax command tells Stata how options (and variable names, if statements, etc.) are interpreted when running the program. The numbers 2, 0, and 1 below are default values. By the way, we drop the program before defining it again, because Stata gives an error if you try to define a program that already exists.

    program drop lognormaldata
    program define lognormaldata
        syntax [, obs(integer 2) mu(real 0) sigma(real 1) ]
        drop _all
        set obs `obs'
        gen mynewvar = exp(rnormal(`mu',`sigma'))
        summarize mynewvar
    end
    drop mynewvar
    lognormaldata
    browse
    drop mynewvar
    lognormaldata , obs(200) mu(0.5) sigma(1.2)
    summarize
    return list   // The "return list" command shows results like r(mean) saved by summarize.
    browse

Third version: use a temporary variable instead of mynewvar, so we don't leave clutter. Then no variable needs to be dropped. This uses Stata’s “tempvar tempname” command (in the program for this version, tempname is z). When Stata gets this command, it makes up a new variable name and stores the new name in the local macro tempname. If any variable has been created with this name when the program finishes, the variable will be dropped. That’s why we call it a temporary variable: it disappears when the program finishes.

    program drop lognormaldata
    program define lognormaldata
        syntax [, obs(integer 2) mu(real 0) sigma(real 1) ]
        drop _all
        set obs `obs'
        tempvar z
        gen `z' = exp(rnormal(`mu',`sigma'))
        summarize `z'
    end
    lognormaldata

Fourth version: Return results from the program. The “rclass” option to the “program define” command says this program can return results (it is in the class of programs that can return results). At the end of the program, the two lines beginning with the words “return scalar” actually return the results, in scalars named r(myMean) and r(myVar), equal to the r(mean) and r(Var) results returned by the summarize command. Also, we'll give the program a different name now, lognormalstats instead of lognormaldata. Finally, the “version 11” command tells Stata this program was written for Stata version 11; if Stata commands change in later versions, the program will still use the version 11 behavior of those commands.

    program define lognormalstats, rclass
        version 11
        syntax [, obs(integer 2) mu(real 0) sigma(real 1) ]
        drop _all
        set obs `obs'
        tempvar z
        gen `z' = exp(rnormal(`mu',`sigma'))
        summarize `z'
        return scalar myMean = r(mean)
        return scalar myVar = r(Var)
    end
    lognormalstats
    return list

Run a simulation with 10,000 replications of making the data, and save as data the means and variances found in each of the 10,000 replications. Then, summarize the data.

    simulate mean=r(myMean) var=r(myVar), reps(10000): lognormalstats, obs(100)
    summarize

Is the sample mean a consistent estimate of the true mean, which is exp(mu + ½ sigma^2)? Is the sample variance a consistent estimate of the true variance, which is (exp(sigma^2) - 1) * exp(2*mu + sigma^2)?
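
To compare the simulation output with the two formulas just given, it may help to compute the theoretical values directly. The lines below are a minimal sketch, assuming the default values mu = 0 and sigma = 1 that the simulate call above relies on. The local names truemean and truevar are made up for this illustration and are not part of the programs above. Run the lines together, since local macros are forgotten once a do-file or selection finishes.

```stata
* Theoretical lognormal moments, for comparison with the simulate output.
* Assumption: mu = 0 and sigma = 1, the defaults of lognormalstats.
local mu = 0
local sigma = 1

* True mean: exp(mu + sigma^2/2)
local truemean = exp(`mu' + 0.5*`sigma'^2)

* True variance: (exp(sigma^2) - 1) * exp(2*mu + sigma^2)
local truevar = (exp(`sigma'^2) - 1)*exp(2*`mu' + `sigma'^2)

display "true mean     = `truemean'"
display "true variance = `truevar'"
```

Rerunning the simulate line with a larger obs(), for example obs(1000), and comparing the summarize output with these two numbers is one way to explore the consistency question.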