Computing for Research I
Spring 2014
Stata Programming
March 5
Primary Instructor: Elizabeth Garrett-Mayer

Some simple programming
• Once again, Princeton's site has some great, easy info: http://data.princeton.edu/stata/programming.aspx
• We will discuss a few things:
  – macros
  – looping
  – writing commands
• We will not discuss Mata, Stata's powerful matrix programming language.

Macros
• macro = a name associated with some text.
• Macros can be local or global in scope.
• Example of use: shorthand for a repeated phrase
  – graphics title
  – set of 'adjustment' covariates
• Syntax: local name content
  – local: the command
  – name: the name of the macro
  – content: the "guts" of the macro

Example: covariates
    * use SCBC data
    use "I:\Classes\StatComputingI\SCBC2004.dta", clear
    * make tumor numeric and transform
    gen sizen=real(tumor)
    gen logsize = log(sizen)
    replace logsize = . if sizen==999
    regress logsize age black graden
    * define local macro
    local adjusters age black graden
    regress logsize `adjusters'

NOTE: to reference a local macro, use the accent (`) at the upper left of the keyboard as the opening quote and the apostrophe (') next to the Enter key as the closing quote.
    regress logsize `adjusters' i.ercat
    regress logsize `adjusters' i.prcat
    regress logsize `adjusters' i.ercat i.prcat

More examples
    local erprknown ercat<9 & prcat<9
    regress logsize `adjusters' i.ercat i.prcat if `erprknown'
• An important property of local macros, and the reason they are called "local", is that they exist only within the do-file, program, or interactive run in which they were defined.
• This means that when you highlight and run part of a do-file, every local used in the highlighted portion must also be defined in the highlighted portion.
• Stata will NOT remember locals defined in earlier calls to the do-file!
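The scoping point above can be seen in a minimal sketch. It assumes Stata's built-in auto dataset (loaded with sysuse) rather than the SCBC data, and the macro name notdefined is purely illustrative. The key behavior is that an undefined local expands to nothing and raises no error, so a regression run without its macro definition quietly loses its covariates.

```stata
* Sketch only: uses the shipped auto dataset, not the SCBC data
sysuse auto, clear

* Define a local macro holding a covariate list
local adjusters weight foreign

* The macro is replaced by its text before the command runs
display "adjusters expands to: `adjusters'"
regress price `adjusters'

* A local that was never defined expands to an empty string,
* so Stata raises no error here
display "[`notdefined']"

* Consequence: if only the next line were highlighted and run on its own,
* `adjusters' would be empty and Stata would fit an intercept-only model
regress price `adjusters'
```

Running the whole block at once keeps `adjusters' defined for both regressions; running the final line alone would not.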
Example: titles
    * another example
    infile str14 country setting effort change ///
        using http://data.princeton.edu/wws509/datasets/effort.raw, clear
    graph twoway (lfitci change setting) ///
        (scatter change setting) ///
        , title("Fertility Decline by Social Setting") ///
        ytitle("Fertility Decline") ///
        legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))

    local gtitles title("Fertility Decline by Social Setting") ytitle("Fertility Decline")
    * with macro
    graph twoway (lfitci change setting) ///
        (scatter change setting) ///
        , `gtitles' ///
        legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))
    * without macro (titles omitted)
    graph twoway (lfitci change setting) ///
        (scatter change setting) ///
        , legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))

Storing results
• Stata commands (and new commands that you and others write) can be classified as follows:
  – r-class: General commands such as summarize. Results are returned in r() and generally must be used or saved before running further commands, which may overwrite them.
  – e-class: Estimation commands such as regress, logistic, etc., that fit statistical models. Results are returned in e() and remain there until the next model is estimated.
  – s-class: Programming commands that assist in parsing. These commands are relatively rare. Results are returned in s().
  – n-class: Commands that do not save results at all, such as generate and replace.
  – c-class: Values of system parameters and settings and certain constants (such as the value of π), which are contained in c().

Accessing returned values
• return list, ereturn list, sreturn list and creturn list display all the values contained in r(), e(), s() and c(), respectively.
• For example, after using summarize, r() will contain r(N), r(mean), r(sd), r(sum), etc.
• Individual stored results can be used when creating new variables. They can also be saved as macros.
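As a rough illustration of using r() results, the sketch below standardizes a variable with values returned by summarize. It assumes the built-in auto dataset; the names pmean, psd and price_z are illustrative, not from the course materials.

```stata
* Sketch only: uses the shipped auto dataset
sysuse auto, clear

* summarize is r-class: its results land in r()
summarize price
return list

* Save what we need before another r-class command overwrites r()
local pmean = r(mean)
local psd   = r(sd)

* Use the saved values to create a new (standardized) variable
generate double price_z = (price - `pmean') / `psd'

* This call replaces the contents of r(), but the locals are unaffected
summarize price_z
display "mean of price was " `pmean'
```

Saving r(mean) and r(sd) into locals first is the defensive choice: the second summarize replaces everything in r().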
Using regression results
Although coefficients and standard errors from the most recent model are saved in e(), it is quicker to refer to them by using _b[varname] and _se[varname], respectively.
    regress change setting effort
    gen fitvals = setting*_b[setting] + effort*_b[effort] + _cons*_b[_cons]
    * predict gives the same fitted values
    predict fit

Storing results
    * run regression and store r-squared value
    regress change setting
    local rsq = e(r2)
    display `rsq'
    * run new regression
    regress change setting effort
    display e(r2)
    * see old saved r-squared
    display `rsq'
    * still there if you run it ALL in the same call to do file

Saving matrix results
    matrix list e(b)
    matrix list e(V)
    matrix betamodel1=get(_b)
    matrix list betamodel1
    * help matrix get

Global macros
• Global macros have names of up to 32 characters and, as the name indicates, have global scope.
• You define a global macro using global name [=] text and evaluate it using $name. (You may need to use ${name} to clarify where the name ends.)
• “I suggest you avoid global macros because of the potential for name conflicts.”
• A useful application, however, is to map the function keys on your keyboard. If you work on a shared network folder with a long name, try something like this:
    global F5 \\server\shared\research\project\subproject\
• Then when you hit F5, Stata will substitute the full name. And your do-files can use commands like do ${F5}dofile. (We need the braces to indicate that the macro is called F5, not F5dofile.)

More on macros
• Macros can also be used to obtain and store information about the system or the variables in your dataset using extended macro functions.
• For example, you can retrieve variable and value labels, a feature that can come in handy in programming.
• There are also commands to manage your collection of macros, including macro list and macro drop. Check out help macro to learn more.
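A minimal sketch of the extended macro functions and macro-management commands mentioned above, assuming the built-in auto dataset. The macro names lbl, vlname, lab1 and outdir are illustrative only.

```stata
* Sketch only: uses the shipped auto dataset
sysuse auto, clear

* Extended macro function: retrieve a variable label
local lbl : variable label price
display "`lbl'"

* Retrieve the name of the value label attached to foreign,
* then the text that label assigns to the value 1
local vlname : value label foreign
local lab1 : label `vlname' 1
display "`vlname': 1 = `lab1'"

* Global macro: braces mark where the name ends
global outdir results_
display "${outdir}table1"

* Manage macros: list one global, then drop it
macro list outdir
macro drop outdir
```

Without the braces, $outdirtable1 would be read as a reference to a (nonexistent) global named outdirtable1.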
Looping
• foreach: loops over the items in a list (e.g., a set of variables)
• forvalues: loops over a range of numeric values (an index)
• Also:
  – while loops
  – if and else sets of commands

Programming
• 'ado' files
• Create commands in an ado-file and put it in the appropriate directory for Stata to find
• Can also create them in do-files for local use
• See
  – http://data.princeton.edu/stata/programming.html
  – www.ssc.upenn.edu/scg/stata/stata-programming-1.ppt
  – http://www.ssc.wisc.edu/sscc/pubs/stata_prog2.htm

Ado files
• An ado-file (“automatic do-file”) is a do-file that defines a Stata command. It has the file extension .ado.
• Not all Stata commands are defined by ado-files: some are built-in commands.
• The difference between a do-file and an ado-file is that when the name of the latter is typed as a Stata command, Stata will search for and run that file.
• For example, the program mysum could be saved in mysum.ado and used in future sessions.
• Ado-files often have help files associated with them (.hlp in older versions of Stata, .sthlp in Stata 10 and later).
• There are three main sources of ado-files:
  – Official updates from StataCorp.
  – User-written additions (e.g., from the Stata Journal).
  – Ado-files that you have written yourself.
• Stata stores these in different locations, which can be reviewed by typing sysdir.
• Official updates are saved in the folder associated with UPDATES.
• User-written additions are saved in the folder associated with PLUS.
• Ado-files written by yourself should be saved in the folder associated with PERSONAL.
• If you have an Internet connection, official updates and user-written ado-files can be installed easily.
• To install official updates, type:
    update from http://www.stata.com
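The looping commands and the idea of writing a command in a do-file for local use can be sketched together as follows. This is a minimal example under stated assumptions: it uses the built-in auto dataset, and the program name meanrange is hypothetical (it is not the mysum program mentioned above).

```stata
* Sketch only: uses the shipped auto dataset
sysuse auto, clear

* foreach: loop over a list of variables
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}

* forvalues: loop over a numeric index
forvalues i = 1/3 {
    display "iteration `i'"
}

* A small r-class command defined in a do-file (hypothetical name)
capture program drop meanrange
program define meanrange, rclass
    * accept exactly one variable; syntax stores its name in `varlist'
    syntax varname
    quietly summarize `varlist'
    local m   = r(mean)
    local rng = r(max) - r(min)
    * hand the results back to the caller in r()
    return scalar mean  = `m'
    return scalar range = `rng'
    display "mean = `m', range = `rng'"
end

* Use the new command and inspect what it returned
meanrange mpg
return list
```

Saving the program block alone as meanrange.ado in the PERSONAL folder would make the command available in future sessions; the capture program drop line is only needed when re-running the definition from a do-file.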