Taking the pain out of looping and storing

Patrick Royston
Nordic and Baltic Stata Users' meeting, Stockholm, 11 November 2011

Overview

• I often find myself running a command repeatedly in a loop
• I want to save some results and store them in new variable(s)
• A new command, looprun, is described that automates the process in a convenient way
• It can handle a single loop, or two nested loops
• I shall illustrate looprun using profile likelihood functions and surfaces

Example 1: Single loop

• A non-standard regression in which a nonlinear parameter is to be estimated by the profile likelihood method
• Vary the parameter over an interval and fit the model at each value
• Store the parameter and the resulting deviance (-2 * log likelihood) in new variables
• Plot the deviance against the parameter and draw inferences

Example 1

• Fitting a Cox regression to a variable haem (haemoglobin) in a kidney cancer dataset
• Wish to find the best-fitting power transformation, haem^p
• Draw inferences about p

Conventional code to solve the problem

    capture drop deviance
    capture drop p
    capture drop order
    gen deviance = .
    gen p = .
    gen int order = _n
    local i 0
    quietly foreach p of numlist -3 (0.1) 0.7 {
        fracgen haem `p', replace
        stcox haem_1
        sort order
        local ++i
        replace deviance = -2 * e(ll) in `i'
        replace p = `p' in `i'
    }
    line deviance p, sort

Solution using looprun

    looprun "p=-3(0.1)0.7", generate(deviance) store(-2*e(ll)) : ///
        fracgen haem @, replace # ///
        stcox haem_1
    line deviance p, sort

Resulting plot

[Figure: line plot of the deviance, -2*e(ll), against p; vertical axis from 3165 to 3170, horizontal axis from -3 to 1]

Example 2: double loop

• A non-standard regression in which two nonlinear parameters are to be estimated by inspecting the profile likelihood surface
• Vary both parameters over a grid, fit the model and store the resulting deviance (-2 * log likelihood)
• Plot the deviance against one parameter by the values of the other parameter
• Contour plot of the deviance surface
• Requires Stata 12 twoway contour

Example 2

• Model is a Gaussian growth curve
• predictor = b1 + b2*normal(s*(haem - 12.2) + m/10)

Solution using looprun

    looprun "m=7 (2) 35" "s=0.2 (0.05) 2.5", ///
        generate(deviance, replace) store(-2*e(ll)) : ///
        capture drop z # ///
        gen z = normal(@2 * (haem - 12.2) + @1/10) # ///
        stcox z

Graphs of results

Plot deviance against s, by m

    sum deviance
    gen deviance2 = deviance - r(min)
    line deviance2 s, sort by(m)

Resulting "casement" plot

[Figure: casement plot of deviance2 (0 to 60) against s (0 to 3), one panel for each value of m from 7 to 35; "Graphs by m"]

Contour plot

    replace deviance2 = min(deviance2, 20)
    twoway contour deviance2 m s, ccuts(0(1)20) ///
        yscale(r(7 35)) ylabel(10(5)35) xscale(r(.2 2.5)) ///
        xlabel(.25(.25)2.5)

Contour plot

[Figure: contour plot of deviance2 with m (10 to 35) on the vertical axis and s (0.25 to 2.5) on the horizontal axis; contour levels 0 to 20]

What can we learn from the contour plot?

• Parameter estimates of m and s are highly correlated
• Re-parameterisation might help
• The MLE is located along a narrow, long channel
• Hence the model may not be well identified in this dataset
• The likelihood surface has some peculiarities for low s, high m

Syntax of looprun

    looprun "[name1=]numlist1" ["[name2=]numlist2"] , required [ options ] :
        command1 [ # command2 ... ]

    required                           description
    -------------------------------------------------------------------------------
    store(results_list)                results to be stored
    generate(newvarlist [, replace])   names of new variable(s) to store results in

    options                            description
    -------------------------------------------------------------------------------
    nodots                             suppresses progress dots
    nosort                             do not sort data before storing results
    separator(string)                  character separating commands (default #)
    placeholder(string)                placeholder character(s) (default @)
    -------------------------------------------------------------------------------

Main limitation: Handling macros

• Cannot assign a local or global macro within a looprun subcommand and retrieve it for storage
• Easiest way around this is to use scalars, which are global
• Need care to avoid clash of scalar names with similarly named variables

Conclusion

• looprun should take most of the effort out of many simple programming tasks in Stata
• looprun can be installed via my UCL webpage: net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata/

Thank you.
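
Supplementary sketch for Example 2. For comparison with the two-parameter looprun call, the following is a minimal sketch of what a conventional double loop might look like, modelled on the single-loop code of Example 1. It is an illustration under stated assumptions, not a description of how looprun works internally: the data are assumed to be already stset and to contain haem, and the dataset is assumed to have at least 705 observations (15 values of m times 47 values of s) so that the grid fits. The variable order and the counter i are illustrative names.

```stata
* Hypothetical conventional equivalent of the Example 2 looprun call.
* Assumptions: data already stset, haem present, _N >= 705.
capture drop deviance
capture drop m
capture drop s
capture drop order
gen deviance = .
gen m = .
gen s = .
gen long order = _n          // remember the original row order
local i 0
quietly foreach m of numlist 7(2)35 {
    foreach s of numlist 0.2(0.05)2.5 {
        capture drop z
        gen z = normal(`s' * (haem - 12.2) + `m'/10)
        stcox z
        sort order           // restore row order before storing
        local ++i
        replace deviance = -2 * e(ll) in `i'
        replace m = `m' in `i'
        replace s = `s' in `i'
    }
}
```

Because this fills deviance, m and s, the plotting commands shown for the casement and contour plots should apply to its output as well.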