royston_nordic11

advertisement
Taking the pain out of looping and
storing
Patrick Royston
Nordic and Baltic Stata Users’ meeting, Stockholm, 11 November 2011
Overview
• I often find myself running a command
repeatedly in a loop
• I want to save some results and store them in
new variable(s)
• A new command, looprun, is described that
automates the process in a convenient way
• It can handle a single loop, or two nested
loops
• I shall illustrate looprun using profile
likelihood functions and surfaces
1
Example 1: Single loop
• A non-standard regression in which a nonlinear parameter is to be estimated by the
profile likelihood method
• Vary the parameter over an interval, fit the
model
• Store the parameter and the resulting
deviance (-2 * log likelihood) in new variables
• Plot the deviance against the parameter and
draw inferences
2
Example 1
• Fitting a Cox regression to a variable haem
(haemoglobin) in a kidney cancer dataset
• Wish to find the best-fitting power
transformation, haemp
• Draw inferences about p
3
Conventional code to solve the problem
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
capture drop deviance
capture drop p
capture drop order
gen deviance = .
gen p = .
gen int order = _n
local i 0
quietly foreach p of numlist -3 (0.1) 0.7 {
fracgen haem `p', replace
stcox haem_1
sort order
local ++i
replace deviance = -2 * e(ll) in `i‘
replace p = `p' in `i'
}
line deviance p, sort
4
Solution using looprun
. looprun "p=-3(0.1)0.7", generate(deviance) store(-2*e(ll)) : ///
fracgen haem @, replace # ///
stcox haem_1
. line deviance p, sort
5
3165
3166
3167
-2*e(ll)
3168
3169
3170
Resulting plot
-3
-2
-1
p
0
1
6
Example 2: double loop
• A non-standard regression in which two nonlinear parameters are to be estimated by
inspecting the profile likelihood surface
• Vary both parameters over a grid, fit the
model and store the resulting deviance (-2 *
log likelihood)
• Plot the deviance against one parameter by
the values of the other parameter
• Contour plot of the deviance surface
• Requires Stata 12 twoway contour
7
Example 2
• Model is a Gaussian growth curve
• predictor = b1+b2*normal(s*(haem ‒ 12.2) + m/10)
8
Solution using looprun
. looprun "m=7 (2) 35" "s=0.2 (0.05) 2.5", ///
generate(deviance, replace) store(-2*e(ll)) : ///
capture drop z # ///
gen z = normal(@2 * (haem - 12.2) + @1/10) # ///
stcox z
9
Graphs of results
Plot deviance against s, by m
. sum deviance
. gen deviance2 = deviance - r(min)
. line deviance2 s, sort by(m)
10
Resulting “casement” plot
9
11
13
15
17
19
21
23
25
27
29
0
20 40 60
0
0
33
2
3
35
20 40 60
31
1
0
deviance2
20 40 60
0
20 40 60
7
0
1
2
3
0
1
2
3
s
Graphs by m
0
1
2
3
11
Contour plot
. replace deviance2 = min(deviance2, 20)
. twoway contour deviance2 m s, ccuts(0(1)20) ///
> yscale(r(7 35)) ylabel(10(5)35) xscale(r(.2 2.5)) ///
> xlabel(.25(.25)2.5)
12
10
15
20m
deviance2
25
30
35
Contour plot
.25
.5
.75
1
1.25 1.5
s
1.75
2
2.25
2.5
20
19
18
17
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
13
What can we learn from the contour plot?
• Parameter estimates of m and s are highly
correlated
• Re-parameterisation might help
• The MLE is located along a narrow, long
channel
• Hence the model may not be well identified
in this dataset
• The likelihood surface has some peculiarities
for low s, high m
14
Syntax of looprun
looprun "[name1=]numlist1" ["[name2=]numlist2"] , required [ options ] :
command1 [ # command2 ... ]
required
description
------------------------------------------------------------------------------------store(results_list)
results to be stored
generate(newvarlist [, replace])
names of new variable(s) to store results in
options
description
------------------------------------------------------------------------------------nodots
suppresses progress dots
nosort
do not sort data before storing results
separator(string)
character separating commands (default #)
placeholder(string)
placeholder character(s) (default @)
------------------------------------------------------------------------------------15
Main limitation: Handling macros
• Cannot assign a local or global macro within a
looprun subcommand and retrieve it for
storage
• Easiest way around this is to use scalars,
which are global
• Need care to avoid clash of scalar names with
similarly named variables
16
Conclusion
• looprun should take most of the effort out of
many simple programming tasks in Stata
• looprun can be installed via my UCL
webpage:
net from
http://www.homepages.ucl.ac.uk/~ucakjpr/stata/
17
Thank you.
18
Download