
Chapter 1-10. Programming Stata
In this chapter, we will see how to write programs in Stata.
These programs are typically saved as “ado” files. An ado-file is simply a text file of Stata
commands, saved with the “.ado” file extension, in which the commands are wrapped between a
“program define” line and a final “end” line.
Many of Stata's commands are themselves implemented as ado-files (the rest are built into the
executable), so a good source of example code is to think of a command that does something
similar to what you want to do, and then look at the Stata code for that command.
Viewing (but not editing) an ado-file
This is done with the viewsource command. For example, to see how the ttest command was
written, open up the ado file in a read-only editor using,
viewsource ttest.ado
After the file is open, you can highlight the code and copy-and-paste it into the do-file editor, so
you have the sample code available to you when writing your own programs.
Viewing (but not editing) a help file
This is a very nice application of the viewsource command, because it displays how the special
markup features of the help file were set up, so you can do the same thing in your own help files.
For example, to see Stata’s template for help files, which was designed to get you started with
developing your own help files that look like official Stata help files, use
viewsource examplehelpfile.hlp
To see what the file looks like when it is executed, use
help examplehelpfile
Finding where an ado file is
If you are curious where a particular ado-file is stored on your computer, you can find out using
the findfile command.
findfile ttest.ado
C:\Program Files\Stata9\ado\base/t/ttest.ado
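The lookup steps above can be combined in a short sketch. It assumes only official commands: which reports the location and version line of a command, and findfile leaves the full path in r(fn), so the path can be reused without retyping it.

```stata
* report where the ttest command lives, plus its version line
which ttest

* locate the file along the ado-path; the full path is returned in r(fn)
findfile ttest.ado
display "`r(fn)'"

* open the same file in the read-only viewer
viewsource ttest.ado
```

The path displayed will differ by Stata version and installation directory.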
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 1-10 (revision 16 May 2010)
p. 1
Finding the directories where Stata looks for ado files
To see the order in which Stata searches directories when a command is executed, use
adopath
  [1]  (UPDATES)   "C:\Program Files\Stata9\ado\updates/"
  [2]  (BASE)      "C:\Program Files\Stata9\ado\base/"
  [3]  (SITE)      "C:\Program Files\Stata9\ado\site/"
  [4]              "."
  [5]  (PERSONAL)  "c:\ado\personal/"
  [6]  (PLUS)      "c:\ado\plus/"
  [7]  (OLDPLACE)  "c:\ado/"
The “.” directory is the current directory, shown in the lower left-hand corner of the Stata
window. Usually you would store your own commands in the PERSONAL directory, which is
not supposed to be overwritten when you install a new version of Stata.
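As a small sketch of how the search path can be inspected and extended, the commands below list the system directories, show where PERSONAL points, and append one more directory to the end of the search order. The directory name in the last line is hypothetical and is assumed to exist on your machine.

```stata
* list the system directories (STATA, UPDATES, BASE, SITE, PLUS, PERSONAL, OLDPLACE)
sysdir

* show where the PERSONAL directory is
personal

* append a directory of your own to the end of the ado-path
* (the path below is a made-up example; substitute a real one)
adopath + "C:\StataCourse\myado"
```

A directory added this way stays on the path only for the current Stata session.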
Smart Quotes
By the way, the smart quotes, “ ” , that Microsoft Word uses, cannot be interpreted by Stata.
So, if you cut-and-paste the following to the command window,
display “stuff”
you get the following error message:
“stuff” invalid name
r(198);
A way to get around this is to paste it into the do-file editor, which changes the smart quotes
into regular quotes, so that the command looks like,
display "stuff"
That command can then be executed inside the do-file editor, or cut-and-pasted in the Command
window to be executed.
Executing do files from the command line
First, decide on the directory where you want to save the do-file to, and change to that directory.
If you put it on your desktop, the directory might be something like the following, with “Greg”
replaced by your username.
cd "C:\Documents and Settings\Greg\Desktop\StataCourse\practice"
Now, put the commands you want to run as a batch in a do-file.
For example, click on the do-file menu bar icon, which brings up a new do-file. Type the
following,
display "Hey, it worked!"
Then, do a “save as” to the file name
program1.do
saving it in the directory you changed to with “cd” above.
Now, in the Command window, execute the command,
do program1
which executes all of the commands in the do-file, and then returns control to the Command
window.
This do-file, program1.do, is a simple program.
It does not mimic a “command”, however, because it requires that you put “do” in front of the
do-file name in order to execute it.
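Before running the do-file, it can help to confirm that Stata is looking in the directory where the file was saved. The sketch below uses only official commands and assumes program1.do was saved in the current directory as described above.

```stata
* confirm the current directory and that the do-file is there
pwd
dir *.do

* run the do-file, echoing each command and its output
do program1

* run the same do-file silently (commands and output are not shown)
run program1
```

Because run suppresses all output, it is more useful for do-files that define programs or set up data than for one that only displays a message.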
Converting your do-file into an ado-file
To turn a do-file into an ado-file, you simply add a “program define” on the first line and an
“end” on the last line.
Open the file program1.do inside the do-file editor, and change it to:
program define amazing
    display "Hey, it worked!"
end
The indentation on the second line (or on all lines between program and end) is not necessary, but
it helps to remind you that you are inside of the program-end combination.
Save it as amazing.ado, instead of program1.do.
On the command line, enter
amazing
Hey, it worked!
You have just extended Stata to include a new command called amazing.
Adding some color
We can get the display command to output in different colors, similar to what Stata does.
text = green
result = yellow
error = red
input = white
Open the file amazing.ado inside the do-file editor, and change it to:
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end
On the command line, enter
amazing
Hey, it worked!
Even though we made a change, the older version is still executing. This is because Stata loads
a program into memory the first time it is run and keeps executing that copy, even though the file
amazing.ado has changed on the hard drive.
For Stata to pick up the change, the program must first be dropped from memory, using the
program drop command.
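To see for yourself that the old version is the one held in memory, the following sketch lists the loaded programs and prints the stored code. It assumes amazing has already been run once in this session, so that it is loaded.

```stata
* list all programs currently loaded in memory
program dir

* print the code of the copy of amazing that Stata is actually executing
program list amazing
```

The listing should still show the single display line of the first version. The discard command is a broader alternative to program drop: it drops all automatically loaded programs at once.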
Dropping a program from memory
What I like to do is add the program drop command as the first line of my ado-file, just to avoid
this step every time I make a change. Once the program is fully developed, you can remove that
line, so that it cannot drop a user's own program of the same name that is already in memory.
Open the file amazing.ado inside the do-file editor, and add the program drop command on the
first line. Precede it by “capture” so it runs without error even if the program is not already
loaded in memory.
capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end
On the command line, enter
program drop amazing
We have to first drop it from memory, if it is there, using the Command window, since if we just
run the command amazing, the old version loaded in memory continues to run.
Now if we run the program again, Stata finds it on the hard drive and runs our updated version:
amazing
Hey, it worked!
Hey, it worked!
Hey, it worked!
Hey, it worked!
Running a program inside a do-file
Sometimes it is nicer to just define the program inside a do-file, and then execute it inside the
same do-file. One advantage is that the program code is displayed right there where we run it,
which is nice documentation. Another advantage is that debugging is faster, because we don’t
have to keep going back and forth between the do-file and the command line.
Let’s try it.
With the file amazing.ado in the do-file editor, save it as program2.do (not .ado).
Next, add a few blank lines and then put amazing as a command to call the program. (These
commands are in chapter10.do.)
capture program drop amazing
program define amazing
    display as text "Hey, it worked!"
    display as result "Hey, it worked!"
    display as error "Hey, it worked!"
    display as input "Hey, it worked!"
end

amazing
Highlight the entire do-file and hit the “do current file” icon (third icon from the right) inside the
do-file editor to execute it.
It executes as expected.
Doing it this way, the program is loaded into Stata memory and is available for the entire Stata
session, unless you drop it. The whole step of making it an ado-file is avoided. Sometimes this
is nice, and sometimes it is easier to use an ado-file so it’s available instantly for all your
projects.
Writing a program to optimize test characteristics
We are now going to work through a rather complicated example for a very practical problem.
It is a somewhat common research problem, or quality improvement problem, to determine the
optimal cut-point for a continuous (interval scaled) diagnostic test variable to provide the best
test characteristics (see box), such as sensitivity and specificity.
For example, Carpenter et al (1995) did this to discover that 60% or greater carotid artery
stenosis by duplex Doppler ultrasonography provided the best test characteristics when compared
to the gold standard arteriography.
Test Characteristics
With the data in the required form for Stata:

                                       Test “probable value”
Gold Standard “true value”   disease present ( + )   disease absent ( - )   Total
disease present ( + )        a (true positives)      b (false negatives)    a+b
disease absent ( - )         c (false positives)     d (true negatives)     c+d
Total                        a+c                     b+d

We define the following terminology (Lilienfeld, 1994, p. 118-124), expressed as percents:
sensitivity = (true positives)/(true positives plus false negatives)
= (true positives)/(all those with the disease)
= a / (a + b) × 100
specificity = (true negatives)/(true negatives plus false positives)
= (true negatives)/(all those without the disease)
= d / (c + d) × 100
Sensitivity and specificity provide information about the accuracy (validity) of a test. Positive
and negative predictive values provide information about the meaning of the test results.
The probability of disease being present given a positive test result is the positive predictive
value (Lilienfeld, 1994, p. 118-124):
positive predictive value = (true positives)/(true positives plus false positives)
= (true positives)/(all those with a positive test result)
= a / (a + c) × 100
The probability of no disease being present given a negative test result is the negative predictive
value (Lilienfeld, 1994, p. 118-124):
negative predictive value = (true negatives)/(true negatives plus false negatives)
= (true negatives)/(all those with a negative test result)
= d / (b + d) × 100
“Unlike sensitivity and specificity, the positive and negative predictive values of a test depend on
the prevalence rate of disease in the population. …For a test of given sensitivity and specificity,
the higher the prevalence of the disease, the greater the positive predictive value and the lower
the negative predictive value.” (Lilienfeld, 1994, p. 122-123)
The overall accuracy, or simply accuracy, is the proportion of correct test decisions, and
is defined as (without citation for now, Stoddard just knows this)
overall accuracy = (true positives plus true negatives)/(all tests)
= (a + d)/(a + b + c + d)
The area under the receiver operating characteristic (ROC) curve, for a dichotomous
test and gold standard variable, or 2 × 2 table, is the simple average of the sensitivity and
specificity, each expressed as a proportion (without citation for now, Stoddard just knows this)
ROC = (sensitivity + specificity)/2
We will practice with the AngioData.dta file (see box).
AngioData.dta dataset
This file contains n=172 deidentified pairs of measurements provided by an anonymous
researcher, with two continuous measurements of carotid artery stenosis.
angio    Gold Standard: arteriography (arteriographic stenosis)
icapsv   Diagnostic Test: internal carotid artery peak systolic velocity (PSVICA)
Opening the data file, which is already in the StataCourse\practice subdirectory,
use angiodata, clear
First, we will dichotomize the angio variable into 60% or greater carotid artery stenosis.
recode angio 0/59=0 60/100=1 .=., gen(gold)
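One caveat with the recode rules: a value strictly between 59 and 60 matches neither 0/59 nor 60/100, so it would be copied into gold unchanged. As a hedged alternative, assuming angio might be recorded with decimals, the same split can be written as a logical expression. The variable name gold2 is hypothetical and used only for this comparison.

```stata
* 1 if stenosis is 60% or greater, 0 if less, missing if angio is missing
generate byte gold2 = (angio >= 60) if !missing(angio)

* cross-check against the recoded variable; all counts should fall on the diagonal
tabulate gold gold2, missing

* remove the comparison variable when done
drop gold2
```

If angio holds only whole numbers from 0 to 100, the two approaches give identical results.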
For a first guess at a cutpoint for icapsv, we will use the mean
sum icapsv
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      icapsv |       172    172.5407    141.6103          0        575
Defining a dichotomized icapsv variable,
gen test = cond(icapsv>=172,1,0)
replace test=. if icapsv==.
and adding variable labels,
label variable gold "angio"
label variable test "icapsv"
To calculate the diagnostic test characteristics, we must first install the user-written diagt
command, which is not part of official Stata. While connected to the internet,
findit diagt
--------------------------------------------------------------------------------------
search for diagt                                                 (manual:  [R] search)
--------------------------------------------------------------------------------------
Keywords:  diagt
Search:    (1) Official help files, FAQs, Examples, SJs, and STBs
           (2) Web resources from Stata and from other users

Search of official help files, FAQs, Examples, SJs, and STBs

SJ-4-4  sbe36_2 . . . . . . . . . . . . . . . . . . Software update for diagt
        (help diagt if installed) . . . . . . . . . . P. T. Seed and A. Tobias
        Q4/04   SJ 4(4):490
        new options added to diagt

STB-59  sbe36.1 . . . . . . . . . . . Summary statistics for diagnostic tests
        (help diagt if installed) . . . . . . . . . . P. T. Seed and A. Tobias
        1/01    pp.9--12; STB Reprints Vol 10, pp.90--93
        complete revision of diagtest to assess a simple diagnostic
        test in comparison with a reference standard; uses the exact
        binomial distribution and provides diagti, an immediate
        version of the command
Click on the sbe36_2 link to install the ado file.
If you are not connected to the internet, that is okay. The four files it adds (diagt.ado, diagt.hlp,
diagti.ado, diagti.hlp) are in the StataCourse\practice subdirectory, which is now your current
directory, so Stata will find these commands.
Computing the test characteristics,
diagt gold test
           |        icapsv
     angio |      Pos.       Neg. |     Total
-----------+----------------------+----------
  Abnormal |        44         12 |        56
    Normal |        16        100 |       116
-----------+----------------------+----------
     Total |        60        112 |       172

True abnormal diagnosis defined as gold = 1

                                                  [95% Confidence Interval]
---------------------------------------------------------------------------
Prevalence                         Pr(A)       33%        26%      40.1%
---------------------------------------------------------------------------
Sensitivity                      Pr(+|A)     78.6%      65.6%      88.4%
Specificity                      Pr(-|N)     86.2%      78.6%      91.9%
ROC area               (Sens. + Spec.)/2      .824       .761       .887
---------------------------------------------------------------------------
Likelihood ratio (+)     Pr(+|A)/Pr(+|N)       5.7       3.54       9.16
Likelihood ratio (-)     Pr(-|A)/Pr(-|N)      .249        .15       .413
Odds ratio                   LR(+)/LR(-)      22.9       10.1       52.1
Positive predictive value        Pr(A|+)     73.3%      60.3%      83.9%
Negative predictive value        Pr(N|-)     89.3%        82%      94.3%
---------------------------------------------------------------------------
This looks like a pretty good guess for a cutpoint for icapsv.
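The diagt results can be checked by hand against the formulas in the Test Characteristics box. The sketch below plugs in the four cell counts from the 2 × 2 table (a=44, b=12, c=16, d=100) using local macros, so it must be run as a block from the do-file editor rather than line by line.

```stata
* cell counts read off the diagt table: rows are gold standard, columns are test
local a 44      // true positives
local b 12      // false negatives
local c 16      // false positives
local d 100     // true negatives

display "Sensitivity = " %4.1f 100*`a'/(`a'+`b') "%"
display "Specificity = " %4.1f 100*`d'/(`c'+`d') "%"
display "PPV         = " %4.1f 100*`a'/(`a'+`c') "%"
display "NPV         = " %4.1f 100*`d'/(`b'+`d') "%"
display "Accuracy    = " %4.1f 100*(`a'+`d')/(`a'+`b'+`c'+`d') "%"
display "ROC area    = " %5.3f (`a'/(`a'+`b') + `d'/(`c'+`d'))/2
```

The first four lines and the ROC area should reproduce the diagt values (78.6%, 86.2%, 73.3%, 89.3%, and 0.824). The accuracy, (44+100)/172 = 83.7%, is the one statistic diagt does not report.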
To do this for every possible cutpoint for icapsv, we could simply put the commands inside a
loop. Let’s begin to build a program inside the do-file, and run it for the first three values of
icapsv.
capture program drop optcut
program define optcut
    foreach num of numlist 0 21 36 {
        capture drop test
        gen test = cond(icapsv > `num', 1, 0)
        replace test = . if icapsv == .
        diagt gold test
    }
end
optcut
That worked, but let’s turn scrolling off.
capture program drop optcut
program define optcut
    set more off
    foreach num of numlist 0 21 36 {
        capture drop test
        gen test = cond(icapsv > `num', 1, 0)
        replace test = . if icapsv == .
        diagt gold test
    }
    set more on
end
optcut
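The loop variable num is a local macro, and its value is substituted wherever it appears between a left single quote (the backtick key, usually at the top left of the keyboard) and a right single quote (the apostrophe). Smart quotes pasted from a word processor will not work here. A minimal sketch that only prints the substituted values:

```stata
foreach num of numlist 0 21 36 {
    * `num' is replaced by 0, then 21, then 36 before each line is executed
    display as text "cutpoint = " as result `num'
    display as text "condition used: icapsv > `num'"
}
```

Running this before the full program is a quick way to confirm that the macro quotes were typed correctly.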
Let’s make it so it will always work, no matter what the variable names are, by passing the gold
and test variables as parameters.
capture program drop optcut
program define optcut
    args gold test
    set more off
    foreach num of numlist 0 21 36 {
        capture drop _test
        gen _test = cond(`test' > `num', 1, 0)
        replace _test = . if `test' == .
        diagt `gold' _test
    }
    set more on
end
optcut gold icapsv
Notice we named variables created by our program to begin with “_”, similar to what Stata does,
to inform the user that it was created by the program.
The “args” command tells Stata which local macros receive the arguments being passed, in order.
If more than two arguments are passed, the extra ones are simply ignored. If fewer than two are
passed, the macros that received nothing are left empty. In either case, args itself does not issue
an error message.
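This behavior of args is easy to see with a throwaway program. The name showargs is hypothetical and exists only for this demonstration.

```stata
capture program drop showargs
program define showargs
    args first second
    * brackets make an empty macro visible in the output
    display "first  = [`first']"
    display "second = [`second']"
end

* one argument too few: second is left empty, with no error
showargs gold

* one argument too many: the third is ignored, with no error
showargs gold icapsv extra
```

The first call displays an empty pair of brackets for second, and the second call never mentions the word extra.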
To make sure the user provides two variables, no fewer and no more, we can use the following:
capture program drop optcut
program define optcut
    syntax varlist(min=2 max=2)
    tokenize `varlist'
    local gold `1'
    local test `2'
    set more off
    foreach num of numlist 0 21 36 {
        capture drop _test
        gen _test = cond(`test' > `num', 1, 0)
        replace _test = . if `test' == .
        diagt `gold' _test
    }
    set more on
end
optcut gold icapsv
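To confirm that the syntax line really enforces exactly two variables, the program can be called with the wrong number of variables. The sketch below wraps each call in capture noisily, so the error message is shown but a do-file keeps running, and then displays the return code.

```stata
* too few variables: the syntax line should stop the program
capture noisily optcut gold
display "return code: " _rc

* too many variables: stopped as well
capture noisily optcut gold icapsv angio
display "return code: " _rc
```

The return codes are typically 102 (too few variables specified) and 103 (too many variables specified). Because the program stops at the syntax line, diagt is never reached in either call.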
Next, let’s make use of the Stata command levelsof to pass all of the values of our test variable to
the foreach command.
capture program drop optcut
program define optcut
    syntax varlist(min=2 max=2)
    tokenize `varlist'
    local gold `1'
    local test `2'
    levelsof `test', local(levels)
    set more off
    foreach num of local levels {
        capture drop _test
        gen _test = cond(`test' > `num', 1, 0)
        replace _test = . if `test' == .
        diagt `gold' _test
    }
    set more on
end
optcut gold icapsv
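This crashes on the last value, but does great until then. A likely cause is that no observation
exceeds the largest value of the test variable, so _test is 0 for everyone and diagt has no
test-positive column to tabulate.
There is still much to do. A much more complete, although much more complex, version is
in chapter10.do.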
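One way to get past the failure on the last value is to skip any cutpoint that leaves no test-positive observations. The loop below is a minimal sketch of that idea, not the version in chapter10.do. It assumes the failure comes from _test being all zeros, and it uses the variable names gold and icapsv directly to stay short.

```stata
quietly levelsof icapsv, local(levels)
foreach num of local levels {
    * count the observations that would be test-positive at this cutpoint
    quietly count if icapsv > `num' & !missing(icapsv)
    if r(N) == 0 {
        display as text "cutpoint `num': no observations above it, skipped"
        continue
    }
    capture drop _test
    quietly gen _test = cond(icapsv > `num', 1, 0) if !missing(icapsv)
    diagt gold _test
}
```

The !missing() condition matters because Stata treats a missing value as larger than any number, so icapsv > `num' alone would count missing observations as positives.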
Program to compute the statistic, Accuracy, which diagt does not provide.
Here is a program you can use to compute accuracy, (a+d)/N. The last line is how to call it.
* program to compute test characteristic accuracy
capture program drop accuracy
program define accuracy
    version 9
    syntax varlist(min=2 max=2)
    tokenize `varlist'
    local goldvar `1'
    local testvar `2'
    quietly count
    quietly scalar N=r(N)
    quietly count if `goldvar'==0 & `testvar'==0
    quietly scalar d=r(N)
    quietly count if `goldvar'==1 & `testvar'==1
    quietly scalar a=r(N)
    * scalar() forces the scalar to be used, so that a is not read as an
    * abbreviation of a variable name such as angio
    display as result "Accuracy = (" %-2.0f scalar(a) "+" %-2.0f scalar(d) ")/" ///
        %-2.0f scalar(N) " = " %-3.1f (scalar(a)+scalar(d))/scalar(N)*100 "%"
end
accuracy goldvar testvar
Example of how to extend your program to enable the use of the “if” qualifier
We now extend the accuracy program to enable the use of the “if” qualifier.
* program to compute test characteristic accuracy
capture program drop accuracy
program define accuracy , byable(recall)
    version 9
    syntax varlist(min=2 max=2) [if] [in]
    tokenize `varlist'
    local goldvar `1'
    local testvar `2'
    * touse is a temporary variable: 1 for observations selected by if/in
    tempvar touse
    mark `touse' `if' `in'
    preserve
    keep if `touse'
    quietly count
    quietly scalar N=r(N)
    quietly count if `goldvar'==0 & `testvar'==0
    quietly scalar d=r(N)
    quietly count if `goldvar'==1 & `testvar'==1
    quietly scalar a=r(N)
    * scalar() forces the scalar to be used, so that a is not read as an
    * abbreviation of a variable name such as angio
    display as result "Accuracy = (" %-2.0f scalar(a) "+" %-2.0f scalar(d) ")/" ///
        %-2.0f scalar(N) " = " %-3.1f (scalar(a)+scalar(d))/scalar(N)*100 "%"
    restore
end
accuracy goldvar testvar if patientgroup==1
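An alternative to preserve, keep, and restore is to carry the selection marker into each count. The sketch below is a hypothetical variant, named accuracy2 so it does not replace the program above. It uses marksample, which also excludes observations with a missing value in either variable, so N can differ from the version above when there are missing data. It keeps the counts in local macros and returns the results in r().

```stata
* hypothetical variant of the accuracy program (not part of the course files)
capture program drop accuracy2
program define accuracy2, rclass
    version 9
    syntax varlist(min=2 max=2 numeric) [if] [in]
    * touse = 1 for observations that satisfy if/in and are nonmissing
    marksample touse
    tokenize `varlist'
    local goldvar `1'
    local testvar `2'
    quietly count if `touse'
    local N = r(N)
    quietly count if `goldvar'==1 & `testvar'==1 & `touse'
    local a = r(N)
    quietly count if `goldvar'==0 & `testvar'==0 & `touse'
    local d = r(N)
    local acc = 100*(`a' + `d')/`N'
    display as result "Accuracy = (`a'+`d')/`N' = " %4.1f `acc' "%"
    * make the results available to the caller as r(N) and r(accuracy)
    return scalar N = `N'
    return scalar accuracy = `acc'
end

accuracy2 gold test
return list
```

If gold and test are still defined as in the diagt example (cutpoint 172), this should report (44+100)/172 = 83.7%. Using local macros instead of scalars also sidesteps any clash between a scalar name and an abbreviated variable name.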
References
Carpenter JP, Lexa FJ, Davis JT (1995). Determination of sixty percent or greater carotid artery
stenosis by duplex Doppler ultrasonography. J Vasc Surg 22(6):697-705.
Lilienfeld DE, Stolley PD (1994). Foundations of Epidemiology, 3rd ed. New York: Oxford
University Press.