Tricks in Stata
Anke Huss
Generating "automatic" tables in a do-file
Why program tables?
• It takes more writing in the do-file up front!
• BUT: once you have done it, the next one will be faster (copy & paste...)
• No more trouble when your data are updated
• No more copying mistakes, because Stata does it for you
[Photo: Caerphilly Castle]
Data used: Caerphilly Prospective Study (CAPS)
Download at: www.blackwellpublishing.com/essentialmedstats/datasets.htm
Basic idea
• Use the Stata data sheet for your table-to-be

illn        %
MI          19.48
diabetes    1.85
Stored results in r() and e()
• Use stored results, usually from:
r-class: results after general commands such as summarize are saved in r() and generally must be used before executing more commands. For an overview type:
return list
e-class: results from estimation commands (regress/logistic…) are saved in e() until the next model is fitted. Overview:
ereturn list
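A minimal sketch of the two result classes. It assumes Stata's shipped example dataset (sysuse auto) rather than the CAPS data, and the local macro name meanmpg is only illustrative.

```stata
* Assumption: Stata's shipped auto data stand in for the CAPS data
sysuse auto, clear

* r-class: summarize leaves its results in r()
summarize mpg
return list
local meanmpg = r(mean)      // copy the value before another r-class command overwrites r()

* e-class: estimation commands leave their results in e()
regress price mpg
ereturn list
display "mean mpg = " %9.2f `meanmpg' ",  N in model = " e(N)
```

Copying r(mean) into a local right away is one way to keep the value while further commands run.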
Steps
1. DESIGN TABLE FIRST: what do I want my table to look like?
2. Generate a new variable for each column
3. Replace cell with number of interest
4. Use "outsheet" to write your new variables to a text/Excel file
Example 1
1. DESIGN FIRST: what do I want my table to look like? E.g.:

Illness          %
Myocardial inf   19.48
diabetes         1.85
Example 1
2. Generate a new variable for each column
gen str illness = ""
gen percent = .

Illness          %
Example 1
3. Replace cell with contents/number of interest: first column
sort id
replace illness = "myocardial inf" in 1
replace illness = "diabetes" in 2

Illness          %
Myocardial inf
diabetes
Example 1
3. Replace cell with contents/number of interest: second column
sum mi
sort id
replace percent = r(mean)*100 in 1
sum diabetes
sort id
replace percent = r(mean)*100 in 2
format percent %9.2f

Illness          %
Myocardial inf   19.48
diabetes         1.85
Example 1
4. Use "outsheet" to write your new variables to a text/Excel file
outsheet illness percent in 1/2 using textres/illns.txt
For further
*comment 1: the relative path works only if you have set Stata to work in a specific folder, e.g.: cd "d:/Statistisches/automatic_tables/STATA"
*comment 2: you can also export as an Excel file (*.xls), but automatically re-importing the updated text file lets your graphics survive...
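A hedged sketch of the export step in newer Stata versions, where export delimited has superseded outsheet (Stata 13 and later). It assumes the shipped auto data instead of CAPS, and writes to a temporary file whose macro name tmpout is hypothetical; in practice you would give a real path.

```stata
* Assumption: shipped auto data; one table row only, to keep the sketch short
sysuse auto, clear
gen str name = ""
gen percent = .
replace name = "foreign" in 1
summarize foreign
replace percent = r(mean)*100 in 1
format percent %9.2f

* Hypothetical target: a temporary file (replace with e.g. textres/illns.txt)
tempfile tmpout
export delimited name percent using "`tmpout'" in 1/1, delimiter(tab) replace
```

The tab delimiter mirrors the tab-separated text file that outsheet writes by default.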
Example 1
*Alternative way to do the same: program a small loop:
gen str name = ""
gen percent = .
local i = 1
foreach var of varlist mi diabetes {
    replace name = "`var'" in `i'
    sum `var'
    sort id
    replace percent = r(mean)*100 in `i'
    local i = `i' + 1
}
format percent %9.2f
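The loop above can be tried without the CAPS data. The sketch below assumes the shipped auto dataset and invents a second indicator, highmpg, purely for illustration. It omits sort id because auto has no id variable and the sort order is not changed here.

```stata
* Assumption: shipped auto data; highmpg is a hypothetical 0/1 indicator
sysuse auto, clear
gen byte highmpg = mpg > 25

gen str name = ""
gen percent = .
local i = 1
foreach var of varlist foreign highmpg {
    summarize `var', meanonly           // meanonly still stores r(mean)
    replace name = "`var'" in `i'
    replace percent = r(mean)*100 in `i'
    local ++i                           // move to the next table row
}
format percent %9.2f
list name percent in 1/2
```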
Example 2
1. DESIGN TABLE FIRST:

Category      percent
underweight   4.20
normal        32.03
overweight    51.29
obese         12.49
Example 2
2. Generate a new variable for each column
gen str category = ""
gen percent = .

Category      percent
Example 2
3. Replace cell with contents/number of interest: first column
sort id
replace category = "underweight" in 1
replace category = "normal" in 2
replace category = "overweight" in 3
replace category = "obese" in 4

Category      percent
underweight
normal
overweight
obese
Example 2
3. Replace cell with numbers: second column
tab bmicat, gen(bminew)
*4 lines with percentages
*4 variables with names ending in numbers from 1 to 4 --LOOP!
forvalues i = 1/4 {
    sum bminew`i'
    sort id
    replace percent = r(mean)*100 in `i'
}
format percent %9.2f

Category      percent
underweight   4.20
normal        32.03
overweight    51.29
obese         12.49
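A sketch of the same indicator-variable loop on the shipped auto data (an assumption; rep78 stands in for bmicat, and the prefix repcat is hypothetical). It takes the number of categories from r(r), which a one-way tabulate stores, instead of hard-coding it.

```stata
* Assumption: shipped auto data; rep78 plays the role of bmicat
sysuse auto, clear
tabulate rep78, gen(repcat)      // creates repcat1 ... repcat5
local k = r(r)                   // number of categories found by tabulate

gen percent = .
forvalues i = 1/`k' {
    summarize repcat`i', meanonly
    replace percent = r(mean)*100 in `i'
}
format percent %9.2f
list percent in 1/`k'
```

Because the indicators are missing wherever rep78 is missing, the percentages refer to non-missing observations and add up to 100.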
Example 2
4. Outsheet
...same as in Example 1
Less writing...
label list bmicat
capture drop percent category bminew*
tab bmicat, gen(bminew)
gen category = .
gen percent = .
forvalues i = 1/4 {
    sum bminew`i'
    sort id
    replace category = `i' in `i'
    replace percent = r(mean)*100 in `i'
}
label values category bmicat
format percent %9.2f
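The value-label trick can be sketched on the shipped auto data, where the variable foreign carries the value label origin (both are properties of that example dataset, not of CAPS). The extra decode step, which turns the labelled numbers into a string column, is an optional addition and the name category_txt is hypothetical.

```stata
* Assumption: shipped auto data; foreign (0/1) has the value label "origin"
sysuse auto, clear
gen category = .
gen percent = .
local row = 1
foreach v of numlist 0 1 {
    count if foreign == `v'
    replace category = `v' in `row'
    replace percent = 100 * r(N) / _N in `row'
    local ++row
}
label values category origin            // reuse the existing value label
decode category, gen(category_txt)      // optional: labels as a string column
format percent %9.2f
list category_txt percent in 1/2
```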
Example 3
1. THINK FIRST: table after logistic regression

Myocardial infarction            OR     uci    lci    pval
Current smoking
Current smoking (+ age)
Current smoking (+ age + bmi)
Example 3
2. Generate a new variable for each column
gen str currsmok = ""
gen OR = .
gen uci = .
gen lci = .
gen pval = .
Example 3
3. Replace cell with contents/number of interest: first column
sort id
replace currsmok = "current smoking" in 1
replace currsmok = "current smoking + age" in 2
replace currsmok = "current smoking + age + bmi" in 3
Example 3
3. Replace cell with numbers: remaining columns (row 1 shown)
logistic mi cursmoke
sort id
replace OR = exp(_b[cursmoke]) in 1
replace lci = exp(_b[cursmoke] - 1.96*_se[cursmoke]) in 1
replace uci = exp(_b[cursmoke] + 1.96*_se[cursmoke]) in 1
est store A
logistic mi
est store B
lrtest A B
sort id
replace pval = r(p) in 1
*...and likewise for rows 2 and 3
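A self-contained sketch of the same pattern, assuming the shipped auto data (foreign as outcome, mpg as exposure) in place of mi and cursmoke. Two deliberate deviations, both assumptions of this sketch: it fits the models with logit, which stores the same log-odds coefficients in _b[], and it uses invnormal(0.975) instead of the rounded 1.96.

```stata
* Assumption: shipped auto data; foreign ~ mpg stands in for mi ~ cursmoke
sysuse auto, clear
gen OR = .
gen lci = .
gen uci = .
gen pval = .

logit foreign mpg
replace OR  = exp(_b[mpg]) in 1
replace lci = exp(_b[mpg] - invnormal(0.975)*_se[mpg]) in 1
replace uci = exp(_b[mpg] + invnormal(0.975)*_se[mpg]) in 1
estimates store full

logit foreign                    // constant-only model for the LR test
estimates store null
lrtest full null
replace pval = r(p) in 1

list OR lci uci pval in 1
```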
Example 3
4. Outsheet
...as in Example 1
Resulting table

Myocardial infarction            OR     uci    lci    pval
Current smoking                  1.74   2.22   1.36   6.76e-06
Current smoking (+ age)          1.67   2.18   1.28   0
Current smoking (+ age + bmi)    1.82   2.40   1.39   0
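Very small p-values in the pval column can be awkward to read in a finished table. One possible way to tidy them, sketched under assumptions: the first value is taken from the table above, the second is made up for illustration, and the name pval_txt and the 0.001 cut-off are hypothetical choices.

```stata
* Assumption: a tiny stand-alone dataset; the second p-value is invented
clear
set obs 2
gen pval = .
replace pval = 6.76e-06 in 1
replace pval = 0.0312 in 2

* Report very small values as "<0.001", otherwise three decimals
gen pval_txt = cond(pval < 0.001, "<0.001", string(pval, "%5.3f"))
list pval pval_txt
```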
Other way to save results after estimation commands
• Use the statsby command, e.g.:
statsby "logistic mi diabetes smoking" _b _se, saving(D:\Statistisches\automatic_tables\STATA\data\caerphillystatsby.dta) replace
Statsby will collapse your dataset!
Store the results in a new dataset and open the original file again. Rerun "statsby" with the next variables and append the data to the first stored results.
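The store, reopen, rerun and append workflow might look as follows in current Stata, where statsby is written as a prefix command rather than with the quoted-command form shown above. The sketch assumes the shipped auto data, uses regress by group purely for illustration, and the tempfile name results1 is hypothetical.

```stata
* Assumption: shipped auto data; regress by car origin, for illustration only
sysuse auto, clear

* First run: the data in memory are replaced by the collected results
statsby _b _se, by(foreign) clear: regress price mpg
tempfile results1
save `results1'

* Open the original data again and rerun with the next variable
sysuse auto, clear
statsby _b _se, by(foreign) clear: regress price weight

* Append the first stored results to the new ones
append using `results1'
list
```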