Chapter 1-9. More Graphics: Popular Scientific Graphs

In this chapter, we will see how to prepare publication quality graphs for several of the popular
graph styles found in the medical literature.
[Figure: examples of the four graph styles covered in this chapter]
Bar chart with error bars
Box plot
ROC graph
Kaplan-Meier curve
______________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2011. http://www.ccts.utah.edu/biostats/?pageId=5385
Chapter 1-9 (revision 27 Jun 2011)
Bar Chart With Error Bars
Start the Stata program and read in the data,
File
Open
Find the directory where you copied the course CD:
Find the subdirectory datasets & do-files
Single click on framingham.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do files\framingham", clear
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do files"
use framingham, clear
To obtain the incidence percent by BMI category, we can use,
tab chdfate bmicat, col
           |                   bmicat
   chdfate |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
         0 |        63      1,621      1,180        353 |     3,217
           |     88.73      75.36      63.24      58.74 |     68.61
-----------+--------------------------------------------+----------
         1 |         8        530        686        248 |     1,472
           |     11.27      24.64      36.76      41.26 |     31.39
-----------+--------------------------------------------+----------
     Total |        71      2,151      1,866        601 |     4,689
           |    100.00     100.00     100.00     100.00 |    100.00
For error bars, we will use the 95% confidence interval. We can obtain the 95% CI using the
immediate form of the ci command, cii, followed by the sample size and then the numerator count.
The cii command is used when we already have the numerators and sample sizes, but do
not have the individual-level data available.
cii 71 8
cii 2151 530
cii 1866 686
cii 601 248
Since we do have individual level data in memory, we can use the ci command with the
“binomial” option to inform Stata the outcome variable is a 0-1 scored variable, rather than a
continuous variable, which is the default.
ci chdfate , binomial by(bmicat)
-> bmicat = 1
                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     chdfate |         71    .1126761    .0375256        .0499197    .2100005

-> bmicat = 2
                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     chdfate |       2151     .246397    .0092911        .2283097    .2651779

-> bmicat = 3
                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     chdfate |       1866    .3676313    .0111618         .345709    .3899705

-> bmicat = 4
                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
     chdfate |        601    .4126456    .0200817        .3729673    .4531865
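The cii and ci commands above use the syntax of the Stata release that was current when this chapter was written. In Stata 14 and later, the syntax changed so that the type of interval is named first. The lines below are a sketch of the equivalent commands under that assumption (the exact binomial interval is the default for proportions); on an older release, keep the commands as shown above.

```stata
* Assumes Stata 14 or later, with framingham.dta in memory
cii proportions 71 8          // sample size, then numerator count
cii proportions 2151 530
cii proportions 1866 686
cii proportions 601 248

* individual-level data: one exact binomial CI per BMI category
bysort bmicat: ci proportions chdfate
```

The limits should agree with the output listed above, for example .0499197 to .2100005 for the underweight group.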
For a bar chart with error bars, we need to provide Stata with a group variable, bmicat, the height
of the bar, percent, and the lower and upper CI limits, percentlcl and percentucl.
The following sets up the required data,
clear
input bmicat percent percentlcl percentucl
1 11.27 4.99 21.00
2 24.64 22.83 26.52
3 36.76 34.57 39.00
4 41.26 37.30 45.32
end
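Typing the percentages by hand invites transcription errors. As a sketch of an alternative, assuming Stata 14 or later (where ci proportions leaves its results in r(proportion), r(lb), and r(ub)) and that the current directory contains framingham.dta, statsby can build the same four-row dataset directly. Because it uses the /// line continuation, it must be run from the do-file editor.

```stata
* Assumes Stata 14+ and that framingham.dta is in the current directory.
* statsby with the clear option replaces the data in memory.
use framingham, clear
drop if missing(bmicat)
statsby percent=(100*r(proportion)) percentlcl=(100*r(lb)) ///
    percentucl=(100*r(ub)), by(bmicat) clear: ci proportions chdfate
list, noobs
```

The resulting variables are unrounded, but they have the same names as in the input block above, so the graph commands that follow work unchanged.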
To obtain just the bar chart, we would use
twoway bar percent bmicat
[Figure: bar chart of percent (y-axis) by bmicat (x-axis, 1 to 4)]
To obtain just an error bar chart, we would use,
twoway rcap percentlcl percentucl bmicat
[Figure: capped error bars from percentlcl to percentucl (y-axis) by bmicat (x-axis, 1 to 4)]
To overlay the two graphs, we put parentheses around the two graph types,
twoway (bar percent bmicat)( rcap percentlcl percentucl bmicat)
[Figure: bars of percent with overlaid error bars by bmicat; the legend shows percent and percentlcl/percentucl]
We are using the four BMI categories recommended by the National Heart, Lung, and Blood
Institute (1998)(Onyike et al., 2003):
underweight (BMI <18.5)
normal weight (BMI 18.5–24.9)
overweight (BMI 25.0–29.9)
obese (BMI ≥30)
To add these labels on the X axis, we eliminate the legend, the x-axis title by assigning it the null
string, and the x-axis tick marks, and assign labels to the bars. We are now using the #delimit
command to continue the command onto multiple lines, so the following block of Stata code
must be run in the do-file editor.
#delimit ;
twoway (bar percent bmicat)
( rcap percentlcl percentucl bmicat)
,
legend(off)
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
xtitle("")
;
#delimit cr
[Figure: bar chart with error bars; x-axis labels underweight, normal weight, overweight, obese]
To add additional lines to the bar labels, we can use the text() option, which places a text
string at a given y-x coordinate. By default, the string is centered on that coordinate.
We can use some combination of the following, depending on how much space we need to add at
the bottom of the graph:
xtitle("") b1title(" ") b2title(" ") note(" ") caption(" ")
xtitle("") assigns the x-axis title the null string, which turns it off.
b1title(" ") assigns a blank to the optional graph title at the bottom of the graph, so that
space is reserved for the blank.
b2title(" ") assigns a blank to the optional graph subtitle at the bottom of the graph.
note(" ") assigns a blank to the optional footnote at the bottom of the graph.
caption(" ") assigns a blank to the optional caption at the bottom of the graph.
#delimit ;
twoway (bar percent bmicat)
( rcap percentlcl percentucl bmicat)
,
legend(off)
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
xtitle("") b1title(" ")
text(-6 1 "BMI <18.5")
text(-6 2 "BMI 18.5-24.9")
text(-6 3 "BMI 25.0-29.9")
text(-6 4 "BMI 30+")
;
#delimit cr
[Figure: bar chart with error bars; each x-axis label now has its BMI range on a second line]
If we wanted the sample sizes shown below these bar titles, we could add some more space by
assigning a blank subtitle at the bottom of the graph, and then add the sample size text strings.
#delimit ;
twoway (bar percent bmicat)
( rcap percentlcl percentucl bmicat)
,
legend(off)
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: bar chart with error bars; each x-axis label has its BMI range and sample size below it]
Next, let’s change it to black and white, since that is what the journal will expect.
#delimit ;
twoway (bar percent bmicat)
( rcap percentlcl percentucl bmicat)
,
legend(off) scheme(s1mono)
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: the same bar chart drawn in the s1mono (black and white) scheme]
Journals do not accept gray scales for the bars and error bars, since they do not reproduce well.
Journals also prefer that a border not be drawn around the graph, since it looks more scientific
to have just the y-axis and x-axis lines. The option plotregion(style(none)) eliminates the border
around the graph. We darken the bars with the bar color option, bcolor(), and the error bar lines
with the line color option, lcolor().
#delimit ;
twoway (bar percent bmicat , bcolor(black))
( rcap percentlcl percentucl bmicat, lcolor(black))
,
legend(off) scheme(s1mono)
plotregion(style(none))
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: bar chart with black bars and black error bars, no border around the plot region]
It would look better to have space between the bars. Let’s also add a y-axis title, add additional
y-axis tick marks, and change the y-axis labels to horizontal.
#delimit ;
twoway (bar percent bmicat , bcolor(black) barwidth(.5))
( rcap percentlcl percentucl bmicat, lcolor(black))
,
legend(off) scheme(s1mono)
plotregion(style(none))
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
ytitle("Incidence of Coronary Heart Disease (%)")
ylabels(0(5)50, angle(horizontal))
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: bar chart with narrower bars, y-axis labeled 0 to 50 in steps of 5 with horizontal labels]
To add some space to the left and right, so the bars don’t look so crowded against the edges of the
graph, we use
#delimit ;
twoway (bar percent bmicat , bcolor(black) barwidth(.5))
( rcap percentlcl percentucl bmicat, lcolor(black))
,
legend(off) scheme(s1mono)
plotregion(style(none))
xscale(range(.5 4.5 .5))
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
ytitle("Incidence of Coronary Heart Disease (%)")
ylabels(0(5)50, angle(horizontal))
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: the same bar chart with extra space to the left of the first bar and to the right of the last bar]
To make the error bars show up better, let’s add thicker error bars with larger error bar caps,
#delimit ;
twoway (bar percent bmicat , bcolor(black) barwidth(.5))
(rcap percentlcl percentucl bmicat, lcolor(black)
blwidth(medthick) msize(large))
,
legend(off) scheme(s1mono)
plotregion(style(none))
xscale(range(.5 4.5 .5))
xlabels(1 "underweight" 2 "normal weight" 3 "overweight"
4 "obese" , notick)
ytitle("Incidence of Coronary Heart Disease (%)")
ylabels(0(5)50, angle(horizontal))
xtitle("") b1title(" ") b2title(" ")
text(-7 1 "BMI <18.5")
text(-7 2 "BMI 18.5-24.9")
text(-7 3 "BMI 25.0-29.9")
text(-7 4 "BMI 30+")
text(-11 1 "[ N = 71 ]")
text(-11 2 "[ N = 2,151 ]")
text(-11 3 "[ N = 1,866 ]")
text(-11 4 "[ N = 601 ]")
;
#delimit cr
[Figure: final bar chart with thicker error bars and larger caps]
When submitting this to a journal, in the figure legend state, “The error bars represent 95%
confidence intervals for the incidence proportion.” Otherwise, the reader wonders if they are
standard errors, standard deviations, or confidence intervals.
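Once the graph looks right, it still has to be written to a file in a format the journal accepts. A minimal sketch, assuming the graph to be saved is the one currently displayed in the Graph window; the file names are hypothetical, and the pixel width is only a placeholder, since the resolution required varies by journal.

```stata
* File names are hypothetical; width(3000) is a placeholder pixel width
graph export "chd_by_bmi.eps", replace               // vector format
graph export "chd_by_bmi.tif", width(3000) replace   // raster format
```

Stata chooses the output format from the file extension, so no separate format option is needed here.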
Box plot
Start the Stata program and read in the data,
File
Open
Find the directory where you copied the course CD: Section 1 Stata
Find the subdirectory datasets & do-files
Single click on framingham.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\Section 1 Stata\datasets & do files\framingham", clear
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\Section 1 Stata\datasets & do files"
use framingham, clear
To obtain a box plot of systolic blood pressure by BMI category, we can use,
graph box sbp , over(bmicat)
[Figure: box plots of sbp for bmicat 1 to 4, with outside values shown as points]
An explanation of the features of a box plot is given in the box below.
Boxplots
This graph is a box-and-whisker plot, or more simply, a boxplot. The box shows the
interquartile range (IQR): the top of the box is the 75th percentile and the bottom of the box is
the 25th percentile. The line inside the box is the median (50th percentile). The lines extending
beyond the box, which look like error bars, are called the whiskers and represent the minimum and
maximum. However, if a data value lies more than 1.5 × IQR beyond the box in either direction, the
whiskers exclude that value and the value is shown separately as a point on the graph. These points
are called extreme values. Extreme values might represent outliers, an outlier being a data value
that appears not to have come from the same population as the rest of the sample.
This graphical approach for identifying outliers was proposed by Tukey (1977). Tukey referred
to outliers as “extreme values” to avoid the whole “outlier” issue, since it is controversial how
outliers should be handled.
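The quantities described in the box can be computed directly, which is a useful check on what the graph is showing. The sketch below, assuming framingham.dta is in memory, does so for the underweight group (bmicat == 1). The local macro names are ours, and the count is meant as a guide to how many points to expect rather than an exact replica of the plot.

```stata
* Assumes framingham.dta is in memory; macro names are illustrative
quietly summarize sbp if bmicat == 1, detail
local iqr   = r(p75) - r(p25)          // interquartile range
local lower = r(p25) - 1.5*`iqr'       // lower fence
local upper = r(p75) + 1.5*`iqr'       // upper fence
display "median = " r(p50) ", IQR = " `iqr'
display "fences = " `lower' " to " `upper'

* values outside the fences are the ones drawn as separate points
count if bmicat == 1 & !missing(sbp) & (sbp < `lower' | sbp > `upper')
```

The !missing(sbp) condition matters because Stata treats a missing value as larger than any number, so it would otherwise be counted as lying above the upper fence.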
Exercise: Look at the boxplot in Figure 3 of Bejon et al (2008). Notice that in their footnote they
give the same explanation of the boxplot elements as given here.
Let’s try some different marker styles for the extreme values.
Stata can list the available choices for each kind of style. To see which styles such a list is
available for, we can use
graph query
Styles used in graph options are
addedlinestyle
alignmentstyle
anglestyle
areastyle
arrowstyle
arrowdirstyle
compassdirstyle
connectstyle
functionstyle
functiontypestyle
gridstyle
bystyle
linestyle
linepatternstyle
linewidthstyle
pstyle
marginstyle
markerstyle
markerlabelstyle
markersizestyle
sunflowertypestyle
symbolstyle
justificationstyle
clockposstyle
colorstyle
orientationstyle
ringposstyle
textboxstyle
textsizestyle
tickstyle
legendstyle
We are after the symbol style.
graph query symbolstyle
symbolstyle may be
circle
circle_hollow
diamond
diamond_hollow
lgx
none
plus
point
smcircle
smcircle_hollow
smdiamond
smdiamond_hollow
smplus
smsquare
smsquare_hollow
smtriangle
smtriangle_hollow
smx
square
square_hollow
triangle
triangle_hollow
x
For information on symbolstyle and how to use it, see help symbolstyle.
A more descriptive list of symbol styles can be found using,
help symbolstyle
Title
[G] symbolstyle -- Choices for the shape of markers
Syntax
                         synonym
    symbolstyle          (if any)    description
    -------------------------------------------------------
    circle                 O         solid
    diamond                D         solid
    triangle               T         solid
    square                 S         solid
    plus                   +
    x                      X

    smcircle               o         solid
    smdiamond              d         solid
    smsquare               s         solid
    smtriangle             t         solid
    smplus
    smx                    x

    circle_hollow          Oh        hollow
    diamond_hollow         Dh        hollow
    triangle_hollow        Th        hollow
    square_hollow          Sh        hollow

    smcircle_hollow        oh        hollow
    smdiamond_hollow       dh        hollow
    smtriangle_hollow      th        hollow
    smsquare_hollow        sh        hollow

    point                  p         a small dot
    none                   i         a symbol that is invisible
    -------------------------------------------------------
Notice in this table there are two ways to select a symbol, the long name or the short synonym.
The marker() option is used to control the display of the extreme, or outside, values. We specify 1
as the first argument, which refers to the first outcome variable, sbp. (Several outcome variables
can be displayed on the same graph, by just replacing sbp with a list of variables.)
Trying a hollow circle,
graph box sbp , over(bmicat) marker(1, msymbol(circle_hollow))
or, using a synonym,
graph box sbp , over(bmicat) marker(1, msymbol(Oh))
[Figure: box plots of sbp by bmicat with hollow circles marking the outside values]
To see what sizes of symbols are available, we use,
graph query symbolsizestyle
symbolsizestyle may be
ehuge
huge
large
medium
medlarge
medsmall
small
tiny
vhuge
vlarge
vsmall
vtiny
zero
To increase the size of the extreme values symbols, we use,
graph box sbp , ///
over(bmicat) marker(1, msymbol(circle_hollow) msize(vlarge))
Note: This must be done in do-file editor, since we use the line continuation marker, ///. That
symbol is not recognized in the Command window.
[Figure: box plots of sbp by bmicat with very large hollow circles marking the outside values]
A more convenient way to change symbol sizes and text sizes in Stata is the “*” feature. We
simply multiply the default size by some constant: “*.5” would make it half as large, while
“*1.5” would make it one and a half times as large.
Trying this,
graph box sbp , ///
over(bmicat) marker(1, msymbol(circle_hollow) msize(*3))
[Figure: box plots of sbp by bmicat with hollow circle markers three times the default size]
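The same multiplier works for text sizes as well as marker sizes. As a sketch, assuming framingham.dta is in memory, the following shrinks the markers to half the default size while enlarging the y-axis title and labels; the particular multipliers are arbitrary choices for illustration.

```stata
* Multipliers are arbitrary; each is relative to the scheme's default size
graph box sbp , over(bmicat) ///
    marker(1, msymbol(Oh) msize(*.5)) ///
    ytitle( , size(*1.5)) ylabel( , labsize(*1.25))
```

Because each size is relative to the default, the proportions are kept if the scheme is changed later.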
If we wanted to show this without the extreme scores, we use the nooutsides option,
graph box sbp , over(bmicat) nooutsides
[Figure: box plots of sbp by bmicat without outside values; graph note reads "excludes outside values"]
If we eliminate the extreme scores, we should state so in the legend of the graph.
There is nothing wrong with this approach. After all, we report descriptive statistics with means
and standard deviations or with medians and interquartile ranges, without any mention of extreme
scores. Box plots are a useful way to discover if you have extreme scores in your data. For
reporting distributions to a reader, however, the extreme scores are usually thought of as just a
distraction, and it is very popular to just not show them.
To show this graph in a PowerPoint presentation, we could switch to the “s1color” scheme.
graph box sbp , over(bmicat) nooutsides scheme(s1color)
[Figure: box plots of sbp by bmicat in the s1color scheme; graph note reads "excludes outside values"]
Changing the box color, the box line color, and the box line width,
graph box sbp , over(bmicat) nooutsides scheme(s1color) ///
box(1, bcolor(blue) blcolor(red) blwidth(*3))
[Figure: box plots of sbp by bmicat with blue boxes and thick red box outlines]
For publication, we switch to the “s1mono” scheme.
graph box sbp , over(bmicat) nooutsides scheme(s1mono)
[Figure: box plots of sbp by bmicat in the s1mono scheme; graph note reads "excludes outside values"]
Journal editors do not like gray scales, so to get the graph in crisp black and white,
#delimit ;
graph box sbp , over(bmicat)
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono)
;
#delimit cr
[Figure: box plots of sbp by bmicat with solid black boxes; graph note reads "excludes outside values"]
The gray grid lines do not reproduce well, and dark lines are distracting. Let’s drop them and
drop the footnote.
#delimit ;
graph box sbp , over(bmicat)
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono) ylabel( , nogrid) note("")
;
#delimit cr
[Figure: box plots of sbp by bmicat with solid black boxes, no grid lines, and no footnote]
Adding more ticks to the y-axis, showing the labels horizontally, and adding a better y-axis title,
#delimit ;
graph box sbp , over(bmicat)
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono) note("")
ylabel(50(25)225,angle(horizontal) nogrid)
ytitle(Systolic Blood Pressure)
;
#delimit cr
[Figure: box plots by bmicat; y-axis titled Systolic Blood Pressure, labeled 50 to 225 in steps of 25]
Providing better x-axis labels,
#delimit ;
graph box sbp , over(bmicat , relabel(1 "underweight"
2 "normal weight" 3 "overweight" 4 "obese"))
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono) note("")
ylabel(50(25)225,angle(horizontal) nogrid)
ytitle(Systolic Blood Pressure)
;
#delimit cr
[Figure: the same box plots with x-axis labels underweight, normal weight, overweight, obese]
To add more description to the BMI categories, we use the text() option. We add two blank
subtitles at the bottom of the graph to provide room for the text. For the y-axis position, imagine
the displayed y-axis labels continuing below the x-axis line. For box plots, the x-axis is scaled
from 0 to 100. I just used trial-and-error to guess the x-axis position of the four boxes until it
looked good.
#delimit ;
graph box sbp , over(bmicat , relabel(1 "underweight"
2 "normal weight" 3 "overweight" 4 "obese"))
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono) note("")
ylabel(50(25)225,angle(horizontal) nogrid)
ytitle(Systolic Blood Pressure)
b1title(" ") b2title(" ")
text(20 10 "BMI <18.5")
text(20 37 "BMI 18.5-24.9")
text(20 63.5 "BMI 25.0-29.9")
text(20 91 "BMI 30+")
text(5 10 "[ N = 71 ]")
text(5 37 "[ N = 2,151 ]")
text(5 63.5 "[ N = 1,866 ]")
text(5 91 "[ N = 601 ]")
;
#delimit cr
[Figure: the same box plots with the BMI range and sample size shown below each x-axis label]
Journal editors do not like the box all the way around the graph. To have just the y and x axes,
without the box, which has a more scientific look to it, use
#delimit ;
graph box sbp , over(bmicat , relabel(1 "underweight"
2 "normal weight" 3 "overweight" 4 "obese"))
box(1, bcolor(black) blcolor(black))
nooutsides scheme(s1mono) note("")
plotregion(style(none))
ylabel(50(25)225,angle(horizontal) nogrid)
ytitle(Systolic Blood Pressure)
b1title(" ") b2title(" ")
text(20 10 "BMI <18.5")
text(20 37 "BMI 18.5-24.9")
text(20 63.5 "BMI 25.0-29.9")
text(20 91 "BMI 30+")
text(5 10 "[ N = 71 ]")
text(5 37 "[ N = 2,151 ]")
text(5 63.5 "[ N = 1,866 ]")
text(5 91 "[ N = 601 ]")
;
#delimit cr
[Figure: final box plot of Systolic Blood Pressure by BMI category, with no border around the plot region]
ROC Graph
We will use the Wieand dataset (wiedat2b.dta) (see Appendix 1 for source). These data come
from a case-control study at the Mayo Clinic with 90 pancreatic cancer cases and 51 non-cancer
controls with pancreatitis. The predictors are serum samples assayed for CA-19-9, a
carbohydrate antigen, and CA-125, a cancer antigen.
Codebook
Variable    Label
y1          CA19-9 carbohydrate antigen (continuous)
y2          CA125 cancer antigen (continuous)
d           pancreatic cancer (referent standard, or “gold” standard)
              1 = yes
              0 = no
Reading the data into Stata,
File
Open
Find the directory where you copied the course CD:
Find the subdirectory datasets & do-files
Single click on wiedat2b.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\wiedat2b.dta", clear
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use wiedat2b.dta, clear
To obtain an ROC graph, we can use,
Statistics
Epidemiology and related
ROC analysis
Nonparametric ROC analysis
Main tab: Reference variable: d
Classification variable: y1
Check the Graph the ROC curve box
OK
roctab d y1, graph
[Figure: ROC curve from roctab, Sensitivity versus 1 - Specificity, both axes 0.00 to 1.00. Area under ROC curve = 0.8614]
This is a very nice graph, but it is not camera ready for publication. If you look at the Stata
reference manual for the roctab command, you will discover that there is a great deal of control
for modifying this graph.
To have full control of the graph, as well as the data points used to construct it, you can
cut-and-paste the following into the Stata do-file editor, highlight it, and hit the execute button
(last icon on right of Stata do-file editor menu bar). This will set up the command, niceroc, which
will remain available for the rest of your Stata session.
Most of this program is setting up the data to get sensitivity and 1-specificity for every possible
cut-point in the data.
The bottom of the program is a black-and-white camera ready graph, for publication, and a color
graph, which you can use in a PowerPoint presentation or poster presentation.
You can simply modify the graph code near the bottom of the program if you want the graph to
have a different appearance, then highlight the entire program, and execute it again to set up the
revised program.
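Before reading the program, it may help to see the calculation for a single cut-point done by hand. The sketch below assumes wiedat2b.dta is in memory; the cut-point of 37 is a hypothetical value chosen only for illustration, and the local macro names are ours. The program that follows repeats this calculation for every distinct value of the test variable.

```stata
* Assumes wiedat2b.dta is in memory; the cut-point 37 is hypothetical
local cut = 37
quietly count if d == 1 & !missing(y1)
local ncase = r(N)                       // all cases
quietly count if d == 1 & y1 >= `cut' & !missing(y1)
local sens = r(N)/`ncase'                // test positive among cases
quietly count if d == 0 & !missing(y1)
local ncontrol = r(N)                    // all controls
quietly count if d == 0 & y1 < `cut'
local spec = r(N)/`ncontrol'             // test negative among controls
display "cut-point >= `cut': sensitivity = " %5.3f `sens' ///
    "   1 - specificity = " %5.3f (1 - `spec')

* Stata's own table of sensitivity and specificity at every cut-point
roctab d y1, detail
```

The !missing(y1) conditions guard against Stata treating a missing test value as larger than any cut-point, which is the same reason the program checks for missing values in each count.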
* -- camera ready ROC curve in black-and-white or color
*
* syntax: [by byvar:] niceroc goldvar testvar [if]
*              [in] [fweight] [, color reverse]
* where goldvar is name of reference standard
*              variable (gold standard)
*              assumes 1 = disease present , 0 = disease absent
*   and testvar is name of continuous test variable
*              (diagnostic test)
* options: color = plot in color
*          reverse = low value is positive for
*                    gold standard outcome
*                    (default is high value is
*                    positive for outcome)
capture program drop niceroc
program define niceroc , rclass byable(recall)
syntax varlist(min=2 max=2)[if][in][fweight] ///
[, color reverse ]
tokenize `varlist'
local goldvar `1'
local testvar `2'
marksample touse
tempvar sensitivity specificity oneminusspec
quietly gen `sensitivity'=.
quietly gen `specificity'=.
quietly gen `oneminusspec'=. // 1 - specificity
quietly levelsof `testvar' if `touse', local(templevels)
local ilev=0 // counter for variable level
foreach lev of local templevels {
local ilev = `ilev'+1
* high test positive – create 2 x 2 table
if "`reverse'" == "" {
quietly count if (`goldvar'==1 & `testvar'+1e-16>=`lev' ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_a=r(N)
quietly count if (`goldvar'==1 & `testvar'+1e-16<`lev' ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_b=r(N)
quietly count if (`goldvar'==0 & `testvar'+1e-16>=`lev' ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_c=r(N)
quietly count if (`goldvar'==0 & `testvar'+1e-16<`lev' ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_d=r(N)
}
* low test positive (reverse) option specified
* – create 2 x 2 table
else if "`reverse'" ~="" {
quietly count if (`goldvar'==1 & `testvar'<=`lev'+1e-16 ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_a=r(N)
quietly count if (`goldvar'==1 & `testvar'>`lev'+1e-16 ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_b=r(N)
quietly count if (`goldvar'==0 & `testvar'<=`lev'+1e-16 ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_c=r(N)
quietly count if (`goldvar'==0 & `testvar'>`lev'+1e-16 ///
& `goldvar'~=. & `testvar'~=. & `touse')
local cell_d=r(N)
}
quietly replace `sensitivity'= ///
(`cell_a'/(`cell_a'+`cell_b')) if _n==`ilev'
quietly replace `specificity'= ///
(`cell_d'/(`cell_c'+`cell_d')) if _n==`ilev'
quietly replace `oneminusspec' = 1 - `specificity' ///
if _n==`ilev'
}
* force graph to pass through (0,0) coordinate
local ilev = `ilev'+1
quietly replace `oneminusspec'= 0 in `ilev'
quietly replace `sensitivity'= 0 in `ilev'
*
*list `testvar' `sensitivity' `specificity' `oneminusspec'
* restrict to the estimation sample so if, in, and by: are respected
roctab `goldvar' `testvar' if `touse'
*return list // ROC returned in r(area)
local rocarea = string(`r(area)'*100,"%3.1f")
* so displays as percent with 1 decimal place
*display "ROC area = " `rocarea'
preserve
sort `oneminusspec' `sensitivity'
* -- black and white graph
if "`color'" == "" {
#delimit ;
graph twoway
(scatter `sensitivity' `oneminusspec'
, color(black) msize(*1.25))
(line `sensitivity' `oneminusspec'
, color(black) lwidth(*1.5))
(pci 0 0 1 1 , color(black) lwidth(*1.5))
/* 45 degree line */
, scheme(s1mono) legend(off)
ysize(5) xsize(5) /* square graph */
xscale(noline) /* x-axis same thickness as y-axis */
ylabel(0(.2)1, angle(horizontal) labsize(*1.25))
xlabel(0(.2)1, labsize(*1.25))
ytitle(Sensitivity, size(*1.25))
xtitle("1 - Specificity", height(5) size(*1.25))
text(.1 .4 "ROC area = `rocarea'%"
, placement(e) size(*1.25))
;
#delimit cr
}
* -- color graph --
else if "`color'" ~= "" {
#delimit ;
graph twoway
(scatter `sensitivity' `oneminusspec'
, color(blue) msize(*1.25))
(line `sensitivity' `oneminusspec'
, color(blue) lwidth(*1.5))
(pci 0 0 1 1 , color(green) lwidth(*1.5))
/* 45 degree line */
, scheme(s1color) legend(off)
ysize(5) xsize(5) /* square graph */
xscale(noline) /* x-axis not thicker than y-axis */
ylabel(0(.2)1, angle(horizontal) labsize(*1.25))
xlabel(0(.2)1, labsize(*1.25))
ytitle(Sensitivity, size(*1.25))
xtitle("1 - Specificity", height(5) size(*1.25))
text(.1 .4 "ROC area = `rocarea'%"
, placement(e) size(*1.25))
;
#delimit cr
}
restore
end
Now, to get a black-and-white graph, you use,
niceroc d y1
[Figure: black-and-white ROC curve, Sensitivity versus 1 - Specificity, axes 0 to 1 in steps of .2, annotated "ROC area = 86.1%"]
To get a color graph, you specify the “color” option,
niceroc d y1 , color
[Figure: color ROC curve, Sensitivity versus 1 - Specificity, axes 0 to 1 in steps of .2, annotated "ROC area = 86.1%"]
This program assumes that a large value of the classification, or test, variable represents greater
risk for the disease outcome. If the classification variable is reversed, so that a small value
represents greater risk, then use the “reverse” option.
niceroc d y1 , reverse
* or
niceroc d y1 , color reverse
You will recognize when you have this case, because the curve will appear below the 45
degree reference line, like so,
[Figure: ROC curve lying below the 45 degree reference line, annotated "ROC area = 86.1%"]
In this case, the high values of y1 represent greater risk, so reversing it created an anomalous
looking graph.
Kaplan-Meier Graph
We will practice with the LeeLife dataset (see box).
LeeLife dataset
The source of this dataset is given in Appendix 1 “Dataset Descriptions.” The data concern male
patients with localized cancer of the rectum diagnosed in Connecticut from 1935 to 1954. The
research question is whether survival improved for the 1945-1954 cohort of patients (cohort = 1)
relative to the earlier 1935-1944 cohort (cohort = 0).
Data Codebook
________________________________
id          study ID number
cohort      1 = 1945-1954 patient cohort
            0 = 1935-1944 patient cohort
interval    1 to 10, time interval (year) following cancer diagnosis
            11 = still alive and being followed at end of year 10
died        1 = died
            0 = withdrawn alive or lost to follow-up during year interval
Reading the data in,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on LeeLife.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\LeeLife.dta", clear
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use LeeLife.dta, clear
In preparation for using survival time commands, which all begin with st, we use the stset
command to inform Stata which is the death, or event, variable, and which is the time variable.
Statistics
Survival analysis
Setup & utilities
Declare data to be survival time data
Main tab: Time variable: interval
Failure variable: died
OK
stset interval, failure(died)
Generating a Kaplan-Meier graph, which is a graph of the Kaplan-Meier cumulative survival
estimates,
Statistics
Survival analysis
Graphs
Kaplan-Meier survivor function
Main tab: Graph Kaplan-Meier survivor function
Make separate calculations by group
Grouping variables: cohort
OK
sts graph, by(cohort)
[Graph: “Kaplan-Meier survival estimates”; y-axis 0.00 to 1.00; x-axis “analysis time”; legend: cohort = 0, cohort = 1]
In this dataset, follow-up ended at the end of year 10. The graph, however, extends to the end of
year 11. The time variable has values of 1, 2, …, 10, where the death event is scored at the end
of the year. The data were augmented with a value of 11 to indicate that the subject was still
alive at the end of year 10. This was done solely to make the life table come out right in
Chapter 5-7. If you are not interested in a life table, there is no need to do this.
We need to change the 11’s back to 10 to make the Kaplan-Meier graph come out right.
recode interval (11=10) , gen(interval2)
stset interval2, failure(died)
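A quick check of the recode guards against a mistyped rule. This is a sketch only, assuming the variables interval and interval2 exist as created above.

```stata
* Cross-tabulate the original and recoded time variables:
* every 11 in interval should appear as 10 in interval2
tabulate interval interval2

* Stop with an error if any non-missing value above 10 remains
assert interval2 <= 10 if !missing(interval2)
```

If the assert command returns silently, the recode worked as intended.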
Re-graphing,
sts graph, by(cohort)
[Graph: “Kaplan-Meier survival estimates”; y-axis 0.00 to 1.00; x-axis “analysis time”, 0 to 10; legend: cohort = 0, cohort = 1]
The graph now correctly shows that follow-up ended at year 10.
If we decide we want to show the cumulative failure instead, we simply add the failure option.
Statistics
Survival analysis
Graphs
Kaplan-Meier failure function
Main tab: Graph Kaplan-Meier failure function
Make separate calculations by group
Grouping variables: cohort
OK
sts graph , failure by(cohort)
[Graph: “Kaplan-Meier failure estimates”; y-axis 0.00 to 1.00; x-axis “analysis time”, 0 to 10; legend: cohort = 0, cohort = 1]
This graph is not used as much. However, if the cumulative survival probabilities in the previous
graph only ranged from 1.00 down to 0.90, switching to a cumulative failure graph would be a
nice way to spread out the graph in the plot region, since the y-axis would then extend from 0 to
0.10.
We will switch back to the cumulative survival graph.
There are a lot of options you can utilize for this graph in the Stata menu, but let’s do it
manually.
Next, we make a nicer looking graph by removing the legend and adding text labels to the lines.
#delimit ;
sts graph, by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
;
#delimit cr
[Graph: “Kaplan-Meier survival estimates”, legend removed, curves labeled “1945-1954 cohort” and “1935-1944 cohort”; y-axis 0.00 to 1.00; x-axis “analysis time”, 0 to 10]
The command “#delimit ;” changed the end-of-line delimiter to a semi-colon, which is a good
choice since a semi-colon is not required in any Stata command. Stata now treats
whatever we write, no matter how many lines we take, as a single command line until it
encounters the semi-colon. The command “#delimit cr” restores the end-of-line delimiter to the
carriage return, so we don’t have to keep using the semi-colon at the end of each command for
the remaining commands.
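An alternative to changing the delimiter is Stata's triple-slash line continuation. The sketch below rewrites the same command that way; like “#delimit ;”, it is meant to be run from a do-file rather than typed in the Command window.

```stata
* Same command as above, written with triple-slash line continuation
* instead of changing the delimiter
sts graph, by(cohort)                              ///
    legend(off)                                    ///
    text(.5 6 "1945-1954 cohort", placement(ne))   ///
    text(.25 1 "1935-1944 cohort", placement(ne))
```

Which style to use is a matter of taste. The delimiter approach keeps the right margin clean, while the continuation approach makes it obvious on every line that the command is not finished.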
The option “legend(off)” turned off the legend.
The “text” option places a text string anywhere we choose on the graph. In the first
text option, we told Stata to place the text string at the graph position (y, x) of 0.5 on the
y-axis and 6 on the x-axis. [In algebra, the coordinates of a point are (x, y). In Stata, they are (y, x),
consistent with all the regression commands that list y before x.] The string inside quotes is what
we wish to display on the graph. The “placement” sub-option tells Stata how to orient the string
relative to that point. We used “ne”, which stands for northeast: the string is placed above and to
the right of the point, so the point sits at the lower left corner of the string. The default is “c”,
or center, which centers the text string on the point both horizontally and vertically.
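To see the effect of the placement sub-option, one can mark the anchor point itself. The following is an illustrative sketch, not part of the final graph: it draws a centered “+” at (y, x) = (.5, 6) and attaches the label to the same point with placement(ne).

```stata
* Mark the anchor point (y = .5, x = 6) with a centered "+" and attach
* the label northeast of it, so the point sits at the label's lower left corner
sts graph, by(cohort) legend(off)                  ///
    text(.5 6 "+", placement(c))                   ///
    text(.5 6 "1945-1954 cohort", placement(ne))
```

Changing ne to another compass direction, such as sw, moves the label to the opposite side of the marked point.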
Next we eliminate the title, change the x-axis title, and add a y-axis title.
#delimit ;
sts graph, by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis")
ytitle("Survival Probability")
;
#delimit cr
[Graph: no title; curves labeled “1945-1954 cohort” and “1935-1944 cohort”; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
Stata does not add enough space between the x-axis tick mark labels and the x-axis title. To add
space, we add the “height(5)” option to the xtitle.
#delimit ;
sts graph, by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
;
#delimit cr
[Graph: as before, with more space between the x-axis tick labels and the x-axis title “Years Post Diagnosis”]
Next, we will change the orientation of the y-axis tick labels to horizontal and add some more
tick labels to both the y and x axes.
#delimit ;
sts graph, by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal)) xlabels(0(1)10)
;
#delimit cr
For the “xlabels” option, the 0 is the minimum, the “(1)” is the increment, and the 10 is the
maximum.
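The minimum(increment)maximum notation is Stata's general numlist syntax, so it can be tried out on its own. A minimal sketch using the numlist command, which expands a list and stores the result in r(numlist):

```stata
* Expand the numlist used in xlabels(0(1)10)
numlist "0(1)10"
display "`r(numlist)'"

* The same syntax with an increment of 2
numlist "0(2)10"
display "`r(numlist)'"
```

The second display should show 0 2 4 6 8 10, which are the tick positions Stata would use for xlabels(0(2)10).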
[Graph: y-axis “Survival Probability” with horizontal tick labels 0.00 to 1.00 in steps of 0.10; x-axis “Years Post Diagnosis”, 0 to 10 in steps of 1; curves labeled “1945-1954 cohort” and “1935-1944 cohort”]
We change it to black-and-white using “scheme(s1mono)”.
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal)) xlabels(0(1)10)
scheme(s1mono)
;
#delimit cr
[Graph: the same graph in black-and-white; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
To turn off the gridlines, we use the “nogrid” sub-option in the ylabels option. To turn off the
border around the graph (the top and right sides), we use “plotregion(style(none))”. Journals do
not like the top and right borders because they make the graph look less scientific. This comes from
the idea that the Cartesian coordinate system used in math does not have that border, only x
and y axes.
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
;
#delimit cr
[Graph: the same graph without gridlines and without the top and right borders; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
We might like a different style of line for each group. To find out what styles are available, use
graph query linepatternstyle
linepatternstyle may be
blank
dash
dash_3dot
dash_dot
dash_dot_dot
dot
longdash
longdash_3dot
longdash_dot
longdash_dot_dot
longdash_shortdash
shortdash
shortdash_dot
shortdash_dot_dot
solid
tight_dot
vshortdash
For information on linepatternstyle and how to use it, see help linepatternstyle.
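The names alone do not show what each pattern looks like. As a small aside, Stata's palette command can draw a reference chart of the line patterns in the Graph window:

```stata
* Draw a reference chart of the available line patterns
palette linepalette
```

Note that this replaces whatever graph is currently displayed, so re-run the sts graph command afterward.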
Making one line solid and one line dashed,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid))
plot2opts(lpattern(dash))
;
#delimit cr
[Graph: solid line labeled “1935-1944 cohort” and dashed line labeled “1945-1954 cohort”; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
Most graph commands do not have the “plot1opts” and “plot2opts” options. The “specialty” graphs in
Stata sometimes use this convention. Since we are not calling a twoway line graph directly here,
these options are the way to pass line options to the lines drawn by the specialty graph.
These lines will not reproduce well, because they are too thin.
To find out what line thicknesses are available, use
graph query linewidthstyle
linewidthstyle may be
medium
medthick
medthin
none
thick
thin
vthick
vthin
vvthick
vvthin
vvvthick
vvvthin
For information on linewidthstyle and how to use it, see help linewidthstyle.
Making thicker lines,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid) lwidth(medthick))
plot2opts(lpattern(dash) lwidth(medthick))
;
#delimit cr
[Graph: the same graph with thicker lines; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
It would be better to make both lines dark black, so they reproduce better, particularly since we
have a different line style to distinguish them.
To find out the available line colors, use
graph query colorstyle
colorstyle may be
black          blue           bluishgray     bluishgray8    brown
chocolate      cranberry      cyan           dimgray        dkgreen
dknavy         dkorange       ebblue         ebg            edkbg
edkblue        eggshell       eltblue        eltgreen       emerald
emidblue       erose          forest_green   gold           gray
green          gs0            gs1            gs10           gs11
gs12           gs13           gs14           gs15           gs16
gs2            gs3            gs4            gs5            gs6
gs7            gs8            gs9            khaki          lavender
lime           ltblue         ltbluishgray   ltbluishgray8  ltkhaki
magenta        maroon         midblue        midgreen       mint
navy           navy8          none           olive          olive_teal
orange         orange_red     pink           purple         red
sand           sandb          sienna         stone          sunflowerlime
teal           white          yellow
For information on colorstyle and how to use it, see help colorstyle.
Making both lines dark black,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
;
#delimit cr
[Graph: both lines in black, solid for the 1935-1944 cohort and dashed for the 1945-1954 cohort; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
Many researchers stop here, as this is a publication quality graph.
A better graph, however, displays the number of subjects still at risk at the bottom of the graph.
Adding the number still at risk to the bottom of the graph,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable
;
#delimit cr
[Graph: the same survival curves; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10; risk table below the x-axis]
Number at risk
Year           0    1    2    3    4    5    6    7    8    9   10
cohort = 0   388  388  219  173  127  107   89   77   68   67   60
cohort = 1   749  749  554  456  391  338  292  209  151  120   89
The row titles “cohort = 0” and “cohort = 1” are not very helpful. To change them, we add the
“order” sub-option to the risktable option. In order(), 1 refers to the first drawn line and 2 to the
second drawn line, which correspond to the values 0 and 1 of the cohort variable. If we reversed
this, “risktable( ,order(2 "1945-1954:" 1 "1935-1944:")) ”, then the number at risk
rows would switch, with 1945-1954 being the top row.
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable( ,order(1 "1935-1944:" 2 "1945-1954:"))
;
#delimit cr
[Graph: the same survival curves; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10; risk table below the x-axis with new row titles]
Number at risk
Year           0    1    2    3    4    5    6    7    8    9   10
1935-1944:   388  388  219  173  127  107   89   77   68   67   60
1945-1954:   749  749  554  456  391  338  292  209  151  120   89
This graph looks great if you use this much space on the page. However, if you resize it to one
column of an article, the numbers become too small to be easily read.
Here is how it would look in a 3 inch column.
[Graph: the same graph and risk table reduced to a 3-inch column width; the numbers at risk for years 0 through 10 become too small to read easily]
A way to avoid this problem is to use fewer numbers and make them larger, as well as make all
text and titles larger. First, we will show the number at risk for only the even years.
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
;
#delimit cr
[Graph: the same survival curves; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10; risk table shown for even years only]
Number at risk
Year           0    2    4    6    8   10
1935-1944:   388  219  127   89   68   60
1945-1954:   749  554  391  292  151   89
Next, let’s make it a square graph with a 3-inch width.
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne))
text(.25 1 "1935-1944 cohort",placement(ne))
title("") xtitle("Years Post Diagnosis",height(5))
ytitle("Survival Probability")
ylabels(0(.1)1,angle(horizontal) nogrid) xlabels(0(1)10)
scheme(s1mono) plotregion(style(none))
ysize(3) xsize(3)
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
;
#delimit cr
[Graph: the same graph drawn as a 3-inch by 3-inch square; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10]
Number at risk
Year           0    2    4    6    8   10
1935-1944:   388  219  127   89   68   60
1945-1954:   749  554  391  292  151   89
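Once the graph has the intended dimensions, it can be written to a file for a manuscript. The following is a minimal sketch, assuming the graph is still displayed in the Graph window; the file names km_curve.eps and km_curve.png are hypothetical, and the pixel width is only an example (1800 pixels corresponds to 600 dots per inch at a 3-inch width).

```stata
* Export the graph currently displayed in the Graph window
* (the file names are hypothetical)
graph export "km_curve.eps", replace

* A bitmap version; width() sets the width in pixels
graph export "km_curve.png", width(1800) replace
```

Check the target journal's instructions for the required file format and resolution before choosing between the two.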
Let’s increase the size of all numbers and text. In Stata version 11, you can do this with a
multiplier, *k, where k is how many times larger or smaller than the default size you desire.
With earlier versions of Stata, you use “large” to get something close to *1.25. The choices are:
graph query textsizestyle
textsizestyle may be
default
full
half
half_tiny
huge
large
medium
medlarge
medsmall
minuscule
quarter
quarter_tiny
small
tenth
third
third_tiny
tiny
vhuge
vlarge
vsmall
zero
For information on textsizestyle and how to use it, see help textsizestyle.
Assuming version 11,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.5 6 "1945-1954 cohort",placement(ne) size(*1.25))
text(.25 1 "1935-1944 cohort",placement(ne) size(*1.25))
title("") xtitle("Years Post Diagnosis",height(7) size(*1.25))
ytitle("Survival Probability", size(*1.25))
ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
xlabels(0(1)10 , labsize(*1.25))
scheme(s1mono) plotregion(style(none))
ysize(3) xsize(3)
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
;
#delimit cr
[Graph: the 3-inch square graph with all text enlarged by 1.25; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10; curves labeled “1945-1954 cohort” and “1935-1944 cohort”]
Number at risk
Year           0    2    4    6    8   10
1935-1944:   388  219  127   89   68   60
1945-1954:   749  554  391  292  151   89
Repositioning the line labels and increasing the size of the numbers in the at-risk table,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.6 4 "1945-1954 cohort",placement(ne) size(*1.25))
text(.1 1 "1935-1944 cohort",placement(ne) size(*1.25))
title("") xtitle("Years Post Diagnosis",height(7) size(*1.25))
ytitle("Survival Probability", size(*1.25))
ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
xlabels(0(1)10 , labsize(*1.25))
scheme(s1mono) plotregion(style(none))
ysize(3) xsize(3)
plot1opts(lpattern(solid) lwidth(medthick)
lcolor(black))
plot2opts(lpattern(dash) lwidth(medthick)
lcolor(black))
risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:")
size(*1.25))
;
#delimit cr
[Graph: the same graph with the line labels repositioned and larger numbers in the risk table; the row titles now run into the first number]
Number at risk
Year           0    2    4    6    8   10
1935-1944:388     219  127   89   68   60
1945-1954:749     554  391  292  151   89
Adding two spaces between “1944:” and “388” to get some white space, increasing the size
of the “Number at risk” title, and changing “medthick” to *1.5 to be consistent with how we change
the size of other things,
#delimit ;
sts graph , by(cohort)
legend(off)
text(.6 4 "1945-1954 cohort",placement(ne) size(*1.25))
text(.1 1 "1935-1944 cohort",placement(ne) size(*1.25))
title("") xtitle("Years Post Diagnosis",height(7) size(*1.25))
ytitle("Survival Probability", size(*1.25))
ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
xlabels(0(1)10 , labsize(*1.25))
scheme(s1mono) plotregion(style(none))
ysize(3) xsize(3)
plot1opts(lpattern(solid) lwidth(*1.5)
lcolor(black))
plot2opts(lpattern(dash) lwidth(*1.5)
lcolor(black))
risktable(0(2)10 ,order(1 "1935-44: " 2 "1945-54: ")
size(*1.25) title(, size(*1.25)))
;
#delimit cr
[Graph: the final 3-inch square graph; y-axis “Survival Probability”, 0.00 to 1.00; x-axis “Years Post Diagnosis”, 0 to 10; curves labeled “1945-1954 cohort” and “1935-1944 cohort”]
Number at risk
Year         0    2    4    6    8   10
1935-44:   388  219  127   89   68   60
1945-54:   749  554  391  292  151   89
This graph looks just fine now.
If we wanted to change other features of the graph, how to do this is found on pages 415-433, the
“sts graph” section, of the Stata Version 11, Survival Analysis and Epidemiological Tables
manual. You can find this manual by clicking on Help in the Stata menu bar, and then clicking
on PDF Documentation.
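Because the command grew by repeating the same options at every step, it can be convenient to store the shared options once. The following is a sketch of one way to do that with a local macro; the macro name kmopts is hypothetical, and both commands must be run together from the same do-file, since a local macro disappears when the do-file ends.

```stata
* Store the options shared by every version of the graph in a local macro
* (the macro name kmopts is hypothetical)
local kmopts legend(off) title("") scheme(s1mono) plotregion(style(none))      ///
    xtitle("Years Post Diagnosis", height(5)) ytitle("Survival Probability")   ///
    ylabels(0(.1)1, angle(horizontal) nogrid) xlabels(0(1)10)

* Reuse them, adding only the options that change
sts graph, by(cohort) `kmopts'                                  ///
    risktable(0(2)10, order(1 "1935-1944:" 2 "1945-1954:"))
```

This keeps later experiments short: only the line that differs from the base graph needs to be edited.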
References
Altman DG. (1991). Practical Statistics for Medical Research. New York, Chapman &
Hall/CRC, pp.426-433.
Bejon P, Lusingu J, Olotu A, et al. (2008). Efficacy of RTS,S/AS01E vaccine against malaria in
children 5 to 17 months of age. N Engl J Med 359(24):2521-32.
Onyike CU, Crum RM, Lee HB, Lyketsos CG, Eaton WW. (2003). Is obesity associated with
major depression? Results from the third national health and nutrition examination
survey. Am J Epidemiol 158(12):1139-1153.
Oettle H, Post S, Neuhaus P, et al. (2007). Adjuvant chemotherapy with gemcitabine vs
observation in patients undergoing curative-intent resection of pancreatic cancer. JAMA
297(3):267-277.
Tukey J. (1977). Exploratory data analysis. Reading, MA, Addison-Wesley.