Chapter 1-9. More Graphics: Popular Scientific Graphs

In this chapter, we will see how to prepare publication-quality graphs for several of the popular graph styles found in the medical literature:

    Bar chart with error bars
    Box plot
    ROC graph
    Kaplan-Meier curve

______________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript]. University of Utah School of Medicine, 2011. http://www.ccts.utah.edu/biostats/?pageId=5385
Chapter 1-9 (revision 27 Jun 2011)

Bar Chart With Error Bars

Start the Stata program and read in the data,

    File
      Open
        Find the directory where you copied the course CD:
        Find the subdirectory datasets & do-files
        Single click on framingham.dta
        Open

    use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do files\framingham", clear

    * which must be all on one line, or use:

    cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
    cd "Biostats & Epi With Stata\datasets & do files"
    use framingham, clear

To obtain the incidence percent by BMI category, we can use,

    tab chdfate bmicat, col

               |                   bmicat
       chdfate |         1          2          3          4 |     Total
    -----------+--------------------------------------------+----------
             0 |        63      1,621      1,180        353 |     3,217
               |     88.73      75.36      63.24      58.74 |     68.61
    -----------+--------------------------------------------+----------
             1 |         8        530        686        248 |     1,472
               |     11.27      24.64      36.76      41.26 |     31.39
    -----------+--------------------------------------------+----------
         Total |        71      2,151      1,866        601 |     4,689
               |    100.00     100.00     100.00     100.00 |    100.00

For error bars, we will use the 95% confidence interval. We can obtain the 95% CI using the immediate form of the ci command, cii, followed by the sample size and then the numerator count. The cii command is used when we already have the numerators and sample sizes available to us, but do not have the individual-level data available.

    cii 71 8
    cii 2151 530
    cii 1866 686
    cii 601 248

Since we do have individual-level data in memory, we can use the ci command with the "binomial" option to inform Stata that the outcome variable is a 0-1 scored variable, rather than a continuous variable, which is the default.

    ci chdfate , binomial by(bmicat)

(These commands use the syntax of Stata 13 and earlier. In Stata 14 and later, the equivalents are cii proportions 71 8 and by bmicat, sort: ci proportions chdfate.)

    -> bmicat = 1
                                                             -- Binomial Exact --
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         chdfate |         71    .1126761    .0375256        .0499197    .2100005

    -> bmicat = 2
                                                             -- Binomial Exact --
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         chdfate |       2151     .246397    .0092911        .2283097    .2651779

    -> bmicat = 3
                                                             -- Binomial Exact --
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         chdfate |       1866    .3676313    .0111618         .345709    .3899705

    -> bmicat = 4
                                                             -- Binomial Exact --
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
         chdfate |        601    .4126456    .0200817        .3729673    .4531865

For a bar chart with error bars, we need to provide Stata with a group variable, bmicat, the height of the bar, percent, and the lower and upper CI limits, percentlcl and percentucl. The following sets up the required data,

    clear
    input bmicat percent percentlcl percentucl
    1 11.27 4.99 21.00
    2 24.64 22.83 26.52
    3 36.76 34.57 39.00
    4 41.26 37.30 45.32
    end

To obtain just the bar chart, we would use

    twoway bar percent bmicat

[Figure: bar chart of percent (y-axis) by bmicat (x-axis)]

To obtain just an error bar chart, we would use,

    twoway rcap percentlcl percentucl bmicat

[Figure: capped error bars, percentlcl/percentucl (y-axis) by bmicat (x-axis)]

To overlay the two graphs, we put parentheses around the two graph types,

    twoway (bar percent bmicat) (rcap percentlcl percentucl bmicat)

[Figure: bars with error bars overlaid; legend shows percent and percentlcl/percentucl]

We are using the four BMI categories recommended by the National Heart, Lung, and Blood Institute (1998)(Onyike et al., 2003):

    underweight (BMI <18.5)
    normal weight (BMI 18.5–24.9)
    overweight (BMI 25.0–29.9)
    obese (BMI ≥30)

To add these labels on the x-axis, we turn off the legend, turn off the x-axis title by assigning it the null string, suppress the x-axis tick marks, and assign labels to the bars. We are now using the #delimit command to continue the command onto multiple lines, so the following block of Stata code must be run in the do-file editor.

    #delimit ;
    twoway (bar percent bmicat)
           (rcap percentlcl percentucl bmicat)
           , legend(off)
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           xtitle("")
           ;
    #delimit cr

[Figure: bar chart with error bars; bars labeled underweight, normal weight, overweight, obese]

To add additional lines to the bar labels, we can use the text() option, which places a text string at a (y, x) coordinate. By default, the string is centered on that coordinate. Because the new lines go below the x-axis, we first need to make room for them. We can use some combination of the following, depending on how much space we need to add at the bottom of the graph:

    xtitle("") b1title(" ") b2title(" ") note(" ") caption(" ")

    xtitle("")     assigns the x-axis title the null string, which turns it off
    b1title(" ")   assigns a blank to the optional graph title at the bottom of the graph, so space is reserved for it
    b2title(" ")   assigns a blank to the optional graph subtitle at the bottom of the graph
    note(" ")      assigns a blank to the optional footnote at the bottom of the graph
    caption(" ")   assigns a blank to the optional caption at the bottom of the graph

    #delimit ;
    twoway (bar percent bmicat)
           (rcap percentlcl percentucl bmicat)
           , legend(off)
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           xtitle("") b1title(" ")
           text(-6 1 "BMI <18.5")
           text(-6 2 "BMI 18.5-24.9")
           text(-6 3 "BMI 25.0-29.9")
           text(-6 4 "BMI 30+")
           ;
    #delimit cr

[Figure: same chart with a second label line under each bar: BMI <18.5, BMI 18.5-24.9, BMI 25.0-29.9, BMI 30+]

If we want the sample sizes shown below these bar labels, we can add more space by assigning a blank subtitle at the bottom of the graph, and then add the sample size text strings.

    #delimit ;
    twoway (bar percent bmicat)
           (rcap percentlcl percentucl bmicat)
           , legend(off)
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: same chart with a third label line under each bar: [ N = 71 ], [ N = 2,151 ], [ N = 1,866 ], [ N = 601 ]]

Next, let's change it to black and white, since that is what the journal will expect.

    #delimit ;
    twoway (bar percent bmicat)
           (rcap percentlcl percentucl bmicat)
           , legend(off) scheme(s1mono)
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: same chart drawn in the s1mono scheme, with gray bars and a border around the plot region]

Journals generally do not accept gray scales for the bars and error bars, since they do not reproduce well. Journals also prefer that a border not be drawn around the graph, since showing only the y-axis and x-axis lines looks more scientific. The option plotregion(style(none)) eliminates the border around the graph. We darken the bars with the bar color option, bcolor(), and the error bar lines with the line color option, lcolor().

    #delimit ;
    twoway (bar percent bmicat, bcolor(black))
           (rcap percentlcl percentucl bmicat, lcolor(black))
           , legend(off) scheme(s1mono)
           plotregion(style(none))
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: same chart with solid black bars and no border around the plot region]

It would look better to have space between the bars. Let's also add a y-axis title, add additional y-axis tick marks, and change the y-axis labels to horizontal.

    #delimit ;
    twoway (bar percent bmicat, bcolor(black) barwidth(.5))
           (rcap percentlcl percentucl bmicat, lcolor(black))
           , legend(off) scheme(s1mono)
           plotregion(style(none))
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           ytitle("Incidence of Coronary Heart Disease (%)")
           ylabels(0(5)50, angle(horizontal))
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: same chart with narrower bars, y-axis labeled 0 to 50 in steps of 5, y-axis title "Incidence of Coronary Heart Disease (%)"]

To add some space to the left and right, so the bars do not look so crowded against the edges of the graph, we use

    #delimit ;
    twoway (bar percent bmicat, bcolor(black) barwidth(.5))
           (rcap percentlcl percentucl bmicat, lcolor(black))
           , legend(off) scheme(s1mono)
           plotregion(style(none))
           xscale(range(.5 4.5 .5))
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           ytitle("Incidence of Coronary Heart Disease (%)")
           ylabels(0(5)50, angle(horizontal))
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: same chart with extra space to the left of the first bar and to the right of the last bar]

To make the error bars show up better, let's add thicker error bars with larger error bar caps,

    #delimit ;
    twoway (bar percent bmicat, bcolor(black) barwidth(.5))
           (rcap percentlcl percentucl bmicat, lcolor(black)
               blwidth(medthick) msize(large))
           , legend(off) scheme(s1mono)
           plotregion(style(none))
           xscale(range(.5 4.5 .5))
           xlabels(1 "underweight" 2 "normal weight"
                   3 "overweight" 4 "obese", notick)
           ytitle("Incidence of Coronary Heart Disease (%)")
           ylabels(0(5)50, angle(horizontal))
           xtitle("") b1title(" ") b2title(" ")
           text(-7 1 "BMI <18.5")
           text(-7 2 "BMI 18.5-24.9")
           text(-7 3 "BMI 25.0-29.9")
           text(-7 4 "BMI 30+")
           text(-11 1 "[ N = 71 ]")
           text(-11 2 "[ N = 2,151 ]")
           text(-11 3 "[ N = 1,866 ]")
           text(-11 4 "[ N = 601 ]")
           ;
    #delimit cr

[Figure: final bar chart with thicker error bars and larger caps]

When submitting this to a journal, state in the figure legend, "The error bars represent 95% confidence intervals for the incidence proportion."
Otherwise, the reader wonders if they are standard errors, standard deviations, or confidence intervals.

Box Plot

Start the Stata program and read in the data,

    File
      Open
        Find the directory where you copied the course CD: Section 1 Stata
        Find the subdirectory datasets & do-files
        Single click on framingham.dta
        Open

    use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\Section 1 Stata\datasets & do files\framingham", clear

    * which must be all on one line, or use:

    cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
    cd "Biostats & Epi With Stata\Section 1 Stata\datasets & do files"
    use framingham, clear

To obtain a box plot of systolic blood pressure by BMI category, we can use,

    graph box sbp , over(bmicat)

[Figure: box plots of sbp (y-axis, 50 to 250) for bmicat groups 1 to 4]

An explanation of the features of a box plot is given in the box below.

Boxplots

This graph is a box-and-whisker plot, or more simply, a boxplot. The box shows the interquartile range (IQR): the top of the box is the 75th percentile, and the bottom of the box is the 25th percentile. The line inside the box is the median (50th percentile). The lines extending beyond the box, which look like error bars, are called the whiskers, and represent the minimum and maximum. However, if a data value lies more than 1.5 × IQR beyond the edge of the box in either direction, the whiskers exclude that value and the value is shown separately as a point on the graph. These points are called extreme values. Extreme values might represent outliers, an outlier being a data value that appears not to have come from the same population as the rest of the sample. This graphical approach for identifying outliers was proposed by Tukey (1977). Tukey referred to outliers as "extreme values" to avoid the whole "outlier" issue, since it is controversial how outliers should be handled.

Exercise

Look at the boxplot in Figure 3 of Bejon et al (2008).
Notice in their footnote they give the same explanation of the boxplot elements as given here.

Let's try some different marker styles for the extreme values. Stata can list the available choices for each of its graph styles. To see which styles such a list is available for, we can use

    graph query

    Styles used in graph options are

    addedlinestyle     alignmentstyle     anglestyle          areastyle
    arrowstyle         arrowdirstyle      compassdirstyle     connectstyle
    functionstyle      functiontypestyle  gridstyle           bystyle
    linestyle          linepatternstyle   linewidthstyle      pstyle
    marginstyle        markerstyle        markerlabelstyle    markersizestyle
    sunflowertypestyle symbolstyle        justificationstyle  clockposstyle
    colorstyle         orientationstyle   ringposstyle        textboxstyle
    textsizestyle      tickstyle          legendstyle

We are after the symbol style.

    graph query symbolstyle

    symbolstyle may be

    circle            circle_hollow     diamond           diamond_hollow
    lgx               none              plus              point
    smcircle          smcircle_hollow   smdiamond         smdiamond_hollow
    smplus            smsquare          smsquare_hollow   smtriangle
    smtriangle_hollow smx               square            square_hollow
    triangle          triangle_hollow   x

    For information on symbolstyle and how to use it, see help symbolstyle.

A more descriptive list of symbol styles can be found using,

    help symbolstyle

    Title

        [G] symbolstyle -- Choices for the shape of markers

    Syntax

                                synonym
        symbolstyle             (if any)    description
        -------------------------------------------------------
        circle                  O           solid
        diamond                 D           solid
        triangle                T           solid
        square                  S           solid
        plus                    +
        x                       X

        smcircle                o           solid
        smdiamond               d           solid
        smsquare                s           solid
        smtriangle              t           solid
        smplus
        smx                     x

        circle_hollow           Oh          hollow
        diamond_hollow          Dh          hollow
        triangle_hollow         Th          hollow
        square_hollow           Sh          hollow

        smcircle_hollow         oh          hollow
        smdiamond_hollow        dh          hollow
        smtriangle_hollow       th          hollow
        smsquare_hollow         sh          hollow

        point                   p           a small dot
        none                    i           a symbol that is invisible
        -------------------------------------------------------

Notice in this table there are two ways to select a symbol: the long name or the short synonym.
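The two ways of naming a symbol can be tried side by side. The following is only a minimal sketch: it assumes the auto dataset that ships with Stata as stand-in data (it is not one of the course datasets), and the graph names longname and synonym are made up for the illustration.

```stata
* Sketch: long symbol name versus its short synonym
* (assumption: the auto dataset shipped with Stata is used as stand-in data)
sysuse auto, clear

* marker symbol given by its long name
twoway scatter mpg weight, msymbol(circle_hollow) name(longname, replace)

* same symbol given by its short synonym
twoway scatter mpg weight, msymbol(Oh) name(synonym, replace)
```

Both graphs should show the same hollow circles, so either spelling can be used wherever a symbolstyle is expected.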

The marker() option is used to control the display of the extreme, or outside, values. We specify 1 as the first argument, which refers to the first outcome variable, sbp. (Several outcome variables can be displayed on the same graph by replacing sbp with a list of variables.)

Trying a hollow circle,

    graph box sbp , over(bmicat) marker(1, msymbol(circle_hollow))

or, using a synonym,

    graph box sbp , over(bmicat) marker(1, msymbol(Oh))

[Figure: box plots of sbp by bmicat with outside values drawn as hollow circles]

To see what sizes of symbols are available, we use,

    graph query symbolsizestyle

    symbolsizestyle may be

    ehuge     huge      large     medium    medlarge  medsmall
    small     tiny      vhuge     vlarge    vsmall    vtiny
    zero

To increase the size of the extreme value symbols, we use,

    graph box sbp , ///
        over(bmicat) marker(1, msymbol(circle_hollow) msize(vlarge))

Note: This must be done in the do-file editor, since we use the line continuation marker, ///. That symbol is not recognized in the Command window.

[Figure: box plots of sbp by bmicat with very large hollow circles for the outside values]

A more convenient way to change symbol sizes and text sizes in Stata is the "*" feature. We simply multiply the default size by some constant: "*.5" would make it half as large, while "*1.5" would make it 1-1/2 times as large. Trying this,

    graph box sbp , ///
        over(bmicat) marker(1, msymbol(circle_hollow) msize(*3))

[Figure: box plots of sbp by bmicat with hollow circles three times the default size]

If we wanted to show this without the extreme scores, we use the nooutsides option,

    graph box sbp , over(bmicat) nooutsides

[Figure: box plots of sbp by bmicat without outside values; Stata adds the note "excludes outside values"]

If we eliminate the extreme scores, we should state so in the legend of the graph. There is nothing wrong with this approach. After all, we report descriptive statistics with means and standard deviations, or with medians and interquartile ranges, without any mention of extreme scores. Box plots are a useful way to discover if you have extreme scores in your data.
For reporting distributions to a reader, however, the extreme scores are usually thought of as just a distraction, and it is very popular to just not show them.

To show this graph in a PowerPoint presentation, we could switch to the "s1color" scheme.

    graph box sbp , over(bmicat) nooutsides scheme(s1color)

[Figure: box plots of sbp by bmicat in the s1color scheme, with the note "excludes outside values"]

Changing the box color, the box line color, and the box line width,

    graph box sbp , over(bmicat) nooutsides scheme(s1color) ///
        box(1, bcolor(blue) blcolor(red) blwidth(*3))

[Figure: blue boxes with thick red outlines]

For publication, we switch to the "s1mono" scheme.

    graph box sbp , over(bmicat) nooutsides scheme(s1mono)

[Figure: box plots in the s1mono scheme, with gray boxes]

Journal editors do not like gray scales, so to get the graph in crisp black and white,

    #delimit ;
    graph box sbp , over(bmicat)
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        ;
    #delimit cr

[Figure: box plots with solid black boxes]

The gray grid lines do not reproduce well, and dark lines are distracting. Let's drop them and drop the footnote.

    #delimit ;
    graph box sbp , over(bmicat)
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        ylabel( , nogrid)
        note("")
        ;
    #delimit cr

[Figure: black box plots without grid lines or footnote]

Adding more ticks to the y-axis, showing the labels in horizontal position, and adding a better y-axis title,

    #delimit ;
    graph box sbp , over(bmicat)
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        note("")
        ylabel(50(25)225, angle(horizontal) nogrid)
        ytitle(Systolic Blood Pressure)
        ;
    #delimit cr

[Figure: black box plots, y-axis labeled 50 to 225 in steps of 25, y-axis title "Systolic Blood Pressure"]

Providing better x-axis labels,

    #delimit ;
    graph box sbp , over(bmicat , relabel(1 "underweight"
            2 "normal weight" 3 "overweight" 4 "obese"))
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        note("")
        ylabel(50(25)225, angle(horizontal) nogrid)
        ytitle(Systolic Blood Pressure)
        ;
    #delimit cr

[Figure: same box plots with x-axis labels underweight, normal weight, overweight, obese]

To add more description to the BMI categories, we use the text() option. We add two blank titles at the bottom of the graph, b1title(" ") and b2title(" "), to provide room for the text. For the y-axis position, imagine the displayed y-axis labels continuing below the x-axis line. For box plots, the x-axis is scaled from 0 to 100. I just used trial and error to guess the x-axis positions of the four boxes until it looked good.

    #delimit ;
    graph box sbp , over(bmicat , relabel(1 "underweight"
            2 "normal weight" 3 "overweight" 4 "obese"))
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        note("")
        ylabel(50(25)225, angle(horizontal) nogrid)
        ytitle(Systolic Blood Pressure)
        b1title(" ") b2title(" ")
        text(20 10 "BMI <18.5")
        text(20 37 "BMI 18.5-24.9")
        text(20 63.5 "BMI 25.0-29.9")
        text(20 91 "BMI 30+")
        text(5 10 "[ N = 71 ]")
        text(5 37 "[ N = 2,151 ]")
        text(5 63.5 "[ N = 1,866 ]")
        text(5 91 "[ N = 601 ]")
        ;
    #delimit cr

[Figure: same box plots with BMI ranges and sample sizes printed under the category labels]

Journal editors do not like the box all the way around the graph.
To have just a y-axis and an x-axis, without the box, which has a more scientific look to it, use

    #delimit ;
    graph box sbp , over(bmicat , relabel(1 "underweight"
            2 "normal weight" 3 "overweight" 4 "obese"))
        box(1, bcolor(black) blcolor(black))
        nooutsides scheme(s1mono)
        note("")
        plotregion(style(none))
        ylabel(50(25)225, angle(horizontal) nogrid)
        ytitle(Systolic Blood Pressure)
        b1title(" ") b2title(" ")
        text(20 10 "BMI <18.5")
        text(20 37 "BMI 18.5-24.9")
        text(20 63.5 "BMI 25.0-29.9")
        text(20 91 "BMI 30+")
        text(5 10 "[ N = 71 ]")
        text(5 37 "[ N = 2,151 ]")
        text(5 63.5 "[ N = 1,866 ]")
        text(5 91 "[ N = 601 ]")
        ;
    #delimit cr

[Figure: final box plot of Systolic Blood Pressure by BMI category, with no border around the plot region]

ROC Graph

We will use the Wieand dataset (wiedat2b.dta) (see Appendix 1 for source). These data come from a case-control study at the Mayo Clinic with 90 pancreatic cancer cases and 51 non-cancer controls with pancreatitis. The predictors are serum samples assayed for CA-19-9, a carbohydrate antigen, and CA-125, a cancer antigen.

Codebook

    Variable   Label
    y1         CA19-9 carbohydrate antigen (continuous)
    y2         CA125 cancer antigen (continuous)
    d          pancreatic cancer (referent standard, or "gold" standard)
               1 = yes
               0 = no

Reading the data into Stata,

    File
      Open
        Find the directory where you copied the course CD:
        Find the subdirectory datasets & do-files
        Single click on wiedat2b.dta
        Open

    use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\wiedat2b.dta", clear

    * which must be all on one line, or use:

    cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
    cd "Biostats & Epi With Stata\datasets & do-files"
    use wiedat2b.dta, clear

To obtain an ROC graph, we can use,

    Statistics
      Epidemiology and related
        ROC analysis
          Nonparametric ROC analysis
            Main tab: Reference variable: d
                      Classification variable: y1
                      Check the Graph the ROC curve box
            OK

    roctab d y1, graph

[Figure: ROC curve, Sensitivity (y-axis) versus 1 - Specificity (x-axis), both 0.00 to 1.00, with the note "Area under ROC curve = 0.8614"]

This is a very nice graph, but it is not camera ready for publication. If you look at the Stata reference manual for the roctab command, you will discover that there is a great deal of control for modifying this graph.

To have full control of the graph, as well as the data points used to construct it, you can cut and paste the following into the Stata do-file editor, highlight it, and hit the execute button (last icon on the right of the Stata do-file editor menu bar). This will set up the command, niceroc, which will remain available for the rest of your Stata session.

Most of this program sets up the data to get sensitivity and 1-specificity for every possible cut-point in the data. The bottom of the program draws a black-and-white camera-ready graph, for publication, or a color graph, which you can use in a PowerPoint presentation or poster presentation.
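To see what one pass of that data setup does, here is a minimal sketch of the calculation at a single cut-point. It is not part of niceroc: the six observations are made-up toy data, and the scalar names and the cut-point of 4 are hypothetical choices for the illustration. A high test value is treated as test positive, as in the program's default.

```stata
* Toy data (hypothetical, for illustration only):
* d = gold standard (1 = disease), y = continuous test
clear
input d y
1 10
1 8
1 3
0 4
0 2
0 1
end

local cut = 4                       // classify y >= 4 as test positive

* 2 x 2 table cells at this cut-point
quietly count if d==1 & y>=`cut'
scalar cell_a = r(N)                // diseased, test positive
quietly count if d==1 & y<`cut'
scalar cell_b = r(N)                // diseased, test negative
quietly count if d==0 & y>=`cut'
scalar cell_c = r(N)                // not diseased, test positive
quietly count if d==0 & y<`cut'
scalar cell_d = r(N)                // not diseased, test negative

scalar sens = cell_a/(cell_a + cell_b)
scalar oneminusspec = 1 - cell_d/(cell_c + cell_d)
display "sensitivity     = " sens
display "1 - specificity = " oneminusspec
```

With these toy values the cut-point of 4 gives a sensitivity of 2/3 and a 1 - specificity of 1/3, which is one point on the ROC curve. Repeating the calculation for every distinct value of the test variable traces out the whole curve.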
You can modify the graph code near the bottom of the program if you want the graph to have a different appearance, then highlight the entire program, and execute it again to set up the revised program.

    * -- camera ready ROC curve in black-and-white or color
    *
    * syntax: [by byvar:] niceroc goldvar testvar [if]
    *            [in] [fweight] [, color reverse]
    *    where goldvar is name of reference standard
    *             variable (gold standard)
    *             assumes 1 = disease present , 0 = disease absent
    *      and testvar is name of continuous test variable
    *             (diagnostic test)
    * options: color = plot in color
    *          reverse = low value is positive for
    *                    gold standard outcome
    *                    (default is high value is
    *                     positive for outcome)
    capture program drop niceroc
    program define niceroc , rclass byable(recall)
        syntax varlist(min=2 max=2)[if][in][fweight] ///
            [, color reverse ]
        tokenize `varlist'
        local goldvar `1'
        local testvar `2'
        marksample touse
        tempvar sensitivity specificity oneminusspec
        quietly gen `sensitivity'=.
        quietly gen `specificity'=.
        quietly gen `oneminusspec'=.   // 1 - specificity
        quietly levelsof `testvar' if `touse', local(templevels)
        local ilev=0   // counter for variable level
        foreach lev of local templevels {
            local ilev = `ilev'+1
            * high test positive - create 2 x 2 table
            if "`reverse'" == "" {
                quietly count if (`goldvar'==1 & `testvar'+1e-16>=`lev' ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_a=r(N)
                quietly count if (`goldvar'==1 & `testvar'+1e-16<`lev' ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_b=r(N)
                quietly count if (`goldvar'==0 & `testvar'+1e-16>=`lev' ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_c=r(N)
                quietly count if (`goldvar'==0 & `testvar'+1e-16<`lev' ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_d=r(N)
            }
            * low test positive (reverse) option specified
            *   - create 2 x 2 table
            else if "`reverse'" ~="" {
                quietly count if (`goldvar'==1 & `testvar'<=`lev'+1e-16 ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_a=r(N)
                quietly count if (`goldvar'==1 & `testvar'>`lev'+1e-16 ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_b=r(N)
                quietly count if (`goldvar'==0 & `testvar'<=`lev'+1e-16 ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_c=r(N)
                quietly count if (`goldvar'==0 & `testvar'>`lev'+1e-16 ///
                    & `goldvar'~=. & `testvar'~=. & `touse')
                local cell_d=r(N)
            }
            * store this cut-point's results in row `ilev' of the dataset
            quietly replace `sensitivity'= ///
                (`cell_a'/(`cell_a'+`cell_b')) if _n==`ilev'
            quietly replace `specificity'= ///
                (`cell_d'/(`cell_c'+`cell_d')) if _n==`ilev'
            quietly replace `oneminusspec' = 1 - `specificity' ///
                if _n==`ilev'
        }
        * force graph to pass through (0,0) coordinate
        local ilev = `ilev'+1
        quietly replace `oneminusspec'= 0 in `ilev'
        quietly replace `sensitivity'= 0 in `ilev'
        *
        *list `testvar' `sensitivity' `specificity' `oneminusspec'
        * restrict roctab to the same sample used for the plotted points
        roctab `goldvar' `testvar' if `touse'
        *return list // ROC returned in r(area)
        local rocarea = string(`r(area)'*100,"%3.1f")
        * so displays as percent with 1 decimal place
        *display "ROC area = " `rocarea'
        preserve
        sort `oneminusspec' `sensitivity'
        * -- black and white graph
        if "`color'" == "" {
            #delimit ;
            graph twoway
                (scatter `sensitivity' `oneminusspec'
                    , color(black) msize(*1.25))
                (line `sensitivity' `oneminusspec'
                    , color(black) lwidth(*1.5))
                (pci 0 0 1 1
                    , color(black) lwidth(*1.5)) /* 45 degree line */
                , scheme(s1mono) legend(off)
                ysize(5) xsize(5)   /* square graph */
                xscale(noline)   /* x-axis same thickness as y-axis */
                ylabel(0(.2)1, angle(horizontal) labsize(*1.25))
                xlabel(0(.2)1, labsize(*1.25))
                ytitle(Sensitivity, size(*1.25))
                xtitle("1 - Specificity", height(5) size(*1.25))
                text(.1 .4 "ROC area = `rocarea'%"
                    , placement(e) size(*1.25))
                ;
            #delimit cr
        }
        * -- color graph --
        else if "`color'" ~= "" {
            #delimit ;
            graph twoway
                (scatter `sensitivity' `oneminusspec'
                    , color(blue) msize(*1.25))
                (line `sensitivity' `oneminusspec'
                    , color(blue) lwidth(*1.5))
                (pci 0 0 1 1
                    , color(green) lwidth(*1.5)) /* 45 degree line */
                , scheme(s1color) legend(off)
                ysize(5) xsize(5)   /* square graph */
                xscale(noline)   /* x-axis not thicker than y-axis */
                ylabel(0(.2)1, angle(horizontal) labsize(*1.25))
                xlabel(0(.2)1, labsize(*1.25))
                ytitle(Sensitivity, size(*1.25))
                xtitle("1 - Specificity", height(5) size(*1.25))
                text(.1 .4 "ROC area = `rocarea'%"
                    , placement(e) size(*1.25))
                ;
            #delimit cr
        }
        restore
    end

Now, to get a black-and-white graph, you use,

    niceroc d y1

[Figure: black-and-white ROC curve, Sensitivity versus 1 - Specificity, with 45-degree reference line and the label "ROC area = 86.1%"]

To get a color graph, you specify the "color" option,

    niceroc d y1 , color

[Figure: color ROC curve, Sensitivity versus 1 - Specificity, with 45-degree reference line and the label "ROC area = 86.1%"]

This program assumes that a large value of the classification, or test, variable represents greater risk for the disease outcome. If the classification variable is reversed, so that a small value represents greater risk, then use the "reverse" option.

    niceroc d y1 , reverse
    * or
    niceroc d y1 , color reverse

You will recognize when the direction is wrong, because the curve will appear below the 45-degree reference line, like so,

[Figure: ROC curve lying below the 45-degree reference line]

In this case, the high values of y1 represent greater risk, so reversing it created an anomalous-looking graph.

Kaplan-Meier Graph

We will practice with the LeeLife dataset (see box).

LeeLife dataset

The source of this dataset is given in Appendix 1 "Dataset Descriptions." The data concern male patients with localized cancer of the rectum diagnosed in Connecticut from 1935 to 1954. The research question is whether survival improved for the 1945-1954 cohort of patients (cohort = 1) relative to the earlier 1935-1944 cohort (cohort = 0).

Data Codebook
________________________________
    id          study ID number
    cohort      1 = 1945-1954 patient cohort
                0 = 1935-1944 patient cohort
    interval    1 to 10, time interval (year) following cancer diagnosis
                11 = still alive and being followed at end of year 10
    died        1 = died
                0 = withdrawn alive or lost to follow-up during year interval

Reading the data in,

    File
      Open
        Find the directory where you copied the course CD
        Change to the subdirectory datasets & do-files
        Single click on LeeLife.dta
        Open

    use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\LeeLife.dta", clear

    * which must be all on one line, or use:

    cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
    cd "Biostats & Epi With Stata\datasets & do-files"
    use LeeLife.dta, clear

In preparation for using the survival time commands, which all begin with st, we use the stset command to inform Stata which is the death, or event, variable, and which is the time variable.

    Statistics
      Survival analysis
        Setup & utilities
          Declare data to be survival time data
            Main tab: Time variable: interval
                      Failure variable: died
            OK

    stset interval, failure(died)

Generating a Kaplan-Meier graph, which is a graph of the Kaplan-Meier cumulative survival estimates,

    Statistics
      Survival analysis
        Graphs
          Kaplan-Meier survivor function
            Main tab: Graph Kaplan-Meier survivor function
                      Make separate calculations by group
                      Grouping variables: cohort
            OK

    sts graph, by(cohort)

[Figure: "Kaplan-Meier survival estimates", survival (0.00 to 1.00) versus analysis time, with lines for cohort = 0 and cohort = 1 running past year 10]

In this dataset, the follow-up ended at the end of year 10. The graph, however, extends to the end of year 11. The time variable has values of 1, 2, …, 10, where the death event is scored at the end of the year. The data were augmented with a score of 11 to indicate the subject was still alive at the end of year 10.
This was done solely to make the life table come out right in Chapter 5-7. If you are not interested in a life table, there is no need to do this. We need to change the 11’s back to 10 to make the Kaplan-Meier graph come out right.

    recode interval (11=10) , gen(interval2)
    stset interval2, failure(died)

Re-graphing,

    sts graph, by(cohort)

[Figure: Kaplan-Meier survival estimates by cohort. x-axis: analysis time, 0 to 10; legend: cohort = 0, cohort = 1]

The graph now correctly displays that the follow-up ended at year 10.

If we decide we want to show the cumulative failure instead, we simply add the failure option.

    Statistics
      Survival analysis
        Graphs
          Kaplan-Meier failure function
    Main tab: Graph Kaplan-Meier failure function
              Make separate calculations by group
              Grouping variables: cohort
    OK

    sts graph , failure by(cohort)

[Figure: Kaplan-Meier failure estimates by cohort. x-axis: analysis time, 0 to 10; legend: cohort = 0, cohort = 1]

This graph is not used as much. However, if the cumulative survival probabilities in the previous graph only ranged from 1.00 down to 0.90, switching to a cumulative failure graph would be a nice way to spread out the graph in the plot region, since the y-axis would then extend from 0 to 0.10.

We will switch back to the cumulative survival graph. There are a lot of options you can utilize for this graph in the Stata menu, but let’s do it manually.

Next, we make a nicer looking graph by removing the legend and adding text labels to the lines.

    #delimit ;
    sts graph, by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        ;
    #delimit cr

[Figure: Kaplan-Meier survival estimates with the legend removed and the two curves labeled “1945-1954 cohort” and “1935-1944 cohort”. x-axis: analysis time]

The command “#delimit ;” changed the end of line delimiter to a semi-colon, which is a good choice since a semi-colon is not required in any Stata command.
Stata now considers whatever we write, no matter how many lines we take, as a single command line until it encounters the semi-colon. The command “#delimit cr” restores the end of line indicator to the carriage return, so we don’t have to keep using the semi-colon at the end of each command for the remainder of the Stata session.

The option “legend(off)” turned off the legend. The “text” option is what places a text string anywhere we choose on the graph. In the first text option, we told Stata to place the text string at the graph position (y, x) of 0.5 on the y-axis and 6 on the x-axis. [In algebra, coordinates of a point are (x, y). In Stata, they are (y, x), consistent with all the regression commands that list y before x.] The string inside quotes is what we wish to display on the graph. The “placement” sub-option informs Stata how to orient the string. We used “ne”, which stands for northeast; this places the string above and to the right of the point, so that the point sits at the lower left corner of the string. The default is “c”, or center, which means to position the text string centered on the point both horizontally and vertically.

Next we eliminate the title, change the x-axis title, and add a y-axis title.

    #delimit ;
    sts graph, by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis")
        ytitle("Survival Probability")
        ;
    #delimit cr

[Figure: Kaplan-Meier curves with no title. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

Stata does not add enough space between the x-axis tick mark labels and the x-axis title. To add space, we add the “height(5)” option to the xtitle.
    #delimit ;
    sts graph, by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ;
    #delimit cr

[Figure: the same graph with more space above the x-axis title. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

Next, we will change the orientation of the y-axis tick labels to horizontal and add some more tick labels to both the y and x axes.

    #delimit ;
    sts graph, by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal))
        xlabels(0(1)10)
        ;
    #delimit cr

For the “xlabels” option, the 0 is the minimum, the “(1)” is the increment, and the 10 is the maximum.

[Figure: the same graph with horizontal y-axis labels from 0.00 to 1.00 in steps of 0.10 and x-axis labels from 0 to 10 in steps of 1]

We change it to black-and-white using “scheme(s1mono)”.

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal))
        xlabels(0(1)10)
        scheme(s1mono)
        ;
    #delimit cr

[Figure: the same graph in black and white. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

To turn off the gridlines, we use the “nogrid” sub-option in the ylabels option. To turn off the border around the graph, the top and right sides, we use “plotregion(style(none))”. Journals do not like the top and right borders because they make the graph look less scientific.
This comes from the idea that the Cartesian coordinate system, used in math, does not have that border, but only x and y axes.

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        ;
    #delimit cr

[Figure: the same graph without gridlines and without the top and right borders. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

We might like a different style of line for each group. To find out what styles are available, use

    graph query linepatternstyle

    linepatternstyle may be
        blank dash dash_3dot dash_dot dash_dot_dot dot longdash longdash_3dot
        longdash_dot longdash_dot_dot longdash_shortdash shortdash shortdash_dot
        shortdash_dot_dot solid tight_dot vshortdash

    For information on linepatternstyle and how to use it, see help linepatternstyle.

Making one line solid and one line dashed,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid))
        plot2opts(lpattern(dash))
        ;
    #delimit cr

[Figure: the same graph with a solid line for the 1935-1944 cohort and a dashed line for the 1945-1954 cohort]

Most graphs do not have the “plot1opts” and “plot2opts” options. The “specialty” graphs in Stata sometimes use this convention. Since there is not a twoway line graph used here, this is a way to pass options to the line feature of the specialty graph.
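For contrast with the “plot1opts” and “plot2opts” convention, here is a minimal sketch of how the same line pattern options are given in an ordinary twoway line graph, where they go directly inside each plot. It assumes the uslifeexp example dataset that ships with Stata and its variables le_male, le_female, and year; none of these are part of this chapter's data.

```stata
* Sketch only: uses Stata's shipped example data, not the LeeLife dataset
sysuse uslifeexp, clear

#delimit ;
twoway (line le_male year, lpattern(solid))
       (line le_female year, lpattern(dash))
       , scheme(s1mono)
       ;
#delimit cr
```

In the twoway graph each line is its own plot, so lpattern() is written inside that plot's parentheses. In sts graph there are no separate plots to write, which is why the options are passed through plot1opts() and plot2opts().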
These lines will not reproduce well, because they are too thin. To find out what line thicknesses are available, use

    graph query linewidthstyle

    linewidthstyle may be
        medium medthick medthin none thick thin vthick vthin vvthick vvthin
        vvvthick vvvthin

    For information on linewidthstyle and how to use it, see help linewidthstyle.

Making thicker lines,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid) lwidth(medthick))
        plot2opts(lpattern(dash) lwidth(medthick))
        ;
    #delimit cr

[Figure: the same graph with thicker solid and dashed lines. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

It would be better to make both lines dark black, so they reproduce better, particularly since we have a different line style to distinguish them. To find out the available line colors, use

    graph query colorstyle

    colorstyle may be
        black blue bluishgray bluishgray8 brown chocolate cranberry cyan dimgray
        dkgreen dknavy dkorange ebblue ebg edkbg edkblue eggshell eltblue
        eltgreen emerald emidblue erose forest_green gold gray green gs0 gs1
        gs10 gs11 gs12 gs13 gs14 gs15 gs16 gs2 gs3 gs4 gs5 gs6 gs7 gs8 gs9
        khaki lavender lime ltblue ltbluishgray ltbluishgray8 ltkhaki magenta
        maroon midblue midgreen mint navy navy8 none olive olive_teal orange
        orange_red pink purple red sand sandb sienna stone sunflowerlime teal
        white yellow

    For information on colorstyle and how to use it, see help colorstyle.
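The “graph query” lists give only the style names. To see what a style looks like before using it, Stata's palette command can draw samples. The following is a minimal sketch; the choice of black and gs8 (a mid gray on the gs0 = black to gs16 = white scale) is only an illustration.

```stata
* Draw swatches of two named colors side by side
palette color black gs8

* Draw a sample of each available line pattern
palette linepalette
```

Each command opens a graph window, so run them one at a time if you want to inspect both.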
Making both lines dark black,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        ;
    #delimit cr

[Figure: the same graph with both lines in black. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

Many researchers stop here, as this is a publication quality graph. A better graph, however, displays the number of subjects still at risk at the bottom of the graph.

Adding the number still at risk to the bottom of the graph,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable
        ;
    #delimit cr

[Figure: the same graph with a number-at-risk table beneath the x-axis]

    Number at risk
    Year          0    1    2    3    4    5    6    7    8    9   10
    cohort = 0  388  388  219  173  127  107   89   77   68   67   60
    cohort = 1  749  749  554  456  391  338  292  209  151  120   89

The row titles “cohort = 0” and “cohort = 1” are not very helpful. To change these row titles, we add the “order” sub-option to the risktable option. In the order sub-option, 1 refers to the first drawn line and 2 to the second drawn line, which correspond to the values 0 and 1 of the cohort variable.
If we reversed the order, “risktable( ,order(2 "1945-1954:" 1 "1935-1944:"))”, then the number at risk rows would switch, with 1945-1954 being the top row.

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable( ,order(1 "1935-1944:" 2 "1945-1954:"))
        ;
    #delimit cr

[Figure: the same graph with relabeled number-at-risk rows]

    Number at risk
    Year          0    1    2    3    4    5    6    7    8    9   10
    1935-1944:  388  388  219  173  127  107   89   77   68   67   60
    1945-1954:  749  749  554  456  391  338  292  209  151  120   89

This graph looks great if you use this much space on the page. However, if you resize it to one column of an article, the numbers become too small to be easily read.

Here is how it would look in a 3 inch column.

[Figure: the same graph and number-at-risk table reduced to a 3-inch width]

A way to avoid this problem is to use fewer numbers and make them larger, as well as make all text and titles larger. First, we will show the number at risk for only the even years, by giving the list 0(2)10 to the risktable option.
    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
        ;
    #delimit cr

[Figure: the same graph with the number at risk shown only at even years]

    Number at risk
    Year          0    2    4    6    8   10
    1935-1944:  388  219  127   89   68   60
    1945-1954:  749  554  391  292  151   89

Next, let’s make it a square graph with a 3-inch width.

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne))
        text(.25 1 "1935-1944 cohort",placement(ne))
        title("")
        xtitle("Years Post Diagnosis",height(5))
        ytitle("Survival Probability")
        ylabels(0(.1)1,angle(horizontal) nogrid)
        xlabels(0(1)10)
        scheme(s1mono)
        plotregion(style(none))
        ysize(3) xsize(3)
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
        ;
    #delimit cr

[Figure: the same graph drawn as a 3-inch square, with the number at risk at even years]

Let’s increase the size of all numbers and text. In Stata version 11, you can do this with the multiplier, using *k, where k is how many times larger or smaller than the default size you desire. With earlier versions of Stata, you use “large” to get something close to *1.25.
The choices are:

    graph query textsizestyle

    textsizestyle may be
        default full half half_tiny huge large medium medlarge medsmall
        minuscule quarter quarter_tiny small tenth third third_tiny tiny vhuge
        vlarge vsmall zero

    For information on textsizestyle and how to use it, see help textsizestyle.

Assuming version 11,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.5 6 "1945-1954 cohort",placement(ne) size(*1.25))
        text(.25 1 "1935-1944 cohort",placement(ne) size(*1.25))
        title("")
        xtitle("Years Post Diagnosis",height(7) size(*1.25))
        ytitle("Survival Probability", size(*1.25))
        ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
        xlabels(0(1)10 , labsize(*1.25))
        scheme(s1mono)
        plotregion(style(none))
        ysize(3) xsize(3)
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:"))
        ;
    #delimit cr

[Figure: the 3-inch square graph with larger axis labels, axis titles, and line labels; the number-at-risk table is unchanged]
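Because most of the options are repeated unchanged from one command to the next, one possible way to shorten a do-file is to store them in a local macro and reuse it. The following is a minimal sketch, assuming the data have already been stset as above. The macro name kmopts is hypothetical and does not appear in the original text, and a local macro exists only while the do-file (or selected block) that defines it is running, so both commands must be run together.

```stata
* Hypothetical helper macro holding the options that do not change
#delimit ;
local kmopts
    legend(off) title("")
    xtitle("Years Post Diagnosis", height(7) size(*1.25))
    ytitle("Survival Probability", size(*1.25))
    ylabels(0(.1)1, angle(horizontal) nogrid labsize(*1.25))
    xlabels(0(1)10, labsize(*1.25))
    scheme(s1mono) plotregion(style(none))
    ysize(3) xsize(3)
    ;
sts graph , by(cohort) `kmopts'
    text(.5 6 "1945-1954 cohort", placement(ne) size(*1.25))
    text(.25 1 "1935-1944 cohort", placement(ne) size(*1.25))
    plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
    plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
    risktable(0(2)10 , order(1 "1935-1944:" 2 "1945-1954:"))
    ;
#delimit cr
```

Only the options that are still being adjusted (text positions, line options, and the risk table) stay in the sts graph command itself, which makes each change easier to see.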
Repositioning the placement of the line labels, and increasing the size of the numbers in the at-risk table,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.6 4 "1945-1954 cohort",placement(ne) size(*1.25))
        text(.1 1 "1935-1944 cohort",placement(ne) size(*1.25))
        title("")
        xtitle("Years Post Diagnosis",height(7) size(*1.25))
        ytitle("Survival Probability", size(*1.25))
        ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
        xlabels(0(1)10 , labsize(*1.25))
        scheme(s1mono)
        plotregion(style(none))
        ysize(3) xsize(3)
        plot1opts(lpattern(solid) lwidth(medthick) lcolor(black))
        plot2opts(lpattern(dash) lwidth(medthick) lcolor(black))
        risktable(0(2)10 ,order(1 "1935-1944:" 2 "1945-1954:") size(*1.25))
        ;
    #delimit cr

[Figure: the same graph with the line labels moved and larger numbers in the at-risk table; the row titles now touch the first column of numbers]

Adding two spaces after the colon in each row title to get some white space before the first number (and shortening the row titles to “1935-44:” and “1945-54:”), increasing the size of the “Number at risk” title, and changing “medthick” to *1.5 to be consistent with how we change the size of other things,

    #delimit ;
    sts graph , by(cohort) legend(off)
        text(.6 4 "1945-1954 cohort",placement(ne) size(*1.25))
        text(.1 1 "1935-1944 cohort",placement(ne) size(*1.25))
        title("")
        xtitle("Years Post Diagnosis",height(7) size(*1.25))
        ytitle("Survival Probability", size(*1.25))
        ylabels(0(.1)1,angle(horizontal) nogrid labsize(*1.25))
        xlabels(0(1)10 , labsize(*1.25))
        scheme(s1mono)
        plotregion(style(none))
        ysize(3) xsize(3)
        plot1opts(lpattern(solid) lwidth(*1.5) lcolor(black))
        plot2opts(lpattern(dash) lwidth(*1.5) lcolor(black))
        risktable(0(2)10 ,order(1 "1935-44:  " 2 "1945-54:  ")
            size(*1.25) title(, size(*1.25)))
        ;
    #delimit cr

[Figure: the final Kaplan-Meier graph. y-axis: Survival Probability; x-axis: Years Post Diagnosis]

    Number at risk (years 0, 2, 4, 6, 8, 10)
    1935-44:  388  219  127   89   68   60
    1945-54:  749  554  391  292  151   89

This graph looks just fine now. If we wanted to change other features of the graph, how to do this is found on pages 415-433, the “sts graph” section, of the Stata Version 11 Survival Analysis and Epidemiological Tables manual. You can find this manual by clicking on Help in the Stata menu bar, and then clicking on PDF Documentation.

References

Altman DG. (1991). Practical Statistics for Medical Research. New York, Chapman & Hall/CRC, pp.426-433.

Bejon P, Lusingu J, Olotu A, et al. (2008). Efficacy of RTS,S/AS01E vaccine against malaria in children 5 to 17 months of age. N Engl J Med 359(24):2521-32.

Onyike CU, Crum RM, Lee HB, Lyketsos CG, Eaton WW. (2003). Is obesity associated with major depression? Results from the third national health and nutrition examination survey. Am J Epidemiol 158(12):1139-1153.

Oettle H, Post S, Neuhaus P, et al. (2007). Adjuvant chemotherapy with gemcitabine vs observation in patients undergoing curative-intent resection of pancreatic cancer. JAMA 297(3):267-277.

Tukey J. (1977). Exploratory data analysis. Reading, MA, Addison-Wesley.