Creating Custom Graphic Output Using the SG Annotate Data Set

Doug Lassman, Quintiles, Overland Park, KS
ABSTRACT
Beginning with the version 9.3 release, SAS added the capability to include custom text, lines and shapes in ODS
Statistical Graphics. This paper shows examples of how these annotations can be used to enhance graphic output. The
examples are primarily from the pharmaceutical industry but can easily be extended to other areas. Annotations
can be used in the SGPLOT, SGPANEL and SGSCATTER procedures. The examples shown here use PROC SGPLOT and
concentrate mainly on the TEXT and LINE annotation functions, but the other functions are also discussed.
INTRODUCTION
This paper discusses the basics of the SG annotate dataset, the elements that can be added, and the drawspace
variables that control the position and scaling of the annotations. Three examples are discussed, and the complete
code to generate them is included in the appendices.
SG ANNOTATION
The following elements can be added to a graph:
• text labels
• lines and arrows
• ovals or circles
• rectangles or squares
• polygons
• images
THE SG ANNOTATE DATASET
To add the above elements to an SG plot, an annotate dataset must be created. Variables in this dataset specify the
type of element, its location and color, and any text to be displayed. The following is an example from the SAS® 9.4
ODS Graphics Procedures Guide. The annotate dataset creates a line at the average value of height and then
displays that average in a text field. In the first observation, the label is specified along with the text color blue, the
function ‘text’ and the (x1, y1) position. In the second observation, the line color is specified as blue and the function
as ‘line’, with two sets of ordered pairs marking the start and end points of the line.
obs  label                     textcolor  linecolor  function  x1  y1  x2  y2
1    Average Height 62 Inches  blue                  text      20  70  .   .
2                                         blue       line      10  60  99  60
Output 1. Line annotate dataset
PROC SGPLOT is used with the SCATTER statement to create the plot. The annotate dataset is specified in the
sganno=Line option. The plot is displayed with the line at the average value and the text indicating the average.
The result is displayed in Figure 1.
proc sgplot data=sashelp.class sganno=Line;
scatter x=weight y=height;
run;
Figure 1. Scatter plot with annotations
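For readers who want to reproduce Output 1, the following DATA step is one way the Line annotate dataset might be
built. The variable lengths are assumptions chosen for illustration. No drawspace variable is set, so the procedure's
default drawing space applies; the SAS documentation gives this as GRAPHPERCENT, which is worth confirming for
your release.

```sas
/* Sketch only: rebuilds the two observations listed in Output 1. */
/* Variable lengths are assumed; they are not given in the paper. */
data Line;
   length label $ 30 textcolor linecolor function $ 9;

   /* Observation 1: text label anchored at (x1, y1) */
   function  = 'text';
   label     = 'Average Height 62 Inches';
   textcolor = 'blue';
   x1 = 20; y1 = 70; x2 = .; y2 = .;
   output;

   /* Observation 2: line from (x1, y1) to (x2, y2) */
   function  = 'line';
   label     = ' ';
   textcolor = ' ';
   linecolor = 'blue';
   x1 = 10; y1 = 60; x2 = 99; y2 = 60;
   output;
run;

proc print data=Line;
run;
```

The PROC PRINT step is included only so the result can be compared with Output 1 before the dataset is passed to sganno=.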
THE SG ANNOTATE FUNCTIONS
Table 1 lists the ten functions available in the SG annotate dataset and describes their output.
Function    Description
ARROW       Draws an arrow annotation.
IMAGE       Specifies a graphic file to use for an image annotation.
LINE        Draws a line annotation.
OVAL        Draws an oval or circle annotation.
POLYCONT    Continues drawing a polygon that was begun with the POLYGON function or a line that was
            begun with the POLYLINE function.
POLYGON     Specifies the beginning point of a polygon.
POLYLINE    Specifies the beginning point of a polyline, which is a connected series of line segments.
RECTANGLE   Draws a rectangle or square annotation.
TEXT        Places text in the graph output.
TEXTCONT    Continues a text string.
Table 1. Summary of SG Annotate Functions
MODIFYING AN SG PROCEDURE
As mentioned above, the sganno= option naming the annotate dataset must be added to the procedure statement. If the
elements will be drawn inside the existing plot, no other modifications are necessary. If the elements will be placed
outside the plot, such as a table of numbers corresponding to the values in the plot, the pad= option can be added to
create the necessary space. For example, adding pad=(right=25%) to the PROC SGPLOT statement reserves a margin
on the right side of the plot for those elements.
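As a minimal sketch of the pad= idea, the step below reserves a right-hand margin and writes one text annotation into
it. The dataset name margin_note, the label text, and the coordinates are illustrative assumptions and are not taken
from the examples in this paper.

```sas
/* Hypothetical annotate dataset: one text label placed in the padded right margin. */
data margin_note;
   length function $ 9 label $ 20 drawspace $ 13 anchor $ 11 widthunit $ 7;
   function  = 'text';
   drawspace = 'graphpercent';   /* coordinates are percentages of the whole graph */
   x1 = 78; y1 = 50;             /* assumed position inside the 25% right margin   */
   anchor    = 'left';           /* left edge of the text box sits at (x1, y1)     */
   width     = 20;               /* text box width of 20 ...                       */
   widthunit = 'percent';        /* ... interpreted as a percentage                */
   label     = 'N = 19 students';
   output;
run;

/* pad=(right=25%) leaves room on the right for the annotation. */
proc sgplot data=sashelp.class sganno=margin_note pad=(right=25%);
   scatter x=weight y=height;
run;
```

Because the label is positioned in graph percent rather than data values, it stays in the margin regardless of the axis ranges.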
THE SG ANNOTATE DRAWSPACE
The position and scaling of the annotations are controlled by specifying the drawspace and the units. This can be done
individually for the x and y coordinates or for both at once. There are four areas where elements can be rendered:
graph, layout, wall and data. The graph area is the entire region of the graph image, including axes, titles, footnotes
and legend space. The layout area excludes the title and footnote space. The wall area is within the axes including
offsets. The data area is within the axes not including offsets. The following units can be used: percentage, pixels, or
data values. These are set using the drawspace, x1space, y1space, x2space, and y2space variables in the annotate dataset.
Possible values for these variables are datapercent, datapixel, datavalue, graphpercent, graphpixel, layoutpercent,
layoutpixel, wallpercent, and wallpixel. The examples below all use datavalue.
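To illustrate setting the drawing space separately for each coordinate, the sketch below draws a vertical line at a
data value on the x-axis that spans the full height of the wall area. The dataset name refline, the color, and the
value 100 are assumptions made for illustration only.

```sas
/* Hypothetical example: x positions in data values, y positions in wall percent. */
data refline;
   length function $ 9 linecolor $ 9 x1space y1space x2space y2space $ 13;
   function  = 'line';
   linecolor = 'gray';
   x1space = 'datavalue';   x2space = 'datavalue';     /* x given in data units          */
   y1space = 'wallpercent'; y2space = 'wallpercent';   /* y given as percent of the wall */
   x1 = 100; y1 = 0;      /* start at weight=100, bottom of the wall */
   x2 = 100; y2 = 100;    /* end at weight=100, top of the wall      */
   output;
run;

proc sgplot data=sashelp.class sganno=refline;
   scatter x=weight y=height;
run;
```

Mixing spaces this way avoids having to know the y-axis range in advance, since the line always runs from the bottom to the top of the wall.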
EXAMPLE 1 – LAB SHIFT PLOT
This is a typical lab shift plot that is presented in clinical trials to assess the safety of a drug. The baseline value is
plotted on the x-axis and the post-baseline value on the y-axis. Each treatment is shown with a different symbol. The
plot was created using PROC SGPLOT and the SCATTER statement. Lines indicating the lower limit of normal (LLN) and
upper limit of normal (ULN) were annotated. For the lab test displayed, the LLN is 150 and the ULN is 450. The
lines can be annotated using three functions: line, rectangle, or polyline/polycont.
The first method creates an annotate dataset using the line function and draws four lines from the points (x1, y1) to
(x2, y2). The value of drawspace is ‘datavalue’, which positions and scales the lines with respect to the data values.
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' function 'line' drawspace 'datavalue';
x1=150; y1=150; x2=150; y2=450; output;
x1=150; y1=450; x2=450; y2=450; output;
x1=450; y1=450; x2=450; y2=150; output;
x1=450; y1=150; x2=150; y2=150; output;
run;
The second method creates an annotate dataset using the rectangle function. One corner is specified as (x1, y1).
The height and width are both set to 300. Heightunit and widthunit are set to ‘data’. Finally, the anchor is specified
as ‘bottomleft’ because (x1, y1) is the bottom left corner of the rectangle.
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' function 'rectangle' drawspace 'datavalue'
heightunit 'data' widthunit 'data' anchor 'bottomleft';
x1=150; y1=150; height=300; width=300;
run;
The last method creates an annotate dataset using the polyline and polycont functions. The first corner is specified
as (x1, y1) with the function ‘polyline’. Subsequent points are specified as (x1, y1) with the function ‘polycont’.
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' drawspace 'datavalue';
function='polyline'; x1=150; y1=150; output;
function='polycont'; x1=150; y1=450; output;
function='polycont'; x1=450; y1=450; output;
function='polycont'; x1=450; y1=150; output;
function='polycont'; x1=150; y1=150; output;
run;
Any of the three methods could be used to create the lines.
Figure 2. Lab Shift Plot
EXAMPLE 2 – WATERFALL PLOT
The waterfall plot was created using PROC SGPLOT with the VBAR statement. In this case, a bar for each patient is
displayed indicating the percent change in tumor size. The bars are colored by treatment. Also note
that the data are sorted by percent change to create the waterfall effect. In oncology studies, patients are also given a
best overall response rating. This rating was annotated just above each positive bar and just below each negative bar
using the following annotate dataset. The function ‘text’ is used along with the drawspace value ‘datavalue’. The best
overall response is centered at the point (x1, y1).
data best_response;
set waterfall;
length label $ 2 textcolor $ 9 function $9;
retain textcolor 'black' function 'text' drawspace 'datavalue' textsize 7;
x1=x;
if pcchgb<0 then y1=pcchgb-2;
if pcchgb>=0 then y1=pcchgb+2;
label=put(boro,boro.);
run;
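As an alternative sketch, the fixed offset of 2 could be replaced by the anchor variable so that the label sits directly
at the end of each bar. This is not the method used for Figure 3. The dataset names wf_demo and best_response_anchor
are hypothetical, the three rows are taken from the data in Appendix 2, and the behavior assumed here is that ‘bottom’
places the bottom of the text at (x1, y1) while ‘top’ places the top of the text there.

```sas
proc format;
   value boro 0='SD' 1='CR' 2='PD' 3='PR' 4='NE';
run;

/* Three rows from Appendix 2, already ordered by percent change. */
data wf_demo;
   input x pcchgb boro;
   datalines;
1 -52.1 4
2 -7.7 0
3 4.5 1
;
run;

/* Same structure as best_response, but positioned with anchor instead of an offset. */
data best_response_anchor;
   set wf_demo;
   length label $ 2 textcolor $ 9 function $ 9 anchor $ 11;
   retain textcolor 'black' function 'text' drawspace 'datavalue' textsize 7;
   x1 = x;
   y1 = pcchgb;                          /* exactly at the end of the bar    */
   if pcchgb < 0 then anchor = 'top';    /* text hangs below a negative bar  */
   else anchor = 'bottom';               /* text rests above a positive bar  */
   label = put(boro, boro.);
run;

proc sgplot data=wf_demo sganno=best_response_anchor;
   vbar x / response=pcchgb;
   xaxis display=none;
   yaxis label='Percent Change';
run;
```

The advantage of this variation is that the spacing no longer depends on the scale of the y-axis; the trade-off is that the text touches the bar unless a small offset is added back.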
Figure 3. Waterfall Plot
EXAMPLE 3 – ADVERSE EVENT PLOT
The final example is a plot of adverse events over time. In this case, only adverse events are plotted, but additional
symbols such as dosing times or events of special interest could easily be added. The plot is created with PROC
SGPLOT and the SCATTER statement. Time is displayed on the x-axis and the patient number on the y-axis. It is
designed to show 20 patients per page and dynamically creates as many pages as necessary. A format is
created that contains the patient numbers for each page. Finally, an annotate dataset is created to display a line for
each patient indicating the duration in the study. The function ‘line’ is used to draw a line for each patient from x=0
to x=day.
data annotate;
merge dosing paging;
by invpt;
if page=input("&i",best.);
length linecolor $ 9 function $9;
retain linecolor 'black' function 'line' drawspace 'datavalue';
x1=0;
y1=y;
x2=day;
y2=y;
run;
Figure 4. Adverse Event Plot – Page 1
Figure 5. Adverse Event Plot - Page 2
DIFFERENCES BETWEEN SAS/GRAPH ANNOTATE AND SG ANNOTATE
The following are the main differences between SAS/GRAPH annotate and SG annotate datasets:
• SAS/GRAPH annotate datasets use the xsys and ysys variables to specify the draw space, while SG annotate
datasets use the drawspace, x1space and y1space variables. The values for xsys and ysys are 1-9 and A-C, which
specify absolute or relative, data or screen or window, and value or cell or percentage. SG annotate
uses more intuitive values such as ‘datapercent’, ‘datavalue’, ‘graphpercent’, or ‘layoutpercent’.
• SAS/GRAPH annotate datasets hold one (x, y) pair per observation, so two observations are required to draw a
line: one with the ‘move’ function to the starting (x, y) value, then another with the ‘draw’ function to the
ending (x, y) value. SG annotate datasets allow two sets of ordered pairs on the same observation.
• SAS/GRAPH annotate datasets use the position variable to place text and symbols at the desired location.
The values can be 0-9 and A-F for various combinations of centered, left or right justified and centered,
above or below the specified point. SG annotate uses the anchor variable with more intuitive values such as ‘top’,
‘center’, ‘bottom’, ‘topright’, or ‘bottomleft’, to name a few.
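The move/draw difference can be sketched side by side. Both DATA steps below describe the same vertical line from
(150, 150) to (150, 450) in data coordinates. The dataset names sasgraph_anno and sg_anno and the variable lengths are
assumptions for illustration.

```sas
/* SAS/GRAPH annotate: two observations (move, then draw) describe one line. */
data sasgraph_anno;
   length function $ 8 color $ 8 xsys ysys $ 1;
   retain xsys '2' ysys '2' color 'black';   /* '2' = absolute data values */
   function = 'move'; x = 150; y = 150; output;
   function = 'draw'; x = 150; y = 450; output;
run;

/* SG annotate: the same line held in a single observation. */
data sg_anno;
   length function $ 9 linecolor $ 9 drawspace $ 13;
   retain function 'line' linecolor 'black' drawspace 'datavalue';
   x1 = 150; y1 = 150; x2 = 150; y2 = 450; output;
run;
```

The first dataset would be passed to a SAS/GRAPH procedure through its annotate= option, while the second is passed to an SG procedure through sganno=.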
CONCLUSION
SG annotate datasets allow custom text, lines, shapes and even images to be added to SG output. This makes it
easier to enhance the output, possibly with less code and development time.
RECOMMENDED READING
• SAS® 9.4 ODS Graphics Procedures Guide
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Doug Lassman
Quintiles
6700 W. 115 St.
Overland Park, KS 66211
913-708-6395
doug.lassman@quintiles.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
APPENDIX 1 – CODE FOR LAB SHIFT PLOT
*** Create dataset with subject number, treatment, baseline and post ***;
*** baseline Platelet Counts                                         ***;
data platelet;
input subj $ trt $ baseline post @@;
cards;
0001001 A 269 267 0001002 B 247 274 0001003 A 238 392 0001004 A 373 425
0002001 A 405 348 0002002 A 384 419 0002003 A 356 382 0002004 A 253 222
0002005 B 324 379 0002006 A 283 305 0003001 B 246 269 0003002 C 255 253
0003003 B 398 459 0003004 B 390 359 0003005 B 293 299 0003006 A 280 285
0003007 C 256 253 0003008 A 175 150 0004001 A 235 284 0004003 B 225 250
0004004 B 328 321 0005001 A 315 387 0005002 A 306 351 0005003 B 325 319
0005004 A 392 358 0005005 A 264 278 0005008 A 284 332 0006001 A 308 254
0006002 A 336 282 0006003 B 265 284 0006004 C 339 339 0008002 B 285 280
0009001 B 345 268 0009002 A 331 288 0009004 B 228 242 0009005 A 219 212
0011001 C 270 283 0012001 B 366 320 0012002 B 175 140 0012003 A 325 324
0012004 A 284 225 0012005 C 238 248 0012006 A 404 476 0012007 B 275 301
0012008 C 302 323 0013001 B 273 261 0013002 A 374 344 0013003 A 120 134
0014001 B 267 245 0015001 A 399 332 0015003 B 272 343 0015007 A 260 223
0017001 B 324 334 0017002 C 251 259 0017003 C 205 308 0026001 A 497 644
0026005 A 240 240 0026008 A 337 337 0029002 C 314 314 0038001 B 239 202
0038002 B 361 298 0041001 B 316 253 0041002 B 306 316 0041003 B 460 458
0042001 A 331 214 0042002 C 287 224 0042003 C 320 304 0042004 B 197 192
0042005 B 335 302 0045001 B 238 388 0045002 A 283 331 0045004 B 262 211
0045005 A 366 331 0048001 B 160 190 0055001 A 156 188 0055002 A 223 185
0055003 A 466 478 0056002 B 177 241 0056003 B 274 379 0301001 C 389 389
0301003 A 370 446 0301004 B 401 372 0301005 A 410 345 0302001 B 241 230
0302002 B 205 182 0302003 B 260 283 0302004 B 244 362 0302005 A 173 174
0302006 A 399 393 0302008 B 346 387 0302009 C 210 210 0302010 B 292 253
0401001 B 324 293 0401002 B 333 320 0401003 A 281 326 0401004 A 347 331
0401005 A 294 251 0401006 B 322 397 0403001 A 179 175 0403002 B 350 466
0403004 C 201 200 0403005 A 326 234 0403006 C 315 315 0403007 B 310 252
0403008 B 264 306 0403009 A 278 266 0404002 B 460 430 0404003 A 391 457
0404005 B 384 511 0404007 C 307 265 0404008 A 376 250 0404009 A 298 309
0404010 B 300 398 0404012 A 302 253 0404013 B 345 364 0404014 A 307 359
0404015 A 442 380 0404016 C 330 325 0404018 C 156 200 0404019 C 230 257
0404021 A 384 393 0501001 A 306 282 0501002 B 245 222 0501003 B 234 259
0501004 B 276 379 0501005 A 461 379 0501006 C 523 523 0601001 B 334 400
0601002 B 378 335 0601005 B 303 309 0601006 C 248 248 0601007 C 266 266
0601009 A 373 338 0601010 A 340 381 0601011 B 297 298 0601012 A 365 371
0601013 B 215 178 0701001 B 345 450 0701002 C 258 145 0702001 B 394 376
0702002 B 315 353 0702003 A 228 238 0702004 A 237 226 0703001 C 355 355
0703002 C 269 269 0703003 B 236 257 0703004 B 283 297 0703006 A 371 335
0703007 A 356 220 0901001 A 298 282 0901002 A 258 305 0901003 A 272 253
0901004 A 379 403 0901005 B 163 263 0901006 B 327 303 0901007 B 404 500
0901008 A 379 357 0901009 A 254 221 0901010 B 386 360 0901011 B 453 400
0901012 A 258 206 0901013 A 422 347 0901014 B 245 239
;
run;
*** Create annotate dataset to draw lines for LLN and ULN (150-450) ***;
*** Note the following three annotate datasets do the same thing.   ***;
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' function 'line' drawspace 'datavalue';
x1=150; y1=150; x2=150; y2=450; output;
x1=150; y1=450; x2=450; y2=450; output;
x1=450; y1=450; x2=450; y2=150; output;
x1=450; y1=150; x2=150; y2=150; output;
run;
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' function 'rectangle' drawspace 'datavalue'
heightunit 'data' widthunit 'data' anchor 'bottomleft';
x1=150; y1=150; height=300; width=300;
run;
data annotate;
length linecolor $ 9 function $9;
retain linecolor 'black' drawspace 'datavalue';
function='polyline'; x1=150; y1=150; output;
function='polycont'; x1=150; y1=450; output;
function='polycont'; x1=450; y1=450; output;
function='polycont'; x1=450; y1=150; output;
function='polycont'; x1=150; y1=150; output;
run;
*** Graphics options ***;
goptions reset=goptions
hsize=8 in
vsize=6 in
device=png300
cback='#FFFFFF'
gsfmode=replace
gsfname=pngloc
gunit=cells
nodisplay;
options orientation=landscape ;
title1 "Plot of Baseline vs Post Baseline Platelet Counts (x10^9/L)";
footnote1 j=l "Lines indicate LLN=150 and ULN=450";
proc sgplot data=platelet sganno=annotate;
scatter x=baseline y=post / group=trt;
xaxis label='Baseline';
yaxis label='Post-Baseline';
keylegend / title='Treatment' location=inside position=topleft;
run;
quit;
APPENDIX 2 – CODE FOR WATERFALL PLOT
*** Create dataset with subject number, treatment, percent change ***;
*** and best overall response                                     ***;
data waterfall;
input subj trt pcchgb boro @@;
cards;
1 0 0.3 4 2 0 -7.7 0 3 1 -52.1 4 4 0 4.5 1 5 0 2.7 1 6 1 -20.2 0 7 1 -18.1 1
8 0 12.5 1 9 1 -18.6 3 10 0 12.7 4 11 0 -23.1 3 12 1 -11.1 1 13 0 -95.6 2
14 1 -3.2 1 15 1 -17.3 2 16 0 -33.8 0 17 1 -16.5 3 18 0 -0.9 1 19 1 -15.0 4
20 1 -15.1 3 21 1 -7.6 2 22 0 -0.3 3 23 1 -9.3 1 24 1 -6.1 2 25 0 -16.8 3
26 0 -62.4 3 27 0 -3.6 2 28 1 15.9 3 29 0 1.7 4 30 1 -30.1 1 31 1 2.7 1
32 0 16.6 3 33 0 -6.5 3 34 1 -20.9 3 35 0 -54.4 3
;
run;
*** Sort by percent change ***;
proc sort; by pcchgb; run;
*** Create x value ***;
data waterfall;
set waterfall;
x=_n_;
run;
*** Formats for annotate dataset and legend ***;
proc format;
value boro 0='SD'
1='CR'
2='PD'
3='PR'
4='NE';
value trt 0='Drug 1'
1='Drug 2';
run;
*** Create annotate dataset to put best overall response at the top of the ***;
*** bars                                                                   ***;
data best_response;
set waterfall;
length label $ 2 textcolor $ 9 function $9;
retain textcolor 'black' function 'text' drawspace 'datavalue' textsize 7;
x1=x;
if pcchgb<0 then y1=pcchgb-2;
if pcchgb>=0 then y1=pcchgb+2;
label=put(boro,boro.);
run;
*** Graphics options ***;
goptions reset=goptions
hsize=8 in
vsize=6 in
device=png300
cback='#FFFFFF'
gsfmode=replace
gsfname=pngloc
gunit=cells
nodisplay;
options orientation=landscape ;
title1 "Plot of Tumor Size Change with Best Overall Response";
footnote1 H=0.9 j=l "SD=Stable Disease, CR=Complete Response, PD=Progressive Disease, PR=Partial Response, NE=Not Evaluable";
*** Create vertical bar chart annotating bars with the overall best response ***;
proc sgplot data=waterfall sganno=best_response;
vbar x / group=trt response=pcchgb;
xaxis display=none;
yaxis label='Percent Change';
keylegend / title='Treatment' location=inside position=topleft;
format trt trt. pcchgb x 4.;
run;
APPENDIX 3 – CODE FOR ADVERSE EVENT PLOT
*** Create dataset of Adverse Event Days ***;
data ae;
do pt=1 to 37;
do ae=1 to 3+int(8*ranuni(3));
day=int(40*ranuni(4));
output;
end;
end;
run;
*** Create a more realistic inv pt number ***;
data ae;
set ae;
if 1<=pt<=12 then site='101';
if 13<=pt<=19 then site='102';
if 20<=pt<=27 then site='103';
if 28<=pt<=38 then site='104';
invpt=site||'-'||put(pt,z3.);
drop site pt;
run;
proc sort; by invpt; run;
*** Create the dosing data ***;
proc means data=ae noprint nway;
class invpt;
var day;
output out=dosing(drop=_type_ _freq_) max=;
run;
*** Get y variable for first patient at top as 1 to ***;
*** last patient on the page as 20                  ***;
data dosing;
set dosing;
by invpt;
day=day+int(3*ranuni(9));
if _n_=1 then order=0;
order=order+1;
if order=21 then order=1;
retain order;
y=21-order;
run;
*** Get page variable - 20 patients per page ***;
data paging;
set dosing;
by invpt;
if _n_=1 then page=0;
if order=1 then page=page+1;
retain page;
keep invpt page y;
run;
*** Merge page number onto AE data ***;
data forgraph;
merge ae paging;
by invpt;
run;
*** Create macro variable for the number of pages ***;
data forgraph;
set forgraph end=eof;
if eof then call symput("maxpage",put(page,best.));
run;
*** Loop through macro for each page ***;
%macro doplot(i);
*** Create annotate dataset for a line from 0 to last dosing day ***;
data annotate;
merge dosing paging;
by invpt;
if page=input("&i",best.);
length linecolor $ 9 function $9;
retain linecolor 'black' function 'line' drawspace 'datavalue';
x1=0;
y1=y;
x2=day;
y2=y;
run;
*** Create format for the y-axis ***;
data fmt;
set annotate;
by invpt;
start=y;
label=invpt;
keep start label;
run;
proc sort; by start; run;
data all;
do start=1 to 20;
output;
end;
run;
proc sort; by start; run;
data fmt;
merge fmt all;
by start;
fmtname='pt';
run;
proc format cntlin=fmt;
run;
*** Graphic options ***;
goptions reset=goptions
hsize=8 in
vsize=6 in
device=png300
cback='#FFFFFF'
gsfmode=replace
gsfname=pngloc
gunit=cells
nodisplay;
options orientation=landscape ;
title1 h=1.5 j=c "Dot plot of all adverse events over time";
title2 h=1.5 j=c "(Page &i of %cmpres(&maxpage))";
footnote1 h=1 j=l "Length of line indicates duration in study.";
*** Scatter plot with day as x axis and patient as y ***;
proc sgplot data=forgraph sganno=annotate;
where page=input("&i",best.);
scatter y=y x=day;
format y pt.;
xaxis label='Time since first dose (days)' values=(0 to 45 by 5);
yaxis label=' ' values=(1 to 20 by 1);
run;
quit;
%mend;
*** Call the plot macro for each page ***;
%macro call;
%do x=1 %to &maxpage;
%doplot(&x);
%end;
%mend;
%call