Creating Custom Graphic Output Using the SG Annotate Data Set

Doug Lassman, Quintiles, Overland Park, KS

ABSTRACT

New in the version 9.3 release, SAS added the capability to include custom text, lines and shapes in ODS Statistical Graphics. This paper shows examples of how these annotations can be used to enhance graphic output. The examples are primarily from the pharmaceutical industry, but they can easily be extended to other areas. Annotations can be used in the SGPLOT, SGPANEL and SGSCATTER procedures. The graphs shown here are produced with PROC SGPLOT and concentrate mainly on the TEXT and LINE annotation functions, but the other functions are also discussed.

INTRODUCTION

This paper discusses the basics of the SG annotate dataset, the elements that can be added, and the drawspace variables that control the position and scaling of the annotations. Three examples are discussed, and the complete code to generate them is included in the appendices.

SG ANNOTATION

The following elements can be added to a graph:
- text labels
- lines and arrows
- ovals or circles
- rectangles or squares
- polygons
- images

THE SG ANNOTATE DATASET

To add the above elements to an SG plot, an annotate dataset must be created. Variables in this dataset specify the type of element, its location, its color and any text value to be displayed. The following example is from the SAS® 9.4 ODS Graphics Procedures Guide. The annotate dataset creates a line at the average value of height and displays that average in a text field. In the first observation, the label is specified with the text color blue, the function 'text' and the (x1, y1) position. In the second observation, the line color is specified as blue and the function is 'line', with two ordered pairs marking the start and end points of the line. Because this dataset has no drawspace variable, the coordinates are interpreted in the default drawing space, as percentages of the graph area.

Obs  label                     textcolor  linecolor  function  x1  y1  x2  y2
1    Average Height 62 Inches  blue                  text      20  70   .   .
2                                         blue       line      10  60  99  60

Output 1. Line annotate dataset

PROC SGPLOT is used with the SCATTER statement to create the plot. The annotate dataset is specified with the sganno=Line option. The plot is displayed with the line at the average value and the text indicating the average. The result is displayed in Figure 1.

    proc sgplot data=sashelp.class sganno=Line;
      scatter x=weight y=height;
    run;

Figure 1. Scatter plot with annotations

THE SG ANNOTATE FUNCTIONS

Table 1 lists the ten functions available in the SG annotate dataset and describes their output.

Function    Description
ARROW       Draws an arrow annotation.
IMAGE       Specifies a graphic file to use for an image annotation.
LINE        Draws a line annotation.
OVAL        Draws an oval or circle annotation.
POLYCONT    Continues drawing a polygon that was begun with the POLYGON
            function or a line that was begun with the POLYLINE function.
POLYGON     Specifies the beginning point of a polygon.
POLYLINE    Specifies the beginning point of a polyline, which is a
            connected series of line segments.
RECTANGLE   Draws a rectangle or square annotation.
TEXT        Places text in the graph output.
TEXTCONT    Continues a text string.

Table 1. Summary of SG Annotate Functions

MODIFYING AN SG PROCEDURE

As mentioned above, the sganno= option naming the annotate dataset must be added to the procedure statement. If the elements will be inside the existing plot, no other modifications are necessary. If the elements will be placed outside the plot, as in a table of numbers corresponding to the values in the plot, the pad= option can be added to create the necessary space. For example, adding pad=(right=25%) to the PROC SGPLOT statement reserves a margin on the right side of the plot for those elements.

THE SG ANNOTATE DRAWSPACE

The position and scaling of the annotations are controlled by specifying the drawspace and the units. This can be done individually for the x and y axes or for both.
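To make the pad= option and the per-axis drawing space concrete, here is a minimal sketch using sashelp.class. It is not taken from the paper's examples: the dataset name margin_anno, the label text and every coordinate are illustrative assumptions. The line uses wall percentages for x and data values for y, while the text uses graph percentages for x so that it lands in the margin reserved by pad=.

```sas
/* Sketch only: dataset name, label text and coordinates are assumptions */
data margin_anno;
   length function $ 9 label $ 24 anchor $ 10
          x1space y1space x2space y2space $ 13
          linecolor textcolor $ 9;

   /* Horizontal line at height = 62:                 */
   /* x in percent of the wall area, y in data values */
   function  = 'line';
   x1space   = 'wallpercent';  x1 = 0;
   y1space   = 'datavalue';    y1 = 62;
   x2space   = 'wallpercent';  x2 = 100;
   y2space   = 'datavalue';    y2 = 62;
   linecolor = 'blue';
   output;

   /* Text in the right margin:                        */
   /* x in percent of the graph area, y in data values */
   call missing(x2, y2, x2space, y2space, linecolor);
   function  = 'text';
   x1space   = 'graphpercent'; x1 = 77;
   y1space   = 'datavalue';    y1 = 62;
   anchor    = 'left';
   label     = 'Height = 62';
   textcolor = 'blue';
   output;
run;

/* pad=(right=25%) reserves the right-hand margin used by the text */
proc sgplot data=sashelp.class sganno=margin_anno pad=(right=25%);
   scatter x=weight y=height;
run;
```

The x position of 77 is a guess that places the text just inside the reserved margin; it would likely need adjusting for a different pad value or output size.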
There are four areas where elements can be rendered: graph, layout, wall and data.
- The graph area is the entire region of the graph image, including axes, titles, footnotes and legend space.
- The layout area excludes the title and footnote space.
- The wall area is within the axes, including offsets.
- The data area is within the axes, not including offsets.

The following units can be used: percentage, pixels, or data values (data values apply only to the data area). These are set using the drawspace, x1space, y1space, x2space, and y2space variables in the annotate dataset. Possible values for these variables are datapercent, datapixel, datavalue, graphpercent, graphpixel, layoutpercent, layoutpixel, wallpercent, and wallpixel. The examples below use datavalue.

EXAMPLE 1 – LAB SHIFT PLOT

This is a typical lab shift plot of the kind presented in clinical trials to assess the safety of a drug. The baseline value is plotted on the x-axis and the post-baseline value on the y-axis. Each treatment has a different symbol. The plot was created using PROC SGPLOT and the SCATTER statement. Lines indicating the lower limit of normal (LLN) and upper limit of normal (ULN) were annotated. For the lab test displayed, the LLN is 150 and the ULN is 450. The lines can be annotated using any of three functions: line, rectangle, or polyline/polycont.

The first method creates an annotate dataset using the line function and draws four lines, each from the point (x1, y1) to the point (x2, y2). The value of drawspace is 'datavalue', which positions and scales the lines with respect to the data values.

    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'line' drawspace 'datavalue';
      x1=150; y1=150; x2=150; y2=450; output;
      x1=150; y1=450; x2=450; y2=450; output;
      x1=450; y1=450; x2=450; y2=150; output;
      x1=450; y1=150; x2=150; y2=150; output;
    run;

The second method creates an annotate dataset using the rectangle function. One corner is specified as (x1, y1).
The height and width are both set to 300. Heightunit and widthunit are set to 'data'. Finally, the anchor is specified as 'bottomleft' because (x1, y1) is the bottom-left corner of the rectangle.

    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'rectangle' drawspace 'datavalue'
             heightunit 'data' widthunit 'data' anchor 'bottomleft';
      x1=150; y1=150; height=300; width=300;
    run;

The last method creates an annotate dataset using the polyline and polycont functions. The first corner is specified as (x1, y1) with the function 'polyline'. Each subsequent point is specified as (x1, y1) with the function 'polycont'.

    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' drawspace 'datavalue';
      function='polyline'; x1=150; y1=150; output;
      function='polycont'; x1=150; y1=450; output;
      function='polycont'; x1=450; y1=450; output;
      function='polycont'; x1=450; y1=150; output;
      function='polycont'; x1=150; y1=150; output;
    run;

Any of the three methods could be used to create the lines.

Figure 2. Lab Shift Plot

EXAMPLE 2 – WATERFALL PLOT

The waterfall plot was created using PROC SGPLOT with the VBAR statement. A bar is displayed for each patient, indicating the percent change in tumor size. The bars have a different color for each treatment. The data are sorted by percent change to create the waterfall effect. In oncology studies, patients are also given a best overall response rating. This rating was annotated above the bar when the percent change is zero or positive and below the bar when it is negative, using the following annotate dataset. The function 'text' is used along with the drawspace value 'datavalue'. The best overall response is centered at the point (x1, y1).
    data best_response;
      set waterfall;
      length label $ 2 textcolor $ 9 function $9;
      retain textcolor 'black' function 'text' drawspace 'datavalue' textsize 7;
      x1=x;
      if pcchgb<0 then y1=pcchgb-2;
      if pcchgb>=0 then y1=pcchgb+2;
      label=put(boro,boro.);
    run;

Figure 3. Waterfall Plot

EXAMPLE 3 – ADVERSE EVENT PLOT

The final example is a plot of adverse events over time. In this case only adverse events are plotted, but additional symbols such as dosing times or events of special interest could easily be added. The plot is created with PROC SGPLOT and the SCATTER statement. Time is displayed on the x-axis and the patient number on the y-axis. The plot is designed to show 20 patients per page and dynamically creates as many pages as necessary. A format is created that contains the patient numbers for each page. Finally, an annotate dataset is created to display a line for each patient indicating the duration in the study. The function 'line' is used to draw a line for each patient from x=0 to x=day. In the code below, &i is the page number passed to the plotting macro shown in Appendix 3.

    data annotate;
      merge dosing paging;
      by invpt;
      if page=input("&i",best.);
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'line' drawspace 'datavalue';
      x1=0; y1=y; x2=day; y2=y;
    run;

Figure 4. Adverse Event Plot – Page 1

Figure 5. Adverse Event Plot – Page 2

DIFFERENCES BETWEEN SAS/GRAPH ANNOTATE AND SG ANNOTATE

The following are the main differences between SAS/GRAPH annotate and SG annotate datasets:

- SAS/GRAPH annotate datasets use the xsys and ysys variables to specify the draw space, while SG annotate datasets use the drawspace, x1space and y1space variables. The values for xsys and ysys are 1-9 and A-C, which specify absolute or relative; data, screen or window; and value, cell or percentage. SG annotate uses more intuitive values such as 'datapercent', 'datavalue', 'graphpercent', or 'layoutpercent'.
- SAS/GRAPH annotate datasets have one (x, y) pair per observation. To draw a line, two observations are required: one with the 'move' function to the starting (x, y) value, then another with the 'draw' function to the ending (x, y) value. An SG annotate dataset allows two ordered pairs on the same observation.
- SAS/GRAPH annotate datasets use the position variable to place text and symbols at the desired location. The values can be 0-9 and A-F for various combinations of centered, left or right justified and centered, above or below the specified point. SG annotate uses the anchor variable with more intuitive values such as 'top', 'center', 'bottom', 'topright', or 'bottomleft', to name a few.

CONCLUSION

SG annotate datasets allow custom text, lines, shapes and even images to be added to SG output. This makes it easier to enhance the output, possibly with less code and development time.

RECOMMENDED READING

SAS® 9.4 ODS Graphics Procedures Guide

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Doug Lassman
    Quintiles
    6700 W. 115 St.
    Overland Park, KS 66211
    913-708-6395
    doug.lassman@quintiles.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
APPENDIX 1 – CODE FOR LAB SHIFT PLOT

    *** Create dataset with subject number, treatment, baseline and post ***;
    *** baseline Platelet Counts ***;
    data platelet;
      input subj $ trt $ baseline post @@;
    cards;
    0001001 A 269 267  0001002 B 247 274  0001003 A 238 392  0001004 A 373 425
    0002001 A 405 348  0002002 A 384 419  0002003 A 356 382  0002004 A 253 222
    0002005 B 324 379  0002006 A 283 305  0003001 B 246 269  0003002 C 255 253
    0003003 B 398 459  0003004 B 390 359  0003005 B 293 299  0003006 A 280 285
    0003007 C 256 253  0003008 A 175 150  0004001 A 235 284  0004003 B 225 250
    0004004 B 328 321  0005001 A 315 387  0005002 A 306 351  0005003 B 325 319
    0005004 A 392 358  0005005 A 264 278  0005008 A 284 332  0006001 A 308 254
    0006002 A 336 282  0006003 B 265 284  0006004 C 339 339  0008002 B 285 280
    0009001 B 345 268  0009002 A 331 288  0009004 B 228 242  0009005 A 219 212
    0011001 C 270 283  0012001 B 366 320  0012002 B 175 140  0012003 A 325 324
    0012004 A 284 225  0012005 C 238 248  0012006 A 404 476  0012007 B 275 301
    0012008 C 302 323  0013001 B 273 261  0013002 A 374 344  0013003 A 120 134
    0014001 B 267 245  0015001 A 399 332  0015003 B 272 343  0015007 A 260 223
    0017001 B 324 334  0017002 C 251 259  0017003 C 205 308  0026001 A 497 644
    0026005 A 240 240  0026008 A 337 337  0029002 C 314 314  0038001 B 239 202
    0038002 B 361 298  0041001 B 316 253  0041002 B 306 316  0041003 B 460 458
    0042001 A 331 214  0042002 C 287 224  0042003 C 320 304  0042004 B 197 192
    0042005 B 335 302  0045001 B 238 388  0045002 A 283 331  0045004 B 262 211
    0045005 A 366 331  0048001 B 160 190  0055001 A 156 188  0055002 A 223 185
    0055003 A 466 478  0056002 B 177 241  0056003 B 274 379  0301001 C 389 389
    0301003 A 370 446  0301004 B 401 372  0301005 A 410 345  0302001 B 241 230
    0302002 B 205 182  0302003 B 260 283  0302004 B 244 362  0302005 A 173 174
    0302006 A 399 393  0302008 B 346 387  0302009 C 210 210  0302010 B 292 253
    0401001 B 324 293  0401002 B 333 320  0401003 A 281 326  0401004 A 347 331
    0401005 A 294 251  0401006 B 322 397  0403001 A 179 175  0403002 B 350 466
    0403004 C 201 200  0403005 A 326 234  0403006 C 315 315  0403007 B 310 252
    0403008 B 264 306  0403009 A 278 266  0404002 B 460 430  0404003 A 391 457
    0404005 B 384 511  0404007 C 307 265  0404008 A 376 250  0404009 A 298 309
    0404010 B 300 398  0404012 A 302 253  0404013 B 345 364  0404014 A 307 359
    0404015 A 442 380  0404016 C 330 325  0404018 C 156 200  0404019 C 230 257
    0404021 A 384 393  0501001 A 306 282  0501002 B 245 222  0501003 B 234 259
    0501004 B 276 379  0501005 A 461 379  0501006 C 523 523  0601001 B 334 400
    0601002 B 378 335  0601005 B 303 309  0601006 C 248 248  0601007 C 266 266
    0601009 A 373 338  0601010 A 340 381  0601011 B 297 298  0601012 A 365 371
    0601013 B 215 178  0701001 B 345 450  0701002 C 258 145  0702001 B 394 376
    0702002 B 315 353  0702003 A 228 238  0702004 A 237 226  0703001 C 355 355
    0703002 C 269 269  0703003 B 236 257  0703004 B 283 297  0703006 A 371 335
    0703007 A 356 220  0901001 A 298 282  0901002 A 258 305  0901003 A 272 253
    0901004 A 379 403  0901005 B 163 263  0901006 B 327 303  0901007 B 404 500
    0901008 A 379 357  0901009 A 254 221  0901010 B 386 360  0901011 B 453 400
    0901012 A 258 206  0901013 A 422 347  0901014 B 245 239
    ;
    run;

    *** Create annotate dataset to draw lines for LLN and ULN (150-450) ***;
    *** Note the following three annotate datasets do the same thing.
    ***;
    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'line' drawspace 'datavalue';
      x1=150; y1=150; x2=150; y2=450; output;
      x1=150; y1=450; x2=450; y2=450; output;
      x1=450; y1=450; x2=450; y2=150; output;
      x1=450; y1=150; x2=150; y2=150; output;
    run;

    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'rectangle' drawspace 'datavalue'
             heightunit 'data' widthunit 'data' anchor 'bottomleft';
      x1=150; y1=150; height=300; width=300;
    run;

    data annotate;
      length linecolor $ 9 function $9;
      retain linecolor 'black' drawspace 'datavalue';
      function='polyline'; x1=150; y1=150; output;
      function='polycont'; x1=150; y1=450; output;
      function='polycont'; x1=450; y1=450; output;
      function='polycont'; x1=450; y1=150; output;
      function='polycont'; x1=150; y1=150; output;
    run;

    *** Graphics options ***;
    goptions reset=goptions hsize=8 in vsize=6 in device=png300
             cback='#FFFFFF' gsfmode=replace gsfname=pngloc
             gunit=cells nodisplay;
    options orientation=landscape;

    title1 "Plot of Baseline vs Post Baseline Platelet Counts (x10^9/L)";
    footnote1 j=l "Lines indicate LLN=150 and ULN=450";

    *** Baseline on the x-axis and post-baseline on the y-axis ***;
    proc sgplot data=platelet sganno=annotate;
      scatter x=baseline y=post / group=trt;
      xaxis label='Baseline';
      yaxis label='Post-Baseline';
      keylegend / title='Treatment' location=inside position=topleft;
    run;
    quit;

APPENDIX 2 – CODE FOR WATERFALL PLOT

    *** Create dataset with subject number, treatment, percent change ***;
    *** and best overall response ***;
    data waterfall;
      input subj trt pcchgb boro @@;
    cards;
     1 0   0.3 4    2 0  -7.7 0    3 1 -52.1 4    4 0   4.5 1    5 0   2.7 1
     6 1 -20.2 0    7 1 -18.1 1    8 0  12.5 1    9 1 -18.6 3   10 0  12.7 4
    11 0 -23.1 3   12 1 -11.1 1   13 0 -95.6 2   14 1  -3.2 1   15 1 -17.3 2
    16 0 -33.8 0   17 1 -16.5 3   18 0  -0.9 1   19 1 -15.0 4   20 1 -15.1 3
    21 1  -7.6 2   22 0  -0.3 3   23 1  -9.3 1   24 1  -6.1 2   25 0 -16.8 3
    26 0 -62.4 3   27 0  -3.6 2   28 1  15.9 3   29 0   1.7 4   30 1 -30.1 1
    31 1   2.7 1   32 0  16.6 3   33 0  -6.5 3   34 1 -20.9 3   35 0 -54.4 3
    ;
    run;

    *** Sort by percent change ***;
    proc sort;
      by pcchgb;
    run;

    *** Create x value ***;
    data waterfall;
      set waterfall;
      x=_n_;
    run;

    *** Formats for annotate dataset and legend ***;
    proc format;
      value boro 0='SD' 1='CR' 2='PD' 3='PR' 4='NE';
      value trt 0='Drug 1' 1='Drug 2';
    run;

    *** Create annotate dataset to put best overall response at the top of the ***;
    *** bars ***;
    data best_response;
      set waterfall;
      length label $ 2 textcolor $ 9 function $9;
      retain textcolor 'black' function 'text' drawspace 'datavalue' textsize 7;
      x1=x;
      if pcchgb<0 then y1=pcchgb-2;
      if pcchgb>=0 then y1=pcchgb+2;
      label=put(boro,boro.);
    run;

    *** Graphics options ***;
    goptions reset=goptions hsize=8 in vsize=6 in device=png300
             cback='#FFFFFF' gsfmode=replace gsfname=pngloc
             gunit=cells nodisplay;
    options orientation=landscape;

    title1 "Plot of Tumor Size Change with Best Overall Response";
    footnote1 H=0.9 j=l "SD=Stable Disease, CR=Complete Response, PD=Progressive Disease, PR=Partial Response, NE=Not Evaluable";

    *** Create vertical bar chart annotating bars with the overall best response ***;
    proc sgplot data=waterfall sganno=best_response;
      vbar x / group=trt response=pcchgb;
      xaxis display=none;
      yaxis label='Percent Change';
      keylegend / title='Treatment' location=inside position=topleft;
      format trt trt.
             pcchgb x 4.;
    run;

APPENDIX 3 – CODE FOR ADVERSE EVENT PLOT

    *** Create dataset of Adverse Event Days ***;
    data ae;
      do pt=1 to 37;
        do ae=1 to 3+int(8*ranuni(3));
          day=int(40*ranuni(4));
          output;
        end;
      end;
    run;

    *** Create a more realistic inv pt number ***;
    data ae;
      set ae;
      if 1<=pt<=12 then site='101';
      if 13<=pt<=19 then site='102';
      if 20<=pt<=27 then site='103';
      if 28<=pt<=38 then site='104';
      invpt=site||'-'||put(pt,z3.);
      drop site pt;
    run;

    proc sort;
      by invpt;
    run;

    *** Create the dosing data ***;
    proc means data=ae noprint nway;
      class invpt;
      var day;
      output out=dosing(drop=_type_ _freq_) max=;
    run;

    *** Get y variable for first patient at top as 1 to ***;
    *** last patient on the page as 20 ***;
    data dosing;
      set dosing;
      by invpt;
      day=day+int(3*ranuni(9));
      if _n_=1 then order=0;
      order=order+1;
      if order=21 then order=1;
      retain order;
      y=21-order;
    run;

    *** Get page variable - 20 patients per page ***;
    data paging;
      set dosing;
      by invpt;
      if _n_=1 then page=0;
      if order=1 then page=page+1;
      retain page;
      keep invpt page y;
    run;

    *** Merge page number onto AE data ***;
    data forgraph;
      merge ae paging;
      by invpt;
    run;

    *** Create macro variable for the number of pages ***;
    data forgraph;
      set forgraph end=eof;
      if eof then call symput("maxpage",put(page,best.));
    run;

    *** Loop through macro for each page ***;
    %macro doplot(i);

    *** Create annotate dataset for a line from 0 to last dosing day ***;
    data annotate;
      merge dosing paging;
      by invpt;
      if page=input("&i",best.);
      length linecolor $ 9 function $9;
      retain linecolor 'black' function 'line' drawspace 'datavalue';
      x1=0; y1=y; x2=day; y2=y;
    run;

    *** Create format for the y-axis ***;
    data fmt;
      set annotate;
      by invpt;
      start=y;
      label=invpt;
      keep start label;
    run;

    proc sort;
      by start;
    run;

    data all;
      do start=1 to 20;
        output;
      end;
    run;

    proc sort;
      by start;
    run;

    data fmt;
      merge fmt all;
      by start;
      fmtname='pt';
    run;

    proc format cntlin=fmt;
    run;

    *** Graphic options ***;
    goptions reset=goptions hsize=8 in vsize=6 in device=png300
             cback='#FFFFFF' gsfmode=replace gsfname=pngloc
             gunit=cells nodisplay;
    options orientation=landscape;

    title1 h=1.5 j=c "Dot plot of all adverse events over time";
    title2 h=1.5 j=c "(Page &i of %cmpres(&maxpage))";
    footnote1 h=1 j=l "Length of line indicates duration in study.";

    *** Scatter plot with day as x axis and patient as y ***;
    proc sgplot data=forgraph sganno=annotate;
      where page=input("&i",best.);
      scatter y=y x=day;
      format y pt.;
      xaxis label='Time since first dose (days)' values=(0 to 45 by 5);
      yaxis label=' ' values=(1 to 20 by 1);
    run;
    quit;

    %mend;

    *** Call the plot macro for each page ***;
    %macro call;
      %do x=1 %to &maxpage;
        %doplot(&x);
      %end;
    %mend;

    %call
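
As a companion to the section on differences between SAS/GRAPH annotate and SG annotate, the following sketch draws the same horizontal line both ways on sashelp.class. It is illustrative only and is not part of the paper's examples: the dataset names old_anno and new_anno and the coordinates are assumptions, and the first half assumes that SAS/GRAPH is licensed so that PROC GPLOT is available.

```sas
/* SAS/GRAPH annotate: two observations are needed for one line.  */
/* xsys/ysys = '2' requests data values; 'move' sets the start    */
/* point and 'draw' draws to the end point.                       */
data old_anno;
   length function $ 8 color $ 8;
   retain xsys '2' ysys '2' color 'blue';
   function = 'move'; x = 60;  y = 62; output;
   function = 'draw'; x = 140; y = 62; output;
run;

proc gplot data=sashelp.class;
   plot height*weight / annotate=old_anno;
run;
quit;

/* SG annotate: one observation holds both end points, and the    */
/* drawing space is named directly with drawspace='datavalue'.    */
data new_anno;
   length function $ 9 linecolor $ 9;
   retain function 'line' drawspace 'datavalue' linecolor 'blue';
   x1 = 60; y1 = 62; x2 = 140; y2 = 62;
run;

proc sgplot data=sashelp.class sganno=new_anno;
   scatter x=weight y=height;
run;
```

The end points of 60 and 140 were chosen to sit inside the range of weight in sashelp.class so that the line falls within the data area in both procedures.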