Exploratory and Publication-Ready Graphs Easily Obtained Using SAS Version 9.2
Agricultural Experiment Station
College of Agricultural, Consumer and Environmental Sciences

Long Pham, NMSU Agricultural Experiment Station
Dawn M. VanLeeuwen, Department of Economics and International Business; NMSU Agricultural Experiment Station

I. INTRODUCTION

NMSU (New Mexico State University) Experiment Station and ACES (College of Agricultural, Consumer, and Environmental Sciences) researchers often use SAS software to analyze data, but they typically use other software to obtain graphs. This has been driven, at least in part, by the perceived difficulty of obtaining graphs using SAS software, and publication-quality graphs in particular. However, SAS version 9.2 introduces the so-called ‘SG’ procedures within the SAS/GRAPH suite. These procedures are designed to “produce publication-quality graphics for statistical data presentation with minimal effort” (SAS Institute Inc., retrieved from https://support.sas.com/edu/schedules.html?id=867&ctry=US on 8/27/2010).

Using the new procedures, which include PROC SGPLOT, PROC SGPANEL, and PROC SGSCATTER, to obtain graphs has several advantages. First, once code is written to produce a desired graph, similar graphs can be produced more quickly than with a point-and-click interface. Although nearly any desired feature can be coded, graphs produced by these procedures can also be further customized in a point-and-click graphics editor. Second, the SAS ODS (Output Delivery System) facility makes it easy to capture results generated by procedures such as PROC MIXED. Those results can then be passed to the ‘SG’ procedures to graph model-based estimates. This makes it straightforward to create graphs that display correct model-based error estimates.
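The first advantage can be made concrete with a SAS macro that wraps a graph so it can be redrawn by changing only the arguments. The following is a minimal sketch, not code from this report. The macro name %profiles and its parameters are hypothetical. The example call assumes the FabDat data set and variables described in Section II.

```sas
/* Hypothetical macro, written for illustration only: it wraps a
   PROC SGPANEL call so the same profile-plot layout can be reused
   with a different data set, panel variable, or response. */
%macro profiles(data=, panel=, x=, y=, group=, title=);
  proc sgpanel data=&data;
    panelby &panel;                     /* one panel per level of &panel */
    series x=&x y=&y / group=&group     /* one line per level of &group  */
           lineattrs=(color=black);
    title1 "&title";                    /* double quotes let &title resolve */
  run;
%mend profiles;

/* Example call, assuming FabDat (trt, rep, time, response) exists */
%profiles(data=FabDat, panel=trt, x=time, y=response, group=rep,
          title=Profiles of Each Subject);
```

Drawing the same layout for another response variable would then take only a second %profiles call with a different y= argument. A title that contains commas would need to be masked, for example with %str(), because commas separate macro arguments.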
Each of these graphing procedures has multiple capabilities. For example, both PROC SGPLOT and PROC SGPANEL can produce:

+ horizontal and vertical bar charts;
+ horizontal and vertical box-and-whisker plots;
+ histograms;
+ horizontal and vertical line graphs;
+ needle graphs, scatter plots, and series graphs.

They can also easily add fitted penalized B-splines, regression lines, and loess curves, as well as confidence and prediction bands. The SGPLOT procedure produces overlaid plots. The SGPANEL procedure creates a panel of graphs for the values of one or more classification variables, using one of several layouts: panel, lattice, column lattice, or row lattice (for further information, see SAS (2009) and Kuhfeld (2010)). Once users are acquainted with these procedures and can create one or two of the many types of graphs they produce, they can consult SAS Help from within their SAS session to learn how to create other types.

The purpose of this brief report is to introduce Experiment Station and ACES researchers to the potential utility of the new graphing procedures. This is done through a series of examples using a fabricated data set. Each example provides both the code used and the resulting graph.

II. THE DATA SET

Graphing with PROC SGPANEL and PROC SGPLOT is illustrated using a hypothetical data set named FabDat. The data correspond to a completely randomized design with repeated measures. In this hypothetical scenario, 12 subjects were randomly assigned to 4 treatments (3 subjects per treatment). A response variable was measured on each subject at times 0, 1, 3, 5, 10, and 20. Table 1 shows the first ten observations and gives the variable names used in the example code. Subjects are uniquely identified by the combination of trt (treatment) and rep (replication).

Table 1.
Fabricated Data (FabDat): First Ten Observations

trt  rep  time  response
  1    1     0    2.9354
  1    1     1    6.2293
  1    1     3    9.3817
  1    1     5    9.8197
  1    1    10   10.9859
  1    1    20   12.2572
  1    2     0    3.4637
  1    2     1    7.5704
  1    2     3   10.9341
  1    2     5   12.6720

III. EXPLORATORY DATA ANALYSIS

This section shows how to use PROC SGPANEL and PROC SGPLOT to obtain preliminary graphs. The next section (IV) shows how to produce publication-quality graphs.

1. Using PROC SGPANEL to Obtain Profile Plots

When analyzing data from repeated measures designs, the initial analysis should include plotting the profile of each subject. This allows one to identify outlying individuals and to form an initial impression of the data. Profile plots can be obtained very quickly with the following call to PROC SGPANEL, which produces Figure 1. The LINEATTRS= option draws all lines in black, giving a black-and-white graph.

proc sgpanel data=FabDat;
  panelby trt;                                  /* one panel per treatment */
  series x=time y=response / group=rep lineattrs=(color=black);
  title1 'Figure 1. Exploratory Graphs - Profiles of Each Subject';
run;
quit;

After running the code, view the graph by double-clicking ‘SGPANEL’ and then the ‘SGPANEL Procedure’ item in the Results window to the left of the output window. A minor modification of the code (below) produces a panel of series plots with markers (Figure 2), with both the markers and the lines drawn in black.

proc sgpanel data=FabDat;
  panelby trt;
  series x=time y=response / group=rep markers
         markerattrs=(color=black) lineattrs=(color=black);
  title1 'Figure 2. Exploratory Graphs - Profiles of Each Subject';
run;
quit;

A scatter plot can serve the same purpose as the series plot; the following code produces Figure 3. Both the series and scatter plots reveal evidence of subject effects as well as non-constant variance, and both features must be accounted for in the mixed model analysis.

proc sgpanel data=FabDat;
  panelby trt;
  scatter x=time y=response / group=rep markerattrs=(color=black);
  title1 'Figure 3. Exploratory Graphs - Scatter Plots for Each Subject';
run;
quit;

One way to gain more control over the appearance of graphs produced by SGPLOT or SGPANEL is to reorganize the data using PROC SORT and PROC TRANSPOSE. Reorganizing the data as illustrated below allows one to specify the attributes of each series separately. The following code creates a separate variable for each replication (COL1, COL2, and COL3 in Table 2). This works because the data are complete (i.e., there are no missing values).

proc sort data=FabDat;
  by trt time rep;
run;

proc transpose data=FabDat out=T_rep;
  var response;
  by trt time;       /* one output row per trt-time combination */
run;

proc print data=T_rep;
run;

Table 2. Transposed Data

Obs  trt  time  _NAME_       COL1     COL2     COL3
  1    1     0  response   2.9354   3.4637   3.0550
  2    1     1  response   6.2293   7.5704   6.0440
  3    1     3  response   9.3817  10.9341   8.8853
  4    1     5  response   9.8197  12.6720  10.7655
  5    1    10  response  10.9859  13.7576  11.2061
  6    1    20  response  12.2572  14.9563  12.2812
  7    2     0  response   2.5795   3.0949   2.7336
  8    2     1  response   4.4889   6.3537   5.3853
  9    2     3  response   6.7576   9.5852   7.8442
 10    2     5  response   7.7639  10.4912   8.7641
 11    2    10  response   7.9619  11.3025   8.6715
 12    2    20  response   9.0055  12.5676  10.3107
 13    3     0  response   3.2687   3.5322   3.2373
 14    3     1  response   8.1111   8.5962   7.6886
 15    3     3  response  12.3549  12.5410  11.9952
 16    3     5  response  15.6742  16.1784  15.3995
 17    3    10  response  18.1716  19.3255  16.9100
 18    3    20  response  19.8734  21.1534  19.3367
 19    4     0  response   3.2968   3.2763   2.6356
 20    4     1  response   8.2906   7.2997   6.1205
 21    4     3  response  11.9854  11.8185   9.5506
 22    4     5  response  14.8362  14.6321  12.7592
 23    4    10  response  18.7233  18.2905  15.6927
 24    4    20  response  21.8094  21.6077  18.0451

Figure 4 illustrates the effect of the following code, which uses options for line attributes, marker attributes, and labels. Because the code contains multiple SERIES statements, all of the specified series are overlaid on the same plot.
Each marker and line is specified to be black, although blue, green, or any of a number of other colors could have been specified instead. For markers, both the size and the symbol are specified; for lines, the pattern and thickness are specified. For each series, the LEGENDLABEL= option specifies the label that appears in the legend. The LABEL= option in the ROWAXIS statement specifies the label of the row (vertical) axis, which serves as the y-axis label.

proc sgpanel data=T_rep;
  panelby trt;
  /* one SERIES statement per replication, each with its own attributes */
  series x=time y=col1 / markers
         markerattrs=(color=black size=5 symbol=star)
         lineattrs=(color=black pattern=thindot thickness=2)
         legendlabel='Replication 1';
  series x=time y=col2 / markers
         markerattrs=(color=black size=5 symbol=circlefilled)
         lineattrs=(color=black pattern=shortdashdot thickness=2)
         legendlabel='Replication 2';
  series x=time y=col3 / markers
         markerattrs=(color=black size=5 symbol=homedown)
         lineattrs=(color=black pattern=mediumdashdotdot thickness=2)
         legendlabel='Replication 3';
  rowaxis label='Response';
  title1 'Figure 4. Series Plots for Transposed Data (Utilization of Line and Marker Attributes)';
run;
quit;

2. Using PROC SGPANEL to Plot Regression and Loess Lines

Loess is a nonparametric method of fitting a curve to the data and can be used in preliminary data exploration. Loess curves can also help in visually assessing the fit of a parametric model. The code below produces panels with both a loess curve and a cubic (3rd-degree) polynomial regression curve overlaid on a scatter plot (Figure 5). Including both curves on one graph reveals more clearly the regression curve's lack of fit. The code as written produces a color graph. The graphs in Figure 5 were produced with LINEATTRS= and MARKERATTRS= options added to specify black lines and markers.

proc sgpanel data=FabDat;
  panelby trt;
  reg x=time y=response / degree=3;     /* cubic polynomial regression fit */
  loess x=time y=response;               /* nonparametric loess fit */
  title1 'Figure 5. Exploratory Graphs - Profiles of Each Subject (Reg and Loess Plots)';
run;
quit;

3. Capturing and Graphing Estimates with PROC SGPLOT

PROC MIXED is typically used to analyze data from repeated measures designs. The following code fits a model that accounts for the lack of independence among observations from the same subject and for the non-constant variance. Because subjects are identified by the combination of trt and rep, the SUBJECT= option is written as rep(trt). The statement ods output lsmeans=mmm; saves the least squares means in a data set named mmm. The data set also holds their standard errors and the requested confidence intervals (CI).

proc mixed data=FabDat;
  class trt rep time;
  model response = trt time trt*time / ddfm=kr;
  /* heterogeneous compound symmetry within each subject (rep within trt) */
  repeated time / subject=rep(trt) type=csh;
  lsmeans trt*time / pdiff cl alpha=0.05;
  ods output lsmeans=mmm;        /* capture the LS means table */
run;

proc print data=mmm;
run;

Table 3. LS Means for Combinations of Trt and Time

Obs  Effect    trt  time  Estimate  StdErr    DF  tValue   Probt  Alpha    Lower    Upper
  1  trt*time    1     0    3.1514  0.1614  8.01   19.53  <.0001   0.05   2.7793   3.5234
  2  trt*time    1     1    6.6145  0.5002  7.99   13.22  <.0001   0.05   5.4610   7.7681
  3  trt*time    1     3    9.7337  0.6549     8   14.86  <.0001   0.05   8.2237  11.2437
  4  trt*time    1     5   11.0857  0.6810     8   16.28  <.0001   0.05   9.5153  12.6562
  5  trt*time    1    10   11.9832  0.8925  8.01   13.43  <.0001   0.05   9.9258  14.0407
  6  trt*time    1    20   13.1649  0.9443  8.03   13.94  <.0001   0.05  10.9889  15.3409
  7  trt*time    2     0    2.8027  0.1614  8.01   17.37  <.0001   0.05   2.4306   3.1747
  8  trt*time    2     1    5.4093  0.5002  7.99   10.81  <.0001   0.05   4.2558   6.5629
  9  trt*time    2     3    8.0623  0.6549     8   12.31  <.0001   0.05   6.5523   9.5723
 10  trt*time    2     5    9.0064  0.6810     8   13.23  <.0001   0.05   7.4360  10.5768
 11  trt*time    2    10    9.3120  0.8925  8.01   10.43  <.0001   0.05   7.2545  11.3694
 12  trt*time    2    20   10.6279  0.9443  8.03   11.25  <.0001   0.05   8.4519  12.8039
 13  trt*time    3     0    3.3461  0.1614  8.01   20.73  <.0001   0.05   2.9740   3.7181
 14  trt*time    3     1    8.1320  0.5002  7.99   16.26  <.0001   0.05   6.9784   9.2855
 15  trt*time    3     3   12.2970  0.6549     8   18.78  <.0001   0.05  10.7870  13.8071
 16  trt*time    3     5   15.7507  0.6810     8   23.13  <.0001   0.05  14.1803  17.3211
 17  trt*time    3    10   18.1357  0.8925  8.01   20.32  <.0001   0.05  16.0783  20.1932
 18  trt*time    3    20   20.1212  0.9443  8.03   21.31  <.0001   0.05  17.9452  22.2971
 19  trt*time    4     0    3.0696  0.1614  8.01   19.02  <.0001   0.05   2.6975   3.4416
 20  trt*time    4     1    7.2369  0.5002  7.99   14.47  <.0001   0.05   6.0834   8.3905
 21  trt*time    4     3   11.1182  0.6549     8   16.98  <.0001   0.05   9.6081  12.6282
 22  trt*time    4     5   14.0758  0.6810     8   20.67  <.0001   0.05  12.5054  15.6462
 23  trt*time    4    10   17.5688  0.8925  8.01   19.69  <.0001   0.05  15.5114  19.6263
 24  trt*time    4    20   20.4874  0.9443  8.03   21.70  <.0001   0.05  18.3114  22.6634

The following code uses SGPLOT to generate a scatter plot (Figure 6) with error bars showing the confidence interval for each mean.

proc sgplot data=mmm;
  scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper
          markerattrs=(color=black);
  title1 'Figure 6. Using SGPLOT to Generate a Scatter Plot with Confidence Interval Bars';
run;
quit;

In SAS 9.2, the SERIES statement does not support error bars, but the SCATTER statement does. Both plot types can be included in the same graph to create a plot with lines, markers, and error bars (Figure 7).

proc sgplot data=mmm;
  scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper
          markerattrs=(color=black);
  series x=time y=estimate / group=trt lineattrs=(color=black);
  title1 'Figure 7. Using SGPLOT to Generate a Scatter Plot with Connecting Lines';
run;
quit;

IV. PUBLICATION READY GRAPHS

As noted above, one way to gain more control over a graph's features is to first restructure the data. In some instances, this can be done with PROC TRANSPOSE. If the data are unbalanced or a more complicated restructuring is required, a series of DATA steps might be used. The following code illustrates another way to restructure a data set. First, we create an individual data set for each treatment group (trt1, trt2, trt3, and trt4). The columns for the estimated means and confidence interval endpoints are renamed so that each treatment data set has its own unique column names. As a second step, we merge the 4 data sets by the variable time.
This creates a transposed data set with separate columns for each treatment group, which makes it easy to specify each treatment's graph attributes. The following code illustrates these data manipulations.

/* Split the LS means into one data set per treatment, keeping only the
   needed variables and giving each data set its own column names */
data trt1 (keep=time estimate lower upper rename=(estimate=trt1mn upper=upper1 lower=lower1))
     trt2 (keep=time estimate lower upper rename=(estimate=trt2mn upper=upper2 lower=lower2))
     trt3 (keep=time estimate lower upper rename=(estimate=trt3mn upper=upper3 lower=lower3))
     trt4 (keep=time estimate lower upper rename=(estimate=trt4mn upper=upper4 lower=lower4));
  set mmm;
  if trt eq 1 then output trt1;
  else if trt eq 2 then output trt2;
  else if trt eq 3 then output trt3;
  else if trt eq 4 then output trt4;
run;

/* Merge side by side: one row per time point. mmm lists the LS means
   by trt and then time, so each data set is already in time order. */
data transMMM;
  merge trt1 trt2 trt3 trt4;
  by time;
run;

proc print data=transMMM;
  var time trt1mn lower1 upper1 trt2mn lower2 upper2
      trt3mn lower3 upper3 trt4mn lower4 upper4;
  format trt1mn lower1 upper1 trt2mn lower2 upper2
         trt3mn lower3 upper3 trt4mn lower4 upper4 5.2;
run;

Table 4. Transposed Data Set with Separate Columns for Each Treatment Group

Obs  time  trt1mn  lower1  upper1  trt2mn  lower2  upper2  trt3mn  lower3  upper3  trt4mn  lower4  upper4
  1     0    3.15    2.78    3.52    2.80    2.43    3.17    3.35    2.97    3.72    3.07    2.70    3.44
  2     1    6.61    5.46    7.77    5.41    4.26    6.56    8.13    6.98    9.29    7.24    6.08    8.39
  3     3    9.73    8.22   11.24    8.06    6.55    9.57   12.30   10.79   13.81   11.12    9.61   12.63
  4     5   11.09    9.52   12.66    9.01    7.44   10.58   15.75   14.18   17.32   14.08   12.51   15.65
  5    10   11.98    9.93   14.04    9.31    7.25   11.37   18.14   16.08   20.19   17.57   15.51   19.63
  6    20   13.16   10.99   15.34   10.63    8.45   12.80   20.12   17.95   22.30   20.49   18.31   22.66

The following code combines series and scatter plots with the MARKERATTRS=, LINEATTRS=, and LEGENDLABEL= options. The result is a customized black-and-white publication-ready graph (Figure 8). In addition, the YAXIS and XAXIS statements allow one to manipulate axis attributes; here we use them to change the axis labels. So that each treatment appears only once in the legend, only the series lines are included in it. To do this, the NAME= option gives each series line a name (“plot1”, “plot2”, “plot3”, and “plot4”).
Listing those names in the KEYLEGEND statement suppresses the legend entries for the scatter plots, so only the series lines appear in the legend. Each treatment's SERIES statement draws markers (MARKERS option) with the same symbol as its SCATTER statement. As a result, each legend entry shows both the line pattern and the marker.

The ods listing sge=on; statement that precedes the call to PROC SGPLOT causes two versions of the graph to be produced; the second is an editable graph. Opening this version allows further customization in a point-and-click graphics editor. These editable graphs can be saved either as .sge files or as .png images. When saving as a .png image, you can specify a resolution of 300 dpi for publication purposes (Henke, 2011).

ods listing sge=on;      /* also create an editable version of each graph */

proc sgplot data=transMMM;
  /* Treatment 1: named series line (used in the legend) plus scatter with CI bars */
  series x=time y=trt1mn / name="plot1" markers
         markerattrs=(symbol=square color=black size=10)
         lineattrs=(pattern=35 color=black thickness=5)
         legendlabel="Treatment 1";
  scatter x=time y=trt1mn / markerattrs=(symbol=square color=black size=10)
          yerrorupper=upper1 yerrorlower=lower1;
  /* Treatment 2 */
  series x=time y=trt2mn / name="plot2" markers
         markerattrs=(symbol=circlefilled color=black size=10)
         lineattrs=(pattern=34 color=black thickness=5)
         legendlabel="Treatment 2";
  scatter x=time y=trt2mn / markerattrs=(symbol=circlefilled color=black size=10)
          yerrorupper=upper2 yerrorlower=lower2;
  /* Treatment 3 */
  series x=time y=trt3mn / name="plot3" markers
         markerattrs=(symbol=triangle color=black size=10)
         lineattrs=(pattern=26 color=black thickness=2)
         legendlabel="Treatment 3";
  scatter x=time y=trt3mn / markerattrs=(symbol=triangle color=black size=10)
          yerrorupper=upper3 yerrorlower=lower3;
  /* Treatment 4 */
  series x=time y=trt4mn / name="plot4" markers
         markerattrs=(symbol=starfilled color=black size=10)
         lineattrs=(pattern=20 color=black thickness=2)
         legendlabel="Treatment 4";
  scatter x=time y=trt4mn / markerattrs=(symbol=starfilled color=black size=10)
          yerrorupper=upper4 yerrorlower=lower4;
  keylegend "plot1" "plot2" "plot3" "plot4";   /* legend shows the series lines only */
  yaxis label='Response Mean';
  xaxis label='Time';
  title1 'Figure 8. Black and White Publication Ready Graph';
run;
quit;

ods listing sge=off;

In SAS 9.2, the %ModStyle macro can also be used to customize some style elements without first restructuring the data (Kuhfeld, 2009). However, some elements (e.g., line thickness and symbol size) cannot be specified with this macro. The following code illustrates how to use the %ModStyle macro to generate a black-and-white publication-ready graph (Figure 9).

/* Create a style named blackonly, based on JOURNAL2, that uses black only
   and assigns one line pattern and one marker symbol to each group */
%modstyle(name=blackonly, parent=journal2, colors=black,
          linestyles=1 2 26 20,
          markers=Square CircleFilled Triangle StarFilled);

ods listing sge=on style=blackonly;

proc sgplot data=mmm;
  scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper;
  series x=time y=estimate / markers name="inlegend" group=trt;
  keylegend "inlegend";          /* legend built from the series plot only */
  title1 'Figure 9. Using SGPLOT to Generate a Scatter Plot with Connecting Lines';
run;
quit;

ods listing sge=off;

V. CONCLUSION

The above examples illustrate the relative ease of using the new SAS 9.2 ‘SG’ procedures to create both exploratory and publication-quality graphs. Most desired features can be obtained by writing code, and graphs can then be further customized in a point-and-click graphics editor. Including the statement ods listing sge=on; before a call to an ‘SG’ procedure causes an editable version of the graph to be created, which can be saved as a .png image. Both PROC SGPLOT and PROC SGPANEL support the following features:

+ Basic plots: scatter plots, series plots, band plots, needle plots, and vector plots.
+ Fit and confidence plots: loess curves, regression curves, penalized B-spline curves, and ellipses.
+ Distribution plots: histograms, box plots, and density curves.
+ Categorization plots: bar charts, dot plots, and bar-line charts.
+ Insets, legends, and reference lines.

The %ModStyle macro can be used to customize some style elements. To specify the attributes of each overlaid plot individually, however, the data can be restructured within SAS as shown in Section IV. Further information about PROC SGPLOT and PROC SGPANEL can be found in the SAS documentation (SAS, 2009). It is also available in papers such as Delwiche and Slaughter's "Using PROC SGPLOT for Quick High Quality Graphs" and Heath's "New SAS/GRAPH Procedures for Creating Statistical Graphics in Data Analysis" (see References).

REFERENCES

Delwiche, L. D., & Slaughter, S. J. Using PROC SGPLOT for Quick High Quality Graphs. Retrieved from http://www.wuss.org/proceedings08/08WUSS%20Proceedings/papers/how/how05.pdf on 11/20/2010.

Heath, D. New SAS/GRAPH Procedures for Creating Statistical Graphics in Data Analysis. Retrieved from http://www.lexjansen.com/wuss/2007/DataPresentationsBusIntell/DPBI_Heath_SASGraphProcedures.pdf.

Henke, A. (2011). Publishing Tips of the Month. Agricultural Experiment Station, New Mexico State University.

Kuhfeld, W. F. (2010). Statistical Graphics in SAS: An Introduction to the Graph Template Language and Statistical Graphics Procedures. SAS Institute Inc.

Kuhfeld, W. F. (2009). Modifying ODS Statistical Graphics Templates in SAS 9.2. SAS Institute Inc.

SAS (2009). SAS/GRAPH 9.2: Statistical Graphics Procedures Guide. SAS Institute Inc.

New Mexico State University is an equal opportunity/affirmative action employer and educator. NMSU and the U.S. Department of Agriculture cooperating.