Exploratory and Publication-Ready Graphs Easily Obtained Using SAS Version 9.2
Agricultural Experiment Station
College of Agricultural, Consumer and Environmental Sciences
Long Pham, NMSU Agricultural Experiment Station
Dawn M. VanLeeuwen, Department of Economics and International Business;
NMSU Agricultural Experiment Station
I. INTRODUCTION
While NMSU (New Mexico State University) Experiment Station and ACES (College of
Agricultural, Consumer, and Environmental Sciences) researchers often use SAS software to analyze
data, they typically use other software to obtain graphs. This has been driven, at least in part, by the
perceived difficulty of obtaining graphs – and in particular, publication quality graphs – using SAS
software. However, SAS version 9.2 software introduces the so-called ‘SG’ procedures within the
SAS/GRAPH suite that are designed to “produce publication-quality graphics for statistical data
presentation with minimal effort” (SAS Institute Inc., retrieved from
https://support.sas.com/edu/schedules.html?id=867&ctry=US on 8/27/2010).
Using the new SAS procedures, which include PROC SGPLOT, PROC SGPANEL, and PROC
SGSCATTER, to obtain graphs has several advantages. One advantage is that once code is written to
produce a desired graph, similar graphs can be produced more quickly than if one is creating graphs using
a point-and-click interface. While it is possible to code nearly any feature you might want, graphs
produced using these procedures can be further customized using a point-and-click graphics editor. A
second advantage is that using the SAS ODS (Output Delivery System) facility, one can easily capture
information generated by SAS procedures such as PROC MIXED and then use this information in the
‘SG’ procedures to create graphs using model-based estimates. This capability facilitates creating graphs
that incorporate correct model-based error estimates.
Each of these graphing procedures has multiple capabilities. For example, both PROC SGPLOT
and PROC SGPANEL can produce horizontal bar charts, horizontal box and whisker plots,
histograms, horizontal line graphs, needle graphs, scatter plots, series graphs, vertical bar charts, vertical
box and whisker plots, and vertical line graphs. They can also easily include fitted B-splines,
regression lines, and loess lines, as well as confidence and prediction bands. The SGPLOT
procedure produces overlaid plots; the SGPANEL procedure creates a panel for the values of one or
more classification variables in one of several layouts, including panel, lattice, column lattice, and row
lattice (for further information, see SAS (2009) and Kuhfeld (2010)). Once a user is acquainted with
these new graphing procedures and can create one or two of the many types of graphs that they
produce, that user can access SAS Help from within their SAS session to learn how to create other types
of graphs.
The purpose of this brief report is to introduce Experiment Station and ACES researchers to the
potential utility of the new graphing procedures. This will be done through a series of examples using a
fabricated data set. Each example will provide both the code used and the produced graph.
II. THE DATA SET
Graphing with PROC SGPANEL and PROC SGPLOT is illustrated using a hypothetical data set.
Data correspond to a completely randomized design with repeated measures. In this hypothetical scenario,
a total of 12 subjects were randomly assigned to one of 4 treatments. A response variable was measured at
times 0, 1, 3, 5, 10, and 20. Table 1 presents the first ten observations; the table provides variable names
that will be used in the example code. The data set is named FabDat. Subjects are uniquely identified by
the combination of trt (treatment) and rep (replication).
Table 1. Fabricated Data (FabDat) - First Ten Observations
trt  rep  time  response
1    1    0      2.9354
1    1    1      6.2293
1    1    3      9.3817
1    1    5      9.8197
1    1    10    10.9859
1    1    20    12.2572
1    2    0      3.4637
1    2    1      7.5704
1    2    3     10.9341
1    2    5     12.6720
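For readers who want to run the examples themselves, the following is a minimal sketch that rebuilds FabDat from the 72 response values printed in Table 2. It assumes that COL1 through COL3 in that table correspond to replications 1 through 3, which is consistent with the rows shown in Table 1. The input names r1-r3 are illustrative and are not part of the original data set.

```sas
/* Sketch: rebuild FabDat from the values printed in Table 2.            */
/* Each input row holds one trt*time combination and its 3 replications. */
/* r1-r3 are illustrative names (an assumption, not from the report).    */
data FabDat;
  input trt time r1 r2 r3;
  array r{3} r1-r3;
  do rep = 1 to 3;            /* one output row per replication */
    response = r{rep};
    output;
  end;
  keep trt rep time response;
  datalines;
1 0 2.9354 3.4637 3.0550
1 1 6.2293 7.5704 6.0440
1 3 9.3817 10.9341 8.8853
1 5 9.8197 12.6720 10.7655
1 10 10.9859 13.7576 11.2061
1 20 12.2572 14.9563 12.2812
2 0 2.5795 3.0949 2.7336
2 1 4.4889 6.3537 5.3853
2 3 6.7576 9.5852 7.8442
2 5 7.7639 10.4912 8.7641
2 10 7.9619 11.3025 8.6715
2 20 9.0055 12.5676 10.3107
3 0 3.2687 3.5322 3.2373
3 1 8.1111 8.5962 7.6886
3 3 12.3549 12.5410 11.9952
3 5 15.6742 16.1784 15.3995
3 10 18.1716 19.3255 16.9100
3 20 19.8734 21.1534 19.3367
4 0 3.2968 3.2763 2.6356
4 1 8.2906 7.2997 6.1205
4 3 11.9854 11.8185 9.5506
4 5 14.8362 14.6321 12.7592
4 10 18.7233 18.2905 15.6927
4 20 21.8094 21.6077 18.0451
;
run;

/* Order the rows as in Table 1: by subject (trt, rep), then by time */
proc sort data=FabDat;
  by trt rep time;
run;
```

The resulting data set holds the four variables used in the graphing examples (trt, rep, time, response). It does not contain the id variable that the later PROC MIXED call refers to.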
III. EXPLORATORY DATA ANALYSIS
This section introduces using PROC SGPANEL and PROC SGPLOT to obtain preliminary
graphs. The next section (IV) shows how to produce publication quality graphs.
1. Using PROC SGPANEL to Obtain Profile Plots
When analyzing repeated measures designs, initial analysis should include obtaining the profiles
for each subject. This allows one to identify outlying individuals and to form an initial impression of the
data. Profile plots can be obtained very quickly using the following call to PROC SGPANEL, which
produces Figure 1. The lineattrs option draws every line in black, giving a black and white graph.
proc sgpanel data=FabDat;
panelby trt;
series x=time y=response / group=rep lineattrs=(color=black);
title1 'Figure 1. Exploratory Graphs - Profiles of Each Subject';
run;
quit;
After running the code, the graph can be viewed by double-clicking ‘SGPANEL’ then the item for
‘SGPANEL Procedure’ in the Results panel to the left of the output window.
Minor modifications of the code (below) add markers to the panel of series plots (Figure 2),
with both the markers and the lines specified to be black.
proc sgpanel data=FabDat;
panelby trt;
series x=time y=response / group=rep markers markerattrs=(color=black) lineattrs=(color=black);
title1 'Figure 2. Exploratory Graphs - Profiles of Each Subject';
run;
quit;
A scatter plot can serve the same purpose as the series plot. The following code produces Figure
3. Note that both the series and scatter plots reveal some form of subject effects as well as non-constant
variance. Both features must be accounted for in the mixed model analysis.
proc sgpanel data=FabDat;
panelby trt;
scatter x=time y=response / group=rep markerattrs=(color=black);
title1 'Figure 3. Exploratory Graphs - Scatter Plots for Each Subject';
run;
quit;
One way to exert more control over the appearance of the graphs produced using SGPLOT or
SGPANEL is to reorganize the data using PROC SORT and PROC TRANSPOSE. Reorganizing the data
as illustrated below allows one to specify attributes of each series and so to control the graphs’
appearance. The following code creates a variable for each replication. Readers should note that this
works because the data structure is complete (i.e., there are no missing values).
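/* Sort so that each trt*time BY group lists its replications in order */
proc sort data=FabDat; by trt time rep; run;
/* One row per trt*time; the replications become columns COL1-COL3 */
proc transpose data=FabDat out=T_rep;
var response;
by trt time;
run;
proc print data=T_rep;
run;
quit;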
Table 2. Transposed Data
Obs  trt  time  _NAME_    COL1     COL2     COL3
1    1    0     response   2.9354   3.4637   3.0550
2    1    1     response   6.2293   7.5704   6.0440
3    1    3     response   9.3817  10.9341   8.8853
4    1    5     response   9.8197  12.6720  10.7655
5    1    10    response  10.9859  13.7576  11.2061
6    1    20    response  12.2572  14.9563  12.2812
7    2    0     response   2.5795   3.0949   2.7336
8    2    1     response   4.4889   6.3537   5.3853
9    2    3     response   6.7576   9.5852   7.8442
10   2    5     response   7.7639  10.4912   8.7641
11   2    10    response   7.9619  11.3025   8.6715
12   2    20    response   9.0055  12.5676  10.3107
13   3    0     response   3.2687   3.5322   3.2373
14   3    1     response   8.1111   8.5962   7.6886
15   3    3     response  12.3549  12.5410  11.9952
16   3    5     response  15.6742  16.1784  15.3995
17   3    10    response  18.1716  19.3255  16.9100
18   3    20    response  19.8734  21.1534  19.3367
19   4    0     response   3.2968   3.2763   2.6356
20   4    1     response   8.2906   7.2997   6.1205
21   4    3     response  11.9854  11.8185   9.5506
22   4    5     response  14.8362  14.6321  12.7592
23   4    10    response  18.7233  18.2905  15.6927
24   4    20    response  21.8094  21.6077  18.0451
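Because COL1 through COL3 are assigned by position, a missing observation would shift the remaining values into the wrong column. One possible safeguard, shown here only as a sketch, is to let PROC TRANSPOSE name the columns from the value of rep by adding an ID statement and the PREFIX= option. The data set names FabSorted and T_rep_id are hypothetical, and the sketch assumes that rep takes the values 1, 2, and 3.

```sas
/* Sketch: name the transposed columns from rep instead of by position. */
/* FabSorted and T_rep_id are hypothetical names used for illustration. */
proc sort data=FabDat out=FabSorted;
  by trt time rep;
run;

proc transpose data=FabSorted out=T_rep_id (drop=_name_) prefix=rep;
  by trt time;
  id rep;          /* column name = prefix + value of rep: rep1, rep2, rep3 */
  var response;
run;

proc print data=T_rep_id;
run;
```

With this layout, a subject that lacks a measurement at some time simply has a missing value in its own column. Subsequent SERIES statements would then refer to rep1, rep2, and rep3 rather than col1, col2, and col3.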
Figure 4 illustrates the effect of the following code incorporating options relating to line and
marker attributes and labels. The effect of the multiple series statements is to include all specified series
on the same plot. For each marker and line, the color is specified to be black although blue, green or any
of a number of other colors could have been specified. For markers, both the size and symbol have been
specified in the code. For lines, the pattern and thickness have been specified. For each series, the
legendlabel option specifies the label that appears in the legend. Note that the label option of the rowaxis
statement sets the row-axis label, which serves as the y-axis label.
proc sgpanel data=T_rep;
panelby trt;
/* One SERIES statement per replication so that each can have its own attributes */
series x=time y=col1 / markers markerattrs=(color=black size=5 symbol=star)
   lineattrs=(color=black pattern=thindot thickness=2) legendlabel='Replication 1';
series x=time y=col2 / markers markerattrs=(color=black size=5 symbol=circlefilled)
   lineattrs=(color=black pattern=shortdashdot thickness=2) legendlabel='Replication 2';
series x=time y=col3 / markers markerattrs=(color=black size=5 symbol=homedown)
   lineattrs=(color=black pattern=mediumdashdotdot thickness=2) legendlabel='Replication 3';
rowaxis label='Response';
title1 'Figure 4. Series Plots for Transposed Data (Utilization of Line and Marker Attributes)';
run;
quit;
2. Using PROC SGPANEL to Plot Regression and Loess Lines
Loess lines are a nonparametric method of obtaining a curve fitted to the data and can be used in
preliminary data exploration. They also can play a role in visually assessing the fit of a parametric model.
The code below produces panels with both a loess line and a 3rd degree polynomial regression line
overlaid on a scatter plot (Figure 5). Note that including both on a graph reveals more clearly the lack of
fit of the regression line. The code as written will produce a color graph, while Figure 5 illustrates graphs
produced using lineattrs and markerattrs to specify black lines and markers.
proc sgpanel data=FabDat;
panelby trt;
reg x=time y=response / degree=3;
loess x=time y=response;
title1 'Figure 5. Exploratory Graphs - Profiles of Each Subject (Reg and Loess Plots)';
run;
quit;
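A black and white variant of the preceding call might look like the following sketch. The report does not list the exact options used for Figure 5, so the line patterns, marker symbol, legend labels, and title here are assumptions chosen for illustration.

```sas
/* Sketch: black and white version of the regression and loess panels. */
/* Patterns, symbol, labels, and title are illustrative assumptions.   */
proc sgpanel data=FabDat;
  panelby trt;
  /* Cubic regression: solid black line; this statement also draws the points */
  reg x=time y=response / degree=3
      lineattrs=(color=black pattern=solid)
      markerattrs=(color=black symbol=circle)
      legendlabel='Cubic regression';
  /* Loess: dashed black line; NOMARKERS avoids drawing the points twice */
  loess x=time y=response / nomarkers
      lineattrs=(color=black pattern=shortdash)
      legendlabel='Loess';
  title1 'Reg and Loess Plots in Black and White';
run;
```

Using different line patterns for the two fits keeps them distinguishable once color is removed.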
3. Capturing and Graphing Estimates with PROC SGPLOT
PROC MIXED is typically used to analyze data from repeated measures designs. The following
code runs a model that accounts for the lack of independence among observations from the same
individual as well as the non-constant variance. The statement, ods output lsmeans=mmm;, captures
estimates and their standard errors as well as the requested confidence interval (CI) in a data set.
proc mixed data=FabDat;
class trt time id;
model response = trt time trt*time /ddfm=kr;
repeated time / subject=id type=csh;
lsmeans trt*time / pdiff cl alpha=0.05;
ods output lsmeans=mmm;
run;
proc print data=mmm;
run;
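The PROC MIXED call above refers to a subject identifier named id, which is not among the variables listed in Table 1. If the data set holds only trt, rep, time, and response, one way to create such an identifier is sketched below. It relies on the earlier statement that subjects are uniquely identified by the combination of trt and rep; the form of the identifier is an assumption, and the step would need to run before PROC MIXED.

```sas
/* Sketch: build a subject identifier from trt and rep.              */
/* The character form '1_2' is an assumption; any value that is      */
/* unique within each trt*rep combination would serve the same role. */
data FabDat;
  set FabDat;
  length id $ 8;
  id = catx('_', trt, rep);   /* e.g. '1_2' for treatment 1, replication 2 */
run;
```

Because id appears only in the CLASS statement and in the subject= option, it needs to distinguish subjects but does not need to be numeric.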
Table 3. LS Means for Combinations of Trt and Time
Obs  Effect    trt  time  Estimate  StdErr  DF    tValue  Probt   Alpha  Lower    Upper
1    trt*time  1    0      3.1514   0.1614  8.01  19.53   <.0001  0.05    2.7793   3.5234
2    trt*time  1    1      6.6145   0.5002  7.99  13.22   <.0001  0.05    5.4610   7.7681
3    trt*time  1    3      9.7337   0.6549  8     14.86   <.0001  0.05    8.2237  11.2437
4    trt*time  1    5     11.0857   0.6810  8     16.28   <.0001  0.05    9.5153  12.6562
5    trt*time  1    10    11.9832   0.8925  8.01  13.43   <.0001  0.05    9.9258  14.0407
6    trt*time  1    20    13.1649   0.9443  8.03  13.94   <.0001  0.05   10.9889  15.3409
7    trt*time  2    0      2.8027   0.1614  8.01  17.37   <.0001  0.05    2.4306   3.1747
8    trt*time  2    1      5.4093   0.5002  7.99  10.81   <.0001  0.05    4.2558   6.5629
9    trt*time  2    3      8.0623   0.6549  8     12.31   <.0001  0.05    6.5523   9.5723
10   trt*time  2    5      9.0064   0.6810  8     13.23   <.0001  0.05    7.4360  10.5768
11   trt*time  2    10     9.3120   0.8925  8.01  10.43   <.0001  0.05    7.2545  11.3694
12   trt*time  2    20    10.6279   0.9443  8.03  11.25   <.0001  0.05    8.4519  12.8039
13   trt*time  3    0      3.3461   0.1614  8.01  20.73   <.0001  0.05    2.9740   3.7181
14   trt*time  3    1      8.1320   0.5002  7.99  16.26   <.0001  0.05    6.9784   9.2855
15   trt*time  3    3     12.2970   0.6549  8     18.78   <.0001  0.05   10.7870  13.8071
16   trt*time  3    5     15.7507   0.6810  8     23.13   <.0001  0.05   14.1803  17.3211
17   trt*time  3    10    18.1357   0.8925  8.01  20.32   <.0001  0.05   16.0783  20.1932
18   trt*time  3    20    20.1212   0.9443  8.03  21.31   <.0001  0.05   17.9452  22.2971
19   trt*time  4    0      3.0696   0.1614  8.01  19.02   <.0001  0.05    2.6975   3.4416
20   trt*time  4    1      7.2369   0.5002  7.99  14.47   <.0001  0.05    6.0834   8.3905
21   trt*time  4    3     11.1182   0.6549  8     16.98   <.0001  0.05    9.6081  12.6282
22   trt*time  4    5     14.0758   0.6810  8     20.67   <.0001  0.05   12.5054  15.6462
23   trt*time  4    10    17.5688   0.8925  8.01  19.69   <.0001  0.05   15.5114  19.6263
24   trt*time  4    20    20.4874   0.9443  8.03  21.70   <.0001  0.05   18.3114  22.6634
The following code uses SGPLOT to generate a scatter plot (Figure 6) with error bars
corresponding to the confidence interval for each mean.
proc sgplot data=mmm;
scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper markerattrs=(color=black);
title1 'Figure 6. Using SGPLOT to Generate a Scatter Plot with Confidence Interval Bars';
run;
quit;
In SAS 9.2, series plots do not accommodate adding error bars, but scatter plots
do. You can include both plot types in the same graph to create a plot including lines, markers, and error
bars (Figure 7).
proc sgplot data=mmm;
scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper markerattrs=(color=black);
series x=time y=estimate / group=trt lineattrs=(color=black);
title1 'Figure 7. Using SGPLOT to Generate a Scatter Plot with Connecting Lines';
run;
quit;
IV. PUBLICATION READY GRAPHS
As noted above, one way to use options to exert more control over a graph’s features involves
first restructuring the data. In some instances, this can be accomplished using PROC TRANSPOSE; if
data are unbalanced or a more complicated restructuring is required, a series of data steps might be used.
The following code illustrates another way to restructure a data set. In the following, we first create
individual data sets for each treatment group (trt1, trt2, trt3, and trt4). The code renames the columns for
estimated means and confidence interval endpoints so that for each treatment data set, these columns have
unique names. As a second step, we merge the 4 data sets by the variable time. This creates a transposed
data set with separate columns for each treatment group so that it is easy to specify each treatment’s graph
attributes. The following code illustrates these data manipulations.
/* Split mmm into one data set per treatment, renaming the estimate and CI columns */
data trt1 (rename=(estimate=trt1mn upper=upper1 lower=lower1))
trt2 (rename=(estimate=trt2mn upper=upper2 lower=lower2))
trt3 (rename=(estimate=trt3mn upper=upper3 lower=lower3))
trt4 (rename=(estimate=trt4mn upper=upper4 lower=lower4));
set mmm;
if trt eq 1 then output trt1;
if trt eq 2 then output trt2;
if trt eq 3 then output trt3;
if trt eq 4 then output trt4;
run;
/* Merge the four data sets side by side, matching rows on time */
data transMMM;
merge trt1 trt2 trt3 trt4;
by time;
run;
proc print data=transMMM;
var time trt1mn lower1 upper1 trt2mn lower2 upper2 trt3mn lower3 upper3 trt4mn lower4 upper4;
format trt1mn lower1 upper1 trt2mn lower2 upper2 trt3mn lower3 upper3 trt4mn lower4 upper4 5.2;
run;
Table 4. Transposed Data Set with Separate Columns for Each Treatment Group
Obs time trt1mn lower1 upper1 trt2mn lower2 upper2 trt3mn lower3 upper3 trt4mn lower4 upper4
1 0 3.15 2.78 3.52 2.80 2.43 3.17 3.35 2.97 3.72 3.07 2.70 3.44
2 1 6.61 5.46 7.77 5.41 4.26 6.56 8.13 6.98 9.29 7.24 6.08 8.39
3 3 9.73 8.22 11.24 8.06 6.55 9.57 12.30 10.79 13.81 11.12 9.61 12.63
4 5 11.09 9.52 12.66 9.01 7.44 10.58 15.75 14.18 17.32 14.08 12.51 15.65
5 10 11.98 9.93 14.04 9.31 7.25 11.37 18.14 16.08 20.19 17.57 15.51 19.63
6 20 13.16 10.99 15.34 10.63 8.45 12.80 20.12 17.95 22.30 20.49 18.31 22.66
The following code uses the markerattrs, lineattrs, and legendlabel options as well as both series and
scatter plots to produce a customized Black and White Publication Ready Graph (Figure 8). In addition,
YAXIS and XAXIS statements allow one to manipulate axis attributes; we use them here to change the
axis labels. In order to have each treatment appear only once in the legend, we include only the series
lines in the legend. To do this, the name option is used to give each series line a name (“plot1”, “plot2”,
“plot3”, and “plot4”). Listing those names in the keylegend statement causes the legend entry for the
scatter plots to be suppressed so that only the series lines are included in the legend. Because for each
treatment the series and scatter statements specify the same marker symbol, by including labels only for
the series lines, the legend includes both the line pattern and the marker. The ods listing statement (ods
listing sge = on;) that precedes the call to PROC SGPLOT causes two versions of the graph to be
produced. The second is an editable graph. Opening this version of the graph allows further
customization via the utilization of a point-and-click graphics editor. These editable graphs can be saved
as either .sge graphs or as .png graphs. When saving as a .png graph, you can specify resolution 300 for
publication purposes (Henke, 2011).
ods listing sge = on;
proc sgplot data=transMMM;
series x=time y=trt1mn / name="plot1" markers
   markerattrs=(symbol=square color=black size=10)
   lineattrs=(pattern=35 color=black thickness=5) legendlabel="Treatment 1";
scatter x=time y=trt1mn / markerattrs=(symbol=square color=black size=10)
   yerrorupper=upper1 yerrorlower=lower1;
series x=time y=trt2mn / name="plot2" markers
   markerattrs=(symbol=circlefilled color=black size=10)
   lineattrs=(pattern=34 color=black thickness=5) legendlabel="Treatment 2";
scatter x=time y=trt2mn / markerattrs=(symbol=circlefilled color=black size=10)
   yerrorupper=upper2 yerrorlower=lower2;
series x=time y=trt3mn / name="plot3" markers
   markerattrs=(symbol=triangle color=black size=10)
   lineattrs=(pattern=26 color=black thickness=2) legendlabel="Treatment 3";
scatter x=time y=trt3mn / markerattrs=(symbol=triangle color=black size=10)
   yerrorupper=upper3 yerrorlower=lower3;
series x=time y=trt4mn / name="plot4" markers
   markerattrs=(symbol=starfilled color=black size=10)
   lineattrs=(pattern=20 color=black thickness=2) legendlabel="Treatment 4";
scatter x=time y=trt4mn / markerattrs=(symbol=starfilled color=black size=10)
   yerrorupper=upper4 yerrorlower=lower4;
keylegend "plot1" "plot2" "plot3" "plot4";
yaxis label='Response Mean';
xaxis label="Time";
title1 'Figure 8. Black and White Publication Ready Graph';
run;
quit;
ods listing sge = off;
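The resolution can also be requested in code rather than in the graphics editor. The following is a minimal sketch, assuming the LISTING destination is in use; the output folder ('.'), the image name fig7_bw, and the image dimensions are placeholders rather than settings taken from this report.

```sas
/* Sketch: write a 300 dpi PNG directly from code.                    */
/* gpath, imagename, width, and height are illustrative placeholders. */
ods listing gpath='.' image_dpi=300;
ods graphics / reset imagename='fig7_bw' imagefmt=png width=6in height=4in;

proc sgplot data=mmm;
  scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper
          markerattrs=(color=black);
  series x=time y=estimate / group=trt lineattrs=(color=black);
run;

/* Return the graphics options to their defaults */
ods graphics / reset;
```

Setting the size and resolution in code makes the figure reproducible each time the program is rerun, whereas choices made in the point-and-click editor have to be repeated by hand.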
In addition, in SAS 9.2, the ModStyle macro can be used to customize some style elements without first
restructuring the data (Kuhfeld, 2009). However, some elements (e.g., line thickness and symbol size)
cannot be specified using this macro. The following code illustrates how to use the ModStyle macro to
generate a Black and White Publication Ready Graph (Figure 9).
%modstyle (name=blackonly, parent=journal2, colors=black, linestyles=1 2 26 20,
markers=Square CircleFilled Triangle StarFilled);
ods listing style=blackonly;
ods listing sge = on style=blackonly;
proc sgplot data=mmm;
scatter x=time y=estimate / group=trt yerrorlower=lower yerrorupper=upper;
series x=time y=estimate / markers name="inlegend" group=trt;
keylegend "inlegend";
title1 'Figure 9. Using SGPLOT to Generate a Scatter Plot with Connecting Lines';
run;
quit;
ods listing sge = off;
V. CONCLUSION
The above examples illustrate the relative ease of using the new SAS 9.2 ‘SG’ procedures to
create both exploratory and publication quality graphs. In addition, along with writing code to bring
about most desired features, graphs can be further customized via the utilization of a point-and-click
graphics editor. Including the statement ods listing sge=on; before a call to an ‘SG’ procedure causes an
editable graph that can be saved as a .png image to be created. Both PROC SGPLOT and PROC
SGPANEL support the following features:
+ Basic plot: scatter plots, series plots, band plots, needle plots, and vector plots.
+ Fit and confidence plots: loess curves, regression curves, penalized B-spline curves, and ellipses.
+ Distribution plots: histograms, box plots, and density curves.
+ Categorization plots: bar charts, dot plots, and bar-line charts.
+ Insets, legends, and reference lines.
The %modstyle macro can be used to customize some style elements. However, users who wish to
specify the attributes of each overlaid plot individually can restructure the data within SAS.
Further information about PROC SGPLOT and PROC SGPANEL can be found in the SAS
documentation and in papers such as Using PROC SGPLOT for Quick High Quality Graphs (retrieved from
http://www.wuss.org/proceedings08/08WUSS%20Proceedings/papers/how/how05.pdf on 11/12/2010) and
New SAS/GRAPH Procedures for Creating Statistical Graphics in Data Analysis (retrieved from
http://www.lexjansen.com/wuss/2007/DataPresentationsBusIntell/DPBI_Heath_SASGraphProcedures.pdf).
REFERENCES
Delwiche, L. D., & Slaughter, S. J. Using PROC SGPLOT for Quick High Quality Graphs. Retrieved from
http://www.wuss.org/proceedings08/08WUSS%20Proceedings/papers/how/how05.pdf on 11/20/2010.
Heath, D. New SAS/GRAPH Procedures for Creating Statistical Graphics in Data Analysis. Retrieved from
http://www.lexjansen.com/wuss/2007/DataPresentationsBusIntell/DPBI_Heath_SASGraphProcedures.pdf.
Henke, A. (2011). Publishing Tips of the Month. Agricultural Experiment Station, New Mexico State
University.
Kuhfeld, W. F. (2010). Statistical Graphics in SAS: An Introduction to the Graph Template Language
and Statistical Graphics Procedures. SAS Institute.
Kuhfeld, W. F. (2009). Modifying ODS Statistical Graphics Templates in SAS 9.2. SAS Institute.
SAS (2009). SAS/GRAPH 9.2: Statistical Graphics Procedures Guide.
New Mexico State University is an equal opportunity/affirmative action employer
and educator. NMSU and the U.S. Department of Agriculture cooperating.