Graphics - Stanford University

Graphics
HRP223 – 2011
November 28, 2011
Copyright © 1999-2011 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties.
Unauthorized reproduction of this presentation, or any portion of it, may result in
severe civil and criminal penalties and will be prosecuted to maximum extent possible
under the law.
1
Robbins
• Creating More Effective Graphics by Naomi
Robbins is a wonderful book showing the right
and wrong ways to visualize scientific data.
Read it when you have an afternoon off. It is
an ideal read on a transcontinental flight.
2
How I do graphics
• Exploratory stuff
– Use the quick and dirty graphics built into EG
• Production quality graphics
– Write SAS or R code to make better looking
graphics
– Edit in Adobe Illustrator
3
Visualization Tools
• This is an excellent
book that covers
how to visualize
stuff using many
tools (including R).
It has a great
introduction to
Adobe Illustrator.
4
Why Do Data Visualization?
• Well designed pictures will show you the
details and the whole pattern in your data.
• Numeric descriptions can easily hide
important information.
• Some patterns are hard to detect in tables.
– Whenever data is reported over time or locations,
you need art.
YOU CAN LEARN A LOT BY JUST LOOKING.
-Yogi Berra
5
Fisher’s Plot Data Reported in
Cleveland
Year 1
Year 2
Based on code written by Robert Allison at SAS Institute
6
Scatter Plot for Correlations
[Figure: four scatter plots, one per Anscombe data set; horizontal axes run 0 to 20, vertical axes 0 to 15]
Anscombe 1973, Graphs in Statistical Analysis
All have r² = .67
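A minimal sketch for reproducing these four panels in SAS. The values are typed in from Anscombe's 1973 paper; the data set name (anscombe) and the variable names (x1, y1, and so on) are labels chosen here for illustration, not from the slides.

```sas
/* Anscombe's quartet: sets 1-3 share x1; set 4 uses x4 */
data anscombe;
  input x1 y1 y2 y3 x4 y4;
  datalines;
10 8.04 9.14 7.46 8 6.58
8 6.95 8.14 6.77 8 5.76
13 7.58 8.74 12.74 8 7.71
9 8.81 8.77 7.11 8 8.84
11 8.33 9.26 7.81 8 8.47
14 9.96 8.10 8.84 8 7.04
6 7.24 6.13 6.08 8 5.25
4 4.26 3.10 5.39 19 12.50
12 10.84 9.13 8.15 8 5.56
7 4.82 7.26 6.42 8 7.91
5 5.68 4.74 5.73 8 6.89
;
run;

/* The numeric summaries are nearly identical across the four sets */
proc corr data=anscombe nosimple;
  var x1;
  with y1 y2 y3;
run;

proc corr data=anscombe nosimple;
  var x4;
  with y4;
run;

/* The pictures are not: a 2x2 grid of scatter plots with fitted lines */
proc sgscatter data=anscombe;
  plot y1*x1 y2*x1 y3*x1 y4*x4 / columns=2 reg;
run;
```

Squaring the correlations that proc corr reports gives the r² quoted on the slide; only the plots reveal how different the four data sets are.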
7
Bad Things
• First, I want to talk about bad graphics that I
frequently see.
– 3d
– Pie
– Donuts
– Stacked graphics
8
General
• 3D graphics
– Don’t, Don’t,
Don’t
While the SAS
implementation of 3D
graphics is relatively good,
don’t use 3D effects, unless
you are measuring
something in 3D. Even
then, don’t.
9
Tufte is a God to many.
• The empiricist in me is very nervous
about the amount of pontificating in
his books…
– I want to have evidence-based advice.
• His best advice is to put no extra ink
on the page.
– Think about the ink-to-information
ratio.
– Remove all chart junk.
Note: the irony of the chart junk on this slide….
10
Example Bar Chart
Serum Samples in Each Trimester
You can remove ink
rather than adding it.
11
Ink-to-Information Ratio
• How much ink for seven numbers?
Based on Soukup & Davidson, 2002 Visual Data Mining
12
Cleveland
• If you want to know how to do scientific visualization,
you must read William Cleveland’s work.
– He attempted to quantify what makes a good graphic good.
• His early work on graphics is one of the reasons why
R/S-plus is taking over the statistical world.
13
Pie is bad.
• Work by Cleveland (and experimental psychologists)
suggests that:
– people are bad at judging the relative magnitude of angles
– if you twist the rotation of the pie you can cause people to
systematically misjudge the size of the angles
– a 3rd dimension makes judgment worse
• If you get a glossy handout with a 3D pie, assume
someone is lying to you.
• Don’t use them.
14
Don’t Explode!
• This exploded 3D pie (brought to you by Excel)
is nearly useless for judging amounts.
Total
tweaked
twisted
wrecked
15
Forbidden Donut….
• Donut plots have the same problems as pies
(if not worse) ….
16
Stacking is Bad
• Cleveland also quantified the fact that people
are bad at judging the relative height of
stacked data.
17
Wow, a cinnamon roll plot!
• Good luck making rapid judgments using this
stacked 3D pie.
18
What is a good graphic?
• Don’t make your audience think unnecessarily!
– The point of the graphic should stand out instantly.
– Plot the quantity (inference) that you want people
to notice.
• Show the central tendency and the variability.
• Minimize the amount of ink on the page.
• Be sure colorblind people can understand it.
– Use a black and white photocopier and make sure
you can distinguish all groups.
19
What is wrong with this?
[Figure: bar chart titled "Weight in Framingham Dataset", with Female and Male bars; vertical axis runs 0 to 250]
Never contrast
black on blue.
What is the point that the reader should learn from this?
How is the variability represented?
What are the error bars?
Can you interpret 1 SD error bars?
How many people are included?
Ink to information … How many numbers are depicted?
20
What is wrong with this?
What is the point of this
graphic?
How are the two sexes
represented?
What data is this?
21
22
What is wrong with this?
Lovely white space
What is the point?
How are the sexes
represented?
How many people?
Where is the mean?
What data is this?
23
Easy But Awful Boxplots
24
What is the point of this
graphic?
How are the two sexes
represented?
25
Code for a Good Boxplot
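The code shown on this slide did not survive extraction. As a stand-in, here is a minimal sketch of a grouped boxplot using sgplot on the Framingham data that ships with SAS (sashelp.heart). The title and axis label text are assumptions; this is not necessarily the code from the original slide.

```sas
ods graphics on;

title "Weight by Sex in the Framingham Heart Data";

proc sgplot data=sashelp.heart;
  /* one box per sex: median, quartiles, whiskers and outliers */
  vbox weight / category=sex;
  yaxis label="Weight";
run;

title;
```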
26
SAS's Framingham Heart Data
What is the point of this
graphic?
27
28
When you test for
the difference in the
mean SAS gives
you a great plot.
29
Avoid Thinking
• Put labels on the graphic directly instead of
using a key.
• If you want people to compare the difference
between two lines, plot the difference, not the
two lines.
30
Bivariate Comparisons with Lines
• People are extremely bad at judging the distance between
two curves. Never ask people to judge up and down (vertical)
distances between curves.
The distance
between the two
curves is the
same at all points.
Based on:
Robbins
Creating More
Effective
Graphs, 2005
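A small sketch of both points, using made-up data: two curves that are always exactly 10 units apart, labeled directly on the plot, followed by a plot of the difference itself. The data set name (twoCurves) and its variables are hypothetical.

```sas
/* Hypothetical data: curve b is always 10 units above curve a */
data twoCurves;
  do x = 1 to 20;
    a = x**2;
    b = a + 10;
    diff = b - a;
    output;
  end;
run;

/* Two curves: the vertical gap looks like it shrinks as the curves steepen */
proc sgplot data=twoCurves;
  series x=x y=a / curvelabel="A";   /* label on the line, no key needed */
  series x=x y=b / curvelabel="B";
run;

/* Plot the quantity you want people to judge: the difference */
proc sgplot data=twoCurves;
  series x=x y=diff;
  yaxis min=0 max=20 label="B minus A";
run;
```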
31
Plot Types
• Univariate (one variable)
– Categorical variables
• Bar charts
• Dot plots
• Waffle plots
– Continuous variables
• Histogram
• Box plot
• Violin plots
32
Bar Charts
• The ink-to-information ratio is lousy.
• A one dimensional quantity is being
“expanded” into two dimensions.
– Doubling of the amount corresponds to how much
of an increase in area?
33
SAS Bar Charts
• SAS makes the reader do extra work by rotating
the axis labels in ActiveX images.
• They pointlessly include variable labels by default.
34
How to do it?
Notice you can Edit the data
and apply filters.
You can right click on variables and
apply user-defined formats off the
Properties dialog.
35
First create the format.
In the Data windowpane of the Bar Chart GUI, right click on the variable and
change the format to the User Defined format you had created.
36
The GUI is Solid
• My only complaints are that the rotate grouping
values text does not work (position in this
example) and the summary statistics do not
show up when you request ActiveX images.
37
.PNG format
ActiveX image format
38
Saving the Graphic for Publication
• The easiest way to get publication quality
graphics is to set the output type to be RTF.
39
Default Output and Graphics
• The default graphic format in EG is ActiveX.
These images can be edited (even on the web)
but they only display with Internet Explorer. I
have set my graphics to display as ActiveX images.
Tweak this with Tools> Options… > Graph.
40
41
Types of Images
• The default formats of the images are determined by
the ODS destinations you are using:
– LISTING: png, visible in the Windows Picture and Fax Viewer
– HTML: png, gif, jpg, contained in web pages and visible in
Internet Explorer, Firefox or Opera
– LATEX: PostScript, epsi, gif, jpeg, png, visible in GhostView
– PCL or PS: contained in a PostScript file, visible in
GhostView
– PDF: contained in pdf, which is visible with Adobe Reader
– RTF: visible in MS Word
• RTF graphics are done at 300 dpi by default
42
What is ODS?
• The Output Delivery System (ODS) controls
the type and appearance (aka the style) of SAS
output.
Different appearance
templates
Different output
destinations/types.
43
You can browse the ODS appearance
templates from the Style Manager on the
Tools menu.
44
I Typically Use HTML
Include image_dpi = 300
to set the resolution to
be higher than the
default 100 dots per
inch. Try 300 for final
images pasting into MS
Office.
This is the appearance template. For optimal
results use:
Analysis: color
Default : overdistinguishes symbols for color or
B&W
Journal or journal2, etc: black and white
Statistical or statistical2, etc: color
This says the images should show
tooltips with extra statistical details
when you hover the mouse over
parts of the graphic. (I can’t image
these.)
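The same settings can be written as code. This is a sketch of roughly what those options amount to, assuming you are submitting the code yourself rather than relying on the EG dialog; the choice of the statistical style and of sashelp.heart is only for illustration.

```sas
/* HTML destination, color style template, 300 dpi images */
ods html style=statistical image_dpi=300;

/* imagemap=on asks for tooltips on the graphics in HTML output */
ods graphics on / imagemap=on;

proc sgplot data=sashelp.heart;
  histogram cholesterol;
run;

ods graphics off;
ods html close;
```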
45
ods graphics on;
• This turns on the ODS statistical graphics.
• Behind the scenes this combines your data
with a pre-specified description of what to
plot and the aesthetics of the appearance.
Your data
What
Where?
Graph
template
Colors
Fonts
Style
template
46
Useful ods graphics options
• ods graphics on /
      width = 8in
      height = 11in
      imagefmt = jpg
      imagename = thingy
      imagefmt = staticmap ;
– width, height: if you set only width or height, it will use a 4:3 aspect ratio.
– imagename: make a series of graphics called thingy1, thingy2, etc.
– imagefmt = staticmap: use pop-up tooltips with details.
• ods graphics / reset;
– Reset the graphic counter back to 1.
• ods graphics off;
– If you want to disable ods graphics for a procedure.
47
48
ODS SGraphics
• Compared to the competition, for the last 10
years SAS graphics have been between poor
and pathetic.
– Graphics procedures rendered with okay quality,
at best.
– No “what you see is what you get” editing.
– Many plots were nearly impossible to render.
– Custom graphics required extensive programming.
• SAS 9.x has attempted to solve this problem.
49
Old vs. New Procedures
• The old (commonly used) graphics procedures
were gchart, gplot.
• Now most analysis procedures have built in
high quality graphics that can be invoked with
an ODS graphics on statement.
– Early on in the class I told you to tweak the EG
options to include “ODS graphics on” with every
run.
• There are also new “easy to use” statistical
graphics (sg) procedures.
50
New Graphics Statistical Graphics Procs
• proc sgPlot
– general plotting procedure that replaces gplot
• proc sgScatter
– lots of tools for scatterplots and scatter matrices
• proc sgPanel
– quick and easy trellis/lattice/matrix/panel of plots
• Proc sgRender
– used with proc template to make totally custom plots
– It replaces proc greplay
51
Plot Types
• Univariate (one variable)
– Categorical variables
• Bar charts
• Dot plots
• Waffle plots
– Continuous variables
• Histogram
• Box plot
• Violin plots
• Quantile and QQ plots
52
Categorical variables
You can get an okay looking
graphic using sgpanel.
53
Categorical variables
I was able to get exactly the
graphic I wanted using R.
54
If you want to use R
• Download R for Mac or PC
cran.cnr.berkeley.edu/bin/macosx/
cran.cnr.berkeley.edu/bin/windows/base
55
If you use a PC, also get PERL and
Tinn-R
• PERL is a text manipulation language that is
used by a couple of key R packages. It ships
with Mac OS X. PC users can get ActivePerl
(what I use) or Strawberry Perl for Windows.
www.perl.org/get.html
• Tinn-R is a text editor that knows the R
language.
sourceforge.net/projects/tinn-r/
56
R Help
• R help files are user hostile. To learn about
the options for dotchart type:
?dotchart
• Use: rseek.org
57
Browse
• To see why people use R for graphics look
here:
addictedtor.free.fr/graphiques/thumbs.php
58
Additional Libraries
• If you see sample code that includes require()
or library(), you will need to do a one-time
download of the additional package. If you
are using Vista, run R as the administrator (by
right clicking on the R icon instead of just
double clicking) to install and update
packages.
59
Categorical variables
Waffle Plots (aka pixel plots)
• I have not found software to do them.
Image from: Visual language for
Designers by Connie Malamed
2009.
60
Continuous variables
Continuous Outcomes
• The Distribution Analysis menu option can do
basic plots.
61
The resolution of the
histogram is okay but the
others are unacceptable.
62
Continuous variables
Use sgplot for high
resolution plots.
63
Continuous variables
64
Continuous variables
Violin
[Figure: violin plot; axis runs 0 to 150]
• A violin plot mirrors the shape of the
histogram (density). They can be done in R.
65
Grouped categorical variables
Grouped Categorical Variables
• To graph categorical data in SAS you need to
get Michael Friendly’s Visualizing Categorical
Data. Unfortunately, his macros are
copyrighted with the book… So I will show
you the R versions.
– Fourfold plots
– Mosaic plots
– Association plots
66
Grouped categorical variables
Fourfold Plots
45% male vs. 30%
female admission
• They draw 4 slices of pie with
the area corresponding to
the number of people in
each cell of a 2x2 table and
they have confidence bands
such that if the confidence
bounds overlap on adjacent
pie pieces, they are not
statistically significantly
different.
67
Grouped categorical variables
[Figure: fourfold plot of admission by sex]
          Admitted   Rejected
Male          1198       1493
Female         557       1278
There is clear
evidence of sexist
policies in
admissions!
More males were
admitted than
females.
68
Grouped categorical variables
[Figure: fourfold plots of admission by sex, one per department]
Department   Sex      Admitted   Rejected
A            Male          512        313
A            Female         89         19
B            Male          353        207
B            Female         17          8
C            Male          120        205
C            Female        202        391
D            Male          138        279
D            Female        131        244
E            Male           53        138
E            Female         94        299
F            Male           22        351
F            Female         24        317
The joy of Simpson's
paradox.
Department A admitted a higher proportion of females than males,
and no other department showed a bias!
69
Grouped categorical variables
Mosaic Plots
• So you have a
contingency table and
you want to know if
there is an
association. You do a
chi-square test and it
says there are
associations between
the rows and columns.
What next?
70
Grouped categorical variables
Some basic voodoo in R
shows which combinations
are over (in blue) or under
represented (in red).
71
Grouped categorical variables
I prefer the simpler
association plots.
[Figure: association plot, "Relation between hair and eye color"; hair: Black, Brown, Red, Blond; eyes: Brown, Blue, Hazel, Green]
72
Grouped continuous variables
Grouped Continuous Variables
• You can use the Distribution Analysis to get
basic grouped plots.
• For better looking plots you need to write
sgplot and/or sgpanel code.
73
Grouped continuous variables
Request distinct graphics
by subgroups.
74
Grouped continuous variables
75
Grouped continuous variables
Actually this took
a bit of voodoo.
76
Grouped continuous variables
2nd
1st
77
Grouped continuous variables
Double click here.
Put details on the
histogram tweaks here.
I use/tweak nrow ncol and endpoints often.
endpoints = 2 to 10 by 0.5
midpoints = 5.6 5.8 6.0 6.2 6.4
78
Grouped continuous variables
79
Grouped continuous variables
80
Grouped continuous variables
Side by Side Violin Plots
[Figure: violin plots for groups A, B and C; axis runs 50 to 150]
81
Grouped continuous variables
Scatter Plot
82
Jittered Plot
83
Grouped continuous variables
Jitter vs. Sunflowers
In R you can also do sunflower plots.
84
Grouped continuous variables
Ordinary Least Squares Regression
• People typically plot a regression line to show a
relationship between two continuous variables.
85
Grouped continuous variables
Bisquare
• Figure out what is an odd value and then put a weight on it to
devalue it. There are many robust regression algorithms
around. R and S-Plus software have them well implemented.
[Figure: two scatter plots of V3 (0 to 20) against V1 (0 to 8)]
86
Grouped continuous variables
Loess and Splines
• Loess is a technique that essentially creates a
rolling window and gets a weighted average
across the values visible inside the window.
• Splines are curved lines that allow different
amounts of stiffness to the curves.
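A minimal sketch of both smoothers layered over a scatter plot with sgplot. The small sashelp.class data set and the smooth=0.5 setting are choices made here for illustration; this is not the code behind the "Smooth =" panels.

```sas
ods graphics on;

proc sgplot data=sashelp.class;
  /* raw points first, so the curves are drawn on top */
  scatter x=height y=weight;
  /* loess: locally weighted fit; smooth= controls the window width */
  loess x=height y=weight / nomarkers smooth=0.5;
  /* penalized B-spline with default stiffness */
  pbspline x=height y=weight / nomarkers;
run;
```

Rerunning with a smaller or larger smooth= value shows how the window width trades wiggle for stiffness.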
87
Smooth = 99
Smooth = 50
Smooth = 25
88
Grouped continuous variables
Tweaking Specialized Plots
• Most analysis procedures now have customized
high resolution graphics. Most are automatically
produced if you type ods graphics on.
• Proc Freq
– I wanted a deviation plot for a 2x2 (or really any sized
table) showing which cell is driving a significant chi-square.
They only give you a plot for a one-way table.
– The ORPlot is very nice.
89
Turn on
editable
graphics with
ods listing
sge= on.
Specifying the
plot name is
optional in
proc freq.
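A sketch of the kind of code this slide describes: editable output is switched on, and a deviation plot is requested for a one-way table. The variable smoking_status from sashelp.heart is used only as an example; the slide's own code is not recoverable.

```sas
/* sge=on produces editable .sge files alongside the images */
ods listing sge=on;
ods graphics on;

proc freq data=sashelp.heart;
  /* the deviation plot needs the chi-square test on a one-way table */
  tables smoking_status / chisq plots=deviationplot;
run;

ods graphics off;
ods listing sge=off;
```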
90
Deviance Plot
91
ODS Graphics Editor with EG
• If you want to do extensive tweaking to a
graphic, you can use the WYSIWYG ODS
Graphics editor. Unfortunately it only works
with ODS graphics procedures and you need
to rerun the code in SAS to invoke it.
92
Move code from EG to SAS
1. Use the query builder to put your data in a
permanent SAS library (not the work library).
2. Right click on the graphic node which is run on
data in a permanent library and choose Open…
Open Last Submitted Code.
3. Copy the code beginning with the SQL that
makes the data.
4. Start SAS and paste the code into the program
editor.
93
Move all your code to SAS
• Because the ODS graphics editor is not in EG
(yet), you can export the entire set of code for
the project and then rerun it in SAS.
94
ODS Graphics Editor with EG(2)
• After exporting all your EG project, open the
code in SAS and add these lines at the top of
the program:
ods rtf file = "c:\blah\somefile.rtf";
ods listing sge = on;
• Then open the graphic of interest.
95
96
WYSIWYG Editing
• Right click and/or double click to set properties
for objects in the plot.
The tool is optimized for
some of the ODS style
templates but you can use
custom colors.
97
• Right click on things to set properties.
– Colors, text details, fonts
– Point and click annotation
– Symbols, arrows, text, circles
98
WYSIWYG Editing
• While the Statistical graphics editor is a much
needed improvement, it is incomplete. You
can only use a few style templates (for setting
default colors and such) and you cannot use
custom style templates. This means that you
cannot do critical tasks like manually setting the
color for different values in scatter plots.
99
Too Many Graphics
• If the ods graphics on statement gives you too
many graphics, you can specify which graphics
you want by including code designed for the
procedure. Typically it looks like this: plots(only) =
(table names). This design is poorly implemented
because you need to know where to put the plots
option and what the table names are. Does it
go on the proc line (like phreg), the tables line
(like proc freq), or some other line? Also the
table names specified with a plots option do
not always match the ODS table names.
100
• Usually you can use an ODS exclude statement
or an ODS select statement to pick the correct
things to print. Using the plots(only) = syntax
is more efficient.
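A sketch of the two approaches side by side, using proc ttest on sashelp.class as an arbitrary example. It also shows the naming mismatch: the plot is requested as summary, but the ODS object it produces is named SummaryPanel.

```sas
ods graphics on;

/* Approach 1: ask the procedure for only the plot you want */
proc ttest data=sashelp.class plots(only)=(summary);
  class sex;
  var height;
run;

/* Approach 2: let the procedure make everything, keep one object.
   Note that this also drops the tables, not just the other plots. */
ods select SummaryPanel;
proc ttest data=sashelp.class;
  class sex;
  var height;
run;
```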
101
Proc phreg has a lot of new
features but nothing major in the
graphics. With phreg, if you
specify ods graphics on you do
not automatically get any plots.
Here I request survival and
cumulative hazard plots
including the global confidence
limits option (cl).
Once again the option names are not
consistent with the table names.
102
This shows the number of people at risk after 20, 40 etc days.
Proc lifetest can show
the number at risk but
the implementation is
weak. It labels the
groups with numbers
even if the strata are
character strings. You
have to manually edit
them and this affords
ample opportunity for
mistakes.
I don’t see a way to
change the censoring
symbol in the legend.
103
Splitting a Grid
• Some procedures
produce a grid of plots.
You can get access to the
individual plots by
specifying plots(unpack).
Then you can use
plots(only)=tableName
to get just the right
parts.
• ODS select or exclude
statements will not
work.
104
plots(GlobalOptionsGoHere). The global options apply to all
graphics in this procedure.
105
Beyond the Basic Univariate plots
• There are 4 SG procedures that allow you to
build up complex univariate plots and do
multivariate (trellis/lattice) plots.
106
New Graphics Statistical Graphics Procs
• proc sgPlot
– general plotting procedure that replaces gplot
• proc sgScatter
– lots of tools for scatterplots and scatter matrices
• proc sgPanel
– quick and easy trellis/lattice/matrix/panel of plots
• Proc sgRender
– used with proc template to make totally custom plots
– It replaces proc greplay
107
proc sgPlot
• Basic plots
– scatter, series, band, needle
• Fits curves and generates confidence bounds
– loess, regression, penalized b-splines, ellipse
• Distributions
– boxplots, histograms, normal curves, kernel
density
• Categorization
– dot plots, bar charts, line charts
From Heath 2007. SAS/Graph procedures for creating statistical graphics
108
onLineDoc helps (some)
• onlineDoc for sgplot needs a LOT more
hyperlinks and examples. Find these pages:
• The SGPLOT Procedure: Overview
• The SGPLOT Procedure: Examples
• The SGPLOT Procedure: Procedure Syntax
109
As you add more requests to
the plot, it resizes and shifts
things to make room. It draws
them in the order you request
them. It reads the requests
from the first listed to the
bottom. Change the order if
you want to have an item
appear layered on top of, or
behind, another thing.
Some colors are not set yet in the
enhanced editor. Use the menu
Tools>Options>Enhanced Editor…
then click User Defined Keywords
to add the coloring.
110
How is that made?
proc format library = work;
  value $smoked
    "Non-smoker" = "None"
    " " = "Missing"
    other = "Not none"
  ;
run;
data fram;
  set sashelp.heart;
  smokin = put(smoking_Status, $smoked.);
run;
111
How is that made?
Layers of features are added to the
graphic in the order listed.
proc sgplot data = fram;
histogram cholesterol;
density cholesterol / type = kernel;
density cholesterol / type = normal;
keylegend /
location=inside position=topright across=1;
run;
112
How is that made?
The statistical graphics language
template can be saved and studied.
proc sgplot data = fram tmplout= "c:\blah\plate.sas";
histogram cholesterol;
density cholesterol / type = kernel;
density cholesterol / type = normal;
keylegend /
location=inside position=topright across=1;
run;
113
proc template;
  define statgraph sgplot;   /* Note the name of this template. */
    begingraph;
      layout overlay;
        Histogram Cholesterol / primary=true binaxis=false LegendLabel="Cholesterol";
        DensityPlot Cholesterol / Lineattrs=GraphFit kernel() LegendLabel="Kernel" NAME="DENSITY";
        DensityPlot Cholesterol / Lineattrs=GraphFit2 normal() LegendLabel="Normal" NAME="DENSITY1";
        DiscreteLegend "DENSITY" "DENSITY1" / Location=Inside across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;
This was saved in plate.sas.
proc sgrender data = fram template = sgplot;
run;
Render a graphic with the template
and dataset specified.
114
I want to add in a reference line
showing what is normal and put the
categories in order.
115
116
117
Grids
• You can produce lattices full of graphics with
proc sgpanel.
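A minimal sketch of a panelled plot: one histogram with a kernel density per sex, drawn from sashelp.heart. The choice of variables and the two-column layout are assumptions made here for illustration.

```sas
ods graphics on;

proc sgpanel data=sashelp.heart;
  /* one cell per level of sex, laid out side by side */
  panelby sex / columns=2;
  histogram cholesterol;
  density cholesterol / type=kernel;
run;
```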
118
119
Spaghetti Plots
Data from Singer and Willett:
www.ats.ucla.edu/stat/examples/alda.htm
120
Customizing graphics
• You can tweak the graphics that ship with SAS
by modifying their graph template or you can
create truly custom graphics by making your
own statistical graph template.
Your data
Graph
template
Style
template
121
If you do not want
to explain what
Kernel density
estimation is…
remove the lines.
122
Finding the template
• Add ods trace on; before the procedure that
draws the graphic and include ods
trace off; afterwards. This prints the names of
all the templates used by the procedure in the
log.
product.procedure.Graphics.TemplateName
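A sketch of the trace wrapped around a procedure. proc ttest on sashelp.class is just an example; any procedure that draws ODS graphics can go in the middle.

```sas
ods graphics on;

ods trace on;    /* start writing object and template names to the log */

proc ttest data=sashelp.class;
  class sex;
  var height;
run;

ods trace off;   /* stop tracing */
```

Each block in the log lists a Name, Label, Template and Path for one table or graphic; the Template line is the one to pass to the source statement in proc template.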
123
Looking at a Template
• You can ask proc template to display the
template with the source statement:
proc template;
source stat.ttest.graphics.summary2;
run;
• Remember to type this before you start
editing:
ods path(prepend) work.template (update);
124
This is a complete
template except for the
proc template
statement here and a
run statement at the
bottom.
Copy this into an editor
window and add proc
template.
Don’t Panic
125
After adding proc template and commenting
out the Kernel statements rerun the code.
126
Oops. Unknown key words…
• You can fix the color coding on the template
code easily.
127
Fixed (permanently)
All your subsequent plots will have no density
line.
128
Details on that new template.
• You can ask SAS to list, into the log, all the
locations where the graphics templates are
stored by using the command ods path show:
Your new
template is
stored here.
The
untouchable
original is
here but it is
“masked” by
the 1st one.
129
Want a temporary template?
• You can request that your templates go into
work instead of SASUSER with the command:
ods path (prepend) work.template (update);
• When you quit SAS the template will be
deleted along with everything else in work.
130
Note the dynamic variables
Dynamic variables allow the same
template to work with lots of datasets
proc template;
define statgraph Stat.Ttest.Graphics.Summary2;
notes
"Comparative histograms with normal/kernel densities and boxplots, (two-sample)";
dynamic _Y1 _Y2 _Y _VARNAME _XLAB _SHORTXLAB _CLASS1 _CLASS2 _CLASSNAME _LOGNORMAL _OBSVAR;
BeginGraph;
entrytitle "Distribution of " _VARNAME;
layout lattice / rows=3 columns=1 columndatarange=unionall rowweights=(.4 .4 .2)
shrinkfonts=true;
columnaxes;
columnaxis / display=(ticks tickvalues label) label=_XLAB shortlabel=_SHORTXLAB
griddisplay=auto_on;
endcolumnaxes;
layout overlay / xaxisopts=(display=none);
histogram _Y1 / binaxis=false primary=true;
if ((NOT EXISTS(_LOGNORMAL)) AND (NOT(EXISTS(_PAIRED) AND EXISTS(_RATIO))))
densityplot _Y1 / normal () name="Normal" legendlabel="Normal" lineattrs=
GRAPHFIT;
endif;
*densityplot _Y1 / kernel () name="Kernel" legendlabel="Kernel" lineattrs=GRAPHFIT2;
131
dynamic
• You can see what things/variables are being passed
to a template by a procedure by printing it in a title:
proc template;
define statgraph Stat.Ttest.Graphics.Summary2;
notes
"Comparative histograms with normal/kernel densities and boxplots, (two-sample)";
dynamic _Y1 _Y2 _Y _VARNAME _XLAB _SHORTXLAB _CLASS1 _CLASS2 _CLASSNAME _LOGNORMAL _OBSVAR;
BeginGraph;
entrytitle "Does _Y1 exist? " eval(exists(_Y1)) " It is the value: " _Y1;
entrytitle2 "Does _VARNAME exist? " eval(exists(_VARNAME)) " It is the value: " _VARNAME;
*entrytitle "Distribution of " _VARNAME;
This resolves to 1 or 0 depending
on if the variable is used.
132
entrytitle "Does _Y1 exist? " eval(exists(_Y1)) " It is the value: " _Y1;
133
Setting dynamic Variables
• You can set the values of dynamic variables
when you call them:
proc sgrender data = blah template= thing;
dynamic _var1Label= 'Dude';
run;
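A complete, small sketch of the idea: one template that takes the variable to plot and the title as dynamic variables, rendered twice with different values. The template name (dynHist) and the dynamic names (_var, _title) are hypothetical, chosen only for this example.

```sas
proc template;
  define statgraph dynHist;
    /* placeholders filled in at render time */
    dynamic _var _title;
    begingraph;
      entrytitle _title;
      layout overlay;
        histogram _var;
      endlayout;
    endgraph;
  end;
run;

/* Same template, two different variables */
proc sgrender data=sashelp.class template=dynHist;
  dynamic _var="height" _title="Distribution of Height";
run;

proc sgrender data=sashelp.class template=dynHist;
  dynamic _var="weight" _title="Distribution of Weight";
run;
```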
134
SGPlot vs Template
• You can replicate everything done with proc
sgplot using the template language but don’t
reinvent the wheel if you don’t need to.
• You will want to use proc template to build
custom graphics that use many panels.
• Proc sgplot uses statements that start like reg
but template uses names like regressionplot.
– Similar but not identical names… boo.
135
136
137
layout gridded = ticks do not have to align
layout lattice = ticks must align
138
139
140
Styles
• You can also tweak the style (aesthetics/
appearance) of your graphics.
Your data
Graph
template
Style
template
141
What styles?
You can use the GUI to look at the details of the
styles or you can explore them with code:
proc template;
source styles.statistical;
run;
• This template includes sections for:
fonts
IndexTitle
GraphFonts
IndexProcName
Table
SystemFooter
Header
GraphColors
Data
Graph
Color
GraphBackground
GraphGridlines
142
Fonts
SysTitleAndFooterContainer
ListItem
TwoColorRamp
GraphMissing
GraphFonts
TitleAndNoteContainer
Paragraph
TwoColorAltRamp
GraphControlLimits
color_list
TitlesAndFooters
List
ThreeColorRamp
GraphRunText
Color
BylineContainer
List2
ThreeColorAltRamp
GraphStars
GraphColors
SystemTitle
List3
GraphOutlier
Html
SystemFooter
Graph
GraphFit—GraphFit2
Text
PageNo
GraphWalls
GraphConfidence—2
GraphClipping
Container
ExtendedPage
GraphAxisLines
GraphPrediction
Layoutcontainer
Index
Byline
GraphGridLines
GraphPredictionLimits
Document
Parskip
GraphOutlines
GraphError
Body
Continued
Frame
ProcTitle
GraphBorderLines
GraphBoxMedian
Contents
ProcTitleFixed
GraphReference
GraphBoxMean
Pages
Output
GraphTitleText
GraphBoxWhisker
Date
Table
GraphFootnoteText
GraphHistogram
BodyDate
Batch
GraphDataText
GraphEllipse
IndexItem
Note
GraphLabelText
GraphBand
ContentFolder
noteBanner
GraphValueText
GraphContour
ByContentFolder
UserText
GraphUnicodeText
GraphBlock
IndexProcName
PrePage
GraphBackground
ContentProcLabel
NoteContentFixed
GraphFloor
GraphAltBlock
PagesProcLabel
WarnBanner
GraphLegendBackground
GraphAnnoLine
IndexTitle
WarnContentFixed
GraphHeaderBackground
GraphAnnotext
ContentsTitle
ErrorBanner
DropShadowStyle
GraphAnnoShape
PagesTitle
ErrorContentFixed
GraphDataDefault
GraphSelection
FatalBanner
GraphData1—GraphData12
GraphConnectLine
GraphBox
There are a LOT
of different parts
of a template that
can be tweaked.
143
Your Own Style Template
• You can customize a style template based on
another:
proc template;
  define style myStyle;
    /* Use everything in the statistical template except the tweaks listed below. */
    parent = styles.Statistical;
    /* Make the graphic element match the background of the graphic (invisible camouflage). */
    style graphdata1 from graphdata1 / color = colors('docbg');
    style graphdata2 from graphdata1 / color = violet;
    style graphdata3 from graphdata1 / color = turquoise;
    /* Change the appearance of the font used for labeling data elements. */
    style GraphFonts from GraphFonts /
      'GraphDataFont' = ("<sans-serif>, <MTsans-serif>", 9pt);
  end;
run;
144
To get a list of known colors
proc registry list
startat="COLORNAMES";
run;
145
About the colors
• You can pick colors by names or specifying
details
Contrast around 12th item in
grouped data (typically
confidence bounds)
12th item in grouped data
146
About those colors
• The weird color names are colors in RGB
hexadecimal format prefixed with "cx"
• Go play at
kuler.adobe.com/#create/fromacolor
147
Using the style template
• Once the style is created you can apply it to an
ODS destination (pipeline) with code like:
ods listing style= myStyle;
* stuff goes here;
ods listing close;
• or something like this:
ods html style= myStyle;
ods graphics on / width = 11in height = 11in;
proc sgrender data=whatWhen template=blockplot1;
run;
ods html close;
148
How to set the color for a
histogram
149
proc sgplot data = fram;
histogram weight / fillattrs = (color = coral);
run;
150
You can also tweak the
style template
151
Tweaking the Style Template
proc template;
define style myStyle;
parent = styles.Statistical;
style GraphDataDefault / color=coral;
end;
run;
ods html style = myStyle;
proc sgplot data = fram;
histogram weight ;
run;
ods html close;
152
vbar Version
proc sgplot data = fram;
vbar weight / group = sex;
run;
153
proc sgplot data = fram;
vbar weight / group = sex;
xaxis fitpolicy = thin ;
run;
154
proc template;
define style myStyle;
parent = styles.Statistical;
style graphdata1 from graphdata1 / contrastColor=pink color = pink;
style graphdata2 from graphdata1 / contrastColor=blue color = blue;
end;
run;
ods html style = myStyle;
proc sgplot data = fram;
vbar weight / group = sex;
xaxis fitpolicy = thin ;
run;
ods html close;
155
What is the Current color?
proc template;
source styles.default;
run;
kuler.adobe.com/#
156
Setting Colors … The Hard Way
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
histogram v /
fillattrs = (color = black)
outlineattrs = (color=orange)
;
endlayout;
endgraph;
end;
run;
proc sgrender data = blah template = TABLENAME; run;
157
Footnotes
• In the template use:
entryfootnote halign=left textattrs=graphvaluetext "TEXT";
• or use the %modtmplt macro
title;
footnote "halign=left textattrs=graphvaluetext 'blah' ";
%modtmplt(template=NAME, step=t, options titles noquotes)
• Use the template then delete temp version:
%modtmplt(template= NAME, step=d)
Search online doc for modtmplt and look at this:
http://support.sas.com/rnd/app/papers/modtmplt.pdf.
158
proc sgplot data = fram;
scatter x = height y = weight;
run;
proc sgplot data = fram;
reg x = height y = weight;
run;
159
ods listing sge = on style = statistical;
proc sgplot data = fram;
reg x = height y = weight /
markerattrs = (color = green)
lineattrs = graphdata1 (color = lime);
run;
160
ods listing style = statistical;
proc sgplot data = fram;
reg x = height y = weight / group = sex ;
run;
161
proc template;
define style sexE;
parent = styles.Statistical;
style graphdata1 /
contrastColor=pink
markersymbol = "star";
style graphdata2 /
contrastColor=blue
markersymbol = "plus";
end;
run;
ods listing sge = on style = sexE;
proc sgplot data = fram;
scatter x = height y = weight / group = sex ;
reg x = height y = weight / group = sex ;
run;
162
163
The syntax for proc template vs.
proc sgplot
• The following slides marked with
keyboard macros
show the syntax that I have written into Enhanced Editor keyboard macros for sgplot and template.
• After downloading and installing the keyboard macros, use the title on the following slides and it will auto-complete with useful syntax.
164
keyboard macros
proc template scatter
Required
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle ''; /* Instead of title statement */
layout overlay / /* For a single panel */
xaxisopts = (offsetmin=.05 offsetmax=.05 label=' ')
yaxisopts = (offsetmin=.05 offsetmax=.05 label=' '
linearopts =
(tickvaluesequence = (start = end = increment = ) viewmin = )
);
scatterplot y = x = /
datalabel = LABELVARIABLE
markerattrs = (symbol = circlefilled color = black size = 3px);
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
Based on code in Statistical Graphics in SAS by Warren F. Kuhfeld
165
proc template;
define statgraph classscatter;
begingraph;
entrytitle 'Weight by Height';
layout overlay /
/* offsetmin and offsetmax: edge of plot to first tick */
xaxisopts = (offsetmin=.05 offsetmax=.05 label='Class Height')
yaxisopts = (offsetmin=.05 offsetmax=.05 label='Class weight'
linearopts =
/* tickvaluesequence: tick range to consider */
/* viewmin: force to include the lower tick */
(tickvaluesequence = (start = 50 end = 150 increment = 25) viewmin = 50)
);
scatterplot y = weight x = height /
datalabel = name
markerattrs = (symbol = circlefilled
color = black
size = 3px
);
endlayout;
endgraph;
end;
run;
proc sgrender data = sashelp.class template = classscatter;
run;
166
keyboard macros
proc sgplot scatter
proc sgplot data = ;
title "";
scatter y = x = / datalabel =
markerattrs = (symbol = circlefilled color = black size = 3px);
xaxis offsetmin = .05 offsetmax = .05 label = "";
yaxis offsetmin = .05 offsetmax = .05 label = "" values = ( to by );
run;
167
Using proc sgplot scatter
proc sgplot data = sashelp.class;
title "Weight by Height";
scatter y = weight x = height /
datalabel = name
markerattrs = (symbol=circlefilled
color=black size =3px);
reg y = weight x = height;
xaxis offsetmin = .05 offsetmax = .05 label = "Height";
yaxis offsetmin = .05 offsetmax = .05 label = "Weight" values = (50 to 150 by 25);
run;
168
keyboard macros
Proc template reg
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
scatterplot y = x = ;
regressionplot y = x = / degree = 3;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
169
keyboard macros
Proc sgplot reg
proc sgplot data = ;
title "";
reg y = x = / datalabel =
markerattrs = (symbol = circlefilled color = black size = 3px);
xaxis offsetmin = .05 offsetmax = .05 label = "";
yaxis offsetmin = .05 offsetmax = .05 label = "" values = ( to by );
run;
170
keyboard macros
Proc template loess
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
scatterplot y = x = ;
loessplot y = x =;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
171
keyboard macros
proc sgplot loess
proc sgplot data = ;
title "";
loess y = x = / datalabel =
markerattrs = (symbol = circlefilled color = black size = 3px);
xaxis offsetmin = .05 offsetmax = .05 label = "";
yaxis offsetmin = .05 offsetmax = .05 label = "" values = ( to by );
run;
172
keyboard macros
proc loess
proc loess global
ods graphics on;
* Locally optimal;
proc loess data =;
model = ;
run;
* Globally optimal fit;
proc loess data= ;
model = / select = AICC(global);
run;
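As a filled-in sketch of the two macros above, assuming the sashelp.class sample data with weight as the outcome and height as the predictor (these choices are illustrative, not from the original slides):

```sas
ods graphics on;

* Default (locally optimal) smoothing parameter selection;
proc loess data = sashelp.class;
  model weight = height;
run;

* Global search for the smoothing parameter that minimizes AICC;
proc loess data = sashelp.class;
  model weight = height / select = AICC(global);
run;
```

Comparing the fit plots from the two steps shows whether the default search stopped at a local minimum.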
173
keyboard macros
Proc template bspline
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
scatterplot y = x = ;
pbsplineplot y = x =;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
174
keyboard macros
proc sgplot bspline
proc sgplot data = ;
title "";
pbspline y = x = / datalabel =
markerattrs = (symbol = circlefilled color = black size = 3px);
xaxis offsetmin = .05 offsetmax = .05 label = "";
yaxis offsetmin = .05 offsetmax = .05 label = "" values = ( to by );
run;
175
keyboard macros
proc transreg
For model information on bsplines
* Global optimum;
proc transreg data =;
model identity(OUTCOME) = pbspline(PREDICTOR);
run;
* Local optimum;
proc transreg data = ;
model identity(OUTCOME) = pbspline(PREDICTOR / sbc lambda = 2 10000 range);
run;
176
keyboard macros
Proc template reg group
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
scatterplot y = x = / group =;
regressionplot y = x = / group = degree = 3 name ="thingy";
discretelegend "thingy" / title = "";
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
177
keyboard macros
Proc sgplot reg group
proc sgplot data = ;
title "";
reg y = x = / group =
datalabel =
markerattrs = (symbol = circlefilled color = black size = 3px);
xaxis offsetmin = .05 offsetmax = .05 label = "";
yaxis offsetmin = .05 offsetmax = .05 label = "" values = ( to by );
run;
178
Proc template barchart
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
barchart y = x = /
stat = mean /*freq pct sum */
orient= horizontal;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
179
proc sgplot hbar
proc sgplot data = ;
title "";
hbar GROUP / response = RESPONSE stat = mean /*freq mean sum */
numstd = 2
limitstat = /* clm stddev stderr */;
run;
180
proc template histogram
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
histogram VARIABLE / endlabels = true;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
181
Proc sgplot histogram
proc sgplot data = ;
title "";
histogram VARIABLE;
run;
182
proc template density
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
histogram VARIABLE / endlabels = true;
densityplot VARIABLE / kernel(); /* normal() */
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
183
proc template fringe
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
histogram / endlabels = true;
densityplot / kernel(); /* normal() */
fringeplot ;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
184
proc template boxplot
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
boxplot y = x = / orient = horizontal;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
185
proc sgplot boxplot
proc sgplot data = noautolegend;
title "";
vbox OUTCOME / category = GROUP;
run;
186
proc template series
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay ;
seriesplot y = OUTCOME x = DATEVAR / group = GROUPVAR name = 'thingy';
discretelegend 'thingy' / title = "SOMETHING";
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
187
proc template dot
proc means data = noprint nway;
var OUTCOME;
class THEGROUP;
output out = tmp mean = OUTCOME lclm = lower uclm = upper;
run;
proc template;
define statgraph dotplot;
begingraph;
entrytitle '';
layout overlay / yaxisopts = (type = discrete griddisplay = on reverse = true);
scatterplot y = THEGROUP x = OUTCOME / xerrorlower = lower xerrorupper = upper
markerattrs = (symbol = circlefilled) name = 'thingy'
legendlabel = "Mean and 95% Confidence Limits";
discretelegend 'thingy' / title = "whatever";
endlayout;
endgraph;
end;
run;
proc sgrender data = tmp template = dotplot; run;
188
proc template needle
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
needleplot y = x = ;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
189
proc template step
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
stepplot y = x = / display = (markers) markerattrs = (size = 3px);
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
190
proc template block
proc template;
define statgraph TABLENAME;
begingraph;
entrytitle '';
layout overlay;
blockplot x = DATE block = THEBLOCK / filltype=multicolor
datatransparency=.3 valuevalign=top
labelposition=top display=(fill values label) blockindex = IDNUMBER;
endlayout;
endgraph;
end;
run;
proc sgrender data = template = TABLENAME;
run;
191