General Clinical Research Center
August 15, 2005
Rachel Enriquez
What are we going to talk about?
Why should we care about statistical graphics?
What is the theoretical framework for statistical graphics?
When do we make statistical graphics?
How can we produce good quality graphics?
Why should we care?
Interpretation - good graphs help you understand your data.
Data visualization is part of analysis
Communication of results is the last step in the scientific process
Many people grasp results more easily from a figure than from a table.
Do you have an opinion?
Graphics can help persuade.
Objectively correct graphics can call attention to the result you WANT the viewer to see.
Get Attention
Can you produce exceptional statistical graphics? (me neither)
Do you want people to know that you are committed to the scientific process?
If people understand your research, they’ll listen to you and do what you tell them to.
The Theory of Statistical Graphics
Data Visualization
Visually encode the data.
Viewers decode the picture
Easy to figure out
Learn something new
See the right comparisons
Hierarchy of Visual Perception
Position along common scale
Position along nonaligned scale
Length
Angle / Slope
Area
Volume
Color
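The top of this hierarchy, position along a common scale, is what a dot plot uses: every value is placed on one shared axis, so viewers compare positions rather than angles or areas. The sketch below is a minimal illustration under stated assumptions. The helper names `scale` and `dot_plot_svg` are hypothetical (not from any library), and the layout numbers (width, margin, row height) are arbitrary. It writes SVG, a vector format, using only the Python standard library.

```python
def scale(value, domain_min, domain_max, range_min, range_max):
    """Linearly map a data value onto a pixel coordinate on a shared axis."""
    if domain_max == domain_min:
        raise ValueError("domain must have nonzero width")
    fraction = (value - domain_min) / (domain_max - domain_min)
    return range_min + fraction * (range_max - range_min)


def dot_plot_svg(labels, values, width=300, row_height=20, margin=80):
    """Return SVG markup for a dot plot: each value is a dot placed along
    one common horizontal scale that starts at zero (hypothetical helper)."""
    vmax = max(values)
    height = row_height * len(values)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for i, (label, value) in enumerate(zip(labels, values)):
        y = row_height * i + row_height / 2
        # Every row uses the SAME mapping, so positions are directly comparable.
        x = scale(value, 0, vmax, margin, width - 10)
        parts.append(f'<text x="0" y="{y + 4}" font-size="10">{label}</text>')
        parts.append(f'<circle cx="{x:.1f}" cy="{y}" r="3"/>')
    parts.append("</svg>")
    return "\n".join(parts)
```

For example, `open("dots.svg", "w").write(dot_plot_svg(["A", "B", "C"], [1, 2, 4]))` saves a file that any browser can open; the labels and values here are placeholders, not data from this talk.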
Aesthetics
A personal matter
Unless you ask Tufte
Maximize the data-ink ratio
Avoid 3-D
Avoid fill patterns
Obtain good resolution
Text can be small (in print)
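Tufte defines the data-ink ratio in The Visual Display of Quantitative Information roughly as follows. The aim is to push it toward 1 by erasing ink that carries no data, such as heavy gridlines, borders, fill patterns, and 3-D effects.

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```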
When do we make statistical graphics?
For preliminary analysis
Speed
[Figure: quick preliminary histogram; y-axis Percent (0 to 35), x-axis Age (16 to 72)]
For Publication in Journals
High data density is good.
Excellent resolution is required.
Color is difficult.
Column width is a consideration.
MS Office is frequently not an option.
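Resolution and column width interact: a bitmap needs (printed width in inches) × (dots per inch) pixels across. A minimal sketch follows; the 3.5-inch column and 300 dpi are illustrative assumptions only, so check each journal's author guidelines for its actual requirements. The function names are hypothetical.

```python
import math


def required_pixels(width_in, height_in, dpi):
    """Pixel dimensions a bitmap needs to print at `dpi` at the given size (inches)."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def printed_dpi(width_px, width_in):
    """Effective resolution when a bitmap `width_px` wide is printed `width_in` wide."""
    return width_px / width_in


# Assumed example: a one-column figure 3.5 in wide and 2.5 in tall at 300 dpi.
print(required_pixels(3.5, 2.5, 300))
# An 800-pixel-wide screen grab squeezed into the same column falls short of 300 dpi.
print(round(printed_dpi(800, 3.5), 1))
```

Vector formats sidestep this arithmetic, because they are redrawn at whatever resolution the printer uses.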
Too many tables!
A plot is better
Confounding variable
Tables: consider the online supplement
Maybe…
[Figure: one chart combining Esophageal Squamous Cell Carcinoma, Adenocarcinoma of Gastric Cardia, and Esophageal Adenocarcinoma (y-axis 0 to 35), with panels for Frequency of Reflux Symptoms (1/wk, 2-3/wk, >3/wk), Reflux Symptom Score (1-2, 2.5-4, 4.5-6.5 points), and Duration of Reflux (<12 yr, 12-20 yr, >20 yr)]
[Figure: grid of panels, each with a y-axis from 0 to 35. Rows (Variable): Frequency of reflux symptoms (1/week, 2-3/week, >3/week); Reflux symptom score (1-2, 2.5-4, 4.5-6.5); Duration of reflux symptoms (<12 yrs, 12-20 yrs, >20 yrs). Columns: Esophageal Adenocarcinoma; Adenocarcinoma of the Gastric Cardia; Esophageal Squamous-Cell Carcinoma]
The figure should be labeled!
Oral Presentations
Data density should be moderate.
Color is available.
LABEL!
Hope you have interesting data
Posters
Smaller audience
Experimentation is good.
Graphics will bring you customers!
Experimentation may, or may NOT, work.
How do I do this?
How much time do you have?
It is not easy.
There is no perfect, easy-to-use, cheap software that will solve your problems.
This is not too hard
Books are not very helpful
Software changes quickly.
People use different software.
You want to do it NOW, not after reading for 5 hours.
Surfing the net is frequently useful.
Vector Graphics vs Bitmaps
Vector graphics.
A set of instructions that tells the device how to display the document.
Adobe software is the most common way to edit vector graphics.
Bitmaps
Resolution depends on the number of pixels, so higher resolution means a larger file.
Easy to open and publish on-line.
Generally not accepted for publication.
Vector graphics can be made into bitmaps.
Bitmaps cannot be made into vector graphics.
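Why the conversion only goes one way: a vector instruction such as "fill this rectangle" can be sampled into pixels at any resolution, but a grid of pixels no longer records the rectangle's exact edges. The sketch below is a toy rasterizer under simplifying assumptions (a single filled rectangle, coordinates on a 1 × 1 canvas, one sample per pixel centre); `rasterize_rect` is a hypothetical name, not a library function.

```python
def rasterize_rect(x0, y0, x1, y1, width_px, height_px):
    """Render one vector instruction ("fill the rectangle from (x0, y0) to (x1, y1)"
    on a 1 x 1 canvas) into a bitmap: a grid of 0/1 pixels, each sampled at its centre."""
    grid = []
    for row in range(height_px):
        y = (row + 0.5) / height_px          # centre of this pixel row
        line = []
        for col in range(width_px):
            x = (col + 0.5) / width_px       # centre of this pixel column
            line.append(1 if x0 <= x <= x1 and y0 <= y <= y1 else 0)
        grid.append(line)
    return grid


# The same instruction can be rendered coarsely or finely...
for line in rasterize_rect(0.25, 0.25, 0.75, 0.75, 4, 4):
    print(line)
# ...but the 4 x 4 grid alone cannot tell you the rectangle's exact corners.
```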
Bitmaps, compression, and enlarging
Compression can be ‘lossy’.
We are familiar with the grainy effect of enlargement.
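Both effects can be shown with a bitmap stored as a list of rows of 0 to 255 grey values. Enlarging by nearest neighbour just copies each pixel into a block, so no new detail appears; that is the grainy look. `quantize` is a deliberately crude stand-in for lossy compression (real formats such as JPEG use more elaborate schemes), and both function names are hypothetical.

```python
def enlarge(bitmap, factor):
    """Nearest-neighbour enlargement: each pixel becomes a factor x factor block.
    No detail is added, which is why enlarged bitmaps look blocky."""
    return [[value for value in row for _ in range(factor)]
            for row in bitmap for _ in range(factor)]


def quantize(bitmap, levels):
    """Toy 'lossy' step: round each 0-255 grey value to one of `levels` evenly
    spaced values. The discarded detail cannot be recovered afterwards."""
    step = 255 / (levels - 1)
    return [[round(round(v / step) * step) for v in row] for row in bitmap]


image = [[0, 100, 130, 255]]
print(enlarge([[1, 2], [3, 4]], 2))
print(quantize(image, 2))
```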
Software
SPSS
Many chart options
Graphics can be edited
Can export vector graphics.
SAS
Known for poor graphics.
However, some people produce very good graphs with SAS.
Hope SAS improves, and use something else for now?
Stata
Any comments?
R
It is free.
Produces good graphics that can be exported in various formats.
Infinitely customizable
Difficult for the novice statistician / programmer
R clinic
SYSTAT
Epi Info
S-PLUS
Spotfire
Prism – also available in the GCRC computer lab.
Others…
SigmaPlot
Can be used with Excel and SPSS
Opens other data formats
Menu driven
Multiple graphics options
Easily produces compound graphics
Exports graphics in multiple formats.
MS Office
Windows Metafile is a vector graphic format.
Excel
More control over graphics
Limited selection of graph types
The user typically computes and enters the effect estimates and S.E.s.
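Since Excel expects you to supply the estimates and error-bar lengths yourself, here is a minimal sketch of computing a group mean and its standard error, SE = (sample standard deviation) / √n, with the Python standard library. The function name `mean_and_se` is hypothetical; for other effect estimates (e.g. odds ratios) the SEs would come from your model output instead.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for one group of observations."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Placeholder numbers, not study data.
m, se = mean_and_se([1, 2, 3])
print(m, round(se, 4))
```

The mean goes in as the bar height and the SE as the custom error amount.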
PowerPoint
Surprisingly good at managing bitmaps.
If you already use it, then improve your graphics by applying aesthetic rules.
For example…
[Figure: before and after versions of the same chart; y-axis 0 to 6; groups: negative family history, positive family history; series: hay fever, asthma, eczema, current wheeze]
Scanners
Scanned figures are an option.
Good way to clean up figures from journals if you’re proficient in Photoshop
The bitmap resolution problem remains
Which file format and program will avoid lossy compression?
Art Software
As a novice graphic preparator, I appreciate the ability to draw on graphs.
Can also ‘cover’ unwanted parts with white shapes.
Group the resulting collection of shapes and save as a picture.
Adobe Illustrator
Adobe Photoshop
These programs may seem counter-intuitive at first use.
Paint, MS Office, etc.
Easy to use
Bitmap products.
Call the experts
The Medical Illustrators at VUMC will improve your graphs.
$50/hr
The average graph takes 20 minutes.
Grow your own group ‘expert’.