Effective Use of Graphs
Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal (Hope) Hospitals NHS Foundation Trust
annie.herbert@manchester.ac.uk
(0161 720) 2227
Timetable
Time      Task
60 mins   Presentation
20 mins   Coffee Break
90 mins   Practical Tasks in IT Room
Outline
• Graphs for categorical data
• Graphs for numerical data
• Comparing groups
• Additional graphs (covered in other courses)
• Final tips & Computer packages
Categorical Data (1)
Examples:
• Sex
– Male/Female
• Blood Group
– A/B/AB/O
• Employment Status
– Unemployed/Part-time/Full-time
Categorical Data (2)
• Record:
Frequency (discrete number) per category
• Summary: Frequency OR
percentage/fraction/proportion
• Visually:
- Bar Chart
- Pie Chart
[Figure: bar chart and pie chart, "Official Employment Status of Population of Camberwick Green". Bar chart axes: Employment Type (Unemployed, Part-time, Full-time) against Frequency (0 to 2500).]
Example – Discharge Destination (1)
Where Patient Lives   n = 731
Alone                 339 (46.3%)
Family                210 (28.7%)
Home                  180 (24.6%)
Other                 2 (0.3%)
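The step from recorded frequencies to the percentage summary can be sketched in a few lines. This is a minimal illustration using the counts in the table above; the function name `percentages` and the text bars are assumptions made for the sketch, not part of the course material.

```python
# Counts taken from the "Where Patient Lives" table above.
counts = {"Alone": 339, "Family": 210, "Home": 180, "Other": 2}


def percentages(freqs):
    """Return each category's share of the total, as a percentage."""
    total = sum(freqs.values())
    return {category: 100 * n / total for category, n in freqs.items()}


total = sum(counts.values())
shares = percentages(counts)

for category, n in counts.items():
    # Crude text bar chart: one '#' for every 10 patients.
    print(f"{category:<7}{n:>4} ({shares[category]:3.0f}%) {'#' * (n // 10)}")
```

Printing the total alongside the percentages keeps the denominator visible, which matters whenever a chart shows percentages rather than frequencies.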
Example – Discharge Destination (2)
[Figure: bar chart, "Discharge Destinations of Patients". x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency (0 to 400).]
Example – Psychiatric Illness/Discharge Destination (1)
                      Psychiatric Illness?
Where Patient Lives   No (n=208)   Yes (n=523)
Alone                 117 (56%)    222 (42%)
Family                81 (39%)     129 (25%)
Home                  9 (4%)       171 (33%)
Other                 1 (0%)       1 (0%)
Example – Psychiatric Illness/Discharge Destination (2)
                       Where Patient Lives
Psychiatric Illness?   Alone (n=339)   Family (n=210)   Home (n=180)   Other (n=2)
No                     117 (35%)       81 (39%)         9 (5%)         1 (50%)
Yes                    222 (65%)       129 (61%)        171 (95%)      1 (50%)
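The two tables above use the same counts but different totals. A minimal sketch of both calculations follows; the function names are hypothetical and chosen only for this illustration.

```python
# Counts from the two tables above: outer keys are "Psychiatric Illness?",
# inner keys are "Where Patient Lives".
table = {
    "No": {"Alone": 117, "Family": 81, "Home": 9, "Other": 1},
    "Yes": {"Alone": 222, "Family": 129, "Home": 171, "Other": 1},
}


def percent_within_illness_group(tab):
    """Percentages that sum to 100 within each illness group (first table)."""
    result = {}
    for group, row in tab.items():
        total = sum(row.values())
        result[group] = {place: 100 * n / total for place, n in row.items()}
    return result


def percent_within_destination(tab):
    """Percentages that sum to 100 within each destination (second table)."""
    places = list(next(iter(tab.values())))
    totals = {p: sum(row[p] for row in tab.values()) for p in places}
    return {
        group: {p: 100 * row[p] / totals[p] for p in places}
        for group, row in tab.items()
    }


by_group = percent_within_illness_group(table)
by_place = percent_within_destination(table)

print(f"'Home' among patients with psychiatric illness: {by_group['Yes']['Home']:.0f}%")
print(f"Psychiatric illness among 'Home' patients: {by_place['Yes']['Home']:.0f}%")
```

The same cell (171 patients) gives two very different percentages, so the choice of total should follow from the group difference of interest.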
Example – Psychiatric Illness/Discharge Destination Bar Chart
[Figure: bar chart, "Discharge Destination of Patients with and without Psychiatric Illness". x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency (0 to 250); legend: No, Yes.]
Stacked Bar Chart
[Figure: stacked bar chart, "Discharge Destination of Patients with and without Psychiatric Illness". x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Percentage (0% to 100%); legend: Yes, No.]
Re-ordering categories can emphasize a certain effect:
[Figure: two stacked bar charts, both titled "Discharge Destination of Patients with and without Psychiatric Illness". x-axis: Psychiatric Illness? (No, Yes); y-axis: Percentage (0% to 100%). The legends list the same categories in different orders: Other, Home, Family, Alone in one chart and Family, Other, Home, Alone in the other.]
The axis should always start from 0:
[Figure: two bar charts, both titled "Discharge Destination (Alone, Family) for Patients with and without Psychiatric Illness". x-axis: Psychiatric Illness? (No, Yes); y-axis: Frequency; legend: Alone, Family. One y-axis runs from 0 to 250, the other from 70 to 230.]
Bar Charts – Advantages & Disadvantages
• Advantages:
- Visually strong.
- Easy to compare more than one dataset.
• Disadvantages:
- Categories can be ‘re-ordered’ to emphasize certain effects.
- Misleading if not used for counts.
- Misleading if the y-axis does not start from 0.
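The effect of a y-axis that does not start from 0 can be put into numbers. The sketch below compares the drawn heights of two bars under two baselines. It uses the 'Alone' frequencies from the psychiatric illness table (222 with, 117 without) and assumes a truncated axis starting at 70, the lowest tick in the truncated example; `apparent_ratio` is a hypothetical helper name.

```python
def apparent_ratio(a, b, baseline=0):
    """Ratio of the drawn heights of two bars when the y-axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# 'Alone' frequencies: psychiatric illness Yes = 222, No = 117.
yes_alone, no_alone = 222, 117

from_zero = apparent_ratio(yes_alone, no_alone)               # axis from 0
truncated = apparent_ratio(yes_alone, no_alone, baseline=70)  # axis from 70 (assumed)

print(f"Axis from 0:  'Yes' bar drawn {from_zero:.1f} times as tall as 'No' bar")
print(f"Axis from 70: 'Yes' bar drawn {truncated:.1f} times as tall as 'No' bar")
```

The frequencies differ by a factor of just under 2, yet the truncated axis draws one bar more than three times as tall as the other.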
Bar Charts – Things to consider:
• What group differences are you interested in?
• Frequencies or percentages? If percentage, it’s
down to you to specify the totals.
• Is ‘Other’ a large frequency/percentage?
• Consider the categories as un-ordered when
using a stacked bar chart.
Pie Charts
[Figure: two pie charts, "Psychiatric Illness? No" and "Psychiatric Illness? Yes", each with slices Alone, Family, Home, Other.]
Pie Charts – Advantages:
• Easy to compare categories, as they are
equidistant from each other.
• Ordering of categories does not
emphasize certain effects as badly as
stacked bar charts do.
Pie Charts – Disadvantages:
• No choice between frequencies and
percentages (down to you to specify totals).
• Cannot put more than one data set into a pie
chart.
• Lose individual values of the data.
• Limited space: if using more than 5 or 6
categories, chart can look complicated.
Numerical Data (1)
Examples:
• Weight
• Blood Pressure
• Cholesterol Levels
Numerical Data (2)
• Record:
Number/Value
(discrete or continuous)
• Summary:
- Mean (SD)
- Median (IQR)
• Visually:
- Histogram
- Box plot
- Spread plot
Data – Ages of Patients in Selenium Study
Age
48 36 56 66 65 19 36 59 48
52 67 39 28 58 48 49 39 57
62 74 59 66 45 69 55 63 42
68 54 24 19 70 73 29 34 50
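As a rough illustration, the two numerical summaries named earlier, mean (SD) and median (IQR), can be computed for these ages with Python's standard `statistics` module. The quartile method ("inclusive") is an assumption; statistics packages use slightly different quartile definitions and may give slightly different values.

```python
import statistics

# Ages of patients in the selenium study, as listed above.
ages = [48, 36, 56, 66, 65, 19, 36, 59, 48,
        52, 67, 39, 28, 58, 48, 49, 39, 57,
        62, 74, 59, 66, 45, 69, 55, 63, 42,
        68, 54, 24, 19, 70, 73, 29, 34, 50]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample standard deviation
median = statistics.median(ages)
# Quartiles by linear interpolation; other packages may differ slightly.
q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")

print(f"n = {len(ages)}")
print(f"Mean (SD):    {mean:.1f} ({sd:.1f})")
print(f"Median (IQR): {median} ({q1} to {q3})")
```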
Histogram – Ages of Patients in
Selenium Study
Histograms for the same data can vary:
Compromise:
Beware!
Histogram is not Bar Chart
[Figure: two charts of Count (0 to 400) against Length of stay (days). One uses a continuous axis with ticks from 10 to 190 days; the other uses the categories <5, 5to7, 7to14, 15to30, 31to60, 61to120, >120.]
Histograms – Advantages:
• Visual display of interval frequencies, easy
to compare intervals.
• Can give an idea of the distribution of the
data, e.g. shape, typical value, spread.
Histograms – Disadvantages:
• Choice of interval width can alter
appearance.
• Individual values lost.
• One data set per histogram, difficult to
compare data sets.
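The first disadvantage can be seen by binning the same data with different interval widths. The sketch below does this for the selenium study ages and prints a text histogram for each width; the widths 5, 10 and 20 and the helper `bin_counts` are choices made for this illustration only.

```python
from collections import Counter

# Ages of patients in the selenium study, as listed earlier.
ages = [48, 36, 56, 66, 65, 19, 36, 59, 48,
        52, 67, 39, 28, 58, 48, 49, 39, 57,
        62, 74, 59, 66, 45, 69, 55, 63, 42,
        68, 54, 24, 19, 70, 73, 29, 34, 50]


def bin_counts(values, width, start=0):
    """Count values in intervals [start + k*width, start + (k+1)*width).

    Returns a dict mapping each non-empty interval's lower bound to its count.
    """
    counts = Counter(start + width * ((v - start) // width) for v in values)
    return dict(sorted(counts.items()))


for width in (5, 10, 20):
    print(f"Interval width {width}:")
    for lower, n in bin_counts(ages, width).items():
        # Ages are whole years, so each interval ends at lower + width - 1.
        print(f"  {lower:>2}-{lower + width - 1:<2} {'#' * n}")
```

Narrow intervals give a ragged outline and wide intervals hide the shape, so an intermediate width is usually a reasonable compromise.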
Box Plot
[Figure: annotated box plot. Labels: Extreme, Outlier, Upper Quartile, Median, Lower Quartile, Outlier.]
Box Plots – Advantages:
• Defines many summary statistics in one plot.
• Defines ‘outliers’ explicitly.
• Can have more than one data set in a plot,
so easy to compare data sets:
Box Plots – Disadvantages:
• More complicated visually than
some other types of data plots.
• Individual values lost.
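The statistics a box plot displays can be computed directly. The sketch below assumes the common convention that points more than 1.5 × IQR beyond the quartiles are drawn as outliers; the slides do not state which rule is used, and packages differ in both the multiplier and the quartile method. `box_plot_summary` is a hypothetical helper.

```python
import statistics

# Ages of patients in the selenium study, as listed earlier.
ages = [48, 36, 56, 66, 65, 19, 36, 59, 48,
        52, 67, 39, 28, 58, 48, 49, 39, 57,
        62, 74, 59, 66, 45, 69, 55, 63, 42,
        68, 54, 24, 19, 70, 73, 29, 34, 50]


def box_plot_summary(values, k=1.5):
    """Quartiles, whisker ends and outliers, using an assumed k * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        # Whiskers extend to the most extreme values that are not outliers.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


summary = box_plot_summary(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under this rule none of the 36 ages is flagged as an outlier, so the whiskers run from the youngest to the oldest patient.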
Spread Plots (1)
Spread Plots (2)
• Advantages:
- Can give an idea of the distribution of the
data, e.g. shape, typical value, spread.
- Shows individual values of the data.
- Can show more than one dataset in a plot.
• Disadvantages:
- Not very widely used in journal
publications.
- Doesn’t explicitly summarise statistics or
outliers as box plot does.
Relationships in Numerical Data
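Serial Measurements
[Figure: two line graphs against Time (minutes), x-axis ticks from -100 to 650, y-axis ticks from 0 to 3. One is titled "Mean TG (±standard error) at each time point", with y-axis Mean TG (mM). The other is titled "Change of TG over time", with y-axis TG (mM) and legend entries E 1.1, E 2.2, E 3.2, E 4.1, E 5.1, E 6.1.]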
What information does this give?
Mean ± SE, n ≈ 30 per group
Better to look at individual data…
…or give a sensible summary.
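One possible reading of "a sensible summary" is to reduce each subject's curve to a single number, such as the peak value, the time of the peak, or the area under the curve, and then compare those numbers between groups. The slides do not say which summary is intended, so the sketch below is only an assumption, and the measurements in it are hypothetical values, not data from the study.

```python
def area_under_curve(times, values):
    """Area under one subject's curve by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


# Hypothetical data for illustration only: time in minutes, TG in mM.
times = [0, 120, 240, 360]
subjects = {
    "subject A": [1.0, 2.0, 1.5, 1.0],
    "subject B": [0.8, 1.2, 1.6, 1.4],
}

summaries = {}
for name, values in subjects.items():
    peak = max(values)
    summaries[name] = {
        "peak": peak,
        "time_of_peak": times[values.index(peak)],
        "auc": area_under_curve(times, values),
    }

for name, s in summaries.items():
    print(f"{name}: peak {s['peak']} mM at {s['time_of_peak']} min, AUC {s['auc']:.0f} mM·min")
```

Each subject then contributes one value per summary, which can be shown with the plots for numerical data described earlier.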
Kaplan-Meier Curve (step graph)
Time-to-Event data.
[Figure: "Survival Plot (PL estimates)". x-axis: Times (0 to 300); y-axis: Survivor (0.00 to 1.00); legend: 1, 0.]
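The steps in a Kaplan-Meier curve come from the product-limit (PL) estimate: at each event time, the running survival proportion is multiplied by the fraction of those still at risk who did not have the event. A minimal sketch follows; the function name and the follow-up times are hypothetical and are not data from the plot.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censorings at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for illustration only.
times = [5, 8, 8, 12, 20, 20, 25]
events = [1, 1, 0, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t:>3}: estimated survival {s:.2f}")
```

Censored subjects never cause a step themselves, but they reduce the number at risk, which makes later steps larger.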
Bland-Altman Plots (scatter plots)
How well do two methods of measurement agree?
[Figure: "Agreement Plot (95% limits of agreement)". x-axis: mean (200 to 450); y-axis: difference (-100 to 100).]
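The quantities plotted in an agreement plot can be computed as follows: for each subject, the mean and the difference of the two measurements, then the mean difference ± 1.96 standard deviations as the 95% limits of agreement. This is a minimal sketch; the function name is hypothetical and the paired measurements are invented for illustration.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Return per-subject means and differences, the bias, and the 95% limits."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical paired measurements from two methods, for illustration only.
method_a = [210, 265, 300, 340, 390, 430]
method_b = [225, 250, 310, 320, 400, 445]

means, diffs, bias, (lower, upper) = limits_of_agreement(method_a, method_b)
print(f"Mean difference: {bias:.1f}")
print(f"95% limits of agreement: {lower:.1f} to {upper:.1f}")
```

Plotting `diffs` against `means`, with horizontal lines at the bias and the two limits, gives the agreement plot.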
Forest Plots (Hi-Lo-Close charts)
Meta-Analysis.
Forest (meta-analysis) plot
Study    Estimate (lower, upper)
AH       0.72 (0.48, 1.08)
SW       0.68 (0.45, 1.03)
MT       0.80 (0.60, 1.07)
KW       0.80 (0.47, 1.36)
Pooled   0.75 (0.50, 1.14)
[x-axis ticks: 0.2, 0.5, 1, 2]
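A forest plot can be roughed out in plain text from the estimates and intervals above. The sketch assumes the estimates are ratios drawn on a logarithmic axis from 0.2 to 2, with 1 as the line of no effect; this is inferred from the axis ticks and is not stated on the slide. The helpers `column` and `forest_row` are hypothetical.

```python
import math

# (label, estimate, lower, upper), as listed above.
studies = [
    ("AH", 0.72, 0.48, 1.08),
    ("SW", 0.68, 0.45, 1.03),
    ("MT", 0.80, 0.60, 1.07),
    ("KW", 0.80, 0.47, 1.36),
    ("Pooled", 0.75, 0.50, 1.14),
]


def column(value, lo=0.2, hi=2.0, width=41):
    """Map a ratio onto a character column of a log-scaled axis (assumed scale)."""
    frac = (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return round(frac * (width - 1))


def forest_row(label, est, low, high, width=41):
    """One line of the plot: '-' for the interval, '|' at 1, 'o' at the estimate."""
    row = [" "] * width
    for c in range(column(low, width=width), column(high, width=width) + 1):
        row[c] = "-"
    row[column(1.0, width=width)] = "|"
    row[column(est, width=width)] = "o"
    return f"{label:<7}" + "".join(row)


for study in studies:
    print(forest_row(*study))

# Every interval here contains 1, the assumed line of no effect.
crossing = [label for label, _, low, high in studies if low < 1 < high]
print("Intervals containing 1:", ", ".join(crossing))
```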
Final Pointers:
• Before plotting think about the type of data and
what you would like to compare.
• Show all data rather than summaries where
possible.
• Label axes clearly. Graph should ‘stand alone’.
• Make sure when comparing groups that the
outcome is on the same scale.
• Make sure any colours used are sufficiently
different from each other, and not red/green.
Using a Computer Package:
Package       Advantages                        Disadvantages
SPSS          Produces journal quality graphs   • Difficult to start with
                                                • Expensive
StatsDirect   When copied and pasted, these     Difficult to draw bar/pie charts
              graphs may be edited in Word
Excel         Easy to use for bar/pie charts    Not a statistics package