uk14_cox

Small multiples, or the science and
art of combining graphs
Nicholas J. Cox
Department of Geography
Durham University, UK
Small multiples
Good graphics often exploit one simple design that is
repeated for different parts of the data.
Edward Tufte called this the use of small multiples.
Well-designed small multiples are inevitably comparative,
deftly multivariate, shrunken, high-density graphics….
Edward Rolf Tufte (1942–)
…in Stata
In Stata, small multiples are supported for different subsets
of the data with by() or over() options of many graph
commands.
Users can emulate this in their own programs by writing
wrapper programs that call twoway or graph bar and its
siblings.
Otherwise, specific machinery offers repetition of a design
for different variables, such as the graph matrix
command.
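As a minimal illustration using only official commands and the auto dataset shipped with Stata (the variables are chosen here for illustration and are not taken from the talk):

```stata
sysuse auto, clear

* by(): the same scatter plot repeated for each category of foreign
scatter mpg weight, by(foreign)

* over(): box plots of mpg for each category of rep78 within one graph
graph box mpg, over(rep78)
```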
Users can always put together their own composite graphs
by saving individual graphs and then combining them using
graph combine.
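A minimal sketch of that do-it-yourself route, assuming the auto dataset; the graph names g_weight and g_length are arbitrary. Here name() holds each graph in memory; saving() would write it to disk instead.

```stata
sysuse auto, clear

* create each graph once, keeping it in memory under its own name
* (nodraw suppresses display of the individual graphs)
scatter mpg weight, name(g_weight, replace) nodraw
scatter mpg length, name(g_length, replace) nodraw

* put the named graphs together in one composite with a common y scale
graph combine g_weight g_length, ycommon
```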
This presentation offers further modest automation of the
same design repeated for different data.
Original programs discussed are
stripplot
sparkline
crossplot
combineplot
designplot
subsetplot
with cameo roles for aaplot and sepscatter.
All may be installed from SSC.
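One way to install them all in one go, assuming an internet connection (the loop is a convenience sketch; installing each by hand works equally well):

```stata
* install each package from SSC, replacing any older copy
foreach pkg in stripplot sparkline crossplot combineplot designplot ///
    subsetplot aaplot sepscatter {
    ssc install `pkg', replace
}
```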
What’s in a name?
roseplot by any other name…
A minor theme here is that definite names are needed for
programs, even if kinds of graphs do not have distinct
agreed names.
As in advertising, a good name attracts and keeps users.
As in politics, a bad name can be fatal.
stripplot
Show me.
Unofficial nickname of Missouri
stripplot
stripplot started as an alternative to graph oneway
in 1999, but by a mix of accident and design has morphed
into an alternative to the official command dotplot.
I have shown results from stripplot in previous
meetings, so I will just feature here some additions to the
latest incarnation.
The aim is to compare univariate distributions with scope
for linear or stacked dot plots, box plots and confidence
intervals. We can now do side-by-side quantile plots.
As with dotplot, you can now show reference lines for
means or medians
– and indeed any reference level for which there is a
suitable egen function.
The examples here use Stata’s citytemp and auto
datasets.
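The calls might look something like this. This is a sketch only: the option names (over(), vertical, stack, box, pctile(), refline, reflevel()) are given as recalled from the stripplot help file and should be checked against help stripplot; the choice of price and foreign is an assumption, not necessarily what produced the graphs shown.

```stata
sysuse auto, clear

* stacked dot plots of price by car type, with box plots whose
* whiskers extend to the 5% and 95% points
stripplot price, over(foreign) vertical stack box pctile(5)

* reference lines at the median of each group; reflevel() names
* an egen function
stripplot price, over(foreign) vertical stack refline reflevel(median)
```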
[Figure: stripplot of temperature (°F) by Census Region (NE, N Cntrl, South, West), citytemp data.]
[Figure: stripplot by Car type (Domestic, Foreign), auto data, with box plots and whiskers to 5 and 95% points.]
[Figure: stripplots by Car type (Domestic, Foreign) and by Repair Record 1978 (1 to 5), auto data.]
sparkline
The purpose of visualization is insight, not pictures.
Ben Shneiderman (1947–)
Sparklines
The name “sparkline” was suggested by Edward Tufte for
intense text-like graphics.
Sparklines are typically simple in design, sparing of space
and rich in data, but they include several quite different
kinds of graph otherwise.
The most common kind shows several time series stacked
vertically.
sparkline is a Stata implementation.
Sparklines have long been standard in several fields,
including physics and chemistry (spectroscopy),
seismology, climatology, ecology, archaeology and
physiology (notably encephalography and cardiography).
Tufte provided a memorable and evocative new name and
an excellent provocative discussion.
The Grunfeld data (webuse grunfeld) are a classic
dataset in panel-based economics. Ten companies were
monitored for 1935–54. They give us a simple sandbox.
What are we doing here?
The problem of time series graphics
Comparisons of time series are a rich and challenging area
of statistical graphics.
The widespread term spaghetti plot hints immediately at
the difficulties.
As always, we want to combine a grasp of general patterns
with access to individual details.
With this in mind, we look at some sparklines of the
Grunfeld dataset.
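A sketch of calls that would give displays of this kind, assuming the syntax sparkline yvarlist xvar and support for by() (check help sparkline; any further options used for the graphs in the talk are not reproduced here):

```stata
webuse grunfeld, clear

* three series for one company, stacked vertically against year
sparkline invest mvalue kstock year if company == 1

* the same design repeated for every company
sparkline invest mvalue kstock year, by(company)
```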
[Figure: sparklines of kstock, mvalue and invest against year (1935 to 1955) for company 1, with the extreme values of each series labelled.]
[Figure: sparklines of kstock, mvalue and invest against year, one panel for each of companies 1 to 10. Graphs by company.]
[Figure: the same sparklines for companies 1 to 10, with the extreme values of each series labelled.]
Vertical and horizontal
By default sparkline stacks small graphs vertically.
If several graphs are combined, it is typical to cut down on
axis labels and rely on differences in shape to convey
information.
Horizontal stacking is also supported, which can be useful
for archaeological or environmental problems focused on
variations with depth or height.
Here is an archaeological dataset as example.
[Figure: sparklines of cores, blanks and tools against levels (12 to 25), stacked horizontally, with extreme values labelled.]
Nightingale’s data
Florence Nightingale (1820–1910) is well remembered for
her nursing in the Crimean war and (within statistical
science) for use of quantitative arguments.
Her most celebrated dataset is often reproduced using her
polar diagram, but is easier to think about as time series.
Zymotic (loosely, infectious) disease mortality dominates
other kinds, so much so that a square root scale helps
comparison. (A logarithmic scale over-transforms here.)
Watch out: the small print does explain that we are given
superimposed sectors.
Each sector must be assessed as a whole, from the centre
outwards.
The distinct colouring of each annular sector shows only
the outermost part of each sector.
Source of image:
http://understandinguncertainty.org/coxcombs
[Figure: Nightingale's data on mortality in the Crimea, 1854 to 1856: annualised rates per 1000 for zymotic disease, wounds and injuries, and all other causes.]
[Figure: the same data on a square root scale, with axis labels at 0, 25, 100, 225, 400, 625 and 900.]
Would sparkline help?
A sparkline display is useful to show relative shape, such
as times of peaks.
We see that seasonality is only part of the pattern.
The harsh winter of 1854–5 coincided with some of the
hardest battles of the war, but 1855–6 was quite
different.
But, as often happens, no one graph dominates others here.
[Figure: sparklines of Nightingale's data on mortality in the Crimea, 1854 to 1856, annualised rates per 1000: all other causes (2.5 to 140.1), wounds and injuries (.4 to 115.8), zymotic disease (1.4 to 1022.8).]
crossplot
The scatter plot is the workhorse of statistical graphics.
John McKinley Chambers (1941– )
crossplot
crossplot is designed as a quick-and-easy way to
combine scatter plots.
The basic syntax is crossplot (yvarlist)
(xvarlist) and the idea is to plot every y in
yvarlist against every x in xvarlist.
The use of two varlists gives greater flexibility than does
graph matrix, which produces every possible scatter
plot for a single varlist.
Scatter plot matrices
Scatter plot matrices are great, but they can be excessive.
Their main feature is also a limitation.
p variables mean p² plots all at once, so 10 means 100,
and so forth.
(The half option just controls which plots you see.)
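For comparison, a short sketch with the official command and four variables from the auto dataset (the variables are chosen here for illustration):

```stata
sysuse auto, clear

* 4 variables give a 4 x 4 layout: variable names on the diagonal
* and 12 off-diagonal scatter plots
graph matrix price mpg weight length

* half shows only the lower triangle, so each pair appears once
graph matrix price mpg weight length, half
```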
crossplot design
crossplot was developed in teaching, especially of
regression, with the aim of encouraging focused
comparisons.
Originally (1999) crossplot was called cpyxplot,
cp meaning Cartesian product, but the name was ugly,
cryptic and easily forgotten.
The syntax had to be as simple as possible.
crossplot examples
Versions of a response variable versus a key predictor.
A response variable versus versions of a key predictor.
Each output versus each input.
Principal components versus original variables.
First, let us look at four versions of mpg versus weight in
the auto dataset.
[Figure: mpg, rt_mpg, ln_mpg and rec_mpg each plotted against Weight (lbs.), auto data.]
Next we look at an audiometric dataset used as a
multivariate example in the Stata manuals.
There are 8 response variables, 4 for left ears and 4 for
right ears. Here we just focus on the 16 plots pairing left
and right.
Another graph could be the 4 plots comparing left and right
ears at the same frequency, the diagonal here.
[Figure: 16 scatter plots of left-ear measurements at 500, 1000, 2000 and 4000 against right-ear measurements at the same four frequencies.]
crossplot syntax for examples
crossplot (mpg rt_mpg ln_mpg rec_mpg) weight, combine(imargin(small))
crossplot (lft*) (rght*), jitter(1)
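The transformed versions of mpg must exist before the first call. The talk does not show how they were created, so the definitions below are assumptions (in particular, the scaling 100 / mpg for the reciprocal is a guess based on the axis range shown). The audiometric data are assumed to be those loaded by webuse audiometric.

```stata
sysuse auto, clear

* assumed definitions of the transformed responses
generate rt_mpg  = sqrt(mpg)    // square root
generate ln_mpg  = ln(mpg)      // natural logarithm
generate rec_mpg = 100 / mpg    // reciprocal, scaled by 100 (assumed)

crossplot (mpg rt_mpg ln_mpg rec_mpg) weight, combine(imargin(small))

* left-ear versus right-ear measurements
webuse audiometric, clear
crossplot (lft*) (rght*), jitter(1)
```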
crossplot syntax extras
By default, crossplot is just calling twoway scatter
followed by graph combine.
It follows that recast() is available to recast to twoway
line or twoway connected.
crossplot has an extra sequence() option to label
graphs to ease preparation of graphics for papers
e.g. sequence(a b c d)
combineplot
The greatest value of a picture is when it forces us
to notice what we never expected to see.
John Wilder Tukey (1915–2000)
combineplot
combineplot is a generalisation of crossplot, more
flexible and inevitably more complicated in syntax.
The general problem of combining plots of similar kind
reduces to a loop producing individual plots and a call to
graph combine. That is bound to be a challenge to
beginning users.
The idea is to avoid that by encapsulating the predictable
syntax within one command.
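For reference, the hand-written version of such a loop might look like the sketch below, using only official commands. It is an illustration of the general pattern, not a description of how combineplot works internally; the graph names box_* are arbitrary.

```stata
sysuse auto, clear

* build one box plot per variable, collecting the graph names as we go
local graphs
foreach v of varlist mpg price weight headroom {
    graph box `v', over(rep78) name(box_`v', replace) nodraw
    local graphs `graphs' box_`v'
}

* one call to graph combine puts the pieces together
graph combine `graphs'
```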
combineplot examples
We will look at a series of univariate examples followed by a
series of bivariate examples.
A great variety is possible, as we can loop over user-written
graphics commands as well as official commands.
[Figure: combined plots for auto variables, including Headroom (in.) and Weight (lbs.), each over repair record categories 1 to 5.]
[Figure: four combined plots of auto variables, including Price and Headroom (in.), by Repair Record 1978.]
[Figure: four combined normal quantile plots (axes labelled Inverse Normal) for auto variables including Mileage (mpg), Price and Length (in.).]
[Figure: four combined plots labelled a, b, c and d, involving Price and Mileage (mpg) for Domestic and Foreign cars.]
[Figure: Price (USD) against Mileage (mpg), Weight (lbs.), Length (in.) and Displacement (cu. in.), with Domestic and Foreign cars distinguished and the legend repeated in each panel.]
A digression on sepscatter
The last example used sepscatter, a program
automating separation of data points on a scatter plot by
a categorical variable.
The repetition of the legend needs some kind of fix. In this
and similar examples, the legend could be deleted and
explaining symbols left as a task for the text caption.
sepscatter and
scatter plot matrices
combineplot with sepscatter meets a felt need:
scatter plot matrices with categorisation of data points.
Here is an example with “size” variables from the auto
dataset. The diagonal scatter plots have meaning, yet are
not conventional. But not every graph need be
immediately publishable.
[Figure: matrix of scatter plots of Weight (lbs.), Length (in.) and Displacement (cu. in.), auto data, with categorisation of data points; the diagonal panels plot each variable against itself.]
[Figure: Price (USD) against each of four predictors, each panel annotated with its fitted regression; n = 74 throughout.]
  Mileage (mpg):           price = 11253 - 238.89 mpg            R² = 22.0%   RMSE = 2,623.7
  Weight (lbs.):           price = -6.7074 + 2.0441 weight       R² = 29.0%   RMSE = 2,502.3
  Length (in.):            price = -4584.9 + 57.202 length       R² = 18.6%   RMSE = 2,678.7
  Displacement (cu. in.):  price = 3029 + 15.896 displacement    R² = 24.5%   RMSE = 2,580.6
A digression on aaplot
The last example used aaplot.
aaplot customises automatic annotation of scatter plots
with fitted regressions with text for key results.
Originally, it was written following a request by my Ph.D.
student Alona Armstrong.
Back to combineplot
Some examples of its syntax will make clearer how it works.
First look at a univariate example:
combineplot mpg price weight headroom: graph box @y, over(rep78)
Here we have one varlist and the syntax
@y is a placeholder for the variable name.
Next look at a bivariate example:
combineplot price (mpg weight length displacement): ///
    sepscatter @y @x, ytitle("Price (USD)") sep(foreign)
Here we have two varlists and the syntax elements
@y and @x are placeholders for the variable names.
The two varlists may each contain a single variable and
they may be identical.
When both are presented, the combination is the Cartesian
product of the varlists.
Naturally, you can reach through to control the options of
graph combine as well as those of the particular
graph command used.
Quirk or quick?
The quirky syntax of combineplot might cause some
queasiness.
Some might recall the obsolete for command.
Confident users would (should) be happy to write their own
loops, topped by graph combine, and that is fine too.
The justification for combineplot is just convenience: it
can be quicker than writing your own script.
designplot
Real life is both complicated and short,
and we make no mockery of honest adhockery.
Irving John Good (1916–2009)
designplot
Here more than anywhere arbitrariness of names can bite.
If you have used S or S-Plus or R much, you may have
come across “design plots”.
But as implemented there they do not look much like the
graphs you are going to see. Nor are they plots showing
fitted results; nor do they imply experimental design.
To understand designplot, we need to creep up on it step
by step.
[Figure: designplot of Mileage (mpg): means for (all) and for repair record 1 to 5.]
[Figure: designplot of Mileage (mpg): means for (all); repair record 1 to 5; Domestic and Foreign; and the observed combinations of repair record and car type (1 Domestic to 5 Foreign).]
designplot syntax
Minimal syntax specifies a response first, then one or more
predictors.
The predictors should in practice be categorical, meaning
taking on only a small or moderate number of distinct
levels (“factors”, if you like).
The examples were
designplot mpg rep78
designplot mpg rep78 foreign
designplot default
The statistics shown are means.
Given one, two, … predictors, the means are shown for all
the data, each one-way breakdown, each two-way
breakdown, ….
designplot uses a syntax of way being 0, 1, 2, …
graph dot is the default vehicle.
statsby underpins calculations.
In essence, we can get a multiscale breakdown.
In practice, we might want to restrict what is shown.
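To make the breakdown by way concrete, the same means can be tabulated with official commands. This is only a sketch of what is being summarised, not of designplot's internals:

```stata
sysuse auto, clear

* way 0: the overall mean
summarize mpg, meanonly
display "overall mean of mpg = " %6.3f r(mean)

* way 1: each one-way breakdown
tabstat mpg, by(rep78) statistics(mean n)
tabstat mpg, by(foreign) statistics(mean n)

* way 2: the two-way breakdown
tabulate rep78 foreign, summarize(mpg) means
```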
[Figure: designplot of Mileage (mpg): means for (all), Domestic, Foreign, and repair record 1 to 5.]
Restricting designplot
Here we restricted the scope by
designplot mpg foreign rep78, maxway(1)
Let us look at a different dataset. The response variable for
these data on the Titanic is a binary variable survived,
so its mean is the fraction survived.
We restrict using maxway(2).
[Figure: designplot of survived, Titanic data, with maxway(2): means for (all); crew, first, second and third class; child and adult; female and male; and each two-way combination.]
So we have here:
the overall mean
one-way breakdowns for three predictors
class, adult, male
two-way breakdowns for combinations
class×adult, class×male, adult×male
This kind of graph is for detailed scrutiny, rather than
delivering shock.
Logically similar displays are often used for reporting
opinion poll or electoral results.
That reminds us of…
The structure echoes analysis of variance, used
descriptively.
Similar ideas appear in ANOVA and other literature going
back to J.W. Tukey in 1977.
It also echoes the little used official command grmeanby.
By default, grmeanby also shows means.
(Medians are allowed.)
It allows one-way breakdowns only.
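For the auto data, the official command can be called as follows (a sketch; options used for the graphs in the talk are not known):

```stata
sysuse auto, clear

* means of mpg for each category of rep78 and of foreign
grmeanby rep78 foreign, summarize(mpg)

* the same display using medians
grmeanby rep78 foreign, summarize(mpg) median
```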
[Figure: grmeanby display, Means of survived, by class, adult and male.]
[Figure: grmeanby display, Means of mpg, Mileage (mpg), by rep78 and foreign.]
grmeanby
In these examples, grmeanby shows different means
distinctly, but that is not guaranteed.
Using graph dot as a default within designplot
ensures more readability, although that too has its limits.
designplot can show
other statistics
You can show any summarize result.
In practice, you would only want to plot results sharing the
same units of measurement (including none at all, as
with skewness and kurtosis).
[Figure: designplot of Mileage (mpg) showing min, p25, median, mean, p75 and max for (all), Domestic, Foreign, and repair record 1 to 5.]
More to say…
Although based on graph dot by default, designplot
can be recast to graph bar or graph hbar.
Reference lines in the style of grmeanby can also easily be
added.
Although based on summarizing single variables, what
could be simpler than putting different designplots
side-by-side?
[Figure: designplots side by side: means of Mileage (mpg), Price and Weight (lbs.), and counts, each for (all), Domestic, Foreign, and repair record 1 to 5 and missing. Counts: 74 in all; 52 Domestic, 22 Foreign; 2, 8, 30, 18 and 11 for repair record 1 to 5; 5 missing.]
Is this just a reinvention
of graph dot?
No.
graph dot and its siblings are restricted in offering only
one-way or two-way or three-way breakdowns given,
respectively, one or two or three “factors”.
designplot gives scope for saving results for separate
graphing or tabulation.
subsetplot
To clarify, add detail.
Edward Rolf Tufte (1942– )
Graphing subsets
subsetplot automates an approach discussed in Stata
Journal 10: 670–681 (2010).
The idea is to plot each subset separately, but with the rest
of the data as a backdrop.
We thus combine juxtaposing and superimposing, in the
hope of getting the best of both approaches.
The cost is some redundancy.
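The principle can be sketched by hand with official commands: for each subset, plot the rest of the data in a muted colour and the subset itself on top, then combine. This illustrates the idea only and is not subsetplot's own code; the colours and graph names are arbitrary choices.

```stata
sysuse auto, clear

local graphs
forvalues k = 1/5 {
    * backdrop: all other repair records in grey; foreground: record k
    twoway (scatter mpg weight if rep78 != `k' & !missing(rep78), mcolor(gs12)) ///
           (scatter mpg weight if rep78 == `k', mcolor(black)),                 ///
           legend(off) ytitle("Mileage (mpg)") subtitle("`k'")                  ///
           name(sub`k', replace) nodraw
    local graphs `graphs' sub`k'
}

* juxtapose the panels on common scales
graph combine `graphs', xcommon ycommon
```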
Superimpose or juxtapose?
The principle of superimposing subsets is easy to
understand.
The question is whether it really works in practice.
With even say 5 subsets, mentally extracting each subset
and comparing with the others can be hard work.
Consider a conventional grouped scatter plot and a
subsetplot alternative in our final fling with the auto
data.
[Figure: scatter plot of mpg against Weight (lbs.), auto data, with each point shown as its rep78 value (1 to 5).]
The previous graph can be got with
sepscatter mpg weight, sep(rep78) mylabel(rep78)
The next graph can be got with
subsetplot scatter mpg weight, by(rep78)
[Figure: subsetplot of Mileage (mpg) against Weight (lbs.), one panel for each repair record 1 to 5, with the rest of the data as backdrop.]
With an ordered (Likert) scale such as repair record rep78,
self-describing marker labels can be natural and
effective.
[Figure: the same subsetplot of Mileage (mpg) against Weight (lbs.) for repair record 1 to 5, with each highlighted point shown as its rep78 value.]
The Grunfeld data again
This approach is especially suitable as another way to tackle
the spaghetti problem of plotting multiple time series.
Here are the invest data for different companies.
If the plot seems excessively simple, then there are some
bells and whistles for adding key extras.
[Figure: invest against year (1935 to 1955) on a logarithmic scale, one panel for each of companies 1 to 10, with the other companies as backdrop.]
[Figure: the same display with the first and last values of each series labelled.]
How far can we go?
The Grunfeld data are perhaps at the trivial end of this
problem.
For a stiffer challenge, here are some data for the 28
countries of the European Union on long-term
unemployment.
As often, a graph can be valuable in suggesting what else to
plot….
[Figure: Long-term unemployment (%), 1992 to 2012, one panel for each of 28 countries with the others as backdrop: Spain, Slovenia, Lithuania, Ireland, Portugal, Greece, Estonia, Bulgaria, Romania, Croatia, Slovakia, Hungary, Germany, Luxembourg, Italy, Czech Republic, Cyprus, Poland, Denmark, Latvia, Malta, France, Belgium, United Kingdom, Finland, Netherlands, Sweden, Austria.]
The main players again were
stripplot
sparkline
crossplot
combineplot
designplot
subsetplot
Our attraction to images as a source of understanding
is both primal and pervasive.
Stephen Jay Gould (1941–2002)