Basic Plots

Basic Plots
Univariate, bivariate, multivariate
Histogram, boxplot, dotplot, barchart, spine plot
Scatterplot, density plots, mosaic plot
Parallel coordinate plot, profile plots
Maps
Time series plots
Data and its shapes
Data comes in a lot of different formats.
We will assume that we can always get it
into a spreadsheet-like shape with
headers at the top,
columns for each piece of information,
and rows for each object.
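As a minimal sketch of that spreadsheet-like shape, the snippet below reads a small table with Python's standard `csv` module. The column names `total_bill` and `tip` and the helper `read_table` are assumptions made for illustration; the four rows are the (Total Bill, Tip) pairs shown on the scatterplot slide.

```python
import csv
import io

# Hypothetical spreadsheet-like text: a header line, then one row per object.
# Column names are assumed; the values are the four pairs shown in the slides.
RAW = """total_bill,tip
16.99,1.01
10.34,1.66
21.01,3.5
23.68,3.31
"""

def read_table(text):
    """Return a list of rows, each a dict keyed by the column headers."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = read_table(RAW)
tips = [row["tip"] for row in rows]  # one column = one piece of information
print(len(rows), "rows; tip column:", tips)
```

Each dict is one object (a restaurant bill) and each key is one column, which is the layout the plots in the rest of the slides assume.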
Univariate
A dotplot is used for real-valued
variables.
A dot is positioned along an axis to
represent the data value.
[Figure: dotplot of % capitals (axis 0 to 15), with sample %Cap values 14, 12.8, 13, 1.3, 13, 10.]
Univariate
Histograms are used for real-valued variables.
Values are binned and the count is displayed by a rectangle.
[Figure: two histograms of Tips ($, 0 to 10) against Frequency, one with breaks at $1 and one with breaks at 10c.]
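The binning behind a histogram can be sketched in a few lines. This is an illustration under stated assumptions, not the code used to draw the slides: `bin_counts` is a hypothetical helper, the bin width is given in cents so that breaks at $1 and at 10c both use integer arithmetic, and the four tips are only the values visible in the slides, not the full data set.

```python
from collections import Counter

def bin_counts(values_dollars, width_cents):
    """Count values per bin; keys are the left edge of each bin, in cents."""
    counts = Counter()
    for value in values_dollars:
        cents = round(value * 100)  # integer cents avoid float edge cases
        left_edge = (cents // width_cents) * width_cents
        counts[left_edge] += 1
    return dict(sorted(counts.items()))

tips = [1.01, 1.66, 3.5, 3.31]  # four tips shown in the slides (not the full data)

dollar_bins = bin_counts(tips, 100)  # breaks at $1
dime_bins = bin_counts(tips, 10)     # breaks at 10c
print(dollar_bins)
print(dime_bins)
```

Each key/count pair corresponds to one rectangle: the key gives its position on the axis and the count gives its height. Changing the bin width changes how much detail the histogram shows.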
Univariate
Boxplots are used for real-valued variables.
The data values are summarized by 5 numbers: min, Q1, median, Q3, max.
The boxplot displays just these 5 numbers.
[Figure: boxplot of Tips with Min = 1, Q1 = 2, Median = 2.9, Q3 = 3.6, Max = 10.]
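A rough sketch of the five-number summary, using `statistics.quantiles` from the Python standard library. The nine values are hypothetical, chosen only so that the summary matches the numbers on the slide; they are not the tips data. Quartile conventions differ between packages, and the "inclusive" method used here is one assumption among several.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical sample, made up so the summary matches the slide's boxplot.
sample = [1.0, 1.5, 2.0, 2.5, 2.9, 3.0, 3.6, 5.0, 10.0]

summary = five_number_summary(sample)
print(summary)
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers reach toward the min and max.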
Univariate
Barcharts are used for categorical data.
The count for each category is represented by the height of a rectangle.
[Figure: barchart of gender of person paying the bill, with counts F = 87 and M = 157.]
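Counting the categories is all the computation a barchart needs. The sketch below rebuilds a column of category labels from the counts on the slide (F = 87, M = 157) and tallies it with `collections.Counter`; the text rendering with one `#` per 10 people is purely an illustrative choice.

```python
from collections import Counter

# Rebuild a categorical column from the counts shown on the slide.
gender = ["F"] * 87 + ["M"] * 157

counts = Counter(gender)  # category -> count, i.e. the bar heights

def render_bar(count, per_symbol=10):
    """Draw a crude text bar: one '#' for roughly every `per_symbol` observations."""
    return "#" * round(count / per_symbol)

for category in sorted(counts):
    print(category, counts[category], render_bar(counts[category]))
```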
Univariate
Spine plots are used for
categorical variables
The count for each
category is represented
by the width of a
rectangle
Most useful when there are two or more variables.
[Figure: spine plot of Gender, with widths proportional to the counts F = 87 and M = 157.]
Bivariate
Scatterplots place a dot representing a pair of numbers on a Cartesian plane.
Total Bill   Tip
16.99        1.01
10.34        1.66
21.01        3.5
23.68        3.31
...          ...
[Figure: scatterplot of Tip (0 to 10) against Total Bill (0 to 50), r = 0.68.]
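The r reported on the scatterplot is a correlation coefficient. Assuming it is the usual Pearson correlation, it can be sketched as below. The function `pearson_r` is written out by hand for illustration, and it is applied only to the four pairs visible in the slides, so it will not reproduce r = 0.68, which comes from the full data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The four (Total Bill, Tip) pairs shown in the slides; the rest is elided there.
total_bill = [16.99, 10.34, 21.01, 23.68]
tip = [1.01, 1.66, 3.5, 3.31]

r = pearson_r(total_bill, tip)
print(round(r, 2))
```

A positive r matches what the scatterplot shows: larger bills tend to come with larger tips.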
Bivariate
Density plots: hexagonal grids (Carr)
[Figure: hexagonally binned density plot of Tip against Total Bill, with hexagons shaded by Counts.]
Bivariate
A mosaic plot represents a two-way table of categorical variables.
It starts from a spine plot and divides the bars according to counts of a second variable.
Gender   Thu   Fri   Sat   Sun
F         32     9    28    18
M         30    10    59    58
[Figure: mosaic plot of Gender by Day.]
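The geometry of a mosaic plot follows directly from the two-way table. In this sketch the spine (bar widths) is taken to be Gender and each bar is then divided by Day; that ordering is an assumption, since the table alone does not say which variable is split first. `mosaic_tiles` is a hypothetical helper, and the counts are the ones in the table above.

```python
# Two-way table of counts from the slide: Gender by Day.
table = {
    "F": {"Thu": 32, "Fri": 9, "Sat": 28, "Sun": 18},
    "M": {"Thu": 30, "Fri": 10, "Sat": 59, "Sun": 58},
}

def mosaic_tiles(two_way):
    """Return {(outer, inner): (width, height)} as fractions of the plot area.

    Width is the share of the outer category (the spine plot);
    height is the share of the inner category within that bar.
    """
    grand_total = sum(sum(row.values()) for row in two_way.values())
    tiles = {}
    for outer, row in two_way.items():
        row_total = sum(row.values())
        width = row_total / grand_total
        for inner, count in row.items():
            tiles[(outer, inner)] = (width, count / row_total)
    return tiles

tiles = mosaic_tiles(table)
for key, (width, height) in tiles.items():
    print(key, round(width, 3), round(height, 3))
```

The area of each tile, width times height, equals that cell's share of the grand total, which is why a mosaic plot can be read as a picture of the table.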
Barcharts with two variables
Stacked barchart
Side-by-side barchart
[Figure: stacked and side-by-side barcharts of counts by Day (Thu, Fri, Sat, Sun), split into Male and Female.]
Multivariate
A parallel coordinate plot changes from orthogonal Cartesian axes to parallel axes.
Bill    Tip
16.99   1.01
10.34   1.66
21.01   3.5
23.68   3.31
...     ...
[Figure: parallel coordinate plots of Standardized Values against the Variables Bill and Tip.]
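The move from Cartesian to parallel axes can be sketched as follows. Each variable is standardized so that all axes share one vertical scale, and each row of the table becomes a polyline with one point per axis. Standardizing to z-scores (subtract the mean, divide by the sample standard deviation) is an assumption here; the slides do not say which standardization was used. Only the four rows visible in the slides are included.

```python
from statistics import mean, stdev

def standardize(column):
    """Convert a column to z-scores: zero mean, unit sample standard deviation."""
    centre, spread = mean(column), stdev(column)
    return [(value - centre) / spread for value in column]

# The four rows shown in the slides; the remaining rows are elided there.
columns = {
    "Bill": [16.99, 10.34, 21.01, 23.68],
    "Tip": [1.01, 1.66, 3.5, 3.31],
}

z = {name: standardize(values) for name, values in columns.items()}
names = list(columns)
n_rows = len(columns["Bill"])

# One polyline per row: (axis position, standardized value) for each variable.
polylines = [
    [(axis, z[name][i]) for axis, name in enumerate(names)]
    for i in range(n_rows)
]
print(polylines[0])
```

Lines that run roughly parallel between two axes suggest a positive association, while lines that cross suggest a negative one.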
Parallel coordinate plots
Look for patterns in the direction of the lines.
[Figure: parallel coordinate plot of Standardized Values for the Variables Bill, Tip and Gender.]
Parallel coordinate plots
Measurements on Beetles
What patterns do you see here?
[Figure: parallel coordinate plot of Standardized Values for the beetle Variables tars1, tars2, head, aede1, aede2, aede3.]
Maps
Convention: north at top
The problems with taking longitude at its numerical value
Aspect ratio of latitude to longitude
Small regions/areas and reading information
Longitude at numerical value and aspect ratio of longitude to latitude
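Can you imagine what the world would look like if the vertical and horizontal plot space were equal?
This location is both -180 and 180.
[Figure: world map with longitude (-150 to 150) on the horizontal axis and latitude (-50 to 50) on the vertical axis; an annotation marks the location that is both -180 and 180.]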
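Both map problems can be made concrete with a little arithmetic. The sketch below assumes a spherical Earth, on which one degree of longitude covers cos(latitude) times the ground distance of one degree of latitude. The helpers `degree_aspect` and `wrap_longitude` are hypothetical names used only for illustration.

```python
import math

def degree_aspect(latitude_deg):
    """Ground height of 1 degree of latitude divided by ground width of
    1 degree of longitude at the given latitude (spherical approximation)."""
    return 1.0 / math.cos(math.radians(latitude_deg))

def wrap_longitude(longitude_deg):
    """Map any longitude onto the interval [-180, 180)."""
    return (longitude_deg + 180.0) % 360.0 - 180.0

for latitude in (0, 30, 60):
    print(latitude, round(degree_aspect(latitude), 3))

# -180 and 180 name the same meridian, so both wrap to one value.
print(wrap_longitude(180.0), wrap_longitude(-180.0), wrap_longitude(190.0))
```

Plotting raw longitude against raw latitude with equal units therefore stretches regions far from the equator horizontally, and a numerical longitude axis cuts the map at the meridian where -180 meets 180.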
Small areas/regions
2004 election results on map: red = Republican, blue = Democrat
Cartogram of 2004 election results
http://www-personal.umich.edu/~mejn/election/
Time series plots
Lines indicate temporal dependency.
Temporal scale: days of the week need to be in conventional order, lines, ...
What’s wrong with this plot?
[Figure: Count (0 to 80) by Day, with the days in the order Fri, Sat, Sun, Thu.]
[Figure: average number of sunspots (0 to 150) by Year, 1750 to 1990.]
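One way to keep days of the week in conventional order is to sort by an explicit ranking rather than by name. This is a small sketch under assumptions: the week is taken to start on Monday, `in_week_order` is a hypothetical helper, and the counts per day are the column totals of the Gender by Day table from the mosaic plot slide.

```python
# Conventional order of the days; starting the week on Monday is an assumption.
DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def in_week_order(counts_by_day):
    """Return (day, count) pairs sorted by position in the week, not by name."""
    rank = {day: position for position, day in enumerate(DAY_ORDER)}
    return sorted(counts_by_day.items(), key=lambda item: rank[item[0]])

# Column totals of the Gender by Day table (F + M for each day).
counts = {"Fri": 19, "Sat": 87, "Sun": 76, "Thu": 62}

alphabetical = sorted(counts)      # the order a default sort by name produces
ordered = in_week_order(counts)    # the order a time axis should use
print(alphabetical)
print(ordered)
```

Sorting by name puts Thu last, which breaks the temporal scale; the explicit ranking restores Thu, Fri, Sat, Sun.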
Combinations
Small multiples is an approach advocated by Tufte for plotting multiple variables in a digestible way. It can be considered a combination of basic plots.
[Figure: Beetles, a scatterplot matrix of tars1, tars2, head, aede1, aede2, aede3.]
Dotplots
Barley yield for two different years for 6 locations in Minnesota and 10 varieties.
[Figure: dotplots of Barley Yield (bushels/acre, 20 to 60) for 1931 and 1932, in six panels (Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids), each listing the varieties Trebi, Wisconsin No. 38, No. 457, Glabron, Peatland, Velvet, No. 475, Manchuria, No. 462, Svansota.]
Boxplots or Dotplots
Different representations of the heights of the New York Choral Society.
[Figure: boxplots and dotplots of Height (inches, 60 to 75) by voice part: Soprano 1, Soprano 2, Alto 1, Alto 2, Tenor 1, Tenor 2, Bass 1, Bass 2.]
Basic plots form the core
How does Napoleon’s March use basic plots? Time series plot + map + barchart
How is John Snow’s cholera map related to a basic plot? Map + scatterplot
[Figure: Napoleon’s March. Basic plot(s)?]
[Figure: Cholera map]
…And on a Global Scale
AIDS in the NY Times
In many countries, reliable statistics on AIDS are hard to obtain. Unaids, the United Nations AIDS agency, relies on these numbers. Each square on the grid represents 2,500 people with AIDS.
[Figure: world map drawn as a grid in which each dot represents 2,500 people living with AIDS, with countries labelled and grouped into six regions: Highly Industrialized Countries, Latin America and the Caribbean, North Africa and the Middle East, Eastern Europe and Central Asia, Southern and Eastern Asia, Sub-Saharan Africa.]
NEW CASES: The estimated number of new H.I.V./AIDS cases in highly industrialized countries has decreased slightly since the 1980’s but has continued growing in sub-Saharan Africa.
LIVING WITH AIDS: The estimated number of people living with H.I.V./AIDS has exploded in sub-Saharan Africa while staying relatively level in highly industrialized countries.
[Figures: time series by region, ’80 to ’00*, of new cases (0 to 4 million) and of people living with AIDS (0 to 25 million).]
* Preliminary numbers
Source: UNAIDS