talk on visualization

Data Visualization
© J.F. Campbell UM St. Louis 2015
http://www.nytimes.com/interactive/2009/07/31/business/20080801-metrics-graphic.html?_r=1&
Overview
1. Why use visualization?
2. Types of visualizations.
3. Design guidelines.
4. Infographics.
5. Tableau example.
Visualize This http://www.youtube.com/watch?v=mkEXx7sDXAI#t=69
Much of this is drawn from materials at the Duke University Library
Data Visualization site: http://guides.library.duke.edu/datavis/
Why not Statistics?
• Consider the following four sets of 11 (x,y) coordinates:

    Set 1           Set 2           Set 3           Set 4
   x       y       x       y       x       y       x       y
  10    8.04      10    9.14      10    7.46       8    6.58
   8    6.95       8    8.14       8    6.77       8    5.76
  13    7.58      13    8.74      13   12.74       8    7.71
   9    8.81       9    8.77       9    7.11       8    8.84
  11    8.33      11    9.26      11    7.81       8    8.47
  14    9.96      14    8.10      14    8.84       8    7.04
   6    7.24       6    6.13       6    6.08       8    5.25
   4    4.26       4    3.10       4    5.39      19   12.50
  12   10.84      12    9.13      12    8.15       8    5.56
   7    4.82       7    7.26       7    6.42       8    7.91
   5    5.68       5    4.74       5    5.73       8    6.89

• Are they similar?
Summary Statistics
For the same four data sets:

                   Set 1    Set 2    Set 3    Set 4
Mean of x:          9.00     9.00     9.00     9.00
Variance of x:     11.00    11.00    11.00    11.00
Correlation:       0.816    0.816    0.816    0.816

Linear regression line: y = 3.00 + 0.500x for all 4!
• Statistically, they seem pretty similar…
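
The summary statistics can be checked with a few lines of code. What follows is a minimal sketch in Python using only the standard library. The names `QUARTET` and `summarize` are illustrative and not from the talk, and the variance is assumed to be the sample variance (n - 1 denominator), which is what reproduces the 11.00 shown.

```python
from statistics import mean, variance

# The four data sets from the slides; sets 1-3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return mean and sample variance of x, correlation, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)                    # sum of squares of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross products
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for label, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"Set {label}: mean={s['mean_x']:.2f} var={s['var_x']:.2f} "
          f"corr={s['corr']:.3f} line=y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

All four sets print essentially the same line, which is the point of the slide: the numbers alone cannot tell the sets apart.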
Similar?
[Figure: scatter plots of data sets 1 to 4 (y versus x), one panel per set.]
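
Plotting the points is what reveals the differences between the sets. As a rough sketch that needs no plotting library, the helper below draws a text scatter plot. The function name `ascii_scatter`, the grid size, and the axis ranges are assumptions chosen for illustration; a real chart would normally be drawn with a charting tool.

```python
def ascii_scatter(xs, ys, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Return a text scatter plot; fixed axis ranges keep panels comparable."""
    grid = [[" "] * width for _ in range(height)]
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y in zip(xs, ys):
        col = round((x - x0) / (x1 - x0) * (width - 1))
        row = round((y - y0) / (y1 - y0) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(r) for r in grid)
    return body + "\n+" + "-" * width


# Data sets 2 and 4 from the slides.
x2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

print("Set 2")
print(ascii_scatter(x2, y2))
print("Set 4")
print(ascii_scatter(x4, y4))
```

Even at this coarse resolution, set 2 traces a curve while set 4 is a vertical stack of points plus one outlier, although their summary statistics match.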
Why Visualization?
• To discover new things about your data.
– The most effective way for humans to understand
complex data (and large amounts of data) is visually!
• To tell a story using data.
• To provoke and answer questions.
• To facilitate analysis.
• To better communicate.
• Visualization leverages human visual
capabilities for data analysis.
The Great One
http://dataremixed.com/2011/08/tribute-to-the-great-one/
Stages
1. Identify the topic of interest and relevant
questions.
2. Obtain useful and relevant data.
3. Explore the data to identify interesting
relationships:
 Look for trends, patterns and differences across
categories, space and time.
4. Represent the data (maps, charts, etc.).
5. Refine the presentation with your audience in
mind.
6. Provide tools to manipulate or interact with
the data.
Types of Visualizations
1. 2D and Planar (geospatial):
a. Types: Choropleth, Cartogram…
b. Use a map to show where
something is.
c. Maps are best combined with other
charts to provide details on what
the map shows.
2. Temporal: For changes over
time.
a. Time series or line chart.
b. Stream graph.
c. Polar chart.
Temporal Charts
http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html#
Types of Visualizations
1. Sankey diagram: map flows.
2. Histogram or bar chart.
Types of Visualizations
3. Bubble chart.
With motion: http://www.logeeka.com/motion_chart.html
Types of Visualizations
4. Tree maps and hierarchical charts.
Types of Visualizations
5. Networks.
Vaccine game: http://vax.herokuapp.com/game
6. Radar chart.
http://worldshap.in/
Baseball Visualizations
Spray charts for Jason Heyward
http://www.fangraphs.com/
[Figure: two pitch-location heat maps showing the percentage of pitches in each grid cell, relative to the batter.]
Pitch to RH batter with 0-2 count: strike=46.0%
Wainwright’s 1st pitch to RH batter: strike=67.4%
Visualizing Wind
Live: http://hint.fm/wind/
Design
• Design is not just what it looks like and feels
like. Design is how it works.
– Steve Jobs, 2003
Aesthetics (rows) versus Clarity (columns):

              Confusing    Clear
Beautiful         ?         Yes
Ugly              No         ?
Design
From http://vizwiz.blogspot.com/2012/04/nielsens-advertising-audiences-report.html
Visualization Design Guidelines
• The visualization must have a purpose!
– All elements should work together to achieve the purpose.
– What questions can or does it answer?
– What questions should it answer?
• Be simple and succinct.
– Show the main points – do not make the audience try to figure it out.
– Do not present too much information! (Limit a dashboard to 2-4 elements/views).
• Any interactivity should be obvious to the viewer.
Visualization Design Guidelines
• Many visualizations combine several elements
(views, charts, etc.) in a “dashboard”.
• Place the most important view at the top, or top
left.
• Be sure the legends are associated with the
correct view.
– Position legends to the right of the view, if possible.
• If elements are linked interactively, arrange
them top to bottom and left to right, with the
linking and filtering starting at the top.
Choosing a Good Chart
http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html
Design Guidelines: Charts
• Put the most important data on the rows and
columns (x and y axes);
• Use color and size for less important attributes.
• Bar charts are usually better than pie charts:
– Areas in pie charts are difficult to estimate, and the eye
can compare only adjacent slices.
– Put labels on the bars.
• Do not use 3D charts.
• Make sure all axes are understandable.
– Axis scales must be consistent.
• With line charts, limit the number of lines and
highlight the most important line(s).
Line Charts #1
• Keep it simple!
• Label the lines, instead
of using a legend.
Line Charts #2
• Highlight what is important.
• Is the baseline 0?
Line Charts #3
• Elevate the axis if the baseline is not 0.
• Use a good aspect ratio.
Bar Charts
• Use horizontal bar charts, rather than vertical
bar charts.
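
As a small illustration of a horizontal bar chart with labels on the bars, the sketch below renders one as text. The function name `horizontal_bars` is hypothetical, sorting the bars from largest to smallest is an added design choice rather than a guideline from the talk, and the counts are the first few Alaska 1910 records from the baby-names data shown later in the deck.

```python
def horizontal_bars(data, width=30):
    """Render label/value pairs as horizontal bars, largest first.

    Every bar starts at zero and carries its value as a label at its end.
    """
    top = max(data.values())
    label_w = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(value / top * width)  # bar length proportional to value
        lines.append(f"{label:>{label_w}} | {bar} {value}")
    return "\n".join(lines)


counts = {"Mary": 14, "Annie": 12, "Anna": 10, "Margaret": 8, "Helen": 7}
print(horizontal_bars(counts))
```

Horizontal bars leave room for full category names on the left, so no label has to be rotated or abbreviated.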
Tables?
From http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0000Jr
Color
• Color is important! Choose colors intelligently.
– Use at most 6 colors.
– Use no more than two color palettes.
– Use meaningful colors (pink/blue; red/green, etc.),
but be aware that color meanings are culturally dependent.
• Avoid multiple schemes.
Some colors do not
work well together!!
Color
• Vary the saturation level (lightness), not the
hue (color).
• Consider that your visualization may be
printed in black and white.
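
One way to act on both points is to build a palette from a single hue and check how it would look in gray. The sketch below uses Python's standard `colorsys` module and varies the lightness component of the HLS color model, following the slide's parenthetical. The function names, the default lightness range, and the use of Rec. 601 luma weights as a stand-in for black-and-white printing are all assumptions made for illustration.

```python
import colorsys


def single_hue_palette(hue, n, lightest=0.85, darkest=0.35, saturation=0.6):
    """Return n RGB triples (floats in 0-1) of one hue, ordered light to dark."""
    if n < 2:
        raise ValueError("need at least two colors")
    step = (lightest - darkest) / (n - 1)
    return [colorsys.hls_to_rgb(hue, lightest - i * step, saturation)
            for i in range(n)]


def gray_level(rgb):
    """Approximate gray level of a color (Rec. 601 luma weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_hex(rgb):
    """Format an RGB triple as a #rrggbb string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


palette = single_hue_palette(hue=0.6, n=5)  # hue 0.6 is a blue
for color in palette:
    print(to_hex(color), f"gray={gray_level(color):.2f}")
```

Because only the lightness changes, the gray levels fall steadily from the first color to the last, so the ordering survives a black-and-white printout. A palette that varies hue at constant lightness gives no such guarantee.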
Color Can Be Deceiving…
Which square is darker – A or B?
Which is darker – A, B or C?
More Colors
Which dog is bluer?
How many colors are
in this?
100 Points
What do you see here?
Most points are blue, one is red and
four are green.
The points are spread out “evenly”
over the space.
What do you see here?
Differences are more difficult to
distinguish with symbols alone.
100 Points Again…
What do you see here?
Most points are blue, one is red and
some are green.
Some are squares, but most are dots;
one is a +.
The points are spread out “evenly”
over the space.
You may not notice that one point is very unusual: it has both an uncommon color and an uncommon shape (the green
square).
Combining color and shape does not work well!
Fonts
• Use only a few fonts:
– Verdana or Trebuchet for numbers.
– Arial, Georgia, Tahoma, Times New Roman,
Lucida Sans.
• Use a few appropriate font sizes.
• Change adjacent fonts by only one
attribute (bold or underline, not both):
– A good change
– A Bad change
Infographics
• A common type of visualization specific to
a particular context.
• Usually created for a single dataset for a
particular purpose.
• Not designed for the user to explore the
data.
• Most view infographics as a
type of visualization; but
some see it the opposite way.
Infographic 1
Infographic 2
Infographic 3
Summary
• Use the real estate wisely.
• Show the main points – do not make the
audience try to figure it out.
• Do not present too much information!
• Do the squint test:
– What stands out? What do you see?
• Show it to someone else and ask what they see.
• Include the source of the data.
Basic Information
• A great site for visualization basics.
http://guides.library.duke.edu/datavis/
• A great site for Tableau information.
http://guides.library.duke.edu/tableau
• More design guidance…
http://www.youtube.com/watch?v=pD_OvRtH0aY
Baby Names in Tableau
• Consider the top baby name in each US state for
each year…
http://www.tableau.com/public/BabyNamesTraining
• What to call on 4th down?
http://datographer.blogspot.com/2014/03/fourth-down.html
Data for Baby Names in Tableau
• Original Data:
• Every baby name used at least 5 times, by state and by year since 1910.
• State, Gender, Year, Name, # of occurrences
• From this, extract the top male and female name for each state for each year.

Sample of the original data:
AK,F,1910,Mary,14
AK,F,1910,Annie,12
AK,F,1910,Anna,10
AK,F,1910,Margaret,8
AK,F,1910,Helen,7
AK,F,1910,Elsie,6
AK,F,1910,Lucy,6
AK,F,1910,Dorothy,5
AK,F,1911,Mary,12
AK,F,1911,Margaret,7
AK,F,1911,Ruth,7
AK,F,1911,Annie,6
AK,F,1911,Elizabeth,6
AK,F,1911,Helen,6
AK,F,1912,Mary,9
AK,F,1912,Elsie,8
AK,F,1912,Agnes,7
AK,F,1912,Anna,7
AK,F,1912,Helen,7
AK,F,1912,Louise,7
AK,F,1912,Jean,6
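
The extraction of the top name for each state, gender, and year can be sketched in a few lines. This is not how the Tableau example was necessarily prepared; it is a minimal Python illustration using the standard `csv` module. `SAMPLE` and `top_names` are hypothetical names, and ties are assumed to go to the first record encountered.

```python
import csv
from io import StringIO

# A few records in the State,Gender,Year,Name,Count layout shown on the slide.
SAMPLE = """\
AK,F,1910,Mary,14
AK,F,1910,Annie,12
AK,F,1910,Anna,10
AK,F,1911,Mary,12
AK,F,1911,Margaret,7
AK,F,1912,Mary,9
AK,F,1912,Elsie,8
"""


def top_names(lines):
    """Map (state, gender, year) to the (name, count) record with the highest count."""
    best = {}
    for row in csv.reader(lines):
        if not row:          # skip blank lines
            continue
        state, gender, year, name, count = row
        key = (state, gender, int(year))
        count = int(count)
        # Keep the first record seen when counts tie.
        if key not in best or count > best[key][1]:
            best[key] = (name, count)
    return best


for key, (name, count) in sorted(top_names(StringIO(SAMPLE)).items()):
    print(*key, name, count)
```

For the full file, the same function can be given an open file object instead of the `StringIO` wrapper.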
Tableau Dashboard
[Figure: Tableau dashboard with YEAR and Gender controls; its annotated views are listed below.]
• Number of different top male (blue) and female (pink) names in the 50 states since 1910.
• Top name in each state for chosen year.
• Frequency of name (for top names).
• Trend of name as the top name in states over time.
http://www.tableau.com/public/BabyNamesTraining