Graphics in Excel
stat 480
Heike Hofmann

Outline
• Pivot Tables, again
• Pivot Table Graphics
• Grouping
• Concepts behind Graphics
Your Turn
• Use the file nba-data-graphics.xls
• Create pivot tables to answer the following questions:
• What is the average salary by position? (How many players is this based on? What is the standard deviation?)
• Do salaries vary by team?
• How many minutes do players play per season? Does position affect this? Is there a difference over time?
• ... identify one additional question of interest that you can answer with the help of a pivot table
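The first question asks for three summaries of salary per position: the average, the number of players and the standard deviation. As a rough cross-check outside Excel, here is a minimal Python sketch of that computation. The field names POSITION and SALARY and the sample records are assumptions for illustration, not values from nba-data-graphics.xls. It uses the sample standard deviation (n - 1 in the denominator), which we believe corresponds to the pivot table's StdDev setting rather than StdDevp.

```python
from collections import defaultdict
from statistics import mean, stdev


def pivot_summary(rows, group_key, value_key):
    """Average, count and sample standard deviation of value_key per group_key.

    Roughly what a pivot table shows with group_key in the Row Labels and
    value_key summarized three times (Average, Count, StdDev).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    summary = {}
    for group, values in sorted(groups.items()):
        # The sample standard deviation is undefined for a single record
        sd = stdev(values) if len(values) > 1 else None
        summary[group] = {"average": mean(values), "count": len(values), "stdev": sd}
    return summary


# Hypothetical records (field names are assumptions)
players = [
    {"POSITION": "C", "SALARY": 3_000_000},
    {"POSITION": "C", "SALARY": 5_000_000},
    {"POSITION": "PG", "SALARY": 1_000_000},
    {"POSITION": "PG", "SALARY": 2_000_000},
    {"POSITION": "PG", "SALARY": 3_000_000},
]

for position, stats in pivot_summary(players, "POSITION", "SALARY").items():
    print(position, stats)
```

Reporting the count next to the average makes it obvious when an average rests on only a handful of players.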
Grouping
• Useful for simplifying variables with many values
• In Excel, this is called grouping
• Group ages into decades, heights into reasonable ranges, dates into months
• How? Right-click into the pivot table row labels and choose “Group” to open the dialog window
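Grouping a numeric variable amounts to assigning each value to a fixed-width bin. The sketch below mimics the idea behind Excel's Group dialog, which asks for a starting value, an ending value and a bin width. The function name group_label, the half-open bins [lower, lower + by) and the label format are our own assumptions; Excel's exact labels may differ.

```python
def group_label(value, start, end, by):
    """Return a bin label such as '20-30' for value, using bins of width `by`.

    Bins are half-open: [lower, lower + by). Values below `start` or at/above
    `end` get catch-all labels. The label format is an illustrative assumption.
    """
    if value < start:
        return f"<{start}"
    if value >= end:
        return f">={end}"
    # Lower edge of the bin that contains value
    lower = start + ((value - start) // by) * by
    return f"{lower}-{lower + by}"


# Hypothetical ages grouped into decades
ages = [19, 23, 31, 35, 40, 42]
print([group_label(a, 20, 50, 10) for a in ages])
```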
Pivot table hints
• Choose the variable with the most
categories and put it in the rows
• Put the variable with the next most in the
columns
• More than two variables is tricky: play around, keeping in mind which comparison you are interested in
Field settings
• Change summary type (average, sum,
standard deviation)
• Formatting
• Special calculations
Missing data
• Excel is inconsistent: a blank is sometimes treated as missing, sometimes it produces an error
• What happens when a blank is in a
mathematical calculation?
• What happens when a blank is in a formula?
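One way to see why blanks matter: a blank that is skipped and a blank that is counted as 0 give different averages. As far as we know, Excel's AVERAGE function skips empty cells, whereas a blank cell referenced in arithmetic such as =A1+B1 is treated as 0. The Python sketch below contrasts the two conventions; None stands in for a blank cell and the data are made up.

```python
from statistics import mean

# Hypothetical minutes per game; None stands in for a blank cell
minutes = [30, None, 20, 10]


def average_ignoring_blanks(values):
    """Skip blanks entirely (the convention we believe AVERAGE uses)."""
    present = [v for v in values if v is not None]
    return mean(present)


def average_blanks_as_zero(values):
    """Count blanks as 0 (what happens when blanks enter plain arithmetic)."""
    return sum(0 if v is None else v for v in values) / len(values)


print(average_ignoring_blanks(minutes))
print(average_blanks_as_zero(minutes))
```

The two results differ by a quarter here, so it pays to check which convention a given calculation uses.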
Pivot Chart Graphics
• probably the most useful for us
• line charts, bar charts, scatterplots
• Excel often produces ugly graphics by default
• Bad graphics devalue your work
Take home message: change Excel defaults when you use it!
Clean up
• Remove the background, or at least dim it
• Adjust axis scales
• Make grid lines pale
• Use muted colors
• Check font size
Graphics need Captions
Every chart needs a caption that covers the following two items:
- What is shown? (e.g. Figure 3: Scatterplot of players’ salary by number of minutes played.)
- What do you want me to see? (e.g. There is a weak positive relationship, indicating that as the number of minutes played increases, salary tends to increase on average.)
Graphics in write-ups
Refer to each chart in the relevant section of the write-up, e.g.: “As can be seen in the scatterplot in Figure 3, there is a positive relationship between …”
Salary and Minutes Played
[Figure: scatterplot of salary by minutes played, with a few points annotated: “These players look interesting … we should investigate some more”]
Further Investigation
Average of SALARY (Column Labels: position)
Row Labels              C           PF           PG           SF           SG
5-10           2317146.50    959111.00   1000000.00    457588.00   5000000.00
10-15          2483701.66   2396208.09   1669428.20   2131741.00   1807900.25
15-20          3534607.02   3387467.15   2022268.56   2880711.42   2625292.02
20-25          5743610.46   4375802.30   2629028.46   3658961.48   3658098.82
25-30          7481716.45   6506591.77   4896735.21   5084835.59   4718771.60
30-35          9649307.38  10731034.41   7000824.59   7691506.81   7753217.35
35-40         11609103.76  11788201.54   7913678.18  11115608.91  10465867.59
40-45         only two cells filled: 11000000.00 and 9500000.00
Grand Total    6405013.76   6070437.54   4669557.64   5636037.39   5157925.83
Double-click on the cell ...
Further Investigation
The point is based on a single player:
SALARY, Team/salary: 5000000, Los Angeles Lakers
POSITION: SG
Player/season: Sasha Vujacic: 2010
PLAYER: Sasha Vujacic
We should probably exclude cases that are based on very few players …
… but Excel doesn’t let us do that easily ...
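Outside Excel, blanking out cells that rest on very few players takes only a few lines. A minimal Python sketch, assuming records with the hypothetical fields MPG_GROUP, POSITION and SALARY and a hypothetical threshold min_count: cells with fewer records than the threshold are left blank (None) instead of showing an average.

```python
from collections import defaultdict
from statistics import mean


def crosstab_mean(rows, row_key, col_key, value_key, min_count=1):
    """Two-way table of averages keyed by (row, column).

    Cells backed by fewer than min_count records are set to None, so a
    single player cannot dominate a cell.
    """
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value_key])
    return {
        key: (mean(vals) if len(vals) >= min_count else None)
        for key, vals in cells.items()
    }


# Hypothetical records
players = [
    {"MPG_GROUP": "5-10", "POSITION": "SG", "SALARY": 5_000_000},
    {"MPG_GROUP": "30-35", "POSITION": "SG", "SALARY": 7_000_000},
    {"MPG_GROUP": "30-35", "POSITION": "SG", "SALARY": 8_000_000},
    {"MPG_GROUP": "30-35", "POSITION": "SG", "SALARY": 9_000_000},
]

table = crosstab_mean(players, "MPG_GROUP", "POSITION", "SALARY", min_count=3)
print(table)
```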
Re-worked example
… exclude all values with MPG < 5 or MPG > 40
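The re-worked example keeps only records with 5 <= MPG <= 40 before summarizing. A sketch of that filter in Python; the field names MPG and SALARY and the records are hypothetical.

```python
from statistics import mean


def keep_mpg_range(rows, low=5, high=40, key="MPG"):
    """Drop records with MPG < low or MPG > high, i.e. keep low <= MPG <= high."""
    return [r for r in rows if low <= r[key] <= high]


# Hypothetical records
players = [
    {"MPG": 3.2, "SALARY": 5_000_000},
    {"MPG": 12.0, "SALARY": 1_500_000},
    {"MPG": 40.0, "SALARY": 9_000_000},
    {"MPG": 41.5, "SALARY": 11_000_000},
]

kept = keep_mpg_range(players)
print(len(kept), mean(r["SALARY"] for r in kept))
```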
Chart exercises
• Download the JuiceAnalytics Worksheet
• Practice cleaning up the charts in Chart Exercises 1 & 2 (but NOT 3 & 4)
Graphical Summaries
• Show more (complex) data than in a table
• Each graphic needs to ‘tell a story’:
• ‘big picture’: e.g. overall trend, clustering in the data, ...
• ‘deviations’: which (groups of) data points do not follow the big picture?
Follow-up (in the write-up):
• Describe the big picture: critically assess whether this pattern makes sense
• Describe deviations: if it is a small group, identify the records (label them in the plot); do the deviations form another pattern?
• Try to find an explanation: if you have found one, describe it (that’s another story); if you cannot find an explanation, describe what you did.
Visual Tasks
• We read graphics by (rough) quantification, but mainly by comparison
• A graphic has to be constructed so that it makes quantification and comparison easy
• We need to know which visual tasks are easy and which are hard
Ranking of perceptual tasks
• Usually we are not interested in exact quantities
• ... but we use accuracy as the measure
Premise (Cleveland & McGill):
A graphical form that involves elementary perceptual
tasks that lead to more accurate judgments than
another graphical form (with the same quantitative
information) will result in a better organization and
increase the chances of correct perception of patterns
and behavior.
Example: Pie vs Bar
What tasks are involved in comparisons?
• Pie: area is proportional to value; comparison of angles, comparison of curve length
• Bar: comparison of widths, positions along a common scale
[Figure: the same values shown as a pie chart and as a bar chart]
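A pie chart encodes each value as a share of 360 degrees, so reading it means judging angles (or areas) rather than positions along a common scale. As a worked sketch, the snippet converts the slice values A to F used in the angle comparison exercise of these slides into slice angles, assuming slices are drawn in proportion to their share of the total (which here is 101, not exactly 100).

```python
def slice_angles(values):
    """Map each slice to its angle in degrees: 360 * value / total."""
    total = sum(values.values())
    return {name: 360 * v / total for name, v in values.items()}


slices = {"A": 29, "B": 13, "C": 7, "D": 18, "E": 10, "F": 24}
angles = slice_angles(slices)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

In a bar chart the same values would simply be bar lengths on one axis, which is the comparison readers judge most accurately.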
Evaluation of different designs
ask users!
Positions along a common scale
Determine the width for bins A to F as accurately as possible.
[Figure: bar chart of bins A to F]
Bin   Value
A     12
B     23
C     14
D     24
E     20
F      7
Write down the (absolute) differences between the true values and your estimates.
Show of hands: sum of errors
• 5 or less?
• 3 or less?
• Accurate?
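To score the exercise, add up the absolute differences between the true values and your estimates. A small Python sketch; the true values are those for bins A to F, while the estimates are invented for illustration.

```python
# True values for bins A to F
true_values = {"A": 12, "B": 23, "C": 14, "D": 24, "E": 20, "F": 7}
# Hypothetical estimates from one participant
estimates = {"A": 10, "B": 25, "C": 14, "D": 24, "E": 19, "F": 7}


def sum_abs_errors(truth, guess):
    """Sum of |true value - estimate| over all bins."""
    return sum(abs(truth[k] - guess[k]) for k in truth)


print(sum_abs_errors(true_values, estimates))
```

Absolute differences are used so that over- and under-estimates cannot cancel each other out.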
Angle comparisons
Determine the percentage for slices A to F as accurately as possible.
[Figure: pie chart of slices A to F]
Slice   Value
A       29
B       13
C        7
D       18
E       10
F       24
Write down the differences between the true values and your estimates.
Show of hands: sum of errors
• Ran out of time?
• 5 or less?
• 3 or less?
• Accurate?
Barcharts give us more accurate results, faster ...
Ranking of Difficulty of Graphical Tasks (Cleveland & McGill (1984), Kosara & Ziemkiewicz (2010)), from most to least accurate:
• Position along a common scale
• Position along non-aligned scales
• Length, Direction, Angle
• Area
• Volume, Curvature
• Shading, Color Saturation
Excel graphics
• Our focus will be on Barcharts and
Scatterplots (maybe Lineplots)
• Barcharts for investigating the distribution within a column
• Scatterplots for investigating the relationship between two columns
• Everything else is of dubious use.
Barcharts
• Pretty straightforward
• Often useful to sort by value
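Sorting by value before charting places bars of similar height next to each other, which makes comparisons along the common scale easier. A stdlib-only Python sketch that sorts categories by value and prints a crude text barchart; the counts and the function names are hypothetical.

```python
def sort_for_barchart(values, descending=True):
    """Return (category, value) pairs ordered by value."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=descending)


def text_barchart(pairs, width=30):
    """Render pairs as rows of '#', scaled so the largest value gets `width` marks."""
    top = max(v for _, v in pairs)
    return "\n".join(
        f"{label:<4}{'#' * round(width * v / top)} {v}" for label, v in pairs
    )


# Hypothetical number of players per position
counts = {"C": 60, "PF": 85, "PG": 90, "SF": 75, "SG": 95}
ordered = sort_for_barchart(counts)
print(text_barchart(ordered))
```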
Scatterplot
• x variable in first column
• y variable in second column, or multiple
columns for multiple series
• Shows the general relationship between two variables
• Put the dependent variable, if there is one, on the y axis
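When a caption calls a relationship “weak” or “strong”, a correlation coefficient can back up the wording. Excel's CORREL worksheet function computes the Pearson correlation; the Python sketch below computes the same quantity from its definition. The sample data are hypothetical, and r only measures the linear part of the relationship, so look at the scatterplot as well.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads.

    Raises ZeroDivisionError if either variable is constant.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical minutes per game and salaries (in millions)
mpg = [8, 15, 22, 30, 35]
salary = [1.0, 2.5, 2.0, 6.0, 4.5]
print(round(pearson_r(mpg, salary), 2))
```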
Line plots
• Use line plots to show time trends
• Use colors/line types for different categories
• Only use lines to connect points when those points have something in common: draw a line between two points only when it is reasonable to assume that there were points between those measurements
• Make sure that the focus is on comparing lines and on showing trends along lines
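Preparing data for a line plot means building one series per category, ordered in time, so that a line only connects points that belong together. A minimal Python sketch, assuming hypothetical fields POSITION, SEASON and MPG; each resulting list of (season, value) pairs would become one line.

```python
from collections import defaultdict


def series_by_category(rows, cat_key, time_key, value_key):
    """Build one (time, value) series per category, sorted by time.

    Points from different categories never end up in the same series,
    so no line is drawn between them.
    """
    series = defaultdict(list)
    for r in rows:
        series[r[cat_key]].append((r[time_key], r[value_key]))
    return {cat: sorted(points) for cat, points in series.items()}


# Hypothetical average minutes per game by position and season
rows = [
    {"POSITION": "PG", "SEASON": 2010, "MPG": 24.0},
    {"POSITION": "C", "SEASON": 2009, "MPG": 20.0},
    {"POSITION": "PG", "SEASON": 2009, "MPG": 22.5},
    {"POSITION": "C", "SEASON": 2010, "MPG": 21.0},
]

for position, points in series_by_category(rows, "POSITION", "SEASON", "MPG").items():
    print(position, points)
```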
Your Turn
• Get some time with your team
Download