Numeric and Visual Data Summaries Stat 480
 Heike Hofmann

Outline
• Pivot tables
• Grouping
• Graphics
Steps for Merging Data
Sources
• Situation: Data regarding same entities (player, season) in different files
• Merge steps:
• identify key: player name, season
• combine into a single column (these keys have to consist
of identical values)
• make this column the first column
• sort data according to key
• use vlookup to merge data
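Outside Excel, the same merge steps can be sketched in a few lines of Python. This is only an illustration of the idea behind a key-based lookup, not the spreadsheet procedure itself: the records, the column names and the make_key helper are all made up for the example. A dictionary lookup does not need sorted data, so the sorting step has no counterpart here.

```python
# Hypothetical records; names and numbers are invented for illustration.
performance = [
    {"player": "A. Example", "season": 2008, "minutes": 2400},
    {"player": "B. Sample", "season": 2008, "minutes": 1800},
    {"player": "A. Example", "season": 2009, "minutes": 2100},
]
salaries = [
    {"player": "A. Example", "season": 2008, "salary": 5_000_000},
    {"player": "A. Example", "season": 2009, "salary": 5_500_000},
]


def make_key(row):
    """Combine player name and season into a single key value."""
    return f"{row['player']}|{row['season']}"


# Lookup table: combined key -> salary
salary_by_key = {make_key(row): row["salary"] for row in salaries}

# Pull the salary into every performance row.
# None marks a key that has no match in the salary data.
merged = [
    {**row, "salary": salary_by_key.get(make_key(row))}
    for row in performance
]

for row in merged:
    print(row)
```

Rows without a matching key keep a None salary, which makes failed matches (for example, differently spelled player names) easy to spot.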
Your Turn
• Download the file nba-data-combine.xls
• This file has performance and salary
information in separate sheets.
• Salary information is merged into performance
statistics
• Your turn: use lookup to merge the position
of each player into the performance
statistics as well.
Check Data for Familiar
Features
• Now that we have position and salary in the
performance, try to see familiar things:
• Performance on some statistics should
depend on position - e.g. we would expect
more scoring/rebounding from offensive
players than from defensive ones
• We need: summaries by position
Pivot tables
• Dynamically aggregate and summarize the
data
• Drag and drop columns
• Measured variables in the middle, id variables
in rows and columns (and pages)
• Find it:
Insert > Pivot Table
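What a pivot table computes can be sketched in Python: one id variable defines the rows, another the columns, and the measured variable is aggregated in each cell. The data, the variable names and the choice of the mean as summary are assumptions for this example only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; values are invented for illustration.
rows = [
    {"position": "G", "season": 2008, "minutes": 2000},
    {"position": "G", "season": 2008, "minutes": 1000},
    {"position": "G", "season": 2009, "minutes": 1800},
    {"position": "C", "season": 2008, "minutes": 1200},
    {"position": "C", "season": 2009, "minutes": 1600},
]

# Collect the measured variable for each (row id, column id) cell
cells = defaultdict(list)
for r in rows:
    cells[(r["position"], r["season"])].append(r["minutes"])

# Aggregate every cell; here with the mean
pivot = {key: mean(values) for key, values in cells.items()}

# Print as a small table: positions in rows, seasons in columns
positions = sorted({p for p, _ in pivot})
seasons = sorted({s for _, s in pivot})
print("position", *seasons, sep="\t")
for p in positions:
    print(p, *(pivot.get((p, s), "") for s in seasons), sep="\t")
```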
Pivot tables
• In order to assess actual ‘differences’ between
averages, we need to also take into account:
• number of records,
• standard deviation of values.
• Ideally we would also like the distribution of values,
but Excel does not make that easy.
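As a rough sketch of the three summaries side by side, the following Python computes the number of records, the average and the sample standard deviation per group. The salary values (in millions) and position codes are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (position, salary in millions) pairs
salaries = [
    ("G", 2.0), ("G", 4.0), ("G", 6.0),
    ("F", 3.0), ("F", 5.0),
    ("C", 8.0),
]

# Group the salaries by position
by_position = defaultdict(list)
for position, salary in salaries:
    by_position[position].append(salary)

summary = {}
for position, values in by_position.items():
    summary[position] = {
        "n": len(values),
        "mean": mean(values),
        # The sample standard deviation needs at least two values
        "sd": stdev(values) if len(values) > 1 else None,
    }

for position, stats in summary.items():
    print(position, stats)
```

In this made-up data, G and F have the same average, but the averages rest on different numbers of records and different spreads, which is exactly why the average alone is not enough.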
What do we want to
find out about the data?
Routes of investigation
• What is the relationship between
performance (which aspect of it?) and a
player’s salary?
• Do different positions get different average
salaries?
• Do salaries vary by team?
• Do we see an age-related peak in players’
performance/salary?
Your Turn
• Use file nba-data-all.xls
• Create Pivot tables to answer the following questions:
• what is the average salary by position? (How many
players is this based on, what is the standard deviation?)
• do salaries vary by team?
• how many minutes do players play per season?
does position impact this? is there a difference over
time?
• ... identify one additional question of interest that you can
answer with the help of a pivot table
Grouping
• Useful to simplify variables with many values
• In Excel, this simplification is called grouping
• Group ages in decades, height into
something reasonable, dates into months
• How? Right-Click into the Pivot table row,
choose “Group” to open the dialog window
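The effect of grouping can be sketched in Python: each value is mapped to a coarser label, and the summary is then computed per label. The ages and the decade helper below are hypothetical, chosen to mirror the "ages in decades" example.

```python
from collections import Counter

# Hypothetical player ages
ages = [19, 23, 27, 29, 30, 31, 34, 38, 41]


def decade(age):
    """Map an age to its decade label, e.g. 27 -> '20-29'."""
    start = age // 10 * 10
    return f"{start}-{start + 9}"


# Count players per decade instead of per individual age
counts = Counter(decade(a) for a in ages)

for label in sorted(counts):
    print(label, counts[label])
```

Nine distinct ages collapse into four groups, which is usually easier to read in the rows of a pivot table.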
Pivot table hints
• Choose the variable with the most
categories and put it in the rows
• Put the variable with the next most in the
columns
• More than two variables is tricky: play
around, keeping in mind which comparison you
are interested in
Field settings
• Change summary type (average, sum,
standard deviation)
• Formatting
• Special calculations
Missing data
• Excel is inconsistent: a missing value is sometimes
treated as missing, sometimes as an error
• What happens when a blank is in a
mathematical calculation?
• What happens when a blank is in a formula?
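Why these questions matter can be seen in a small Python sketch that compares two possible conventions for a blank: skipping it, or counting it as zero. The values are hypothetical and None stands in for a blank cell. The sketch does not claim which convention a given Excel function follows, so it is worth checking on a small example in the spreadsheet itself.

```python
from statistics import mean

# Hypothetical minutes per game; None stands in for a blank cell
minutes = [30.0, None, 24.0, None, 36.0]

# Convention 1: skip blanks, average only the values that are present
present = [m for m in minutes if m is not None]
mean_skipping_blanks = mean(present)

# Convention 2: treat every blank as zero
as_zero = [m if m is not None else 0.0 for m in minutes]
mean_blanks_as_zero = mean(as_zero)

print(mean_skipping_blanks, mean_blanks_as_zero)
```

The two conventions give clearly different averages for the same column, so it pays to know which one is in effect before interpreting a summary.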
Pivot Chart Graphics
• probably the most useful for us
• line charts, bar charts, scatterplots
• Excel often produces ugly graphics by default
• Bad Graphics devalue work
Take home message: change Excel defaults when you use it!
Chart exercises
• Download the JuiceAnalytics Worksheet
• Practice cleaning them up with Chart
Exercises 1 & 2 (but NOT 3 & 4)
Clean up
• Remove background - at least dim it
• Adjust axis scales
• Make grid lines pale
• Use muted colors
• Check font size
Graphics need Captions
Every chart needs a caption that covers the
following two items:
- what is shown? (e.g. Figure 3: Scatterplot of players’ salary by
number of minutes played.)
- what do you want me to see?
(e.g. There is a weak positive relationship,
indicating that as the number of minutes increases,
salary increases with it on average.)
Graphics in write-ups
Make a reference to a chart in the relevant
section of a write-up, e.g.:
as can be seen in the scatterplot in Figure 3,
there is a positive relationship between …