Numeric and Visual Data Summaries
Stat 480
Heike Hofmann

Outline
• Pivot tables
• Grouping
• Graphics

Steps for Merging Data Sources
• Situation: data regarding the same entities (player, season) are stored in different files
• Merge steps:
  • identify the key: player name, season
  • combine the key variables into a single column (the key values have to be identical in both sources)
  • make this column the first column
  • sort the data according to the key
  • use VLOOKUP to merge the data

Your Turn
• Download the file nba-data-combine.xls
• This file has performance and salary information in separate sheets.
• Salary information has been merged into the performance statistics.
• Your turn: use a lookup to merge the position of each player into the performance statistics as well.

Check Data for Familiar Features
• Now that we have position and salary in the performance sheet, try to see familiar things:
• Performance on some statistics should depend on position, e.g. we would expect more scoring/rebounding from offensive players than from defensive players
• We need: summaries by position

Pivot tables
• Dynamically aggregate and summarize the data
• Drag and drop columns
• Measured variables in the middle, id variables in rows and columns (and pages)
• Find it: Insert > Pivot Table

Pivot tables
• In order to assess actual ‘differences’ between averages, we also need to take into account:
  • number of records,
  • standard deviation of values.
• Ideally we would also like the distribution of values, but Excel does not make that easy.

What do we want to find out about the data?

Routes of investigation
• What is the relationship between performance (which aspect of it?) and a player’s salary?
• Do different positions get different average salaries?
• Do salaries vary by team?
• Do we see an age-related peak in players’ performance/salary?

Your Turn
• Use the file nba-data-all.xls
• Create pivot tables to answer the following questions:
  • What is the average salary by position? (How many players is this based on? What is the standard deviation?)
  • Do salaries vary by team?
  • How many minutes do players play per season? Does position impact this? Is there a difference over time?
  • ... identify one additional question of interest that you can answer with the help of a pivot table

Grouping
• Useful to simplify variables with many values
• In Excel this is called grouping
• Group ages into decades, height into something reasonable, dates into months
• How? Right-click in the pivot table row, then choose “Group” to open the dialog window

Pivot table hints
• Choose the variable with the most categories and put it in the rows
• Put the variable with the next most categories in the columns
• More than two variables is tricky: play around, keeping in mind which comparison you are interested in

Field settings
• Change summary type (average, sum, standard deviation)
• Formatting
• Special calculations

Missing data
• Excel is inconsistent: sometimes missing, sometimes error
• What happens when a blank is in a mathematical calculation?
• What happens when a blank is in a formula?

Pivot Chart Graphics
• probably the most useful for us
• line charts, bar charts, scatterplots
• Excel often produces ugly graphics by default
• Bad graphics devalue work
Take-home message: change Excel defaults when you use it!

Chart exercises
• Download the JuiceAnalytics Worksheet
• Practice cleaning up the charts with Chart Exercises 1 & 2 (but NOT 3 & 4)

Clean up
• Remove the background, or at least dim it
• Adjust axis scales
• Make grid lines pale
• Use muted colors
• Check font size

Graphics need Captions
Every chart needs a caption that covers the following two items:
- what is shown? (e.g. Figure 3: Scatterplot of players’ salary by number of minutes played.)
- what do you want me to see? (e.g. There is a weak positive relationship, indicating that as the number of minutes increases, salary increases with it on average.)

Graphics in write-ups
Make a reference to a chart in the relevant section of a write-up, e.g.: as can be seen in the scatterplot of Figure 3, there is a positive relationship between …
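
Checking Excel results outside Excel
The merge and pivot-table steps from the earlier slides can be cross-checked with a few lines of Python. The following is only a minimal sketch using the standard library: the records, the field names and the helper functions merge_on_key and summarize are all hypothetical and invented for illustration; they are not taken from the NBA files used in class.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: names and numbers are invented for illustration only.
performance = [
    {"player": "Player A", "season": 2008, "minutes": 2400},
    {"player": "Player B", "season": 2008, "minutes": 1800},
    {"player": "Player C", "season": 2008, "minutes": 900},
    {"player": "Player D", "season": 2008, "minutes": 2100},
]
info = [
    {"player": "Player A", "season": 2008, "position": "G", "salary": 4.0},
    {"player": "Player B", "season": 2008, "position": "G", "salary": 2.0},
    {"player": "Player C", "season": 2008, "position": "C", "salary": 1.0},
    {"player": "Player D", "season": 2008, "position": "C", "salary": 5.0},
]


def merge_on_key(left, right, key_fields):
    """Attach the fields of `right` to matching rows of `left` (VLOOKUP-style)."""
    # Build the lookup table once, keyed by the combined key (player, season).
    lookup = {tuple(row[f] for f in key_fields): row for row in right}
    merged = []
    for row in left:
        match = lookup.get(tuple(row[f] for f in key_fields), {})
        merged.append({**match, **row})  # rows without a match stay unchanged
    return merged


def summarize(rows, group_field, value_field):
    """Return {group: (count, mean, sample standard deviation)}."""
    groups = defaultdict(list)
    for row in rows:
        # Skip rows where the grouping or measured value is missing.
        if group_field in row and row.get(value_field) is not None:
            groups[row[group_field]].append(row[value_field])
    return {
        group: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else None)
        for group, vals in groups.items()
    }


merged = merge_on_key(performance, info, ("player", "season"))
summary = summarize(merged, "position", "salary")
for position, (n, avg, sd) in sorted(summary.items()):
    print(position, n, avg, sd)
```

The summary reports the number of records and the standard deviation next to each average, which is what is needed to judge differences between positions. Note that statistics.stdev computes the sample standard deviation, which corresponds to the StdDev summary (not StdDevp) in the pivot table field settings.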