Graphics in Excel
Stat 480, Heike Hofmann

Outline
• Pivot tables, again
• Pivot table graphics
• Grouping
• Concepts behind graphics

Your Turn
• Use the file nba-data-graphics.xls
• Create pivot tables to answer the following questions:
  - What is the average salary by position? (How many players is this based on? What is the standard deviation?)
  - Do salaries vary by team?
  - How many minutes do players play per season? Does position affect this? Is there a difference over time?
  - ... identify one additional question of interest that you can answer with the help of a pivot table

Grouping
• Useful to simplify variables with many values
• In Excel this is called grouping
• Group ages into decades, heights into reasonable ranges, dates into months!
• How? Right-click a row label in the pivot table and choose "Group" to open the dialog window

Pivot table hints
• Put the variable with the most categories in the rows
• Put the variable with the next most categories in the columns
• More than two variables is tricky: experiment, keeping in mind which comparison you are interested in

Field settings
• Change the summary type (average, sum, standard deviation)
• Formatting
• Special calculations

Missing data
• Excel is inconsistent: sometimes a blank is treated as missing, sometimes it produces an error
• What happens when a blank is in a mathematical calculation?
• What happens when a blank is in a formula?

Pivot Chart Graphics
• Probably the most useful for us
• Line charts, bar charts, scatterplots
• Excel often produces ugly graphics by default
• Bad graphics devalue your work
Take-home message: change Excel's defaults when you use it!

Clean up
• Remove the background, or at least dim it
• Adjust axis scales
• Make grid lines pale
• Use muted colors
• Check font size

Graphics need Captions
Every chart needs a caption that covers the following two items:
- What is shown? (e.g. Figure 3: Scatterplot of players' salary by number of minutes played.)
- What do you want me to see? (e.g. There is a weak positive relationship, indicating that as the number of minutes increases, salary increases with it on average.)

Graphics in write-ups
Refer to a chart in the relevant section of a write-up, e.g.: "As can be seen in the scatterplot of Figure 3, there is a positive relationship between …"

Salary and Minutes Played
[Figure: scatterplot of salary by minutes played, annotated "These players look interesting … we should investigate some more"]

Further Investigation
Average of SALARY by row group and POSITION:

Row Labels   C              PF             PG            SF             SG
5-10         2317146.50     959111.00      1000000.00    457588.00      5000000.00
10-15        2483701.66     2396208.09     1669428.20    2131741.00     1807900.25
15-20        3534607.02     3387467.15     2022268.56    2880711.42     2625292.02
20-25        5743610.46     4375802.30     2629028.46    3658961.48     3658098.82
25-30        7481716.45     6506591.77     4896735.21    5084835.59     4718771.60
30-35        9649307.38     10731034.41    7000824.59    7691506.81     7753217.35
35-40        11609103.76    11788201.54    7913678.18    11115608.91    10465867.59
40-45        (two cells only: 11000000.00 and 9500000.00)
Grand Total  6405013.76     6070437.54     4669557.64    5636037.39     5157925.83

Double-click on the cell ...

Further Investigation
This point is based on a single player:
SALARY 5000000, TEAM Los Angeles Lakers, POSITION SG, PLAYER Sasha Vujacic (season 2010)
We should probably exclude cases where we have very few players …
… but Excel doesn't let us do this easily ...

Re-worked example
… exclude all values with MPG < 5 or MPG > 40
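The first Your Turn question asks for the average salary by position, plus the number of players and the standard deviation. A minimal sketch of what such a pivot table computes (not how Excel implements it) groups records by position. The toy records and the function name `pivot_summary` are hypothetical; the sample standard deviation is used on the assumption that it matches Excel's StdDev summary (StdDevp being the population version).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy records (position, salary); NOT the nba-data-graphics.xls data.
records = [
    ("C", 2_000_000), ("C", 6_000_000), ("C", 4_000_000),
    ("PG", 1_000_000), ("PG", 3_000_000),
    ("SG", 5_000_000),
]

def pivot_summary(rows):
    """Group salaries by position; return {position: (count, average, sample SD)}."""
    groups = defaultdict(list)
    for position, salary in rows:
        groups[position].append(salary)
    summary = {}
    for position, salaries in sorted(groups.items()):
        # The sample SD needs at least two values; a one-player group gets None
        # (Excel shows an error in that situation).
        sd = stdev(salaries) if len(salaries) > 1 else None
        summary[position] = (len(salaries), mean(salaries), sd)
    return summary

for position, (n, avg, sd) in pivot_summary(records).items():
    print(position, n, avg, sd)
```

The count column is worth keeping next to the average: a group with one player, like SG here, has no standard deviation and an average that describes a single person.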
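Grouping replaces each value by the bin it falls into, as in the 5-10, 10-15, ... rows of the pivot table. A minimal sketch, assuming equal-width, half-open bins [lower, lower + width), so a value on a boundary goes to the upper bin; the function `group_label` and its defaults are hypothetical, and Excel's own label format may differ.

```python
import math

def group_label(value, start=5.0, width=5.0):
    """Label such as '10-15' for the bin [lower, lower + width) containing value.

    Bins are assumed to start at `start` and all have the same `width`.
    """
    k = math.floor((value - start) / width)  # index of the bin holding value
    lower = start + k * width
    return f"{lower:g}-{lower + width:g}"

# Minutes per game in 5-minute groups, ages in decades.
for mpg in [4.2, 12.3, 40.0]:
    print(mpg, group_label(mpg))
print(37, group_label(37, start=0, width=10))
```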
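The Missing data questions come down to whether a blank is skipped or counted as 0. As a simplified sketch, assuming blanks are represented as `None`: Excel's AVERAGE skips empty cells, while a blank pulled into arithmetic (for example `=A2+0`) evaluates as 0. Excel's actual behavior also depends on the function and on whether the cell is truly empty or holds text. The column and function names are hypothetical.

```python
from statistics import mean

# Hypothetical column of minutes with one blank cell (None).
minutes = [30.0, None, 20.0, 10.0]

def average_ignoring_blanks(values):
    """Average over non-blank values only (how AVERAGE treats empty cells)."""
    return mean(v for v in values if v is not None)

def average_blanks_as_zero(values):
    """Average when every blank counts as 0 (blanks used in arithmetic first)."""
    return mean(0.0 if v is None else v for v in values)

print(average_ignoring_blanks(minutes))
print(average_blanks_as_zero(minutes))
```

The two results differ because the second version divides by four values instead of three and adds a 0 for the blank. The same data can therefore produce different "averages" depending on how blanks reach the calculation.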
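The re-worked example drops rows with MPG below 5 or above 40 before summarizing. A minimal sketch of that filter, with hypothetical rows and a hypothetical `keep_row` helper. In Excel, one possible workaround (an assumption, not shown on the slides) is a helper column with a formula such as `=AND(E2>=5, E2<=40)` that is then used as a filter.

```python
# Hypothetical (player, mpg, salary) rows; NOT taken from the course data.
rows = [
    ("player_a", 3.5, 5_000_000),
    ("player_b", 12.0, 2_000_000),
    ("player_c", 33.0, 9_000_000),
    ("player_d", 41.2, 11_000_000),
]

def keep_row(mpg, low=5.0, high=40.0):
    """Keep rows with low <= MPG <= high, i.e. drop MPG < 5 or MPG > 40."""
    return low <= mpg <= high

filtered = [row for row in rows if keep_row(row[1])]
print([name for name, _, _ in filtered])
```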
Chart exercises
• Download the JuiceAnalytics worksheet
• Practice cleaning up charts with Chart Exercises 1 & 2 (but NOT 3 & 4)

Graphical Summaries
• Show more (complex) data than a table can
• Each graphic needs to 'tell a story':
  - 'big picture': e.g. overall trend, clustering in the data, ...
  - 'deviations': which (groups of) data points do not follow the big picture?
• Follow-up (in the write-up):
  - Describe the big picture and critically assess whether this pattern makes sense
  - Describe deviations: if it is a small group, identify the records (label them in the plot). Do the deviations form another pattern? Try to find an explanation. If you find one, describe it (that's another story); if you cannot find an explanation, describe what you did.

Visual Tasks
• Graphics work by (rough) quantification, but mainly by comparisons
• A graphic has to be constructed so that it makes quantification and comparisons easy
• We need to know which visual tasks are easy and which are hard

Ranking of perceptual tasks
• Usually we are not interested in exact quantities
• ... but we use accuracy as the measure

Premise (Cleveland & McGill):
"A graphical form that involves elementary perceptual tasks that lead to more accurate judgments than another graphical form (with the same quantitative information) will result in a better organization and increase the chances of correct perception of patterns and behavior."

Example: Pie vs. Bar
What tasks are involved in comparisons?
• Pie: area is proportional to value; comparison of angles, curve length
• Bar: comparison of widths, positions along a common scale
[Figure: the same values shown as a pie chart and as a bar chart]

Evaluation of different designs: ask users!

Positions along a common scale
Determine the width for bins A to F as accurately as possible.
[Figure: bar chart of bins A to F]
Bin     A   B   C   D   E   F
Value  12  23  14  24  20   7
Write down the (absolute) differences between the true values and your estimates.
Show of hands: sum of errors
• 5 or less?
• 3 or less?
• Accurate?

Angle comparisons
Determine the percentage for slices A to F as accurately as possible.
[Figure: pie chart of slices A to F]
Slice   A   B   C   D   E   F
Value  29  13   7  18  10  24
Write down the differences between the true values and your estimates.
Show of hands: sum of errors
• Ran out of time?
• 5 or less?
• 3 or less?
• Accurate?

Barcharts give us more accurate results, faster ...
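The show-of-hands question uses the sum of absolute differences between true values and estimates. A minimal sketch using the true bar values from the "Positions along a common scale" exercise (A 12, B 23, C 14, D 24, E 20, F 7); the estimates and the name `sum_abs_errors` are hypothetical.

```python
# True values for bins A to F from the bar chart exercise.
TRUE_BAR_VALUES = {"A": 12, "B": 23, "C": 14, "D": 24, "E": 20, "F": 7}

def sum_abs_errors(truth, estimates):
    """Sum of |true - estimate| over all bins; estimates must cover every bin."""
    return sum(abs(truth[k] - estimates[k]) for k in truth)

# Hypothetical estimates written down by one student.
my_estimates = {"A": 10, "B": 25, "C": 14, "D": 23, "E": 20, "F": 8}
print(sum_abs_errors(TRUE_BAR_VALUES, my_estimates))
```

The absolute value stops over- and under-estimates from cancelling each other out. Without it, guessing 2 too high on one bin and 2 too low on another would look like a perfect score.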
Ranking of Difficulty of Graphical Tasks (Cleveland & McGill (1984), Kosara & Ziemkiewicz (2010)), most accurate first:
• Position along a common scale
• Position along non-aligned scales
• Length, direction, angle
• Area
• Volume, curvature
• Shading, color saturation

Excel graphics
• Our focus will be on barcharts and scatterplots (maybe line plots)
• Barchart for investigating the distribution within a column
• Scatterplot for investigating the relationship between two columns
• Everything else is of dubious use

Barcharts
• Pretty straightforward
• Often useful to sort by value

Scatterplot
• x variable in the first column
• y variable in the second column, or in multiple columns for multiple series
• Shows the general relationship between two variables
• Put the dependent variable, if there is one, on the y axis

Line plots
• Use line plots to show time trends
• Use colors/line types for different categories
• Only use lines to connect points when those points have something in common: draw a line between two points when it's reasonable to assume that there were points between those measurements
• Make sure that the focus is on comparing between lines and showing trends along lines

Your Turn
• Get some time with your team
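Sorting a barchart by value means ordering the categories by their values before plotting. A minimal sketch using the 35-40 row of the Further Investigation pivot table (values as shown on the slide); the function name `sort_by_value` is hypothetical. In Excel, the equivalent step would be sorting the pivot table by the value field before building the chart.

```python
# Average salaries by position from the 35-40 row of the pivot table.
row_35_40 = {"C": 11609103.76, "PF": 11788201.54, "PG": 7913678.18,
             "SF": 11115608.91, "SG": 10465867.59}

def sort_by_value(data, descending=True):
    """Return (category, value) pairs ordered by value, largest first by default."""
    return sorted(data.items(), key=lambda kv: kv[1], reverse=descending)

for position, avg in sort_by_value(row_35_40):
    print(f"{position:>2} {avg:>14,.2f}")
```

Alphabetical order (C, PF, PG, SF, SG) makes the reader search for the largest and smallest bars. Sorted order turns the ranking into positions along a common scale, the easiest task in the ranking above.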
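A scatterplot caption such as "there is a weak positive relationship" can be backed up with a number. A minimal sketch of the Pearson correlation coefficient, which Excel's CORREL function also computes; the function name `pearson_r` and the minutes/salary values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                     # spread of x
    syy = sum((y - my) ** 2 for y in ys)                     # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical minutes played and salaries (in millions).
minutes = [500, 1200, 2000, 2600]
salary = [1.0, 0.9, 3.5, 2.8]
print(round(pearson_r(minutes, salary), 3))
```

The sign gives the direction of the relationship and the size (between -1 and 1) gives its strength. The scatterplot is still needed, because a single number cannot reveal clusters or the small groups of deviating players discussed under Graphical Summaries.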