Collecting data
stat 480
Heike Hofmann

[Figure: voting results. Count by answer (Favorite, Like it very much, Like it, Don’t like it that much) for each topic: Salaries / economics / financial data; Health / fitness; Movies (e.g. ratings, box office revenues, ...); Global issues (comparison across countries); Health / diseases; Traveling (e.g. flight delays, tarmac waits, airport performance, ...); Sports (e.g. Baseball, Football, ...); Crime Data (e.g. FBI Database); Climate / Environment / Weather; Environmental Data (e.g. pollution, fuel economy, CO2 emissions, ...)]

Outline
• Practice some of the skills you learned last time
• Two Scenarios of Data Collection
• vlookup Exercises
• Functions in Excel / Getting Help
• Discussing Data (problems)

Sheet formatting
• Excel is primarily an intermediate format between collection and analysis
• You will normally get it after data collection has taken place
• Want to optimize it for analysis

Data tidying
• Fill in all the blanks
• Give variables descriptive, but short, names
  - Don’t use any special characters, stick with numbers and letters
• Minimum of formatting

Your turn
• Tidy up the data provided in file excel-tidy.xls

Data Collection Scenario I
Study in Human Perception Ability

How good is your Eyeballing?
• The eyeballing game tests your skills in completing a set of visual tasks
• Go to http://woodgears.ca/eyeball/
• Play it once
• Copy results into Excel file & save
• For the next steps, see homework

Homework
• Now up on the website
• Two parts:
  - collect data from eyeballing game
  - small write-up
• Due next Thursday

Scenario II
• Basketball is, after Baseball, the sport with the highest salary pay for individual players.
• We can gather data on players’ salaries from online resources (e.g. HoopsHype, ESPN, USA Today’s database).

Your Turn
• Go to the HoopsHype salary database at http://hoopshype.com/salaries.htm
• Select one team and one season, get salary information for all players on the team.
• Copy and paste the salary table into an Excel spreadsheet.
• Compare this to collecting data from http://espn.go.com/nba/salaries
• ... what can we do next with this kind of data ... ?

How did it go?
What are issues with the data?
• Reliability?
• Accuracy?

First Data Cleaning Steps
• Combine data into one data set, if you haven’t done this yet.
• Augment collected data by team and season.
• Clean up data: make salaries numbers, delete superfluous rows, extract position from names, split names into first and last (why?)
• ... other suggestions ... ?

Your Turn
• The goal is to first separate the players’ names into name and position and then the name into first and last name.
• Team up with your neighbor and try to work out how this can be done in Excel

Before we use functions …
… we need to back up and look at cell references

Referencing Cells
• Cells in Excel are described in the form of a letter followed by a number for (column, row):
[Figure: spreadsheet grid with columns A to E and rows 1 to 6]
What happens for tables with more than 26 columns?

Absolute and Relative References
• A relative reference has the form letters number, e.g. A1, Z46, AA34
• An absolute reference is written in the form $letters $number, e.g. $A$1, $Z$46, $AA$34
• Absolute and relative references can be mixed, e.g. $letters number or letters $number
What’s the difference?

Absolute and Relative References
• We need to reference a cell in a formula, e.g. =A1
• A relative reference stores an offset from the current cell (i.e. it depends on the cell into which you are typing ‘=A1’)
• The difference between relative and absolute references becomes clear when you try copying and pasting the formula
• Use $ to fix part of a reference: =A$1 fixes the row, =$A1 fixes the column, =$A$1 fixes both (fully absolute)

Text functions in Excel
• LEN
• FIND
• LEFT, RIGHT, MID
• TRIM
• CONCATENATE
• Use the built-in help to get more information on each function and see examples of their use

Your Turn
• Download the file nba-espn-salaries.xls from our website
• Split the names of players into three columns: ‘First Name’, ‘Last Name’, and ‘Position’
• Make sure that names are cleaned of all surplus white space
• Format dollar amounts as integer numbers.
• Sort data according to players’ last names, first names and season.
• Fast track:
  - add column with first year of season
  - delete rows without information

Your Turn
• Which aspects of the data can we investigate now?
• Again, team up with your neighbor and discuss for 2 min.
• We will collect ideas afterwards.

Routes of Investigation

Excel Functionality

Working with data sets
• Freeze Panes: ALT - W - F
• Autofilter: Ctrl-Shift-L

Combining Information from different tables
• Do the vlookup exercises in the Juice Analytics Excel help file (see website)
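
Sketch for the name-splitting exercise
• Looking back at the exercise of splitting player names into first name, last name and position: one possible route, written in Python so that it can be run and checked. This is a sketch, not the official solution.
• Assumptions (not from the slides): a cell looks like "First Last, POS"; the raw cell is A2 and the cleaned name is B2; the example name is made up; the helper names trim, find, left, mid are hypothetical stand-ins for the Excel functions TRIM, FIND, LEFT, MID.
• The comments show the Excel formula each step corresponds to.

```python
# Sketch only. The "First Last, POS" cell layout and the cell references
# A2 (raw cell) and B2 (cleaned name) are assumptions for illustration.


def trim(text):
    """Like Excel TRIM: drop leading/trailing spaces, collapse inner runs."""
    return " ".join(part for part in text.split(" ") if part)


def find(needle, text):
    """Like Excel FIND: 1-based position; raises ValueError if not found."""
    return text.index(needle) + 1


def left(text, n):
    """Like Excel LEFT: the first n characters."""
    return text[:n]


def mid(text, start, n):
    """Like Excel MID: n characters beginning at 1-based position start."""
    return text[start - 1:start - 1 + n]


def split_player(cell):
    """Split 'First Last, POS' into (first, last, position)."""
    cell = trim(cell)
    comma = find(",", cell)
    # =TRIM(LEFT(A2, FIND(",", A2) - 1))
    name = trim(left(cell, comma - 1))
    # =TRIM(MID(A2, FIND(",", A2) + 1, LEN(A2)))
    position = trim(mid(cell, comma + 1, len(cell)))
    space = find(" ", name)
    # =LEFT(B2, FIND(" ", B2) - 1)
    first = left(name, space - 1)
    # =MID(B2, FIND(" ", B2) + 1, LEN(B2))
    last = mid(name, space + 1, len(name))
    return first, last, position


if __name__ == "__main__":
    # Made-up example cell with surplus spaces.
    print(split_player("  Jane   Doe,  PG "))  # ('Jane', 'Doe', 'PG')
```

• Limitation of this sketch: the name is split at the first space only, so for names with more than two words everything after the first word ends up in the last name. A cell without a comma or a name without a space raises an error, much as FIND returns #VALUE! in Excel.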