Collecting data stat 480
 Heike Hofmann

[Figure: voting results (count of each answer per topic). Topics: Salaries/economics/financial data; Health/fitness; Movies (e.g. ratings, box office revenues); Global issues (comparison across countries); Health/diseases; Traveling (e.g. flight delays, tarmac waits, airport performance); Sports (e.g. baseball, football); Crime data (e.g. FBI database); Climate/environment/weather; Environmental data (e.g. pollution, fuel economy, CO2 emissions). Answers: Favorite, Like it very much, Like it, Don't like it that much.]
Outline
• Practice some of the skills you learned last time
• Two Scenarios of Data Collection
• vlookup Exercises
• Functions in Excel / Getting Help
• Discussing Data (problems)
Sheet formatting
• Excel is primarily an intermediate format between collection and analysis
• You will normally get the spreadsheet after data collection has taken place
• We want to optimize it for analysis
Data tidying
• Fill in all the blanks
• Give variables descriptive, but short, names
 - Don’t use any special characters; stick with numbers and letters
• Keep formatting to a minimum
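One way to apply the naming rules above automatically is sketched below in Python (our choice of language; the course itself works in Excel). The helper `clean_name` is hypothetical: it keeps only the letters and digits of a header and writes the result in CamelCase, which is just one possible convention for short names without special characters.

```python
import re


def clean_name(header: str) -> str:
    """Hypothetical helper: turn a raw column header into a short variable
    name that contains only letters and digits (CamelCase style)."""
    # Pull out the runs of letters/digits; spaces, $, /, (), ... are dropped
    words = re.findall(r"[A-Za-z0-9]+", header)
    # Capitalize the first letter of each word so the name stays readable
    name = "".join(w[0].upper() + w[1:] for w in words)
    # Many analysis tools reject names that start with a digit
    if name[:1].isdigit():
        name = "V" + name
    return name


headers = ["Player Name", "Salary ($)", "Team/City", "years in league"]
print([clean_name(h) for h in headers])
# ['PlayerName', 'Salary', 'TeamCity', 'YearsInLeague']
```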
Your turn
• Tidy up the data provided in file excel-tidy.xls
Data Collection
Scenario I
Study of Human Perception Ability
How good is your eyeballing?
• The eyeballing game tests your skill at completing a set of visual tasks
• Go to http://woodgears.ca/eyeball/
• Play it once
• Copy your results into an Excel file and save it
• For the next steps, see the homework
Homework
• Now up on the website
• Two parts:
 - collect data from the eyeballing game
 - small write-up
• Due next Thursday
Scenario II
• After baseball, basketball is the sport with the highest salaries for individual players.
• We can gather data on players’ salaries from online resources (e.g. HoopsHype, ESPN, USA Today’s database).
Your Turn
• Go to the HoopsHype salary database at http://hoopshype.com/salaries.htm
• Select one team and one season, and get salary information for all players on the team.
• Copy and paste the salary table into an Excel spreadsheet.
• Compare this to collecting the data from http://espn.go.com/nba/salaries
• ... what can we do next with this kind of data ... ?
How did it go?
What are issues with the data?
• Reliability?
• Accuracy?
First Data Cleaning Steps
• Combine the data into one data set, if you haven’t done this yet.
• Augment the collected data with team and season.
• Clean up the data:
 - make salaries numbers
 - delete superfluous rows
 - extract position from names
 - split names into first and last name (why?)
• ... other suggestions ... ?
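A minimal Python sketch of the first two steps: combining the tables into one data set and augmenting each row with its team and season. The tables, team names and salaries are made-up placeholders, and the layout (one table per team and season) is an assumption about how the copied data is organized.

```python
# Made-up placeholder tables standing in for data copied from a salary site;
# each key is (team, season), each row is (player entry, salary text).
tables = {
    ("Team A", "2013-2014"): [("Player One, PG", "$1,000,000"),
                              ("Player Two, C", "$2,500,000")],
    ("Team B", "2013-2014"): [("Player Three, SF", "$750,000")],
}


def combine(tables):
    """Stack the per-team tables into one data set and add the team and
    season each row came from as extra columns."""
    combined = []
    for (team, season), rows in tables.items():
        for player, salary in rows:
            combined.append({"team": team, "season": season,
                             "player": player, "salary": salary})
    return combined


data = combine(tables)
for record in data:
    print(record)
```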
Your Turn
• The goal is to first separate the players’ names into name and position, and then the name into first and last name.
• Team up with your neighbor and try to work out how this can be done in Excel.
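For comparison, here is one possible approach written in Python rather than Excel. It assumes each entry looks like 'First Last, POS', with the position after a comma; the function `split_player` and the example entries are hypothetical. Treating everything after the first space as the last name is a heuristic: it keeps 'Van Dyke' together, but would split a two-word first name incorrectly.

```python
def split_player(entry: str):
    """Split an entry assumed to look like 'First Last, POS' into
    (first name, last name, position)."""
    # Everything before the first comma is the name, the rest the position
    name, _, position = entry.partition(",")
    # Remove leading/trailing white space and collapse inner runs
    name = " ".join(name.split())
    # First name = up to the first space; the remainder is the last name,
    # so a two-word last name such as 'Van Dyke' stays in one piece
    first, _, last = name.partition(" ")
    return first, last, position.strip()


print(split_player("  Ann  Van Dyke, PG "))  # ('Ann', 'Van Dyke', 'PG')
```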
Before we use functions …
… we need to back up and look at cell references
Referencing Cells
• Cells in Excel are referred to by a column letter followed by a row number, e.g. B3 is the cell in column B, row 3
[Figure: spreadsheet grid with columns A to E and rows 1 to 6]
What happens for tables with more than 26 columns?
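One answer to the question above: after Z the column labels continue with two letters (AA, AB, ..., AZ, BA, ...), then three (AAA, ...), much like a base-26 number system without a zero digit. The hypothetical Python function below converts a column number into its letters.

```python
def column_letters(n: int) -> str:
    """Convert a 1-based column number to Excel-style letters:
    1 -> A, 26 -> Z, 27 -> AA, 28 -> AB, ..."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        # Shift to 0-based before dividing, because there is no 'zero' letter
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print([column_letters(n) for n in (1, 26, 27, 52, 53, 702, 703)])
# ['A', 'Z', 'AA', 'AZ', 'BA', 'ZZ', 'AAA']
```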
Absolute and Relative References
• A relative reference has the form letters number, e.g. A1, Z46, AA34
• An absolute reference is written in the form $letters$number, e.g. $A$1, $Z$46, $AA$34
• Absolute and relative references can be mixed, e.g. $letters number (such as $A1) or letters $number (such as A$1)
What’s the difference?
Absolute and Relative References
• We need to reference cells in formulas, e.g. =A1
• A relative reference stores an offset from the current cell (i.e. its meaning depends on the cell into which you are typing ‘=A1’)
• The difference between relative and absolute references becomes clear when you copy and paste the formula
• Use =$A$1 for a fully absolute reference; =A$1 fixes only the row, and =$A1 fixes only the column
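To see what happens when a formula is copied, the following Python sketch (our own illustration, not an Excel feature) shifts a reference by a given number of rows and columns: parts preceded by $ stay fixed, the other parts move with the copy. The helper names are hypothetical.

```python
import re

# A cell reference: optional $, column letters, optional $, row number
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def letters_to_number(letters: str) -> int:
    """A -> 1, Z -> 26, AA -> 27, ..."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def number_to_letters(n: int) -> str:
    """1 -> A, 26 -> Z, 27 -> AA, ..."""
    letters = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def shift_reference(ref: str, down: int, right: int) -> str:
    """Return what `ref` becomes when its formula is copied `down` rows
    down and `right` columns to the right. Parts marked with $ stay fixed;
    unmarked (relative) parts move along with the copy."""
    match = REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_dollar, col, row_dollar, row = match.groups()
    col_num, row_num = letters_to_number(col), int(row)
    if not col_dollar:
        col_num += right
    if not row_dollar:
        row_num += down
    if col_num < 1 or row_num < 1:
        # Excel shows #REF! when a copied reference falls off the sheet
        raise ValueError("#REF!")
    return f"{col_dollar}{number_to_letters(col_num)}{row_dollar}{row_num}"


# Copy a formula from B2 to C3, i.e. one row down and one column right
for ref in ["A1", "$A1", "A$1", "$A$1"]:
    print(f"={ref}  ->  ={shift_reference(ref, down=1, right=1)}")
# =A1 -> =B2, =$A1 -> =$A2, =A$1 -> =B$1, =$A$1 -> =$A$1
```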
Text functions in Excel
• LEN
• FIND
• LEFT, RIGHT, MID
• TRIM
• CONCATENATE
• Use the built-in help to get more information on each function and to see examples of their use
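To make the behavior of these functions concrete, here is a Python sketch that mimics them. These are our own re-implementations, not Excel code; one detail worth noticing is that Excel counts character positions from 1, while Python counts from 0. The example entry is made up and assumed to have the form 'name, position'; the Excel formulas in the comments assume the entry sits in cell A2.

```python
import re

# Python stand-ins for Excel's text functions. Excel counts character
# positions from 1, Python from 0, so the helpers convert between the two.
# Names are upper case on purpose, to mirror the Excel function names.


def LEN(text: str) -> int:
    return len(text)


def FIND(find_text: str, within_text: str, start_num: int = 1) -> int:
    # Case-sensitive, 1-based; Excel returns #VALUE! if nothing is found
    pos = within_text.find(find_text, start_num - 1)
    if pos == -1:
        raise ValueError("#VALUE!")
    return pos + 1


def LEFT(text: str, num_chars: int = 1) -> str:
    return text[:num_chars]


def RIGHT(text: str, num_chars: int = 1) -> str:
    # text[-0:] would return the whole string, so handle 0 separately
    return text[-num_chars:] if num_chars > 0 else ""


def MID(text: str, start_num: int, num_chars: int) -> str:
    return text[start_num - 1:start_num - 1 + num_chars]


def TRIM(text: str) -> str:
    # Drop leading/trailing spaces and shrink inner runs of spaces to one
    return re.sub(" +", " ", text).strip(" ")


def CONCATENATE(*texts: str) -> str:
    return "".join(texts)


entry = "Ann  Van Dyke, PG"  # made-up example entry
comma = FIND(",", entry)
# like =TRIM(LEFT(A2, FIND(",", A2) - 1))
name = TRIM(LEFT(entry, comma - 1))
# like =TRIM(RIGHT(A2, LEN(A2) - FIND(",", A2)))
position = TRIM(RIGHT(entry, LEN(entry) - comma))
print(name, "|", position, "|", CONCATENATE(position, ": ", name))
# Ann Van Dyke | PG | PG: Ann Van Dyke
```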
Your Turn
• Download the file nba-espn-salaries.xls from our website
• Split the names of players into three columns: ‘First Name’, ‘Last Name’, and ‘Position’
• Make sure that the names are cleaned of extra white space
• Convert dollar amounts to integer numbers
• Sort the data by players’ last names, first names, and season
• Fast track:
 - add a column with the first year of the season
 - delete rows without information
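The conversion, clean-up and sorting steps could look like this in Python. It is a sketch under assumptions: salaries are written like '$1,250,000', seasons like '2013-2014', and the rows are made-up placeholders. Note that Python sorts text case-sensitively, whereas Excel's default sort ignores case.

```python
def salary_to_int(text: str) -> int:
    """'$1,250,000' -> 1250000; assumes amounts use only '$' and ','."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return round(float(cleaned))


def first_year(season: str) -> int:
    """'2013-2014' (or '2013-14') -> 2013; assumes a 'YYYY-...' label."""
    return int(season.split("-")[0])


# Made-up rows in the shape produced by the name-splitting step
rows = [
    {"first": "Ann", "last": "Van Dyke", "season": "2013-2014", "salary": "$1,250,000"},
    {"first": "Bo", "last": "Adams", "season": "2012-2013", "salary": "$800,000"},
    {"first": "", "last": "", "season": "", "salary": ""},  # row without information
    {"first": "Ann", "last": "Van Dyke", "season": "2012-2013", "salary": "$1,100,000"},
]

# Delete rows without information, then convert the columns
clean = [r for r in rows if r["salary"].strip()]
for r in clean:
    r["salary"] = salary_to_int(r["salary"])
    r["first_year"] = first_year(r["season"])

# Sort by last name, then first name, then season
clean.sort(key=lambda r: (r["last"], r["first"], r["first_year"]))
for r in clean:
    print(r["last"], r["first"], r["season"], r["salary"])
# Adams Bo 2012-2013 800000
# Van Dyke Ann 2012-2013 1100000
# Van Dyke Ann 2013-2014 1250000
```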
Your Turn
• Which aspects of the data can we investigate now?
• Again, team up with your neighbor and discuss for 2 min.
• We will collect ideas afterwards.
Routes of Investigation
Excel Functionality
Working with data sets
• Freeze Panes (ALT-W-F)
• Autofilter (Ctrl-Shift-L)
Combining Information from Different Tables
• Do the vlookup exercises in the Juice Analytics Excel help file (see website)
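Before doing the exercises, it may help to see what an exact-match VLOOKUP does. The Python function below is a simplified, hypothetical stand-in for =VLOOKUP(value, table, col_index, FALSE); unlike Excel it compares text case-sensitively, and it does not implement approximate matching (TRUE, the default in Excel).

```python
def vlookup(lookup_value, table, col_index_num):
    """Exact-match lookup in the spirit of =VLOOKUP(value, table, col, FALSE):
    search the first column top to bottom and return the entry in column
    `col_index_num` (counted from 1) of the first matching row."""
    if col_index_num < 1:
        raise ValueError("#VALUE!")
    if table and col_index_num > len(table[0]):
        raise ValueError("#REF!")  # column index beyond the table
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    raise LookupError("#N/A")  # Excel shows #N/A when nothing matches


teams = [
    ["BOS", "Boston Celtics", "East"],
    ["LAL", "Los Angeles Lakers", "West"],
]
print(vlookup("LAL", teams, 3))  # West
```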