Today’s Agenda - SPSS: Boxplots - Ratios - Crosstabs

Today’s Agenda
- SPSS: Boxplots
- Ratios
- Crosstabs
- Conditionals
- Association and causation
- SPSS: Crosstabs
Relevant text: P.54-60 Chapter 2
Note:
We haven’t covered response/dependent variables and
explanatory/independent variables yet, and they won’t fit nicely
into the next couple of lectures.
So Question 3c on Assignment 1 is for a bonus mark.
From last time:
- Boxplots are good for visualizing the general trend in
interval data.
- They show everything in the five-number summary (Min-Q1-Median-Q3-Max), the whiskers and the outliers.
- Side-by-side boxplots can be used to compare the
distributions of multiple sets of interval data.
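The five-number summary behind a boxplot can also be computed by hand. The sketch below is only an illustration: the sample values are made up, and the quartile rule used here (Python's "inclusive" method) is an assumption, so SPSS may report slightly different Q1 and Q3 values for the same data.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical sample, not taken from the lecture data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
```

Running the same function on each variable gives the numbers that side-by-side boxplots draw next to each other.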
To build one in SPSS, go to
Graphs → Legacy Dialogs → Boxplot
Then, in the boxplot pop-up, switch to “Summaries of separate
variables”, and click “Define”.
This assumes that we’re comparing data from different
variables like X,Y, and Z in the SPSS example from last week.
Move the variables you want plotted into “Boxes Represent”
Then click OK
- This is the result. Side-by-side boxplots can be used to
compare more than 2 variables.
- But, boxplots can only display interval data.
When we have no interval data, we need something else, like
cross tabulations (or crosstabs).
Cross tabulations (literally “cross tables”) are a non-graphical
method of summarizing two variables of nominal or ordinal
data.
                     Favoured Ice Cream
Favoured Pet    Vanilla   Chocolate   Total
Cat                  72          29     101
Dragon              210         825    1035
Total               282         854    1136
Each row represents the category of one variable.
The row totals show that 101 people prefer cats,
and 1035 people prefer bearded dragons.
Each column represents the category of the other variable.
The column totals show that 282 people prefer vanilla,
and 854 people prefer chocolate.
Each cell represents the number of cases that are in both the
row and column categories.
72 people like vanilla AND cats best. (boring)
825 people like chocolate AND dragons best. (clearly better)
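The row totals, column totals and grand total all follow from the four cell counts. A minimal sketch, using the counts from the pet and ice cream crosstab (the variable names are just for illustration):

```python
# Cell counts from the pet / ice cream crosstab.
counts = {
    "Cat": {"Vanilla": 72, "Chocolate": 29},
    "Dragon": {"Vanilla": 210, "Chocolate": 825},
}

# Row totals: add across the columns within each row.
row_totals = {pet: sum(cells.values()) for pet, cells in counts.items()}

# Column totals: add down the rows within each column.
flavours = ["Vanilla", "Chocolate"]
col_totals = {f: sum(counts[pet][f] for pet in counts) for f in flavours}

# The grand total is the same whether you add up row or column totals.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```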
Another perspective:
Of the people that prefer cats to dragons, 72 like vanilla.
Conditionals
When we only consider one row or one column, we are
conditioning on that response.
“Of the people that prefer cats to dragons, 72 like vanilla.”
Means:
Conditional on the preference of cats, 72 (out of 101) prefer
vanilla.
To go further we need the ratio.
A ratio is simply a measure of how much of one thing there is
in comparison to another.
Example: The fertility rate in Canada is 1.49 children per
woman. Comparing # of (expected) children to # of women.
Example: The DeLorean car goes 88 miles per hour. Comparing
# of miles traveled to # of hours passed.
Ratios can be used to make fair comparisons between things
that come from different scales.
Canada has a few more hockey players than the US, but a
MUCH bigger hockey player to citizen ratio.
Source: IIHF Survey of Players
These comparisons can be made over time too.
- There are roughly as many traffic fatalities in the US as
there were 60 years ago. (~30,000)
- Traffic fatalities are considered to be “at an all-time low”
because much more driving happened in 2009 than in
1949, yet it resulted in about the same number of deaths.
- Fatalities per mile are much lower now. (About 1/6 as
many)
Source: National Highway Traffic Safety Administration
http://www-fars.nhtsa.dot.gov/Main/index.aspx
Ratios let us make a fair comparison between different
conditions.
                     Favoured Ice Cream
Favoured Pet    Vanilla   Chocolate   Total
Cat                  72          29     101
Dragon              210         825    1035
Total               282         854    1136
There may be more total vanilla fans among dragon fans but…
72 of 101, or 71% of cat fans prefer vanilla.
210 of 1035, or 20% of dragon fans prefer vanilla.
When we compare the ratios instead of the raw numbers, we
account for the different sizes of the groups being
considered.
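Conditioning on a row amounts to dividing each cell by its row total. The sketch below does this for the pet and ice cream counts; the function name is made up for illustration.

```python
# Cell counts from the pet / ice cream crosstab.
counts = {
    "Cat": {"Vanilla": 72, "Chocolate": 29},
    "Dragon": {"Vanilla": 210, "Chocolate": 825},
}


def row_proportions(table):
    """Divide each cell by its row total (proportions conditional on the row)."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result


props = row_proportions(counts)
for pet, cells in props.items():
    # Show each conditional proportion as a whole-number percentage.
    print(pet, {flavour: f"{p:.0%}" for flavour, p in cells.items()})
```

Each row of proportions adds up to 1, which is what makes rows of very different sizes comparable.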
                        Safety Status
Vehicle      Died in Traffic      Else      Total
Motorbike                 11     9,989     10,000
Car                       54    99,946    100,000
Total                     65   109,935    110,000
Which do you think is more dangerous? Motorbikes or cars?
Cars have more fatalities, but the fatality rate per user is much
higher with motorbikes.
                        Safety Status
Vehicle      Died in Traffic      Else      Total
Motorbike                 11     9,989     10,000
Car                       54    99,946    100,000
Total                     65   109,935    110,000

11/10,000 = 0.11% fatality rate on bikes.
54/100,000 = 0.05% fatality rate on cars.
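One way to make the comparison concrete is to put both vehicles on a common base, such as deaths per 10,000 users. The base of 10,000 and the function name are choices made for this sketch, not something from the lecture.

```python
def rate_per(deaths, users, base=10_000):
    """Deaths per `base` users, so groups of different sizes are comparable."""
    return deaths * base / users


# Counts from the vehicle / safety status crosstab.
motorbike_rate = rate_per(11, 10_000)
car_rate = rate_per(54, 100_000)

print(motorbike_rate, car_rate)
# How many times higher the motorbike rate is than the car rate.
print(round(motorbike_rate / car_rate, 2))
```

On this scale the motorbike rate is about double the car rate, even though cars account for more deaths in total.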
So far we’ve only seen 2x2 crosstabs, but larger ones are
possible. The age and sex table from last week is a (2 x many)
crosstab. (Also, age is ordinal and sex is nominal; they mix fine.)
Ordinal variables can be included because they’re still
categories.
You can have more than two categories for both variables too.
                                  Child’s Education
Mother’s Education    <HS     HS   College    BSc   MSc+   TOTAL
< High School         142    381       112     31      2     668
High School           157    637       225     57     14    1090
Some College           36    206       486    549     98    1375
BSc                    18     25        68    410    103     624
Adv. Degree             4     11        22     91     54     182
TOTAL                 357   1260       913   1138    271    3939
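Larger crosstabs work the same way as 2x2 ones. As a rough sketch, the code below stores the education table as one list per row, rebuilds the margins, and then conditions on the mother's education to see what share of children reached a BSc or higher. Grouping BSc and MSc+ together is a choice made for this example only.

```python
child_levels = ["<HS", "HS", "College", "BSc", "MSc+"]

# One row per mother's education level; columns follow child_levels.
table = {
    "< High School": [142, 381, 112, 31, 2],
    "High School": [157, 637, 225, 57, 14],
    "Some College": [36, 206, 486, 549, 98],
    "BSc": [18, 25, 68, 410, 103],
    "Adv. Degree": [4, 11, 22, 91, 54],
}

row_totals = {mother: sum(row) for mother, row in table.items()}
# zip(*rows) turns the rows into columns so each column can be summed.
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(col_totals)

first_degree = child_levels.index("BSc")
for mother, row in table.items():
    # Share of children with a BSc or higher, conditional on this row.
    share = sum(row[first_degree:]) / row_totals[mother]
    print(f"{mother:15} {share:.1%}")
```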
Association and causation
If the response from one variable is more common when a
particular response from another variable appears, we say
there is a positive association between the two responses.
                        Safety Status
Vehicle      Died in Traffic      Else      Total
Motorbike                 11     9,989     10,000
Car                       54    99,946    100,000
Total                     65   109,935    110,000
A negative association means one is less common when the
other happens.
Here we would say that riding a motorbike has a positive
association with dying in traffic.
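The direction of an association can be read off by comparing how common a response is in one group versus the other. A minimal sketch with a made-up helper name; it only compares the two rates and says nothing about whether the difference is large enough to matter.

```python
def association_direction(cases_a, total_a, cases_b, total_b):
    """Compare how common an outcome is in group A versus group B."""
    rate_a = cases_a / total_a
    rate_b = cases_b / total_b
    if rate_a > rate_b:
        return "positive"  # outcome is more common in group A
    if rate_a < rate_b:
        return "negative"  # outcome is less common in group A
    return "none"


# Group A: motorbike riders; group B: car users; outcome: died in traffic.
direction = association_direction(11, 10_000, 54, 100_000)
print(direction)
```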
Warning: an association does NOT imply causation.
Just because two things happen together does not mean one
of them caused the other.
Example (this is actually true): Being left-handed has a negative
association with living a long life.
That’s right, left-handed people historically don’t live as long as
right-handed people.
Source: Evidence for longevity differences between left handed and right handed men: an archival
study of cricketers. (1991)
J P Aggleton, R W Kentridge, and N J Neave
What’s so horribly wrong with lefties?
…nothing. Only the ones that died younger were counted.
Not long ago, left-handedness was considered undesirable. Children were
forced to adopt right-handedness, but not anymore.
If you looked at all the deaths today, more of the people that
were born a long time ago will be right handed.
So more of the people that had long lives ending today will
have been right handed.
It isn’t that lefties are dying young, it’s that lefties make up a
smaller portion of old people than of young people.
The authors of the source paper didn’t account for forced-handedness.
Forced-handedness is a lurking variable, or a
confound.
It’s something other than handedness that affects life
expectancy, but hasn’t been accounted for.
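A small arithmetic sketch shows how this can happen. All of the numbers below are HYPOTHETICAL and invented for illustration; none come from the cited study. Within each cohort, handedness has no effect on age at death, yet the lefties still end up with a lower average age at death.

```python
# HYPOTHETICAL numbers, invented for illustration only.
# Each cohort: (age at death, deaths observed today, share recorded as left-handed)
cohorts = [
    (80, 1000, 0.03),  # born long ago, when left-handedness was suppressed
    (40, 200, 0.12),   # born more recently, when it was not
]


def mean_age_at_death(left_handed):
    """Average age at death among today's deaths for one handedness group."""
    total_age = 0.0
    total_people = 0.0
    for age, deaths, left_share in cohorts:
        share = left_share if left_handed else 1 - left_share
        people = deaths * share
        total_age += age * people
        total_people += people
    return total_age / total_people


left_mean = mean_age_at_death(True)
right_mean = mean_age_at_death(False)
print(round(left_mean, 1), round(right_mean, 1))
```

The gap comes entirely from lefties being a smaller share of the older cohort, which is the lurking variable at work.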
Unless you control for all other possible variables, you can’t pin
an association down to one thing causing the other. (That is almost
never possible in social research.)
Later in the semester when we see the interval version of
association, called correlation, we’ll revisit this again.
Crosstabs in SPSS
To build a crosstab table:
Analyze → Descriptive Statistics → Crosstabs
In the crosstab pop-up, move one variable to rows and one to
columns, and click “OK”
In the output window, the crosstab will appear with the labels
instead of the variable names if you set them.
Another name for a crosstab is a contingency table.
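Outside SPSS, the same kind of table can be tallied from case-level data, one row per respondent. The sketch below uses a handful of made-up cases (not the lecture data) and is not meant to describe how SPSS works internally.

```python
from collections import Counter

# Hypothetical case-level data: one (pet, ice cream) pair per respondent.
cases = [
    ("Cat", "Vanilla"), ("Cat", "Vanilla"), ("Cat", "Chocolate"),
    ("Dragon", "Chocolate"), ("Dragon", "Chocolate"), ("Dragon", "Vanilla"),
    ("Dragon", "Chocolate"),
]

# Each distinct (row, column) pair becomes one cell of the crosstab.
cell_counts = Counter(cases)
rows = sorted({pet for pet, _ in cases})
cols = sorted({flavour for _, flavour in cases})

# Print a header line, then one line of counts per row category.
print(f"{'':10}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:10}" + "".join(f"{cell_counts[(r, c)]:>10}" for c in cols))
```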
Next Time, we eat a honey badger*.
Next Lecture: Standard Deviation, for real.
*Patently untrue. Apologies to EpicMealTime