Today’s Agenda
- SPSS: Boxplots
- Ratios
- Crosstabs
- Conditionals
- Association and causation
- SPSS: Crosstabs

Relevant text: Chapter 2, p. 54-60

Note: We haven’t covered response/dependent variables and explanatory/independent variables yet, and they won’t fit nicely into the next couple of lectures. So Question 3c on Assignment 1 is for a bonus mark.

From last time:
- Boxplots are good for visualizing the general trend in interval data.
- They show everything in the five number summary (Min-Q1-Median-Q3-Max), the whiskers and the outliers.
- Side-by-side boxplots can be used to compare the distributions of multiple sets of interval data.

To build a side-by-side boxplot in SPSS, go to
Graphs > Legacy Dialogs > Boxplot
Then, in the boxplot pop-up, switch to “Summaries of separate variables”, and click “Define”. This assumes that we’re comparing data from different variables, like X, Y, and Z in the SPSS example from last week.
Move the variables you want plotted into “Boxes Represent”.
Then click OK.
- The resulting chart shows one box per variable. Side-by-side boxplots can be used to compare more than 2 variables.
- But boxplots can only display interval data.

When we have no interval data, we need something else, like cross tabulations (or crosstabs).

Cross tabulations (literally “cross tables”) are a non-graphical method of summarizing two variables of nominal or ordinal data.

                   Favoured Ice Cream
Favoured Pet     Vanilla   Chocolate   Total
Cat                   72          29     101
Dragon               210         825    1035
Total                282         854    1136

Each row represents a category of one variable.
The row totals show that 101 people prefer cats, and 1035 people prefer bearded dragons.

Each column represents a category of the other variable.
The column totals show that 282 people prefer vanilla, and 854 people prefer chocolate.
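As a rough illustration, the row and column totals can be recomputed from the four inner counts of the pet and ice cream table. This is a minimal Python sketch, not part of the SPSS workflow; the names counts and margins are made up for this example, and only the four counts come from the table.

```python
# Inner counts copied from the pet / ice cream table in these notes.
# Keys are (favoured pet, favoured ice cream). The names used here
# (counts, margins) are hypothetical, chosen only for this sketch.
counts = {
    ("Cat", "Vanilla"): 72,
    ("Cat", "Chocolate"): 29,
    ("Dragon", "Vanilla"): 210,
    ("Dragon", "Chocolate"): 825,
}


def margins(cells):
    """Return (row_totals, col_totals, grand_total) for a dict of counts."""
    row_totals, col_totals = {}, {}
    for (row, col), n in cells.items():
        # Add this count to the running total of its row and of its column.
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(cells.values())


row_totals, col_totals, grand_total = margins(counts)
print(row_totals)   # totals per favoured pet
print(col_totals)   # totals per favoured ice cream
print(grand_total)  # everyone in the table
```

The totals printed should match the Total row and Total column of the table, which is a quick way to check that a crosstab was typed in correctly.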
Each cell represents the number of cases that are in both the row and column categories.

                   Favoured Ice Cream
Favoured Pet     Vanilla   Chocolate   Total
Cat                   72          29     101
Dragon               210         825    1035
Total                282         854    1136

72 people like vanilla AND cats best. (boring)
825 people like chocolate AND dragons best. (clearly better)

Another perspective: Of the people that prefer cats to dragons, 72 like vanilla.

Conditionals

When we only consider one row or one column, we are conditioning on that response.

“Of the people that prefer cats to dragons, 72 like vanilla.”
means: Conditional on a preference for cats, 72 (out of 101) prefer vanilla.

To go further we need the ratio.

A ratio is simply a measure of how much of one thing there is in comparison to another.

Example: The fertility rate in Canada is 1.49 children per woman.
Comparing # of (expected) children to # of women.

Example: The DeLorean car goes 88 miles per hour.
Comparing # of miles traveled to # of hours passed.

Ratios can be used to make fair comparisons between things that come from different scales.

Canada has a few more hockey players than the US, but a MUCH bigger hockey-player-to-citizen ratio.
Source: IIHF Survey of Players

These comparisons can be made over time too.
- There are roughly as many traffic fatalities in the US now as there were 60 years ago. (~30,000)
- Traffic fatalities are considered to be “at an all-time low” because much more driving happened in 2009 than in 1949, yet it resulted in about the same number of deaths.
- The number of fatalities per mile is much lower now. (About 1/6 as much)
Source: National Highway Traffic Safety Administration
http://www-fars.nhtsa.dot.gov/Main/index.aspx

Ratios let us make a fair comparison between different conditions.

                   Favoured Ice Cream
Favoured Pet     Vanilla   Chocolate   Total
Cat                   72          29     101
Dragon               210         825    1035
Total                282         854    1136

There may be more total vanilla fans among dragon fans, but…
72 of 101, or 71%, of cat fans prefer vanilla.
210 of 1035, or 20%, of dragon fans prefer vanilla.
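The conditional percentages can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the counts from the pet and ice cream table; the names counts and conditional_on_row are invented for the example.

```python
# Inner counts from the pet / ice cream table, keyed by
# (favoured pet, favoured ice cream). All names here are hypothetical.
counts = {
    ("Cat", "Vanilla"): 72,
    ("Cat", "Chocolate"): 29,
    ("Dragon", "Vanilla"): 210,
    ("Dragon", "Chocolate"): 825,
}


def conditional_on_row(cells, row):
    """Share of each column category among the cases in one row."""
    # Keep only the cells in the chosen row (the condition).
    row_cells = {col: n for (r, col), n in cells.items() if r == row}
    # Divide each count by the row total, not by the grand total.
    total = sum(row_cells.values())
    return {col: n / total for col, n in row_cells.items()}


for pet in ("Cat", "Dragon"):
    shares = conditional_on_row(counts, pet)
    # Print each share as a whole-number percentage.
    print(pet, {flavour: f"{share:.0%}" for flavour, share in shares.items()})
```

The key choice is the denominator: dividing by the row total (101 cat fans, 1035 dragon fans) is what makes this a conditional ratio rather than a share of the whole sample.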
When we compare the ratios instead of the raw numbers, we account for the different sizes of the groups being considered.

                      Safety Status
Vehicle      Died in Traffic      Else      Total
Motorbike                 11     9,989     10,000
Car                       54    99,946    100,000
Total                     65   109,935    110,000

Which do you think is more dangerous? Motorbikes or cars?

Cars have more fatalities, but the fatalities per user are much higher with motorbikes.
11/10,000 = 0.11% fatality rate on motorbikes.
54/100,000 = 0.05% fatality rate in cars.

So far we’ve only seen 2x2 crosstabs, but larger ones are possible.

The age and sex table from last week is a (2 x lots) crosstab. (Also, age is ordinal but sex is nominal; they mix fine.)

Ordinal variables can be included because they’re still categories. You can have more than two categories for both variables too.

                                  Child’s Education
Mother’s Education    <HS     HS   College    BSc   MSc+   TOTAL
< High School         142    381       112     31      2     668
High School           157    637       225     57     14    1090
Some College           36    206       486    549     98    1375
BSc                    18     25        68    410    103     624
Adv. Degree             4     11        22     91     54     182
TOTAL                 357   1260       913   1138    271    3939

Association and causation

If the response from one variable is more common when a particular response from another variable appears, we say there is a positive association between the two responses.

A negative association means one response is less common when the other happens.

                      Safety Status
Vehicle      Died in Traffic      Else      Total
Motorbike                 11     9,989     10,000
Car                       54    99,946    100,000
Total                     65   109,935    110,000

Here we would say that riding a motorbike has a positive association with dying in traffic.

But an association does NOT imply causation! Just because two things happen together does not mean one of them caused the other.

Example (this is actually true): Being left handed has a negative association with living a long life.

That’s right, left handed people historically don’t live as long as right handed people.
Source: J. P. Aggleton, R. W. Kentridge, and N. J. Neave (1991). Evidence for longevity differences between left handed and right handed men: an archival study of cricketers.

What’s so horribly wrong with lefties?
…nothing. Only the ones that died younger were counted.

Not long ago, left handedness was considered something to be corrected. Children were forced to adopt right handedness, but not anymore.

If you looked at all the deaths today, more of the people that were born a long time ago will be right handed. So more of the people that had long lives ending today will have been right handed.

It isn’t that lefties are dying young, it’s that lefties make up a smaller portion of old people than of young people.

The authors of the source paper didn’t account for forced-handedness.

Forced-handedness is a lurking variable, or a confound. It’s something other than handedness that affects the observed life expectancy, but hasn’t been accounted for.

Unless you control for all other possible variables, you can’t pin an association down to one thing causing the other. (That is almost never possible in social research.)

Later in the semester, when we see the interval version of association, called correlation, we’ll revisit this.

Crosstabs in SPSS

To build a crosstab table:
Analyze > Descriptive Statistics > Crosstabs

In the crosstab pop-up, move one variable to rows and one to columns, and click “OK”.

In the output window, the crosstab will appear with the variable labels instead of the variable names, if you have set labels.

Another name for a crosstab is a contingency table.

Next Time, we eat a honey badger*.
Next Lecture: Standard Deviation, for real.

*Patently untrue. Apologies to EpicMealTime