LAB GUIDE (GARNER)

CROSSTABS, CONTINGENCY TABLES, AND JOINT FREQUENCY DISTRIBUTIONS

In this lab we will work on producing and interpreting crosstabs in SPSS/PASW. Crosstabs is a data analysis technique for categorical variables: all variables need to be at the nominal, ordinal, or dichotomous level of measurement.

The Data Set

We will be using General Social Survey (GSS) data from a survey of a representative sample of the US adult population. This survey of demographic characteristics and opinions is carried out regularly. The particular data set we are using is a bit old (as you can see from the question about presidential candidates), but it is perfectly useful for learning how to make and read tables (just don't view it as contemporary information). Alternatively, Statistics Canada data can be used for this type of analysis.

The Variables

Use only categorical (nominal, ordinal, and dichotomous) variables in the tables. Do not use variables measured at the interval-ratio level. (OK, go ahead and try some, but notice what happens, and do NOT print out the result!) For example, "years of education" is not a good choice for a crosstab; use "highest degree" instead.

It is a good idea to use variables that do NOT have a lot of categories; avoid producing tables with many cells and very small cell sizes. If you make a table from variables with too many categories, the cases end up distributed across too many cells with very small counts, and you may get a warning that expected cell counts are less than 5 for the computation of the chi-square test statistic.

The Procedure for Bivariate (2-Variable) Crosstabs

Click Analyze - Descriptive Statistics - Crosstabs.

1. Decide which two variables you would like to use (suggestions follow).
2. Decide which one is the independent variable and which is the dependent variable. If you can't make up your mind, make two separate tables.
3.
Place the INDEPENDENT VARIABLE in the COLUMN(S) box and the DEPENDENT VARIABLE in the ROW(S) box. This gives your table the correct title (DV by IV), and it follows best practice.
4. Click on Cells and, in the "Cell Display" dialog box, check percentages ONLY IN THE COLUMNS (i.e., wherever you placed the independent variable). DO NOT percentage anywhere else! (More percentages are NOT better; they just make the table messy and unreadable.) You can tell you have done it correctly if the column percentages add up to 100% down each column.
5. Click on Statistics and request chi-square (the test of significance).
6. Also in the "Statistics" dialog box, request lambda for nominal data and gamma for ordinal data (measures of association).

Interpreting Output

1. First look at the significance level of chi-square. It is a p-value, and only if it is < .05 are the differences in the percentage distributions statistically significant.
2. Do not confuse the computed value of chi-square, the test statistic, with its p-value ("Sig."). The "Sig." is the easier one to read when reaching a conclusion.
3. A low "Sig." value (less than .05) allows us to REJECT the null hypothesis that there is no relationship between the variables. (The null hypothesis is that the variables are independent of each other; in other words, that they are not related.)
4. If you want to know how chi-square was computed, you can click Expected in the "Cell Display" dialog box, as well as Observed, and the table will show you the cell counts expected under the null hypothesis. These values are then used to compute chi-square according to the formula (see the text for the formula). The final version of the table should not include expected cell counts, however; displaying them is like having training wheels on your bike.
5. Look at the percentage distributions, comparing across the columns.
This shows differences among the independent variable categories in the proportion of cases that fall into each dependent variable category.
6. You may want to look at whether the modes of the DV differ across categories of the IV, although that is less important than the test of significance and the overall "tilt" of the percentage comparison.
7. You can look at one or the other of the measures of association (lambda for nominal variables, gamma for ordinal variables), with caution. Lambda will be 0 if the modes are the same, even if there are considerable and significant percentage differences.

Writing about It

Discuss what the question was in terms of the relationship of the variables. Were the results significant? What does the comparison of percentages tell us about the relationship of these variables?

What Variable Combinations to Try

1. Use sex of respondent as the independent variable, and then try (one at a time) other characteristic and opinion variables as the dependent variable: e.g., life exciting or dull, respondent's highest degree, belief in life after death, one of the musical preference questions, importance of having a fulfilling job, if rich keep working, and income quartiles. (Use any of these as DVs with respondent's sex as the IV.)
2. Next, try another demographic characteristic, such as total family income in quartiles or college degree status (degree2), as your independent variable, and see if it is related to various characteristics and opinions.
3. Finally, see whether opinion variables are related to each other. For example, is respondent's happiness related to liking country music?

What Results to Keep and Write Up

Select four tables to print out for future reference. At least one should show a result that is NOT significant (p >= .05); a p-value that large means that we fail to reject the null hypothesis of the independence of the variables.
This result will help you become accustomed to seeing both significant and non-significant chi-square results. Remember: if the p-value is less than .05, the result is significant; a small p-value goes with a large computed chi-square, one big enough to exceed the critical value in the chi-square table. If the p-value is too large (>= .05), the result is not significant; this happens when the computed chi-square is small and fails to exceed the critical value.

If this gets confusing, think back to how chi-square is computed. The very first step is to subtract the expected (null hypothesis) count from the observed count found in the real data. Thus the computed chi-square is large when there is a big discrepancy between the data we observed and the expected values calculated under the null hypothesis. A large chi-square goes with a small p-value (it has a low probability of having come up due to sampling error), and it points to a good likelihood that there is a difference in the population.

For each table with significant results, write a short paragraph "in English" discussing what you found.

Going a Bit Further

When you feel comfortable producing and reading bivariate tables, you may want to try a few three-variable tables. If so, introduce your third variable as the "Layer" variable. Use a dichotomous variable for the third (layering) variable (such as gender or whether someone has a college degree); otherwise the results will be very hard to read. SPSS/PASW will create a table, with its own chi-square test, separately for each category of the third variable; one layer may have a significant result while the other(s) may not.
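For readers who want to check SPSS output by hand, the bivariate workflow above can be sketched in Python with pandas and scipy. This is a minimal illustration, not part of the lab: the variable names ("sex", "degree") and the toy data are hypothetical, not taken from the GSS file.

```python
# A rough Python (pandas + scipy) equivalent of the SPSS crosstab steps.
# Variable names and data are made up for illustration only.
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data: IV = sex, DV = highest degree.
df = pd.DataFrame({
    "sex":    ["M", "F", "F", "M", "F", "M", "F", "M", "F", "M",
               "F", "M", "F", "F", "M", "M", "F", "M", "F", "F"],
    "degree": ["HS", "BA", "HS", "HS", "BA", "BA", "HS", "HS", "BA", "HS",
               "BA", "HS", "HS", "BA", "BA", "HS", "HS", "BA", "BA", "HS"],
})

# DV in the rows, IV in the columns, as the lab instructs (title: DV by IV).
observed = pd.crosstab(df["degree"], df["sex"])

# Column percentages: each column should add up to 100% down the column.
col_pct = pd.crosstab(df["degree"], df["sex"], normalize="columns") * 100
print(col_pct.round(1))

# Chi-square test of independence: chi2 = sum over cells of (O - E)^2 / E.
# chi2 is the computed test statistic; p corresponds to SPSS's "Sig." value.
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}, df = {dof}")
if p < 0.05:
    print("Reject the null hypothesis: the variables appear to be related.")
else:
    print("Fail to reject the null hypothesis: no evidence of a relationship.")
```

One caution if you compare the numbers: for 2x2 tables, scipy applies the Yates continuity correction by default, so its statistic matches SPSS's "Continuity Correction" row rather than the uncorrected Pearson chi-square line.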
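The "Layer" idea can be sketched the same way: run one crosstab and one chi-square test separately within each category of a dichotomous third variable. Again, this is only an illustration; the variable names ("college", "sex", "happy") and the data are hypothetical.

```python
# A rough Python (pandas + scipy) sketch of layered (three-variable) crosstabs:
# the same test is run separately within each category of the layering variable.
# Variable names and data are made up for illustration only.
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data: DV = happy, IV = sex, layering variable = college (yes/no).
rows = (
    [("yes", "M", "happy")] * 4 + [("yes", "M", "not happy")] * 1 +
    [("yes", "F", "happy")] * 1 + [("yes", "F", "not happy")] * 4 +
    [("no",  "M", "happy")] * 2 + [("no",  "M", "not happy")] * 2 +
    [("no",  "F", "happy")] * 3 + [("no",  "F", "not happy")] * 2
)
df = pd.DataFrame(rows, columns=["college", "sex", "happy"])

# One table and one test per layer, mirroring SPSS's layered output;
# one layer may come out significant while the other does not.
p_values = {}
for layer, subset in df.groupby("college"):
    table = pd.crosstab(subset["happy"], subset["sex"])
    chi2, p, dof, expected = chi2_contingency(table)
    p_values[layer] = p
    print(f"college = {layer}: chi-square = {chi2:.2f}, p = {p:.3f}")
```

Note that these tiny subgroups are exactly the situation the lab warns about: splitting the sample by a layer variable shrinks the cell counts, so expected counts below 5 (and unreliable chi-square values) become much more likely.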