Crosstabs, Contingency Tables, and Joint Frequency Distributions

LAB GUIDE (GARNER)
In this lab we will work on producing and interpreting crosstabs in SPSS/PASW.
Crosstabs is a data analysis technique for categorical variables; all variables must be at the
nominal, ordinal, or dichotomous level of measurement.
The Data Set
We will be using General Social Survey (GSS) data from a survey of a representative sample of
the US adult population. This survey of demographic characteristics and opinions is carried out
regularly. The particular data set we are using is a bit old (as you can see from the question about
presidential candidates), but it is perfectly useful for learning how to make and read tables (just
don’t view it as contemporary information). Alternatively, Statistics Canada data can be used for
this type of analysis.
The Variables
 Use only categorical (nominal, ordinal, and dichotomous) variables in the tables.
 Do not use variables measured at the interval-ratio level. (OK, go ahead and try some, but
notice what happens, and do NOT print out the result!) For example, “years of education”
is not a good choice for a crosstab. (Use “highest degree” instead.)
 It is a good idea to use variables that do NOT have many categories; avoid producing
tables with many cells and very small cell sizes. If your variables have too many
categories, the cases will be spread across too many cells with very small counts, and you
may get a warning about expected cell counts being less than 5 for the computation of the
chi-square test statistic.
The Procedure for Bivariate (2-variable) Crosstabs: Click Analyze–Descriptive Statistics–
Crosstabs
1. Decide which two variables you would like to use (suggestions follow).
2. Decide which one is the independent variable and which is the dependent variable. If you
can’t make up your mind, make two separate tables.
3. Place the INDEPENDENT VARIABLE in the COLUMN(S) and the DEPENDENT
VARIABLE in the ROW(S). This will give your table the correct title (DV by IV) and
follows standard practice.
4. Click on Cells and, in the “Cell Display” dialog box, request percentages ONLY IN THE
COLUMNS (i.e., wherever you placed the independent variable). DO NOT
PERCENTAGE ANYWHERE ELSE! (More percentages are NOT better; they just
make the table messy and unreadable.) You can tell you have done it correctly if the
percentages add up to 100% down each column.
5. Click on Statistics and request chi-square (test of significance).
6. Also in the “Statistics” dialog box, request lambda for nominal data and gamma for
ordinal data (measures of association).
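SPSS builds the table for you through the dialog boxes, but the arithmetic behind steps 3 and 4 is small enough to sketch by hand. The following Python snippet (the respondent data are invented for illustration, not taken from the GSS file) tallies joint frequencies and then percentages down each column, so the DV distribution can be compared across categories of the IV:

```python
from collections import Counter

# Hypothetical raw data: (independent, dependent) pairs, e.g.
# (respondent's sex, belief in life after death). Invented counts.
pairs = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "Yes"),
    ("Female", "Yes"), ("Female", "Yes"), ("Female", "No"),
    ("Female", "Yes"), ("Male", "No"),
]

counts = Counter(pairs)                 # joint frequency distribution
cols = sorted({iv for iv, _ in pairs})  # IV categories become columns
rows = sorted({dv for _, dv in pairs})  # DV categories become rows
col_totals = {c: sum(counts[(c, r)] for r in rows) for c in cols}

# Column percentages: each column sums to 100%, so we can compare
# the DV distribution across categories of the IV.
for r in rows:
    cells = [f"{100 * counts[(c, r)] / col_totals[c]:5.1f}%" for c in cols]
    print(r, *cells)
```

Reading down each printed column mirrors reading a percentaged SPSS crosstab: here the "Yes" share differs between the two IV categories, which is exactly the kind of tilt step 5 of the interpretation asks you to look for.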
Interpreting Output
1. First look at the significance level of chi-square. It is a p-value, and only if it is <.05 are
the differences in the percentage distributions statistically significant.
2. Do not confuse the computed value of chi-square, the test statistic, with its p-value
(“Sig.”). The “Sig.” value is the easier of the two to read when drawing a conclusion.
3. A low probability for “Sig.” (less than .05) allows us to REJECT the null hypothesis that
there is no relationship between the variables. (The null hypothesis is that the variables
are independent of each other, in other words, that they are not related.)
4. If you want to know how chi-square was computed, you can click Expected in the “Cell
Display” dialog box, as well as Observed, and the table will show you the cell counts
implied by the null hypothesis. These expected values are then used to compute
chi-square according to the formula given in the text. (The final version of the table
should not include expected cell counts, however; displaying them is like having training
wheels on your bike.)
5. Look at the percentage distributions, comparing across the columns. This comparison
shows how the distribution of cases over the dependent variable categories differs among
the categories of the independent variable.
6. You may want to look at whether the modes of the DV are different for different
categories of the IV, although that is less important than the test of significance and the
overall “tilt” of the percentage comparison.
7. You can look at one or the other of the measures of association (lambda for nominal
variables, gamma for ordinal variables)—with caution. Lambda will be 0 if the modes are
the same, even if there are considerable and significant percentage differences.
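For the curious, both measures of association can be sketched by hand. The snippet below computes Goodman-Kruskal lambda and gamma for a hypothetical 2x2 table (rows = DV categories, columns = IV categories); the counts are invented for illustration:

```python
# Hypothetical 2x2 table: rows = DV categories, columns = IV categories.
obs = [[30, 10],
       [20, 40]]
n = sum(sum(row) for row in obs)
n_rows, n_cols = len(obs), len(obs[0])
row_totals = [sum(row) for row in obs]

# Goodman-Kruskal lambda (predicting the row/DV from the column/IV):
# the proportional reduction in prediction error over always guessing
# the modal row category.
e1 = n - max(row_totals)                                  # errors ignoring the IV
e2 = n - sum(max(obs[i][j] for i in range(n_rows))        # errors using each
             for j in range(n_cols))                      # column's modal cell
lam = (e1 - e2) / e1

# Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant)
# pairs; meaningful only when both variables are ordinal.
conc = disc = 0
for i in range(n_rows):
    for j in range(n_cols):
        for k in range(i + 1, n_rows):
            for m in range(n_cols):
                if m > j:
                    conc += obs[i][j] * obs[k][m]
                elif m < j:
                    disc += obs[i][j] * obs[k][m]
gamma = (conc - disc) / (conc + disc)
print(f"lambda = {lam:.3f}, gamma = {gamma:.3f}")
```

Note that if both columns of `obs` had the same modal row, `e2` would equal `e1` and lambda would be 0 even with real percentage differences, which is exactly the caution given in point 7.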
Writing about It
Discuss what the question was in terms of the relationship of variables. Were the results
significant? What does the comparison of percentages tell us about the relationship of these
variables?
What Variable Combinations to Try
1. Use sex of respondent as the independent variable, and then try (one at a time) other
characteristics and opinion variables as the dependent variable: e.g., life exciting or dull,
respondent’s highest degree, belief in life after death, one of the musical preference
questions, importance of having a fulfilling job, if rich keep working, and income
quartiles. (Use any of these as DVs with respondent’s sex as the IV.)
2. Next, try another demographic characteristic such as total family income in quartiles or
college degree status (degree2) as your independent variable, and see if it is related to
various characteristics and opinions.
3. Finally, you can try to see if opinion variables are related to each other. For example, is
respondent’s happiness related to liking country music?
What Results to Keep and Write Up
Select four tables to print out for future reference. At least one should show a result that is NOT
significant (p >= .05); a large p-value means that we fail to reject the null hypothesis that the
variables are independent. This will help you become accustomed to seeing both significant and
non-significant chi-square results.
Remember, if the p-value is less than .05, the result is significant; this small p-value is associated
with a large computed chi-square. It is big enough to exceed the critical value in the chi-square
table. If the p-value is too large (>= .05), the result is not significant; this happens when the
computed chi-square is small and fails to exceed the critical value.
If this gets confusing, think back to how chi-square was computed! Our very first step was to
subtract the expected null hypothesis count from the observed count found in the real data. Thus,
the computed chi-square is large when there is a big discrepancy between the data we observed
and the expected values that we calculated for the null hypothesis; a large chi-square goes with a
small p-value (such a result is unlikely to have come up through sampling error alone). And the
large chi-square points to a good likelihood that there is a real difference in the population.
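The computation described above can be traced in a few lines of Python. With a hypothetical observed table (counts invented for illustration), we form the expected counts under independence, sum the squared discrepancies, and compare the result to the .05 critical value for the table's degrees of freedom:

```python
# Hypothetical observed 2x2 table (rows = DV categories, columns = IV categories).
obs = [[30, 10],
       [20, 40]]
n = sum(sum(row) for row in obs)
row_tot = [sum(row) for row in obs]
col_tot = [sum(row[j] for row in obs) for j in range(len(obs[0]))]

# Expected count under the null hypothesis of independence:
# E_ij = (row total * column total) / N
expected = [[row_tot[i] * col_tot[j] / n for j in range(len(col_tot))]
            for i in range(len(row_tot))]

# Chi-square: sum over cells of (observed - expected)^2 / expected.
chi_sq = sum((obs[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(row_tot)) for j in range(len(col_tot)))

df = (len(row_tot) - 1) * (len(col_tot) - 1)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # alpha = .05
print(f"chi-square = {chi_sq:.3f}, df = {df}, "
      f"significant: {chi_sq > CRITICAL_05[df]}")
```

SPSS reports the exact p-value ("Sig.") rather than a critical-value lookup, but the two approaches reach the same conclusion: a computed chi-square above the critical value corresponds to a p-value below .05.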
For each table with significant results write a short paragraph “in English” discussing what you
found.
Going a Bit Further
When you feel comfortable producing and reading bivariate tables, you may want to try a few
three-variable tables. If so, introduce your third variable as the “Layer” variable. Use a
dichotomous variable for the third (layering) variable (such as gender or whether someone has a
college degree); otherwise, the results will be very hard to read. SPSS/PASW will create tables
with a chi-square test separately for each category of the third variable, and each will have its
own chi-square result—one layer may have a significant result, and the other(s) may not.
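A layered analysis amounts to running the same chi-square computation once per category of the third variable. A minimal sketch with invented counts, showing one layer that is significant and one that is not:

```python
def chi_square(obs):
    """Pearson chi-square for a contingency table (rows = DV, cols = IV)."""
    n = sum(sum(row) for row in obs)
    row_tot = [sum(row) for row in obs]
    col_tot = [sum(row[j] for row in obs) for j in range(len(obs[0]))]
    return sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(obs)) for j in range(len(obs[0])))

# One hypothetical 2x2 DV-by-IV table per category of a dichotomous
# layering variable (say, college degree: yes / no). Counts are invented.
layers = {
    "degree: yes": [[35, 15], [15, 35]],   # a clear tilt in this layer
    "degree: no":  [[26, 24], [24, 26]],   # nearly flat in this layer
}
for name, table in layers.items():
    stat = chi_square(table)
    print(f"{name}: chi-square = {stat:.3f}, "
          f"significant at .05 (df=1): {stat > 3.841}")
```

This mirrors what SPSS prints for a layered crosstab: one sub-table and one chi-square result per layer, and they need not agree.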