Chapter 11 Contingency Table Analysis

Nonparametric Systems
• Another method of examining the relationship between independent (X) and dependent (Y) variables is contingency table analysis.
• Up to this point we have used parametric statistics. These methods make a number of assumptions about the way that the population that serves as the basis for your research sample is distributed.
• Correlation and regression assume that the independent and dependent variables are linearly related.
• Other assumptions behind the use of parametric statistics include:
– Independent observations: the measurement or selection of one case does not affect the measurement or selection of another case
– The level of measurement for the variables is at least interval in nature
• If these assumptions cannot be met, the researcher has the option of using nonparametric statistics.
• They are useful when the data are nominal or ordinal, and they require no assumptions about the population parameters.
• Parametric statistics are usually preferred because they are more powerful. The power of a statistic is its ability to reject a false null hypothesis. Accepting a false null hypothesis (reaching the conclusion that there are no differences between the sample and the population when in fact there are) is a Type II error. The greater the power of the statistic, the less likely the researcher is to commit a Type II error.
• In this chapter we will consider one particular type of nonparametric measure, the chi-square (χ²), and some of the measures of association that are used with nominal and ordinal data.
• Chi-squared is most appropriate when the data are divided into mutually exclusive categories that cannot be legitimately summed: data at the nominal or ordinal level.
• Chi-squared tells us whether the observed distribution is significantly different from the one that we would expect to occur by chance.
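To make the comparison of observed and chance-expected distributions concrete, here is a minimal sketch in Python. The counts are hypothetical, not the survey data used in this chapter, and the function name chi_square is chosen only for illustration. It assumes the standard Pearson formula, in which each expected frequency is (row total × column total) / N and the statistic is the sum of (observed − expected)² / expected over all cells. The p-value shortcut shown is valid only for one degree of freedom.

```python
import math

def chi_square(observed):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected by chance if rows and columns are unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical counts: rows = favor / oppose, columns = group A / group B
observed = [[30, 20],
            [10, 40]]

stat, df = chi_square(observed)

# For df == 1 only, the upper-tail probability has this closed form
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi-square = {stat:.3f}, df = {df}, significant at .05: {p_value < 0.05}")
```

With larger tables the degrees of freedom change, so the p-value would come from a chi-square table or a statistics package rather than this shortcut.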
Constructing Contingency Tables
• A contingency table is a joint frequency distribution: a frequency distribution with two categorical variables.
• Again we are concerned with the relationship between the independent and dependent variables.
• The contingency table is also known as a crosstabulation, because it counts the cases that fall into each pairing of the categories of the two variables.
• The cells contain the joint frequencies: the number of cases that fit the categories described by the cross listing of the variables.
• It is called a contingency table because the cases contained along the rows (the categories of the dependent variable) are contingent upon what is contained along the columns (the categories of the independent variable).
• Consider a crosstabulation of race and attitudes toward capital punishment from the National Crime Survey data set. Our research hypothesis is that whites are more likely to favor capital punishment than are minorities.
• The examination begins with a look at the frequency distribution of a variable, including the percentage within each category as it relates to the entire group.
• Marginals are the total frequencies, so called because they are presented at the margins of the table.
• However, we are usually concerned with subgroup analysis: the breakdown of frequencies and percentages of the dependent variable as they are categorized under the independent variable. As with correlation, the assumption is that the independent variable produces an effect on the dependent variable.
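A crosstabulation with marginals and column percentages can be sketched in a few lines of Python. This is only an illustration: the case list below is made up and is not the National Crime Survey data, and the names cells, col_totals, row_totals, and column_percent are hypothetical helpers. The independent variable (race) forms the columns and the dependent variable (attitude) forms the rows.

```python
from collections import Counter

# Hypothetical (race, attitude) pairs; NOT the survey data from the text
cases = ([("white", "favor")] * 6 + [("white", "oppose")] * 2 +
         [("minority", "favor")] * 3 + [("minority", "oppose")] * 3)

cells = Counter(cases)                      # joint frequencies (one per cell)
col_totals = Counter(x for x, _ in cases)   # column marginals (independent variable)
row_totals = Counter(y for _, y in cases)   # row marginals (dependent variable)

def column_percent(x, y):
    """Percentage of the cases in column x that fall in row y."""
    return 100 * cells[(x, y)] / col_totals[x]

columns = ("white", "minority")
for y in ("favor", "oppose"):
    row = [f"{cells[(x, y)]} ({column_percent(x, y):.1f}%)" for x in columns]
    print(f"{y:<8}", *row, f"total={row_totals[y]}")
print(f"{'total':<8}", *[col_totals[x] for x in columns], f"N={len(cases)}")
```

Reading across a row compares the subgroups of the independent variable on one category of the dependent variable, which is the comparison the subgroup analysis calls for.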
• The usual procedure is to construct the table so that the independent variable is listed along the columns and the dependent variable along the rows. We are interested in the impact of the independent variable on the dependent variable, so we read the table by comparing the column percentages (the independent variable subgroups) across the categories of the dependent variable (rows).

Rules for the Construction and Interpretation of Tables
• 1. Divide the sample into categories based upon the values of the independent variable.
• 2. The table should be fully labeled. The categories of the independent and dependent variable should be clearly presented. The variable headings should describe what is contained in the table.
• 3. The independent variable follows the columns of the table. The dependent variable follows the rows of the table.
• 4. Each subgroup is described in terms of the categories of the dependent variable.
• 5. To read the table, compare the independent variable subgroups (columns) in terms of their percentages within each category of the dependent variable (rows).

Chi-Square Test for Independent Samples
• In statistical analysis, conclusions typically result from a description of the findings. Inferential statistics then allow us to make a decision about the null hypothesis and whether this finding would hold true if we had the data from the entire population.
• In the previous example a statistical test is needed to determine whether we can assume that this difference in attitudes on capital punishment between racial groups also exists in the entire population.
• The data are at the nominal (race) and ordinal (support for capital punishment) level of measurement. The groups and the choices fall into different categories.
• We can use chi-squared to test whether the frequencies we observed in our survey results (observed frequencies) differ from an expected (hypothesized) set of frequencies. With chi-squared, these expected frequencies represent what we could expect to occur by chance.
• Chi-squared is based upon the differences between observed and expected frequencies. It tells us the probability of obtaining differences this large between the observed and expected frequencies by chance.
• If the observed frequencies (the survey results) differ from the expected frequencies enough to be significant at the .05 level, then the null hypothesis can be rejected. If they do not substantially differ, the difference between the two sets of frequencies could be due to sampling error.

Limitations of Chi-Squared
• 1. The sample must be randomly selected.
• 2. Each category must be independent: the way in which one response is categorized does not influence the way that another response is listed.
In our example, the opinion of one respondent did not affect another in terms of his/her attitude toward the death penalty.
• 3. Each cell must have an expected frequency of no less than five.

Calculation of Chi-Squared
• Chi-squared is relatively easy to calculate by hand with a calculator. In Table 11.2 we show how to calculate chi-squared by hand in our example of the relationship between race and attitude toward capital punishment.
• Insert Table 11.2

Calculation of Chi-Squared using SPSS
• To calculate chi-squared using SPSS, take the following steps.
• 1. On the menu bar, click on “Analyze”.
• 2. On the drop-down menu, click on “Descriptive Statistics”.
• 3. On the next menu, click on “Crosstabs”. These steps are shown in Figure 11.1.
• 4. In the Crosstabs menu, the variables are listed in the left-hand window. Highlight “Favor: Death Penalty for Murderers” and paste it into the “Row” window by clicking on the arrow button. This is your dependent variable (Y): the respondent’s attitude toward capital punishment. Remember that Y is always the row variable in a contingency table. This is shown in Figure 11.2.
• 5. In the same window, highlight the independent variable (X), “Race Recode”, and paste it into the “Columns” window by clicking on the arrow button.
• 6. In the “Crosstabs” window, click on the “Statistics” button. The “Crosstabs: Statistics” menu then appears. Click on the box next to “Chi-square” to include a checkmark. Then click on the “Continue” button. This is shown in Figure 11.3.
• 7. When you return to the “Crosstabs” window, click the “Cells” button. The “Crosstabs: Cell Display” window appears. In this window, in the “Counts” section, click the box next to “Observed” to make a checkmark. Then in the “Percentages” section, click on the box next to “Column”. Now your contingency table will give you the observed frequencies for each cell.
The table will also contain the column percentages, that is, the percentages within each category of the independent variable. This is shown in Figure 11.4.
• 8. Click on the “Continue” button. You then return to the “Crosstabs” window. Click on “OK” to generate your contingency table and the chi-squared statistic. This is shown in Figure 11.5.

Figure 11.5
• The Crosstabs printout contains the contingency table and statistics. The first table tells us the number of cases in the sample that had valid information for these variables. The second table mirrors our Table 11.1. Our finding is that a higher percentage of whites favor the death penalty in murder cases, by a difference of 30 percentage points (whites, 78.5 percent; minorities, 48.5 percent).

SPSS Output
• What we still need is a decision as to whether this conclusion would be true if we had data from the entire U.S. population. This is where chi-squared, as an inferential statistic, comes in.
• One major limitation of chi-squared is that no cell can have an expected frequency of less than five. In our case the lowest expected frequency is 17.46, so we can assume that our chi-squared statistic is valid.
• In the chi-squared tests table, we see that with two degrees of freedom the Pearson chi-squared value of 86.304 is significant at .000. SPSS rounds the significance to three decimal places, so .000 means p < .001.
• Because .000 is less than .05, we reject the null hypothesis. Our research conclusion is statistically significant.

Measures of Association with Chi-Squared
• Another aspect of chi-squared analysis involves measures of association. These measures indicate the strength of a relationship between the independent and dependent variable.
• The measures of association available under SPSS Studentware are listed in the “Crosstabs: Statistics” screen. The following measures are listed under the “Nominal” section.

Cramer’s V
• Is useful with nominal data.
• It is probably the most popular of the three measures we have discussed because it has a lower limit of 0 (no relationship) and an upper limit of 1 (perfect relationship). Unlike C and phi, there is no need to do further calculations to determine the upper limit of Cramer’s V.

Introducing a Third Variable
• We introduce a third variable as a second independent or control variable.
• We reexamine the relationship between the original two variables (X and Y) within each of the categories of the control variable and then compare the results across the categories of the control variable.
• Returning to our examination of the forces that influence attitudes toward capital punishment, another key independent variable is sex.
• Table 11.5 shows this reexamination.

Conclusion
• Categorical data measured at the nominal and ordinal level are very common in criminal justice research.
• A contingency table is an excellent method to summarize and highlight research findings. Conclusions are drawn from the table and its results.
• Chi-squared and its accompanying measures of association provide a method to determine statistical significance and the strength of a relationship.
• In order to address complex problems such as crime, multivariate analysis must be conducted. Usually there is more than one contributing factor to social problems such as crime.
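As a closing illustration of how a measure of association and a control variable fit together, the sketch below computes Cramer’s V separately within each category of a control variable. It is a sketch under stated assumptions: the tables are hypothetical and do not reproduce Table 11.5, the function names are illustrative, and V is taken in its standard form, the square root of chi-squared divided by N × (k − 1), where k is the smaller of the number of rows and the number of columns.

```python
import math

def chi_square(table):
    """Return (Pearson chi-square, N) for a table given as rows of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = rows[i] * cols[j] / n   # frequency expected by chance
            stat += (obs - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramer's V: 0 means no relationship, 1 a perfect relationship."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

# Hypothetical X-by-Y tables, one per category of the control variable (sex).
# Rows = favor / oppose, columns = the two categories of X.
by_sex = {
    "male":   [[40, 20], [10, 30]],
    "female": [[30, 25], [20, 25]],
}

results = {sex: cramers_v(table) for sex, table in by_sex.items()}
for sex, v in results.items():
    print(f"{sex}: Cramer's V = {v:.3f}")
```

If V differs noticeably across the categories of the control variable, as it does for these made-up counts, the original relationship between X and Y depends on the third variable.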