Log-Linear Contingency Table Analysis, Two-Way

Please read the section on Likelihood Ratio Tests in Howell's Statistical Methods for Psychology (pp. 156-157 in the 7th edition) for a brief introduction to likelihood ratio tests and their use in log-linear analysis of data from contingency tables. You should also read Chapter 17 in Howell and Chapter 7 in Tabachnick and Fidell. Also recommended is David Garson's document at http://faculty.chass.ncsu.edu/garson/PA765/logit.htm .

The Data

The data for this assignment were presented in Chapter 9 of the SPSS Advanced Statistics Student Guide, 1990. We wish to determine whether or not there is an association between a person's marital status and happiness. The data are in the file LogLin2.sav on my SPSS Data Page. Download the data file and bring it into SPSS. In the data editor you will see that there are three variables: Happy, Marital, and Freq. Happy has two values, 1 (Yes) and 2 (No). Marital has three values, 1 (Married), 2 (Single), and 3 (Split). Each row in the data file represents one cell in the 2 x 3 contingency table, and the Freq variable is, for each cell, the number of observations in that cell. I used the WEIGHT CASES command to make SPSS treat the values of Freq as cell weights: from the data page, I selected Data, Weight Cases, Weight Cases By, Freq. When I saved the data file, the weighting information was saved with it, so you do not have to tell SPSS to use the Freq variable as cell counts; it already knows that when it opens the data file.

The PASW/SPSS Actions to Do the Analysis

1. Analyze, Descriptive Statistics, Crosstabs. Move Happy into the Rows box and Marital into the Columns box. Click Statistics and select Chi-square. Continue. Click Cells and select Observed and Column Percentages. Continue, OK. This produces the contingency table with the χ² analysis. (A Python sketch that reproduces these χ² values outside of SPSS appears after this list of steps.)

2. Analyze, Loglinear, Model Selection. Select "Enter in Single Step." Move Happy and Marital into the Factors box, defining the range for Happy as 1,2 and for Marital as 1,3. Click Model and select Saturated. Click Options and select Display Frequencies, Parameter Estimates, and Association Table. You do not need to display the residuals, since they will all be zero for a saturated model. Change "Delta" from .5 to 0; by default, SPSS adds .5 to every cell in the table to avoid the possibility of having cells with a frequency of 0. We have no such cells, so we need not do this, and doing so would have only a very small effect on the results. Click OK to conduct the analysis.

3. Same as 2, with these changes: for Model, select Custom and then move Happy and Marital into the Generating Class box, without any interaction term. Continue. For Options, ask to display frequencies and residuals. Continue, OK.

4. Go to the data page, Variable View, and declare 3 to be a missing value for Marital – click in the right-hand side of the Missing cell in the Marital row. OK. Now run crosstabs again, just as in step 1 -- but this time it will be one of the three 2 x 2 tables that can be constructed from our original 2 x 3 table.

5. Go back to the data page and declare 2 (but not 3) to be a missing value. Run crosstabs again.

6. Go back to the data page and declare 1 (but not 2) to be a missing value. Run crosstabs again.
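Before turning to the output, you may wish to check the crosstabs results outside of SPSS. The following is a minimal Python sketch (it assumes NumPy and SciPy are installed; neither is part of the SPSS/SAS workflow in this lesson, and the variable names are mine). It computes the Pearson and likelihood ratio χ² for the full 2 x 3 table from the cell counts in LogLin2.sav; the likelihood ratio value should match the 48.012 that SPSS reports.

import numpy as np
from scipy.stats import chi2_contingency

# Cell counts from LogLin2.sav
# rows: Happy = Yes, No; columns: Marital = Married, Single, Split
table = np.array([[787, 221, 301],
                  [ 67,  47,  82]])

pearson, p_pearson, df, expected = chi2_contingency(table)
lr, p_lr, _, _ = chi2_contingency(table, lambda_="log-likelihood")

print(round(pearson, 3), p_pearson)   # Pearson chi-square, df = 2
print(round(lr, 3), p_lr)             # likelihood ratio chi-square, about 48.01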
Crosstabs: Happiness is Related to Marital Status

Now look at the output from the first invocation of crosstabs. The table suggests that reported happiness declines as you move from married to single to split. Both the Pearson and the likelihood ratio χ² are significant.

Hierarchical Log Linear Analysis, Saturated Model

The Loglinear, Model Selection path called up an SPSS routine known as Hiloglinear. This procedure does a hierarchical log linear analysis. The generating class "happy*marital" means that the model will include the two factors as well as their interaction. When a model contains all of the possible effects (a so-called "saturated model"), it will predict all of the cell counts perfectly. Our model is such a saturated model.

Goodness of Fit Tests

The null hypothesis for these tests is that the model fits the data perfectly. Of course, with a saturated model that is true, so the p value will be 1.0. SPSS gives us additional goodness of fit tests, in which the value of the LR χ² is the amount by which the goodness of fit χ² would increase (indicating a reduction in how well the model fits the data) were we to delete certain effects from the model. For example, with these data, were we to drop the two-way effects (the Happy x Marital interaction), the LR χ² would increase by 48.012, a significant (p < .0001) decrease in how well the model fits the data.

Parameter Estimates

A saturated log-linear model for our two-variable design is of the form

LN(cell freq)ij = μ + λi + λj + λij

where the term on the left is the natural log of the frequency in the cell at level i of the one variable and level j of the other. μ is a constant, the natural log of the geometric mean of the expected cell frequencies; λi is the parameter associated with being at level i of the one variable, λj is the same for the other variable, and λij is the interaction parameter. Look at the "Parameter Estimates" portion of the output. You get one parameter for each degree of freedom, and you can compute the redundant parameter simply, since the coefficients must sum to zero across the categories of a variable.

For the Main Effect of Marital Status

For Marital = 1 (married), λ = +.397; for Marital = 2 (single), λ = -.415. Accordingly, for Marital = 3 (split), λ = 0 - (.397 - .415) = .018.

For the Main Effect of Happiness

For Happy = 1 (yes), λ = +.885. Accordingly, for Happy = 2 (no), λ = -.885.

For the Happy x Marital Interaction

For cell 1/1 (Happy/Married), λ = +.346; accordingly, for Unhappy/Married, λ = -.346. For Happy/Single, λ = -.111; accordingly, for Unhappy/Single, λ = +.111. For Happy/Split, λ = 0 - (.346 - .111) = -.235, and for Unhappy/Split, λ = 0 - (-.235) = .235.

The coefficients can be used to estimate the cell frequencies. When the model is saturated (includes all possible parameters), the expected frequencies will be equal to the observed frequencies. The geometric mean of the cell frequencies is found by taking the kth root of the product of the cell frequencies, where k is the number of cells. For our data, that is the 6th root of 787(221)(301)(67)(47)(82), which is 154.3429; μ is the natural log of that geometric mean, which equals 5.0392. For the married & happy cell, the model predicts that the natural log of the cell frequency = 5.039 + .397 + .885 + .346 = 6.667. The natural log of the observed cell frequency (787) is 6.668; rounding error accounts for the discrepancy of .001.
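Because the model is saturated, these λ coefficients are nothing more than an effects-coded decomposition of the logged cell counts, so the Hiloglinear estimates can be reproduced by hand. Here is a minimal NumPy sketch of that decomposition (NumPy is assumed to be available; it is not part of the SPSS/SAS workflow in this lesson, and the variable names are mine, not SPSS's):

import numpy as np

# Observed counts: rows = Happy (Yes, No), columns = Marital (Married, Single, Split)
counts = np.array([[787., 221., 301.],
                   [ 67.,  47.,  82.]])

ln_f = np.log(counts)                 # natural logs of the cell frequencies
mu = ln_f.mean()                      # ln of the geometric mean of the cells, about 5.0392

lam_happy   = ln_f.mean(axis=1) - mu  # about [ .885, -.885]
lam_marital = ln_f.mean(axis=0) - mu  # about [ .397, -.415, .018]
lam_inter   = ln_f - mu - lam_happy[:, None] - lam_marital[None, :]
# first row of lam_inter (Happy = Yes) is about [ .346, -.111, -.235]

# With a saturated model the fitted log frequencies reproduce the data exactly
fitted = mu + lam_happy[:, None] + lam_marital[None, :] + lam_inter
print(np.allclose(fitted, ln_f))      # True

Each λ here is simply a marginal mean of the logged counts minus μ, which is why the coefficients must sum to zero across the categories of a variable.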
These coefficients may be standardized by dividing them by their standard errors. Such standardized parameters are identified as "Z-value" by SPSS, because their significance can be evaluated via the standard normal curve. When one is attempting to reduce the complexity of a model by deleting some effects, one may decide to delete any effect whose coefficients are small. Our largest coefficients are those for Happy. The positive value of lambda for Happy = Yes simply reflects the fact that a lot more people said Yes than No. The large λ for Married reflects the fact that most of the participants were married. These main effect (marginal frequency) coefficients may be quite important for predicting the frequency of a given cell when the marginal frequencies differ from one another, but they are generally not of great interest otherwise. The interaction coefficients are interesting, since they reflect associations between variables. For example, the +.346 lambda for Happy = Yes / Marital = Married indicates that the actual frequency of persons in the Happy/Married cell is higher than would be expected based only upon the marginal frequencies of marital status and happiness. Likewise, the -.235 for Happy = Yes / Marital = Split indicates that there are fewer participants in that cell than would be expected if being split were independent of happiness.

A Reduced Model

From the output we have already inspected, we know that it would not be a good idea to delete the interaction term -- but this is an instructional analysis, so we shall delete the interaction and evaluate a main-effects-only model. For our main-effects-only model, the expected (predicted) cell frequencies are exactly those that would be used for a traditional Pearson χ² contingency table analysis -- the frequencies expected if the row variable is independent of the column variable. The residuals here (the differences between observed and expected frequencies) are rather large, indicating that the model no longer does a very good job of predicting the cell counts. Note that our goodness of fit chi-square jumped from 0 to 48.012, p < .001 -- a value that we have seen twice before: we saw that χ² from the crosstabs we did first, and then again with the test that the two-way effects are zero.
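The expected frequencies, the residuals, and that 48.012 can all be verified directly. Here is a minimal sketch, again assuming NumPy is available and using my own variable names:

import numpy as np

# Observed counts: rows = Happy (Yes, No), columns = Marital (Married, Single, Split)
obs = np.array([[787., 221., 301.],
                [ 67.,  47.,  82.]])

# Expected counts under the main-effects-only (independence) model:
# row total x column total / grand total, exactly as in a Pearson chi-square analysis
expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()

residuals = obs - expected                       # the residuals displayed by Hiloglinear
G2 = 2 * (obs * np.log(obs / expected)).sum()    # likelihood ratio goodness-of-fit chi-square

print(np.round(expected, 2))
print(np.round(residuals, 2))
print(round(G2, 3))                              # about 48.01, with df = (2-1)(3-1) = 2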
Pairwise Comparisons

It is possible to break down our 3 x 2 Marital Status x Happiness table into three 2 x 2 tables, each of which represents a comparison of two marital status groups with respect to their reported happiness. Our last three invocations of crosstabs accomplish this analysis. The column percentages for the first table show us that 92.2% of the married people are happy, compared to 82.5% of the single people, a statistically significant difference by the LR χ², p < .001. The second crosstabs shows that happiness is significantly more likely in married people than in divorced people. The last crosstabs shows that the difference between single people and divorced people falls short of statistical significance.

SPSS Loglinear

This program can be run only by syntax. Here is the syntax:

LOGLINEAR Happy(1,2) Marital(1,3)
  /CRITERIA=Delta(0)
  /PRINT=DEFAULT ESTIM
  /DESIGN=Happy Marital Happy by Marital.

Paste it into a syntax window and run it. You will see that you get the same parameter estimates that you got with Hiloglinear. If you want to print this very wide output, you will need to export it to an RTF document first, adjust the margins to .5 inch, change the layout to landscape, and reduce the font size so that the lines do not wrap. Do not use a proportional font. To export the output, go to the output window and click File, Export. Set Type to Word/RTF, point to the location where you wish to save the document, and click OK.

SAS Catmod

The program below will conduct the analysis.

options pageno=min nodate formdlim='-';
data happy;
  input Happy Marital count;
  cards;
1 1 787
1 2 221
1 3 301
2 1 67
2 2 47
2 3 82
;
proc catmod;
  weight count;
  model Happy*Marital = _response_;
  Loglin Happy|Marital;
run;

You will find that the parameter estimates produced by SAS Catmod are identical to those produced by SPSS Hiloglinear. Now compare the Partial Associations table from SPSS Hiloglinear with the Maximum Likelihood Analysis of Variance table from SAS. You will find that they differ greatly. I find this disturbing. Tabachnick and Fidell wrote, "due to differences in the algorithms used [by SAS Catmod and SPSS Hiloglinear], these estimates differ a bit [sic] from [each other]."

SPSS Genlog

This procedure codes the variables differently than do the procedures discussed earlier, so the parameter estimates will differ from those produced by the other procedures. The other procedures use what we called "effects coding" when we discussed least squares ANOVA: each parameter estimate contrasts one level with the grand mean. Genlog uses what we called "dummy coding" when we discussed least squares ANOVA: for a k-level variable, the last level is the reference level, so each of the k-1 parameter estimates represents a contrast between one level and the reference level. Here is how to point and click your way to a Genlog analysis of our data: weight cases by Freq and then click Analyze, Loglinear, General.

Karl L. Wuensch
Department of Psychology
East Carolina University
Greenville, NC 27858
October, 2009

Return to my Statistics Lessons Page
PowerPoint to Accompany This Lesson
SPSS Output
  o Crosstabs & Hiloglinear
  o Loglinear
  o Genlog