Log-Linear Contingency Table Analysis, Two-Way

Please read the section on Likelihood Ratio Tests in Howell's Statistical Methods
for Psychology (p. 156-157 in the 7th edition) for a brief introduction to the topic of
Likelihood Ratio Tests and their use in log-linear analysis of data from contingency
tables. You should also read Chapter 17 in Howell and Chapter 7 in Tabachnick and
Fidell. Also recommended is David Garson’s document at
http://faculty.chass.ncsu.edu/garson/PA765/logit.htm .
The Data
The data for this assignment were presented in Chapter 9 of the SPSS
Advanced Statistics Student Guide, 1990. We wish to determine whether or not there
is an association between a person's marital status and happiness. The data are in the
file LogLin2.sav on my SPSS Data Page. Download the data file and bring it into
SPSS. In the data editor you will see that there are three variables, Happy, Marital and
Freq. Happy has two values, 1 (Yes) and 2 (No). Marital has three values, 1 (Married),
2 (Single), and 3 (Split). Each row in the data file represents one cell in the 2 x 3
contingency table. The Freq variable is, for each cell, the number of observations in
that cell. I used the WEIGHT CASES command to make SPSS treat the values of
Freq as cell weights: from the data page, I selected Data, Weight Cases, Weight
Cases By, Freq. When I saved the data file, the weighting information was saved with
it, so you do not have to tell SPSS to use the Freq variable as cell counts; it already
knows that when it opens the data file.
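To make the data layout concrete, here is a minimal Python sketch (not part of the SPSS assignment) that rebuilds the 2 x 3 table from the six weighted rows. The counts are the ones listed in the SAS program later in this handout; the variable names are my own.

```python
# Rows of the data file as described above: (Happy, Marital, Freq).
rows = [
    (1, 1, 787), (1, 2, 221), (1, 3, 301),
    (2, 1, 67), (2, 2, 47), (2, 3, 82),
]
happy_labels = {1: "Yes", 2: "No"}
marital_labels = {1: "Married", 2: "Single", 3: "Split"}

# Weighting by Freq means each row stands for Freq observations.
table = {(h, m): f for h, m, f in rows}
n_total = sum(table.values())
row_totals = {h: sum(table[(h, m)] for m in marital_labels) for h in happy_labels}
col_totals = {m: sum(table[(h, m)] for h in happy_labels) for m in marital_labels}

print("N =", n_total)
for h, h_name in happy_labels.items():
    cells = "  ".join(f"{marital_labels[m]}: {table[(h, m)]:4d}" for m in marital_labels)
    print(f"Happy = {h_name:3s}  {cells}  (row total {row_totals[h]})")
print("Column totals:", {marital_labels[m]: t for m, t in col_totals.items()})
```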
The PASW/SPSS Actions to Do the Analysis
1. Analyze, Descriptive Statistics, Crosstabs. Move Happy into the Rows box
and Marital into the Columns box. Click Statistics and select Chi-square. Continue. Click Cells
and select Observed and Column Percentages. Continue. OK. This produces the
contingency table with χ² analysis.
2. Analyze, Loglinear, Model Selection. Select "Enter in Single Step." Move
Happy and Marital into the Factors box, defining the range for Happy as 1,2 and for
Marital as 1,3. Click Model and select Saturated. Click Options and select Display
Frequencies, Parameter Estimates, and Association Table. You do not need to display
the residuals, since they will all be zero for a saturated model. Change “Delta” from .5
to 0. By default, SPSS will add .5 to every cell in the table to avoid the possibility of
having cells with a frequency of 0. We have no such cells, so we need not do this.
Doing so would have a very small effect on the results.
Click OK to conduct the analysis.
3. Same as 2, with these changes: For Model, select Custom and then move
Happy and Marital into the Generating Class box, without any interaction term.
Continue
For Options, ask to display frequencies and residuals.
Continue, OK.
4. Go to the data page, variable view, and declare 3 to be a missing value for
Marital: click in the right-hand side of the Missing cell in the Marital row, enter 3 as a
discrete missing value, and click OK. Now run crosstabs again, just as in step 1 -- but this
time the result will be one of the three 2 x 2 tables that can be constructed from our
original 2 x 3 table.
5. Go back to the data page and declare 2 (but not 3) to be a missing value.
Run crosstabs again.
6. Go back to the data page and declare 1 (but not 2) to be a missing value.
Run crosstabs again.
Crosstabs: Happiness is Related to Marital Status
Now look at the output from the first invocation of crosstabs. The table suggests
that reported happiness declines as you move from married to single to split. Both the
Pearson and the Likelihood Ratio χ² are significant.
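The two statistics can be checked by hand. The sketch below, in Python rather than SPSS, computes both from the observed counts; it relies on the fact that a χ² with exactly 2 degrees of freedom has upper-tail probability exp(-x/2). The variable names are illustrative only.

```python
import math

observed = [[787, 221, 301],   # Happy = Yes: Married, Single, Split
            [67, 47, 82]]      # Happy = No

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
lr = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Frequency expected if happiness were independent of marital status.
        expected = row_totals[i] * col_totals[j] / n
        pearson += (obs - expected) ** 2 / expected
        lr += 2 * obs * math.log(obs / expected)

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Valid only because df == 2 here: P(chi-square > x) = exp(-x / 2).
p_pearson = math.exp(-pearson / 2)
p_lr = math.exp(-lr / 2)

print(f"Pearson chi-square = {pearson:.3f}, p = {p_pearson:.2e}")
print(f"LR chi-square      = {lr:.3f}, p = {p_lr:.2e}, df = {df}")
```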
Hierarchical Log Linear Analysis, Saturated Model
The Loglinear, Model Selection path called up an SPSS routine known as
Hiloglinear. This procedure does a hierarchical log linear analysis. The generating
class “happy*marital” means that the model will include the two factors as well as their
interaction. When a model contains all of the possible effects (a so-called "saturated
model"), it will predict all of the cell counts perfectly. Our model is such a saturated
model.
Goodness of Fit Tests
The null hypothesis for these tests is that the model fits the data perfectly. Of
course, with a saturated model that is true, so the p value will be 1.0.
SPSS gives us additional goodness of fit tests, where the value of the LR χ²
equals how much the goodness of fit χ² would increase (indicating a reduction in the
goodness of fit of the model with the data) were we to delete certain effects from the
model. For example, with these data, were we to drop the two-way effects (the Happy x
Marital interaction), the LR χ² would increase by 48.012, a significant (p < .0001)
decrease in how well the model fits the data.
Parameter Estimates
A saturated log-linear model for our 2 variable design is of the form:
$$\ln(\text{cell freq}_{ij}) = \mu + \lambda_i + \lambda_j + \lambda_{ij}$$
where the term on the left is the natural log of the frequency in the cell at level i of the
one variable and level j of the other. The μ is a constant, the natural log of the
geometric mean of the expected cell frequencies. λ_i is the parameter lambda
associated with being at level i of the one variable, λ_j is the same for the other variable,
and λ_ij is the interaction parameter.
Look at the "Parameter Estimates" portion of the output. You get one parameter
for each degree of freedom, and you can compute the redundant parameter simply,
since the coefficients must sum to zero across categories of a variable.



For the Main Effect of Marital Status
For Marital = 1 (married), λ = +.397
For Marital = 2 (single), λ = -.415
Accordingly, for Marital = 3 (split), λ = 0 - (.397 - .415) = .018.


For the Main Effect of Happiness
For Happy = 1 (yes), λ = +.885
Accordingly, for Happy = 2 (no), λ is -.885.






For the Happy x Marital Interaction
1/1 = Happy/Married, λ = +.346
Accordingly, Unhappy/Married is -.346.
Happy/Single is -.111
Accordingly, Unhappy/Single is +.111.
Happy/Split = 0 - (.346 - .111) = -.235
Unhappy/Split = 0 - (-.235) = +.235.
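For a saturated model with effects coding, these parameters can be recovered directly from the logs of the observed counts: each main-effect λ is a row or column mean of the log counts minus the grand mean, and each interaction λ is what is left over. The Python sketch below is a hand check under that assumption, not a description of how SPSS estimates the model. The names are illustrative.

```python
import math

counts = {
    ("Yes", "Married"): 787, ("Yes", "Single"): 221, ("Yes", "Split"): 301,
    ("No", "Married"): 67, ("No", "Single"): 47, ("No", "Split"): 82,
}
happy = ["Yes", "No"]
marital = ["Married", "Single", "Split"]

log_f = {cell: math.log(f) for cell, f in counts.items()}
mu = sum(log_f.values()) / len(log_f)  # grand mean of the log counts

# Main effects: mean log count at a level minus the grand mean.
lam_happy = {h: sum(log_f[(h, m)] for m in marital) / len(marital) - mu
             for h in happy}
lam_marital = {m: sum(log_f[(h, m)] for h in happy) / len(happy) - mu
               for m in marital}
# Interaction: whatever the constant and main effects do not account for.
lam_inter = {(h, m): log_f[(h, m)] - mu - lam_happy[h] - lam_marital[m]
             for h in happy for m in marital}

for m in marital:
    print(f"Marital = {m:8s} lambda = {lam_marital[m]:+.3f}")
for h in happy:
    print(f"Happy   = {h:8s} lambda = {lam_happy[h]:+.3f}")
for (h, m), lam in lam_inter.items():
    print(f"{h}/{m}: lambda = {lam:+.3f}")
```

Each set of coefficients sums to zero across the categories of a variable, which is why the redundant parameters can be obtained by subtraction.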
The coefficients can be used to estimate the cell frequencies. When the model
is saturated (includes all possible parameters), the expected frequencies will be equal
to the observed frequencies.
The geometric mean of the cell frequencies is found by taking the kth root of
the product of the cell frequencies, where k is the number of cells. For our data, that is
$$\sqrt[6]{787(221)(301)(67)(47)(82)} = 154.3429;$$
μ is the natural log of this geometric mean, which equals 5.0392.
For the married & happy cell, the model predicts that the natural log of the cell
frequency = 5.039 + .397 +.885 +.346 = 6.667. The natural log of the observed cell
frequency (787) is 6.668. Rounding error accounts for the discrepancy of .001.
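The arithmetic in the last two paragraphs can be reproduced with a few lines of Python. This is only a check on the hand calculation, using the rounded coefficients reported above. The variable names are my own.

```python
import math

cell_freqs = [787, 221, 301, 67, 47, 82]
k = len(cell_freqs)

# Geometric mean: kth root of the product of the cell frequencies.
geo_mean = math.prod(cell_freqs) ** (1 / k)
mu = math.log(geo_mean)

# Married & happy cell, using the rounded coefficients from the output.
predicted_log = round(mu, 3) + 0.397 + 0.885 + 0.346
observed_log = math.log(787)

print(f"geometric mean = {geo_mean:.4f}, mu = {mu:.4f}")
print(f"predicted ln(freq) = {predicted_log:.3f}, observed ln(freq) = {observed_log:.3f}")
print(f"predicted frequency = {math.exp(predicted_log):.1f}")
```

Carrying more decimal places in the coefficients removes the .001 discrepancy.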
These coefficients may be standardized by dividing by their standard errors.
Such standardized parameters are identified as "Z-value" by SPSS, because their
significance can be evaluated via the standard normal curve. When one is attempting
to reduce the complexity of a model by deleting some effects, one may decide to delete
any effect whose coefficients are small. Our largest coefficients are those for Happy.
The positive value of lambda for Happy Yes simply reflects the fact that a lot more
people said Yes than No. The large λ for Married reflects the fact that most of the
participants were married. These main effects (marginal frequencies) coefficients may
be quite important for predicting the frequency of a given cell, when the marginal
frequencies do differ from one another, but they are generally not of great interest
otherwise.
The interaction coefficients are interesting, since they reflect associations
between variables. For example, the +.346 lambda for Happy=Yes/Marital=Married
indicates that the actual frequency of persons in the Happy/Married cell is higher than
would be expected based only upon the marginal frequencies of marital status and
happiness. Likewise, the -.235 for Happy Yes/Marital Split indicates that there are
fewer participants in that cell than would be expected if being split were independent of
happiness.
A Reduced Model
From the output we have already inspected, we know that it would not be a good
idea to delete the interaction term -- but this is an instructional analysis, so we shall
delete the interaction and evaluate a main effects only model. For our main effects only
model, the expected (predicted) cell frequencies are exactly those that would be used
for a traditional Pearson χ² contingency table analysis -- the frequencies expected if the
row variable is independent of the column variable. The residuals here (differences in
observed and expected frequencies) are rather large, indicating that the model no
longer does a very good job of predicting the cell counts. Note that our goodness of fit
chi-square jumped from 0 to 48.012, p < .001 -- a value that we have seen twice before:
We saw that χ² from the crosstabs we did first and then again with the test that the
two-way effects are zero.
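As an illustration of where those expected frequencies and residuals come from, the following Python sketch computes them from the marginal totals, exactly as one would for a Pearson χ² by hand. It is a check on the idea rather than a reproduction of the SPSS output layout.

```python
observed = {
    ("Yes", "Married"): 787, ("Yes", "Single"): 221, ("Yes", "Split"): 301,
    ("No", "Married"): 67, ("No", "Single"): 47, ("No", "Split"): 82,
}
happy = ["Yes", "No"]
marital = ["Married", "Single", "Split"]

n = sum(observed.values())
row_totals = {h: sum(observed[(h, m)] for m in marital) for h in happy}
col_totals = {m: sum(observed[(h, m)] for h in happy) for m in marital}

# Main-effects-only model: expected count = row total * column total / N.
expected = {(h, m): row_totals[h] * col_totals[m] / n
            for h in happy for m in marital}
residuals = {cell: observed[cell] - expected[cell] for cell in observed}

for cell in observed:
    print(f"{cell[0]:3s}/{cell[1]:8s} observed = {observed[cell]:4d}  "
          f"expected = {expected[cell]:7.2f}  residual = {residuals[cell]:+7.2f}")
```

The residuals sum to zero within every row and column, because the main-effects model reproduces the marginal totals exactly.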
Pairwise Comparisons
It is possible to break down our 3 x 2 Marital Status x Happiness table into three
2 x 2 tables, each of which represents a comparison of two marital status groups with
respect to their reported happiness. Our last three invocations of crosstabs accomplish
this analysis.
The column percentages for the first table show us that 92.2% of the married
people are happy, compared to 82.5% of the single people, a statistically significant
difference by the LR χ², p < .001. The second crosstabs shows that happiness is
significantly more likely in married people than in divorced people. The last crosstabs
shows that the difference between single people and divorced people falls short of
statistical significance.
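The three pairwise comparisons can also be sketched in Python. The helper functions below are hypothetical names of my own; the p value uses the fact that a χ² with 1 degree of freedom has upper-tail probability erfc(sqrt(x/2)).

```python
import math

def lr_chi_square(table):
    """Likelihood ratio chi-square for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            g2 += 2 * obs * math.log(obs / expected)
    return g2

def p_value_1df(chi_square):
    """Upper-tail probability for a chi-square with exactly 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))

happy = {"Married": 787, "Single": 221, "Split": 301}
unhappy = {"Married": 67, "Single": 47, "Split": 82}
pct_happy = {g: 100 * happy[g] / (happy[g] + unhappy[g]) for g in happy}

results = {}
for a, b in [("Married", "Single"), ("Married", "Split"), ("Single", "Split")]:
    g2 = lr_chi_square([[happy[a], happy[b]], [unhappy[a], unhappy[b]]])
    results[(a, b)] = (g2, p_value_1df(g2))
    print(f"{a} ({pct_happy[a]:.1f}% happy) vs {b} ({pct_happy[b]:.1f}% happy): "
          f"LR chi-square = {g2:.3f}, p = {results[(a, b)][1]:.4f}")
```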
SPSS Loglinear
This program can only be run by syntax. Here is the syntax:
LOGLINEAR Happy(1,2) Marital(1,3) /
CRITERIA=Delta(0) /
PRINT=DEFAULT ESTIM /
DESIGN=Happy Marital Happy by Marital.
Paste it into a syntax window and run it. You will see that you get the same
parameter estimates that you got with Hiloglinear. If you want to print this very wide
output, you will need to export it to an RTF document first and adjust the margins to .5 inch,
change the layout to landscape, and reduce the font size so that the lines do not wrap.
Do not use a proportional font. To export the output, go to the output window and click
File, Export. Set Type to Word/RTF, point to the location where you wish to save the
document, and click OK.
SAS Catmod
The program below will conduct the analysis.
options pageno=min nodate formdlim='-';
data happy;
input Happy Marital count;
cards;
1 1 787
1 2 221
1 3 301
2 1 67
2 2 47
2 3 82
;
proc catmod;
weight count;
model Happy*Marital = _response_;
Loglin Happy|Marital;
run;
You will find that the parameter estimates produced by SAS Catmod are identical
to those produced by SPSS Hiloglinear. Now compare the Partial Associations table
from SPSS Hiloglinear with the Maximum Likelihood Analysis of Variance table from SAS.
You will find that they differ greatly. I find this disturbing. Tabachnick and Fidell wrote
“due to differences in the algorithms used [by SAS Catmod and SPSS Hiloglinear],
these estimates differ a bit [sic] from [each other].”
SPSS Genlog
This procedure codes the variables differently than do those discussed earlier,
so the parameter estimates will differ from those produced by the other procedures.
The other procedures use what we called “effects coding” when we discussed least
squares ANOVA. Each parameter estimate contrasts one level with the grand mean.
Genlog uses what we called “dummy coding” when we discussed least squares
ANOVA. For a k level variable, the last level is the reference level, so that each of the
k-1 parameter estimates represents a contrast between one level and the reference
level.
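To see what dummy coding implies for a saturated model, here is a Python sketch that computes the parameters by hand with Happy = No and Marital = Split as the reference levels. It assumes no delta is added to the cells, and I have not matched it against the Genlog output, so treat it as an illustration of the coding scheme only.

```python
import math

counts = {
    ("Yes", "Married"): 787, ("Yes", "Single"): 221, ("Yes", "Split"): 301,
    ("No", "Married"): 67, ("No", "Single"): 47, ("No", "Split"): 82,
}
ref_h, ref_m = "No", "Split"  # last level of each variable is the reference

# Constant: log count in the cell where both variables are at the reference level.
constant = math.log(counts[(ref_h, ref_m)])
# Main effects: contrast with the reference level, within the other reference level.
b_happy = {"Yes": math.log(counts[("Yes", ref_m)]) - constant}
b_marital = {m: math.log(counts[(ref_h, m)]) - constant
             for m in ["Married", "Single"]}
# Interactions: log odds ratios relative to the reference row and column.
b_inter = {("Yes", m): math.log(counts[("Yes", m)]) - constant
           - b_happy["Yes"] - b_marital[m]
           for m in ["Married", "Single"]}

print(f"constant = {constant:.3f}")
print(f"Happy = Yes: {b_happy['Yes']:+.3f}")
for m, b in b_marital.items():
    print(f"Marital = {m}: {b:+.3f}")
for (h, m), b in b_inter.items():
    print(f"{h} x {m}: {b:+.3f} (odds ratio {math.exp(b):.2f})")
```

With this coding each interaction parameter is the log of an odds ratio, for example the odds of being happy among the married relative to the odds among the split.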
Here is how to point and click your way to a Genlog Analysis of our data: Weight
cases by freq and then click Analyze, Loglinear, General.
Karl L. Wuensch
Department of Psychology
East Carolina University
Greenville, NC 27858
October, 2009


