Multigraph for Loglinear Models

The Multigraph for Loglinear Models
Harry Khamis
Statistical Consulting Center
Wright State University
Dayton, Ohio, USA
OUTLINE
1. LOGLINEAR MODEL (LLM)
   - two-way table
   - three-way table
   - examples
2. MULTIGRAPH
   - construction
   - maximum spanning tree
   - conditional independencies
   - collapsibility
3. EXAMPLES
Loglinear Model
Goal: Identify the structure of associations among a set of categorical variables.
LLM: two variables
                              Y
             1      2      3      …      J      Total
-------------------------------------------------------
        1    n11    n12    n13    …      n1J    n1+
        2    n21    n22    n23    …      n2J    n2+
X       .    .      .      .             .      .
        .    .      .      .             .      .
        I    nI1    nI2    nI3    …      nIJ    nI+
-------------------------------------------------------
    Total    n+1    n+2    n+3    …      n+J    n
LLM: two variables
Example
Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and
United Health Services of Dayton
                           Marijuana Use?
                           Yes       No     Total
--------------------------------------------------
Cigarette Use?   Yes       914      581      1495
                 No         46      735       781
--------------------------------------------------
                 Total     960     1316      2276
LLM: two variables
Two discrete variables, X and Y
Model of independence: generating class is [X][Y]
LLM: two variables
LLM of independence:
\[ \log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y \]
where
\[ \sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0 \]
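As a rough illustration of the independence model [X][Y], the sketch below computes the expected counts n_i+ n_+j / n and the likelihood-ratio statistic G2 for the Dayton cigarette-by-marijuana table. The function names are hypothetical and chosen only for this example.

```python
import math

# Observed counts from the Dayton survey
# (rows: cigarette use Yes/No, columns: marijuana use Yes/No).
observed = [[914, 581],
            [46, 735]]

def independence_fit(table):
    """Expected counts under [X][Y]: m_ij = n_i+ * n_+j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def g_squared(table, expected):
    """Likelihood-ratio statistic G2 = 2 * sum n_ij * log(n_ij / m_ij)."""
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)  # empty cells contribute nothing to the sum

expected = independence_fit(observed)
g2 = g_squared(observed, expected)
print(expected, g2)
```

A large G2 on 1 degree of freedom indicates that the independence model fits this table poorly.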
LLM: two variables
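Saturated LLM: generating class is [XY]:
\[ \log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY} \]
where
\[ \sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0 \]
Note: the \( \lambda_{ij}^{XY} \) terms correspond to the odds ratio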
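For a 2×2 table the link to the odds ratio can be written out. This is a short derivation under the zero-sum constraints of the saturated model, with \mu_{ij} denoting the expected count in cell (i, j) and \theta denoting the odds ratio. The symbol \theta is notation introduced here, not taken from the slides.

```latex
\[
\log\theta
  = \log\frac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}}
  = \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}
  = 4\,\lambda_{11}^{XY}
\]
```

The last step uses \lambda_{11}^{XY} = \lambda_{22}^{XY} = -\lambda_{12}^{XY} = -\lambda_{21}^{XY}, which follows from the constraints when I = J = 2. For the Dayton cigarette-by-marijuana table the sample odds ratio is (914 × 735)/(581 × 46) ≈ 25.1.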
LLM: two variables
Interpretation           Generating Class    Probabilistic Model
-----------------------------------------------------------------
X and Y independent      [X][Y]              pij = pi+ p+j
X and Y dependent        [XY]                pij
LLM: three variables
Example: Dayton High School Data
Alcohol    Cigarette      Marijuana Use
Use        Use            Yes       No
-----------------------------------------
Yes        Yes            911      538
           No              44      456
No         Yes              3       43
           No               2      279
LLM: three variables
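Saturated LLM, [XYZ]:
\[ \log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ} \]
where
\[ \sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = \dots = \sum_k \lambda_{ijk}^{XYZ} = 0 \]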
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
----------------------------------------------------------------------
mutual independence          [X][Y][Z]           pijk = pi++ p+j+ p++k
joint independence           [XZ][Y]             pijk = pi+k p+j+
conditional independence     [XY][XZ]            pijk = pij+ pi+k / pi++
homogeneous association*     [XY][XZ][YZ]        *
saturated model              [XYZ]               pijk
* nondecomposable model
Decomposable LLMs
• closed-form expression for MLEs
• closed-form expression for asymptotic variances (Lee, 1977)
• conditional G2 statistic simplifies
• allow for causal interpretations
• easier to interpret the LLM
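To make the closed-form MLE point concrete, here is a minimal sketch for the decomposable model [XY][XZ], whose fitted counts are n_ij+ n_i+k / n_i++. It uses the Dayton three-way counts with X = alcohol, Y = cigarette, Z = marijuana. That assignment is an assumption made only for illustration, and it is not claimed to be the best-fitting model for these data.

```python
# Dayton counts n[a][c][m]; index 0 = Yes, 1 = No for each of
# alcohol (a), cigarette (c) and marijuana (m) use.
n = [[[911, 538], [44, 456]],
     [[3, 43], [2, 279]]]

def fit_conditional_independence(n):
    """Closed-form fitted counts for [XY][XZ]: m_ijk = n_ij+ * n_i+k / n_i++."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    fitted = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for i in range(I):
        n_i = sum(n[i][j][k] for j in range(J) for k in range(K))  # n_i++
        for j in range(J):
            n_ij = sum(n[i][j])                                     # n_ij+
            for k in range(K):
                n_ik = sum(n[i][jj][k] for jj in range(J))          # n_i+k
                fitted[i][j][k] = n_ij * n_ik / n_i
    return fitted

fitted = fit_conditional_independence(n)
for i, block in enumerate(fitted):
    for j, row in enumerate(block):
        print(i, j, [round(m, 1) for m in row])
```

No iterative fitting is needed here. A nondecomposable model such as [XY][XZ][YZ] would instead require an iterative procedure.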
3 Categorical Variables: X, Y, and Z
If [X⊗Y] and [Y⊗Z], then [X⊗Z]: FALSE!
LLM: three variables
Interpretation               Generating Class    Probabilistic Model
----------------------------------------------------------------------
mutual independence          [X][Y][Z]           pijk = pi++ p+j+ p++k
joint independence           [XZ][Y]             pijk = pi+k p+j+
conditional independence     [XY][XZ]            pijk = pij+ pi+k / pi++
homogeneous association      [XY][XZ][YZ]        pijk = ψij φik ωjk
saturated model              [XYZ]               pijk
3 Categorical Variables: X, Y, and Z
If [Y⊗Z] for all X = 1, 2, …, then [Y⊗Z]: FALSE!
3 Categorical Variables: X, Y, and Z
If [Y⊗Z], then [Y⊗Z] for all X = 1, 2, 3, …: FALSE!
Which Treatment is Better?
TRIAL 1
                      CURED?
                      Yes           No     Total
-------------------------------------------------
TREATMENT    A         40 (.20)    160       200
             B         30 (.15)    170       200

TRIAL 2
                      CURED?
                      Yes           No     Total
-------------------------------------------------
TREATMENT    A         85 (.85)     15       100
             B        300 (.75)    100       400

Combine TRIALS 1 and 2:
                      CURED?
                      Yes           No     Total
-------------------------------------------------
TREATMENT    A        125 (.42)    175       300
             B        330 (.55)    270       600

“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28, 1996
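The reversal in the treatment tables can be checked with a few lines of arithmetic. This is a minimal sketch. The helper names are hypothetical, and the (cured, total) pairs are copied from the Trial 1 and Trial 2 tables.

```python
# (cured, total) per treatment, copied from the two trial tables.
trial_1 = {"A": (40, 200), "B": (30, 200)}
trial_2 = {"A": (85, 100), "B": (300, 400)}

def cure_rates(trial):
    """Proportion cured for each treatment."""
    return {t: cured / total for t, (cured, total) in trial.items()}

def combine(*trials):
    """Pool the trials by adding cured counts and totals per treatment."""
    combined = {}
    for trial in trials:
        for t, (cured, total) in trial.items():
            c, n = combined.get(t, (0, 0))
            combined[t] = (c + cured, n + total)
    return combined

def better(trial):
    """Treatment with the higher cure rate."""
    rates = cure_rates(trial)
    return max(rates, key=rates.get)

pooled = combine(trial_1, trial_2)
for name, trial in [("Trial 1", trial_1), ("Trial 2", trial_2), ("Combined", pooled)]:
    rates = {t: round(r, 2) for t, r in cure_rates(trial).items()}
    print(name, rates, "better:", better(trial))
```

Treatment A has the higher cure rate within each trial, yet B has the higher rate after pooling, because the two treatments are spread very unevenly across the trials.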
Florida Homicide Convictions Resulting in Death Penalty
ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991
All victims (sum of the two tables below)
                               Death Penalty
                               Yes           No
-------------------------------------------------
Defendant’s Race    White      53 (0.11)    430
                    Black      15 (0.08)    176

White Victim
                               Death Penalty
                               Yes           No
-------------------------------------------------
Defendant’s Race    White      53 (0.11)    414
                    Black      11 (0.23)     37

Black Victim
                               Death Penalty
                               Yes           No
-------------------------------------------------
Defendant’s Race    White       0 (0.00)     16
                    Black       4 (0.03)    139
Multigraph Representation of LLMs
• Vertices = generators of the LLM
• Multiedges = edges that are equal in number to the number of indices shared by the two vertices being joined
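The construction rule can be sketched in a few lines. This is an illustrative sketch that assumes each factor is a single letter, so a generator such as "ACR" can be treated as a set of characters. `build_multigraph` is a hypothetical name.

```python
from itertools import combinations

def build_multigraph(generating_class):
    """Map each pair of generators to the set of indices they share.

    Generators are strings of single-letter factor labels, e.g. "ACR".
    The number of edges joining two vertices is the size of that set;
    pairs with no shared index are not joined and are left out.
    """
    edges = {}
    for g1, g2 in combinations(generating_class, 2):
        shared = set(g1) & set(g2)
        if shared:
            edges[(g1, g2)] = shared
    return edges

multigraph = build_multigraph(["AS", "ACR", "MCS", "MAC"])
for (g1, g2), shared in multigraph.items():
    print(g1, g2, len(shared), sorted(shared))
```

For [XY][XZ] this gives one edge (index X) between XY and XZ. For a generating class such as [E][TB] it gives no edges at all.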
Multigraph: three variables
[XY][XZ]
[Figure: vertices XY and XZ joined by one edge (shared index X)]
Examples of Multigraphs
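[AS][ACR][MCS][MAC]
[Figure: multigraph on vertices AS, ACR, MCS, MAC. Single edges: AS-ACR (A), AS-MAC (A), AS-MCS (S), ACR-MCS (C). Double edges: ACR-MAC (A, C), MCS-MAC (M, C).]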
Examples of Multigraphs
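[ABCD][ACE][BCG][CDF]
[Figure: multigraph on vertices ABCD, ACE, BCG, CDF. Double edges: ABCD-ACE (A, C), ABCD-BCG (B, C), ABCD-CDF (C, D). Single edges (C) between each pair of ACE, BCG, CDF.]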
Maximum Spanning Tree
A maximum spanning tree of a multigraph M:
• is a tree (a connected graph with no circuits)
• includes each vertex of M
• has the maximum possible number of edges, summed over its branches
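One way to find such a tree is Kruskal's algorithm with the heaviest edges taken first. The slides do not say which algorithm is used, so this is an assumed choice. The sketch also assumes single-letter factor labels and a connected multigraph.

```python
from itertools import combinations

def maximum_spanning_tree(generating_class):
    """Kruskal's algorithm on the multigraph, heaviest edges first.

    Returns a list of (generator, generator, shared_indices) branches.
    """
    edges = [(g1, g2, set(g1) & set(g2))
             for g1, g2 in combinations(generating_class, 2)
             if set(g1) & set(g2)]
    edges.sort(key=lambda e: len(e[2]), reverse=True)

    parent = {g: g for g in generating_class}  # union-find forest

    def find(g):
        while parent[g] != g:
            g = parent[g]
        return g

    tree = []
    for g1, g2, shared in edges:
        r1, r2 = find(g1), find(g2)
        if r1 != r2:           # joining two subtrees creates no circuit
            parent[r1] = r2
            tree.append((g1, g2, shared))
    return tree

tree = maximum_spanning_tree(["ABCD", "ACE", "BCG", "CDF"])
for g1, g2, shared in tree:
    print(g1, g2, sorted(shared))
```

When several edges tie in weight, as in [AS][ACR][MCS][MAC], more than one maximum spanning tree exists and the sketch returns just one of them.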
Examples of maximum spanning trees
[XY][XZ]
[Figure: the single branch XY-XZ (index X)]
Examples of maximum spanning trees
[AS][ACR][MCS][MAC]
[Figure: a maximum spanning tree on vertices AS, ACR, MAC, MCS]
Examples of maximum spanning trees
[ABCD][ACE][BCG][CDF]
[Figure: maximum spanning tree with branches ABCD-ACE (A, C), ABCD-BCG (B, C), ABCD-CDF (C, D)]
Fundamental Conditional Independencies
for a Decomposable LLM
1. Let S be the set of indices in a branch of the maximum spanning tree.
2. Remove each factor of S from the multigraph, M; the resulting multigraph is M/S.
3. An FCI is determined as:
   [C1 ⊗ C2 ⊗ … ⊗ Ck | S]
   where C1, C2, …, Ck are the sets of factors in the components of M/S.
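Steps 2 and 3 can be sketched as follows. This is an illustrative sketch that assumes single-letter factor labels and takes the branch set S as given. The function names `fci` and `format_fci` are hypothetical.

```python
def fci(generating_class, S):
    """Return the components of M/S as sorted lists of factors.

    generating_class: generators as strings of single-letter factors.
    S: the set of indices on one branch of the maximum spanning tree.
    """
    # Step 2: remove every factor in S from every generator.
    reduced = [set(g) - set(S) for g in generating_class]
    reduced = [g for g in reduced if g]  # drop vertices left empty

    # Vertices of M/S that still share a factor lie in the same
    # component, so merge overlapping factor sets.
    components = []
    for g in reduced:
        overlapping = [c for c in components if c & g]
        for c in overlapping:
            components.remove(c)
            g = g | c
        components.append(g)
    return sorted(sorted(c) for c in components)

def format_fci(components, S):
    """Step 3: write the FCI as [C1 ⊗ ... ⊗ Ck | S]."""
    left = " ⊗ ".join(",".join(c) for c in components)
    return "[" + left + " | " + ",".join(sorted(S)) + "]"

print(format_fci(fci(["XY", "XZ"], {"X"}), {"X"}))
print(format_fci(fci(["ACE", "MAC", "MCG"], {"A", "C"}), {"A", "C"}))
```

The sketch handles one branch at a time. Applying it to the index set of each branch of the maximum spanning tree gives one FCI per branch.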
FCIs
[XY][XZ]
M:    XY --X-- XZ
S = {X}
M/S:  Y    Z
FCI:  [Y⊗Z|X]
Collapsibility Conditions
Consider a conditional independence relationship of the form [C1 ⊗ C2|S].
If the levels of all factors in C1 are collapsed, then all relationships among the remaining factors are undistorted EXCEPT for relationships among factors in S.
Example: Ob-Gyn Study
(Darrocca, et al., 1996)
n = 201 pregnant mothers
Variables:
E: EGA (Early, Late)
B: Bishop score (High, Low)
T: Treatment (Prostin, Placebo)
Example: Ob-Gyn Study
                             BISHOP SCORE (B)
                    High                  Low
                    EGA (E)               EGA (E)
TREATMENT (T)       Early     Late        Early     Late
----------------------------------------------------------
Prostin             34        24          27        21
Placebo             22        16          35        22

Best-fitting model: [E][TB]
Example: Ob-Gyn Study
Generating Class: [E][TB]
Multigraph: vertices E and TB, with no edge between them (no shared index)
FCI: [E⊗T,B]
Example: Ob-Gyn Study
Collapsed Table (collapse over EGA):
                            BISHOP SCORE (B)
                            High          Low    Total
--------------------------------------------------------
TREATMENT (T)    Prostin    58 (0.55)     48       106
                 Placebo    38 (0.40)     57        95

P = 0.037
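The collapsing step can be reproduced directly from the three-way counts. The slides do not say which test produced the P-value. The sketch below assumes an uncorrected Pearson chi-square test on the collapsed 2×2 table, which gives a value close to the one reported.

```python
import math

# counts[t][b][e]: treatment (Prostin, Placebo) x Bishop score (High, Low)
# x EGA (Early, Late), copied from the three-way table.
counts = [[[34, 24], [27, 21]],
          [[22, 16], [35, 22]]]

# Collapse over EGA by summing out the last index.
collapsed = [[sum(cell) for cell in row] for row in counts]

def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-square and its P-value (1 df) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value

chi2, p_value = pearson_chi2_2x2(collapsed)
print(collapsed, round(chi2, 3), round(p_value, 3))
```

Because the FCI is [E⊗T,B], collapsing over E leaves the T-B relationship undistorted, so the 2×2 analysis is a fair summary of it.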
Example: WSU-United Way Study
M: Marijuana (No, Yes)
A: Alcohol (No, Yes)
C: Cigarettes (No, Yes)
R: Race (Other, White)
S: Sex (Female, Male)
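Observed cell frequencies (n = 2,276), one row per combination of C, A, and M; the four counts in each row are the Race × Sex cells:
C      A      M
-------------------------------------------------
No     No     No       12     117      17     133
No     No     Yes       0       1       0       1
No     Yes    No       19     218      18     201
No     Yes    Yes       2      13       1      28
Yes    No     No        1      17       8      17
Yes    No     Yes       0       1       1       1
Yes    Yes    No       23     268      19     228
Yes    Yes    Yes      23     405      30     453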
Example: WSU-United Way Study
Generating class: [ACE][MAC][MCG]
Multigraph, M:
[Figure: vertices ACE, MAC, MCG. Double edges: ACE-MAC (A, C), MAC-MCG (M, C). Single edge: ACE-MCG (C).]
Example: WSU-United Way Study
M:    [Figure: multigraph on vertices ACE, MAC, MCG]
S = {A,C}
M/S:  E    M    MG   (components: {E} and {M, G})
FCI:  [E⊗M,G|A,C]

A = Alcohol
G = Gender
C = Cigarette
M = Marijuana
E = Ethnic
Example: WSU PASS Program
“Preparing for Academic Success”
GPA below 2.0 at the end of first quarter
Example: WSU PASS Program
Variables (n = 972):
FACTOR                 LABEL    LEVELS
------------------------------------------------------------------------
Retention              R        1=No, 2=Yes
Cohort                 C        1, 2, 3, 4
PASS Participation     P        1=No, 2=Yes
Ethnic Group           E        1=Caucasian, 2=African-American, 3=Other
Gender                 G        1=Male, 2=Female
Example: WSU PASS Program
The best-fitting LLM has generating class [EG][CP][RC][PG]
Multigraph, M:
EG --G-- PG --P-- CP --C-- RC
Example: WSU PASS Program
M:    EG --G-- PG --P-- CP --C-- RC
S = {C}
M/S:  EG --G-- PG --P-- P     R
FCI:  [E,G,P⊗R|C]

C = Cohort
E = Ethnic
G = Gender
P = PASS Participation
R = Retention
Example: Affinal Relations in Bosnia-Herzegovina
Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio
N = 861 couples from Bosnia-Herzegovina were surveyed concerning affinal relations.
M: Marriage Type (traditional, elopement)
L: Location of Man and Wife (same, different)
E: Ethnicity (Bosniak, Serb, Croat)
S: Settlement (rural, urban)
Best-fitting model: [MLES]
Consider structural associations among M, L, and S for each ethnic group (E) separately.
Example: Affinal Relations in Bosnia-Herzegovina
Bosniaks: [ML][LS]
Serbs:    [MS][SL]
Croats:   [M][L][S]
M: Marriage Type
L: Location of Man and Wife
S: Settlement
Conclusions
• The generator multigraph uses mathematical graph theory to analyze and interpret LLMs in a facile manner
• Properties of the multigraph allow one to:
  – Find all conditional independencies
  – Determine all collapsibility conditions
REFERENCE
Khamis, H.J. (2011). The Association Graph and the Multigraph for Loglinear Models, SAGE series Quantitative Applications in the Social Sciences, No. 167.
Without data, you’re just one more person with an opinion