The Multigraph for Loglinear Models

Harry Khamis
Statistical Consulting Center
Wright State University
Dayton, Ohio, USA


OUTLINE

1. LOGLINEAR MODEL (LLM)
   - two-way table
   - three-way table
   - examples
2. MULTIGRAPH
   - construction
   - maximum spanning tree
   - conditional independencies
   - collapsibility
3. EXAMPLES


Loglinear Model

Goal: Identify the structure of associations among a set of categorical variables.


LLM: two variables

                           Y
            1      2      3     …      J     Total
        ---------------------------------------------
      1    n11    n12    n13    …     n1J     n1+
      2    n21    n22    n23    …     n2J     n2+
  X   .     .      .      .            .       .
      .     .      .      .            .       .
      I    nI1    nI2    nI3    …     nIJ     nI+
        ---------------------------------------------
  Total    n+1    n+2    n+3    …     n+J      n


LLM: two variables

Example: Survey of High School Seniors in Dayton, Ohio
Collaboration: WSU Boonshoft School of Medicine and United Health Services of Dayton

                        Marijuana Use?
  Cigarette Use?      Yes      No     Total
  --------------------------------------------
  Yes                 914     581     1495
  No                   46     735      781
  --------------------------------------------
  Total               960    1316     2276


LLM: two variables

Two discrete variables, X and Y
Model of independence: generating class is [X][Y]


LLM: two variables

LLM of independence (with $\mu_{ij}$ the expected frequency in cell (i, j)):

$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$

where $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0$


LLM: two variables

Saturated LLM: generating class is [XY]:

$\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$

where $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0$

Note: the interaction terms $\lambda_{ij}^{XY}$ correspond to the odds ratio.


LLM: two variables

  Interpretation           Generating Class    Probabilistic Model
  ------------------------------------------------------------------
  X and Y independent      [X][Y]              pij = pi+ p+j
  X and Y dependent        [XY]                pij


LLM: three variables

Example: Dayton High School Data

  Alcohol    Cigarette       Marijuana Use
  Use        Use             Yes      No
  -------------------------------------------
  Yes        Yes             911     538
             No               44     456
  No         Yes               3      43
             No                2     279


LLM: three variables

Saturated LLM, [XYZ]:

$\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}$

where $\sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = \dots = \sum_k \lambda_{ijk}^{XYZ} = 0$


LLM: three variables

  Interpretation               Generating Class    Probabilistic Model
  ------------------------------------------------------------------------
  mutual independence          [X][Y][Z]           pijk = pi++ p+j+ p++k
  joint independence           [XZ][Y]             pijk = pi+k p+j+
  conditional independence     [XY][XZ]            pijk = pij+ pi+k / pi++
  homogeneous association*     [XY][XZ][YZ]        *
  saturated model              [XYZ]               pijk

  * nondecomposable model


Decomposable LLMs

- closed-form expression for MLEs
- closed-form expression for asymptotic variances (Lee, 1977)
- conditional G² statistic simplifies
- allow for causal interpretations
- easier to interpret the LLM


3 Categorical Variables: X, Y, and Z

If [X⊗Y] and [Y⊗Z], then [X⊗Z]: FALSE!


LLM: three variables

  Interpretation               Generating Class    Probabilistic Model
  ------------------------------------------------------------------------
  mutual independence          [X][Y][Z]           pijk = pi++ p+j+ p++k
  joint independence           [XZ][Y]             pijk = pi+k p+j+
  conditional independence     [XY][XZ]            pijk = pij+ pi+k / pi++
  homogeneous association      [XY][XZ][YZ]        pijk = ψij φik ωjk
  saturated model              [XYZ]               pijk


3 Categorical Variables: X, Y, and Z

If [Y⊗Z] holds at every level X = 1, 2, … (conditional independence), then [Y⊗Z] holds marginally: FALSE!


3 Categorical Variables: X, Y, and Z

If [Y⊗Z] holds marginally, then [Y⊗Z] holds at every level X = 1, 2, 3, …: FALSE!


Which Treatment is Better?

(proportion cured in parentheses)

  TRIAL 1
                         CURED?
  TREATMENT        Yes           No     Total
  ----------------------------------------------
  A               40 (.20)      160      200
  B               30 (.15)      170      200

  TRIAL 2
                         CURED?
  TREATMENT        Yes           No     Total
  ----------------------------------------------
  A               85 (.85)       15      100
  B              300 (.75)      100      400

  Combine TRIALS 1 and 2:
                         CURED?
  TREATMENT        Yes           No     Total
  ----------------------------------------------
  A              125 (.42)      175      300
  B              330 (.55)      270      600

“Ask Marilyn”, PARADE section, DDN, pages 6-7, April 28, 1996


Florida Homicide Convictions Resulting in Death Penalty

ML Radelet and GL Pierce, Florida Law Review 43: 1-34, 1991

(proportion receiving the death penalty in parentheses)

  All victims
                           Death Penalty
  Defendant’s Race       Yes           No
  ------------------------------------------
  White                 53 (0.11)     430
  Black                 15 (0.08)     176

  White Victim
                           Death Penalty
  Defendant’s Race       Yes           No
  ------------------------------------------
  White                 53 (0.11)     414
  Black                 11 (0.23)      37

  Black Victim
                           Death Penalty
  Defendant’s Race       Yes           No
  ------------------------------------------
  White                  0 (0.00)      16
  Black                  4 (0.03)     139


Multigraph Representation of LLMs

Vertices = generators of the LLM
Multiedges = edges that are equal in number to the number of indices shared by the two vertices being joined


Multigraph: three variables

[XY][XZ]

  XY ----- XZ


Examples of Multigraphs

[AS][ACR][MCS][MAC]

[Figure: multigraph with vertices AS, MAC, ACR, MCS]


Examples of Multigraphs

[ABCD][ACE][BCG][CDF]

[Figure: multigraph with vertices CDF, ABCD, ACE, BCG]


Maximum Spanning Tree

The maximum spanning tree of a multigraph M:
• tree (connected graph with no circuits)
• includes each vertex
• sum of the edges is maximum


Examples of maximum spanning trees

[XY][XZ]

  XY ----- XZ


Examples of maximum spanning trees

[AS][ACR][MCS][MAC]

[Figure: maximum spanning tree on vertices AS, ACR, MAC, MCS]


Examples of maximum spanning trees

[ABCD][ACE][BCG][CDF]

[Figure: maximum spanning tree on vertices CDF, ABCD, ACE, BCG]


Fundamental Conditional Independencies for a Decomposable LLM

1. Let S be the set of indices in a branch of the maximum spanning tree
2. Remove each factor of S from the multigraph, M; the resulting multigraph is M/S
3. An FCI is determined as:

   [C1 ⊗ C2 ⊗ … ⊗ Ck | S]

   where C1, C2, …, Ck are the sets of factors in the components of M/S


FCIs

[XY][XZ]

  M:     XY --X-- XZ
  S = {X}
  M/S:   Y     Z

  [Y⊗Z|X]


Collapsibility Conditions

Consider a conditional independence relationship of the form [C1 ⊗ C2|S]. If the levels of all factors in C1 are collapsed, then all relationships among the remaining factors are undistorted EXCEPT for relationships among factors in S.
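The multigraph construction, the maximum spanning tree, and the three FCI steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: every factor is a single letter (so a generator is a string such as "XY"), the function names are hypothetical, and Kruskal's algorithm is one convenient way to find a maximum spanning tree; the slides do not prescribe a particular algorithm. The FCI step is meaningful only for decomposable models.

```python
from itertools import combinations


def multigraph(generators):
    """Map each pair of generators to the set of factors they share.

    The size of that set is the number of edges in the multiedge.
    Assumes single-letter factor names (illustrative convention).
    """
    edges = {}
    for g, h in combinations(generators, 2):
        shared = set(g) & set(h)
        if shared:
            edges[(g, h)] = shared
    return edges


def maximum_spanning_tree(generators, edges):
    """Kruskal's algorithm, taking the heaviest multiedges first."""
    parent = {g: g for g in generators}

    def find(g):
        while parent[g] != g:
            g = parent[g]
        return g

    tree = []
    for (g, h), shared in sorted(edges.items(), key=lambda e: -len(e[1])):
        root_g, root_h = find(g), find(h)
        if root_g != root_h:  # joining g and h creates no circuit
            parent[root_g] = root_h
            tree.append((g, h, shared))
    return tree


def components(generators, s):
    """Factor sets of the connected components of M/S."""
    comps = []
    for g in generators:
        merged = set(g) - s
        if not merged:
            continue  # vertex vanishes once the factors in S are removed
        untouched = []
        for c in comps:
            if c & merged:
                merged |= c  # shares a factor, so same component
            else:
                untouched.append(c)
        comps = untouched + [merged]
    return sorted(comps, key=sorted)


def fcis(generators):
    """One FCI string per distinct branch label S of the spanning tree."""
    tree = maximum_spanning_tree(generators, multigraph(generators))
    result = []
    for s in {frozenset(shared) for _, _, shared in tree}:
        parts = [",".join(sorted(c)) for c in components(generators, s)]
        result.append("[" + "⊗".join(parts) + "|" + ",".join(sorted(s)) + "]")
    return sorted(result)


print(fcis(["XY", "XZ"]))
```

For the generating class [XY][XZ] the sketch finds the single branch S = {X}, removes X to leave the components {Y} and {Z}, and reports the FCI [Y⊗Z|X], matching the slide.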


Example: Ob-Gyn Study (Darrocca, et al., 1996)

n = 201 pregnant mothers

Variables:
  E: EGA (Early, Late)
  B: Bishop score (High, Low)
  T: Treatment (Prostin, Placebo)


Example: Ob-Gyn Study

                           BISHOP SCORE (B)
                       High                Low
                      EGA (E)             EGA (E)
  TREATMENT (T)    Early    Late       Early    Late
  ------------------------------------------------------
  Prostin            34      24          27      21
  Placebo            22      16          35      22

Best-fitting model: [E][TB]


Example: Ob-Gyn Study

Generating Class: [E][TB]

Multigraph:   E     TB   (no shared indices, so no edge)

FCI: [E⊗T,B]


Example: Ob-Gyn Study

Collapsed Table (collapse over EGA); proportion with High Bishop score in parentheses:

                       BISHOP SCORE (B)
  TREATMENT (T)      High           Low     Total
  --------------------------------------------------
  Prostin           58 (0.55)       48       106
  Placebo           38 (0.40)       57        95

P = 0.037


Example: WSU-United Way Study

  M: Marijuana (No, Yes)
  A: Alcohol (No, Yes)
  C: Cigarettes (No, Yes)
  R: Race (Other, White)
  S: Sex (Female, Male)

Observed cell frequencies (n = 2,276):

   12  117   17  133    0    1    0    1
   19  218   18  201    2   13    1   28
    1   17    8   17    0    1    1    1
   23  268   19  228   23  405   30  453


Example: WSU-United Way Study

Generating class: [ACE][MAC][MCG]

Multigraph, M:

[Figure: multigraph with vertices ACE, MCG, MAC]


Example: WSU-United Way Study

  M:     vertices ACE, MCG, MAC
  S = {A,C}
  M/S:   vertices E, MG, M

  [E⊗M,G|A,C]

  A = Alcohol     C = Cigarette     E = Ethnic
  G = Gender      M = Marijuana


Example: WSU PASS Program

“Preparing for Academic Success”

GPA below 2.0 at the end of first quarter


Example: WSU PASS Program

Variables (n = 972):

  FACTOR                 LABEL    LEVELS
  ---------------------------------------------------------------------
  Retention              R        1=No, 2=Yes
  Cohort                 C        1, 2, 3, 4
  PASS Participation     P        1=No, 2=Yes
  Ethnic Group           E        1=Caucasian, 2=African-American, 3=Other
  Gender                 G        1=Male, 2=Female


Example: WSU PASS Program

The best-fitting LLM has generating class [EG][CP][RC][PG]

Multigraph, M:

  EG --G-- PG --P-- CP --C-- RC


Example: WSU PASS Program

  M:     EG --G-- PG --P-- CP --C-- RC
  S = {C}
  M/S:   vertices EG, PG, P, R

  [E,G,P⊗R|C]

  C = Cohort     E = Ethnic     G = Gender
  P = PASS Participation        R = Retention


Example: Affinal Relations in Bosnia-Herzegovina

Data courtesy of Dr. Keith Doubt, Department of Sociology, Wittenberg University, Springfield, Ohio

N = 861 couples from Bosnia-Herzegovina are surveyed concerning affinal relations.

  M: Marriage Type (traditional, elopement)
  L: Location of Man and Wife (same, different)
  E: Ethnicity (Bosniak, Serb, Croat)
  S: Settlement (rural, urban)

Best-fitting model: [MLES]

Consider structural associations among M, L, and S for each ethnic group (E) separately.


Example: Affinal Relations in Bosnia-Herzegovina

  Bosniaks: [ML][LS]
  Serbs:    [MS][SL]
  Croats:   [M][L][S]

  M: Marriage Type
  L: Location of Man and Wife
  S: Settlement


Conclusions

The generator multigraph uses mathematical graph theory to analyze and interpret LLMs in a facile manner.

Properties of the multigraph allow one to:
– Find all conditional independencies
– Determine all collapsibility conditions

REFERENCE
Khamis, H.J. (2011). The Association Graph and the Multigraph for Loglinear Models, SAGE series Quantitative Applications in the Social Sciences, No. 167.


Without data, you’re just one more person with an opinion
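
The collapse over EGA in the Ob-Gyn example above can be checked numerically. The sketch below sums the three-way counts over E and then tests the resulting 2×2 table. The slides do not say which test produced P = 0.037; the Pearson chi-square test without continuity correction is an assumption here, and the function names are hypothetical. For one degree of freedom the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
from math import erfc, sqrt

# Ob-Gyn counts from the slides, indexed as counts[treatment][bishop][ega]
counts = {
    "Prostin": {"High": {"Early": 34, "Late": 24},
                "Low": {"Early": 27, "Late": 21}},
    "Placebo": {"High": {"Early": 22, "Late": 16},
                "Low": {"Early": 35, "Late": 22}},
}


def collapse_over_ega(table):
    """Sum over the EGA levels, leaving a Treatment x Bishop table."""
    return {
        treatment: {bishop: sum(by_ega.values())
                    for bishop, by_ega in by_bishop.items()}
        for treatment, by_bishop in table.items()
    }


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Assumed test; the slides report only the P-value.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


collapsed = collapse_over_ega(counts)
chi2, p_value = pearson_chi2_2x2(
    collapsed["Prostin"]["High"], collapsed["Prostin"]["Low"],
    collapsed["Placebo"]["High"], collapsed["Placebo"]["Low"],
)
print(collapsed)
print(round(chi2, 2), round(p_value, 3))
```

Under that assumption the collapsed counts match the table on the slide, and the P-value agrees with the reported 0.037 to three decimals. The collapse is justified by the FCI [E⊗T,B]: collapsing over E leaves the T-B relationship undistorted.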