Analysis of count data

A. Analysis of count data
Introduction to log-linear models
Log-linear analysis
• Contingency-table analysis
• Categorical data analysis
• Discrete multivariate analysis (Bishop, Fienberg
and Holland, 1975)
• Analysis of cross-classified data
• Multivariate analysis of qualitative data
(Goodman, 1978)
• Count data analysis
Contrast Coding
Log-linear models for two-way tables
Saturated log-linear model:
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
• $\mu$: overall effect (level)
• $\mu_i^A$, $\mu_j^B$: main effects (marginal freq.)
• $\mu_{ij}^{AB}$: interaction effect
In case of 2 x 2 table:
• 4 observations
• 9 parameters
• Normalisation constraints
Survey: leaving parental home in the Netherlands
Number leaving parental home, by age and sex, 1961 birth cohort

Sex      Age <20   Age >=20   Total (left home)   Censored   Total (sample)
Female       135        143                 278         13              291
Male          74        178                 252         40              292
Total        209        321                 530         53              583

The survey (Sept. 1987 - Febr. 1988):
• Sample of 583 young adults born in 1961
• 530 left home before survey
• 53 censored cases
Leaving home
Descriptive statistics
• Counts
• Percentages

Row percentages (age at leaving, within sex)
Sex      Age <20   Age >=20   Total
Female      48.6       51.4   100.0
Male        29.4       70.6   100.0
Total       39.4       60.6   100.0

Column percentages (sex, within age at leaving)
Sex      Age <20   Age >=20   Total
Female      64.6       44.5    52.5
Male        35.4       55.5    47.5
Total      100.0      100.0   100.0

• Odds of leaving home early rather than late
Sex    Female   Male     Total
Odds   0.9441   0.4157   0.6511
Odds ratio (reference category: males): 0.9441 / 0.4157 = 2.271
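The odds and the odds ratio above can be checked directly from the counts. The following is a minimal sketch; the variable and function names are illustrative and not part of the original slides.

```python
# Counts of uncensored cases from the leaving-home table.
counts = {
    "Female": {"early": 135, "late": 143},
    "Male": {"early": 74, "late": 178},
}

def odds(early, late):
    """Odds of leaving home early (<20) rather than late (>=20)."""
    return early / late

odds_female = odds(**counts["Female"])
odds_male = odds(**counts["Male"])
odds_total = odds(135 + 74, 143 + 178)

# Males are the reference category, so their odds go in the denominator.
odds_ratio = odds_female / odds_male

print(round(odds_female, 4), round(odds_male, 4), round(odds_total, 4))
print(round(odds_ratio, 3))
```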
Leaving home
Log-linear models for two-way tables
4 models
Model 1: Null model or overall effect model
All categories are equiprobable (an observation is equally likely to fall into any cell)
$\ln \lambda_{ij} = \mu$ for all i and j
$\mu = 4.887$, s.e. 0.0434
Exp(4.887) = 132.5 = 530/4
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column)
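Leaving home
$\mu = \frac{1}{4} \sum_{ij} \ln \lambda_{ij} = \ln \sqrt[4]{\prod_{ij} \lambda_{ij}}$
$\mathrm{Var}[\mu] = \mathrm{Var}\left[\frac{1}{4} \sum_{ij} \ln \lambda_{ij}\right] = \left(\frac{1}{4}\right)^2 \sum_{ij} \mathrm{Var}[\ln \lambda_{ij}]$
$\mathrm{Var}[\ln \lambda_{ij}] \approx 1/\lambda_{ij}$
where $\lambda_{ij}$ is a cell frequency generated by a Poisson process and $\mathrm{Var}[aX] = a^2 \mathrm{Var}[X]$, where a is a constant (e.g. Fingleton, 1984, p. 29)
$\mathrm{Var}[\mu] = \left(\frac{1}{4}\right)^2 \left[\frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50}\right] = \frac{1}{530}$, so s.e. $= \sqrt{1/530} = 0.0434$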
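The standard error of the overall effect in Model 1 can be reproduced numerically. This is a minimal sketch of the approximation described above (variance of a log Poisson count taken as one over the count); the variable names are illustrative.

```python
import math

# Model 1 expected counts: 530 observations spread evenly over 4 cells.
expected = [132.5, 132.5, 132.5, 132.5]

# mu is the mean of the log expected counts.
mu = sum(math.log(lam) for lam in expected) / len(expected)

# Var[ln lambda] is approximated by 1 / lambda, and Var[aX] = a^2 Var[X]
# with a = 1/4 because mu averages four log counts.
var_mu = (1 / len(expected)) ** 2 * sum(1 / lam for lam in expected)
se_mu = math.sqrt(var_mu)

print(round(mu, 3), round(se_mu, 4))
```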
Leaving home
Log-linear models for two-way tables
Model 2: B null model
Categories of variable B (sex) are equiprobable within levels of variable A (age)
$\ln \lambda_{ij} = \mu + \mu_i^A$ for all j

GLIM
Symbol      Parameter        Estimate   s.e.      Exp(parameter)
$\mu$       Overall effect   4.649      0.06914   104.5
$\mu_1^A$   TIME(1)          0.0000     .
$\mu_2^A$   TIME(2)          0.4291     0.08886   1.536
Leaving home
Log-linear models for two-way tables
Model 3: A null model
Categories of variable A (age) are equiprobable within levels of variable B (sex)
$\ln \lambda_{ij} = \mu + \mu_j^B$ for all i

SPSS
Symbol      Parameter        Estimate   s.e.     Exp(parameter)
$\mu$       Overall effect    5.773     0.0558   321.5
$\mu_1^A$   TIME(1)          -0.4283    0.0888   0.6516
$\mu_2^A$   TIME(2)           0.0000    .
Leaving home
Log-linear models for two-way tables
Model 4: independence model (unsaturated model)
Categories of variable B (sex) are not equiprobable but the probability is independent of levels of variable A (time)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B] = \tau \, \tau_i^A \, \tau_j^B$

GLIM
Symbol      Parameter        Estimate   s.e.     Exp(parameter)
$\mu$       Overall effect    4.697     0.0806   109.62
$\mu_2^A$   TIME(2)           0.429     0.0889   1.536
$\mu_2^B$   SEX(2)           -0.098     0.0870   0.906
Leaving home
LOG-LINEAR MODEL: predictions
Females leaving home early: 109.62
Females leaving home late: 109.62 * 1.536 = 168.37
Males leaving home early: 109.62 * 0.906 = 99.37
Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
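The Model 4 predictions can also be obtained from the table margins, since under independence the expected count is the row total times the column total divided by the grand total. The sketch below assumes that standard result; the variable names are illustrative. Small differences from the hand calculation above come from the rounded multipliers 1.536 and 0.906.

```python
# Observed counts: rows = sex (Female, Male), columns = age (<20, >=20).
observed = [[135, 143], [74, 178]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Independence model: expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Multiplicative parameters with the first categories as reference.
tau = expected[0][0]                        # females leaving early
tau_time2 = expected[0][1] / expected[0][0]  # late versus early
tau_sex2 = expected[1][0] / expected[0][0]   # male versus female

for row in expected:
    print([round(value, 2) for value in row])
```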
Leaving home
SPSS
Parameter   Symbol      Estimate   SE      Label
1           $\mu$        5.0280    .0721   Overall effect
2           $\mu_1^A$    -.4291    .0889   Time(1)
3           $\mu_2^A$     .0000    .       Time(2)
4           $\mu_1^B$     .0982    .0870   Sex(1)
5           $\mu_2^B$     .0000    .       Sex(2)
Leaving home
Log-linear models for two-way tables
Model 5: saturated model
The values of categories of variable B (sex) depend on levels of variable A (time)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

GLIM
Symbol            Parameter        Estimate   s.e.
$\mu$             Overall effect    4.905     0.08607
$\mu_2^A$         TIME(2)           0.05757   0.1200
$\mu_2^B$         SEX(2)           -0.6012    0.1446
$\mu_{22}^{AB}$   TIME(2).SEX(2)    0.8201    0.1831
Leaving home
SPSS
Parameter   Symbol            Estimate   SE      Label
1           $\mu$              5.1846    .0748   Overall effect
2           $\mu_1^A$          -.8738    .1379   Time(1)
3           $\mu_2^A$           .0000    .       Time(2)
4           $\mu_1^B$          -.2183    .1121   Sex(1)
5           $\mu_2^B$           .0000    .       Sex(2)
6           $\mu_{11}^{AB}$     .8164    .1827   Time(1) * Sex(1)
7           $\mu_{12}^{AB}$     .0000    .       Time(1) * Sex(2)
8           $\mu_{21}^{AB}$     .0000    .       Time(2) * Sex(1)
9           $\mu_{22}^{AB}$     .0000    .       Time(2) * Sex(2)
Leaving home
LOG-LINEAR MODEL: predictions
Expected frequencies
Cell      Label   Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Fem_<20   F11          135    132.50    104.50    139.00    109.63    135.00
Mal_<20   F12           74    132.50    104.50    126.00     99.37     74.00
Fem_>20   F21          143    132.50    160.50    139.00    168.37    143.00
Mal_>20   F22          178    132.50    160.50    126.00    152.63    178.00
Relation log-linear model and Poisson regression model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$
$x_{1i}$ and $x_{2j}$ are dummy variables (0 if i or j is equal to 1 and 1 if i or j is equal to 2) and the interaction variable is $x_{3ij} = x_{1i} \cdot x_{2j}$
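For the saturated model the regression coefficients can be read off the log counts, because the model reproduces the table exactly. The sketch below relies on that special case and does not implement a general iterative Poisson fit; the names `dummies` and `predict` are illustrative. It uses the leaving-home counts with i indexing age and j indexing sex, as in the expected-frequency table.

```python
import math

# Observed counts F[i, j]: i = age (1: <20, 2: >=20), j = sex (1: female, 2: male).
F = {(1, 1): 135, (1, 2): 74, (2, 1): 143, (2, 2): 178}

def dummies(i, j):
    """Return (x1, x2, x3): two 0/1 dummies and their product."""
    x1 = 0 if i == 1 else 1
    x2 = 0 if j == 1 else 1
    return x1, x2, x1 * x2

# Saturated model: solve the four equations ln F_ij = b0 + b1*x1 + b2*x2 + b3*x3.
b0 = math.log(F[1, 1])
b1 = math.log(F[2, 1]) - b0
b2 = math.log(F[1, 2]) - b0
b3 = math.log(F[2, 2]) - b0 - b1 - b2

def predict(i, j):
    """Expected count in cell (i, j) from the regression form of the model."""
    x1, x2, x3 = dummies(i, j)
    return math.exp(b0 + b1 * x1 + b2 * x2 + b3 * x3)

print(round(b0, 3), round(b1, 5), round(b2, 4), round(b3, 4))
```

The coefficients agree with the GLIM estimates for Model 5 (overall effect, TIME(2), SEX(2) and TIME(2).SEX(2)).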
Observed
Sex      Age < 20   Age >= 20   Total
Female        135         143     278
Male           74         178     252
Total         209         321     530
Model 1: Null Model
Sex      Age < 20   Age >= 20   Total
Female      132.5       132.5     265
Male        132.5       132.5     265
Total         265         265     530
$\ln \lambda_{ij} = \mu$
Model 2: B Null Model (sex equiprobable)
Sex      Age < 20   Age >= 20   Total
Female      104.5       160.5     265
Male        104.5       160.5     265
Total         209         321     530
$\ln \lambda_{ij} = \mu + \mu_i^A$
Model 3: A Null Model (age equiprobable)
Sex      Age < 20   Age >= 20   Total
Female        139         139     278
Male          126         126     252
Total         265         265     530
$\ln \lambda_{ij} = \mu + \mu_j^B$
Model 4: Independence Model (no interaction)
Sex      Age < 20   Age >= 20   Total
Female     109.63      168.37     278
Male        99.37      152.63     252
Total         209         321     530
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
Model 5: Saturated Model
Sex      Age < 20   Age >= 20   Total
Female        135         143     278
Male           74         178     252
Total         209         321     530
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
Log-linear model
fit a model to a table of frequencies
Data: survey of political attitudes of British electors

OBSERVED FREQUENCIES FOR VOTE by gender
Party          Male   Female   Total
Conservative    279      352     631
Labour          335      291     626
Total           614      643    1257

Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974).
The classical approach
Geometric means (Birch, 1963)
Effect coding (mean is ref. cat.)
Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233
Political attitudes
The basic model
Logarithm of frequencies
Party          Male      Female    Total
Conservative    5.6312    5.8636   11.4948
Labour          5.8141    5.6733   11.4875
Total          11.4453   11.5370   22.9823

Overall effect: 22.9823/4 = 5.7456
Effect of party:
  Conservative: 11.4948/2 - 5.7456 = 0.0018
  Labour:       11.4875/2 - 5.7456 = -0.0018
Effect of gender:
  Male:   11.4453/2 - 5.7456 = -0.0229
  Female: 11.5370/2 - 5.7456 = 0.0229
Interaction effects: Gender-Party interaction effect
  Male conservative:   5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
  Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
  Male labour:         5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
  Female labour:       5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
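The classical calculation above (overall mean of the log counts, then row and column deviations, then the remainder as interaction) can be sketched in a few lines. The variable names are illustrative and not taken from the slides.

```python
import math

# Observed counts: rows = party (Conservative, Labour), columns = gender (Male, Female).
counts = [[279, 352], [335, 291]]
logs = [[math.log(f) for f in row] for row in counts]

# Overall effect: mean of the four log counts.
overall = sum(sum(row) for row in logs) / 4

# Main effects: row (party) and column (gender) means minus the overall effect.
party = [sum(row) / 2 - overall for row in logs]
gender = [sum(col) / 2 - overall for col in zip(*logs)]

# Interaction: what remains after removing the overall and main effects.
interaction = [
    [logs[i][j] - overall - party[i] - gender[j] for j in range(2)]
    for i in range(2)
]

print(round(overall, 4), [round(p, 4) for p in party], [round(g, 4) for g in gender])
print([[round(v, 4) for v in row] for row in interaction])
```

By construction the party effects, the gender effects, and each row and column of the interaction table sum to zero, which is the effect-coding normalisation.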
Political attitudes
The basic model (Effect Coding: Mean)

Parameter                    Symbol            Estimate
Main effect                  $\mu$              5.7456
Party effect                 $\mu_i^A$
  Conservative                                  0.0018
  Labour                                       -0.0018
Gender effect                $\mu_j^B$
  Male                                         -0.0229
  Female                                        0.0229
Gender-Party interaction     $\mu_{ij}^{AB}$
  Male conservative                            -0.0933
  Female conservative                           0.0933
  Male labour                                   0.0933
  Female labour                                -0.0933

Coding: effect coding
$\sum_i \mu_i^A = 0$, $\sum_j \mu_j^B = 0$, $\sum_i \mu_{ij}^{AB} = \sum_j \mu_{ij}^{AB} = 0$
Parameters are subject to constraints: normalisation constraints
Only first-order contrasts can be estimated: $\mu_2^A - \mu_1^A$
Political attitudes
The basic model (GLIM)

Parameter                    Symbol            Estimate   S.E.
Main effect                  $\mu$              5.6310    0.0599
Party effect                 $\mu_i^A$
  Conservative                                  0.0000    .
  Labour                                        0.1829    0.0811
Gender effect                $\mu_j^B$
  Male                                          0.0000    .
  Female                                        0.2324    0.0802
Gender-Party interaction     $\mu_{ij}^{AB}$
  Male conservative                             0.0000    .
  Female conservative                           0.0000    .
  Male labour                                   0.0000    .
  Female labour                                -0.3732    0.1133
Political attitudes
The basic model (SPSS)

                                                 Asymptotic 95% CI
Parameter                    Estimate   SE       Lower    Upper
Main effect                   5.6750    0.0586    5.56     5.79
Party effect
  Conservative                0.1900    0.0792    0.03     0.35
  Labour                      0.0000    .         .        .
Gender effect
  Male                        0.1406    0.0801   -0.02     0.30
  Female                      0.0000    .         .        .
Gender-Party interaction
  Male conservative          -0.3726    0.1133   -0.59    -0.15
  Female conservative         0.0000    .         .        .
  Male labour                 0.0000    .         .        .
  Female labour               0.0000    .         .        .
Political attitudes
The basic model (1)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$
$\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
$\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
$\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
$\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

Logarithm of frequencies
Party          Male      Female    Total
Conservative    5.6312    5.8636   11.4948
Labour          5.8141    5.6733   11.4875
Total          11.4453   11.5370   22.9823
The design-matrix approach
I. Design matrix: Effect Coding
unsaturated log-linear model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
$$
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u \\ u_1^A \\ u_2^A \\ u_1^B \\ u_2^B \end{pmatrix}
$$
$Y = X\mu$, $\mu = (X'X)^{-1} X'Y$
Number of parameters exceeds number of equations => need for additional equations
$X'X$ is singular, so $(X'X)^{-1}$ does not exist => identify linear dependencies
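I. Design matrix
unsaturated log-linear model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
Additional equations: $\mu_2^A = -\mu_1^A$, $\mu_2^B = -\mu_1^B$
Coding!
$$
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{pmatrix}
\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}
$$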
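3 unknowns => 3 equations
$$
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}
$$
$$
\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}
=
\begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix}
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}
$$
where $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$ and $\lambda_{ij}$ is the frequency predicted by the model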
Political attitudes
OBSERVED FREQUENCIES FOR VOTE by gender
Party          Male   Female   Total
Conservative    279      352     631
Labour          335      291     626
Total           614      643    1257

$\lambda_{11} = F_{1+} F_{+1} / F_{++}$ = 631 * 614 / 1257 = 308.22
$\lambda_{12} = F_{1+} F_{+2} / F_{++}$ = 631 * 643 / 1257 = 322.78
$\lambda_{21} = F_{2+} F_{+1} / F_{++}$ = 626 * 614 / 1257 = 305.78
$\lambda_{22} = F_{2+} F_{+2} / F_{++}$ = 626 * 643 / 1257 = 320.22

PREDICTED FREQUENCIES FOR VOTE by gender
Party          Male     Female   Total
Conservative   308.22   322.78    631.00
Labour         305.78   320.22    626.00
Total          614.00   643.00   1257.00
Political attitudes
$$
\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}
=
\begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix}
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}
=
\begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix}
\begin{pmatrix} \ln 308.22 \\ \ln 322.78 \\ \ln 305.78 \end{pmatrix}
$$
$$
=
\begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix}
\begin{pmatrix} 5.7308 \\ 5.7770 \\ 5.7229 \end{pmatrix}
=
\begin{pmatrix} 5.74995 \\ 0.00395 \\ -0.02310 \end{pmatrix}
$$
$\tau = \exp[u] = 314.17$, $\tau_1^A = \exp[u_1^A] = 1.0040$, $\tau_1^B = \exp[u_1^B] = 0.9772$
$\lambda_{11} = \tau \, \tau_1^A \, \tau_1^B$ = 314.17 * 1.0040 * 0.9772 = 308.23
$\lambda_{21} = \tau \, [1/\tau_1^A] \, \tau_1^B$ = 314.17 * [1/1.0040] * 0.9772 = 305.78
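The effect-coded parameters of the unsaturated model can be recomputed from the predicted frequencies with the inverse matrix shown above. This is a minimal sketch in plain Python; the variable names are illustrative, and the last line recovers the fourth cell, which was not used in solving, as a consistency check.

```python
import math

observed = [[279, 352], [335, 291]]  # rows: party, columns: gender
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]
n = sum(row_tot)

# Fitted counts under independence: lambda_ij = F_i+ * F_+j / F_++.
lam = [[r * c / n for c in col_tot] for r in row_tot]

# Inverse of the effect-coded design matrix for cells (11, 12, 21).
X_inv = [[0.0, 0.5, 0.5],
         [0.5, 0.0, -0.5],
         [0.5, -0.5, 0.0]]
y = [math.log(lam[0][0]), math.log(lam[0][1]), math.log(lam[1][0])]
u, u_a1, u_b1 = (sum(w * v for w, v in zip(row, y)) for row in X_inv)

# Multiplicative form of the parameters.
tau, tau_a1, tau_b1 = math.exp(u), math.exp(u_a1), math.exp(u_b1)

# Cell 22 uses the second categories, i.e. the reciprocal multipliers.
lam_22 = tau / (tau_a1 * tau_b1)

print(round(u, 5), round(u_a1, 5), round(u_b1, 5))
print(round(tau, 2), round(lam_22, 2))
```

Because the sketch works with unrounded fitted counts, the last digits can differ slightly from the hand calculation, which starts from logs rounded to four decimals.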
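Design matrix
Saturated log-linear model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
$\mu_2^A = -\mu_1^A$, $\mu_2^B = -\mu_1^B$
$\mu_{12}^{AB} = -\mu_{11}^{AB}$, $\mu_{21}^{AB} = -\mu_{11}^{AB}$, $\mu_{22}^{AB} = \mu_{11}^{AB}$
$$
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}
$$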
Political attitudes
$$
\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}
$$
$$
=
\begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & -0.25 & -0.25 \\ 0.25 & -0.25 & 0.25 & -0.25 \\ 0.25 & -0.25 & -0.25 & 0.25 \end{pmatrix}
\begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix}
=
\begin{pmatrix} 5.74555 \\ 0.00185 \\ -0.02290 \\ -0.09330 \end{pmatrix}
$$
$\lambda_{11} = \exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]$ = exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
$\lambda_{21} = \exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]$ = exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335
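The saturated effect-coded system can be solved without a matrix library, because this particular design matrix is symmetric with mutually orthogonal columns, so its inverse is the matrix itself divided by 4. The sketch below relies on that property, which holds for the 2 x 2 case shown here; the variable names are illustrative.

```python
import math

# Effect-coded design matrix for cells (11, 12, 21, 22).
# Columns: u, u1A (party), u1B (gender), u11AB (interaction).
X = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]

y = [math.log(f) for f in (279, 352, 335, 291)]

# X X = 4 I for this matrix, hence X^{-1} = X / 4.
params = [sum(x * v for x, v in zip(row, y)) / 4 for row in X]
u, u_a1, u_b1, u_ab11 = params

# The saturated model reproduces the observed counts exactly.
fitted = [math.exp(sum(x * p for x, p in zip(row, params))) for row in X]

print([round(p, 5) for p in params])
print([round(f, 2) for f in fitted])
```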
LOG-LINEAR MODEL: expected frequencies
Political attitudes
                                          Overall   Party     Gender    Unsatur.   Saturated
Cell         Label   Observed             Model 1   Model 2   Model 3   Model 4    Model 5
Mal_Cons     F11          279              314.25    315.50    307.00    308.22     279.00
Fem_Cons     F12          352              314.25    315.50    321.50    322.78     352.00
Mal_Labour   F21          335              314.25    313.00    307.00    305.78     335.00
Fem_Labour   F22          291              314.25    313.00    321.50    320.22     291.00
--------------------------------------------------------------------------------------------
Chi-square                                  11.58     11.54     10.9      10.89       0
Degrees of freedom                              3         2        2          1       0

LOG-LINEAR MODEL: Parameters (CONTRAST CODING: first category = 0)
A. Additive model
Type of model                     Overall   Party     Gender    Unsatur.   Satur.
Main effect                        5.7502    5.7542    5.7269    5.7308     5.6312
Gender effect                      0.0000    0.0000    0.0461    0.0462     0.2324
Party effect                       0.0000   -0.0080    0.0000   -0.0080     0.1829
Gender-Party interaction effect    0.0000    0.0000    0.0000    0.0000    -0.3732

B. Multiplicative model [exp(u)]
Type of model                     Overall    Party      Gender     Unsatur.   Satur.
Main effect                       314.2500   315.5001   307.0007   308.2157   278.9967
Gender effect                       0.0000     1.0000     1.0472     1.0472     1.2616
Party effect                        0.0000     0.9920     1.0000     0.9920     1.2007
Gender-Party interaction effect     0.0000     1.0000     1.0000     1.0000     0.6885
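The slide does not say which chi-square statistic is tabulated, so the sketch below computes both common choices, the Pearson statistic and the likelihood-ratio statistic (G-squared), from the expected frequencies listed above. The function names are illustrative.

```python
import math

# Order of cells: Mal_Cons, Fem_Cons, Mal_Labour, Fem_Labour.
observed = [279, 352, 335, 291]
fitted = {
    "Model 1": [314.25, 314.25, 314.25, 314.25],
    "Model 2": [315.50, 315.50, 313.00, 313.00],
    "Model 3": [307.00, 321.50, 307.00, 321.50],
    "Model 4": [308.22, 322.78, 305.78, 320.22],
}

def pearson(obs, exp):
    """Pearson chi-square: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def likelihood_ratio(obs, exp):
    """Likelihood-ratio chi-square: 2 * sum of O * ln(O / E)."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp))

for name, exp in fitted.items():
    print(name, round(pearson(observed, exp), 2),
          round(likelihood_ratio(observed, exp), 2))
```

The two statistics are close for every model here, which is expected when all cell counts are large.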
Other Ways of Restricting
II. Design Matrix: Contrast Coding
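III. Design matrix: other restrictions on parameters
saturated log-linear model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
$\mu_2^A = 0$, $\mu_2^B = 0$, $\mu_{12}^{AB} = \mu_{21}^{AB} = \mu_{22}^{AB} = 0$ (SPSS)
$$
\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}
$$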
Political attitudes
$$
\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} \ln 279 \\ \ln 352 \\ \ln 335 \\ \ln 291 \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix}
=
\begin{pmatrix} 5.6750 \\ 0.1900 \\ 0.1406 \\ -0.3726 \end{pmatrix}
$$

Parameter                    Coding 2 (SPSS)   Coding 1 (Birch)
Main effect                       5.6750            5.7456
Party effect
  Conservative                    0.1900            0.0019
  Labour                          0.0000           -0.0019
Gender effect
  Male                            0.1406           -0.0229
  Female                          0.0000            0.0229
Gender-Party interaction
  Male conservative              -0.3726           -0.0933
  Female conservative             0.0000            0.0933
  Male labour                     0.0000            0.0933
  Female labour                   0.0000           -0.0933
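The SPSS-style estimates can be reproduced by treating the last categories (Labour, Female) as the reference and, as the deck notes for its SPSS output, adding 0.5 to each observed count. The following is a sketch under those two assumptions; `reference_coded` is an illustrative helper, not an SPSS function.

```python
import math

# Cell labels "ij": i = party (1 Conservative, 2 Labour), j = gender (1 Male, 2 Female).
observed = {"11": 279, "12": 352, "21": 335, "22": 291}

def reference_coded(freq):
    """Saturated-model parameters with cell 22 as the reference cell."""
    logs = {cell: math.log(value) for cell, value in freq.items()}
    u = logs["22"]
    u_a1 = logs["12"] - logs["22"]                     # party: Conservative
    u_b1 = logs["21"] - logs["22"]                     # gender: Male
    u_ab11 = logs["11"] - logs["12"] - logs["21"] + logs["22"]
    return u, u_a1, u_b1, u_ab11

raw = reference_coded(observed)
with_half = reference_coded({cell: value + 0.5 for cell, value in observed.items()})

print([round(p, 4) for p in raw])
print([round(p, 4) for p in with_half])
```

Only the version with 0.5 added gives 5.6750 for the main effect; with the raw counts the main effect is ln 291 = 5.6733.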
Political attitudes
OBSERVED FREQUENCIES FOR VOTE BY SEX
Party          Male   Female   Total
Conservative    279      352     631
Labour          335      291     626
Total           614      643    1257

Parameter estimates
                             Contrast coding     Contrast coding     Effect coding
                             (SPSS)              (GLIM)              (Birch)
Parameter                    mu       exp(mu)    mu       exp(mu)    mu       exp(mu)
Main effect                   5.6750  291.49      5.6312  279.00      5.7456  312.80
Party effect
  Conservative                0.1900    1.2092    0.0000    1.0000    0.0019    1.0019
  Labour                      0.0000    1.0000    0.1829    1.2007   -0.0019    0.9982
Gender effect
  Male                        0.1406    1.1510    0.0000    1.0000   -0.0229    0.9774
  Female                      0.0000    1.0000    0.2324    1.2616    0.0229    1.0232
Gender-Party interaction
  Male conservative          -0.3726    0.6889    0.0000    1.0000   -0.0933    0.9109
  Female conservative         0.0000    1.0000    0.0000    1.0000    0.0933    1.0978
  Male labour                 0.0000    1.0000    0.0000    1.0000    0.0933    1.0978
  Female labour               0.0000    1.0000   -0.3732    0.6885   -0.0933    0.9109
Political attitudes
Parameter estimates and standard errors
                             Contrast coding (SPSS)   Contrast coding (GLIM)
Parameter                    Param     s.e.           Param     s.e.
Main effect                   5.6750   0.0586          5.6312   0.0599
Party effect
  Conservative                0.1900   0.0792          0.0000   .
  Labour                      0.0000   .               0.1829   0.0811
Gender effect
  Male                        0.1406   0.0801          0.0000   .
  Female                      0.0000   .               0.2324   0.0802
Gender-Party interaction
  Male conservative          -0.3726   0.1133          0.0000   .
  Female conservative         0.0000   .               0.0000   .
  Male labour                 0.0000   .               0.0000   .
  Female labour               0.0000   .              -0.3732   0.1133
Political attitudes
Prediction of counts or frequencies:
A. Effect coding
279 = 312.80 * 0.97736 * 1.00185 * 0.91092
352 = 312.80 * 1.02316 * 1.00185 * 1.09779
335 = 312.80 * 0.97736 * 0.99815 * 1.09779
291 = 312.80 * 1.02316 * 0.99815 * 0.91092
B. Contrast coding: GLIM
279 = 279 * 1      * 1      * 1       (males voting conservative = ref. cat.)
352 = 279 * 1.2616 * 1      * 1       (females voting conservative)
335 = 279 * 1      * 1.2007 * 1       (males voting labour)
291 = 279 * 1.2616 * 1.2007 * 0.6885  (females voting labour)
C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
352.5 = 291.5 * 1       * 1.20925 * 1
335.5 = 291.5 * 1.15096 * 1       * 1
291.5 = 291.5 * 1       * 1       * 1  (females voting labour = ref. cat.)
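The GLIM-style multiplicative predictions in part B can be rebuilt from the observed counts, taking males voting Conservative as the reference cell. This is a minimal sketch with illustrative names; it derives the multipliers rather than reading them from GLIM output.

```python
observed = {
    ("conservative", "male"): 279,
    ("conservative", "female"): 352,
    ("labour", "male"): 335,
    ("labour", "female"): 291,
}

# Reference cell and multiplicative effects relative to it.
ref = observed["conservative", "male"]
tau_female = observed["conservative", "female"] / ref
tau_labour = observed["labour", "male"] / ref
tau_inter = observed["labour", "female"] / (ref * tau_female * tau_labour)

def predict(party, gender):
    """Predicted count: reference count times the applicable multipliers."""
    value = ref
    if gender == "female":
        value *= tau_female
    if party == "labour":
        value *= tau_labour
    if party == "labour" and gender == "female":
        value *= tau_inter
    return value

print(round(tau_female, 4), round(tau_labour, 4), round(tau_inter, 4))
print({cell: round(predict(*cell), 2) for cell in observed})
```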