Ioannis Kosmidis and
David Firth
Department of Statistics
JSM 2009
Kosmidis, I.
Bias reduction in generalized nonlinear models
In regular parametric models the maximum likelihood estimator \hat{\beta} is consistent, and the expansion of its bias has the form

E(\hat{\beta} - \beta_0) = \frac{b_1(\beta_0)}{n} + \frac{b_2(\beta_0)}{n^2} + \frac{b_3(\beta_0)}{n^3} + \cdots
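A concrete instance (my own illustration, not from the talk): for i.i.d. exponential observations with rate \theta, the maximum likelihood estimator \hat{\theta} = 1/\bar{y} has exact bias \theta/(n-1) = \theta/n + \theta/n^2 + \cdots, so the leading term of the expansion is visible in a quick simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 1.0, 10, 200_000

# MLE of the exponential rate: theta_hat = 1 / mean(y)
y = rng.exponential(scale=1 / theta, size=(reps, n))
theta_hat = 1 / y.mean(axis=1)

# Monte Carlo estimate of the bias; exact value is theta/(n-1) = 0.111...
bias_hat = theta_hat.mean() - theta
print(bias_hat)
```

With 200,000 replications the Monte Carlo error is far smaller than the bias itself, so the estimate sits close to \theta/(n-1).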
Firth (1993): Adjust the score functions U_t to

U_t^* = U_t + A_t \quad (t = 1, \ldots, p).

For appropriate functions A_t, solving U_t^* = 0 (t = 1, \ldots, p) results in estimators \tilde{\beta} with no O(n^{-1}) bias term.

Mehrabi & Matthews (1995), Heinze & Schemper (2002; 2005), Bull et al. (2002; 2007) and others.

→ ML estimates are not required.
→ Estimators with “better” properties.
Exponential family of distributions
Adjusted score functions for GNMs
Random variable Y from the exponential family of distributions:

f(y; \theta) = \exp\left\{ \frac{y\theta - b(\theta)}{\lambda} + c(y, \lambda) \right\},

where the dispersion \lambda is assumed known.

\mu = E(Y; \theta) = \frac{d b(\theta)}{d\theta}, \qquad \sigma^2 = \operatorname{var}(Y; \theta) = \lambda \frac{d^2 b(\theta)}{d\theta^2}.
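As a quick numerical check of these identities (a sketch of my own, not from the slides), take the Poisson family, where b(\theta) = \exp(\theta) and \lambda = 1, so that \mu = \sigma^2 = \exp(\theta); differentiating b numerically recovers both moments:

```python
import numpy as np

def b(theta):
    # Cumulant function of the Poisson family: b(theta) = exp(theta)
    return np.exp(theta)

theta = 0.7   # arbitrary canonical parameter value
eps = 1e-5

# mu = db/dtheta and sigma^2 = lambda * d^2 b/dtheta^2, with lambda = 1 for Poisson
mu = (b(theta + eps) - b(theta - eps)) / (2 * eps)
sigma2 = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps**2

print(mu, sigma2)  # both approximately exp(0.7) = 2.0138
```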
y_1, \ldots, y_n: realizations of independent random variables Y_1, \ldots, Y_n from the exponential family.

For a generalized nonlinear model (GNM),

g(\mu_r) = \eta_r(\beta) \quad (r = 1, \ldots, n),

where g is the link function and \eta_r : \Re^p \to \Re.

Score functions:

U_t = \sum_{r=1}^n \frac{w_r}{d_r} (y_r - \mu_r) x_{rt} \quad (t = 1, \ldots, p),

where w_r = d_r^2 / \sigma_r^2, d_r = d\mu_r / d\eta_r and x_{rt} = \partial \eta_r / \partial \beta_t.
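These quantities are easy to verify numerically. The sketch below (my own illustration, not code from the talk) takes a Poisson model with log link, where \mu_r = \exp(\eta_r), d_r = \mu_r and \sigma_r^2 = \mu_r, and checks the score formula against a finite-difference gradient of the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3]))).astype(float)

def loglik(b):
    # Poisson log-likelihood with log link (additive constants dropped)
    eta = X @ b
    return np.sum(y * eta - np.exp(eta))

def score(b):
    # U_t = sum_r (w_r / d_r) (y_r - mu_r) x_rt with mu_r = exp(eta_r),
    # d_r = mu_r, sigma_r^2 = mu_r, hence w_r = d_r^2 / sigma_r^2 = mu_r
    mu = np.exp(X @ b)
    d = mu
    w = d**2 / mu
    return X.T @ ((w / d) * (y - mu))

b0 = np.array([0.2, 0.1])   # arbitrary evaluation point
eps = 1e-6
grad = np.array([(loglik(b0 + eps * e) - loglik(b0 - eps * e)) / (2 * eps)
                 for e in np.eye(p)])
print(np.allclose(grad, score(b0), atol=1e-4))  # True
```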
Bias-reducing adjusted score functions (Kosmidis & Firth, 2008)

U_t^* = \sum_{r=1}^n \frac{w_r}{d_r} \left[ y_r + \frac{1}{2} \left( \frac{h_r d_r'}{w_r} + d_r \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} \right) - \mu_r \right] x_{rt},

→ d_r' = d^2\mu_r / d\eta_r^2, D_2(\eta_r; \beta) is the matrix of second derivatives of \eta_r with respect to \beta, and h_r is the r-th diagonal element of H = X F^{-1} X^\top W.
Bias-reducing adjusted score functions (Kosmidis & Firth, 2008)

In terms of adjusted responses y_r^*, the adjusted scores are

U_t^* = \sum_{r=1}^n \frac{w_r}{d_r} \left( y_r^* - \mu_r \right) x_{rt}, \qquad y_r^* = y_r + \frac{1}{2} \left( \frac{h_r d_r'}{w_r} + d_r \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} \right),

→ d_r' = d^2\mu_r / d\eta_r^2, and h_r is the r-th diagonal element of H = X F^{-1} X^\top W.
→ Replace y_r with the adjusted responses y_r^* in iterative reweighted least squares (IWLS).

In terms of modified working observations,

\zeta_r^* = \zeta_r - \xi_r \quad (r = 1, \ldots, n),

where

→ \zeta_r = \sum_{t=1}^p \beta_t x_{rt} + (y_r - \mu_r)/d_r is the working observation for maximum likelihood, and

→ \xi_r = -d_r' h_r / (2 w_r d_r) - \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} / 2.
Modified iterative reweighted least squares

Iteration:

\beta^{(j+1)} = (X^\top W^{(j)} X)^{-1} X^\top W^{(j)} (\zeta^{(j)} - \xi^{(j)}).

The O(n^{-1}) bias of the maximum likelihood estimator for generalized nonlinear models is

b_1/n = (X^\top W X)^{-1} X^\top W \xi

(Cook et al., 1986; Cordeiro & McCullagh, 1991). Thus the iteration takes the form

\beta^{(j+1)} = \hat{\beta}^{(j)} - b_1(\beta^{(j)})/n.
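As an illustration (my own sketch, not code from the talk): for a Poisson log-linear GLM the predictor \eta_r is linear in \beta, so D_2(\eta_r; \beta) = 0, and with d_r = w_r = \mu_r the modification reduces to \xi_r = -h_r/(2\mu_r), equivalent to Firth's adjusted responses y_r + h_r/2. A minimal modified-IWLS loop in NumPy:

```python
import numpy as np

def firth_poisson_iwls(X, y, n_iter=100):
    """Bias-reduced fit of a Poisson log-linear model by modified IWLS.

    Special case of beta <- (X'WX)^{-1} X'W (zeta - xi): with the log link
    d_r = w_r = mu_r, the predictor is linear in beta (so D_2 = 0), and
    xi_r = -h_r / (2 mu_r).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        F = X.T @ (mu[:, None] * X)                                # Fisher information X'WX
        h = mu * np.einsum('ij,jk,ik->i', X, np.linalg.inv(F), X)  # diagonal of H
        zeta = eta + (y - mu) / mu                                 # ML working observations
        xi = -h / (2 * mu)                                         # bias-reducing modification
        beta = np.linalg.solve(F, X.T @ (mu * (zeta - xi)))
    return beta

# Small simulated example (illustrative data, not the dental table)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.normal(size=8)])
y = rng.poisson(np.exp(0.3 + 0.5 * X[:, 1])).astype(float)
print(firth_poisson_iwls(X, y))
```

At the fixed point the adjusted score equations \sum_r (y_r + h_r/2 - \mu_r) x_{rt} = 0 are solved, which is what makes the returned estimate the bias-reduced fit.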
Data: Periodontal condition and calcium intake
Two-way cross-classification by factors X and Y with R and S levels, respectively. Entries are realizations of independent Poisson random variables.

The RC(1) model (Goodman, 1979, 1985):

\log \mu_{rs} = \lambda + \lambda_r^X + \lambda_s^Y + \rho \gamma_r \delta_s.

Modified working observation:

\zeta_{rs}^* = \zeta_{rs} + \frac{h_{rs}}{2\mu_{rs}} + \gamma_r C(\rho, \delta_s) + \delta_s C(\rho, \gamma_r) + \rho C(\gamma_r, \delta_s),

where for any given pair of unconstrained parameters \kappa and \nu, C(\kappa, \nu) denotes the corresponding element of F^{-1}; if either of \kappa or \nu is constrained, C(\kappa, \nu) = 0.
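To make the mean structure concrete, here is a small sketch (illustrative parameter values of my own, not the fitted ones) that builds the matrix of means \mu_{rs} and checks that, after removing the main effects, the log-mean interaction \rho \gamma_r \delta_s has rank one:

```python
import numpy as np

R, S = 4, 4
lam = 0.5
lam_X = np.array([0.0, -0.2, 0.4, 0.1])   # row main effects, lam_X[0] = 0
lam_Y = np.array([0.0, 0.3, -0.1, 0.2])   # column main effects, lam_Y[0] = 0
rho = 0.25
gamma = np.array([-2.0, -0.5, 1.0, 2.0])  # row scores, ends constrained to -2, 2
delta = np.array([-2.0, 0.0, 0.5, 2.0])   # column scores, ends constrained to -2, 2

# log mu_rs = lam + lam_X[r] + lam_Y[s] + rho * gamma[r] * delta[s]
log_mu = lam + lam_X[:, None] + lam_Y[None, :] + rho * np.outer(gamma, delta)
mu = np.exp(log_mu)

# After removing the main effects, the remaining interaction is rank one
resid = log_mu - lam - lam_X[:, None] - lam_Y[None, :]
print(np.linalg.matrix_rank(resid))  # 1
```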
Table: Periodontal condition and calcium intake (Goodman, 1981, Table 1.a.)

                           Calcium intake level
  Periodontal condition      1      2      3      4
  A                          5      3     10     11
  B                          4      5      8      6
  C                         26     11      3      6
  D                         23     11      1      2

For identifiability, set \lambda_1^X = \lambda_1^Y = 0, \gamma_1 = \delta_1 = -2 and \gamma_4 = \delta_4 = 2.

Simulate 250,000 data sets under the maximum likelihood fit.

Estimate biases, mean squared errors and coverage of nominally 95% Wald-type confidence intervals.
Table: Results for the dental health data. For the method of maximum likelihood, simulation results are all conditional upon finiteness of the estimates (about 3.5% of the simulated datasets resulted in infinite MLEs).

                 Estimates        MSE (×10)       Coverage (%)
                 ML      BR      ML      BR      ML      BR
  γ_2           2.31    2.35    2.28    1.49    96.9    95.8
  γ_3          -0.13   -0.13    1.45    1.16    95.7    97.1
  δ_2           0.55    0.52    1.50    1.18    96.0    97.1
  δ_3           0.07    0.10    3.34    1.87    97.3    95.5
  λ            -0.53   -0.53    1.00    0.80    95.6    93.8
  λ^X_2        -1.17   -1.05    6.55    2.80    94.7    92.8
  λ^X_3        -0.80   -0.75    3.19    1.69    96.6    96.2
  λ^X_4        -0.20   -0.18    0.05    0.03    96.0    97.3
  λ^Y_2        -1.55   -1.48    6.30    5.37    96.4    96.1
  λ^Y_3         0.90    0.91    6.94    5.34    97.3    95.0
  λ^Y_4        -1.16   -1.11    9.00    7.20    96.7    95.2
  ρ             3.11    2.84   35.55   18.13    96.4    92.4

ML, maximum likelihood; BR, bias-reduced; MSE, mean squared error.
Bias-reducing penalized likelihoods
Firth (1993): for a generalized linear model with canonical link, the adjusted scores correspond to penalization of the likelihood by the Jeffreys (1946) invariant prior.

In models with non-canonical link and p ≥ 2, there need not exist such a penalized likelihood interpretation.
Theorem (Existence of penalized likelihoods)

In the class of generalized linear models, there exists a penalized log-likelihood l^* such that \nabla l^*(\beta) \equiv U^*(\beta) for all possible specifications of the design matrix X, if and only if the inverse link derivatives d_r = 1/g'(\mu_r) satisfy

d_r \equiv \alpha_r \sigma_r^{2\omega} \quad (r = 1, \ldots, n),

where \alpha_r (r = 1, \ldots, n) and \omega do not depend on the model parameters.
The form of the penalized likelihoods for bias reduction

When d_r \equiv \alpha_r \sigma_r^{2\omega} (r = 1, \ldots, n) for some \omega and \alpha,

l^*(\beta) = l(\beta) + \frac{\omega}{4\omega - 2} \log |F(\beta)| \quad (\omega \neq 1/2),

l^*(\beta) = l(\beta) + \frac{1}{4} \sum_r h_r \log \kappa_{2,r}(\beta) \quad (\omega = 1/2).

→ The canonical link is the special case \omega = 1.
→ With \omega = 0, the condition refers to models with identity link.
→ For \omega = 1/2 the working weights, and hence F and H, do not depend on \beta.
→ If \omega \notin [0, 1/2], bias reduction also increases the value of |F(\beta)|. Thus, approximate confidence ellipsoids, based on asymptotic normality of the estimator, are reduced in volume.
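The canonical case \omega = 1 can be checked numerically. In the sketch below (my own illustration, assuming a logistic regression), the \omega = 1 penalty is (1/2)\log|F(\beta)|, and its gradient is compared against Firth's (1993) closed-form adjusted scores for logistic regression, \sum_r (y_r - \mu_r + h_r(1/2 - \mu_r)) x_{rt}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.integers(0, 2, size=n).astype(float)

def penalized_loglik(beta):
    # l(beta) + (1/2) log|F(beta)|, with F = X'WX and W = diag(mu(1 - mu))
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik + 0.5 * np.linalg.slogdet(X.T @ (w[:, None] * X))[1]

def adjusted_score(beta):
    # Firth (1993), logistic regression: sum_r (y_r - mu_r + h_r (1/2 - mu_r)) x_rt
    mu = 1 / (1 + np.exp(-(X @ beta)))
    w = mu * (1 - mu)
    F = X.T @ (w[:, None] * X)
    h = w * np.einsum('ij,jk,ik->i', X, np.linalg.inv(F), X)
    return X.T @ (y - mu + h * (0.5 - mu))

beta0 = np.array([0.1, -0.2])   # arbitrary evaluation point
eps = 1e-6
grad = np.array([(penalized_loglik(beta0 + eps * e) - penalized_loglik(beta0 - eps * e)) / (2 * eps)
                 for e in np.eye(p)])
print(np.allclose(grad, adjusted_score(beta0), atol=1e-4))  # True
```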
A computational and conceptual framework for bias reduction in generalized nonlinear models.

The dispersion \lambda was assumed known, but this does not restrict the applicability of the results: \lambda is usually estimated separately from the parameters \beta.

Bias reduction can be beneficial in terms of the properties of the resultant estimators.

Bias and point estimation are not strong statistical principles:
→ Bias relates to the parameterization; thus improving the bias violates exact equivariance under reparameterization.
→ Reduction in bias can be accompanied by inflation in variance.
Bull, S. B., Mak, C. and Greenwood, C. (2002). A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics and Data Analysis 39, 57–74.

Cordeiro, G. M. and McCullagh, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society, Series B 53, 629–643.

Cook, R. D., Tsai, C.-L. and Wei, B. C. (1986). Bias in nonlinear regression. Biometrika 73, 615–623.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Goodman, L. A. (1985). The analysis of cross-classified data having ordered and/or unordered categories: Association models, correlation models, and asymmetry models for contingency tables with or without missing entries. The Annals of Statistics 13, 10–69.

Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409–2419.

Kosmidis, I. and Firth, D. (2008). Bias reduction in exponential family nonlinear models. Technical Report 8-5, CRiSM Working Paper Series, University of Warwick. Accepted for publication in Biometrika.

Wei, B. (1997). Exponential Family Nonlinear Models. New York: Springer-Verlag.