Ioannis Kosmidis and
David Firth
Department of Statistics
JSM 2009
Kosmidis, I.
Bias reduction in generalized nonlinear models
In regular parametric models the maximum likelihood estimator \hat{\beta} is consistent, and the expansion of its bias has the form

E(\hat{\beta} - \beta_0) = \frac{b_1(\beta_0)}{n} + \frac{b_2(\beta_0)}{n^2} + \frac{b_3(\beta_0)}{n^3} + \cdots
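A concrete instance (my own illustration, not from the talk): for i.i.d. exponential observations with rate \theta, the maximum likelihood estimator \hat{\theta} = 1/\bar{y} has exact bias \theta/(n-1) = \theta/n + \theta/n^2 + \cdots, so the leading term of the expansion is visible in a quick simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 1.0, 10, 200_000

# MLE of the exponential rate: theta_hat = 1 / mean(y)
y = rng.exponential(scale=1 / theta, size=(reps, n))
theta_hat = 1 / y.mean(axis=1)

# Monte Carlo estimate of the bias; exact value is theta/(n-1) = 0.111...
bias_hat = theta_hat.mean() - theta
print(bias_hat)
```

With 200,000 replications the Monte Carlo error is far smaller than the bias itself, so the estimate sits close to \theta/(n-1).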
Firth (1993): Adjust the score functions U_t to

U_t^* = U_t + A_t \quad (t = 1, \ldots, p).

For appropriate functions A_t, solving U_t^* = 0 (t = 1, \ldots, p) results in estimators \tilde{\beta} with no O(n^{-1}) bias term.

Mehrabi & Matthews (1995), Heinze & Schemper (2002; 2005), Bull et al. (2002; 2007) and others.

→ ML estimates are not required.
→ Estimators with “better” properties.
Exponential family of distributions
Adjusted score functions for GNMs
Random variable Y from the exponential family of distributions:

f(y; \theta) = \exp\left\{ \frac{y\theta - b(\theta)}{\lambda} + c(y, \lambda) \right\},

where the dispersion \lambda is assumed known.

\mu = E(Y; \theta) = \frac{d b(\theta)}{d\theta}, \qquad \sigma^2 = \operatorname{var}(Y; \theta) = \lambda \frac{d^2 b(\theta)}{d\theta^2}.
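As a quick numerical check of these identities (a sketch of my own, not from the slides), take the Poisson family, where b(\theta) = \exp(\theta) and \lambda = 1, so that \mu = \sigma^2 = \exp(\theta); differentiating b numerically recovers both moments:

```python
import numpy as np

def b(theta):
    # Cumulant function of the Poisson family: b(theta) = exp(theta)
    return np.exp(theta)

theta = 0.7   # arbitrary canonical parameter value
eps = 1e-5

# mu = db/dtheta and sigma^2 = lambda * d^2 b/dtheta^2, with lambda = 1 for Poisson
mu = (b(theta + eps) - b(theta - eps)) / (2 * eps)
sigma2 = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps**2

print(mu, sigma2)  # both approximately exp(0.7) = 2.0138
```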
y_1, \ldots, y_n: realizations of independent random variables Y_1, \ldots, Y_n from the exponential family.

For a generalized nonlinear model (GNM),

g(\mu_r) = \eta_r(\beta) \quad (r = 1, \ldots, n),

where g is the link function and \eta_r : \Re^p \to \Re.

Score functions:

U_t = \sum_{r=1}^n \frac{w_r}{d_r} (y_r - \mu_r) x_{rt} \quad (t = 1, \ldots, p),

where w_r = d_r^2 / \sigma_r^2, d_r = d\mu_r / d\eta_r and x_{rt} = \partial \eta_r / \partial \beta_t.
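These quantities are easy to verify numerically. The sketch below (my own illustration, not code from the talk) takes a Poisson model with log link, where \mu_r = \exp(\eta_r), d_r = \mu_r and \sigma_r^2 = \mu_r, and checks the score formula against a finite-difference gradient of the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3]))).astype(float)

def loglik(b):
    # Poisson log-likelihood with log link (additive constants dropped)
    eta = X @ b
    return np.sum(y * eta - np.exp(eta))

def score(b):
    # U_t = sum_r (w_r / d_r) (y_r - mu_r) x_rt with mu_r = exp(eta_r),
    # d_r = mu_r, sigma_r^2 = mu_r, hence w_r = d_r^2 / sigma_r^2 = mu_r
    mu = np.exp(X @ b)
    d = mu
    w = d**2 / mu
    return X.T @ ((w / d) * (y - mu))

b0 = np.array([0.2, 0.1])   # arbitrary evaluation point
eps = 1e-6
grad = np.array([(loglik(b0 + eps * e) - loglik(b0 - eps * e)) / (2 * eps)
                 for e in np.eye(p)])
print(np.allclose(grad, score(b0), atol=1e-4))  # True
```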
Bias-reducing adjusted score functions (Kosmidis & Firth, 2008)

U_t^* = \sum_{r=1}^n \frac{w_r}{d_r} \left[ y_r + \frac{1}{2} \left( \frac{h_r d_r'}{w_r} + d_r \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} \right) - \mu_r \right] x_{rt},

→ d_r' = d^2\mu_r / d\eta_r^2, D_2(\eta_r; \beta) is the matrix of second derivatives of \eta_r with respect to \beta, and h_r is the r-th diagonal element of H = X F^{-1} X^\top W.
Bias-reducing adjusted score functions (Kosmidis & Firth, 2008)

In terms of adjusted responses y_r^*, the adjusted scores are

U_t^* = \sum_{r=1}^n \frac{w_r}{d_r} \left( y_r^* - \mu_r \right) x_{rt}, \qquad y_r^* = y_r + \frac{1}{2} \left( \frac{h_r d_r'}{w_r} + d_r \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} \right),

→ d_r' = d^2\mu_r / d\eta_r^2, and h_r is the r-th diagonal element of H = X F^{-1} X^\top W.
→ Replace y_r with the adjusted responses y_r^* in iterative reweighted least squares (IWLS).

In terms of modified working observations,

\zeta_r^* = \zeta_r - \xi_r \quad (r = 1, \ldots, n),

where

→ \zeta_r = \sum_{t=1}^p \beta_t x_{rt} + (y_r - \mu_r)/d_r is the working observation for maximum likelihood, and

→ \xi_r = -d_r' h_r / (2 w_r d_r) - \operatorname{tr}\left\{ F^{-1} D_2(\eta_r; \beta) \right\} / 2.
Modified iterative reweighted least squares

Iteration:

\beta^{(j+1)} = (X^\top W^{(j)} X)^{-1} X^\top W^{(j)} (\zeta^{(j)} - \xi^{(j)}).

The O(n^{-1}) bias of the maximum likelihood estimator for generalized nonlinear models is

b_1/n = (X^\top W X)^{-1} X^\top W \xi

(Cook et al., 1986; Cordeiro & McCullagh, 1991). Thus the iteration takes the form

\beta^{(j+1)} = \hat{\beta}^{(j)} - b_1(\beta^{(j)})/n.
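As an illustration (my own sketch, not code from the talk): for a Poisson log-linear GLM the predictor \eta_r is linear in \beta, so D_2(\eta_r; \beta) = 0, and with d_r = w_r = \mu_r the modification reduces to \xi_r = -h_r/(2\mu_r), equivalent to Firth's adjusted responses y_r + h_r/2. A minimal modified-IWLS loop in NumPy:

```python
import numpy as np

def firth_poisson_iwls(X, y, n_iter=100):
    """Bias-reduced fit of a Poisson log-linear model by modified IWLS.

    Special case of beta <- (X'WX)^{-1} X'W (zeta - xi): with the log link
    d_r = w_r = mu_r, the predictor is linear in beta (so D_2 = 0), and
    xi_r = -h_r / (2 mu_r).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        F = X.T @ (mu[:, None] * X)                                # Fisher information X'WX
        h = mu * np.einsum('ij,jk,ik->i', X, np.linalg.inv(F), X)  # diagonal of H
        zeta = eta + (y - mu) / mu                                 # ML working observations
        xi = -h / (2 * mu)                                         # bias-reducing modification
        beta = np.linalg.solve(F, X.T @ (mu * (zeta - xi)))
    return beta

# Small simulated example (illustrative data, not the dental table)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.normal(size=8)])
y = rng.poisson(np.exp(0.3 + 0.5 * X[:, 1])).astype(float)
print(firth_poisson_iwls(X, y))
```

At the fixed point the adjusted score equations \sum_r (y_r + h_r/2 - \mu_r) x_{rt} = 0 are solved, which is what makes the returned estimate the bias-reduced fit.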
Data: Periodontal condition and calcium intake
Two-way cross-classification by factors X and Y with R and S levels, respectively. Entries are realizations of independent Poisson random variables.

The RC(1) model (Goodman, 1979, 1985):

\log \mu_{rs} = \lambda + \lambda_r^X + \lambda_s^Y + \rho \gamma_r \delta_s.

Modified working observation:

\zeta_{rs}^* = \zeta_{rs} + \frac{h_{rs}}{2\mu_{rs}} + \gamma_r C(\rho, \delta_s) + \delta_s C(\rho, \gamma_r) + \rho C(\gamma_r, \delta_s),

where for any given pair of unconstrained parameters \kappa and \nu, C(\kappa, \nu) denotes the corresponding element of F^{-1}; if either of \kappa or \nu is constrained, C(\kappa, \nu) = 0.
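To make the mean structure concrete, here is a small sketch (illustrative parameter values of my own, not the fitted ones) that builds the matrix of means \mu_{rs} and checks that, after removing the main effects, the log-mean interaction \rho \gamma_r \delta_s has rank one:

```python
import numpy as np

R, S = 4, 4
lam = 0.5
lam_X = np.array([0.0, -0.2, 0.4, 0.1])   # row main effects, lam_X[0] = 0
lam_Y = np.array([0.0, 0.3, -0.1, 0.2])   # column main effects, lam_Y[0] = 0
rho = 0.25
gamma = np.array([-2.0, -0.5, 1.0, 2.0])  # row scores, ends constrained to -2, 2
delta = np.array([-2.0, 0.0, 0.5, 2.0])   # column scores, ends constrained to -2, 2

# log mu_rs = lam + lam_X[r] + lam_Y[s] + rho * gamma[r] * delta[s]
log_mu = lam + lam_X[:, None] + lam_Y[None, :] + rho * np.outer(gamma, delta)
mu = np.exp(log_mu)

# After removing the main effects, the remaining interaction is rank one
resid = log_mu - lam - lam_X[:, None] - lam_Y[None, :]
print(np.linalg.matrix_rank(resid))  # 1
```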
Table: Periodontal condition and calcium intake (Goodman, 1981, Table 1.a.)

                           Calcium intake level
  Periodontal condition      1      2      3      4
  A                          5      3     10     11
  B                          4      5      8      6
  C                         26     11      3      6
  D                         23     11      1      2

For identifiability, set \lambda_1^X = \lambda_1^Y = 0, \gamma_1 = \delta_1 = -2 and \gamma_4 = \delta_4 = 2.

Simulate 250,000 data sets under the maximum likelihood fit.

Estimate biases, mean squared errors and coverage of nominally 95% Wald-type confidence intervals.
Table: Results for the dental health data. For the method of maximum likelihood, simulation results are all conditional upon finiteness of the estimates (about 3.5% of the simulated datasets resulted in infinite MLEs).

                 Estimates        MSE (×10)       Coverage (%)
                 ML      BR      ML      BR      ML      BR
  γ_2           2.31    2.35    2.28    1.49    96.9    95.8
  γ_3          -0.13   -0.13    1.45    1.16    95.7    97.1
  δ_2           0.55    0.52    1.50    1.18    96.0    97.1
  δ_3           0.07    0.10    3.34    1.87    97.3    95.5
  λ            -0.53   -0.53    1.00    0.80    95.6    93.8
  λ^X_2        -1.17   -1.05    6.55    2.80    94.7    92.8
  λ^X_3        -0.80   -0.75    3.19    1.69    96.6    96.2
  λ^X_4        -0.20   -0.18    0.05    0.03    96.0    97.3
  λ^Y_2        -1.55   -1.48    6.30    5.37    96.4    96.1
  λ^Y_3         0.90    0.91    6.94    5.34    97.3    95.0
  λ^Y_4        -1.16   -1.11    9.00    7.20    96.7    95.2
  ρ             3.11    2.84   35.55   18.13    96.4    92.4

ML, maximum likelihood; BR, bias-reduced; MSE, mean squared error.
Bias-reducing penalized likelihoods
Firth (1993): for a generalized linear model with canonical link, the adjusted scores correspond to penalization of the likelihood by the Jeffreys (1946) invariant prior.

In models with non-canonical link and p ≥ 2, there need not exist such a penalized likelihood interpretation.
Theorem (Existence of penalized likelihoods)

In the class of generalized linear models, there exists a penalized log-likelihood l^* such that \nabla l^*(\beta) \equiv U^*(\beta) for all possible specifications of the design matrix X, if and only if the inverse link derivatives d_r = 1/g'(\mu_r) satisfy

d_r \equiv \alpha_r \sigma_r^{2\omega} \quad (r = 1, \ldots, n),

where \alpha_r (r = 1, \ldots, n) and \omega do not depend on the model parameters.
The form of the penalized likelihoods for bias reduction

When d_r \equiv \alpha_r \sigma_r^{2\omega} (r = 1, \ldots, n) for some \omega and \alpha,

l^*(\beta) = l(\beta) + \frac{\omega}{4\omega - 2} \log |F(\beta)| \quad (\omega \neq 1/2),

l^*(\beta) = l(\beta) + \frac{1}{4} \sum_r h_r \log \kappa_{2,r}(\beta) \quad (\omega = 1/2).

→ The canonical link is the special case \omega = 1.
→ With \omega = 0, the condition refers to models with identity link.
→ For \omega = 1/2 the working weights, and hence F and H, do not depend on \beta.
→ If \omega \notin [0, 1/2], bias reduction also increases the value of |F(\beta)|. Thus, approximate confidence ellipsoids, based on asymptotic normality of the estimator, are reduced in volume.
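The canonical case \omega = 1 can be checked numerically. In the sketch below (my own illustration, assuming a logistic regression), the \omega = 1 penalty is (1/2)\log|F(\beta)|, and its gradient is compared against Firth's (1993) closed-form adjusted scores for logistic regression, \sum_r (y_r - \mu_r + h_r(1/2 - \mu_r)) x_{rt}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.integers(0, 2, size=n).astype(float)

def penalized_loglik(beta):
    # l(beta) + (1/2) log|F(beta)|, with F = X'WX and W = diag(mu(1 - mu))
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return loglik + 0.5 * np.linalg.slogdet(X.T @ (w[:, None] * X))[1]

def adjusted_score(beta):
    # Firth (1993), logistic regression: sum_r (y_r - mu_r + h_r (1/2 - mu_r)) x_rt
    mu = 1 / (1 + np.exp(-(X @ beta)))
    w = mu * (1 - mu)
    F = X.T @ (w[:, None] * X)
    h = w * np.einsum('ij,jk,ik->i', X, np.linalg.inv(F), X)
    return X.T @ (y - mu + h * (0.5 - mu))

beta0 = np.array([0.1, -0.2])   # arbitrary evaluation point
eps = 1e-6
grad = np.array([(penalized_loglik(beta0 + eps * e) - penalized_loglik(beta0 - eps * e)) / (2 * eps)
                 for e in np.eye(p)])
print(np.allclose(grad, adjusted_score(beta0), atol=1e-4))  # True
```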
A computational and conceptual framework for bias reduction in generalized nonlinear models.

The dispersion \lambda was assumed known, but this does not restrict the applicability of the results: \lambda is usually estimated separately from the parameters \beta.

Bias reduction can be beneficial in terms of the properties of the resultant estimators.

Bias and point estimation are not strong statistical principles:
→ Bias relates to the parameterization; thus improving the bias violates exact equivariance under reparameterization.
→ Reduction in bias can be accompanied by inflation in variance.
Bull, S. B., Mak, C. and Greenwood, C. (2002). A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics and Data Analysis 39, 57–74.

Cordeiro, G. M. and McCullagh, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society, Series B 53, 629–643.

Cook, R. D., Tsai, C.-L. and Wei, B. C. (1986). Bias in nonlinear regression. Biometrika 73, 615–623.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Goodman, L. A. (1985). The analysis of cross-classified data having ordered and/or unordered categories: Association models, correlation models, and asymmetry models for contingency tables with or without missing entries. The Annals of Statistics 13, 10–69.

Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409–2419.

Kosmidis, I. and Firth, D. (2008). Bias reduction in exponential family nonlinear models. Technical Report 8-5, CRiSM Working Paper Series, University of Warwick. Accepted for publication in Biometrika.

Wei, B. (1997). Exponential Family Nonlinear Models. New York: Springer-Verlag.