Bias reduction in generalized nonlinear models

Ioannis Kosmidis and David Firth

Department of Statistics

JSM 2009


Outline

1. Reduction of the bias
2. Generalized nonlinear models
3. Illustration
4. Generalized linear models


Bias reduction in estimation

In regular parametric models the maximum likelihood estimator β̂ is consistent, and the expansion of its bias has the form

E(β̂ − β₀) = b₁(β₀)/n + b₂(β₀)/n² + b₃(β₀)/n³ + ⋯ .

Firth (1993): adjust the score functions U_t to

U*_t = U_t + A_t   (t = 1, …, p).

For appropriate functions A_t, solving U*_t = 0 (t = 1, …, p) results in estimators β̃ with no O(n⁻¹) bias term.

Mehrabi & Matthews (1995), Heinze & Schemper (2002; 2005), Bull et al. (2002; 2007) and others.

→ ML estimates are not required.

→ Estimators with “better” properties.
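Firth's adjustment is easiest to see in logistic regression, where (canonical logit link) the adjusted score has the closed form U*_t = Σ_r {y_r + h_r(1/2 − π_r) − π_r} x_rt. The following NumPy sketch (hypothetical code, not from the talk) solves U* = 0 by Newton steps; note that it returns finite estimates even under complete separation, where the maximum likelihood estimate is infinite:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Bias-reduced logistic regression via Firth's adjusted scores.

    With the canonical logit link the adjustment folds into the response:
    y_r is replaced by y_r + h_r * (1/2 - pi_r), with h_r the hat values."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)                             # working weights
        F_inv = np.linalg.inv((X.T * w) @ X)            # inverse Fisher information
        h = w * np.einsum('ij,jk,ik->i', X, F_inv, X)   # diagonal of H = X F^{-1} X^T W
        adj_score = X.T @ (y + h * (0.5 - pi) - pi)     # U* = X^T (y* - pi)
        step = F_inv @ adj_score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: the MLE diverges, the adjusted-score fit does not.
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(firth_logistic(X, y))   # finite intercept and (positive) slope
```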


Exponential family of distributions

Random variable Y from the exponential family of distributions:

f(y; θ) = exp{ (yᵀθ − b(θ))/λ + c(y, λ) },

where the dispersion λ is assumed known. Then

μ = E(Y; θ) = db(θ)/dθ,   σ² = var(Y; θ) = λ d²b(θ)/dθ².
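For example (a numerical aside added here, not on the slide): the Poisson distribution with rate 3 has θ = log 3, b(θ) = e^θ and λ = 1, so both moments follow from derivatives of b:

```python
import numpy as np

# Poisson in exponential-family form: theta = log(rate), b(theta) = exp(theta),
# dispersion lambda = 1.  Check mu = db/dtheta and sigma^2 = lambda d^2b/dtheta^2
# by central finite differences at rate 3.
b = np.exp
theta, eps = np.log(3.0), 1e-4
mu = (b(theta + eps) - b(theta - eps)) / (2 * eps)
var = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps ** 2
print(round(mu, 4), round(var, 4))   # 3.0 3.0 -- mean equals variance
```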


Generalized nonlinear model


y₁, …, y_n are realizations of independent random variables Y₁, …, Y_n from the exponential family.

A generalized nonlinear model (GNM) specifies

g(μ_r) = η_r(β)   (r = 1, …, n),

where g is the link function and η_r : ℝᵖ → ℝ.

Score functions:

U_t = Σ_{r=1}^n (w_r/d_r)(y_r − μ_r) x_rt   (t = 1, …, p),

where w_r = d_r²/σ_r², d_r = dμ_r/dη_r and x_rt = ∂η_r/∂β_t.
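As a quick sanity check (illustrative code, not from the talk): for Poisson responses with the log link, d_r = w_r = μ_r, so w_r/d_r = 1 and the score reduces to U_t = Σ_r (y_r − μ_r) x_rt, which should match numerical differentiation of the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
beta = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta)).astype(float)

def loglik(b):
    eta = X @ b
    return np.sum(y * eta - np.exp(eta))   # log-likelihood up to -log(y_r!)

def score(b):
    mu = np.exp(X @ b)                     # log link: w_r / d_r = 1
    return X.T @ (y - mu)

# central finite differences of the log-likelihood
eps = 1e-6
numeric = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(np.allclose(numeric, score(beta), atol=1e-3))   # True
```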


Adjusted score functions for GNMs

Bias-reducing adjusted score functions (Kosmidis & Firth, 2008)

U*_t = Σ_{r=1}^n (w_r/d_r) [ y_r + (1/2){ h_r d′_r/w_r + d_r tr(F⁻¹ D₂(η_r; β)) } − μ_r ] x_rt ,

→ d′_r = d²μ_r/dη_r², D₂(η_r; β) is the p × p matrix of second derivatives of η_r with respect to β, and h_r is the r-th diagonal element of H = X F⁻¹ Xᵀ W.

Collecting the adjustment terms into an adjusted response y*_r,

y*_r = y_r + (1/2){ h_r d′_r/w_r + d_r tr(F⁻¹ D₂(η_r; β)) },

the adjusted score functions take the form

U*_t = Σ_{r=1}^n (w_r/d_r)( y*_r − μ_r ) x_rt .


Implementation


→ Replace y_r with the adjusted responses y*_r in iterative reweighted least squares (IWLS).

In terms of modified working observations,

ζ*_r = ζ_r − ξ_r   (r = 1, …, n),

where

→ ζ_r = Σ_{t=1}^p β_t x_rt + (y_r − μ_r)/d_r is the working observation for maximum likelihood, and

→ ξ_r = −d′_r h_r/(2 w_r d_r) − tr(F⁻¹ D₂(η_r; β))/2 .
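As an illustration (hypothetical code, not from the talk): for a Poisson log-linear model the predictor is linear in β, so D₂ = 0 and d_r = d′_r = w_r = μ_r, giving ξ_r = −h_r/(2μ_r). Modified IWLS then amounts to adding h_r/2 to each count:

```python
import numpy as np

def br_poisson(X, y, n_iter=100, tol=1e-10):
    """Bias-reduced Poisson log-linear fit by modified IWLS:
    beta <- (X^T W X)^{-1} X^T W (zeta - xi), with xi_r = -h_r / (2 mu_r)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                                 # also the working weights w_r
        F_inv = np.linalg.inv((X.T * mu) @ X)
        h = mu * np.einsum('ij,jk,ik->i', X, F_inv, X)   # diagonal of H
        zeta_star = eta + (y - mu) / mu + h / (2 * mu)   # zeta - xi
        beta_new = F_inv @ (X.T @ (mu * zeta_star))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence the adjusted score Xᵀ(y + h/2 − μ) vanishes, so the fit is an ordinary Poisson regression with each count inflated by half its hat value.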


Modified iterative re-weighted least squares

Iteration:

β^(j+1) = (Xᵀ W^(j) X)⁻¹ Xᵀ W^(j) (ζ^(j) − ξ^(j)).

The O(n⁻¹) bias of the maximum likelihood estimator for generalized nonlinear models is

b₁/n = (Xᵀ W X)⁻¹ Xᵀ W ξ

(Cook et al., 1986; Cordeiro & McCullagh, 1991).

Thus the iteration takes the form

β^(j+1) = β̂^(j) − b₁^(j)/n ,

where β̂^(j) is the ordinary IWLS update and b₁^(j)/n the estimated first-order bias at the current iterate.
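The correction can also be applied in a single step after an ordinary maximum likelihood fit. A sketch for the same hypothetical Poisson log-linear setting, where ξ_r = −h_r/(2μ_r):

```python
import numpy as np

def poisson_ml(X, y, n_iter=50):
    """Ordinary IWLS / Newton fit of a Poisson log-linear model."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        beta = beta + np.linalg.solve((X.T * mu) @ X, X.T @ (y - mu))
    return beta

def bias_corrected_poisson(X, y):
    """One-step correction beta_hat - b1/n, with
    b1/n = (X^T W X)^{-1} X^T W xi evaluated at the MLE."""
    beta_hat = poisson_ml(X, y)
    mu = np.exp(X @ beta_hat)
    F_inv = np.linalg.inv((X.T * mu) @ X)
    h = mu * np.einsum('ij,jk,ik->i', X, F_inv, X)
    xi = -h / (2 * mu)                       # Poisson log link
    b1_over_n = F_inv @ (X.T @ (mu * xi))
    return beta_hat - b1_over_n
```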

Illustration: The RC(1) model

Two-way cross-classification by factors X and Y with R and S levels, respectively. Entries are realizations of independent Poisson random variables.

The RC(1) model (Goodman, 1979, 1985):

log μ_rs = λ + λ^X_r + λ^Y_s + ρ γ_r δ_s .

Modified working observation:

ζ*_rs = ζ_rs + h_rs/(2μ_rs) + γ_r C(ρ, δ_s) + δ_s C(ρ, γ_r) + ρ C(γ_r, δ_s) ,

where, for any given pair of unconstrained parameters κ and ν, C(κ, ν) denotes the corresponding element of F⁻¹; if either of κ or ν is constrained, C(κ, ν) = 0.
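This follows from the general ξ_r: for the Poisson log link, d_rs = d′_rs = w_rs = μ_rs, so the curvature-free part of −ξ_rs is h_rs/(2μ_rs), while the only nonzero second derivatives of η_rs are the mixed ones,

```latex
\frac{\partial^2 \eta_{rs}}{\partial\rho\,\partial\gamma_r} = \delta_s , \qquad
\frac{\partial^2 \eta_{rs}}{\partial\rho\,\partial\delta_s} = \gamma_r , \qquad
\frac{\partial^2 \eta_{rs}}{\partial\gamma_r\,\partial\delta_s} = \rho ,
```

so tr{F⁻¹ D₂(η_rs; β)}/2 picks up each symmetric pair twice, cancelling the 1/2 and yielding the three C(·, ·) terms.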


Data: Periodontal condition and calcium intake

Table: Periodontal condition and calcium intake (Goodman, 1981, Table 1.a.)

                        Periodontal condition
Calcium intake level      A     B     C     D
1                         5     4    26    23
2                         3     5    11    11
3                        10     8     3     1
4                        11     6     6     2

For identifiability, set λ^X_1 = λ^Y_1 = 0, γ_1 = δ_1 = −2 and γ_4 = δ_4 = 2.

Simulate 250,000 data sets under the maximum likelihood fit.

Estimate biases, mean squared errors and coverage of nominally 95% Wald-type confidence intervals.


Results

Table: Results for the dental health data. For the method of maximum likelihood, simulation results are all conditional upon finiteness of the estimates (about 3.5% of the simulated datasets resulted in infinite MLEs).

          Estimate        Bias (×10²)       MSE (×10)        Coverage (%)
          ML      BR      ML       BR       ML      BR       ML      BR
λ         2.31    2.35   −4.19    −0.25     2.28    1.49     96.9    96.6
λ^X_2    −0.13   −0.13    0.48    −0.01     1.45    1.16     95.8    96.2
λ^X_3     0.55    0.52    2.97    −0.22     1.50    1.18     95.7    96.0
λ^X_4     0.07    0.10     —        —       3.34    1.87     97.1    97.3
λ^Y_2    −0.53   −0.53    0.02     0.06     1.00    0.80     96.0    96.4
λ^Y_3    −1.17   −1.05     —        —       6.55    2.80     97.1    96.1
λ^Y_4    −0.80   −0.75   −5.00    −0.59     3.19    1.69     97.3    97.3
γ_2      −0.20   −0.18   −1.76    −0.03     0.05    0.03     95.5    95.0
γ_3      −1.55   −1.48   −7.00    −0.27     6.30    5.37     95.6    96.7
δ_2       0.90    0.91     —        —       6.94    5.34     93.8    95.2
δ_3      −1.16   −1.11   −6.08     0.68     9.00    7.20     94.7    96.4
ρ         3.11    2.84   16.81    −7.21    35.55   18.13     92.8    92.4

ML, maximum likelihood; BR, bias-reduced; MSE, mean squared error.

Generalized linear models

Bias-reducing penalized likelihoods

Penalized likelihood interpretation of bias reduction

Firth (1993): for a generalized linear model with canonical link, the adjusted scores correspond to penalization of the likelihood by the Jeffreys (1946) invariant prior.

In models with non-canonical link and p ≥ 2 , there need not exist such a penalized likelihood interpretation.
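A sketch of the canonical-link case (in the notation of the earlier slides, with λ = 1, so w_r = d_r and D₂ = 0): differentiating the Jeffreys penalty reproduces the adjustment A_t,

```latex
\frac{\partial}{\partial\beta_t}\left\{\frac{1}{2}\log|F(\beta)|\right\}
  = \frac{1}{2}\,\operatorname{tr}\!\left(F^{-1}\,\frac{\partial F}{\partial\beta_t}\right)
  = \frac{1}{2}\sum_{r=1}^{n} \frac{h_r\, d_r'}{w_r}\, x_{rt} = A_t ,
```

since F = XᵀWX and, for the canonical link, ∂w_r/∂β_t = d′_r x_rt. Hence U*(β) ≡ ∇{ l(β) + (1/2) log|F(β)| }: the adjusted score is the gradient of the log-likelihood penalized by the Jeffreys prior.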


Theorem (Existence of penalized likelihoods)

In the class of generalized linear models, there exists a penalized log-likelihood l* such that ∇l*(β) ≡ U*(β), for all possible specifications of the design matrix X, if and only if the inverse link derivatives d_r = 1/g′(μ_r) satisfy

d_r ≡ α_r σ_r^{2ω}   (r = 1, …, n),

where α_r (r = 1, …, n) and ω do not depend on the model parameters.


The form of the penalized likelihoods for bias-reduction

When d_r ≡ α_r σ_r^{2ω} (r = 1, …, n) for some ω and α,

l*(β) = l(β) + (1/4) Σ_r h_r log κ_{2,r}(β)      (ω = 1/2),
l*(β) = l(β) + {ω/(4ω − 2)} log |F(β)|           (ω ≠ 1/2),

with κ_{2,r}(β) the second cumulant of Y_r.

→ The canonical link is the special case ω = 1.
→ With ω = 0, the condition refers to models with identity link.
→ For ω = 1/2 the working weights, and hence F and H, do not depend on β.
→ If ω ∉ [0, 1/2], bias reduction also increases the value of |F(β)|. Thus approximate confidence ellipsoids, based on asymptotic normality of the estimator, are reduced in volume.


Discussion

A computational and conceptual framework for bias-reduction in generalized nonlinear models.

λ was assumed known, but this does not restrict the applicability of the results: the dispersion is usually estimated separately from the parameters β.

Bias reduction can be beneficial in terms of the properties of the resultant estimators.

Bias and point estimation are not strong statistical principles:

→ Bias depends on the parameterization; thus improving bias violates exact equivariance under reparameterization.

→ Reduction in bias can be accompanied by inflation in variance.


Some references

Bull, S. B., Mak, C. and Greenwood, C. (2002). A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics and Data Analysis 39, 57–74.

Cook, R. D., Tsai, C.-L. and Wei, B. C. (1986). Bias in nonlinear regression. Biometrika 73, 615–623.

Cordeiro, G. M. and McCullagh, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society, Series B 53, 629–643.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Goodman, L. A. (1985). The analysis of cross-classified data having ordered and/or unordered categories: Association models, correlation models, and asymmetry models for contingency tables with or without missing entries. The Annals of Statistics 13, 10–69.

Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409–2419.

Kosmidis, I. and Firth, D. (2008). Bias reduction in exponential family nonlinear models. Technical Report 8-5, CRiSM Working Paper Series, University of Warwick. Accepted for publication in Biometrika.

Wei, B. (1997). Exponential Family Nonlinear Models. New York: Springer-Verlag.
