On iterative adjustment of responses for the reduction of bias in binary regression models

Ioannis Kosmidis
Research Fellow, Department of Statistics
IWSM 2009
Outline
1. Introduction
2. Parameter-dependent adjustments to the data
3. Illustration
4. Discussion and further work
Constant adjustment schemes

Why adjust the binomial data?
→ To improve the frequentist properties of the estimators (especially bias).
→ To avoid sparseness issues, which may result in infinite maximum likelihood estimates.
Landmark studies:
→ Haldane (1955), Anscombe (1956): add 1/2 to the binomial response y and 1 to the binomial totals m, then replace the actual data with the adjusted data in the usual log-odds estimator.
→ Simple logistic regressions: Hitchcock (1962), Gart et al. (1985).

Estimation can be conveniently performed by the following procedure:
1. Calculate the values of the adjusted responses and totals.
2. Proceed with the usual estimation methods, treating the adjusted data as actual.

→ No constant adjustment scheme can be optimal for every possible binary regression model in terms of improving the frequentist properties of the estimators.
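As a concrete illustration of the Haldane/Anscombe scheme (a minimal sketch, not from the slides; the function name is mine):

```python
import math

def empirical_logit(y, m):
    """Haldane (1955) / Anscombe (1956) adjusted log-odds: add 1/2 to
    the response and 1 to the totals, then form the usual log-odds."""
    return math.log((y + 0.5) / (m - y + 0.5))

# The adjustment keeps the estimate finite even at the boundaries,
# where the unadjusted log-odds log(y / (m - y)) is infinite:
empirical_logit(0, 10)   # finite, unlike log(0/10)
empirical_logit(10, 10)  # finite, unlike log(10/0)
```

Note the symmetry of the adjustment: for y = m/2 the adjusted and unadjusted log-odds coincide (both are zero).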
Estimation of the parameters β1, ..., βp of the general logistic regression model

log{πr/(1 − πr)} = ηr = Σ_{t=1}^p βt xrt  (r = 1, ..., n),

where xrt is the (r, t)th component of an n × p design matrix X.

Clogg et al. (1991) developed an adjustment scheme where

yr* = yr + (p/n) (Σ_{s=1}^n ys)/(Σ_{s=1}^n ms)  and  mr* = mr + p/n  (r = 1, ..., n).

→ The resultant estimators are not invariant to different representations of the data (for example, aggregated versus disaggregated views).
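The Clogg et al. (1991) adjustment can be sketched directly from the formula above (an illustrative helper of my own naming, not the authors' code):

```python
import numpy as np

def clogg_adjust(y, m, p):
    """Clogg et al. (1991) constant adjustment for a model with p
    parameters: each response gains (p/n) times the overall observed
    proportion, and each total gains p/n."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    n = len(y)
    y_star = y + (p / n) * y.sum() / m.sum()
    m_star = m + p / n
    return y_star, m_star
```

Because the added amounts depend on the totals Σ ys and Σ ms, collapsing or splitting rows with identical covariate settings changes the per-row adjustment, which is the source of the non-invariance noted above.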
Bias-reducing adjusted score functions
Binomial observations y1, ..., yn with totals m1, ..., mn, respectively.

Binary regression:

g(πr) = ηr = Σ_{t=1}^p βt xrt  (r = 1, ..., n),

where g: [0, 1] → ℝ.

Score functions:

Ut = Σ_{r=1}^n (wr/dr) (yr − mr πr) xrt  (t = 1, ..., p),

with dr = mr dπr/dηr and wr = dr²/{mr πr(1 − πr)}.
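For the logit link the weights simplify (wr/dr = 1), so the scores reduce to Xᵀ(y − mπ). A minimal sketch of the general formula, specialized to the logit link (function name mine):

```python
import numpy as np

def binomial_scores(beta, X, y, m):
    """Score vector U_t = sum_r (w_r/d_r) (y_r - m_r pi_r) x_rt for a
    binomial regression with the logit link; other links would change
    pi, d and w accordingly."""
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))     # inverse logit
    d = m * pi * (1.0 - pi)             # d_r = m_r dpi_r / deta_r
    w = d**2 / (m * pi * (1.0 - pi))    # w_r = d_r^2 / {m_r pi_r (1 - pi_r)}
    return X.T @ ((w / d) * (y - m * pi))
```

At beta = 0 every πr = 1/2, so the scores are simply Xᵀ(y − m/2).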
Bias-reducing adjusted score functions
Estimators with second-order bias are obtained by solving the adjusted score equations Ut* = 0 (t = 1, ..., p).

Bias-reducing adjusted score functions (Kosmidis & Firth, 2009):

Ut* = Σ_{r=1}^n (wr/dr) {yr + hr d′r/(2wr) − mr πr} xrt  (t = 1, ..., p),

with d′r = mr d²πr/dηr² and hr the rth diagonal element of H = X(XᵀWX)⁻¹XᵀW.

→ Adjusted responses (a pseudo-data representation):

yr* = yr + hr d′r/(2wr)  (r = 1, ..., n).
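For the logit link, dr = wr = mr πr(1 − πr) and d′r = mr πr(1 − πr)(1 − 2πr), so the adjustment simplifies to hr(1 − 2πr)/2. A sketch of the adjusted responses under that link (function name mine):

```python
import numpy as np

def adjusted_responses(beta, X, y, m):
    """Bias-reducing adjusted responses y_r + h_r d'_r/(2 w_r) for the
    logit link, where the adjustment reduces to h_r (1 - 2 pi_r)/2."""
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))
    w = m * pi * (1.0 - pi)                      # for logit, w_r = d_r
    # h_r: diagonal of H = X (X^T W X)^{-1} X^T W
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    h = w * np.einsum("ri,ij,rj->r", X, XtWX_inv, X)
    return y + h * (1.0 - 2.0 * pi) / 2.0
```

Since Σr hr = p, the total adjustment across observations is bounded by the number of parameters, which is why the pseudo-responses stay close to the actual data.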
Parameter-dependent adjustments to the data
Use the adjusted responses yr + hr d′r/(2wr) in existing maximum likelihood implementations, iteratively.

Practical issues can arise relating to the sign of d′r: possibly yr* > mr or yr* < 0, violating the range of the actual data (0 ≤ yr ≤ mr).
Pseudo-data representations
Pseudo-data representation: the pair {y*, m*}, where y* are the adjusted binomial responses and m* the adjusted totals.

By the form of the adjusted score functions Ut*, a bias-reducing pseudo-data representation is

{y + h d′/(2w), m}.
A set of bias-reducing pseudo-data representations
There is a countable set of equivalent bias-reducing pseudo-data representations, where any pseudo-data representation in the set can be obtained from any other by the following operations:
→ add and subtract a quantity within either the adjusted responses or the adjusted totals;
→ move summands from the adjusted responses to the adjusted totals after division by −π.

Special case (Firth, 1992): for logistic regressions,

Ut* = Σ_{r=1}^n {yr + hr(1 − 2πr)/2 − mr πr} xrt  (t = 1, ..., p).

→ Bias-reducing pseudo-data representations: {y + h(1 − 2π)/2, m}, {y + h/2, m + h}, etc.
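The equivalence of the two representations above is easy to check numerically: for the logit link the ML score contribution of a representation {y*, m*} is proportional to y* − m*π, and both representations give identical contributions for any y, m, h, π. A quick check (my own, not from the slides):

```python
import numpy as np

# Random but valid inputs: responses, totals, hat values, probabilities.
rng = np.random.default_rng(0)
y = rng.uniform(0, 10, 5)
m = rng.uniform(10, 20, 5)
h = rng.uniform(0, 1, 5)
pi = rng.uniform(0.01, 0.99, 5)

rep1 = (y + h * (1 - 2 * pi) / 2) - m * pi   # {y + h(1 - 2pi)/2, m}
rep2 = (y + h / 2) - (m + h) * pi            # {y + h/2, m + h}
assert np.allclose(rep1, rep2)
```

Algebraically, moving the summand −hπ from the response to the total after division by −π adds h to the total, leaving y* − m*π unchanged.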
An appropriate pseudo-data representation
Within this set of bias-reducing pseudo-data representations, a representation with 0 ≤ y* ≤ m* is

y* = y + (hπ/2) {1 + (m d′/d²) I_{d′>0}},
m* = m + (h/2) {1 + (m d′/d²)(π − I_{d′≤0})},

where I_E = 1 if E holds and I_E = 0 otherwise.
Local maximum likelihood fits on pseudo-data representations

1. Update to {yr,(j+1)*, mr,(j+1)*} (r = 1, ..., n), evaluating all the quantities involved at β(j).
2. Use maximum likelihood to fit the model with responses yr,(j+1)* and totals mr,(j+1)* (r = 1, ..., n), with β(j) as starting value.

β(0): the maximum likelihood estimates, possibly after adding c > 0 to the responses and 2c to the binomial totals.

Convergence criterion: Σ_{t=1}^p |Ut*(β(j+1))| ≤ ε, for some ε > 0.
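The two-step procedure can be sketched for the logit link, where the pseudo-responses reduce to yr + hr(1 − 2πr)/2. This is an illustrative implementation under that assumption, not the brglm code; all names are mine, and the convergence check below uses the change in β rather than the adjusted scores:

```python
import numpy as np

def fit_ml(X, y, m, beta, tol=1e-10, maxit=100):
    """Binomial-logit maximum likelihood via Newton-Raphson, allowing
    non-integer pseudo-responses y, warm-started at beta."""
    for _ in range(maxit):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - m * pi)              # logit: w_r/d_r = 1
        W = m * pi * (1.0 - pi)
        step = np.linalg.solve(X.T @ (W[:, None] * X), score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def fit_bias_reduced(X, y, m, tol=1e-8, maxit=50):
    """Outer loop: update the pseudo-data at the current estimates,
    then refit by maximum likelihood, warm-started."""
    n, p = X.shape
    c = 0.5  # beta_(0): ML after adding c to responses, 2c to totals
    beta = fit_ml(X, y + c, m + 2 * c, np.zeros(p))
    for _ in range(maxit):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = m * pi * (1.0 - pi)
        A = np.linalg.inv(X.T @ (w[:, None] * X))
        h = w * np.einsum("ri,ij,rj->r", X, A, X)
        y_star = y + h * (1.0 - 2.0 * pi) / 2.0  # {y + h(1-2pi)/2, m}
        beta_new = fit_ml(X, y_star, m, beta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

On completely separated data the ML estimates diverge, while the iterated pseudo-data fit returns finite estimates:

```python
X = np.column_stack([np.ones(4), np.array([-2.0, -1.0, 1.0, 2.0])])
y = np.array([0.0, 0.0, 1.0, 1.0])   # complete separation
beta = fit_bias_reduced(X, y, np.ones(4))
```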
Demonstration: The beetle mortality data
The beetle mortality data (Agresti, 2002, Table 6.14):

Killed   Total   Log-dose
6        59      1.691
13       60      1.724
18       62      1.755
28       56      1.784
52       63      1.811
53       59      1.837
61       62      1.861
60       60      1.884

Model: log[−log(1 − πr)] = β1 + β2 xr.
Demonstration

Current estimates:

       ML          Iteration 1   Iteration 2   Iteration 3   ...
β1     -39.5222    -39.0581      -39.0469      -39.0466      ...
β2      22.0147     21.7545       21.7482       21.7481      ...

Adjusted responses and totals at each iteration:

y(1)*     m(1)*     y(2)*     m(2)*     y(3)*     m(3)*
6.1316    59.2457   6.1321    59.2464   6.1322    59.2464
13.1500   60.2640   13.1497   60.2631   13.1497   60.2631
18.1414   62.2293   18.1406   62.2277   18.1405   62.2277
28.0900   56.1372   28.0893   56.1361   28.0893   56.1361
52.1025   63.1751   52.1005   63.1715   52.1004   63.1715
53.1620   59.2824   53.1590   59.2770   53.1589   59.2769
61.1461   62.2616   61.1484   62.2653   61.1484   62.2654
60.0375   60.0697   60.0414   60.0769   60.0415   60.0771
Demonstration

Estimates of β1 and β2 for the beetle mortality data (estimated standard errors in parentheses):

Parameter   ML                   BR
β1          -39.5222 (3.2356)    -39.0466 (3.1919)
β2           22.0147 (1.7970)     21.7481 (1.7723)
Invariance of the estimates to the structure of the data
Two representations of the same data:

Aggregated: responses y1, ..., yn with totals m1, ..., mn and covariate settings x1, ..., xn.
Disaggregated: each pair (yr, mr) split into kr sub-observations (zr1, lr1), ..., (zr,kr, lr,kr), all sharing the covariate setting xr, with

yr = Σ_{s=1}^{kr} zrs  and  mr = Σ_{s=1}^{kr} lrs  (r = 1, ..., n),  0 ≤ zrs ≤ lrs.

The pseudo-data satisfy the same aggregation relations:

yr* = Σ_{s=1}^{kr} zrs*  and  mr* = Σ_{s=1}^{kr} lrs*.
A complete enumeration study
Binomial observations y1, ..., y5, each with totals m, made independently at the five design points x = (−2, −1, 0, 1, 2).

Binary regression model:

g(πr) = β1 + β2 xr  (r = 1, ..., 5).

g(·): logit, probit and complementary log-log links.

The "true" parameter values are set to β1 = −1 and β2 = 1.5:

          π1        π2        π3        π4        π5
logit     0.01799   0.07586   0.26894   0.62246   0.88080
probit    0.00003   0.00621   0.15866   0.69146   0.97725
cloglog   0.01815   0.07881   0.30780   0.80770   0.99938

→ increased probability of infinite maximum likelihood estimates.

Through complete enumeration for m = 4, 8, 16, calculate:
the bias and mean squared error of the bias-reduced estimator, and
the coverage of the nominally 95% Wald-type confidence interval.
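The table of true probabilities follows directly from inverting each link at ηr = −1 + 1.5 xr; a short check using only the standard library:

```python
import math

# True probabilities pi_r = g^{-1}(beta1 + beta2 * x_r) at the five
# design points, for beta1 = -1, beta2 = 1.5, under the three links.
x = [-2, -1, 0, 1, 2]
eta = [-1 + 1.5 * xi for xi in x]

logit = [1 / (1 + math.exp(-e)) for e in eta]                    # logistic cdf
probit = [0.5 * (1 + math.erf(e / math.sqrt(2))) for e in eta]   # Phi(e)
cloglog = [1 - math.exp(-math.exp(e)) for e in eta]              # inverse cloglog
```

Rounded to five decimals these reproduce the table above, e.g. logit π1 ≈ 0.01799, probit π1 ≈ 0.00003 and cloglog π5 ≈ 0.99938; the extreme probabilities under probit and cloglog are what drive the increased probability of infinite ML estimates.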
Results

The parenthesized probabilities refer to the event of encountering at least one infinite ML estimate for the corresponding m and link function. All the quantities for the ML estimator are calculated conditionally on the finiteness of its components.

                                 Bias (×10²)       MSE (×10)      Coverage
Link     m            Param      ML       BR       ML     BR      ML      BR
logit    4  (0.1621)  β1         -8.79    0.52     5.84   6.07    0.971   0.972
                      β2         14.44   -0.13     3.62   4.73    0.960   0.939
         8  (0.0171)  β1        -12.94   -0.68     3.93   3.11    0.972   0.964
                      β2         20.17    1.11     3.70   2.68    0.972   0.942
         16 (0.0002)  β1         -7.00   -0.19     1.75   1.42    0.961   0.957
                      β2         10.55    0.29     1.59   1.16    0.960   0.948
probit   4  (0.5475)  β1         17.89   13.54     1.44   2.61    0.968   0.911
                      β2        -18.84  -16.93     0.98   3.07    0.960   0.897
         8  (0.2296)  β1          0.80    3.24     1.07   1.82    0.964   0.938
                      β2          6.08   -3.81     1.26   2.13    0.972   0.908
         16 (0.0411)  β1         -7.06    0.24     1.03   1.08    0.974   0.949
                      β2         12.54   -0.17     1.39   1.22    0.973   0.933
cloglog  4  (0.3732)  β1          2.97    3.18     2.97   3.07    0.959   0.962
                      β2         -2.93  -12.97     1.35   3.51    0.955   0.880
         8  (0.1000)  β1         -8.42    0.84     2.49   1.89    0.962   0.953
                      β2         15.63   -5.40     2.33   2.36    0.972   0.906
         16 (0.0071)  β1         -6.45    0.17     1.32   0.98    0.964   0.957
                      β2         13.13   -1.74     1.60   1.23    0.965   0.921

ML: maximum likelihood; BR: bias reduction.
Shrinkage

[Figure] Fitted probabilities using bias-reduced estimates versus fitted probabilities using maximum likelihood estimates, for the complementary log-log link and m = 4. The marked point is (c, c), where c = 1 − exp(−1).
Discussion and further work
Bias reduction can be easily applied using pseudo-data representations, along with existing maximum likelihood fitting procedures. The brglm R package implements this.

The bias-reduced estimator is invariant to the representation of the binomial data.

The bias-reduced estimates are always finite and, because of their improved statistical properties, their routine use in applications is appealing.
Discussion and further work
Wald-type approximate confidence intervals based on the bias-reduced estimator perform poorly (mainly because of finiteness and shrinkage).

Heinze & Schemper (2002), in the case of logistic regressions, used the penalized likelihood interpretation of the adjusted score functions to construct profile penalized likelihood confidence intervals.

Such a penalized likelihood interpretation is not always possible for links other than the logit (Kosmidis & Firth, 2009).

The adjusted-score statistic

Ut*(β̂1, ..., β̂t−1, βt, β̂t+1, ..., β̂p)² F^tt(β̂1, ..., β̂t−1, βt, β̂t+1, ..., β̂p)

could be used for the construction of confidence intervals for the parameter βt (t = 1, ..., p), where the β̂u are the bias-reduced estimates when the tth component of the parameter vector is fixed at βt, and F^tt is the (t, t)th component of the inverse Fisher information.
References
Anscombe, F. J. (1956). On estimating binomial response relations. Biometrika, 43, 461–464.
Clogg, C. C., D. B. Rubin, N. Schenker, B. Schultz, and L. Weidman (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association, 86, 68–78.
Firth, D. (1992). Bias reduction, the Jeffreys prior and GLIM. In L. Fahrmeir, B. Francis, R. Gilchrist, and G. Tutz (Eds.), Advances in GLIM and Statistical Modelling: Proceedings of the GLIM 92 Conference, Munich, pp. 91–100. New York: Springer.
Gart, J. J., H. M. Pettigrew, and D. G. Thomas (1985). The effect of bias, variance estimation, skewness and kurtosis of the empirical logit on weighted least squares analyses. Biometrika, 72, 179–190.
Haldane, J. (1955). The estimation of the logarithm of a ratio of frequencies. Annals of Human Genetics, 20, 309–311.
Heinze, G. and M. Schemper (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409–2419.
Hitchcock, S. E. (1962). A note on the estimation of parameters of the logistic function using the minimum logit χ² method. Biometrika, 49, 250–252.
Kosmidis, I. and D. Firth (2009). Bias reduction in exponential family nonlinear models. Technical Report 8-5, CRiSM working paper series, University of Warwick. Accepted for publication in Biometrika.