Empirical Bayesian estimation of the disease
transmission probability in multiple-vector-transfer designs
Christopher R. Bilder
Department of Statistics, University of Nebraska-Lincoln,
chris@chrisbilder.com, http://www.chrisbilder.com
Joshua M. Tebbs
Department of Statistics, Kansas State University
tebbs@ksu.edu
ABSTRACT: Plant disease is responsible for major losses in agriculture throughout
the world. Diseases are often spread by insect organisms that transmit a
bacterium, virus, or other pathogen. To assess disease epidemics, plant
pathologists often use multiple-vector-transfer designs. In such contexts, s > 1 insect
vectors are moved from an infected source to each of n test plants. The purpose
here is to present new estimators for p, the probability of pathogen transmission
for an individual vector, motivated by an empirical Bayesian approach. In
studying point estimate properties, one of our proposed estimators consistently
results in a smaller bias and mean squared error than the maximum likelihood
estimator (MLE) as proposed by Thompson (1962) and Swallow (1985). This bias
reduction is frequently fivefold or more in optimal settings for the MLE.
Furthermore, these estimators are easier to compute than the classical Bayes
estimators proposed by Chaubey and Li (1995) and Chick (1996). Finally, our
newly proposed empirical credible intervals possess the desirable property that
the lower bound will never be negative.
Background
 Plant disease is responsible for agricultural losses throughout
the world
 Diseases are often spread by insect vectors (e.g., aphids,
leafhoppers, planthoppers)
[Photos: brown planthopper and whitebacked planthopper]
 Vector-transfers are often used by plant pathologists wanting to
estimate p, the probability of disease transmission for a single
vector
Background
 Experimental set-up (group testing application)
o Insect vectors are moved from an infected source to test plants in a
greenhouse
o Each enclosed test plant has s insect vectors (assume common group size as
recommended by Swallow (1985))
o n = number of test plants
o $Y_i = 1$ if the $i$th test plant becomes infected; $Y_i = 0$ otherwise
o Want to estimate p, probability an individual vector transmits the
pathogen
 Notation
o $\{Y_i\}_{i=1}^{n}$ are i.i.d. Bernoulli($\theta$) random variables
o $\theta = 1 - (1-p)^s$ = probability a test plant becomes infected
o $T = \sum_{i=1}^{n} Y_i$ = # of infected test plants ~ Binomial($n$, $\theta$)
o MLE for $\theta$ is $\hat{\theta} = T/n$
o MLE for $p$ is $\hat{p}_{MLE} = 1 - (1 - T/n)^{1/s}$
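The MLE for p is obtained by inverting the relation between the plant-level and vector-level probabilities at the MLE for the plant-level probability. A minimal Python sketch, assuming a hypothetical helper name `p_mle` (not from the slides) and using the counts from the Ornaghi et al. (1999) example later in the talk (t = 3 infected plants out of n = 24, s = 7 vectors per plant):

```python
def p_mle(t: int, n: int, s: int) -> float:
    """MLE of p from t infected plants out of n, with s vectors per plant."""
    if n < 1 or s < 1 or not 0 <= t <= n:
        raise ValueError("need n >= 1, s >= 1 and 0 <= t <= n")
    theta_hat = t / n  # MLE of theta, the plant-level infection probability
    # Invert theta = 1 - (1 - p)^s at theta_hat.
    return 1.0 - (1.0 - theta_hat) ** (1.0 / s)


print(round(p_mle(3, 24, 7), 6))  # 0.018895
```

The printed value agrees with the MLE reported on the example slide.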
Past research
 Chaubey and Li (1995) and Chick (1996) use a two-parameter
beta prior for p where hyperparameters are chosen a priori
o Possible poor choices for hyperparameters could cause posterior distribution
to be concentrated away from the truth
o Multiple-vector-transfer experiments often use small n
 Tebbs, Bilder, and Moser (2003) derive parametric empirical
Bayes estimators using one-parameter beta prior for p
PURPOSE HERE:
 Develop new parametric empirical Bayes motivated estimators
for p which have smaller bias and mean squared error than those
in Tebbs et al. (2003)
 Form an interpretation for the hyperparameter
 Examine frequentist coverage properties of credible intervals
Bayes Estimators
 Prior distribution
o One-parameter beta family: $f_P(p \mid \beta) = \beta (1-p)^{\beta-1}$ for $0 < p < 1$
o Example with $\beta = 52.4$
[Figure: prior density $f(p)$ versus $p$ for $\beta = 52.4$, shown over $0 \le p \le 0.08$]
o Why one-parameter beta?
 Values of p are usually close to 0
 MLE is positively biased
 Computation and interpretation simplifications
 Posterior distribution
$$f_{P|T}(p \mid t, \beta) = \frac{s\,\Gamma(n+\beta/s+1)}{\Gamma(n-t+\beta/s)\,\Gamma(t+1)}\,(1-p)^{s(n-t)+\beta-1}\,[1-(1-p)^s]^{t} \quad \text{for } 0 < p < 1$$
 Bayes estimators for p: the value of a that minimizes the posterior expected loss $E_{P|T}[L(P,a) \mid T = t]$ for a loss function $L(p,a)$
 $L(p,a) = (p - a)^2$
o $\hat{p}_1 = 1 - \dfrac{\Gamma(n+\beta/s+1)\,\Gamma(n-t+\beta/s+1/s)}{\Gamma(n-t+\beta/s)\,\Gamma(n+\beta/s+1+1/s)}$
o Derived by Tebbs et al. (2003)
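The squared-error Bayes estimator is a ratio of gamma functions, which is safest to evaluate on the log scale. The sketch below is one possible way to compute it in Python, assuming a known hyperparameter value; the function name `p_bayes_mean` is illustrative and not from the slides.

```python
from math import exp, lgamma


def p_bayes_mean(t: int, n: int, s: int, beta: float) -> float:
    """Posterior mean of p under the one-parameter beta prior (squared error loss)."""
    if beta <= 0 or s < 1 or not 0 <= t <= n:
        raise ValueError("need beta > 0, s >= 1 and 0 <= t <= n")
    b = beta / s
    # log of Gamma(n+b+1) Gamma(n-t+b+1/s) / [Gamma(n-t+b) Gamma(n+b+1+1/s)]
    log_ratio = (lgamma(n + b + 1) + lgamma(n - t + b + 1 / s)
                 - lgamma(n - t + b) - lgamma(n + b + 1 + 1 / s))
    return 1.0 - exp(log_ratio)


# Counts from the example slide (t = 3, n = 24, s = 7) with beta = 52.4.
print(round(p_bayes_mean(3, 24, 7, 52.4), 6))
```

Working with `lgamma` rather than `gamma` avoids overflow when n is large; the example slide reports 0.018857 for these inputs, where 52.4 is itself a rounded estimate.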
Bayes Estimators
 New estimator
o Let $U = 1 - (1-P)^s$ and note that $U \mid T=t \sim \text{beta}(t+1,\ n-t+\beta/s)$
o $E_{U|T}[(U-a)^2 \mid T=t]$ is minimized when $a = E(U \mid T=t) = (t+1)/(n+\beta/s+1)$
o Since $P = 1 - (1-U)^{1/s}$, substituting $E(U \mid T=t)$ for $U$ gives a new estimator
$$\hat{p}_2 = 1 - \left(1 - \frac{t+1}{n+\beta/s+1}\right)^{1/s}$$
o This is NOT necessarily a Bayes estimator
o The estimator can also be derived another way:
 Choose a beta(1, $\beta/s$) prior for $\theta$ and $L(\theta,a) = (\theta - a)^2$
 Bayes estimate for $\theta$ is $(t+1)/(n+\beta/s+1)$
 Substitute the Bayes estimate for $\theta$ into $p = 1 - (1-\theta)^{1/s}$
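The plug-in construction above needs no gamma functions at all. A short Python sketch, assuming a known hyperparameter value and a hypothetical function name `p_plugin`:

```python
def p_plugin(t: int, n: int, s: int, beta: float) -> float:
    """Plug-in estimator: posterior mean of theta mapped back to the p scale."""
    if beta <= 0 or s < 1 or not 0 <= t <= n:
        raise ValueError("need beta > 0, s >= 1 and 0 <= t <= n")
    theta_bayes = (t + 1) / (n + beta / s + 1)  # Bayes estimate of theta
    return 1.0 - (1.0 - theta_bayes) ** (1.0 / s)


# Counts from the example slide (t = 3, n = 24, s = 7) with beta = 52.4.
print(round(p_plugin(3, 24, 7, 52.4), 6))
```

Because the posterior mean of the plant-level probability is transformed rather than the transformation being averaged, this value generally differs from the posterior mean of p, which is why the slide notes it is not necessarily a Bayes estimator.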
Empirical Bayes Estimators
 Marginal distribution for T
$$f_T(t \mid \beta) = \frac{\beta\,\Gamma(n+1)\,\Gamma(n-t+\beta/s)}{s\,\Gamma(n-t+1)\,\Gamma(n+\beta/s+1)} \quad \text{for } t = 0, 1, \ldots, n$$
 Marginal MLE for $\beta$
o Maximize $f_T(t \mid \beta)$ with respect to $\beta$
o Solve
$$\frac{\partial}{\partial\beta}\log f_T(t \mid \beta) = \beta^{-1} + s^{-1}\left[\Psi(n-t+\beta/s) - \Psi(n+\beta/s+1)\right] = 0$$
for $\beta$ to find $\hat{\beta}_{MLE}$, where $\Psi(\cdot)$ is the digamma function
 Marginal MOM estimator for $\beta$
o Set $E_T[T] = t$ to find that $\hat{\beta}_{MOM} = s(n-t)/t = s(1-\hat{\theta})/\hat{\theta}$
o Interpretation: $\hat{\beta}_{MOM}$
= (# of vectors per plant) × (non-infected prop.) / (infected prop.)
= (group size) × (group failure prop.) / (group success prop.)
o Choosing s is important in order to prevent poor estimates of p; i.e., need to
choose s so that $\theta$ is not close to 0 or 1
 Rule of thumb is to choose s so that approximately ½ of the test plants are
positive and ½ of the test plants are negative
 Substituting ½ for $\hat{\theta}$ into $\hat{\beta}_{MOM}$ leads to $\hat{\beta}_{MOM} = s$
o Although one can think of $\theta$ = ½ as a “target value,” optimal group sizes
may actually lead to an expected proportion of positive host plants being
anywhere from 0.2 to 0.8 (Swallow, 1985, 1987)
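Both hyperparameter estimates can be computed with the standard library alone. The sketch below is an illustration rather than the authors' implementation: it assumes 0 < t < n, uses hypothetical function names, and solves the score equation by bisection. Since the two digamma arguments differ by the integer t + 1, their difference is a finite sum of reciprocals, so no digamma routine is needed.

```python
def beta_mom(t: int, n: int, s: int) -> float:
    """Method-of-moments estimate: s * (non-infected count) / (infected count)."""
    if not 0 < t <= n:
        raise ValueError("MOM estimate needs 0 < t <= n")
    return s * (n - t) / t


def score(beta: float, t: int, n: int, s: int) -> float:
    """Derivative of the log marginal likelihood with respect to beta."""
    x = n - t + beta / s
    # Psi(x + t + 1) - Psi(x) = sum_{k=0}^{t} 1 / (x + k) for integer t >= 0
    return 1.0 / beta - sum(1.0 / (x + k) for k in range(t + 1)) / s


def beta_mle(t: int, n: int, s: int) -> float:
    """Marginal MLE of beta via bisection on a log scale (assumes 0 < t < n)."""
    if not 0 < t < n:
        raise ValueError("marginal MLE is finite and positive only for 0 < t < n")
    lo, hi = 1e-8, 1e8  # score is positive at lo and negative at hi
    for _ in range(200):
        mid = (lo * hi) ** 0.5
        if score(mid, t, n, s) > 0:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


# Counts from the example slide: t = 3, n = 24, s = 7.
print(round(beta_mle(3, 24, 7), 1), beta_mom(3, 24, 7))  # 52.4 49.0
```

Bisection is used here only because beta times the score is strictly decreasing in beta, so the root is unique; any one-dimensional root finder would serve equally well.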
Estimators and Methods of Comparison
 The estimators:
o $\hat{p}_{EB1} = 1 - \dfrac{\Gamma(n+\hat{\beta}_{MLE}/s+1)\,\Gamma(n-t+\hat{\beta}_{MLE}/s+1/s)}{\Gamma(n-t+\hat{\beta}_{MLE}/s)\,\Gamma(n+\hat{\beta}_{MLE}/s+1+1/s)}$
o $\hat{p}_{EB2} = 1 - \left[1 - \dfrac{t+1}{n+\hat{\beta}_{MLE}/s+1}\right]^{1/s}$
o $\hat{p}_{EB3} = 1 - \dfrac{\Gamma(n+\hat{\beta}_{MOM}/s+1)\,\Gamma(n-t+\hat{\beta}_{MOM}/s+1/s)}{\Gamma(n-t+\hat{\beta}_{MOM}/s)\,\Gamma(n+\hat{\beta}_{MOM}/s+1+1/s)}$
o $\hat{p}_{EB4} = 1 - \left[1 - \dfrac{t+1}{n+\hat{\beta}_{MOM}/s+1}\right]^{1/s}$, which reduces to $1 - \left[1 - \dfrac{t}{n}\right]^{1/s} = \hat{p}_{MLE}$ using $\hat{\beta}_{MOM} = s(n-t)/t$
 Bias and MSE for an estimator $\hat{p}_i$
o $\text{Bias}(\hat{p}_i) = \sum_{t=0}^{n} (\hat{p}_i - p) \binom{n}{t} \left[1-(1-p)^s\right]^{t} (1-p)^{s(n-t)}$
o $\text{MSE}(\hat{p}_i) = \sum_{t=0}^{n} (\hat{p}_i - p)^2 \binom{n}{t} \left[1-(1-p)^s\right]^{t} (1-p)^{s(n-t)}$
o t = 0 and t = n are excluded from the calculations
 By choosing an appropriate s, t = 0 and t = n can be avoided
 $\hat{\beta}_{MLE} = \infty$ for t = 0 and $\hat{\beta}_{MLE} = 0$ for t = n; if we used t = 0 + $\epsilon$ and t = n - $\epsilon$ for a small constant $\epsilon > 0$ (instead of t = 0 and n), the conclusions presented here do not change
 Relative Bias = $\text{Bias}(\hat{p}_{MLE}) / \text{Bias}(\hat{p}_{EB,i})$
 Relative Efficiency = $\text{MSE}(\hat{p}_{MLE}) / \text{MSE}(\hat{p}_{EB,i})$
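The bias and MSE sums can be evaluated exactly by enumerating t. The following Python sketch is an illustration under stated assumptions: it drops the t = 0 and t = n terms without renormalizing the remaining binomial weights (the slides do not say which convention was used), it compares only the MLE with the estimator based on the MOM hyperparameter estimate, and all function names are hypothetical. The setting n = 30, s = 10 is one of the plotted panels; p = 0.05 is an arbitrary choice within the plotted range.

```python
from math import comb, exp, lgamma


def pmf_t(t, n, s, p):
    """Binomial(n, theta) probability of t infected plants, theta = 1 - (1 - p)^s."""
    theta = 1.0 - (1.0 - p) ** s
    return comb(n, t) * theta ** t * (1.0 - theta) ** (n - t)


def p_mle(t, n, s):
    return 1.0 - (1.0 - t / n) ** (1.0 / s)


def p_eb3(t, n, s):
    """Posterior mean of p with beta replaced by the MOM estimate s(n - t)/t."""
    b = (n - t) / t  # MOM estimate of beta, divided by s
    log_ratio = (lgamma(n + b + 1) + lgamma(n - t + b + 1 / s)
                 - lgamma(n - t + b) - lgamma(n + b + 1 + 1 / s))
    return 1.0 - exp(log_ratio)


def bias_and_mse(estimator, n, s, p):
    """Sum over t = 1, ..., n - 1 only (assumption: no renormalization)."""
    bias = mse = 0.0
    for t in range(1, n):
        weight = pmf_t(t, n, s, p)
        err = estimator(t, n, s) - p
        bias += err * weight
        mse += err * err * weight
    return bias, mse


bias_mle, mse_mle = bias_and_mse(p_mle, 30, 10, 0.05)
bias_eb3, mse_eb3 = bias_and_mse(p_eb3, 30, 10, 0.05)
print("relative bias:", bias_mle / bias_eb3)
print("relative efficiency:", mse_mle / mse_eb3)
```

Passing the estimator as a function keeps the enumeration separate from the estimator definitions, so the other empirical Bayes estimators can be compared the same way once a routine for the marginal MLE of the hyperparameter is available.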
Relative Bias and Relative Efficiency Plots
[Figure: relative bias and relative efficiency of EB1, EB2, and EB3 plotted against p (0.00 to 0.10), with panels for n=30 and s=10 and for n=80 and s=25]
Relative bias or relative efficiency > 1 means $\hat{p}_{EB,i}$ is better than $\hat{p}_{MLE}$
Relative Bias for optimal MLE settings (Swallow, 1985)
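| p | | n=10 | n=20 | n=30 | n=50 | n=80 | n=100 | n=200 |
|---|---|---|---|---|---|---|---|---|
| 0.01 | s | 35 | 50 | 50 | 50 | 50 | 50 | 50 |
| 0.01 | $\hat{p}_{EB1}$ | 0.96 | 0.93 | 0.93 | 0.94 | 0.94 | 0.94 | 0.94 |
| 0.01 | $\hat{p}_{EB2}$ | 1.95 | 5.11 | 6.40 | 8.19 | 9.85 | 10.59 | 12.52 |
| 0.01 | $\hat{p}_{EB3}$ | 0.63 | 0.52 | 0.51 | 0.51 | 0.51 | 0.50 | 0.50 |
| 0.02 | s | 19 | 35 | 45 | 50 | 50 | 50 | 50 |
| 0.02 | $\hat{p}_{EB1}$ | 0.97 | 0.90 | 0.87 | 0.86 | 0.87 | 0.87 | 0.87 |
| 0.02 | $\hat{p}_{EB2}$ | 2.21 | 4.84 | 4.87 | 5.16 | 5.64 | 5.83 | 6.24 |
| 0.02 | $\hat{p}_{EB3}$ | 0.61 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| 0.03 | s | 14 | 25 | 30 | 40 | 45 | 45 | 50 |
| 0.03 | $\hat{p}_{EB1}$ | 0.97 | 0.90 | 0.88 | 0.84 | 0.82 | 0.83 | 0.81 |
| 0.03 | $\hat{p}_{EB2}$ | 2.59 | 4.89 | 5.07 | 4.47 | 4.29 | 4.40 | 4.15 |
| 0.03 | $\hat{p}_{EB3}$ | 0.58 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| 0.05 | s | 9 | 16 | 20 | 25 | 25 | 30 | 30 |
| 0.05 | $\hat{p}_{EB1}$ | 1.00 | 0.91 | 0.88 | 0.84 | 0.85 | 0.81 | 0.82 |
| 0.05 | $\hat{p}_{EB2}$ | 3.06 | 5.14 | 4.96 | 4.49 | 4.88 | 4.07 | 4.29 |
| 0.05 | $\hat{p}_{EB3}$ | 0.57 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| 0.08 | s | 6 | 10 | 13 | 16 | 17 | 17 | 18 |
| 0.08 | $\hat{p}_{EB1}$ | 1.04 | 0.94 | 0.89 | 0.85 | 0.84 | 0.84 | 0.83 |
| 0.08 | $\hat{p}_{EB2}$ | 3.91 | 6.08 | 5.29 | 4.69 | 4.72 | 4.87 | 4.78 |
| 0.08 | $\hat{p}_{EB3}$ | 0.56 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| 0.1 | s | 5 | 8 | 10 | 12 | 13 | 14 | 14 |
| 0.1 | $\hat{p}_{EB1}$ | 1.07 | 0.96 | 0.91 | 0.87 | 0.86 | 0.84 | 0.85 |
| 0.1 | $\hat{p}_{EB2}$ | 4.77 | 7.00 | 6.11 | 5.45 | 5.31 | 4.89 | 5.22 |
| 0.1 | $\hat{p}_{EB3}$ | 0.56 | 0.51 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |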
Relative Efficiency for optimal MLE settings (Swallow, 1985)
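| p | | n=10 | n=20 | n=30 | n=50 | n=80 | n=100 | n=200 |
|---|---|---|---|---|---|---|---|---|
| 0.01 | s | 35 | 50 | 50 | 50 | 50 | 50 | 50 |
| 0.01 | $\hat{p}_{EB1}$ | 0.98 | 0.99 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 |
| 0.01 | $\hat{p}_{EB2}$ | 1.15 | 1.08 | 1.05 | 1.03 | 1.02 | 1.02 | 1.01 |
| 0.01 | $\hat{p}_{EB3}$ | 0.84 | 0.90 | 0.93 | 0.96 | 0.97 | 0.98 | 0.99 |
| 0.02 | s | 19 | 35 | 45 | 50 | 50 | 50 | 50 |
| 0.02 | $\hat{p}_{EB1}$ | 0.98 | 0.97 | 0.97 | 0.98 | 0.99 | 0.99 | 1.00 |
| 0.02 | $\hat{p}_{EB2}$ | 1.15 | 1.10 | 1.08 | 1.05 | 1.03 | 1.02 | 1.01 |
| 0.02 | $\hat{p}_{EB3}$ | 0.84 | 0.87 | 0.89 | 0.92 | 0.95 | 0.96 | 0.98 |
| 0.03 | s | 14 | 25 | 30 | 40 | 45 | 45 | 50 |
| 0.03 | $\hat{p}_{EB1}$ | 0.98 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98 | 0.99 |
| 0.03 | $\hat{p}_{EB2}$ | 1.15 | 1.10 | 1.08 | 1.06 | 1.04 | 1.03 | 1.02 |
| 0.03 | $\hat{p}_{EB3}$ | 0.83 | 0.86 | 0.89 | 0.90 | 0.93 | 0.94 | 0.97 |
| 0.05 | s | 9 | 16 | 20 | 25 | 25 | 30 | 30 |
| 0.05 | $\hat{p}_{EB1}$ | 0.98 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98 | 0.99 |
| 0.05 | $\hat{p}_{EB2}$ | 1.15 | 1.11 | 1.08 | 1.06 | 1.04 | 1.04 | 1.02 |
| 0.05 | $\hat{p}_{EB3}$ | 0.83 | 0.86 | 0.87 | 0.90 | 0.94 | 0.93 | 0.97 |
| 0.08 | s | 6 | 10 | 13 | 16 | 17 | 17 | 18 |
| 0.08 | $\hat{p}_{EB1}$ | 0.99 | 0.98 | 0.97 | 0.97 | 0.98 | 0.98 | 0.99 |
| 0.08 | $\hat{p}_{EB2}$ | 1.15 | 1.10 | 1.09 | 1.06 | 1.04 | 1.03 | 1.02 |
| 0.08 | $\hat{p}_{EB3}$ | 0.84 | 0.86 | 0.87 | 0.90 | 0.93 | 0.94 | 0.97 |
| 0.1 | s | 5 | 8 | 10 | 12 | 13 | 14 | 14 |
| 0.1 | $\hat{p}_{EB1}$ | 1.00 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.99 |
| 0.1 | $\hat{p}_{EB2}$ | 1.15 | 1.10 | 1.08 | 1.06 | 1.04 | 1.03 | 1.02 |
| 0.1 | $\hat{p}_{EB3}$ | 0.85 | 0.87 | 0.88 | 0.91 | 0.93 | 0.94 | 0.97 |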
Example
 Ornaghi et al. (1999) study the effects of the “Mal Rio Cuarto”
(MRC) virus and its spread by the Delphacodes kuscheli
planthopper
o The MRC virus is the most-damaging maize virus in Argentina
o It was desired to estimate p, the probability of disease transmission for a
single vector
 Female planthoppers in the 4th stage
o s = 7 planthoppers per plant
o n = 24 plants
o t = 3 infected plants observed
 The estimators:
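o $\hat{p}_{EB1}$ = 0.018857 where $\hat{\beta}_{MLE}$ = 52.4
o $\hat{p}_{EB2}$ = 0.018596 where $\hat{\beta}_{MLE}$ = 52.4
o $\hat{p}_{EB3}$ = 0.019165 where $\hat{\beta}_{MOM}$ = 49
o $\hat{p}_{EB4}$ = $\hat{p}_{MLE}$ = 0.018895 where $\hat{\beta}_{MOM}$ = 49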
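The four reported values can be recomputed end to end from s = 7, n = 24, and t = 3. This is a sketch for checking the arithmetic, not the authors' code: the function names are hypothetical, and the marginal MLE of the hyperparameter is found by a simple bisection on the score equation, with the digamma difference written as a finite sum.

```python
from math import exp, lgamma

T, N, S = 3, 24, 7  # infected plants, test plants, planthoppers per plant


def beta_mle(t, n, s):
    """Root of the marginal score equation (assumes 0 < t < n)."""
    def score(beta):
        x = n - t + beta / s
        # Psi(x + t + 1) - Psi(x) equals the finite sum below for integer t.
        return 1.0 / beta - sum(1.0 / (x + k) for k in range(t + 1)) / s
    lo, hi = 1e-8, 1e8
    for _ in range(200):  # bisection on a log scale
        mid = (lo * hi) ** 0.5
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


def posterior_mean(t, n, s, beta):
    b = beta / s
    return 1.0 - exp(lgamma(n + b + 1) + lgamma(n - t + b + 1 / s)
                     - lgamma(n - t + b) - lgamma(n + b + 1 + 1 / s))


def plug_in(t, n, s, beta):
    return 1.0 - (1.0 - (t + 1) / (n + beta / s + 1)) ** (1.0 / s)


b_mle = beta_mle(T, N, S)
b_mom = S * (N - T) / T  # method-of-moments estimate
estimates = {
    "EB1": posterior_mean(T, N, S, b_mle),
    "EB2": plug_in(T, N, S, b_mle),
    "EB3": posterior_mean(T, N, S, b_mom),
    "EB4": plug_in(T, N, S, b_mom),  # algebraically equal to the MLE
}
print("beta_MLE:", round(b_mle, 1), "beta_MOM:", b_mom)
for name, value in estimates.items():
    print(name, round(value, 6))
```

Small differences in the last printed digit relative to the slide are possible, because the slide shows the hyperparameter estimate rounded to one decimal place.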
Summary
 $\hat{p}_{EB2} = 1 - \left[1 - \dfrac{t+1}{n+\hat{\beta}_{MLE}/s+1}\right]^{1/s}$ results in a significant reduction
of bias and a moderate reduction in MSE when compared to the
MLE
 Other estimators
o The median and mode of $f_{P|T}(p \mid t, \beta)$ result in estimators which at times can
be better than $\hat{p}_{EB2}$; however, $\hat{p}_{EB2}$ is much more often better in terms of bias
and MSE
o Burrows (1987) presents a frequentist estimator based on the MLE with a
bias correction which predominantly does better than all estimators
examined here with respect to bias reduction; $\hat{p}_{EB2}$ and the Burrows
estimator are much closer with regard to MSE reduction
o There is no uniformly superior estimator!
 Interval estimators for p
o See Tebbs and Bilder (JABES, 2004) for frequentist interval comparisons
o Equal-tail and highest posterior density region credible intervals usually
have poorer coverage than a Wald confidence interval for p (of course, the
interpretation of the intervals differs)
 Our examination did not take into account the variability in the
estimate of $\beta$
o The credible intervals possess the desirable property that the lower bound
will never be negative (unlike the Wald interval)