
WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT
Pooled Testing for HIV Screening:
Capturing the Dilution Effect
Lawrence M. Wein
Stefanos A. Zenios
#3665-94-MSA
March 1994
MASSACHUSETTS
INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
POOLED TESTING FOR HIV SCREENING:
CAPTURING THE DILUTION EFFECT
Lawrence M. Wein
Sloan School of Management, M.I.T.
and
Stefanos A. Zenios
Operations Research Center, M.I.T.
Abstract
We study pooled (or group) testing as a cost-effective alternative for screening donated blood products (sera) for HIV; rather than test each sample individually, this method combines various samples into a pool, and then tests the pool. A group testing policy specifies an initial pool size, and based on the HIV test result, either releases all samples in the pool for transfusion, discards all samples in the pool, or divides the pool into subpools for further testing. We develop a generalized linear model that relates the HIV test output to the antibody concentration in the pool, and hence captures the effect of pooling together different samples. The model is validated and simplified using data from a variety of field studies, and is embedded into a dynamic programming algorithm that derives a group testing policy to minimize the expected cost due to false negatives, false positives and testing. A simulation study shows that significant cost savings can be achieved without compromising the accuracy of the test. However, the efficacy of group testing depends upon the use of a classification rule (that is, discard the samples in the pool, transfuse them or test them further) that is dependent on pool size, a characteristic that is lacking in currently implemented pooled testing procedures.

February 18, 1994
In the first years of the AIDS epidemic, numerous instances of infection caused by blood transfusion were reported to the Center for Disease Control. The incidence indicated that the blood supply is a virtually frictionless pathway for spreading the epidemic, and the extent of the epidemic dictated that screening for AIDS at the individual level should be adopted. As a consequence, all infected blood donors would be identified and a measurably safer blood supply would be attained. Nevertheless, the cost of such a screening program is substantial, and many developing countries, particularly in Africa where the epidemic is spreading rapidly, are struggling to fight the disease on limited budgets.
Pooled testing is one potential way to reduce the monetary cost without compromising the accuracy of the tests. The rationale behind pooled testing is simple and intuitive: suppose we can pool the sera from ten (for example) individuals and test the pool using a single test. If the seroprevalence of HIV, which is the fraction of the population that is infected, is low enough, then there is a high probability that all ten individuals in the pool are HIV negative; in this case, we would learn from a single test what otherwise would be learned from ten individual tests. If, on the other hand, the test outcome is positive, then additional tests (either pooled or individual) would need to be carried out.
However, pooled testing has a possible shortcoming, the dilution effect: there is a serious concern that if the pool size is too large, then any HIV positive sera will be sufficiently diluted so as to become undetectable by the test. These false negatives can be extremely costly, particularly when pooled testing is employed to protect the blood supply. Moreover, infected individuals exhibiting an unusually low level of antibody concentration are less likely to be detected when screened in pools. Consequently, the sensitivity of the test can be seriously affected. (Sensitivity is the probability that the test classifies a diseased individual as positive, whereas specificity is the probability that the test classifies a healthy individual as negative.)
Pooling methods have been evaluated in blood banking systems in several field studies in developing countries, including Zaire, Zimbabwe and Ecuador (see, for example, Cahoon-Young et al. 1989, Emmanuel et al. 1988, Kline et al. 1989, Behets et al. 1990 and Ledro-Monroy et al. 1990). These field studies suggest that pooling methods (with group sizes as large as 80) may be as sensitive and specific as individual testing, and can result in cost savings from 5% to 80%, depending on the actual seroprevalence. On the other hand, the World Health Organization (WHO), concerned with the dilution effect and its consequences on the sensitivity of the test, is more conservative in its proposal: in Tamashiro et al. (1993), it recommends pools of size no greater than five. The discrepancy between the field studies and the recommendations of the WHO is an indication that the dilution effect is not well understood, and the bloodbanking community is in danger of either underestimating or overestimating its importance.

In his seminal paper, Dorfman (1943) showed how pooled testing, which is called group testing in the statistical literature, can be employed to efficiently eliminate all defective items from certain large populations. The method found an immediate application in screening World War II draftees for syphilis, where the use of pooled testing resulted in considerable savings. The group testing problem was researched aggressively in the 1950's and 60's (see, for example, Sobel and Groll 1959), and a large literature now exists on this topic; readers are referred to Johnson et al. (1991) for a survey, and to Litvak et al. (1992) for recent work that is motivated by HIV testing. However, nearly all existing studies concentrate on either perfect tests (i.e., with no misclassification errors) or imperfect tests with errors that are independent of the group size; two exceptions are Hwang (1976) and Burns and Mauro (1987), who assume that test sensitivity is a specified function of the group size. In addition, all studies neglect the actual test mechanism and, except for Arnold (1977), assume that the test outcome is binary rather than continuous.
In contrast, we attempt to explicitly model both the dilution effect and the continuous nature of the test outcome. Our task is greatly complicated by the fact that the test outcome, which is a continuous quantity called the optical density level, is only an indirect measurement of the unobservable antibody concentration. Starting from first principles, we derive a generalized linear model that relates the optical density level to the antibody concentration of the tested sera and explicitly captures the physical pooling of sera. The model is validated using data from two existing dilution studies, and a simplified version of the model is validated using data from an existing pooling study.

Traditional group testing problems consider a binary test outcome; policies are characterized by the initial pool size and the resulting subpool configuration. If the test outcome is HIV negative, then the pool is released for transfusion, and if the test outcome is HIV positive, then the pool is divided into subpools for further testing; if the pool consists of a single sample, then the sample is discarded if the test is HIV positive. Because we explicitly consider the continuous nature of the test outcome, our group testing policy must also develop a classification rule that is based on the test outcome: the pool is either deemed HIV negative (each sample in the pool is released for transfusion), deemed HIV positive (each sample in the pool is discarded), or divided into subpools for further testing. Our validated pooling model is embedded into a dynamic programming framework that derives the group testing policy that minimizes the expected cost due to testing, false positives and false negatives. Our proposed policy is tested on a Monte Carlo simulation model, and the results indicate that pooled testing, with a classification rule that explicitly depends on the pool size, can achieve significant cost savings over individual testing.

The paper is organized as follows. A preliminary description of the biological assay used for HIV testing, ELISA, is given in Section 1. The data used in the paper are described in Section 2. A generalized linear model is developed in Section 3, and is validated and simplified in Section 4 using the data described in Section 2. A dynamic programming framework for the group testing problem is developed in Section 5, and several policies are derived. A simulation study is undertaken in Section 6, and concluding remarks appear in Section 7.
1. Serological Tests for AIDS

The human body reacts to microbial agents, like viruses, bacteria, parasites, etc., by producing antibodies. The antibodies recognize particular molecules on the surface of the infectious agent and bind to them. Such molecules are called antigens (antibody generators). Various immunological tests are designed to detect antibodies, thereby identifying the serological status of the individual. The Human Immunodeficiency Virus (HIV) is the pathological agent of the Acquired Immune Deficiency Syndrome (AIDS). Enzyme Linked Immunosorbent Assays (ELISA) for HIV detect the anti-HIV antibodies and are frequently used for HIV screening.
This section contains a brief nontechnical description of ELISAs.
A common configuration of ELISAs for HIV is the indirect assay pictured in Figure 1 (see George and Schochetman 1985 for more details). Antigens to HIV are attached to a solid phase support (usually wells). The patient's serum (or plasma) is diluted (at a dilution fixed by the manufacturer), added to the well, and incubated for a time period. By the end of the incubation period, any antibodies to HIV that are present in the sample are attached to the antigens on the solid phase support. The well is then washed so that all unattached material is removed, and the attached antibodies become detectable. The attached antibodies (immunoglobulins) are detected when a secondary antibody, labeled by an enzyme, is added. When a colorless substrate is finally added, an enzymatic reaction takes place, producing a color change proportional to the amount of human HIV antibodies present. The ELISA test outcome is the optical density (OD) level, which quantifies this color change.

Figure 1: A schematic representation of the indirect ELISA (labels: HIV antigens, HIV specific antibodies, secondary antibodies, wash, colorless substrate, colored product).

Hence, the OD reading is determined by two factors: the concentration of the antibodies and their ability to bind (antibody affinity). If the OD level recorded at the end of the process exceeds the critical value, or cutoff, recommended by the manufacturer, then the patient is declared HIV positive; otherwise the patient is declared HIV negative.

An alternative configuration of ELISAs is the competitive assay. Although the same types of antigens used in indirect ELISAs are attached to the solid phase support, this method differs from the indirect one in the detection mechanism. Enzyme labeled HIV antibodies compete with the patient's antibodies for binding sites. The color change observed is inversely proportional to the concentration of HIV antibodies in the serum. If the recorded OD level exceeds the critical value set by the manufacturer, then the sample is declared HIV negative, otherwise positive.
We concentrate on indirect assays in this paper, since most of the commercially available antibody detection kits are based on the indirect configuration of ELISAs. Nevertheless, the study of competitive assays is not any more difficult and only minor modifications, stated when necessary, are required.
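The decision rules for the two assay configurations differ only in the direction of the comparison with the manufacturer's cutoff. A minimal sketch follows; the enum and function names are hypothetical, the cutoff is whatever value the manufacturer supplies, and a reading exactly equal to the cutoff is treated as not exceeding it, following the wording of the text.

```python
from enum import Enum


class AssayType(Enum):
    INDIRECT = "indirect"
    COMPETITIVE = "competitive"


def classify(od: float, cutoff: float, assay: AssayType) -> str:
    """Classify a single OD reading as 'positive' or 'negative'."""
    if assay is AssayType.INDIRECT:
        # Indirect ELISA: the color change grows with the bound patient antibodies.
        return "positive" if od > cutoff else "negative"
    # Competitive ELISA: patient antibodies displace the enzyme-labeled ones,
    # so a high OD means few patient antibodies.
    return "negative" if od > cutoff else "positive"
```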
ELISAs are inexpensive, easy to administer and very accurate; however, they have a shortcoming that stems from the test's indirect detection of HIV via the presence of antibodies. The patient's time of infection is followed by a window period during which the antibody concentration in the patient's serum is virtually undetectable. This period usually extends from three to nine months and results in false negatives. Assays for detecting HIV antibodies cannot identify such individuals; therefore, whenever individuals are referred to as HIV positive or negative, we are actually alluding to the presence or absence of HIV antibodies.
2. Description of the Data
We
use individual testing data, dilution series data and pooled testing data obtained
from three independent sources.
Individual Testing Data.
OD readings for 4000 HIV negative and 3000 HIV positive individuals screened using four different assays were provided by the National HIV Reference Laboratory of Australia (Dax 1993). It is convenient to normalize the OD readings according to the equation
$$X = \frac{OD - A_m}{A_M - A_m},$$
where $A_m$ and $A_M$ are the minimum and maximum values of OD readings, respectively, recorded by the assay, so that they fall between zero and one. The values of $A_m$ and $A_M$ vary by assay, and were chosen based on an analysis of the data and discussions with the data providers. For this data, we set $A_m = \ldots$ and $A_M = 20$. The empirical distributions for the normalized OD readings for assay A (an indirect ELISA) are given in Figure 2(a). We observe that both the mean and variance are smaller for the HIV negative than for the HIV positive individuals. The relatively large spread in the HIV positive distribution is to be expected, since an individual's antibody concentration tends to systematically vary as the disease progresses; see George and Schochetman (1985) for details. The two populations are well separated, and therefore a critical value separating the outcomes into HIV positive and HIV negative can be selected.
Figure 2: (a) Empirical densities for the reactivity ratios of 4000 HIV negative and 3000 HIV positive individuals; (b) densities for the corresponding LOD readings.

For reasons that will become clear in Section 4, we also consider the logit transformation of the normalized OD readings, $\ln\left(\frac{X}{1-X}\right)$, which will be referred to as the LOD (logit OD) readings. The empirical densities of the LOD readings for the two populations are given in Figure 2(b). The sample mean and standard deviation are, respectively, $\mu_- = -4.82$ and $\sigma_- = 0.42$ for the HIV negative population, and $\mu_+ = 0.80$ and $\sigma_+ = 1.08$ for the HIV positive population.
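The normalization and logit transformation can be stated in a few lines of code. This is a minimal sketch; the clipping of out-of-range readings to [0, 1] and the small `eps` that keeps the logarithm finite are our assumptions (the text does not say how readings at $A_m$ or $A_M$ are handled), and the function names are hypothetical.

```python
import math


def normalize_od(od: float, a_min: float, a_max: float) -> float:
    """Map a raw OD reading onto [0, 1] using the assay's A_m and A_M."""
    x = (od - a_min) / (a_max - a_min)
    return min(max(x, 0.0), 1.0)  # clipping out-of-range readings: our assumption


def lod(x: float, eps: float = 1e-6) -> float:
    """Logit of a normalized OD reading, ln(x / (1 - x))."""
    x = min(max(x, eps), 1.0 - eps)  # keep the logit finite at 0 and 1: our assumption
    return math.log(x / (1.0 - x))
```

With illustrative values $A_m = 0$ and $A_M = 20$, a raw reading of 5 normalizes to 0.25, whose LOD is $\ln(1/3)$.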
Figure 3 displays the normal quantile plot for the empirical distributions in Figure 2(b); that is, the LOD readings are ranked in magnitude and are plotted against the standard normal quantiles. A straight line indicates normality of the data points. The quantile plot of the LOD readings is approximately linear for both populations. Deviations from normality are observed in the tails of the HIV negative population and in the right tail of the HIV positive population. Most importantly, the normal approximation captures the left tail of the HIV positive distribution, which contains the low OD readings that might become undetectable under pooled testing. On the other hand, the normal approximation to the HIV negative population underestimates the proportion of negative individuals with a relatively high OD reading, which might lead to an underestimation of false positives. Nevertheless, the false negatives, which are the overriding concern in pooled testing, will not be affected.

Figure 3: Normal quantile plots for the LOD readings of (a) HIV negative and (b) HIV positive populations (horizontal axis: quantiles of standard normal).
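Under a normal approximation for the LOD readings (means $-4.82$ and $0.80$, standard deviations $0.42$ and $1.08$, the sample values reported above), the error rates of a single, unpooled test with a given LOD cutoff follow directly from the normal cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist`; the cutoff is a hypothetical choice, not one proposed in the paper, and, as noted above, the normal fit is likely to understate the false-positive rate.

```python
from statistics import NormalDist

# Normal approximations to the LOD readings of each population.
LOD_NEGATIVE = NormalDist(mu=-4.82, sigma=0.42)
LOD_POSITIVE = NormalDist(mu=0.80, sigma=1.08)


def individual_error_rates(cutoff: float) -> tuple[float, float]:
    """(false-positive, false-negative) probabilities of one unpooled test.

    A reading above the cutoff is declared positive (indirect assay).
    """
    false_positive = 1.0 - LOD_NEGATIVE.cdf(cutoff)
    false_negative = LOD_POSITIVE.cdf(cutoff)
    return false_positive, false_negative
```

For a cutoff of $-2.0$ on the LOD scale, the false-negative probability is about 0.005 and the false-positive probability is negligible. Pooling shifts the positive readings to the left, which is how the dilution effect raises the false-negative rate.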
In the analytical model developed in Section 5, we assume that the LOD readings for the HIV negative and positive populations are normally distributed with respective means $\mu_- = -4.82$ and $\mu_+ = 0.80$, and respective standard deviations $\sigma_- = 0.42$ and $\sigma_+ = 1.08$.

Dilution Series Data.
Dilution series data were obtained from the Caribbean Epidemiology Center (Hull 1991 and de Gourville 1992) and the National HIV Reference Laboratory of Australia (Dax 1993). The purpose of both of these studies was to investigate the effect of dilution on the ability of ELISAs to detect reactive sera. In the Caribbean Epidemiology Center (CAREC) study, ten positive sera were diluted sequentially in a fixed negative serum to produce a series of thirteen four-fold dilutions in the ratios $1:1, 1:4, 1:16, \ldots, 1:4^{12}$. A $1:n$ ratio means that $1/n$ of the pool consists of the positive sample. Each dilution was tested by two indirect ELISAs according to the manufacturer's instructions. Since the data from both assays yielded similar results, we only report the results from one of them. The raw data consists of 130 OD readings, one for each of the thirteen dilution levels of each of the ten positive samples. We used $A_m = \ldots$ and $A_M = 15$ to normalize the OD readings.
The National HIV Reference Laboratory of Australia (NRL) study sequentially diluted ten positive sera in a fixed negative serum to produce a series of 11 two-fold dilutions, with ratios $1:1, 1:2, 1:4, \ldots, 1:2^{10}$. These dilutions were tested on ten different assays. We analyzed the data from several of these assays and obtained very similar results, and hence will only report on the results from one assay. The OD readings were normalized using $A_m = \ldots$ and $A_M = 2$.
Pooled Testing Data.
The study of Cahoon-Young et al. (1992), which will be referred to hereafter as Cahoon-Young et al., tested 1280 specimens individually and in a series of nested pools. More specifically, the individual specimens were pooled to generate 128 pools of size 10; the pools of size 10 were then combined to form 64 pools of size 20, then 32 pools of size 40 and finally 16 pools of size 80. Twelve individuals were found to be HIV positive, and no more than one positive sample was found in any of the pools of size 80. The OD readings at every stage of this nested testing procedure were recorded.

Note that the dilution series studies by CAREC and NRL differ from Cahoon-Young et al.'s pooling study in one important respect: positive sera are diluted with varying amounts of the same negative serum in the dilution series studies, whereas individual sera are combined with a varying number of different individuals' sera in Cahoon-Young et al.'s pooling study. Hence, although the two dilution series can be used to assess the effect of dilution, the Cahoon-Young et al. study exactly mimics the pooling that would take place under group testing.
3. A Probabilistic Model for the Dilution Effect

When sera are screened in pools, the OD level of the pool is determined by the concentration and affinity of the antibodies in the pool. The unobservability of the antibody concentration and affinity makes it very difficult to estimate the OD level of a pool as a function of the antibody concentration and affinity. In this section, we develop a stochastic model that predicts the OD level of a sample as a function of the antibody concentration and affinity. We then specialize the model to the setting of the CAREC and NRL dilution studies, and obtain a generalized linear model (GLM) that predicts the OD level of an HIV positive sample as a function of the corresponding dilution level. The model is adapted in Section 4 to consider pools consisting of individual samples, as in the Cahoon-Young et al. study.

The dilution series data of CAREC and NRL essentially generate dose-response curves: the dose takes the form of a fixed positive sample diluted to various levels, and the response is simply the corresponding OD reading. Empirical dose-response curves typically exhibit sigmoid or hyperbolic behavior, and polynomial, general curvilinear, logit or probit regression models have been proposed to fit these curves.

Before our model is introduced, it is worthwhile reviewing the traditional approach, and we focus on the logistic regression for concreteness. Let $Y_{ij}$ denote the logit of the normalized OD reading (that is, the LOD reading, as in Figure 2(b)) of a particular HIV positive sample $i$ that is diluted to the ratio $1:d^j$, where $d$ is an integer ($d = 4$ for CAREC and $d = 2$ for NRL) and $j = 0, 1, \ldots, n$. The linear model hypothesizes that
$$Y_{ij} = \alpha_i + \beta j + \epsilon_{ij}, \qquad (1)$$
where the $\epsilon_{ij}$ are iid normal random variables with zero mean. Although this model typically generates predicted values that coincide well with observed values, it exhibits considerable heteroscedasticity (state-dependent noise), and hence one of the model's basic assumptions is violated (see Tijssen 1985, Chapter 15).
Whereas the existing literature has taken a purely empirical approach to fitting dose-response curves, we develop a probabilistic model that is based upon a set of primitive assumptions regarding the behavior of the ELISA test and the pooled sera. Our analysis leads to a GLM for the dose-response curve: recall that while a linear regression model postulates that the expected value of the dependent variable is a linear function of the independent variable, a GLM assumes that a function of the expected value of the dependent variable is a linear function of the independent variable. Like model (1), our GLM captures the sigmoid nature of the dose-response curve. In addition, it proposes a particular variance function that, as will be seen in the next section, stabilizes the heterogeneous noise present in the CAREC and NRL data sets.

Our model is influenced by Fisher (1922), who developed a probabilistic model to estimate the number of bacteria in a sample of water or soil. The following eight assumptions are used to derive our model. We conferred with several specialists, and none of these assumptions generated any disputation; assumption A5 was the only one that appeared to stimulate any reflection.
A1. The number of HIV antigens, $n$, attached to a well satisfies $n > 10^6$.
A2. No more than one HIV antibody can bind to any antigen.
A3. The probability of an antibody binding to a specific antigen is small, independent for each antigen, and constant for antibodies from the same serum.
A4. The expected number of attached antibodies on a large collection of antigens is significant.
A5. The secondary antibody will bind to all attached primary antibodies, and the normalized OD reading is linearly proportional to the number of attached antibodies.
A6. The expected number of attached HIV antibodies is linearly proportional to the antibody concentration. The proportionality constant can vary among different individuals due to differences in their binding properties (affinity).
A7. The antibody concentration of pooled sera is the weighted average of the individual antibody concentrations.
A8. Measurement errors are negligible.

If a competitive assay is employed, then assumption A5 is replaced by
A5c. The normalized OD reading is proportional to the number of attached secondary antibodies,
and the following assumption is introduced:
A9. The affinity of primary antibodies is higher than that of secondary antibodies; therefore, secondary antibodies will bind to antigens on the solid support if and only if there are not enough primary antibodies to saturate the binding sites.
In this case, the subsequent model derivation follows virtually unchanged.
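Assumption A7 is the mechanism behind the dilution effect: in a pool, the concentration of a single positive serum is averaged with the much lower concentrations of the other sera. A minimal sketch, assuming the weights of the weighted average are the serum volumes (equal volumes unless stated); the function name is hypothetical. The same rule, with one positive serum and $d^j - 1$ volumes of negative serum, underlies the dilution-series formula used later in this section.

```python
from typing import Optional, Sequence


def pooled_concentration(concentrations: Sequence[float],
                         volumes: Optional[Sequence[float]] = None) -> float:
    """Volume-weighted average antibody concentration of a pool (assumption A7)."""
    if volumes is None:
        volumes = [1.0] * len(concentrations)  # equal volumes: our assumption
    total = sum(volumes)
    return sum(c * v for c, v in zip(concentrations, volumes)) / total
```

A positive serum with concentration 8 pooled with three negative sera of concentration 0 yields 2, which is the same as a 1:4 dilution of that serum.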
Motivated by Fisher's analysis, we consider a well with $n$ antigens bound on it, and introduce a partition of the well into $k$ subwells indexed $s = 1, \ldots, k$. Assuming that the antigens are uniformly bound on the well, we let $m = n/k$ denote the number of antigens on every subwell. Suppose that serum $i$ is diluted with an HIV negative serum in the ratio $1:d^j$ and then added to the well. Since neither the concentration nor the affinity of antibodies is observable, and since their net effect is multiplicative in nature by A6, we will hereafter refer to the product of the antibody concentration and the antibody affinity as the antibody concentration. Let $\rho_{i0}$ denote the antibody concentration for the undiluted serum $i$, $\rho_\infty$ the antibody concentration of the HIV negative serum, $\rho_{ij}$ the antibody concentration for the diluted serum, and $p_{ij}$ the binding probability for the antibodies in the diluted serum; note that none of these quantities are observable. Also, let $N_{ijs}$ denote the number of HIV antibodies attached to the antigens on subwell $s$, and let
$$S_{ijk} = \frac{N_{ij1} + \cdots + N_{ijk}}{k},$$
which represents the average number of attached antibodies per subwell.

Our model development consists of three main steps. First, we find the probability distribution for the number of attached antibodies per subwell, and for the average number of attached antibodies per subwell, in terms of the unknown binding probability $p_{ij}$. Then we use assumption A6 to relate the unknown binding probability $p_{ij}$ to the unknown antibody concentration $\rho_{ij}$ of the diluted serum. Finally, we use assumption A5 to relate the average number of attached antibodies per subwell, $S_{ijk}$, to the normalized OD reading. Combining these three steps yields our basic model.
By our comments above, $N_{ij1}, \ldots, N_{ijk}$ are independent binomial random variables with size parameter $m$ and success probability $p_{ij}$. By A3 and the law of rare events, the binomial random variable can be approximated by a Poisson random variable with mean $p_{ij}m$:
$$P(N_{ijs} = l) \approx \frac{e^{-p_{ij}m}(p_{ij}m)^l}{l!}, \qquad l = 0, 1, \ldots \qquad (2)$$
We can choose a sufficiently fine partition of the well such that $P(N_{ijs} > 1) = o(p_{ij}m)$, implying
$$P(N_{ijs} = 0) \approx e^{-p_{ij}m}, \qquad (3)$$
$$P(N_{ijs} = 1) \approx 1 - e^{-p_{ij}m}, \qquad (4)$$
$$P(N_{ijs} > 1) = o(p_{ij}m). \qquad (5)$$
By the Central Limit Theorem and (3)-(5),
$$S_{ijk} \approx N\left(1 - e^{-p_{ij}m},\ \frac{e^{-p_{ij}m}\left(1 - e^{-p_{ij}m}\right)}{k}\right) \quad \text{as } k \to \infty, \qquad (6)$$
and hence
$$\ln\left(E[1 - S_{ijk}]\right) = -p_{ij}m. \qquad (7)$$
Since $\sum_{s=1}^{k} N_{ijs}$ is a binomial random variable, the distribution of $S_{ijk}$ is well approximated by the normal distribution even for relatively small values of $k$ (typically $k > 15$). In Section 4, the parameters of our resulting model are estimated from the data, and $k$ approximately equals 20.
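The first step of the argument can be checked numerically. The sketch below simulates the subwell model directly, drawing $N_{ijs} \sim \text{Binomial}(m, p_{ij})$ independently for $k$ subwells, and records, for each simulated well, the fraction of subwells holding at least one attached antibody; under the fine-partition approximations (3)-(5) this fraction plays the role of $S_{ijk}$. Its mean and variance should agree with (6), and (7) should hold. The parameter values are illustrative assumptions: $p_{ij}m = 1$, $k = 20$ in line with the value quoted above, and $n = km = 1.2 \times 10^6$ antigens, consistent with A1.

```python
import numpy as np


def simulate_occupancy(p: float, m: int, k: int, n_wells: int, seed: int = 0):
    """Fraction of occupied subwells for n_wells independent simulated wells.

    Each of the k subwells carries m antigens, each bound with probability p,
    so N_ijs ~ Binomial(m, p); a subwell is occupied if N_ijs > 0.
    """
    rng = np.random.default_rng(seed)
    counts = rng.binomial(m, p, size=(n_wells, k))  # N_ijs for s = 1..k
    return (counts > 0).mean(axis=1)


m, k = 60_000, 20
p = 1.0 / m                      # so that p * m = 1
s = simulate_occupancy(p, m, k, n_wells=20_000)
q = 1.0 - np.exp(-p * m)         # mean predicted by (6)
predicted_var = q * (1.0 - q) / k
```

With these values the simulated mean is close to $1 - e^{-1} \approx 0.632$ and $\ln(1 - \bar{s})$ is close to $-1$, as (7) predicts.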
Assumption A6 implies that
$$p_{ij}m = \frac{\rho_{ij}}{k}, \qquad (8)$$
which relates the binding probability to the antibody concentration in the diluted serum. Let $X_{ij}$ denote the normalized OD reading of the diluted sample. Then A5 implies that $X_{ij} = \gamma S_{ijk}$, where $\gamma$ is the constant of proportionality. The maximum normalized OD reading, 1, is attained when antibodies are bound on all subwells. In this case, $S_{ijk} = 1$ by (3)-(5), and so $\gamma = 1$ and
$$X_{ij} = S_{ijk}. \qquad (9)$$
Combining equations (6) and (8)-(9) and taking logarithms gives the basic stochastic model relating the normalized OD reading to the antibody concentration:
$$X_{ij} \approx N\left(E[X_{ij}],\ \frac{E[X_{ij}]\left(1 - E[X_{ij}]\right)}{k}\right), \qquad (10)$$
$$\ln\left(-\ln\left(1 - E[X_{ij}]\right)\right) = \ln\left(\frac{\rho_{ij}}{k}\right). \qquad (11)$$
In Section 4, this basic model will be adapted to the Cahoon-Young et al. pooling study.
Now we specialize this model to the setting of the CAREC and NRL dilution studies. By A7, the antibody concentration in the diluted serum is
$$\rho_{ij} = \frac{\rho_{i0} + (d^j - 1)\rho_\infty}{d^j} = \frac{\rho_{i0}}{d^j}\left(1 + \frac{(d^j - 1)\rho_\infty}{\rho_{i0}}\right). \qquad (12)$$
Combining equations (11) and (12) and using the approximation $\ln(1 + x) \approx x$ (the adequacy of this approximation will be investigated during the model validation phase) gives
$$\ln\left(-\ln\left(1 - E[X_{ij}]\right)\right) = \ln\left(\frac{\rho_{i0}}{k}\right) - j\ln d. \qquad (13)$$
The random component of the GLM is the normalized OD level $X_{ij}$ of the diluted sample, which is normally distributed by (10). The systematic component is the dilution level $j$. The link between the random component and the systematic component is the complementary log-log (cloglog) link function $g(x) = \ln(-\ln(1 - x))$. Letting $\mu_{ij} = E(X_{ij})$, the GLM is of the form
$$g(\mu_{ij}) = \alpha_i + \beta j,$$
where $\alpha_i = \ln(\rho_{i0}/k)$ and $\beta = -\ln d$ are constants. The second moment of the random component is given by
$$\mathrm{Var}(X_{ij}) = \phi\,\mu_{ij}(1 - \mu_{ij}), \qquad (14)$$
which will be denoted by $V(\mu_{ij})$, where the dispersion parameter $\phi$ equals $1/k$.
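To make the GLM concrete, the sketch below evaluates the predicted mean normalized OD, $\mu_{ij} = g^{-1}(\alpha_i + \beta j)$ with $\alpha_i = \ln(\rho_{i0}/k)$ and $\beta = -\ln d$, together with the variance function (14). As in (13), the contribution of the negative serum $\rho_\infty$ is ignored. The inverse logit and probit links are included because they are the alternative sigmoid links discussed next; the function names are ours, and any value of $\rho_{i0}$ passed in is hypothetical, since $\rho_{i0}$ and $k$ are not observable.

```python
import math


def inv_cloglog(eta: float) -> float:
    return 1.0 - math.exp(-math.exp(eta))


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta: float) -> float:
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def predicted_mean(rho0: float, k: float, d: float, j: int,
                   inv_link=inv_cloglog) -> float:
    """mu_ij = g^{-1}(alpha_i + beta * j), alpha_i = ln(rho0 / k), beta = -ln d."""
    alpha = math.log(rho0 / k)
    beta = -math.log(d)
    return inv_link(alpha + beta * j)


def variance(mu: float, k: float) -> float:
    """V(mu) = phi * mu * (1 - mu) with dispersion phi = 1 / k, as in (14)."""
    return mu * (1.0 - mu) / k
```

For example, with a hypothetical $\rho_{i0} = 40$, $k = 20$ and four-fold dilutions, `predicted_mean(40.0, 20.0, 4.0, 1)` equals $1 - e^{-0.5}$, which is equation (11) with $\rho_{ij} = \rho_{i0}/d$.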
We conclude this section with several remarks about the GLM (13). It captures the sigmoid nature of the dose-response curve via the cloglog link function. Other suitable sigmoid link functions are the logit and the probit. The logit link is defined by $g(x) = \ln\left(\frac{x}{1-x}\right)$, and the probit by $g(x) = \Phi^{-1}(x)$, where $\Phi$ is the cumulative standard normal distribution. To obtain the best fit of the data, GLMs for all three link functions will be considered in Section 4. Notice that if we replace the cloglog function by the logit function in (11) and set the dilution level $j = 0$, then this equation implies that individual LOD readings are normally distributed, which is consistent with the assumption that was discussed in Section 2 and will be used in Section 5.

The y-axis intercept $\alpha_i$ corresponds to the cloglog of the normalized OD level of the original positive sample, and hence provides a measure of the antibody concentration of the positive sample; our model predicts that $\alpha_i = \ln(\rho_{i0}/k)$, which is perfectly consistent with this interpretation. The slope $\beta$ is the marginal change in response due to a change in the dilution level $j$; hence, the predicted slope $\beta = -\ln d$ is also consistent with this interpretation. Notice that $\rho_{i0}$ and $k$ are not observable, and $d$ is observable. Hence, the slope of the GLM is known, but the y-intercept and the dispersion parameter are unknown. In the next section, we fit the GLM to the CAREC and NRL data sets and see if the predicted slope is consistent with the data. Unfortunately, it is very tedious to estimate $\alpha_i$ and $\phi$ for a model with a fixed slope $\beta = -\ln d$. Therefore, we will estimate $\alpha$, $\beta$ and $\phi$, and see if the estimated $\beta$ is close in value to $-\ln d$.

4. Model Validation

In this section, we attempt to validate the GLM developed in Section 3. In Subsection 4.1, the parameters of the model are estimated using the dilution series studies by CAREC and NRL. The GLM is adapted to the pooling setting and simplified in Subsection 4.2. The simplified pooling model is validated on Cahoon-Young et al.'s data in Subsection 4.3.

4.1 Model Fitting

The dilution series studies undertaken by CAREC and NRL generate normalized OD values $X_{ij}$ for positive sample $i$ and dilution level $j$. This data will be used to estimate the values of the generalized linear model parameters $\alpha$, $\beta$ and $\phi$. If the random mechanism embodied in the GLM is the true process by which the data are generated, then the maximum likelihood estimators can be obtained by iterative, weighted least squares.
However, the normality assumption on $X_{ij}$ is approximate, and we can relax this assumption by employing the theory of quasilikelihood functions (see pp. 323-352 of McCullagh and Nelder 1989). This theory applies under the following four conditions that are satisfied by the GLM: (i) the range of possible normalized OD values $X_{ij}$ is known, (ii) the mean normalized OD level is specified as a function of the dilution level $j$, (iii) the variance of the normalized OD level is specified as a function of the mean, and (iv) the observations are statistically independent. Let us fix sample $i$, let $X_i = (X_{i0}, \ldots, X_{in})$ denote the random vector of normalized OD readings, and assume that the $X_{ij}$'s are independent with mean $\mu_{ij}$ and variance $V(\mu_{ij})$. Then the log-likelihood function for $X_i$ can be replaced by the quasilikelihood function
$$Q(\mu_i; x_i) = \sum_{j=0}^{n} \int_{x_{ij}}^{\mu_{ij}} \frac{x_{ij} - v}{V(v)}\,dv, \qquad (15)$$
where $x_{ij}$ is the realization of $X_{ij}$. The maximum likelihood estimators (MLE) for the model parameters are then obtained by maximizing the quasilikelihood function
$$\sum_{i=1}^{10} Q(\mu_i; x_i) \qquad (16)$$
for each study, where $n = 12$ for CAREC and $n = 10$ for NRL.
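The paper performs this fit with the glm routine of S-plus. As a hedged illustration of what such a quasi-likelihood fit does, here is a minimal iteratively reweighted least squares (IRLS) sketch in NumPy for the cloglog model $g(\mu_{ij}) = \alpha_i + \beta j$ with $V(\mu) \propto \mu(1 - \mu)$: one intercept per positive sample and a common slope. The constant $\phi$ cancels from the estimating equations for $\alpha_i$ and $\beta$, so it is estimated afterwards; using the Pearson statistic for that step, and the function names, are our own choices rather than details taken from the paper.

```python
import numpy as np


def cloglog(mu):
    return np.log(-np.log1p(-mu))


def inv_cloglog(eta):
    return -np.expm1(-np.exp(eta))


def fit_dilution_glm(x, max_iter=100, tol=1e-10):
    """Quasi-likelihood fit of cloglog(mu_ij) = alpha_i + beta * j by IRLS.

    x[i, j] is the normalized OD reading of positive sample i at dilution
    level j (values strictly inside (0, 1)). Returns (alpha, beta, phi).
    """
    n_samples, n_levels = x.shape
    y = x.ravel()
    design = np.zeros((y.size, n_samples + 1))
    design[np.arange(y.size), np.repeat(np.arange(n_samples), n_levels)] = 1.0
    design[:, -1] = np.tile(np.arange(n_levels), n_samples)  # column for beta * j

    eta = cloglog(np.clip(y, 1e-9, 1.0 - 1e-9))  # start from the data
    theta = np.zeros(n_samples + 1)
    for _ in range(max_iter):
        mu = inv_cloglog(eta)
        dmu = np.exp(eta - np.exp(eta))      # d mu / d eta for the cloglog link
        w = dmu**2 / (mu * (1.0 - mu))       # IRLS weights, V(mu) up to phi
        z = eta + (y - mu) / dmu             # working response
        xtw = design.T * w
        new_theta = np.linalg.solve(xtw @ design, xtw @ z)
        eta = design @ new_theta
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break

    mu = inv_cloglog(eta)
    pearson = (y - mu) / np.sqrt(mu * (1.0 - mu))
    phi = np.sum(pearson**2) / (y.size - design.shape[1])  # Pearson estimate of phi
    return theta[:-1], theta[-1], phi
```

Called on a 10 by 13 array of CAREC-style readings, `fit_dilution_glm` returns ten intercepts, one common slope and an estimate of $\phi$; comparing the slope with $-\ln 4$ is the check carried out below.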
S-plus (see Hastie and Pregibon 1992) to obtain
slope
the
/5
and the y— intercept
mean response
a,,
i
=
1,
.
,
.
r
Since most
GLM
Figure
4.
No
scatter plots.
are detected.
to the data,
scatter
significant deviations
The observations
It is
The
parameter
predicted values
<fi,
the
jlij
are
and are given by
l
+
(17)
/?J.
.
diagnostics are visual,
The
the glm routine of
for the dispersion
we begin by analyzing the
predicted vs. observed values and the Pearson residual plots.
the logit link function.
<16)
NRL. We use
10 for each study.
^-)=d
residuals are defined as ^.'C^y
10 for
MLEs
values predicted by the model,
ln(
The Pearson
.
=
dv
and residual plots
from the predicted
fit
for
Our
analysis
CAREC
and
scatter plot of the
is
illustrated using
NRL
(the dotted line) are observed in the
are fairly uniformly spread along the fitted line
worth noting that the traditional linear regression model
and severe heteroscedasticity was
are given in
and no
(1)
outliers
was also
present; hence, the variance function
fit
V{n)
appears to stabilize the residuals, giving an almost uniform spread of the residuals around
zero.
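The quasilikelihood (16), the deviance and the Pearson residuals can be computed in a few lines. The sketch below is a minimal illustration, not the S-plus glm code used in the study. It assumes the variance function $V(\mu) = \mu(1-\mu)$, for which the integral in (16) has a closed form. It estimates $\phi$ by the usual Pearson chi-square divided by the residual degrees of freedom. The intercept, slope and observations in the example are made up.

```python
import math

def xlogy(a, b):
    """a * ln(b), using the convention 0 * ln(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)

def quasi_loglik(mu, x, phi=1.0):
    """One term of (16): integral from x to mu of (x - t) / (phi V(t)) dt,
    assuming V(t) = t(1 - t). The antiderivative of (x - t)/(t(1 - t)) is
    x ln t + (1 - x) ln(1 - t). Requires 0 < mu < 1 and 0 <= x <= 1."""
    def g(t):
        return xlogy(x, t) + xlogy(1.0 - x, 1.0 - t)
    return (g(mu) - g(x)) / phi

def deviance(mus, xs):
    """Unscaled deviance: -2 times the summed quasilikelihood with phi = 1."""
    return -2.0 * sum(quasi_loglik(m, x) for m, x in zip(mus, xs))

def pearson_residuals(mus, xs):
    """(x - mu_hat) / sqrt(V(mu_hat)) with the assumed V(mu) = mu(1 - mu)."""
    return [(x - m) / math.sqrt(m * (1.0 - m)) for m, x in zip(mus, xs)]

def dispersion_estimate(mus, xs, num_params):
    """Moment estimate of phi: Pearson chi-square over residual df."""
    r = pearson_residuals(mus, xs)
    return sum(v * v for v in r) / (len(r) - num_params)

# Hypothetical fit of (17) for one sample: alpha = 3.0, beta = -1.4.
mus = [1.0 / (1.0 + math.exp(-(3.0 - 1.4 * j))) for j in range(6)]
xs = [0.95, 0.83, 0.52, 0.19, 0.07, 0.02]  # made-up normalized OD levels
print(round(deviance(mus, xs), 4))
print([round(r, 3) for r in pearson_residuals(mus, xs)])
```

The closed form avoids numerical quadrature. Each quasilikelihood term is nonpositive and vanishes when the fitted mean equals the observation, so the deviance is always nonnegative.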
Moreover, Figure 5 compares the responses predicted by the three link functions under consideration.

Figure 5: Scatter plots for the response predicted by the logit, cloglog and probit link functions: (a) CAREC; (b) NRL (horizontal axis: logit link).

The predictive power of the model is verified by observing that nearly all the observed points lie within the 95% confidence interval predicted by the model (see Figure 6).

The visual diagnostics can be supplemented by quantitative diagnostics based on the residual deviance, which is twice the log-likelihood ratio. In particular, the goodness-of-fit of the proposed GLM can be assessed quantitatively by comparing it to the full model, which has as many parameters as observations. The residual deviance is asymptotically $\chi^2$, with degrees of freedom given by the difference between the number of observations and the number of model parameters. Table 1 shows the following for each study:

- the 95% confidence intervals for $\beta$;
- the MLE for the dispersion parameter $\phi$;
- the degrees of freedom of the $\chi^2$ statistic;
- the residual deviance.

The quality of the fit is ascertained by the significance level of the $\chi^2$ statistic, which is greater than 0.999 in all cases.

Recall that equation (13) predicts that $\beta = -\ln d$. The 95% confidence intervals do not contain $-\ln 4 = -1.386$ for CAREC or $-\ln 2 = -0.693$ for NRL. We therefore deduce that $-\ln d$ is not an accurate prediction of the slope that best fits the GLM. However, we can show that even the fixed slope GLM provides a reasonable description of the data, by deriving an upper bound on its deviance. Let $\tilde\mu$ be the set of mean response values predicted using suboptimal estimates of the fixed slope GLM parameters. Then the maximized quasilikelihood $Q(\mu^*, x)$ of the fixed slope GLM is bounded below by $Q(\tilde\mu, x)$, the quasilikelihood for the suboptimal estimates. Therefore, $Q_0 - Q(\tilde\mu, x)$ is an upper bound on the deviance of the fixed slope GLM. For the logit model, this upper bound is 46.2412 for CAREC and 13.617 for NRL. The corresponding significance levels are 0.9996 and 1.0, respectively, so we deduce that the fixed slope GLM provides a reasonable description of the data.

The quasilikelihood analysis can also provide useful insights into the mechanism of HIV ELISAs and the reliability of the test outcomes. Consider the coefficient of variation of the normalized OD level $X_{ij}$, which is

$$\frac{\sqrt{\phi\, V(\mu_{ij})}}{\mu_{ij}}. \qquad (18)$$

The values of $\phi$ in Table 1 and Figure 4 suggest that the coefficient of variation (18) is quite small for HIV positive samples that have not been substantially diluted. By contrast, it is much larger (near one) for HIV negative samples or highly diluted positive samples. This observation will be instrumental in the development of the Monte Carlo simulation model in Subsection 6.2.

As a side remark, the value of $k$ in (6) is at least 15. Hence, the Central Limit Theorem employed in the development of the GLM is a reasonable approximation.

Two variants of the GLM were also considered in a failed attempt to obtain a more accurate fit.
Recall that (13) was derived under the rather crude approximation $\ln(1+x) \approx x$. For high dilution levels, the assumption $p_{i0} \gg p_\infty d^j$ (see equation (12)) underlying this approximation will be violated. Employing the second order Taylor approximation $\ln(1+x) \approx x - x^2/2$, we tested the refined model (19). This model adds the resulting second order term to the right side of the expression for $\ln(-\ln(1 - E[X_{ij}]))$ obtained in (13). Residual plots and scatter plots that are not displayed here indicate that, for all three link functions, this refinement has very little effect on the quality of the model fit.
We also tested the alternative variance function $V(\mu) = \mu^2(1-\mu)^2$. For all three link functions, the deviance of the GLM with the variance function $V(\mu) = \mu(1-\mu)$ is significantly smaller than the corresponding deviance with $V(\mu) = \mu^2(1-\mu)^2$. This indicates that the original variance function provides a better description of the data.
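The coefficient of variation (18) can be tabulated directly to see why it is small for undiluted positive samples and near one for negative samples. This is a minimal sketch that assumes the variance function $V(\mu) = \mu(1-\mu)$. It uses a hypothetical dispersion value $\phi = 0.01$ chosen only for illustration; this is not an estimate from Table 1.

```python
import math

def coefficient_of_variation(mu: float, phi: float) -> float:
    """CV of a normalized OD level with mean mu, per (18): sqrt(phi V(mu)) / mu.

    Assumes V(mu) = mu (1 - mu), so the CV equals sqrt(phi (1 - mu) / mu).
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.sqrt(phi * mu * (1.0 - mu)) / mu

phi = 0.01  # hypothetical dispersion value, for illustration only
for mu in (0.9, 0.5, 0.05, 0.008):
    print(f"mu={mu:<6} CV={coefficient_of_variation(mu, phi):.3f}")
```

Because the CV behaves like $\sqrt{\phi/\mu}$ for small $\mu$, even a small dispersion parameter yields a CV near one at the OD levels typical of negative sera.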
4.2. A Simplified Pooling Model

The complexity of the GLM, combined with the complexity of the group testing problem, leads to an analytically intractable model for analyzing group testing policies in which the dilution effect is captured. Consequently, we propose two simplifications of the GLM that will allow for a tractable analysis in Section 5. The simplified model will be validated on the Cahoon-Young et al. data in Subsection 4.3.

The first simplification is rather bold. Our earlier observation was that the dispersion estimates $\phi$ in Table 1 are small. Motivated by this, we propose to ignore the variability in the GLM and employ a deterministic model that provides a one-to-one mapping between normalized OD readings and antibody concentrations. This assumption compromises the accuracy of the GLM for the sake of tractability. However, the discussion about the coefficient of variation below equation (18) suggests that the resulting deterministic model should be reasonably reliable for the dilution of HIV positive samples at practical dilution levels. This model leads to the simplified dilution model

$$\ln\left(\frac{X_{ij}}{1 - X_{ij}}\right) = \ln(k\, p_{i0}) - j \ln d, \qquad (20)$$

where the logit, rather than the cloglog, function is being employed. Recall that this dilution model is appropriate for the CAREC and NRL dilution series, in which a given positive serum is diluted with a varying amount of a fixed HIV negative serum. It is not appropriate for the Cahoon-Young et al. data, which mirror an actual pooled test. We now present a variant of our model that is appropriate for pooled testing.
Let a pool consist of $n$ samples that have individual normalized OD levels $X_1, \ldots, X_n$ and antibody concentrations $p_1, \ldots, p_n$. Let $X$ and $p$ denote the normalized OD level and the antibody concentration, respectively, of the pool. The only difference between the dilution GLM and the pooling GLM is that the concentration of the diluted sample in equation (12) is replaced by the average concentration $p = (p_1 + \cdots + p_n)/n$. Repeating the steps leading from (11) to (13) gives the pooling GLM (21), and ignoring the variability in this GLM leads to

$$\ln\left(\frac{X}{1 - X}\right) = \ln\left(\frac{p_1 + \cdots + p_n}{n}\right) + \ln k, \qquad (22)$$

which is the pooling analog to the simplified dilution model (20).

Our second simplification is to replace the logarithm of the average antibody concentration in (22) by a linear approximation, which yields

$$\ln\left(\frac{X}{1 - X}\right) = \frac{1}{n}\sum_{i=1}^{n} \ln p_i + \ln k. \qquad (23)$$

By (22) applied with $n = 1$, $\ln p_i = \ln\left(\frac{X_i}{1 - X_i}\right) - \ln k$ for $i = 1, \ldots, n$. Substituting this into (23), we obtain the simplified pooling model

$$\ln\left(\frac{X}{1 - X}\right) = \frac{1}{n}\sum_{i=1}^{n} \ln\left(\frac{X_i}{1 - X_i}\right). \qquad (24)$$

The concavity of the logarithmic function implies that $\ln\left(\frac{p_1 + \cdots + p_n}{n}\right) \ge \frac{\ln p_1 + \cdots + \ln p_n}{n}$. Hence, the linearity assumption is conservative: it underestimates the normalized OD reading for a pool and provides an upper bound on the number of false negatives that result as a consequence of pooled testing. Our simplified pooling model (24) has an interesting and tractable characterization: the LOD (logit of OD) reading for a pool is given by the average of the individual LOD readings.
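The gap between the deterministic pooling model (22) and the simplified model (24) is easy to see numerically. In this sketch the individual concentrations are eliminated by writing $p_i = X_i/(k(1-X_i))$, which is the $n = 1$ case of (22). The constant $k$ then cancels and (22) becomes the log of the mean odds. The OD levels in the example are illustrative only.

```python
import math

def logit(x):
    """LOD reading: logit of a normalized OD level x, 0 < x < 1."""
    return math.log(x / (1.0 - x))

def pool_lod_deterministic(od_levels):
    """Model (22) after substituting p_i = X_i / (k (1 - X_i)); k cancels, so
    ln(X / (1 - X)) = ln(mean of the individual odds X_i / (1 - X_i))."""
    odds = [x / (1.0 - x) for x in od_levels]
    return math.log(sum(odds) / len(odds))

def pool_lod_simplified(od_levels):
    """Simplified pooling model (24): the average of the individual LODs."""
    return sum(logit(x) for x in od_levels) / len(od_levels)

# One positive and nine negative samples; OD levels are illustrative only.
pool = [0.69] + [0.0083] * 9
print(round(pool_lod_deterministic(pool), 3), round(pool_lod_simplified(pool), 3))
```

By Jensen's inequality the simplified reading never exceeds the deterministic one, which is the conservatism noted above. For a pool with a single strong positive among negatives, the difference can be several LOD units.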
4.3. Validation of the Simplified Pooling Model

The simplified pooling model (24) will be validated on the Cahoon-Young et al. data in this subsection. The randomness in the individual LOD readings, when embedded into the deterministic pooling model, leads to a random walk model for the pooling data. We fit the random walk model to the pooling data. We then use a nonparametric approach to test whether the increments of the random walk are independent and have zero mean.
Recall that Cahoon-Young et al. individually tested 1280 samples, and then generated nested pools of size 10, 20, 40 and 80. The total sample contained 12 HIV positive individuals, and none of the pools contained more than one positive sample. Hence, 12 of the 16 pools of size 80 contained exactly one positive sample, and we only test the random walk model on the pools that contain one positive sample. In fact, the Cahoon-Young et al. data are incomplete: two of the 12 positive samples were not tested at all pool sizes. Hence, we will restrict ourselves to the ten samples that were tested at all pool sizes.

Let $i = 1, \ldots, 10$ index the ten positive samples. Let $Y_{i1}$ be the LOD reading for positive sample $i$. For $j = 2, \ldots, 80$, let $Y_{ij}$ denote the LOD readings of the negative samples pooled with positive sample $i$. Let $Y_i^s$ denote the LOD reading for the pool of size $10 \times 2^s$ containing positive sample $i$, where $s = 0, 1, 2, 3$; this pool consists of samples $Y_{i1}, \ldots, Y_{i,10 \times 2^s}$. Notice that for fixed $i$, the $Y_{ij}$'s are iid random variables for $j = 2, 3, \ldots, 80$, since they all correspond to LOD readings for negative sera. The assumption of normality is not required in this subsection. Let $\mu = E(Y_{ij})$ for $j > 1$. For $i = 1, \ldots, 10$ and $s = 1, 2, 3$, the simplified pooling model (24) implies that

$$Y_i^s = \frac{1}{10 \times 2^s} \sum_{j=1}^{10 \times 2^s} Y_{ij} \qquad (25)$$

$$= \frac{1}{2} Y_i^{s-1} + \frac{1}{10 \times 2^s} \sum_{j=10 \times 2^{s-1}+1}^{10 \times 2^s} Y_{ij} \qquad (26)$$

$$= \frac{1}{2} Y_i^{s-1} + \frac{\mu}{2} + \epsilon_{is}, \qquad (27)$$

where

$$\epsilon_{is} = \frac{1}{10 \times 2^s} \sum_{j=10 \times 2^{s-1}+1}^{10 \times 2^s} (Y_{ij} - \mu)$$

is the noise term. Since $E(\epsilon_{is}) = 0$, equation (27) describes a three-step random walk $(Y_i^0, Y_i^1, Y_i^2, Y_i^3)$ that starts at $Y_i^0$, ends at $Y_i^3$ and has drift $\mu/2$.
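The recursion (25)-(27) can be checked on simulated data. In this sketch, the negative-sample LOD parameters ($\mu = -4.82$, standard deviation 0.42) and the positive reading 0.8 are illustrative values borrowed from the normal fit quoted in Subsection 5.2. The pool of size $10 \times 2^s$ is assumed to contain the first $10 \times 2^s$ samples. The last lines also anticipate the transformation $V_{is} = 2^s(Y_i^s - \mu)$ used in Proposition 1.

```python
import random

def nested_pool_lods(y):
    """Pooled LOD readings Y^s, s = 0..3, under the simplified pooling model (24).

    y[0] is the positive sample; y[1:] are 79 negative samples. The pool of
    size 10 * 2**s is assumed to contain the first 10 * 2**s samples of y.
    """
    return [sum(y[: 10 * 2**s]) / (10 * 2**s) for s in range(4)]

def noise_terms(y, mu):
    """epsilon_s for s = 1..3: the samples added at step s, centred at mu."""
    return [sum(v - mu for v in y[10 * 2 ** (s - 1): 10 * 2**s]) / (10 * 2**s)
            for s in range(1, 4)]

rng = random.Random(1)
mu, sigma = -4.82, 0.42          # illustrative negative-sample LOD parameters
y = [0.8] + [rng.gauss(mu, sigma) for _ in range(79)]
Y = nested_pool_lods(y)
eps = noise_terms(y, mu)

# Recursion (27): Y^s = Y^{s-1}/2 + mu/2 + eps_s holds exactly for any draw.
for s in range(1, 4):
    assert abs(Y[s] - (Y[s - 1] / 2 + mu / 2 + eps[s - 1])) < 1e-9

# With V_s = 2**s * (Y^s - mu), each increment equals 2**s * eps_s (no drift).
V = [2**s * (Y[s] - mu) for s in range(4)]
print([round(V[s] - V[s - 1], 4) for s in range(1, 4)])
```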
The following proposition shows that the autoregressive process (27) can be transformed into an equivalent driftless random walk by defining $V_{is} = 2^s(Y_i^s - \mu)$.

Proposition 1 For $i = 1, \ldots, 10$, $(V_{i0}, V_{i1}, V_{i2}, V_{i3})$ is a three-step driftless random walk.

The proof can be found in the Appendix. The simplified pooling model (24) can be validated by establishing that the driftless random walk model describes the Cahoon-Young et al. pooling data. This validation is accomplished by verifying that the random increments are independent and have zero mean. Notice that the random walk only models the pooling effect when the pool size changes from 10 to 20, 20 to 40, and 40 to 80. It does not capture the pooling effect as the pool size changes from 1 to 10.

Hypothesis I: Independent Random Increments. A point estimate for $\mu$ is required to pursue a statistical analysis. The following proposition suggests that the estimator should be chosen to minimize the variance.
Proposition 2 Define $V_{is}(x) = 2^s(Y_i^s - x)$; then, for $s = 1, 2, 3$, $\mu = \arg\min_x E\bigl(V_{is}(x) - V_{i0}(x)\bigr)^2$.

Since the true variance is not available, we consider the estimator that minimizes the sample variance. An additional degree of freedom can be introduced by considering the weighted sum of squares

$$\sum_{i=1}^{10} \sum_{s=1}^{3} w_s \bigl(V_{is}(x) - V_{i0}(x)\bigr)^2. \qquad (28)$$

The weights $w_1, w_2, w_3$ can be chosen in such a way that the sample variance is minimized. The weighted least squares estimator $\hat\mu$ minimizing (28) is given by

$$\hat\mu = \frac{\sum_{i=1}^{10} \sum_{s=1}^{3} w_s (2^s - 1)(2^s Y_i^s - Y_i^0)}{10 \sum_{s=1}^{3} w_s (2^s - 1)^2}, \qquad (29)$$

and the following two propositions characterize its statistical properties.

Proposition 3 The estimator $\hat\mu$ is an unbiased estimator of $\mu$ for any choice of weights.

Proposition 4 The most efficient estimator for the pooling data is obtained for weights $w_1 = w_2 = 0$, $w_3 = 1$.

The proofs can be found in the Appendix.

Let us now consider the process $\Delta V_{is} = V_{is} - V_{i,s-1}$ for $s = 1, 2, 3$. The independence assumption of the random walk will be tested by studying the random vector $(\Delta V_{i1}, \Delta V_{i2}, \Delta V_{i3})$. As the number of positive samples in the data tends to infinity, the Strong Law of Large Numbers implies that $\hat\mu \to \mu$ (and hence $V_{is}(\hat\mu) \to V_{is}$) with probability one. Although we have only ten positive samples, we carry out a statistical hypothesis test on $\Delta V_{is}(\hat\mu)$ and extrapolate the conclusions to $\Delta V_{is}$.

We use the runs above or below the median test, a nonparametric procedure described in detail in Chapter 4 of Madansky (1988). A run is a maximal consecutive set of random components having the same value. For example, if the sequence is 1001001110111100110100, then the runs are separated as follows: 1|00|1|00|111|0|1111|00|11|0|1|00, and the run count is 12. A low run count is an indication that observations below or above the median come together, whereas a high count is typical for median reverting behavior.

The test is applied as follows. Let $u_{is}$ take the value 1 if $\Delta V_{is}$ is above the sample median and 0 otherwise. If the increments are independent, then each increment is above or below the median with probability 1/2. Note that under the independence hypothesis, the run count for the random vector $(u_{i1}, u_{i2}, u_{i3})$ is $k$ with probability $p_k$, where $p_1 = p_3 = 1/4$ and $p_2 = 1/2$. Define $T_k$ to be the number of vectors in our data set with run count $k$, where $\sum_{k=1}^{3} T_k = 10$. Then

$$P(T_1, T_2, T_3) = \frac{10!}{T_1!\,T_2!\,T_3!}\, p_1^{T_1} p_2^{T_2} p_3^{T_3} \qquad (30)$$

gives the significance of the observations under the null hypothesis. For our data, we calculated $T_1 = 2$, $T_2 = 5$ and $T_3 = 3$; the significance of these observations is $P(T_1 = 2, T_2 = 5, T_3 = 3) = 0.077$. A p-value for this test cannot be obtained without ordering the state space. Nevertheless, the probability of observing an outcome as extreme as this under the independence assumption is at least 0.077. Hence, we cannot reject the independence assumption at the 5% significance level. In fact, the outcome (2,5,3), along with outcomes (3,5,2) and (2,6,2), is the mode of the distribution of $(T_1, T_2, T_3)$.

Hypothesis II: Zero Mean Random Increments. The 95% confidence intervals for the means of $\Delta V_{i1}$, $\Delta V_{i2}$ and $\Delta V_{i3}$ are given by $0.2065 \pm 0.2529$, $0.08032 \pm 0.4961$ and $-0.3491 \pm 1.0637$, respectively. The zero mean hypothesis cannot be rejected, since zero is a common point of the three intervals. In conclusion, the data support the hypothesis that the random walk (27) provides a realistic description of the pooling data. This establishes the consistency of the data with the simplified pooling model (24).
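The estimator (29) and the run-count test (30) are short enough to sketch directly. This is a minimal illustration, not the authors' code. `wls_mu_hat` takes one list $[Y_i^0, Y_i^1, Y_i^2, Y_i^3]$ per positive sample. The run-count probabilities $p_k$ are obtained by enumerating the eight equally likely above/below-median patterns. The last line reproduces the significance 0.077 reported above for $(T_1, T_2, T_3) = (2, 5, 3)$.

```python
import math
from itertools import groupby, product

def wls_mu_hat(Y, w):
    """Weighted least squares estimator (29) of the negative-sample mean mu.

    Y: one list [Y_i^0, Y_i^1, Y_i^2, Y_i^3] of pooled LOD readings per sample.
    w: weights (w_1, w_2, w_3) from the weighted sum of squares (28).
    """
    num = sum(w[s - 1] * (2**s - 1) * (2**s * Yi[s] - Yi[0])
              for Yi in Y for s in (1, 2, 3))
    den = len(Y) * sum(w[s - 1] * (2**s - 1) ** 2 for s in (1, 2, 3))
    return num / den

def run_count(symbols):
    """Number of maximal runs of identical consecutive symbols."""
    return sum(1 for _ in groupby(symbols))

def multinomial_prob(T, p):
    """Equation (30): probability of run-count frequencies T = (T_1, T_2, T_3)."""
    coef = math.factorial(sum(T))
    for t in T:
        coef //= math.factorial(t)
    return coef * math.prod(pk**tk for pk, tk in zip(p, T))

# Under independence all 8 above/below patterns of length 3 are equally likely.
patterns = list(product((0, 1), repeat=3))
p = tuple(sum(run_count(u) == k for u in patterns) / len(patterns) for k in (1, 2, 3))

print(run_count("1001001110111100110100"))       # 12
print(p, round(multinomial_prob((2, 5, 3), p), 4))  # (0.25, 0.5, 0.25) 0.0769
```

With weights $(0, 0, 1)$, the estimator reduces to the average of the 70 negative readings that enter between the pools of size 10 and 80. This is consistent with Proposition 4.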
5. The Derivation of Group Testing Policies

In this section, we embed the simplified pooling model (24) into an optimization framework to find efficient pooled testing policies. Our objective is to minimize the expected weighted cost due to testing, false positives and false negatives. Suppose a pool of a specified size is tested and its LOD reading is determined. The decision maker has three options:

(1) stop testing and classify all individuals in the pool as HIV negative (and transfuse these samples);
(2) stop testing and classify all individuals in the pool as HIV positive (and discard these samples);
(3) divide the pool into subpools for further testing.

There are many possible ways to subdivide the pool under the third option. We consider a quite general class of multistage policies employed by Arnold, in which each subpool is of identical size. Our procedure can be modified slightly to allow unequal subpool sizes, as in the sequential procedure in Hwang (1984) and the T£(V) procedure in Litvak et al.

For a given initial pool size and subpool configuration, Subsection 5.1 develops a dynamic programming algorithm for finding the optimal policy within the class of multistage policies under consideration. Exhaustive search among alternative initial pool sizes and subpool configurations is required to find the cost minimizing policy. Structural properties of the optimal policy are investigated in Subsection 5.2. The dynamic programming algorithm is computationally intensive and the resulting policy is difficult to implement. Therefore, Subsection 5.3 derives a procedure for obtaining near optimal Dorfman policies.
5.1. The Dynamic Programming Formulation

We assume that the blood donor population is composed of two subpopulations: HIV negative (denoted $P_-$) and HIV positive ($P_+$). The LOD readings of $P_-$ ($P_+$, respectively) are assumed to be iid normal random variables with mean $\mu_-$ ($\mu_+$, respectively) and standard deviation $\sigma_-$ ($\sigma_+$, respectively). This assumption is consistent with the GLM and provides a reasonable fit to the data in Figure 2. The known seroprevalence $\pi$ is the probability that a random donor is HIV positive.

Arnold's notation will be adopted to describe the multistage testing procedure. Consider a random sample of $n_1 = \prod_{j=1}^{N} a_j$ individuals from the donor population; the $a_j$'s dictate the subpool configuration. The individuals are indexed in such a way that the LOD readings of the samples are denoted by $\{Y(i_1, \ldots, i_N) : 1 \le i_1 \le a_1, \ldots, 1 \le i_N \le a_N\}$. Blood sera collected from all $n_1$ individuals are tested according to the following multistage screening procedure (see Figure 7 for a simple example).

1. Start by obtaining the LOD reading $Y_0$ of the pool composed of all $n_1$ individuals. Based on $Y_0$, decide whether all individuals in the pool can be classified as HIV negative or HIV positive; if so, stop testing.
2. If not, subdivide the population into $a_1$ subpopulations of size $n_2 = \prod_{j=2}^{N} a_j$. The first subpopulation consists of the individuals with $i_1 = 1$, the second of those with $i_1 = 2$, and so on. Obtain the LOD readings $Y_1(1), \ldots, Y_1(a_1)$ for all subpopulations.
3. For each subpopulation, decide whether all of its individuals should be classified as HIV negative or positive based on the pair $(Y_0, Y_1(i_1))$; if so, stop testing.
4. If not, subdivide those subpopulations that require further testing into $a_2$ subpopulations of size $n_3 = \prod_{j=3}^{N} a_j$.
5. Continue in this vein until either all pools are deemed HIV negative or positive, or stage $N$, where individual testing is used, is reached.

The testing scheme can be equivalently described by a rooted tree, where the nodes of the tree correspond to the different subgroups formed during the procedure.

Figure 7: A simple example of a multistage group testing procedure.
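According to the simplified pooling model (24), the LOD readings are inductively defined by $Y_N(i_1, \ldots, i_N) = Y(i_1, \ldots, i_N)$ and

$$Y_{j-1}(i_1, \ldots, i_{j-1}) = \frac{1}{a_j} \sum_{i_j=1}^{a_j} Y_j(i_1, \ldots, i_j), \qquad j = 1, \ldots, N. \qquad (31)$$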
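Recursion (31) computes every pooled reading in the testing tree from the individual readings at the leaves. The sketch below stores level $j$ as a dictionary keyed by the index prefix $(i_1, \ldots, i_j)$; this representation is a hypothetical choice made for illustration. The example uses a two-stage configuration $a_1 = a_2 = 2$ in the spirit of Figure 7, with made-up readings.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, ...]

def pooled_readings(leaves: Dict[Index, float], a: List[int]) -> List[Dict[Index, float]]:
    """Recursion (31): each pool's LOD reading is the mean of its a_j subpools.

    leaves maps (i_1, ..., i_N), 1 <= i_j <= a_j, to an individual LOD reading.
    Returns [Y_0, ..., Y_N], where Y_j maps (i_1, ..., i_j) to a pool reading
    (Y_0 is keyed by the empty tuple).
    """
    N = len(a)
    levels: List[Dict[Index, float]] = [{} for _ in range(N + 1)]
    levels[N] = dict(leaves)
    for j in range(N, 0, -1):
        sums: Dict[Index, float] = {}
        for idx, value in levels[j].items():
            sums[idx[:-1]] = sums.get(idx[:-1], 0.0) + value
        levels[j - 1] = {parent: total / a[j - 1] for parent, total in sums.items()}
    return levels

# Two-stage example (a_1 = a_2 = 2); the readings are made up.
leaves = {(1, 1): 0.8, (1, 2): -4.8, (2, 1): -4.9, (2, 2): -4.7}
Y = pooled_readings(leaves, [2, 2])
print(Y[0], Y[1])
```

Because every subpool at a given level has the same size, the root reading $Y_0$ equals the plain average of all individual readings.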
The state of the system at any stage of the screening process can be described by the LOD readings obtained thus far. Suppose the pool currently being screened is composed of all individuals whose first $j$ indices are $i_1, \ldots, i_j$. Then the state is given by $(Y_0, Y_1(i_1), \ldots, Y_j(i_1, \ldots, i_j))$. To simplify notation, we shorten $Y_j(i_1, \ldots, i_j)$ to $Y_j$ and denote the state of the system by $S_j = (Y_0, \ldots, Y_j)$ for $j = 0, \ldots, N$.

If the current state of the system is $S_j$ and $j < N$, then three decisions are possible:

- declare all individuals in the pool as negative and stop testing;
- declare all individuals in the pool as positive and stop testing;
- subdivide the pool into $a_{j+1}$ subpools of size $n_{j+2}$ (where $n_{N+1} = 1$) and continue testing.

Under the first decision, a false negative cost $c_{FN}$ is incurred for each HIV positive individual in the pool. Under the second decision, a false positive cost $c_{FP}$ is incurred for each HIV negative individual in the pool. Under the third decision, a testing cost $c(n_{j+2})$ is incurred for each of the $a_{j+1}$ subpools. At stage $N$ (individual testing), the individuals are classified as HIV positive or negative. The false positive cost $c_{FP}$ or the false negative cost $c_{FN}$ is incurred for any individuals who are misclassified. Let $J_j(S_j)$ denote the optimal cost for stages $j$ through $N$ when in state $S_j$ at stage $j$. Then the dynamic programming algorithm is defined inductively by

$$J_N(S_N) = \min\{c_{FP}\, P(Y_N \in P_- \mid S_N),\ c_{FN}\, P(Y_N \in P_+ \mid S_N)\} \qquad (32)$$

and, for $j = 0, \ldots, N-1$,

$$J_j(S_j) = \min\Bigl\{a_{j+1}\bigl(c(n_{j+2}) + E[J_{j+1}(S_{j+1}) \mid S_j]\bigr),\ c_{FN} \sum_{i_{j+1}=1}^{a_{j+1}} \cdots \sum_{i_N=1}^{a_N} P\bigl(Y_N(i_1, \ldots, i_N) \in P_+ \mid S_j\bigr),\ c_{FP} \sum_{i_{j+1}=1}^{a_{j+1}} \cdots \sum_{i_N=1}^{a_N} P\bigl(Y_N(i_1, \ldots, i_N) \in P_- \mid S_j\bigr)\Bigr\}. \qquad (33)$$

Because the individual LOD readings of each sample in a pool are iid random variables, equation (33) can be simplified to

$$J_j(S_j) = \min\Bigl\{a_{j+1}\bigl(c(n_{j+2}) + E[J_{j+1}(S_{j+1}) \mid S_j]\bigr),\ n_{j+1} c_{FN}\, P(\tilde Y_j \in P_+ \mid S_j),\ n_{j+1} c_{FP}\, P(\tilde Y_j \in P_- \mid S_j)\Bigr\}, \quad j = 0, \ldots, N-1, \qquad (34)$$

where $\tilde Y_j$ is a random variable denoting the individual LOD reading of a generic member of the pool at stage $j$.
The state of the system at stage $j$ is given by the LOD readings obtained through stage $j$. Hence, the dimensionality of the state space grows as the dynamic programming algorithm proceeds, and the algorithm in (32) and (34) cannot be used efficiently for numerical calculations. The following proposition shows that we need only keep track of the most recent LOD reading.

Proposition 5 The state of the system at every stage can be adequately described by the latest LOD reading.

We prove this proposition using Corollary 2 of Arnold, which is stated here for completeness.
Corollary 2 (Arnold, 1977) The conditional distribution of $Y_{j+1}$ given $S_j$ is the same as the conditional distribution of $Y_{j+1}$ given $Y_j$.

The following lemma, whose proof is given in the Appendix, is also needed.

Lemma 1 There exists a version of the conditional probability $P(\tilde Y_j \in P_+ \mid S_j)$ that depends on $S_j$ only through $Y_j$. The same is true for $P(\tilde Y_j \in P_- \mid S_j)$.

Proof of Proposition 5. The proposition can be proved by induction on the dynamic programming algorithm. By (32) and Lemma 1, the proposition is true for $j = N$. Assume that it is true for $j = k$, so that $J_k(S_k) = J_k(Y_k)$. By (34), $J_{k-1}(S_{k-1})$ is a function of three quantities: $P(\tilde Y_{k-1} \in P_+ \mid S_{k-1})$, $P(\tilde Y_{k-1} \in P_- \mid S_{k-1})$ and $E[J_k(Y_k) \mid S_{k-1}]$. By Lemma 1 and Corollary 2, each of these depends on $S_{k-1}$ only through $Y_{k-1}$. Hence, the proposition follows.

Therefore, we can replace the state $S_j$ by the latest LOD reading $Y_j$. The optimal decision rule can then be described by a set of critical regions $\{R_j^-, R_j^+ : j = 0, \ldots, N\}$:

- if $Y_j \in R_j^-$, all samples in the pool are classified as HIV negative and released for transfusion;
- if $Y_j \in R_j^+$, all samples are classified as HIV positive and discarded;
- otherwise, additional tests are carried out.

The critical regions are defined by

$$R_N^- = \left\{Y_N : \frac{f_+(Y_N)}{f_-(Y_N)} < \frac{c_{FP}(1-\pi)}{c_{FN}\,\pi}\right\}, \qquad (35)$$

$$R_N^+ = \left\{Y_N : \frac{f_+(Y_N)}{f_-(Y_N)} \ge \frac{c_{FP}(1-\pi)}{c_{FN}\,\pi}\right\}, \qquad (36)$$

where $f_-$ and $f_+$ denote the normal densities for the HIV negative and HIV positive populations, respectively. For $j = 0, \ldots, N-1$,

$$R_j^- = \Bigl\{Y_j : c_{FN} n_{j+1} P(\tilde Y_j \in P_+ \mid Y_j) \le \min\bigl\{c_{FP} n_{j+1} P(\tilde Y_j \in P_- \mid Y_j),\ a_{j+1}\bigl(c(n_{j+2}) + E[J_{j+1}(Y_{j+1}) \mid Y_j]\bigr)\bigr\}\Bigr\}, \qquad (37)$$

$$R_j^+ = \Bigl\{Y_j : c_{FP} n_{j+1} P(\tilde Y_j \in P_- \mid Y_j) < \min\bigl\{c_{FN} n_{j+1} P(\tilde Y_j \in P_+ \mid Y_j),\ a_{j+1}\bigl(c(n_{j+2}) + E[J_{j+1}(Y_{j+1}) \mid Y_j]\bigr)\bigr\}\Bigr\}. \qquad (38)$$

Notice that the critical region $R_N^-$ maximizes the power of a simple hypothesis test. Therefore, by the Neyman-Pearson lemma, the proposed classification policy at the individual testing stage does two things. It minimizes the cost for the particular choices of $c_{FN}$ and $c_{FP}$. It also minimizes the type II error (false positive) for a fixed level of type I error (false negative).
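The individual-stage rule (35)-(36) only requires comparing the log likelihood ratio with a cost-weighted threshold. The sketch below makes three assumptions. It uses the normal parameters quoted in Subsection 5.2 ($\mu_- = -4.82$, $\sigma_- = 0.42$, $\mu_+ = 0.8$, $\sigma_+ = 1.08$). It uses the hypothetical costs $c_{FN} = 1000$, $c_{FP} = 1$ and seroprevalence $\pi = 0.01$. It locates the switch point on $[\mu_-, \mu_+]$ by bisection, which is valid there because the log ratio is increasing on that interval for these parameters.

```python
import math

# Normal fits for the LOD readings (values quoted in Subsection 5.2; assumed here).
MU_NEG, SD_NEG = -4.82, 0.42
MU_POS, SD_POS = 0.8, 1.08

def log_lr(y):
    """ln(f+(y) / f-(y)) for the two normal densities."""
    return (-0.5 * ((y - MU_POS) / SD_POS) ** 2
            + 0.5 * ((y - MU_NEG) / SD_NEG) ** 2
            + math.log(SD_NEG / SD_POS))

def classify_individual(y, pi, c_fn, c_fp):
    """Rule (35)-(36): negative iff f+/f- < c_FP (1 - pi) / (c_FN pi)."""
    threshold = math.log(c_fp * (1 - pi) / (c_fn * pi))
    return "negative" if log_lr(y) < threshold else "positive"

def cutoff_between_means(pi, c_fn, c_fp, tol=1e-10):
    """Bisection for the reading in [MU_NEG, MU_POS] where the rule switches."""
    threshold = math.log(c_fp * (1 - pi) / (c_fn * pi))
    lo, hi = MU_NEG, MU_POS
    if not (log_lr(lo) < threshold <= log_lr(hi)):
        raise ValueError("threshold not bracketed on [MU_NEG, MU_POS]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_lr(mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = cutoff_between_means(pi=0.01, c_fn=1000.0, c_fp=1.0)  # hypothetical inputs
print(round(x, 3), classify_individual(x - 0.1, 0.01, 1000.0, 1.0),
      classify_individual(x + 0.1, 0.01, 1000.0, 1.0))
```

Working with the log ratio avoids overflow in the tails. Outside $[\mu_-, \mu_+]$ the ratio need not be monotone, so the region $R_N^-$ is not necessarily a half-line.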
5.2. Structural Properties of the Optimal Policy

Intuitively, one might expect that the optimal classification policy could be characterized by a set of constants $\{c_j^-, c_j^+ : 0 \le j \le N\}$ (where $c_N^- = c_N^+$) such that $R_j^- = \{Y_j : Y_j < c_j^-\}$ and $R_j^+ = \{Y_j : Y_j > c_j^+\}$. Such a classification policy for a generalized group testing procedure will be called a cutoff policy. Arnold obtained sufficient conditions ensuring the optimality of a cutoff policy for a simpler group testing problem that possesses only two possible classifications. Here, we extend his results to the model in Subsection 5.1.

The following monotonicity notion was introduced in Arnold. The density $g_j(y_j)$ of the LOD reading $Y_j$ has the Mon($j$) property if, for all nonincreasing functions $h(y)$, the conditional expectation $E(h(Y_j) \mid Y_{j-1} = s)$ is monotone nonincreasing in $s$. The following proposition, which is proved in the Appendix, provides sufficient conditions for the optimality of the cutoff policy.

Proposition 6 A cutoff policy is optimal if the likelihood ratio $f_+(y)/f_-(y)$ is monotone nondecreasing and the density $g_j(y)$ of $Y_j$ has the Mon($j$) property for all $j$.
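The likelihood-ratio condition in Proposition 6 can be checked numerically. This sketch uses the normal parameters reported in this subsection ($\mu_- = -4.82$, $\sigma_- = 0.42$, $\mu_+ = 0.8$, $\sigma_+ = 1.08$). Because $\sigma_+ > \sigma_-$, $\ln(f_+/f_-)$ is a convex quadratic in $y$. It therefore decreases up to its minimizer $y^*$ and increases afterwards, so the ratio cannot be monotone nondecreasing.

```python
import math

# Normal fits for the LOD readings, as quoted in this subsection (assumed inputs).
MU_NEG, SD_NEG = -4.82, 0.42
MU_POS, SD_POS = 0.8, 1.08

def log_likelihood_ratio(y):
    """ln(f+(y) / f-(y)) for the two normal densities."""
    return (-0.5 * ((y - MU_POS) / SD_POS) ** 2
            + 0.5 * ((y - MU_NEG) / SD_NEG) ** 2
            + math.log(SD_NEG / SD_POS))

# Setting the derivative of the quadratic to zero gives its minimizer y_star.
prec_neg, prec_pos = 1 / SD_NEG**2, 1 / SD_POS**2
y_star = (MU_NEG * prec_neg - MU_POS * prec_pos) / (prec_neg - prec_pos)

grid = [y_star - 2.0 + 0.5 * t for t in range(9)]
values = [log_likelihood_ratio(y) for y in grid]
nondecreasing = all(b >= a for a, b in zip(values, values[1:]))
print(round(y_star, 2), nondecreasing)
```

The minimizer lies about one unit below $\mu_-$. The violation therefore occurs only in the lower tail of the negative population, which is consistent with the cutoff policy performing nearly as well as the optimal policy.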
The definition of Mon($j$) cannot be used directly to test whether a density has the required property. Instead, the following proposition can be employed (see the Appendix for a proof).

Proposition 7 The density $g_j(y)$ has the Mon($j$) property if, for all $j$ and all $y$, $P(Y_j \le y \mid Y_{j-1} = s)$ is a nonincreasing function of $s$.

It turns out that neither of the sufficient conditions in Proposition 6 is satisfied by our data. Recall that $f_-$ and $f_+$ are normal densities with means $\mu_- = -4.82$ and $\mu_+ = 0.8$ and standard deviations $\sigma_- = 0.42$ and $\sigma_+ = 1.08$. Because the variation in $P_+$ is larger than the variation in $P_-$, the likelihood ratio $f_+/f_-$ is not monotone nondecreasing. The individual LOD readings $Y_N$ are distributed as a mixture of two normals. By (31), $Y_{N-1}$ is the average of a collection of iid random variables, each of which is a mixture of normals. It can be shown that the distribution of $Y_N \mid Y_{N-1}$ is a more complex mixture of normals that does not satisfy the Mon($j$) property for our parameter values. Nevertheless, in the computational study described in the next section, the optimal cutoff policy performed nearly as well as the overall optimal policy.

5.3. The Dorfman Policy

In the Dorfman procedure, a pool of a specified size is tested. Then either every sample in the pool is deemed HIV negative, or every sample in the pool undergoes individual testing. Due to its simplicity and effectiveness, this procedure is frequently used in mass screening programs. In particular, recent field studies in developing countries (e.g., Behets et al., Emmanuel et al. and Kline et al.) demonstrate that such procedures can be used to reduce the cost of HIV screening. The complexity of more general group testing strategies, such as the one described in Subsection 5.1, renders them more vulnerable to human error. Therefore, the improvement achieved by a more complex testing strategy could be offset by the human errors incurred during implementation.

Using the dynamic program of Subsection 5.1, we can obtain the optimal decision rule for a Dorfman procedure with pool size $n$. This requires setting $N = 1$ and $n_1 = a_1 = n$, and disallowing the option of discarding a pool that contains more than one sample. However, numerically solving the dynamic programming algorithm requires a discretization of the state space of LOD readings, and it can be cumbersome and computationally intensive. Therefore, we propose a relatively simple method for obtaining a near optimal Dorfman policy. The method relies on two simplifying assumptions:

(i) A cutoff policy is employed. That is, all individuals in the pool are transfused if the LOD reading of the pool is below a certain threshold, and each sample in the pool is individually tested otherwise; in the latter case, a second threshold is used to classify individuals as HIV positive or negative.

(ii) The outcome of the pooled test is used to calculate the posterior seroprevalence, but not the posterior conditional densities.

The first assumption is not very restrictive, particularly since cutoff policies are the only policies likely to be adopted in practice. However, the second assumption clearly does not make the most efficient use of the pooled LOD reading.

Consider a Dorfman policy of pool size $n$ applied in a population of seroprevalence $\pi$. Let $Y_i$ be the LOD reading of the $i$th individual and $Y_p$ be the LOD reading of the pool. Suppose that $x$ is the cutoff employed for individual testing and $z$ is the cutoff for the group testing stage. Let $f_-$ and $f_+$ be the probability densities of the LOD readings for $P_-$ and $P_+$, respectively. Then the expected cost of testing one individual is

$$C_i(\pi, x) = c(1) + c_{FP}(1-\pi)\int_x^{+\infty} f_-(y)\,dy + c_{FN}\,\pi \int_{-\infty}^{x} f_+(y)\,dy. \qquad (39)$$

Let $A_{kn}$ be the event that $k$ out of the $n$ individuals are HIV positive. The cost incurred at the group testing stage of the process is

$$C_g(z) = c(n) + c_{FN} \sum_{k=1}^{n} P(A_{kn})\, P(Y_p < z \mid A_{kn})\, k, \qquad (40)$$

where the first term is the testing cost and the second term is the misclassification cost of false negatives. Since $Y_1, \ldots, Y_n$ are iid, it follows that $P(Y_i \in P_+ \mid A_{kn}) = k/n$.
Under our second assumption, the cost incurred at the second stage of the testing procedure is

$$C_{ig}(z, x) = \sum_{k=0}^{n} P(Y_p > z \mid A_{kn})\, P(A_{kn})\, n\, C_i(k/n, x), \qquad (41)$$

where $P(A_{kn}) = \binom{n}{k}\pi^k(1-\pi)^{n-k}$. Hence, the cost per individual is

$$C(n, x, z) = \frac{1}{n}\bigl[C_g(z) + C_{ig}(z, x)\bigr].$$

The proposed Dorfman procedure is given by the solution to $\min_{n, x, z} C(n, x, z)$. This problem can be solved in two stages: obtain the optimal cutoffs $x$ and $z$ for every $n$, and then search among the integers for the optimal group size $n$. Consider a pool composed of $k$ HIV positive and $n-k$ HIV negative individuals. Under our probabilistic assumptions, its LOD reading is normally distributed with mean and variance

$$\mu_{kn} = \frac{k\mu_+ + (n-k)\mu_-}{n}, \qquad \sigma_{kn}^2 = \frac{k\sigma_+^2 + (n-k)\sigma_-^2}{n^2};$$

that is, $Y_p \mid A_{kn} \sim N(\mu_{kn}, \sigma_{kn}^2)$. The optimal cutoff points are obtained by solving the first and second order optimality conditions. This yields a locally optimal solution, which turned out to be globally optimal in our numerical studies.
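The heuristic cost $C(n, x, z)$ of (39)-(41) is straightforward to evaluate, which makes it easy to experiment with. This sketch makes the following assumptions:

- The normal parameters are those quoted in Subsection 5.2.
- Every test costs one unit ($c(m) = 1$). This is hypothetical, since the cost function $c(\cdot)$ is not specified in this excerpt.
- The costs and seroprevalence are illustrative values.
- The paper solves the first- and second-order optimality conditions symbolically. Here a crude grid search over $x, z \in [\mu_-, \mu_+]$ and $n \le 20$ is swapped in instead.

```python
import math

# Normal fits for the LOD readings (values quoted in Subsection 5.2; assumed here).
MU_NEG, SD_NEG, MU_POS, SD_POS = -4.82, 0.42, 0.8, 1.08

def norm_cdf(v, mean, sd):
    """Normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0))))

def individual_cost(prev, x, c_fn, c_fp, c1=1.0):
    """Equation (39): expected cost of testing one individual with cutoff x."""
    false_pos = c_fp * (1 - prev) * (1 - norm_cdf(x, MU_NEG, SD_NEG))
    false_neg = c_fn * prev * norm_cdf(x, MU_POS, SD_POS)
    return c1 + false_pos + false_neg

def dorfman_cost(n, x, z, pi, c_fn, c_fp, cost=lambda m: 1.0):
    """Per-individual cost C(n, x, z) = [C_g(z) + C_ig(z, x)] / n, eqs. (40)-(41).

    cost(m) is a hypothetical price of one test on a pool of size m.
    """
    c_g, c_ig = cost(n), 0.0
    for k in range(n + 1):
        p_akn = math.comb(n, k) * pi**k * (1 - pi) ** (n - k)    # P(A_kn)
        mean = (k * MU_POS + (n - k) * MU_NEG) / n                # mu_kn
        sd = math.sqrt(k * SD_POS**2 + (n - k) * SD_NEG**2) / n   # sigma_kn
        p_release = norm_cdf(z, mean, sd)                         # P(Y_p < z | A_kn)
        c_g += c_fn * p_akn * p_release * k
        c_ig += (1 - p_release) * p_akn * n * individual_cost(k / n, x, c_fn, c_fp, cost(1))
    return (c_g + c_ig) / n

def grid_search(pi, c_fn, c_fp, n_max=20, step=0.25):
    """Crude grid search over x, z in [mu_-, mu_+] and n in 2..n_max."""
    grid = [MU_NEG + step * t for t in range(int((MU_POS - MU_NEG) / step) + 1)]
    best = (math.inf, None, None, None)
    for n in range(2, n_max + 1):
        for x in grid:
            for z in grid:
                value = dorfman_cost(n, x, z, pi, c_fn, c_fp)
                if value < best[0]:
                    best = (value, n, x, z)
    return best

value, n, x, z = grid_search(pi=0.01, c_fn=1000.0, c_fp=1.0)  # hypothetical inputs
print(n, round(x, 2), round(z, 2), round(value, 4))
```

A useful sanity check: with the pool cutoff $z$ set far below $\mu_-$, every pool is retested. The cost then collapses to $c(n)/n + C_i(\pi, x)$, because $C_i$ is linear in the seroprevalence and $E[k/n] = \pi$.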
6. Computational Results

In this section, we assess the relative performance of four testing policies:

- individual testing, with optimal cutoff values derived from equations (35)-(36);
- the heuristic Dorfman policy developed in Subsection 5.3;
- the optimal Dorfman policy derived from the dynamic programming algorithm;
- the optimal generalized group testing policy.

The derivation of the optimal policies is described in Subsection 6.1, and the Monte Carlo simulation model is specified in Subsection 6.2. The policies are tested on the simulation model in Subsection 6.3, where a wide range of scenarios is considered by varying the seroprevalence and the false negative cost. In Subsection 6.4, we apply our model to the data from N'tita et al. (1991).
6.1. Computer Implementation

The heuristic Dorfman policy in Subsection 5.3 was implemented using Maple on a Sun SPARCstation 20. The partial derivatives of $C(n, x, z)$ with respect to $x$ and $z$ were obtained using symbolic differentiation, and the stationary points were identified using the built-in routine solve. Only stationary points lying in the rectangle $[\mu_-, \mu_+] \times [\mu_-, \mu_+]$ were considered, and in all cases these points satisfied both the first and second order optimality conditions. A search over the integers was then employed to obtain the optimal group size $n$. The optimal group size was found to be bounded above by 20 for $\pi > 0.001$ and $c_{FN} > 100$. Hence, the search was restricted to this region, and a procedure similar to interval halving was used.

The implementation of the dynamic programming algorithm is more complex. The continuous state space must be truncated and discretized: our state space consisted of 200 equally spaced points in $[\mu_- - 6\sigma_-, \mu_+ + 6\sigma_+]$, with step size 0.074. Simpson's numerical integration rule with fixed interval size was employed to achieve four digit accuracy.
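The discretization described above can be reproduced in a few lines. With the Subsection 5.2 parameters, 200 points on $[\mu_- - 6\sigma_-, \mu_+ + 6\sigma_+]$ give a step of about 0.0735. A grid of 200 points has an even count, so plain composite Simpson's rule does not apply directly. This sketch therefore uses Simpson's 3/8 rule on the last three intervals, which is one standard fix; the excerpt does not say how the original implementation handled this.

```python
import math

MU_NEG, SD_NEG, MU_POS, SD_POS = -4.82, 0.42, 0.8, 1.08  # values from Subsection 5.2

def state_grid(points=200):
    """Equally spaced grid on [mu_- - 6 sigma_-, mu_+ + 6 sigma_+]."""
    lo, hi = MU_NEG - 6 * SD_NEG, MU_POS + 6 * SD_POS
    step = (hi - lo) / (points - 1)
    return [lo + step * i for i in range(points)], step

def simpson(values, h):
    """Composite Simpson rule on equally spaced samples with spacing h.

    An even number of samples is handled by applying Simpson's 3/8 rule to
    the last three intervals and the 1/3 rule to the rest.
    """
    def one_third(v):
        return h / 3 * (v[0] + v[-1] + 4 * sum(v[1:-1:2]) + 2 * sum(v[2:-1:2]))
    n = len(values)
    if n < 5:
        raise ValueError("need at least 5 samples")
    if n % 2 == 1:
        return one_third(values)
    t = values[-4:]
    return one_third(values[:-3]) + 3 * h / 8 * (t[0] + 3 * t[1] + 3 * t[2] + t[3])

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

grid, step = state_grid()
print(round(step, 4))
print(round(simpson([normal_pdf(y, MU_NEG, SD_NEG) for y in grid], step), 6))
```

Integrating each normal density over the truncated grid returns a value very close to one. This confirms that the truncation at six standard deviations loses negligible probability mass.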
6.2. The Simulation Model

The analytical model in Section 5 assumes that the LOD readings are normally distributed and employs the simplified pooling model (24). However, we believe that this model is not sufficiently realistic to provide a reliable assessment of the policies. Therefore, we resort to a Monte Carlo simulation model to obtain a more realistic assessment.

Building a simulation model for this problem is a nontrivial task. There are two possible sources of uncertainty in a pooled LOD reading: the variability in the individual antibody concentrations, and the randomness in the manner in which antibodies are detected by ELISAs. It is difficult to assess the relative impact of each source. Moreover, the pooling GLM (21) predicts the normalized OD level of a pool as a function of the antibody concentrations of the individuals comprising the pool. It cannot be directly simulated, because the underlying antibody concentrations are unobservable.
Consequently, we test the policies on two simulation models of varying complexity.
The simpler simulation model randomly generates LOD readings for positive and negative individuals from the empirical distributions of Dax that appear in Figure 2(b) and uses the deterministic pooling model (22). Taking $n = 1$ in equation (22) implies that an individual sample's antibody concentration $\mu_i$ is related to its normalized OD level $X_i$ by $\mu_i = kX_i/(1 - X_i)$. Substituting $kX_i/(1 - X_i)$ for $\mu_i$ in (22) expresses the normalized OD level of the pool in terms of the individual readings $X_1, \ldots, X_n$ alone, and hence the value of the parameter $k$ need not be estimated for the simulation model. This simulation model is more realistic than the analytical model in two ways: the pooling model (22) does not employ the linear approximation embedded in model (24), and the LOD readings are drawn from the empirical distributions rather than the normal distributions.
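The following Python sketch illustrates why $k$ drops out. It rests on two assumptions of ours, since equation (22) is not reproduced in this section: that the deterministic pooling model sets the pool's normalized OD level to $\bar\mu/(k + \bar\mu)$, where $\bar\mu$ is the average antibody concentration of the $n$ samples (consistent with the inversion $\mu_i = kX_i/(1 - X_i)$ above), and that an LOD reading is the log odds $\log(X/(1 - X))$ of the normalized OD level. Under these assumptions the pool's odds are simply the average of the individual odds. The function names are hypothetical.

```python
import math


def pooled_od(individual_ods, k):
    """Pool's normalized OD under the assumed form mu_bar / (k + mu_bar)."""
    # Invert X_i = mu_i / (k + mu_i) to recover each antibody concentration.
    concentrations = [k * x / (1.0 - x) for x in individual_ods]
    mu_bar = sum(concentrations) / len(concentrations)
    return mu_bar / (k + mu_bar)


def pooled_lod(individual_lods):
    """Same model on the log-odds (LOD) scale: pool odds = mean of individual odds."""
    odds = [math.exp(y) for y in individual_lods]
    return math.log(sum(odds) / len(odds))


def lod_to_od(y):
    """Inverse of the log-odds transform."""
    return 1.0 / (1.0 + math.exp(-y))
```

For example, pooling one sample whose LOD equals the positive mean 0.80 with nine samples at the negative mean $-4.82$ gives a pooled LOD of about $-1.47$, a drop of roughly $\log 10$, which is the dilution effect seen on the LOD scale.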
Although the simulation model ignores the stochastic component of the GLM arising from the binding mechanism of ELISA, both the variability in the antibody concentration and the uncertainty due to the binding mechanism are embedded in the empirical LOD distributions of Dax; hence, the simulation model indirectly captures the second source of uncertainty.

The more complex simulation model attributes the variability of the LOD readings to the variability of antibody concentrations and to the stochastic component of the GLM, and is derived from the additional assumptions that (a) the stochastic component of the GLM is negligible for HIV positive individuals and (b) the antibody concentration of HIV negative individuals is deterministic. The first assumption is motivated by the observation appearing near the end of Subsection 4.1 that the normalized OD reading for an HIV positive individual with a given antibody concentration has a very small coefficient of variation.
To justify the second assumption, we recall that the normalized OD reading for an HIV negative individual with a given antibody concentration has a coefficient of variation roughly equal to one. The normalized OD readings for HIV negative individuals in Figure 2(a) have mean 0.0083 and standard deviation 0.0085, and hence coefficient of variation 1.016. Therefore, the variability of the normalized OD readings of HIV negative individuals is mostly due to the uncertainty in the binding mechanism that is captured in the stochastic component of the GLM, and consequently the variance of the antibody concentration of HIV negative individuals can be approximated by zero.
Let $\mu_-$ denote the deterministic antibody concentration of the HIV negative individuals. To estimate $\mu_-$ from the data, notice that equations (10) and (11) (with the logit function replacing the cloglog function in (11)) imply that the normalized OD readings for HIV negative individuals are
$$N\left(\frac{\mu_-}{k + \mu_-},\ \frac{\mu_-}{(k + \mu_-)^2}\right).$$
By setting the mean and variance of this normal distribution equal to the respective mean and variance of the empirical distribution in Figure 2(a), we obtain two equations in two unknowns. The solution to these equations is $\mu_- = 0.968$ and $k = 115.26$. The large discrepancy between the latter value and the estimated value of about 20 from Table 1 may be due to the fact that one estimate is obtained from individual testing data and the other is obtained from a dilution study.

To generate the random antibody concentrations $\mu_i$ for HIV positive individuals, we randomly sample normalized OD readings $X_i$ from the empirical distribution in Figure 2(a), and then invert equation (20) to obtain $\mu_i = kX_i/(1 - X_i) = 115.26X_i/(1 - X_i)$. Then we use equations (10) and (21) to calculate the normalized OD reading for a pool of size $n$, where the antibody concentrations of the $n$ samples are generated from the seroprevalence and the two distributions specified earlier.
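The moment matching and the concentration-generation step of the more complex model can be sketched as follows. Solving $m = \mu_-/(k + \mu_-)$ and $v = \mu_-/(k + \mu_-)^2$ gives $k + \mu_- = m/v$ and $\mu_- = m^2/v$. The model also implies a coefficient of variation of $1/\sqrt{\mu_-}$, so the reported value 1.016 corresponds to $\mu_- \approx 0.969$, in line with the estimate 0.968; plugging in the rounded summary statistics 0.0083 and 0.0085 instead gives $\mu_- \approx 0.95$ and $k \approx 114$. In the Python sketch below, the sampler for positive normalized OD readings is a hypothetical stand-in for the empirical distribution of Figure 2(a). The sketch returns only an assumed deterministic part $\bar\mu/(k + \bar\mu)$ of the pooled reading and omits the stochastic component of the GLM in equation (10). All function names are ours.

```python
import random

K = 115.26       # estimate of k quoted in the text
MU_NEG = 0.968   # deterministic antibody concentration of HIV negative individuals


def moment_estimates(mean, var):
    """Solve mean = mu/(k+mu) and var = mu/(k+mu)^2 for (mu, k)."""
    total = mean / var           # equals k + mu
    mu = mean * mean / var
    return mu, total - mu


def pooled_od_from_concentrations(concentrations, k=K):
    """Assumed deterministic part of the pooled normalized OD, mu_bar / (k + mu_bar)."""
    mu_bar = sum(concentrations) / len(concentrations)
    return mu_bar / (k + mu_bar)


def simulate_pool(n, seroprevalence, sample_positive_od, rng, k=K):
    """Negatives sit at MU_NEG; positives get mu = k X / (1 - X) from a sampled reading X."""
    concentrations = []
    for _ in range(n):
        if rng.random() < seroprevalence:
            x = sample_positive_od(rng)  # stand-in for the empirical distribution
            concentrations.append(k * x / (1.0 - x))
        else:
            concentrations.append(MU_NEG)
    return pooled_od_from_concentrations(concentrations, k)
```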
It is not clear to us which of the two simulation models is more realistic; although the more complex model incorporates the stochasticity in the GLM, its use of the normal distribution may lead one to favor the simpler model. It is reassuring to report that the simulation results for the two models are qualitatively nearly identical and quantitatively very similar (expected total costs are within 5% of each other). In the next subsection, we only report the results for the simple simulation model, and then briefly comment on the results for the complex simulation model.

Now we describe the model parameters. The testing cost is estimated from the detailed cost estimates contained in the field study of Behets et al. They estimated the material and labor cost of testing a single sample to be 2.12 dollars, and the cost of testing a pool containing $n \ge 2$ samples to be $2.87 + 0.083n$ dollars. Without loss of generality, we normalize these costs so that the cost of testing a single sample is $c(1) = 1$, and the cost of testing a pool of size $n \ge 2$ is $c(n) = 1.35 + 0.04n$.

The false positive cost $c_{FP}$ and the false negative cost $c_{FN}$ cannot be as easily estimated. To get a rough estimate for $c_{FP}$, we note that under the current Red Cross screening protocol, individuals that are found to be positive during an initial ELISA must undergo two additional ELISAs. If at least one of the additional ELISA tests is positive, then a highly specific test (Western Blot) is used to verify the individual's serological status. Since ELISA's specificity is 0.99, the probability that a noninfected individual with a positive initial ELISA is positive in at least one of the two successive ELISAs is approximately $1 - 0.99^2$ (assuming the successive test results are independent). The Western Blot test requires more materials and labor than an ELISA and is approximately ten times as costly. Hence, the expected false positive cost under the Red Cross protocol is approximately $2 + 10(1 - 0.99^2) = 2.199$. This cost may underestimate the true false positive cost because successive ELISA results are not likely to be independent, and the Western Blot test may not be available in developing countries; in the latter case, a human cost may be incurred, particularly if test results are reported to individuals. Hence, we have chosen the conservative estimate $c_{FP} = 5$. Since a false negative will contaminate the blood supply, false negative costs are much larger than false positive costs and are very difficult to quantify. Therefore, we consider four different values for $c_{FN}$ (100, 1000, 5000 and 10,000), and combine them with seven different values for the seroprevalence $\pi$ (ranging from 0.001 to 0.15) to generate 28 different scenarios that span a broad range of possible settings.
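The arithmetic behind the normalized testing cost and the rough false positive cost can be checked with a few lines of Python. The dollar figures are those attributed to Behets et al. above; the function and constant names are ours, and the independence of repeat ELISAs is the same simplifying assumption the text makes.

```python
SINGLE_TEST_DOLLARS = 2.12
POOL_FIXED_DOLLARS, POOL_PER_SAMPLE_DOLLARS = 2.87, 0.083


def normalized_pool_cost(n):
    """c(n) in units of one individual test: c(1) = 1, c(n) ~ 1.35 + 0.04 n for n >= 2."""
    if n == 1:
        return 1.0
    return (POOL_FIXED_DOLLARS + POOL_PER_SAMPLE_DOLLARS * n) / SINGLE_TEST_DOLLARS


def red_cross_false_positive_cost(specificity=0.99, western_blot_cost=10.0):
    """Two confirmatory ELISAs, plus a Western Blot if either repeat ELISA is positive."""
    prob_repeat_positive = 1.0 - specificity ** 2  # assumes independent repeat results
    return 2.0 + western_blot_cost * prob_repeat_positive
```

The ratios $2.87/2.12 \approx 1.354$ and $0.083/2.12 \approx 0.039$ round to the coefficients 1.35 and 0.04 used in $c(n)$.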
For each scenario of the simple simulation model, we randomly generated sample LOD readings using the seroprevalence $\pi$ and the empirical LOD distributions in Figure 2(b). The simulation terminated at the first time after 10,000 simulated pools when the width of the 95% confidence interval for the expected cost dropped below 0.2. To avoid the possibility of sequential dependencies due to any inherent deficiencies of the Turbo Pascal random number generator, the ran0 routine described in Chapter 7 of Press (1988) was used. All four policies were tested on the same random sequence of LOD readings.
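The stopping rule described above, together with common random numbers across policies, can be sketched in Python as follows. This is an illustration under stated assumptions: `pool_cost_sample` is a hypothetical callback returning the realized cost of one simulated pool, the interval uses the normal approximation $\pm 1.96\,s/\sqrt{m}$, and Python's built-in Mersenne Twister generator is swapped in for the ran0 routine of Press (1988). Reseeding with the same seed for each policy reproduces the same random sequence, which is one way to test all policies on the same LOD readings.

```python
import math
import random


def simulate_until_precise(pool_cost_sample, seed, min_pools=10_000, max_width=0.2,
                           max_pools=10_000_000):
    """Simulate pools until at least min_pools are done and the 95% CI width < max_width."""
    rng = random.Random(seed)  # same seed for every policy -> common random numbers
    count, mean, m2 = 0, 0.0, 0.0
    while count < max_pools:
        cost = pool_cost_sample(rng)
        count += 1
        delta = cost - mean            # Welford's running mean and variance
        mean += delta / count
        m2 += delta * (cost - mean)
        if count >= min_pools:
            width = 2 * 1.96 * math.sqrt(m2 / (count - 1) / count)
            if width < max_width:
                return mean, width, count
    raise RuntimeError("precision target not reached")
```

With a high-variance cost (for example, a rare but large false negative cost), the rule keeps simulating well past the 10,000-pool minimum, which is the situation the width criterion is meant to handle.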
6.3. Simulation Results

We begin by comparing the individual testing policy, the heuristic Dorfman policy and the optimal Dorfman policy; later in this subsection, the generalized group testing policy will be considered. Before assessing the policies' performance, we note that the optimal Dorfman policy turned out to be of the form: continue at the first stage if the LOD reading is either above a cutoff point or below a second, extremely small, cutoff point. This awkward form arises because under the proposed normal distributions with $\sigma_+ > \sigma_-$, the far left tail of the HIV positive LOD reading eventually dominates the far left tail of the HIV negative reading. However, this phenomenon is due solely to our normality assumption, and does not occur in practice. Moreover, such a policy would never be implemented (with good reason), so we disallowed the option of continuing for extremely low LOD readings, and only report the performance of the Dorfman policy that was optimal within the class of cutoff policies defined in Subsection 5.3. The difference in performance between the overall optimal Dorfman policy and the optimal cutoff Dorfman policy was very small in our numerical study, and hereafter we refer to the optimal cutoff Dorfman policy as the optimal Dorfman policy.
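The awkward two-cutoff form can be seen directly from the normal model. With $\sigma_+ > \sigma_-$, the log likelihood ratio $\log f_+(y) - \log f_-(y)$ is a quadratic in $y$ with a positive leading coefficient, so it is eventually positive in both tails, and the far left tail favors the positive population. The Python sketch below computes the two readings at which the two densities are equal, using the parameter values quoted in observation (3) below; the function names are ours.

```python
import math

MU_NEG, SIGMA_NEG = -4.82, 0.42
MU_POS, SIGMA_POS = 0.80, 1.05


def log_likelihood_ratio(y):
    """log f_+(y) - log f_-(y) for the two normal LOD densities."""
    return (math.log(SIGMA_NEG / SIGMA_POS)
            - (y - MU_POS) ** 2 / (2 * SIGMA_POS ** 2)
            + (y - MU_NEG) ** 2 / (2 * SIGMA_NEG ** 2))


def equal_density_points():
    """Roots of a y^2 + b y + c = 0, the readings where f_+(y) = f_-(y)."""
    a = 1 / (2 * SIGMA_NEG ** 2) - 1 / (2 * SIGMA_POS ** 2)  # > 0 because sigma_+ > sigma_-
    b = MU_POS / SIGMA_POS ** 2 - MU_NEG / SIGMA_NEG ** 2
    c = (MU_NEG ** 2 / (2 * SIGMA_NEG ** 2) - MU_POS ** 2 / (2 * SIGMA_POS ** 2)
         + math.log(SIGMA_NEG / SIGMA_POS))
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)
```

With these parameters the two points are roughly $-8.6$ and $-3.1$. Readings below the first are, under the normal model, more likely to come from the HIV positive population, even though such readings are essentially never observed.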
Our main results are reported in Table 2, which describes the policies, and Table 3, which displays their performance in the simulation study. The first column in Table 2 enumerates the 28 scenarios, and the next two columns characterize the scenarios. The fourth column gives the LOD cutoff points for the individual testing policy. The remaining columns give the pool size for each scenario, which was identical for the optimal Dorfman procedure and the heuristic Dorfman procedure, and the cutoff points employed for both stages (the pooled testing stage and the individual testing stage) of both Dorfman procedures. For each scenario, Table 3 gives the 95% confidence interval for the expected total cost of each policy.

The following three observations can be extracted from our numerical study:

(1) The optimal and heuristic Dorfman procedures are quite similar. They both employ identical group sizes for each scenario, and their cutoff points for each stage are relatively close in value in Table 2. Rather surprisingly, as seen in Table 3, the heuristic procedure outperforms the optimal procedure in 23 of the 28 cases, and the expected cost reduction for the heuristic procedure relative to the optimal procedure averages 8.1% over the 28 scenarios. As seen in Table 2, the optimal procedure is slightly more conservative in the choice of cutoff in the pooled testing stage, resulting in policies that are more sensitive but require more testing. For low seroprevalence, the optimal Dorfman procedure seems to overcompensate for the dilution effect, so that the improved sensitivity does not counteract the resulting increase in monetary testing cost. The strong performance of the heuristic Dorfman procedure is noteworthy, since this policy is much easier to derive than the optimal Dorfman procedure.
(2) Group testing is optimal for all 28 scenarios, and significant savings over individual testing are achieved. The expected cost reduction for the optimal Dorfman procedure relative to individual testing ranges from 5.9% to 77.4%, and the average cost reduction over the 28 scenarios in Table 3 is 39.3%. For the heuristic Dorfman policy, the expected cost reduction relative to individual testing ranges from 7.5% to 79.2% and averages 43.4%. The monetary testing cost is also significantly reduced; the average reduction relative to individual testing is 40% for the optimal policy and 46% for the heuristic policy. Moreover, although we do not show the numbers in Table 3, both Dorfman procedures are highly sensitive and specific. The average sensitivity and the average specificity over the 28 scenarios are 99.7% and 99.7% for the individual testing policy, 99.8% and 99.7% for the optimal Dorfman procedure, and 99.8% and 99.7% for the heuristic Dorfman procedure. Furthermore, the sensitivity of the Dorfman procedures never dropped below 99%.
(3) In Table 2, the optimal cutoff values for individual testing are very similar to the optimal cutoff values for the individual testing stage of the two Dorfman procedures, and range from $-3.1$ to $-3.7$ (recall that $\mu_- = -4.82$, $\sigma_- = 0.42$, $\mu_+ = 0.80$ and $\sigma_+ = 1.05$). However, the optimal cutoff values for the pooled testing stage of the two Dorfman procedures are much lower, ranging from $-4.4$ to $-4.8$. Hence, the Dorfman procedure is able to maintain its high test accuracy by a judicious choice of cutoffs at each stage; more specifically, the cutoff level at the pooled testing stage is more conservative to compensate for the dilution effect. In contrast, previous (field and statistical) researchers in pooled testing have assumed that the same cutoff level is used at both stages; in particular, the cutoff level proposed by the test kit manufacturer, which presumably is close to optimal for individual testing, is employed at both stages of the Dorfman procedure. To assess the performance of the traditional Dorfman policy that has been considered in previous studies, we assume that the optimal cutoff level for individual testing (i.e., the fourth column of Table 2) is employed at both stages of the procedure.
Under this assumption, the optimal value of the pool size $n$ was derived using the cost function (42). The optimal pool size was 15 for scenario 1 and two for the other 27 scenarios. The average expected cost reduction relative to individual testing was 12.78%, which is much smaller than the 39% to 43% reduction achieved by the proposed Dorfman procedures.

To illustrate the predictive power of our model with respect to the traditional Dorfman policy, we consider the study carried out by Behets et al. in Kinshasa, Zaire, where the HIV seroprevalence of the 8000 samples was 2.44%. The traditional Dorfman procedure with pools of size ten reduced the monetary cost of screening by 56% relative to individual testing; however, six low reactivity individuals were not detected. We used the Monte Carlo simulation model to calculate the performance of the traditional Dorfman procedure that employed the individual testing cutoff of scenario 15 at both stages. The expected sensitivity of the procedure was 96.4% and the expected monetary testing cost was 0.40 (recall that our testing cost function $c(n)$ is based on the cost model in Behets et al.); hence, our analysis predicts a 60% reduction in monetary testing cost and $(0.025)(0.036)(8000) = 7.2$ false negatives. Therefore, the model accurately captures both the magnitude of the cost savings and the extent of the dilution effect, as manifested by the low reactivity individuals that are not detectable in pools.

Under the heuristic Dorfman policy for scenario 13, the expected monetary testing cost is 0.43 and the sensitivity is 99.81%, and hence the expected number of false negatives is $(0.025)(0.0019)(8000) = 0.38$. Therefore, we predict that the heuristic Dorfman procedure would not have had any trouble detecting the low reactivity individuals.
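The predicted false negative counts above are products of the seroprevalence, the miss rate (one minus sensitivity) and the number of samples. A short Python check, using the rounded seroprevalence 0.025 that appears in the text; the function names are ours:

```python
def expected_false_negatives(seroprevalence, sensitivity, num_samples):
    """Expected number of infected samples that the screening procedure misses."""
    return seroprevalence * (1.0 - sensitivity) * num_samples


def monetary_cost_reduction(pooled_cost, individual_cost=1.0):
    """Fractional saving relative to individual testing (c(1) = 1)."""
    return 1.0 - pooled_cost / individual_cost


traditional = expected_false_negatives(0.025, 0.964, 8000)  # same cutoff at both stages
heuristic = expected_false_negatives(0.025, 0.9981, 8000)   # stage-specific cutoffs
print(round(traditional, 2), round(heuristic, 2), round(monetary_cost_reduction(0.40), 2))
```

The contrast between the two counts (about seven missed units versus well under one) is entirely due to using a more conservative cutoff at the pooled stage.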
Additional scenarios were considered to generate Figure 8, which provides switching curves depicting the optimal group size (as calculated by the heuristic Dorfman procedure) as a function of both the seroprevalence and the false negative cost. As expected, the group size is a decreasing function of both quantities; if the seroprevalence is high, then large groups will contain HIV positive individuals with high probability. Similarly, if the false negative cost is high, then smaller group sizes are required to diminish the impact of dilution. Notice that groups of size two or three are never optimal. This phenomenon is due to the form of our pooled testing cost $c(n) = 1.35 + 0.04n$: the cost of constructing the pool is larger than the limited savings realized when the group size is two or three. Finally, in contrast to the case of perfect binary tests, where group testing is optimal if and only if the proportion of defective items is less than $(3 - \sqrt{5})/2 \approx 0.382$ (see Ungar 1960), group testing is only optimal in Figure 8 for significantly lower seroprevalences. The breakeven (between individual and group testing) seroprevalence in Figure 8 is a nonincreasing function of $c_{FN}$, and for $c_{FN} = 100$ it is less than or equal to 0.18. The gap between the breakeven seroprevalences 0.18 and 0.382 is due to the form of our pooled testing cost (the traditional cost is $c(n) = 1$ for all $n$) and to the presence of statistical errors, which become increasingly important as seroprevalence increases and lead to a more conservative choice of group size.
The results for the more complex simulation model described in Subsection 6.2, although not shown here, are consistent with the results from the simple simulation model. The average sensitivity and specificity over the 28 scenarios are 99.8% and 99.7% for the heuristic Dorfman procedure, and 99.8% and 99.8% for the optimal Dorfman procedure. Relative to the individual testing policy, the average expected cost reduction over the 28 scenarios is 46.1% for the heuristic Dorfman procedure and 42.4% for the optimal Dorfman procedure, which are slightly larger than the corresponding values in the simple simulation model. The only qualitatively different result for the more complex simulation model occurs in scenario 4, where the low seroprevalence and high false negative cost lead to a very conservative cutoff at the pooled testing stage for the optimal Dorfman procedure (see Table 2). Consequently, the optimal procedure performed too many individual tests and fared poorly. When the optimal pooled cutoff is increased from $-4.801$ to $-4.76$, the monetary testing cost drops drastically, while the sensitivity and specificity remain …

[Figure 8: switching curves for the optimal group size (logarithmic axis, 0.0001 to 1)]
… for two scenarios. For scenario 19, where the optimal Dorfman pool size is four, we let the initial pool size $n_1 = 4$ and consider the subpool configuration in Figure 7. For scenario 6, where the optimal Dorfman pool size is 8, we let the initial pool size $n_1 = 8$ and consider the subpool configurations (a) $N = 3$, $a_1 = a_2 = a_3 = 2$ and (b) $N = 2$, $a_1 = 2$, $a_2 = 4$. For scenario 19, the expected total cost of the generalized multistage procedure was $1.285 \pm 0.0098$, which is higher than the cost of either of the proposed Dorfman policies. For scenario 6, subpool configuration (a) outperforms configuration (b), and yields an expected total cost of $0.3558 \pm 0.014$. This corresponds to a 5.6% expected cost reduction relative to the optimal Dorfman procedure, and a 0.6% reduction relative to the heuristic Dorfman policy. In summary, although it is possible to obtain generalized multistage policies that outperform the Dorfman procedures, the additional improvement appears to be offset by the difficulty in deriving and implementing these policies.
6.4. Application

The numerical results in Subsection 6.3 (for example, the switching curves in Figure 8) cannot be universally applied for several reasons. Our numerical results depend upon the LOD distributions for HIV positive and HIV negative individuals, which may differ across the world due to seroconversion rates (a larger seroconversion rate will lead to a fatter left tail of the LOD readings for HIV positive individuals) or the particular strain of virus that is prevalent. Also, the testing cost $c(n)$ may depend upon various economic factors that are distinctive to each country. Nevertheless, we may loosely apply our results in a documented setting to obtain a rough estimate of the benefits that are achievable from group testing.
In Kinshasa, Zaire, of the 3741 units of blood transfused in February 1990, 1045 (27.9%) were not screened for HIV infection (see N'tita et al.). Assuming that this was a consequence of budget constraints, we can propose an alternative strategy that reallocates funds across the transfusion centers so that every blood donor can be tested for antibodies to HIV. Since 72.1% of the units were individually tested, the monetary testing cost of the currently implemented policy is 0.721. Seroprevalence in Zaire is estimated to be about 2.5% (see Behets et al.). Suppose that the individual testing policy under scenario 15 was employed on 72.1% of the transfused units. Since the expected sensitivity and specificity of this policy are 99.8% and 99.6%, respectively, the expected number of infected units that are transfused is $(0.025)(1045) + (0.025)(0.002)(2696) = 26.25$, and the expected number of false positives is $(0.975)(0.004)(2696) = 10.24$. If we use the heuristic Dorfman policy under scenario 15, then the monetary testing cost is 0.54, the expected sensitivity is 99.8% and the expected specificity is 99.6%. Hence, the expected number of infected units that are transfused is $(0.025)(0.002)(3741) = 0.18$, and the expected number of false positives is $(0.975)(0.004)(3741) = 14.6$. In summary, pooled testing in this setting reduces the monetary testing cost by 25% and reduces the expected number of infected units transfused from 26 to essentially zero. It is clear that pooled testing, if used properly, can save hundreds of lives worldwide.
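The expected counts in this example follow from the rounded rates quoted above (miss rate $1 - 0.998 = 0.002$, false positive rate $1 - 0.996 = 0.004$), with $3741 - 1045 = 2696$ screened units under the current policy. The Python sketch below recomputes them; the function name and dictionary layout are ours, and the pooled testing cost 0.54 is taken from the text rather than derived.

```python
def zaire_comparison(units=3741, unscreened=1045, prevalence=0.025,
                     miss_rate=0.002, false_positive_rate=0.004):
    """Expected outcomes for the February 1990 Kinshasa figures quoted in the text."""
    screened = units - unscreened
    current = {
        "infected_transfused": prevalence * unscreened + prevalence * miss_rate * screened,
        "false_positives": (1 - prevalence) * false_positive_rate * screened,
        "testing_cost": screened / units,
    }
    pooled = {
        "infected_transfused": prevalence * miss_rate * units,
        "false_positives": (1 - prevalence) * false_positive_rate * units,
        "testing_cost": 0.54,  # heuristic Dorfman policy, scenario 15 (from the text)
    }
    return current, pooled
```

A few printed figures (for example, 10.24 false positives under the current policy, against about 10.5 from the rounded rates) differ slightly from these products, presumably because unrounded sensitivities and specificities were used.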
7. Concluding Remarks

We have developed and validated a mathematical model that captures the dilution effect that occurs when HIV positive sera are pooled with HIV negative sera. The generalized linear models (13) and (21) provide new insights into the nature of the dilution effect, and avoid the heteroscedasticity problem that has plagued the traditional regression models obtained through a purely empirical approach. These GLMs and the simplified pooling model (24) may be useful for other applications besides group testing, and our general approach may be applicable whenever pooled testing is used to identify a disease or contaminant in a liquid.

Our numerical results suggest that the heuristic Dorfman policy derived in Subsection 5.3 provides a cost-effective, accurate and relatively simple alternative to currently implemented HIV screening protocols. This policy can be used in developing countries to safeguard the integrity of the blood supply, and consequently reduce the spread of the AIDS epidemic. While existing field studies and mathematical analyses assume that the same cutoff point is used to classify the pool at both stages of the Dorfman procedure, our analysis shows that only by selecting a different cutoff point at each stage can we ensure that the sensitivity of the test is not compromised. Finally, HIV testing is also extensively used for seroprevalence estimation; in a companion paper, we show how the pooling model developed and validated here can be employed to derive efficient seroprevalence estimates.
Acknowledgment

We are very grateful to Barbara Cahoon-Young, Elizabeth Dax, Esther de Gourville and Richard George for providing data. We also thank Karla Ballman, Barbara Cahoon-Young, Elizabeth Dax, Richard George, David Heymann, Richard Kline, Eugene Litvak, Sheila Mitchell, Peter Page, Constantia Petrou, Chris Stowell, Hiko Tamashiro, and Guido van der Groen for helpful discussions about various aspects of pooled testing. This research is supported by National Science Foundation grant DDM-9057297 and American Foundation for AIDS Research (AmFAR) grant 02100-15-RG.
Appendix

Proof of Proposition 1. By definition, for $s = 0, 1, 2, \ldots$,
$$V_{i,s+1} = 2^{s+1}\left(\cdots\right) \qquad (44)$$
$$= V_{is} + 2^{s}\left(2\epsilon_{i,s+1} - \cdots\right) \qquad (45)$$
$$= V_{is} + \tilde{\epsilon}_{i,s+1}, \qquad (46)$$
where $\tilde{\epsilon}_{i,s+1}$ is a random variable with zero mean.
Proof of Proposition 2. Since $E(V_{is}(x) - V_{i0}(x))^2$ is a positive semi-definite quadratic form in $x$, the minimum is derived from the first order conditions. We have
$$E(V_{is}(x) - V_{i0}(x))^2 = E\left[(V_{is} - V_{i0}) + (2^s - 1)(\mu - x)\right]^2 \qquad (47)$$
$$= E\left[(V_{is} - V_{i0})^2\right] + (2^s - 1)^2(\mu - x)^2 + 2E\left[(V_{is} - V_{i0})(2^s - 1)(\mu - x)\right], \qquad (48)$$
and therefore
$$\frac{d}{dx}E(V_{is}(x) - V_{i0}(x))^2 = -2(2^s - 1)E\left[V_{is} - V_{i0}\right] + 2(2^s - 1)^2(x - \mu) = 2(2^s - 1)^2(x - \mu), \qquad (49)$$
since $E(V_{is}) = E(V_{i0})$ by Proposition 1. Thus, the minimum is attained at $x = \mu$.

Proof of Proposition 3. The result follows by calculating $E(\hat{\mu})$ in equation (29) and using Proposition 1, i.e., $E(V_{is} - V_{i0}) = 0$.

Proof of Proposition 4. We want to obtain the set of nonnegative weights $w_s$ minimizing the sample variance of the estimator $\hat{\mu}$ in equation (29). Equation (29) can be reexpressed as a weighted combination of the differences $V_{is} - V_{i0}$, $s = 1, 2, 3$, with coefficients $w_s(1 - 2^s)$. Since different samples are independent, the optimal choice of weights is given by the solution to the following minimization problem:
$$\text{minimize} \quad E\left[\sum_{s=1}^{3} w_s(1 - 2^s)(V_{is} - V_{i0})\right]^2 \qquad (53)$$
$$\text{subject to} \quad \sum_{s=1}^{3} w_s(1 - 2^s)^2 = \text{constant}, \qquad w_s \ge 0. \qquad (54)$$
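The minimization problem can be simplified by defining $x_s = w_s(2^s - 1)$ (the sign of the factor $1 - 2^s$ is immaterial because (53) is a square), so that the objective function (53) becomes
$$E\left[\sum_{s=1}^{3} x_s^2 (V_{is} - V_{i0})^2 + 2\sum_{s=1}^{3}\sum_{k=s+1}^{3} x_s x_k (V_{is} - V_{i0})(V_{ik} - V_{i0})\right], \qquad (55)$$
and the constraint (54) becomes $\sum_{s=1}^{3}(2^s - 1)x_s = \text{constant}$. If $k > s$, then
$$E[(V_{is} - V_{i0})(V_{ik} - V_{i0})] = E\left[E[(V_{is} - V_{i0})(V_{ik} - V_{i0}) \mid V_{is}]\right] \qquad (56)$$
$$= E\left[(V_{is} - V_{i0})\,E[V_{ik} - V_{i0} \mid V_{is}]\right] \qquad (57)$$
$$= E[(V_{is} - V_{i0})^2] \qquad (58)$$
$$= \mathrm{Var}(V_{is}). \qquad (59)$$
Equation (56) follows from the law of total probability for conditional expectations, and equations (58) and (59) follow from the martingale property of the driftless random walk. Combining equations (55) and (59) gives the following, simplified version of the minimization problem: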
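$$\text{minimize} \quad \sum_{s=1}^{3} x_s^2\,\mathrm{Var}(V_{is}) + 2\sum_{s=1}^{3}\left(x_s\,\mathrm{Var}(V_{is}) \sum_{k=s+1}^{3} x_k\right) \qquad (60)$$
$$\text{subject to} \quad \sum_{s=1}^{3}(2^s - 1)x_s = c, \qquad x_s \ge 0, \qquad (61)$$
where $c$ is the normalization constant. The optimal solution is obtained by formulating the Lagrangian
$$L(x, \lambda) = \sum_{s=1}^{3} x_s^2\,\mathrm{Var}(V_{is}) + 2\sum_{s=1}^{3}\left(x_s\,\mathrm{Var}(V_{is}) \sum_{k=s+1}^{3} x_k\right) + 2\lambda\left(c - \sum_{s=1}^{3}(2^s - 1)x_s\right). \qquad (62)$$
The Karush-Kuhn-Tucker optimality conditions state that if there exist $\lambda$ and nonnegative $x_s$ satisfying $\partial L/\partial x_s = 0$ and (61), then $x$ is the optimal solution of the minimization problem. This derivative condition can be written as
$$\mathrm{Var}(V_{is}) \sum_{k=s}^{3} x_k + \sum_{k=1}^{s-1} x_k\,\mathrm{Var}(V_{ik}) = \lambda(2^s - 1). \qquad (63)$$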
By the orthogonality property of martingales (Williams 1991, Chapter 12),
$$\mathrm{Var}(V_{is}) = \mathrm{Var}(V_{ik}) + \mathrm{Var}(V_{is} - V_{ik}) \quad \text{for all } s > k. \qquad (64)$$
Combining equations (63) and (64), the condition $\partial L/\partial x_s = 0$ can be reformulated as
$$\mathrm{Var}(V_{is}) \sum_{k=1}^{3} x_k - \sum_{k=1}^{s-1} x_k\,\mathrm{Var}(V_{is} - V_{ik}) = \lambda(2^s - 1). \qquad (65)$$
The optimality conditions are more conveniently formulated using matrix notation. Let $U$ be the $3 \times 3$ matrix defined by $U_{sk} = \mathrm{Var}(V_{is} - V_{ik})\,1(s > k)$, $v$ be the $3 \times 1$ vector given by $v_s = \mathrm{Var}(V_{is})$, $u$ be the $3 \times 1$ vector defined by $u_s = 2^s - 1$, and $e$ be the $3 \times 1$ vector of ones. The optimality conditions can then be expressed as
$$(x^T e)\,v - Ux = \lambda u \qquad (66)$$
$$u^T x = c \qquad (67)$$
$$x \ge 0. \qquad (68)$$
In order to obtain the optimal vector $x$, the quantities $U$ and $v$ should be determined. For the random walk model described in Subsection 4.3, $v_s = \sigma^2(2^s - 1)$. From the first row of equation (66), we obtain
$$\sigma^2 x^T e = \lambda, \qquad (69)$$
since the first row of $U$ is the zero vector and $u_1 = 1$. Substituting (69) into (66) and using $v = \sigma^2 u$, equation (66) becomes
$$Ux = 0. \qquad (70)$$
Because $U_{21}$ and $U_{32}$ are positive, (70) forces $x_1 = x_2 = 0$, and (67) then gives $x_3 = c/7$. Therefore, by combining equations (69) and (70), $x_1 = x_2 = 0$, $x_3 = c/7$ and $\lambda = \sigma^2 x_3$ satisfy the optimality conditions. This completes the proof of Proposition 4. Hence $w_1 = w_2 = 0$, and the most efficient estimator is
$$\hat{\mu} = \frac{1}{7N}\sum_{i=1}^{N}\left(8V_{i3} - V_{i0}\right).$$
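Proposition 4 can be checked numerically. Under the random walk model $\mathrm{Var}(V_{is}) = \sigma^2(2^s - 1)$, and by (56)-(59) the cross terms equal $\mathrm{Var}(V_{i,\min(s,k)})$, so the objective (60) is the quadratic form $x^T M x$ with $M_{sk} = \sigma^2(2^{\min(s,k)} - 1)$. With $\sigma^2 = 1$ the leading principal minors of $M$ are 1, 2 and 8, so $M$ is positive definite and the stationarity condition identifies the global minimum. The Python sketch below uses the illustrative values $\sigma^2 = 1$ and $c = 1$. It verifies that $x = (0, 0, c/7)$ satisfies $Mx = \lambda u$ with $\lambda = \sigma^2 c/7$, and that no feasible point on a grid does better. The names are ours.

```python
import itertools

SIGMA2, C = 1.0, 1.0
U_VEC = [2 ** s - 1 for s in (1, 2, 3)]                               # u_s = 2^s - 1
M = [[SIGMA2 * (2 ** min(s, k) - 1) for k in (1, 2, 3)] for s in (1, 2, 3)]


def objective(x):
    """Variance objective (60) written as the quadratic form x^T M x."""
    return sum(x[s] * M[s][k] * x[k] for s in range(3) for k in range(3))


def feasible_grid(steps=60):
    """Nonnegative x with u^T x = C: x_1, x_2 on a grid, x_3 solved from (61)."""
    for i, j in itertools.product(range(steps + 1), repeat=2):
        x1, x2 = C * i / steps, C * j / (3 * steps)
        remainder = C - U_VEC[0] * x1 - U_VEC[1] * x2
        if remainder >= 0:
            yield (x1, x2, remainder / U_VEC[2])


x_star = (0.0, 0.0, C / 7)   # claimed optimum from Proposition 4
lam = SIGMA2 * C / 7         # lambda = sigma^2 x_3
```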
Proof of Lemma 1: The proof is by induction. Assume that the statement is true for $k - 1$. Using the law of total probability for conditional expectations, we can write
$$E[I_{Y_k \in P} \mid S_{k-1}] = E\big[E[I_{Y_k \in P} \mid S_k] \mid S_{k-1}\big],$$
and by the induction hypothesis, the right side is equal to $E\big[E[I_{Y_k \in P} \mid Y_k] \mid Y_{k-1}\big]$, which depends only on the latest LOD observation $Y_{k-1}$.

Proof of Corollary 2, Proposition 5 and Proposition 6: The proof is by induction. Recall that from equation (34),
$$J_j(Y_j) = \min\Big\{a_{j+1}\big(c(n_{j+2}) + E[J_{j+1}(Y_{j+1}) \mid Y_j]\big),\ n_{j+1}c_{FN}P(Y_j \in P_+ \mid Y_j),\ n_{j+1}c_{FP}P(Y_j \in P_- \mid Y_j)\Big\} \qquad (71)$$
for $j = 0, \ldots, N-1$. Let
$$\tilde{J}_j(Y_j) = J_j(Y_j) - n_{j+1}c_{FN}P(Y_j \in P_+ \mid Y_j). \qquad (72)$$
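Define $F(Y_N) = c_{FP}P(Y_N \in P_- \mid Y_N) - c_{FN}P(Y_N \in P_+ \mid Y_N)$. By the law of total probability and the fact that all individuals in the pool are indistinguishable,
$$a_{j+1}E[J_{j+1}(Y_{j+1}) \mid Y_j] - n_{j+1}c_{FN}P(Y_j \in P_+ \mid Y_j) = a_{j+1}E\big[J_{j+1}(Y_{j+1}) - n_{j+2}c_{FN}P(Y_{j+1} \in P_+ \mid Y_{j+1}) \mid Y_j\big] = a_{j+1}E[\tilde{J}_{j+1}(Y_{j+1}) \mid Y_j] \qquad (73)$$
and
$$n_{j+1}\big(c_{FP}P(Y_j \in P_- \mid Y_j) - c_{FN}P(Y_j \in P_+ \mid Y_j)\big) = n_{j+1}E\big[c_{FP}P(Y_N \in P_- \mid Y_N) - c_{FN}P(Y_N \in P_+ \mid Y_N) \mid Y_j\big] = n_{j+1}E[F(Y_N) \mid Y_j]. \qquad (74)$$
Subtracting $n_{j+1}c_{FN}P(Y_j \in P_+ \mid Y_j)$ from both sides of (71) yields
$$\tilde{J}_j(Y_j) = \min\Big\{a_{j+1}\big(c(n_{j+2}) + E[\tilde{J}_{j+1}(Y_{j+1}) \mid Y_j]\big),\ 0,\ n_{j+1}E[F(Y_N) \mid Y_j]\Big\}. \qquad (75)$$
Straightforward algebraic manipulations show that
$$F(Y_N) = c_{FP} - (c_{FP} + c_{FN})P(Y_N \in P_+ \mid Y_N); \qquad (76)$$
hence, $F(Y_N)$ is monotone nonincreasing by the assumed monotonicity of $P(Y_N \in P_+ \mid Y_N)$. Since $\tilde{J}_N(Y_N) = \min\{0, F(Y_N)\}$, the function $\tilde{J}_N(Y_N)$ is monotone nonincreasing, and the unique root of $F(Y_N)$ gives the optimal cutoff $c_N$ for stage $N$.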
To prove that $\tilde{J}_j(Y_j)$ is monotone nonincreasing for $j < N$, let us assume inductively that $\tilde{J}_{j+1}(Y_{j+1})$ is monotone nonincreasing. Then, by Proposition 7, the nonzero terms on the right side of (75) are also monotone nonincreasing. Therefore, there exists some $c_j^-$ (the first of the roots of the two terms) such that the minimum in (75) is 0 if and only if $Y_j < c_j^-$; thus, $R_j^- = \{Y_j : Y_j < c_j^-\}$. Moreover, $\tilde{J}_j(Y_j)$ is monotone nonincreasing. This completes the first part of the proof. Similar arguments establish that $R_j^+ = \{Y_j : Y_j > c_j^+\}$.

Proof of Proposition 7. It is known that $E(X) = \int_0^\infty P(X > x)\,dx$ and $E(X \mid Z) = \int_0^\infty P(X > x \mid Z)\,dx$ for a nonnegative random variable $X$. For a nondecreasing function $h$ and $z_1 > z_2$,
$$E[h(X_1) \mid Z = z_1] - E[h(X_1) \mid Z = z_2] = \int \big[P(h(X_1) > x \mid Z = z_1) - P(h(X_1) > x \mid Z = z_2)\big]\,dx \qquad (77)$$
$$= \int \big\{P(X_1 < h^{-1}(x) \mid Z = z_2) - P(X_1 < h^{-1}(x) \mid Z = z_1)\big\}\,dx \ \ge\ 0. \qquad (78)$$
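The stage-$N$ cutoff in the preceding proofs is the root of $F(Y_N) = c_{FP} - (c_{FP} + c_{FN})P(Y_N \in P_+ \mid Y_N)$, that is, the LOD reading at which the posterior probability of infection equals $c_{FP}/(c_{FP} + c_{FN})$. The Python sketch below finds that root for a single sample. It assumes the normal LOD model of Section 5 with the parameters quoted in Subsection 6.3 and Bayes' rule with prior $\pi$, which is our simplification. Because the normal model's posterior is not monotone in the far left tail, the bisection is bracketed between $\mu_-$ and $\mu_+$. The scenario values used below are illustrative, and the function names are ours.

```python
import math

MU_NEG, SIGMA_NEG = -4.82, 0.42
MU_POS, SIGMA_POS = 0.80, 1.05


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def posterior_positive(y, prevalence):
    """P(infected | LOD reading y) under the two-normal model and Bayes' rule."""
    pos = prevalence * normal_pdf(y, MU_POS, SIGMA_POS)
    neg = (1 - prevalence) * normal_pdf(y, MU_NEG, SIGMA_NEG)
    return pos / (pos + neg)


def F(y, prevalence, c_fp, c_fn):
    return c_fp - (c_fp + c_fn) * posterior_positive(y, prevalence)


def individual_cutoff(prevalence, c_fp, c_fn, lo=MU_NEG, hi=MU_POS, tol=1e-10):
    """Bisection for the root of F on [lo, hi], where F changes sign from + to -."""
    if not (F(lo, prevalence, c_fp, c_fn) > 0 > F(hi, prevalence, c_fp, c_fn)):
        raise ValueError("F does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, prevalence, c_fp, c_fn) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $\pi = 0.025$, $c_{FP} = 5$ and $c_{FN} = 1000$ the sketch gives a cutoff of about $-3.27$, inside the $-3.1$ to $-3.7$ range of individual testing cutoffs reported in observation (3). This is a consistency check, not a reproduction of Table 2. Raising $c_{FN}$ lowers the cutoff, as expected.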
References

Arnold, S. F. 1977. Generalized Group Testing. Annals of Statistics 5, 1170-1182.
Behets, F., S. Bertozzi, M. Kasali, M. Kashamuka, L. Atikala, C. Brown, R. W. Ryder and T. C. Quinn. 1990. Successful Use of Pooled Sera to Determine HIV-1 Seroprevalence in Zaire with Development of Cost-Efficiency Models. AIDS 4, 737-741.
Burns, K. C. and C. A. Mauro. 1987. Group Testing with Test Error as a Function of Concentration. Commun. Statist.-Theory Meth. 16, 2821-2837.
Cahoon-Young, B., A. Chandler, T. Livermore, J. Gaudino and R. Benjamin. 1989. Sensitivity and Specificity of Pooled Versus Individual Sera in a Human Immunodeficiency Virus Antibody Prevalence Study. J. Clinical Microbiology 27, 1893-1895.
Cahoon-Young, B., A. Chandler, T. Livermore, J. Gaudino and R. Benjamin. 1992. Optimal Pool Size for Determination of HIV Prevalence in Low Risk Populations. Presented at the HIV/AIDS Surveillance Workshop, South San Francisco, CA.
Cox, D. R. and D. V. Hinkley. 1974. Theoretical Statistics. Chapman and Hall, London.
Dax, E. M. Director, National HIV Reference Laboratory, Melbourne, Australia. 1993. Private Correspondence.
de Gourville, E. Research Associate, CAREC, Trinidad W.I. 1992. Private Correspondence.
Dorfman, R. 1943. The Detection of Defective Members of Large Populations. Ann. Math. Stat. 14, 436-441.
Emmanuel, J. C., M. T. Bassett, H. J. Smith and J. A. Jacobs. 1988. Pooling of Sera for Human Immunodeficiency Virus (HIV) Testing: An Economical Method for Use in Developing Countries. J. Clinical Pathology 41, 582-585.
Fisher, R. A. 1922. On the Mathematical Foundations of Theoretical Statistics. Phil. Trans. R. Soc. 222, 309-368.
George, R. Chief, Dev. Technology Section, Division of HIV/AIDS, Center for Disease Control, Atlanta, GA. 1992. Private Correspondence.
George, J. R. and G. Schochetman. 1985. Serological Tests for the Detection of HIV Infection. In AIDS Testing, Methodology and Management Issues, G. Schochetman and J. R. George (ed.), Springer-Verlag, New York, 49-69.
Hastie, T. J. and D. Pregibon. 1992. Generalized Linear Models. In Statistical Models in S, J. M. Chambers and T. J. Hastie (ed.), Wadsworth & Brooks/Cole Computer Science Series, California, 195-247.
Hull, B. 1991. Serum Pooling for HIV Screening in Trinidad and Tobago. Caribbean Epidemiology Center Technical Report.
Hwang, F. K. 1976. Group Testing with a Dilution Effect. Biometrika 63, 671-673.
Hwang, F. K. 1984. Robust Group Testing. J. Quality Technology 16, 189-195.
Johnson, N. L., S. Kotz and X. Wu. 1991. Inspection Errors for Attributes in Quality Control. Chapman and Hall, London.
Kline, R. L., T. A. Brothers, R. Brookmeyer, S. Zeger and T. C. Quinn. 1989. Evaluation of Human Immunodeficiency Virus Seroprevalence in Population Surveys Using Pooled Sera. J. Clinical Microbiology 27, 1449-1452.
Ledro-Monroy, G. and E. Archbold. 1990. HIV Serum Pooling Study. Cruz Roja Ecuatoriana.
Litvak, E., X. M. Tu and M. Pagano. 1992. Screening for the Presence of HIV by Pooling Sera Samples: Simplified Procedures. Working Paper, Harvard School of Public Health.
Madansky, A. 1988. Prescriptions for Working Statisticians. Springer-Verlag, New York.
McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models, 2nd Edition. Chapman and Hall, London.
N'tita, I., K. Mulunga, C. Dulat, D. Lusamba, T. Rehle, R. Korte and H. Jagger. 1991. Risk of Transfusion-Associated HIV Transmission in Kinshasa, Zaire. AIDS 5, 437-439.
Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling. 1988. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge.
Tamashiro, H., W. Maskill, J. Emmanuel, A. Fauquex, P. Sato and D. Heymann. 1993. Reducing the Cost of HIV Antibody Testing. Lancet 342, 87-90.
Thompson, K. H. 1962. Estimation of the Proportion of Vectors in a Natural Population of Insects. Biometrics 18, 568-578.
Tijssen, P. 1985. Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 15. Elsevier, Amsterdam.
Ungar, P. 1960. The Cutoff Point for Group Testing. Communications on Pure and Applied Mathematics 13, 49-54.
Williams, D. 1991. Probability with Martingales. Cambridge University Press, Cambridge, England.