The Island Problem Revisited

Halvor Mehlum1
Department of Economics, University of Oslo, Norway
E-mail: halvor.mehlum@econ.uio.no
March 10, 2009
1. While carrying out this research, the author has been associated with the ESOP centre at the Department of Economics, University of Oslo. ESOP is supported by The Research Council of Norway.
The author is grateful to the referees and the editors for stimulating comments and suggestions. The author also wishes to thank David Balding, Philip Dawid, Jo Thori Lind, Haakon Riekeles, Tore Schweder, and participants at the Joseph Bell Workshop in Edinburgh.
Abstract
I revisit the so-called Island Problem of forensic statistics. The problem is how
to properly update the probability of guilt when a suspect is found that has the
same characteristics as a culprit. In particular, how should the search protocol be
accounted for? I present the established results of the literature and extend them
by considering the selection effect resulting from a protocol where only cases where
there is a suspect reach the court. I find that the updated probability of guilt is
shifted when properly accounting for the selection effect. Which way the shift goes
depends on the exact distribution of all potential characteristics in the population.
The shift is only marginal in numerical examples that have any resemblance to real-world forensic cases.
The Island Problem illustrates the general point that the exact protocol by
which data are generated is an essential part of the information set that should be
used when analyzing non-experimental data.
KEY WORDS: Bayesian method, Forensic statistics, Collins case.
1 INTRODUCTION
In 1968, a couple in California with certain characteristics committed a robbery.
The Collins couple, who had the same characteristics as the robbers, were arrested
and put on trial. In court, the prosecutor claimed that these particular characteristics would appear in a randomly chosen couple with the probability 1/12,000,000
and that the couple thus had to be guilty. The suspected couple was convicted,
but The California Supreme Court later overturned the conviction (The Supreme
Court of California 1969).
The Collins case, or a stylized version of it called the "Island Problem" (Eggleston 1983), has a central place in the forensic statistics literature. This problem can
be formulated as a “balls in an urn” problem: An urn contains a known number
of balls with different colors. Then a ball is drawn. Its color, which happens to
be red, is observed. The ball then goes back to the urn. Now, a search for a red
ball is carried out and a red ball is indeed found. What is the probability that
the first and the second red balls are in fact the same ball? That is, what is the
probability that the suspected ball really is the guilty ball? In the literature, one
central theme is how to properly extract information from the circumstances of the
case and the search protocol used when finding the suspect. Central contributions
start with Fairley and Mosteller (1974), Yellin (1979), and Eggleston (1983) and
continue with Lindley (1987), Dawid (1994), Balding and Donnelly (1995), and
Dawid and Mortera (1996). These authors show how changes in the assumption
regarding the search procedure change the results.
In addition to being a central starting point for thinking about forensic statistics, the Island Problem is a good example of how the immediate intuition about
probabilities may be wrong. The variations of the Island Problem provide several
illustrations of how to appropriately take account of the often subtle information
in non-experimental situations. For all students of statistics the Island Problem
can serve as a stimulating and challenging puzzle. In particular it can be used in
lectures relating to statistics and society, where the students are invited to contemplate the statistician's role as an expert scientist trying to extract information
from non-experimental situations.
In this note I introduce a possible issue relating to selection in the Island
Problem. My question is as follows: given that the search procedure does not
always produce a suspect, and assuming that for each case with a successful search
there are several cases with unsuccessful searches, what is then the appropriate
analysis of the problem? In contrast to the other authors, I include in the analysis
the fact that only a selected subset of criminal cases reaches the court. As in the
other contributions, I consider a highly stylized and abstract version of the case.
I will explain this argument using the helpful ‘balls in an urn’ analogy. I will first
present two central solutions from the literature. I will then show how the analysis
should be modified when taking into account the fact that we are faced with a
selected case.
2 URN MODELS
Consider an urn containing N balls. The N balls in the urn may be a sample from
an underlying population with M different colors. Now, a robbery takes place:
one ball is drawn at random from the urn. All balls are equally likely to be drawn.
The color of the ball is observed as being red, and the ball is put back in. Let this
part of the evidence be denoted F1 = ‘first ball drawn at random is red’.
The question now is: How should F1 affect our belief regarding the number
of red balls in the urn? When the ball we draw at random is red, we will adjust
our belief about the likely number of red balls in the urn. The only knowledge we
have before picking a red ball is that the number of red balls, X, is distributed
bin (N, p) with N and p known. Given the evidence F1 the distribution of X may
be updated using Bayes’ formula:
P(X = n \mid F_1) = P(X = n)\,\frac{P(F_1 \mid X = n)}{P(F_1)} = \binom{N}{n} p^n (1-p)^{N-n}\,\frac{n/N}{p}    (1)
By re-arranging,
P(X = n \mid F_1) = \binom{N-1}{n-1} p^{n-1} (1-p)^{(N-1)-(n-1)}    (2)
which is the unconditional probability of there being n − 1 red balls among the
N − 1 not observed balls.
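As a numerical check, the step from (1) to (2) can be verified directly. The following sketch uses illustrative values N = 5 and p = 1/6 (an assumption here, chosen to match the Tiny island example used later):

```python
from math import comb

# Illustrative values (assumed): N = 5 balls, P(red) = 1/6.
N, p = 5, 1/6

def prior(n):
    """Prior P(X = n): X ~ bin(N, p)."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

def posterior(n):
    """Bayes update (1): P(X = n | F1) = P(X = n) * (n/N) / p."""
    return prior(n) * (n / N) / p

def formula2(n):
    """Formula (2): n - 1 red balls among the N - 1 unobserved balls."""
    return comb(N - 1, n - 1) * p**(n - 1) * (1 - p)**((N - 1) - (n - 1))

# The two expressions agree for every n, and the posterior sums to one.
assert all(abs(posterior(n) - formula2(n)) < 1e-12 for n in range(1, N + 1))
assert abs(sum(posterior(n) for n in range(1, N + 1)) - 1.0) < 1e-12
```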
Now, a search for a red ball is conducted. Building on Balding and Donnelly
(1995) and Dawid and Mortera (1996), I will first discuss two possible search
procedures: search until success and random search.
2.1 Search until Success
Yellin (1979) was the first to consider a procedure consisting of a search through
the urn until a red ball, the suspect, is found. If no record is kept of the balls
screened, no additional evidence is gained in the process of finding the second red
ball. This search protocol always produces a suspect and the question of guilt G is
the question of whether the second ball is identical to the first ball. The larger the
number of red balls, the lower is the probability of guilt. For a given number of
red balls, X = n, the probability of guilt is 1/n. As X is unknown, the probability
of guilt is
P(G \mid F_1) = E\left(X^{-1} \mid F_1\right)
Using formula (2), the probability of guilt is
P(G \mid F_1) = \sum_{n=1}^{N} \frac{1}{n} \binom{N-1}{n-1} p^{n-1} (1-p)^{N-n} = \frac{1-(1-p)^N}{Np} = \frac{1}{E(X \mid X \ge 1)}
Hence, the probability of guilt P (G|F1 ) has a simple solution which happens to
be identical to the reciprocal of E (X|X ≥ 1).
This solution is different from the solution following from the California Supreme Court's erroneous argument. The California Supreme Court's interpretation of the evidence can be formulated as F0 = 'there is at least one red ball'. In that case the probability of guilt is P(G|F0) = E(X^{-1} | X ≥ 1). Given this particular relationship between the expressions for P(G|F1) and P(G|F0), it follows from Jensen's inequality that P(G|F1) < P(G|F0). Hence, compared to Yellin (1979), the California Supreme Court overstated the probability of guilt.
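Both quantities are easy to compute. A sketch with illustrative values N = 5 and p = 1/6 (an assumption, matching the Tiny island example below) confirms the closed form and the direction of the Jensen inequality:

```python
from math import comb

# Illustrative values (assumed): N = 5, P(red) = 1/6.
N, p = 5, 1/6

# P(G|F1) = sum over n of (1/n) * P(X = n | F1), using formula (2)
guilt = sum((1/n) * comb(N - 1, n - 1) * p**(n - 1) * (1 - p)**(N - n)
            for n in range(1, N + 1))
closed_form = (1 - (1 - p)**N) / (N * p)          # = 1 / E(X | X >= 1)

# The court's reading of the evidence, F0: P(G|F0) = E(1/X | X >= 1)
court = sum((1/n) * comb(N, n) * p**n * (1 - p)**(N - n)
            for n in range(1, N + 1)) / (1 - (1 - p)**N)

assert abs(guilt - closed_form) < 1e-12
assert guilt < court        # Jensen: the court overstated the probability
```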
2.2 Random Search
By design, the above search always produces a suspect and provides no additional
information about the distribution of X. Another alternative, as explored by
Dawid (1994), is the random search where only one ball is picked at random.
If this ball is not red the question of guilt is straight away answered negatively.
If, however, the randomly selected ball is indeed red, the additional evidence is
F2 = ‘second ball drawn at random is red’, and the distribution of X should be
updated accordingly. Using Bayes’s formula we get:
P(X = n \mid F_1 \cap F_2) = P(X = n \mid F_1)\,\frac{P(F_2 \mid X = n \cap F_1)}{P(F_2 \mid F_1)}    (3)
When conditioning on the number of red balls X = n, the events F1 = ‘first
ball drawn at random is red’ and F2 = ‘second ball drawn at random is red’ are
independent (the draws are done with replacement); it therefore follows that the
numerator can be written as
P(F_2 \mid X = n \cap F_1) = P(F_2 \mid X = n) = \frac{n}{N}
The denominator in (3) can be written as
P(F_2 \mid F_1) = \sum_{n=1}^{N} \frac{n}{N}\,P(X = n \mid F_1) = \frac{1}{N} \sum_{n=1}^{N} n\,P(X = n \mid F_1) = \frac{E(X \mid F_1)}{N},
where it follows from (2) that E(X|F1) = 1 + (N - 1)p. Therefore (3) can be written as
P(X = n \mid F_1 \cap F_2) = \frac{n}{E(X \mid F_1)}\,P(X = n \mid F_1)    (4)
The probability of guilt is now
P(G \mid F_1 \cap F_2) = \sum_{n=1}^{N} \frac{1}{n}\,\frac{n}{E(X \mid F_1)}\,P(X = n \mid F_1) = \frac{1}{E(X \mid F_1)}    (5)
As X −1 is a convex transformation for X ≥ 1, it follows from Jensen’s inequality
that
\frac{1}{E(X \mid F_1)} < E\left(X^{-1} \mid F_1\right)
From above we know that P(G|F1) = E(X^{-1} | F1), hence

P(G \mid F_1 \cap F_2) < P(G \mid F_1).
The probability of guilt of a suspect is thus lower in the random search than in
search until success. The intuition is simple: when two balls drawn at random
happen to be red it increases the likelihood of a large number of red balls more
than if only one ball drawn at random is red. This updating assumes that we are
faced with one experiment, where two balls drawn with replacement happen to be
red. Thus only a fraction of first draws (robberies) will lead to a case. With the
random search protocol there will therefore be a subtle selection effect of cases. It
is the analysis of this selection effect that is my own contribution.
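The random-search result (5) can be illustrated by simulation. The following Monte Carlo sketch (with assumed illustrative values N = 5, p = 1/6, the Tiny island parameters introduced below) conditions on the single experiment in which both draws happen to be red:

```python
import random

# Monte Carlo sketch (assumed values): N = 5 balls, P(red) = 1/6.
random.seed(0)
N, p, trials = 5, 1/6, 200_000
cases = guilty = 0
for _ in range(trials):
    urn = [random.random() < p for _ in range(N)]   # True = red ball
    culprit = random.randrange(N)                   # first draw (with replacement)
    suspect = random.randrange(N)                   # second, randomly searched ball
    if urn[culprit] and urn[suspect]:               # condition on F1 and F2
        cases += 1
        guilty += (culprit == suspect)
print(guilty / cases)   # close to 1 / E(X|F1) = 1 / (1 + (N-1)p) = 0.6
```

Note that only a small fraction of the simulated draws produce a case at all, which is exactly the selection issue taken up next.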
2.3 Random Search with Selection Effect
I will in the following show how the analysis changes when taking account of
the selection effect. As before, consider an urn containing N balls drawn from an
underlying population of M colors. The urn is characterized by the joint frequency
of balls by color X_i (i = 1..M). Now, one of a never-ending series of draws takes place: one ball is drawn and put back in. Then a second ball is drawn. If the two balls are not of the same color, the case is closed and the next potential case occurs.
Assume that after an unknown number of draws (i.e. potential cases), drawn
from the same urn, there is a case where the first and the second ball are of the
same color, red. When this case arrives the question is, as before: Are the balls
identical? Let the evidence be denoted F3 = ‘The first time that both balls are
of the same color they are red'. Let X_i, i ∈ [1, M], denote the number of balls of color i, where i = 1 is the color red and where \sum_{i=1}^{M} X_i = N. Bayes's formula then gives

P(X_1 = n \mid F_3) = P(X_1 = n)\,\frac{P(F_3 \mid X_1 = n)}{P(F_3)}    (6)
The denominator P(F_3) is the unconditional probability that the first instance of two consecutive draws of balls of the same color involves two red balls. In an urn
where X1 ...XM denotes the number of balls of each of the M colors (and where
X1 is the number of red balls) the probability of drawing two balls of color i is
(Xi /N )2 . Given X1 ...XM the probability of the first instance of two consecutive
draws of balls of the same color involving two red balls is thus
(X_1/N)^2 \Big/ \sum_{i=1}^{M} (X_i/N)^2 = X_1^2 \Big/ \sum_{i=1}^{M} X_i^2    (7)
The expectation of (7) over all combinations of X_1 ... X_M determines the denominator in (6), P(F_3). Hence

P(F_3) = E\left( X_1^2 \Big/ \sum_{i=1}^{M} X_i^2 \right)
The numerator in (6), P(F_3 | X_1 = n), follows when taking the expectation of (7) conditional on the number of red balls X_1 = n:

P(F_3 \mid X_1 = n) = E\left( X_1^2 \Big/ \sum_{i=1}^{M} X_i^2 \;\Big|\; X_1 = n \right) = E\left( n^2 \Big/ \sum_{i=1}^{M} X_i^2 \;\Big|\; X_1 = n \right)
It therefore follows that (6) can be written as
P(X_1 = n \mid F_3) = P(X_1 = n)\,\frac{E\left( n^2 / \sum_{i=1}^{M} X_i^2 \,\middle|\, X_1 = n \right)}{E\left( X_1^2 / \sum_{i=1}^{M} X_i^2 \right)},    (8)
This expression can generally not be simplified further. In order to get an idea of how updating based on F3 compares to updating based on F1 ∩ F2, I will first look at some numerical illustrations and then at some approximations for large N.
2.4 Numerical Illustrations
Assume that the underlying distribution amounts to a multinomial distribution
where pi is the probability of color i. I will consider two problems:
1. The Tiny island, where N = 5 and P (red ball) = p1 = 1/6
2. The Large island, where N = 100 and P (red ball) = p1 = 0.004
The parameters of the Large island correspond exactly to Eggleston's (1983) Island Problem.
In both problems the marginal distribution of red balls is fixed by bin(N, p = p1). For each problem I look at three different sets of assumptions regarding the probabilities of colors other than red.
i) Red and green, where M = 2 and p2 = 1 − p1
ii) Red and palette, where M is enormous and p2 = · · · = pM = (1 − p1)/(M − 1) ≈ 0
iii) All equal, where p1 = p2 = · · · = pM = 1/M (M = 6 in the Tiny island and
M = 250 in the Large island)
The calculations of the results for i) and ii) are straightforward. The calculation
for iii) is more demanding. By construction the combination Tiny island and iii)
is identical to a throw of five dice. In gambling it is known under the name "poker dice", a variant of Yahtzee. Poker dice is analyzed in the book on gambling by Epstein (1967, p. 154) and the essential probabilities can be taken from there. For
the combination Large island and iii) some CPU time is needed and the complete
calculations involve integrating over all 190 million partitions of the number 100. I
approximate by integrating over the 600,000 partitions with the highest probability.
These account for 1 − 10−6 of the probability mass.
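Configurations i) and ii) can in fact be computed directly from (8), since in both cases \sum X_i^2 reduces to a function of X_1 alone: with two colors \sum X_i^2 = n^2 + (N - n)^2, while in the palette case each non-red ball is almost surely of a unique color, so \sum X_i^2 = n^2 + (N - n). A Python sketch for the Tiny island:

```python
from math import comb

# Tiny island: N = 5, P(red) = 1/6. Formula (8) weights the prior by
# n^2 / sum(Xi^2), where sum(Xi^2) depends only on n in cases i) and ii).
N, p = 5, 1/6
prior = [comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]

def guilt(sum_sq):
    """P(G|F3) = E[1/X1 | F3], with sum_sq(n) = sum of Xi^2 given X1 = n."""
    w = [prior[n] * n**2 / sum_sq(n) for n in range(1, N + 1)]
    return sum(w[n - 1] / n for n in range(1, N + 1)) / sum(w)

print(round(guilt(lambda n: n**2 + (N - n)**2), 3))   # 0.574  i) Red and green
print(round(guilt(lambda n: n**2 + (N - n)), 3))      # 0.674  ii) Red and palette
```

Both values match the Tiny island column of Table 1.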
The results for the probability of guilt are summarized in Table 1. The row at
the bottom includes, as a reference, the result of the random search P (G|F1 ∩ F2 )
from (5).
Table 1: Probability of guilt, P(G|F3), in the iterated case and in the reference.

                                                Tiny island   Large island
  i)   Red and green                               0.574         0.712
  ii)  Red and palette                             0.674         0.722
  iii) All equal                                   0.643         0.719
  Random search, P(G|F1 ∩ F2) (reference)          0.600         0.716
The results show that the probability of guilt P (G|F3 ) = E (X −1 |F3 ) indeed
varies with the distributional assumption regarding the distribution of colors other
than red in the underlying population. The probability of guilt may both be above
and below the reference case of random search P (G|F1 ∩ F2 ).
In order to understand the logic behind the result for a particular parameter
configuration one needs to consider all the possible combinations of colors that
the configuration can generate. The essential insight is gained by looking at the
distribution for the updated distribution of X1 . For the Tiny island the updated
distribution P(X1 = n|F3) is given in Figure 1. The lower density is the prior P(X1 = n), which is common for all the parameter configurations.

[Figure 1: Tiny island, updating the distribution. The prior P(X1 = n) and the updated densities for configurations i), ii), iii), and the reference are plotted over n = 1, ..., 5.]
Consider the updated density in configuration i) (Red and green). This density is adjusted down the most for X1 = 1, while it is adjusted up the most for X1 = 3. These adjustments follow from the Bayesian updating given by (6):

P(X_1 = n \mid F_3) = P(X_1 = n)\,\frac{P(F_3 \mid X_1 = n)}{P(F_3)}    (6)
The likelihood of F3 = 'The first time that both balls are of the same color they are red' in the numerator is low when there is one red ball (X1 = 1) and four green balls, while it is much higher when there are three red balls (X1 = 3) and two green.
The updated density for ii) (Red and palette) deviates less from the prior. The reason is that the likelihood of F3 is quite high even when X1 = 1, because all the other colors are unique.
The updated distribution in case i) is the one that moves the furthest to the right, giving the most probability mass to large values of X1, while the updated distribution in case ii) is the one that moves the least to the right. The probability of guilt P(G|F3) = E(X^{-1}|F3) is therefore the lowest in case i) and the highest in case ii).
2.5 Known Number of Iterations
In the above analysis it was assumed that the number of iterations until a case was found (a case where the second ball has the same color as the first) was unknown.
In principle the information regarding the number of iterations could be available.
Let R denote the number of iterations until a case was found. As before let the
first case involve two red balls. Given a combination of colors X^A (where the number of balls of color i is X_i^A, i ∈ [1, M], and \sum X_i^A = N), the likelihood of experiencing F4 = [R − 1 potential cases and then a red case] is

P(F_4 \mid X^A) = \left( 1 - \sum_{i=1}^{M} (X_i^A/N)^2 \right)^{R-1} (X_1^A/N)^2    (9)
Using Bayes’s formula this likelihood can be used to update the distribution over
all possible X A s in a given context. The updated probability distribution over
X A s can in turn be marginalized to achieve an updated probability distribution
over red balls X1 , which in turn can be used to calculate the probability of guilt. I
have done this exercise for all of the three Tiny islands. The updated probability
of guilt is plotted in Figure 2.

[Figure 2: Probability of guilt and the number of iterations. The probability of guilt is plotted against R (up to R = 40) for the three Tiny island configurations: i) Red and green, ii) Red and palette, and iii) All equal.]

Three distinct features are apparent. First, when
the number of iterations is exactly equal to one, the probability of guilt is in all
cases equal to 0.600. This is no coincidence. Looking at (9) it is clear that the
distributions of other colors than red do not matter when R = 1. Hence, N and
p1 are the only facts that matter and in all three cases N = 5 and p1 = 1/6. As
seen above, 0.600 is also the probability of guilt in the reference case of random
search. The reference is an experiment where "we are faced with one experiment, where two balls drawn with replacement happen to be red". Another
way of putting it is that the number of iterations before two red balls are drawn
happen to be R = 1. This illustrates again the difference between the reference
experiment and the experiment where it is assumed that the case is a selected case
after a number of iterations.
Second, both for "Red and palette" and "All equal" the probability of guilt increases and approaches unity as R increases. The reason for this asymptotic behavior is that in both configurations the Bayesian updating, after a long run of iterations, gives high probability to combinations X^A with one ball of each color. Such X^A's are possible both in "Red and palette" and in "All equal". One ball of each color implies that there is only one red ball, and hence sure guilt.
Third, the probability of guilt with "Red and green" declines and settles at a level of around 0.45. As there are only two colors in this configuration, having just one ball of each color is not possible. As R increases, only two alternatives stand out in the likelihood: a) X1 = 2 and X2 = 3, or b) X1 = 3 and X2 = 2. Hence, in this configuration the probability of guilt settles at a level between 1/3 and 1/2.
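For configuration i) these features can be verified exactly, since with two colors X2 = N − X1 and the likelihood (9) depends on X1 alone. A minimal Python sketch with the Tiny island values:

```python
from math import comb

# P(G|F4) as a function of R for configuration i) (Red and green),
# Tiny island values N = 5, P(red) = 1/6. With two colors, X2 = N - X1,
# so the likelihood (9) reduces to a function of X1 = n alone.
N, p = 5, 1/6

def guilt_given_R(R):
    prior = [comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]
    lik = [(1 - (n**2 + (N - n)**2) / N**2)**(R - 1) * (n / N)**2
           for n in range(N + 1)]
    w = [prior[n] * lik[n] for n in range(1, N + 1)]
    return sum(w[n - 1] / n for n in range(1, N + 1)) / sum(w)

print(round(guilt_given_R(1), 3))    # 0.6, the reference value at R = 1
print(round(guilt_given_R(40), 3))   # 0.448, the level where the curve settles
```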
3 DISCUSSION
The analysis of the Island Problem illustrates the general point that the interpretation of statistical data must take into account exactly how the data were collected.
What is the protocol and where does it end? In the solution to the Island Problem,
Yellin (1979) only considered the part of the evidence relating to the fact that the
guilty happened to have certain characteristics. Later, Dawid (1994) brought into the picture the fact that the search had produced a suspect with the same characteristics. Both Yellin and Dawid start from the premise that the case is given. My
argument is that the case may itself be selected in a stochastic process. Only cases
where there is a suspect with identical characteristics as the culprit are brought
before the court.
The discussion started from the Collins Case, a classic case in forensic statistics.
In addition to illustrating core issues of forensic statistics, like "the prosecutor's fallacy", the Collins Case also illustrates some important principles in Bayesian
learning and Bayesian reasoning. The present discussion is not meant as a substantial contribution to forensic statistics, but rather as an elaboration on a fascinating
stylized case of Bayesian reasoning.
In fact, the salience of the selection effect would only be marginal in stylized
forensic problems of realistic size. In Eggleston’s (1983) Island Problem, for example, as captured by Large island in Table 1 above, N is only 100. Already at
that modest population size the difference between the different parameter configurations is very small. If N increases further the selection effect would not matter
at all. To put it loosely, when the number of balls is large and when there is a large number of colors, each with small probabilities, then \sum X_i^2 can be treated as a constant independent of X_1 for small X_1. It then follows that (8), for small n, can be simplified as follows:

P(X_1 = n \mid F_3) \approx P(X_1 = n)\,\frac{n^2}{E(X_1^2)} = P(X_1 = n \mid F_1 \cap F_2).    (10)
The last equality follows as
P(X_1 = n \mid F_1 \cap F_2) = P(X_1 = n)\,\frac{P((F_1 \cap F_2) \mid X_1 = n)}{P(F_1 \cap F_2)} = P(X_1 = n)\,\frac{n^2/N^2}{E(X_1^2/N^2)}
The approximation (10) is accurate when red is quite rare, and when there is a
large number of other features in the population. That the feature red is rare is
an implicit condition for the problem to be relevant in the context of evidence in
a court case.
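The closeness of the two updates at realistic sizes can be sketched numerically even in the two-color case, where the constancy of \sum X_i^2 is crudest. With the Large island values N = 100 and p1 = 0.004, the exact P(G|F3) for "Red and green" is already close to the reference P(G|F1 ∩ F2) = 1/(1 + (N − 1)p):

```python
from math import comb

# Large island values: N = 100, P(red) = 0.004, two colors (Red and green),
# so sum(Xi^2) = n^2 + (N - n)^2 given X1 = n. Compare exact (8) with the
# random-search reference (5).
N, p = 100, 0.004
prior = [comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]
w = [prior[n] * n**2 / (n**2 + (N - n)**2) for n in range(1, N + 1)]
exact = sum(w[n - 1] / n for n in range(1, N + 1)) / sum(w)
reference = 1 / (1 + (N - 1) * p)
print(round(exact, 3), round(reference, 3))   # 0.712 0.716, as in Table 1
```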
References
Balding, D. J., and Donnelly, P. (1995), "Inference in Forensic Identification," Journal of the Royal Statistical Society, Series A (Statistics in Society), 158(1), 21-53.

Dawid, A. P. (1994), "The Island Problem: Coherent Use of Identification Evidence," in Aspects of Uncertainty: A Tribute to D. V. Lindley, New York: Wiley, 159-170.

Dawid, A. P., and Mortera, J. (1996), "Coherent Analysis of Forensic Identification Evidence," Journal of the Royal Statistical Society, Series B (Methodological), 58(2), 425-443.

Eggleston, R. (1983), Evidence, Proof and Probability (2nd ed.), London: Weidenfeld and Nicolson.

Epstein, R. A. (1967), The Theory of Gambling and Statistical Logic, New York: Academic Press.

Fairley, W. B., and Mosteller, F. (1974), "A Conversation about Collins," The University of Chicago Law Review, 41(2), 242-253.

Lindley, D. V. (1987), "The Probability Approach to the Treatment of Uncertainty in Artificial Intelligence and Expert Systems," Statistical Science, 2, 17-24.

The Supreme Court of California (1969), "People v. Collins," reprinted in Fairley, W. B., and Mosteller, F. (eds.) (1977), Statistics and Public Policy, Reading, Mass., 355-368.

Yellin, J. (1979), "Review of Evidence, Proof and Probability (by R. Eggleston)," Journal of Economic Literature, 17, 583-584.