Formal tools for handling evidence
Valentina Leucari
Leverhulme/ESRC Research Programme
“Evidence, inference and enquiry”
Department of Statistical Science
University College London
2005-2006
Table of Contents
1 Introduction
2 Bayesian networks for the analysis of evidence
2.1 Evidence and Bayesian networks
3 Bayesian network fragments for representing evidence
3.1 Some recurrent fragments
3.2 Remarks and future work
4 Recurrent combinations of evidence
4.1 Contradiction and corroboration
4.2 Conflict and convergence
4.3 Remarks and future work
5 Evidence in legal cases: Wigmore charts and Bayesian networks
5.1 A criminal case example
5.2 The Wigmore chart analysis
5.3 The Bayesian network analysis
5.3.1 A simple Bayesian network
5.3.2 An object-oriented Bayesian network
5.4 A comparison between Wigmore charts and Bayesian networks
5.5 Remarks and future work
6 A Bayesian network analysis of the Sacco and Vanzetti case
6.1 The case
6.2 Items of evidence
6.2.1 Witness evidence
6.2.2 Physical evidence
6.2.3 Consciousness of guilt evidence
6.2.4 Combining all the evidence
6.3 Remarks and future work
References
Chapter 1
Introduction
This is a report of my research activity within the “Evidence, Inference and Enquiry”
programme. I have been involved in the “Formal Tools for Handling Evidence” project, and
my work has been motivated by the investigation of the use of probability and statistics for
analysing evidence. The aim of the project is to
• Identify generic principles for representing and handling evidence
• Develop formal methods for expressing and manipulating them
• Explore their applications.
From a statistical perspective, a formal analysis of evidence entails a description of both
the problem and the related evidence through a model, identification of relevant hypotheses, quantification of prior knowledge and application of probabilistic techniques to evaluate
the evidence. Within such a general framework, I have focused my research on a specific
statistical tool, namely Bayesian networks, with the aim of analysing complex structures
of the evidence arising in different areas. Contributions to the overall “Evidence, Inference
and Enquiry” programme entail exploring the application of formal methods to different
disciplines, developing a systematic rigorous method to analyse evidence, and providing
general tools for drawing inferences from the observed evidence.
Most of my research work so far has been focused on representing probabilistic structures
of the evidence through Bayesian networks. Although some topics overlap across chapters, the main areas of interest are
• Bayesian networks for representing and evaluating complex evidence (see Chapters 2 and 3): some features of Bayesian networks, such as conditional independence relationships, causality, evidence propagation and incorporation of new evidence, make them a powerful tool for evidential reasoning.
• Representation of recurrent structures of the evidence (see Chapters 3 and 4): we define simple Bayesian network fragments describing recurrent and very general patterns in the way evidence arises and combine them in object-oriented networks.
• Interactions between different items of evidence (see Chapter 4): different sources
of evidence may exhibit interaction patterns that determine the inference drawn on
certain hypotheses of interest.
• Analysis of evidence in legal cases (see Chapters 5 and 6): we introduce Bayesian
networks as a method for a probabilistic description of legal cases in terms of the
available evidence, and illustrate them with both fictitious examples and real cases.
We also compare Bayesian networks and Wigmore charts, a graphical method used in
forensic science for describing legal reasoning.
In the following chapters, the research carried out in these areas is briefly presented, together with directions for future work. Most of this research is still work in progress.
Chapter 2
Bayesian networks for the analysis
of evidence
Evidence interpretation has principles and general aspects that are common throughout
different disciplines. Evaluating complex patterns of evidence presents the problem of understanding all of the dependencies which may exist between different aspects of the evidence (see for instance multiple sources of evidence in legal cases). A graphical method can
provide a valuable aid for overcoming such difficulties.
In this chapter, we briefly describe some features of Bayesian networks that make them
a powerful tool for the analysis of evidence.
2.1
Evidence and Bayesian networks
Here are some general thoughts on why it is interesting (and hopefully useful) to use
Bayesian networks. Some of these issues will be discussed in the following chapters.
• Items of evidence form complex interrelated chains or webs, where relevance and
weight of any specific piece of evidence can only be assessed in the light of its relation
to other pieces of evidence. Bayesian networks accomplish this by means of conditional
independence relationships.
• According to Schum (2001), evidence has to be evaluated on the basis of three fundamental attributes: 1) relevance, 2) credibility, 3) strength. In terms of Bayesian
networks, this translates into 1) value of information (it is possible to quantify the
impact of additional evidence in a certain model), 2) conditional probability tables
(by specifying CPTs we assess credibility of the items of evidence), 3) likelihood ratio.
• Evidence can be entered in a model either by fixing the value of some variables or by
introducing likelihood evidence.
• Evidence can be propagated through the model, so additional evidence can be entered
at any time during the process.
• Prior information (conditional probability tables) can be used when defining a model.
• A model can be causal, i.e. causal relationships between different items of evidence
can be taken into account.
• Complex evidence structures can be easily handled by object-oriented Bayesian networks.
• Bayesian networks allow for: 1) representing the available evidence, both in a qualitative and quantitative fashion, 2) drawing (statistical) inference from the evidence.
• It is possible to perform sensitivity analyses and evaluate different models for the same
problem.
• Is it possible to combine the semantics and the syntax of the evidence in a
Bayesian network?
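As a minimal sketch of two of the points above (entering evidence by fixing a variable versus entering likelihood evidence, and propagating it to a hypothesis), the following Python fragment works through a two-node network H → E by direct application of Bayes' rule. The network, all the probabilities and the function name `posterior` are illustrative assumptions; they are not taken from this report or from Hugin.

```python
def posterior(prior, likelihood):
    """Bayes' rule on a discrete variable: posterior is proportional to prior * likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-node network H -> E, both variables binary.
prior_h = [0.5, 0.5]              # P(H=0), P(H=1)
cpt_e_given_h = [[0.9, 0.1],      # P(E=0|H=0), P(E=1|H=0)
                 [0.2, 0.8]]      # P(E=0|H=1), P(E=1|H=1)

# Hard evidence: E is observed to be 1, so the likelihood of each H is P(E=1|H).
hard = posterior(prior_h, [row[1] for row in cpt_e_given_h])

# Likelihood evidence: E is not observed exactly; the finding only provides
# relative weights for the states of E (assumed here to be 0.3 for E=0, 0.7 for E=1).
weights_e = [0.3, 0.7]
lik_h = [sum(row[e] * weights_e[e] for e in range(2)) for row in cpt_e_given_h]
soft = posterior(prior_h, lik_h)

print("P(H=1 | E=1)            =", round(hard[1], 4))
print("P(H=1 | likelihood on E) =", round(soft[1], 4))
```

In this sketch the likelihood evidence moves the belief in H = 1 in the same direction as the hard observation E = 1, but less strongly, which is the behaviour one would expect from a less decisive finding.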
Some potential drawbacks
• In some problems (e.g. legal cases) the same evidence is used twice: for building the network and for computing likelihood ratios. Bayesian networks are a tool for making inferences, so inferences that have already been drawn should not be included in the net.
• Limitations of Hugin (nodes cannot be output and input at the same time, conditional probability tables cannot be changed when instances of a certain network are used in different contexts, and others).
• In some problems (see Chapter 5) it would be useful to have a “dynamic” model which
allows for taking decisions depending on the observed evidence.
• Problems arise when one needs to condition on variables that are at the bottom
of the net (as sometimes happens in Wigmore charts?)
Chapter 3
Bayesian network fragments for
representing evidence
It is often possible to find certain structures that are used repeatedly within a single Bayesian
network and also throughout different networks constructed for different problems. An
object-oriented Bayesian network allows one first to define general basic networks (we call them fragments) and then to combine them in a main network, having as nodes both random variables and Bayesian networks which are instances of the general fragments previously defined; see Dawid (2003) and Dawid et al. (2005).
We are interested in finding Bayesian network fragments for representing recurrent evidence structures. Relationships between different items of evidence, the process whereby
evidence arises and evidential reasoning all exhibit recurrent patterns that can be captured
by small general idioms. Schum (2004) discusses recurrent interaction schemes between
items of evidence. In Levitt and Laskey (2001), fragments are introduced in a legal context.
3.1
Some recurrent fragments
Below, we present a few general fragments together with examples of their applications.
Combining these recurrent network structures allows an easy representation of mixed sources
of evidence. This has been done in legal examples (see Chapters 5 and 6), but this kind of model is applicable in other frameworks and, more generally, whenever we are dealing with items of evidence. In Leucari (2005) an object-oriented Bayesian network has
been constructed for a fictitious burglary example, using some of the network structures
that are described below (see also Chapter 5).
Report. This network represents a report about a certain item of evidence (a physical
item or an event). The structure of the network is shown in Figure 3.1, and the variables
are
I1: the true item or event (input node)
I2: a “randomly chosen” item or event (unrelated to the true one)
R: the source’s report about I1
C: a biased coin (with an assigned probability parameter).
Figure 3.1: The “Report” fragment (nodes I1, C, I2 and R).
The report can be correct, and hence equal to the true item I1 , or wrong, and hence
equal to some random item I2 , independent of I1 . The biased coin is used for modelling
non-symmetric error, as it is reasonable to assume that the source is more likely to make
mistakes in one direction (e.g. it is more likely that the source says “true” when the event
is “false” than “false” when the event is “true”, or vice versa). This fragment can be
used, for instance, for modelling witness testimonies, or results from laboratory analyses,
e.g. on blood or DNA samples (see the burglary example in Chapter 5).
If we assume that Ii , i = 1, 2, are binary variables and define
P (I2 = 1) = β
P (C = head) = π
so that β = 1/2 corresponds to symmetric error, we can compute the probabilities of
obtaining a correct report as
f1 = P (R = 1|I1 = 1) = π + β(1 − π)
f0 = P (R = 0|I1 = 0) = π + (1 − β)(1 − π)
so that π = f0 + f1 − 1. Notice that π = 0 (i.e. f1 = β and f0 = 1 − β) corresponds to deception, whereas π = 1 (i.e. f1 = 1 and f0 = 1) corresponds to accuracy. It must also be taken into account that errors happen for different reasons: the source can be mistaken, or deceptive, etc. Assuming that mistake and deception are the only two reasons for an error, a possible model for the source’s error is in Figure 3.2, where E = error, M = mistake, D = deception.
Figure 3.2: A model for errors in a report (nodes M, D and E).
A possible functional relationship (assuming binary variables) is
E = M D + M (1 − D) + D(1 − M ),
that is, E = 1 whenever M = 1 or D = 1. This needs further investigation.
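The probabilities of a correct report can be checked numerically. The sketch below enumerates the coin C and the random item I2, assuming (as the formulas for f1 and f0 imply) that the report equals I1 when the coin shows head and equals I2 otherwise. The values π = 0.7 and β = 0.6 and the function names `report_cpt` and `error` are illustrative assumptions only.

```python
from itertools import product

def report_cpt(pi, beta):
    """Return P(R = r | I1 = i1), keyed by (r, i1), by summing over C and I2."""
    cpt = {}
    for i1, r in product((0, 1), repeat=2):
        total = 0.0
        for c, i2 in product((0, 1), repeat=2):   # c = 1 stands for "head"
            p_c = pi if c == 1 else 1 - pi
            p_i2 = beta if i2 == 1 else 1 - beta
            reported = i1 if c == 1 else i2        # head: true item; tail: random item
            if reported == r:
                total += p_c * p_i2
        cpt[(r, i1)] = total
    return cpt

def error(m, d):
    """Functional relationship E = MD + M(1 - D) + D(1 - M) for binary M, D."""
    return m * d + m * (1 - d) + d * (1 - m)

pi, beta = 0.7, 0.6          # hypothetical parameter values
cpt = report_cpt(pi, beta)
f1 = cpt[(1, 1)]             # P(R = 1 | I1 = 1), should equal pi + beta * (1 - pi)
f0 = cpt[(0, 0)]             # P(R = 0 | I1 = 0), should equal pi + (1 - beta) * (1 - pi)

print("f1 =", round(f1, 4), " f0 =", round(f0, 4), " f0 + f1 - 1 =", round(f0 + f1 - 1, 4))
print("E for (M, D) in 00, 01, 10, 11:", [error(m, d) for m, d in product((0, 1), repeat=2)])
```

The enumeration recovers π as f0 + f1 − 1, and the error function behaves as a logical OR of mistake and deception.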
Match. This network represents a situation where a match between different items of
evidence is investigated. For instance, a trace left at the crime scene has to be compared
to traces coming from different suspects, e.g. blood or DNA samples. This kind of model
has been studied in Cavallini and Corradi (2005) in a legal context. See also Chapter 5 for
further legal applications. The network structure is shown in Figure 3.3, and the variables
are
H: hypothesis of interest, for instance “Is the suspect guilty?” (input node)
T: trace we are interested in
T1, . . . , Tn: possible sources for the trace.
Figure 3.3: The “Match” fragment (nodes H, T1, . . . , Tn and T).
Variable H should have n states so that, for example, evidence of T = Ti would increase the
probability of suspect i being guilty. Notice that, when using instances of this network in
different contexts, it might be useful to be able to leave n unspecified, and fix it according to
the problem at hand (this is currently unfeasible in Hugin). A possible solution could be the
model in Figure 3.4, where N is a random variable uniformly distributed over {1, . . . , n},
to be fixed to the desired number, depending on the problem.
Figure 3.4: An alternative “Match” fragment (nodes H, N, T1, . . . , Tn and T).
Contradiction. This network represents a situation where two sources give contradictory evidence, and has been introduced by Schum (2001, 2004). The network is shown in Figure 3.5, and the variables are
H: hypothesis of interest (the arrow line is dotted since this variable is not strictly part of the fragment)
E: evidence (input node)
S1: source 1 of evidence
S2: source 2 of evidence.
Figure 3.5: The “Contradiction” fragment (nodes H, E, S1 and S2).
See also Chapter 4 for a more accurate analysis. Notice that since contradictory evidence
involves events that cannot happen at the same time, credibility of the sources becomes
relevant (see the next section for a model for credibility).
Conflict. This network represents a situation where two sources of evidence are in conflict. Like the “Contradiction” fragment above, this is a model of dissonant evidence, but unlike the “Contradiction” model, the “Conflict” model does not necessarily involve incompatible sources. This is because the two sources refer to two different events. The network is shown in Figure 3.6, and the variables are
H: hypothesis of interest (does not directly enter the fragment)
E1: intermediate evidence (input node)
E2: intermediate evidence, unrelated to E1 (input node)
S1: source 1 of evidence
S2: source 2 of evidence.
Figure 3.6: The “Conflict” fragment (nodes H, E1, E2, S1 and S2).
For more detailed explanations, see Schum (2001).
Corroboration. A model for harmonious evidence: two sources give corroborative evidence about a hypothesis of interest, see Schum (2001). The model is the same as the one for contradictory evidence, except for the values of S1 and S2 (that must be equal).
Convergence. A model for harmonious evidence: two sources give convergent evidence, though only indirectly related to a hypothesis of interest, see Schum (2001). The model is the same as the one for conflicting evidence, except for the values of S1 and S2 (that must be equal).
3.2
Remarks and future work
There are some more fragments that have not been explored yet. They are listed below.
Credibility. Credibility of a source of evidence depends on many factors, both subjective and objective. This fragment has been elaborated by Dawid and Schum. There are two levels: the top level is the “Credibility” fragment, which contains instances of other networks, “Sensation” and “Filter”. The “Credibility” network is shown in Figure 3.7, and the variables are
E: event
C: competence
S: sensation, instance of “Sensation”
O: objectivity, instance of “Filter”
V: veracity, instance of “Filter”
T: testimony.
Figure 3.7: The “Credibility” fragment (nodes E, C, S, O, V and T).
The “Sensation” network is shown in Figure 3.8 and the variables are
C: competence
A: agreement
S: sensation.
Figure 3.8: The “Sensation” fragment (nodes C, A and S).
Finally, the “Filter” network is shown in Figure 3.9 and the variables are
In: in
Co: correct
Out: out.
Figure 3.9: The “Filter” fragment (nodes In, Co and Out).
These networks are self-explanatory. Notice that this fragment can be used in conjunction
with the “Report” fragment for modelling reliability of the source.
Manipulation. An issue directly related to credibility of the sources is that of manipulated evidence. This network, introduced in Baio and Corradi (2004), models a situation of uncertainty about whether the available evidence is genuine or somehow manipulated. The network for manipulated evidence is shown in Figure 3.10 and the variables are
H: hypothesis of interest (input node)
W: indicator of presence/absence of manipulation
T: uncertain evidence
A: control evidence.
Figure 3.10: The “Manipulated evidence” fragment (nodes H, W, T and A).
Evidence T can be either genuine or manipulated, but this is unknown. Variable A is
additional evidence, certainly genuine, that helps to investigate the origin of the
uncertain evidence T . For a more detailed description see Baio and Corradi (2004).
Explaining away. Two possible causes of the same event; knowledge of one of them being
true lowers the probability of the other one, see for example Pearl (2000). The network is
shown in Figure 3.11 and the variables are
X1: cause 1
X2: cause 2
Y: event.
Figure 3.11: The “Explaining away” fragment (nodes X1, X2 and Y).
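The explaining-away effect can be illustrated by brute-force enumeration on a three-node network X1 → Y ← X2. The sketch below is only an illustration: the prior probabilities, the conditional probability table for Y and the helper names `joint` and `prob_x1` are assumptions made for this example, not values from the report.

```python
from itertools import product

p_x1 = 0.1    # hypothetical prior P(X1 = 1)
p_x2 = 0.1    # hypothetical prior P(X2 = 1)
# Hypothetical P(Y = 1 | x1, x2): either cause makes the event likely.
p_y_given = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}

def joint(x1, x2, y):
    """Joint probability P(X1 = x1, X2 = x2, Y = y) for independent causes."""
    px1 = p_x1 if x1 else 1 - p_x1
    px2 = p_x2 if x2 else 1 - p_x2
    py = p_y_given[(x1, x2)] if y else 1 - p_y_given[(x1, x2)]
    return px1 * px2 * py

def prob_x1(**evidence):
    """P(X1 = 1 | evidence), with evidence given as keyword arguments x2=..., y=..."""
    num = den = 0.0
    for x1, x2, y in product((0, 1), repeat=3):
        state = {"x2": x2, "y": y}
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(x1, x2, y)
        den += p
        if x1 == 1:
            num += p
    return num / den

after_y = prob_x1(y=1)               # the event is observed
after_y_and_x2 = prob_x1(y=1, x2=1)  # the other cause is then found to be true

print("P(X1=1)             =", round(prob_x1(), 4))
print("P(X1=1 | Y=1)       =", round(after_y, 4))
print("P(X1=1 | Y=1, X2=1) =", round(after_y_and_x2, 4))
```

Observing the event raises the probability of cause 1, and learning that cause 2 is true lowers it again, which is the pattern the fragment is meant to capture.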
Confounding. This network represents the effect of some variable that confounds the
relationships between two or more variables, see Dawid (2000, 2002) and Lauritzen (2003).
The network is shown in Figure 3.12 and the variables are
U: unobserved confounder
T: “treatment” variable
S: covariate
R: “response” variable.
Figure 3.12: The “Confounding” fragment (nodes U, T, S and R).
More fragments to be developed
• Interactions between witnesses (when the testimony of some witness is influenced or forced by some other witness)
• Alternative explanations for the same event (as used in Wigmore charts, see Anderson et al., 2005)
• The use of generalisations for deriving inferences (see Wigmore charts)?
Chapter 4
Recurrent combinations of
evidence
We consider recurrent patterns of interaction between evidence items, as described in Schum
(2001), and give a probabilistic interpretation of such structures. Recurrent combinations of
evidence include both dissonant (contradiction and conflict) and harmonious (corroboration
and convergence) evidence. Combinations of evidence items are also discussed in Dawid
(1987).
4.1
Contradiction and corroboration
Consider an item of evidence E and two sources of evidence S1 and S2 for such item. The
probabilistic structure of this kind of interaction is represented in Figure 4.1, where S1 and
S2 are conditionally independent given E. The variable H is the hypothesis of interest for
the overall problem, but we will not discuss its role for the moment (this is the reason for
the dotted line in the picture). We assume all variables are binary. Generally, we observe S1 and S2, whereas E is not observed, and we are interested in the likelihood ratio
\[
\lambda = \frac{P(S_1, S_2 \mid E = 1)}{P(S_1, S_2 \mid E = 0)}
        = \frac{P(S_1 \mid E = 1)}{P(S_1 \mid E = 0)} \times \frac{P(S_2 \mid E = 1)}{P(S_2 \mid E = 0)}.
\]
Figure 4.1: Contradiction/corroboration (nodes H, E, S1 and S2).
         | E = 0  | E = 1
S1 = 0   | β1     | 1 − α1
S1 = 1   | 1 − β1 | α1
Table 4.1: Conditional probability table for S1 |E.
         | E = 0  | E = 1
S2 = 0   | β2     | 1 − α2
S2 = 1   | 1 − β2 | α2
Table 4.2: Conditional probability table for S2 |E.
Evidence can be as follows
i) S1 = 1, S2 = 0
ii) S1 = 0, S2 = 1
iii) S1 = 1, S2 = 1
iv) S1 = 0, S2 = 0.
Evidence as in (i) and (ii) is termed contradictory, whereas evidence as in (iii) and (iv) is
termed corroborating, for obvious reasons.
The usual parameterisation is by means of conditional probabilities, as shown in Tables
4.1 and 4.2. Such probabilities represent the prior beliefs in the sources telling the truth
about E, namely
α1 = P (S1 = 1|E = 1)
β1 = P (S1 = 0|E = 0)
α2 = P (S2 = 1|E = 1)
β2 = P (S2 = 0|E = 0).
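When dealing with contradictory evidence a fundamental issue is assessing credibility
of the sources. A useful reparameterisation is given by likelihood ratios
\[
\begin{aligned}
\lambda_1^{+} &= \frac{P(S_1 = 1 \mid E = 1)}{P(S_1 = 1 \mid E = 0)} = \frac{\alpha_1}{1 - \beta_1}, &
\lambda_1^{-} &= \frac{P(S_1 = 0 \mid E = 1)}{P(S_1 = 0 \mid E = 0)} = \frac{1 - \alpha_1}{\beta_1}, \\
\lambda_2^{+} &= \frac{P(S_2 = 1 \mid E = 1)}{P(S_2 = 1 \mid E = 0)} = \frac{\alpha_2}{1 - \beta_2}, &
\lambda_2^{-} &= \frac{P(S_2 = 0 \mid E = 1)}{P(S_2 = 0 \mid E = 0)} = \frac{1 - \alpha_2}{\beta_2},
\end{aligned}
\]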
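which directly represent credibilities. The corresponding probability tables are shown in
Tables 4.3 and 4.4. The overall likelihood ratio in cases (i)-(iv) is respectively
\[
\lambda = \lambda_1^{+} \lambda_2^{-}, \qquad
\lambda = \lambda_1^{-} \lambda_2^{+}, \qquad
\lambda = \lambda_1^{+} \lambda_2^{+}, \qquad
\lambda = \lambda_1^{-} \lambda_2^{-}.
\]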
When the sources are contradictory they contribute to the overall likelihood ratio in opposite
ways: the impact of the sources largely depends on their credibility. Credibility is not so
relevant when the evidence is corroborative.
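The four cases can be evaluated numerically from the conditional probabilities of the two sources. The following Python sketch computes the per-source likelihood ratios and multiplies them, relying on the conditional independence of S1 and S2 given E. The parameter values for the two sources and the function names are illustrative assumptions.

```python
def likelihood_ratios(alpha, beta):
    """Return (lambda_plus, lambda_minus) for a source with
    alpha = P(S=1|E=1) and beta = P(S=0|E=0)."""
    lam_plus = alpha / (1 - beta)      # source reports 1: P(S=1|E=1) / P(S=1|E=0)
    lam_minus = (1 - alpha) / beta     # source reports 0: P(S=0|E=1) / P(S=0|E=0)
    return lam_plus, lam_minus

def overall_lr(s1, s2, source1, source2):
    """Overall likelihood ratio for reports (s1, s2), assuming the two
    sources are conditionally independent given E."""
    lr = 1.0
    for s, (alpha, beta) in ((s1, source1), (s2, source2)):
        lam_plus, lam_minus = likelihood_ratios(alpha, beta)
        lr *= lam_plus if s == 1 else lam_minus
    return lr

source1 = (0.9, 0.8)   # hypothetical (alpha1, beta1): a fairly credible source
source2 = (0.6, 0.7)   # hypothetical (alpha2, beta2): a less credible source

for label, (s1, s2) in (("i", (1, 0)), ("ii", (0, 1)), ("iii", (1, 1)), ("iv", (0, 0))):
    print(label, (s1, s2), round(overall_lr(s1, s2, source1, source2), 4))
```

With these assumed values the more credible source dominates in the contradictory cases (i) and (ii), while in the corroborative cases (iii) and (iv) both factors push the ratio in the same direction.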
The relationship between the two parameterisations is
\[
\alpha_1 = \frac{\lambda_1^{+} (1 - \lambda_1^{-})}{\lambda_1^{+} - \lambda_1^{-}}, \qquad
\beta_1 = \frac{\lambda_1^{+} - 1}{\lambda_1^{+} - \lambda_1^{-}},
\]
and similarly for variable S2. Notice that if $\lambda_1^{+} > 1$ then $\lambda_1^{-} \le 1$, i.e. $(\lambda_1^{+}, \lambda_1^{-})$ are not variation independent and, being ratios of probabilities, take non-negative values in $[0, 1] \times (1, +\infty) \cup [1, +\infty) \times [0, 1)$. If $\lambda_1^{+} > 1$ then source S1 is supporting the evidence E = 1: the more credible S1 is, the stronger the support. Of course, one can perform sensitivity analyses in order to assess the value of these parameters. As special cases: if we cannot say anything about the credibility of S1 we can assign $\lambda_1^{+} = \lambda_1^{-} = 1$ (this is the case, for example, when α1 = β1 = 0.5), and if we believe S1 to be credible both in supporting E = 1 and E = 0 we can assign $\lambda_1^{+} = 1 / \lambda_1^{-}$ (this is the case when α1 = β1, with α1 = β1 = 1 as the limit of a perfectly credible source).
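As a quick check of the relationship between the two parameterisations, the sketch below converts conditional probabilities into likelihood ratios and back again. The function names and the numerical values are illustrative assumptions; the inverse map is undefined when the two likelihood ratios coincide (the uninformative case where both equal 1), so the sketch raises an error there.

```python
def to_likelihood_ratios(alpha, beta):
    """Map (alpha, beta) to (lambda_plus, lambda_minus)."""
    return alpha / (1 - beta), (1 - alpha) / beta

def to_probabilities(lam_plus, lam_minus):
    """Map (lambda_plus, lambda_minus) back to (alpha, beta)."""
    denom = lam_plus - lam_minus
    if denom == 0:
        # lambda_plus = lambda_minus: alpha and beta are not identified.
        raise ValueError("likelihood ratios coincide; (alpha, beta) not identified")
    alpha = lam_plus * (1 - lam_minus) / denom
    beta = (lam_plus - 1) / denom
    return alpha, beta

alpha1, beta1 = 0.9, 0.8                      # hypothetical source parameters
lam_plus, lam_minus = to_likelihood_ratios(alpha1, beta1)
alpha_back, beta_back = to_probabilities(lam_plus, lam_minus)

print("lambda+ =", round(lam_plus, 4), " lambda- =", round(lam_minus, 4))
print("recovered alpha =", round(alpha_back, 4), " beta =", round(beta_back, 4))
```

A loop over a grid of α1 and β1 values, calling the same two functions, would give a simple form of the sensitivity analysis mentioned in the text.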
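         | E = 0 | E = 1
S1 = 0   | $\frac{\lambda_1^{+} - 1}{\lambda_1^{+} - \lambda_1^{-}}$ | $\frac{\lambda_1^{-} (\lambda_1^{+} - 1)}{\lambda_1^{+} - \lambda_1^{-}}$
S1 = 1   | $\frac{1 - \lambda_1^{-}}{\lambda_1^{+} - \lambda_1^{-}}$ | $\frac{\lambda_1^{+} (1 - \lambda_1^{-})}{\lambda_1^{+} - \lambda_1^{-}}$
Table 4.3: Conditional probability table for S1 |E in terms of likelihood ratios.
         | E = 0 | E = 1
S2 = 0   | $\frac{\lambda_2^{+} - 1}{\lambda_2^{+} - \lambda_2^{-}}$ | $\frac{\lambda_2^{-} (\lambda_2^{+} - 1)}{\lambda_2^{+} - \lambda_2^{-}}$
S2 = 1   | $\frac{1 - \lambda_2^{-}}{\lambda_2^{+} - \lambda_2^{-}}$ | $\frac{\lambda_2^{+} (1 - \lambda_2^{-})}{\lambda_2^{+} - \lambda_2^{-}}$
Table 4.4: Conditional probability table for S2 |E in terms of likelihood ratios.
Using logs we can define
\[
\tilde{\lambda}_1^{+} = \log \lambda_1^{+}, \qquad \tilde{\lambda}_1^{-} = \log \lambda_1^{-},
\]
so that $\tilde{\lambda}_1^{+} > 0$ (and $\tilde{\lambda}_1^{-} < 0$) represents support for E = 1, and similarly for the other source. With such a parameterisation, contradictory and corroborative evidence is represented respectively by
\[
\tilde{\lambda} = \tilde{\lambda}_1^{+} + \tilde{\lambda}_2^{-} \quad \text{or} \quad \tilde{\lambda}_1^{-} + \tilde{\lambda}_2^{+},
\]
\[
\tilde{\lambda} = \tilde{\lambda}_1^{+} + \tilde{\lambda}_2^{+} \quad \text{or} \quad \tilde{\lambda}_1^{-} + \tilde{\lambda}_2^{-}.
\]
Table 4.5 shows relationships between evidence and credibility of the sources. Source S1 is credible when $\lambda_1^{+} > \lambda_1^{-}$, and similarly for source S2.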
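On the log scale the contributions of the two sources simply add, and the sign of the total indicates which value of E is supported. A minimal sketch follows, with the likelihood ratios of the two sources chosen as illustrative assumptions (any values satisfying the constraints above would do) and `log_overall` a hypothetical helper name.

```python
import math

def log_overall(s1, s2, lam1, lam2):
    """Sum of the log likelihood ratios selected by the reports s1 and s2.
    lam1 and lam2 are (lambda_plus, lambda_minus) pairs for the two sources."""
    total = 0.0
    for s, (lam_plus, lam_minus) in ((s1, lam1), (s2, lam2)):
        total += math.log(lam_plus if s == 1 else lam_minus)
    return total

lam1 = (4.5, 0.125)     # hypothetical (lambda1+, lambda1-)
lam2 = (2.0, 4 / 7)     # hypothetical (lambda2+, lambda2-)

for s1, s2 in ((1, 0), (0, 1), (1, 1), (0, 0)):
    value = log_overall(s1, s2, lam1, lam2)
    side = "E = 1" if value > 0 else "E = 0"
    print((s1, s2), round(value, 3), "supports", side)
```

A positive total corresponds to an overall likelihood ratio above 1, and a negative total to a ratio below 1.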
4.2
Conflict and convergence
Figure 4.2: Conflict/convergence (nodes H, E1, E2, S1 and S2).
4.3
Remarks and future work
• More on contradiction/corroboration
• A similar analysis has to be done for the more complex case of conflicting and convergent evidence
• Application of these models to the Sacco and Vanzetti case (see Chapter 6).
Evidence | Likelihood ratio | Effect of evidence (based on credibility)
Contradiction
S1 = 1, S2 = 0 | $\lambda = \lambda_1^{+} \lambda_2^{-}$ | Both credible: λ > 1 if $\lambda_1^{+} > 1/\lambda_2^{-}$; Only S1 credible: λ > 1; Only S2 credible: λ < 1
S1 = 0, S2 = 1 | $\lambda = \lambda_1^{-} \lambda_2^{+}$ | Both credible: λ > 1 if $\lambda_2^{+} > 1/\lambda_1^{-}$; Only S1 credible: λ < 1; Only S2 credible: λ > 1
Conjunction
S1 = 1, S2 = 1 | $\lambda = \lambda_1^{+} \lambda_2^{+}$ | Both credible: λ > 1; Only S1 credible: λ > 1 if $\lambda_1^{+} > 1/\lambda_2^{+}$; Only S2 credible: λ > 1 if $\lambda_2^{+} > 1/\lambda_1^{+}$
S1 = 0, S2 = 0 | $\lambda = \lambda_1^{-} \lambda_2^{-}$ | Both credible: λ < 1; Only S1 credible: λ > 1 if $\lambda_2^{-} > 1/\lambda_1^{-}$; Only S2 credible: λ > 1 if $\lambda_1^{-} > 1/\lambda_2^{-}$
Table 4.5: Conflicting/corroborative evidence and credibility.
Chapter 5
Evidence in legal cases: Wigmore
charts and Bayesian networks
In forensic science a major task is interpreting patterns of evidence which involve many
variables, and combining different items of evidence within a complex framework of circumstances. Typical features of the evidence arising from legal cases are its complex structure
and ambiguity. Therefore, marshalling and evaluating evidence are two fundamental issues in forensic science, both for constructing arguments about questions of fact and for
reaching final decisions. The chart method for analysing evidence introduced by Wigmore
is a technique which allows one to organise and describe the available evidence, and to construct reasoning processes through sequential steps (see Anderson et al., 2005). Bayesian
networks are a general statistical tool which can be applied in the legal context to model
relationships between different sources of evidence, weigh the available evidence, and draw
statistical inferences from it. Wigmore charts and Bayesian networks are clearly different in
nature, but both of them are an attempt at a rigorous and formal approach to the analysis
of evidence. Therefore, we are interested in pointing out strengths and weaknesses of such
tools and exploring possible interactions between them. This continues the work started in
Dawid and Schum (2004). For a general discussion of statistics applied to the analysis of
evidence in legal cases see Dawid (2005).
5.1
A criminal case example
In Dawid and Evett (1997) a hypothetical criminal case is presented, involving a complicated
pattern of interactions between different items of evidence. Such interactions are represented
in a Bayesian network, and the likelihood ratio for the hypothesis “Is the person prosecuted
for this crime truly the offender?” is then computed. In Schum (2005) the same example,
with the same evidence, is analysed via the Wigmore chart method, and arguments are
constructed in order to charge the defendant with the crime.
The story is as follows (see Dawid and Evett, 1997). An unknown number of offenders
entered commercial premises late at night through a hole which they cut in a metal grille.
Inside, they were confronted by a security guard (Willard R. in Schum, 2005) who was able
to set off an alarm before one of the intruders punched him in the face, causing his nose
to bleed. The intruders left from the front of the building just as a police patrol car was
arriving and they dispersed on foot, their getaway car having made off at the first sound of
the alarm. The security guard said that there were four men but the light was too poor for
him to describe them and he was confused because of the blow he had received. The police
in the patrol car (Detective Inspector Leary in Schum, 2005) saw the offenders only from
a considerable distance away. They searched the surrounding area and, about ten minutes
later, one of them found the suspect (Harold S. in Schum, 2005) trying to “hot wire” a car
in an alley about a quarter of a mile from the incident. At the scene, a tuft of red fibers
was found on the jagged end of one of the cut edges of the grille. Blood samples were taken
from the guard and the suspect. The suspect denied having anything to do with the offence.
He was wearing a jumper and jeans that were taken for examination. A spray pattern of
blood was found on the front and right sleeve of the suspect’s jumper. The blood type was
different from that of the suspect, but the same as that from the security guard. The tuft
from the scene was found to be red acrylic. The suspect’s jumper was red acrylic. The tuft
was indistinguishable from the fibers of the jumper by eye, microspectrofluorimetry (MSF)
and thin layer chromatography (TLC). The jumper was well worn and had several holes,
though none of them could clearly be said to be a possible origin for the tuft.
This example, though quite simple at first sight, possesses many of the features of legal
cases that give rise to complex structures of the associated evidence. Most of them are
obvious, but it may be useful to highlight them, as they recur in many criminal cases:
• There are multiple and different sources of evidence: an item found at the crime scene
(the tuft of fibres), an item belonging to the suspect (the jumper), a trace left by an
unknown individual (the blood stain on the jumper), people involved with the crime
(the suspect, the guard, the police officer). The structure of such mixed evidence is
described in Figure 5.1.
• Besides evidence directly related to the crime scene, further evidence can be (and will
be) collected: laboratory analyses on the fibers found at the crime scene, tests on
blood samples (taken from the suspect’s jumper and possibly from various people).
• There are no data, in the statistical meaning of repeated observations of an experiment: the “experiment” is unique and not replicable.
• Uncertainty can be introduced at various levels: are we willing to assume that there
was a crime (and take this for granted)? And many other such examples.
Notice that there are other classes of problems in other disciplines that exhibit some of
these features or similar issues.
Figure 5.1: Mixed evidence for the burglary example (nodes: number of offenders, fiber evidence, suspect guilty?, blood evidence, witness evidence).
The aim of our work is to compare and synthesize the two analyses: starting from the
original Bayesian network in Dawid and Evett (1997), a more detailed network is constructed
by taking into account the process followed in Schum (2005) of breaking down the evidence
into single “units” and building a chain of reasoning by connecting them. The ultimate
objective would be the development of a formal method to handle multiple sources of
evidence in legal cases (and beyond) by jointly exploiting Wigmore charts and Bayesian
networks. This synthesis is still at a very early stage and the model presented here is
a first attempt at “combining” the two methods, and by no means the final product.
5.2
The Wigmore chart analysis
In Schum (2005) a Wigmore chart analysis of the example above is presented. The objectives
of such analysis are
• Marshalling and organising the available evidence
• Constructing arguments from evidence to penultimate probanda
• Establishing the probative force of an emerging collection of evidence
• Describing a (subjective) chain of inferences.
The Wigmorean model is therefore constructed as a reasoning process aiming at proving
the following ultimate probandum (U ) and penultimate probanda (P1 , P2 , P3 , P4 )
U: Harold S. unlawfully and intentionally assaulted and injured the security guard Willard R. during a break-in at the Blackbread Brewery premises in the early morning hours of 1 May, 2003.
P1: In the early morning hours of 1 May, 2003, four men unlawfully broke into the premises of the Blackbread Brewery.
P2: Harold S. was one of the four men who broke into the premises of the Blackbread Brewery in the early morning hours of 1 May, 2003.
P3: A security guard at the Blackbread Brewery, Willard R., was assaulted and injured during the break-in at the Blackbread Brewery on 1 May, 2003.
P4: It was Harold S. who intentionally assaulted and injured Willard R. during the break-in at the Blackbread Brewery on 1 May, 2003.
The structure of the chart is as in Figure 5.2, where only the top part is shown.
Figure 5.2: The top of the Wigmore chart for the burglary example (nodes U, P1, P2, P3 and P4, each penultimate probandum with its own chart).
The Wigmore method consists of several steps
a) Defining the ultimate probandum and the penultimate probanda
b) Parsing and organising the evidence into trifles, i.e. assessing relevance
c) Assigning trifles to penultimate probanda
d) Constructing key lists bearing upon the probanda
e) Drawing a chart that shows inferential linkages among elements in the key lists,
where (a)-(c) and (d)-(e) are called analysis and synthesis respectively.
This process applied to the burglary example is described in detail in Schum (2005).
For illustration, we only describe the key list and Wigmore chart for P4 . Items 1-82 are in
the key lists for P1 , P2 , P3 . The key list for P4 is
83. A blood sample was taken from Willard R. on 1 May, 2003
84. DI Leary testimony about 83
85. A blood sample was taken from Harold S. on 1 May, 2003
86. DI Leary testimony to 85
87. A spray pattern of blood was found on the front and right sleeve of the jumper
belonging to Harold S.
88. DI Leary testimony to 87
89. The jumper showing the blood stains to be shown at trial
90. The jumper shown at trial is the same one taken from Harold S. after his apprehension
on 1 May, 2003
91. The blood type of the blood on Harold S.’s jumper matches the blood type of the
security guard
92. DI Leary testimony to 91
93. A tangible record of the blood match analysis to be shown at trial
94. The analysis shown at trial is the same one reported by the forensic scientist who
performed the analysis
95. The blood on Harold S.’s jumper was not already there before the break-in on 1 May,
2003
96. The blood on Harold S.’s jumper came from Willard R.’s nose on 1 May, 2003
97. Harold S.’s testimonial denial of P4 , that he was the one who punched the security
guard Willard R. on 1 May, 2003.
The corresponding Wigmore chart is shown in Figure 5.3, where white circles denote
propositions that have to be proven and black circles denote items regarded as certain.
5.3
The Bayesian network analysis
Dawid and Evett (1997) show a possible Bayesian network for modelling relationships between various items of evidence. Here we try to improve the original network based on the
Wigmore chart analysis in Schum (2005). Trying to translate from one technique to the
other would be the most intuitive way of proceeding, but the two methods are too different
in nature for this to be profitable. Rather, Wigmore charts could be usefully combined with
Bayesian networks in order to provide a more satisfying model to be used to derive inference
from the observed evidence. This is what we discuss in the remainder of this section.
Figure 5.3: The Wigmore chart for P4.
5.3.1
A simple Bayesian network
The key list in Schum (2005), p. 5, describes all the items of evidence related to the
burglary example and used to build the corresponding Wigmore chart. Based on such a
list, we define the variables described in Table 5.1 (the variable names in parentheses are
the labels originally used in Dawid and Evett, 1997) and build a more elaborate (compared
to the original one in Dawid and Evett, 1997) Bayesian network, see Figure 5.4. All the
elements in the key list are taken into account in this “extended” model: in particular,
uncertainty about the crime (Did it really happen?) is introduced, see variables C1 , C2 in
the network, as well as some details about the crime, see variables C4 , C5 , C8 . Moreover,
witnesses are explicitly represented. In Table 5.1 some of the names in Schum (2005) are
used: BB stands for Blackbread Brewery, HS is the suspect, DI Leary is the detective
inspector who apprehended the suspect. Finally, I is the indicator function, that takes
value 1 if the event is true and 0 otherwise. Notice that the graph obtained by removing
all the new variables is the same as the one in Figure 1 in Dawid and Evett (1997) and it
is shown in Figure 5.5.
Label | Variable description | States
------|----------------------|-------
C1 | I{Someone made a cut in the grille in order to break in} | 0, 1
C2 | I{An unknown number of persons entered BB} | 0, 1
C3 (N) | Number of offenders | 1, 2, 3, 4, 5, 6
C4 | I{One of the offenders punched the guard} | 0, 1
C5 | Consequences of the punch | 0, 1
C6 (C) | I{HS is guilty, i.e. he is one of the offenders} | 0, 1
C7 | I{HS was nearby the scene of the crime} | 0, 1
C8 (B) | Identity of the person who punched the guard | 0, 1
F1 | I{Fibre tuft shown at trial was found at crime scene} | 0, 1
F2 (A) | Identity of the person who left the fibre tuft in the grille | 0, 1
F3 (Y1) | Properties of the fibre tuft | 0=other, 1=red acrylic
J1 | I{The jumper shown at trial is HS’s jumper} | 0, 1
J2 (X3) | Properties of HS’s jumper | 0=other, 1=red acrylic
B1 (X1) | Blood sample from HS | A, B, AB, O
B2 (X2) | Blood sample from the guard | A, B, AB, O
B3 (R) | Shape of blood stain on jumper | 0=no blood stain, 1=spray, 2=other
B4 (Y2) | Blood type on jumper | A, B, AB, O
PICC1 | Evidence from the photo of the hole in the grille | 0, 1
PICC5 | Evidence from the photo of the guard’s injury | 0, 1
PICC7 | Evidence from the photo of HS after he was apprehended | 0, 1
MSF | Result of microspectrofluorimetry | 0, 1
TLC | Result of thin layer chromatography | 0, 1
LC1 | DI Leary’s testimony about C1 | 0, 1
LC2 | DI Leary’s testimony about C2 | 0, 1
LC7 | DI Leary’s testimony about C7 | 0, 1
LF1 | DI Leary’s testimony about F1 | 0, 1
LF3 | DI Leary’s testimony about F3 | 0, 1
LJ1 | DI Leary’s testimony about J1 | 0, 1
LJ2 | DI Leary’s testimony about J2 | 0, 1
LB1 | DI Leary’s testimony about B1 | A, B, AB, O
LB2 | DI Leary’s testimony about B2 | A, B, AB, O
LB3 | DI Leary’s testimony about B3 | 0, 1
LB4 | DI Leary’s testimony about B4 | 0, 1
LMSF | DI Leary’s testimony about microspectrofluorimetry | 0, 1
LTLC | DI Leary’s testimony about thin layer chromatography | 0, 1
GC2 | Guard’s testimony about C2 | 0, 1
GC3 (G1) | Guard’s testimony about C3 | 1, 2, 3, 4, 5, 6
GC4 | Guard’s testimony about C4 | 0, 1
GC5 (G2) | Guard’s testimony about C5 | 0, 1
HSC6 | HS’s testimony about C6 | 0, 1
HSC8 | HS’s testimony about C8 | 0, 1
Table 5.1: Variables for the network in Figure 5.4.
Figure 5.4: Bayesian networks for the burglary example.
Once the evidence (E) has been entered in the network, i.e. the observed variables have
been fixed to their observed value, the probabilities of the nodes of interest, namely C6 and
C8, can be updated, and the likelihood ratio
\[
\frac{P(E \mid C_8 = 1)}{P(E \mid C_8 = 0)}
\]
can be computed. It would be useful to show the algebra (exploiting conditional independence relationships) for simplifying computations of this (and other) quantities of interest,
along the lines of what is described in Dawid and Evett (1997).
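As a minimal sketch of this kind of computation, the Python fragment below evaluates a likelihood ratio for C8 by summing out one unobserved variable. It assumes, purely for illustration, a chain C8 → B3 → LB3 in which the only evidence is DI Leary reporting a spray pattern (LB3 = 1). The chain itself, the names P_B3_GIVEN_C8 and P_LB3_GIVEN_B3, and all the probabilities are hypothetical and are not taken from the network in Figure 5.4.

```python
# Toy chain C8 -> B3 -> LB3. Structure and numbers are hypothetical.

# P(B3 = b | C8 = c): shape of blood stain (0 = none, 1 = spray, 2 = other)
P_B3_GIVEN_C8 = {
    1: {0: 0.10, 1: 0.70, 2: 0.20},
    0: {0: 0.90, 1: 0.02, 2: 0.08},
}

# P(LB3 = 1 | B3 = b): probability that the testimony reports a spray pattern
P_LB3_GIVEN_B3 = {0: 0.01, 1: 0.95, 2: 0.10}


def likelihood(c8):
    """P(LB3 = 1 | C8 = c8), obtained by summing out the unobserved B3."""
    return sum(P_B3_GIVEN_C8[c8][b] * P_LB3_GIVEN_B3[b] for b in (0, 1, 2))


def likelihood_ratio():
    """P(E | C8 = 1) / P(E | C8 = 0) for the evidence E = {LB3 = 1}."""
    return likelihood(1) / likelihood(0)


def posterior(prior):
    """P(C8 = 1 | E) from the prior P(C8 = 1): posterior odds = LR * prior odds."""
    odds = likelihood_ratio() * prior / (1.0 - prior)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    print("likelihood ratio:", likelihood_ratio())
    print("posterior for prior 0.25:", posterior(0.25))
```

Because LB3 depends on C8 only through B3 in this toy chain, the sum over B3 is the only algebra needed. In a larger network the same conditional independence argument is applied node by node.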
5.3.2
An object-oriented Bayesian network
It is often possible to find certain structures that are used repeatedly inside a single Bayesian
network and in different networks constructed for different problems. An object-oriented
Bayesian network allows one to define general single networks (or fragments) and to combine
them together in a main network, having as nodes both random variables and Bayesian
networks which are instances of the general fragments previously defined, see Dawid et al.
(2005).
Figure 5.5: Original Bayesian networks for the burglary example.
Figure 5.6: The object-oriented Bayesian networks developed from Figure 5.4.
Figure 5.6 represents the object-oriented network constructed from the simple network
in Figure 5.4, collapsing some of the nodes. Circles are simple variables as in Figure 5.4,
whereas rectangles are instances of the following networks (see Figures 5.7, 5.8 and 5.9)
Report (blue rectangles). This network represents a report (variable R) regarding an
item or event (variable I1). The report can be right, and hence equal to the original
item or event, or wrong, and hence equal to a random item or event (variable I2). The
variable C is a “biased coin” that allows for a non-symmetric error in the report. This
fragment is used here for witnesses who testify about events related to the crime, and
for laboratory analyses (e.g. microspectrofluorimetry) on items related to the crime.
Match (white rectangles). This fragment represents a match between a trace (variable
T) and several possible sources of that trace (actually, only one source is considered
in this example, variable T1). Variable H is the hypothesis variable “Who left the
trace?”. This fragment is used here for describing blood matches between the suspect,
the guard, and the blood found on the suspect’s jumper, as well as the match between
the fibres found at the crime scene and the suspect’s jumper.
Consequence (black rectangles). This network represents events (variable Y) that
are consequences of some possibly false event (variable X), typically the event “Did
the crime happen?”. This fragment is used here to model “meaningless” conditioning,
such as conditioning the break-in variable on the fact that nobody cut the grille (and
hence that nobody even tried to break in). The variable “bernoulli” is itself a network
that corresponds to a Bernoulli random variable with unknown probability parameter.
For a more detailed description see Chapter 3.
Table 5.2 is a description of the nodes in the object-oriented network of Figure 5.6.
Label | Variable description | Instance of
------|----------------------|------------
C1 | I{Someone made a cut in the grille in order to break in} | -
C3 | Number of offenders | -
C4 | I{One of the offenders punched the guard} | -
C6 | I{HS is guilty, i.e. he is one of the offenders} | -
C7 | I{HS was nearby the scene of the crime} | -
F2 | Identity of the person who left the fibre tuft in the grille | -
J1 | I{The jumper shown at trial is HS’s jumper} | -
POLICE 1 | Police testimony about the cut in the grille | Report
POLICE 2 | Police testimony about the break-in | Report
POLICE 3 | Police testimony about the suspect being at the crime scene | Report
POLICE 4 | Police testimony about the authenticity of the fiber tuft presented at trial | Report
POLICE 5 | Police testimony about the match fiber tuft/jumper | Report
POLICE 6 | Police testimony about the result of MSF | Report
POLICE 7 | Police testimony about the result of TLC | Report
POLICE 8 | Police testimony about the match suspect’s blood/blood on jumper | Report
POLICE 9 | Police testimony about the match guard’s blood/blood on jumper | Report
POLICE 10 | Police testimony about the blood on the jumper | Report
POLICE 11 | Police testimony about the authenticity of the jumper shown at trial | Report
GUARD 1 | Guard testimony about the break-in | Report
GUARD 2 | Guard testimony about the number of intruders | Report
GUARD 3 | Guard testimony about the punch | Report
GUARD 4 | Guard testimony about the nose bleeding | Report
SUSPECT 1 | Suspect testimony about himself being guilty | Report
SUSPECT 2 | Suspect testimony about himself being at the crime scene | Report
PHOTO 1 | Picture of the cut in the grille | Report
PHOTO 2 | Picture of the suspect after he was apprehended | Report
BREAK IN | Whether or not there was a break-in | Consequence
BLEEDING | Whether or not the guard’s nose was bleeding | Consequence
IDENTITY | Identity of the person who punched the guard | Consequence
FIBER TRUE | Authenticity of the fiber tuft shown at trial | Consequence
FIBER MATCH | Match between the fiber tuft and the suspect’s jumper | Match
BLOOD 1 | Match between the suspect’s blood and the blood on the jumper | Match
BLOOD 2 | Match between the guard’s blood and the blood on the jumper | Match
Table 5.2: Variables for the network in Figure 5.6.
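One possible reading of the “Report” fragment can be sketched in a few lines of Python. The sketch assumes binary items, takes C = 1 to mean that the report is right (so R = I1), and otherwise sets R to the random item I2. The function name report_cpt and its parameters p_right and p_random_one are hypothetical, and the numbers are illustrative only.

```python
def report_cpt(p_right, p_random_one):
    """Return {i1: P(R = 1 | I1 = i1)} for a binary item I1.

    p_right      -- P(C = 1): the report is right, so R = I1
    p_random_one -- P(I2 = 1): value reported when the report is wrong (R = I2)
    """
    cpt = {}
    for i1 in (0, 1):
        right_part = p_right * (1.0 if i1 == 1 else 0.0)   # C = 1, R copies I1
        wrong_part = (1.0 - p_right) * p_random_one        # C = 0, R copies I2
        cpt[i1] = right_part + wrong_part
    return cpt


# Hypothetical witness: right 90% of the time, otherwise says "1" 20% of the time.
cpt = report_cpt(p_right=0.9, p_random_one=0.2)
false_negative = 1.0 - cpt[1]   # P(R = 0 | I1 = 1)
false_positive = cpt[0]         # P(R = 1 | I1 = 0)
```

With these inputs the two error rates come out at about 0.08 and 0.02, so the error is non-symmetric whenever P(I2 = 1) differs from 1/2. The same function could serve for each POLICE, GUARD, SUSPECT and PHOTO instance, with different parameters per instance.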
Figure 5.7: The “Report” fragment (nodes I1, C, I2, R).
Figure 5.8: The “Match” fragment (nodes H, T1, T).
Figure 5.9: The “Consequence” fragment (nodes Bernoulli, X, Y).
5.4
A comparison between Wigmore charts and Bayesian
networks
Both Wigmore charts and Bayesian networks are graphical methods allowing incorporation
of complex evidence structures. What do they have in common? What are the differences?
Are they complementary methods in some sense? Some remarks/questions about comparing
these two techniques are listed below.
• Wigmore charts are a technique for describing a reasoning process, and for constructing arguments. Bayesian networks are a statistical tool for deriving inferences given
the available evidence. Wigmore charts are deterministic.
• Wigmore charts are constructed backward, after collecting the evidence, whereas
Bayesian networks are meant to be a model for representing how things can happen
(evidence is entered after the network has been built). Arrows in the two methods
have opposite directions.
• No conditional independence in Wigmore charts, but still a notion of relevance?
• No quantification in Wigmore charts: how can one draw inferences from the observed
evidence or formulate hypotheses?
• No construction of arguments in Bayesian networks?
• A lot of details vs. not so many details (take some of the elements for granted in a
Bayesian network).
• In order to build a Bayesian network one has to choose a “starting point”, and hence
take some of the events for granted and implicitly make assumptions which do not
appear in the model, and thus decide what is evidence...
• A Wigmore chart describes different hypotheses at different levels (ultimate, penultimate and intermediate probanda). What corresponds to this in Bayesian
networks? Do we always want a single hypothesis (guilty/not guilty)?
• Construction of the network is a subjective matter, in both cases.
• Is the Wigmore chart method unnecessarily complicated (lots of trifles) from a statistical point of view? (Wigmore charts are better confined to the forensic framework
only?)
Why they are similar
• Graphical methods
• Inference networks
• Subjectivity
• Computations via conditional independence and conditional non-independence
• Models for incorporating complex evidence structures.
Why they are different
• Wigmore charts are constructed backwards after the evidence has been observed
• Bayesian networks are a “process model” (Schum, 2005), since they are intended to
capture a complex process by which some series of events could have been generated
• In order to construct a Bayesian network one needs to make assumptions about events
related to the problem
• Wigmore charts are based on binary propositions true/false
• Wigmore charts are chains of reasoning from the bottom (evidence) to the top (probanda)
• Wigmore charts use generalisations in order to establish connections among variables
• Bayesian networks are entirely probabilistic
• Arrows mean probabilistic dependence in Bayesian networks, whereas they indicate
the inferential flow of reasoning in Wigmore charts.
5.5
Remarks and future work
Many important issues have not been taken into account in the models described above.
Further work is needed in the following areas
• How to evaluate and compare different models, if needed. Sensitivity analysis: how do
the results change when prior probabilities and/or the structure of the network change?
• Which model to use when more than one suspect is involved (see also Levitt and Laskey, 2001)
• Manipulated evidence (e.g. testimonies of witnesses may not be genuine, or items
presented at trial may not be those found at the crime scene or could have been modified
after being found at the crime scene)
• Interactions between different witnesses
• Credibility of witnesses
• Limitations of Hugin when building a network
• How to use generalisations when building Bayesian networks
• Other issues related to the forensic framework are: manipulated evidence, chain of
custody.
A problem that often arises when modelling legal case examples is the following. Wigmore charts always include probanda (ultimate, penultimate and intermediate). Some
of these probanda, when translated into nodes of a Bayesian network, may generate
difficulties in interpreting dependencies. Consider for instance the binary variable
(yes/no) X1 = “Did that specific crime happen?” (which would be a penultimate
probandum in a Wigmore chart) and its child node X2 = “Was the suspect at the crime
scene?”. Assigning conditional probabilities P(X2 | X1 = no) is of course meaningless
(if there is no crime there is no crime scene). A solution could be to define X2 as
a three-state variable, taking values “yes”, “no” and “not applicable”, and to do the same
for all its children and so on (the idea is to “interrupt” the flow of information going
through that part of the network).
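The three-state device can be illustrated with a small Python sketch. Here X3 is a hypothetical child of X2 (for instance a report about X2), and every probability is made up for the example. The point is only that the “not applicable” state is forced, with probability one, along the whole branch whenever X1 = no.

```python
NA = "not applicable"
STATES = ("yes", "no", NA)

# P(X2 | X1): "Was the suspect at the crime scene?" given "Did the crime happen?"
P_X2_GIVEN_X1 = {
    "yes": {"yes": 0.30, "no": 0.70, NA: 0.0},
    "no":  {"yes": 0.0,  "no": 0.0,  NA: 1.0},   # no crime, hence no crime scene
}

# P(X3 | X2): hypothetical child of X2, extended with the same third state
P_X3_GIVEN_X2 = {
    "yes": {"yes": 0.80, "no": 0.20, NA: 0.0},
    "no":  {"yes": 0.10, "no": 0.90, NA: 0.0},
    NA:    {"yes": 0.0,  "no": 0.0,  NA: 1.0},
}


def p_x3_given_x1(x1):
    """Distribution of X3 given X1, obtained by summing out X2."""
    out = {s: 0.0 for s in STATES}
    for x2 in STATES:
        for x3 in STATES:
            out[x3] += P_X2_GIVEN_X1[x1][x2] * P_X3_GIVEN_X2[x2][x3]
    return out


if __name__ == "__main__":
    print(p_x3_given_x1("yes"))
    print(p_x3_given_x1("no"))
```

Given X1 = no, all the mass of X3 sits on “not applicable”, so nothing observed below X2 can carry information in that branch. The cost is that every descendant needs the extra state and a correspondingly larger table.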
Chapter 6
A Bayesian network analysis of the
Sacco and Vanzetti case
The Sacco and Vanzetti case, a very famous case in American legal history, has been
analysed from a probabilistic point of view in Kadane and Schum (1996). A Bayesian
network analysis is described in Cheung (2005). Here we attempt a more sophisticated
Bayesian network analysis.
6.1
The case
The following is a description of the Sacco and Vanzetti case from Cheung (2005).
After a robbery that took place at about 3 pm on 15 April 1920 in South Braintree,
Massachusetts, Nicola Sacco and Bartolomeo Vanzetti, both Italian immigrants with ties
to the anarchist movement, were convicted of first-degree murder for shooting and killing
Alessandro Berardelli and Frederick Parmenter, the payroll guards who were taking two
iron boxes containing a total of over $15000 from one factory of the Slater and
Morrill shoe company to the second Slater and Morrill factory in South Braintree. During their
journey, two men who were leaning against a pipe-rail fence attacked them from behind.
After the incident, the two men escaped in a black car, with three other men, who had
picked them up at the scene of the crime. Berardelli and Parmenter were both dead. On 5
May the same year, Sacco and Vanzetti were arrested after they had gone with two other
Italians to a garage to claim a car that local police had connected with the crime. After
legal proceedings that lasted for more than 7 years, the two men were executed on 23
August 1927, despite many witnesses giving contradictory and conflicting evidence.
The hypothesis of interest, which we denote by H, is
H = Did Sacco shoot Berardelli?
a binary random variable (H = 1 if Sacco is guilty). We want to use the available evidence,
which we denote by E, to assess the likelihood ratio
\[
\frac{P(E \mid H = 1)}{P(E \mid H = 0)}.
\]
Notice that we could include Vanzetti as a suspect. Moreover, the hypothesis of interest
can be formulated in a different way: for instance, a multiple state variable “Who shot
Berardelli?”.
6.2
Items of evidence
The case is very complicated, and so is the available evidence. We consider different categories
of evidence
• Witness evidence, E_W
• Physical evidence, E_P
• Consciousness of guilt evidence, E_C.
See Cheung (2005) for more details. These are typical items of evidence in criminal cases,
see also Chapter 5. The aim of using Bayesian networks is to create a structure for both
representing such complex evidence and making sense of it. The resulting likelihood ratio
will be
\[
\frac{P(E \mid H = 1)}{P(E \mid H = 0)}
= \frac{P(E_W \mid H = 1)}{P(E_W \mid H = 0)}
\times \frac{P(E_P \mid H = 1)}{P(E_P \mid H = 0)}
\times \frac{P(E_C \mid H = 1)}{P(E_C \mid H = 0)}
\]
if we make the (restrictive) assumption that different sources of evidence are conditionally
independent given H, where E = E_W ∪ E_P ∪ E_C.
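Under this independence assumption the combination step is just a product, or a sum on the log scale. The Python sketch below shows the arithmetic. The three likelihood ratio values and the names component_lrs, combined_likelihood_ratio and combined_log10_lr are hypothetical placeholders, not results from the case.

```python
import math


def combined_likelihood_ratio(lrs):
    """Product of component likelihood ratios (valid only if the sources of
    evidence are conditionally independent given H)."""
    return math.prod(lrs)


def combined_log10_lr(lrs):
    """Same combination on the log10 scale, where the components simply add."""
    return sum(math.log10(lr) for lr in lrs)


# Hypothetical component likelihood ratios, for illustration only.
component_lrs = {
    "witness": 4.0,                  # P(E_W | H = 1) / P(E_W | H = 0)
    "physical": 10.0,                # P(E_P | H = 1) / P(E_P | H = 0)
    "consciousness_of_guilt": 0.5,   # P(E_C | H = 1) / P(E_C | H = 0)
}

total_lr = combined_likelihood_ratio(component_lrs.values())
total_log10_lr = combined_log10_lr(component_lrs.values())
```

A component below 1, like the third value here, counts against H and lowers the product. The log scale makes each source's contribution to the total easy to read off.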
6.2.1
Witness evidence
A tentative representation of the witness evidence is in Figure 6.1. When dealing with
testimonial evidence, the credibility of the witnesses plays an important role. An idea would
be to use object-oriented Bayesian networks to describe witness credibility, based both on
moral judgements about the person and on objective criteria (for instance, whether there was
enough light for the witness to see). Moreover, interactions between witnesses have to
be taken into account (for instance, the influence of one witness on another), though in
Figure 6.1 they are considered as independent. Also, mistake and deception have to be taken
into account when assessing relevance of testimonies, see Chapter 3. The variables involved
are
H   Sacco did it?
H1  was Sacco at the crime scene?
P   Pelser’s testimony
W   Wade’s testimony
C   Costantino’s testimony.
After having established the probabilistic relationships among witnesses and the relevant
hypotheses, conditional independencies can be exploited as in Cheung (2005) for likelihood
ratio computations. The likelihood ratio based on witness evidence
\[
\frac{P(E_W \mid H = 1)}{P(E_W \mid H = 0)}
\]
will be combined with the likelihood ratios based on the remaining evidence.
Figure 6.1: Witness evidence for the Sacco and Vanzetti case (nodes H, H1, S, W, P, C).
Other witness evidence relates to Sacco’s alibi, but it is not considered here. See also
the “Report” fragment in Chapter 3.
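The way conditional independence simplifies the witness likelihood ratio can be sketched as follows. The sketch assumes a structure H → H1 → {P, W, C}, with the three testimonies independent given H1. This structure is a guess, since Figure 6.1 is not reproduced here, and all probabilities and testimony values are hypothetical.

```python
from math import prod

# P(H1 = 1 | H = h): Sacco was at the crime scene, given whether he did it
P_H1_GIVEN_H = {1: 1.0, 0: 0.1}

# P(witness places Sacco at the scene | H1 = h1), one entry per witness
P_SAYS_YES = {
    "Pelser":     {1: 0.7, 0: 0.1},
    "Wade":       {1: 0.6, 0: 0.2},
    "Costantino": {1: 0.5, 0: 0.1},
}


def witness_likelihood(h, testimonies):
    """P(E_W | H = h): sum over H1 of P(H1 | H) times the product of the
    per-witness terms, which factorise because of independence given H1."""
    total = 0.0
    for h1 in (0, 1):
        p_h1 = P_H1_GIVEN_H[h] if h1 == 1 else 1.0 - P_H1_GIVEN_H[h]
        p_testimonies = prod(
            P_SAYS_YES[w][h1] if says_yes else 1.0 - P_SAYS_YES[w][h1]
            for w, says_yes in testimonies.items()
        )
        total += p_h1 * p_testimonies
    return total


def witness_lr(testimonies):
    """P(E_W | H = 1) / P(E_W | H = 0)."""
    return witness_likelihood(1, testimonies) / witness_likelihood(0, testimonies)


# Hypothetical input: all three witnesses place Sacco at the scene.
all_yes = {"Pelser": True, "Wade": True, "Costantino": True}
lr_all_yes = witness_lr(all_yes)
```

Modelling interactions between witnesses, as discussed above, would break the product inside the sum and require a joint table for the testimonies given H1.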
6.2.2
Physical evidence
The following items were introduced as evidence at trial: a 32-caliber Winchester bullet
(exhibit 18) extracted from Berardelli’s body, a 32-caliber Colt (exhibit 28) belonging to
Sacco, a Winchester shell found at the crime scene, a cap belonging to Sacco found at the
crime scene. We define the following variables
B1  characteristics of the bullet
B2  was the bullet really extracted from Berardelli’s body?
B3  was Berardelli killed by that bullet?
B4  expert’s testimony about the bullet
G1  characteristics of the gun
G2  does the gun really belong to Sacco?
G3  Sacco’s testimony about the gun
G4  police’s testimony about the gun
S1  characteristics of the shell
S2  was the shell really found at the crime scene?
S3  expert’s testimony about the shell
T   result from the firing test
C1  characteristics of the cap shown at trial
C2  characteristics of Sacco’s hat
C3  was the cap found at the crime scene?
C4  Sacco’s testimony about the cap
C5  other testimonies about the cap.
The last variable C5 only indicates that there is a complicated structure of further testimonies about these items, which we will not model for now. As often happens when
dealing with evidence related to criminal cases, this is a problem of matching different
items of evidence. This is described in more detail in Chapter 3. We construct two separate
Bayesian networks, as the firearm evidence relates to H, whereas the cap evidence relates
to H1 , and they can be considered as independent.
Figure 6.2: Firearm for the Sacco and Vanzetti case (nodes H, B1, B2, B3, B4, G1, G2, G3, G4, S1, S2, S3, T).
Figure 6.3: Cap evidence for the Sacco and Vanzetti case (nodes H1, C1, C2, C3, C4, C5).
6.2.3
Consciousness of guilt evidence
This kind of evidence is more difficult to interpret. It relates to the behaviour of Sacco at
the moment of his arrest, see Figure 6.4. We consider the following variables
X1  Sacco was conscious when he was arrested
X2  Sacco was somehow involved with the crime
X3  Sacco was involved in other crimes
X4  Sacco intended to escape from the police when they arrested him
X5  Sacco attempted to take the gun out of his coat when he was arrested
X6  Sacco’s testimony
X7  police’s testimony.
6.2.4
Combining all the evidence
If we combine the three sources of evidence together and consider them as conditionally
independent (restrictive assumption) we obtain a network like the one in Figure 6.5. We
then have to fix the values of the observed variables.
Figure 6.4: Consciousness of guilt evidence for the Sacco and Vanzetti case (nodes H, X1, X2, X3, X4, X5, X6, X7).
Figure 6.5: Combining all the evidence for the Sacco and Vanzetti case (nodes H, H1, E1, E2, E3).
6.3
Remarks and future work
The model described above is a very simple one, and the idea is to refine it in future work.
Also, not all the elements in the case have been taken into account. Interesting
issues include
• How to combine sources of evidence (not just independence)
• How to relate the Bayesian network analysis to the Wigmore chart analysis in Kadane
and Schum (1996)
• Witness evidence and witness credibility
and more...
References
Anderson, T., Schum, D. and Twining, W. (2005). Analysis of evidence, second
edition. Cambridge University Press.
Baio, G. and Corradi, F. (2004). Handling Manipulated Evidence. Working Paper
no. 13, Department of Statistics “G. Parenti”, University of Florence, Italy.
Cavallini, D. and Corradi, F. (2005). OOBN for forensic identification through searching a DNA profile’s database. In Proceedings of AISTATS 2005.
Cheung, C. (2005). The analysis of mixed evidence using graphical and probability
models with application to the Sacco and Vanzetti case. BSc project, UCL.
Dawid, A. P. (1987). The difficulty about conjunction. The Statistician 36, 91-97.
Dawid, A. P. (2000). Causal inference using influence diagrams: the problem of partial
compliance. Research Report no. 213, Department of Statistical Science, University
College London.
Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review 70, 161-189.
Dawid, A. P. (2003). An object-oriented Bayesian network for estimating mutation
rates. In Proceedings of the Ninth International Workshop on Artificial Intelligence
and Statistics, January 3-6 2003, Key West, Florida, edited by Christopher M. Bishop
and Brendan J. Frey. ISBN 0-9727358-0-1.
Dawid, A. P. (2005). Statistics and the law. In Evidence, edited by Karin Tybjerg,
John Swenson-Wright and Andrew Bell. Cambridge University Press (to appear).
Dawid, A. P. and Evett, I. W. (1997). Using a graphical method to assist the evaluation of complicated patterns of evidence. Journal of Forensic Sciences 42, 226-231.
Dawid, A. P., Mortera, J. and Vicard, P. (2005). Object-oriented Bayesian networks
for complex forensic DNA profiling problems. Technical Report 256, Department of
Statistical Science, University College London.
Dawid, A. P. and Schum, D. A. (2004). Bayes, Wigmore and inference networks: a
dialogue. Technical report.
Kadane, J. B. and Schum, D. A. (1996). A probabilistic analysis of the Sacco and
Vanzetti evidence. Wiley.
Lauritzen, S. L. (2003). Graphical models for surrogates. Bulletin of the International
Statistical Institute 60, 144-147.
Leucari, V. (2005). Analysis of complex patterns of evidence in legal cases: Wigmore
charts vs. Bayesian networks. Working paper.
Levitt, T. S. and Laskey, K. B. (2001). Computational Inference for Evidential Reasoning in support of Judicial Proof, Cardozo Law Review 22, 1691-1731.
Pearl, J. (2000). Causality. Cambridge University Press.
Schum, D. A. (2001). Evidential foundations of probabilistic reasoning. Northwestern.
Schum, D. A. (2004). Capturing an interesting subtlety involving a source of testimonial evidence. Technical report.
Schum, D. A. (2005). A Wigmorean interpretation of the evaluation of a complicated
pattern of evidence. Technical report.