Formal tools for handling evidence

Valentina Leucari
Leverhulme/ESRC Research Programme “Evidence, inference and enquiry”
Department of Statistical Science
University College London
2005-2006

Table of Contents

1 Introduction
2 Bayesian networks for the analysis of evidence
  2.1 Evidence and Bayesian networks
3 Bayesian network fragments for representing evidence
  3.1 Some recurrent fragments
  3.2 Remarks and future work
4 Recurrent combinations of evidence
  4.1 Contradiction and corroboration
  4.2 Conflict and convergence
  4.3 Remarks and future work
5 Evidence in legal cases: Wigmore charts and Bayesian networks
  5.1 A criminal case example
  5.2 The Wigmore chart analysis
  5.3 The Bayesian network analysis
    5.3.1 A simple Bayesian network
    5.3.2 An object-oriented Bayesian network
  5.4 A comparison between Wigmore charts and Bayesian networks
  5.5 Remarks and future work
6 A Bayesian network analysis of the Sacco and Vanzetti case
  6.1 The case
  6.2 Items of evidence
    6.2.1 Witness evidence
    6.2.2 Physical evidence
    6.2.3 Consciousness of guilt evidence
    6.2.4 Combining all the evidence
  6.3 Remarks and future work
References

Chapter 1
Introduction

This is a report of my research activity within the “Evidence, Inference and Enquiry” programme. I have been involved in the “Formal Tools for Handling Evidence” project, and my work has been motivated by the investigation of the use of probability and statistics for analysing evidence. The aim of the project is to

• Identify generic principles for representing and handling evidence
• Develop formal methods for expressing and manipulating them
• Explore their applications.

From a statistical perspective, a formal analysis of evidence entails a description of both the problem and the related evidence through a model, identification of the relevant hypotheses, quantification of prior knowledge and application of probabilistic techniques to evaluate the evidence. Within this general framework, I have focused my research on a specific statistical tool, namely Bayesian networks, with the aim of analysing complex structures of evidence arising in different areas.

Contributions to the overall “Evidence, Inference and Enquiry” programme entail exploring the application of formal methods to different disciplines, developing a systematic and rigorous method to analyse evidence, and providing general tools for drawing inferences from the observed evidence. Most of my research work so far has focused on representing probabilistic structures of the evidence through Bayesian networks.
Although some topics overlap across chapters, the main areas of interest are

• Bayesian networks for representing and evaluating complex evidence (see Chapters 2 and 3): some features of Bayesian networks, such as conditional independence relationships, causality, evidence propagation and the incorporation of new evidence, make them a powerful tool for evidential reasoning.
• Representation of recurrent structures of the evidence (see Chapters 3 and 4): we define simple Bayesian network fragments describing recurrent and very general patterns in the way evidence arises, and combine them in object-oriented networks.
• Interactions between different items of evidence (see Chapter 4): different sources of evidence may exhibit interaction patterns that determine the inference drawn on certain hypotheses of interest.
• Analysis of evidence in legal cases (see Chapters 5 and 6): we introduce Bayesian networks as a method for a probabilistic description of legal cases in terms of the available evidence, and illustrate them with both fictitious examples and real cases. We also compare Bayesian networks and Wigmore charts, a graphical method used in forensic science for describing legal reasoning.

In the following chapters, the research developed in these areas is briefly presented, together with directions for future work. Most of this research is still work in progress.

Chapter 2
Bayesian networks for the analysis of evidence

Evidence interpretation has principles and general aspects that are common to different disciplines. Evaluating complex patterns of evidence raises the problem of understanding all of the dependencies that may exist between different aspects of the evidence (consider, for instance, multiple sources of evidence in legal cases). A graphical method can provide a valuable aid for overcoming such difficulties.
In this chapter, we briefly describe some features of Bayesian networks that make them a powerful tool for the analysis of evidence.

2.1 Evidence and Bayesian networks

Here are some general thoughts on why it is interesting (and hopefully useful) to use Bayesian networks. Some of these issues will be discussed in the following chapters.

• Items of evidence form complex interrelated chains or webs, where the relevance and weight of any specific piece of evidence can only be assessed in the light of its relation to other pieces of evidence. Bayesian networks accomplish this by means of conditional independence relationships.
• According to Schum (2001), evidence has to be evaluated on the basis of three fundamental attributes: 1) relevance, 2) credibility, 3) strength. In terms of Bayesian networks, this translates into 1) value of information (it is possible to quantify the impact of additional evidence in a certain model), 2) conditional probability tables (by specifying CPTs we assess the credibility of the items of evidence), 3) likelihood ratio.
• Evidence can be entered in a model either by fixing the value of some variables or by introducing likelihood evidence.
• Evidence can be propagated through the model, so additional evidence can be entered at any time during the process.
• Prior information (conditional probability tables) can be used when defining a model.
• A model can be causal, i.e. causal relationships between different items of evidence can be taken into account.
• Complex evidence structures can be easily handled by object-oriented Bayesian networks.
• Bayesian networks allow for: 1) representing the available evidence, both qualitatively and quantitatively, 2) drawing (statistical) inference from the evidence.
• It is possible to perform sensitivity analyses and evaluate different models for the same problem.
• Possibility of combining the semantics and the syntax of the evidence in a Bayesian network?
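The two ways of entering evidence mentioned in the list above can be illustrated on the smallest possible network, an event E with a single source report S (E → S). The following is a minimal sketch, not the computation performed by any particular software: the function name, the prior and the conditional probability table are hypothetical values chosen only for illustration. Fixing the value of S is treated as the special case of likelihood evidence that puts zero weight on the unobserved state.

```python
def posterior_event(prior_e1, cpt_s_given_e, likelihood_s):
    """Posterior P(E = 1 | evidence on S) for the two-node network E -> S.

    prior_e1            : prior probability P(E = 1)
    cpt_s_given_e[e][s] : conditional probability P(S = s | E = e)
    likelihood_s[s]     : likelihood weight attached to the state S = s
                          (hard evidence S = 1 is the vector (0, 1))
    """
    prior = (1.0 - prior_e1, prior_e1)
    # Unnormalised posterior: P(E = e) * sum_s P(S = s | E = e) * L(s)
    unnormalised = [
        prior[e] * sum(cpt_s_given_e[e][s] * likelihood_s[s] for s in (0, 1))
        for e in (0, 1)
    ]
    return unnormalised[1] / sum(unnormalised)


# Hypothetical numbers, used only to illustrate the mechanics.
PRIOR_E1 = 0.5
CPT = {0: (0.8, 0.2),   # P(S = 0 | E = 0), P(S = 1 | E = 0)
       1: (0.1, 0.9)}   # P(S = 0 | E = 1), P(S = 1 | E = 1)

hard = posterior_event(PRIOR_E1, CPT, (0.0, 1.0))   # S fixed to 1
soft = posterior_event(PRIOR_E1, CPT, (0.25, 1.0))  # S = 1 four times as likely as S = 0
print(f"hard evidence: {hard:.3f}, likelihood evidence: {soft:.3f}")
```

With these illustrative numbers, likelihood evidence moves the posterior in the same direction as hard evidence but by a smaller amount, and a flat likelihood vector leaves the prior unchanged.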
Some potential drawbacks

• In some problems (e.g. legal cases) the same evidence is used twice: for building the network and for computing likelihood ratios. Bayesian networks are a tool for making inferences; conclusions that have already been inferred should not be included in the net.
• Limitations of Hugin (nodes cannot be output and input at the same time, conditional probability tables cannot be changed when instances of a certain network are used in different contexts, and others).
• In some problems (see Chapter 5) it would be useful to have a “dynamic” model which allows for taking decisions depending on the observed evidence.
• Problems when there is the need for conditioning on variables that are at the bottom of the net (as sometimes happens in Wigmore charts?)

Chapter 3
Bayesian network fragments for representing evidence

It is often possible to find certain structures that are used repeatedly within a single Bayesian network and also across networks constructed for different problems. An object-oriented Bayesian network allows one first to define general basic networks (we call them fragments) and then to combine them in a main network whose nodes are both random variables and Bayesian networks that are instances of the previously defined fragments; see Dawid (2003) and Dawid et al. (2005).

We are interested in finding Bayesian network fragments for representing recurrent evidence structures. Relationships between different items of evidence, the process whereby evidence arises and evidential reasoning all exhibit recurrent patterns that can be captured by small general idioms. Schum (2004) discusses recurrent interaction schemes between items of evidence. In Levitt and Laskey (2001), fragments are introduced in a legal context.

3.1 Some recurrent fragments

Below, we present a few general fragments together with examples of their applications.
Combining these recurrent network structures allows an easy representation of mixed sources of evidence. This has been done in legal examples (see Chapters 5 and 6), but models of this kind are applicable in other frameworks and, more generally, whenever we deal with items of evidence. In Leucari (2005) an object-oriented Bayesian network has been constructed for a fictitious burglary example, using some of the network structures described below (see also Chapter 5).

Report. This network represents a report about a certain item of evidence (a physical item or an event). The structure of the network is shown in Figure 3.1, and the variables are

$I_1$ the true item or event (input node)
$I_2$ a “randomly chosen” item or event (unrelated to the true one)
$R$ the source’s report about $I_1$
$C$ a biased coin (with an assigned probability parameter).

Figure 3.1: The “Report” fragment (nodes $I_1$, $C$, $I_2$, $R$).

The report can be correct, and hence equal to the true item $I_1$, or wrong, and hence equal to some random item $I_2$, independent of $I_1$. The biased coin is used for modelling non-symmetric error, as it is reasonable to assume that the source is more likely to make mistakes in one direction (e.g. it is more likely that the source says “true” when the event is “false” than “false” when the event is “true”, or vice versa). This fragment can be used, for instance, for modelling witness testimonies, or results from laboratory analyses, e.g. on blood or DNA samples (see the burglary example in Chapter 5). If we assume that $I_i$, $i = 1, 2$, are binary variables and define

\[ P(I_2 = 1) = \beta, \qquad P(C = \text{head}) = \pi, \]

so that $\beta = 1/2$ corresponds to symmetric error, we can compute the probabilities of obtaining a correct report as

\[ f_1 = P(R = 1 \mid I_1 = 1) = \pi + \beta(1 - \pi) \]
\[ f_0 = P(R = 0 \mid I_1 = 0) = \pi + (1 - \beta)(1 - \pi) \]

so that $\pi = f_0 + f_1 - 1$. Notice that $\pi = 0$ (i.e. $f_1 = \beta$ and $f_0 = 1 - \beta$) corresponds to deception, whereas $\pi = 1$ (i.e. $f_1 = 1$ and $f_0 = 1$) corresponds to accuracy.
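The accuracy probabilities of the “Report” fragment can be checked numerically. The sketch below assumes, as in the description above, that the report equals the true item when the coin shows head and equals the unrelated item otherwise. The function names and the numerical values are hypothetical, introduced only for illustration. The closed-form expressions are compared with a direct enumeration over the coin C and the unrelated item, and the relation between the coin parameter and the two accuracies is recovered.

```python
from itertools import product


def report_accuracy(pi, beta):
    """Closed-form (f1, f0) for the binary "Report" fragment.

    pi   = P(C = head): probability that the report copies the true item I1
    beta = P(I2 = 1):   probability that the unrelated item I2 equals 1
    """
    f1 = pi + beta * (1.0 - pi)          # P(R = 1 | I1 = 1)
    f0 = pi + (1.0 - beta) * (1.0 - pi)  # P(R = 0 | I1 = 0)
    return f1, f0


def report_accuracy_by_enumeration(pi, beta):
    """Same quantities, obtained by summing over the states of C and I2."""

    def prob_report(r, i1):
        total = 0.0
        for head, i2 in product((True, False), (0, 1)):
            weight = (pi if head else 1.0 - pi) * (beta if i2 == 1 else 1.0 - beta)
            reported = i1 if head else i2  # assumed rule: R = I1 on head, I2 otherwise
            if reported == r:
                total += weight
        return total

    return prob_report(1, 1), prob_report(0, 0)


# Hypothetical parameter values, for illustration only.
f1, f0 = report_accuracy(pi=0.6, beta=0.7)
g1, g0 = report_accuracy_by_enumeration(pi=0.6, beta=0.7)
recovered_pi = f0 + f1 - 1.0
print(f1, f0, recovered_pi)
```

The enumeration makes the role of the coin explicit: with probability π the report is exact, and otherwise it is correct only when the unrelated item happens to coincide with the true one.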
It must also be taken into account that errors happen for different reasons: the source can be mistaken, or deceptive, etc. Assuming that mistake and deception are the only two reasons for an error, a possible model for the source’s error is shown in Figure 3.2, where $E$ = error, $M$ = mistake, $D$ = deception.

Figure 3.2: A model for errors in a report (nodes $M$, $D$, $E$).

A possible functional relationship (assuming binary variables) is

\[ E = MD + M(1 - D) + D(1 - M), \]

that is, $E = 1 - (1 - M)(1 - D)$, the logical OR of $M$ and $D$. This needs further investigation.

Match. This network represents a situation where a match between different items of evidence is investigated. For instance, a trace left at the crime scene has to be compared to traces coming from different suspects, e.g. blood or DNA samples. This kind of model has been studied in Cavallini and Corradi (2005) in a legal context. See also Chapter 5 for further legal applications. The network structure is shown in Figure 3.3, and the variables are

$H$ hypothesis of interest, for instance “Is the suspect guilty?” (input node)
$T$ trace we are interested in
$T_1, \ldots, T_n$ possible sources for the trace.

Figure 3.3: The “Match” fragment (nodes $H$, $T_1, T_2, \ldots, T_n$, $T$).

Variable $H$ should have $n$ states so that, for example, evidence of $T = T_i$ would increase the probability of suspect $i$ being guilty. Notice that, when using instances of this network in different contexts, it might be useful to be able to leave $n$ unspecified, and fix it according to the problem at hand (this is not currently feasible in Hugin). A possible solution could be the model in Figure 3.4, where $N$ is a random variable uniformly distributed over $\{1, \ldots, n\}$, to be fixed to the desired number, depending on the problem.

Figure 3.4: An alternative “Match” fragment (nodes $H$, $T_1, T_2, \ldots, T_n$, $T$, $N$).

Contradiction. This network represents a situation where two sources give contradictory evidence, and has been introduced by Schum (2001, 2004).
The network is shown in Figure 3.5 (see also Figure 4.1), and the variables are

$H$ hypothesis of interest (the arrow is dotted since this variable is not strictly part of the fragment)
$E$ evidence (input node)
$S_1$ source 1 of evidence
$S_2$ source 2 of evidence.

Figure 3.5: The “Contradiction” fragment (nodes $H$, $E$, $S_1$, $S_2$).

See also Chapter 4 for a more detailed analysis. Notice that since contradictory evidence involves events that cannot happen at the same time, the credibility of the sources becomes relevant (see the next section for a model for credibility).

Conflict. This network represents a situation where two sources of evidence are in conflict. Like the “Contradiction” fragment above, this is a model of dissonant evidence, but unlike the “Contradiction” model, the “Conflict” model does not necessarily involve incompatible sources. This is because the two sources refer to two different events. The network is shown in Figure 3.6 (see also Figure 4.2), and the variables are

$H$ hypothesis of interest (does not directly enter the fragment)
$E_1$ intermediate evidence (input node)
$E_2$ intermediate evidence, unrelated to $E_1$ (input node)
$S_1$ source 1 of evidence
$S_2$ source 2 of evidence.

Figure 3.6: The “Conflict” fragment (nodes $H$, $E_1$, $E_2$, $S_1$, $S_2$).

For more detailed explanations, see Schum (2001).

Corroboration. A model for harmonious evidence: two sources give corroborative evidence about a hypothesis of interest, see Schum (2001). The model is the same as the one for contradictory evidence (compare Section 4.1), except for the values of $S_1$ and $S_2$ (which must be equal).

Convergence. A model for harmonious evidence: two sources give convergent evidence, though only indirectly related to a hypothesis of interest, see Schum (2001). The model is the same as the one for conflicting evidence (compare Section 4.2), except for the values of $S_1$ and $S_2$ (which must be equal).

3.2 Remarks and future work

There are some more fragments that have not been explored yet. They are listed below.

Credibility. Credibility of a source of evidence depends on many factors, both subjective and objective.
This fragment has been elaborated by Dawid and Schum. There are two levels: the top level is the “Credibility” fragment, which contains instances of other networks, “Sensation” and “Filter”. The “Credibility” network is shown in Figure 3.7, and the variables are

$E$ event
$C$ competence
$S$ sensation, instance of “Sensation”
$O$ objectivity, instance of “Filter”
$V$ veracity, instance of “Filter”
$T$ testimony.

Figure 3.7: The “Credibility” fragment (nodes $E$, $C$, $S$, $O$, $V$, $T$).

The “Sensation” network is shown in Figure 3.8 and the variables are

$C$ competence
$A$ agreement
$S$ sensation.

Figure 3.8: The “Sensation” fragment (nodes $C$, $A$, $S$).

Finally, the “Filter” network is shown in Figure 3.9 and the variables are

In in
Co correct
Out out.

Figure 3.9: The “Filter” fragment (nodes In, Co, Out).

These networks are self-explanatory. Notice that this fragment can be used in conjunction with the “Report” fragment for modelling the reliability of the source.

Manipulation. An issue directly related to the credibility of the sources is that of manipulated evidence. This network, introduced in Baio and Corradi (2004), models a situation of uncertainty about whether the available evidence is genuine or somehow manipulated. The network for manipulated evidence is shown in Figure 3.10 and the variables are

$H$ hypothesis of interest (input node)
$W$ indicator of presence/absence of manipulation
$T$ uncertain evidence
$A$ control evidence.

Figure 3.10: The “Manipulated evidence” fragment (nodes $H$, $W$, $T$, $A$).

Evidence $T$ can be either genuine or manipulated, but this is unknown. Variable $A$ is additional evidence, certainly genuine, that helps to investigate the origin of the unclear node $T$. For a more detailed description see Baio and Corradi (2004).

Explaining away. Two possible causes of the same event; knowing that one of them is true lowers the probability of the other, see for example Pearl (2000). The network is shown in Figure 3.11 and the variables are

$X_1$ cause 1
$X_2$ cause 2
$Y$ event.
Figure 3.11: The “Explaining away” fragment (nodes $X_1$, $X_2$, $Y$).

Confounding. This network represents the effect of some variable that confounds the relationships between two or more variables, see Dawid (2000, 2002) and Lauritzen (2003). The network is shown in Figure 3.12 and the variables are

$U$ unobserved confounder
$T$ “treatment” variable
$S$ covariate
$R$ “response” variable.

Figure 3.12: The “Confounding” fragment (nodes $U$, $T$, $S$, $R$).

More fragments to be developed

• Interactions between witnesses (when the testimony of some witness is influenced or forced by some other witness)
• Alternative explanations for the same event (as used in Wigmore charts, see Anderson et al., 2005)
• The use of generalisations for deriving inferences (see Wigmore charts)?

Chapter 4
Recurrent combinations of evidence

We consider recurrent patterns of interaction between evidence items, as described in Schum (2001), and give a probabilistic interpretation of such structures. Recurrent combinations of evidence include both dissonant (contradiction and conflict) and harmonious (corroboration and convergence) evidence. Combinations of evidence items are also discussed in Dawid (1987).

4.1 Contradiction and corroboration

Consider an item of evidence $E$ and two sources of evidence $S_1$ and $S_2$ for that item. The probabilistic structure of this kind of interaction is represented in Figure 4.1, where $S_1$ and $S_2$ are conditionally independent given $E$. The variable $H$ is the hypothesis of interest for the overall problem, but we will not discuss its role for the moment (this is the reason for the dotted line in the picture). We assume all variables are binary.

Figure 4.1: Contradiction/corroboration (nodes $H$, $E$, $S_1$, $S_2$).

Generally, we observe $S_1$ and $S_2$, whereas $E$ is not observed, and we are interested in the likelihood ratio

\[ \lambda = \frac{P(S_1, S_2 \mid E = 1)}{P(S_1, S_2 \mid E = 0)} = \frac{P(S_1 \mid E = 1)}{P(S_1 \mid E = 0)} \times \frac{P(S_2 \mid E = 1)}{P(S_2 \mid E = 0)}. \]

Evidence can be as follows

i) $S_1 = 1$, $S_2 = 0$
ii) $S_1 = 0$, $S_2 = 1$
iii) $S_1 = 1$, $S_2 = 1$
iv) $S_1 = 0$, $S_2 = 0$.

Evidence as in (i) and (ii) is termed contradictory, whereas evidence as in (iii) and (iv) is termed corroborating, for obvious reasons. The usual parameterisation is by means of conditional probabilities, as shown in Tables 4.1 and 4.2.

              $E = 0$          $E = 1$
$S_1 = 0$     $\beta_1$        $1 - \alpha_1$
$S_1 = 1$     $1 - \beta_1$    $\alpha_1$

Table 4.1: Conditional probability table for $S_1 \mid E$.

              $E = 0$          $E = 1$
$S_2 = 0$     $\beta_2$        $1 - \alpha_2$
$S_2 = 1$     $1 - \beta_2$    $\alpha_2$

Table 4.2: Conditional probability table for $S_2 \mid E$.

Such probabilities represent the prior beliefs in the sources telling the truth about $E$, namely

\[ \alpha_1 = P(S_1 = 1 \mid E = 1), \qquad \beta_1 = P(S_1 = 0 \mid E = 0), \]
\[ \alpha_2 = P(S_2 = 1 \mid E = 1), \qquad \beta_2 = P(S_2 = 0 \mid E = 0). \]

When dealing with contradictory evidence a fundamental issue is assessing the credibility of the sources. A useful reparameterisation is given by the likelihood ratios

\[ \lambda_1^+ = \frac{P(S_1 = 1 \mid E = 1)}{P(S_1 = 1 \mid E = 0)} = \frac{\alpha_1}{1 - \beta_1}, \qquad \lambda_1^- = \frac{P(S_1 = 0 \mid E = 1)}{P(S_1 = 0 \mid E = 0)} = \frac{1 - \alpha_1}{\beta_1}, \]
\[ \lambda_2^+ = \frac{P(S_2 = 1 \mid E = 1)}{P(S_2 = 1 \mid E = 0)} = \frac{\alpha_2}{1 - \beta_2}, \qquad \lambda_2^- = \frac{P(S_2 = 0 \mid E = 1)}{P(S_2 = 0 \mid E = 0)} = \frac{1 - \alpha_2}{\beta_2}, \]

which directly represent credibilities. The corresponding probability tables are shown in Tables 4.3 and 4.4. The overall likelihood ratio in cases (i)-(iv) is respectively

\[ \lambda = \lambda_1^+ \lambda_2^-, \qquad \lambda = \lambda_1^- \lambda_2^+, \qquad \lambda = \lambda_1^+ \lambda_2^+, \qquad \lambda = \lambda_1^- \lambda_2^-. \]

When the sources are contradictory they contribute to the overall likelihood ratio in opposite ways: the impact of the sources largely depends on their credibility. Credibility is not so relevant when the evidence is corroborative. The relationship between the two parameterisations is

\[ \alpha_1 = \frac{\lambda_1^+ (1 - \lambda_1^-)}{\lambda_1^+ - \lambda_1^-}, \qquad \beta_1 = \frac{\lambda_1^+ - 1}{\lambda_1^+ - \lambda_1^-}, \]

and similarly for variable $S_2$. Notice that if $\lambda_1^+ > 1$ then $\lambda_1^- \le 1$, i.e. $(\lambda_1^+, \lambda_1^-)$ are not variation independent and, likelihood ratios being non-negative, take values in $[0, 1] \times (1, +\infty) \cup [1, +\infty) \times [0, 1)$. If $\lambda_1^+ > 1$ then source $S_1$ is supporting the evidence $E = 1$: the more credible $S_1$ is, the stronger the support.
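The two parameterisations of a source and the resulting overall likelihood ratio can be checked with a few lines of code. The following is a minimal sketch under the assumptions of this section (binary variables, two sources conditionally independent given E), where α denotes P(S = 1 | E = 1) and β denotes P(S = 0 | E = 0) for a given source. The function names and the numerical credibilities are hypothetical, and degenerate cases (β equal to 0 or 1, or equal likelihood ratios) are deliberately not handled.

```python
def source_likelihood_ratios(alpha, beta):
    """Return (lr_plus, lr_minus) for one source.

    lr_plus  = P(S = 1 | E = 1) / P(S = 1 | E = 0) = alpha / (1 - beta)
    lr_minus = P(S = 0 | E = 1) / P(S = 0 | E = 0) = (1 - alpha) / beta
    """
    return alpha / (1.0 - beta), (1.0 - alpha) / beta


def alpha_beta_from_ratios(lr_plus, lr_minus):
    """Inverse map from (lr_plus, lr_minus) back to (alpha, beta)."""
    denom = lr_plus - lr_minus  # assumed non-zero in this sketch
    alpha = lr_plus * (1.0 - lr_minus) / denom
    beta = (lr_plus - 1.0) / denom
    return alpha, beta


def overall_likelihood_ratio(s1, s2, source1, source2):
    """Likelihood ratio of E = 1 versus E = 0 given the reports s1 and s2.

    source1 and source2 are (alpha, beta) pairs; the product form relies on
    the conditional independence of the two sources given E.
    """
    plus1, minus1 = source_likelihood_ratios(*source1)
    plus2, minus2 = source_likelihood_ratios(*source2)
    return (plus1 if s1 == 1 else minus1) * (plus2 if s2 == 1 else minus2)


# Hypothetical credibilities, for illustration only.
SOURCE_1 = (0.9, 0.8)
SOURCE_2 = (0.7, 0.6)

contradictory = overall_likelihood_ratio(1, 0, SOURCE_1, SOURCE_2)  # case (i)
corroborating = overall_likelihood_ratio(1, 1, SOURCE_1, SOURCE_2)  # case (iii)
print(contradictory, corroborating)
```

With these illustrative values the first source is the more credible one, so in the contradictory case (i) its report S1 = 1 prevails and the overall ratio stays above 1, though well below the ratio obtained in the corroborating case (iii).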
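Of course, one can perform sensitivity analyses in order to assess the value of these parameters. As special cases: if we cannot say anything about the credibility of $S_1$ we can assign $\lambda_1^+ = \lambda_1^- = 1$ (this corresponds to setting $\alpha_1 = \beta_1 = 0.5$), and if we believe $S_1$ to be credible both in supporting $E = 1$ and $E = 0$ we can assign $\lambda_1^+ = 1/\lambda_1^-$ (this corresponds to setting $\alpha_1 = \beta_1 = 1$).

              $E = 0$                                                    $E = 1$
$S_1 = 0$     $\frac{\lambda_1^+ - 1}{\lambda_1^+ - \lambda_1^-}$        $\frac{\lambda_1^- (\lambda_1^+ - 1)}{\lambda_1^+ - \lambda_1^-}$
$S_1 = 1$     $\frac{1 - \lambda_1^-}{\lambda_1^+ - \lambda_1^-}$        $\frac{\lambda_1^+ (1 - \lambda_1^-)}{\lambda_1^+ - \lambda_1^-}$

Table 4.3: Conditional probability table for $S_1 \mid E$ in terms of likelihood ratios.

              $E = 0$                                                    $E = 1$
$S_2 = 0$     $\frac{\lambda_2^+ - 1}{\lambda_2^+ - \lambda_2^-}$        $\frac{\lambda_2^- (\lambda_2^+ - 1)}{\lambda_2^+ - \lambda_2^-}$
$S_2 = 1$     $\frac{1 - \lambda_2^-}{\lambda_2^+ - \lambda_2^-}$        $\frac{\lambda_2^+ (1 - \lambda_2^-)}{\lambda_2^+ - \lambda_2^-}$

Table 4.4: Conditional probability table for $S_2 \mid E$ in terms of likelihood ratios.

Using logs we can define

\[ \tilde\lambda_1^+ = \log \lambda_1^+, \qquad \tilde\lambda_1^- = \log \lambda_1^-, \]

so that $\tilde\lambda_1^+ > 0$ (and $\tilde\lambda_1^- < 0$) represents support for $E = 1$, and similarly for the other source. With such a parameterisation, contradictory and corroborative evidence are represented respectively by

\[ \tilde\lambda = \tilde\lambda_1^+ + \tilde\lambda_2^- \quad \text{or} \quad \tilde\lambda = \tilde\lambda_1^- + \tilde\lambda_2^+ \]
\[ \tilde\lambda = \tilde\lambda_1^+ + \tilde\lambda_2^+ \quad \text{or} \quad \tilde\lambda = \tilde\lambda_1^- + \tilde\lambda_2^-. \]

Table 4.5 shows relationships between evidence and credibility of the sources. Source $S_1$ is credible when $\lambda_1^+ > \lambda_1^-$, and similarly for source $S_2$.

4.2 Conflict and convergence

Figure 4.2: Conflict/convergence (nodes $H$, $E_1$, $E_2$, $S_1$, $S_2$).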
4.3 Remarks and future work

• More on contradiction/corroboration
• A similar analysis has to be done for the more complex case of conflicting and convergent evidence
• Application of these models to the Sacco and Vanzetti case (see Chapter 6).

Evidence                 Likelihood ratio                       Effect of evidence (based on credibility)

Contradiction
$S_1 = 1$, $S_2 = 0$     $\lambda = \lambda_1^+ \lambda_2^-$    Both credible: $\lambda > 1$ if $\lambda_1^+ > 1/\lambda_2^-$
                                                                Only $S_1$ credible: $\lambda > 1$
                                                                Only $S_2$ credible: $\lambda < 1$
$S_1 = 0$, $S_2 = 1$     $\lambda = \lambda_1^- \lambda_2^+$    Both credible: $\lambda > 1$ if $\lambda_2^+ > 1/\lambda_1^-$
                                                                Only $S_1$ credible: $\lambda < 1$
                                                                Only $S_2$ credible: $\lambda > 1$

Conjunction
$S_1 = 1$, $S_2 = 1$     $\lambda = \lambda_1^+ \lambda_2^+$    Both credible: $\lambda > 1$
                                                                Only $S_1$ credible: $\lambda > 1$ if $\lambda_1^+ > 1/\lambda_2^+$
                                                                Only $S_2$ credible: $\lambda > 1$ if $\lambda_2^+ > 1/\lambda_1^+$
$S_1 = 0$, $S_2 = 0$     $\lambda = \lambda_1^- \lambda_2^-$    Both credible: $\lambda < 1$
                                                                Only $S_1$ credible: $\lambda > 1$ if $\lambda_2^- > 1/\lambda_1^-$
                                                                Only $S_2$ credible: $\lambda > 1$ if $\lambda_1^- > 1/\lambda_2^-$

Table 4.5: Conflicting/corroborative evidence and credibility.

Chapter 5
Evidence in legal cases: Wigmore charts and Bayesian networks

In forensic science a major task is interpreting patterns of evidence which involve many variables, and combining different items of evidence within a complex framework of circumstances. Typical features of the evidence arising from legal cases are its complex structure and ambiguity. Therefore, marshalling and evaluating evidence are two fundamental issues in forensic science, both for constructing arguments about questions of fact and for taking final decisions. The chart method for analysing evidence introduced by Wigmore is a technique which allows one to organise and describe the available evidence, and to construct reasoning processes through sequential steps (see Anderson et al., 2005). Bayesian networks are a general statistical tool which can be applied in the legal context to model relationships between different sources of evidence, weigh the available evidence, and draw statistical inferences from it.
Wigmore charts and Bayesian networks are clearly different in nature, but both attempt a rigorous and formal approach to the analysis of evidence. Therefore, we are interested in pointing out the strengths and weaknesses of these tools and in exploring possible interactions between them. This continues the work started in Dawid and Schum (2004). For a general discussion of statistics applied to the analysis of evidence in legal cases see Dawid (2005).

5.1 A criminal case example

In Dawid and Evett (1997) a hypothetical criminal case is presented, involving a complicated pattern of interactions between different items of evidence. Such interactions are represented in a Bayesian network, and the likelihood ratio for the hypothesis “Is the person prosecuted for this crime truly the offender?” is then computed. In Schum (2005) the same example, with the same evidence, is analysed via the Wigmore chart method, and arguments are constructed in order to charge the defendant with the crime.

The story is as follows (see Dawid and Evett, 1997). An unknown number of offenders entered commercial premises late at night through a hole which they cut in a metal grille. Inside, they were confronted by a security guard (Willard R. in Schum, 2005) who was able to set off an alarm before one of the intruders punched him in the face, causing his nose to bleed. The intruders left from the front of the building just as a police patrol car was arriving and they dispersed on foot, their getaway car having made off at the first sound of the alarm. The security guard said that there were four men but the light was too poor for him to describe them and he was confused because of the blow he had received. The police in the patrol car (Detective Inspector Leary in Schum, 2005) saw the offenders only from a considerable distance away. They searched the surrounding area and, about ten minutes later, one of them found the suspect (Harold S.
in Schum, 2005) trying to “hot wire” a car in an alley about a quarter of a mile from the incident. At the scene, a tuft of red fibres was found on the jagged end of one of the cut edges of the grille. Blood samples were taken from the guard and the suspect. The suspect denied having anything to do with the offence. He was wearing a jumper and jeans that were taken for examination. A spray pattern of blood was found on the front and right sleeve of the suspect’s jumper. The blood type was different from that of the suspect, but the same as that of the security guard. The tuft from the scene was found to be red acrylic. The suspect’s jumper was red acrylic. The tuft was indistinguishable from the fibres of the jumper by eye, microspectrofluorimetry (MSF) and thin layer chromatography (TLC). The jumper was well worn and had several holes, though none of them could clearly be said to be a possible origin for the tuft.

This example, though quite simple at first sight, possesses many of the features of legal cases that give rise to complex structures of the associated evidence. Most of them are obvious, but it may be useful to highlight them, as they recur in many criminal cases:

• There are multiple and different sources of evidence: an item found at the crime scene (the tuft of fibres), an item belonging to the suspect (the jumper), a trace left by an unknown individual (the blood stain on the jumper), people involved with the crime (the suspect, the guard, the police officer). The structure of such mixed evidence is described in Figure 5.1.
• Besides evidence directly related to the crime scene, further evidence can be (and will be) collected: laboratory analyses on the fibres found at the crime scene, tests on blood samples (taken from the suspect’s jumper and possibly from various people).
• There are no data, in the statistical meaning of repeated observations of an experiment: the “experiment” is unique and not replicable.
• Uncertainty can be introduced at various levels: are we willing to assume that there was a crime (and take this for granted)? And many other such examples.

Notice that there are other classes of problems in other disciplines that exhibit some of these features or similar issues.

Figure 5.1: Mixed evidence for the burglary example (nodes: NUMBER OF OFFENDERS, FIBER EVIDENCE, SUSPECT GUILTY?, BLOOD EVIDENCE, WITNESS EVIDENCE).

The aim of our work is to compare and synthesize the two analyses: starting from the original Bayesian network in Dawid and Evett (1997), a more detailed network is constructed by taking into account the process followed in Schum (2005) of breaking down the evidence into single “units” and building a chain of reasoning by connecting them. The ultimate objective would be the development of a formal method to handle multiple sources of evidence in legal cases (and beyond) by jointly exploiting Wigmore charts and Bayesian networks. This synthesis is still at a very early stage: the model presented here is a first attempt at “combining” the two methods, and by no means the final product.

5.2 The Wigmore chart analysis

In Schum (2005) a Wigmore chart analysis of the example above is presented. The objectives of such an analysis are

• Marshalling and organising the available evidence
• Constructing arguments from evidence to penultimate probanda
• Establishing the probative force of an emerging collection of evidence
• Describing a (subjective) chain of inferences.

The Wigmorean model is therefore constructed as a reasoning process aiming at proving the following ultimate probandum (U) and penultimate probanda (P1, P2, P3, P4)

U Harold S. unlawfully and intentionally assaulted and injured the security guard Willard R. during a break-in at the Blackbread Brewery premises in the early morning hours of 1 May, 2003.
P1 In the early morning hours of 1 May, 2003, four men unlawfully broke into the premises of the Blackbread Brewery
P2 Harold S.
was one of the four men who broke into the premises of the Blackbread Brewery in the early morning hours of 1 May, 2003
P3 A security guard at the Blackbread Brewery, Willard R., was assaulted and injured during the break-in at the Blackbread Brewery on 1 May, 2003.
P4 It was Harold S. who intentionally assaulted and injured Willard R. during the break-in at the Blackbread Brewery on 1 May, 2003.

The structure of the chart is as in Figure 5.2, where only the top part is shown.

Figure 5.2: The top of the Wigmore chart for the burglary example (U at the top, with P1, P2, P3 and P4 below it, each leading to its own chart).

The Wigmore method consists of several steps

a) Defining the ultimate probandum and the penultimate probanda
b) Parsing and organising the evidence into trifles, i.e. assessing relevance
c) Assigning trifles to penultimate probanda
d) Constructing key lists bearing upon the probanda
e) Drawing a chart that shows inferential linkages among elements in the key lists,

where (a)-(c) and (d)-(e) are called analysis and synthesis respectively. This process applied to the burglary example is described in detail in Schum (2005). For illustration, we only describe the key list and Wigmore chart for P4. Items 1-82 are in the key lists for P1, P2, P3. The key list for P4 is

83. A blood sample was taken from Willard R. on 1 May, 2003
84. DI Leary testimony about 83
85. A blood sample was taken from Harold S. on 1 May, 2003
86. DI Leary testimony to 85
87. A spray pattern of blood was found on the front and right sleeve of the jumper belonging to Harold S.
88. DI Leary testimony to 87
89. The jumper showing the blood stains to be shown at trial
90. The jumper shown at trial is the same one taken from Harold S. after his apprehension on 1 May, 2003
91. The blood type of the blood on Harold S.’s jumper matches the blood type of the security guard
92. DI Leary testimony to 91
93. A tangible record of the blood match analysis to be shown at trial
94.
The analysis shown at trial is the same one reported by the forensic scientist who performed the analysis
95. The blood on Harold S.’s jumper was not already there before the break-in on 1 May, 2003
96. The blood on Harold S.’s jumper came from Willard R.’s nose on 1 May, 2003
97. Harold S.’s testimonial denial of P4, that he was the one who punched the security guard Willard R. on 1 May, 2003.

The corresponding Wigmore chart is shown in Figure 5.3, where white circles denote what has to be proven and black circles denote what is taken as certain.

[Figure 5.3: The Wigmore chart for P4, linking P4 to items 83-97 of the key list.]

5.3 The Bayesian network analysis

Dawid and Evett (1997) show a possible Bayesian network for modelling relationships between various items of evidence. Here we try to improve the original network based on the Wigmore chart analysis in Schum (2005). Translating from one technique to the other would be the most intuitive way of proceeding, but the two methods are too different in nature for this to be profitable. Rather, Wigmore charts could be usefully combined with Bayesian networks in order to provide a more satisfying model from which to draw inferences from the observed evidence. This is what we discuss in the remainder of this section.

5.3.1 A simple Bayesian network

The key list in Schum (2005), p. 5, describes all the items of evidence related to the burglary example and used to build the corresponding Wigmore chart. Based on this list, we define the variables described in Table 5.1 (the variable names in parentheses are the labels originally used in Dawid and Evett, 1997) and build a Bayesian network that is more elaborate than the original one in Dawid and Evett (1997), see Figure 5.4. All the elements in the key list are taken into account in this “extended” model: in particular, uncertainty about the crime (Did it really happen?)
is introduced, see variables C1, C2 in the network, as well as some details about the crime, see variables C4, C5, C8. Moreover, witnesses are explicitly represented. In Table 5.1 some of the names in Schum (2005) are used: BB stands for Blackbread Brewery, HS is the suspect, DI Leary is the detective inspector who apprehended the suspect. Finally, I is the indicator function, which takes value 1 if the event is true and 0 otherwise. Notice that the graph obtained by removing all the new variables is the same as the one in Figure 1 in Dawid and Evett (1997); it is shown in Figure 5.5.

Label | Variable description | States
C1 | I{Someone made a cut in the grille in order to break in} | 0, 1
C2 | I{An unknown number of persons entered BB} | 0, 1
C3 (N) | Number of offenders | 1, 2, 3, 4, 5, 6
C4 | I{One of the offenders punched the guard} | 0, 1
C5 | Consequences of the punch | 0, 1
C6 (C) | I{HS is guilty, i.e. he is one of the offenders} | 0, 1
C7 | I{HS was near the scene of the crime} | 0, 1
C8 (B) | Identity of the person who punched the guard | 0, 1
F1 | I{Fibre tuft shown at trial was found at crime scene} | 0, 1
F2 (A) | Identity of the person who left the fibre tuft in the grille | 0, 1
F3 (Y1) | Properties of the fibre tuft | 0=other, 1=red acrylic
J1 | I{The jumper shown at trial is HS’s jumper} | 0, 1
J2 (X3) | Properties of HS’s jumper | 0=other, 1=red acrylic
B1 (X1) | Blood sample from HS | A, B, AB, O
B2 (X2) | Blood sample from the guard | A, B, AB, O
B3 (R) | Shape of blood stain on jumper | 0=no blood stain, 1=spray, 2=other
B4 (Y2) | Blood type on jumper | A, B, AB, O
PICC1 | Evidence from the photo of the hole in the grille | 0, 1
PICC5 | Evidence from the photo of the guard’s injury | 0, 1
PICC7 | Evidence from the photo of HS after he was apprehended | 0, 1
MSF | Result of microspectrofluorimetry | 0, 1
TLC | Result of thin layer chromatography | 0, 1
LC1 | DI Leary’s testimony about C1 | 0, 1
LC2 | DI Leary’s testimony about C2 | 0, 1
LC7 | DI Leary’s testimony about C7 | 0, 1
LF1 | DI Leary’s testimony about F1 | 0, 1
LF3 | DI Leary’s testimony about F3 | 0, 1
LJ1 | DI Leary’s testimony about J1 | 0, 1
LJ2 | DI Leary’s testimony about J2 | 0, 1
LB1 | DI Leary’s testimony about B1 | A, B, AB, O
LB2 | DI Leary’s testimony about B2 | A, B, AB, O
LB3 | DI Leary’s testimony about B3 | 0, 1
LB4 | DI Leary’s testimony about B4 | 0, 1
LMSF | DI Leary’s testimony about microspectrofluorimetry | 0, 1
LTLC | DI Leary’s testimony about thin layer chromatography | 0, 1
GC2 | Guard’s testimony about C2 | 0, 1
GC3 (G1) | Guard’s testimony about C3 | 1, 2, 3, 4, 5, 6
GC4 | Guard’s testimony about C4 | 0, 1
GC5 (G2) | Guard’s testimony about C5 | 0, 1
HSC6 | HS’s testimony about C6 | 0, 1
HSC8 | HS’s testimony about C8 | 0, 1

Table 5.1: Variables for the network in Figure 5.4.

[Figure 5.4: Bayesian network for the burglary example.]

Once the evidence (E) has been entered in the network, i.e. the observed variables have been fixed to their observed values, the probabilities of the nodes of interest, namely C6 and C8, can be updated, and the likelihood ratio

$$\frac{P(E \mid C_8 = 1)}{P(E \mid C_8 = 0)}$$

can be computed. It would be useful to show the algebra (exploiting conditional independence relationships) for simplifying computations of this and other quantities of interest, along the lines of what is described in Dawid and Evett (1997).

[Figure 5.5: Original Bayesian network for the burglary example.]

5.3.2 An object-oriented Bayesian network

It is often possible to find certain structures that are used repeatedly inside a single Bayesian network and in different networks constructed for different problems. An object-oriented Bayesian network allows one to define general networks, or fragments, and to combine them in a main network whose nodes are both random variables and Bayesian networks that are instances of the previously defined fragments, see Dawid et al. (2005). Figure 5.6 represents the object-oriented network constructed from the simple network in Figure 5.4 by collapsing some of the nodes. Circles are simple variables as in Figure 5.4, whereas rectangles are instances of the following networks (see Figures 5.7, 5.8 and 5.9)

Report (blue rectangles). This network represents a report (variable R) regarding an item or event (variable I1). The report can be right, and hence equal to the original item or event, or wrong, and hence equal to a random item or event (variable I2). The variable C is a “biased coin” that allows for a non-symmetric error in the report. This fragment is used here for witnesses who testify about events related to the crime, and for laboratory analyses (e.g. microspectrofluorimetry) on items related to the crime.

Match (white rectangles).
This fragment represents a match between a trace (variable T) and several possible sources of that trace (actually, only one source is considered in this example, variable T1). Variable H is the hypothesis variable “Who left the trace?”. This fragment is used here for describing blood matches between the suspect, the guard, and the blood found on the suspect’s jumper, as well as the match between the fibres found at the crime scene and the suspect’s jumper.

Consequence (black rectangles). This network represents events (variable Y) that are consequences of some possibly false event (variable X), typically the event “Did the crime happen?”. This fragment is used here to model “meaningless” conditioning, such as conditioning the break-in variable on the fact that nobody cut the grille (and hence that nobody even tried to break in). The variable “bernoulli” is itself a network that corresponds to a Bernoulli random variable with unknown probability parameter.

For a more detailed description see Chapter 3. Table 5.2 is a description of the nodes in the object-oriented network of Figure 5.6.

[Figure 5.6: The object-oriented Bayesian network developed from Figure 5.4.]

Label | Variable description | Instance of
C1 | I{Someone made a cut in the grille in order to break in} |
C3 | Number of offenders |
C4 | I{One of the offenders punched the guard} |
C6 | I{HS is guilty, i.e. he is one of the offenders} |
C7 | I{HS was near the scene of the crime} |
F2 | Identity of the person who left the fibre tuft in the grille |
J1 | I{The jumper shown at trial is HS’s jumper} |
POLICE 1 | Police testimony about the cut in the grille | Report
POLICE 2 | Police testimony about the break-in | Report
POLICE 3 | Police testimony about the suspect being at the crime scene | Report
POLICE 4 | Police testimony about the authenticity of the fiber tuft presented at trial | Report
POLICE 5 | Police testimony about the match fiber tuft/jumper | Report
POLICE 6 | Police testimony about the result of MSF | Report
POLICE 7 | Police testimony about the result of TLC | Report
POLICE 8 | Police testimony about the match suspect’s blood/blood on jumper | Report
POLICE 9 | Police testimony about the match guard’s blood/blood on jumper | Report
POLICE 10 | Police testimony about the blood on the jumper | Report
POLICE 11 | Police testimony about the authenticity of the jumper shown at trial | Report
GUARD 1 | Guard testimony about the break-in | Report
GUARD 2 | Guard testimony about the number of intruders | Report
GUARD 3 | Guard testimony about the punch | Report
GUARD 4 | Guard testimony about the nose bleeding | Report
SUSPECT 1 | Suspect testimony about himself being guilty | Report
SUSPECT 2 | Suspect testimony about himself being at the crime scene | Report
PHOTO 1 | Picture of the cut in the grille | Report
PHOTO 2 | Picture of the suspect after he was apprehended | Report
BREAK IN | Whether or not there was a break-in | Consequence
BLEEDING | Whether or not the guard’s nose was bleeding | Consequence
IDENTITY | Identity of the person who punched the guard | Consequence
FIBER TRUE | Authenticity of the fiber tuft shown at trial | Consequence
FIBER MATCH | Match between the fiber tuft and the suspect’s jumper | Match
BLOOD 1 | Match between the suspect’s blood and the blood on the jumper | Match
BLOOD 2 | Match between the guard’s blood and the blood on the jumper | Match

Table 5.2: Variables for the network in Figure 5.6.

[Figure 5.7: The “Report” fragment. Nodes: I1, C, I2, R.]

[Figure 5.8: The “Match” fragment. Nodes: H, T1, T.]

5.4 A comparison between Wigmore charts and Bayesian networks

Both Wigmore charts and Bayesian networks are graphical methods allowing incorporation of complex evidence structures.
What do they have in common? What are the differences? Are they complementary methods in some sense? Some remarks and questions about comparing these two techniques are listed below.
• Wigmore charts are a technique for describing a reasoning process, and for constructing arguments. Bayesian networks are a statistical tool for deriving inferences given the available evidence. Wigmore charts are deterministic.
• Wigmore charts are constructed backward, after collecting the evidence, whereas Bayesian networks are meant to be a model for representing how things can happen (evidence is entered after the network has been built). Arrows in the two methods have opposite directions.
• No conditional independence in Wigmore charts, but still a notion of relevance?
• No quantification in Wigmore charts: how can one draw inferences from the observed evidence or formulate hypotheses?
• No construction of arguments in Bayesian networks?
• A lot of details vs. not so many details (some of the elements are taken for granted in a Bayesian network).
• In order to build a Bayesian network one has to choose a “starting point”, and hence take some of the events for granted, implicitly make assumptions which do not appear in the model, and thus decide what counts as evidence.
• A Wigmore chart describes different hypotheses at different levels (ultimate, penultimate and intermediate probanda). What corresponds to this in Bayesian networks? Do we always want a single hypothesis (guilty/not guilty)?
• Construction of the network is a subjective matter, in both cases.
• Is the Wigmore chart method unnecessarily complicated (lots of trifles) from a statistical point of view? (Are Wigmore charts better confined to the forensic framework only?)

[Figure 5.9: The “Consequence” fragment. Nodes: BERNOULLI, X, Y.]
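To make the quantification point concrete, the following minimal sketch shows what "entering evidence" means for a single instance of the “Report” fragment of Section 5.3.2. It is only one possible reading of that fragment: the "biased coin" C is taken to be a single probability that the report is correct, the random item I2 is given an arbitrary distribution, and all function names and numbers are hypothetical and chosen purely for illustration.

```python
from fractions import Fraction


def report_likelihood(reported, true_item, p_correct, random_item_dist):
    """P(R = reported | I1 = true_item) under the assumed Report fragment.

    With probability p_correct (the "biased coin" C) the report equals the
    true item I1; otherwise it equals a random item I2 drawn from
    random_item_dist.
    """
    p_right = p_correct if reported == true_item else 0
    p_wrong = (1 - p_correct) * random_item_dist[reported]
    return p_right + p_wrong


def posterior(prior, reported, p_correct, random_item_dist):
    """Update the distribution of I1 after observing the report R."""
    joint = {
        item: prior[item]
        * report_likelihood(reported, item, p_correct, random_item_dist)
        for item in prior
    }
    total = sum(joint.values())
    return {item: value / total for item, value in joint.items()}


# Illustrative (made-up) numbers: a binary event and a fairly reliable source.
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
random_item = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_correct = Fraction(9, 10)

# Likelihood ratio of the report "1" for I1 = 1 against I1 = 0.
lr = (report_likelihood(1, 1, p_correct, random_item)
      / report_likelihood(1, 0, p_correct, random_item))
print("likelihood ratio:", lr)
print("posterior:", posterior(prior, 1, p_correct, random_item))
```

With these made-up inputs the report multiplies the odds on the event by 19. A Wigmore chart would record the same inferential link from report to event, but without attaching a number to it.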
Why they are similar
• Graphical methods
• Inference networks
• Subjectivity
• Computations via conditional independence and conditional non-independence
• Models for incorporating complex evidence structures.

Why they are different
• Wigmore charts are constructed backwards after the evidence has been observed
• Bayesian networks are a “process model” (Schum, 2005), since they are intended to capture a complex process by which some series of events could have been generated
• In order to construct a Bayesian network one needs to make assumptions about events related to the problem
• Wigmore charts are based on binary propositions (true/false)
• Wigmore charts are chains of reasoning from the bottom (evidence) to the top (probanda)
• Wigmore charts use generalisations in order to establish connections among variables
• Bayesian networks are entirely probabilistic
• Arrows mean probabilistic dependence in Bayesian networks, whereas they indicate the inferential flow of reasoning in Wigmore charts.

5.5 Remarks and future work

Many important issues have not been taken into account in the models described above. Further work is needed in the following areas
• How to evaluate and compare different models, if needed. Sensitivity analysis: how do the results change when the prior probabilities and/or the structure of the network change?
• Which model to use when more than one suspect is involved (see also Levitt and Laskey, 2001)
• Manipulated evidence (e.g. testimonies of witnesses may not be genuine, or items presented at trial may not be those found at the crime scene, or could have been modified after being found there)
• Interactions between different witnesses
• Credibility of witnesses
• Limitations of Hugin when building a network
• How to use generalisations when building Bayesian networks
• Other issues related to the forensic framework, such as manipulated evidence and chain of custody.

A problem that often arises when modelling legal case examples is the following.
Wigmore charts always include probanda (ultimate, penultimate and intermediate). Some of these probanda, when translated into nodes of a Bayesian network, may generate difficulties in interpreting dependencies. Consider for instance the binary (yes/no) variables X1 = “Did that specific crime happen?” (which would be a penultimate probandum in a Wigmore chart) and its child node X2 = “Was the suspect at the crime scene?”. Assigning conditional probabilities P(X2 | X1 = no) is of course meaningless (if there is no crime there is no crime scene). A solution could be to define X2 as a three-state variable, taking values “yes”, “no” and “not applicable”, and to do the same for all its children and so on (the idea is to “interrupt” the flow of information going through that part of the network).

Chapter 6
A Bayesian network analysis of the Sacco and Vanzetti case

The Sacco and Vanzetti case, a very famous case in American legal history, has been analysed from a probabilistic point of view in Kadane and Schum (1996). A Bayesian network analysis is described in Cheung (2005). Here we attempt a more sophisticated Bayesian network analysis.

6.1 The case

The following is a description of the Sacco and Vanzetti case from Cheung (2005). After a robbery that took place at about 3 pm on 15 April 1920 in South Braintree, Massachusetts, Nicola Sacco and Bartolomeo Vanzetti, both Italian immigrants with ties to the anarchist movement, were convicted of first-degree murder for shooting and killing Alessandro Berardelli and Frederick Parmenter, the payroll guards who were taking two iron boxes containing a total of over $15,000 from one Slater and Morrill shoe factory to the second Slater and Morrill factory in South Braintree. During their journey, two men who were leaning against a pipe-rail fence attacked them from behind. After the incident, the two men escaped in a black car, with three other men, who had picked them up at the scene of the crime.
Berardelli and Parmenter were both dead. On 5 May the same year, Sacco and Vanzetti were arrested after they had gone with two other Italians to a garage to claim a car that local police had connected with the crime. After legal proceedings that lasted for more than 7 years, the two men were executed on 23 August 1927, despite many witnesses giving contradictory and conflicting evidence.

The hypothesis of interest, which we denote by H, is

H = Did Sacco shoot Berardelli?

a binary random variable (H = 1 if Sacco is guilty). We want to use the available evidence, which we denote by E, to assess the likelihood ratio

$$\frac{P(E \mid H = 1)}{P(E \mid H = 0)}.$$

Notice that we could include Vanzetti as a suspect. Moreover, the hypothesis of interest can be formulated in a different way: for instance, as a multiple-state variable “Who shot Berardelli?”.

6.2 Items of evidence

The case is very complicated, and so is the available evidence. We consider different categories of evidence
• Witness evidence, $E_W$
• Physical evidence, $E_P$
• Consciousness of guilt evidence, $E_C$.
See Cheung (2005) for more details. These are typical items of evidence in criminal cases, see also Chapter 5. The aim of using Bayesian networks is to create a structure both for representing such complex evidence and for making sense of it. The resulting likelihood ratio will be

$$\frac{P(E \mid H = 1)}{P(E \mid H = 0)} = \frac{P(E_W \mid H = 1)}{P(E_W \mid H = 0)} \times \frac{P(E_P \mid H = 1)}{P(E_P \mid H = 0)} \times \frac{P(E_C \mid H = 1)}{P(E_C \mid H = 0)}$$

if we make the (restrictive) assumption that the different sources of evidence are conditionally independent given H, where $E = E_W \cup E_P \cup E_C$.

6.2.1 Witness evidence

A tentative representation of the witness evidence is in Figure 6.1. When dealing with testimonial evidence, the credibility of the witnesses plays an important role. An idea would be to use object-oriented Bayesian networks to describe witness credibility, based both on moral judgements about the person and on objective criteria (for instance, whether there was enough light for the witness to see).
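The two ideas above, a credibility model for a witness and the product form of the likelihood ratio, can be sketched together as follows. This is a deliberately crude illustration and not the model used in this report: the witness is assumed to testify directly about H (ignoring the intermediate variable for presence at the scene), credibility is assumed to be the product of an honesty probability and a good-viewing-conditions probability, and every name and number below is hypothetical.

```python
from fractions import Fraction
from math import prod


def witness_lr(p_truthful, p_good_view, p_yes_if_unreliable):
    """Likelihood ratio of the testimony "I saw him" for H = 1 against H = 0.

    Assumption: the testimony reflects the truth only if the witness is both
    truthful and had good viewing conditions; otherwise the witness says
    "yes" with probability p_yes_if_unreliable whatever the truth is.
    """
    credible = p_truthful * p_good_view
    p_yes_given_h1 = credible + (1 - credible) * p_yes_if_unreliable
    p_yes_given_h0 = (1 - credible) * p_yes_if_unreliable
    return p_yes_given_h1 / p_yes_given_h0


def posterior_odds(prior_odds, likelihood_ratios):
    """Combine likelihood ratios assuming conditional independence given H."""
    return prior_odds * prod(likelihood_ratios)


# Made-up inputs, for illustration only.
lr_witness = witness_lr(
    p_truthful=Fraction(9, 10),
    p_good_view=Fraction(2, 3),
    p_yes_if_unreliable=Fraction(1, 2),
)
lr_physical = Fraction(3)          # hypothetical value for physical evidence
lr_consciousness = Fraction(1, 2)  # hypothetical value for consciousness of guilt

odds = posterior_odds(Fraction(1), [lr_witness, lr_physical, lr_consciousness])
print("witness likelihood ratio:", lr_witness)
print("posterior odds:", odds)
print("posterior probability:", odds / (1 + odds))
```

Lowering `p_good_view` pulls the witness likelihood ratio towards 1, which is how an objective criterion such as poor light would weaken the testimony in this simplified setting.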
Moreover, interactions between witnesses have to be taken into account (for example, the influence of one witness on another), though in Figure 6.1 the witnesses are treated as independent. Also, mistake and deception have to be taken into account when assessing the relevance of testimonies, see Chapter 3. The variables involved are
H Sacco did it?
H1 was Sacco at the crime scene?
P Pelser’s testimony
W Wade’s testimony
C Costantino’s testimony.

After having established the probabilistic relationships among witnesses and the relevant hypotheses, conditional independencies can be exploited as in Cheung (2005) for likelihood ratio computations. The likelihood ratio based on witness evidence

$$\frac{P(E_W \mid H = 1)}{P(E_W \mid H = 0)}$$

will be combined with the likelihood ratios based on the remaining evidence.

[Figure 6.1: Witness evidence for the Sacco and Vanzetti case. Nodes: H, H1, S, W, P, C.]

Other witness evidence relates to Sacco’s alibi, but it is not considered here. See also the “Report” fragment in Chapter 3.

6.2.2 Physical evidence

The following items were introduced as evidence at trial: a .32-caliber Winchester bullet (exhibit 18) extracted from Berardelli’s body, a .32-caliber Colt (exhibit 28) belonging to Sacco, a Winchester shell found at the crime scene, and a cap belonging to Sacco found at the crime scene. We define the following variables
B1 characteristics of the bullet
B2 was the bullet really extracted from Berardelli’s body?
B3 was Berardelli killed by that bullet?
B4 expert’s testimony about the bullet
G1 characteristics of the gun
G2 does the gun really belong to Sacco?
G3 Sacco’s testimony about the gun
G4 police testimony about the gun
S1 characteristics of the shell
S2 was the shell really found at the crime scene?
S3 expert’s testimony about the shell
T result from the firing test
C1 characteristics of the cap shown at trial
C2 characteristics of Sacco’s hat
C3 was the cap found at the crime scene?
C4 Sacco’s testimony about the cap
C5 other testimonies about the cap.
The last variable C5 only indicates that there is a complicated structure of further testimonies about these items, which we will not model for now. As often happens when dealing with evidence related to criminal cases, this is a problem of matching different items of evidence. This is described in more detail in Chapter 3. We construct two separate Bayesian networks, as the firearm evidence relates to H, whereas the cap evidence relates to H1, and they can be considered as independent.

[Figure 6.2: Firearm evidence for the Sacco and Vanzetti case. Nodes: H, B1, B2, B3, B4, G1, G2, G3, G4, S1, S2, S3, T.]

[Figure 6.3: Cap evidence for the Sacco and Vanzetti case. Nodes: H1, C1, C2, C3, C4, C5.]

6.2.3 Consciousness of guilt evidence

This kind of evidence is more difficult to interpret. It relates to the behaviour of Sacco at the moment of his arrest, see Figure 6.4. We consider the following variables
X1 Sacco was conscious when he was arrested
X2 Sacco was somehow involved with the crime
X3 Sacco was involved in other crimes
X4 Sacco intended to escape from the police when they arrested him
X5 Sacco attempted to take the gun out of his coat when he was arrested
X6 Sacco’s testimony
X7 police testimony.

[Figure 6.4: Consciousness of guilt evidence for the Sacco and Vanzetti case. Nodes: H, X1, X2, X3, X4, X5, X6, X7.]

6.2.4 Combining all the evidence

If we combine the three sources of evidence and consider them as conditionally independent (a restrictive assumption), we obtain a network like the one in Figure 6.5. We then have to fix the values of the observed variables.

[Figure 6.5: Combining all the evidence for the Sacco and Vanzetti case. Nodes: H, H1, E1, E2, E3.]

6.3 Remarks and future work

The model described above is a very simple one and should be refined in future work. Also, not all the elements in the case have been taken into account. Interesting issues include
• How to combine sources of evidence (not just independence)
• How to relate the Bayesian network analysis to the Wigmore chart analysis in Kadane and Schum (1996)
• Witness evidence, witness credibility and more.

References

Anderson, T., Schum, D. and Twining, W. (2005). Analysis of evidence, second edition. Cambridge University Press.
Baio, G. and Corradi, F. (2004). Handling Manipulated Evidence. Working Paper no. 13, Department of Statistics “G. Parenti”, University of Florence, Italy.
Cavallini, D. and Corradi, F. (2005). OOBN for forensic identification through searching a DNA profile’s database. In Proceedings of AISTATS 2005.
Cheung, C. (2005). The analysis of mixed evidence using graphical and probability models with application to the Sacco and Vanzetti case. BSc project, UCL.
Dawid, A. P. (1987). The difficulty about conjunction. The Statistician 36, 91-97.
Dawid, A. P. (2000). Causal inference using influence diagrams: the problem of partial compliance. Research Report no. 213, Department of Statistical Science, University College London.
Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. International Statistical Review 70, 161-189.
Dawid, A. P. (2003). An object-oriented Bayesian network for estimating mutation rates. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, January 3-6 2003, Key West, Florida, edited by Christopher M. Bishop and Brendan J. Frey. ISBN 0-9727358-0-1.
Dawid, A. P. (2005). Statistics and the law. In Evidence, edited by Karin Tybjerg, John Swenson-Wright and Andrew Bell. Cambridge University Press (to appear).
Dawid, A. P. and Evett, I. W. (1997). Using a graphical method to assist the evaluation of complicated patterns of evidence. Journal of Forensic Sciences 42, 226-231.
Dawid, A. P., Mortera, J. and Vicard, P. (2005). Object-oriented Bayesian networks for complex forensic DNA profiling problems.
Technical Report 256, Department of Statistical Science, University College London.
Dawid, A. P. and Schum, D. A. (2004). Bayes, Wigmore and inference networks: a dialogue. Technical report.
Kadane, J. B. and Schum, D. A. (1996). A probabilistic analysis of the Sacco and Vanzetti evidence. Wiley.
Lauritzen, S. L. (2003). Graphical models for surrogates. Bulletin of the International Statistical Institute 60, 144-147.
Leucari, V. (2005). Analysis of complex patterns of evidence in legal cases: Wigmore charts vs. Bayesian networks. Working paper.
Levitt, T. S. and Laskey, K. B. (2001). Computational Inference for Evidential Reasoning in support of Judicial Proof. Cardozo Law Review 22, 1691-1731.
Pearl, J. (2000). Causality. Cambridge University Press.
Schum, D. A. (2001). Evidential foundations of probabilistic reasoning. Northwestern.
Schum, D. A. (2004). Capturing an interesting subtlety involving a source of testimonial evidence. Technical report.
Schum, D. A. (2005). A Wigmorean interpretation of the evaluation of a complicated pattern of evidence. Technical report.