Statistical evidence evaluation
Exercises 4

1. Shoeprints and dirty shoes

Assume that a private home has been burgled. The offender evidently entered the house through the backdoor; the evidence for this assumption is a number of foreign footmarks left in the soil outside the backdoor. Very soon after the burglary a suspect is apprehended wearing shoes with a substantial amount of soil on the soles. A comparison between the soil on the shoes and the soil outside the backdoor of the house shows enough agreement in compounds to state that there is an uncontroversial match. Further, this type of soil is not very common; its prevalence is about 8% on average.

We wish to evaluate the match against a pair of propositions stating that
(1) The suspect was walking around close to the backdoor; and
(2) The suspect has never been in the neighbourhood of the backdoor.

There are two studies available to help in this evaluation. The first is concerned with the transfer of soil material to shoe soles and consists of 1200 experiments using different footwear and a variety of soil types. The shoes in this study were first carefully cleaned and then inspected 30 minutes after they had been in contact with the soil. The soil used in the experiments was dyed to facilitate the identification of residues. The results are given in the file soiltransfer.txt. The second is a survey of shoe soles with respect to soil residues and comprises results from 1500 sampled items of footwear of different types. The results are given in the file soilresidues.txt. Both files can be downloaded from the course web (www.ida.liu.se/~732A45/info/diary.en.shtml).

When evaluating the match it can be taken for granted that there was contact between the shoes of the true offender and the soil outside the backdoor; thus contact itself does not need to be modelled.

a) Use all available data in an efficient way to evaluate the match with a BN that takes transfer and background nodes into account. Use the “Match form” of the network.
b) Reconstruct the network in “X-Y form”. (See lecture notes from Meeting 5.) What are the immediate consequences for the background node(s)?
c) Now, suppose we also identify the suspect’s shoe as being of type F. Would that affect the probability tables of the obtained network? Does the value of evidence change?

2. Shoeprints: Combining findings

Now, besides the recovery of soil residues on the suspect’s shoes, there are also matches in pattern between all shoe marks and the suspect’s shoes. This pattern is classified as number 113 in an internal classification list. The file soilresidues.txt also contains information about the classified pattern of each shoe in the survey.

a) Use the information in soilresidues.txt to construct a BN with which the match(es) in pattern can be evaluated against source level propositions.
b) Combine the network obtained in a) with the network obtained in task 1 to get a combined evidence value for the match in soil and the match in pattern. What assumptions need to be made for this network to be valid?

3. Transfer of fibres

Let’s assume that in bank robberies where an escape car is used, the driver of this car seldom takes part in the robbery itself (although he may be convicted of assisting the robbery). A provisional estimate is that the car driver takes active part in only 2% of all bank robberies. Now assume that a witness has pointed out a suspect as having taken active part in a bank robbery. Inspecting the escape car we find 6 foreign wool fibres on the driver’s seat, but no foreign fibres are secured anywhere else in the car. The suspect has a woollen pullover with exactly the same kind of fibres as those secured on the driver’s seat. The fibres are quite common, though, and are estimated to be found in 23% of all woollen pullovers on the market. The witness stated, however, that the suspect wore that particular pullover when committing the robbery.
There is also a large study of car seats investigating the prevalence of persisting fibres. This study shows that one may expect to find one group of foreign fibres on 43% of all car seats, two groups of (different) foreign fibres on 22% of all car seats and more than two groups on 10% of all car seats. Further, an experimental study shows that when there is contact between a woollen garment and a car seat, the garment tends to leave a group of between 2 and 10 fibres on the seat in 2 out of 5 cases and more than 10 fibres in 1 out of 15 cases. One may also assume that once fibres have been left they are very persistent and can therefore be recovered at an inspection with a probability close to 100%.

Use all this information to set up a BN with which it is possible to evaluate the match between the recovered fibres and the fibres of the suspect’s pullover against the pair of propositions:
Hp: The suspect drove the escape car
Hd: The suspect has never sat on the driver’s seat of the escape car

Discuss the validity of the way the evidence is evaluated. Are there serious drawbacks with the procedure?

4. DNA in a crime case

Assume a stain with the genotype AB has been found, and a suspect has been tested and also found to have the genotype AB. Assume the population frequencies of A and B are 0.03 and 0.12, respectively. Construct a Bayesian network for this situation, where you use one node for each genotype, one node for each of the involved paternal alleles, and one node for each of the involved maternal alleles (see the lecture notes for lecture 10). Also use one node for the hypotheses of the prosecutor/defence.

a) If you assume the prior probability that the suspect deposited the stain is 0.01, what is the posterior probability given the genetic data? (Use the network to compute it.)
b) Use the network instead to compute the probability of the observed data under each of the two hypotheses.
c) Set up an equation computing the answer in (a) from the answer in (b).

5. DNA in a disputed paternity case

Using the same type of nodes as in exercise 4, set up a simple BN for paternity cases, using a single locus with three possible alleles: A, B, and C. Use nodes for the genotype of the mother, the genotype of the putative father, and the genotype of the child, and in addition nodes for the paternal and maternal alleles of all genotype nodes. Assume the allele frequencies are 0.3, 0.45, and 0.25, respectively. Also use a node for the two usual hypotheses in paternity cases.

a) Compute the posterior probability of the hypothesis that the putative father is the father, assuming that the prior is 0.5 and that the data are AB, AC, AC for the mother, putative father, and child, respectively.
b) Compute the likelihood ratio in the same case for the putative father being the real father.
c) Assume you also have data from an independent locus, with possible alleles X, Y, and Z, and frequencies 0.1, 0.1, 0.8, respectively. Make a network combining information from both loci, and compute the result if the data are XX, XY, XY for the mother, putative father, and child, respectively.
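
The numbers produced by the network in exercise 5 can be cross-checked by direct enumeration. The following is a minimal Python sketch, not part of the course material, and it assumes the following: each parent passes on either allele with probability 1/2, there are no mutations, the alternative father under the defence hypothesis is an unrelated man whose transmitted allele follows the population frequencies, and the two loci are independent. The function names (transmit, p_child, paternity_lr, posterior) are hypothetical helpers introduced only for this illustration.

```python
def transmit(genotype):
    """Distribution of the allele passed on by a parent with this genotype.

    Assumes Mendelian transmission: each of the two alleles with probability 1/2.
    """
    dist = {}
    for allele in genotype:
        dist[allele] = dist.get(allele, 0.0) + 0.5
    return dist


def p_child(child, maternal, paternal):
    """P(child genotype) given distributions of the maternal and paternal allele."""
    target = sorted(child)
    return sum(pm * pp
               for m, pm in maternal.items()
               for p, pp in paternal.items()
               if sorted((m, p)) == target)


def paternity_lr(mother, putative_father, child, freqs):
    """Likelihood ratio for 'putative father is the father' vs 'unrelated man is'.

    The genotypes of the mother and the putative father are equally probable
    under both hypotheses, so only the child's genotype contributes.
    """
    maternal = transmit(mother)
    numerator = p_child(child, maternal, transmit(putative_father))
    denominator = p_child(child, maternal, freqs)  # random man's allele
    return numerator / denominator


def posterior(prior, lr):
    """Posterior probability from a prior probability and a likelihood ratio."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


if __name__ == "__main__":
    lr1 = paternity_lr("AB", "AC", "AC", {"A": 0.3, "B": 0.45, "C": 0.25})
    lr2 = paternity_lr("XX", "XY", "XY", {"X": 0.1, "Y": 0.1, "Z": 0.8})
    print("LR locus 1:", lr1)
    print("Posterior, locus 1 only:", posterior(0.5, lr1))
    # Independent loci: the likelihood ratios multiply.
    print("Combined LR:", lr1 * lr2)
    print("Posterior, both loci:", posterior(0.5, lr1 * lr2))
```

The posterior function is the odds form of Bayes' theorem, which is also the kind of relation asked for in exercise 4 c). If the network and the enumeration disagree, the probability tables of the allele nodes are a natural first place to look.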