Statistical evidence evaluation Exercises 4

1. Shoeprints and dirty shoes
Assume that a private home has been burgled. The offender evidently entered the
house through the backdoor; the evidence for this is a number of unfamiliar
footmarks left in the soil outside the backdoor.
Very soon after the burglary a suspect is apprehended wearing shoes with a substantial amount
of soil on the soles. A comparison between the soil on the shoes and the soil outside the
backdoor of the house shows enough compounds in common to state that there is an
uncontroversial match. Furthermore, this type of soil is not very common: its prevalence is
about 8% on average.
We wish to evaluate the match against a pair of propositions stating that (1) the suspect was
walking around close to the backdoor; and (2) the suspect has never been in the
neighbourhood of the backdoor.
There are two studies available to help in this evaluation. The first concerns the
transfer of soil material to shoe soles and consists of 1200 experiments using different
footwear and a variety of soil types. The shoes in this study were first carefully cleaned and
then inspected 30 minutes after they had been in contact with the soil. The soil used in the
experiments was dyed to facilitate the identification of residues. The results are given in the
file soiltransfer.txt. The second is a survey of shoe soles with respect to soil residues and
comprises results from 1500 sampled items of footwear of different types. The results are given
in the file soilresidues.txt. Both files can be downloaded from the course web page.
When evaluating the match it can be taken for granted that there was contact between the
shoes of the true offender and the soil outside the backdoor, so the question of contact
need not be considered.
a) Use all available data in an efficient way to evaluate the match with a BN taking into
account transfer and background nodes. Use the “Match form” of the network.
b) Restructure the network into an “X-Y form” (see the lecture notes from Meeting 5). What
are the immediate consequences for the background node(s)?
c) Now, suppose we also identify the suspect’s shoe as being of type F. Would that affect
the probability tables of the obtained network? Does the value of evidence change?
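As a rough orientation for task 1, the sketch below shows how a transfer probability and a background probability could combine into a likelihood ratio for the match. It is not the intended BN solution. It assumes a simplified structure in which matching soil on the shoes comes either from transfer (only possible under proposition 1) or from background. The function name match_form_lr and the values given for t and b are hypothetical placeholders; the real values have to be estimated from soiltransfer.txt and soilresidues.txt. Only the 8% prevalence is taken from the text.

```python
def match_form_lr(t, b, gamma):
    """Likelihood ratio for 'soil of the matching type is found on the shoes'.

    t     : P(soil transferred and still present | contact)  (from the transfer study)
    b     : P(shoe carries background soil)                   (from the survey)
    gamma : P(background soil is of the matching type)        (8% in the text)

    Assumed structure: match = transfer OR matching background.
    """
    p_background_match = b * gamma
    # Proposition 1: transfer occurred, or it did not and background soil matches.
    p_match_h1 = t + (1 - t) * p_background_match
    # Proposition 2: only background soil can explain the match.
    p_match_h2 = p_background_match
    return p_match_h1 / p_match_h2


# Placeholder inputs, NOT estimates from the data files.
lr_example = match_form_lr(t=0.5, b=0.25, gamma=0.08)
print(lr_example)
```

A full network would also condition t and b on footwear type where the data allow it, which is what task c) probes.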
2. Shoeprints: Combining findings
Now, besides the soil residues recovered from the suspect’s shoes, the sole pattern of all the
shoe marks also matches that of the suspect’s shoes. This pattern is classified as number
113 in an internal classification list. The file soilresidues.txt also contains information
about the classified pattern of each shoe in the survey.
a) Use the information in soilresidues.txt and construct a BN with which the match(es) in
pattern can be evaluated against source level propositions.
b) Combine the network obtained in a) with the network obtained in task 1 to get a
combined evidence value of the match in soil and the match in pattern. What
assumptions need to be made to make this network valid?
3. Transfer of fibres
Let’s assume that in bank robberies where an escape car is used, the driver of the car
seldom takes part in the robbery itself (although he may be convicted as an accessory to it).
A provisional estimate is that the driver takes an active part in only 2% of all bank robberies.
Now assume that a witness has pointed out a suspect as having taken an active part in a bank
robbery. Inspecting the escape car, we find 6 foreign wool fibres on the driver’s seat, but no
foreign fibres are recovered anywhere else in the car. The suspect has a woollen pullover
with exactly the same kind of fibres as those recovered from the driver’s seat. The fibres are
quite common, though, and are estimated to be found in 23% of all woollen pullovers on
the market. However, the witness stated that the suspect wore that particular pullover when
committing the robbery.
There is also a large study of car seats investigating the prevalence of persisting fibres.
This study shows that one may expect to find one group of foreign fibres on 43% of all car
seats, two groups of (different) foreign fibres on 22% of all car seats and more than two
groups on 10% of all car seats. Furthermore, an experimental study shows that woollen garments
tend to leave groups of between 2 and 10 fibres on car seats in 2 out of 5 cases when there is
contact between the garment and the seat, and more than 10 fibres in 1 out of 15 cases. One
may also assume that once fibres have been left they are very persistent and can
therefore be recovered at an inspection with a probability close to 100%.
Use all this information to set up a BN with which it is possible to evaluate the match
between the recovered fibres and the fibres of the suspect’s pullover against the pair of
propositions
Hp: The suspect drove the escape car
Hd: The suspect has never sat on the driver’s seat of the escape car
Discuss the validity of the way the evidence is evaluated. Are there serious drawbacks with
the procedure?
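To see how the figures in task 3 interact, the sketch below evaluates one common simplified transfer/background formula with the numbers from the text. It is not the full BN asked for. It assumes that the 8 out of 15 contacts not covered by the experimental study leave no recoverable group, that recovery is certain, and that the size of a background group is ignored. The 2% figure about drivers taking part in the robbery is deliberately left out. All variable names are illustrative.

```python
from fractions import Fraction as F

gamma = F(23, 100)            # share of woollen pullovers with this fibre type
b1 = F(43, 100)               # one foreign group on the seat by chance
b2 = F(22, 100)               # two foreign groups
b_more = F(10, 100)           # more than two foreign groups
b0 = 1 - b1 - b2 - b_more     # no foreign group at all

t_2_10 = F(2, 5)              # contact leaves a group of 2 to 10 fibres
t_gt10 = F(1, 15)             # contact leaves more than 10 fibres
t0 = 1 - t_2_10 - t_gt10      # ASSUMPTION: everything else counts as no transfer

# Finding: exactly one group (6 fibres) matching the pullover.
# Hp: transferred group on a clean seat, or no transfer plus a matching background group.
p_e_hp = b0 * t_2_10 + b1 * gamma * t0
# Hd: the single group is background and happens to match.
p_e_hd = b1 * gamma

lr_fibres = p_e_hp / p_e_hd
print(lr_fibres, float(lr_fibres))   # 22912/14835, roughly 1.54
```

Under these assumptions the fibres give only weak support to Hp, which is a useful check on the order of magnitude the network should produce.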
4. DNA in a crime case
Assume a stain with the genotype AB has been found, and a suspect has been tested and found
to have the genotype AB as well. Assume the population frequencies of A and B are 0.03 and 0.12,
respectively. Construct a Bayesian Network for this situation, where you use one node for
each genotype, one node for each of the involved paternal alleles, and one node for each of
the involved maternal alleles (see the lecture notes for lecture 10). Use also one node for the
hypotheses of the prosecutor/defence.
a) If you assume the prior probability that the suspect deposited the stain was 0.01, what
is the posterior probability given the genetic data? (Use the network to compute)
b) Use instead the network to compute the probability of the observed data under each of
the two hypotheses.
c) Set up an equation computing the answer in (a) from the answer in (b).
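The network results in task 4 can be checked by hand with a small enumeration. The sketch below mirrors the node structure (paternal allele, maternal allele, genotype) by summing over ordered allele pairs. It assumes Hardy-Weinberg proportions, pools all alleles other than A and B into a single "other" class, and treats the suspect and the true donor as unrelated under the defence hypothesis. These are assumptions of the sketch, not statements from the exercise.

```python
from itertools import product

freq = {"A": 0.03, "B": 0.12, "other": 0.85}   # "other" pools the remaining alleles


def genotype_prob(genotype):
    """P(unordered genotype), summed over (paternal, maternal) allele pairs."""
    target = sorted(genotype)
    return sum(freq[p] * freq[m]
               for p, m in product(freq, repeat=2)
               if sorted((p, m)) == target)


p_ab = genotype_prob(("A", "B"))        # 2 * 0.03 * 0.12

# Data: suspect is AB and stain is AB.
p_data_hp = p_ab * 1.0                  # Hp: stain comes from the suspect
p_data_hd = p_ab * p_ab                 # Hd: stain comes from an unrelated person

lr_dna = p_data_hp / p_data_hd          # equals 1 / (2 * 0.03 * 0.12)

prior = 0.01
posterior_dna = (prior * p_data_hp
                 / (prior * p_data_hp + (1 - prior) * p_data_hd))
print(lr_dna, posterior_dna)            # about 138.9 and about 0.584
```

The last expression is Bayes' theorem written out for two hypotheses, which is the kind of relation part c) asks for.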
5. DNA in a disputed paternity case
Using the same type of nodes as in exercise 4, set up a simple BN for paternity cases, using a
single locus with three possible alleles: A, B, and C. Use nodes for the genotype of the
mother, the genotype of the putative father, and the genotype for the child, and in addition
nodes for paternal and maternal alleles to all genotype nodes. Assume the allele frequencies
are 0.3, 0.45, and 0.25, respectively. Use also a node for the two usual hypotheses in paternity cases.
a) Compute the posterior probability for the hypothesis that the putative father is the
father, assuming that the prior is 0.5, and that the data is AB, AC, AC for the mother,
putative father, and child, respectively.
b) Compute the likelihood ratio in the same case for the putative father being the real father.
c) Assume you also have data from an independent locus, with possible alleles X, Y, and
Z, and frequencies 0.1, 0.1, 0.8, respectively. Make a network combining information
from both loci, and compute the result if the data is XX, XY, XY for the mother,
putative father, and child, respectively.
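A hand calculation of the kind sketched below can be used to verify the output of the paternity network. It assumes Mendelian inheritance without mutation, a random unrelated man as the alternative father, and independence between the two loci. The genotype probabilities of the mother and the putative father cancel in the ratio and are therefore omitted. The helper names transmit, paternity_lr and posterior are illustrative only.

```python
from fractions import Fraction as F


def transmit(genotype, allele):
    """P(a parent with this genotype passes on `allele`)."""
    return F(genotype.count(allele), 2)


def paternity_lr(mother, putative_father, child, freq):
    """P(child | mother, PF is father) / P(child | mother, random man is father)."""
    num = den = F(0)
    # Ordered (maternal, paternal) allele pairs that give the child's genotype;
    # the set removes the duplicate pair for a homozygous child.
    for m_allele, p_allele in {(child[0], child[1]), (child[1], child[0])}:
        p_m = transmit(mother, m_allele)
        num += p_m * transmit(putative_father, p_allele)
        den += p_m * freq[p_allele]
    return num / den


def posterior(prior, lr):
    """Posterior probability of paternity from prior and likelihood ratio."""
    return prior * lr / (prior * lr + (1 - prior))


freq1 = {"A": F(3, 10), "B": F(45, 100), "C": F(1, 4)}
freq2 = {"X": F(1, 10), "Y": F(1, 10), "Z": F(8, 10)}

lr1 = paternity_lr("AB", "AC", "AC", freq1)     # first locus
lr2 = paternity_lr("XX", "XY", "XY", freq2)     # second locus
lr_combined = lr1 * lr2                         # valid if the loci are independent

post1 = posterior(F(1, 2), lr1)
post_combined = posterior(F(1, 2), lr_combined)
print(lr1, post1, lr2, lr_combined, post_combined)   # 2 2/3 5 10 10/11
```

In both loci the mother can only have contributed one of the child's alleles, so the ratio reduces to the putative father's transmission probability divided by the population frequency of the paternal allele.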