TRIAL BY PROBABILITY: BAYES’ THEOREM IN COURT

Jill Thompson
Sally Danielson
Eric Hartmann
Dave Bucheger
Justin Koplitz
Let’s say that DNA evidence is brought into a trial. The prosecution explains that the
defendant’s DNA is consistent with blood found at the crime scene. How convincing is
that? If there is a one-in-a-million chance that two people match on a particular DNA
test, is that enough to convict? There is also roughly a one-in-a-million chance of winning
the lottery, yet someone does win. DNA can play a crucial role in a court case; it can be
used to exonerate as well as to convict. It is also interesting to note that not only human
DNA but also non-human DNA can be presented in court.
This report will deal with many of the issues that DNA evidence raises. First, DNA
cases currently in the news will be discussed. Next, the prosecutor’s fallacy will be
explained, and whether jury members really understand the probability behind DNA
matches will be examined. The actual composition of DNA will then be explained to help
the reader comprehend the number of possible DNA permutations involved. Following
this, the probability behind DNA evidence will be discussed with the help of Bayes’
theorem, and the concept of compounding evidence will be applied to the trial of the
century, the O.J. Simpson trial. Finally, a simple analogy of what a consistent DNA
match indicates will be presented.
The earliest case involving DNA evidence occurred in 1979, when the evidence led
to a conviction. It was not until July of 1987, however, that DNA evidence was used to
exonerate a defendant. More recently, there have been many interesting cases in the news
involving DNA evidence.
One interesting case, in Texas, involves non-human DNA. A judge has decided to
allow evidence from DNA tests on strands of dog hair found on a duffel bag left at the
scene of the crime. Hair was taken from the suspects’ dogs, and a match was found. The
trial is set to begin July 10, 2000.
Another interesting case, reported on April 22, 2000, is taking place in Spokane,
Washington. A special task force operated for three years, at a cost of $2.2 million, to
find a serial killer. Many samples of similar DNA had been left at the crime scenes, but
the cases offered only one other clue: a 1977 white Corvette had been seen in the area
after many of the murders. A record of a 1977 white Corvette receiving a traffic violation
in the area produced a name, which led to a DNA test. After further questioning, a man
was arrested and charged with 12 murders dating back to 1990. This number could still
increase to 18 before the trial begins.
A third case was reported on April 8, 2000 in Australia, about 310 miles from
Sydney. The small community will be taking DNA samples from over 600 men, 18 or
older, to help solve the rape and beating of a 91-year-old woman. The test is not
required; however, anyone who chooses not to take it will automatically become a
suspect.
This screening practice is already taking place in Britain. In the last five years,
more than 100 such screenings have been carried out, with about one-third resulting in
arrests. The British government now wants to set up a nationwide database of DNA
samples; previously, samples from the innocent were destroyed.
In Illinois, since the death penalty resumed in 1977, 12 people have been put to
death. However, 13 death row inmates have also been cleared of their crimes, most of
them released as a result of DNA testing. The Illinois governor has postponed all lethal
injections pending an investigation. Nationally, 64 people have had their criminal
convictions thrown out after being exonerated by DNA testing.
Another case where interpreting DNA evidence came into play was in December
of 1993, when Andrew Dean’s appeal of a rape conviction was heard. Originally, in
Dean’s 1990 trial, DNA match probabilities were used to help obtain a conviction. The
match probability is the probability that the DNA profile of a random person in the
population matches the crime sample.
In the Dean appeal, Peter Donnelly, a statistics professor from London, argued
that the DNA probabilities were presented in a way that was misunderstood by the jury;
this misunderstanding is known as the prosecutor’s fallacy. He claimed that forensic
evidence gives us the probability that the defendant’s DNA profile matches the crime
sample assuming the defendant is innocent, or P(Match | Innocent). What the jury is
looking for, though, is the probability that the defendant is innocent assuming the DNA
profiles of the defendant and the crime sample match, or P(Innocent | Match). Assuming
that the probability of a match given innocence equals the probability of innocence given
a match is the prosecutor’s fallacy.
Donnelly presented an example of this misconception. Suppose a group of judges
were playing poker with the Archbishop of Canterbury. If the archbishop dealt himself a
royal flush on the first hand, the judges might suspect him of cheating; the probability of
the archbishop dealing a royal flush, assuming he is honest, is 1 in 70,000. But if the
judges were asked how likely it is that the archbishop is honest, given that he dealt a
royal flush, they would probably quote a much higher probability than 1 in 70,000.
Therefore, the probability of a flush given honesty and the probability of honesty given a
flush need not be equal.
Donnelly then showed that, using Bayes’ theorem, we can figure out precisely
how prior beliefs should be altered in the light of new data. With Bayes’ theorem we can
find the odds of the defendant being innocent, that is, the ratio of the probability of
innocence to the probability of guilt. The “prior odds” are the odds of innocence before
hearing the DNA evidence, and the “posterior odds” are the odds of innocence after
hearing it. The posterior odds equal the prior odds multiplied by the DNA match
probability.
The following example demonstrates how match probabilities can be confused.
Suppose a crime is committed in Menomonie, WI by an unidentified white male. The
pool of possible perpetrators could be at most the entire white male population of
Menomonie, say 8,000. So the prior odds are about 8,000 to 1 in favor of innocence. If
the probability of a random DNA match with a suspect is 1 in a million, then the
posterior odds of innocence are 8,000 × 1/1,000,000, or 1/125; that is, the odds are 125
to 1 in favor of guilt, not a million to one. Equating the one-in-a-million match
probability with the probability of innocence is an example of the prosecutor’s fallacy.
Returning to the Dean case, Donnelly showed that the prosecutor’s fallacy was
influential enough to make the verdict of the original trial unsafe. Therefore, Dean’s
conviction was overturned, and a retrial was ordered. The judge’s decision did not
indicate that DNA profiling itself was unsafe, but that DNA evidence must be presented
carefully enough in the future to prevent the jury from committing the prosecutor’s
fallacy.
So, how accurately can judges, jurors, prosecutors, and attorneys process
quantitative probabilistic evidence? In most cases, courts have been reluctant to admit
mathematical evidence for fear that jurors will overweigh it relative to other evidence,
letting it sway their judgment more than it should. However, studies have shown that
juries tend toward the opposite: most mathematical evidence is underweighed rather than
overweighed.
To determine whether or not juries correctly interpret probabilistic evidence,
many studies involving hypothetical crimes and mock juries have been conducted. The
studies operated in the following manner: (1) the jurors would initially write down their
probability of the defendant’s guilt, P0. (2) They were next presented with the
probabilistic evidence: the frequency F(T) with which an incriminating trait T occurs in
the general population, together with the evidence TD that the defendant has the
incriminating trait. (3) The jurors’ posterior probabilities of guilt, based on the new
evidence, were then recorded. The data from these mock trials could then be compared
with the standard rule for revising probabilistic beliefs in the light of new evidence,
Bayes’ theorem. By Bayes’ theorem, the posterior odds of guilt G should be:

P(G | TD) / P(G' | TD) = [P(TD | G) / P(TD | G')] × [P(G) / P(G')]
Since the juror presumably believes a guilty person would have the incriminating
trait, P(TD | G) = 1, and since only a fraction F(T) of innocent people have trait T,
P(TD | G') = F(T). We can therefore substitute 1/F(T) for the likelihood ratio
P(TD | G) / P(TD | G'), so the posterior odds of guilt become:

[1 / F(T)] × [P(G) / P(G')]
Going back to the mock trials, in all cases the mean judgment probabilities
increased when the evidence TD and F(T) was presented. A typical table from several
mock trials follows:

"Trial"              F(T)    Mean P0   Mean P1   Bayes P1
Goodman              .001    .29       .47       .997
Goodman              .10     .29       .34       .803
Faigman, Baglioni    .20     .61       .70       .876
Faigman, Baglioni    .40     .64       .71       .816
As the table shows, comparing the mean posterior probability P1 with the Bayes
P1 reveals that the juries underweighed the probabilistic evidence, since most jury P1’s
are much lower than their Bayesian counterparts. The studies also showed that the
prosecutor’s fallacy rarely occurred. The fallacy would occur when people equate
P(G | TD) with 1 − F(T), the proportion of the population lacking the trait. No more than
5% of the jurors reported posterior judgment probabilities of 1 − F(T).
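The Bayes P1 column can be approximated from P0 and F(T) using the odds form of the update. The sketch below reproduces some of the table’s rows exactly; the others differ slightly, presumably because the published figures average individual jurors’ posteriors rather than updating the mean prior:

```python
def bayes_posterior(p0, f_t):
    """Update a prior probability of guilt p0, given trait evidence with
    population frequency f_t, using the likelihood ratio 1/F(T)."""
    prior_odds = p0 / (1 - p0)
    posterior_odds = prior_odds / f_t   # multiply the odds by 1/F(T)
    return posterior_odds / (1 + posterior_odds)

print(round(bayes_posterior(0.29, 0.001), 3))  # 0.998
print(round(bayes_posterior(0.29, 0.10), 3))   # 0.803
print(round(bayes_posterior(0.61, 0.20), 3))   # 0.887
print(round(bayes_posterior(0.64, 0.40), 3))   # 0.816
```

Even where the computed values differ from the table in the third decimal place, they remain far above the mean jury P1 in every row, which is the point of the comparison.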
The most famous court case in which probabilistic evidence was severely
underweighed was the trial of the century, the O.J. Simpson trial for the double murder
of Nicole Brown Simpson and Ron Goldman. Many DNA-related items were found,
including blood on the white Ford Bronco, on his driveway, in the foyer of the house, in
the bathroom, on a leather glove, and on a sock at the foot of his bed. The defense
claimed that, given the volume of DNA evidence and allegedly racist police officers, the
evidence must have been planted. O.J. was found not guilty.
In order to understand DNA testing, one must first understand what DNA is.
DNA has a one-of-a-kind structure made up of two chain-like strands arranged in a
twisted-ladder, double-helix form. Alternating sections of phosphate and a sugar called
deoxyribose make up the sides of this structure. The inner strands that resemble the
rungs of a ladder are base pairs: thymine with adenine and guanine with cytosine.
A person’s DNA code has approximately 3 billion loci, or places, that determine
his or her traits. Since all humans belong to the same species, much of each person’s
DNA code is identical. Traits with identical codes include things like having ten fingers
and ten toes; unique differences are created by unique DNA codes. Each person differs
at about 10 million spots along the DNA strand, so roughly one in every three hundred
loci is unique. It is these unique places, called DNA markers, that are used in forensic
science to identify people through a process known as DNA fingerprinting.
DNA fingerprinting is a process of identification that compares portions of DNA.
A DNA fingerprint is created by removing the DNA from a sample such as hair, blood,
saliva, or tissue. A detergent is used to separate the DNA from the rest of the cellular
material; it can also be separated by applying a large amount of pressure to “squeeze
out” the DNA. The sample is then cut into segments using restriction enzymes.
Next, the DNA is loaded into a gel, such as agarose, or onto an agarose-coated
sheet, and electrophoresis is performed. Since DNA carries a negative charge, all of the
DNA migrates toward the bottom of the gel, where a positive charge is applied during
electrophoresis. Smaller pieces of DNA move faster and thus travel closer to the bottom
than longer pieces, so the fragments end up separated by size, with the smaller pieces
towards the bottom and the larger pieces towards the top.
Heating or chemically treating the DNA in the gel denatures it, rendering it
single-stranded. The DNA is next blotted onto a sheet of nitrocellulose paper and baked
so that it permanently attaches to the sheet.
Finally, the sample is ready to be analyzed. Taking an X-ray photo develops a
picture showing the DNA pattern. If two DNA samples match, it is very likely that they
came from the same person.
Let M be the event of a DNA match between the defendant’s blood and blood
found at the crime scene, I the event that the defendant is innocent, and I' the event that
the defendant is guilty. There are reliable measures for P(M | I), the conditional
probability of M given I. If the defendant is innocent, the blood must be someone else’s;
hence, P(M | I) is the probability of a match between blood samples from two different
individuals. Since human DNA signatures are extremely distinctive, the probability of a
random match is very low: P(M | I) is often found to be between 10^-8 and 10^-10,
depending on the amount of DNA obtained. However, it is P(I | M), the likelihood of
innocence given the evidence, that should be considered by the jurors, not P(M | I).
These two conditional probabilities are related by Bayes’ theorem:

P(I | M) = P(I) P(M | I) / P(M)
So P(I | M) ≠ P(M | I) unless P(I) = P(M); assuming the two are equal is the
prosecutor’s fallacy. If, in Bayes’ theorem, we substitute for P(M) using the theorem of
total probability, we obtain the expanded form of Bayes’ theorem:

P(I | M) = P(I) P(M | I) / [P(I) P(M | I) + P(I') P(M | I')]

Since the denominator is larger than P(M | I') P(I') alone,

P(I | M) < P(M | I) P(I) / [P(M | I') P(I')]

and, because P(M | I') ≈ 1 (a guilty defendant’s blood is virtually certain to match),

P(I | M) ≈ P(I) P(M | I) / P(I')
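A quick numerical check of the expanded formula against the approximation, using illustrative values (the prior and match probability below are assumptions for the sketch, not figures from any case in this report):

```python
p_i = 0.999        # P(I): prior probability of innocence (assumed)
p_ig = 1 - p_i     # P(I'): prior probability of guilt
p_m_i = 1e-8       # P(M | I): random-match probability (assumed)
p_m_ig = 1.0       # P(M | I'): a guilty defendant is certain to match

# Expanded (exact) form of Bayes' theorem:
exact = (p_i * p_m_i) / (p_i * p_m_i + p_ig * p_m_ig)

# Approximation P(I | M) ~ P(I) P(M | I) / P(I'):
approx = p_i * p_m_i / p_ig

print(exact)   # ~9.99e-06
print(approx)  # ~9.99e-06, agreeing with the exact value very closely
```

Because the term P(I) P(M | I) in the denominator is tiny compared with P(I') P(M | I'), the approximation is accurate to many decimal places.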
In many cases, such as the O.J. Simpson trial, more than one DNA sample is
involved. Since the jury will need to consider this compounding evidence, let us define
some more notation. Suppose the events M1, …, Mk are all pieces of evidence introduced
against the defendant. The probability of innocence given all the evidence,
P(I | M1 ∩ … ∩ Mk), needs to be calculated. A theorem exists that gives an upper bound
for this probability, but first some definitions need to be introduced.
For a given event I, write PI(⋅) = P(⋅ | I). Two events M1 and M2 are said to be
conditionally independent with respect to I if PI(M2 | M1) = PI(M2). This definition is
symmetric: the events M1 and M2 can be interchanged.
The ratio P(M2 | M1) / P(M2) gives a measure of how strongly associated M2 and
M1 are. If the ratio is less than 1, then P(M2 | M1) < P(M2), meaning that M2 is less
likely to occur given M1. Conversely, if the ratio is greater than 1, then
P(M2 | M1) > P(M2), meaning that M2 is more likely to occur given M1. Finally,
P(M2 | M1) = P(M2) if and only if M1 and M2 are independent.
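The association ratio can be illustrated on a small made-up joint distribution; every probability below is invented purely for illustration:

```python
# Joint distribution p[(m1, m2)] over whether events M1 and M2 occur
# (True = the event occurs). Values are illustrative only.
p = {
    (True, True): 0.10,
    (True, False): 0.05,
    (False, True): 0.15,
    (False, False): 0.70,
}

p_m1 = p[(True, True)] + p[(True, False)]    # P(M1) = 0.15
p_m2 = p[(True, True)] + p[(False, True)]    # P(M2) = 0.25
p_m2_given_m1 = p[(True, True)] / p_m1       # P(M2 | M1) = 2/3

ratio = p_m2_given_m1 / p_m2
print(ratio)   # ~2.67 > 1: M2 becomes more likely once M1 has occurred
```

A ratio above 1, as here, is exactly the positive association the compounding-evidence argument relies on when guilt is assumed.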
In the O.J. trial, the concern is evaluating the probability of guilt given the
totality of DNA evidence. Let M1 be the event that a drop of blood found near the
victims’ bodies is consistent with the defendant’s blood. Cellmark Diagnostics, the DNA
laboratory of record in the trial, said that only one person in 170 million could be
expected to match the genetic markers identified in the drop of blood, so
P(M1 | I) = (1.7 × 10^8)^-1 = 5.88 × 10^-9. Let M2 be the event that blood on a sock
belonging to the defendant and found in his bedroom is consistent with Nicole’s blood.
Cellmark put the probability that another person would match the genetic markers found
in the victims’ blood at 1 in 6.8 billion, so
P(M2 | I) = (6.8 × 10^9)^-1 = 1.47 × 10^-10. We also take P(M1 | I') ≈ 1 and
P(M2 | I') ≈ 1, since a guilty defendant’s blood would be virtually certain to match.
To apply the theorem, one must argue that M1 and M2 are more strongly
associated conditionally given guilt than given innocence. If I is true, then M1 and M2
are independent coincidences; but assuming guilt, the occurrence of one would increase
the probability of the other. The two blood-matching events would therefore reasonably
be judged more strongly associated conditionally given I' than given I.
Substituting these numbers into our equations gives:

P(I | M1 ∩ M2) ≤ [P(I) / P(I')] × [P(M1 | I) P(M2 | I)] / [P(M1 | I') P(M2 | I')]
             ≤ [P(I) / P(I')] × (5.88 × 10^-9)(1.47 × 10^-10)
             ≤ [P(I) / P(I')] × (8.65 × 10^-19)
Now we can give an upper bound for the ratio P(I) / P(I'). If we go to the
extreme of saying that O.J. Simpson was no more likely to be the killer than anyone else
in the world, then P(I') ≈ 10^-10; realistically, P(I') is much larger than this. Also,
P(I) ≈ 1. Hence P(I) / P(I') ≈ 10^10, and P(I | M1 ∩ M2) ≤ 8.65 × 10^-9. The
conditional probability of innocence given both DNA matches is so small as to place the
defendant’s guilt beyond any reasonable doubt.
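The bound can be reproduced directly from the quoted match probabilities; a sketch using the figures above:

```python
p_m1_i = 1 / 1.7e8      # P(M1 | I): 1 in 170 million (Cellmark figure)
p_m2_i = 1 / 6.8e9      # P(M2 | I): 1 in 6.8 billion (Cellmark figure)

# Product of the two match probabilities (guilt makes both matches certain,
# so the denominator P(M1 | I') P(M2 | I') is ~1 and drops out).
evidence_factor = p_m1_i * p_m2_i

ratio_bound = 1e10      # the extreme bound on P(I)/P(I') from the text

bound = ratio_bound * evidence_factor
print(evidence_factor)  # ~8.65e-19
print(bound)            # ~8.65e-09, the upper bound on P(I | M1 and M2)
```

Even with the most generous possible prior for the defendant, the posterior probability of innocence stays below one in a hundred million.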
Many jurors have stated that they consider DNA blood matching no more reliable
than fingerprint matching. To illustrate the reliability of DNA evidence, here is an
analogy involving a deck of cards. There are 52! ≈ 8.07 × 10^67 possible orderings of a
deck; keep in mind that the number of possible DNA profiles is much larger still. From a
drop of blood at a crime scene, only a few DNA sites can be extracted, which is
comparable to learning the cards at only a few positions in the deck. Also, depending on
the amount of DNA extracted and the degree of contamination, not all the information at
a site may be established; this corresponds to being able to determine only the number,
suit, or color of a particular card. So, if the suspect’s blood is consistent with that found
at the crime scene, how convincing is that?
Suppose this partial information is available at only seven positions of the
crime-scene “deck”:
(♠, ♦, 2-♥, 3-♣, 10-♥, 4-♦, B)
that is, a spade, a diamond, four fully identified cards, and a black card. Likewise,
suppose at the same DNA site locations in the suspect’s blood sample we find
(2-♠, 10-♦, 2-♥, 3-♣, 10-♥, 4-♦, K-♣)
This shows that the suspect’s blood is consistent with that found at the crime scene.
Assuming all orderings are equally likely, the probability of obtaining the match is

(13/52) × (13/51) × (1/50) × (1/49) × (1/48) × (1/47) × (24/46) ≈ 6.0154 × 10^-9

(13 possible spades, then 13 possible diamonds, then the four exact cards, then any of
the 24 remaining black cards). This means only about six people in a billion would
produce a match.
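The card computation can be verified exactly with rational arithmetic:

```python
from fractions import Fraction
from functools import reduce
from operator import mul

# Seven positions: a spade (13 of 52 cards), a diamond (13 of 51),
# four exact cards, then any of the 24 remaining black cards out of 46.
factors = [Fraction(13, 52), Fraction(13, 51), Fraction(1, 50),
           Fraction(1, 49), Fraction(1, 48), Fraction(1, 47),
           Fraction(24, 46)]

p_match = reduce(mul, factors)   # exact rational product
print(float(p_match))            # ~6.0154e-09, about six per billion
```

Using Fraction avoids any floating-point rounding in the intermediate products; the exact value is 4056 / 674,274,182,400.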
This report has shown that DNA evidence is involved in many court cases. Each
additional matching DNA sample greatly increases the probability that one person
committed the crime. However, the evidence has not always been presented correctly,
since many jurors do not understand the evidence at hand, as in our case study of the
O.J. Simpson trial.
Bibliography
Dale, Mike., Alissa Proctor, and Joel Williams. “Evidence: The True Witness.”
[http://library.thinkquest.org/17049/]. 1998.
“Illinois suspends death penalty.” [www.cnn.com/2000/US/01/31/illinois.executions.02].
Jan. 31, 2000.
“Judge allows DNA tests on robbery suspect’s dog.”
[www.cnn.com/2000/us/04/04/dog.hair.dna.ap]. April 4, 2000.
Kaye, D.H., and Koehler, Jonathan J. “Can Jurors Understand Probabilistic Evidence?”
Journal of the Royal Statistical Society, Series A Vol. 154, Part I. 1991: 75-81.
Possley, Maurice. “Prisoner to go free as DNA clears him in beauty shop rape.”
[www.soci.niu.edu/rape-willis.html]. Feb. 24, 1999.
Pringle, David. “Who’s the DNA Fingerprinting Pointing At?” New Scientist. Jan 29,
1994: 51-52.
Saunders, Sam C., Chris N. Meyer, and Dane W. Wu. “Compounding Evidence from
Multiple DNA-Tests.” Mathematics Magazine 72, NO 1. Feb. 1999: 39-43.
“Serial killer probe relies on old-fashioned police work, DNA technology.”
[www.cnn.com/2000/us/04/22/spokane.slayings.ap/index.html]. April 22, 2000.
Starr, Cecie. Biology: Concepts and Applications, 4th ed. Brooks/Cole, 2000.
“The Simpson defense: a case review.”
[www.cnn.com/US/OJ/verdict/defense/index2.html]. Oct. 3, 1995.
“The Simpson prosecution: a case review.”
[www.cnn.com/us/oj/verdict/prosecution/index2.html]. Oct. 3, 1995.