TRIAL BY PROBABILITY: BAYES' THEOREM IN COURT

Jill Thompson, Sally Danielson, Eric Hartmann, Dave Bucheger, Justin Koplitz

Let's say that DNA evidence is brought into a trial. The prosecution explains that the defendant's DNA is a consistent match with blood found at the crime scene. How convincing is that? If there is a one-in-a-million chance that two people match on a particular DNA test, is that enough to convict? There is also said to be a one-in-a-million chance of winning the lottery, yet someone does win. DNA can play a crucial role in a court case; it can be used to exonerate as well as to convict. It is also interesting to note that non-human DNA, not just human DNA, can be presented in court.

This report will deal with many of the issues that DNA evidence raises. First, DNA cases currently in the news will be discussed. Next, the prosecutor's fallacy will be explained, and it will be examined whether jury members really understand the probability behind DNA matches. The composition of DNA will then be described to help convey the number of possible DNA permutations involved. Following this, the probability behind DNA matching will be discussed with the help of Bayes' theorem. The concept of compounding evidence will then be applied to the trial of the century, the O.J. Simpson trial. Finally, a simple analogy for what a consistent DNA match indicates will be presented.

The earliest case involving DNA evidence occurred in 1979 and resulted in a conviction based on that evidence. However, it was not until July of 1987 that DNA evidence was involved in the exoneration of a defendant. More recently, there have been many interesting cases in the news involving DNA evidence. One case, in Texas, involves non-human DNA: a judge has decided to allow evidence from DNA tests on strands of dog hair found on a duffel bag left at the scene of the crime. Hair was taken from the suspects' dogs, and a match was found.
The trial is set to begin July 10, 2000. Another interesting case, reported on April 22, 2000, is taking place in Spokane, Washington. A special task force was established for three years, at a cost of $2.2 million, to find a serial killer. Samples of similar DNA had been left at the crime scenes, and there was only one other clue: a 1977 white Corvette had been seen in the area after many of the murders. Traffic-violation records for a 1977 white Corvette in the area produced a name, and a DNA test followed. After further questioning, a man was arrested and charged with 12 murders dating back to 1990; that number could still increase to 18 before the trial begins.

A third case was reported on April 8, 2000 in Australia, about 310 miles from Sydney. The small community will be taking DNA samples from over 600 men aged 18 or older to help solve the rape and beating of a 91-year-old woman. The test is not required; however, anyone who chooses not to take it automatically becomes a suspect. This screening practice is already taking place in Britain, where more than 100 such screenings have been carried out in the last five years, with about one-third resulting in arrests. The British government now wants to set up a nationwide database of DNA samples, whereas previously the samples from innocent people were destroyed.

In Illinois, since the death penalty resumed in 1977, 12 people have been put to death. However, 13 death row inmates have also been cleared of their crimes, most of them released as a result of DNA testing. The Illinois governor has postponed all lethal injections pending an investigation. Nationally, 64 people have had their criminal convictions thrown out after being exonerated by DNA testing.

Another case where the interpretation of DNA evidence came into play was in December of 1993, when Andrew Dean's appeal of a rape conviction was heard.
Originally, in Dean's 1990 trial, DNA match probabilities were used to help obtain a conviction. The match probability is the probability that the DNA profile of a random person in the population matches the crime sample. In the Dean appeal, Peter Donnelly, a statistics professor from London, argued that the DNA probabilities were presented in a way that the jury misunderstood; this is known as the prosecutor's fallacy. He claimed that forensic evidence gives us the probability that the defendant's DNA profile matches the crime sample assuming the defendant is innocent, or P(Match | Innocent). What the jury wants, though, is the probability that the defendant is innocent assuming the DNA profiles of the defendant and the crime sample match, or P(Innocent | Match). Assuming that the probability of a match given innocence equals the probability of innocence given a match is the prosecutor's fallacy.

Donnelly presented an example of this misconception. Suppose a group of judges were playing poker with the Archbishop of Canterbury. If the archbishop dealt a royal flush on the first hand, the judges might suspect him of cheating; the probability of the archbishop dealing the royal flush, assuming he is honest, is about 1 in 70,000. But if the judges were asked whether the archbishop was honest, given that he had dealt a royal flush, they would probably quote a much higher probability. Therefore, the probability of a flush given honesty and the probability of honesty given a flush need not be equal.

Donnelly then showed that by using Bayes' theorem, we can figure out precisely how prior beliefs should be altered in light of new data. With Bayes' theorem we can find the odds of the defendant being innocent, that is, the ratio of the probability of innocence to the probability of guilt. The "prior odds" are the odds of innocence before hearing the DNA evidence, and the "posterior odds" are the odds of innocence after hearing the DNA evidence.
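This updating rule can be sketched in a few lines of Python. The sketch below is illustrative only; the function name and the numbers (a pool of about 1,000 possible perpetrators and a one-in-a-million match probability) are hypothetical and do not come from any case discussed here.

```python
# Bayes' theorem in odds form: the posterior odds of innocence equal the
# prior odds multiplied by the likelihood ratio
# P(Match | Innocent) / P(Match | Guilty). When a guilty defendant is
# certain to match, the likelihood ratio reduces to the random-match
# probability.

def posterior_odds(prior_odds, p_match_innocent, p_match_guilty=1.0):
    """Odds of innocence after a reported DNA match."""
    return prior_odds * (p_match_innocent / p_match_guilty)

# Hypothetical numbers: about 1,000 possible perpetrators, so prior odds
# of innocence are roughly 1,000 to 1; match probability one in a million.
odds = posterior_odds(1000, 1e-6)
print(odds)                 # about 0.001, i.e. odds of 1 to 1,000 for innocence
print(odds / (1 + odds))    # posterior probability of innocence
```

The posterior probability of innocence here is about 0.1 percent: small, but a thousand times larger than the one-in-a-million figure a juror might be tempted to quote.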
So the posterior odds equal the prior odds multiplied by the DNA match probability. The following example demonstrates how match probabilities can be confused. Pretend a crime is committed in Menomonie, WI by an unidentified white male. The pool of possible perpetrators could be at most the entire white male population of Menomonie, say 8,000, so the prior odds are about 8,000 to 1 in favor of innocence. If the probability of a random DNA match with a suspect is 1 in a million, then the posterior odds of innocence are 8,000 × (1 in a million), or about 1 to 125. In other words, the probability of innocence is roughly 1 in 126: far from certainty, but far greater than one in a million. Treating the one-in-a-million match probability itself as the probability of innocence is an example of the prosecutor's fallacy.

Returning to the Dean case, Donnelly showed that the prosecutor's fallacy was influential enough to make the verdict of the original trial unsafe. Dean's conviction was therefore lifted, and a retrial was ordered. The judge's decision did not indicate that DNA profiling was unsafe, but rather that DNA evidence must be presented carefully enough in the future to keep juries from committing the prosecutor's fallacy.

So, how accurately can judges, jurors, prosecutors, and attorneys process quantitative probabilistic evidence? In most cases, courts have been reluctant to admit mathematical evidence because they fear that jurors will overweigh it relative to other evidence, meaning it will sway their judgement more than it should. However, studies have shown the opposite tendency: most mathematical evidence is underweighed rather than overweighed. To determine whether juries correctly interpret probabilistic evidence, many studies involving hypothetical crimes and mock juries have been conducted. The studies operated in the following manner: (1) the jurors would initially write down the probability of guilt they assigned to the defendant, P0.
(2) They were next presented with the probabilistic evidence, such as the frequency F(T) with which an incriminating trait T occurs in the general population, or the probability that the defendant had the incriminating trait, denoted P(TD). (3) The jurors' posterior probabilities of guilt, based on the new evidence, were then recorded. The data from these mock trials could then be compared with the standard rule for revising probabilistic beliefs in light of new evidence, Bayes' theorem. By Bayes' theorem, the posterior odds of guilt should be:

P(G | TD) / P(G' | TD) = [P(TD | G) / P(TD | G')] × [P(G) / P(G')]

Since the juror presumably believes the guilty person has the incriminating trait, so that P(TD | G) = 1, and only a fraction F(T) of innocent people have trait T, so that P(TD | G') = F(T), we can substitute 1/F(T) for the likelihood ratio P(TD | G) / P(TD | G'). The odds of guilt are then:

[1 / F(T)] × [P(G) / P(G')]

Going back to the mock trials, in all cases the mean judgement probabilities increased when the evidence TD and F(T) was presented. A typical table from several mock trials is as follows:

"Trial"               F(T)    Mean P0    Mean P1    Bayes P1
Goodman               .001    .29        .47        .997
Goodman               .10     .29        .34        .803
Faigman, Baglioni     .20     .61        .70        .876
Faigman, Baglioni     .40     .64        .71        .816

As the table shows, comparing the mean posterior probability P1 with the Bayes P1 indicates that the juries underweighed the probabilistic evidence, since most jury P1 values are much lower than their Bayesian counterparts. The studies also showed that the prosecutor's fallacy rarely occurred. The fallacy occurs when people equate P(G' | TD) with P(TD | G') = F(T), and hence report a posterior probability of guilt of 1 − F(T). No more than 5% of the jurors reported posterior judgement probabilities equal to 1 − F(T). The most famous court case in which probabilistic evidence was severely underweighed was the trial of the century, the O.J.
Simpson trial, for the double murder of Nicole Simpson and Ron Goldman. Many DNA-related items were collected, including blood found on the white Ford Bronco, on his driveway, in the foyer of the house, in the bathroom, on a leather glove, and on a sock at the foot of his bed. The defense claimed that, given the volume of DNA evidence and allegedly racist police officers, the evidence must have been planted. O.J. was found not guilty.

In order to understand DNA testing, one must first understand what DNA is. DNA has a one-of-a-kind structure made up of two chain-like strands arranged in a twisted-ladder, double-helix form. Alternating sections of phosphate and a sugar called deoxyribose make up the sides of this structure. The inner strands that resemble the rungs of a ladder are base pairs of adenine with thymine and guanine with cytosine. A person's DNA code has approximately 3 billion loci, or places, that encode his or her traits. Since all humans belong to the same species, much of each person's DNA code is identical; traits such as ten fingers and ten toes have identical codes, while unique differences are created by unique DNA codes. Each person differs from others at about 10 million spots along the DNA strand, so roughly one in every three hundred loci is unique. It is these unique places, called DNA markers, that are used in forensic science to identify people through a process known as DNA fingerprinting.

DNA fingerprinting is a process of identification that compares portions of DNA. A DNA fingerprint is created by removing the DNA from a sample such as hair, blood, saliva, or tissue. A detergent is used to separate the DNA from the rest of the cellular material; it can also be separated by applying a large amount of pressure to "squeeze out" the DNA. The sample is then cut into segments using restriction enzymes. Next, the DNA is poured into a gel, such as agarose, or onto an agarose-coated sheet.
Electrophoresis is then performed. Since DNA carries a negative charge, it migrates toward the bottom of the gel, where a positive charge is applied during electrophoresis. The smaller pieces of DNA move faster and thus end up closer to the bottom than the longer pieces, so the fragments are separated by size. Heating or chemically treating the DNA in the gel denatures it, rendering it single-stranded. The DNA is next blotted onto a sheet of nitrocellulose paper and baked so that it permanently attaches to the sheet. Finally, the sample is ready to be analyzed: by taking an X-ray photo, a picture showing the DNA pattern can be developed. If two DNA samples match, it is very likely that they came from the same person.

Let M = the event of a DNA match between the defendant's blood and blood found at the crime scene. Let I = the event that the defendant is innocent, and I' = the event that the defendant is guilty. There are reliable measures for P(M | I), the conditional probability of M given I. If the defendant is innocent, the blood must be someone else's; hence P(M | I) is the probability of a match between blood samples from two different individuals. Since human DNA signatures are extremely distinctive, the probability of a random match is very low: P(M | I) is typically found to be between 10^-8 and 10^-10, depending on the amount of DNA obtained. However, it is P(I | M), the likelihood of innocence given the evidence, that the jurors should consider, not P(M | I). These two conditional probabilities are related by Bayes' theorem:

P(I | M) = P(I) P(M | I) / P(M)

So P(I | M) ≠ P(M | I) unless P(I) = P(M); assuming the two conditional probabilities are equal is the prosecutor's fallacy.
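The gap between the two conditional probabilities can be made concrete with a short Python sketch, expanding the denominator P(M) over the innocent and guilty cases. The numbers are hypothetical, chosen only for illustration.

```python
# P(I | M) via Bayes' theorem, expanding P(M) by total probability:
#   P(M) = P(M | I) P(I) + P(M | I') P(I').

def p_innocent_given_match(p_m_given_i, p_i, p_m_given_guilty=1.0):
    p_guilty = 1.0 - p_i
    p_m = p_m_given_i * p_i + p_m_given_guilty * p_guilty
    return p_m_given_i * p_i / p_m

# Hypothetical numbers: random-match probability 1e-8, and a prior pool
# of a million possible perpetrators, so P(I) = 1 - 1e-6.
p = p_innocent_given_match(1e-8, 1 - 1e-6)
print(p)   # about 0.0099
```

Here P(M | I) is 10^-8, yet P(I | M) comes out near 0.0099; equating the two would overstate the strength of the evidence by a factor of about a million.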
If, in Bayes' theorem, we substitute for P(M) using the theorem of total probability, we obtain the expanded form of Bayes' theorem:

P(I | M) = P(I) P(M | I) / [P(I) P(M | I) + P(I') P(M | I')]
         < [P(M | I) / P(M | I')] × [P(I) / P(I')]
         ≈ P(M | I) × [P(I) / P(I')], since P(M | I') ≈ 1.

In many cases, such as the O.J. Simpson trial, more than one DNA sample is involved. Since the jury will need to consider this compounding evidence, let us define some more notation. Suppose the events M1, ..., Mk are the pieces of evidence introduced against the defendant. We need to calculate the probability of innocence given all the evidence, P(I | M1 ∩ ... ∩ Mk). A theorem exists that gives an upper bound for this probability, but first some definitions are needed. For a given event I, write P_I(·) = P(· | I). Two events M1 and M2 are said to be conditionally independent with respect to I if P_I(M2 | M1) = P_I(M2). The roles of M1 and M2 in this definition can be interchanged; conditional independence is a symmetric relation.

The ratio P(M2 | M1) / P(M2) gives a measure of how strongly associated M2 and M1 are. If the ratio is less than 1, then P(M2 | M1) < P(M2), meaning that M2 is less likely to occur given M1. The opposite also holds: if the ratio is greater than 1, then P(M2 | M1) > P(M2), meaning that M2 is more likely to occur given M1. P(M2 | M1) = P(M2) if and only if M1 and M2 are independent.

In the O.J. trial, the concern is evaluating the probability of guilt given the totality of the DNA evidence. Let M1 be the event that a drop of blood found near the victims' bodies is consistent with the defendant's blood.
Cellmark Diagnostics, the DNA laboratory of record in the trial, said that only one person in 170 million could be expected to match the genetic markers identified in the drop of blood. This means P(M1 | I) = (1.7 × 10^8)^-1 = 5.88 × 10^-9. Let M2 be the event that blood on a sock belonging to the defendant, found in his bedroom, is consistent with Nicole's blood. Cellmark put the probability that another person would match the genetic markers found in the victim's blood at 1 in 6.8 billion, so P(M2 | I) = (6.8 × 10^9)^-1 = 1.47 × 10^-10. Also, P(M1 | I') ≈ 1 and P(M2 | I') ≈ 1, since, given guilt, the matches are essentially certain.

To apply the theorem, one must argue that M1 and M2 are more strongly associated conditionally given guilt than given innocence. If I is true, then M1 and M2 are independent; but assuming guilt, the occurrence of one would increase the probability of the other. The two blood-matching events can therefore reasonably be judged to be more strongly associated conditionally given I' than given I. Substituting these numbers into our equations gives:

P(I | M1 ∩ M2) ≤ [P(I) / P(I')] × [P(M1 | I) P(M2 | I)] / [P(M1 | I') P(M2 | I')]
             = [P(I) / P(I')] × (5.88 × 10^-9)(1.47 × 10^-10)
             = [P(I) / P(I')] × 8.65 × 10^-19

Now we can give an upper bound for the ratio P(I) / P(I'). If we go to the extreme of saying that O.J. Simpson was no more likely to be the killer than anyone else in the world, then P(I') ≈ 10^-10; realistically, P(I') is much larger than this. Also, P(I) ≈ 1. Hence P(I) / P(I') ≈ 10^10, and P(I | M1 ∩ M2) ≤ 8.65 × 10^-9. The conditional probability of innocence given both DNA matches is so small as to place the defendant's guilt beyond any reasonable doubt. Yet many jurors have stated that DNA blood matching is no more reliable than fingerprint matching.
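The bound just derived can be checked directly in Python. This is a sketch using the figures quoted above; the variable names are our own, and the sock-blood figure is the (6.8 × 10^9)^-1 value used in the arithmetic.

```python
# Upper bound on P(I | M1 ∩ M2) for two compounded DNA matches:
#   P(I | M1 ∩ M2) <= [P(I)/P(I')] * P(M1|I) P(M2|I) / [P(M1|I') P(M2|I')]
# with P(M1 | I') ≈ P(M2 | I') ≈ 1.

p_m1_given_i = 1 / 1.7e8   # blood drop: 1 in 170 million
p_m2_given_i = 1 / 6.8e9   # sock blood: (6.8 x 10^9)^-1
prior_ratio = 1e10         # extreme choice of P(I)/P(I'): the defendant is
                           # no more likely guilty than anyone else on Earth

bound = prior_ratio * p_m1_given_i * p_m2_given_i
print(bound)   # about 8.65e-9
```

Even with the most generous possible prior for the defense, the probability of innocence is bounded by roughly nine in a billion.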
To illustrate the reliability of DNA evidence, here is an analogy involving a deck of cards. There are 52! ≈ 8.07 × 10^67 possible orderings of a deck; keep in mind that the number of possible DNA configurations is far larger still. From a drop of blood at a crime scene, only a few DNA sites can be extracted; this is comparable to knowing the cards at only a few positions in the deck. Also, depending on the amount of DNA extracted or the degree of contamination, not all of the information at a site may be established; this corresponds to being able to determine only the number, suit, or color of a particular card.

So, if the suspect's blood is consistent with that found at the crime scene, how convincing is that? Suppose the partial information available at seven positions of the crime-scene "deck" is

(♠, ♦, 2-♥, 3-♣, 10-♥, 4-♦, B)

where B denotes a black card. Likewise, suppose at the same positions in the suspect's "deck" we find

(2-♠, 10-♦, 2-♥, 3-♣, 10-♥, 4-♦, K-♣)

The suspect's deck is thus consistent with the crime-scene information. Assuming all orderings are equally likely, the probability of obtaining such a match is

(13/52) × (13/51) × (1/50) × (1/49) × (1/48) × (1/47) × (24/46) ≈ 6.0154 × 10^-9

(the first two factors count the spades and diamonds available for the first two positions, the next four require exact cards, and the last counts the 24 remaining black cards). In other words, only about six people in a billion would produce a match.

This report has shown that DNA evidence plays a role in many court cases. Each additional matching DNA sample greatly increases the probability that one particular person committed the crime. However, the evidence has not always been presented correctly, since many jurors do not understand the evidence at hand, as in our case study of the O.J. Simpson trial.

Bibliography

Dale, Mike, Alissa Proctor, and Joel Williams. "Evidence: The True Witness." [http://library.thinkquest.org/17049/]. 1998.

"Illinois suspends death penalty." [www.cnn.com/2000/US/01/31/illinois.executions.02]. Jan. 31, 2000.

"Judge allows DNA tests on robbery suspect's dog." [www.cnn.com/2000/us/04/04/dog.hair.dna.ap].
April 4, 2000.

Kaye, D.H., and Jonathan J. Koehler. "Can Jurors Understand Probabilistic Evidence?" Journal of the Royal Statistical Society, Series A, Vol. 154, Part I, 1991: 75-81.

Possley, Maurice. "Prisoner to go free as DNA clears him in beauty shop rape." [www.soci.niu.edu/rape-willis.html]. Feb. 24, 1999.

Pringle, David. "Who's the DNA Fingerprinting Pointing At?" New Scientist, Jan. 29, 1994: 51-52.

Saunders, Sam C., Chris N. Meyer, and Dane W. Wu. "Compounding Evidence from Multiple DNA-Tests." Mathematics Magazine 72, No. 1, Feb. 1999: 39-43.

"Serial killer probe relies on old-fashioned police work, DNA technology." [www.cnn.com/2000/us/04/22/spokane.slayings.ap/index.html]. April 22, 2000.

Starr, Cecie. Biology: Concepts and Applications, 4th ed. Brooks/Cole, 2000.

"The Simpson defense: a case review." [www.cnn.com/US/OJ/verdict/defense/index2.html]. Oct. 3, 1995.

"The Simpson prosecution: a case review." [www.cnn.com/us/oj/verdict/prosecution/index2.html]. Oct. 3, 1995.