Primer on Probability Trees

Basic definitions

Probability is a number between zero and one that tells us how likely an event is (near 0 means very unlikely, near 1 means very likely).

A trial is an operation whose outcome is uncertain, such as tossing a coin.

An event is one of the outcomes that can occur when a trial is performed, such as tossing a coin and getting heads.

If we use the letter E to denote an event (such as tossing a coin and getting heads), then the probability that the event occurs is written

Pr{E}

There are two ways to assign a specific number to this probability.

1. List all the events that can occur when a trial happens, such as heads and tails when tossing a coin. Then decide logically what the probability of each event is, based on the nature of the trial and its events. When tossing a coin, the probability of getting heads is 0.5 and the probability of getting tails is also 0.5, because both outcomes are equally likely.

2. Think about, or actually carry out, an experiment in which the same trial is repeated a very large number of times, approaching infinity. The probability of event E is the proportion of trials in which E occurs: the number of times it occurs divided by the number of trials, as the number of trials grows without limit. For example, if we tossed a coin a billion billion times, we would expect the proportion of heads to be very close to 0.5.

A probability tree represents the events that can happen during one or more trials. Each trial is represented by a layer of branches. Each branch corresponds to one possible outcome of the trial, and each branch is labeled with the probability of that outcome. The probability tree for tossing a coin once is simple and has one layer of branches:

    ├─ 0.5 ─ Heads
    └─ 0.5 ─ Tails

To represent a complex process made up of several trials and their outcomes, we add a layer of branches for each trial.
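As a rough illustration of the second (frequency) way of assigning a probability, the following Python sketch simulates tosses of a fair coin with the standard `random` module. It assumes a fair coin can be modeled as a uniform random draw that counts as heads when it falls below 0.5; the function name `proportion_heads` is purely illustrative. Any finite run only approximates the limiting proportion, but the approximation should tighten as the number of tosses grows.

```python
import random

def proportion_heads(n_tosses: int, seed: int = 1) -> float:
    """Simulate n_tosses fair coin tosses and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses

# The proportion of heads drifts toward 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, proportion_heads(n))
```

With only 10 tosses the proportion can easily be 0.3 or 0.7; with 100,000 tosses it is typically within a few thousandths of 0.5.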
Here is the probability tree representing the process of tossing a coin twice:

    ├─ 0.5 ─ Heads ─┬─ 0.5 ─ Heads
    │               └─ 0.5 ─ Tails
    └─ 0.5 ─ Tails ─┬─ 0.5 ─ Heads
                    └─ 0.5 ─ Tails

Calculating with probability trees

Probability trees can be used to identify all the outcomes that can occur during a complex process of multiple trials, and to calculate their probabilities. Each pathway along the branches through a tree corresponds to one possible outcome of the complex process. Here there are four possibilities:

Heads, Heads
Heads, Tails
Tails, Heads
Tails, Tails

For each possible outcome, corresponding to one pathway through the branches, the probability that it occurs is the product of all the probabilities along the branches of that pathway. Here every pathway has two branches, each with probability 0.5, so each outcome has probability 0.5 × 0.5 = 0.25:

Heads, Heads    probability = 0.25
Heads, Tails    probability = 0.25
Tails, Heads    probability = 0.25
Tails, Tails    probability = 0.25

Once we have defined all possible outcomes and their probabilities, we can calculate the probabilities of more complex events that combine the possible outcomes in different ways. The probability of a complex event is the sum of the probabilities of the outcomes (pathways) in the tree that make up that event.

Example: What is the probability of getting different sides of the coin when it is tossed twice?

Answer: According to the tree, this can happen in two ways: either Heads then Tails, or Tails then Heads. Each of these two outcomes has probability 0.25, so the sum is 0.5. This is the probability of getting two different sides when tossing a coin twice.

Example: What is the probability of getting the same side of the coin both times when it is tossed twice?

Answer: Again according to the tree, this can happen in two ways: either Heads occurs both times, or Tails occurs both times. Each of these two outcomes has probability 0.25, so the sum is 0.5. This is the probability of getting the same side both times when tossing a coin twice.
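The two calculation rules for trees (multiply along a path, then add over the paths that make up an event) can be sketched in Python. This is a minimal illustration, assuming every layer of the tree has the same branches, as with repeated tosses of one coin; the helper names `paths` and `event_probability` are hypothetical. Exact fractions from the standard `fractions` module avoid rounding error.

```python
from fractions import Fraction
from itertools import product

# One layer of the tree: each outcome of a single toss and its branch probability.
COIN = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}

def paths(layer, n_trials):
    """Yield (outcome sequence, path probability) for every path through an
    n_trials-layer tree in which every layer has the same branches."""
    for seq in product(layer, repeat=n_trials):
        prob = Fraction(1)
        for outcome in seq:
            prob *= layer[outcome]  # multiply the probabilities along the path
        yield seq, prob

def event_probability(layer, n_trials, event):
    """Add up the probabilities of the paths for which event(seq) is true."""
    return sum(p for seq, p in paths(layer, n_trials) if event(seq))

for seq, p in paths(COIN, 2):
    print(seq, p)          # each of the four paths has probability 1/4

different = event_probability(COIN, 2, lambda s: s[0] != s[1])
same = event_probability(COIN, 2, lambda s: s[0] == s[1])
print(different, same)     # 1/2 1/2
```

Representing an event as a yes/no test on a path mirrors the way we pick out pathways by eye when reading the tree.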
Example: When we sample individuals at random from a population that contains a proportion p of individuals of one type and a proportion q of individuals of another type, the probability of getting an individual of the first type is p, and the probability of getting an individual of the second type is q. Suppose we sample individuals from a population of fruit flies in which 30% are black and 70% are gray. Here is the probability tree for sampling one individual:

    ├─ 0.3 ─ Black
    └─ 0.7 ─ Gray

We want to know the probability of getting two flies of the same color when we sample two flies at random. We calculate this by making a tree with two layers:

    ├─ 0.3 ─ Black ─┬─ 0.3 ─ Black
    │               └─ 0.7 ─ Gray
    └─ 0.7 ─ Gray ──┬─ 0.3 ─ Black
                    └─ 0.7 ─ Gray

Our complex event is that we sample the two flies and they are the same color. This can happen in two ways: either both flies are black or both are gray. So we follow the paths corresponding to each of these possibilities, and calculate their probabilities by multiplying together the probabilities on the branches:

Path 1 (2 black flies): 0.3 × 0.3 = 0.09
Path 2 (2 gray flies): 0.7 × 0.7 = 0.49

Either of these outcomes satisfies the definition of our event of getting two flies of the same color, so we add these probabilities together:

Pr{2 flies same color} = 0.09 + 0.49 = 0.58

To summarize, probability trees allow us to break a complex event down into simpler events and combine probabilities following two rules:

1. The probability of an outcome corresponding to one path is the product of the probabilities along the branches.
2. The probability of an event with more than one path is the sum of the probabilities of the paths involved.

Additional rules about calculating with probabilities:

1. The probability of an event is always between 0 and 1:

0 ≤ Pr{E} ≤ 1

A probability of zero means an event never happens, and a probability of one means it always happens.

2. The sum of the probabilities of all possible events is one.
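The fruit-fly calculation can be reproduced with a short Python sketch. It assumes, as the tree does, that the second fly has the same branch probabilities as the first (for instance, because the population is very large); the names `FLY` and `path_probability` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities for sampling one fly (30% black, 70% gray).
FLY = {"Black": Fraction(3, 10), "Gray": Fraction(7, 10)}

def path_probability(seq):
    """Rule 1: the probability of one path is the product of its branch probabilities."""
    prob = Fraction(1)
    for color in seq:
        prob *= FLY[color]
    return prob

# Rule 2: the probability of an event is the sum over the paths that make it up.
same_color = sum(path_probability(seq)
                 for seq in product(FLY, repeat=2)
                 if seq[0] == seq[1])
print(same_color, float(same_color))   # 29/50 0.58
```

In terms of the proportions p and q, the same event has probability p × p + q × q.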
If the set of all possible events is E1, E2, …, Ek, then

Pr{E1} + Pr{E2} + … + Pr{Ek} = 1

For example, on the probability trees we just looked at, the ends of the four paths represent all possible outcomes. You can check that their probabilities add up to one (for the flies, 0.09 + 0.21 + 0.21 + 0.49 = 1).

3. The probability that an event E does not happen is one minus the probability of the event. If we denote “the event E does not happen” symbolically as E^C, then

Pr{E^C} = 1 – Pr{E}

The notation here uses the superscript C to mean “complement”, in the sense that the complement of the event “E happens” is the event “E does not happen”.

By themselves, these rules are too basic to do all the calculations we might want to do with probabilities, so we need some more rules. In particular, we need rules to deal with compound events, meaning complex events made up of simpler events.

When we combine two events to make a more complex event, an important consideration is whether or not the events are disjoint. Disjoint means the two events cannot happen simultaneously, by definition. Consider the population of fruit flies in which some are black and some are gray. Here are some examples of pairs of disjoint events:

E1 = sample two flies and get both the same color
E2 = sample two flies and get two different colors

E1 = sample two flies and get two black flies
E2 = sample two flies and get two gray flies

If two events can happen simultaneously, then they are not disjoint. Here are some examples from the fly population again:

E1 = sample two flies and get both the same color
E2 = sample two flies and get two males

E1 = sample two flies and get both the same color
E2 = sample two flies and get at least one black fly

Events that are not disjoint may or may not be likely to occur together, but given the way they are defined, it is logically possible for them both to occur.
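The sum-to-one rule, the complement rule, and the idea of disjointness can all be checked on the two-layer fly tree. In this Python sketch, an assumption of our own rather than the text's notation, each event is represented as the set of paths it contains, so two events are disjoint exactly when their sets share no path.

```python
from fractions import Fraction
from itertools import product

FLY = {"Black": Fraction(3, 10), "Gray": Fraction(7, 10)}

# Probability of every path through the two-layer fly tree.
path_probs = {seq: FLY[seq[0]] * FLY[seq[1]] for seq in product(FLY, repeat=2)}

# Rule 2: the probabilities of all possible outcomes sum to one.
total = sum(path_probs.values())

# Rule 3: Pr{E^C} = 1 - Pr{E}, with E = "both flies the same color".
same = {seq for seq in path_probs if seq[0] == seq[1]}
different = {seq for seq in path_probs if seq[0] != seq[1]}   # the complement of E
pr_same = sum(path_probs[s] for s in same)
pr_different = sum(path_probs[s] for s in different)
print(total, 1 - pr_same, pr_different)   # 1 21/50 21/50

# Disjoint events share no path; non-disjoint events share at least one.
at_least_one_black = {seq for seq in path_probs if "Black" in seq}
print(same.isdisjoint(different))            # True
print(same.isdisjoint(at_least_one_black))   # False: (Black, Black) is in both
```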
Now we can introduce two more rules for situations where we have events E1 and E2 and we want to think about the event

Either E1 or E2 happens

To be clear, when we say “or” we mean that this complex event happens when E1 happens, when E2 happens, or when they both happen.

4. If two events E1 and E2 are disjoint, then

Pr{E1 or E2} = Pr{E1} + Pr{E2}

5. If two events E1 and E2 are not disjoint, then

Pr{E1 or E2} = Pr{E1} + Pr{E2} – Pr{E1 and E2}

Here the statement “E1 and E2” means the complex event that both E1 and E2 happen, not just one or the other. The subtraction is needed because the outcomes in which both events happen are counted twice, once in Pr{E1} and once in Pr{E2}. For events that are not disjoint, it is logically possible for both of them to happen, so that

Pr{E1 and E2} > 0

But disjoint events cannot both happen, so

Pr{E1 and E2} = 0

Therefore, rule 4 is really just a special case of rule 5: rule 5 applies to all events, and it simplifies to rule 4 when the two events are disjoint.

For events that are not disjoint, both could happen, and we might want to do calculations with the probability that both happen. To deal with this, we need to consider one more way that two events can be related. We have already dealt with disjoint and non-disjoint events. Now we have to talk about independent and dependent events.

Two events E1 and E2 are independent if the probability that E2 happens is unaffected by whether E1 has happened, and vice versa. The two events are dependent if the probability of E2 changes when E1 has happened.

Very often, when events are dependent, some kind of cause-and-effect relationship underlies them. For example, when event E1 happens it could change things directly so that event E2 becomes more likely. Or perhaps some third event that we don’t know about has happened, and it makes both E1 and E2 more likely. Either way, the probability of E2 would depend on whether E1 has happened.

When events are dependent, the concept of conditional probability becomes important.
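Addition rules 4 and 5 can be verified on the fly tree with a small Python sketch. The event pairs come from the examples above; the helper `pr` and the event functions are hypothetical names used only for illustration.

```python
from fractions import Fraction
from itertools import product

FLY = {"Black": Fraction(3, 10), "Gray": Fraction(7, 10)}
PATHS = {seq: FLY[seq[0]] * FLY[seq[1]] for seq in product(FLY, repeat=2)}

def pr(event):
    """Probability of an event = sum of the probabilities of its paths."""
    return sum(p for seq, p in PATHS.items() if event(seq))

def same_color(s):
    return s[0] == s[1]

def at_least_one_black(s):
    return "Black" in s

def two_black(s):
    return s == ("Black", "Black")

def two_gray(s):
    return s == ("Gray", "Gray")

# Rule 5 (not disjoint): add the two probabilities, then subtract the overlap.
lhs = pr(lambda s: same_color(s) or at_least_one_black(s))
rhs = (pr(same_color) + pr(at_least_one_black)
       - pr(lambda s: same_color(s) and at_least_one_black(s)))
print(lhs, rhs)   # 1 1

# Rule 4 (disjoint): the overlap is empty, so the probabilities simply add.
print(pr(lambda s: two_black(s) or two_gray(s)), pr(two_black) + pr(two_gray))  # 29/50 29/50
```

Here Pr{same color} = 0.58, Pr{at least one black} = 1 – 0.49 = 0.51, and their overlap is the black-black path with probability 0.09, so rule 5 gives 0.58 + 0.51 – 0.09 = 1.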
Conditional probability is the probability that one event happens, given that we know another event has happened. We write it like this:

Pr{E2|E1}

We read this notation as the “probability that E2 happens, given that E1 has happened”, or simply as the “probability of E2 given E1”. When we write this expression, it indicates that we know or assume that E1 has happened, and we are regarding E2 as uncertain. The order in which E2 and E1 appear in this expression does not depend on their order in time, or on which event might be the “cause” and which the “effect”. The event we regard as uncertain is written to the left of the vertical line, and the one we know or assume has happened is written to the right.

The conditional probability can be contrasted with the simple probability

Pr{E2}

This is just the overall probability that event E2 happens, without any information about the event E1.

When two events are dependent

Pr{E2|E1} ≠ Pr{E2}

When two events are independent

Pr{E2|E1} = Pr{E2}

Mathematically, the conditional probability of E2 given E1 is defined as

Pr{E2|E1} = Pr{E2 and E1} / Pr{E1}

(this definition requires Pr{E1} > 0).

Example: We are interested in hair and eye color in a group of men living in Germany, and have data on 2190 men in the group:

              Brown Eyes   Blue Eyes
    Dark Hair            300         200
    Light Hair           420        1270

We are interested in the events

E1 = a man has dark hair
E2 = a man has blue eyes

and in particular in the conditional probability that a man has blue eyes, given that he has dark hair:

Pr{E2|E1}

In this sample, 500 men have dark hair (300 + 200) and 1470 have blue eyes (200 + 1270); 200 men have both dark hair and blue eyes.
From these data we apply the principle that the probability of an individual having a particular characteristic is the proportion of individuals in the group who have that characteristic:

Pr{E1} = 500 / 2190 = 0.2283
Pr{E2} = 1470 / 2190 = 0.6712
Pr{E1 and E2} = 200 / 2190 = 0.0913

Now, from the definition of conditional probability, we can calculate

Pr{E2|E1} = Pr{E1 and E2} / Pr{E1} = 0.0913 / 0.2283 = 0.4000

(Equivalently, 200 of the 500 dark-haired men have blue eyes, and 200 / 500 = 0.4.)

Now compare this to the simple probability that a man has blue eyes, Pr{E2}:

Pr{E2} = 0.6712 > Pr{E2|E1} = 0.4000

Therefore, having blue eyes depends on hair color; in this case, having dark hair makes it less likely that a man also has blue eyes.

In fact, this result is fairly typical. Not every human population would have exactly the same numbers as this group of German men, but genes for hair and eye color are related in such a way that someone with dark hair is also likely to have a gene that produces brown eyes instead of blue eyes. This means that these two traits are not independent, and blue-eyed, dark-haired people tend to be uncommon in human populations.

Now that we know what conditional probability is, we can look at two more rules for combining the probabilities of simple events into the probabilities of more complex events.

6. If two events E1 and E2 are independent, then

Pr{E1 and E2} = Pr{E1} × Pr{E2}

7. If two events E1 and E2 are not independent, then

Pr{E1 and E2} = Pr{E1} × Pr{E2|E1}

As with rules 4 and 5, rule 6 is a special case of rule 7: for independent events Pr{E2|E1} = Pr{E2}, so rule 7 reduces to rule 6.

Conditional probability is an extremely important concept, but it is also a confusing one. In many situations people tend to confuse one conditional probability with the reverse one:

Pr{E2|E1} versus Pr{E1|E2}

These numbers are not usually equal, and just because one is large or small does not make the other one large or small.
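The hair and eye color calculation, the multiplication rule 7, and the reverse conditional probability can be reproduced from the table counts with a short Python sketch. The dictionary layout and the helper names `pr` and `pr_given` are our own illustrative choices; probabilities are taken as proportions of the 2190 men, as in the text.

```python
from fractions import Fraction

# Counts from the table of 2190 men: (hair, eyes) -> number of men.
COUNTS = {
    ("Dark", "Brown"): 300, ("Dark", "Blue"): 200,
    ("Light", "Brown"): 420, ("Light", "Blue"): 1270,
}
TOTAL = sum(COUNTS.values())  # 2190

def pr(event):
    """Proportion of men in the group for whom event(hair, eyes) is true."""
    return Fraction(sum(n for key, n in COUNTS.items() if event(*key)), TOTAL)

def pr_given(event, condition):
    """Pr{event | condition} = Pr{event and condition} / Pr{condition}."""
    return pr(lambda h, e: event(h, e) and condition(h, e)) / pr(condition)

def dark_hair(h, e):
    return h == "Dark"      # E1

def blue_eyes(h, e):
    return e == "Blue"      # E2

p_e2_given_e1 = pr_given(blue_eyes, dark_hair)   # Pr{E2|E1}
p_e1_given_e2 = pr_given(dark_hair, blue_eyes)   # Pr{E1|E2}, the reverse
print(round(float(pr(blue_eyes)), 3), round(float(p_e2_given_e1), 3),
      round(float(p_e1_given_e2), 3))            # 0.671 0.4 0.136

# Rule 7: Pr{E1 and E2} = Pr{E1} x Pr{E2|E1}
assert pr(lambda h, e: dark_hair(h, e) and blue_eyes(h, e)) == pr(dark_hair) * p_e2_given_e1
```

The reverse conditional probability Pr{E1|E2} = 200 / 1470 is only about 0.136, far from Pr{E2|E1} = 0.4, which illustrates why the two should not be confused.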
Application of conditional probability and probability trees to diagnostic medical tests

One application of conditional probability where confusion often arises is medical testing. Medical tests are not perfect, so if someone tests positive for a disease, that doesn’t necessarily mean they have the disease.

Suppose you are tested for a fatal disease, something like cancer or AIDS. How should you react to a positive test result? If you get tested for a disease, you are interested in this conditional probability:

Pr{have disease | positive test}

Medical tests are designed to be good in the sense that the reverse conditional probability is high. That is, a good test is designed so that

Pr{test positive | have disease}

is high. This conditional probability is called the sensitivity of the test, and for a good test we want it to be close to 1. For example, we might have

Pr{test positive | have disease} = 0.95

We want a good test to be likely to be positive when someone really has the disease. We also want it to be likely to be negative when someone does not have the disease. This aspect involves the conditional probability

Pr{test negative | don’t have disease}

This conditional probability is called the specificity of the test. For a good test we also want the specificity to be high, so let’s say we have

Pr{test negative | don’t have disease} = 0.90

Suppose you are taking a test for which the sensitivity is 0.95 and the specificity is 0.90, and you have just tested positive. How certain can you be that you have the fatal disease?

Let’s analyze this situation by means of a probability tree. We need one additional hypothetical assumption to do so: the probability that a person has the disease, without considering whether or not they have been tested. That is, we need to know the prevalence of the disease in the population at large.
Let’s assume that this is a moderately uncommon disease, with

Pr{have disease} = 0.08

meaning that 8% of people at large have the disease.

Now let’s construct a probability tree showing all the possibilities for this situation. We start with the simple probability that a person has the disease, based on the prevalence of the disease in the whole population:

    ├─ 0.08 ─ Have disease
    └─ 0.92 ─ Don’t have disease

We have to start here because the outcome of the test depends on whether an individual actually has the disease. Next we add branches for the test results, and calculate the probabilities of all possible outcomes:

    ├─ 0.08 ─ Have disease ───────┬─ 0.95 ─ Test positive, Pr = 0.08 × 0.95 = 0.076
    │                             └─ 0.05 ─ Test negative, Pr = 0.08 × 0.05 = 0.004
    └─ 0.92 ─ Don’t have disease ─┬─ 0.10 ─ Test positive, Pr = 0.92 × 0.10 = 0.092
                                  └─ 0.90 ─ Test negative, Pr = 0.92 × 0.90 = 0.828

Now let’s get back to the conditional probability that someone who has had a positive result is most interested in:

Pr{have disease | positive test}

Our definition of conditional probability says that this is equal to

Pr{have disease and positive test} / Pr{positive test}

The probability of both having the disease and getting a positive test result corresponds to the uppermost branch of the tree, and we calculated it to be

Pr{have disease and positive test} = 0.076

To calculate the probability of a positive test, notice that two of the four possible pathways through the tree involve a positive test:

Pr{have disease and positive test}
Pr{don’t have disease and positive test}

The first of these probabilities was just calculated to be 0.076. The second corresponds to the third branch down, and we calculated it to be

Pr{don’t have disease and positive test} = 0.092

These two events leading to a positive test are disjoint (events corresponding to different pathways through a probability tree are always disjoint), so we can calculate the probability of a positive test by the addition rule for disjoint events:

Pr{positive test} = Pr{have disease and positive test} + Pr{don’t have disease and positive test} = 0.076 + 0.092 = 0.168

Now we can finish calculating the conditional probability we want:

Pr{have disease | positive test} = Pr{have disease and positive test} / Pr{positive test} = 0.076 / 0.168 = 0.452

This is rather low. Even though you have just had a positive test for a fatal disease, the chance that you really have the disease is less than 50%.

Is this surprising? Most people find it surprising, probably for two reasons.

1. There is a natural tendency to confuse two different conditional probabilities:

Pr{have disease | positive test} versus Pr{test positive | have disease}

The latter probability is high because the test is a good test, with good sensitivity. But that doesn’t mean the first probability is high.

2. The prevalence of the disease in the population at large plays a big role in these calculations, and many people don’t appreciate this. Many fatal diseases are uncommon or rare; that is, the prevalence is low, and most people in the population at large do not have the disease. Whenever a disease is uncommon or rare, the calculation will show that Pr{have disease | positive test} is lower than you might expect, often lower than 50%, and well below the sensitivity of the test. If a disease is very rare, then Pr{have disease | positive test} is likely to be a very small number, something like 0.05.

So most medical tests are somewhat inconclusive: a positive test indicates that a disease might be present, but does not make it certain. Because of this, doctors don’t make a diagnosis based on a single test result. Instead, if a person tests positive, the doctor follows up with further tests and examinations to confirm that the disease is present before starting a course of treatment.
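The diagnostic-test tree can be packaged as a small Python function of the prevalence, sensitivity, and specificity, which makes it easy to see how strongly the answer depends on prevalence. This is a sketch for illustration: the function names are hypothetical, and the lower prevalences 0.01 and 0.001 in the loop are arbitrary values chosen only to show the trend, not figures from the text.

```python
def diagnostic_tree(prevalence, sensitivity, specificity):
    """Return the probabilities of the four paths through the two-layer tree:
    disease status first, then test result."""
    return {
        ("disease", "positive"): prevalence * sensitivity,
        ("disease", "negative"): prevalence * (1 - sensitivity),
        ("no disease", "positive"): (1 - prevalence) * (1 - specificity),
        ("no disease", "negative"): (1 - prevalence) * specificity,
    }

def pr_disease_given_positive(prevalence, sensitivity, specificity):
    """Pr{have disease | positive test} = Pr{disease and positive} / Pr{positive}."""
    paths = diagnostic_tree(prevalence, sensitivity, specificity)
    # The two positive-test paths are disjoint, so their probabilities add.
    pr_positive = paths[("disease", "positive")] + paths[("no disease", "positive")]
    return paths[("disease", "positive")] / pr_positive

print(round(pr_disease_given_positive(0.08, 0.95, 0.90), 3))  # 0.452

# Same test, lower (hypothetical) prevalences: the answer falls sharply.
for prevalence in (0.08, 0.01, 0.001):
    print(prevalence, round(pr_disease_given_positive(prevalence, 0.95, 0.90), 3))
```

Keeping all four path probabilities in one place also makes it easy to check that they add up to one.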
These issues of conditional probability are also related to the kinds of errors that medical tests can make. Diagnostic tests like those we are talking about can give two kinds of correct results and two kinds of errors.

The correct results are

True positive – someone who really has the disease tests positive.
True negative – someone who really does not have the disease tests negative.

The errors are

False negative – someone who really has the disease tests negative.
False positive – someone who really does not have the disease tests positive.

In constructing our probability tree, we found the probabilities for all of these possible correct results and errors:

    ├─ 0.08 ─ Have disease ───────┬─ 0.95 ─ Test positive, Pr = 0.076, True positive
    │                             └─ 0.05 ─ Test negative, Pr = 0.004, False negative
    └─ 0.92 ─ Don’t have disease ─┬─ 0.10 ─ Test positive, Pr = 0.092, False positive
                                  └─ 0.90 ─ Test negative, Pr = 0.828, True negative

Notice that the probability of a false positive (0.092) is higher than the probability of a true positive (0.076). In medical testing situations where the disease is uncommon in the population at large, the probability of a false positive is often higher than the probability of a true positive.

Exercises:

1. Researchers are interested in whether wearing seat belts protects against fatal injury in automobile accidents. They compiled the following data from all people involved in automobile accidents in the state of Florida:

                    No Seat Belt   Seat Belt
    Fatal                  1,601         510
    Not Fatal            162,527     412,368

Using these data, calculate
a. The simple probability that a seat belt is not worn.
b. The simple probability that a fatal injury occurs.
c. The probability that both a fatal injury occurs and a seat belt is not worn.
d. The conditional probability that a fatal injury occurs, given that a seat belt was not worn.
e. Is the event of having a fatal injury independent of not wearing a seat belt?

2. In the fall semester of 2007, 52% of the students at UT Arlington were women. Suppose that two students at the university are selected at random. Using the probability tree method, calculate:
a. The probability that both students are men.
b. The probability that at least one student is a man.

3. The following figures are typical for situations involving cancer and diagnostic tests. The prevalence of the cancer is 160 per 100,000 people; that is, the simple probability that an individual in the population at large has the disease is 0.0016. The diagnostic test has a probability of 0.85 of turning out positive for an individual who has the disease, and a probability of 0.03 of turning out positive for an individual who does not have the disease. Calculate the following conditional probabilities:
a. The conditional probability that a person has the disease, given that they have had a positive test result, Pr{have disease | positive test}.
b. The conditional probability that a person does not have the disease, given that they have had a negative test result, Pr{don’t have disease | negative test}.