MATH 2441 Probability and Statistics for Biological Sciences

Calculating Probabilities: II

David W. Sabo (1999)

Some Basic Relationships Between Probabilities

We continue the summary of the terminology of probability theory and the basic properties of probabilities by looking briefly at some relationships between the probabilities of events, which may be compound events.

Example SmokerAge: This table gives a breakdown of the 2534 employees of a certain large organization by age group and smoking history.

                      age group
                      20-29   30-39   40-49   ≥ 50    total
cigarette smoker        171     318     353    130      972
pipe smoker               0       2      11      9       22
ex-smoker                43     141     183    134      501
non-smoker              227     406     281    125     1039
total                   441     867     828    398     2534

We will use this information to illustrate the formulas and concepts described briefly below.

The Story So Far

Suppose employees of this company were to be selected randomly so that every employee had the same likelihood of being selected. Since the selection of one employee can have any of 2534 distinct but equally likely outcomes, the probability of selecting a specific individual employee is 1/2534. If we define the event

A = the employee selected is an ex-smoker

then

Pr(A) = 501/2534 ≈ 0.1977

because the probability of the event A is equal to the sum of the probabilities of the simple events which make it up. For A, those simple events are the 501 equally likely outcomes that correspond to each of the 501 ex-smoker employees of this organization, each having a probability of 1/2534 of being selected. Similarly, if we define the event

B = the employee selected is in the age group 40 - 49

then Pr(B) = 828/2534 ≈ 0.3268.

The Complementary Event

We refer to the event that "A does not occur" as the complement of A, denoted by A^c (other systems of notation are also used). Since A and A^c are mutually exclusive, and since between them they cover all possible outcomes, we can write that

Pr(A) + Pr(A^c) = 1   or   Pr(A) = 1 - Pr(A^c)   (PR-1)

Thus, given that

A = the event that a randomly selected employee is an ex-smoker

and that Pr(A) = 501/2534 ≈ 0.1977 from above, it follows that

A^c = the event that the randomly selected employee is not an ex-smoker

and

Pr(A^c) = 1 - Pr(A) = 1 - 501/2534 = 2033/2534 ≈ 0.8023

or

Pr(A^c) ≈ 1 - 0.1977 = 0.8023

Note that we get the same result if we just sum up the probabilities of all the categories of employees who are not ex-smokers:

Pr(not an ex-smoker) = Pr(cigarette smoker) + Pr(pipe smoker) + Pr(non-smoker)
                     = 972/2534 + 22/2534 + 1039/2534 = 2033/2534 ≈ 0.8023

The formula (PR-1) is one of the most important and useful probability formulas we will encounter in the course. At the very least, it allows us to use standard probability tables very flexibly. However, there are also many problems in which the probability of the event of interest is almost impossible to calculate directly, but for which the probability of the complement of that event is obtained very easily. In class, we will look at one dealing with duplicate birthdays, but no details are given here so that the fun is not spoiled.

Intersections of Events

The event

C = A and B   or   C = A ∩ B

known as the "intersection of events A and B", is the event that occurs only when both A and B have occurred. For example, if

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

then

Pr(A ∩ B) = Pr(the selected employee is both in the 30-39 age group and is a non-smoker) = 406/2534 ≈ 0.1602

We can get this probability directly from the numbers in the table. Later in this document, we will give a somewhat more general formula for Pr(A ∩ B).

Use of Venn Diagrams to Sort Out Events

When you work with compound events, it is important to be able to keep accurate track of the outcomes which are common to two or more compound events as well as those which are not.
One common way of doing so is through the use of so-called Venn diagrams.

A Venn diagram represents the sample space, S, as a rectangle. Simple events are thought of as points inside this rectangle, though they are not drawn explicitly. Compound events are represented by circles sketched inside the rectangle. You think of each circle as containing the simple events that make up that compound event. Then, two compound events that have some simple events in common will be sketched as overlapping circles: the region of overlap represents the simple events that are common to both compound events.

[Figure: Venn diagram showing the sample space S as a rectangle containing two overlapping circles, A and B; the overlap region is labeled A ∩ B.]

The intent here is simply to represent the major components of the problem. So, the circle labeled 'A' simply indicates that A is an event comprising one or more simple outcomes. Similarly, the circle labeled 'B' indicates that B is an event comprising one or more simple outcomes. The region where circles A and B overlap represents those simple outcomes which are shared by A and B. The crescent-shaped region of A outside of the overlap region represents those outcomes that are part of A, but not part of B.

If A and B have no outcomes in common (that is, they are mutually exclusive or disjoint), then the two circles are sketched without any overlap in the Venn diagram:

[Figure: Venn diagram showing the sample space S containing two non-overlapping circles, A and B (mutually exclusive events).]

To be specific, consider the two events from the previous section:

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

These two events are not mutually exclusive, since there are non-smokers in the age group 30 - 39 in the employ of the organization. The parts of the Venn diagram for these two events have the following meanings:

[Figure: Venn diagram of the sample space S with overlapping circles A and B, with the regions labeled as follows.]

part of A outside the overlap: employees who are in the 30-39 age group but are not non-smokers
part of B outside the overlap: employees who are non-smokers but are not in the 30-39 age group
overlap of A and B: employees who are non-smokers and in the 30-39 age group
rest of S, outside both circles: all other employees, who are neither non-smokers nor in the 30-39 age group

Unions of Events

The event

C = A or B   or   C = A ∪ B

known as the "union of events A and B", is the event that occurs whenever one or both of events A and B occur. Thus, if

A = the event that the selected employee is in the 30 - 39 age group
B = the event that the selected employee is in the 40 - 49 age group

then C = A ∪ B is the event that the selected employee is either in the 30 - 39 age group or in the 40 - 49 age group. In a situation such as this, with events A and B mutually exclusive (or non-intersecting), the probability of their union is just the sum of their individual probabilities:

Pr(C) = Pr(A ∪ B) = Pr(A) + Pr(B) = 867/2534 + 828/2534 = 1695/2534 ≈ 0.6689

We need to be a bit more careful, however, when the two events overlap. Return to a previous example where we had

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

In the Venn diagram, A ∪ B corresponds to the outcomes contained within the two-lobed region formed by the overlapping A and B circles. If we simply sum Pr(A) and Pr(B) in an attempt to get Pr(A ∪ B), we will end up counting the outcomes in the overlap region twice: once in Pr(A) and again in Pr(B). This is an error, of course. To correct for the double counting, we must subtract the extra counting of the common outcomes. In symbols, this is

Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)   (PR-2)

Thus, for the present example, we must first determine that

Pr(A ∩ B) = Pr(selected employee is a non-smoker in the 30-39 age group) = 406/2534

and so

Pr(A ∪ B) = Pr(selected employee is a non-smoker or is in the 30-39 age group or both)
          = Pr(A) + Pr(B) - Pr(A ∩ B)
          = 867/2534 + 1039/2534 - 406/2534 = 1500/2534 ≈ 0.5919

As a check, we note that A ∪ B corresponds to the entries in the second column or the fourth row of the body of the data table at the beginning of this document. This includes

318 + 2 + 141 + 406 + 227 + 281 + 125 = 1500

individuals out of the total of 2534 employees. From this, we also conclude that Pr(A ∪ B) = 1500/2534 ≈ 0.5919.

Note that formula (PR-2) is valid whether A and B are mutually exclusive or not. If the two events are mutually exclusive, then A ∩ B is impossible, and so Pr(A ∩ B) = 0. In that case, Pr(A ∪ B) is just the sum, Pr(A) + Pr(B), as we noted before.

Conditional Probability

It is useful to introduce a notation to indicate some restriction of the sample space (or to represent some additional condition that is known to be true). The symbol

Pr(B|A)

spoken "the probability of event B given event A", stands for the probability of B occurring if we know that A has occurred or that A is true. This is what we mean by a conditional probability. It distinguishes the probability of event B occurring in the absence of any other information, Pr(B), from the probability of event B occurring when we know that event A has occurred. These two probabilities need not have the same value.

Conditional probabilities are useful because very often we have information which doesn't make it certain that an event will occur (or will not occur), but does make one or the other alternative more likely than it would be in the absence of that information. For example, if B = the event that it rains today, and A = the event that it is cloudy today, then Pr(B|A) is the probability of rain today when we know the day is cloudy, whereas Pr(B) would be the probability of rain today without any information on current weather conditions. These two probabilities can be quite different.
The probability of rain is presumably higher on a cloudy day than on a day which is not cloudy.

In reference to a Venn diagram, Pr(B|A) implies that we are looking only at the part of the sample space corresponding to outcomes in A, since we know A has happened or is true. The only part of that region which corresponds to B occurring is the overlap region, A ∩ B. Thus, formally at least, we can write

Pr(B|A) = Pr(B ∩ A) / Pr(A)   (PR-3)

Here is how a conditional probability works. Define the events A and B as before:

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

Suppose an employee is selected at random, and identified as being in the 30 - 39 age group. What is the probability that they are a non-smoker? In the absence of any information about the employee's age group, the best we can do is

Pr(B) = 1039/2534 ≈ 0.4100

However, once we are told that the employee selected is in the 30 - 39 age group, our possible simple outcomes must be just those 867 employees in that age group. Further, the question "what is the probability that a randomly selected employee is a non-smoker if we know that they are in the 30 - 39 age group?" is just a question to determine Pr(B|A) = Pr(employee is a non-smoker | employee is in 30 - 39 age group):

Pr(B|A) = Pr(B ∩ A) / Pr(A) = (406/2534) / (867/2534) = 406/867 ≈ 0.4683

Notice how the fractions simplify down to what you'd expect: the probability of having selected one of the 406 non-smokers among the 867 employees in the 30 - 39 age group.

This example shows you how to use the formula to calculate a conditional probability. It doesn't really indicate how immensely useful the notion of conditional probability actually is.
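The conditional probability calculation above can also be checked numerically. What follows is a minimal Python sketch and is not part of the original notes: the names `counts`, `pr`, `pr_given`, `in_30_39` and `is_non_smoker` are hypothetical, chosen only for illustration, and the label ">=50" for the oldest age group is an assumption. The counts themselves are copied from the SmokerAge table.

```python
from fractions import Fraction

# Counts copied from the SmokerAge table: counts[smoking_status][age_group].
# The ">=50" label is an assumed name for the oldest age group.
AGES = ["20-29", "30-39", "40-49", ">=50"]
counts = {
    "cigarette smoker": dict(zip(AGES, [171, 318, 353, 130])),
    "pipe smoker":      dict(zip(AGES, [0, 2, 11, 9])),
    "ex-smoker":        dict(zip(AGES, [43, 141, 183, 134])),
    "non-smoker":       dict(zip(AGES, [227, 406, 281, 125])),
}

# Every employee is equally likely to be selected, so probabilities are
# just counts divided by the total number of employees.
TOTAL = sum(n for row in counts.values() for n in row.values())  # 2534


def pr(event):
    """Probability that a randomly selected employee satisfies event(status, age)."""
    favourable = sum(
        n
        for status, row in counts.items()
        for age, n in row.items()
        if event(status, age)
    )
    return Fraction(favourable, TOTAL)


def pr_given(event_b, event_a):
    """Conditional probability Pr(B|A) = Pr(B and A) / Pr(A), as in (PR-3)."""
    both = pr(lambda s, a: event_b(s, a) and event_a(s, a))
    return both / pr(event_a)


def in_30_39(status, age):       # event A
    return age == "30-39"


def is_non_smoker(status, age):  # event B
    return status == "non-smoker"


p_b = pr(is_non_smoker)
p_b_given_a = pr_given(is_non_smoker, in_30_39)
print(p_b, round(float(p_b), 4))                  # 1039/2534 0.41
print(p_b_given_a, round(float(p_b_given_a), 4))  # 406/867 0.4683
```

Exact fractions are used so that the simplification to 406/867 is visible rather than hidden by rounding. Swapping the arguments, `pr_given(in_30_39, is_non_smoker)`, gives Pr(A|B) = 406/1039 in the same way.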
We will give one application in the next section, but the truly astonishing results that arise from this formula must wait until later in the course, when we discuss the total probability formula and Bayes' formula.

In many instances, conditional probabilities are easier to compute than is the probability Pr(A ∩ B), and so formula (PR-3) is rearranged and used to compute Pr(A ∩ B); see the section on the multiplication law below.

Independent Events

Two events, A and B, are said to be independent if

Pr(B|A) = Pr(B)   or   Pr(A|B) = Pr(A)

That is, the probability of one of them occurring isn't affected by whether or not the other has occurred. Events that are not independent are, of course, dependent.

(Don't confuse the notion of independent events with the notion of mutually exclusive events. In fact, mutually exclusive events are, by their definition, very dependent! Since Pr(A ∩ B) = 0 if events A and B are mutually exclusive, then Pr(B|A) = 0 and Pr(A|B) = 0, and so the conditions for independence would not be satisfied if A and B each had a nonzero probability.)

For example, the two events

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

are dependent (that is, they are not independent), because

Pr(B|A) = 406/867 ≈ 0.4683, but Pr(B) = 1039/2534 ≈ 0.4100

Similarly,

Pr(A|B) = 406/1039 ≈ 0.3908, but Pr(A) = 867/2534 ≈ 0.3421

Independence is an important statistical concept, and we will develop ways of detecting its probable presence or absence from sample data later in the course.

Perhaps the easiest example of independent events can be demonstrated for the experiment in which a fair coin is flipped twice in a row. We know from the preceding document in this series that this experiment will result in four possible equally likely outcomes:

{HH, HT, TH, TT}

where 'HH' means the first flip resulted in heads and the second flip resulted in heads, etc.
Now, define the events A and B as follows:

A = the event that on the first flip, the coin lands heads up
B = the event that on the second flip, the coin lands heads up

Then,

Pr(B|A) = Pr(second flip produces a heads up given that the first flip produced a heads up).

If these two events are independent, then the probability of flipping the coin heads up is not affected in any way by how many heads you've already gotten. Now,

Pr(B|A) = Pr(B ∩ A) / Pr(A) = Pr(HH) / [Pr(HH) + Pr(HT)] = (1/4) / (1/4 + 1/4) = 1/2

But

Pr(B) = Pr(HH) + Pr(TH) = 1/4 + 1/4 = 1/2

In this last line, we used the fact that B is the event that the second flip results in heads. That means that B corresponds to the two simple outcomes HH and TH, which are mutually exclusive and each have a probability of 1/4. In the line before that, we noted that B ∩ A is the event that both the first flip and the second flip resulted in heads, and therefore must be the same thing as HH, which has a probability of 1/4. Event A, that the first flip resulted in heads, corresponds to the two simple outcomes HH and HT, each with a probability of 1/4.

The result of the calculation is that Pr(B|A) = Pr(B), and so events A and B are independent. This means that for a fair coin, the second flip is no more or less likely to give heads if the first flip gave heads than if the first flip gave tails. In fact, you can extend this result to any sequence of coin flips. Even if you flip 10 heads in a row, the probability of getting heads on the 11th flip is still 1/2. (Don't confuse this with the statement that the probability of getting 11 heads in a row is 1/2. That's a recipe for losing your shirt!)
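The two-flip independence argument can be reproduced by listing the sample space explicitly. The following is a small Python sketch offered only as an illustration: the names `outcomes`, `pr`, `A` and `B` are assumptions made here, and the code relies on the four outcomes being equally likely, as stated above for a fair coin.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: HH, HT, TH, TT, equally likely.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]


def pr(event):
    """Probability of a set of outcomes when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


A = {o for o in outcomes if o[0] == "H"}  # first flip is heads: {HH, HT}
B = {o for o in outcomes if o[1] == "H"}  # second flip is heads: {HH, TH}

# Conditional probability from (PR-3); A & B is the set intersection {HH}.
pr_b_given_a = pr(A & B) / pr(A)

print(pr(B))                  # 1/2
print(pr_b_given_a)           # 1/2
print(pr_b_given_a == pr(B))  # True, so A and B are independent

# Contrast: the probability of 11 heads in a row is (1/2)**11, not 1/2.
print(Fraction(1, 2) ** 11)   # 1/2048
```

Representing events as Python sets makes the intersection A ∩ B a single `&` operation, which mirrors the overlap region of a Venn diagram.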
The Multiplication Law

The multiplication law is just a rearrangement of the defining equation for conditional probabilities:

Pr(A ∩ B) = Pr(A|B) Pr(B) = Pr(B|A) Pr(A)   (PR-4)

If the two events, A and B, are independent, then this simplifies to

Pr(A ∩ B) = Pr(A) Pr(B)   (PR-5)

because Pr(A|B) = Pr(A) and Pr(B|A) = Pr(B) in that case.

The more general form is quite intuitive. Pr(A ∩ B) is the probability of observing both events A and B. For both to occur, either A must occur and then B, or B must occur and then A; hence the two alternative right-hand sides. Further, if you think of probabilities in terms of relative frequencies, then to get the relative frequency of both A and B occurring, we can start with the relative frequency of A occurring, which is Pr(A), and multiply this by the relative frequency with which B occurs when A has occurred, namely Pr(B|A). Alternatively, we could start with the relative frequency with which B occurs, Pr(B), and multiply it by the relative frequency with which A occurs when B has occurred, Pr(A|B).

As a way to illustrate the use of this formula, let's return to the familiar two events:

A = the event that the selected employee is in the age group 30 - 39
B = the event that the selected employee is a non-smoker

We already know that Pr(A ∩ B) = 406/2534 ≈ 0.1602 from previous work. However, we can demonstrate that these multiplication law formulas give exactly the same result. Recall that Pr(A) = 867/2534, Pr(B) = 1039/2534, Pr(A|B) = 406/1039, and Pr(B|A) = 406/867. So, using (PR-4), we get

Pr(A ∩ B) = Pr(B|A) Pr(A) = (406/867) × (867/2534) = 406/2534

or

Pr(A ∩ B) = Pr(A|B) Pr(B) = (406/1039) × (1039/2534) = 406/2534

Thus, we get the expected result with both variants of the formula.
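As a quick arithmetic check of (PR-4), and of why (PR-5) must not be used for dependent events, here is a short Python sketch. It is an illustration only: the variable names are hypothetical, and the four counts are those quoted above for the 30 - 39 age group and non-smoker events.

```python
from fractions import Fraction

TOTAL = 2534      # all employees
N_A = 867         # employees in the 30-39 age group (event A)
N_B = 1039        # non-smokers (event B)
N_A_AND_B = 406   # non-smokers in the 30-39 age group (A and B)

pr_a = Fraction(N_A, TOTAL)
pr_b = Fraction(N_B, TOTAL)
pr_b_given_a = Fraction(N_A_AND_B, N_A)
pr_a_given_b = Fraction(N_A_AND_B, N_B)

# Both right-hand sides of (PR-4), and the value read directly from the table.
via_a = pr_b_given_a * pr_a
via_b = pr_a_given_b * pr_b
direct = Fraction(N_A_AND_B, TOTAL)

print(via_a == via_b == direct)      # True
print(round(float(direct), 4))       # 0.1602

# (PR-5) assumes independence. A and B are dependent here, so the simple
# product Pr(A) Pr(B) does not reproduce Pr(A and B).
print(pr_a * pr_b == direct)         # False
print(round(float(pr_a * pr_b), 4))  # 0.1403
```

The mismatch between 0.1403 and 0.1602 is another way of seeing that these two events are not independent.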