Basic Properties of Probability

Definitions: A random experiment is a procedure or operation whose outcome is uncertain and cannot be predicted with certainty in advance. The collection of all possible outcomes is called the sample space. We will typically use the letter S to denote a sample space. An event is any subset of the sample space; that is, an event is a collection (or set) of outcomes from the sample space. Events are usually denoted by capital letters other than S.

Examples of random experiments:

1. Toss a coin three times and record the results of each toss in order. List all outcomes in the sample space. Two examples of events from this random experiment are
   A = the event that there are exactly 2 heads in the 3 tosses
   B = the event that there are no heads
   List the outcomes contained in each of these events.

2. Toss a coin repeatedly and count the number of tosses until the first heads. What is the sample space?

3. Measure the lifetime of a light bulb in hours. What is the sample space? An example of an event in this sample space would be the event A that a light bulb lasts at least 500 hours. Write this event using set notation.

The outcomes in a sample space are said to be "equally likely" if they will all occur approximately equally often in the long run when the random experiment is repeated many, many times. In which of the examples above does the sample space consist of "equally likely" outcomes?

Notation of Set Theory

First let us introduce some definitions and notation:

Definitions: Let S be a sample space and let A and B be any two events from S. Then

The union of the events A and B, denoted $A \cup B$, is the event consisting of all outcomes that belong to A or B or both.

The intersection of the events A and B, denoted $A \cap B$ or sometimes by the shorter AB, is the event consisting of all outcomes common to both A and B.

The complement of an event A, denoted $A^c$, is the collection of all outcomes that are not in A.
The event A is a subset of B, denoted $A \subset B$, if every outcome in A is also contained in B.

The empty set or null set, denoted $\emptyset$, is the event which consists of no outcomes.

The events A and B are disjoint or mutually exclusive if A and B cannot happen simultaneously. Thus A and B are disjoint if $A \cap B = \emptyset$.

Pr(A) or P(A) is used to denote the probability of event A.

Example: Shuffle a standard deck of 52 cards and randomly select one card from the deck. Then the sample space S consists of each of the 52 cards in the deck. Some possible events to consider:
   A = the card is a heart
   B = the card is a face card
   C = the card is the king of hearts
   D = the card is black

Describe the following related events:
   $A^c$ =
   $A \cap B$ =
   $A \cup B$ =

The union of A and B is an event consisting of 22 outcomes (all 13 of the hearts plus the king, queen, and jack from each of the other suits). Note that $C \subset A$ and $C \subset B$.

List two pairs of events from above that are mutually exclusive.

Probability as a Long-term Relative Frequency

The probability of a random event is the long-run proportion (or relative frequency) of times the event would occur if the random process were repeated over and over an extremely large number of times under identical conditions.

The probability of an event can be approximated by simulating the process a large number of times. Simulation leads to an empirical, or experimental, estimate of the probability.

Treating probabilities as long-term frequencies is known as the frequentist approach to probability. Using the frequentist approach, if a sample space consists of a finite number of possible outcomes, say N, and all outcomes are equally likely, then it is natural to assign equal probabilities to each outcome. That is,

$$P(\text{each outcome}) = \frac{1}{\text{total number of outcomes}} = \frac{1}{N}.$$

Furthermore, in this situation in which all outcomes are equally likely, if an event A consists of M distinct outcomes, then the probability of the event A is given by

$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S} = \frac{M}{N}.$$

In the card example above, since we are shuffling the deck and then randomly selecting one card, all 52 cards are equally likely. As a result, we can compute the probability of any event in the sample space S simply by counting the number of outcomes in the event and dividing by 52, the total number of outcomes in the sample space. For example,

$$P(\text{the card is a heart and a face card}) = P(A \cap B) = \frac{3}{52}.$$

Find the probability of each of the other events described above.

Note: Since events are sets, it makes sense to perform set operations such as complement, intersection, and union on them, but it makes no sense to perform arithmetic operations such as addition or multiplication on events. On the other hand, probabilities are numbers, so it is legitimate to add, multiply, and divide probabilities but not to take complements, intersections, or unions of them.

Case Study: 100 Best Films

In 1998, the American Film Institute created a list of the top 100 American films ever made. The list is included below.

| Rank | Title | Year |
|------|-------|------|
| 1 | Citizen Kane | 1941 |
| 2 | Casablanca | 1942 |
| 3 | The Godfather | 1972 |
| 4 | Gone With The Wind | 1939 |
| 5 | Lawrence Of Arabia | 1962 |
| 6 | The Wizard Of Oz | 1939 |
| 7 | The Graduate | 1967 |
| 8 | On The Waterfront | 1954 |
| 9 | Schindler's List | 1993 |
| 10 | Singin' In The Rain | 1952 |
| 11 | It's A Wonderful Life | 1946 |
| 12 | Sunset Boulevard | 1950 |
| 13 | The Bridge On The River Kwai | 1957 |
| 14 | Some Like It Hot | 1959 |
| 15 | Star Wars | 1977 |
| 16 | All About Eve | 1950 |
| 17 | The African Queen | 1951 |
| 18 | Psycho | 1960 |
| 19 | Chinatown | 1974 |
| 20 | One Flew Over The Cuckoo's Nest | 1975 |
| 21 | The Grapes Of Wrath | 1940 |
| 22 | 2001: A Space Odyssey | 1968 |
| 23 | The Maltese Falcon | 1941 |
| 24 | Raging Bull | 1980 |
| 25 | E.T. The Extra-Terrestrial | 1982 |
| 26 | Dr. Strangelove | 1964 |
| 27 | Bonnie And Clyde | 1967 |
| 28 | Apocalypse Now | 1979 |
| 29 | Mr. Smith Goes To Washington | 1939 |
| 30 | The Treasure Of The Sierra Madre | 1948 |
| 31 | Annie Hall | 1977 |
| 32 | The Godfather Part II | 1974 |
| 33 | High Noon | 1952 |
| 34 | To Kill A Mockingbird | 1962 |
| 35 | It Happened One Night | 1934 |
| 36 | Midnight Cowboy | 1969 |
| 37 | The Best Years Of Our Lives | 1946 |
| 38 | Double Indemnity | 1944 |
| 39 | Doctor Zhivago | 1965 |
| 40 | North By Northwest | 1959 |
| 41 | West Side Story | 1961 |
| 42 | Rear Window | 1954 |
| 43 | King Kong | 1933 |
| 44 | The Birth Of A Nation | 1915 |
| 45 | A Streetcar Named Desire | 1951 |
| 46 | A Clockwork Orange | 1971 |
| 47 | Taxi Driver | 1976 |
| 48 | Jaws | 1975 |
| 49 | Snow White And The Seven Dwarfs | 1937 |
| 50 | Butch Cassidy And The Sundance Kid | 1969 |
| 51 | The Philadelphia Story | 1940 |
| 52 | From Here To Eternity | 1953 |
| 53 | Amadeus | 1984 |
| 54 | All Quiet On The Western Front | 1930 |
| 55 | The Sound Of Music | 1965 |
| 56 | M*A*S*H | 1970 |
| 57 | The Third Man | 1949 |
| 58 | Fantasia | 1940 |
| 59 | Rebel Without A Cause | 1955 |
| 60 | Raiders Of The Lost Ark | 1981 |
| 61 | Vertigo | 1958 |
| 62 | Tootsie | 1982 |
| 63 | Stagecoach | 1939 |
| 64 | Close Encounters Of The Third Kind | 1977 |
| 65 | The Silence Of The Lambs | 1991 |
| 66 | Network | 1976 |
| 67 | The Manchurian Candidate | 1962 |
| 68 | An American In Paris | 1951 |
| 69 | Shane | 1953 |
| 70 | The French Connection | 1971 |
| 71 | Forrest Gump | 1994 |
| 72 | Ben-Hur | 1959 |
| 73 | Wuthering Heights | 1939 |
| 74 | The Gold Rush | 1925 |
| 75 | Dances With Wolves | 1990 |
| 76 | City Lights | 1931 |
| 77 | American Graffiti | 1973 |
| 78 | Rocky | 1976 |
| 79 | The Deer Hunter | 1978 |
| 80 | The Wild Bunch | 1969 |
| 81 | Modern Times | 1936 |
| 82 | Giant | 1956 |
| 83 | Platoon | 1986 |
| 84 | Fargo | 1996 |
| 85 | Duck Soup | 1933 |
| 86 | Mutiny On The Bounty | 1935 |
| 87 | Frankenstein | 1931 |
| 88 | Easy Rider | 1969 |
| 89 | Patton | 1970 |
| 90 | The Jazz Singer | 1927 |
| 91 | My Fair Lady | 1964 |
| 92 | A Place In The Sun | 1951 |
| 93 | The Apartment | 1960 |
| 94 | Goodfellas | 1990 |
| 95 | Pulp Fiction | 1994 |
| 96 | The Searchers | 1956 |
| 97 | Bringing Up Baby | 1938 |
| 98 | Unforgiven | 1992 |
| 99 | Guess Who's Coming To Dinner | 1967 |
| 100 | Yankee Doodle Dandy | 1942 |

Suppose that two people (we'll call them Allan and Beth) get together to watch a movie and, to avoid potentially endless debates about a selection, decide to choose a movie at random from the "top 100" list. You will investigate the probability that it has already been seen by at least one of them.

Explorations

Basic Probability Rules

Suppose that one film is selected at random from the list. Let A denote the event that Allan has seen the film and let B denote the event that Beth has seen the film. Note that the events A and B can be thought of as sets; for example, A is the set of all films that Allan has seen.

Since the movie is being selected "at random," each of the 100 films is equally likely to be chosen; that is, each has probability 1/100. The probabilities of various events can thus be calculated by counting how many of the 100 films make up the event of interest. For example, the following 2x2 table classifies each movie according to whether it was seen by Allan and whether it was seen by Beth:

|           | Beth yes | Beth no | Total |
|-----------|----------|---------|-------|
| Allan yes | 36       | 9       |       |
| Allan no  | 16       | 39      |       |
| Total     |          |         | 100   |

a) Translate the following events into set notation using the symbols (A and B, complement, union, intersection) defined earlier. Then give the probability of the event as determined from the table:
   Allan and Beth have both seen the film.
   Allan has seen the film and Beth has not.
   Beth has seen the film and Allan has not.
   Neither Allan nor Beth has seen the film.

b) Fill in the marginal totals of the table. From these totals determine the probability that Allan has seen the film and also the probability that Beth has seen the film. (Remember that the film is chosen at random, so all 100 are equally likely.) Record these probabilities along with the appropriate symbols below.

c) Determine the probability that Allan has not seen the film. Do the same for Beth. Record these, along with the appropriate symbols, below.

d) If you had not been given the table, but instead had merely been told that $P(A) = .45$ and $P(B) = .52$, would you have been able to calculate $P(A^c)$ and $P(B^c)$? Explain how.
One of the most basic probability rules is the complement rule, which asserts that the probability of the complement of an event equals one minus the probability of the event:

$$P(A^c) = 1 - P(A).$$

e) Add the counts in the appropriate cells of the table to calculate the probability that either Allan or Beth (or both) have seen the movie. Also indicate the symbols used to represent this event.

f) If you had not been given the table but instead had merely been told that $P(A) = .45$ and $P(B) = .52$, would you have been able to calculate $P(A \cup B)$? Explain.

g) One might naively think that $P(A \cup B) = P(A) + P(B)$. Calculate this sum, and indicate whether it is larger or smaller than $P(A \cup B)$ and by how much. Explain why this makes sense, and indicate how to adjust the right side of this expression to make the equality valid.

The addition rule asserts that the probability of the union of two events can be calculated by adding the individual event probabilities and then subtracting the probability of their intersection:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

This rule should make good intuitive sense. If we simply add the probabilities of events A and B, we are counting all outcomes that are in both A and B twice, so we need to subtract off the probability of $A \cap B$ in order to eliminate this double counting.

h) Use this addition rule as a second way to calculate the probability that Allan or Beth has seen the movie, verifying your answer to e).

i) As a third way to calculate this probability, first identify (in words and in symbols) the complement of the event that Allan or Beth has seen the movie. Then find the probability of this complement from the table. Then use the complement rule to determine $P(A \cup B)$. Does this match your answers to e) and h)?

j) Under what circumstances is it valid to say that $P(A \cup B) = P(A) + P(B)$?

If $A \cap B = \emptyset$, then it follows that $P(A \cup B) = P(A) + P(B)$. This is known as the addition rule for disjoint events; it is a special case of the addition rule since if $A \cap B = \emptyset$, then $P(A \cap B) = P(\emptyset) = 0$.
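The complement rule and the addition rule can be checked numerically against the 2x2 table. The following is a minimal Python sketch, not part of the original handout; the dictionary `counts` and the helper `prob` are illustrative names introduced here, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Cell counts from the 2x2 table: (Allan's status, Beth's status) -> number of films.
counts = {
    ("allan_yes", "beth_yes"): 36,
    ("allan_yes", "beth_no"): 9,
    ("allan_no", "beth_yes"): 16,
    ("allan_no", "beth_no"): 39,
}
total = sum(counts.values())  # 100 equally likely films


def prob(predicate):
    """Probability of the event made up of the cells satisfying `predicate`."""
    favorable = sum(n for cell, n in counts.items() if predicate(cell))
    return Fraction(favorable, total)


p_a = prob(lambda cell: cell[0] == "allan_yes")                # P(A)
p_b = prob(lambda cell: cell[1] == "beth_yes")                 # P(B)
p_a_and_b = prob(lambda cell: cell == ("allan_yes", "beth_yes"))  # P(A ∩ B)
p_a_or_b = prob(lambda cell: cell[0] == "allan_yes" or cell[1] == "beth_yes")  # P(A ∪ B)
p_neither = prob(lambda cell: cell == ("allan_no", "beth_no"))    # P((A ∪ B)^c)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
assert p_a_or_b == p_a + p_b - p_a_and_b
# Complement rule applied to the union: P(A ∪ B) = 1 - P((A ∪ B)^c).
assert p_a_or_b == 1 - p_neither

print("P(A) =", p_a, " P(B) =", p_b, " P(A and B) =", p_a_and_b)
print("P(A or B) =", p_a_or_b)
```

Counting cells directly, applying the addition rule, and applying the complement rule are the three routes asked for in parts e), h), and i); the assertions simply confirm that they agree.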
Conditional Probability

We are often interested in the conditional probability of one event given the information that another event has occurred. You will find that it is straightforward and intuitive to derive a reasonable definition of conditional probability by examining data in a two-way table.

Suppose again that one of the top 100 films is to be selected at random, and consider again the 2x2 table indicating how many of the top 100 films Allan and Beth have seen:

|           | Beth yes | Beth no | Total |
|-----------|----------|---------|-------|
| Allan yes | 36       | 9       | 45    |
| Allan no  | 16       | 39      | 55    |
| Total     | 52       | 48      | 100   |

k) Recall the (unconditional) probability that Allan has seen the movie and the (unconditional) probability that Beth has seen it. Also recall the probability that they both have seen the film. Record these, along with the appropriate symbols, below:

l) Now suppose that once the film has been selected, you learn the partial information that Allan has seen it. Given this information, determine the conditional probability that Beth has seen it by restricting your attention to the "Allan yes" row of the table and assuming that those films are equally likely.

We use the notation $P(B \mid A)$ to denote the conditional probability of event B given event A.

m) Determine how $P(B \mid A)$ relates to $P(A)$, $P(B)$, and $P(A \cap B)$. [Hint: Actually, one of these three probabilities is irrelevant. Determine which two are relevant and how they relate to $P(B \mid A)$ by following your calculation from the table in (l).]

Definition: The conditional probability of event B given event A is defined as follows:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)},$$

provided that $P(A) > 0$.

Note that when defining a conditional probability, it is essential to require that $P(A)$ is positive, since it would not make sense to condition on an event that is impossible in the first place.

n) Use the definition to calculate the conditional probability that Allan has seen the movie given that Beth has seen it.
Does the knowledge that Beth has seen it increase, decrease, or not affect the (unconditional) probability that Allan has seen it? Explain.

o) Does $P(B \mid A) = P(A \mid B)$ in this case? To convince yourself that these need not even be close, consider selecting one American citizen at random. Let M be the event that the person is male, and let S be the event that the person is a U.S. Senator. Make an educated guess as to the values of $P(M \mid S)$ and $P(S \mid M)$. Are they close?

p) Use the definition of conditional probability to calculate $P(B \mid A^c)$ and $P(A \mid B^c)$ for the film example. Does the knowledge that Beth has not seen the film increase, decrease, or not affect the probability that Allan has seen it? Explain.

Independence

Two events are said to be independent if knowledge that one occurs does not change the probability of the other's occurrence. In other words, the events are independent if the conditional probability of one given the other (e.g., $P(A \mid B)$) is the same as the unconditional probability of the one in the first place (e.g., $P(A)$). Thus, in symbols, events A and B are independent if $P(A \mid B) = P(A)$. This is equivalent to requiring that $P(B \mid A) = P(B)$.

q) Express this condition for independence in terms of the probability of the intersection of A and B. [Hint: Use the definition of conditional probability on either of the expressions above.]

You should find that another equivalent definition for A and B to be independent is that $P(A \cap B) = P(A) P(B)$. Mathematicians typically take this as the definition of independence and then prove that it is equivalent to the other two conditions given above.

Definition: Two events A and B are independent if and only if $P(A \cap B) = P(A) P(B)$.

The following theorem follows from the definition above and basic properties of conditional probabilities:

Theorem: Let A and B be any two events. Then the following are equivalent. (That is, if any one of the following statements is true, then all four must be true.)
   1. A and B are independent,
   2. $P(A \cap B) = P(A) P(B)$,
   3. $P(A \mid B) = P(A)$,
   4. $P(B \mid A) = P(B)$.

As a consequence of this theorem, we only need to check any one of the conditions above to check for independence of two events.

r) Are the events {Allan has seen the film} and {Beth has seen the film} independent? Defend your answer using any one of the equivalent definitions of independence. Then write a sentence or two explaining why your answer makes sense given the data in the table.

Example: Randomly select one card from a standard deck of 52. Consider the following three events:
   A = the card is a heart
   B = the card is a face card
   C = the card is the king of hearts

First note that there are 13 hearts, so $P(A) = \frac{13}{52} = \frac{1}{4}$. Similarly, there are 12 face cards (4 jacks, 4 queens, 4 kings), so $P(B) = \frac{12}{52} = \frac{3}{13}$. Finally, there is only one king of hearts, so $P(C) = \frac{1}{52}$.

If we know that the event A has occurred, then we know that the card is one of the 13 hearts. With this knowledge that the card is a heart, the probability of the event B becomes $P(B \mid A) = \frac{3}{13}$. Note that this is the same as the unconditional probability of B; that is, $P(B \mid A) = \frac{3}{13} = P(B)$. Thus A and B are independent: knowing that the card is a heart has no effect at all on the probability that the card is a face card. Similarly, knowing that the card is a face card has no effect at all on the probability that it is a heart (that is, $P(A \mid B) = P(A)$; verify this for yourself).

On the other hand, if we know that the event B has occurred, then we know that the card is one of the 12 face cards, so $P(C \mid B) = \frac{1}{12}$. This is not the same as the unconditional probability $P(C) = \frac{1}{52}$, so B and C are not independent (they are dependent). In this case, knowing that the card is a face card substantially increases the probability that it is the king of hearts. Knowing that one of the two events has occurred does have a strong impact on the probability of the other occurring, so they are not independent.
Finally, note that if we know that the event C has occurred, then the card is a face card, so $P(B \mid C) = 1$ and $P(B \mid C) \neq P(B)$, so we have verified in a second way that B and C are not independent.

Axioms of Probability and Proofs of Basic Probability Rules

In the explorations section above, we used intuition to discover rules for probability that seem to make sense. In fact, each of these rules can be proven using the axiomatic approach to probability. In the axiomatic approach, developed by the Russian mathematician Kolmogorov (1903-1987), we begin with a small number of axioms (or assumptions). These axioms are assumed to be self-evident, and then, on the basis of a few definite rules of mathematical and logical manipulation, all other results are carefully proven or derived from these axioms.

Probability theory is based on the following three axioms. Let S denote the sample space of an experiment. Associated with each event A in S is a number P(A), called the probability of A, which satisfies the following axioms:

Axiom 1: $P(A) \geq 0$ for every event A.

Axiom 2: $P(S) = 1$.

Axiom 3: If A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$.

Axiom 3a: (Needed if the sample space is infinite.) If $A_1, A_2, A_3, \ldots$ is an infinite sequence of mutually exclusive events, then

$$P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j).$$

These axioms can be used to prove many of the other results discovered earlier in this chapter.

Theorem 1: (Complement Rule) $P(A^c) = 1 - P(A)$.

Proof:

Theorem 2: $P(\emptyset) = 0$.

Proof: Let A = S. Then $A^c = \emptyset$. Now apply Theorem 1 and Axiom 2 to obtain

$$P(\emptyset) = P(A^c) \overset{\text{Theorem 1}}{=} 1 - P(A) = 1 - P(S) \overset{\text{Axiom 2}}{=} 1 - 1 = 0.$$

Theorem 3: If A and B are events and $A \subset B$, then $P(A) \leq P(B)$.

Proof:

Theorem 4: For every event A, $0 \leq P(A) \leq 1$.

Proof:

Theorem 5: (Addition Rule for Two Events) For any two events A and B,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Proof:

Corollary: (Bonferroni Inequality) For any two events A and B, $P(A \cup B) \leq P(A) + P(B)$.

Proof: By Axiom 1, $P(A \cap B) \geq 0$, so $-P(A \cap B) \leq 0$.
Using this inequality along with the result of Theorem 5, we have

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \leq P(A) + P(B) + 0 = P(A) + P(B). \qquad \square$$

Theorem 6: (Addition Rule for Three Events) For any three events A, B, and C,

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$

Proof: To prove this, write $A \cup B \cup C = (A \cup B) \cup C$ and then use Theorem 5 twice. The details are left as an exercise.

Theorem 7: Let A and B be any two events. Then the following are equivalent:
   1. A and B are independent,
   2. $P(A \cap B) = P(A) P(B)$,
   3. $P(A \mid B) = P(A)$,
   4. $P(B \mid A) = P(B)$.

Proof: Numbers (1) and (2) are equivalent by the definition of independence. We will prove here that (2) and (3) are also equivalent. The proof that (2) and (4) are equivalent is left as an exercise. Since all of the other statements are equivalent to (2), all four are equivalent. Throughout, we assume $P(B) > 0$ so that $P(A \mid B)$ is defined.

We begin by showing that (2) implies (3). By the definition of conditional probability,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Assuming that (2) is true, this can be rewritten as

$$P(A \mid B) = \frac{P(A) P(B)}{P(B)} = P(A).$$

Thus if (2) is true, then (3) must also be true.

To prove that (2) and (3) are equivalent, we also need to prove the converse of the statement above. That is, we need to prove that if (3) is true, then (2) must also be true. By the definition of conditional probability,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Assuming that (3) is true, we can rewrite this as

$$P(A) = \frac{P(A \cap B)}{P(B)},$$

so multiplying by $P(B)$ yields the desired result: $P(A \cap B) = P(A) P(B)$. Thus (3) implies (2), completing the proof. □

Theorem 8: If A and B are independent events, then each of the following pairs of events are also independent:
   1. A and $B^c$
   2. $A^c$ and B
   3. $A^c$ and $B^c$

Proof: We will prove (1) here. The proofs of (2) and (3) are left as an exercise.

Assume that A and B are independent. Note that the event A can be broken into two parts: the part of A that is inside of B, which is denoted $A \cap B$ or just $AB$, and the part of A that is outside of B, which is denoted $A \cap B^c$ or just $AB^c$. So $A = AB \cup AB^c$.
Furthermore, the events $AB$ and $AB^c$ are clearly mutually exclusive (an outcome cannot be both in B and not in B).

Now, using Axiom 3 along with the fact that $AB$ and $AB^c$ are mutually exclusive, we obtain

$$P(A) = P(AB \cup AB^c) = P(AB) + P(AB^c).$$

Since A and B are independent, $P(AB) = P(A) P(B)$, so the equation above becomes

$$P(A) = P(A) P(B) + P(AB^c).$$

Subtracting $P(A) P(B)$ from both sides and then factoring yields

$$P(AB^c) = P(A) - P(A) P(B) = P(A) \left[ 1 - P(B) \right].$$

Finally, by the complement rule (Theorem 1), $1 - P(B) = P(B^c)$, so we have

$$P(AB^c) = P(A) P(B^c).$$

But this means that A and $B^c$ are independent (by the definition of independence). □

Exercises

1. Suppose that you flip two fair coins. Is the sample space of equally likely outcomes properly represented as {HH, TH, HT, TT} or as {2 heads, 2 tails, 1 of each}? Explain.

2. For each of the following situations, list the sample space (that is, list all possible outcomes) and also indicate whether it seems reasonable to assume that all of the outcomes are equally likely. If not, include a short explanation.
   a) whether or not you pass this course
   b) your grade in this course
   c) the color of a randomly selected M&M candy
   d) the outcome of the roll of a fair die
   e) the sum of the outcomes of independently rolling two fair dice
   f) a tennis racquet landing with the label "up" or "down" when spun
   g) the last digit of the Social Security Number of a randomly selected American
   h) the number of flips of a fair coin until the first "heads" appears

3. Suppose that you independently roll two fair four-sided dice. (Each die is equally likely to land on 1 or 2 or 3 or 4.)
   a) Using the notation (x,y) to mean that the first die lands on x and the second on y, list all 16 outcomes in the sample space.
   b) List all of the outcomes in the following events:
      A = {the first die lands on 2}
      B = {the sum of the dice exceeds 5}
      C = {the first die lands on a larger number than the second die}
      D = {the difference between the two dice is one or less}
   c) Determine the probability of each event listed in b).
   d) Now suppose that the two dice were fair but six-sided. Recalculate the probabilities of the events listed in b). (You need not list out all of the outcomes in each event.)

4. Identify which of the following are legitimate uses of event/probability notation and which are not. Give an explanation for the ones that are not.
   d) P A B P A B a) P A B b) P A P B c) P A Bc A B c e) P f) P AB c

5. Suppose that you roll two fair, six-sided dice. Consider the events:
      A = {the first die lands on 2}
      B = {the sum of the dice equals 7}
      C = {the first die lands on a larger number than the second die}
      D = {the difference between the two dice is one or less}
      E = {the sum of the dice equals 11}
   a) Identify all pairs of these events that are disjoint. Explain your answers.
   b) Identify all pairs of these events that are independent. Justify your answers with appropriate calculations.
   c) Identify one pair of these events with the property that learning that one has occurred makes the other more likely to have occurred. Justify your answer with appropriate calculations.
   d) Identify one pair of these events with the property that learning that one has occurred makes the other less likely to have occurred. Justify your answer with appropriate calculations.

6. Suppose that you hear a weather forecast announcing that the probability of rain on Saturday is 50% and that the probability of rain on Sunday is 50%. Define the events A = {rain on Saturday} and U = {rain on Sunday}.
   a) If A and U are independent, what is the probability of rain on at least one day of the weekend?
   b) If $P(U \mid A) = .8$, what must be true of $P(U \mid A^c)$?
In this case what is the probability of rain on at least one day of the weekend?
   c) What is the largest possible value for $P(A \cup U)$? What has to be true of $P(U \mid A)$ to achieve this value?
   d) What is the smallest possible value for $P(A \cup U)$? What has to be true of $P(U \mid A)$ to achieve this value?

7. Given the following probabilities:
      $P(A) = 0.5$, $P(B) = 0.6$, $P(C) = 0.6$,
      $P(A \cap B) = 0.3$, $P(A \cap C) = 0.2$, $P(B \cap C) = 0.3$,
      $P(A \cap B \cap C) = 0.1$,
   find the conditional probabilities $P(A \mid B)$, $P(B \mid C)$, $P(A \mid B^c)$, and $P(B^c \mid A \cap C)$.

8. Let A and B be two events and let $P(A) = 0.4$, $P(B) = p$, and $P(A \cup B) = 0.8$.
   a. For what value of p will A and B be mutually exclusive?
   b. For what value of p will A and B be independent?

9. A survey is conducted to determine the sources that people in a large metropolitan area use to get news. The survey indicates that 77% obtain news from television, 63% from newspapers, 47% from radio, 45% from television and newspapers, 29% from television and radio, 21% from newspapers and radio, and 6% from all three.
   a. Sketch a Venn diagram and fill in all of the appropriate probabilities.
   b. What proportion of people obtain news from television, but not newspapers?
   c. What proportion of people do not obtain news from either television or radio?
   d. What proportion of people do not obtain news from any of these three sources?
   e. Given that radio is a news source, what is the probability that a newspaper is also a news source?
   f. Given that TV is a news source, what is the probability that radio is not a news source?
   g. Given that both newspaper and radio are news sources, what is the probability that TV is not a news source?

10. Prove Theorem 6. Make sure that every step of your proof is clearly explained and justified. Indicate all other theorems or axioms that you make use of in your proof.

11. Prove parts (2) and (3) of Theorem 8. Make sure that every step of your proof is clearly explained and justified. Indicate all other theorems or axioms that you make use of in your proof.

12. Show that the 3 axioms of probability are satisfied by conditional probabilities. In other words, if $P(B) > 0$, prove that
   a. $P(A \mid B) \geq 0$,
   b. $P(B \mid B) = 1$,
   c. If $A_1$ and $A_2$ are mutually exclusive, then $P(A_1 \cup A_2 \mid B) = P(A_1 \mid B) + P(A_2 \mid B)$.
   Make sure that every step of your proof is clearly explained and justified. Indicate all theorems or axioms that you make use of in your proof.
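
As a numerical companion to the card example in the Independence section and to Theorems 7 and 8, the following Python sketch enumerates a standard 52-card deck and checks the independence claims by direct counting. It is an illustration only and not a substitute for the proofs requested above; the helper names `prob`, `cond`, and `independent` are introduced here for convenience and do not appear in the original notes.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # sample space S: 52 equally likely cards


def prob(event):
    """P(event) when all cards are equally likely."""
    return Fraction(len(event), len(deck))


def cond(event, given):
    """P(event | given), by restricting attention to the outcomes in `given`."""
    return Fraction(len(event & given), len(given))


def independent(e, f):
    """Check the product definition: P(E ∩ F) = P(E) P(F)."""
    return prob(e & f) == prob(e) * prob(f)


A = {card for card in deck if card[1] == "hearts"}          # the card is a heart
B = {card for card in deck if card[0] in {"J", "Q", "K"}}   # the card is a face card
C = {("K", "hearts")}                                       # the king of hearts

# A and B are independent; B and C are not.
assert independent(A, B)
assert cond(B, A) == prob(B) == Fraction(3, 13)
assert not independent(B, C)
assert cond(C, B) == Fraction(1, 12) and prob(C) == Fraction(1, 52)

# Theorem 8: independence of A and B carries over to the complements.
A_c, B_c = deck - A, deck - B
assert independent(A, B_c)
assert independent(A_c, B)
assert independent(A_c, B_c)

print("All independence checks passed.")
```

Because every probability is an exact fraction, the equality tests are exact rather than approximate, which is why the product definition of independence can be checked with `==`.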