Probability and Bayes’ Theorem

Pregnancy Test Kit
The test kit may either be accurate, or inaccurate.

                      Actually pregnant          Actually not pregnant
Test kit shows +ve    Correct +ve diagnosis      Incorrect +ve diagnosis
                      (Sensitivity, or Power)
Test kit shows –ve    Incorrect –ve diagnosis    Correct –ve diagnosis
                                                 (Specificity)

Sensitivity or Specificity?
• Objective of the experiment
• HIV diagnostic kit, 99.9% sensitive (correct identification of HIV +ves) and 99.5% specific (correct identification of HIV –ves)

What do these numbers mean? What’s the interpretation of probability and chance?

Data exploration and Statistical analysis
1. Data checking, identifying problems and characteristics
2. Understanding chance and uncertainty

Probability
• A mathematical attempt at explaining random phenomena
• Examples:
• Flipping a coin
• The bus arriving within the next 5 minutes

Types of probability
Experimental (empirical) probabilities
Arbitrarily good estimates of the probability of a certain outcome of an experiment can be obtained by repeating the experiment sufficiently often. (e.g. flipping a coin, rolling a die)
Subjective probabilities
Some phenomena are observed just once and repeated experiments are impossible. When there is no optimal model, probabilities are often based on subjective judgement. (e.g. flash floods in Orchard Road, Wall Street crashes)

Understanding probability
• Need to know what the possible outcomes are (the sample space S, in geek-speak)
• Is the outcome predictable, or random?

Basic definitions in probability
• Union of events: $A \cup B$ means that A or B (or both) occur.
• Intersection of events: $A \cap B$ means that both A and B occur at the same time.
• Complementary event: The complement of an event A, denoted $A'$ or $\bar{A}$, occurs if A does not occur.
• Mutually exclusive events: Two events are mutually exclusive if they cannot occur at the same time.
Suppose that you plan to roll a die just once. Let A be the event that you get an odd number, and B the event that you get a six.
Then A and B are mutually exclusive.

Basic rules in probability
• $P(A) \ge 0$ for any event A, i.e. probabilities are never negative.
• $P(S) = 1$, which means that some element of the sample space is certain to occur as the outcome of our random experiment. Note that the empty set $\emptyset$ is also an event in the probability space.
• $P(A \cup B) = P(A) + P(B)$, if A and B are mutually exclusive. This rule is assumed to generalize to an arbitrary number of mutually exclusive events.

Example 1: Flip a fair coin twice and count the number of heads. Let S = {0, 1, 2} denote the sample space. Are all elements of S equally likely?

Combinatorics
• Sample spaces can often be quite large. In such cases, combinatorics helps to work out their precise size.
• Multiplication rule
Suppose that an experiment (or a procedure) is carried out k times and that in each step there are n possible outcomes. All combinations of individual outcomes are possible. Then there is a total of
$\underbrace{n \times n \times \dots \times n}_{k \text{ times}} = n^k$
possible outcome sequences.

Example 2: Suppose a die is rolled six times and consider the sample space S of all possible outcome sequences. One element of S would be for instance (4, 2, 2, 6, 1, 3). What is the number of elements of S?

Generalized multiplication rule
Suppose that an experiment (or a procedure) consists of k steps and that there are $n_j$ possible outcomes for step j. All combinations of individual outcomes are possible. Then there is a total of
$n_1 \times n_2 \times \dots \times n_k$
possible outcome sequences.

Example 3: Suppose that a license number consists of four digits followed by three letters (uppercase). How many different license numbers are there?

Example 4: DNA strands consist of nucleotide sequences. There are four possible nucleotides, labeled A, C, T, G. Find the total number of possible nucleotide sequences of length 50. What is the chance that a randomly composed sequence of length 50 will start with the letters “TATA”?
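The multiplication rules in Examples 2 to 4 can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper name count_sequences is made up for illustration, digits run from 0 to 9 with leading zeros allowed, the alphabet has 26 uppercase letters, and every nucleotide is equally likely at every position, independently of the others.

```python
from fractions import Fraction
from math import prod

def count_sequences(options_per_step):
    """Generalized multiplication rule: multiply the number of
    possible outcomes n_j over all k steps (hypothetical helper)."""
    return prod(options_per_step)

# Example 2: a die rolled six times, 6 outcomes per roll (n^k with n = 6, k = 6).
die_sequences = count_sequences([6] * 6)

# Example 3: four digits (assumed 0-9) followed by three letters (assumed A-Z).
license_numbers = count_sequences([10] * 4 + [26] * 3)

# Example 4: nucleotide sequences of length 50 over {A, C, T, G}.
dna_sequences = count_sequences([4] * 50)

# Sequences that start with "TATA": the first 4 positions are fixed,
# the remaining 46 are free. Assumes all sequences are equally likely.
p_tata = Fraction(count_sequences([4] * 46), dna_sequences)

print(die_sequences)    # 46656
print(license_numbers)  # 175760000
print(dna_sequences)    # 4**50, a 31-digit number
print(p_tata)           # 1/256
```

Using Fraction keeps the probability exact, so the result can be compared directly with a hand calculation of (1/4)^4.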
Permutation
How many different arrangements of the letters a, b and c are possible? It is easy to check that there are six, namely
abc, acb, bac, bca, cab, cba
Each arrangement is called a permutation. When arranging a larger number of items, direct enumeration quickly becomes unwieldy. For n different objects, there are
$n! = n \times (n - 1) \times \dots \times 2 \times 1$
possible arrangements.

Example 5: An important branch of statistics is experimental design. In agriculture and plant biology, methods of experimental design are used to find good ways to allocate the different plant varieties to be compared to different experimental fields. Suppose you have six varieties of plant and six experimental fields. How many possible allocations of plants to fields are there?

K out of N permutations
Suppose we have to assign k different items to n different objects, one item per object ($k \le n$). The number of possible assignments is
$\frac{n!}{(n - k)!}$

Example 6: In football world championships, there are usually 32 participants. How many possible outcomes are there, if only the first three places are of interest?

K distinct items out of N permutations
Suppose that you are allocating $n = n_1 + n_2 + \dots + n_k$ items. Among these, $n_1$ are of type 1, $n_2$ are of type 2, …, and $n_k$ are of type k. Items of the same type are not distinguishable. Then the number of distinct possible allocations is
$\frac{n!}{n_1! \, n_2! \cdots n_k!}$

Example 7: How many nucleotide sequences are there that consist of four A’s, three C’s, three T’s and no G? If a nucleotide sequence of length 10 is composed at random such that each letter has the same probability of occurring at any position, what is the chance of getting one of the above-mentioned nucleotide sequences?

Choosing K items out of N irrespective of order
If the order of the selection is irrelevant, there are
$\frac{n!}{k! \, (n - k)!}$
possibilities to choose k objects from n, or “n choose k”, denoted as $\binom{n}{k}$.

Example 8: Suppose that there are 20 patients participating in a clinical study.
We want to assign five of them to a control group that receives only a placebo. How many possible assignments are there?

Example 9: Suppose that there is a group of 23 people in a room. What is the chance that at least two of them have the same birthday? (assuming none of them were born in a leap year)

Example 10: In ecology, animals are often caught, marked and released. When the same animal is recaptured, this provides valuable information concerning the size of a population (capture/recapture models). Development of animals and habits such as migration behavior can also be studied this way. Suppose that we have a population of 1000 birds in an area. A team of ecologists plans to capture 50 of them (one after the other), mark them and release them subsequently. Suppose that for all the birds, the probability of capture is the same (this will often be an oversimplification of reality). What is the chance that none of the birds is recaptured? (Consider the use of Stirling’s formula: $n! \approx \sqrt{2 \pi n} \, (n / e)^n$)

Addition rule in probability
$P(E \cup F) = P(E) + P(F) - P(E \cap F)$

Example 11: We toss two fair coins. Let A denote the event that the first coin lands heads, and B the event that the second coin lands heads. Find $P(A \cup B)$.

Example 12: A total of 36 members of a club play tennis, 28 play squash, and 18 play badminton. Furthermore, 22 of the members play both tennis and squash, 12 play both tennis and badminton, 9 play squash and badminton and 4 play all three sports. How many members of the club play at least one of these sports?

Mutual exclusivity and independence
If A and B are mutually exclusive events, i.e. $A \cap B = \emptyset$, then $P(A \cap B) = 0$.
Two events are independent if the occurrence or non-occurrence of one event has no influence on the occurrence or non-occurrence of the other event.
Events A and B are independent if and only if $P(A \cap B) = P(A) \, P(B)$.

Dependent events and conditional probability
Consider this scenario: If 2 cards are drawn from a deck, what is the probability that both will be diamonds? Simple?
Alternatively, we could treat the situation as two outcomes: the outcome of the first draw and the outcome of the second draw. Let us denote event A as drawing a diamond on the first draw and event B as drawing a diamond on the second draw. In this case, the probability of the second event depends on the outcome of the first event.
The probability that the second draw is a diamond given that the first card drawn is a diamond is denoted by P(B | A) and is called the conditional probability of the event B given that the event A has occurred.

[Figure: probability tree diagram]

Definitions concerning conditional probability
P(X | Y) denotes the probability that the event X occurs given that the event Y has occurred. If P(Y) > 0, then the conditional probability of the event X given the event Y is defined by
$P(X \mid Y) = \frac{P(X \cap Y)}{P(Y)}$
This generalizes to yield the very useful Bayes’ Theorem.

Partitioning
A useful concept in biomedical sciences is the concept of partitioning. Given any events A and B,
$P(A) = P(A \mid B) \, P(B) + P(A \mid B') \, P(B')$
where B’ is the complement of B (or defined as “not B”). This generalizes to a useful result in probability, known as the Law of Total Probability: if $B_1, \dots, B_n$ are mutually exclusive events that together cover the whole sample space, then
$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$

Example 13: A hospital sends samples to one of three laboratories. Twenty percent of the samples are sent to lab A, 30% to lab B and 50% to lab C. Lab A is expensive, but produces incorrect results with a probability of only 0.0002. Lab B has an error probability of 0.0005 and lab C one of 0.0008. Suppose you go to this hospital and a sample is taken. What is the chance that you get a correct result, if you do not know which of the labs your sample will be sent to? (In practice, the error probabilities will also depend on the type and difficulty of the analysis required.)
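A short Python sketch of the two ideas on these slides, conditional probability and partitioning, may help. It assumes a standard 52-card deck with 13 diamonds for the card scenario, and it reads the percentages and error rates of Example 13 as exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

# Two diamonds in a row: P(A and B) = P(A) * P(B | A).
# Assumes a standard 52-card deck with 13 diamonds, drawn without replacement.
p_first_diamond = Fraction(13, 52)               # P(A)
p_second_given_first = Fraction(12, 51)          # P(B | A)
p_both_diamonds = p_first_diamond * p_second_given_first

# Example 13: Law of Total Probability over the three labs.
# Each entry is (share of samples sent to the lab, error probability of the lab).
labs = {
    "A": (Fraction(20, 100), Fraction(2, 10000)),
    "B": (Fraction(30, 100), Fraction(5, 10000)),
    "C": (Fraction(50, 100), Fraction(8, 10000)),
}

# P(error) = sum over labs of P(error | lab) * P(lab)
p_error = sum(share * error for share, error in labs.values())
p_correct = 1 - p_error

print(p_both_diamonds)   # 1/17
print(p_error)           # 59/100000
print(float(p_correct))  # 0.99941
```

The lab shares sum to 1, which is what makes the three labs a valid partition of the sample space.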
Partitioning with Bayes’ Theorem
Suppose that both P(A), P(B) > 0. Then
$P(B \mid A) = \frac{P(A \mid B) \, P(B)}{P(A \mid B) \, P(B) + P(A \mid B') \, P(B')}$
This is an extremely useful representation!

Example 14: An ectopic pregnancy is twice as likely to develop when the pregnant woman is a smoker as it is when she is a non-smoker. If 32% of women of childbearing age are smokers, what percentage of women having ectopic pregnancies are smokers?

Example 15: There is a 50-50 chance that the queen carries the gene for hemophilia. If she is a carrier, then each prince has a 50-50 chance of having hemophilia. If the queen has had three princes without the disease, what is the probability that the queen is a carrier?

Arrggghhhhh!!!
(I thought I chose life sciences so I don’t need to do mathematics anymore!!!)
So why do we need to learn all this theoretical probability in life sciences?!
GIBBERISH?!

Pregnancy Test Kit
The test kit may either be accurate, or inaccurate.
How do we interpret the findings from such a pregnancy test kit?

Example 16: Suppose you (or your girlfriend) bought a pregnancy test kit and observed a positive result upon testing. Recalling today’s lesson in probability, you immediately turned to the back of the box, and it says: “… 99% chance of a true positive result, and a 99.9% chance of a true negative result.” What is the probability that you (or your girlfriend) are actually NOT pregnant, given that your prior belief is that you are equally likely to be either?
Thinking that the test kit may be faulty, you decided to buy another kit, which showed another positive result. So what is the probability that you (or your girlfriend) are actually NOT pregnant, given the observation of two positive results?
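One way to work through Examples 14 to 16 is to code Bayes’ Theorem for an event and its complement. The sketch below is not the official solution. It assumes that the two test kits in Example 16 give independent results given the true pregnancy status, that the figures on the box are the sensitivity (99%) and specificity (99.9%), and that Example 14 can be handled with relative likelihoods (2 versus 1), since only their ratio matters. The function name posterior is hypothetical.

```python
from fractions import Fraction

def posterior(prior, lik_h, lik_not_h):
    """Bayes' Theorem for a hypothesis H and its complement:
    P(H | data) = P(data | H) P(H) / [P(data | H) P(H) + P(data | H') P(H')]."""
    numerator = lik_h * prior
    return numerator / (numerator + lik_not_h * (1 - prior))

# Example 16: H = "not pregnant", data = "test shows +ve".
sensitivity = Fraction(99, 100)     # P(+ve | pregnant)
specificity = Fraction(999, 1000)   # P(-ve | not pregnant)
false_positive = 1 - specificity    # P(+ve | not pregnant)

after_one = posterior(Fraction(1, 2), false_positive, sensitivity)
# Second kit: yesterday's posterior becomes today's prior
# (assumes the two kits err independently of each other).
after_two = posterior(after_one, false_positive, sensitivity)

# Example 15: H = "queen is a carrier", data = "three healthy princes".
carrier = posterior(Fraction(1, 2), Fraction(1, 2) ** 3, Fraction(1))

# Example 14: H = "smoker", data = "ectopic pregnancy",
# using relative likelihoods 2 (smoker) and 1 (non-smoker).
smoker = posterior(Fraction(32, 100), Fraction(2), Fraction(1))

print(after_one)   # 1/991
print(after_two)   # 1/980101
print(carrier)     # 1/9
print(smoker)      # 16/33
```

Feeding the first posterior back in as the prior for the second test gives the same answer as applying the formula once with both positive results, provided the independence assumption holds.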
Students should be able to
• understand the fundamental rules of probability
• perform simple counting through combinatorics and permutations
• understand the meanings of mutual exclusivity and independence
• calculate conditional probabilities
• understand partitioning and the use of the Law of Total Probability
• understand and use Bayes’ Theorem for tackling practical problems in probability