Conditional Probability and Statistically Independent Events

Introduction
• In many real-world "experiments" the outcomes are not completely random, since we have some prior knowledge.
• Knowing that it has rained the previous two days might influence our assignment of the probability of sunshine for the following day.
• The probability that an individual chosen from some general population weighs more than 90 kg, knowing that his height exceeds 6 ft.
• We might inquire about the probability of obtaining a 4, given that the outcome is known to be an even number.

Joint Events and Conditional Probability
• The event of interest is A = {4}, and the event describing our prior knowledge (an even outcome) is B = {2, 4, 6}.
• Note that conditional probability does not address the reasons for the prior information.
• In the repeated-experiment view, the odd outcomes (shown as dashed lines in the figure) are ignored; with 9 fours observed among 25 even outcomes, P[A] ≈ 9/25, where 9 is the total number of 4's and 25 is the number of 2's, 4's and 6's.

Joint Events and Conditional Probability
• Determine the probability of A = {1, 4}, knowing that the outcome is even.
• We should use N_{A∩B} / N_B to make sure we only count the outcomes that can occur in light of our knowledge of B.
• Let S = {1, 2, 3, 4, 5, 6} be the sample space and N_S its size. Then the probability of A given B is

    N_{A∩B} / N_B = (N_{A∩B} / N_S) / (N_B / N_S) ≈ P[A ∩ B] / P[B]

• The conditional probability is denoted by P[A | B] and defined as

    P[A | B] = P[A ∩ B] / P[B]

• Here A ∩ B = {1, 4} ∩ {2, 4, 6} = {4} and B = {2, 4, 6}, so

    P[A | B] = (1/6) / (3/6) = 1/3

Heights and weights of college students
• A population of college students has heights H and weights W which are grouped into ranges.
• Determine the probability of the event A that a student has a weight in the range 130–160 lbs:

    P[A] = Σ_{i=1}^{5} P[(H_i, W_2)] = 0.04 + 0.12 + 0.06 + 0.02 + 0 = 0.24

• Determine the probability of the event A, given that the student has height less than 6' (event B):

    P[A | B] = P[A ∩ B] / P[B] = (0.04 + 0.12 + 0.06) / (0.14 + 0.26 + 0.26) = 0.33

  The probability has changed.

Marginal probability
• The probabilities P[H_i] are called marginal probabilities since they are written in the margin of the table:

    P[H_i] = Σ_{j=1}^{5} P[(H_i, W_j)],    P[B] = Σ_{i=1}^{3} P[H_i]

• If we were to sum along the columns, we would obtain the marginal probabilities for the weights:

    P[W_j] = Σ_{i=1}^{5} P[(H_i, W_j)]

Conditional probability
• The probability of the event A that a student has a weight in the range 130–160 lbs is

    P[A] = Σ_{i=1}^{5} P[(H_i, W_2)] = 0.04 + 0.12 + 0.06 + 0.02 + 0 = 0.24

• Given that the student's height is less than 6', the probability of the event has changed to

    P[A | B] = P[A ∩ B] / P[B] = (0.04 + 0.12 + 0.06) / (0.14 + 0.26 + 0.26) = 0.33 > P[A]

• Given instead that the student's height is greater than 6' (a different conditioning event), the probability of the event has changed to

    P[A | B] = P[A ∩ B] / P[B] = (0.02 + 0) / (0.22 + 0.12) ≈ 0.059 < P[A]

Conditional probability
• In general, any of the three cases can occur:

    P[A | B] > P[A]
    P[A | B] < P[A]
    P[A | B] = P[A]   (A and B are statistically independent)

A compound experiment
• Two urns contain different proportions of red and black balls.
• Urn 1 has a proportion p1 of red balls and 1 − p1 of black balls.
• Urn 2 has proportions p2 of red balls and 1 − p2 of black balls.
• An urn is chosen at random, followed by the selection of a ball.
• Find the probability that a red ball is selected.
• Do B1 = {urn 1 chosen} and B2 = {urn 2 chosen} partition the sample space?
• We need to show that B1 ∪ B2 = S and B1 ∩ B2 = ∅.
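If each urn is equally likely to be chosen, B1 and B2 do partition S, and the law of total probability (used again later for Bayes' theorem) gives P[red] = ½·p1 + ½·p2. A minimal Python sketch checking this by simulation; the values p1 = 0.3 and p2 = 0.8 are illustrative assumptions, not from the slides:

```python
import random

# Assumed illustrative proportions of red balls in each urn
p1, p2 = 0.3, 0.8
trials = 100_000

reds = 0
for _ in range(trials):
    # Choose an urn at random (B1 or B2, each with probability 1/2) ...
    p_red = p1 if random.random() < 0.5 else p2
    # ... then draw a ball from the chosen urn
    if random.random() < p_red:
        reds += 1

print("simulated P[red] :", reds / trials)
print("total probability:", 0.5 * p1 + 0.5 * p2)
```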
Probability of error in a digital communication system
• In a digital communication system a "0" or "1" is transmitted to a receiver.
• Either bit is equally likely to occur, so a prior probability of 1/2 is assumed.
• At the receiver a decoding error can be made due to channel noise, so that a 0 may be mistaken for a 1 and vice versa.
• This is similar to the urn problem.

Monty Hall problem
• Behind one door there is a new car, while the other doors conceal goats.
• The contestant first chooses a door, but it is not opened.
• Monty then opens one of the remaining doors, always one with a goat behind it.
• The contestant is then given the option either to open the door originally chosen or to switch to the other unopened door.
• The question is: open or switch?
• http://www.youtube.com/watch?v=mhlc7peGlGg

Monty Hall problem
• Assume the car is behind door 1.
• Define the events Ci = {contestant initially chooses door i} for i = 1, 2, 3 and Mj = {Monty opens door j} for j = 1, 2, 3.
• Determine the joint probabilities P[Ci, Mj] by using P[Ci, Mj] = P[Mj | Ci] P[Ci].
• Since the winning door is never opened by Monty, P[M1 | Ci] = 0. Also, Monty never opens the door initially chosen by the contestant, so P[Mi | Ci] = 0. Then
• P[M2 | C3] = P[M3 | C2] = 1   (contestant chooses a losing door)
• P[M3 | C1] = P[M2 | C1] = 1/2   (contestant chooses the winning door)
• P[Ci] = 1/3
• Combining these joint probabilities shows that switching wins the car with probability 2/3, while keeping the original door wins with probability 1/3.

Statistically Independent Events
• Two events A and B are said to be statistically independent if P[A | B] = P[A].
• Thus

    P[A | B] = P[A ∩ B] / P[B] = P[A]   ⟹   P[A ∩ B] = P[A] P[B]

• Knowing that B has occurred does in fact affect the possible outcomes; however, it is the ratio of P[A ∩ B] to P[B] that remains the same.

Statistical independence does not mean one event does not affect another event
• If a fair die is tossed, the probability of a 2 or a 3 is P[A = {2, 3}] = 1/3. Now assume we know the outcome is an even number, B = {2, 4, 6}. Recomputing the probability,

    P[A | B] = P[A ∩ B] / P[B] = P[{2}] / P[{2, 4, 6}] = (1/6) / (1/2) = 1/3 = P[A]

• The event A ∩ B has half as many elements as A, but the reduced sample space S' = B also has half as many elements.

Statistically Independent Events: Important points
• We need only know the marginal probabilities P[A] and P[B] to determine the joint probability: P[A ∩ B] = P[A] P[B].
• Statistical independence has a symmetry property. If A is independent of B, then B must be independent of A, since

    P[B | A] = P[B ∩ A] / P[A] = P[A ∩ B] / P[A] = P[A] P[B] / P[A] = P[B]

  (intersection is commutative, and A is independent of B).

Statistically independent events are different from mutually exclusive events
• If A and B are mutually exclusive and B occurs, then A cannot occur: P[A | B] = 0.
• If A and B are statistically independent and B occurs, then P[A | B] = P[A]. The two probabilities agree only if P[A] = 0.
• The conditions of mutual exclusivity and independence are different:

    Mutually exclusive events:        P[A ∪ B] = P[A] + P[B]
    Statistically independent events: P[A ∩ B] = P[A] P[B]

  A quick numerical check of the fair-die example above is sketched below.
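This sketch enumerates the fair-die sample space and verifies that A = {2, 3} and B = {2, 4, 6} are independent, while a hypothetical event C = {4, 5, 6} (introduced here only for contrast) is mutually exclusive with A but not independent of it:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}              # fair die: every outcome has probability 1/6

def prob(E):
    """Probability of an event E on the fair-die sample space."""
    return Fraction(len(E & S), len(S))

A = {2, 3}                          # event from the slide
B = {2, 4, 6}                       # "outcome is even"
C = {4, 5, 6}                       # hypothetical event, mutually exclusive with A

# Independence means P[A ∩ B] == P[A] P[B]
print(prob(A & B), prob(A) * prob(B))   # 1/6 and 1/6  -> A and B are independent
# Mutual exclusivity is a different condition: P[A ∩ C] = 0, not P[A] P[C]
print(prob(A & C), prob(A) * prob(C))   # 0 and 1/6    -> exclusive, not independent
```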
Statistically Independent Events: three or more events
• Three events are defined to be independent if the knowledge that any one or two of the events has occurred does not affect the probability of the third event, e.g.

    P[A | B ∩ C] = P[A]

• Note that if B and C have both occurred, then by definition B ∩ C has occurred.
• The full set of conditions is

    P[A | B] = P[A | C] = P[A | B, C] = P[A]
    P[B | A] = P[B | C] = P[B | A, C] = P[B]
    P[C | A] = P[C | B] = P[C | A, B] = P[C]

• These conditions are satisfied if and only if

    P[AB] = P[A] P[B]
    P[AC] = P[A] P[C]        (pairwise independence)
    P[BC] = P[B] P[C]
    P[ABC] = P[A] P[B] P[C]

Statistically Independent Events: three or more events
• We need P[ABC] = P[A] P[B] P[C] because

    P[A | B, C] = P[ABC] / P[BC] = P[A] P[B] P[C] / (P[B] P[C]) = P[A]

• In general, events E1, E2, …, EN are defined to be statistically independent if

    P[Ei Ej] = P[Ei] P[Ej]                 for i ≠ j
    P[Ei Ej Ek] = P[Ei] P[Ej] P[Ek]        for i ≠ j ≠ k
    ...
    P[E1 E2 … EN] = P[E1] P[E2] … P[EN]

• Statistical independence allows us to compute joint probabilities from the marginal probabilities alone.

Probability chain rule
• We can still determine joint probabilities for dependent events, but it becomes more difficult. Consider three events:

    P[ABC] = P[A | B, C] P[BC] = P[A | B, C] P[B | C] P[C]   (chain rule)

• It is not always an easy matter to find the conditional probabilities.

Example: Tossing a fair die
• If we toss a fair die, then the probability of the outcome being 4 is 1/6. Let us rederive this result by using the chain rule. Letting
• A = {even number} = {2, 4, 6}
• B = {numbers > 2} = {3, 4, 5, 6}
• C = {numbers < 5} = {1, 2, 3, 4}
• we have ABC = {4}. Applying the chain rule and noting that BC = {3, 4},

    P[ABC] = P[A | B, C] P[B | C] P[C]
           = (1/6)/(2/6) · (2/6)/(4/6) · (4/6)
           = (1/2) · (1/2) · (2/3) = 1/6

Bayes' Theorem
• Recall the conditional probabilities

    P[A | B] = P[AB] / P[B],    P[B | A] = P[AB] / P[A]

• Substituting P[AB] = P[A | B] P[B] into the expression for P[B | A], we obtain Bayes' theorem:

    P[B | A] = P[A | B] P[B] / P[A]

• Knowing the marginal probabilities P[A], P[B] and the conditional probability P[A | B], we can determine the other conditional probability P[B | A].
• The theorem allows us to perform inference, i.e. to assess the validity of an event when some other event has been observed.

Bayes' Theorem: Example
• If an urn containing an unknown composition of balls is sampled with replacement and produces an outcome of 10 red balls, what are we to make of this?
• Are all the balls red?
• Is the urn fair (half red and half black)?
• To test that the urn is a "fair" one, we determine the probability of a fair urn given that 10 red balls have just been drawn.
• Usually we compute the probability of drawing 10 red balls given a fair urn; now we are going "backwards": we are given the outcome and wish to determine the probability of a fair urn.

Bayes' Theorem: Example
• We believe that the urn is fair with probability 0.9 (from our past experience). Let B = {fair urn}, so P[B] = 0.9.
• If A = {10 red balls drawn}, we wish to determine P[B | A].
• This probability is our reassessment of the fair urn in light of the new evidence (10 red balls drawn).
• According to Bayes' theorem we need the conditional probability P[A | B], the probability of drawing 10 successive red balls from an urn with p = 1/2:

    P[A | B] = P[k = 10] = C(M, k) p^k (1 − p)^(M−k) = C(10, 10) (1/2)^10 (1/2)^0 = (1/2)^10

• We now know P[B] and P[A | B]; we still need P[A].

Bayes' Theorem: Example
• P[A] can be found using the law of total probability (B^c denotes the complement of B, the unfair urn):

    P[A] = P[A | B] P[B] + P[A | B^c] P[B^c] = P[A | B] P[B] + P[A | B^c] (1 − P[B])

• Thus only P[A | B^c] needs to be determined. This is the conditional probability of drawing 10 red balls from an unfair urn (assume all balls in that urn are red), and thus P[A | B^c] = 1, so

    P[A] = (1/2)^10 (0.9) + (1)(0.1)

• The posterior probability (after 10 red balls have been drawn) is

    P[B | A] = (1/2)^10 (0.9) / [ (1/2)^10 (0.9) + (1)(0.1) ] = 0.0087

  so we reject the fair-urn hypothesis.
• The odds ratio is interpreted as the odds against the hypothesis of a fair urn:

    odds = P[B^c | A] / P[B | A] = (1 − 0.0087) / 0.0087 ≈ 114
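A minimal Python check of this posterior and odds computation; the only inputs are the prior P[B] = 0.9 and the two likelihoods from the slides:

```python
prior_fair = 0.9                 # P[B], prior belief that the urn is fair
like_fair = (1 / 2) ** 10        # P[A | B]: 10 red balls from a fair urn
like_unfair = 1.0                # P[A | B^c]: assume the unfair urn is all red

# Law of total probability, then Bayes' theorem
p_a = like_fair * prior_fair + like_unfair * (1 - prior_fair)
posterior_fair = like_fair * prior_fair / p_a

print(round(posterior_fair, 4))                         # ~0.0087
print(round((1 - posterior_fair) / posterior_fair, 1))  # odds against a fair urn, ~114
```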
Binary communication system: one more example
• (Channel diagram on the slides; the analysis is analogous to the urn problem above.)

Multiple experiments
• The experiment of repeatedly tossing a coin can be viewed as a succession of subexperiments (single coin tosses).
• Assume a coin is tossed twice and we wish to determine P[A], where A = {(H, T)}. For a fair coin it is 1/4, since we assume that all 4 possible outcomes are equally probable.
• If the coin had P[{H}] = 0.99, the answer is not straightforward. How do we determine such probabilities?

Multiple experiments
• Consider the experiment composed of two separate subexperiments, each with sample space S1 = {H, T}. The sample space of the overall experiment is

    S = S1 × S1 = {(i, j) : i ∈ S1, j ∈ S1} = {(H, H), (H, T), (T, H), (T, T)}

• We would like to obtain P[complicated events] by determining P[events defined on S1].
• This is possible if we assume that the subexperiments are independent.
• So what is P[A] = P[(H, T)] for a coin with P[{H}] = p?
• This event is defined on the sample space of 2-tuples, which is S:

    {(H, T)} = {(H, H), (H, T)} ∩ {(H, T), (T, T)}
             = {heads on toss 1} ∩ {tails on toss 2} = H1 ∩ T2

Multiple experiments
• The subexperiments are independent, hence P[(H, T)] = P[H1] P[T2].
• We can determine P[H1] either as P[{(H, H), (H, T)}] or as P[H], thanks to the independence assumption.
• P[H] is a marginal probability and equals P[(H, H)] + P[(H, T)].
• P[H1] = p and P[T2] = 1 − p, and finally P[(H, T)] = p(1 − p).
• Generally, if we have M independent subexperiments, with Ai an event defined for subexperiment i, then the joint event A = A1 ∩ A2 ∩ … ∩ AM has probability

    P[A] = P[A1] P[A2] … P[AM]

  This is the equivalent of statistical independence for events defined on the same sample space.

Bernoulli sequence
• Any sequence of M independent subexperiments, with each subexperiment producing two possible outcomes, is called a Bernoulli sequence.
• Typically, for a Bernoulli trial P[0] = 1 − p and P[1] = p.

Binomial probability law
• Assume that M independent Bernoulli trials are carried out.
• We wish to determine the probability of k 1's (successes) and M − k 0's.
• Each such outcome has probability p^k (1 − p)^(M−k).
• The total number of such outcomes is the number of ways k 1's may be placed in the M-tuple, i.e. the binomial coefficient C(M, k) = M! / (k! (M − k)!).
• Hence, we have

    P[k] = C(M, k) p^k (1 − p)^(M−k)

• (Plots of the binomial law for M = 10, p = 0.5 and M = 10, p = 0.7.)

Geometric probability law
• If we let k be the Bernoulli trial on which the first success is observed, then the event of interest is the simple event (f, f, …, f, s), where s and f denote success and failure.
• The probability of the first success occurring on trial k is therefore

    P[k] = (1 − p)^(k−1) p

• (Plots of the geometric law for p = 0.25 and p = 0.5.)
• The first success is always most likely to occur on the first trial (k = 1).
• The average number of trials required for a success is 1/p.

Example: Telephone calling
• A fax machine dials a phone number that is typically busy 80% of the time. What is the probability that the fax machine will have to dial the number 9 times?
• The number of times the line is busy can be considered the number of failures, with each failure having probability 1 − p = 0.8.
• If the number is dialed 9 times, then the first success occurs for k = 9 and

    P[9] = (0.8)^8 (0.2) = 0.0336
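A short Python sketch of both laws, using only the standard library, which reproduces the fax-machine answer and one binomial value from the plots above:

```python
from math import comb

def binomial_pmf(k, M, p):
    """P[k successes in M independent Bernoulli trials]."""
    return comb(M, k) * p**k * (1 - p)**(M - k)

def geometric_pmf(k, p):
    """P[first success occurs on trial k]."""
    return (1 - p)**(k - 1) * p

# Binomial law, M = 10, p = 0.5: probability of exactly 5 successes
print(round(binomial_pmf(5, 10, 0.5), 4))   # 0.2461

# Fax-machine example: line busy with probability 0.8, so p = 0.2,
# and the first success on the 9th dial has probability
print(round(geometric_pmf(9, 0.2), 4))      # 0.0336
```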
Non-independent subexperiments
• The assumption of independence can sometimes be unreasonable. In the absence of independence, the joint probability is found by using the chain rule:

    P[A] = P[AM | AM−1, …, A1] P[AM−1 | AM−2, …, A1] … P[A2 | A1] P[A1]

Dependent Bernoulli trials
• Assume we have two coins: one fair and one weighted (p ≠ 1/2).
• We randomly choose a coin and toss it.
• If it comes up heads, we choose the fair coin for the next trial.
• If it comes up tails, we choose the weighted coin for the next trial.
• (Plots of example sequences for M = 100 trials with p = 0.25 and p = 0.5.)

Markov sequence
• The dependency between trials is due only to the outcome of trial i − 1 affecting the outcome of trial i.
• The previous outcome is called the state of the sequence; the dependence is summarized in a state probability (state transition) diagram.
• A Bernoulli sequence in which the probability of trial i depends only on the outcome of the previous trial is called a Markov sequence:

    P[Ai | Ai−1, Ai−2, …, A1] = P[Ai | Ai−1]
    P[A] = P[AM | AM−1] P[AM−1 | AM−2] … P[A2 | A1] P[A1]

Markov sequence: example
• We wish to determine the probability of N = 10 tails in succession, i.e. the event A = {(0,0,0,0,0,0,0,0,0,0)}.
• If the coin were fair on every toss, then P[A] = (1/2)^10 = 0.000976.
• For the dependent scheme with a weighted coin having P[1] = 1/4,

    P[A] = ( Π_{i=2}^{10} P[Ai | Ai−1] ) P[A1]

• but P[Ai | Ai−1] = P[0 | 0] = P[tail | weighted coin] = 3/4 for i = 2, 3, …, 10.
• The initial choice of the coin is random, hence

    P[A1] = P[0] = P[tail | weighted] P[weighted] + P[tail | fair] P[fair]
          = (3/4)(1/2) + (1/2)(1/2) = 5/8

    P[A] = ( Π_{i=2}^{10} 3/4 ) (5/8) = (3/4)^9 (5/8) = 0.0469

  about 48 times larger than for the fair coin.

Trellis diagram
• The probability of any sequence is found by tracing the sequence values through the trellis diagram and multiplying the probabilities of the branches together.
• For example, the sequence 1, 0, 0 has probability (3/8)(1/2)(3/4).
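A minimal Python simulation of this dependent coin scheme (weighted coin with P[head] = 1/4, as above), estimating the probability of 10 tails in a row and comparing it with the chain-rule value:

```python
import random

def all_tails_once(n=10):
    """Simulate one n-toss run of the dependent coin scheme;
    return True if every toss comes up tails."""
    p_head = random.choice([0.5, 0.25])   # initial coin chosen at random
    for _ in range(n):
        if random.random() < p_head:
            return False                   # a head occurred, not all tails
        p_head = 0.25                      # tail -> use the weighted coin next
    return True

trials = 200_000
est = sum(all_tails_once() for _ in range(trials)) / trials
print("simulated :", round(est, 4))        # close to 0.0469
print("analytical:", round((3/4)**9 * (5/8), 4))
```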
Real-world example: cluster recognition
• We wish to determine whether a cluster of events has occurred.
• By a cluster we mean that more occurrences of an event are observed than would normally be expected.
• The overall hit rate over the 2500 cells is 29/2500 = 1.16%.

Real-world example: cluster recognition
• The shaded area of M = 145 cells exhibits 11 hits (11/145 ≈ 0.076), more than the expected 145 × 0.0116 = 1.68. Is this a cluster?
• Let's apply the Bayesian approach and calculate an odds ratio to answer the question. With B = {cluster} and A = {observed data},

    odds = P[no cluster | observed data] / P[cluster | observed data]
         = P[B^c | A] / P[B | A] = P[A | B^c] P[B^c] / (P[A | B] P[B])

  (P[A] cancels out.)

Real-world example: cluster recognition
• We need to determine P[B], P[A | B^c] and P[A | B].
• P[B] is the prior probability; since we believe a cluster is quite unlikely, we assign it the probability 10^-6.
• The probability of the observed data if there is no cluster is calculated using a Bernoulli array. If the supposed cluster area contains M = 145 cells, then the probability of k hits is

    P[k] = C(M, k) p^k (1 − p)^(M−k)

• We set p_nc = 0.01 (slightly less than the 0.0116 overall hit rate) as the per-cell hit probability when there is no cluster.
• The per-cell hit probability when there is a cluster is set to p_c = 0.1.

Real-world example: cluster recognition
• The two likelihoods are

    P[A | B^c] = P[observed data | no cluster] = C(M, k) p_nc^k (1 − p_nc)^(M−k)
               = P[k = 11 | no cluster] = C(145, 11) (0.01)^11 (0.99)^134

    P[A | B]   = P[observed data | cluster] = C(M, k) p_c^k (1 − p_c)^(M−k)
               = P[k = 11 | cluster] = C(145, 11) (0.1)^11 (0.9)^134

• which results in an odds ratio of

    odds = P[A | B^c] P[B^c] / (P[A | B] P[B])
         = (0.01)^11 (0.99)^134 (1 − 10^-6) / [ (0.1)^11 (0.9)^134 (10^-6) ] = 3.52

  so we reject the cluster hypothesis.
• It is mainly the influence of the small prior probability of a cluster, P[B] = 10^-6, that has resulted in the greater-than-unity odds ratio.

Practice problems (1–9)
• In a culture used for biological research, the growth of unavoidable bacteria occasionally spoils the results of an experiment that requires at least three out of five cultures to be unspoiled to obtain a single datum point. Experience has shown that about 5 of every 100 cultures are randomly spoiled by the bacteria. If the experiment requires three simultaneously derived, unspoiled data points for success, find the probability of success for any given set of 15 cultures (three data points of five cultures each). (A numerical sketch follows the homework list below.)

Homework problems (1–6, 7*)
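As a numerical sketch for the culture practice problem above (assuming, per the problem statement, independent cultures each spoiled with probability 0.05 and three datum points drawn from disjoint sets of five cultures), the binomial law of this section gives:

```python
from math import comb

p_unspoiled = 0.95          # each culture is spoiled with probability 0.05

# A datum point succeeds if at least 3 of its 5 cultures are unspoiled
p_datum = sum(comb(5, k) * p_unspoiled**k * (1 - p_unspoiled)**(5 - k)
              for k in range(3, 6))

# The experiment succeeds if all three (independent) datum points succeed
p_success = p_datum**3
print(round(p_datum, 4), round(p_success, 4))   # ~0.9988 and ~0.9965
```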