Lecture 9: 5.3 Discrete Probability

5.3 Bayes' Theorem

We have seen that the following holds:

$$P(E \cap F) = P(E \mid F)\,P(F) \quad \text{and} \quad P(E \cap F) = P(F \mid E)\,P(E),$$

so that $P(E \mid F)\,P(F) = P(F \mid E)\,P(E)$. We can write one conditional probability in terms of the other:

$$P(F \mid E) = \frac{P(E \mid F)\,P(F)}{P(E)} \qquad \text{(Bayes' Theorem)}$$

Example: What is the probability that a family with 2 kids has two boys, given that they have at least one boy? (All possibilities are equally likely.)

S: all possibilities: {BB, BG, GB, GG}.
E: family has two boys: {BB}.
F: family has at least one boy: {BB, BG, GB}.
E ∩ F = {BB}.

[Figure: Venn diagram of S, with E = {BB} inside F = {BB, BG, GB} and GG outside F.]

P(E | F) = P(E ∩ F) / P(F) = (1/4) / (3/4) = 1/3.

Now we compute P(F | E): what is the probability that a family with two boys has at least one boy?

P(F | E) = P(E | F) P(F) / P(E) = (1/3 × 3/4) / (1/4) = 1.

5.3 Expected Values

The definition of the expected value of a random variable is:

$$E(X) = \sum_{s \in S} X(s)\,p(s).$$

This is equivalent to:

$$E(X) = \sum_{r \in X(S)} r\,P(X = r).$$

Example: What is the expected number of heads if we toss a fair coin n times? We know that the distribution for this experiment is the binomial distribution:

$$P(k, n; p) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k}.$$

Therefore we need to compute:

$$E(X) = \sum_{k=0}^{n} k\,P(X = k) = \sum_{k=0}^{n} k\,\frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k} = np.$$

Expectations are linear.

Theorem: E(X1 + X2) = E(X1) + E(X2), and E(aX + b) = a E(X) + b.

Examples:
1) Expected value of the sum of the values when a pair of dice is rolled, with X1 = value of the first die and X2 = value of the second die:
E(X1 + X2) = E(X1) + E(X2) = 2 × (1 + 2 + 3 + 4 + 5 + 6)/6 = 7.
2) Expected number of heads when a fair coin is tossed n times (see the example on the previous slide). Xi is the outcome of coin toss i; each has probability p of coming up heads. By linearity:
E(X1 + ... + Xn) = E(X1) + ... + E(Xn) = np.

More examples: A person checking out coats mixed the labels up randomly. When someone collects his coat, he is handed a coat chosen at random from the remaining coats. What is the expected number of correctly returned coats?

There are n coats checked in. Let Xi = 1 if coat i is correctly returned, and 0 if wrongly returned. Since the labels are randomly permuted, each coat is equally likely to end up with any owner, so E(Xi) = P(Xi = 1) = 1/n, and
E(X1 + ... + Xn) = n × 1/n = 1
(independent of the number of checked-in coats).

5.3 Geometric Distribution

Q: What is the distribution of waiting times until a tail comes up, when we toss a coin whose probability of tails is p? (For a fair coin, p = 1/2.)

A: Possible outcomes: T, HT, HHT, HHHT, HHHHT, ... (infinitely many possibilities), with
P(T) = p, P(HT) = (1 − p)p, P(HHT) = (1 − p)²p, ...

$$P(X = k) = (1-p)^{k-1}\,p \qquad \text{(geometric distribution)}$$ (matlab)

Here X(s) = number of tosses up to and including the first success (the first tail).

Normalization:

$$\sum_{k=1}^{\infty} P(X = k) = \sum_{k=1}^{\infty} (1-p)^{k-1}\,p = p \cdot \frac{1}{1 - (1-p)} = 1.$$

Here is how you can compute the expected value of the waiting time:

$$E(X) = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p = -p\,\frac{d}{dp} \sum_{k=1}^{\infty} (1-p)^k = -p\,\frac{d}{dp}\,\frac{1-p}{p} = -p\,\frac{d}{dp}\left(\frac{1}{p} - 1\right) = \frac{p}{p^2} = \frac{1}{p}.$$
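The slide points to a MATLAB demo for the geometric distribution; as a stand-in, here is a minimal Python sketch (function names and parameters are my own) that estimates the expected waiting time by simulation and compares it with the derived value 1/p:

```python
import random

def geometric_waiting_time(p, rng=random):
    """Toss a coin with tail probability p until the first tail;
    return the number of tosses, including the final tail."""
    k = 1
    while rng.random() >= p:  # with probability 1 - p we see a head and toss again
        k += 1
    return k

def estimate_mean(p, trials=100_000):
    """Monte Carlo estimate of E(X) for the geometric distribution."""
    return sum(geometric_waiting_time(p) for _ in range(trials)) / trials

if __name__ == "__main__":
    for p in (0.5, 0.25, 0.1):
        print(f"p = {p}: simulated E(X) = {estimate_mean(p):.3f}, theory 1/p = {1 / p:.3f}")
```

For p = 0.5 the simulated mean should settle near 2 tosses, matching 1/p.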
5.3 Independence

Definition: Two random variables X(s) and Y(s) on a sample space S are independent if the following holds:

$$\forall r_1, r_2: \quad P(X(s) = r_1 \wedge Y(s) = r_2) = P(X(s) = r_1)\,P(Y(s) = r_2).$$

Examples:
1) A pair of dice is rolled; X1 is the value of the first die, X2 the value of the second die. Are these independent?
P(X1 = r1) = 1/6, P(X2 = r2) = 1/6, and P(X1 = r1 AND X2 = r2) = 1/36 = P(X1 = r1) P(X2 = r2): YES, independent.
2) Are X1 and X = X1 + X2 independent?
P(X = 12) = 1/36 and P(X1 = 1) = 1/6, but P(X = 12 AND X1 = 1) = 0, which is not the product P(X = 12) P(X1 = 1): NO, not independent.

Theorem: If two random variables X and Y are independent over a sample space S, then E(XY) = E(X) E(Y). (Proof: read the book.)

Note 1: The reverse is not true: two random variables do not have to be independent for E(XY) = E(X)E(Y) to hold.
Note 2: If two random variables are not independent, E(XY) does not have to equal E(X)E(Y), although it still might.

Example: X counts the number of heads when a coin is tossed twice:
P(X = 0) = 1/4 (TT), P(X = 1) = 1/2 (HT, TH), P(X = 2) = 1/4 (HH), so
E(X) = 1 × 1/2 + 2 × 1/4 = 1.
Y counts the number of tails; E(Y) = 1 as well (by symmetry: switch the roles of H and T).
However, P(XY = 0) = 1/2 (HH, TT) and P(XY = 1) = 1/2 (HT, TH), so
E(XY) = 0 × 1/2 + 1 × 1/2 = 1/2, which differs from E(X) E(Y) = 1.

5.3 Variance

The average of a random variable tells us nothing about the spread of a probability distribution (matlab demo). Thus we introduce the variance of a probability distribution.

Definition: The variance of a random variable X over a sample space S is given by:

$$V(X) = \sum_{s \in S} (X(s) - E(X))^2\,p(s) = E\big((X - E(X))^2\big).$$

Expanding the square:

$$E\big((X - E(X))^2\big) = E(X^2) - 2E\big(X\,E(X)\big) + E\big(E(X)^2\big) = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2.$$

The standard deviation

$$\sigma(X) = \sqrt{V(X)}$$

is the width of the distribution.

Theorem: For independent random variables the variances add (proof in book):

$$E(X + Y) = E(X) + E(Y) \quad \text{(always true)},$$
$$V(X + Y) = V(X) + V(Y) \quad (X, Y \text{ independent}).$$

Example:
1) We toss 2 coins, with Xi(H) = 1 and Xi(T) = 0. What is the STD of X = X1 + X2?
X1 and X2 are independent, so V(X1 + X2) = V(X1) + V(X2) = 2 V(X1).
E(X1) = 1/2, so V(X1) = (0 − 1/2)² × 1/2 + (1 − 1/2)² × 1/2 = 1/4.
Hence V(X) = 1/2 and STD(X) = √(1/2).

What is the variance of the number of successes when n independent Bernoulli trials are performed?
V(X) = V(X1 + ... + Xn) = n V(X1).
V(X1) = (0 − p)² × (1 − p) + (1 − p)² × p = p²(1 − p) + p(1 − p)² = p(1 − p).
V(X) = np(1 − p). (matlab demo)
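The last slide again references a MATLAB demo; as a stand-in, here is a minimal Python sketch in the same spirit (function names and the choice n = 20, p = 0.3 are my own) that simulates the number of successes in n Bernoulli trials and compares the sample mean and variance with np and np(1 − p):

```python
import random

def binomial_sample(n, p, rng=random):
    """Number of successes in n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def estimate_mean_var(n, p, trials=50_000):
    """Sample mean and variance of the simulated success counts."""
    xs = [binomial_sample(n, p) for _ in range(trials)]
    mean = sum(xs) / trials
    var = sum((x - mean) ** 2 for x in xs) / trials
    return mean, var

if __name__ == "__main__":
    n, p = 20, 0.3
    mean, var = estimate_mean_var(n, p)
    print(f"simulated: mean = {mean:.3f}, var = {var:.3f}")
    print(f"theory:    mean = {n * p:.3f}, var = {n * p * (1 - p):.3f}")
```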
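Finally, returning to the coat-check example from the expected-value slides: the claim that on average exactly one coat is returned correctly, independent of n, is easy to check by simulating random permutations. A minimal sketch under the same assumptions (names are my own):

```python
import random

def correctly_returned(n, rng=random):
    """Randomly permute n coat labels and count the fixed points,
    i.e. coats handed back to their rightful owners."""
    labels = list(range(n))
    rng.shuffle(labels)
    return sum(1 for owner, label in enumerate(labels) if owner == label)

def average_correct(n, trials=100_000):
    """Monte Carlo estimate of the expected number of correct returns."""
    return sum(correctly_returned(n) for _ in range(trials)) / trials

if __name__ == "__main__":
    for n in (2, 5, 50):
        print(f"n = {n}: average correctly returned = {average_correct(n):.3f} (theory: 1)")
```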