Conditional Probability Mass Function

Introduction

P[A | B] is the probability of an event A given that some other event B has occurred. Unless A and B are independent, the occurrence of B affects the probability of A:

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$

Example: We choose one of two coins, one fair and one weighted, and toss it 4 times. What is the probability of observing 2 or more heads? The probability depends on which coin is selected (the condition).

$p_X[k \mid \text{coin 1 chosen}]$ is a binomial PMF that depends on $p_1$.
$p_X[k \mid \text{coin 2 chosen}]$ is a binomial PMF that depends on $p_2$.

$$p_X[k] = p_X[k \mid \text{coin 1 chosen}]\,P[\text{coin 1 chosen}] + p_X[k \mid \text{coin 2 chosen}]\,P[\text{coin 2 chosen}]$$

Let X be the discrete RV describing the outcome of the coin choice:

$$X = \begin{cases} 1 & \text{if coin 1 is chosen} \\ 2 & \text{if coin 2 is chosen} \end{cases}$$

Since $S_X = \{1, 2\}$, we assign to X the PMF

$$p_X[i] = \begin{cases} \alpha & i = 1 \\ 1 - \alpha & i = 2 \end{cases} \qquad 0 < \alpha < 1$$

The second part of the experiment consists of tossing the chosen coin 4 times in succession. Let Y be the number of heads observed, so that $S_Y = \{0, 1, 2, 3, 4\}$. The event A corresponds to 2 or more heads:

$$P[A] = \sum_{\{(i,j):\,(i,j) \in A\}} p_{X,Y}[i, j] = \sum_{i=1}^{2} \sum_{j=2}^{4} p_{X,Y}[i, j]$$

Only the joint PMF is needed to determine the desired probability. To find it we need

$$p_{X,Y}[i, j] = P[X = i, Y = j], \qquad p_X[i] = P[X = i]$$

By using the definition of conditional probability for events we have

$$\begin{aligned}
p_{X,Y}[i, j] &= P[X = i, Y = j] && \text{(definition of joint PMF)} \\
&= P[Y = j \mid X = i]\,P[X = i] && \text{(definition of cond. prob.)} \\
&= P[Y = j \mid X = i]\,p_X[i] && \text{(definition of marginal PMF)}
\end{aligned}$$

In the product $p_{X,Y}[i, j] = P[Y = j \mid X = i]\,p_X[i]$, the marginal PMF was given earlier:

$$p_X[i] = \begin{cases} \alpha & i = 1 \\ 1 - \alpha & i = 2 \end{cases}$$

$P[Y = j \mid X = i]$ can be determined from the experimental description:

$$P[Y = j \mid X = i] = \binom{4}{j} p_i^{\,j} (1 - p_i)^{4-j}, \qquad j = 0, 1, 2, 3, 4$$

Note that this probability depends on the outcome X = i via $p_i$.
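The construction above can be sketched numerically. The snippet multiplies the binomial conditional PMF by the coin-choice PMF to get the joint PMF, then sums it over the event A (2 or more heads). The function names and the default of 4 tosses are illustrative assumptions; the values p1 = 1/4, p2 = 3/4 are the ones used in the numerical example of this section.

```python
from math import comb


def conditional_pmf(j, p, n=4):
    """Binomial P[Y = j | X = i] for a coin whose heads probability is p."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def joint_pmf(i, j, alpha, p1, p2, n=4):
    """p_{X,Y}[i, j] = P[Y = j | X = i] * p_X[i], with p_X[1] = alpha."""
    p = p1 if i == 1 else p2
    p_x = alpha if i == 1 else 1 - alpha
    return conditional_pmf(j, p, n) * p_x


def prob_two_or_more_heads(alpha, p1, p2, n=4):
    """P[A]: sum the joint PMF over both coins and j = 2, ..., n."""
    return sum(
        joint_pmf(i, j, alpha, p1, p2, n) for i in (1, 2) for j in range(2, n + 1)
    )


print(round(prob_two_or_more_heads(0.5, 0.25, 0.75), 4))    # 0.6055
print(round(prob_two_or_more_heads(1 / 8, 0.25, 0.75), 4))  # 0.8633
```

Lowering alpha makes the coin with the larger heads probability more likely to be chosen, which is why P[A] increases in the second call.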
For a given value of X = i, the probability has all the usual properties of a PMF:

$$0 \le P[Y = j \mid X = i] \le 1, \qquad \sum_{j=0}^{4} P[Y = j \mid X = i] = 1$$

Then $p_{Y|X}[j \mid i] = P[Y = j \mid X = i]$ is a conditional PMF.

[Figure: conditional PMFs $p_{Y|X}[j \mid i]$; panels labeled i = 1, p1 = 1/4 and i = 2, p2 = 1/2]

Now that we know $p_{Y|X}[j \mid i]$ and $p_X[i]$, we have

$$p_{X,Y}[i, j] = p_{Y|X}[j \mid i]\,p_X[i]$$

The joint PMF is then given by

$$p_{X,Y}[i, j] = \begin{cases}
\binom{4}{j} p_1^{\,j} (1 - p_1)^{4-j}\,\alpha & i = 1;\ j = 0, 1, 2, 3, 4 \\
\binom{4}{j} p_2^{\,j} (1 - p_2)^{4-j}\,(1 - \alpha) & i = 2;\ j = 0, 1, 2, 3, 4
\end{cases}$$

Finally, the desired probability of the event A is

$$\begin{aligned}
P[A] &= \sum_{j=2}^{4} p_{X,Y}[1, j] + \sum_{j=2}^{4} p_{X,Y}[2, j] \\
&= \sum_{j=2}^{4} \binom{4}{j} p_1^{\,j} (1 - p_1)^{4-j}\,\alpha + \sum_{j=2}^{4} \binom{4}{j} p_2^{\,j} (1 - p_2)^{4-j}\,(1 - \alpha)
\end{aligned}$$

As an example, if p1 = 1/4 and p2 = 3/4, then for α = 1/2 we have P[A] = 0.6055, but for α = 1/8 we have P[A] = 0.8633. Why?

The conditional PMF can be expressed as

$$p_{Y|X}[j \mid i] = \frac{p_{X,Y}[i, j]}{p_X[i]}$$

To make the connection with conditional probability, let $A_j = \{Y = j\}$ and $B_i = \{X = i\}$. Then

$$p_{Y|X}[j \mid i] = P[Y = j \mid X = i] = \frac{P[X = i, Y = j]}{P[X = i]} = \frac{P[A_j \cap B_i]}{P[B_i]} = P[A_j \mid B_i]$$

Hence, $p_{Y|X}[j \mid i]$ is a conditional probability for the events $A_j$ and $B_i$.

Joint, Conditional, and Marginal PMFs

The conditional PMF is defined as

$$p_{Y|X}[y_j \mid x_i] = \frac{p_{X,Y}[x_i, y_j]}{p_X[x_i]}$$

Each PMF in the family is a valid PMF when $x_i$ is considered to be a constant. In the previous example, $\{p_{Y|X}[j \mid 1],\ p_{Y|X}[j \mid 2]\}$ is a family of valid PMFs:

$$\sum_{j=-\infty}^{\infty} p_{Y|X}[j \mid 1] = 1, \qquad \sum_{j=-\infty}^{\infty} p_{Y|X}[j \mid 2] = 1$$

But in general it is not true that

$$\sum_{i=-\infty}^{\infty} p_{Y|X}[j \mid i] = 1$$

Example: Two Dice Toss

Two dice are tossed. All outcomes are equally likely. The numbers of dots are added together. What is the conditional PMF of the sum if it is known that the sum is even? Let Y be the sum and

$$X = \begin{cases} 0 & \text{if the sum is odd} \\ 1 & \text{if the sum is even} \end{cases}$$

We wish to determine $p_{Y|X}[j \mid 0]$ and $p_{Y|X}[j \mid 1]$ for all j. The sample space for Y is $S_Y = \{2, 3, \ldots, 12\}$.
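The two conditional PMFs asked for in the dice example can be obtained by brute-force enumeration of the 36 equally likely outcomes. The sketch below is only a check of the definition $p_{Y|X}[j \mid x] = p_{X,Y}[x, j] / p_X[x]$; the function name `conditional_pmf_of_sum` is an illustrative assumption, and exact fractions are used so the results can be compared with hand calculations.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def conditional_pmf_of_sum(parity):
    """Return p_{Y|X}[j | parity] as a dict; parity 1 = even sum, 0 = odd sum."""
    # N_j: number of (die 1, die 2) outcomes whose sum is j.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    # Joint PMF p_{X,Y}[parity, j]: nonzero only when j has the requested parity.
    joint = {
        j: Fraction(n, 36)
        for j, n in counts.items()
        if (j % 2 == 0) == (parity == 1)
    }
    p_x = sum(joint.values())  # marginal p_X[parity]
    return {j: p / p_x for j, p in sorted(joint.items())}


pmf_even = conditional_pmf_of_sum(1)
pmf_odd = conditional_pmf_of_sum(0)
print(pmf_even[6])  # 5/18
print(pmf_odd[7])   # 1/3
```

Each returned dictionary sums to one, while adding `pmf_even` and `pmf_odd` entry by entry does not give a PMF, in line with the remark that the normalization holds over j and not over the conditioning value.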
The conditional PMF given an even sum has as its numerator the probability of the sum being even and also equaling j:

$$p_{Y|X}[j \mid 1] = \frac{p_{X,Y}[1, j]}{p_X[1]}, \qquad j = 2, 4, 6, 8, 10, 12$$

or, since the sum is necessarily even for these values of j,

$$p_{Y|X}[j \mid 1] = \frac{p_Y[j]}{p_X[1]}, \qquad j = 2, 4, 6, 8, 10, 12$$

$$p_{Y|X}[j \mid 1] = \frac{N_j\,(1/36)}{1/2} = \frac{1}{18} N_j$$

where $N_j$ is the number of outcomes, among the 36 equally likely outcomes of the two dice, for which the sum is j.

$$p_{Y|X}[j \mid 1] = \begin{cases}
1/18 & j = 2 \\
3/18 & j = 4 \\
5/18 & j = 6 \\
5/18 & j = 8 \\
3/18 & j = 10 \\
1/18 & j = 12
\end{cases}
\qquad
p_{Y|X}[j \mid 0] = \begin{cases}
2/18 & j = 3 \\
4/18 & j = 5 \\
6/18 & j = 7 \\
4/18 & j = 9 \\
2/18 & j = 11
\end{cases}$$

Note that

$$\sum_j p_{Y|X}[j \mid 1] = 1 \qquad \text{but} \qquad p_{Y|X}[j \mid 0] \ne 1 - p_{Y|X}[j \mid 1]$$

Properties of PMF

Property 1. Joint PMF yields conditional PMFs

If the joint PMF $p_{X,Y}[x_i, y_j]$ is known, then the conditional PMFs are

$$p_{Y|X}[y_j \mid x_i] = \frac{p_{X,Y}[x_i, y_j]}{\sum_j p_{X,Y}[x_i, y_j]}, \qquad \text{where } \sum_j p_{X,Y}[x_i, y_j] = p_X[x_i]$$

$$p_{X|Y}[x_i \mid y_j] = \frac{p_{X,Y}[x_i, y_j]}{\sum_i p_{X,Y}[x_i, y_j]}, \qquad \text{where } \sum_i p_{X,Y}[x_i, y_j] = p_Y[y_j]$$

Hence, the conditional PMF is the joint PMF with $x_i$ fixed and then normalized so that it sums to one.

Property 2. Conditional PMFs are related

$$p_{X|Y}[x_i \mid y_j] = \frac{p_{Y|X}[y_j \mid x_i]\,p_X[x_i]}{p_Y[y_j]}$$

Proof: By definition,

$$p_{X|Y}[x_i \mid y_j] = \frac{p_{Y,X}[y_j, x_i]}{p_Y[y_j]}$$

but

$$p_{Y,X}[y_j, x_i] = P[Y = y_j, X = x_i] = P[X = x_i, Y = y_j] = p_{X,Y}[x_i, y_j]$$

therefore

$$p_{X|Y}[x_i \mid y_j] = \frac{p_{X,Y}[x_i, y_j]}{p_Y[y_j]} \qquad (*)$$

Using $p_{X,Y}[x_i, y_j] = p_{Y|X}[y_j \mid x_i]\,p_X[x_i]$ yields the desired result.

Property 3. Conditional PMF is expressible using Bayes' rule

$$p_{Y|X}[y_j \mid x_i] = \frac{p_{X|Y}[x_i \mid y_j]\,p_Y[y_j]}{\sum_j p_{X|Y}[x_i \mid y_j]\,p_Y[y_j]}$$

Proof: From Property 1,

$$p_{Y|X}[y_j \mid x_i] = \frac{p_{X,Y}[x_i, y_j]}{\sum_j p_{X,Y}[x_i, y_j]} \qquad (**)$$

and using (*) we have

$$p_{X,Y}[x_i, y_j] = p_{X|Y}[x_i \mid y_j]\,p_Y[y_j]$$

Substituting this into (**) yields the desired result.

Property 4. Conditional PMF and its corresponding marginal PMF yield the joint PMF

$$p_{X,Y}[x_i, y_j] = p_{Y|X}[y_j \mid x_i]\,p_X[x_i]$$

$$p_{X,Y}[x_i, y_j] = p_{X|Y}[x_i \mid y_j]\,p_Y[y_j]$$

Property 5.
Conditional PMF and its corresponding marginal PMF yield the other marginal PMF

$$p_Y[y_j] = \sum_i p_{Y|X}[y_j \mid x_i]\,p_X[x_i]$$

This is the law of total probability.

Conditional PMF relationships

X and Y can also be interchanged for similar results.

Simplifying Probability Calculations Using Conditioning

Conditional PMFs can be used to simplify probability calculations. For example, find the PMF of Z = X + Y when X and Y are independent.

If X were known, say X = i, we could find the PMF of Z because Z = i + Y. This is a transformation of one discrete RV, Y, to another discrete RV, Z:

$$p_{Z|X}[j \mid i] = p_{Y|X}[j - i \mid i] \qquad (*)$$

To find the unconditional PMF of Z we use Property 5:

$$p_Z[j] = \sum_{i=-\infty}^{\infty} p_{Z|X}[j \mid i]\,p_X[i]$$

Using (*),

$$p_Z[j] = \sum_{i=-\infty}^{\infty} p_{Y|X}[j - i \mid i]\,p_X[i]$$

If X and Y are independent, so that $p_{Y|X} = p_Y$, then

$$p_Z[j] = \sum_{i=-\infty}^{\infty} p_Y[j - i]\,p_X[i]$$

Mean of the Conditional PMF

We can determine attributes such as the expected value of a RV Y when it is known that $X = x_i$:

$$E_{Y|X}[Y \mid x_i] = \sum_j y_j\,p_{Y|X}[y_j \mid x_i]$$

The mean of the conditional PMF is a constant when $x_i$ is fixed. In general, the mean is a function of $x_i$.

Example: Two dice are tossed and the quantity of interest is the sum, given that the sum is even or odd. The means of the conditional PMFs are

$$E_{Y|X}[Y \mid 1] = 2\left(\tfrac{1}{18}\right) + 4\left(\tfrac{3}{18}\right) + 6\left(\tfrac{5}{18}\right) + 8\left(\tfrac{5}{18}\right) + 10\left(\tfrac{3}{18}\right) + 12\left(\tfrac{1}{18}\right) = 7$$

$$E_{Y|X}[Y \mid 0] = 3\left(\tfrac{2}{18}\right) + 5\left(\tfrac{4}{18}\right) + 7\left(\tfrac{6}{18}\right) + 9\left(\tfrac{4}{18}\right) + 11\left(\tfrac{2}{18}\right) = 7$$

Here the two conditional means happen to coincide; usually they are not equal.

The conditional variance is defined analogously:

$$\operatorname{var}[Y \mid x_i] = \sum_j \left(y_j - E_{Y|X}[Y \mid x_i]\right)^2 p_{Y|X}[y_j \mid x_i]$$

Example: Toss one of two dice

Two dice are given: D1 = {1,2,3,4,5,6} and D2 = {2,3,2,3,2,3}. A die is selected at random and tossed. What is the expected number of dots observed for the tossed die? We can view this problem as a conditional one by letting

$$X = \begin{cases} 1 & \text{if die 1 is chosen} \\ 2 & \text{if die 2 is chosen} \end{cases}$$

and letting Y be the number of dots observed. Thus, we wish to determine $E_{Y|X}[Y \mid 1]$ and $E_{Y|X}[Y \mid 2]$.
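As a quick numerical check, the conditional mean and conditional variance formulas can be applied to the two dice of this example. This is a minimal sketch: the names `DICE`, `conditional_pmf`, `conditional_mean`, and `conditional_variance` are illustrative, and the only assumption is that the six faces of each die are equally likely.

```python
from fractions import Fraction

# Faces of the two dice as given in the example.
DICE = {1: [1, 2, 3, 4, 5, 6], 2: [2, 3, 2, 3, 2, 3]}


def conditional_pmf(i):
    """p_{Y|X}[j | i], assuming each of the six faces of die i is equally likely."""
    faces = DICE[i]
    pmf = {}
    for face in faces:
        pmf[face] = pmf.get(face, Fraction(0)) + Fraction(1, len(faces))
    return pmf


def conditional_mean(i):
    """E[Y | X = i] = sum_j j * p_{Y|X}[j | i]."""
    return sum(j * p for j, p in conditional_pmf(i).items())


def conditional_variance(i):
    """var[Y | X = i] = sum_j (j - E[Y | X = i])^2 * p_{Y|X}[j | i]."""
    mean = conditional_mean(i)
    return sum((j - mean) ** 2 * p for j, p in conditional_pmf(i).items())


for die in DICE:
    print(die, conditional_mean(die), conditional_variance(die))
# 1 7/2 35/12
# 2 5/2 1/4
```

Both the mean and the variance change with the conditioning value, which illustrates that they are functions of $x_i$.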
$$p_{Y|X}[j \mid 1] = \frac{1}{6}, \qquad j = 1, 2, 3, 4, 5, 6$$

$$E_{Y|X}[Y \mid 1] = \sum_{j=1}^{6} j\,p_{Y|X}[j \mid 1] = \frac{7}{2}$$

$$p_{Y|X}[j \mid 2] = \frac{1}{2}, \qquad j = 2, 3$$

$$E_{Y|X}[Y \mid 2] = \sum_{j=2}^{3} j\,p_{Y|X}[j \mid 2] = \frac{5}{2}$$

[Figure: simulated outcomes when die 1 is chosen (sample mean = 3.88, true mean = 3.5) and when die 2 is chosen (sample mean = 2.58, true mean = 2.5)]

What is the unconditional mean (the mean of Y)? The unconditional mean is the expected number of dots observed without first conditioning on which die was chosen. Intuitively,

$$E_Y[Y] = \frac{1}{2} E_{Y|X}[Y \mid 1] + \frac{1}{2} E_{Y|X}[Y \mid 2]$$

Unconditional mean

Let us determine $E_Y[Y]$ for the following experiment:
1. Choose die 1 or die 2, each with probability 1/2.
2. Toss the chosen die.
3. Count the number of dots on the face of the tossed die; this is the RV Y.

To determine the theoretical mean of Y we need $p_Y[j]$:

$$p_Y[j] = \sum_i p_{X,Y}[i, j], \qquad p_{X,Y}[i, j] = p_{Y|X}[j \mid i]\,p_X[i]$$

$$p_{X,Y}[i, j] = p_{Y|X}[j \mid i]\,p_X[i] = \begin{cases}
1/12 & i = 1;\ j = 1, 2, 3, 4, 5, 6 \\
1/4 & i = 2;\ j = 2, 3
\end{cases}$$

$$p_Y[j] = \sum_i p_{X,Y}[i, j] = \begin{cases}
p_{X,Y}[1, j] = \dfrac{1}{12} & j = 1, 4, 5, 6 \\[2mm]
p_{X,Y}[1, j] + p_{X,Y}[2, j] = \dfrac{1}{12} + \dfrac{1}{4} = \dfrac{1}{3} & j = 2, 3
\end{cases}$$

Thus the unconditional mean becomes

$$E_Y[Y] = \sum_{j=1}^{6} j\,p_Y[j] = 1\left(\tfrac{1}{12}\right) + 2\left(\tfrac{1}{3}\right) + 3\left(\tfrac{1}{3}\right) + 4\left(\tfrac{1}{12}\right) + 5\left(\tfrac{1}{12}\right) + 6\left(\tfrac{1}{12}\right) = 3$$

The other way to find the unconditional mean is

$$E_Y[Y] = E_{Y|X}[Y \mid 1]\,p_X[1] + E_{Y|X}[Y \mid 2]\,p_X[2] = \frac{7}{2} \cdot \frac{1}{2} + \frac{5}{2} \cdot \frac{1}{2} = 3$$

That is, the unconditional mean is the average of the conditional means, weighted by $p_X$.

Unconditional mean (Proof)

In general, the unconditional mean is found as

$$E_Y[Y] = \sum_i E_{Y|X}[Y \mid x_i]\,p_X[x_i]$$

Proof:

$$\begin{aligned}
\sum_i E_{Y|X}[Y \mid x_i]\,p_X[x_i] &= \sum_i \Big( \sum_j y_j\,p_{Y|X}[y_j \mid x_i] \Big) p_X[x_i] && \text{(def. of cond. mean)} \\
&= \sum_i \sum_j y_j\,\frac{p_{X,Y}[x_i, y_j]}{p_X[x_i]}\,p_X[x_i] && \text{(def. of cond. PMF)} \\
&= \sum_j y_j \sum_i p_{X,Y}[x_i, y_j] \\
&= \sum_j y_j\,p_Y[y_j] && \text{(marginal PMF from joint PMF)} \\
&= E_Y[Y]
\end{aligned}$$

Modeling human learning

A child learns by attempting to pick up a toy, dropping it, and picking it up again after having learned something.
Each time the experiment, "attempting to pick up the toy", is repeated, the child learns something or, equivalently, narrows down the number of strategies. Many models of human learning employ a Bayesian framework. By using it we are able to discriminate the right strategy with more accuracy as we repeatedly perform an experiment and observe the output.

Modeling human learning: Example

Suppose we wish to "learn" whether a coin is fair (p = 1/2) or weighted (p ≠ 1/2). Our certainty about whether the coin is fair will increase as the number of trials increases. In the Bayesian model we assume that p is a RV. In reality, the coin has a fixed probability of heads, but it is unknown to us. Let the probability of heads be denoted by the RV Y and its values by $y_j$:

$$p_Y[y_j] = \frac{1}{M + 1} \qquad \text{for } y_j = 0, \frac{1}{M}, \frac{2}{M}, \ldots, \frac{M - 1}{M}, 1$$

This is the prior PMF; it summarizes our state of knowledge before the experiment is performed.

Let N be the number of coin tosses and let X denote the number of heads observed in the N tosses. X ~ bin(N, p), i.e., X is binomially distributed; however, the probability of heads Y is unknown. We can therefore only specify the PMF of X conditionally on $Y = y_j$. The conditional PMF of the number of heads X = i is

$$p_{X|Y}[i \mid y_j] = \binom{N}{i} y_j^{\,i} (1 - y_j)^{N-i}, \qquad i = 0, 1, \ldots, N$$

We are interested in the probability of heads, that is, in the PMF of Y after observing the outcomes of N coin tosses, $p_{Y|X}[y_j \mid i]$. It is called the posterior PMF, since it is determined after the experiment is performed.

The posterior PMF $p_{Y|X}[y_j \mid i]$ contains all the information about the probability of heads that results from our prior knowledge, summarized by $p_Y$, and our "data" knowledge, summarized by $p_{X|Y}$.
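Combining the uniform prior with the binomial conditional PMF through Bayes' rule (Property 3) gives the posterior over the grid of candidate heads probabilities. The following is a minimal sketch of that computation; the function name `posterior_pmf` and the grid size M = 10 are illustrative assumptions, and M is taken to be at least 2 so that the normalizing sum is nonzero.

```python
from math import comb


def posterior_pmf(i, N, M):
    """Posterior p_{Y|X}[y_j | i] on the grid y_j = j/M, j = 0, ..., M.

    i: observed number of heads, N: number of tosses,
    M: grid resolution (assumed M >= 2 here).
    """
    grid = [j / M for j in range(M + 1)]
    prior = 1 / (M + 1)  # uniform prior p_Y[y_j]
    # Numerator of Bayes' rule: p_{X|Y}[i | y_j] * p_Y[y_j] for every grid point.
    joint = [comb(N, i) * y**i * (1 - y) ** (N - i) * prior for y in grid]
    evidence = sum(joint)  # denominator: sum over all grid points
    return grid, [w / evidence for w in joint]


grid, post = posterior_pmf(i=4, N=10, M=10)
mode = grid[post.index(max(post))]
print(mode)  # 0.4
```

Because the binomial coefficient and the constant prior appear in both the numerator and the denominator, they cancel, so the posterior depends on the data only through $y_j^{\,i}(1 - y_j)^{N-i}$.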
The posterior PMF is given by Bayes' rule with $x_i = i$ as

$$p_{Y|X}[y_j \mid i] = \frac{p_{X|Y}[i \mid y_j]\,p_Y[y_j]}{\sum_j p_{X|Y}[i \mid y_j]\,p_Y[y_j]}$$

$$p_{Y|X}[y_j \mid i] = \frac{\binom{N}{i} y_j^{\,i} (1 - y_j)^{N-i}\,\frac{1}{M+1}}{\sum_{j=0}^{M} \binom{N}{i} y_j^{\,i} (1 - y_j)^{N-i}\,\frac{1}{M+1}} = \frac{y_j^{\,i} (1 - y_j)^{N-i}}{\sum_{j=0}^{M} y_j^{\,i} (1 - y_j)^{N-i}}$$

for $y_j = 0, 1/M, \ldots, 1$ and $i = 0, 1, \ldots, N$. The posterior $p_{Y|X}[y_j \mid i]$ depends on the observed number of heads i.

[Figure: posterior PMFs $p_{Y|X}[y_j \mid i]$. Panels labeled p = 0.5: N = 10, i = 4; N = 20, i = 11; N = 40, i = 19. Further panels: N = 10, i = 2; N = 20, i = 5; N = 40, i = 7.]

Problems

A fair coin is tossed. If it comes up heads, then X = 1, and if it comes up tails, then X = 0. Next, a point is selected at random from the area A if X = 1 and from the area B if X = 0, as shown.

[Figure: square containing the regions A and B, with axis marks at 1]

The area of the square is 4, and A and B both have areas of 3/2. If the point selected is in an upper quadrant, we set Y = 1, and if it is in a lower quadrant, we set Y = 0. Find the conditional PMF $p_{Y|X}[j \mid i]$ for all values of i and j. Next, compute P[Y = 0].

Prove that

$$\operatorname{var}(Y \mid x_i) = E_{Y|X}[Y^2 \mid x_i] - \left(E_{Y|X}[Y \mid x_i]\right)^2$$

If X and Y are independent RVs, find the PMF of Z = |X - Y|. Assume that $S_X = \{0, 1, \ldots\}$ and $S_Y = \{0, 1, \ldots\}$. Hint: The answer is

$$p_Z[k] = \begin{cases}
\displaystyle\sum_{i=0}^{\infty} p_X[i]\,p_Y[i] & k = 0 \\[3mm]
\displaystyle\sum_{i=0}^{\infty} \left( p_Y[i]\,p_X[i + k] + p_X[i]\,p_Y[i + k] \right) & k = 1, 2, \ldots
\end{cases}$$

As an intermediate step, show that

$$p_{Z|X}[k \mid i] = \begin{cases}
p_Y[i] & k = 0 \\
p_Y[i + k] + p_Y[i - k] & k \ne 0
\end{cases}$$