Math-UA.233: Theory of Probability
Lecture 21
Tim Austin

From last time: covariance

Let X and Y be RVs, and let µ_X = E[X] and µ_Y = E[Y]. The covariance of X and Y is (this is notation, i.e. a definition)

  Cov(X, Y) := E[(X − µ_X)(Y − µ_Y)].

First properties:
1. Symmetry: Cov(X, Y) = Cov(Y, X).
2. Generalizes variance: Cov(X, X) = Var(X).
3. Alternative formula: Cov(X, Y) = E[XY] − E[X]E[Y].
4. If X and Y are independent, then Cov(X, Y) = 0.

Here's what makes covariance really useful: how it transforms under sums and scalar multiples.

Proposition (Ross Prop 7.4.2)
1. For any RVs X and Y and any real value a, we have

  Cov(aX, Y) = Cov(X, aY) = a Cov(X, Y)

(sanity check: Var(aX) = Cov(aX, aX) = a² Cov(X, X) = a² Var(X)).
2. For any RVs X_1, ..., X_n and Y_1, ..., Y_m, we have

  Cov(∑_{i=1}^n X_i, ∑_{j=1}^m Y_j) = ∑_{i=1}^n ∑_{j=1}^m Cov(X_i, Y_j)

(so covariance behaves just like multiplying out a product of sums of numbers).

In particular, taking m = n and Y_i = X_i in part 2 above, we get (Ross eqn (7.4.1))

  Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i) + 2 ∑_{1≤i<j≤n} Cov(X_i, X_j),

where the first sum collects the "diagonal terms" and the second collects the "cross terms". If X_1, ..., X_n are independent, then Cov(X_i, X_j) = 0 whenever i ≠ j, so in this case we're left with

  Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i).

Example (Ross E.g. 7.4b)
If X is binom(n, p), then Var(X) = np(1 − p).
IDEA: Recall that X = X_1 + ··· + X_n, a sum of independent indicator variables. Each indicator has variance p(1 − p), and independence kills the cross terms.

Example (Special case of Ross E.g. 7.4f)
An experiment has three possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3. Find Cov(X_1, X_2).

Now some continuous examples...

Example (Ross E.g. 7.2ℓ)
A flea starts at the origin in the plane. Each second it jumps one inch. For each jump it chooses the direction uniformly at random, independently of its previous jumps. Find the expected square of the distance from the origin after n jumps.

Example (Normal RVs)
If X_1, ..., X_n are independent normal RVs with respective parameters µ_i and σ_i² for i = 1, ..., n, then

  E[∑_{i=1}^n X_i] = ∑_{i=1}^n µ_i  and  Var(∑_{i=1}^n X_i) = ∑_{i=1}^n σ_i².

This checks out with our previous calculation that ∑_{i=1}^n X_i is still normal with those parameters.

Conditioning with random variables

Our next topic is the use of conditional probability in connection with random variables, as well as single events.

As for events, this can come up in various ways. Sometimes we know the joint distribution of RVs X and Y, and want to compute 'updated' probabilities for the behaviour of X in light of some information about Y. Sometimes there is a natural modeling choice for the conditional probabilities, and we must reconstruct the unconditioned probabilities from them. The tools we introduce below can give a convenient way to do a calculation, even when we start and end with 'unconditional' quantities.

Conditional distributions: discrete case (Ross Sec 6.4)

Suppose that X and Y are discrete RVs with joint PMF p. We have various events defined in terms of X, and others defined in terms of Y. Sometimes we need the conditional probabilities of "X-events" given "Y-events", or vice versa. This requires no new ideas beyond conditional probability, but some new notation can be convenient.

The conditional PMF of X given Y is the function (again, notation)

  p_{X|Y}(x|y) := P{X = x | Y = y} = p(x, y) / p_Y(y).

It is defined for any x, and any y for which p_Y(y) > 0 (we simply don't use it for other choices of y).
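Before moving on to examples of conditioning, here is a minimal Monte Carlo sanity check of the variance results from the first half of the lecture. This is a sketch, not anything from Ross: the function names and the values of n, p, and the trial count are my own choices, and it uses only the Python standard library.

```python
import random

# Sketch: X = X_1 + ... + X_n, a sum of n independent Bernoulli(p) indicators,
# is binom(n, p). The variance-of-a-sum formula says the cross covariances all
# vanish by independence, so Var(X) = n * Var(X_1) = n p (1 - p).

def sample_binomial(n, p):
    """Simulate binom(n, p) as a sum of n independent indicator variables."""
    return sum(1 for _ in range(n) if random.random() < p)

n, p, trials = 10, 0.3, 200_000
xs = [sample_binomial(n, p) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials

print(f"empirical Var(X) ≈ {var:.3f}")   # should be close to the line below
print(f"n p (1 - p)      = {n * p * (1 - p):.3f}")
```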
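The definition of p_{X|Y} is easy to render computationally. Here is a short sketch (the dictionary encoding and function names are my own, not Ross's notation) using the joint PMF of Example 6.4a below, so you can check your hand calculation against its output.

```python
# Sketch: implement p_{X|Y}(x|y) = p(x, y) / p_Y(y) directly, with the joint PMF
# stored as a dict mapping (x, y) pairs to probabilities (Example 6.4a's numbers).

joint = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}

def marginal_Y(y):
    """p_Y(y) = sum over x of p(x, y)."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def cond_pmf_X_given_Y(x, y):
    """p_{X|Y}(x|y); defined only when p_Y(y) > 0."""
    pY = marginal_Y(y)
    if pY == 0:
        raise ValueError("conditional PMF undefined: p_Y(y) = 0")
    return joint.get((x, y), 0.0) / pY

for x in (0, 1):
    print(f"p_{{X|Y}}({x}|1) = {cond_pmf_X_given_Y(x, 1):.2f}")
# prints 0.40 and 0.60: given Y = 1, X takes the value 0 with probability 2/5
# and the value 1 with probability 3/5.
```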
Example (Ross E.g. 6.4a)
Suppose that X and Y have joint PMF given by

  p(0, 0) = 0.4,  p(0, 1) = 0.2,  p(1, 0) = 0.1,  p(1, 1) = 0.3.

Calculate the conditional PMF of X given that Y = 1.

Example (Ross E.g. 6.4b)
Let X and Y be independent Poi(λ) and Poi(µ) RVs. Calculate the conditional PMF of X given that X + Y = n.

Conditioning can also involve three or more RVs. This requires care, but still no new ideas.

Example (Special case of Ross E.g. 6.4c)
An experiment has three possible outcomes with respective probabilities p_1, p_2, p_3. After n independent trials are performed, we write X_i for the number of times outcome i occurred, i = 1, 2, 3. Find the conditional distribution (that is, the conditional joint PMF) of (X_1, X_2) given that X_3 = m.

WHAT THE QUESTION IS ASKING FOR:

  p_{X_1,X_2|X_3}(k, ℓ | m) = P{X_1 = k, X_2 = ℓ | X_3 = m}

for all possible values of k, ℓ and m.

Another fact to note: if X and Y are independent, then

  p_{X|Y}(x|y) = p_X(x) p_Y(y) / p_Y(y) = p_X(x).

So this is just like for events: if X and Y are independent, then knowing the value taken by Y doesn't influence the probability distribution of X.

Conditional distributions: continuous case (Ross Sec 6.5)

When continuous RVs are involved, we do need a new idea.

THE PROBLEM: Suppose that Y is a continuous RV and E is any event. Then P{Y = y} = 0 for any real value y, so we cannot define the conditional probability P(E | {Y = y}) using the usual formula.

However, we can sometimes make sense of this in another way. Instead of assuming that Y takes the value y exactly, let us condition on Y taking a value in a tiny window around y:

  P(E | {y ≤ Y ≤ y + dy}) = P(E ∩ {y ≤ Y ≤ y + dy}) / P{y ≤ Y ≤ y + dy}
                          ≈ P(E ∩ {y ≤ Y ≤ y + dy}) / (f_Y(y) dy),

where we use the infinitesimal interpretation of f_Y(y). This makes sense provided f_Y(y) > 0. Now let dy → 0. The conditional probability of E given that Y = y is defined to be

  P(E | Y = y) := lim_{dy→0} P(E ∩ {y ≤ Y ≤ y + dy}) / (f_Y(y) dy).

Theorem (Surprisingly hard)
The above limit always exists, except maybe for a "negligible" set of possible values y, which you can safely ignore.

"Negligible" sets are another idea from measure theory, so we won't describe them here. Ross doesn't mention this result at all. In this course you can always just assume that the limit exists.

Example (Ross E.g. 6.5b)
Suppose that X, Y have joint PDF

  f(x, y) = (e^{−x/y} e^{−y}) / y  for 0 < x, y < ∞,  and 0 otherwise.

Find P{X > 1 | Y = y} for y > 0.

In this example, E is defined in terms of another RV, jointly continuous with Y. In fact there is a useful general tool for this situation.

Definition (Ross p250)
Suppose X and Y are jointly continuous with joint PDF f. The conditional PDF of X given Y is the function

  f_{X|Y}(x|y) := f(x, y) / f_Y(y).

It's defined for all real values x and all y such that f_Y(y) > 0.

BEWARE: This is not a conditional probability, since it's a ratio of densities, not of probability values.

INTUITIVE INTERPRETATION: Instead of conditioning {X = x} on {Y = y}, let's allow very small windows around both x and y. We find:

  P{x ≤ X ≤ x + dx | y ≤ Y ≤ y + dy} = P{x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy} / P{y ≤ Y ≤ y + dy}
                                      ≈ f(x, y) dx dy / (f_Y(y) dy)
                                      = f_{X|Y}(x|y) dx.

So f_{X|Y}(x|y) is a PDF which describes the probability distribution of X, given that Y lands in a very small window around y.

Also, just as in the discrete case, if X and Y are independent then

  f_{X|Y}(x|y) = f_X(x).

So knowing the value taken by Y (to arbitrary accuracy) doesn't influence the probability distribution of X.
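To see the small-window idea in action numerically, here is a simulation sketch with a distribution of my own choosing (not one of Ross's examples): a standard bivariate normal pair with correlation rho, for which the conditional distribution of X given Y = y is known to be normal with mean rho·y and variance 1 − rho². The window width dy and trial count below are arbitrary choices.

```python
import math
import random

# Sketch: condition on Y landing in a tiny window [y, y + dy] and look at the
# empirical distribution of X among the retained samples.

rho, y, dy = 0.6, 1.0, 0.02
trials = 1_000_000

window = []
for _ in range(trials):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = z1                                        # X is standard normal
    yy = rho * z1 + math.sqrt(1 - rho * rho) * z2  # makes Corr(X, Y) = rho
    if y <= yy <= y + dy:                          # keep only the tiny window
        window.append(x)

m = sum(window) / len(window)
v = sum((s - m) ** 2 for s in window) / len(window)
print(f"window-conditioned mean ≈ {m:.3f}   (theory: rho*y   = {rho * y:.3f})")
print(f"window-conditioned var  ≈ {v:.3f}   (theory: 1-rho^2 = {1 - rho**2:.3f})")
```

Shrinking dy (and raising the trial count to compensate) moves the empirical answers toward the dy → 0 limit that the definition describes.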
The conditional PDF enables us to compute probabilities of "X-events" given that Y = y, without taking a limit:

Proposition (See Ross p251)
Let a < b, where a = −∞ or b = ∞ are allowed. Let y be a real value such that f_Y(y) > 0. Then

  P{a ≤ X ≤ b | Y = y} = ∫_a^b f_{X|Y}(x|y) dx.

MORAL: f_{X|Y}(x|y) is a new PDF. It describes the probabilities of "X-events" given that Y = y. It has all the other properties that we've already seen for PDFs.
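Continuing the bivariate-normal sketch above (again my own illustration, not Ross's), the proposition can be checked directly: integrate f_{X|Y}(x|y) = f(x, y)/f_Y(y) over [a, b] with a Riemann sum and compare with the exact answer, which here is a normal CDF since X given Y = y is normal(rho·y, 1 − rho²).

```python
import math

rho, y, a, b = 0.6, 1.0, 0.0, 2.0

def f_joint(x, yy):
    """Standard bivariate normal density with correlation rho."""
    det = 1 - rho * rho
    q = (x * x - 2 * rho * x * yy + yy * yy) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

def f_Y(yy):
    """Standard normal marginal density of Y."""
    return math.exp(-yy * yy / 2) / math.sqrt(2 * math.pi)

# Riemann sum of the conditional PDF f_{X|Y}(x|y) over [a, b].
N = 10_000
dx = (b - a) / N
prob = sum(f_joint(a + (i + 0.5) * dx, y) for i in range(N)) * dx / f_Y(y)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

s = math.sqrt(1 - rho * rho)
exact = Phi((b - rho * y) / s) - Phi((a - rho * y) / s)
print(f"integral of f_X|Y over [{a}, {b}] ≈ {prob:.4f}   exact: {exact:.4f}")
```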