PROBABILITY AND STATISTICS FOR ENGINEERING
Conditional Density Functions and Conditional Expected Values
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology

Conditional probability density functions are useful for updating the information about an event based on knowledge about some other related event. We shall analyze the situation where the related event happens to be a random variable that is dependent on the one of interest.

Recall that the distribution function of $X$ given an event $B$ is
$$F_X(x \mid B) = P\{X(\xi) \le x \mid B\} = \frac{P\{X(\xi) \le x,\; B\}}{P(B)}.$$

Suppose we let $B = \{y_1 < Y(\xi) \le y_2\}$. Then
$$F_X(x \mid y_1 < Y \le y_2) = \frac{P\{X(\xi) \le x,\; y_1 < Y(\xi) \le y_2\}}{P\{y_1 < Y(\xi) \le y_2\}} = \frac{F_{XY}(x, y_2) - F_{XY}(x, y_1)}{F_Y(y_2) - F_Y(y_1)}.$$

This can be rewritten as
$$F_X(x \mid y_1 < Y \le y_2) = \frac{\int_{-\infty}^{x}\int_{y_1}^{y_2} f_{XY}(u, v)\, dv\, du}{\int_{y_1}^{y_2} f_Y(v)\, dv}.$$

To determine the limiting case $F_X(x \mid Y = y)$, let $y_1 = y$ and $y_2 = y + \Delta y$. This gives
$$F_X(x \mid y < Y \le y + \Delta y) = \frac{\int_{-\infty}^{x}\int_{y}^{y+\Delta y} f_{XY}(u, v)\, dv\, du}{\int_{y}^{y+\Delta y} f_Y(v)\, dv} \approx \frac{\left(\int_{-\infty}^{x} f_{XY}(u, y)\, du\right)\Delta y}{f_Y(y)\, \Delta y},$$
and hence in the limit
$$F_X(x \mid Y = y) = \lim_{\Delta y \to 0} F_X(x \mid y < Y \le y + \Delta y) = \frac{\int_{-\infty}^{x} f_{XY}(u, y)\, du}{f_Y(y)}.$$

Thus
$$F_{X|Y}(x \mid Y = y) = \frac{\int_{-\infty}^{x} f_{XY}(u, y)\, du}{f_Y(y)}$$
and
$$f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x, y)}{f_Y(y)}.$$
The subscript $X \mid Y$ is used as a reminder of the conditional nature.

We have
$$f_{X|Y}(x \mid Y = y) = \frac{f_{XY}(x, y)}{f_Y(y)} \ge 0$$
and
$$\int_{-\infty}^{\infty} f_{X|Y}(x \mid Y = y)\, dx = \frac{\int_{-\infty}^{\infty} f_{XY}(x, y)\, dx}{f_Y(y)} = \frac{f_Y(y)}{f_Y(y)} = 1.$$
Thus the formula represents a valid p.d.f. We shall refer to it as the conditional p.d.f. of the r.v. $X$ given $Y = y$. We may also write
$$f_{X|Y}(x \mid Y = y) = f_{X|Y}(x \mid y).$$

Thus we have
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}, \qquad \text{and similarly} \qquad f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}.$$

If the r.vs $X$ and $Y$ are independent, then $f_{XY}(x, y) = f_X(x) f_Y(y)$, and the previous formulas reduce to
$$f_{X|Y}(x \mid y) = f_X(x), \qquad f_{Y|X}(y \mid x) = f_Y(y).$$
This makes sense, since if $X$ and $Y$ are independent r.vs, information about $Y$ shouldn't be of any help in updating our knowledge about $X$.

In the case of discrete-type r.vs, $f_{X|Y}(x \mid y) = f_{XY}(x, y)/f_Y(y)$ reduces to
$$P\{X = x_i \mid Y = y_j\} = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)}.$$

How to obtain conditional p.d.fs?

Example
Given
$$f_{XY}(x, y) = \begin{cases} k, & 0 < x < y < 1, \\ 0, & \text{otherwise}, \end{cases}$$
determine $f_{X|Y}(x \mid y)$ and $f_{Y|X}(y \mid x)$.

Solution
$$\int\!\!\int f_{XY}(x, y)\, dx\, dy = \int_0^1 \left(\int_0^y k\, dx\right) dy = \int_0^1 k y\, dy = \frac{k}{2} = 1 \;\Rightarrow\; k = 2.$$
Similarly
$$f_X(x) = \int f_{XY}(x, y)\, dy = \int_x^1 k\, dy = k(1 - x), \qquad 0 < x < 1,$$
and
$$f_Y(y) = \int f_{XY}(x, y)\, dx = \int_0^y k\, dx = k y, \qquad 0 < y < 1.$$

Example - continued
Also
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{1}{y}, \qquad 0 < x < y < 1,$$
and
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{1}{1 - x}, \qquad 0 < x < y < 1.$$

From $f_{X|Y}(x \mid y) = f_{XY}(x, y)/f_Y(y)$ and $f_{Y|X}(y \mid x) = f_{XY}(x, y)/f_X(x)$, we also have
$$f_{XY}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x),$$
or
$$f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{f_X(x)}.$$
But
$$f_X(x) = \int f_{XY}(x, y)\, dy = \int f_{X|Y}(x \mid y)\, f_Y(y)\, dy.$$
So
$$f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{\int f_{X|Y}(x \mid y)\, f_Y(y)\, dy}.$$
This represents the p.d.f. version of Bayes' theorem. It is very significant in communication problems, where observations can be used to update our knowledge about unknown parameters.
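To make the Bayes formula concrete, here is a minimal numerical sketch in Python (an illustration added here, not part of the original slides): it plugs the conditional density $f_{X|Y}(x \mid y) = 1/y$ and the marginal $f_Y(y) = 2y$ from the example above into the p.d.f. version of Bayes' theorem and recovers $f_{Y|X}(y \mid x) = 1/(1 - x)$ by numerical integration. The conditioning point x0 and the grid size are arbitrary choices.

```python
import numpy as np

# Numerical check of the p.d.f. version of Bayes' theorem on the example
# above: f_{X|Y}(x|y) = 1/y and f_Y(y) = 2y on 0 < x < y < 1 should give
# f_{Y|X}(y|x) = 1/(1 - x).

def f_x_given_y(x, y):
    return np.where((0.0 < x) & (x < y) & (y < 1.0), 1.0 / y, 0.0)

def f_y(y):
    return np.where((0.0 < y) & (y < 1.0), 2.0 * y, 0.0)

x0 = 0.4                                   # arbitrary conditioning value of X
y = np.linspace(1e-6, 1.0 - 1e-6, 200_000)
num = f_x_given_y(x0, y) * f_y(y)          # numerator: f_{X|Y}(x0|y) f_Y(y)
den = num.sum() * (y[1] - y[0])            # denominator: its integral over y
posterior = num / den                      # f_{Y|X}(y | x0)

print(posterior[y > x0].mean())            # ~1.667 = 1/(1 - 0.4) on (x0, 1)
```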
Example
An unknown random phase $\theta$ is uniformly distributed in $(0, 2\pi)$, and $r = \theta + n$, where $n \sim N(0, \sigma^2)$. Determine $f(\theta \mid r)$.

Solution
Initially almost nothing about the r.v. $\theta$ is known, so we assume its a-priori p.d.f. to be uniform in the interval $(0, 2\pi)$. In $r = \theta + n$, we can think of $n$ as the noise contribution and $r$ as the observation. It is reasonable to assume that $\theta$ and $n$ are independent.

Example - continued
In that case
$$f(r \mid \theta) \sim N(\theta, \sigma^2),$$
since given $\theta$ (a constant), $r = \theta + n$ behaves like $n$ shifted by $\theta$. This gives the a-posteriori p.d.f. of $\theta$ given $r$:
$$f(\theta \mid r) = \frac{f(r \mid \theta)\, f(\theta)}{\int_0^{2\pi} f(r \mid \theta)\, f(\theta)\, d\theta} = \frac{e^{-(r-\theta)^2/2\sigma^2}}{\int_0^{2\pi} e^{-(r-\theta)^2/2\sigma^2}\, d\theta} = \phi(r)\, e^{-(r-\theta)^2/2\sigma^2}, \qquad 0 \le \theta \le 2\pi,$$
where
$$\phi(r) = \frac{1}{\int_0^{2\pi} e^{-(r-\theta)^2/2\sigma^2}\, d\theta}.$$

Example - continued
The knowledge about the observation $r$ is reflected in the a-posteriori p.d.f. of $\theta$. It is no longer flat, as the a-priori p.d.f. was; it shows higher probabilities in the neighborhood of $r$.

[Figure: (a) the a-priori p.d.f. $f_\theta(\theta) = 1/2\pi$, flat on $(0, 2\pi)$; (b) the a-posteriori p.d.f. $f_{\theta|r}(\theta \mid r)$, peaked near $r$.]

Conditional Mean
The conditional mean is defined using conditional p.d.fs:
$$E\{g(X) \mid B\} = \int_{-\infty}^{\infty} g(x)\, f_X(x \mid B)\, dx.$$
Using a limiting argument,
$$\mu_{X|Y} = E\{X \mid Y = y\} = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\, dx.$$
This is the conditional mean of $X$ given $Y = y$. Note that $E(X \mid Y = y)$ will be a function of $y$. Also
$$\mu_{Y|X} = E\{Y \mid X = x\} = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy.$$

Conditional Variance
In a similar manner, the conditional variance of $X$ given $Y = y$ is given by
$$\mathrm{Var}(X \mid Y) = \sigma^2_{X|Y} = E\{X^2 \mid Y = y\} - \big(E\{X \mid Y = y\}\big)^2 = E\big\{(X - \mu_{X|Y})^2 \mid Y = y\big\}.$$

Example
Let
$$f_{XY}(x, y) = \begin{cases} 1, & 0 < |y| < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$
Determine $E(X \mid Y)$ and $E(Y \mid X)$.

Solution
Here $f_{XY}(x, y) = 1$ in the shaded triangular region $0 < |y| < x < 1$, and zero elsewhere. So
$$f_X(x) = \int f_{XY}(x, y)\, dy = \int_{-x}^{x} 1\, dy = 2x, \qquad 0 < x < 1,$$
$$f_Y(y) = \int f_{XY}(x, y)\, dx = \int_{|y|}^{1} 1\, dx = 1 - |y|, \qquad |y| < 1,$$
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{1}{1 - |y|}, \qquad 0 < |y| < x < 1,$$
$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)} = \frac{1}{2x}, \qquad 0 < |y| < x < 1.$$

Example - continued
Hence
$$E(X \mid Y) = \int x\, f_{X|Y}(x \mid y)\, dx = \frac{1}{1 - |y|}\int_{|y|}^{1} x\, dx = \frac{1}{1 - |y|}\cdot\frac{x^2}{2}\bigg|_{|y|}^{1} = \frac{1 - |y|^2}{2(1 - |y|)} = \frac{1 + |y|}{2}, \qquad |y| < 1,$$
and
$$E(Y \mid X) = \int y\, f_{Y|X}(y \mid x)\, dy = \frac{1}{2x}\int_{-x}^{x} y\, dy = 0, \qquad 0 < x < 1.$$

As a generalization of the conditional mean formulas,
$$E\{g(X) \mid Y = y\} = \int g(x)\, f_{X|Y}(x \mid y)\, dx.$$

Example - continued
Also,
$$E\{g(X)\} = \int g(x)\, f_X(x)\, dx = \int\!\!\int g(x)\, f_{XY}(x, y)\, dx\, dy = \int \underbrace{\left(\int g(x)\, f_{X|Y}(x \mid y)\, dx\right)}_{E\{g(X) \mid Y = y\}} f_Y(y)\, dy = E\big\{E\{g(X) \mid Y = y\}\big\}.$$
Here the inner expectation is with respect to $X$ and the outer expectation is with respect to $Y$.

Example - continued
Letting $g(X) = X$ in this equation, we get
$$E(X) = E\big\{E(X \mid Y = y)\big\},$$
where the inner expectation on the right side is with respect to $X$ and the outer one is with respect to $Y$. Similarly, we have
$$E(Y) = E\big\{E(Y \mid X = x)\big\}.$$
For the variance, the corresponding decomposition is the law of total variance,
$$\mathrm{Var}(X) = E\big\{\mathrm{Var}(X \mid Y)\big\} + \mathrm{Var}\big(E\{X \mid Y\}\big).$$

Estimation with Conditional Mean
The conditional mean turns out to be an important concept in estimation and prediction theory. What is the best predicted value of $Y$ given that $X = x$? It turns out that if "best" is meant in the sense of minimizing the mean square error between $Y$ and its estimate $\hat{Y}$, then the conditional mean of $Y$ given $X = x$, i.e., $E(Y \mid X = x)$, is the best estimate.
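The following Monte Carlo sketch (an illustration added here, not from the slides) revisits the example above with $f_{XY}(x, y) = 1$ on $0 < |y| < x < 1$: predicting $X$ from $Y$ by the conditional mean $E(X \mid Y) = (1 + |Y|)/2$ should yield a smaller mean square error than any other predictor, e.g. the best constant guess $E(X) = 2/3$. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample uniformly from the region 0 < |y| < x < 1 by rejection
# (the region has unit area, so f_XY = 1 there).
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(-1.0, 1.0, n)
mask = np.abs(y) < x
x, y = x[mask], y[mask]

x_hat = (1.0 + np.abs(y)) / 2.0            # conditional-mean predictor
mse_cond = np.mean((x - x_hat) ** 2)       # ~1/24
mse_const = np.mean((x - 2.0 / 3.0) ** 2)  # best constant: ~1/18

print(mse_cond, mse_const)                 # mse_cond < mse_const
```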
Example: Poisson Sum of Bernoulli Random Variables
Let $X_i$, $i = 1, 2, 3, \ldots$, represent independent, identically distributed Bernoulli random variables with
$$P(X_i = 1) = p, \qquad P(X_i = 0) = 1 - p = q,$$
and let $N$ be a Poisson random variable with parameter $\lambda$ that is independent of all the $X_i$. Consider the random variables
$$Y = \sum_{i=1}^{N} X_i, \qquad Z = N - Y.$$
Show that $Y$ and $Z$ are independent Poisson random variables.

Solution
To determine the joint probability mass function of $Y$ and $Z$, consider
$$P(Y = m, Z = n) = P(Y = m, N - Y = n) = P(Y = m, N = m + n) = P(Y = m \mid N = m + n)\, P(N = m + n) = P\Big(\sum_{i=1}^{m+n} X_i = m\Big)\, P(N = m + n).$$
Note that $\sum_{i=1}^{m+n} X_i \sim B(m + n, p)$ and the $X_i$s are independent of $N$, so
$$P(Y = m, Z = n) = \frac{(m+n)!}{m!\, n!}\, p^m q^n \cdot e^{-\lambda}\, \frac{\lambda^{m+n}}{(m+n)!}.$$

Solution - continued
$$P(Y = m, Z = n) = e^{-p\lambda}\, \frac{(p\lambda)^m}{m!} \cdot e^{-q\lambda}\, \frac{(q\lambda)^n}{n!} = P(Y = m)\, P(Z = n).$$
Thus $Y \sim P(p\lambda)$ and $Z \sim P(q\lambda)$, and $Y$ and $Z$ are independent random variables.

Thus, if a bird lays eggs whose number follows a Poisson random variable with parameter $\lambda$, and if each egg survives with probability $p$, then the number of chicks that survive also forms a Poisson random variable, with parameter $p\lambda$.
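A short simulation can confirm the Poisson-splitting result numerically. This is an illustrative sketch (not from the original slides); the values lam = 4, p = 0.3 and the trial count are arbitrary. The empirical means and variances of $Y$ and $Z$ should match $p\lambda$ and $q\lambda$, and their correlation should be near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, p, trials = 4.0, 0.3, 200_000

N = rng.poisson(lam, trials)    # number of eggs in each trial
Y = rng.binomial(N, p)          # survivors: sum of N iid Bernoulli(p)
Z = N - Y                       # non-survivors

print(Y.mean(), Y.var())        # both ~ p*lam = 1.2, as for Poisson(p*lam)
print(Z.mean(), Z.var())        # both ~ q*lam = 2.8, as for Poisson(q*lam)
print(np.corrcoef(Y, Z)[0, 1])  # ~ 0, consistent with independence
```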