PROBABILITY AND STATISTICS FOR ENGINEERING
Joint Moments and Joint Characteristic Functions
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology

Compact Representation of the Joint p.d.f

We introduce various parameters that compactly represent the information contained in the joint p.d.f of two r.vs. Given two r.vs $X$ and $Y$ and a function $g(x,y)$, define the r.v $Z = g(X,Y)$. We can define the mean of $Z$ to be
$$\mu_Z = E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz.$$
It is possible to express the mean of $Z = g(X,Y)$ in terms of $f_{XY}(x,y)$ without computing $f_Z(z)$. Recall that
$$P(z < Z \le z + \Delta z) = f_Z(z)\,\Delta z = P(z < g(X,Y) \le z + \Delta z) = \sum_{(x,y)\in D_{\Delta z}} f_{XY}(x,y)\,\Delta x\,\Delta y,$$
where $D_{\Delta z}$ is the region in the $xy$ plane satisfying the above inequality. Therefore
$$z\, f_Z(z)\,\Delta z = \sum_{(x,y)\in D_{\Delta z}} g(x,y)\, f_{XY}(x,y)\,\Delta x\,\Delta y.$$
As $\Delta z$ covers the entire $z$ axis, the corresponding regions $D_{\Delta z}$ are non-overlapping and cover the entire $xy$ plane. By integrating, we obtain
$$E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz = E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dx\, dy.$$
If $X$ and $Y$ are discrete-type r.vs, then
$$E[g(X,Y)] = \sum_i \sum_j g(x_i, y_j)\, P(X = x_i, Y = y_j).$$
Since expectation is a linear operator, we also get
$$E\Big[\sum_k a_k\, g_k(X,Y)\Big] = \sum_k a_k\, E[g_k(X,Y)].$$
If $X$ and $Y$ are independent r.vs, then $Z = g(X)$ and $W = h(Y)$ are always independent of each other. Therefore
$$E[g(X)h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)\, f_X(x) f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \int_{-\infty}^{\infty} h(y) f_Y(y)\, dy = E[g(X)]\, E[h(Y)].$$
Note that this is in general not true if $X$ and $Y$ are not independent.

How can we parametrically represent the cross-behavior between two random variables? We generalize the definition of variance.

Covariance

Given any two r.vs $X$ and $Y$, define
$$\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
By expanding and simplifying the right-hand side, we also get
$$\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y).$$
It is easy to see that
$$|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}.$$
Proof: Let $U = aX + Y$, so that
$$\mathrm{Var}(U) = E\big[\big(a(X-\mu_X) + (Y-\mu_Y)\big)^2\big] = a^2\,\mathrm{Var}(X) + 2a\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y) \ge 0.$$
This is a quadratic in the variable $a$ that is non-negative for every $a$, so it has imaginary (or double) roots, and hence its discriminant must be non-positive:
$$[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y).$$
This gives us the desired result.

Correlation Parameter (Covariance)

Using $|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$, we may define the normalized parameter
$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}, \qquad -1 \le \rho_{XY} \le 1,$$
or equivalently $\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X \sigma_Y$. This represents the correlation coefficient between $X$ and $Y$.

Uncorrelatedness and Orthogonality

If $\rho_{XY} = 0$, then $X$ and $Y$ are said to be uncorrelated r.vs. If $X$ and $Y$ are uncorrelated, then $E(XY) = E(X)E(Y)$, since $\mathrm{Cov}(X,Y) = E(XY) - \mu_X\mu_Y = \rho_{XY}\,\sigma_X\sigma_Y$.

$X$ and $Y$ are said to be orthogonal if $E(XY) = 0$. If either $X$ or $Y$ has zero mean, then orthogonality and uncorrelatedness are equivalent.

If $X$ and $Y$ are independent, from $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$ with $g(X) = X$, $h(Y) = Y$, we get $E(XY) = E(X)E(Y)$. We conclude that independence implies uncorrelatedness: there cannot be any correlation between two independent r.vs ($\rho_{XY} = 0$). However, random variables can be uncorrelated without being independent.

Example

Let $X \sim U(0,1)$ and $Y \sim U(0,1)$, and suppose $X$ and $Y$ are independent. Define $Z = X + Y$, $W = X - Y$. Show that $Z$ and $W$ are dependent but uncorrelated r.vs.

Solution: The system $z = x + y$, $w = x - y$ gives the only solution set
$$x = \frac{z+w}{2}, \qquad y = \frac{z-w}{2}.$$
Moreover $0 \le z \le 2$, $-1 \le w \le 1$, $z + w \le 2$, $z - w \le 2$, $z \ge |w|$, and $|J(z,w)| = 1/2$.

Example - continued

Thus
$$f_{ZW}(z,w) = \begin{cases} 1/2, & 0 \le z \le 2,\ -1 \le w \le 1,\ z + w \le 2,\ z - w \le 2,\ |w| \le z, \\ 0, & \text{otherwise}, \end{cases}$$
and hence
$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dw = \begin{cases} \dfrac{1}{2}\displaystyle\int_{-z}^{z} dw = z, & 0 \le z < 1, \\[2mm] \dfrac{1}{2}\displaystyle\int_{z-2}^{2-z} dw = 2 - z, & 1 \le z \le 2. \end{cases}$$

Example - continued

Or, by direct computation ($Z = X + Y$),
$$f_Z(z) = f_X(z) * f_Y(z) = \begin{cases} z, & 0 \le z < 1, \\ 2 - z, & 1 \le z \le 2, \\ 0, & \text{otherwise}. \end{cases}$$
Similarly,
$$f_W(w) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dz = \frac{1}{2}\int_{|w|}^{2-|w|} dz = \begin{cases} 1 - |w|, & -1 \le w \le 1, \\ 0, & \text{otherwise}. \end{cases}$$
We have $f_{ZW}(z,w) \ne f_Z(z)\, f_W(w)$; thus $Z$ and $W$ are not independent.

Example - continued

However,
$$E(ZW) = E[(X+Y)(X-Y)] = E(X^2) - E(Y^2) = 0,$$
and $E(W) = E(X - Y) = 0$, and hence
$$\mathrm{Cov}(Z,W) = E(ZW) - E(Z)E(W) = 0,$$
implying that $Z$ and $W$ are uncorrelated random variables.
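The following short Monte Carlo sketch (added here, not part of the original slides) illustrates this result numerically. It assumes NumPy is available; the sample size and the particular events used to expose the dependence are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X, Y independent and uniform on (0, 1)
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)

z = x + y          # Z = X + Y
w = x - y          # W = X - Y

# Uncorrelated: the sample covariance of (Z, W) should be close to 0
cov_zw = np.cov(z, w)[0, 1]
print(f"Cov(Z, W) ~ {cov_zw:.4f}")

# Dependent: {Z > 1.5, |W| > 0.5} is impossible (it would force X > 1 or Y > 1),
# while P(Z > 1.5) * P(|W| > 0.5) is strictly positive.
p_joint = np.mean((z > 1.5) & (np.abs(w) > 0.5))
p_prod = np.mean(z > 1.5) * np.mean(np.abs(w) > 0.5)
print(f"P(Z>1.5, |W|>0.5) ~ {p_joint:.4f}, product of marginals ~ {p_prod:.4f}")
```

The sample covariance comes out near zero, while the joint probability is exactly zero even though the product of the marginal probabilities is about $0.125 \times 0.25 = 0.031$, which confirms that $Z$ and $W$ are uncorrelated yet dependent.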
Example

Let $Z = aX + bY$. Determine the variance of $Z$ in terms of $\sigma_X$, $\sigma_Y$ and $\rho_{XY}$.

Solution:
$$\mu_Z = E(Z) = E(aX + bY) = a\mu_X + b\mu_Y,$$
and using $\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X\sigma_Y$,
$$\sigma_Z^2 = \mathrm{Var}(Z) = E[(Z - \mu_Z)^2] = E\big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\big]$$
$$= a^2 E[(X-\mu_X)^2] + 2ab\, E[(X-\mu_X)(Y-\mu_Y)] + b^2 E[(Y-\mu_Y)^2] = a^2\sigma_X^2 + 2ab\,\rho_{XY}\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$
If $X$ and $Y$ are independent, then $\rho_{XY} = 0$ and this reduces to
$$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2.$$
Thus the variance of the sum of independent r.vs is the sum of their variances ($a = b = 1$).

Moments

$$E[X^k Y^m] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m\, f_{XY}(x,y)\, dx\, dy$$
represents the joint moment of order $(k,m)$ for $X$ and $Y$. We can also define the joint characteristic function of two random variables, which will turn out to be useful for moment calculations.

Joint Characteristic Functions

The joint characteristic function of $X$ and $Y$ is defined as
$$\Phi_{XY}(u,v) = E\big[e^{j(Xu + Yv)}\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j(xu + yv)}\, f_{XY}(x,y)\, dx\, dy.$$
Note that $|\Phi_{XY}(u,v)| \le \Phi_{XY}(0,0) = 1$. It is easy to show that
$$E(XY) = \frac{1}{j^2}\,\frac{\partial^2 \Phi_{XY}(u,v)}{\partial u\, \partial v}\bigg|_{u=0,\,v=0}.$$
If $X$ and $Y$ are independent r.vs, then from the definition of the joint characteristic function we obtain
$$\Phi_{XY}(u,v) = E(e^{juX})\, E(e^{jvY}) = \Phi_X(u)\,\Phi_Y(v).$$
Also
$$\Phi_X(u) = \Phi_{XY}(u,0), \qquad \Phi_Y(v) = \Phi_{XY}(0,v).$$

More on Gaussian r.vs

$X$ and $Y$ are said to be jointly Gaussian, $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$, if their joint p.d.f has the form
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right\},$$
$-\infty < x, y < \infty$, $|\rho| < 1$. By direct substitution and simplification, we obtain the joint characteristic function of two jointly Gaussian r.vs to be
$$\Phi_{XY}(u,v) = E\big(e^{j(Xu + Yv)}\big) = e^{\,j(\mu_X u + \mu_Y v) - \frac{1}{2}(\sigma_X^2 u^2 + 2\rho\sigma_X\sigma_Y uv + \sigma_Y^2 v^2)}.$$
So
$$\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}.$$
By direct computation, it is easy to show that for two jointly Gaussian random variables
$$\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y,$$
so $\rho$ in $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ represents the actual correlation coefficient of the two jointly Gaussian r.vs. Notice that $\rho = 0$ implies $f_{XY}(x,y) = f_X(x)\, f_Y(y)$. Thus if $X$ and $Y$ are jointly Gaussian, uncorrelatedness does imply independence between the two random variables. The Gaussian case is the only exception where the two concepts imply each other.

Example

Let $X$ and $Y$ be jointly Gaussian r.vs with parameters $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Define $Z = aX + bY$. Determine $f_Z(z)$.

Solution: We make use of the characteristic function:
$$\Phi_Z(u) = E(e^{jZu}) = E\big(e^{j(aX+bY)u}\big) = E\big(e^{jauX + jbuY}\big) = \Phi_{XY}(au, bu).$$
We get
$$\Phi_Z(u) = e^{\,j(a\mu_X + b\mu_Y)u - \frac{1}{2}(a^2\sigma_X^2 + 2ab\rho\sigma_X\sigma_Y + b^2\sigma_Y^2)u^2} = e^{\,j\mu_Z u - \frac{1}{2}\sigma_Z^2 u^2},$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$

Example - continued

This has the same form as $\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}$. We conclude that $Z = aX + bY$ is also Gaussian, with the mean and variance defined above. We can also conclude that any linear combination of jointly Gaussian r.vs generates a Gaussian r.v.
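As a quick numerical check of the mean and variance formulas above (a sketch added here, not from the slides; the parameter values are arbitrary and NumPy is assumed), one can sample a jointly Gaussian pair and compare the sample statistics of $Z = aX + bY$ against $\mu_Z$ and $\sigma_Z^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters (not taken from the slides)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 3.0, 0.6
a, b = 0.5, -1.5

# Sample a jointly Gaussian pair N(mu_x, mu_y, sig_x^2, sig_y^2, rho)
cov = np.array([[sig_x**2,            rho * sig_x * sig_y],
                [rho * sig_x * sig_y, sig_y**2           ]])
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000)
x, y = xy[:, 0], xy[:, 1]

z = a * x + b * y                       # Z = aX + bY

# Theoretical mean and variance from the formulas above
mu_z = a * mu_x + b * mu_y
var_z = a**2 * sig_x**2 + 2 * a * b * rho * sig_x * sig_y + b**2 * sig_y**2

print(f"E(Z):   sample {z.mean():.3f}  vs formula {mu_z:.3f}")
print(f"Var(Z): sample {z.var():.3f}  vs formula {var_z:.3f}")
print(f"Cov(X,Y): sample {np.cov(x, y)[0, 1]:.3f}  vs rho*sig_x*sig_y = {rho * sig_x * sig_y:.3f}")
```

A histogram of the `z` samples would also look Gaussian, consistent with the conclusion that any linear combination of jointly Gaussian r.vs is Gaussian.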
Example

Suppose $X$ and $Y$ are jointly Gaussian r.vs as in the previous example. Define two linear combinations
$$Z = aX + bY, \qquad W = cX + dY.$$
What can we say about their joint distribution?

Solution: The joint characteristic function of $Z$ and $W$ is given by
$$\Phi_{ZW}(u,v) = E\big(e^{j(Zu + Wv)}\big) = E\big(e^{j(aX+bY)u + j(cX+dY)v}\big) = E\big(e^{jX(au+cv) + jY(bu+dv)}\big) = \Phi_{XY}(au + cv,\ bu + dv).$$

Example - continued

Thus
$$\Phi_{ZW}(u,v) = e^{\,j(\mu_Z u + \mu_W v) - \frac{1}{2}(\sigma_Z^2 u^2 + 2\rho_{ZW}\sigma_Z\sigma_W uv + \sigma_W^2 v^2)},$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \mu_W = c\mu_X + d\mu_Y,$$
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2, \qquad \sigma_W^2 = c^2\sigma_X^2 + 2cd\,\rho\,\sigma_X\sigma_Y + d^2\sigma_Y^2,$$
and
$$\rho_{ZW} = \frac{ac\,\sigma_X^2 + (ad+bc)\,\rho\,\sigma_X\sigma_Y + bd\,\sigma_Y^2}{\sigma_Z\sigma_W}.$$
We conclude that $Z$ and $W$ are also jointly Gaussian r.vs, with means, variances and correlation coefficient as described.

Example - summary

Any two linear combinations of jointly Gaussian random variables (independent or dependent) are also jointly Gaussian r.vs:
Gaussian input → Linear operator → Gaussian output.
Of course, we could have reached the same conclusion by deriving the joint p.d.f $f_{ZW}(z,w)$ directly. Gaussian random variables are also interesting because of the Central Limit Theorem.

Central Limit Theorem

Suppose $X_1, X_2, \ldots, X_n$ are a set of zero-mean, independent, identically distributed (i.i.d) random variables with some common distribution. Consider their scaled sum
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}.$$
Then asymptotically (as $n \to \infty$), $Y \to N(0, \sigma^2)$.

Central Limit Theorem - Proof

We shall prove it here under the independence assumption; the theorem is true under even more general conditions. Let $\sigma^2$ represent the common variance. Since $E(X_i) = 0$, we have $\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2$. Consider
$$\Phi_Y(u) = E(e^{jYu}) = E\big(e^{j(X_1 + X_2 + \cdots + X_n)u/\sqrt{n}}\big) = \prod_{i=1}^{n} E\big(e^{jX_i u/\sqrt{n}}\big) = \prod_{i=1}^{n} \Phi_{X_i}\big(u/\sqrt{n}\big),$$
where we have made use of the independence of the r.vs $X_1, X_2, \ldots, X_n$.

Central Limit Theorem - Proof

But
$$E\big(e^{jX_i u/\sqrt{n}}\big) = E\left[1 + \frac{jX_i u}{\sqrt{n}} + \frac{j^2 X_i^2 u^2}{2!\, n} + \frac{j^3 X_i^3 u^3}{3!\, n^{3/2}} + \cdots\right] = 1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right).$$
Also,
$$\Phi_Y(u) = \left(1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right)\right)^{n},$$
and since $\lim_{n\to\infty}\left(1 - \frac{x}{n}\right)^n = e^{-x}$, we get
$$\lim_{n\to\infty} \Phi_Y(u) = e^{-\sigma^2 u^2 / 2}.$$
This is the characteristic function of a zero-mean normal r.v with variance $\sigma^2$, and this completes the proof.

Central Limit Theorem - Conclusion

A large sum of independent random variables, each with finite variance, tends to behave like a normal random variable; the individual p.d.fs become unimportant when analyzing the behavior of the collective sum. If we model a noise phenomenon as the sum of a large number of independent random variables, then this theorem allows us to conclude that the noise behaves like a Gaussian r.v (e.g., electron motion in resistor components).

Central Limit Theorem - Conclusion

The finite variance assumption is necessary. Consider Cauchy-distributed r.vs (whose variance is undefined or infinite), and let
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}},$$
where each $X_i \sim C(\alpha)$. Then, since $\Phi_{X_i}(u) = e^{-\alpha|u|}$, we get
$$\Phi_Y(u) = \prod_{i=1}^{n} \Phi_{X_i}\big(u/\sqrt{n}\big) = e^{-\alpha\sqrt{n}\,|u|} \sim C(\alpha\sqrt{n}),$$
which shows that $Y$ is still Cauchy, with parameter $\alpha\sqrt{n}$.

Example

Joint characteristic functions are useful in determining the p.d.f of linear combinations of r.vs. With $X$ and $Y$ as independent Poisson r.vs with parameters $\lambda_1$ and $\lambda_2$ respectively, let $Z = X + Y$. Then
$$\Phi_Z(u) = \Phi_X(u)\,\Phi_Y(u).$$
Since
$$\Phi_X(u) = e^{\lambda_1(e^{ju} - 1)}, \qquad \Phi_Y(u) = e^{\lambda_2(e^{ju} - 1)},$$
we obtain
$$\Phi_Z(u) = e^{(\lambda_1 + \lambda_2)(e^{ju} - 1)} \sim P(\lambda_1 + \lambda_2),$$
i.e., the sum of independent Poisson r.vs is also a Poisson random variable.
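To close, here is a small simulation sketch (added here, not part of the original slides) that checks two of the results above numerically, assuming NumPy is available; the distributions, parameter values and sample sizes are arbitrary illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# --- Central Limit Theorem check -------------------------------------------
# Y = (X_1 + ... + X_n) / sqrt(n) for zero-mean uniform terms on (-0.5, 0.5),
# whose common variance is sigma^2 = 1/12.
n_terms, n_trials = 500, 20_000
sums = rng.uniform(-0.5, 0.5, size=(n_trials, n_terms)).sum(axis=1)
y_clt = sums / math.sqrt(n_terms)
sigma2 = 1.0 / 12.0

def normal_cdf(t, sigma):
    """CDF of a zero-mean normal r.v. with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

print(f"Var(Y): sample {y_clt.var():.4f}  vs sigma^2 = {sigma2:.4f}")
for t in (-0.5, 0.0, 0.5):
    print(f"P(Y <= {t:+.1f}): empirical {np.mean(y_clt <= t):.4f}"
          f"  vs N(0, sigma^2) {normal_cdf(t, math.sqrt(sigma2)):.4f}")

# --- Sum of independent Poisson r.vs ----------------------------------------
lam1, lam2 = 2.0, 3.5                      # arbitrary Poisson parameters
z = rng.poisson(lam1, 500_000) + rng.poisson(lam2, 500_000)
lam_z = lam1 + lam2

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(0, 6):
    print(f"P(Z = {k}): empirical {np.mean(z == k):.4f}"
          f"  vs Poisson({lam_z}) pmf {poisson_pmf(k, lam_z):.4f}")
```

Both comparisons should agree to within Monte Carlo error; replacing the uniform terms with Cauchy ones would break the first check, since the finite-variance assumption of the Central Limit Theorem no longer holds.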