PROBABILITY AND STATISTICS FOR ENGINEERING
Joint Moments and Joint Characteristic Functions
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology

Compact Representation of the Joint p.d.f

We introduce various parameters that compactly represent the information contained in the joint p.d.f of two r.vs. Given two r.vs $X$ and $Y$ and a function $g(x,y)$, define the r.v $Z = g(X,Y)$. We can define the mean of $Z$ to be
$$\mu_Z = E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz.$$
It is possible to express the mean of $Z = g(X,Y)$ in terms of $f_{XY}(x,y)$ without computing $f_Z(z)$. Recall that
$$P(z < Z \le z + \Delta z) = f_Z(z)\,\Delta z = P(z < g(X,Y) \le z + \Delta z) = \sum_{(x,y)\in D_{\Delta z}} f_{XY}(x,y)\,\Delta x\,\Delta y,$$
where $D_{\Delta z}$ is the region in the $xy$ plane satisfying the above inequality. Therefore
$$z\, f_Z(z)\,\Delta z = \sum_{(x,y)\in D_{\Delta z}} g(x,y)\, f_{XY}(x,y)\,\Delta x\,\Delta y.$$
As $\Delta z$ covers the entire $z$ axis, the corresponding regions $D_{\Delta z}$ are non-overlapping and cover the entire $xy$ plane. By integrating, we obtain
$$E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz = E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dx\, dy.$$
If $X$ and $Y$ are discrete-type r.vs, then
$$E[g(X,Y)] = \sum_i \sum_j g(x_i, y_j)\, P(X = x_i, Y = y_j).$$
Since expectation is a linear operator, we also get
$$E\Big[\sum_k a_k\, g_k(X,Y)\Big] = \sum_k a_k\, E[g_k(X,Y)].$$
If $X$ and $Y$ are independent r.vs, then $Z = g(X)$ and $W = h(Y)$ are always independent of each other. Therefore
$$E[g(X)h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)\, f_X(x) f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \int_{-\infty}^{\infty} h(y) f_Y(y)\, dy = E[g(X)]\, E[h(Y)].$$
Note that this is in general not true if $X$ and $Y$ are not independent.

How can we parametrically represent the cross-behavior between two random variables? We generalize the definition of variance.

Covariance

Given any two r.vs $X$ and $Y$, define
$$\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
By expanding and simplifying the right-hand side, we also get
$$\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y).$$
It is easy to see that
$$|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}.$$
Proof: Let $U = aX + Y$, so that
$$\mathrm{Var}(U) = E\big[\big(a(X-\mu_X) + (Y-\mu_Y)\big)^2\big] = a^2\,\mathrm{Var}(X) + 2a\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y) \ge 0.$$
This is a quadratic in the variable $a$ that is non-negative for every $a$, so it has imaginary (or double) roots, and hence its discriminant must be non-positive:
$$[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y).$$
This gives us the desired result.

Correlation Parameter (Covariance)

Using $|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$, we may define the normalized parameter
$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}, \qquad -1 \le \rho_{XY} \le 1,$$
or equivalently $\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X \sigma_Y$. This represents the correlation coefficient between $X$ and $Y$.

Uncorrelatedness and Orthogonality

If $\rho_{XY} = 0$, then $X$ and $Y$ are said to be uncorrelated r.vs. If $X$ and $Y$ are uncorrelated, then $E(XY) = E(X)E(Y)$, since $\mathrm{Cov}(X,Y) = E(XY) - \mu_X\mu_Y = \rho_{XY}\,\sigma_X\sigma_Y$.

$X$ and $Y$ are said to be orthogonal if $E(XY) = 0$. If either $X$ or $Y$ has zero mean, then orthogonality and uncorrelatedness are equivalent.

If $X$ and $Y$ are independent, from $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$ with $g(X) = X$, $h(Y) = Y$, we get $E(XY) = E(X)E(Y)$. We conclude that independence implies uncorrelatedness: there cannot be any correlation between two independent r.vs ($\rho_{XY} = 0$). However, random variables can be uncorrelated without being independent.

Example

Let $X \sim U(0,1)$ and $Y \sim U(0,1)$, and suppose $X$ and $Y$ are independent. Define $Z = X + Y$, $W = X - Y$. Show that $Z$ and $W$ are dependent but uncorrelated r.vs.

Solution: The system $z = x + y$, $w = x - y$ gives the only solution set
$$x = \frac{z+w}{2}, \qquad y = \frac{z-w}{2}.$$
Moreover $0 \le z \le 2$, $-1 \le w \le 1$, $z + w \le 2$, $z - w \le 2$, $z \ge |w|$, and $|J(z,w)| = 1/2$.

Example - continued

Thus
$$f_{ZW}(z,w) = \begin{cases} 1/2, & 0 \le z \le 2,\ -1 \le w \le 1,\ z + w \le 2,\ z - w \le 2,\ |w| \le z, \\ 0, & \text{otherwise}, \end{cases}$$
and hence
$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dw = \begin{cases} \dfrac{1}{2}\displaystyle\int_{-z}^{z} dw = z, & 0 \le z < 1, \\[2mm] \dfrac{1}{2}\displaystyle\int_{z-2}^{2-z} dw = 2 - z, & 1 \le z \le 2. \end{cases}$$

Example - continued

Or, by direct computation ($Z = X + Y$),
$$f_Z(z) = f_X(z) * f_Y(z) = \begin{cases} z, & 0 \le z < 1, \\ 2 - z, & 1 \le z \le 2, \\ 0, & \text{otherwise}. \end{cases}$$
Similarly,
$$f_W(w) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dz = \frac{1}{2}\int_{|w|}^{2-|w|} dz = \begin{cases} 1 - |w|, & -1 \le w \le 1, \\ 0, & \text{otherwise}. \end{cases}$$
We have $f_{ZW}(z,w) \ne f_Z(z)\, f_W(w)$; thus $Z$ and $W$ are not independent.

Example - continued

However,
$$E(ZW) = E[(X+Y)(X-Y)] = E(X^2) - E(Y^2) = 0,$$
and $E(W) = E(X - Y) = 0$, and hence
$$\mathrm{Cov}(Z,W) = E(ZW) - E(Z)E(W) = 0,$$
implying that $Z$ and $W$ are uncorrelated random variables.
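The following short Monte Carlo sketch (added here, not part of the original slides) illustrates this result numerically. It assumes NumPy is available; the sample size and the particular events used to expose the dependence are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X, Y independent and uniform on (0, 1)
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)

z = x + y          # Z = X + Y
w = x - y          # W = X - Y

# Uncorrelated: the sample covariance of (Z, W) should be close to 0
cov_zw = np.cov(z, w)[0, 1]
print(f"Cov(Z, W) ~ {cov_zw:.4f}")

# Dependent: {Z > 1.5, |W| > 0.5} is impossible (it would force X > 1 or Y > 1),
# while P(Z > 1.5) * P(|W| > 0.5) is strictly positive.
p_joint = np.mean((z > 1.5) & (np.abs(w) > 0.5))
p_prod = np.mean(z > 1.5) * np.mean(np.abs(w) > 0.5)
print(f"P(Z>1.5, |W|>0.5) ~ {p_joint:.4f}, product of marginals ~ {p_prod:.4f}")
```

The sample covariance comes out near zero, while the joint probability is exactly zero even though the product of the marginal probabilities is about $0.125 \times 0.25 = 0.031$, which confirms that $Z$ and $W$ are uncorrelated yet dependent.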
Example

Let $Z = aX + bY$. Determine the variance of $Z$ in terms of $\sigma_X$, $\sigma_Y$ and $\rho_{XY}$.

Solution:
$$\mu_Z = E(Z) = E(aX + bY) = a\mu_X + b\mu_Y,$$
and using $\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X\sigma_Y$,
$$\sigma_Z^2 = \mathrm{Var}(Z) = E[(Z - \mu_Z)^2] = E\big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\big]$$
$$= a^2 E[(X-\mu_X)^2] + 2ab\, E[(X-\mu_X)(Y-\mu_Y)] + b^2 E[(Y-\mu_Y)^2] = a^2\sigma_X^2 + 2ab\,\rho_{XY}\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$
If $X$ and $Y$ are independent, then $\rho_{XY} = 0$ and this reduces to
$$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2.$$
Thus the variance of the sum of independent r.vs is the sum of their variances ($a = b = 1$).

Moments

$$E[X^k Y^m] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m\, f_{XY}(x,y)\, dx\, dy$$
represents the joint moment of order $(k,m)$ for $X$ and $Y$. We can also define the joint characteristic function of two random variables, which will turn out to be useful for moment calculations.

Joint Characteristic Functions

The joint characteristic function of $X$ and $Y$ is defined as
$$\Phi_{XY}(u,v) = E\big[e^{j(Xu + Yv)}\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j(xu + yv)}\, f_{XY}(x,y)\, dx\, dy.$$
Note that $|\Phi_{XY}(u,v)| \le \Phi_{XY}(0,0) = 1$. It is easy to show that
$$E(XY) = \frac{1}{j^2}\,\frac{\partial^2 \Phi_{XY}(u,v)}{\partial u\, \partial v}\bigg|_{u=0,\,v=0}.$$
If $X$ and $Y$ are independent r.vs, then from the definition of the joint characteristic function we obtain
$$\Phi_{XY}(u,v) = E(e^{juX})\, E(e^{jvY}) = \Phi_X(u)\,\Phi_Y(v).$$
Also
$$\Phi_X(u) = \Phi_{XY}(u,0), \qquad \Phi_Y(v) = \Phi_{XY}(0,v).$$

More on Gaussian r.vs

$X$ and $Y$ are said to be jointly Gaussian, $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$, if their joint p.d.f has the form
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right\},$$
$-\infty < x, y < \infty$, $|\rho| < 1$. By direct substitution and simplification, we obtain the joint characteristic function of two jointly Gaussian r.vs to be
$$\Phi_{XY}(u,v) = E\big(e^{j(Xu + Yv)}\big) = e^{\,j(\mu_X u + \mu_Y v) - \frac{1}{2}(\sigma_X^2 u^2 + 2\rho\sigma_X\sigma_Y uv + \sigma_Y^2 v^2)}.$$
So
$$\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}.$$
By direct computation, it is easy to show that for two jointly Gaussian random variables
$$\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y,$$
so $\rho$ in $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ represents the actual correlation coefficient of the two jointly Gaussian r.vs. Notice that $\rho = 0$ implies $f_{XY}(x,y) = f_X(x)\, f_Y(y)$. Thus if $X$ and $Y$ are jointly Gaussian, uncorrelatedness does imply independence between the two random variables. The Gaussian case is the only exception where the two concepts imply each other.

Example

Let $X$ and $Y$ be jointly Gaussian r.vs with parameters $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$. Define $Z = aX + bY$. Determine $f_Z(z)$.

Solution: We make use of the characteristic function:
$$\Phi_Z(u) = E(e^{jZu}) = E\big(e^{j(aX+bY)u}\big) = E\big(e^{jauX + jbuY}\big) = \Phi_{XY}(au, bu).$$
We get
$$\Phi_Z(u) = e^{\,j(a\mu_X + b\mu_Y)u - \frac{1}{2}(a^2\sigma_X^2 + 2ab\rho\sigma_X\sigma_Y + b^2\sigma_Y^2)u^2} = e^{\,j\mu_Z u - \frac{1}{2}\sigma_Z^2 u^2},$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$

Example - continued

This has the same form as $\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}$. We conclude that $Z = aX + bY$ is also Gaussian, with the mean and variance defined above. We can also conclude that any linear combination of jointly Gaussian r.vs generates a Gaussian r.v.
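As a quick numerical check of the mean and variance formulas above (a sketch added here, not from the slides; the parameter values are arbitrary and NumPy is assumed), one can sample a jointly Gaussian pair and compare the sample statistics of $Z = aX + bY$ against $\mu_Z$ and $\sigma_Z^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters (not taken from the slides)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 3.0, 0.6
a, b = 0.5, -1.5

# Sample a jointly Gaussian pair N(mu_x, mu_y, sig_x^2, sig_y^2, rho)
cov = np.array([[sig_x**2,            rho * sig_x * sig_y],
                [rho * sig_x * sig_y, sig_y**2           ]])
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000)
x, y = xy[:, 0], xy[:, 1]

z = a * x + b * y                       # Z = aX + bY

# Theoretical mean and variance from the formulas above
mu_z = a * mu_x + b * mu_y
var_z = a**2 * sig_x**2 + 2 * a * b * rho * sig_x * sig_y + b**2 * sig_y**2

print(f"E(Z):   sample {z.mean():.3f}  vs formula {mu_z:.3f}")
print(f"Var(Z): sample {z.var():.3f}  vs formula {var_z:.3f}")
print(f"Cov(X,Y): sample {np.cov(x, y)[0, 1]:.3f}  vs rho*sig_x*sig_y = {rho * sig_x * sig_y:.3f}")
```

A histogram of the `z` samples would also look Gaussian, consistent with the conclusion that any linear combination of jointly Gaussian r.vs is Gaussian.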
Example

Suppose $X$ and $Y$ are jointly Gaussian r.vs as in the previous example. Define two linear combinations
$$Z = aX + bY, \qquad W = cX + dY.$$
What can we say about their joint distribution?

Solution: The joint characteristic function of $Z$ and $W$ is given by
$$\Phi_{ZW}(u,v) = E\big(e^{j(Zu + Wv)}\big) = E\big(e^{j(aX+bY)u + j(cX+dY)v}\big) = E\big(e^{jX(au+cv) + jY(bu+dv)}\big) = \Phi_{XY}(au + cv,\ bu + dv).$$

Example - continued

Thus
$$\Phi_{ZW}(u,v) = e^{\,j(\mu_Z u + \mu_W v) - \frac{1}{2}(\sigma_Z^2 u^2 + 2\rho_{ZW}\sigma_Z\sigma_W uv + \sigma_W^2 v^2)},$$
where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \mu_W = c\mu_X + d\mu_Y,$$
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2, \qquad \sigma_W^2 = c^2\sigma_X^2 + 2cd\,\rho\,\sigma_X\sigma_Y + d^2\sigma_Y^2,$$
and
$$\rho_{ZW} = \frac{ac\,\sigma_X^2 + (ad+bc)\,\rho\,\sigma_X\sigma_Y + bd\,\sigma_Y^2}{\sigma_Z\sigma_W}.$$
We conclude that $Z$ and $W$ are also jointly Gaussian r.vs, with means, variances and correlation coefficient as described.

Example - summary

Any two linear combinations of jointly Gaussian random variables (independent or dependent) are also jointly Gaussian r.vs:
Gaussian input → Linear operator → Gaussian output.
Of course, we could have reached the same conclusion by deriving the joint p.d.f $f_{ZW}(z,w)$ directly. Gaussian random variables are also interesting because of the Central Limit Theorem.

Central Limit Theorem

Suppose $X_1, X_2, \ldots, X_n$ are a set of zero-mean, independent, identically distributed (i.i.d) random variables with some common distribution. Consider their scaled sum
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}.$$
Then asymptotically (as $n \to \infty$), $Y \to N(0, \sigma^2)$.

Central Limit Theorem - Proof

We shall prove it here under the independence assumption; the theorem is true under even more general conditions. Let $\sigma^2$ represent the common variance. Since $E(X_i) = 0$, we have $\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2$. Consider
$$\Phi_Y(u) = E(e^{jYu}) = E\big(e^{j(X_1 + X_2 + \cdots + X_n)u/\sqrt{n}}\big) = \prod_{i=1}^{n} E\big(e^{jX_i u/\sqrt{n}}\big) = \prod_{i=1}^{n} \Phi_{X_i}\big(u/\sqrt{n}\big),$$
where we have made use of the independence of the r.vs $X_1, X_2, \ldots, X_n$.

Central Limit Theorem - Proof

But
$$E\big(e^{jX_i u/\sqrt{n}}\big) = E\left[1 + \frac{jX_i u}{\sqrt{n}} + \frac{j^2 X_i^2 u^2}{2!\, n} + \frac{j^3 X_i^3 u^3}{3!\, n^{3/2}} + \cdots\right] = 1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right).$$
Also,
$$\Phi_Y(u) = \left(1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right)\right)^{n},$$
and since $\lim_{n\to\infty}\left(1 - \frac{x}{n}\right)^n = e^{-x}$, we get
$$\lim_{n\to\infty} \Phi_Y(u) = e^{-\sigma^2 u^2 / 2}.$$
This is the characteristic function of a zero-mean normal r.v with variance $\sigma^2$, and this completes the proof.

Central Limit Theorem - Conclusion

A large sum of independent random variables, each with finite variance, tends to behave like a normal random variable; the individual p.d.fs become unimportant when analyzing the behavior of the collective sum. If we model a noise phenomenon as the sum of a large number of independent random variables, then this theorem allows us to conclude that the noise behaves like a Gaussian r.v (e.g., electron motion in resistor components).

Central Limit Theorem - Conclusion

The finite variance assumption is necessary. Consider Cauchy-distributed r.vs (whose variance is undefined or infinite), and let
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}},$$
where each $X_i \sim C(\alpha)$. Then, since $\Phi_{X_i}(u) = e^{-\alpha|u|}$, we get
$$\Phi_Y(u) = \prod_{i=1}^{n} \Phi_{X_i}\big(u/\sqrt{n}\big) = e^{-\alpha\sqrt{n}\,|u|} \sim C(\alpha\sqrt{n}),$$
which shows that $Y$ is still Cauchy, with parameter $\alpha\sqrt{n}$.

Example

Joint characteristic functions are useful in determining the p.d.f of linear combinations of r.vs. With $X$ and $Y$ as independent Poisson r.vs with parameters $\lambda_1$ and $\lambda_2$ respectively, let $Z = X + Y$. Then
$$\Phi_Z(u) = \Phi_X(u)\,\Phi_Y(u).$$
Since
$$\Phi_X(u) = e^{\lambda_1(e^{ju} - 1)}, \qquad \Phi_Y(u) = e^{\lambda_2(e^{ju} - 1)},$$
we obtain
$$\Phi_Z(u) = e^{(\lambda_1 + \lambda_2)(e^{ju} - 1)} \sim P(\lambda_1 + \lambda_2),$$
i.e., the sum of independent Poisson r.vs is also a Poisson random variable.
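To close, here is a small simulation sketch (added here, not part of the original slides) that checks two of the results above numerically, assuming NumPy is available; the distributions, parameter values and sample sizes are arbitrary illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# --- Central Limit Theorem check -------------------------------------------
# Y = (X_1 + ... + X_n) / sqrt(n) for zero-mean uniform terms on (-0.5, 0.5),
# whose common variance is sigma^2 = 1/12.
n_terms, n_trials = 500, 20_000
sums = rng.uniform(-0.5, 0.5, size=(n_trials, n_terms)).sum(axis=1)
y_clt = sums / math.sqrt(n_terms)
sigma2 = 1.0 / 12.0

def normal_cdf(t, sigma):
    """CDF of a zero-mean normal r.v. with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

print(f"Var(Y): sample {y_clt.var():.4f}  vs sigma^2 = {sigma2:.4f}")
for t in (-0.5, 0.0, 0.5):
    print(f"P(Y <= {t:+.1f}): empirical {np.mean(y_clt <= t):.4f}"
          f"  vs N(0, sigma^2) {normal_cdf(t, math.sqrt(sigma2)):.4f}")

# --- Sum of independent Poisson r.vs ----------------------------------------
lam1, lam2 = 2.0, 3.5                      # arbitrary Poisson parameters
z = rng.poisson(lam1, 500_000) + rng.poisson(lam2, 500_000)
lam_z = lam1 + lam2

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(0, 6):
    print(f"P(Z = {k}): empirical {np.mean(z == k):.4f}"
          f"  vs Poisson({lam_z}) pmf {poisson_pmf(k, lam_z):.4f}")
```

Both comparisons should agree to within Monte Carlo error; replacing the uniform terms with Cauchy ones would break the first check, since the finite-variance assumption of the Central Limit Theorem no longer holds.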