Lecture 10: Joint Moments

PROBABILITY AND STATISTICS
FOR ENGINEERING
Joint Moments
and
Joint Characteristic Functions
Hossein Sameti
Department of Computer Engineering
Sharif University of Technology
Compact Representation of Joint p.d.f
 We introduce various parameters that compactly represent the information contained in the joint p.d.f of two r.vs.
 Given two r.vs X and Y and a function $g(x,y)$, define the r.v $Z = g(X,Y)$.
 We can define the mean of Z to be
$$\mu_Z = E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz.$$
 It is possible to express the mean of $Z = g(X,Y)$ in terms of $f_{XY}(x,y)$ without computing $f_Z(z)$.
 Recall that
$$P\{z < Z \le z + \Delta z\} \approx f_Z(z)\,\Delta z = P\{z < g(X,Y) \le z + \Delta z\} = \sum_{(x,y)\in D_{\Delta z}} f_{XY}(x,y)\,\Delta x\,\Delta y,$$
where $D_{\Delta z}$ is the region in the $xy$ plane satisfying $z < g(x,y) \le z + \Delta z$. Therefore
$$z\, f_Z(z)\,\Delta z \approx \sum_{(x,y)\in D_{\Delta z}} g(x,y)\, f_{XY}(x,y)\,\Delta x\,\Delta y.$$
 As z covers the entire z axis, the corresponding regions Dz are
nonoverlapping, and they cover the entire xy plane.
 By integrating over $z$, we obtain
$$E(Z) = \int_{-\infty}^{\infty} z\, f_Z(z)\, dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dx\, dy,$$
 or
$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dx\, dy.$$
 If X and Y are discrete-type r.vs, then
$$E[g(X,Y)] = \sum_i \sum_j g(x_i, y_j)\, P(X = x_i, Y = y_j).$$
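 For instance, the discrete formula is just a weighted sum over the joint p.m.f. A minimal sketch in Python (the p.m.f values and the choice $g(x,y) = xy$ below are made up for illustration):

```python
import numpy as np

# Hypothetical joint p.m.f P(X = x_i, Y = y_j) on a small grid (rows: x, cols: y)
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([-1.0, 1.0])
pmf = np.array([[0.10, 0.20],
                [0.30, 0.15],
                [0.15, 0.10]])           # entries sum to 1

g = lambda x, y: x * y                   # example choice of g(x, y)

X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")
E_g = np.sum(g(X, Y) * pmf)              # E[g(X,Y)] = sum_i sum_j g(x_i, y_j) P(X=x_i, Y=y_j)
print(E_g)
```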
 Since expectation is a linear operator, we also get
$$E\Big[\sum_k a_k\, g_k(X,Y)\Big] = \sum_k a_k\, E[g_k(X,Y)].$$
 If X and Y are independent r.vs, then $Z = g(X)$ and $W = h(Y)$ are always independent of each other.
 Therefore
$$E[g(X)h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)\, f_X(x) f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \int_{-\infty}^{\infty} h(y) f_Y(y)\, dy = E[g(X)]\, E[h(Y)].$$
 Note that this is in general not true (if X and Y are not independent).
 How can we parametrically represent the cross-behavior between two random variables?
 We generalize the variance definition.
Covariance
 Given any two r.vs X and Y, define
$$\mathrm{Cov}(X,Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big].$$
 By expanding and simplifying the right side, we also get
$$\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y = E(XY) - E(X)E(Y) = \overline{XY} - \overline{X}\,\overline{Y}.$$
 It is easy to see that
$$[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y).$$
Proof
 Let $U = aX + Y$, so that
$$\mathrm{Var}(U) = E\big[\big(a(X - \mu_X) + (Y - \mu_Y)\big)^2\big] = a^2\,\mathrm{Var}(X) + 2a\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y) \ge 0.$$
 This is a quadratic in the variable $a$ with imaginary (or double) roots, since $\mathrm{Var}(U)$ is never negative; hence its discriminant must be non-positive:
$$[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y).$$
 This gives us the desired result.
Correlation Parameter (Covariance)
 Using $[\mathrm{Cov}(X,Y)]^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y)$, we may define the normalized parameter
$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad -1 \le \rho_{XY} \le 1,$$
 or
$$\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X\,\sigma_Y.$$
 $\rho_{XY}$ represents the correlation coefficient between X and Y.
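 As a quick numerical illustration of these definitions, the sketch below estimates $\mathrm{Cov}(X,Y)$ and $\rho_{XY}$ from samples (the simulated data is an arbitrary example, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = 0.6 * x + 0.8 * rng.normal(size=50_000)    # correlated with X by construction

cov_xy = np.cov(x, y, ddof=0)[0, 1]            # sample Cov(X, Y)
rho_xy = cov_xy / (x.std() * y.std())          # rho_XY = Cov / (sigma_X * sigma_Y)
print(rho_xy, np.corrcoef(x, y)[0, 1])         # the two estimates agree
```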
Uncorrelatedness and Orthogonality
 If $\rho_{XY} = 0$, then X and Y are said to be uncorrelated r.vs.
 If X and Y are uncorrelated, then $E(XY) = E(X)E(Y)$, since $\mathrm{Cov}(X,Y) = \overline{XY} - \overline{X}\,\overline{Y}$.
 X and Y are said to be orthogonal if $E(XY) = 0$.
 If either X or Y has zero mean, then orthogonality and uncorrelatedness are equivalent.
 If X and Y are independent, from $E[g(X)h(Y)] = E[g(X)]E[h(Y)]$ with $g(X) = X$, $h(Y) = Y$, we get $E(XY) = E(X)E(Y)$.
 We conclude that independence implies uncorrelatedness.
 There cannot be any correlation between two independent r.vs ($\rho_{XY} = 0$).
 However, random variables can be uncorrelated without being independent.
Example
 Let $X \sim U(0,1)$ and $Y \sim U(0,1)$.
 Suppose X and Y are independent.
 Define $Z = X + Y$, $W = X - Y$.
 Show that Z and W are dependent but uncorrelated r.vs.
Solution
 $z = x + y$, $w = x - y$ gives the only solution set to be
$$x = \frac{z + w}{2}, \qquad y = \frac{z - w}{2}.$$
 Moreover $0 \le z \le 2$, $-1 \le w \le 1$, $z + w \le 2$, $z - w \le 2$, $z \ge |w|$, and $|J(z,w)| = 1/2$.
Example - continued
 Thus
$$f_{ZW}(z,w) = \begin{cases} 1/2, & 0 \le z \le 2,\ -1 \le w \le 1,\ z + w \le 2,\ z - w \le 2,\ |w| \le z, \\ 0, & \text{otherwise} \end{cases}$$
 (the support is the square with vertices $(0,0)$, $(1,1)$, $(2,0)$, $(1,-1)$ in the $zw$ plane), and hence
$$f_Z(z) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dw = \begin{cases} \int_{-z}^{z} \tfrac{1}{2}\, dw = z, & 0 \le z \le 1, \\[2pt] \int_{-(2-z)}^{2-z} \tfrac{1}{2}\, dw = 2 - z, & 1 \le z \le 2. \end{cases}$$
Example - continued
 Or, by direct computation ($Z = X + Y$, so $f_Z$ is the convolution of $f_X$ and $f_Y$),
$$f_Z(z) = f_X(z) * f_Y(z) = \begin{cases} z, & 0 \le z \le 1, \\ 2 - z, & 1 \le z \le 2, \\ 0, & \text{otherwise}, \end{cases}$$
 and
$$f_W(w) = \int_{-\infty}^{\infty} f_{ZW}(z,w)\, dz = \begin{cases} \int_{|w|}^{2-|w|} \tfrac{1}{2}\, dz = 1 - |w|, & -1 \le w \le 1, \\ 0, & \text{otherwise}. \end{cases}$$
 We have $f_{ZW}(z,w) \ne f_Z(z)\, f_W(w)$.
 Thus Z and W are not independent.
Example - continued
 However,
$$E(ZW) = E[(X+Y)(X-Y)] = E(X^2) - E(Y^2) = 0,$$
 and
$$E(W) = E(X - Y) = 0,$$
 and hence
$$\mathrm{Cov}(Z,W) = E(ZW) - E(Z)E(W) = 0,$$
 implying that Z and W are uncorrelated random variables.
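 A short simulation of this example (sample size and seed are arbitrary) shows Z and W uncorrelated yet clearly dependent:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=200_000)
y = rng.uniform(size=200_000)
z, w = x + y, x - y

print(np.corrcoef(z, w)[0, 1])                      # ~0: uncorrelated
# Dependence: when z is near 2, |w| is forced to be small, so E[W^2 | Z] varies with z
print(np.mean(w[z > 1.9] ** 2), np.mean(w ** 2))    # clearly different values
```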
Example
 Let $Z = aX + bY$.
 Determine the variance of Z in terms of $\sigma_X$, $\sigma_Y$ and $\rho_{XY}$.
Solution
$$\mu_Z = E(Z) = E(aX + bY) = a\mu_X + b\mu_Y,$$
 and, using $\mathrm{Cov}(X,Y) = \rho_{XY}\,\sigma_X\sigma_Y$,
$$\sigma_Z^2 = \mathrm{Var}(Z) = E[(Z - \mu_Z)^2] = E\big[\big(a(X - \mu_X) + b(Y - \mu_Y)\big)^2\big]$$
$$= a^2 E[(X - \mu_X)^2] + 2ab\, E[(X - \mu_X)(Y - \mu_Y)] + b^2 E[(Y - \mu_Y)^2]$$
$$= a^2\sigma_X^2 + 2ab\,\rho_{XY}\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$
 If X and Y are independent, then $\rho_{XY} = 0$, and this reduces to
$$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2.$$
 Thus the variance of the sum of independent r.vs is the sum of their variances ($a = b = 1$).
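 A minimal numerical check of the formula $\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho_{XY}\sigma_X\sigma_Y + b^2\sigma_Y^2$ (all parameter values below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 2.0, -1.5
sx, sy, rho = 1.0, 2.0, 0.4

# Generate correlated X, Y with the chosen sigmas and rho
cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]
x, y = rng.multivariate_normal([0, 0], cov, size=500_000).T

z = a * x + b * y
predicted = a**2*sx**2 + 2*a*b*rho*sx*sy + b**2*sy**2
print(z.var(), predicted)      # the sample variance matches the formula
```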
Moments
$$E[X^k Y^m] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m f_{XY}(x,y)\, dx\, dy$$
represents the joint moment of order $(k,m)$ for X and Y.
 We can also define the joint characteristic function of two random variables.
 This will turn out to be useful for moment calculations.
Joint Characteristic Functions
 The joint characteristic function of X and Y is defined as
$$\Phi_{XY}(u,v) = E\big[e^{\,j(Xu + Yv)}\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{\,j(xu + yv)}\, f_{XY}(x,y)\, dx\, dy.$$
 Note that
$$|\Phi_{XY}(u,v)| \le \Phi_{XY}(0,0) = 1.$$
 It is easy to show that
$$E(XY) = \frac{1}{j^2}\,\frac{\partial^2 \Phi_{XY}(u,v)}{\partial u\,\partial v}\bigg|_{u=0,\,v=0}.$$
 If X and Y are independent r.vs, then from the definition of the joint characteristic function we obtain
$$\Phi_{XY}(u,v) = E(e^{juX})\,E(e^{jvY}) = \Phi_X(u)\,\Phi_Y(v).$$
 Also
$$\Phi_X(u) = \Phi_{XY}(u,0), \qquad \Phi_Y(v) = \Phi_{XY}(0,v).$$
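 As a sanity check, $\Phi_{XY}(u,v)$ can be estimated by the sample average of $e^{j(Xu+Yv)}$, and for independent X and Y it factors into $\Phi_X(u)\Phi_Y(v)$. A rough sketch (the distributions and the point $(u,v)$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200_000)             # X ~ N(0,1)
y = rng.exponential(size=200_000)        # Y ~ Exp(1), independent of X

u, v = 0.7, -1.3
phi_xy = np.mean(np.exp(1j * (u * x + v * y)))    # estimate of Phi_XY(u, v)
phi_x = np.mean(np.exp(1j * u * x))               # Phi_X(u) = Phi_XY(u, 0)
phi_y = np.mean(np.exp(1j * v * y))               # Phi_Y(v) = Phi_XY(0, v)
print(phi_xy, phi_x * phi_y)                      # approximately equal
```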
More on Gaussian r.vs
 X and Y are said to be jointly Gaussian, $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$, if their joint p.d.f has the form
$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right] \right\},$$
$$-\infty < x < \infty,\quad -\infty < y < \infty,\quad |\rho| < 1.$$
 By direct substitution and simplification, we obtain the joint characteristic function of two jointly Gaussian r.vs to be
$$\Phi_{XY}(u,v) = E\big(e^{\,j(Xu+Yv)}\big) = e^{\,j(\mu_X u + \mu_Y v) - \frac{1}{2}(\sigma_X^2 u^2 + 2\rho\,\sigma_X\sigma_Y uv + \sigma_Y^2 v^2)}.$$
 So,
$$\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}.$$
 By direct computation, it is easy to show that for two jointly Gaussian random variables,
$$\mathrm{Cov}(X,Y) = \rho\,\sigma_X\sigma_Y.$$
 Thus $\rho$ in $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$ represents the actual correlation coefficient of the two jointly Gaussian r.vs.
 Notice that $\rho = 0$ implies $f_{XY}(x,y) = f_X(x)\, f_Y(y)$.
 Thus if X and Y are jointly Gaussian, uncorrelatedness does imply independence between the two random variables.
 The Gaussian case is the only exception where the two concepts imply each other.
Example
 Let X and Y be jointly Gaussian r.vs with parameters $N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$.
 Define $Z = aX + bY$.
 Determine $f_Z(z)$.
Solution
 We make use of the characteristic function:
$$\Phi_Z(u) = E(e^{jZu}) = E\big(e^{\,j(aX + bY)u}\big) = E\big(e^{\,jauX + jbuY}\big) = \Phi_{XY}(au, bu).$$
 We get
$$\Phi_Z(u) = e^{\,j(a\mu_X + b\mu_Y)u - \frac{1}{2}(a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2)u^2} = e^{\,j\mu_Z u - \frac{1}{2}\sigma_Z^2 u^2},$$
 where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2.$$
Example - continued
 This has the same form as $\Phi_X(u) = \Phi_{XY}(u,0) = e^{\,j\mu_X u - \frac{1}{2}\sigma_X^2 u^2}$.
 We conclude that $Z = aX + bY$ is also Gaussian, with the mean and variance defined above.
 We can also conclude that any linear combination of jointly Gaussian r.vs generates a Gaussian r.v.
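 A small simulation consistent with this conclusion (coefficients and Gaussian parameters below are arbitrary): the sample mean, variance, and higher moments of $Z = aX + bY$ behave like those of $N(\mu_Z, \sigma_Z^2)$.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b = 1.0, -2.0
mu_x, mu_y, sx, sy, rho = 0.5, 1.0, 1.0, 2.0, 0.3

cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000).T
z = a * x + b * y

mu_z = a*mu_x + b*mu_y
var_z = a**2*sx**2 + 2*a*b*rho*sx*sy + b**2*sy**2
print(z.mean(), mu_z)                    # sample mean matches mu_Z
print(z.var(), var_z)                    # sample variance matches sigma_Z^2

zs = (z - mu_z) / np.sqrt(var_z)         # standardized Z should look like N(0,1)
print(np.mean(zs**3), np.mean(zs**4))    # ~0 and ~3, as for a standard normal
```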
Example
 Suppose X and Y are jointly Gaussian r.vs as in the previous example.
 Define the two linear combinations $Z = aX + bY$, $W = cX + dY$.
 What can we say about their joint distribution?
Solution
 The joint characteristic function of Z and W is given by
$$\Phi_{ZW}(u,v) = E\big(e^{\,j(Zu + Wv)}\big) = E\big(e^{\,j(aX+bY)u + j(cX+dY)v}\big) = E\big(e^{\,jX(au+cv) + jY(bu+dv)}\big) = \Phi_{XY}(au + cv,\, bu + dv).$$
Example - continued
 Thus
$$\Phi_{ZW}(u,v) = e^{\,j(\mu_Z u + \mu_W v) - \frac{1}{2}(\sigma_Z^2 u^2 + 2\rho_{ZW}\,\sigma_Z\sigma_W uv + \sigma_W^2 v^2)},$$
 where
$$\mu_Z = a\mu_X + b\mu_Y, \qquad \mu_W = c\mu_X + d\mu_Y,$$
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\sigma_Y + b^2\sigma_Y^2, \qquad \sigma_W^2 = c^2\sigma_X^2 + 2cd\,\rho\,\sigma_X\sigma_Y + d^2\sigma_Y^2,$$
 and
$$\rho_{ZW} = \frac{ac\,\sigma_X^2 + (ad + bc)\,\rho\,\sigma_X\sigma_Y + bd\,\sigma_Y^2}{\sigma_Z\,\sigma_W}.$$
 We conclude that Z and W are also jointly Gaussian r.vs, with the means, variances and correlation coefficient described above.
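 The $\rho_{ZW}$ formula can be checked numerically; the sketch below compares the sample correlation of Z and W with the formula (all coefficients and parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
a, b, c, d = 1.0, 2.0, -1.0, 0.5
sx, sy, rho = 1.0, 1.5, 0.4

cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]
x, y = rng.multivariate_normal([0, 0], cov, size=500_000).T
z, w = a*x + b*y, c*x + d*y

sz = np.sqrt(a**2*sx**2 + 2*a*b*rho*sx*sy + b**2*sy**2)
sw = np.sqrt(c**2*sx**2 + 2*c*d*rho*sx*sy + d**2*sy**2)
rho_zw = (a*c*sx**2 + (a*d + b*c)*rho*sx*sy + b*d*sy**2) / (sz * sw)
print(np.corrcoef(z, w)[0, 1], rho_zw)   # sample correlation matches the formula
```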
Example - summary
 Any two linear combinations of jointly Gaussian random variables
(independent or dependent) are also jointly Gaussian r.vs.
Gaussian input → Linear operator → Gaussian output
 Of course, we could have reached the same conclusion by deriving the joint p.d.f $f_{ZW}(z,w)$ directly.
 Gaussian random variables are also interesting because of the Central
Limit Theorem.
Central Limit Theorem
 Suppose $X_1, X_2, \ldots, X_n$ are a set of
- zero mean
- independent, identically distributed (i.i.d.) random variables
- with some common distribution.
 Consider their scaled sum
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}.$$
 Then asymptotically (as $n \to \infty$), $Y \to N(0, \sigma^2)$.
Central Limit Theorem - Proof
 We shall prove it here under the independence assumption; the theorem is true under even more general conditions.
 Let $\sigma^2$ represent the common variance. Since $E(X_i) = 0$, we have $\mathrm{Var}(X_i) = E(X_i^2) = \sigma^2$.
 Consider
$$\Phi_Y(u) = E(e^{jYu}) = E\big(e^{\,j(X_1 + X_2 + \cdots + X_n)u/\sqrt{n}}\big) = \prod_{i=1}^{n} E\big(e^{\,jX_i u/\sqrt{n}}\big) = \prod_{i=1}^{n} \Phi_{X_i}(u/\sqrt{n}),$$
 where we have made use of the independence of the r.vs $X_1, X_2, \ldots, X_n$.
Central Limit Theorem - Proof
 But
$$E\big(e^{\,jX_i u/\sqrt{n}}\big) = E\left[1 + \frac{jX_i u}{\sqrt{n}} + \frac{j^2 X_i^2 u^2}{2!\,n} + \frac{j^3 X_i^3 u^3}{3!\,n^{3/2}} + \cdots\right] = 1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right).$$
 Also,
$$\Phi_Y(u) = \left(1 - \frac{\sigma^2 u^2}{2n} + o\!\left(\frac{1}{n^{3/2}}\right)\right)^{n},$$
 and since $\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^{x}$,
$$\lim_{n\to\infty} \Phi_Y(u) = e^{-\sigma^2 u^2/2}.$$
 This is the characteristic function of a zero-mean normal r.v with variance $\sigma^2$, and this completes the proof.
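 A brief simulation of the theorem (the uniform parent distribution and the value of n are arbitrary choices): scaled sums of zero-mean uniform r.vs match $N(0, \sigma^2)$ in their first few moments.

```python
import numpy as np

rng = np.random.default_rng(8)
n, trials = 50, 200_000
sigma2 = 1.0 / 12.0                            # variance of U(-1/2, 1/2)

x = rng.uniform(-0.5, 0.5, size=(trials, n))   # zero-mean i.i.d. parents
y = x.sum(axis=1) / np.sqrt(n)                 # scaled sum Y

print(y.mean(), y.var(), sigma2)               # mean ~ 0, variance ~ sigma^2
ys = y / np.sqrt(sigma2)
print(np.mean(ys**3), np.mean(ys**4))          # ~0 and ~3, as for a Gaussian
```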
Central Limit Theorem - Conclusion
 A large sum of independent random variables each with finite variance
tends to behave like a normal random variable.
 The individual p.d.fs become unimportant when analyzing the behavior of the sum.
 If we model the noise phenomenon as the sum of a large number of
independent random variables, then this theorem allows us to conclude
that noise behaves like a Gaussian r.v.
 (e.g., electron motion in resistor components)
Central Limit Theorem - Conclusion
 The finite variance assumption is necessary.
 Consider the r.vs to be Cauchy distributed (we know that their variance is not defined, or is ∞), and let
$$Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}},$$
 where each $X_i \sim C(\alpha)$. Then, since $\Phi_{X_i}(u) = e^{-\alpha|u|}$, we get
$$\Phi_Y(u) = \prod_{i=1}^{n} \Phi_{X_i}(u/\sqrt{n}) = \left(e^{-\alpha|u|/\sqrt{n}}\right)^{n} = e^{-\alpha\sqrt{n}\,|u|} \sim C(\alpha\sqrt{n}),$$
 which shows that Y is still Cauchy, with parameter $\alpha\sqrt{n}$.
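 A quick illustration of why finite variance matters (a standard Cauchy, $\alpha = 1$, is used here for convenience): the scaled sum keeps heavy Cauchy tails instead of settling into a Gaussian.

```python
import numpy as np

rng = np.random.default_rng(9)
n, trials = 100, 200_000

x = rng.standard_cauchy(size=(trials, n))   # X_i ~ C(1)
y = x.sum(axis=1) / np.sqrt(n)              # scaled sum, still Cauchy with parameter sqrt(n)

# Heavy tails: |Y| exceeds 100 with noticeable probability, unlike any Gaussian of modest variance
print(np.mean(np.abs(y) > 100))
print(np.median(np.abs(y)), np.sqrt(n))     # median of |Y| ~ alpha*sqrt(n), the Cauchy scale
```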
Central Limit Theorem - Conclusion
 Joint characteristic functions are useful in determining the p.d.f of linear combinations of r.vs.
 For example, with X and Y independent Poisson r.vs with parameters $\lambda_1$ and $\lambda_2$ respectively, let $Z = X + Y$.
 Then
$$\Phi_Z(u) = \Phi_X(u)\,\Phi_Y(u).$$
 Since
$$\Phi_X(u) = e^{\lambda_1(e^{ju} - 1)}, \qquad \Phi_Y(u) = e^{\lambda_2(e^{ju} - 1)},$$
 we get
$$\Phi_Z(u) = e^{(\lambda_1 + \lambda_2)(e^{ju} - 1)} \sim P(\lambda_1 + \lambda_2),$$
 i.e., the sum of independent Poisson r.vs is also a Poisson random variable.
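 A short empirical check of this closure property (the rates below are arbitrary): samples of X + Y match a single Poisson with rate $\lambda_1 + \lambda_2$.

```python
import numpy as np

rng = np.random.default_rng(10)
lam1, lam2, trials = 2.0, 3.5, 500_000

z = rng.poisson(lam1, trials) + rng.poisson(lam2, trials)   # X + Y
ref = rng.poisson(lam1 + lam2, trials)                      # Poisson(lam1 + lam2)

for k in range(6):   # compare empirical P(Z = k) with the reference p.m.f
    print(k, np.mean(z == k), np.mean(ref == k))
```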