16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 4

Last time: Left off with characteristic function.

4. Prove $\phi_X(t) = \prod_i \phi_{X_i}(t)$, where $X = X_1 + X_2 + \ldots + X_n$ ($X_i$ independent)

Let $S = X_1 + X_2 + \ldots + X_n$, where the $X_i$ are independent.

$$
\begin{aligned}
\phi_S(t) = E\left[e^{jtS}\right] &= E\left[e^{jt(X_1 + X_2 + \ldots + X_n)}\right] \\
&= E\left[e^{jtX_1}\right] E\left[e^{jtX_2}\right] \cdots E\left[e^{jtX_n}\right] \\
&= \prod_i \phi_{X_i}(t)
\end{aligned}
$$

The expectation of the product factors into the product of expectations because the $X_i$ are independent.

This is the main reason why use of the characteristic function is convenient. The same result also follows from a more roundabout argument: the density function for the sum of n independent random variables is the nth-order convolution of the individual density functions, and convolution in the direct variable domain becomes multiplication in the transform domain.

5. MacLaurin series expansion of $\phi(t)$

Because f(x) is non-negative and $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (or, even better, $\int_{-\infty}^{\infty} |f(x)|\,dx = 1$), the integral $\int_{-\infty}^{\infty} |f(x)|\,dx$ converges, so f(x) is Fourier transformable. Thus the characteristic function $\phi(t)$ exists for all distributions and the inverse relation $\phi(t) \to f(x)$ holds for all distributions. When all the moments of the distribution exist and do not grow too quickly with n, $\phi(t)$ is also analytic for real values of t and can be expanded in a power series about t = 0. (The Cauchy distribution, which has no finite moments, is an example for which this expansion is not available.)

$$\phi(t) = \phi(0) + \phi^{(1)}(0)\,t + \frac{1}{2!}\phi^{(2)}(0)\,t^2 + \ldots + \frac{1}{n!}\phi^{(n)}(0)\,t^n + \ldots$$

$$\phi(t) = \int_{-\infty}^{\infty} f(x)\,e^{jtx}\,dx, \qquad \phi(0) = 1$$

$$\frac{d^n \phi(t)}{dt^n} = \int_{-\infty}^{\infty} f(x)\,(jx)^n e^{jtx}\,dx$$

$$\phi^{(n)}(0) = j^n \int_{-\infty}^{\infty} x^n f(x)\,dx = j^n\,\overline{X^n}$$

$$\phi(t) = 1 + j\,\overline{X}\,t + \frac{1}{2!}\,j^2\,\overline{X^2}\,t^2 + \ldots + \frac{1}{n!}\,j^n\,\overline{X^n}\,t^n + \ldots$$

The coefficients of the expansion are given by the moments of the distribution. Thus the characteristic function can be determined from the moments.
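As a numerical illustration of items 4 and 5, the following is a minimal sketch that assumes a simple discrete example (a fair six-sided die) in place of a continuous density. The helper names `char_fn`, `sum_pmf`, and `moments_from_char_fn` are hypothetical and chosen only for this example. It checks that the characteristic function of the sum of two independent dice equals the product of the individual characteristic functions, and it recovers the first two moments from finite-difference approximations to the derivatives of $\phi(t)$ at t = 0.

```python
import cmath

def char_fn(pmf, t):
    """Characteristic function E[exp(j t X)] of a discrete pmf {value: prob}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def sum_pmf(pmf_a, pmf_b):
    """pmf of A + B for independent A and B (discrete convolution)."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def moments_from_char_fn(pmf, h=1e-3):
    """Approximate E[X] and E[X^2] from central differences of phi at t = 0."""
    phi_p, phi_0, phi_m = char_fn(pmf, h), char_fn(pmf, 0.0), char_fn(pmf, -h)
    d1 = (phi_p - phi_m) / (2 * h)            # approximates phi'(0)  = j   E[X]
    d2 = (phi_p - 2 * phi_0 + phi_m) / h**2   # approximates phi''(0) = j^2 E[X^2]
    return (d1 / 1j).real, (d2 / (1j) ** 2).real

# Assumed example: one fair die, and the sum of two independent fair dice.
die = {k: 1 / 6 for k in range(1, 7)}
two_dice = sum_pmf(die, die)

t = 0.7  # arbitrary test point
lhs = char_fn(two_dice, t)      # phi_S(t) computed from the pmf of the sum
rhs = char_fn(die, t) ** 2      # product of the individual phi_{X_i}(t)

mean, second_moment = moments_from_char_fn(die)
print(abs(lhs - rhs), mean, second_moment)
```

The finite-difference step `h` is an arbitrary choice; the recovered moments agree with the exact values 3.5 and 91/6 only up to the truncation error of the difference formulas.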
Similarly, the moments can be determined from the characteristic function directly by

$$\overline{X^n} = \frac{1}{j^n} \left.\frac{d^n \phi(t)}{dt^n}\right|_{t=0}$$

or by expanding $\phi(t)$ into its power series in some other way and identifying the coefficients of the various powers of t: since $\phi(t) = 1 + j\,\overline{X}\,t + \ldots$, the coefficient of $t^n$ is $j^n\,\overline{X^n}/n!$.

The Generating Function

The generating function has its most useful application to random variables which take integer values only. Examples of such would be the number of telephone calls into a switchboard in a certain time interval, the number of cars entering a toll station in a certain time interval, the number of times a 7 is thrown in n tosses of 2 dice, etc.

For integer-valued random variables, the Generating Function yields the same advantages as the Characteristic Function and is of simpler form.

Consider a random variable which takes the integer values k:

$$P(X = k) = p_k \qquad (k = 0, 1, 2, \ldots)$$

For a discrete distribution you can sum in lieu of integration. The Characteristic Function for this random variable is

$$\phi(t) = E\left[e^{jtX}\right] = \sum_{k=0}^{\infty} e^{jtk} p_k = \sum_{k=0}^{\infty} p_k \left(e^{jt}\right)^k$$

If we define a new variable $s = e^{jt}$, we have

$$G(s) = \sum_{k=0}^{\infty} p_k s^k$$

which is called the Generating Function. It has all the interesting properties of the characteristic function. Note that $t \to 0$ corresponds to $s \to 1$.

Let's establish the connection between moments of a distribution and the generating function:

$$\frac{dG}{ds} = \sum_{k=0}^{\infty} k\,p_k\,s^{k-1}$$

$$\frac{d^2G}{ds^2} = \sum_{k=0}^{\infty} k(k-1)\,p_k\,s^{k-2} = \sum_{k=0}^{\infty} k^2 p_k\,s^{k-2} - \sum_{k=0}^{\infty} k\,p_k\,s^{k-2}$$

Just calculate $\left.\frac{dG}{ds}\right|_{s=1}$ and $\left.\frac{d^2G}{ds^2}\right|_{s=1}$ and reorganize them in terms of $\overline{X}$ and $\overline{X^2}$:

$$\left.\frac{dG}{ds}\right|_{s=1} = \sum_{k=0}^{\infty} k\,p_k = \overline{X} \qquad \text{(1st moment expression)}$$

$$\left.\frac{d^2G}{ds^2}\right|_{s=1} = \sum_{k=0}^{\infty} k^2 p_k - \sum_{k=0}^{\infty} k\,p_k = \overline{X^2} - \overline{X}$$

$$\overline{X^2} = \left.\frac{d^2G}{ds^2}\right|_{s=1} + \left.\frac{dG}{ds}\right|_{s=1} \qquad \text{(2nd moment expression)}$$

Each moment is a linear combination of the derivative of the same order and the lower-order derivatives, all evaluated at s = 1.
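The moment expressions above can be tried on the dice example mentioned earlier. The sketch below assumes that the number of 7s in n tosses of 2 dice is binomial with success probability 1/6 per toss, and takes n = 12 as an arbitrary value. The helper names `binomial_pmf` and `gf_derivative_at_one` are hypothetical. Because G(s) is a finite polynomial here, its derivatives at s = 1 are evaluated term by term rather than numerically.

```python
from math import comb

def binomial_pmf(n, p):
    """p_k = P(X = k) for k = 0..n, X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def gf_derivative_at_one(pmf, order):
    """d^order G / ds^order at s = 1, where G(s) = sum_k p_k s^k."""
    total = 0.0
    for k, pk in enumerate(pmf):
        falling = 1
        for i in range(order):
            falling *= (k - i)   # k (k-1) ... (k - order + 1)
        total += falling * pk
    return total

# Assumed example: number of 7s in n = 12 tosses of two dice, P(7) = 1/6.
n, p = 12, 1 / 6
pmf = binomial_pmf(n, p)

g1 = gf_derivative_at_one(pmf, 1)   # dG/ds at s = 1
g2 = gf_derivative_at_one(pmf, 2)   # d^2G/ds^2 at s = 1

mean = g1                    # 1st moment expression
second_moment = g2 + g1      # 2nd moment expression
variance = second_moment - mean**2
print(mean, second_moment, variance)
```

For this binomial case the results can be compared with the known closed forms np for the mean and np(1 - p) for the variance.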
The generating function for the sum of independent integer-valued variables is the product of their generating functions. This is harder to prove than the same property of the characteristic function, but it does, in fact, hold true.

Multiple Random Variables

To characterize a joint set of random variables, define a probability distribution function

$$F(x) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$$

This is called the joint probability distribution function.

Properties:

If any of the arguments $x_i$ goes to $-\infty$, then $F(x) \to 0$.

$$\lim_{\text{any } x_i \to -\infty} F(x) = 0$$

If all of the $x_i$ go to $\infty$, then $F(x) \to 1$.

$$\lim_{\text{all } x_i \to \infty} F(x) = 1$$

$F(x)$ is monotonically non-decreasing in each $x_i$.

Define the joint density function by differentiation:

$$f(x) = \frac{\partial^n}{\partial x_1 \partial x_2 \ldots \partial x_n} F(x)$$

$$f(x) \ge 0, \quad \forall x$$

$$F_{x_1 \ldots x_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} du_1 \ldots \int_{-\infty}^{x_n} du_n\, f_{x_1 \ldots x_n}(u_1, \ldots, u_n)$$

Setting each $x_i \to \infty$,

$$\int_{-\infty}^{\infty} du_1 \ldots \int_{-\infty}^{\infty} du_n\, f_{x_1, \ldots, x_n}(u_1, \ldots, u_n) = 1$$

The joint distribution function of a subset of the variables, say the first k, follows from that of the full set:

$$
\begin{aligned}
F_{x_1, \ldots, x_k}(x_1, \ldots, x_k) &= P(X_1 \le x_1, \ldots, X_k \le x_k) \\
&= P(X_1 \le x_1, \ldots, X_k \le x_k, X_{k+1} \le \infty, \ldots, X_n \le \infty) \\
&= F_{x_1, \ldots, x_n}(x_1, \ldots, x_k, \infty, \ldots, \infty)
\end{aligned}
$$

For the density function:

$$
\begin{aligned}
f_{x_1, \ldots, x_k}(x_1, \ldots, x_k) &= \frac{\partial^k}{\partial x_1 \partial x_2 \ldots \partial x_k} F_{x_1, \ldots, x_k}(x_1, \ldots, x_k) \\
&= \frac{\partial^k}{\partial x_1 \partial x_2 \ldots \partial x_k} F_{x_1, \ldots, x_n}(x_1, \ldots, x_k, \infty, \ldots, \infty) \\
&= \frac{\partial^k}{\partial x_1 \partial x_2 \ldots \partial x_k} \int_{-\infty}^{x_1} du_1 \ldots \int_{-\infty}^{x_k} du_k \int_{-\infty}^{\infty} du_{k+1} \ldots \int_{-\infty}^{\infty} du_n\, f_{x_1, \ldots, x_n}(u_1, \ldots, u_n) \\
&= \int_{-\infty}^{\infty} du_{k+1} \ldots \int_{-\infty}^{\infty} du_n\, f_{x_1, \ldots, x_n}(x_1, \ldots, x_k, u_{k+1}, \ldots, u_n) \\
&= \int_{-\infty}^{\infty} dx_{k+1} \ldots \int_{-\infty}^{\infty} dx_n\, f_{x_1, \ldots, x_n}(x_1, \ldots, x_n)
\end{aligned}
$$

The last line only renames the dummy variables of integration.

Marginal density

If you integrate the joint density over all variables but one, the result is referred to as the marginal density.

$$f_{x_i}(x_i) = \underbrace{\int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n}_{n-1 \text{ terms: all except } x_i} f_{x_1, \ldots, x_n}(x_1, \ldots, x_n)$$

Mutually independent sets of random variables

Definition of independence:

$$P[X_1 \in s_1, X_2 \in s_2, \ldots] = P[X_1 \in s_1]\, P[X_2 \in s_2] \ldots \quad \text{for any sets } s_1, s_2, \ldots$$

The product rule holds for joint probability distribution and density functions for independent random variables.

$$F_{x_1, x_2, x_3, \ldots}(x_1, x_2, x_3, \ldots) = F_{x_1}(x_1)\, F_{x_2}(x_2)\, F_{x_3}(x_3) \ldots$$

$$f_{x_1, x_2, x_3, \ldots}(x_1, x_2, x_3, \ldots) = f_{x_1}(x_1)\, f_{x_2}(x_2)\, f_{x_3}(x_3) \ldots$$

Expectations

$$E[g(x)] = \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, g(x)\, f(x)$$

For the sum of multiple random variables:

$$
\begin{aligned}
E[X_1 + X_2 + \ldots + X_n] &= \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, (x_1 + x_2 + \ldots + x_n)\, f_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \\
&= \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, x_1 f_{x_1, \ldots, x_n}(x_1, \ldots, x_n) + \ldots + \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, x_n f_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \\
&= \int_{-\infty}^{\infty} x_1 f_{x_1}(x_1)\,dx_1 + \int_{-\infty}^{\infty} x_2 f_{x_2}(x_2)\,dx_2 + \ldots + \int_{-\infty}^{\infty} x_n f_{x_n}(x_n)\,dx_n \\
&= E[X_1] + E[X_2] + \ldots + E[X_n]
\end{aligned}
$$

The third line follows because integrating the joint density over all variables except $x_i$ leaves the marginal density $f_{x_i}(x_i)$. This relation is true whether or not the $X_i$ are independent.

For the product of multiple independent random variables:

$$
\begin{aligned}
E[X_1 X_2 \ldots X_n] &= \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, (x_1 x_2 \ldots x_n)\, f_{x_1, \ldots, x_n}(x_1, \ldots, x_n) \\
&= \int_{-\infty}^{\infty} dx_1 \ldots \int_{-\infty}^{\infty} dx_n\, (x_1 x_2 \ldots x_n)\, f_{x_1}(x_1)\, f_{x_2}(x_2) \ldots f_{x_n}(x_n) \\
&= \int_{-\infty}^{\infty} x_1 f_{x_1}(x_1)\,dx_1 \int_{-\infty}^{\infty} x_2 f_{x_2}(x_2)\,dx_2 \ldots \int_{-\infty}^{\infty} x_n f_{x_n}(x_n)\,dx_n \\
&= E[X_1]\, E[X_2] \ldots E[X_n]
\end{aligned}
$$
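The two expectation results can be illustrated with a small discrete analogue, where sums replace the integrals. The sketch below is built on an assumed joint probability table for two binary variables; the table values and the helper names `marginals`, `expect`, and `mean` are hypothetical and serve only as an example. It computes the marginals by summing out the other variable, then compares E[X + Y] and E[XY] for a dependent pair and for an independent pair built from the same marginals.

```python
def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): prob}."""
    fx, fy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p   # sum over y
        fy[y] = fy.get(y, 0.0) + p   # sum over x
    return fx, fy

def expect(joint, g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def mean(pmf):
    """Mean of a one-dimensional pmf {value: prob}."""
    return sum(v * p for v, p in pmf.items())

# Assumed joint table: X and Y tend to take the same value, so they are dependent.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
fx, fy = marginals(dependent)

# Independent pair with the same marginals: joint pmf is the product of marginals.
independent = {(x, y): px * py for x, px in fx.items() for y, py in fy.items()}

sum_dep = expect(dependent, lambda x, y: x + y)     # equals E[X] + E[Y]
prod_dep = expect(dependent, lambda x, y: x * y)    # differs from E[X] E[Y]
prod_ind = expect(independent, lambda x, y: x * y)  # equals E[X] E[Y]
print(sum_dep, prod_dep, prod_ind, mean(fx) * mean(fy))
```

In this example the sum rule holds for the dependent pair, while the product rule holds only for the independent pair.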