Multiple Random Variables

Joint Probability Density

Let X and Y be two random variables. Their joint distribution function is
$F_{XY}(x,y) \equiv P[X \le x \cap Y \le y]$
It has the properties
$0 \le F_{XY}(x,y) \le 1,\quad -\infty < x < \infty,\ -\infty < y < \infty$
$F_{XY}(-\infty,-\infty) = F_{XY}(x,-\infty) = F_{XY}(-\infty,y) = 0,\qquad F_{XY}(\infty,\infty) = 1$
$F_{XY}(x,y)$ does not decrease if either x or y increases or both increase
$F_{XY}(\infty,y) = F_Y(y)\quad\text{and}\quad F_{XY}(x,\infty) = F_X(x)$

[Figure: joint distribution function for tossing two dice.]

The joint probability density function is
$f_{XY}(x,y) = \frac{\partial^2}{\partial x\,\partial y}F_{XY}(x,y)$
$f_{XY}(x,y) \ge 0,\quad -\infty < x < \infty,\ -\infty < y < \infty$
$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1,\qquad F_{XY}(x,y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(\alpha,\beta)\,d\alpha\,d\beta$
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy\quad\text{and}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$
$P[x_1 < X \le x_2,\ y_1 < Y \le y_2] = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy$
$E\big(g(X,Y)\big) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{XY}(x,y)\,dx\,dy$

Combinations of Two Random Variables

Example: X and Y are independent, identically distributed (i.i.d.) random variables with common PDF
$f_X(x) = e^{-x}u(x),\qquad f_Y(y) = e^{-y}u(y)$
Find the PDF of Z = X/Y. Since X and Y are never negative, Z is never negative.
$F_Z(z) = P[X/Y \le z] \;\Rightarrow\; F_Z(z) = P[X \le zY \cap Y > 0] + P[X \ge zY \cap Y < 0]$
Since Y is never negative,
$F_Z(z) = P[X \le zY \cap Y > 0] = \int_{-\infty}^{\infty}\int_{-\infty}^{zy} f_{XY}(x,y)\,dx\,dy = \int_0^{\infty}\int_0^{zy} e^{-x}e^{-y}\,dx\,dy$
Using Leibniz's formula for differentiating an integral,
$\frac{d}{dz}\left[\int_{a(z)}^{b(z)} g(x,z)\,dx\right] = g\big(b(z),z\big)\frac{db(z)}{dz} - g\big(a(z),z\big)\frac{da(z)}{dz} + \int_{a(z)}^{b(z)}\frac{\partial g(x,z)}{\partial z}\,dx$
we get
$f_Z(z) = \frac{\partial}{\partial z}F_Z(z) = \int_0^{\infty} y\,e^{-zy}e^{-y}\,dy,\quad z > 0 \;\Rightarrow\; f_Z(z) = \frac{u(z)}{(z+1)^2}$
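This result can be sanity-checked numerically. The following is a minimal Monte Carlo sketch (assuming NumPy is available; the sample size and bin grid are illustrative choices, not part of the original example):

```python
import numpy as np

# Sketch: Monte Carlo check of f_Z(z) = u(z)/(z + 1)^2 for Z = X/Y with
# X, Y independent Exp(1). Sample size and bin grid are illustrative.
rng = np.random.default_rng(seed=0)
n = 1_000_000
z = rng.exponential(1.0, size=n) / rng.exponential(1.0, size=n)

edges = np.linspace(0.0, 5.0, 51)
counts, _ = np.histogram(z, bins=edges)
empirical = counts / (n * np.diff(edges))          # empirical density on [0, 5]
centers = 0.5 * (edges[:-1] + edges[1:])
analytic = 1.0 / (centers + 1.0) ** 2
print(np.max(np.abs(empirical - analytic)))        # small, roughly 0.01 or less
```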
Combinations of Two Random Variables

Example: The joint PDF of X and Y is defined as
$f_{XY}(x,y) = \begin{cases}6x, & x \ge 0,\ y \ge 0,\ x + y \le 1\\ 0, & \text{otherwise}\end{cases}$
Define Z = X − Y. Find the PDF of Z.

Given the constraints on X and Y, $-1 \le Z \le 1$. The line Z = X − Y intersects X + Y = 1 at $X = \frac{1+Z}{2}$, $Y = \frac{1-Z}{2}$.

For $0 \le z \le 1$,
$F_Z(z) = 1 - \int_0^{(1-z)/2}\int_{y+z}^{1-y} 6x\,dx\,dy = 1 - \int_0^{(1-z)/2}\big[3x^2\big]_{y+z}^{1-y}\,dy$
$F_Z(z) = 1 - \tfrac{3}{4}(1+z)(1-z)^2 \;\Rightarrow\; f_Z(z) = \tfrac{3}{4}(1-z)(1+3z)$

For $-1 \le z \le 0$, the region $X \le Y + z$ splits at the line x + y = 1 into two pieces that turn out to contribute equally, so
$F_Z(z) = 2\int_{-z}^{(1-z)/2}\int_0^{y+z} 6x\,dx\,dy = 6\int_{-z}^{(1-z)/2}(y+z)^2\,dy = \frac{(1+z)^3}{4} \;\Rightarrow\; f_Z(z) = \tfrac{3}{4}(1+z)^2$

Joint Probability Density

Let
$f_{XY}(x,y) = \frac{1}{w_X w_Y}\,\mathrm{rect}\!\left(\frac{x - X_0}{w_X}\right)\mathrm{rect}\!\left(\frac{y - Y_0}{w_Y}\right)$
Then
$E(X) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\,f_{XY}(x,y)\,dx\,dy = X_0,\qquad E(Y) = Y_0$
$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy = X_0 Y_0$
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \frac{1}{w_X}\,\mathrm{rect}\!\left(\frac{x - X_0}{w_X}\right)$

Conditional Probability

Let $A = \{Y \le y\}$. Then
$F_{X|A}(x) = \frac{P[X \le x \cap A]}{P[A]},\qquad F_{X|Y\le y}(x) = \frac{P[X \le x \cap Y \le y]}{P[Y \le y]} = \frac{F_{XY}(x,y)}{F_Y(y)}$
Let $A = \{y_1 < Y \le y_2\}$. Then
$F_{X|y_1<Y\le y_2}(x) = \frac{F_{XY}(x,y_2) - F_{XY}(x,y_1)}{F_Y(y_2) - F_Y(y_1)}$
Let $A = \{Y = y\}$. Then
$F_{X|Y=y}(x) = \lim_{\Delta y\to 0}\frac{F_{XY}(x,y+\Delta y) - F_{XY}(x,y)}{F_Y(y+\Delta y) - F_Y(y)} = \frac{\frac{\partial}{\partial y}F_{XY}(x,y)}{\frac{d}{dy}F_Y(y)}$
$f_{X|Y=y}(x) = \frac{\partial}{\partial x}F_{X|Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)},\qquad\text{similarly}\qquad f_{Y|X=x}(y) = \frac{f_{XY}(x,y)}{f_X(x)}$

In a simplified notation,
$f_{X|Y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)}\quad\text{and}\quad f_{Y|X}(y) = \frac{f_{XY}(x,y)}{f_X(x)}$
Bayes' Theorem:
$f_{X|Y}(x)\,f_Y(y) = f_{Y|X}(y)\,f_X(x)$
Marginal PDF's from joint or conditional PDF's:
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy = \int_{-\infty}^{\infty} f_{X|Y}(x)\,f_Y(y)\,dy,\qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx = \int_{-\infty}^{\infty} f_{Y|X}(y)\,f_X(x)\,dx$

Example: Let a message X with a known PDF be corrupted by additive noise N, also with a known PDF, and received as Y = X + N. Then the best estimate that can be made of the message X is the value at the peak of the conditional PDF
$f_{X|Y}(x) = \frac{f_{Y|X}(y)\,f_X(x)}{f_Y(y)}$
If the PDF of N is $f_N(n)$, then for any known value of X the received Y is that value plus the noise, so the conditional PDF of Y given X is $f_{Y|X}(y) = f_N(y - X)$.

Using Bayes' theorem,
$f_{X|Y}(x) = \frac{f_{Y|X}(y)\,f_X(x)}{f_Y(y)} = \frac{f_N(y-x)\,f_X(x)}{\int_{-\infty}^{\infty} f_{Y|X}(y)\,f_X(x)\,dx} = \frac{f_N(y-x)\,f_X(x)}{\int_{-\infty}^{\infty} f_N(y-x)\,f_X(x)\,dx}$
Now the conditional PDF of X given Y can be computed.

To make the example concrete let
$f_X(x) = \frac{e^{-x/E(X)}}{E(X)}u(x),\qquad f_N(n) = \frac{1}{\sigma_N\sqrt{2\pi}}e^{-n^2/2\sigma_N^2}$
Then the conditional PDF of X given Y follows from the formula above with denominator
$f_Y(y) = \frac{1}{2E(X)}\exp\!\left[\frac{\sigma_N^2}{2E(X)^2} - \frac{y}{E(X)}\right]\left[1 + \mathrm{erf}\!\left(\frac{y - \sigma_N^2/E(X)}{\sqrt{2}\,\sigma_N}\right)\right]$
where erf is the error function.
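The Bayes'-theorem recipe above can be carried out numerically on a grid. The sketch below (assuming NumPy; the prior mean, noise standard deviation, observed value, and grid are illustrative choices) normalizes $f_N(y-x)f_X(x)$ and locates its peak, the "best estimate" of X:

```python
import numpy as np

# Sketch: Bayes' theorem on a grid for the message-plus-noise example,
# f_{X|Y}(x|y) = f_N(y - x) f_X(x) / (integral of the numerator over x).
# The prior mean, noise sigma, and observed y are illustrative choices.
mean_x, sigma_n, y_obs = 1.0, 0.5, 1.8

x = np.linspace(0.0, 10.0, 4001)                      # grid over x >= 0
prior = np.exp(-x / mean_x) / mean_x                  # f_X(x) = e^{-x/E(X)} u(x) / E(X)
likelihood = np.exp(-(y_obs - x) ** 2 / (2 * sigma_n ** 2)) / (sigma_n * np.sqrt(2 * np.pi))

numerator = likelihood * prior
posterior = numerator / np.trapz(numerator, x)        # f_{X|Y}(x | y_obs)

x_hat = x[np.argmax(posterior)]                       # peak = "best estimate" of X
print(x_hat)   # lies below y_obs because the exponential prior favors smaller x
```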
Independent Random Variables

If two random variables X and Y are independent then
$f_{X|Y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)} = f_X(x)\quad\text{and}\quad f_{Y|X}(y) = \frac{f_{XY}(x,y)}{f_X(x)} = f_Y(y)$
Therefore $f_{XY}(x,y) = f_X(x)\,f_Y(y)$ and their correlation is the product of their expected values,
$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy\int_{-\infty}^{\infty} x\,f_X(x)\,dx = E(X)\,E(Y)$

Covariance
$\sigma_{XY} \equiv E\Big(\big[X - E(X)\big]\big[Y - E(Y)\big]^*\Big)$
$\sigma_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big(x - E(X)\big)\big(y - E(Y)\big)^* f_{XY}(x,y)\,dx\,dy = E(XY^*) - E(X)\,E(Y^*)$
If X and Y are independent, $\sigma_{XY} = E(X)\,E(Y^*) - E(X)\,E(Y^*) = 0$.

Correlation Coefficient
$\rho_{XY} = E\!\left(\frac{X - E(X)}{\sigma_X}\times\frac{Y^* - E(Y^*)}{\sigma_Y}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(\frac{x - E(X)}{\sigma_X}\right)\left(\frac{y^* - E(Y^*)}{\sigma_Y}\right)f_{XY}(x,y)\,dx\,dy$
$\rho_{XY} = \frac{E(XY^*) - E(X)\,E(Y^*)}{\sigma_X\sigma_Y} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y}$
If X and Y are independent, ρ = 0. If they are perfectly positively correlated, ρ = +1, and if they are perfectly negatively correlated, ρ = −1.

If two random variables are independent, their covariance is zero. However, if two random variables have a zero covariance that does not mean they are necessarily independent.
Independence ⇒ Zero Covariance
Zero Covariance ⇏ Independence

In the traditional jargon of random variable analysis, two "uncorrelated" random variables have a covariance of zero. Unfortunately, this does not also imply that their correlation is zero. If their correlation is zero they are said to be orthogonal.
X and Y are "uncorrelated" ⇒ $\sigma_{XY} = 0$
X and Y are "uncorrelated" ⇏ $E(XY) = 0$

The variance of a sum of random variables X and Y is
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y$
If Z is a linear combination of random variables $X_i$,
$Z = a_0 + \sum_{i=1}^N a_i X_i$
then
$E(Z) = a_0 + \sum_{i=1}^N a_i\,E(X_i)$
$\sigma_Z^2 = \sum_{i=1}^N\sum_{j=1}^N a_i a_j\,\sigma_{X_iX_j} = \sum_{i=1}^N a_i^2\sigma_{X_i}^2 + \sum_{i=1}^N\sum_{\substack{j=1\\ j\ne i}}^N a_i a_j\,\sigma_{X_iX_j}$
If the X's are all independent of each other, the variance of the linear combination is a linear combination of the variances,
$\sigma_Z^2 = \sum_{i=1}^N a_i^2\sigma_{X_i}^2$
If Z is simply the sum of the X's, and the X's are all independent of each other, then the variance of the sum is the sum of the variances,
$\sigma_Z^2 = \sum_{i=1}^N \sigma_{X_i}^2$

One Function of Two Random Variables

Let Z = g(X,Y). Find the PDF of Z.
$F_Z(z) = P[Z \le z] = P[g(X,Y) \le z] = P[(X,Y)\in R_Z]$
where $R_Z$ is the region in the XY plane where $g(X,Y) \le z$. For example, let Z = X + Y.

Probability Density of a Sum of Random Variables

Let Z = X + Y. Then for Z to be less than z, X must be less than z − Y. Therefore the distribution function for Z is
$F_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z-y} f_{XY}(x,y)\,dx\,dy$
If X and Y are independent,
$F_Z(z) = \int_{-\infty}^{\infty} f_Y(y)\left(\int_{-\infty}^{z-y} f_X(x)\,dx\right)dy$
and it can be shown that
$f_Z(z) = \int_{-\infty}^{\infty} f_Y(y)\,f_X(z-y)\,dy = f_Y(z) * f_X(z)$

Moment Generating Functions

The moment-generating function $\Phi_X(s)$ of a CV (continuous-value) random variable X is defined by
$\Phi_X(s) = E\big(e^{sX}\big) = \int_{-\infty}^{\infty} f_X(x)\,e^{sx}\,dx$
Relation to the Laplace transform:
$\Phi_X(s) = \mathcal{L}\big[f_X(x)\big]_{s\to -s}$
Since
$\frac{d}{ds}\Phi_X(s) = \int_{-\infty}^{\infty} f_X(x)\,x e^{sx}\,dx \;\Rightarrow\; \left[\frac{d}{ds}\Phi_X(s)\right]_{s\to 0} = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = E(X)$
the relation to moments is
$E(X^n) = \left[\frac{d^n}{ds^n}\Phi_X(s)\right]_{s\to 0}$
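A quick way to see the moment relation in action is to build an empirical MGF and differentiate it numerically. This sketch (assuming NumPy; Exp(1) is just a convenient test case with known moments E(X) = 1 and E(X²) = 2) is illustrative only:

```python
import numpy as np

# Sketch: estimate Phi_X(s) = E(e^{sX}) empirically for X ~ Exp(1), then recover
# the first two moments from derivatives at s = 0 by finite differences.
# For Exp(1) the exact MGF is 1/(1 - s) for s < 1, so E(X) = 1 and E(X^2) = 2.
rng = np.random.default_rng(seed=12)
x = rng.exponential(scale=1.0, size=2_000_000)

def phi(s):
    return np.mean(np.exp(s * x))      # empirical moment-generating function

h = 1e-3
d1 = (phi(h) - phi(-h)) / (2 * h)                   # approximates E(X) = 1
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2       # approximates E(X^2) = 2
print(d1, d2)
```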
Moment Generating Functions

The moment-generating function $\Phi_X(z)$ of a DV (discrete-value) random variable X is defined by
$\Phi_X(z) = E\big(z^X\big) = \sum_{n=-\infty}^{\infty} P[X = n]\,z^n = \sum_{n=-\infty}^{\infty} p_X(n)\,z^n$
Relation to the z transform:
$\Phi_X(z) = \mathcal{Z}\big(p_X(n)\big)_{z\to z^{-1}}$
$\frac{d}{dz}\Phi_X(z) = E\big(Xz^{X-1}\big),\qquad \frac{d^2}{dz^2}\Phi_X(z) = E\big(X(X-1)z^{X-2}\big)$
Relation to moments:
$\left[\frac{d}{dz}\Phi_X(z)\right]_{z=1} = E(X),\qquad \left[\frac{d^2}{dz^2}\Phi_X(z)\right]_{z=1} = E(X^2) - E(X)$

The Chebyshev Inequality

For any random variable X and any ε > 0,
$P\big[|X - \mu_X| \ge \varepsilon\big] = \int_{-\infty}^{\mu_X-\varepsilon} f_X(x)\,dx + \int_{\mu_X+\varepsilon}^{\infty} f_X(x)\,dx = \int_{|x-\mu_X|\ge\varepsilon} f_X(x)\,dx$
Also
$\sigma_X^2 = \int_{-\infty}^{\infty}(x - \mu_X)^2 f_X(x)\,dx \ge \int_{|x-\mu_X|\ge\varepsilon}(x - \mu_X)^2 f_X(x)\,dx \ge \varepsilon^2\int_{|x-\mu_X|\ge\varepsilon} f_X(x)\,dx$
It then follows that
$P\big[|X - \mu_X| \ge \varepsilon\big] \le \sigma_X^2/\varepsilon^2$
This is known as the Chebyshev inequality. Using it we can put a bound on the probability of an event with knowledge only of the variance and no knowledge of the PMF or PDF.

The Markov Inequality

For any random variable X, let $f_X(x) = 0$ for all x < 0 and let ε be a positive constant. Then
$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = \int_0^{\infty} x\,f_X(x)\,dx \ge \int_{\varepsilon}^{\infty} x\,f_X(x)\,dx \ge \varepsilon\int_{\varepsilon}^{\infty} f_X(x)\,dx = \varepsilon\,P[X \ge \varepsilon]$
Therefore
$P[X \ge \varepsilon] \le \frac{E(X)}{\varepsilon}$
This is known as the Markov inequality. It allows us to bound the probability of certain events with knowledge only of the expected value of the random variable and no knowledge of the PMF or PDF except that it is zero for negative values.

The Weak Law of Large Numbers

Consider taking N independent values $\{X_1, X_2, \ldots, X_N\}$ from a random variable X in order to develop an understanding of the nature of X. They constitute a sampling of X. The sample mean is
$\bar X_N = \frac{1}{N}\sum_{n=1}^N X_n$
The sample size is finite, so different sets of N values will yield different sample means. Thus $\bar X_N$ is itself a random variable and it is an estimator of the expected value of X, E(X). A good estimator has two important qualities: it is unbiased and consistent. Unbiased means $E(\bar X_N) = E(X)$. Consistent means that as N is increased the variance of the estimator decreases.

Using the Chebyshev inequality we can put a bound on the probable deviation of $\bar X_N$ from its expected value,
$P\big[|\bar X_N - E(X)| \ge \varepsilon\big] \le \frac{\sigma_X^2}{N\varepsilon^2},\quad \varepsilon > 0$
This implies that
$P\big[|\bar X_N - E(X)| < \varepsilon\big] \ge 1 - \frac{\sigma_X^2}{N\varepsilon^2},\quad \varepsilon > 0$
The probability that $\bar X_N$ is within some small deviation of E(X) can be made as close to one as desired by making N large enough. Letting N approach infinity,
$\lim_{N\to\infty}P\big[|\bar X_N - E(X)| < \varepsilon\big] = 1,\quad \varepsilon > 0$
The Weak Law of Large Numbers states that if $\{X_1, X_2, \ldots, X_N\}$ is a sequence of iid random variable values and E(X) is finite, then
$\lim_{N\to\infty}P\big[|\bar X_N - E(X)| < \varepsilon\big] = 1,\quad \varepsilon > 0$
This kind of convergence is called convergence in probability.
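The weak-law bound can be checked by simulation. Here is a minimal sketch (assuming NumPy; N, ε, and the number of repeated experiments are arbitrary illustrative values) comparing the empirical deviation probability of the sample mean with the Chebyshev bound $\sigma_X^2/(N\varepsilon^2)$:

```python
import numpy as np

# Sketch: check the Chebyshev bound on the sample mean of N iid Exp(1) values,
# P[|Xbar_N - E(X)| >= eps] <= sigma_X^2 / (N * eps^2). N, eps, and the number
# of repeated experiments are illustrative.
rng = np.random.default_rng(seed=9)
n_samples, eps, trials = 100, 0.2, 100_000
mean_x, var_x = 1.0, 1.0                          # for Exp(1)

sample_means = rng.exponential(scale=1.0, size=(trials, n_samples)).mean(axis=1)
p_emp = np.mean(np.abs(sample_means - mean_x) >= eps)
bound = var_x / (n_samples * eps**2)
print(p_emp, bound)        # empirical probability (roughly 0.05) vs. bound 0.25
```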
The Strong Law of Large Numbers

Now consider a sequence $\{X_1, X_2, \ldots\}$ of independent values of X and let X have an expected value E(X) and a finite variance $\sigma_X^2$. Also consider a sequence of sample means $\{\bar X_1, \bar X_2, \ldots\}$ defined by $\bar X_N = \frac{1}{N}\sum_{n=1}^N X_n$. The Strong Law of Large Numbers says
$P\Big[\lim_{N\to\infty}\bar X_N = E(X)\Big] = 1$
This kind of convergence is called almost sure convergence.

The Laws of Large Numbers

The Weak Law of Large Numbers,
$\lim_{N\to\infty}P\big[|\bar X_N - E(X)| < \varepsilon\big] = 1,\quad \varepsilon > 0$
and the Strong Law of Large Numbers,
$P\Big[\lim_{N\to\infty}\bar X_N = E(X)\Big] = 1$
seem to be saying about the same thing. There is a subtle difference. It can be illustrated by the following example, in which a sequence converges in probability but not almost surely.

Let
$X_{nk}(\zeta) = \begin{cases}1, & k/n \le \zeta < (k+1)/n\\ 0, & \text{otherwise}\end{cases},\qquad 0 \le k < n,\ n = 1,2,3,\ldots$
and let ζ be uniformly distributed between 0 and 1. As n increases from one we get this "triangular" sequence of X's:
$X_{10}$
$X_{20}\ \ X_{21}$
$X_{30}\ \ X_{31}\ \ X_{32}$
Now let $Y_{n(n-1)/2+k+1} = X_{nk}$, meaning that $Y = \{X_{10}, X_{20}, X_{21}, X_{30}, X_{31}, X_{32}, \ldots\}$. $X_{10}$ is one with probability one. $X_{20}$ and $X_{21}$ are each one with probability 1/2 and zero with probability 1/2. Generalizing, $X_{nk}$ is one with probability 1/n and zero with probability 1 − 1/n.

$Y_{n(n-1)/2+k+1}$ is therefore one with probability 1/n and zero with probability 1 − 1/n. For each n, the events $\{k/n \le \zeta < (k+1)/n\}$, $0 \le k < n$, partition the interval [0,1), so exactly one of the n numbers in each length-n block equals one. No matter how large n gets, every length-n block contains a 1, which means that for every ζ the sequence Y contains infinitely many 1's and never settles down to zero. This proves that the sequence Y does not converge almost surely.

The expected value of $X_{nk}$ is
$E(X_{nk}) = P[X_{nk} = 1]\times 1 + P[X_{nk} = 0]\times 0 = 1/n$
and is therefore independent of k and approaches zero as n approaches infinity. The expected value of $X_{nk}^2$ is
$E(X_{nk}^2) = P[X_{nk} = 1]\times 1^2 + P[X_{nk} = 0]\times 0^2 = E(X_{nk}) = 1/n$
and the variance of $X_{nk}$ is $\frac{1}{n} - \frac{1}{n^2} = \frac{n-1}{n^2}$. So the variance of Y approaches zero as n approaches infinity. Then, according to the Chebyshev inequality,
$P\big[|Y - \mu_Y| \ge \varepsilon\big] \le \sigma_Y^2/\varepsilon^2 = \frac{n-1}{n^2\varepsilon^2}$
implying that as n approaches infinity the variation of Y gets steadily smaller, and that says that Y converges in probability to zero.

Consider an experiment in which we toss a fair coin and assign the value 1 to a head and the value 0 to a tail. Let $N_H$ be the number of heads, let N be the number of coin tosses, let $r_H = N_H/N$, and let X be the random variable indicating a head or tail. Then
$N_H = \sum_{n=1}^N X_n,\qquad E(N_H) = N/2\quad\text{and}\quad E(r_H) = 1/2$
$\sigma_{r_H}^2 = \sigma_X^2/N \Rightarrow \sigma_{r_H} = \sigma_X/\sqrt N$. Therefore $r_H - 1/2$ generally approaches zero, but not smoothly or monotonically.
$\sigma_{N_H}^2 = N\sigma_X^2 \Rightarrow \sigma_{N_H} = \sqrt N\,\sigma_X$. Therefore $N_H - E(N_H)$ does not approach zero; the variation of $N_H$ increases with N.
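A short simulation makes the contrast concrete. This sketch (assuming NumPy; toss counts and the number of repeated experiments are illustrative) shows the standard deviation of $r_H$ shrinking like $1/\sqrt N$ while that of $N_H - N/2$ grows like $\sqrt N$:

```python
import numpy as np

# Sketch: fair-coin illustration. The relative frequency r_H = N_H/N converges
# to 1/2 (its standard deviation shrinks like 1/sqrt(N)), while the spread of
# N_H - N/2 grows like sqrt(N). Trial counts are illustrative.
rng = np.random.default_rng(seed=10)
experiments = 2000

for n_tosses in (100, 10_000, 1_000_000):
    n_heads = rng.binomial(n_tosses, 0.5, size=experiments)
    r_h = n_heads / n_tosses
    print(n_tosses, np.std(r_h), np.std(n_heads - n_tosses / 2))
    # e.g. for N = 1,000,000: std(r_H) ~ 0.0005 while std(N_H - N/2) ~ 500
```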
Convergence of Sequences of Random Variables

We have already seen two types of convergence of sequences of random variables: almost sure convergence (in the Strong Law of Large Numbers) and convergence in probability (in the Weak Law of Large Numbers). Now we will explore other types of convergence.

Sure Convergence
A sequence of random variables $\{X_n(\zeta)\}$ converges surely to the random variable $X(\zeta)$ if the sequence of functions $X_n(\zeta)$ converges to the function $X(\zeta)$ as $n\to\infty$ for all ζ in S,
$X_n(\zeta)\to X(\zeta)\ \text{as}\ n\to\infty\ \text{for all}\ \zeta\in S$
Sure convergence requires that every possible sequence converges. Different sequences may converge to different limits, but all must converge.

Almost Sure Convergence
A sequence of random variables $\{X_n(\zeta)\}$ converges almost surely to the random variable $X(\zeta)$ if the sequence of functions $X_n(\zeta)$ converges to the function $X(\zeta)$ as $n\to\infty$ for all ζ in S, except possibly on a set of probability zero,
$P\big[\zeta : X_n(\zeta)\to X(\zeta)\ \text{as}\ n\to\infty\big] = 1$
This is the convergence in the Strong Law of Large Numbers.

Mean-Square Convergence
The sequence of random variables $\{X_n(\zeta)\}$ converges in the mean-square sense to the random variable $X(\zeta)$ if
$E\Big[\big(X_n(\zeta) - X(\zeta)\big)^2\Big]\to 0\ \text{as}\ n\to\infty$
If the limiting random variable $X(\zeta)$ is not known we can use the Cauchy criterion: the sequence $\{X_n(\zeta)\}$ converges in the mean-square sense to a random variable $X(\zeta)$ if and only if
$E\Big[\big(X_n(\zeta) - X_m(\zeta)\big)^2\Big]\to 0\ \text{as}\ n\to\infty\ \text{and}\ m\to\infty$

Convergence in Probability
The sequence of random variables $\{X_n(\zeta)\}$ converges in probability to the random variable $X(\zeta)$ if, for any ε > 0,
$P\big[|X_n(\zeta) - X(\zeta)| > \varepsilon\big]\to 0\ \text{as}\ n\to\infty$
This is the convergence in the Weak Law of Large Numbers.

Convergence in Distribution
The sequence of random variables $\{X_n\}$ with cumulative distribution functions $\{F_n(x)\}$ converges in distribution to the random variable X with cumulative distribution function F(x) if
$F_n(x)\to F(x)\ \text{as}\ n\to\infty$
for all x at which F(x) is continuous. The Central Limit Theorem (coming soon) is an example of convergence in distribution.

Long-Term Arrival Rates

Suppose a system has a component that fails after operating for a time $X_1$; it is replaced, the replacement fails after a further time $X_2$, and so on. Let N(t) be the number of components that have failed at time t. N(t) is called a renewal counting process. Let $X_j$ denote the lifetime of the jth component. Then the time when the nth component fails is
$S_n = X_1 + X_2 + \cdots + X_n$
where we assume that the $X_j$ are iid non-negative random variables with $0 \le E(X) = E(X_j) < \infty$. We call the $X_j$'s the interarrival or cycle times.

Since the average interarrival time is E(X) seconds per event, one would expect intuitively that the average rate of arrivals is 1/E(X) events per second. Since
$S_{N(t)} \le t \le S_{N(t)+1}$
dividing through by N(t) gives
$\frac{S_{N(t)}}{N(t)} \le \frac{t}{N(t)} \le \frac{S_{N(t)+1}}{N(t)}$
$\frac{S_{N(t)}}{N(t)}$ is the average interarrival time for the first N(t) arrivals,
$\frac{S_{N(t)}}{N(t)} = \frac{1}{N(t)}\sum_{j=1}^{N(t)} X_j$
As $t\to\infty$, $N(t)\to\infty$ and $\frac{S_{N(t)}}{N(t)}\to E(X)$. Similarly $\frac{S_{N(t)+1}}{N(t)} = \frac{S_{N(t)+1}}{N(t)+1}\cdot\frac{N(t)+1}{N(t)}\to E(X)$. So we can say
$\lim_{t\to\infty}\frac{N(t)}{t} = \frac{1}{E(X)}$

Long-Term Time Averages

Suppose that events occur at random with iid interarrival times $X_j$ and that a cost $C_j$ is associated with each event. Let C(t) be the cost accumulated up to time t. Then
$C(t) = \sum_{j=1}^{N(t)} C_j$
The average cost up to time t is
$\frac{C(t)}{t} = \frac{1}{t}\sum_{j=1}^{N(t)} C_j = \frac{N(t)}{t}\cdot\frac{1}{N(t)}\sum_{j=1}^{N(t)} C_j$
In the limit $t\to\infty$, $\frac{N(t)}{t}\to\frac{1}{E(X)}$ and $\frac{1}{N(t)}\sum_{j=1}^{N(t)} C_j\to E(C)$. Therefore
$\lim_{t\to\infty}\frac{C(t)}{t} = \frac{E(C)}{E(X)}$
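The arrival-rate limit is easy to check by simulation. A minimal sketch (assuming NumPy; the mean interarrival time and horizon values are illustrative choices):

```python
import numpy as np

# Sketch: long-term arrival rate of a renewal process. Interarrival times are
# iid exponential with mean E(X) = 2.5 (an illustrative choice); N(t)/t should
# approach 1/E(X) = 0.4 as t grows.
rng = np.random.default_rng(seed=11)
mean_x = 2.5
arrival_times = np.cumsum(rng.exponential(scale=mean_x, size=300_000))  # S_1, S_2, ...

for t in (1e2, 1e4, 5e5):
    n_t = np.searchsorted(arrival_times, t, side="right")   # N(t): arrivals with S_n <= t
    print(t, n_t / t)                                        # tends toward 0.4
```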
The Central Limit Theorem

Let $Y_N = \sum_{n=1}^N X_n$ where the $X_n$'s are an iid sequence of random variable values. Let
$Z_N = \frac{Y_N - N\,E(X)}{\sigma_X\sqrt N} = \frac{\sum_{n=1}^N\big(X_n - E(X)\big)}{\sigma_X\sqrt N}$
Then
$E(Z_N) = E\!\left(\frac{\sum_{n=1}^N\big(X_n - E(X)\big)}{\sigma_X\sqrt N}\right) = \frac{\sum_{n=1}^N E\big(X_n - E(X)\big)}{\sigma_X\sqrt N} = 0$
and
$\sigma_{Z_N}^2 = \sum_{n=1}^N\left(\frac{1}{\sigma_X\sqrt N}\right)^2\sigma_X^2 = 1$
The MGF of $Z_N$ is
$\Phi_{Z_N}(s) = E\big(e^{sZ_N}\big) = E\!\left(\exp\!\left(s\,\frac{\sum_{n=1}^N\big(X_n - E(X)\big)}{\sigma_X\sqrt N}\right)\right) = E\!\left(\prod_{n=1}^N\exp\!\left(s\,\frac{X_n - E(X)}{\sigma_X\sqrt N}\right)\right)$
Because the $X_n$'s are independent and identically distributed,
$\Phi_{Z_N}(s) = \prod_{n=1}^N E\!\left(\exp\!\left(s\,\frac{X_n - E(X)}{\sigma_X\sqrt N}\right)\right) = \left[E\!\left(\exp\!\left(s\,\frac{X - E(X)}{\sigma_X\sqrt N}\right)\right)\right]^N$
We can expand the exponential function in an infinite series,
$\Phi_{Z_N}(s) = \left[E\!\left(1 + s\,\frac{X - E(X)}{\sigma_X\sqrt N} + s^2\,\frac{\big(X - E(X)\big)^2}{2!\,\sigma_X^2 N} + s^3\,\frac{\big(X - E(X)\big)^3}{3!\,\sigma_X^3 N^{3/2}} + \cdots\right)\right]^N$
$\Phi_{Z_N}(s) = \left(1 + s\,\frac{E\big(X - E(X)\big)}{\sigma_X\sqrt N} + s^2\,\frac{E\big[\big(X - E(X)\big)^2\big]}{2!\,\sigma_X^2 N} + s^3\,\frac{E\big[\big(X - E(X)\big)^3\big]}{3!\,\sigma_X^3 N^{3/2}} + \cdots\right)^N$
The first-order term vanishes because $E\big(X - E(X)\big) = 0$, and the numerator of the second-order term is $\sigma_X^2$, so
$\Phi_{Z_N}(s) = \left(1 + \frac{s^2}{2N} + s^3\,\frac{E\big[\big(X - E(X)\big)^3\big]}{3!\,\sigma_X^3 N^{3/2}} + \cdots\right)^N$
For large N we can neglect the higher-order terms. Then, using $\lim_{m\to\infty}\left(1 + \frac{z}{m}\right)^m = e^z$, we get
$\Phi_{Z_N}(s) = \lim_{N\to\infty}\left(1 + \frac{s^2}{2N}\right)^N = e^{s^2/2} \;\Rightarrow\; f_{Z_N}(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}$
Thus the PDF approaches a Gaussian shape, with no assumptions about the shapes of the PDF's of the $X_n$'s. This is convergence in distribution.

[Figures: comparisons of the distribution functions of two different binomial, Poisson, and Erlang random variables with Gaussian random variables having the same expected value and variance, and of a sum of five independent random variables from each of four distributions with a Gaussian random variable having the same expected value and variance as that sum.]

The PDF of a sum of independent random variables is the convolution of their PDF's. This concept can be extended to any number of random variables. If $Z = \sum_{n=1}^N X_n$ then
$f_Z(z) = f_{X_1}(z) * f_{X_2}(z) * f_{X_3}(z) * \cdots * f_{X_N}(z)$
As the number of convolutions increases, the shape of the PDF of Z approaches the Gaussian shape.
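The convolution view can be illustrated numerically: convolving a uniform PDF with itself a handful of times already produces a nearly Gaussian shape. A sketch (assuming NumPy; the grid spacing and number of terms are illustrative choices):

```python
import numpy as np

# Sketch: repeated self-convolution of a uniform PDF on [0, 1) approaches a
# Gaussian with the same mean and variance (N/2 and N/12 for N terms).
dx = 0.001
x1 = np.arange(0.0, 1.0, dx)
f1 = np.ones_like(x1)                      # PDF of one uniform(0, 1) variable

n_terms = 8
f = f1.copy()
for _ in range(n_terms - 1):
    f = np.convolve(f, f1) * dx            # PDF of the running sum

x = np.arange(len(f)) * dx                 # support of the N-fold sum: [0, N)
mu, var = n_terms / 2, n_terms / 12.0
gauss = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(f - gauss)))           # around 0.01 for N = 8: already nearly Gaussian
```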
The Gaussian PDF

$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-(x-\mu_X)^2/2\sigma_X^2},\qquad \mu_X = E(X)\ \text{and}\ \sigma_X^2 = E\Big[\big(X - E(X)\big)^2\Big]$
Its maximum value occurs at the mean value of its argument. It is symmetrical about the mean value. The points of maximum absolute slope occur at one standard deviation above and below the mean. Its maximum value is inversely proportional to its standard deviation. The limit as the standard deviation approaches zero is a unit impulse,
$\delta(x - \mu_X) = \lim_{\sigma_X\to 0}\frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-(x-\mu_X)^2/2\sigma_X^2}$

The normal PDF is a Gaussian PDF with a mean of zero and a variance of one,
$f_X(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$
The central moments of the Gaussian PDF are
$E\Big[\big(X - E(X)\big)^n\Big] = \begin{cases}0, & n\ \text{odd}\\ 1\cdot 3\cdot 5\cdots(n-1)\,\sigma_X^n, & n\ \text{even}\end{cases}$

In computing probabilities from a Gaussian PDF it is necessary to evaluate integrals of the form
$\int_{x_1}^{x_2}\frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-(x-\mu_X)^2/2\sigma_X^2}\,dx$
Define the function
$G(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\lambda^2/2}\,d\lambda$
Then, using the change of variable $\lambda = \frac{x - \mu_X}{\sigma_X}$, we can convert the integral to
$\int_{(x_1-\mu_X)/\sigma_X}^{(x_2-\mu_X)/\sigma_X}\frac{e^{-\lambda^2/2}}{\sqrt{2\pi}}\,d\lambda = G\!\left(\frac{x_2 - \mu_X}{\sigma_X}\right) - G\!\left(\frac{x_1 - \mu_X}{\sigma_X}\right)$
The G function is closely related to some other standard functions. For example, the "error" function is
$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-\lambda^2}\,d\lambda\qquad\text{and}\qquad G(x) = \tfrac{1}{2}\big(\mathrm{erf}(x/\sqrt{2}) + 1\big)$

Jointly Normal Random Variables

$f_{XY}(x,y) = \frac{\exp\!\left[-\dfrac{\left(\dfrac{x-\mu_X}{\sigma_X}\right)^2 - \dfrac{2\rho_{XY}(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \left(\dfrac{y-\mu_Y}{\sigma_Y}\right)^2}{2\big(1-\rho_{XY}^2\big)}\right]}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{XY}^2}}$

[Figures: surface and contour plots of the jointly normal PDF for several parameter choices.]

Any cross section of a bivariate Gaussian PDF at any value of x or y is a Gaussian. The marginal PDF's of X and Y can be found using
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy$
which turns out to be
$f_X(x) = \frac{e^{-(x-\mu_X)^2/2\sigma_X^2}}{\sigma_X\sqrt{2\pi}}\qquad\text{and similarly}\qquad f_Y(y) = \frac{e^{-(y-\mu_Y)^2/2\sigma_Y^2}}{\sigma_Y\sqrt{2\pi}}$

The conditional PDF of X given Y is
$f_{X|Y}(x) = \frac{\exp\!\left\{-\dfrac{\big[x - \mu_X - \rho_{XY}(\sigma_X/\sigma_Y)(y - \mu_Y)\big]^2}{2\sigma_X^2\big(1-\rho_{XY}^2\big)}\right\}}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho_{XY}^2}}$
and the conditional PDF of Y given X is
$f_{Y|X}(y) = \frac{\exp\!\left\{-\dfrac{\big[y - \mu_Y - \rho_{XY}(\sigma_Y/\sigma_X)(x - \mu_X)\big]^2}{2\sigma_Y^2\big(1-\rho_{XY}^2\big)}\right\}}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho_{XY}^2}}$
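The conditional-PDF formulas imply $E(Y\,|\,X = x) = \mu_Y + \rho_{XY}(\sigma_Y/\sigma_X)(x - \mu_X)$ with conditional standard deviation $\sigma_Y\sqrt{1-\rho_{XY}^2}$. A Monte Carlo sketch of that implication (assuming NumPy; all parameter values and the conditioning window are illustrative choices):

```python
import numpy as np

# Sketch: check E(Y | X = x0) = mu_Y + rho*(sigma_Y/sigma_X)*(x0 - mu_X) and the
# conditional sigma for jointly normal X and Y. Parameters are illustrative.
rng = np.random.default_rng(seed=6)
mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 2.0, 3.0, 0.7
n = 2_000_000

x = rng.normal(mu_x, s_x, size=n)
# Construct Y with the desired correlation from an independent Gaussian.
y = mu_y + rho * (s_y / s_x) * (x - mu_x) + s_y * np.sqrt(1 - rho**2) * rng.normal(0.0, 1.0, size=n)

x0 = 3.0
near = np.abs(x - x0) < 0.05                       # approximate the condition X = x0
predicted = mu_y + rho * (s_y / s_x) * (x0 - mu_x)
print(y[near].mean(), predicted)                   # both near 0.1
print(y[near].std(), s_y * np.sqrt(1 - rho**2))    # conditional sigma, both near 2.14
```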
Transformations of Joint Probability Density Functions

If $W = g(X,Y)$ and $Z = h(X,Y)$ and both functions are invertible, then it is possible to write $X = G(W,Z)$ and $Y = H(W,Z)$ and
$P[x < X \le x+\Delta x,\ y < Y \le y+\Delta y] = P[w < W \le w+\Delta w,\ z < Z \le z+\Delta z]$
$f_{XY}(x,y)\,\Delta x\,\Delta y \cong f_{WZ}(w,z)\,\Delta w\,\Delta z$
$\Delta x\,\Delta y = |J|\,\Delta w\,\Delta z\qquad\text{where}\qquad J = \begin{vmatrix}\partial G/\partial w & \partial G/\partial z\\ \partial H/\partial w & \partial H/\partial z\end{vmatrix}$
$f_{WZ}(w,z) = |J|\,f_{XY}(x,y) = |J|\,f_{XY}\big(G(w,z),H(w,z)\big)$

Let
$R = \sqrt{X^2 + Y^2}\qquad\text{and}\qquad \Theta = \tan^{-1}(Y/X),\quad -\pi < \Theta \le \pi$
where X and Y are independent and Gaussian, with zero mean and equal variances. Then
$X = R\cos\Theta\qquad\text{and}\qquad Y = R\sin\Theta$
$J = \begin{vmatrix}\partial x/\partial r & \partial x/\partial\theta\\ \partial y/\partial r & \partial y/\partial\theta\end{vmatrix} = \begin{vmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{vmatrix} = r$
With
$f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-x^2/2\sigma_X^2}\qquad\text{and}\qquad f_Y(y) = \frac{1}{\sigma_Y\sqrt{2\pi}}\,e^{-y^2/2\sigma_Y^2}$
and X and Y independent ($\sigma_X^2 = \sigma_Y^2 = \sigma^2$),
$f_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\,e^{-(x^2+y^2)/2\sigma^2}$
Applying the transformation formula,
$f_{R\Theta}(r,\theta) = \frac{r}{2\pi\sigma^2}\,e^{-r^2/2\sigma^2}\,u(r)\,\mathrm{rect}(\theta/2\pi),\quad -\pi < \theta \le \pi$
The radius R is distributed according to the Rayleigh PDF,
$f_R(r) = \int_{-\pi}^{\pi}\frac{r}{2\pi\sigma^2}\,e^{-r^2/2\sigma^2}\,u(r)\,d\theta = \frac{r}{\sigma^2}\,e^{-r^2/2\sigma^2}\,u(r)$
$E(R) = \sigma\sqrt{\frac{\pi}{2}}\qquad\text{and}\qquad \sigma_R^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2 \approx 0.429\,\sigma^2$
The angle is uniformly distributed,
$f_\Theta(\theta) = \int_{-\infty}^{\infty}\frac{r}{2\pi\sigma^2}\,e^{-r^2/2\sigma^2}\,u(r)\,dr = \frac{\mathrm{rect}(\theta/2\pi)}{2\pi} = \begin{cases}1/2\pi, & -\pi < \theta \le \pi\\ 0, & \text{otherwise}\end{cases}$
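A quick numerical check of the Rayleigh/uniform result (assuming NumPy; σ and the sample size are illustrative choices):

```python
import numpy as np

# Sketch: verify that R = sqrt(X^2 + Y^2) is Rayleigh and Theta is uniform when
# X and Y are independent zero-mean Gaussians with equal variance sigma^2.
rng = np.random.default_rng(seed=7)
sigma, n = 1.5, 1_000_000
x = rng.normal(0.0, sigma, size=n)
y = rng.normal(0.0, sigma, size=n)

r = np.hypot(x, y)
theta = np.arctan2(y, x)

print(r.mean(), sigma * np.sqrt(np.pi / 2))      # E(R) = sigma*sqrt(pi/2), about 1.88
print(r.var(), (2 - np.pi / 2) * sigma**2)       # sigma_R^2 ~ 0.429*sigma^2, about 0.97
print(theta.min(), theta.max())                  # spread over (-pi, pi]
```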
Multivariate Probability Density

$F_{X_1,X_2,\ldots,X_N}(x_1,x_2,\ldots,x_N) \equiv P[X_1 \le x_1 \cap X_2 \le x_2 \cap\cdots\cap X_N \le x_N]$
$0 \le F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) \le 1,\quad -\infty < x_1 < \infty,\ \ldots,\ -\infty < x_N < \infty$
$F_{X_1,\ldots,X_N} = 0$ if any of its arguments is $-\infty$, and $F_{X_1,\ldots,X_N}(+\infty,\ldots,+\infty) = 1$
$F_{X_1,\ldots,X_N}(x_1,\ldots,x_N)$ does not decrease if any number of $x_k$'s increase
$F_{X_1,\ldots,X_N}(+\infty,\ldots,x_k,\ldots,+\infty) = F_{X_k}(x_k)$

$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \frac{\partial^N}{\partial x_1\,\partial x_2\cdots\partial x_N}F_{X_1,\ldots,X_N}(x_1,\ldots,x_N)$
$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N) \ge 0,\quad -\infty < x_1 < \infty,\ \ldots,\ -\infty < x_N < \infty$
$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,dx_1\cdots dx_N = 1$
$F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \int_{-\infty}^{x_N}\cdots\int_{-\infty}^{x_1} f_{X_1,\ldots,X_N}(\lambda_1,\ldots,\lambda_N)\,d\lambda_1\cdots d\lambda_N$
$f_{X_k}(x_k) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,dx_1\cdots dx_{k-1}\,dx_{k+1}\cdots dx_N$
$P[(X_1,\ldots,X_N)\in R] = \int\cdots\int_R f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,dx_1\cdots dx_N$
$E\big(g(X_1,\ldots,X_N)\big) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\ldots,x_N)\,f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,dx_1\cdots dx_N$

Other Important Probability Density Functions

In an ideal gas the three components of molecular velocity are all Gaussian with zero mean and equal variances,
$\sigma_{V_X}^2 = \sigma_{V_Y}^2 = \sigma_{V_Z}^2 = \sigma_V^2 = kT/m$
The speed of a molecule is $V = \sqrt{V_X^2 + V_Y^2 + V_Z^2}$ and the PDF of the speed is called Maxwellian and is given by
$f_V(v) = \sqrt{2/\pi}\,\frac{v^2}{\sigma_V^3}\,e^{-v^2/2\sigma_V^2}\,u(v)$

If $\chi^2 = Y_1^2 + Y_2^2 + Y_3^2 + \cdots + Y_N^2 = \sum_{n=1}^N Y_n^2$ and the random variables $Y_n$ are all mutually independent and normally distributed, then
$f_{\chi^2}(x) = \frac{x^{N/2-1}}{2^{N/2}\,\Gamma(N/2)}\,e^{-x/2}\,u(x)$
This is the chi-squared PDF, with
$E(\chi^2) = N\qquad\text{and}\qquad \sigma_{\chi^2}^2 = 2N$

Reliability

Reliability is defined by $R(t) = P[T > t]$, where T is the random variable representing the length of time after a system first begins operation that it fails.
$F_T(t) = P[T \le t] = 1 - R(t),\qquad \frac{d}{dt}R(t) = -f_T(t)$
Probably the most commonly used term in reliability analysis is mean time to failure (MTTF). MTTF is the expected value of T, which is
$E(T) = \int_{-\infty}^{\infty} t\,f_T(t)\,dt$
The conditional distribution function and PDF for the time to failure T, given the condition $T > t_0$, are
$F_{T|T>t_0}(t) = \begin{cases}0, & t < t_0\\[4pt] \dfrac{F_T(t) - F_T(t_0)}{1 - F_T(t_0)} = \dfrac{F_T(t) - F_T(t_0)}{R(t_0)}, & t \ge t_0\end{cases}$
$f_{T|T>t_0}(t) = \frac{f_T(t)}{R(t_0)}\,u(t - t_0)$

A very common term in reliability analysis is the failure rate, defined by
$\lambda(t)\,dt = P[t < T \le t + dt\ |\ T > t] = f_{T|T>t}(t)\,dt$
Failure rate is the probability per unit time that a system which has been operating properly up until time t will fail, as a function of t.
$\lambda(t) = \frac{f_T(t)}{R(t)} = -\frac{R'(t)}{R(t)},\quad t \ge 0\qquad\Rightarrow\qquad R'(t) + \lambda(t)R(t) = 0,\quad t \ge 0$
The solution of $R'(t) + \lambda(t)R(t) = 0$, $t \ge 0$, is
$R(t) = e^{-\int_0^t\lambda(x)\,dx},\quad t \ge 0$
One of the simplest models for system failure used in reliability analysis is that the failure rate is a constant. Let that constant be K. Then
$R(t) = e^{-\int_0^t K\,dx} = e^{-Kt}\qquad\text{and}\qquad f_T(t) = -R'(t) = Ke^{-Kt}\quad\text{(an exponential PDF)}$
and the MTTF is 1/K.

In some systems, if any of the subsystems fails the overall system fails. If subsystem failure mechanisms are independent, the probability that the overall system is operating properly is the product of the probabilities that the subsystems are all operating properly. Let $A_k$ be the event "subsystem k is operating properly" and let $A_s$ be the event "the overall system is operating properly". Then, if there are N subsystems,
$P[A_s] = P[A_1]\,P[A_2]\cdots P[A_N]\qquad\text{and}\qquad R_s(t) = R_1(t)\,R_2(t)\cdots R_N(t)$
If the subsystems all have failure times with exponential PDF's then
$R_s(t) = e^{-t/\tau_1}e^{-t/\tau_2}\cdots e^{-t/\tau_N} = e^{-t(1/\tau_1 + 1/\tau_2 + \cdots + 1/\tau_N)} = e^{-t/\tau},\qquad \frac{1}{\tau} = \frac{1}{\tau_1} + \frac{1}{\tau_2} + \cdots + \frac{1}{\tau_N}$

In some systems the overall system fails only if all of the subsystems fail. If subsystem failure mechanisms are independent, the probability that the overall system is not operating properly is the product of the probabilities that the subsystems are all not operating properly. As before, let $A_k$ be the event "subsystem k is operating properly" and let $A_s$ be the event "the overall system is operating properly". Then, if there are N subsystems,
$P[\bar A_s] = P[\bar A_1]\,P[\bar A_2]\cdots P[\bar A_N]\qquad\text{and}\qquad 1 - R_s(t) = \big(1 - R_1(t)\big)\big(1 - R_2(t)\big)\cdots\big(1 - R_N(t)\big)$
If the subsystems all have failure times with exponential PDF's then
$R_s(t) = 1 - \big(1 - e^{-t/\tau_1}\big)\big(1 - e^{-t/\tau_2}\big)\cdots\big(1 - e^{-t/\tau_N}\big)$
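The series and parallel formulas are easy to exercise numerically. A sketch (assuming NumPy; the subsystem MTTFs and the evaluation time are illustrative choices):

```python
import numpy as np

# Sketch: series vs. parallel reliability for three subsystems with exponential
# failure times. MTTFs (tau values) and the evaluation time are illustrative.
taus = np.array([100.0, 200.0, 400.0])      # subsystem MTTFs, hours

def r_series(t, taus):
    # System fails if ANY subsystem fails: product of subsystem reliabilities.
    return np.prod(np.exp(-t / taus))

def r_parallel(t, taus):
    # System fails only if ALL subsystems fail.
    return 1.0 - np.prod(1.0 - np.exp(-t / taus))

t = 150.0
print(r_series(t, taus))      # e^{-t/tau} with 1/tau = 1/100 + 1/200 + 1/400
print(r_parallel(t, taus))    # much closer to 1 than any single subsystem

# MTTF of the series system: 1/(1/100 + 1/200 + 1/400), about 57.1 hours
print(1.0 / np.sum(1.0 / taus))
```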
Reliability

An exponential failure rate implies that whether a system has just begun operation or has been operating properly for a long time, the probability that it will fail in the next unit of time is the same. The expected value of the additional time to failure at any arbitrary time is a constant independent of past history,
$E(T\,|\,T > t_0) = t_0 + E(T)$
This model is fairly reasonable for a wide range of times, but not for all times in all systems. Many real systems experience two additional types of failure that are not indicated by an exponential PDF of failure times: infant mortality and wear-out.

[Figure: the "bathtub" curve of failure rate versus time.]

The two higher-failure-rate portions of the bathtub curve are often modeled by the log-normal distribution of failure times.

The Log-Normal Distribution

If a random variable X is Gaussian distributed, its PDF is
$f_X(x) = \frac{e^{-(x-\mu_X)^2/2\sigma_X^2}}{\sigma_X\sqrt{2\pi}}$
If $Y = e^X$ then $dY/dX = e^X = Y$, $X = \ln(Y)$, and the PDF of Y is
$f_Y(y) = \frac{f_X\big(\ln(y)\big)}{|dy/dx|} = \frac{e^{-(\ln(y)-\mu_X)^2/2\sigma_X^2}}{y\,\sigma_X\sqrt{2\pi}},\quad y > 0$
Y is log-normal distributed, with
$E(Y) = e^{\mu_X + \sigma_X^2/2}\qquad\text{and}\qquad \sigma_Y^2 = e^{2\mu_X + \sigma_X^2}\big(e^{\sigma_X^2} - 1\big)$

[Figure: the log-normal PDF of Y = e^X.]

Another common application of the log-normal distribution is to model the PDF of a random variable X that is formed from the product of a large number N of independent random variables $X_n$,
$X = \prod_{n=1}^N X_n$
The logarithm of X is then
$\log(X) = \sum_{n=1}^N \log(X_n)$
Since log(X) is the sum of a large number of independent random variables, its PDF tends to be Gaussian, which implies that the PDF of X is log-normal in shape.
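A small simulation of this effect (assuming NumPy; the choice of uniform factors on (0.5, 1.5), the factor count, and the sample size are illustrative):

```python
import numpy as np

# Sketch: the product of many independent positive random variables has an
# approximately log-normal PDF, i.e. log X is approximately Gaussian. Each
# factor here is uniform on (0.5, 1.5); counts are illustrative choices.
rng = np.random.default_rng(seed=8)
n_factors, n_samples = 50, 200_000
x = np.prod(rng.uniform(0.5, 1.5, size=(n_samples, n_factors)), axis=1)

log_x = np.log(x)
z = (log_x - log_x.mean()) / log_x.std()
print(np.mean(z**3))        # skewness of log X: close to 0, the Gaussian value
print(np.mean(z**4) - 3.0)  # excess kurtosis of log X: close to 0, the Gaussian value
```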