Probability Theory
Chapter 3: Transforms
Thommy Perlinger

Introduction to transforms
We are often interested in finding the probability distribution of a function (or transformation) of a random vector. The Transformation Theorem is a tool for doing this. The method is, however, rather cumbersome to apply when the number of components of the random vector is large. An alternative method for finding the probability distributions of such functions is given by transforms (or generating functions). Transforms are especially useful when the components of the random vector are independent and the function of interest is a linear combination of those components.

Moments
E(X) is the first moment of X. The k:th moment of X is defined by

  E(X^k).

Moments with respect to the mean are called central moments of X. The k:th central moment of X is defined by

  E[(X − E(X))^k].

In some proofs we also have use for factorial moments. The k:th factorial moment of X is defined by

  E[X(X − 1)···(X − k + 1)].

The moments and the probability distribution
E(X) and Var(X) are often used in order to summarize a probability distribution. The first two moments, that is, E(X) and E(X²), are not sufficient if we want a complete description of the probability distribution of X. However, access to all moments, i.e. E(X), E(X²), E(X³), …, is (almost) sufficient when it comes to giving us a complete description of the probability distribution of X. Consequently, if all moments of X and Y exist and are identical, i.e. E(X) = E(Y), E(X²) = E(Y²), E(X³) = E(Y³), …, then it holds (in principle) that X and Y are equidistributed.

Conclusion. If all the moments of X exist and we can find a function that is built up from these moments, then this function will uniquely determine the probability distribution of X.

Taylor series expansion
The Taylor series of a function f that is infinitely differentiable in a neighborhood of x = a is the power series

  f(x) = Σ_{n=0}^∞ f^(n)(a) (x − a)^n / n!

The special case a = 0 is called the Maclaurin series of f. For f(x) = e^x we have that f^(n)(0) = e^0 = 1 for all n. The Maclaurin series then is

  e^x = Σ_{n=0}^∞ x^n / n!

The moment generating function
Let X be a random variable for which all moments exist. The moment generating function of X is defined by

  ψ_X(t) = E(e^{tX}).

The moment generating function of X is thus a function of t and exists only if the expectation exists and is finite in a neighborhood of the origin.

So why is the moment generating function of X of any interest to us? Let us start by taking a look at the Maclaurin series of e^{tX}. Given that all moments of X exist we get that

  ψ_X(t) = E(e^{tX}) = E(Σ_{k=0}^∞ (tX)^k / k!) = Σ_{k=0}^∞ E(X^k) t^k / k!

So why do we call ψ_X(t) the moment generating function of X? Repeated differentiation gives us the answer. Since

  ψ_X^{(k)}(t) = Σ_{n=k}^∞ E(X^n) t^{n−k} / (n − k)!,

it follows that

  ψ_X^{(k)}(0) = E(X^k).

The moment generating function of X thus offers us an (often convenient) method to compute moments of X. The most important property of the moment generating function is, however, that it is a complete and unique description of the probability distribution of X. This stems from the fact that it is built up from the moments of X.

Theorem 3.1. Let X and Y be random variables. If there exists h > 0 such that ψ_X(t) = ψ_Y(t) for −h < t < h, then X and Y are equidistributed, i.e. F_X(x) = F_Y(x) for all x.
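The moment-generating property ψ_X^{(k)}(0) = E(X^k) is easy to verify symbolically for a known distribution. The sketch below is not part of the original slides; it checks the property for the Exp(1) distribution, whose mgf is 1/(1 − t) and whose k:th moment is k!, and the use of Python with the sympy library is an assumption of this illustration.

```python
# Illustrative sketch (not from the slides): derivatives of an mgf at t = 0
# should reproduce the moments. We use Exp(1), with mgf 1/(1 - t) for t < 1
# and k:th moment k!. The choice of sympy is an assumption of this example.
import sympy as sp

t = sp.symbols('t')
psi = 1 / (1 - t)                         # mgf of Exp(1)

for k in range(1, 6):
    derivative_at_zero = sp.diff(psi, t, k).subs(t, 0)   # psi^(k)(0)
    kth_moment = sp.factorial(k)                         # E(X^k) = k!
    print(k, derivative_at_zero, kth_moment)
    assert derivative_at_zero == kth_moment
```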
Problem 3.8.6 (Part 1)
Determine the moment generating function of L(1), that is, the standard Laplace distribution, with density f(x) = (1/2) e^{−|x|}, −∞ < x < ∞. It follows that

  ψ_X(t) = E(e^{tX}) = (1/2) ∫_{−∞}^{0} e^{(1+t)x} dx + (1/2) ∫_{0}^{∞} e^{−(1−t)x} dx = (1/2)·1/(1+t) + (1/2)·1/(1−t) = 1/(1 − t²),  −1 < t < 1.

Problem 3.8.6 (Part 1, extended)
Use the moment generating function to find E(X) and Var(X). Since

  ψ′_X(t) = 2t/(1 − t²)²,

it follows that E(X) = ψ′_X(0) = 0. Furthermore,

  ψ″_X(t) = 2/(1 − t²)² + 8t²/(1 − t²)³,

which means that E(X²) = ψ″_X(0) = 2, and so Var(X) = 2.

Problem 3.8.3
For the random variable X the moments can be expressed as

  E(X^k) = 2^k/(k + 1), k = 1, 2, ….

Use this fact to find the (unique) probability distribution of X. The moment generating function is built up from the moments, so the Taylor series is

  ψ_X(t) = Σ_{k=0}^∞ E(X^k) t^k/k! = Σ_{k=0}^∞ (2t)^k/(k + 1)! = (e^{2t} − 1)/(2t),

which means that X ∈ U(0,2).

Functions of random variables
Exercise 3.3.6 (a). Show that if X ∈ N(0,1) then X² ∈ χ²(1).

  ψ_{X²}(t) = E(e^{tX²}) = ∫_{−∞}^{∞} e^{tx²} (1/√(2π)) e^{−x²/2} dx = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²(1−2t)/2} dx.

The integrand shows similarities with (another) normal density, and to see which one it is we rewrite the expression as follows:

  ψ_{X²}(t) = (1 − 2t)^{−1/2} ∫_{−∞}^{∞} √((1−2t)/(2π)) e^{−x²(1−2t)/2} dx = (1 − 2t)^{−1/2},  t < 1/2,

because the last integrand is the density function of N(0, 1/(1−2t)). Since (1 − 2t)^{−1/2} is the moment generating function of χ²(1), it is, by uniqueness, clear that X² ∈ χ²(1).

Moment generating functions for linear transformations
If we, from the moment generating function of X, can deduce the moment generating function of aX + b, then we have completely and uniquely determined the probability distribution of aX + b.

Theorem 3.4. Let X be a random variable with moment generating function ψ_X(t), and let a and b be real numbers. Then

  ψ_{aX+b}(t) = e^{bt} ψ_X(at).

Proof. We use basic rules concerning expectations and get that

  ψ_{aX+b}(t) = E(e^{t(aX+b)}) = e^{bt} E(e^{(at)X}) = e^{bt} ψ_X(at).

Moment generating functions for linear combinations of independent r.v.
Theorem 3.2 (generalized). Let X₁, X₂, …, Xn be independent random variables with mgf's ψ₁(t), ψ₂(t), …, ψn(t). Let further a₁, a₂, …, an be real numbers and consider the linear combination

  U = a₁X₁ + a₂X₂ + … + anXn.

Then the moment generating function of U is given by

  ψ_U(t) = Π_{k=1}^n ψ_k(a_k t).

Corollary 3.2.1. If, in addition, X₁, X₂, …, Xn are equidistributed with common mgf ψ(t), then

  ψ_U(t) = Π_{k=1}^n ψ(a_k t),

and, for the sum Sn = X₁ + X₂ + … + Xn,

  ψ_{Sn}(t) = (ψ(t))^n.

Problem 3.8.6 (Part 2)
Let Y₁ and Y₂ be independent random variables such that Y₁ ∈ Exp(θ₁) and Y₂ ∈ Exp(θ₂). The mgf of Y₁ − Y₂ now follows via

  ψ_{Y₁−Y₂}(t) = ψ_{Y₁}(t) ψ_{Y₂}(−t) = 1/(1 − θ₁t) · 1/(1 + θ₂t).

If now θ₁ = θ₂ = 1 we have that

  ψ_{Y₁−Y₂}(t) = 1/((1 − t)(1 + t)) = 1/(1 − t²).

Conclusion. The difference of two independent Exp(1)-distributed random variables is L(1).

The normal distribution
A random variable X with density function given by

  f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞,

is said to be normally distributed with parameters μ and σ², and we use the notation N(μ,σ²). In previous courses in probability we have been taught that the normal distribution possesses a number of nice properties. With the aid of moment generating functions it is fairly easy to show that this is indeed the case.

The moment generating function of the normal distribution

  ψ_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx = ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x² − 2μx + μ² − 2σ²tx)/(2σ²)} dx.

For the integrand to be a normal density, the expression x² − 2μx + μ² − 2σ²tx must be rewritten as a square of the form (x − θ)². Since a square of the form (x − θ)² can be expressed as

  (x − θ)² = x² − 2θx + θ²,

we rewrite the expression in question as

  x² − 2(μ + σ²t)x + μ² = (x − (μ + σ²t))² − (μ + σ²t)² + μ² = (x − (μ + σ²t))² − 2μσ²t − σ⁴t².

We now have the square that we sought, but it has left us with two remainders. However, these remainders do not depend on x, which means that they can be put outside of the integral. Since

  ψ_X(t) = e^{(2μσ²t + σ⁴t²)/(2σ²)} ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x − (μ + σ²t))²/(2σ²)} dx,

the integrand in the last expression is the density of N(μ + σ²t, σ²), so the integral equals 1 and we conclude that

  ψ_X(t) = e^{μt + σ²t²/2}.
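As a sanity check (not part of the original slides), the Monte Carlo sketch below compares simulated values of E(e^{tX}) for X ∈ N(μ,σ²) with the formula e^{μt + σ²t²/2} derived above; the parameter values, the seed, and the use of numpy are arbitrary choices for this illustration.

```python
# Illustrative sketch (not from the slides): Monte Carlo check of the normal
# mgf psi_X(t) = exp(mu*t + sigma^2*t^2/2). Parameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma = 0.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

for t in (-0.3, 0.1, 0.4):
    empirical = np.mean(np.exp(t * x))                      # estimates E(e^{tX})
    exact = np.exp(mu * t + sigma**2 * t**2 / 2)            # derived formula
    print(f"t={t:+.1f}  MC={empirical:.4f}  exact={exact:.4f}")
```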
Moment generating functions for random vectors
Definition 3.2. Let X = (X₁, X₂, …, Xn)′ be a random vector. The moment generating function of X is

  ψ_X(t₁, t₂, …, tn) = E(e^{t₁X₁ + t₂X₂ + … + tnXn}),

or, using vector notation,

  ψ_X(t) = E(e^{t′X}).

The characteristic function
Definition 4.1. The characteristic function of a random variable X is

  φ_X(t) = E(e^{itX}).

Since |e^{itX}| = 1, the expectation is bounded, and the characteristic function therefore exists for all t and for all probability distributions. The characteristic function of a random variable X completely and uniquely determines the probability distribution of X. The drawback of characteristic functions is that working with them requires some complex analysis. Therefore the use of characteristic functions is optional in this course.

The probability generating function
Definition 2.1. Let X be a discrete random variable whose domain is (a subset of) the non-negative integers. The probability generating function of X is

  g_X(t) = E(t^X) = Σ_{k=0}^∞ t^k Pr(X = k).

The most important property of the probability generating function is, as for other transforms, that it completely and uniquely determines the probability distribution.

Theorem 2.1. Let X and Y be discrete random variables whose domains are (subsets of) the non-negative integers. If g_X = g_Y, then p_X = p_Y.

So why do we call g_X(t) the probability generating function of X? As for the moment generating function, repeated differentiation is the key. Since

  g_X^{(k)}(t) = Σ_{n=k}^∞ n(n−1)···(n−k+1) t^{n−k} Pr(X = n),

it follows that

  g_X^{(k)}(0) = k! Pr(X = k), i.e. Pr(X = k) = g_X^{(k)}(0)/k!.

Hence, by differentiating g_X(t) k times and putting t = 0, we "generate" Pr(X = k). If we differentiate g_X(t) k times and put t = 1 (which requires more care) we instead generate factorial moments.

Theorem 2.3. Let X be a discrete random variable whose domain is (a subset of) the non-negative integers, and suppose that E|X|^k < ∞ for some k = 0, 1, 2, …. Then

  g_X^{(k)}(1) = E[X(X − 1)···(X − k + 1)].

Theorem 2.2 (generalized). Let X₁, X₂, …, Xn be independent discrete random variables, whose domains are (subsets of) the non-negative integers, with gf's g₁(t), g₂(t), …, gn(t). Let further a₁, a₂, …, an be real numbers and consider the linear combination

  U = a₁X₁ + a₂X₂ + … + anXn.

Then the probability generating function of U is given by

  g_U(t) = Π_{k=1}^n g_k(t^{a_k}).

Corollary 2.2.1. If, in addition, X₁, X₂, …, Xn are equidistributed with common gf g(t), then

  g_U(t) = Π_{k=1}^n g(t^{a_k}),

and, for the sum Sn = X₁ + X₂ + … + Xn,

  g_{Sn}(t) = (g(t))^n.

Distributions with random parameters (Hierarchic models revisited)
The moment generating functions ψ_{X|M=m}(t) and ψ_M(t) are known. We are interested in finding the marginal distribution of X, i.e., ψ_X(t). Conditioning on M gives

  ψ_X(t) = E(e^{tX}) = E[E(e^{tX} | M)] = E[ψ_{X|M}(t)],

where

  ψ_{X|M}(t) = h(M), with h(m) = ψ_{X|M=m}(t),

which implies that

  ψ_X(t) = E[h(M)].

Often the expectation E[ψ_{X|M}(t)] can be written in terms of a transform of M, in which case the distribution of X is easily found.

Exercise 5.1.a (or Exercise 2.3.1.a)
Situation. X | M = m ∈ Po(m) and M ∈ Exp(a). Find the distribution of X, i.e., g_X(t).

Solution. The probability generating function of X is given by

  g_X(t) = E(t^X) = E[E(t^X | M)] = E[g_{X|M}(t)],

where

  g_{X|M=m}(t) = e^{m(t−1)}  (the pgf of Po(m)).

It therefore follows that

  g_X(t) = E(e^{M(t−1)}) = ψ_M(t − 1) = 1/(1 − a(t − 1)) = (1/(1+a))/(1 − (a/(1+a))t),

which is the probability generating function of Ge(1/(1+a)), so X ∈ Ge(1/(1+a)).
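The result of Exercise 5.1.a can also be checked by simulation. The sketch below is not part of the original slides: it draws M ∈ Exp(a), then X | M = m ∈ Po(m), and compares the empirical probabilities with Pr(X = k) = p q^k for p = 1/(1 + a); the value of a, the seed, and the use of numpy are assumptions of this illustration.

```python
# Illustrative sketch (not from the slides): the Po(M) distribution with an
# Exp(a) random parameter M should be Ge(1/(1+a)). Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(seed=2)
a = 1.5
n_sim = 1_000_000

m = rng.exponential(scale=a, size=n_sim)   # M ~ Exp(a), mean a
x = rng.poisson(m)                         # X | M = m ~ Po(m)

p = 1 / (1 + a)
for k in range(5):
    empirical = np.mean(x == k)
    geometric = p * (1 - p) ** k           # Pr(X = k) if X ~ Ge(1/(1+a))
    print(f"k={k}  MC={empirical:.4f}  Ge={geometric:.4f}")
```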
Sums of a random number of random variables
Sums of independent and identically distributed random variables (i.i.d. r.v.'s) are of great interest in probability theory and statistics. In Corollary 3.2.1 we found that if X₁, X₂, …, Xn are i.i.d. with common moment generating function ψ_X(t) and we let Sn = X₁ + X₂ + … + Xn, then

  ψ_{Sn}(t) = (ψ_X(t))^n.

But what if n, the number of terms in the sum, is not given but considered to be the result of a random variable N? What about the distribution of S_N? If N is independent of the X's, then it turns out that we can find the probability distribution of S_N using the methods presented in the previous section.

Sums of a random number of random variables (Transforms)
Theorem 6.3. Let X₁, X₂, … be i.i.d. with moment generating function ψ_X(t). Furthermore, let N be a discrete random variable, whose domain is (a subset of) the non-negative integers, and independent of the X's. Set Sn = X₁ + X₂ + … + Xn for n ≥ 1. Then

  ψ_{S_N}(t) = g_N(ψ_X(t)).

Proof. By applying the methods used in Section 3.5 we get that

  ψ_{S_N}(t) = E(e^{tS_N}) = E[E(e^{tS_N} | N)],

where

  E(e^{tS_N} | N = n) = ψ_{Sn}(t) = (ψ_X(t))^n,

which yields

  ψ_{S_N}(t) = E[(ψ_X(t))^N] = g_N(ψ_X(t)).

Problem 3.8.27 (Part 1)
Situation. A miner has been trapped in a mine with three doors. One takes him to freedom after one hour, one brings him back to the mine after three hours and the third one brings him back after five hours. He picks one of the three doors uniformly at random and continues to do so until he is free. Find the probability generating function for the time it will take him to get out.

Solution. Let N be the number of trials where he picks a return path before he picks the correct way. Then N ∈ Ge(1/3), with probability generating function

  g_N(t) = (1/3)/(1 − (2/3)t).

Let Y represent the time it takes to complete one return path. Y has a two-point distribution with equal probability on y = 3 and y = 5. The probability distribution of Y is given by the generating function

  g_Y(t) = (t³ + t⁵)/2.

The total time that he spends in the return paths is thus described by

  Z = Y₁ + Y₂ + … + Y_N,

and by Theorem 6.1 it therefore follows that

  g_Z(t) = g_N(g_Y(t)) = (1/3)/(1 − (2/3)(t³ + t⁵)/2) = 1/(3 − t³ − t⁵).

Finally, the total time it takes to get out is given by W = Z + 1, which means that

  g_W(t) = t·g_Z(t) = t/(3 − t³ − t⁵).

Problem 3.8.25
Situation. Let X₁, X₂, … be a sequence of i.i.d. 0-truncated Po(m)-distributed random variables, that is,

  Pr(X = k) = (m^k/k!) e^{−m}/(1 − e^{−m}), k = 1, 2, ….

Let further N ∈ Bin(n, 1 − e^{−m}) be independent of X₁, X₂, …, and set

  Y = X₁ + X₂ + … + X_N (with Y = 0 when N = 0).

a. Find the distribution of Y.
b. Compute E(Y) without using (a).

Problem 3.8.25 a
In order to use Theorem 3.6.1 we have to find the generating function of X:

  g_X(t) = E(t^X) = Σ_{k=1}^∞ t^k (m^k/k!) e^{−m}/(1 − e^{−m}) = (e^{mt} − 1) e^{−m}/(1 − e^{−m}) = (e^{mt} − 1)/(e^m − 1).

It therefore follows that

  g_Y(t) = g_N(g_X(t)) = (e^{−m} + (1 − e^{−m}) g_X(t))^n = (e^{−m} + e^{−m}(e^{mt} − 1))^n = (e^{m(t−1)})^n = e^{mn(t−1)},

and so it is clear that Y ∈ Po(mn).

Sums of a random number of random variables (Expectations)
Theorem 6.2 (mod.). Suppose the conditions of Theorem 6.3 are satisfied. If, moreover, E(N) < ∞ and E|X| < ∞, then

  E(S_N) = E(N) E(X).

If, in addition, Var(N) < ∞ and Var(X) < ∞, then

  Var(S_N) = E(N) Var(X) + Var(N) (E(X))².

Problem 3.8.27 (Part 2)
Find the mean and the variance of the time it takes him to reach freedom. Basic rules of expectation yield that

  E(N) = 2, Var(N) = 6, E(Y) = 4 and Var(Y) = 1.

According to Theorem 6.2 it therefore follows that

  E(Z) = E(N) E(Y) = 2·4 = 8

and

  Var(Z) = E(N) Var(Y) + Var(N) (E(Y))² = 2·1 + 6·16 = 98.

Finally, we thus have that

  E(W) = E(Z) + 1 = 9 and Var(W) = Var(Z) = 98.

Problem 3.8.25 b
By Theorem 3.6.2.a, E(Y) = E(N) E(X), and since

  E(N) = n(1 − e^{−m}) and E(X) = m/(1 − e^{−m}),

it follows that

  E(Y) = n(1 − e^{−m}) · m/(1 − e^{−m}) = mn,

just as expected.
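The values derived above for the miner (E(W) = 9 and Var(W) = 98) can be checked by simulating the problem directly. The sketch below is not part of the original slides; the simulation size, the seed, and the use of numpy are arbitrary choices for this illustration.

```python
# Illustrative sketch (not from the slides): simulating the miner in
# Problem 3.8.27 to check E(W) = 9 and Var(W) = 98.
import numpy as np

rng = np.random.default_rng(seed=3)
n_sim = 200_000
times = np.empty(n_sim)

for i in range(n_sim):
    total = 0.0
    while True:
        door = rng.integers(3)          # 0 = freedom, 1 and 2 = return paths
        if door == 0:
            total += 1                  # one hour to freedom
            break
        total += 3 if door == 1 else 5  # three or five hours back to the mine
    times[i] = total

print("E(W)  ~", times.mean())          # should be close to 9
print("Var(W) ~", times.var())          # should be close to 98
```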
Branching processes
Branching processes are an important application of "sums of a random number of random variables". A branching process is a model of how (the number of individuals in) a population will evolve over time. In generation 0 (or at time t = 0) there is an initial population (the founding members). Each individual "gives birth" to a random number of "children". During their lifespans, these children give birth to a random number of children, and so on. Let

  X(n) = the number of individuals in generation n,

where we here only have one founding member, i.e. X(0) = 1.

The Galton-Watson process
A branching process is called a Galton-Watson process if
1. all individuals give birth according to the same probability distribution, independently of each other, and
2. the number of children produced by an individual is independent of the number of individuals in its generation.

Notation. Let Y₁, Y₂, … represent the numbers of children produced by the individuals. It is clear that they are i.i.d. non-negative integer-valued random variables. Let the common probability function of Y₁, Y₂, … be given by p(k), k = 0, 1, 2, …, and the common probability generating function be given by g(t). Let X(n) be the number of individuals in generation n, where X(0) = 1. Furthermore, let the probability generating function of X(n) be given by g_n(t).

We are now interested in finding the probability distribution, the mean, and the variance of X(n) in such a Galton-Watson process. Due to the fact that X(1) = Y₁ and, for instance, that

  X(2) = Y₁ + Y₂ + … + Y_{X(1)},

we can use results for sums of a random number of random variables.

Theorem 7.1. For such a Galton-Watson process we have

  g_n(t) = g_{n−1}(g(t)) = g(g_{n−1}(t)).

Theorem 7.2. Suppose m = E(Y₁) < ∞ and σ² = Var(Y₁) < ∞. Then

  E(X(n)) = m^n and Var(X(n)) = σ² m^{n−1}(m^n − 1)/(m − 1) for m ≠ 1 (and Var(X(n)) = nσ² for m = 1).

Problem 3.8.35 b (modified)
Consider a Galton-Watson process where the offspring distribution (of Y₁, Y₂, …) is Ge(p). Determine the distribution, mean, and variance of X(2). Since the pgf of Ge(p) is g(t) = p/(1 − qt), it follows from Theorem 7.1 that the probability generating function of X(2) is

  g₂(t) = g(g(t)) = p/(1 − q·p/(1 − qt)) = p(1 − qt)/(1 − qt − pq).

From Theorem 7.2, with m = q/p and σ² = q/p², it follows that the mean and the variance of X(2) are

  E(X(2)) = m² = (q/p)² and Var(X(2)) = σ² m(m + 1) = q²/p⁴.

Determine Pr(X(2) = 0), the probability that the population will be extinct by the second generation. In this case the probability distribution of X(2) is not one of the common distributions. We find the probabilities of X(2), p₂(k), by using the fact that

  p₂(k) = Pr(X(2) = k) = g₂^{(k)}(0)/k!,

which means that

  Pr(X(2) = 0) = g₂(0) = p/(1 − pq).

Asymptotics: the probability of extinction
For branching processes it is of natural interest to determine whether the population will die out (at some point in time). Denote by η the probability of (ultimate) extinction of a branching process. It then follows that

  η = Pr(X(n) = 0 for some n) = lim_{n→∞} Pr(X(n) = 0) = lim_{n→∞} g_n(0).

From Theorem 7.2 we conclude that η = 1 if m < 1. But what if m ≥ 1?

Theorem 7.3. For a Galton-Watson process we have that
a. η satisfies the equation t = g(t).
b. η is the smallest non-negative root of the equation t = g(t).
c. η = 1 for m ≤ 1 and η < 1 for m > 1.

Problem 3.8.35 a
Determine η, that is, the probability of (ultimate) extinction. Since m = q/p, it follows from Theorem 7.3.c that η = 1 if p ≥ 1/2. But what if p < 1/2? According to Theorem 7.3.b, η is the smallest non-negative root of the equation t = g(t). Since

  t = g(t) = p/(1 − qt) is equivalent to qt² − t + p = 0,

the roots of the equation t = g(t) are t = 1 and t = p/q. We thus have that

  η = 1 if p ≥ 1/2, and η = p/q if p < 1/2.
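Theorem 7.3 also suggests a simple numerical scheme: since η = lim g_n(0) and g_n(0) = g(g_{n−1}(0)), the extinction probability can be computed by iterating t ← g(t) starting from t = 0. The sketch below is not part of the original slides; it does this for Ge(p) offspring and compares the result with η = min(1, p/q), with the chosen p values and iteration count as assumptions of the illustration.

```python
# Illustrative sketch (not from the slides): extinction probability of a
# Galton-Watson process with Ge(p) offspring, computed by iterating
# eta_n = g(eta_{n-1}) with eta_0 = 0, so that eta_n = g_n(0) = Pr(X(n) = 0).
def offspring_pgf(t: float, p: float) -> float:
    """pgf of Ge(p) on {0, 1, 2, ...}: g(t) = p / (1 - q t)."""
    q = 1.0 - p
    return p / (1.0 - q * t)

def extinction_probability(p: float, n_generations: int = 10_000) -> float:
    eta = 0.0
    for _ in range(n_generations):
        eta = offspring_pgf(eta, p)       # Pr(X(n) = 0) = g(Pr(X(n-1) = 0))
    return eta

for p in (0.3, 0.45, 0.7):
    q = 1.0 - p
    exact = min(1.0, p / q)               # eta = p/q if p < 1/2, else 1
    print(f"p={p}  iterated={extinction_probability(p):.6f}  exact={exact:.6f}")
```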