
Probability Transforms: Moment & Generating Functions

Probability Theory
Chapter 3
Thommy Perlinger, Probability Theory
Introduction to transforms
We are often interested in finding the probability distribution of a
function (or transformation) of a random vector.
The Transformation Theorem is a tool for doing this. The
method is, however, rather cumbersome to apply when the
number of components of the random vector is large.
An alternative method for finding the probability distributions of
such functions is given by transforms (or generating functions).
Transforms are especially useful when the components of the
random vector are independent and the function of interest is a
linear combination of those components.
E(X) is the first moment of X. The kth moment of X is defined by
$$E(X^k), \quad k = 1, 2, \ldots$$
Moments with respect to the mean are called central moments
of X. The kth central moment of X is defined by
$$E\big[(X - E(X))^k\big].$$
In some proofs we also have use for factorial moments. The kth
factorial moment of X is defined by
$$E\big[X(X-1)\cdots(X-k+1)\big].$$
The moments and the probability distribution
E(X) and Var(X) are often used to summarize a probability distribution.
The first two moments, that is, E(X) and E(X²), are not sufficient if we want a
complete description of the probability distribution of X.
However, access to all moments, i.e. E(X), E(X²), E(X³), …, is (almost)
sufficient to give us a complete description of the probability
distribution of X.
Consequently, if all moments of X and Y exist and are identical, i.e.
E(X)=E(Y), E(X²)=E(Y²), E(X³)=E(Y³), …, then it holds (in principle) that X
and Y are equidistributed.
Conclusion. If all the moments of X exist and we can find a function that is
built from these moments, then this function will uniquely determine the
probability distribution of X.
Taylor series expansion
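The Taylor series of a function f that is infinitely differentiable in a
neighborhood of x=a is the power series
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n .$$
The special case a=0 is called the Maclaurin series of f.
For f(x)=e^x we have that f^(n)(0)=e^0=1 for all n. The Maclaurin series then is
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$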
The moment generating function
Let X be a random variable for which all moments exist.
The moment generating function of X is defined by
$$\psi_X(t) = E\big(e^{tX}\big).$$
The moment generating function of X is thus a function of t. It
exists only if the expectation exists and is finite in a
neighborhood of the origin, that is, for |t| < h for some h > 0.
The moment generating function
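So why is the moment generating function of X of any interest to us?
Let us start by taking a look at the Maclaurin series of e^{tx},
$$e^{tx} = \sum_{n=0}^{\infty} \frac{(tx)^n}{n!} = 1 + tx + \frac{t^2x^2}{2!} + \frac{t^3x^3}{3!} + \cdots$$
Given that all moments of X exist we get that
$$\psi_X(t) = E\big(e^{tX}\big) = 1 + \sum_{n=1}^{\infty} \frac{t^n}{n!}\,E(X^n).$$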
The moment generating function
So why do we call ψ_X(t) the moment generating function of X? Repeated
differentiation gives us the answer. Since
$$\psi_X^{(k)}(t) = E\big(X^k e^{tX}\big), \quad k = 1, 2, \ldots,$$
it follows that
$$\psi_X^{(k)}(0) = E(X^k).$$
The moment generating function of X thus offers us an (often convenient)
method to compute moments for X.
The most important property of the moment generating function is, however,
that it is a complete and unique description of the probability distribution of X.
This stems from the fact that it is built from the moments of X.
Theorem 3.1. Let X and Y be random variables. If there exists h>0, such that
ψ_X(t) = ψ_Y(t) for -h < t < h, then X and Y are equidistributed, i.e.
$$X \overset{d}{=} Y.$$
Problem 3.8.6 (Part 1)
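Determine the moment generating function of L(1), that is, the
standard Laplace distribution. The density is f(x) = ½e^{-|x|}, so it follows that
$$\psi_X(t) = \int_{-\infty}^{\infty} e^{tx}\,\tfrac{1}{2}e^{-|x|}\,dx = \frac{1}{2}\left(\frac{1}{1+t} + \frac{1}{1-t}\right) = \frac{1}{1-t^2}, \quad |t| < 1.$$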
Problem 3.8.6 (Part 1, extended)
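Use the moment generating function to find E(X) and Var(X). Since
$$\psi_X'(t) = \frac{2t}{(1-t^2)^2}$$
it follows that E(X) = ψ′_X(0) = 0. Furthermore
$$\psi_X''(t) = \frac{2}{(1-t^2)^2} + \frac{8t^2}{(1-t^2)^3}$$
which means that E(X²) = ψ″_X(0) = 2, and so Var(X) = 2.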
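As a numerical cross-check of these values, the following minimal sketch differentiates the mgf at t = 0 by central finite differences. It assumes the standard Laplace mgf 1/(1 − t²) for |t| < 1; the helper names and the step size h are illustrative choices, not part of the course material.

```python
def laplace_mgf(t):
    """mgf of the standard Laplace distribution L(1), valid for |t| < 1."""
    return 1.0 / (1.0 - t * t)


def derivative_at_zero(f, order, h=1e-3):
    """Central finite-difference estimate of f'(0) (order 1) or f''(0) (order 2)."""
    if order == 1:
        return (f(h) - f(-h)) / (2 * h)
    if order == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / (h * h)
    raise ValueError("only orders 1 and 2 are supported")


mean = derivative_at_zero(laplace_mgf, 1)            # estimate of E(X)
second_moment = derivative_at_zero(laplace_mgf, 2)   # estimate of E(X^2)
variance = second_moment - mean ** 2                 # estimate of Var(X)
print(mean, second_moment, variance)
```

The estimates are only approximate (the error is of order h²), so they should be read as a sanity check on the hand calculation rather than as a replacement for it.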
Problem 3.8.3
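For the random variable X the moments can be expressed as
$$E(X^n) = \frac{2^n}{n+1}, \quad n = 1, 2, \ldots$$
Use this fact to find the (unique) probability distribution of X. The moment
generating function is built from the moments, so the Taylor series is
$$\psi_X(t) = 1 + \sum_{n=1}^{\infty} \frac{t^n}{n!}\cdot\frac{2^n}{n+1} = \frac{1}{2t}\sum_{n=0}^{\infty} \frac{(2t)^{n+1}}{(n+1)!} = \frac{e^{2t}-1}{2t},$$
which means that X is U(0,2).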
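The identification with U(0,2) can be checked numerically. The sketch below assumes that the moments are E(Xⁿ) = 2ⁿ/(n+1), which are the moments of U(0,2), sums the moment series, and compares it with the closed-form mgf (e^{2t} − 1)/(2t) of U(0,2). The function names and the number of terms are illustrative.

```python
import math


def mgf_from_moments(t, terms=60):
    """Partial sum of 1 + sum_{n>=1} t^n / n! * E(X^n), assuming E(X^n) = 2^n / (n + 1)."""
    total = 1.0
    for n in range(1, terms):
        total += (t ** n) / math.factorial(n) * (2 ** n) / (n + 1)
    return total


def uniform_0_2_mgf(t):
    """Closed-form mgf of U(0, 2): (e^{2t} - 1) / (2t), with value 1 at t = 0."""
    if t == 0:
        return 1.0
    return math.expm1(2 * t) / (2 * t)


for t in (-1.5, -0.3, 0.7, 2.0):
    print(t, mgf_from_moments(t), uniform_0_2_mgf(t))
```

The two columns agree to high precision, which is what the uniqueness argument predicts.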
Functions of random variables
Exercise 3.3.6 a
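Exercise 3.3.6. Show that if X ~ N(0,1) then X² ~ χ²(1).
$$\psi_{X^2}(t) = E\big(e^{tX^2}\big) = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-x^2(1-2t)/2}\,dx .$$
The integrand shows similarities with (another) normal density, and to see
which one it is we rewrite the expression as follows.
$$\psi_{X^2}(t) = \frac{1}{\sqrt{1-2t}} \int_{-\infty}^{\infty} \frac{\sqrt{1-2t}}{\sqrt{2\pi}}\,\exp\!\left(-\frac{x^2}{2/(1-2t)}\right) dx = (1-2t)^{-1/2}, \quad t < \tfrac{1}{2},$$
because the last integrand is the density function of N(0,1/(1-2t)). Since (1-2t)^{-1/2} is the
moment generating function of χ²(1), it is clear by uniqueness that X² ~ χ²(1).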
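A numerical check of this result: the sketch below approximates E(exp(tX²)) for X ~ N(0,1) with the trapezoidal rule and compares it with (1 − 2t)^(−1/2), the mgf of χ²(1). The integration range and step count are illustrative assumptions, chosen so that the truncated tails are negligible for t ≤ 1/4.

```python
import math


def mgf_of_square_numeric(t, half_width=12.0, steps=24000):
    """Trapezoidal estimate of E(exp(t * X^2)) for X ~ N(0, 1); requires t < 1/2."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # e^{t x^2} times the N(0, 1) density
        total += weight * math.exp(t * x * x - x * x / 2) / math.sqrt(2 * math.pi)
    return total * h


def chi2_1_mgf(t):
    """mgf of the chi-square distribution with 1 degree of freedom, t < 1/2."""
    return (1 - 2 * t) ** -0.5


for t in (-1.0, 0.0, 0.25):
    print(t, mgf_of_square_numeric(t), chi2_1_mgf(t))
```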
Moment generating functions for linear transformations
If we can deduce the moment generating function of aX+b from the
moment generating function of X, then we have completely and
uniquely determined the probability distribution of aX+b.
Theorem 3.4. Let X be a random variable with moment
generating function ψ_X(t), and let a and b be real numbers. Then
$$\psi_{aX+b}(t) = e^{tb}\,\psi_X(at).$$
Proof. We use basic rules concerning expectations and get that
$$\psi_{aX+b}(t) = E\big(e^{t(aX+b)}\big) = e^{tb}\,E\big(e^{(at)X}\big) = e^{tb}\,\psi_X(at).$$
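To see the theorem at work, the sketch below uses the rule ψ_{aX+b}(t) = e^{tb} ψ_X(at) (the standard form of this result) on a hypothetical example: X ~ U(0,1), so that 3X + 2 ~ U(2,5). The transformed mgf is compared with the mgf of U(2,5) computed directly. All function names are illustrative.

```python
import math


def uniform_mgf(t, low, high):
    """mgf of U(low, high): (e^{t*high} - e^{t*low}) / (t * (high - low)), and 1 at t = 0."""
    if t == 0:
        return 1.0
    return (math.exp(t * high) - math.exp(t * low)) / (t * (high - low))


def linear_transform_mgf(mgf_x, a, b):
    """Return the mgf of aX + b built from the mgf of X: t -> e^{tb} * mgf_x(a * t)."""
    return lambda t: math.exp(t * b) * mgf_x(a * t)


def mgf_x(t):
    """mgf of X ~ U(0, 1)."""
    return uniform_mgf(t, 0.0, 1.0)


mgf_y = linear_transform_mgf(mgf_x, a=3.0, b=2.0)   # mgf of Y = 3X + 2

for t in (-1.0, -0.2, 0.5, 1.0):
    print(t, mgf_y(t), uniform_mgf(t, 2.0, 5.0))
```

By uniqueness, agreement of the two mgf's on an interval around 0 identifies the distribution of 3X + 2 as U(2,5).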
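Moment generating functions for linear combinations of independent r.v.
Theorem 3.2 (generalized). Let X₁,X₂,…,Xn be independent random
variables with mgf's ψ₁(t),ψ₂(t),…,ψn(t). Let further a₁,a₂,…,an be real
numbers and consider the linear combination
$$U = \sum_{k=1}^{n} a_k X_k .$$
Then the moment generating function of U is given by
$$\psi_U(t) = \prod_{k=1}^{n} \psi_k(a_k t).$$
Corollary 3.2.1. If, in addition, X₁,X₂,…,Xn are equidistributed with common mgf ψ_X(t), then
$$\psi_U(t) = \prod_{k=1}^{n} \psi_X(a_k t), \quad \text{and in particular} \quad \psi_{S_n}(t) = \big(\psi_X(t)\big)^n \ \text{for}\ S_n = X_1 + \cdots + X_n .$$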
Problem 3.8.6 (Part 2)
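Let Y₁ and Y₂ be independent random variables such that
Y₁ ~ Exp(θ₁) and Y₂ ~ Exp(θ₂), where Exp(θ) is taken to have mean θ and
hence mgf 1/(1-θt) for t < 1/θ. The mgf of Y₁-Y₂ now follows via
$$\psi_{Y_1-Y_2}(t) = \psi_{Y_1}(t)\,\psi_{Y_2}(-t) = \frac{1}{1-\theta_1 t}\cdot\frac{1}{1+\theta_2 t}.$$
If now θ₁=θ₂=1 we have that
$$\psi_{Y_1-Y_2}(t) = \frac{1}{(1-t)(1+t)} = \frac{1}{1-t^2}, \quad |t| < 1,$$
which is the moment generating function of L(1).
Conclusion. The difference of two independent
Exp(1)-distributed random variables is L(1).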
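The conclusion can be illustrated by simulation. The sketch below draws differences of two independent Exp(1) variables and compares the empirical mgf at one point with 1/(1 − t²), the mgf of L(1). The seed, sample size, and evaluation point t = 0.3 are arbitrary illustrative choices, and a Monte Carlo estimate of this kind only agrees with the exact value up to sampling error.

```python
import math
import random


def empirical_mgf(samples, t):
    """Sample average of exp(t * x), an estimate of E(exp(tX))."""
    return sum(math.exp(t * x) for x in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is reproducible
differences = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(200_000)]

t = 0.3
estimate = empirical_mgf(differences, t)
exact = 1.0 / (1.0 - t * t)   # mgf of the standard Laplace distribution at t
print(estimate, exact)
```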
The normal distribution
A random variable X with density function given by
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty,$$
is said to be normally distributed with parameters μ and σ², and
we use the notation N(μ,σ²).
In previous courses in probability we have been taught that the
normal distribution possesses a number of nice properties. With
the aid of moment generating functions it is fairly easy to show
that this is indeed the case.
Thommy Perlinger, Sannolikhetslära och inferens II
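The moment generating function of the normal distribution
$$\psi_X(t) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{x^2-2\mu x+\mu^2-2\sigma^2 t x}{2\sigma^2}\right) dx .$$
For the integrand to be a normal density, the expression
x²-2μx+μ²-2σ²tx must be rewritten as a square of the form (x-θ)².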
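The moment generating function of the normal distribution
Since a square of the form (x-θ)² can be expressed as
$$(x-\theta)^2 = x^2 - 2\theta x + \theta^2,$$
we rewrite the expression in question, taking θ = μ+σ²t, as
$$x^2 - 2(\mu+\sigma^2 t)x + \mu^2 = \big(x-(\mu+\sigma^2 t)\big)^2 - 2\mu\sigma^2 t - \sigma^4 t^2 .$$
We now have the square that we sought, but it has left us with
two remainders. However, these remainders do not depend on x,
which means that they can be put outside of the integral.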
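The moment generating function of the normal distribution
$$\psi_X(t) = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{\big(x-(\mu+\sigma^2 t)\big)^2}{2\sigma^2}\right) dx = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right),$$
since the integrand in the last expression is the density of
N(μ+σ²t, σ²), so that the integral equals 1.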
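Moment generating functions for random vectors
Definition 3.2. Let X = (X₁,X₂,…,Xn)′ be a random vector. The moment
generating function of X is
$$\psi_{X_1,\ldots,X_n}(t_1,\ldots,t_n) = E\big(e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}\big),$$
or using vector notation
$$\psi_{\mathbf{X}}(\mathbf{t}) = E\big(e^{\mathbf{t}'\mathbf{X}}\big).$$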
The characteristic function
Definition 4.1. The characteristic function of a random variable X is
$$\varphi_X(t) = E\big(e^{itX}\big) = E(\cos tX) + i\,E(\sin tX).$$
Since |e^{itX}| = 1, the expectation is bounded, so the characteristic function exists for all t
and for all probability distributions.
The characteristic function of a random variable X completely and uniquely
determines the probability distribution of X.
The drawback of characteristic functions is that they necessitate
understanding complex analysis. Therefore the use of characteristic functions
is optional in this course.
The probability generating function
Definition 2.1. Let X be a discrete random variable whose
domain is (a subset of) the non-negative integers.
The probability generating function of X is
$$g_X(t) = E\big(t^X\big) = \sum_{k=0}^{\infty} t^k \Pr(X=k).$$
The most important property of the probability generating
function is, as for other transforms, that it completely and
uniquely determines the probability distribution.
Theorem 2.1. Let X and Y be discrete random variables whose
domain is (a subset of) the non-negative integers. If g_X = g_Y,
then p_X = p_Y.
The probability generating function
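So why do we call g_X(t) the probability generating function of X? As for the
moment generating function, repeated differentiation is the key. Since
$$g_X^{(k)}(t) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,t^{n-k}\,\Pr(X=n),$$
it follows that
$$g_X^{(k)}(0) = k!\,\Pr(X=k), \quad \text{i.e.} \quad \Pr(X=k) = \frac{g_X^{(k)}(0)}{k!}.$$
Hence, by differentiating g_X(t) k times and setting t=0 we ”generate” Pr(X=k), up to the factor k!. If
we differentiate g_X(t) k times and set t=1 (which requires more care) we
instead generate factorial moments.
Theorem 2.3. Let X be a discrete random variable whose domain is
(a subset of) the non-negative integers, and suppose that E|X|^k < ∞ for some
k=0,1,2,…. Then
$$E\big[X(X-1)\cdots(X-k+1)\big] = g_X^{(k)}(1).$$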
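The two uses of differentiation can be tried out on a pgf that is a polynomial. The sketch below is a hypothetical example with X ~ Bi(3, 1/2), whose pgf is (1/2 + t/2)³. It assumes the standard facts that the kth derivative at t = 0 divided by k! gives Pr(X = k), and that the kth derivative at t = 1 gives the kth factorial moment. The helper names are illustrative.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...], where c_k multiplies t^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def kth_derivative_at(coeffs, k, t):
    """Value at t of the k-th derivative of the polynomial with coefficients coeffs."""
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return sum(c * t ** i for i, c in enumerate(coeffs))


# Hypothetical example: X ~ Bi(3, 1/2), so g_X(t) = (1/2 + t/2)^3.
pgf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Derivatives at t = 0, divided by k!, recover the probabilities Pr(X = k).
probabilities = [kth_derivative_at(pgf, k, 0) / factorial(k) for k in range(4)]

# Derivatives at t = 1 give the factorial moments E[X(X-1)...(X-k+1)].
factorial_moments = [kth_derivative_at(pgf, k, 1) for k in range(4)]

mean = factorial_moments[1]
variance = factorial_moments[2] + mean - mean ** 2   # Var(X) = E[X(X-1)] + E(X) - E(X)^2
print(probabilities, factorial_moments, mean, variance)
```

Exact rational arithmetic is used so that the recovered probabilities match the coefficients of the pgf exactly.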
The probability generating function
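Theorem 2.2 (generalized). Let X₁,X₂,…,Xn be independent discrete random
variables, whose domains are (subsets of) the non-negative integers, with
pgf's g₁(t),g₂(t),…,gn(t). Let further a₁,a₂,…,an be real numbers and consider
the linear combination
$$U = \sum_{k=1}^{n} a_k X_k .$$
Then the probability generating function of U is given by
$$g_U(t) = \prod_{k=1}^{n} g_k\big(t^{a_k}\big).$$
Corollary 2.2.1. If, in addition, X₁,X₂,…,Xn are equidistributed with common pgf g_X(t), then
$$g_U(t) = \prod_{k=1}^{n} g_X\big(t^{a_k}\big), \quad \text{and in particular} \quad g_{S_n}(t) = \big(g_X(t)\big)^n \ \text{for}\ S_n = X_1 + \cdots + X_n .$$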
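Distributions with random parameters (Hierarchical models revisited)
The moment generating functions ψ_{X│M=m}(t) and ψ_M(t) are known. We are
interested in finding the marginal distribution of X, i.e., ψ_X(t).
$$\psi_X(t) = E\big(e^{tX}\big) = E\big(E(e^{tX} \mid M)\big), \quad \text{where} \quad E\big(e^{tX} \mid M=m\big) = \psi_{X\mid M=m}(t),$$
which implies that
$$\psi_X(t) = E\big(\psi_{X\mid M}(t)\big).$$
Often the expectation E(ψ_{X│M}(t)) can be written in terms of ψ_M, in which case
the distribution of X is easily found.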
Exercise 5.1.a (or Exercise 2.3.1.a)
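Situation. X│M=m ~ Po(m) and M ~ Exp(a), where Exp(a) is taken to have mean a. Find the distribution of X, i.e., g_X(t).
Solution. The probability generating function of X is given by
$$g_X(t) = E\big(t^X\big) = E\big(E(t^X \mid M)\big) = E\big(e^{M(t-1)}\big) = \psi_M(t-1) = \frac{1}{1-a(t-1)} = \frac{\frac{1}{1+a}}{1-\frac{a}{1+a}\,t}, \quad t < 1+\tfrac{1}{a}.$$
It therefore follows that X ~ Ge(1/(1+a)).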
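The mixture can also be checked directly on the probability function. The sketch below integrates the Po(m) probability of k against the Exp(a) density numerically and compares the result with the geometric probabilities p(1 − p)^k for p = 1/(1 + a). It assumes that Exp(a) is parametrized by its mean a and that Ge(p) lives on {0, 1, 2, …}; the integration limits and step count are illustrative.

```python
import math


def mixture_pmf(k, a, upper=80.0, steps=80000):
    """Trapezoidal estimate of Pr(X = k) = integral over m of Po(m)-pmf(k) * Exp(a)-density(m)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        m = i * h
        poisson = math.exp(-m) * m ** k / math.factorial(k)
        density = math.exp(-m / a) / a   # Exp(a) with mean a (assumed parametrization)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * poisson * density
    return total * h


def geometric_pmf(k, p):
    """Pr(X = k) = p * (1 - p)^k for k = 0, 1, 2, ..."""
    return p * (1 - p) ** k


a = 2.0
for k in range(5):
    print(k, mixture_pmf(k, a), geometric_pmf(k, 1 / (1 + a)))
```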
Sums of a random number of random variables
Sums of independent and identically distributed random variables (i.i.d.r.v)
are of great interest in probability theory and statistics.
In Corollary 3.2.1 we found that if X₁,X₂,…,Xn are i.i.d. with common moment
generating function ψ_X(t) and we let Sn = X₁+X₂+…+Xn, then
$$\psi_{S_n}(t) = \big(\psi_X(t)\big)^n .$$
But what if n, the number of terms in the sum, is not given but is itself
the outcome of a random variable N? What about the distribution of S_N?
If N is independent of the X’s then it turns out that we can find the probability
distribution of S_N using the methods presented in the previous section.
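Sums of a random number of random variables (Transforms)
Theorem 6.3. Let X₁,X₂,… be i.i.d. with moment generating function ψ_X(t).
Furthermore, let N be a discrete random variable, whose domain is
(a subset of) the non-negative integers, and independent of the X’s.
Set S₀ = 0 and Sn = X₁+X₂+…+Xn, for n≥1. Then
$$\psi_{S_N}(t) = g_N\big(\psi_X(t)\big).$$
Proof. By applying the methods used in Section 3.5 we get that
$$E\big(e^{tS_N} \mid N=n\big) = E\big(e^{tS_n}\big) = \big(\psi_X(t)\big)^n ,$$
which yields
$$\psi_{S_N}(t) = E\big(E(e^{tS_N} \mid N)\big) = E\big((\psi_X(t))^N\big) = g_N\big(\psi_X(t)\big).$$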
Problem 3.8.27 (Part 1)
Situation. A miner has been trapped in a mine with three doors.
One takes him to freedom after one hour, one brings him back
to the mine after three hours and the third one brings him back
after five hours.
He picks one of the three doors uniformly at random and
continues to do so until he is free.
Find the probability generating function for the time it will take
him to get out.
Problem 3.8.27 (Part 1)
Solution. The number of trials before he picks the correct way is given by
N = The number of trials where he picks a return path,
N ~ Ge(1/3), that is, Pr(N=n) = (1/3)(2/3)^n for n=0,1,2,…
Let Y represent the time it takes to complete one return path. Y has a two-point
distribution with equal probability on y=3 and y=5.
The probability distribution of Y is given by the generating function g_Y(t),
$$g_Y(t) = \tfrac{1}{2}t^3 + \tfrac{1}{2}t^5 .$$
Problem 3.8.27 (Part 1)
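The total time that he spends in the return paths is thus described by
$$Z = Y_1 + Y_2 + \cdots + Y_N ,$$
and by Theorem 6.1, which states that g_{S_N}(t) = g_N(g_X(t)), it therefore follows that
$$g_Z(t) = g_N\big(g_Y(t)\big) = \frac{1/3}{1 - \frac{2}{3}\cdot\frac{1}{2}(t^3+t^5)} = \frac{1}{3 - t^3 - t^5}.$$
Finally, the total time it takes to get out is given by W=Z+1, which means that
$$g_W(t) = t\,g_Z(t) = \frac{t}{3 - t^3 - t^5}.$$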
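The probabilities Pr(W = k) can be read off from the generating function by expanding it as a power series. The sketch below assumes the closed form g_W(t) = t/(3 − t³ − t⁵) for the total time W. Multiplying out g_W(t)(3 − t³ − t⁵) = t and matching coefficients gives the recursion used in the code. The function name and the cut-off at 200 hours are illustrative.

```python
from fractions import Fraction


def miner_time_pmf(max_hours):
    """Pr(W = k) for k = 0..max_hours, from g_W(t) * (3 - t^3 - t^5) = t.

    Matching the coefficient of t^k gives 3*c_k - c_{k-3} - c_{k-5} = 1 if k == 1 else 0.
    """
    pmf = [Fraction(0)] * (max_hours + 1)
    for k in range(1, max_hours + 1):
        rhs = Fraction(1 if k == 1 else 0)
        if k >= 3:
            rhs += pmf[k - 3]
        if k >= 5:
            rhs += pmf[k - 5]
        pmf[k] = rhs / 3
    return pmf


pmf = miner_time_pmf(200)
print(pmf[1], pmf[4], pmf[6], pmf[9], float(sum(pmf)))
```

For instance, Pr(W = 4) = 1/9 corresponds to choosing the three-hour door once and then the exit, which is a quick way to sanity-check the recursion by hand.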
Problem 3.8.25
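Situation. Let X₁,X₂,… be a sequence of i.i.d. 0-truncated Po(m)-distributed
random variables, that is,
$$\Pr(X=x) = \frac{m^x}{x!\,(e^m-1)}, \quad x = 1, 2, \ldots$$
Let further N ~ Bi(n,1-e^{-m}) be independent of X₁,X₂,…, and set
$$Y = X_1 + X_2 + \cdots + X_N \quad (Y = 0 \text{ when } N = 0).$$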
a. Find the distribution of Y.
b. Compute E(Y) without using (a).
Problem 3.8.25 a
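In order to use Theorem 3.6.1 we have to find the generating function of X.
$$g_X(t) = \sum_{x=1}^{\infty} t^x\,\frac{m^x}{x!\,(e^m-1)} = \frac{e^{mt}-1}{e^m-1}.$$
It therefore follows that
$$g_Y(t) = g_N\big(g_X(t)\big) = \left(e^{-m} + (1-e^{-m})\,\frac{e^{mt}-1}{e^m-1}\right)^{n} = \big(e^{-m} + e^{-m}(e^{mt}-1)\big)^{n} = e^{nm(t-1)},$$
and so it is clear that Y ~ Po(mn).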
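The composition of generating functions can be verified numerically. The sketch below assumes the pgf (e^{mt} − 1)/(e^m − 1) for the 0-truncated Po(m) distribution and the binomial pgf (1 − p + pt)^n, composes them, and compares the result with the Po(mn) pgf e^{mn(t−1)}. The parameter values n = 4 and m = 0.7 are arbitrary illustrative choices.

```python
import math


def truncated_poisson_pgf(t, m):
    """pgf of the 0-truncated Po(m) distribution: (e^{mt} - 1) / (e^m - 1)."""
    return math.expm1(m * t) / math.expm1(m)


def binomial_pgf(t, n, p):
    """pgf of Bi(n, p): (1 - p + p*t)^n."""
    return (1 - p + p * t) ** n


def random_sum_pgf(t, n, m):
    """pgf of Y = X_1 + ... + X_N with N ~ Bi(n, 1 - e^{-m}): g_N(g_X(t))."""
    p = 1 - math.exp(-m)
    return binomial_pgf(truncated_poisson_pgf(t, m), n, p)


def poisson_pgf(t, mean):
    """pgf of Po(mean): e^{mean * (t - 1)}."""
    return math.exp(mean * (t - 1))


n, m = 4, 0.7
for t in (0.0, 0.3, 1.0, 1.5):
    print(t, random_sum_pgf(t, n, m), poisson_pgf(t, n * m))
```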
Sums of a random number of random variables (Expectations)
Theorem 6.2 (mod.). Suppose the conditions of Theorem 6.3
are satisfied. If moreover E(N) < ∞ and E|X| < ∞ then
$$E(S_N) = E(N)\,E(X).$$
If, in addition, Var(N) < ∞ and Var(X) < ∞ then
$$\mathrm{Var}(S_N) = E(N)\,\mathrm{Var}(X) + \big(E(X)\big)^2\,\mathrm{Var}(N).$$
Problem 3.8.27 (Part 2)
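Find the mean and the variance for the time it takes him to reach freedom.
Basic rules of expectation yield that
$$E(Y) = \tfrac{1}{2}(3+5) = 4, \quad \mathrm{Var}(Y) = \tfrac{1}{2}(3^2+5^2) - 4^2 = 1, \quad E(N) = \frac{2/3}{1/3} = 2, \quad \mathrm{Var}(N) = \frac{2/3}{(1/3)^2} = 6.$$
According to Theorem 6.2 it therefore follows that
$$E(Z) = E(N)\,E(Y) = 8, \quad \mathrm{Var}(Z) = E(N)\,\mathrm{Var}(Y) + \big(E(Y)\big)^2\,\mathrm{Var}(N) = 2 + 96 = 98.$$
Finally, we thus have that
$$E(W) = E(Z) + 1 = 9, \quad \mathrm{Var}(W) = \mathrm{Var}(Z) = 98.$$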
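The arithmetic for the miner can be reproduced in a few lines. The sketch below assumes the standard formulas E(S_N) = E(N)E(X) and Var(S_N) = E(N)Var(X) + (E(X))²Var(N) for a random sum, together with E(N) = q/p and Var(N) = q/p² for N ~ Ge(p) on {0, 1, 2, …}. The function and variable names are illustrative.

```python
from fractions import Fraction


def random_sum_mean_var(mean_n, var_n, mean_x, var_x):
    """Mean and variance of S_N = X_1 + ... + X_N for N independent of the i.i.d. X's."""
    mean = mean_n * mean_x
    var = mean_n * var_x + mean_x ** 2 * var_n
    return mean, var


p = Fraction(1, 3)
q = 1 - p
mean_n, var_n = q / p, q / p ** 2                    # N ~ Ge(1/3): number of return paths

mean_y = Fraction(3 + 5, 2)                          # Y is 3 or 5 hours with equal probability
var_y = Fraction(3 ** 2 + 5 ** 2, 2) - mean_y ** 2

mean_z, var_z = random_sum_mean_var(mean_n, var_n, mean_y, var_y)
mean_w, var_w = mean_z + 1, var_z                    # W = Z + 1: the final hour to freedom
print(mean_w, var_w)
```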
Problem 3.8.25 b
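By Theorem 3.6.2.a, E(Y)=E(N)E(X), and since
$$E(N) = n(1-e^{-m}) \quad \text{and} \quad E(X) = \sum_{x=1}^{\infty} x\,\frac{m^x}{x!\,(e^m-1)} = \frac{m e^m}{e^m-1} = \frac{m}{1-e^{-m}},$$
it follows that
$$E(Y) = n(1-e^{-m})\cdot\frac{m}{1-e^{-m}} = nm,$$
just as expected, since Y ~ Po(mn).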
Branching processes
Branching processes are an important application of ”sums of a
random number of random variables”.
A branching process is a model of how (the number of
individuals in) a population will evolve over time.
In generation 0 (or at time t=0) there is an initial population (or
founding members). Each individual ”gives birth” to a random
number of ”children”. During their lifespans, these children give
birth to a random number of children, and so on. Let
X(n) = The number of individuals in generation n
where we here only have one founding member, i.e. X(0)=1.
The Galton-Watson process
A branching process is called a Galton-Watson process if
1. All individuals give birth according to the same probability distribution,
independently of each other, and
2. the number of children produced by an individual is independent of the
number of individuals in their generation.
Let Y1,Y2,… represent the number of children produced by individuals. It is
clear that they are i.i.d. non-negative integer-valued random variables.
Let the common probability function of Y1,Y2,… be given by p(k), k=0,1,2,…,
and the common probability generating function be given by g(t).
Let X(n) be the number of individuals in generation n, where X(0)=1.
Furthermore, let the probability generating function of X(n) be given by gn(t).
The Galton-Watson process
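We are now interested in finding the probability distribution, the mean, and
the variance of X(n) in such a Galton-Watson process.
Due to the fact that X(1)=Y1, and, for instance, that
$$X(2) = Y_1 + Y_2 + \cdots + Y_{X(1)} ,$$
where the summands now denote the numbers of children of the individuals in generation 1,
we can use results for sums of a random number of random variables.
Theorem 7.1. For such a Galton-Watson process we have
$$g_n(t) = g_{n-1}\big(g(t)\big) = g\big(g_{n-1}(t)\big), \quad n = 1, 2, \ldots, \quad \text{with } g_0(t) = t.$$
Theorem 7.2. Suppose m = E(Y1) < ∞ and σ² = Var(Y1) < ∞. Then
$$E\big(X(n)\big) = m^n, \qquad \mathrm{Var}\big(X(n)\big) = \sigma^2\big(m^{n-1} + m^n + \cdots + m^{2n-2}\big).$$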
Problem 3.8.35 b (modified)
Consider a Galton-Watson process where the offspring distribution
(of Y1, Y2,…) is Ge(p). Determine the distribution, mean, and variance of X(2).
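From Theorem 7.1 it follows that the probability generating function of X(2) is
$$g_2(t) = g\big(g(t)\big) = \frac{p}{1 - \frac{qp}{1-qt}} = \frac{p(1-qt)}{1 - qt - pq}, \quad \text{where } g(t) = \frac{p}{1-qt},\ q = 1-p.$$
From Theorem 7.2 it follows that, with m = q/p and σ² = q/p², the mean and the variance of X(2) are
$$E\big(X(2)\big) = m^2 = \frac{q^2}{p^2}, \qquad \mathrm{Var}\big(X(2)\big) = \sigma^2(m + m^2) = \frac{q^2}{p^4}.$$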
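These formulas are easy to evaluate exactly for a concrete p. The sketch below assumes the Ge(p) offspring pgf g(t) = p/(1 − qt) with q = 1 − p, the composition rule g_n = g∘g∘…∘g (n times), and the variance formula σ²(m^(n−1) + … + m^(2n−2)). The value p = 2/5 and all names are illustrative.

```python
from fractions import Fraction


def geometric_pgf(t, p):
    """pgf of Ge(p) on {0, 1, 2, ...}: p / (1 - q*t) with q = 1 - p."""
    return p / (1 - (1 - p) * t)


def generation_pgf(t, p, n):
    """g_n(t): the offspring pgf composed with itself n times (g_0(t) = t)."""
    value = t
    for _ in range(n):
        value = geometric_pgf(value, p)
    return value


def generation_mean_var(m, sigma2, n):
    """Mean m^n and variance sigma2 * (m^(n-1) + ... + m^(2n-2)) of X(n)."""
    mean = m ** n
    var = sigma2 * sum(m ** j for j in range(n - 1, 2 * n - 1))
    return mean, var


p = Fraction(2, 5)
q = 1 - p
mean_2, var_2 = generation_mean_var(q / p, q / p ** 2, 2)   # offspring mean q/p, variance q/p^2
extinct_by_2 = generation_pgf(Fraction(0), p, 2)            # Pr(X(2) = 0) = g_2(0)
print(mean_2, var_2, extinct_by_2)
```

Evaluating g_2 at t = 0 also gives the probability that the population has died out by generation 2, since the constant term of a pgf is the probability of the value 0.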
Problem 3.8.35 b (modified)
Determine Pr(X(2)=0), the probability that the population is extinct
by the second generation.
In this case the probability distribution of X(2) is not one of the common
distributions. We find the probabilities of X(2), p₂(k), by using the fact that
$$p_2(k) = \frac{g_2^{(k)}(0)}{k!},$$
which means that
$$\Pr\big(X(2)=0\big) = g_2(0) = \frac{p}{1-pq}.$$
The probability of extinction
For branching processes it is of natural interest to determine whether the
population will die out (at some point in time).
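Denote by η the probability of (ultimate) extinction of a branching process.
It then follows that
$$\eta = \Pr\big(X(n)=0 \text{ for some } n\big) = \lim_{n\to\infty} \Pr\big(X(n)=0\big) = \lim_{n\to\infty} g_n(0).$$
From Theorem 7.2 we conclude that η=1 if m<1, since then Pr(X(n)≥1) ≤ E(X(n)) = mⁿ → 0. But what if m≥1?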
Theorem 7.3. For a Galton-Watson process we have that
a. η satisfies the equation t = g(t).
b. η is the smallest non-negative root of the equation t = g(t).
c. η=1 for m ≤ 1 and η < 1 for m > 1.
Problem 3.8.35 a
Determine η, that is, the probability of (ultimate) extinction. Since m=q/p it
follows from Theorem 7.3.c that η=1 if p≥1/2. But what if p<1/2?
According to Theorem 7.3.b, η is the smallest non-negative root of the
equation t = g(t). Since
$$t = \frac{p}{1-qt} \iff qt^2 - t + p = 0 \iff (t-1)(qt-p) = 0,$$
the roots of the equation t=g(t) are t=1 and t=p/q. We thus have that
$$\eta = \begin{cases} 1, & p \ge 1/2, \\ p/q, & p < 1/2. \end{cases}$$
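The smallest non-negative root can also be found numerically, without solving the quadratic. The sketch below uses the fact that Pr(X(n) = 0) = g_n(0), so iterating η ← g(η) from η = 0 increases towards the extinction probability. It assumes the Ge(p) offspring pgf g(t) = p/(1 − qt); the tolerance and iteration cap are illustrative, and convergence is slow in the critical case p = 1/2.

```python
def geometric_pgf(t, p):
    """pgf of Ge(p) on {0, 1, 2, ...}: p / (1 - q*t) with q = 1 - p."""
    return p / (1 - (1 - p) * t)


def extinction_probability(p, tol=1e-12, max_iter=1_000_000):
    """Iterate eta <- g(eta) from eta = 0; the limit is the smallest root of t = g(t) in [0, 1]."""
    eta = 0.0
    for _ in range(max_iter):
        nxt = geometric_pgf(eta, p)
        if abs(nxt - eta) < tol:
            return nxt
        eta = nxt
    return eta


for p in (0.3, 0.7):
    q = 1 - p
    print(p, extinction_probability(p), min(1.0, p / q))
```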