A crash course on Conditional Expectation
1. Joint and conditional distribution
2. Conditional expectation
3. Law of total expectation
Chen Zhou
Erasmus University Rotterdam
Joint and Conditional Distribution
Joint and marginal mass function
▶ Let X and Y be two discrete random variables.
▶ The joint (probability) mass function p(x, y ) is defined for each pair
of numbers (x, y ) by
p(x, y ) = P(X = x, Y = y ).
▶ Let A be any set consisting of (x, y) pairs. Then,
  P((X, Y) ∈ A) = Σ_{(x,y)∈A} p(x, y).
▶ The marginal (probability) mass functions of X and Y, denoted by
  pX(x) and pY(y), respectively, are given by
  pX(x) = Σ_y p(x, y),   pY(y) = Σ_x p(x, y).
Conditional mass function
▶ Recall that, for two events A and B, the conditional probability of A
given B is defined by
  P(A | B) = P(A ∩ B) / P(B),
provided that P(B) > 0.
▶ If X and Y are two discrete random variables with joint mass function
p(x, y ), and marginal mass functions pX (x) and pY (y ), then the
conditional (probability) mass function of X given Y is
  pX|Y(x | y) = p(x, y) / pY(y),
provided that pY(y) > 0.
Example 1
When an automobile is stopped by a roving safety patrol, each tire is
checked for tire wear, and each headlight is checked to see whether it is
properly aimed. Let X denote the number of headlights that need
adjustment, and let Y denote the number of defective tires. Suppose the
joint probability mass function of X and Y is given in the following table.

          Y = 0   1      2      3      4      P(X = x)
X = 0     0.3     0.05   0.025  0.025  0.1    0.50
X = 1     0.18    0.03   0.015  0.015  0.06   0.3
X = 2     0.12    0.02   0.01   0.01   0.04   0.2
P(Y = y)  0.6     0.1    0.05   0.05   0.20   1

P(X = 0 | Y = 2) = P(X = 0, Y = 2) / P(Y = 2) = 0.025/0.05 = 0.5
P(X = 1 | Y = 2) = P(X = 1, Y = 2) / P(Y = 2) = 0.015/0.05 = 0.3
P(X = 2 | Y = 2) = P(X = 2, Y = 2) / P(Y = 2) = 0.01/0.05 = 0.2
P(X = 0 | Y = 2) + P(X = 1 | Y = 2) + P(X = 2 | Y = 2) = 1
The conditional mass function is still a mass function!
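The conditional probabilities above can be checked mechanically; a minimal Python sketch (the joint table is copied from the slide, the helper names are illustrative):

```python
# Joint pmf p(x, y) from the slide, stored as a dictionary.
p = {
    (0, 0): 0.3,  (0, 1): 0.05, (0, 2): 0.025, (0, 3): 0.025, (0, 4): 0.1,
    (1, 0): 0.18, (1, 1): 0.03, (1, 2): 0.015, (1, 3): 0.015, (1, 4): 0.06,
    (2, 0): 0.12, (2, 1): 0.02, (2, 2): 0.01,  (2, 3): 0.01,  (2, 4): 0.04,
}

def p_Y(y):
    # Marginal mass function of Y: sum the joint pmf over x.
    return sum(p[(x, yy)] for (x, yy) in p if yy == y)

def p_X_given_Y(x, y):
    # Conditional mass function of X given Y = y.
    return p[(x, y)] / p_Y(y)

print([round(p_X_given_Y(x, 2), 3) for x in range(3)])  # [0.5, 0.3, 0.2]
```

The three conditional probabilities sum to 1, confirming that the conditional mass function is itself a mass function.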
Joint and marginal density function
▶ Let X and Y be continuous random variables.
▶ If
  P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
for any two-dimensional set A, then f(x, y) is the joint (probability)
density function for X and Y.
▶ The marginal (probability) density function of X is given by
  fX(x) = ∫_{−∞}^{∞} f(x, y) dy.
▶ Likewise, the marginal density function of Y is given by
  fY(y) = ∫_{−∞}^{∞} f(x, y) dx.
Example 2
▶ Two components of a minicomputer have the following joint density
function for their useful lifetimes X and Y :
  f(x, y) = xe^(−x(1+y)) for x ≥ 0 and y ≥ 0, and f(x, y) = 0 otherwise.
▶ Compute the marginal density functions of X and Y .
▶ For x ≥ 0,
  fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^∞ xe^(−x(1+y)) dy = e^(−x).
▶ For y ≥ 0,
  fY(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^∞ xe^(−x(1+y)) dx = (1 + y)^(−2).
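The closed-form marginal can be sanity-checked numerically; a sketch using a midpoint-rule integral (the truncation point 50 and the step count are arbitrary choices, not part of the example):

```python
import math

def f(x, y):
    # Joint density from the slide: x * exp(-x(1+y)) on the positive quadrant.
    return x * math.exp(-x * (1 + y)) if x >= 0 and y >= 0 else 0.0

def f_X(x, upper=50.0, steps=200000):
    # Marginal density of X: integrate f(x, y) over y with a midpoint rule,
    # truncating the infinite range at `upper` (the tail is negligible).
    h = upper / steps
    return sum(f(x, (k + 0.5) * h) for k in range(steps)) * h

# The closed form says f_X(1) = e^{-1}.
print(round(f_X(1.0), 4), round(math.exp(-1), 4))
```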
Conditional density function
▶ Let X and Y be two continuous random variables with joint density
function f (x, y ), and marginal density functions fX (x) and fY (y ).
▶ Then, the conditional (probability) density function of X given Y is
  fX|Y(x | y) = f(x, y) / fY(y),
provided that fY(y) > 0.
Conditional cumulative distribution function
▶ Let X and Y be two discrete random variables.
▶ Then, the conditional cumulative distribution function of X given
Y = y is given by
  FX|Y(a | y) = P(X ≤ a | Y = y) = Σ_{x≤a} P(X = x | Y = y) = Σ_{x≤a} pX|Y(x | y),
provided that P(Y = y ) > 0.
▶ Let X and Y be two continuous random variables.
▶ Then, the conditional cumulative distribution function of X given
Y = y is given by
  FX|Y(a | y) = P(X ≤ a | Y = y) = ∫_{−∞}^{a} fX|Y(x | y) dx,
provided that fY (y ) > 0.
Conditional Expectation
Conditional expectation at a specific value
▶ Let X and Y be jointly distributed discrete random variables with
mass function p(x, y ). Then the conditional expectation of h(X )
given that Y = y is defined as
  E(h(X) | Y = y) = Σ_x h(x) pX|Y(x | y)
▶ Let X and Y be jointly distributed continuous random variables with
density function f (x, y ). Then the conditional expectation of h(X )
given that Y = y is defined as
  E(h(X) | Y = y) = ∫_{−∞}^{∞} h(x) fX|Y(x | y) dx
Example R3.5
▶ The joint density of X and Y is given by
  f(x, y) = 6xy(2 − x − y) for 0 < x < 1 and 0 < y < 1, and f(x, y) = 0 otherwise.
▶ Compute E(X | Y = y ), where 0 < y < 1.
Example R3.5: continued
▶ Starting point
  E(X | Y = y) = ∫_0^1 x fX|Y(x | y) dx.
▶ The domain of x may also depend on y (not in this example).
▶ Recall the conditional density
  fX|Y(x | y) = f(x, y) / fY(y)
▶ Hence, we need the marginal density of Y
  fY(y) = ∫_0^1 6xy(2 − x − y) dx = y(4 − 3y)
Example R3.5: continued
▶ The conditional density of X given the event Y = y is
  fX|Y(x | y) = 6x(2 − x − y) / (4 − 3y) for 0 < x < 1 and 0 < y < 1, and 0 otherwise.
▶ The conditional expectation of X given Y = y is
  E(X | Y = y) = ∫_0^1 x · 6x(2 − x − y)/(4 − 3y) dx = (5 − 4y)/(8 − 6y)
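The closed form for E(X | Y = y) can be cross-checked numerically at one value of y; a sketch with a midpoint rule (the choice y = 0.5 and the grid size are illustrative):

```python
def f(x, y):
    # Joint density 6xy(2 - x - y) on the unit square.
    return 6 * x * y * (2 - x - y) if 0 < x < 1 and 0 < y < 1 else 0.0

def cond_mean(y, steps=100000):
    # E(X | Y = y) as a ratio of midpoint-rule integrals over x:
    # numerator is the integral of x * f(x, y), denominator is f_Y(y).
    h = 1.0 / steps
    num = sum((k + 0.5) * h * f((k + 0.5) * h, y) for k in range(steps)) * h
    den = sum(f((k + 0.5) * h, y) for k in range(steps)) * h
    return num / den

y = 0.5
print(round(cond_mean(y), 4), round((5 - 4 * y) / (8 - 6 * y), 4))  # 0.6 0.6
```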
Conditional expectation as a random variable
▶ We may think of E(h(X ) | Y = y ) as a function of y
▶ Denote the function as u(y ) = E(h(X ) | Y = y )
▶ Without specifying the value y , the value of Y is random
▶ The same as applying the function u(y ) to the random variable Y
▶ We obtain a random variable u(Y ), which is called the conditional
expectation of h(X ) given Y , and denoted by E(h(X ) | Y ).
▶ A major difference
E(h(X ) | Y = y ) is a non-random function of y
E(h(X ) | Y ) is a random variable!
Law of Total Expectation
Law of total expectation: Tower property
▶ Consider E(X | Y ) (a random variable!)
▶ The law of total expectation (or Tower property) states that
E(X ) = E(E(X | Y ))
▶ This result holds regardless of the joint distribution of (X, Y).
▶ It leaves open the possibility of “smartly” choosing Y when calculating the
  expectation of an X with a complex distribution function.
Example 3
▶ A quality control plan calls for randomly selecting three items and
observing the number X of defects.
▶ However, the proportion U of defects produced by the machine varies
from day to day and has a uniform distribution on the interval (0, 1).
▶ Find the expected number E(X ) of defects observed among three
sampled items.
▶ Answer:
▶ Given the event U = u, X has expectation 3u.
▶ That is, E(X | U = u) = 3u, and E(X | U) = 3U.
▶ It follows that
  E(X) = E(E(X | U)) = E(3U) = 3E(U) = 3/2.
Computing variances
▶ Consider the random variable Var(X | Y ), conditional variance of X
given Y .
▶ We may write
  Var(X | Y) = E(X² | Y) − (E(X | Y))².
▶ The conditional variance formula states that
Var(X ) = E(Var(X | Y )) + Var(E(X | Y ))
▶ Important in the Rao-Blackwell Theorem and in “variance reduction”.
Computing variances: Example 3
▶ Find the variance of the number X of defects observed among three
sampled items.
▶ Given the event U = u, we have that X has variance 3u(1 − u), and
thus Var(X | U) = 3U(1 − U).
▶ Use the conditional variance formula
Var(X ) = E(Var(X | U)) + Var(E(X | U))
= E(3U(1 − U)) + Var(3U)
= 3E(U) − 3E(U 2 ) + 9Var(U).
▶ We have E(U) = 1/2, E(U²) = 1/3 and
  Var(U) = E(U²) − (E(U))² = 1/3 − 1/4 = 1/12.
▶ Hence,
  Var(X) = 3/2 − 3/3 + 9/12 = 5/4.
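The conditional variance formula's answer can also be checked by simulation; a sketch re-using the same mixture as before (seed and sample size are arbitrary):

```python
import random

random.seed(1)

def sample_X():
    # U ~ Uniform(0, 1), then X | U = u ~ Bin(3, u).
    u = random.random()
    return sum(random.random() < u for _ in range(3))

n = 200000
xs = [sample_X() for _ in range(n)]
m = sum(xs) / n
var = sum((x - m) ** 2 for x in xs) / n  # empirical variance
print(round(var, 2))  # close to Var(X) = 5/4
```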
Relate (conditional) probability to expectation of an
indicator variable
▶ Let A be an arbitrary event.
▶ Define the indicator variable IA by
  IA = 1 if A occurs, and IA = 0 otherwise.
▶ Simple but useful property:
E(IA ) = P(IA = 1) · 1 + P(IA = 0) · 0 = P(IA = 1) = P(A).
▶ Similarly, for any random variable Y ,
E(IA | Y = y ) = P(A | Y = y ).
Applying conditional expectation for calculating probability
▶ By the law of total expectation, it follows for a discrete random
  variable Y that
  P(A) = E(IA) = Σ_y E(IA | Y = y) P(Y = y) = Σ_y P(A | Y = y) P(Y = y).
▶ This is in fact the law of total probability.
▶ By the law of total expectation, it follows for a continuous random
  variable Y that
  P(A) = E(IA) = ∫_{−∞}^{∞} E(IA | Y = y) fY(y) dy = ∫_{−∞}^{∞} P(A | Y = y) fY(y) dy.
Example 3 again: calculating probability
▶ A quality control plan calls for randomly selecting three items and
observing the number X of defects.
▶ However, the proportion U of defects produced by the machine varies
from day to day and has a uniform distribution on the interval (0, 1).
▶ Find the unconditional probability that exactly two defects are
observed in the sample.
▶ Answer:
▶ Given the event U = u, we have that X has a Bin(3, u) distribution.
▶ Hence,
  P(X = 2) = ∫_0^1 P(X = 2 | U = u) fU(u) du = ∫_0^1 (3 choose 2) u²(1 − u) du = 1/4.
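The integral evaluates to 1/4, which a quick numerical check confirms; a sketch with a midpoint rule (the grid size is an arbitrary choice):

```python
from math import comb

def integrand(u):
    # P(X = 2 | U = u) * f_U(u) = C(3, 2) u^2 (1 - u) * 1 on (0, 1).
    return comb(3, 2) * u ** 2 * (1 - u)

steps = 100000
h = 1.0 / steps
prob = sum(integrand((k + 0.5) * h) for k in range(steps)) * h
print(round(prob, 4))  # 0.25
```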
Markov Process
Chen Zhou
Who is Markov?
▶ The only name showing up in a course title!
Andrey Markov (1856-1922)
▶ Have you met him before?
▶ Markov inequality in Probability Theory!
What is a process?
▶ In probability theory, we make models for “randomness”
▶ Univariate random variable X : e.g. temperature at 12.00 tomorrow
▶ Multivariate random vector (X , Y ): e.g. height and weight of a
random person
▶ What if the dimension goes to infinity?
▶ Random sequence: X1 , X2 , . . .
▶ In statistics, we often assume independent and identically distributed
observations
▶ But what if these variables are dependent?
▶ Stochastic process: infinitely many (dependent) random variables
▶ How to model their dependence? No infinite-dimensional joint
distribution function!
▶ How to use such a process for modeling real life?
▶ What conclusions can one draw from such a model?
Modeling real life by a stochastic process
▶ Random sequence X1 , X2 , . . .
▶ Robinson’s game: Xi ∈ {1, 2, 3} indicating the location
▶ Particle coloring: Xi = the number of orange particles
▶ Marginal and dependence
▶ Each Xi is a random variable
▶ Dependence often described by conditional distribution
▶ Example: Particle coloring (total number of particles: m)
  Pr(Xi = n + 1 | Xi−1 = n) = 1 − n/m,  Pr(Xi = n − 1 | Xi−1 = n) = n/m
▶ Goal: study some characteristics of the process
▶ More than random sequence: indexing on real number
▶ Example: status of a machine, temperature over time
▶ A stochastic process X (t) with t ∈ [0, ∞)
Markov process:
Given the present, the future does not depend on the past.
The organization of the lectures
▶ Literature: Ross, S. Introduction to Probability Models, 12th
edition, 2019.
▶ Theorems and Exercises refer to the 12th edition
▶ It is possible to use the 11th edition (indicated by [...])
▶ Lectures
▶ Two lectures per week: theory and simple examples
▶ Exercise lecture: once per two weeks (Friday)
▶ Exercise lectures are not only about exercises! Also important methods.
▶ Blended learning
▶ Videos before (exercise) lecture: 1-2 videos (10-15min)
▶ Motivation of the coming lecture
▶ Preparation knowledge
▶ Simple concepts explained
▶ Lecture (1h20min): remaining time is used for Q&A
▶ Lectures are recorded, but released only if
▶ 70% of registered students show up
▶ No technical failure
Exercises and Tutorials
▶ Tutorial: 2-3 exercises explained
▶ 5-6 exercises per lecture
▶ One simple exercise ♡: do it right after the lecture!
▶ Two exercises ♢: explained in the tutorial; try to do them before the tutorial
▶ 2-3 extra exercises: do them after the tutorial
▶ Additional practical material (released in Week 5)
▶ Sample exam: solution videos released in Week 7
▶ Additional exercises
Assignment and Exam
▶ Mid-term assignment
▶ Released in Week 4.
▶ Format: two questions in the format of final exam
▶ Opportunity to receive feedback on your solutions!
▶ Final exam (100 points)
▶ Format: Open book
▶ Content: all (exercise) lectures
▶ See the Sample Exam (ANS format)
Plan for Lecture 1
Lecture 0: A crash course on Conditional Expectation
1. Conditional expectation at a specific value (Chap 3.2-3.3)
2. Conditional expectation as a random variable, Law of total
expectation (Chap 3.4 till Example R3.9)
3. Variance and conditional variance (Chap 3.4.1)
4. Calculating probability (Chap 3.5 till Example R3.23 [R3.22])
Lecture
1. A quick refresh for conditional expectation
▶ A difficult example with Markov flavor: Example R3.12 [R3.13]
2. Stochastic process: an overview
Law of Total Expectation
Law of total expectation: Tower property
▶ Consider E(X | Y ) (a random variable!)
▶ The law of total expectation (or Tower property) states that
E(X ) = E(E(X | Y ))
A difficult example: Example R3.12 [R3.13]
▶ A miner is trapped in a mine containing three doors.
▶ The first door leads to a tunnel that takes him to safety after two
hours of travel.
▶ The second door leads to a tunnel that returns him to the mine after
three hours.
▶ The third door leads to a tunnel that returns him to the mine after five
  hours.
▶ Assume that the miner does not learn, and is at all times equally
likely to choose any one of the doors.
▶ What is the expected length of time till the miner reaches safety?
Example R3.12 [R3.13]: continued
▶ Let X be the time until the miner reaches safety, and let Y be the
door initially chosen.
▶ From the description:
▶ E(X | Y = 1) = 2,
▶ E(X | Y = 2) = 3 + E(X ),
▶ E(X | Y = 3) = 5 + E(X ).
▶ By the law of total expectation,
  E(X) = E(X | Y = 1)P(Y = 1) + E(X | Y = 2)P(Y = 2) + E(X | Y = 3)P(Y = 3)
       = (1/3)E(X | Y = 1) + (1/3)E(X | Y = 2) + (1/3)E(X | Y = 3)
       = (1/3)(2 + 3 + E(X) + 5 + E(X)).
▶ Solving this equation yields E(X ) = 10.
▶ This is a “recursive” way of finding the quantity desired.
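The recursive answer E(X) = 10 can be corroborated by direct simulation of the miner; a sketch (seed and number of replications are arbitrary):

```python
import random

random.seed(0)

def escape_time():
    # Doors: 1 -> safety after 2 hours; 2 -> back after 3 hours;
    # 3 -> back after 5 hours. The miner picks uniformly each time.
    t = 0
    while True:
        door = random.randint(1, 3)
        if door == 1:
            return t + 2
        t += 3 if door == 2 else 5

n = 200000
avg = sum(escape_time() for _ in range(n)) / n
print(round(avg, 1))  # close to E(X) = 10
```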
Compound random variables: Example R3.10 and R3.19
[R3.20]
▶ Let X1 , X2 , . . . be a sequence of iid random variables.
▶ Let N be a random variable with possible outcomes 0, 1, 2, . . ., which
is independent of the sequence X1 , X2 , . . ..
▶ Define the random variable SN by
  SN = Σ_{i=1}^{N} Xi.
▶ Then, SN is called a compound random variable.
Compound random variables: expectation
▶ Denote the common mean of X1 , X2 , . . . by µ.
▶ We have, as N is independent of X1, X2, . . .,
  E(Σ_{i=1}^{N} Xi | N = n) = E(Σ_{i=1}^{n} Xi | N = n) = E(Σ_{i=1}^{n} Xi) = nμ,
  hence E(Σ_{i=1}^{N} Xi | N) = Nμ.
▶ By the law of total expectation,
  E(Σ_{i=1}^{N} Xi) = E(E(Σ_{i=1}^{N} Xi | N)) = E(Nμ) = E(N)μ.
Compound random variables: conditional variance
▶ Denote the common variance of X1 , X2 , . . . by σ 2 .
▶ We have, as N is independent of X1, X2, . . .,
  Var(SN | N = n) = Var(Σ_{i=1}^{N} Xi | N = n) = Var(Σ_{i=1}^{n} Xi | N = n)
                  = Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi) = nσ².
▶ This implies
Var(SN | N) = Nσ 2 .
Compound random variables: unconditional variance
▶ Recall that E(SN | N) = Nµ.
▶ According to the conditional variance formula,
Var(SN ) = E(Var(SN | N)) + Var(E(SN | N)).
with
▶ E(Var(SN | N)) = E(Nσ 2 ) = E(N)σ 2 ,
▶ Var(E(SN | N)) = Var(Nµ) = µ2 Var(N).
▶ Hence, we obtain
Var(SN ) = E(N)σ 2 + µ2 Var(N).
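The formula can be checked by simulation on a concrete compound variable. The distributions below are illustrative choices, not part of the slide: N uniform on {0, . . . , 4} (so E(N) = 2, Var(N) = 2) and Xi ~ Uniform(0, 1) (so μ = 1/2, σ² = 1/12), giving Var(SN) = 2/12 + (1/4)·2 = 2/3:

```python
import random

random.seed(0)

def sample_S():
    # Compound variable: draw N, then sum N iid Uniform(0, 1) terms.
    n = random.randint(0, 4)
    return sum(random.random() for _ in range(n))

m = 200000
s = [sample_S() for _ in range(m)]
mean = sum(s) / m
var = sum((x - mean) ** 2 for x in s) / m
print(round(mean, 2), round(var, 2))  # close to E(N)*mu = 1.0 and 2/3
```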
Stochastic Processes: an overview
Stochastic processes: time and states
▶ A stochastic process is a collection of (infinitely many) random
variables
▶ Often describe the behavior of a system that evolves randomly in time
Be careful with the notation: how to define a stochastic process.
▶ Discrete time
▶ Stochastic process is of the form {Xn , n ≥ 0}.
▶ Continuous time
▶ Stochastic process is of the form {X (t), t ≥ 0}.
▶ The random variables take values in a certain state space S.
▶ Discrete: S is finite or countable
▶ Countable means that there is one-to-one matching between the
elements of S and the natural numbers.
▶ {0, 1, 2, . . .} for instance.
▶ Continuous: S is uncountable.
▶ [0, 1) for instance.
Time and states: examples
▶ Inventory level of a product in a system with periodic review.
▶ Discrete time, discrete state space.
▶ Temperature at noon on successive days.
▶ Discrete time, continuous state space.
▶ Number of customers in a shop during a given day.
▶ Continuous time, discrete state space.
▶ Value of the AEX-index during a certain day.
▶ Continuous time, continuous state space.
Our starting point: discrete time and discrete state space
▶ The stochastic process: {Xn , n ≥ 0}, with values in a countable state
space S
▶ The event that the stochastic process is in state s at time n is denoted
by {Xn = s}.
▶ Questions to be answered
▶ Short term: What can we say about the future states
Xn+1 , Xn+2 , Xn+3 , . . ., once we know the past and current states
X0 , X1 , . . . , Xn ?
▶ Long term: Which part of the time does a stochastic process spend in
a certain state?
▶ More: What is the average cost per period if in every state certain
costs have to be paid?
▶ Starting point: how to define and characterize such a process
Plan for Lecture 2
Video before lecture
▶ Markov chain and transition matrix (Slides 2-9)
Lecture
1. Markov chain and transition matrix (Chap 4.1, Examples R4.4 and
R4.5)
2. n−step transition (Chap 4.2)
3. Long run behavior: an introduction
Markov Chain and Transition Matrix
Markov chain
▶ Given the present, the future does not depend on the past.
▶ A stochastic process {Xn , n ≥ 0} with state space S is called a
discrete time Markov chain if, for all states i, j, s0 , . . . , sn−1 ∈ S,
P(Xn+1 = j | Xn = i, Xn−1 = sn−1 , . . . , X0 = s0 )
= P(Xn+1 = j | Xn = i).
▶ The present Xn
▶ The past X0 , . . . , Xn−1
▶ The future Xn+1 only depends on the present and not on the past.
▶ The conditional probabilities
P(Xn+1 = j | Xn = i)
are called (1-step) transition probabilities from state i to state j.
Time homogeneous Markov chain
▶ In most applications we have, for all n,
P(Xn+1 = j | Xn = i) = P(X1 = j | X0 = i) = Pij .
▶ In this case, the 1-step transition probabilities are said to be time
homogeneous.
▶ That is, they do not depend on n.
▶ In the sequel, we will only look at time homogeneous Markov chains.
Transition probability matrix
▶ The matrix
  P = [ P00  P01  P02  …
        P10  P11  P12  …
        …
        Pi0  Pi1  Pi2  …
        …               ]
is called the (1-step) transition probability matrix.
▶ A transition probability matrix has the following two properties:
▶ nonnegative entries: Pij ≥ 0 for all i and j,
▶ rows summing up to 1: Σ_{j∈S} Pij = 1 for all i.
▶ A matrix P with the above two properties is called a stochastic
matrix.
Visualize the transition matrix
▶ A transition diagram is a directed graph, visualizing a Markov chain.
▶ The nodes in the graph represent the states.
▶ Draw an arc from i to j if Pij > 0.
▶ Put the value Pij next to the arc.
▶ For instance, the transition matrix
  P = [ 0    1    0
        0.5  0    0.5
        0.1  0.6  0.3 ]
is visualized by a diagram with nodes 1, 2, 3 and arcs
1→2 (1), 2→1 (0.5), 2→3 (0.5), 3→1 (0.1), 3→2 (0.6), 3→3 (0.3).
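A transition matrix can be validated and simulated directly; a minimal Python sketch using the 3-state example above (the sampling helper is illustrative):

```python
import random

random.seed(0)

# Transition matrix from the slide (states 1, 2, 3 stored as indices 0, 1, 2).
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.1, 0.6, 0.3]]

# Stochastic-matrix check: every row sums to 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

def step(i):
    # Draw the next state from row i of P by inverting the cumulative sums.
    u, acc = random.random(), 0.0
    for j, pij in enumerate(P[i]):
        acc += pij
        if u < acc:
            return j
    return len(P[i]) - 1

state, path = 0, [0]
for _ in range(5):
    state = step(state)
    path.append(state)
print(path)  # a random trajectory; the first move is always 0 -> 1 since P01 = 1
```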
Example R4.1: weather forecasting
▶ Suppose that the chance of rain tomorrow depends on the previous
weather conditions only through the weather today.
▶ If it rains today, tomorrow it will
▶ rain with probability α,
▶ be sunny with probability 1 − α.
▶ If today is sunny, tomorrow it will
▶ rain with probability β,
▶ be sunny with probability 1 − β.
▶ Let Xn be the weather on day n (0=rain, 1=sunny).
▶ The stochastic process {Xn , n ≥ 0} is a Markov chain with state
space S = {0, 1}, and transition probability matrix
  P = [ α  1−α
        β  1−β ]
Example R4.7: bonus malus
▶ In car insurance, the annual premium depends on the last year
premium and on the number of claims made last year.
▶ Typically, no claims result in a lower premium and many claims result
in a higher premium.
Class   Annual premium   Next class after: 0 claims   1 claim   2 claims   > 2 claims
1       200              1                            2         3          4
2       250              1                            3         4          4
3       400              2                            4         4          4
4       600              3                            4         4          4
▶ The number of claims per year has a Poisson distribution with
parameter λ.
Example R4.7: bonus malus
▶ Let Xn be the bonus malus class at time n.
▶ The stochastic process {Xn , n ≥ 0} is a Markov chain with state
space S = {1, 2, 3, 4}, and transition probability matrix
  P = [ a0  a1  a2  1−a0−a1−a2
        a0  0   a1  1−a0−a1
        0   a0  0   1−a0
        0   0   a0  1−a0       ]
▶ Here
  ak = e^(−λ) λ^k / k!
represents the probability of having k claims per year.
Example R4.4: a non-Markov chain
▶ Suppose that today’s weather depends on the weather of the last 2 days.
▶ if it has rained the past two days, it will rain tomorrow with probability
0.7
▶ if it rained today but not yesterday, it will rain tomorrow with
probability 0.5
▶ if it rained yesterday but not today, it will rain tomorrow with
probability 0.4
▶ if it has not rained in the past two days, it will rain tomorrow with
probability 0.2.
▶ Let Xn be the weather on day n. Is the process {Xn , n ≥ 0} a Markov
chain?
Example R4.4: smartly define S to achieve “Markovian”
▶ With “smartly” choosing the state space S, we are able to obtain a
Markov chain.
▶ Let the state at time n represent the weather on day n and day n − 1.
▶ state 0: (R, R);
▶ state 1: (R, S);
▶ state 2: (S, R);
▶ state 3: (S, S).
▶ Clearly it is not possible to go from (R, R) to (R, S) or (S, S).
▶ We now have
  P00 = P(X2 = (R, R) | X1 = (R, R)) = 0.7,
  P01 = P(X2 = (R, S) | X1 = (R, R)) = 0,
  P02 = P(X2 = (S, R) | X1 = (R, R)) = 0.3,
  P03 = P(X2 = (S, S) | X1 = (R, R)) = 0.
Example R4.4: continued
▶ Hence, we may describe this “two-day weather” process by means of
a Markov chain with transition matrix
  P = [ 0.7  0    0.3  0
        0.5  0    0.5  0
        0    0.4  0    0.6
        0    0.2  0    0.8 ]
Example R4.5: random walk
▶ A random walk is a Markov chain with state space given by Z
▶ The state space is infinite, but countable
▶ At state i, it may walk one step “up” to i + 1 or “down” to i − 1
▶ The transition probabilities are given by
  Pi,i+1 = p,  Pi,i−1 = 1 − p  for all i ∈ Z, with 0 < p < 1.
n-step Transition
n-step transition probabilities
▶ So far we considered only one-step, what happens as time goes by?
▶ Probabilities P(Xn = j | X0 = i) are called n-step transition
probabilities.
▶ Denote P(Xn = j | X0 = i) by Pij^n.
▶ Question: How do we calculate n-step transition probabilities?
Example of the “two-day” weather
▶ Recall the states
▶ state 0: (R, R);
▶ state 1: (R, S);
▶ state 2: (S, R);
▶ state 3: (S, S).
▶ And the transition matrix on state space {0, 1, 2, 3}
  P = [ 0.7  0    0.3  0
        0.5  0    0.5  0
        0    0.4  0    0.6
        0    0.2  0    0.8 ]
▶ It rained today (Friday) but not yesterday (Thursday), what is the
probability that both Sunday and Monday will be sunny?
▶ Current state: (R, S) (state 1)
▶ Target: in three steps, we get to (S, S) (state 3)
▶ What is the probability P13^3?
An inefficient calculation
(Transition diagram of the four states, with arcs 0→0 (0.7), 0→2 (0.3),
1→0 (0.5), 1→2 (0.5), 2→1 (0.4), 2→3 (0.6), 3→1 (0.2), 3→3 (0.8).)
▶ Starting in 1, the Markov chain can arrive in state 3 in 3 steps via the
path 1 ↣ 2 ↣ 3 ↣ 3 or the path 1 ↣ 0 ↣ 2 ↣ 3.
▶ Hence,
  P13^3 = 0.5 · 0.6 · 0.8 + 0.5 · 0.3 · 0.6 = 0.33.
▶ This method is inefficient if we need to make many steps!
Chapman-Kolmogorov equations
▶ An efficient solution to compute n-step transition probabilities: the
Chapman-Kolmogorov equations
  Pij^(n+m) = Σ_{k∈S} Pik^(n) Pkj^(m).
▶ Proof
  Pij^(n+m) = P(Xn+m = j | X0 = i)
            = Σ_{k∈S} P(Xn+m = j, Xn = k | X0 = i)
            = Σ_{k∈S} P(Xn+m = j | Xn = k, X0 = i) · P(Xn = k | X0 = i).
▶ From the definition of the Markov chain, it follows that
  Pij^(n+m) = Σ_{k∈S} P(Xn+m = j | Xn = k) P(Xn = k | X0 = i) = Σ_{k∈S} Pik^(n) Pkj^(m).
Chapman-Kolmogorov equations in matrix notation
▶ If we denote the n-step transition matrix by P(n) , then we can rewrite
the Chapman-Kolmogorov equations in the form of a matrix product
P(n+m) = P(n) P(m) .
▶ For a time homogeneous Markov chain, by successive application of
the Chapman-Kolmogorov equations we obtain the n-step transition
matrix
P(n) = Pn .
“Two-day” weather example: an efficient way
▶ The Markov chain with state space {0, 1, 2, 3} and transition matrix
  P = [ 0.7  0    0.3  0
        0.5  0    0.5  0
        0    0.4  0    0.6
        0    0.2  0    0.8 ]
▶ What is the probability P13^3?
▶ The answer rests in the 3-step transition matrix
  P^3 = [ 0.403  0.120  0.207  0.270
          0.345  0.120  0.205  0.330
          0.200  0.176  0.120  0.504
          0.150  0.168  0.110  0.572 ]
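The 3-step matrix can be reproduced with a plain matrix power; a minimal Python sketch (the `matmul` helper is illustrative, not from the slides):

```python
def matmul(A, B):
    # Plain-Python matrix product of A (n x m) and B (m x p).
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

P = [[0.7, 0.0, 0.3, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.4, 0.0, 0.6],
     [0.0, 0.2, 0.0, 0.8]]

P3 = matmul(P, matmul(P, P))  # P^3
print(round(P3[1][3], 3))  # 0.33, matching the path-by-path calculation
```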
“Two-day” weather example: unconditional probability
▶ What is the probability that after 3 transitions, the Markov chain will
be in state 3?
▶ This is an unconditional probability:
▶ The state of X0 is unknown
▶ Use the law of total probability: consider all cases
P(X3 = 3) = P(X3 = 3 | X0 = 0)P(X0 = 0)
+ P(X3 = 3 | X0 = 1)P(X0 = 1)
+ P(X3 = 3 | X0 = 2)P(X0 = 2)
+ P(X3 = 3 | X0 = 3)P(X0 = 3)
= 0.270P(X0 = 0) + 0.330P(X0 = 1)
+ 0.504P(X0 = 2) + 0.572P(X0 = 3).
▶ The answer depends on the initial state
Unconditional distribution of Xn
▶ Let α(0) denote the initial distribution, that is, the vector with
elements P(X0 = i) with i ∈ S.
▶ Let α(n) denote the distribution at time n, that is, the vector with
elements P(Xn = i) with i ∈ S.
▶ Then, by conditioning on the initial state we have
  P(Xn = j) = Σ_{i∈S} P(Xn = j | X0 = i) P(X0 = i) = Σ_{i∈S} Pij^n αi^(0).
▶ Thus, with the initial distribution α(0) and the transition probability
matrix P, one can calculate the distribution at time n by
  α(n) = (P^n)⊤ α(0) = (P⊤)^n α(0).
A difficult example
▶ Suppose weather is described via a Markov chain with state space
S = {0, 1, 2}.
▶ State 0=rain, state 1=cloudy, state 2=sunny.
▶ Moreover, suppose the transition matrix is
  P = [ 0.4  0.6  0
        0.2  0.5  0.3
        0.1  0.7  0.2 ]
▶ Monday is sunny.
▶ What is the probability that it will not rain in the next three days
(Tuesday, Wednesday, Thursday)?
▶ An inefficient solution
▶ P(2 ↣ 2 ↣ 2 ↣ 2) + P(2 ↣ 2 ↣ 2 ↣ 1) + · · ·
▶ The complexity of the inefficient solution: ruling out some rainy days
during the three days
An efficient solution: smartly design “states”
▶ Solve the complexity by designing an “absorbing state”
▶ A state i with Pii = 1 is called absorbing.
▶ You can enter an absorbing state, but you may never leave . . .
▶ We design state 0 (rain) to be an “absorbing state”
▶ In this “parallel universe”: once rains, always rains
▶ Although the two worlds are different, the event “it will not rain in
the next three days” in the real world is the same as “not entering the
absorbing state” in the “parallel universe”
▶ The transition matrix in the “parallel universe”: change P to
  Q = [ 1    0    0
        0.2  0.5  0.3
        0.1  0.7  0.2 ]
▶ Compute the corresponding 3-step transition matrix
  Q^3 = [ 1       0       0
          0.4430  0.3770  0.1800
          0.3830  0.4200  0.1970 ]
An efficient solution: continued
▶ The event “not entering absorbing state in the next three days”
means after three days, we are in state 1 or 2.
▶ Recall that we start from a sunny day (state 2)
▶ The desired probability is the probability that the Markov chain,
  starting from state 2, arrives in state 1 or 2 after 3 steps.
  P(no rain within 3 days) = Q21^3 + Q22^3 = 0.4200 + 0.1970 = 0.6170.
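The absorbing-state calculation is easy to reproduce; a sketch computing Q^3 by repeated multiplication (the `matmul` helper is illustrative):

```python
def matmul(A, B):
    # Plain-Python matrix product.
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Modified chain: state 0 (rain) made absorbing, other rows unchanged.
Q = [[1.0, 0.0, 0.0],
     [0.2, 0.5, 0.3],
     [0.1, 0.7, 0.2]]

Q3 = matmul(Q, matmul(Q, Q))
# Start sunny (state 2); avoiding rain for 3 days means ending in state 1 or 2.
no_rain = Q3[2][1] + Q3[2][2]
print(round(no_rain, 4))  # 0.617
```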
General procedure for designing absorbing states
▶ Let {Xn , n ≥ 0} be a Markov chain with transition probabilities Pij .
▶ Suppose that we concentrate on a specific set of states, A .
▶ We want to determine the probability that a Markov chain never
enters any of the states in A by time n.
▶ We design states in A as absorbing states.
▶ Transform this Markov chain to a new transition probabilities Qij
defined by
  Qij = 1 if i ∈ A and j = i;  Qij = 0 if i ∈ A and j ≠ i;  Qij = Pij otherwise.
▶ As long as the set A has not been visited, the two Markov chains follow
  identical transition probabilities.
▶ Therefore, for i, j ∈ S \ A , Qijn represents the probability that the
original chain, starting in i, will be in state j at time n without ever
entering any one of the states in A .
▶ You may even collapse the states in A into one state!
Long Run Behavior: an Introduction
Limiting distribution
▶ So far we have studied the transient behavior of Markov chains after
n periods.
▶ Next we are interested in the limiting behavior of Markov chains
(after infinite periods).
▶ Let π denote the limiting distribution of the Markov chain, that is,
the vector with elements πi = limn→∞ P(Xn = i) with i ∈ S.
▶ Questions we would like to answer are:
▶ Does the limiting distribution π exist?
▶ If the limiting distribution π exists, is it unique?
▶ Or does π depend on the initial distribution α(0) ?
▶ How do you compute the limiting distribution π if it exists?
Example 1
▶ Consider a Markov chain with transition matrix
  P = [ 0  1
        1  0 ]
▶ This Markov chain is represented by a graph with states 1 and 2 and
  arcs 1→2 and 2→1, each with probability 1.
Example 1
▶ Compute the transient distributions.
  α(0) = [1, 0],   α(1) = P⊤α(0) = [0, 1],
  α(2) = P⊤α(1) = [1, 0],   α(3) = P⊤α(2) = [0, 1],
  ...
  α(2n) = P⊤α(2n−1) = [1, 0],   α(2n+1) = P⊤α(2n) = [0, 1].
▶ Conclusion: the limiting distribution does not exist!
Example 2
▶ Consider a Markov chain with transition matrix
  P = [ 0  0.5  0.5  0
        0  0    0    1
        0  0    1    0
        0  0    0    1 ]
▶ This Markov chain is represented by a graph with states 1-4 and arcs
  1→2 (0.5), 1→3 (0.5), 2→4 (1), 3→3 (1), 4→4 (1).
Example 2
▶ Compute the transient distributions.
  If P(X0 = 1) = 1:
    α(0) = (1, 0, 0, 0), α(1) = (0, 0.5, 0.5, 0), α(2) = (0, 0, 0.5, 0.5), ...,
    α(n) = (0, 0, 0.5, 0.5).
  If P(X0 = 2) = 1:
    α(0) = (0, 1, 0, 0), α(1) = (0, 0, 0, 1), α(2) = (0, 0, 0, 1), ...,
    α(n) = (0, 0, 0, 1).
▶ Conclusion: Limiting distribution depends on initial distribution!
Example 3
▶ Consider a Markov chain with transition matrix
  P = [ 0.50  0.50
        0.75  0.25 ]
▶ This Markov chain is represented by a graph with states 1 and 2 and arcs
  1→1 (0.50), 1→2 (0.50), 2→1 (0.75), 2→2 (0.25).
▶ A simplified version of “Robinson’s game”
Example 3
▶ Compute the transient distributions.
  If P(X0 = 1) = 1:
    α(0) = (1, 0), α(1) = (0.5, 0.5), α(2) = (0.625, 0.375), ...,
    α(5) = (0.60, 0.40), α(6) = (0.60, 0.40), ...
  If P(X0 = 2) = 1:
    α(0) = (0, 1), α(1) = (0.75, 0.25), α(2) = (0.563, 0.437), ...,
    α(5) = (0.60, 0.40), α(6) = (0.60, 0.40), ...
▶ Conclusion: only one limiting distribution exists!
Recapitulating
▶ The following situations may be encountered.
▶ No limiting distribution exists.
▶ See Example 1.
▶ Several limiting distributions exist.
▶ See Example 2.
▶ A unique limiting distribution exists.
▶ See Example 3.
▶ How can this be explained?
▶ To answer this question, we first have to formulate a few structural
properties of Markov chains.
Plan for Lecture 3
Video before lecture
▶ Recurrent and transient states (Slides 2-8)
Lecture
1. Classification of states (Chap 4.3 till Example R4.18 [R4.17])
2. Long run limit (Chap 4.4)
Recurrent and Transient States
Communication of states
▶ Recall the notation P_ij^k: the probability that, starting from state i, the chain arrives at state j after k steps.
▶ j is called accessible from i if P_ij^k > 0 for some k ≥ 0.
  ▶ In particular, i is accessible from i, as P_ii^0 > 0.
▶ States i and j are said to communicate if they are accessible from
each other.
▶ We denote this by i ↔ j
▶ Communicating states form a class:
▶ i ↔ i;
▶ If i ↔ j, then j ↔ i;
▶ If i ↔ j and j ↔ k then i ↔ k.
▶ If there is only one class, the MC is said to be irreducible.
▶ If there are more classes, the Markov chain is called reducible.
A simple example
▶ Consider a Markov chain with transition matrix

      [ 0  0  0.5  0.5 ]
  P = [ 1  0   0    0  ]
      [ 0  1   0    0  ]
      [ 0  1   0    0  ]

▶ Is this MC irreducible?
A not-so-simple example
▶ Consider a Markov chain with transition matrix

      [ 0.50  0.50   0     0     0   ]
      [ 0.50  0.50   0     0     0   ]
  P = [  0     0    0.50  0.50   0   ]
      [  0     0    0.50  0.50   0   ]
      [ 0.25  0.25   0     0    0.50 ]

▶ Is this MC irreducible?
Recurrent and transient
▶ Let fi denote the probability that, starting in state i, the process will
ever reenter state i.
▶ State i is said to be recurrent if fi = 1, and transient if fi < 1.
▶ Recurrent: “One day I will be back!”
▶ Transient: “I may never come back!”
▶ Previous example (five states)
▶ The classes {0, 1}, {2, 3} are recurrent, and {4} is transient.
Visiting a recurrent state
▶ If a Markov chain starts in a recurrent state i, then with probability 1 the
MC visits state i infinitely often.
▶ Let the Markov chain start in i. It will eventually reenter state i, with
probability 1.
▶ At the moment the Markov chain reenters state i, we are back in the
original situation.
▶ The Markov chain “restarts” at the moment it reenters state i.
▶ The restarted Markov chain will eventually reenter state i, with
probability 1.
▶ Moreover, the restarted Markov chain “restarts” at the moment it
reenters state i.
▶ The rerestarted Markov chain will eventually reenter state i, with
probability 1.
▶ . . . and so on and so forth . . .
▶ The Markov chain visits state i infinitely often, with probability 1.
Visiting a transient state
▶ Let the Markov chain start in a transient state i.
▶ Recall that fi < 1.
▶ The probability that the Markov chain will never reenter state i is
1 − fi > 0.
▶ The probability that the Markov chain visits state i exactly n times is
equal to f_i^{n−1}(1 − f_i).
▶ The number of visits to state i follows a geometric distribution with
finite mean (1 − f_i)^{−1}.
▶ If starting from a transient state i, the Markov chain visits state i
only a finite number of times, with probability 1.
Necessary and sufficient condition for a transient state
▶ Consider a transient state i.
▶ Observe that Σ_{n=0}^∞ I_{Xn=i} is the number of visits to state i.
▶ E(Σ_{n=0}^∞ I_{Xn=i} | X0 = i) is the expected number of visits to state i,
  if starting from state i.
▶ If state i is transient, then E(Σ_{n=0}^∞ I_{Xn=i} | X0 = i) is finite.
▶ Hence Σ_{n=0}^∞ P_ii^n is finite, since

  E(Σ_{n=0}^∞ I_{Xn=i} | X0 = i) = Σ_{n=0}^∞ E(I_{Xn=i} | X0 = i)
                                 = Σ_{n=0}^∞ P(Xn = i | X0 = i)
                                 = Σ_{n=0}^∞ P_ii^n.

▶ Conversely, if Σ_{n=0}^∞ P_ii^n is finite, then E(Σ_{n=0}^∞ I_{Xn=i} | X0 = i)
is finite, and hence state i is transient.
Various ways of distinguishing between recurrent and
transient states
▶ According to the probability fi that, starting in state i, the Markov
chain will ever reenter state i.
▶ State i is recurrent if fi = 1.
▶ State i is transient if fi < 1.
▶ According to the number of visits of state i.
▶ State i is recurrent if the Markov chain visits state i infinitely often,
with probability 1.
▶ State i is transient if the Markov chain visits state i only a finite
number of times, with probability 1.
▶ According to Σ_{n=0}^∞ P_ii^n:
  ▶ State i is recurrent if Σ_{n=0}^∞ P_ii^n is infinite.
  ▶ State i is transient if Σ_{n=0}^∞ P_ii^n is finite.
Classification of States
Classification of states
▶ Recurrence and transience are class properties.
▶ If state i is recurrent and state i communicates with state j, then state
j is also recurrent.
▶ If state i is transient and state i communicates with state j, then state
j is also transient.
▶ A Markov chain can be decomposed in recurrent classes and transient
classes.
Recurrent and transient class for finite-state Markov chain
▶ Statement 1: In a finite-state Markov chain, not all states are
transient.
▶ If all states were transient, then after finitely many steps the chain could
not return to any of the states, which yields a contradiction.
▶ Statement 2: All states of an irreducible finite-state Markov chain are
recurrent.
▶ A finite-state Markov chain should have at least one recurrent class.
▶ In an irreducible Markov chain all states belong to the same class.
▶ The two statements can be proved by rigorous arguments.
Further decomposing recurrent states
▶ Recurrent states: with probability 1, it comes back
▶ Denote N_j = min{n > 0 : X_n = j}.
▶ N_j | X0 = j is the time to return to j. Then "j is recurrent" means

  P(N_j < +∞ | X0 = j) = 1,

  i.e., N_j | X0 = j does not take the value +∞.
▶ A random variable that never takes the value +∞ may still have an infinite mean.
▶ Conversely, a random variable with a finite mean cannot take the value
+∞ with positive probability.
▶ Distinguish two types of recurrent states
▶ A recurrent state is called positive recurrent if the expected time until
the process returns to the same state is finite.
E (Nj |X0 = j) < +∞
▶ Recurrent states which are not positive are called null recurrent states.
E (Nj |X0 = j) = +∞
Positive recurrent states: properties
▶ Positive recurrence is a class property.
▶ In a finite-state Markov chain, all recurrent states are positive
recurrent.
▶ At this moment we cannot prove these two statements rigorously; we
rely on intuition.
▶ So far we classified states into
▶ Recurrent: “One day I will come back, for infinitely many times!”
▶ Positive recurrent: “On average I will be back in finite time”
▶ Null recurrent: “The expected time for my return is infinity!”
▶ Transient: “I may never come back”
▶ “The expected number of visits is finite”
A new classification system: periodic states
▶ The period d of state i is defined as

  d = gcd{n > 0 : P_ii^n > 0}
▶ Here gcd denotes the greatest common divisor.
▶ Thus, starting in i, the Markov chain can return to i only at multiples
of the period d, and d is the largest such integer.
▶ To compute the period of state i, take the gcd of the lengths of all
paths which start at i, return to i, and have positive probability.
▶ “I will come back only at certain seasons!”
▶ A state with period d > 1 is said to be periodic.
▶ A state with period d = 1 is said to be aperiodic.
▶ Periodic states may complicate the study of the Markov chain.
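The gcd definition can be checked mechanically: collect the return times n with (P^n)_ii > 0 up to some cutoff and take their gcd. A small sketch (assuming NumPy); the function name `period` and the cutoff `n_max` are illustrative, and truncating at `n_max` is a heuristic that works for small chains:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, n_max=50):
    """Period of state i: gcd of all n <= n_max with (P^n)_ii > 0."""
    P = np.asarray(P, dtype=float)
    Pn = np.eye(len(P))
    return_times = []
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 1e-12:
            return_times.append(n)
    return reduce(gcd, return_times) if return_times else 0

# Example 1's two-state flip-flop returns only at even times: period 2.
d_flip = period([[0, 1], [1, 0]], 0)             # -> 2
# Example 3's chain can stay put, so it is aperiodic.
d_ex3 = period([[0.5, 0.5], [0.75, 0.25]], 0)    # -> 1
```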
Periodicity: properties
▶ Periodicity is also a class property.
▶ That is, if i has period d, and states i and j communicate, then j also
has period d.
▶ This can be proved rigorously! But we skip the proof.
▶ So we have a new classification system according to periodicity
▶ Joining the two systems: aperiodic, positive recurrent states are called
ergodic.
Example: periodicity
[Figure: transition graph on states 1–6]
▶ Question: is the Markov chain visualized above periodic or aperiodic?
Long Run Limit
Long-run behavior of a MC
▶ We are interested in the limiting behavior lim_{n→∞} P_ij^n.
▶ We hope the limit exists.
▶ We hope the limit does not depend on i (the starting point).

Theorem (Theorem 4.1)
For an irreducible ergodic Markov chain, lim_{n→∞} P_ij^n exists and is
independent of i.

▶ Denote π_j = lim_{n→∞} P_ij^n for j ∈ S, and the limiting distribution
π = (π_j)_{j∈S}.
The stationary distribution of a MC
▶ The stationary distribution ϖ = (ϖ_j)_{j∈S} is the unique solution
of the steady-state equations

  ϖ_j = Σ_i ϖ_i P_ij,  j ∈ S,
  Σ_j ϖ_j = 1.

▶ Intuitively, consider a MC starting from the distribution ϖ.
▶ The starting point follows the distribution P(X0 = j) = ϖ_j.
▶ Compute P(X1 = j) by conditioning on the state at time 0:

  P(X1 = j) = Σ_i P(X1 = j | X0 = i) P(X0 = i) = Σ_i P_ij ϖ_i = ϖ_j.

▶ Once the MC starts from ϖ, we will always have P(Xn = j) = ϖ_j.
This is the definition of stationary!
Limiting distribution and stationary distribution
▶ Once the MC starts from ϖ, we will always have P(Xn = j) = ϖj
▶ For such a MC, the limiting distribution is ϖ
▶ Recall that the limiting distribution does not depend on where the
MC starts
▶ For an irreducible ergodic Markov chain, the limiting distribution π
coincides with the stationary distribution ϖ.
▶ In other words, π is the unique solution of the steady-state equations.
▶ Whenever we have an irreducible, positive recurrent, periodic Markov
chain, then the stationary distribution ϖ is still the unique solution of
the steady-state equations.
Example: social classes
▶ Suppose that the transition between social classes (upper, middle or
lower) of the successive generations in a family can be modeled as a
Markov chain.
▶ That is, it is assumed that the occupation of a child depends only on
the occupation of her parents.
▶ Let the transition probability matrix for a given society be

      [ 0.45  0.48  0.07 ]
  P = [ 0.05  0.70  0.25 ]
      [ 0.01  0.50  0.49 ]
▶ Compute the long run percentages of people in different classes for
this society.
Social classes: continued
▶ The steady state equations are
π0 = 0.45π0 + 0.05π1 + 0.01π2
π1 = 0.48π0 + 0.70π1 + 0.50π2
π2 = 0.07π0 + 0.25π1 + 0.49π2
π0 + π1 + π2 = 1
▶ Interpretation:
▶ In an equilibrium a fraction π0 of people is in the upper class and a
fraction 0.45π0 of their children manage to remain in the upper class.
▶ Moreover, the inflow of the middle class into the upper class is a
fraction 0.05π1 and from the lower class a fraction 0.01π2 .
▶ So assuming an equilibrium yields the above equations.
▶ The equilibrium percentages of the classes are approximately 6%, 62% and 31%.
Reward/cost of a process
▶ Let {Xn , n ≥ 1} be an irreducible Markov chain with stationary
probabilities πj , j ≥ 0.
▶ Let r be a bounded function defined on the state space S.
▶ Here r (j) can be considered as the reward of being in state j.
▶ Then, with probability 1,

  lim_{N→∞} (1/N) Σ_{n=1}^N r(X_n) = Σ_j r(j) π_j.

▶ Thus, Σ_j r(j) π_j gives us the average reward per unit time.
Example: output of a machine
▶ Consider a machine which can be in three states: good, bad and
failed.
▶ In the good state, its output is 200 euro per hour.
▶ In the bad state, its output is 100 euro per hour.
▶ In the failed state, its output is 0 euro per hour.
▶ Suppose that we observe the state of the machine every hour and
that the transitions can be described by a Markov chain.
▶ What is the mean output per hour?
▶ By solving the steady-state equations, we obtain that the limiting
probabilities of the good, bad and failed states are 0.70, 0.20 and
0.10, respectively.
▶ Then, the mean reward is 200 · 0.70 + 100 · 0.20 + 0 · 0.10 = 160 euro
per hour.
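The 160-euro answer is just the inner product of the reward vector with the limiting probabilities stated above. A one-line check (assuming NumPy):

```python
import numpy as np

# Rewards per hour in the good, bad and failed states (euro).
rewards = np.array([200.0, 100.0, 0.0])
# Limiting probabilities obtained from the steady-state equations.
limit_probs = np.array([0.70, 0.20, 0.10])

mean_reward = rewards @ limit_probs   # average output: 160 euro per hour
```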
Plan for Lecture 4
Video before lecture
▶ Gambler’s ruin and potential questions (Slides 2-6)
Lecture
1. Mean time spent in transient states (Chap 4.6)
2. Time from transient states to recurrent states (NOT in the book!)
Gambler’s Ruin
Gambler’s ruin: the problem
▶ Gambler’s ruin is a classical probability problem.
▶ Consider a game in which “the bank” pays out €2 with probability p,
and pays out nothing with probability 1 − p.
▶ In order to play this game, you need to pay a fee of €1.
▶ In other words, for each game, you either win €1 or lose €1.
▶ Assume that the successive plays of the game are independent.
▶ A gambler has an initial capital of €i.
▶ He will continue playing the game until
  ▶ his capital is €N (“break the bank”),
  ▶ or his capital is €0 (gambler’s ruin).
▶ What is the probability of breaking the bank?
Gambler’s ruin: the model
[Figure: random-walk transition graph on states 0, 1, . . . , N — from each state 1 ≤ i ≤ N − 1, up to i + 1 with probability p and down to i − 1 with probability q = 1 − p; states 0 and N are absorbing.]
▶ Let Xn denote the player’s capital at time n.
▶ Then, {Xn , n ≥ 0} is a Markov chain with transition probabilities
P00 = PNN = 1,
Pi,i+1 = p, Pi,i−1 = 1 − p,
i = 1, 2, . . . , N − 1.
▶ In fact, {Xn , n ≥ 0} is a random walk.
Gambler’s ruin: the question
▶ Let Pi , i = 0, 1, . . . , N denote the probability that, starting with ei,
the gambler will eventually reach eN.
▶ We are interested in calculating Pi for each i.
▶ Obviously, P0 = 0 and PN = 1 since {0} and {N} are two absorbing
states (two recurrent classes).
▶ For any 0 < i < N, state i is a transient state.
▶ {1, 2, . . . , N − 1} forms a transient class.
▶ The question on Pi is essentially: “what is the probability of reaching
one recurrent class (rather than the other) when starting from a
transient state?”
A systematic view about potential questions
▶ If a Markov Chain starts from a recurrent state
▶ The process travels only within the class it belongs to
▶ Within the class, if it is ergodic, there is a limit distribution
▶ If there are recurrent class(es) and transient class(es) and the Markov
Chain starts from a transient class
▶ It may visit some other transient states belonging to the same class
▶ It will eventually land in one recurrent class
▶ Questions to be handled when starting from a transient state:
  ▶ How many times does the process visit one (other) transient state?
  ▶ Would the process ever visit one (other) transient state?
  ▶ How many steps does it take to land in the recurrent class (if only one)?
  ▶ What is the chance to land in a specific recurrent class (if many)?
Gambler’s ruin: the solution
▶ Recall that Pi , i = 0, 1, . . . , N denote the probability that, starting
with ei, the gambler will eventually reach eN.
▶ Conditioning on the outcome of the first play yields

  P_i = p P_{i+1} + (1 − p) P_{i−1},   i = 1, 2, . . . , N − 1.

▶ Re-organize the equation to

  P_{i+1} − P_i = ((1 − p)/p) (P_i − P_{i−1}),   i = 1, 2, . . . , N − 1.
Solution using iteration: continued
▶ Note from the figure that P_0 = 0. Therefore,

  P_2 − P_1 = ((1 − p)/p)(P_1 − P_0) = ((1 − p)/p) P_1,
  P_3 − P_2 = ((1 − p)/p)(P_2 − P_1) = ((1 − p)/p)^2 P_1,
  ...
  P_N − P_{N−1} = ((1 − p)/p)(P_{N−1} − P_{N−2}) = ((1 − p)/p)^{N−1} P_1.

▶ Adding up the first i − 1 of these equations gives

  P_i − P_1 = P_1 Σ_{k=1}^{i−1} ((1 − p)/p)^k.
Solution using iteration: continued
▶ From

  P_i = P_1 Σ_{k=0}^{i−1} ((1 − p)/p)^k,

▶ we get

  P_i = [1 − ((1 − p)/p)^i] / [1 − ((1 − p)/p)] · P_1   if p ≠ 1/2,
  P_i = i P_1                                            if p = 1/2.

▶ Solve P_1 by using P_N = 1:

  P_1 = [1 − ((1 − p)/p)] / [1 − ((1 − p)/p)^N]   if p ≠ 1/2,
  P_1 = 1/N                                        if p = 1/2.
Final solution for Pi
▶ Therefore,

  P_i = [1 − ((1 − p)/p)^i] / [1 − ((1 − p)/p)^N]   if p ≠ 1/2,
  P_i = i/N                                          if p = 1/2.

▶ Note that as N → ∞,

  lim_{N→∞} P_i = 1 − ((1 − p)/p)^i   if p > 1/2,
  lim_{N→∞} P_i = 0                    if p ≤ 1/2.

▶ Conclusion:
  ▶ if p > 1/2, there is a positive probability that the gambler’s fortune will
increase indefinitely;
  ▶ if p ≤ 1/2, the gambler will go broke with probability 1.
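The closed form for P_i can be sanity-checked against a direct linear solve of P_i = pP_{i+1} + (1 − p)P_{i−1} with the boundary conditions P_0 = 0, P_N = 1. A sketch (assuming NumPy); the values p = 0.4 and N = 7 match the example used in the next lecture:

```python
import numpy as np

def ruin_prob_formula(i, N, p):
    """P_i: probability of reaching N before 0, starting from capital i."""
    if p == 0.5:
        return i / N
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)

def ruin_prob_solve(i, N, p):
    """Same quantity via the linear system P_k = p P_{k+1} + (1-p) P_{k-1}."""
    A = np.eye(N + 1)
    b = np.zeros(N + 1)
    b[N] = 1.0                       # boundary rows encode P_0 = 0, P_N = 1
    for k in range(1, N):
        A[k, k + 1] -= p
        A[k, k - 1] -= 1 - p
    return np.linalg.solve(A, b)[i]

p3 = ruin_prob_formula(3, 7, 0.4)    # -> about 0.1477
```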
Mean Time Spent in Transient States
Questions to be discussed
▶ Consider a Markov Chain starting from a transient state.
  ▶ How many times does the process visit one (other) transient state?
  ▶ Would the process ever visit one (other) transient state?
  ▶ How many steps does it take to land in the recurrent class (if only one)?
  ▶ What is the chance to land in a specific recurrent class (if many)?
▶ The first question is also called “mean time spent in a transient
state”.
▶ The second question is also called “probability of entering a transient
state”.
Mean time spent: solution
▶ Suppose that all transient states in a Markov chain are in the set
T = {1, . . . , t}, while the recurrent states are {t + 1, . . . , s}.
▶ Consider transient states i and j.
▶ Let s_ij be the expected number of time periods that the Markov chain
is in state j, given that it started in state i.
▶ By “conditioning on the first step”, we obtain

  s_ij = 1 + Σ_k P_ik s_kj   if i = j,
  s_ij = Σ_k P_ik s_kj       if i ≠ j.

▶ Since a transition from a recurrent state to a transient state is not
possible, this simplifies into

  s_ij = 1 + Σ_{k=1}^t P_ik s_kj   if i = j,
  s_ij = Σ_{k=1}^t P_ik s_kj       if i ≠ j.
Mean time spent: matrix notation
▶ Let

        [ P_11  P_12  · · ·  P_1t ]          [ s_11  s_12  · · ·  s_1t ]
  P_T = [  ...   ...   ...   ...  ]  and S = [  ...   ...   ...   ...  ]
        [ P_t1  P_t2  · · ·  P_tt ]          [ s_t1  s_t2  · · ·  s_tt ]

▶ P_T is the t × t submatrix of P corresponding to T.
▶ Note that the row sums of P_T are not necessarily equal to 1.
▶ S is the t × t matrix to be solved.
▶ In matrix notation, we have

  S = I + P_T S.

▶ Equivalently, we can solve S as

  S = (I − P_T)^{−1}.
Example R4.32 [R4.30]: Gambler’s ruin problem

[Figure: gambler’s ruin transition graph on states 0, 1, . . . , 7 — up with probability p, down with probability q; states 0 and 7 are absorbing.]
▶ Suppose in the gambler’s ruin problem that p = 0.4 and N = 7.
▶ The Gambler starts with 3 units.
▶ Compute
1. the expected amount of time the gambler has 5 units,
2. the expected amount of time the gambler has 2 units.
Example R4.32 [R4.30]: continued
▶ Note that the transient states are {1, 2, . . . , 6}:
▶ The answer for this problem boils down to computing s3,5 and s3,2 .
▶ The matrix P_T which specifies P_ij, i, j = 1, . . . , 6, is as follows:

        [  0   0.4   0    0    0    0  ]
        [ 0.6   0   0.4   0    0    0  ]
  P_T = [  0   0.6   0   0.4   0    0  ]
        [  0    0   0.6   0   0.4   0  ]
        [  0    0    0   0.6   0   0.4 ]
        [  0    0    0    0   0.6   0  ]
Example R4.32 [R4.30]: continued
▶ Inverting I − P_T gives

      [ 1.615  1.025  0.631  0.369  0.194  0.078 ]
      [ 1.537  2.562  1.578  0.923  0.486  0.194 ]
  S = [ 1.421  2.368  2.999  1.753  0.923  0.369 ]
      [ 1.246  2.076  2.630  2.999  1.578  0.631 ]
      [ 0.984  1.639  2.076  2.368  2.562  1.025 ]
      [ 0.590  0.984  1.246  1.421  1.537  1.615 ]

▶ Hence, s_{3,5} = 0.9228 and s_{3,2} = 2.3677.
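The matrix S above can be reproduced directly from S = (I − P_T)^{−1}. A sketch (assuming NumPy), using the tridiagonal P_T for the gambler's ruin with p = 0.4 and transient states 1–6:

```python
import numpy as np

p, t = 0.4, 6
# Tridiagonal P_T: from each transient state, up with prob p, down with 1-p.
PT = np.zeros((t, t))
for k in range(t - 1):
    PT[k, k + 1] = p        # win: move up one unit
    PT[k + 1, k] = 1 - p    # lose: move down one unit

S = np.linalg.inv(np.eye(t) - PT)

s_35 = S[2, 4]   # s_{3,5} with 0-based indices -> about 0.9228
s_32 = S[2, 1]   # s_{3,2}                      -> about 2.3677
```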
Probability of entering a transient state
▶ Still consider transient states i and j
▶ Let fij denote the probability that the Markov chain ever enters state
j ∈ T given that it starts in state i ∈ T .
▶ To derive f_ij, we calculate s_ij using conditioning:

  s_ij = E(time in j | starts in i, ever enters state j) · f_ij
       + E(time in j | starts in i, never enters state j) · (1 − f_ij).

▶ This yields

  s_ij = (1 + s_jj) f_jj + 1 · (1 − f_jj) = 1 + f_jj s_jj   if i = j,
  s_ij = s_jj f_ij                                          if i ≠ j.

▶ We then obtain the desired probability

  f_ij = (s_jj − 1)/s_jj   if i = j,
  f_ij = s_ij/s_jj         if i ≠ j.
Example R4.31 [R4.33]
▶ The same gambler’s ruin problem.
▶ What is the probability that the gambler ever has €1?
▶ Answer:
  ▶ Since s_{3,1} = 1.4206 and s_{1,1} = 1.6149, we have

    f_{3,1} = s_{3,1} / s_{1,1} = 0.8797.

▶ So far we are playing within the transient states:
  ▶ s_ij: expected number of visits to j;
  ▶ f_ij: probability of ever visiting j.
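The probability f_{3,1} follows from the same matrix S used for the mean times; nothing new has to be solved. A sketch (assuming NumPy), again for the p = 0.4, N = 7 gambler's ruin:

```python
import numpy as np

p, t = 0.4, 6
PT = np.zeros((t, t))
for k in range(t - 1):
    PT[k, k + 1] = p
    PT[k + 1, k] = 1 - p

S = np.linalg.inv(np.eye(t) - PT)

# f_ij = s_ij / s_jj for i != j; here i = 3, j = 1 (0-based indices 2 and 0).
f_31 = S[2, 0] / S[0, 0]   # -> about 0.8797
```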
Time from Transient States to Recurrent States
Questions to be discussed
▶ Consider a Markov Chain starting from a transient state.
  ▶ How many times does the process visit one (other) transient state?
  ▶ Would the process ever visit one (other) transient state?
  ▶ How many steps does it take to land in the recurrent class (if only one)?
  ▶ What is the chance to land in a specific recurrent class (if many)?
▶ The last question is related to our initial question for Gambler’s ruin.
Expected steps to a recurrent class
▶ Suppose there is only one recurrent class, denoted by R.
▶ For a transient state i, define m_iR as the mean time it takes to enter R.
▶ By “conditioning on the first step”, we obtain

  m_iR = 1 + Σ_{j∈T} P_ij m_jR.
▶ We can solve all miR by solving the equation system.
Expected steps to a recurrent class: Matrix notation
▶ Denote m as the t-dimensional vector

  m = (m_1R, m_2R, . . . , m_tR)⊤.

▶ The equation system can be written in matrix notation as

  m = 1 + P_T m.

▶ Recall that P_T is the part of the transition matrix for the transient states.
▶ m can be solved as

  m = (I − P_T)^{−1} · 1 = S · 1.

▶ Recall that S is the solution for “mean time spent in transient states”.
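A minimal check of m = S · 1 on a hypothetical two-state sketch (assuming NumPy): a single transient state that stays put with probability 0.5 and is otherwise absorbed, so the absorption time is geometric with mean 2:

```python
import numpy as np

# One transient state with P_11 = 0.5; the remaining probability 0.5 leads
# into the recurrent class R, so P_T is the 1x1 matrix [0.5].
PT = np.array([[0.5]])
S = np.linalg.inv(np.eye(1) - PT)
m = S @ np.ones(1)    # m = (I - P_T)^{-1} 1 -> [2.0]
```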
Probability of landing in a specific recurrent class
▶ Assume that there are multiple recurrent classes R1 , R2 , . . .
▶ Let fiR1 be the probability that the Markov chain ever enters the
recurrent class R1 , given it starts in state i ∈ T .
▶ By “conditioning on the first step”, we obtain

  f_iR1 = Σ_j P_ij f_jR1
        = Σ_{j∈R1} P_ij · 1 + Σ_{j∈R2∪R3∪···} P_ij · 0 + Σ_{j∈T} P_ij f_jR1,

since f_jR1 = 1 for j ∈ R1 and f_jR1 = 0 for j ∈ R2 ∪ R3 ∪ · · ·
▶ We can solve all fiR1 by solving the equation system.
Again, in matrix notation
▶ Recall that

  f_iR1 = Σ_{j∈R1} P_ij + Σ_{j∈T} P_ij f_jR1.

▶ Denote f_R1 and p_R1 as the t-dimensional vectors

  f_R1 = (f_1R1, . . . , f_tR1)⊤,
  p_R1 = (Σ_{j∈R1} P_1j, . . . , Σ_{j∈R1} P_tj)⊤.

▶ Written in matrix notation, this becomes

  f_R1 = p_R1 + P_T f_R1.

▶ f_R1 can be solved as

  f_R1 = (I − P_T)^{−1} · p_R1 = S · p_R1.

▶ Recall that S is the solution for “mean time spent in transient states”.
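As a cross-check, f_R1 = S · p_R1 applied to the p = 0.4, N = 7 gambler's ruin with R1 = {7} must reproduce the ruin-problem answer P_3 ≈ 0.1477. A sketch (assuming NumPy):

```python
import numpy as np

p, t = 0.4, 6
PT = np.zeros((t, t))
for k in range(t - 1):
    PT[k, k + 1] = p
    PT[k + 1, k] = 1 - p

S = np.linalg.inv(np.eye(t) - PT)

# p_R1: one-step probabilities from each transient state into R1 = {7}.
# Only state 6 (0-based index 5) reaches 7 directly, with probability p.
pR1 = np.zeros(t)
pR1[5] = p

fR1 = S @ pR1
f_3_to_7 = fR1[2]    # starting capital 3 -> about 0.1477
```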
Plan for Lecture 5
1. Examples for Markov Chain: Robinson’s game
2. An old exam question
Examples for Markov Chain: Robinson’s Game
Robinson plays a game
[Figure: Robinson’s game — from Home he goes to the beach or the forest with probability 1/2 each; from the beach he returns home with probability 1/2, stays with probability 1/4, or goes to the forest with probability 1/4; from the forest he returns home with probability 1.]
▶ Where will he spend most of his time (in the long run)?
▶ What is the expected number of periods before he reaches forest?
▶ What is the expected number of visits to the beach before first return?
A “cooking recipe”: preparation
▶ Modeling
▶ Define states
▶ Check whether the process is a MC
▶ If not, smartly define states (multiple days, etc)
▶ Check what the questions are about
▶ Are they related to standard quantities? (πi , sij , fij , miR )
▶ Often, there is a conditional probability/expectation involved
▶ Modifying the model if necessary (part of solution)
▶ Re-define a MC process
▶ Often, make some states “absorbing”
▶ After that, the MC process usually contains some transient states
▶ Do the questions remain the same for the “re-defined process”?
▶ This step should be considered together with the questions
A “cooking recipe”: cooking and troubleshooting
▶ Solving the problem
▶ Often, with a smart “modification”, the problem turns out to be “standard”
▶ Use one of the known formulas (We have open book exam!)
▶ For extremely difficult question: consider a smart conditioning
▶ Often, conditional on the first step result
▶ In fact, this is how the “known formulas” are derived
Robinson’s game: modeling
▶ Define states
▶ Home - 0; Beach - 1; Forest -2
▶ The process is a MC with transition matrix

      [  0   1/2  1/2 ]
  P = [ 1/2  1/4  1/4 ]
      [  1    0    0  ]
▶ Are the states recurrent?
Check what the questions are about
▶ Where will he spend most of his time (in the long run)?
▶ Calculating πi for each state and compare
▶ A standard question! Answer: (3/7, 2/7, 2/7)
▶ What is the expected number of periods before he reaches forest?
▶ Non-standard question, we need to modify the process.
▶ What is the expected number of visits to the beach before first
return?
▶ Non-standard question, we need to modify the process.
Robinson’s game: question 2
▶ What is the expected number of periods before he reaches forest?
▶ How to modify the process?
▶ Forest is the focal state in this question
  ▶ If we make it “absorbing”, the other two states will no longer be recurrent,
    but transient.
  ▶ The question becomes the expected number of steps from a transient state
    to a recurrent state/class.
  ▶ That is a standard question! (about m_iR)
▶ Solution: make state 2 absorbing; we get a different MC with

      [  0   1/2  1/2 ]
  Q = [ 1/2  1/4  1/4 ]
      [  0    0    1  ]

▶ States 0 and 1 are transient; state 2 is in its own (recurrent) class.
▶ Check the formula about m0R . Answer: m0R = 5/2
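The answer m_0R = 5/2 can be checked with the standard machinery: take the transient block P_T of Q for states 0 (Home) and 1 (Beach), and compute m = (I − P_T)^{−1} · 1. A sketch (assuming NumPy):

```python
import numpy as np

# Transient block of Q (states 0 = Home, 1 = Beach) after Forest is absorbing.
PT = np.array([[0.0, 0.5],
               [0.5, 0.25]])
# m = (I - P_T)^{-1} 1, solved as a linear system instead of inverting.
m = np.linalg.solve(np.eye(2) - PT, np.ones(2))
m_0R = m[0]   # expected periods before reaching the forest from Home -> 2.5
```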
Robinson’s game: question 3
▶ What is the expected number of visits to the beach before first return?
▶ How to modify the process?
▶ Who is the focal state?
  ▶ Certainly not the forest: it is not even mentioned in the question.
  ▶ Not the beach! We will not make it absorbing (then there would be no return visits).
▶ Home? No, if we make it absorbing then Robinson never leaves home!
▶ We get stuck... this is a difficult question!
▶ We turn to the last hope: conditioning on the first step
Robinson’s game: question 3
▶ Our last hope: conditioning on the first step
▶ If he goes to the beach, then the question is: starting from the beach, how
many times will he visit the beach before returning home?
▶ If he goes to the forest, he will return home: zero visits to the beach before
first return.
▶ Can we solve the first subquestion?
Question 3: handle the subquestion
▶ Subquestion: starting from the beach, how many times will he visit the
beach before returning home?
▶ It is possible to solve it by applying the geometric distribution.
▶ Let us do it the “Markovian” way: modifying the process.
▶ Make Home (state 0) absorbing; we get a different MC with

      [  1    0    0  ]
  Q = [ 1/2  1/4  1/4 ]
      [  1    0    0  ]

▶ Both states 1 and 2 are transient; we are interested in s_11:

  s_11 = 1 + (1/4)s_11 + (1/4)s_21,
  s_21 = 0 · s_11 + 0 · s_21 = 0,

  which gives s_11 = 4/3. Or, in matrix form,

  S = (I − P_T)^{−1} = [ 3/4  −1/4 ]^{−1}  =  [ 4/3  1/3 ]
                       [  0     1  ]          [  0    1  ]
Question 3: final solution
▶ What is the expected number of visits to the beach before first return?
▶ Denote the number of visits to the beach before first return as Y.
Conditioning on the first step,

  E(Y) = E(Y | first visit beach) · 1/2 + E(Y | first visit forest) · 1/2
       = s_11/2 + 0.

▶ Here s_11 = 4/3 is the answer to the subquestion.
▶ Hence, we get that E(Y) = 2/3.
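The two-stage argument (condition on the first step, then solve the subquestion with Home absorbing) can be checked numerically. A sketch (assuming NumPy); `s11` is the expected number of beach visits, starting from the beach, before absorption at Home:

```python
import numpy as np

# Home made absorbing; transient block for states 1 = Beach, 2 = Forest.
PT = np.array([[0.25, 0.25],
               [0.0,  0.0]])
S = np.linalg.inv(np.eye(2) - PT)
s11 = S[0, 0]                  # -> 4/3

# Condition on Robinson's first move out of Home: 1/2 beach, 1/2 forest.
EY = 0.5 * s11 + 0.5 * 0.0     # -> 2/3
```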
Question 3: Recap the thinking procedure
▶ A difficult question that cannot be converted to a standard question
▶ We condition on the first-step move
▶ If first visiting beach, it is still a non-standard subquestion, but solvable
▶ If first visiting forest, the answer is simply zero.
▶ We solve the subquestion separately by “modifying the process”
▶ Why does the problem become simpler after conditioning on the first step?
We drag Robinson out of his home, then set Home as an absorbing state!
An old exam question
Descriptions
▶ A small Intranet consists of only four websites. They link to each
other as shown in the figure below.
▶ An arrow from Website 1 to Website 2 indicates that there is a link
on Website 1 pointing towards Website 2.
▶ Larry starts from one website, and clicks with equal probability on one
of the links on the website. He repeats this behavior in the new
website. If there is no link on a website, Larry stops his web surfing.
Questions
▶ First two questions
1. Define a Markov chain to model Larry’s surfing behavior. Determine
the transition matrix P and the recurrent and transient classes of this
Markov chain. Motivate your answer.
2. Larry starts from Website 2. What is the probability that Larry will
never return to Website 2?
▶ Larry changes his surfing behavior. After reaching one website, with
probability 0.4, Larry will randomly visit one of the four websites; with
probability 0.6, he will follow the original behavior to click one of the
links on the website with equal probability.
3. With the new surfing behavior, what is the limit distribution of
websites visited by Larry?
4. Glgoo Ads collects Larry’s surfing behavior. If Larry visits Websites 1,
2, 3 and 4, Glgoo Ads will make a profit of 2, 4, 8 and 0 cents
respectively. In the long run, what is the average profit Glgoo Ads will
make per visit of Larry?
How to deal with such questions
▶ The questions are often long with lots of texts
▶ Stay calm, read carefully!
▶ Question 1: modeling
▶ The figure is already providing states and transition
▶ Only need to figure out the transition matrix carefully
▶ Question 2: never return to a state
  ▶ This occurs only for a transient state
▶ The probability of “ever return” is fii , so we solve 1 − fii
▶ There are extra descriptions after the first two questions, let’s worry
about them later!
Solution to Question 1
▶ Define {Xn , n ≥ 1} as the website visited by Larry at step n.
▶ The transition matrix is given as follows:

      [  0   1/2  1/2   0  ]
  P = [ 1/3   0   1/3  1/3 ]
      [ 1/2   0    0   1/2 ]
      [  0    0    0    1  ]
▶ Because the state 4 is an absorbing state, the recurrent class contains
only one state {4}.
▶ The other three states are communicating because the path
1 → 2 → 3 → 1 has a positive probability. Therefore, they form a
transient class {1, 2, 3}.
Solution to Question 2
▶ The probability of never returning to state 2 given X0 = 2 is 1 − f_22.
▶ We have that

  f_22 = 1 − 1/s_22,

where s_22 is the expected time in state 2 given X0 = 2.
▶ By conditioning on the first step, we have that

  s_12 = (1/2)s_22 + (1/2)s_32,
  s_22 = 1 + (1/3)s_12 + (1/3)s_32,
  s_32 = (1/2)s_12.

▶ By solving this equation system, we get that s_12 = 2s_32 and s_22 = 3s_32.
It implies that s_32 = 1/2 and s_22 = 3/2.
▶ Hence, the probability of never returning to state 2 given X0 = 2 is

  1 − f_22 = 1/s_22 = 2/3.
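The three equations for s_12, s_22, s_32 are just entries of S = (I − P_T)^{−1} for the transient block {1, 2, 3}. A sketch (assuming NumPy):

```python
import numpy as np

# Transient block of P for websites 1, 2, 3 (website 4 is absorbing).
PT = np.array([[0.0, 1/2, 1/2],
               [1/3, 0.0, 1/3],
               [1/2, 0.0, 0.0]])
S = np.linalg.inv(np.eye(3) - PT)

s22 = S[1, 1]            # -> 3/2
never_return = 1 / s22   # 1 - f_22 -> 2/3
```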
Handling the last two questions
▶ Read the additional descriptions
▶ The transition matrix is changed to

  Q = 0.6 P + 0.1 × 1_{4×4}

      [ 0.1  0.4  0.4  0.1 ]
    = [ 0.3  0.1  0.3  0.3 ]
      [ 0.4  0.1  0.1  0.4 ]
      [ 0.1  0.1  0.1  0.7 ],

where 1_{4×4} is a 4 × 4 matrix with all elements equal to 1.
▶ All states are in one recurrent class
▶ The two questions are both regarding the limiting distribution
▶ The limiting distribution π is the same as the stationary distribution
Solution to Questions 3 and 4
▶ Question 3: the limiting distribution
▶ Write down the equation system for solving the stationary distribution:
π = Q^T π and π^T 1 = 1
▶ The rest is to solve the equation system.
▶ Solutions: π1 = π3 = 3/16, π2 = 5/32, π4 = 15/32.
▶ Question 4
▶ Average cost depends on the limiting distribution
▶ The average profit Glgoo Ads will make per visit is
2π1 + 4π2 + 8π3 + 0π4 = 3/8 + 5/8 + 3/2 = 2.5
cents per visit.
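The stationary distribution can also be obtained numerically; a minimal sketch using NumPy (solving π = Q^T π together with the normalization constraint as a least-squares system):

```python
import numpy as np

# Modified transition matrix Q from the slide.
Q = np.array([[0.1, 0.4, 0.4, 0.1],
              [0.3, 0.1, 0.3, 0.3],
              [0.4, 0.1, 0.1, 0.4],
              [0.1, 0.1, 0.1, 0.7]])

# Stack (Q^T - I) pi = 0 with the normalization sum(pi) = 1 and solve
# the overdetermined system by least squares.
A = np.vstack([Q.T - np.eye(4), np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

profit = float(np.dot([2, 4, 8, 0], pi))
print(pi)      # [3/16, 5/32, 3/16, 15/32]
print(profit)  # 2.5 cents per visit
```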
Plan for Lecture 6
Video before lecture
▶ Exponential distribution and its properties (Slides 2-12)
Lecture
1. Three definitions of Poisson process (Chap 5.3)
▶ Equivalent definitions of Poisson process (Chap 5.3.2)
Proof of Theorem 5.1 is not required, but the intuition in Remark (iii)
[Remark (i)] should be understood.
▶ Interarrival and waiting time (Chap 5.3.2 [Chap 5.3.3])
Exponential distribution
Exponential distribution with parameter λ
Warning! In statistics, the exponential distribution is usually
parameterized by the mean θ = 1/λ rather than the rate λ.
▶ The probability density function is given by
f (x) = λe^{−λx} for x ≥ 0, and f (x) = 0 for x < 0.
▶ The cumulative distribution function becomes
F (x) = 1 − e^{−λx} for x ≥ 0, and F (x) = 0 for x < 0.
▶ The moment generating function is
ϕ(t) = λ/(λ − t) for t < λ.
▶ The mean and variance are given by E(X ) = 1/λ, Var(X ) = 1/λ^2 .
Memoryless property
▶ An exponential random variable X has the memoryless property.
P(X > s + t | X > t) = P(X > s)
for all s, t ≥ 0.
▶ The exponential distribution is the only continuous distribution that
has the memoryless property.
A simple example: Example R5.2
▶ Suppose that the amount of time one spends in a bank is
exponentially distributed with mean ten minutes, that is, λ = 1/10.
▶ What is the probability that a customer will spend more than twelve
minutes in the bank?
▶ What is the probability that a customer will spend more than twelve
minutes in the bank given that she is still in the bank after ten
minutes?
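Both questions can be answered with the survival function e^{−λx}; a short numerical check (λ = 1/10 as in the example):

```python
import math

lam = 1 / 10   # rate: the mean time in the bank is 10 minutes

def sf(x):
    """Survival function P(X > x) of an Exponential(lam) random variable."""
    return math.exp(-lam * x)

p_more_than_12 = sf(12)            # unconditional probability
p_conditional = sf(12) / sf(10)    # equals sf(2) by the memoryless property

print(round(p_more_than_12, 4))    # e^{-1.2} ≈ 0.3012
print(round(p_conditional, 4))     # e^{-0.2} ≈ 0.8187
```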
A difficult example: Example R5.3
▶ Mr. Smith enters a post office and sees that his friends, Mr. Jones
and Mr. Brown, are being served.
▶ Suppose that there are two clerks in the office, and for each clerk the
time to serve a client is exponentially distributed with mean 1/λ.
▶ What is the probability that Mr. Smith leaves the post office after
Mr. Jones and Mr. Brown?
Example R5.3: solution using memoryless property
▶ Consider the moment when a clerk becomes available for Mr. Smith
▶ At that moment one clerk finishes the work, and is available for Mr.
Smith, but the other one is still busy with some remaining work.
▶ To leave after both friends, the service for Mr. Smith should take
longer time than the remaining work.
▶ Due to the memoryless property of the exponential distribution,
although the busy clerk has been busy for some time, the amount of
time this clerk needs to finish his job still follows the same
distribution as if he were just starting.
▶ Thus, at that moment when Mr. Smith starts, the situation is
completely symmetric: “both clerks are just starting”.
▶ By symmetry, the probability that Mr. Smith leaves the post office
after both friends is 1/2.
▶ Solution without calculation!
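The symmetry argument can be checked by simulation; a minimal sketch (λ = 1 is an arbitrary illustrative rate, since the answer does not depend on it):

```python
import random

random.seed(7)
lam, n = 1.0, 100_000   # illustrative rate; the answer does not depend on it
wins = 0
for _ in range(n):
    remaining = random.expovariate(lam)  # busy clerk: fresh Exp(lam) by memorylessness
    smith = random.expovariate(lam)      # Mr. Smith's full service time
    if smith > remaining:                # Smith leaves after the last friend
        wins += 1

frac = wins / n
print(frac)   # close to 1/2
```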
Exponential Distribution: Properties
Exponential distribution has a constant hazard rate
▶ Let X be a positive random variable with cdf F and pdf f .
▶ The hazard rate or failure rate function is given by
r (t) = f (t)/(1 − F (t)).
▶ Interpretation: the conditional probability density that the final event
occurs at time t, given that the final event did not occur up to time t.
▶ Similar to the PDF, CDF and MGF, the hazard rate function uniquely
determines the distribution of X .
▶ The hazard rate function for the exponential distribution is constant:
r (t) = f (t)/(1 − F (t)) = λe^{−λt}/e^{−λt} = λ.
▶ The exponential distribution is the only distribution with a constant
hazard rate.
Additional properties of the exponential distribution
1. Let X1 , . . . , Xn be iid exponential random variables with common
mean 1/λ. Then the random variable Y = Σ_{i=1}^n Xi has a gamma
distribution with parameters n and λ, and its probability density
function is given by
fY (t) = λe^{−λt} (λt)^{n−1}/(n − 1)!.
2. Let X1 and X2 be two independent exponential random variables with
respective means 1/λ1 and 1/λ2 . Then,
P(X1 < X2 ) = λ1/(λ1 + λ2 ).
Additional properties of the exponential distribution
3. Let X1 , . . . , Xn be independent exponential random variables with
respective rates λi , i = 1, . . . , n. Then,
P(min(X1 , . . . , Xn ) > x) = exp{−(Σ_{j=1}^n λj ) x}.
▶ That is, min_j Xj has an exponential distribution with rate Σ_{j=1}^n λj .
4. Let X1 , . . . , Xn be independent exponential random variables with
respective rates λi , i = 1, . . . , n. Then,
P(Xi = min_j Xj ) = P(Xi < min_{j≠i} Xj ) = λi / Σ_{j=1}^n λj .
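Properties 3 and 4 can be verified by simulation; a minimal sketch with three illustrative rates:

```python
import random

random.seed(1)
rates = [1.0, 2.0, 3.0]   # illustrative rates for X1, X2, X3
n = 100_000

sum_min, first_wins = 0.0, 0
for _ in range(n):
    xs = [random.expovariate(lam) for lam in rates]
    m = min(xs)
    sum_min += m
    if xs[0] == m:
        first_wins += 1

mean_min = sum_min / n       # property 3: E[min] = 1/(1+2+3) = 1/6
frac_first = first_wins / n  # property 4: P(X1 is the minimum) = 1/6
print(mean_min, frac_first)
```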
Additional properties of the exponential distribution
5. Let X1 , . . . , Xn be independent exponential random variables with
respective rates λi , i = 1, . . . , n. Then,
P(Xi1 < · · · < Xin | min_i Xi > t)
= P(Xi1 − t < · · · < Xin − t | min_i Xi > t)
= P(Xi1 < · · · < Xin ).
▶ The random variable min_i Xi and the rank ordering of the Xi (i.e.,
Xi1 < Xi2 < · · · < Xin ) are independent.
Exponential Distribution: an Example
Example R5.8
▶ Suppose you arrive at a post office having two clerks
▶ Both are busy but there is no one else waiting in line.
▶ You will enter service when either clerk becomes free.
▶ Service times for clerk i are exponential random variables with rate
λi for i = 1, 2. All service times are independent.
▶ Find E(T ), where T is the amount of time you spend in the post
office.
Example R5.8: continued
▶ Let Ri denote the remaining service time of the customer with clerk i.
▶ By the memoryless property, it follows that R1 and R2 are independent
exponential random variables with means 1/λ1 and 1/λ2 , respectively.
▶ The waiting time before being served is W = min(R1 , R2 )
▶ By property 3,
E(W ) = E(min(R1 , R2 )) = 1/(λ1 + λ2 )
▶ The service time S depends on which clerk serves you
▶ If R1 < R2 , you will be served by clerk 1, so E(S|R1 < R2 ) = E(R1 ) = 1/λ1
▶ If R1 > R2 , you will be served by clerk 2, so E(S|R1 > R2 ) = E(R2 ) = 1/λ2
▶ Hence, by conditioning, we get
E(S) = E(S|R1 < R2 )P(R1 < R2 ) + E(S|R1 > R2 )P(R1 > R2 )
     = (1/λ1 )P(R1 < R2 ) + (1/λ2 )P(R1 > R2 )
▶ We still need to get the probabilities
Example R5.8: continued
▶ We need to calculate P(R1 < R2 )
▶ By property 2,
P(R1 < R2 ) = λ1/(λ1 + λ2 )
▶ Continue calculating E(S):
E(S) = (1/λ1 )P(R1 < R2 ) + (1/λ2 )P(R1 > R2 )
     = (1/λ1 ) · λ1/(λ1 + λ2 ) + (1/λ2 ) · λ2/(λ1 + λ2 )
     = 2/(λ1 + λ2 )
▶ Finally,
E(T ) = E(W ) + E(S) = 3/(λ1 + λ2 )
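The result E(T) = 3/(λ1 + λ2) can be checked by simulation; a minimal sketch with illustrative rates λ1 = 1, λ2 = 2 (so the exact answer is 1):

```python
import random

random.seed(2)
lam1, lam2 = 1.0, 2.0   # illustrative service rates of the two clerks
n = 200_000

total = 0.0
for _ in range(n):
    r1 = random.expovariate(lam1)   # remaining service time at clerk 1
    r2 = random.expovariate(lam2)   # remaining service time at clerk 2
    w = min(r1, r2)                 # wait until the first clerk frees up
    # Your own service then runs at the rate of whichever clerk freed up.
    rate = lam1 if r1 < r2 else lam2
    total += w + random.expovariate(rate)

avg = total / n
print(avg)   # close to 3/(lam1 + lam2) = 1.0
```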
Poisson process: first definition
Moving from Markov Chain to continuous time process
▶ Markov Chain: discrete time, discrete states process
▶ We are going to explore the world of continuous time process
▶ Simple example: the number of clients that have arrived at a counter
{X (t)}t∈[0,+∞)
▶ Time: [0, +∞), continuous!
▶ States: X (t) ∈ {0, 1, 2, . . .}, still discrete
▶ More on the states: a fixed path 0 → 1 → 2 → · · ·
▶ We start with such simple continuous time, discrete state process
with states going up one by one. It is called a counting process.
▶ The only interesting element in the process: how long it takes to go
from i to i + 1?
▶ This is often modeled by the exponential distribution.
Counting processes
▶ A stochastic process {N(t), t ≥ 0} is a counting process whenever
N(t) denotes the total number of events that occur by time t.
▶ A counting process {N(t), t ≥ 0} should satisfy the following:
▶ N(t) ≥ 0.
▶ N(t) is integer valued.
▶ For s < t, N(s) ≤ N(t).
▶ For s < t, N(t) − N(s) represents the number of events that occur in
the interval (s, t].
Independent and stationary increments
▶ A counting process has independent increments whenever the number
of events that occur in one time interval is independent of the
number of events that occur in another (disjoint) time interval.
▶ That is, N(s) is independent of N(s + t) − N(s).
▶ A counting process has stationary increments whenever the number of
events that occur in any interval depends only on the length of the
interval.
▶ That is, the number of events in the interval (s, s + t] has the same
distribution for all s.
Poisson process: first definition
▶ The counting process {N(t), t ≥ 0} is a Poisson process with rate λ,
λ > 0 when
1. N(0) = 0.
2. The process has independent increments.
3. The number of events in any interval of length t is Poisson distributed
with mean λt. In other words, for all s, t ≥ 0
P(N(t + s) − N(s) = n) = e^{−λt} (λt)^n/n!,   n = 0, 1, . . .
▶ Note that the last condition implies that a Poisson process
▶ has stationary increments,
▶ E(N(t)) = λt.
Poisson process: second definition
Poisson process: second definition
▶ The counting process {N(t), t ≥ 0} is a Poisson process with rate λ,
λ > 0 when
1. N(0) = 0.
2. The process has stationary and independent increments.
3. P(N(h) = 1) = λh + o(h), as h → 0
4. P(N(h) ≥ 2) = o(h), as h → 0
▶ A function g (·) is said to be o(h) if
lim_{h→0} g (h)/h = 0.
That is, the function g (h) goes to zero faster than h goes to zero.
▶ P(N(h) = 1) = λh + o(h) means
lim_{h→0} (P(N(h) = 1) − λh)/h = 0.
Comparing the two definitions
▶ Commonality: both definitions assume independent and stationary
increments (no need to prove)
▶ Difference
▶ The first definition has a precise distribution for N(t) for any t
▶ The second definition only defines distribution for N(h) with h → 0
▶ The second definition does not even specify a precise distribution, but
only an approximation
▶ Still, the two definitions are equivalent!
The first definition implies the second
▶ The proof is not difficult since the first definition is precise
▶ We only need to verify that the limit relations in the second definition
hold
▶ Using Taylor expansion (or L’Hôpital’s rule)
P(N(h) = 0) = e −λh = 1 − λh + o(h)
P(N(h) = 1) = λhe −λh = λh + o(h)
P(N(h) ≥ 2) = 1 − P(N(h) = 0) − P(N(h) = 1)
= o(h)
Note: o(h) + o(h) = o(h), c · o(h) = o(h).
The second definition implies the first: intuition
▶ The second definition implies that, approximately
▶ For sufficiently small h, N(h) is either 0 or 1
▶ The probability that N(h) = 1 is approximately λh
▶ Cut the region [0, t] into m small equal length intervals
▶ Each interval has a length h = t/m
▶ As m → ∞, h → 0
▶ Due to stationary increments, on each interval, there will be either 0
or 1 event (Bernoulli) with the same probability λh + o(h)
▶ Due to independent increments, N(t) follows approximately a
Binomial distribution Bin(m, λh + o(h))
Chen Zhou
Erasmus University Rotterdam
26 / 34
The second definition implies the first: intuition
▶ For any m and h = t/m, N(t) follows approximately a Binomial
distribution Bin(m, λh + o(h))
▶ As m → ∞, m · (λh + o(h)) = λt + m · o(1/m) → λt
▶ For a Binomial distribution B(n, pn ), if npn → µ as n → ∞, then the
Binomial distribution converges to a Poisson distribution Poi(µ).
▶ By taking m → ∞, we argue that N(t) follows Poi(λt).
▶ For the formal proof for the equivalence, see Theorem 5.1 (not
required).
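The Binomial-to-Poisson convergence behind this argument can be illustrated numerically; a short sketch comparing the two pmfs (µ = 3 is an arbitrary choice standing in for λt):

```python
import math

mu = 3.0   # plays the role of lambda*t; an arbitrary illustrative value

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def pois_pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

tv = {}
for m in (10, 100, 10_000):
    # Total variation distance between Bin(m, mu/m) and Poi(mu),
    # truncated at k = 30 (both tails beyond that are negligible).
    tv[m] = 0.5 * sum(abs(binom_pmf(k, m, mu / m) - pois_pmf(k, mu))
                      for k in range(31))
    print(m, tv[m])   # the distance shrinks as m grows
```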
Poisson process: third definition
Interarrival times
▶ For a Poisson process, let Tn , n ≥ 1, be the nth interarrival time: the
time elapsed between the (n − 1)th event and the nth event.
▶ Then, {Tn , n = 1, 2, . . .} is a sequence of iid exponential random
variables with common mean 1/λ.
▶ See Proposition 5.4 [Proposition 5.1] (proof is required!)
Proof
▶ For T1 , notice that the event {T1 < t} is equivalent to {N(t) ≥ 1}.
P(T1 < t) = P(N(t) ≥ 1) = 1 − P(N(t) = 0) = 1 − e^{−λt}
Exponential with rate λ
▶ After T1 , we can restart the process
▶ Since we have independent increments, the new process is independent
of T1
▶ Since we have stationary increments, the new process has the same
distributional properties as the original process
▶ Hence, T2 has the same distribution and is independent of T1 .
▶ Extending this argument, we get that {Tn , n = 1, 2, . . .} is a sequence of
iid exponential random variables with common mean 1/λ.
Arrival times
▶ The arrival time of the nth event, Sn , is also called the waiting time
until the nth event. Clearly,
Sn = Σ_{i=1}^n Ti ,   n ≥ 1.
▶ Thus, Sn has a gamma distribution with parameters n and λ, yielding
fSn (t) = λe^{−λt} (λt)^{n−1}/(n − 1)!,   t ≥ 0.
▶ We define S0 = 0.
▶ Note that
N(t) ≥ n ⇐⇒ Sn ≤ t.
▶ This is more than a note! It is often used in Poisson process calculations
Poisson process: third definition
▶ In fact, we can use the exponential interarrival times to define a
Poisson process. Start with a sequence {Tn , n ≥ 1} of iid exponential
random variables with a common mean of 1/λ.
▶ Define then a counting process for which the nth event occurs at time
Sn ≡ T1 + T2 + · · · + Tn .
▶ Then, {N(t), t ≥ 0} is a Poisson process with rate λ, where
N(t) ≡ max{n : Sn ≤ t}.
▶ We have proved that the first definition implies the third. In fact, the
third definition also implies the first (equivalence). Proof not required.
▶ All three definitions are equivalent. Sometimes, one is more handy
than the other two in applications
Chen Zhou
Erasmus University Rotterdam
32 / 34
Poisson process: an example
▶ Customers enter a shop at a Poisson rate λ = 1 per day.
a. What is the probability that the first customer will arrive after 0.5
days?
▶ The event is N(0.5) = 0, or T1 > 0.5. Answer: (0.5^0/0!) e^{−0.5} = e^{−0.5} .
b. What is the expected time until the tenth customer arrives?
▶ Answer: E(S10 ) = 10/λ = 10 days (using Gamma distribution)
c. What is the probability that the time elapsed between the tenth and
the eleventh customer exceeds two days?
▶ The event T11 > 2. Answer: e −2 .
Poisson process: example
d. What is the probability that exactly 4 customers have arrived in the
first two days?
▶ The event N(2) = 4. Answer: e −2 24 /4!.
e. What is the probability that exactly 4 customers have arrived in the
third and fourth day, given that 2 customers have arrived in the first
two days?
▶ Answer: the same as in d. (why?)
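Questions a, b and d can be checked by simulating the process from its interarrival times (the third definition); a minimal sketch with λ = 1:

```python
import math, random

random.seed(3)
lam, n = 1.0, 100_000   # rate: 1 customer per day

a_hits, s10_sum, d_hits = 0, 0.0, 0
for _ in range(n):
    t, k, by_day2 = 0.0, 0, 0
    while k < 10:
        t += random.expovariate(lam)   # iid exponential interarrival times
        k += 1
        if k == 1 and t > 0.5:
            a_hits += 1                # (a) first arrival after 0.5 days
        if t <= 2.0:
            by_day2 += 1
    s10_sum += t                       # (b) arrival time of the 10th customer
    if by_day2 == 4:
        d_hits += 1                    # (d) exactly 4 arrivals in 2 days

print(a_hits / n)    # close to e^{-0.5}
print(s10_sum / n)   # close to 10 days
print(d_hits / n)    # close to e^{-2} 2^4/4!
```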
Plan for Lecture 7
Video before lecture
▶ Merging two Poisson processes (Slides 2-5)
Lecture
1. Decomposing a Poisson process (Chapter 5.3.3 [Chap 5.3.4], till
Example R5.14)
2. Conditional arrival distribution (Chap 5.3.4 [Chap 5.3.5], till Theorem
5.2)
3. Nonhomogeneous Poisson process (Chap 5.4.1, till Example 5.24)
Proof of Theorem 5.3 is not required, but the intuition in Remark (ii)
[Remark] should be understood.
Merging two Poisson processes
Merging two Poisson processes
▶ Suppose that {N1 (t), t ≥ 0} and {N2 (t), t ≥ 0} are independent
Poisson processes with respective rates λ1 and λ2 , where Ni (t)
corresponds to type i arrivals.
▶ Let N(t) = N1 (t) + N2 (t), for t ≥ 0.
▶ Then the following holds:
▶ The merged process {N(t), t ≥ 0} is a Poisson process with rate
λ = λ1 + λ2 .
▶ The probability that an arrival in the merged process is of type i is
λi /(λ1 + λ2 ).
Merging two Poisson processes: proof
▶ Prove the first statement using the first definition
▶ Clearly the merged process has independent and stationary increments
▶ To prove that N(t) follows Poi(λt), we use the fact that a sum of
independent Poisson distributed random variables is still Poisson
distributed (MGF)
▶ Prove the second statement using the third definition
▶ Denote the time to the next arrival of type i as T1^(i) . Then the next
arrival is of type 1 if T1^(1) < T1^(2)
▶ The probability of having the next arrival be of type 1 is
P(T1^(1) < T1^(2) ) = λ1/(λ1 + λ2 )
▶ Choose the definition smartly!
Merged Poisson process: Example
▶ Two types of parkers arrive at a parking lot: long-term parkers arrive
at an hourly rate of λ1 = 4 and short-term parkers arrive at an hourly
rate of λ2 = 20.
a. What is the probability that nobody wants to park during an interval
of 15 minutes?
▶ Merged process has rate λ = 24. 15 mins is 1/4 hour.
▶ Answer: P(N(0.25) = 0) = exp {−(24)(1/4)} = e −6 .
b. What is the probability that the first arriving parker is a long-term
parker?
▶ Answer: λ1 /(λ1 + λ2 ) = 1/6.
Decomposing a Poisson process
Decomposing a Poisson process
▶ Consider a Poisson process {N(t), t ≥ 0} with rate λ.
▶ Suppose that each event in this process is classified as type I with
probability p and type II with probability (1 − p) independently of all
other events.
▶ Let N1 (t) and N2 (t) respectively denote the numbers of type I and
type II events occurring in time (0, t].
▶ Then, the counting processes {N1 (t), t ≥ 0} and {N2 (t), t ≥ 0} are
both Poisson processes with respective rates λp and λ(1 − p).
Moreover, the two processes are independent.
▶ This theorem (Proposition 5.5 [Proposition 5.2]) reads as an
“inverse” of the merged Poisson process result.
▶ It can be generalized to the case of r possible types of events.
Proof for N1 (t) using the second definition
▶ Clearly, N1 (t) has independent and stationary increments
▶ N(t) has independent and stationary increments
▶ For each interval, given the number of events, the type of each event is
independent of the process N(t).
▶ Recall the second definition
▶ N(h) is approximately a Bernoulli distribution with parameter λh + o(h)
▶ By the law of total probability
P(N1 (h) = 1) = P(N1 (h) = 1|N(h) = 1)P(N(h) = 1)
+ P(N1 (h) = 1|N(h) ≥ 2)P(N(h) ≥ 2)
= p · (λh + o(h)) + o(h)
= (λp)h + o(h)
P(N1 (h) ≥ 2) ≤ P(N(h) ≥ 2) = o(h)
▶ The second definition is useful for this proof!
Proof for independence
▶ The only remaining issue is to prove independence between the two
processes
▶ For each value of k, m ∈ {0, 1, 2, . . .}, we show
P(N1 (t) = k, N2 (t) = m) = P(N1 (t) = k)P(N2 (t) = m).
▶ By the law of total probability,
P(N1 (t) = k, N2 (t) = m)
∞
X
=
P(N1 (t) = k, N2 (t) = m | N(t) = n) × P(N(t) = n).
n=0
▶ Only for n = k + m, P(N1 (t) = k, N2 (t) = m|N(t) = n) is not zero.
Proof: continued
▶ It follows that
P(N1 (t) = k, N2 (t) = m)
= P(N1 (t) = k, N2 (t) = m | N(t) = k + m) × P(N(t) = k + m)
▶ Using the binomial distribution,
P(N1 (t) = k, N2 (t) = m | N(t) = k + m) × P(N(t) = k + m)
= C(k + m, k) p^k (1 − p)^m e^{−λt} (λt)^{k+m}/(k + m)!
= e^{−λpt} (λpt)^k/k! · e^{−λ(1−p)t} (λ(1 − p)t)^m/m!
= P(N1 (t) = k) P(N2 (t) = m).
Decomposing a Poisson process: an Example
a. If customers to shop A arrive at Poisson rate λ = 10 per week and
each customer is local with probability p = 1/12, what is the probability
that 20 local persons will enter shop A in February?
▶ The arrivals of local customers form a Poisson process with weekly rate
λp = 10/12 = 5/6
▶ One month is four weeks. Arrivals of local customers during the 4
weeks of February follow a Poisson distribution with mean 4λp = 20/6
▶ The wanted probability is
exp{−20/6}(20/6)^{20}/20!.
b. Calculate the same probability if we know that 5 non-local persons
have entered the shop in February.
▶ Answer: Indifferent! Because arrivals of non-local customers form a
Poisson process which is independent of the “local” Poisson process!
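The thinning result can be checked by simulation with the rates from the example (λ = 10 per week, p = 1/12, four weeks); a minimal sketch that compares P(3 local arrivals) rather than 20, since the latter is far too rare to estimate by simulation:

```python
import math, random

random.seed(4)
lam, p, t, n = 10.0, 1 / 12, 4.0, 50_000   # weekly rate, P(local), 4 weeks

counts = []
for _ in range(n):
    # Generate one month of arrivals, then thin: each arrival is local
    # with probability p, independently of everything else.
    total, s = 0, random.expovariate(lam)
    while s <= t:
        total += 1
        s += random.expovariate(lam)
    counts.append(sum(random.random() < p for _ in range(total)))

mu = lam * p * t                 # 10/3: the mean of the thinned process
mean_local = sum(counts) / n
frac3 = sum(c == 3 for c in counts) / n
print(mean_local)   # close to 10/3
print(frac3)        # close to e^{-mu} mu^3/3!
```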
Conditional arrival distribution
Conditional arrival distribution
▶ Suppose that we know that exactly one event of a Poisson process
has taken place by time t. When would this event actually occur?
▶ The conditional distribution of T1 , conditional on N(t) = 1.
▶ For s ≤ t, we have
P(T1 < s | N(t) = 1) = P(T1 < s, N(t) = 1)/P(N(t) = 1)
= P(N(s) = 1, N(t) − N(s) = 0)/P(N(t) = 1)
= λse^{−λs} e^{−λ(t−s)}/(λte^{−λt} ) = s/t.
▶ Then, the time of this event is uniformly distributed over (0, t).
Generalizing the conditional arrival distribution
▶ We generalize the result to conditioning on N(t) = n and study the
arrival times S1 , . . . , Sn
▶ For this purpose, we need order statistics (check Probability Theory!)
▶ Let Y1 , Y2 , . . . , Yn be n iid continuous random variables. The order
statistics for these variables are given by Y(1) ≤ Y(2) ≤ · · · ≤ Y(n) .
▶ Think of Y1 , Y2 , . . . , Yn as a random sample. Then
Y(1) ≤ Y(2) ≤ · · · ≤ Y(n) is the ordered sample.
▶ For any realization of the random variables Y1 , . . . , Yn , we sort these
realizations in an increasing order.
▶ For example, let n = 3. If the realized values are
Y1 = 4,   Y2 = 5,   Y3 = 1,
then
Y(1) = 1,   Y(2) = 4,   Y(3) = 5.
Joint density of the order statistics
▶ If Y1 , . . . , Yn are iid with density f
▶ Then the joint density of the order statistics Y(1) , . . . , Y(n) is
f (y1 , . . . , yn ) = n! Π_{i=1}^n f (yi ),   y1 ≤ · · · ≤ yn .
▶ If the random variables U1 , U2 , . . . , Un are uniformly distributed over
(0, t), then the joint density function of the order statistics
U(1) ≤ U(2) ≤ . . . ≤ U(n) becomes
f (u1 , . . . , un ) = n!/t^n ,   u1 ≤ · · · ≤ un ≤ t.
Conditional arrival distribution: generalized
Theorem (Theorem 5.2)
Given that N(t) = n, the n arrival times S1 , . . . , Sn have the same
distribution as the order statistics corresponding to n independent random
variables uniformly distributed on the interval (0, t).
▶ Proof is very similar to handling the conditional distribution of T1
given N(t) = 1, just slightly more complicated
Using the joint conditional arrival distribution
▶ The theorem states that, conditional on N(t) = n,
(S1 , S2 , . . . , Sn ) =d (U(1) , U(2) , . . . , U(n) )
where U(1) ≤ U(2) ≤ . . . ≤ U(n) are the order statistics of
{U1 , U2 , . . . , Un }, iid random variables from U(0, t).
▶ For any function f ,
Σ_{i=1}^n f (Si ) =d Σ_{i=1}^n f (U(i) ) = Σ_{i=1}^n f (Ui )
▶ The first equality follows from the theorem
▶ The second equality: treating the ordered sample symmetrically is
equivalent to treating the original sample in the same way
▶ It is not limited to sums; any “symmetric” operation will be fine
Conditional arrival distribution: example
▶ Customers, arriving according to a Poisson process of rate λ, pay €1
upon arrival.
▶ The payment of customer k, discounted to time 0 is e −βSk , where Sk
is the kth arrival time.
▶ Typical setup in financial asset pricing!
▶ What is the expected total sum paid up to time t discounted to time
0?
▶ Answer
E[ Σ_{k=1}^{N(t)} e^{−βSk} ]
= Σ_{n=1}^∞ E[ Σ_{k=1}^{N(t)} e^{−βSk} | N(t) = n ] P(N(t) = n)
= Σ_{n=1}^∞ E[ Σ_{k=1}^n e^{−βSk} | N(t) = n ] P(N(t) = n)
= Σ_{n=1}^∞ Σ_{k=1}^n E(e^{−βSk} | N(t) = n) P(N(t) = n).
Example: continued
Theorem (Theorem 5.2)
Given that N(t) = n, the n arrival times S1 , . . . , Sn have the same
distribution as the order statistics corresponding to n independent random
variables uniformly distributed on the interval (0, t).
▶ Let U1 , U2 , . . . , Un be a random sample drawn from the uniform
distribution on (0, t).


E[ Σ_{k=1}^{N(t)} e^{−βSk} ]
= Σ_{n=1}^∞ Σ_{k=1}^n E(e^{−βUk}) P(N(t) = n)
= Σ_{n=1}^∞ n E(e^{−βU}) P(N(t) = n) = E(e^{−βU}) E(N(t))
= (1/(βt))(1 − e^{−βt}) · λt = (λ/β)(1 − e^{−βt}).
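The closed-form answer (λ/β)(1 − e^{−βt}) can be checked by simulating the discounted payments directly; a minimal sketch with illustrative parameters:

```python
import math, random

random.seed(5)
lam, beta, t, n = 2.0, 0.5, 3.0, 100_000   # illustrative parameter choices

total = 0.0
for _ in range(n):
    s, disc = random.expovariate(lam), 0.0
    while s <= t:
        disc += math.exp(-beta * s)   # 1 euro paid at time s, discounted to 0
        s += random.expovariate(lam)
    total += disc

sim = total / n
exact = lam / beta * (1 - math.exp(-beta * t))   # (lam/beta)(1 - e^{-beta t})
print(sim, exact)
```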
Nonhomogeneous Poisson process
From homogeneous to nonhomogeneous
▶ Poisson process has a fixed rate λ – homogeneous Poisson process
▶ Whenever the arrival rate of a Poisson process is a function of time,
we have a nonhomogeneous or nonstationary Poisson process.
▶ The counting process {N(t), t ≥ 0} is said to be a nonhomogeneous
Poisson process with intensity function λ(t), t ≥ 0, if
1. N(0) = 0.
2. {N(t), t ≥ 0} has independent increments.
3. P(N(t + h) − N(t) = 1) = λ(t)h + o(h).
4. P(N(t + h) − N(t) ≥ 2) = o(h).
▶ Compared to homogeneous, we have a (varying) function λ(t) instead
of λ
Nonhomogeneous Poisson process: properties
▶ If we let
m(t) = ∫_0^t λ(y ) dy ,
then N(s + t) − N(s) is a Poisson random variable with mean
m(s + t) − m(s). (Theorem 5.3)
▶ Mathematically, for n ≥ 0,
P(N(s + t) − N(s) = n) = e^{−[m(s+t)−m(s)]} [m(s + t) − m(s)]^n/n!.
▶ Here m(t) is called the mean value function of the nonhomogeneous
Poisson process.
▶ A nonhomogeneous Poisson process allows us to model events that
are more likely to occur at certain times.
Nonhomogeneous Poisson process: example R5.24
▶ A hot dog stand opens at 8 A.M.
▶ From 8 until 11 A.M. customers arrive, on the average, at a steadily
increasing rate that starts with an initial rate of 5 customers per hour
at 8 A.M. and reaches a maximum of 20 customers per hour at 11 A.M.
▶ From 11 A.M. until 1 P.M. the average rate seems to remain constant
at 20 customers per hour.
▶ From 1 P.M. until closing time at 5 P.M. the rate drops steadily to 12
customers per hour.
▶ Suppose that the numbers of customers arriving at the stand during
disjoint time periods are independent.
▶ What is the probability that no customers arrive between 8:30 A.M.
and 9:30 A.M. on Monday morning?
▶ What is the expected number of arrivals in this period?
Example R5.24: continued
▶ From the description we get the intensity function
λ(t) = 5 + 5t for 0 ≤ t ≤ 3,
λ(t) = 20 for 3 ≤ t ≤ 5,
λ(t) = 20 − 2(t − 5) for 5 ≤ t ≤ 9,
where t = 0 corresponds to 8AM, t = 3 to 11AM, t = 5 to 1PM, and
t = 9 to 5PM.
▶ The number of arrivals between 8:30 A.M. and 9:30 A.M. is Poisson
distributed with mean
m(1.5) − m(0.5) = ∫_{0.5}^{1.5} λ(t) dt = 10
▶ The probability that this number will be zero is e^{−10}
▶ The mean number of arrivals in the same time period is simply 10.
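The integral m(1.5) − m(0.5) can be evaluated numerically from the piecewise intensity; a short sketch:

```python
import math

# Piecewise intensity from the slide; t is in hours after 8 A.M.
def lam(t):
    if t <= 3:
        return 5 + 5 * t
    if t <= 5:
        return 20.0
    return 20 - 2 * (t - 5)

def m_diff(a, b, steps=100_000):
    """Integrate the intensity over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return sum(lam(a + (i + 0.5) * h) for i in range(steps)) * h

mean = m_diff(0.5, 1.5)
print(mean)              # 10.0
print(math.exp(-mean))   # P(no arrivals between 8:30 and 9:30) = e^{-10}
```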
Plan for Lecture 8
Video before lecture
▶ Definition of Continuous-Time Markov Chain (Slides 2-8)
▶ A one-slide course on differential equations (Slide 21)
Lecture
1. Birth and death processes (Chap 6.3, excluding Example 6.4)
2. Transition probability function (Chap 6.4)
Continuous-time Markov chains
From counting process to continuous-time Markov chains
▶ Poisson process is a counting process
▶ Continuous time, discrete states ({0, 1, 2, . . .})
▶ The process goes in an increasing order in the states, with step 1
0 → 1 → 2 → ···
▶ It only matters when the events occur
▶ Even non-homogeneous Poisson process is still a counting process
▶ We now move to continuous-time Markov chains
▶ Continuous time, discrete states ({0, 1, 2, . . .} or finite)
▶ The process may transit in these states
5 → 8 → 3 → 5 → ···
▶ State transition is similar to a Markov Chain, the occurrence of
transition is similar to Poisson process!
Continuous-time Markov chains: definition
▶ Let {X (t), t ≥ 0} be a continuous-time stochastic process taking
values in {0, 1, 2, . . .}.
▶ Let {x(t), t ≥ 0} be any deterministic function taking values in
{0, 1, 2, . . .}.
▶ The process {X (t), t ≥ 0} is called a continuous-time Markov chain if
P(X (t + s) = j | X (s) = i, X (u) = x(u), 0 ≤ u < s)
= P(X (t + s) = j | X (s) = i).
for all s, t ≥ 0, functions {x(t), t ≥ 0} and i, j = 0, 1, 2, . . . .
▶ The probability P(X (t + s) = j | X (s) = i) is called a transition
probability.
Stationarity
▶ If a continuous-time Markov chain {X (t), t ≥ 0} satisfies
P(X (t + s) = j | X (s) = i) = P(X (t) = j | X (0) = i)
for every s, t ≥ 0, then {X (t), t ≥ 0} is stationary or
time-homogeneous.
▶ We may also say that the transition probabilities are stationary.
▶ In this lecture, we will concentrate on stationary continuous-time
Markov chains {X (t) : t ≥ 0}.
Time spent in state i before transition
▶ Let Ti denote the time the process {X (t), t ≥ 0} spends in state i
before making a transition into a different state.
▶ It is called the sojourn time in state i
▶ Then, by the Markov property
P(Ti > s + t | Ti > s) = P(Ti > t)
for every s, t ≥ 0.
▶ As the continuous random variable Ti is memoryless, it must have an
exponential distribution, say with mean 1/vi .
▶ vi is the rate of the exponential distribution.
Transition probabilities to a different state
▶ When the process leaves the current state i, it makes a transition to
some other state j ̸= i.
▶ Let Pij denote the probability of next entering state j, given that the
current state is i.
▶ The transition probabilities Pij satisfy
Pii = 0, and Σ_j Pij = 1 for every i = 0, 1, 2, . . . .
Characterizing Continuous-time Markov chains
▶ A stationary continuous-time Markov chain is characterized by the
following two properties:
▶ each of the random variables Ti has an exponential distribution with
rate vi ;
▶ the transition probabilities Pij satisfy
Pii = 0 and Σ_j Pij = 1.
▶ We have shown these properties!
▶ Conversely, a process which satisfies these two properties is a
stationary continuous-time Markov chain.
Birth and death process
Pure birth process, Birth and death process
▶ If only the transition from i to i + 1 is allowed, then the process is
called a pure birth process.
[diagram: 0 → 1 → 2 → · · · ]
▶ Every transition is a “birth”.
▶ A process X (t) is said to “start at zero” if X (0) = 0.
▶ A pure birth process starting at zero is a counting process.
▶ Think of the counting process as counting the number of births.
▶ The Poisson process is a pure birth process.
▶ If only transitions from i to i − 1 or i + 1 are allowed, then the
process is called a birth and death process.
[diagram: states i ⇄ i + 1]
Birth and death processes: model
▶ If the next state is i + 1, then the transition is called a birth or an
arrival.
▶ Arrivals occur with rate λi .
▶ That is, the time until the next arrival is exponentially distributed with
mean 1/λi .
▶ If the next state is i − 1, then the transition is called a death or a
departure.
▶ Departures occur with rate µi .
▶ That is, the time until the next departure is exponentially distributed
with mean 1/µi .
▶ The parameters of this birth and death process are the arrival (birth)
rates {λi } and the departure (death) rates {µi }.
▶ To avoid negative numbers, we can require that the departure rate µ0
is equal to zero.
Pure birth and Poisson processes
▶ A pure birth process is a birth and death process in which no
departures may occur.
▶ Thus, the departure rates µi are all equal to zero.
▶ The parameters of a pure birth process are the arrival rates {λi }, i ≥ 0.
▶ A Poisson process is a pure birth process in which all arrival rates are
equal.
▶ Thus, the arrival rates λi have a common value λ.
▶ A Poisson process has only one parameter: the common arrival rate λ.
Birth and death processes: example
▶ Consider a system with a single server.
▶ Customers arrive at the server according to a Poisson process with
rate λ. Upon arrival,
▶ if a customer finds a server free, then he enters the service immediately;
▶ if a customer finds a server busy, then he first has to wait in a queue
before entering the service.
▶ The service needs some time to complete.
▶ The successive service times are independent exponential random
variables with mean 1/µ.
▶ The number of customers in the system {X (t), t ≥ 0} is a birth and
death process with
▶ λi = λ
▶ µi = µ for i ≥ 1 and µ0 = 0
Birth and death processes: transition rates
▶ Suppose we know all arrival rates and all departure rates. What can
we say about the rate at which transitions occur?
▶ Assume we have just arrived in state i. Let Ai be the time until the
next arrival, and Di be the time until the next departure.
▶ Thus, Ai and Di have exponential distributions with rates λi and µi ,
respectively.
▶ The time until the next transition Ti is then the minimum of the two
exponential random variables Ai and Di
▶ Ti has an exponential distribution with rate
vi = λi + µi .
▶ In particular, v0 = λ0 .
▶ Expected time to transition:
E(Ti ) = 1/vi = 1/(λi + µi ).
Birth and death processes: transition probability
▶ Suppose we know all arrival rates and all departure rates. What can
we say about the transition probabilities Pij ?
▶ Given current state i, the probability that the next transition
corresponds to an arrival is equal to
Pi,i+1 = P(Ai < Di ) = λi /(λi + µi ).
▶ In particular, P01 = 1.
▶ Given current state i, the probability that the next transition
corresponds to a departure is equal to
Pi,i−1 = µi /(λi + µi ).
Transition probability function
The transition probability function: definition
▶ The transition probability function of the continuous-time Markov
chain is given by
Pij (t) = P(X (t + s) = j | X (s) = i).
The transition probability function: pure birth
▶ In case of a pure birth process with distinct birth rates {λn }, n ≥ 0,
we can explicitly determine the transition probability function.
▶ Since the process is stationary and the states are discrete,
Pij (t) = P(X (t) = j | X (0) = i)
= P(X (t) < j + 1 | X (0) = i) − P(X (t) < j | X (0) = i)
▶ Given X (0) = i, it holds for j > i that
X (t) < j ⇐⇒ Ti + · · · + Tj−1 > t,
where Tk denotes the time the process spends in state k before
making a transition.
▶ For a pure birth process, the transition from i to j must pass all
k = i, i + 1, . . . , j − 1!
▶ Consequently, for j > i,
P(X (t) < j | X (0) = i) = P(Ti + Ti+1 + · · · + Tj−1 > t).
The transition probability function: pure birth
▶ Note that Ti + · · · + Tj−1 is a hypoexponential random variable, see
Ross [2014], Subsection 5.2.4.
▶ It follows that, for j > i,
P(X (t) < j | X (0) = i) = P(Ti + · · · + Tj−1 > t)
= ∑_{k=i}^{j−1} e^{−λk t} ∏_{r=i, r≠k}^{j−1} λr /(λr − λk ),
see Ross [2014], Equation (5.9).
▶ Complex and scary!
▶ We have, even more complex (see Proposition 6.1),
Pij (t) = P(X (t) < j + 1 | X (0) = i) − P(X (t) < j | X (0) = i).
▶ Combining with Pii (t) = P(Ti > t) = e^{−λi t}, we have obtained the
transition probability function for a pure birth process.
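The hypoexponential tail above is straightforward to evaluate numerically. A minimal sketch (not from the slides; the function name and example rates are illustrative), implementing Ross's Equation (5.9) for distinct rates:

```python
import math

def pure_birth_tail(t, rates):
    """P(T_i + ... + T_{j-1} > t) for independent exponential sojourn times
    with distinct rates: sum_k e^{-lam_k t} * prod_{r != k} lam_r / (lam_r - lam_k)."""
    total = 0.0
    for k, lam_k in enumerate(rates):
        prod = 1.0
        for r, lam_r in enumerate(rates):
            if r != k:
                prod *= lam_r / (lam_r - lam_k)
        total += math.exp(-lam_k * t) * prod
    return total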
Complexity in transition probability functions
▶ The transition probability function for the pure birth process is
already complicated.
▶ It is getting worse for a general continuous-time MC: Often no
explicit way to derive the transition probability functions.
▶ Alternatively, we will describe the transition probability functions by a
set of differential equations.
A one-slide course on differential equations
▶ Example of a differential equation:
f ′ (x) = cf (x)
▶ Differential equations can sometimes be solved explicitly:
f ′ (x)/f (x) = c ⇒ (log f (x))′ = c ⇒ log f (x) = cx + b ⇒ f (x) = e^{cx+b}
▶ Or at least they can be solved numerically: given f (0),
f (0.01) ≈ f (0) + 0.01 × c × f (0)
⇒ f (0.02) ≈ f (0.01) + 0.01 × c × f (0.01)
⇒ · · · ⇒ f (5)
▶ Having the differential equation(s) is thus almost as good as having an
explicit form
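The numerical recipe on this slide is Euler's method. A small sketch (the step size, the constant c = 0.5, and the initial value are illustrative choices, not part of the course material):

```python
import math

def euler(c, f0, x_end, h=0.01):
    """Numerically solve f'(x) = c*f(x) with f(0) = f0 by Euler steps."""
    f, x = f0, 0.0
    while x < x_end - 1e-12:
        f += h * c * f      # f(x + h) ~ f(x) + h * c * f(x), as on the slide
        x += h
    return f

approx = euler(0.5, 1.0, 5.0)
exact = math.exp(0.5 * 5.0)   # explicit solution e^{cx} with f(0) = 1
```

With h = 0.01 the Euler answer is within about 1% of the explicit solution; shrinking h closes the gap.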
Instantaneous transition rates
▶ The rate of transition from state i into state j is given by
qij = vi Pij .
▶ The qij values are called the instantaneous transition rates.
▶ Note that qii = 0 as a consequence of the fact that Pii = 0.
▶ Observe that
vi = vi (∑_j Pij ) = ∑_j vi Pij = ∑_j qij = ∑_{j≠i} qij .
▶ Moreover,
Pij = qij /vi = qij / ∑_{k≠i} qik .
Instantaneous rates and transition probability function
▶ Therefore, a continuous-time Markov chain can be defined completely
by the instantaneous transition rates qij .
▶ If i ̸= j, then
lim_{h→0} Pij (h)/h = qij .
▶ The instantaneous transition rate qij is the derivative Pij′ (0) of the
transition probability Pij (t) with respect to t, evaluated in t = 0.
Instantaneous rates and transition probability function
▶ As
∑_{j≠i} Pij (h) = 1 − Pii (h),
it follows that
lim_{h→0} (1 − Pii (h))/h = lim_{h→0} ∑_{j≠i} Pij (h)/h = ∑_{j≠i} qij = vi .
▶ Interchanging a limit and an infinite sum is not always allowed, but can
be justified here.
▶ Similarly, −vi is the derivative Pii′ (0) of the transition probability Pii (t)
with respect to t, evaluated in t = 0.
Chapman-Kolmogorov equations
▶ The next (rather obvious) result is our point of departure in deriving
the (rather difficult) Kolmogorov backward and forward equations.
▶ Chapman-Kolmogorov equations for continuous-time Markov
chains: for all s ≥ 0, t ≥ 0,
Pij (t + s) = ∑_{k=0}^{∞} Pik (t)Pkj (s).
▶ Intuitive proof
▶ In order for a continuous-time Markov chain X to go from state i to
state j in time t + s, it must be at some state k at time t.
Chapman-Kolmogorov equations: Proof
▶ Thus, by the law of total probability,
Pij (t + s) = P(X (t + s) = j | X (0) = i)
= ∑_{k=0}^{∞} P(X (t + s) = j, X (t) = k | X (0) = i)
= ∑_{k=0}^{∞} P(X (t + s) = j | X (t) = k, X (0) = i) × P(X (t) = k | X (0) = i)
= ∑_{k=0}^{∞} P(X (t + s) = j | X (t) = k) × P(X (t) = k | X (0) = i)
= ∑_{k=0}^{∞} Pik (t)Pkj (s).
Kolmogorov’s equations
▶ We will now use the Chapman-Kolmogorov equations to develop sets
of differential equations which can be used to derive explicit
expressions for the transition probability functions Pij (t).
▶ In particular, the following two sets:
▶ Kolmogorov’s backward equations.
▶ Kolmogorov’s forward equations.
▶ Both sets have their pros and cons: in some situations the backward
equations are more convenient, in other situations the forward
equations.
Kolmogorov’s backward equations
▶ Kolmogorov’s backward equations for continuous-time
Markov chains: for all states i, j and times t ≥ 0,
Pij′ (t) = ∑_{k≠i} qik Pkj (t) − vi Pij (t).
▶ Proof:
▶ It follows by the Chapman-Kolmogorov equations that
Pij (h + t) − Pij (t) = ∑_{k=0}^{∞} Pik (h)Pkj (t) − Pij (t)
= ∑_{k≠i} Pik (h)Pkj (t) + Pii (h)Pij (t) − Pij (t)
= ∑_{k≠i} Pik (h)Pkj (t) − {1 − Pii (h)} Pij (t).
Proof: continued
Pij (h + t) − Pij (t) = ∑_{k≠i} Pik (h)Pkj (t) − {1 − Pii (h)} Pij (t)
▶ We are ready to divide both sides by h and take the limit; note
lim_{h→0} Pik (h)/h = qik for k ̸= i, and lim_{h→0} (1 − Pii (h))/h = vi .
▶ Therefore
lim_{h→0} (Pij (h + t) − Pij (t))/h
= lim_{h→0} [∑_{k≠i} Pik (h)Pkj (t) − {1 − Pii (h)} Pij (t)]/h
= ∑_{k≠i} qik Pkj (t) − vi Pij (t).
▶ The interchange between the limit and the infinite sum is not always
allowed, but can be justified here.
Kolmogorov’s backward equations: example R6.9 and
R6.10 [Combined into R6.10]
▶ For the pure birth process the backward equations are
Pij′ (t) = λi Pi+1,j (t) − λi Pij (t).
▶ The backward equations for the birth and death process are
Pij′ (t) = λi Pi+1,j (t) + µi Pi−1,j (t) − (λi + µi ) Pij (t).
▶ In particular,
P0j′ (t) = λ0 (P1j (t) − P0j (t)) .
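These backward equations can be integrated numerically exactly as in the one-slide course on differential equations. A sketch (the rates λ = 1, µ = 2 and the truncation at state N = 20 are illustrative choices, not from the slides):

```python
# Euler integration of Kolmogorov's backward equations
#   P'_ij(t) = lam_i P_{i+1,j}(t) + mu_i P_{i-1,j}(t) - (lam_i + mu_i) P_ij(t)
# for a birth and death chain truncated to states 0..N.

def backward_euler(lam, mu, N, t, h=0.001):
    # P[i][j] approximates P_ij(s); start at s = 0 with P_ij(0) = 1{i = j}
    P = [[1.0 if i == j else 0.0 for j in range(N + 1)] for i in range(N + 1)]
    for _ in range(int(round(t / h))):
        newP = [row[:] for row in P]
        for i in range(N + 1):
            li = lam if i < N else 0.0      # no birth out of the truncation
            mi = mu if i > 0 else 0.0       # mu_0 = 0
            for j in range(N + 1):
                up = P[i + 1][j] if i < N else 0.0
                down = P[i - 1][j] if i > 0 else 0.0
                newP[i][j] = P[i][j] + h * (li * up + mi * down - (li + mi) * P[i][j])
        P = newP
    return P

P = backward_euler(lam=1.0, mu=2.0, N=20, t=1.0)
# each row of P is (approximately) a probability distribution over the states
```

This Euler step preserves each row sum exactly, which is a convenient built-in sanity check.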
Kolmogorov equations: forward equations
▶ Kolmogorov’s forward equations for continuous-time Markov
chains: under suitable regularity conditions,
Pij′ (t) = ∑_{k≠j} qkj Pik (t) − vj Pij (t).
▶ The regularity conditions do not hold in all models. However these
conditions hold for all birth and death processes and finite state models.
▶ The forward equations for the birth and death process are
Pij′ (t) = λj−1 Pi,j−1 (t) + µj+1 Pi,j+1 (t) − (λj + µj ) Pij (t).
▶ In particular,
Pi0′ (t) = µ1 Pi1 (t) − λ0 Pi0 (t).
Backward versus forward equations
▶ Why is backward called backward, and forward called forward?
▶ Recall the two sets
▶ Kolmogorov’s backward equations
Pij′ (t) = ∑_{k≠i} qik Pkj (t) − vi Pij (t).
▶ Kolmogorov’s forward equations
Pij′ (t) = ∑_{k≠j} qkj Pik (t) − vj Pij (t).
▶ They are both about Pij′ (t) for all i, j
▶ In order to have a numerical solution, we need a “set” of equations: a
set of functions that show up both on the left and right sides with no
other functions involved
▶ For a fixed j, the backward equations are about Pkj (t) for all k
▶ For a fixed i, the forward equations are about Pik (t) for all k
▶ Backward equations describe the transition probability functions when
standing at a fixed destination state j and looking backward over the
starting states; forward equations fix the starting state i and look
forward over the destination states.
Plan for Lecture 9
Video before lecture
▶ Kendall’s notation (Slides 2-6)
Lecture
1. Limiting probabilities (Chap 6.5)
2. The M/M/1 queue (Chap 6.5)
3. Little’s law (Chap 8.2.1)
4. Limiting properties of a queueing system: PASTA principle (Chap
8.2.2)
Queueing theory: Kendall’s notation
Queueing systems
▶ A queueing system is a system with a service facility.
▶ The facility contains k ≥ 1 servers.
▶ Clients arrive at the service facility according to some arrival process.
Upon arrival,
▶ if a client finds a server free, then he enters the service immediately;
▶ if a client finds all servers busy, then he first has to wait in a queue
before entering the service:
▶ there may be one general queue, or
▶ there may be separate queues for low and high priority, and/or
▶ there may be separate queues for parts of the system;
▶ the queueing capacity may be limited, and clients may be “lost”.
▶ Queueing systems are typical examples of continuous-time Markov
chains, and are widely applied.
Queueing rules
▶ If the queue is non-empty and a server is free, then the queue
releases one of the waiting clients, selected according to the service
discipline.
▶ FIFO: first in, first out.
▶ LIFO: last in, first out.
▶ Random.
▶ Each server needs some time to service a client.
▶ These service times S1 , S2 , . . . are random variables.
Kendall’s notation: letters
▶ Queueing systems are often indicated via two letters followed by one
or two numbers: M/M/1, M/M/2/5.
▶ The first letter indicates the arrival process:
D Deterministic: clients arrive at equidistant time points.
M Markovian: clients arrive according to a Poisson process.
G General: clients arrive according to a general arrival process.
▶ The second letter indicates the type of service times:
D Deterministic: service times are fixed.
M Markovian: service times S1 , S2 , . . . are independent exponential
random variables with common rate.
G General: service times S1 , S2 , . . . are independent and identically
distributed (i.i.d.) random variables. They may have any distribution.
Kendall’s notation: numbers
▶ Queueing systems are often indicated via two letters followed by one
or two numbers: M/M/1, M/M/2/5.
▶ The first number indicates the number of servers.
▶ The second number indicates the capacity of the system, that is, the
maximum number of clients in the system.
▶ The capacity is equal to the number of servers plus the maximum
number of waiting clients.
▶ The capacity is omitted if there is infinite capacity.
Limiting probabilities for CTMC
Limiting probabilities of a continuous-time MC
▶ Assume that the limiting probability of being in state j
Pj = lim Pij (t)
t→∞
exists and is independent of the initial state i.
▶ Consider now the forward equations (assume that the regularity
conditions hold).
▶ Take the limit t → ∞:
lim_{t→∞} Pij′ (t) = lim_{t→∞} [∑_{k≠j} qkj Pik (t) − vj Pij (t)]
= ∑_{k≠j} qkj lim_{t→∞} Pik (t) − vj lim_{t→∞} Pij (t)
= ∑_{k≠j} qkj Pk − vj Pj .
▶ Why not start with the backward equations?
Limiting probabilities: balance equations
▶ As Pij (t) is a probability, and hence is bounded between 0 and 1, it
follows that if Pij′ (t) converges, then it must converge to 0.
▶ Suppose that Pij′ (t) converges to a non-zero value.
▶ Then Pij (t) will exceed one of the boundaries 0 and 1, eventually.
▶ This contradicts the fact that Pij (t) is a probability.
▶ Thus, to find the limiting probabilities Pj , we may solve the balance
equations
vj Pj = ∑_{k≠j} qkj Pk ,
under the condition ∑_j Pj = 1.
▶ vj Pj is the limiting rate at which the process leaves state j.
▶ ∑_{k≠j} qkj Pk is the limiting rate at which the process enters state j.
▶ The balance equations state that “in” and “out” are balanced.
Similarity to Markov Chain
▶ The limiting probabilities exist if
▶ all states of the Markov chain communicate, and
▶ the Markov chain is positive recurrent.
▶ Under these two conditions, the limiting probability Pj can be
interpreted as the long-run proportion of time that the process is in
state j.
▶ When Pj ’s exist, the chain is called ergodic.
▶ Like in the discrete-time case, Pj ’s are also called the stationary
probabilities.
Limiting probabilities: examples R6.1 and R6.15
▶ There are two chairs: chair 1 and chair 2.
▶ In chair 1, shoes are cleaned and polished. Service times are
independent exponential random variables with common rate µ1 .
▶ In chair 2, the polish is buffed. Service times are independent
exponential random variables with common rate µ2 .
▶ Potential customers arrive at the shop according to a Poisson process
with rate λ.
▶ A potential customer will enter the shop only if both chairs are free.
▶ Customers always go for cleaning first and then buffing
▶ Consider the process modeling the status of the shop
a What are the states?
b What are the balance equations?
c Determine the proportion of time the process spends in each state (in
the long run).
Examples R6.1 and R6.15: continued
a States: 0 (no customer), 1 (customer in chair 1), 2 (customer in chair
2)
b The balance equations
▶ For state 0, λP0 = µ2 P2
▶ For state 1, µ1 P1 = λP0
▶ For state 2, µ2 P2 = µ1 P1
c Solution
▶ First express all probabilities in terms of P0 :
P1 = (λ/µ1 )P0 , P2 = (λ/µ2 )P0
▶ Use P0 + P1 + P2 = 1 to solve for P0 :
P0 = 1/(1 + λ/µ1 + λ/µ2 )
▶ Finally get P1 and P2
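The balance equations form a linear system, so examples like this one can be checked mechanically. A sketch (the solver and the test rates λ = 1, µ1 = 2, µ2 = 3 are illustrative choices, not from the slides):

```python
def limiting_probabilities(q):
    """Limiting probabilities from the balance equations
    v_j P_j = sum_{k != j} q_kj P_k, together with sum_j P_j = 1.
    q[i][j] is the instantaneous transition rate i -> j (q[i][i] = 0)."""
    n = len(q)
    v = [sum(row) for row in q]                      # v_i = sum_j q_ij
    # Row j encodes sum_{k != j} q_kj P_k - v_j P_j = 0; the balance
    # equations are linearly dependent, so replace the last row by the
    # normalization sum_j P_j = 1.
    A = [[(q[k][j] if k != j else -v[j]) for k in range(n)] for j in range(n)]
    b = [0.0] * n
    A[n - 1], b[n - 1] = [1.0] * n, 1.0
    for col in range(n):                             # Gaussian elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            b[r] -= f * b[col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
        # back substitution happens after elimination
    P = [0.0] * n
    for r in range(n - 1, -1, -1):
        P[r] = (b[r] - sum(A[r][c] * P[c] for c in range(r + 1, n))) / A[r][r]
    return P

# shoe-shine shop with (illustrative) lam = 1, mu1 = 2, mu2 = 3:
# q01 = lam, q12 = mu1, q20 = mu2; expect P0 = 6/11, P1 = 3/11, P2 = 2/11
P = limiting_probabilities([[0.0, 1.0, 0.0],
                            [0.0, 0.0, 2.0],
                            [3.0, 0.0, 0.0]])
```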
The M/M/1 queue
The M/M/1 queue: definition
▶ Recall the Kendall’s notation for M/M/1
▶ Consider a system with a single server.
▶ Customers arrive at the server according to a Poisson process with
rate λ. Upon arrival,
▶ if a customer finds a server free, then he enters the service immediately;
▶ if a customer finds a server busy, then he first has to wait in a queue
before entering the service.
▶ The service needs some time to complete.
▶ The successive service times are independent exponential random
variables with mean 1/µ.
▶ The number of clients in the system is a birth and death process with
common arrival rate λ and common departure rate µ
Limiting probabilities: birth and death process
▶ For a birth and death process, the balance equations are
▶ For i = 0,
λ0 P0 = µ1 P1 .
▶ For every i > 0,
(λi + µi ) Pi = µi+1 Pi+1 + λi−1 Pi−1 .
▶ By solving the balance equations, the limiting probabilities for a
birth and death process satisfy
Pi = (λ0 λ1 · · · λi−1 )/(µ1 µ2 · · · µi ) × P0
for i ≥ 1.
Limiting probabilities: birth and death process
▶ By using the fact that ∑_{i=0}^{∞} Pi = 1, it follows that
P0 = 1/(1 + ∑_{i=1}^{∞} (λ0 λ1 · · · λi−1 )/(µ1 µ2 · · · µi )).
▶ The last equation gives us a necessary and sufficient condition for
existence of the limiting probabilities:
∑_{n=1}^{∞} (λ0 λ1 · · · λn−1 )/(µ1 µ2 · · · µn ) < ∞.
The M/M/1 queue: limiting distribution
▶ Recall that for a birth and death process, the limiting probabilities are
Pi = (λ0 λ1 · · · λi−1 )/(µ1 µ2 · · · µi ) × P0 ,
P0 = 1/(1 + ∑_{i=1}^{∞} (λ0 λ1 · · · λi−1 )/(µ1 µ2 · · · µi )).
▶ As λi = λ for every i ≥ 0 and µi = µ for every i ≥ 1,
P0 = 1/(1 + ∑_{n=1}^{∞} (λ/µ)^n ) = 1 − λ/µ,
Pi = (λ/µ)^i /(1 + ∑_{n=1}^{∞} (λ/µ)^n ) = (λ/µ)^i (1 − λ/µ),
provided that λ/µ < 1.
Little’s law
Little’s law: notations
▶ Consider a queueing system.
▶ Let N(t) denote the number of arrivals up to time t.
▶ Clients start arriving at time zero.
▶ Then,
λ = lim_{t→∞} N(t)/t
is the overall arrival rate into the system. This is different from the
arrival rate in a birth and death process, but somewhat related.
▶ Let Vn denote the sojourn time of client n; that is, the time client n
spends in the system. This is different from the sojourn time in a CTMC!
▶ Then,
W = lim_{n→∞} (1/n) ∑_{j=1}^{n} Vj
is the average sojourn time.
Little’s law
▶ Let X (t) denote the number of clients in the system at time t.
▶ Then,
L = lim_{t→∞} (1/t) ∫_0^t X (s) ds
is the average number of clients in the system (over time).
▶ Little’s law: If both λ and W exist and are finite, then L exists and
L = λW .
▶ Intuitive proof
▶ Consider the total time all customers spend in the system up to a large
time T
▶ Two ways of calculating this quantity
▶ Over each customer: W × λT
▶ Over time: ∫_0^T X (s) ds = L × T
▶ Equating the two and dividing by T (important!) ⇒ Little’s law.
Little’s law: a simple example
▶ At a hospital, on average 25 new patients arrive per day, and their
average stay is three days.
▶ How many beds are occupied on average?
▶ The system is the hospital.
▶ The arrival rate into the system is λ = 25.
▶ The average sojourn time is W = 3.
▶ The question is about L (average number of customers in the system).
▶ Answer: L = λW = 25 × 3 = 75.
Little’s law: a slightly more complicated example
▶ A beer shop sells on average 100 crates of beer per week, and has on
average 250 crates of beer in store.
▶ What is the average number of weeks a beer crate is in store?
▶ The system is the beer shop. A beer crate is a “customer”.
▶ The average number of beer crates in store is L = 250.
▶ The question is about W (sojourn time), so we need the arrival rate λ.
▶ The departure rate is µ = 100, so the arrival rate must be λ = µ = 100.
▶ The departure/arrival rates are not the same as those in the birth and
death process!
▶ If beer arrived faster than customers arrive, L would grow to infinity.
▶ If beer arrives more slowly than customers arrive:
▶ Sometimes the shop is empty, and a customer cannot buy a beer.
▶ The “departure rate” counts the “empty shop”: it is lower than the
arrival rate of customers, and must be the same as the “arrival rate”.
▶ Answer: W = L/λ = 250/100 = 2.5 weeks.
Little’s law: final remark
▶ Little’s law is regarding the long run limit of a queueing system
▶ Little’s law is one of the most general and versatile laws in queueing
theory: applicable to any queueing system, not limited to
continuous-time Markov chain
▶ Following Kendall’s notation, only “M/M” type of queue is a
continuous-time Markov chain
Little’s law: M/M/1 queue
▶ The limiting probabilities are
Pn = (λ/µ)^n (1 − λ/µ)
for n ≥ 0, provided that λ/µ < 1.
▶ It follows that the time average number in the system is
L = ∑_{n=0}^{∞} nPn = ∑_{n=0}^{∞} n(λ/µ)^n (1 − λ/µ) = λ/(µ − λ).
▶ Thus, the average sojourn time is given by
W = L/λ = 1/(µ − λ).
▶ The arrival rate is λ.
▶ The “departure rate of the system” is also λ (why not µ?)
The M/M/1 queue: focusing on the queue
▶ Let λQ denote the arrival rate into the queue: λQ = λ
▶ Let WQ denote the average sojourn time in the queue (the average
waiting time), which is calculated as
WQ = W − (average service time) = W − 1/µ
= 1/(µ − λ) − 1/µ = λ/(µ(µ − λ)).
▶ Let LQ denote the average number of clients in the queue.
▶ Following Little’s law,
LQ = λWQ = λ^2 /(µ(µ − λ)).
▶ This result would be difficult to derive without using Little’s law!
▶ The trick is to choose what the “system” is.
Limiting properties of a queueing system:
The PASTA principle
Limit probabilities of a general queueing system
▶ Let X (t) denote the number of clients in the system at time t.
▶ Define the long-run or steady-state probability of exactly n clients in
the system by
Pn = lim P(X (t) = n).
t→∞
▶ Assume that this limit exists!
▶ Often, Pn is also equal to
lim_{t→∞} (1/t) ∫_0^t I{X (s)=n} ds,
the long-run proportion of time that the system contains exactly n
clients.
▶ Here, I{X (s)=n} is the indicator of the event that there are exactly n
clients in the system at time s.
Notation: an and dn
▶ Other steady-state probabilities of special interest are
an long-run proportion of clients that find n in the system upon arrival
dn long-run proportion of clients that leave n in the system upon departure
▶ These probabilities may differ from Pn !
▶ Example: a G/D/1 queue.
▶ “1”: a single server queue.
▶ “G”: The interarrival times are independent random variables.
▶ We assume that they follow a uniform distribution on the interval [1, 5].
▶ “D”: The service times are all equal.
▶ We assume them to be 1.
▶ As the interarrival times exceed the service times with probability 1,
every arriving client finds the system empty, and every departing client
leaves the system empty, and thus a0 = d0 = 1.
▶ However, since the system is not always empty, P0 < 1.
Arrivals and departures
▶ The observation a0 = d0 in the example is no coincidence.
▶ In any system in which clients arrive and depart one at a time, the
two probabilities an and dn coincide
▶ See Proposition 8.1
▶ Intuitive proof
▶ If overall arrival rate is higher than departure rate: the system will
explode, which implies an = dn = 0
▶ If overall arrival rate is lower than departure rate: the same situation
▶ If overall arrival rate = departure rate
▶ An arrival finds n means n → n + 1; A departure finds n means
n+1→n
▶ The system goes from n to n + 1, and from n + 1 to n, infinitely
many times
▶ Between two n → n + 1, there must be once n + 1 → n
▶ Between two n + 1 → n, there must be once n → n + 1
▶ The proportions are the same!
PASTA
▶ In the G/D/1 queue example, P1 ̸= a1 (since P1 > 0 and a1 = 0)
▶ Averages over clients may differ from average over time.
▶ P1 is a long-run average over time.
▶ a1 is a long-run average over clients.
▶ However, if the arrival process is a Poisson process
▶ The arrivals occur homogeneously over time
▶ The averages over time and over clients are the same
▶ The PASTA property: Poisson arrivals see time averages.
(Proposition 8.2)
Pn = an
PASTA: a simple workstation example
▶ Consider a workstation without buffer.
▶ Orders arrive according to a Poisson process with rate λ
▶ The workstation does not have a buffer: orders are only accepted if
the workstation is idle.
▶ The service times are i.i.d. with mean β.
▶ Find the long-run proportion of time that the workstation is busy.
▶ The workstation is always in a “busy-idle” cycle
▶ The expected time that the workstation is busy per “cycle” is equal to
the expected service time β.
▶ The expected time that the workstation is idle per cycle is equal to
1/λ, the expected time until the “next” arrival.
▶ The long-run proportion of time that the workstation is busy (P1 ) is
β/(β + 1/λ).
▶ This is the time average!
Workstation example: continued
▶ Find the long-run proportion of orders lost.
▶ Solution: when an order arrives
▶ It is served if the system is “idle”, lost if the system is “busy”
▶ The probability to be lost is the probability that an arrival sees “busy”
▶ PASTA: The long-run proportion of orders lost (a1 ) equals the
long-run proportion of time that the workstation is busy (P1 ):
β/(β + 1/λ).
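This PASTA prediction is easy to test by simulation. A sketch (the rate λ = 2, the uniform service distribution, and the run length are arbitrary illustrative choices — the slide's argument only uses the service-time mean β):

```python
import random

# No-buffer workstation: Poisson arrivals with rate lam; an arrival that
# sees the station busy is lost. Service times here are uniform on
# [0.5, 1.5], so beta = 1. By PASTA the lost fraction should approach
# beta / (beta + 1/lam) = 2/3 for these parameters.
random.seed(42)
lam, beta = 2.0, 1.0

t, busy_until = 0.0, 0.0
lost = total = 0
for _ in range(200_000):
    t += random.expovariate(lam)      # next Poisson arrival
    total += 1
    if t < busy_until:                 # arrival sees "busy": order lost
        lost += 1
    else:                              # idle: start a new service
        busy_until = t + random.uniform(0.5, 1.5)

theory = beta / (beta + 1 / lam)      # = 2/3 here
# lost / total should be close to theory
```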
Plan for Lecture 10
1. An old exam question: Poisson process
2. Continuous-time Markov Chain: a gas station example
3. Cut equations
4. More examples: M/M/k/k
An old exam question: Poisson process
Descriptions and simple questions
▶ A mini-bus has only two seats. Passengers must sit down upon
departure.
▶ At 8AM there is no passenger at the bus stop. Passengers arrive at
the bus stop following a Poisson process with rate λ = 3 per hour.
▶ The next bus is scheduled at 8:30AM.
▶ First two questions
1. What is the probability that the next bus leaves with no passenger?
2. What is the probability that the next bus leaves with two passengers?
▶ The bus charges a flexible ticket price with compensation for waiting
time: a full ticket costs 2 euro; for each waiting minute, 5 cents is
compensated. For example, a passenger arrived at 8:20AM will be
charged 2 − 0.05 × 10 = 1.5 euro.
▶ At 8:30AM, there are exactly two passengers waiting for the bus.
3. What is the expected total income for the bus at 8:30AM?
How to deal with such questions?
▶ The questions are again long with lots of texts
▶ Stay calm, read carefully!
▶ First two questions are very similar, only differ in one number
▶ Is Chen crazy by asking the same question twice? Most likely not!
▶ Questions about Poisson process are often “easy” if you define
notation carefully
▶ What is the process of “counting”?
▶ When is t = 0?
▶ What is the question about?
Solution to Question 1 and 2
▶ Denote {N(t), t ≥ 0} as the process for the arrival of passengers
▶ Do not forget {·} and t ≥ 0
▶ t = 0 corresponds to 8:00AM (say it!)
▶ Then the process is a Poisson process with rate λ = 3 per hour
▶ You may convert the rate to 3/60 = 1/20 per minute
▶ Otherwise, you have to be careful with using hour as the unit
▶ Write carefully the event for each question
▶ Both questions are about N(30) (if using the rate per minute)
▶ N(30) follows a Poisson distribution with parameter 30 × (1/20) = 3/2
▶ If using the hourly rate, this is regarding N(1/2), which follows a Poisson
distribution with parameter (1/2) × 3 = 3/2 (must be the same!).
▶ Question 1 is about N(30) = 0
▶ Answer: e^{−3/2}
▶ Question 2 is about N(30) ≥ 2 (why?)
▶ Answer: 1 − e^{−3/2} − (3/2)e^{−3/2} = 1 − (5/2)e^{−3/2}
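A two-line check of the answers (a sketch; it simply evaluates the Poisson(3/2) probabilities used above):

```python
import math

mean = 30 * (1 / 20)                    # N(30) ~ Poisson(3/2)

p0 = math.exp(-mean)                    # P(N(30) = 0), answer to Question 1
p1 = mean * math.exp(-mean)             # P(N(30) = 1)
p_two_or_more = 1 - p0 - p1             # P(N(30) >= 2) = 1 - (5/2) e^{-3/2}
```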
Solution to Question 3
▶ Introduce new notations to handle the question
▶ Given that N(30) = 2, denote the arrival times of the two arrived
passengers as S1 and S2 .
▶ Given that N(30) = 2, (S1 , S2 ) has the same distribution as
(U(1) , U(2) ) which are the order statistics of two independent and
identically distributed random variables U1 and U2 following U(0, 30).
▶ This is not an easy statement to make
▶ Do not forget the conditioning event
▶ This is regarding the joint distribution of (S1 , S2 ) (not only marginal)
▶ Introduce the notation of the order statistics
▶ Do not forget that they are order statistics of i.i.d. random variables
▶ Write such a statement carefully in your solution. It is the argument
for the calculation
Solution to Question 3
▶ The ticket price for passenger i is 2 − 0.05(30 − Si ) with i = 1, 2
▶ The total income for the bus at 8:30AM is
2 − 0.05(30 − S1 ) + 2 − 0.05(30 − S2 ) = 1 + 0.05(S1 + S2 ).
▶ Notice that, in distribution,
S1 + S2 =d U(1) + U(2) = U1 + U2 .
▶ Here we use the argument!
▶ The expected total income is
E(1 + 0.05(S1 + S2 )|N(30) = 2) = 1 + 0.05E(S1 + S2 |N(30) = 2)
= 1 + 0.05E(U1 + U2 )
= 1 + 0.05 × 30 = 2.5 euro.
Continuous-time MC: a gas station example
A gas station example
▶ Potential customers arrive at a one-pump gas station at a Poisson
rate of λ = 20 cars per hour.
▶ Potential customers will only enter the station if there are less than
two cars at the pump.
▶ Service times are independent exponential random variables with
mean five minutes.
▶ That is, with rate µ = 12 cars per hour.
a. What fraction of the attendant’s time will be spent on servicing cars?
b. What fraction of potential customers are lost?
Example: modeling
▶ Modeling X (t): number of cars at the station
▶ States: 0 (idle), 1 (one car being served), 2 (one car waiting)
▶ A birth and death process, but at 2, only going down to 1
▶ Instantaneous rates: q01 = q12 = λ, q10 = q21 = µ
▶ Define the limit probabilities as Pi , i = 0, 1, 2
▶ Convert questions to proper quantities
a. What fraction of the attendant’s time will be spent on servicing cars?
▶ This question is about P1 + P2
b. What fraction of potential customers are lost?
▶ Customers are lost if they “see two cars at the station”
▶ This question is about a2
▶ PASTA! The question is about a2 , a2 = P2
▶ Essentially we need to solve all limit probabilities Pi for i = 0, 1, 2
▶ Do not forget to state PASTA
Example: solution
▶ Solving the limiting probabilities
▶ Balance equations for each state
P0 λ = P1 µ, P1 (λ + µ) = P0 λ + P2 µ, P2 µ = P1 λ
▶ Use only two equations (similar to discrete time Markov chain!)
P0 λ = P1 µ, P2 µ = P1 λ
▶ Solutions:
P0 = 1/(1 + (λ/µ) + (λ/µ)^2 ) = 9/49, P1 = (λ/µ)P0 = 15/49, P2 = (λ/µ)^2 P0 = 25/49
▶ This is actually an M/M/1/2 queue!
Example: extended
▶ Next consider the addition of a second pump with an new attendant
▶ Customers still leave if two cars are being served (no waiting
custormer)
▶ What percentage of customers is lost in that case?
▶ Changed balance equations
P0 λ = P1 µ, P1 (λ + µ) = P0 λ + P2 (2µ), P2 (2µ) = P1 λ
▶ Why 2µ?
▶ The system goes from 2 to 1 if one of the car is finished (minimum of
two independent exponential distributions)
▶ Final answer: 25/73
▶ This is an M/M/2/2 queue.
Cut equations
Shorter equations in solving a queueing system
▶ In solving the gas station example, we observe some simpler equations
than the balance equations
▶ For the M/M/1/2 queue
P0 λ = P1 µ, P2 µ = P1 λ
▶ These shorter equations have a different interpretation
Using cut equations to derive the limit probabilities
▶ Note that clients arrive and depart one at a time.
▶ This is always the case for a queueing system!
▶ It may not be the case for a general continuous-time Markov chain.
▶ Cut equations
λPn = µPn+1 .
▶ Interpretation: the “rate” to go from n to n + 1 equals that from
n + 1 to n
▶ This gives
P2 = (λ/µ)P1 , P1 = (λ/µ)P0
▶ The rest of the solution is similar.
▶ Using the cut equations to solve a queueing system is faster!
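The cut-equation recipe generalizes to any birth-and-death chain: each Pn+1 is (birth rate)/(death rate) times Pn, and the whole vector is normalized at the end. A sketch, with the gas-station rates λ = 20, µ = 12 as an assumed example:

```python
from fractions import Fraction

def birth_death_limits(birth, death):
    """Limiting probabilities of a birth-and-death chain via the cut
    equations birth[n] * P[n] = death[n] * P[n+1], then normalization."""
    p = [Fraction(1)]
    for lam_n, mu_next in zip(birth, death):
        p.append(p[-1] * Fraction(lam_n, mu_next))
    total = sum(p)
    return [q / total for q in p]

one_pump = birth_death_limits([20, 20], [12, 12])    # M/M/1/2
two_pumps = birth_death_limits([20, 20], [12, 24])   # M/M/2/2: rate 2*mu out of state 2
```

The two-pump case immediately reproduces the lost fraction 25/73 from the extended example.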
The M/M/k/k queue: Erlang’s loss system
Erlang’s loss system: the M/M/k/k queue
▶ Erlang’s loss system: A loss system is a queueing system in which
arrivals that find all servers busy do not enter the system, and are lost.
▶ Especially relevant for telecom.
▶ It is simply the M/M/k/k queueing system.
▶ Called Erlang-B in telecom.
▶ Erlang-C refers to the M/M/k queue.
▶ As clients arrive and depart one at a time, we obtain
λPn = (n + 1)µPn+1
for n = 0, 1, . . . , k − 1.
▶ Hence,
Pn = ((λ/µ)^n / n!) P0
for n = 1, 2, . . . , k.
Erlang’s loss system
Pn = ((λ/µ)^n / n!) P0
▶ It looks like a Poisson distribution, but it actually is not
▶ The M/M/k/k queue has a finite number of states n = 0, 1, 2, . . . , k
▶ For large k, the Poisson distribution is a good approximation
▶ For finite k, as the Pn ’s should add to 1, it follows that
Pn = ((λ/µ)^n /n!) / Σ_{j=0}^k (λ/µ)^j /j!
for n = 0, 1, . . . , k.
▶ The long-run proportion of lost clients is
P̃ = Pk = ((λ/µ)^k /k!) / Σ_{j=0}^k (λ/µ)^j /j! .
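The Erlang-B blocking probability above is easy to evaluate directly. A minimal sketch (the helper name `erlang_b` and the test loads are illustrative choices, not from the slides):

```python
from math import factorial

def erlang_b(k, a):
    """Blocking probability P_k of the M/M/k/k loss system with
    offered load a = lam/mu (the Erlang-B formula)."""
    terms = [a**n / factorial(n) for n in range(k + 1)]
    return terms[k] / sum(terms)
```

For instance, one server with offered load 1 blocks half of the arrivals, since P1 = 1/(1 + 1) = 0.5.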
Plan for Lecture 11
Video before lecture
▶ The limit of a random walk (Slides 2-16)
Lecture
1. Brownian motion: definition and properties (Chap 10.1)
2. Reflection principle for random walk (NOT in the book)
Brownian motion: the limit of a random walk
Continuous-time continuous-state Markov process
▶ Moving from discrete-state to continuous-state is similar to moving
from a discrete distribution to a continuous distribution
▶ The number of potential states is uncountably infinite!
▶ In probability theory, how did we move from a discrete distribution to a
continuous distribution?
▶ Approximate binomial distribution by a normal distribution
▶ Aggregating Bernoulli distributed random variables
▶ We start with constructing something as fundamental as the normal
distribution: the Brownian motion
▶ By taking the limit of a discrete-state MC
Constructing a random walk
▶ A random variable R which satisfies
P(R = −1) = P(R = +1) = 1/2
is called a Rademacher random variable.
▶ The mean of a Rademacher variable is
E(R) = −1 · 1/2 + 1 · 1/2 = 0.
▶ The variance of a Rademacher variable is
Var(R) = (−1 − 0)² · 1/2 + (1 − 0)² · 1/2 = 1.
▶ Let R1 , R2 , . . . be a sequence of independent Rademacher variables.
▶ Consider the process {Sk , k ≥ 0} with Sk = Σ_{i=1}^k Ri .
▶ {Sk , k ≥ 0} is a discrete-time discrete-state MC: a random walk
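A sample path of this random walk is easy to simulate; the sketch below checks two structural facts from the construction: steps are ±1, and Sk always has the same parity as k (the seed is an arbitrary choice):

```python
import random

def random_walk(n, seed=0):
    """Sample path S_0, S_1, ..., S_n of a simple random walk built
    from n independent Rademacher steps."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n):
        path.append(path[-1] + rng.choice([-1, 1]))
    return path

walk = random_walk(20)
```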
Random walk: visualization
▶ Plot of the first n = 20 terms of Sk versus k.
Random walk: continuous version
▶ The plot depicts a sample path.
▶ The path is piecewise constant and has jumps.
▶ Obviously, the path is not continuous.
▶ Next, we approximate the path by means of a continuous path.
▶ Mathematically, we connect (k, Sk ) and (k + 1, Sk+1 ) by a straight
line
S̃(t) = Sk + (t − k)Rk+1 , for k ≤ t ≤ k + 1
▶ {S̃(t)} is a continuous-time and continuous-state process!
The continuous approximation: visualization
▶ Plot of the continuous approximation {S̃(t)} for t ∈ [0, 20].
The continuous approximation: observations
▶ Some observations:
▶ The path is now indeed continuous, but “kinky”.
▶ However, the continuous path approximates the piecewise constant
path poorly.
▶ Next, we again plot Sk and its continuous approximation {S̃(t)},
but this time we use n = 100 (or t ∈ [0, 100]).
Random walk: n = 100
▶ Plot of the first n = 100 terms of Sk versus k.
The continuous approximation: t ∈ [0, 100]
▶ Plot of the continuous approximation {S̃(t)} for t ∈ [0, 100].
Observations: from 20 to 100
▶ Some observations:
▶ The path of Sk is still piecewise constant and has jumps.
▶ As the horizontal time scale has been adjusted, the relative size of the
constant pieces is decreased.
▶ As the vertical scale has been adjusted, the relative size of the
jumps is decreased.
▶ The continuous approximation works better for n = 100 than for
n = 20.
▶ Next, we repeat plotting Sk and its continuous approximation {S̃(t)},
but this time we use n = 500 and t ∈ [0, 500].
Random walk: n = 500
▶ Plot of the first n = 500 terms of Sk versus k.
The continuous approximation: t ∈ [0, 500]
▶ Plot of the continuous approximation {S̃(t)} for t ∈ [0, 500].
Final set of observations
▶ Some observations:
▶ As the horizontal time scale has been adjusted again, it has become
difficult to distinguish the constant pieces.
▶ As the vertical scale has been adjusted, it has become difficult to
distinguish the jumps.
▶ For n ≥ 500, it becomes difficult to distinguish between the path of Sk
and its continuous approximation.
▶ Increasing n further does not lead to dramatic differences in structure
of the sample path.
▶ This suggests that there is a limiting stochastic process with
continuous sample paths.
The limit of a random walk: existence
▶ The plots have automatically adjusted for the fact that the horizontal
scale is proportional to n. The vertical scale is proportional to √n.
▶ This suggests that we should consider the rescaled version of the
random walk defined by
Wn (t) = (1/√n) S[nt] = (1/√n) Σ_{i=1}^{[nt]} Ri
for 0 ≤ t ≤ 1.
▶ Here [nt] denotes the largest integer not exceeding nt.
▶ As illustrated by the plots, the rescaled random walk Wn (t) is close to
a continuous sample path process as n → ∞
▶ In fact, the rescaled random walk Wn (t) tends to a limiting stochastic
process W (t) with continuous sample paths.
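A small Monte Carlo sketch of this rescaling: Wn(1) = Sn/√n should look like a standard normal, so its sample mean is near 0 and its sample variance near 1 (seed and sample sizes are arbitrary choices):

```python
import random

def rescaled_endpoint(n, rng):
    """W_n(1) = S_n / sqrt(n) for one simulated Rademacher random walk."""
    return sum(rng.choice([-1, 1]) for _ in range(n)) / n ** 0.5

rng = random.Random(42)
draws = [rescaled_endpoint(100, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
```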
Brownian motion: the limit of a random walk
▶ This limiting process W (t) is called Brownian motion or Wiener
process.
▶ Definition 10.1 is slightly more general.
▶ Often, the general case is called Brownian motion.
▶ The specific case σ = 1 is called Wiener process.
▶ Brownian motion was first used in 1828 by the botanist Robert
Brown, to describe the movements of particles of pollen suspended in
water.
▶ Brownian motion was formally defined in 1900 by Louis Bachelier, in
his thesis on speculation in the bond market.
▶ Brownian motion was thoroughly studied by Norbert Wiener.
Brownian motion: definition and properties
Brownian motion: starting point and univariate marginal
▶ As Wn (0) is equal to an “empty” sum, we obtain Wn (0) = 0.
▶ Probabilists say that the process Wn (t) starts at zero.
▶ As Wn (0) tends to W (0) as n → ∞, it follows that W (0) = 0.
▶ Next, consider a fixed t1 , with 0 < t1 ≤ 1.
▶ As [nt1 ] tends to infinity and [nt1 ]/n tends to t1 as n → ∞, the
Central Limit Theorem implies that
Wn (t1 ) = (1/√n) Σ_{i=1}^{[nt1]} Ri = √([nt1]/n) · (1/√[nt1]) Σ_{i=1}^{[nt1]} Ri
tends to a normal random variable with mean zero and variance t1 .
▶ Thus, W (t1 ) has a normal distribution with mean zero and variance
t1 .
Brownian motion: increments
▶ Next, consider fixed t1 and t2 , with 0 < t1 < t2 ≤ 1.
▶ Then, W (t2 ) − W (t1 ) is called an increment of the Brownian motion.
▶ To study the increments of the Brownian motion, first look at the
increments of the rescaled random walk Wn (t).
▶ As we may write
Wn (t2 ) − Wn (t1 ) = (1/√n)(S[nt2] − S[nt1]) = (1/√n) Σ_{i=[nt1]+1}^{[nt2]} Ri ,
it follows that
▶ the random variables Wn (t1 ) and Wn (t2 ) − Wn (t1 ) are independent;
▶ the random variable Wn (t2 ) − Wn (t1 ) tends to a normal random
variable with mean zero and variance t2 − t1 .
Brownian motion: independent and stationary increments
▶ This implies that
▶ the random variables W (t1 ) and W (t2 ) − W (t1 ) are independent;
▶ the random variable W (t2 ) − W (t1 ) is a normal random variable with
mean zero and variance t2 − t1 .
▶ Note that the distribution of the increment W (t1 + s) − W (t1 ) does
not depend on t1
▶ The increments of the Brownian motion are stationary.
Brownian motion: multivariate marginal distribution
▶ Finally, consider fixed t1 , t2 , . . . , tm , with 0 < t1 < t2 < . . . < tm ≤ 1.
▶ One may show that the increments
W (t1 ), W (t2 ) − W (t1 ), . . . , W (tm ) − W (tm−1 ),
are independent normal random variables.
▶ This leads to the joint distribution of (W (t1 ), . . . , W (tm ))
▶ It is called a “marginal distribution” of the process (Why?)
Brownian motion: properties
▶ We may summarize the properties of Brownian motion as follows.
▶ Brownian motion starts at zero: W (0) = 0.
▶ Brownian motion has stationary and independent increments.
▶ Brownian motion evaluated at a fixed time t1 is a normal random
variable with mean zero and variance t1 .
▶ These three properties characterize Brownian motion
▶ Brownian motion satisfies these properties
▶ Any continuous-time continuous-state process satisfying these
properties is a Brownian motion.
Brownian motion: an alternative characterization
▶ An alternative characterization of Brownian motion:
▶ Brownian motion starts at zero: W (0) = 0.
▶ For t1 ≤ t2 , W (t1 ) and W (t2 ) have a bivariate normal distribution with
mean zero and covariance t1 .
▶ We check the equivalence
▶ From the definition to the alternative definition:
▶ W (t1 ) and W (t2 ) − W (t1 ) are independent normally distributed
random variables, which implies that (W (t1 ), W (t2 )) is bivariate
normally distributed.
▶ For 0 < t1 < t2 :
cov(W (t1 ), W (t2 )) = cov(W (t1 ), W (t1 ) + (W (t2 ) − W (t1 )))
= cov(W (t1 ), W (t1 )) + cov(W (t1 ), W (t2 ) − W (t1 ))
= Var(W (t1 )) + 0 = t1 .
Alternative characterization implies the definition
▶ The first and last properties are obvious
▶ W (0) = 0
▶ By taking t1 = t2 , we get W (t1 ) is a normal random variable with
mean zero and variance t1
▶ Only need to check independent and stationary increments
▶ From cov(W (t1 ), W (t2 )) = t1 , for 0 < t1 < t2 ,
cov(W (t1 ), W (t2 ) − W (t1 ))
=cov(W (t1 ), W (t2 )) − cov(W (t1 ), W (t1 ))
=t1 − t1 = 0
▶ From multivariate normality, W (t1 ) and W (t2 ) − W (t1 ) are
independent.
▶ Stationarity of the increments follows from
Var(W (t2 ) − W (t1 )) = t2 + t1 − 2t1 = t2 − t1
▶ Hence the two definitions are equivalent!
Brownian motion: invariance
▶ Let Y1 , Y2 , . . . be a sequence of i.i.d. variables with mean zero and
variance σ 2 .
▶ Consider the partial sum process defined by
Wn (t) = (1/(σ√n)) Σ_{i=1}^{[nt]} Yi
for 0 ≤ t ≤ 1.
▶ The partial sum process Wn (t) tends to a Brownian motion.
▶ An invariance result: replacing the Rademacher random variables Ri in
the rescaled random walk by Yi /σ did not affect the limiting stochastic
process.
Reflection principle
Boundary crossing probabilities
▶ Consider a Brownian motion {W (t), t ≥ 0}.
▶ Let ψ(t) be a positive continuous function defined on the interval
[0, b], and let y be a positive constant.
▶ Note that, at t = 0, the Brownian motion is below the boundary y ψ(t).
▶ We are interested in the probability that {W (t), t ≥ 0} crosses the
boundary y ψ(t) somewhere on the interval [0, b].
▶ ψ(t) defines the “shape” of the boundary
▶ y defines the “scale” (or “height”) of the boundary
▶ Often, we start with ψ(t) = 1, i.e. a constant boundary y
▶ The Brownian motion crosses the boundary somewhere on the interval
[0, b] if and only if there exists a t ∈ [0, b] such that W (t) > y ψ(t).
▶ The only tool available so far is that Brownian motion is the limit of
a random walk
▶ We start by studying the boundary crossing of a random walk.
Constant boundary crossing for random walk
▶ Let R1 , R2 , . . . be a sequence of independent Rademacher variables.
▶ We are interested in the “random walk” S1 , S2 , . . . with
Sn = Σ_{i=1}^n Ri , in particular, when it crosses a boundary k
▶ Sn only takes integer values and has the same parity as n.
▶ The parity of an integer indicates whether the integer is odd or even.
▶ Choose 0 < k < n such that k and n have different parity.
▶ We will either have Sn > k or Sn < k
▶ We study P(maxj=1,2,...,n Sj ≥ k)
▶ The random walk crosses the constant boundary k somewhere before n
Reflection principle: random walk
The time to reach the boundary
▶ Let A be the event that maxj=1,2,...,n Sj ≥ k.
▶ Let
T = min{j : Sj ≥ k} if maxj=1,2,...,n Sj ≥ k, and T = ∞ otherwise.
▶ A = {T < ∞}
▶ If T is finite, then n − T is odd, so
Sn − ST = Σ_{i=T+1}^n Ri
should also be odd, and hence Sn − ST cannot be 0.
Decomposing the boundary crossing event
▶ Introduce the event
Aj = {T = j}
▶ The events A1 , A2 , . . . , An are subsets of A.
▶ The events A1 , A2 , . . . , An are disjoint.
▶ In fact An = ∅
▶ A is the union of the events A1 , A2 , . . . , An .
▶ Thus, we have split up A in disjoint events A1 , A2 , . . . , An .
▶ The event Aj depends only on the first j Rademacher variables
R1 , R2 , . . . , Rj .
▶ Thus, Aj and Σ_{i=j+1}^n Ri are independent (Markov! but more than
Markov!)
▶ The distribution of Σ_{i=j+1}^n Ri is symmetric around zero, even after
conditioning on Aj .
Symmetry in the paths after T
▶ Condition on the event Aj .
▶ Note that the random variable
Sn − ST = Σ_{i=j+1}^n Ri
has a symmetric distribution around zero, but cannot be zero.
▶ It follows that
P(Sn − ST > 0 | Aj ) = 1/2.
▶ Since the events A1 , A2 , . . . , An are disjoint with union A, it follows
that
P(Sn − ST > 0 | A) = 1/2.
Reflection principle for random walk
▶ Thus,
P(Sn ≥ k) = P(Sn > k) = P(Sn − ST > 0 | A) · P(A) = (1/2) · P(A).
▶ It follows that
P( max_{j=1,2,...,n} Sj ≥ k) = P(A) = 2P(Sn ≥ k)
▶ Essential reason for the factor 2: Starting at T , for any set of paths,
reflection in the line y = k preserves probabilities.
▶ The following two sets of paths have the same probability.
▶ The set of paths which have starting value k at T and end at n in a
value greater than k.
▶ The set of paths which have starting value k at T and end at n in a
value less than k.
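Because a length-n walk has only 2^n paths, the identity P(max_j Sj ≥ k) = 2 P(Sn ≥ k) can be verified exactly by brute-force enumeration. A sketch (the test cases n = 5, k = 2 and n = 7, k = 4 are arbitrary choices with the required parity difference):

```python
from itertools import product

def reflection_counts(n, k):
    """Count paths with max_j S_j >= k and with S_n >= k, by brute
    force over all 2^n Rademacher step sequences."""
    hit = end = 0
    for steps in product([-1, 1], repeat=n):
        s = m = 0
        for r in steps:
            s += r
            m = max(m, s)
        hit += m >= k
        end += s >= k
    return hit, end
```

Since every path has the same probability 2^(−n), equality of counts is equality of probabilities.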
Plan for Lecture 12
Video before lecture
▶ Boundary crossing probability: constant boundary (Slides 2-6)
Lecture
1. Hitting time (Chap 10.2)
2. Boundary crossing in application: Butler test (NOT in the book)
3. Boundary crossing probability: linear boundary (Chap 10.3.1 and
slides)
Chen Zhou
Erasmus University Rotterdam
November 13, 2022
1 / 31
Boundary crossing for Brownian motion
Boundary crossing probabilities problem
▶ Recall the boundary crossing probability problem
▶ Consider a Brownian motion {W (t), t ≥ 0}.
▶ Let ψ(t) be a positive continuous function defined on the interval
[0, b], and let y be a positive constant.
▶ We are interested in the probability that {W (t), t ≥ 0} crosses the
boundary y ψ(t) somewhere on the interval [0, b].
▶ For a general shape ψ(t), introduce the “weighted supremum”
Mψ+ = sup_{0≤t≤b} W (t)/ψ(t).
▶ We shall refer to ψ(t) as the weight function.
▶ Boundary crossing: if and only if Mψ+ > y .
▶ Equivalent to the cumulative distribution function of Mψ+
Constant boundaries: reflection principle
▶ We start with ψ(t) = 1, i.e. a constant weight
▶ For a constant weight ψ(t) = 1, we study the distribution of
sup_{0≤t≤b} W (t)
▶ We use the reflection principle to find the distribution of
sup_{0≤t≤b} W (t).
Constant boundaries: reflection principle visualization
Constant boundaries crossing probability
▶ The reflection principle yields
P( sup_{0≤t≤b} W (t) > y ) = 2P(W (b) > y )
▶ As W (b) has mean zero and variance b, it follows that
2P(W (b) > y ) = 2P( (W (b) − 0)/√b > (y − 0)/√b )
= 2P( Z > y/√b )
= 2 ∫_{y/√b}^∞ (1/√(2π)) e^{−z²/2} dz.
Express the probability as a function of b
▶ The change of variable z = y/√s (implying dz = −(1/2) y s^{−3/2} ds) yields
that
2 ∫_{y/√b}^∞ (1/√(2π)) e^{−z²/2} dz = ∫_0^b (y/√(2πs³)) exp{−y²/2s} ds
▶ Thus, we have shown that
P( sup_{0≤t≤b} W (t) > y ) = ∫_0^b (y/√(2πs³)) exp{−y²/2s} ds.
Hitting time
▶ Let Ty denote the first time the Brownian motion hits the level y .
▶ If Ty is less than b, then the Brownian motion exceeds the level y
before time b (and vice versa).
▶ Thus, the cumulative distribution function of Ty is given by
P(Ty ≤ b) = P( sup_{0≤t≤b} W (t) > y ) = ∫_0^b (y/√(2πs³)) exp{−y²/2s} ds.
▶ It follows that the density of Ty is given by
f (s) = (y/√(2πs³)) exp{−y²/2s}, for s > 0.
Comparing two hitting times
▶ Let Ta and Tb be two hitting times with a, b > 0
▶ Calculate the probability
P(Ta < Tb )
▶ Probability that the Brownian motion hits a before hitting b
▶ It is quite obvious if a and b share the same sign
▶ Example: if a > b > 0, we always have Tb < Ta (continuous sample
path!)
▶ We are particularly interested in a > 0 > b
Comparing two hitting times: solution
Calculate the probability
P(Ta < Tb )
▶ Can this problem be solved by reflection principle?
▶ No, there is no “end of the process” to be checked on
▶ We resort to the construction of Brownian motion: limit of random
walk
▶ Ri are i.i.d. Rademacher variables
▶ Denote
Wn (t) = (1/√n) Σ_{i=1}^{[nt]} Ri
▶ The limit of the process {Wn (t)} is a Brownian motion
▶ We first solve the problem for {Wn (t)}
Comparing two hitting times: solution
▶ We calculate the probability that Wn (t) hits a before b (a > 0 > b)
Wn (t) = (1/√n) Σ_{i=1}^{[nt]} Ri
▶ It means that Σ_{i=1}^{[nt]} Ri hits √n·a before √n·b
▶ With a fixed n, as t increases per 1/n
▶ The sum goes either up if Ri = 1 or down if Ri = −1
▶ This is a Gambler’s ruin problem!
▶ The solution for a Gambler’s ruin problem
P(Wn (t) hits a before b) = √n(−b) / (√n·a + √n(−b)) = −b/(a − b)
▶ The solution does not depend on n, thus it remains unchanged in the
limit:
P(Ta < Tb ) = −b/(a − b)
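The gambler's-ruin answer can be sanity-checked by simulating the underlying random walk directly. A sketch with the assumed integer barriers a = 3, b = −2, for which −b/(a − b) = 2/5:

```python
import random

def hit_a_before_b(a, b, trials, seed=1):
    """Fraction of simple random walks from 0 that reach a (> 0)
    before b (< 0); gambler's ruin predicts -b / (a - b)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        s = 0
        while b < s < a:
            s += rng.choice([-1, 1])
        wins += (s == a)
    return wins / trials

estimate = hit_a_before_b(3, -2, trials=20000)
```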
Boundary crossing from both sides
Boundary crossing from both sides
▶ Consider the probability that the Brownian motion {W (t), t ≥ 0}
crosses either the upper boundary y ψ(t) or the lower boundary
−y ψ(t) somewhere on the interval [0, b].
▶ The Brownian motion crosses one of these boundaries somewhere on
the interval [0, b] if and only if there exists a t ∈ [0, b] such that
W (t) < −y ψ(t) or W (t) > y ψ(t).
▶ Introduce the “absolute weighted supremum”
Mψ = sup_{0≤t≤b} |W (t)|/ψ(t).
▶ The Brownian motion crosses one of the boundaries somewhere on the
interval [0, b] if and only if Mψ > y .
Absolute supremum
▶ Next, we consider both side boundary crossing with constant weight
function ψ(t) = 1
▶ We determine the probability
P( sup_{0≤t≤b} |W (t)| > y )
▶ Introduce the events
A+ = { sup_{0≤t≤b} W (t) > y }, A− = { sup_{0≤t≤b} −W (t) > y }.
Absolute supremum: a rough calculation
▶ Because
{ sup_{0≤t≤b} |W (t)| > y } = A+ ∪ A− ,
we have
P( sup_{0≤t≤b} |W (t)| > y ) < P(A+ ) + P(A− ).
▶ To get an equation, we need to subtract “double counted” events
▶ The right-hand side counts the following paths twice:
▶ A+− : paths that go above y and then below −y .
▶ A−+ : paths that go below −y and then above y .
Reflection principle for the correction
Absolute supremum: correcting the rough calculation
▶ By the reflection principle, it follows that
P(A+− ) = P( sup_{0≤t≤b} W (t) > 3y ),
P(A−+ ) = P( inf_{0≤t≤b} W (t) < −3y ).
▶ By subtracting P(A+− ) and P(A−+ ) we now obtain
P( sup_{0≤t≤b} |W (t)| > y ) > P(A+ ) + P(A− ) − P(A+− ) − P(A−+ ),
▶ Here, we have overcompensated (subtracted too much)
Further correction
▶ To get an equation, we need to add back what has been “double
subtracted”
▶ A+−+ : paths that go above y , then below −y , and then above y .
▶ A−+− : paths that go below −y , then above y , and then below −y .
▶ By the reflection principle, it follows that
P(A+−+ ) = P( sup_{0≤t≤b} W (t) > 5y ),
P(A−+− ) = P( inf_{0≤t≤b} W (t) < −5y ),
Never ending corrections
▶ By adding P(A+−+ ) and P(A−+− ) we now obtain
P( sup_{0≤t≤b} |W (t)| > y )
< P(A+ ) + P(A− ) − P(A+− ) − P(A−+ ) + P(A+−+ ) + P(A−+− ),
▶ Again, we have overcompensated (added too much)
▶ To get an equation, we need to subtract again “double counted”
▶ paths that go above y , then below −y , then above y , and then below
−y .
▶ paths that go below −y , then above y , then below −y , and then above
y.
Correct result after infinite many corrections
▶ In this way the computation continues indefinitely . . . .
▶ By symmetry, we have P(A+ ) = P(A− ), P(A+− ) = P(A−+ ),
P(A+−+ ) = P(A−+− ), . . . .
▶ It follows that
P( sup_{0≤t≤b} |W (t)| > y )
= P(A+ ) − P(A+− ) + P(A+−+ ) − · · ·
+ P(A− ) − P(A−+ ) + P(A−+− ) − · · ·
= 2 {P(A+ ) − P(A+− ) + P(A+−+ ) − · · · }
Absolute supremum: explicit result
▶ Applying our earlier result with respect to the supremum of the
Brownian motion, we obtain
P( sup_{0≤t≤b} |W (t)| > y )
= 2 Σ_{j=1}^∞ (−1)^{j+1} P( sup_{0≤t≤b} W (t) > (2j − 1)y )
= 4 Σ_{j=1}^∞ (−1)^{j+1} P(W (b) > (2j − 1)y )
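This alternating series converges very fast and can be evaluated numerically; for b = 1 it reproduces the tabulated distribution of sup|W(t)|. A sketch (the truncation at 50 terms is an arbitrary safe choice):

```python
from math import erfc, sqrt

def abs_sup_exceed(y, b=1.0, terms=50):
    """P(sup_{0<=t<=b} |W(t)| > y) via the alternating series
    4 * sum_{j>=1} (-1)^(j+1) P(W(b) > (2j-1) y)."""
    std_tail = lambda x: 0.5 * erfc(x / sqrt(2))   # P(Z > x) for standard normal Z
    return 4 * sum((-1) ** (j + 1) * std_tail((2 * j - 1) * y / sqrt(b))
                   for j in range(1, terms + 1))
```

For b = 1 and y = 1.0 or y = 2.0, 1 minus this value matches the table entries 0.370777 and 0.908999.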
Absolute supremum: table used in practice
▶ Below, there is a table of the cumulative distribution function of
T = sup_{0≤t≤1} |W (t)|.

x     P(T ≤ x)     x     P(T ≤ x)     x     P(T ≤ x)
0.1   0.000000     1.6   0.780806     3.1   0.996130
0.2   0.000000     1.7   0.821739     3.2   0.997251
0.3   0.000001     1.8   0.856279     3.3   0.998066
0.4   0.000570     1.9   0.885134     3.4   0.998652
0.5   0.009157     2.0   0.908999     3.5   0.999069
0.6   0.041362     2.1   0.928542     3.6   0.999364
0.7   0.102674     2.2   0.944386     3.7   0.999569
0.8   0.185242     2.3   0.957104     3.8   0.999711
0.9   0.277614     2.4   0.967210     3.9   0.999808
1.0   0.370777     2.5   0.975161     4.0   0.999873
1.1   0.459269     2.6   0.981355     4.1   0.999917
1.2   0.540358     2.7   0.986132     4.2   0.999947
1.3   0.612990     2.8   0.989779     4.3   0.999966
1.4   0.677027     2.9   0.992537     4.4   0.999978
1.5   0.732785     3.0   0.994600     4.5   0.999986
Boundary crossing in application: Butler test
Boundary crossing: application
▶ Suppose we have a random sample Y1 , Y2 , . . . , Yn drawn from an
unknown distribution, with the aim of testing whether this
distribution is symmetric around zero.
▶ In Butler (1969), a statistic is proposed to test the null hypothesis of
symmetry around zero.
▶ Rearrange the sample so as to satisfy |Y(1) | ≤ |Y(2) | ≤ . . . ≤ |Y(n) |.
▶ Define random variables R1 , R2 , . . . , Rn by
Ri = 1 if Y(i) > 0, and Ri = −1 if Y(i) < 0.
▶ Butler’s test statistic
Tn = sup_{0≤t≤1} | (1/√n) Σ_{i=1}^{[nt]} Ri |
is the absolute supremum of the partial sum process.
Constant boundaries: application
▶ Under the null hypothesis that Y is following a symmetric
distribution, sign and magnitude are independent
▶ Ri ’s are independent Rademacher variables
▶ (1/√n) Σ_{i=1}^{[nt]} Ri converges to a Brownian motion
▶ Tn converges in distribution to the absolute supremum of the Brownian
motion on the unit interval.
▶ Under the null hypothesis, for large sample size n, we may
approximate the distribution of Tn by the distribution of the absolute
supremum of the Brownian motion on the unit interval.
▶ Selected critical values
x          P(T ≤ x)
1.959964   0.900000
2.241403   0.950000
2.497705   0.975000
2.807034   0.990000
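Computing Tn from data is a one-pass scan once the observations are ordered by absolute value. A sketch (the helper name and the tiny example sample are illustrative; ties at zero are ignored, as in the slide's definition):

```python
import math

def butler_statistic(sample):
    """Butler's T_n: max_k |S_k| / sqrt(n), where S_k sums the signs
    of the observations ordered by increasing absolute value."""
    signs = [1 if y > 0 else -1 for y in sorted(sample, key=abs)]
    s, peak = 0, 0
    for r in signs:
        s += r
        peak = max(peak, abs(s))
    return peak / math.sqrt(len(sample))
```

The statistic is then compared against the critical values above; e.g. reject symmetry at the 5% level when Tn exceeds 2.241403.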
Constant boundaries: application
▶ Graphical representation of Butler’s test, n = 100, null hypothesis.
Constant boundaries: application
▶ Graphical representation of Butler’s test, n = 100, alternative.
Linear boundary crossing
Linear boundaries: weighted supremum
▶ Next consider the linear boundary ψ(t) = 1 + at with a > 0 and the
unlimited interval b = +∞
▶ Note that the event {sup_{t≥0} W (t)/(1 + at) ≤ y } coincides with the event
{W (t) ≤ y (1 + at) for every t > 0}
▶ Deriving this result is complicated, though the reflection principle still
works (Corollary 10.1 with t → ∞); not required for the exam
▶ The result is known from Doob (1949)
P( sup_{t≥0} W (t)/(1 + at) > y ) = exp{−2ay²}
Brownian motion with drift
▶ A Brownian motion with drift coefficient µ is a stochastic process
{X (t), t ≥ 0} which
▶ starts at zero: X (0) = 0;
▶ for fixed t, X (t) has a normal distribution with mean µt and variance t;
▶ has independent and stationary increments.
▶ If {W (t), t ≥ 0} is a Brownian motion, then the stochastic process
defined by X (t) = W (t) + µt is a Brownian motion with drift
coefficient µ.
▶ What is the probability that a Brownian motion with drift µ < 0,
{X (t), t ≥ 0} ever exceeds a constant bound y > 0?
Constant boundary crossing for Brownian motion with drift
▶ Calculate P(sup_{t≥0} X (t) > y )
▶ Write
P(sup_{t≥0} X (t) > y ) = P(X (t∗ ) > y for some t∗ > 0)
= P(W (t∗ ) + µt∗ > y for some t∗ > 0)
= P(W (t∗ ) > y − µt∗ for some t∗ > 0)
= P(W (t∗ ) > y (1 − (µ/y )t∗ ) for some t∗ > 0)
= P( W (t∗ )/(1 − (µ/y )t∗ ) > y for some t∗ > 0)
= P( sup_{t≥0} W (t)/(1 − (µ/y )t) > y ) = exp{2µy }.
▶ Convert “constant boundary crossing for Brownian motion with drift”
to “linear boundary crossing for Brownian motion”
Plan for Lecture 13
Video before lecture
▶ Definition of empirical process (Slides 2-9)
Lecture
1. Uniform empirical process and Brownian bridge (NOT in the book)
2. Brownian bridge and asymptotic statistics (NOT in the book)
Chen Zhou
Erasmus University Rotterdam
November 12, 2022
1 / 27
Empirical process
Statistics: an overview
▶ In statistics, we often model a random variable X by a parametric
model
▶ Known density format f (x; θ)
▶ Unknown parameter θ (can be more than univariate)
▶ We obtain independent and identically distributed sample of X :
X1 , . . . , Xn
▶ Sample size n
▶ Xi are regarded as from the distribution f (x; θ)
▶ The goal: estimate θ using the observations
▶ How to construct an estimator θ̂ = g (X1 , . . . , Xn )?
Mathematical statistics: statistics with asymptotic theory
▶ Construction of an estimator
▶ Method of moment: matching the theoretical moments (a function of
θ) with empirical moments
▶ Maximum likelihood: maximizing the (logarithm of) likelihood
▶ Theoretical properties (asymptotic theory)
▶ Consistency: as n → ∞, do we have θ̂ →^P θ?
▶ Asymptotic normality: as n → ∞,
sn (θ̂ − θ) →^d (often normal) distribution
▶ It is often difficult to prove asymptotic theory.
▶ Is there a “silver bullet”? A tool that can be used to produce
asymptotic theories automatically: Empirical process
Estimating parameters by estimating the CDF
▶ Parameters can often be written as a “functional” of the CDF
▶ “functional” refers to some treatments: taking value at certain point,
taking integral, etc.
▶ Example: for the exponential distribution F (x; λ) = 1 − e^{−λx}
λ = −(1/2) log(1 − F (2))
λ = −(1/2) log( (1 − F (4)) / (1 − F (2)) )
λ = 1 / ∫_0^∞ (1 − F (x)) dx
▶ If we can estimate the entire CDF F (x), we can “plug in” the
estimator(s) of F (x) to get an estimator θ̂
Estimating the CDF: Empirical distribution function
▶ Recall the definition of the CDF
F (x) = P(X ≤ x)
▶ A natural estimator using observations X1 , . . . , Xn is then
Fn (x) = (1/n) Σ_{i=1}^n I{Xi ≤x} .
▶ This is called the empirical distribution function
▶ It is “empirical”: based on observations, i.e. an estimator
▶ It is a “distribution function”: a discrete distribution assigning
probability 1/n to each observation.
▶ It is a stochastic process: defined for all x ∈ R
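The empirical distribution function is a step function, so it can be implemented with a sorted copy of the data and a binary search. A minimal sketch:

```python
import bisect

def edf(sample):
    """Empirical distribution function F_n(x) = #{X_i <= x} / n,
    returned as a callable step function."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n

Fn = edf([3.0, 1.0, 2.0])
```

Each observation indeed carries mass 1/n: Fn jumps by 1/3 at 1, 2, and 3.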
Why the empirical distribution function is a “silver bullet”
▶ The EDF helps to construct estimators
▶ Recall that in the example of the exponential distribution
λ = −(1/2) log( (1 − F (4)) / (1 − F (2)) )
▶ An estimator for λ can be constructed as
λ̂ = −(1/2) log( (1 − Fn (4)) / (1 − Fn (2)) )
where Fn is the empirical distribution function
▶ The EDF helps to prove asymptotic theory
▶ With the joint asymptotic behavior for Fn (x) over all x ∈ R, we can
derive the asymptotic theory for an estimator using the Delta method.
▶ The key issue is “joint”!
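The plug-in construction can be tried on simulated exponential data. A sketch (seed, sample size, and the true rate λ = 1 are arbitrary choices for the experiment):

```python
import math
import random

def lambda_hat(sample):
    """Plug-in estimator lambda_hat = -(1/2) log((1 - F_n(4)) / (1 - F_n(2)))."""
    n = len(sample)
    Fn = lambda x: sum(v <= x for v in sample) / n
    return -0.5 * math.log((1 - Fn(4)) / (1 - Fn(2)))

rng = random.Random(7)
data = [rng.expovariate(1.0) for _ in range(100000)]
est = lambda_hat(data)   # true lambda is 1 in this simulation
```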
Asymptotic theory for the EDF: empirical process
Fn (x) = (1/n) Σ_{i=1}^n I{Xi ≤x} .
▶ We derive the asymptotics for Fn (x) at one fixed x0 .
▶ Fn (x0 ) is the average of i.i.d. Bernoulli random variables I{Xi ≤x0 }
E(I{Xi ≤x0 } ) = P(Xi ≤ x0 ) = F (x0 )
V(I{Xi ≤x0 } ) = P(Xi ≤ x0 )(1 − P(Xi ≤ x0 )) = F (x0 )(1 − F (x0 ))
▶ From the Central Limit Theorem, as n → ∞,
√n(Fn (x0 ) − F (x0 )) →^d N(0, F (x0 )(1 − F (x0 )))
▶ This is only at one point x0 , what about “joint”?
Limit of the empirical process
▶ The stochastic process
√n(Fn (x) − F (x)), for x ∈ R,
is called the empirical process
▶ For statistics we need joint asymptotics at many locations for the
process
▶ For simplicity, we first study the empirical process for U[0, 1]
Uniform empirical process
▶ Let U1, . . . , Un be i.i.d. random variables from U[0, 1]. The process
Bn(u) = n^{1/2} { (1/n) Σ_{i=1}^n I{Ui ≤ u} − u },   for 0 ≤ u ≤ 1,
is called the uniform empirical process.
▶ The general empirical process can be linked to the uniform empirical
process
▶ If the random variable X has continuous CDF F (x), then
F (X ) ∼ U[0, 1].
▶ Moreover, if U ∼ U[0, 1], then F ← (U) has CDF F (x).
▶ The quantile function F←(u) is defined by
F←(u) = inf{y : F(y) ≥ u}   for 0 ≤ u ≤ 1.
Linking general and the uniform empirical processes
▶ If F is continuous, then a general empirical process is
√n (Fn(x) − F(x)) = √n { (1/n) Σ_{i=1}^n I{Xi ≤ x} − F(x) }
= √n { (1/n) Σ_{i=1}^n I{F(Xi) ≤ F(x)} − F(x) }
=d √n { (1/n) Σ_{i=1}^n I{Ui ≤ F(x)} − F(x) } = Bn(F(x)),
where
Bn(u) = n^{1/2} { (1/n) Σ_{i=1}^n I{Ui ≤ u} − u },   for 0 ≤ u ≤ 1.
▶ We only need to study the limit of the uniform empirical process.
Asymptotic theory for the EDF: bivariate
Bn(u) = n^{1/2} { (1/n) Σ_{i=1}^n I{Ui ≤ u} − u },   for 0 ≤ u ≤ 1.
▶ We derive the asymptotics for Bn(u) at two points s and t
▶ From the univariate result, we get
Bn(s) →d N(0, s(1 − s)),   Bn(t) →d N(0, t(1 − t))
▶ What about their covariance?
Cov(Bn(s), Bn(t)) = n · Cov( (1/n) Σ_{i=1}^n I{Ui ≤ s}, (1/n) Σ_{j=1}^n I{Uj ≤ t} )
= (1/n) Σ_{i=1}^n Σ_{j=1}^n Cov(I{Ui ≤ s}, I{Uj ≤ t}) = (1/n) Σ_{i=1}^n Cov(I{Ui ≤ s}, I{Ui ≤ t})
= E(I{Ui ≤ s} · I{Ui ≤ t}) − (E I{Ui ≤ s}) · (E I{Ui ≤ t}) = min(s, t) − st
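The covariance formula min(s, t) − st can be checked by Monte Carlo. This is an illustrative sketch (grid points, sample sizes, and seed are arbitrary): it simulates many replications of Bn(s) and Bn(t) and compares their empirical covariance to the formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 400, 10_000
s, t = 0.3, 0.8

# Each row is one uniform sample; each replication yields one (Bn(s), Bn(t))
u = rng.random((reps, n))
Bn_s = np.sqrt(n) * ((u <= s).mean(axis=1) - s)
Bn_t = np.sqrt(n) * ((u <= t).mean(axis=1) - t)

emp_cov = np.cov(Bn_s, Bn_t)[0, 1]
print(emp_cov, min(s, t) - s * t)  # both close to 0.06
```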
The limit of the uniform empirical process
▶ For the uniform empirical process, at any two points s and t (not very
rigorously)
(Bn(s), Bn(t)) →d N( (0, 0)ᵀ, [ s(1 − s), min(s, t) − st; min(s, t) − st, t(1 − t) ] )
▶ What about the whole process Bn (u)?
▶ As illustrated by the plots that follow, the uniform empirical process
{Bn (u), 0 ≤ u ≤ 1} tends to a limiting stochastic process
{B(u), 0 ≤ u ≤ 1} with continuous sample paths as the sample size n
tends to infinity.
▶ The limiting process {B(u), 0 ≤ u ≤ 1} is called Brownian bridge.
Brownian bridge: existence
▶ Plot of the uniform empirical process Bn (u) for n = 20.
Brownian bridge: existence
▶ Plot of the uniform empirical process Bn (u) for n = 100.
Brownian bridge: existence
▶ Plot of the uniform empirical process Bn (u) for n = 500.
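A path like the ones plotted can be generated in a few lines (an illustrative sketch; the sample size and grid are arbitrary). Note that the path is tied down at both ends: Bn(0) = 0 and Bn(1) = 0 by construction, matching the bridge behaviour of the limit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
u_sample = np.sort(rng.random(n))
grid = np.linspace(0, 1, 1001)

# Bn(u) = sqrt(n) * (Fn(u) - u), with Fn the EDF of the uniform sample
Fn = np.searchsorted(u_sample, grid, side="right") / n
Bn = np.sqrt(n) * (Fn - grid)
print(Bn[0], Bn[-1])  # both 0: the process is tied down at the endpoints
```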
Brownian bridge: characterization
▶ The uniform empirical process {Bn (u), 0 ≤ u ≤ 1}
▶ At each fixed u1, Bn(u1) converges to N(0, u1(1 − u1))
▶ At two fixed locations u1 and u2, (Bn(u1), Bn(u2)) converges to a bivariate
normal distribution with covariance min(u1, u2) − u1 u2
▶ Consequently, for the Brownian bridge {B(u), 0 ≤ u ≤ 1}
▶ For every 0 ≤ u1 ≤ 1, B(u1 ) is a normal random variable with mean
zero and variance u1 (1 − u1 ).
▶ For every 0 ≤ u1 , u2 ≤ 1, (B(u1 ), B(u2 )) is a bivariate random vector
with
Cov(B(u1 ), B(u2 )) = min(u1 , u2 ) − u1 u2
▶ These properties characterize the Brownian bridge.
Brownian bridge and asymptotic statistics
Limit of a general empirical process
▶ Recall that for a general empirical process, we have that
√n (Fn(x) − F(x)) =d Bn(F(x))
▶ As the sample size n → ∞, the general empirical process
{√n (Fn(x) − F(x)), x ∈ R} converges to a limiting process
{B(F(x)), x ∈ R}
▶ Here {B(u), 0 ≤ u ≤ 1} is a Brownian bridge
▶ F (x) is “transforming” the indexation of the Brownian bridge.
Limit of a general empirical process: rigorous formulation
▶ A more rigorous way to formulate the convergence result is as follows.
There “exists” a proper Brownian bridge {B(u), 0 ≤ u ≤ 1} such that
sup_{x∈R} |√n (Fn(x) − F(x)) − B(F(x))| →P 0
▶ The convergence is uniform over all x ∈ R
▶ This statement, together with the Delta method, is the “silver bullet”
for proving asymptotic theories.
Example: estimating λ for the exponential distribution (1)
▶ The first estimator
λ̂1 = −(1/2) log(1 − Fn(2))
▶ This estimator involves only one location of the EDF: Fn(2)
▶ We have, as n → ∞,
√n (Fn(2) − F(2)) →d N(0, F(2)(1 − F(2)))
▶ For the exponential distribution F(2) = 1 − e^{−2λ}
▶ Write λ̂1 = g(Fn(2)), with g(y) = −(1/2) log(1 − y)
▶ g′(y) = (1/2) · 1/(1 − y)
▶ g(F(2)) = λ and g′(F(2)) = (1/2) e^{2λ}
▶ From the Delta method, we get, as n → ∞,
√n (λ̂1 − λ) →d (1/2) e^{2λ} · N(0, e^{−2λ}(1 − e^{−2λ})) = N(0, (1/4)(e^{2λ} − 1))
Example: estimating λ for the exponential distribution (2)
▶ The second estimator
λ̂2 = −(1/2) log[(1 − Fn(4))/(1 − Fn(2))]
▶ This estimator involves two locations of the EDF: Fn(2) and Fn(4)
▶ We have, jointly, as n → ∞,
√n (Fn(2) − F(2), Fn(4) − F(4)) →d (B(F(2)), B(F(4))),
where (L1, L2) := (B(F(2)), B(F(4))) follows a bivariate normal
distribution with mean zero and covariance matrix
[ F(2)(1 − F(2))     F(2) − F(2)F(4) ]
[ F(2) − F(2)F(4)    F(4)(1 − F(4))  ]
Example: estimating λ for the exponential distribution (2)
▶ Write λ̂2 = g(Fn(2), Fn(4)) with
g(x, y) = −(1/2) log[(1 − y)/(1 − x)]
▶ Apply the Delta method to the function g
▶ At the limit g(F(2), F(4)) = λ (as it should be)
▶ Partial derivatives: g1(x, y) = ∂g/∂x = −(1/2) · 1/(1 − x), g2(x, y) = ∂g/∂y = (1/2) · 1/(1 − y)
▶ At the limit: g1(F(2), F(4)) = −(1/2) e^{2λ}, g2(F(2), F(4)) = (1/2) e^{4λ}
▶ From the Delta method, we get, as n → ∞,
√n (λ̂2 − λ) →d −(1/2) e^{2λ} L1 + (1/2) e^{4λ} L2
Example: estimating λ for the exponential distribution (3)
▶ Calculate the variance of the limit
Var(−(1/2) e^{2λ} L1 + (1/2) e^{4λ} L2)
= (1/4) e^{4λ} e^{−2λ}(1 − e^{−2λ}) + (1/4) e^{8λ} e^{−4λ}(1 − e^{−4λ}) − (1/2) e^{6λ} e^{−4λ}(1 − e^{−2λ})
= (1/4) e^{2λ}(e^{2λ} − 1)
Example: estimating λ for the exponential distribution (3)
▶ The third estimator
λ̂3 = 1 / ∫_0^∞ (1 − Fn(x)) dx
▶ This estimator involves the entire EDF!
▶ We really need the “silver bullet”: as n → ∞,
sup_{x∈R} |√n (Fn(x) − F(x)) − B(F(x))| →P 0
▶ It means √n ((1 − Fn(x)) − (1 − F(x))) converges to −B(F(x))
▶ By taking the integral on both sides (not very rigorously)
√n ( ∫_0^∞ (1 − Fn(x)) dx − ∫_0^∞ (1 − F(x)) dx ) →d ∫_0^∞ −B(F(x)) dx
Example: estimating λ for the exponential distribution (3)
▶ Note that ∫_0^∞ (1 − F(x)) dx = 1/λ
▶ We apply the Delta method for a simple function g(y) = 1/y
▶ At the limit g(∫_0^∞ (1 − F(x)) dx) = λ (as it should be)
▶ Clearly g′(y) = −1/y² and g′(∫_0^∞ (1 − F(x)) dx) = −λ²
▶ From the Delta method, we get, as n → ∞,
√n (λ̂3 − λ) →d −λ² ∫_0^∞ −B(F(x)) dx = λ² ∫_0^∞ B(F(x)) dx
▶ The integral is essentially a “sum”, so ∫_0^∞ B(F(x)) dx is still a normally
distributed random variable with mean zero
▶ We need to calculate its variance (complicated but doable)
Example: estimating λ for the exponential distribution (3)
λ̂3 = 1 / ∫_0^∞ (1 − Fn(x)) dx
▶ What does ∫_0^∞ (1 − Fn(x)) dx truly mean?
∫_0^∞ (1 − Fn(x)) dx = ∫_0^∞ { 1 − (1/n) Σ_{i=1}^n I{Xi ≤ x} } dx
= ∫_0^∞ (1/n) Σ_{i=1}^n I{Xi > x} dx = (1/n) Σ_{i=1}^n ∫_0^∞ I{Xi > x} dx
= (1/n) Σ_{i=1}^n ∫_0^{Xi} dx = (1/n) Σ_{i=1}^n Xi
▶ Therefore λ̂3 = 1/X̄, which is the moment estimator.
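The identity ∫_0^∞ (1 − Fn(x)) dx = X̄ can be verified directly: between consecutive order statistics, 1 − Fn is constant, so the integral is a finite sum. An illustrative sketch (sample size, scale, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000)  # true lambda = 0.5

# Exact integral of 1 - Fn: value 1 on [0, x_(1)], then (1 - i/n) between
# order statistics x_(i) and x_(i+1)
xs = np.sort(x)
n = len(xs)
integral = xs[0] + np.sum((1 - np.arange(1, n) / n) * np.diff(xs))
lam3 = 1 / integral
print(lam3, 1 / x.mean())  # identical: lam_hat_3 = 1 / sample mean
```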
Plan for Lecture 14
Video before lecture
▶ Gaussian processes (Slides 2-5)
Lecture
1. Brownian motion and Brownian bridge (Chap 10.7 and slides)
2. Kolmogorov-Smirnov test (NOT in the book)
3. Other continuous-time continuous-state processes (Example 10.6 and
Chap 10.3.2)
Gaussian processes
Gaussian processes
▶ One way of describing a stochastic process {X (t), t ∈ T } is to give
all its finite-dimensional distributions.
▶ That is, the joint distribution of X (t1 ), X (t2 ), . . . , X (tm ) for any
t1 , t2 , . . . , tm and m.
▶ If every finite-dimensional distribution of {X (t), t ∈ T } is multivariate
normal, then we say that {X (t), t ∈ T } is a Gaussian process.
▶ Recall that a multivariate normal distribution is completely described
by its mean vector and its covariance matrix.
▶ Similarly, a Gaussian process {X(t), t ∈ T} is completely described by
its mean function E(X(t)) and its covariance function
cov(X(t1), X(t2)).
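Because a Gaussian process is pinned down by its mean and covariance functions, it can be sampled on a finite grid with a single multivariate-normal draw. An illustrative sketch (the grid and seed are arbitrary), using the Brownian bridge kernel min(t1, t2) − t1·t2 that appears later in these slides:

```python
import numpy as np

# Grid strictly inside (0, 1) so the covariance matrix is positive definite
t = np.linspace(0.01, 0.99, 99)
cov = np.minimum.outer(t, t) - np.outer(t, t)   # Brownian bridge kernel

rng = np.random.default_rng(5)
path = rng.multivariate_normal(mean=np.zeros(len(t)), cov=cov)
print(path.shape)  # one sampled path of the process on the grid
```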
Finite-dimensional distributions
▶ In particular, the mean and covariance functions allow us to
determine any finite-dimensional distribution of a Gaussian process.
▶ If {X(t), t ∈ T} is a Gaussian process, then the joint distribution of
X(t1), X(t2), . . . , X(tm) is multivariate normal, with mean vector
( E(X(t1)), E(X(t2)), . . . , E(X(tm)) )ᵀ
and covariance matrix with (i, j) entry cov(X(ti), X(tj)):
[ cov(X(t1), X(t1))  cov(X(t1), X(t2))  . . .  cov(X(t1), X(tm)) ]
[ cov(X(t1), X(t2))  cov(X(t2), X(t2))  . . .  cov(X(t2), X(tm)) ]
[       . . .              . . .        . . .        . . .       ]
[ cov(X(t1), X(tm))  cov(X(t2), X(tm))  . . .  cov(X(tm), X(tm)) ]
Brownian motion and Brownian bridge as Gaussian
processes
▶ A Brownian motion {W (t), t ≥ 0} is a Gaussian process with zero
mean function and covariance function
cov(W (t1 ), W (t2 )) = min(t1 , t2 ).
▶ A Brownian bridge {B(t), 0 ≤ t ≤ 1} is a Gaussian process with zero
mean function and covariance function
cov(B(t1 ), B(t2 )) = min(t1 , t2 ) − t1 t2 .
▶ We study various ways to go from one to the other
Brownian motion and Brownian bridge
From Brownian motion to Brownian bridge
▶ Let {W (t), 0 ≤ t ≤ 1} be a Brownian motion
▶ The process {X (t), 0 ≤ t ≤ 1} defined by X (t) = W (t) − tW (1) for
0 ≤ t ≤ 1 is a Brownian bridge on the unit interval.
▶ Proof:
▶ {X (t), 0 ≤ t ≤ 1} is a Gaussian process (why?) with mean function
E(X (t1 )) = E(W (t1 )) − t1 E(W (1)) = 0,
and covariance function
cov(X (t1 ), X (t2 ))
= cov(W (t1 ) − t1 W (1), W (t2 ) − t2 W (1))
= cov(W (t1 ), W (t2 )) − t2 cov(W (t1 ), W (1))
− t1 cov(W (1), W (t2 )) + t1 t2 cov(W (1), W (1))
= min(t1 , t2 ) − t2 t1 − t1 t2 + t1 t2
= min(t1 , t2 ) − t1 t2 .
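The transformation X(t) = W(t) − tW(1) can be seen at work on a discretized Brownian path (an illustrative sketch; the grid size and seed are arbitrary). Whatever the path does, X is forced back to 0 at t = 1:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 1_000                                     # grid points on [0, 1]
t = np.linspace(0, 1, m + 1)
dW = rng.normal(0, np.sqrt(1 / m), size=m)    # independent increments
W = np.concatenate([[0.0], np.cumsum(dW)])    # discretized Brownian motion

X = W - t * W[-1]                             # bridge transform
print(X[0], X[-1])  # both exactly 0: the path is tied down at both ends
```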
From Brownian motion to Brownian bridge: visualization
▶ Plot of W (t) and tW (1) versus t for 0 ≤ t ≤ 1.
From Brownian motion to Brownian bridge: visualization
▶ Plot of W (t) − tW (1) versus t for 0 ≤ t ≤ 1.
Alternative way from Brownian motion to Brownian bridge
▶ By conditioning on the event {W (1) = 0}, the Brownian motion
{W (t), t ≥ 0} becomes a Brownian bridge on the unit interval.
▶ Proof:
▶ First note that
cov(W (t) − tW (1), W (1))
= cov(W (t), W (1)) − t · cov(W (1), W (1))
= t − t = 0,
so W (t) − tW (1) and W (1) are independent (why?).
▶ Thus, conditioning on W (1) does not alter the distribution of
W (t) − tW (1): it remains Brownian bridge on the unit interval.
▶ However, conditional on the event {W (1) = 0}, W (t) − tW (1) and
W (t) coincide.
From Brownian bridge to Brownian motion
▶ Let Z be a standard normal random variable, independent of the
Brownian bridge {B(t), 0 ≤ t ≤ 1}.
▶ Then, the process {X(t), 0 ≤ t ≤ 1} defined by X(t) = B(t) + tZ for
0 ≤ t ≤ 1 is a Brownian motion on the unit interval.
▶ Proof:
▶ {X(t), 0 ≤ t ≤ 1} is a Gaussian process (why?) with mean function
E(X(t1)) = E(B(t1)) + t1 E(Z) = 0,
and covariance function
cov(X(t1), X(t2)) = cov(B(t1) + t1 Z, B(t2) + t2 Z)
= cov(B(t1), B(t2)) + t2 cov(B(t1), Z)
+ t1 cov(Z, B(t2)) + t1 t2 cov(Z, Z)
= min(t1, t2) − t1 t2 + 0 + 0 + t1 t2 = min(t1, t2).
Summary of the relations
▶ They are both Gaussian processes
▶ Any finite dimensional distribution follows a multivariate normal
distribution
▶ The mean function is zero
▶ The covariance function describes the whole process
▶ They both have continuous sample path
▶ Different covariance functions
▶ For Brownian motion cov(W (t1 ), W (t2 )) = min(t1 , t2 )
▶ For Brownian bridge cov(B(t1 ), B(t2 )) = min(t1 , t2 ) − t1 t2
▶ Note V(B(1)) = cov(B(1), B(1)) = 0 ⇒ B(1) = 0 (Bridge!)
▶ From BM to BB: force the process to end at 0
▶ B(t) =d W(t) − tW(1)
▶ B(t) =d W(t) | W(1) = 0
▶ From BB to BM: “compensate” the randomness at the end
▶ Let Z ∼ N(0, 1) be independent of {B(t)}
▶ W(t) =d B(t) + tZ
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test: testing statistic
▶ Suppose we have a random sample Y1 , Y2 , . . . , Yn drawn from an
unknown distribution, with the aim of testing the null hypothesis that
the unknown distribution has some given CDF F0 (y ).
▶ The Kolmogorov statistic
Kn = √n sup_{y∈R} |Fn(y) − F0(y)|
may be used to test this null hypothesis.
▶ Idea: Under the null, the EDF should not be too far off the true CDF
▶ The supremum considers the deviation from the CDF at all places
▶ Under the null, the statistic should not be too high
▶ If we obtain a high value, the null should be rejected
▶ The √n term is useful for deriving asymptotic theory
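Because Fn only jumps at the observations, the supremum can be computed exactly by comparing F0 with the left and right limits of Fn at each order statistic. An illustrative sketch (the example data and null CDF are arbitrary):

```python
import math

def ks_statistic(sample, F0):
    """Kn = sqrt(n) * sup_y |Fn(y) - F0(y)|, evaluated at the jump points."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Fn jumps at x from (i-1)/n to i/n; check both sides against F0(x)
        d = max(d, abs(i / n - F0(x)), abs((i - 1) / n - F0(x)))
    return math.sqrt(n) * d

# Testing H0: data ~ U[0, 1], with an equally spaced "sample"
data = [(2 * i + 1) / 20 for i in range(10)]
F0 = lambda y: min(max(y, 0.0), 1.0)
print(ks_statistic(data, F0))
```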
KS test: asymptotic result
▶ Under the null, Y1 , Y2 , . . . , Yn are i.i.d. following the CDF F0 (y )
▶ As n → ∞, the process {√n (Fn(y) − F0(y)), y ∈ R} converges to the
process {B(F0(y)), y ∈ R}, where B is a Brownian bridge
▶ Therefore, by taking the absolute value and then the supremum, we get
√n sup_{y∈R} |Fn(y) − F0(y)| →d sup_{y∈R} |B(F0(y))|
▶ Does the limit depend on F0?
▶ No! As y ranges over R, F0(y) takes all values in [0, 1], so
sup_{y∈R} |B(F0(y))| = sup_{u∈[0,1]} |B(u)|
▶ Hence, as n → ∞,
Kn →d sup_{u∈[0,1]} |B(u)|
KS test: the limit distribution
Kn →d sup_{u∈[0,1]} |B(u)|
▶ The limit does not depend on F0(y)
▶ We can use the test for testing any F0
▶ The limit is a well-defined (positive) univariate random variable
▶ Can we obtain its CDF, PDF? Not explicitly
▶ The limit distribution is related to the supremum of the absolute
value of a Brownian bridge:
P( sup_{u∈[0,1]} |B(u)| > y ),
the boundary crossing probability of a Brownian bridge from both sides
▶ Let us start with a simple boundary crossing probability (one side)
The supremum of a Brownian bridge
▶ Consider the random variable sup0≤u≤1 B(u), the supremum of the
Brownian bridge over the interval [0, 1].
▶ For y > 0,
P( sup_{0≤u≤1} B(u) > y ) = exp{−2y²}.
▶ This is the probability that a Brownian bridge exceeds the level y
somewhere in the interval [0, 1].
The supremum of a Brownian bridge: proof
▶ We use the following relation between the Brownian motion
{W (t), t ≥ 0} and the Brownian bridge.
▶ The stochastic process {X(u), 0 ≤ u ≤ 1} defined by
X(u) = (1 − u) W(u/(1 − u))   for 0 ≤ u < 1,
and X(1) = 0 is a Brownian bridge.
▶ This will be left as a homework exercise.
▶ Let t denote u/(1 − u).
▶ If 0 ≤ u < 1, then 0 ≤ t < ∞.
▶ u is equal to t/(1 + t), and thus 1 − u is equal to 1/(1 + t).
The supremum of a Brownian bridge: proof
▶ We may write
P( sup_{0≤u≤1} B(u) > y )
= P( sup_{0≤u<1} (1 − u) W(u/(1 − u)) > y )
= P( sup_{t≥0} W(t)/(1 + t) > y )
= exp{−2y²}.
▶ In the last equation, we use the boundary crossing probability of the
Brownian motion W(t) over the linear boundary y(1 + t). (Doob!)
The absolute supremum of a Brownian bridge
▶ Consider the random variable sup0≤u≤1 |B(u)|, the absolute
supremum of the Brownian bridge over the interval [0, 1].
▶ For y > 0,
P( sup_{0≤u≤1} |B(u)| > y ) = 2 Σ_{j=1}^∞ (−1)^{j+1} exp{−2j²y²}.
▶ This is the probability that the absolute value of a Brownian bridge
exceeds the level y somewhere in the interval [0, 1].
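The series converges very fast, so a few terms suffice to evaluate the limiting CDF of Kn numerically. An illustrative sketch (the truncation level is an arbitrary choice); the values it prints line up closely with the table of critical values on the next slide:

```python
import math

def bridge_abs_sup_tail(y, terms=100):
    """P(sup_{0<=u<=1} |B(u)| > y) = 2 * sum_{j>=1} (-1)^{j+1} exp(-2 j^2 y^2)."""
    return 2 * sum((-1) ** (j + 1) * math.exp(-2 * j * j * y * y)
                   for j in range(1, terms + 1))

for y in (1.22, 1.36, 1.52, 1.63):
    print(y, 1 - bridge_abs_sup_tail(y))  # approximate P(T <= y)
```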
The absolute supremum of a Brownian bridge: proof
▶ Similarly as before,
P( sup_{0≤u≤1} |B(u)| > y )
= P( sup_{0≤u<1} (1 − u) |W(u/(1 − u))| > y )
= P( sup_{t≥0} |W(t)|/(1 + t) > y )
= 2 Σ_{j=1}^∞ (−1)^{j+1} exp{−2j²y²}.
▶ In the last equation, we use the probability that the absolute value of
a Brownian motion W(t) crosses the linear boundary y(1 + t).
▶ Such a series is hard to evaluate by hand; for practical use of the KS
test, we need a table of critical values
The KS test: critical values
▶ Selected critical values of the absolute supremum of the Brownian
bridge on the unit interval are given below.
x      P(T ≤ x)
1.22   0.900
1.36   0.950
1.52   0.975
1.63   0.990
Brownian bridge: application
▶ Now, suppose we have drawn the sample.
▶ One way to perform the Kolmogorov-Smirnov test is to draw Fn (y )
first.
▶ Determine a maximum allowed distance by multiplying your favorite
critical value kα by 1/√n.
▶ Draw the two lines Fn(y) ± kα/√n in red.
▶ These lines are called the confidence bands for the CDF of the
unknown distribution from which we sampled Y1 , Y2 , . . . , Yn .
▶ If F0 (y ) falls completely between the red lines, do not reject the null
hypothesis.
▶ If F0 (y ) exceeds one of the red lines, reject the null hypothesis.
Brownian bridge: application
▶ The empirical distribution function Fn (y ) (blue), confidence bands for
the unknown CDF (red), and a hypothetical CDF F0 (y ) (green). As
the green line exceeds one of the red lines, reject the null hypothesis.
Other continuous-time continuous-state processes
A Gaussian process: Ornstein-Uhlenbeck
▶ Let {W (t), t ≥ 0} be Brownian motion on the interval [0, ∞), and let
α ≥ 0.
▶ The stochastic process {X(t), t ≥ 0} defined by
X(t) = e^{−αt/2} W(e^{αt})
is called the Ornstein-Uhlenbeck process.
▶ The Ornstein-Uhlenbeck process {X (t), t ≥ 0} is a Gaussian process
with zero mean function and covariance function
cov(X (t1 ), X (t2 )) = exp{−α|t1 − t2 |/2}.
▶ Note that the covariance function of the Ornstein-Uhlenbeck process
only depends on t1 and t2 through |t1 − t2 |.
Difference between Ornstein-Uhlenbeck and Brownian
motion
▶ Although they are both Gaussian processes, they are quite different
▶ Stationarity
▶ Brownian motion is NOT a stationary process
▶ Ornstein-Uhlenbeck process is a stationary process
▶ Increments
▶ Brownian motion is a process with independent (and stationary)
increments
▶ The Ornstein-Uhlenbeck process does not have independent increments, but
it has stationary increments
Ornstein-Uhlenbeck process: visualization and applications
▶ Plot of the Ornstein-Uhlenbeck process.
▶ Applications
▶ Originally, the Ornstein-Uhlenbeck process was used to describe the
velocity of a particle in a fluid.
▶ Nowadays, the Ornstein-Uhlenbeck process is also applied to model
interest rate processes in finance.
A non-Gaussian process: geometric Brownian motion
▶ Let {W (t), t ≥ 0} be Brownian motion on the interval [0, ∞).
▶ The stochastic process {X(t), t ≥ 0} defined by
X(t) = e^{μt + σW(t)}
is called the geometric Brownian motion with drift coefficient μ and
variance parameter σ².
▶ Often used in finance to model the asset values
▶ If {X (t), t ≥ 0} is a geometric Brownian motion with drift coefficient
µ and variance parameter σ 2 , then {σ −1 ln X (t), t ≥ 0} is a Brownian
motion with drift coefficient µ/σ.
▶ Note that the geometric Brownian motion is not a Gaussian process!
▶ For fixed t, the random variable X(t) has a lognormal distribution with
parameters μt and σ²t.
Plan for Lecture 15
1. A difficult example: from BB to BM
2. An old exam question
Chen Zhou
Erasmus University Rotterdam
November 23, 2022
1 / 19
A difficult example: from BB to BM
A quick refresher on Brownian motion and Brownian
bridge
▶ They are both Gaussian processes
▶ Any finite dimensional distribution follows a multivariate normal
distribution
▶ The mean function is zero
▶ The covariance function describes the whole process
▶ They both have continuous sample path
▶ Different covariance functions
▶ For Brownian motion cov(W (t1 ), W (t2 )) = min(t1 , t2 )
▶ For Brownian bridge cov(B(t1 ), B(t2 )) = min(t1 , t2 ) − t1 t2
▶ Proving that a process is a BM or a BB often comes down to calculating
the covariance function.
Alternative way from Brownian bridge to Brownian motion
▶ Let {B(t), 0 ≤ t ≤ 1} be a Brownian bridge
▶ Show that the process {X(t), 0 ≤ t ≤ 1} defined by
X(t) = B(t) − ∫_0^t B(s)/s ds   for 0 ≤ t ≤ 1
is a Brownian motion on the unit interval.
▶ Key steps in proving such a result
▶ Show that {X (t), t ≥ 0} is a Gaussian process
▶ Show that the mean function is always zero
▶ Calculate the covariance function at two locations t1 and t2 , show that
it is min(t1 , t2 )
Proof
▶ {X(t), 0 ≤ t ≤ 1} is a Gaussian process with zero mean function
▶ Linear combinations of multivariate normally distributed random
variables still follow a multivariate normal distribution
▶ A linear operation on a Gaussian process still yields a Gaussian process
▶ X(t) = B(t) − ∫_0^t B(s)/s ds for 0 ≤ t ≤ 1 is a linear operation on the
Gaussian process {B(t), 0 ≤ t ≤ 1}
▶ We calculate its covariance function cov(X(t1), X(t2))
▶ Tip: start by assuming t1 ≤ t2; the other half follows by “symmetry”
Proof: continued
▶ By symmetry, assume t1 ≤ t2
cov(X(t1), X(t2))
= cov( B(t1) − ∫_0^{t1} B(s)/s ds, B(t2) − ∫_0^{t2} B(s)/s ds )
= cov(B(t1), B(t2)) − cov( B(t1), ∫_0^{t2} B(s)/s ds )
− cov( ∫_0^{t1} B(s)/s ds, B(t2) ) + cov( ∫_0^{t1} B(s)/s ds, ∫_0^{t2} B(s)/s ds ).
Proof: continued
▶ For the first term in this expression, we simply have
cov(B(t1), B(t2)) = min(t1, t2) − t1 t2 = t1 − t1 t2
▶ For the second term, we have
− cov( B(t1), ∫_0^{t2} B(s)/s ds )
= − ∫_0^{t2} E(B(t1)B(s))/s ds
= ∫_0^{t2} (t1 s)/s ds − ∫_0^{t2} min(t1, s)/s ds
= t1 t2 − ∫_0^{t1} (s/s) ds − ∫_{t1}^{t2} (t1/s) ds
= t1 t2 − t1 − t1 log(t2/t1)
Proof: continued
▶ For the third term, similarly to the second term (but simpler),
− cov( B(t2), ∫_0^{t1} B(s)/s ds )
= − ∫_0^{t1} E(B(t2)B(s))/s ds
= − ∫_0^{t1} (s − s t2)/s ds = −t1(1 − t2)
Proof: continued
▶ The fourth term is the most complicated:
cov( ∫_0^{t1} B(s)/s ds, ∫_0^{t2} B(s)/s ds )
= ∫_0^{t1} ∫_0^{t2} E(B(s1)B(s2)) / (s1 s2) ds2 ds1
= ∫_0^{t1} ∫_0^{t2} min(s1, s2)/(s1 s2) ds2 ds1 − ∫_0^{t1} ∫_0^{t2} (s1 s2)/(s1 s2) ds2 ds1
Proof: continued
▶ We calculate the first integral carefully
∫_0^{t1} ∫_0^{t2} min(s1, s2)/(s1 s2) ds2 ds1
= ∫_0^{t1} ∫_0^{s1} s2/(s1 s2) ds2 ds1 + ∫_0^{t1} ∫_{s1}^{t2} s1/(s1 s2) ds2 ds1
= ∫_0^{t1} 1 ds1 + ∫_0^{t1} { log t2 − log s1 } ds1
= t1 + t1 log t2 − (t1 log t1 − t1) = t1 log(t2/t1) + 2t1
▶ Hence the fourth term is
t1 log(t2/t1) + 2t1 − t1 t2
Proof: continued
▶ Summing the four terms yields
cov(X(t1), X(t2)) = t1 = min(t1, t2).
▶ The process {X(t), 0 ≤ t ≤ 1} is a Brownian motion.
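The four-term computation can be cross-checked numerically with a midpoint rule, using only the Brownian bridge kernel E[B(s)B(u)] = min(s, u) − su. An illustrative sketch (the evaluation points t1, t2 and grid size are arbitrary):

```python
import numpy as np

def r(s, u):
    """Brownian bridge covariance kernel E[B(s) B(u)]."""
    return np.minimum(s, u) - s * u

t1, t2, m = 0.3, 0.7, 1500
s1 = (np.arange(m) + 0.5) * (t1 / m)   # midpoint grid on (0, t1)
s2 = (np.arange(m) + 0.5) * (t2 / m)   # midpoint grid on (0, t2)
h1, h2 = t1 / m, t2 / m

# cov(X(t1), X(t2)) assembled from the four terms of the proof
cov = (r(t1, t2)
       - np.sum(r(t1, s2) / s2) * h2
       - np.sum(r(t2, s1) / s1) * h1
       + np.sum(r(s1[:, None], s2[None, :]) / (s1[:, None] * s2[None, :])) * h1 * h2)
print(cov)  # close to min(t1, t2) = 0.3
```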
An old exam question: Gaussian process
First half of the question: theory
▶ Let {W (t), t ≥ 0} be a Brownian motion
▶ Let y , b and h be positive real numbers.
▶ Provide an expression for
P( W(b) < y | sup_{0≤t≤b} W(t) ≥ y + h ),
in terms of the cumulative distribution function of a standard normal
random variable.
How to deal with such questions?
▶ Such questions are not long, but abstract
▶ Try to visualize it by some drawing
▶ We need to turn theoretical questions about Gaussian process into
“standard questions”
▶ About a single location: normal distribution
▶ About several locations: multivariate normal distribution
▶ About a supremum: the reflection principle turns it into a “single location”
▶ About crossing a linear boundary: Doob
▶ About two hitting times: back to the random walk, where we know the formula
▶ Still complicated? The reflection principle helps!
Solution
▶ Use the reflection principle to modify the event
P( W(b) < y | sup_{0≤t≤b} W(t) ≥ y + h )
= P( W(b) > y + 2h | sup_{0≤t≤b} W(t) ≥ y + h ).
▶ Using the definition of conditional probability, we can write this as
P( W(b) > y + 2h, sup_{0≤t≤b} W(t) ≥ y + h ) / P( sup_{0≤t≤b} W(t) ≥ y + h )
= P( W(b) > y + 2h ) / P( sup_{0≤t≤b} W(t) ≥ y + h ).
Solution: continued
▶ The denominator: apply the reflection principle again
P( sup_{0≤t≤b} W(t) ≥ y + h ) = 2 P(W(b) ≥ y + h)
▶ We obtain
P( W(b) < y | sup_{0≤t≤b} W(t) ≥ y + h )
= P(W(b) > y + 2h) / (2 P(W(b) ≥ y + h))
= (1/2) · (1 − Φ((y + 2h)/√b)) / (1 − Φ((y + h)/√b)).
Second half of the question: application
▶ A company engages in a long-term project that requires an initial
investment of 25 million euro and subsequently generates or consumes
money (free cash flow) over an infinite horizon.
▶ The cumulative free cash flow Y (t) over the first t years of the
project satisfies
Y (t) = 5W (t) + (5/2)t,
for all t ≥ 0, where {W (t), t ≥ 0} is a Brownian motion.
▶ The cash position for this project, at any moment in time, is the
cumulative free cash flow up to that point minus the initial
investment.
▶ At the start of the project, a local government guarantees to offer a
loan when the cash position falls below −30 million euro.
▶ Determine the probability that the cash position of the project ever
drops below that value.
How to deal with such questions?
▶ Such questions are again long, with lots of text
▶ Often a stochastic process is already defined
▶ These questions are often about “boundary crossing”
▶ Which process (often defined in the question)?
▶ What is the boundary (defined as the “event”)?
▶ Do not be fooled by linear or constant bound: after some manipulation
it can change!
▶ Focus on the time horizon
▶ Infinite horizon: Doob
▶ Finite horizon: reflection principle
Solution
▶ The question is about P(inf_{t≥0} Y(t) − 25 ≤ −30). We rewrite it as
P( inf_{t≥0} Y(t) − 25 ≤ −30 )
= P( ∃ t* ≥ 0 s.t. Y(t*) − 25 ≤ −30 )
= P( ∃ t* ≥ 0 s.t. W(t*) ≤ −1 − (1/2)t* )
= P( ∃ t* ≥ 0 s.t. −W(t*) ≥ 1 + (1/2)t* )
▶ By symmetry of the Brownian motion, {−W(t), t ≥ 0} is also a
Brownian motion. We continue as
P( inf_{t≥0} Y(t) − 25 ≤ −30 )
= P( ∃ t* ≥ 0 s.t. W(t*) ≥ 1 + (1/2)t* )
= P( sup_{t≥0} W(t)/(1 + (1/2)t) ≥ 1 )   (Doob)
= exp{−2 · (1/2) · 1²} = exp{−1}.
What are the exam questions?
Chen Zhou
Erasmus University Rotterdam
November 13, 2022
1 / 35
An overview
General structure
▶ The course contains three major units
▶ Discrete time Markov chain
▶ Continuous time Markov chain
▶ Poisson process as the foundation
▶ General continuous time Markov chain
▶ Gaussian processes
▶ The final exam has four questions
▶ Yes! One per each part (Poisson process counts as an individual part)
▶ Roughly equal number of points
Most difficult issue: Modeling
▶ Exam questions are often about a real life case
▶ Modeling it by a (Markov) stochastic process
▶ The modeling part is often awarded with points
▶ With a correct model, most questions turn out to be standard
▶ Recipes
▶ Read the description carefully (sentence by sentence)
▶ Determine what to model (not always straightforward)
▶ Use proper notation: declare a notation before using it
▶ Describe the question using the notation
▶ Notation: often we are defining a process
▶ Discrete time: {Xn , n ≥ 0}
▶ Continuous time: {X (t), t ≥ 0}
Solving (exam) exercises
▶ Once proper notation is defined, the question is half solved!
▶ Narrow down what the question is about and which part is related
▶ Step 1: which part (out of the four) it belongs to?
▶ Step 2: which sub-area within the part?
▶ Step 3: which exact standard question it corresponds to?
▶ “Conditioning on the first step”
▶ Different meanings in different contexts
▶ A convenient tool for solving standard questions
▶ A last resort for difficult questions
▶ Don’t use it too often
▶ Conditional probability and conditional expectation everywhere!
Disclaimers
▶ What I say is what I say!
▶ What I do not say is what I do not say!
▶ The word “often” often means “often”: it is not guaranteed!
▶ To obtain 10, you are expected to master all corners of the course.
Discrete time Markov chain
Sub-areas for discrete time Markov chain
▶ Defining Markov chain: short-run view
▶ Classification of states
▶ Long-run limiting distribution
▶ Long-run behavior starting from transient states
Defining Markov chain: short-run view
▶ Learning points
▶ Definition of Markov property
▶ Transition matrix
▶ n−step transition: Chapman-Kolmogorov equations
▶ Potential exam questions
Q1 Check a process is (or is not) Markovian, or redefine states to obtain a
Markov chain
Q2 Calculate n−step transition probability (often n ≤ 3)
▶ Recipes
Q2 Using matrix multiplication can be slow; better to think about the possible paths
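The Chapman-Kolmogorov equations say that the n-step transition matrix is the n-th matrix power of P, so a quick check of a hand computation is easy. An illustrative sketch with a hypothetical 3-state chain (the transition matrix is invented for illustration):

```python
import numpy as np

# Hypothetical 3-state chain; P[i, j] = P(X_{n+1} = j | X_n = i)
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

# Chapman-Kolmogorov: the 2-step transition matrix is P @ P
P2 = P @ P
print(P2[0, 2])  # P(X2 = 2 | X0 = 0): the only path is 0 -> 1 -> 2, so 0.5 * 0.5
```

Thinking in paths gives the same number directly: from state 0, the only way to reach state 2 in two steps is via state 1, contributing 0.5 · 0.5 = 0.25.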
Classification of states
▶ Learning points
▶ Recurrent and transient states
▶ Positive recurrent and null recurrent states
▶ Periodic and aperiodic states
▶ Communicating and classification
▶ Using a diagram
▶ Potential exam questions
Q3 Distinguish recurrent and transient classes
Q4 Argue that a state is aperiodic
▶ Recipes: provide arguments
Q3 communicating (via a cycle), recurrent (by finitely many states, or an
absorbing state), transient (by a path into a recurrent class)
Q4 by two cycles with co-prime lengths, or by a self-loop
Long-run limiting distribution
▶ Learning points
▶ Condition: only for irreducible ergodic Markov chain
▶ All states must be in one class, positive recurrent, and aperiodic
▶ Solving the limit distribution: equivalent to the stationary distribution
Steady state equations:
π = P^T π, 1^T π = 1
▶ Long-run expected reward/cost: mean using long-run limiting
distribution
▶ Potential exam questions
Q5 Write down the steady state equations (and solve them)
Q6 Calculate long-run expected reward/cost
▶ Recipes
Q5 Do not forget the transpose. Do not forget “summing to one”. You
may remove one of the other equations (often the complicated one!)
Q6 Be careful whether the reward/cost is deterministic or random
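The steady-state recipe above (transpose, drop one equation, add "summing to one") can be sketched numerically; the transition matrix and the per-state cost vector here are hypothetical.

```python
import numpy as np

P = np.array([[0.8, 0.2],      # hypothetical transition matrix
              [0.4, 0.6]])

n = P.shape[0]
A = P.T - np.eye(n)            # steady-state equations: (P^T - I) pi = 0
A[-1, :] = 1.0                 # replace one equation by "summing to one"
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)     # limiting (= stationary) distribution

# Long-run expected cost with a hypothetical per-state cost vector
cost = pi @ np.array([0.0, 9.0])
```

Note the transpose: solving (P − I)π = 0 instead is the classic exam mistake.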
Long-run behavior starting from transient states
▶ Learning points
▶ Notation: s_ij, f_ij, m_iR, f_iR1
▶ s_ij: starting from i, the expected number of visits to j
▶ f_ij: starting from i, the probability of ever visiting j
▶ m_iR: starting from i, the expected number of steps before entering R
▶ f_iR1: starting from i, the probability of entering R1
▶ Solving these quantities
▶ Do not memorize: open book!
▶ Often they are obtained by “conditioning on the first step”
▶ No need to use the full matrix inversion (often only one column is used)
▶ Potential exam questions
Q7 Calculating these quantities
▶ Recipes
▶ For miR and fiR1 define the recurrent (sub)class smartly.
▶ Sometimes collapsing all recurrent states in the same (sub)class into
one state helps to simplify the transition matrix.
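A minimal numerical sketch of these quantities for a hypothetical fair gambler's ruin on {0, 1, 2, 3} (0 and 3 absorbing, 1 and 2 transient): the expected-visit matrix is (I − Q)⁻¹ over the transient block, and — as the slide notes — the absorption probabilities need only one linear solve, not a full inverse.

```python
import numpy as np

# One-step transitions among the transient states 1 and 2 (fair coin)
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])
b = np.array([0.0, 0.5])           # one-step probabilities from 1, 2 into state 3

S = np.linalg.inv(np.eye(2) - Q)   # s_ij: expected number of visits to j from i
m = S.sum(axis=1)                  # m_iR: expected steps before absorption
f = np.linalg.solve(np.eye(2) - Q, b)  # f_iR1: one column only, no full inverse
```

For this chain the classic answers come out: two expected steps from either transient state, and absorption at 3 with probabilities 1/3 and 2/3.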
Poisson processes
Sub-areas for Poisson processes
▶ Exponential distribution: properties
▶ Multiple definitions of Poisson processes
▶ Properties of Poisson processes: merging, decomposing
▶ Conditional arrival distribution
▶ Non-homogeneous Poisson process
Exponential distribution: properties
▶ Learning points
▶ Memoryless property
▶ Minimum of independent exponentially distributed random variables
▶ Probability of X1 < X2
▶ Independence between the minimum and the order
▶ Potential exam questions
▶ The exponential distribution is always embedded in questions regarding the
Poisson process (e.g. inter-arrival times) and the continuous time Markov
chain (e.g. instantaneous transition rates)
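A seeded simulation sketch of the two competition properties, with hypothetical rates: min(X1, X2) is exponential with the summed rate, and P(X1 < X2) = λ1/(λ1 + λ2).

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 2.0, 3.0                      # hypothetical rates
x1 = rng.exponential(1 / lam1, 100_000)
x2 = rng.exponential(1 / lam2, 100_000)

# min(X1, X2) ~ Exp(lam1 + lam2), so its mean should be near 1/5
mean_min = np.minimum(x1, x2).mean()

# P(X1 < X2) = lam1 / (lam1 + lam2) = 0.4
p_first = (x1 < x2).mean()
```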
Multiple definitions of Poisson processes
▶ Learning points
▶ Definition based on counting process
▶ Definition based on instantaneous arrival rate
▶ After studying general continuous time Markov chain, you know what it
means!
▶ Definition based on inter-arrival times
▶ Properties
▶ Independent and stationary increments
▶ Potential exam questions
Q8 Calculate (conditional) probabilities regarding a counting event
▶ Make use of independent and stationary increments as much as possible
Q9 Calculate (conditional) expectation regarding an arrival time
▶ Make use of the memoryless property as much as possible
▶ Recipes
▶ The devil is in the notations: write down your arguments!
▶ Sometimes, one can convert time to counting and vice versa
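A small sketch of Q8, using independent and stationary increments to turn a conditional probability into an unconditional one; the rate is hypothetical.

```python
from math import exp, factorial

def pois_pmf(k, mu):
    """P(Poisson(mu) = k)"""
    return exp(-mu) * mu ** k / factorial(k)

lam = 2.0   # hypothetical rate

# Independent, stationary increments:
# P(N(5) = 4 | N(2) = 1) = P(N(5) - N(2) = 3) = P(Poisson(3 * lam) = 3)
p = pois_pmf(3, 3 * lam)
```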
Merging and decomposing of Poisson processes
▶ Learning points
▶ For merging, the two initial processes must be independent
Consequence: event “type” is independent of arrival
▶ For decomposing, “type” must be independent of the arrival
Consequence: the two decomposed processes are independent
▶ Potential exam questions
Q10 Analyze a merged or decomposed Poisson process
▶ Recipes
▶ Define notations for the merged or decomposed process; claim that
they are Poisson with calculated rates
▶ Use/Argue the independence properties as much as possible: between
subprocesses, between “type” and arrival
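A seeded simulation sketch of decomposition, with hypothetical rate and type probability: thinning a Poisson(λt) count with independent types gives counts with means λpt and λ(1 − p)t, and the two subprocess counts are (approximately, in the sample) uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, p, t = 5.0, 0.3, 1.0            # hypothetical rate, type-1 probability, horizon

n = rng.poisson(lam * t, 100_000)    # total count N(t), many replications
n1 = rng.binomial(n, p)              # type-1 count: independent thinning
n2 = n - n1                          # type-2 count

mean1, mean2 = n1.mean(), n2.mean()  # near lam*p = 1.5 and lam*(1-p) = 3.5
cov12 = np.cov(n1, n2)[0, 1]         # independence: sample covariance near zero
```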
Conditional arrival distribution
▶ Learning points
▶ Conditional on N(t) = n, the arrival times of the n events in [0, t] have
the same joint distribution as the order statistics of n i.i.d. U(0, t).
▶ If n = 1, the only arrival time follows U(0, t)
▶ If n > 1, the i-th arrival time is distributed as U_(i), where
U_(1) ≤ · · · ≤ U_(n) are the order statistics of U_1, U_2, · · · , U_n (i.i.d. U(0, t))
▶ Potential exam questions
Q11 Analyze quantities related to (S1 , · · · , Sn ) given N(t) = n
▶ Recipes
▶ Do not forget the key elements in the statement: “Conditioning event”,
“joint distribution”, “order statistics”, “i.i.d.”, “Uniform distribution”
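A seeded sketch of the statement, with hypothetical t and n: sorting i.i.d. U(0, t) draws reproduces the conditional arrival times, and the known consequence E[S_i | N(t) = n] = i·t/(n + 1) can be checked on the first arrival.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 10.0, 4                       # hypothetical horizon and conditioned count

# Given N(t) = n, (S_1, ..., S_n) has the law of the order statistics
# of n i.i.d. U(0, t) draws
u = rng.uniform(0.0, t, size=(100_000, n))
s = np.sort(u, axis=1)

# E[S_1 | N(10) = 4] = 1 * 10 / 5 = 2
mean_s1 = s[:, 0].mean()
```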
Non-homogeneous Poisson process
▶ Learning points
▶ Distribution of an increment based on the rate function
▶ Independent increments (but not stationary!)
▶ Potential exam questions
Q12 Calculate (conditional) probabilities regarding a counting event for a
non-homogeneous Poisson process
▶ Recipes
▶ Make use of independent increments as much as possible. Be careful,
there is no stationary increment: always resort to integrals over the
rate function
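A sketch of the recipe "always resort to integrals over the rate function": an increment N(b) − N(a) is Poisson with mean ∫ₐᵇ λ(s) ds. The rate function λ(s) = 2s is hypothetical.

```python
from math import exp

def mean_count(rate, a, b, steps=10_000):
    """Integrate the rate function over (a, b] with the midpoint rule."""
    h = (b - a) / steps
    return sum(rate(a + (i + 0.5) * h) for i in range(steps)) * h

rate = lambda s: 2.0 * s         # hypothetical rate function lambda(s) = 2s

mu = mean_count(rate, 1.0, 2.0)  # integral of 2s over (1, 2] is 3
p_none = exp(-mu)                # P(no events in (1, 2]) = e^{-3}
```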
General continuous time Markov chain
Sub-areas for general continuous time Markov chain
▶ Model definition
▶ Kolmogorov backward and forward equations
▶ Long-run limiting distribution
▶ Queueing theory
Model definition
▶ Learning points
▶ Defining a continuous time Markov chain
▶ Parameters used: v_i, P_ij
▶ Summarized in the instantaneous transition rates: q_ij
▶ Example: Birth and death process
▶ Only about arrival (rate) and departure (rate)
▶ Be careful with what “birth” (or “death”) means
▶ Potential exam questions
Q13 Model a real life process by a continuous time Markov chain, write
down instantaneous transition rates
▶ Recipes
▶ Define states carefully (sometimes by number, sometimes by “status”)
▶ For each state, think about where it can move to in “one move” and
what is the waiting time to that “move” (must be exponential)
▶ Having “two events” (arrival or departure) simultaneously is
“negligible” (rate zero, based on the o(h) argument)
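The modeling recipe can be sketched by building the rate matrix Q of a hypothetical finite-capacity birth and death chain: from each state, only "one move" (one arrival or one departure) carries a nonzero rate, exactly the o(h) point above.

```python
import numpy as np

lam, mu, N = 1.0, 2.0, 3   # hypothetical arrival rate, departure rate, capacity

Q = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        Q[i, i + 1] = lam          # "birth": one arrival
    if i > 0:
        Q[i, i - 1] = mu           # "death": one departure
    Q[i, i] = -Q[i].sum()          # each row of a rate matrix sums to zero
```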
Kolmogorov backward and forward equations
▶ Learning points
▶ Differential equations to characterize transition probability function
▶ Instantaneous rate: what happens in time h as h → 0
▶ Transition probability function: what happens after t for t fixed
▶ Limiting distribution: what happens in long-run as t → ∞
▶ Difference between backward and forward equations
▶ “Backward” is about Pij (t) for given landing location j
▶ “Forward” is about Pij (t) for given departure location i
▶ Potential exam questions
Q14 Write down the Kolmogorov backward (or forward) equations
▶ Recipes
▶ Backward: think about from i where it may go to in “one move”
▶ Forward: think about from where, in “one move”, it can reach j
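A numerical sketch of the forward equations P′(t) = P(t)Q for a hypothetical two-state chain (0 → 1 at rate λ, 1 → 0 at rate μ): simple Euler steps reproduce the known closed form P_00(t) = μ/(λ+μ) + λ/(λ+μ)·e^{−(λ+μ)t}.

```python
import numpy as np
from math import exp

lam, mu = 1.0, 2.0
Q = np.array([[-lam, lam],
              [mu, -mu]])

# Euler integration of the forward equations P'(t) = P(t) Q, from P(0) = I
P = np.eye(2)
h, t = 1e-4, 1.0
for _ in range(int(round(t / h))):
    P = P + h * (P @ Q)

# Closed-form check for the two-state chain
p00 = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t)
```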
Long-run limiting distribution
▶ Learning points
▶ Balance equations for solving long-run limit
▶ Cut equations for Birth and death process
▶ Potential exam questions
Q15 Write down balance equations (and solve them)
▶ Recipes
▶ Balance equations are used for general case while cut equations are
easier for Birth and death process (queueing)
▶ On the left side, for state i, think about where it can reach in “one
move”. Add all the rates
▶ On the right side, for state i, think about from where in “one move” it
can reach i, write the weighted sum per state
▶ To solve balance equation (cut equation), convert all probabilities into
a multiple of one of them, then use the fact that they sum to 1.
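The balance-equation recipe can be sketched numerically for a hypothetical three-state birth and death chain (up at rate 1, down at rate 2): solve πQ = 0 with one equation replaced by normalization, and check against the cut equations π_{i+1} = π_i · λ/μ.

```python
import numpy as np

# Hypothetical birth-death chain on {0, 1, 2}: up at rate 1, down at rate 2
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -3.0, 1.0],
              [0.0, 2.0, -2.0]])

A = Q.T.copy()                 # balance equations: pi Q = 0, i.e. Q^T pi = 0
A[-1, :] = 1.0                 # replace one equation by "summing to one"
b = np.zeros(3)
b[-1] = 1.0
pi = np.linalg.solve(A, b)

# Cut equations predict pi_1 = pi_0 / 2 and pi_2 = pi_1 / 2
```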
Queueing theory
▶ Learning points
▶ Kendall’s notation
▶ Little’s law: L = λW
▶ PASTA principle: a_n = P_n
▶ Different queueing problems (M/M/1, ...)
▶ Potential exam questions
Q16 Recognize a real life problem as a queueing with Kendall’s notation
Q17 Solve L, λ or W for a queueing problem
Q18 Apply (and declare) PASTA principle
▶ Recipes
Q16 Be careful with what is the system!
Q17 L is derived from the limiting distribution, λ counts admitted clients
only, W sometimes can be calculated between two different systems
Q18 Question is often not about “what an arrival client see” directly, but
can be argued to be equivalent to that
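A minimal M/M/1 sketch tying Q17 and Q18 together, with hypothetical rates: L follows from the limiting distribution, W from Little's law, and by PASTA an arriving customer finds the system empty with the limiting probability P_0.

```python
lam, mu = 3.0, 5.0        # hypothetical arrival and service rates
rho = lam / mu            # utilization; stability requires rho < 1

L = rho / (1 - rho)       # long-run mean number in the M/M/1 system
W = L / lam               # Little's law: L = lam * W
p_empty = 1 - rho         # PASTA: an arriving customer sees the system
                          # empty with the limiting probability P_0
```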
Gaussian processes
Sub-areas for Gaussian processes
▶ Brownian motion as a limit of random walk
▶ Reflection principle and boundary crossing probability
▶ Hitting time
▶ Relation between Brownian motion and Brownian bridge
▶ Other (non-)Gaussian processes
Brownian motion as a limit of random walk
▶ Learning points
▶ Definition of Brownian motion using limit and Rademacher variables
▶ Properties of Brownian motion
▶ Multivariate normal marginals with covariance function min(t_1, t_2)
▶ Independent and stationary increments
▶ Potential exam questions
▶ The definition of Brownian motion will be embedded in other questions
regarding proving something as Brownian motion/bridge
▶ The definition as a limit of random walk might be used for solving
difficult questions such as comparing two hitting times
Reflection principle and boundary crossing probability
▶ Learning points
▶ Reflection principle
▶ One-sided constant boundary crossing
▶ Two-sided constant boundary crossing, using inclusion-exclusion
▶ One-sided linear boundary crossing: Doob
▶ Potential exam questions
Q19 Apply reflection principle
Q20 Relate a real life case to boundary crossing probabilities
▶ Recipes
Q19 The reflection principle can be applied if the sample path is still
continuous after reflection. It is often applied to simplify the problem,
i.e. converting the problem to checking “the end of the process”.
Q20 Recognize whether it is finite time horizon or infinite, whether for
constant bound or linear bound (actually, always paired!)
Q20 Be careful with sup or inf. If not confident, use ∃ notation
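A small sketch of Q19's one-sided constant-boundary formula for standard Brownian motion: by the reflection principle, P(max_{0≤s≤t} B(s) ≥ a) = 2·P(B(t) ≥ a) = 2(1 − Φ(a/√t)) for a > 0. The values of a and t below are hypothetical.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_running_max_exceeds(a, t):
    """P(max_{0 <= s <= t} B(s) >= a), a > 0, by the reflection principle."""
    return 2.0 * (1.0 - std_normal_cdf(a / sqrt(t)))

p = p_running_max_exceeds(1.0, 4.0)   # = 2 * (1 - Phi(0.5))
```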
Hitting time
▶ Learning points
▶ One hitting time: distribution obtained from reflection principle
▶ Two hitting times: comparison done using the limit of random walk
▶ Potential exam questions
Q21 Calculate probabilities related to one or two hitting times
▶ Recipes
▶ Convert the real life question and recognize that it is about hitting time
▶ Check how many hitting times are involved and decide what to use
Relation between Brownian motion and Brownian bridge
▶ Learning points
▶ Similarity and differences between BM and BB
▶ Construct BB from BM (force ending at zero)
▶ Construct BM from BB (compensate “randomness” at the end)
▶ Potential exam questions
Q22 Prove that one process is BM or BB
▶ Recipes
▶ Three-steps approach: “Gaussian process”, “mean function zero”,
“covariance function”
▶ Provide argument for the first step
▶ You know the result of the covariance calculation even without
calculation.
Other (non-)Gaussian processes
▶ Learning points
▶ Ornstein-Uhlenbeck process: stationary Gaussian process
▶ Geometric Brownian motion: non-Gaussian process
▶ Potential exam questions
Q23 Handling a real life process modeled by these processes
▶ Recipes
▶ Don’t panic, they are all based on Brownian motion
▶ Gradually adjust the process to turn it into a question about Brownian
motion
Final words
▶ What are the exam questions?
▶ The four exam questions are a subset of these 23 types of questions, or
their combinations
▶ Definitely!
▶ How difficult can the exam be?
▶ If you have studied well, the real difficulty is in the modeling
▶ Pay extra attention to defining your notation
▶ Do not worry too much about calculation
Good luck for your exam!