Introduction to Algorithmic Trading Strategies
Lecture 5
Pairs Trading by Stochastic Spread Methods
Haksun Li haksun.li@numericalmethod.com
www.numericalmethod.com
Outline
First passage time
Kalman filter
Maximum likelihood estimate
EM algorithm
References
While most papers on the basic cointegration methods emphasize the construction of a synthetic mean-reverting asset, the stochastic spread methods focus on the dynamics of the synthetic asset's price.
Most referenced academic paper: Elliott, van der Hoek, and Malcolm, 2005, Pairs Trading.
Models the spread process as a state-space version of the Ornstein-Uhlenbeck process.
Jonathan Chiu, Daniel Wijaya Lukman, Kourosh Modarresi, Avinayan Senthi Velayutham, High-frequency Trading, Stanford University, 2011.
The idea has also appeared in a number of popular pairs trading books:
Technical analysis and charting for the spread: Ehrman, 2005, The Handbook of Pairs Trading.
ARMA model, HMM ARMA model, some non-parametric approaches, and a Kalman filter model: Vidyamurthy, 2004, Pairs Trading: Quantitative Methods and Analysis.
Spread as a Mean-Reverting Process
$$x_k - x_{k-1} = \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k = b\left(\frac{a}{b} - x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k$$
The long-term mean is $\frac{a}{b}$.
The rate of mean reversion is $b$.
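As a minimal sketch of this discretization in Python (the function name and parameter values are illustrative, not from the lecture):

```python
import numpy as np

def simulate_spread(a, b, sigma, tau, x0, n, seed=0):
    """Simulate x_k = x_{k-1} + (a - b*x_{k-1})*tau + sigma*sqrt(tau)*eps_k."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(1, n + 1):
        x[k] = x[k - 1] + (a - b * x[k - 1]) * tau \
               + sigma * np.sqrt(tau) * rng.standard_normal()
    return x

# illustrative values: long-term mean a/b = 1.0, reversion rate b = 2.0
path = simulate_spread(a=2.0, b=2.0, sigma=0.5, tau=1.0 / 252, x0=0.0, n=2520)
```

For small $\tau$ with $0 < 1 - b\tau < 1$, the simulated path mean-reverts toward $a/b$, matching the long-term results derived on the next slides.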
Sum of Power Series
We note that
$$\sum_{i=0}^{k-1} a^i = \frac{a^k - 1}{a - 1}$$
Unconditional Mean
$$\begin{aligned}
\mathrm{E}\left[x_k\right] = \mu_k &= \mu_{k-1} + \left(a - b\mu_{k-1}\right)\tau \\
&= a\tau + \left(1 - b\tau\right)\mu_{k-1} \\
&= a\tau + \left(1 - b\tau\right)\left[a\tau + \left(1 - b\tau\right)\mu_{k-2}\right] \\
&= a\tau\left[1 + \left(1 - b\tau\right)\right] + \left(1 - b\tau\right)^2\mu_{k-2} \\
&\;\;\vdots \\
&= a\tau\sum_{i=0}^{k-1}\left(1 - b\tau\right)^i + \left(1 - b\tau\right)^k\mu_0 \\
&= a\tau\,\frac{1 - \left(1 - b\tau\right)^k}{1 - \left(1 - b\tau\right)} + \left(1 - b\tau\right)^k\mu_0 \\
&= \frac{a}{b} - \frac{a}{b}\left(1 - b\tau\right)^k + \left(1 - b\tau\right)^k\mu_0
\end{aligned}$$
Long Term Mean
As $k \to \infty$ (assuming $\left|1 - b\tau\right| < 1$),
$$\frac{a}{b} - \frac{a}{b}\left(1 - b\tau\right)^k + \left(1 - b\tau\right)^k\mu_0 \to \frac{a}{b}$$
Unconditional Variance
$$\begin{aligned}
\operatorname{Var}\left[x_k\right] = \sigma_k^2 &= \left(1 - b\tau\right)^2\sigma_{k-1}^2 + \sigma^2\tau \\
&= \left(1 - b\tau\right)^2\left[\left(1 - b\tau\right)^2\sigma_{k-2}^2 + \sigma^2\tau\right] + \sigma^2\tau \\
&\;\;\vdots \\
&= \sigma^2\tau\sum_{i=0}^{k-1}\left(1 - b\tau\right)^{2i} + \left(1 - b\tau\right)^{2k}\sigma_0^2 \\
&= \sigma^2\tau\,\frac{1 - \left(1 - b\tau\right)^{2k}}{1 - \left(1 - b\tau\right)^2} + \left(1 - b\tau\right)^{2k}\sigma_0^2
\end{aligned}$$
Long Term Variance
As $k \to \infty$,
$$\sigma^2\tau\,\frac{1 - \left(1 - b\tau\right)^{2k}}{1 - \left(1 - b\tau\right)^2} + \left(1 - b\tau\right)^{2k}\sigma_0^2 \to \frac{\sigma^2\tau}{1 - \left(1 - b\tau\right)^2}$$
Observations and Hidden State Process
The hidden state process is:
$$x_k = x_{k-1} + \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k = a\tau + \left(1 - b\tau\right)x_{k-1} + \sigma\sqrt{\tau}\,\varepsilon_k = A + B x_{k-1} + C\varepsilon_k$$
where $A \geq 0$, $0 < B < 1$.
The observations:
$$y_k = x_k + D\omega_k$$
We want to compute the expected state from the observations:
$$\hat{x}_{k|k} = \mathrm{E}\left[x_k \mid Y_k\right]$$
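A one-line helper (a sketch; the function name is mine) mapping the OU parameters onto these state-space coefficients:

```python
import numpy as np

def to_state_space(a, b, sigma, tau):
    """Return (A, B, C) with A = a*tau, B = 1 - b*tau, C = sigma*sqrt(tau)."""
    return a * tau, 1.0 - b * tau, sigma * np.sqrt(tau)
```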
First Passage Time
Standardized Ornstein-Uhlenbeck process:
$$dZ_t = -Z_t\,dt + \sqrt{2}\,dW_t$$
First passage time:
$$T_{0,c} = \inf\left\{t \geq 0 : Z_t = 0 \mid Z_0 = c\right\}$$
The pdf of $T_{0,c}$ has a maximum value at
$$\hat{t} = \frac{1}{2}\ln\left(1 + \frac{1}{2}\left(\sqrt{\left(c^2 - 3\right)^2 + 4c^2} + c^2 - 3\right)\right)$$
A Sample Trading Strategy
The discrete spread model
$$x_k = x_{k-1} + \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k$$
corresponds to the continuous-time process
$$dX_t = \left(a - b X_t\right)dt + \sigma\,dW_t$$
With $X_0 = \mu + c\,\frac{\sigma}{\sqrt{2\rho}}$ and $X_T = \mu$ (where $\mu = a/b$ is the long-term mean and $\rho = b$ the reversion rate), the holding horizon is $T = \frac{1}{\rho}\hat{t}$.
Buy when $y_k < \mu - c\,\frac{\sigma}{\sqrt{2\rho}}$; unwind after time $T$.
Sell when $y_k > \mu + c\,\frac{\sigma}{\sqrt{2\rho}}$; unwind after time $T$.
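A sketch of the entry rule and the unwind horizon in Python, assuming the first-passage-time mode formula above (function and variable names are mine):

```python
import numpy as np

def first_passage_mode(c):
    """Mode t_hat of the first-passage-time density of the standardized
    OU process from Z_0 = c down to 0."""
    s = c**2 - 3
    return 0.5 * np.log(1.0 + 0.5 * (np.sqrt(s**2 + 4 * c**2) + s))

def trade_signal(y, mu, sigma, rho, c=1.0):
    """Threshold entry with unwind horizon T = t_hat / rho."""
    band = c * sigma / np.sqrt(2 * rho)
    T = first_passage_mode(c) / rho
    if y < mu - band:
        return "buy", T
    if y > mu + band:
        return "sell", T
    return "hold", None
```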
Kalman Filter
The Kalman filter is an efficient recursive filter that estimates the state of a dynamic system from a series of incomplete and noisy measurements.
Conceptual Diagram
[Diagram omitted: as new measurements come in, the prediction at time t is corrected by the update at time t+1 for a better estimate.]
A Linear Discrete System
$$x_k = F_k x_{k-1} + B_k u_k + \omega_k$$
$F_k$: the state transition model applied to the previous state
$B_k$: the control-input model applied to the control vector $u_k$
$\omega_k \sim N\left(0, Q_k\right)$: the process noise, drawn from a multivariate normal distribution
Observations and Noises
$$z_k = H_k x_k + v_k$$
$H_k$: the observation model mapping the true state into the observation space
$v_k \sim N\left(0, R_k\right)$: the observation noise
Discrete System Diagram
[Diagram omitted.]
Prediction
Predicted (a priori) state estimate:
$$\hat{x}_{k|k-1} = F_k\hat{x}_{k-1|k-1} + B_k u_k$$
Predicted (a priori) estimate covariance:
$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$$
Update
Measurement residual:
$$\tilde{y}_k = z_k - H_k\hat{x}_{k|k-1}$$
Residual covariance:
$$S_k = H_k P_{k|k-1} H_k^T + R_k$$
Optimal Kalman gain:
$$K_k = P_{k|k-1} H_k^T S_k^{-1}$$
Updated (a posteriori) state estimate:
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\tilde{y}_k$$
Updated (a posteriori) estimate covariance:
$$P_{k|k} = \left(I - K_k H_k\right) P_{k|k-1}$$
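A minimal numpy sketch of the predict/update recursion, mapping one-to-one onto the formulas above (matrix shapes are assumed conformable; function names are mine):

```python
import numpy as np

def kf_predict(x, P, F, B, u, Q):
    """Predict step: a priori state estimate and covariance."""
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: correct the prediction with the measurement z."""
    y = z - H @ x_pred                       # measurement residual
    S = H @ P_pred @ H.T + R                 # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # optimal Kalman gain
    x = x_pred + K @ y                       # a posteriori state estimate
    P = (np.eye(len(x)) - K @ H) @ P_pred    # a posteriori covariance
    return x, P
```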
Computing the ‘Best’ State Estimate
Given $A$, $B$, $C$, $D$, we define the conditional variance
$$R_k = \Sigma_{k|k} \equiv \mathrm{E}\left[\left(x_k - \hat{x}_{k|k}\right)^2 \mid Y_k\right]$$
Start with $\hat{x}_{0|0} = y_0$, $R_0 = D^2$.
Predicted (a Priori) State Estimation
$$\begin{aligned}
\hat{x}_{k+1|k} &= \mathrm{E}\left[x_{k+1} \mid Y_k\right] \\
&= \mathrm{E}\left[A + B x_k + C\varepsilon_{k+1} \mid Y_k\right] \\
&= A + B\,\mathrm{E}\left[x_k \mid Y_k\right] \\
&= A + B\hat{x}_{k|k}
\end{aligned}$$
Predicted (a Priori) Variance
$$\begin{aligned}
\Sigma_{k+1|k} &= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k}\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(A + B x_k + C\varepsilon_{k+1} - A - B\hat{x}_{k|k}\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(B\left(x_k - \hat{x}_{k|k}\right) + C\varepsilon_{k+1}\right)^2 \mid Y_k\right] \\
&= B^2\Sigma_{k|k} + C^2
\end{aligned}$$
Minimize the A Posteriori Variance
Let the Kalman updating formula be
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K\left(y_{k+1} - \hat{x}_{k+1|k}\right)$$
We want to solve for $K$ such that the conditional variance is minimized:
$$\Sigma_{k+1|k+1} = \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right]$$
Solve for K
$$\begin{aligned}
\mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right]
&= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k} - K\left(y_{k+1} - \hat{x}_{k+1|k}\right)\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k} - K\left(x_{k+1} + D\omega_{k+1} - \hat{x}_{k+1|k}\right)\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(\left(1 - K\right)\left(x_{k+1} - \hat{x}_{k+1|k}\right) - K D\omega_{k+1}\right)^2 \mid Y_k\right] \\
&= \left(1 - K\right)^2\,\mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k}\right)^2 \mid Y_k\right] + K^2 D^2 \\
&= \left(1 - K\right)^2\,\Sigma_{k+1|k} + K^2 D^2
\end{aligned}$$
First Order Condition for K
$$\begin{aligned}
\frac{d}{dK}\left[\left(1 - K\right)^2\Sigma_{k+1|k} + K^2 D^2\right]
&= \frac{d}{dK}\left[\left(1 - 2K + K^2\right)\Sigma_{k+1|k} + K^2 D^2\right] \\
&= \left(-2 + 2K\right)\Sigma_{k+1|k} + 2K D^2 = 0
\end{aligned}$$
Optimal Kalman Filter
$$K_{k+1} = \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}$$
Updated (a Posteriori) State Estimation
So we have the "optimal" Kalman updating rule:
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}\left(y_{k+1} - \hat{x}_{k+1|k}\right) = \hat{x}_{k+1|k} + \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\left(y_{k+1} - \hat{x}_{k+1|k}\right)$$
Updated (a Posteriori) Variance
$$\begin{aligned}
R_{k+1} = \Sigma_{k+1|k+1} &= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right] \\
&= \left(1 - K_{k+1}\right)^2\Sigma_{k+1|k} + K_{k+1}^2 D^2 \\
&= \left(1 - \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\right)^2\Sigma_{k+1|k} + \left(\frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\right)^2 D^2 \\
&= \frac{D^4\,\Sigma_{k+1|k} + D^2\,\Sigma_{k+1|k}^2}{\left(\Sigma_{k+1|k} + D^2\right)^2} \\
&= \frac{\Sigma_{k+1|k}\,D^2\left(D^2 + \Sigma_{k+1|k}\right)}{\left(\Sigma_{k+1|k} + D^2\right)^2} \\
&= \frac{\Sigma_{k+1|k}\,D^2}{\Sigma_{k+1|k} + D^2}
\end{aligned}$$
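Putting the scalar recursions of the last several slides together, a minimal sketch of the filter for this model (the function name is mine):

```python
import numpy as np

def scalar_kalman_filter(y, A, B, C, D):
    """Scalar Kalman filter for x_k = A + B*x_{k-1} + C*eps_k,
    y_k = x_k + D*omega_k, initialized with x_{0|0} = y_0, R_0 = D^2."""
    n = len(y)
    x_hat = np.empty(n)
    R = np.empty(n)
    x_hat[0], R[0] = y[0], D**2
    for k in range(n - 1):
        x_pred = A + B * x_hat[k]                    # a priori state estimate
        S_pred = B**2 * R[k] + C**2                  # a priori variance
        K = S_pred / (S_pred + D**2)                 # optimal Kalman gain
        x_hat[k + 1] = x_pred + K * (y[k + 1] - x_pred)
        R[k + 1] = S_pred * D**2 / (S_pred + D**2)   # a posteriori variance
    return x_hat, R
```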
Parameter Estimation
We need to estimate the parameters $\vartheta = \left(A, B, C, D\right)$ from the observable data before we can use the Kalman filter model.
We write down the likelihood function in terms of $\vartheta$, and then maximize it w.r.t. $\vartheta$.
Likelihood Function
A likelihood function (often simply the likelihood) is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values.
$$L\left(\vartheta; Y\right) = p\left(Y \mid \vartheta\right)$$
Maximum Likelihood Estimate
We find $\hat{\vartheta}$ such that $L\left(\vartheta; Y\right)$ is maximized given the observations.
Example Using the Normal Distribution
We want to estimate the mean of a sample of size $N$ drawn from a normal distribution.
$$f\left(y\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(y - \mu\right)^2}{2\sigma^2}\right), \quad \vartheta = \left(\mu, \sigma^2\right)$$
$$L_N\left(\vartheta; Y\right) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(y_i - \mu\right)^2}{2\sigma^2}\right)$$
Log-Likelihood
$$\log L_N\left(\vartheta; Y\right) = \sum_{i=1}^{N}\left[\log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{\left(y_i - \mu\right)^2}{2\sigma^2}\right]$$
Maximizing the log-likelihood is equivalent to maximizing
$$-\sum_{i=1}^{N}\left(y_i - \mu\right)^2$$
The first order condition w.r.t. $\mu$ gives
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
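A quick numerical check of the first-order condition (illustrative data; numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=1000)

def log_likelihood(mu, sigma2, y):
    """The normal log-likelihood from the previous slide."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu)**2 / (2 * sigma2))

mu_hat = y.mean()       # the first-order condition: the sample average
sigma2_hat = y.var()    # MLE of sigma^2 (ddof = 0)
assert log_likelihood(mu_hat, sigma2_hat, y) >= log_likelihood(mu_hat + 0.1, sigma2_hat, y)
```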
Nelder-Mead
After we write down the likelihood function for the Kalman model in terms of $\vartheta = \left(A, B, C, D\right)$, we can run any multivariate optimization algorithm, e.g., Nelder-Mead, to search for $\hat{\vartheta}$:
$$\max_{\vartheta} L\left(\vartheta; Y\right)$$
The disadvantage is that the search may not converge well, hence may not land close to the optimal solution.
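A sketch of this approach, building the likelihood via the prediction-error decomposition (the innovation $y_k - \hat{x}_{k|k-1}$ is normal with variance $\Sigma_{k|k-1} + D^2$) and handing it to scipy's Nelder-Mead; function names and the starting point are mine:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, y):
    """Negative log-likelihood of the scalar state-space model via the
    prediction-error decomposition; theta = (A, B, C, D)."""
    A, B, C, D = theta
    x, R = y[0], D**2                      # x_{0|0} = y_0, R_0 = D^2
    ll = 0.0
    for yk in y[1:]:
        x_pred = A + B * x                 # a priori state estimate
        S = B**2 * R + C**2                # a priori variance
        V = S + D**2                       # innovation variance of y_k
        ll += -0.5 * (np.log(2 * np.pi * V) + (yk - x_pred)**2 / V)
        K = S / V                          # optimal Kalman gain
        x = x_pred + K * (yk - x_pred)
        R = S * D**2 / V                   # a posteriori variance
    return -ll

# given observations y, e.g.:
# res = minimize(neg_log_likelihood, x0=[0.0, 0.9, 0.1, 0.1],
#                args=(y,), method="Nelder-Mead")
```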
Marginal Likelihood
For the set of hidden states $X_t$, we write
$$L\left(\vartheta; Y\right) = p\left(Y \mid \vartheta\right) = \sum_X p\left(Y, X \mid \vartheta\right)$$
Assuming we know the conditional distribution of $X$, we could instead maximize
$$\max_{\vartheta}\,\mathrm{E}_X\left[L\left(\vartheta \mid Y, X\right)\right], \quad \text{or} \quad \max_{\vartheta}\,\mathrm{E}_X\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
The expectation is a weighted sum of the (log-)likelihoods, weighted by the probability of the hidden states.
The Q-Function
Where do we get the conditional distribution of $X_t$ from?
Suppose we somehow have an (initial) estimate of the parameters, $\vartheta_0$. Then the model has no unknowns, and we can compute the distribution of $X_t$.
$$Q\left(\vartheta \mid \vartheta_t\right) = \mathrm{E}_{X \mid Y, \vartheta_t}\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
EM Intuition
Suppose we know $\vartheta$: then we know the model completely, and we can find $X$.
Suppose we know $X$: then we can estimate $\vartheta$ by, e.g., maximum likelihood.
What do we do if we know neither $\vartheta$ nor $X$?
Expectation-Maximization Algorithm
Expectation step (E-step): compute the expected value of the log-likelihood function w.r.t. the conditional distribution of $X$ given $Y$ and $\vartheta_t$.
$$Q\left(\vartheta \mid \vartheta_t\right) = \mathrm{E}_{X \mid Y, \vartheta_t}\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
Maximization step (M-step): find the parameters $\vartheta$ that maximize the Q-value.
$$\vartheta_{t+1} = \operatorname*{argmax}_{\vartheta}\,Q\left(\vartheta \mid \vartheta_t\right)$$
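Structurally, EM is just an alternation of these two steps until the parameters stop moving; a generic sketch with hypothetical e_step/m_step callbacks (names are mine):

```python
def em(y, theta0, e_step, m_step, n_iter=100, tol=1e-8):
    """Generic EM loop. e_step returns the sufficient statistics of
    X | Y, theta (the ingredients of the Q-function); m_step returns
    argmax_theta Q(theta | theta_t).  theta is a tuple of floats."""
    theta = theta0
    for _ in range(n_iter):
        stats = e_step(y, theta)           # E-step
        theta_new = m_step(y, stats)       # M-step
        if max(abs(u - v) for u, v in zip(theta_new, theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```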
EM Algorithms for Kalman Filter
Offline: the Shumway and Stoffer smoother approach, 1982.
Online: the Elliott and Krishnamurthy filter approach, 1999.
A Trading Algorithm
From $y_0, y_1, \ldots, y_N$, we estimate $\hat{\vartheta}_N$.
Decide whether to make a trade at $t = N$, unwind at $t = N + 1$, or some time later, e.g., $t = N + T$.
As $y_{N+1}$ arrives, estimate $\hat{\vartheta}_{N+1}$.
Repeat.
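A rolling-loop sketch of this algorithm (fit_mle and trade_signal stand for the hypothetical helpers sketched on earlier slides, passed in as callbacks; the window size is illustrative):

```python
import numpy as np

def online_pairs_trading(stream, fit_mle, trade_signal, tau, window=250, c=1.0):
    """Re-estimate theta_hat_N as each new observation arrives,
    then check the entry bands from the strategy slide."""
    ys = []
    for y in stream:
        ys.append(y)
        if len(ys) < window:
            continue
        A, B, C, D = fit_mle(np.array(ys))   # theta_hat_N from y_0..y_N
        mu = A / (1 - B)                     # long-term mean a/b
        rho = (1 - B) / tau                  # reversion rate b
        sigma = C / np.sqrt(tau)             # diffusion coefficient
        yield trade_signal(ys[-1], mu, sigma, rho, c)
```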
Results (1), (2), (3)
[Result charts omitted.]