Introduction to Algorithmic Trading Strategies
Lecture 5
Pairs Trading by Stochastic Spread Methods
Haksun Li haksun.li@numericalmethod.com
www.numericalmethod.com
Outline
First passage time
Kalman filter
Maximum likelihood estimate
EM algorithm
References
While most papers on the basic cointegration methods emphasize the construction of a synthetic mean-reverting asset, the stochastic spread methods focus on the dynamics of the synthetic asset's price.
Most referenced academic paper: Elliott, van der Hoek, and Malcolm, 2005, Pairs Trading.
Models the spread process as a state-space version of the Ornstein-Uhlenbeck process.
Jonathan Chiu, Daniel Wijaya Lukman, Kourosh Modarresi, Avinayan Senthi Velayutham, High-frequency Trading, Stanford University, 2011.
The idea has also appeared in a number of popular pairs trading books:
Technical analysis and charting for the spread: Ehrman, 2005, The Handbook of Pairs Trading.
ARMA model, HMM ARMA model, some non-parametric approaches, and a Kalman filter model: Vidyamurthy, 2004, Pairs Trading: Quantitative Methods and Analysis.
Spread as a Mean-Reverting Process
$$x_k - x_{k-1} = \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k = b\left(\frac{a}{b} - x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k$$
The long-term mean is $\frac{a}{b}$.
The rate of mean reversion is $b$.
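As a minimal sketch of this discretization in Python (the function name and parameter values are illustrative, not from the lecture):

```python
import numpy as np

def simulate_spread(a, b, sigma, tau, x0, n, seed=0):
    """Simulate x_k = x_{k-1} + (a - b*x_{k-1})*tau + sigma*sqrt(tau)*eps_k."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(1, n + 1):
        x[k] = x[k - 1] + (a - b * x[k - 1]) * tau \
               + sigma * np.sqrt(tau) * rng.standard_normal()
    return x

# illustrative values: long-term mean a/b = 1.0, reversion rate b = 2.0
path = simulate_spread(a=2.0, b=2.0, sigma=0.5, tau=1.0 / 252, x0=0.0, n=2520)
```

For small $\tau$ with $0 < 1 - b\tau < 1$, the simulated path mean-reverts toward $a/b$, matching the long-term results derived on the next slides.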
Sum of Power Series
We note that
$$\sum_{i=0}^{k-1} a^i = \frac{a^k - 1}{a - 1}$$
Unconditional Mean
$$\begin{aligned}
\mathrm{E}\left[x_k\right] = \mu_k &= \mu_{k-1} + \left(a - b\mu_{k-1}\right)\tau \\
&= a\tau + \left(1 - b\tau\right)\mu_{k-1} \\
&= a\tau + \left(1 - b\tau\right)\left[a\tau + \left(1 - b\tau\right)\mu_{k-2}\right] \\
&= a\tau\left[1 + \left(1 - b\tau\right)\right] + \left(1 - b\tau\right)^2\mu_{k-2} \\
&\;\;\vdots \\
&= a\tau\sum_{i=0}^{k-1}\left(1 - b\tau\right)^i + \left(1 - b\tau\right)^k\mu_0 \\
&= a\tau\,\frac{1 - \left(1 - b\tau\right)^k}{1 - \left(1 - b\tau\right)} + \left(1 - b\tau\right)^k\mu_0 \\
&= \frac{a}{b} - \frac{a}{b}\left(1 - b\tau\right)^k + \left(1 - b\tau\right)^k\mu_0
\end{aligned}$$
Long Term Mean
As $k \to \infty$ (assuming $\left|1 - b\tau\right| < 1$),
$$\frac{a}{b} - \frac{a}{b}\left(1 - b\tau\right)^k + \left(1 - b\tau\right)^k\mu_0 \to \frac{a}{b}$$
Unconditional Variance
$$\begin{aligned}
\operatorname{Var}\left[x_k\right] = \sigma_k^2 &= \left(1 - b\tau\right)^2\sigma_{k-1}^2 + \sigma^2\tau \\
&= \left(1 - b\tau\right)^2\left[\left(1 - b\tau\right)^2\sigma_{k-2}^2 + \sigma^2\tau\right] + \sigma^2\tau \\
&\;\;\vdots \\
&= \sigma^2\tau\sum_{i=0}^{k-1}\left(1 - b\tau\right)^{2i} + \left(1 - b\tau\right)^{2k}\sigma_0^2 \\
&= \sigma^2\tau\,\frac{1 - \left(1 - b\tau\right)^{2k}}{1 - \left(1 - b\tau\right)^2} + \left(1 - b\tau\right)^{2k}\sigma_0^2
\end{aligned}$$
Long Term Variance
As $k \to \infty$,
$$\sigma^2\tau\,\frac{1 - \left(1 - b\tau\right)^{2k}}{1 - \left(1 - b\tau\right)^2} + \left(1 - b\tau\right)^{2k}\sigma_0^2 \to \frac{\sigma^2\tau}{1 - \left(1 - b\tau\right)^2}$$
Observations and Hidden State Process
The hidden state process is:
$$x_k = x_{k-1} + \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k = a\tau + \left(1 - b\tau\right)x_{k-1} + \sigma\sqrt{\tau}\,\varepsilon_k = A + B x_{k-1} + C\varepsilon_k$$
where $A \geq 0$, $0 < B < 1$.
The observations:
$$y_k = x_k + D\omega_k$$
We want to compute the expected state from the observations:
$$\hat{x}_{k|k} = \mathrm{E}\left[x_k \mid Y_k\right]$$
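A one-line helper (a sketch; the function name is mine) mapping the OU parameters onto these state-space coefficients:

```python
import numpy as np

def to_state_space(a, b, sigma, tau):
    """Return (A, B, C) with A = a*tau, B = 1 - b*tau, C = sigma*sqrt(tau)."""
    return a * tau, 1.0 - b * tau, sigma * np.sqrt(tau)
```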
First Passage Time
Standardized Ornstein-Uhlenbeck process:
$$dZ_t = -Z_t\,dt + \sqrt{2}\,dW_t$$
First passage time:
$$T_{0,c} = \inf\left\{t \geq 0 : Z_t = 0 \mid Z_0 = c\right\}$$
The pdf of $T_{0,c}$ has a maximum value at
$$\hat{t} = \frac{1}{2}\ln\left(1 + \frac{1}{2}\left(\sqrt{\left(c^2 - 3\right)^2 + 4c^2} + c^2 - 3\right)\right)$$
A Sample Trading Strategy
The discrete spread model
$$x_k = x_{k-1} + \left(a - b x_{k-1}\right)\tau + \sigma\sqrt{\tau}\,\varepsilon_k$$
corresponds to the continuous-time process
$$dX_t = \left(a - b X_t\right)dt + \sigma\,dW_t$$
With $X_0 = \mu + c\,\frac{\sigma}{\sqrt{2\rho}}$ and $X_T = \mu$ (where $\mu = a/b$ is the long-term mean and $\rho = b$ the reversion rate), the holding horizon is $T = \frac{1}{\rho}\hat{t}$.
Buy when $y_k < \mu - c\,\frac{\sigma}{\sqrt{2\rho}}$; unwind after time $T$.
Sell when $y_k > \mu + c\,\frac{\sigma}{\sqrt{2\rho}}$; unwind after time $T$.
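A sketch of the entry rule and the unwind horizon in Python, assuming the first-passage-time mode formula above (function and variable names are mine):

```python
import numpy as np

def first_passage_mode(c):
    """Mode t_hat of the first-passage-time density of the standardized
    OU process from Z_0 = c down to 0."""
    s = c**2 - 3
    return 0.5 * np.log(1.0 + 0.5 * (np.sqrt(s**2 + 4 * c**2) + s))

def trade_signal(y, mu, sigma, rho, c=1.0):
    """Threshold entry with unwind horizon T = t_hat / rho."""
    band = c * sigma / np.sqrt(2 * rho)
    T = first_passage_mode(c) / rho
    if y < mu - band:
        return "buy", T
    if y > mu + band:
        return "sell", T
    return "hold", None
```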
Kalman Filter
The Kalman filter is an efficient recursive filter that estimates the state of a dynamic system from a series of incomplete and noisy measurements.
Conceptual Diagram
[Diagram omitted: as new measurements come in, the prediction at time t is corrected by the update at time t+1 for a better estimate.]
A Linear Discrete System
$$x_k = F_k x_{k-1} + B_k u_k + \omega_k$$
$F_k$: the state transition model applied to the previous state
$B_k$: the control-input model applied to the control vector $u_k$
$\omega_k \sim N\left(0, Q_k\right)$: the process noise, drawn from a multivariate normal distribution
Observations and Noises
$$z_k = H_k x_k + v_k$$
$H_k$: the observation model mapping the true state into the observation space
$v_k \sim N\left(0, R_k\right)$: the observation noise
Discrete System Diagram
[Diagram omitted.]
Prediction
Predicted (a priori) state estimate:
$$\hat{x}_{k|k-1} = F_k\hat{x}_{k-1|k-1} + B_k u_k$$
Predicted (a priori) estimate covariance:
$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k$$
Update
Measurement residual:
$$\tilde{y}_k = z_k - H_k\hat{x}_{k|k-1}$$
Residual covariance:
$$S_k = H_k P_{k|k-1} H_k^T + R_k$$
Optimal Kalman gain:
$$K_k = P_{k|k-1} H_k^T S_k^{-1}$$
Updated (a posteriori) state estimate:
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\tilde{y}_k$$
Updated (a posteriori) estimate covariance:
$$P_{k|k} = \left(I - K_k H_k\right) P_{k|k-1}$$
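A minimal numpy sketch of the predict/update recursion, mapping one-to-one onto the formulas above (matrix shapes are assumed conformable; function names are mine):

```python
import numpy as np

def kf_predict(x, P, F, B, u, Q):
    """Predict step: a priori state estimate and covariance."""
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    """Update step: correct the prediction with the measurement z."""
    y = z - H @ x_pred                       # measurement residual
    S = H @ P_pred @ H.T + R                 # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # optimal Kalman gain
    x = x_pred + K @ y                       # a posteriori state estimate
    P = (np.eye(len(x)) - K @ H) @ P_pred    # a posteriori covariance
    return x, P
```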
Computing the ‘Best’ State Estimate
Given $A$, $B$, $C$, $D$, we define the conditional variance
$$R_k = \Sigma_{k|k} \equiv \mathrm{E}\left[\left(x_k - \hat{x}_{k|k}\right)^2 \mid Y_k\right]$$
Start with $\hat{x}_{0|0} = y_0$, $R_0 = D^2$.
Predicted (a Priori) State Estimation
$$\begin{aligned}
\hat{x}_{k+1|k} &= \mathrm{E}\left[x_{k+1} \mid Y_k\right] \\
&= \mathrm{E}\left[A + B x_k + C\varepsilon_{k+1} \mid Y_k\right] \\
&= A + B\,\mathrm{E}\left[x_k \mid Y_k\right] \\
&= A + B\hat{x}_{k|k}
\end{aligned}$$
Predicted (a Priori) Variance
$$\begin{aligned}
\Sigma_{k+1|k} &= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k}\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(A + B x_k + C\varepsilon_{k+1} - A - B\hat{x}_{k|k}\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(B\left(x_k - \hat{x}_{k|k}\right) + C\varepsilon_{k+1}\right)^2 \mid Y_k\right] \\
&= B^2\Sigma_{k|k} + C^2
\end{aligned}$$
Minimize the A Posteriori Variance
Let the Kalman updating formula be
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K\left(y_{k+1} - \hat{x}_{k+1|k}\right)$$
We want to solve for $K$ such that the conditional variance is minimized:
$$\Sigma_{k+1|k+1} = \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right]$$
Solve for K
$$\begin{aligned}
\mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right]
&= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k} - K\left(y_{k+1} - \hat{x}_{k+1|k}\right)\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k} - K\left(x_{k+1} + D\omega_{k+1} - \hat{x}_{k+1|k}\right)\right)^2 \mid Y_k\right] \\
&= \mathrm{E}\left[\left(\left(1 - K\right)\left(x_{k+1} - \hat{x}_{k+1|k}\right) - K D\omega_{k+1}\right)^2 \mid Y_k\right] \\
&= \left(1 - K\right)^2\,\mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k}\right)^2 \mid Y_k\right] + K^2 D^2 \\
&= \left(1 - K\right)^2\,\Sigma_{k+1|k} + K^2 D^2
\end{aligned}$$
First Order Condition for K
$$\begin{aligned}
\frac{d}{dK}\left[\left(1 - K\right)^2\Sigma_{k+1|k} + K^2 D^2\right]
&= \frac{d}{dK}\left[\left(1 - 2K + K^2\right)\Sigma_{k+1|k} + K^2 D^2\right] \\
&= \left(-2 + 2K\right)\Sigma_{k+1|k} + 2K D^2 = 0
\end{aligned}$$
Optimal Kalman Filter
$$K_{k+1} = \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}$$
Updated (a Posteriori) State Estimation
So we have the "optimal" Kalman updating rule:
$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}\left(y_{k+1} - \hat{x}_{k+1|k}\right) = \hat{x}_{k+1|k} + \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\left(y_{k+1} - \hat{x}_{k+1|k}\right)$$
Updated (a Posteriori) Variance
$$\begin{aligned}
R_{k+1} = \Sigma_{k+1|k+1} &= \mathrm{E}\left[\left(x_{k+1} - \hat{x}_{k+1|k+1}\right)^2 \mid Y_{k+1}\right] \\
&= \left(1 - K_{k+1}\right)^2\Sigma_{k+1|k} + K_{k+1}^2 D^2 \\
&= \left(1 - \frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\right)^2\Sigma_{k+1|k} + \left(\frac{\Sigma_{k+1|k}}{\Sigma_{k+1|k} + D^2}\right)^2 D^2 \\
&= \frac{D^4\,\Sigma_{k+1|k} + D^2\,\Sigma_{k+1|k}^2}{\left(\Sigma_{k+1|k} + D^2\right)^2} \\
&= \frac{\Sigma_{k+1|k}\,D^2\left(D^2 + \Sigma_{k+1|k}\right)}{\left(\Sigma_{k+1|k} + D^2\right)^2} \\
&= \frac{\Sigma_{k+1|k}\,D^2}{\Sigma_{k+1|k} + D^2}
\end{aligned}$$
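Putting the scalar recursions of the last several slides together, a minimal sketch of the filter for this model (the function name is mine):

```python
import numpy as np

def scalar_kalman_filter(y, A, B, C, D):
    """Scalar Kalman filter for x_k = A + B*x_{k-1} + C*eps_k,
    y_k = x_k + D*omega_k, initialized with x_{0|0} = y_0, R_0 = D^2."""
    n = len(y)
    x_hat = np.empty(n)
    R = np.empty(n)
    x_hat[0], R[0] = y[0], D**2
    for k in range(n - 1):
        x_pred = A + B * x_hat[k]                    # a priori state estimate
        S_pred = B**2 * R[k] + C**2                  # a priori variance
        K = S_pred / (S_pred + D**2)                 # optimal Kalman gain
        x_hat[k + 1] = x_pred + K * (y[k + 1] - x_pred)
        R[k + 1] = S_pred * D**2 / (S_pred + D**2)   # a posteriori variance
    return x_hat, R
```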
Parameter Estimation
We need to estimate the parameters $\vartheta = \left(A, B, C, D\right)$ from the observable data before we can use the Kalman filter model.
We write down the likelihood function in terms of $\vartheta$, and then maximize it w.r.t. $\vartheta$.
Likelihood Function
A likelihood function (often simply the likelihood) is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values.
$$L\left(\vartheta; Y\right) = p\left(Y \mid \vartheta\right)$$
Maximum Likelihood Estimate
We find $\hat{\vartheta}$ such that $L\left(\vartheta; Y\right)$ is maximized given the observations.
Example Using the Normal Distribution
We want to estimate the mean of a sample of size $N$ drawn from a normal distribution.
$$f\left(y\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(y - \mu\right)^2}{2\sigma^2}\right), \quad \vartheta = \left(\mu, \sigma^2\right)$$
$$L_N\left(\vartheta; Y\right) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(y_i - \mu\right)^2}{2\sigma^2}\right)$$
Log-Likelihood
$$\log L_N\left(\vartheta; Y\right) = \sum_{i=1}^{N}\left[\log\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{\left(y_i - \mu\right)^2}{2\sigma^2}\right]$$
Maximizing the log-likelihood is equivalent to maximizing
$$-\sum_{i=1}^{N}\left(y_i - \mu\right)^2$$
The first order condition w.r.t. $\mu$ gives
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
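A quick numerical check of the first-order condition (illustrative data; numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.5, size=1000)

def log_likelihood(mu, sigma2, y):
    """The normal log-likelihood from the previous slide."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu)**2 / (2 * sigma2))

mu_hat = y.mean()       # the first-order condition: the sample average
sigma2_hat = y.var()    # MLE of sigma^2 (ddof = 0)
assert log_likelihood(mu_hat, sigma2_hat, y) >= log_likelihood(mu_hat + 0.1, sigma2_hat, y)
```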
Nelder-Mead
After we write down the likelihood function for the Kalman model in terms of $\vartheta = \left(A, B, C, D\right)$, we can run any multivariate optimization algorithm, e.g., Nelder-Mead, to search for $\hat{\vartheta}$:
$$\max_{\vartheta} L\left(\vartheta; Y\right)$$
The disadvantage is that the search may not converge well, hence may not land close to the optimal solution.
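A sketch of this approach, building the likelihood via the prediction-error decomposition (the innovation $y_k - \hat{x}_{k|k-1}$ is normal with variance $\Sigma_{k|k-1} + D^2$) and handing it to scipy's Nelder-Mead; function names and the starting point are mine:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, y):
    """Negative log-likelihood of the scalar state-space model via the
    prediction-error decomposition; theta = (A, B, C, D)."""
    A, B, C, D = theta
    x, R = y[0], D**2                      # x_{0|0} = y_0, R_0 = D^2
    ll = 0.0
    for yk in y[1:]:
        x_pred = A + B * x                 # a priori state estimate
        S = B**2 * R + C**2                # a priori variance
        V = S + D**2                       # innovation variance of y_k
        ll += -0.5 * (np.log(2 * np.pi * V) + (yk - x_pred)**2 / V)
        K = S / V                          # optimal Kalman gain
        x = x_pred + K * (yk - x_pred)
        R = S * D**2 / V                   # a posteriori variance
    return -ll

# given observations y, e.g.:
# res = minimize(neg_log_likelihood, x0=[0.0, 0.9, 0.1, 0.1],
#                args=(y,), method="Nelder-Mead")
```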
Marginal Likelihood
For the set of hidden states $X_t$, we write
$$L\left(\vartheta; Y\right) = p\left(Y \mid \vartheta\right) = \sum_X p\left(Y, X \mid \vartheta\right)$$
Assuming we know the conditional distribution of $X$, we could instead maximize
$$\max_{\vartheta}\,\mathrm{E}_X\left[L\left(\vartheta \mid Y, X\right)\right], \quad \text{or} \quad \max_{\vartheta}\,\mathrm{E}_X\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
The expectation is a weighted sum of the (log-)likelihoods, weighted by the probability of the hidden states.
The Q-Function
Where do we get the conditional distribution of $X_t$ from?
Suppose we somehow have an (initial) estimate of the parameters, $\vartheta_0$. Then the model has no unknowns, and we can compute the distribution of $X_t$.
$$Q\left(\vartheta \mid \vartheta_t\right) = \mathrm{E}_{X \mid Y, \vartheta_t}\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
EM Intuition
Suppose we know $\vartheta$: then we know the model completely, and we can find $X$.
Suppose we know $X$: then we can estimate $\vartheta$ by, e.g., maximum likelihood.
What do we do if we know neither $\vartheta$ nor $X$?
Expectation-Maximization Algorithm
Expectation step (E-step): compute the expected value of the log-likelihood function w.r.t. the conditional distribution of $X$ given $Y$ and $\vartheta_t$.
$$Q\left(\vartheta \mid \vartheta_t\right) = \mathrm{E}_{X \mid Y, \vartheta_t}\left[\log L\left(\vartheta \mid Y, X\right)\right]$$
Maximization step (M-step): find the parameters $\vartheta$ that maximize the Q-value.
$$\vartheta_{t+1} = \operatorname*{argmax}_{\vartheta}\,Q\left(\vartheta \mid \vartheta_t\right)$$
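Structurally, EM is just an alternation of these two steps until the parameters stop moving; a generic sketch with hypothetical e_step/m_step callbacks (names are mine):

```python
def em(y, theta0, e_step, m_step, n_iter=100, tol=1e-8):
    """Generic EM loop. e_step returns the sufficient statistics of
    X | Y, theta (the ingredients of the Q-function); m_step returns
    argmax_theta Q(theta | theta_t).  theta is a tuple of floats."""
    theta = theta0
    for _ in range(n_iter):
        stats = e_step(y, theta)           # E-step
        theta_new = m_step(y, stats)       # M-step
        if max(abs(u - v) for u, v in zip(theta_new, theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```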
EM Algorithms for Kalman Filter
Offline: the Shumway and Stoffer smoother approach, 1982.
Online: the Elliott and Krishnamurthy filter approach, 1999.
A Trading Algorithm
From $y_0, y_1, \ldots, y_N$, we estimate $\hat{\vartheta}_N$.
Decide whether to make a trade at $t = N$, unwind at $t = N + 1$, or some time later, e.g., $t = N + T$.
As $y_{N+1}$ arrives, estimate $\hat{\vartheta}_{N+1}$.
Repeat.
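A rolling-loop sketch of this algorithm (fit_mle and trade_signal stand for the hypothetical helpers sketched on earlier slides, passed in as callbacks; the window size is illustrative):

```python
import numpy as np

def online_pairs_trading(stream, fit_mle, trade_signal, tau, window=250, c=1.0):
    """Re-estimate theta_hat_N as each new observation arrives,
    then check the entry bands from the strategy slide."""
    ys = []
    for y in stream:
        ys.append(y)
        if len(ys) < window:
            continue
        A, B, C, D = fit_mle(np.array(ys))   # theta_hat_N from y_0..y_N
        mu = A / (1 - B)                     # long-term mean a/b
        rho = (1 - B) / tau                  # reversion rate b
        sigma = C / np.sqrt(tau)             # diffusion coefficient
        yield trade_signal(ys[-1], mu, sigma, rho, c)
```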
Results (1), (2), (3)
[Result charts omitted.]