Introduction to Algorithmic Trading Strategies
Lecture 2
Hidden Markov Trading Model
Haksun Li haksun.li@numericalmethod.com
www.numericalmethod.com
Outline
Carry trade
Momentum
Valuation
CAPM
Markov chain
Hidden Markov model
References
Idvall, Patrik and Jonsson, Conny. Algorithmic Trading: Hidden Markov Models on Foreign Exchange Data. University essay, Linköpings universitet, Matematiska institutionen, 2008.
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989.
FX Market
FX is the largest and most liquid of all financial markets, with several trillion US dollars traded a day.
FX is an over-the-counter (OTC) market with no central exchange.
The major players are:
Central banks
Investment and commercial banks
Non-bank financial institutions
Commercial companies
Retail traders
Electronic Markets
Reuters
EBS (Electronic Broking Services)
Currenex
FXCM
FXall
Hotspot
Lava FX
Fees
Brokerage
Transaction costs, e.g., the bid-ask spread
Basic Strategies
Carry trade
Momentum
Valuation
Carry Trade
Capture the difference between the interest rates of two currencies.
Borrow a currency with a low interest rate.
Buy another currency with a higher interest rate.
Take leverage, e.g., 10:1.
Risk: the exchange rate moves against the trade.
Popular trades: JPY vs. USD, USD vs. AUD
Worked until 2008.
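The carry-trade arithmetic above can be sketched as follows. This is only an illustration: the function name, the interest rates and the exchange-rate moves are hypothetical numbers, not figures from the lecture.

```python
def carry_trade_return(r_borrow, r_invest, fx_change, leverage):
    """Return on own capital of a leveraged carry trade over one period.

    r_borrow  : interest rate paid on the funding (low-rate) currency
    r_invest  : interest rate earned on the target (high-rate) currency
    fx_change : fractional change in the value of the target currency,
                measured in the funding currency (negative = adverse move)
    leverage  : position size divided by own capital, e.g. 10 for 10:1
    """
    # Per unit of notional: earn the target rate, convert back at the new
    # exchange rate, then repay the borrowed unit plus interest.
    pnl_per_unit = (1.0 + r_invest) * (1.0 + fx_change) - (1.0 + r_borrow)
    return leverage * pnl_per_unit


# Hypothetical numbers: borrow at 0.5%, invest at 5%, leverage 10:1.
flat_fx = carry_trade_return(0.005, 0.05, 0.0, 10)      # exchange rate unchanged
adverse_fx = carry_trade_return(0.005, 0.05, -0.05, 10)  # target currency falls 5%
print(flat_fx, adverse_fx)
```

With these assumed numbers, an unchanged exchange rate earns the leveraged rate differential, while a 5% adverse exchange-rate move more than wipes it out.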
Momentum
FX tends to trend.
Long when it goes up.
Short when it goes down.
Irrational traders
Slow digestion of information among disparate participants
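One very simple way to turn "long when it goes up, short when it goes down" into a rule is to take the sign of the price change over a lookback window. The sketch below assumes that rule; the function name, the lookback parameter and the prices are hypothetical.

```python
def momentum_signal(prices, lookback):
    """Return +1 (long), -1 (short) or 0 (flat) from the price change
    over the last `lookback` observations."""
    if len(prices) <= lookback:
        return 0  # not enough history to measure a trend
    change = prices[-1] - prices[-1 - lookback]
    # (change > 0) - (change < 0) is the sign of the change as an int.
    return (change > 0) - (change < 0)


print(momentum_signal([1.20, 1.22, 1.25], lookback=2))
print(momentum_signal([1.25, 1.22, 1.20], lookback=2))
```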
Purchasing Power Parity
McDonald’s hamburger as a currency.
The price of a burger in the USA = the price of a burger in Europe
E.g., USD1.25/burger = EUR1/burger
EURUSD = 1.25
FX Index
Deutsche Bank Currency Return (DBCR) Index
A combination of
Carry trade
Momentum
Valuation
CAPM
Individual expected excess return is proportional to the market expected excess return.
$E[r_i] - r_f = \beta_i \left( E[r_M] - r_f \right)$
$r_i$, $r_M$ are geometric returns
$r_f$ is an arithmetic return
Sensitivity: $\beta_i = \frac{Cov(r_i, r_M)}{Var(r_M)}$
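As an illustration of the sensitivity formula, the sketch below estimates beta from paired return samples using the sample covariance and variance, then plugs it into the CAPM relation. The function names and the return series are hypothetical; the asset returns are constructed as exactly twice the market returns so that the expected beta is obvious.

```python
def beta(asset_returns, market_returns):
    """Estimate beta = Cov(r_i, r_M) / Var(r_M) from paired samples."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equally long samples of length >= 2")
    mean_i = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((ri - mean_i) * (rm - mean_m)
              for ri, rm in zip(asset_returns, market_returns)) / (n - 1)
    var = sum((rm - mean_m) ** 2 for rm in market_returns) / (n - 1)
    return cov / var


def capm_expected_return(beta_i, expected_market_return, r_f=0.0):
    """E[r_i] = r_f + beta_i * (E[r_M] - r_f)."""
    return r_f + beta_i * (expected_market_return - r_f)


# Hypothetical data: the asset moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
asset = [2.0 * r for r in market]
b = beta(asset, market)
print(b, capm_expected_return(b, expected_market_return=0.05, r_f=0.01))
```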
Alpha
Alpha is the excess return above the expected excess return.
$\alpha = r_i - E[r_i]$
For FX, we usually assume $r_f = 0$.
Bayes Theorem
Bayes' theorem computes the posterior probability of a hypothesis H after evidence E is observed in terms of
the prior probability of H, $P(H)$
the prior probability of E, $P(E)$
the conditional probability of E given H, $P(E|H)$
$P(H|E) = \frac{P(E|H) P(H)}{P(E)}$
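A small numeric sketch of Bayes' theorem, assuming a hypothetical setting where H is "the market is in an up regime" and E is "today's return is positive". The probabilities are made up for illustration, and $P(E)$ is expanded with the law of total probability.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H) P(H) / P(E), with
    P(E) = P(E|H) P(H) + P(E|not H) (1 - P(H))."""
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence


# Hypothetical inputs: P(H) = 0.3, P(E|H) = 0.7, P(E|not H) = 0.4.
p_up_given_positive = posterior(0.3, 0.7, 0.4)
print(p_up_given_positive)
```

Under these assumed numbers, one positive return raises the probability of the up regime from 0.3 to 3/7.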
Markov Chain
States: s1: UP, s2: MEAN-REVERTING, s3: DOWN
Transition probabilities, $a_{ij} = P(q_t = s_j | q_{t-1} = s_i)$:
a11 = 0.4, a12 = 0.2, a13 = 0.4
a21 = 0.3, a22 = 0.2, a23 = 0.5
a31 = 0.25, a32 = 0.25, a33 = 0.5
Example: State Probability
What is the probability of observing $\Omega = \{s_3, s_1, s_1, s_1\}$?
$P(\Omega|\text{Model}) = P(s_3, s_1, s_1, s_1|\text{Model})$
$= P(s_3|\text{Model}) \times P(s_1|s_3, \text{Model}) \times P(s_1|s_1, \text{Model}) \times P(s_1|s_1, \text{Model})$
$= 1 \times 0.25 \times 0.4 \times 0.4$
$= 0.04$
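The same calculation can be checked in a few lines. The transition matrix is the one from the Markov chain slide; the function name and the convention of treating the first state as given (probability 1) are assumptions of this sketch.

```python
# Transition matrix from the slide: A[i][j] = P(next = s_{j+1} | current = s_{i+1}).
A = [
    [0.40, 0.20, 0.40],  # from s1: UP
    [0.30, 0.20, 0.50],  # from s2: MEAN-REVERTING
    [0.25, 0.25, 0.50],  # from s3: DOWN
]


def sequence_probability(states, transition, initial_prob=1.0):
    """Probability of a state path (1-based state labels), multiplying
    the initial probability by each one-step transition probability."""
    prob = initial_prob
    for prev, curr in zip(states, states[1:]):
        prob *= transition[prev - 1][curr - 1]
    return prob


p = sequence_probability([3, 1, 1, 1], A)
print(p)  # 1 x a31 x a11 x a11
```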
Markov Property
Given the current information available at time $t-1$, the history, e.g., the path, is irrelevant.
$P(q_t | q_{t-1}, \cdots, q_1) = P(q_t | q_{t-1})$
Consistent with the weak form of the efficient market hypothesis.
Hidden Markov Chain
Only observations are observable (duh).
World states may not be known (hidden).
We want to model the hidden states as a Markov Chain.
Two assumptions:
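Markov property of the hidden states
The observation depends only on the current hidden state:
$P(\omega_t | q_t, q_{t-1}, \cdots, q_1, \omega_{t-1}, \cdots, \omega_1) = P(\omega_t | q_t)$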
Markov Chain
States: s1: UP, s2: MEAN-REVERTING, s3: DOWN
The transition probabilities are unknown:
a11 = ?, a12 = ?, a13 = ?
a21 = ?, a22 = ?, a23 = ?
a31 = ?, a32 = ?, a33 = ?
Problems
Likelihood
Given the parameters, λ, and an observation sequence, Ω, compute $P(\Omega|\lambda)$.
Decoding
Given the parameters, λ, and an observation sequence, Ω, determine the best hidden sequence Q.
Learning
Given an observation sequence, Ω, and HMM structure, learn λ.
Likelihood Solutions
Likelihood By Enumeration
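$P(\Omega|\lambda) = \sum_{\text{all } Q} P(\Omega, Q|\lambda) = \sum_{\text{all } Q} P(\Omega|Q, \lambda) \times P(Q|\lambda)$
$P(\Omega|Q, \lambda) = \prod_{t=1}^{T} P(\omega_t | q_t, \lambda)$
$P(Q|\lambda) = \pi_{q_1} \times a_{q_1 q_2} \times a_{q_2 q_3} \times \cdots \times a_{q_{T-1} q_T}$
But… this is not computationally feasible: there are $N^T$ possible state sequences.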
Forward Procedure
$\alpha_t(i) = P(\omega_1, \omega_2, \cdots, \omega_t, q_t = s_i | \lambda)$
the probability of the partial observation sequence until time t and the system in state $s_i$ at time t.
Initialization
$\alpha_1(i) = \pi_i b_i(\omega_1)$
$b_i$: the conditional distribution of $\omega$ given state $s_i$
Induction
$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(\omega_{t+1})$
Termination
$P(\Omega|\lambda) = \sum_{i=1}^{N} \alpha_T(i)$, the likelihood
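A minimal sketch of the forward procedure for a discrete HMM, checked against the brute-force enumeration over all state sequences. The 2-state, 2-symbol model and the observation sequence are hypothetical numbers chosen for illustration, states and symbols are 0-indexed, and no scaling is applied, so this is only suitable for short sequences.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM (illustration only).
pi = [0.6, 0.4]                 # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(next state j | current state i)
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(symbol k | state i)
obs = [0, 1, 0]                 # observed symbol indices


def forward(pi, A, B, obs):
    """Return alpha, where alpha[t][i] = P(obs[0..t], state i at t)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]      # initialization
    for o in obs[1:]:                                       # induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def likelihood_by_enumeration(pi, A, B, obs):
    """Sum P(obs, Q) over all N^T state sequences Q (exponential cost)."""
    n, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(n), repeat=T):
        p = pi[Q[0]] * B[Q[0]][obs[0]]
        for t in range(1, T):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total


p_forward = sum(forward(pi, A, B, obs)[-1])                 # termination
p_brute = likelihood_by_enumeration(pi, A, B, obs)
print(p_forward, p_brute)
```

The forward pass costs on the order of $N^2 T$ operations, whereas the enumeration visits $N^T$ sequences; both give the same likelihood.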
Backward Procedure
$\beta_t(i) = P(\omega_{t+1}, \omega_{t+2}, \cdots, \omega_T | q_t = s_i, \lambda)$
the probability of the partial observations from time t+1 until time T, given that the system is in state $s_i$ at time t
Initialization
$\beta_T(i) = 1$
Induction
$\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(\omega_{t+1}) \beta_{t+1}(j)$
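A minimal sketch of the backward procedure for a discrete HMM. The model and observations are hypothetical, indices are 0-based, and there is no scaling. As a sanity check, the likelihood is recovered as $\sum_i \pi_i b_i(\omega_1) \beta_1(i)$.

```python
# Hypothetical 2-state, 2-symbol HMM (illustration only).
pi = [0.6, 0.4]                 # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(next state j | current state i)
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(symbol k | state i)
obs = [0, 1, 0]                 # observed symbol indices


def backward(A, B, obs):
    """Return beta, where beta[t][i] = P(obs[t+1..T-1] | state i at t)."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]          # initialization at the last step
    for t in range(T - 2, -1, -1):                # induction, backwards in time
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


beta = backward(A, B, obs)
# The likelihood can also be read off the first backward variables.
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
print(p_backward)
```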
Decoding Solutions
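Given the observations and model, the probability of the system in state $s_i$ at time t is:
$\gamma_t(i) = P(q_t = s_i | \Omega, \lambda) = \frac{P(q_t = s_i, \Omega | \lambda)}{P(\Omega|\lambda)} = \frac{\alpha_t(i) \beta_t(i)}{P(\Omega|\lambda)} = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$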
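A sketch of how the state posteriors can be computed by combining the forward and backward variables, followed by picking the individually most likely state at each time. The discrete HMM and the observations are hypothetical, indices are 0-based, and no scaling is applied.

```python
# Hypothetical 2-state, 2-symbol HMM (illustration only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(next state j | current state i)
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(symbol k | state i)
obs = [0, 1, 0]


def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def state_posteriors(pi, A, B, obs):
    """gamma[t][i] = P(state i at t | all observations)."""
    gamma = []
    for a_t, b_t in zip(forward(pi, A, B, obs), backward(A, B, obs)):
        weights = [a * b for a, b in zip(a_t, b_t)]
        norm = sum(weights)          # equals P(observations) at every t
        gamma.append([w / norm for w in weights])
    return gamma


gamma = state_posteriors(pi, A, B, obs)
# Individually most likely state at each time step.
most_likely = [max(range(len(g)), key=g.__getitem__) for g in gamma]
print(gamma, most_likely)
```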
Maximizing The Expected Number Of States
$q_t = \arg\max_{1 \le i \le N} \gamma_t(i)$
This determines the most likely state at every instant, t, without regard to the probability of occurrence of sequences of states.
Viterbi Algorithm
The maximal probability of the system travelling these states and generating these observations:
$\delta_t(i) = \max_{q_1, \cdots, q_{t-1}} P(q_1, q_2, \cdots, q_t = s_i, \omega_1, \cdots, \omega_t | \lambda)$
Viterbi Algorithm
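Initialization
$\delta_1(i) = \pi_i b_i(\omega_1)$
Recursion
$\delta_t(j) = \max_i \left[ \delta_{t-1}(i) a_{ij} \right] b_j(\omega_t)$, the probability of the most probable state sequence for the first t observations, ending in state j
$\psi_t(j) = \arg\max_i \delta_{t-1}(i) a_{ij}$, the state chosen at step t as the best predecessor of state j
Termination
$P^* = \max_i \delta_T(i)$
$q_T^* = \arg\max_i \delta_T(i)$
Backtracking
$q_t^* = \psi_{t+1}(q_{t+1}^*)$, for $t = T-1, \cdots, 1$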
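A minimal sketch of the Viterbi algorithm for a discrete HMM, including the backtracking step that recovers the most probable state path from the stored predecessors. The model and observations are hypothetical, indices are 0-based, and probabilities are multiplied directly rather than in log space, so it is only meant for short sequences.

```python
# Hypothetical 2-state, 2-symbol HMM (illustration only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(next state j | current state i)
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(symbol k | state i)
obs = [0, 1, 0]


def viterbi(pi, A, B, obs):
    """Return (probability of the best path, best state path)."""
    n, T = len(pi), len(obs)
    delta = [[pi[i] * B[i][obs[0]] for i in range(n)]]   # initialization
    psi = [[0] * n]                                      # no predecessor at t = 0
    for t in range(1, T):                                # recursion
        d_t, p_t = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t - 1][i] * A[i][j])
            p_t.append(best_i)
            d_t.append(delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]])
        delta.append(d_t)
        psi.append(p_t)
    last = max(range(n), key=lambda i: delta[-1][i])     # termination
    path = [last]
    for t in range(T - 1, 0, -1):                        # backtracking
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[-1][last], path


p_star, q_star = viterbi(pi, A, B, obs)
print(p_star, q_star)
```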
Learning Solutions
As A Maximization Problem
Our objective is to find λ that maximizes $P(\Omega|\lambda)$.
For any given λ, we can compute $P(\Omega|\lambda)$.
Then solve a maximization problem.
Algorithm: Nelder-Mead.
Baum-Welch
$\xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | \Omega, \lambda)$
the probability of being in state $s_i$ at time $t$, and state $s_j$ at time $t+1$, given the model and the observation sequence
Xi
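$\xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | \Omega, \lambda) = \frac{P(q_t = s_i, q_{t+1} = s_j, \Omega | \lambda)}{P(\Omega|\lambda)} = \frac{\alpha_t(i) a_{ij} b_j(\omega_{t+1}) \beta_{t+1}(j)}{P(\Omega|\lambda)}$
$\gamma_t(i) = P(q_t = s_i | \Omega, \lambda) = \sum_{j=1}^{N} \xi_t(i, j)$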
Estimation Equation
By summing up over time,
$\sum_t \gamma_t(i)$ ~ the expected number of times state $s_i$ is visited
$\sum_t \xi_t(i, j)$ ~ the expected number of times the system goes from state $s_i$ to state $s_j$
Thus, the parameters λ are:
$\pi_i = \gamma_1(i)$, initial state probabilities
$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$, transition probabilities
$b_j(v_k) = \frac{\sum_{t=1, \omega_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$, conditional probabilities
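A sketch of one Baum-Welch re-estimation step for a discrete HMM, iterated a few times. Everything here is an illustration under stated assumptions: the starting parameters and the observation sequence are hypothetical, indices are 0-based, no scaling is applied, and the emission update sums over all T observations as in Rabiner's tutorial.

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation step. Returns (new_pi, new_A, new_B, likelihood),
    where likelihood is P(obs) under the *input* parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    lik = sum(alpha[-1])
    # gamma[t][i]: posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_pi, new_A, new_B, lik


# Hypothetical starting guess and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0, 0, 1, 1, 0, 1]

likelihoods = []
for _ in range(5):
    pi, A, B, lik = baum_welch_step(pi, A, B, obs)
    likelihoods.append(lik)
print(likelihoods)
```

Each step should not decrease the likelihood, which is a convenient check that the updates are wired up correctly.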
Estimation Procedure
Guess what λ is.
Compute the next estimate, $\lambda_i$, using the estimation equations.
Repeat until the estimates converge.
Practically, we can estimate the initial λ by Nelder-Mead to get “closer” to the solution.
Conditional Probabilities
Our formulation so far assumes discrete conditional probabilities.
The formulations that take other probability density functions are similar.
But the computations are more complicated, and the solutions may not even be analytical, e.g., for the t-distribution.
Heavy Tail Distributions
t-distribution
Gaussian mixture model: a weighted sum of Normal distributions
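To illustrate why a Gaussian mixture gives heavier tails, the sketch below compares the density of a two-component mixture with that of a single Normal distribution of the same variance, far from the mean. The weights and standard deviations are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Density of a Gaussian mixture: a weighted sum of Normal densities."""
    return sum(w * normal_pdf(x, mu, s)
               for w, mu, s in zip(weights, mus, sigmas))


# Hypothetical mixture: 90% calm regime (sigma = 1), 10% turbulent (sigma = 3).
weights, mus, sigmas = [0.9, 0.1], [0.0, 0.0], [1.0, 3.0]
# Both components have zero mean, so the variance is the weighted sum.
same_var_sigma = math.sqrt(sum(w * s * s for w, s in zip(weights, sigmas)))

tail_mixture = mixture_pdf(5.0, weights, mus, sigmas)
tail_normal = normal_pdf(5.0, 0.0, same_var_sigma)
print(tail_mixture, tail_normal)
```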
Trading Ideas
Compute the next state.
Compute the expected return.
Long (short) when expected return > (<) 0.
Long (short) when expected return > c (< −c).
c = the transaction costs
Any other ideas?
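The first few steps above can be sketched as follows, assuming we already have the probabilities of the current hidden state (for example, normalized forward variables) and a mean return for each state. The transition matrix is the one from the Markov chain slide; the current state probabilities, the per-state mean returns and the cost levels are hypothetical.

```python
# Transition matrix from the Markov chain slide (UP, MEAN-REVERTING, DOWN).
A = [
    [0.40, 0.20, 0.40],
    [0.30, 0.20, 0.50],
    [0.25, 0.25, 0.50],
]


def next_state_distribution(current_probs, A):
    """P(next state j) = sum_i P(current state i) * a_ij."""
    n = len(current_probs)
    return [sum(current_probs[i] * A[i][j] for i in range(n))
            for j in range(n)]


def expected_return(state_probs, state_mean_returns):
    """Probability-weighted average of the per-state mean returns."""
    return sum(p * mu for p, mu in zip(state_probs, state_mean_returns))


def signal(exp_ret, cost=0.0):
    """+1 = long, -1 = short, 0 = stay flat inside the cost band."""
    if exp_ret > cost:
        return 1
    if exp_ret < -cost:
        return -1
    return 0


current = [0.2, 0.3, 0.5]            # hypothetical current state probabilities
means = [0.004, 0.0, -0.004]         # hypothetical mean return in each state
nxt = next_state_distribution(current, A)
er = expected_return(nxt, means)
print(nxt, er, signal(er, cost=0.0005), signal(er, cost=0.001))
```

With these assumed inputs the expected return is slightly negative, so the rule goes short for the smaller cost level and stays flat for the larger one.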
Experiment Setup
EURUSD daily prices from 2003 to 2006.
6 unknown factors.
λ is estimated on a rolling basis.
Evaluations:
Hypothesis testing
Sharpe ratio
VaR
Max drawdown
Alpha
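Two of the evaluation measures listed above can be sketched as follows. The annualization factor of 252 trading days, the zero risk-free rate and the sample returns are assumptions of this sketch, not settings taken from the experiment.

```python
import math


def sharpe_ratio(returns, periods_per_year=252, r_f=0.0):
    """Annualized Sharpe ratio from per-period returns
    (sample standard deviation, constant per-period risk-free rate)."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    excess = [r - r_f for r in returns]
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve,
    as a fraction of the peak."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Hypothetical return series.
sr = sharpe_ratio([0.01, 0.02, 0.03])
mdd = max_drawdown([0.10, -0.50, 0.20])
print(sr, mdd)
```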
Best Discrete Case
Best Continuous Case
Results
More data (the 6 factors) do not always help (esp. for the discrete case).
Parameters unstable.
TODOs
How can we improve the HMM model(s)? Ideas?