Mohamed Saidane¹ and Christian Lavergne²

¹ I3M, University of Montpellier II, CC 051 - 34095 France, saidane@math.univ-montp2.fr
² I3M, Christian.Lavergne@math.univ-montp2.fr
Summary. In this paper we introduce a state-space model based on an underlying hidden Markov model (HMM) with a factor analysis observation process. The HMM generates a piecewise-constant state evolution process, and the observations are produced from the state vectors by a conditionally heteroscedastic factor analysis observation process. An expectation-maximization (EM) algorithm based on a switching Kalman filter approach, combined with a generalized pseudo-Bayesian method, is derived for maximum likelihood estimation. The regimes, the common factors and their volatilities are assumed to be unobservable, so inference must be carried out from the observable process. Extensive Monte Carlo simulations and preliminary experiments on a dataset of weekly average returns of closing spot prices for eight European currencies show promising results.
Key words: Dynamic Factor Analysis, EM Algorithm, Extended Switching Kalman Filter, GQARCH Processes, HMM.
Over the last two decades, multifactor analysis has become increasingly attractive in the economic literature. In a factor model, the dynamics of a multivariate time series can be parsimoniously described by a small number of factors. Factor analysis has proved useful for understanding the dynamics of financial markets, macroeconomic business cycles and the structure of the consumer demand system. It has been used in finance and econometrics as an alternative to the Capital Asset Pricing Model since the early 1960s.
Traditionally, these issues were considered in a static framework, but the emphasis has recently shifted toward inter-temporal asset pricing models in which agents' decisions are based on the distribution of returns conditional on the available information, which changes over time. Several researchers have used Factor-ARCH models to provide a plausible and parsimonious parameterization of the time-varying covariance structure of asset returns. [ENG92] apply such structures to study the dynamic behavior of the term structure of interest rates. [DEB88] use a latent factor ARCH model to describe the dynamics of exchange rate volatility. [ENG93] use the factor ARCH model to test for common volatility in international equity markets.
1558 Mohamed Saidane and Christian Lavergne
In this paper we propose a Markov-switching common factor approach that allows the simultaneous estimation of the common factors, which underlie the common dynamics of several financial time series, and of the conditional regime probabilities corresponding to the states through which the factors evolve. In other words, this approach incorporates nonlinear dynamics into the extraction of the common factors by combining the conditionally heteroscedastic factor models proposed in the above literature with HMMs. This makes it possible to reflect two defining features of the latent volatility: co-movement among conditionally heteroscedastic financial returns and switching between different unobservable regimes.
The model that we propose assumes that excess returns depend both on unobservable factors that are common across the multivariate time series and on unobservable regimes that describe the different states of volatility. In this framework, we allow a dynamic structure for the conditional variances of the underlying factors in order to investigate possible time-varying latent processes and their implications for modeling changes in covariance matrices over time. This new specification is defined by:
\[
S_t \sim P(S_t = j / S_{t-1} = i), \qquad t = 1, \dots, n \ \text{ and } \ i, j = 1, \dots, m
\]
\[
f_{s_t} = H_{s_t}^{1/2} f_t^{*}, \qquad \text{where } f_t^{*} \sim N(0, I_k)
\]
\[
y_t = X_{s_t} f_{s_t} + \varepsilon_{s_t}, \qquad \text{with } \varepsilon_{s_t} \sim N(\theta_{s_t}, \Psi_{s_t})
\]
where $S_t \sim P(S_t = j / S_{t-1} = i)$ is a homogeneous hidden Markov chain indicating the state or regime at date $t$, and $y_t$ is a $(q \times 1)$ random vector of observable variables (financial returns in our case). The HMM state transition probabilities from state $i$ to state $j$ are denoted by $p_{ij}$. In a given state $S_t = j$ ($j = 1, \dots, m$), $X_j$ are the $(q \times k)$ factor loadings matrices; $\theta_j$ and $\Psi_j$ are, respectively, the $(q \times 1)$ mean vectors and the $(q \times q)$ diagonal, positive-definite covariance matrices of the $(q \times 1)$ vectors of idiosyncratic noises $\varepsilon_t$; $0$ and $H_{jt}$ are, respectively, the $(k \times 1)$ mean vectors and the $(k \times k)$ diagonal, positive-definite covariance matrices of the latent common factors $f_{jt}$. We suppose here that the common variances are time varying and that their parameters change according to the regime. In particular, we suppose that the variances of the common factors follow switching Generalized Quadratic Autoregressive Conditionally Heteroskedastic processes, GQARCH(1,1), the $l$-th diagonal element of the matrix $H_{jt}$ under a regime $S_t = j$, given $S_{t-1} = i$, being
\[
h^{(j)}_{lt} = w_{lj} + \gamma_{lj} f^{(i)}_{lt-1} + \alpha_{lj} f^{(i)2}_{lt-1} + \delta_{lj} h^{(i)}_{lt-1}
\]
for $l = 1, \dots, k$. To guarantee the identification of the model, we suppose that $q \geq k$ and $\mathrm{rank}(X_j) = k \ \forall j$. We also suppose that the common and idiosyncratic factors are uncorrelated, and that $f_t$ and $\varepsilon_{t'}$ are mutually independent for all $t, t'$.
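To make the specification above concrete, the following minimal sketch simulates a path from the model in the special case of a single common factor (k = 1). The function name `simulate_one_factor`, the argument layout and the start-up values for the factor and its variance are assumptions made for this illustration only; they are not taken from the paper. The GQARCH(1,1) recursion is applied to the realized factor and variance of the previous date.

```python
import math
import random


def simulate_one_factor(n, P, pi, X, theta, psi, garch, seed=0):
    """Simulate n observations of the switching model with k = 1 factor.

    P[i][j]  : transition probability p_ij
    pi[j]    : initial regime probabilities
    X[j]     : q factor loadings under regime j
    theta[j] : q idiosyncratic means under regime j
    psi[j]   : q idiosyncratic variances (diagonal of Psi_j)
    garch[j] : (w, gamma, alpha, delta) GQARCH(1,1) parameters of regime j;
               w > 0, alpha > 0, delta >= 0 and gamma**2 < 4 * alpha * w
               keep the simulated variance positive
    """
    rng = random.Random(seed)
    m, q = len(P), len(X[0])
    states, factors, variances, ys = [], [], [], []
    f_prev, h_prev = 0.0, garch[0][0]  # start-up values: an assumption
    for t in range(n):
        weights = pi if t == 0 else P[states[-1]]
        s = rng.choices(range(m), weights=weights)[0]
        w, gamma, alpha, delta = garch[s]
        # h_t = w + gamma * f_{t-1} + alpha * f_{t-1}^2 + delta * h_{t-1}
        h = w + gamma * f_prev + alpha * f_prev ** 2 + delta * h_prev
        f = math.sqrt(h) * rng.gauss(0.0, 1.0)  # f_t = H_t^{1/2} f_t^*
        # y_t = X_j f_t + eps_t, with eps_t ~ N(theta_j, Psi_j)
        y = [X[s][r] * f + rng.gauss(theta[s][r], math.sqrt(psi[s][r]))
             for r in range(q)]
        states.append(s)
        factors.append(f)
        variances.append(h)
        ys.append(y)
        f_prev, h_prev = f, h
    return states, factors, variances, ys
```

Restricting the sketch to k = 1 keeps it free of matrix algebra; with several factors the same recursion would be applied to each diagonal element of the factor covariance matrix.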
Conditionally heteroskedastic factorial HMMs for time series in finance 1559

The model developed above can be regarded as a random field with indices $i = 1, \dots, q$, $t = 1, \dots, n$ and $j = 1, \dots, m$. Therefore, it has a switching state-space representation, with $f_t$ as the continuous state variables. The measurement and transition equations are, respectively, given by:
\[
y_t = \theta_{s_t} + X_{s_t} f_{s_t} + \varepsilon_{s_t}
\]
\[
f_{s_t} = 0 \cdot f_{s_{t-1}} + f_{s_t}
\]
For the implementation of the filtering and smoothing algorithms, we start by introducing some notation.
\[
\begin{aligned}
f^{i(j)}_{t/\tau} &= E[f_t / Y_{1:\tau}, S_{t-1} = i, S_t = j] \\
f^{(j)k}_{t/\tau} &= E[f_t / Y_{1:\tau}, S_t = j, S_{t+1} = k] \\
f^{j}_{t/\tau} &= E[f_t / Y_{1:\tau}, S_t = j] \\
h^{j}_{lt/\tau} &= Var(f_{lt} / Y_{1:\tau}, S_t = j) \\
h^{i(j)}_{lt/t-1} &= Var(f_{lt} / Y_{1:t-1}, S_{t-1} = i, S_t = j) \\
M_{t-1,t/\tau}(i, j) &= p(S_{t-1} = i, S_t = j / Y_{1:\tau}) \\
M_{t/\tau}(j) &= p(S_t = j / Y_{1:\tau})
\end{aligned}
\]
We perform the following steps in sequence.
\[
f^{i(j)}_{t/t-1} = 0 \cdot f^{i}_{t-1/t-1} = 0 \qquad \forall\, i, j = 1, \dots, m \tag{1}
\]
and
\[
h^{i(j)}_{lt/t-1} = w_{lj} + \gamma_{lj} f^{i}_{lt-1/t-1} + \alpha_{lj} \left[ \left( f^{i}_{lt-1/t-1} \right)^{2} + h^{i}_{lt-1/t-1} \right] + \delta_{lj} h^{i}_{lt-1/t-2} \tag{2}
\]
Then we compute the prediction error $e_t(i, j) = y_t - \theta_j - X_j f^{i(j)}_{t/t-1}$, the variance of the error $\Sigma^{i(j)}_{t/t-1} = X_j H^{i(j)}_{t/t-1} X'_j + \Psi_j$, the Kalman gain matrix $K_t(i, j) = H^{i(j)}_{t/t-1} X'_j \Sigma^{i(j)\,-1}_{t/t-1}$ and the likelihood of this observation $L_t(i, j) = N[0, \Sigma^{i(j)}_{t/t-1}]$ (the Gaussian density evaluated at $e_t(i, j)$), and we update our estimates of the mean and variance:
\[
f^{i(j)}_{t/t} = f^{i(j)}_{t/t-1} + K_t(i, j)\, e_t(i, j) \tag{3}
\]
\[
H^{i(j)}_{t/t} = H^{i(j)}_{t/t-1} - K_t(i, j)\, \Sigma^{i(j)}_{t/t-1}\, K_t(i, j)' \tag{4}
\]
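Equations (1) to (4) can be sketched for one pair of regimes (i, j) as follows, under the simplifying assumption of a single common factor (k = 1) and a diagonal idiosyncratic covariance. In that case the error covariance is a rank-one update of a diagonal matrix, so its inverse and determinant have closed forms (Sherman-Morrison formula and matrix determinant lemma) and no matrix library is needed. The function name `kalman_step_one_factor` and its argument names are hypothetical.

```python
import math


def kalman_step_one_factor(y, x, theta, psi, f_prev_filt, h_prev_filt,
                           h_prev_pred, garch):
    """One prediction/update step for a regime pair (i, j) with k = 1.

    y, x, theta, psi : observation, loadings X_j, means theta_j and the
                       diagonal of Psi_j (lists of length q)
    f_prev_filt      : filtered factor mean of regime i at t-1
    h_prev_filt      : filtered factor variance of regime i at t-1
    h_prev_pred      : predicted factor variance of regime i at t-1
    garch            : (w, gamma, alpha, delta) of regime j
    """
    w, gamma, alpha, delta = garch
    # Equation (2): predicted factor variance (the predicted mean is 0 by (1)).
    h_pred = (w + gamma * f_prev_filt
              + alpha * (f_prev_filt ** 2 + h_prev_filt)
              + delta * h_prev_pred)
    # Prediction error e_t(i, j) = y_t - theta_j - X_j * 0.
    e = [yr - th for yr, th in zip(y, theta)]
    c = sum(xr * xr / pr for xr, pr in zip(x, psi))             # x' Psi^-1 x
    b = sum(xr * er / pr for xr, er, pr in zip(x, e, psi))      # x' Psi^-1 e
    denom = 1.0 + h_pred * c
    # Equations (3) and (4) with Sigma = h_pred * x x' + Psi inverted in
    # closed form: K e = h_pred * b / denom and H_filt = h_pred / denom.
    f_filt = h_pred * b / denom
    h_filt = h_pred / denom
    # Gaussian log-density of e under N(0, Sigma).
    log_det = sum(math.log(pr) for pr in psi) + math.log(denom)
    quad = sum(er * er / pr for er, pr in zip(e, psi)) - h_pred * b * b / denom
    loglik = -0.5 * (len(y) * math.log(2.0 * math.pi) + log_det + quad)
    return f_filt, h_filt, h_pred, loglik
```

Returning the log-likelihood rather than the likelihood itself is a design choice of this sketch; it avoids underflow when q is large, and the likelihood can be recovered with `math.exp`.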
The fundamental problem with switching Kalman filters is that the belief state grows exponentially with time. To deal with this problem we use a collapsing technique, which consists in approximating the mixture of $m^t$ Gaussians with a mixture of $r$ Gaussians. This is called the Generalized Pseudo Bayesian algorithm of order $r$ (see e.g., [BAR93], [KIM94]). When $r = 1$, we approximate a mixture of Gaussians with a single Gaussian using moment matching; this can be shown (e.g., [LAU96]) to be the best (in the Kullback-Leibler sense) single Gaussian approximation. For the implementation of this algorithm we calculate the probabilities
\[
M_{t/t}(j) = \sum_{i=1}^{m} M_{t-1,t/t}(i, j) \quad \text{and} \quad Z_{i/j}(t) = p(S_{t-1} = i / S_t = j, Y_{1:t}) = M_{t-1,t/t}(i, j) / M_{t/t}(j),
\]
where
\[
M_{t-1,t/t}(i, j) = \frac{L_t(i, j)\, p_{ij}\, M_{t-1/t-1}(i)}{\sum_{i=1}^{m} \sum_{j=1}^{m} L_t(i, j)\, p_{ij}\, M_{t-1/t-1}(i)}
\]
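The three probabilities defined above can be computed in a few lines once the pairwise likelihoods are available. The sketch below is only an illustration; the function name `regime_posteriors` and the nested-list layout are assumptions.

```python
def regime_posteriors(L, P, M_prev):
    """Joint, marginal and backward-conditional regime probabilities.

    L[i][j]   : likelihood L_t(i, j) of the current observation
    P[i][j]   : transition probability p_ij
    M_prev[i] : filtered probability M_{t-1/t-1}(i)
    Returns (joint, M, Z) with joint[i][j] = M_{t-1,t/t}(i, j),
    M[j] = M_{t/t}(j) and Z[i][j] = Z_{i/j}(t).
    """
    m = len(P)
    joint = [[L[i][j] * P[i][j] * M_prev[i] for j in range(m)]
             for i in range(m)]
    total = sum(sum(row) for row in joint)      # normalizing constant
    joint = [[v / total for v in row] for row in joint]
    M = [sum(joint[i][j] for i in range(m)) for j in range(m)]
    Z = [[joint[i][j] / M[j] for j in range(m)] for i in range(m)]
    return joint, M, Z
```

For each current regime j, the weights `Z[i][j]` sum to one over i, which is what the collapsing step relies on.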
Finally, we update our estimates of the mean and volatilities.
\[
f^{j}_{t/t} = \sum_{i=1}^{m} Z_{i/j}(t)\, f^{i(j)}_{t/t}
\]
\[
h^{j}_{lt/t} = \sum_{i=1}^{m} Z_{i/j}(t)\, h^{i(j)}_{lt/t} + \sum_{i=1}^{m} Z_{i/j}(t) \left[ f^{i(j)}_{lt/t} - f^{j}_{lt/t} \right] \left[ f^{i(j)}_{lt/t} - f^{j}_{lt/t} \right]'
\]
\[
h^{j}_{lt/t-1} = \sum_{i=1}^{m} Z_{i/j}(t)\, h^{i(j)}_{lt/t-1} + \sum_{i=1}^{m} Z_{i/j}(t) \left[ f^{i(j)}_{lt/t-1} - f^{j}_{lt/t-1} \right] \left[ f^{i(j)}_{lt/t-1} - f^{j}_{lt/t-1} \right]'
\]
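The collapsing step is a moment-matching operation: for a given current regime j, the mixture over the previous regime i is replaced by a single Gaussian with the same mean and variance. A minimal scalar sketch follows; the name `collapse` is hypothetical, and the function is meant to be applied to one factor component at a time.

```python
def collapse(weights, means, variances):
    """Moment-match a scalar Gaussian mixture with a single Gaussian.

    weights[i]   : Z_{i/j}(t), summing to one over i
    means[i]     : component means, e.g. the l-th element of f^{i(j)}_{t/t}
    variances[i] : component variances, e.g. h^{i(j)}_{lt/t}
    """
    mean = sum(z * mu for z, mu in zip(weights, means))
    # Within-component variance plus the spread of the component means.
    var = sum(z * (v + (mu - mean) ** 2)
              for z, mu, v in zip(weights, means, variances))
    return mean, var
```

For instance, two equally weighted components with means -1 and 1 and unit variances collapse to a Gaussian with mean 0 and variance 2.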
Given the degenerate nature of the (time-series) transition equation, the smoother gain matrix is always null, hence smoothing is unnecessary in this case: $f^{(j)k}_{t/n} = f^{j}_{t/t}$ and $H^{(j)k}_{t/n} = H^{j}_{t/t}$. For updating the parameters, we need the probabilities $M_{t,t+1/n}(j, k) = U^{j/k}_{t/t+1}\, M_{t+1/n}(k)$ and $M_{t/n}(j) = \sum_{k=1}^{m} M_{t,t+1/n}(j, k)$, where
\[
U^{j/k}_{t/t+1} = p(S_t = j / S_{t+1} = k, Y_{1:n}) \simeq \frac{M_{t/t}(j)\, p_{jk}}{\sum_{j'=1}^{m} M_{t/t}(j')\, p_{j'k}}
\]
The approximation arises because $S_t$ is not conditionally independent of the future evidence $y_{t+1}, \dots, y_n$, given $S_{t+1}$. This approximation will not be too bad provided future evidence does not contain much information about $S_t$ beyond what is contained in $S_{t+1}$.
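The backward recursion for the smoothed regime probabilities can be sketched as follows. The function name `smooth_regime_probs` and the list-based layout are assumptions; the recursion starts from the last filtered probabilities and uses the approximation of the backward conditional probability given above.

```python
def smooth_regime_probs(M_filt, P):
    """Approximate smoothed regime probabilities by a backward pass.

    M_filt[t][j] : filtered probabilities M_{t/t}(j), t = 0, ..., n-1
    P[j][k]      : transition probability p_jk
    Returns (M_smooth, pair), where M_smooth[t][j] approximates M_{t/n}(j)
    and pair[t][j][k] approximates M_{t,t+1/n}(j, k) for t = 0, ..., n-2.
    """
    n, m = len(M_filt), len(P)
    M_smooth = [None] * n
    M_smooth[-1] = list(M_filt[-1])   # at t = n, filtered equals smoothed
    pair = [None] * (n - 1)
    for t in range(n - 2, -1, -1):
        joint = [[0.0] * m for _ in range(m)]
        for k in range(m):
            denom = sum(M_filt[t][jp] * P[jp][k] for jp in range(m))
            for j in range(m):
                U = M_filt[t][j] * P[j][k] / denom   # U^{j/k}_{t/t+1}
                joint[j][k] = U * M_smooth[t + 1][k]
        M_smooth[t] = [sum(joint[j]) for j in range(m)]
        pair[t] = joint
    return M_smooth, pair
```

Each `pair[t]` sums to one over (j, k), and its row sums give the smoothed marginal at date t, so both quantities are available for the parameter updates.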
The joint likelihood of the observation sequence $Y$, the continuous state vector sequence $F$ and the HMM state sequence $S$ is given by:
\[
p(Y, F, S) = p(S_1) \prod_{t=2}^{n} p(S_t / S_{t-1}) \prod_{t=1}^{n} p(f_t / S_t, D_{1:t-1})\, p(y_t / f_t, S_t, D_{1:t-1})
\]
where $D_{1:t-1} = \{Y_{1:t-1}, F_{1:t-1}, S_{1:t-1}\}$ is the information set at time $t-1$, $p(S_1) = \pi_{s_1}$ is the initial state probability and $p(S_t / S_{t-1}) = p_{s_{t-1} s_t}$ are the transition probabilities. The model parameters can be obtained by maximizing the conditional expectation of this complete log-likelihood function with respect to the subset of parameters $\Theta_{1j} = \{\pi_j, p_{ij}, X_j, \theta_j, \Psi_j\}$:
\[
\hat{\pi}_j = \frac{M_{1/n}(j)}{\sum_{i=1}^{m} M_{1/n}(i)} \quad \text{and} \quad \hat{p}_{ij} = \frac{\sum_{t=2}^{n} M_{t-1,t/n}(i, j)}{\sum_{t=2}^{n} M_{t-1/n}(i)}
\]
\[
\hat{x}_{jl} = \left[ \sum_{t=1}^{n} M_{t/n}(j)\, (y_{tl} - \theta_{jl})\, f^{j}_{t/n} \right]' \left[ \sum_{t=1}^{n} M_{t/n}(j) \left[ H^{j}_{t/n} + f^{j}_{t/n} f^{j\,\prime}_{t/n} \right] \right]^{-1}
\]
\[
\hat{\theta}_j = \frac{\sum_{t=1}^{n} M_{t/n}(j)\, (y_t - X_j f^{j}_{t/n})}{\sum_{t=1}^{n} M_{t/n}(j)}
\]
\[
\hat{\Psi}_j = \frac{\sum_{t=1}^{n} M_{t/n}(j)\, \mathrm{diag} \left[ \left( \tilde{y}_{jt} - X_j f^{j}_{t/n} \right) \left( \tilde{y}_{jt} - X_j f^{j}_{t/n} \right)' + X_j H^{j}_{t/n} X'_j \right]}{\sum_{t=1}^{n} M_{t/n}(j)}
\]
where $\tilde{y}_{jt} = (y_t - \theta_j)$, $x_{jl}$ is the $l$-th row vector of $X_j$, and $y_{tl}$ and $\theta_{jl}$ are, respectively, the $l$-th elements of the current observation and of the observation noise mean vector under regime $j$.
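The updates of the initial and transition probabilities only involve the smoothed regime probabilities, so they are easy to sketch. In the code below, the function name `update_markov_params` and the data layout (a list of smoothed marginals and a list of smoothed pairwise probabilities for consecutive dates) are assumptions made for this example.

```python
def update_markov_params(M_smooth, pair):
    """Re-estimate initial and transition probabilities.

    M_smooth[t][i] : smoothed probability M_{t/n}(i), t = 0, ..., n-1
    pair[t][i][j]  : smoothed joint probability of regime i at date t and
                     regime j at date t+1, t = 0, ..., n-2
    """
    m = len(M_smooth[0])
    total = sum(M_smooth[0])
    pi_hat = [v / total for v in M_smooth[0]]
    P_hat = []
    for i in range(m):
        # Denominator: expected number of visits to i before the last date.
        denom = sum(M_smooth[t][i] for t in range(len(pair)))
        P_hat.append([sum(J[i][j] for J in pair) / denom for j in range(m)])
    return pi_hat, P_hat
```

When the marginals in `M_smooth` are the row sums of the corresponding entries of `pair`, each row of `P_hat` sums to one.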
Now, given the new values above and the fact that $y_t / Y_{1:t-1}, S_t = j, S_{1:t-1} \approx N[\theta_j, \Sigma^{j}_{t/t-1}]$, where $\Sigma^{j}_{t/t-1} = X_j H^{j}_{t/t-1} X'_j + \Psi_j$, the parameters $\Theta_{2j} = \{w_{lj}, \gamma_{lj}, \alpha_{lj}, \delta_{lj}\}$ for $j = 1, \dots, m$ can be updated by maximizing (using a Newton-Raphson algorithm) the observed log-likelihood function:
\[
L^{*} = c - \frac{1}{2} \sum_{t=1}^{n} \sum_{j=1}^{m} p(S_t = j / D_{1:t-1}) \left[ \log |\Sigma^{j}_{t/t-1}| + (y_t - \theta_j)'\, \Sigma^{j\,-1}_{t/t-1}\, (y_t - \theta_j) \right]
\]
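As an illustration, the objective above can be evaluated as follows in the one-factor case (k = 1), again using the closed-form inverse and determinant of a rank-one update of a diagonal matrix. The function name `observed_loglik` and its arguments are hypothetical, the additive constant c is dropped, and the predicted variances are assumed to have been computed beforehand from the GQARCH parameters.

```python
import math


def observed_loglik(ys, weights, X, theta, psi, h_pred):
    """Evaluate L* up to the additive constant c, for k = 1 factor.

    ys[t]         : observation y_t (list of q values)
    weights[t][j] : regime probabilities p(S_t = j / D_{1:t-1})
    X[j], theta[j], psi[j] : loadings, means and diagonal of Psi_j
    h_pred[t][j]  : predicted factor variance under regime j at date t
    """
    total = 0.0
    for y, w_t, h_t in zip(ys, weights, h_pred):
        for j, (w, h) in enumerate(zip(w_t, h_t)):
            e = [yr - th for yr, th in zip(y, theta[j])]
            c = sum(x * x / p for x, p in zip(X[j], psi[j]))
            b = sum(x * er / p for x, er, p in zip(X[j], e, psi[j]))
            denom = 1.0 + h * c
            # log|Sigma| and e' Sigma^-1 e for Sigma = h * x x' + Psi.
            log_det = sum(math.log(p) for p in psi[j]) + math.log(denom)
            quad = (sum(er * er / p for er, p in zip(e, psi[j]))
                    - h * b * b / denom)
            total += w * (log_det + quad)
    return -0.5 * total
```

A numerical optimizer would call such a function repeatedly, recomputing `h_pred` for each candidate value of the GQARCH parameters.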
However, for the implementation of the optimization algorithm it is necessary to identify the optimal sequence of the hidden states. This sequence can be obtained either from the probabilities $M_{t/n}(j)$ given by the smoothing algorithm, or from an approximate version of the Viterbi algorithm.
The simulations we now present are based on models with $q = 6$ observable variables, a single GQARCH(1,1) common factor and two hidden Markovian states (the regime switching date is $t^{*} = n/2 + 1$). The iterations of the EM algorithm stop when the relative changes in all components of the estimated parameters are smaller than a threshold value of $10^{-4}$ (the initial and transition probabilities are excluded). The initial parameters for the EM algorithm were obtained by randomly perturbing the true parameter values by up to 20% of their true value.
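The stopping rule described above can be sketched as a small helper. The function name `has_converged` and the guard `eps` against division by zero are assumptions of this illustration; the parameter vectors are assumed to be flattened into plain lists that exclude the initial and transition probabilities.

```python
def has_converged(old, new, tol=1e-4, eps=1e-12):
    """Return True when every relative parameter change is below tol.

    old, new : flat lists of parameter values from two successive
               EM iterations
    eps      : hypothetical floor on |old| to avoid dividing by zero
    """
    return all(abs(b - a) < tol * max(abs(a), eps)
               for a, b in zip(old, new))
```

The comparison is written as a product rather than a ratio so that parameters equal to zero do not raise an exception.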
To study the behavior of the estimates when the size of the sequence $n$ increases, we have used the empirical Kullback-Leibler divergence of estimators from the true parameters:
\[
K_n(\Theta_0, \Theta) \overset{def}{=} \frac{1}{n} \left\{ \log L(y_1, \dots, y_n; \Theta_0) - \log L(y_1, \dots, y_n; \Theta) \right\}
\]
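The divergence above only requires a routine that evaluates the log-likelihood of a sequence under a given parameter value. In the sketch below, an i.i.d. Gaussian log-likelihood is used purely as a stand-in for the model log-likelihood; both function names are hypothetical.

```python
import math


def gaussian_loglik(ys, params):
    """Log-likelihood of i.i.d. N(mu, sigma2) data; a stand-in for log L."""
    mu, sigma2 = params
    return sum(-0.5 * (math.log(2.0 * math.pi * sigma2)
                       + (y - mu) ** 2 / sigma2)
               for y in ys)


def empirical_kl(loglik, ys, params_true, params_est):
    """K_n = (1/n) * [log L(y; Theta_0) - log L(y; Theta)]."""
    return (loglik(ys, params_true) - loglik(ys, params_est)) / len(ys)
```

The divergence is zero when the two parameter values coincide, and it is evaluated on a sequence that was not used for estimation, as in the experiment described in the text.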
[Figure 1: box plots for n = 800, n = 1000, n = 1200, n = 1300 and n = 1500.]
Fig. 1. Box plots of $\tilde{K}_n(\Theta_0, \tilde{\Theta}_n)$. The sets of distances for the various values of $n$ clearly show an increasing accuracy and stability of the estimators as $n$ increases.
For each value of $n$, the estimation procedure was carried out a hundred times, and the distances $\tilde{K}_n(\Theta_0, \tilde{\Theta}_n)$ between each of the hundred estimators and the true parameter $\Theta_0$ were evaluated on a new sequence.
To investigate the asymptotic distribution of $\tilde{\Theta}_n$, we used the Shapiro-Wilk statistic to test the univariate normality of each component of $\tilde{\Theta}_n$. The results suggest that for large $n$ ($n \geq 600$) the distribution of the estimates tends to be normal.
For model selection we compared factor models that differ in their hidden structures. The first, $M_1$, is the true conditionally heteroskedastic factorial HMM described above. Model $M_2$ is a conditionally heteroskedastic factorial HMM with 2 hidden states and 2 common factors. Model $M_3$ is a conditionally heteroskedastic factorial HMM with 3 hidden states and 1 common factor. $M_4$ and $M_5$ are nonstandard and standard models with only one latent factor and 2 hidden Markovian states. The last model, $M_6$, is a GQARCH(1,1) conditionally heteroskedastic factor model without regime switching. In this experiment, we used $n = 800$ and the AIC, BIC and ICL criteria. The ICL criterion is based on the maximization of the integrated complete log-likelihood function. For a model $M$, the ICL is: $\mathrm{ICL}(M) = \mathrm{BIC}(M) + \log p(Z|Y, \hat{\Theta}_M)$, where $Z$ denotes the continuous and discrete hidden variables. In this case, $\log p(Z|Y, \hat{\Theta}_M)$ is a measure of the missing information carried by the model $M$.
Table 1. Model selection. Number of times, out of 100 replications, that each factor model attains the minimal criterion.

        AIC   BIC   ICL
M1       90    96    96
M2       10     4     4
M3        0     0     0
M4        0     0     0
M5        0     0     0
M6        0     0     0
Simulation results provide empirical evidence of the usefulness of the estimates. In the cases we examined, our model offers a significant improvement in fit over the simpler standard and conditionally heteroskedastic models without regime switching. The last simulation exercise shows that the conditionally heteroskedastic factor model without regime switching tends to overestimate the common volatility, the correlation structure and the proportion of the time series variances explained by each of the factors. In contrast, standard factor analysis, with and without regime switching, leads to an underestimate.
We have also applied our model to analyze the co-movements among several exchange rate returns during the financial crisis that the European exchange markets faced in the fall of 1992. The time series considered here are the weekly average returns of closing spot prices, relative to the US Dollar, of the French Franc (FRF), Swiss Franc (CHF), Italian Lira (ITL), German Mark (DEM), Belgian Franc (BEF), Spanish Peseta (ESP), Swedish Krona (SEK) and British Pound (GBP) from 07/17/1985 to 01/22/1997 (600 observations). We considered models with 1, 2 and 3 conditionally heteroskedastic latent factors within a structure characterized by 1, 2 and 3 latent regimes. All the selection criteria indicate that the time-varying covariance structure could be modeled by two conditionally heteroskedastic common factors and two Markovian regimes. The better fit of this new specification over the one-state standard and conditionally heteroskedastic factor models and the multi-state standard models supports the idea that all or a subset of the parameters are not fixed in time.
[Figure 2: graphics sharing a time axis running from 07/85 to 01/97. Graphic 1 plots the series FRF, CHF, ITL, DEM, BEF, ESP, SEK and GBP; Graphic 2 plots probabilities between 0 and 1.]
Fig. 2. Graphic 1: Standardized prices of the different currencies. The vertical line represents t = 31/08/1992 (beginning of the crisis). Graphic 2: Posterior probabilities of the two hidden states $M_{t/n}(j)$ given by the smoothing algorithm. This figure shows that our model is able to detect the regime switching date.
From a broader viewpoint, this study illustrates the usefulness of Markov regime switching models in the analysis of processes that exhibit only local homogeneity. Such complex processes can be found in a variety of scientific fields, and we believe the ideas presented here can be successfully applied in many such contexts. An interesting direction for further research is the generalization of this model to one in which the idiosyncratic variances are allowed to be a stochastic function of time. One can also consider the case where the state transition probabilities are not homogeneous in time, but depend on the previous state and on the previously observed covariate levels. The study of such models would provide a further step in the extension of hidden Markov models to conditionally heteroskedastic factor analysis and allow for further flexibility in applications.
[BAR93] Bar-Shalom, Y., Li, X.R.: Estimation and tracking: principles, techniques and software. Boston, London: Artech House Inc. (1993)
[DEB88] Diebold, F., Nerlove, M.: The Dynamics of Exchange Rate Volatility: A Multivariate Latent Factor ARCH Model. Journal of Applied Econometrics, 4, 1–22 (1988)
[ENG93] Engle, R.F., Susmel, R.: Common Volatility in International Equity Markets. Journal of Business and Economic Statistics, 11, 369–380 (1993)
[ENG92] Engle, R., Ng, V.K., Rothschild, M.: A Multi-Dynamic Factor Model for Stock Returns. Journal of Econometrics, 52, 245–266 (1992)
[KIM94] Kim, C.J.: Dynamic linear models with Markov switching. Journal of Econometrics, 60, 1–22 (1994)
[LAU96] Lauritzen, S.: Graphical Models. OUP (1996)