Maximum Likelihood Estimation of Hidden Markov Processes

Halina Frydman and Peter Lakner
New York University

June 11, 2001

Abstract

We consider the process $dY_t = u_t\,dt + dW_t$, where $u$ is a process not necessarily adapted to $F^Y$ (the filtration generated by the process $Y$) and $W$ is a Brownian motion. We obtain a general representation for the likelihood ratio of the law of the $Y$ process relative to Brownian measure. This representation involves only one basic filter (the expectation of $u$ conditional on the observed process $Y$). This generalizes the result of Kailath and Zakai (1971), where it is assumed that the process $u$ is adapted to $F^Y$. In particular, we consider the model in which $u$ is a functional of $Y$ and of a random element $X$ which is independent of the Brownian motion $W$. For example, $X$ could be a diffusion or a Markov chain. This result can be applied to the estimation of an unknown multidimensional parameter $\theta$ appearing in the dynamics of the process $u$, based on continuous observation of $Y$ on the time interval $[0,T]$. For a specific hidden diffusion financial model, in which $u$ is an unobserved mean-reverting diffusion, we give an explicit form for the likelihood function of $\theta$. For this model we also develop a computationally explicit E-M algorithm for the estimation of $\theta$. In contrast to the likelihood ratio, the algorithm involves evaluation of a number of filtered integrals in addition to the basic filter.

1 Introduction

Let $(\Omega, \mathcal{F}, P)$, $\{\mathcal{F}_t; t \le T\}$ be a filtered complete probability space and $W = \{W_t; t \le T\}$ a standard Brownian motion. We consider a process $Y$ with the decomposition

$$dY_t = u_t(\theta)\,dt + dW_t, \qquad (1)$$

where $u$ is a measurable, adapted process and $\theta$ is an unknown, vector-valued parameter. We assume that $Y$ is observed continuously on the interval $[0,T]$, but that neither $u = \{u_t; t \le T\}$ nor $W$ is observed. The purpose of this work is to develop methods for the maximum likelihood estimation of $\theta$. Our interest in this work was motivated by the following model from finance.
Suppose that we observe continuously a single security with the price process $\{S_t; t \le T\}$ which follows the equation

$$dS_t = \mu_t S_t\,dt + \sigma S_t\,dW_t, \qquad (2)$$

where $\sigma > 0$ is a known constant, but the drift coefficient $\mu = \{\mu_t; t \le T\}$ is an unobserved mean-reverting process with the dynamics

$$d\mu_t = \alpha(\upsilon - \mu_t)\,dt + \beta\,dW_t^1. \qquad (3)$$

Here $\alpha > 0$, $\beta > 0$, and $\upsilon$ are unknown parameters, and $W^1$ is a standard Brownian motion independent of $W$. The initial value of the security, $S_0$, is an observable constant, whereas the initial value of the drift, $\mu_0$, is an unknown constant. The model in (2) and (3) has recently been found to be very flexible in the modeling of security prices. Its versions have been used extensively in the area of dynamic asset management (see, e.g., Bielecki and Pliska (1999), Lakner (1998), Kim and Omberg (1996), and Haugh and Lo (2000)).

We now transform the model defined by (2) and (3) to the form in (1) by introducing the following change of variables. The reason for the transformation is that we want the law of $\{Y_t; t \le T\}$ to be equivalent to that of a Brownian motion. Let

$$X_t = \frac{1}{\beta}\mu_t - \frac{1}{\beta}\mu_0, \qquad Y_t = \frac{1}{\sigma}\log\frac{S_t}{S_0} + \frac{\sigma}{2}t.$$

Then the hidden process is

$$dX_t = \alpha(\delta - X_t)\,dt + dW_t^1, \qquad (4)$$

where $\delta = \upsilon/\beta$, and the observed process is

$$dY_t = (\lambda X_t + \nu)\,dt + dW_t, \qquad (5)$$

where $\lambda = \beta/\sigma$ and $\nu = \mu_0/\sigma$. Let $\theta = (\alpha, \delta, \lambda, \nu)$. We note that $X_0 = 0$ and $Y_0 = 0$, and that equation (5) is in the form (1) with $u_t(\theta) = \lambda X_t + \nu$.

We summarize our results. Our main result is a new formula for the likelihood ratio of the law of the process $Y$, specified in (1), with respect to the Brownian measure.
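As an aside, the transformed model (4)-(5) is straightforward to simulate, which is useful for experimenting with the estimators developed below. The following sketch (with arbitrary illustrative parameter values, not estimates from data) generates the hidden Ornstein-Uhlenbeck drift $X$ and the observation $Y$ by an Euler scheme.

```python
import numpy as np

def simulate_hidden_drift(alpha, delta, lam, nu, T=1.0, n=1000, seed=0):
    """Euler scheme for the hidden process (4) and the observation (5):

        dX_t = alpha*(delta - X_t) dt + dW^1_t,   X_0 = 0,
        dY_t = (lam*X_t + nu) dt + dW_t,          Y_0 = 0,

    with W^1 and W independent Brownian motions."""
    rng = np.random.default_rng(seed)
    dt = T / n
    X = np.zeros(n + 1)
    Y = np.zeros(n + 1)
    dW1 = rng.normal(0.0, np.sqrt(dt), n)  # increments driving the hidden drift
    dW = rng.normal(0.0, np.sqrt(dt), n)   # observation noise, independent of dW1
    for k in range(n):
        X[k + 1] = X[k] + alpha * (delta - X[k]) * dt + dW1[k]
        Y[k + 1] = Y[k] + (lam * X[k] + nu) * dt + dW[k]
    return X, Y

# Example: simulate one path; in the statistical problem only Y is observable.
X, Y = simulate_hidden_drift(alpha=2.0, delta=0.5, lam=1.0, nu=0.1)
```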
Naturally, the likelihood ratio is a function of $\theta$. This result generalizes a result in Kailath and Zakai (1971) that gives a formula for the likelihood ratio in the case when $u$ is adapted to $F^Y = \{\mathcal{F}_t^Y; t \le T\}$, the filtration generated by the observed process $Y$. Here we extend this result to the case when $u$ is not adapted to $F^Y$. We note that in our financial model example (5), $u$ is not adapted to $F^Y$. We establish the new formula under a stronger (Theorem 1) and a weaker (Theorem 2) set of sufficient conditions. These conditions allow $u$ to be a functional of $Y$ and of a random element $X$ which is independent of the Brownian motion $W$. For example, $X$ could be a diffusion or a Markov chain. The likelihood ratio involves only one basic filter, that is, the expectation of $u$ conditional on the observed data. The new formula makes it possible to maximize the likelihood function directly with respect to the parameters. It may also be useful in developing likelihood ratio tests of various hypotheses about the parameters of hidden Markov processes. For the financial model in (4) and (5) we give an explicit form of the likelihood function.

We also consider the Expectation-Maximization (E-M) algorithm approach to the maximization of the log-likelihood function for the model in (4) and (5). Dembo and Zeitouni (1986) extended the E-M algorithm to the parameter estimation of hidden continuous-time random processes, and in particular to the parameter estimation of hidden diffusions. The financial model in (4) and (5) generalizes an example in Dembo and Zeitouni (1986). For this model we develop a computationally explicit E-M algorithm for the maximum likelihood estimation of $\theta$. The expectation step of this algorithm requires evaluation of filtered integrals, that is, evaluation of the expectations of stochastic integrals conditional on the observed data. As remarked by Elliott et al. (1997, p.
203), Dembo and Zeitouni (1986) express the filtered integrals in the form of nonadapted stochastic integrals, which are not well defined. In our development we express the required filtered integral in terms of adapted stochastic integrals (Proposition 4). In contrast to the likelihood function, the E-M algorithm involves a number of filters in addition to the basic filter.

The development of likelihood ratio tests based on the new formula for the likelihood function, and of simulations to assess the accuracy of the maximum likelihood estimators, is left for a future investigation.

This paper is organized as follows. The new formula for the likelihood ratio is established in Section 2. Section 3 presents the extended E-M algorithm and some of its properties. In Sections 4 and 5 we apply the E-M algorithm to the estimation of $\theta$ in the model in (4) and (5). The Appendix contains an auxiliary result for the derivation of the likelihood ratio in Section 2.

2 The Likelihood Ratio

In this section we generalize a result of Kailath and Zakai (1971). We consider the process in (1), but to simplify notation we suppress the dependence of $u$ on $\theta$, so that

$$dY_t = u_t\,dt + dW_t. \qquad (6)$$

Our standing assumption is

$$\int_0^T u_t^2\,dt < \infty. \qquad (7)$$

Let $P_y$ be the measure induced by $Y$ on $\mathcal{B}(C[0,T])$ and $P_w$ the Brownian measure on the same class of Borel sets. We know from Kailath and Zakai (1971) that (7) implies $P_y \ll P_w$. The same authors give a formula for the likelihood ratio $\frac{dP_y}{dP_w}$ for the case when $u$ is adapted to $F^Y$. Here we are going to extend this result to the case when $u$ is not adapted to $F^Y$. Let $\{v_t; t \le T\}$ be the $F^Y$-optional projection of $u$, thus

$$E[u_t \mid \mathcal{F}_t^Y] = v_t \quad \text{a.s.} \qquad (8)$$

whenever $E|u_t| < \infty$.

Theorem 1. In addition to assumption (7), we assume that

$$E|u_t| < \infty \quad \text{for all } t \in [0,T], \qquad (9)$$

$$\int_0^T v_s^2\,ds < \infty \quad \text{a.s.}, \qquad (10)$$

and

$$E\Big[\exp\Big\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\Big\}\Big] = 1. \qquad (11)$$

Then $P_y$ is equivalent to $P_w$ and

$$\frac{dP_y}{dP_w} = \exp\Big\{\int_0^T v_s\,dY_s - \frac{1}{2}\int_0^T v_s^2\,ds\Big\}. \qquad (12)$$

Remark: The formula on the right-hand side is an $\mathcal{F}_T^Y$-measurable random variable, thus it is also a path-functional of $\{Y_t; t \le T\}$. Thus this expression represents a random variable and a functional on $C[0,T]$ simultaneously, and in (12) it appears in its capacity as a functional on the space of continuous functions.

Proof. We define the process

$$Z_t = \exp\Big\{-\int_0^t u_s\,dW_s - \frac{1}{2}\int_0^t u_s^2\,ds\Big\}, \quad t \in [0,T], \qquad (13)$$

and the probability measure $P_0$ by $\frac{dP_0}{dP} = Z_T$, which is a bona fide probability measure by condition (11). We are going to use repeatedly the following form of "Bayes' theorem": for every $\mathcal{F}_t$-measurable random variable $V$ we have

$$E_0[V \mid \mathcal{F}_t^Y] = \frac{1}{E[Z_t \mid \mathcal{F}_t^Y]}\,E[Z_t V \mid \mathcal{F}_t^Y], \quad t \in [0,T], \qquad (14)$$

where $E_0$ is the expectation operator corresponding to the probability measure $P_0$. According to Girsanov's theorem, $Y$ is a Brownian motion under $P_0$. Also, one can see easily that

$$\frac{dP_y}{dP_w} = E_0\Big[\frac{1}{Z_T}\,\Big|\,\mathcal{F}_T^Y\Big] = \big(E[Z_T \mid \mathcal{F}_T^Y]\big)^{-1}.$$

The equivalence of $P_y$ and $P_w$ already follows, since $Z_T > 0$ (under $P$ and $P_0$), thus also $E_0[1/Z_T \mid \mathcal{F}_T^Y] > 0$; hence the same expression, as a path-functional of $\{Y_t; t \le T\}$, is $P_y$-almost everywhere positive. We still need to show (12). We start with the identity

$$\frac{1}{Z_t} = 1 + \int_0^t \frac{1}{Z_s}u_s\,dY_s. \qquad (15)$$

Theorem 12.1 in Lipster and Shiryayev (1978) implies that

$$E_0\Big[\frac{1}{Z_t}\,\Big|\,\mathcal{F}_t^Y\Big] = 1 + \int_0^t E_0\Big[\frac{1}{Z_s}u_s\,\Big|\,\mathcal{F}_s^Y\Big]\,dY_s,$$

as long as the following two conditions hold:

$$E_0\Big[\frac{1}{Z_s}|u_s|\Big] < \infty, \qquad (16)$$

$$\int_0^T \Big(E_0\Big[\frac{1}{Z_s}u_s\,\Big|\,\mathcal{F}_s^Y\Big]\Big)^2 ds < \infty \quad \text{a.s.} \qquad (17)$$

However, (16) follows from condition (9). In order to show (17) we use the rule (14) and compute

$$\int_0^T \Big(E_0\Big[\frac{1}{Z_s}u_s\,\Big|\,\mathcal{F}_s^Y\Big]\Big)^2 ds = \int_0^T \big(E[Z_s \mid \mathcal{F}_s^Y]\big)^{-2} v_s^2\,ds = \int_0^T \Big(E_0\Big[\frac{1}{Z_s}\,\Big|\,\mathcal{F}_s^Y\Big]\Big)^2 v_s^2\,ds.$$

Now we observe that by (15), $(s,\omega) \mapsto E_0\big[\frac{1}{Z_s}\,\big|\,\mathcal{F}_s^Y\big]$ is a $P_0$-local martingale (and also a martingale, but this is not important at this point), adapted to the filtration generated by the $P_0$-Brownian motion $Y$; thus it must be continuous (Karatzas and Shreve (1988), Problem 3.4.16). This fact and condition (10) imply (17). Now using (15) and (14) we get

$$E_0\Big[\frac{1}{Z_t}\,\Big|\,\mathcal{F}_t^Y\Big] = 1 + \int_0^t \frac{1}{E[Z_s \mid \mathcal{F}_s^Y]}\,v_s\,dY_s = 1 + \int_0^t E_0\Big[\frac{1}{Z_s}\,\Big|\,\mathcal{F}_s^Y\Big] v_s\,dY_s,$$

which implies (12).

In the rest of this section we are going to strengthen Theorem 1 by eliminating condition (11). This will again be a generalization of a result by Kailath and Zakai (1971), where $u$ is assumed to be $F^Y$-adapted. Our objective here is to have a result which applies to the situation when $u$ depends on $Y$ and possibly on another process which is independent of $W$. This other process may be a Brownian motion, or possibly a Markov chain. In order to achieve this generality we assume that $\mathcal{X}$ is a complete separable metric space and $X : \Omega \to \mathcal{X}$ is an $\mathcal{F} \to \mathcal{B}(\mathcal{X})$-measurable random element, independent of the Brownian motion $W$. We denote by $P_X$ the measure induced by $X$ on $\mathcal{B}(\mathcal{X})$. We assume that

$$u_s = \Phi(Y_0^s, s, X), \quad s \in [0,T], \qquad (18)$$

where $Y_0^s$ represents the path of $Y$ up to time $s$. Before spelling out the exact conditions on $\Phi$ we note that $X$ may be a Brownian motion if $\mathcal{X} = C[0,T]$, or a Markov chain if $\mathcal{X} = D[0,T]$ (the space of right-continuous functions with left limits) with the Skorohod topology. Other examples are also available. The conditions on $\Phi$ are the following. First a notation: for every $f \in C[0,T]$ and $s \in [0,T]$ we denote by $f^s \in C[0,s]$ the restriction of $f$ to $[0,s]$. We assume the following: for every fixed $s \in [0,T]$, $\Phi(\cdot, s, \cdot)$ is a mapping from $C[0,s] \times \mathcal{X}$ to $\mathbb{R}$ such that

(i) the $C[0,T] \times [0,T] \times \mathcal{X} \to \mathbb{R}$ mapping given by $(f,s,x) \mapsto \Phi(f^s, s, x)$ is measurable with respect to the appropriate Borel classes;

(ii) for every $s \in [0,T]$ and $f \in C[0,s]$ the random variable $\Phi(f, s, X)$ is $\mathcal{F}_s$-measurable;
(iii) the following "functional Lipschitz" condition (Protter, 1990) holds: for every $f, g \in C[0,T]$ and $P_X$-almost every $x \in \mathcal{X}$ there exists an increasing (finite) process $K_t(x)$ such that

$$\big|\Phi(f^t, t, x) - \Phi(g^t, t, x)\big| \le K_t(x)\,\sup_{u \le t}|f(u) - g(u)|$$

for all $t \in [0,T]$.

Our basic setup now is $Y$ with the decomposition (6) and $u$ of the form (18), where $X$ is a random element independent of $W$ and $\Phi$ is a functional satisfying (i), (ii) and (iii).

Theorem 2. Suppose that (7) holds, i.e., in our case

$$\int_0^T \Phi^2(Y_0^s, s, X)\,ds < \infty, \qquad (19)$$

and also

$$\int_0^T \Phi^2(W_0^s, s, X)\,ds < \infty. \qquad (20)$$

Then

$$E\Big[\exp\Big\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\Big\}\Big] = 1. \qquad (21)$$

Remark: It is interesting to observe that if $\Phi$ does not depend on $Y$, i.e., $u$ is independent of $W$, then (21) follows from (7) alone.

Remark: Conditions (19) and (20) are obviously satisfied if the mapping $s \mapsto \Phi(f^s, s, x)$ is continuous for every $(f,x) \in C[0,T] \times \mathcal{X}$.

Proof. The main idea of the proof is to derive our theorem from the results of Kailath and Zakai by conditioning on $X = x$. Let $Q(x, \cdot)$, $x \in \mathcal{X}$, be a regular conditional probability measure on $\mathcal{F}$ given $X$. The existence of such a regular version is well known in this situation. It is shown in the Appendix that the independence of the Brownian motion $W$ and $X$ implies that $\{(W_t, \mathcal{F}_t^W); t \le T\}$ remains a Brownian motion under $Q(x, \cdot)$ for $P_X$-almost every $x \in \mathcal{X}$. We also show in the Appendix that if $A$ is a zero-probability event under $P$, then for $P_X$-almost every $x \in \mathcal{X}$ we have $Q(x, A) = 0$. (The "$P_X$-almost every" is important here; obviously $Q(x, \cdot)$ is not absolutely continuous with respect to $P$.) Accordingly, for $P_X$-almost every $x \in \mathcal{X}$,

$$\int_0^T \Phi^2(Y_0^s, s, x)\,ds < \infty, \quad Q(x,\cdot)\text{-a.s.},$$

and

$$\int_0^T \Phi^2(W_0^s, s, x)\,ds < \infty, \quad Q(x,\cdot)\text{-a.s.}$$

Since $\mathcal{X}$ is assumed to be a complete separable metric space, we have $Q(x, \{\omega \in \Omega : X(\omega) = x\}) = 1$ for $P_X$-almost every $x \in \mathcal{X}$ (Karatzas and Shreve (1988), Theorem 5.3.19). It follows that under $Q(x, \cdot)$ our equations (6) and (18) become

$$dY_t = \Phi(Y_0^t, t, x)\,dt + dW_t.$$

It follows from Theorem V.3.7 in Protter (1990) and our condition (iii) that the above equation has a unique (strong) solution. Thus the process $Y$ must be adapted to the $Q(x, \cdot)$-augmentation of $F^W$. However, now we are back in the situation studied by Kailath and Zakai (1971). For every $x \in \mathcal{X}$ let $P_Y(x, \cdot)$ be the measure induced by $Y$ on $\mathcal{B}(C[0,T])$ under the conditional probability measure $Q(x, \cdot)$. Theorems 2 and 3 of Kailath and Zakai (1971) imply that for $P_X$-almost every $x \in \mathcal{X}$ we have $P_Y(x, \cdot) \sim P_w$ and

$$\frac{dP_w}{dP_Y(x,\cdot)} = \exp\Big\{-\int_0^T \Phi(Y_0^t, t, x)\,dY_t + \frac{1}{2}\int_0^T \Phi^2(Y_0^t, t, x)\,dt\Big\}.$$

This implies that the integral of the right-hand side with respect to the measure $P_Y(x, \cdot)$ is 1, which can be expressed as

$$E\Big[\exp\Big\{-\int_0^T u_t\,dY_t + \frac{1}{2}\int_0^T u_t^2\,dt\Big\}\,\Big|\,X = x\Big] = 1$$

for $P_X$-almost every $x \in \mathcal{X}$. Formula (21) now follows.

We summarize the results of Theorems 1 and 2 in the following theorem.

Theorem 3. Let $X : \Omega \to \mathcal{X}$ be a measurable mapping taking values in the complete separable metric space $\mathcal{X}$. Assume that $X$ is independent of the Brownian motion $W$, $Y$ has the form (6), and $u$ is as specified in (18), where $\Phi$ satisfies conditions (i), (ii) and (iii). Additionally we assume

$$E|\Phi(Y_0^s, s, X)| < \infty \quad \text{for all } s \in [0,T],$$
$$\int_0^T v_s^2\,ds < \infty \quad \text{a.s.},$$
$$\int_0^T \Phi^2(Y_0^s, s, X)\,ds < \infty \quad \text{a.s.},$$
$$\int_0^T \Phi^2(W_0^s, s, X)\,ds < \infty \quad \text{a.s.},$$

where $v$ is the optional projection of $u$ to $F^Y$. Then $P_y \sim P_w$ and

$$\frac{dP_y}{dP_w} = \exp\Big\{\int_0^T v_s\,dY_s - \frac{1}{2}\int_0^T v_s^2\,ds\Big\}.$$

Furthermore,

$$E\Big[\exp\Big\{-\int_0^T u_s\,dW_s - \frac{1}{2}\int_0^T u_s^2\,ds\Big\}\Big] = E\Big[\exp\Big\{-\int_0^T v_s\,dW_s - \frac{1}{2}\int_0^T v_s^2\,ds\Big\}\Big] = 1. \;\blacksquare$$

For our example of the security price model we compute explicitly the required optional projection in Section 5.

3 E-M Algorithm

We briefly describe the extended E-M algorithm presented in Dembo and Zeitouni (1986).
In the next section we apply this algorithm to the estimation of $\theta$ in the model defined in (4) and (5). Let $F_T^X$ be the filtration generated by $\{X_t; 0 \le t \le T\}$, $F_T^Y$ the filtration generated by $\{Y_t; 0 \le t \le T\}$, and $F_T^{X,Y}$ the filtration generated by $\{X_t; 0 \le t \le T\}$ and $\{Y_t; 0 \le t \le T\}$. We identify $\Omega$ with $C[0,T] \times C[0,T]$ and $\mathcal{F}$ with the Borel sets of $\Omega$. Let $X$ be the first coordinate mapping process and $Y$ the second coordinate mapping process, i.e., for $(\omega_1, \omega_2) \in C[0,T] \times C[0,T]$ we have $X_t(\omega_1, \omega_2) = \omega_1(t)$ and $Y_t(\omega_1, \omega_2) = \omega_2(t)$. Let $\{P_\theta\}$ be a family of probability measures such that $P_\theta \sim P_\phi$ for all $\theta, \phi \in \Theta$, where $\Theta$ is a parameter space. Also let $E_\theta$ denote the expectation under the measure $P_\theta$. We denote by $P_\theta^Y$ the restriction of $P_\theta$ to $F_T^Y$. Then obviously

$$\frac{dP_\theta^Y}{dP_\phi^Y} = E_\phi\Big[\frac{dP_\theta}{dP_\phi}\,\Big|\,\mathcal{F}_T^Y\Big]. \qquad (22)$$

The maximum likelihood estimator $\hat\theta$ of $\theta$ is defined as

$$\hat\theta = \arg\max_\theta \frac{dP_\theta^Y}{dP_\phi^Y} \quad \text{for some fixed } \phi \in \Theta.$$

It should be clear that this definition does not depend on the choice of $\phi$. The maximization of $dP_\theta^Y/dP_\phi^Y$ is equivalent to the maximization of the log-likelihood function

$$L_\phi(\theta) := \log\frac{dP_\theta^Y}{dP_\phi^Y}. \qquad (23)$$

Now by (22) and (23),

$$L_\phi(\theta') - L_\phi(\theta) = \log\frac{dP_{\theta'}^Y}{dP_\theta^Y} = \log E_\theta\Big[\frac{dP_{\theta'}}{dP_\theta}\,\Big|\,\mathcal{F}_T^Y\Big],$$

and from Jensen's inequality

$$L_\phi(\theta') - L_\phi(\theta) \ge E_\theta\Big[\log\frac{dP_{\theta'}}{dP_\theta}\,\Big|\,\mathcal{F}_T^Y\Big], \qquad (24)$$

with equality if and only if $\theta' = \theta$. The E-M algorithm is based on (24) and proceeds as follows. In the first step we select an initial value $\theta_0$ for $\theta$. In the $(n+1)$st iteration, $n \ge 0$, the E-M algorithm is applied to produce $\theta_{n+1}$. Each iteration consists of two steps: the expectation, or E, step and the maximization, or M, step. In the E-step of the $(n+1)$st iteration the following quantity is computed:

$$Q(\theta, \theta_n) := E_{\theta_n}\Big[\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big].$$

In the M-step, $Q(\theta, \theta_n)$ is maximized with respect to $\theta$ to obtain $\theta_{n+1}$. Iterations are repeated until the sequence $\{\theta_n\}$ converges.
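The iteration just described has a simple generic structure, which the following sketch makes concrete. The functions `e_step` and `m_step` are hypothetical placeholders for the model-specific computations (for the model of Section 4, the E-step produces filtered quantities and the M-step applies the closed-form updates derived there); the toy usage below is purely illustrative.

```python
def em_iterate(theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic E-M loop.

    e_step(theta_n) computes the conditional expectations (given F_T^Y)
    needed to evaluate Q(., theta_n); m_step(stats) returns the maximizer
    theta_{n+1} of Q(., theta_n). Iteration stops when successive iterates
    (tuples of parameters) differ by less than tol in each coordinate."""
    theta = theta0
    for _ in range(max_iter):
        stats = e_step(theta)       # E-step
        theta_new = m_step(stats)   # M-step
        if max(abs(a - b) for a, b in zip(theta_new, theta)) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy illustration (not the model of Section 4): the map theta -> (theta+2)/2
# has fixed point 2, so the iterates converge there.
theta_hat = em_iterate((0.0,),
                       e_step=lambda th: th,
                       m_step=lambda th: ((th[0] + 2.0) / 2.0,))
```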
It is clear from (24) that

$$L_\phi(\theta_{n+1}) \ge L_\phi(\theta_n), \qquad (25)$$

with equality if and only if $\theta_{n+1} = \theta_n$. Thus at each iteration of the E-M algorithm the likelihood function increases. In the M-step the maximizer is obtained as the solution of

$$\frac{\partial}{\partial\theta}Q(\theta, \theta_n) = \frac{\partial}{\partial\theta}E_{\theta_n}\Big[\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = 0, \qquad (26)$$

or, if we can interchange the differentiation and expectation operations, the maximizer is the solution of

$$E_{\theta_n}\Big[\frac{\partial}{\partial\theta}\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = 0.$$

A corollary of (25) is that the maximum likelihood estimator is a fixed point of the E-M algorithm. For a summary of the convergence properties of the E-M algorithm and additional references we refer the reader to Dembo and Zeitouni (1986).

4 Estimation of the Drift of the Security Price Process with the E-M Algorithm

We develop the E-M algorithm for the estimation of $\theta = (\alpha, \delta, \lambda, \nu)$ in the security price model in (4) and (5). By Girsanov's theorem, if the $X$ process is also completely observed, the likelihood function of $\theta$ is given by

$$\frac{dP_\theta}{dP_{\mathbf{W}}} = \exp\Big\{\int_0^T \alpha(\delta - X_s)\,dX_s - \frac{1}{2}\int_0^T \alpha^2(\delta - X_s)^2\,ds\Big\} \times \exp\Big\{\int_0^T (\lambda X_s + \nu)\,dY_s - \frac{1}{2}\int_0^T (\lambda X_s + \nu)^2\,ds\Big\},$$

where $\mathbf{W} = (W^1, W)$ is a two-dimensional Brownian motion and $P_{\mathbf{W}}$ is the associated Brownian measure on $\mathcal{B}(C[0,T] \times C[0,T])$. Hence

$$\log\frac{dP_\theta}{dP_{\theta_n}} = \int_0^T \alpha(\delta - X_s)\,dX_s - \frac{1}{2}\int_0^T \alpha^2(\delta - X_s)^2\,ds + \int_0^T (\lambda X_s + \nu)\,dY_s - \frac{1}{2}\int_0^T (\lambda X_s + \nu)^2\,ds + f(\theta_n),$$

where $f(\theta_n)$ is a term which does not depend on $\theta$. Since, for our problem, we can interchange the operations of differentiation and integration, the equations in (26) take the form

$$E_{\theta_n}\Big[\frac{\partial}{\partial\alpha}\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = E_{\theta_n}\Big[\int_0^T (\delta - X_s)\,dX_s - \alpha\int_0^T (\delta - X_s)^2\,ds\,\Big|\,\mathcal{F}_T^Y\Big] = 0, \qquad (27)$$

$$E_{\theta_n}\Big[\frac{\partial}{\partial\delta}\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = E_{\theta_n}\Big[\alpha X_T - \int_0^T \alpha^2(\delta - X_s)\,ds\,\Big|\,\mathcal{F}_T^Y\Big] = 0, \qquad (28)$$

$$E_{\theta_n}\Big[\frac{\partial}{\partial\lambda}\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = E_{\theta_n}\Big[\int_0^T X_s\,dY_s - \int_0^T (\lambda X_s + \nu)X_s\,ds\,\Big|\,\mathcal{F}_T^Y\Big] = 0, \qquad (29)$$

$$E_{\theta_n}\Big[\frac{\partial}{\partial\nu}\log\frac{dP_\theta}{dP_{\theta_n}}\,\Big|\,\mathcal{F}_T^Y\Big] = E_{\theta_n}\Big[Y_T - \int_0^T (\lambda X_s + \nu)\,ds\,\Big|\,\mathcal{F}_T^Y\Big] = 0. \qquad (30)$$

Let

$$m(t,T;\theta_n) = E_{\theta_n}[X_t \mid \mathcal{F}_T^Y], \qquad n(t,T;\theta_n) = E_{\theta_n}[X_t^2 \mid \mathcal{F}_T^Y].$$

Then, recalling that $\int_0^T X_s\,dX_s = \frac{1}{2}X_T^2 - \frac{T}{2}$, equation (27) becomes

$$\delta m(T,T;\theta_n) - \frac{1}{2}n(T,T;\theta_n) + \frac{T}{2} - \alpha\delta^2 T + 2\alpha\delta\int_0^T m(s,T;\theta_n)\,ds - \alpha\int_0^T n(s,T;\theta_n)\,ds = 0. \qquad (31)$$

Equation (28) takes the form

$$m(T,T;\theta_n) - \alpha\delta T + \alpha\int_0^T m(s,T;\theta_n)\,ds = 0. \qquad (32)$$

Multiplying (32) by $\delta$ and subtracting the result from (31) gives

$$-\frac{1}{2}n(T,T;\theta_n) + \frac{T}{2} + \alpha\delta\int_0^T m(s,T;\theta_n)\,ds - \alpha\int_0^T n(s,T;\theta_n)\,ds = 0. \qquad (33)$$

From equations (32) and (33) we obtain the updated values of $\alpha$ and $\delta$:

$$\alpha_{n+1} = \frac{-\frac{1}{2}T\,n(T,T;\theta_n) + \frac{T^2}{2} + m(T,T;\theta_n)\int_0^T m(s,T;\theta_n)\,ds}{T\int_0^T n(s,T;\theta_n)\,ds - \Big[\int_0^T m(s,T;\theta_n)\,ds\Big]^2}, \qquad (34)$$

$$\delta_{n+1} = \frac{m(T,T;\theta_n)}{\alpha_{n+1}T} + \frac{1}{T}\int_0^T m(s,T;\theta_n)\,ds. \qquad (35)$$

Similarly we write equation (30) as

$$Y_T - \lambda\int_0^T m(s,T;\theta_n)\,ds - \nu T = 0, \qquad (36)$$

and equation (29) as

$$E_{\theta_n}\Big[\int_0^T X_s\,dY_s\,\Big|\,\mathcal{F}_T^Y\Big] - \lambda\int_0^T n(s,T;\theta_n)\,ds - \nu\int_0^T m(s,T;\theta_n)\,ds = 0. \qquad (37)$$

We compute the first term of (37) in the following proposition. Let $m_s(\theta_n) = E_{\theta_n}[X_s \mid \mathcal{F}_s^Y]$, and to simplify notation we will write $m(t,T) = m(t,T;\theta)$, $n(t,T) = n(t,T;\theta)$, $m_s = m_s(\theta)$.

Proposition 4.

$$E_\theta\Big[\int_0^T X_s\,dY_s\,\Big|\,\mathcal{F}_T^Y\Big] = \int_0^T m_s\,dY_s - \int_0^T m_s(\nu + \lambda m_s)\,ds + \nu\int_0^T m(s,T)\,ds + \lambda\int_0^T m(s,T)\,m_s\,ds. \qquad (38)$$

Proof. Let $\tilde W_t = Y_t - \nu t - \lambda\int_0^t m_s\,ds$, $0 \le t \le T$. By Lemma 11.3 in Lipster and Shiryayev (1978), the process $\{(\tilde W_t, \mathcal{F}_t^Y); 0 \le t \le T\}$ is a Brownian motion. By Theorem 12.1 of Lipster and Shiryayev (1978), $m_t$ satisfies the equation

$$dm_t = \alpha(\delta - m_t)\,dt - \lambda\gamma_t(\nu + \lambda m_t)\,dt + \lambda\gamma_t\,dY_t, \quad m_0 = 0, \qquad (39)$$

where $\gamma_t = \mathrm{Var}(X_t \mid \mathcal{F}_t^Y)$ is a deterministic function satisfying

$$\frac{d\gamma_t}{dt} = -\lambda^2\gamma_t^2 - 2\alpha\gamma_t + 1, \quad \gamma_0 = 0. \qquad (40)$$

From the definition of $\tilde W$,

$$d\tilde W_t = dY_t - (\nu + \lambda m_t)\,dt, \qquad (41)$$

and thus

$$dm_t = \alpha(\delta - m_t)\,dt + \lambda\gamma_t\,d\tilde W_t. \qquad (42)$$

We see from (42) that $m_t$ is $\mathcal{F}_t^{\tilde W}$-adapted. Then $Y_t$ is $\mathcal{F}_t^{\tilde W}$-adapted by (41).
Hence

$$E_\theta\Big[\int_0^T X_s\,dY_s\,\Big|\,\mathcal{F}_T^Y\Big] = E_\theta\Big[\int_0^T X_s\,dY_s\,\Big|\,\mathcal{F}_T^{\tilde W}\Big] = E_\theta\Big[\int_0^T X_s\,d\tilde W_s + \int_0^T X_s(\nu + \lambda m_s)\,ds\,\Big|\,\mathcal{F}_T^{\tilde W}\Big] = E_\theta\Big[\int_0^T X_s\,d\tilde W_s\,\Big|\,\mathcal{F}_T^{\tilde W}\Big] + \nu\int_0^T m(s,T)\,ds + \lambda\int_0^T m(s,T)\,m_s\,ds. \qquad (43)$$

Now by Theorem 5.14 in Lipster and Shiryayev (1977, p. 185),

$$E_\theta\Big[\int_0^T X_s\,d\tilde W_s\,\Big|\,\mathcal{F}_T^{\tilde W}\Big] = \int_0^T m_s\,d\tilde W_s, \qquad (44)$$

provided that

$$E|X_s| < \infty, \quad 0 \le s \le T, \qquad (45)$$

and

$$P\Big(\int_0^T \big[E(|X_s| \mid \mathcal{F}_s^{\tilde W})\big]^2\,ds < \infty\Big) = 1. \qquad (46)$$

Conditions (45) and (46) hold in our case, and thus by (44), (43) and (41),

$$E_\theta\Big[\int_0^T X_s\,dY_s\,\Big|\,\mathcal{F}_T^Y\Big] = \int_0^T m_s\,d\tilde W_s + \nu\int_0^T m(s,T)\,ds + \lambda\int_0^T m(s,T)\,m_s\,ds = \int_0^T m_s\,dY_s - \int_0^T m_s(\nu + \lambda m_s)\,ds + \nu\int_0^T m(s,T)\,ds + \lambda\int_0^T m(s,T)\,m_s\,ds,$$

which is the statement of the Proposition.

By Proposition 4, equation (37) becomes

$$0 = \int_0^T m_s(\theta_n)\,dY_s - \nu\int_0^T m_s(\theta_n)\,ds - \lambda\int_0^T m_s^2(\theta_n)\,ds + \lambda\int_0^T m(s,T;\theta_n)\,m_s(\theta_n)\,ds - \lambda\int_0^T n(s,T;\theta_n)\,ds. \qquad (47)$$

From (36) and (47) the updated values for $\lambda$ and $\nu$ are

$$\lambda_{n+1} = \frac{T\int_0^T m_s(\theta_n)\,dY_s - Y_T\int_0^T m_s(\theta_n)\,ds}{T\int_0^T \big[m_s^2(\theta_n) + n(s,T;\theta_n) - m(s,T;\theta_n)m_s(\theta_n)\big]\,ds - \int_0^T m(s,T;\theta_n)\,ds\int_0^T m_s(\theta_n)\,ds} \qquad (48)$$

and

$$\nu_{n+1} = \frac{Y_T - \lambda_{n+1}\int_0^T m(s,T;\theta_n)\,ds}{T}. \qquad (49)$$

To implement the E-M algorithm using equations (34), (35), (48) and (49) we need expressions for $m_s$, $m(s,T)$ and $n(s,T)$. These expressions are obtained in the next section.

5 Expressions for Filters

We obtain $m_t$ as the solution of equation (39). Letting

$$\Phi_t = \exp\Big(-\alpha t - \lambda^2\int_0^t \gamma_s\,ds\Big), \qquad (50)$$

the solution is

$$m_t = \Phi_t\Big[\lambda\int_0^t \gamma_s\Phi_s^{-1}(dY_s - \nu\,ds) + \alpha\delta\int_0^t \Phi_s^{-1}\,ds\Big]. \qquad (51)$$

The function $\gamma_t$ is obtained as the solution of (40) and is given by

$$\gamma_t = \frac{b}{\lambda^2}\,\frac{\exp(2bt) - d}{\exp(2bt) + d} - \frac{\alpha}{\lambda^2}, \qquad (52)$$

where

$$b = \sqrt{\alpha^2 + \lambda^2}, \qquad d = \frac{b - \alpha}{b + \alpha}.$$

For the evaluation of $m_t$ it is useful to note the following expressions.
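Before listing those expressions, we note that the closed form (52) can also be checked numerically. The following sketch (with arbitrary illustrative values $\alpha = 2$, $\lambda = 1.5$, not taken from the paper) codes (52) directly and verifies it against the Riccati equation (40) by a central finite difference.

```python
import numpy as np

def gamma(t, alpha, lam):
    """Conditional variance gamma_t from (52):
    gamma_t = (b/lam^2)*(exp(2bt) - d)/(exp(2bt) + d) - alpha/lam^2,
    with b = sqrt(alpha^2 + lam^2) and d = (b - alpha)/(b + alpha)."""
    b = np.sqrt(alpha**2 + lam**2)
    d = (b - alpha) / (b + alpha)
    e = np.exp(2.0 * b * t)
    return (b / lam**2) * (e - d) / (e + d) - alpha / lam**2

# Spot check of the Riccati equation (40):
#   d(gamma_t)/dt = 1 - 2*alpha*gamma_t - lam^2*gamma_t^2,  gamma_0 = 0.
alpha, lam, t, h = 2.0, 1.5, 0.3, 1e-6
lhs = (gamma(t + h, alpha, lam) - gamma(t - h, alpha, lam)) / (2 * h)
rhs = 1.0 - 2.0 * alpha * gamma(t, alpha, lam) - lam**2 * gamma(t, alpha, lam) ** 2
# lhs and rhs agree to numerical precision, and gamma(0) = 0 as required.
```

The same check applies to $\gamma(t,s)$ in (57) below, which has the identical form with $t$ replaced by $t - s$.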
Substituting (52) into (50) gives

$$\Phi_t = \exp\Big(-b\int_0^t \frac{(b+\alpha)e^{bs} - (b-\alpha)e^{-bs}}{(b+\alpha)e^{bs} + (b-\alpha)e^{-bs}}\,ds\Big) = \exp\Big(-\ln\big[(b+\alpha)e^{bs} + (b-\alpha)e^{-bs}\big]\Big|_0^t\Big) = \frac{2b}{(b+\alpha)e^{bt} + (b-\alpha)e^{-bt}},$$

and also

$$\gamma_s\Phi_s^{-1} = \frac{\sinh(bs)}{b}.$$

We note that the likelihood function of $\theta$ involves only the filter $m_s$, and is of the following explicit form:

$$\exp\Big\{\int_0^T [\lambda m_s(\theta) + \nu]\,dY_s - \frac{1}{2}\int_0^T [\lambda m_s(\theta) + \nu]^2\,ds\Big\}.$$

Application of Lemma 12.3 and Theorem 12.3 in Lipster and Shiryayev (1978, pp. 36-37) to our model shows that

$$m(s,t) = \big(\varphi_s^t\big)^{-1} m_t - \alpha\delta\int_s^t (\varphi_s^u)^{-1}\,du - \lambda\int_s^t (\varphi_s^u)^{-1}\gamma(u,s)\,(dY_u - \nu\,du), \quad s \le t, \qquad (53)$$

and

$$\gamma(s,t) = \Big[1 + \lambda^2\gamma_s\int_s^t (\varphi_s^u)^2\,du\Big]^{-1}\gamma_s, \quad s \le t, \qquad (54)$$

respectively, where $\varphi_s^t$, $t \ge s$, is the solution of

$$\frac{d\varphi_s^t}{dt} = \big(-\alpha - \lambda^2\gamma(t,s)\big)\varphi_s^t, \qquad \varphi_s^s = 1, \qquad (55)$$

and, for $t > s$, $\gamma(t,s) = \mathrm{Var}(X_t \mid \mathcal{F}_t^Y \vee \sigma(X_s))$ satisfies

$$\frac{d\gamma(t,s)}{dt} = -2\alpha\gamma(t,s) + 1 - \lambda^2\gamma(t,s)^2, \qquad \gamma(s,s) = 0. \qquad (56)$$

It can be easily verified that the solution to (56) is

$$\gamma(t,s) = \frac{b}{\lambda^2}\,\frac{\exp[2b(t-s)] - d}{\exp[2b(t-s)] + d} - \frac{\alpha}{\lambda^2}, \quad t \ge s. \qquad (57)$$

Using (57), the solution to (55) is

$$\varphi_s^t = \exp\Big(-b\int_s^t \frac{\exp[2b(u-s)] - d}{\exp[2b(u-s)] + d}\,du\Big) = (1+d)\,\frac{\exp[b(t-s)]}{\exp[2b(t-s)] + d}.$$

For the evaluation of $m(s,t)$ it is useful to note that

$$\big(\varphi_s^t\big)^{-1}\gamma(t,s) = \frac{\sinh[b(t-s)]}{b},$$

and to evaluate $\gamma(s,t)$ we compute the integral

$$\int_s^t (\varphi_s^u)^2\,du = \frac{4b^2}{(b+\alpha)^2}\int_s^t \frac{\exp[2b(u-s)]}{\{\exp[2b(u-s)] + d\}^2}\,du = -\frac{2b}{(b+\alpha)^2}\,\frac{1}{\exp[2b(u-s)] + d}\Big|_s^t = \frac{\exp[2b(t-s)] - 1}{(b+\alpha)\exp[2b(t-s)] + b - \alpha}.$$

Finally we note that

$$n(s,t) = \gamma(s,t) + m(s,t)^2, \quad 0 \le s \le t.$$

Appendix

Proposition: Suppose that $\{(W_t, \mathcal{F}_t); t \le T\}$ is a Brownian motion, $\mathcal{X}$ is a complete separable metric space, $X : \Omega \to \mathcal{X}$ is a measurable mapping, and $\{Q(x, \cdot); x \in \mathcal{X}\}$ is a family of regular conditional probability measures for $\mathcal{F}$ given $X$. Let $P_X$ be the measure induced by $X$ on $\mathcal{B}(\mathcal{X})$.
Then

(a) for every $A \in \mathcal{F}$ such that $P(A) = 0$ we have $Q(x, A) = 0$ for $P_X$-almost every $x \in \mathcal{X}$;

(b) for $P_X$-almost every $x \in \mathcal{X}$, the process $\{(W_t, \mathcal{F}_t^W); t \le T\}$ is a Brownian motion under the probability measure $Q(x, \cdot)$.

Proof: Statement (a) is an obvious consequence of the definition of a regular conditional probability measure. Indeed, for every $A \in \mathcal{F}$ we have

$$Q(x, A) = P(A \mid X = x), \quad P_X\text{-a.e. } x \in \mathcal{X},$$

and thus $Q(X, A) = P(A \mid X)$, $P$-a.s. This implies that if $A$ is a $P$-zero event then $E[Q(X, A)] = P(A) = 0$, and (a) now follows.

Next we are going to prove (b). It follows from (a) that the process $W$ has $Q(x, \cdot)$-almost surely continuous paths for $P_X$-almost every $x \in \mathcal{X}$. We need to show that the finite-dimensional distributions of $W$ under $Q(x, \cdot)$ coincide with those under the probability measure $P$. Let us fix a positive integer $n$ and time points $0 \le t_1 \le \dots \le t_n \le T$. We denote by $\mathbb{Q}$ the set of rational numbers. For every $q = (q_1, \dots, q_n) \in \mathbb{Q}^n$ there exists a set $B_q \in \mathcal{B}(\mathcal{X})$, $P_X(B_q) = 1$, such that for all $x \in B_q$

$$Q(x, W(t_1) \le q_1, \dots, W(t_n) \le q_n) = P(W(t_1) \le q_1, \dots, W(t_n) \le q_n \mid X = x) = P(W(t_1) \le q_1, \dots, W(t_n) \le q_n),$$

where the last step follows from the independence of $X$ and $W$ under $P$. Now we define

$$B = \bigcap_{q \in \mathbb{Q}^n} B_q,$$

and note that $P_X(B) = 1$ since $\mathbb{Q}^n$ is countable. It follows that the equation above holds for every $x \in B$, which completes the proof.

Acknowledgement

Halina Frydman completed part of this work while visiting the Department of Biostatistics at the University of Copenhagen. She thanks the department for its hospitality.

REFERENCES

Bielecki, T. R. and Pliska, S. R. (1999). Risk Sensitive Dynamic Asset Management. Journal of Applied Mathematics and Optimization 39: 337-360.

Dembo, A. and Zeitouni, O. (1986). Parameter Estimation of Partially Observed Continuous Time Stochastic Processes. Stochastic Processes and their Applications 23: 91-113.

Elliott, R. J., Aggoun, L. and Moore, J. B. (1997). Hidden Markov Models. Springer-Verlag.

Haugh, M. B. and Lo, A. W. (2000).
Asset Allocation and Derivatives. Draft. MIT Sloan School of Management.

Kailath, T. and Zakai, M. (1971). Absolute Continuity and Radon-Nikodym Derivatives for Certain Measures Relative to Wiener Measure. The Annals of Mathematical Statistics 42: 130-140.

Karatzas, I. and Shreve, S. E. (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag.

Kim, T. and Omberg, E. (1996). Dynamic Nonmyopic Portfolio Behavior. Review of Financial Studies 9: 141-161.

Lakner, P. (1998). Optimal Trading Strategy for an Investor: the Case of Partial Information. Stochastic Processes and their Applications 76: 77-97.

Lipster, R. S. and Shiryayev, A. N. (1977). Statistics of Random Processes, Vol. 1. Springer-Verlag.

Lipster, R. S. and Shiryayev, A. N. (1978). Statistics of Random Processes, Vol. 2. Springer-Verlag.

Protter, P. (1990). Stochastic Integration and Differential Equations. Springer-Verlag.

Department of Statistics and Operations Research
Stern School of Business, New York University
44 West 4th Street, New York, NY 10012
E-mail: hfrydman@stern.nyu.edu
E-mail: plakner@stern.nyu.edu

Abbreviated Title: Hidden Markov Processes