Analyticity of Entropy Rate of Hidden Markov Chains with Continuous Alphabet

1 Analyticity of Entropy Rate of Hidden Markov Chains with Continuous Alphabet Guangyue Han and Brian Marcus, Fellow, IEEE, Abstract—We first prove that under certain mild assumptions, the entropy rate of a hidden Markov chain, observed when passing a finite-state stationary Markov chain through a discretetime continuous-output channel, is analytic with respect to the input Markov chain parameters. We then further prove, under strengthened assumptions on the channel, that the entropy rate is jointly analytic as a function of both the input Markov chain parameters and the channel parameters. In particular, the main theorems establish the analyticity of the entropy rate for two representative channels: Cauchy and Gaussian. The analyticity results obtained are expected to be helpful in computation/estimation of entropy rate of hidden Markov chains and capacity of finite-state channels with continuous output alphabet. Index Terms—hidden Markov chain, entropy rate, analyticity, continuous alphabet, Hilbert metric. 0 here z−n , (z−n , z−n+1 , · · · , z0 ) denotes an instance of 0 0 Z−n , (Z−n , Z−n+1 , · · · , Z0 ), and p(z−n ) denotes the prob0 ability density of z−n . It is well-known (e.g., see page 60 0 of [19]) that if h(Z−n ) is finite for all n, then the limit in (1) exists and can be written as h(Z) = lim hn (Z), n→∞ where Z hn (Z) = − Z n+1 −1 0 0 p(z−n ) log p(z0 |z−n )dz−n ; (2) −1 here p(z0 |z−n ) denotes the conditional density of z0 given −1 z−n . Since the channels considered in this paper are memoryless and Y is stationary, we have 0 0 h(Z−n |Y−n ) = (n + 1)h(Z0 |Y0 ), I. M AIN R ESULTS AND R ELATED W ORK where Introduction and background. Consider a discrete-time channel with a finite input alphabet Y = {1, 2, · · · , l} and the continuous output alphabet Z = R. We assume that the input process is a Y-valued first-order stationary Markov chain Y with transition probability matrix Π = (πij )l×l and stationary vector π = (πi )1×l (here we assume Y is first-order only for simplicity; a standard “blocking” approach can be used to reduce higher order cases to the first-order case). We assume that the channel is memoryless in the sense that at each time, the distribution of the output z ∈ Z, given the input y ∈ Y, is independent of the past and future inputs and outputs, and is distributed according to a probability density function q(z|y). The corresponding output process of this channel is a hidden Markov chain, which will be denoted by Z throughout the paper. The entropy rate h(Z) is defined as h(Z) = lim n→∞ 1 0 h(Z−n ), n+1 (1) when the limit exists, where 1 Z 0 0 0 0 h(Z−n )=− p(z−n ) log p(z−n )dz−n ; Z n+1 Guangyue Han is with the Department of Mathematics, The University of Hong Kong, Hong Kong, China. email: ghan@hku.hk. Brian Marcus is with the Department of Mathematics, The University of British Columbia, B.C., Canada. email: marcus@math.ubc.ca. A preliminary version of this paper has been presented in IEEE ISIT 2010 [14]. Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. 1 Throughout the paper, since all the integrands are at least continuous and most are analytic, all the integrals should be interpreted as Riemann integrals. h(Z0 |Y0 ) = − X y∈Y Z πy q(z|y) log q(z|y)dz. Z It then follows from 0 0 0 0 0 0 (n+1) = h(Z0 |Y0 )h(Z−n |Y−n ) ≤ h(Z−n ) ≤ h(Y−n )+h(Z−n |Y−n ) that if Z q(z|y) log q(z|y)dz is finite for all y, (3) z∈Z 0 ), which implies that h(Z) as so is h(Z0 |Y0 ) and then h(Z−n in (1) is well-defined and finite. Unless specified otherwise, we will assume that Π = Π~ε = ~ ε (πij ) is analytically parameterized by ~ε ∈ Ω1 , where Ω1 denotes a bounded domain in Rm1 (a domain is an open ~ and connected set), and for any (y, z) ∈ Y × Z, q θ (z|y) is ~ analytically parameterized by θ ∈ Ω2 , where Ω2 denotes a bounded domain in Rm2 . Main results. We are now ready to state our main results. Here, we remark that all our results in this paper can be straightforwardly translated to the setting where Z is finite or countably infinite. Under some mild regularity condition, we prove the following theorem, which establishes the analyticity of h(Z) as a function of the underlying Markov chain parameters. Theorem I.1. Assume that for any y ∈ Y, q(z|y) is positive and continuous on Z, and the following two integrals Z Z 0 q(z|y)| log min q(z|y )|dz, q(z|y)| log max q(z|y 0 )|dz 0 0 Z y Z y (4) 2 are finite. If Π is strictly positive at ~ε0 , then h(Z) is analytic around ~ε0 . Remark I.2. Theorem I.1 has been announced in [14], where Condition (3) has been assumed. Unfortunately, we have found that, at least for the rigorous proof in this paper, Condition (3) is not strong enough. More specifically, as in the proof of Theorem I.1, we need to use (4), a strengthened version of −1 Condition (3), to establish the analyticity of h~εn (Z0 |Z−n ) on m1 m1 C~ε0 (r1 ); here and throughout the paper, C~ε0 (r1 ) denotes the r1 -ball of ~ε0 in Cm1 with respect to the Euclidean metric. Remark I.3. When Z is finite, the analyticity of h(Z) has been established under mild positivity assumptions in [11]. Due to the fact that for any y, q(z|y) can be arbitrarily close to 0, the complexfication and uniform convergence arguments in [11] do not carry over to Theorem I.1. As elaborated in Sections V and VI, the proof of Theorem I.1 requires a critical use of the complex Hibert metric in Section V. We will also prove analyticity of h(Z) as a function of both the Markov chain parameters and channel parameters. Simple examples show that h(Z) can fail to be analytic as a function of the channel parameter alone; see Example IV.1 and Remark IV.2. Our positive results require several technical regularity conditions, which we describe as follows. These conditions involve the complexification of the channel density functions (by definition, any real analytic function, such as ~ q θ (z|y) as above, at a given point can be uniquely extended to a complex analytic function on some complex neighborhood of the given point; we will continue to use the same notation, ~ such as q θ (z|y), for this complex extension). We require these technical conditions, which abstract the properties of commonly used probability density functions (e.g., Cauchy and Gaussian), in order to make our proofs work. There may be more general conditions that suffice. On a first reading, the reader may want to skip directly to the statements of results below. Before listing our regularity conditions, we remind the ~ z)}z of θ~ is equiconreader that a family of functions {f (θ, tinuous if for any ε > 0, there exists δ > 0 such that |f (θ~1 , z) − f (θ~2 , z)| ≤ ε for all z as long as kθ~1 − θ~2 k ≤ δ. Note that uniform boundedness (over all feasible z) of the ~ z) with respect to θ~ will imply its equiconderivatives of f (θ, tinuity. The regularity conditions are as follows: For given (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , (a) Π is strictly positive at ~ε0 ; ~ (b) for all y ∈ Y, q θ0 (z|y) is positive on Z; (c) there exists r2 > 0 such that ~ (i) for any (y, z) ∈ Y × Z, q θ (z|y) is analytic on 2 Cm ~ (r2 ), θ0 ~ (ii) for any y ∈ Y, q θ (z|y) is jointly continuous on 2 Z × Cm ~0 (r2 ), θ (iii) the following three integrals Z Z Z q̆(z, r2 )dz, q̆(z, r2 ) log q̂(z, r2 )dz, q̆(z, r2 ) log q̆(z, r2 )dz, (5) are all finite, where q̆(z, r2 ) , ~ |q θ (z|y)|, sup m2 ~ (y,θ)∈Y×C (r2 ) ~ θ0 q̂(z, r2 ) , ~ inf m2 ~ (y,θ)∈Y×C (r2 ) ~ |q θ (z|y)|; θ0 (d) for some I ∈ {1, 2, . . . , l}, (i) there exist r2 > 0 such that for all j, the family ~ ~ 2 of functions {q θ (z|j)/q θ (z|I)}z on θ~ ∈ Cm ~0 (r2 ) is θ equicontinuous, (ii) there exists r2 > 0 such that for each z, the real ~ 2 log q θ (z|I) can be analytically extended to Cm ~0 (r2 ) θ and for all j, Z ~ ~ sup q θ (z|j) log q θ (z|I) dz < ∞. (6) Z ~ m2 (r2 ) θ∈C ~ θ0 It is easily seen that the most commonly used channel models, including Cauchy and Gaussian channels, satisfy all of these conditions; see Examples I.6 and I.7. The following theorem is our first result on the joint analyticity of h(Z). Theorem I.4. For given (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , assume Conditions (a), (b), (c) and (d). If, in addition, one of the following two conditions holds, 0 00 • there exist C , C > 0 such that for all i, j and all z ∈ Z, q θ~0 (z|j) (7) C0 ≤ ~ ≤ C 00 ; q θ0 (z|i) • for the same I as in Condition (d), there exists r2 > 0 such that for any ε > 0, there exists a compact subset Σ ⊂ Z such that for all z 6∈ Σ, all j 6= I and all 2 θ~ ∈ Cm ~0 (r2 ) θ q θ~ (z|j) (8) ~ ≤ ε, q θ (z|I) then h(Z) is analytic around (~ε0 , θ~0 ). Remark I.5. Conditions (7) and (8) are in fact mutually exclusive in the sense that if one of them holds, then the other one must fail. The reader should also note that while Theorem I.4 requires Condition (7) to hold only at one given parameter value, θ~0 , Condition (8) needs to hold for all parameter values θ~ in a complex neighborhood of the given θ~0 . On the other hand, if Condition (8) holds only at θ~0 , then one can conclude that h(Z) is analytic with respect to ~ε at ~ε0 . Theorem I.1, however, says that a condition as simple as (4) is already sufficient to imply the analyticity of h(Z) with respect to ~ε alone. In the following example, we show that Conditions (b)-(d) and (7) are satisfied for additive Cauchy channels. Example I.6. Consider an additive Cauchy channel with channel transition probability function taking the following form: 1 γi ~ q θ (z|i) = , γi > 0, i = 1, 2, . . . , l, (9) π (z − µi )2 + γi2 3 which is parameterized by θ~ ∈ Ω2 , where 2l Ω2 , {(γ1 , µ1 , γ2 , µ2 , . . . , γl , µl ) ∈ R : γi > 0, i = 1, 2, . . . , l}. Let θ~0 ∈ Ω2 . Obviously, Condition (b) is trivially satisfied. Now, for Condition (c), note that since γi > 0 for each i, each (z − µi )2 + γi2 is bounded away from 0 for all θ~ ∈ C2l ~0 (r2 ) if r2 is sufficiently small. The existence of r2 for (i) θ and (ii) then immediately follows. As for (iii), notice that for sufficiently small r2 , both q̆(z, r2 ) and q̂(z, r2 ) are of order O(1/z 2 ) for large z and are uniformly bounded below away from 0 for all z; this implies that the three integrals in (5) are all finite. We now check Condition (d) with I set to be 1 (here I can be chosen arbitrarily). For (i), observe that for sufficiently ~ ~ small r2 , the derivative of q θ (z|j)/q θ (z|1) with respect to θ~ ∈ m2 C~ (r2 ) is bounded uniformly over all j and z ∈ Z. As for θ0 (ii), observe that for sufficiently small r2 , the imaginary part ~ of q θ (z|1) is dominated by the real part, which implies that ~ 2 log q θ (z|1) can be analytically extended to Cm ~0 (r2 ), and the θ finiteness of the integral in (6) follows from the fact that for ~ large z, each q θ (z|i) is of order O(1/z 2 ). For Condition (7), it can be easily checked that for all i, j, ~ lim z→∞ q θ0 (z|j) q θ~0 (z|i) = 1, which immediately implies Condition (7). So, for an additive Cauchy channel, if Π is strictly positive at ~ε0 ∈ Ω1 , then h(Z) is analytic around (~ε0 , (γ1 , µ1 , γ2 , µ2 , . . . , γl , µl )). In the following example, we show that Conditions (b)-(d) and (8) are satisfied for additive Gaussian channels. Example I.7. Consider an additive Gaussian channel with channel transition probability function taking the following form: 2 2 1 ~ q θ (z|i) = √ e−(z−µi ) /(2σi ) , σi > 0, i = 1, 2, . . . , l, 2πσi and all σi are distinct, (10) which is parameterized by θ~ ∈ Ω2 , where and z ∈ Z, which immediately implies the equicontinuity. As for (ii), observe that √ (z − µI )2 ~ log q θ (z|I) = − log( 2πσI ) − , 2σI2 2 which can be analytically extended to Cm ~0 (r2 ) for sufficiently θ small r2 . And the finiteness of the integral in (6) follows from 2 ~ the fact that for large z, q θ (z|j) is of order e−O(z ) for all j. For Condition (8), it can be easily checked that for all i, j, ~ lim z→∞ 0 2 C1 e−C1 z ≤ q̆(z, r2 ), 0 2 q̂(z, r2 ) ≤ C2 e−C2 z , which immediately implies (iii). We now check Condition (d) with I set to be the index i corresponding to the largest σi . For (i), observe that for ~ ~ sufficiently small r2 , the derivative of q θ (z|j)/q θ (z|I) with m2 ~ ~ respect to θ is bounded on θ ∈ C~ (r2 ) uniformly over all j θ0 q θ~0 (z|I) = 0, which immediately implies Condition (8). So, for an additive Gaussian channel, if Π is strictly positive at ~ε0 ∈ Ω1 , then h(Z) is analytic around (~ε0 , (σ1 , µ1 , σ2 , µ2 , · · · , σl , µl )). Our second result on joint analyticity deals with the case when for all i, the real part of q θ (z|i) “dominates” the imaginary part of the complex extension. Theorem I.8. For given (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , assume Conditions (a), (b) and (c). If, in addition, for any δ > 0, there exists 2 r2 > 0 such that for all (y, z) ∈ (Y, Z) and all θ~ ∈ Cm ~0 (r2 ) θ ~ q θ (z|y) ~ ~ θ θ (i) |=(q (z|y))| < δ|<(q (z|y))|, (ii) log ~ ≤ δ, q θ0 (z|y) (11) then h(Z) is analytic around (~ε0 , θ~0 ). Theorem I.8 applies to Cauchy channels (9) roughly because small complex perturbations of the parameters will have small ~ effect on the imaginary part of q θ (z|i) uniformly over z and i. However, it need not apply to Gaussian channels (10) because a small imaginary perturbation of a variance may overwhelm ~ the real part of q θ (z|i) for large z. Theorem I.8 also applies to other channels for which Conditions (7) and (8) fail. Roughly ~ speaking, Condition (7) says that all q θ (z|i) are in some sense “comparable” with one another at given point θ~0 , and ~ Condition (8) says that one of q θ (z|i) dominates all the others. It is not surprising that one can find channels that do not satisfy both conditions, however satisfy (11). A simple yet somewhat artificial example of such channels with channel parameter θ~ = (µ1 , γ1 , µ2 , γ2 ) is defined as follows: Ω2 , {(σ1 , µ1 , σ2 , µ2 , . . . , σl , µl ) ∈ R2l : σi > 0, i = 1, 2, . . . , l}. Let θ~0 ∈ Ω2 . Obviously, Condition (b) is trivially satisfied. Now, for Condition (c), note that (i) and (ii) hold for any r2 with 0 < r2 < mini {σi }. And for any such fixed r2 , there exists C1 , C10 , C2 , C20 > 0, which are independent of z, such that q θ0 (z|j) ~ q θ (z|i) = and 1 γi , π (z − µi )2 + γi2 γi > 0, i = 1, 2, 2 1 q(z|3) = √ e−z /2 . 2π ~ ~ It is easy to see that q θ (z|1) is comparable to q θ (z|2), however not to q(z|3); and clearly, none of the three distributions dominates. On the other hand, such a channel satisfies the assumptions in Theorem I.8, and therefore h(Z) is analytic with respect to (µ1 , γ1 , µ2 , γ2 ) for the same reason as Theorem I.8 applies to Cauchy channels in Example I.6. Theorems I.4 and I.8 can be regarded as extensions of [11, Theorem 1.1], which deals with the case where Z is finite. In particular, the flow of the proof of Theorem I.4 follows closely 4 that of this case. However, new techniques are needed to deal with the continuous case. The extension to Theorems I.8 is less direct, where the use of a complex Hilbert metric [15] to replace the classical real Hilbert metric, is critical. This metric was also used in the proof of Theorem I.1. It is not needed for Theorem I.4, because it assumes stronger conditions on the channel density functions. We remark that in [11, Theorem 1.1], zero values are allowed for some transition probabilities. It seems more difficult to handle this phenomena in the continuous-alphabet setting; this is the subject of forthcoming work. To the best of our knowledge, the results in this paper, together with those in [14], are among the first results establishing analyticity of the entropy rate of hidden Markov chains with continuous alphabet. Nevertheless, following the approach in [11], the analyticity of the expected log-likelihood for certain hidden Markov models has been established in [40]. Our paper has some similarities with [40]. Specifically, the entropy rate of a hidden Markov chain corresponding to parameter θ is exactly the expected value of the log-likelihood when both the expectation and the log-likelihood are taken with the same parameter value. So, to prove analyticity of the entropy rate, we must let the expectation and log-likelihhod vary together. In contrast, in [40], the expectation is taken with respect to a fixed parameter value. Related work in finite alphabet. As opposed to continuous alphabet, the entropy rate of hidden Markov chains with finite alphabet has been extensively studied. Next, we briefly review the related results in the literature. In the remainder of this section, unless specified otherwise, we assume Z is a finite alphabet and X is a finite-state stationary Markov chain with transition probability matrix Π and stationary vector π; then, the output process Z is a finite-state hidden Markov chain. It is well known that h(X), the entropy rate of X, has a simple analytic formula in terms of Π and π; in stark contrast, there is no simple and explicit formula of h(Z) for most non-degenerate channels ever since hidden Markov chains (or, more precisely, hidden Markov models) were formulated half a century ago. Here, we remark that Blackwell [4] showed that h(Z) can be written as an integral of an explicit function on a simplex with respect to the Blackwell Measure. However, the Blackwell measure seems to be rather complicated for effective computation of h(Z). Recently there has been a rebirth of interest in computing and estimating h(Z) in a variety of scenarios, and many approaches have been adopted to tackle this problem: the Blackwell measure has been used to bound h(Z) [27], a variation on the classical Birch bounds [3] can be found in [7] and a new numerical approximation of h(Z) has been proposed in [25]. Generalizing Blackwell’s idea, an integral formula for the derivatives of h(Z) has been derived in [36]. In another direction, [2], [8], [18], [27], [42], [43], [26], [21], [33], [36], [29] have studied the variation of the entropy rate as parameters of the underlying Markov chain vary. Particularly in [42], for a special type of hidden Markov chain Z, the Taylor series expansion of h(Z) is given, under the assumption that h(Z) is analytic. Under mild positivity assumptions, we prove [11] that, h(Z) is an analytic function of Π, thereby confirming the analyticity assumption in [42]. Aside from its significance in mathematics (see, e.g., [31], [32], [5], [22]), this analyticity result is of interest to a wide range of applications (see, e.g., [42], [43], [1], [25], [40], [36]). In particular, the analyticity result and techniques employed in [11], such as complexification and exponential convergence, are rather useful in many aspects of information theory as well. Using the ideas for proving the analyticity result, we derive [16] an asymptotic formula of h(Z) at so-called “weak Black Holes”, rather general settings which include “Black Holes” [12] and the input-restricted channels in [13] as special cases. These results provide much insight for characterizing [13] the asymptotic behavior (as ε tends to zero) of capacity of a class of input-restricted channels and deriving [17] concavity of the mutual information rate for such channels. Here, we remark that the concavity result in [17] confirms the concavity conjecture in [30] in a special scenario; on the other hand, the conjecture in the general setting has been disproved in [24] by counter-examples constructed based on the results in [12]. It turns out the result and techniques in [11] are instrumental in terms of computing the mutual information rate and the channel capacity of finitestate channel as well. Recently, employing analyticity in a critical way, we have established refinements of the ShannonMcMillan-Breiman theorem which sheds light on effective computation of the mutual information rate of a class of finitestate channels and we further proposed [10] a randomized algorithm to compute the capacity of a class of finite-state channels, whose convergence behavior is analyzed using the analyticity result. In this paper, we will establish the analyticity of the entropy rate of hidden Markov chains with continuous alphabet. Given the implications of the counterpart results in the discrete setting, we expect that the results in this paper will be of great interest and significance in relevant areas. Organization of the paper. The remainder of this paper is organized as follows. In Section II, we review the (real) Hilbert metric, outline the framework of the proofs of our theorems and highlight the differences among the proofs. We prove Theorem I.4 in Section III; on the other hand, we show that analyticity fails for the Gaussian channel with uniform i.i.d. inputs in Section IV. Section V is devoted to reviewing the complex Hilbert metric, which is of critical use to the proofs of Theorem I.1 in Section VI and Theorem I.8 in Section VII. II. T HE M AIN I DEA OF THE P ROOFS We first briefly review the classical (real) Hilbert metric. The real Hilbert metric will be used in the proof of Theorem I.4. Let W be the standard simplex in the l-dimensional real Euclidean space, X W = {w = (w1 , w2 , · · · , wl ) ∈ Rl : wi ≥ 0, wi = 1}, i ◦ and let W denote its interior, consisting of the vectors with positive coordinates. For any two vectors v, w ∈ W ◦ , the Hilbert metric [38] is defined as wi /wj . (12) dH (w, v) = max log i,j vi /vj 5 Remark II.1. It immediately follows from the definition that for any diagonal matrix A with strictly positive diagonal entries, we have dH (w, v) = dH (wA, vA). For an l × l positive matrix T = (tij ) (i.e., each tij > 0), the mapping fT induced by T on W is defined by wT , (13) wT 1 where 1 is the all 1’s column vector. The following theorem is well-known (see [38]). fT (w) = Theorem II.2. For a positive T , fT is a contraction mapping on the entire W ◦ under the Hilbert metric and the contraction coefficient (often referred to as the Birkhoff coefficient), is given by p 1 − φ(T ) dH (vT, wT ) p , (14) = τ (T ) = sup 1 + φ(T ) v6=w dH (v, w) tik tjl tjk til . where φ(T ) = mini,j,k,l We will also need a complex version of W , X W̃ = {w = (w1 , w2 , · · · , wl ) ∈ Cl : wi = 1}. i ~ ~ ε,θ Now, for each z ∈ Z, define Π the entries (z) as an l × l matrix with ~ ~ ~ ε θ Π~ε,θ (z)ij = πij q (z|j), for all i, j. ~ (15) ~ By (13), Π~ε,θ (z) will induce a mapping fz~ε,θ , fΠ~ε,θ~ (z) from 0 , define W to W . For any fixed n and z−n ~ ~ ~ i x~εi ,θ = x~εi ,θ (z−n ) = p~ε,θ (yi = · |zi , zi−1 , · · · , z−n ), (16) (here · represent the states of the Markov chain Y ) then similar ~ to Blackwell [4], {x~εi ,θ } satisfies the random dynamical system ~ ~ ~ ,θ ,θ x~εi+1 = fz~εi+1 (x~εi ,θ ), (17) starting with ~ ,θ = π ~ε, x~ε−n−1 (18) where π ~ε is the stationary vector for Π~ε. And it can be verified that ~ ~ ~ ~ ,θ −1 p~ε,θ (z0 |z−n ) = x~ε−1 Πε,θ (z0 )1, (19) ~ ~ ε,θ 2 to C~εm01 (r1 ) × Cm ~0 (r2 ). Ultimately hn (Z) (which is defined θ ~ on p(z 0 ) and p(z0 |z −1 )) by (2) with real superscripts (~ε, θ) −n −n 2 (r2 ) as well. can be analytically extended to C~εm01 (r1 ) × Cm ~ θ0 The framework for the proofs of the main theorems can be outlined as follows: (I) If necessary, we consecutively re-block the Z process to k(i) a Ẑ process such that Ẑi is of the form Zj(i) . (II) We then show that there exists a complex neighborhood ~ of a subset of W such that each complexified fẑ~ε,θ is a contraction mapping, with respect to some metric, and ~ i moreover the complexified x̂~εi ,θ (ẑ−n̂ ) stays within the neighborhood. ~ i (III) It then follows that the complexified x~εi ,θ (z−n ) and thus ~ the complexified p~ε,θ (z0 |z−n ) exponentially forget their initial conditions. (IV) This, together with bounding arguments, will further im~ ply that the complexified h~εn,θ (Z) uniformly converges to a complex analytic function, which is necessarily ~ the complexified h~ε,θ (Z), on a complex domain, and ~ therefore h~ε,θ (Z) is analytic. The proofs of Theorem I.1 and I.8 are somewhat parallel, while the latter requiring extra attention to address the technical issues arising in proving joint analyticity. On the other hand, despite the fact that the proofs of Theorems I.4 and Theorems I.8 all fit in the same above-mentioned framework, there does not seem to be a natural way to unify them. Among numerous differences, the most essential one is the way we establish (II): ~ ~ ε,θ is • To establish (II), we will use the fact that each real fz ◦ a contraction on W , with respect to the Hilbert metric. Then we use the “equivalence” between the Euclidean metric and the Hilbert metric, and equicontinuity in Condition (d(i)) to establish the contractiveness (with ~ respect to the Euclidean metric) of the complexified fẑ~ε,θ on a complex neighborhood of W . • For Theorem I.8, to establish (II), the complex Hilbert metric in Section V is employed to directly show the contractiveness, with respect to the complex Hilbert met~ ric, of the complexified fz~ε,θ on a complex neighborhood of W ◦ . III. P ROOF OF T HEOREM I.4 ~ and ~ ~ ~ ~ 0 p~ε,θ (z−n ) = π ~εΠ~ε,θ (z−n )Π~ε,θ (z−n+1 ) · · · Π~ε,θ (z0 )1. (20) ~ ~ ~ 0 Evidently x~εi ,θ , p~ε,θ (z0 |z−n ) and p~ε,θ (z−n ) all depend on ~ ∈ Ω1 ×Ω2 . In what follows, we shall show the real vector (~ε, θ) that they can be “complexified”. For any ~ε ∈ C~εm01 (r1 ), one checks that for r1 > 0 small enough, the stationary vector π ~ε is unique and analytic on C~εm01 (r1 ) as a function of ~ε (because P ~ε it is the unique solution of π ~εΠ~ε = π ~ε, y πy = 1). ~ Then through (18) and (17), x~εi ,θ can be analytically ex2 tended to C~εm01 (r1 ) × Cm ~ (r2 ); furthermore, through (19) and ~ ~ ε,θ (20), p θ0 ~ ~ ε,θ (z0 |z−n ) and p 0 (z−n ) can be analytically extended The following lemma says that fz~ε,θ does not change much ~ under a small complex perturbation of (~ε, θ). Lemma III.1. For any δ > 0, there exist r1 , r2 > 0 such that ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z ∈ Z and any for any (~ε, θ) ~ ε0 ~0 θ x ∈ W , we have ~ ~ kfz~ε,θ (x) − fz~ε0 ,θ0 (x)k ≤ δ where k · k means the Euclidean norm. Proof. We first note that ~ ~ ~ fz~ε,θ (x) = xΠ~ε,θ (z) xΠ~ε,θ~ (z)1 = ~ x(Π~ε,θ (z)/q θ (z|I)) x(Π~ε,θ~ (z)/q θ~ (z|I))1 . 6 Assuming (7): The lemma immediately follows from (7) and Condition (d(i)). Assuming (8): It follows from (8) that for sufficiently small r1 , r2 > 0, there exists a compact subset Σ ⊂ Z such that for ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z 6∈ Σ and any x ∈ W , any (~ε, θ) ~ ~ ε0 ~ ~ ε,θ Lemma III.3. 1) For any δ > 0, there exist r1 , r2 > 0 ~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and for such that for any (~ε, θ) ~ ε0 ~0 θ 0 all z−n ∈ Z n+1 and −n − 1 ≤ i ≤ −1, ~ i x~εi ,θ (z−n ) ∈ W̃W (δ). θ0 ~ θ x(Π (z)/q (z|I))1 is bounded away from 0. On the other hand, by the compactness of Σ, we deduce that there exist ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any r1 , r2 > 0 such that for any (~ε, θ) ~ ~ ε0 ~ θ0 ~ z ∈ Z and any x ∈ W , x(Π~ε,θ (z)/q θ (z|I))1 is bounded away from 0. The lemma then follows from Condition (d(i)). Now, define 2) 3) There exist r1 , r2 > 0, 0 < ρ1 < 1 and L1 > 0 such that for any two Z-valued sequences {a0−n1 } and {b0−n2 } ~ ∈ Cm1 (r1 )×Cm2 (r2 ), with a0−n = b0−n and for all (~ε, θ) ~0 ~ ε0 θ we have ~ Lemma III.2. Given any (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , there exist r1 , r2 , δ > 0, 0 < ρ1 < 1 and a positive integer n0 such that, ~ ∈ Cm1 (r1 )×Cm2 (r2 ), for all zij with j ≥ i+n0 and all (~ε, θ) ~ ε0 ~ θ0 ~ fz~εj,θ i is a ρ1 -contraction mapping on W̃W (δ) under the Euclidean metric. Proof. For any z ∈ Z and sufficiently small r1 , r2 > 0, it follows from Remark II.1 that for any u, v ∈ W , we have ~ ~ dH (uΠ~ε0 ,θ0 (z), vΠ~ε0 ,θ0 (z)) = dH (uΠ~ε0 , vΠ~ε0 ). (21) It then follows that for any zij , ~ θ0 contraction on W̃W (δ) under the Euclidean metric. To prove (22), it is enough to prove the version of (22) ~ i ~ i with x~εi ,θ (z−n ) replaced by x̂~εi ,θ (ẑ−n ) (with perhaps smaller r1 , r2 ). To see this, note that by Lemma III.1, for sufficiently small ~ ∈ Cm1 (r1 )×Cm2 (r2 ), r1 , r2 > 0, for all ẑ, x ∈ W , and (~ε, θ) ~ ε0 ~ θ0 ~ ~ kfẑ~ε,θ (x) i where τ (Π~ε0 ), the Birkhoff coefficient of Π~ε0 as defined in (14), is strictly less than 1. It then follows from Lemma 3.4 in [23] (the mixing assumption is satisfied due to (15) and Condition (a)) that there exists C > 0 such that for any zij and any u, v ∈ W , ~ kfz~εj0 ,θ0 (u) i − ~ fz~εj0 ,θ0 (v)k i ≤ Cτ (Π~ε0 )j−i ku − vk. For n0 sufficiently large, ρ0 := Cτ (Π~ε0 )n0 < 1. Thus, for j ≥ i + n0 , ~ k∇fz~εj0 ,θ0 (w)k ≤ ρ0 < 1 i for all w ∈ W . From this and Condition (d(i)), it follows that there exists r1 , r2 , δ > 0, 0 < ρ1 < 1 such that ~ k∇fz~εj,θ (w)k i (24) kπ ~ε − π(~ε0 )k ≤ δ(1 − ρ1 ). (25) Thus, ~ ~ ~ ~ ~ ~ ~ ~ ,θ 0 ,θ0 kx̂~εi ,θ − x̂~εi 0 ,θ0 k = kfẑ~εi,θ (x̂~εi−1 ) − fẑ~εi0 ,θ0 (x̂~εi−1 )k ~ ~ ,θ 0 ,θ0 ≤ kfẑ~εi,θ (x̂~εi−1 ) − fẑ~εi,θ (x̂~εi−1 )k ~ ~ ~ ~ 0 ,θ0 0 ,θ0 + kfẑ~εi,θ (x̂~εi−1 ) − fẑ~εi0 ,θ0 (x̂~εi−1 )k. (26) Then by (24) and (25), and (26), we have ~ ~ ~ ~ ,θ 0 ,θ0 kx̂~εi ,θ − x̂~εi 0 ,θ0 k ≤ ρ1 kx̂~εi−1 − x̂~εi−1 k + δ(1 − ρ1 ). So, for all i, ~ ~ ~ The following lemma essentially follows from the framework in the proof of Theorem 1.1 in [11]. We briefly outline the proof for completeness. We first introduce some notation. Let ~ −1 −1 p̊~ε,θ (z0 |z−n ) , p~ε,θ (z0 |z−n )/q θ (z0 |I), where I is the same as in Condition (d). ≤ δ(1 − ρ1 ), and for all ~ε ∈ C~εm01 (r1 ) ≤ ρ1 < 1 ~ − ~ fẑ~ε0 ,θ0 (x)k kx̂~εi ,θ − x̂~εi 0 ,θ0 k ≤ δ, ~ ∈ Cm1 (r1 ) × Cm2 (r2 ). The for all w ∈ W̃W (δ) and (~ε, θ) ~ ε0 ~0 θ lemma then follows. ~ (23) Proof. 1. For a fixed n0 > 0, we will consecutively reblock k(i) −1 −1 z−n into ẑ−n̂ such that each ẑi is of the form zj(i) , where k(i) − j(i) + 1 = n0 (n0 is determined below). ~ ~ 0 By (17), for any z−n and i, x~εi 0 ,θ0 (and thus x̂~εi 0 ,θ0 ) belongs to W ◦ . By Lemma III.2, we can choose r1 , r2 , δ > 0 sufficiently small, n0 sufficiently large and 0 < ρ1 < 1 ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), f ~ε,θ~ is a ρ1 such that for all (~ε, θ) ẑ ~ ε0 ~ dH (fz~εj0 ,θ0 (u), fz~εj0 ,θ0 (v)) ≤ τ (Π~ε0 )j−i dH (uΠ~ε0 , vΠ~ε0 ), i ~ ~ ε,θ n |p̊~ε,θ (a0 |a−1 (b0 |b−1 −n1 ) − p̊ −n2 )| ≤ L1 ρ1 . W̃W (δ) = {w̃ ∈ W̃ : kw̃ − wk < δ for some w ∈ W }. We need the following lemma. (22) 0 There exist r1 , r2 > 0 such that for all z−n ∈ Z n+1 , ~ −1 2 p~ε,θ (z0 |z−n ) is analytic on C~εm01 (r1 ) × Cm ~0 (r2 ). θ and thus for all i, we have x̂~εi ,θ ∈ W̃W (δ), as desired. 2. It follows from Condition (d(i)) that for sufficiently small ~ r1 , r2 , δ > 0 and any z ∈ Z, fz~ε,θ (x) is analytic with respect to m m ~ x) ∈ C 1 (r1 )×C 2 (r2 )× W̃W (δ). It then follows from (~ε, θ, ~ ε0 ~ θ0 ~ this fact and the iterative nature of x~εi ,θ (see (17)) and Part 1 ~ that for sufficiently small r1 , r2 > 0, each x~εi ,θ is analytic on m2 m1 C~ε0 (r1 ) × C~ (r2 ). Part 2 then immediately follows from θ0 (19). 3. Applying the same reblocking as in Part 1, we write ~ ~ ~ ,θ x̂~εi,â = x̂~εi ,θ (âi−n̂1 ) = p~ε,θ (yk(i) = · |âi−n̂1 ), 7 ~ ~ ~ x̂~εi,,b̂θ = x̂~εi ,θ (b̂i−n̂2 ) = p~ε,θ (yk(i) = · |b̂i−n̂2 ). In other words, Rn uniformly (in θ ∈ D) converges to Z f (θ, z)dz, Evidently we have ~ ~ ~ ~ ,θ ,θ ,θ x̂~εi+1,â = fâ~εi+1 (x̂~εi,â ), ~ Σ ~ ,θ x̂~εi+1, = fb̂~ε,θ (x̂~εi,,b̂θ ). b̂ i+1 Note that there exists a positive constant L01 such that for all ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), (~ε, θ) ~0 θ ~ ε0 ~ ~ ,θ ,θ kx̂~ε−n̂,â − x̂~ε−n̂, k ≤ L01 , b̂ ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), where r1 , r2 > 0 are for all (~ε, θ) ~ ~ ε0 θ0 ~ chosen sufficiently small. Since fẑ~ε,θ is a ρ1 -contraction on W̃W (δ), we have, by Part 1, ~ ~ ,θ ,θ kx̂~ε−1,â − x̂~ε−1, k ≤ L01 ρn̂−1 . 1 b̂ This implies that there exists L1 > 0, independent of n1 , n2 , ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), such that for all (~ε, θ) ~ ~ ε0 θ0 ~ ~ ε,θ kp̊ ~ ~ ε,θ (a0 |a−1 −n1 ) − p̊ R and so Σ f (θ, z)dz is continuous in θ ∈ D. Now, take any increasing sequence of compact sets Σi R whose union is Rn2 . By R(28), Σi f (θ, z)dz converges uniformly, in θ ∈ D, to Rn2 f (θ, z)dz, which is therefore continuous on D. This gives Part 1. Part 2 follows in the same way with analyticity replacing continuity. n (b0 |b−1 −n2 )k ≤ L1 ρ1 . (27) We are now ready for the proof of Theorem I.4. Proof of Theorem I.4. We first show that there exist r1 , r2 > 0 ~ 2 such that for any n, h~εn,θ (Z) is analytic on C~εm01 (r1 )×Cm ~0 (r2 ). θ For a fixed n, recall that Z ~ ~ 0 ~ −1 ~ ε,θ 0 hn (Z) = − p~ε,θ (z−n ) log p~ε,θ (z0 |z−n )dz−n , Z n+1 where ~ The proof of the following lemma is straightforward. Nonetheless, we sketch the proof for completeness. Lemma III.4. Let n1 and n2 be positive integers and D a compact domain in Cn1 . Let f (θ, z) be a jointly continuous function on D × Rn2 . Assume that Z sup |f (θ, z)|dz < ∞. (28) Rn2 θ∈D Then R 1) Rn2 f (θ, z)dz is continuous on D. 2) RIf, for each z ∈ Z, f is analytic on D, then f (θ, z)dz is analytic on D. Rn2 Proof. Let Σ be a compact domain in Rn2 . Let δi , i = 1, 2, . . . be a sequence of positive numbers converging to 0. Consider a sequence of partitions of Σ: Σ= Rn = f (θ, zi )vol(∆n,i ) i=1 (here zi ∈ ∆n,i ) is continuous in θ. Then Z mn Z X f (θ, z)dz − Rn = (f (θ, z) − f (θ, zi ))dz. Σ i=1 ∆n,i By the compactness of D and Σ, we deduce that for any ε0 > 0, there exists N0 such that for all n ≥ N0 and all i = 1, 2, . . . , mn |f (θ, z) − f (θ, zi )| ≤ ε0 , for any θ ∈ D and any z ∈ ∆n,i , X and 0 Y 0 p~ε(y−n ) 0 y−n (29) i=−n ~ ~ ~ q θ (zi |yi ) ~ ,θ −1 −1 ) = x~ε−1 p~ε,θ (z0 |z−n (z−n )Π~ε,θ (z0 )1. P 0 Now, it follows from (29) and the fact that p~ε(y−n ) 0 y−n is continuous and therefore bounded as a function of ~ε ∈ C~εm01 (r1 ) that there is a constant K > 0 such that 0 Y ~ sup m1 m ~ (~ ε,θ)∈C (r1 )×C~ 2 (r2 ) ~ ε0 θ0 0 |p~ε,θ (z−n )| ≤ K i=−n ~ |q θ (zi |y)|. sup m2 ~ (y,θ)∈Y×C (r2 ) ~ θ 0 (30) It also follows from Part 1 of Lemma III.3 and (Condition (7) or (8)) that for sufficiently small r1 , r2 > 0, there exist 0 C1 , C2 > 0 such that for any z−n , ~ −1 )| ≤ C2 , C1 ≤ |p̊~ε,θ (z0 |z−n n ∪m i=1 ∆n,i , where diam(∆n,i ) ≤ δn for all 1 ≤ i ≤ mn . Evidently, the corresponding Riemann sum mn X 0 p~ε,θ (z−n )= (31) which implies that for some C3 > 0, ~ −1 | log p̊~ε,θ (z0 |z−n )| ≤ C3 . (32) ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), We then deduce that for any (~ε, θ) ~ ε0 ~0 θ Z ~ ~ε,θ~ 0 −1 0 sup ) dz−n p (z−n ) log p~ε,θ (z0 |z−n m1 m ~ Z n+1 (~ ε,θ)∈C (r1 )×C~ 2 (r2 ) ~ ε θ0 0 Z = sup m1 m ~ Z n+1 (~ ε,θ)∈C (r1 )×C~ 2 (r2 ) ~ ε 0 ~ ~ ε,θ +p ~ ~ε,θ~ 0 p (z−n ) log q θ (z0 |I) θ0 ~ −1 0 0 (z−n ) log p̊~ε,θ (z0 |z−n ) dz−n 0 Y ~ ~ 0 θ ≤ sup K q (zi |yi ) log q θ (z0 |I) dz−n m2 n+1 ~ Z θ∈C (r2 ) Z ~ θ 0 i=−n which implies that for any ε > 0, there exists N1 such that Z ~ 0 ~ −1 0 for all n ≥ N1 and all θ ∈ D, + sup |p~ε,θ (z−n )| | log p̊~ε,θ (z0 |z−n )|dz−n < ∞. m m n+1 Z 1 (r )×C 2 (r ) ~ Z (~ ε , θ)∈C 1 2 ~ ε0 ~ θ 0 f (θ, z)dz − Rn ≤ ε. (33) Σ 8 (for the first term we have used (5(i)) and (6); for the second term, we have used (5(i)), (30) and (32)). By lemma III.4 (Part 2), Z ~ ~ 0 ~ −1 −1 ~ ε,θ 0 hn (Z0 |Z−n ) = p~ε,θ (z−n ) log p~ε,θ (z0 |z−n )dz−n , Z n+1 2 is analytic on C~εm01 (r1 ) × Cm ~0 (r2 ). θ Now, to prove the theorem, we only need to prove that ~ there exist r1 , r2 > 0 such that h~εn,θ (Z) uniformly converges 2 on C~εm01 (r1 ) × Cm ~0 (r2 ) as n → ∞. First, we observe that θ Z ~ ~ 0 ~ ~ ,θ −1 0 p~ε,θ (z−n−1 ) log p~ε,θ (z0 |z−n−1 )dz−n−1 (Z) − h~εn,θ (Z)| = |h~εn+1 n+2 Z Z ~ 0 ~ −1 ~ ε,θ ~ ε,θ 0 − p (z−n ) log p (z0 |z−n )dz−n n+1 ZZ ~ 0 ~ −1 0 p~ε,θ (z−n−1 ) log p̊~ε,θ (z0 |z−n−1 )dz−n−1 = n+2 Z Z ~ 0 ~ −1 0 − p~ε,θ (z−n ) log p̊~ε,θ (z0 |z−n )dz−n n+1 ZZ ~ 0 ~ −1 p~ε,θ (z−n−1 )(log p̊~ε,θ (z0 |z−n−1 ) = Z n+2 ~ −1 0 ))dz−n−1 − log p̊~ε,θ (z0 |z−n . ~ ∈ Cm1 (r1 ) × Cm2 (r2 ). Then, by (23), (31), (32), Fix (~ε, θ) ~ ε0 ~0 θ we have, for some 0 < ρ1 < 1, L01 , L1 > 0, ~ ~ ~ −1 −1 0 |p~ε,θ (z−n−1 )(log p̊~ε,θ (z0 |z−n−1 ) − log p̊~ε,θ (z0 |z−n ))| ~ 0 ~ ~ −1 −1 0 ~ ε,θ ~ ε,θ ~ ε,θ ≤ L1 p (z−n−1 )(p̊ (z0 |z−n−1 ) − p̊ (z0 |z−n )) ~ 0 ≤ L01 |p~ε,θ (z−n−1 )|L1 ρn1 . Notice that for any given δ > 0, there exist r1 , r2 > 0 such that for all ~ε ∈ C~εm01 (r1 ), 0 kπy~ε−n k ≤ (1 + δ)πy~ε−n , kπy~εi yi+1 k ≤ (1 + δ)πy~εi0yi+1 , 2 and for any y ∈ Y and all θ~ ∈ Cm ~0 (r2 ), θ Z Z ~ ~ |q θ (z|y)|dz ≤ (1 + δ) q θ0 (z|y)dz = 1 + δ, IV. FAILURE OF A NALYTICITY With the following example, we show that for Gaussian channels, h(Z) need not be analytic even as a function of the channel parameters alone, when the largest σi is not unique. Example IV.1. Consider an additive Gaussian channel parameterized as in (10) with the binary input alphabet Y = {1, 2}. Assume that the input Y is an i.i.d. process with P (Y1 = 1) = P (Y1 = 2) = 1/2; and assume that 2 2 2 2 1 1 e−(z+1) /σ1 , q(z|2) = √ e−(z−1) /σ2 . q(z|1) = √ 2πσ1 2πσ2 We then have p(z) = P (Y = 1)q(z|1) + P (Y = 2)q(z|2) 2 2 2 2 1 1 1 1 = √ e−(z+1) /σ1 + √ e−(z−1) /σ2 . 2 2πσ1 2 2πσ2 We claim that for any fixed σ > 0, analyticity of h(Z) as a function of (σ1 , σ2 ) fails at (σ1 , σ2 ) = (σ, σ). To see this, we fix σ1 = σ, and we show that h(Z) is not analytic with respect to σ2 at σ2 = σ. Note that for any real σ2 , Z ∞ h(Z) = − p(z) log p(z)dz Z−∞ ∞ 1 1 −(z−1)2 /σ22 e =− p(z) log √ dz 2 2πσ2 0 Z 0 2 2 1 1 e−(z−1) /σ dz (34) − p(z) log √ 2 2πσ Z−∞ ∞ − p(z) log (1 + Φz (σ2 )) dz 0 Z 0 − p(z) log 1 + Φ−1 (35) z (σ2 ) dz, −∞ where 1 σ2 (z−1)2 /σ22 −(z+1)2 /σ2 e and Φ−1 . z (σ2 ) , σ Φz (σ2 ) (36) Z Z We note that (34) is analytic as a function of σ at σ = σ. 2 2 R ~ (here we have used the fact that Z |q θ (z|y)|dz is a continuous To see this, observe that the first term of (34) can be further 2 computed as function of θ~ ∈ Cm ~0 (r2 ); this follows from Lemma III.4 (Part θ Z ∞ Z ∞ 1)). It then follows from (30) that √ 2 2 1 Z √ p(z)dz− (z−1)2 e−(z+1) /σ dz − log(2 2πσ2 ) 2 ~ −1 −1 2 2πσσ2 0 0 |p~ε,θ (z−n−1 )|dz−n−1 ≤ (1 + δ)2(n+2) . Z ∞ Z n+1 2 2 1 √ − (z − 1)2 e−(z−1) /σ2 dz, By choosing δ > 0 sufficiently small, we can combine all 3 2 2πσ2 0 the relevant inequalities above to obtain some L > 0 and some m m 1 2 ~ ∈ C (r1 ) × C (r2 ), which is analytic at σ2 = σ since each of the three terms above 0 < ρ < 1 such that for all (~ε, θ) ~ ε0 ~0 θ is analytic at σ2 = σ (for the second or third term, regard σ2 as ~ ~ ,θ a complex variable and use the exponentially-decaying tail of |h~εn+1 (Z) − h~εn,θ (Z)| Z the integrand). With a similar argument applied to the second ~ 0 ~ ~ −1 −1 0 ≤ |p~ε,θ (z−n−1 )(log p̊~ε,θ (z0 |z−n−1 ) − log p̊~ε,θ (z0 |z−n ))|dz−n−1 term of (34), we can then establish the analyticity of (34). So, Z n+2 to prove h(Z) is not analytic at σ2 = σ, it suffices to show n ≤ Lρ , that (35) is not analytic at σ2 = σ. ~ Suppose, by way of contradiction, that (35) is analytic at σ, which implies the uniform convergence of h~εn,θ (Z) on or equivalently, the following function of ω m1 m2 C~ε0 (r1 ) × C~ (r2 ) as n tends to infinity, and thus the Z θ0 ∞ 2 2 1 ω 1 σ −1 −(z+1)2 σ−2 ~ √ e + √ e−(z−1) ω log (1 + Φz (1/ω)) dz analyticity of h~ε,θ (Z) around (~ε0 , θ~0 ). Φz (σ2 ) , 0 2 2π 2 2π 9 Z 0 + −∞ 2 2 1 σ −1 −(z+1)2 σ−2 1 ω √ e + √ e−(z−1) ω 2 2π 2 2π log 1 + Φ−1 z (1/ω) dz (37) is analytic at σ −1 ∈ Cσ−1 (r) (the closure of the rneighborhood of σ −1 in C) for some r > 0, where, recalling from (36), Φz (1/ω) = σ −1 (z−1)2 ω2 −(z+1)2 σ−2 e . ω Then, by uniqueness, the analytic extension of (37) to Cσ−1 (r) would agree with any analytic extension along the circle {σ −1 + reiα : α ∈ [−π/2, 3π/2]} (from α = −π/2 to α = 3π/2). Such an analytic extension is obtained by regarding ω as a complex variable on the circle (this is a valid analytic extension by virtue of the exponentially-decaying tails of the integrands in (37)). Here, we remark that for any r > 0 and α, there are at most two “singular” z (note that the following inequality boils down to a system of two quadratic equations in z) such that and ∂ (2(z − 1)2 σ −1 r sin α) ∂ ((z − 1)2 r2 sin 2α + b(r, α)) . ∂α ∂α Note that for all 0 ≤ z ≤ N , by (I), Φz (1/ω) will not go around −1 (in any direction) for one complete round as α increases from −π/2 to 3π/2. Next, we consider the case when z ≥ N . Notice that, by (II), for any fixed z ≥ N , as α increases from −π/2 + ε to π/2 − ε, B(z, r, α) increases as well. If, for some z ≥ N and α0 ∈ [−π/2, −π/2 + ε] ∪ [π/2 − ε, π/2], A(z, r, α0 ) > 0, it then follows from (II) that there exists ` = `(ε) > 0 such that ` → ∞ as ε → 0 and for the same z and any α ∈ [−`ε, `ε], A(z, r, α) > 0. On the other hand, it follows from (II) that for any z ≥ N and for any α ∈ [π/2 + ε, 3π/2 − ε], Φz (1/(σ −1 + reiα )) = −1, −1 A(z, r, α) < 0; iα which means log 1 + Φz (1/(σ + re )) or −1 + reiα )) would “blow up” at such z. log 1 + Φ−1 z (1/(σ However, an easy bounding argument (roughly speaking, the two “blowing up” terms will only do so “slowly”) yields that during the analytic extension, (37) is still well-defined with the presence of such singular z, and so the above analytic extension is indeed valid. Next, we will find a contradiction by showing that the analytic extension of (37) disagrees at α = −π/2 and α = 3π/2. Setting ω = σ −1 + reiα , we then have σ −1 σ −1 = −1 w σ + r cos α + ir sin α σ −2 + σ −1 r cos α − iσ −1 r sin α = , ea(r,α)+ib(r,α) , σ −2 + 2σ −1 r cos α + r2 where one can easily check that ∂b(r, α) = O(r). ∂α Then, some straightforward computations yield that a(r, α) = O(r), b(r, α) = O(r), Φz (1/ω) = eA(z,r,α) eiB(z,r,α) , where A(z, r, α) , 2(z−1)2 σ −1 r cos α+(z−1)2 r2 cos 2α−4zσ −2 +a(r, α), and B(z, r, α) , 2(z −1)2 σ −1 r sin α+(z −1)2 r2 sin 2α+b(r, α). Now, for some small yet fixed ε > 0, choose N > 0 large enough and then r > 0 small enough such that (I) for all 0 ≤ z ≤ N and all α ∈ [−π/2, 3π/2], B(z, r, α) ∈ (−π, π); (II) for all z ≥ N and all α ∈ [−π/2 + ε, π/2 − ε] ∪ [π/2 + ε, 3π/2 − ε], 4zσ −2 |a(r, α)|, |2(z − 1)2 σ −1 r cos α| |(z − 1)2 r2 cos 2α + a(r, α)| straightforward computations also yield that for any z ≥ N and for any α ∈ [π/2, π/2 + ε] ∪ [3π/2 − ε, 3π/2], A(z, r, α) < 0. It then follows that Φz (1/ω) will not go around −1 (in any direction) for one complete round as α increases from π/2 to 3π/2. We are now ready to conclude that as α increases from −π/2 to 3π/2, for any z ≥ N with Φz (1/(σ −1 +reiα )) 6= −1, Φz (1/ω) will go around −1 anti-clockwise k(z) times, where k(z) is a non-negative integer; meanwhile, one checks that when z is large enough, k(z) is strictly positive. The idea can be roughly described as follows. Consider the “trajectory” of Φz (1/ω) as α increases from −π/2 to 3π/2. Obviously, A(z, r, α) > 0 means the magnitude of the corresponding “location” is strictly bigger than 1; B(z, r, α) > 0 means at the corresponding “location”, Φz (1/ω) is going anti-clockwise. The above argument shows that given sufficiently small ε (and thus ` sufficiently large), for all the time when the “location” is at least 1 away from the origin, “more often” Φz (1/ω) goes around −1 anti-clockwise (for any α ∈ [−π/2, −π/2 + ε] ∪ [π/2 − ε, π/2], Φz (1/ω) may go around −1 clockwise, whereas for all α ∈ [−`ε, `ε], Φz (1/ω) must go around −1 anti-clockwise). So, for any analytic extension along the circle {σ −1 +reiα : α ∈ [−π/2, 3π/2]} (from α = −π/2 to α = 3π/2), we have proven that for any z ≥ 0 with Φz (1/(σ −1 + reiα )) 6= −1, −1 iα = lim log 1 + Φz (1/(σ + re )) α→(3π/2)− == lim α→(−π/2)+ log 1 + Φz (1/(σ −1 + reiα )) + 2k(z)πi, where k(z) is a non-negative integer for all z and a strictly positive integer for all sufficiently large z. Using a similar 10 argument, we can also prove that for any z ≤ 0 with −1 Φ−1 + reiα )) 6= −1, z (1/(σ −1 −1 iα = lim log 1 + Φz (1/(σ + re )) α→(3π/2)− == lim α→(−π/2)+ M = {Π = (πij ) ∈ Rl×l : πij ≥ 0, −1 −1 iα log 1 + Φz (1/(σ + re )) +2k(z)πi, Remark IV.2. The previous example shows that for fixed σ1 = σ, h(Z) is not analytic with respect to σ2 at σ2 = σ. On the other hand, when σ1 , σ2 are both equal to σ > 0 and vary together keeping σ1 = σ2 , h(Z) is in fact analytic with respect to σ. To see this, note that when σ1 = σ2 = σ > 0, we have 2 2 1 1 −(z+1)2 /σ2 + e−(z−1) /σ , e p(z) = √ 2 2πσ l X πij = 1}, j=1 where k(z) is a non-negative integer for all z and a strictly positive integer for all sufficiently large |z|. This, however, implies that for the above analytic extension, (37) disagrees at α = −π/2 and α = 3π/2, which is a contradiction. and furthermore, Z h(Z) = − call a complex Hilbert metric (for alternative complex Hilbert metrics, see [37] and [6]). Let M denote the set of all l × l stochastic matrices, i.e., and let M̃ denote the complex version of M , defined as M̃ = {Π = (πij ) ∈ Cl×l : l X πij = 1, for all i}. j=1 For a given positive Π ∈ M and a small δ1 > 0, let M̃Π (δ1 ) denote the δ1 -neighborhood, under the Euclidean metric, around Π within M̃ . For an element Π̃ ∈ M̃Π (δ1 ), similar to (13), Π̃ will induce a mapping fΠ̃ on W̃ . For a small δ2 > 0, let W̃W ◦ ,H (δ2 ) denote the δ2 -neighborhood of W ◦ within W̃ + under the complex Hilbert metric, i.e., W̃W ◦ ,H (δ2 ) = {v = (v1 , v2 , · · · , vl ) ∈ W̃ + : d˜H (v, u) ≤ δ2 , for some u ∈ W ◦ }. The main result of [15] states: ∞ p(z) log p(z) −∞ Z ∞ √ p(z)(z + 1)2 /σ 2 dz = − log(2 2πσ) + −∞ Z ∞ 2 − p(z) log(1 + e−4z/σ )dz −∞ Z ∞ √ p(z)(z + 1)2 /σ 2 dz = − log(2 2πσ) + −∞ Z ∞ 2 −σ p(σ 2 z) log(1 + e−4z )dz. (38) −∞ Note that it is easy to check that all the three terms in (38) are analytic with respect to σ, which immediately implies that h(Z) is analytic with respect to σ. V. A C OMPLEX H ILBERT M ETRIC In this section, we briefly review the complex Hilbert metric that will be of critical use in the proofs of Theorem I.1 and I.8. Several useful properties of this metric will be stated/proved as well. Recall that W̃ denote the complex version of W , X W̃ = {w = (w1 , w2 , · · · , wl ) ∈ Cl : wi = 1}. i Let W̃ + = {v ∈ W̃ : <(vi /vj ) > 0 for all i, j}. For v, w ∈ W̃ + , define wi /wj ˜ , dH (v, w) = max log i,j vi /vj (39) where log is taken as the principal branch of the complex log(·) function (i.e., the branch whose branch cut is the negative real axis). Since the principal branch of log is additive on the right-half plane, d˜H is a metric on W̃ + , which we Theorem V.1. Let Π be a positive matrix in M . For sufficiently small δ1 , δ2 > 0, there exists 0 < ρ1 < 0 such that for any Π̃ ∈ M̃Π (δ1 ), fΠ̃ is a ρ1 -contraction mapping on W̃W ◦ ,H (δ2 ) under the complex Hilbert metric in (39). For δ > 0, let CR+ [δ] denote the “δ-cone” of R+ within C, i.e., CR+ [δ] = {x + yi ∈ C : x > 0, −δx ≤ y ≤ δx}. The following Lemma can be easily checked. Lemma V.2. For sufficiently small δ1 > 0, there exists a positive constant L1 such that for any α, β ∈ CR+ [δ1 ] |α − β| |α − β| , . | log α − log β| ≤ L1 max |α| |β| The following lemma essentially follows from the proof of Part 2 of Lemma 2.3 in [15] (in particular, its Part 1 is just a rephrased version of Part 2 of that lemma), allows us to connect the complex Hilbert metric and the Euclidean metric. We give a proof for completeness. Lemma V.3. 1) For any δ > 0, there exists ξ > 0 such that for any x̃ ∈ W̃ + , x ∈ W ◦ with d˜H (x̃, x) ≤ ξ, we have x̃i ∈ W̃W ◦ ,H (δ) for all i. 2) For any ζ > 0, there exists a constant C > 0 such that for any x̃, ỹ ∈ W̃ + with kx̃ − xk, kỹ − yk ≤ ζ for some x, y ∈ W ◦ , we have kx̃ − ỹk ≤ C d˜H (x̃, ỹ). Proof. We only prove Part 2. Let ξ = d˜H (x̃, ỹ). Then we have for all i, j, log x̃i /ỹi ≤ ξ. x̃j /ỹj There exists C1 > 0 such that for ξ sufficiently small, and for /ỹi all i, j, x̃x̃ji /ỹ − 1 ≤ C1 ξ. Let αj = x̃j /ỹj . Then for all i, j, j |x̃i − αj ỹi | ≤ C1 ξ|αj ||ỹi |, 11 and so n n X X |1 − αj | = (x̃i − αj ỹi ) ≤ |x̃i − αj ỹi | i=1 ≤ C1 ξ|αj | i=1 n X |ỹi | = C1 (1 + Bζ)ξ|αj |. i=1 It follows that |x̃j − ỹj | ≤ C1 (1 + Bζ)ξ|x̃j | ≤ C1 (1 + Bζ)ξ(xj + ζ), which implies Part 2, if ξ is sufficiently small. VI. P ROOF OF T HEOREM I.1 The following lemma is an analog of Lemma III.1. Lemma VI.1. For any δ > 0, there exists r1 > 0 such that for any ~ε ∈ C~εm01 (r1 ), any z ∈ Z and any x ∈ W , we have d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ. Proof. Since all πij (~ε0 ) are strictly positive, for any δ1 > 0, there exists r1 > 0 such that for all i, j and all ~ε ∈ C~εm01 (r1 ), ~ ε |πij − πij (~ε0 )| ≤ δ1 . πij (~ε0 ) Now, for any x = (x1 , x2 , · · · , xl ) ∈ W , any j and any ~ε ∈ C~εm01 (r1 ), we have Pl ~ ε i=1 xi (πij − πij (~ε0 )) Pl ε0 ) i=1 xi πij (~ Pl ~ ε i=1 xi πij (~ε0 )(πij − πij (~ε0 ))/πij (~ε0 ) = ≤ δ1 . Pl xi πij (~ε0 ) i=1 Thus, for any δ2 > 0, choosing δ1 sufficiently small, we have Pl ~ ε x π i ij i=1 log Pl ε0 ) i=1 xi πij (~ ! Pl ~ ε ε0 )) i=1 xi (πij − πij (~ = log 1 + ≤ δ2 . Pl xi πij (~ε0 ) i=1 Notice that d˜H (fz~ε(x), fz~ε0 (x)) Pl Pl ~ ε ~ ε i=1 xi πij q(z|j) i=1 xi πik q(z|k) = max log Pl − log Pl j,k ε0 )q(z|j) ε0 )q(z|k) i=1 xi πij (~ i=1 xi πik (~ Pl Pl ~ ε ~ ε i=1 xi πij i=1 xi πik = max log Pl − log Pl . j,k ε0 ) ε0 ) i=1 xi πij (~ i=1 xi πik (~ It then follows that for any δ > 0, there exists r1 > 0 such that for any ~ε ∈ C~εm01 (r1 ) and any x ∈ W , we have d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ. is uniform over all the values of z. More precisely, recalling W̃W ◦ ,H (δ) denote the δ-neighborhood of W ◦ of W̃ under the complex Hilbert metric, we have Lemma VI.2. For sufficiently small r1 , δ > 0, there exists 0 < ρ1 < 1 such that for any ~ε ∈ C~εm01 (r1 ) and any z ∈ Z, fz~ε is a ρ1 -contraction mapping on W̃W ◦ ,H (δ) under the complex Hilbert metric in (39). Proof. It can be easily checked that there exist some r1 , δ > 0 such that for any ~ε ∈ C~εm01 (r1 ), any u, v ∈ W̃W ◦ ,H (δ) and any z ∈ Z, d˜H (uΠ~ε(z), vΠ~ε(z)) = d˜H (uΠ~ε, vΠ~ε). (40) The lemma then immediately follows from Theorem V.1. The following lemma is also needed. Lemma VI.3. 1) For any δ > 0, there exist r1 > 0 such 0 ∈ Z n+1 and that for any ~ε ∈ C~εm01 (r1 ) and for any z−n −n − 1 ≤ i ≤ −1, i x~εi(z−n ) ∈ W̃W ◦ ,H (δ), (41) −1 ) ∈ CR+ [δ]. p~ε(z0 |z−n (42) and 0 2) There exist r1 > 0 such that for all z−n ∈ Z n+1 , −1 p~ε(z0 |z−n ) is analytic on C~εm01 (r1 ). 3) For sufficiently small r1 > 0, there exist 0 < ρ1 < 1 and a positive constant L1 such that for any two Z-valued sequences {a0−n1 } and {b0−n2 } with a0−n = b0−n and for all ~ε ∈ C~εm01 (r1 ), we have −1 n ~ ε q(a0 |y 0 ). |p~ε(a0 |a−1 −n1 ) − p (b0 |b−n2 )| ≤ L1 ρ1 max 0 y ∈Y Proof. 1. By Lemma VI.2, we can choose r1 , δ > 0 sufficiently small such that there exists 0 < ρ1 < 1 such that for all ~ε ∈ C~εm01 (r1 ), fz~ε is a ρ1 -contraction mapping on W̃W ◦ ,H (δ) under the complex Hilbert metric. Now, choose r1 > 0 so small (the existence of r1 is guaranteed by Lemma VI.1) such that for any z ∈ Z, for all x ∈ W , all ~ε ∈ C~εm01 (r1 ) d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ(1 − ρ1 ), and for all ~ε ∈ (43) C~εm01 (r1 ), d˜H (π ~ε, π(~ε0 )) ≤ δ(1 − ρ1 ). (44) We then deduce that 0 0 d˜H (x~εi+1 , x~εi+1 ) = d˜H (fz~εi+1 (x~εi), fz~εi+1 (x~εi 0 )) ≤ d˜H (f ~ε (x~ε), f ~ε (x~ε0 )) zi+1 i zi+1 i 0 + d˜H (fz~εi+1 (x~εi 0 ), fz~εi+1 (x~εi 0 )). (45) Then, by (43), (44) and (45), for i > −n − 1, we have 0 d˜H (x~εi+1 , x~εi+1 ) ≤ ρd˜H (x~εi, x~εi 0 ) + δ(1 − ρ1 ). So, for all i, The following lemma, roughly speaking, says that when we perturb ~ε0 “a bit” to ~ε, fz~ε is still a contraction mapping on a complex neighborhood of W ◦ , and the contraction coefficient 0 d˜H (x~εi+1 , x~εi+1 ) ≤ δ, and thus for all i, we have x~εi+1 ∈ W̃W ◦ ,H (δ), as desired. This, together with (19) and Lemma V.3 (Part 2), implies (42). 12 2. It follows from Part 1 of Lemma V.3 that for sufficiently small r1 , δ > 0 and any z ∈ Z, fz~ε(x) is analytic with respect to (~ε, x) ∈ C~εm01 (r1 )× W̃W ◦ ,H (δ). It follows from this fact, the ~ x~εi ,θ iterative nature of (see (17)) and Part 1 that for sufficiently small r1 > 0, each x~εi is analytic on C~εm01 (r1 ). Part 2 then immediately follows from (19). 3. For all ~ε ∈ C~εm01 (r1 ), we write And, by (41), for sufficiently small r1 > 0, there exist C1 , C2 > 0 such that for all ~ε ∈ C~εm01 (r1 ) −1 C1 min q(z0 |y 0 ) ≤ |p~ε(z0 |z−n )| ≤ C2 max q(z0 |y 0 ), 0 0 y ∈Y which, together with (42), implies that for some C3 > 0, −1 | log p~ε(z0 |z−n )| ≤ C3 +max{| log max q(z0 |y 0 )|, | log min q(z0 |y 0 )|}. 0 0 y ∈Y m1 Z n+1 ~ (r1 ) ε∈C~ ε 0 Apparently we have x~εi+1,a = fa~εi+1 (x~εi,a ), x~εi+1,b = fb~εi+1 (x~εi,b ). Note that there exists a positive constant L01 such that d˜H (x~ε−n,a , x~ε−n,b ) ≤ L01 , for all ~ε ∈ C~εm01 (r1 ), where r1 > 0 are chosen sufficiently small. Then from (52), we have d˜H (x~ε−1,a , x~ε−1,b ) ≤ L01 ρn−1 . 1 Therefore, by Lemma V.3, there exists a positive constant L001 independent of n1 , n2 such that for any ~ε ∈ C~εm01 (r1 ), we have kx~ε−1,a − x~ε−1,b k ≤ L001 ρn1 . (46) y we conclude that there is a positive constant L1 , independent of n1 , n2 such that −1 ~ ε n |p~ε(a0 |a−1 q(a0 |y 0 ). −n1 ) − p (b0 |b−n2 )| ≤ L1 ρ1 max 0 y ∈Y −1 By Lemma III.4 (Part 2), h~εn (Z0 |Z−n ) is analytic on C~εm01 (r1 ). Now, to prove the theorem, we only need to prove that there exist r1 > 0 such that the h~εn (Z), as n → ∞, uniformly converges on C~εm01 (r1 ). Note that Z −1 0 ~ ε ~ ε 0 p~ε(z−n−1 ) log p~ε(z0 |z−n−1 |hn+1 (Z) − hn (Z)| = )dz−n−1 n+2 Z Z −1 0 0 − p~ε(z−n ) log p~ε(z0 |z−n )dz−n Z n+1 Z −1 0 p~ε(z−n−1 = )(log p~ε(z0 |z−n−1 ) Z n+2 −1 0 . − log p~ε(z0 |z−n ))dz−n−1 Fix ~ε ∈ C~εm01 (r1 ). Then, by Lemmas V.2 and VI.3, either we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with (1 + δ1 )ρ1 < 1 Now, using (19) and the fact that X p~ε(a0 ) = πy~ε q(a0 |y), (47) −1 −1 0 ))| |p~ε(z−n−1 )(log p~ε(z0 |z−n−1 ) − log p~ε(z0 |z−n −1 −1 ~ ε ~ ε ) p (z |z ) − p (z |z 0 0 −n −n−1 0 ) ≤ L01 p~ε(z−n−1 −1 ~ ε p (z0 |z−n−1 ) −1 ≤ L01 |p~ε(z−n−1 )|L1 ρn1 max q(z0 |y 0 ), 0 We then have finished the proof. y ∈Y We are now ready for the proof of Theorem I.1. Proof of Theorem I.1. We first prove that there exist r1 > 0 such that for any n, h~εn (Z) is analytic on C~εm01 (r1 ). For a fixed n, recall that Z −1 ~ ε 0 0 hn (Z) = − p~ε(z−n ) log p~ε(z0 |z−n )dz−n , Z n+1 where 0 (z−n ) = X ~ ε p 0 Y 0 (y−n ) 0 y−n p −1 (z0 |z−n ) Now, for any ~ε ∈ 0 |p~ε(z−n )| ≤ X q(zi |yi ) Notice that for any given δ > 0, there exist r1 , r2 > 0 such that for all ~ε ∈ C~εm0 (r1 ), 0 |πy~ε−n | ≤ (1 + δ)πy~ε−n , −1 x~ε−1 (z−n )Π~ε(z0 )1. 0 y−n ≤ X 0 y−n 0 |p~ε(y−n )| 0 Y Zn q(zi |yi ) Z i=−n 0 Y i=−n max q(zi |y 0 ). 0 y ∈Y |πy~εi yi+1 | ≤ (1 + δ)πy~εi0yi+1 . It then follows from the first inequality sign of (48) that Z ~ −1 −1 |p~ε,θ (z−n )|dz−n ≤ (1 + δ)n , we have 0 |p~ε(y−n )| −1 −1 0 |p~ε(z−n−1 )(log p~ε(z0 |z−n−1 ) − log p~ε(z0 |z−n ))| −1 −1 ~ ε ~ ε p (z |z ) − p (z |z )) 0 0 −n −n−1 0 ≤ L01 p~ε(z−n−1 ) −1 ~ ε p (z0 |z−n ) y ∈Y i=−n = C~εm01 (r1 ), or we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with (1 + δ1 )ρ1 < 1, −1 0 ≤ L01 |p~ε(z−n )||p~ε(z−n−1 |z−n )|L1 ρn1 max q(z0 |y 0 ). 0 and ~ ε y ∈Y This, together with (4), implies that on Z 0 −1 0 ) log p~ε(z0 |z−n ) dz−n < ∞ (50) sup p~ε(z−n x~εi,b = x~εi(bi−n2 ) = p~ε(yi = · |bi−n2 ). p (49) C~εm01 (r1 ), x~εi,a = x~εi(ai−n1 ) = p~ε(yi = · |ai−n1 ), ~ ε y ∈Y Z n+1 ~ −1 −1 |p~ε,θ (z−n−1 )|dz−n−1 ≤ (1 + δ)n+1 . Moreover, similar to (49), we have for some C4 , C5 > 0, (48) ~0 ~ 0 C4 min q θ (z−n−1 |y 0 ) ≤ |p~ε,θ (z−n−1 |z−n )| ≤ C5 max q(z−n−1 |y 0 ). 0 0 y ∈Y y ∈Y 13 By choosing δ > 0 sufficiently small, we can combine all the relevant inequalities above to obtain some L > 0 and some 0 < ρ < 1 such that for all ~ε ∈ C~εm01 (r1 ), |h~εn+1 (Z) Z ≤ Z n+2 − It then follows from the second inequality of (11) that for ~ ∈ any δ > 0, there exist r1 , r2 > 0 such that for any (~ε, θ) m2 m1 C~ε0 (r1 ) × C~ (r2 ) and any x ∈ W , we have θ0 h~εn (Z)| ~ ~ d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x)) ≤ δ. 0 −1 −1 0 |p~ε(z−n−1 )(log p~ε(z0 |z−n−1 ) − log p~ε(z0 |z−n ))|dz−n−1 ≤ Lρn , The following lemma is an analog of Theorem VI.2. which implies the analyticity of h~ε(Z) around ~ε0 . Lemma VII.2. For sufficiently small r1 , r2 , δ > 0, there exists ~ ∈ Cm1 (r1 )×Cm2 (r2 ) and 0 < ρ1 < 1 such that for any (~ε, θ) ~ ~ ε0 VII. P ROOF OF T HEOREM I.8 The proof of Theorem I.8 follows from a parallel flow as in that of Theorem I.1. We will only give a sketch of the proof, while highlighting the parts requiring the extra conditions (b), (c) and (11). The following lemma is an analog of Lemma VI.1. Lemma VII.1. For any δ > 0, there exist r1 , r2 > 0 such ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z ∈ Z and any that for any (~ε, θ) ~ ε0 ~0 θ x ∈ W , we have ~ ~ d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x)) ≤ δ. Proof. By (11), we can choose r1 , r2 , δ > 0 sufficiently small ~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and such that for any z ∈ Z, any (~ε, θ) ~ ~ ε0 θ0 any u, v ∈ W̃W ◦ ,H (δ), ~ ~ d˜H (uΠ~ε,θ (z), vΠ~ε,θ (z)) is well-defined. Moreover, it can be easily checked that ~ ~ d˜H (uΠ~ε,θ (z), vΠ~ε,θ (z)) = d˜H (uΠ~ε, vΠ~ε). The following lemma is an analog of Theorem VI.3. Lemma VII.3. 1) For any δ > 0, there exist r1 , r2 > 0 ~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and for such that for any (~ε, θ) ~ ε0 ~0 θ 0 any z−n ∈ Z n+1 and −n − 1 ≤ i ≤ −1, Now, for any x = (x1 , x2 , · · · , xl ) ∈ W , any j and any ~ε ∈ C~εm01 (r1 ), we have Pl ~ ε i=1 xi (πij − πij (~ε0 )) Pl ε0 ) i=1 xi πij (~ Pl ~ ε i=1 xi πij (~ε0 )(πij − πij (~ε0 ))/πij (~ε0 ) = ≤ δ1 . Pl ε0 ) i=1 xi πij (~ Thus, for any δ2 > 0, choosing δ1 sufficiently small, we have Pl ~ ε i=1 xi πij log Pl xi πij (~ε0 ) i=1 ! Pl ~ ε ε0 )) i=1 xi (πij − πij (~ = log 1 + ≤ δ2 . Pl xi πij (~ε0 ) i=1 Notice that ~ ~ d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x)) Pl Pl ~ ~ ~ ε θ ~ ε θ x π q (z|k) i i=1 xi πij q (z|j) ik i=1 − log = max log Pl P l ~0 ~0 θ θ j,k xi πij (~ε0 )q (z|j) xi πik (~ε0 )q (z|k) i=1 i=1 Pl ~ ~ ε q θ (z|j) i=1 xi πij = max log Pl + log ~ j,k q θ0 (z|j) ε0 ) i=1 xi πij (~ Pl ~ ~ ε q θ (z|k) i=1 xi πik − log Pl − log ~ . q θ0 (z|k) ε0 ) i=1 xi πik (~ (51) The lemma then immediately follows from Theorem V.1. Proof. Since all πij (~ε0 ) are strictly positive, for any δ1 > 0, there exists r1 > 0 such that for all i, j and all ~ε ∈ C~εm01 (r1 ), ~ ε |πij − πij (~ε0 )| ≤ δ1 . πij (~ε0 ) θ0 ~ any z ∈ Z, fz~ε,θ is a ρ1 -contraction mapping on W̃W ◦ ,H (δ) under the complex Hilbert metric in (39). ~ i x~εi ,θ (z−n ) ∈ W̃W ◦ ,H (δ), (52) and ~ −1 ) ∈ CR+ [δ]. p~ε,θ (z0 |z−n (53) 0 ∈ Z n+1 , 2) There exist r1 , r2 > 0 such that for all z−n ~ m1 m2 −1 ~ ε,θ p (z0 |z−n ) is analytic on C~ε0 (r1 ) × C~ (r2 ). θ0 3) For sufficiently small r1 , r2 > 0, there exist 0 < ρ1 < 1 and a positive constant L1 such that for any two Zvalued sequences {a0−n1 } and {b0−n2 } with a0−n = b0−n ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), we have and for all (~ε, θ) ~ ε0 ~ θ0 ~ ~ ε,θ |p ~ ~ ε,θ (a0 |a−1 −n1 )−p n (b0 |b−1 −n2 )| ≤ L1 ρ1 ~0 |q θ (a0 |y 0 )|. sup ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ θ 0 Proof. 1. It follows from a parallel argument as in the proof of Part 1 of Theorem VI.3. 2. It follows from (11(i)) and Part 1 of Lemma V.3 that ~ for sufficiently small r1 , r2 , δ > 0 and any z ∈ Z, fz~ε,θ (x) ~ x) ∈ Cm1 (r1 ) × Cm2 (r2 ) × is analytic with respect to (~ε, θ, ~ ε0 ~ θ0 W̃W ◦ ,H (δ). It follows from this fact, the iterative nature of ~ x~εi ,θ (see (17)) and Part 1 that for sufficiently small r1 , r2 > ~ 2 0, each x~εi ,θ is analytic on C~εm01 (r1 ) × Cm ~0 (r2 ). Part 2 then θ immediately follows from (19). 3. It follows from a parallel argument as in the proof of Part 3 of Theorem VI.3. We are now ready for the proof of Theorem I.8. 14 Proof of Theorem I.8. We first prove that there exist r1 , r2 > ~ 0 such that for any n, h~εn,θ (Z) is analytic on C~εm01 (r1 ) × 2 Cm ~0 (r2 ). θ For a fixed n, recall that Z ~ ~ 0 ~ −1 ~ ε,θ 0 hn (Z) = − p~ε,θ (z−n ) log p~ε,θ (z0 |z−n )dz−n , Z n+1 where ~ 0 p~ε,θ (z−n )= X 0 Y 0 p~ε(y−n ) 0 y−n ~ ∈ Cm1 (r1 ) × Cm2 (r2 ). Then, by Lemmas V.2 Fix (~ε, θ) ~0 ~ ε0 θ and VII.3, either we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with (1 + δ1 )ρ1 < 1 ~ ~ ~ q θ (zi |yi ) ~ −1 ≤ L01 |p~ε,θ (z−n−1 )|L1 ρn1 i=−n ~ −1 −1 0 |p~ε,θ (z−n−1 )(log p~ε,θ (z0 |z−n−1 ) − log p~ε,θ (z0 |z−n ))| ~ ~ −1 −1 ~ ε,θ ~ ε,θ p (z |z ) − p (z |z ) ~ 0 0 −n −n−1 0 ) ≤ L01 p~ε,θ (z−n−1 ~ −1 ~ ε , θ p (z0 |z−n−1 ) ~0 |q θ (z0 |y 0 )|, sup m ~0 )∈Y×C 2 (r2 ) (y 0 ,θ ~ θ0 and ~ ~ or we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with (1 + δ1 )ρ1 < 1, ~ ,θ −1 −1 p~ε,θ (z0 |z−n ) = x~ε−1 (z−n )Π~ε,θ (z0 )1. ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), we have Now, for any (~ε, θ) ~ ~ ε0 ~ ~ ~ ε,θ |p 0 (z−n )| ≤ X ~ ε |p 0 y−n ≤ X 0 |p~ε(y−n )| 0 y−n 0 Y 0 (y−n )| ~ |q θ (zi |yi )| i=−n ~0 |q θ (zi |y 0 )|. sup ~ ~ ~ 0 Y (54) −1 0 )||p~ε,θ (z−n−1 |z−n )|L1 ρn1 ≤ L01 |p~ε,θ (z−n m ~0 )∈Y×C 2 (r2 ) i=−n (y 0 ,θ ~ θ sup ~0 |q θ (z0 |y 0 )|. ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ θ 0 0 And, by (52), for sufficiently small r1 , r2 > 0, there exist C1 , C2 > 0 such that C1 inf m ~0 )∈Y×C 2 (r2 ) (y 0 ,θ ~ ~ −1 ≤ |p~ε,θ (z0 |z−n )| ≤ C2 ~0 |q θ (z0 |y 0 )|, sup Notice that for any given δ > 0, there exist r1 , r2 > 0 such that for all ~ε ∈ C~εm0 (r1 ), 0 , |πy~ε−n | ≤ (1 + δ)πy~ε−n ~0 |q θ (z0 |y 0 )| θ0 (55) ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ Z which, together with (53), implies that for some C3 > 0, ~ −1 | log p~ε,θ (z0 |z−n )| ≤ C3 + max{| log log inf sup ~0 |q θ (z0 |y 0 )|, ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ θ 0 ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ θ 0 ~0 |q θ (z0 |y 0 )|}. m1 m ~ (~ ε,θ)∈C (r1 )×C~ 2 (r2 ) ~ ε0 θ0 ~ (here we have used the fact that Z |q θ (z|y)|dz is a continuous 2 function of θ~ ∈ Cm ~0 (r2 ); this follows from Lemma III.4 (Part θ 1)). It then follows from the first inequality sign of (54) that Z ~ −1 −1 ≤ (1 + δ)2n , )|dz−n |p~ε,θ (z−n Zn Z θ0 sup Z R 2 This, together with (5), implies that on C~εm01 (r1 ) × Cm ~ (r2 ), Z |πy~εi yi+1 | ≤ (1 + δ)πy~εi0yi+1 , 2 and for any y ∈ Y and all θ~ ∈ Cm ~0 (r2 ), θ Z Z ~ ~ |q θ (z|y)|dz ≤ (1 + δ) q θ0 (z|y)dz = 1 + δ, θ0 Z n+1 ~ −1 −1 0 |p~ε,θ (z−n−1 )(log p~ε,θ (z0 |z−n−1 ) − log p~ε,θ (z0 |z−n ))| ~ ~ −1 −1 ~ ε,θ ~ ε,θ p (z |z ) − p (z |z )) ~ 0 0 −n −n−1 0 ) ≤ L01 p~ε,θ (z−n−1 ~ −1 ~ ε , θ p (z0 |z−n ) θ0 ~ ~ε,θ~ 0 −1 0 ) dz−n <∞ p (z−n ) log p~ε,θ (z0 |z−n ~ Z n+1 −1 −1 |p~ε,θ (z−n−1 )|dz−n−1 ≤ (1 + δ)2(n+1) . Moreover, similar to (55), we have for some C4 , C5 > 0, C4 inf ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ ~ θ 0 ~0 ~ 0 |q θ (z−n−1 |y 0 )| ≤ |p~ε,θ (z−n−1 |z−n )| (56) ~ −1 By Lemma III.4 (Part 2), h~εn,θ (Z0 |Z−n ) is analytic on ~0 2 C~εm01 (r1 ) × Cm ≤ C5 sup |q θ (z−n−1 |y 0 )|. ~0 (r2 ). θ ~0 )∈Y×Cm2 (r2 ) (y 0 ,θ Now, to prove the theorem, we only need to prove that there ~ θ 0 ~ ~ ε,θ exist r1 , r2 > 0 such that the hn (Z), as n → ∞, uniformly By choosing δ > 0 sufficiently small, we can combine all 2 converges on C~εm01 (r1 ) × Cm ~0 (r2 ). Note that θ the relevant inequalities above to obtain some L > 0 and some Z ~ ∈ Cm1 (r1 ) × Cm2 (r2 ), 0 < ρ < 1 such that for all (~ε, θ) ~ ~ ~ 0 ~ ~ ε,θ ~ ε0 ~0 −1 ~ ε,θ 0 θ |hn+1 (Z) − hn (Z)| = ) log p~ε,θ (z0 |z−n−1 p~ε,θ (z−n−1 )dz−n−1 Z n+2 ~ ~ ,θ Z |h~εn+1 (Z) − h~εn,θ (Z)| ~ 0 ~ −1 ~ ε,θ ~ ε,θ 0 − p (z−n ) log p (z0 |z−n )dz−n Z n+1 ~ 0 ~ ~ −1 −1 0 ZZ ≤ |p~ε,θ (z−n−1 )(log p~ε,θ (z0 |z−n−1 )−log p~ε,θ (z0 |z−n ))|dz−n−1 ≤ Lρn , ~ ~ n+2 −1 ~ ε , θ 0 ~ ε , θ Z = p (z−n−1 )(log p (z0 |z−n−1 ) Z n+2 ~ which implies the analyticity of h~ε,θ (Z) around (~ε0 , θ~0 ). ~ −1 0 − log p~ε,θ (z0 |z−n ))dz−n−1 . 15 VIII. C ONCLUDING R EMARKS Under certain mild assumptions, employing a complex Hilbert metric in a critical way, we show that the entropy rate of a class of hidden Markov chains with continuous alphabet is analytic with respect to the input Markov chain parameters. Joint analyticity results with respect to both the input Markov chain parameters and the channel parameters, which apply to additive Cauchy or Gaussian channels, are further obtained under strengthened assumptions on the channel. Given the implications of the analyticity results in the discrete setting, we expect that the results in this paper will be of great interest and significance in the continuous setting. Further work include the applications of the analyticity results in this paper to computations of entropy rate of hidden Markov chains and capacity of finite-state channels with continuous output alphabet. R EFERENCES [1] A. Allahverdyan. Entropy of Hidden Markov Processes via Cycle Expansion. J. Stat. Phys., vol. 133, 2008, pp. 535-564. [2] L. Arnold, V. M. Gundlach and L. Demetrius. Evolutionary formalism for products of positive random matrices. Annals of Applied Probability, vol. 4, 1994, pp. 859-901. [3] J. J. Birch. Approximations for the entropy for functions of Markov chains. Ann. Math. Statist., vol. 33, 1962, pp. 930-938. [4] D. Blackwell. The entropy of functions of finite-state Markov chains. Trans. First Prague Conf. Information Thoery, Statistical Decision Functions, Random Processes, 1957, pp. 13-20. [5] M. Boyle and K. Petersen. Hidden Markov processes in the context of symbolic dynamics. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 5-71. [6] L. Dubois. Projective metrics and contraction principles for complex cones. Journal of the London Mathematical Society, vol. 79, no. 3, 2009, pp. 719-737. [7] S. Egner, V. Balakirsky, L. Tolhuizen, S. Baggen and H. Hollmann. On the entropy rate of a hidden Markov model. IEEE ISIT, 2004, pp. 12. [8] R. Gharavi and V. Anantharam. An upper bound for the largest Lyapunov exponent of a Markovian product of nonnegative matrices. Theoretical Computer Science, vol. 332, no. 1-3, 2005, pp. 543-557. [9] G. Han. Limit theorems in hidden Markov models. IEEE Trans. Info. Theory, vol. 59, no. 3, 2013, pp. 1311-1328. [10] G. Han. A randomized approach to the capacity of finite-state channels. IEEE ISIT, 2013, pp. 2109-2113. [11] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov chains. IEEE Trans. Info. Theory, vol. 52, no. 12, 2006, pp. 5251-5266. [12] G. Han and B. Marcus. Derivatives of entropy rate in special families of hidden Markov chains. IEEE Trans. Info. Theory, vol. 53, no. 7, 2007, pp. 2642-2652. [13] G. Han and B. Marcus. Asymptotics of input-constrained binary symmetric channel capacity. Annals of Applied Probability, vol. 19, no. 3, 2009, pp. 1063-1091. [14] G. Han and B. Marcus. Entropy rate of continuous-state hidden Markov chains. IEEE ISIT, 2010, pp. 1468-1472. [15] G. Han and B. Marcus and Y. Peres. A complex Hilbert metric and applications to domain of analyticity for entropy rate of hidden Markov processes. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 98-116. [16] G. Han and B. Marcus. Asymptotics of entropy rate in special families of hidden Markov chains. IEEE Trans. Info. Theory, vol. 56, no. 3, 2010, pp. 1287-1295. [17] G. Han and B. Marcus. Concavity of mutual information rate for inputrestricted finite-state memoryless channels at high SNR. IEEE Trans. Info. Theory, vol. 58, no. 3, 2012, pp. 1534-1548. [18] T. Holliday, A. Goldsmith, and P. Glynn. Capacity of finite state channels based on Lyapunov exponents of random matrices. IEEE Trans. Info. Theory, vol. 52, no. 8, 2006, pp. 3509-3532. [19] S. Ihara. Information Theory for Continuous Systems, World Scientific, 1993. [20] P. Jacquet, G. Seroussi, and W. Szpankowski. Noisy constrained capacity. IEEE ISIT, 2007, pp. 986-990. [21] P. Jacquet, G. Seroussi, and W. Szpankowski. On the entropy of a hidden Markov process. Theoretical Computer Science, vol. 395, 2008, pp. 203-219. [22] F. Ledrappier. Analyticity of the entropy for some random walks. Preprint. Available at http://arxiv.org/abs/1009.5354. [23] F. Le Gland and N.Oudjane. Stability and Uniform Approximation of Nonlinear Filters Using the Hilbert Metric and Application to Particle Filters. The Annals of Applied Probability, vol. 14, no. 1, 2004, pp. 144-187. [24] Y. Li and G. Han. Concavity of mutual information rate of finite-state channels. IEEE ISIT, 2013, pp. 2114-2118. [25] J. Luo and D. Guo. On the entropy rate of hidden Markov processes observed through arbitrary memoryless channels. IEEE Trans. Info. Theory, vol. 55, no. 4, 2009, pp. 1460-1467. [26] C. Nair, E. Ordentlich and T. Weissman. Asymptotic filtering and entropy rate of a hidden Markov process in the rare transitions regime. IEEE ISIT, 2005, pp. 1838-1842. [27] E. Ordentlich and T. Weissman. On the optimality of symbol by symbol filtering and denoising. IEEE Trans. Info. Theory, vol. 52, no. 1, 2006, pp. 19-40. [28] E. Ordentlich and T. Weissman. New bounds on the entropy rate of hidden Markov process. IEEE ITW, 2004, pp. 117-122. [29] E. Ordentlich and T. Weissman. Bounds on the entropy rate of biunary hidden Markov processes Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 117-171. [30] P. Vontobel, A. Kavcic, D. Arnold and Hans-Andrea Loeliger. A generalization of the Blahut-Arimoto algorithm to finite-state channels. IEEE Trans. Info. Theory, vol. 54, no. 5, 2008, pp. 1887-1918. [31] Y. Peres. Analytic dependence of Lyapunov exponents on transition probabilities. Lecture Notes in Mathematics, Springer Verlag, vol. 1486, 1990. [32] Y. Peres. Domains of analytic continuation for the top Lyapunov exponent. Ann. Inst. H. Poincare Prob. Statist, vol. 28, no. 1, 1991, pp. 131-148. [33] Y. Peres and A. Quas. Entropy rate for hidden Markov chains with rare transitions. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 172-178. [34] H. Pfister, J. Soriaga and P. Siegel. The achievable information rates of finite-state ISI channels. IEEE GLOBECOM, 2011, pp. 2992-2996. [35] H. Pfister. On the capacity of finite state channels and the analysis of convolutional accumulate-m codes. Ph.D. thesis, University of California at San Diego, USA, 2003. [36] H. Pfister. The capacity of finite-state channels in the high-noise regime. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 179-222. [37] H. Rugh. Cones and gauges in complex spaces: Spectral gaps and complex Perron-Frobenius theory. Annals of Mathematics, vol. 171, no. 3, 2010. [38] E. Seneta. Non-negative Matrices and Markov Chains, Springer-Verlag, New York Heidelberg Berlin, 1980. [39] V. Sharma and S. Singh. Entropy and channel capacity in the regenerative setup with applications to Markov channels. IEEE ISIT, 2001, pp. 283. [40] V. Tadic. Analyticity, Convergence, and Convergence Rate of Recursive Maximum-Likelihood Estimation in Hidden Markov Models. IEEE Trans. Info. Theory, vol. 56, no, 12, 2010, pp. 6406-6432. [41] E. Zehavi and J. Wolf. On runlength codes. IEEE Trans. Info. Theory, vol. 34, 1988, pp. 45-54. [42] O. Zuk, I. Kanter and E. Domany. The entropy of a binary hidden Markov process. J. Stat. Phys., vol. 121, no. 3-4, 2005, pp. 343-360. [43] O. Zuk, E. Domany, I. Kanter and M. Aizenman. From finite-system entropy to entropy rate for a hidden Markov process. IEEE Signal Processing Letters, vol. 13, no. 9, 2006, pp. 517-520. ACKNOWLEDGMENT The first author would like to thank the support from the University Grants Committee of the Hong Kong Special 16 Administrative Region, China under grant No. AoE/E-02/08 and the support from Research Grants Council of the Hong Kong Special Administrative Region, China under grant No. 17301814. Guangyue Han Guangyue Han received the B.S. and M.S. degrees in mathematics from Peking University, China, and the Ph.D. degree in mathematics from the University of Notre Dame, U.S.A. in 1997, 2000 and 2004, respectively. After three years with the department of mathematics at the University of British Columbia, Canada, he joined the department of mathematics at the University of Hong Kong, China in 2007. His main research areas are analysis and combinatorics, with an emphasis on their applications to coding and information theory. Brian Marcus Brian Marcus attended Claremont McKenna College and received his B.A. from Pomona College in 1971 and Ph.D. in mathematics from UC-Berkeley in 1975 (supervised by Rufus Bowen). He held the IBM Watson Postdoctoral Fellowship in mathematical sciences in 1976-7. From 1975-1985 he was Assistant Professor and then Associate Professor of Mathematics (with tenure) at UNC-Chapel Hill. From 1984-2002 he was a Research Staff Member at the IBM Almaden Research Center. He is Professor of Mathematics at UBC-Vancouver, and served as Head of Mathematics (2002-2007). He has been Consulting Associate Professor in Electrical Engineering at Stanford University (2000-2003) and Visiting Associate Professor in Mathematics at UC-Berkeley (1986). He has served as primary dissertation supervisor of PhD students at UNC-Chapel Hill, UC Santa Cruz, Stanford University and UBC. He was co-recipient (with Paul Siegel and Jack Wolf) of the Leonard G. Abraham Prize Paper Award of the IEEE Communications Society in 1993 and gave an invited plenary lecture at the 1995 International Symposium on Information Theory. He is the author of more than sixty research papers in refereed mathematics and engineering journals, as well as a textbook, An Introduction to Symbolic Dynamics and Coding (Cambridge University Press, 1995, co-authored with Doug Lind). He also holds twelve US patents. He is an IEEE Fellow and served as an elected member of the American Mathematical Society Council (2003-2006). His current research interests include symbolic dynamics, ergodic theory, information theory and statistical physics.

Analyticity of Entropy Rate of Hidden Markov Chains with Continuous Alphabet

Related documents

Products

Support

Analyticity of Entropy Rate of Hidden Markov Chains with Continuous Alphabet

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib