Analyticity of Entropy Rate of Hidden Markov Chains with Continuous Alphabet

advertisement
1
Analyticity of Entropy Rate of Hidden Markov
Chains with Continuous Alphabet
Guangyue Han and Brian Marcus, Fellow, IEEE,
Abstract—We first prove that under certain mild assumptions,
the entropy rate of a hidden Markov chain, observed when
passing a finite-state stationary Markov chain through a discretetime continuous-output channel, is analytic with respect to the
input Markov chain parameters. We then further prove, under
strengthened assumptions on the channel, that the entropy rate
is jointly analytic as a function of both the input Markov
chain parameters and the channel parameters. In particular,
the main theorems establish the analyticity of the entropy
rate for two representative channels: Cauchy and Gaussian.
The analyticity results obtained are expected to be helpful in
computation/estimation of entropy rate of hidden Markov chains
and capacity of finite-state channels with continuous output
alphabet.
Index Terms—hidden Markov chain, entropy rate, analyticity,
continuous alphabet, Hilbert metric.
0
here z−n
, (z−n , z−n+1 , · · · , z0 ) denotes an instance of
0
0
Z−n , (Z−n , Z−n+1 , · · · , Z0 ), and p(z−n
) denotes the prob0
ability density of z−n . It is well-known (e.g., see page 60
0
of [19]) that if h(Z−n
) is finite for all n, then the limit in (1)
exists and can be written as
h(Z) = lim hn (Z),
n→∞
where
Z
hn (Z) = −
Z n+1
−1
0
0
p(z−n
) log p(z0 |z−n
)dz−n
;
(2)
−1
here p(z0 |z−n
) denotes the conditional density of z0 given
−1
z−n . Since the channels considered in this paper are memoryless and Y is stationary, we have
0
0
h(Z−n
|Y−n
) = (n + 1)h(Z0 |Y0 ),
I. M AIN R ESULTS AND R ELATED W ORK
where
Introduction and background. Consider a discrete-time
channel with a finite input alphabet Y = {1, 2, · · · , l} and the
continuous output alphabet Z = R. We assume that the input
process is a Y-valued first-order stationary Markov chain Y
with transition probability matrix Π = (πij )l×l and stationary
vector π = (πi )1×l (here we assume Y is first-order only
for simplicity; a standard “blocking” approach can be used to
reduce higher order cases to the first-order case). We assume
that the channel is memoryless in the sense that at each time,
the distribution of the output z ∈ Z, given the input y ∈ Y, is
independent of the past and future inputs and outputs, and is
distributed according to a probability density function q(z|y).
The corresponding output process of this channel is a hidden
Markov chain, which will be denoted by Z throughout the
paper. The entropy rate h(Z) is defined as
h(Z) = lim
n→∞
1
0
h(Z−n
),
n+1
(1)
when the limit exists, where 1
Z
0
0
0
0
h(Z−n
)=−
p(z−n
) log p(z−n
)dz−n
;
Z n+1
Guangyue Han is with the Department of Mathematics, The University of
Hong Kong, Hong Kong, China. email: ghan@hku.hk.
Brian Marcus is with the Department of Mathematics, The University of
British Columbia, B.C., Canada. email: marcus@math.ubc.ca.
A preliminary version of this paper has been presented in IEEE ISIT
2010 [14].
Copyright (c) 2014 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
1 Throughout the paper, since all the integrands are at least continuous and
most are analytic, all the integrals should be interpreted as Riemann integrals.
h(Z0 |Y0 ) = −
X
y∈Y
Z
πy
q(z|y) log q(z|y)dz.
Z
It then follows from
0
0
0
0
0
0
(n+1) = h(Z0 |Y0 )h(Z−n
|Y−n
) ≤ h(Z−n
) ≤ h(Y−n
)+h(Z−n
|Y−n
)
that if
Z
q(z|y) log q(z|y)dz is finite for all y,
(3)
z∈Z
0
), which implies that h(Z) as
so is h(Z0 |Y0 ) and then h(Z−n
in (1) is well-defined and finite.
Unless specified otherwise, we will assume that Π = Π~ε =
~
ε
(πij ) is analytically parameterized by ~ε ∈ Ω1 , where Ω1
denotes a bounded domain in Rm1 (a domain is an open
~
and connected set), and for any (y, z) ∈ Y × Z, q θ (z|y) is
~
analytically parameterized by θ ∈ Ω2 , where Ω2 denotes a
bounded domain in Rm2 .
Main results. We are now ready to state our main results.
Here, we remark that all our results in this paper can be
straightforwardly translated to the setting where Z is finite
or countably infinite.
Under some mild regularity condition, we prove the following theorem, which establishes the analyticity of h(Z) as
a function of the underlying Markov chain parameters.
Theorem I.1. Assume that for any y ∈ Y, q(z|y) is positive
and continuous on Z, and the following two integrals
Z
Z
0
q(z|y)| log min
q(z|y
)|dz,
q(z|y)| log max
q(z|y 0 )|dz
0
0
Z
y
Z
y
(4)
2
are finite. If Π is strictly positive at ~ε0 , then h(Z) is analytic
around ~ε0 .
Remark I.2. Theorem I.1 has been announced in [14], where
Condition (3) has been assumed. Unfortunately, we have found
that, at least for the rigorous proof in this paper, Condition
(3) is not strong enough. More specifically, as in the proof of
Theorem I.1, we need to use (4), a strengthened version of
−1
Condition (3), to establish the analyticity of h~εn (Z0 |Z−n
) on
m1
m1
C~ε0 (r1 ); here and throughout the paper, C~ε0 (r1 ) denotes the
r1 -ball of ~ε0 in Cm1 with respect to the Euclidean metric.
Remark I.3. When Z is finite, the analyticity of h(Z) has
been established under mild positivity assumptions in [11].
Due to the fact that for any y, q(z|y) can be arbitrarily close
to 0, the complexfication and uniform convergence arguments
in [11] do not carry over to Theorem I.1. As elaborated in
Sections V and VI, the proof of Theorem I.1 requires a critical
use of the complex Hibert metric in Section V.
We will also prove analyticity of h(Z) as a function of
both the Markov chain parameters and channel parameters.
Simple examples show that h(Z) can fail to be analytic as
a function of the channel parameter alone; see Example IV.1
and Remark IV.2. Our positive results require several technical
regularity conditions, which we describe as follows. These
conditions involve the complexification of the channel density
functions (by definition, any real analytic function, such as
~
q θ (z|y) as above, at a given point can be uniquely extended
to a complex analytic function on some complex neighborhood
of the given point; we will continue to use the same notation,
~
such as q θ (z|y), for this complex extension). We require
these technical conditions, which abstract the properties of
commonly used probability density functions (e.g., Cauchy
and Gaussian), in order to make our proofs work. There may
be more general conditions that suffice. On a first reading, the
reader may want to skip directly to the statements of results
below.
Before listing our regularity conditions, we remind the
~ z)}z of θ~ is equiconreader that a family of functions {f (θ,
tinuous if for any ε > 0, there exists δ > 0 such that
|f (θ~1 , z) − f (θ~2 , z)| ≤ ε for all z as long as kθ~1 − θ~2 k ≤ δ.
Note that uniform boundedness (over all feasible z) of the
~ z) with respect to θ~ will imply its equiconderivatives of f (θ,
tinuity.
The regularity conditions are as follows: For given
(~ε0 , θ~0 ) ∈ Ω1 × Ω2 ,
(a) Π is strictly positive at ~ε0 ;
~
(b) for all y ∈ Y, q θ0 (z|y) is positive on Z;
(c) there exists r2 > 0 such that
~
(i) for any (y, z) ∈ Y × Z, q θ (z|y) is analytic on
2
Cm
~ (r2 ),
θ0
~
(ii) for any y ∈ Y, q θ (z|y) is jointly continuous on
2
Z × Cm
~0 (r2 ),
θ
(iii) the following three integrals
Z
Z
Z
q̆(z, r2 )dz, q̆(z, r2 ) log q̂(z, r2 )dz, q̆(z, r2 ) log q̆(z, r2 )dz,
(5)
are all finite, where
q̆(z, r2 ) ,
~
|q θ (z|y)|,
sup
m2
~
(y,θ)∈Y×C
(r2 )
~
θ0
q̂(z, r2 ) ,
~
inf
m2
~
(y,θ)∈Y×C
(r2 )
~
|q θ (z|y)|;
θ0
(d) for some I ∈ {1, 2, . . . , l},
(i) there exist r2 > 0 such that for all j, the family
~
~
2
of functions {q θ (z|j)/q θ (z|I)}z on θ~ ∈ Cm
~0 (r2 ) is
θ
equicontinuous,
(ii) there exists r2 > 0 such that for each z, the real
~
2
log q θ (z|I) can be analytically extended to Cm
~0 (r2 )
θ
and for all j,
Z
~
~
sup q θ (z|j) log q θ (z|I) dz < ∞. (6)
Z
~ m2 (r2 )
θ∈C
~
θ0
It is easily seen that the most commonly used channel models,
including Cauchy and Gaussian channels, satisfy all of these
conditions; see Examples I.6 and I.7.
The following theorem is our first result on the joint
analyticity of h(Z).
Theorem I.4. For given (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , assume Conditions (a), (b), (c) and (d). If, in addition, one of the following
two conditions holds,
0
00
• there exist C , C > 0 such that for all i, j and all z ∈ Z,
q θ~0 (z|j) (7)
C0 ≤ ~
≤ C 00 ;
q θ0 (z|i) •
for the same I as in Condition (d), there exists r2 > 0
such that for any ε > 0, there exists a compact subset
Σ ⊂ Z such that for all z 6∈ Σ, all j 6= I and all
2
θ~ ∈ Cm
~0 (r2 )
θ
q θ~ (z|j) (8)
~
≤ ε,
q θ (z|I) then h(Z) is analytic around (~ε0 , θ~0 ).
Remark I.5. Conditions (7) and (8) are in fact mutually
exclusive in the sense that if one of them holds, then the
other one must fail. The reader should also note that while
Theorem I.4 requires Condition (7) to hold only at one given
parameter value, θ~0 , Condition (8) needs to hold for all
parameter values θ~ in a complex neighborhood of the given
θ~0 . On the other hand, if Condition (8) holds only at θ~0 , then
one can conclude that h(Z) is analytic with respect to ~ε at
~ε0 . Theorem I.1, however, says that a condition as simple as
(4) is already sufficient to imply the analyticity of h(Z) with
respect to ~ε alone.
In the following example, we show that Conditions (b)-(d)
and (7) are satisfied for additive Cauchy channels.
Example I.6. Consider an additive Cauchy channel with
channel transition probability function taking the following
form:
1
γi
~
q θ (z|i) =
, γi > 0, i = 1, 2, . . . , l, (9)
π (z − µi )2 + γi2
3
which is parameterized by θ~ ∈ Ω2 , where
2l
Ω2 , {(γ1 , µ1 , γ2 , µ2 , . . . , γl , µl ) ∈ R : γi > 0, i = 1, 2, . . . , l}.
Let θ~0 ∈ Ω2 .
Obviously, Condition (b) is trivially satisfied.
Now, for Condition (c), note that since γi > 0 for each
i, each (z − µi )2 + γi2 is bounded away from 0 for all θ~ ∈
C2l
~0 (r2 ) if r2 is sufficiently small. The existence of r2 for (i)
θ
and (ii) then immediately follows. As for (iii), notice that for
sufficiently small r2 , both q̆(z, r2 ) and q̂(z, r2 ) are of order
O(1/z 2 ) for large z and are uniformly bounded below away
from 0 for all z; this implies that the three integrals in (5) are
all finite.
We now check Condition (d) with I set to be 1 (here I
can be chosen arbitrarily). For (i), observe that for sufficiently
~
~
small r2 , the derivative of q θ (z|j)/q θ (z|1) with respect to θ~ ∈
m2
C~ (r2 ) is bounded uniformly over all j and z ∈ Z. As for
θ0
(ii), observe that for sufficiently small r2 , the imaginary part
~
of q θ (z|1) is dominated by the real part, which implies that
~
2
log q θ (z|1) can be analytically extended to Cm
~0 (r2 ), and the
θ
finiteness of the integral in (6) follows from the fact that for
~
large z, each q θ (z|i) is of order O(1/z 2 ).
For Condition (7), it can be easily checked that for all i, j,
~
lim
z→∞
q θ0 (z|j)
q θ~0 (z|i)
= 1,
which immediately implies Condition (7).
So, for an additive Cauchy channel, if Π is strictly
positive at ~ε0 ∈ Ω1 , then h(Z) is analytic around
(~ε0 , (γ1 , µ1 , γ2 , µ2 , . . . , γl , µl )).
In the following example, we show that Conditions (b)-(d)
and (8) are satisfied for additive Gaussian channels.
Example I.7. Consider an additive Gaussian channel with
channel transition probability function taking the following
form:
2
2
1
~
q θ (z|i) = √
e−(z−µi ) /(2σi ) , σi > 0, i = 1, 2, . . . , l,
2πσi
and all σi are distinct,
(10)
which is parameterized by θ~ ∈ Ω2 , where
and z ∈ Z, which immediately implies the equicontinuity. As
for (ii), observe that
√
(z − µI )2
~
log q θ (z|I) = − log( 2πσI ) −
,
2σI2
2
which can be analytically extended to Cm
~0 (r2 ) for sufficiently
θ
small r2 . And the finiteness of the integral in (6) follows from
2
~
the fact that for large z, q θ (z|j) is of order e−O(z ) for all j.
For Condition (8), it can be easily checked that for all i, j,
~
lim
z→∞
0
2
C1 e−C1 z ≤ q̆(z, r2 ),
0
2
q̂(z, r2 ) ≤ C2 e−C2 z ,
which immediately implies (iii).
We now check Condition (d) with I set to be the index
i corresponding to the largest σi . For (i), observe that for
~
~
sufficiently small r2 , the derivative of q θ (z|j)/q θ (z|I) with
m2
~
~
respect to θ is bounded on θ ∈ C~ (r2 ) uniformly over all j
θ0
q θ~0 (z|I)
= 0,
which immediately implies Condition (8).
So, for an additive Gaussian channel, if Π is strictly
positive at ~ε0 ∈ Ω1 , then h(Z) is analytic around
(~ε0 , (σ1 , µ1 , σ2 , µ2 , · · · , σl , µl )).
Our second result on joint analyticity deals with the case
when for all i, the real part of q θ (z|i) “dominates” the
imaginary part of the complex extension.
Theorem I.8. For given (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , assume Conditions (a), (b) and (c). If, in addition, for any δ > 0, there exists
2
r2 > 0 such that for all (y, z) ∈ (Y, Z) and all θ~ ∈ Cm
~0 (r2 )
θ
~
q θ (z|y) ~
~
θ
θ
(i) |=(q (z|y))| < δ|<(q (z|y))|, (ii) log ~
≤ δ,
q θ0 (z|y) (11)
then h(Z) is analytic around (~ε0 , θ~0 ).
Theorem I.8 applies to Cauchy channels (9) roughly because
small complex perturbations of the parameters will have small
~
effect on the imaginary part of q θ (z|i) uniformly over z and i.
However, it need not apply to Gaussian channels (10) because
a small imaginary perturbation of a variance may overwhelm
~
the real part of q θ (z|i) for large z. Theorem I.8 also applies to
other channels for which Conditions (7) and (8) fail. Roughly
~
speaking, Condition (7) says that all q θ (z|i) are in some
sense “comparable” with one another at given point θ~0 , and
~
Condition (8) says that one of q θ (z|i) dominates all the others.
It is not surprising that one can find channels that do not satisfy
both conditions, however satisfy (11). A simple yet somewhat
artificial example of such channels with channel parameter
θ~ = (µ1 , γ1 , µ2 , γ2 ) is defined as follows:
Ω2 , {(σ1 , µ1 , σ2 , µ2 , . . . , σl , µl ) ∈ R2l : σi > 0, i = 1, 2, . . . , l}.
Let θ~0 ∈ Ω2 .
Obviously, Condition (b) is trivially satisfied.
Now, for Condition (c), note that (i) and (ii) hold for any
r2 with 0 < r2 < mini {σi }. And for any such fixed r2 , there
exists C1 , C10 , C2 , C20 > 0, which are independent of z, such
that
q θ0 (z|j)
~
q θ (z|i) =
and
1
γi
,
π (z − µi )2 + γi2
γi > 0,
i = 1, 2,
2
1
q(z|3) = √ e−z /2 .
2π
~
~
It is easy to see that q θ (z|1) is comparable to q θ (z|2), however
not to q(z|3); and clearly, none of the three distributions
dominates. On the other hand, such a channel satisfies the assumptions in Theorem I.8, and therefore h(Z) is analytic with
respect to (µ1 , γ1 , µ2 , γ2 ) for the same reason as Theorem I.8
applies to Cauchy channels in Example I.6.
Theorems I.4 and I.8 can be regarded as extensions of [11,
Theorem 1.1], which deals with the case where Z is finite. In
particular, the flow of the proof of Theorem I.4 follows closely
4
that of this case. However, new techniques are needed to deal
with the continuous case. The extension to Theorems I.8 is
less direct, where the use of a complex Hilbert metric [15] to
replace the classical real Hilbert metric, is critical. This metric
was also used in the proof of Theorem I.1. It is not needed
for Theorem I.4, because it assumes stronger conditions on
the channel density functions.
We remark that in [11, Theorem 1.1], zero values are allowed for some transition probabilities. It seems more difficult
to handle this phenomena in the continuous-alphabet setting;
this is the subject of forthcoming work.
To the best of our knowledge, the results in this paper,
together with those in [14], are among the first results establishing analyticity of the entropy rate of hidden Markov
chains with continuous alphabet. Nevertheless, following the
approach in [11], the analyticity of the expected log-likelihood
for certain hidden Markov models has been established in [40].
Our paper has some similarities with [40]. Specifically, the
entropy rate of a hidden Markov chain corresponding to
parameter θ is exactly the expected value of the log-likelihood
when both the expectation and the log-likelihood are taken
with the same parameter value. So, to prove analyticity of the
entropy rate, we must let the expectation and log-likelihhod
vary together. In contrast, in [40], the expectation is taken with
respect to a fixed parameter value.
Related work in finite alphabet. As opposed to continuous
alphabet, the entropy rate of hidden Markov chains with finite
alphabet has been extensively studied. Next, we briefly review
the related results in the literature. In the remainder of this
section, unless specified otherwise, we assume Z is a finite
alphabet and X is a finite-state stationary Markov chain with
transition probability matrix Π and stationary vector π; then,
the output process Z is a finite-state hidden Markov chain.
It is well known that h(X), the entropy rate of X, has a
simple analytic formula in terms of Π and π; in stark contrast,
there is no simple and explicit formula of h(Z) for most
non-degenerate channels ever since hidden Markov chains (or,
more precisely, hidden Markov models) were formulated half
a century ago. Here, we remark that Blackwell [4] showed that
h(Z) can be written as an integral of an explicit function on
a simplex with respect to the Blackwell Measure. However,
the Blackwell measure seems to be rather complicated for
effective computation of h(Z).
Recently there has been a rebirth of interest in computing
and estimating h(Z) in a variety of scenarios, and many
approaches have been adopted to tackle this problem: the
Blackwell measure has been used to bound h(Z) [27], a
variation on the classical Birch bounds [3] can be found
in [7] and a new numerical approximation of h(Z) has been
proposed in [25]. Generalizing Blackwell’s idea, an integral
formula for the derivatives of h(Z) has been derived in [36].
In another direction, [2], [8], [18], [27], [42], [43], [26], [21],
[33], [36], [29] have studied the variation of the entropy rate as
parameters of the underlying Markov chain vary. Particularly
in [42], for a special type of hidden Markov chain Z, the
Taylor series expansion of h(Z) is given, under the assumption
that h(Z) is analytic.
Under mild positivity assumptions, we prove [11] that, h(Z)
is an analytic function of Π, thereby confirming the analyticity
assumption in [42]. Aside from its significance in mathematics
(see, e.g., [31], [32], [5], [22]), this analyticity result is of
interest to a wide range of applications (see, e.g., [42], [43],
[1], [25], [40], [36]). In particular, the analyticity result and
techniques employed in [11], such as complexification and
exponential convergence, are rather useful in many aspects of
information theory as well. Using the ideas for proving the
analyticity result, we derive [16] an asymptotic formula of
h(Z) at so-called “weak Black Holes”, rather general settings
which include “Black Holes” [12] and the input-restricted
channels in [13] as special cases. These results provide much
insight for characterizing [13] the asymptotic behavior (as
ε tends to zero) of capacity of a class of input-restricted
channels and deriving [17] concavity of the mutual information
rate for such channels. Here, we remark that the concavity
result in [17] confirms the concavity conjecture in [30] in
a special scenario; on the other hand, the conjecture in the
general setting has been disproved in [24] by counter-examples
constructed based on the results in [12]. It turns out the result
and techniques in [11] are instrumental in terms of computing
the mutual information rate and the channel capacity of finitestate channel as well. Recently, employing analyticity in a
critical way, we have established refinements of the ShannonMcMillan-Breiman theorem which sheds light on effective
computation of the mutual information rate of a class of finitestate channels and we further proposed [10] a randomized
algorithm to compute the capacity of a class of finite-state
channels, whose convergence behavior is analyzed using the
analyticity result.
In this paper, we will establish the analyticity of the entropy
rate of hidden Markov chains with continuous alphabet. Given
the implications of the counterpart results in the discrete
setting, we expect that the results in this paper will be of
great interest and significance in relevant areas.
Organization of the paper. The remainder of this paper is
organized as follows. In Section II, we review the (real) Hilbert
metric, outline the framework of the proofs of our theorems
and highlight the differences among the proofs. We prove
Theorem I.4 in Section III; on the other hand, we show that
analyticity fails for the Gaussian channel with uniform i.i.d.
inputs in Section IV. Section V is devoted to reviewing the
complex Hilbert metric, which is of critical use to the proofs
of Theorem I.1 in Section VI and Theorem I.8 in Section VII.
II. T HE M AIN I DEA OF THE P ROOFS
We first briefly review the classical (real) Hilbert metric. The
real Hilbert metric will be used in the proof of Theorem I.4.
Let W be the standard simplex in the l-dimensional real
Euclidean space,
X
W = {w = (w1 , w2 , · · · , wl ) ∈ Rl : wi ≥ 0,
wi = 1},
i
◦
and let W denote its interior, consisting of the vectors with
positive coordinates. For any two vectors v, w ∈ W ◦ , the
Hilbert metric [38] is defined as
wi /wj
.
(12)
dH (w, v) = max log
i,j
vi /vj
5
Remark II.1. It immediately follows from the definition that
for any diagonal matrix A with strictly positive diagonal
entries, we have
dH (w, v) = dH (wA, vA).
For an l × l positive matrix T = (tij ) (i.e., each tij > 0),
the mapping fT induced by T on W is defined by
wT
,
(13)
wT 1
where 1 is the all 1’s column vector. The following theorem
is well-known (see [38]).
fT (w) =
Theorem II.2. For a positive T , fT is a contraction mapping
on the entire W ◦ under the Hilbert metric and the contraction
coefficient (often referred to as the Birkhoff coefficient), is
given by
p
1 − φ(T )
dH (vT, wT )
p
,
(14)
=
τ (T ) = sup
1 + φ(T )
v6=w dH (v, w)
tik tjl
tjk til .
where φ(T ) = mini,j,k,l
We will also need a complex version of W ,
X
W̃ = {w = (w1 , w2 , · · · , wl ) ∈ Cl :
wi = 1}.
i
~
~
ε,θ
Now, for each z ∈ Z, define Π
the entries
(z) as an l × l matrix with
~
~
~
ε θ
Π~ε,θ (z)ij = πij
q (z|j), for all i, j.
~
(15)
~
By (13), Π~ε,θ (z) will induce a mapping fz~ε,θ , fΠ~ε,θ~ (z) from
0
, define
W to W . For any fixed n and z−n
~
~
~
i
x~εi ,θ = x~εi ,θ (z−n
) = p~ε,θ (yi = · |zi , zi−1 , · · · , z−n ),
(16)
(here · represent the states of the Markov chain Y ) then similar
~
to Blackwell [4], {x~εi ,θ } satisfies the random dynamical system
~
~
~
,θ
,θ
x~εi+1
= fz~εi+1
(x~εi ,θ ),
(17)
starting with
~
,θ
= π ~ε,
x~ε−n−1
(18)
where π ~ε is the stationary vector for Π~ε. And it can be verified
that
~ ~
~
~
,θ
−1
p~ε,θ (z0 |z−n
) = x~ε−1
Πε,θ (z0 )1,
(19)
~
~
ε,θ
2
to C~εm01 (r1 ) × Cm
~0 (r2 ). Ultimately hn (Z) (which is defined
θ
~ on p(z 0 ) and p(z0 |z −1 ))
by (2) with real superscripts (~ε, θ)
−n
−n
2
(r2 ) as well.
can be analytically extended to C~εm01 (r1 ) × Cm
~
θ0
The framework for the proofs of the main theorems can be
outlined as follows:
(I) If necessary, we consecutively re-block the Z process to
k(i)
a Ẑ process such that Ẑi is of the form Zj(i) .
(II) We then show that there exists a complex neighborhood
~
of a subset of W such that each complexified fẑ~ε,θ is a
contraction mapping, with respect to some metric, and
~ i
moreover the complexified x̂~εi ,θ (ẑ−n̂
) stays within the
neighborhood.
~ i
(III) It then follows that the complexified x~εi ,θ (z−n
) and thus
~
the complexified p~ε,θ (z0 |z−n ) exponentially forget their
initial conditions.
(IV) This, together with bounding arguments, will further im~
ply that the complexified h~εn,θ (Z) uniformly converges
to a complex analytic function, which is necessarily
~
the complexified h~ε,θ (Z), on a complex domain, and
~
therefore h~ε,θ (Z) is analytic.
The proofs of Theorem I.1 and I.8 are somewhat parallel,
while the latter requiring extra attention to address the technical issues arising in proving joint analyticity. On the other
hand, despite the fact that the proofs of Theorems I.4 and
Theorems I.8 all fit in the same above-mentioned framework,
there does not seem to be a natural way to unify them. Among
numerous differences, the most essential one is the way we
establish (II):
~
~
ε,θ
is
• To establish (II), we will use the fact that each real fz
◦
a contraction on W , with respect to the Hilbert metric.
Then we use the “equivalence” between the Euclidean
metric and the Hilbert metric, and equicontinuity in
Condition (d(i)) to establish the contractiveness (with
~
respect to the Euclidean metric) of the complexified fẑ~ε,θ
on a complex neighborhood of W .
• For Theorem I.8, to establish (II), the complex Hilbert
metric in Section V is employed to directly show the
contractiveness, with respect to the complex Hilbert met~
ric, of the complexified fz~ε,θ on a complex neighborhood
of W ◦ .
III. P ROOF OF T HEOREM I.4
~
and
~
~
~
~
0
p~ε,θ (z−n
) = π ~εΠ~ε,θ (z−n )Π~ε,θ (z−n+1 ) · · · Π~ε,θ (z0 )1. (20)
~
~
~
0
Evidently x~εi ,θ , p~ε,θ (z0 |z−n ) and p~ε,θ (z−n
) all depend on
~ ∈ Ω1 ×Ω2 . In what follows, we shall show
the real vector (~ε, θ)
that they can be “complexified”. For any ~ε ∈ C~εm01 (r1 ), one
checks that for r1 > 0 small enough, the stationary vector π ~ε
is unique and analytic on C~εm01 (r1 ) as a function of ~ε (because
P ~ε
it is the unique solution of π ~εΠ~ε = π ~ε,
y πy = 1).
~
Then through (18) and (17), x~εi ,θ can be analytically ex2
tended to C~εm01 (r1 ) × Cm
~ (r2 ); furthermore, through (19) and
~
~
ε,θ
(20), p
θ0
~
~
ε,θ
(z0 |z−n ) and p
0
(z−n
)
can be analytically extended
The following lemma says that fz~ε,θ does not change much
~
under a small complex perturbation of (~ε, θ).
Lemma III.1. For any δ > 0, there exist r1 , r2 > 0 such that
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z ∈ Z and any
for any (~ε, θ)
~
ε0
~0
θ
x ∈ W , we have
~
~
kfz~ε,θ (x) − fz~ε0 ,θ0 (x)k ≤ δ
where k · k means the Euclidean norm.
Proof. We first note that
~
~
~
fz~ε,θ (x) =
xΠ~ε,θ (z)
xΠ~ε,θ~ (z)1
=
~
x(Π~ε,θ (z)/q θ (z|I))
x(Π~ε,θ~ (z)/q θ~ (z|I))1
.
6
Assuming (7): The lemma immediately follows from (7)
and Condition (d(i)).
Assuming (8): It follows from (8) that for sufficiently small
r1 , r2 > 0, there exists a compact subset Σ ⊂ Z such that for
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z 6∈ Σ and any x ∈ W ,
any (~ε, θ)
~
~
ε0
~
~
ε,θ
Lemma III.3. 1) For any δ > 0, there exist r1 , r2 > 0
~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and for
such that for any (~ε, θ)
~
ε0
~0
θ
0
all z−n
∈ Z n+1 and −n − 1 ≤ i ≤ −1,
~
i
x~εi ,θ (z−n
) ∈ W̃W (δ).
θ0
~
θ
x(Π (z)/q (z|I))1 is bounded away from 0. On the other
hand, by the compactness of Σ, we deduce that there exist
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any
r1 , r2 > 0 such that for any (~ε, θ)
~
~
ε0
~
θ0
~
z ∈ Z and any x ∈ W , x(Π~ε,θ (z)/q θ (z|I))1 is bounded away
from 0. The lemma then follows from Condition (d(i)).
Now, define
2)
3) There exist r1 , r2 > 0, 0 < ρ1 < 1 and L1 > 0 such that
for any two Z-valued sequences {a0−n1 } and {b0−n2 }
~ ∈ Cm1 (r1 )×Cm2 (r2 ),
with a0−n = b0−n and for all (~ε, θ)
~0
~
ε0
θ
we have
~
Lemma III.2. Given any (~ε0 , θ~0 ) ∈ Ω1 × Ω2 , there exist
r1 , r2 , δ > 0, 0 < ρ1 < 1 and a positive integer n0 such that,
~ ∈ Cm1 (r1 )×Cm2 (r2 ),
for all zij with j ≥ i+n0 and all (~ε, θ)
~
ε0
~
θ0
~
fz~εj,θ
i
is a ρ1 -contraction mapping on W̃W (δ) under the Euclidean metric.
Proof. For any z ∈ Z and sufficiently small r1 , r2 > 0, it
follows from Remark II.1 that for any u, v ∈ W , we have
~
~
dH (uΠ~ε0 ,θ0 (z), vΠ~ε0 ,θ0 (z)) = dH (uΠ~ε0 , vΠ~ε0 ).
(21)
It then follows that for any zij ,
~
θ0
contraction on W̃W (δ) under the Euclidean metric.
To prove (22), it is enough to prove the version of (22)
~ i
~ i
with x~εi ,θ (z−n
) replaced by x̂~εi ,θ (ẑ−n
) (with perhaps smaller
r1 , r2 ).
To see this, note that by Lemma III.1, for sufficiently small
~ ∈ Cm1 (r1 )×Cm2 (r2 ),
r1 , r2 > 0, for all ẑ, x ∈ W , and (~ε, θ)
~
ε0
~
θ0
~
~
kfẑ~ε,θ (x)
i
where τ (Π~ε0 ), the Birkhoff coefficient of Π~ε0 as defined in
(14), is strictly less than 1. It then follows from Lemma 3.4
in [23] (the mixing assumption is satisfied due to (15) and
Condition (a)) that there exists C > 0 such that for any zij
and any u, v ∈ W ,
~
kfz~εj0 ,θ0 (u)
i
−
~
fz~εj0 ,θ0 (v)k
i
≤ Cτ (Π~ε0 )j−i ku − vk.
For n0 sufficiently large, ρ0 := Cτ (Π~ε0 )n0 < 1. Thus, for
j ≥ i + n0 ,
~
k∇fz~εj0 ,θ0 (w)k ≤ ρ0 < 1
i
for all w ∈ W . From this and Condition (d(i)), it follows that
there exists r1 , r2 , δ > 0, 0 < ρ1 < 1 such that
~
k∇fz~εj,θ (w)k
i
(24)
kπ ~ε − π(~ε0 )k ≤ δ(1 − ρ1 ).
(25)
Thus,
~
~
~
~
~
~
~
~
,θ
0 ,θ0
kx̂~εi ,θ − x̂~εi 0 ,θ0 k = kfẑ~εi,θ (x̂~εi−1
) − fẑ~εi0 ,θ0 (x̂~εi−1
)k
~
~
,θ
0 ,θ0
≤ kfẑ~εi,θ (x̂~εi−1
) − fẑ~εi,θ (x̂~εi−1
)k
~
~
~
~
0 ,θ0
0 ,θ0
+ kfẑ~εi,θ (x̂~εi−1
) − fẑ~εi0 ,θ0 (x̂~εi−1
)k.
(26)
Then by (24) and (25), and (26), we have
~
~
~
~
,θ
0 ,θ0
kx̂~εi ,θ − x̂~εi 0 ,θ0 k ≤ ρ1 kx̂~εi−1
− x̂~εi−1
k + δ(1 − ρ1 ).
So, for all i,
~
~
~
The following lemma essentially follows from the framework in the proof of Theorem 1.1 in [11]. We briefly outline
the proof for completeness.
We first introduce some notation. Let
~
−1
−1
p̊~ε,θ (z0 |z−n
) , p~ε,θ (z0 |z−n
)/q θ (z0 |I),
where I is the same as in Condition (d).
≤ δ(1 − ρ1 ),
and for all ~ε ∈ C~εm01 (r1 )
≤ ρ1 < 1
~
−
~
fẑ~ε0 ,θ0 (x)k
kx̂~εi ,θ − x̂~εi 0 ,θ0 k ≤ δ,
~ ∈ Cm1 (r1 ) × Cm2 (r2 ). The
for all w ∈ W̃W (δ) and (~ε, θ)
~
ε0
~0
θ
lemma then follows.
~
(23)
Proof. 1. For a fixed n0 > 0, we will consecutively reblock
k(i)
−1
−1
z−n
into ẑ−n̂
such that each ẑi is of the form zj(i) , where
k(i) − j(i) + 1 = n0 (n0 is determined below).
~
~
0
By (17), for any z−n
and i, x~εi 0 ,θ0 (and thus x̂~εi 0 ,θ0 ) belongs
to W ◦ . By Lemma III.2, we can choose r1 , r2 , δ > 0
sufficiently small, n0 sufficiently large and 0 < ρ1 < 1
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), f ~ε,θ~ is a ρ1 such that for all (~ε, θ)
ẑ
~
ε0
~
dH (fz~εj0 ,θ0 (u), fz~εj0 ,θ0 (v)) ≤ τ (Π~ε0 )j−i dH (uΠ~ε0 , vΠ~ε0 ),
i
~
~
ε,θ
n
|p̊~ε,θ (a0 |a−1
(b0 |b−1
−n1 ) − p̊
−n2 )| ≤ L1 ρ1 .
W̃W (δ) = {w̃ ∈ W̃ : kw̃ − wk < δ for some w ∈ W }.
We need the following lemma.
(22)
0
There exist r1 , r2 > 0 such that for all z−n
∈ Z n+1 ,
~
−1
2
p~ε,θ (z0 |z−n
) is analytic on C~εm01 (r1 ) × Cm
~0 (r2 ).
θ
and thus for all i, we have x̂~εi ,θ ∈ W̃W (δ), as desired.
2. It follows from Condition (d(i)) that for sufficiently small
~
r1 , r2 , δ > 0 and any z ∈ Z, fz~ε,θ (x) is analytic with respect to
m
m
~ x) ∈ C 1 (r1 )×C 2 (r2 )× W̃W (δ). It then follows from
(~ε, θ,
~
ε0
~
θ0
~
this fact and the iterative nature of x~εi ,θ (see (17)) and Part 1
~
that for sufficiently small r1 , r2 > 0, each x~εi ,θ is analytic on
m2
m1
C~ε0 (r1 ) × C~ (r2 ). Part 2 then immediately follows from
θ0
(19).
3. Applying the same reblocking as in Part 1, we write
~
~
~
,θ
x̂~εi,â
= x̂~εi ,θ (âi−n̂1 ) = p~ε,θ (yk(i) = · |âi−n̂1 ),
7
~
~
~
x̂~εi,,b̂θ = x̂~εi ,θ (b̂i−n̂2 ) = p~ε,θ (yk(i) = · |b̂i−n̂2 ).
In other words, Rn uniformly (in θ ∈ D) converges to
Z
f (θ, z)dz,
Evidently we have
~
~
~
~
,θ
,θ
,θ
x̂~εi+1,â
= fâ~εi+1
(x̂~εi,â
),
~
Σ
~
,θ
x̂~εi+1,
= fb̂~ε,θ (x̂~εi,,b̂θ ).
b̂
i+1
Note that there exists a positive constant L01 such that for all
~ ∈ Cm1 (r1 ) × Cm2 (r2 ),
(~ε, θ)
~0
θ
~
ε0
~
~
,θ
,θ
kx̂~ε−n̂,â
− x̂~ε−n̂,
k ≤ L01 ,
b̂
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), where r1 , r2 > 0 are
for all (~ε, θ)
~
~
ε0
θ0
~
chosen sufficiently small. Since fẑ~ε,θ is a ρ1 -contraction on
W̃W (δ), we have, by Part 1,
~
~
,θ
,θ
kx̂~ε−1,â
− x̂~ε−1,
k ≤ L01 ρn̂−1
.
1
b̂
This implies that there exists L1 > 0, independent of n1 , n2 ,
~ ∈ Cm1 (r1 ) × Cm2 (r2 ),
such that for all (~ε, θ)
~
~
ε0
θ0
~
~
ε,θ
kp̊
~
~
ε,θ
(a0 |a−1
−n1 ) − p̊
R
and so Σ f (θ, z)dz is continuous in θ ∈ D.
Now, take any increasing sequence
of compact sets Σi
R
whose union is Rn2 . By R(28), Σi f (θ, z)dz converges uniformly, in θ ∈ D, to Rn2 f (θ, z)dz, which is therefore
continuous on D. This gives Part 1.
Part 2 follows in the same way with analyticity replacing
continuity.
n
(b0 |b−1
−n2 )k ≤ L1 ρ1 .
(27)
We are now ready for the proof of Theorem I.4.
Proof of Theorem I.4. We first show that there exist r1 , r2 > 0
~
2
such that for any n, h~εn,θ (Z) is analytic on C~εm01 (r1 )×Cm
~0 (r2 ).
θ
For a fixed n, recall that
Z
~
~ 0
~
−1
~
ε,θ
0
hn (Z) = −
p~ε,θ (z−n
) log p~ε,θ (z0 |z−n
)dz−n
,
Z n+1
where
~
The proof of the following lemma is straightforward.
Nonetheless, we sketch the proof for completeness.
Lemma III.4. Let n1 and n2 be positive integers and D a
compact domain in Cn1 . Let f (θ, z) be a jointly continuous
function on D × Rn2 . Assume that
Z
sup |f (θ, z)|dz < ∞.
(28)
Rn2
θ∈D
Then
R
1) Rn2 f (θ, z)dz is continuous on D.
2) RIf, for each z ∈ Z, f is analytic on D, then
f (θ, z)dz is analytic on D.
Rn2
Proof. Let Σ be a compact domain in Rn2 . Let δi , i = 1, 2, . . .
be a sequence of positive numbers converging to 0. Consider
a sequence of partitions of Σ:
Σ=
Rn =
f (θ, zi )vol(∆n,i )
i=1
(here zi ∈ ∆n,i ) is continuous in θ. Then
Z
mn Z
X
f (θ, z)dz − Rn =
(f (θ, z) − f (θ, zi ))dz.
Σ
i=1
∆n,i
By the compactness of D and Σ, we deduce that for any
ε0 > 0, there exists N0 such that for all n ≥ N0 and all
i = 1, 2, . . . , mn
|f (θ, z) − f (θ, zi )| ≤ ε0 , for any θ ∈ D and any z ∈ ∆n,i ,
X
and
0
Y
0
p~ε(y−n
)
0
y−n
(29)
i=−n
~
~
~
q θ (zi |yi )
~
,θ −1
−1
) = x~ε−1
p~ε,θ (z0 |z−n
(z−n )Π~ε,θ (z0 )1.
P
0
Now, it follows from (29) and the fact that
p~ε(y−n
)
0
y−n
is continuous and therefore bounded as a function of ~ε ∈
C~εm01 (r1 ) that there is a constant K > 0 such that
0
Y
~
sup
m1
m
~
(~
ε,θ)∈C
(r1 )×C~ 2 (r2 )
~
ε0
θ0
0
|p~ε,θ (z−n
)| ≤ K
i=−n
~
|q θ (zi |y)|.
sup
m2
~
(y,θ)∈Y×C
(r2 )
~
θ
0
(30)
It also follows from Part 1 of Lemma III.3 and (Condition
(7) or (8)) that for sufficiently small r1 , r2 > 0, there exist
0
C1 , C2 > 0 such that for any z−n
,
~
−1
)| ≤ C2 ,
C1 ≤ |p̊~ε,θ (z0 |z−n
n
∪m
i=1 ∆n,i ,
where diam(∆n,i ) ≤ δn for all 1 ≤ i ≤ mn . Evidently, the
corresponding Riemann sum
mn
X
0
p~ε,θ (z−n
)=
(31)
which implies that for some C3 > 0,
~
−1
| log p̊~ε,θ (z0 |z−n
)| ≤ C3 .
(32)
~ ∈ Cm1 (r1 ) × Cm2 (r2 ),
We then deduce that for any (~ε, θ)
~
ε0
~0
θ
Z
~
~ε,θ~ 0
−1 0
sup
) dz−n
p (z−n ) log p~ε,θ (z0 |z−n
m1
m
~
Z n+1 (~
ε,θ)∈C
(r1 )×C~ 2 (r2 )
~
ε
θ0
0
Z
=
sup
m1
m
~
Z n+1 (~
ε,θ)∈C
(r1 )×C~ 2 (r2 )
~
ε
0
~
~
ε,θ
+p
~
~ε,θ~ 0
p (z−n ) log q θ (z0 |I)
θ0
~
−1 0
0
(z−n
) log p̊~ε,θ (z0 |z−n
) dz−n
0
Y
~
~
0
θ
≤
sup K q (zi |yi ) log q θ (z0 |I) dz−n
m2
n+1
~
Z
θ∈C
(r2 )
Z
~
θ
0
i=−n
which implies that for any ε > 0, there exists N1 such that Z
~ 0
~
−1
0
for all n ≥ N1 and all θ ∈ D,
+
sup
|p~ε,θ (z−n
)| | log p̊~ε,θ (z0 |z−n
)|dz−n
< ∞.
m
m
n+1
Z
1 (r )×C 2 (r )
~
Z
(~
ε
,
θ)∈C
1
2
~
ε0
~
θ
0
f (θ, z)dz − Rn ≤ ε.
(33)
Σ
8
(for the first term we have used (5(i)) and (6); for the second
term, we have used (5(i)), (30) and (32)). By lemma III.4 (Part
2),
Z
~
~ 0
~
−1
−1
~
ε,θ
0
hn (Z0 |Z−n ) =
p~ε,θ (z−n
) log p~ε,θ (z0 |z−n
)dz−n
,
Z n+1
2
is analytic on C~εm01 (r1 ) × Cm
~0 (r2 ).
θ
Now, to prove the theorem, we only need to prove that
~
there exist r1 , r2 > 0 such that h~εn,θ (Z) uniformly converges
2
on C~εm01 (r1 ) × Cm
~0 (r2 ) as n → ∞. First, we observe that
θ
Z
~
~ 0
~
~
,θ
−1
0
p~ε,θ (z−n−1
) log p~ε,θ (z0 |z−n−1
)dz−n−1
(Z) − h~εn,θ (Z)| = |h~εn+1
n+2
Z Z
~ 0
~
−1
~
ε,θ
~
ε,θ
0 −
p (z−n ) log p (z0 |z−n )dz−n n+1
ZZ
~ 0
~
−1
0
p~ε,θ (z−n−1
) log p̊~ε,θ (z0 |z−n−1
)dz−n−1
= n+2
Z
Z
~ 0
~
−1
0 −
p~ε,θ (z−n
) log p̊~ε,θ (z0 |z−n
)dz−n
n+1
ZZ
~ 0
~
−1
p~ε,θ (z−n−1
)(log p̊~ε,θ (z0 |z−n−1
)
= Z n+2
~
−1
0
))dz−n−1
− log p̊~ε,θ (z0 |z−n
.
~ ∈ Cm1 (r1 ) × Cm2 (r2 ). Then, by (23), (31), (32),
Fix (~ε, θ)
~
ε0
~0
θ
we have, for some 0 < ρ1 < 1, L01 , L1 > 0,
~
~
~
−1
−1
0
|p~ε,θ (z−n−1
)(log p̊~ε,θ (z0 |z−n−1
) − log p̊~ε,θ (z0 |z−n
))|
~ 0
~
~
−1
−1 0 ~
ε,θ
~
ε,θ
~
ε,θ
≤ L1 p (z−n−1 )(p̊ (z0 |z−n−1 ) − p̊ (z0 |z−n ))
~
0
≤ L01 |p~ε,θ (z−n−1
)|L1 ρn1 .
Notice that for any given δ > 0, there exist r1 , r2 > 0 such
that for all ~ε ∈ C~εm01 (r1 ),
0
kπy~ε−n k ≤ (1 + δ)πy~ε−n
,
kπy~εi yi+1 k ≤ (1 + δ)πy~εi0yi+1 ,
2
and for any y ∈ Y and all θ~ ∈ Cm
~0 (r2 ),
θ
Z
Z
~
~
|q θ (z|y)|dz ≤ (1 + δ)
q θ0 (z|y)dz = 1 + δ,
IV. FAILURE OF A NALYTICITY
With the following example, we show that for Gaussian
channels, h(Z) need not be analytic even as a function of the
channel parameters alone, when the largest σi is not unique.
Example IV.1. Consider an additive Gaussian channel parameterized as in (10) with the binary input alphabet Y = {1, 2}.
Assume that the input Y is an i.i.d. process with
P (Y1 = 1) = P (Y1 = 2) = 1/2;
and assume that
2
2
2
2
1
1
e−(z+1) /σ1 , q(z|2) = √
e−(z−1) /σ2 .
q(z|1) = √
2πσ1
2πσ2
We then have
p(z) = P (Y = 1)q(z|1) + P (Y = 2)q(z|2)
2
2
2
2
1 1
1 1
= √
e−(z+1) /σ1 + √
e−(z−1) /σ2 .
2 2πσ1
2 2πσ2
We claim that for any fixed σ > 0, analyticity of h(Z) as
a function of (σ1 , σ2 ) fails at (σ1 , σ2 ) = (σ, σ). To see this,
we fix σ1 = σ, and we show that h(Z) is not analytic with
respect to σ2 at σ2 = σ. Note that for any real σ2 ,
Z ∞
h(Z) = −
p(z) log p(z)dz
Z−∞
∞
1 1
−(z−1)2 /σ22
e
=−
p(z) log √
dz
2 2πσ2
0
Z 0
2
2
1 1
e−(z−1) /σ dz
(34)
−
p(z) log √
2 2πσ
Z−∞
∞
−
p(z) log (1 + Φz (σ2 )) dz
0
Z 0
−
p(z) log 1 + Φ−1
(35)
z (σ2 ) dz,
−∞
where
1
σ2 (z−1)2 /σ22 −(z+1)2 /σ2
e
and Φ−1
.
z (σ2 ) ,
σ
Φz (σ2 )
(36)
Z
Z
We
note
that
(34)
is
analytic
as
a
function
of
σ
at
σ
=
σ.
2
2
R
~
(here we have used the fact that Z |q θ (z|y)|dz is a continuous To see this, observe that the first term of (34) can be further
2
computed as
function of θ~ ∈ Cm
~0 (r2 ); this follows from Lemma III.4 (Part
θ
Z ∞
Z ∞
1)). It then follows from (30) that
√
2
2
1
Z
√
p(z)dz−
(z−1)2 e−(z+1) /σ dz
− log(2 2πσ2 )
2
~
−1
−1
2 2πσσ2
0
0
|p~ε,θ (z−n−1
)|dz−n−1
≤ (1 + δ)2(n+2) .
Z ∞
Z n+1
2
2
1
√
−
(z − 1)2 e−(z−1) /σ2 dz,
By choosing δ > 0 sufficiently small, we can combine all
3
2 2πσ2
0
the relevant inequalities above to obtain some L > 0 and some
m
m
1
2
~ ∈ C (r1 ) × C (r2 ),
which is analytic at σ2 = σ since each of the three terms above
0 < ρ < 1 such that for all (~ε, θ)
~
ε0
~0
θ
is analytic at σ2 = σ (for the second or third term, regard σ2 as
~
~
,θ
a complex variable and use the exponentially-decaying tail of
|h~εn+1
(Z) − h~εn,θ (Z)|
Z
the integrand). With a similar argument applied to the second
~ 0
~
~
−1
−1
0
≤
|p~ε,θ (z−n−1
)(log p̊~ε,θ (z0 |z−n−1
) − log p̊~ε,θ (z0 |z−n
))|dz−n−1
term of (34), we can then establish the analyticity of (34). So,
Z n+2
to prove h(Z) is not analytic at σ2 = σ, it suffices to show
n
≤ Lρ ,
that (35) is not analytic at σ2 = σ.
~
Suppose, by way of contradiction, that (35) is analytic at σ,
which implies the uniform convergence of h~εn,θ (Z) on
or equivalently, the following function of ω
m1
m2
C~ε0 (r1 ) × C~ (r2 ) as n tends to infinity, and thus the Z θ0
∞
2 2
1 ω
1 σ −1 −(z+1)2 σ−2
~
√ e
+ √ e−(z−1) ω log (1 + Φz (1/ω)) dz
analyticity of h~ε,θ (Z) around (~ε0 , θ~0 ).
Φz (σ2 ) ,
0
2
2π
2
2π
9
Z
0
+
−∞
2 2
1 σ −1 −(z+1)2 σ−2
1 ω
√ e
+ √ e−(z−1) ω
2 2π
2 2π
log 1 + Φ−1
z (1/ω) dz
(37)
is analytic at σ −1 ∈ Cσ−1 (r) (the closure of the rneighborhood of σ −1 in C) for some r > 0, where, recalling
from (36),
Φz (1/ω) =
σ −1 (z−1)2 ω2 −(z+1)2 σ−2
e
.
ω
Then, by uniqueness, the analytic extension of (37) to Cσ−1 (r)
would agree with any analytic extension along the circle
{σ −1 + reiα : α ∈ [−π/2, 3π/2]} (from α = −π/2
to α = 3π/2). Such an analytic extension is obtained by
regarding ω as a complex variable on the circle (this is a valid
analytic extension by virtue of the exponentially-decaying tails
of the integrands in (37)). Here, we remark that for any r > 0
and α, there are at most two “singular” z (note that the
following inequality boils down to a system of two quadratic
equations in z) such that
and
∂
(2(z − 1)2 σ −1 r sin α) ∂ ((z − 1)2 r2 sin 2α + b(r, α)) .
∂α
∂α
Note that for all 0 ≤ z ≤ N , by (I), Φz (1/ω) will not
go around −1 (in any direction) for one complete round as
α increases from −π/2 to 3π/2. Next, we consider the case
when z ≥ N . Notice that, by (II), for any fixed z ≥ N , as
α increases from −π/2 + ε to π/2 − ε, B(z, r, α) increases
as well. If, for some z ≥ N and α0 ∈ [−π/2, −π/2 + ε] ∪
[π/2 − ε, π/2],
A(z, r, α0 ) > 0,
it then follows from (II) that there exists ` = `(ε) > 0 such that
` → ∞ as ε → 0 and for the same z and any α ∈ [−`ε, `ε],
A(z, r, α) > 0.
On the other hand, it follows from (II) that for any z ≥ N
and for any α ∈ [π/2 + ε, 3π/2 − ε],
Φz (1/(σ −1 + reiα )) = −1,
−1
A(z, r, α) < 0;
iα
which
means
log 1 + Φz (1/(σ + re ))
or
−1
+ reiα )) would “blow up” at such z.
log 1 + Φ−1
z (1/(σ
However, an easy bounding argument (roughly speaking, the
two “blowing up” terms will only do so “slowly”) yields that
during the analytic extension, (37) is still well-defined with
the presence of such singular z, and so the above analytic
extension is indeed valid.
Next, we will find a contradiction by showing that the
analytic extension of (37) disagrees at α = −π/2 and
α = 3π/2. Setting ω = σ −1 + reiα , we then have
σ −1
σ −1
= −1
w
σ + r cos α + ir sin α
σ −2 + σ −1 r cos α − iσ −1 r sin α
=
, ea(r,α)+ib(r,α) ,
σ −2 + 2σ −1 r cos α + r2
where one can easily check that
∂b(r, α)
= O(r).
∂α
Then, some straightforward computations yield that
a(r, α) = O(r),
b(r, α) = O(r),
Φz (1/ω) = eA(z,r,α) eiB(z,r,α) ,
where
A(z, r, α) , 2(z−1)2 σ −1 r cos α+(z−1)2 r2 cos 2α−4zσ −2 +a(r, α),
and
B(z, r, α) , 2(z −1)2 σ −1 r sin α+(z −1)2 r2 sin 2α+b(r, α).
Now, for some small yet fixed ε > 0, choose N > 0 large
enough and then r > 0 small enough such that
(I) for all 0 ≤ z ≤ N and all α ∈ [−π/2, 3π/2], B(z, r, α) ∈
(−π, π);
(II) for all z ≥ N and all α ∈ [−π/2 + ε, π/2 − ε] ∪ [π/2 +
ε, 3π/2 − ε],
4zσ −2 |a(r, α)|,
|2(z − 1)2 σ −1 r cos α| |(z − 1)2 r2 cos 2α + a(r, α)|
straightforward computations also yield that for any z ≥ N
and for any α ∈ [π/2, π/2 + ε] ∪ [3π/2 − ε, 3π/2],
A(z, r, α) < 0.
It then follows that Φz (1/ω) will not go around −1 (in any
direction) for one complete round as α increases from π/2 to
3π/2.
We are now ready to conclude that as α increases from
−π/2 to 3π/2, for any z ≥ N with Φz (1/(σ −1 +reiα )) 6= −1,
Φz (1/ω) will go around −1 anti-clockwise k(z) times, where
k(z) is a non-negative integer; meanwhile, one checks that
when z is large enough, k(z) is strictly positive. The idea
can be roughly described as follows. Consider the “trajectory”
of Φz (1/ω) as α increases from −π/2 to 3π/2. Obviously,
A(z, r, α) > 0 means the magnitude of the corresponding
“location” is strictly bigger than 1; B(z, r, α) > 0 means at the
corresponding “location”, Φz (1/ω) is going anti-clockwise.
The above argument shows that given sufficiently small ε (and
thus ` sufficiently large), for all the time when the “location”
is at least 1 away from the origin, “more often” Φz (1/ω)
goes around −1 anti-clockwise (for any α ∈ [−π/2, −π/2 +
ε] ∪ [π/2 − ε, π/2], Φz (1/ω) may go around −1 clockwise,
whereas for all α ∈ [−`ε, `ε], Φz (1/ω) must go around −1
anti-clockwise).
So, for any analytic extension along the circle {σ −1 +reiα :
α ∈ [−π/2, 3π/2]} (from α = −π/2 to α = 3π/2), we have
proven that for any z ≥ 0 with Φz (1/(σ −1 + reiα )) 6= −1,
−1
iα
=
lim
log 1 + Φz (1/(σ + re ))
α→(3π/2)−
==
lim
α→(−π/2)+
log 1 + Φz (1/(σ −1 + reiα )) + 2k(z)πi,
where k(z) is a non-negative integer for all z and a strictly
positive integer for all sufficiently large z. Using a similar
10
argument, we can also prove that for any z ≤ 0 with
−1
Φ−1
+ reiα )) 6= −1,
z (1/(σ
−1
−1
iα
=
lim
log 1 + Φz (1/(σ + re ))
α→(3π/2)−
==
lim
α→(−π/2)+
M = {Π = (πij ) ∈ Rl×l : πij ≥ 0,
−1
−1
iα
log 1 + Φz (1/(σ + re )) +2k(z)πi,
Remark IV.2. The previous example shows that for fixed
σ1 = σ, h(Z) is not analytic with respect to σ2 at σ2 = σ.
On the other hand, when σ1 , σ2 are both equal to σ > 0 and
vary together keeping σ1 = σ2 , h(Z) is in fact analytic with
respect to σ. To see this, note that when σ1 = σ2 = σ > 0,
we have
2
2
1 1 −(z+1)2 /σ2
+ e−(z−1) /σ ,
e
p(z) = √
2 2πσ
l
X
πij = 1},
j=1
where k(z) is a non-negative integer for all z and a strictly
positive integer for all sufficiently large |z|. This, however,
implies that for the above analytic extension, (37) disagrees at
α = −π/2 and α = 3π/2, which is a contradiction.
and furthermore,
Z
h(Z) = −
call a complex Hilbert metric (for alternative complex Hilbert
metrics, see [37] and [6]).
Let M denote the set of all l × l stochastic matrices, i.e.,
and let M̃ denote the complex version of M , defined as
M̃ = {Π = (πij ) ∈ Cl×l :
l
X
πij = 1, for all i}.
j=1
For a given positive Π ∈ M and a small δ1 > 0, let
M̃Π (δ1 ) denote the δ1 -neighborhood, under the Euclidean
metric, around Π within M̃ . For an element Π̃ ∈ M̃Π (δ1 ),
similar to (13), Π̃ will induce a mapping fΠ̃ on W̃ . For a
small δ2 > 0, let W̃W ◦ ,H (δ2 ) denote the δ2 -neighborhood of
W ◦ within W̃ + under the complex Hilbert metric, i.e.,
W̃W ◦ ,H (δ2 ) = {v = (v1 , v2 , · · · , vl ) ∈ W̃ + : d˜H (v, u) ≤ δ2 ,
for some u ∈ W ◦ }.
The main result of [15] states:
∞
p(z) log p(z)
−∞
Z ∞
√
p(z)(z + 1)2 /σ 2 dz
= − log(2 2πσ) +
−∞
Z ∞
2
−
p(z) log(1 + e−4z/σ )dz
−∞
Z ∞
√
p(z)(z + 1)2 /σ 2 dz
= − log(2 2πσ) +
−∞
Z ∞
2
−σ
p(σ 2 z) log(1 + e−4z )dz.
(38)
−∞
Note that it is easy to check that all the three terms in (38)
are analytic with respect to σ, which immediately implies that
h(Z) is analytic with respect to σ.
V. A C OMPLEX H ILBERT M ETRIC
In this section, we briefly review the complex Hilbert metric
that will be of critical use in the proofs of Theorem I.1 and I.8.
Several useful properties of this metric will be stated/proved
as well.
Recall that W̃ denote the complex version of W ,
X
W̃ = {w = (w1 , w2 , · · · , wl ) ∈ Cl :
wi = 1}.
i
Let
W̃ + = {v ∈ W̃ : <(vi /vj ) > 0 for all i, j}.
For v, w ∈ W̃ + , define
wi /wj ˜
,
dH (v, w) = max log
i,j
vi /vj (39)
where log is taken as the principal branch of the complex
log(·) function (i.e., the branch whose branch cut is the
negative real axis). Since the principal branch of log is additive
on the right-half plane, d˜H is a metric on W̃ + , which we
Theorem V.1. Let Π be a positive matrix in M . For sufficiently
small δ1 , δ2 > 0, there exists 0 < ρ1 < 0 such that for any
Π̃ ∈ M̃Π (δ1 ), fΠ̃ is a ρ1 -contraction mapping on W̃W ◦ ,H (δ2 )
under the complex Hilbert metric in (39).
For δ > 0, let CR+ [δ] denote the “δ-cone” of R+ within C,
i.e.,
CR+ [δ] = {x + yi ∈ C : x > 0, −δx ≤ y ≤ δx}.
The following Lemma can be easily checked.
Lemma V.2. For sufficiently small δ1 > 0, there exists a
positive constant L1 such that for any α, β ∈ CR+ [δ1 ]
|α − β| |α − β|
,
.
| log α − log β| ≤ L1 max
|α|
|β|
The following lemma essentially follows from the proof of
Part 2 of Lemma 2.3 in [15] (in particular, its Part 1 is just
a rephrased version of Part 2 of that lemma), allows us to
connect the complex Hilbert metric and the Euclidean metric.
We give a proof for completeness.
Lemma V.3. 1) For any δ > 0, there exists ξ > 0 such
that for any x̃ ∈ W̃ + , x ∈ W ◦ with d˜H (x̃, x) ≤ ξ, we
have x̃i ∈ W̃W ◦ ,H (δ) for all i.
2) For any ζ > 0, there exists a constant C > 0 such that
for any x̃, ỹ ∈ W̃ + with kx̃ − xk, kỹ − yk ≤ ζ for some
x, y ∈ W ◦ , we have
kx̃ − ỹk ≤ C d˜H (x̃, ỹ).
Proof. We only prove Part 2. Let ξ = d˜H (x̃, ỹ). Then we have
for all i, j,
log x̃i /ỹi ≤ ξ.
x̃j /ỹj There exists
C1 > 0 such that for ξ sufficiently small, and for
/ỹi
all i, j, x̃x̃ji /ỹ
− 1 ≤ C1 ξ. Let αj = x̃j /ỹj . Then for all i, j,
j
|x̃i − αj ỹi | ≤ C1 ξ|αj ||ỹi |,
11
and so
n
n
X
X
|1 − αj | = (x̃i − αj ỹi ) ≤
|x̃i − αj ỹi |
i=1
≤ C1 ξ|αj |
i=1
n
X
|ỹi | = C1 (1 + Bζ)ξ|αj |.
i=1
It follows that
|x̃j − ỹj | ≤ C1 (1 + Bζ)ξ|x̃j | ≤ C1 (1 + Bζ)ξ(xj + ζ),
which implies Part 2, if ξ is sufficiently small.
VI. P ROOF OF T HEOREM I.1
The following lemma is an analog of Lemma III.1.
Lemma VI.1. For any δ > 0, there exists r1 > 0 such that
for any ~ε ∈ C~εm01 (r1 ), any z ∈ Z and any x ∈ W , we have
d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ.
Proof. Since all πij (~ε0 ) are strictly positive, for any δ1 > 0,
there exists r1 > 0 such that for all i, j and all ~ε ∈ C~εm01 (r1 ),
~
ε
|πij
− πij (~ε0 )|
≤ δ1 .
πij (~ε0 )
Now, for any x = (x1 , x2 , · · · , xl ) ∈ W , any j and any ~ε ∈
C~εm01 (r1 ), we have
Pl
~
ε
i=1 xi (πij − πij (~ε0 )) Pl
ε0 )
i=1 xi πij (~
Pl
~
ε
i=1 xi πij (~ε0 )(πij − πij (~ε0 ))/πij (~ε0 ) =
≤ δ1 .
Pl
xi πij (~ε0 )
i=1
Thus, for any δ2 > 0, choosing δ1 sufficiently small, we have
Pl
~
ε
x
π
i
ij
i=1
log Pl
ε0 ) i=1 xi πij (~
!
Pl
~
ε
ε0 )) i=1 xi (πij − πij (~
= log 1 +
≤ δ2 .
Pl
xi πij (~ε0 )
i=1
Notice that
d˜H (fz~ε(x), fz~ε0 (x))
Pl
Pl
~
ε
~
ε
i=1 xi πij q(z|j)
i=1 xi πik q(z|k) = max log Pl
− log Pl
j,k ε0 )q(z|j)
ε0 )q(z|k) i=1 xi πij (~
i=1 xi πik (~
Pl
Pl
~
ε
~
ε
i=1 xi πij
i=1 xi πik = max log Pl
− log Pl
.
j,k ε0 )
ε0 ) i=1 xi πij (~
i=1 xi πik (~
It then follows that for any δ > 0, there exists r1 > 0 such
that for any ~ε ∈ C~εm01 (r1 ) and any x ∈ W , we have
d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ.
is uniform over all the values of z. More precisely, recalling
W̃W ◦ ,H (δ) denote the δ-neighborhood of W ◦ of W̃ under the
complex Hilbert metric, we have
Lemma VI.2. For sufficiently small r1 , δ > 0, there exists
0 < ρ1 < 1 such that for any ~ε ∈ C~εm01 (r1 ) and any z ∈ Z, fz~ε
is a ρ1 -contraction mapping on W̃W ◦ ,H (δ) under the complex
Hilbert metric in (39).
Proof. It can be easily checked that there exist some r1 , δ > 0
such that for any ~ε ∈ C~εm01 (r1 ), any u, v ∈ W̃W ◦ ,H (δ) and any
z ∈ Z,
d˜H (uΠ~ε(z), vΠ~ε(z)) = d˜H (uΠ~ε, vΠ~ε).
(40)
The lemma then immediately follows from Theorem V.1.
The following lemma is also needed.
Lemma VI.3. 1) For any δ > 0, there exist r1 > 0 such
0
∈ Z n+1 and
that for any ~ε ∈ C~εm01 (r1 ) and for any z−n
−n − 1 ≤ i ≤ −1,
i
x~εi(z−n
) ∈ W̃W ◦ ,H (δ),
(41)
−1
) ∈ CR+ [δ].
p~ε(z0 |z−n
(42)
and
0
2) There exist r1 > 0 such that for all z−n
∈ Z n+1 ,
−1
p~ε(z0 |z−n
) is analytic on C~εm01 (r1 ).
3) For sufficiently small r1 > 0, there exist 0 < ρ1 < 1 and
a positive constant L1 such that for any two Z-valued
sequences {a0−n1 } and {b0−n2 } with a0−n = b0−n and for
all ~ε ∈ C~εm01 (r1 ), we have
−1
n
~
ε
q(a0 |y 0 ).
|p~ε(a0 |a−1
−n1 ) − p (b0 |b−n2 )| ≤ L1 ρ1 max
0
y ∈Y
Proof. 1. By Lemma VI.2, we can choose r1 , δ > 0 sufficiently small such that there exists 0 < ρ1 < 1 such that for all
~ε ∈ C~εm01 (r1 ), fz~ε is a ρ1 -contraction mapping on W̃W ◦ ,H (δ)
under the complex Hilbert metric.
Now, choose r1 > 0 so small (the existence of r1 is
guaranteed by Lemma VI.1) such that for any z ∈ Z, for
all x ∈ W , all ~ε ∈ C~εm01 (r1 )
d˜H (fz~ε(x), fz~ε0 (x)) ≤ δ(1 − ρ1 ),
and for all ~ε ∈
(43)
C~εm01 (r1 ),
d˜H (π ~ε, π(~ε0 )) ≤ δ(1 − ρ1 ).
(44)
We then deduce that
0
0
d˜H (x~εi+1 , x~εi+1
) = d˜H (fz~εi+1 (x~εi), fz~εi+1
(x~εi 0 ))
≤ d˜H (f ~ε (x~ε), f ~ε (x~ε0 ))
zi+1
i
zi+1
i
0
+ d˜H (fz~εi+1 (x~εi 0 ), fz~εi+1
(x~εi 0 )).
(45)
Then, by (43), (44) and (45), for i > −n − 1, we have
0
d˜H (x~εi+1 , x~εi+1
) ≤ ρd˜H (x~εi, x~εi 0 ) + δ(1 − ρ1 ).
So, for all i,
The following lemma, roughly speaking, says that when we
perturb ~ε0 “a bit” to ~ε, fz~ε is still a contraction mapping on a
complex neighborhood of W ◦ , and the contraction coefficient
0
d˜H (x~εi+1 , x~εi+1
) ≤ δ,
and thus for all i, we have x~εi+1 ∈ W̃W ◦ ,H (δ), as desired. This,
together with (19) and Lemma V.3 (Part 2), implies (42).
12
2. It follows from Part 1 of Lemma V.3 that for sufficiently
small r1 , δ > 0 and any z ∈ Z, fz~ε(x) is analytic with respect
to (~ε, x) ∈ C~εm01 (r1 )× W̃W ◦ ,H (δ). It follows from this fact, the
~
x~εi ,θ
iterative nature of
(see (17)) and Part 1 that for sufficiently
small r1 > 0, each x~εi is analytic on C~εm01 (r1 ). Part 2 then
immediately follows from (19).
3. For all ~ε ∈ C~εm01 (r1 ), we write
And, by (41), for sufficiently small r1 > 0, there exist
C1 , C2 > 0 such that for all ~ε ∈ C~εm01 (r1 )
−1
C1 min
q(z0 |y 0 ) ≤ |p~ε(z0 |z−n
)| ≤ C2 max
q(z0 |y 0 ),
0
0
y ∈Y
which, together with (42), implies that for some C3 > 0,
−1
| log p~ε(z0 |z−n
)| ≤ C3 +max{| log max
q(z0 |y 0 )|, | log min
q(z0 |y 0 )|}.
0
0
y ∈Y
m1
Z n+1 ~
(r1 )
ε∈C~
ε
0
Apparently we have
x~εi+1,a = fa~εi+1 (x~εi,a ),
x~εi+1,b = fb~εi+1 (x~εi,b ).
Note that there exists a positive constant L01 such that
d˜H (x~ε−n,a , x~ε−n,b ) ≤ L01 ,
for all ~ε ∈ C~εm01 (r1 ), where r1 > 0 are chosen sufficiently
small. Then from (52), we have
d˜H (x~ε−1,a , x~ε−1,b ) ≤ L01 ρn−1
.
1
Therefore, by Lemma V.3, there exists a positive constant L001
independent of n1 , n2 such that for any ~ε ∈ C~εm01 (r1 ), we have
kx~ε−1,a − x~ε−1,b k ≤ L001 ρn1 .
(46)
y
we conclude that there is a positive constant L1 , independent
of n1 , n2 such that
−1
~
ε
n
|p~ε(a0 |a−1
q(a0 |y 0 ).
−n1 ) − p (b0 |b−n2 )| ≤ L1 ρ1 max
0
y ∈Y
−1
By Lemma III.4 (Part 2), h~εn (Z0 |Z−n
) is analytic on C~εm01 (r1 ).
Now, to prove the theorem, we only need to prove that
there exist r1 > 0 such that the h~εn (Z), as n → ∞, uniformly
converges on C~εm01 (r1 ). Note that
Z
−1
0
~
ε
~
ε
0
p~ε(z−n−1
) log p~ε(z0 |z−n−1
|hn+1 (Z) − hn (Z)| = )dz−n−1
n+2
Z
Z
−1
0
0 −
p~ε(z−n
) log p~ε(z0 |z−n
)dz−n
Z n+1
Z
−1
0
p~ε(z−n−1
= )(log p~ε(z0 |z−n−1
)
Z n+2
−1
0
.
− log p~ε(z0 |z−n
))dz−n−1
Fix ~ε ∈ C~εm01 (r1 ). Then, by Lemmas V.2 and VI.3, either
we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with
(1 + δ1 )ρ1 < 1
Now, using (19) and the fact that
X
p~ε(a0 ) =
πy~ε q(a0 |y),
(47)
−1
−1
0
))|
|p~ε(z−n−1
)(log p~ε(z0 |z−n−1
) − log p~ε(z0 |z−n
−1
−1 ~
ε
~
ε
)
p
(z
|z
)
−
p
(z
|z
0
0
−n −n−1
0
)
≤ L01 p~ε(z−n−1
−1
~
ε
p (z0 |z−n−1 )
−1
≤ L01 |p~ε(z−n−1
)|L1 ρn1 max
q(z0 |y 0 ),
0
We then have finished the proof.
y ∈Y
We are now ready for the proof of Theorem I.1.
Proof of Theorem I.1. We first prove that there exist r1 > 0
such that for any n, h~εn (Z) is analytic on C~εm01 (r1 ).
For a fixed n, recall that
Z
−1
~
ε
0
0
hn (Z) = −
p~ε(z−n
) log p~ε(z0 |z−n
)dz−n
,
Z n+1
where
0
(z−n
)
=
X
~
ε
p
0
Y
0
(y−n
)
0
y−n
p
−1
(z0 |z−n
)
Now, for any ~ε ∈
0
|p~ε(z−n
)| ≤
X
q(zi |yi )
Notice that for any given δ > 0, there exist r1 , r2 > 0 such
that for all ~ε ∈ C~εm0 (r1 ),
0
|πy~ε−n | ≤ (1 + δ)πy~ε−n
,
−1
x~ε−1 (z−n
)Π~ε(z0 )1.
0
y−n
≤
X
0
y−n
0
|p~ε(y−n
)|
0
Y
Zn
q(zi |yi )
Z
i=−n
0
Y
i=−n
max
q(zi |y 0 ).
0
y ∈Y
|πy~εi yi+1 | ≤ (1 + δ)πy~εi0yi+1 .
It then follows from the first inequality sign of (48) that
Z
~ −1
−1
|p~ε,θ (z−n
)|dz−n
≤ (1 + δ)n ,
we have
0
|p~ε(y−n
)|
−1
−1
0
|p~ε(z−n−1
)(log p~ε(z0 |z−n−1
) − log p~ε(z0 |z−n
))|
−1
−1 ~
ε
~
ε
p
(z
|z
)
−
p
(z
|z
))
0
0
−n
−n−1
0
≤ L01 p~ε(z−n−1
)
−1
~
ε
p (z0 |z−n )
y ∈Y
i=−n
=
C~εm01 (r1 ),
or we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with
(1 + δ1 )ρ1 < 1,
−1
0
≤ L01 |p~ε(z−n
)||p~ε(z−n−1 |z−n
)|L1 ρn1 max
q(z0 |y 0 ).
0
and
~
ε
y ∈Y
This, together with (4), implies that on
Z
0
−1 0
) log p~ε(z0 |z−n
) dz−n
< ∞ (50)
sup p~ε(z−n
x~εi,b = x~εi(bi−n2 ) = p~ε(yi = · |bi−n2 ).
p
(49)
C~εm01 (r1 ),
x~εi,a = x~εi(ai−n1 ) = p~ε(yi = · |ai−n1 ),
~
ε
y ∈Y
Z n+1
~
−1
−1
|p~ε,θ (z−n−1
)|dz−n−1
≤ (1 + δ)n+1 .
Moreover, similar to (49), we have for some C4 , C5 > 0,
(48)
~0
~
0
C4 min
q θ (z−n−1 |y 0 ) ≤ |p~ε,θ (z−n−1 |z−n
)| ≤ C5 max
q(z−n−1 |y 0 ).
0
0
y ∈Y
y ∈Y
13
By choosing δ > 0 sufficiently small, we can combine all
the relevant inequalities above to obtain some L > 0 and some
0 < ρ < 1 such that for all ~ε ∈ C~εm01 (r1 ),
|h~εn+1 (Z)
Z
≤
Z n+2
−
It then follows from the second inequality of (11) that for
~ ∈
any δ > 0, there exist r1 , r2 > 0 such that for any (~ε, θ)
m2
m1
C~ε0 (r1 ) × C~ (r2 ) and any x ∈ W , we have
θ0
h~εn (Z)|
~
~
d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x)) ≤ δ.
0
−1
−1
0
|p~ε(z−n−1
)(log p~ε(z0 |z−n−1
) − log p~ε(z0 |z−n
))|dz−n−1
≤ Lρn ,
The following lemma is an analog of Theorem VI.2.
which implies the analyticity of h~ε(Z) around ~ε0 .
Lemma VII.2. For sufficiently small r1 , r2 , δ > 0, there exists
~ ∈ Cm1 (r1 )×Cm2 (r2 ) and
0 < ρ1 < 1 such that for any (~ε, θ)
~
~
ε0
VII. P ROOF OF T HEOREM I.8
The proof of Theorem I.8 follows from a parallel flow as in
that of Theorem I.1. We will only give a sketch of the proof,
while highlighting the parts requiring the extra conditions (b),
(c) and (11).
The following lemma is an analog of Lemma VI.1.
Lemma VII.1. For any δ > 0, there exist r1 , r2 > 0 such
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), any z ∈ Z and any
that for any (~ε, θ)
~
ε0
~0
θ
x ∈ W , we have
~
~
d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x)) ≤ δ.
Proof. By (11), we can choose r1 , r2 , δ > 0 sufficiently small
~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and
such that for any z ∈ Z, any (~ε, θ)
~
~
ε0
θ0
any u, v ∈ W̃W ◦ ,H (δ),
~
~
d˜H (uΠ~ε,θ (z), vΠ~ε,θ (z))
is well-defined. Moreover, it can be easily checked that
~
~
d˜H (uΠ~ε,θ (z), vΠ~ε,θ (z)) = d˜H (uΠ~ε, vΠ~ε).
The following lemma is an analog of Theorem VI.3.
Lemma VII.3. 1) For any δ > 0, there exist r1 , r2 > 0
~ ∈ Cm1 (r1 ) × Cm2 (r2 ) and for
such that for any (~ε, θ)
~
ε0
~0
θ
0
any z−n
∈ Z n+1 and −n − 1 ≤ i ≤ −1,
Now, for any x = (x1 , x2 , · · · , xl ) ∈ W , any j and any ~ε ∈
C~εm01 (r1 ), we have
Pl
~
ε
i=1 xi (πij − πij (~ε0 )) Pl
ε0 )
i=1 xi πij (~
Pl
~
ε
i=1 xi πij (~ε0 )(πij − πij (~ε0 ))/πij (~ε0 ) =
≤ δ1 .
Pl
ε0 )
i=1 xi πij (~
Thus, for any δ2 > 0, choosing δ1 sufficiently small, we have
Pl
~
ε
i=1 xi πij
log Pl
xi πij (~ε0 ) i=1
!
Pl
~
ε
ε0 )) i=1 xi (πij − πij (~
= log 1 +
≤ δ2 .
Pl
xi πij (~ε0 )
i=1
Notice that
~
~
d˜H (fz~ε,θ (x), fz~ε0 ,θ0 (x))
Pl
Pl
~
~
~
ε θ
~
ε θ
x
π
q
(z|k)
i
i=1 xi πij q (z|j)
ik
i=1
−
log
= max log Pl
P
l
~0
~0
θ
θ
j,k xi πij (~ε0 )q (z|j)
xi πik (~ε0 )q (z|k) i=1
i=1
Pl
~
~
ε
q θ (z|j)
i=1 xi πij
= max log Pl
+ log ~
j,k q θ0 (z|j)
ε0 )
i=1 xi πij (~
Pl
~
~
ε
q θ (z|k) i=1 xi πik
− log Pl
− log ~
.
q θ0 (z|k) ε0 )
i=1 xi πik (~
(51)
The lemma then immediately follows from Theorem V.1.
Proof. Since all πij (~ε0 ) are strictly positive, for any δ1 > 0,
there exists r1 > 0 such that for all i, j and all ~ε ∈ C~εm01 (r1 ),
~
ε
|πij
− πij (~ε0 )|
≤ δ1 .
πij (~ε0 )
θ0
~
any z ∈ Z, fz~ε,θ is a ρ1 -contraction mapping on W̃W ◦ ,H (δ)
under the complex Hilbert metric in (39).
~
i
x~εi ,θ (z−n
) ∈ W̃W ◦ ,H (δ),
(52)
and
~
−1
) ∈ CR+ [δ].
p~ε,θ (z0 |z−n
(53)
0
∈ Z n+1 ,
2) There exist r1 , r2 > 0 such that for all z−n
~
m1
m2
−1
~
ε,θ
p (z0 |z−n ) is analytic on C~ε0 (r1 ) × C~ (r2 ).
θ0
3) For sufficiently small r1 , r2 > 0, there exist 0 < ρ1 < 1
and a positive constant L1 such that for any two Zvalued sequences {a0−n1 } and {b0−n2 } with a0−n = b0−n
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), we have
and for all (~ε, θ)
~
ε0
~
θ0
~
~
ε,θ
|p
~
~
ε,θ
(a0 |a−1
−n1 )−p
n
(b0 |b−1
−n2 )| ≤ L1 ρ1
~0
|q θ (a0 |y 0 )|.
sup
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
θ
0
Proof. 1. It follows from a parallel argument as in the proof
of Part 1 of Theorem VI.3.
2. It follows from (11(i)) and Part 1 of Lemma V.3 that
~
for sufficiently small r1 , r2 , δ > 0 and any z ∈ Z, fz~ε,θ (x)
~ x) ∈ Cm1 (r1 ) × Cm2 (r2 ) ×
is analytic with respect to (~ε, θ,
~
ε0
~
θ0
W̃W ◦ ,H (δ). It follows from this fact, the iterative nature of
~
x~εi ,θ (see (17)) and Part 1 that for sufficiently small r1 , r2 >
~
2
0, each x~εi ,θ is analytic on C~εm01 (r1 ) × Cm
~0 (r2 ). Part 2 then
θ
immediately follows from (19).
3. It follows from a parallel argument as in the proof of Part
3 of Theorem VI.3.
We are now ready for the proof of Theorem I.8.
14
Proof of Theorem I.8. We first prove that there exist r1 , r2 >
~
0 such that for any n, h~εn,θ (Z) is analytic on C~εm01 (r1 ) ×
2
Cm
~0 (r2 ).
θ
For a fixed n, recall that
Z
~
~ 0
~
−1
~
ε,θ
0
hn (Z) = −
p~ε,θ (z−n
) log p~ε,θ (z0 |z−n
)dz−n
,
Z n+1
where
~
0
p~ε,θ (z−n
)=
X
0
Y
0
p~ε(y−n
)
0
y−n
~ ∈ Cm1 (r1 ) × Cm2 (r2 ). Then, by Lemmas V.2
Fix (~ε, θ)
~0
~
ε0
θ
and VII.3, either we have, for some 0 < ρ1 < 1, L01 > 0
and some δ1 with (1 + δ1 )ρ1 < 1
~
~
~
q θ (zi |yi )
~
−1
≤ L01 |p~ε,θ (z−n−1
)|L1 ρn1
i=−n
~
−1
−1
0
|p~ε,θ (z−n−1
)(log p~ε,θ (z0 |z−n−1
) − log p~ε,θ (z0 |z−n
))|
~
~
−1
−1 ~
ε,θ
~
ε,θ
p
(z
|z
)
−
p
(z
|z
)
~
0
0
−n −n−1
0
)
≤ L01 p~ε,θ (z−n−1
~
−1
~
ε
,
θ
p (z0 |z−n−1 )
~0
|q θ (z0 |y 0 )|,
sup
m
~0 )∈Y×C 2 (r2 )
(y 0 ,θ
~
θ0
and
~
~
or we have, for some 0 < ρ1 < 1, L01 > 0 and some δ1 with
(1 + δ1 )ρ1 < 1,
~
,θ −1
−1
p~ε,θ (z0 |z−n
) = x~ε−1
(z−n )Π~ε,θ (z0 )1.
~ ∈ Cm1 (r1 ) × Cm2 (r2 ), we have
Now, for any (~ε, θ)
~
~
ε0
~
~
~
ε,θ
|p
0
(z−n
)|
≤
X
~
ε
|p
0
y−n
≤
X
0
|p~ε(y−n
)|
0
y−n
0
Y
0
(y−n
)|
~
|q θ (zi |yi )|
i=−n
~0
|q θ (zi |y 0 )|.
sup
~
~
~
0
Y
(54)
−1
0
)||p~ε,θ (z−n−1 |z−n
)|L1 ρn1
≤ L01 |p~ε,θ (z−n
m
~0 )∈Y×C 2 (r2 )
i=−n (y 0 ,θ
~
θ
sup
~0
|q θ (z0 |y 0 )|.
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
θ
0
0
And, by (52), for sufficiently small r1 , r2 > 0, there exist
C1 , C2 > 0 such that
C1
inf
m
~0 )∈Y×C 2 (r2 )
(y 0 ,θ
~
~
−1
≤ |p~ε,θ (z0 |z−n
)| ≤ C2
~0
|q θ (z0 |y 0 )|,
sup
Notice that for any given δ > 0, there exist r1 , r2 > 0 such
that for all ~ε ∈ C~εm0 (r1 ),
0
,
|πy~ε−n | ≤ (1 + δ)πy~ε−n
~0
|q θ (z0 |y 0 )|
θ0
(55)
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
Z
which, together with (53), implies that for some C3 > 0,
~
−1
| log p~ε,θ (z0 |z−n
)| ≤ C3 + max{| log
log
inf
sup
~0
|q θ (z0 |y 0 )|,
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
θ
0
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
θ
0
~0
|q θ (z0 |y 0 )|}.
m1
m
~
(~
ε,θ)∈C
(r1 )×C~ 2 (r2 )
~
ε0
θ0
~
(here we have used the fact that Z |q θ (z|y)|dz is a continuous
2
function of θ~ ∈ Cm
~0 (r2 ); this follows from Lemma III.4 (Part
θ
1)). It then follows from the first inequality sign of (54) that
Z
~ −1
−1
≤ (1 + δ)2n ,
)|dz−n
|p~ε,θ (z−n
Zn
Z
θ0
sup
Z
R
2
This, together with (5), implies that on C~εm01 (r1 ) × Cm
~ (r2 ),
Z
|πy~εi yi+1 | ≤ (1 + δ)πy~εi0yi+1 ,
2
and for any y ∈ Y and all θ~ ∈ Cm
~0 (r2 ),
θ
Z
Z
~
~
|q θ (z|y)|dz ≤ (1 + δ)
q θ0 (z|y)dz = 1 + δ,
θ0
Z n+1
~
−1
−1
0
|p~ε,θ (z−n−1
)(log p~ε,θ (z0 |z−n−1
) − log p~ε,θ (z0 |z−n
))|
~
~
−1
−1 ~
ε,θ
~
ε,θ
p
(z
|z
)
−
p
(z
|z
))
~
0
0
−n
−n−1
0
)
≤ L01 p~ε,θ (z−n−1
~
−1
~
ε
,
θ
p (z0 |z−n )
θ0
~
~ε,θ~ 0
−1 0
) dz−n
<∞
p (z−n ) log p~ε,θ (z0 |z−n
~
Z n+1
−1
−1
|p~ε,θ (z−n−1
)|dz−n−1
≤ (1 + δ)2(n+1) .
Moreover, similar to (55), we have for some C4 , C5 > 0,
C4
inf
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
~
θ
0
~0
~
0
|q θ (z−n−1 |y 0 )| ≤ |p~ε,θ (z−n−1 |z−n
)|
(56)
~
−1
By Lemma III.4 (Part 2), h~εn,θ (Z0 |Z−n
) is analytic on
~0
2
C~εm01 (r1 ) × Cm
≤ C5
sup
|q θ (z−n−1 |y 0 )|.
~0 (r2 ).
θ
~0 )∈Y×Cm2 (r2 )
(y 0 ,θ
Now, to prove the theorem, we only need to prove that there
~
θ
0
~
~
ε,θ
exist r1 , r2 > 0 such that the hn (Z), as n → ∞, uniformly
By choosing δ > 0 sufficiently small, we can combine all
2
converges on C~εm01 (r1 ) × Cm
~0 (r2 ). Note that
θ
the relevant inequalities above to obtain some L > 0 and some
Z
~ ∈ Cm1 (r1 ) × Cm2 (r2 ),
0 < ρ < 1 such that for all (~ε, θ)
~
~
~ 0
~
~
ε,θ
~
ε0
~0
−1
~
ε,θ
0
θ
|hn+1 (Z) − hn (Z)| = ) log p~ε,θ (z0 |z−n−1
p~ε,θ (z−n−1
)dz−n−1
Z n+2
~
~
,θ
Z
|h~εn+1
(Z) − h~εn,θ (Z)|
~ 0
~
−1
~
ε,θ
~
ε,θ
0 −
p (z−n ) log p (z0 |z−n )dz−n Z
n+1
~ 0
~
~
−1
−1
0
ZZ
≤
|p~ε,θ (z−n−1
)(log p~ε,θ (z0 |z−n−1
)−log p~ε,θ (z0 |z−n
))|dz−n−1
≤ Lρn ,
~
~
n+2
−1
~
ε
,
θ
0
~
ε
,
θ
Z
= p (z−n−1 )(log p (z0 |z−n−1 )
Z n+2
~
which implies the analyticity of h~ε,θ (Z) around (~ε0 , θ~0 ).
~
−1
0
− log p~ε,θ (z0 |z−n
))dz−n−1
.
15
VIII. C ONCLUDING R EMARKS
Under certain mild assumptions, employing a complex
Hilbert metric in a critical way, we show that the entropy rate
of a class of hidden Markov chains with continuous alphabet
is analytic with respect to the input Markov chain parameters.
Joint analyticity results with respect to both the input Markov
chain parameters and the channel parameters, which apply to
additive Cauchy or Gaussian channels, are further obtained
under strengthened assumptions on the channel. Given the
implications of the analyticity results in the discrete setting,
we expect that the results in this paper will be of great
interest and significance in the continuous setting. Further
work include the applications of the analyticity results in this
paper to computations of entropy rate of hidden Markov chains
and capacity of finite-state channels with continuous output
alphabet.
R EFERENCES
[1] A. Allahverdyan. Entropy of Hidden Markov Processes via Cycle
Expansion. J. Stat. Phys., vol. 133, 2008, pp. 535-564.
[2] L. Arnold, V. M. Gundlach and L. Demetrius. Evolutionary formalism
for products of positive random matrices. Annals of Applied Probability,
vol. 4, 1994, pp. 859-901.
[3] J. J. Birch. Approximations for the entropy for functions of Markov
chains. Ann. Math. Statist., vol. 33, 1962, pp. 930-938.
[4] D. Blackwell. The entropy of functions of finite-state Markov chains.
Trans. First Prague Conf. Information Thoery, Statistical Decision
Functions, Random Processes, 1957, pp. 13-20.
[5] M. Boyle and K. Petersen. Hidden Markov processes in the context
of symbolic dynamics. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture
Note Series, vol. 385, edited by B. Marcus, K. Petersen and T. Weissman,
2011, pp. 5-71.
[6] L. Dubois. Projective metrics and contraction principles for complex
cones. Journal of the London Mathematical Society, vol. 79, no. 3,
2009, pp. 719-737.
[7] S. Egner, V. Balakirsky, L. Tolhuizen, S. Baggen and H. Hollmann. On
the entropy rate of a hidden Markov model. IEEE ISIT, 2004, pp. 12.
[8] R. Gharavi and V. Anantharam. An upper bound for the largest Lyapunov
exponent of a Markovian product of nonnegative matrices. Theoretical
Computer Science, vol. 332, no. 1-3, 2005, pp. 543-557.
[9] G. Han. Limit theorems in hidden Markov models. IEEE Trans. Info.
Theory, vol. 59, no. 3, 2013, pp. 1311-1328.
[10] G. Han. A randomized approach to the capacity of finite-state channels.
IEEE ISIT, 2013, pp. 2109-2113.
[11] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov
chains. IEEE Trans. Info. Theory, vol. 52, no. 12, 2006, pp. 5251-5266.
[12] G. Han and B. Marcus. Derivatives of entropy rate in special families of
hidden Markov chains. IEEE Trans. Info. Theory, vol. 53, no. 7, 2007,
pp. 2642-2652.
[13] G. Han and B. Marcus. Asymptotics of input-constrained binary
symmetric channel capacity. Annals of Applied Probability, vol. 19,
no. 3, 2009, pp. 1063-1091.
[14] G. Han and B. Marcus. Entropy rate of continuous-state hidden Markov
chains. IEEE ISIT, 2010, pp. 1468-1472.
[15] G. Han and B. Marcus and Y. Peres. A complex Hilbert metric and
applications to domain of analyticity for entropy rate of hidden Markov
processes. Entropy of Hidden Markov Processes and Connections to
Dynamical Systems, London Mathematical Society Lecture Note Series,
vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp.
98-116.
[16] G. Han and B. Marcus. Asymptotics of entropy rate in special families
of hidden Markov chains. IEEE Trans. Info. Theory, vol. 56, no. 3,
2010, pp. 1287-1295.
[17] G. Han and B. Marcus. Concavity of mutual information rate for inputrestricted finite-state memoryless channels at high SNR. IEEE Trans.
Info. Theory, vol. 58, no. 3, 2012, pp. 1534-1548.
[18] T. Holliday, A. Goldsmith, and P. Glynn. Capacity of finite state channels
based on Lyapunov exponents of random matrices. IEEE Trans. Info.
Theory, vol. 52, no. 8, 2006, pp. 3509-3532.
[19] S. Ihara. Information Theory for Continuous Systems, World Scientific,
1993.
[20] P. Jacquet, G. Seroussi, and W. Szpankowski. Noisy constrained
capacity. IEEE ISIT, 2007, pp. 986-990.
[21] P. Jacquet, G. Seroussi, and W. Szpankowski. On the entropy of a
hidden Markov process. Theoretical Computer Science, vol. 395, 2008,
pp. 203-219.
[22] F. Ledrappier. Analyticity of the entropy for some random walks.
Preprint. Available at http://arxiv.org/abs/1009.5354.
[23] F. Le Gland and N.Oudjane. Stability and Uniform Approximation of
Nonlinear Filters Using the Hilbert Metric and Application to Particle
Filters. The Annals of Applied Probability, vol. 14, no. 1, 2004, pp.
144-187.
[24] Y. Li and G. Han. Concavity of mutual information rate of finite-state
channels. IEEE ISIT, 2013, pp. 2114-2118.
[25] J. Luo and D. Guo. On the entropy rate of hidden Markov processes
observed through arbitrary memoryless channels. IEEE Trans. Info.
Theory, vol. 55, no. 4, 2009, pp. 1460-1467.
[26] C. Nair, E. Ordentlich and T. Weissman. Asymptotic filtering and
entropy rate of a hidden Markov process in the rare transitions regime.
IEEE ISIT, 2005, pp. 1838-1842.
[27] E. Ordentlich and T. Weissman. On the optimality of symbol by symbol
filtering and denoising. IEEE Trans. Info. Theory, vol. 52, no. 1, 2006,
pp. 19-40.
[28] E. Ordentlich and T. Weissman. New bounds on the entropy rate of
hidden Markov process. IEEE ITW, 2004, pp. 117-122.
[29] E. Ordentlich and T. Weissman. Bounds on the entropy rate of
biunary hidden Markov processes Entropy of Hidden Markov Processes
and Connections to Dynamical Systems, London Mathematical Society
Lecture Note Series, vol. 385, edited by B. Marcus, K. Petersen and
T. Weissman, 2011, pp. 117-171.
[30] P. Vontobel, A. Kavcic, D. Arnold and Hans-Andrea Loeliger. A
generalization of the Blahut-Arimoto algorithm to finite-state channels.
IEEE Trans. Info. Theory, vol. 54, no. 5, 2008, pp. 1887-1918.
[31] Y. Peres. Analytic dependence of Lyapunov exponents on transition
probabilities. Lecture Notes in Mathematics, Springer Verlag, vol. 1486,
1990.
[32] Y. Peres. Domains of analytic continuation for the top Lyapunov
exponent. Ann. Inst. H. Poincare Prob. Statist, vol. 28, no. 1, 1991,
pp. 131-148.
[33] Y. Peres and A. Quas. Entropy rate for hidden Markov chains with rare
transitions. Entropy of Hidden Markov Processes and Connections to
Dynamical Systems, London Mathematical Society Lecture Note Series,
vol. 385, edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp.
172-178.
[34] H. Pfister, J. Soriaga and P. Siegel. The achievable information rates of
finite-state ISI channels. IEEE GLOBECOM, 2011, pp. 2992-2996.
[35] H. Pfister. On the capacity of finite state channels and the analysis of
convolutional accumulate-m codes. Ph.D. thesis, University of California
at San Diego, USA, 2003.
[36] H. Pfister. The capacity of finite-state channels in the high-noise regime.
Entropy of Hidden Markov Processes and Connections to Dynamical
Systems, London Mathematical Society Lecture Note Series, vol. 385,
edited by B. Marcus, K. Petersen and T. Weissman, 2011, pp. 179-222.
[37] H. Rugh. Cones and gauges in complex spaces: Spectral gaps and
complex Perron-Frobenius theory. Annals of Mathematics, vol. 171,
no. 3, 2010.
[38] E. Seneta. Non-negative Matrices and Markov Chains, Springer-Verlag,
New York Heidelberg Berlin, 1980.
[39] V. Sharma and S. Singh. Entropy and channel capacity in the regenerative setup with applications to Markov channels. IEEE ISIT, 2001, pp.
283.
[40] V. Tadic. Analyticity, Convergence, and Convergence Rate of Recursive
Maximum-Likelihood Estimation in Hidden Markov Models. IEEE
Trans. Info. Theory, vol. 56, no, 12, 2010, pp. 6406-6432.
[41] E. Zehavi and J. Wolf. On runlength codes. IEEE Trans. Info. Theory,
vol. 34, 1988, pp. 45-54.
[42] O. Zuk, I. Kanter and E. Domany. The entropy of a binary hidden
Markov process. J. Stat. Phys., vol. 121, no. 3-4, 2005, pp. 343-360.
[43] O. Zuk, E. Domany, I. Kanter and M. Aizenman. From finite-system
entropy to entropy rate for a hidden Markov process. IEEE Signal
Processing Letters, vol. 13, no. 9, 2006, pp. 517-520.
ACKNOWLEDGMENT
The first author would like to thank the support from
the University Grants Committee of the Hong Kong Special
16
Administrative Region, China under grant No. AoE/E-02/08
and the support from Research Grants Council of the Hong
Kong Special Administrative Region, China under grant No.
17301814.
Guangyue Han Guangyue Han received the B.S. and M.S. degrees in
mathematics from Peking University, China, and the Ph.D. degree in mathematics from the University of Notre Dame, U.S.A. in 1997, 2000 and
2004, respectively. After three years with the department of mathematics
at the University of British Columbia, Canada, he joined the department
of mathematics at the University of Hong Kong, China in 2007. His main
research areas are analysis and combinatorics, with an emphasis on their
applications to coding and information theory.
Brian Marcus Brian Marcus attended Claremont McKenna College and
received his B.A. from Pomona College in 1971 and Ph.D. in mathematics
from UC-Berkeley in 1975 (supervised by Rufus Bowen). He held the
IBM Watson Postdoctoral Fellowship in mathematical sciences in 1976-7.
From 1975-1985 he was Assistant Professor and then Associate Professor of
Mathematics (with tenure) at UNC-Chapel Hill. From 1984-2002 he was a
Research Staff Member at the IBM Almaden Research Center. He is Professor
of Mathematics at UBC-Vancouver, and served as Head of Mathematics
(2002-2007). He has been Consulting Associate Professor in Electrical Engineering at Stanford University (2000-2003) and Visiting Associate Professor
in Mathematics at UC-Berkeley (1986). He has served as primary dissertation
supervisor of PhD students at UNC-Chapel Hill, UC Santa Cruz, Stanford
University and UBC. He was co-recipient (with Paul Siegel and Jack Wolf)
of the Leonard G. Abraham Prize Paper Award of the IEEE Communications
Society in 1993 and gave an invited plenary lecture at the 1995 International
Symposium on Information Theory. He is the author of more than sixty
research papers in refereed mathematics and engineering journals, as well as
a textbook, An Introduction to Symbolic Dynamics and Coding (Cambridge
University Press, 1995, co-authored with Doug Lind). He also holds twelve
US patents. He is an IEEE Fellow and served as an elected member of the
American Mathematical Society Council (2003-2006). His current research
interests include symbolic dynamics, ergodic theory, information theory and
statistical physics.
Download