Coding of Brownian Motion by Quantization of Exit Times

Felix Poloczek and Florin Ciucu
T-Labs / TU Berlin
{felix,florin}@net.t-labs.tu-berlin.de
Abstract—We present a concrete coding scheme for standard linear Brownian motion. The coding error under the sup-norm decays like O(r^{-1/2}) in the entropy r, and is thus rate-optimal. Moreover, the multiplicative constant lies below the known theoretical upper bound, and thus improves the known results. The scheme is based on a novel technique for quantizing the exit times of Brownian motion within a random error.
I. INTRODUCTION
A fundamental problem in information theory concerns the
optimal (lossless) data compression, i.e., assigning codewords
to the outcomes of a random source X such that the average
codeword length is minimal. This problem is essentially solved by the Huffman coding algorithm, which attains the minimum average codeword length; this minimum lies within one bit of the entropy H(X) ([4]).
In practice, however, the original source cannot be exactly
encoded if the available memory or bandwidth is limited.
This is typically the case for sources with infinite entropy,
like continuously distributed random variables or stochastic
processes. One important example of a source with infinite
entropy is Brownian motion which was originally intended
to model the trace of a particle moving randomly in space
([15]). More recently, Brownian motion has been applied
to various other scientific fields, including the analysis of
stock markets and networking ([16], [17]), and has arguably
become the most important object in modern probability
theory ([13]). This paper is concerned with coding Brownian
motion.
In recent years, the coding complexity (i.e., approximability by simpler processes) of Brownian motion and related
objects has been investigated ([2], [5], [6]). The focus has
been primarily on determining the asymptotic behavior of
the entropy coding error ([7])

    D^{(e)}(r | X, s) := inf || sup_{t∈[0,1]} |X_t − X̃_t| ||_{L^s} ,

where s > 0 is the moment under consideration and the infimum is taken over all stochastic processes X̃ such that H_e(X̃) ≤ r.¹ In the particular case of a Brownian motion B, an important result ([9]) is that

    lim_{r→∞} D^{(e)}(r | B, s) · √r = κ ,        (1)

where κ is a constant independent of s. A further result is that κ ∈ [π/√8, π] ([8]). These two results, however, are not accompanied by a concrete coding strategy.

¹ H_e and H denote the entropy measured in nats and bits, respectively; similarly, ln := log_e and log := log_2.
In this paper we present such a concrete coding strategy B̂ for a Brownian motion B. Our strategy is not only rate-optimal, but it also reduces the interval for κ to

    κ ∈ [π/√8, 2.11] ,

by explicitly bounding κ ≤ 2.11.²

² Another coding strategy was recently proposed in [1] with κ ≤ 5.19, and hence outside the interval established in [8].
The main idea of our coding strategy is to represent B̂ in
terms of piecewise constant paths whose jumping points are
the exit times of B from certain intervals. The key observation
is that B is bounded between two consecutive exit times, and
hence the error |Bt − B̂t | is bounded as well. We point out,
however, that the exit times are continuously distributed and
cannot be encoded exactly.
In order to deal with this technical problem we must resort to quantized versions, i.e., discrete approximations D of the exit times X. In order to quantize random variables, a typical method is to minimize

    E[d(X, D)]

(for an appropriate metric d), subject to constraints of the form ([11], [12])

    |rng D| ≤ n    or    H(D) ≤ H .
Unfortunately, such a method fails in our context. The
reason is that the maximum of Brownian motion has an
unbounded distribution in every (deterministic) interval of
positive length, and hence |Bt − B̂t | cannot be bounded (a.s.)
by any deterministic quantization error. Our crucial idea is to suitably construct a random variable Y ≥ 0 (itself an exit time, and independent of X) which plays the role of a random error. In this way, by bounding Brownian motion within a (random) interval, the supremum error of the approximation B̂ can also be bounded.
In more concrete terms, given two independent random
variables X and Y ≥ 0, our goal is to define a discrete
random variable D with minimal finite entropy such that
    X ≤ D ≤ X + Y    a.s.        (2)
The rest of this paper is organized as follows. In Section II
we develop solutions for the quantization problem introduced
in Eq. (2). We then use these solutions in Section III for our
proposed coding strategy of Brownian motion. Finally, we
present some brief conclusions in Section IV.
II. QUANTIZATION OF RANDOM VARIABLES
It is possible that no D with finite entropy satisfying Eq. (2) exists. For a quick example, take X integer-valued with H(X) = ∞, and let Y be such that Y < 1 almost surely; then D determines X exactly and thus has infinite entropy as well. Therefore, we must enforce some conditions on Y. We will show in particular that the conditions

    ∀ t ≥ 0 : P(Y ≤ t) ≤ β t^n    and    E[|X|] < ∞

(for some n ∈ N, β > 0) suffice for the existence of a D with finite entropy satisfying Eq. (2).
The following two scaling properties will be frequently
used in the sequel:
1) For c > 0, a ∈ R the problem for (cX, cY ) and (X +
a, Y ) is just as hard as for (X, Y ) (simply take the
random variables cD and a + D instead of D, which
have the same entropy).
2) If P(Y1 ≤ t) ≤ P(Y2 ≤ t) for every t ≥ 0, the
problem for (X, Y1 ) is not harder than for (X, Y2 )
(by the independence assumption one may assume that
Y2 ≤ Y1 a.s., so that X ≤ D ≤ X + Y2 ≤ X + Y1 ).
The second property is crucial: if the probability P(Y ≤ t) is too large for small values of t, then the resulting entropy of D is also large. We therefore confine ourselves to the case when this probability is bounded by

    P(Y ≤ t) ≤ β_n t^n .

Among all distributions satisfying this constraint, the worst case is represented by those satisfying the above line with equality. It is thus sufficient to work with these worst-case distributions, which we denote for convenience by M^n_{β_n}.
In the following we first consider the simpler case when
X is a bounded random variable, and then we consider the
general case.
A. Bounded random variable X
In this section, in addition to Y ∼ M^n_{β_n} (for some n ∈ N and β_n > 0), we also assume that X is bounded, taking values in some interval [0, α].

Given X and Y, the main idea to construct a quantized version D of X is as follows. First, estimate Y from below, i.e., find d ≤ Y. Then divide the range of X (i.e., [0, α]) into subintervals of length d and determine the one in which X lies. The right endpoint of this interval is finally chosen as the value for D.
We introduce two integer-valued random variables Ŷ and X̂_Ŷ corresponding to the first two steps:

    Ŷ = i  ⇔  Y ∈ [2^{−i}, 2^{−i+1}) ,
    X̂_Ŷ = j  ⇔  X ∈ [(j − 1) 2^{−Ŷ}, j 2^{−Ŷ}) ,        i, j ≥ 1 .

With these random variables we can easily define D:

Definition 1: For Ŷ, X̂_Ŷ as above, define

    D := X̂_Ŷ / 2^{Ŷ} .

D is the right endpoint of the interval in the definition of X̂_Ŷ.

Some further simplifications are useful. By applying the previous scaling property 1) with c = β_n^{1/n}, we obtain an equivalent problem with

    P(cY ≤ t) = t^n    and    cX ∈ [0, α β_n^{1/n}] .

Further, if m ∈ Z denotes the unique integer such that 2^{m−1} < α β_n^{1/n} ≤ 2^m, then, by applying both scaling properties, the quantization problem only becomes harder when the range of X is replaced by [0, 2^m].
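The construction of Definition 1 is easy to make operational. The following minimal Python sketch (our own illustration, not part of the paper) computes Ŷ, X̂_Ŷ and D from realizations x and y:

```python
import math

def quantize(x: float, y: float) -> float:
    """Definition 1: quantize x within the random error y.

    Assumes x >= 0 and 0 < y <= 1 (as for Y ~ M^n_1). The returned D is the
    right endpoint of the dyadic cell of width 2**-y_hat containing x, hence
    x <= D <= x + 2**-y_hat <= x + y.
    """
    y_hat = max(1, math.ceil(-math.log2(y)))   # Y in [2**-y_hat, 2**-(y_hat - 1))
    x_hat = math.floor(x * 2**y_hat) + 1       # x in [(x_hat - 1) * 2**-y_hat, x_hat * 2**-y_hat)
    return x_hat / 2**y_hat
```

For instance, quantize(0.3, 0.3) returns 0.5, and indeed 0.3 ≤ 0.5 ≤ 0.3 + 0.3.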
The advantage of this setting (i.e., X ∈ [0, 2^m] and Y ∼ M^n_1) is that it provides a "worst case" distribution for X: the entropy H(D) is maximal if X is uniformly distributed. To see this, consider the entropy of D conditioned on {Ŷ = i} (for i ≥ 1):

    H(D | Ŷ = i) = H(X̂_Ŷ | Ŷ = i) .

Clearly, if X is uniformly distributed, so is X̂_Ŷ | {Ŷ = i}. As the (discrete) uniform distribution maximizes the entropy among all distributions with the same (bounded) support, all the conditional entropies H(D | Ŷ = i), and thus H(D), are maximal.

Therefore, for the rest of this section, we assume

    Y ∼ M^n_1    and    X ∼ U_{[0,2^m]} ,

for some n ≥ 1 and m ∈ Z.
Consider first the special case m = −1. We calculate the (joint) distribution of the corresponding Ŷ and X̂_Ŷ. For Ŷ it clearly holds that

    P(Ŷ = i) = P(Y ≤ 2^{−i+1}) − P(Y ≤ 2^{−i}) = (2^n − 1) 2^{−ni} .

Since X and Y are assumed to be independent, the conditional random variable X̂_Ŷ | Ŷ = i is uniformly distributed on {1, ..., 2^{m+i}} and hence

    P(Ŷ = i ∧ X̂_Ŷ = j) = (2^n − 1) 2^{−(m+(n+1)i)} .
As D is completely determined by the values of Ŷ and X̂_Ŷ, its entropy is bounded by H(D) ≤ H(Ŷ, X̂_Ŷ) ≤ H(Ŷ) + H(X̂_Ŷ). But as (Ŷ, X̂_Ŷ) cannot be reconstructed from D, this bound can be improved by calculating the distribution of D.

The range of D consists of all the dyadic rationals in the interval (0, 2^m]. We next introduce the following useful representation. Let R_0 := {2^{−1}} and, for k ≥ 1,

    R_k := { l 2^{−k−1} | l odd, l ≤ 2^k } .

Clearly, rng(D) = ∪_{k≥0} R_k and the R_k's are pairwise disjoint.

Lemma 1: Let Y ∼ M^n_1, X ∼ U_{[0,2^{−1}]}. Then

    P(D = r) = (2^{n+1} − 2)/(2^{n+1} − 1) ,                      r = 1/2 ,
    P(D = r) = (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k} ,        r ∈ R_k , k ≥ 1 .
Proof: We have for r = 2^{−1}:

    P(D = 2^{−1}) = Σ_{i=1}^∞ P(D = 2^{−1}, Ŷ = i) = Σ_{i=1}^∞ (2^n − 1) 2^{1−(n+1)i} = (2^{n+1} − 2)/(2^{n+1} − 1) ,

and for r = l 2^{−k−1} ∈ R_k (k ≥ 1, l odd, l ≤ 2^k):

    P(D = r) = Σ_{i=k+1}^∞ P(D = r, Ŷ = i) = Σ_{i=k+1}^∞ (2^n − 1) 2^{1−(n+1)i} = (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k} .
Now it is easy to compute the entropy of D:

Lemma 2: In the situation of Lemma 1 the entropy of D is given by

    H(D) = Λ_n − 1 := log[(2^{n+1} − 1)/(2^n − 1)] + (n + 1) 2^n / ((2^n − 1)(2^{n+1} − 1)) − 1 .

Proof: We have |R_0| = 1, and |R_k| = 2^{k−1} for k ≥ 1. Therefore

    H_0 := −P(D = 2^{−1}) log P(D = 2^{−1}) = (2^{n+1} − 2)/(2^{n+1} − 1) · log[(2^{n+1} − 1)/(2^{n+1} − 2)]

and

    H_{≥1} := − Σ_{k=1}^∞ Σ_{r∈R_k} P(D = r) log P(D = r)
            = Σ_{k=1}^∞ 2^{k−1} · (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k} · log[ (2^{n+1} − 1)/(2^{n+1} − 2) · 2^{(n+1)k} ]
            = 1/(2^{n+1} − 1) · log[(2^{n+1} − 1)/(2^{n+1} − 2)] + (n + 1) 2^n / ((2^n − 1)(2^{n+1} − 1)) ,

where we used the well-known fact that Σ_{i=1}^∞ i a^{−i} = a/(a − 1)². Clearly, H(D) = H_0 + H_{≥1} and hence

    H(D) = log[(2^{n+1} − 1)/(2^{n+1} − 2)] + (n + 1) 2^n / ((2^n − 1)(2^{n+1} − 1)) = Λ_n − 1 .
The result easily extends to the case m ≥ 0. Concretely, by conditioning on the events {X ∈ [(j − 1) 2^{−1}, j 2^{−1})}, one obtains:

Theorem 1: Let m ≥ −1, Y ∼ M^n_1, X ∼ U_{[0,2^m]}. Then for the entropy H(D) it holds that

    H(D) = Λ_n + m .
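As a quick numerical sanity check (our own, not part of the paper), the closed form of Lemma 2 can be compared with the entropy computed directly from the distribution of Lemma 1:

```python
import math

def Lambda(n: int) -> float:
    """Λ_n from Lemma 2 (in bits)."""
    return (math.log2((2**(n + 1) - 1) / (2**n - 1))
            + (n + 1) * 2**n / ((2**n - 1) * (2**(n + 1) - 1)))

def entropy_from_lemma1(n: int, K: int = 60) -> float:
    """H(D) for m = -1, summed directly from the distribution in Lemma 1
    (truncated at level K; the tail is negligible)."""
    p0 = (2**(n + 1) - 2) / (2**(n + 1) - 1)
    H = -p0 * math.log2(p0)
    for k in range(1, K):
        p = p0 * 2.0**(-(n + 1) * k)             # probability of each atom in R_k
        H += -(2**(k - 1)) * p * math.log2(p)    # |R_k| = 2**(k-1) atoms
    return H

for n in (1, 2, 3):
    print(n, entropy_from_lemma1(n), Lambda(n) - 1)   # Lemma 2: H(D) = Λ_n − 1
```

For n = 2, for instance, both columns give ≈ 0.794.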
For the remaining case (i.e., m ≤ −2), let us remark that Definition 1 is inadequate. To see this, let m = −2 and consider two outcomes where Y ∈ [1/2, 1) (i.e., Ŷ = 1) and Y ∈ [1/4, 1/2) (i.e., Ŷ = 2). Since X ∈ [0, 2^m] = [0, 1/4], we have X̂_Ŷ = 1 in both cases. Although the value 2^{−2} is feasible for both outcomes, Definition 1 separates them by giving D = 2^{−1} and D = 2^{−2}, respectively, which increases the entropy. To avoid this problem, we slightly modify Definition 1:

Definition 2: For m ≤ −2 and Ŷ, X̂_Ŷ as above, let

    D := 2^m              if Ŷ ≤ |m| ,
    D := X̂_Ŷ / 2^{Ŷ}     otherwise .
Let us also introduce a new random variable B, indicating which branch in the above definition is used:

    B := +1    if Ŷ ≤ |m| ,
    B := −1    otherwise .

Since P(B = −1) = 1 − P(B = +1) = 2^{nm}, the conditional entropy H(D | B) is given by

    H(D | B) = (1 − 2^{nm}) H(D | B = +1) + 2^{nm} H(D | B = −1) .

By definition, D is constant on {B = +1}, and thus has entropy 0 on this set. In turn, for {B = −1}, note that for t ≤ 2^m

    P(Y ≤ t | B = −1) = 2^{−mn} P(Y ≤ t) = 2^{−mn} t^n ,

i.e., on {B = −1} it holds that X ∼ U_{[0,2^m]} and Y ∼ M^n_{2^{−mn}}. By applying the scaling properties 1) and 2), we obtain an equivalence to the case where X ∼ U_{[0,1]} and Y ∼ M^n_1. Hence, by Theorem 1:

    H(D | B) = 2^{nm} Λ_n .
Theorem 2: If m ≤ −2, the entropy of D is bounded by

    H(D) ≤ 2^{nm} (2n|m| + Λ_n) .

Proof: By the monotonicity of the log-function we get

    H(B) = −2^{nm} log 2^{nm} − (1 − 2^{nm}) log(1 − 2^{nm}) ≤ −2 (2^{nm} log 2^{nm}) = 2^{nm} · 2n|m| .

Applying the chain rule, H(D) ≤ H(D, B) = H(B) + H(D | B) yields the result.

For large |m| this bound is quite sharp, since H(D | B) ≤ H(D) and the polynomial term 2n|m| has a small influence compared to the exponentially decaying factor 2^{nm}.
Summarizing, for random variables X and Y such that X ∈ [0, α] and P(Y ≤ t) ≤ β_n t^n there is a solution D of the quantization problem for (X, Y). Its entropy grows linearly in m := ⌈log(α β_n^{1/n})⌉ for m → ∞ (Theorem 1) and decays exponentially in m for m → −∞ (Theorem 2).
B. Unbounded random variable X
Here we still assume that Y ∼ M^n_1, but we now drop the
condition that X is bounded. By the example above, without
further conditions on X, there is in general no D with finite
entropy. Thus an analogue of the uniform distribution as
a worst case distribution for X is lacking, and we cannot
compute the entropy of the corresponding D directly.
Our strategy consists of, first, bounding X to some interval
I and, second, applying the previously developed solution to
Y and X | {X ∈ I}. The entropy of D can then be estimated
by the entropy of the bounding procedure for X plus the
entropy as calculated in the previous section.
For a given length d > 0, we let the random variable X̂ indicate in which interval X lies, i.e.,

    X̂ = n  ⇔  X ∈ [nd, (n + 1)d) ,        n ∈ Z .

Next we modify the definition of X̂_Ŷ to

    X̂_Ŷ = j  ⇔  X ∈ [dX̂ + (j − 1) 2^{−Ŷ}, dX̂ + j 2^{−Ŷ}) ,

such that we can give a formal definition of D:

Definition 3: With X̂ and X̂_Ŷ as above and Ŷ as before, define:

    D := dX̂ + X̂_Ŷ / 2^{Ŷ} .
Again, D is the right endpoint of the interval in the definition
of X̂Ŷ .
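A minimal sketch of this construction (our own illustration; it assumes 0 < y ≤ 1, as for Y ∼ M^n_1, and a cell length d > 0):

```python
import math

def quantize_unbounded(x: float, y: float, d: float = 0.5) -> float:
    """Definition 3: D = d*x_cell + j / 2**y_hat with cell length d.

    Works for any real x; assumes 0 < y <= 1. Guarantees x <= D <= x + y.
    """
    x_cell = math.floor(x / d)                  # X_hat: index of the length-d cell containing x
    y_hat = max(1, math.ceil(-math.log2(y)))    # Y_hat: y in [2**-y_hat, 2**-(y_hat - 1))
    offset = x - d * x_cell                     # position of x inside its cell
    j = math.floor(offset * 2**y_hat) + 1       # X_hat_Y: index of the dyadic subcell
    return d * x_cell + j / 2**y_hat            # right endpoint of that subcell
```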
Applying the chain rule to D and X̂ leads to

    H(D) ≤ H(X̂) + H(D | X̂) .        (3)

As X̂ ≤ X/d, it holds for the moments that E[|X̂|^p] ≤ E[|X|^p]/d^p. As the random variable D | X̂ = n is bounded for n ≥ 1, we can apply Theorem 1. So all we have to do is to estimate H(X̂). This will be done by the use of maximum entropy distributions. As mentioned, a universal X such that H(X̂) is maximal does not exist. Therefore we consider only those X satisfying some moment constraint E[h(X)] ≤ µ and determine the distribution which maximizes the entropy in that class.

Since maximum entropy distributions are hard to compute in the discrete case, we follow an approach from [10] and define a continuous random variable Ξ by

    Ξ := X̂ + U ,

where U ∼ U_{[0,1]} is independent of X̂. Clearly, the density of Ξ is given by f(t) = P(X̂ = ⌊t⌋), and we have H(Ξ) = H(X̂).
It is well known (see, e.g., [14]) that the distributions maximizing the entropy in C^p_µ := {Z | E[|Z|^p] ≤ µ}, p = 1, 2, are the centered Laplacian and normal distribution, respectively. Thus:

    max{ H(Z) | Z ∈ C^p_µ } = log(2eµ) ,           p = 1 ,
    max{ H(Z) | Z ∈ C^p_µ } = (1/2) log(2πeµ) ,    p = 2 .        (4)

A straightforward calculation gives the relationship between E[|Ξ|^p] and E[|X̂|^p]:
Lemma 3: Let X̂, Ξ be as above. Then:

    E[|Ξ|]  = 1/2 + E[|X̂|] ≤ 1/2 + E[|X|]/d ,
    E[|Ξ|²] = 1/3 + E[|X̂|] + E[|X̂|²] ≤ 1/3 + E[|X|]/d + E[|X|²]/d² .
In order to combine the preceding results with those of Section II-A we have to choose a specific value for d. It seems reasonable to set d = 2^m for some m ∈ Z, which corresponds to the m from the previous section. We additionally assume m ≥ −1, as we have exact results only for this case. Then X | {X̂ = n} is a random variable bounded in [n 2^m, (n + 1) 2^m), which is, due to the scaling properties, equivalent to an X bounded in [0, 2^m].

In order to find the optimal m ≥ −1, note that by applying Eq. (3) with the entropy from Theorem 1 and the maximum entropy bound Eq. (4) with µ from Lemma 3, the entropy of our discrete D can be bounded by:
    H(D) ≤ m + log( 2e (1/2 + E|X|/2^m) ) + Λ_n ,                        p = 1 ,
    H(D) ≤ m + (1/2) log( 2πe (1/3 + E|X|/2^m + E|X|²/4^m) ) + Λ_n ,     p = 2 .
Replacing m by a continuous variable t ∈ [−1, ∞), it is
easy to see that the derivative of the right hand side is > 0 in
both cases. Thus, it is monotonically increasing and therefore
m = −1 (or equivalently d = 1/2) is always the best choice.
Combining the preceding results we have:
Theorem 3: Let p = 1, 2, X ∈ L^p, and Y a random variable with P(Y ≤ t) ≤ β_n t^n, for some n ≥ 1, β_n > 0. Then there is a discrete D with X ≤ D ≤ X + Y. Its entropy is bounded by:

    H(D) ≤ Λ_n + log( e (1/2 + 2 β_n^{1/n} E|X|) ) ,                                  p = 1 ,
    H(D) ≤ Λ_n + (1/2) log( (eπ/2)(1/3 + 2 β_n^{1/n} E|X| + 4 β_n^{2/n} E|X²|) ) ,    p = 2 .
Proof: By the scaling properties we may work with Y ∼ M^n_1 and β_n^{1/n} X instead of the original X and Y. Let m = −1 by the remark above. By Eq. (3), we now add up the entropy from Lemma 2 and the maximum entropy bound in Eq. (4) with the moments from Lemma 3, which concludes the proof.
For the application in Section III, the encoding of Brownian motion, the special case X ≥ 0 will be particularly important. Clearly, the corresponding maximum entropy distributions become those of the absolute values, i.e., the exponential and the truncated normal distribution. Thus, the entropy bound reduces by 1.
Theorem 4: Consider the hypotheses of Theorem 3, and restrict X ≥ 0. Then the entropy H(D) is bounded by:

    H(D) ≤ Λ_n + log( (e/2)(1/2 + 2 β_n^{1/n} E|X|) ) ,                                p = 1 ,
    H(D) ≤ Λ_n + (1/2) log( (eπ/8)(1/3 + 2 β_n^{1/n} E|X| + 4 β_n^{2/n} E|X²|) ) ,     p = 2 .
Fig. 1. Sample path of Brownian motion B and its approximation B̂. For simplicity the jumping points are assumed to be in the middle of the τ-intervals.
III. CODING OF BROWNIAN MOTION
Encoding Brownian motion B on [0, 1] with an error of ε is, due to the scaling property (of Brownian motion), the same as encoding B on [0, 1/ε²] with an error of 1. We will approximate (B_t)_{t∈[0,1/ε²]} by a càdlàg process B̂ whose paths are piecewise constant. Its jumping points will depend on the exit times of B from intervals of size 1.
Definition 4: Let (τ_n)_{n∈N} denote the exit times of Brownian motion on a grid of size 1, i.e., let τ_0 := 0 and

    τ_{n+1} := inf{ t ≥ 0 | |B_{Σ_{i=1}^n τ_i + t} − B_{Σ_{i=1}^n τ_i}| = 1 } ,        n ≥ 0 .

Further, let ξ_0 = 0 and ξ_{n+1} := B_{Σ_{i=1}^{n+1} τ_i} − B_{Σ_{i=1}^n τ_i} = ±1 for n ≥ 0, indicating whether the Brownian motion moves up or down at the corresponding exit time.
Suppose we have constructed discrete random variables Z_i such that for every n ∈ N

    Σ_{i=0}^n τ_i ≤ Σ_{i=0}^n Z_i ≤ Σ_{i=0}^n τ_i + τ_{n+1} ,        (5)

and whose entropies are uniformly bounded by some H ≥ 0. An equivalent definition will be given below. Given these Z_i's we can directly define our process B̂:

Definition 5: With the ξ_n as in Definition 4, define the approximating process B̂ by

    B̂_t := Σ_{n=0}^∞ (ξ_n + ξ_{n+1})/2 · 1_{[Σ_{i=0}^n Z_i, ∞)}(t) .
In every (random) interval I^n := [Σ_{i=0}^n τ_i, Σ_{i=0}^{n+1} τ_i) at most one jump of B̂ occurs. If we let

    I^n_l := [ Σ_{i=0}^n τ_i , Σ_{i=0}^n Z_i )    and    I^n_r := [ Σ_{i=0}^n Z_i , Σ_{i=0}^{n+1} τ_i )

denote the subintervals left and right of the jump, we have for every n ∈ N

    B̂ | I^n_l = ( B_{Σ_{i=1}^{n−1} τ_i} + B_{Σ_{i=1}^n τ_i} ) / 2 ,
    B̂ | I^n_r = ( B_{Σ_{i=1}^n τ_i} + B_{Σ_{i=1}^{n+1} τ_i} ) / 2 ,

and clearly B | I^n ∈ ( B_{Σ_{i=1}^n τ_i} − 1 , B_{Σ_{i=1}^n τ_i} + 1 ), so that for the difference B − B̂ it holds:

    (B_t − B̂_t) | I^n_l ∈ (−1, +1) + ( B_{Σ_{i=1}^n τ_i} − B_{Σ_{i=1}^{n−1} τ_i} ) / 2 ,    and
    (B_t − B̂_t) | I^n_r ∈ (−1, +1) + ( B_{Σ_{i=1}^n τ_i} − B_{Σ_{i=1}^{n+1} τ_i} ) / 2 .

Thus we have shown the following upper bound for the supremum error of our coding strategy B̂:

Lemma 4: For the coding error it holds that sup_{t≥0} |B_t − B̂_t| ≤ 3/2 almost surely, and hence for any s > 0:

    || sup_{t∈[0,1]} |B_t − B̂_t| ||_{L^s} ≤ 3/2 .

In Figure 1 a sample path of Brownian motion B and the corresponding path of B̂ are shown. We see that the process B̂ does not necessarily jump in every interval I^n. Clearly, this happens exactly when the coefficient (ξ_n + ξ_{n+1})/2 vanishes, i.e., when ξ_n = −ξ_{n+1}. If this is the case, then the coding of the corresponding random variable Z_n is in that sense dispensable, as for B̂ we only need the values of

    ... , Σ_{i=1}^{n−1} Z_i , Σ_{i=1}^{n+1} Z_i , ...

and not of Z_n itself.
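To make the construction tangible, here is a small Monte Carlo sketch (entirely our own; the discretization step and the placement of the jump points in the middle of each τ-interval, as in Fig. 1, are assumptions of the sketch) that extracts τ_n and ξ_n from a simulated path, builds B̂, and checks the bound of Lemma 4:

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1e-4                                      # discretization step (our choice)
T = 50.0
increments = rng.normal(0.0, np.sqrt(dt), int(T / dt))
B = np.concatenate(([0.0], np.cumsum(increments)))   # approximate Brownian path

# Exit times tau_n of the unit grid and directions xi_n (Definition 4).
exit_idx, xi = [0], [0.0]
level = 0.0
for k, b in enumerate(B):
    if abs(b - level) >= 1.0:
        s = 1.0 if b > level else -1.0
        exit_idx.append(k)
        xi.append(s)
        level += s

# B_hat jumps by (xi_n + xi_{n+1})/2 somewhere inside the n-th tau-interval;
# here the jump points Z_n are placed in the middle of each interval.
B_hat = np.zeros_like(B)
levels = np.cumsum([(xi[n] + xi[n + 1]) / 2 for n in range(len(xi) - 1)])
for n in range(len(exit_idx) - 1):
    jump_at = (exit_idx[n] + exit_idx[n + 1]) // 2
    B_hat[jump_at:] = levels[n]

# Lemma 4: sup_t |B_t - B_hat_t| <= 3/2 (up to discretization error),
# checked on the time range covered by complete tau-intervals.
last = exit_idx[-1]
print(np.max(np.abs(B[:last] - B_hat[:last])))
```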
Therefore, we will define a sequence of random times (σ_n)_{n∈N} corresponding to those τ_n where B̂ does jump within the following interval. These times together with their coefficients fully determine B̂.

First, we define the integer-valued random variables N_n as follows: N_0 = 0; N_1 denotes the number of τ-intervals before B_t moves for the second time in the same direction; N_2 denotes the number of intervals before B_t moves two times in the same direction after Σ_{i=0}^{N_1} τ_i; etc. Finally, let M_n = Σ_{i=0}^n N_i be the cumulative sum of the N_n's.

The motivation for this definition is that the n-th jumping point of B̂ occurs in the interval I^{M_n}, i.e., between Σ_{i=1}^{M_n} τ_i and Σ_{i=1}^{M_n+1} τ_i. Moreover, as long as the path goes up and down alternately (i.e., ξ_1 = −ξ_2 = ξ_3 = ...), the coefficients (ξ_n + ξ_{n+1})/2 vanish and thus B̂ does not jump. Additionally, two consecutive jumps of B̂ go in the same direction if and only if the corresponding N_n is odd.
Definition 6: Define, for n ∈ N, the real-valued random variables σ_n and π_n by σ_0 = 0, π_0 = τ_1 and

    σ_n := Σ_{i=M_{n−1}+1}^{M_n} τ_i ,    π_n := τ_{M_n+1} ,        n ≥ 1 ,

with M_n as above. Further, let η_0 := B_{π_0}/2 and

    η_n := B_{Σ_{i=0}^n σ_i + π_n} − B_{Σ_{i=0}^n σ_i} ,        n ≥ 1 .

Next, the goal is to construct discrete random variables D_n such that for every n ∈ N:

    Σ_{i=0}^n σ_i ≤ Σ_{i=0}^n D_i ≤ Σ_{i=0}^n σ_i + π_n .        (6)

With this property, B̂ can be defined equivalently as follows:

    B̂_t = Σ_{n=0}^∞ η_n 1_{[Σ_{i=0}^n D_i, ∞)}(t) .

We now need to determine the distribution of σ_n and π_n. Due to the Markov property, both (τ_n)_n and (N_n)_n are i.i.d. families, and hence (σ_n)_n and (π_n)_n are i.i.d. as well. Since π_n ≤ σ_{n+1} a.s., the two families are not mutually independent. However, as π_n does not depend on whether B moves up or down in that interval, π_n is independent of M_n, and hence for any fixed n ∈ N, σ_n and π_n are independent.

Lemma 5: For n ≥ 1, the distribution of σ_n and π_n is the same as that of the first exit time of Brownian motion from the interval [−1, 2] and [−1, 1], respectively, i.e.,

    P(σ_n ≤ t) = P(∃ s ≤ t : B_s ∉ (−1, 2))    and
    P(π_n ≤ t) = P(∃ s ≤ t : B_s ∉ (−1, 1)) .

Moreover, the probability P(σ_n ≤ t ∧ sgn(η_{n−1} · η_n) = +1) is equal to the probability that the exit occurs before t and that it takes place at −1.

Proof: Because of the identical distribution, we may assume without loss of generality that n = 1. The assertion for π_1 is evident from its definition. For σ_1, note that the exit time can be written as

    inf{ s ≥ 0 | B_s ∉ (−1, 2) } = Σ_{i=1}^N τ_i ,

where N = min{ n | B_{Σ_{i=1}^n τ_i} ∈ {−1, 2} }. But as P(N_1 = k) = 2^{−k} = P(N = k), we have N =_D N_1.

For the last part, note that Brownian motion leaves (−1, 2) at −1 (respectively +2) if and only if N is odd (even), and {N is odd} corresponds to {N_1 is odd}. The remark above Definition 6 gives the result.

From now on, let σ =_D inf{ t ≥ 0 | B_t ∉ (−1, 2) } and π =_D inf{ t ≥ 0 | B_t ∉ (−1, 1) } be mutually independent copies of the exit times in Lemma 5.

Lemma 6: For σ_1, σ_2, ..., π_1, π_2, ... and η_0, η_1, ... as above, there are random variables D_1, D_2, ... such that

    Σ_{i=1}^n σ_i ≤ Σ_{i=1}^n D_i ≤ Σ_{i=1}^n σ_i + π_n        for n ∈ N ,

whose conditional joint entropy can be bounded by

    H(D_1, ..., D_n | η_0, ..., η_n) ≤ n H(D | B_σ) ,

where D satisfies σ ≤ D ≤ σ + π and is constructed as in Section II.

Proof: As π_1 and σ_1 are independent, we can construct D_1 with σ_1 ≤ D_1 ≤ σ_1 + π_1 as in Section II. Clearly H(D_1 | η_0, ..., η_n) ≤ H(D_1 | sgn η_0 · η_1) ≤ H(D | B_σ) (by Lemma 5). For D_2 we stipulate that σ_1 + σ_2 ≤ D_1 + D_2 ≤ σ_1 + σ_2 + π_2, i.e.,

    σ_2 − (D_1 − σ_1) ≤ D_2 ≤ σ_2 − (D_1 − σ_1) + π_2 .

Again, π_2 is independent of σ_1 and σ_2 (and thus of D_1). As D_1 − σ_1 ≥ 0, we apply the scaling property to see that this quantization problem is easier than

    σ_2 ≤ D_2 ≤ σ_2 + π_2 .

Therefore, as for D_1: H(D_2 | η_0, ..., η_n) ≤ H(D_2 | sgn η_1 · η_2) ≤ H(D | B_σ). Proceeding in this fashion we define the D_n, such that for n ∈ N:

    H(D_1, ..., D_n | η_0, ..., η_n) ≤ Σ_{i=1}^n H(D_i | η_0, ..., η_n) ≤ n H(D | B_σ) .
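A sketch of this sequential construction (our own; `quantize` is any routine, such as the hypothetical helper sketched in Section II, that returns some D with x ≤ D ≤ x + y):

```python
def construct_Ds(sigmas, pis, quantize):
    """Build D_1, ..., D_n so that every partial sum satisfies
    sum(sigma_i) <= sum(D_i) <= sum(sigma_i) + pi_k (cf. Lemma 6)."""
    Ds, slack = [], 0.0                         # slack = sum(D_i - sigma_i) so far, stays in [0, pi]
    for sigma, pi in zip(sigmas, pis):
        d = quantize(sigma - slack, pi)         # quantize the shifted exit time within error pi
        Ds.append(d)
        slack = slack + d - sigma               # new slack again lies in [0, pi]
    return Ds
```

Note that sigma - slack is nonnegative, since the previous slack is at most the previous pi, which in turn is at most the current sigma (as remarked above, π_n ≤ σ_{n+1} a.s.).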
By the scaling property, we need to encode B̂ up to time 1/ε²; therefore we let B̂^ε := B̂ | [0, ε^{−2}]. This process depends only on finitely many D_n. We introduce the new random variable

    M^ε := max{ n ∈ N | Σ_{i=0}^n σ_i ≤ ε^{−2} } ,

indicating that number.
Note that, as our first jumping point is at D_0, the number of jumps of B̂^ε is M^ε + 1. As we may choose D_0 ≡ 0 deterministically, M^ε corresponds to the number of D_n which we actually have to encode. The entropy H(B̂^ε) is now estimated as follows:

Lemma 7: For the entropy of the coding strategy B̂^ε it holds that

    H(B̂^ε) ≤ H(M^ε) + H(η_0) + [H(η_1 | η_0) + H(D | B_σ)] E[M^ε] .
Proof: We use the conditional entropy H(B̂^ε) ≤ H(B̂^ε, M^ε) = H(M^ε) + H(B̂^ε | M^ε). Further:

    H(B̂^ε | M^ε) = Σ_{n=0}^∞ P(M^ε = n) H(D_0, η_0, ..., D_n, η_n)
                 = Σ_{n=0}^∞ P(M^ε = n) [ H(D_1, ..., D_n | η_0, ..., η_n) + H(η_0, ..., η_n) ]
                 ≤ Σ_{n=0}^∞ P(M^ε = n) [ n H(D | B_σ) + H(η_0) + Σ_{i=1}^n H(η_i | η_{i−1}) ]
                 = H(η_0) + [H(η_1 | η_0) + H(D | B_σ)] Σ_{n=0}^∞ P(M^ε = n) n
                 = H(η_0) + [H(η_1 | η_0) + H(D | B_σ)] E[M^ε] .

Here we used the fact that we can choose D_0 ≡ 0 and that (η_i, η_{i−1}) =_D (η_1, 2η_0), by the Markov property.

For the rest of this section we determine the constants in this inequality and its asymptotic behavior. We begin with the expectation and entropy of M^ε:

Lemma 8: The expected number of jumping points of B̂ per unit interval is asymptotically equal to E(σ)^{−1}, i.e.,

    lim_{ε→0} ε² E(M^ε) = 1/E(σ) = 1/2 .

Moreover, for the entropy of this number it holds that

    lim_{ε→0} ε² H(M^ε) = 0 .

Proof: Introducing a time parameter t ≥ 0, the process N_t := M^{1/√t} can be regarded as a renewal process with interarrival times distributed like σ. By the elementary renewal theorem

    lim_{ε→0} ε² E(M^ε) = lim_{t→∞} E(N_t)/t = 1/E(σ) .

For the fact that E(σ) = 2 see, e.g., [3, p. 212]. For the second result, we can define a continuous random variable Ξ as in the previous section, such that H(Ξ) = H(N_t). Applying the maximum entropy bound (with p = 1) we thus have:

    H(N_t) ≤ log 2e(E(N_t) + 1/2) .

It then follows that

    lim_{ε→0} ε² H(M^ε) = lim_{t→∞} H(N_t)/t = 0 .

As the random variables η_n only assume two different values, their entropy is bounded by 1. In fact, as these η_n are not independent, a slightly better estimate holds:

Lemma 9: For the (conditional) entropies H(η_0) and H(η_1 | η_0) it holds that

    H(η_0) = 1    and    H(η_1 | η_0) < 0.92 .

Proof: By symmetry of Brownian motion, P(η_0 = ±1/2) = 1/2, and thus H(η_0) = 1. By the remark above Definition 6:

    P(η_1 = −1 | η_0 = −1/2) = P(η_1 = 1 | η_0 = 1/2) = P(N_1 odd) = Σ_{n=0}^∞ 2^{−(2n+1)} = 2/3 .

Thus, H(η_1 | η_0) = (2/3) log(3/2) + (1/3) log 3 < 0.92.

In order to estimate H(D | B_σ) we need the probabilities P(π ≤ t) and some moments of σ | {B_σ = −1} and σ | {B_σ = 2}:

Lemma 10: Let π be the exit time of B from the interval (−1, 1). The following estimate holds:

    P(π ≤ t) ≤ β_2 t² ,

where

    β_2 = (4/√(2π)) (3/e)^{3/2} < 1.86 .

Proof: Clearly,

    P(π ≤ t) ≤ P(sup{B_s | s ∈ [0, t]} ≥ 1) + P(inf{B_s | s ∈ [0, t]} ≤ −1) .

By the reflection principle for Brownian motion we thus have

    P(π ≤ t) ≤ 4 P(B_t > 1) ≤ (4/√(2π)) t^{1/2} e^{−1/(2t)} =: γ(t) t² .

The derivative of γ is given by

    dγ/dt (t) = (4/√(2π)) t^{−5/2} e^{−1/(2t)} ( (1/2) t^{−1} − 3/2 ) .

Therefore there is a unique maximum at t = 1/3. Now

    γ(1/3) = (4/√(2π)) (3/e)^{3/2} = β_2 .

Hence P(π ≤ t) ≤ γ(t) t² ≤ β_2 t².

Evaluating the Laplace transforms of the exit times (see, e.g., [3, p. 212]), the first two moments of σ | {B_σ = −1} and σ | {B_σ = 2} can be seen to be:

    E(σ | B_σ = −1) = 5/3 ,        E(σ | B_σ = 2) = 8/3 ,
    E((σ | B_σ = −1)²) = 17/3 ,    E((σ | B_σ = 2)²) = 32/3 .
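The constants of Lemmas 9 and 10 are elementary to evaluate numerically (a quick check of our own):

```python
import math
H_eta = (2/3) * math.log2(3/2) + (1/3) * math.log2(3)       # Lemma 9: ≈ 0.918 < 0.92
beta2 = 4 / math.sqrt(2 * math.pi) * (3 / math.e) ** 1.5     # Lemma 10: ≈ 1.850 < 1.86
print(H_eta, beta2)
```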
For H(D | B_σ), the last remaining term on the right hand side of Lemma 7, we apply the results of Section II.

Lemma 11: The conditional entropy H(D | B_σ) is bounded by:

    H(D | B_σ) < 4.74 .

Proof: We apply Theorem 4 with p = 1, 2, n = 2 and Lemma 10. Therefore H(D | B_σ = •) is bounded by the minimum of

    Λ_2 + log( (e/2)(1/2 + 2 √β_2 E[σ | B_σ = •]) ) ,
    Λ_2 + (1/2) log( (πe/8)(1/3 + 2 √β_2 E[σ | B_σ = •] + 4 β_2 E[(σ | B_σ = •)²]) ) .

With the moments above we obtain the values:

    H(D | B_σ = −1) ≤ min{4.57, 4.62} = 4.57 ,
    H(D | B_σ = +2) ≤ min{5.20, 5.06} = 5.06 .

By the probabilities calculated in the proof of Lemma 9:

    H(D | B_σ) = (2/3) H(D | B_σ = −1) + (1/3) H(D | B_σ = +2) ≤ 4.74 .
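These numerical values can be reproduced directly from Theorem 4 with n = 2 and the moments of Lemma 10 (a sketch of our own; the helper names are ours):

```python
import math

Lambda2 = math.log2(7 / 3) + 3 * 4 / (3 * 7)                  # Λ_2 from Lemma 2
beta2 = 4 / math.sqrt(2 * math.pi) * (3 / math.e) ** 1.5      # β_2 from Lemma 10

def bound_p1(m1):
    """Theorem 4, p = 1, with n = 2 and E[σ | Bσ = •] = m1."""
    return Lambda2 + math.log2(math.e / 2 * (1/2 + 2 * math.sqrt(beta2) * m1))

def bound_p2(m1, m2):
    """Theorem 4, p = 2, with first and second conditional moments m1, m2."""
    return Lambda2 + 0.5 * math.log2(math.e * math.pi / 8 *
                                     (1/3 + 2 * math.sqrt(beta2) * m1 + 4 * beta2 * m2))

h_minus = min(bound_p1(5/3), bound_p2(5/3, 17/3))     # Bσ = −1: min{4.57, 4.62}
h_plus  = min(bound_p1(8/3), bound_p2(8/3, 32/3))     # Bσ = +2: min{5.20, 5.06}
print(h_minus, h_plus, 2/3 * h_minus + 1/3 * h_plus)  # ≈ 4.57, 5.06, 4.73 < 4.74
```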
Collecting results, we can now state the central result of this paper:

Theorem 5: For the optimal coding error D^{(e)}(H) of Brownian motion B on the interval [0, 1] it holds that

    κ = lim_{H→∞} D^{(e)}(H) · √H < 2.11 .

Proof: Let ε > 0. The coding error of B̂ (and thus of B̂^ε = B̂ | [0, 1/ε²]) is bounded by 3/2 (Lemma 4). Thus, by the scaling property there are B̃^ε such that sup_{t∈[0,1]} |B − B̃^ε| ≤ (3/2)ε and H(B̃^ε) = H(B̂^ε), hence D^{(e)}(H_e(B̃^ε)) ≤ (3/2)ε. With the substitution

    H := H_e(B̃^ε) = H(B̃^ε) ln 2

and according to Lemmas 7, 8, 9 and 11 we have:

    κ = lim_{H→∞} D^{(e)}(H) · √H ≤ lim_{ε→0} (3/2) ε √( H(B̂^ε) ln 2 )
      ≤ lim_{ε→0} (3/2) [ ε² ( H(M^ε) + H(η_0) + [H(η_1 | η_0) + H(D | B_σ)] E[M^ε] ) ln 2 ]^{1/2}
      = (3/2) [ ( 0 + [H(η_1 | η_0) + H(D | B_σ)] · (1/2) ) ln 2 ]^{1/2}
      < (3/2) √( (0.92 + 4.74)/2 · ln 2 ) < 2.11 .
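Finally, the last numerical step can be checked in one line (using the bounds 0.92 and 4.74 from Lemmas 9 and 11):

```python
import math
print(1.5 * math.sqrt((0.92 + 4.74) / 2 * math.log(2)))   # ≈ 2.101 < 2.11
```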
IV. CONCLUSIONS
We have derived upper bounds for the quantization problem of the exit times of Brownian motion, as formalized in
Eq. (2). We have considered the bounded (Theorems 1 and 2),
arbitrary (Theorem 3) and nonnegative (Theorem 4) cases for
these exit times. These results were then applied to construct
and determine the entropy of a coding strategy for Brownian
motion (Theorem 5). Together with the result from [9], we
have showed that for the optimal constant κ from Eq. (1),
it holds that κ ∈ [π/√8, 2.11]. Therefore, we have not only
provided a concrete coding strategy for Brownian motion,
but we have also improved the range of the multiplicative
constant.
Although we could not achieve optimal results in Section
II, we speculate that the multiplicative constant κ is less than
2.11. In particular, for the estimation of H(X̂), one could apply maximum entropy distributions of classes different from
Cµp (e.g., moments of higher order, exponential moments, etc.)
in order to tighten the bounds.
REFERENCES
[1] A. G. Adelmann von Adelmannsfelden. Coding of Brownian motion
under supremum norm distortion. Diplomarbeit, Technische Universität Berlin, 2010.
[2] F. Aurzada and S. Dereich. The coding complexity of Lévy processes.
Foundations of Computational Mathematics, 9(3):359–390, 2009.
[3] A. N. Borodin and P. Salminen. Handbook and Brownian Motion Facts and Formulae. Birkhäuser, Basel, second edition, 2002.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. John
Wiley & Sons, New York, 1991.
[5] S. Dereich. The coding complexity of diffusion processes under
Lp[0,1]-norm distortion. Stochastic Processes and their Applications,
118(6):938–951, 2008.
[6] S. Dereich. The coding complexity of diffusion processes under supremum norm distortion. Stochastic Processes and their Applications,
118(6):917–937, 2008.
[7] S. Dereich. Asymptotic formulae for coding problems and intermediate
optimization problems: a review. Trends in Stochastic Analysis, pages
187–232, 2009.
[8] S. Dereich, F. Fehringer, A. Matoussi, and M. Scheutzow. On the
link between small ball probabilities and the quantization problem
for Gaussian measures on Banach spaces. Journal of Theoretical
Probability, 16(1):249–265, 2003.
[9] S. Dereich and M. Scheutzow. High-resolution quantization and
entropy coding for fractional Brownian motion. Electronic Journal
of Probability, 11:700–722, 2006.
[10] S. Dolinar. Maximum-entropy probability distributions under Lp-norm
constraints. The Telecommunications and Data Acquisition Progress
Report, 42-104:74–87, 1991.
[11] S. Graf and H. Luschgy. Foundations of Quantization for Probability
Distributions. Springer, Berlin, 2000.
[12] A. György and T. Linder. On the structure of optimal entropy-constrained scalar quantizers. IEEE Transactions on Information
Theory, 48(2):416–427, 2002.
[13] O. Kallenberg. Foundations of Modern Probability. Probability and its
Applications (New York). Springer-Verlag, New York, 1997.
[14] J. N. Kapur. Maximum Entropy Models in Science and Engineering.
John Wiley & Sons, New York, 1989.
[15] P. Mörters and Y. Peres. Brownian Motion. Cambridge University
Press, New York, 2010.
[16] M. F. M. Osborne. Brownian motion in the stock market. Operations
Research, 7:145–173, 1959.
[17] L. M. Wein. Brownian networks with discretionary routing. Operations
Research, 39(2):322–340, 1990.