Coding of Brownian Motion by Quantization of Exit Times

Felix Poloczek and Florin Ciucu
T-Labs / TU Berlin
{felix,florin}@net.t-labs.tu-berlin.de

Abstract—We present a concrete coding scheme for standard linear Brownian motion. The coding error under the sup-norm decays like O(r^{-1/2}) in the entropy r, and is thus rate-optimal. Moreover, the multiplicative constant lies below the theoretical maximum, and thus improves the known results. The scheme is based on a novel technique for quantizing the exit times of Brownian motion within a random error.

I. INTRODUCTION

A fundamental problem in information theory concerns optimal (lossless) data compression, i.e., assigning codewords to the outcomes of a random source X such that the average codeword length is minimal. This problem is completely solved by the Huffman code algorithm, which yields the minimum as the entropy H(X) ([4]). In practice, however, the original source cannot be exactly encoded if the available memory or bandwidth is limited. This is typically the case for sources with infinite entropy, like continuously distributed random variables or stochastic processes.

One important example of a source with infinite entropy is Brownian motion, which was originally intended to model the trace of a particle moving randomly in space ([15]). More recently, Brownian motion has been applied to various other scientific fields, including the analysis of stock markets and networking ([16], [17]), and has arguably become the most important object in modern probability theory ([13]).

This paper is concerned with coding Brownian motion. In recent years, the coding complexity (i.e., approximability by simpler processes) of Brownian motion and related objects has been investigated ([2], [5], [6]). The focus has been primarily on determining the asymptotic behavior of the entropy coding error ([7])

D^(e)(r | X, s) := inf || sup_{t∈[0,1]} |X_t − X̃_t| ||_{L_s},

where s > 0 is the moment under consideration and the infimum is taken over all stochastic processes X̃ such that H_e(X̃) ≤ r. (Throughout, H_e and H denote the entropy measured in nats and bits, respectively; similarly, ln := log_e and log := log_2.)

In the particular case of a Brownian motion B, an important result ([9]) is that

lim_{r→∞} D^(e)(r | B, s) · √r = κ,   (1)

where κ is a constant independent of s. A further result is that κ ∈ [π/√8, π] ([8]). These two results, however, are not accompanied by a concrete coding strategy.

In this paper we present such a concrete coding strategy B̂ for a Brownian motion B. Our strategy is not only rate-optimal, but it also reduces the interval for κ to

κ ∈ [π/√8, 2.11],

by explicitly bounding κ ≤ 2.11. (Another coding strategy was recently proposed in [1] with κ ≤ 5.19, and hence outside the interval established in [8].)

The main idea of our coding strategy is to represent B̂ in terms of piecewise constant paths whose jumping points are the exit times of B from certain intervals. The key observation is that B is bounded between two consecutive exit times, and hence the error |B_t − B̂_t| is bounded as well. We point out, however, that the exit times are continuously distributed and cannot be encoded exactly. In order to deal with this technical problem we must resort to quantized versions, i.e., discrete approximations D of the exit times X. In order to quantize random variables, a typical method is to minimize E[d(X, D)] (for an appropriate metric d), subject to constraints of the form ([11], [12])

|rng D| ≤ n,  or  H(D) ≤ H.

Unfortunately, such a method fails in our context. The reason is that the maximum of Brownian motion has an unbounded distribution in every (deterministic) interval of positive length, and hence |B_t − B̂_t| cannot be bounded (a.s.) by any deterministic quantization error.
Our crucial idea is to suitably construct a random variable Y ≥ 0 (which is also an exit time and independent of X), which plays the role of a random error. In this way, by bounding Brownian motion within a (random) interval, the supremum error of the approximation B̂ can also be bounded. In more concrete terms, given two independent random variables X and Y ≥ 0, our goal is to define a discrete random variable D with minimal finite entropy such that

X ≤ D ≤ X + Y  a.s.   (2)

The rest of this paper is organized as follows. In Section II we develop solutions for the quantization problem introduced in Eq. (2). We then use these solutions in Section III for our proposed coding strategy of Brownian motion. Finally, we present some brief conclusions in Section IV.

II. QUANTIZATION OF RANDOM VARIABLES

It is possible that no D satisfying Eq. (2) with finite entropy exists. For a quick example take X integer-valued with H(X) = ∞, and let Y be such that Y < 1 almost surely; note that D exactly encodes X and thus has infinite entropy as well. Therefore, we must enforce some conditions on Y. We will show in particular that the conditions P(Y ≤ t) ≤ βt^n for all t ≥ 0 and E[|X|] < ∞ (for some n ∈ N, β > 0) suffice for the existence of a D with finite entropy satisfying Eq. (2).

The following two scaling properties will be frequently used in the sequel:

1) For c > 0, a ∈ R the problem for (cX, cY) and (X + a, Y) is just as hard as for (X, Y) (simply take the random variables cD and a + D instead of D, which have the same entropy).
2) If P(Y_1 ≤ t) ≤ P(Y_2 ≤ t) for every t ≥ 0, the problem for (X, Y_1) is not harder than for (X, Y_2) (by the independence assumption one may assume that Y_2 ≤ Y_1 a.s., so that X ≤ D ≤ X + Y_2 ≤ X + Y_1).

The second property is crucial: if P(Y ≤ t) is too large for small values of t, then the resulting entropy of D is also large. We therefore confine ourselves to the case when this probability is bounded by

P(Y ≤ t) ≤ β_n t^n.

Among all distributions satisfying this constraint, the worst case is represented by those satisfying the above line with equality. It is sufficient to work with these worst-case distributions, which we denote for convenience by M_n^{β_n}.

In the following we first consider the simpler case when X is a bounded random variable, and then we consider the general case.

A. Bounded random variable X

In this section, in addition to Y ∼ M_n^{β_n} (for some n ∈ N and β_n > 0), we also assume that X is bounded on some interval [0, α]. Given X and Y, the main idea to construct a quantized version D of X is as follows. First estimate Y from below, i.e., find d ≤ Y. Then divide the range of X (i.e., [0, α]) into subintervals of length d and determine the one in which X lies. The right endpoint of this interval is finally chosen as the value for D. We introduce two integer-valued random variables Ŷ and X̂_Ŷ corresponding to the first two steps:

Ŷ = i ⇔ Y ∈ [2^{−i}, 2^{−i+1}),
X̂_Ŷ = j ⇔ X ∈ [(j−1)·2^{−Ŷ}, j·2^{−Ŷ}),  i, j ≥ 1.

With these random variables we can easily define D:

Definition 1: For Ŷ, X̂_Ŷ as above, define

D := X̂_Ŷ / 2^Ŷ.

D is the right endpoint of the interval in the definition of X̂_Ŷ. Some further simplifications are useful.
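Before turning to these simplifications, the following minimal Python sketch illustrates the quantizer of Definition 1 on samples. The function name and the numerical tolerance are ours, and the choice Y with P(Y ≤ t) = t^2 below is only one example of a worst-case distribution M_2^1.

```python
import numpy as np

def quantize_bounded(x, y):
    """Definition 1: return D = X_hat_Yhat / 2**Y_hat, the right endpoint of the
    dyadic cell of width 2**(-Y_hat) that contains x, where 2**(-Y_hat) <= y."""
    y_hat = max(int(np.ceil(-np.log2(y))), 1)   # Y_hat = i  <=>  y in [2^-i, 2^-i+1)
    x_hat = int(np.floor(x * 2**y_hat)) + 1     # X_hat_Yhat = j <=> x in [(j-1)2^-i, j 2^-i)
    d = x_hat / 2**y_hat
    assert x <= d <= x + y + 1e-12              # the guarantee (2): X <= D <= X + Y
    return d

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 0.5, 10_000)              # X ~ U[0, 1/2], i.e. alpha = 1/2
ys = np.sqrt(rng.uniform(0.0, 1.0, 10_000))     # P(Y <= t) = t^2, i.e. Y ~ M_2^1
vals, counts = np.unique([quantize_bounded(x, y) for x, y in zip(xs, ys)],
                         return_counts=True)
p = counts / counts.sum()
print(-(p * np.log2(p)).sum())                  # empirical H(D); cf. Lemma 2 below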
By applying the previous scaling property 1) with c = β_n^{1/n}, we obtain an equivalent problem with

P(cY ≤ t) = t^n  and  cX ∈ [0, α·β_n^{1/n}].

Further, if m ∈ Z denotes the unique integer such that 2^{m−1} < α·β_n^{1/n} ≤ 2^m, then, applying both scaling properties, the quantization problem only becomes harder when the range of X is replaced by [0, 2^m]. The advantage of this setting (i.e., X ∈ [0, 2^m] and Y ∼ M_n^1) is that it provides a "worst case" distribution for X: the entropy H(D) is maximal if X is uniformly distributed. To see this, consider the entropy of D conditioned on {Ŷ = i} (for i ≥ 1):

H(D | Ŷ = i) = H(X̂_Ŷ | Ŷ = i).

Clearly, if X is uniformly distributed, so is X̂_Ŷ | {Ŷ = i}. As the (discrete) uniform distribution maximizes the entropy among all distributions with the same (bounded) support, all the conditioned entropies H(D | Ŷ = i), and thus H(D), are maximal. Therefore, for the rest of this section, we assume Y ∼ M_n^1 and X ∼ U_{[0,2^m]}, for some n ≥ 1 and m ∈ Z.

Suppose first the special case m = −1. We calculate the (joint) distribution of the corresponding Ŷ and X̂_Ŷ. For Ŷ it clearly holds:

P(Ŷ = i) = P(Y ≤ 2^{−i+1}) − P(Y ≤ 2^{−i}) = (2^n − 1)·2^{−ni}.

Since X and Y are assumed to be independent, the conditioned random variable X̂_Ŷ | Ŷ = i is uniformly distributed on {1, ..., 2^{m+i}} and hence

P(Ŷ = i ∧ X̂_Ŷ = j) = (2^n − 1)·2^{−(m+(n+1)i)}.

As D is completely determined by the values of Ŷ and X̂_Ŷ, its entropy is bounded by H(D) ≤ H(Ŷ, X̂_Ŷ) ≤ H(Ŷ) + H(X̂_Ŷ). But as (Ŷ, X̂_Ŷ) cannot be reconstructed out of D, this bound can be improved by calculating D's distribution. The range of D consists of all the dyadic rationals in the interval (0, 2^m]. We next introduce the following useful representation. Let R_0 := {2^{−1}} and, for k ≥ 1,

R_k := {l·2^{−k−1} | l odd, l ≤ 2^k}.

Clearly, rng(D) = ∪_{k≥0} R_k and the R_k's are pairwise disjoint.

Lemma 1: Let Y ∼ M_n^1, X ∼ U_{[0,2^{−1}]}. Then

P(D = r) = (2^{n+1} − 2)/(2^{n+1} − 1)  for r = 1/2,
P(D = r) = (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k}  for r ∈ R_k, k ≥ 1.

Proof: We have for r = 2^{−1}:

P(D = 2^{−1}) = Σ_{i=1}^∞ P(D = 2^{−1}, Ŷ = i) = Σ_{i=1}^∞ (2^n − 1)·2^{1−(n+1)i} = (2^{n+1} − 2)/(2^{n+1} − 1),

and for r = l·2^{−k−1} ∈ R_k (k ≥ 1, l odd, l ≤ 2^k):

P(D = r) = Σ_{i=k+1}^∞ P(D = r, Ŷ = i) = Σ_{i=k+1}^∞ (2^n − 1)·2^{1−(n+1)i} = (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k}.

Now it is easy to compute the entropy of D:

Lemma 2: With the situation as in Lemma 1 the entropy of D is given by

H(D) = Λ_n − 1 := log((2^{n+1} − 1)/(2^{n+1} − 2)) + (n+1)·2^n / ((2^n − 1)(2^{n+1} − 1)).

Proof: We have |R_0| = 1, and |R_k| = 2^{k−1} for k ≥ 1. Therefore

H_0 := −P(D = 2^{−1}) log P(D = 2^{−1}) = (2^{n+1} − 2)/(2^{n+1} − 1) · log((2^{n+1} − 1)/(2^{n+1} − 2))

and

H_{≥1} := −Σ_{k=1}^∞ Σ_{r∈R_k} P(D = r) log P(D = r)
        = Σ_{k=1}^∞ 2^{k−1} · (2^{n+1} − 2)/(2^{n+1} − 1) · 2^{−(n+1)k} · [log((2^{n+1} − 1)/(2^{n+1} − 2)) + (n+1)k]
        = 1/(2^{n+1} − 1) · log((2^{n+1} − 1)/(2^{n+1} − 2)) + (n+1)·2^n / ((2^n − 1)(2^{n+1} − 1)),

where we used the well known fact that Σ_{i=1}^∞ i·a^{−i} = a/(a−1)^2. Clearly, H(D) = H_0 + H_{≥1} and hence

H(D) = log((2^{n+1} − 1)/(2^{n+1} − 2)) + (n+1)·2^n / ((2^n − 1)(2^{n+1} − 1)).

The result easily extends to the case m ≥ 0. Concretely, by conditioning on the events {X ∈ [(j−1)·2^{−1}, j·2^{−1})}, one obtains:

Theorem 1: Let m ≥ −1, Y ∼ M_n^1, X ∼ U_{[0,2^m]}. Then for the entropy H(D) it holds that

H(D) = Λ_n + m.
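As a quick sanity check of Lemma 1, Lemma 2 and Theorem 1 (with the constant Λ_n as reconstructed above), the following Python snippet evaluates the entropy of D directly from the probabilities in Lemma 1 and compares it with Λ_n − 1; the function names and the truncation level K are ours.

```python
import math

def Lambda(n):
    """Lambda_n as defined via Lemma 2 / Theorem 1 (H(D) = Lambda_n + m)."""
    return (math.log2((2**(n+1) - 1) / (2**(n+1) - 2))
            + (n + 1) * 2**n / ((2**n - 1) * (2**(n+1) - 1)) + 1)

def entropy_D(n, K=60):
    """Entropy of D for m = -1, summed directly from Lemma 1 (R_k truncated at K)."""
    A = (2**(n+1) - 2) / (2**(n+1) - 1)
    H = -A * math.log2(A)                       # contribution of R_0 = {1/2}
    for k in range(1, K + 1):
        p = A * 2.0 ** (-(n + 1) * k)           # each of the 2^(k-1) atoms of R_k
        H -= 2 ** (k - 1) * p * math.log2(p)
    return H

for n in (1, 2, 3):
    print(n, round(entropy_D(n), 6), round(Lambda(n) - 1, 6))  # columns should agree
```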
For the remaining case (i.e., m ≤ −2), let us remark that Definition 1 is inadequate. To see this, let m = −2 and consider two outcomes where Y ∈ [1/2, 1) (i.e., Ŷ = 1) and Y ∈ [1/4, 1/2) (i.e., Ŷ = 2). Since X ∈ [0, 2^m] = [0, 1/4], we have that X̂_Ŷ = 1 in both cases. Although the value 2^{−2} is admissible for both outcomes, Definition 1 separates them by giving D = 2^{−1} and D = 2^{−2}, respectively, which increases the entropy. To avoid this problem, we slightly modify Definition 1:

Definition 2: For m ≤ −2 and Ŷ, X̂_Ŷ as above, let

D := 2^m  if Ŷ ≤ |m|,  and  D := X̂_Ŷ / 2^Ŷ  otherwise.

Let us also introduce a new random variable B, indicating which branch in the above definition is used:

B := +1  if Ŷ ≤ |m|,  and  B := −1  otherwise.

Since P(B = −1) = 1 − P(B = +1) = 2^{nm}, the conditional entropy H(D | B) is given by

H(D | B) = (1 − 2^{nm})·H(D | B = +1) + 2^{nm}·H(D | B = −1).

By definition, D is constant on {B = +1}, and thus has entropy 0 on this set. In turn, for {B = −1}, note that for t ≤ 2^m

P(Y ≤ t | B = −1) = 2^{−mn}·P(Y ≤ t) = 2^{−mn}·t^n,

i.e., on {B = −1} it holds that X ∼ U_{[0,2^m]} and Y ∼ M_n^{2^{−mn}}. By applying the scaling properties 1) and 2), we obtain an equivalence to the case when X ∼ U_{[0,1]} and Y ∼ M_n^1. Hence, by Theorem 1:

H(D | B) = 2^{nm}·Λ_n.

Theorem 2: If m ≤ −2 the entropy of D is bounded by

H(D) ≤ 2^{nm}·(2n|m| + Λ_n).

Proof: By the monotonicity of the log-function we get

H(B) = −2^{nm} log 2^{nm} − (1 − 2^{nm}) log(1 − 2^{nm}) ≤ −2·(2^{nm} log 2^{nm}) = 2^{nm}·2n|m|.

Applying the chain rule, H(D) ≤ H(D, B) = H(B) + H(D | B) yields the result.

For large |m| this bound is quite sharp, since H(D | B) ≤ H(D) and the linear term 2n|m| has only a small influence compared to the exponentially small factor 2^{nm}.

Summarizing, for random variables X and Y such that X ∈ [0, α] and P(Y ≤ t) ≤ β_n t^n there is a solution D of the quantization problem for (X, Y). Its entropy grows linearly in m := ⌈log(α·β_n^{1/n})⌉ for m → ∞ (Theorem 1) and decays exponentially in m for m → −∞ (Theorem 2).

B. Unbounded random variable X

Here we still assume that Y ∼ M_n^1, but we now drop the condition that X is bounded. By the example above, without further conditions on X, there is in general no D with finite entropy. Thus an analogue of the uniform distribution as a worst-case distribution for X is lacking, and we cannot compute the entropy of the corresponding D directly. Our strategy consists of, first, bounding X to some interval I and, second, applying the previously developed solution to Y and X | {X ∈ I}. The entropy of D can then be estimated by the entropy of the bounding procedure for X plus the entropy as calculated in the previous section.

For a given length d > 0, we let the random variable X̂ indicate in which interval X lies, i.e.,

X̂ = n ⇔ X ∈ [nd, (n+1)d),  n ∈ Z.

Next we modify the definition of X̂_Ŷ to

X̂_Ŷ = j ⇔ X ∈ [dX̂ + (j−1)·2^{−Ŷ}, dX̂ + j·2^{−Ŷ}),

such that we can give a formal definition of D:

Definition 3: With X̂ and X̂_Ŷ as above and Ŷ as before, define

D := dX̂ + X̂_Ŷ / 2^Ŷ.

Again, D is the right endpoint of the interval in the definition of X̂_Ŷ. Applying the chain rule to D and X̂ leads to

H(D) ≤ H(X̂) + H(D | X̂).   (3)

As |X̂| ≤ |X|/d, it holds for the moments that E[|X̂|^p] ≤ E[|X|^p]/d^p. Moreover, as the random variable D | {X̂ = n} is bounded for every n, we can apply Theorem 1. So all we have to do is to estimate H(X̂).
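A minimal Python sketch of the quantizer of Definition 3 (with d = 1/2, the choice that will turn out to be optimal below); again the function name and the numerical tolerance are ours.

```python
import numpy as np

def quantize_unbounded(x, y, d=0.5):
    """Definition 3: locate x in the cell [n d, (n+1) d) of width d (this is X_hat),
    then refine within that cell exactly as in Definition 1, using a dyadic lower
    estimate 2**(-Y_hat) <= y of the random error y."""
    n_cell = int(np.floor(x / d))                         # X_hat (may be negative)
    y_hat = max(int(np.ceil(-np.log2(y))), 1)             # Y_hat, as in Definition 1
    j = int(np.floor((x - n_cell * d) * 2**y_hat)) + 1    # X_hat_Yhat within the cell
    D = n_cell * d + j / 2**y_hat
    assert x <= D <= x + y + 1e-12                        # the guarantee (2)
    return D
```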
The estimate of H(X̂) will be obtained by the use of maximum entropy distributions. As mentioned, a universal X such that H(X̂) is maximal does not exist. Therefore we consider only those X satisfying some moment constraint E[h(X)] ≤ µ and determine the distribution which maximizes the entropy in that class. Since maximum entropy distributions are hard to compute in the discrete case, we follow an approach from [10] and define a continuous random variable Ξ by

Ξ := X̂ + U,

where U ∼ U_{[0,1]} is independent of X̂. Clearly, the density of Ξ is given by f(t) = P(X̂ = ⌊t⌋) and we have

H(Ξ) = H(X̂).

It is well known (see, e.g., [14]) that the distributions maximizing the entropy in C_µ^p := {Z | E[|Z|^p] ≤ µ}, p = 1, 2, are the centered Laplacian and the normal distribution, respectively. Thus:

max{H(Z) | Z ∈ C_µ^p} = log(2eµ)  for p = 1,  and  (1/2)·log(2πeµ)  for p = 2.   (4)

A straightforward calculation gives the relationship between E[|Ξ|^p] and E[|X̂|^p]:

Lemma 3: Let X̂, Ξ be as above. Then:

E[|Ξ|] = 1/2 + E[|X̂|] ≤ 1/2 + E[|X|]/d,
E[|Ξ|^2] = 1/3 + E[|X̂|] + E[|X̂|^2] ≤ 1/3 + E[|X|]/d + E[|X|^2]/d^2.

In order to combine the preceding results with those of Section II-A we have to choose a specific value for d. It seems reasonable to set d = 2^m for some m ∈ Z, which corresponds to the m from the previous section. We additionally assume m ≥ −1, as we have exact results only for this case. Then X | {X̂ = n} is a random variable bounded in [n·2^m, (n+1)·2^m), which is, due to the scaling properties, equivalent to an X bounded in [0, 2^m]. In order to find the optimal m ≥ −1, note that by applying Eq. (3) with the entropy in Theorem 1 and the maximum entropy Eq. (4) with µ from Lemma 3, the entropy of our discrete D can be bounded by:

H(D) ≤ m + log(2e·(1/2 + E|X|/2^m)) + Λ_n   for p = 1,
H(D) ≤ m + (1/2)·log(2πe·(1/3 + E|X|/2^m + E|X|^2/4^m)) + Λ_n   for p = 2.

Replacing m by a continuous variable t ∈ [−1, ∞), it is easy to see that the derivative of the right-hand side is > 0 in both cases. Thus, it is monotonically increasing and therefore m = −1 (or equivalently d = 1/2) is always the best choice. Combining the preceding results we have:

Theorem 3: Let p = 1, 2, X ∈ L^p, and Y a random variable with P(Y ≤ t) ≤ β_n t^n, for some n ≥ 1, β_n > 0. Then there is a discrete D with X ≤ D ≤ X + Y. Its entropy is bounded by:

H(D) ≤ Λ_n + log(e·(1/2 + 2β_n^{1/n}·E|X|))   for p = 1,
H(D) ≤ Λ_n + (1/2)·log((πe/2)·(1/3 + 2β_n^{1/n}·E|X| + 4β_n^{2/n}·E|X|^2))   for p = 2.

Proof: By the scaling properties we may work with Y ∼ M_n^1 and β_n^{1/n}·X instead of the original X and Y. Let m = −1 by the remark above. By Eq. (3), now add up the entropy from Lemma 2 and the maximum entropy in Eq. (4) with the moments from Lemma 3, which concludes the proof.

For the application in Section III, the encoding of Brownian motion, the special case X ≥ 0 will be particularly important. Clearly, the corresponding maximum entropy distributions turn into their absolute values, i.e., the exponential and the truncated normal distribution. Thus, the entropy bound reduces by 1.

Theorem 4: Consider the hypotheses from Theorem 3, and restrict X ≥ 0. Then the entropy H(D) is bounded by:

H(D) ≤ Λ_n + log((e/2)·(1/2 + 2β_n^{1/n}·E|X|))   for p = 1,
H(D) ≤ Λ_n + (1/2)·log((πe/8)·(1/3 + 2β_n^{1/n}·E|X| + 4β_n^{2/n}·E|X|^2))   for p = 2.

[Fig. 1. Sample path of Brownian motion B and its approximation B̂. For simplicity the jumping points are assumed to be in the middle of the τ-interval.]

III. CODING OF BROWNIAN MOTION

Encoding Brownian motion B on [0, 1] with an error of ε is, due to the scaling property (of Brownian motion), the same as encoding B on [0, 1/ε^2] with an error of 1. We will approximate (B_t)_{t∈[0,1/ε^2]} by a càdlàg process B̂ whose paths are piecewise constant. Its jumping points will depend on the exit times of B from intervals of size 1.

Definition 4: Let (τ_n)_{n∈N} denote the exit times of Brownian motion on a grid of size 1, i.e., let τ_0 := 0 and

τ_{n+1} := inf{ t ≥ 0 : |B_{Σ_{i=1}^n τ_i + t} − B_{Σ_{i=1}^n τ_i}| = 1 },  n ≥ 0.

Further, let ξ_0 = 0 and

ξ_{n+1} := B_{Σ_{i=1}^{n+1} τ_i} − B_{Σ_{i=1}^n τ_i} = ±1,  n ≥ 0,

indicate whether the Brownian motion moves up or down at the specific time.
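The following hedged Python sketch approximates the exit-time skeleton (τ_n, ξ_n) of Definition 4 by simulating B on a fine time grid; exits are only detected at grid points and the path slightly overshoots the unit levels, so the output is an approximation whose accuracy is controlled by dt. Function and variable names are ours.

```python
import numpy as np

def exit_skeleton(T=50.0, dt=1e-4, seed=0):
    """Approximate (tau_n, xi_n) of Definition 4 on [0, T] from a discretized
    Brownian path: record the time until each unit-size exit and the exit
    direction +/-1."""
    rng = np.random.default_rng(seed)
    B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), int(T / dt)))))
    taus, xis = [], []
    level, last_exit = 0.0, 0
    for k in range(1, len(B)):
        if abs(B[k] - level) >= 1.0:
            taus.append((k - last_exit) * dt)
            xis.append(1.0 if B[k] > level else -1.0)
            level += xis[-1]      # new reference level on the unit grid (overshoot ignored)
            last_exit = k
    return np.array(taus), np.array(xis)

taus, xis = exit_skeleton()
print(len(taus), taus.mean())     # E[tau_n] = 1 for the exit of an interval of half-width 1
```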
Suppose we have constructed discrete random variables Z_i such that for every n ∈ N

Σ_{i=0}^n τ_i ≤ Σ_{i=0}^n Z_i ≤ Σ_{i=0}^n τ_i + τ_{n+1},   (5)

and their entropy is uniformly bounded by some H ≥ 0. An equivalent definition will be given below. Given these Z_i's we can directly define our process B̂:

Definition 5: With the ξ_n as in Definition 4, define the approximating process B̂ by

B̂_t := Σ_{n=0}^∞ (ξ_n + ξ_{n+1})/2 · 1_{[Σ_{i=0}^n Z_i, ∞)}(t).

In every (random) interval I^n := [Σ_{i=0}^n τ_i, Σ_{i=0}^{n+1} τ_i) at most one jump of B̂ occurs. If we let

I_l^n := [Σ_{i=0}^n τ_i, Σ_{i=0}^n Z_i)  and  I_r^n := [Σ_{i=0}^n Z_i, Σ_{i=0}^{n+1} τ_i)

denote the subintervals left and right of the jump, we have for every n ∈ N

B̂ |_{I_l^n} = (B_{Σ_{i=1}^{n−1} τ_i} + B_{Σ_{i=1}^n τ_i}) / 2,
B̂ |_{I_r^n} = (B_{Σ_{i=1}^n τ_i} + B_{Σ_{i=1}^{n+1} τ_i}) / 2,

and clearly B |_{I^n} ∈ (B_{Σ_{i=1}^n τ_i} − 1, B_{Σ_{i=1}^n τ_i} + 1), so that for the difference B − B̂ it holds:

(B_t − B̂_t) |_{I_l^n} ∈ (−1, +1) + (B_{Σ_{i=1}^n τ_i} − B_{Σ_{i=1}^{n−1} τ_i}) / 2,  and
(B_t − B̂_t) |_{I_r^n} ∈ (−1, +1) + (B_{Σ_{i=1}^n τ_i} − B_{Σ_{i=1}^{n+1} τ_i}) / 2.

Thus we have shown the following upper bound for the supremum error of our coding strategy B̂:

Lemma 4: For the coding error it holds that sup_{t≥0} |B_t − B̂_t| ≤ 3/2 almost surely, and hence for any s > 0:

|| sup_{t∈[0,1]} |B_t − B̂_t| ||_{L_s} ≤ 3/2.

In Figure 1 a sample path of Brownian motion B and the corresponding path of B̂ are shown.
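As a numerical illustration of Lemma 4, the following Python snippet builds B̂ from a discretized Brownian path, placing each jump point Z_n in the middle of the interval I^n as in Fig. 1 (one admissible choice for Eq. (5)), and reports the observed supremum error; up to discretization effects it should stay below 3/2. All names are ours.

```python
import numpy as np

def sup_error_of_Bhat(T=200.0, dt=1e-4, seed=1):
    """Monte-Carlo check of Lemma 4: construct B_hat of Definition 5 with the
    jump point Z_n in the middle of I^n and return sup_t |B_t - B_hat_t|."""
    rng = np.random.default_rng(seed)
    B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), int(T / dt)))))
    # exit skeleton of Definition 4 (grid indices of the exits and directions xi_n)
    exits, xis, level = [0], [], 0.0
    for k in range(1, len(B)):
        if abs(B[k] - level) >= 1.0:
            xis.append(1.0 if B[k] > level else -1.0)
            level += xis[-1]
            exits.append(k)
    # jumps of B_hat: at the middle of I^n, of size (xi_n + xi_{n+1}) / 2 with xi_0 = 0
    Bhat = np.zeros_like(B)
    for n in range(len(xis)):
        jump_idx = (exits[n] + exits[n + 1]) // 2
        size = 0.5 * ((xis[n - 1] if n > 0 else 0.0) + xis[n])
        Bhat[jump_idx:] += size
    K = exits[-1]                      # B_hat is fully determined up to the last exit
    return np.max(np.abs(B[:K] - Bhat[:K]))

print(sup_error_of_Bhat())             # expected: close to, and typically below, 1.5
```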
From Figure 1 we also see that the process B̂ does not necessarily jump in every interval I^n. Surely, this happens exactly when the coefficient (ξ_n + ξ_{n+1})/2 vanishes, i.e., when ξ_n = −ξ_{n+1}. If this is the case, then the coding of the corresponding random variable Z_n is in that sense dispensable, as for B̂ we only need the values of ..., Σ_{i=1}^{n−1} Z_i, Σ_{i=1}^{n+1} Z_i, ..., and not of Z_n itself. Therefore, we will define a sequence of random times (σ_n)_{n∈N} corresponding to those τ_n where B̂ does jump within the following interval. These times together with their coefficients fully determine B̂.

First, we define the integer-valued random variables N_n as follows: N_0 = 0; N_1 denotes the number of τ-intervals before B_t moves for the second time in the same direction; N_2 denotes the number of intervals before B_t moves two times in the same direction after Σ_{i=0}^{N_1} τ_i, etc. Finally, let M_n = Σ_{i=0}^n N_i be the cumulative sum of the N_n's. The motivation for this definition is that the n-th jumping point of B̂ occurs in the interval I^{M_n}, i.e., between Σ_{i=1}^{M_n} τ_i and Σ_{i=1}^{M_n+1} τ_i. Moreover, as long as the path goes up and down alternately (i.e., ξ_1 = −ξ_2 = ξ_3 = ...), the coefficients (ξ_n + ξ_{n+1})/2 vanish and thus B̂ does not jump. Additionally, two consecutive jumps of B̂ go into the same direction if and only if the corresponding N_n is odd.

Definition 6: Define, for n ∈ N, the real-valued random variables σ_n and π_n by σ_0 = 0, π_0 = τ_1 and

σ_n := Σ_{i=M_{n−1}+1}^{M_n} τ_i,  π_n := τ_{M_n+1},  n ≥ 1,

with M_n as above. Further, let

η_0 := B_{π_0}/2  and  η_n := B_{Σ_{i=0}^n σ_i + π_n} − B_{Σ_{i=0}^n σ_i},  n ≥ 1.

Next, the goal is to construct discrete random variables D_n such that for every n ∈ N:

Σ_{i=0}^n σ_i ≤ Σ_{i=0}^n D_i ≤ Σ_{i=0}^n σ_i + π_n.   (6)

With this property, B̂ can be defined equivalently as follows:

B̂_t = Σ_{n=0}^∞ η_n · 1_{[Σ_{i=0}^n D_i, ∞)}(t).

We now need to determine the distribution of σ_n and π_n. Due to the Markov property, both (τ_n)_n and (N_n)_n are i.i.d. families, and hence (σ_n)_n and (π_n)_n are i.i.d. as well. Since π_n ≤ σ_{n+1} a.s., the two families are not mutually independent. However, as π_n does not depend on B moving up or down in that interval, π_n is independent of M_n and hence, for any fixed n ∈ N, σ_n and π_n are independent.

Lemma 5: For n ≥ 1, the distribution of σ_n and π_n is the same as the first exit time of Brownian motion from the interval [−1, 2] and [−1, 1], respectively, i.e.,

P(σ_n ≤ t) = P(∃s ≤ t : B_s ∉ (−1, 2))  and  P(π_n ≤ t) = P(∃s ≤ t : B_s ∉ (−1, 1)).

Moreover, the probability P(σ_n ≤ t ∧ sgn(η_{n−1}·η_n) = +1) is equal to the probability that the exit occurs before t and takes place at −1.

Proof: Because of the identical distribution, we may assume without loss of generality that n = 1. The assertion for π_1 is evident from its definition. For σ_1, note that the exit time can be written as

inf{s ≥ 0 | B_s ∉ (−1, 2)} = Σ_{i=1}^N τ_i,

where N = min{n | B_{Σ_{i=1}^n τ_i} ∈ {−1, 2}}. But as P(N_1 = k) = 2^{−k} = P(N = k), we have N =_D N_1. For the last part, note that Brownian motion leaves (−1, 2) at −1 (respectively +2) if and only if N is odd (respectively even), and {N odd} has the same probability as {N_1 odd}. The remark above Definition 6 gives the result.

From now on, let σ =_D inf{t ≥ 0 | B_t ∉ (−1, 2)} and π =_D inf{t ≥ 0 | B_t ∉ (−1, 1)} be mutually independent copies of the exit times in Lemma 5.

Lemma 6: For σ_1, σ_2, ..., π_1, π_2, ... and η_0, η_1, ... as above, there are random variables D_1, D_2, ... such that

Σ_{i=1}^n σ_i ≤ Σ_{i=1}^n D_i ≤ Σ_{i=1}^n σ_i + π_n  for n ∈ N,

whose conditional joint entropy can be bounded by

H(D_1, ..., D_n | η_0, ..., η_n) ≤ n·H(D | B_σ),

where D satisfies σ ≤ D ≤ σ + π and is constructed as in Section II.

Proof: As π_1 and σ_1 are independent, we can construct D_1 with σ_1 ≤ D_1 ≤ σ_1 + π_1 as in Section II. Clearly H(D_1 | η_0, ..., η_n) ≤ H(D_1 | sgn η_0·η_1) ≤ H(D | B_σ) (by Lemma 5). For D_2 we stipulate that σ_1 + σ_2 ≤ D_1 + D_2 ≤ σ_1 + σ_2 + π_2, i.e.,

σ_2 − (D_1 − σ_1) ≤ D_2 ≤ σ_2 − (D_1 − σ_1) + π_2.

Again, π_2 is independent of σ_1 and σ_2 (and thus of D_1). As D_1 − σ_1 ≥ 0, we apply the scaling property to see that this quantization problem is easier than σ_2 ≤ D_2 ≤ σ_2 + π_2. Therefore, as for D_1: H(D_2 | η_0, ..., η_n) ≤ H(D_2 | sgn η_1·η_2) ≤ H(D | B_σ). Proceeding in this fashion we define the D_n such that for n ∈ N:

H(D_1, ..., D_n | η_0, ..., η_n) ≤ Σ_{i=1}^n H(D_i | η_0, ..., η_n) ≤ n·H(D | B_σ).

By the scaling property, we need to encode B̂ up to time 1/ε^2; therefore we let B̂^ε := B̂ |_{[0, ε^{−2}]}. This process depends only on finitely many D_n. We introduce the new random variable

M^ε := max{ n ∈ N | Σ_{i=0}^n σ_i ≤ ε^{−2} },

indicating that number. Note that, as our first jumping point is at D_0, the number of jumps of B̂^ε is M^ε + 1. As we may choose D_0 ≡ 0 deterministically, M^ε corresponds to the number of D_n which we actually have to encode. The entropy H(B̂^ε) is now estimated as follows:

Lemma 7: For the entropy of the coding strategy B̂^ε it holds that

H(B̂^ε) ≤ H(M^ε) + H(η_0) + [H(η_1 | η_0) + H(D | B_σ)]·E[M^ε].

Proof: We use the conditional entropy H(B̂^ε) ≤ H(B̂^ε, M^ε) = H(M^ε) + H(B̂^ε | M^ε). Further:

H(B̂^ε | M^ε) = Σ_{n=0}^∞ P(M^ε = n)·H(D_0, η_0, ..., D_n, η_n)
            = Σ_{n=0}^∞ P(M^ε = n)·[H(D_1, ..., D_n | η_0, ..., η_n) + H(η_0, ..., η_n)]
            ≤ Σ_{n=0}^∞ P(M^ε = n)·[n·H(D | B_σ) + H(η_0) + Σ_{i=1}^n H(η_i | η_{i−1})]
            = H(η_0) + [H(η_1 | η_0) + H(D | B_σ)]·Σ_{n=0}^∞ P(M^ε = n)·n
            = H(η_0) + [H(η_1 | η_0) + H(D | B_σ)]·E[M^ε].

Here we used the fact that we can choose D_0 ≡ 0 and that (η_i, η_{i−1}) =_D (η_1, 2η_0), by the Markov property.
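Lemma 5 reduces the laws of σ_n and π_n to plain exit times of fixed intervals, so the constants that enter Lemmas 8 to 11 below (E(σ) = 2, the probability 2/3 of exiting at −1, and the conditional moments of σ) can be sanity-checked by direct simulation. A rough Monte-Carlo sketch, with our own names and a coarse step dt that biases the times slightly upward:

```python
import numpy as np

def exit_of_interval(a=-1.0, b=2.0, dt=1e-3, rng=None):
    """Simulate one exit of Brownian motion from (a, b): return the (discretized)
    exit time and the boundary that was hit."""
    rng = np.random.default_rng() if rng is None else rng
    x, t = 0.0, 0.0
    while a < x < b:
        x += rng.normal(0.0, np.sqrt(dt))
        t += dt
    return t, (a if x <= a else b)

rng = np.random.default_rng(2)
samples = [exit_of_interval(rng=rng) for _ in range(2000)]
times = np.array([s[0] for s in samples])
sides = np.array([s[1] for s in samples])
print(times.mean())                    # E[sigma] = 2 (used in Lemma 8)
print((sides == -1.0).mean())          # P(exit at -1) = 2/3 (proof of Lemma 9)
print(times[sides == -1.0].mean())     # E[sigma | B_sigma = -1] = 5/3
print(times[sides == 2.0].mean())      # E[sigma | B_sigma = 2]  = 8/3
```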
" # 3 ∞ n X X n=0 ≤ P(M ε = n) nH(D | Bσ ) + H(η0 ) + H(ηi | ηi−1 ) 2 3 1 Thus, H(η1 | η0 ) = 3 log 2 + 3 log 3 < 0.92. n=0 i=1 ∞ In order to estimate H(D | Bσ ) we need the probabilities X P(π ≤ t) and some moment of σ | {Bσ = −1} and σ | = H(η0 ) + [H(η1 | η0 ) + H(D | Bσ )] P(M ε = n)n n=0 {Bσ = 2}: Lemma 10: Let π be the exit time of B of the interval = H(η0 ) + [H(η1 | η0 ) + H(D | Bσ )] E[M ε ] . (−1, 1). The following estimation holds: Here we used the fact that we can chose D0 ≡ 0 and that P(π ≤ t) ≤ β2 t2 , (ηi , ηi−1 ) =D (η1 , 2η0 ), by the Markov property. For the rest of this section we determine the constants in where this inequality and its asymptotic behavior. We begin with 4 3 3 β2 = √ ( ) 2 < 1.86 . the expectation and entropy of M ε : 2π e Lemma 8: The expected number of jumping points of B̂ Proof: Clearly, per unit interval is asymptotically equal to E(σ)−1 , i.e., P(π ≤ t) ≤P(sup{Bs | s ∈ [0, t]} ≥ 1) 1 1 lim ε2 E(M ε ) = = . +P(inf{Bs | s ∈ [0, t]} ≤ 1). ε→0 E(σ) 2 Moreover, for the entropy of this number holds lim ε2 H(M ε ) = 0 . ε→0 Proof: Introducing a time parameter t ≥ 0, the process 1 √ Nt := M t can be regarded as a renewal process with interarrival times distributed like σ. By the elementary renewal theorem E(Nt ) 1 = . t→∞ t E(σ) lim ε2 E(M ε ) = lim ε→0 For the fact that E(σ) = 2 see, e.g., [3, p. 212]. For the second result, we can define a continuous random variable Ξ as in the previous section, such that H(Ξ) = H(Nt ). Applying the maximum entropy (with p = 1) we thus have: 1 H(Nt ) ≤ log 2e(E(Nt ) + ) . 2 It then follows that lim ε2 H(M ε ) = lim ε→0 t→∞ H(Nt ) =0. t As the random variables ηn only assume two different values, its entropy is bounded by 1. In fact, as these ηn are not independent, a slightly better estimation holds: By the reflection principle for Brownian motion we thus have 1 4 1 P(π ≤ t) = 4P(Bt > 1) ≤ √ t 2 e− 2t =: γ(t)t2 . 2π The derivative of γ is given by dγ 4 − 5 − 1 1 −1 3 (t) = √ t 2 e 2t t − . dt 2 2 2π Therefore there is a unique maximum at t = 31 . Now 1 4 3 3 = √ ( ) 2 = β2 . γ 3 2π e Hence P(π ≤ t) ≤ γ(t)t2 ≤ β2 t2 . Evaluating its Laplace transforms (see, e.g., [3, p. 212]), the first two moments of σ | {Bσ = −1} and σ | {Bσ = 2} can be seen as: 5 8 E(σ | Bσ = −1) = , E(σ | Bσ = 2) = , 3 3 17 32 E((σ | Bσ = −1)2 ) = , E((σ | Bσ = 2)2 ) = . 3 3 For H(D | Bσ ), the last remaining term of the right hand side of Lemma 7 we apply the results of Section II. Lemma 11: The conditional entropy H(D | Bσ ) is bounded by: H(D | Bσ ) < 4.74 . Proof: We apply Proposition 4 with p = 1, 2, n = 2 and Lemma 10. Therefore H(D | Bσ = •) is bounded by the minimum of: e 1 √ β2 E[σ | Bσ = •]) Λ2 + log 2(2 + 2 √ πe 1 Λ2 + 12 log •] 8 ( 3 + 2 β2 E[σ | Bσ = 2 +4β2 E[(σ | Bσ = •) ]) With the moments above we obtain the values: H(D | Bσ = −1) ≤ min{4.57, 4.62} = 4.57 H(D | Bσ = +2) ≤ min{5.20, 5.06} = 5.06. By the probabilities calculated in the proof of Lemma 9: 2 1 H(D | Bσ = −1) + H(D | Bσ = +2) 3 3 ≤ 4.74. H(D | Bσ ) = Collecting results, we can now state the central result of this paper: Theorem 5: For the optimal coding error D(e) (H) of Brownian motion B on the interval [0, 1] it holds that √ κ = lim D(e) (H) · H < 2.11 . H→∞ Proof: Let ε > 0. The coding error of B̂ (and thus of B̂ ε = B̂ [0, ε12 ]) is bounded by 32 (Lemma 4). Thus, by the scaling property there are B̃ ε such that supt∈[0,1] |B − B̃ ε | ≤ 3 ε ε (e) (He (B̃ ε )) ≤ 32 ε. 
Collecting results, we can now state the central result of this paper:

Theorem 5: For the optimal coding error D^(e)(H) of Brownian motion B on the interval [0, 1] it holds that

κ = lim_{H→∞} D^(e)(H)·√H < 2.11.

Proof: Let ε > 0. The coding error of B̂ (and thus of B̂^ε = B̂ |_{[0, 1/ε^2]}) is bounded by 3/2 (Lemma 4). Thus, by the scaling property there are B̃^ε such that sup_{t∈[0,1]} |B − B̃^ε| ≤ (3/2)·ε and H(B̃^ε) = H(B̂^ε), hence D^(e)(H_e(B̃^ε)) ≤ (3/2)·ε. With the substitution H := H_e(B̃^ε) = H(B̃^ε)·ln 2 and according to Lemmas 7, 8, 9 and 11 we have:

κ = lim_{H→∞} D^(e)(H)·√H ≤ lim_{ε→0} (3/2)·ε·√(H(B̂^ε)·ln 2)
  ≤ lim_{ε→0} (3/2)·[ε^2·(H(M^ε) + H(η_0) + [H(η_1 | η_0) + H(D | B_σ)]·E[M^ε])·ln 2]^{1/2}
  = (3/2)·[0 + (1/2)·(H(η_1 | η_0) + H(D | B_σ))·ln 2]^{1/2}
  < (3/2)·√(ln 2·(0.92 + 4.74)/2) < 2.11,

where in the third step the terms ε^2·H(M^ε) and ε^2·H(η_0) vanish and ε^2·E[M^ε] → 1/2 by Lemma 8.

IV. CONCLUSIONS

We have derived upper bounds for the quantization problem of the exit times of Brownian motion, as formalized in Eq. (2). We have considered the bounded (Theorems 1 and 2), arbitrary (Theorem 3) and nonnegative (Theorem 4) cases for these exit times. These results were then applied to construct, and to determine the entropy of, a coding strategy for Brownian motion (Theorem 5). Together with the result from [9], we have shown that for the optimal constant κ from Eq. (1) it holds that κ ∈ [π/√8, 2.11]. Therefore, we have not only provided a concrete coding strategy for Brownian motion, but we have also improved the range of the multiplicative constant.

Although we could not achieve optimal results in Section II, we speculate that the multiplicative constant κ is less than 2.11. In particular, for the estimation of H(X̂), one could apply maximum entropy distributions of classes different from C_µ^p (e.g., moments of higher order, exponential moments, etc.) in order to tighten the bounds.

REFERENCES

[1] A. G. Adelmann von Adelmannsfelden. Coding of Brownian motion under supremum norm distortion. Diplomarbeit, Technische Universität Berlin, 2010.
[2] F. Aurzada and S. Dereich. The coding complexity of Lévy processes. Foundations of Computational Mathematics, 9(3):359–390, 2009.
[3] A. N. Borodin and P. Salminen. Handbook of Brownian Motion: Facts and Formulae. Birkhäuser, Basel, second edition, 2002.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.
[5] S. Dereich. The coding complexity of diffusion processes under L^p[0,1]-norm distortion. Stochastic Processes and their Applications, 118(6):938–951, 2008.
[6] S. Dereich. The coding complexity of diffusion processes under supremum norm distortion. Stochastic Processes and their Applications, 118(6):917–937, 2008.
[7] S. Dereich. Asymptotic formulae for coding problems and intermediate optimization problems: a review. Trends in Stochastic Analysis, pages 187–232, 2009.
[8] S. Dereich, F. Fehringer, A. Matoussi, and M. Scheutzow. On the link between small ball probabilities and the quantization problem for Gaussian measures on Banach spaces. Journal of Theoretical Probability, 16(1):249–265, 2003.
[9] S. Dereich and M. Scheutzow. High-resolution quantization and entropy coding for fractional Brownian motion. Electronic Journal of Probability, 11:700–722, 2006.
[10] S. Dolinar. Maximum-entropy probability distributions under Lp-norm constraints. The Telecommunications and Data Acquisition Progress Report, 42-104:74–87, 1991.
[11] S. Graf and H. Luschgy. Foundations of Quantization for Probability Distributions. Springer, Berlin, 2000.
[12] A. György and T. Linder. On the structure of optimal entropy-constrained scalar quantizers. IEEE Transactions on Information Theory, 48(2):416–427, 2002.
[13] O. Kallenberg. Foundations of Modern Probability. Springer-Verlag, New York, 1997.
[14] J. N. Kapur. Maximum Entropy Models in Science and Engineering. John Wiley & Sons, New York, 1989.
[15] P. Mörters and Y. Peres. Brownian Motion. Cambridge University Press, New York, 2010.
[16] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7:145–173, 1959.
[17] L. M. Wein. Brownian networks with discretionary routing. Operations Research, 39(2):322–340, 1990.