Proof of Cesaro Means In order to show that our alternative definition of entropy rate: H (X) = lim H Xn | X1 , . . . , Xn−1 n→∞ is equivalent to the canonical definition 1 H(X1 , . . . , Xn ) n→∞ n H(X) = lim when the stochastic process is stationary, we used the following theorem Theorem Cesaro Means: Let an → a, let bn = n−1 ni=1 ai , then lim bn = a. n→∞ Proof Recall the meaning of limn→∞ bn = a: For every δ > 0 there exists a nδ such that, for every n > nd elta, bn −a < δ. Now, since limn→∞ an = a, we know that ∀, ∃n such that ∀n > n , |an −a| < . Choose δ = 2. Fix, , determine n , let n >> n , and look at bn − a. n −1 ai − a |bn − a| = n i=1 n n −1 = n ai − a n i=1 n −1 = n −na + ai i=1 n −1 = n (ai − a) i=1 1 Now divide the sum on the right hand side into two parts: the first is the sum over the indexes between 1 and n , the second is the over the remaining terms n 1 (ai − a) |bn − a| = n i=1 n 1 ≤ (ai − a) (1) n i=1 n 1 (2) + (ai − a) n i=n +1 We are now going to bound the two sums (1) and (2). First we bound (1) n n 1 1 (ai − a) ≤ |ai − a| n n i=1 i=1 1 n ≤ max |aj − a| n i=1 j=1 n 1 1 = max |aj − a| j=1 n i=1 n n = max |aj − a| j=1 n n n and, if we pick n satisfying |aj − a| n maxnj=1 n> , then sum (1) satisfies n 1 (ai − a) < n i=1 Now we bound (2) using a similar trick n n 1 1 (ai − a) ≤ |ai − a| n n i=n +1 i=n +1 2 (3) (4) n 1 n ≤ max |ai − a| n i=n +1 i=n +1 n 1 1 = max |ai − a| i=n +1 n i=n +1 n n − n n = max |ai − a| i=n +1 n n ≤ max |ai − a| i=n +1 ≤ (5) where Inequality 5 is a consequence of how we selected n . We have therefore shown that, if we fix δ, let = δ/2, determine n , there n maxn |aj −a| such that, for all n > nδ , |bn − a| < δ; exists a nδ = j=1 In other words, we have shown that limn→∞ bn = a. 3