>> Yuval Peres: Good afternoon. So at Russell's request, or suggestion, I researched various proofs of the Ergodic Theorem and the Subadditive Ergodic Theorem. So these are classical theorems, going back almost a hundred years in the case of the Ergodic Theorem and to the seventies for the Subadditive Ergodic Theorem. The original proofs, you know, were quite hard. Since then many shorter and simpler proofs were found. Some of the shortest proofs, using the Maximal Ergodic Theorem, are not so intuitive, so I want to present proofs that are perhaps more intuitive, following the works of Kamae, Benji Weiss, Katznelson, and Mike Keane and Mike Steele. So it'll be a product of these. So the Ergodic Theorem of Birkhoff can be stated in a measure-theoretic form or a probabilistic form. In the probabilistic form we have a stationary ergodic process, Xj. All right, so this process is stationary, meaning that the law of any sequence, say, X1, X2, up to Xn is the same as the law of the shift X2, X3, up to Xn plus 1. And this is true for all n, and it follows from this shift invariance that you can shift by more than one. In other words, if we look at the transformation T that sends the sequence Xj to the sequence Xj plus 1, this transformation is measure-preserving. And our measure is a probability measure. And then we'll also assume -- this assumption can be removed, but in the applications often the process is ergodic -- so we assume that the process is ergodic, which is a statement about invariant events or invariant functions. So if F is any function of the process which is measurable -- okay, I'll write measurable -- and satisfies this shift invariance, so F is pointwise equal, not just in distribution but pointwise equal, to its composition with T, then F is constant almost surely. And this assumption is satisfied in many cases. In particular this holds in the independent case, but in many more cases.
So if the Xj are independent then this invariance certainly implies that F is constant. Okay, so in that setting the Ergodic Theorem would just say that if we look at the partial sum -- So let me write more generally. So the partial sum of size L starting at K, Sl of K, will be the sum of Xj plus K when J ranges from zero to L minus 1. Okay, so we'll be interested in the partial sums from various places, and then we have the normalized partial sum, or the average: Al of K is 1 over L times Sl of K. And then the theorem states that An of K for any K -- but it's enough to talk about An of 1 -- converges to the mean, [inaudible] of X1; of course by stationarity they all have the same mean. And this converges almost surely. So that's the Birkhoff Ergodic Theorem. Later we'll discuss the subadditive version. Okay. So, all right. Any questions on this statement? Okay, so there are really two parts to the statement. One is that this actually converges, which is maybe the [inaudible] the harder part, and then identifying that the limit is, as you expect, the mean of X1. Any questions or something unclear?
>> : So we've got to get the property of the distribution of the Xj?
>> Yuval Peres: It's a property of the distribution of the Xj and of the transformation T. Okay, so I'm worried about -- What this means: instead of invariant functions you can think just of invariant events, where an event B is invariant if T inverse of B equals B. And it's equivalent to just assume that all invariant events have probability zero or one. So that's another form of ergodicity. And yet another form is that, you know, for any two sets of positive measure, if you apply T enough times to one it will intersect the other in positive measure. So you cannot have two sets of positive measure that don't see each other when you apply T. Okay, so that's about the statement.
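For reference, the statement just given can be written out in symbols. This is only a transcription of the spoken definitions; the integrability condition is the one the speaker adds a little later in the talk, and T denotes the shift.

```latex
% Partial sums and averages of length \ell starting at k
S_\ell(k) = \sum_{j=0}^{\ell-1} X_{j+k}, \qquad A_\ell(k) = \frac{1}{\ell}\, S_\ell(k).

% Birkhoff's Ergodic Theorem (probabilistic form):
% if (X_j) is stationary and ergodic with E|X_1| < \infty, then
\lim_{n\to\infty} A_n(1) \;=\; \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} X_j \;=\; \mathbb{E}[X_1]
\quad \text{almost surely.}
```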
So I said there are several short proofs; I want to give the one that is most intuitive and also generalizes to the subadditive case. So as I said, this proof starts with [inaudible] of Teturo Kamae, who wrote the proof in nonstandard analysis. Then this was simplified by Katznelson and Weiss and then by Keane, and it's basically his version that I'll show today. Okay, so first I want to say that it's enough to consider positive variables. So we may assume all the Xj are non-negative, because in general you just separate into the positive and the negative parts and prove it separately. All right? So you just write X as a difference of a positive and a negative part and work with each one separately and get the result. Everything here is kind of linear, so once we prove it for the positive part and we prove it for the negative part, we can just subtract and get it. So it's enough to [inaudible] with this case.
>> : Yuval?
>> Yuval Peres: Yes.
>> : [Inaudible] An I is the same as An [inaudible]?
>> Yuval Peres: No. The limit is the same. But, again, An of 1 is the limit -- An, right? We're summing -- These are partial sums from different locations, right? So...
>> : But there's a shift invariance so doesn't...
>> Yuval Peres: So the shift invariance is of the distributions. Right? So X1 is not equal to X2, just its distribution is the same. So A1 is definitely not A2, but it has the same distribution. Right. Okay? And also it's easy to see that they must have the same limit. So I wrote here An of 1, but it applies to An of K for any K. So this is a limit, right, as N [inaudible] to infinity. Okay. All right. But, okay, so look at the lim sup of the An's and -- Okay, as I said, I'm going to [inaudible] proof as intuitive as possible, which is not exactly the same as as short as possible. So I want to first consider an easy case which will then generalize.
So, okay, one more definition before that. So we have the lim sup, A bar. We fix an alpha which is less than the lim sup. By the way, this lim sup, we a priori don't know that it's finite at all. So maybe I should have added the assumption. So we have stationary ergodic, and I want to assume that it's integrable. So assume that the expectation of the variables is finite. So let's add that, to make this statement meaningful. Okay, so we look at this lim sup, which a priori we don't know is finite. So we know it's, you know, between zero and infinity; it might at this stage be infinite, but we'll prove it's finite. If we were talking about the lim inf, it would be immediate that it's finite [inaudible]. But we're talking about the lim sup, so there's something still to prove. Okay. So fix alpha which is less than the lim sup, and then let's define L of K. These are random variables: L of K is the first time that the averages starting at K exceed alpha, so the first L so that Al of K exceeds alpha. Okay, alpha is certainly a finite number. So by the definition of lim sup this number L of K is certainly finite, but it might be very large. We might have to wait for long. So case one is when these numbers are uniformly bounded. So this is very special. You know, it happens for some ergodic sequences, like periodic sequences, but it will be a good warm-up for the general case. So suppose that these L of K's are uniformly bounded by some L. This is just a special, atypical case, and we'll see what to do in this case and then generalize from that. So in this case we just take the partial sum and write it. And so basically the idea is to take the interval from 1 to N and cover it by intervals where the averages exceed alpha. So we know there is some interval here where the average exceeds alpha, and the length of this interval -- right, so this is L of 1 -- is at most L. And then we find another interval where the average exceeds alpha, and so on.
And we know that all these intervals are no longer than L. So we stop the first time we cross N minus L, and use this decomposition of the sum. Let's write it now more formally. So I'm going to say that the sum of Xj, J from 1 to N, is going to be bigger than the sum, I from 1 to some M, of partial sums. So these will be partial sums starting from some number -- Okay, let's write. These are S with length L of Ki, starting at Ki. Okay, so -- And I'll write some more and then explain. This is going to be bigger than the sum of L of Ki times alpha, which will be bigger than N minus L times alpha. And here each time we choose Ki as follows: we start with K1 equal to 1, and Ki plus 1 will just be Ki plus L of Ki. Right, so we start at one, we find an interval where the partial sum is large, then we just go to the next. Right? So we find this interval, then we go to the next point, find another good interval, and so on. So all these partial sums are larger than their length times alpha, right, because the average is greater than alpha. So we get this inequality, and the total length of all these intervals will exceed N minus L because we just keep going as long as we can. And we just have to stop once we exceed N minus L; it's possible that the next interval will overshoot N, so we stop at that point. We don't take the next interval. Okay? Now this inequality is, you know, completely obvious after you've seen it and maybe confusing the first time you see it, so please stop me. So apologies to those for whom it's trivial and apologies to those for whom it's confusing. But if anybody from the second group wants to ask something, please do, because this is really the crux of the whole matter. So --.
>> : You probably should've mentioned that A bar, that it doesn't depend on L.
>> Yuval Peres: Yes.
>> : That [inaudible].
>> Yuval Peres: That's important. Thank you. So I should've mentioned that A bar is a constant. So thank you. That's right. So A bar, this lim sup -- Thanks.
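The covering argument for the bounded case can be tried out numerically. The following is a minimal sketch, not the speaker's construction verbatim: the function names `first_good_length` and `greedy_cover` are hypothetical, indices are 0-based (so the talk's stopping rule "stop once we cross N minus L" becomes `k <= n - max_len`), and the sequence is assumed non-negative with every waiting time L(k) at most `max_len`.

```python
from typing import List, Optional, Tuple


def first_good_length(x: List[float], k: int, alpha: float, max_len: int) -> Optional[int]:
    """Smallest l <= max_len such that the average of x[k:k+l] exceeds alpha.

    Plays the role of L(k) in the talk (0-based start k). Returns None if no
    such l exists within max_len terms or before the sequence ends.
    """
    total = 0.0
    for l in range(1, max_len + 1):
        if k + l > len(x):
            return None
        total += x[k + l - 1]
        if total > alpha * l:
            return l
    return None


def greedy_cover(x: List[float], alpha: float, max_len: int) -> List[Tuple[int, int]]:
    """Greedily tile [0, n) from the left with intervals whose average exceeds alpha.

    Returns (start, length) pairs. We keep going while an interval of length
    max_len is still guaranteed to fit, so the covered length exceeds n - max_len.
    """
    n = len(x)
    intervals = []
    k = 0
    while k <= n - max_len:
        l = first_good_length(x, k, alpha, max_len)
        if l is None:
            # Assumption violated: the waiting times are not bounded by max_len.
            raise ValueError(f"no good interval of length <= {max_len} at {k}")
        intervals.append((k, l))
        k += l
    return intervals


# Periodic example (hypothetical data): averages tend to 1, so any alpha < 1 works.
x = [0.0, 0.0, 3.0] * 10
alpha, max_len = 0.9, 3
intervals = greedy_cover(x, alpha, max_len)
covered = sum(l for _, l in intervals)
# Key inequality of the bounded case: sum of x >= (n - L) * alpha.
assert sum(x) >= alpha * covered >= alpha * (len(x) - max_len)
```

The non-negativity of the terms is what allows the uncovered tail (fewer than `max_len` positions) to be dropped without hurting the lower bound.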
So maybe at this point I should mention -- So observe that A bar equals A bar composed with T. This is related to [Inaudible]'s comment. So if we start the partial sums from 1 or from 2, it not only has the same distribution, but pointwise the partial sums only differ by the first variable. So when we divide by N and take a limit or a lim sup, it doesn't matter. So pointwise, the lim sup of the sequence starting from the first or the lim sup starting from the second, when we divide by N, clearly, easy to see, it's the same lim sup. So we have this invariance, which means, because we're working in the ergodic case, that A bar is a constant, almost surely. Okay, so thanks [inaudible]. I should've commented on that.
>> : [Inaudible].
>> Yuval Peres: Yes?
>> : It's not just the first term [inaudible] the last term [inaudible].
>> Yuval Peres: Right. Yeah, so we want it only to differ in the first term, so maybe I'll say this. So An of 1, you know, equals An minus 1 of 2. [Inaudible]. Okay, Sn of 1 equals Sn minus 1 of 2 plus X1. Okay? So that's better because, you know, the last term we don't control well. All right, so now divide: An of 1 equals N minus 1 over N times An minus 1 of 2, plus X1 over N. And now it's safe to take lim sups. Okay, so...
>> : [Inaudible]?
>> Yuval Peres: This is to formally justify this. Okay? So what [inaudible] pointed out is that if we just work with averages of length N, then the partial sums of length N from 1 and from 2 differ by a last element, which kind of varies in time. So we don't control it well. We could, but it's easier just to use this identity, which compares the partial sum of N terms to the partial sum of N minus 1 terms. And now once we have it in this form, we really can take the lim sup. And see, this term goes to zero, this factor doesn't matter, and so we indeed derive this identity. Okay.
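Written out, the identity used here to justify the invariance of the lim sup reads as follows. The notation A bar for the lim sup is the speaker's; the display simply records the spoken steps.

```latex
S_n(1) = X_1 + S_{n-1}(2)
\;\Longrightarrow\;
A_n(1) = \frac{n-1}{n}\, A_{n-1}(2) + \frac{X_1}{n}.
% (n-1)/n -> 1 and X_1/n -> 0 as n -> infinity, so taking lim sups:
\bar{A} := \limsup_{n\to\infty} A_n(1) = \limsup_{n\to\infty} A_{n-1}(2) = \bar{A}\circ T .
% By ergodicity, the invariant function \bar{A} is almost surely constant.
```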
So in this case we are basically done, at least with the existence of the limit. And we can easily finish the rest, because now if you take these two sides, right, you divide by N and take the lim inf -- I'll just write it out -- the lim inf of 1 over N sum Xj, J equals 1 to N, is going to be -- well, we take this, divide by N and take a lim inf or limit [inaudible]. So this is at least alpha. Okay, and of course since we can do this for any alpha less than A bar, we get that the lim inf equals the lim sup and so the limit exists. But, of course, this assumption that the L of K are bounded by a constant L is very restrictive. So now I don't want to spell out the details in this case. This case was more just to see the key argument in the [inaudible] setting. Now we're going to just do that same thing in the general case. So in the general case we can't assume that these L of K's are bounded, but what we can do is, you know, given epsilon small, we can pick a large number L so that the probability that L of K -- okay, this probability doesn't depend on K -- is bigger than L will be less than epsilon. Okay. By stationarity this probability is the same for any K, so I could have put here just L of 1. Okay? So the point is this L of K is a finite number, so it's a finite random variable, so I can pick L large so that the probability that this variable is bigger than L is less than epsilon. And now we want to modify the process. So Xk star will be Xk in the case when L of K is at most L. These are somehow the well-behaved cases. And in the bad cases -- So I'm going to look ahead. And if the situation is bad, so we need to wait too long, then we're just going to modify the process and put in alpha here. Okay? This is the modified process Xk star, and I want to write it as Xk plus Zk. All right, so Zk is usually zero. Just when we are in this bad case, Zk will be, you know, alpha minus Xk. So we define Xk star this way.
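In symbols, the modification just described can be recorded as below. This is a sketch of the spoken definition; reading "L of K is less than L" as the non-strict condition L(k) ≤ L is an assumption, chosen to match "uniformly bounded by L" in the warm-up case.

```latex
% L is chosen so that P(L(k) > L) < \varepsilon (the same for every k by stationarity)
X_k^{*} =
\begin{cases}
X_k, & L(k) \le L,\\[2pt]
\alpha, & L(k) > L,
\end{cases}
\qquad
Z_k = X_k^{*} - X_k = (\alpha - X_k)\,\mathbf{1}\{L(k) > L\}.
% If L(k) > L >= 1 then A_1(k) = X_k did not exceed \alpha; together with X_k >= 0:
0 \le Z_k \le \alpha, \qquad \mathbb{P}(Z_k \ne 0) < \varepsilon .
```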
And now define this, you know -- All right, so Al star of K are the averages of the X star. Right? So the sum of X star j plus K, J from zero to L minus 1, over L. And then we have L star of K, which is the first L so that these averages Al star of K exceed alpha. And now we're in a good position. These are always at most L.
>> : Did you mean X star [inaudible]?
>> Yuval Peres: X star. Thank you. Yes. That's the whole point. These are averages of the X star. All right, so L star of K, what is it? It's L of K if L of K is at most L, and it's 1 otherwise. Okay, so it's certainly always at most L. So now the previous argument applies, just partitioning the partial sum -- so now we're going to take N which is much, much larger than L, intuitively. And we're going to take the sum. So the sum, J from 1 to N, of Xj star is going to be bigger, by the same argument from before, because we are in that situation, than N minus L times alpha. Okay, so it's convenient that all the variables here are non-negative. So the terms that we throw away here at the end, we know they are non-negative. Okay, so exactly the same argument from before now proves this inequality, which is really, you know, the key inequality [inaudible]. Right. Any questions? So it's exactly the same argument, because we are in that situation where we only have to sum at most capital L terms, you know, to get the average that we want. All right? So now what can we do with this? Several things. So first let's take expectations on both sides. So we get that N times the expectation of X1 star is greater than N minus L times alpha. And what can we say about this? Well, look at [inaudible]; it only differs in this case. So it's at most -- So this is at most N times the expectation of X1 plus alpha epsilon. Right, because the difference is just the expectation of this Z1, and Z1 is at most alpha and is nonzero with probability at most epsilon.
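The chain of inequalities from the key inequality to the bound on the mean can be laid out as follows. It is a worked version of the spoken argument, under the same standing assumptions (non-negative, integrable, stationary ergodic X; L chosen with P(L(1) > L) < ε).

```latex
\sum_{j=1}^{N} X_j^{*} \;\ge\; (N-L)\,\alpha
\;\Longrightarrow\;
N\,\mathbb{E}[X_1^{*}] \;\ge\; (N-L)\,\alpha .
% X_1^* = X_1 + Z_1 with 0 <= Z_1 <= \alpha and P(Z_1 \ne 0) < \varepsilon:
\mathbb{E}[X_1^{*}] = \mathbb{E}[X_1] + \mathbb{E}[Z_1] \;\le\; \mathbb{E}[X_1] + \alpha\varepsilon .
% Combine and divide by N:
\mathbb{E}[X_1] + \alpha\varepsilon \;\ge\; \Bigl(1 - \tfrac{L}{N}\Bigr)\alpha .
% Let N -> infinity with L fixed:
\mathbb{E}[X_1] \;\ge\; (1-\varepsilon)\,\alpha .
% Finally let \varepsilon -> 0 and then \alpha increase to \bar{A}:
\mathbb{E}[X_1] \;\ge\; \bar{A}.
```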
Okay, so now we're in a good situation. We can divide by N and take a limit as N tends to infinity, and we'll get that the expectation of X1 is greater than 1 minus epsilon times alpha. Okay, moving the alpha epsilon to the other side. Note that here now it's X1; it's not X1 star anymore. So this is true for any epsilon. So we get that the expectation of X1 is greater than or equal to alpha. But this was true for any alpha less than A bar. So we can conclude that the expectation of X1 is in fact at least A bar, which is a powerful inequality here because A bar was the lim sup. So we conclude that this lim sup is in fact finite and bounded above by the expectation. Okay. So now we're almost done. We just still have to argue that the lim inf of the averages of the X's is also A bar. And for this we just go back to -- Yes?
>> : [Inaudible] epsilon small? Like epsilon could be half [inaudible]?
>> Yuval Peres: Yeah, but the point is -- I mean, at the end we use the fact that this is true for any epsilon. Right, we got this inequality and now we say this is true for any positive epsilon, so E X1 is in fact at least alpha.
>> : Oh, okay.
>> Yuval Peres: Okay? And then we said this is true for any alpha less than A bar, so in fact E X1 is at least A bar. So now, in order to go back, notice that the X star has two pieces, this X and this Z. So let's make sure we can control the Z's well. And for this we're going to use what we've already proved for the process X; we're going to use it for the process Z. So this fact that we've already proved implies, when applied to the process Z, which is after all just another stationary ergodic process, that the expectation of Z1 is going to be bigger than the lim sup of 1 over N sum J equals 1 to N of the Zj, almost surely. And...
>> : [Inaudible].
>> Yuval Peres: Right. So the Z's are a function of the X process, and so they are also ergodic. So any --.
So if you have any ergodic process and you apply a function to it, then you also get an ergodic process. And you just check that from the definition. Thanks, [inaudible]. So --. All right, so this is something that should be added. Since the Z's are a function, an invariant function, of the original process, they're also an ergodic process. So we get this inequality. Now we want to apply that, going back to our key inequality upstairs here. Maybe I'll also write that the expectation of Z1, we know, is at most alpha epsilon. Okay, now let's write this inequality in the form: 1 over N sum J from 1 to N of Xj, plus 1 over N sum J from 1 to N of Zj, is at least N minus L over N times alpha. [ Silence ] Okay. Now we want to take the lim inf of both sides. Well, here it just converges to alpha. What can we say about the lim inf of this side? So we take the lim inf. On the one hand it's at least alpha, from that inequality. On the other hand, well, how can we bound it above? It's not true that the lim inf of a sum is bounded above by the sum of the lim infs, but it's certainly true that it's bounded above by the lim inf of the X averages plus the lim sup of the Z averages. Right? So this is just true for any two sequences. When we add them, we can bound above the lim inf of the sum by the lim sup of one plus the lim inf of the other. Okay? Right, so this is the lim inf of An of 1 plus -- and this we know can be bounded above by alpha epsilon. This is at least alpha. And so we're in the same situation as before: we move alpha epsilon to the other side. And so we get that this lim inf is in fact at least alpha. So it follows that this lim inf is in fact at least A bar, so it's equal to A bar. And we have the convergence. Okay. And we've already verified that --. [ Silence ] All right. So we have convergence to --. [ Silence ] So A bar --.
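The lim inf step can be summarized in a short display. It uses only facts stated in the talk: the key inequality for X star = X + Z, the bound on the lim sup of the Z averages obtained by applying the first half of the proof to Z, and the elementary inequality for the lim inf of a sum.

```latex
\frac{1}{N}\sum_{j=1}^{N} X_j + \frac{1}{N}\sum_{j=1}^{N} Z_j \;\ge\; \frac{N-L}{N}\,\alpha .
% For real sequences: \liminf (a_N + b_N) \le \liminf a_N + \limsup b_N, hence
\alpha \;\le\; \liminf_{N} A_N(1) + \limsup_{N} \frac{1}{N}\sum_{j=1}^{N} Z_j
\;\le\; \liminf_{N} A_N(1) + \mathbb{E}[Z_1]
\;\le\; \liminf_{N} A_N(1) + \alpha\varepsilon .
% So \liminf A_N(1) \ge (1-\varepsilon)\alpha; let \varepsilon -> 0, then \alpha increase to \bar{A}:
\liminf_{N} A_N(1) \;\ge\; \bar{A} = \limsup_{N} A_N(1).
```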
Okay, so here I guess I've explained why A bar is at most the expectation. I didn't say why A bar is at least the expectation. So this is a final note. So the fact that it is at most the expectation was proved. Now in the bounded case, if the X's were bounded, we certainly know that A bar would be equal to E X1, just from the Lebesgue Bounded Convergence Theorem, because all the averages will be bounded by the same bound. And so in general A bar is at least the limit of one over N sum J from 1 to N of the minimum of Xj with M, which is -- Right? So this is now a bounded process. So this limit will be the expectation of the minimum of X1 and M. So A bar is bigger.
>> : [Inaudible]?
>> Yuval Peres: Xj minimum with a large number, M. This [inaudible] is the minimum. [ Silence ] All right. So again, in the bounded case just the Lebesgue Bounded Convergence Theorem gives you that the expectation of the limit is the limit of the expectations. In the unbounded case you just truncate. You get this inequality. And now once we have this inequality, you can let M tend to infinity, and you get that A bar would also be greater than or equal to [inaudible], just by taking M to infinity. Okay? So any questions about this? All right. So as I said, there are other proofs. This one is particularly well adapted to generalization to the subadditive case. Historically, when Kingman first proved the Subadditive Ergodic Theorem in the seventies, the proof was much harder. So I'll go on to that if there are no questions on this case. [ Silence ] And despite the name Subadditive Ergodic Theorem, I'm going to prove the superadditive version. But it's up to [inaudible] minuses. So what is a superadditive process --. So maybe first just one word on the kind of places where Subadditive Ergodic Theorems get applied. So one place is in first passage percolation. You have, say, a lattice, and on the edges you have random variables which indicate passage times.
And you want to find P of zero N, which is the time to go from zero to N; in general P of M N might be the time to go from M to N on the X axis. So we have two points, you know, M and N, on the X axis. We look at all possible paths that go from one to the other. For each path we look at the total passage time of the path, and we minimize over all these paths. So the edges are endowed with independent random variables, which are, you know, the passage times of these edges. So these random variables, it's easy to see that they're subadditive. So the time to go from zero to N is certainly bounded by the time to go from zero to M plus the time to go from M to N. Because on the left we're considering a larger ensemble of paths that go from zero to N. On the right-hand side we're considering paths that go from zero to N but have to go via the intermediate point M. So on the left we're minimizing over a larger ensemble of paths. So this is the kind of [inaudible]. And then you want to show that when you take the time to go from zero to N and you divide by N, this actually has a limit, and the limit is nonrandom, so it's almost surely constant, and it's equal to the limit of the expectations. This is one kind of application. Another application is for random walks on groups, where you want to show that the random walk has a speed. So you have some Cayley graph; you're doing a random walk on the Cayley graph and you look at the distance from your starting point to the Nth point in your walk. And, again, you can check that that satisfies such an inequality and still get existence of a limit. So I won't talk now about more applications but rather go to the formal statements, since I want to finish in time. So the Subadditive Ergodic Theorem of Kingman. And I'm going to state it in the superadditive version. So again we can think of some underlying probability space.
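The subadditivity of passage times in the first passage percolation example can be checked on a small simulation. This is a rough sketch under stated assumptions, not part of the talk: the lattice is truncated to a finite box (so paths are restricted to the box, which still gives subadditivity because concatenating two box paths gives a box path), the edge times are uniform on [0, 1], and the helper names `make_edge_times` and `passage_time` are hypothetical.

```python
import heapq
import random


def make_edge_times(width, height, rng):
    """Independent passage times on the nearest-neighbour edges of a width x height box.

    Each edge is stored once, keyed as (lower endpoint, upper endpoint).
    """
    times = {}
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                times[((x, y), (x + 1, y))] = rng.uniform(0.0, 1.0)
            if y + 1 < height:
                times[((x, y), (x, y + 1))] = rng.uniform(0.0, 1.0)
    return times


def passage_time(times, width, height, src, dst):
    """Minimal total passage time over box paths from src to dst (Dijkstra)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= v[0] < width and 0 <= v[1] < height:
                key = (u, v) if (u, v) in times else (v, u)
                nd = d + times[key]
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    raise ValueError("destination not reachable")


# Points 0, M, N on the horizontal axis of the box (illustrative choice).
rng = random.Random(0)
W, H = 12, 5
times = make_edge_times(W, H, rng)
zero, m, n = (0, 2), (5, 2), (11, 2)
p_0n = passage_time(times, W, H, zero, n)
p_0m = passage_time(times, W, H, zero, m)
p_mn = passage_time(times, W, H, m, n)
# Subadditivity: going via M is one particular way of going from 0 to N.
assert p_0n <= p_0m + p_mn + 1e-9
```

The small tolerance only guards against floating-point rounding when the same path is summed in a different order.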
So in this case the probability space is just all the edges -- all the random variables that indicate the passage times of the edges. And you can think of a transformation T from omega to itself which is measure preserving. And then the important things are random variables Ymn. So you can think of these as, say, the negative of these passage times. Okay, and Ymn, they satisfy the subadditive inequality. So I'll just write Y zero N is -- I'm sorry, the superadditive. So this is bigger than Y zero M plus Ymn. So that's one assumption. And also the shift invariance. So Ymn composed with T is Y m plus 1, n plus 1. So you see in this situation the transformation is just shifting the random variables, and you see that the time to go from M plus 1 to N plus 1 just has the same law as the time to go from M to N. And this corresponds to the shifted passage times. Okay, so these are the assumptions. And then the conclusion is that if you take --. What? So, no, this is not equality in distribution. This is the actual random variables. So --.
>> : T is measure preserving.
>> Yuval Peres: T is measure preserving. That's right. Okay, so you should really think there's just one sequence, just like in the ergodic theorem. Okay, so just think of your basic variables as these Ymn, or Y zero N. And then from Y zero N you can apply the transformation and you have all the variables. Okay, but the transformation just shifts the underlying space. Okay. So then --. Right, then the conclusion is that there exists the limit of Y zero N over N, call it beta. This limit exists almost surely. Now beta is some number. In this case it's not minus infinity, but it could well be infinity. Okay, the numbers Y, these are finite numbers, but I didn't assume integrability here, so certainly averages could go to infinity. Okay. And also beta is the limit of the expectations.
>> : [Inaudible].
>> Yuval Peres: Thank you. So it's ergodic.
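For reference, the superadditive statement assembled from the spoken assumptions looks like this. It is a transcription sketch; whatever integrability conditions are needed to make the expectations meaningful are not spelled out in the talk and are left unspecified here as well.

```latex
% Assumptions: T is measure preserving and ergodic, and for all 0 <= m <= n
Y_{0,n} \;\ge\; Y_{0,m} + Y_{m,n},
\qquad
Y_{m,n}\circ T = Y_{m+1,\,n+1}.
% Conclusion (Kingman, superadditive form):
\lim_{n\to\infty} \frac{Y_{0,n}}{n} = \beta \quad \text{almost surely},
\qquad
\beta = \lim_{n\to\infty} \frac{\mathbb{E}[Y_{0,n}]}{n} \in (-\infty, +\infty].
```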
Okay, so --. [ Silence ] The proof, as you'll see, follows the same lines as before. So first, if you look at the variables which are Ymn minus the sum of Y K minus 1, K, for K from M plus 1 to N, then you can just see that the superadditivity assumption implies that this is non-negative. Okay, so that is a little verification. You just recursively apply this assumption again and again. And remember that this assumption together with the one on the right implies that we have the superadditivity along any interval. If you take the interval and break it in two pieces, Y of the big interval is bigger than the sum of the Y's of the pieces. Right? And you just keep breaking it up until you get to these Y's on the intervals of length 1. And so this is non-negative. And this partial sum can be treated with the Ergodic Theorem. So, okay, if these variables have infinite expectation then you easily conclude that the limit is infinite. If they have a finite expectation then you can just use the Ergodic Theorem, which tells you that averages of these will go to their mean, and you just reduce the case of Ymn to the case of Y [inaudible]. So this allows us to assume that the original Ymn are non-negative.
>> : So you are assuming that Y has an integral including plus or minus infinity? [ Silence ]
>> Yuval Peres: Yes. So --. All right. So let's --. Yes, that's right. Okay, so...
>> : So Y is an integral of a low non-negative?
>> Yuval Peres: Yes. Okay, so let's first assume that these are finite. Okay. So, thanks. So then -- All right. So this allows us to assume that the Ymn are non-negative. So now we just continue with non-negative variables and define as before -- So I guess now I'll call it beta: the lim sup of 1 over N times Y zero N. [ Silence ] Okay. We fix alpha less than beta and define, like before, L of K as the first L so that when you take Y from K to K plus L, this is bigger than L alpha.
And L star of K. So now L star of K will be L of K if L of K is at most L, and 1 if L of K is bigger than L. Okay, now the same logic that we've used already twice before will allow us to bound Y zero N from below. Okay, by something essentially N minus L times alpha. I'll write it and then explain. Minus the sum over all --. [ Silence ] So I didn't tell you how we choose L, but you can already guess. [Inaudible]. So the sum over all K so that L of K is --. [ Silence ] Okay. And here, as before, we choose L so that the probability of L of K being bigger than L is less than epsilon. So these L star of K, we don't really use them. They just kind of remind us of the argument to get this inequality. But to get this inequality, as before, we take the interval from zero to N and we look from zero; we ask, "Do we have an interval here where L of K is at most L?" If so, we're happy and then we continue. But maybe here, when we look here, the L of K is bigger than L. Then we just take a singleton and we go to the next point. Okay. But this singleton was a special point where L of K was bigger than L. Then we go to the next point -- And so overall we cover the whole interval from zero to N minus L by good intervals, where L of K is at most L, and bad singletons. So when L of K is bigger than L we just take that singleton and jump to the next. So overall from the good intervals we'll get alpha times their length, but we're going to lose -- Yeah, so alpha multiplies this as well, so maybe I'll write it this way. So N minus L minus the sum, all of this multiplies alpha. Right, so the total length of the good intervals is at least N minus L minus the number of the bad locations. Okay, and at the bad locations we just go to the next point. Okay.
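The pointwise inequality obtained from the cover by good intervals and bad singletons can be written as follows. The index range of the sum (all starting points k from 0 to N − 1) is an assumption about what was written on the board; it only makes the bound weaker than summing over the bad singletons actually used.

```latex
L(k) = \min\{\ell \ge 1 : Y_{k,\,k+\ell} > \ell\,\alpha\},
\qquad
\mathbb{P}\bigl(L(k) > L\bigr) < \varepsilon .
% Good intervals [k, k+L(k)] with L(k) <= L contribute more than L(k)\,\alpha;
% bad singletons [k, k+1] with L(k) > L, and the leftover piece at the end,
% contribute at least 0 because the Y's are non-negative. By superadditivity:
Y_{0,N} \;\ge\; \alpha\,\Bigl(N - L - \sum_{k=0}^{N-1} \mathbf{1}\{L(k) > L\}\Bigr).
```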
This is sort of the key to this proof, but it's similar to the keys to the previous proofs; that's why we went through that.
>> : [Inaudible] then the next one is not going to be bad as well?
>> Yuval Peres: Maybe. But, you see, it might -- It kind of does give some negative information. But we are not [inaudible], we here are not computing probabilities. This is just a combinatorial pointwise inequality, right, that says we gain alpha times the length of the good intervals. And what is the total length of the good intervals? It's at least N minus L minus the number of the bad singletons. Every time we see a bad singleton we go to the next. Maybe that's another bad singleton. But then it will just enter into this sum. So the total length of the good intervals is at least what's within the parentheses here. And then from them we get this times alpha. Okay?
>> : Isn't it the same as the previous argument except you did both their cases together? Was there something different?
>> Yuval Peres: Yeah, it's essentially the same. What is different here is that we didn't actually do a modification of the process for this. We just kind of paid the price here. But we're in a better position than before because we already have the Ergodic Theorem, the Birkhoff Ergodic Theorem, and we're about to use it. Okay, so that's why we didn't have to go through the same thing, because now when we divide by N and we want to take a lim inf, you see these indicators are just a sequence of -- because everything is a function of a stationary process. These indicators, you can check, are also stationary. So if we divide by N and take a limit we know the averages converge to the expectation of this indicator. And that expectation is just the probability of this event, which is small; it's less than epsilon. So if we divide by N and take a lim inf, what do we get? Well, here L is constant.
So when we divide by N we're going to get at least alpha times one minus the probability that L of 1, say, is bigger than L. So this is bigger than alpha times 1 minus epsilon. Okay, and at this point we use the Birkhoff Ergodic Theorem for these random variables. Again, because we're in a stationary situation, these themselves are from a stationary sequence, so we can apply the Birkhoff Ergodic Theorem here. And now we're done. The lim inf here is at least alpha, and this was true for any alpha less than beta, so the lim inf equals beta. So the last comment is, why is the limit the same as the limit of the expectations? Right, so we already proved that the limit of Y zero N over N exists and equals beta. So then from [inaudible], if we take expectations we get that the expectation of the limit, which is beta, is at most the lim inf of the expectations. But this is a superadditive numerical sequence, so the limit exists. So the limit exists, and so beta is at most this limit. In the other direction, just observe that if you take Y zero, say, K times N, you can break this up into N intervals of length K. Right? So if you take this and divide by N and take a limit, this will be bigger than the expectation of Y zero K. Okay. Because just from the superadditive inequality you can break this up into a sum of N terms, each on an interval of length K. And then for these sums you apply the ordinary Ergodic Theorem to get that when you divide -- this is the sum of N summands -- when you divide by N, you'll get this limit. This is true for every K. So here I thought of N as tending to infinity and K as constant. All right. So I have this and this is true. This is exactly our limit beta, up to the factor K, and it's bigger than this for every K. So taking the limit in K gives us the remaining inequality we need.
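The two closing steps can be summarized in symbols. The display follows the talk's outline; the explicit factor K in the second part is bookkeeping added here (the limit of Y over N along multiples of K equals K times beta), and the step to the expectation is quoted as stated in the talk rather than re-derived.

```latex
% Divide the covering inequality by N and apply Birkhoff to the stationary indicators:
\liminf_{N\to\infty} \frac{Y_{0,N}}{N}
\;\ge\; \alpha\Bigl(1 - \lim_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mathbf{1}\{L(k) > L\}\Bigr)
= \alpha\bigl(1 - \mathbb{P}(L(1) > L)\bigr)
\;\ge\; \alpha\,(1-\varepsilon).
% Expectations, with K fixed and N -> infinity:
Y_{0,KN} \;\ge\; \sum_{i=0}^{N-1} Y_{iK,\,(i+1)K}
\;\Longrightarrow\;
K\beta = \lim_{N\to\infty} \frac{Y_{0,KN}}{N} \;\ge\; \mathbb{E}[Y_{0,K}]
\;\Longrightarrow\;
\beta \;\ge\; \lim_{K\to\infty}\frac{\mathbb{E}[Y_{0,K}]}{K}.
```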
Okay, since I promised to finish at five I won't really discuss more applications now. But any -- Let me stop here and wait for any questions. Yes? >> : What was the benefit of taking the Y [inaudible]? >> Yuval Peres: So we had -- In this inequality here, right, what did we do? We took the interval zero N, and we broke it into good intervals and bad singletons. Right. Now in the bad singletons all I said is, "Well, you know, we don't get the good contribution but we know we get something at least zero." So that's why I could write this inequality. So this gave us -- What is in the parentheses is just the total length of the good intervals, and those all give us alpha times their length. The other things I don't know but they're nonnegative. >> : Oh, so the last L? Right? >> Yuval Peres: Right. Also the last L, that's right. Yes, because we stop before the end. Okay, so that's where that gets used. Okay, there are no more... >> : So you mentioned that the first proof used nonstandard analysis? >> Yuval Peres: No, not the first proof. The first proof along this line of argument. So the Kingman proof was... >> : [Inaudible]... >> Yuval Peres: So this is -- So this line of proof started with a nonstandard analysis proof by Teturo Kamae in 1982. And then this was, you know, Yitzhak Katznelson and Benji Weiss read that proof, understood it and understood how to remove the nonstandard analysis. But they still -- But it's -- And then this was further simplified a bit by Mike Keane and Mike Steele who gave essentially these arguments for the Birkhoff case and the subadditive ergodic case. If you want to compare to other proofs, you can look -- So, say, Durrett's probability book has a proof for the Subadditive Ergodic Theorem, a slightly more general version, but this one applies to most applications. This is -- What I proved to you is basically the original Kingman version. And -- But the proof that [inaudible] gives which follows [inaudible] is really much harder to follow and to remember.
So here I think at least the idea is pretty easy to remember. >> : So say if you kind of used nonstandard analysis, would the proof have been simpler or no? >> Yuval Peres: Eh... >> : Can you... >> Yuval Peres: The proof -- If you assume -- I mean, but you prove a completely different statement. So if you want to verify that that statement is actually equivalent, it's much longer. Yes [inaudible]. >> : So what's the relation of this to maximal theorems? >> Yuval Peres: This is a way to avoid maximal theorems. So there's a very short proof of the Birkhoff Ergodic Theorem that comes from the Maximal Ergodic Theorem. And this was one of the routes of the original proofs. And initially it was thought, "Oh, this reduction is so easy so the Maximal Ergodic Theorem must be hard." But then Garsia came up with a very short proof of the Maximal Ergodic Theorem. So if you combine those -- There is a very -- You know, there is a proof even shorter than the one I presented going via the Maximal Ergodic Theorem, but that one is more mysterious, I'd say, even to the experts. >> : And that's not for the subadditive? >> Yuval Peres: Right. Right. That one doesn't translate directly to the subadditive. Okay. Thanks. [ Audience clapping ]