>> Kristin Lauter: Okay. So hi, everyone. Thanks for coming. Welcome to the first annual Microsoft Research Summer Number Theory Day. I hope this will become a tradition, and we have five wonderful speakers here today. First, I'd like to introduce Kiran Kedlaya, a professor of mathematics at the University of California San Diego. He will be speaking about the Robbins phenomenon: p-adic stability of some nonlinear recurrences. Thank you.

>> Kiran Kedlaya: Okay. I'd like to thank Kristin for the invitation to come visit Microsoft. This talk is about a joint project with Joe Buhler at the Center for Communications Research in San Diego -- La Jolla, really. The slides will be posted on my website later today. Usually I would have them up before the talk, but I was still writing them until about ten minutes ago, and the preprint is not yet available either; I'm hoping that will be ready in a few weeks. If anybody is interested in more details, I can tell you about them afterwards.

So this talk is about some unexpected numerical stability in p-adic floating point arithmetic. What I'm going to do is first talk about p-adic numbers and what floating point arithmetic is for p-adic numbers, then introduce some examples of computations with p-adic numbers that have some kind of unexpected numerical stability, and finally, at the end, say a little bit about how you prove some partial results towards the conjectures that we observe numerically. We do not have a complete understanding of the phenomena here, but we do have some partial results. Okay. So that's essentially the outline that I just went through.

Okay. P-adic numbers and floating point arithmetic. Probably most people here know what p-adic numbers are, but just to make absolutely sure, I'll say it several different ways on this slide. P will always be a prime number in this talk; I didn't write that explicitly, but you can guess from context. So P will be your favorite prime number. And Zp will denote not the ring of integers mod P, which it sometimes does, but the ring of p-adic integers.

So the p-adic integers can be conceived of in at least three different ways. Actually, there's at least one way I didn't write down, using Witt vectors, but you don't need to know that. More concretely, there are at least three different nice ways to think about the p-adics. One is to think of doing base P arithmetic, where the digits in base P are, of course, the integers from zero to P minus 1, but instead of finite strings of base P digits, you use infinite-to-the-left strings of base P digits, and just use the normal rules of base P arithmetic to compute with them. So you can do addition and multiplication in base P, starting at the right and going to the left, and even if you have to go infinitely far to the left, it still makes sense to do this. So, for instance, for P equals 2, if you take the string of all 1s and you add 1 to it, you get a string of all zeroes. So the string of all 1s represents the additive inverse of 1; in other words, this is minus 1 in the ring Z2. Which may be a familiar fact if you have thought about how to represent minus 1 on a computer. So you can think of the p-adics in terms of base P digits.
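To make the all-1s example concrete, here is a tiny Python check (my own illustration, not something from the talk) that truncating the 2-adic expansion of minus 1 to k digits gives 2^k - 1, the familiar two's-complement pattern:

p, k = 2, 8
minus_one_truncated = (-1) % p**k              # the last k digits of ...11111 in base 2
print(bin(minus_one_truncated))                # 0b11111111, i.e. 255 = 2**8 - 1
assert (minus_one_truncated + 1) % p**k == 0   # adding 1 carries out to ...000, as described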
A more structured, mathematical way to think of it is to take sequences in which the nth term is an element of the ring of integers modulo P to the N, and the (N plus 1)st term reduces to the nth term when you reduce the modulus from P to the N plus 1 down to P to the N. So these are, so to speak, coherent sequences of elements of the rings Z mod P to the N. In other words, these are the elements of the inverse limit of the sequence of rings Z mod P to the N Z, where Z mod P to the N plus 1 Z maps to Z mod P to the N Z by reducing the modulus. So that's a second, perfectly equivalent way to think of the p-adics.

And the third, which is the one that has the most flavor of analysis, is to define the p-adic absolute value: for each integer N, you declare the size of N to be P to the minus V sub P of N, where V sub P in this talk will always denote the p-adic valuation, i.e., the exponent of P in the prime factorization of N, which might be zero if P does not appear. So the more divisible N is by P, the larger its p-adic valuation, and hence the smaller its p-adic absolute value. So this is somehow inverse to the usual notion of size. But you can complete the ring Z for this p-adic absolute value, and you get the same thing that I've described in the other two ways.

And this description has the advantage that you can also apply it directly to rational numbers, because the p-adic valuation is perfectly well defined for rational numbers too. It might be negative, if you have Ps in the denominator, but nonzero rational numbers also have prime factorizations, and once you convince yourself that the absolute value of zero should, of course, be zero, that completely defines the p-adic absolute value on rational numbers. You can complete Q for this p-adic absolute value, and you get something called Qp, which as a ring is just Zp with P inverted. So in terms of strings, these are again infinite-to-the-left strings in base P, except now they're not integral: you might have the base P version of a decimal point and finitely many digits to the right of it. So you have some P denominators, of course finitely many of them. So you can represent things in this way using whatever you call a decimal point in base P -- decimal means ten, so I don't know what you call it in base P, but let's say you call it a decimal point -- or you can use the completion description. The inverse limit description doesn't work quite as well, but you can fix it if you want. So, okay, those are the p-adics. That's what we're going to be talking about during this talk.

Now, why are we talking about p-adic numbers? This is sort of an advertisement for why the topic of this talk is of some relevance. This definition of p-adic numbers was given by Hensel around 1900 or so, and the idea was to translate ideas from analysis, real analysis, complex analysis, into number theory. You see that most formally by using the p-adic absolute value interpretation: if you think of Qp as the completion of Q for the p-adic absolute value, then that is somehow analogous to completing Q for the ordinary absolute value to get the real numbers, and certain operations that you can run on the real numbers have parallels in the p-adics that have some number-theoretic significance.
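As a small illustration (my own, not from the slides), here is how one might compute the p-adic valuation and absolute value of a rational number in Python, directly from the definition just given:

from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a nonzero rational x: the exponent of p in its factorization."""
    x, v = Fraction(x), 0
    n, d = x.numerator, x.denominator
    while n % p == 0:
        n //= p; v += 1
    while d % p == 0:
        d //= p; v -= 1
    return v

def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    return Fraction(0) if x == 0 else Fraction(1, p) ** vp(x, p)

assert vp(12, 2) == 2 and abs_p(12, 2) == Fraction(1, 4)
assert vp(Fraction(3, 4), 2) == -2 and abs_p(Fraction(3, 4), 2) == 4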
For instance, if P is an odd prime and N is an integer which is congruent to a nonzero perfect square mod P, then it's an elementary exercise in number theory that it is also a perfect square modulo P squared, P cubed, and so on. And, in fact, it is a square in Zp: it has exactly two square roots in the field Qp, which both happen to belong to the ring Zp. You can even construct these square roots using an analog of Newton's iteration, where you find a root of a polynomial by correcting your approximate root by subtracting the value of F divided by the value of F prime; in other words, you correct the error as if F were linear, and you repeatedly do that. In this case, if you start with an approximation of the square root which is correct mod P, then this iteration is quadratically convergent: the number of correct digits doubles each time you do it. So it's a very efficient method. You could, of course, try to compute the digits one at a time by some elementary argument, but the Newton iteration doubles the number of digits each time, so if you wanted a thousand digits, you would need only ten steps or so. So not just definitions, but also algorithms from analysis translate very nicely into the p-adic framework.
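Here is a minimal sketch (mine, under the assumption that P is odd and the input is a nonzero square mod P) of that Newton, or Hensel, iteration; each pass through the loop doubles the number of correct base-P digits:

def padic_sqrt(a, p, k):
    """Return (x, p**(2**k)) with x*x = a modulo p**(2**k), for p odd and a a nonzero square mod p."""
    # initial approximation: a square root mod p, found here by brute force
    x = next(t for t in range(1, p) if (t * t - a) % p == 0)
    mod = p
    for _ in range(k):
        mod = mod * mod                                     # precision doubles: p^m -> p^(2m)
        x = (x - (x * x - a) * pow(2 * x, -1, mod)) % mod   # Newton step x <- x - f(x)/f'(x)
    return x, mod

x, mod = padic_sqrt(2, 7, 5)       # 2 is a square mod 7; lift it to a square root mod 7**32
assert (x * x - 2) % mod == 0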
Now, we'll hear various things about p-adic numbers during the day; I suppose the final talk today will have some p-adics in it. Just to give an advertisement more in the direction of computational number theory and applications in cryptography: one thing that comes up in computational number theory and cryptographic applications is that you have an elliptic curve or a hyperelliptic curve over a finite field and you want to know its Zeta function, which is essentially the information about the number of points on the curve over all the relevant finite fields. There are algorithms involving p-adic numbers for this, which have been considered by a great many people; I think Satoh was the first person to do this in the elliptic case, and then I looked at this a lot in the hyperelliptic case. And actually, if you fire up one of the standard number-theoretic computer algebra systems, you will find code that does p-adic arithmetic computations for this and various other problems. So p-adic numbers are used all over number theory these days, both on the theory side and, so to speak, on the practical side.

This talk will drift more to the practical side, although there's quite a lot of algebra hiding underneath that will poke out toward the end. But for the moment, let me emphasize a concern on the practical side. There is an obvious difficulty in computing with p-adic numbers, and it's the same obvious difficulty that occurs in trying to compute with real numbers. An arbitrary real number can only be specified by an infinite number of digits, say in base ten; you need an infinite number of decimal digits to exactly specify a particular real number. There are certain real numbers that can be specified by finite data, like rational numbers, but arbitrary real numbers cannot be described using a finite amount of data, if for no other reason than that the number of things you can describe with finite data is countable and the real numbers are uncountable. So no cleverness is going to solve that problem. And the same thing is true of the p-adics: the cardinality of the p-adic numbers is again uncountable; it's the cardinality of the continuum. So you're not going to be able to represent arbitrary p-adic numbers, say, on a Turing machine, and you're not going to be able to store them exactly on a computer. So this is a problem.

The way you deal with this for real numbers is that you don't deal with exact real numbers; you deal with approximate real numbers and pretend they're good enough for your purposes. There are various ways to do that, which have analogs for the p-adics, and the one I'll consider in this talk is the p-adic analog of floating point arithmetic, which I suppose before computers was called scientific notation.

So what is floating point arithmetic for real numbers? If you imagine a real number and its decimal expansion, it will have some digits before and after the decimal point, and the way you represent it in floating point arithmetic is to write it as a power of ten times a real number. Of course, you can shift powers of ten in and out of the leftover real number part, so you rescale: in scientific notation, I suppose, you scale things so that there's exactly one digit in front of the decimal point; for floating point arithmetic, maybe it's more conventional to shift it so there are no digits in front of the decimal point and the first digit after the decimal point is non-zero. So you normalize the cofactor in some way by adjusting the power of ten.

You can do the same thing for the p-adics. Any p-adic number can be written as a power of P, positive, negative or zero, times an element of Zp which is not zero mod P. In other words, if you imagine the base P digits, you can multiply by a power of P to align the string so that the first non-zero digit occurs in the units place. So you can always represent a p-adic number as a power of P times a p-adic integer with non-zero units digit, which is to say a unit in the ring of p-adic integers.

To make approximations using that normalization, what you do is fix a positive integer R, which is going to be the maximum relative precision of the numbers you're writing down, and any given p-adic number will be approximated by a rational number of the form P to the E times M, where E is an integer, the exponent, and M is an integer in the range from zero to P to the R minus 1 which is not divisible by P; this is called the mantissa, again by analogy with usual floating point arithmetic. So you strip out all powers of P so that the bit that's left over is a p-adic integer not divisible by P, and then you truncate it so that you're only keeping track of R digits in base P, because you want the computations to be finite. So that's a scheme for approximating p-adic numbers.

And, of course, when you approximate things that you really want to compute exactly, you create some errors, and these errors propagate through the computation. In floating point arithmetic, it should certainly be a familiar phenomenon that the errors that propagate have the potential to get worse and worse as you go along, because they can compound. If you're off by 10 to the minus 5 in one quantity and 10 to the minus 5 in another quantity and you add them, the result might be off by 2 times 10 to the minus 5. So you have this compounding problem.
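A rough model (my own sketch, not the talk's code) of the normalization just described: write a nonzero rational as P to the E times M, with M an integer mantissa that is prime to P and truncated to R base-P digits.

from fractions import Fraction

def to_padic_float(x, p, R):
    """Approximate a nonzero rational x as p^e * m: return (e, m) with 0 < m < p^R and p not dividing m."""
    x, e = Fraction(x), 0
    while x.numerator % p == 0:      # strip powers of p out of the numerator
        x /= p; e += 1
    while x.denominator % p == 0:    # and out of the denominator
        x *= p; e -= 1
    # the remaining unit part, truncated to R digits (its denominator is invertible mod p^R)
    m = (x.numerator * pow(x.denominator, -1, p**R)) % p**R
    return e, m

print(to_padic_float(Fraction(-1), 2, 8))     # (0, 255): the all-1s string cut to 8 digits
print(to_padic_float(Fraction(3, 4), 2, 8))   # (-2, 3): 3/4 = 2^(-2) * 3 exactly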
You have the same compounding issue for the p-adics, but it's not quite as bad. So let me first quantify what I mean by accuracy, and then I'll explain how accuracy degrades in floating point arithmetic.

By the accuracy of a p-adic floating point approximation to a p-adic number, if the p-adic number is X and the approximation is P to the E times M, what I mean is: look at M minus P to the minus E times X -- so take the difference and divide by P to the E, sort of to renormalize -- and then take the p-adic valuation of the result, unless that's negative, in which case I don't allow the accuracy to be less than zero. If the accuracy is zero, I give up and say I've lost all control. You might prefer to have accuracy zero still mean something and have accuracy minus 1 mean you've lost all control, but to avoid annoying corner cases, I'm just going to declare that once the accuracy is zero, I have no information anymore.

So in words, what this formula is doing is counting the number of correct p-adic digits of the mantissa, starting from the right. At least, that's what it's doing if the exponent is the correct exponent for X. You could imagine having an approximation where you even wrote down the wrong exponent, and then it would be a really bad approximation. So, for example, take minus 1, which, remember, is the p-adic integer represented by a string of 1s going infinitely far to the left when P equals 2. If I write down these floating point approximations of it, where these are meant to be integers in base 2 -- that's what the subscript 2 means -- then this one has accuracy 3, because it has three digits correct at the end. Any digits that happen to be correct further over don't matter, because as soon as you have an incorrect digit, you stop keeping track. And likewise, here you only have one correct digit before you go wrong. This one is not even a valid approximation, because in my floating point approximations I insist that the last digit always be 1. And this one is a valid approximation, but it has the exponent wrong; if you go through the definition, you'll see that the accuracy actually comes out to be zero. So essentially, if you have the wrong exponent, the approximation is useless. So when I talk about the accuracy of a floating point approximation, this is what I mean: the number of digits, starting from the right, that are correct in the mantissa.

Okay. Now, how do addition and multiplication affect the accuracy of an approximation? Let me start with multiplication, because that behaves very well. If I have two numbers, X and Y, and these approximations of them, then it's clear what I should take as a floating point approximation of X times Y -- that should be a times sign; let me fix that, so that should be X times Y -- and, in fact, it's such a good approximation that its accuracy is no less than the minimum of the accuracies of the original two approximations. So if this is accurate to five digits and this is accurate to seven digits, then this approximation will be accurate to at least five digits. So in some sense, there's no additional loss of accuracy when you do a multiplication. The only degradations of the approximation are the ones that came in with the two inputs, and they don't even compound; you just take the worse of the two.
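As a toy version (my own) of the accuracy measure just defined, here is code that compares an approximation P to the E times M against an exact value and counts correct trailing mantissa digits, clamped below at zero; the two sample approximations of minus 1 are my own, chosen to have one and three correct digits respectively.

from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a nonzero rational."""
    x, v = Fraction(x), 0
    n, d = x.numerator, x.denominator
    while n % p == 0:
        n //= p; v += 1
    while d % p == 0:
        d //= p; v -= 1
    return v

def accuracy(x, e, m, p, R):
    """Accuracy of the approximation p^e * m to x: v_p(m - p^(-e) x), clamped to [0, R]."""
    diff = Fraction(m) - Fraction(x) / Fraction(p)**e
    return R if diff == 0 else min(R, max(0, vp(diff, p)))

print(accuracy(-1, 0, 0b11111101, 2, 8))   # 1: only the last digit agrees with ...1111
print(accuracy(-1, 0, 0b11110111, 2, 8))   # 3: the last three digits agree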
So this is, in some sense, even better than real numbers. For real numbers, you always have to add the error terms; here, you take the minimum of the two accuracies, and that's the accuracy of the approximation of the product.

Addition is not quite so straightforward. So what do you do? Of course, you try adding the two approximations, but this just gives you some number which you have to rewrite in the form P to the exponent times mantissa. So first you have to collect the powers of P, and then you might have to round the mantissa, because the mantissa might have gone out of your range; you may have to truncate it to R digits.

For example, if E1 is strictly less than E2, then the exponent you're going to get is just E1, because when you add these two things together, M1 is not divisible by P, while the other term is a positive power of P times a p-adic integer, so it is divisible by P. So the sum is not divisible by P, because you're adding one thing with a non-zero units digit to one thing with a zero units digit. So this is now a valid mantissa, except that it might have too many digits; when I write brackets, I mean round to R digits. So if the two things have different valuations, all you have to do is round, and if you stare at this for a minute, you see that the accuracy you get is no less than the minimum accuracy of the two original approximations. So in this case, again, the operation is exact: you don't experience any loss of accuracy beyond what came into the computation. And the situation is completely symmetric, so if E1 is bigger than E2, the same argument applies in reverse.

So where do you lose accuracy? You lose accuracy, or at least you may lose accuracy, if the two numbers you're adding have the same p-adic valuation. The reason is that in this case, it looks good at first: you factor out P to the E1 and you just add the two mantissas. But, of course, this is only a valid floating point approximation if the sum of M1 and M2 is not divisible by P. If it is divisible by P, you have a problem: if the valuation of the sum is F, strictly bigger than zero, you have to shift a factor of P to the F from the sum of the mantissas into P to the E1 before you do the rounding. In terms of digits, this has the effect that you start with, say, R p-adic digits which you know are correct -- say there was no loss of accuracy beforehand -- and the sum of two things which are known to R digits is known to R digits, but then you have to divide by P to the F. So you have all these zeroes at the right that you have to shift out, and what comes in at the left may or may not be the correct p-adic digits for the thing you're actually computing. So you're essentially introducing as many garbage digits at the left as you have zeroes appearing on the right. So this is the source of loss of accuracy.
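A toy illustration of the two cases, with P equal to 2 and R equal to 8 (my own numbers, not from the slides): multiplying mantissas costs nothing beyond keeping R digits, while adding two mantissas of equal exponent whose sum is divisible by P to the F shifts out F digits, leaving only R minus F trustworthy digits in the new mantissa.

p, R = 2, 8
m1, m2 = 0b10110101, 0b01001111          # two valid mantissas: odd, less than 2**R

prod = (m1 * m2) % p**R                  # multiplication: accuracy is the minimum of the inputs'
print(bin(prod))

s, f = m1 + m2, 0                        # addition with equal exponents
while s % p == 0:
    s //= p; f += 1                      # shift out f zero digits on the right
print(f, bin(s % p**R))                  # here f = 2, so two digits of accuracy are gone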
And you might expect that if you perform a sequence of arithmetic operations using p-adic floating point arithmetic, you typically experience progressive loss of accuracy over the course of the computation, because you encounter this situation over and over again and introduce more and more garbage digits. You can witness this first hand if you do p-adic computations in Magma or Sage: things really do get worse and worse as you go along.

And so you're forced to consider questions that are analogous to questions in what's called numerical stability. There's quite a large subject of mathematics devoted to the problem of doing various computations in such a way as to limit losses of accuracy in floating point computation. So, for example, numerically stable linear algebra is a whole sub-field of, I suppose, linear algebra, where you are sometimes forced to do things in quite a different way than you might expect in order to avoid losses of precision. It's kind of algorithmic algebra of quite a different flavor than we might be used to in number theory. But if we want to do p-adic computations, then we do have to think about numerical stability from the p-adic point of view; it's not completely irrelevant to our computations how one manages loss of accuracy.

There's a general framework you can use to try to study this, but in this talk I'm not going to talk about the general problem of p-adic numerical stability. I want to talk about some cases of unexpected numerical stability: cases where you do a computation and you expect a certain amount of loss of accuracy and, in fact, you lose much less than you thought. These cases appear to have some deep algebraic origin, which is not completely understood, although what we do in the rest of the talk will give some partial explanation. I should maybe mention that for those of you who have heard me, or some collaborators, talk about computations of Zeta functions using p-adic [indiscernible] methods, there are examples of surprising numerical stability there too, but those are essentially linear algebra computations, so they have, in some sense, linear explanations. Those led me to consider these things; my original motivation for what I'm going to talk about in the rest of the talk came from having seen those examples in linear computations, but these turn out to be much deeper. You'll see there are a lot of nonlinear things happening: we'll be doing a lot of divisions and encountering some strange cases where you don't really lose precision, even though you think you should.

Okay. So now I want to describe an observation made by David Robbins in the early 2000s. It was published just after he died in 2005, and it concerns one example that he was working with that turned out to have this surprising numerical stability. This is the example that triggered the work that we're doing here. And it comes from a bit of 19th century mathematics, which starts out with a little matrix identity which, in this level of generality, is due to Jacobi; there are some special cases due to other people, but this version is due to Jacobi. You take an N by N matrix, and the identity involves various determinants.
One of them is the determinant of the whole matrix, but you consider various sub-matrices as well, and I'll have an example on the next slide. Let me state the identity first. Take the determinant of M, where M is an N by N matrix. There are four different sub-matrices of size N minus 1 sitting in the top left, top right, bottom left and bottom right. So, for instance, A is the determinant of the matrix of size N minus 1 sitting in the top left: you chop off the bottom row and the right column. Likewise, B sits in the top right, C in the bottom left, D in the bottom right. And E is the determinant of the matrix of size N minus 2 sitting in the middle: you chop off all the outside rows and columns. Then the identity of Jacobi, which is an entertaining thing to prove if you've never seen it before, is that AD minus BC equals EF. So if you take the determinant of the 2 by 2 matrix formed by A, B, C and D, you get E, the determinant of the thing in the middle, times F, the determinant of the whole matrix. If you imagine that A, B, C, D are homogeneous polynomials of degree N minus 1 in the entries, F has degree N, and E has degree N minus 2, then you need the E in there to balance: both sides have degree twice N minus 1.

So here's a fun little example just to illustrate the identity. If I take this 3 by 3 matrix, then A is the determinant of the top left corner, B is the determinant of the top right corner, C is the determinant of the bottom left corner, D is the determinant of the bottom right corner, and E is, in this case, the 1 by 1 determinant in the middle, so it's just the entry in the middle. If I did a larger example, which I will a little bit later, this would itself be a determinant. And F is the determinant of the whole thing, which, since it's 3 by 3, we remember how to compute; it turns out to be 5. And we can check that 3 times 10, minus, minus 3 times minus 5, is the same thing as 3 times 5.

One thing you're supposed to take away from this is that it might not be obvious, when you first start doing this computation, that AD minus BC will be divisible by E. But, of course, it is, because the quotient is F, and the determinant of a matrix with integer entries is an integer, because we have another way to write it as a polynomial in the entries, say from the definition in terms of summing over transversals. So this identity forces an interesting divisibility property of AD minus BC, which can be used to construct interesting examples of recurrences defined by rational functions that nonetheless give you integer entries. For example, there are some examples, due to Conway and Guy maybe, of things called number friezes that come up this way. But I'm not going to talk about that. I'm going to talk about a proposed application of the Jacobi identity which is due to a mathematician by the name of Charles Lutwidge Dodgson, who in his spare time was the children's author Lewis Carroll; but for the purposes of this talk, he's Dodgson, because this is from his day job as a mathematician.
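For readers who want to see the identity in action, here is a quick machine check (my own; the speaker's examples were done in Sage, whereas this uses sympy) of AD minus BC equals EF on a random integer matrix.

import random
import sympy

n = 5
M = sympy.Matrix(n, n, lambda i, j: random.randint(-9, 9))
F = M.det()
A = M[:n-1, :n-1].det()     # top left: delete the bottom row and right column
B = M[:n-1, 1:].det()       # top right
C = M[1:, :n-1].det()       # bottom left
D = M[1:, 1:].det()         # bottom right
E = M[1:n-1, 1:n-1].det()   # the central (n-2) by (n-2) minor
assert A * D - B * C == E * F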
So Dodgson proposed to use Jacobi's identity as a method to compute determinants. Let me just say what the algorithm is. Given a square matrix M -- in this picture at the bottom, this is the matrix M -- I successively compute the connected minors of size K from those of size K minus 1 and K minus 2. If you imagine, the entries of the matrix that I start with are the 1 by 1 minors; the minors are the determinants of the square sub-matrices, and the 1 by 1 sub-matrices are just the entries. To get started, I also need zero by zero sub-matrices, so I declare that the determinant of the zero by zero matrix is 1; I need a whole bunch of 1s to get started.

So the first step of the algorithm is that I start with my 1 by 1 minors and compute the 2 by 2 minors. You can imagine that this is a special case of the Jacobi identity, where I take this times this, minus this times this, divided by this; but of course it's just the usual formula for the 2 by 2 determinant. I used Sage to do this, so I hopefully got the right answer, modulo transcription error -- Sage did it right; I hope I copied it correctly. So when you compute the 2 by 2 minors, for example, this one is minus 6, this is minus 1, minus 7, and so on.

Okay. Then I can throw away the 1s, keep the 1 by 1 and the 2 by 2 minors, and use Jacobi's identity again to compute the 3 by 3 minors. I put some interesting entries in the middle here so that there's some falsifiability: a priori, there would be some possibility of getting interesting denominators when you divide by these things, but the truth of the Jacobi identity ensures that when I take this times this, minus this times this, and divide by this, I get -- this looks like a sign error here; oh, minus, right -- so plus 7 minus plus 4, divided by minus 3, is minus 1. So I did, in fact, copy this correctly from Sage, and you can check the other ones yourself. So these are the 3 by 3 minors of my original 4 by 4 matrix.

From the 2 by 2 and the 3 by 3 minors, I run the process one more time: I take minus 1 times minus 2, minus, minus 4 times minus 1, divided by minus 1, and I get plus 2, which is, in fact, the determinant.

Dodgson called this condensation, for a natural reason: as you move along, you're working with smaller and smaller matrices, so you're somehow condensing the original matrix towards its determinant. You start with this 4 by 4 matrix and this auxiliary 5 by 5 of 1s, then you replace them with a 4 by 4 and a 3 by 3, then a 3 by 3 and a 2 by 2, then a 2 by 2 and the 1 by 1 that you're looking for. So you condense towards the determinant.

Okay. This has some nice features, not all of which were formulated by Dodgson, but which are apparent in modern hindsight. You can check that this is an O of N cubed algorithm, just like any other reasonable algorithm for computing determinants, such as Gaussian elimination. So, on one hand, that means it's not going to get Henry Cohn excited; it's not going to solve the fast matrix multiplication problem. But it's not any worse than any other natural algorithm for computing determinants, so it's a reasonable algorithm in terms of complexity. It also has these extra bits of algebraic structure: the intermediate terms belong to the same ring as the entries of the original matrix, because they are determinants of sub-matrices. So, for example, if M has integer entries, then all of the intermediate terms are integers, not more general rational numbers.
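Here is a compact implementation of Dodgson condensation (my own sketch; the slide computations were done in Sage). It works over the rationals, returns the determinant when no interior minor vanishes, and simply raises ZeroDivisionError in the bad case that comes up later in the talk.

from fractions import Fraction

def condensation_det(M):
    """Determinant of a square matrix M (a list of lists) by Dodgson condensation."""
    n = len(M)
    prev = [[Fraction(1)] * (n + 1) for _ in range(n + 1)]    # the 0x0 minors: all 1s
    curr = [[Fraction(x) for x in row] for row in M]           # the 1x1 minors: the entries
    for k in range(2, n + 1):
        size = n - k + 1
        nxt = [[(curr[i][j] * curr[i+1][j+1] - curr[i][j+1] * curr[i+1][j]) / prev[i+1][j+1]
                for j in range(size)] for i in range(size)]
        prev, curr = curr, nxt                                  # condense one step
    return curr[0][0]

print(condensation_det([[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8], [9, 7, 9, 3]]))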
This integrality, on one hand, helps reduce the size of the numbers involved: you're not carrying around numbers with big numerators and big denominators, you're carrying around integers. Also -- I think this is something that Dodgson remarked on -- if you're doing this by hand, which he would have been in the 19th century, it's very useful to have error checks to help you confirm that you didn't make a mistake, and the fact that AD minus BC is divisible by E is a very good check. So for hand computations, it was actually a very nice algorithm.

An observation which I don't think Dodgson made, because he wouldn't have been interested in large-scale computations, but which is relevant in the modern world, is that condensation is highly parallelizable with very little communication. Imagine you have a square grid of processors, and each one is keeping track of a K minor and a K minus 1 minor. Then each one wants to compute a K plus 1 minor, and essentially it only has to communicate with a couple of neighbors in order to do that. So it has very good storage properties, very little communication, and everything can happen in parallel; it's a fantastically nice algorithm from that point of view.

I don't actually know whether much use has been made of this in the real world, and probably that's because of the incredibly serious disadvantage of the condensation algorithm, which is that it doesn't always work. I shouldn't really use the word algorithm here, because an algorithm is generally thought to mean something that actually succeeds in computing what it's supposed to compute, and condensation does not always do so, because there's this tiny little problem that one of the things you have to divide by might be zero. This does not falsify the Jacobi identity; it just means that in this case AD minus BC is also equal to zero. But it means that you can't solve for F in the equation, because all you know is that zero equals zero.

So this is a problem, and Dodgson observed it. In his examples, it didn't occur very often, because random numbers are generally not going to be zero with high probability. And even if you do hit a zero, you have some hope, because you know other things about determinants: for instance, if you switch two of the rows or columns of a matrix, you change the determinant only by a sign. So potentially you could try to manipulate the matrix in some way to shake out one of the zero sub-minors and then repeat the computation. This is not a very systematic description of what to do, which is maybe why condensation never really took off as a method, but one can do it.

So, as I said, condensation has these incredibly good features and this incredibly bad feature. Because of the very bad feature, it went ignored for a very long time, until Robbins started looking at it. Robbins was interested in computing determinants of matrices over FP, and he realized that if you're working over FP, you might hit a zero at some point during the condensation process, and then you're a bit stuck. But you could work around this problem by lifting the problem to Z, so that something that starts out being zero lifts to something which is only guaranteed to be zero mod P.
And so it has a very good chance of being non-zero. Since the condensation recurrence is completely algebraic, it commutes with ring homomorphisms: if you lift your matrix from FP to Z, compute its determinant by condensation, and then project back down, you get the correct answer over FP.

Now, this is not ideal, because working over Z means dealing with integers that get very, very large. If your matrix is, say, 100 by 100, even if you start out with numbers in the range zero to P minus 1, the determinant over Z might have a hundred digits in base P, because it's a polynomial of degree a hundred in the entries. So you get some very large integers that you don't want to deal with, when of course you only want the answer mod P.

So what Robbins said was: you can't quite do everything mod P, because you have to do these divisions along the way; you might divide by things which are not zero but are divisible by powers of P. So you can't work in fixed precision; you can't work in Z modulo a fixed power of P. But you can do floating point arithmetic with a fairly small relative precision. For instance, if it sits in a machine word, that makes it very efficient; if P is 2 and your relative precision is, let's say, at most 64, this works quite well. But to get an answer, you have to guarantee that the final result, the approximation of the determinant that you end up with, has accuracy at least 1. You need to know that there is at least one correct digit at the end, so that when you reduce mod P, that digit is guaranteed to be the correct value of the determinant.

So Robbins was led to test the numerical stability of condensation, and he discovered that accuracy losses don't compound the way he was expecting. Let me state what he observed. M is a square matrix with entries in Zp, and I represent each entry of the original matrix by a p-adic floating point approximation of accuracy at least R; so R is the maximum relative precision I start with. Then I carry out the condensation computation using floating point arithmetic, and along the way I divide by various p-adic numbers -- in fact, they're all p-adic integers, assuming I don't run out of digits. Let D be the maximum p-adic valuation of any denominator that I encounter. And let A be the accuracy of the computed determinant, which I'll actually measure as an absolute accuracy; in other words, I won't renormalize, I'll just take the difference between the computed determinant and the actual determinant and take its p-adic valuation. This says something slightly stronger than its relative precision -- actually, no, maybe something weaker -- but this is what I'm interested in at the end: I want to know that A is at least 1, so that the thing I computed is correct mod P, and when I reduce mod P, I'm correctly computing the determinant of the original mod P matrix.

And what Robbins observed, by doing billions of examples for small Ps -- mostly P equals 2, but I think he tried some other small Ps also -- is that the accuracy of the approximation is always the relative precision minus this maximum valuation.
In other words, the loss of accuracy is bounded by the single largest valuation of a denominator, which is essentially the single largest loss of accuracy at an individual step of the computation. This is not what you observe typically. Typically, when you lose accuracy at multiple steps, those losses compound. But you don't observe that in this case. What we have proven is a weaker statement along these lines: if you add a factor of 3 here, then you do get a true lower bound. Of course, you would like to get rid of the factor of 3, especially because experiments suggest that the original statement is really best possible; you get equality quite often here. So we would like to get rid of the factor of 3, but if I have time at the end to show a bit about how the proof goes, you will see there's a gap in our methods that prevents us from doing that. Still, we do prove a qualitative version of this observation: the loss of accuracy is controlled not by the sum of the losses of accuracy during the computation, but by the maximum, which is very much not the typical case. If you try computations not of this special form, you experience the generic case.

Okay. In order to prove this -- it's actually kind of complicated to work directly with the condensation recurrence -- our approach was to generalize first: figure out a more general class of statements of which the conjecture of Robbins is a special case, and try to give some unified proof. This can be illustrated with additional examples. Here's an example that is pretty different but has a similar shape to the condensation recurrence. This is an observation originally due to Michael Somos: if R is an integral domain and you start with X0, X1, X2, and X3, which are units in that integral domain, and then you compute the recurrence in rational functions, XN plus 4 equals XN plus 1 times XN plus 3, plus XN plus 2 squared, all divided by XN, then despite the fact that you do divisions along the way, every term of this recurrence is guaranteed to be in the original ring. Any denominators introduced by the term you divide by are actually cancelled out by divisibility in the numerator. There is an interpretation of this in terms of elliptic divisibility sequences, I suppose, which I will not introduce, but I'll give a different explanation of why this is true later.

But let me state a version of the Robbins observation for this recurrence. If I take R to be the p-adics, and again I represent each initial term of the recurrence by a p-adic floating point approximation of accuracy at least R, then compute the recurrence out to XN using floating point arithmetic, again let D be the maximum p-adic valuation of any denominator that I see, and let A denote the absolute accuracy of the computed value of XN, that is, the p-adic valuation of the true value minus the computed value. Then we prove that the same inequality as in Robbins' conjecture holds, and this time it's a theorem: the accuracy is always at least the number of correct digits you start with, minus the maximum valuation of any denominator along the way. So clearly, there are other examples where you get this control of the loss of accuracy.
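Here is a rough experiment in that spirit (my own sketch, which models p-adic floating point by exact rationals truncated to R significant base-P digits after every operation, with P equal to 2): run Somos-4 from random 2-adic units both exactly and in "floating point", record the largest denominator valuation D, and compare the absolute accuracy A of the final term with R minus D.

from fractions import Fraction
import random

def vp(x, p):
    """p-adic valuation of a nonzero rational."""
    x, v = Fraction(x), 0
    n, d = x.numerator, x.denominator
    while n % p == 0:
        n //= p; v += 1
    while d % p == 0:
        d //= p; v -= 1
    return v

def round_R(x, p, R):
    """Truncate a nonzero rational to R significant base-p digits: p^e times an R-digit mantissa."""
    e = vp(x, p)
    u = Fraction(x) / Fraction(p)**e
    return Fraction(p)**e * ((u.numerator * pow(u.denominator, -1, p**R)) % p**R)

def somos4(start, N, p, R=None):
    """Run Somos-4 for N steps; if R is given, round after every arithmetic operation.
    Returns the list of terms and the largest valuation D of any term divided by."""
    rnd = (lambda t: round_R(t, p, R)) if R else (lambda t: t)
    x, D = [Fraction(t) for t in start], 0
    for n in range(N):
        D = max(D, vp(x[n], p))
        num = rnd(rnd(x[n+1] * x[n+3]) + rnd(x[n+2] ** 2))
        x.append(rnd(num / x[n]))
    return x, D

p, R, N = 2, 16, 24
start = [random.randrange(1, p**R) | 1 for _ in range(4)]   # four random odd 16-digit units
exact, _ = somos4(start, N, p)
approx, D = somos4(start, N, p, R)
diff = exact[-1] - approx[-1]
A = vp(diff, p) if diff != 0 else float("inf")
print("D =", D, " A =", A, " R - D =", R - D)               # the theorem says A >= R - D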
And so this suggests making a general definition. If you have a recurrence defined by rational functions over Zp, you can talk about A, R and D as before: the absolute accuracy, the initial relative precision, and the maximum denominator valuation. If you always have A greater than or equal to R minus D, for any choice of the initial terms, we say the recurrence satisfies the strong Robbins phenomenon. And if you have to put a constant factor in front of the D to get the bound, I'll say it satisfies the weak Robbins phenomenon with that correction factor.

Here's an example where the correction factor is needed. This is another example investigated by Somos, the Somos recurrence of length 6: it has the same kind of pattern, XN plus 6 equals XN plus 1 times XN plus 5, plus XN plus 2 times XN plus 4, plus XN plus 3 squared, all divided by XN. You can guess what the general pattern is; I should say the pattern only works up to length seven -- if you write down the analog of length eight, it does not have the right integrality property anymore. But this one does have the integrality: if the initial terms are units in an integral domain R, then every XN is in R. I believe this can be interpreted using the analog of elliptic divisibility sequences for a certain genus two curve, but I don't remember, so don't press me on that; I can look it up if necessary. I'm not going to use that interpretation in this talk.

So this has the unexpected integrality property, but when you test the Robbins phenomenon, working over Zp with floating point arithmetic, you do not observe a bound A greater than or equal to R minus D. Experimentally you only get A greater than or equal to R minus 2D, and our results are not strong enough to prove even that; we only get a correction factor of 5. The lower the correction factor, the better the result, because you're getting a stronger lower bound on A when you subtract less. So there are examples where you do get something like the Robbins phenomenon but you need this correction factor, and not just in the proof -- sometimes even in the statement, even in the optimal statement.

There's something called the Laurent phenomenon that's been observed in algebra, I suppose, maybe combinatorial [indiscernible]. There is an incredibly rich theory of recurrences which are computed by rational functions but with the property that their terms can be expressed as Laurent polynomials in the initial data; Laurent polynomials are polynomials where you allow negative powers of the variables. So in particular, if the initial data are units, you will actually get elements of the ring you started with. If you plug in things that are not units, you will encounter some denominators, but they're controlled by the initial data. Recurrences with this property are said to exhibit the Laurent phenomenon, and there's sort of a unified theory. It doesn't completely capture the whole Laurent phenomenon, but there is a unified theory, due to Fomin and Zelevinsky, that captures many cases of it using something called the caterpillar lemma, which I won't try to state in this talk; the caterpillar is a certain graph that has a shape like a caterpillar, and it's a combinatorial lemma involving this graph. I'll show an explicit example of it on the last couple of slides.
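For reference, the same kind of quick check (again my own sketch) for the length-6 Somos recurrence defined above; the terms stay integral even though its p-adic stability needs a correction factor.

from fractions import Fraction

x = [Fraction(1)] * 6
for n in range(40):
    x.append((x[n+1] * x[n+5] + x[n+2] * x[n+4] + x[n+3] ** 2) / x[n])

print([int(t) for t in x[:13]])             # 1, 1, 1, 1, 1, 1, 3, 5, 9, 23, 75, 421, 1103
assert all(t.denominator == 1 for t in x)   # every term lands back in the integers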
If you have an example of a recurrence which has the Laurent phenomenon as explained by the caterpillar lemma, then it also exhibits the weak Robbins phenomenon for a certain correction factor, which we can write out explicitly but which is typically not the best possible one. For example, when you apply this general rule to condensation, you get the weak Robbins phenomenon with a correction factor of 3, which is the theorem I stated earlier; but it's not best possible -- the best possible correction factor should be 1. So this is a qualitative version of the Robbins phenomenon for Laurent recurrences, but it's not quantitatively best possible. And generally, if you have something not exhibiting the Laurent phenomenon, it doesn't exhibit the weak Robbins phenomenon either; the accuracies of the approximations get worse and worse as you go along.

There is a special class that we've identified that looks like it satisfies the strong Robbins phenomenon. These are things related to cluster algebras, which I won't define, but they have the property that the recurrences you look at have two monomials sitting up in the numerator divided by a single term down in the denominator. The Somos-6 recurrence doesn't have that property, but if I omit the middle term, then I have something with this, so to speak, binomial shape. This is related to cluster algebras, and it experimentally exhibits the strong Robbins phenomenon: the correction factor drops to 1 in this case. I should say that if you don't make this restriction, we conjecture, based on some evidence, that correction factors might be arbitrarily large; you can make an example of a recurrence with the weak Robbins phenomenon where the correction factor is a hundred. But for these special shapes, of which there are quite many coming from cluster algebras, you get the Robbins phenomenon with a correction factor of 1. And condensation, of course, has this shape, because it's AD minus BC divided by E. So this includes the Robbins conjecture.

So in the last three minutes or so, I want to illustrate a little bit of the algebra under the hood that goes into these statements. I won't go through this in much detail, but to get some idea of how you prove the Laurent phenomenon for, say, the Somos-4 recurrence: the rough idea is that instead of just trying to prove that these terms are Laurent polynomials in the input data, you make a stronger induction hypothesis. You carry along certain extra terms. This, of course, is the thing that computes XN plus 4; this is the thing that computes XN minus 1. So this steps the recurrence forward, this steps the recurrence back, and these two are slightly mysterious auxiliary terms, but they're cooked up so that when you try to do the induction, you can do it. For example, if you want to step the process one forward, one of the things you have to check is that this expression, with the indices shifted by one, is a Laurent polynomial. When you write it out, you substitute for XN plus 4 using the recurrence, and you get some funny things, and one of the things you get is precisely this auxiliary term. So you have something expressed both with an N plus 1 and with an N in the denominator. But you can also check by induction that any two of these four guys generate the unit ideal in the ring.
So you don't actually see any denominator: if you have two different expressions of the same quantity with denominators that are co-prime, then there can't be any persistent denominator in the result. So you actually do get, in this case, a p-adic integer.

So you can use this method to try to give an algebraic proof of strong Robbins for Somos-4. The way you set it up is: you imagine computing a sequence that starts out the same as your original sequence, but you modify the recurrence by putting in some junk terms. You put in multiplicative factors of the form 1 plus P to the R times a variable. This corresponds to the fact that, remember, your mantissas are only stored to R digits; any time you write down an approximation, the ambiguity as to which p-adic number it represents amounts to multiplying by a factor of the form 1 plus P to the R times some mystery number. So you put these factors in. You have some discretion here -- you can consolidate some of these factors, so you could even get away with just one of them -- but let me put one in on each side. And now I claim that if these things are in Zp, then the valuation of the difference, YN minus XN, is at least R minus the maximum valuation of any denominator, considering just the Ys that you encounter up through the computation of XN.

And the way you prove that is you go through the proof of the Laurent phenomenon that I sketched on the earlier slide, and you show that if you arrange the error terms so that this one is divisible by YN and YN plus 2, and this one is divisible by YN, YN plus 1 and YN plus 3 -- in other words, each error term is divisible by all of the variables among YN through YN plus 3 that are not present in its product; there could be missing variables on each side -- then YN is actually a Laurent polynomial in the input data and an ordinary polynomial in the error terms. This follows by a careful modification of the original proof, and it gives you the Robbins phenomenon with a correction factor of 3. But I observed earlier that no two of these terms can have a common factor, so really you only get a 1 here, because of these three things, only one of them can actually be contributing anything. So you really only get a 1, and that's strong Robbins in this case.

My last slide is the analog of this for weak Robbins. You can do a formally similar thing for condensation, and you get weak Robbins with a factor of 3. But in this case, it's not true that these five guys are forced to be co-prime; they could share factors -- if you just have a matrix, it's possible that many of the minors are divisible by powers of P. So you can't use the same trick as before to get the 3 down to 1, and this is why the theorem that I stated about condensation has this factor of 3 in it: I'm forced to put three terms in here to get an algebraic statement.

Now, the hope is that because this example is related to cluster algebras, if we show this to enough experts in the area of cluster algebras -- which I'm hoping to do at MSRI this fall sometime, because there's a special program -- one of them might explain why the theory of cluster algebras gives me a better algebraic statement than the one I got straight out of the caterpillar lemma.
So the hope is that by casting this in the language of cluster algebras, we can recruit some help from experts in that theory, get a better estimate, and really nail down the strong Robbins phenomenon for condensation. But this factor of 3 is as far as we've gotten right now. And my time's up, so I'll stop there. Thank you.

>> Kristin Lauter: Questions?

>>: Once you have bounded the correction factor by some constant, C plus 1 or whatever, is there a way to recover this loss of data?

>> Kiran Kedlaya: So if you know a bound on the correction factor, can you recover the missing digits?

>>: Yes.

>> Kiran Kedlaya: I suppose you mean without redoing the computation to more digits, because one thing you can certainly do is redo the computation with higher initial precision to correct for the loss that you experience along the way. I don't know a good way to recapture the precision that you've lost, other than to start with more precision in the first place. But certainly that would be the thing you would want to do: figure out what the loss of precision is, and if you've lost too much precision, go back and start with more. The Robbins phenomenon helps here because, of course, without doing the computation exactly, you don't necessarily know exactly what the loss of accuracy is compared to the exact computation. So you need this theorem to guarantee that the thing you computed approximately has a certain number of correct digits; and if you need more digits, you can go back and repeat the computation with more digits to start with.

>>: This number D, your sort of worst single loss, is there any indication that it doesn't get huge at some point?

>> Kiran Kedlaya: It could get huge. On the other hand, it is just the valuation, so to speak, of a random p-adic number. So if you really believe that, then it shouldn't be bigger than 5 with probability more than P to the minus 5. That's not a guarantee, but it's a good heuristic, and it's consistent with experiments.

>> Kristin Lauter: Other questions? Okay. Let's thank Kiran again.