>> Kristin Lauter: Okay. So hi, everyone. Thanks for coming. Welcome to the
first annual Microsoft Research Summer Number Theory Day. So I hope this will
become a tradition and we have five wonderful speakers here today.
First, I'd like to introduce Kiran Kedlaya, a professor of mathematics at the
University of California San Diego. And he will be speaking about the Robbins
phenomenon, p-adic stability of some nonlinear recurrences. Thank you.
>> Kiran Kedlaya: Okay. I'd like to thank Kristin for the invitation to come
visit Microsoft. This talk will be about a joint project with Joe Buhler at
Center For Communications Research in San Diego. La Jolla, really. The slides
will be posted on my website later today. Usually, I would have them up before
the talk, but I was still writing them until about ten minutes ago.
And the pre-print is not yet available either. I'm hoping that will be
available in a few weeks. If anybody is interested in more details, I can tell
you about them afterwards.
So this talk is about some unexpected numerical stability in p-adic floating
point arithmetic. So what I'm going to do is first talk about p-adic numbers
and what floating point arithmetic is for p-adic numbers, and then introduce
some examples of computations that you can do with p-adic numbers that have
some kind of unexpected numerical stability, and finally, at the end, say a
little bit about how you prove some partial results towards the conjectures
that we observe numerically. We do not have complete understanding of the
phenomena here, but we do have some partial results.
Okay. So that's essentially the outline that I just went through. Okay.
P-adic numbers and floating point arithmetic. So probably most people here know
what p-adic numbers are. But just to make absolutely sure, I'll say it several
different ways on this slide. So P will always be a prime number in this talk.
I didn't write that explicitly, but you can guess from context. So P will be
your favorite prime number. And Zp will denote not the ring of integers mod
P, which it sometimes does, but the ring of p-adic integers.
So the p-adic integers can be conceived of in at least three different ways.
Actually, well, there's at least one way I didn't write down using Witt
vectors, but you don't need to know that. But sort of more concretely, there
are at least three different nice ways to think about the p-adics. One is to
think of doing base P arithmetic where, instead of finite strings of base P
digits -- the digits in base P are, of course, the integers from zero to P
minus 1 -- you use infinite-to-the-left strings of base P digits and just use
the normal rules of base P arithmetic to compute with them.
So you can do addition and multiplication in base P, starting with the right
and going to the left, and okay, if you -- even if you have to go infinitely
far to the left, it still makes sense to do this.
So, for instance, for P equals 2, if you take the string of all 1s, and you add
this to 1, you get a string of all zeroes. So the string of all 1s represents
the additive inverse of 1. In other words, this is minus 1 in the ring Z2.
Which may be a familiar fact if you thought about how to represent minus 1 on a
computer.
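
As a quick illustration of that fact (a small Python sketch added here, not
something from the talk), one can check that the base 2 truncations of minus 1
are strings of all 1s:

```python
# Illustrative sketch: -1 is congruent to 2**k - 1 modulo 2**k for every k,
# i.e. its last k base-2 digits are all 1s.
P = 2
for k in range(1, 9):
    digits = format((-1) % P**k, "b").zfill(k)
    print(k, digits)
    assert (-1) % P**k == P**k - 1
```
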
So you can think of it in terms of base P digits. A more sort of structured
mathematical way to think of it is to take sequences in which the nth term of
the sequence is an element of the ring of integers modulo P to the N, and the N
plus first term of the sequence reduces to the nth term of the sequence when you
reduce the modulus from P to the N plus 1 to P to the N. So these are, so to
speak, coherent sequences of the elements of the ring Z mod P to the N. In
other words, these are the elements of the inverse limit of the sequence of
rings, Z mod P to the NZ, where Z mod P to the N plus 1Z maps to Z mod P to the
NZ by reducing the modulus.
So that's a second perfectly equivalent way to think of the p-adics. And the
third, which is the one that has the most flavor of analysis, is to define the
p-adic absolute value, which is, for each integer N, you declare the size of N
to be P to the minus V sub P of N, where V sub P in this talk will always
denote the p-adic valuation; i.e., the exponent of P in the prime factorization
of N, which might be zero if P does not appear.
So the more divisible that N is by P, the smaller its p-adic absolute value
becomes.
So this is somehow inverse to the usual notion of size. But you can complete
this ring Z for this p-adic absolute value and you get the same thing that I've
described in the other two cases.
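
In symbols, the definition just described reads (a formula added here for
reference):

```latex
v_p(n) = \text{the exponent of } p \text{ in the prime factorization of } n,
\qquad
|n|_p = p^{-v_p(n)} .
```
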
And this description has the advantage that you can also apply it directly to
rational numbers because the p-adic valuation is perfectly well defined for
rational numbers too. It might be negative, if you have Ps in the denominator,
but you can still talk about rational numbers, non-zero rational numbers also
have prime factorizations, and once you convince yourself that, oh, you
should -- the absolute value of zero should, of course, be zero, that
completely defines the p-adic absolute value on rational numbers, and you can
complete Q for this p-adic absolute value and you get something called QP,
which as a ring is just Zp with P inverted. So in terms of strings, these are
again infinite to the left strings in base P, except now they're not integral.
You might have the base P version of a decimal point and finitely many digits
to the right of that.
So you have some P denominators, of course finitely many of them. So you can
represent things in this way using whatever you call a decimal point in base P,
because decimal means ten, so I don't know what you call it in base P. But
let's say you call it a decimal point. You can use this representation, or you
can use this completion representation. The inverse limit representation
doesn't work quite as well. But you can fix it if you want.
So, okay. Those are the p-adics. So that's what we're going to be talking
about during this talk. Now, why are we talking about p-adic numbers? So this
is sort of an advertisement for why the topic of this talk is of some relevance.
So this definition of p-adic numbers was given by Hensel around 1900 or so, and
the idea was to translate ideas from analysis, real analysis, complex analysis,
into number theory. And you see that most formally by using the p-adic
absolute value interpretation. So if you think of QP as the completion of Q
for the p-adic absolute value, then that is somehow analogous to completing Q
for the ordinary absolute value to get the real numbers, and certain operations
that you can run on the real numbers have parallels in the p-adics that have some
number theoretic significance.
For instance, if P is an odd prime, and N is an integer which is congruent to a
perfect square mod P, then it's an elementary exercise in number theory that it
is also a perfect square modulo P squared and P cubed and so on. And, in fact,
it is a square in Zp. It has exactly two square roots in the field QP, which
both happen to belong to the ring Zp. And you can even construct these square
roots using an analog of whatever you call this iteration, where you start --
you find a root of a polynomial by this iteration where you correct your
approximate root by subtracting the value of F divided by the value of F
prime. In other words, you correct the error as if F were linear, and you
repeatedly do that, and you get -- so, in this case if you start with an
approximation of the square root, which is correct mod P, then this iteration
will be quadratically convergent. The number of correct digits will double
each time you do this.
So it's a very efficient way. I mean, you could, of course, you could try to
compute the digits one at a time by just some elementary argument. But the
Newton iteration gives you a quadratic. It doubles the number of digits each
time. So if you wanted a thousand digits, you would need only ten steps or so.
So not just definitions, but also algorithms from analysis translate very
nicely into the p-adic framework.
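
To make the Newton iteration concrete, here is a small Python sketch (added for
illustration; the prime, the target, and the number of steps are arbitrary
choices, not anything from the talk). It lifts a square root of N mod P by the
step x goes to x minus (x squared minus N) over 2x, carried out modulo a
growing power of P:

```python
# Illustrative sketch: Newton (Hensel) lifting of a square root of N mod P.
# P, N, the starting root, and the number of steps are arbitrary choices.
P, N = 7, 2          # 2 is a square mod 7, since 3**2 = 9 = 2 mod 7
x, k = 3, 1          # x is a square root of N mod P**k
for _ in range(5):   # each step roughly doubles the number of correct digits
    k *= 2
    mod = P**k
    # Newton step x -> x - (x**2 - N) / (2x), carried out mod P**k
    x = (x - (x * x - N) * pow(2 * x, -1, mod)) % mod
print(x, (x * x - N) % P**k == 0)   # x**2 is congruent to N mod P**32
```
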
Now, we'll hear various things about p-adic numbers during the day. I suppose
the final talk today will have some p-adics in it. Just to give an advertisement
more in the direction of computational number theory, and applications in
cryptography, say, one thing that comes up in computational number theory and
cryptographic applications is you have an elliptic curve or a hyperelliptic
curve over a finite field and you want to know the Zeta function, which is
essentially the information about the number of points on the curve over all
possible finite fields.
So there are algorithms involving p-adic numbers for this, which have been
considered by a great many people. I think Satoh was the first person to do
this in the elliptic case. And then I looked at this a lot in the
hyperelliptic case. And actually, if you fire up one of the standard sort of number
theoretic computer algebra systems, you will find code written that does p-adic
arithmetic computations for this and various other problems.
So p-adic numbers are used all over number theory these days, both on the
theory side and, so to speak, on the practical side. So when you do p-adic --
so this talk will drift more to the practical side, although there's quite a
lot of algebra hiding that will poke out toward the end.
But for the moment, let me emphasize a concern on the practical side. There is
an obvious difficulty computing with p-adic numbers. And it's the same obvious
difficulty that occurs with trying to compute with real numbers. If you think
about a real number, an arbitrary real number can only be specified by an
infinite number of digits, say, in base ten. You need an infinite number of
decimal digits to exactly specify a particular real number.
There are certain real numbers that can be specified in finite data, like
rational numbers. But arbitrary real numbers cannot be described using a
finite amount of data if for no other reason than the number of things you can
describe in finite data is countable and the real numbers are uncountable.
So no cleverness is going to solve that problem. And the same thing is true of
p-adics. The cardinality of the p-adic numbers is again uncountable. It's the
cardinality of the continuum. So you're not going to be able to represent
arbitrary p-adic numbers, say, on a Turing machine. So you're not going to be
able to store them exactly on a computer.
So this is a problem. So the way you deal with this with real numbers is you
don't deal with exact real numbers. You deal with approximate real numbers and
pretend that they're good enough for your purposes. And there are various ways
to do that, which have analogs for the p-adics. And the one I'll consider in
this talk is the p-adic analog of floating point arithmetic, which I suppose
before computers was called scientific notation.
So what is floating point arithmetic? Well, what is floating point arithmetic
for real numbers? Well, if you imagine a real number and its
decimal expansion, so it will have some digits before and after the decimal
point, the way you represent something in floating point arithmetic is you
write it as a power of ten times a real number, and, of course, you can shift
powers of ten in and out of the left over real number part.
So you rescale so that there are -- well, in scientific notation, I suppose you
scale things so that there's exactly one digit in front of the decimal point. For
floating point arithmetic, maybe it's more conventional to shift it so there
are no digits in front of the decimal point and the first digit after the
decimal point is non-zero. So you normalize the co-factor in some way by
adjusting the power of ten.
So you can do the same thing in p-adics. So, right, any p-adic number can be
written as -- so any p-adic number can be written as a power of P, positive,
negative or zero, times an element of Zp which is not zero mod P. So in other
words, if you imagine the base P digits, you can always multiply by a power of
P to align the string so that the first non-zero digit occurs in the units
place. That is, you can always represent a p-adic number as a power of P
times a p-adic integer with non-zero units digit, which is to say it's a unit
in the ring of p-adic integers.
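
As a tiny sketch of that normalization (added for illustration, with an
arbitrary choice of prime), any nonzero rational number can be split into a
power of P times a p-adic unit:

```python
from fractions import Fraction

P = 5   # an arbitrary prime for the example

def vp(n):                          # p-adic valuation of a nonzero integer
    v = 0
    while n % P == 0:
        n //= P
        v += 1
    return v

def normalize(x):
    """Write a nonzero rational x as (e, u) with x = P**e * u and u a p-adic unit."""
    x = Fraction(x)
    e = vp(x.numerator) - vp(x.denominator)
    return e, x / Fraction(P) ** e

print(normalize(Fraction(50, 3)))    # 50/3  = 5**2  * (2/3), and 2/3 is a 5-adic unit
print(normalize(Fraction(7, 125)))   # 7/125 = 5**-3 * 7
```
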
So to make approximations using that normalization, what you do is you fix a
positive integer R, which is going to be the maximum relative precision of the
numbers you're writing down, and any given p-adic number will be approximated
by a rational number, which is of the form P to the E times M, where E is an
integer, which is the exponent, and M is an integer in the range from zero to P
to the R minus 1, which is not divisible by P, and this is called the mantissa.
Again, by analogy with usual floating point arithmetic.
So you strip out all powers of P so that the bit that's left over is a p-adic
integer not divisible by P, and then you truncate it so that you're only
keeping track of R digits in base P. Because you want the computations to be
finite so you truncate in that way. So that's some scheme for approximating
p-adic numbers. And, of course, when you approximate things that you really
want to compute exactly, you create some errors, and these errors will
propagate through the computation. And in floating point arithmetic, it certainly
should be a familiar phenomenon that the errors that propagate through floating
point arithmetic have the potential to get worse and worse as you go along,
because, you know, they can compound. If you're off by 10 to the minus 5 in
one quantity and 10 to the minus 5 in another quantity and you add them, the
result might be off by 2 times 10 to the minus 5.
So you have this compounding problem. You have the same thing for p-adics, but
it's not quite as bad. So let me first quantify what I mean by accuracy and
then I'll explain how accuracy degrades in floating point arithmetic.
So by the accuracy of a p-adic floating point approximation to a p-adic number,
if the p-adic number is X and the p-adic approximation is P to the E times
M, what I'll say is look at M minus P to the minus E times X. So take the
difference and divide by P to the E, sort of to renormalize and then look at
the p-adic valuation of the result unless that's negative, in which case I
don't allow accuracy to be less than zero. If the accuracy is zero, I give up
and say, well, I've lost any control.
You might prefer to have the accuracy zero still mean something and have the
accuracy minus 1 be lose all control. But to avoid any annoying corner cases,
I'm just going to declare that once the accuracy is zero, then I give up. I
have no information anymore.
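
Written out as a formula (added here for reference; it matches the verbal
definition above, with the accuracy capped below at zero):

```latex
\operatorname{acc}\bigl(p^{e} m ;\, x\bigr) \;=\; \max\bigl(0,\; v_p\bigl(m - p^{-e} x\bigr)\bigr).
```
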
So in words, what this equation is doing is counting the number of correct
p-adic digits of the mantissa starting from the right. At least that's what
it's doing if the exponent is the correct exponent for X. You could imagine
having an approximation where you even wrote down the wrong exponent, and then
it would be a really bad approximation.
So, for example, if I take minus 1, which, of course, is -- remember, that's
the p-adic number, which is the p-adic integer represented by a string of 1s,
going infinitely far to the left when P equals 2. So minus 1 in base 2, if I
write down these floating point approximations of it, where these are meant to
be integers in base 2, that's what the subscript 2 means, then this has
accuracy 3, because it has three digits correct at the end. Any digits that
happen to be correct over here don't matter, because as soon as you have an
incorrect digit, you stop keeping track.
And likewise, here you only have one correct digit before you go wrong. This
is not a valid approximation, because in my floating point approximations, I
insist that the last digit always be 1. So this is sort of -- this is not even
a valid approximation. This is a valid approximation, but it has the exponent
wrong. So if you look at what happens when you compare, well, I mean, you can
go through this definition and you'll see that it actually comes out to be
zero. So essentially, if you have the wrong exponent, the approximation is
useless.
So when I talk about the accuracy of a floating point approximation, this is
what I'm going to mean. The number of digits starting from the right that are
correct in the mantissa.
Okay. Now, how do addition and multiplication affect the accuracy of an
approximation? Well, let me start with multiplication, because that behaves
very well. If I have two numbers, X and Y, and these approximations of them,
then it's clear what I should take as a floating point approximation of X times
Y. That should be times. Let me fix that.
So that should be X times Y, so it's clear that this is somehow a reasonable
approximation to take, and, in fact, it's such a good approximation that the
accuracy is no less than the minimum of the accuracies of the original two
approximations. So if this is accurate to five digits and this is accurate to
seven digits, then this approximation will be accurate to at least five digits.
So in some sense, there's no additional loss of accuracy when you do a
multiplication. The only degradations of the approximation are the ones that
came in to the two inputs. And they don't even compound. You just take the
worse of the two. So this is, in some sense, even better than real numbers.
In real numbers, you always have to compound -- you always have to add the
losses of -- the error terms.
But here, you just, you take the minimum of the two accuracies and that's the
accuracy of the approximation of the product.
But addition is not quite so straightforward. So what do you do? Of course, you
should try adding these
two approximations but, of course, this will just give you some number which
you have to rewrite in the form P to the exponent times mantissa, and then you
have -- so first, you have to collect the powers of P, and then you might have
to round the mantissa, because the mantissa might have gone out of your range.
You may have to truncate it to R digits.
So, for example, if E1 is strictly less than E2, then the power you're going to
get is just E1, because when you add these two things together, M1 plus -- that
should be sub. So when you add these two things together, M1 is not divisible
by P, and this is a positive power of P times a p-adic integer. So this is
divisible by P. So the result is not divisible by P, because you have one
thing with nonzero units p-adic digit and one thing with zero units
p-adic digit.
So this is now a valid mantissa, except for the fact that it might have too
many digits. So when I write brackets, I mean round to R digits. So
if these two things have different valuations, then all you have to do is
round. And if you stare at this for a minute, you see that the accuracy you
get is no less than the minimum accuracy of the two original approximations.
So in this case, again, the operation is exact. You don't experience any
further loss of accuracy beyond what came into the computation.
And likewise, the thing is completely symmetric. So if E1 is bigger than E2,
the same argument applies in reverse. So where do you lose accuracy? You lose
accuracy if the two things, if the two numbers you're putting in have the same
p-adic valuation. Or you may lose accuracy in that case. And the reason is
because in this case, it looks good at first. You factor out P to the E1 and
you just add the two mantissas. But, of course, this is only a valid floating
point approximation if the sum of M1 and M2 is not divisible by P. So if it is
divisible by P, here you get a problem.
If the valuation is F, strictly bigger than zero, you then have to shift a
power of P to the F from the sum of the mantissas into P to the E1 before you
do the rounding. In terms of digits, this has the following effect: you start
with, say, R p-adic digits which you know are correct, say, if you have no
loss of accuracy beforehand. So then you add these two things, and the sum of
two things which are known to R digits is going to be known to R digits, but
then you have to divide by P to the F. So you shift -- you have all these
zeroes at the right that you have to shift out.
And what comes in at the left may or may not be the correct p-adic digits
anymore for the things you're actually computing.
So you're essentially adding as many garbage digits at the left as you have
zeroes that appear on the right. So this is the source of compound -- this is
the source of loss of accuracy. And you might expect, if you perform a
sequence of arithmetic operations, using p-adic floating point arithmetic,
typically, you experience progressive loss of accuracy over the course of the
computation, because you encounter this situation over and over again, and you
kind of introduce more and more garbage digits.
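
Here is a small Python sketch of that digit loss (added for illustration; the
prime, the precision, and the two inputs are hand-picked so the effect is
visible, not an example from the talk):

```python
P, R = 2, 8                      # prime and relative precision, chosen for the example

def vp(n):                       # p-adic valuation of a nonzero integer
    v = 0
    while n % P == 0:
        n //= P
        v += 1
    return v

# two 2-adic units, of which only the last R base-2 digits are stored
x_exact, y_exact = 385, 171      # 0b110000001 and 0b010101011
x, y = x_exact % P**R, y_exact % P**R

s = x + y                        # both inputs have valuation 0, but the sum need not
f = vp(s)                        # f zero digits get shifted out at the right ...
m = (s // P**f) % P**R           # ... so f unknown digits come in at the left

true_mantissa = (x_exact + y_exact) // P**f
print("shifted out", f, "digits")
print("low R-f digits agree:", (m - true_mantissa) % P**(R - f) == 0)   # True
print("all R digits agree:  ", (m - true_mantissa) % P**R == 0)         # False: garbage at the left
```
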
And so if you -- you can sort of witness this first hand if you do p-adic
number computations in magma or sage. Things really do get worse and worse as
you go along. And so you're forced to consider questions that are analogous
to questions in what's called numerical stability. There's quite a large
subject of mathematics devoted to the problem of doing various computations in
such a way so as to limit losses of accuracy in floating point computation.
So, for example, numerically stable linear algebra is a whole sub-field of, I
suppose, linear algebra where you are sometimes forced to do things in quite a
different way than you might expect, in order to avoid losses of precision.
So it's kind of algorithmic algebra of quite a different flavor than we might
be used to in number theory.
But if we want to do p-adic number computations, then we do have to think about
numerical stability from the p-adic point of view. So it's not completely
irrelevant to our computations how one manages loss of accuracy.
So there's a general framework you can use to try to study this, but in this
talk, I'm not going to talk about the general problem of studying p-adic
numerical stability. I want to talk about some cases of unexpected numerical
stability. So cases where you do a computation and you expect a certain amount
of loss of accuracy and, in fact, you lose much less than you thought.
These cases appear to have some deep algebraic origin, which is not completely
understood, although what we do in the rest of the talk will give some partial
explanation. I should maybe mention that for those of you who have heard me
talk about doing -- or some collaborators talk about doing computations of Zeta
functions using p-adic methods, p-adic [indiscernible] methods, there are
examples of numerical stability, of surprising numerical stability there, but
those are essentially doing linear computations, linear algebra computations.
So they have, in some sense, linear explanations.
So those led me to consider these things. Somehow my original motivation for
considering what I'm going to talk about in the rest of the talk was having
seen those examples of linear computations, but
these turn out to be much deeper. And you'll see there are a lot of nonlinear
things happening. We'll be doing a lot of divisions and encountering some
strange cases where you don't really lose precision, even though you think you
should.
So okay. So I want to -- so I'm going to go back to -- so, well, I'm not going
back in some sense. So now I want to describe an observation made by David
Robbins in the early 2000s. It was published in 2005, just after he died, and
concerns one example that he was working with that turned out to have this surprising
numerical stability. And this is the example that kind of triggered the work
that we're doing here.
And it comes from a bit of 19th century mathematics, which starts out with a
little matrix identity which is due, in this level of generality, to Jacobi.
There are some special cases due to other people, but this version is due to Jacobi.
If you take an N by N matrix. So this is an identity involving various
determinants. One of them is the determinant of the whole matrix. But you
consider various sub-matrices as well. And I'll have one on the next slide.
Let me state the identity first. So if you take the determinant of M. So M is
an N by N matrix. But now there are four different sub-matrices of size N minus 1
sitting in the top left, top right, bottom left and bottom right. So, for
instance, A is the determinant of the matrix of size N minus 1 sitting in the
top left.
So you chop off the bottom row and the right column.
And likewise, B is sitting in the top right, C is sitting in the bottom left, D
is sitting in the bottom right. And E is the N minus 2 -- determinant of the N
minus 2 matrix sitting in the middle. So you chop off all the outside rows,
columns. And then the identity of Jacobi, which is an entertaining thing to
prove if you've never seen it before, is that AD minus BC equals EF. So if you
take the determinant of the 2 by 2 matrix formed by A, B, C and D, you get
F, the determinant of the whole matrix, times the determinant of this thing in the middle.
If you imagine that these things are homogeneous polynomials of degrees N minus 1, N
and N minus 2, then you need the E in there to balance. So both sides have
degree twice N minus 1.
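
For reference (a displayed formula added here, with A, B, C, D, E, F as just
defined; the identity is sometimes called the Desnanot-Jacobi identity):

```latex
AD - BC \;=\; EF,
\qquad\text{equivalently}\qquad
F \;=\; \frac{AD - BC}{E} \quad \text{whenever } E \neq 0 .
```

This rearranged form is exactly the division step that gets used repeatedly
below.
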
So this is a fun little identity. Just to illustrate it, if I take
this 3 by 3 matrix, okay, A is the determinant of the top left corner, B is the
determinant of the top right corner, C is the determinant of the bottom left
corner, D is the determinant of the bottom right corner, E is, in this case,
the 1 by 1 determinant in the middle, so it's just the entry in the middle.
If I did a larger example, which I will do a little bit later, this will actually
itself be a determinant. And F is the determinant of the whole thing, which
since it's 3 by 3, we remember how to compute it. Turns out to be 5. And we
can check that 3 times 10 minus minus 3 times minus 5 is the same thing as 3
times 5.
So one thing you're supposed to take away from this is that it might not be
obvious when you first start doing this computation that this times this minus
this times this will be divisible by this. It's not obvious that AD minus BC
will be divisible by E. But, of course, it is because the quotient is F and
the determinant of a matrix with integer entries is an integer because we have
another way to write it as a polynomial in the entries, say, from the
definition in terms of summing over transversals.
So this identity forces an interesting divisibility property of AD minus BC,
which can be used to construct interesting examples of recurrences that involve
rational functions that give you integer entries. So, for example, there are
some examples due to Conway and Guy, maybe, of things called number friezes
that come up this way.
But I'm not going to talk about that.
I'm going to talk about a proposed
application of the Jacobi identity, which is due to a mathematician by the name
of Charles Lutwidge Dodgson, who in his spare time was the children's author
Lewis Carroll, but for the purposes of this talk, he's Dodgson, because this is
from his day job as a mathematician.
So Dodgson proposed to use Jacobi's identity as a tool to compute
determinants, because, right, essentially what it -- right, if you haven't --
well, let me just say what the algorithm is. So given a square matrix M, so in
this picture at the bottom, this is the matrix M. So I successively compute
connected minors of size K from those of size K minus 1 and K minus 2.
So if you imagine the entries of the matrix that I start with are the 1 by 1
minors. The minors are the -- are sort of determinants of the square
sub-matrices. And the 1 by 1 sub-matrices are just the entries. To get
started, I need zero by zero
sub-matrices. So I declare that the determinant of the zero by zero matrix is
1.
So I need a whole bunch of 1s to get started. And so, right, using -- so the
first step in this algorithm is I start with my 1 by 1 minors and then I
compute the 2 by 2 minors by, well, you can imagine that this is a special case
of the Jacobi identity where I take this times this minus this times this
divided by this. But, of course, it's just the usual formula for the 2 by 2
determinant. So I use Sage to do this so I hopefully got the right answer.
Modulo transcription error. Sage did it right. I hope I copied it correctly.
So when you compute the 2 by 2 -- so, for example this one is minus 6, minus 1,
minus seven and so on. Okay. Then the next -- so now I have the -- so now I
can throw away this and I keep the 1 by 1 and the 2 by 2 minors and I can use
Jacobi's identity again to compute the 3 by 3 minors. And I put some
interesting entries in the middle here so that there's some falsifiability,
there's some -- of course, a priori, there would be some possibility of getting
interesting denominators when you divide by these things. But the truth of the
Jacobi identity ensures that when I take this times this minus this times this
and divide by this, I get -- this looks like a sign error here. Oh, minus,
right. So plus 7 minus plus 4 divided by minus 3 is minus 1. So I did, in
fact, copy this correctly from Sage. And you can check the other ones
yourself.
So that gives me -- so these are the 3 by 3 minors of my original 4 by 4
matrix. So from the 2 by 2 and the 3 by 3 minors, I run the process one more
time and I get -- so I take this times this. So minus 1 times minus 2, minus 4
times minus 1, divided by minus 1, and I get plus 2, which is, in fact, the
determinant.
So Dodgson called this condensation for a natural reason. As you move along
this process, you're working with smaller and smaller matrices. So you're
somehow condensing the original matrix towards its determinant. So you start
with this 4 by 4 matrix and this auxiliary 5 by 5, which is the minors, and
then you replace it with a 4 by 4 and a 3 by 3, 3 by 3 and 2 by 2, 2 by 2 and
the 1 by 1 that you're looking for. So you condense towards the determinant.
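
Here is a short Python sketch of the condensation process just described (added
for illustration; the 4 by 4 matrix is made up for the example, and the code
works over exact rationals via Fraction so the divisions are exact). It simply
fails if one of the interior minors is zero, which is the defect discussed
below:

```python
from fractions import Fraction

def dodgson_det(M):
    """Dodgson condensation: prev holds the (k-1)x(k-1) connected minors, cur the
    k x k ones; each pass produces the (k+1)x(k+1) minors as (A*D - B*C) / E.
    Raises ZeroDivisionError if an interior minor happens to vanish."""
    n = len(M)
    prev = [[Fraction(1)] * (n + 1) for _ in range(n + 1)]   # 0 by 0 minors are all 1
    cur = [[Fraction(x) for x in row] for row in M]          # 1 by 1 minors: the entries
    for k in range(1, n):
        nxt = [[(cur[i][j] * cur[i + 1][j + 1] - cur[i][j + 1] * cur[i + 1][j])
                / prev[i + 1][j + 1]
                for j in range(n - k)]
               for i in range(n - k)]
        prev, cur = cur, nxt
    return cur[0][0]

def cofactor_det(M):             # slow independent check by Laplace expansion
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               cofactor_det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

M = [[2, 1, 3, 0],               # a made-up 4 by 4 integer matrix
     [1, 4, 1, 2],
     [0, 2, 5, 1],
     [3, 1, 0, 2]]
print(dodgson_det(M), cofactor_det(M))   # the two answers agree
```
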
Okay. So this has some nice features, not all of which were formulated by
Dodgson, but which are apparent in modern hindsight. So you can check that
this is an O of N cubed algorithm, just like any other reasonable algorithm for
computing determinants, like Gaussian elimination.
So, of course, on one hand, that means it's not going to get Henry Cohn
excited. It's not going to solve the fast matrix multiplication problem, but
it's not any worse than any other natural algorithm for computing determinants.
So it's a reasonable algorithm in terms of complexity. It also has these extra
bits of algebraic structure. The intermediate terms belong to the same ring as
the entries of the original matrix, because they are determinants of
sub-matrices. So, for example, if M has integer entries, then all of the
intermediate terms are integers, not more general rational numbers.
This, on one hand, helps reduce the size of the numbers involved, because
you're not carrying around numbers with big numerators and big denominators,
you're carrying around integers. Also, I think this is something that Dodgson
remarked on, if you're doing this by hand, which he would have been in the 19th
century. If you're doing these computations by hand, it's very useful to have
error checks to help you confirm that you didn't make a mistake. And the fact
that AD minus BC is divisible by E might be a very good check. So for hand
computations, it was actually a very nice algorithm.
An observation which I don't think Dodgson made, because he wouldn't have been
interested in large scale computations, but is relevant in the modern world, is
that condensation is highly parallelizable with very little communication.
Because each step of the operation is -- so imagine you have a square grid of
processors and each one is keeping track of a K minor and a K minus 1 minor.
Then each one wants to compute a K plus 1 minor. Well, essentially, it only
has to compute -- communicate with a couple of neighbors in order to do that.
So it has very, very good storage properties. There's very little communication and
everything can happen in parallel. I mean, it's a fantastically nice algorithm
from that point of view.
I don't actually know whether much use has been made of this in the real world.
And probably, that's because of the incredibly serious disadvantage of the
condensation algorithm, which is that it doesn't always work. This is a bit of
a problem with describing an -- I mean, I shouldn't really use the word
algorithm here, because an algorithm is generally thought to mean something
that actually succeeds in computing what it's supposed to compute. And
condensation does not do so, because there's this tiny little problem that one
of these things that you have to divide by might be zero.
This does not falsify the Jacobi identity, because that just means in this case
AD minus BC is also equal to zero. But it means that you can't solve for F in
this equation, because all you know is that zero equals zero.
So this is a problem, and Dodgson observed this. In his examples, he observed
it doesn't occur very often for him, because sort of random numbers are
generally not going to be zero with high probability. And even if they do
occur, even if you do hit a zero, well, you have some hope, because you know
other things about determinants. For instance, if you switch two of the rows
or the columns of a matrix, you change the determinant only by a sign. So
potentially, you could try to manipulate the matrix in some way to essentially
shake out one of the zero sub minors and then repeat the computation.
This is not a very systematic description of what to do, which is maybe why
condensation never really took off as a method, but one can do it.
So okay. So as I said, condensation has these incredibly good features and
this incredibly bad feature. So because of the very bad feature, it went
ignored for a very long time, until Robbins started looking at it, and Robbins
had the following idea. He was interested in computing determinants of matrices
over FP, and he realized, well, if you're working over FP, okay, you might hit
a zero at some point during the condensation process, and then you're a bit
stuck. But you could work around this problem by lifting the problem to Z so
that something that starts out being zero lifts to something which is only
guaranteed to be zero mod P. And so it has a very good chance of being
non-zero.
So since the condensation recurrence is completely algebraic, it commutes with
ring homomorphisms. So if you lift your matrix from FP to Z, compute its
determinant by condensation and then project down, you get the correct answer
over FP.
Now, this is not ideal, because working over Z means dealing with integers
that get very, very large. And if your matrix, say your matrix is 100 by 100,
even if you start out with numbers in the range of zero to P minus 1, well, the
determinant of the matrix over Z might be -- I mean, it might have a hundred digits in
base P, because it's a polynomial of degree a hundred in the entries.
So you get some very large integers that you don't want to deal with. But,
of course, you only want the answer mod P. So what Robbins said was, well, you
can't quite do everything over Zp, because you have to do these divisions along
the way. So you might divide by things which are not zero, but are divisible
by powers of P. So you can't work in -- you can't work in fixed precision in
Zp. You can't work in Z modulo a fixed power of P. But you can do floating
point arithmetic with a fairly small relative precision. For instance if it
sits in a machine word, then that makes it very efficient.
So, for example, if P is 2 and your modulus is -- your relative precision is,
let's say, less than 64, this works quite well. But to get an answer, you have
to guarantee that the final result, the approximation of the determinant that you
end up with has accuracy at least 1. You need to know that there is at least
one digit correct at the end so that when you reduce mod P, that digit is
guaranteed to be the correct value of the determinant.
So Robbins was then led to test numerical stability of condensation, and he
discovered that accuracy losses don't compound the way he was expecting. So
let me state what he observed. So M is a square matrix with
entries in Zp, and I'm going to represent each entry of the original matrix
with a p-adic floating point approximation of accuracy at least R. So R is
going to be the maximum relative precision I start with. And then I'll do the
computation of condensation using floating point arithmetic. And along the
way, I divide by various p-adic numbers. They're, in fact, all p-adic
integers, assuming I don't run out of digits.
So let D be the maximum p-adic valuation of any denominator that I encounter.
And then let A be the accuracy of the computed determinant, which I'll actually
compute as an absolute accuracy. In other words, I won't renormalize. I'll
just take the difference between the computed determinant and the actual
determinant and take the p-adic valuation. So this is going to say something
slightly stronger than computing its relative precision.
Actually, no, it's going to maybe say something weaker. But this is what I'm
interested in at the end. I'm interested in making sure that the -- I want to
know that A is at least 1. I want to know that the thing that I computed is
correct mod P so that when I reduce mod P, I'm correctly computing the
determinant of the original mod P matrix.
And what Robbins observed by doing billions of examples for small Ps, mostly P
equals 2, but I think he tried some other small Ps also. What he observed
numerically is that the accuracy of the approximation is always at least the relative
precision minus this maximum valuation. In other words, the loss of
precision -- the loss of accuracy is bounded by the single largest
denomination -- valuation of a denominator, which is essentially the single
largest loss of accuracy at an individual step for computation.
Which is not what you -- this is not what you observe typically. Typically,
you observe that when you lose accuracy at multiple steps, those losses
compound. But you don't observe that in this case, and what we have proven is
a weaker statement along these lines. We've proved that if you add a factor of
3 here, then you do get a true lower bound.
Of course, you would like to get rid of the factor of 3, especially because
experiments suggest that this is really best possible. You get equality quite
often here. So we would like to get rid of the factor of 3. But if I have
time at the end to show a bit about how the proof goes, you will see there's a
bit of a gap in our methods that prevents us from doing that.
But we do prove a qualitative version of this observation that the loss of
accuracy is controlled not by the sum of losses of accuracy during the
computation, but by the maximum, which is very much not the typical case. If
you just try computations of not of this special form, you experience the
generic case.
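
To give a flavor of the kind of experiment involved (a sketch added here, not
Robbins' or the speaker's actual code; the prime, the precision, the matrix
size, and the random seed are arbitrary choices), one can implement a toy
p-adic floating point type, run condensation with it, and record R, the
maximum divisor valuation D, and the absolute accuracy A of the result:

```python
import random
from fractions import Fraction

P, R = 2, 16                              # prime and relative precision (arbitrary choices)

def vp(n):                                # p-adic valuation of a nonzero integer
    v = 0
    while n % P == 0:
        n //= P
        v += 1
    return v

class PFloat:
    """Toy p-adic float: the value P**e * m, with 0 < m < P**R and P not dividing m."""
    def __init__(self, e, m):
        self.e, self.m = e, m

    @staticmethod
    def from_int(n):                      # n a nonzero integer
        e = vp(n)
        return PFloat(e, (n // P**e) % P**R)

    def value(self):                      # the rational number this float denotes
        return Fraction(P) ** self.e * self.m

    def __mul__(self, other):             # no accuracy loss beyond the inputs'
        return PFloat(self.e + other.e, self.m * other.m % P**R)

    def __sub__(self, other):             # this is where accuracy can be lost
        a, b = (self, other) if self.e <= other.e else (other, self)
        sign = 1 if a is self else -1
        d = sign * (a.m - b.m * P**(b.e - a.e)) % P**R
        if d == 0:
            raise ZeroDivisionError("total cancellation: all accuracy lost")
        f = vp(d)                         # f garbage digits enter at the left
        return PFloat(a.e + f, d // P**f % P**R)

    def __truediv__(self, other):
        return PFloat(self.e - other.e, self.m * pow(other.m, -1, P**R) % P**R)

def condense(M):
    """Condensation in p-adic floating point; returns (approximate det, max divisor valuation)."""
    n = len(M)
    prev = [[PFloat(0, 1)] * (n + 1) for _ in range(n + 1)]
    cur = [[PFloat.from_int(x) for x in row] for row in M]
    D = 0
    for k in range(1, n):
        nxt = []
        for i in range(n - k):
            row = []
            for j in range(n - k):
                e = prev[i + 1][j + 1]
                D = max(D, e.e)           # record the valuation of every divisor
                row.append((cur[i][j] * cur[i + 1][j + 1]
                            - cur[i][j + 1] * cur[i + 1][j]) / e)
            nxt.append(row)
        prev, cur = cur, nxt
    return cur[0][0], D

def exact_det(M):                         # exact determinant by Laplace expansion (small n)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * exact_det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

random.seed(0)
n = 6
M = [[random.randrange(1, P**R) for _ in range(n)] for _ in range(n)]
approx, D = condense(M)
diff = approx.value() - exact_det(M)
A = R if diff == 0 else vp(diff.numerator) - vp(diff.denominator)   # absolute accuracy
print("R =", R, " D =", D, " A =", A, " R - D =", R - D)
```
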
Okay. So in order to prove this, so it's actually kind of complicated to work
directly with the condensation recurrence. So our approach to proving this was
actually to generalize first: try to figure out a more general class of statements
of which the conjecture of Robbins is a special case and try to give some
unified proof of these.
So this can be illustrated by taking additional examples. So here's an example
that is pretty different but has a similar shape to the Robbins -- to the
condensation, the Dodgson recurrence.
This is an observation of Michael Somos originally, that if you start with four
elements of a ring, say R is an integral domain, and you start with X0, X1,
X2, and X3, which are units in that integral domain and then you compute this
recurrence in rational functions, XN plus 4 is XN plus 1 times XN plus 3 plus
XN plus 2 squared, all divided by XN, then despite the fact that you do divisions
along the way, every term in this recurrence is guaranteed to be in the
original ring.
So any denominators that you might see actually cancel out. Any denominators
introduced by this thing are actually cancelled out by divisibility in this
thing. And there is an interpretation of this in terms of elliptic, I suppose
elliptic divisibility sequences, which I will not introduce, but I'll give a
different explanation of why this is true later.
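
As a quick check of that integrality (a sketch added for illustration, not from
the talk), running Somos-4 in exact rational arithmetic from four 1s gives
integers every time:

```python
from fractions import Fraction

# Somos-4: x[n+4] = (x[n+1]*x[n+3] + x[n+2]**2) / x[n], starting from four units.
x = [Fraction(1)] * 4
for n in range(11):
    x.append((x[n + 1] * x[n + 3] + x[n + 2] ** 2) / x[n])

print(all(t.denominator == 1 for t in x))      # True: the divisions all come out exact
print([t.numerator for t in x][:11])           # 1, 1, 1, 1, 2, 3, 7, 23, 59, 314, 1529
```
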
But let me state a version of the Robbins observation for this recurrence. So
if I take R to be the p-adics and now again I represent each initial term of
the recurrence with a p-adic floating point approximation of accuracy at least
R, then compute the recurrence out to XN using floating point arithmetic, again
let D be the maximum p-adic valuation of any denominator that I see, and let A
denote the absolute accuracy of the computed value of XN. So the p-adic
valuation of this thing minus the computed version. Then we prove that
actually, the same inequality as in Robbins conjecture holds, and this time
it's a theorem. So we proved that the accuracy here is always at least the
number of correct digits to start with minus the maximum valuation of any
denominator along the way.
So clearly, there are other examples where you get this control of the loss of
precision, loss of accuracy. And so this suggests making a general definition.
If you have a recurrence defined by rational functions defined over Zp, you can
talk about the A, R and D as I had before, the absolute accuracy, the initial
relative precision and the denominator valuation. If you always have A greater
than or equal to R minus D for any choice of the initial terms, we'll say the
recurrence satisfies the strong Robbins phenomenon. And if you have to stick a
constant factor in front of this guy here to get the bound, then I'll say it has
the weak Robbins phenomenon with the correction factor.
So here's an example where the correction factor is needed. So this is another
example that was investigated by Somos. It's the Somos recurrence of length 6.
It has the same pattern, N plus 1, N plus 5, N plus 2, N plus 4, N plus 3
squared, all over XN. You can sort of guess what the pattern is. I should say the
pattern only works up to seven. If you figure out what the analog is with
length eight, it does not have the right integrality property anymore.
So this one does have the integrality: if X0 through X5 are units in an integral
domain, then XN is in R. I believe this can be interpreted using the analog
of elliptic divisibility sequence for a certain genus two curve, but I don't
remember so don't press me on that. I can look that up if necessary. But I'm
not going to use that interpretation in this talk.
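
The same kind of check works for Somos-6 (again an illustrative sketch,
starting from six 1s):

```python
from fractions import Fraction

# Somos-6: x[n+6] = (x[n+1]*x[n+5] + x[n+2]*x[n+4] + x[n+3]**2) / x[n]
x = [Fraction(1)] * 6
for n in range(10):
    x.append((x[n + 1] * x[n + 5] + x[n + 2] * x[n + 4] + x[n + 3] ** 2) / x[n])

print(all(t.denominator == 1 for t in x))      # True: still integers
print([t.numerator for t in x][:13])           # 1, 1, 1, 1, 1, 1, 3, 5, 9, 23, 75, 421, 1103
```
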
So this has this unexpected integrality property, but when you try the Robbins
phenomenon, when you work over Zp with floating point arithmetic, you observe
that the -- the bound A -- you don't get a bound A greater than or equal to R
minus D. You only get a bound A greater than or equal to R minus 2D
experimentally, and our results are not strong enough to prove even that. We
only get a correction factor of 5.
The lower the correction factor, the better the result is, because you're
getting a stronger lower bound on A when you subtract less stuff.
So there are examples where you do get something like the Robbins phenomenon,
but you need this correction factor. And not just in the proof. Sometimes
even in the statement, even in the optimal statements.
So there's something called the Laurent phenomenon that's been observed in
algebra, I suppose, maybe combinatorial [indiscernible]. There is an
incredibly rich theory of recurrences which are computed by rational functions
but with a property that their terms can be expressed as Laurent polynomials in
the initial data. Laurent polynomials are polynomials where you allow negative
powers of the variables. So in particular, if the initial data are units, if
you're plugging in units, you will actually get elements of the ring you
started with.
If you plug in things that are not units, you will encounter some denominators,
but they're controlled by the initial data. So recurrences that have this
property are said to exhibit the Laurent phenomenon and there's sort of a
unified theory. It doesn't completely capture the whole Laurent phenomenon,
but there is a unified theory due to Fomin and Zelevinsky that captures many
cases of the Laurent phenomenon using something called the caterpillar lemma,
which I won't try to state in this talk. The caterpillar is a certain graph
that has a shape like a caterpillar. And it's a combinatorial lemma involving
this graph. I'll show an explicit example of it on the last couple of slides.
If you have an example of a recurrence which has the Laurent phenomenon as
explained by the caterpillar lemma, it also exhibits the weak Robbins
phenomenon for a certain correction factor, which we can write out explicitly
but it's typically not the best possible one.
So, for example, for condensation, this general -- when you apply this general
rule to condensation, you get the weak Robbins phenomenon with a correction
factor of 3, which is a theorem I stated earlier, but it's not best possible.
The best possible correction factor should be 1. So this is a qualitative
version of the Robbins phenomenon for Laurent recurrences. But it's not
quantitative. It's not best possible.
And generally, if you have something not exhibiting the Laurent phenomenon, it
doesn't exhibit the weak Robbins phenomenon either. The accuracies of
approximations get worse and worse and worse as you go along.
There is a special class that we've identified that look like cases that
satisfy the strong Robbins phenomenon. These are things that are related to
cluster algebras, which I won't say what they are, but they have the property
that recurrences you look at have to have two monomials sitting up here divided
by a single term down here. So the Somos-6 sequence didn't have that property,
but if I omit the middle term, then I have something that has this, so to
speak, binomial shape. This is related to cluster algebras and it
experimentally exhibits the strong Robbins phenomenon. The correction factor
drops to 1 in this case.
I should say that if you don't make this restriction, we conjecture, based on
some evidence, that correction factors might be arbitrarily large. You can
make an example of recurrence with a weak Robbins phenomenon where the
correction factor is a hundred.
But for these special shapes, of which there are quite many that come from
cluster algebras, you get the Robbins phenomenon with correction factor 1. And I should say that
condensation, of course, has this shape because it's AD minus BC divided by E.
So this includes the Robbins conjecture. So in the last three minutes or so, I
just want to illustrate a little bit of the algebra under the hood that goes
into the statements. So I won't go through this in much detail, but I will
point out that to get some idea of how you prove the Laurent phenomenon for,
say, the Somos-4 recurrence, the rough idea is that instead of just trying to
prove that these things are Laurent polynomials in the input data, you make a
stronger induction hypothesis. So you carry along certain extra terms. This,
of course, is XN plus -- this is the thing that computes XN plus 4. This is
the thing that computes XN minus 1. So this is a term that steps the
recurrence forward. This steps the recurrence back, and these two are slightly
mysterious auxiliary terms, but they're cooked up so that when you try to do
the induction, you can do it.
So, for example, if you want to step this process one forward, one of the
things you have to check is that this thing with the indices shifted by one is
Laurent polynomial. So when you write it out, you substitute for XN plus 4
using the recurrence, and then you get some funny things. And one of the
things you get is precisely this auxiliary term, and so you have something
expressed both with an N plus 1 and with an N in the denominator. But you can
also check by induction that any two of these four guys generate the unit ideal
in the ring.
So you don't actually see any denominator, because you have two -- if you have
two co-prime -- if you have two different expressions of the same quantity,
with denominators that are co-prime, then there can't actually be any
persistent denominator in the result.
So you actually do get, in this case, a p-adic integer. So you can use this
method to try to give an algebraic proof of strong Robbins for Somos-4. The
way you set it up is okay, you imagine computing a sequence that starts out the
same as your original sequence, but you modify the recurrence by putting in
some junk terms. So you put in multiplicative factors of the form 1 plus P to
the R times a variable. And this corresponds to the fact that remember, your
mantissas are only stored to R digits. So anytime you write down an
approximation, the ambiguity as to what p-adic number it represents is
captured by multiplying by a factor of the form 1 plus P to the R times a
mystery number.
So you put these factors in. You have some discretion over how -- you can
consolidate some of these factors. So you could even get away with just one of
them here. But let me put one in on each side. And now I claim that if these
things are in Zp, then the difference between the -- the valuation of the
difference, YN minus XN, is at least R minus the maximum valuation of any
denominator. So just the Ys that you encounter up to the computation of XN.
Okay. And the way you prove that is you go through the proof of Laurent
phenomenon that I sketched on the earlier slide and you show that, well, if you
modify the error terms so that this one is divisible by YN and YN plus 2, and
this is divisible by YN, YN plus 1 and YN plus 3, in other words, each of these
things is divisible by all of the Ys, all of YN through YN plus 3 that are not
present in this product. So if you put all of them -- so there could be
missing variables on each side, then YN is actually a Laurent polynomial in the
input data and an ordinary polynomial in the error terms.
So this follows by a careful modification of the original proof, and this gives
you the Robbins phenomenon with an error factor of 3, but I observed earlier
that no two of these things can have a common factor. So really, you only get
a 1 here, because of these three things, only one of them can actually be
contributing anything. So you really only get a 1 here, and that's the strong
Robbins in this case.
So my last slide is the analog of this for weak Robbins. So you can do a
formally similar thing for Robbins and then get the weak -- for condensation
and you get weak Robbins with a factor of 3. But in this case, it's not the
case that these five guys are forced to be co-prime. They could share, right.
I mean, if you just have a matrix, it's possible that many of the minors can be
divisible by powers of P.
So you can't use the same trick as I used earlier to get rid of the -- to get
the 3 down to 1 here. And this is why the theorem that I stated about
condensation has this factor of 3 in it. It's because I'm forced to have put
three terms in here to get an algebraic statement.
Now, the hope is that because this example is related to cluster algebras, if
we show this to enough experts in the area of cluster algebras, which I'm
hoping to do at MSRI this fall sometime, because there's a special program --
if I show
this to enough experts in cluster algebras, one of them might explain why the
theory of cluster algebras gives me a better algebraic statement than the one I
got straight out of the caterpillar.
So the hope is that by casting this in the language of cluster algebras, we can
recruit some help from experts in that theory, get a better estimate and really
nail down the strong Robbins phenomenon for condensation. But as far
as we've gotten right now is this factor of 3. So my time's up so I'll stop
there. Thank you.
>> Kristin Lauter: Questions?
>>: Once you have bounded the correction factor to some constant, could be C
plus 1 or whatever, is there a way to recover this loss of data?
>> Kiran Kedlaya: So if you know a bound on the correction factor, can you
recover the missing digits?
>>: Yes.
>> Kiran Kedlaya: I suppose you mean without redoing the computation to more
digits, because that's certainly one thing you can do is you can try to just
redo the computation with higher initial precision to correct for the loss that
you experience along the way.
I don't know a good way to recapture the precision that you've lost, other than
to start with more precision in the first place. But certainly, that's the
idea -- that would be the thing you would want to do is figure out what the
loss of precision is and if you've lost too much precision, then go back and
start with more. But the Robbins phenomenon helps you show that -- I mean, the
problem is, of course, that without doing the computation exactly, you don't
necessarily know exactly what the loss of accuracy is compared to the exact
computation.
So you need this theorem to guarantee that the thing that you computed
approximately has a certain number of correct digits. And if you need more
digits, you can go back and repeat the computation with more digits to start
with.
>>: This number D, your sort of worst loss, is there any indication that it
doesn't get huge at any point?
>> Kiran Kedlaya: It could get huge. On the other hand, it is just a
valuation, so to speak, of a random p-adic number. So if you really believe
that, then it shouldn't be bigger than 5 with probability more than P to the
minus 5. I mean, that's not a guarantee, but that's a good heuristic, and
that's consistent with experiments.
>> Kristin Lauter: Other questions? Okay. Let's thank Kiran again.