>> Kristin Lauter: William Stein from UW is here to speak to us about Sage. William
is a professor in the math department, professor of number theory, and has written
several books on number theory. He's also the founder and main developer for the
Sage software project. Thank you.
>> William Stein: Thank you. So today, I'll talk about Sage. I'm going to
describe briefly the background to Sage, kind of where it came from and some of the
advantages and motivation of Sage. And then I'll just show you a couple of examples.
So that will be my talk.
First, a quick poll. How many of you use mathematical software, like Matlab or PARI
or Maple? Wow, almost every person in the audience, okay.
So here's a quick history of Sage. I personally use mathematical software a lot,
and I started using software a lot when I was a Ph.D. student at Berkeley in about
1997, and I contributed a lot to Magma, which is a closed source, but very, very
powerful computer algebra system for doing kind of cryptography, number theory,
algebra, group theory, that sort of thing. It's a lot better than some of the more
famous systems, at least for those areas of mathematics.
But in 2005, I started a new project called Sage, which mainly has, as its goal,
to be technically -- basically technically more modern than a lot of those other
systems, and it's also free, which is a big advantage.
The funding model is actually very similar to the funding for Magma. It's just that
for Magma what they do is they charge -- they have government grants and they charge
users. And with Sage, we just have government grants and maybe get some donations
from users. But it's a similar funding model. Magma's been funded for decades that
way with no trouble. I think Sage can be as well. And I think it's unnecessary
to charge users. And so far, that's proven to be the case.
So that's one of the kind of fundamental design constraints. One of the advantages
of being free is we get kind of a bigger choice of preexisting libraries to draw
on, to choose from in building Sage. We get to choose anything that's out there
that's free. And that's quite a lot of good stuff.
Sage 1.0 was released almost -- I guess to this day, I think it would be three years
ago. Maybe yesterday three years ago. But it's basically the three-year
anniversary of Sage.
Or birthday, not anniversary.
I didn't marry Sage.
So there have been a bunch of Sage Days workshops. These are where a lot of Sage
development work gets done. There are Sage Days 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
And then Sage Days 12, which I'm wearing the t-shirt for. The theme of Sage Days
12, which was two weeks ago in San Diego was to fix as many bugs in Sage as we could.
So about 18 of us spent a week around the clock fixing bug after bug after bug after
bug. I personally fixed, I think, 25 bugs. So turns out that if you take the amount
the conference cost the funder of the conference and divide by -- well, take that
divided by the number of bugs, it cost about $60 per bug. So it's pretty efficient
for Sage development. But that's how much it costs to fix a bug.
Also, the most productive Sage developers working as hard as they can for basically
full-time for about five days can fix between 20 and 25 bugs. So that's not so many.
You'd think that -- we all thought we would fix twice as many as we did when we got
going. So it's surprising how long it takes to fix a bug. Yes?
>> How long is the current bug list?
>> William Stein: There are, let's see. I have a nice little document about this
which I will just pull up. This is kind of silly, but -- let me just show you a
little thing. That's actually the wrong thing. Open report.pdf. So this is just
a summary of what happened. So those are the bugs in Sage. These are the bugs we
fixed at the workshop that we know of.
So more precisely, there were 621 known defects in Sage when we started, and we fixed
over 200 of them. And these are all listed on a public bug tracker that you can
go to. I'll show it to you. So right here, this is the Sage bug tracker, and here
you can list, for example, active tickets. Some of them are enhancements, some of
them are bugs, et cetera. And this lists 995. These are not all defects. Many
of these things are "add a new feature to Sage," et cetera. You can see each of the
known bugs.
And a lot of the kind of really -- there are some bugs which are, you know, very,
very fine corner cases that, you know, you really, they get reported, you'd like
to fix them, but they're not a big deal. And there are some that are really a big
deal. Of course, we aimed at fixing as many of the big deal bugs as we could during
Sage Days.
So I think the next Sage release, Sage 3.3, is going to be very, very good from the
point of view of having fewer bugs than any other Sage release. So that's what we did at
the last Sage Days, which was in San Diego. And that was actually -- the entire
conference was funded by the Department of Defense, because they find Sage very
useful for cryptography research so they really want Sage to thrive, and they're
funding its development.
So we have a lot of workshops, and this really got developer energy going. I think
quite a lot of developer energy got going already at Sage Days 1, which Kristin was
at, which was in San Diego three years ago, and it's sort of just grown ever since.
We also won the first prize in some international contest for software. So in the
scientific category, Sage won first prize. And we've got funding from various
organizations for work on Sage.
For example, last summer, Google funded Robert Miller, who is sitting here, to work
all summer on what --
>> [inaudible].
>> William Stein: On partition refinement functions so common [unintelligible]
type code. Sage has a really large amount of highly optimized code for working
with graphs and partitions which Robert wrote over the course of about two or three
years.
And this is new code that often is not available anywhere else.
Here's what I view Microsoft's interest in Sage as. Definitely, I'm going to tell
you what Sage is in a minute. But I just want to kind of give you kind of this
overview. So I think that my impression is that Microsoft's interest in Sage is
that, one, it could be a useful tool to support research in cryptography and related
areas such as [unintelligible] graph theory, linear algebra, signal processing and
so on. So, you know, maybe Kristin wants to compute something, and it's difficult
to do in other systems, and it can be made easy to do in Sage so she uses Sage to
do it. Sage has a lot of code that's really, really useful for doing cryptography
research.
Another reason Sage is interesting to Microsoft is in enhancing the viability of Windows
as a platform for doing high performance computing. So I mean, we're of an effort
by another part of Microsoft to fund R development. So R being the stats, the very,
very famous stats package. And the reason being that currently there's no 64-bit
version of R for Windows. It's kind of a second class citizen in Windows. There's
32 bit only. It builds on MinGW instead of MSVC, et cetera. It's not fully supported
under Windows.
It's bad from an HPC point of view.
You know, if you want to sell customers on buying a big HPC system, and installing
Windows on it, if, you know, the main software they use happens to be, say, R, and
maybe they want to do some crypto research that involves some Sage, it's kind of
bad if Sage or R doesn't work on Windows or doesn't work very, very well, because
the customer might have to use Linux on their cluster, which would be undesirable
in that you would restrict their choice.
So another reason that I think Microsoft would have an interest in Sage is that it
would be good if Sage ran very well on Microsoft's operating system, especially
64-bit Sage.
And as you'll see in a minute, Sage itself -- it's kind of like a distribution of
most of the open source math software out there. So if you make Sage run well on
a platform, you're making pretty much all the open source math software out there
run well on that platform. So Sage includes [unintelligible] PARI, Singular,
Maxima, et cetera. It includes across the board almost all the open source math
software out there.
And these are the two reasons that Microsoft is funding Sage development. So that
it will be a better tool for cryptography and other research and so that Sage will
be much better supported on Windows.
Okay. Now I'm just going to give you a little glimpse into the Sage community. Just
to emphasize, it's a free program. I've mentioned that several times now. There's
also a free web application. Anyone can go to a certain website, sagenb.org. And
I'll do that right now, and they can use Sage. There's also a Sage support mailing
list with 964 members.
So I'm going to switch to the machine that's physically located in here, so it should
switch over. And I'm running Internet Explorer, and I go to this website,
sagenb.org. Then I log in using an account I made. You can just click create a
new account and then you'd have your own account. I'll just log in.
And then I get a list of worksheets, and I can create a new one. I just made a brand
new one right here, actually. And then I can do calculations in Sage. I can do
arithmetic, two plus three. Can also draw plots. So plot, say, sine of x cubed
times cosine of x from 0 to 5. And you get a nice plot. So that's just showing
you how Sage works and how you can use it from just some random computer somewhere
by just going to this website.
I'll switch back to the presentation.
Yes?
>> Is this running locally?
>> William Stein: No. When I went to that website right here, I'll switch back
for a second. That's running on sagenb.org, which is a computer in the mathematics
department. In fact, sagenb.org is running on this hardware. So that right there,
that's Internet Explorer running on this computer right here accessing a web page
which is at University of Washington.
>> [inaudible].
>> William Stein: The only thing that I downloaded was the Javascript that defines
that web page. It's just a web app, an Ajax web app. Here's the computer it's
actually running on. This is a computer that's sitting in the basement of the
mathematics department. And it's -- this is the machine that we do most of our Sage
development on. It's four Sun Fire X4450s, each with 128 gigabytes of memory and
24 Xeon cores, and then a big disk box here.
This, what I'm showing you here, is also Sage. It looks like a presentation. It's
actually Sage, and this is running locally on my laptop. So when I do a calculation
here, this is happening completely locally, just on my laptop. This is not being
served over the web. But you can't tell the difference, except that it might be
a little faster, or maybe a little slower, depending on -- I mean, that hardware
I just showed you is faster than my laptop.
So if you type some -- the time to do the actual CPU bound work is faster if I do
it remotely. Whereas the time to maybe, you know, get the answer to display is faster
if I do it locally. So maybe this is a little snappier than if I do it on the other
one.
There's a mailing list for Sage called Sage Support, and it has, last time I checked,
it had 964 members. And I'm pulling it up right here. And now it has, whoa, wait
a minute. Did people drop out? 955. Hm. That's odd. Wow. Huh. Anyways, now
it has 955 members. Maybe several were discovered to be spammers or something. I
don't know.
Here's the Sage website.
So apparently I can show it to you directly, since I have
Internet access. So I'll just go to sagemath.org. This is the Sage website, and
there's a big download link that you can click on. This link gives you the Sage
notebook that I just showed you. There is -- if you click on quick start, it gives
you just a quick summary of what Sage is about, what it can do. Kind of gives you
a little tour. And there's extensive help. If you want to know whether Sage can
do something, you can just type a question in here, and it does a search of all the
Sage documentation, the Sage users groups and so on. In fact, somebody ask: can Sage
do something? Just somebody.
>> [inaudible].
>> William Stein: Do what?
>> Anti-commuting variables.
>> William Stein: I'm going to write non-commuting variables. See what comes up.
Google found nothing. Google found 12 results. So I guess the web found something.
There was nothing in any of the Sage documentation, at least with non dash commuting.
Maybe you should just write noncommuting without the dash. Let's see. It's giving
nothing at all. Hm. I know that Sage can do things with -- noncommutative. I'll
just search for noncommutative. Ah. So quaternion algebras. So that's an
example. Matrix spaces, quaternion algebras. Working with matrices over
noncommutative rings, et cetera.
For example, if we click here on quaternion algebras, we'll get the section of the
Sage reference manual, quaternion algebras, and you can see an example of making a
quaternion algebra. And you can paste this example into Sage, click a box right
here, and now we have a quaternion algebra. And see, i times j is k, that sort of
thing. If I define it this way, then the variables are directly defined. So I can
do i times j plus k, get two times k, et cetera.
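To make the identities just mentioned concrete without Sage installed, here is a minimal plain-Python sketch of quaternion arithmetic. It assumes Hamilton's rules (i squared and j squared both equal to minus one); the algebra pasted from the reference manual in the talk may use other parameters, and the class name Quaternion is made up for this illustration and is not Sage's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """a + b*i + c*j + d*k, assuming Hamilton's rules i^2 = j^2 = -1, i*j = k.

    Hypothetical helper for illustration only; not how Sage represents
    quaternion algebra elements.
    """
    a: int = 0
    b: int = 0
    c: int = 0
    d: int = 0

    def __add__(self, other):
        return Quaternion(self.a + other.a, self.b + other.b,
                          self.c + other.c, self.d + other.d)

    def __mul__(self, other):
        a1, b1, c1, d1 = self.a, self.b, self.c, self.d
        a2, b2, c2, d2 = other.a, other.b, other.c, other.d
        # Standard Hamilton product; note that it is not commutative.
        return Quaternion(
            a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
        )


# The three basis elements.
i = Quaternion(0, 1, 0, 0)
j = Quaternion(0, 0, 1, 0)
k = Quaternion(0, 0, 0, 1)

print(i * j == k)   # i times j is k
print(i * j + k)    # two times k
print(j * i)        # the other order gives minus k
```

The last line is the noncommutative part: swapping the factors flips the sign.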
So that is the Sage website. Another neat thing that is on the Sage website is a
development map. There's acknowledgment. So this lists organizations that have
funded Sage. You can see there are a lot. There's a map which lists the people
involved with Sage development, which is getting really, really large. So that's
a list of people that have contributed code that's got into Sage. And I think now,
the number's well over 100. So a lot of different people have contributed.
In fact, one of the motivations for me when I started Sage was a conversation I had
with a graduate student at University of Texas at Austin, where he asked if I come
up with a nice algorithm, and implement it, what do I implement it in? If I implement
it in Magma, the people in Uruguay, where I'm from, are not going to be able to use
it, because Magma costs $550 at the third world discount, which is six months' salary
there or something. If I implement it in PARI, only the PARI people. What system
should I implement my code in? And Sage is an answer to that question, and a lot
of people are agreeing with that answer, as you can see by the list of people
developing code.
There are a lot of people here I've never met who have -- who are actively including
code in Sage. There's -- for example, like a week or two ago, there was some random
guy from France who I've never met who popped up and started implementing very fast
code for computing discrete logs in generic groups. He just started implementing
the standard algorithms and doing a very good job at it. That's really good. It
just sort of pops up, appears. You post the code.
We referee it. I can actually probably find the code right here.
So I think he implemented, for example -- it's not the -- it's in small groups. So
he's not implementing index calculus attacks on discrete logs. But it's this guy,
YLCHAPUY. He just popped up, posted this code. Here it is. And it's a nice little
implementation. This is just one of several things he's posting.
And what happened was he posted the code, he described it on the developer mailing
list, then he posted the code, then I asked him to, you know, explain is it faster.
And here's doing some simple discrete log in the finite field F15 in Sage 3.2.3, one
of the, I guess, the latest released version of Sage. That one calculation takes
276 seconds. And now, after applying his patch, it takes 0.14 seconds.
So that's a lot better. So it's really nice that people just notice these sorts
of things and just start speeding them up left and right.
>> [inaudible].
>> William Stein: I don't think it was using [unintelligible] at all. It was -- I
think it was using baby step giant step, yeah. So I mean, we have some very generic
code. First, it was probably making a -- doing a for loop. And then after that,
somebody implemented baby step giant step. And after that, somebody came along and
did this. I'm sure he's doing this because for his research, it's very important
that it be fast. Yes?
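For readers who have not seen the two approaches named above, the following is a minimal Python sketch of a discrete log by plain for loop and by baby step giant step, in the multiplicative group modulo a prime. It is not the patch discussed in the talk and not Sage's generic-group code; the function names are hypothetical, and it needs Python 3.8 or later for math.isqrt and negative exponents in pow.

```python
from math import isqrt


def discrete_log_naive(g, h, p):
    """Smallest x >= 0 with g**x == h (mod p), found by a plain for loop."""
    value = 1
    for x in range(p - 1):
        if value == h % p:
            return x
        value = value * g % p
    return None


def discrete_log_bsgs(g, h, p):
    """Same answer via baby step giant step, assuming p is prime.

    Uses about sqrt(p) multiplications and sqrt(p) table entries
    instead of up to p multiplications.
    """
    m = isqrt(p - 1) + 1
    # Baby steps: remember the smallest j with g^j equal to each value.
    baby = {}
    value = 1
    for j in range(m):
        baby.setdefault(value, j)
        value = value * g % p
    # Giant steps: multiply h by g^(-m) until we land in the table.
    factor = pow(g, -m, p)
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * factor % p
    return None


p = 1019                      # a small prime, chosen only for the demo
target = pow(2, 345, p)
x = discrete_log_bsgs(2, target, p)
print(x, pow(2, x, p) == target)
```

The table costs memory proportional to the square root of the group size, which is the usual trade-off against the for loop.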
>> Are there any concerns about people pulling operating code from somewhere else
or the algorithms might be patented or something like that?
>> William Stein: We have concerns, definitely. One thing we do is make sure people
copyright their code so they're at fault. And they have to release it to us under
a license that allows us to include it in Sage. That could be GPLv2+. It could
be a BSD license or it could be the most permissive Microsoft shared license, which
is GPLv2+ compatible. So that would be a possibility also. Nobody's
contributed any code under that. But my understanding is it's possible for you
to contribute code under one of Microsoft's licenses. And there is one that would
work.
That's one thing we do. The other thing is every single line of code that gets into
Sage is peer reviewed. That helps a lot. For example, this particular patch, I
reviewed it. And if you look, let's see, John Cremona also reviewed it, and he wrote
all this stuff about what the heck is going on. And then looks like Michael pointed
out that I'd already reviewed it positively. So -- but John Cremona is a pretty
famous number theorist. He's the chief editor of the London Math Society. The
Journal of the London Math Society. He's a pretty serious guy. He's also a very
involved Sage developer and he very regularly referees patches for the Sage group.
So you can see that here. And now it got merged into Sage. So it just gives you
a glimpse into how Sage development works. Lots of people all over the world do
this.
But it's totally possible that somebody could sneak some patented code in. We would
know exactly who did it. We would be able to remove it if somebody complained, but
it's totally possible it could happen. We don't have lawyers sort of, you know,
carefully looking over stuff like you guys do. But at least we would know how the
code got there, and we could get rid of it. Yes?
>> When you suspect that, do you link the code to the DLL, or do you link it directly?
If it's DLL [unintelligible], if it's directly, you have to relink it.
>> William Stein: All code we get comes in as source code. It's sort of like this
code right here literally modifies. This is a patch that modifies our core library.
When we get code submissions from other people, they're modifying how Sage is written
itself. I think that answers your question.
It would be difficult if it turns out that algorithm were patented and we had to
remove it or we had to get permission from the patent owner, whatever, it would be
difficult because we'd have to revert the patch, which would make everything slow
again, maybe some other patches people had built on top of it. It could be painful,
yes. It hasn't happened yet to us. But if it does, I think we'll know exactly where
it came from, and we'll listen to whoever is complaining and then we'll fix it.
That's the plan.
>> And you're noncommercial, which makes it easier.
>> William Stein: Yeah, we make zero money. Sage is, there's no company. I want
to make that clear. There's no company at all. All money that we get for Sage goes
into an account at the University of Washington. If anybody gives money, it's a tax
deductible contribution. It's all a not for profit type thing.
So we tend -- you know, if you want 50% of our money that we make off of Sage, you're
welcome to it, because it's 50% of zero.
All right. Here are some advantages of Sage over Mathematica, Matlab, Maple. There
are certainly many -- there are advantages and disadvantages to Sage as compared
to Matlab and Mathematica and so on. One advantage to Sage: the core language, the
language used to interact, is Python, which is a general purpose language. It's
easily one of the world's top ten most popular languages. There's definitely some
strong interest in Python at Microsoft, for example. There's the IronPython
project, which is an implementation of Python on top of the .NET Framework.
So I think Jim Hugunin or something is the guy who works here who works on IronPython.
And it will definitely be worth thinking about. As Sage gets better and better
support under MSVC and under Windows, it will be interesting to see how it can work
better with IronPython. We'll see how that goes.
Another advantage of Sage: instead of using a custom language like Matlab's very
mathematically oriented language or Maple's very odd and mathematically oriented
language, it's just a normal, general purpose language. Python is a good language
at that. It's nice to use, friendly. People pick it up really quickly in practice.
A second big advantage of Sage is it's really easy to write compiled code in Sage
that's just as fast as custom C code. That's not easy to do in Matlab, Magma or
Maple. It is easy to do in Sage. I'll show you some examples of that. That's a
huge, huge advantage of Sage. People regularly post saying why doesn't Magma do
something like this, why doesn't Maple do this?
It's really something that partly comes for free in Python. It's a pretty neat
technology that allows us to do this, called Cython. I'll show you an example of
that in a minute.
There's a lot of cool stuff that's just included in Sage, or available to Sage because
we use a general purpose language, because we use Python, and Python has, you know,
millions of users around the world so it has a lot more users than these other systems
and there's a lot of code that's a priori completely unrelated to mathematics that
you can just use from Sage. For example, say you want to write a web server in Sage.
Well, it's a few lines of code because there already exists a library called Twisted
that provides a web server.
Let's say you want to send an e-mail. This would be a spam lover's dream.
There's a command in Sage, email, that sends an e-mail message. It doesn't require
you have a mail server set up anywhere. In fact, Sage contains a very serious
implementation of a mail server itself. Sage is a mail server because the Twisted
library in Sage is a mail server. So for example -- good thing this isn't tested
because every time you test Sage, it would send an e-mail. But if I do this, you
know, you could have some calculation you're running. You want to e-mail it to
yourself to tell you that the calculation is done, it's just a command in Sage to
do that.
It, in fact, immediately returns, it starts a child process. That child process
sits there and it sends an e-mail when it's capable of connecting to the mail server.
So if I were to go check my e-mail, I would find a message there saying the calculation
is finished.
I guess I can do that. Hopefully, I don't have any embarrassing e-mails. We'll
soon find out. So there it is. The calculation is finished. So it's nice that
you can e-mail yourself like that. That's the sort of thing that you could
accomplish in some of the other Ma's, but it wouldn't be nearly as nice. I mean,
if you wanted to do it in one of the other systems, you'd have to have some other
program like mail installed on your machine and have to go through that, whereas
here it's a command that's built in, that will work on any copy of Sage anywhere,
just for example. But there's a ton of stuff like this where --
>> [inaudible].
>> William Stein: That was left over from -- I'm going to refresh this. That's
left over from this example right here. So this is Python -- the very first example
in the Python tutorial is right here. And in the Python tutorial, most examples
are taken from Monty Python's Flying Circus. Python is named after that show rather
than the snake, in case you're curious.
So here's some just -- I'm just going to show you a bunch of examples of Sage now.
The examples will range from a bunch of calculus examples to some examples you see
in Cython to speed up real world code to some number plotting examples, 2D and 3D
plotting examples.
Here's an example where I create two symbolic variables and now I can type in
expressions that involve these symbolic variables, like x to the power of y, pi,
square root of two. They're all exact objects. It has a similar feel to Maple,
Mathematica or Maxima. By default, you see the output as just a linear line, but
if you type show of something, you'll see it nicely typeset. It's actually very
nicely typeset. Here, let me zoom in on that for a second.
Actually, I can probably zoom with that, although the resolution is low. But I think
I'll zoom in just to emphasize that is not an image. It actually typesets
mathematics. I'm going to zoom in by increasing the font size of my browser. You
see that it actually got bigger. Each of those characters are individual
characters, as you can see by highlighting them. If you double click, you see the
[unintelligible] that defines it.
What I have here is jsMath. jsMath is a Javascript implementation of the TeX layout
engine. This guy, Davide Cervone, sat down with the TeXbook and implemented the TeX
algorithm, line by line in Javascript. It's a few thousand lines of Javascript,
but it's pretty cool because it's nicely integrated into the Sage notebook. What
it gives you is you can type in essentially any TeX expression or any Sage
expression. Sage expressions know how to TeX themselves, and you'll get it
beautifully typeset exactly like you want, because you like TeX display. So that's
what this is. I mean, I assume you like TeX.
Here's another example where just a little more complicated, expanding out A cubed.
Here's an example of solving an equation. So this is a cubic equation. So you can
write down an exact solution. You can, you know, of course, change the forms so
maybe I want it to be 17 times -- how about square root of 17 times a x plus b. What happens
here, secretly, behind the scenes currently is that this equation gets converted
into an equation in Maxima. Maxima solves it. Then we parse the result and that's
what's here.
It's probably the case that Maxima will not be used to do all of these sorts of solve
operations in the long run. I think that will last for a certain amount of time.
We're currently, currently a lot of the symbolic, just the pure symbolic manipulation
stuff, the very calculus oriented stuff is implemented using Maxima. We've been
doing a lot of work to move away from that. We have a native C++ library
called Pynac, which if you make your variables and just give this option, new
symbolics equals 1, then you get these Pynac variables, which they work almost the
same as the current symbolic variables, but manipulation is way faster.
For example, if I expand out A plus B plus C to the power of, say, 20, using Pynac,
it takes that long. And I think Maxima will take longer. So if I do exactly the
same thing now using -- it's about, well, so this was really -- the first one was
really 0.01, because it didn't use -- everything happened at CPU, the other
is 0.07. You can see there's a definite difference in speed. The speed difference
will become bigger if you do much larger things.
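To give a sense of the size of the object being expanded, here is a small plain-Python sketch that computes the coefficients of (a + b + c) to the power 20 directly from the multinomial formula. This is only an illustration of what the expansion contains; it is not how Pynac or Maxima carry it out, and the function name is hypothetical.

```python
from math import factorial


def expand_trinomial_power(n):
    """Coefficients of (a + b + c)**n.

    Returns a dict mapping exponent triples (i, j, k) with i + j + k == n
    to the coefficient of a^i * b^j * c^k, which is n! / (i! j! k!).
    """
    coefficients = {}
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            coefficients[(i, j, k)] = factorial(n) // (
                factorial(i) * factorial(j) * factorial(k)
            )
    return coefficients


terms = expand_trinomial_power(20)
print(len(terms))             # how many distinct monomials appear
print(terms[(20, 0, 0)])      # coefficient of a^20
print(max(terms.values()))    # largest coefficient in the expansion
```

A general symbolic engine cannot use a closed formula like this, which is part of why the choice of backend shows up in the timings.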
Our interface to Maxima is optimized but we can get a lot more out of something that's
native C++. The other nice thing about the new native C++ stuff
that we have is you can do symbolic manipulation over finite fields and other funny
base rings, which is pretty amusing.
All right. Here's an example of a huge integer determinant. I'm making a random
200 by 200 matrix. Every single entry has 128 bits. So just to show you what the
entries of the matrix look like. Here's the top left entry. So each number in the
matrix is about that big. So it's a big 200 by 200 matrix. Each entry is about
that big. And I compute the determinant, which is a very large number. And you
can see it took three and a half seconds, less than three and a half seconds. Here's
the actual determinant of that matrix. It's really, really big.
Sage is very, very good at computing determinants of matrices with large numbers.
And this example illustrates making a random matrix and then how you compute the
determinant. You call the method determinant on the object that's the matrix.
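As a rough stand-in for the demonstration above, the sketch below builds a random integer matrix with 128-bit entries and computes its exact determinant in plain Python using fraction-free (Bareiss) elimination. This is a swapped-in textbook method chosen because it is short; it is not the algorithm Sage uses and will be far slower, so the demo uses an 8 by 8 matrix rather than 200 by 200. The function names are hypothetical.

```python
import random


def random_integer_matrix(n, bits, seed=None):
    """n x n matrix (list of rows) whose entries are random integers of up to `bits` bits."""
    rng = random.Random(seed)
    return [[rng.getrandbits(bits) for _ in range(n)] for _ in range(n)]


def bareiss_determinant(rows):
    """Exact determinant of a square integer matrix by fraction-free elimination."""
    a = [list(row) for row in rows]   # work on a copy
    n = len(a)
    if n == 0:
        return 1
    sign = 1
    previous_pivot = 1
    for k in range(n - 1):
        if a[k][k] == 0:
            # Swap in a lower row with a nonzero entry in column k, if any.
            for r in range(k + 1, n):
                if a[r][k] != 0:
                    a[k], a[r] = a[r], a[k]
                    sign = -sign
                    break
            else:
                return 0              # the whole column is zero
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # The division is exact; that is the point of Bareiss.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // previous_pivot
        previous_pivot = a[k][k]
    return sign * a[n - 1][n - 1]


A = random_integer_matrix(8, 128, seed=0)
d = bareiss_determinant(A)
print(A[0][0].bit_length(), d.bit_length())   # entry size versus determinant size
```

Even at this small size the determinant has roughly n times as many bits as a single entry, which is why the number shown in the talk is so large.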
If you want to see what other things you can do with a matrix, if you do A dot and
hit the tab key -- so, let's see. How shall I do this? I guess if you pretend like
you're ready to call it and you don't know what to do. You're like, I want to call
it A dot something, but I don't know. You hit the tab key and it will show you all
the options. Like playing a game and it shows you all the directions you can go
in. Those are all the things you can do with the matrix. You could compute the
determinant. You can differentiate all the entries, get zero here, ask if it's a
square matrix. Compute the left kernel of the matrix. Compute [unintelligible]
vectors and so on. LLL reduce the matrix. LLL reduce the lattice with
a given Gram matrix, et cetera, the BKZ algorithm. So you get a big list of
possibilities.
Then if you decide oh, I want to do determinant, then if you're about to call it
and you hit the tab key again, it will tell you about the function you're about to
call. So here, you find out that this will call the determinant function. There's
a couple of different algorithms you can use. For example, NTL has a pretty good
implementation for a certain range of inputs. There's also a [unintelligible]
algorithm or you could use LinBox and so on.
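The tab-key behavior described above builds on ordinary Python introspection. As a minimal sketch of the same idea outside the notebook, the following lists the public methods of an object and pulls up the first line of a method's documentation. It uses the standard library's Fraction as a stand-in object, since Sage matrices are not available in plain Python; the helper names are hypothetical.

```python
import inspect
from fractions import Fraction


def public_methods(obj):
    """Names you could type after 'obj.' and then call."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


def first_doc_line(obj, method_name):
    """First line of the docstring of obj.method_name, or '' if it has none."""
    doc = inspect.getdoc(getattr(obj, method_name)) or ""
    return doc.splitlines()[0] if doc else ""


x = Fraction(1, 3)
for name in public_methods(x):
    print(name, "-", first_doc_line(x, name))
```

In an interactive session the same information comes from pressing tab after the dot, or from asking for help on the method.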
If you look at almost any function in Sage -- well, actually 65% of the functions
in Sage, you will find lots of examples, like this, which illustrate usage of the
function. And the examples are automatically tested on a regular basis. Before
we make a Sage release, we test every single one of these 80,000 inputs on about, I think,
20 different operating system hardware combinations.
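The examples-as-tests idea can be tried with Python's standard doctest module, which runs the examples embedded in a docstring and compares the printed results. The sketch below is a minimal illustration of that mechanism, not Sage's own test harness; the function triangular and the helper check_examples are made up for the example.

```python
import doctest


def triangular(n):
    """Return the n-th triangular number 1 + 2 + ... + n.

    EXAMPLES::

        >>> triangular(4)
        10
        >>> triangular(0)
        0
    """
    return n * (n + 1) // 2


def check_examples(func):
    """Run the examples in func's docstring; return the (failed, attempted) results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = check_examples(triangular)
print(results.failed, results.attempted)
```

If someone later changes the function so that an example no longer produces the documented output, the failed count becomes nonzero, which is the property a release process can check on every platform.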
So, for example, on one of those machines that I showed you before when I showed
you that big rack of computers, one of them is running VMware Server which has 12
different Linux distributions installed into it. So we just build Sage in all of
them from scratch, run the full test suite, and we do that before we release any
copy of Sage.
And then we also have some other hardware in other places. For example, we have
a build farm at the DOD that allows us to build Sage on all the machines that are
of interest to them. Like [unintelligible] Linux, all these sort of exotic machines
before we release Sage.
So again, there's the determinant of this matrix. Pretty big number. It takes
about three seconds.
Here's an example of how to do the same calculation using Maple. The nice thing
is you can call Maple directly from Sage. This is a kind of unique functionality
Sage has that is different than what you get in a lot of other systems. You can
call -- like the command Maple of A right there, that takes the matrix A that we
have and converts it into a Maple matrix and gives us back B. B is a reference to
a matrix in a running session of Maple.
There's one session of Maple that got started running. It's sitting there, and
inside of that session, there's now a matrix B that is equal to our matrix A. And
then by calling determinant, which is a Maple function, it computes the determinant
of that matrix. In Maple 12, there's some much faster determinant code than in Maple
11 or earlier, which would have taken hours for this. And it will do the determinant
quite quickly. Yes?
>> If you haven't paid for Maple, would this be a run time error or compile time error?
>> William Stein: Run time.
>> Run time?
>> William Stein: Yes. You'll see at run time a message that you should install Maple,
and where you can get Maple from, like the Maple website in case you've never heard
of Maple.
In fact, how about if we switch over to -- remember, I was running this on a public
server anyone can log into. I don't want to get in trouble. Maple better not work
here. If it does, Maple is going to sue me for violating my license agreement.
So here's what you get. If you try to run Maple on a machine that doesn't have Maple,
it says unable to run Maple. It gives kind of a teeny piece of a traceback. If
you click to the left, it gives you a lot more.
And then it gives you some hints about how to set up Maple on your computer, and
so on. Tells you where to buy Maple. Okay? So that's what happens. It happens
at run time so you don't have to worry -- there's no compile time linking to Maple,
which would be a serious worry. How do we talk to Maple without actually linking it in
somehow? We use a pseudo-TTY so we can talk to Maple.
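A minimal sketch of the run-time behavior described here, in plain Python rather than Sage's own interface code. The names run_external and ExternalProgramMissing are hypothetical, and the sketch talks to the program over ordinary pipes, not a pseudo-TTY. The point it illustrates is that the lookup happens only when the function is called, so nothing is linked ahead of time.

```python
import shutil
import subprocess


class ExternalProgramMissing(RuntimeError):
    """Raised at call time when an optional external program is not installed."""


def run_external(program, script, install_hint=""):
    """Pipe `script` into `program` and return its standard output.

    The program is looked up only when this function is called, so code
    that never calls it never needs the program to be installed.
    """
    path = shutil.which(program)
    if path is None:
        message = f"unable to run {program!r}"
        if install_hint:
            message += f"; {install_hint}"
        raise ExternalProgramMissing(message)
    # Plain pipes for simplicity; a pseudo-TTY would be used for programs
    # that only behave interactively when attached to a terminal.
    completed = subprocess.run(
        [path], input=script, capture_output=True, text=True, check=True
    )
    return completed.stdout
```

Calling run_external with a program that is not on the PATH raises the error immediately, with the hint attached, which is the same shape of message the speaker shows on the public server.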
>> If it's a run time error in a complicated program, you never know whether you're
going to get a given [unintelligible] message after midnight.
>> William Stein: Yeah. Fortunately -- so I showed you the 150 Sage developers,
and there are many thousands of Sage users. Most of them use Sage because they don't
have Maple or Mathematica or these other systems or they don't want to use them.
So the -- and it could have -- a priori, you have no idea, if you're not a Sage
developer, the direction Sage development has gone. Here's the direction. As much
as possible -- actually, I would say 100%, if you call a Sage command and it doesn't
explicitly say right in the name of the command Maple, or algorithm equals Maple
as a non-default option, it's not going to use Maple or Mathematica or Magma or any
of the other systems. There's maybe like three exceptions to that in the whole Sage
library.
So as it turns out, that's just not going to happen to you. Unless you explicitly
wrote code that explicitly calls Maple, it's not going to be calling Maple. A priori,
you could have imagined maybe Sage is this big system that assumes you have Maple
and Mathematica and actually uses it all over the place to implement stuff. That's
not how it turned out to be, though. Really, a lot of people are using Sage so they
don't have to use Maple at all.
It's just if you're working on Sage and you want to see the speeds of answers, or
you're curious, did we get the right answer? Did we get the same answer as Maple? Right
there I can check to see if the two determinants are the same. The algorithms actually
are totally different between Maple and Sage in computing the determinant. I asked
a Maple developer what was going on. He said it was using a Chinese remainder
theorem algorithm. So it's computing the determinant [unintelligible] using the
Chinese remainder theorem. Sage is using a different algorithm. And it's
nice that we get the same answer.
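A minimal sketch of the general technique named here: compute the determinant modulo several primes, then glue the residues together with the Chinese remainder theorem. This is an illustration in plain Python under stated assumptions, not Maple's or Sage's actual code. The function names, the starting point for the primes, and the use of the Hadamard bound to decide when to stop are all choices made for this example.

```python
from math import isqrt, prod


def det_mod_p(matrix, p):
    """Determinant of a square integer matrix modulo a prime p (Gaussian elimination)."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0  # a zero column below the diagonal: singular mod p
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def is_prime(n):
    """Trial division; fine for the small primes used here."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def det_via_crt(matrix):
    """Exact determinant of an integer matrix from its residues modulo several primes."""
    # Hadamard bound: |det| is at most the product of the row norms.
    bound = prod(isqrt(sum(x * x for x in row)) + 1 for row in matrix)
    residue, modulus = 0, 1
    p = 10_000
    while modulus <= 2 * bound:
        p += 1
        if not is_prime(p):
            continue
        d = det_mod_p(matrix, p)
        # CRT step: choose t so that residue + modulus * t is congruent to d mod p.
        t = (d - residue) * pow(modulus, -1, p) % p
        residue += modulus * t
        modulus *= p
    # Symmetric lift: the true determinant lies in (-modulus/2, modulus/2).
    return residue if residue <= modulus // 2 else residue - modulus
```

Each modular determinant only ever handles numbers smaller than the prime, so the arithmetic stays cheap, and the big integer appears only in the final reconstruction.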
So was there another question?
>> For example, before, when you gave the quaternion algebra example that was written
by David Kohel, that was not calling Magma?
>> William Stein: No, that was completely 100% Sage code. It was written by David
Kohel and other people have worked on it too, yep. Unless it's really, really obvious
that it is calling Magma, it's not calling Magma.
So here's the thing. The stuff that Sage calls by default to get the work done is
all included in Sage. It's -- Sage calls PARI and GAP and Singular. It calls these
systems all over the place. Like if you, let's say, let's say you want to, say you
want to make a multivariate polynomial over -- you want to try, say, the Fateman
benchmark, which is to raise a certain multivariate polynomial to a big power. Actually,
it's a little different than that.
I think it's -- I'm not even sure what it is. Here's a benchmark. That's not using
Magma. I think the actual benchmark is 20. So this computed
the product of two multivariate polynomials. The output is 216,000 characters if you
print it out. You can see it's very, very fast. Behind the scenes it used
Singular; it used a C library interface to Singular that Martin Albrecht, one of the
Sage developers, wrote. Singular is included in Sage. There's no requirement you
have Singular on your system anywhere. When you get the Sage distribution, it gives
you everything that it uses by default. You don't have to worry about this stuff.
It's here as an extra feature.
Just to emphasize the usability of this feature, here's a quote from IRC last night.
I need to shrink -- actually, that's a little hard to see. So this was some guy,
and he wrote, I can show people at work tomorrow that they don't have to abandon Matlab
if they switch to Sage. He was talking about using Sage and the interfaces to
Matlab and other systems. He was wondering about the sort of conversions you can
do between Sage and these other systems. I have no idea where this guy works, I
don't know what country he lives in, but he is very excited about Sage, wants his
coworkers at work to use it and they have a lot of Matlab code.
You know, instead of his sell being hey guys, we have to switch from Matlab to Sage,
it's, hey, let's use Sage in addition to Matlab, because Sage has nice cool features
that Matlab doesn't have. Sure, let's continue using Matlab. We have licenses.
Matlab is really a powerful, wonderful system and we can do stuff in Sage and Matlab
at the same time.
Okay. Any other questions? Okay.
Here's another example of -- I'm going to show you some plotting examples at this
point. I'm going to show you plotting examples, number theory examples and then my
talk will be over. Here's an example of making a callable symbolic expression: sine
of 3x times x plus log x plus 1 over x plus 1 squared. There you can see it. You
can do lots of things like this, like you can integrate it: f dot integrate. There's
the symbolic integral.
You can plot it. And there's what the plot looks like. The plot command's calling
convention -- the options that you can give it, what they're named, how they work --
is very similar to Mathematica.
It's like thickness, everything is almost the same as Mathematica, except you use
lower case instead of upper case. So it's lower case with underscores. That's just
a convention in Python. In Mathematica, it's camel case. You just change all that.
There's also a Matlab-style interface, if you like Matlab's plotting instead of Mathematica's
plotting. And for visualizing data, Matlab's plotting syntax is actually in many
ways a lot better than Mathematica's, whereas for messing around with mathematical
functions, like in a calculus class, Mathematica's syntax and design is, I think,
personally, better.
Here's an example where I plot the same function but instead, I import pylab.
That pylab thing, a Python thing, if I do import pylab as p, I get this object p that
has a whole bunch of functions. I can do p dot tab to see what they are. It gives
you literally everything that you'll see in Matlab for 2D plotting. With the same
inputs. It's like a compatibility layer, almost.
So for example, the plot command, if you do that you can see how it works, and it's
just like Matlab. You give the X and Y values you want to plot, and it plots them.
It has these little like line styles and they're just like the Matlab line styles
and so on. So you set up a plot by just calling plot multiple times, saying you
want axis in this point, saying you want a legend, deciding all these things, using
all these options. This guy, John Hunter for Python, he went through and implemented
everything exactly like in Matlab for 2D plotting, and that's what you're looking
at here.
Maybe not exactly like it, but it's very, very similar.
And here it is. So this looks just like what you'd see coming out of Matlab. It's
the same plot.
Here's another cool thing in Sage. You can do -- oh, I think the X is all messed
up. Let me make sure. Ah, there it is, okay. So if I call fast float, then I get
a version of this function such that when you call it, it's really ridiculously fast.
It only takes 518 nanoseconds to call it with a given floating point input. So it's
really, really fast. It's actually way faster than just defining a function in
Python and calling that function. Yes.
>> [unintelligible] log in 500 nanoseconds?
>> William Stein: Yeah. In the interpreter. In fact, it does the entire thing
in that amount of time. This guy, Robert Bradshaw, is one of my grad students at
UW. He wrote something which converts an arbitrary symbolic expression into this
call stack, and it's very, very tightly coded. It's a -- I mean, the actual
implementation is compiled. It's written in Cython, but this is all at run time.
This is done at run time, and it gives you back something that you can then call.
And this was really important because we were doing things like contour plots of
functions, 3D plots. You know, given a function, if you want a plot of the
function you have to evaluate it on a mesh. It was ridiculously slow.
Literally, every time it would evaluate one of these functions it would call up to
Maxima. Maxima would plug numbers in, simplify the result, and we would parse that.
You'd try to plot a function, and it would spend 30 seconds trying to evaluate the
function on a big mesh. You'd put in all the defaults so it would only evaluate
15 points in each direction or something ridiculous.
Robert thought this was completely stupid so he spent a weekend and he wrote this
fast float thing. It works not just on single variable functions. You can make up
a function of many variables and it works just as well. X cubed times sine of X
squared plus cosine of X times Y minus 1 over X plus Y plus Z. So it's three variables
and now you can make a fast float thing on those three variables, and then call
it. And let's see how fast it does that. So 715 nanoseconds to evaluate that
expression on these three input variables.
So it's very, very fast. You remember the other one was 500 nanoseconds. And this
is pure Python. So if you do the same thing in Python, exactly the same expression
as I had before, it's over ten times longer. So we're really beating what you get
from the Python interpreter. So yeah, he just wrote this, and it's very, very nice.
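A minimal sketch of the idea behind this, assuming a hypothetical nested-tuple representation for expressions: flatten the expression tree once into a postfix program, then evaluate that program on a stack every time the function is called. The names compile_expr, UNARY and BINARY are invented for the illustration, and the grouping of the three-variable expression is one reading of what was said. This pure-Python version shows only the structure; the speed quoted in the talk comes from the real implementation being compiled Cython code.

```python
import math
import operator

UNARY = {"sin": math.sin, "cos": math.cos, "log": math.log}
BINARY = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "pow": operator.pow,
}


def compile_expr(expr, variables):
    """Flatten an expression tree into postfix instructions; return a callable."""
    program = []

    def emit(node):
        kind = node[0]
        if kind == "var":
            program.append(("load", variables.index(node[1])))
        elif kind == "const":
            program.append(("push", float(node[1])))
        elif kind in UNARY:
            emit(node[1])
            program.append(("unary", UNARY[kind]))
        elif kind in BINARY:
            emit(node[1])
            emit(node[2])
            program.append(("binary", BINARY[kind]))
        else:
            raise ValueError(f"unknown node type: {kind!r}")

    emit(expr)  # the tree is walked once, at compile time

    def evaluate(*args):
        stack = []
        for op, arg in program:
            if op == "load":
                stack.append(args[arg])
            elif op == "push":
                stack.append(arg)
            elif op == "unary":
                stack.append(arg(stack.pop()))
            else:  # binary: the right operand is on top of the stack
                right = stack.pop()
                left = stack.pop()
                stack.append(arg(left, right))
        return stack.pop()

    return evaluate


# One reading of the spoken expression: x^3 * sin(x^2) + cos(x*y) - 1/(x + y + z)
x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
expr = (
    "sub",
    (
        "add",
        ("mul", ("pow", x, ("const", 3)), ("sin", ("pow", x, ("const", 2)))),
        ("cos", ("mul", x, y)),
    ),
    ("div", ("const", 1), ("add", ("add", x, y), z)),
)
f = compile_expr(expr, ["x", "y", "z"])
```

After compilation, f(1.5, 2.0, 0.5) only loops over a flat list of instructions, with no symbolic simplification or parsing on each call.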
There's an engineer at Newton Labs, Carl Witty; it's a place that does computer vision
in Renton. And he's been working on making a very sophisticated version of
this fast float that doesn't require the input to be a float. That's going to be
pretty cool. So it will be something like this, but for finite fields and all kinds
of other things. You can imagine this will be really useful for maybe enumerating
points on varieties over finite fields, that sort of thing.
So that's one of those cool little gems that are in Sage.
I think I'll just skip this. This is just doing integration in Sage and Maple and
comparing the answers.
Here's a cool example. So this shows how you can use pylab, just like you'd use
Matlab, for loading images and manipulating them. If you do imread, you can read
in an image. Here's a picture of Seattle. What it does, this has red, green and
blue channels. It gives you a three dimensional array. Basically it's like three
arrays, one for red, one for blue, one for green. It's a single object A that's
a three-dimensional array.
I'll show you just a little bit of it. So you can do, like, this may print out a
lot of stuff. Yeah. So that's the upper left pixel. It has these red, green and
blue values. And here, the second thing I did was I took that image and I took just
the blue channel, and that's what the right thing is.
So you can plot like this. You can also have fun. You can, for example, if you take
one minus the matrix, then plot it, it inverts it. So you can do image
manipulation mathematically like that.
You can also -- I don't know what this means, because I'm not quite sure what inverting
a -- you know, a three-dimensional array means, but I've inverted the picture. I
actually don't know what that means. I don't know if it inverts each of the three
individual matrices that give the red, green and blue channels. I just don't know.
I probably should figure that out. But it looks amusing.
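On the question of what one minus the array means: for a NumPy array, arithmetic with a scalar is applied element by element, so each channel value of each pixel is replaced by one minus itself. That is a photographic negative, not a matrix inverse. A minimal sketch with nested lists in plain Python, assuming channel values scaled to the range 0 to 1; invert_image is a hypothetical name used only here.

```python
def invert_image(image):
    """Return 1 - value for every channel of every pixel.

    `image` is a rows x columns x channels nested list of floats in [0, 1],
    mirroring the shape of the three-dimensional array described above.
    """
    return [[[1.0 - channel for channel in pixel] for pixel in row] for row in image]


# A tiny 2 x 2 image: red, black, a mixed color, and white.
image = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.25, 0.5, 0.75], [1.0, 1.0, 1.0]],
]
inverted = invert_image(image)
```

Red becomes cyan, black becomes white, and applying the operation twice gives back the original image.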
Okay. Here's an example of something called interact. You can, just by -- if you
put that little thing, that decorator, as it's called, @interact, before any
function definition, then the function becomes interactive, and you get interactive
control over the inputs to the function. Notice that this function has two inputs,
integer i, which is going to be the number of eigenvalues used to compress an image,
and a Boolean flag, whether to display the axes on the outside. And those inputs
get converted into controls. There's a slider for i and a check box for whether
or not to display axes.
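A minimal sketch of how a decorator can turn function inputs into control descriptions, using only the standard inspect module. This is not Sage's interact implementation: describe_controls, the control names, and the rule that a range default means a slider are assumptions made for the illustration, and nothing is drawn in a browser.

```python
import inspect


def describe_controls(func):
    """Attach a `controls` dict that maps each parameter to a widget description."""
    controls = {}
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if isinstance(default, bool):
            controls[name] = ("checkbox", default)
        elif isinstance(default, range) and len(default) > 0:
            # A range default is read as "slider from first to last value".
            controls[name] = ("slider", default[0], default[-1])
        else:
            controls[name] = ("input box", default)
    func.controls = controls
    return func


@describe_controls
def compress(i=range(1, 51), display_axes=True):
    """Stand-in for the image compression demo; returns a description only."""
    axes = "on" if display_axes else "off"
    return f"keep {i} eigenvalues, axes {axes}"
```

A front end would read compress.controls to draw the widgets, then call compress(7, False) again whenever a slider moves or a box is ticked.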
If I click the check box, it will redraw the -- rerun the function but with the axes
not displayed. If I drag the slider, it will change how many eigenvalues are used
for compression. There it uses seven. Hence the image is really washed out.
Whereas if I use, say, I don't know, 38, then the compressed image looks reasonably
good, though, of course, the original is better. If I use more, it's almost
indistinguishable.
But it's nice to be able to play around with this in your web browser. Like this.
This is very, very similar to Mathematica's manipulate command, except theirs
doesn't work in a web browser and is a little bit more powerful in some ways. This
is quite powerful. We have a lot of different controls. Like if you wanted to
choose a color for some reason, you can -- it will give you a color selector.
So when you click there, then it would -- that variable gets set equal to the color
you click on, et cetera. So there's lots of different things like that. And there's
little matrices. If you wanted to have a matrix, you could do M equals, say, a
matrix, maybe integers and make it a 2 by 2 matrix and there it is. And you could
fill it in. And when you change the values, then the function gets rerun, but with
M set to those values.
So it's pretty cool. It's a really, really useful idea. I would say it's a great
idea in Mathematica, but it's not actually -- Mathematica just copied it from what
had been done by other people. You can already see this in, like, some old -- there
are certainly some old Unix tools for making GUIs that do similar things. And
in the Python world, Enthought has, as sort of their core technology, something called
the Enthought Traits library, and it does essentially this, but it's just more complicated
to use.
So it's a pretty cool idea, though, to just make functions interactive like that.
And now, here's some 3D plots. So if we make up a function, like that, say sine
X minus cosine -- sine of XY minus cosine of X. Let's make it more complicated.
Make X to the power of five and what we get is a 3D plot, and you can zoom into it.
You can zoom out. You can rotate it around. And so on.
The way it works is that there's a Java applet that gets embedded in the web page
that does 3D plotting. But it only uses 2D Java primitives, so it doesn't require that
I sign any code or run anything that uses 3D acceleration. It runs on a lot of
different systems.
Here's another example that just illustrates how plotting works. There are lots
of different primitives like spheres, icosahedrons, tetrahedrons, et cetera. You
can just add them together and you get a scene. And you can give properties.
Like I'll make an orange icosahedron, I'll add it to a red sphere that's transparent,
et cetera. If I evaluate that, here's what I get. I get the scene that has those
objects in it. So I have my icosahedron, I have my sphere. One of the spheres is
transparent, and so on.
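A minimal sketch of how adding objects together can build a scene, using operator overloading in plain Python. Primitive and Scene are hypothetical classes for illustration and do no rendering; they are not Sage's 3D plotting classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One object in a scene, with a couple of display properties."""
    shape: str
    color: str = "blue"
    opacity: float = 1.0

    def __add__(self, other):
        # Adding two primitives starts a scene that contains both.
        return Scene((self,)) + other


@dataclass(frozen=True)
class Scene:
    """An ordered collection of primitives."""
    objects: tuple = ()

    def __add__(self, other):
        extra = other.objects if isinstance(other, Scene) else (other,)
        return Scene(self.objects + extra)


scene = (
    Primitive("icosahedron", color="orange")
    + Primitive("sphere", color="red", opacity=0.3)
    + Primitive("tetrahedron")
)
```

Because the sum of two objects is itself something that can be added to, scenes of any size come from the same plus sign.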
And there are also some cool things: you can set things spinning, like this. So if you
want to leave a demo running while you're teaching or something. You can, if you
want to give your audience a headache, you can do cross-eyed view. And then
everybody -- because I have zoomed in so much so that the fonts look good, it's kind
of clipping it partly. But if I -- yeah, let me -- it gets confused if you zoom
in a massive amount. That's causing trouble. All right. Yeah, I'm confusing it
by zooming so much, I think, which is unfortunate. Jeez, that does not look happy.
It's really clipped.
Okay. So I want to finish up so I better not risk too much time on this. I'm just
going to move on. Here's another example of -- so here's an example -- oh, great. I
have seriously messed this up. Let me just refresh.
So here's an example of -- oh, weird. That is really weird. Hm. Somehow, by
zooming in and out a bunch, I've confused the Java applet big time. Yeah, that's
not good. All right, well, not so good. Here's another example. So this is just
an indexed face set. So it's just a bunch of 50,000 triangles. Yes?
>> This is the second or third time you have random stuff. Where does it all come from,
and how bad is it, because there are so many bad ones out there.
>> William Stein: Ah, good question. There is a -- this guy Carl Witty I mentioned
earlier, the engineer that works at Newton Labs, he really likes random numbers.
He came up with a nice unified framework for the random numbers. There's a single
file where the random seeds are set for the various subsystems. You can do set random
seed and the GAP random number generators, all the other random number generators
in PARI, all get set.
There's one point where you can set everything. There are really good random number
generators in the GNU multi-precision library. There are good ones in Python, in
some of our libraries. The answer is there's probably at least 20 different random
number generators. There's also NumPy, which is one of the libraries included in
Sage, which has about 100 or so different distributions that it implements. So that gives
you lots of different, differently distributed random numbers.
So the answer is, there are some good random numbers. There are some bad random
numbers that are documented, and you -- and you have to look. So the answer, I guess,
to your question is it's all over the place. There's a lot of stuff. But at least
there is a good framework for setting the seed for the random number generator and
figuring out what's getting used.
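A minimal sketch of the single point where everything gets seeded, in plain Python. SeedRegistry and the subsystem names are hypothetical, and two independent random.Random generators stand in for the separate libraries; this is not Sage's framework, only the shape of the idea.

```python
import random


class SeedRegistry:
    """One place that knows how to reseed every registered generator."""

    def __init__(self):
        self._setters = {}

    def register(self, name, set_seed):
        """Remember a callable that reseeds one subsystem."""
        self._setters[name] = set_seed

    def set_random_seed(self, seed):
        """Reseed every registered subsystem from one value."""
        for name in sorted(self._setters):
            self._setters[name](seed)


# Two independent generators stand in for separate libraries.
registry = SeedRegistry()
generator_a = random.Random()
generator_b = random.Random()
registry.register("subsystem_a", generator_a.seed)
registry.register("subsystem_b", generator_b.seed)
```

After registry.set_random_seed(42), every registered generator replays the same sequence, which is what makes a computation that mixes several libraries reproducible.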
I'll go through this example very quick. I'm running out of time. Here's an example
of an e-mail that appeared on Sage Support this Saturday. It's a real world report
of a number theory calculation. I haven't thought about what he's doing. But he's
iterating over lots of values and computing the number of numbers with some property.
It's the sort of thing that you could -- it doesn't use too much, just iterating
over ranges of numbers and checking things like if something is square and so on.
The way he'd written it, it was creating a list of a billion entries in memory all
at once, which on his puny computer would run out of RAM. So there's a way to just
make a one-letter change to the function or, well, actually, change the notation
slightly. And instead of creating the entire list at once, it makes a lazy list.
And that makes it so his code would run.
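A minimal sketch of the difference between building the whole list and iterating lazily, in current Python. The property being counted (8t + 1 is a perfect square) is a stand-in chosen for this example; the calculation in the user's e-mail is not given in the talk.

```python
from math import isqrt


def is_square(n):
    """True if the non-negative integer n is a perfect square."""
    root = isqrt(n)
    return root * root == n


def count_eager(limit):
    """Materializes every candidate first; memory grows with `limit`."""
    candidates = list(range(limit))
    return len([t for t in candidates if is_square(8 * t + 1)])


def count_lazy(limit):
    """Produces one candidate at a time; memory stays constant."""
    return sum(1 for t in range(limit) if is_square(8 * t + 1))
```

Both functions return the same count, but only the lazy one could be run with a limit of a billion on a machine with little RAM.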
So I started running, and I told him this will work instead, on the Sage support
mailing list, and then I started it running. On his example, he wanted to do it for
10 to the 9th. I kind of got bored because it was sitting there for a long time.
I decided to change it to Cython code. I put %cython at the top. I import
some C library function and I declare data types. And I said everywhere T is going
to be long long. Instead of using Sage's is square, I'm going to use the C library
square root, and Y is going to be long long. So I declare some data types.
Then I hit shift enter, which I'll do right here. And then I tried this function
on a couple of inputs, and for example, for 10 to the 6th, it turns out that it's
238 times faster than the uncompiled version. So this is real world code. This
is an actual user on Saturday. And we got that big of a speed-up by using Cython,
by declaring some data types and compiling it.
>> In the background, is it actually going back and compiling [unintelligible], then?
>> William Stein: Yep, here's what happens. It takes -- if you click here, it shows
you the code I just compiled. But it turns various parts of the code into C. For
example, that code right here gets turned into this C code. And this if check gets
turned into that C code. In fact, for things that just involve C
variables and C operations, it's almost a one-to-one translation. It literally gets
identically translated to C, though the variables get a funny prefix to obfuscate them.
That runs exactly the same speed as C.
It generates a C file. Some things are more complicated, like appending to a Python
list; that is a couple of lines of Python API code. It generates this big file that
can be compiled using C. It then compiles it using a C compiler to make a shared
object library, and then it links it at run time. So that's what happens behind
the scenes. That all happens when you hit shift-enter. There's also
Fortran -- you can place Fortran into this code also. And it will do something
similar, but with Fortran code. There's a couple other things like that. For ten
to the ninth, his example took 25 seconds. I was able to rewrite it in Cython and
run it in a lot less time than it would have taken to just wait for the Python version
to finish. And it's not contrived; this happened. It's a real world example, okay?
Cython, by the way, it's a standalone project that you can get if you use Python
in any way at all. You can use Cython. It's a completely separate thing from Sage.
A lot of the development of Cython goes on at the University of Washington also.
It's still a separate project.
For the end, I'm going to show you examples of number theory. Here's making an
elliptic curve and plotting it over a finite field, which, of course, is just random
looking. It's fun to do that, because you're not supposed to. So there you are.
Here's an example of computing the [unintelligible] group structure of the elliptic
curve over a finite field. Given any elliptic curve over a finite field, Sage will
do that. It's using baby step, giant step, basically. John Cremona implemented
that.
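A minimal sketch of baby step, giant step itself, shown in the simpler setting of the multiplicative group modulo a prime rather than on an elliptic curve. This is a generic illustration of the technique, not the Sage implementation mentioned here; the function name and the choice of group are assumptions made for the example.

```python
from math import isqrt


def baby_step_giant_step(g, h, p):
    """Return x with g**x == h (mod p) for a prime p, or None if there is none."""
    m = isqrt(p - 1) + 1  # m * m >= p - 1, so every exponent is i * m + j
    # Baby steps: remember g^j for j in [0, m).
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)
        value = value * g % p
    # Giant steps: look for h * (g^-m)^i among the baby steps.
    factor = pow(g, -m, p)
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * factor % p
    return None
```

The table trades memory for time: about the square root of p multiplications and stored values, instead of trying all p - 1 exponents one by one. The same trade applies when the group is the set of points on a curve.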
Here's an example of making an elliptic curve over a 60 digit prime field and
computing -- displaying it and computing the number of points on it, which takes
about five seconds, and this uses PARI, a third-party add-on to PARI, called sea.gp.
Here's an example of computing a basis for the space of weight three modular forms of
level 12. See right there.
So there's a bunch of, all kinds of different number theory related to elliptic curves
and so on. What we need to improve is hyperelliptic curves. Hyperelliptic curves
in Sage have very little functionality. There's a very little bit.
We do have one thing. We can compute the matrix of [unintelligible] acting on the
elliptic curve very efficiently and better than a lot of other -- well, anything
else. But a lot of basic operations with hyperelliptic curves are not in Sage,
which need to be there for crypto applications.
Okay. So that ends my talk. And I'm certainly out of time. Are there any quick
questions that you haven't already asked?
>> Numerical libraries.
>> William Stein: Ah, yes. So Sage is very good at that. There are two libraries
called NumPy and SciPy, which are mainly funded by and organized by this company,
Enthought, in Texas. Let me just show you their website very quickly. And they're
included in Sage. So their goal is to provide an alternative to Matlab, based on
Python. They're commercial, but everything they do is -- everything they release
is BSD licensed. And we include their code in Sage. That gives us a quite a lot
of numerical computing functionality.
So I mean, you know, they do NumPy and they do a lot of projects, and the stuff
they do is very, very good. Just to show you, just a quick example, you can do,
well, you can do things that look like you're using Sage. But behind the scenes,
it will use, like, that makes a random matrix with -- that makes a random
500 by -- that's not so useful. That's a random 500 x 500 matrix with double
precision entries. Let's say I want to do something like compute all the eigenvalues.
Hopefully this works.
So that takes under a second. And behind the scenes, this is using NumPy, which
has stuff that's built on top of LAPACK and BLAS and so
on. Sage incorporates ATLAS, so anytime you get Sage, it has an optimized build of
ATLAS in it. That gives us numerical linear algebra. There's a ton of numerical
linear algebra. If you do import -- let's just say there's a lot. There's also
a lot of numerical optimization, et cetera. So that's all part of the SciPy library
import. For example, import SciPy dot special, that will show you a bunch of
numerical special functions, much of which wraps Fortran libraries, the same Fortran
libraries that Matlab is wrapping.
So you can see there's a ridiculous -- I mean, I'm going on for pages and pages and
pages with special functions that are in SciPy. Or let's do SciPy dot optimize.
Import SciPy dot optimize, and then SciPy dot optimize -- oops. And then you see
that there's a lot of implementations of things. And they're not just like some
cheesy implementation, like fsolve. It's a pretty sort of big thing, as you can see.
There's a ton of different options. And I think this is, presumably this is wrapping
some existing Fortran library. I think the government sponsored the development
of a lot of public domain Fortran code over the years and that's incorporated into
Sage via SciPy. Yes?
>> Like what about debugging? Debugging scripts.
>> William Stein: There's an interactive debugger that you can only use from the
Sage command line. So if I start up Sage over there, then there's -- let's see.
There's this project, IPython, that makes a very nice interactive command line which
Sage uses. And it has integrated into it a nice interactive debugger. So if I, I don't
know, if you type %pdb and then do something that leads to an error, it will
give you a traceback, and it will dump you to the point where the error occurred
and then you can start changing variables.
So maybe I'll try to make a singular elliptic curve. That will obviously cause some
sort of error to occur. And it dumps me to this point of code, line 147 of ell_generic
where it said the invariants define a singular curve. I can actually print them.
It's now like I'm running in that scope. I can type L and see the code around
it. I can see that it computed self dot discriminant. I can call that if I want.
I could literally, I think, change the variables. I'm not sure if this will work.
And now the discriminant -- oh, because it's cached, probably. But you can
literally inspect variables locally and step through and see what happens.
It's a pretty nice debugger for an interpreter. And this is often very useful.
There's also a profiler: you can do %prun and see, you know, maybe you're
annoyed that creating elliptic curves seems to take too long. So it will tell you
what functions get called, what takes the most time, how many times each function
is called, et cetera. So there's that.
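A minimal sketch of getting the same kind of report outside the interactive prompt, using Python's standard cProfile and pstats modules. The helper profile_call and the function slow_sum are hypothetical names for this example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func(*args) under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    """A deliberately plain workload to profile."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 10_000)
```

The report lists, for each function, how many times it was called and how much time was spent in it, which is the information described above.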
There are some IDEs for developing Python code, like there's Wing IDE, which is a
commercial Windows IDE. There's, of course, Eclipse. So here's Wing Python IDE.
And since, you know, Sage really in a sense Sage is just a Python library. So any
tools out there for working with Python, you can use with Sage.
So it should have given me a screen shot, but you can kind of see there there's some
IDE here. There are a number of different IDEs. Those also have debugging tools
in them. But a lot of Sage developers, maybe we're old school, but we basically use
this command line and print statements and stuff, you know.
But there's definitely -- well, I saw a talk on the weekend by a guy from Sun
and he said they're putting a lot of effort into improving the tools for doing
basically IDEs for Python development, because they think that will make Python a
more desirable platform for a lot of developers, because right now, it's good, but
it could definitely be better. You don't have something like Visual Studio or
something.
Any other questions? Because I'm definitely over time. I don't want you to lynch
me.
>> Kristin Lauter: So let's thank William, and we're going for lunch in case anyone
wants to continue this discussion.
[applause]