>> Arkady Retik: Today in the morning when we started the event, Dennis Gannon mentioned
that there are five to ten people whom we can easily identify as thought leaders in computing. One of
them is Ed Lazowska, one of them is Dan Reed, and one of them is Dave Patterson.
And I went to Wikipedia to find what I could say about Dave Patterson. You saw a few
minutes ago the delegation from Brazil, so enthusiastic that they decided to come and take that
photograph.
And I was hoping the delegation from Italy, the delegation from Spain, and the Germans would
like to do that as well.
So who is David Patterson? He's former chair of computer science at Berkeley. He's past chair
of the Computing Research Association, CRA. He was a member of PITAC, which is the
Presidential Information Technology Advisory Committee. So he advised the President of the
United States in technology.
And he has written five books, one of which he wrote with John Hennessy. And one day I went
to his office and I saw the book translated into the Basque language. And we went through the
pages of the book and we found that even the figures are in Basque, even the comments are in
Basque.
So I asked him to sign the book and to keep the [inaudible] language. He signed the book in
Basque. So we have to go to the Web, get some [inaudible] in Basque, and he wrote that.
So he's a wonderful researcher. He has -- we cannot measure his impact by the number of papers
or the number of conferences. We have to measure it in terms of awards.
He has 30 awards in research, teaching, and service, and he leads two laboratories, the RAD Lab
and the Par Lab, and he's starting a new effort, RAMP Lab, probably you are going to talk about
it.
It's a great honor for me, it's a great pleasure for me to present him, and here he is.
>> David Patterson: Thank you.
[applause]
>> David Patterson: It's great to be here and see old friends and I hope make new friends.
So I'm going to make sure I have at least 15 minutes for questions at the end. And so I was a
little worried that I didn't have enough, so I may have to skip some slides.
And the first thing I'm going to talk about is what is cloud computing. And, you know, Juan has
a little bit of an accent, so I'm afraid some of you think I'm going to talk about this. But it's
cloud, not clod computing. This is for you Brazilians. This is somebody falling down the
staircase. That's a clod in English. Okay.
Other people think of cloud computing as clod computing. Larry Ellison, the CEO, the famous
CEO billionaire of Oracle, hates cloud computing. He said, we've redefined [inaudible]
everything we already do; all I have to do is change a couple of lines in the ads.
And so the cloud computing became extraordinarily popular and it was hard to understand what
everybody meant by it. There were strong claims that were hard to believe.
So a group of us at Berkeley including Armando Fox, is here in the audience, decided we would
write a white paper about it to see from an academic perspective what we thought was important
and what wasn't, trying to clarify these terms.
And we put the paper out there as a technical report, and thanks to
the miracle of the Web, within a year there were 50,000 downloads, making it
surely the most popular paper any of us had worked on.
And for those of you who are members of the Association for Computing Machinery, the current
issue of the Communications of the ACM has a shorter version of that article in it.
So why did we think we could offer an interesting perspective? As you'll see, we've got a really
strong engagement with industry over the many years at Berkeley. And also we've been using it
both in our research and in our teaching since 2008. We think it's a really big deal and we're
going to try and highlight what the big things are.
So this is the arrival of the long-sought idea of utility computing, as you surely heard earlier
today about how little it costs to use it. It was kind of a breakthrough in low cost with few
promises about what it would do.
And in particular, it's very interesting that there is no contract; you're just billed to the
nearest hour. That's unlike anything that anybody tried in the past, and it has certainly caught
on.
Software as a service, you could think of the old timesharing systems going back to the 1960s as
software as a service. There's a big mainframe, it runs all the software, you have this cheap
terminal and all the work's done in the thing that's far away from you.
In this hype around cloud computing or enthusiasm around cloud computing, all kinds of terms
got thrown around. And as we went through them we couldn't figure out what some of them
meant. We couldn't figure out what the difference was, hardware as a service or infrastructure as
a service or platform as a service. So we pretty much just talk about cloud computing and
software as a service.
But what we think is different is this pay-as-you-go utility computing. The illusion of infinite
resources on demand. That is a brand-new thing in computing. Always in the past, if you
wanted much more scale, you had to pay more than linearly. If you wanted a thousand times
as many computers, you might have to pay 10,000 times as much for the right to do that.
The fine-grain billing is another new thing: you can not only grow, which
computer science has worried about for a long time, but also shrink. Shrinking is
new. That never made sense before. What would you shrink? Would you burn
some of your computers? You bought them, you want to use them, and
then, well, you buy more. So shrinking is new.
And this isn't the first attempt to do utility computing over the last
decades, besides timesharing. There were other attempts by companies like Sun Microsystems
and Intel that got into utility computing and didn't really make it.
So why? So I think -- our take on it is there's this Web space race is where these companies like
Google and Microsoft and Amazon were becoming more popular, so they had to build even
bigger datacenters. And as they built them at the scales of tens of thousands and they built them
out of these commodity components, they recognized they were getting these incredible
economies of scale, surprising economies of scale.
Who would have thought that if you bought 10,000 PCs it would be tremendously cheaper than
buying 100 PCs. Maybe it's not so surprising that network bandwidth, if you buy gigabits per
second is tremendously cheaper per megabit per second than if you buy a megabit per second.
Maybe that one is not so surprising.
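As a sketch of what that kind of volume discount looks like, here is a toy pricing model; the base price and the 30%-per-decade discount are made-up numbers chosen only to show the shape of the curve, not any vendor's actual pricing.

```python
import math

def per_unit_cost(quantity, base_qty, base_unit_price, discount_per_decade):
    """Unit price if it drops by a fixed factor for every 10x in volume
    (an assumed pricing model, purely illustrative)."""
    decades = math.log10(quantity / base_qty)
    return base_unit_price * (discount_per_decade ** decades)

# Suppose 100 PCs cost $1,000 each and each 10x in volume cuts the unit
# price by 30% (factor 0.7) -- hypothetical figures.
print(per_unit_cost(100, 100, 1000.0, 0.7))              # 1000.0
print(round(per_unit_cost(10_000, 100, 1000.0, 0.7), 2)) # 490.0 per PC at 10,000 units
```

At datacenter volumes, even a modest per-decade discount compounds into a per-unit cost low enough to resell at a profit, which is the economic core of the argument above.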
But they repeatedly found, for all kinds of important costs, that there were economies of scale:
if you acquired very high volume, you got it much cheaper per unit, so much cheaper that you
could sell it and make a profit. And I think that's why this happened: they were
forced to build it, they recognized what they had, and then they started selling it.
If you had, you know, these services in these datacenters and we didn't have good connectivity,
you know, that wouldn't have done them any good. But we do.
And this commoditization of hardware and software, with the popularity of virtualization --
it runs pretty fast -- and standardized stacks, also helped. But we think primarily it was the
Web space race that did it.
So the datacenters. This new server: I'm kind of a computer architect, and you could think of
it as designing a whole building at a time. This is an effort that Microsoft is
doing. Their datacenter in Chicago can handle, I think, 2,500 containers.
We actually have a container at Berkeley next to us that we bought because we ran out of space
inside our building. And this container has almost as much usable cooling and electricity as
the whole building; increasingly it's not the space, it's the power and cooling that matters.
This center, which could hold a million servers, is pretty phenomenal by lots of
measures: 300 megawatts, dual water-based cooling. So they're building the biggest computer
collections for commercial uses that we've ever seen, and they're getting these economies of
scale and offering it for us.
As we'll see at this conference, there's a whole bunch -- different bets are going on. I think of
this as the early days of cloud computing. We're trying to see which is going to work. Now,
maybe all three of these bets will work.
You hear about Amazon EC2. That's on the left of the spectrum. That's
less managed, lower level. It's at the level of the instruction set virtual machine. That's what
they provide, and it's up to you to manage it and to scale it up.
At the other end is what Google is offering with App Engine. They'll talk about that. That's
highly managed. It will do the scaling for you. And Azure is in between.
So here are these companies with these different skills, all of whom have been successful in
providing big services, and they're thinking this will be the thing that people want. And so I
think we're going to find out over the next five years whether all of these will succeed or one will
dominate. So that's what's kind of fun being in this field, right, things change fast and it has huge
impact both on people's lives and financially. And we'll see how this all works out.
So from our perspective as cloud computing users: at least a year ago, when
Armando and I were working on this, a lot of the arguments were about the
cost-effectiveness of cloud computing. And so we wanted to add our arguments for it.
And we take the perspective of someone providing a service. So if you were trying to create a
new company, create this thing that you hope to grow, what does it look like from your
perspective?
So typically if you bought the computers yourself, you want to make sure you have enough
capacity to meet demand. All of these Internet services, even companies the size of Microsoft
and Google, will see differences between night and day and even on the weekends. There's less
work on the weekend.
So maybe there are factors of five to ten between the peak and the average demand. So if you build for the peak
demand, a lot of the time that's idle, and of course with the virtual datacenter you can grow and
shrink, right? So that's one of the -- you have less unused resources as a cloud computing user.
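The waste from provisioning for the peak can be put in one line. The demand numbers here are illustrative, using the factor-of-five gap between peak and average mentioned above.

```python
def utilization(avg_demand, provisioned):
    """Average fraction of provisioned capacity actually in use,
    when you provision for the peak (assumed, simplified model)."""
    return avg_demand / provisioned

# If demand averages 200 servers' worth of work but peaks at 1,000,
# a peak-provisioned cluster is 80% idle on average (hypothetical numbers).
print(utilization(200, 1000))  # 0.2
```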
So what's the danger here if you're a startup company doing it. Well, one danger is you think
you're going to be so popular, you overbuild, you take all your money and you build this gigantic
datacenter and actually you're not all that popular, so you sunk all that money into your
datacenter even though you hardly ever use it. So that's one danger.
The other danger is the opposite of that: You don't want to waste too much money, so you don't
have enough capacity to meet the demand. So what's going to happen. So we had two scenarios
here. One is you're providing this Internet service, and the users, they run in and then they can't
log on anymore and yet they keep coming back every day.
We think a more likely one is you become popular, you run out of your hardware capacity, those
people get frustrated by it and decide not to come back. And eventually you stop getting enough
users so you actually meet the capacity. But you've lost that revenue, you've lost those
customers, and it's not a great thing if you're a company trying to become a popular service.
So that's kind of the Internet service perspective. We think people use
the cloud in two ways: they provide these services, or they do kind of batch processing when
they have a lot of computation to do.
And what's striking to me is what we call cost associativity: you can get a thousand
CPUs for one hour, and that doesn't cost any more than one CPU for a thousand hours. That's
a remarkable statement that has never been true before.
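Cost associativity falls straight out of pure per-server-hour billing. The rate below is an assumed figure for illustration, not any provider's actual price.

```python
def cloud_cost(servers, hours, rate_per_server_hour):
    """Total bill under pure pay-as-you-go pricing: only the product
    servers x hours matters, not how it is arranged."""
    return servers * hours * rate_per_server_hour

RATE = 0.25  # assumed $/server-hour
# A thousand CPUs for one hour costs the same as one CPU for a thousand hours.
print(cloud_cost(1000, 1, RATE))  # 250.0
print(cloud_cost(1, 1000, RATE))  # 250.0
```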
So actually, how we got into this is the feedback we got on our research: we
were, like a lot of researchers, trying to do cloud computing, and our example was 20 servers in
a rack. And the feedback we got from industry was: we understand where you start, but
you have to figure out a way to do your research at the scale of a thousand servers.
And so we went back and, jeez, what are we going to do, get ten more containers, and where
were we going to get the money and the power to run that. We agreed with the comment to do
cloud computing research we had to do larger scale, but there was no way, even if we could get
the donations, that we had the manpower to operate that.
So, well, then we got interested in cloud computing and tried to find those resources remotely.
So within, I'd say, Armando can correct me, less than six months [inaudible] of being told
we'd need to figure out a way to do our research at the scale of a thousand, we
went back, thought about it, explored cloud computing opportunities, and wrote a paper
using a thousand processors [inaudible]. And I think that's right. And that was
accepted at a conference. We went from like 20 to a thousand, and we couldn't have done it
without cloud computing.
So that's from a research approach. And then kind of the poster child that Amazon talks about or
used to talk about was Animoto, the startup company that they were -- they'd kind of do a music
video for you automatically. You'd supply some pictures and some music, and they'd create a
music video for you.
Well, somebody finally plugged it into Facebook a few years ago, and for three days they
doubled every 12 hours. So they went from needing 50 servers to 3,500 servers in three days.
There's no way, as a startup company, you could have planned to have those resources around.
And not only could they grow; they could shrink after things calmed down,
so they didn't end up with orders for 3,500 servers
sitting there idle.
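The Animoto numbers are roughly self-consistent, which a quick check shows, taking the doubling period and the starting server count as quoted above.

```python
# Doubling every 12 hours for 3 days: 72 / 12 = 6 doublings.
start_servers = 50
doublings = (3 * 24) // 12
peak_servers = start_servers * 2 ** doublings
print(peak_servers)  # 3200, in the neighborhood of the ~3,500 quoted
```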
And then from a business's perspective, often you couldn't do this even if you wanted to. Your
IT organization wouldn't allow you to buy a lot of computers. You'd have to plan years in
advance, and they'd have to have the space, power, and cooling. So another way to think of
cloud computing is that it's like the PC revolution. It used to be that there were the mainframes,
and they would say what you could do and how many cycles you could get on your terminal, and
they controlled the computing budget.
And the personal computer unleashed that. You could take your money instead and buy a PC
and have your own computer and do whatever the hell you want and they couldn't stop you.
Right? That was a big change in the computer industry and people's experiences.
Cloud computing has some of that. Cloud computing is I'm not going to take my money and
give it to the central service; I'm instead going to take my money and go outside of my
organization and do whatever I want with it and you can't stop me.
And so that's another kind of mind-expanding opportunity here.
And then one of the things we talk about a couple times is you don't have to pick one or the
other. You don't have to do all your computation in your computing center and never use cloud
computing, or you don't have to burn all your servers and do everything on cloud computing.
You don't have to do that.
We call it surge computing, and I think more popularly it's called hybrid computing: you
have a set of computers, and then when that surge comes, you surge into the cloud. So the
cloud provides the insurance that Animoto needed; suddenly you become popular and you
need hundreds or thousands more servers, but you can still have your own set there, which will
have some advantages in privacy.
What it means, though, is that the software stack you're using in the cloud also has to run
locally so that you can move between them. But I think a lot of people will find this an
interesting compromise.
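A minimal sketch of the surge decision: use local machines first and spill only the excess into the cloud. The function and numbers here are hypothetical, not any provider's API.

```python
def allocate(demand, local_capacity):
    """Split demand (measured in servers) between owned machines
    and a cloud surge (simplified illustration)."""
    local = min(demand, local_capacity)
    surge = max(0, demand - local_capacity)
    return local, surge

print(allocate(30, 50))   # (30, 0): everything fits locally
print(allocate(400, 50))  # (50, 350): 350 servers surged into the cloud
```

The privacy advantage mentioned above comes from the first branch: as long as demand fits locally, nothing has to leave your own machines.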
One of the things that surprises me is that when I talk to my friends in high-performance
computing about this, which seems pretty obviously important, they don't get it. There are a
couple of technical reasons. But primarily, to first order, they'll do this calculation:
How much does it cost for me to buy a cluster and run it 24/7 for three years, and what would
those hours cost from some cloud computing provider? Then they do the analysis, and cloud
computing is more expensive, so we're not going to do it.
So what does that ignore? Well, it ignores taking a graduate student who wants to become a
Nobel Prize-winning physicist and turning him or her into a system administrator.
The other thing it ignores is the cost of space, power, and cooling, because they don't pay that.
That's the campus's problem. Right now the Berkeley campus, you know, the financial problem
is everybody's problem. Everybody -- when the campus has a financial problem, we all have a
financial problem.
So saving the campus money is a good thing. We should care about that.
And the thing I really don't understand is they don't appreciate the opportunity cost. I mean,
we're in this race, especially in science and in chemistry and physics and biology, people are
often trying to discover something about the natural world. And the first person to do it gets the
glory, right?
Doesn't it seem obvious that something that would get you there faster would be
valuable? If instead of 20 servers for a year you could use a thousand servers for a week --
that's about the same amount of computation -- wouldn't you want a 51-week head start over
your competition?
And when I talked with them, they don't -- I don't understand why they don't understand the
value of that.
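The head-start arithmetic is just server-weeks; with cost associativity, the two options below involve nearly the same total computation but finish 51 weeks apart.

```python
# 20 servers running for a year vs. 1,000 cloud servers for one week.
owned = 20 * 52   # 1,040 server-weeks over a year
burst = 1000 * 1  # 1,000 server-weeks in a single week
print(owned, burst)  # 1040 1000 -- roughly the same computation
print(52 - 1)        # 51-week head start over the slow option
```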
For those of us who have environmental concerns: there are all kinds of estimates, but
information technology may have as large a carbon footprint as the aviation industry. I flew
here in an airplane, and it may be twice as bad in a few years.
So is there anything about cloud computing that saves energy? Well, yeah. Basically, turn off
a lot of these machines. When you buy machines, they're idle a lot of the time, sitting there
unused but wasting power. More or less, if your computer is on and doing nothing, it draws
two-thirds as much power as when it's completely busy. So idle computers are wasting power
all over the place.
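A rough estimate of the waste, taking the two-thirds idle-power figure from the talk; the wattage and the idle fraction are assumptions for illustration.

```python
busy_watts = 300.0                    # assumed draw at full load
idle_watts = busy_watts * 2 / 3       # ~200 W while doing nothing
idle_hours_per_year = 0.7 * 24 * 365  # assume the machine is idle 70% of the time
kwh_wasted = idle_watts * idle_hours_per_year / 1000
print(round(kwh_wasted))  # about 1226 kWh per server per year
```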
It'd seem a lot better to combine all our resources and ship photons over fiber rather than
electrons: use power from the dams around here, not down in Berkeley. Much more
environmentally friendly. And not only that, there's finally
an economic incentive to turn things off.
When we tried to argue with our friends at these companies that they should turn things off
when they're not using them, they said, well, there are a lot of practical disadvantages to that.
All of our sensors are based on the machines working: you ping the machine, it responds, it's
working. If we turn it off, that won't happen; all our monitoring [inaudible]. So we don't
want to do it.
Well, now there's an economic incentive. Why are you leaving your servers on? You're wasting
money. If they're idle, you should turn them off. So the economics match kind of a behavior
that you would like as responsible engineers and scientists of trying to worry about the planet
some by not wasting power when you're not doing anything.
Okay. And in this paper we said, you know, this is really exciting, we think this is going to
change the world, but there are a lot of challenges left, both technical and
nontechnical. We decided to pick a top ten, and I'm going to go over those; the paper has a lot
more details and some examples of how to address them.
So the first five are obstacles to growth. One of them is
how do you program these things.
Well, Armando at 1:30 in that room is going to talk about a really exciting new idea called
SEJITS, for Selective Embedded Just-in-Time Specialization.
We think it's going to have impact on both cloud computing and multicore
computing, as a really highly productive programming system.
You want storage that can grow and shrink, just like computation grows and shrinks. You'd
like the computer to help you grow and shrink wisely; that's another challenge.
Performance unpredictability. One reason scientists don't want to use cloud computing is the
problem of simultaneous scheduling, so that's another challenge for cloud computing people.
One of the surprising challenges is the speed of the Internet. It's not all that fast if you want to
transfer terabytes of data. And, surprisingly, FedEx-ing disks is a pretty fast, high-bandwidth
way of doing it, and there are actually services that provide that. Since we published the
paper, people are actually doing that.
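The sneakernet arithmetic is easy to check. The link speed and the shipment size below are assumptions for illustration only.

```python
def wan_transfer_days(terabytes, megabits_per_second):
    """Days needed to push data through a network link of the given speed."""
    bits = terabytes * 8e12  # 1 TB = 8e12 bits (decimal terabytes)
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86_400

# Shipping 10 TB over an assumed 20 Mbit/s link vs. an overnight courier:
print(round(wan_transfer_days(10, 20), 1))  # ~46.3 days over the wire
print(1)                                    # ~1 day in a box of disks
```

For multi-terabyte datasets the courier wins by more than an order of magnitude, which is why such services appeared.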
How about adoption challenges? From a business perspective, one of the worries is that you put
your service up there and the company you're using goes away. We can't think of any other
answer than to use multiple providers, so that you can offer your service from more than one
place.
And the related thing is data lock-in. I'd put my data up in the cloud, that service goes away,
how do I get my data back out.
So we think you have to use multiple providers, and for multiple providers to be interesting,
there has to be standardization. And so standardization helps with lock-in and things like that.
A lot of times, when there are people from industry at a cloud computing
talk, one of the first questions will be: what about data confidentiality and auditability? We think
there's technologies that can be helpful in that. And surprisingly a lot of companies that you
would think or industries that you would think would be afraid of cloud computing, like
pharmacology or the [inaudible] industry, are some of the biggest users of it.
I mean, it would be great if every company that provided cloud computing would
list who all their customers are. But right now we don't know. So people think
this is a big problem, but lots of those companies are using it.
Kind of a more nerdy thing is reputation fate sharing: if somebody in your cloud does
something that gets them banned, that could affect you as well. And then
pay-as-you-go licensing needs to become popular, rather than just the pay-per-CD model.
Okay. So that's a quick cloud computing, what it is.
Software as a service in education. This will be more computer sciencey. Reflecting on what
we've been doing: ours is a typical depth-first curriculum, where we dive into a
particular topic all the way to the bottom, whereas this Web 2.0
approach to programming is more holistic. Lots of things are coming together.
Armando has led the way in teaching these courses, but one of the things
that's really interesting is that these modern programming languages, or scripting languages,
highly productive languages like Ruby and Python, really take advantage of modern ideas in
computer science.
They have metaprogramming and closures and higher-order functions. And not only do they
have them; those features make programmers more productive.
And this happens to be a perfect fit for undergraduate education. You want exactly the kind of
tools that are highly productive, so you can create an app in a short amount of time, like within a
semester or a quarter, and have it actually work at the end. And plausibly these tools can really
do that.
This is in contrast to what we've done in the past: because it's so hard to get the thing
working, often, at Berkeley at least, it's kind of fill-in-the-blank programming. Here's all this
infrastructure; you've got to figure out what to put in there, and then you get it to work.
These courses focus on the software as a service environment rather than,
say, on a more traditional platform.
And Ruby on Rails is this high-level framework that we've been using. And we expect the
projects to work. In the classes Armando and I have taught together, there are maybe a dozen or
two dozen projects, and we expect them all to work at the end of the
semester. I think that's often not the case, ironically, in these courses.
Okay.
So that's kind of the software as a service side. What about cloud computing in education?
Boy, is it a good match. Cloud computing is a good match for education.
You know, we only teach I guess -- 30 weeks a year we're teaching. And we don't need
computing all that time.
We were able to do a thing where they could overload a database so they could get some idea of
scalability. Well, we needed 200 servers per student to do that; or, rather, 200
servers for the class. We could do that. We never could have done it at Berkeley locally; there
aren't that many servers there. There were in the cloud.
The lab deadlines, homework deadlines, the projects, we had this system to coordinate between
all the classes to make sure we all didn't do the same homework at the same time, because that
would wipe out our little computing infrastructure for instruction.
Well, we're trivial compared to a cloud, so we didn't have to worry about that. It just scaled up
when needed.
And it was cost-effective as is, but it was nice that Amazon Web services even made donations
to make it even more cost-effective.
Virtual machines are a really good idea in dealing with courses. You know, often you have
special software or stuff crashes and things. It made it very easy to create an image and
everybody just used it and reloaded it as they needed it.
The hardware's modern. These cloud computing companies are making investments in
hardware, and instructional computing is not often the top priority in a campus, and we can get
this cost [inaudible] to scale up.
And for Amazon, we can do this Eucalyptus thing, where we can run the same interface locally
and surge into the cloud.
So this is just what we did before and after: before, we ran things locally, and then we switched
over to using cloud versions of all this. In particular, we didn't really need a system
administrator. That's the name of our local system administrator, whose health had a lot to do
with the success of the class. He had to fix all these problems, and if he wasn't around,
everybody in the class suffered. We just didn't have those problems because of the cloud.
Success stories. The amazing thing about these Ruby on Rails environments is that
students can start knowing nothing and by the end of the semester build something that looks
like this that actually works.
So People Debate was to get people working both sides of an issue, and you could
see the debates on the screens. How about commuting? That's a big thing in the Bay Area.
People talking about you was Found It, I think, and Hesperian was to share health information
with third-world countries. These things, when they worked, had pretty
good user interfaces, and it all happened at the end of the semester and was all deployed in the
cloud.
So, comments from students who took the course and had also taken the more traditional
software engineering course in Java: "That would have taken us at least twice as long."
And, you know, typically when you teach software engineering courses, kind of the ironic thing
is that you're trying to teach good programming practices, and a lot of the time the projects
don't work.
Doesn't that seem like there's a problem there? But that's different here.
So what's the difference? Rather than everything living on a laptop, it's up in the cloud. You're
doing it for a class project, but it [inaudible] on the cloud, and a lot of the
students would start showing their friends when it started to work.
So that's remarkably inspiring. It's not just an assignment that you do. Some of your
friends start using it, like Found It and things like that.
Well, then you start worrying about user interfaces, you start worrying about dependability and
issues like that. You know, all my -- I can't remember any of my course projects I ever did; they
were all dead at the end of the course.
Well, these things live on, and in cases like we showed, the app [inaudible] the course. We had
an example remarkably of somebody wanted to do a health app in Tas- -- no, it wasn't Tasmania.
>> Armando Fox: It was somewhere in Africa.
>> David Patterson: Somewhere in Africa. And they knew somebody in Africa. They were
trying to do a health app for a boarding school. So they actually had a doctor, and the boarding
school helped them with the design of the app in this undergraduate class and deployed it,
deployed it there as well.
Virtual machine images are a really great way to recreate exactly what somebody used. And
then the students bragged about this on their resumes. It wasn't just "I took a course
and got an A"; they would write what they did on their resume, and
often they knew more than the interviewers about a lot of these
things from the hands-on experience. So it was a really inspiring experience.
I think SaaS, and the tools for SaaS, is a highly motivating
target: you can show off to your parents, show off to your friends, and you actually get
something working in a semester about something you care about, deployed on the Web.
And cloud computing just seems perfect for courses that only need computing some of the
time of the year. That's when cloud computing is at its strength.
All right. I'm on schedule. So that was what cloud computing is, and, boy, it seems like a good
match for education.
And what's the RAD Lab research project? Almost five years ago -- I think we're four years
into our five-year mission -- we said we think it would be an interesting research challenge to
allow one person to develop, deploy, and operate the next-generation Internet service. We
always used eBay as the example. The first version of eBay was created over a long weekend,
but then they had to build up this giant corporation with hundreds or thousands of engineers to
turn eBay into what it became and let millions of people use it.
Wouldn't it be great if you didn't have to do that? If we could invent the technology so that one
person could come up with the idea and, although they'd have to use cloud computing to get
enough computing resources, scale it up and operate it themselves? A handful of people could
do that; in this case, one.
So we brought together people in systems, like Armando, Randy Katz, and me. I joke that in
machine learning we have the Michael Jordan of machine learning, whose name is Michael
Jordan. And people in networking and security and databases, all together; this is about the
people.
The reason all these logos are here, the Bush Administration inspired us to find industrial support
for our research project.
[laughter]
>> David Patterson: And their inspiration -- Ed Lazowska and I, who do these national things, were simultaneously trying to get DARPA to continue to fund things. And we just got documented evidence today that what we said is true: they cut academic funding in half. And we said that and people made fun of us. Well, not people -- the people in the Bush Administration made fun of us for saying that, and, yep, they cut it in half.
Well, simultaneously when we were trying to argue for research funding, how was I going to do
my research without funding, right? Well, we better figure out how to work with companies.
And fortunately Microsoft and Google and Sun all stepped in as major donors. And then over
the remaining years we've added many more companies as affiliates to this project. And so it's primarily an industrially funded research project, which was another experiment we tried.
The fundamental bet -- I didn't say that yet. The fundamental bet was that statistical machine learning would help us reach this goal: we were going to use computers and modern ideas in machine learning to allow things to be scaled up, and to see what was wrong, and stuff like that. So how were we going to do that? It wasn't just that we were going to get really smart people; we wanted machine learning to step in. Would it work? And it worked, I would say. I think the good news is that it really worked. I'll give you some examples.
But predicting the performance of complex systems; automatically adding and dropping servers if you're violating what they call a service level objective; and, as I'll talk about, one of the amazing things is to take millions of lines of source code, without telling the system what's important, and turn it into a one-page decision tree to detect anomalies -- built by somebody who didn't build the system.
This is our organizational slide. This Elvis character comes from Microsoft's programming personalities. So there's Einsteins, and there's [inaudible] at the bottom of the food chain, and Elvis in the middle there. So we wanted to enable Elvises to operate a datacenter like this.
So in the lower right is our software stack which has virtual machines and operating systems, the
Ruby on Rails environments. And then we had a trace collection system that's called Chukwa
and XTrace, and that's at the base there. So we can collect the data together so we can feed that
to machine learning. And the Web 2.0 apps.
A storage system I'll talk about later, a scalable storage system that's called SCADS, and then the
log analysis, which I mentioned, we were able to take the trace information and do the machine
learning log analysis, and I'll talk about that next.
And then to run this whole thing we needed something like the operating system for the datacenter. The term supervisor has been used a lot, so we used the term director. And I'll tell you a little bit about that. The idea is this director would inform the operator of what's going on, and the operator would approve the decisions the director would make, and sometimes just let the director make them on its own. That was our vision.
So let's talk about the console logs. Console logs are these things that programmers write individually -- they typically write a note to themselves into the log. Well, now a datacenter -- or an application -- will have programs from hundreds of people. So these logs have all these notes that the developers -- not the operators -- wrote for themselves, intermixed in there. Millions of lines of code. Tens of millions of lines of code. What do people do today? They typically write Perl scripts. The operators, not the developers, will write Perl scripts to [inaudible] try and find something interesting.
So not surprisingly this doesn't work very well, and actually most of the time people just ignore
what's in the logs. They don't even look at them.
So what we wanted to do is use the magic of machine learning so that, without being told what to look for, it would find interesting things that would be useful to the operator.
So kind of the problem is 24 million lines long, add machine learning and give advice to Elvis
over there on the right, so that sounds like a hard problem.
So what the student who led this did was first parse the logs to turn them into something useful -- machine learning works best with numbers. And then, after parsing the logs so they were more understandable to a computer, he did feature extraction: what are the interesting features that we should feed into machine learning? And he came up with a couple that he thought would be interesting.
Then he used more or less standard machine learning algorithms, in this case PCA. And finally, since the machine learning results that say, you know, boy, this looks interesting, are hard for the operators to interpret, he turned them into a decision tree, which is basically: if this happens less than that, then it's a problem; if not, go ahead. And then you go down to the bottom of this, and you see there's about eight red things on the right, and then finally it's okay.
But it boils it down to a single page, amazingly enough. And it works extremely well. The
student who did this who's graduating soon is going to visit other companies in the Bay Area that
are Web services and trying to use their ideas there because he's basically said there's this
untapped resource that you're ignoring that will be very helpful for you to understand when
something strange is going on in your system. So it's worked out even better than we hoped.
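The four steps described above -- parse the logs, extract features, run machine learning, summarize as a decision tree -- can be sketched roughly as follows. Everything here is hypothetical: the log lines, the template format, and the final anomaly rule, which is a simplified stand-in for the real PCA-plus-decision-tree step.

```python
import re
from collections import Counter

# Hypothetical console-log lines; a real system sees millions of these.
LOGS = [
    "INFO block blk_1 received",
    "INFO block blk_1 served",
    "INFO block blk_2 received",
    "WARN block blk_2 retry",
    "INFO block blk_2 served",
    "INFO block blk_3 received",   # never served: an anomaly
    "INFO block blk_4 received",
    "INFO block blk_4 served",
]

def template(line):
    """Step 1: parse free text into a message template by abstracting
    away the variable parts (here, the block ID)."""
    return re.sub(r"blk_\d+", "blk_*", line)

def feature_vectors(lines):
    """Step 2: feature extraction -- build a 'message count vector'
    per block ID, counting how often each template appears."""
    vectors = {}
    for line in lines:
        blk = re.search(r"blk_\d+", line).group()
        vectors.setdefault(blk, Counter())[template(line)] += 1
    return vectors

def flag_anomalies(vectors):
    """Steps 3-4, simplified: flag any vector that differs from the
    most common pattern. The real work ran PCA over these vectors and
    summarized the results as a one-page decision tree."""
    patterns = Counter(frozenset(v.items()) for v in vectors.values())
    normal, _ = patterns.most_common(1)[0]
    return sorted(b for b, v in vectors.items()
                  if frozenset(v.items()) != normal)

print(flag_anomalies(feature_vectors(LOGS)))  # ['blk_2', 'blk_3']
```

The point of the last step matters: the operator never sees the count vectors, only a small, human-readable summary of which patterns are abnormal.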
What about this growing and shrinking of the datacenter? That's the idea of what we call a director, which is the thing that does things, and then we have advisors that are the intelligence behind that. So our model is that we have a bunch of advisors that are watching parts of the system and have models of how the system works, and that gives input to the director, who, using the virtual machines, turns things on and off, scaling them up and down. I think I just said those things.
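A minimal sketch of that director/advisor loop. The latency numbers, the 100 ms service level objective, and the advisor's thresholds are all made up for illustration; the real director drove virtual machines, not a counter.

```python
def advisor(p99_latency_ms, slo_ms=100):
    """An advisor watches one metric against a service level objective
    (SLO) and recommends an action to the director."""
    if p99_latency_ms > slo_ms:
        return "add"        # SLO violated: recommend scaling up
    if p99_latency_ms < 0.5 * slo_ms:
        return "remove"     # lots of headroom: recommend scaling down
    return "hold"

def director(servers, observed_latencies):
    """The director acts on the advisor's recommendations by starting
    or stopping machines (modeled here as a simple server count)."""
    for latency_ms in observed_latencies:
        action = advisor(latency_ms)
        if action == "add":
            servers += 1
        elif action == "remove" and servers > 1:
            servers -= 1
    return servers

# A traffic spike pushes latency past the SLO, then subsides:
print(director(2, [80, 140, 160, 90, 40, 30]))  # 2
```

In the talk's vision the operator could either approve each recommendation or let the director act on its own; this sketch shows only the fully automatic case.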
So how about storage? Computation growing and shrinking -- we see how that works. But databases usually let you grow; they don't particularly shrink. What would that mean, a storage system that could grow and shrink?
So we decided there wasn't an existing system we could easily use, so we developed our own system called SCADS, for Scalable, Consistency-Adjustable Data Storage. And the goal is that you could build a small system and then, as your users grow, it would automatically scale up, and it would scale up in ways that wouldn't increase the latency per request. So as you get more users, you'd have to buy more machines, but the latency per request wouldn't go up; it would scale.
So some of the key innovations are a replacement for SQL -- a performance-safe query language called PIQL -- and declarative performance/consistency tradeoffs. One of the big things, instead of having the ACID semantics, is to let you say, well, we'd like these things to happen in ten seconds, or it's all right if it happens in a minute. And then, using the director, it would automatically scale up and scale down.
Okay. So I -- good thing I added some slides.
All right. So I'll have even more time for questions here. So we think cloud computing is really a big deal. We think this is going to transform the information technology industry. For those of us who build hardware: if instead of selling servers one at a time you sell a container full of servers, that's going to affect the way you design hardware.
I think cloud computing will affect hardware the least. For the software industry, anybody can become a software-as-a-service provider. Rather than shipping CDs, you'll be shipping software as a service. I think in the next decade this is going to transform the software industry.
And, kind of remarkably, the technology is there: between the software and cloud computing, anybody in the world can build the next great service. It's not a select few at the companies that [inaudible]; anybody can do that. All you need is to get popular and then grow as the demand requires.
As I said, I think -- boy, it's hard -- there must be industries that cloud computing is not going to have any impact on; it's hard for me to think of what they are. Certainly academia, both the research enterprise and the education enterprise, is a great match for cloud computing. It'd be embarrassing if ten years from now we haven't embraced cloud computing to try and make things more cost-effective.
And, you know, from the research side in systems, it's got this new challenge: systems have to scale down as well as scale up. And it's got this nice characteristic that scaling down is kind of conservation -- it's environmentally friendly.
And then the project at Berkeley is taking on a lot of these challenges. We've been working on it
for several years, and you can go to our Web site and see what we're doing or go to Armando's
talk.
All right. Thanks very much.
[applause]
>> Arkady Retik: So the question [inaudible] bring the microphone to you.
>> David Patterson: And the first question. Great. I'm a bit long-winded. I'll try and keep crisp
answers.
>> One challenge that maybe deserves more attention is security. So suppose NATO or the Warsaw Pact wants to use cloud computing, which would be natural and advantageous for them. But then it's kind of a paranoid world: you put up stuff which is very secret, or big pharma uses it. And, as an aside, suppose I am Microsoft and you put your stuff with me, just storage. You encrypt the whole thing, I have no idea what it is, then you take it back. Then of course you're safe, but it doesn't buy you that much. So it's important that on the cloud you can do certain things with the data even though it's encrypted. I'd like your thoughts.
>> David Patterson: Yeah. So I've heard the people at cloud computing companies give keynote
addresses in lots of forums, and privacy and security is one of the first questions they always get.
So I think -- let's just talk about companies.
Not surprisingly, the way companies get started in cloud computing is they put things out there to get experience with it, and the more success they have, the more they do. And so one of the interesting questions is: how could you imagine this ever happening, given that it seems like a pretty insecure thing?
Well, at least in the United States, every major corporation outsources its payroll to this single company, and they've been doing it for at least 15 years. So nobody does their own payroll systems. Well, payroll's pretty sensitive information. So how do they get away with it? Well, people have been using this company for a while, it hasn't leaked the information, and so they just do it.
Another example is e-mail. Boy, you know, if you -- I don't know what I have that's secret in my
house, but, boy, my e-mail has a lot of stuff in it. So many corporations outsource their e-mail.
So if they're worried about security, why are they doing that? I think they've got some level of comfort now about outsourcing e-mail systems.
You know, I don't know if Berkeley -- I don't think the University of California is supposed to outsource its e-mail, but governments, cities, and corporations have outsourced their e-mail. So there are examples of two very sensitive things that, in practice, people have felt comfortable doing, even though in theory something could go wrong.
And, like I said, cloud computing users don't brag about their use of cloud computing. But I believe that all these companies that are here, if you asked them whether banks and pharma people use it -- they do. So, for example, there are laws in the United States -- the HIPAA laws -- about revealing private patient information. So one of the simple things they do for [inaudible] processing is they'll just replace the patient IDs with some intermediate identifiers that only they know, and then send the data there and bring it back.
I think for batch processing, corporations aren't as worried. You know, the data is there for a short time, they do the work on it, and then they could destroy it. Or they could encrypt it while it's there and then get rid of it afterwards. They don't seem to be as worried about that. I think you'd be more worried about services.
But I think the tradeoff is this elasticity here, right? The opportunity without making a joint
investment of being able to have 10,000 servers working on something, you know, right now
that's at low cost is pretty attractive.
So I think what's happening is people are all naturally worried about security and privacy, but,
boy, that's sure cheap, and, you know, boy, that's going to be a lot less work. Suppose I'm going
to do a new service. Boy, I could do it there.
I talked to the people at Netflix, who were converting all of their movies from one format to another -- I think they were headed from MPEG-4 to H.264. And so what was the reaction of the people there? He says, well, the programmers loved it, because their software wasn't working, right? The deadline was April 1st and, you know, it didn't work, it didn't work. Well, they finally got it working four days before the deadline, right? They went to cloud computing, got it all done, made the deadline. So that elasticity is pretty attractive, and I think the value of elasticity overcomes the anxiety about security, so I think we're seeing companies do this.
And there are examples where it's so cheap that programmers just take out their credit card and go use it, and hope they'll get reimbursed afterwards by their company. So I think people are kind of voting -- voting with their feet.
And when I hear the conversations about security concerns, it's usually from people who haven't tried it, or probably aren't going to try it. They just can't imagine how anybody would use it, given the obvious security [inaudible]. Although it's not published, my impression is that this is being overcome -- the practical advantages are overcoming the kind of obvious security concerns.
>> Arkady Retik: Next question.
>> You're giving a very enthusiastic picture of everything on the cloud. But the history of the IT industry is made of a sequence of swings of the pendulum back and forth between using centralized resources and decentralizing to users, you know, with mainframes and minis and personal computers, and we've had thin clients, smart clients, and we'll have client-server Web services and so on. So do you think this is the last swing of the pendulum, or is there any limit, and are we going to go back to more power in the hands and at the location of the user and the programmer?
>> David Patterson: Yes. I didn't mention this -- we've been saying for a while that we think the extremes of computing are the most interesting. So battery-operated things like this on the one hand, and the cloud on the other, is where things are headed.
I mean, who knows in the longest -- you know, who knows -- who could possibly know what the
world's going to look like in 50 years. Who could possibly know.
But I think in the next decade or 15 years, the exciting programming environment of the future is the battery-operated device plus the cloud: applications are going to be in the cloud and on the battery-operated client, and depending on your circumstances -- like your connectivity or your battery power -- the applications may shift back and forth.
I find that a very exciting future, and I think my advice would be if you're looking for a research
project, that's a pretty interesting research project to work on that's got these security issues, it's
got battery life, what are the opportunities there, what's the cost -- what's the battery cost of
transmission versus local.
So I think that's a very exciting future. There's a bunch of legacy stuff that as a researcher I just don't care that much about, but I find that a compelling version of the future. I don't think that's going to go away. I mean, the cell phone is the most successful information technology device of all time.
I believe I'm the first one to notice this: half of the people on the planet have a cell phone. More than half -- the majority of people on the planet have a cell phone. Out of 6.7 billion people, there are 4 billion cell phone plans, and, amazingly, probably 10 percent of the people have two cell phones.
Okay. So if we subtract those out, that's still more than half the people in the world. And they
think it's going to grow to three-quarters of the people in the world. We have never had a
technology that's that way.
So I think people are voting for portable communication devices. So I don't -- I can't imagine
that one going away. And how are you going to complement that, and I think the cloud.
So I think 10 or 15 years it's going to look like this.
>> [inaudible] so I wonder what you think about the following statement: the main difference between cloud and grid is that the cloud focuses more on data -- its location and processing -- rather than on computation.
>> David Patterson: I know Armando has -- why don't you stand up? I mean, I only understand the grid a little bit.
>> Armando Fox: I don't speak for Dave Patterson, but I [inaudible] the difference between cloud and grid is Amazon didn't pretend to know what everybody wanted.
[laughter]
>> Armando Fox: The x86 has steamrolled everybody. We believe a lot of people would be willing to buy this commodity product, which is a rentable x86 VM. Let's sell that. There's no software infrastructure, there are no tools, there's no [inaudible], there's no federated policy. It's just: here, use this machine for a while, knock yourself out. That's a very practical -- it's the difference a businessperson's approach would have made. Don't pretend that you know what everybody wants. That's only my opinion.
>> David Patterson: Yeah, and I think the grid didn't concentrate as much.
>> Armando Fox: [inaudible] grid user but a happy cloud user.
>> David Patterson: I always thought the grid concentrated more on computation than it did on data. And this is what I like about capitalism: there are all these ideas out there, let's try them, right? And capitalism lets the popular one win. I know the people at Sun who did their cloud computing thing. They thought no one would use something where there weren't guarantees. Amazon makes it cheap, there are no guarantees, and it's really popular. And they were stunned. I mean, why would anybody use a cloud or a utility where there are no guarantees? There have to be contracts -- since there are guarantees, we have to have contracts and penalty clauses; you've got to sign a contract to use it. No -- just a credit card. No guarantees.
So it was another model, and it's caught on.
>> Hi. Thanks for your great talk. Two questions. The first one is on cost associativity. It's true that if you look at on-demand instances at EC2, for example, this is correct. But on the other hand, immediacy does cost: if you want to allocate, let's say, a thousand servers at an instant point in time, it does cost you more in an on-demand fashion compared to --
>> David Patterson: Cost [inaudible].
>> Yeah, you can reserve these --
>> David Patterson: Yeah, you could also --
>> So there's a difference there.
>> David Patterson: -- sit on them. Yeah. What's the question?
>> So the question is, I guess, that that's one important issue to take into account. And the second issue is: do you think there's an opportunity for certain providers to deliver volume discounts, which would kind of distort this relation a little bit?
>> David Patterson: Okay. So I think the first one is what about reserved instances was kind of
like renting versus instantaneous. And then the second one was volume discounts, right?
So the first one -- well, actually, we get in a lot of arguments with economists in particular about, oh, no, this won't work, right? In fact, the review of our paper in Communications of the ACM was: I don't care what people are doing in the cloud; that doesn't make any sense; they should change how they charge; we know this is the wrong charging algorithm.
And when we talk to the cloud providers, you know, a simple-to-understand cost model has
advantages in selling it, right?
So I think, you know, this reserved instance model, where you rent, may make practical sense.
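The cost-associativity point in the question can be put in numbers. These prices are made up purely for illustration; real on-demand and reserved rates differ by provider and change over time.

```python
ON_DEMAND = 0.10   # hypothetical $/server-hour, billed only while running
RESERVED = 0.04    # hypothetical $/server-hour, billed around the clock

def on_demand_cost(servers, hours):
    """Cost associativity: under pure pay-as-you-go pricing,
    1000 servers for 1 hour costs the same as 1 server for 1000 hours."""
    return servers * hours * ON_DEMAND

def reserved_cost(servers, hours_in_period):
    """Reserved capacity is paid for the whole period, used or idle."""
    return servers * hours_in_period * RESERVED

# A burst job needing 1000 servers for 1 hour, once a month (~730 hours):
print(round(on_demand_cost(1000, 1), 2))   # 100.0 dollars, on demand
print(round(reserved_cost(1000, 730), 2))  # 29200.0 dollars to hold the capacity
```

The lower hourly rate only wins if the machines are busy most of the period; for bursty work, the idle reserved hours dominate, which is the conservation argument made below.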
What's exciting to us as researchers is the scalability: can we create software that can take advantage of that opportunity? Because, for me, if you rent stuff, I can more or less guarantee you're going to be wasting power. A lot of the time it's going to be idle and you're going to be wasting power, and I don't see how a cloud provider can turn that off, because you've rented it. That's your responsibility.
And so I like it -- it's a much more, I don't know, engineering-sensible, conservation-oriented
thing that you grow and shrink as you need things. That's that one. What was the second one
again?
>> Well, I was going to pose the second one. It's about the green argument, let's say. I certainly subscribe to the idea that you can do optimization there with green energy -- collocating energy generation facilities with datacenters. On the other hand, you're also removing constraints from consumers, who can use these systems on a very large scale.
So do you think there's a certain risk of this system getting a bit overloaded when everyone starts
jumping in with very large-scale computing and demands there?
>> David Patterson: Yeah. So but the job of a cloud computing provider is to provide the
illusion of infinite scalability, and they have to manage that. But we've transferred the risk from
us to them. So far in our experience they've managed that.
The other thing you asked about was volume discounts. I suspect if you were to go to one of these companies and say, we want 100,000 servers, I bet you don't pay the same that I do. I bet that's there. But I think there's kind of a beauty in their standard charging algorithm -- it's simple and flexible -- and I don't see any evidence of them changing it. They've lowered the cost over time and provided new types of instances, but this simple model has stayed, and it's attractive to us.
What I like about it as a researcher: I've also been interested in cost-performance, and as a field we do performance. When you bring up cost, people don't trust your cost estimates. This has a real cost -- you don't have to argue about it. It's a cost-performance argument. We just wrote a paper about what we did compared to what it would cost on Amazon, and we were able to calculate the difference there, and nobody is going to argue about that. So I like that too. All right. I told you I was long-winded.
>> Arkady Retik: Who else has questions?
>> David Patterson: There's still more time.
>> I have one. Hi. My name is Frederico [inaudible].
We saw earlier today science in the clouds, and then we saw education in the clouds. And I have the gut feeling that in the future we might see the desktop in the clouds -- people using the cloud as a major synchronization medium for multiple gadgets, as a way to preserve their data, and as a way to save energy. Do you think the cloud infrastructure of today would have to change what it's doing if such demand comes?
>> David Patterson: Well, I think -- like I said, I love that we're in a fast-moving field. I'm
really glad I'm not in history or -- I don't know what's the new innovative thing you teach about
something that happened 2,000 years ago, but I'm sure there's something. But, wow, we're in a
fast-moving field. It's really exciting.
People are making different bets. There are people who are building clouds that have, you know,
thousands of GPUs on them, so that they're going to -- their idea is they're going to do gaming all
in the cloud and your device is going to have very little in it. And that's the way to do games,
right?
I don't know -- that doesn't strike me as a good idea, but people are trying it right now. So how fast is wireless communication going to improve? How are people going to build cell phone towers?
There was an article this week about AT&T offering, instead of a cell phone tower, your own little tower that you put in your own home. So there are all kinds of interesting things going on.
The one negative, I think, is probably Internet bandwidth. As a person who's kind of a cheerleader for new technology, a cautionary tale would be Internet bandwidth. Internet bandwidth, at least in the United States, isn't so much technology driven as monopoly driven or standards driven or something like that. Is Internet bandwidth going to get a lot cheaper in the next few years, or is Internet bandwidth going to go up a lot in the next few years for the same price?
Boy, that's harder to figure out. One of the reasons is the cost of the switches in the datacenters,
the multimillion dollar Cisco switches set some kind of limit on how cheap that can be.
Maybe we'll need innovative new switches, much cheaper switches to bring the cost down. So I
think the Internet bandwidth one is sticky, but, boy, the wireless stuff and what people are doing
in hardware and energy efficiency, it's hard to -- I'd be afraid to say something can't be done.
Well, I think Microsoft already offers all of the desktop apps that many of us use -- there's a cloud version of all that today. So it's easy to imagine anything being anywhere.
On the other hand, in Berkeley there are places I can't get wireless access. So I particularly would like whatever my wireless gadget is to be able to do stuff locally even when it's disconnected. So that's why I think this kind of shuffling back and forth between the cloud and the device will be a pretty exciting area.
>> Arkady Retik: And we have one more question.
>> David Patterson: At the rate that I answer.
>> It's kind of a follow-up question to your points around bandwidth. Have you looked into a theoretical maximum for storage -- how someone goes about storing in the cloud, either using a bandwidth-upload ingress methodology or through just a FedEx sort of bulk ingress?
>> David Patterson: Yeah, I don't think we've calculated that. But disk capacity is continuing to grow, and FedEx delivery time doesn't change. We could ship a lot of disks anywhere in the world quickly, you know, in a day. So that's the way I would do bulk transfer. It's hard for me to see some practical limit. I think the areal density of disk is still improving at almost Moore's law rates, and, you know, you can calculate how much it costs to ship a box of a certain size to do that.
But I think that's probably the way I would calculate it -- if I was going to say what makes sense to do. You can get multiterabyte disks today, so could you ship a petabyte economically? I wouldn't be surprised -- or a big fraction of a petabyte, if you worked it out. That's plausibly something you could do. And then you could grow by kind of areal density improvements from that point. So I don't know if that's a theoretical argument; that's a practical argument. That's the way I would do it practically.
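The shipping arithmetic behind that answer can be sketched in a few lines. All the figures here -- disk count, capacity, transit time, link speed -- are hypothetical, chosen only to show the shape of the calculation.

```python
def ship_bandwidth_gbps(disks, tb_per_disk, transit_hours):
    """Effective bandwidth of a box of disks in transit:
    total bits moved divided by delivery time."""
    bits = disks * tb_per_disk * 1e12 * 8
    return bits / (transit_hours * 3600) / 1e9

def upload_hours(total_tb, link_mbps):
    """Hours to move the same data over a network link."""
    return total_tb * 1e12 * 8 / (link_mbps * 1e6) / 3600

# 50 two-terabyte disks delivered overnight (24 hours):
print(round(ship_bandwidth_gbps(50, 2, 24), 1))  # 9.3 -- Gbit/s effective
# The same 100 TB pushed through a 100 Mbit/s link:
print(round(upload_hours(100, 100)))             # 2222 -- hours, about 3 months
```

Since delivery time stays roughly constant while areal density keeps improving, the effective bandwidth of shipped disks grows with each disk generation, which is the point being made above.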
>> Arkady Retik: Okay. Thank you very much. Please join me in thanking him.
[applause]