>>: So welcome, everybody. This is the 4:30 to 6:00 session of the Cloud
Futures 2010. This session is about systems, and we'll have three
presentations. The first one is by T.S. Mohan from Infosys in Bangalore.
He's going to be talking about domain-specific software architectures,
followed by Tajana Simunic from the University of California, San Diego,
followed by Zach Hill from the University of Virginia.
The presentations are going to be 20 to 25 minutes, and at the end of each
presentation we'll have about five minutes for questions, so hopefully we
will finish the whole session in 90 minutes.
T.S. Mohan:
Thank you, Juan.
Good afternoon. I'm going to present to you some work that we are currently
doing [inaudible] between me, based in India, Nenad Medvidovic, who is at
USC, the University of Southern California, and Chris Mattmann, who works for
the Jet Propulsion Lab at NASA. Of course, he's also a professor at USC.
This work is essentially based on certain things that have been done trying
to understand grid computation technologies and how people who tried to
program the grid ran into various kinds of problems. So we did an analysis
of the code, and then we found that trying to extract some architectural
patterns out of such things would be of use, because if somebody else wants
to write code, they could do a better job.
But then we thought, would it not be good to introduce that into the world of
cloud, because now everybody wants to do cloud programming. However, in your
[inaudible] application and when you architect [inaudible] application, most
often you run into all kinds of problems. In fact, the biggest challenge has
been that if you say you're migrating to the cloud, people think that you're
just doing a database migration from a database in your local datacenter to
maybe a database in the cloud.
It's too crude, too simplistic. We're looking at all kinds of problems that
can happen, the patterns that can be leveraged, and how to go about it.
So the big question that we hit upon when we tried to look at migrating into
the cloud was: have we really understood the cloud? Because what we found
was there are too many definitions, too many terms, too many usage patterns,
too many technologies involved, and consistency across the board was missing.
There are lots of clusters of activity going on, no common terminology, and a
lot of confusion.
So the biggest challenge that we find is that there's hype, there are a lot
of domains, and domains have their own specific usage patterns, right?
Actually we heard a lot about how the cloud has been used for the
Worldwide Telescope Project. That is one case. We also heard about the
Microsoft Biology Foundation using the cloud for analyzing data at huge
scale, and the usage patterns looked really different.
But then, interestingly, the underlying technology [inaudible] because that's
Microsoft Azure. If you look at how it happens around the world, we see that
there are lots of open source packages like Hadoop getting used, like Amazon,
like people wanting to set up their own private clouds and struggling there.
And when you see how they want to move an existing application onto a cloud,
then things kind of start falling apart. So that set the background for our
work. And we expect that somewhere along the line things will mature and
we'll have a better understanding of what a cloud typically should be and how
we should go about using it.
So what is it that really clicks for a cloud? Is it just these three things
that you heard all the keynote speakers talk about: pay per use, where
there's no capex, only opex; the ability to meet seasonal loads; and what
Dave Patterson called surge computing, where my computing happens on a
regular basis, suddenly I have an extra load, and I just move to the cloud
for that part and then come back?
Scaling down was a very interesting aspect. And it's also the case that,
because of the hype, we see uniform, simplified abstractions being presented
for programming the cloud, and that seems to be pretty selective, so people
say let's do it in the cloud, but when reality bites, that's when we start
[inaudible].
So large applications are missing on the cloud. Large data has been
processed on the cloud, but applications have been simple. So this is the
kind of experience that we are having in [inaudible] the programming being
done in the cloud.
Among the service offerings, we see that on one end of the spectrum we have
large players like Microsoft Azure or the Google App Engine or Amazon Web
Services. At the other end of the spectrum we have a lot of small niche
players, and these are all cloud enablers. There are a lot of companies of
this type which say, look, give me your problem, we'll try to help you get
the best out of the cloud that is there. And typically the people who go to
these small companies are the IT departments of various enterprises.
So what happened to the in-between part: big players on one end, small
players on the other end, and in between, what's happening? That in-between
is what many of us in the IT outsourcing industry, the IT industry, are
trying to grapple with, because what we find is large enterprises have
captive datacenters. They have a lot of these technologies like
virtualization, which some of them are already using, but can they have a
private cloud? When it comes to that question, a lot of questions are not
answered. In fact, that's one of the big reasons why a lot of the
enterprises are sitting on the fence when it comes to moving to the cloud
wholeheartedly. They might want to do surge computing, but they'll not want
to do cloud computing per se across the board.
So the big question then asked is, if there's a value add, what kind of value
add is it? Is it only economics? Is it only that the cost of [inaudible]
comes down? Or is it that things will be a lot easier, a lot more standards
compliant, a lot more portable, a lot more interoperable, a lot more
[inaudible] secure? Like what Dave was saying, we all have our information
kept on the cloud assuming that we trust that the security they give us is
good enough, and yet when it comes to thinking about it, we ask these
questions like, is it secure? Right?
And how sustainable is it? That is the other question that gets asked. Can
I run a business long term on the cloud without depending on anything?
So there are, of course, a lot of key issues and challenges for sustained
usage, and that's where industry people like us step into the picture. We
want to ask how do you handle this thing called consistency, how do you
handle this thing called availability, how do you handle this thing called
performance when it comes to doing, say, [inaudible] transaction processing
of a business on the cloud. Right?
In fact, all the popular usage of the cloud has been of the kind where the
information could be inconsistent, and that's acceptable. It could fail, and
it could be the case that the performance is pretty poor, and in spite of it,
people are happy with the information that they get. That's the kind of
application that people are working towards on the cloud, rather than doing
the real hard-core persistent transaction processing.
Now, how secure is the information on the cloud? That gets asked, and Dave
Patterson beautifully gave a reply. He said when we file our taxes with the
IRS, maybe the IRS is processing much of our information in their datacenters
which are possibly accessible to a lot of people. Right? We don't ask how
secure the information that we give them is, right? We submit our
information to our lawyers and our [inaudible] or legal attorneys, and they
process that information using possibly public resources. We don't ask
questions.
So we ask such questions, which only means that we are aware of certain
issues that everybody has been talking about, but we're not really
experienced. In fact, our little analysis, off the cuff should I say,
[inaudible] analysis, kind of shows that 60 to 70 percent of the information
that generally gets used in an enterprise, and which is not mission critical,
need not be of that high security [inaudible]. And given that situation, I
think in the future, maybe in the next two or three or four years, we will
see a lot of applications move onto the cloud at the enterprise level.
But then, of course, enterprises have to make a choice. Would they like to
keep it in their private datacenters and their private clouds, or would they
like to use the cloud offerings of various public cloud service providers?
That's the choice. And that's where many of the questions we ask for this
particular topic come in: by building domain-specific architectures on top of
the cloud service abstractions we have right now, can we make life simpler
and easier?
Interoperability and portability have been a big issue. A lot of proprietary
service providers have kept their APIs closed. Of course, there's the
pricing model that also gets factored in. And if you closely observe, the
price for moving data into the cloud is cheaper than the price for moving
data out of it.
Many of these service providers kind of build in this barrier there. But in
spite of it, there have been times where enterprises have a hybrid model
where the sensitive data gets kept in their private datacenters, parts of it
are taken and processed across the cloud service offerings, and perhaps done
in such a way that if, say, one part of Microsoft Azure fails to live up to
certain performance parameters, the work also gets processed on, say, the
Google App Engine and things still move forward.
So in such a situation, you need to have a very interoperable setup.
Programming for that is a huge challenge, and that's something where we are
looking for best practices by which we can go forward.
Then there is variable, seasonal cloud services pricing. Bidding for spot
instances on Amazon is a very interesting example of what the future could
be, because, all said and done, the quantity of resources available to even
the large service providers is finite. It's all a question of how they
manage it when it comes to us asking for it. And the way they will manage it
is they will move to a bidding situation where the prices keep varying, and
the challenge for the enterprise architects would be: would it be economical
to use the public cloud service offerings when the price varies? Because at
a certain point in time it's meaningful, but at another point in time you may
not want to [inaudible] any tasks on the public cloud. So that becomes a big
factor, and understanding that is important.
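To make that trade-off concrete, here is a minimal back-of-the-envelope
sketch in Python of the kind of check an enterprise architect might run
before bursting surge work onto spot instances; the prices, the simple linear
cost model, and the function name are illustrative assumptions, not figures
from the talk.

```python
def burst_is_economical(spot_price_per_hour, instance_hours,
                        data_gb_out, egress_price_per_gb,
                        private_cost_per_hour):
    """Rough check: is running a surge workload on public spot instances
    cheaper than keeping it in the private datacenter?  All inputs are
    illustrative; a real decision would also weigh SLAs, interruption
    risk, and the effort of moving the workload."""
    public_cost = (spot_price_per_hour * instance_hours
                   + data_gb_out * egress_price_per_gb)   # egress costs extra
    private_cost = private_cost_per_hour * instance_hours
    return public_cost < private_cost

# Example: 500 instance-hours of surge work, 200 GB of results pulled back out.
print(burst_is_economical(spot_price_per_hour=0.12, instance_hours=500,
                          data_gb_out=200, egress_price_per_gb=0.15,
                          private_cost_per_hour=0.25))
```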
Similarly, you have issues like multi-tenancy and reputation sharing. These
are all things which have been talked about, but then I'll touch upon it
later.
So the big challenge, again, comes here. I want to migrate my big enterprise
application into the cloud. How do I go about it? Do I just move the
application as-is to the cloud, which is like going out and using it on
Amazon as an IT system administrator would like to do, without touching
anything inside? Or should I fine-tune it, perhaps reprogram it, and use the
platform that, say, Azure or Google's App Engine provide? Or should I
possibly use an application which is already available to us on the net and
maybe is based on the cloud or a proprietary cloud? Like, for example, if
you take salesforce.com, I'm sure they're using a lot of cloud technologies
to run the show.
If you look at Gmail -- all of us use Gmail or Hotmail -- and if you closely
observe, much of the way in which it happens is the cloud technologies get
used without you even knowing about it. So it depends on what application
and what level we would like to step into.
This is the kind of challenge that comes up for a typical person who wants to
migrate an application in. And typically when you talk about enterprise
applications, it's not one particular functionality or certain focused
things; an application is typically part of an ecosystem -- it doesn't live
in isolation. It's connected to a number of other applications, and you
would want to migrate its various parts. How do you put that together?
Enterprise application integration was a big topic, say, several years back,
and then came [inaudible] web services. All of that means that when we move
to the cloud, these issues will all pop up in various shapes and forms. And
that's a big challenge. How do you address it? That's the kind of challenge
or issue that we are looking at.
And what did we find? It's kind of a compromise. We cannot have a hundred
percent move into the public cloud, because we do not know. Public clouds
don't promise anything. They don't give you SLAs which guarantee what's
called [inaudible] performance, five nines -- 99.999 percent [inaudible].
And you can't just run your own private cloud, because if you want to have a
cloud which is really meaningful in the cloud sense in your private
datacenter, you really need to have scale and you really need to manage it
well. It doesn't make sense.
Then what do we do? We come up with something hybrid, in between. We use
combinations of these. It's not that just because the cloud has become such
a fantastic IT [inaudible] technology available to all of us, we abandon the
datacenter and the enterprise and move onto the cloud. At the same time, we
are not going to augment it either. So the [inaudible] combination
[inaudible] use of [inaudible]. And when we want to do this hybrid cloud,
what do we leverage? How do we go about it?
So these are the kinds of questions that come up. Do we use combinations of
infrastructure, combinations of platform, or combinations of software
services to be able to put together the small ecosystem of applications that
represents the application that you have in mind? So at what level do we
leverage the cloud? Second is the cost. A program that uses a particular
cloud service with a certain pricing model may be meaningful, but when the
prices shoot up, you'd want to shift. That agility is something which
everybody looks forward to. Can we build in that agility? In fact, the
architect who designs these kinds of solutions has an additional dimension to
think about, and that's the variable pricing part. And there are associated
risks with it.
What configurations and deployments do we attempt? Do we have a certain
deployment which can then be revoked and moved on to another deployment?
Think about it, because in the real world, in the world of the enterprises,
applications continuously change -- they kind of metamorphose continuously.
[inaudible] a picture or a notion of an application.
So viewing all these kinds of big-picture situations in the industry, the
question that gets asked is: can we apply the principles and discipline of
software engineering to using cloud services? What do you mean, is the
question. So when we develop an application, or when we put together and
integrate a bunch of applications on the cloud, can we use the methodology,
the metrics, the software engineering principles and then come up with
mechanisms and means which can help people estimate the costs correctly,
estimate the [inaudible] properly, ensure that the service level is properly
handled, and manage and maintain it? These are the kinds of issues that we
face in a typical software industry, but can we do that? That's the question
that comes up.
And in that context, we have an answer, and the answer is there's something
called domain specific software architectures. If you closely observe,
applications don't live in isolation, they are clusters of applications
solving some class of problems in a certain domain and a lot of interesting
things that go along with it.
There is this nice, interesting book called Software Architecture:
Foundations, Theory, and Practice, which has been coauthored by one of the
authors of this presentation, Nenad Medvidovic, and in it you will find a
definition wherein we talk about a domain specific software architecture
comprising a reference architecture, a component library, and an application
configuration method, and how this reference architecture can be used to make
a lot of principal design choices.
So moving forward, we did an analysis of a lot of grid technologies too. We
went and applied some of this understanding to see how good programming
packages like these go about implementing their understanding with respect to
the reference architecture that they talk about. And then we found that
there are a lot of interesting inferences one can draw about how these kinds
of technologies have been used by the applications that have been using them.
The good thing about grid is that both above, at the application level, and
below, at the systems level, things are visible. But in the case of clouds,
typically with the public service offerings you'll see that it's opaque. You
cannot really understand how Google manages its Google file system or what
Microsoft does with respect to its database or data management, right? Of
course there's a certain understanding, but nothing beyond it.
This kind of gave us a certain insight --
>>: What is the third column, KSLOC?
T.S. Mohan:
Kilo source lines of code.
Okay? From this we could extract this kind of abstraction. And this is very
high level, by the way. Each of these boxes represents a certain set of
abstract calls, APIs which specify the architecture and which cut across
several of these implementations.
For example, there was this thing called the runtime [inaudible] which does a
lot of the runtime management aspects of a typical grid system. There was
this resource abstraction which captures within itself things like storage,
things like database, things like persistence, et cetera. And there's the
fabric part, which covers the communication part.
This is the rest of the grid. Now, can we apply these kinds of abstractions
in the world of cloud? That's the challenge. And if they apply, can we
influence the way the migration to the cloud happens such that it's really
optimal? That's the kind of challenge that we are taking up.
And in that context, we looked at a number of things that we run into when
we do cloud programming, and it's typically what's been talked about for a
long time as the distributed systems fallacies: for example, that in the
cloud the network is fully reliable, which is not true; that there's zero
network latency, which is, again, not true, because maybe within a rack the
latency is pretty small, but across racks or within a datacenter it could be
a little larger, and across multiple datacenters, absolutely big. So these
kinds of assumptions that we make impact the performance of the application.
Right? These are a bunch of things that have been true for a long time in
distributed systems, and they affect the way cloud programs are configured,
deployed, or programmed.
And if you closely observe, there are other things which also have to be
worried about. But then we do not have a direct handle on them in the cloud
service abstractions, like the [inaudible] infrastructure or the platform,
where we could explicitly play around at this level.
So keeping these kinds of things in mind, keeping the DSSA in mind, what are
the typical steps that we go through for migration? Perhaps first we
evaluate and assess what options we have when we split an application into
components, and see which components need to go to which part. Then we do a
pilot at the right level of migration, check it out, and then re-architect or
redesign or recode the total component so that it migrates in full. Having
done that, we leverage the platform advantages and then we, of course, look
at the larger picture. There's a platform, but then there's a larger picture
of all of these together, and then we validate it. Once we validate it, of
course, we refactor and reiterate the [inaudible] kind of migration service
that we get to.
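Purely as an illustration of the iterative process just described (the
component names and the levels chosen for them below are invented, not a real
migration plan), the steps could be laid out like this:

```python
# Invented example of walking application components through the migration
# steps described above; nothing here is prescriptive.
candidate_plan = {
    "web_front_end":    "PaaS (reprogrammed against a platform such as Azure)",
    "order_processing": "on-premises (sensitive data stays in the private datacenter)",
    "reporting":        "IaaS (moved largely as-is onto rented instances)",
}

steps = [
    "evaluate and assess the options for this component",
    "pilot the chosen level of migration",
    "re-architect / redesign / recode so it migrates in full",
    "leverage platform-specific advantages",
    "validate against the larger ecosystem of applications",
    "refactor and iterate",
]

for component, level in candidate_plan.items():
    print(f"{component} -> {level}")
    for step in steps:
        print(f"  - {step}")
```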
Having done that, what is it that we come to? We have come to reclassify the
kinds of cloud services abstractions that we'd like to look at, and that
gives a [inaudible] domain specific architecture viewpoint for a class of
[inaudible] programs. Among these abstractions, the domain specific
application services abstraction is core to the software as a service thing.
The platform runtime collector services go to the platform as a service cloud
offering. The runtime collector services abstraction is the one which
manages the cloud within Amazon. The resource services abstraction takes
care of storage, and, of course, the fabric services abstraction handles
things like MQ, the message queueing support, which is there.
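One compact way to read that reclassification is the mapping below; the
wording paraphrases the description above, and the dictionary itself is just
illustrative shorthand.

```python
# Paraphrased mapping of the reclassified abstractions to the cloud service
# offerings they most naturally align with, per the description above.
abstraction_to_offering = {
    "domain-specific application services": "SaaS (software as a service)",
    "platform runtime collector services":  "PaaS (platform as a service)",
    "runtime collector services":           "instance management, as within Amazon",
    "resource services":                    "storage, database, persistence",
    "fabric services":                      "communication, message queueing (MQ)",
}

for abstraction, offering in abstraction_to_offering.items():
    print(f"{abstraction:40s} -> {offering}")
```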
And having said that, we are studying a lot of applications to get things in
place.
And, of course, what does it mean in the existing setup, these kinds of
stuff? So this kind of combines what -- like we have a typical [inaudible]
wherein you can explicitly ask for run time support, you can explicitly ask
for [inaudible] interface and you can communicate between the various server
instances.
Now, this is what is kind of interesting, so this would be in the kind of
abstractions that we have, the reclassified one. Same case with platform
service and same case with the software service. In fact, while this looks
really abstract, there's a lot of detail work going on, and I think given the
time, I would like to stop at this level of abstraction.
And how challenging is it to do this job? To be very clear, we have
opportunities and options to either re-architect the whole application or
parts of it, redesign it with the same architecture [inaudible], or perhaps
re-implement it on a different programming platform. That is one part.
Another part is that we need to keep these kinds of issues and parameters in
mind. And when you look at it, the kinds of options that we have: on one
side we have the private datacenter and private cloud with the code -- we
take an application, we split it into parts, we take one part, and we can
either keep it sequential as-is or we can run it in one of the three modes
within the cloud. And the same on the public cloud too.
So the number of options is humongous. Too many of them. This number of
options times that number of options is what we need to consider when we want
to do a migration. This kind of gives the big picture of what I wanted to
say, and I'll stop here.
>>: Thank you. So we can ask questions without having to go around a
microphone, because we have microphones on the ceiling. So please ask
questions to our speaker.
>>:
You were surprisingly very clear [laughter].
T.S. Mohan:
Thank you.
>>: So the emphasis is -- what is the position of Infosys with the cloud?
What are the plans?
T.S. Mohan: That's a good question. As I told you, on one end of the
spectrum we have the large service providers like Microsoft or Amazon who
have huge captive cloud platforms. On the other end of the spectrum we have
companies like RightScale and Cloudera, and these are all small companies
that sell these enabling services, but in certain specific domains. But then
companies like us, we fall in the in-between category. We kind of look at
both ends and say, should we be having a captive datacenter or should we be
doing that? If we do that, then we're really so small, we're not really
looking at the scale that comes out of that. If we do this, then we are not
really a service provider in the sense that we're IT services [inaudible],
no. We are getting to the --
>>: [inaudible] still looking into it?
T.S. Mohan: Not just we. We found that a lot of companies are still
grappling with [inaudible], and that's where this kind of thing comes in.
Because if you see the earlier slide, the complexity of this times this, the
number of ways in which you can do it, shows what the migration of a typical
application could be like. And an application in the enterprise doesn't
exist in isolation. That's the first reality that one has to wake up to.
The second thing is that it's an ecosystem of applications that exists, and
in that ecosystem, parts of it -- parts of applications -- will be either in
the public cloud or on the [inaudible]. And conquering that complexity is
where I think our strength is going to be.
>>:
So what is the position on the [inaudible].
T.S. Mohan: The position is to research and build up the competencies, sell
to the customers and have them make their money, and we make our money in the
process.
>>: Where do you address the quality of service requirements? Is it just
part of the pricing or [inaudible]?
T.S. Mohan: Good question. Now, if you are asking me what the viewpoint from
a service provider is, then it's something called SLAs, or service-level
agreements. The QoS parameters fit into the service-level agreements, and
when a datacenter has multiple options of what tasks to schedule and what
resources to allocate, the QoS comes into the picture because the SLAs have
agreed-upon terms. And if you closely observe today, most of the large-scale
service providers, like either Microsoft or Google, wouldn't want to commit
to very tough SLAs. In fact, they don't guarantee anything at all. And in
spite of it, many people have benefitted from using the cloud. But then the
usage of the cloud is also not that [inaudible]. It's not like you have a
mission-critical transaction-posting system sitting on the cloud which can
take on surge needs like, for example, high loads and still [inaudible].
Right?
>>:
[inaudible].
T.S. Mohan: I didn't catch all of what you said about Apex, but I can share
this thought with you. Domain specific software architectures are not just
limited to the platform or to the architecture. In general, at the systems
level we talk about a domain, say finance or, say, insurance or, say,
biology, so we look into the specific class of problems that a particular
approach solves and look at the domain specific best practices there. That
dictates the architecture, and that gets [inaudible].
>>:
Do you have another question?
>>: Yeah. So you talk about kind of an aversion to varying pricing.
Enterprises are worried that these prices are going to change. Do you think
that's a valid concern, or -- it seems to me that this variable pricing is
going to exist regardless of whether you use a cloud or you build your own
datacenter. I mean, you have to pay for electricity from some utility --
T.S. Mohan: You see --
>>: No, no. [inaudible].
T.S. Mohan: Exactly. Variable pricing, if it changes all the time and it's
going to upset your budgets, is a big concern. If variable pricing is going
to be steady for a month or two at a time -- like suppose every alternate
month Microsoft announces a new pricing, and that's a little predictable,
say -- absolutely no problem. No worries. But suppose you are budgeting --
say, for example, there's a big Superbowl activity going on in the U.S., a
bunch of startup companies have started off and they want to come up with a
program that anybody can use, perhaps an analysis of how the batting or
bowling order is done, and for that they want advertisements to be priced and
sold, and for that they have to fix the tariff, and at that time they have to
consider the pricing that they have to pay -- and suddenly if that goes up or
down, the whole thing changes. Right? And that's where pricing [inaudible].
This is something that architects have to consider when they design these
kinds of applications. It's not exclusively a concern of the managers.
>>:
I think we're behind schedule.
T.S. Mohan:
Thank you so much.
Thank you very much.
Thank you.
[applause]
>>: And now we have Tajana Simunic Rosing from the University of California,
San Diego, and she's going to be talking about achieving energy efficient
computing for future large scale applications. Tajana came about two months
ago --
Tajana Simunic Rosing: About, yeah.
>>: About two months ago, and her presentation was fabulous so we insisted
on having her in this workshop.
Tajana Simunic Rosing: Thanks. I appreciate the compliment. I hope you'll
enjoy it. Otherwise I'm in big trouble, right?
So I'm actually heading the System Energy Efficiency Lab at UCSD. And the
focus of our work, as you can see from the title, is on achieving energy
efficiency, in this case across scale. So from our view, the future of IT is
actually bridging the gap from very small devices that may exist around us in
terms of sensors and various ways of measuring things and interacting with
the environment. Those devices already today talk to the mobile devices that
we carry around with us -- our cell phones, our iPads, for those of you who
got in line -- and other systems that are basically battery operated, and
eventually the data makes it to the infrastructure cloud.
The interesting issue here is that energy is a problem across all of these
scales. You've got a problem with energy if you carry anything that's
battery operated, or, even if you go to the outside rim of the circle, you
have devices that may use energy harvesting, where energy is really at a
premium. It's a problem for the infrastructure because it costs a lot to
operate these datacenter clouds. So the question is how do we actually
optimize, how do we deliver good performance, how do we deliver the results
that people are after while at the same time maximizing energy efficiency.
So I'd like to give you a couple of very realistic examples of applications
and application domains that we've been working with in the San Diego area.
So what you see on this map is actually a very large scale wireless mesh
sensor network. The picture represents only the top layer nodes. So only
the big communication links. Under every one of the dots that you see on
this map are literally hundreds of sensors and sensor node cluster heads.
UCSD is this little dot right over here. The network actually spans about a
hundred miles in length, it goes 70 miles off the coast, and it covers an
area from almost down to the Mexican border all the way up to Riverside
County.
What's really exciting to me personally about this network is that there are
very few computer scientists that do research on this. There are lots of
people who do research in other areas and actually communities that utilize
this network on a day-to-day basis.
So let me give you a couple of examples. On the very low end, we actually
have a whole bunch of earthquake sensors on the network. The sensors produce
about five kilobits per second worth of data, which really isn't a big deal
at all. The only problem is that they're pretty much all solar powered
because they tend to be in these random locations. Right now they're in
one-mile increments around all of San Diego County.
The goal behind this particular application is to study what's going on with
seismic activity globally, world wide. And the San Diego area was the
starting point.
Because of the availability of fast wireless connectivity, this project was
so successful that it recently got funding to expand all the way along the
West Coast, because they're now able to actually stream the data on a
continual basis and catch even the smallest tremors on time.
So five kilobits per second, not a big deal. But when you compare it to
everything else that sits on the network, it can start becoming a problem.
Motion-detecting cameras and acoustic sensors are actually present in one of
our ecological reserves. The subject of study here, which you can see in the
picture on the left, is wolves. So there is an indigenous California wolf
population that actually lives in this area, and what people are trying to
understand is the basics of how wolves behave. They look at that from the
perspective of video and also audio.
Unfortunately, audio and video are physically separated. So video tends to
be placed up high so you can see a good picture. Audio tends to be placed in
areas where you don't have a lot of wind. So it's in a different location.
And you need to be able to gather these realtime streams of data and
correlate them in time and then run sophisticated algorithms. So on the
audio that you can see right in here, they're actually trying to run speech
recognition to understand and correlate the pattern of the sound that the
wolf makes to the behavior that they see on video.
So here are the two streams of data that you actually need to analyze fairly
quickly in realtime, and it consumes more bandwidth than earthquake sensors.
On top you see another ecological reserve, and actually the reason why I
included this is the student that you see sitting up there. She's actually
sitting on a big ledge on the top of the canyon right up here. And what you
see is her laptop with an antenna. This antenna is pointing to our access
point. This student can get about 11 megabits per second connectivity on the
edge of that canyon. She can reconfigure all of her experiments throughout
about a one-mile-by-one-mile area where she has set them up.
Her job is to study ecology. She doesn't really care about computer science
at all. What this particular network has enabled her to do is to actually
study things in realtime both from her office and also to reconfigure her
experiments in realtime even when she's in the field by simply sitting down
and pointing the antenna in the right direction.
Down here you can see a couple of still images from a video produced to
support the California Fire Department. So San Diego, as opposed to Seattle,
has way too much sunshine. So much so that in September and the beginning of
October we have fire season. And unfortunately, fairly
frequently we get very large fires, and what you probably are not aware of is
when the fire starts in the San Diego area, usually the wind is blowing
really hard. It can propagate extremely quickly. And the only way for the
fire department to know how to deploy people is to have some way to monitor
this progress on the ground.
With a network such as this one, they're able to do this from their offices.
They're also able to get alerts that tell them this particular area has ripe
conditions for a fire, or we can even see a fire beginning.
And then on the very high end, I give you an example of one of the two
observatories that we have on the network. While this is not terabytes of
data, we have limited this observatory to 150 megabits per second, primarily
because our network actually supports about that much. If we gave them more
bandwidth, they can definitely do more. But that would mean that nobody else
could use the network.
The reason why I include this is so you can see the wide range of bandwidth
that has to be supported, from 150 megabits per second down to five kilobits
per second. And also all these examples include some constraints on quality
of service. So if you have a fire in the middle of night and the observatory
happens to be streaming beautiful pictures of the night sky, you bet that
people will prefer for the fire images to make it first and to get to the
fire department on time.
However, this is an event that's unpredictable, so you need to be able to
reconfigure, and you need to be able to deliver data and compute and detect
right in time.
So this actually brings up another idea that we've been working on, which is
the CitiSense Project. This project has been funded by NSF and also by
industry and is done jointly with another NIH-sponsored project which looks
at how the environment around us and the decisions that we make on an
everyday basis affect our long-term health. And basically this project is
possible because we do have a large-scale environmental sensor network that
we can use to provide feedback to us on a daily basis as we decide to go
exercise or as we decide to just be lazy and, you know, sit with a laptop in
our lap.
So with projects like this, what you find is that we have a combination of
data that comes from the environment and data that's relevant to us as
individuals, in the form of how much you are exercising and when, and also in
the form of what genetic background you might have, that would allow people
like healthcare professionals, public health officials, your doctors and you
as an individual to make better decisions going forward.
If you start imagining a system in which everybody is being monitored 24/7
and is getting positive, hopefully, feedback from this system, you are
beginning to imagine a humongous amount of data that this is going to create
and the humongous opportunities for large scale computing both on the back
end in the cloud and also locally on your cell phone.
If you think about healthcare, you need to have sometimes realtime feedback
right where you are. You cannot possibly always rely on data streaming
somewhere to the back end and then the result coming to you. So you need to
be able to actually do computation on both sides, and you need to do it
efficiently.
So here's an example of a clinical trial that we ran. This was last summer.
This particular clinical trial was done with our school of medicine, and it
focused primarily on physical activity. A very simple question was asked.
The question was if we provided realtime feedback to individuals through
their cell phone in a very low-tech way on how much physical activity they
have done to date and some small encouragement to do better, will they change
their behavior, how much effect would it have.
So for that we selected a sample of over 60 individuals. Individuals were
selected primarily based on the fact that they had struggled with obesity.
So these are people who typically do not exercise a whole lot. They were
given a cell phone and a couple of sensors were placed on their body,
basically a heart rate monitor and an accelerometer. And the feedback was
then through SMS and MMS messages. So very low tech.
It turned out that because they were able to get very realtime information
about how much they exercised, people actually significantly increased their
activity. They changed their behavior. And the result of this change was
much more significant weight loss when you compared the group that we studied
that used our system versus the group that didn't.
And what I think was the biggest outcome of this study is that over
95 percent of people who used the system wanted to buy it. They wanted their
friends and family to use it, they liked it so much.
So what we learned from this is people actually do care, and they do change
their behavior if you provide feedback in a way that's relevant to them. And
that is really what motivated our work going forward on the healthcare
management system where we look also at the environment and its effect.
One of the challenges that we found is because we're doing this monitoring
24/7 and providing feedback 24/7, energy became a very big issue. Batteries
started dying very quickly. So this is what motivated research in energy
efficiency across the scale.
So I was very happy when Dave Patterson said, well, you know, if you want a
research topic, you need to really look at how to do energy efficiency in
mobiles and on the cloud, because that's exactly what my group does. So I
guess I listened to him, huh?
So the idea here is that we have sensors in the environment, we may have some
sensors on the body. Those may or may not be battery powered. Some of them
may actually be using energy harvesting. So an example would be solar and
wind. You have what we call here a local server, or cell phones, basically,
and then you have back end. And there are sets of tasks that you can assign
to the sensors, that you can assign to the cell phone, and that can run on
the back end. There's some tasks that can only work on the sensor, like
sensing, obviously, but there's a good fraction that you can assign to any
part of this system.
And the decision of who runs what at what period of time will significantly
affect the length of the battery lifetime that you will have, and it will
also affect the amount of computation you end up with at the back end. In
fact, in our most recent results, we found that if you dynamically assign
these tasks across this scale, you can increase battery lifetime by about
80 percent on the mobiles, which is a very big deal for the particular
healthcare scenario we're looking at.
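As a toy illustration of that kind of placement decision (not the actual
algorithm from this work; every number below is invented), one can compare
the energy a phone would spend computing a task locally against the energy
spent shipping the task's input to the back end:

```python
def cheaper_to_offload(cycles, input_bytes,
                       joules_per_cycle=1e-9,     # illustrative local compute cost
                       joules_per_byte_tx=5e-7):  # illustrative radio transmit cost
    """Toy model: offload a task when transmitting its input costs the phone
    less energy than computing it locally.  A real policy, like the one
    described in the talk, would also weigh deadlines, battery state, and
    any harvested energy."""
    local_energy = cycles * joules_per_cycle
    offload_energy = input_bytes * joules_per_byte_tx
    return offload_energy < local_energy

# A feature-extraction task: ~2 billion cycles locally vs. shipping 50 kB of data.
print(cheaper_to_offload(cycles=2e9, input_bytes=50_000))   # True -> offload it
```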
So with that, I would like to now focus a little bit more on the cloud side
of this equation. So what does it mean to do energy efficient computing in
datacenters. And for this, there are a number of challenges that we're
looking at. So what we're monitoring are temperature, power, and
performance. So we've been talking a lot about energy, so people are used to
thinking about power. Everybody cares about performance.
Temperature I don't think has been mentioned at all, and yet if you look at
the operating cost of a datacenter, depending on how well you designed it,
about half of this can go to cooling.
Temperature strongly affects reliability, and that is why you cool. So you
really cannot design a system that's energy efficient without looking at the
cooling and thermal aspect.
What we control are various cooling settings, power states and task
scheduling. So where should the job run. And what we actually look at
predicting is what temperature is likely to do in the near future, because it
turns out the temperature changes relatively slowly as compared to the
workload. We also tried to estimate what incoming workload will likely do in
the near term.
The goal of these predictions is to buy us a little extra time so that we can
be more energy efficient.
The particular research focus in my group is on looking at both what we can
do in terms of individual server redesign, and there it focuses on memory and
storage architecture. So what can be done to make a computer more energy
proportional. It's not the CPU redesign, it's how the memory subsystem
actually interacts with the CPU.
We look at power management techniques, cooling, and then use virtualization
as a method to actually implement all this.
So for power and thermal management, we've been lucky to get funding from the
NSF Project GreenLight to deploy a green cyberinfrastructure, which basically
consists of a couple of these datacenter containers that you see up there.
This allows us to play, at fairly large scale, all kinds of thermal and
energy management games that wouldn't be possible just within the small
machine room that most departments have.
So for that we have developed some power management algorithms and also some
thermal management algorithms, and I'll show you in a second a little bit
more about each.
So for power management, we looked at traces of workloads, realistic
workloads, running on a whole bunch of different devices. And here I've just
included a sample of two, a hard disk trace and a wireless network interface
trace. The reason why I included these two is because intuitively it would
seem like they should look completely different. These are totally different
devices. One is much slower than the other. One tends to work with larger
sized data, the other one works with smaller sized data, and yet, when you
look at the shape, the shape is the same.
What's on the x axis is the interarrival time between requests to the
particular device. What's on the y axis is 1 minus the cumulative
probability distribution of those interarrivals.
What you see in both cases is that the experimental data, which is in this
teal color, does not match an exponential fit at all. It actually matches a
Pareto distribution, which is a heavy-tailed distribution, a lot better.
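In symbols (notation chosen here for illustration, not taken from the slide),
the two tail functions being compared on that y axis are:

```latex
% Tail (1 - CDF) of the interarrival time T between device requests.
% The exponential tail decays fast; the Pareto tail stays heavy, which is
% why the exponential fit badly underestimates long idle periods.
\begin{align}
  \Pr[T > t]_{\text{exponential}} &= e^{-\lambda t}, \\
  \Pr[T > t]_{\text{Pareto}}      &= \left(\frac{t_{\min}}{t}\right)^{\alpha},
      \qquad t \ge t_{\min}.
\end{align}
```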
The reason why this is really important is because when are you going to do
power management? You're going to do it when you have long enough idleness,
right?
Now, look at what happens at long idle times. At long idle times,
exponential fit is very poor. It's not even close. The reason why we even
talk about the exponential distribution is because people use it to model
performance. To understand, they use basically queueing theory.
If you look at the high performance regime, exponential is actually close
enough. So it makes sense in some cases to use it for performance modeling.
It makes no sense to use it if you want to save energy. You're going to make
a whole bunch of wrong decisions.
So based on this, we actually extended a Markov decision process model and
accounted for the fact that we need heavy-tailed distributions and to monitor
the recent history of the workload's behavior. And because we did that, our
implementation showed significant power savings: the measurements were within
11 percent of an ideal oracle policy. An ideal oracle policy is a policy
that knows the future, so as soon as an idle period begins, it knows exactly
when it will end.
The assumptions were that we have a general distribution that guides the
request interarrivals, that we have exponential distribution for everything
else, because it turned out that that was close enough, and that everything
is stationary.
The last assumption is actually the most limiting one. Stationarity means
that the statistical properties of all of the parts of the system do not
change in time. That clearly is not true. So in order to address that, we
actually used an online learning algorithm.
What this algorithm does is it takes a number of policies that may be a
result of our optimization and then it adaptively selects among them. As it
selects each expert or policy, that policy will make decisions. Once a
decision is made, we can evaluate how well it's done, and then we update the
costs. And then on the next interval we'll select the next best performing
expert.
What's nice about this particular algorithm is that it's guaranteed to
converge to the best selected policy very quickly. The convergence is at a
rate that's a function of the number of the experts and the number of time
periods when you actually evaluate this. So the end result is that you get
very good savings, even when the workloads are changing.
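A minimal sketch of that style of expert selection, assuming a standard
multiplicative-weights update (the talk does not spell out the exact update
rule, so treat this as one plausible instantiation rather than the
implementation used in the work):

```python
import random

def online_expert_selection(experts, loss_fn, num_intervals, eta=0.5):
    """Pick one power-management policy ("expert") per interval and down-weight
    experts that would have performed poorly, so the controller converges
    toward the best policy for the current workload.

    experts:  list of candidate policies
    loss_fn:  loss_fn(expert, interval) -> loss in [0, 1], e.g. a weighted sum
              of energy used and performance delay incurred in that interval
    """
    weights = [1.0] * len(experts)
    chosen = []
    for t in range(num_intervals):
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = random.choices(range(len(experts)), weights=probs, k=1)[0]
        chosen.append(experts[pick])
        # After the interval, evaluate every expert on what just happened
        # and shrink the weights of the ones that did badly.
        for i, expert in enumerate(experts):
            weights[i] *= (1.0 - eta) ** loss_fn(expert, t)
    return chosen
```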
So let me give you an example. In this particular example we specifically
chose real-life traces from a datacenter -- in this case this was done at
HP -- that have fairly different properties. So what you see here is the
average interarrival time and the standard deviation. So I specifically
picked traces that have significant differences between each other.
The first table shows you the results if you have each individual power
management expert making all of the decisions all the time. And you can see
that one of the policies has been optimized for least overhead and best
performance; the other one is optimized for maximum energy savings.
With our online learning controller we can now trade off how much performance
overhead versus how much energy savings we want seamlessly across the traces.
And as we trade that off, we see that our controller will pick the policy
that gives us the lowest delay when we choose that, and it will automatically
pick the policy that gives us the maximum energy savings when we need that.
So it is able to adapt very quickly across different traces and across a set
of policies.
You can do the same thing for changing the voltage and frequency of operation
on the processors. In this case, what we're looking at is running from
40 percent to 100 percent speed. And as we trade off, again, for lower
performance overhead you tend to run faster, and for more energy savings you
will tend to pick a lower frequency setting.
What's interesting is that a fairly good fraction of the time you also will
run faster. Right? Why is that? Well, the reason for that is fairly simple.
If you look within a typical workload, you will have parts of the time when
the workload is very intensive in terms of CPU time, and you'll have chunks
of time, which can be fairly large, when it's waiting for data to come from
memory. During those times you can slow down without any performance hit at
all. So this particular online learning approach can actually adapt very
easily. It can monitor this and immediately pick the right approach.
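A rough first-order model of why that works, under the simplifying assumption
(mine, not the speaker's) that execution time splits into a CPU-bound part
that stretches with lower frequency and a memory-stall part that does not:

```python
def runtime_at_frequency(base_runtime_s, mem_stall_fraction, freq_ratio):
    """Estimate runtime when the core runs at freq_ratio of full speed
    (e.g. 0.4 = 40 percent).  Only the CPU-bound portion stretches; time
    spent waiting on memory is assumed unaffected.  Illustrative only."""
    cpu_part = base_runtime_s * (1.0 - mem_stall_fraction) / freq_ratio
    mem_part = base_runtime_s * mem_stall_fraction
    return cpu_part + mem_part

# A heavily memory-bound phase (90% stalls): slowing to 40% speed adds ~15%.
print(runtime_at_frequency(10.0, mem_stall_fraction=0.9, freq_ratio=0.4))  # 11.5 s
# A purely CPU-bound phase: the same slowdown would cost 2.5x the time.
print(runtime_at_frequency(10.0, mem_stall_fraction=0.0, freq_ratio=0.4))  # 25.0 s
```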
Now, all of this up until now was only about energy management. The second
half of the equation is temperature. So what this graph shows is the
percentage of time that you spend above a certain temperature range if you
use a standard Linux scheduler for a set of workloads, and if you do
energy-aware optimization. So in this case, if I assume I know exactly what
I'm going to run and I do an absolutely optimal assignment and maximize my
energy savings, that is the result that I'm going to get in terms of
temperature distribution.
And the last one is if I also do thermally aware optimization. And what you
immediately see is that optimizing for just energy savings does not solve
your temperature problem. The reason why it does not solve it is very
simple. So you're going to maximize your energy savings if you cluster your
workload into as few areas as possible and you shut off everything else.
Shutting off definitely will cool things down. However, clustering will heat
up the area dramatically. So it's because of the clustering that you end up
with all these hot spots that you see on this plot.
So as you think about energy savings, you actually have to look at both sides
of the equation. You have to consider thermal constraints.
So we did that using the same online learning algorithm that I showed you for
power management, and what we did here is we took workloads that were
collected from an 8-core UltraSPARC T1 system. This was done at one of Sun
Microsystems' customers, so we used their workload.
And then we basically took one hour from each day over a period of the week
and we concatenated that together to show adaptability. And that's how you
get A, B, C and D workload and then the average on the right.
And what you can see here is a set of policies, starting with default OS
scheduling (this was actually the Solaris scheduler), then thread migration,
which will move a thread when things get hot, power management and voltage
scaling, which will basically either go to sleep or slow down when it gets
hot, an adaptive random policy, which actually improves on standard operating
system scheduling by proactively scheduling to the cores which are relatively
cool, and then online learning, which just selects among all these policies.
And you can see that across all of the examples, online learning will beat
every single individual policy. In fact, it even wins by 20 percent in terms
of hot spot reduction in comparison to the best individual policy. So being
adaptive really pays.
So these are great results except for the fact that every time you do thermal
management, you pay a price in performance. So when you migrate your thread,
it costs you some time. When you slow down, it definitely costs you time.
If you go to sleep, it kills your performance.
So instead of reacting, what you really want to do is you want to be
proactive. You want to avoid getting hot, if you can, while still delivering
good performance. So that is exactly what we did. We forecast the
temperature, and based on that forecast, we proactively assign the workload
so that performance is kept at the best possible level and energy is saved.
And we did this by taking data from temperature sensors. Every single system
has a whole bunch of thermal sensors in it, and all we have to do is tap into
them online. So we take the data, we develop a predictor based on a
statistical model, in this case an ARMA model, we predict the temperatures,
and use that as feedback to the scheduler; the scheduler then makes proactive
decisions on how to send the workload, and based on those decisions, you
hopefully get a much better result.
Now obviously as things change dramatically, you may have to update your
model. So we have a very quick online way to update that.
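A minimal sketch of that loop, using a plain autoregressive least-squares fit
in place of the full ARMA model from the work (the temperature values, model
order, and scheduling rule here are all illustrative):

```python
import numpy as np

def fit_ar_predictor(temps, order=3):
    """Fit a simple AR(order) model to a window of temperature samples via
    least squares.  Stands in for the ARMA model used in the actual work."""
    X = np.array([temps[i:i + order] for i in range(len(temps) - order)])
    y = np.array(temps[order:])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(temps, coeffs):
    """One-step-ahead temperature forecast from the most recent samples."""
    return float(np.dot(temps[-len(coeffs):], coeffs))

# Scheduler side: send the next chunk of work to the core forecast to stay coolest.
history = {0: [55, 56, 58, 59, 61, 62], 1: [48, 48, 49, 49, 50, 50]}   # deg C
forecast = {core: predict_next(t, fit_ar_predictor(t)) for core, t in history.items()}
coolest_core = min(forecast, key=forecast.get)
print(f"schedule next job on core {coolest_core}, forecasts: {forecast}")
```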
And the end result is, as you can see on the right here, over 80 percent
reduction in thermal hot spots. Right? And we do that without necessarily
having much performance overhead. In fact, the only time that you get any
overhead is when you run in a very high utilization regime, and there really
isn't any way to proactively schedule; you simply have to slow down or
migrate. So it really pays to be proactive. And this is why we're convinced
that predictive work will actually make a big difference in these systems.
Since the last time you saw this talk, we have actually gone a step further
and we've looked at cooling-aware thermal management as well. The basic
intuition is very simple. It turns out that if you look at the typical fans
in your server, they run at a fixed number of speeds, say about five settings
is pretty typical. The speed settings differ in the amount of power they
draw. In fact, as you increase fan speed, power is actually cubically
proportional to speed. So the amount of power you're going to lose by
increasing the speed is huge.
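In equation form, this is the standard cubic fan-law approximation that the
statement refers to:

```latex
% Fan power grows roughly with the cube of fan speed s, so stepping a fan
% from speed s_1 up to s_2 multiplies its cooling power draw by (s_2/s_1)^3;
% doubling the speed costs roughly eight times the fan power.
\begin{equation}
  P_{\text{fan}}(s) \propto s^{3}
  \quad\Longrightarrow\quad
  \frac{P_{\text{fan}}(s_2)}{P_{\text{fan}}(s_1)} = \left(\frac{s_2}{s_1}\right)^{3}.
\end{equation}
```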
As a result, it pays to actually try to make sure that you pack as many jobs
as you can into a particular socket or into a particular server, up to the
point that doesn't cause the next increase in fan speed. So that's the quick
intuition behind what we did. We said if I have a high-speed fan and I have
a low-speed fan, I'm going to look to move jobs off the server with the
high-speed fan so it slows down, in such a way that the low-speed fan doesn't
speed up. And the other way around also: I may want to swap threads between
those two in such a way that the speed of the fans does not increase but
actually remains reasonably low, and therefore we can save.
So you can see from these results on the various workloads that we ran that
you can actually get about 73 percent savings in terms of just cooling
energy.
So putting it all together, we've been using the Xen virtualization system
and we've extended it into what we call vGreen, and what we're doing there is
online workload characterization and also thermal characterization at the
same time. We capture the characteristics of every single virtual machine
running on every single server and every single processor, and those
characteristics are aggregated all the way up to the node level so that we
can make decisions that have to do with the individual VMs and also the
individual physical components.
And then we use that to perform scheduling, power management, thermal
management, and to make migration decisions, if any.
And you can see from preliminary results that even in highly utilized
systems, so these are systems that are running at 100 percent utilization,
which is not really that realistic, we're able to get good energy savings
with speed-up, and we get these because we're monitoring characteristics of
each VM and we schedule VMs that play together well on a single socket or on
a single server. That is really the only reason why we get these kinds of
savings.
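One way to picture that co-scheduling decision is the simplified sketch below
(not the actual vGreen heuristic; the metrics and the scoring rule are
illustrative): characterize each VM by how much CPU and memory-bandwidth
pressure it creates, and place a new VM where it complements the existing mix.

```python
# Simplified, illustrative sketch of profile-aware VM placement in the spirit
# of the co-scheduling described above; not the real vGreen policy.
nodes = {
    "node_a": [{"cpu": 0.9, "mem_bw": 0.2}],   # currently CPU-heavy
    "node_b": [{"cpu": 0.3, "mem_bw": 0.8}],   # currently memory-heavy
}

def placement_score(node_vms, new_vm):
    """Lower is better: the bottleneck resource after adding the new VM.
    Piling the same kind of pressure onto one node raises the score."""
    cpu = sum(vm["cpu"] for vm in node_vms) + new_vm["cpu"]
    mem = sum(vm["mem_bw"] for vm in node_vms) + new_vm["mem_bw"]
    return max(cpu, mem)

new_vm = {"cpu": 0.7, "mem_bw": 0.1}             # a CPU-bound VM arrives
best_node = min(nodes, key=lambda n: placement_score(nodes[n], new_vm))
nodes[best_node].append(new_vm)                  # it lands on the memory-heavy node
print(best_node)
```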
Now, as utilization comes down, you can see that the savings would clearly go
up if you have the ability to utilize more power and thermal management knobs
on your server machines. Going forward, we are leading a fairly large effort
through MuSyC, which stands for the Multi-Scale Systems Center. The goal of
this is to manage energy across all of the different layers within
datacenters and to ensure that energy will be consumed only when and if
needed, instead of being wasted.
And we do this from software layer all the way down to platform and hardware
level. There are a number of faculty involved from UCSD, UC Berkeley, USC,
Stanford, and Rice University on this project. So we're really trying to
look at this from a holistic perspective because we believe that you cannot
solve an energy problem by just doing a software solution, by just doing a
hardware solution, or even just solving the cooling problem. You really have
to think about it as a whole.
So to summarize, the key is really to ensure that we monitor what is going on
in the system, we develop policies that are aware of what's coming down the
line, that we're aware of the hardware characteristics. We can't just
pretend that hardware works perfectly and works equally everywhere. It
doesn't. We, in fact, need to leverage these differences to our benefit.
And as I've shown you, we have done some first steps, too, toward
implementing power management and thermal management policies that behave
well, and we've started integrating this into vGreen virtualized system. So
this is pretty much all I have to say.
>>: So we can ask questions without any microphone, so, please, if you have
any questions.
>>:
So do you have a study of how software inference [inaudible].
Tajana Simunic Rosing: Yes. So, actually, Alan [inaudible], who is on the
software energy management team, has worked with me on looking at how we can
predict the type of performance and the energy costs that future applications
are going to demand out of future hardware and how we would then design those
applications and design the hardware to meet that. Basically
the concept behind this has to do with creating profiles of the machines and
creating profiles of the applications. These profiles of applications are
basically looking at relatively simple kernels that you can detect as the
application is running. So you're not looking at the source code of the
application at all, you're just monitoring it running on today's system. And
as it's running, you gather enough information to figure out where are the
critical hot spots. Then you convolve those two together to figure out what
is going to happen if you take today's application on some new system or the
other way around, you know, what if I take some future application and run it
on today's system, what am I going to get. And based on this, then we can
actually make some better decisions. So that's a good question.
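A toy Python version of that "convolve the machine profile with the application profile" idea, with made-up kernel categories and numbers, just to show the shape of the estimate:

```python
# Illustrative only: an Amdahl-style weighted estimate, not the actual tool.

machine_profile = {              # relative throughput of a hypothetical future machine
    "dense_linear_algebra": 2.5,     # 2.5x faster than today's reference machine
    "memory_streaming": 1.2,
    "branchy_integer": 1.1,
}

app_profile = {                  # fraction of runtime spent in each kernel,
    "dense_linear_algebra": 0.5,     # measured by monitoring the app on today's system
    "memory_streaming": 0.3,
    "branchy_integer": 0.2,
}

# Projected runtime fraction on the new machine, then the implied speedup.
projected_time_fraction = sum(frac / machine_profile[k] for k, frac in app_profile.items())
print(f"Estimated speedup: {1.0 / projected_time_fraction:.2f}x")
```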
Any other questions?
>>: How did you actually measure the performance? Purely on the response
time or [inaudible].
Tajana Simunic Rosing: So it depends on the application. And that's a great
question. So if you look at multi-tier applications that have response time
guarantees, then, yeah, we basically measure response time. You can measure
the bandwidth. For the examples of video streaming, you know, you can
actually see the quality of the experience, frame drops, and so on. So it's
a strong function of what it is that you are actually
running, and that's part of the challenge here. And then if you look at what
happens within the kernel, you actually don't have a good way to get the idea
about performance, and that was one of the challenges that we looked at is
how do you provide this feedback to a virtual machine scheduler so it knows
what to do. Because otherwise how is it going to make it better?
>>: So you actually made a [inaudible].
Tajana Simunic Rosing: So we actually created a very thin layer interface
that allows application monitoring and feedback into the virtual machine in a
way that doesn't create overhead.
>>: [inaudible].
Tajana Simunic Rosing: Basically.
>>: [inaudible].
Tajana Simunic Rosing: Yeah.
>>: The power [inaudible] you are measuring as consumption, I don't think
you can have an exception for air conditioning [inaudible].
Tajana Simunic Rosing: Yeah. So I talked a little bit about that. We
haven't done the scale of a whole room, but we are capable of doing it. So
what we have right now is ability to actually measure how much we're
consuming at the whole level of the datacenter container because we're
measuring the amount of power that goes in and also we know the rate at which
water is coming in, what temperature the water is, and then the rate at which
it's going out and what temperature it is when it goes out. So based on that
you know exactly how much you're consuming on the whole box level.
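For reference, that container-level accounting reduces to heat removed = flow rate times the specific heat of water times the inlet/outlet temperature difference; a small Python sketch with placeholder numbers, not measurements from the actual container:

```python
# Back-of-the-envelope version of the water-loop heat accounting described above.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # c_p of water

def cooling_power_watts(flow_kg_per_s, t_in_c, t_out_c):
    """Q = m_dot * c_p * (T_out - T_in)."""
    return flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K * (t_out_c - t_in_c)

# Example: 2 kg/s of water entering at 18 C and leaving at 26 C removes
# roughly 2 * 4186 * 8, or about 67 kW of heat from the container.
print(cooling_power_watts(2.0, 18.0, 26.0))
```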
Now, inside we also measure all the temperature distributions of the racks,
the heat exchangers, and the servers themselves. And we know all of the fan
speeds. So you can kind of figure out, you know, and allocate the costs
across this. We're not quite at the point where we've developed policies
that run at that scale, so I started sort of from the server and then started
scaling it up. And a friend of mine who is working at the software level
more, he started from the top level, you know, monitoring everything, and
he's working his way down. One of these days we'll meet, right?
>>: Any more questions?
Tajana Simunic Rosing: Okay. Well, thank you very much. Thanks.
[applause]
Zach Hill: Hi. My name is Zach Hill, and I'm presenting our work on early
observations on the performance of Windows Azure.
So we've seen a lot of talks today about people's applications on clouds and
we did this and we did that, but we kind of took the perspective of an
application developer looking at these new technologies and trying to decide
do I want to use it and, more importantly, if I do, how do I build my
application in Azure specifically.
So the question is not can I do it but how should I do it and how should I
design my application to best utilize these services that are provided in
this cloud environment.
So, specifically, we're focusing here on how do various Azure services
perform somewhat in isolation, and we ran these experiments between November
and January, so just as it was coming out of the CTP and into the final
commercial release. So with some slight disclaimer that you may see
something different than this if you run these experiments today and you may
see something different than that tomorrow, as we all know, there are
precious few performance guarantees given by any of the cloud providers. So
we kind of give these as general recommendations and things we've seen and
experiences we've had working with the cloud, but certainly had to take all
of these with a slight grain of salt.
So here we kind of present a fairly typical application architecture. You
see things like this fairly often in documentation and literature. You have
some users submitting requests to some web-based front end that goes through
a load balancer that hands off work to some task queue and in the back we
have workers that operate on these tasks and do some batch processing or
whatever, and they then interact with various types of storage tables, blobs,
SQL services. So we're going to look at each one of these individually.
So first I'll start out with looking at kind of what's the performance we can
expect when deploying and scaling the compute resources themselves. So we're
not looking at how many CPU cycles are we getting, how fast can I execute
this algorithm, but if I wanted to actually deploy an application and scale
it up, for instance, what kind of performance can I expect, how long does it
take to do these operations so I can kind of give you an idea of the kind of
parameters you need to take into account when you're designing an
application, particularly scalable web applications and things like
that.
We'll also look at the storage services, so we'll do some benchmarks of
the task queues, the tables and blobs. These are fairly straightforward
measurements, but we think the results are interesting, particularly in
relation to some other metrics, such as the direct TCP
communication, which was released -- oh, was that January, I think, or
December when they announced the feature of allowing worker roles to interact
directly via endpoints. So it was not part of the original Azure offering.
Originally these instances could only communicate through the storage
services, but now you can actually define a direct TCP port and make direct
connections, so how does that fit into the larger performance picture.
And, finally, we'll wrap up with the Azure SQL services, so their actual
relational database in the cloud and how does that compare with either what
you would find in a local LAN environment and what kind of parameters can you
expect for scaling it and performance when you're actually writing
applications against it.
So starting off with the deployment and scaling, so our methodology here was
to kind of evaluate how long it takes to deploy and how long it takes to
scale. So we deployed applications from the blob storage itself. The
deployment packages were essentially trivially small, so less than five
megabytes, so we can kind of factor that out.
Then we measured the time to start the deployment. So we present some
numbers for some different instance sizes. In total, we ran eight cores.
So, for instance, for the small type we start up four instances and then we
scale it, and I'll talk about that in a minute, and for medium size we start
up only two because those are two-core, so you kind of get the math of the
cores.
Then we also measure the time to actually double the instance count. So we
start out measuring how long it takes to bring up four instances, and then
what if we want to bring up another four, and how those two numbers relate
and what can we kind of expect. And these are the experiments that were run
between December and January. We ran it 431 times, so if anybody's really
interested, we can give you this nice, long plot with every single data
point. You can see the variability and stuff. We will omit that for this
talk. We simply don't have time or space.
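A sketch of the kind of measurement loop involved, in Python; the instance-counting call is a placeholder for whatever management API you poll, not a real Azure SDK function:

```python
import time

def wait_for_instances(target, count_running_instances, poll_s=15, timeout_s=3600):
    """Record time-to-first-instance and time-until-all-instances-are-up."""
    start = time.time()
    first_up = None
    while time.time() - start < timeout_s:
        n = count_running_instances()          # placeholder polling call
        if n >= 1 and first_up is None:
            first_up = time.time() - start     # time to see the first instance
        if n >= target:
            return first_up, time.time() - start
        time.sleep(poll_s)
    raise TimeoutError("deployment never reached the target size")

# Usage idea: deploy 4 small instances and record both timings, then request
# 4 more and compare how much longer the scale-out takes than the initial deploy.
```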
We actually did experience a failure rate of 2.6 percent. So 2.6 percent of
the time one or more of the instances didn't come up. And that's worth
noting, because, again, with the hype surrounding cloud that you always get
this stuff, well, that's not necessarily the case. You don't always get the
resources either right when you request them or at all, so you have to
account for that when designing your applications.
So here's the time to deploy and the time to see the very first instance come
up. So here we have minutes in the scale, which is noticeable in the first
place that it's even a minutes scale, and here's the various VM instance
sizes, so small, medium and large. And then we distinguish between web roles
and worker roles. If you're not familiar, the web role essentially has IIS
attached to it and it hooks itself up to a load balancer.
So based on that, we kind of expected the web roles to take a little longer,
and indeed they do. We also maybe or maybe not, depending on your
perspective, expect larger instance types to take a little longer. It's
interesting that they do, but not significantly. So if you actually look at
the time per core in some sense, extra large, you get eight cores in
13 minutes versus 1 core in 9 minutes. So these kind of design tradeoffs are
interesting. If you need lots of resources very quickly, that's actually
your best bet.
But overall, the first impression we had when we saw this was, wow, it takes
ten minutes to bring a VM up. Why is that? So if there's anybody here from
Microsoft, I would love to have a talk with you about why it takes so long.
I'm not convinced that's absolutely necessary.
And even more interestingly, when we compare starting it up to actually
scaling it, so then doubling the instance size here, so we add four more VMs.
And so these are stacked charts. So you can see the total time at the top is
a total time to double the entire deployment size. So here the worker -- I
should note, these are all small instances. I only present the data for
the small instance types here. So from the time that we already have four
instances running for worker role instances to the time we can get four more,
it's just under 14, 15 minutes for that. So we can see, again, as
expected -- no, wait, that's start. For scale, it's significantly longer.
So the first ones come on and then we see kind of they trickle in. So
there's also no guarantee you'll get all your resources at once, although I
should note, we have seen some slight changes in the behavior recently, so
some recent experiments as of literally a few days ago rerunning some of
this, we actually have seen these gaps shrink significantly. I haven't
analyzed the data enough to say whether this goes up or this comes down, but
it's worth noting. And certainly when you're talking about dynamically
scaling applications, when it takes 20 minutes to bring up five more, four
more instances, that's something you need to take into account, particularly
if you're trying to follow some workload curve, right? I mean, if you're
trying to match your resources to some workload, you need to know that you
need 10, 20 minutes of lead-in to actually match that.
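To make that lead-in point concrete, here is a toy Python sketch of lead-time-aware scaling; the load predictor and the capacity figures are invented for the example:

```python
# Scale based on the load you predict *lead_time* from now, not the load you
# see right now, because the new VMs won't be ready until then.

LEAD_TIME_MIN = 20          # observed time to actually get new instances
REQS_PER_INSTANCE = 100     # assumed capacity of one instance (requests/sec)

def instances_needed(predicted_reqs_per_s):
    return -(-predicted_reqs_per_s // REQS_PER_INSTANCE)   # ceiling division

def scaling_decision(current_instances, predict_load):
    """Request capacity for the load expected when the new VMs will be ready."""
    future_load = predict_load(minutes_ahead=LEAD_TIME_MIN)
    target = instances_needed(future_load)
    return max(target - current_instances, 0)   # how many more to request now
```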
So the take-aways here. Deploying a VM takes about ten minutes. Is this too
long? We think so. In a lot of cases that could be a hindrance. We have not
really run a comparison with other cloud providers. I could give you
anecdotal kind of information that we've seen significantly shorter time from
some other providers, but for actually Windows instances, it's not actually
that different, which is telling in itself.
Adding instances takes much longer than initial deployment, so kind of be
aware that dynamic scaling does have an overhead and it's not quite as
instant as we or many other people would like it to be. As you increase
instance types, it will take longer, and you have to account for the fact
that you won't get all your instances at once. This actually can be good and
bad, as I'll talk about later with the storage services.
Speaking of which, we'll look at each one of the main storage services. So
the blob storage, table storage, queue storage. I won't go into what each of
these really is and how they really work. If you're interested in that,
there's plenty of documentation or you can come talk to me later and we can
kind of discuss the intricacies a little bit more. But suffice to say blobs
provide large kind of unstructured storage, big chunks of bits. Tables are
semi-structured data, although not in the classic RDBMS sense. They're only
semi-structured: there's no enforced schema, but you get the usual query,
insert, update operations. And then the queues, which are fairly
self-explanatory.
So we'll start off with the blob service. Again, the limit -- we'll skip
that since we're short on time, the get and put semantics. And according to Windows
Azure, performance is isolated between blob containers. So blobs are these
objects, and they're kind of grouped into bunches by these containers in a
naming sense. So you have a container that contains some set of blobs.
Performance between these containers is supposed to be isolated in terms of
where they're located and stuff like that. In the datacenter, you can't
count on them being near each other, et cetera.
So we test the performance of getting and putting a blob within a single
container, so we did not span across containers, and we actually scaled
between 1 and 192 concurrent clients. So for the get action we scaled
between 1 and 192 gets on the same blob, and for put, putting those blobs
into the same container.
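A simplified Python sketch of that per-client bandwidth measurement; in the real experiments each client was its own VM instance, whereas here threads stand in for them, and the blob URL and any authentication are placeholders:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BLOB_URL = "https://<account>.blob.core.windows.net/<container>/<blob>"  # placeholder

def timed_get(url):
    """Download the blob once and report this client's MB/s."""
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        nbytes = len(resp.read())
    return nbytes / (time.time() - start) / 1e6

def run_clients(concurrency):
    """Average per-client bandwidth at a given concurrency level."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        rates = list(pool.map(timed_get, [BLOB_URL] * concurrency))
    return sum(rates) / len(rates)

# e.g. for c in (1, 8, 32, 192): print(c, run_clients(c))
```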
And what we got here is -- okay, that's right. You see the per-client
bandwidth is in the vertical scale. So this is from the perspective of a
single client within the deployment, and we scale it between 1 and 192, as I
mentioned. And so for download, we see a single client gets about
13 megabytes per second download from the blob. So when you fetch that 1
gigabyte blob, you're getting -- which works out to be about 100 megabits per
second as the Azure specification states for what you expect network
performance to be in a small instance type. So we did all these tests using
small instance types so that we could get that high scalability number.
And we see the performance degrades reasonably. So the big take-away here is
that it's not infinitely scalable, certainly. Especially when you're
accessing a single entity. And so you need to be careful how your
application uses the storage. If you scale up 50 instances and their first
action is all to go fetch some initialization data from blob storage, you
need to be careful how it's organized because you can see significant
performance degradation depending on how that blob is organized and how it's
distributed in the storage system.
Additionally, we see here upload is significantly slower than the download,
which is somewhat to be expected. And I'll keep moving because we've got to
go quick.
So here's kind of the service side perspective, the cumulative bandwidth. If
you add all those up from each client, we see the service itself is
supporting, we max out about just under 400 megabytes per second, which is
kind of an interesting number. We weren't quite sure why that was. We kind
of had assumed maybe it's triple replicated and each has a gigabit, but that
adds up to about 375 megabytes per second, not just under 400. So there's
still some investigation to be done here, and, again, because of performance
variability, do we know -- you know, will this change over time? Who's to
say it won't. But it's interesting.
And upload was significantly slower as well, but again, that's expected since
there are replication issues involved.
So quickly moving onto the table service. So table service basically has
this entity, attribute, value model where entities are essentially rows,
attributes are items within the row, and each attribute can have a value and
a name, et cetera. Again, semi-structured, no schema. So the question is
what kind of performance can we expect when we're running queries and inserts
and updates against this storage service.
So we performed each of the four primary operations: insert, update, query,
and delete. Each client operates on its own unique entity. So they didn't
have row conflicts directly, but they're all within the same table and within
the same partition. And Azure has this feature where each table is kind of
divided into partitions, which are dependent upon explicit values that you put
in the row entities themselves. You give them a partition key. So we worked
within the same partition, and we performed basically 500 of each operation
for each client, with the exception of update, which only did 100 ops.
So at the end of the insert phase, which was the first phase, there were
approximately 220,000 entities in the table and then upon that we operate the
queries and updates and deletes.
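A Python sketch of that per-client operation mix; the insert, query, update, and delete calls are placeholders for whatever table client you use, with the REST and authentication details omitted:

```python
import time

def timed(op, *args):
    start = time.time()
    op(*args)
    return time.time() - start

def run_client(client_id, ops, partition_key="bench"):
    """500 inserts, queries, and deletes plus 100 updates, all in one partition."""
    latencies = {"insert": [], "query": [], "update": [], "delete": []}
    for i in range(500):
        row_key = f"client{client_id}-row{i}"            # unique entity per client
        latencies["insert"].append(timed(ops["insert"], partition_key, row_key))
    for i in range(500):
        latencies["query"].append(timed(ops["query"], partition_key, f"client{client_id}-row{i}"))
    for i in range(100):                                  # updates were limited to 100 ops
        latencies["update"].append(timed(ops["update"], partition_key, f"client{client_id}-row{i}"))
    for i in range(500):
        latencies["delete"].append(timed(ops["delete"], partition_key, f"client{client_id}-row{i}"))
    return latencies
```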
Moving on, so this is our performance graph. On the left we see the table
performance using 4-kilobyte entities. We have data actually on many
different sizes. We present 4k here because it's fairly typical of the
various sizes, and there wasn't a dramatic difference between entity sizes
such that we think we need to present all the separate ones.
So query and insert are interesting here, because not only do we have a weird
kind of uptick with low concurrency, but they don't really vary that much, so
we actually are pleasantly surprised at the scalability here. From 1 to 192
clients we only see, you know, 30, 40 percent variation in performance, which
is actually fairly impressive for both query and insert.
Delete, we can't say that. It grows quite quickly. And update, which is on
the right here, it's a whole different story altogether, and we present each
of the different sizes here. So you can see for the different entity sizes,
it didn't really make a significant difference in terms of performance.
They're all basically about the same given the average operation time, but it
quickly gets to be quite expensive with high concurrency, so be careful how
you design your tables.
Again, this is a single partition, so within the same table you could use
multiple partitions to kind of spread this load and get better performance in
that regard.
So the queue service is kind of the last of the three primary storage
services, and it's intended for passing reasonably small messages in a
basically FIFO model. Get, put, peek are kind of our typical queue
operations. And so here we just test concurrency against a single queue and
see kind of what a queue can handle using various message sizes. So we
ranged from 512-byte messages all the way up to 8k messages.
So put and get are the ones we really care about. Peek is basically
consistent across the concurrency level. I should mention that the vertical
axis here is messages per second, and that's seen from a single client's
perspective. You can do a little bit of math if you really want to get
absolute latency numbers.
But, again, so for put and get, particularly at scale, again, the size of the
message becomes less important. So actually using larger messages as you
need more concurrency is a better way to get more bytes through the
interface. But it scales reasonably well. We kind of -- we think 32
concurrent clients is about the inflection point beyond which we start to see
degradation of performance, so you get approximately 50 percent fewer
messages per second after you pass that barrier, regardless of message size.
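A small Python sketch of the queue measurement, including the "little bit of math" mentioned above that turns a per-client message rate into an approximate per-operation latency; the put and get calls are placeholders for your queue client:

```python
import time

def measure_rate(op, payload, n_ops=500):
    """One client's sustained message rate for a put or get operation."""
    start = time.time()
    for _ in range(n_ops):
        op(payload)
    elapsed = time.time() - start
    return n_ops / elapsed            # messages per second from this client's view

def approx_latency_ms(msgs_per_second):
    """If one client sustains, say, 20 puts per second, each put is taking
    roughly 1 / 20 = 50 ms end to end (queueing + network + service time)."""
    return 1000.0 / msgs_per_second
```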
So now let's talk kind of very briefly, because we are getting short on time,
on the direct TCP communication. So this is a somewhat new feature, useful.
It allows workers to communicate directly without having to pass messages
through this queue, so we have a much -- potentially a much lower latency
communication operation because there's no intermediary required. So we just
ran some tests -- opened a TCP connection, transferred a file, and kind of
measured the bandwidth and the latency that we observed, and we actually ran
these tests
for a long time, several weeks. So, again, if you really are interested in
it, we can talk about the kind of variation that we're talking about.
We actually do have a graph that points out some interesting artifacts that
we kind of discovered and had some observations on.
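A bare-bones Python socket sketch of that direct worker-to-worker transfer test; endpoint discovery through the Azure runtime is omitted, and the host, port, and payload size are placeholders:

```python
import socket
import time

PAYLOAD = b"x" * (64 * 1024 * 1024)   # 64 MB test transfer (illustrative size)

def send_file(host, port):
    """Stream the payload over a direct TCP connection and report MB/s."""
    start = time.time()
    with socket.create_connection((host, port)) as s:
        s.sendall(PAYLOAD)
    return len(PAYLOAD) / (time.time() - start) / 1e6

def receive(port):
    """The other worker role just drains whatever arrives on its endpoint."""
    with socket.socket() as srv:
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while conn.recv(1 << 20):     # read in 1 MB chunks until closed
                pass
```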
So here's kind of a histogram view. So of all the samplings -- I should say
of all the experiments we ran, the key thing to note is the performance.
So -- I can't read it. The way to kind of read this is that about 65 percent
of all file transfers that we performed via this TCP got 80 megabytes per
second or greater. So that's kind of how we read this: each percentage is
for that speed or greater.
So we see most of them, at least 50 percent, got actually very high
performance. And this is very interesting when you consider this is also on
a small instance type, which is supposed to be limited to 100 megabits per
second. This is clearly well above that threshold. So we are interested in
figuring out exactly what's going on here, which I'll talk about in the next
slide, the theories we have, although, again, it's a black box so we're not
quite sure.
Latency. Reasonable latency. Again, because this is a datacenter, we have
no real notion of how far apart these objects really are, but reasonably
consistent. Not a huge variation. Not nearly like what we see with the
bandwidth.
So here are all those data points. You'll notice immediately there's a
reasonable amount of variability between these tests. These were run every
half hour for several weeks, and these two troughs are the interesting
points. So what really happened there? A couple of ideas, but we have no
concrete data. So the real take-away here is you can't really count on any
specific performance number you get. We don't know -- because of this kind
of weird occurrence where we're getting much higher than what we actually
expected, you know, we expected this 100 megabit restriction, and we're
getting much higher than that, so we're looking at whether these are really
the correct numbers and we just got lucky a lot of the time, so could there
be some co-location issues or multi-tenancy that
actually reduced the bandwidth. I don't know how Microsoft actually enforces
the bandwidth limits, so that's one possibility, or we could have just had
some random network occurrences, high load elsewhere that caused this. But
you can see it's clearly variable over time, and you need to take that into
account when you're designing applications.
And I will wrap it up with our quick look at Azure SQL services. So this is
kind of the most traditional service here, the RDBMS that everybody is
familiar with based on SQL Server 2008, I believe. It is size limited to
less than 10 gigabytes per database, so that's an important factor when
you're designing your application. If you expect your database to grow
beyond 10 gigabytes within a single database, you have to find a way to
partition it into multiple physical databases, either between tables or as
full databases, and that does not fit all workloads.
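One common way to live with that cap, sketched in Python, is to hash a partitioning key to pick one of several sub-10-gigabyte databases; the connection strings are placeholders, and the trade-off is that you give up cross-database queries:

```python
import hashlib

SHARD_CONN_STRINGS = ["db0-conn", "db1-conn", "db2-conn"]   # placeholder strings

def shard_for(partition_value: str) -> str:
    """Map a partitioning key (e.g. a customer id) to one physical database."""
    digest = hashlib.md5(partition_value.encode()).hexdigest()
    return SHARD_CONN_STRINGS[int(digest, 16) % len(SHARD_CONN_STRINGS)]
```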
So we ran the TPC-E benchmark, which is an online transaction processing
benchmark. It simulates a stock brokerage house, with updates to stocks and
tickers and things like that, which I'll talk about in a second. And the
database that we used to test against is about 3 gigabytes in size, so kind
of right in the middle, not right up at the upper limit.
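A Python sketch of the kind of client-side harness this implies, using generic DB-API calls in the style of pyodbc; the connection factory and SQL statements are placeholders, and this is not the actual TPC-E driver:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_transaction(connect, sql_statements):
    """Run one transaction and report (latency_seconds, committed?)."""
    conn = connect()                      # e.g. a pyodbc.connect(...) factory
    try:
        start = time.time()
        cur = conn.cursor()
        for sql in sql_statements:
            cur.execute(sql)
        conn.commit()
        return time.time() - start, True
    except Exception:
        conn.rollback()
        return None, False                # count this as a commit failure
    finally:
        conn.close()

def run_mix(connect, sql_statements, threads=30, txns_per_thread=50):
    """Drive the transaction mix at a given concurrency level."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda _: run_transaction(connect, sql_statements),
                                range(threads * txns_per_thread)))
    failures = sum(1 for _, ok in results if not ok)
    latencies = [t for t, ok in results if ok]
    avg_latency = sum(latencies) / len(latencies) if latencies else None
    return avg_latency, failures
```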
And here's a breakdown by the micro-benchmark. So each of these is kind of a
defined transaction within the suite of benchmarks, and in the blue here we
actually ran the benchmarks on a local machine that we had in our lab. So it
was a quad-core, I think a Xeon. I'm not sure if it was a Xeon or a Core 2
with, I think, 8 gigabytes of RAM, so kind of a standard local server.
And then we compared that to the performance between the SQL services in
Azure and a client also running in Azure. So we see this is kind of the
Azure LAN case.
And on average, which is right here, this is the average case, average across
all the micro-benchmarks, we see about a 2x slowdown that you can expect. So
that kind of -- we've looked at which ones of these are reads and which ones
are read-write intensive. There seems to be no consistency between the
performance differences between those. They kind of average out. We're
investigating this a little further, why some of these are actually faster in
Azure than they were on our local system. Because Microsoft gives us
absolutely no specs
on what these SQL Server instances are running on in terms of hardware or
even their virtual resources, it's hard to compare directly and claim that
this is some resource contention issue or anything like that. But on average
you can expect a 2x slowdown.
I'll move really quickly. So we also saw some transaction slowdown. So here
we see as we increase the number of concurrent threads that were running
transactions against the database, comparing, again, this local LAN server
with the Azure cloud, we see Azure actually scales reasonably well to
concurrent clients, more so than our LAN server did. At this point we
actually saw a fair number of failures locally. That's why we don't have
data for all the different data points.
But we can maintain below a 2x -- yeah. So we're kind of calling the
inflection point here about 30 concurrent threads, and we can stay under that
200 percent slowdown factor up to 30 concurrent threads. And this is showing
how we actually saw commit failures as we increased the concurrency as well.
So these are actually transactions that failed to commit. And we lost a data
point. This is the local server again. So it degraded rather quickly.
So looking at this data over time -- so we ran these benchmarks for several
weeks, and we saw fairly consistent performance. So this is kind of a
high-level trace of each of the micro-benchmarks. And so you see a little
bit of jittering here and there, and this is actually a case where the client
machine that was running the benchmark against the server actually failed and
we didn't know about it for a couple of days. So it's kind of another lesson
learned: be careful, because even though Azure is supposed to be recycling
these VMs if they fail, this one actually failed and then failed to recycle,
so it just died altogether. So
kind of another lesson learned.
But we see reasonably consistent performance, particularly given that this
was during the CTP phase, so while the data building and testing was still
going on.
So general recommendations and conclusions. Be careful of how you do the
scaling. Be aware that because it's dramatically slower than the initial
deployment, you need to look at the workloads themselves to determine when it
might or might not be worth it to actually scale. If you have a lead-in time
of 20 minutes to gain more resources, will the workload peaks have passed by
that point, essentially.
Distributing blob accesses across many containers is one way to maintain
higher performance. So don't point all of your instances at a
single container or a single blob. The tables scaled fairly well for most
operations. Update and delete were the two noticeable exceptions, but this
is fairly expected given the nature of those types of operations.
And SQL Services scale reasonably well, but it's tough to really recommend
using something like that for a scalable application because, again, of the
size limitations. If your database is expected to grow, and as we've heard a
couple times today, they never shrink, then you need to be very careful of
how you approach that.
Surprises. So the big surprises were why does scaling take so long and why
is TCP performance not the same as blob performance. These are kind of the
areas that, moving forward, we'd like to investigate and talk with the
Microsoft and Azure people, if possible, to see what's really going on here.
I should mention that, similar to the work we've been doing, there's also an
application out by the extreme computing group here at MSR, AzureScope,
which provides some similar benchmarks. If you're interested in running some
of your own kind of benchmarks like that, they provide some examples and
stuff that actually we found fairly interesting towards the end of our work.
So I will conclude so we can move on and get some lunch.
Yeah?
>>: [inaudible].
Zach Hill: Yeah, we had looked at that. That was one of our kind of things
that we thought might be the case, but it's not clear in terms of documentation
whether this 100 megabit is a guaranteed minimum or a guaranteed maximum.
>>: [inaudible].
Zach Hill: Yeah. So it's kind of hard to interpret what 100 megabits means;
do I always get that or will I ever get more.
>>: [inaudible].
Zach Hill: Yeah. So that was -- yeah, excellent point. That's one of the
kind of issues, is how much are you really contending and --
>>:
[inaudible].
Zach Hill: Ours is a regular sequential downloading, yeah. Again, we didn't
think that number was actually surprising given that we expected to get the
100 megabit network limit. So if we could actually get more than that, that
would also be surprising.
>>: [inaudible].
Zach Hill: Http, not [inaudible].
>>: Any other questions?