>>: So welcome, everybody. This is the 4:30 to 6:00 session of Cloud Futures 2010. This session is about systems, and we'll have three presentations. The first one is by T.S. Mohan from Infosys in Bangalore. He's going to be talking about domain specific software architectures, followed by Tajana Simunic from the University of California, San Diego, followed by Zach Hill from the University of Virginia. The presentations are going to be 20 to 25 minutes, and at the end of each presentation we'll have about five minutes for questions, and then hopefully we will finish the whole session in 90 minutes.

T.S. Mohan: Thank you, Juan. Good afternoon. I'm going to present to you some work that we are currently doing [inaudible] between me, based in India, Nenad Medvidovic, who is at USC, the University of Southern California, and Chris Mattmann, who works for the Jet Propulsion Lab at NASA. Of course, he's also a professor at USC. This work is essentially based on earlier work trying to understand grid computing technologies and how people who tried to program the grid ran into various kinds of problems. We did an analysis of the code and found that extracting architectural patterns out of such systems would be useful, because if somebody else wants to write similar code, they could do a better job. But let me start with this: would it not be good to introduce that into the world of the cloud, because nowadays everybody wants to do cloud programming. However, when you [inaudible] an application and when you architect [inaudible] application, most often you run into all kinds of problems. In fact, the biggest challenge has been that if you say you're migrating to the cloud, people think that you're just doing database migration, from a database in your local datacenter to maybe a database in the cloud. That's too crude, too simplistic. We're looking at all the kinds of problems that can happen, the patterns that can be leveraged, and how to go about it. So the big question that we hit upon when we tried to look at migrating into the cloud was: have we really understood the cloud? What we found was that there are too many definitions, too many terms, too many usage patterns, too many technologies involved, and consistency across the board was missing. There are lots of clusters of activity going on, little common terminology, and a lot of confusion. So the biggest challenge we find is that there's hype, there are a lot of domains, and domains have their own specific usage patterns, right? Earlier we heard a lot about how the cloud has been used for the Worldwide Telescope project. That is one side. We also heard about the Microsoft Biology Foundation using the cloud for analyzing data at huge scale, and the usage patterns looked really different. But then, interestingly, the underlying technology [inaudible] because that's Microsoft Azure. If you look at how it happens around the world, we see lots of open source packages like Hadoop getting used, Amazon getting used, and people wanting to set up their own private clouds and struggling there. And when you see how they want to move an existing application onto a cloud, then things kind of start falling apart. So that set the background for our work. And we believe that somewhere along the line things will mature and we'll have a better understanding of what a cloud typically should be and how we should go about using it. So what is it that really clicks for a cloud?
Is it just these three things that you heard all the keynote speakers talk about: pay per use, where there's no capex, only opex; the ability to meet seasonal loads; and what Dave Patterson called surge computing? I have my computing that happens on a regular basis, suddenly I have an extra load, I just move to the cloud for that part and then come back. Scaling down was a very interesting aspect. It's also the case that, because of the hype, uniform, simplified abstractions are being presented for programming the cloud, and that seems pretty seductive, so people say let's do it in the cloud, but when reality bites, that's when we start [inaudible]. So large applications are missing on the cloud. Large data has been processed on the cloud, but the applications have been simple. This is the kind of experience that we are having with [inaudible] the programming being done on the cloud. Looking at the service offerings, we see that on one end of the spectrum we have large players like Microsoft Azure or the Google App Engine or Amazon Web Services. At the other end of the spectrum we have a lot of small niche players, and these are all cloud enablers. There are a lot of these types of companies which say, look, give me your problem, and we'll try to help you get the best out of the cloud that is there. And typically the ones who go to these small companies are the IT departments of various enterprises. So what happened to the in-between part: big players on one end, small players on the other end, and in between, what's happening? And that in-between is what many of us in the IT outsourcing industry, the IT industry, are trying to grapple with, because what we find is that large enterprises have captive datacenters. They have a lot of these technologies, like virtualization, which some of them are already using, but can they have a private cloud? When it comes to that question, a lot of questions are not answered. In fact, that's one of the big reasons why a lot of enterprises are sitting on the fence when it comes to moving to the cloud wholeheartedly. They might want to do surge computing, but they'll not want to do cloud computing per se across the board. So the big question then asked is, if there's a value add, what kind of value add is it? Is it only economics? Is it only that the cost of [inaudible] comes down? Or is it that things will be a lot more easy, a lot more standards compliant, a lot more portable, a lot more interoperable, a lot more [inaudible] secure? Like Dave was saying, we all have our information kept on the cloud assuming that we trust that the security they give us is good enough, and yet when it comes to thinking about it, we ask these questions like, is it secure? Right? And how sustainable is it? That is the other question that gets asked. Can I run a business long term on the cloud without depending on anything? So there are, of course, a lot of key issues and challenges for sustained usage, and that's where industry people like us step into the picture. We want to ask how do you handle this thing called consistency, how do you handle this thing called availability, how do you handle this thing called performance when it comes to doing, say, [inaudible] transaction processing of a business on the cloud. Right? In fact, all the popular usage of the cloud has been where the information could be inconsistent, and that's acceptable.
It could fail, and it could be the case that the performance is pretty poor, and in spite of it, people are happy with the information that they get. That's the kind of application people are working towards on the cloud, rather than doing the real hard-core persistent transaction processing. Now, how secure is the information on the cloud? That gets asked, and Dave Patterson gave a beautiful reply. He said when we file our taxes with the IRS, the IRS may be processing much of our information in their datacenters, which are possibly accessible to a lot of people. Right? We don't ask how secure the information we give them is, right? We submit our information to our lawyers and our [inaudible] or the legal attorneys, and they process that information using possibly public resources. We don't ask questions. So when we ask such questions, it only means that we are aware of certain issues that everybody has been talking about, but have not really experienced them. In fact, our little off-the-cuff, should I say, [inaudible] analysis shows that 60 to 70 percent of the information that generally gets used in an enterprise, and which is not mission critical, need not be of that high a security [inaudible]. And given that situation, I think going forward, maybe in the next two, three or four years, we will see a lot of applications move onto the cloud at the enterprise level. But then, of course, enterprises have to make a choice. Would they like to keep it in their private datacenters and their private clouds, or would they like to use the cloud offerings of various public cloud service providers? That's the choice. And that's where many of the questions we ask for this particular topic come in -- building domain specific architectures and applying them to the cloud service abstractions available right now: can we make life simpler and easier? Interoperability and portability have been a big issue. A lot of proprietary service providers have kept their APIs closed. Of course, the pricing model also gets factored in, and if you closely observe, the price for moving data into the cloud is cheaper than the price for moving data out of the cloud. Many of these service providers build in a barrier there. But in spite of it, there have been times when enterprises have wanted a hybrid model, where the sensitive data is kept in their private datacenters and parts of it are taken and processed across the cloud service offerings, perhaps done in such a way that if, say, one part of Microsoft Azure fails to live up to certain performance parameters, it also gets processed on, say, the Google App Engine, and things still move forward. In such a situation, you need to have a very interoperable setup. Programming for that is a huge challenge, and that's something for which we look forward to best practices by which we can go forward. Variable seasonal cloud services pricing: bidding for spot instances on Amazon's servers is a very interesting example of what the future could be, because, all said and done, the quantity of resources available even to the large service providers is finite. And it's all a question of how they manage it when it comes to us asking for it.
And the way they will manage it is they will move to a bidding situation where the prices keep varying, and the challenge for enterprise architects will be whether it is economical to use the public cloud service offerings when the price varies. Because at a certain point in time it's meaningful, but at another point in time you may not want to [inaudible] any tasks on the public cloud. So that becomes a big factor, and understanding that is important. Similarly, you have issues like multi-tenancy and reputation sharing. These are all things which have been talked about, and I'll touch upon them later. So the big challenge, again, comes here. I want to migrate my big enterprise application into the cloud. How do I go about it? Do I just move the application as-is to the cloud, which is like going out and using it on Amazon the way an IT system administrator would, without touching anything inside? Or should I fine tune it, perhaps reprogram it, and use the platform that, say, Azure or Google's engine provides? Or should I possibly use an application which is already available to us on the net and is maybe based on the cloud or a prior cloud? For example, if you take salesforce.com, I'm sure they're using a lot of cloud technologies to run the show. If you look at Gmail -- all of us use Gmail or Hotmail -- much of the way in which it happens is that cloud technologies get used without you even knowing about it. So depending on the application, what level would we like to get into, step into? This is the kind of challenge that comes up for a typical person who wants to migrate an application in. And typically when you talk about enterprise applications, it's not one particular functionality or a certain focused thing; an application typically lives in an ecosystem -- an application doesn't live in isolation. It's connected to a number of other applications, and you would want to migrate its various parts. How do you put that together? This was enterprise application integration, which was a big topic, say, several years back. And then came [inaudible] web services. All of that means that when we move to the cloud, these issues will all pop up in various shapes and forms. And that's a big challenge. How do you address it? That's the kind of challenge or issue that we are looking at. And what did we find? It's kind of a compromise. Yes, we cannot have a hundred percent move into the public cloud, because we do not know. Public clouds don't promise anything. They don't give you SLAs which guarantee what's called [inaudible] performance, five nines -- 99.999 percent [inaudible]. You can't run only your own private cloud either, because if you wanted to have a cloud which is really meaningful in the cloud sense in your private datacenter, you really need to have scale and you really need to manage it well; it doesn't make sense. Then what do we do? We come up with something hybrid, in between. We use combinations of these. It's not that just because the cloud has become such a fantastic IT [inaudible] technology available to all of us, we abandon the datacenter in the enterprise and move on to the cloud. At the same time, we are not going to ignore the cloud either. So the [inaudible] combination [inaudible] use of [inaudible]. And when you want to do this hybrid cloud, what do we leverage? How do we go about it? So these are the kinds of things that come up.
Do we use combinations of infrastructure, combinations of private platform, or combinations of software as a service to put together the small ecosystem of applications that represents the application you have in mind? So at what level do we leverage the cloud? Second, at what cost? A program that uses a particular cloud service with a certain pricing model may be meaningful, but when the prices shoot up, you'd want to shift. That agility is something which everybody looks forward to. Can we build in that agility? In fact, the architect who designs these kinds of solutions has an additional dimension to think about, and that's the variable pricing part. And there are associated risks with it. What configurations and deployments do we attempt? Do we have a certain deployment which can then be revoked and moved to another deployment? Think about it, because in the real world, in the world of enterprises, applications continuously change -- they kind of metamorphose continuously. [inaudible] a picture or a notion of an application. So viewing all these kinds of big-picture situations in the industry, the question that gets asked is: can we apply the principles and discipline of software engineering to using cloud services? What do you mean, is the question. So when we develop an application, or when we put together and integrate a bunch of applications on the cloud, can we use the methodology, the metrics, the software engineering principles, and then come up with mechanisms and means which can help people estimate the costs correctly, estimate the [inaudible] properly, and ensure that everything is properly handled, managed and maintained? These are the kinds of issues that we face in the typical software industry, but can we do that here? That's the question that comes up. And in that context, we have an answer, and the answer is something called domain specific software architectures. If you closely observe, applications don't live in isolation; they are clusters of applications solving some class of problems in a certain domain, with a lot of interesting things that go along with them. There is this nice book, Software Architecture: Foundations, Theory, and Practice, which has been coauthored by one of the authors of this presentation, Nenad Medvidovic, and in it you will find a definition wherein a domain specific software architecture comprises a reference architecture, a component library, and an application configuration method, and this reference architecture can be used to make a lot of principled design choices. So, moving forward, we did an analysis of a lot of grid technologies too. We went and applied some of these understandings to see how grid programming packages go about implementing their understanding with respect to the reference architecture that they talk about. And we found that there are a lot of interesting inferences one can draw about how these kinds of technologies have been used by the applications built on them. The good thing about grids is that both above, at the application level, and below, at the systems level, things are visible. But in the case of clouds, typically with the public service offerings you'll see that it's opaque. You cannot really understand how Google manages its Google file system or what Microsoft does with respect to its database or data management, right?
Of course there's a certain understanding, but nothing beyond it. This kind of gave a certain insight --

>>: What is the third column, KSLOC?

T.S. Mohan: Kilo source lines of code. Okay? From this we could extract this kind of abstraction. And this is very high level, by the way. Each of these boxes represents a certain set of abstract calls, APIs which specify the architecture and which cut across several of these implementations. For example, there is this thing called the run time [inaudible], which does a lot of the run time management aspects of a typical grid system. There is the resource abstraction, which captures within itself things like storage, things like databases, things like persistence, et cetera. And there's the fabric part, which covers the communication part. This is the rest of the grid. Now, can we apply these kinds of abstractions in the world of the cloud? That's the challenge. And if they apply, can we influence the way the migration to the cloud happens such that it's really optimal? That's the kind of challenge that we are taking up. In that context, we looked at a number of things that come up when we do cloud programming, and it's typically what has been talked about for a long time as the [inaudible] distributed systems fallacies: for example, that in the cloud the network is fully reliable, which is not true; or that there's zero network latency, which is, again, not true, because maybe within a rack the latency is pretty small, but across racks or within a datacenter it could be a little larger, and across multiple datacenters, absolutely big. These kinds of assumptions that we make impact the performance of the application. Right? So these are a bunch of things that have been true for a long time in distributed systems, and they affect the way cloud programs are configured, deployed, or programmed. And if you closely observe, there are other things which also have to be worried about, but we do not have a direct handle on them in the cloud service abstractions, like the [inaudible] infrastructure or the platform, where we could explicitly play around at this level. So keeping these kinds of things in mind, keeping the DSSA in mind, what are the typical steps that we go through for migration? Perhaps first we evaluate and assess what options we have when we split an application into components and see which components need to go where. Then we do a pilot on the right level of migration, check it out, and then re-architect or redesign or recode the total component so that it migrates in full. Having done that, we leverage the platform advantages, and then, of course, we look at the larger picture -- there's a platform, but then there's a larger picture of all of these together -- and then we validate it. Once we validate it, of course, we refactor and reiterate the [inaudible] kind of migration service that we get to. Having done that, what do we come to? We have come to reclassify the kinds of cloud service abstractions that you would like to look at, and that gives a [inaudible] domain specific architecture viewpoint for a class of [inaudible] programs. In abstractions like this, the domain specific application services abstraction is kind of core to the software as a service offering. Platform run time collector services go to the platform as a service cloud offering. The run time collector services abstraction is the one which manages the cloud within Amazon.
The resource services abstraction takes care of storage, and, of course, the fabric services abstraction handles things like the MQ, the message queueing support, which is there. And having said that, we are studying a lot of applications to get things in place. And, of course, what does it mean in the existing setup, these kinds of things? So this kind of combines what we have in a typical [inaudible], wherein you can explicitly ask for run time support, you can explicitly ask for [inaudible] instances, and you can communicate between the various server interfaces. Now, this is what is kind of interesting, so this would be in the kind of abstractions that we have, the reclassified one. Same case with the platform service and same case with the software service. In fact, while this looks really abstract, there's a lot of detailed work going on, and given the time, I would like to stop at this level of abstraction. And how challenging is it to do this job? To be very clear, we have opportunities and options to either re-architect the whole application or parts of it, redesign it while keeping the same architecture [inaudible], or perhaps re-implement it on a different programming platform. That is one part. Another part is that we need to keep these kinds of issues in mind, these parameters in mind. And when you look at it, these are the kinds of options that we have: on one side we have the private datacenter and private cloud with the code -- we take an application, we split it into parts, we take one part, and we can either keep it sequential as-is or we can run it in one of the three modes within the cloud. And the same on the public cloud side. So the number of options is humongous. Too many of them. So this number of options times that number of options is what we need to consider when we want to do a migration. This kind of gives the big picture of what I wanted to say, and I'll stop here.

>>: Thank you. So we can ask questions without having to pass around a microphone, because we have microphones on the ceiling. So please ask questions of our speaker.

>>: You were surprisingly very clear [laughter].

T.S. Mohan: Thank you.

>>: So the emphasis is -- what is the position of Infosys with the cloud? What are the plans?

T.S. Mohan: That's a good question. I told you, on one end of the spectrum we have the large service providers like Microsoft or Amazon who have huge captive cloud platforms. On the other end of the spectrum we have companies like RightScale, Cloudera, and these are all small companies that offer these cloud-enabling services, but in certain specific domains. But then companies like us, we fall in the in-between category. We look at both ends and ask, should we be having a captive datacenter, or should we be doing that? If you do that, then we're really so small that we're not really looking at the scale that comes out of it. If you do this, then we are not really a service provider in the sense that we're an IT services [inaudible], no. We are getting to the --

>>: [inaudible] still looking into it?

T.S. Mohan: Not just we. We found that a lot of companies are still grappling with [inaudible], and that's where this kind of thing comes in. Because if you see the earlier slide, the complexity of this times this, the number of ways in which you can do it, shows what migration of a typical application could be. And an application in the enterprise doesn't exist in isolation. That's the first reality that one has to wake up to.
Second, it's an ecosystem of applications that exists, and in that ecosystem, parts of it -- parts of applications -- will be either in the public cloud or on the [inaudible]. And conquering that complexity is where I think our strength is going to be.

>>: So what is the position on the [inaudible]?

T.S. Mohan: The position is to research and to build up the competencies, serve the customers and have them make their money, and we make our money in the process.

>>: Where do you address the quality of service requirements? Is it just part of the pricing or [inaudible]?

T.S. Mohan: Good question. Now, if you are asking me what's the viewpoint from a service provider, then it's something called SLAs, or service-level agreements. The QoS parameters fit into the service-level agreements, and when a datacenter has multiple options of what tasks to schedule and what resources to allocate, the QoS comes into the picture because the SLAs have agreed-upon terms. And if you closely observe, today most of the large-scale service providers, like either Microsoft or Google, wouldn't want to commit to very tough SLAs. In fact, they don't guarantee anything at all. And in spite of it, many people have benefitted from using the cloud. But then the usage of the cloud is also not that [inaudible]. It's not like you have a mission-critical transaction-processing system sitting on the cloud which can take on the surge needs, like, for example, high loads, and still [inaudible]. Right?

>>: [inaudible].

T.S. Mohan: I didn't catch all of what you said about Apex, but I can share this thought with you. Domain specific software architectures are not just limited to the platform or to the architecture. In general, at the systems level we talk about a domain, say finance or, say, insurance or, say, biology, so we look into the specific class of problems that a particular approach solves and look at the domain specific best practices there. That dictates the architecture, and that gets [inaudible].

>>: Do we have another question?

>>: Yeah. So you talked about kind of an aversion to varying the pricing. Enterprises are worried that these prices are going to change. Do you think that's a valid concern? It seems to me that this variable pricing is going to exist regardless of whether you use a cloud or you build your own datacenter. I mean, you have to pay for electricity from some utility --

T.S. Mohan: You see --

>>: [inaudible].

T.S. Mohan: Exactly. No, no. Variable pricing, if it changes all the time and it's going to upset your budgets, is a big concern; it is not if variable pricing is going to be steady for a month or two at a time. Like suppose every alternate month Microsoft announces a new pricing, and that's a little predictable, say. Absolutely no problem. No worries. But suppose you are budgeting -- say, for example, there's a big Super Bowl activity going on in the U.S., a bunch of startup companies have started off and they want to come up with a program that anybody can use, perhaps an analysis of how the batting or bowling order is done, and for that they want to have advertisements priced and sold, and for that they have to fix the tariff, and at that time they have to consider the pricing that they have to pay, and suddenly if that goes up or down, the tariff itself changes. Right? And that's where pricing [inaudible]. This is something that architects have to consider when they design these kinds of applications. It's not exclusively linked only to the managers.
>>: I think we're behind schedule.

T.S. Mohan: Thank you so much. Thank you very much. Thank you. [applause]

>>: And now we have Tajana Simunic Rosing from the University of California, San Diego, and she's going to be talking about achieving energy efficient computing for future large scale applications. Tajana came about two months ago --

Tajana Simunic Rosing: About, yeah.

>>: About two months ago, and her presentation was fabulous, so we insisted on having her in this workshop.

Tajana Simunic Rosing: Thanks. I appreciate the compliment. I hope you'll enjoy it. Otherwise I'm in big trouble, right? So I'm actually heading the System Energy Efficiency Lab at UCSD. And the focus of our work, as you can see from the title, is on achieving energy efficiency, in this case across scale. So from our view, the future of IT is actually bridging the gap from very small devices that may exist around us in terms of sensors and various ways of measuring things and interacting with the environment. Those devices already today talk to mobile devices that we carry around and with us, so our cell phones, our iPads, for those of you who got in line, and other systems that are basically battery operated, and eventually the data makes it to the infrastructure cloud. The interesting issue here is that energy is a problem across all of these scales. You've got a problem with energy if you carry anything that's battery operated, or even, if you go to the outside rim of the circle, you have devices that may use energy harvesting, where energy is really at a premium. It's a problem for the infrastructure because it costs a lot to operate these datacenter clouds. So the question is how do we actually optimize, how do we deliver good performance, how do we deliver the results that people are after, while at the same time maximizing energy efficiency. So I'd like to give you a couple of very realistic examples of applications and application domains that we've been working with in the San Diego area. So what you see on this map is actually a very large scale wireless mesh sensor network. The picture represents only the top layer nodes. So only the big communication links. Under every one of the dots that you see on this map are literally hundreds of sensors and sensor node cluster heads. UCSD is this little dot right over here. The network actually spans about a hundred miles in length, it goes 70 miles off the coast, and it covers an area from almost down to the Mexican border all the way up to Riverside County. What's really exciting to me personally about this network is that there are very few computer scientists that do research on it. There are lots of people who do research in other areas, and actual communities that utilize this network on a day-to-day basis. So let me give you a couple of examples. On the very low end, we actually have a whole bunch of earthquake sensors on the network. The sensors will produce about five kilobits per second worth of data, which really isn't a big deal at all. The only problem is that they're pretty much all solar powered because they tend to be in these random locations. Right now they're in one-mile increments around all of San Diego County. The goal behind this particular application is to study what's going on with seismic activity globally, worldwide. And the San Diego area was the starting point.
Because of the availability of fast wireless connectivity, this project was so successful that it recently got funding to expand all the way along the West Coast, because they're now able to actually stream the data on a continual basis and catch even the smallest tremors in time. So five kilobits per second, not a big deal. But when you compare it to everything else that sits on the network, it can start becoming a problem. Motion-detecting cameras and acoustic sensors are actually present in one of our ecological reserves. The study here, what you can see in the picture on the left there, is wolves. So there is an indigenous California wolf population that actually lives in this area, and what we're trying to -- what people are trying to understand is the basics of how wolves behave. They look at that from the perspective of video and also audio. Unfortunately, audio and video are physically separated. Video tends to be placed up high so you can see a good picture. Audio tends to be placed in areas where you don't have a lot of wind. So it's in a different location. And you need to be able to gather these realtime streams of data, correlate them in time, and then run sophisticated algorithms. So on the audio that you can see right here, they're actually trying to run speech recognition to understand and correlate the pattern of the sound that the wolf makes to the behavior that they see on video. So here are two streams of data that you actually need to analyze fairly quickly in realtime, and they consume more bandwidth than the earthquake sensors. On top you see another ecological reserve, and actually the reason why I included this is the student that you see sitting up there. She's actually sitting on a big ledge at the top of the canyon right up here. And what you see is her laptop with an antenna. This antenna is pointing to our access point. This student can get about 11 megabits per second connectivity on the edge of that canyon. She can reconfigure all of her experiments throughout about a one-mile-by-one-mile area where she has set them up. Her job is to study ecology. She doesn't really care about computer science at all. What this particular network has enabled her to do is to actually study things in realtime both from her office and also to reconfigure her experiments in realtime even when she's in the field, by simply sitting down and pointing the antenna in the right direction. Down here you can actually see a couple of still images from a video produced to support the California Fire Department. So San Diego, as opposed to Seattle, has way too much sunshine. So much so that in September and the beginning of October we have fire season. And unfortunately, fairly frequently we get very large fires, and what you probably are not aware of is that when a fire starts in the San Diego area, usually the wind is blowing really hard. It can propagate extremely quickly. And the only way for the fire department to know how to deploy people is to have some way to monitor its progress on the ground. With a network such as this one, they're able to do this from their offices. They're also able to get alerts that tell them this particular area has ripe conditions for fire, or that we can even see a fire beginning. And then on the very high end, I'll give you an example of one of the two observatories that we have on the network. While this is not terabytes of data, we have limited this observatory to 150 megabits per second, primarily because our network actually supports about that much.
If we gave them more bandwidth, they could definitely do more. But that would mean that nobody else could use the network. The reason why I include this is so you can see the wide range of bandwidth that has to be supported, from 150 megabits per second down to five kilobits per second. And all of these examples also include some constraints on quality of service. So if you have a fire in the middle of the night and the observatory happens to be streaming beautiful pictures of the night sky, you bet that people will prefer for the fire images to make it through first and to get to the fire department on time. However, this is an event that's unpredictable, so you need to be able to reconfigure, and you need to be able to deliver data and compute and detect right in time. So this actually brings up another idea that we've been working on, which is the CitiSense project. This project has been funded by NSF and also by industry, and it's done jointly with another NIH-sponsored project which looks at how the environment around us and the decisions that we make on an everyday basis affect our long-term health. And basically this project is possible because we do have a large-scale environmental sensor network that we can use to provide feedback to us on a daily basis as we decide to go exercise or as we decide to just be lazy and, you know, sit with a laptop in our lap. So with projects like this, what you find is that we have a combination of data that comes from the environment and data that's relevant to us as individuals, in the form of how much are you exercising and when, and also in the form of whatever genetic background you might have, which would allow people like healthcare professionals, public health officials, your doctors and you as an individual to make better decisions going forward. If you start imagining a system in which everybody is being monitored 24/7 and is getting, hopefully positive, feedback from this system, you are beginning to imagine the humongous amount of data that this is going to create and the humongous opportunities for large scale computing, both on the back end in the cloud and also locally on your cell phone. If you think about healthcare, you sometimes need to have realtime feedback right where you are. You cannot possibly always rely on data streaming somewhere to the back end and then the result coming back to you. So you need to be able to actually do computation on both sides, and you need to do it efficiently. So here's an example of a clinical trial that we ran. This was last summer. This particular clinical trial was done with our school of medicine, and it focused primarily on physical activity. A very simple question was asked: if we provided realtime feedback to individuals through their cell phone, in a very low-tech way, on how much physical activity they have done to date, along with some small encouragement to do better, will they change their behavior, and how much effect would it have? So for that we selected a sample of over 60 individuals. Individuals were selected primarily based on the fact that they had struggled with obesity. So these are people who typically do not exercise a whole lot. They were given a cell phone, and a couple of sensors were placed on their body, basically a heart rate monitor and an accelerometer. And the feedback was then through SMS and MMS messages. So very low tech. It turned out that because they were able to get very timely information about how much they exercised, people actually significantly increased their activity.
They changed their behavior. And the result of this change was much more significant weight loss when you compared the group that we studied that used our system versus the group that didn't. And what I think was the biggest outcome of this study is that over 95 percent of the people who used the system wanted to buy it. They wanted their friends and family to use it, they liked it so much. So what we learned from this is that people actually do care, and they do change their behavior, if you provide feedback in a way that's relevant to them. And that is really what motivated our work going forward on the healthcare management system, where we also look at the environment and its effect. One of the challenges that we found is that because we're doing this monitoring 24/7 and providing feedback 24/7, energy became a very big issue. Batteries started dying very quickly. So this is what motivated research in energy efficiency across the scale. So I was very happy when Dave Patterson said, well, you know, if you want a research topic, you need to really look at how to do energy efficiency in mobiles and on the cloud, because that's exactly what my group does. So I guess I listened to him, huh? So the idea here is that we have sensors in the environment, and we may have some sensors on the body. Those may or may not be battery powered. Some of them may actually be using energy harvesting; examples would be solar and wind. You have what we call here a local server, or cell phones, basically, and then you have the back end. And there are sets of tasks that you can assign to the sensors, that you can assign to the cell phone, and that can run on the back end. There are some tasks that can only run on the sensor, like sensing, obviously, but there's a good fraction that you can assign to any part of this system. And the decision of who runs what at what period of time will significantly affect the length of the battery lifetime that you will have, and it will also affect the amount of computation you end up with at the back end. In fact, in our most recent result, we found that if you dynamically assign these tasks across this scale, you can increase battery lifetime by about 80 percent on the mobiles, which is a very big deal for the particular healthcare scenario we're looking at. So with that, I would like to now focus a little bit more on the cloud side of this equation. So what does it mean to do energy efficient computing in datacenters? For this, there are a number of challenges that we're looking at. What we're monitoring are temperature, power, and performance. We've been talking a lot about energy, so people are used to thinking about power. Everybody cares about performance. Temperature I don't think has been mentioned at all, and yet if you look at the operating cost of a datacenter, depending on how well you designed it, about half of this can go to cooling. Temperature strongly affects reliability, and that is why you cool. So you really cannot design a system that's energy efficient without looking at the cooling and thermal aspect. What we control are various cooling settings, power states and task scheduling -- where should the job run. And what we actually look at predicting is what temperature is likely to do in the near future, because it turns out that temperature changes relatively slowly compared to the workload. We also try to estimate what the incoming workload will likely do in the near term.
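A minimal sketch, in Python, of the monitor/predict/control loop just outlined: monitor temperature, power, and performance; predict where temperature is headed; then choose where to place work and what to throttle. The telemetry values, the naive predictor, and the simple placement rule below are illustrative assumptions, not the group's actual controller.

import time

def read_telemetry():
    """Placeholder: per-server temperature (C), power (W), and utilization."""
    return {"server0": {"temp": 62.0, "power": 180.0, "util": 0.7},
            "server1": {"temp": 48.0, "power": 120.0, "util": 0.3}}

def predict_temperature(history):
    """Placeholder for the temperature predictor; in the work described here
    this would be an ARMA model, as explained later in the talk."""
    return {name: samples[-1]["temp"] for name, samples in history.items()}

def control_step(predicted_temps, hot_limit=70.0):
    """Very simple policy: schedule new work on the coolest predicted server,
    and throttle (e.g., lower frequency on) anything trending hot."""
    target = min(predicted_temps, key=predicted_temps.get)
    throttled = [s for s, t in predicted_temps.items() if t > hot_limit]
    return {"schedule_on": target, "throttle": throttled}

def management_loop(period_s=5.0, steps=3):
    history = {}
    for _ in range(steps):
        telemetry = read_telemetry()
        for server, reading in telemetry.items():
            history.setdefault(server, []).append(reading)
        actions = control_step(predict_temperature(history))
        print(actions)
        time.sleep(period_s)

if __name__ == "__main__":
    management_loop(period_s=0.1)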
The goal of these predictions is to buy us a little extra time so that we can be more energy efficient. The particular research focus in my group is on looking both at what we can do in terms of individual server redesign, and there it focuses on the memory and storage architecture -- so what can be done to make a computer more energy proportional; it's not CPU redesign, it's how the memory subsystem actually interacts with the CPU -- and at power management techniques and cooling, and then we use virtualization as a method to actually implement all this. For power and thermal management, we've been lucky to get funding from the NSF Project GreenLight to deploy a green cyberinfrastructure, which basically consists of a couple of these datacenter containers that you see up there. This allows us to play, at fairly large scale, all kinds of thermal and energy management games that wouldn't be possible just within the small machine room that most departments have. So for that we have developed some power management algorithms and also some thermal management algorithms, and I'll show you in a second a little bit more about each. For power management, we looked at traces of realistic workloads running on a whole bunch of different devices. Here I've just included a sample of two, a hard disk trace and a wireless network interface trace. The reason why I included these two is because intuitively it would seem like they should look completely different. These are totally different devices. One is much slower than the other. One tends to work with larger sized data, the other one works with smaller sized data, and yet, when you look at the shape, the shape is the same. What's on the x axis is the interarrival time between requests to the particular device. What's on the y axis is 1 minus the cumulative probability distribution of those interarrivals. What you see in both cases is that the experimental data, which is in this teal color, does not match an exponential fit at all. It actually matches a Pareto distribution, which is a heavy-tailed distribution, a lot better. The reason why this is really important is because, when are you going to do power management? You're going to do it when you have long enough idleness, right? Now, look at what happens at long idle times. At long idle times, the exponential fit is very poor. It's not even close. The reason why we even talk about the exponential distribution is because people use it to model performance; to understand performance, they basically use queueing theory. If you look at the high performance regime, exponential is actually close enough. So it makes sense in some cases to use it for performance modeling. It makes no sense to use it if you want to save energy. You're going to make a whole bunch of wrong decisions. So based on this, we actually extended the Markov decision process model and accounted for the fact that we need to have heavy-tailed distributions and to monitor the recent history of the workload's behavior. And because we did that, our implementation showed significant power savings. Measurements were within 11 percent of an ideal oracle policy. An ideal oracle policy is a policy that knows the future: as soon as an idle period begins, it knows exactly when it ends. The assumptions were that we have a general distribution that governs the request interarrivals, that we have an exponential distribution for everything else, because it turned out that that was close enough, and that everything is stationary. The last assumption is actually the most limiting one.
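The exponential-versus-Pareto comparison described above can be reproduced in a few lines. The sketch below uses synthetic heavy-tailed samples in place of a real device trace and compares the empirical tail against both fits; the tail probability P(idle > t) is what matters when deciding whether a deep sleep state will pay off.

import numpy as np
from scipy import stats

# Stand-in for a measured trace of idle gaps between requests to a device
# (seconds). A real trace would come from instrumentation; here we draw
# heavy-tailed samples so the comparison has something to work with.
rng = np.random.default_rng(0)
idle_gaps = 0.01 * (rng.pareto(1.5, size=5000) + 1.0)

# Exponential fit has a single rate parameter, 1/mean.
lam = 1.0 / idle_gaps.mean()

# Pareto fit via maximum likelihood (location fixed at 0, so only shape and
# scale are estimated).
shape, loc, scale = stats.pareto.fit(idle_gaps, floc=0)

# Compare tail probabilities P(idle > t) at increasingly long idle times.
for t in (0.05, 0.2, 1.0):
    empirical = float(np.mean(idle_gaps > t))
    exp_tail = float(np.exp(-lam * t))
    pareto_tail = float(stats.pareto.sf(t, shape, loc=loc, scale=scale))
    print(f"t={t:5.2f}s  empirical={empirical:.4f}  "
          f"exponential={exp_tail:.4f}  pareto={pareto_tail:.4f}")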
Stationarity means that the statistical properties of all of the parts of the system do not change in time. That clearly is not true. So in order to address that, we actually used an online learning algorithm. What this algorithm does is it takes a number of policies that may be a result of our optimization, and then it adaptively selects among them. As it selects each expert, or policy, that policy will make decisions. Once a decision is made, we can evaluate how well it's done, and then we update the costs. And then on the next interval we'll select the next best performing expert. What's nice about this particular algorithm is that it's guaranteed to converge to the best policy very quickly. The convergence is at a rate that's a function of the number of experts and the number of time periods over which you evaluate this. So the end result is that you get very good savings, even when the workloads are changing. So let me give you an example. In this particular example we specifically chose real-life traces from a datacenter -- in this case this was done at HP -- that have fairly different properties. So what you see here is the average interarrival time and the standard deviation. I specifically picked traces that have significant differences between each other. The first table shows you the results if you have each individual power management expert making all of the decisions all the time -- and you can see that one of the policies has been optimized for least overhead and best performance, while the other one is optimized for maximum energy savings. The second shows that with our online learning controller we can now trade off how much performance overhead versus how much energy savings we want, seamlessly, across the traces. And as we trade that off, we see that our controller will pick the policy that gives us the lowest delay when we choose that, and it will automatically pick the policy that gives us the maximum energy savings when we need that. So it is able to adapt very quickly across different traces and across a set of policies. You can do the same thing for changing the voltage and frequency of operation of the processors. In this case, what we're looking at is running from 40 percent to 100 percent speed. And as we trade off, again, for lower performance overhead you tend to run faster, and for more energy savings you will tend to pick a lower frequency setting. What's interesting is that a fairly good fraction of the time you also will run faster. Right? Why is that? Well, the reason for that is fairly simple. If you look within a typical workload, you will have parts of the time when the workload is very intensive in terms of CPU time, and you'll have chunks of time, which can be fairly large, when it's waiting for data to come from memory. During those times you can slow down without any performance hit at all. So this particular online learning approach can actually adapt very easily. It can monitor this and immediately pick the right approach. Now, all of this up until now talked about only energy management. The second half of the equation is temperature. So what this graph shows is the percentage of time that you spend above a certain temperature range if you use a standard Linux scheduler for a set of workloads, and if you do energy-aware optimization. So in this case, if I assume I know exactly what I'm going to run and I do an absolutely optimal assignment and maximize my energy savings, that is the result that I'm going to get in terms of temperature distribution.
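The expert-selection controller described above follows the standard online learning pattern. Below is a rough multiplicative-weights sketch with made-up policy names and a placeholder cost function; it illustrates the select/evaluate/update cycle, not the actual controller used in the studies just discussed.

import math
import random

# Candidate power management policies ("experts"); the names are made up.
POLICIES = ["aggressive_sleep", "balanced_timeout", "performance_first"]

def observed_cost(policy, interval):
    """Placeholder: a real controller would combine measured energy and
    performance penalty for this interval, normalized to [0, 1]."""
    return random.random()

def run_controller(num_intervals, eta=0.3):
    weights = {p: 1.0 for p in POLICIES}
    for t in range(num_intervals):
        # The currently best-weighted expert drives decisions this interval;
        # in a real system `chosen` would be applied to the device.
        chosen = max(weights, key=weights.get)
        # Evaluate every expert's cost (full-information setting assumed here)
        # and apply a multiplicative-weights update.
        for p in POLICIES:
            weights[p] *= math.exp(-eta * observed_cost(p, t))
        total = sum(weights.values())
        weights = {p: w / total for p, w in weights.items()}  # avoid underflow
    return weights

if __name__ == "__main__":
    print(run_controller(100))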
And the last one is if I also do thermally aware optimization. What you immediately see is that optimizing just for energy savings does not solve your temperature problem. The reason why it does not solve it is very simple. You're going to maximize your energy savings if you cluster your workload into as few areas as possible and you shut off everything else. Shutting off definitely will cool things down. However, clustering will heat up the area dramatically. So it's because of the clustering that you end up with all these hot spots that you see on this plot. So as you think about energy savings, you actually have to look at both sides of the equation. You have to consider thermal constraints. So we did that using the same online learning algorithm that I showed you for power management, and what we did here is we took workloads that were collected from an 8-core UltraSPARC T1 system. This was done at one of Sun Microsystems' customers, so we used their workload. And then we basically took one hour from each day over a period of a week and concatenated that together to show adaptability. That's how you get the A, B, C and D workloads and then the average on the right. And what you can see here is a set of policies, starting with default OS scheduling -- this was actually a Solaris scheduler -- then migration, which will move a thread when things get hot; power management and voltage scaling, which will basically either go to sleep or slow down when it gets hot; an adaptive random policy, which actually improves on standard operating system scheduling by proactively scheduling to the cores which are relatively cool; and then online learning, which just selects among all these policies. And you can see that across all of the examples online learning will beat every single individual policy. In fact, it even wins by 20 percent in terms of hot spot reduction in comparison to the best individual policy. So being adaptive really pays. So these are great results, except for the fact that every time you do thermal management, you pay a price in performance. When you migrate your thread, it costs you some time. When you slow down, it definitely costs you time. If you go to sleep, it kills your performance. So instead of reacting, what you really want to do is be proactive. You want to avoid getting hot, if you can, while still delivering good performance. So that is exactly what we did. We forecast the temperature, and based on that forecast, we proactively assign workload so that performance is kept at the best possible level and energy is saved. And we did this by taking data from temperature sensors. Every single system has a whole bunch of thermal sensors in it, and all you have to do is just tap into them online. So we take the data, we develop a predictor based on a statistical model, in this case an ARMA model, we predict the temperatures, use that as feedback to the scheduler, the scheduler then makes proactive decisions on where to send workload, and based on those decisions, you hopefully get a much better result. Now obviously, as things change dramatically, you may have to update your model. So we have a very quick online way to update that. And the end result is, as you can see on the right here, over 80 percent reduction in thermal hot spots. Right? And we do that without necessarily having much performance overhead.
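A minimal sketch of the prediction-driven placement idea: fit a simple autoregressive model per core (standing in for the ARMA predictor mentioned in the talk) and send the next job to the core with the lowest predicted, rather than current, temperature. The temperature histories are made up for illustration.

import numpy as np

def fit_ar(history, order=2):
    """Fit a simple AR(order) model to a per-core temperature history by
    least squares; a stand-in for the ARMA predictor described in the talk."""
    y = np.asarray(history, dtype=float)
    X = np.column_stack([y[order - k - 1 : len(y) - k - 1] for k in range(order)])
    target = y[order:]
    coeffs, *_ = np.linalg.lstsq(
        np.column_stack([X, np.ones(len(target))]), target, rcond=None)
    return coeffs

def predict_next(history, coeffs, order=2):
    recent = np.asarray(history[-order:], dtype=float)[::-1]   # most recent first
    return float(np.dot(coeffs[:order], recent) + coeffs[order])

def pick_core(per_core_history):
    """Proactive placement: choose the core with the lowest predicted
    temperature for the next scheduling interval."""
    predictions = {}
    for core, hist in per_core_history.items():
        predictions[core] = predict_next(hist, fit_ar(hist))
    return min(predictions, key=predictions.get), predictions

if __name__ == "__main__":
    # Made-up temperature histories (degrees C) for two cores.
    history = {
        "core0": [52, 54, 57, 60, 63, 66, 68],   # heating up
        "core1": [65, 63, 61, 59, 57, 55, 54],   # cooling down
    }
    print(pick_core(history))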
In fact, the only time that you get any overhead is when you run in a very high utilization regime, and there really isn't any way to proactively schedule. You simply have to slow down or migrate. So it really pays to be proactive. And this is why we're convinced that predictive work will actually make a big difference in these systems. Since the last time you saw this talk, we have actually gone a step further and we've looked at cooling-aware thermal management as well. The basic intuition is very simple. It turns out that if you look at the typical fans in your server, they run at a fixed number of speeds; about five settings is pretty typical. The speed settings differ in the amount of power they draw; in fact, as you increase fan speed, power is actually cubically proportional to speed. So the amount of power you're going to lose by increasing speed is huge. As a result, it pays to try to make sure that you pack as many jobs as you can into a particular socket or into a particular server, up to the point that doesn't cause the next increase in fan speed. So that's, quickly, the intuition behind what we did. We said if I have a high-speed fan and I have a low-speed fan, I'm going to look to move jobs off the high-speed one so it slows down, in such a way that the low-speed fan doesn't speed up. And the other way around also: I may want to swap threads between those two in such a way that the speed of the fans does not increase but actually remains reasonably low, and therefore we can save. So you can see from these results, on the various workloads that we ran, that you can actually get about 73 percent savings in terms of just cooling energy. So putting it all together, we've been using the Xen virtualization system and we've extended it into what we call vGreen, and what we're doing there is online workload characterization and also thermal characterization at the same time. We characterize every single virtual machine running on every single server and every single processor, and those characteristics are aggregated all the way up to the node level so that we can make decisions that have to do with the individual VMs and also the individual physical components. And then we use that to perform scheduling, power management, thermal management, and to make migration decisions, if any. And you can see from preliminary results that even in highly utilized systems -- so these are systems that are running at 100 percent utilization, which is not really that realistic -- we're able to get good energy savings with speed-up, and we get these because we're monitoring the characteristics of each VM and we schedule VMs that play together well on a single socket or on a single server. That is really the only reason why we get these kinds of savings. Now, as utilization comes down, you can see where the savings would clearly go up if you have the ability to utilize more power and thermal management knobs on your server machines. So going forward, I'm leading a fairly large center effort, actually through MuSyC, which stands for the Multi-Scale Systems Center. The goal of this is to manage energy across all of the different layers within datacenters and to ensure that energy will be consumed only when and if needed, instead of being wasted. And we do this from the software layer all the way down to the platform and hardware level. There are a number of faculty involved in this project from UCSD, UC Berkeley, USC, Stanford, and Rice University.
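The fan-speed packing heuristic described a moment ago can be sketched as follows; the discrete fan settings, the cubic power constant, and the load-to-fan-speed mapping are illustrative assumptions rather than measured values.

# Cooling-aware packing sketch: fan power grows roughly with the cube of fan
# speed, so only move a job if the total cooling power drops (for example,
# the source fan steps down without forcing the destination fan to step up).

FAN_SPEEDS_RPM = [2000, 3000, 4000, 5000, 6000]   # discrete fan settings

def fan_power_watts(rpm, k=1.2e-11):
    return k * rpm ** 3          # cubic relationship mentioned in the talk

def required_speed(server_load):
    """Map a server's utilization (0..1) to the minimum fan setting that keeps
    it within its thermal envelope -- a stand-in for the real thermal model."""
    idx = min(int(server_load * len(FAN_SPEEDS_RPM)), len(FAN_SPEEDS_RPM) - 1)
    return FAN_SPEEDS_RPM[idx]

def should_migrate(src_load, dst_load, job_load):
    """Move a job only if total cooling power goes down afterwards."""
    before = (fan_power_watts(required_speed(src_load)) +
              fan_power_watts(required_speed(dst_load)))
    after = (fan_power_watts(required_speed(src_load - job_load)) +
             fan_power_watts(required_speed(dst_load + job_load)))
    return after < before

if __name__ == "__main__":
    # True here: the hot server's fan can step down without speeding up the other.
    print(should_migrate(src_load=0.85, dst_load=0.30, job_load=0.20))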
So we're really trying to look at this from a holistic perspective, because we believe that you cannot solve the energy problem by just doing a software solution, by just doing a hardware solution, or even by just solving the cooling problem. You really have to think about it as a whole. So to summarize, the key is really to ensure that we monitor what is going on in the system, that we develop policies that are aware of what's coming down the line, and that we're aware of the hardware characteristics. We can't just pretend that hardware works perfectly and works equally everywhere. It doesn't. We, in fact, need to leverage these differences to our benefit. And as I've shown you, we have taken some first steps toward implementing power management and thermal management policies that behave well, and we've started integrating this into the vGreen virtualized system. So this is pretty much all I have to say.

>>: So we can ask questions without any microphone, so, please, if you have any questions.

>>: So do you have a study of how software inference [inaudible]?

Tajana Simunic Rosing: Yes. So, actually, Alan [inaudible], who is in the software energy management team, has worked with me on looking at how we can predict the type of performance and the energy costs that future applications are going to demand out of future hardware, and how we would then design those applications and design the hardware to meet that. Basically the concept behind this has to do with creating profiles of the machines and creating profiles of the applications. These profiles of applications basically look at relatively simple kernels that you can detect as the application is running. So you're not looking at the source code of the application at all, you're just monitoring it running on today's system. And as it's running, you gather enough information to figure out where the critical hot spots are. Then you convolve those two together to figure out what is going to happen if you take today's application to some new system, or the other way around, you know, what if I take some future application and run it on today's system, what am I going to get. And based on this, then we can actually make some better decisions. So that's a good question. Any other questions?

>>: How did you actually measure the performance? Purely on the response time or [inaudible]?

Tajana Simunic Rosing: So it depends on the application. And that's a great question. If you look at multi-tier applications that have response time guarantees, then, yeah, we basically measure response time. You can measure the bandwidth. For the examples of video streaming, you know, you can actually look at the quality of the experience, frame drops, and so on. So it's a strong function of what it is that you are actually running, and that's part of the challenge here. And then if you look at what happens within the kernel, you actually don't have a good way to get an idea about performance, and one of the challenges that we looked at is how do you provide this feedback to a virtual machine scheduler so it knows what to do. Because otherwise how is it going to make it better?

>>: So you actually made a [inaudible].

Tajana Simunic Rosing: So we actually created a very thin layer interface that allows application monitoring and feedback into the virtual machine in a way that doesn't create overhead.

>>: [inaudible].

Tajana Simunic Rosing: Basically.

>>: [inaudible].

Tajana Simunic Rosing: Yeah.
>>: The [inaudible] you are measuring, the consumption, I don't think you can have an exception for air conditioning [inaudible].

Tajana Simunic Rosing: Yeah. So I talked a little bit about that. We haven't done it at the scale of a whole room, but we are capable of doing it. What we have right now is the ability to measure how much we're consuming at the level of the whole datacenter container, because we're measuring the amount of power that goes in, and we also know the rate at which water is coming in and what temperature the water is, and then the rate at which it's going out and what temperature it is when it goes out. So based on that you know exactly how much you're consuming at the whole-box level. Now, inside, we also measure all the temperature distributions of the racks, the heat exchangers, and the servers themselves. And we know all of the fan speeds. So you can kind of figure out, you know, and allocate the costs across this. We're not quite at the point where we've developed policies that run at that scale, so I started sort of from the server and then started scaling it up. And a friend of mine who is working more at the software level started from the top level, you know, monitoring everything, and he's working his way down. One of these days we'll meet, right?

>>: Any more questions?

Tajana Simunic Rosing: Well, thank you very much. Okay. Thanks.

[applause]

Zach Hill: Hi. My name is Zach Hill, and I'm presenting our work on early observations on the performance of Windows Azure. So we've seen a lot of talks today about people's applications on clouds, and we did this and we did that, but we kind of took the perspective of an application developer looking at these new technologies and trying to decide, do I want to use it and, more importantly, if I do, how do I build my application in Azure specifically. So the question is not can I do it but how should I do it, and how should I design my application to best utilize the services that are provided in this cloud environment.

So, specifically, we're focusing here on how the various Azure services perform somewhat in isolation, and we ran these experiments between November and January, so just as it was coming out of the CTP and into the final commercial release. So with some slight disclaimer: you may see something different than this if you run these experiments today, and you may see something different than that tomorrow. As we all know, there are precious few performance guarantees given by any of the cloud providers. So we kind of give these as general recommendations and things we've seen and experiences we've had working with the cloud, but certainly you have to take all of these with a slight grain of salt.

So here we kind of present a fairly typical application architecture. You see things like this fairly often in documentation and literature. You have some users submitting requests to some web-based front end that goes through a load balancer, which hands off work to some task queue, and in the back we have workers that operate on these tasks and do some batch processing or whatever, and they then interact with various types of storage: tables, blobs, SQL services. So we're going to look at each one of these individually. First I'll start out with looking at kind of what performance we can expect when deploying and scaling the compute resources themselves.
So we're not looking at how many CPU cycles we are getting or how fast we can execute this algorithm, but if I wanted to actually deploy an application and scale it up, for instance, what kind of performance can I expect, how long does it take to do these operations. So I can kind of give you an idea of the kind of parameters you need to take into account when you're designing an application, particularly scalable web applications and things like that. We'll also look at the storage services, so we'll do some benchmarks of the task queues, the tables, and the blobs. These are fairly straightforward measurements, but we think the results are interesting, particularly in relation to some other metrics, such as the direct TCP communication, which was released -- oh, was that January, I think, or December, when they announced the feature allowing worker roles to interact directly via endpoints. So it was not part of the original Azure offering. Originally these instances could only communicate through the storage services, but now you can actually define a direct TCP port and make direct connections, so how does that fit into the larger performance picture. And, finally, we'll wrap up with the Azure SQL services, so their actual relational database in the cloud, and how does that compare with what you would find in a local LAN environment, and what kind of parameters can you expect for scaling and performance when you're actually writing applications against it.

So starting off with the deployment and scaling: our methodology here was to evaluate how long it takes to deploy and how long it takes to scale. We deployed applications from the blob storage itself. The deployment packages were essentially trivially small, so less than five megabytes, so we can kind of factor that out. Then we measured the time to start the deployment. So we present some numbers for some different instance sizes. In total, we ran eight cores. So, for instance, for the small type we start up four instances and then we scale it, and I'll talk about that in a minute, and for the medium size we start up only two because those are two-core, so you kind of get the math of the cores. Then we also measure the time to actually double the instance count. So we start out measuring how long it takes to bring up four instances, and then what if we want to bring up another four, and how those two numbers relate and what we can kind of expect. And these are the experiments that were run between December and January. We ran it 431 times, so if anybody's really interested, we can give you this nice, long plot with every single data point. You can see the variability and stuff. We will omit that for this talk; we simply don't have time or space. We actually did experience a failure rate of 2.6 percent. So 2.6 percent of the time, one or more of the instances didn't come up. And that's worth noting, because, again, with the hype surrounding the cloud, that you always get this stuff, well, that's not necessarily the case. You don't always get the resources either right when you request them or at all, so you have to account for that when designing your applications.

So here's the time to deploy and the time to see the very first instance come up. So here we have minutes on the scale, which is notable in the first place, that it's even a minutes scale, and here are the various VM instance sizes, so small, medium and large. And then we distinguish between web roles and worker roles.
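A minimal sketch of the kind of timing loop such deploy-and-scale measurements imply, assuming a hypothetical `get_running_instance_count()` helper that wraps whatever management interface is available; this is an illustration of the measurement method, not the harness the speakers used.

```python
# Illustrative timing harness for deploy/scale-out experiments.
# get_running_instance_count() is a hypothetical stand-in for the cloud's
# management API; the deployment or scale request is assumed to have been issued
# just before this function is called.
import time

def wait_for_instances(target, get_running_instance_count, poll_seconds=15, timeout=3600):
    """Return (seconds_to_first_new_instance, seconds_to_all) or raise on timeout."""
    start = time.time()
    baseline = get_running_instance_count()
    first_seen = None
    while time.time() - start < timeout:
        running = get_running_instance_count()
        if first_seen is None and running > baseline:
            first_seen = time.time() - start      # time until the first new instance appears
        if running >= target:
            return first_seen, time.time() - start  # time until all requested instances are up
        time.sleep(poll_seconds)
    raise TimeoutError("instances did not reach target count")
```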
If you're not familiar, the web role essentially has IIS attached to it, and it hooks itself up to a load balancer. So based on that, we kind of expected the web roles to take a little longer, and indeed they do. We also, maybe or maybe not depending on your perspective, expected larger instance types to take a little longer. It's interesting that they do, but not significantly. So if you actually look at the time per core, in some sense, with extra large you get eight cores in 13 minutes versus one core in 9 minutes, so roughly a minute and a half per core versus nine. So these kinds of design tradeoffs are interesting. If you need lots of resources very quickly, that's actually your best bet. But overall, the first impression we had when we saw this was, wow, it takes ten minutes to bring a VM up. Why is that? So if there's anybody here from Microsoft, I would love to have a talk with you about why it takes so long. I'm not convinced that's absolutely necessary.

And even more interestingly, when we compare starting it up to actually scaling it, so then doubling the instance count here, we add four more VMs. And these are stacked charts. So you can see the total time at the top is the total time to double the entire deployment size. So here the worker -- I should note, these are all small instances. I only present the data for the small instance types here. So from the time that we already have four worker role instances running to the time we can get four more, it's just under 14, 15 minutes for that. So we can see, again, as expected -- no, wait, that's start. For scale, it's significantly longer. So the first ones come on and then we see kind of how they trickle in. So there's also no guarantee you'll get all your resources at once, although I should note, we have seen some slight changes in the behavior recently. In some recent experiments as of literally a few days ago, rerunning some of this, we actually have seen these gaps shrink significantly. I haven't analyzed the data enough to say whether this goes up or this comes down, but it's worth noting. And certainly when you're talking about dynamically scaling applications, when it takes 20 minutes to bring up five more, four more instances, that's something you need to take into account, particularly if you're trying to follow some workload curve, right? I mean, if you're trying to match your resources to some workload, you need to know that you need 10, 20 minutes of lead-in to actually match that.

So the take-aways here. Deploying a VM takes about ten minutes. Is this too long? We think so. In a lot of cases that could be a hindrance. We have not really run a comparison with other cloud providers. I could give you anecdotal information that we've seen significantly shorter times from some other providers, but for Windows instances specifically, it's not actually that different, which is telling in itself. Adding instances takes much longer than initial deployment, so kind of be aware that dynamic scaling does have an overhead and it's not quite as instant as we or many other people would like it to be. As you increase instance types, it will take longer, and you have to account for the fact that you won't get all your instances at once. This actually can be good and bad, as I'll talk about later with the storage services.

Speaking of which, we'll look at each one of the main storage services: the blob storage, table storage, and queue storage. I won't go into what each of these really is and how they really work.
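A minimal sketch of folding that 10-to-20-minute provisioning lead time into a scaling decision, assuming hypothetical `forecast_load()` and `request_instances()` helpers and a per-instance capacity figure; an illustration of the reasoning, not a production autoscaler.

```python
# Illustrative sketch: decide now how many instances to request so that capacity
# arrives after the observed 10-20 minute provisioning lead time.
# forecast_load() and request_instances() are hypothetical helpers; the capacity
# constant is an assumption.

LEAD_TIME_MIN = 20          # observed worst-case time for new instances to arrive
REQS_PER_INSTANCE = 50      # assumed capacity of one instance (requests/sec)

def plan_scaling(current_instances, forecast_load, request_instances):
    # Size for the load we expect when the new instances actually come online,
    # not for the load we see right now.
    expected = forecast_load(minutes_ahead=LEAD_TIME_MIN)
    needed = int(-(-expected // REQS_PER_INSTANCE))   # ceiling division
    if needed > current_instances:
        request_instances(needed - current_instances)
    return max(needed, current_instances)
```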
If you're interested in that, there's plenty of documentation, or you can come talk to me later and we can discuss the intricacies a little bit more. But suffice to say, blobs provide large, kind of unstructured storage, big chunks of bits. Tables are semi-structured data, although not in the classic RDBMS sense; they're only semi-structured. There's no enforced schema, but you get the kind of query, insert, update operations. And then the queues, which are fairly self-explanatory.

So we'll start off with the blob service. Again, the limits and the get and put semantics -- we'll skip that since we're short on time. And according to Windows Azure, performance is isolated between blob containers. So blobs are these objects, and they're kind of grouped into bunches by these containers, in a naming sense. So you have a container that contains some set of blobs. Performance between these containers is supposed to be isolated in terms of where they're located and stuff like that. In the datacenter, you can't count on them being near each other, et cetera. So we test the performance of getting and putting a blob within a single container, so we did not span across containers, and we scaled between 1 and 192 concurrent clients. So for the get action we scaled between 1 and 192 gets on the same blob, and for put, putting those blobs into the same container.

And what we got here is -- okay, that's right. You see the per-client bandwidth is on the vertical scale. So this is from the perspective of a single client within the deployment, and we scale it between 1 and 192, as I mentioned. And for download, we see a single client gets about 13 megabytes per second download from the blob. So when you fetch that 1-gigabyte blob, you're getting what works out to be about 100 megabits per second, which is what the Azure specification states you should expect network performance to be on a small instance type. So we did all these tests using small instance types so that we could get that high scalability number. And the performance degrades reasonably. So the big take-away here is that it's not infinitely scalable, certainly, especially when you're accessing a single entity. And so you need to be careful how your application uses the storage. If you scale up 50 instances and their first action is all to go fetch some initialization data from blob storage, you need to be careful how it's organized, because you can see significant performance degradation depending on how that blob is organized and how it's distributed in the storage system. Additionally, we see here that upload is significantly slower than download, which is somewhat to be expected. And I'll keep moving because we've got to go quick.

So here's kind of the service-side perspective, the cumulative bandwidth. If you add all those up from each client, we see the service itself maxes out at just under 400 megabytes per second, which is kind of an interesting number. We weren't quite sure why that was. We kind of had assumed maybe it's triple replicated and each replica has a gigabit, but that adds up to about 375 megabytes per second, not just under 400. So there's still some investigation to be done here, and, again, because of performance variability, do we know, you know, will this change over time? Who's to say it won't. But it's interesting. And upload was significantly slower as well, but again, that's expected since there are replication issues involved. So quickly moving on to the table service.
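The unit conversions behind those two figures, spelled out as a small arithmetic check (the triple-replication explanation is the speaker's speculation, reproduced here as an assumption):

```python
# Per-client download: ~13 MB/s, compared against the stated 100 Mbit/s small-instance NIC.
per_client_MBps = 13
print(per_client_MBps * 8)                      # 104 Mbit/s, roughly the 100 Mbit/s small-instance figure

# Speculative explanation for the ~400 MB/s aggregate ceiling: three replicas,
# each behind a 1 Gbit/s link (an assumption, as noted in the talk).
replicas, gbit_per_replica = 3, 1
print(replicas * gbit_per_replica * 1000 / 8)   # 375 MB/s, close to but under the observed ~400 MB/s
```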
So the table service basically has this entity-attribute-value model, where entities are essentially rows, attributes are items within the row, and each attribute can have a value and a name, et cetera. Again, semi-structured, no schema. So the question is what kind of performance can we expect when we're running queries and inserts and updates against this storage service. So we performed each of the four primary operations: insert, update, query, and delete. Each client operates on its own unique entity, so they didn't have row conflicts directly, but they're all within the same table and within the same partition. And Azure has this feature where each table is kind of divided into partitions, which are dependent upon explicit values that you put in the row entities themselves; you give them a partition key. So we worked within the same partition, and we performed basically 500 of each operation for each client, with the exception of update, which only did 100 ops. So at the end of the insert phase, which was the first phase, there were approximately 220,000 entities in the table, and then we ran the queries and updates and deletes against that.

Moving on, so this is our performance graph. On the left we see the table performance using 4-kilobyte entities. We actually have data on many different sizes. We present 4k here because it's fairly typical of the various sizes, and there wasn't a dramatic difference between entity sizes such that we think we need to present all the separate ones. So query and insert are interesting here, because not only do we have a weird kind of uptick with low concurrency, but they don't really vary that much, so we actually were pleasantly surprised at the scalability here. From 1 to 192 clients we only see, you know, 30, 40 percent variation in performance, which is actually fairly impressive for both query and insert. Delete, we can't say that; it grows quite quickly. And update, which is on the right here, is a whole different story altogether, and we present each of the different sizes here. So you can see that for the different entity sizes, it didn't really make a significant difference in terms of performance. They're all basically about the same given the average operation time, but it quickly gets to be quite expensive with high concurrency, so be careful how you design your tables. Again, this is a single partition, so within the same table you could use multiple partitions to kind of spread this load and get better performance in that regard.

So the queue service is kind of the last of the three primary storage services, and it's intended for passing reasonably small messages in a basically FIFO model. Get, put, peek are kind of our typical queue operations. And here we just test concurrency against a single queue and see kind of what a queue can handle using various message sizes. So we varied from 512-byte messages all the way up to 8k messages. So put and get are the ones we really care about. Peek is basically consistent across the concurrency levels. I should mention that the vertical axis here is messages per second, and that's seen from a single client's perspective. You can do a little bit of math if you really want to get absolute latency numbers. But, again, for put and get, particularly at scale, the size of the message becomes less important. So actually using larger messages as you need more concurrency is a better way to get more bytes through the interface. But it scales reasonably well.
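A minimal sketch of the partition-spreading suggestion above, with a hypothetical `table_client.insert()` standing in for the table API; hashing row keys into a fixed number of partition keys is one illustrative way to avoid a single hot partition, not a prescribed Azure pattern.

```python
# Illustrative sketch: spread entities across several partition keys instead of
# one hot partition, so concurrent updates don't all land on the same partition.
# table_client.insert() is a hypothetical stand-in for the table API.
import hashlib

NUM_PARTITIONS = 16   # assumption: enough partitions to spread the concurrent clients

def partition_key_for(row_key: str) -> str:
    # Hash the row key into one of NUM_PARTITIONS buckets.
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_PARTITIONS
    return f"p{bucket:02d}"

def insert_entity(table_client, row_key, properties):
    entity = {"PartitionKey": partition_key_for(row_key), "RowKey": row_key, **properties}
    table_client.insert(entity)
```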
We think 32 concurrent clients is about the inflection point at which we start to see degradation of performance: you get approximately 50 percent fewer messages per second after you pass that barrier, regardless of message size.

So now let's talk very briefly, because we are getting short on time, about the direct TCP communication. So this is a somewhat new feature, and useful. It allows workers to communicate directly without having to pass messages through this queue, so we have a much -- potentially a much lower latency communication operation, because there's no intermediary required. So we just opened a TCP connection, transferred a file, and measured the bandwidth and the latency that we observed, and we actually ran these tests for a long time, several weeks. So, again, if you really are interested in it, we can talk about the kind of variation that we're talking about. We actually do have a graph that points out some interesting artifacts that we kind of discovered and had some observations about.

So here's a kind of histogram view. So of all the samplings -- I should say of all the experiments we ran, the key note is performance. So -- I can't read it. The way to read this is that about 65 percent of all file transfers that we performed via this TCP got 80 megabytes per second or greater. So that's kind of how we read this: the percentage is of transfers that got that speed or greater. So we see most of them, at least 50 percent, got actually very high performance. And this is very interesting when you consider this is also on a small instance type, which is supposed to be limited to 100 megabits per second. This is clearly well above that threshold. So we are interested in figuring out exactly what's going on here, which I'll talk about in the next slide, the theories we have, although, again, it's a black box so we're not quite sure. Latency: reasonable latency. Again, because this is a datacenter, we have no real notion of how far apart these objects really are, but it's reasonably consistent. Not a huge variation, not nearly like what we see with the bandwidth.

So here are all those data points. You'll notice immediately there's a reasonable amount of variability between these tests. These were run every half hour for several weeks, and these two troughs are the interesting points. So what really happened there? We have a couple of ideas, but no concrete data. So the real take-away here is you can't really count on any specific performance number you get. Because of this kind of weird occurrence where we're getting much higher than what we actually expected, you know, we expected this 100 megabit restriction and we're getting much higher than that, we're asking: are these really the correct numbers, or did we just get lucky a lot of the time? Could there be some co-location issues or multi-tenancy that actually reduced the bandwidth? I don't know how Microsoft actually enforces the bandwidth limits, so that's one possibility, or we could have just had some random network occurrences, high load elsewhere, that caused this. But you can see it's clearly variable over time, and you need to take that into account when you're designing applications.

And I will wrap it up with our quick look at Azure SQL services. So this is kind of the most traditional service here, the RDBMS that everybody is familiar with, based on SQL Server 2008, I believe.
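A minimal sketch of a point-to-point bandwidth probe of the kind described above, using plain TCP sockets between two instances; the port, payload size, and transfer size are arbitrary assumptions, not the actual test harness.

```python
# Illustrative point-to-point bandwidth probe between two instances.
# Run receiver() on one worker and sender(host) on the other; the port and
# sizes below are arbitrary choices for the sketch.
import socket, time

PORT = 9000
PAYLOAD = b"x" * (64 * 1024)
TOTAL_BYTES = 256 * 1024 * 1024   # 256 MB per transfer

def receiver():
    srv = socket.socket()
    srv.bind(("0.0.0.0", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    received = 0
    while received < TOTAL_BYTES:
        chunk = conn.recv(64 * 1024)
        if not chunk:
            break
        received += len(chunk)
    conn.close()

def sender(host):
    sock = socket.create_connection((host, PORT))
    start, sent = time.time(), 0
    while sent < TOTAL_BYTES:
        sock.sendall(PAYLOAD)
        sent += len(PAYLOAD)
    sock.close()
    elapsed = time.time() - start
    print(f"{sent / elapsed / 1e6:.1f} MB/s over {elapsed:.1f} s")
```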
It is size-limited to less than 10 gigabytes per database, so that's an important factor when you're designing your application. If you expect your database to grow beyond 10 gigabytes within a single database, you have to find a way to partition it into multiple physical databases, not just between tables but across full databases, and that does not fit all workloads. So we ran the TPC-E benchmark, which is an online transaction processing benchmark. It simulates a stock brokerage house, with updates and stock tickers and things like that, which I'll talk about in a second. And the database that we used to test against is about 3 gigabytes in size, so kind of right in the middle, not right up at the upper limit.

And here's a breakdown by micro-benchmark. So each of these is a defined transaction within the suite of benchmarks, and in blue here we actually ran the benchmarks on a local machine that we had in our lab. It was a quad-core, I think a Xeon; I'm not sure if it was a Xeon or a Core 2, with, I think, 8 gigabytes of RAM, so kind of a standard local server. And then we compared that to the performance between the SQL services in Azure and a client also running in Azure. So this is kind of the Azure LAN case. And on average, which is right here, this is the average case, the average across all the micro-benchmarks, we see about a 2x slowdown that you can expect. So we've looked at which ones of these are read-intensive and which ones are read-write-intensive. There seems to be no consistency in the performance differences between those; they kind of average out. We're investigating a little further why some of these are actually faster in Azure than they were on our local system. Because Microsoft gives us absolutely no specs on what these SQL Server instances are running on in terms of hardware, or even their virtual resources, it's hard to compare directly and claim that this is some resource contention issue or anything like that. But on average you can expect a 2x slowdown.

I'll move really quickly. So we also saw some transaction slowdown. Here we see, as we increase the number of concurrent threads that were running transactions against the database, comparing, again, this local LAN server with the Azure cloud, that Azure actually scales reasonably well to concurrent clients, more so than our LAN server did. At this point we actually saw a fair number of failures locally; that's why we don't have data for all the different data points. But we can maintain below a 2x -- yeah. So we're kind of calling the inflection point here about 30 concurrent threads, and we can stay under that 200 percent slowdown factor up to 30 concurrent threads. And this is showing how we actually saw commit failures as we increased the concurrency as well. So these are actually transactions that failed to commit. And we lost a data point. This is the local server again. So it degraded rather quickly.

So looking at this data over time -- we ran these benchmarks for several weeks, and we saw fairly consistent performance. So this is kind of a high-level trace of each of the micro-benchmarks. And you see a little bit of jittering here and there, and this is actually a case where the client machine that was running the benchmark against the server failed and we didn't know about it for a couple of days.
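A minimal sketch of the concurrency experiment described above, with hypothetical `make_connection()` and `run_transaction()` helpers standing in for the database driver and one benchmark transaction; it ramps up threads and records average latency and failed commits, and is not the actual benchmark driver.

```python
# Illustrative concurrency harness: at a given thread count, record average
# transaction time and the number of transactions that failed to commit.
# make_connection() and run_transaction() are hypothetical stand-ins.
import threading, time

def worker(make_connection, run_transaction, n_ops, results):
    conn = make_connection()
    ok, failed, elapsed = 0, 0, 0.0
    for _ in range(n_ops):
        start = time.time()
        try:
            run_transaction(conn)
            ok += 1
        except Exception:      # count commit/transaction failures
            failed += 1
        elapsed += time.time() - start
    results.append((ok, failed, elapsed))

def run_level(concurrency, make_connection, run_transaction, n_ops=100):
    results, threads = [], []
    for _ in range(concurrency):
        t = threading.Thread(target=worker,
                             args=(make_connection, run_transaction, n_ops, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    ok = sum(r[0] for r in results)
    failed = sum(r[1] for r in results)
    avg = sum(r[2] for r in results) / max(ok + failed, 1)
    print(f"{concurrency} threads: avg {avg * 1000:.1f} ms/txn, {failed} failed commits")
```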
So it's kind of another lesson learned: be careful what happens, because even though Azure is supposed to be recycling these VMs if they fail, ours actually failed and then failed to recycle, so it just died altogether. So kind of another lesson learned. But we see reasonably consistent performance, particularly given that this was during the CTP phase, so while building and testing was still going on.

So, general recommendations and conclusions. Be careful of how you do the scaling. Because it's dramatically slower than the initial deployment, you need to look at the workloads themselves to determine when it might or might not be worth it to actually scale. If you have a lead-in time of 20 minutes to gain more resources, will the workload peaks have passed by that point, essentially. Distributing blob accesses across many containers is one way to maintain higher performance, so don't point all of your instances at a single container or a single blob. The tables scaled fairly well for most operations. Update and delete were the two noticeable exceptions, but this is fairly expected given the nature of those types of operations. And SQL services scale reasonably well, but it's tough to really recommend using something like that for a scalable application because, again, of the size limitations. If your database is expected to grow, and as we've heard a couple of times today, they never shrink, then you need to be very careful of how you partition that.

Surprises. So the big surprises were why does scaling take so long and why is TCP performance not the same as blob performance. These are kind of the areas that, moving forward, we'd like to investigate and talk with the Microsoft and Azure people about, if possible, to see what's really going on here. I should mention that, similar to the work we've been doing, there's also AzureScope, from the eXtreme Computing Group here at MSR, which provides some similar benchmarks. If you're interested in running some of your own benchmarks like that, they provide some examples and stuff that we actually found fairly interesting towards the end of our work. So I will conclude so we can move on and get some lunch. Yeah?

>>: [inaudible].

Zach Hill: Yeah, we had looked at that. That was one of the things that we thought might be the case, but it's not clear in terms of documentation whether this 100 megabit is a guaranteed minimum or a guaranteed maximum.

>>: [inaudible].

Zach Hill: Yeah. So it's kind of hard to interpret what 100 megabits means; do I always get that, or will I ever get more?

>>: [inaudible].

Zach Hill: Yeah. So that was -- yeah, excellent point. That's one of the kind of issues, how much are you really contending and --

>>: [inaudible].

Zach Hill: Ours is regular sequential downloading, yeah. Again, we didn't think that number was actually surprising given that we expected to get the 100 megabit network limit. So if we could actually get more than that, that would also be surprising.

>>: [inaudible].

Zach Hill: HTTP, not [inaudible].

>>: Any other questions?