>> Dennis Gannon: So we have a great session now. Previous session was really interesting to hear
what the research community was doing. This session is flipping it the other way, really, and talking a
bit about what we’re doing here in Microsoft and Microsoft Research, and on both the—sort of—
infrastructure side and also on the research side here as well. So, great pleasure to be introducing
David, here, who works for an organization called Global Foundation Services that runs Microsoft’s—
sort of—infrastructure across the world. And I think this is—you know, I’ve seen his slides—it’s a great
insight into—you know—what powers Microsoft, really. So here’s David.
>> David Gauthier: Great. Thanks, Dennis. So thanks: thanks for having me; thanks for the interest in
the topic today. My name is Dave Gauthier. I run our datacenter architecture group in Global
Foundation Services. I’m really responsible for the technical strategy and the availability profile of the
physical datacenters we build around the world. I’m probably the only person in this room who
doesn’t have some sort of degree in computer science, so go easy on me. But I’ve been in the company
about fifteen years, and what we find a lot in the physical datacenter industry is that people just kind of fall
into this space. My background is in a little bit of computer science, a little bit of broadcasting and
communications, and just a love of wires and plugging things in, and so I kind of fell into this space.
But yeah, I really want to talk about the growth of the cloud and how that’s fueled
investment in our datacenters across the planet, and kind of how we tackle that. So Microsoft—as a lot
of people know—runs a lot of different cloud services; we’ve actually got—you know—the … probably
the five big ones that everybody talks about, or knows about: Office 365, Windows Azure, Hotmail,
the Windows services that power the back end of Windows 8, Bing, and the Xbox Live
components, but there’s about two hundred different services that make all this up. We’re delivering to
over a billion customers around the world, twenty million businesses, and hitting a ton of markets in a
ton of different languages around the globe. The main point of this—there’s some really big numbers
here: billions and trillions—but it’s really more about the scale and the need that we’ve had to scale
over the last, probably, four or five years, since things really started ramping up. And to take something
like Office 365—which is the fastest-growing product in Microsoft history; we’re moving all these folks
on Exchange and SharePoint into the cloud as a service—we need to be building datacenters in new
markets all the time; we need to be innovating in our datacenter space. And then, for Windows Azure,
where there’s a robust competition for IaaS and PaaS services in the cloud, we need to be able to do
that at very, very competitive rates, and be able to do it very, very quickly and agilely, and so, for us,
building new datacenters, innovating in their efficiency and in their cost to deliver is key to doing that.
Around the world, these are kind of the big datacenters we talk about. I think we have a number of
presences here in North America; we’re also turning up new services in Asia and China, around Office
365 and Azure, lighting up Australia and Brazil as well. A lot of people ask what goes into this selection:
how do we decide where to place datacenters? And really, it’s about thirty-five things—we used to like
to call it thirty-one flavors, and then it grew—but it’s about proximity to customers; it’s about the usage,
and the utilization, and the speed and latency that these services are consumed under. You know, in a lot
of cases, we have customers come to us and say, “Hey, why isn’t there a datacenter in my town?” And
it’s like, “Well, I could build a datacenter right next door to your office building, but in reality, both of
these are gonna transit all the way back to Amsterdam before they come back here.” And so, having
real proximity to the customers, from a network standpoint, is super-key. Having robust scale-out
energy and fiber infrastructure is another really important element. These datacenters are getting into
the tens of megawatts of capacity. We’re going into a lot of rural locations that have lots of land, and
we can expand, and we can keep growing and growing and growing, without having to start on a new
site, and so having an energy grid that we can support that from is really key, and having growth in
fiber’s key. And the last piece is: around a skilled workforce—you know, people who can build these
datacenters—we’re turning up—y’know—hundreds of millions of dollars of projects in very short
amount of time, so we need to build them, and then we need to operate and run them and take care of
the servers and the electrical mechanical equipment inside the datacenters.
We’ve been on a bit of a journey, and I’ve been involved with this since what I’d call day one; we
built our first datacenter at Microsoft in 1986—that’s about half a mile that way—it still exists; it’s called
building eleven. And it was really about corporate IT functions, but we built our first, like, internet-facing datacenter in ’96—that’s north a little bit, in Bothell. And when we really started going into the
internet space—after ’96, ’97, ’98, ’99, through about 2001, 2002—we were really going out and buying
colocation: leasing third-party space, and growing into that. And I think the main reason I talk about this
is because that stuff was just about rent—it was just about a place to put your servers and put your
devices and run them. A lot of companies still run this way today. And there was
never much thought of efficiency or energy efficiency or resource efficiency in those types of
datacenters, because they were fully landlord-tenant relationships—who cares what the tenant pays,
I’m just gonna build this thing. And so we have this metric we call PUE; it’s an overly
simplistic metric, but it’s one that people can grok easily when talking about efficiency. And so it’s about the
amount of energy coming into the building versus the amount of energy being consumed at a compute
resource or device or network device, and so it’s a simple overhead metric. And it’s a good way to talk
about things.
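To make the overhead idea concrete, here is a minimal sketch of the metric as it’s conventionally defined (the numbers are illustrative, not Microsoft’s):
    # Minimal sketch of PUE: total energy entering the building divided by the
    # energy consumed by the compute and network devices. Figures are illustrative.
    def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
        """Power Usage Effectiveness: 1.0 would mean zero overhead."""
        return total_facility_kw / it_equipment_kw

    # Example: 12 MW into the building, 10 MW reaching the IT equipment
    print(pue(12_000, 10_000))  # -> 1.2, i.e. 20% overhead for cooling, conversion, etc.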
In 2007, we had eaten up all of the extra lease space the dot-com boom had afforded us, and we
realized we had to build datacenters ourselves—it was the first time we really focused on designing for
efficiency, designing for operation and how we actually deploy in the environment and get a little more
speed into what we were doing. And we were able to knock a number of points off of the PUE metric
straight away, just by focusing on it as an owner, rather than as a tenant. In generation three, which started
landing in 2009, we played with containers, and I think a lot of people have probably heard about … or
have heard about containerized datacenters, and we really went all in on this. And we built a very large
facility in Chicago, where we were deploying—you know—forty-foot shipping containers filled
with servers, and we get about two thousand machines into each box; we could roll it in off of the truck,
and within about eight hours, we’d have our bits imaged on top of that, and we’d be serving traffic off of
that cluster—highly, highly efficient from a workflow and a scale standpoint, and also highly efficient
from an energy- and a water-consumption standpoint. So it was just a super-big win-win.
The other element that we brought into that container farm: this was the first time we looked at what we call
differentiated SLA’s. So in the typical datacenter environment today, if you were to go and hire somebody
to build you a datacenter, they’re gonna build it to five nines availability—really quick, you know, very
high mean time between failures, very low mean time to recover,
and it’s always running; you can treat it as a permanent resource. Now we know, as software
developers and as service developers, that there are a whole bunch of other things that can go wrong,
besides just the power going out on the infrastructure, and so, if we want to design our applications
better, we’re gonna design them for all the faults that can happen in the system. And so we took
advantage of that just—you know—five years ago, and actually provided a three nines environment for
our server infrastructure, recognizing that it can fail. And they have failed. In one instance, we took
a double lightning strike on two separate substations that connect to this datacenter, and those
servers didn’t have any generators behind them, and so we dropped tens of thousands of
machines. Great thing is: we’ve got some smart people here; nobody outside of Microsoft noticed. The
applications kept running; the service stayed available to users; and everything went exactly as it’s supposed to
happen—with a lot of operational work on the other side to recover the environment, but we got there
and learned from it.
>>: So you do have five nines availability …
>> David Gauthier: Yeah, five nines at the service. Right? Five nines at the service, three nines or less at
the infrastructure. We take this a little bit further in our generation four facilities, and these are what
we’re still turning up today—a number of these facilities. And we moved more of those
modular components into containers, and we also took advantage of outside air to cool them, so they’re
even more efficient, and they used a heck of a lot less water—about one percent of the water … oh,
well, this pointer’s not on there. This datacenter versus this datacenter: a huge reduction in
the amount of the water that was consumed. And it’s really about—you know—fast time to market,
and sizing capacity to our demand.
Now what am I working on today? Working on our gen five infrastructure. We’ve just announced
investments in Cheyenne, Wyoming and West Des Moines, Iowa, of nearly half a
billion dollars for new datacenter construction—this is our gen five builds that are kicking off. This is
about integration; this is taking that understanding of the service and the application another layer deeper
into the infrastructure, where we’re designing the datacenter, the server, the network, the workflow
and operational tooling all together with Windows Azure and with the team behind Bing Autopilot,
which I think we’ve talked about a fair amount in the past. It’s really about resilient software: so the
software can deal with any kind of failure within the environment; it’s on common infrastructure; we
have far fewer server products—or SKUs—than we’ve had in the past. And it’s really about being
flexible and scalable—so we can get all the benefits we had here with an integrated system. We’re able
to kind of rev the code of the datacenter—you know, kind of compile the availability to be what the
service needs to run.
One of the things that goes into that—and this is an e-science workshop, so I figured I’d try to at least
show you some of the math we put behind this stuff. You know, we’re running over a million servers;
that means we have a really nice, big sample pool to pull from for how infrastructure runs. And one of
the most notable and simple things that we did with this is: we modelled out our datacenter topologies,
and the temperatures outside, and the temperature inside, and what the weather was doing against the
device failures on the systems. And the number one thing that fails in big infrastructure is hard drives;
the number two thing is the systems that support hard drives; and I don’t remember what
number three is, but it’s way down the list after those two. And so—depending
on the inlet temperature that the device was seeing—we were able to look at our
annualized failure rates and really understand how hot we could run the datacenter. And so we took
this information; we’ve poured it into the designs; we’re running our datacenters at ten to thirty-two
degrees C, with a time-weighted average of twenty-four degrees C to get the best energy efficiency
possible out of this, and this has actually driven our PUE numbers down on that last slide from the one
point two range to the one point oh seven range—mostly from that hard drive temperature knowledge.
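As a sketch of that kind of analysis—with invented data and bin edges, not our actual numbers—you bin device-hours by inlet temperature and compute an annualized failure rate per bin:
    # Hypothetical sketch: bin hard-drive exposure by inlet temperature and
    # compute an annualized failure rate (AFR) per bin. All data are invented.
    from collections import defaultdict

    HOURS_PER_YEAR = 8766

    def afr_by_temp(records):
        """records: iterable of (inlet_temp_c, device_hours, failures)."""
        hours, fails = defaultdict(float), defaultdict(int)
        for temp_c, device_hours, failures in records:
            bin_c = int(temp_c) // 5 * 5              # 5-degree-C bins
            hours[bin_c] += device_hours
            fails[bin_c] += failures
        # AFR = failures per device-year of exposure in that bin
        return {b: fails[b] / (hours[b] / HOURS_PER_YEAR) for b in hours}

    sample = [(18, 2_000_000, 5), (24, 5_000_000, 16), (31, 1_000_000, 5)]
    for bin_c, rate in sorted(afr_by_temp(sample).items()):
        print(f"{bin_c}-{bin_c + 5} C: AFR ~ {rate:.1%}")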
We’ve also done a lot of other things: removed all unnecessary components—you know—no
CD-ROM’s, no USB ports, things like that—but we’re also running at a
European voltage now—we’re running one voltage across our portfolio—four hundred fifteen volts;
it’s the most efficient; we’re able to take extra transformations out of the design; and we’re able to have
commonality in our supply chain worldwide as a result. Let’s see … the other piece: with
these high temperatures, the way we get that major savings is by using outside air cooling. We use what’s
called adiabatic cooling, or direct evaporative cooling—like, if you’re sitting on a beach, and you have a spray
bottle, and you spritz yourself, that’s how we cool our servers—we open the windows, and we spray a
little bit of moisture into the air to bring the temperature down on the super-hot days, by introducing
humidity. So it’s super, super efficient; we’ve got these running almost around the world. There’s only
a few locations where we can’t do this—most of that’s in Southeast Asia, where it’s very humid at the
same time as very hot.
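A small sketch using the textbook relation for a direct evaporative cooler shows why hot-and-humid climates defeat the technique (the effectiveness value is a typical assumption, not a Microsoft figure):
    # Direct evaporative ("adiabatic") cooling: the cooler can only close part of
    # the gap between the dry-bulb temperature and the wet-bulb temperature.
    # Humid air has a wet bulb close to the dry bulb, so there is little gap left.
    def supply_temp_c(dry_bulb_c, wet_bulb_c, effectiveness=0.85):
        return dry_bulb_c - effectiveness * (dry_bulb_c - wet_bulb_c)

    print(supply_temp_c(35, 18))  # hot, dry day: ~20.6 C supply air
    print(supply_temp_c(35, 31))  # hot, humid day (Southeast Asia): ~31.6 C, too warm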
The other interesting pieces that we’re working on are the next-next elements. We’ve got a team
working on research around distributed generation and also fuel cells. Now, we’ve
got a team at MSR and GFS, along with the UC Irvine National Fuel Cell Research Center, working on
directly powering servers off of natural gas, and so we have a rack of servers where we’re converting
natural gas via a—I’m gonna get caught on this one—I think it’s a PEM fuel cell to actually deliver direct
DC voltage into the servers. And this has been really, really successful; it’s really exciting,
‘cause, as you may know, the grid is horribly inefficient at delivering energy. Something like forty to sixty
percent of the input energy is actually lost in the distribution before it comes into a datacenter, and so if
we can take natural gas straight into the datacenter, we’re able to run a tremendously more efficient
datacenter infrastructure overall. And so this is one of my favorite things that we’re working on. Other
piece: we’re actually taking that a step further, and in Cheyenne, we’re working with the local university
on a waste water treatment plant, where we’re recovering methane gas coming off of this treatment
plant, and we’re then running that into a fuel cell to run servers. And so we could see ourselves—you
know—all jokes aside, kind of powering these things off poop, which is pretty exciting. See, the other
nice thing about this is there are a lot fewer parts—a lot fewer things break—and so the reliability of the
system increases—fewer moving parts, fewer broken pieces—and we’re able to have a
common system, again worldwide, where all we need is a natural gas feed, and we’re ready to go.
And so, just to summarize—I’m a couple minutes over—but the other element that we’re really working
on at Microsoft is shifting the thinking. I think there’s been a lot of talk about one Microsoft and about
doing things at cloud-scale and really thinking about new ways of developing apps, new ways of
delivering services, and so these are kind of key themes. And when I talk to developers here, I’m really
hammering on this on a daily basis: [laughs] “You need to think about resilient services. How is your app going to
deal with not only a power failure or cooling failure, but also somebody fat-fingering the code into the
router? How are you gonna recover from that?” And so, these brittle services, these brittle networks
need to start getting stronger. Solve hardware problems in software. We’re a software company: we
don’t need to throw a bunch of redundancy or extra components at the problem;
let’s figure out how to solve it in software—self-healing. And the last piece is probably the most critical element in
scaling: how are we enabling the business? We need to be able to go out and come up with new
products. The datacenter, and the servers, and the network are not the product that Microsoft sells.
The key thing that we’re looking at doing here is making sure that that infrastructure is there to enable
developers, enable researchers to be able to spin up new ideas, test new hypotheses, and get them out
to market faster or solve that problem.
So that’s about it. There’s some slides; there’s some content here on the back end that people are
welcome to go out and check out. We’ve got a robust blog at microsoft dot com slash datacenters, and a
whole bunch of other great information on Azure and O three sixty-five. Are there any—yeah—questions?
Please … I should have said ask me questions as I go. I forgot that part.
>>: How much [indiscernible] number one [indiscernible] datacenters’ hard drives. I just want to know
how … does using SSD drives change that in the system? Of course, there’s an addition in cost, but
this comes to break down the cost [indiscernible]
>> David Gauthier: Yeah, absolutely. So the question was: how does hard drives being the number one
failure item change if you start moving to SSD’s, recognizing that there’s a cost premium there. You know, we
find that—you know—the SSD’s have their own issues, but they’re more tolerant of temperature
variation—absolutely—but I think that we find that we use them for different things. And so if we’re
going to go out and build hundreds of petabytes of storage for Azure storage or for Bing or something
like that, the economics of SSD just break. But they are a little bit more tolerant of temperature
variation, and there’s some interesting work going on in understanding exactly how SSD’s respond to
temperature, and what that looks like from a drive-life perspective, as well as a read-write perspective
on the type of flash we can use.
>>: So over time, things will break.
>> David Gauthier: Absolutely.
>>: And do you actually try to go into these containers and repair things or do you just—at some
point—recycle everything?
>> David Gauthier: Do we—yeah—so do we fail in place in the containers and forget about it? Do you
have another question? Or do you …?
>>: No, that was my question.
>> David Gauthier: That was it? So no, we don’t. Unfortunately, computers haven’t gotten cheap
enough yet. You know, the way that we treat the containers—and actually the whole
environment for that matter—is that there’s a carefully-crafted percentage of the infrastructure that can be
down at any given time and for how long it’s down, and then we kind of target that for when we’re
gonna go in and repair it. But for right now, most of the compute sled is just too expensive to forget
about it, unless it’s right toward the end of its depreciation cycle.
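A hypothetical sketch of that repair policy—the budget and thresholds here are invented for illustration, not Microsoft’s numbers:
    # Fail-in-place policy sketch: tolerate dead machines in a container until the
    # down fraction crosses a budget, or ride it out near end of depreciation.
    def repair_action(total, down, months_to_depreciation,
                      down_budget=0.02, abandon_months=6):
        if months_to_depreciation <= abandon_months:
            return "fail in place"          # cheaper to ride it out to decommission
        if down / total > down_budget:
            return "schedule repair visit"  # burned through the allowed down fraction
        return "wait"

    print(repair_action(2000, 25, 24))  # 1.25% down -> "wait"
    print(repair_action(2000, 50, 24))  # 2.5% down  -> "schedule repair visit"
    print(repair_action(2000, 50, 3))   # near depreciation end -> "fail in place"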
>>: Yeah.
>> David Gauthier: ‘Kay, another? Go. [laughs]
>>: Yeah, that’s alright.
>>: You mentioned utilizing a methane generator [indiscernible] is that twelve percent … what
about utilizing the heat? I remember reading in some newspapers that, in some places,
they are utilizing generator heat for the [indiscernible] greenhouses. Do you
have any plans for that?
>> David Gauthier: I can’t speak to any plans that we have on that—on the drawing boards—but it’s
absolutely a huge opportunity, and we are looking at it in certain locations. Waste heat recovery for a
datacenter is a great opportunity. The challenge is that, today, it’s not quite high-enough-grade
waste heat—like, if it was, like, three or four hundred degrees, it’s more portable; you could take it
somewhere—but when it’s only—y’know—a hundred and twenty, a hundred and thirty degrees, it’s
hard to transfer that into water and actually move that water a distance to actually get a huge amount
of benefit from it without putting the greenhouse right next door. I’m not gonna say that’s out of
the question, though. [laughter]
>>: A question I had was about Amazon Glacier …
>> David Gauthier: Yeah.
>>: … is a long-term archiving service …
>> David Gauthier: Sure.
>>: … which is very cost-effective. Do we have any thoughts on that?
>> David Gauthier: I’m sure that someone in the Azure space is considering long-term storage. There’s
a number of different technologies that we could choose from to do there. So can’t speak to it
definitively.
>>: It’s alright.
>>: Do you think it will be possible at some point to see whether one machine has better energy
consumption than another machine—provide an API, for example?
>> David Gauthier: An API to pull the energy consumption from a machine? Absolutely. And
actually, this is kind of one of the challenges of the overall OEM ecosystem: in 1999, I was buying
servers from Dell and HP that had that kind of telemetry coming off the hard drive and the power … or
off the motherboard and the power supply already. And then, somewhere along the line, they wanted
things to get cheaper and cheaper, and that was like one of the first things to go, and you’re starting to
see those features come back in natively in the power supply and the chassis managers in some of the
new products. So those are there in the higher-end products, especially.
>> Dennis Gannon: Okay. I think we’ll move on now. So thanks, David. [applause]
>> David Gauthier: ‘Kay, thank you.
>> Dennis Gannon: So delighted to introduce Victor Bahl, who runs a mobility and networking group
here. I think what’s really exciting is—you know, we’re obviously talking about cloud, but obviously
mobile devices and services—so Victor actually helps to really define the strategy of the company on
networking. So I’m really looking forward to this … to Victor’s talk today. Thank you, Victor.
>> Victor Bahl: Thank you. Thanks. Hi. Alright, thank you for giving me this forum to say a few words.
There’s a lot to say, but I’ll try to condense it very quickly. So—you know—we—
Microsoft, and Google, and Amazon, and everybody—are spending billions of dollars on the cloud—you
just heard that talk—and it’s been going on for many, many years. And so all these companies, including
ourselves, have been optimizing the heck out of the cloud infrastructure itself, and now we are
both a devices and a services company—we’re building both the devices and the cloud—but there is
still an open element here that I want to talk to you a little bit about. But … just let’s start with some
ground truths here. So we know that in order to build great services, we need the ability to get
to those services really quickly and really optimally. And the latency and the jitter that you
experience will directly contribute to how you—you know—use our services, or how you
think about them.
So while we’re optimizing the cloud, it is important to sort of think about how you connect to it,
alright? And these things matter; they matter a lot. I just picked up quotes—we have our, obviously,
our internal measurements as well—but I picked up some quotes of people that you might recognize,
claiming publicly how much it mattered to their revenue, in terms of looking at something as simple as
latency. Right? So if you increase the latency to your services, you’re actually gonna lose revenue; if you
make it better—y’know—people are gonna come towards you. So we see this again and again, both
internally and externally as well. So I think the most famous quote is from Marissa Mayer, which I have
listed as first.
So now, if you think about latency as that component that you want to reduce, then let’s simplify this
problem—I mean, you know, there’s latency in every part of the system. So there’s something going on
in the client when the request is sent, then you send the request over the network to the datacenter—it
munches on your data—it sends it back; the client post-processes it and presents it to
you. So there are really these three components. Now in the past, we have focused very heavily on the
datacenter, right? In fact, Albert’s gonna talk next, and he’ll tell you about some of the innovations that
he, himself, has been involved with, which include removing the network as a bottleneck for
the server—so the servers can now pump data out at line rate and can be fully
utilized. Similarly, software load balancers; and most recently, we’ve been working on trying to
optimize the bandwidth utilization between datacenters. So we’ve done that, but like I said, there is a
gap, and the gap has to do with the internet.
So no matter how optimal you are in one space, you are not gonna really be able to do great things,
unless you sort of think about the internet as well. So this is a talk about that. Now, this is a little
experiment I did a while ago, from my own office—this is using one of our phones, and it was just a ping,
a traceroute to one of the Google sites—and you could start to see that, depending on the network I
use—in the case of Wi-Fi and cellular—the number of hops goes up. And with the number of hops, the
latency goes up too. So now, if you say, “I want an SLA. I want to build a service which performs like
this, ‘cause I want all the computes to hit me.” You know, if the packets don’t even get to your cloud,
you’re gonna have problems. And we repeated this experiment again and again, and—you know—we
see the same things.
Now, why is this so? Because the internet is actually very, very complex, right? This is a gross
simplification of the internet, and what is it? It’s a bunch of ISP’s—right—they talk to one another; they
have this nice protocol, called BGP, and based on policies, your packets get routed. And at any given
point, it’s not really clear what route it’ll take. So if you don’t have any clarity on what route it’ll take,
you don’t have clarity on how many hops it will take, and then you don’t have clarity on how fast or slow
it’ll get. And there are lots and lots of networks, right? This is just some data to show what those
networks are.
Now if you add to that the mobile world—right—which is a huge deal now, you have great
complexity in the mobile world as well. In fact, it’s a lot more complex than the IP world—right—so the IP part is
represented by this little internet here, and a lot of the other stuff is just mobile. And then there’s
measurements, and you’ve seen papers—or you can see papers—which tell you that latency just goes
up. This is on a 3G network—you can see T-Mobile with four hundred and fifty milliseconds; this is not
singling out any particular ISP here, but just general measurements. And then, when people say, “Well,
LTE’s coming. LTE’s a lot better,” but there too, we see the jitter, and we see at least seventy milliseconds
average latencies to our datacenters. Now, if you don’t believe me, you can just download this little app
that is available on Windows and phones and everything else, and you can try it for yourself. And you
will see, ‘cause it measures the latency, and you will see all these numbers are, actually, absolutely
correct. In fact, you can start to see how good your ISP’s are and what you’re getting.
So now, what is the question? The point is that when the packet leaves the device, you want the response right
away. Any time spent on the network is wasted time, right? So what do you
do? You say, “Well, we’re gonna build all these massive datacenters all over the world,”
which is what we’re doing—and not just us, everybody is doing that. You saw a much nicer picture of
this—I just sort of brought it up—just a minute ago, from the speaker before. But really, we’re
building all these big datacenters all over the world so that you can get to these places, from wherever
you are, very quickly, right? But is this really enough of a solution? My assertion is that it’s not,
because the cost of building these datacenters is very large. The cost of both building them
and operating them and—you know—keeping the equipment going and everything else is very, very large.
So what’s another alternative? Another alternative is to build lots of these, but build small
versions of these datacenters, right? So, for example, we have a team called Bing Edge which kind
of does that, and then the idea is that you spread thousands of these on the internet, and then
wherever you are on the network in the world, you can get to those guys fast, right? And then once you
get to those, you can, at that point, decide what you want to do with the packet. Do you want to
compute there, or do you want to send it back through better networking from there
to your mega-datacenters? And so, systematically, you break up this really complex
problem into its pieces and start to approach each of these pieces individually. ‘Kay? And so
then, like I said, you tunnel from these micro-datacenters—or cloudlets, as they’re called in
the literature.
So these cloudlets, what are these? They’re capable of doing a whole lot. In
today’s world, they’re really an SSL termination point, and they’re a proxy for
MS services or any other services that Microsoft runs. So for example, you’ll say, “Hit the cloud,” and
it’ll terminate the TCP connection there and then have a persistent TCP connection from there … from
the edge to the cloud. And so that way, if there are losses in the TCP connection, the
losses themselves don’t get eliminated, but the delay that happens because of the losses gets
mitigated. But then, you can also do a lot of edge computing stuff—this is the sort of new thing. You can
use these micro-DC’s to do resilient routing, so you know, if one path is not going well, you can
create—imagine creating—an overlay network on top of that and start to send packets on that network;
you can keep doing service and internet monitoring—you can proactively see if something is
going bad or good, and so you can react to that; and then you can do a lot of things
for mobile computing, and I’ll give you an example of that.
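For a feel of the split-connection idea, here is a minimal relay sketch (plain TCP only, with a placeholder upstream address; a real cloudlet proxy would also terminate TLS and pool upstream connections):
    # Split-TCP sketch: terminate the client's connection at the edge, relay over
    # a separate connection to the datacenter, so loss recovery happens on the
    # short client-edge hop. UPSTREAM is a hypothetical endpoint.
    import socket, threading

    UPSTREAM = ("datacenter.example.net", 443)

    def pump(src, dst):
        while chunk := src.recv(65536):   # empty bytes = peer closed
            dst.sendall(chunk)
        dst.close()

    def serve(listen_port=8443):
        srv = socket.create_server(("", listen_port))
        while True:
            client, _ = srv.accept()                       # short-RTT hop ends here
            upstream = socket.create_connection(UPSTREAM)  # long-haul leg
            threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pump, args=(upstream, client), daemon=True).start()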
So here’s one benefit which I think is fairly significant and you will appreciate. I
don’t know how many of you know, but when the iPhone first came out several years
ago, there were lots of these news articles which said that it brought down the
networks in New York. Does anybody remember any of this stuff at all?
>>: [indiscernible]
>> Victor Bahl: You remember it. So you guys … okay—I guess bad news doesn’t last long. But
nevertheless, there was this big thing, which is that the networks were just brought down. And the
question was: what was going on there? Why did the networks go down? Well, the way the networks
went down is because everybody—iPhone and all smart phones—are trying to save battery, right?
That’s sort of the most important thing—and so all our traffic is very bursty. So when you go to a website, you send a
bunch of packets, a bunch of packets come back, and you’re done. Then you watch the page or whatever. So
really, what the iPhone guys were doing at the time was: they would put the radio on low power mode;
then, when the data was there to transmit or receive, they would put it in high power mode, transmit the
data, and as soon as it was done, bring it down to low power mode—makes sense. But really, that
didn’t really work well with the telcos, ‘cause what was happening was that the whole telco business is
based on circuit-switched networking, really—they come from that part of the world—so every time you set
up a connection, a whole set of resources is set up from your device right across the entire network to
the point where it goes out to the internet. And so every time you bring the radio down, a signal is sent—
bring down the resources—so now, as you are
sending data back and forth, these signals are going back and forth, right? And as these signals are
going back and forth, the network gets overloaded and can’t deal with it. Alright? So they
figured this out, and then said, “Okay, you can’t do that anymore.” So what you have to do now is: you
have to … when you’re done with the data, you can bring the radio down to lower power—from one point six
watts to one watt—but you can’t bring it all the way down for an additional n seconds, okay? If you don’t
bring it down, then you can absorb any perturbation, and you sort of solve this signaling problem a little bit,
but the energy’s now lost.
Well, so that’s the data here. So LTE consumes about one point six watts, but the chip is
active for approximately ten seconds at one watt, doing nothing, alright? Now
imagine if you have this cloudlet—and a cloudlet is a proxy for your phone—so now the cloudlet
holds onto your connection, but you tell the cloudlet, “I’m done;” you bring your radio down; the
cloudlet holds onto your connection, which is on the wide-area side; and you can—you know—things are
okay. So now, if you do just a little bit of calculation, you find that it’s one point six watts—that’s here—
times whatever speedup you get—because now you’re sending to the cloudlet, or mDC,
you’re not sending it to the big cloud, so that’s the speedup—plus one watt times nine seconds—that’s
the energy you save, because now your radio’s down sooner—and you can see this in the packets.
That gives you a saving of ten point six joules. This is an amazingly significant saving, because if you have
twenty network transfers in, let’s say, one hour—y’know, they may be e-mail notifications, et
cetera—then you can save up to twenty-six percent of your battery, ‘kay? Now, no matter what—I
mean, if you’re in the field, you already know that energy management is amazingly hard—this kind of
saving is just incredible, right? And if you let that whole thing keep going—if you
increase the number of transactions per hour—you start to see how
much saving you can get from this little thing, alright? All because you now have a cloudlet at
the edge.
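The back-of-envelope energy math works out as follows; the wattages are the ones quoted in the talk, while the one-second speedup and the battery capacity are assumptions for illustration:
    # Energy saved per transfer when a cloudlet lets the radio drop early.
    ACTIVE_W, TAIL_W = 1.6, 1.0   # radio power transmitting vs. in the idle tail
    TAIL_S = 9.0                  # tail seconds avoided
    SPEEDUP_S = 1.0               # assumed seconds shaved off the transfer itself

    saving_j = ACTIVE_W * SPEEDUP_S + TAIL_W * TAIL_S
    print(saving_j)               # -> 10.6 joules, the figure from the talk

    # 20 transfers/hour over a day, against an assumed ~20 kJ phone battery:
    daily_j = saving_j * 20 * 24
    print(daily_j, daily_j / 20_000)  # ~5.1 kJ, roughly a quarter of the battery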
Now let me show you … so this is pretty good. Let me show you one more application, and then I’ll be
done. So in this particular application—you know, we talk about new services, and I talked about
edge computing and things like that—so let’s take something like object recognition. You all heard
about augmented reality; you’ve probably worked on that and object recognition and things like that. So
for all these sorts of applications, the building blocks are similar. Let’s take face
recognition in this case. You detect a face; you align it; you then extract the features from the
face; and you match those features to some database which will tell you whose face it is,
right? Now, if you do a lot of this detection, for example, just on the client, you take a lot of time here.
But if you send it to the cloud—or this cloudlet that I’m talking about—then you serve it in less time. But
what is interesting is this graph, okay? So what this graph is showing is: on the horizontal axis is the
time spent on computation; on the vertical axis is accuracy; n is the number of features that have been
extracted. So now, depending on how much computation time you have, that’s how many features you can
extract, and that’ll give you the amount of accuracy you get. And this is true for, pretty much,
everything that we do with imaging, or vision, or even speech, and even search. The graphs look very
similar—really. The more computation time you have, the more features you can extract,
the more accuracy you can get, okay? Which means that every second—every millisecond—that
you’re using on the network is wasted—completely a waste. So all of us—all our companies, you know,
Microsoft in particular—are very, very deeply trying to fix that problem, okay?
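The latency-budget argument can be sketched like this, with a made-up saturating curve standing in for the accuracy-versus-computation graphs on the slide:
    # With a fixed response deadline, network round-trip time comes straight out
    # of the compute budget: fewer features, lower accuracy. The curve is a toy.
    import math

    def accuracy(compute_ms, k=0.02):
        return 1.0 - math.exp(-k * max(compute_ms, 0.0))

    DEADLINE_MS = 150.0
    for rtt_ms in (5, 75, 120):
        budget = DEADLINE_MS - rtt_ms
        print(f"RTT {rtt_ms:3d} ms -> {budget:3.0f} ms compute -> accuracy ~{accuracy(budget):.2f}")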
Let me show you what it means viscerally. So here’s a little thing that my intern, actually, put together,
and what you see is—as you set it up—you see a video of her holding a device, and she is taking a video
of people moving, and they’re just moving at regular speeds. And the system is using the best face
recognition algorithm that we have—the state-of-the-art face recognition algorithms—to see if it
can detect faces. And this is gonna be very fast, so you will see how it goes—hopefully it’ll catch it. So
there are seven faces in here, okay? You will see there’s seven people walking by—right—and we are
streaming the video and doing face recognition on them. So now, when you’re able to send this
data to a computing device five milliseconds away, the system is actually able to recognize all
the seven faces. Now let’s say you put it seventy-five to eighty milliseconds away, the system is gonna
try to do the same thing, but now, it recognizes only four of the seven faces, okay? Because it just
can’t keep up, and the time that is spent on the latency is lost. So now, whichever company—whichever
one of us—actually licks this problem, obviously, you will go towards that company, because
even if we all have the best minds working on this, you really solve the problem by
fixing the latency issue, alright?
So there’s a whole lot of stuff you can do on the edge. I just … this is a laundry list of things you can do.
A lot of these are open problems, by the way—none of these things are fully solved, but
we are working actively on it. I talked about caching, and caching is—you know—you just … if you have
the data, you just send it. SSL termination: you can save a lot of bandwidth too, because you can do a
proxy, and there are ideas here. I’ll not have time to go through all this stuff, but there’s a lot of stuff
you can do at the edge.
Now, where do you put these cloudlets? Here’s just a couple of small ideas. In the wireless
space, the way things work is you’ve got these dumb access points, and then you have a wireless
switch. That’s how the wireless is laid out here; that’s how wireless is laid out pretty much in every part
of the world, everywhere. Right? So the idea here would be that you would attach an mDC—or a
cloudlet—right next to a switch. Once you do that, all the packets that are going in hit this guy, and then
it can decide whether traffic goes on to the cloud or not—I mean, it can do its own thing. So that’s one
way to deploy it, and you can go hit every enterprise, every hot spot—pretty much—every hotel,
everywhere you want to be, okay? The same thing … if you think about cellular networks, you can do
the same thing. Cellular networks are evolving towards what are called small cells now, and small cells
basically have to do with capacity. You do the same thing: you’ve got small cells showing up, and you put
an mDC there—or a cloudlet there—and it’ll do the same thing.
Now—you know—the concept of cloudlets was initiated by us, but we see now that a lot of
the technical leaders in different parts of the world are also starting to talk about it. So I
just briefly checked, and I found that—you know—here are some quotes about people pretty much
saying the same thing here. So as you can see, in this thing, you’ve got some devices, and then you
got a little device here, sitting on the edge. This is from whoever—probably somebody at Intel—and
this is from Huawei; they also talked about it—they actually used the word cloudlet here—they’ve got base
stations, and every base station has a cloudlet which talks to a cloud. This
is from—I think—Nokia Siemens; they show a whole stack here—that’s their stack—
sitting right here at the edge as well. So this idea—this concept—has got lots of
legs here, and there’s plenty of literature—both from Microsoft Research as well as other researchers—
which show that this leads to a tremendous amount of improvements.
Okay, so I’ll leave you with this thought. I think—in my worldview—we have focused
very heavily on these large mega-datacenters, right? And what I’m sort of talking about now is creating
thousands of these micro-datacenters at the edges. Now, for a company like Microsoft, we have
to now think about this as a full resource. So just as we look at optimizing the heck out of these guys,
we have to now optimize everything. So if you put forty servers per rack here, and you’ve got a
thousand of these, you’ve got forty thousand servers that you have to optimize. So now, you think about
things like IOT—or internet of things. Internet of things really is about getting all the data, aggregating
it, analyzing it, doing something with it. Are you gonna bring all your data here? Or are you gonna just
keep it here, analyze it, and send the results back here? So now, you think about distributed analytics; you
think about Hadoop, MapReduce—those sorts of things—and say, “How do I think of this entire
infrastructure? Because now, the network is in the middle, and I have to take that into account when I
do job placement and scheduling.” So as—you know, like I said—as you think about the cloud, and you
think about disaggregated cloud, you start to see that a lot of the problems that we have faced in
the past are coming back. So that’s the short talk. [applause]
>> Dennis Gannon: Fantastic, Victor, and it’s good to see the work you’re doing on trying to save on
battery life as well as improve performance. So let’s hope we get there. So questions from the
audience please.
>> Victor Bahl: Yeah.
>>: So one of the main things with your big datacenters is you can have a few staff members attending a very
large number of machines. Once you go to these mini-datacenters, they are distributed; you have
maybe a few thousand machines in one place, a few thousand other machines in another place. That
economy eventually disappears. The …
>> Victor Bahl: Well … so I thought the way you were going with this was you were saying that it’s a huge
management headache for a company like Microsoft to be able to manage, ‘cause even for us—and I
think what you’re asking is: you’ve got these scales, and you can have these cost benefits, but then
you’ve got this disaggregated system. So I tell everybody here internally that the world sells the
cloud—we, too, sell the cloud—as computes and storage, right? What I want to see is us selling the
cloud as computes, storage, and latency. I want to be able to give you guarantees that we can not only
give you the compute, but we can perform faster. So now, from that perspective, I think the cost gets
amortized as you start to build this out, and—you know—as you build the technologies out in a
disaggregated way, it should be okay. It shouldn’t be a problem, because—you know—it’s still the same thing.
>>: I was wondering how you see this playing in—if at all—to developing countries and—you know—
providing services there, where energy is a bigger issue, as well as is network space.
>> Victor Bahl: I think, for developing countries, it’s actually a great solution, because when you think
about … when we think about developing countries, we actually think about things like—you know—
electricity [laughs] not being there, or connectivity. But if you have these micro-DC’s, you can actually start
by building out these micro-DC infrastructures, right? They are smaller; they—you know—they can be
even one rack or two racks, maybe have a little—you know—some power. You can actually deal with
that, so this is a good build-up strategy. And when you find that the micro-DC’s
are no longer capable of handling it, you start to invest more. So to me, this is actually the right way to go
in some of the developing countries, ‘cause you can do … you can reach out to a lot of people with this
sort of stuff very fast.
>>: What types of locations would you put them in? Should they go in, like, an office building or just a
room somewhere, or …?
>> Victor Bahl: Right. So this is a slightly long answer, but I … the two examples I gave you: I picked
easier examples, because that was a case where you have an enterprise, and you already have power
and you already have this thing. But I think what we have to do—in all honesty—is we have to
figure out what reason a mom-and-pop ISP would have to deploy this. So the traditional pitch
is that we can save you the upstream bandwidth cost by caching a lot of data.
Once we’re able to add that value, then we suspect that all these ISP’s that exist
out there will find the rack space and power for it. So the onus is sort of on us to
figure out what the value proposition is for these individuals.
>> Dennis Gannon: Okay, I think we’ll move on. So thank you, Victor.
>> Victor Bahl: ‘Kay, thank you.
>> Dennis Gannon: And it’s a real pleasure to introduce Albert Greenberg, who was a researcher here in MSR, but has
moved across into the Azure team. Victor talked about what happens at the edge, and Albert’s gonna
talk about, really, what’s happening inside our datacenters. So over to you, Albert.
>> Albert Greenberg: Great, thanks. So actually, I’m gonna talk about the work of about a hundred
and sixty engineers, who are in the networking team, distributed between Redmond and Dublin and
Silicon Valley, and what we’ve done over the last three years or so. In summary, we filled up
the network in the houses that David built—and these are houses of glass and copper … lots and lots of
glass—for our networks. But we’ve changed everything—I mean, there’s nothing—really, not
a plug or a wire or anything—that hasn’t been rethought. And that’s one of the messages of
the cloud: the scope for innovation is just so enormous. And—like you could see in Victor’s
remarks—you know, what seems like just a percent—you know—one of the memes you get
from some computer science talks is: if it’s not ten X, it’s not worth doing. But in the cloud,
it’s not so, right? Because you get just this incredible multiplier from five … sorry, five percent. If you
can save five percent or twenty-five percent or whatever in the power, it’s amazing. That
doesn’t mean we’re not aiming for ten X, but the small gains have terrific impact.
So I want to give you a little bit of Azure-by-the-numbers, just so you get a feeling for how we grew
and what scale means. So now, physically, you have to just get a—you know—a sense of: what is
scale here? And then, I want to talk a little bit about what these numbers mean for the network, and—you
know—there are companies being purchased for a billion dollars and this and that. So you probably
heard of software-defined networks: I want to tell you kind of what that’s about—what it really means
besides “it’s cool.” And software-defined storage.
So this is a bit of a dated picture of the footprint of Windows Azure’s mega-datacenters. Let’s see …
twenty-five datacenters—and here, I’m not showing the edge that Victor talked about—a billion
customers, twenty million businesses, and seventy-five markets. Again—I guess—just get a feel
for: what is scale? You know? Maybe—you know—a hundred developers sounds like a lot of people,
but if you look at this kind of growth, it’s amazing: you cannot assign a developer to a cluster or a
datacenter or whatever—we’re building like crazy. So in 2010, we had a hundred thousand
compute instances—VM’s—and now we have millions. As David said, we also have millions of hosts.
In terms of storage, we went from tens of petabytes to exabytes over those years. And in
network capacity, we went from a kind of skinny traditional network of tens of terabits per second
to the petabit-per-second networks that we build now in the datacenter.
So what does it mean if you want to—you know—if you’re an e-science guy, and … or if you want to
connect to Azure and run your workloads on us—or you’re actually a big data guy, or whatever it is, you
know—what does it mean for you? Really, what cloud providers like Azure provide is the ability
to create, in our environment—from a networking guy’s point of view—whatever virtual
network services you want. So we allow you to—first of all, with high speed; I won’t be able to talk
about this—extend your private network into Azure in a convenient way—click, boom, and you’re into
Azure—and, as well, you don’t have to rewrite your network architecture: you can, in particular, use
the same IP address space, the same kind of network planning tools and management systems that
you’re already using, just extended into Azure. So one way to say it to enterprises is that when they go
to build a branch office or a hub office, they can just extend it into the cloud. If you don’t need that,
then you can just, from the beginning, build into the cloud. But you get a virtual network, and in
essence—because of the way we do compute, storage, and networking—you get a virtual datacenter
inside a cloud. Okay.
And then, again, this picture shows that we’ve got an orange enterprise and a red enterprise, and
they’re completely isolated, even though they’re built on shared infrastructure, and they’ll feel like
they’re dedicated to each customer. The orange customer won’t feel the presence of the red customer.
So really, the challenge to all of us who are building in the cloud is just: how do you keep up with
these numbers? How do you support, for example, fifty thousand virtual networks in a single
hundred-thousand-server deployment in a datacenter? And really, the answer is—I mean, honestly—
software, which usually starts with finding the right abstractions and the right separation of
concerns. One of the—you know—sort of natural things for people to talk about is the hardware, and—
kind of—the lay of the land. I talked about it myself in the last few minutes, but the software is actually the
only way you can get to this kind of scale. What’s also interesting is the impact it’s had on the
industry—you know, the industry has put a lot of focus on network hardware, like switches, and
routers, and things like that, or optical networking things, and they certainly have a major role to play—I
could show you some data on how we use them—but the key to this software-defined networking stuff
that you hear about and what everybody is using—all the cloud providers are using—is on the host, on
the server, okay? So it’s not—really, honestly—just about disaggregating a router; it’s about new
innovation on the host and host networking, and that’s the way you get to deal with these huge
numbers that I’m talking about in terms of datacenters and virtual networks. And beyond just the
numbers, by moving networking down to the host—the networking policy down to the host—and you
get the flexibility you need to roll out new features and to debug things. There’s no way we could have
changed everything in networking if we had to wait for a new ASIC to get rev’d—it has to happen in
software on the host, and that’s where it does happen. So if you remember nothing else, the one
line to remember is: in the datacenter, the action is on the host.
So in terms of networking, what does it mean? Think about it: every host
performs all packet actions for its own VM’s, so that means you’ve got this massively distributed
compute engine that works a lot better than if you just try to—you know—build some kind of scale-up
device that does it in the middle. You know, Victor mentioned our software load balancer; we run
the software load balancer on every single host in the datacenter. So it’s really this
incredible scale-out. So it’s a massively distributed system that’s put together through host networking.
And really, the reason it makes sense is: by distributing it this way, a teeny bit of power on each of a
million servers comes together to deal with billions of flows, and if you do the math, then each host only
has to deal with a thousand flows. So you’re at the right place where you can apply per-flow policy.
And so the essence is: well, how do you control that? How do you build software to deal with all those
distributed hosts and push this software-defined networking down to the host?
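The scale-out arithmetic is simple but worth seeing, with illustrative counts matching the ones in the talk:
    # Billions of flows spread over about a million hosts leaves each host a
    # workload small enough for per-flow policy in software.
    flows_total = 1_000_000_000   # "billions of flows"
    hosts = 1_000_000             # "a million servers"
    print(flows_total // hosts)   # -> 1000 flows for each host to manage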
I just … there’s really a lot to say here, and I just grabbed one slide, but—you know—there’s a lot of talk
about—in the industry—OpenFlow. You’ve probably heard of OpenFlow and—you know—are
wondering—kind of—is it technically mature? Is it really a big deal? Is it just a … kind of a hype thing?
What is it? So the essence of it, really, is pretty much that you can get stuff done through flow tables
that provide fairly simple transformations on packets—they move through
pipelines. And this happens in a switch that runs on the host—it’s called a vSwitch, or a virtual machine
switch—and by the way, it’s not necessarily a layer 2 switch; it’s doing a lot of routing. But the main
thing is: this transformation of packets moving through tables can perform all
the networking functionality you need to get stuff done in networks. I won’t say
this is OpenFlow; it’s kind of … it’s similar to the idea of OpenFlow, and a lot of work has been done at
Microsoft to make this kind of idea go really fast and scale to huge numbers. So that’s … this was kind of
the data-path mechanism—that’s all I have time to show you in this little bit, but there’s this thing called
the controller up there. That’s actually coordinating policy and programing all these switches, all
these—you know—there’s a hundred thousand switches in a datacenter. So that thing is not really one
thing; it’s a distributed system. It’s highly available, highly resilient, and this combination of
the virtual machine switch and the controller is what lets you do software control and rapid innovation
in the datacenter network. Okay?
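A toy match-action table in the spirit of the vSwitch pipeline described (OpenFlow-like, and emphatically not Microsoft’s actual implementation):
    # Packets are matched against rules in priority order and transformed; a real
    # pipeline chains many such tables. All names here are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Packet:
        src_ip: str
        dst_ip: str

    @dataclass
    class Rule:
        priority: int
        match: callable    # Packet -> bool
        action: callable   # Packet -> Packet or None (None = drop)

    def run_table(rules, pkt):
        for rule in sorted(rules, key=lambda r: -r.priority):
            if rule.match(pkt):
                return rule.action(pkt)
        return None        # no rule matched: default drop

    # Example: a NAT-style rewrite plus a default-allow rule
    nat = Rule(10, lambda p: p.dst_ip == "10.0.0.5",
               lambda p: Packet(p.src_ip, "192.168.1.5"))
    allow = Rule(0, lambda p: True, lambda p: p)
    print(run_table([nat, allow], Packet("10.0.0.9", "10.0.0.5")).dst_ip)  # 192.168.1.5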
So I thought since a lot of you guys are interested in storage and big data, I’d tell you a little bit about
storage. So just like we have to virtualize compute and virtualize network, we have to virtualize storage.
So there is a software-defined storage industry moving in the datacenter. As it turns out, storage puts a
lot of pressure on the network. Sometimes we use the word East-West—you know—to talk about flows
that go from server to server in the network, and if you think about one write in Azure—and there are
some interesting papers on this that you can find—it generates a lot of traffic all around the cluster, and
the reason is we want to provide amazing durability. Every byte written has to
be quickly accessible from the compute cluster and has to be—you know—incredibly durable, and so
that—you know—you won’t lose data in the cloud. So we copy it. We copy it efficiently using erasure
coding, and so this generates a ton of I/O’s in the cluster—and by the way, it goes over the wide area
network to other clusters, because we want this data to be available even if this cluster is unavailable
because of a fiber cut or what have you. So when you go and do the math of how you build this
thing—that’s one of the fun things about being in the cloud: you can play with any parameter; you can
put more disk, more SSD, a bigger NIC, a bigger switch, and so forth—but it turns out at the end of
the day, to make storage cheaper, you wind up spending more on network—not, you know, in a
reckless way, but in an engineered way. You build a bigger network to deal with the storage.
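To see why durability drives East-West traffic, compare the bytes one write puts on the wire under plain replication versus erasure coding; the 12+4 parameters are illustrative, not Azure’s actual scheme:
    # Bytes moved around the cluster per logical write, under two schemes.
    def bytes_on_network(write_bytes, scheme):
        if scheme == "3x-replication":
            return 3 * write_bytes                 # three full copies
        if scheme == "erasure-12+4":
            data, parity = 12, 4                   # illustrative Reed-Solomon-style split
            return write_bytes * (data + parity) / data
        raise ValueError(scheme)

    mb = 1_000_000
    print(bytes_on_network(100 * mb, "3x-replication") / mb)  # 300 MB on the wire
    print(bytes_on_network(100 * mb, "erasure-12+4") / mb)    # ~133 MB, spread over 16 servers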
One of the interesting things is the resurgence of RDMA in clouds, which, really, a few years ago,
we didn’t think that much about; but really, the great thing about RDMA technologies is you get
incredible offloads from the CPU. I think if you’re coming from e-science, and you’ve used InfiniBand, you’re used to this: all your
networking is offloaded to the hardware, and you’ve got your CPU back to do your computation.
This is amazing, and I’ll show you some numbers about this. So what we’re able to do is—in the
cloud—embrace RDMA; we use a technology called routable RoCE, which is an Ethernet technology that
employs InfiniBand transport over Ethernet, and we’ve got it to route—we’re
working with lots of industry partners on this. So this allows us to basically take stress off of the
servers—these servers are expensive—build a slightly bigger network with the right kind of host support
for offloading—again, host networking—and get a more economical result that meets the needs of the
storage community. So the bottom line is: all the logic is in the host, and now we have software-defined
storage that scales with software-defined network. So does it work? So this is just a little screenshot
that I have that shows that we’re getting line rate with no CPU—zero CPU—using RDMA.
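A back-of-envelope model of the trade being described—the cycles-per-byte and clock figures below are illustrative assumptions, not measured numbers:

```python
# Rough model: how many cores does kernel TCP burn at line rate, versus
# a transport fully offloaded to the NIC? All constants are assumed.
link_gbps = 40                      # assumed NIC line rate
bytes_per_s = link_gbps * 1e9 / 8

tcp_cycles_per_byte = 2.0           # assumed cost of kernel TCP processing
core_hz = 2.5e9                     # assumed core clock

cores_for_tcp = bytes_per_s * tcp_cycles_per_byte / core_hz
print(f"Cores burned on TCP at line rate: {cores_for_tcp:.1f}")
print("Cores burned with full RDMA offload: ~0 (the 'zero CPU' screenshot)")
```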
You know, I didn’t want to just say nothing about physical networks. We want these physical networks
to be so redundant and so high quality that it’s okay if there’s an edge that’s broken here and there, and
packets are dropping here and there. We’ve got lots of capacity. So the problem becomes: how do we
just disable the broken piece and not have a panic or a judgment call? “Hey, should we fix this piece or
keep it available? Do we have enough remaining capacity?” If we can keep the software, and the
monitoring, and the automation, and all that stuff working really, really well, then we have a lot of
capacity in the physical network—like this little Clos diagram shows—so that it’s okay for things to
break. The bullet points here have to do with—and this should be forty at the bottom—the fact that we
can get network policy done—like access control lists, network address
translation, virtual private networks, software load balancing, firewalls, all that stuff, virtual appliances,
every kind of functional virtualization—done in the host, and then have this big, resilient, internal
network that can do things like RDMA, for example—it just needs to be very fast-converging and
powerful. All the policy pieces, the software, the stuff you want to—sort of—add new features within is
in software, and we implement everything as a VM. And we deploy network services just like any other
service.
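A quick sketch of why that redundancy makes it safe to simply disable broken pieces: in a Clos fabric, capacity between any pair of top-of-rack switches is striped across many spines, so losing one spine costs a small, predictable slice. The forty here just echoes the slide’s figure; the link speed is an assumption:

```python
# Capacity degradation in an assumed Clos fabric as spines are disabled.
n_spines = 40       # echoes the "forty" on the slide; topology is assumed
uplink_gbps = 40    # assumed per-spine uplink from each ToR

total = n_spines * uplink_gbps
for failed in range(4):
    remaining = (n_spines - failed) * uplink_gbps
    print(f"{failed} spine(s) disabled: {remaining} Gb/s "
          f"({100 * remaining / total:.1f}% of ToR-to-ToR capacity)")
```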
So let’s see … I’ll just wrap it up at this point. This term SDN: I hope I helped clarify it a little
bit. I showed you where we apply policy to make it happen, and the other thing that’s worth
mentioning is that we are—you know—bringing the same technology to customer premises with our
partners in the OSG team. Okay. [applause]
>> Dennis Gannon: Thank you, Albert. I think that was a really fantastic insight into what
happens inside the datacenters—I certainly found out a lot that I didn’t know about. So I’m sure there
are a few questions. Dennis, do you want to kick off?
>>: I think that was really fantastic. I get asked by people I work with in the Internet2 community, and
they are totally committed to SDN …
>> Albert Greenberg: Yeah.
… throughout the system. When will you be able to actually couple
our SDN clients in the university community directly into Azure? Is that something that is foreseeable? I
mean, they’re usually OpenFlow, pretty much …
>> Albert Greenberg: Right, right.
>>: … but that may not be compatible with what we [inaudible] so …
>> Albert Greenberg: So … there are many layers to that question. [laughter] If you want to build your
own, let’s say, wide area network for Internet2, then yeah, you can use these kinds of ideas to build your
own Internet2, and—you know—we do have an effort to do software-controlled wide area networks,
okay? And that’s maybe one of the things that they want to do. If, instead, the idea is you want to bring
your own hypervisor or something to do the kind of host SDN stuff that I’ve talked about, the cloud
providers don’t provide that kind of bare metal, where you could bring your own hypervisor. Basically,
you bring your own image, and if you want to bring, let’s say, a certain load balancer or something like
that, we want to support that as a service—so you can bring network functions, but we don’t open up
so that you can run on the bare metal in our cloud, and no one, I think, does. To some degree, maybe
the hosters do—if you just buy space, then you can put anything you want in the cage. But we are, I
think, fairly open about it … we publish what we’re doing, so there’s that kind of flow of information,
but in essence, we’re telling people that you don’t have to do that. If you want to set up shop in the
cloud, and you want to do big data, you’re good—just bring the data to us, and we’ll do the heavy
lifting for you.
>> Dennis Gannon: Great. Are there any more questions—yeah?
>>: Well, maybe this is just more of a clarification. So you mentioned RDMA and RoCE and all that
business, and I understand that’s between the server and your storage. Will you be able to get clusters
with high-speed networking?
>> Albert Greenberg: Yeah. So the question was did we do that for storage, and we did. One of the
things that we’re looking at is going broader with RDMA than we’ve done up till now—in cluster size
and also in field of use.
>> Dennis Gannon: Yeah, just clarifying that as well: on Azure, we’ve now got the A8, A9
instances for true HPC, with InfiniBand running Windows HPC, and we’ve seen some benchmarks—
from one of the universities in Switzerland, actually—showing that’s running faster than their local
HPC cluster. So we’re starting to experiment with that, and we do have a service, but
rolling that out more broadly I think is something that, let’s say, product teams are looking at.
>>: So actually … I was under the impression that we didn’t have access to that under a university
researcher’s allocation.
>> Dennis Gannon: Yeah, the Azure awards are for particular types of instances, but I think if you want
to play around with the Windows HPC, talk to us—Dennis. You know, we’re interested to see what
people are looking for, but as I said, I’ve just—literally, just last week—seen some benchmarks from a
university in Switzerland, and they’re looking to just use Azure HPC rather than their local
HPC for a lot of their workloads, so it’s … yeah. Are there any more questions? Okay, well thank you
Albert, that was fantastic. [applause]
>>: Pardon?
>> Dennis Gannon: Jonathan Goldstein—who’s one of our researchers here who did go off into product
land, but came back—is gonna talk about—sort of—e-science. He’s one of our experts in big data. We
had a session yesterday around streaming data, and I think that’s something dear to Jonathan’s heart—
he’s one of the people who sort of came up with the ideas around StreamInsight, which is one of our
products. I’ll just switch the PowerPoint … it’s that one, isn’t it? We’re getting …
>> Jonathan Goldstein: Does this work?
>> Dennis Gannon: Yeah, I think so. Yeah. So … yep, so thanks, Jonathan.
>> Jonathan Goldstein: Hello everybody. So today … so the title of the session was the leading edge of
the cloud, right? So I thought it would be worthwhile to talk about how this relates particularly to
e-science: how we got to where we are, why the gaps that are there exist, which gaps are likely to
close, and which ones we maybe don’t have good answers for yet. So first of all, where is the leading
edge, right?
So the cloud was originally built basically to serve webpages and to solve the search and the online
advertising problem, right? So the cloud—cloud v one, sort of—was basically designed to solve these
two problems. So what does that mean? Well, it means that the cloud was built to handle
human-generated request/response kinds of workloads, where there was some kind of state being
stored in the cloud. So you had, essentially … think of it as billions of devices—which are people—very
slowly entering data into the cloud, which would then be accumulated somehow into some kind of
state and possibly
served back, right? The other aspect of it is that part of solving the problems related to online
advertising and search involved being able to solve very large processing problems that required
massive scale-out. Now, the thing that motivated all of this activity was that we knew in advance that
solving this problem would be worth more than ten billion dollars, right? So having very expensive
infrastructure as a critical thing to solve this problem was okay. It wasn’t a problem. The other aspect is
this sort of ubiquitously-connected queryability and result dissemination—it’s connected to the internet,
right? So everybody can get webpages; everybody can search; everybody can do all of those things. The
other thing is that there was some sense of sort of pay-as-you-go infrastructure—you know, the more
people that connected and used your website, the more expensive it was gonna be to do, but the
assumption was that you were gonna make that money back, and you wanted that sort of scalability as
people used it, the ramping up, to be transparent and easy.
So the question is does this sound like you, right? As e-scientists—right—does this sound like you? So
let’s take apart each of these things separately. The first thing: let’s talk about the human-generated
request/response, right? So science … e-science things tend to have more massive data rates from a
smaller number of sensors, right? So that doesn’t really sound like you guys, right? Another thing
that’s worth saying is that it’s true that—I think—the massive scale-out is something we have in
common, because you have very large datasets that you want to process, but the thing is: there’s an
unknown payoff.
>>: I think it’s pretty well-known. It’s zero. [laughter]
>> Jonathan Goldstein: Well, the hope is that eventually it’ll pay off in some way, which is why NSF
funds it, and—you know—it’ll help society eventually, right? But the point is: you don’t know at the
beginning what the … maybe it’s gonna be big, right? Maybe the payoff’s gonna be gigantic; maybe it’ll
be nothing; maybe it’ll turn out you don’t find anything. You just don’t know, right? The other thing is
this sort of combination of ubiquitous connectivity and the pay-as-you-go part, and if you ask me, I think
the reason you folks are really interested in the cloud, is because of all the yellow stuff. It’s because it’s
connected to everywhere, so everybody can share datasets, can run queries against them, can share
results, can reproduce them—all of that, along with the fact that: oh wow, you know, there’s this
opportunity to get ahold of a supercomputer that we can all share to basically do computation. I
think these are the main reasons, but there are these differences, right?
Let’s dig in a little bit deeper. So let’s talk about the human- versus machine-generated—you know—
aspect of things. There’s—you know—a vast increase in the data rates per source for e-scientists, right?
It’s not somebody clicking on a webpage, it’s some sensor that’s constantly reporting a result—maybe
even video, right? There’s a significant-but-smaller decrease in the total number of sources. You
probably don’t have five billion of these things—one for every person on the planet—you probably
have—you know—anywhere between tens to hundreds of thousands of these things would be my
guess. The other thing worth saying, though, is that the decrease in the number of sources doesn’t
compensate for the increase in data throughput. So frequently, you end up with vast differences in the
data scale. You look at something like the amount of data that the Large Hadron Collider produces—
and there is no business I’m aware of that produces data anywhere near those kinds of quantities. The other
thing is that none of the data sources are part of the datacenter. This turns out to be pretty important,
because you have to get your data into the datacenter somehow in order to operate on it. Usually,
these are sensors that are somewhere in the world, collecting data about something. The aggregate
amount of data that they all produce is pretty large. If you look at the online advertising case, yeah it’s
true that, like, clicks and things like that, and search terms came from the outside, but there’s also a lot
of data that was collected inside the datacenter, itself, that was then logged. So a lot of the data in that
case was actually born inside the datacenter in the first place. In this case, actually none of it is, right?
So—you know—all of this adds up to big … a big data-ingress problem for scientific data, right?
Now, the good news is that there is a solution to this problem—Jim Gray suggested it; to my
knowledge, that’s where I first heard of it, many years ago—which is: you collect the data on some kind of
medium—could be a hard drive, could be whatever the future holds—right? And then the idea is: once
you collect the data and you store it there, you ship it directly to the datacenter and just physically
attach it, right? And the assumption is that you collect it on a medium that’s amenable to later data
processing. That’s pretty important, ’cause copying this data—these are large amounts of data—and
moving it over a network just is not a sensible thing to do.
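The arithmetic behind that suggestion is easy to check; the dataset size, link speed, and shipping time below are illustrative assumptions:

```python
# Ship-the-disks versus the wire, for an assumed 500 TB dataset.
dataset_tb = 500
link_gbps = 10                         # an assumed dedicated 10 Gb/s link
transfer_days = dataset_tb * 1e12 * 8 / (link_gbps * 1e9) / 86400

shipping_days = 2                      # assumed courier time
effective_gbps = dataset_tb * 1e12 * 8 / (shipping_days * 86400) / 1e9

print(f"Over the wire: {transfer_days:.1f} days at {link_gbps} Gb/s")
print(f"By courier:    {shipping_days} days, i.e. ~{effective_gbps:.0f} Gb/s effective")
```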
>>: There’s maybe another alternative. In China, in Beijing, a genomics institute built a datacenter
around the sequencer.
>> Jonathan Goldstein: So I was gonna hold that … yes, you’re right. There is, actually … and this
actually plays in a little bit into what Victor was saying possibly, too, right? Which is that one possibility
is to bring the data to the datacenter; another possibility is to bring the datacenter to the data, right?
So you can kind of go in either direction. Bringing the datacenter to the data has its own set of
problems, but neither of these is super straightforward, so …
>>: Well, Microsoft could build a telescope, a little bit.
>> Jonathan Goldstein: [laughs] Yeah, exactly. [laughter] So let’s go into this a little bit more, right?
About this whole … the nature of … sort of the lucrative aspect of online advertising compared to this
kind of exploratory—you know—aspect of data science. Well, so I just have—you know—four
quadrants here. The top two are the high investment quadrants; the bottom two are the low
investment quadrants; and then the left quadrants are the low return, and the right ones are the
high return. The current cloud computing infrastructure, which was built to solve search and online
advertising, is clearly in the high investment, high return category, right? The problem with e-science is
that, well, we don’t really know: it depends on the activity, and you don’t know in advance exactly
where you are.
Now, to make matters worse, the very high return that we got for online advertising—it was so high; it
was such a valuable problem to solve—that it allowed us to be incredibly wasteful with the resources—
the computing resources—that we used to solve the problem, and time to market ended up being a lot
more important than whether or not you were efficiently using the resources that you had. To get a
grip on all of that: Hadoop can process about one to ten megabytes per node per second,
right? That’s about what Hadoop can do. Now, everybody’s excited about Spark, because Spark’s a lot
better, right? So Spark can do maybe five to fifty megabytes per node per second—that’s a lot better;
that’s a five times increase in throughput, right? We must be doing great, right? Spark must be great,
right? Well, maybe not. By comparison, main-memory databases can process very similar queries at
rates of about one to ten gigabytes per second per node. So we’re talking about a difference in
efficiency of a factor of a hundred to a thousand. It’s huge. It can take a thousand Hadoop nodes to
do what one of these can do in a single node, right? So that’s a level of inefficiency that’s pretty
staggering.
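Working through the numbers just quoted—with an invented ten-terabyte scan as the workload—shows where that factor of a hundred to a thousand comes from:

```python
# Nodes needed to scan 10 TB in one minute at each system's per-node rate.
# The per-node rates are the rough figures from the talk (upper ends).
scan_tb = 10
rates_mb_per_node_s = {
    "Hadoop":       10,       # upper end of ~1-10 MB/s per node
    "Spark":        50,       # upper end of ~5-50 MB/s per node
    "In-memory DB": 10_000,   # upper end of ~1-10 GB/s per node
}

target_seconds = 60  # finish the scan in one minute
for system, mb_s in rates_mb_per_node_s.items():
    nodes = scan_tb * 1e6 / mb_s / target_seconds
    print(f"{system:12s}: ~{nodes:,.0f} nodes")
```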
Now, if you need a hundred thousand machines to solve your online advertising problem, and your
online advertising problem is worth ten billion dollars, you should shrug your shoulders and say, “Oh,
well. I’ll just go ahead and do it anyway, because I know how to scale that up to a hundred thousand
nodes. I don’t know how to scale up that technology to a hundred nodes. So I’ll just pay the price and
oh, well.” Right? But does this sound like a recipe for success for an exploratory data processing task
where you don’t know in advance what the return is going to be? Does it make sense, even, to spend
that money? Right?
And now, it is worth saying that these databases that I’m talking about have limitations, right? They
only allow you to express certain kinds of queries, and I think that, actually, a very rich area of
work—which actually is the area I’m working on right now—is basically saying—you know—how can you
reduce this efficiency gap, while at the same time, not limiting yourself to the expressiveness that these
database engines have and significantly expanding the scope of the kinds of computations that they can
do enough so that they’re useful to e-scientists? Right? There’s a lot of ground to make up here, right?
Now unfortunately, the bad news is that we may ultimately be limited by networking, right?
It’s worth saying that today’s technology for these in-memory DBs is one to ten gigabytes per second, but
with—you know—the new Sandy Bridge architectures and all the improvements in microprocessor and
computer design, we have every reason to believe that in the pretty
near future, the one-to-ten is gonna turn into ten-to-a-hundred. So—you know—ten to a hundred
gigabytes per second: we all feel like we’d be doing really well if we maxed out our ten gigabit NIC—you
know—inside the datacenter. So that’s … so there may be a one-to-two-order-of-magnitude difference
between the throughputs that you can get sending data from machine to machine versus the kind of
bandwidth that you can get between main memory and your microprocessor. So it may be that
networks will fundamentally limit us here.
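The mismatch is easy to see numerically, using the rough figures from the talk:

```python
# Per-node processing rate versus a maxed-out 10 Gb/s NIC.
mem_scan_gb_s = 10        # today's in-memory engines: ~1-10 GB/s per node
future_scan_gb_s = 100    # plausible near future: 10-100 GB/s per node
nic_gb_s = 10e9 / 8 / 1e9 # a 10 Gb/s NIC, in gigabytes per second

print(f"NIC:         {nic_gb_s:.2f} GB/s")
print(f"Memory scan: {mem_scan_gb_s} GB/s  (~{mem_scan_gb_s / nic_gb_s:.0f}x the NIC)")
print(f"Near future: {future_scan_gb_s} GB/s (~{future_scan_gb_s / nic_gb_s:.0f}x the NIC)")
```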
Now, the good news is the ubiquitous—sort of—connectivity and the pay-as-you-go aspect of it, well,
that’s kind of motherhood and apple pie for everybody. So I think, actually, our current datacenters do
a pretty good job there. Now, there’s one other axis of difference. I was just talking to Ed Lassiter
before this, and there’s one other axis of difference that he pointed out that I think is worth mentioning,
which is data longevity, right? Which is to say: in the online advertising problem,
once data was a year old, no one cared about it anymore—it really had no relevance anymore for
search, or the behavior of people, or ads, and all these kinds of things. So really, data just
kind of got thrown away—you know—after a year or so. You know, Ed pointed out that a lot of
scientists want to keep their data for something like fifty years, and as time goes on, there’s maybe
reason to believe that it’ll just sort of continue to be valuable over those fifty years. So that’s also an
important part of that, and my suspicion—I was thinking about that—is that that problem probably
will get solved just because companies are incredibly paranoid about losing data, so
they may, in fact—in this whole service thing—force us to cope with that problem anyway, just because
of their own internal paranoia. So there may be significant market pressures
for us to address that problem. So that is my talk, and if anybody has any questions—in part, this was
designed to spark a conversation, so feel free to jump in, and I want to hear opinions and comments.
>> Dennis Gannon: ‘Kay, thank you, Jon. [applause] So … slightly controversial things in there, I think, so
… [laughs] has anyone got any comments? Does anybody agree or disagree with Jonathan? [laughs]
Tanya …
>>: Actually … I mean, I very much agree with the point that, for a lot of scientists’ data, we cannot build
a datacenter where the data are coming from; we actually have to get … the big problem is getting the
data to the cloud, and with that also comes a trade-off—not only of, sort of, how do you use the cloud,
but do you use it as infrastructure as a service? How much do you still do locally with whatever biggest
computer you can get versus on the cloud? And these trade-offs—I don’t think we’ve explored enough
yet what’s worthwhile when you actually have to pay the costs of getting the data to the
cloud.
>> Jonathan Goldstein: Yeah, it does kind of feel like there’s a big discontinuity there. I can run my
data on my machine, in whatever my version is of my local in-memory DB, and get really good
performance—but once the data reaches a certain size, it starts getting really difficult, you know, or
nonsensical to do that, and at some point you have to sort of bite this bullet, and pay this price, and
say—you know—“Oh, well, I’m going to flip a switch now and put this data up in the cloud.” That’s a
hard switch to flip. I think that’s kind of your point: it doesn’t feel like there’s any sort of continuum—
there’s this hard switch that you have to flip.
>>: Right. It’s also … at least, I don’t yet understand well the trade-offs of—sort of—the
expense of getting the data there, the computational …
>> Jonathan Goldstein: Yeah.
>>: … the infrastructure as a service, the—you know—the computation as a service, and so on—all
these trade-offs are not clear: what should we put in the cloud? Clearly, if we want to
share data, we should put some part of those data in the cloud.
>> Jonathan Goldstein: Yup.
>>: And clearly, if the data are big, you should put some part of the data in the cloud, but at what
point should we add computation—you know—to go with it versus do some of it locally? What are the
trade-offs? As it just …
>> Jonathan Goldstein: Well, I wonder a little bit whether what Victor talked about—you know—
whether his sort of micro-clouds were … could sort of help with things like that, because maybe that can
give you more of a continuum, and maybe if it’s something that can even be on-premise, you can
become part of the cloud or something like that. I don’t know, it’s like … kind sparked some thoughts
for me and … around: maybe the cloud doesn’t have to be this totally monolithic thing where—you
know—you send your data there, and it’s this big thing you have to do. Maybe you can bring it to your
data in pieces as you need to or something like that.
>> Dennis Gannon: Geoffrey.
>> Jonathan Goldstein: Uh huh?
>>: Listening to one of the other talks: why can’t we use SDN and RDMA to link our local data
directly to Azure?
>> Dennis Gannon: I think that’s what Albert was showing. It’s actually something we’ve been doing
with JANET in the UK, where we’ve got now a dedicated ten-gigabit-per-second direct connection into
our Dublin datacenter from JANET. So all the UK universities now—and this is just … I think it just went
live. We haven’t made a big, public announcement, but we’ve been working with JANET for the past
eighteen months on this, where we have a dedicated private network line between the UK academic
network and our Dublin Azure datacenter—ten gigabits per second. And so, again, we are trying to
work with the research community to look at this, and very much—with the Azure for Research
program—we’re now getting towards a hundred and fifty, two hundred projects. It’s interesting:
the hybrid approach of local plus public is certainly something that we’re seeing, and I think, for
Microsoft as an organization, hybrid really is our strategy as a company, and say …
>>: [indiscernible] why can’t we just ask Internet2? I don’t want a dedicated line. Why can’t I ask
Internet2 to take my local—you know—my local computing or my local storage and connect it in this
seamless fashion …
>> Dennis Gannon: Mmhmm?
>>: … towards R using hosts, SDN, and [indiscernible]
>> Dennis Gannon: Why was it that …
>>: What’s wrong with that?
>> Dennis Gannon: So next month is TERENA, which is the international national research [laughs]
network conference, and last year at TERENA, all of the talk was about SDN—all of the national
research networks, not just Internet2, but the other ones, and other US network providers too, like in
California. So I think SDN actually is something
that all of the national research networks are moving towards, and—you know—we are working with
them—SURFnet in the Netherlands, where we have an Amsterdam datacenter, for instance. So I think—
you know—as well, just from you—from the research community—if we know that that’s what you
need, then that helps us to talk to the national research networks. Geoff …
>>: So are we making … are we fooling ourselves a little bit by focusing on these dense datasets? You
know, like the National Ignition Facility generates a terabyte—you know—in some number of
nanoseconds.
>> Dennis Gannon: Mmhmm?
>>: Okay, well obviously, the networks can’t handle that, but a big dataset—at least, in my terms—are
the satellite data—you know, satellite images. Well—you know—we get the data down from the
satellite, right? [laughter]
>>: Really good …
>>: And so—you know—if we can get the data down to the satellite, it shouldn’t be a problem to get
those data onto a disk somewhere, right?
>> Jonathan Goldstein: So the question is: what kind of throughput are we talking about?
>>: We’re talking about—you know—sort of hundred megabit per second, sort of …
>> Jonathan Goldstein: Hundred megabit per second? So like ten megabytes per second or something
like that …
>>: Yeah.
>> Jonathan Goldstein: So yeah, that seems to me like something you could do.
>>: Yeah, the thing is that it’s—you know—it does go on …
>> Jonathan Goldstein: Yeah.
>>: twenty-four seven.
>> Jonathan Goldstein: Yeah.
>>: Right? But other … on the other hand—you know—we have communication satellites, data comes
down …
>> Jonathan Goldstein: Look, but you know what I think frequently happens?
>>: … and we’ve got people that are reading it with this low latency—you know—build their own
receiving station.
>> Jonathan Goldstein: So there’s sort of two comments. The first is that I think part of the problem
that people face when they have things like that is that everybody knows that satellites are expensive,
and everybody knows that launching satellites is expensive. So when they budget these things,
they say—you know—“Oh, well I have this much money to sort of launch my satellite, and develop my
satellite, and all of that.” They don’t necessarily budget in all the things that people are going to want to
do with the data once they have it, right? So—you know—the question is: if you want to keep this data
for fifty years, and you want to allow somebody to run queries against it, part of it is this collections
thing—this continuous collection thing—but then, even after you collected it, somebody has to
continuously pay to curate it and keep it available for everybody so they can essentially answer
questions with it. And that’s an ongoing cost that never really ends, and it needs to be … the question
is: who pays for it? Does the person who puts the satellite up there pay for it? Does the person who
runs the computation task pay for it? Because if it’s the person who runs the computation task, for
instance, then okay, now who pays for data curation? So there has to be some kind of business story
around: it’s worth it for me to curate this data, because I’m confident that people are going to come to
it, and ask questions, which I can then charge them to produce answers for or something like that. And
…
>>: Well, but … I guess there’s … so first of all, there is an archive that is paid for by the taxpayers.
>> Jonathan Goldstein: Mmhmm? So that archive … so my point is that if that archive is a bunch of
offline tapes, for instance, then it’s not really very useful in that form. If I’m gonna run a query against it
… now, suppose I’m gonna get that data and run a query against it—and it’s pretty large; it may only be
ten megabytes per second, but now, accumulate that over ten years, right? That’s actually, like, a fairly
significant amount of data.
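That accumulation is easy to work out:

```python
# Ten megabytes per second, sustained for ten years.
mb_per_s = 10
seconds_per_year = 365 * 24 * 3600
years = 10
total_pb = mb_per_s * seconds_per_year * years / 1e9  # MB -> PB

print(f"{mb_per_s} MB/s for {years} years ≈ {total_pb:.1f} PB")
```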
>>: Well, I guess one of the issues, though, is it’s somewhat like a model computation in that it
is recoverable, alright?
>> Jonathan Goldstein: Mmhmm.
>>: So years ago, during the Sequoia 2000 project, Joe Pasquale kind of looked at this issue of—you
know—there’s a cost to store; there’s a cost to compute; in this case, there’s a cost to retrieving it …
>> Jonathan Goldstein: Yup.
>>: … from another archive. And based on usage and some cost trade-offs, you can kind of decide when
you can throw data away.
>> Jonathan Goldstein: Mmhmm?
>>: Because if, in principle, it’s recoverable …
>> Jonathan Goldstein: Right, and in the end, I think it all goes back to economics, right? I mean, you
hold onto the data in queryable form if you think it’s gonna be valuable enough to make it worth your
while to hold onto it and then disseminate it for tasks where you think that the expected return
warrants the cost that you’re going to pay—you know—or something like that, right? And it is fuzzy,
because it’s science, and
we don’t always know … we don’t know what the outcome is going to be in advance, right? But these
are, like, very critical … these are very critical problems we have to solve—you know—as part of solving
this larger problem. We have to really think through this stuff very carefully: about—you know—who
curates the data and why, and make sure they’re motivated. In what form is the data? Is it very cold?
Is it warmer? Do I have it spinning on disks somewhere, ready to query, just in case? I mean, somebody
needs to decide all these things, and you can’t divorce it from the economics—I guess that’s partly
what I’m saying.
>>: Well, no. I mean, you’re grounded by the [indiscernible]
>> Jonathan Goldstein: Yeah, yeah.
>> Dennis Gannon: I think that’s one of the things that the Research Data Alliance is trying to grapple
with …
>>: Mmhmm?
>> Dennis Gannon: … globally, right across all the disciplines, and a lot of these questions are common
across GSI instances, and all of them, and so …
>> Jonathan Goldstein: Absolutely. In—sort of—version one of these clouds, actually, the economics
weren’t aligned—I think—with what scientists would like, and I think that’s changing. There’s enough
motivation to go after problems that aren’t just ten-billion-dollar problems that efficiency is mattering
now—I mean, you saw the number of customers we are supporting in Azure, and I guarantee you
there’s ten times more than that that we want to support, and we won’t be able to lure them, you
know, onto our cloud without
addressing these efficiency problems. But the original problem that motivated our cloud design was one
that had such a high level of return that it allowed us to not worry about efficiency to a pretty large
degree.
>> Dennis Gannon: I think that I’m just gonna … I think there’s a lot of interesting discussion here that
would be good to continue over lunch. So—I don’t know—Geoffrey, did you have a quick one, or is it …
is this a lunch discussion? [laughter]
>>: I’ll make this a lunch question.
>> Dennis Gannon: Or in the bar afterwards.
>>: I was at some meeting about scientific networks, and I asked the people there: as far as I can see,
scientific data is naturally a minor fraction of the total amount of data, and so why on Earth do we
need special resources, because we’re within some fluctuations? Therefore, it ought to be possible not
to have special programs and just solve it on top of the commercial infrastructure. And they did not
like my question. Nor did they actually answer my question. [laughter]
>>: But actually, I saw the numbers, and it’s exactly the other way around. There is more scientific
data—in sheer numbers—produced today than …
>> Dennis Gannon: So I think this is …
>>: You’ve lived though, right?
>>: But that’s not the point.
>>: When we want to make scientific data look really big, we cite about three
or four projects, and when we want to make it look small, we cite the other ninety-nine point nine
percent of the projects, or the aggregate, right? So …
>>: The total amount of data is meant to be four zettabytes, and the LHC is tens or hundreds of
petabytes—it’s a minor fraction.
>> Dennis Gannon: But I think …
>>: [indiscernible] saying it even stronger. Even if you grant that some of these projects
are producing a lot, you ignore those three or four projects—they’re gonna do their own thing anyway.
I sort of agree that the rest of the projects could be handled in the
noise, perhaps.
>> Jonathan Goldstein: But I think …
>> Dennis Gannon: Dennis, do you want to wrap up?
>>: I just want to say Facebook delivers more photographic images to people in a day than LHC has
generated in its entire lifetime.
>> Jonathan Goldstein: But I think …
>>: There is far more nonscientific data than there is scientific data.
>> Jonathan Goldstein: But the total amount of data’s a lot less important than the variety of tasks—
and the value of those tasks—that people want to do with the data, right?
>>: That doesn’t affect the networking. I mean, the network is a network; it doesn’t care about variety.
>>: Depends on how much you do with it.
>>: It does, it does too.
>> Dennis Gannon: I think this definitely sounds like a lunchtime discussion. [laughter] So thanks to all
the speakers in the panel, and we’ll reconvene in half an hour. [applause]