>>: I have the pleasure of introducing Dan Reed, who's Microsoft's corporate vice
president for technology, strategy and policy in the extreme computing group. I'm not
going to read everything on the slide here. It's a very long bio that Dan has.
Prior to Microsoft, Dan worked as a professor at UNC Chapel Hill and was
director of RENCI. It was actually at RENCI that I first met Dan, when he was
co-chairing the 2007 eScience Workshop with Tony Hey. As was mentioned yesterday, I
think by Dave Patterson, Dan served as a member of the President's Council of Advisors
on Science and Technology and was the lead author on the report "Computational
Science: Ensuring America's Competitiveness."
He also served two terms as chair of the CRA, the Computing Research Association, where
he had numerous responsibilities. He was also, previously, the director of the National Center
for Supercomputing Applications and was the principal investigator and chief architect
for the NSF TeraGrid.
Dan also served as department chair of computer science at the University of Illinois, and
when I was e-mailing Dan while he was in Singapore this week about what I might say, I
was wondering how far back to go, and I discovered I could go back 14 years, because
we have here, at this conference, one of his former students from the Ph.D.
program there. I'd like to briefly introduce Fabio to say a couple of words about Dan as a
professor.
>> Fabio Kon: Dan doesn't know anything about that. He probably doesn't even
remember me, because I was doing my Ph.D. with Professor Roy Campbell at the
University of Illinois, and I remember that everybody knew that Dan was a great
researcher. It was announced that Dan would give a class on performance
analysis, and I was very interested in that topic, so I enrolled in that class, and it was the
most difficult class that I took in my life. I had to study like crazy, but I remember how we
enjoyed the class, all the students. On the days we had a class, we woke
up in the morning happier, waiting for that moment when we would be in the classroom
with Dan, having the pleasure of feeling the new connections forming in our neurons
while he was talking.
So it was a very enlightening experience. And just after that, he became the head of the
department of computer science, and it was a big change for the department: the
department got the funds to create a new building, and that very significantly changed the
infrastructure that the department has.
And just taking that class with Dan influenced my whole career after that, I think. When I
advise my students and when I teach, everything that I learned in that one semester, I try to
pass on to my students. So he didn't know that, and I bet there are hundreds of other
students like me who were influenced by him.
So just wanted to say that.
[applause].
>>: So it's my pleasure to introduce our key note speaker, Dan Reed.
>> Dan Reed: So I do have to now tell a story about Fabio, because I do remember.
Any of you who have been to Illinois in the winter know that it gets cold there. And him
being from Brazil, I met him walking into class one morning, and it was October, and the
frost had just started to come and the chill and he said, it's really cold here. Is it going to
get colder? And I said, I've got really bad news. It's gonna get a lot colder.
So thanks for the introduction. I want to spend some time talking about several issues
around clouds. A bit of a cloudy forecast: some about what I think are the
technology trends that are driving the interest in clouds, some issues around what I think
are the opportunities to think expansively about the future of computing and some sea
changes that have really happened because of technology changes. Some about client
plus cloud experiences, an attempt at perhaps horizon expansion -- and so in some
sense, that's a discussion about what I view as application
research challenges -- and then some about the infrastructure and system
software challenges around clouds as well.
So if we sort of look back over the last, oh, 50 or 60 years and think about what's
happened in computing, I think the key lesson of modern digital computing has been
increasing democratization. We moved from a world where computing was rare and
expensive to one where it's common and inexpensive. And yet, in many cases, I think
our psychology is still dominated by an implicit belief that computing is expensive and
that we should optimize computing rather than optimizing the human experience.
So the notion of what 21st century computing might look like I think is really a story of
increasingly natural interfacing and I'll talk some about that, about the fact that
intelligence is increasingly embedded in everyday objects, and some of the pressures
and challenges that exist around ubiquitous network access and communication.
I just came back from Singapore, where I gave a keynote at the dynamic spectrum conference,
and where we were talking about many of the issues around worldwide spectrum allocations
and the ability to use cognitive radio to provide additional wireless bandwidth for the vast
array of interconnected devices -- what it means to enable these natural interfaces -- so I
want to talk some about these things.
But given our psychology, it's easy to forget that so many things were true not very long
ago. The fact that e-mail was rare. That spam was mostly stuff that you ate, as opposed
to things you worried about in your inbox. That phishing was still largely something
you did with a pole rather than attempting to steal identities. That Wi-Fi and cellular were
rare and that friends were mostly people you actually knew, as opposed to social network
friends.
So how we think about these social changes has big impact on how we embrace
technology and the ways we think about building the enabling components that will make
these new things happen.
So I think part of our challenge as researchers, whether in application domains or in
technological domains, is to think expansively about the art of the possible. And as I said,
I think often our language and culture constrain many of those things, so I want to
challenge some of those opportunities and issues as we go along.
So let's think about what's changed over the last few years. We've got powerful system
on a chip designs. That's really what's allowed us to build mobile devices of a wide
variety of types. GPUs that provide graphics performance and near photo realistic
experiences that were unknown, even a handful of years ago.
The huge amount of explosive data growth that's, itself, a consequence of CMOS
technology and that has had some profound social effects on science as well. The fact
that we've moved from a world where data was largely rare and advantage often
accrued to people who had unique instruments, to the broad availability of data and the
social change that says, you can do better science than me, not because you have better
infrastructure, but because you ask better questions. Because we all have access to the
same data.
And then a whole set of issues around, as I said, computing embedded in everyday
objects, the issues I just mentioned about wireless spectrum pressure and then all of the
opportunities that we've been here to talk about, about new software models of clients
plus clouds and how those things interact.
I wanted to hark back, though, as a prelude to these discussions, to some observations
that Jim Gray made years ago. I believe the challenges we face in this decade and in
future decades are increasingly multidisciplinary. Regardless of the domain in which we
work, it's important to think about how we build tools and infrastructure that allow people
to bridge disciplines. Because the problems often lie at the intersections of those
disciplines. And the cultures, in fact, are often our biggest obstacle, rather than the
technology. Years ago, I gave an opening talk at a meeting that was looking at the future
of technology to support building structure design, and I was staying in a hotel and the
meeting was a nearby building. I walked out of the hotel, walked to the meeting and
there were two streams of people coming out of the hotel. One group was dressed
very solemnly and seriously, as if they were perhaps going to a state funeral, and another
group of people came out of the same building in t-shirts and tennis shoes and shorts,
looking as if they might be going to the beach.
I was somewhat surprised to see them both entering the same building, the building
where I was going to give this opening talk, and when I entered the conference room
where the meeting was, I realized, in fact, that what I had seen was the group of civil
engineers and the group of computer scientists. I made some observations about the
importance of mixing cultures, and I had shown up at least dressed somewhat like this, in
long pants with a jacket, and I said, I'm here as a bridge builder between communities.
So as we think about these interdisciplinary issues and how technology enables them, as
Jim Gray observed, they are systemic problems and the computation really depends on
this balance about the characteristics of computation, of storage, of networking and of
data access.
And to hark back to my professorial days, when my students used to ask for copies of old
exams to prepare, I always reminded them that the great thing about computing is that
the questions don't change, but the answers do. And so just because you've seen
the old exam doesn't mean that the answers are the same, because the ratios of access
times and costs are really what make some things possible and some things prohibitive.
And as we think about this new generation of client plus cloud, think back, oh, 40, 50
years to the fact that you could have done really great distributed systems research in
1962. One would have had to commandeer the computing services of multiple
universities and perhaps negotiate with the telecom carriers to acquire some leased 56kb
lines for wide area communication. But there's nothing intellectual that would have prevented
one from having done that. It was entirely technically possible. It may have been
politically impossible or economically infeasible, but it was intellectually possible. And
yet, very little of it happened, because of those issues.
And so those ratios actually do change what we think about. Part of what I think is our
opportunity and our challenge is to look forward and say which of the things that are true
now continue to be true, and which of the things that are false now are feasible and
possible in the future -- and skate to the future, rather than just the present.
So in that spirit, let me cast a few scenarios to think about before I talk about applications
and infrastructure. So imagine some possible futures. One where your car drives and
navigates for you and maybe even parks. And that's increasingly a feature of many
vehicles. What does this rely on? It relies on embedded intelligence, sensors, broad
knowledge about traffic flows, and the broad set of issues related to urban planning and
urban congestion.
Think about a sound system that only plays the music you love, because it knows every
song you've ever heard. Again, an understanding of social networks and sensors and
interactions.
Imagine a phone that only rings when you want it to, because it not only knows what
you're doing, but it knows the context in which you're working and maybe even knows
your emotional state. That this may not be the best time for you to have this conversation
with this person.
All of your family memories are recorded automatically. It's entirely possible now, for a
few hundred dollars or a few hundred euros, to record every word you'll speak in your
lifetime.
Your body calls an ambulance when you're ill by virtue of biologically powered implanted
sensors. A little more far-fetched, but you can certainly see many of those things coming.
And your DNA sample and your lifestyle determine the quality of
healthcare you receive, because you really do have personalized medicine that's tailored
specifically to your genotype and phenotype and the environmental context in which
you live.
And your office, well, it's smarter than you and actually adjusts its behavior to your needs
by understanding the context in which you operate. Now, I don't know if any of those
things are going to happen, but I know we're on the verge of making many versions of
these things true -- and other versions that may be, perhaps, less positive.
And so as we think about this interaction between technology and possibility and
applications, social domains and policy, we're really on the vanguard of making a bright
future possible with the right set of opportunities and constraints.
But all of these are examples of something that I think is pretty simple. Successful
technologies, regardless of the domain, become invisible. The hallmark of immature
technologies is that one is painfully aware of their limitations and of their explicit use.
All of the things that we take for granted allow us to do things that we would not otherwise
be able to do.
Think about something as prosaic as the history of the automobile. In the early days, if you
weren't a first-order mechanical engineer and a mechanic, you dared not drive one,
because repairs were common. And now most of us rarely look inside the vehicle, other
than to operate it.
It's true in lots of domains. And so as we think about how to empower people, the one
thing I remind myself repeatedly as a technologist is that the fraction of us on the planet
who care about technology as an end is, to first order, zero percent of the population.
Most people care about technology as a means. What will it let them do? How will it
empower them to accomplish things they could not otherwise do?
So as we think about client plus cloud I think that really is the key message. How do we
use client plus cloud technology to empower people in some new ways?
And so one of the things that goes with this is to think about interfaces. And so I'm going
to rant just a little bit about what I think is our failure of imagination to exploit this
transition from powerful but very expensive computing technology to even more powerful
and inexpensive technology.
To first order the way we interact with computing has not changed in the professional
lifetime of everyone in this room. Yes, we've gone from batch job submission where you
were lucky to get one run a day maybe on a batch machine with punch cards or punch
tape, to a whole lot of relatively powerful graphical interfaces. But to first order, all of
computing remains stimulus response.
Those of you who are sitting there looking at your e-mail or whatever, unless you do
something, your laptop just sits there. You literally have to poke it to make something
happen. It's true of touch displays and all the other new things we see. They are passive
devices. And they rely on us to drive the interaction.
So as we sort of think about this history, as we've moved from batch computing through
command line interfaces to GUIs, what happened with the internet, and the things that
we're excited about with client plus cloud, and we think about those futures, remember
what I said a moment ago about what happens when we have large numbers of very
powerful, inexpensive devices. How do we shift the dialogue and the imagination from I
optimize my behavior to the constraints of the computing system to, how do I design and
use a plethora of devices to empower me?
And that means a wide variety of things, but think about this transition, even from
enhanced GUIs, where we have speech and we have handwriting and we have single and
multitouch, to what it means to build truly natural interfaces, where I can interact with
client plus cloud systems as naturally and intuitively as I can interact with you. I can talk
to you, I can gesture. I can wave my hands, I can foam at the mouth. I can reach out
and touch, and you understand what all of those things mean.
To do that means, in effect, relying on hundreds if not thousands of wirelessly and
wired connected devices that understand my local environment, understand your local
environment, share information, and rely on all the backend infrastructure for context.
But that's the psychological difference between computing being rare and computing
being common: saying, you know, I can use thousands of devices on a routine basis
just to make sure I'm not late for my next meeting.
So let's talk about clients and clouds and experiences. And the reason I wanted to
hammer on that point is because the language shapes behavior. Anytime you or I say
computer, we immediately narrow our scope of discourse to a vanishingly small fraction
of the things that computing can do.
The fact is that all of us -- I mean, here's an after-dinner question to pose to your
non-technical friends. Ask them how many computers they own. And the odds are
they'll give you an answer that's a small integer. You know, probably in the zero to six
range, perhaps. In point of fact, all of us own thousands of computers. They're embedded
in almost everything we use every day.
And so that nomenclature really does shape the discourse. And so this notion of fixed
and portable and mobile devices and even specialty devices is true, but the notion of
client plus cloud with intelligent objects -- that support healthcare, that support intelligent
energy management, that support social interaction -- that I think is one of the great
untapped opportunities. And it's true not only in social and consumer domains; it's true in
the way we think about doing science as well. Because ubiquitous sensors can
transform the ways that we think about capturing and analyzing data for scientific
discourse as well.
So let me talk about a couple of examples and let's start with healthcare in this context.
So consider the challenge and the opportunity of personalized medicine, if you will: what we've
learned from genotype sequencing, and what we increasingly know about gene
expression and our understanding of its manifestation in complex disease. The yin and
yang is how we extend that technology to individuals worldwide but do so in a cost
effective way.
The great temptation, because of many social constraints, is to apply technology for
protection or, as an indirect consequence, to increase costs for medical care. And if we
want to broaden that base, then we need to think about some new ways of leveraging it --
how we provide day-to-day experiences.
So, as I was saying at the outset, what does it mean to say I want to create an
environment where the quality of your healthcare is related not only to your genes and
their expression, but to the context in which you live? And perhaps equally importantly,
how do I shift the dialogue from you going to your physician as a supplicant, to request help
when you're ill, to you being an active participant in managing not your illness, but your
health? And that means: how do I make available to you large volumes of information in
intuitive ways, so that you can understand that context?
And so you think about this world of interconnected medical devices. How do we deliver
high-quality care and do so at scale, at costs that are increasingly efficient?
So think about some of those issues: as we think about using sensors for low-cost data
measurement and diagnosis, as we think about client plus cloud solutions that would
allow us to analyze that data and map it to personal circumstances, how we think about
personalized drug delivery and tailoring, and then the broad set of issues around public
health. As we monitor information and protect the privacy of individuals, how do we exploit
that data in effective ways to manage public health situations as well?
So how do we do this? How do we think about this vast array of devices, most of which,
as I said before, have embedded computing intelligence in them? They have
the ability to capture and analyze data. They have wireless connections. How do we
fuse that data and support insight not just for the physician, but for the individual? That,
I think, is our challenge and our opportunity.
Another example. Think about the future of intelligent energy management and what it
means to build a truly smart grid for an energy ecosystem. Everything from smart
devices that capture insight from your home experience and relay appropriate versions of
that to the utilities so that they can manage dynamic load, to how we think about
intelligent vehicles -- traffic routing for hybrid vehicles and energy management, whether
hydrogen or electric or battery powered, all the combinations that go with that -- and what
it means to manage and support lifestyles in that context.
So there are some interesting studies about the power of intelligent energy management
in industrialized countries, and here's the key issue to think about: the
energy-intensive lifestyle that all of us in this room are accustomed to, the other
several billion people on the planet aspire to. And the interaction of that energy demand
with global warming and environmental change is real.
And so the question of how do we continue to deliver higher quality lifestyles but manage
the total energy consumption, not only in industrialized countries but in developing
markets to make those things happen, while responding to climate change, is a big issue.
And so all of these issues about distributed sensors and monitoring and not only the
client side but the issues of cloud data analysis come together.
I want to just give you a concrete example of something that Microsoft announced last
week as an example. It's not a marketing pitch. It's merely an illustration. So last week,
Microsoft announced a collaboration with Ford on electric vehicles. And it relies on a
combination of technology that Ford is gathering together with engineering technologies
that Microsoft has under the rubric of Microsoft Hohm. And this particular example is
looking at the following question.
So Ford is about to start shipping electric vehicles in the U.S. relatively soon, and the
question is this. Imagine a world in which there were millions of people driving around
in electric vehicles. What do people worry about? Consumer study after consumer study
says the nightmare scenario that everyone has with an electric vehicle is that they run out of
electrons, right? That is the consumer worry.
So there's the whole issue of how I can make sure I can do route management in a way that I
don't end up on the other side of the city desperately looking for a place to recharge my
vehicle. Beyond that, there's the issue of what happens when millions of people drive
home at the end of the day and they plug their vehicles into the electrical grid. All right,
this is a massive hammer hitting the energy system.
And so the notion of how one manages information about your lifestyle -- when you travel,
what it knows about when you might need to do what -- and how one can do broad
scheduling of charging to manage and level load at the utility level from that distributed
sensor information is a big part of this story. And it's a client plus cloud application,
absolutely. A very distributed one with huge centralized components.
So the notion really, in some sense, is this. There's a concept vehicle, but if you look
inside -- and these are just concepts -- the dashboard not only gives you the status of the
energy reserves of your vehicle, but helps you do route planning based on traffic, and also
on the availability of charging stations. And your mobile device may well interact with
your social calendar to do similar kinds of things: this is what I need to do to manage my
lifestyle and to manage the utility environment, so that one avoids the peak load demands
that typically result in much less efficient energy generation.
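The load-leveling idea here can be sketched in a few lines. This is a toy illustration only, not anything from the actual Ford or Microsoft Hohm systems: a greedy scheduler that assigns each vehicle's charging hours to the currently least-loaded hours of the night, instead of letting everyone charge the moment they plug in.

```python
# Toy illustration only -- not the actual Ford/Microsoft Hohm system. A greedy
# load-leveling scheduler: each vehicle needs some hours of charging overnight,
# and each charging hour is assigned to the currently least-loaded slot.

def schedule_charging(base_load, vehicles, draw_kw=7.0):
    """base_load: kW of demand per one-hour slot; vehicles: hours of charging
    each vehicle needs. Returns total load per slot after scheduling."""
    load = list(base_load)
    for hours_needed in vehicles:
        for _ in range(hours_needed):
            slot = min(range(len(load)), key=lambda i: load[i])
            load[slot] += draw_kw
    return load

# Eight one-hour slots starting at 18:00, with made-up base demand in kW.
base = [60, 55, 45, 30, 20, 15, 15, 25]
fleet = [3, 2, 4, 3, 2]   # hours of charging each of five vehicles needs

naive = list(base)        # everyone starts charging immediately at 18:00
for hours in fleet:
    for slot in range(hours):
        naive[slot] += 7.0

smart = schedule_charging(base, fleet)
print(max(naive), max(smart))   # scheduled peak is well below the naive peak
```

With these made-up numbers the naive evening peak is 95 kW while the scheduled load never exceeds the 60 kW base peak, which is exactly the "massive hammer" the talk describes being flattened.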
And then finally let's talk some about science before I talk about cloud infrastructure. And
this is a slide that I stole from Tony Hey. Full credit here.
>>: I stole it from Jim Gray.
>> Dan Reed: There we go. Tony says he stole it from Jim Gray. The fourth paradigm
book is a tribute to Jim. But think about what's happened. It's another one of these
switches from scarcity to plethora. Think about the history of science. We
went from science being mostly magic, if you will, to experimental science, where there
was a fairly rigorous process to understand and measure natural phenomena. Then the
emergence of mathematics and all the theoretical studies we have. The allusion made in
my introduction about the fact that all of us or most of us in this room have been involved
for many years in computational science. The fact that computing has become
sufficiently powerful that we can build models of natural phenomena, or of phenomena
where experiment is simply cost prohibitive, safety prohibitive or time prohibitive.
In principle, one can do experimental analyses of stellar evolution, but probably not in any
practical way that we could ever imagine doing.
But we've shifted to a world of really data-rich science, enabled by the existence of sensors and
our ability to store large volumes of data. And as I said before, this is as much a social
change as a technical one, because throughout most of the history of modern
science, advantage has almost always accrued to those who had unique instruments. If I
had a telescope with more resolving power than you, I could observe phenomena that
you couldn't.
I didn't have to be smarter than you. I just had data you couldn't get. With the rise of
broad sky surveys in astronomy and with the rise of inexpensive sequencing technology
and you can go on down the line, we really have moved to a world where data is actually
quite common. If anything, we're drowning in data and the challenge now is extracting
insight from that data.
But I think the social challenge and change is perhaps more profound, because as I said,
it has shifted many domains from a world where data is rare and expensive, and I spend a
big chunk of my career creating unique infrastructure to create competitive advantage, to one
where everyone has access to data and the question is, can you ask better questions
than I do? Because we all have access to the same data.
But that democratization -- providing technology to allow those questions to be asked --
is really our client plus cloud opportunity.
So we all know this explosion. And it's really what's driven the growth of massive data
centers. To give you one other example which many of you probably know, think about
the rapid rise in quality of machine language translation. Computer science has been
working on machine language translation since the beginning of the modern digital
computer era and it's gotten incrementally better year after year after year as we've
attempted to derive better grammars, better rules that map concepts and syntax from one
language to another. But the real lesson in some sense of cloud based machine
language translation is the power of data.
If I have hundreds of millions of examples of the same thing being said in multiple
languages, statistics becomes your friend. And that is really fundamentally -- I'm
exaggerating, but fundamentally -- one of the things that has rapidly improved the quality of
machine language translation. We have samples. We have corpora at a scale we have
never had before. And the fact that Microsoft Research, for example, was able to create a
quite adequate translation of English to Haitian Creole in the space of four days after the
Haiti earthquake is an example of that. There are a lot of other examples, but scale really
does make a difference in many contexts.
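The "statistics becomes your friend" point can be made concrete with a toy sketch. This is not how any production translator works -- real statistical MT uses alignment models and phrase tables -- but even naive co-occurrence counts over a tiny, made-up parallel corpus start to suggest word translations:

```python
# Toy sketch of data-driven translation: count which target words co-occur
# with each source word across parallel sentence pairs, and normalize by
# target-word frequency so ubiquitous words like "el" don't dominate.
from collections import Counter

pairs = [  # tiny, made-up English/Spanish parallel corpus
    ("the cat sleeps", "el gato duerme"),
    ("the dog sleeps", "el perro duerme"),
    ("the cat eats", "el gato come"),
    ("the dog eats", "el perro come"),
]

cooc = {}             # cooc[english_word][spanish_word] -> co-occurrence count
tgt_freq = Counter()  # how often each Spanish word appears overall
for en, es in pairs:
    tgt_words = es.split()
    tgt_freq.update(tgt_words)
    for e in en.split():
        for s in tgt_words:
            cooc.setdefault(e, Counter())[s] += 1

def translate_word(e):
    # frequency-normalized co-occurrence: the score rewards words that
    # appear mostly alongside the source word, not everywhere
    return max(cooc[e], key=lambda s: cooc[e][s] / tgt_freq[s])

print(translate_word("cat"), translate_word("sleeps"))   # gato duerme
```

With four sentence pairs this already separates "gato" from "el"; with hundreds of millions of pairs, the same counting idea is what made statistical translation practical.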
This is really what I was saying. We've moved from a world that's hypothesis driven --
I have an idea; let me go capture data and find out whether the data supports this
hypothesis -- to one that's really exploratory: what correlations can I glean from the
plethora of data that I do have? And that means a different sociology. It means different
tools. It means different techniques. It also means that for complex problems, simplicity
really matters. Because for those problems that span disciplines, the simplicity of the tools
fundamentally determines whether one can tackle the problem or not.
And that's really the concept of the fourth paradigm: as we think about
multidisciplinary interactions and think about those examples I gave, delivering personalized
medicine in a developing environment is not just a technical challenge. In fact, arguably,
that's the easiest part of the problem. It's the set of economic issues. It's the set of
political issues. It's the public health issues. It's the planning issues. It's the education issues.
It's the promulgation and distribution of the technology. It's managing the social
expectations around the use of the data. All of those things intersect with the technology.
And so in the spirit that successful technologies are invisible, how do you bring together
people across a dozen disciplines to solve a complex problem? Tools have to be simple.
A great story that Fred Brooks used to tell at UNC, which I'll repeat again here. He said,
my objective as a tool builder is to build tools so powerful that a full professor will want to
use them, and so simple that they can. And that's a high bar indeed. And it's even more
challenging when you bring people together across disciplines, each of which has
that perspective, and which often lack a common set of concepts to communicate and have
different social norms for success. Simplicity really matters.
And that's one of the reasons why I think these opportunities of client plus cloud to
support these multidisciplinary collaborations are so important. That the ability to deliver
simple tools really matters. And this is a slide I stole from my colleague, Roger Barga,
that really gets at the notion of, as we think about computing infrastructure, how do we
enfranchise versus disenfranchise? I've spent almost all my life in high performance
computing, working across disciplinary boundaries with people in a wide variety of technical
domains, and I've observed fundamentally two categories of people. There are the folks
who will embrace any new technology to advance their science. These are the folks
that I would describe as the Marines on the beach: we may die in the attempt, but we
will happily embrace the technology and use it. Those are code words for physicists.
Then there is the other category of people, who will continue to use a familiar
technology that's easy to use, even if there is perhaps a more powerful technology that is
difficult to use. Those are code words for biologists. And again, you know, those are
gross characterizations. There are people in both camps.
But there are fundamentally different psychological expectations across disciplines for
tools. And so one of the questions is, how do we provide the power of those amazing
new technologies and the simplicity of familiarity? To put it in concrete terms: there are
people who will happily write a parallel application to run on a cluster and use a message
passing interface to do that, right? This is the assembly language of parallel computing.
People do that all the time.
There's another class of people who, if you gave them that infrastructure free and all the
software, would never use it. It's too painful for them. They would rather run a
spreadsheet for a month to do that calculation than write the equivalent code, run it on
a parallel machine, and get the answer in ten minutes. Simplicity matters.
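The contrast can be sketched in code. This is a hypothetical toy, using Python's multiprocessing as a stand-in for MPI: the same aggregate computation written as a plain serial loop -- the "spreadsheet" style -- and as a data-parallel map across worker processes. Same answer, very different cost model:

```python
# Toy stand-in for the trade-off above (multiprocessing, not MPI, to keep
# the sketch self-contained): a serial loop versus a data-parallel map.
from multiprocessing import Pool

def simulate_cell(i):
    # stand-in for one expensive model evaluation (one "spreadsheet row")
    return sum(k * k for k in range(i % 1000))

def serial(n):
    # the familiar, simple version: one result at a time
    return sum(simulate_cell(i) for i in range(n))

def parallel(n, workers=4):
    # the "cluster" version: the same cells mapped across worker processes
    with Pool(workers) as pool:
        return sum(pool.map(simulate_cell, range(n),
                            chunksize=max(1, n // workers)))

if __name__ == "__main__":
    n = 10_000
    assert serial(n) == parallel(n)   # identical answer, different cost model
```

The point of the anecdote is that the second version is already too much machinery for many scientists, even though the first one may run for a month on a real workload.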
So how do we get those two things together? Where we get the power of one and the
simplicity of the other? So I think our opportunity in this client plus cloud world is to
create this illusion that the powerful desktop metaphors and tools scale arbitrarily into the
cloud. That my data access looks just as powerful on my notebook or my mobile device
as it does if I'm directly accessing the database, and that the computing capability scales
as well. That my domain-specific and general-purpose desktop tools have arbitrary power,
scale and scope for data management and computation, but retain the simplicity I'm
familiar with for doing small problems.
That's our opportunity and there are a huge number of underlying technology issues
about that, as well as the disciplinary issues. And that Excel example is real. I know
people looking at the spread of epidemics that, in fact, do build epidemic models and they
run for weeks or months in Excel. Because that's a familiar metaphor. It's easy to use.
Now I can see some of you shaking your heads saying oh, geez, I can't believe they do
that. Well, they do. So it works for them. But the scale and scope of the problems they
want to solve reach beyond what those desktops can do. And so there are many ways
you educate, but you also ask: how can you address the limitations of those tools and
preserve the familiarity? Because the social milieu really does matter.
And so a lot of what my team has been working on is engaging with the community about
these things in collaboration with Tony, looking at how we can engage the worldwide
research community to talk about how we empower people, how we enfranchise the
disenfranchised with more powerful experiences but familiar tools. And so we've been -- we struck a deal with the U.S. National Science Foundation. We're in the midst of other
negotiations with other research agencies worldwide to make available Azure technology
as well as an engagement team to work with people on these issues of simplicity and
client plus cloud tools.
Let me turn, with that sort of backdrop about applications, to talk in systems geek
terms about cloud technologies for a little bit, and what I think some of the research
opportunities there are. The fundamental takeaway of everything I've said up to this point
is convergence and invisibility. How do we bring together the complexity of these hugely
distributed systems but still maintain simplicity and move from procedural programming to
modeling visualization, because that's what people really want to do. Programming is a
means to an end. It's not an end, except for a small number of us in computer science.
How do we move from discrete tasks to being able to support context and intent? What
is it people really want to do? How do we move from trial and error to more systematic
processes, so that, as we bring together increasingly complex multidisciplinary models,
we have some assurance that we're not engaged in just a random process.
So I'll throw out one other sidebar observation. If you look at the complexity of scientific
applications, even within individual domains, but certainly as you think about fusion across
multiple domains, what's the probability that a complex multidisciplinary science
application with 100 million lines of code in it is actually correct?
Not high. Couple that with the fact that those models increasingly run on machines
involving tens or hundreds of thousands of processors and a complex underlying software
infrastructure that is itself unlikely to be fully functionally correct. All I'm really
saying is that a computational science experiment at large scale is an experiment in the
same way that any experiment is an exploration of a distribution of truth. So if you do an
experiment and you measure something and someone tells you it's two, with no error
bars on it, you should be suspicious.
If someone runs a computation and says the answer is two, you should be equally
suspicious, for the same reasons. So as we think about this, there are challenges at that
scale about verifiability and consistency. There are issues about privacy and identity that
go with access in these things as well. So what we're really talking about is how do we
build a ubiquitous infosphere that spans disciplines and technologies, that provides
the right information in the right context at the right time. And how do we do that in
ways that meet quality-of-service expectations.
So to hark back to what I said before, this is really about how do we move from stimulus
response mode to allowing multidisciplinary teams to ask and answer differing questions
and have the computing environment not just be an aid but be a partner. You know,
much like my assistant whom I jokingly but quite seriously say makes me look better than
I really am. That's really what you want your computing system to do is to make you look
better than you really are by being an active partner in the things you want to do. And
that really does speak to this world of sensors and clouds and there are a wide variety of
issues about how we fuse these things together to do them.
So let me talk about cloud infrastructure for a little bit, since I've been talking about clients
and experiences. There's a lot of hype, Lord there's a lot of hype around clouds, right?
Have you been writing a research proposal lately? How many of you have taken that salt
shaker and just sprinkled cloud on your proposal? Yeah, there's one. There's another.
It's the way we play the game, and we all know that. There is an enormous amount of
hype around clouds. But there is some truth in the story as well and there are some
really interesting systems research challenges around scale as well.
So one of the other things I'd leave you with is, in the spirit of the questions don't change
but the answers do, is anytime you add an order of magnitude, some of the truisms break
down. Some things that we held true no longer scale. And we have to rethink some
things. And there are some issues in cloud environments around that.
Environmental responsibility is one of them. Big cloud infrastructures. A single data
center can consume 50 to 70 megawatts of power, and a lot of that ends up as heat
dissipation, which goes back to our climate issue. How do you provision 100,000 servers? I'll
tell you a funny story about this. At one of Microsoft's data centers in the state of
Washington, which I toured while it was being built, there was one person whose only
job was cardboard recycling. So if you buy 100,000 servers and they come individually
wrapped for your protection, then some things change.
So the whole notion of what's the right-sized Lego block at this scale, it's not a pizza box
server. It's a different sized building block. And the whole issue of how do you bring
those things online quickly is an issue. How do you deal with the huge social issues
around failure? When an increasingly large fraction of the human knowledge base is
both digital and contained in a modest number of these massive data centers, there are
not only operational consequences for the operators if they fail; there are huge social
consequences for the planet if they fail. And so those management issues are real.
And then there are a bunch of software and services issues around performance,
reliability, security, and reliability's on there twice because it matters a lot. So one of the
things to think about is a computer center and a computer room and a data center are not
the same thing.
You know, if you put up the standard science fiction movie vision of a computer room,
it will look something like this. And depending on the vintage, there will
undoubtedly be some spinning tape drives in the picture somewhere. I'm sure movie
makers everywhere are unhappy about the demise of spinning tape drives. Even blinking
lights have gone away. It's a long line of boxes, mostly gray or black. This is what we all
think it looks like. It's got a raised floor with air cooling coming up, and it's cooled to a
point where polar bears would be happy inside. This is the standard view.
Well, in fact, it's a whole lot more stuff. It's massive physical plant. It's operations and
control. It's huge amounts of UPS battery backup. It's diesel generators to provide
battery backup and it's massive amounts of cooling.
So a conventional computer room like that, scaled up to be a data center, as I said, can
consume 50 to 70 megawatts of power, can evaporate 4 million liters of water a day of
near drinkable quality. That has some interesting environmental implications as well.
The lead time to deploy these things is often dominated by how many two megawatt
diesel generators you can buy or deploy. It's dominated by the battery backup. It's
dominated by the water. It's dominated by the physical plant costs.
And one of the things that's important to remember is this. That if you look at the cost of
one of these facilities, about half of it is the computing and the other half is the other stuff.
This stuff is getting more expensive. This stuff is getting cheaper.
And if you look deeply at these, you'll see cooling, cooling, power. Most of the cost
somehow is related to energy management. So here's another thing we've been led to
believe that is not true. I talked about the fact that computer rooms are cooled to the
point where polar bears would be comfortable. Well, in point of fact, if you look at what's
actually true, you can run computing hardware at temperatures that are just short of
burn-your-hands-if-you-touch-them. I'm talking second degree burns. And the failure rate
is only marginally higher than if you keep them cool.
The reason why this matters is that so much of the cost of running a massive cloud data
center goes into cooling. And if you can make cooling go away, not only do you use energy
more efficiently and avoid throwing heat out into the environment, you reduce the total
cost of physical plant as well.
So there are huge incentives to optimize some things that we historically didn't worry
about, because if you're standing up a rack of servers in a closet, you don't really worry
much about it. We should, but we don't.
So if you look at the history of data centers that Microsoft has built, generation one was
what I was describing, with the cardboard guy. It's raised floors, and you build massive
amounts of them.
Generation two was an optimized version of that stuff, and I'll show you some more
pictures of details in a moment. Generation three went to bigger sized Lego blocks and
the notion that the building block one uses to build a data center is a shipping container
filled with computing hardware. It has three umbilicals. Power, cooling, network.
You drive it in on a truck, you lift it off the crane, you put it on the concrete, you plug in
the three umbilicals, it's on. That's the API. And that shipping container contains several
thousand servers and the data center consists of a concrete slab, a bunch of utility
infrastructure and a metal roof. And ideally you cool those things as little as possible. In
fact, ideally, you don't cool them at all, right. So what's success? I just came back from
Singapore. Success means you can build a container that you can roll out and plug in in
Singapore with no umbilical for cooling, just power and networking. Because everything
else is wasted. You, in principle, don't need it.
And then the last generation is about how you think about modularizing the physical
infrastructure, the power and other things that go with it. So let me give you some
scale. These are overhead views of the generation 2/3 data centers and they look like
industrial warehouses. Well, they are. They're just filled with computers, and they're
about 10 to 12X the size, and it doesn't matter which kind of football field you pick,
whether it's American football or real football. It's order of magnitude the size of one of
those, right?
The ones that are filled with shipping containers, and this is one in Chicago, cost in
round numbers anywhere from half a billion to a billion dollars apiece. They're even
bigger: almost 20X the size of a soccer field. And Microsoft and its competitors
are building a worldwide network of these things with all of the infrastructure that goes
with them.
The vision I was talking about is this. This is an overhead view of that Chicago
data center, and if it looks like a truck stop where lots of 18-wheel trucks are parked,
there's a reason for that. These are the shipping containers. They're double
height, with computing in the bottom and basically all utility support above. Those are the
Lego blocks. And the reason for wanting to modularize at this point is you can go to
hardware vendors and say I'm going to define the API at this interface, but I don't really
care what you put inside the box, as long as you can meet the service level agreement
for my expectations about performance and reliability.
And if that means you can do it with trained monkeys, as long as we don't have to
pipe in banana juice to support them, we don't care. Just meet the reliability and
performance SLA that goes with those.
And so the time to market issues about being able to deploy this stuff rapidly with
preassembled components for all of the physical plant issues are parts of the challenge.
So with that context, I want to very quickly show you a few of the research challenges on
the systems side.
So energy efficiency really matters, but in principle what people want to do is optimize the
ability to deliver operations relative to TCO, to total cost of ownership. How do I deliver
service at the lowest possible cost? And there are a whole bunch of issues there about
how efficiently my services run. There are microprocessor and systems issues, all
the issues about energy efficiency, the things I was talking about with packaging and
cooling, about the costs for utilities and infrastructure, and then all the economic
issues around the cost of people and money and time to market.
So just as I was talking about these complex, multidisciplinary application challenges
around personalized healthcare or energy management or research, there are analogs of
those problems here, because the traditional siloed approach we would take, thinking I'm
a networking person, or I'm a microprocessor architect, or I'm a database person, or
maybe I'm an operating system person, is not enough to solve these problems. It takes
the interplay of multidisciplinary technical teams to tackle these things.
So I'll tell you I learned more about sanitary sewer system design in the last two years at
Microsoft than I ever did in 30 years in high performance computing. And you might think
wow, what do those things have to do with one another? Well I was talking about those
data centers and the fact that the traditional ones are cooled with water. And the cooling
systems in those things, the pipes have algaecide in them to keep algae from growing in
the cooling pipes. Periodically, you have to flush those things and you have to dump that
someplace.
Well, it goes into the sewer systems, right, under all the appropriate regulations and
public services processes. Turns out that algaecide is too clean to go into the sewer
system. It actually kills the bacteria in the leech fields. It's a big issue, right, so you have
to think about things like that. You have to optimize cooling efficiency, learn something
about sewer systems. So my point is this complexity, the inter-dependence, is actually
quite real.
So one of the things that my team is doing in partnership with folks across Microsoft is
looking at how we take a systemic approach to these designs. How we look at
processors, at storage, at networks, at packaging, at software, and say how do we make
global trade-offs to meet these efficiency-relative-to-TCO objectives. And I just
want to quickly plant some food for thought, and not let myself ramble on too long.
Why has virtualization been such a hot topic lately? One of the reasons is that a lot
of the workloads we run don't need the performance of modern processors. The fact
that I could virtualize three or four workloads on a processor and still meet some end user
expectations -- and that's not true for all workloads. Certainly not true for many technical
computing workloads, but it's true for many. For business workloads for sure. Because
those processors are far faster than we actually need.
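The consolidation arithmetic behind that virtualization argument can be sketched as follows; the utilization figures and packing target are illustrative assumptions, not measurements from the talk:

```python
import math

def hosts_needed(n_workloads: int, avg_utilization: float,
                 target_utilization: float = 0.8) -> int:
    """Servers required if each workload uses avg_utilization of a host
    and hosts are packed up to target_utilization."""
    per_host = int(target_utilization / avg_utilization)
    return math.ceil(n_workloads / per_host)

unvirtualized = 1000                                    # one host per workload
virtualized = hosts_needed(1000, avg_utilization=0.2)   # 4 workloads per host

print(unvirtualized, "->", virtualized, "hosts")        # 1000 -> 250
```

Cutting the host count by 4x cuts power, cooling, and floor space roughly proportionally, which is why lightly loaded business workloads virtualize so profitably.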
So one of the things you could do is say, well, okay, virtualize the heck out of them and
that's what we do now. The other is to say what can we learn from the mobile and
embedded space and use lower power processors that actually achieve much higher
efficiency. So if you look at a phone versus even a laptop and ask what's the
efficiency in terms of ops per joule achieved in the mobile device versus the laptop, the
mobile device is much higher in terms of energy efficiency per operation. It was designed
that way because energy is a pretty serious constraint. You won't use your phone if the
battery only lasts ten minutes.
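The ops-per-joule comparison can be made concrete with a small sketch; the throughput and power figures below are rough, hypothetical orders of magnitude chosen for illustration, not benchmarks from the talk:

```python
def ops_per_joule(gops: float, watts: float) -> float:
    """Billions of operations per second divided by watts gives GOPS per joule."""
    return gops / watts

# Assumed, illustrative numbers: a phone SoC does less absolute work but
# at a fraction of the power, so its efficiency is several times higher.
mobile_soc = ops_per_joule(gops=10, watts=2)
laptop_cpu = ops_per_joule(gops=50, watts=50)

print(f"mobile SoC: {mobile_soc:.1f} GOPS/J")  # 5.0
print(f"laptop CPU: {laptop_cpu:.1f} GOPS/J")  # 1.0
```

Multiply that per-chip gap by a hundred thousand servers and the architectural appeal of low-power parts for data centers is clear.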
There are lessons to be learned there about how we think about building next generation
data center processors, balancing energy efficiency, complexity, and in-order versus
out-of-order execution for appropriate workloads. And so I plant that as an architectural
seed, as
food for thought. Think about storage at scale. Remember what I said about the fact
that orders of magnitude matter.
If you use any of the consumer advertising-supported mail systems, in Microsoft's case
Hotmail, think about how many accounts there are for those things worldwide. Think
about how much data there is. Think about how often disks fail. And a big data center
can lose hundreds of devices a day, right?
If the mean time to failure is a million hours and you've got a million of something, do the
math, right? You lose something all the time. Disks are the last remaining mechanical
component in computer systems. And there is some hope, as we look beyond flash, and
flash in many ways is a stopgap technology, for things like phase change memory and other
possibilities where we could have the [indiscernible] of DRAM, the resilience of
block-addressable flash, and reliability even greater than disks, which would change some
of the dynamics.
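The "do the math" invitation a few sentences back works out like this, using the talk's own round numbers and the standard approximation that expected failures scale as fleet size over MTTF:

```python
def expected_failures_per_day(fleet_size: int, mttf_hours: float) -> float:
    """Expected device failures per day for a fleet with a given MTTF."""
    return fleet_size / mttf_hours * 24

# A million devices, each rated at a million hours MTTF:
print(expected_failures_per_day(1_000_000, 1_000_000.0))  # 24.0
```

Even with those generous numbers you lose a device every hour, and a real data center mixes disks, power supplies, fans and DIMMs, which is how the daily losses climb into the hundreds.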
Because the challenge with these, besides reliability at scale, is that they're increasingly
write-only devices. Just think about the economics of what's happened with commodity
disk. You can buy a terabyte disk drive for a pittance today. But if you ask yourself how
long it takes to extract a terabyte from that drive, they're bandwidth limited. The
technology is optimized for capacity, not for bandwidth. In fact, in many cases, the
disks deployed at scale in data centers are only used to 10 or 15 percent of capacity,
because bandwidth is the constraint, not capacity. It's better to use more disks at lower
capacity to get more aggregate bandwidth.
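The "write-only device" point is easy to quantify. A back-of-envelope sketch, where the ~100 MB/s sequential rate is an assumed commodity-drive figure rather than a number from the talk:

```python
def hours_to_drain(capacity_tb: float, bandwidth_mb_s: float) -> float:
    """Time to read an entire drive sequentially, in hours."""
    bytes_total = capacity_tb * 1e12
    return bytes_total / (bandwidth_mb_s * 1e6) / 3600

one_drive = hours_to_drain(1.0, 100.0)
print(f"{one_drive:.1f} hours to read one full 1 TB drive")  # ~2.8 hours
```

Ten drives each filled to 10 percent hold the same data but offer ten times the aggregate bandwidth, which is the trade-off the talk describes operators making in practice.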
There are challenges and opportunities there to think about those things.
Networking. Think about the history of TCP/IP. When it came into being, 56 kilobits was
blazingly fast. It was predicated on unreliable, low-bandwidth communication. The data
centers I just showed you pictures of have more computers in them than the entire
internet did a handful of years ago. And a data center is, relatively speaking, a small
box with quite reliable communication and fairly constrained speed-of-light delays.
How do we rethink the whole issue of the hardware software stack for networking at scale
and drive the economics down where we can build large networks at low cost that are
reliable. And then one other thing I will foam at the mouth about a bit is the speed of
light. It's not very fast, and I wish it were faster. Tony hasn't been able to help me
with this problem, physicist or not.
And so what this means in many cases is that we're building data centers that are islands.
They're connected by a small number of ten gigabit lambdas, and everything I said about
the capacity versus bandwidth of disks applies to data centers at a much larger scale.
There are hundreds of petabytes of data in them, connected by relatively thin pipes.
What does it mean to embrace truly high-bandwidth wide-area networking and say I want
to provision -- pick a number -- a hundred terabits of bandwidth, okay? It's
technologically possible right now, but we don't do it.
I talked about the packaging and cooling issues, that we don't have to cool for polar
bears; there are other options. I talked some about reliability and resilience. One thing
you might ask is, when hardware is cheap, why do I worry about making it reliable? Why
don't I just buy more of it and overprovision to tolerate failures, because it's getting
cheaper and cheaper? And that's a different psychological model, because we generally
like to think that our hardware is reliable. Well, at large scale, it's not. Statistics
doesn't lie.
And then there are all the issues around trust and privacy and security, and this is a
complex set of technical issues, as well as political, economic and social issues. How do
we develop differentiated SLAs that let us deal with different expectations about
privacy, reliability, and performance for different kinds of workloads? There are some
deep issues around that.
And then the broad set of energy issues. How might you think about using renewables to
power data centers? There's a whole interesting set of technical issues around what
happens if the sun goes down and my data center doesn't have power anymore. Well,
dynamic load distribution, load sharing, there are deep systems issues that go back to
networking, software distribution, global consistency that come in to play. And we're
working on those issues as well. And as I said, there are these SLA issues.
So let me leave you with this last thought. As we think about all of these complex,
multidisciplinary issues on the application side, on the physical infrastructure side. And
they're sort of summarized this way. What bounding box do you draw around the
problem? If you define it narrowly, then that will define the context in which you attack
the problem. What kinds of workloads is one trying to support, and, perhaps equally
importantly, maybe even more importantly, what metrics of success do you use to reward
or punish?
Just like writing research proposals: if your academic institution rewards you on papers
published and money brought in, well, we're all smart people. We'll go optimize for those
two metrics. Those may or may not be optimizing for insight and knowledge creation.
But these are the worlds in which we live.
As I said, component failures are real at scale, and people and machines don't mix well.
And we've historically mixed them. There's no reason why a lot of this infrastructure has
to be human habitable, and these multidisciplinary solutions really matter. So there are
systemic architecture issues writ large; that's really my take-away message.
So we need to think from an end-to-end perspective in some broad ways. The clients, the
clouds, the applications, and the broad set of issues that go with creating experiences
and holistic design and intelligent management and, in fact, all of the policy expectations
that go with that.
And I see Kent's getting nervous. I actually am done, so thank you very much for
listening to me this morning.
[applause].
>>: We have a few minutes for questions.
>>: Thank you very much for a very inspiring talk. I wanted to question one of the
comments that you made on the general thrust of science in the talk. You mentioned that
there's a switch from experimentation to observation and analysis of existing data, and
then you make the connection with a switch from hypothesis-based to analysis-based
science. And I don't think the connection holds. I don't see the argument.
I mean, of course, there are two kinds of research. There's deductive, where you start
from what you have in your mind. And inductive in which you look at data. And it seems
to me that the case for scientists working mostly in a deductive fashion was made very
convincingly by Popper, by Kuhn, by [indiscernible] and people like this, and I don't
see at all how the availability of more data changes that. If anything, it should actually
make initial hypothesis even more important. Because if you have this huge amount of
data and you just analyze it without any [indiscernible] scientific hypothesis, you are just
going to find all kinds of bogus connections. So I don't follow you here.
>> Dan Reed: It's a really good observation. And let me try to recast what I
was saying. I completely agree with what you said. Two comments. Fourth paradigm
does not mean the only paradigm. It's a complement to the other three.
What I was really trying to say about the broad availability of data really is perhaps more
operational in the following sense. When the pain and cost of acquiring data to test a
hypothesis is exorbitant, you design the infrastructure to capture the data, driven
explicitly in most cases by that hypothesis, to be able to test it.
When there's broad availability of data before you even think about the problem, then
you have some existing context. I mean, you always have had some, but in a much
broader, richer way than before. And it's possible to start asking questions with
existing data, and in some sense to apply a broader range of statistical techniques than
we once were able to.
One of my astronomer friends, a variable star observer, used to joke: oh, great, I found
three of these. Now I can do statistics on the problem, because I can compute a
variance with three. And we've sort of moved beyond that, in the sense that there's broad
availability of data, and you can ask questions of it. And I think in that sense, it's
democratizing in terms of availability of data and infrastructure. That's all I was trying to
say. I'm not quibbling with the epistemological issues about the nature of research. I
completely agree with what you're saying there. Dave?
>>: Hi, Dan. I guess as a former academic, you know that we're accused of not being
very good at multidisciplinary research, and I think --
>> Dan Reed: I've been accused of that too.
>>: You know, that's kind of your call, right. So now that you're not an academic
anymore what advice do you have for us to do that better?
>> Dan Reed: So I think it's a good question. I wish I -- there's an old story of Mark
Twain, in Life on the Mississippi, when he was learning how to navigate the Mississippi
River as a riverboat captain and was being grilled by his trainer, and he said: I was
gratified to be able to answer, and I did. I said I didn't know. And that may be part of my
answer to this question. I'm not sure I know. I think it goes back to the rewards issue,
frankly. And the metrics at many levels. I think some of it is related to the way that we
have structured our educational system. It's certainly related to the way our national and
international funding mechanisms work. And what I would like us to try to do is change
some of those things. And those are painful social processes to work through.
You know, if you think about -- I used to be on an advisory committee for a European
University. A longstanding one. I was talking to a rector at one point. I said you know, if
we were to awaken one of your predecessors from the Renaissance, they wouldn't have
trouble stepping into management because the structure really hasn't changed much in
the last 600 years.
And, you know, that comment is equally applicable to universities, mostly worldwide. And
there's some good things about those eternal verities. There are some things that have
stood the test of time. But I think this is one instance of a broader problem that is a
consequence of technology evolution: the rate of technology change in a huge set
of social domains has outstripped the ability of our social structures to adapt.
And job skills are a concrete example, right? The notion that when you leave the
university system, you have job skills for a lifetime: it was true of our parents' generation.
It's not true anymore. Perhaps the only sustainable skill you learn is how to educate
yourself, and you hope you'll be able to continue to apply that. But the notion that you
leave with a concrete set of skills that lasts is breaking down, and I think these
multidisciplinary research issues are in that same context. So trying to get people to
think and work in teams, I think, is a university issue. I would like to see our funding
mechanisms change, and I should say this issue is not unique to academics. I think it's
true of corporations as well. It's an artifact of human social organizations that reward
metrics have to bring people together. Do I have any magic for that? No. But I think that
is the killer issue, and it's an astute observation.
>>: Okay. Thank you very much.
>> Dan Reed: Thank you.