[music]
>> Robert Hess: There are a wide variety of machines which form a regular part of our
daily lives. From something as simple as a paperclip to machines which are far more
complex, such as a computer.
None of these machines, however, come close to the complexity that's found in the
machine that is the human being. When you think of Microsoft, you might simply
consider the various ways they focus on technologies related to computers. Microsoft
also plays a role in the human equation as well.
And this isn't just in trying to design better user interfaces or ergonomic hardware. Some
of the same structural models in data filtering algorithms can also find application at a
biological level, assisting us in better understanding our own selves as well as diseases
which often impact us. One person at Microsoft who is finding new ways to apply
advanced algorithms to our own biology is David Heckerman. Hello, I'm Robert Hess
and I'll be your host today as we talk with David Heckerman, distinguished scientist in
the e-science group at Microsoft. Hope you enjoy this opportunity to look at the
technology and the person behind the code.
David began his education with the intent of becoming a physicist, but found his interests
led him eventually into the medical sciences. It was at Stanford, while working on his
MD, that he began looking at the problems of artificial intelligence. He submitted for his
Ph.D. work an impressive construct that he called the Probabilistic Expert System. It was
so impressive that Microsoft hired him in 1992 to build such systems for non-medical
applications.
His work at Microsoft began to lead him further and further from his original medical
focus. One of his pioneering areas of study was for graphical models known as
Bayesian networks. And it was while working on these models that he recognized how
this could also be applied to medicine and biology.
Today, his efforts have allowed him to return to his medical education roots and use them
to design such things as a vaccine for HIV and to search for genetic causes of diseases.
Join me now as I welcome today's guest, David Heckerman.
>> David Heckerman: Robert, pleasure to be here.
>> Robert Hess: Glad to have you here. Now, one thing I've got to get off the books to begin
with is that Microsoft is a technology company; I don't really think of Microsoft as being a medical
company.
Do you find that problem occurring periodically when you go around telling people who
you work for?
>> David Heckerman: Yes, when I explain to someone, say I'm in an elevator or something, and I
tell them I'm working on a vaccine for HIV, or looking for genetic causes of disease,
they say: Really, why?
So there's actually several reasons. One is that I have this background in medicine. So
it's just generally of interest to me.
Another reason is some of the things I'm working on like a vaccine for HIV, it's just a very
important problem to tackle. HIV is the virus that causes AIDS, which kills over 5,000
people a day.
And so if computation can be useful to combat that disease, that's great. I'm all in.
Probably more relevant to Microsoft is that there's this convergence, a real
growing convergence, between computer science and biology. There's some obvious
things like we now understand that DNA is a programming language. It programs the
machines of life, the proteins that make us up.
And another issue is that there's an exponential growth in the amount of data that's
being produced in the biological sciences now. There's basically a doubling every ten
months, which is even outpacing Moore's law. So when you have all this data you really
want to do something with it. You've got to manage it, you've got to analyze it, and
computation has got to be there.
>> Robert Hess: It's an obvious connection then.
>> David Heckerman: There's an obvious connection, absolutely.
>> Robert Hess: How did you first get started on your focus in the Ph.D. and medical
and stuff like that?
>> David Heckerman: Well, first I thought I was going to be a physicist. It was almost a
religious quest. I wanted to understand how the universe worked, how the universe
came to be.
>> Robert Hess: When you figure it out, let me know.
>> David Heckerman: Well, yeah, that's the problem. I went through my physics
classes and then I hit this thing called quantum mechanics where basically they tell you
we don't know what's going on. And maybe even the universe doesn't know what's
going on.
So that was a very unsatisfactory end to my quest. I said what else can I do? I thought
well learning how the brain works is a very interesting thing to do so I'll do that. And
rather naively I thought well if I'm going to learn how the brain works I should study the
brain, go to med school. In retrospect that was pretty silly but it worked out very well for
me because I ended up going to medical school at Stanford. And there was this new
thing called artificial intelligence just getting going.
And I started just ditching my med school classes every once in a while and going to
these artificial intelligence classes. I said, yeah, this is the way to study how the brain
works. And I was hooked.
And I basically got a Ph.D. in artificial intelligence.
>> Robert Hess: Now, do your interests in these levels, did that begin at childhood or
was this something in high school and college that you started thinking about becoming
a physicist or medical doctor and stuff like that?
>> David Heckerman: It was when I went for my undergraduate degree. I didn't know what I
was going to do. In high school I liked everything. I liked music. I liked math. I liked
science. I liked English.
Just had a blast doing all that stuff. And I guess it was a tradition, at least when I went to
college, if you knew a lot of math and you were just generally good, start with physics
and then go from there. So that's what I did.
>> Robert Hess: What did your parents do? Did they move you in that direction at all?
>> David Heckerman: Definitely my dad was a high school math teacher. And I can
remember as early as five, six years old him sitting down with me and doing sequences
and things like that and enjoying that sort of thing.
>> Robert Hess: Fond memories of doing sequences.
>> David Heckerman: He was a great math teacher, and certainly to this day a lot of
what I do involves mathematics. So his influence is there.
And my mom was a stay-at-home mom and just very supportive.
>> Robert Hess: Was there a computer in your childhood at all?
>> David Heckerman: The very first time I experienced a computer was maybe as a
sophomore at UCLA, as an undergrad. And that was one of the worst experiences I've
ever had in my life. It was one of these central computing systems where you had to
use cards. So you'd sit there and you'd type out your cards, guess as to whether the
program was going to work. Wait in line for about an hour and a half. Get to the front of
the line. Put the cards in the deck, push the button and it would come out and say error.
So you'd have to go back, figure out, take your best guess what the error was, go back
in line, wait another hour and a half.
It was a miserable experience.
>> Robert Hess: And how much time did you spend doing that, like a couple of weeks
and you were done with it?
>> David Heckerman: I took one class and I said, okay, I've now experienced
computers, that's very nice now I'll move on to something more useful.
>> Robert Hess: If someone told you then you'd be working at a company that
specialized in computers you would have --
>> David Heckerman: The thing that really got me into computers in a positive way was
when I was still a physicist at UCLA. My next-door neighbor was a geneticist. And he
was collecting all this data that would allow someone to estimate the probability that a
father or a person was a father of a given child.
He says: I've got all this data. I don't know computers. Can you help out? And I didn't
know computers either, but I said this sounds like a pretty easy math problem so I
taught myself the genetics. I taught myself computer science. I guess the first program I
used was BASIC on an Apple II, right around 1979.
So I wrote this program. I think it's still used from time to time today. But that was a real
positive experience. And ironically what am I doing now? I'm doing mathematics
and computers around genetics and so forth.
>> Robert Hess: The exact same thing.
>> David Heckerman: Come full circle.
>> Robert Hess: Now, what were your experiences like at UCLA and Stanford
understanding first being a physicist then becoming a medical doctor, how was that
training preparing you for what you're doing today?
>> David Heckerman: What I do today is basically analyze very, very complex data.
And to do that, you need a mathematical background. And certainly the mathematical
background I got in physics and then in computer science from Stanford played a very
large role. I still remember all this. Every day I remember back to some lesson I learned
in those days. That's still relevant now.
>> Robert Hess: Now, while you were studying to become a physicist, were there things
you were doing that were maybe leaning you towards what you're going to be doing in
biology?
>> David Heckerman: Actually, not at all. I remember taking one biology course. And
my experience there was much like my experience in that computer science course.
>> Robert Hess: Very positive one, otherwise?
>> David Heckerman: Yeah. Lots of memorization. In fact, I remember being taught
how to memorize the Krebs cycle. And one of the compounds in the Krebs cycle is
alpha ketoglutarate. You'd say 2, 4, 6, 8, alpha ketoglutarate. And that was the level of
education or level of understanding that was available then in biology.
>> Robert Hess: You still remembered it today. Hey, obviously it did some work.
>> David Heckerman: Yeah. But now today, when you study biology, you get these
intricate diagrams, machine-like, engineering-like diagrams of how one molecule fits into
another molecule and how these things interact together. It's much more of a scientific,
actually engineering, combined science and engineering discipline than before with just
a bunch of memorization.
>> Robert Hess: Actually understanding the role it plays in your life rather than just
memorizing stuff. That was the problem I had with school when I was growing up, with
math, they'd teach you all these math equations, you didn't know how they would be
applicable. And today I go I wish I would have paid more attention in math class
because I could use that now.
>> David Heckerman: Back then you get all that stuff to memorize. There's no places to
hook in the statistics or the mathematics. Now there's enough understanding, yet
enough uncertainty that you can bring in these mathematical tools to help further the
understanding.
>> Robert Hess: Now, what year was this about when you were doing your switch from
physics to medical sciences?
>> David Heckerman: Around 1980. This is when I was fed up with quantum
mechanics and said, what other questions can I look at? How the brain works looks good.
Made the rather naive decision to go to medical school to do that, and then within weeks
of being in medical school I discovered this artificial intelligence program at Stanford.
That's what got me started.
>> Robert Hess: So you had quantum mechanics, got disillusioned with physics; you
flipped a coin, said, gee, let's take a look at the brain. Then you go to the medical
sciences. And then you kind of get ADD or something like that and start looking at
artificial intelligence.
>> David Heckerman: After taking a few weeks of these courses in artificial intelligence
it was clear that studying the software of the brain was a much more efficient way, I
think, to study the brain than to study the hardware.
>> Robert Hess: How was artificial intelligence being taught back in those days?
>> David Heckerman: Well, it was -- they had these things called expert systems.
Basically rule-based systems where you have these complicated sets of if-then rules,
that when you chain them together in certain ways you get these complex behaviors.
There's still expert systems around today. A good example are these expert systems
that help you do your taxes.
I mean, the tax code is very complicated. But one thing that really simplifies the ability to
build these expert systems in that case is the determinism built into the tax code. If such
and such is true, then you owe this much tax with no uncertainty.
So the types of expert systems that work, those types of expert systems that work well
now and the types that were being used back then were all about situations where there
was no uncertainty.
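The chaining of deterministic if-then rules described here can be illustrated with a minimal Python sketch. The rule names and facts are hypothetical, invented purely for illustration; they are not taken from any real tax code or from the systems discussed in the interview.

```python
# Hypothetical rules: each rule is (facts that must all hold, fact to conclude).
RULES = [
    ({"income_over_threshold", "files_single"}, "owes_tax"),
    ({"owes_tax", "no_withholding"}, "must_pay_balance"),
    ({"must_pay_balance", "past_deadline"}, "owes_penalty"),
]


def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires only when every one of its conditions is known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known
```

Given the facts `income_over_threshold`, `files_single`, `no_withholding` and `past_deadline`, the three rules fire one after another and `owes_penalty` is derived. Each conclusion is either reached or not, with no degree of belief attached, which is the limitation the conversation turns to next.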
>> Robert Hess: Very clear decision tree.
>> David Heckerman: Right. But I was in medical school and wanted to build an expert
system for medicine, and there is no certainty whatsoever. Everything's uncertain. So if
you have this symptom and this symptom and this symptom you can't say therefore you
have this disease for sure; you can say therefore I have this set of possible diseases,
each with its own uncertainty.
>> Robert Hess: But back in those days, they would do an awful lot of talk about expert
systems where they could help you fix your car, for example, and they were talking
about medical ones, the concept I remember them talking about they would sit down
with a doctor who had been doing family practice for years and years and just have him
start spewing his knowledge of what he was doing, looking at a new case or something
like that. Just by traveling, okay we're going to ask you this question, this question they
would mark it down. Okay here's our expert system then. But you're saying it wasn't
quite as easy as that.
>> David Heckerman: No, especially in the medical case. Maybe for auto repair it
works a bit better. There's less uncertainty in auto repair but there's still uncertainty as
any mechanic will tell you.
But in the case of medicine there's so much uncertainty, that, for example, if you had a
doctor sit down say in my practice if someone comes in with these symptoms then I'll
treat them like this. You try to take those rules, move them to another situation, maybe to a
developing country, where the prior probabilities of various diseases are very different, the
system will fail immediately.
And there were some attempts to deal with uncertainty back then at Stanford that used
things that didn't involve probability. And coming from physics, that didn't make sense to
me. Probability seemed a perfectly decent way to handle uncertainty. And my
contribution back then was to show everyone that indeed you could use this very old
fashioned thing called probability to handle the uncertainty in expert systems. And to
do that I built several medical expert systems and showed that you could in fact build
these systems successfully.
>> Robert Hess: And so what are some of the key aspects, with probability, are you
simply then saying rather than a flip of a coin, black/white, 1-0, you're saying there's 25
percent chance it will be this answer and 30 percent chance this and so forth and
breaking it down into simple rough probabilities?
>> David Heckerman: Exactly. For example, a physician who might be using one of
these medical expert systems would enter the symptoms and what would come up is not
just a list of diseases but a list of diseases and how likely they are.
And then what you typically do is you use those likelihoods to determine what questions
are the next best questions to ask to get the most information to narrow down that
diagnosis. And often some of the questions are expensive, like what does an MRI show.
You have to be wary of the cost while you're asking the questions.
You might want to ask a question first that doesn't have as much information, but is
much less costly. So I built these systems that not only took probabilities into account
but the costs involved with doing the diagnoses and treatments.
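A small Python sketch can make this concrete: a ranked list of diseases with probabilities, followed by a choice of the next question that weighs information against cost. Everything here is an assumption for illustration. The diseases, findings, probabilities and costs are invented; findings are treated as independent given the disease; and "information" is measured as expected entropy reduction. None of this is claimed to be how the systems described in the interview were built.

```python
import math

# All numbers below are invented for illustration only.
LIKELIHOODS = {  # P(finding is present | disease)
    "flu":     {"fever": 0.8, "chills": 0.3, "blood_test": 0.02},
    "malaria": {"fever": 0.9, "chills": 0.8, "blood_test": 0.95},
}
PRIORS_CLINIC_A = {"flu": 0.95, "malaria": 0.05}  # malaria is rare here
PRIORS_CLINIC_B = {"flu": 0.50, "malaria": 0.50}  # malaria is common here
COSTS = {"chills": 1.0, "blood_test": 20.0}       # relative cost of asking


def posterior(priors, likelihoods, findings):
    """Bayes' rule, treating findings as independent given the disease."""
    scores = {}
    for disease, prior in priors.items():
        p = prior
        for finding, present in findings.items():
            p_f = likelihoods[disease][finding]
            p *= p_f if present else (1.0 - p_f)
        scores[disease] = p
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def entropy(dist):
    """Uncertainty of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_gain(priors, likelihoods, findings, question):
    """Expected drop in entropy from learning one more finding."""
    current = posterior(priors, likelihoods, findings)
    gain = entropy(current)
    for answer in (True, False):
        # Probability of this answer under the current beliefs.
        p_answer = sum(
            current[d] * (likelihoods[d][question] if answer
                          else 1.0 - likelihoods[d][question])
            for d in current
        )
        if p_answer > 0:
            updated = posterior(priors, likelihoods,
                                {**findings, question: answer})
            gain -= p_answer * entropy(updated)
    return gain


def best_question(priors, likelihoods, findings, costs):
    """Pick the unanswered question with the most expected information per unit cost."""
    candidates = [q for q in costs if q not in findings]
    return max(
        candidates,
        key=lambda q: expected_gain(priors, likelihoods, findings, q) / costs[q],
    )
```

With these made-up numbers, the same findings (fever and chills) point mostly to flu under `PRIORS_CLINIC_A` and mostly to malaria under `PRIORS_CLINIC_B`, which is the prior-probability effect described earlier. Starting from fever alone at clinic B, the blood test is the more informative question, but `best_question` picks the cheap question about chills first because it yields more information per unit cost.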
>> Robert Hess: Like rule out the 10 percent probable things because it doesn't cost
anything to rule it out.
>> David Heckerman: Exactly. If you can do that with a quick question by all means do
that before you start getting invasive.
>> Robert Hess: Did the probabilistic system get much use then and is it getting much
use now?
>> David Heckerman: Yes we actually started a couple of companies based on these
expert systems. One company was focused on specific medical expert systems, and
another company was focused on using these systems outside of medicine. For
example, jet engine repair was one of the systems we built. And actually that's what got
me to Microsoft.
I wrote a book on this system. It won some kind of award. That drew the attention of
Nathan Myhrvold, who was working here at the time. And he read it.
>> Robert Hess: Setting up a research --
>> David Heckerman: He was just starting a research group at Microsoft for the first
time. Nothing like it had existed before. And he said hmmm, expert systems, I think
there's a lot of things we can do here at Microsoft that can use these expert systems.
And so we had a chat and, of course, the first thing I told him was you're crazy, forget
about it. But I have this company here you might be interested in purchasing. So he
very cleverly said sure, come up we'll check out your company. So he got myself and
my two friends who were also working with that company, Jack Breese and Eric Horvitz.
Eric by the way did one of these things about a year back. And he got us up there.
And Microsoft was very much more impressive than we had ever expected. I was
thinking, Microsoft, DOS, mice. You know, this is 1992. What could they --
>> Robert Hess: Like out of a garage.
>> David Heckerman: What do they want probabilistic expert systems for? But there
were a lot of great people up here. We had some great discussions. And so I came up
and that was 18 years ago.
>> Robert Hess: Much smaller campus back in those days.
>> David Heckerman: It was. It was the six, seven, eight, nine campus area.
>> Robert Hess: When you first started working at Microsoft, then, was it kind of like,
okay, what am I going to do here I'm working with computers where is the card punch
machine and I'm not going to stand in line for an hour to get my cards in there just to find
out I had an error in my program.
>> David Heckerman: By the time I made it to Microsoft I was sold on computers. I had
gone through my training at Stanford doing artificial intelligence work, doing these
probabilistic expert systems.
And there were no problems finding interesting applications to work on. The very first
thing I did was a help system. It's called Answer Wizard; it's still there in Office and
Windows. It's that little box you type a question in, how do I print sideways, and it brings up the
appropriate help topic.
>> Robert Hess: Clippy?
>> David Heckerman: For a while it turned into Clippy, but then Clippy went away. It
was not Clippy first. And then it was Clippy, then it went away.
And that was just out of necessity. I came to Microsoft. I hadn't been using Microsoft
Tools. I was having trouble using Microsoft Tools. I said boy we need some kind of help
system.
The help system then was here's a bunch of text, read it. I said no, I want to be able to
type a question. So we got a bunch of experts at Microsoft who knew how to map
people's questions to help topics. I interviewed them and coded it in this probabilistic
expert system and that became Answer Wizard.
>> Robert Hess: Did you look at actual users' questions, put them in a room, have a
problem, what's the question you'd like to ask, have them ask the question and try to
figure out the answer to it?
>> David Heckerman: That's more the way it's done now, certain things are evolving
that way. Back then we just used experts. That's one of the advantages of building an
expert system is you don't have to collect a lot of data. If you have an expert around
who can encode this knowledge efficiently, that's great.
But your question is a good one, because after doing some other things at Microsoft
where there was obvious expertise, another one was troubleshooting systems. So when
you can't print, there's a path that the operating system will take you through and say,
ask you a bunch of questions to try to help you print. That's another thing we worked on.
After doing a couple of these expert systems, I was noticing that at Microsoft at least
there were far less experts and far more data around. So the technology I was using to
encode this expert knowledge was this graphical model known as a Bayesian network
that you mentioned in the introduction.
So I said I wonder if we can build these things from data instead of from experts. And
that was only after about a year of being here. And that line of work,
building these graphical models from data, occupied my next ten years of work here and
subsequent applications at Microsoft.
>> Robert Hess: How exactly does that work, though? Building expert systems out of
just data?
>> David Heckerman: Right. So instead of interviewing the expert, you kind of interview
the data, if you will. A good example is the spam filter.
My team and I -- we still think we built the very first content-based spam filter
ever. Not just at Microsoft, but anywhere in the world.
And the way you do that is you get a bunch of e-mail that you've labeled spam
and you get a bunch of e-mail that you label normal mail. And then you look at the data
in the mail. You basically extract words from the mail, phrases from the mail, special
features like what time of day did you receive the mail. How many exclamation points in
a row do you see in the mail. You get these what are called features and you ask the
question: What features discriminate one set of mail from the other. And there's
algorithmic machine learning ways to do that.
And you don't have to talk to the expert. You just let the data speak for itself. It leads to
these classifiers and as you know successful spam filtering.
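To illustrate letting labeled mail "speak for itself," here is a minimal Python sketch of a feature-based classifier. It assumes a naive Bayes model with add-one smoothing, which is one common way to do this and is not stated in the interview to be the method the team used. The feature extraction (lowercase words plus a flag for repeated exclamation points) and the training messages are hypothetical.

```python
import math
from collections import Counter


def extract_features(message):
    """Turn a message into a set of features: lowercase words plus one special flag."""
    features = set(message.lower().split())
    if "!!!" in message:
        features.add("<MANY_EXCLAMATIONS>")
    return features


def train(labeled_messages):
    """Count how often each feature appears in spam and in normal mail."""
    counts = {"spam": Counter(), "normal": Counter()}
    totals = Counter()
    for message, label in labeled_messages:
        totals[label] += 1
        counts[label].update(extract_features(message))
    return counts, totals


def spam_probability(message, counts, totals):
    """Naive Bayes estimate using only the features present in the message."""
    log_odds = math.log(totals["spam"] / totals["normal"])
    for feature in extract_features(message):
        # Add-one smoothing keeps unseen features from producing zero probabilities.
        p_spam = (counts["spam"][feature] + 1) / (totals["spam"] + 2)
        p_normal = (counts["normal"][feature] + 1) / (totals["normal"] + 2)
        log_odds += math.log(p_spam / p_normal)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny hypothetical training set; a real filter would need far more mail.
TRAINING = [
    ("Buy cheap meds now!!!", "spam"),
    ("Cheap watches buy today!!!", "spam"),
    ("Meeting moved to noon today", "normal"),
    ("Can you review my draft", "normal"),
]
COUNTS, TOTALS = train(TRAINING)
```

After training, `spam_probability("buy cheap meds", COUNTS, TOTALS)` comes out well above one half and `spam_probability("review the meeting notes", COUNTS, TOTALS)` well below it. The only human input is the spam-or-normal label on each training message.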
>> Robert Hess: To a certain extent the expert you're referring to is yourself. I mean,
when we're looking at spam we're all experts at spotting spam, a piece of spam mail
pops up we can say that's spam mail.
>> David Heckerman: Right.
>> Robert Hess: Sometimes you don't know exactly why you're saying that per se. Gee
it's asking me to buy something or get meds or go online or something like that you can
kind of see --
>> David Heckerman: The nice thing about using data, the level of expertise you need
is much, much less. You can use the Supreme Court "I know it when I see it" criterion. I
can't really tell you why this message is spam, but I know it's spam. And that's all the
human has to do.
Label the spam versus normal. The rest is done by algorithms.
>> Robert Hess: So now is your spam filter part of Exchange and Office?
>> David Heckerman: Yes it's in Hotmail. It's in many different systems, yes.
>> Robert Hess: Are you still working on it?
>> David Heckerman: No, I've now moved on to this area of biology. But actually
there's some interesting analogies between what happens in the case of spam and what
happens in the case of biology.
>> Robert Hess: Oh really? Like what?
>> David Heckerman: So let's talk about some of the work I'm doing now with HIV
vaccine design. HIV is the virus that causes AIDS.
And it's a virus. And we get lots of viruses and normally our immune systems take care
of these viruses, eliminating them from our body. But that's not the case in HIV. Get
HIV viral infection and the immune system starts to attack but it's not complete. And that
virus hangs around. And then what it does, it starts to mutate.
And in fact HIV mutates a lot. It mutates about a million times faster than we do, than
humans do. And eventually the immune system finds a path to escape -- sorry, the HIV
finds a path to escape the immune system and this can take years. But then eventually
you go on to get AIDS and you die.
So there's an interesting -- so now there's an interesting analogy between what happens
there in the case of HIV and what happens with spam filtering. In the case of spam
filtering, spammers send out their messages and we've built a spam filter to block those
messages.
And then spammers say ah these messages are being blocked what can we do to
change our message to get it through the filter? So they'll do things like spell Viagra with
a one instead of the I in the name to try to get through the filter. Then we have to do
things to counterbalance their attack. So you get this kind of adversarial thing going.
And in the case of HIV it looks very similar. You've got HIV attacking -- or the immune
system attacking HIV -- and then HIV mutates to avoid that attack.
Now you can, by analogy, you can say, well, how do we solve the problem with spam
and how might that apply to HIV? And what we did in the case of the spam filter was we
looked for the Achilles heel of the spammers, what do they have to do in order to
succeed?
In the case of spam they have to sell something. And so we use that against them. So
they have to sell something, there's going to be brand names in their e-mail. There's
going to be an attempt to extract money from you somehow.
And we can use the fact that they have to do that to make our filters better to block more
spam.
So now if you take that over to the HIV case, we can ask the question, is there a weak
link or an Achilles heel in HIV, or are there a series of them, that we might be able to
attack. And it turns out that there may indeed be such areas of vulnerability on HIV and
what we're working on now is a vaccine that works along those lines. So basically what
a vaccine does is it trains your immune system to attack before you get the infection.
So what we're doing, the idea behind the vaccine that we're working on is to show our
immune system just those points of vulnerability in HIV so that our immune system will
mount an attack to those vulnerable points and be able to make an effective response
against HIV.
>> Robert Hess: So it's kind of like smallpox. The smallpox vaccine essentially, if
my understanding is correct, gives you something that's like a dead smallpox virus that
trains your body: this is a smallpox virus, see it and attack it. But the smallpox virus can't
really attack you any more, but now gets your immune system ready for it. And you're
saying with HIV we should be able to do the same sort of thing by identifying certain
parts of an HIV chain that we can attack properly.
>> David Heckerman: The difference between HIV and smallpox is that in the case of
smallpox, smallpox does not mutate. So you can take any -- not any, but certain
specific parts of smallpox and administer them as a vaccine. Your immune system will train to
recognize those bits and attack effectively and smallpox doesn't mutate. It's dead when
your immune system has at it.
The case of HIV, however, if you were to give it the wrong portions of HIV, if you were to
train your immune system to the wrong portions of HIV, the immune system would attack
those wrong portions. HIV could then mutate around those spots and the vaccine would
be ineffective.
So what we're trying to do is to find the spots on HIV that will be effective for vaccine to
help our immune system get rid of HIV.
>> Robert Hess: Now, I still am finding this a little bit interesting to see Microsoft so
deeply involved in doing HIV research. I mean, where does your research actually play
out? Are you taking it and going down standard Microsoft channels in some fashion or are
you going straight to the medical industries?
>> David Heckerman: We are working with external collaborators. We don't have any
wet labs or fume hoods here at Microsoft, so we're very highly dependent on
collaborators across the world working with us. Some of my collaborators include Bruce
Walker. He's at Harvard. Florencia Pereyra at Harvard. Philip Goulder
at Oxford. We work with them almost on a daily basis trying to figure out these
various angles. It was actually Bruce and Philip that came up with this idea of
vulnerable spots to begin with, paralleling what we had done in the spam case. And so
it's a constant dialogue back and forth, what can we do, what experiments can we do to
try to narrow our uncertainty in what's going on with HIV. Then they do the experiments.
We analyze the data, and we go back and forth like that.
It's quite an exciting and fun experience.
>> Robert Hess: So they're essentially bringing to the table the medical side of the data
and the techniques and understanding, the wet sciences, per se, and then you're
bringing more the dry sciences, the computing power, the access to the other
researchers at Microsoft Research and e-science as well as some of the computing
parts like HPC things like that to assist in these sciences?
>> David Heckerman: Yes.
>> Robert Hess: HIV is such a strange problem and a lot of different capabilities. I think
it only makes sense then to try and attack the problem the same way it's attacking all of
us as well.
>> David Heckerman: Exactly.
>> Robert Hess: How long have you been working on the HIV problem?
>> David Heckerman: About six or seven years now.
>> Robert Hess: That's quite a while.
>> David Heckerman: Yes.
>> Robert Hess: Are you seeing big inroads being made recently or just kind of steady
progress throughout the time?
>> David Heckerman: I would characterize it more as steady progress, with an
occasional big insight. Like, for example, this insight about attacking the vulnerable
spots on HIV, I think that was a very clear potential breakthrough, and really steered our
work after that.
>> Robert Hess: Now, do you also see some of this stuff we're using to try to combat
HIV, expanding out and assisting us with other diseases as well that aren't mutatable
diseases?
>> David Heckerman: There are other viruses much like HIV and those are things we
hope our work will also be applicable to immediately. There's hepatitis C, for example,
which mutates about just as much as HIV does. It's remarkable, another remarkable
virus in a negative sense.
So that's perhaps the first thing that we would tackle. Some of the more general
concepts might also be applicable to, say, influenza, which is, still, as we know, a
problem.
>> Robert Hess: How big is the HIV problem itself? I mean, like, used to be we heard
about it all the time. I know it's still out there. Has prevalence, more prevalent than it
was before but we're not seeming to hear about it much.
>> David Heckerman: It could be simply because people are waiting to get that, to hear
about the progress. It's still very bad. 5,000 people per day die from HIV. It's less --
there's less deaths in the United States because there are effective treatments now. But
these treatments are expensive and they require that you take the treatments on a
regular basis.
And so they're not ideal for developing countries. And we all still think that the vaccine is
the greatest hope for these developing countries.
>> Robert Hess: And you think we're on the verge of having a vaccine that can actually
get rid of the HIV problem?
>> David Heckerman: I wouldn't say verge. And there's so many people that are
affected by HIV that I definitely don't want to give false hope here in this discussion.
We have a long ways to go. This idea of the vulnerable spots is a very interesting one.
We're gathering evidence to try to confirm this hypothesis. We actually have a vaccine
design that's ready to go to test this hypothesis.
But that test is going to take a long time, testing an HIV vaccine is very slow. It takes a
long time. You can't give someone a vaccine then give them HIV to see if it works. You
just can't do that. It's a very time-consuming process.
And even if this initial test works out, we know that the vaccine design we have right now
is far from perfect so there's still a lot more work to be done there.
>> Robert Hess: A lot of fine tuning and stuff.
>> David Heckerman: Yes.
>> Robert Hess: What's specifically some of the tools you're working with to assist you
in this process?
>> David Heckerman: Well, one of the things we need to do, or one of the key things we
need to do is to find these vulnerable spots. And to do that we've developed a program
called PhyloD.net. Basically what you do is you get a set of people that are infected with
HIV and you draw blood from them. Then you sequence their HIV.
You then also sequence their DNA. So now you've got the sequence of the virus and
you've got the sequence, the DNA sequence of the host. And you look for correlations
between those things.
And there surprisingly are correlations, and those correlations help you to find these
vulnerable spots on HIV. And this map here shows some examples of potential
vulnerable spots. This circle here shows one of the proteins that you find in HIV.
And then these letters out here show the points of attack of the immune system of
different people's immune systems at these various points along HIV. And then these
arcs here show how HIV, once attacked in one spot, tries to mutate in another spot to
make up for the initial attack.
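As an illustration of the correlation step described above (not something shown in the interview), the sketch below tests whether a viral mutation at one position is associated with one host immune type. It is not PhyloD's actual method, which the transcript does not detail; it assumes a hypothetical 2x2 table of patient counts and uses a one-sided Fisher's exact test. The function name and all counts are invented.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P-value for seeing at least `a` in the top-left cell of a 2x2 table.

    Hypothetical table layout:
                        mutated   not mutated
      has immune type      a           b
      lacks immune type    c           d
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)) / denom

# Made-up counts: the mutation shows up mostly in patients with the immune type.
p_associated = fisher_one_sided(8, 2, 1, 9)
# Made-up counts: the mutation is spread evenly across both groups.
p_unrelated = fisher_one_sided(5, 5, 5, 5)
print(p_associated, p_unrelated)
```

A small p-value for one (position, immune type) pair would only flag a candidate spot; as the interview goes on to say, separating vulnerable spots from decoys takes further tests.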
>> Robert Hess: I see, so each one of these identifies a location where you would try to train the body
to attack at that point?
>> David Heckerman: Right. Well, these spots here are potential vulnerable spots.
Some of them may be decoy spots. So spots where if the immune system attacks it
does no good because HIV can mutate. There's subsequent tests we can do after we
find all the potential spots to tell which ones are vulnerable and which ones are not
vulnerable. So these are all the spots and so step one is find all the spots. Step two is
find the vulnerable spots and step three is find out how the virus might mutate to recover
from being attacked at a vulnerable spot and then have the vaccine target both the
original attack point and the response, the makeup response point as well.
>> Robert Hess: So it can't recover from the attack to begin with, if you're actually taking
the background out of it?
>> David Heckerman: Right.
>> Robert Hess: This tool, this is part of the application you wrote and designed?
>> David Heckerman: Yes.
>> Robert Hess: It's using some of the stuff you were doing before?
>> David Heckerman: This approach uses the graphical models, the very same
graphical models that we used to do medical diagnosis and airplane jet engine
diagnosis and spam filtering, all that.
>> Robert Hess: More of a biological spam filter to a certain extent.
>> David Heckerman: If you will, yes.
>> Robert Hess: And what do you think the next step of this application and some of
your HIV work would be then?
>> David Heckerman: First, we want to test the concept that there are these vulnerable
spots. We have some lab evidence that suggests this is true. But we actually want to
deploy it in the field with a clinical trial. Give people this vaccine, although imperfect
vaccine, and see whether it really does help protect them against HIV.
And at the same time we're doing that, we want to find more of these vulnerable spots
and one thing we've done recently which opens up that door quite nicely is it turns out
that when your body takes DNA and turns it into protein, the proteins are the machines of
life, if you will.
When it does that translation, it can make mistakes. So it turns out that the way you turn
DNA or RNA into a protein, you read three base pairs at a time. That's what mother
nature decided.
And sometimes the machine that does that translation slips, accidentally, so you get a
frame shift error. So instead of reading these three, these three, these three, you're
reading one from the previous and two from the next and so forth. And that leads to
basically garbage proteins. And the thought was, well, maybe these garbage proteins
are also being attacked by the immune system.
And if they were, they are either potential vulnerable spots or maybe they're potential
decoy spots that you want to avoid putting in a vaccine. In either case we want to know if
they're there. So we used the same tool, PhyloD.net, and applied it to looking for
vulnerable spots in the garbage proteins. Lo and behold they were there. And, in fact,
we found just as many spots in the garbage proteins as we're finding in the normal
proteins.
So this opens up a whole new avenue of work for the vaccine design.
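The frame shift described above can be sketched in a few lines. This is only an illustration: the sequence is made up (not real HIV data), `read_codons` is a hypothetical helper, and a real slip happens partway through translation rather than from the start as it does here.

```python
def read_codons(seq: str, offset: int = 0) -> list[str]:
    """Split a sequence into complete three-letter groups, starting at `offset`."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

sequence = "ATGGCCTTAGGA"  # made-up sequence for illustration

# Normal reading: these three, these three, these three.
in_frame = read_codons(sequence)

# Shifted reading: one letter from the previous group and two from the next.
shifted = read_codons(sequence, offset=2)

print(in_frame)  # ['ATG', 'GCC', 'TTA', 'GGA']
print(shifted)   # ['GGC', 'CTT', 'AGG']
```

Every group in the shifted reading differs from the in-frame one, which is why the resulting protein is effectively a different ("garbage") protein.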
>> Robert Hess: Now, the vaccine design, would it be something that would need to
address like a bunch of spots at once or do you have several different vaccines that you
need to give in a cycle?
>> David Heckerman: Especially in developing countries you want to try to develop
something that is literally one shot.
You want to be able to give them one shot. You may never see these people again. So
you want to take care of everybody at once, so to speak, with one shot.
And since different people have different immune systems, they will target different spots
on HIV. So you actually have to build a vaccine that's a cocktail where some
portions of the cocktail work for one person, other portions of the cocktail work for
another person. And that's sort of the avenue we're going down right now.
>> Robert Hess: So you have this complex HIV strand, but then you've also got, in
someone else, another complexity associated with them as well. So you've got to
address everything at once?
>> David Heckerman: That's right. If you're going to make an effective vaccine you want to
cover a lot of people at once.
>> Robert Hess: What do you think is next for you?
>> David Heckerman: Well, I really love working in this area of biology -- at the interface
of biology and computer science. And one of the -- I'm really putting a lot of effort into
the HIV vaccine. But another thing that's very interesting is looking for genetic causes of
disease.
It turns out now that it's very inexpensive to sequence our DNA. So if you can do that
inexpensively then you can get a bunch of people who, some of whom have disease,
some of whom don't have a disease then just compare their DNA and see, well, what are
the differences. If you can isolate what sections of DNA are responsible for that disease
you can develop, perhaps, drug treatments or other interventions or maybe even cures
for the disease.
So I've been working with my team in the e-science group to develop methods that can
do just that.
>> Robert Hess: And what exactly is the e-science group? I know Microsoft Research, we
have the big group of them doing all sorts of research from surface to mice to medical
things. What exactly is e-science?
>> David Heckerman: The purpose behind the e-science group is to just explore this
wonderful convergence between not only biology and computer science, but all the
sciences and computer science.
So, for example, not only do we have this biological work going on in the e-science
group, but we have work looking at carbon climate. The global warming phenomenon.
And there, as you might guess, there's a lot of data coming in from, say, satellites, from
measurements all around the globe and you want to process this data and try to infer
various models for what's causing global warming and explore or look at what various
interventions can do to mitigate those problems.
>> Robert Hess: So it's more things that you wouldn't necessarily immediately think
Microsoft's involved in this technology, or our technology might assist that process, but
global warming, HIV, genetic diseases, aren't the sort of thing that you would expect
Microsoft to actually be spending money, time and research on to actually solve that
problem but it's still important for everyone in the world.
>> David Heckerman: But when you recognize that these problems -- by the way,
another one we're working on is energy production. Turns out sugarcane is a great
source of energy. And wouldn't it be nice if we could make sugarcane produce even
more energy and try to rid our dependence on oil?
So it turns out that all these very important problems to society involve or really need
computer science to help solve those problems. You've got data management
problems. Data analysis problems. Tons and tons of data. You've got to use
computation to help with these problems.
>> Robert Hess: Are you doing much with the HPC program as well? The high
powered computers and things which are designed for doing large data modeling and
large data processing, is that playing a role in all of this?
>> David Heckerman: Absolutely. HPC is key for our work. For example, this PhyloD
program that we've been talking about, to analyze just one position on HIV and one
immune system type takes a few seconds. But there's 3,000 positions on HIV and
there's hundreds of different immune system types.
And so the amount of computation to do this on one machine is about, it would take one
or two years to do the computations. But these computations fortunately in our case are
what we like to call pleasantly parallel. So we use HPC. And we can get this stuff done
over a cup of coffee.
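"Pleasantly parallel" here means that each (position, immune type) analysis is independent of the others, so the work can be split across machines with no coordination. The sketch below shows only that structure: `analyze` is a hypothetical placeholder with dummy arithmetic, the task list is tiny, and local threads stand in for the cluster nodes an HPC system would actually use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def analyze(task: tuple[int, str]) -> tuple[int, str, float]:
    """Placeholder for one independent (position, immune type) analysis."""
    position, immune_type = task
    score = (position * 31 + len(immune_type)) % 7 / 7  # dummy value, not a real statistic
    return position, immune_type, score

positions = range(1, 11)                       # stand-in for the ~3,000 positions
immune_types = ["type_A", "type_B", "type_C"]  # stand-in for hundreds of types
tasks = list(product(positions, immune_types))

# One machine, one task at a time.
serial = [analyze(t) for t in tasks]

# Same tasks handed to several workers; no task needs another task's result.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(analyze, tasks))

print(len(tasks), serial == parallel)
```

Because the results are identical either way, adding workers only changes how long the whole batch takes.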
>> Robert Hess: Now, from a personal side, what do you do in your personal life?
>> David Heckerman: Well, I work on HIV vaccine design. Work on genetic causes of
disease. I have a great family and spend my time with them. I love them. I have -- my
wife and two kids, 11 and 13. We're just having a blast.
>> Robert Hess: What are the hobbies you have?
>> David Heckerman: I play music. I play guitar, piano. Play trumpet. I'm still in a band
that reaches occasionally like once every year.
>> Robert Hess: Seattle punk rock bands.
>> David Heckerman: I'm from LA. We're still -- it's a '70s band. We never broke up.
>> Robert Hess: So you kept on going from back in the '70s.
>> David Heckerman: That's right.
>> Robert Hess: Really? The same people.
>> David Heckerman: Same people.
>> Robert Hess: What type of music do you play?
>> David Heckerman: Chicago, Tower of Power, rhythm and blues kind of things.
>> Robert Hess: Do you dress up in costumes?
>> David Heckerman: Actually, there were a few times when we very reluctantly did
that. But we actually got started before the disco era. And so disco was a very bad
thing for us. And all that dressing up stuff was not -- we didn't like that.
>> Robert Hess: Bad blood from those days?
>> David Heckerman: Yeah.
>> Robert Hess: And you mentioned you're actually living down and working down in
LA. Most people realize that Microsoft's main campus is up here in Redmond where we
are right now. How awkward is that to take and be working remotely like that in a small
team that's not in with the regular group?
>> David Heckerman: It's not awkward at all. Some of us are down there. Some of us
are up here. You know with all the Microsoft tools we have for remote collaboration, it's
not a problem at all.
>> Robert Hess: So technology, again, assists you in that.
>> David Heckerman: Thank you Microsoft for making it easy for us to do this.
>> Robert Hess: Now, of course, down in the LA area a lot of companies are down
there that are non-Microsoft technology companies. Do you have any problems, issues
working with people down there? Or does it feel like you're in the lion's den or
something like that?
>> David Heckerman: No, not at all. In fact, most of our collaborators not at Microsoft
are the biologists scattered all over the place, Harvard, Oxford, Australia, all over. South
Africa. They're all over.
>> Robert Hess: It's a global community.
>> David Heckerman: The only problem is the earth is round. I mean, the time change
is the only problem.
>> Robert Hess: Yeah, can't fix that with technology, can we?
Now, we've got a couple of questions we like asking all of our guests to get a nice pulse
of what they are. They're always the same sort of questions. We call them our mantra
questions. So what book would you like to recommend everyone to read?
>> David Heckerman: Well, the first one that comes to mind, it's a very old book, but it
really opened my eyes. It's called Gödel, Escher, Bach by Hofstadter.
>> Robert Hess: Know it well.
>> David Heckerman: It actually -- it's a fairly religious book, because the main premise
of the book is that you and I are machines, consciousness is generated from information
processing.
And that's a lot to swallow, I think, for some people. But he makes a very compelling
argument for it. And I think whether or not you believe that, certainly just suspending
disbelief for a moment and assuming that's true, you can, I think, make remarkable
progress in understanding how our intellect and how our brains work.
>> Robert Hess: Plus I think just having that discussion with yourself and the book I
think takes and helps you understand exactly where your position is and what things you
want to believe in as well from that standpoint. It's a thick book. It's a hard read. But it
definitely is something I also would recommend to people to get to.
And next, if you weren't working in the computer industry, what do you think you'd be
doing?
>> David Heckerman: Okay. Well, that's somewhat an easy question, because I'm kind
of not working in the computer industry. So I'd be doing the same thing. But if I wasn't
working on computers or biology.
>> Robert Hess: If you weren't doing what you're doing now.
>> David Heckerman: If I'm not doing what I'm doing now.
>> Robert Hess: Be a '70s band.
>> David Heckerman: I could very likely be in a band. I could see doing that. I love
music. I could be -- I could be doing mathematics, some other form of mathematics
proving something.
>> Robert Hess: Quantum physics?
>> David Heckerman: Actually, yeah, I would be -- if I couldn't do this right now, I think I
would go back and look at these very weird things that happen in the quantum world. I'd
try to understand those better.
>> Robert Hess: What do you feel is the most important technology in today's world?
>> David Heckerman: Well, I think it's not known that it exists by most people right now,
but I think it's definitely, in my book, it's the most important. And that is our ability now to
look at biological sequences, both DNA and RNA proteins.
We can do it now. For example, you can pay a company like
Navigenics or 23andMe a couple hundred bucks, they send you a kit. You spit in a tube.
They send you back a list of 500,000 different base pairs on your DNA.
Very informative base pairs that tell you something like you're susceptible to this disease
or you're not or you are susceptible to this drug or this drug would work well for you.
Just an amazing technology. Again, there's this better than Moore's law phenomenon
happening in biology now where you're getting a doubling in the amount of data that you
can produce every ten months.
And so soon we'll be able to sequence our whole genome, if you get cancer we'll be able
to sequence the genome of your normal cells and compare them to the genome of your
cancer cells and figure out exactly what's gone wrong in the cancer, maybe develop a
vaccine or a treatment specifically for your cancer.
It's right here, right now. It's happening right now.
>> Robert Hess: I think we're just at the tip of the iceberg. Just barely got to the point of
that book of you and now we understand what we can do with that book of you.
>> David Heckerman: Absolutely. It's just amazing what's going on.
>> Robert Hess: 25 years from now, what new technology would you like to see be
available?
>> David Heckerman: Well, this is a follow-on of what I just mentioned. I think within 25
years, if we follow this line of sequencing, understanding what's going on with our
bodies, I think we'll begin to understand how we can beat aging. Not completely. I think
death and taxes will still be the only things that are not uncertain in our lives. But I think
we'll be able to -- there will be technologies out there, either gene therapies or
treatments or something, we'll find the genes that lead to, that protect us more so than
we're protected right now from the ravages of time.
And we'll be able to -- I would think most people should be able to live to at least 100.
And what's good about that is if you get -- if you can increase the life span of people that
much, imagine what new scientific breakthroughs we're going to have. I mean, I'm
amazed at what scientific breakthroughs we had in the Renaissance with people dying at
the age of 50 and 60. And the longer our life spans are going to get, the more a single
individual is going to be able to develop enough knowledge and wisdom over their
lifetime in order to come up with some really new amazing breakthroughs that we
couldn't even imagine before. So there's kind of a positive feedback loop with this
technology.
>> Robert Hess: You think aging itself could possibly be reduced?
>> David Heckerman: Yes.
>> Robert Hess: Because my understanding basically it's like an unwinding, relaxing of
the proteins or something that's causing some of the --
>> David Heckerman: That's one theory. Chromosomal damage. Oxidative stress.
There's lots of theories out there. But if a parrot can live to a hundred years, why can't
humans? It's not impossible. It's the same biological system. Presumably there's some
protective mechanisms. We already know there's protective mechanisms that are
constantly repairing our DNA when it gets damaged.
Probably there's some other repair mechanisms that are available out there that mother
nature has created, for example, in the parrot that we can use for humans to extend their
lives as well.
>> Robert Hess: It's not just eating more crackers, right? [laughter].
>> David Heckerman: Although you might be able to put the drug in a cracker. Nice
touch.
>> Robert Hess: Lastly, this is kind of the fun part that all my guests really love to get
into. Is testing out their creative skills. We'd like you to draw and then explain your
favorite data structure and be sure to sign it.
>> David Heckerman: Absolutely. So maybe you can guess what my favorite data
structure is. I've mentioned it several times throughout the talk. It's the graphical model
or the Bayesian network.
So, first of all, let me tell you what such a thing does. Basically a graphical model or
Bayesian network is a representation that allows you to encode what's called a joint
probability distribution.
So when you have a complex problem, you have all sorts of variables. And these
variables have different states. And what you want to know, ultimately, is the probability
of every possible combination of those states. So let's just do something very simple
here. Let's suppose there are three variables and so we'll say there's variable X1, X2
and X3 and just for the sake of argument here, let's just say that each of them have two
states. Either on or off.
Okay. So there's eight possible states. Two here, two here. Two here. Eight possible
states and what we want to know is the probability of all those eight states. So we are
interested in the probability of X1, X2 and X3 and that's how we write it. Now,
one thing to do is just list them. List in this case eight probabilities. But if I have N
variables here and they're all binary, suddenly, I have to list two to the N numbers, and
you can't do that.
So a graphical model helps us do that and it takes advantage of something called
conditional independence, which I'll get to in a moment. So one way you can write this
probability, one of the rules of probability is called the product rule. So I can
rewrite this probability as the probability of the first variable times the probability of the
second variable, given that I know what the value of the first variable is. That's what that
[inaudible] stroke means, it means given. Times the probability of the third variable given
the first two variables.
Okay. So these are just the same thing. And there's no magic here. There's two
numbers here -- well, one number because they have to sum to one.
There's two numbers here and four numbers here. That's seven. These numbers have
to sum to one. These probabilities have to sum to one. So there's seven numbers here
and seven numbers here so I haven't done anything fancy yet.
But there's this thing called conditional independence, which says that, well, what if the
probability of X3 only depends on X1 and it doesn't depend on X2? So basically you can
cross that term out of this probability expression. So now we have less numbers to deal
with. And you can encode these eight numbers with fewer numbers down here.
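Written out, the expressions being drawn at this point would look roughly as follows. The notation is an assumption, since the drawing itself is not in the transcript; the counts follow the spoken tally of one, two and four numbers for binary variables.

```latex
% Product rule: always valid, no independence assumed.
P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2)
% Free numbers: 1 + 2 + 4 = 7 = 2^3 - 1

% With the conditional independence P(X_3 \mid X_1, X_2) = P(X_3 \mid X_1):
P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)
% Free numbers: 1 + 2 + 2 = 5
```

With three binary variables the saving is small, seven numbers down to five, but the same cancellation applies at every variable in a larger model.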
Now the Bayesian network is a graphical way to represent this and it's very convenient in
practice. So if I were to represent the Bayesian network for the first thing I drew, it would
look like this, X1 points to X2, points to X3.
All right. And the other thing that X3 depends on is X1. So I have another arc here. So
basically whenever you have a probability for a variable depending on another variable,
you just draw an arc from the conditioning variable to the target variable. As shown
here. So this graph, it's complete, it says there's no independence and so it's not that
useful. But here, if you do have this independence where X2 does not influence the
probability of X3, I can remove this arc. And now I have a simpler graph and it allows
me to code this probability distribution in a much simpler way.
You're probably thinking who cares, this is just three variables. But when you have a
thousand variables, then usually you have a lot of conditional independence so you can
dramatically reduce the amount of information needed to represent the joint distribution.
And when you can do that, you can start using the computer to answer all sorts of
questions like what's the probability of X7 and X8 given X2 and X3 and that's the thing
you need when you build an expert system.
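The simplified three-variable network from this explanation (X1 points to X2, X1 points to X3, no arc from X2 to X3) can be sketched in code. The probabilities below are invented for illustration, and the brute-force enumeration is a minimal sketch of the idea, not how a production expert system would do inference.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration.
p_x1 = 0.3                               # P(X1 = on)
p_x2_given_x1 = {True: 0.8, False: 0.1}  # P(X2 = on | X1)
p_x3_given_x1 = {True: 0.6, False: 0.2}  # P(X3 = on | X1); X2 is not needed

def bernoulli(p_on: float, state: bool) -> float:
    """Probability of `state` for a two-state variable with P(on) = p_on."""
    return p_on if state else 1.0 - p_on

def joint(x1: bool, x2: bool, x3: bool) -> float:
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X1)."""
    return (bernoulli(p_x1, x1)
            * bernoulli(p_x2_given_x1[x1], x2)
            * bernoulli(p_x3_given_x1[x1], x3))

# Five stored numbers determine all eight joint probabilities.
states = list(product([True, False], repeat=3))
total = sum(joint(*s) for s in states)

# Example query by enumeration: P(X1 = on | X3 = on).
numerator = sum(joint(True, x2, True) for x2 in (True, False))
evidence = sum(joint(x1, x2, True) for x1, x2 in product([True, False], repeat=2))
posterior = numerator / evidence

print(total, posterior)
```

Enumerating every state is fine for three variables; with a thousand variables the point of the graph structure is that inference can exploit the missing arcs instead of listing all combinations.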
>> Robert Hess: Thank you. Make sure you sign it. We'll get your John Hancock. We'll
post it on the wall, maybe sell it on eBay or something like that. Thank you, David, for
the [indiscernible] network and being our guest today. We hope all of you enjoyed this
chance to place technology and the person behind the code.