[music] >> Robert Hess: There are a wide variety of machines which form a regular part of our daily lives, from something as simple as a paperclip to machines which are far more complex, such as a computer. None of these machines, however, comes close to the complexity that's found in the machine that is the human being. When you think of Microsoft, you might simply consider the various ways they focus on technologies related to computers. Microsoft also plays a role in the human equation. And this isn't just in trying to design better user interfaces or ergonomic hardware. Some of the same structural models in data filtering algorithms can also find application at a biological level, assisting us in better understanding our own selves as well as diseases which often impact us. One person at Microsoft who is finding new ways to apply advanced algorithms to our own biology is David Heckerman. Hello, I'm Robert Hess and I'll be your host today as we talk with David Heckerman, distinguished scientist in the e-science group at Microsoft. Hope you enjoy this opportunity to look at the technology and the person behind the code. David began his education with the intent of becoming a physicist, but found his interests led him eventually into the medical sciences. It was at Stanford, while working on his MD, that he began looking at the problems of artificial intelligence. He submitted for his Ph.D. work an impressive construct that he called the Probabilistic Expert System. It was so impressive that Microsoft hired him in 1992 to build such systems for non-medical applications. His work at Microsoft began to lead him further and further from his original medical focus. One of his pioneering areas of study was graphical models known as Bayesian networks. And it was while working on these models that he recognized how this could also be applied to medicine and biology.
Today, his efforts have allowed him to return to his medical education roots and use them to design such things as a vaccine for HIV and to search for genetic causes of diseases. Join me now as I welcome today's guest, David Heckerman. >> David Heckerman: Robert, pleasure to be here. >> Robert Hess: Glad to have you here. Now, the thing I've got to get out of the way to begin with is that Microsoft is a technology company; I don't really think of Microsoft as being a medical company. Do you find that problem occurring periodically when you go around telling people who you work for? >> David Heckerman: Yes, when I explain to someone, say I'm in an elevator or something, and I tell them I'm working on a vaccine for HIV, or looking for genetic causes of disease, they say: Really, why? So there's actually several reasons. One is that I have this background in medicine. So it's just generally of interest to me. Another reason is some of the things I'm working on, like a vaccine for HIV, it's just a very important problem to tackle. HIV is the virus that causes AIDS, which kills over 5,000 people a day. And so if computation can be useful to combat that disease, that's great. I'm all in. Probably more relevant to Microsoft is that there's this convergence, a real growing convergence, between computer science and biology. There's some obvious things, like we now understand that DNA is a programming language. It programs the machines of life, the proteins that make us up. And another issue is that there's an exponential growth in the amount of data that's being produced in the biological sciences now. There's basically a doubling every ten months, which is even outpacing Moore's law. So when you have all this data you really want to do something with it. You've got to manage it, you've got to analyze it, and computation has got to be there. >> Robert Hess: It's an obvious connection then. >> David Heckerman: There's an obvious connection, absolutely.
>> Robert Hess: How did you first get started on your focus in the Ph.D. and medical and stuff like that? >> David Heckerman: Well, first I thought I was going to be a physicist. It was almost a religious quest. I wanted to understand how the universe worked, how the universe came to be. >> Robert Hess: When you figure it out, let me know. >> David Heckerman: Well, yeah, that's the problem. I went through my physics classes and then I hit this thing called quantum mechanics where basically they tell you we don't know what's going on. And maybe even the universe doesn't know what's going on. So that was a very unsatisfactory end to my quest. I said, what else can I do? I thought, well, learning how the brain works is a very interesting thing to do, so I'll do that. And rather naively I thought, well, if I'm going to learn how the brain works I should study the brain, go to med school. In retrospect that was pretty silly, but it worked out very well for me because I ended up going to medical school at Stanford. And there was this new thing called artificial intelligence just getting going. And I started just ditching my med school classes every once in a while and going to these artificial intelligence classes. I said, yeah, this is the way to study how the brain works. And I was hooked. And I basically got a Ph.D. in artificial intelligence. >> Robert Hess: Now, your interests in these areas, did they begin in childhood, or was it in high school and college that you started thinking about becoming a physicist or medical doctor and stuff like that? >> David Heckerman: It was when I went for my undergraduate degree. I didn't know what I was going to do. In high school I liked everything. I liked music. I liked math. I liked science. I liked English. Just had a blast doing all that stuff. And I guess it was a tradition, at least when I went to college, if you knew a lot of math and you were just generally good, start with physics and then go from there.
So that's what I did. >> Robert Hess: What did your parents do? Did they move you in that direction at all? >> David Heckerman: Definitely. My dad was a high school math teacher. And I can remember as early as five, six years old him sitting down with me and doing sequences and things like that, and enjoying that sort of thing. >> Robert Hess: Fond memories of doing sequences. >> David Heckerman: He was a great math teacher, and certainly to this day a lot of what I do involves mathematics. So his influence is there. And my mom was a stay-at-home mom and just very supportive. >> Robert Hess: Was there a computer in your childhood at all? >> David Heckerman: The very first time I experienced a computer was maybe as a sophomore at UCLA, as an undergrad. And that was one of the worst experiences I've ever had in my life. It was one of these central computing systems where you had to use cards. So you'd sit there and you'd type out your cards, guess as to whether the program was going to work. Wait in line for about an hour and a half. Get to the front of the line. Put the cards in the deck, push the button and it would come out and say error. So you'd have to go back, figure out, take your best guess what the error was, go back in line, wait another hour and a half. It was a miserable experience. >> Robert Hess: And how much time did you spend doing that, like a couple of weeks and you were done with it? >> David Heckerman: I took one class and I said, okay, I've now experienced computers, that's very nice, now I'll move on to something more useful. >> Robert Hess: If someone told you then you'd be working at a company that specialized in computers you would have -- >> David Heckerman: The thing that really got me into computers in a positive way was when I was still a physicist at UCLA. My next-door neighbor was a geneticist. And he was collecting all this data that would allow someone to estimate the probability that a person was the father of a given child.
He says: I've got all this data. I don't know computers. Can you help out? And I didn't know computers either, but I said this sounds like a pretty easy math problem, so I taught myself the genetics. I taught myself computer science. I guess the first program I wrote was in Basic on an Apple II, right around 1979. So I wrote this program. I think it's still used from time to time today. But that was a real positive experience. And ironically what am I doing now? I'm doing mathematics and computers around genetics and so forth. >> Robert Hess: The exact same thing. >> David Heckerman: Come full circle. >> Robert Hess: Now, what were your experiences like at UCLA and Stanford, first being a physicist then becoming a medical doctor? How was that training preparing you for what you're doing today? >> David Heckerman: What I do today is basically analyze very, very complex data. And to do that, you need a mathematical background. And certainly the mathematical background I got in physics and then in computer science from Stanford played a very large role. I still remember all this. Every day I remember back to some lesson I learned in those days that's still relevant now. >> Robert Hess: Now, while you were studying to become a physicist, were there things you were doing that were maybe leaning you towards what you'd be doing in biology? >> David Heckerman: Actually, not at all. I remember taking one biology course. And my experience there was much like my experience in that computer science course. >> Robert Hess: Very positive one, otherwise? >> David Heckerman: Yeah. Lots of memorization. In fact, I remember being taught how to memorize the Krebs cycle. And one of the compounds in the Krebs cycle is alpha ketoglutarate. You'd say 2, 4, 6, 8, alpha ketoglutarate. And that was the level of education or level of understanding that was available then in biology. >> Robert Hess: You still remember it today. Hey, obviously it did some work.
>> David Heckerman: Yeah. But now today, when you study biology, you get these intricate diagrams, machine-like, engineering-like diagrams of how one molecule fits into another molecule and how these things interact together. It's much more of a scientific, actually a combined science and engineering, discipline than before, when it was just a bunch of memorization. >> Robert Hess: Actually understanding the role it plays in your life rather than just memorizing stuff. That was the problem I had with school when I was growing up, with math, they'd teach you all these math equations, you didn't know how they would be applicable. And today I go, I wish I would have paid more attention in math class because I could use that now. >> David Heckerman: Back then you got all that stuff to memorize. There were no places to hook in the statistics or the mathematics. Now there's enough understanding, yet enough uncertainty, that you can bring in these mathematical tools to help further the understanding. >> Robert Hess: Now, what year was this, about, when you were doing your switch from physics to medical sciences? >> David Heckerman: Around 1980. This is when I was fed up with quantum mechanics and said, what other questions can I look at? How the brain works looks good. Made the rather naive decision to go to medical school to do that, and then within weeks of being in medical school I discovered this artificial intelligence program at Stanford. That's what got me started. >> Robert Hess: So you had quantum mechanics, got disillusioned with physics; you flipped a coin, said, gee, let's take a look at the brain. Then you go to the medical sciences. And then you kind of get ADD or something like that and start looking at artificial intelligence. >> David Heckerman: After taking a few weeks of these courses in artificial intelligence it was clear that studying the software of the brain was a much more efficient way, I think, to study the brain than to study the hardware.
>> Robert Hess: How was artificial intelligence being taught back in those days? >> David Heckerman: Well, it was -- they had these things called expert systems. Basically rule-based systems where you have these complicated sets of if-then rules, that when you chain them together in certain ways you get these complex behaviors. There's still expert systems around today. A good example is these expert systems that help you do your taxes. I mean, the tax code is very complicated. But one thing that really simplifies the ability to build these expert systems in that case is the determinism built into the tax code. If such and such is true, then you owe this much tax, with no uncertainty. So the types of expert systems that work well now and the types that were being used back then were all about situations where there was no uncertainty. >> Robert Hess: Very clear decision tree. >> David Heckerman: Right. But I was in medical school and wanted to build an expert system for medicine, and there is no certainty whatsoever. Everything's uncertain. So if you have this symptom and this symptom and this symptom, you can't say therefore you have this disease for sure. You can say therefore you have this set of possible diseases, each with its own uncertainty. >> Robert Hess: But back in those days, they would do an awful lot of talk about expert systems where they could help you fix your car, for example, and they were talking about medical ones. The concept I remember them talking about, they would sit down with a doctor who had been doing family practice for years and years and just have him start spewing his knowledge of what he was doing when looking at a new case or something like that. Just by traveling, okay we're going to ask you this question, this question, they would mark it down. Okay, here's our expert system then. But you're saying it wasn't quite as easy as that. >> David Heckerman: No, especially in the medical case.
Maybe for auto repair it works a bit better. There's less uncertainty in auto repair, but there's still uncertainty, as any mechanic will tell you. But in the case of medicine there's so much uncertainty that, for example, if you had a doctor sit down and say, in my practice if someone comes in with these symptoms then I'll treat them like this, and you try to take those rules and move them to another situation, maybe to a developing country, where the prior probabilities of various diseases are very different, the system will fail immediately. And there were some attempts to deal with uncertainty back then at Stanford that used things that didn't involve probability. And coming from physics, that didn't make sense to me. Probability seemed a perfectly decent way to handle uncertainty. And my contribution back then was to show everyone that indeed you could use this very old fashioned thing called probability to handle the uncertainty in expert systems. And to do that I built several medical expert systems and showed that you could in fact build these systems successfully. >> Robert Hess: And so what are some of the key aspects? With probability, are you simply then saying, rather than a flip of a coin, black/white, 1-0, you're saying there's a 25 percent chance it will be this answer and a 30 percent chance this and so forth, and breaking it down into simple rough probabilities? >> David Heckerman: Exactly. For example, a physician who might be using one of these medical expert systems would enter the symptoms and what would come up is not just a list of diseases but a list of diseases and how likely they are. And then what you typically do is you use those likelihoods to determine what the next best questions are to ask to get the most information to narrow down that diagnosis. And often some of the questions are expensive, like what does an MRI show. You have to be wary of the cost while you're asking the questions.
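[Editor's sketch, not part of the spoken interview: the idea of returning "a list of diseases and how likely they are", and of the same rules failing where the prior probabilities differ, can be illustrated with Bayes' rule. The disease names, symptom names, and every number below are invented for illustration, and the independence assumption between symptoms is a simplification; this is not a description of the systems Heckerman built.]

```python
def posterior(priors, likelihoods, symptoms):
    """Return P(disease | symptoms), assuming symptoms are
    conditionally independent given the disease (a simplification)."""
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for symptom in symptoms:
            score *= likelihoods[disease][symptom]
        scores[disease] = score
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


def ranked(priors, likelihoods, symptoms):
    """Diseases sorted from most to least probable."""
    post = posterior(priors, likelihoods, symptoms)
    return sorted(post.items(), key=lambda item: item[1], reverse=True)


# Hypothetical P(symptom | disease) values, identical in both settings.
LIKELIHOODS = {
    "flu": {"fever": 0.90, "cough": 0.80},
    "malaria": {"fever": 0.95, "cough": 0.20},
}

# Hypothetical priors for two settings where disease prevalence differs.
PRIORS_CLINIC_A = {"flu": 0.95, "malaria": 0.05}
PRIORS_CLINIC_B = {"flu": 0.40, "malaria": 0.60}

for name, priors in [("A", PRIORS_CLINIC_A), ("B", PRIORS_CLINIC_B)]:
    print(name, ranked(priors, LIKELIHOODS, ["fever"]))
```

[With the same symptom and the same likelihoods, the top-ranked diagnosis changes when only the priors change, which is the failure mode a fixed rule set cannot capture.]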
You might want to ask a question first that doesn't have as much information, but is much less costly. So I built these systems that not only took probabilities into account but the costs involved with doing the diagnoses and treatments. >> Robert Hess: Like rule out the 10 percent probable things because it doesn't cost anything to rule them out. >> David Heckerman: Exactly. If you can do that with a quick question, by all means do that before you start getting invasive. >> Robert Hess: Did the probabilistic system get much use then, and is it getting much use now? >> David Heckerman: Yes, we actually started a couple of companies based on these expert systems. One company was focused on specific medical expert systems, and another company was focused on using these systems outside of medicine. For example, jet engine repair was one of the systems we built. And actually that's what got me to Microsoft. I wrote a book on this system. It won some kind of award. That drew the attention of Nathan Myhrvold, who was working here at the time. And he read it. >> Robert Hess: Setting up a research -- >> David Heckerman: He was just starting a research group at Microsoft for the first time. Nothing like it had existed before. And he said, hmmm, expert systems, I think there's a lot of things we can do here at Microsoft that can use these expert systems. And so we had a chat and, of course, the first thing I told him was, you're crazy, forget about it. But I have this company here you might be interested in purchasing. So he very cleverly said, sure, come up, we'll check out your company. So he got myself and my two friends who were also working with that company, Jack Breese and Eric Horvitz. Eric by the way did one of these things about a year back. And he got us up there. And Microsoft was very much more impressive than we had ever expected. I was thinking, Microsoft, DOS, mice. You know, this is 1992. What could they -- >> Robert Hess: Like out of a garage.
>> David Heckerman: What would they do with probabilistic expert systems? But there were a lot of great people up here. We had some great discussions. And so I came up, and that was 18 years ago. >> Robert Hess: Much smaller campus back in those days. >> David Heckerman: It was. It was the six, seven, eight, nine campus area. >> Robert Hess: When you first started working at Microsoft, then, was it kind of like, okay, what am I going to do here, I'm working with computers, where is the card punch machine, and I'm not going to stand in line for an hour to get my cards in there just to find out I had an error in my program? >> David Heckerman: By the time I made it to Microsoft I was sold on computers. I had gone through my training at Stanford doing artificial intelligence work, doing these probabilistic expert systems. And there were no problems finding interesting applications to work on. The very first thing I did was a help system. It's called Answer Wizard. It's still there in Office and Windows. It's that little box you type a question in, how do I print sideways, and it brings up the appropriate help topic. >> Robert Hess: Clippy? >> David Heckerman: For a while it turned into Clippy, but then Clippy went away. It was not Clippy first. And then it was Clippy, then it went away. And that was just out of necessity. I came to Microsoft. I hadn't been using Microsoft tools. I was having trouble using Microsoft tools. I said, boy, we need some kind of help system. The help system then was, here's a bunch of text, read it. I said no, I want to be able to type a question. So we got a bunch of experts at Microsoft who knew how to map people's questions to help topics. I interviewed them and coded it in this probabilistic expert system and that became Answer Wizard. >> Robert Hess: Did you look at actual users' questions, put them in a room, have a problem, what's the question you'd like to ask, have them ask the question and try to figure out the answer to it?
>> David Heckerman: That's more the way it's done now, certain things are evolving that way. Back then we just used experts. That's one of the advantages of building an expert system, you don't have to collect a lot of data. If you have an expert around who can encode this knowledge efficiently, that's great. But your question is a good one, because after doing some other things at Microsoft where there was obvious expertise, another one was troubleshooting systems. So when you can't print, there's a path that the operating system will take you through and ask you a bunch of questions to try to help you print. That's another thing we worked on. After doing a couple of these expert systems, I was noticing that at Microsoft at least there were far fewer experts and far more data around. So the technology I was using to encode this expert knowledge was this graphical model known as a Bayesian network that you mentioned in the introduction. So I said, I wonder if we can build these things from data instead of from experts. And that was only after about a year of being here. And that line of work, building these graphical models from data, occupied my next ten years of work here and subsequent applications at Microsoft. >> Robert Hess: How exactly does that work, though? Building expert systems out of just data? >> David Heckerman: Right. So instead of interviewing the expert, you kind of interview the data, if you will. A good example is the spam filter. My team and I -- we still think we built the very first content-based spam filter ever. Not just at Microsoft, but anywhere in the world. And the way you do that is you get a bunch of e-mail that you've labeled spam and you get a bunch of e-mail that you label normal mail. And then you look at the data in the mail. You basically extract words from the mail, phrases from the mail, special features like what time of day did you receive the mail.
How many exclamation points in a row do you see in the mail. You get these what are called features and you ask the question: What features discriminate one set of mail from the other? And there are algorithmic, machine learning ways to do that. And you don't have to talk to the expert. You just let the data speak for itself. It leads to these classifiers and, as you know, successful spam filtering. >> Robert Hess: To a certain extent the expert you're referring to is yourself. I mean, when we're looking at spam we're all experts at spotting spam. A piece of spam mail pops up, we can say that's spam mail. >> David Heckerman: Right. >> Robert Hess: Sometimes you don't know exactly why you're saying that per se. Gee, it's asking me to buy something or get meds or go online or something like that, you can kind of see -- >> David Heckerman: The nice thing about using data, the level of expertise you need is much, much less. You can use the Supreme Court "I know it when I see it" criterion. I can't really tell you why this message is spam, but I know it's spam. And that's all the human has to do. Label the spam versus normal. The rest is done by algorithms. >> Robert Hess: So now is your spam filter part of Exchange and Office? >> David Heckerman: Yes, it's in Hotmail. It's in many different systems, yes. >> Robert Hess: Are you still working on it? >> David Heckerman: No, I've now moved on to this area of biology. But actually there's some interesting analogies between what happens in the case of spam and what happens in the case of biology. >> Robert Hess: Oh really? Like what? >> David Heckerman: So let's talk about some of the work I'm doing now with HIV vaccine design. HIV is the virus that causes AIDS. And it's a virus. And we get lots of viruses and normally our immune systems take care of these viruses, eliminating them from our body. But that's not the case in HIV. You get an HIV viral infection and the immune system starts to attack, but it's not complete.
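[Editor's sketch, not part of the spoken interview: the spam-filter recipe described earlier (label some mail as spam or normal, extract features such as words and runs of exclamation points, then let an algorithm find which features discriminate) can be illustrated with a tiny naive Bayes classifier. The interview does not say which learning algorithm the team used, so naive Bayes with add-one smoothing is an assumption here, and the training messages and the `<MULTI_EXCLAIM>` feature name are invented.]

```python
import math
import re
from collections import Counter


def features(text):
    """Word features plus one special feature for repeated exclamation points."""
    feats = set(re.findall(r"[a-z0-9']+", text.lower()))
    if "!!" in text:
        feats.add("<MULTI_EXCLAIM>")  # hypothetical feature name
    return feats


def train(labeled_mail):
    """labeled_mail: list of (text, label) pairs, label is 'spam' or 'normal'."""
    doc_counts = Counter()
    feat_counts = {"spam": Counter(), "normal": Counter()}
    for text, label in labeled_mail:
        doc_counts[label] += 1
        feat_counts[label].update(features(text))
    return doc_counts, feat_counts


def classify(model, text):
    """Pick the label with the higher log score.

    Simplification: only features present in the message are scored;
    absent features are ignored.
    """
    doc_counts, feat_counts = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "normal"):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for feat in features(text):
            # Add-one smoothed estimate of P(feature present | label).
            p = (feat_counts[label][feat] + 1) / (doc_counts[label] + 2)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training messages; the human only supplies the labels.
TRAINING = [
    ("Buy cheap meds now!!! Limited offer", "spam"),
    ("You won money!! Click to claim your prize", "spam"),
    ("Meeting moved to 3pm, see agenda attached", "normal"),
    ("Can you review my draft before the meeting", "normal"),
]

model = train(TRAINING)
print(classify(model, "Claim your cheap prize now!!"))
print(classify(model, "Please review the agenda for the meeting"))
```

[Nothing in the code says why a message is spam; the discriminating features come entirely from the labeled examples, which is the "I know it when I see it" point made in the conversation.]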
And that virus hangs around. And then what it does, it starts to mutate. And in fact HIV mutates a lot. It mutates about a million times faster than we do, than humans do. And eventually the immune system finds a path to escape -- sorry, the HIV finds a path to escape the immune system, and this can take years. But then eventually you go on to get AIDS and you die. So now there's an interesting analogy between what happens there in the case of HIV and what happens with spam filtering. In the case of spam filtering, spammers send out their messages and we've built a spam filter to block those messages. And then spammers say, ah, these messages are being blocked, what can we do to change our message to get it through the filter? So they'll do things like spell Viagra with a one instead of the I in the name to try to get through the filter. Then we have to do things to counterbalance their attack. So you get this kind of adversarial thing going. And in the case of HIV it looks very similar. You've got the immune system attacking HIV, and then HIV mutates to avoid that attack. Now, by analogy, you can say, well, how do we solve the problem with spam and how might that apply to HIV? And what we did in the case of the spam filter was we looked for the Achilles heel of the spammers: what do they have to do in order to succeed? In the case of spam they have to sell something. And so we use that against them. So they have to sell something, there's going to be brand names in their e-mail. There's going to be an attempt to extract money from you somehow. And we can use the fact that they have to do that to make our filters better, to block more spam. So now if you take that over to the HIV case, we can ask the question, is there a weak link or an Achilles heel in HIV, or are there a series of them, that we might be able to attack?
And it turns out that there may indeed be such areas of vulnerability on HIV, and what we're working on now is a vaccine that works along those lines. So basically what a vaccine does is it trains your immune system to attack before you get the infection. So the idea behind the vaccine that we're working on is to show our immune system just those points of vulnerability in HIV so that our immune system will mount an attack on those vulnerable points and be able to make an effective response against HIV. >> Robert Hess: So it's kind of like smallpox. The smallpox vaccine essentially, if my understanding is correct, gives you something that's like a dead smallpox virus that trains your body, this is a smallpox virus, see it and attack it, but the smallpox virus can't really attack you any more, and it gets your immune system ready for it. And you're saying with HIV we should be able to do the same sort of thing by identifying certain parts of an HIV chain that we can attack properly. >> David Heckerman: The difference between HIV and smallpox is that in the case of smallpox, smallpox does not mutate. So you can take any -- not any, but certain specific parts of smallpox and administer them as a vaccine. Your immune system will train to recognize those bits and attack effectively, and smallpox doesn't mutate. It's dead when your immune system has at it. In the case of HIV, however, if you were to train your immune system on the wrong portions of HIV, the immune system would attack those wrong portions. HIV could then mutate around those spots and the vaccine would be ineffective. So what we're trying to do is to find the spots on HIV that will be effective for a vaccine to help our immune system get rid of HIV. >> Robert Hess: Now, I still am finding this a little bit interesting to see Microsoft so deeply involved in doing HIV research. I mean, where does your research actually play out?
Are you taking it down standard Microsoft channels in some fashion or are you going straight to the medical industries? >> David Heckerman: We are working with external collaborators. We don't have any wet labs or fume hoods here at Microsoft, so we're very highly dependent on collaborators across the world working with us. Some of my collaborators include Bruce Walker. He's at Harvard. Florencia Pereyra at Harvard. Philip Goulder at Oxford. We work with them almost on a daily basis trying to figure out these various angles. It was actually Bruce and Philip that came up with this idea of vulnerable spots to begin with, paralleling what we had done in the spam case. And so it's a constant dialogue back and forth, what can we do, what experiments can we do to try to narrow our uncertainty in what's going on with HIV. Then they do the experiments. We analyze the data, and we go back and forth like that. It's quite an exciting and fun experience. >> Robert Hess: So they're essentially bringing to the table the medical side of the data and the techniques and understanding, the wet sciences, per se, and then you're bringing more the dry sciences, the computing power, the access to the other researchers at Microsoft Research and e-science, as well as some of the computing parts like HPC, things like that, to assist in these sciences? >> David Heckerman: Yes. >> Robert Hess: HIV is such a strange problem and has a lot of different capabilities. I think it only makes sense then to try and attack the problem the same way it's attacking all of us as well. >> David Heckerman: Exactly. >> Robert Hess: How long have you been working on the HIV problem? >> David Heckerman: About six or seven years now. >> Robert Hess: That's quite a while. >> David Heckerman: Yes. >> Robert Hess: Are you seeing big inroads being made recently or just kind of steady progress throughout the time?
>> David Heckerman: I would characterize it more as steady progress, with an occasional big insight. Like, for example, this insight about attacking the vulnerable spots on HIV, I think that was a very clear potential breakthrough, and really steered our work after that. >> Robert Hess: Now, do you also see some of this stuff we're using to try to combat HIV expanding out and assisting us with other diseases as well that aren't mutatable diseases? >> David Heckerman: There are other viruses much like HIV and those are things we hope our work will also be applicable to immediately. There's hepatitis C, for example, which mutates about just as much as HIV does. It's another remarkable virus, in a negative sense. So that's perhaps the first thing that we would tackle. Some of the more general concepts might also be applicable to, say, influenza, which is still, as we know, a problem. >> Robert Hess: How big is the HIV problem itself? I mean, it used to be we heard about it all the time. I know it's still out there. It's perhaps more prevalent than it was before, but we don't seem to hear about it much. >> David Heckerman: It could be simply because people are waiting to hear about the progress. It's still very bad. 5,000 people per day die from HIV. There are fewer deaths in the United States because there are effective treatments now. But these treatments are expensive and they require that you take the treatments on a regular basis. And so they're not ideal for developing countries. And we all still think that the vaccine is the greatest hope for these developing countries. >> Robert Hess: And you think we're on the verge of having a vaccine that can actually get rid of the HIV problem? >> David Heckerman: I wouldn't say verge. And there's so many people that are affected by HIV that I definitely don't want to give false hope here in this discussion. We have a long ways to go.
This idea of the vulnerable spots is a very interesting one. We're gathering evidence to try to confirm this hypothesis. We actually have a vaccine design that's ready to go to test this hypothesis. But that test is going to take a long time; testing an HIV vaccine is very slow. You can't give someone a vaccine then give them HIV to see if it works. You just can't do that. It's a very time-consuming process. And even if this initial test works out, we know that the vaccine design we have right now is far from perfect, so there's still a lot more work to be done there. >> Robert Hess: A lot of fine tuning and stuff. >> David Heckerman: Yes. >> Robert Hess: What specifically are some of the tools you're working with to assist you in this process? >> David Heckerman: Well, one of the key things we need to do is to find these vulnerable spots. And to do that we've developed a program called PhyloD.net. Basically what you do is you get a set of people that are infected with HIV and you draw blood from them. Then you sequence their HIV. You then also sequence their DNA. So now you've got the sequence of the virus and you've got the DNA sequence of the host. And you look for correlations between those things. And there surprisingly are correlations, and those correlations help you to find these vulnerable spots on HIV. And this map here shows some examples of potential vulnerable spots. The circle here shows one of the proteins that you find in HIV. And then these letters out here show the points of attack of different people's immune systems at these various points along HIV. And then these arcs here show how HIV, once attacked in one spot, tries to mutate in another spot to make up for the initial attack. >> Robert Hess: I see, so each one of these identifies a location where you would try to train the body to attack at that point? >> David Heckerman: Right.
Well, these spots here are potential vulnerable spots. Some of them may be decoy spots. That is, spots where, if the immune system attacks, it does no good because HIV can mutate. There's subsequent tests we can do after we find all the potential spots to tell which ones are vulnerable and which ones are not vulnerable. So these are all the spots and so step one is find all the spots. Step two is find the vulnerable spots and step three is find out how the virus might mutate to recover from being attacked at a vulnerable spot and then have the vaccine target both the original attack point and the response, the makeup response point as well. >> Robert Hess: So it can't recover from the attack to begin with, if you're actually taking the background out of it? >> David Heckerman: Right. >> Robert Hess: This tool, this is part of the application you wrote and designed? >> David Heckerman: Yes. >> Robert Hess: It's using some of the stuff you were doing before? >> David Heckerman: This approach uses the graphical models, the very same graphical models that we used to do medical diagnosis and airplane jet engine diagnosis and spam filtering, all that. >> Robert Hess: More of a biological spam filter to a certain extent. >> David Heckerman: If you will, yes. >> Robert Hess: And what do you think the next step of this application and some of your HIV work would be then? >> David Heckerman: First, we want to test the concept that there are these vulnerable spots. We have some lab evidence that suggests this is true. But we actually want to deploy it in the field with a clinical trial. Give people this vaccine, although an imperfect vaccine, and see whether it really does help protect them against HIV. And at the same time we're doing that, we want to find more of these vulnerable spots, and one thing we've done recently, which opens up that door quite nicely, involves what happens when your body takes DNA and turns it into protein; the proteins are the machines of life, if you will. 
When it does that translation, it can make mistakes. So it turns out that the way you turn DNA or RNA into a protein is you read three base pairs at a time. That's what Mother Nature decided. And sometimes the machine that does that translation slips, accidentally, so you get a frame shift error. So instead of reading these three, these three, these three, you're reading one from the previous and two from the next and so forth. And that leads to basically garbage proteins. And the thought was, well, maybe these garbage proteins are also being attacked by the immune system. And if they were, they are either potential vulnerable spots or maybe they're potential decoy spots that you want to avoid putting in a vaccine. In either case we want to know if they're there. So we used the same tool, PhyloD.net, and applied it to looking for vulnerable spots in the garbage proteins. Lo and behold they were there. And, in fact, we found just as many spots in the garbage proteins as we're finding in the normal proteins. So this opens up a whole new avenue of work for the vaccine design. >> Robert Hess: Now, the vaccine design, would it be something that would need to address like a bunch of spots at once or do you have several different vaccines that you need to give in a cycle? >> David Heckerman: Especially in developing countries you want to try to develop something that is literally one shot. You want to be able to give them one shot. You may never see these people again. So you want to take care of everybody at once, so to speak, with one shot. And since different people have different immune systems, they will target different spots on HIV. So you actually have to build a vaccine that's a cocktail where some portions of the cocktail work for one person, other portions of the cocktail work for another person. And that's sort of the avenue we're going down right now. 
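As an aside on the frame shift error described earlier in this segment, the following is a minimal sketch of what reading "three base pairs at a time" in a shifted frame looks like. The function name `read_codons` and the example sequence are hypothetical, chosen only for illustration; this is not how PhyloD.net works, and real translation involves far more than slicing a string.

```python
def read_codons(seq, frame=0):
    """Split a nucleotide string into consecutive three-letter codons.

    `frame` is the offset (0, 1, or 2) at which reading starts.
    Trailing letters that do not fill a whole codon are dropped.
    """
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Hypothetical sequence, for illustration only.
sequence = "ATGGCCAAATGA"

in_frame = read_codons(sequence)           # the intended reading
shifted = read_codons(sequence, frame=2)   # one letter from the previous codon, two from the next

print(in_frame)  # ['ATG', 'GCC', 'AAA', 'TGA']
print(shifted)   # ['GGC', 'CAA', 'ATG']
```

The same letters yield entirely different codons once the frame slips, which is why the resulting proteins are described as garbage.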
>> Robert Hess: So you have this complex HIV strand, but then you've also got, in someone else, another complexity associated with them as well. So you've got to address everything at once? >> David Heckerman: That's right. If you're going to make an effective vaccine you want to cover a lot of people at once. >> Robert Hess: What do you think is next for you? >> David Heckerman: Well, I really love working in this area of biology -- at the interface of biology and computer science. And one of the -- I'm really putting a lot of effort into the HIV vaccine. But another thing that's very interesting is looking for genetic causes of disease. It turns out now that it's very inexpensive to sequence our DNA. So if you can do that inexpensively then you can get a bunch of people, some of whom have a disease, some of whom don't, and then just compare their DNA and see, well, what are the differences. If you can isolate what sections of DNA are responsible for that disease you can develop, perhaps, drug treatments or other interventions or maybe even cures for the disease. So I've been working with my team in the e-science group to develop methods that can do just that. >> Robert Hess: And what exactly is the e-science group? I know in Microsoft Research we have a big group of them doing all sorts of research, from Surface to mice to medical things. What exactly is e-science? >> David Heckerman: The purpose behind the e-science group is to just explore this wonderful convergence between not only biology and computer science, but all the sciences and computer science. So, for example, not only do we have this biological work going on in the e-science group, but we have work looking at carbon climate, the global warming phenomenon. 
And there, as you might guess, there's a lot of data coming in from, say, satellites, from measurements all around the globe and you want to process this data and try to infer various models for what's causing global warming and explore or look at what various interventions can do to mitigate those problems. >> Robert Hess: So these are things where you wouldn't necessarily immediately think Microsoft's involved, or that our technology might assist that process. Global warming, HIV, genetic diseases aren't the sort of thing that you would expect Microsoft to be spending money, time, and research on to actually solve, but they're still important for everyone in the world. >> David Heckerman: But when you recognize that these problems -- by the way, another one we're working on is energy production. Turns out sugarcane is a great source of energy. And wouldn't it be nice if we could make sugarcane produce even more energy and try to rid ourselves of our dependence on oil? So it turns out that all these very important problems to society involve or really need computer science to help solve those problems. You've got data management problems. Data analysis problems. Tons and tons of data. You've got to use computation to help with these problems. >> Robert Hess: Are you doing much with the HPC program as well? The high-powered computers and things which are designed for doing large data modeling and large data processing, is that playing a role in all of this? >> David Heckerman: Absolutely. HPC is key for our work. For example, this PhyloD program that we've been talking about, to analyze just one position on HIV and one immune system type takes a few seconds. But there's 3,000 positions on HIV and there's hundreds of different immune system types. And so to do these computations on one machine would take about one or two years. 
But these computations fortunately in our case are what we like to call pleasantly parallel. So we use HPC. And we can get this stuff done over a cup of coffee. >> Robert Hess: Now, from a personal side, what do you do in your personal life? >> David Heckerman: Well, I work on HIV vaccine design. Work on genetic causes of disease. I have a great family and spend my time with them. I love them. I have -- my wife and two kids, 11 and 13. We're just having a blast. >> Robert Hess: What are the hobbies you have? >> David Heckerman: I play music. I play guitar, piano. Play trumpet. I'm still in a band that rehearses occasionally, like once every year. >> Robert Hess: Seattle punk rock bands. >> David Heckerman: I'm from LA. We're still -- it's a '70s band. We never broke up. >> Robert Hess: So you kept on going from back in the '70s. >> David Heckerman: That's right. >> Robert Hess: Really? The same people? >> David Heckerman: Same people. >> Robert Hess: What type of music do you play? >> David Heckerman: Chicago, Tower of Power, rhythm and blues kind of things. >> Robert Hess: Do you dress up in costumes? >> David Heckerman: Actually, there were a few times when we very reluctantly did that. But we actually got started before the disco era. And so disco was a very bad thing for us. And all that dressing up stuff was not -- we didn't like that. >> Robert Hess: Bad blood from those days? >> David Heckerman: Yeah. >> Robert Hess: And you mentioned you're actually living down and working down in LA. Most people realize that Microsoft's main campus is up here in Redmond where we are right now. How awkward is it to be working remotely like that, in a small team that's not in with the regular group? >> David Heckerman: It's not awkward at all. Some of us are down there. Some of us are up here. You know, with all the Microsoft tools we have for remote collaboration, it's not a problem at all. >> Robert Hess: So technology, again, assists you in that. 
>> David Heckerman: Thank you, Microsoft, for making it easy for us to do this. >> Robert Hess: Now, of course, down in the LA area a lot of companies are down there that are non-Microsoft technology companies. Do you have any problems, issues working with people down there? Or does it feel like you're in the lion's den or something like that? >> David Heckerman: No, not at all. In fact, most of our collaborators not at Microsoft are the biologists scattered all over the place, Harvard, Oxford, Australia, all over. South Africa. They're all over. >> Robert Hess: It's a global community. >> David Heckerman: The only problem is the earth is round. I mean, the time change is the only problem. >> Robert Hess: Yeah, can't fix that with technology, can we? Now, we've got a couple of questions we like asking all of our guests to get a nice pulse of who they are. They're always the same sort of questions. We call them our mantra questions. So what book would you like to recommend everyone to read? >> David Heckerman: Well, the first one that comes to mind, it's a very old book, but it really opened my eyes. It's called Gödel, Escher, Bach by Hofstadter. >> Robert Hess: Know it well. >> David Heckerman: It actually -- it's a fairly religious book, because the main premise of the book is that you and I are machines, consciousness is generated from information processing. And that's a lot to swallow, I think, for some people. But he makes a very compelling argument for it. And I think whether or not you believe that, certainly just suspending disbelief for a moment and assuming that's true, you can, I think, make remarkable progress in understanding how our intellect and how our brains work. >> Robert Hess: Plus I think just having that discussion with yourself and the book helps you understand exactly where your position is and what things you want to believe in as well from that standpoint. It's a thick book. It's a hard read. 
But it definitely is something I also would recommend to people to get to. And next, if you weren't working in the computer industry, what do you think you'd be doing? >> David Heckerman: Okay. Well, that's somewhat of an easy question, because I'm kind of not working in the computer industry. So I'd be doing the same thing. But if I wasn't working on computers or biology. >> Robert Hess: If you weren't doing what you're doing now. >> David Heckerman: If I'm not doing what I'm doing now. >> Robert Hess: Be in a '70s band. >> David Heckerman: I could very likely be in a band. I could see doing that. I love music. I could be -- I could be doing mathematics, some other form of mathematics, proving something. >> Robert Hess: Quantum physics? >> David Heckerman: Actually, yeah, I would be -- if I couldn't do this right now, I think I would go back and look at these very weird things that happen in the quantum world. I'd try to understand those better. >> Robert Hess: What do you feel is the most important technology in today's world? >> David Heckerman: Well, I think most people don't know that it exists right now, but I think it's definitely, in my book, the most important. And that is our ability now to look at biological sequences: DNA, RNA, proteins. We can do it now. For example, you can pay a company like Navigenics or 23andMe a couple hundred bucks, they send you a kit. You spit in a tube. They send you back a list of 500,000 different base pairs on your DNA. Very informative base pairs that tell you something like you're susceptible to this disease or you're not, or you are susceptible to this drug, or this drug would work well for you. Just an amazing technology. Again, there's this better than Moore's law phenomenon happening in biology now where you're getting a doubling in the amount of data that you can produce every ten months. 
And so soon we'll be able to sequence our whole genome. If you get cancer we'll be able to sequence the genome of your normal cells and compare them to the genome of your cancer cells and figure out exactly what's gone wrong in the cancer, maybe develop a vaccine or a treatment specifically for your cancer. It's right here, right now. It's happening right now. >> Robert Hess: I think we're just at the tip of the iceberg. We've just barely got to the point of that book of you, and now we understand what we can do with that book of you. >> David Heckerman: Absolutely. It's just amazing what's going on. >> Robert Hess: 25 years from now, what new technology would you like to see be available? >> David Heckerman: Well, this is a follow-on of what I just mentioned. I think within 25 years, if we follow this line of sequencing, understanding what's going on with our bodies, I think we'll begin to understand how we can beat aging. Not completely. I think death and taxes will still be the only things that are not uncertain in our lives. But I think we'll be able to -- there will be technologies out there, either gene therapies or treatments or something. We'll find the genes that protect us, more so than we're protected right now, from the ravages of time. And we'll be able to -- I would think most people should be able to live to at least 100. And what's good about that is if you can increase the life span of people that much, imagine what new scientific breakthroughs we're going to have. I mean, I'm amazed at what scientific breakthroughs we had in the Renaissance with people dying at the age of 50 and 60. And the longer our life spans are going to get, the more a single individual is going to be able to develop enough knowledge and wisdom over their lifetime in order to come up with some really new amazing breakthroughs that we couldn't even imagine before. So there's kind of a positive feedback loop with this technology. 
>> Robert Hess: You think aging itself could possibly be reduced? >> David Heckerman: Yes. >> Robert Hess: Because my understanding is basically it's like an unwinding, relaxing of the proteins or something that's causing some of the -- >> David Heckerman: That's one theory. Chromosomal damage. Oxidative stress. There's lots of theories out there. But if a parrot can live to a hundred years, why can't humans? It's not impossible. It's the same biological system. Presumably there's some protective mechanisms. We already know there's protective mechanisms that are constantly repairing our DNA when it gets damaged. Probably there's some other repair mechanisms that are available out there that Mother Nature has created, for example, in the parrot, that we can use for humans to extend their lives as well. >> Robert Hess: It's not just eating more crackers, right? [laughter]. >> David Heckerman: Although you might be able to put the drug in a cracker. Nice touch. >> Robert Hess: Lastly, this is kind of the fun part that all my guests really love to get into: testing out their creative skills. We'd like you to draw and then explain your favorite data structure and be sure to sign it. >> David Heckerman: Absolutely. So maybe you can guess what my favorite data structure is. I've mentioned it several times throughout the talk. It's the graphical model or the Bayesian network. So, first of all, let me tell you what such a thing does. Basically a graphical model or Bayesian network is a representation that allows you to encode what's called a joint probability distribution. So when you have a complex problem, you have all sorts of variables. And these variables have different states. And what you want to know, ultimately, is the probability of every possible combination of those states. So let's just do something very simple here. 
Let's suppose there are three variables and so we'll say there's variable X1, X2 and X3 and just for the sake of argument here, let's just say that each of them has two states. Either on or off. Okay. So there's eight possible states. Two here, two here. Two here. Eight possible states and what we want to know is the probability of all those eight states. So we are interested in the probability of X1, X2 and X3 and that's how we write it. Now, one thing to do is just list them. List in this case eight probabilities. But if I have N variables here and they're all binary, suddenly, I have to list two to the N numbers, and you can't do that. So a graphical model helps us do that and it takes advantage of something called conditional independence, which I'll get to in a moment. So one way you can write this probability uses one of the rules of probability, called the product rule. So I can rewrite this probability as the probability of the first variable times the probability of the second variable, given that I know what the value of the first variable is. That's what that [inaudible] stroke means: it means given. Times the probability of the third variable given the first two variables. Okay. So these are just the same thing. And there's no magic here. There's two numbers here, well, one number because they have to sum to one. There's two numbers here and four numbers here. That's seven. These numbers have to sum to one. These probabilities have to sum to one. So there's seven numbers here and seven numbers here so I haven't done anything fancy yet. But there's this thing called conditional independence, which says that, well, what if the probability of X3 only depends on X1 and it doesn't depend on X2? So basically you can cross that term out of this probability expression. So now we have fewer numbers to deal with. And you can encode these eight numbers with fewer numbers down here. 
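Written out, the two expressions described at the whiteboard would look roughly as follows. The notation is our assumption, since the drawing itself is not part of the transcript, and the counts assume every variable is binary.

```latex
\begin{align*}
P(X_1, X_2, X_3) &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2)
  && \text{product rule: } 1 + 2 + 4 = 7 \text{ numbers} \\
P(X_1, X_2, X_3) &= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1)
  && \text{if } X_3 \text{ does not depend on } X_2 \text{ given } X_1\text{: } 1 + 2 + 2 = 5 \text{ numbers}
\end{align*}
```

The first line always holds; the second holds only when the conditional independence assumption is true for the problem at hand.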
Now the Bayesian network is a graphical way to represent this and it's very convenient in practice. So if I were to represent the Bayesian network for the first thing I drew, it would look like this: X1 points to X2, which points to X3. All right. And the other thing that X3 depends on is X1. So I have another arc here. So basically whenever you have a probability for a variable depending on another variable, you just draw an arc from the conditioning variable to the target variable. As shown here. So this graph is complete; it says there's no independence, and so it's not that useful. But here, if you do have this independence where X2 does not influence the probability of X3, I can remove this arc. And now I have a simpler graph and it allows me to encode this probability distribution in a much simpler way. You're probably thinking who cares, this is just three variables. But when you have a thousand variables, then usually you have a lot of conditional independence so you can dramatically reduce the amount of information needed to represent the joint distribution. And when you can do that, you can start using the computer to answer all sorts of questions like what's the probability of X7 and X8 given X2 and X3, and that's the thing you need when you build an expert system. >> Robert Hess: Thank you. Make sure you sign it. We'll get your John Hancock. We'll post it on the wall, maybe sell it on eBay or something like that. Thank you, David, for the [indiscernible] network and being our guest today. We hope all of you enjoyed this chance to place technology and the person behind the code.
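For readers who want to experiment with the three-variable example from the whiteboard, here is a minimal sketch of the simplified network (X1 points to X2, and X1 points to X3, with no arc from X2 to X3). All probability values and all names (`p_x1`, `joint`, `query`) are hypothetical, chosen for illustration; none of them come from the interview or from any Microsoft tool. The query is answered by brute-force enumeration, which is only practical for tiny networks.

```python
from itertools import product

# Hypothetical numbers, for illustration only: five free parameters in total.
p_x1 = {True: 0.3, False: 0.7}            # P(X1)            -> 1 free number
p_x2_on_given_x1 = {True: 0.9, False: 0.2}  # P(X2 = on | X1)  -> 2 free numbers
p_x3_on_given_x1 = {True: 0.6, False: 0.1}  # P(X3 = on | X1)  -> 2 free numbers


def bernoulli(p_on, value):
    """Probability of `value` for a binary variable that is on with probability p_on."""
    return p_on if value else 1.0 - p_on


def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) * P(X2 | X1) * P(X3 | X1)."""
    return (p_x1[x1]
            * bernoulli(p_x2_on_given_x1[x1], x2)
            * bernoulli(p_x3_on_given_x1[x1], x3))


def query(target, evidence):
    """P(target | evidence), summing the joint over all eight states."""
    names = ("x1", "x2", "x3")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=3):
        state = dict(zip(names, values))
        if all(state[k] == v for k, v in evidence.items()):
            p = joint(**state)
            denominator += p
            if all(state[k] == v for k, v in target.items()):
                numerator += p
    return numerator / denominator


# Once X1 is known, learning X2 does not change the belief about X3.
print(query({"x3": True}, {"x1": True, "x2": True}))
print(query({"x3": True}, {"x1": True, "x2": False}))
# Reasoning "backwards": how likely is X1 given that X3 is on?
print(query({"x1": True}, {"x3": True}))
```

The first two queries return the same value (up to floating-point rounding), which is the conditional independence that let the arc from X2 to X3 be removed. Real systems avoid enumerating every state, since that grows as two to the N.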