>> Kirsten Wiley: Good afternoon, my name is Kirsten Wiley and I am here today to introduce and welcome Stephen Baker, who is visiting us as part of the Microsoft Research Visiting Speaker Series. Stephen is here today to discuss his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything. For three nights in February 2011, IBM's computer sensation, Watson, battled human champions Ken Jennings and Brad Rutter in an epic match on the quiz show Jeopardy. When Watson trounced the competition, it left many of us wondering what the future looks like. How do smart machines fit into our world and how will they disrupt it? Stephen Baker was BusinessWeek's senior technology writer for a decade. He is the author of The Numerati and has written for the LA Times, Boston Globe, and the Wall Street Journal. Please join me in welcoming him to Microsoft. [applause] >> Stephen Baker: Okay. Is this the microphone? Yeah, okay, it is so little. Hi everybody, thank you for coming. You know, I was here a little over three years ago to talk about The Numerati. Kim Ricketts brought me here, and I had such a nice time with her that I was very sorry to see that she had died several weeks ago, and I just wanted to express my condolences to the Microsoft community, the Seattle community, and the book lovers in Seattle whom she did so much to help nurture. But you know, I was here for The Numerati and it was about data and people like you who understand and work with data. And this one is about people like you building a great machine, except that the people in it work at IBM. So there is something in common. Now, I was finishing up at BusinessWeek in 2009, and I was going to leave when Bloomberg took over. I could get a deal with severance pay when Bloomberg took over BusinessWeek in December of 2009, so I was heading for the exit and looking for my next project. I had this idea to do a book about what you need to know, the idea being that machines and networks give us information, so much information that they didn't used to give us, so what information do we need to keep in our heads? And I did a big proposal for that and I had high hopes for the book, and my agent, I mean my editor at Houghton Mifflin, said no, it is too vague. It is an interesting idea but you need a story to hang it on; you need a tale. People don't want to read about Stephen Baker kind of talking to people, wondering what you need to know. You need a story. She also said, you also don't have any answers. I said, well, I would get that in the research, but she said no, you can't sell a book based on research that you haven't done yet and convince us to buy it. [laughter] So then I was having lunch with these people at IBM, this was November of 2009, and they told me about this Jeopardy computer, Watson, that they were building that was going to take on humans, and I thought, this is my story, because it has a beginning, it has an end, it ends in a championship match, so it can be almost like a sports story: a narration that tells about the struggle with this idiotic machine in the beginning, and gradually the machine gets smarter and smarter, and in the end it has this big confrontation and we don't know how it's going to turn out. So that was the kind of story that I wanted to write.
I do a proposal, and I tell my editor Amanda Cook at Houghton Mifflin, we cannot afford to sit on this book the way we sat on The Numerati for a full year between the time I wrote it and the time it was published. This is news. This is happening. This event is going to take place in January. The TV show is going to be in February and we have to run like hell and get that book in the stores by September 2011. And she said, you know, we're going to have to run a lot faster than that. What I want you to do is do the first nine-tenths of the book by November, this last November; we edit that over the Christmas holidays; you go to the match in January; you write the last chapter a couple of days after that match; and then the day after the TV show we start selling the book. And so that was her idea. And then somebody at Houghton Mifflin said, you know, there is so much anticipation before these events. They are going to be hyping the hell out of this thing and we want something to sell during that period, so what we are going to do is try to convince Amazon and B&N to sell a partial e-book, where they sell the first nine chapters a few weeks before the event and then e-book readers can get the final chapter sent to them as an update right after the event. So that is what we ended up doing, and I thought it was kind of innovative and fun, and there were some glitches, but, you know, that's what it was. I should mention that if you do read the book, I thought that it was important to get Microsoft's voice in this book, and it is not an easy thing for Microsoft to participate in a book that might turn out to be lionizing engineers at IBM, so I didn't get Microsoft, but I did try, and I regret not having Microsoft's voice in there. Now, this story began in 2005. IBM, as you know, has these grand challenges, and part of the reason that they have them is because they don't have anything to sell to consumers. IBM when I was a kid was the preeminent tech brand. People wanted IBM typewriters, and later they wanted IBM computers, and now nobody buys anything that is associated with IBM, so they don't have a brand that people know about the way Microsoft does. So they come up with these things, these grand challenges, and they are contrivances, and they are often criticized for them because they are looking for a lot of PR and there is a lot of hype around them, but they would argue that they serve a purpose, and I happen to agree. They had had a chess match in 1997 where Deep Blue beat Garry Kasparov, and then they did Blue Gene, which was at the time the fastest supercomputer, and that was supposed to be the next grand challenge. It did not get as much publicity, and they were looking for the next one. And the head of IBM Research, Paul Horn, was walking around Research trying to get teams to sign on to a Jeopardy challenge. And he couldn't find anybody that was interested in it. First, Jeopardy seems to be kind of cheap, kind of crass. It is a quiz show; it doesn't have the elegance and sort of the eternal value of chess. It is just a quiz show that is run by a company that puts advertising on it, and so I think researchers were a little bit hesitant to sign up to build a machine to play Jeopardy. It seemed a little bit trivial. And they deal with trivia.
And secondly, it was hard, because IBM had a question answering system called Piquant, and every year it took part in these government-sponsored tests, the TREC competition. And Piquant had all kinds of troubles, and it topped out in about the 35% range on questions that were much simpler than Jeopardy questions. And plus it had a lot of time to deal with each question. I will give you an example of a problem with a Piquant question. One of the problems is it had trouble figuring out what it was supposed to look for. So in this one, one question says, what is Francis Scott Key known for? Now that seems like a pretty easy question, but if you ask a computer what something is known for, what does "known for" tell you? Is it known for doing something? Is it known for being something? Is he known for being the victim of something, or the plaything of something? It wasn't clear. And so those sorts of misunderstandings hobbled their computer, and as I say it got about one out of three questions right. Now in Jeopardy, you have to answer a question in 3 seconds and it is an incredibly wide domain. Sometimes just figuring out what you are looking for is very difficult. I will give you an example. Here is one Jeopardy clue. I am paraphrasing it, but: his daughter and grandson, both Indian premiers, were assassinated. So what is the computer looking for there? It's not looking for a daughter or a grandson. It has to figure out the relationship and say, if his daughter was killed then we are looking for a father, and if his grandson was killed then we are looking for a grandfather. So teaching the computer to make that kind of assessment turned out to be really challenging. Another thing that spooked them about the Jeopardy game: in the beginning they thought that the computer would have to have voice recognition, and that would just add much more complexity and difficulty to it. So they couldn't get anybody to do it. But Horn tried again, because he was getting pressure from above, and they really thought that the future of their consulting business was building technology to grapple with the oceans of unstructured data flowing across networks. So he finally got David Ferrucci to sign on, but he only signed on conditionally. Now, Ferrucci, who you have probably seen if you have been looking at the website or watching the television commercials, is a highly neurotic New Yorker. And he is extremely loquacious, so he will talk and talk and talk, and he is very nervous, and so he makes for a great source for a book, because the guy talks about anything at great length, which is great for me. He had been working in semantic analysis and building a software platform, an analytic platform, to deal with different streams of unstructured data. It was called UIMA. That was one of the reasons he turned down Jeopardy the first time, because he was busy with UIMA, but he finally said he would take it on; this was early in 2007, but he said he needed six months for a feasibility study. He had two big issues. Both of them had to do with working at a big company, a big research outfit, so I thought it might be relevant to you. First, one of his fears was that it would be too hard and the machine would fail and embarrass the company and embarrass him and his researchers.
The second fear, which is sort of the diabolical twin of that fear, was that it would be too easy, and that after IBM pumps all of this money into it, and all of this hype already starts, some college kid peeks out from his dorm and says, you know, I think I have this figured out. I did it with a search engine and some open source stuff. And that would be his nightmare. And so he figured he would try to simulate that by getting one of his researchers to pretend that he is what they call a basement hacker. But Ferrucci had another problem. This Piquant system, the system that got about one out of three right in question answering, he wanted to kill it. And inside the company there were people who had supported Piquant, and Piquant had a certain constituency within IBM Research, and he had to basically figure out how to kill Piquant. So what he did was set up his researcher James Fan, who had just come from the University of Texas, apart from the rest of the team, and he said, I am going to give you 30 days to come up with a Jeopardy machine. You can use search engines, you can use Wikipedia, you can use the whole internet, you can use any open source software you can get, anything that anybody else could get, you can use it. Do the best you can in 30 days. And meanwhile he told the Piquant team, your machine is going to be facing the basement baseline in 30 days, but don't do anything to it. He said, just change it so it answers questions in the form of a question, the way Jeopardy does. So Fan takes off on his mission and he looks for all kinds of simple solutions, because he doesn't have much time. I will give you an idea of one of his solutions. He types the entire convoluted Jeopardy clue into a search engine query and then he looks at the first page of results, and the first page of results often points to a Wikipedia page, and the title of that Wikipedia page occasionally is the correct answer for the Jeopardy clue. Now that would only work for maybe 5% of Jeopardy clues. You could not build a functional system relying on that type of algorithm alone, but he thought that if you had maybe 100 other algorithms that were equally clueless in every other area but good in their one area of specialty, maybe together they could provide 60, 70, 80% of the answers. So that was sort of Fan's idea. And he built his machine and it had all kinds of weaknesses, but in the bake-off, according to a couple of different metrics, it matched Piquant. And so Ferrucci killed Piquant and he used the architecture that Fan had designed for the Jeopardy machine. So in mid-2007 he sends a memo that says, we will do this: within 3 to 4 years we can build a machine that will be competitive with championship-caliber Jeopardy players, and in 5 to 7 years it will be invincible. But he said he didn't think it was worth it to spend the money for that extra two years to make it invincible. There was no business case for making an invincible Jeopardy machine. So they built a system that has all of these competing algorithms, each of them bringing back its own candidate answers, and then it is up to the analytic engine on his UIMA platform to sift through them and figure out which one it has confidence in and whether it has enough confidence in each answer to bet on it.
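To make Fan's ensemble idea concrete, here is a minimal Python sketch of that kind of pooling: a few narrow, individually weak candidate generators whose guesses are merged and ranked. The generator functions and the stubbed search results are invented for illustration; this is not IBM's code.

```python
# A minimal, illustrative sketch of Fan's ensemble idea (not IBM's code): each
# generator is clueless outside its specialty, but pooling their candidates
# covers far more clues than any one of them could alone.

from collections import defaultdict

def wikipedia_title_generator(clue):
    """Pretend to feed the whole clue to a search engine and return the titles
    of Wikipedia pages on the first page of results (stubbed here)."""
    stubbed_results = {
        "His daughter and grandson, both Indian premiers, were assassinated":
            ["Jawaharlal Nehru"],
    }
    return stubbed_results.get(clue, [])

def limerick_specialist(clue):
    """Only useful when the clue really is about limericks; silent otherwise."""
    return ["Edward Lear"] if "limerick" in clue.lower() else []

GENERATORS = [wikipedia_title_generator, limerick_specialist]

def pooled_candidates(clue):
    """Every generator proposes candidates; answers proposed more often rank higher."""
    votes = defaultdict(int)
    for generate in GENERATORS:
        for candidate in generate(clue):
            votes[candidate] += 1
    return sorted(votes.items(), key=lambda item: -item[1])

print(pooled_candidates(
    "His daughter and grandson, both Indian premiers, were assassinated"))
```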
So I'll give you an example of some of the problems they faced. Chile's--no, I'm sorry: Argentina's largest, longest border is with this country. I already said it. It's Chile. So they give that to Watson, which at that point was called Blue Jay, and lots of its algorithms go off hunting. And some of them are specialists in limericks. And they bring back absolute garbage that has to do with South American limericks. And some of them are specialists in pun detection, and some of them are specialists in things that have to do with numbers and math. And so they're all bringing back garbage, but two of them bring back something that is close to the answer. One of them is kind of like a search-engine-type algorithm that looks at how often other countries are mentioned alongside Argentina's border. And another one is a geography specialist. And so they both come back, and the search engine one says that the answer is Bolivia and the geography specialist comes back and says the answer is Chile. It picked the wrong one. It paid attention to the popularity instead of the geography. So they have to go and look and see how it came up with that, what evidence it was looking at, and how it was weighing that evidence, and then they go and they change the weights a little bit so that on that type of question it will pay more attention to the geography specialist rather than the search engine specialist. And then they have to go through thousands more questions and see whether that screws up their accuracy in other areas. So that is what they were doing day after day, tweaking all these different algorithms. Another thing that they came up with was called Nested Decomposition. And that is important because there are many Jeopardy questions that require two levels of analysis. So this is one of them. Of the four countries that do not have diplomatic relations with the United States, this is the farthest north. So again, all of the algorithms go out and they hunt and they bring back their ridiculous limericks and parables and things like that, but one of them has the sense to break the question into two and say, okay, what are the four countries with which we don't have diplomatic relations? Bhutan, Cuba, Iran, North Korea. And then, just like a human would, it hands that over to the geography part, which goes and looks and finds out which one is farthest north. So that is Nested Decomposition. Then they had another category in Jeopardy called before and after. And before and after is a thing specific to Jeopardy, which is two concepts that would have nothing to do with each other except a shared word in the middle. So there is one that says, this candy bar is a Supreme Court justice. So it is Baby Ruth Ginsburg. Baby Ruth is a candy bar; Ruth Ginsburg is a Supreme Court justice. And so they trained Watson to do that, and Watson could startle people with its ability to do those sorts of ludicrous before-and-after questions.
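Going back to the Argentina clue for a moment, here is a rough Python sketch of the weight-tweaking described there: each specialist scores each candidate, a per-clue-type weight says how much to trust that specialist, and retuning those weights is what flips the answer from Bolivia to Chile. All the numbers, names, and clue-type labels are invented for the example.

```python
# Illustrative sketch: merging specialists' raw scores with learned, per-clue-type
# weights. All numbers and names are invented; the real system tuned thousands of
# such weights against its library of past Jeopardy clues.

WEIGHTS = {
    # (clue_type, specialist) -> how much to trust that specialist on that clue type
    ("geography", "search_popularity"): 0.3,
    ("geography", "geo_specialist"):    0.9,
}

def best_answer(clue_type, evidence):
    """evidence maps each candidate answer to the raw score each specialist gave it."""
    totals = {
        candidate: sum(
            WEIGHTS.get((clue_type, specialist), 0.1) * score
            for specialist, score in scores.items()
        )
        for candidate, scores in evidence.items()
    }
    return max(totals, key=totals.get), totals

evidence = {
    "Bolivia": {"search_popularity": 0.9, "geo_specialist": 0.2},
    "Chile":   {"search_popularity": 0.3, "geo_specialist": 0.7},
}
print(best_answer("geography", evidence))
# With these weights Chile wins; weight the two specialists equally and Bolivia's
# sheer popularity in the text would carry the day instead.
```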
And this before-and-after training raised a concern among IBM people and especially Ferrucci, because it raised the question of whether they were spending a lot of time training a machine to basically win a quiz show. And as researchers, as you can all imagine, they were very, very sensitive to this subject: that they were participating in a big PR extravaganza where they were going to spend a lot of time and a lot of PhD hours fine-tuning a machine to win a ridiculous game. And so they would always stress, you know, we taught it these before-and-after algorithms, but it only took a tiny bit of our time, and most of our attention is focused on creating advances in natural language systems that can answer questions and dig through data. And so it was always a sensitive subject for them. And that subject came up also in strategy, because strategy was another area where clearly they were training a machine to win the game, and it didn't really have anything to do with what the machine was going to do in its career after the Jeopardy game. And so they always tried to deemphasize the amount of effort they were putting into the machine's strategy. Now they had this guy Gerry Tesauro, I don't know if any of you know him. He is an IBM researcher who specializes in games, and in the '90s he created a machine learning system that trained a machine to be a champion at backgammon. It goes through millions and millions of simulated games and learns incrementally, calculating each move and the probabilities that go with each move, and it became a backgammon champion. And so he used the same approach for Watson, trying to figure out how much Watson should bet if it lands on a daily double and it is behind by $1,000 with 10 questions to go. And they figured that out, and they came to the conclusion that humans underbet dramatically. Humans are timid, and tend to pay way too much attention to the risks involved and not enough to the rewards of betting heavily. So they went to Ferrucci and they said, we need to get Watson to bet much more aggressively on these daily doubles and in Final Jeopardy. And Ferrucci blanched at the idea, because it was scary to him that in the one or two games when this machine is on national TV, games that would have a lot to say about where his career and his team were going, even if it made statistical sense for Watson to bet a lot, if it lost one of those big bets, what then? If it makes sense over a million games, when there is no cost of losing, to bet aggressively, that is not the same as when you are on national TV and there is a high cost. And he said, you guys didn't represent the cost of what embarrassment means to us and what humiliation on national TV means, and you've got to incorporate that into your analysis. And so what they basically did was turn their back just a little bit on the statistics that ran the whole program and go back to the old ruling by the gut, returning to what humans always do, which is look out for things that might kill you. And so they had to kind of rejigger and make their betting strategy a bit more conservative than what the statisticians wanted.
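Tesauro's actual wagering system was learned from huge numbers of simulated games; as a much-simplified illustration of the tension Ferrucci raised, here is a sketch in Python where a Daily Double wager is chosen by expected value and then shaded down by an extra penalty for big, visible losses. Everything here, numbers included, is invented for the example and is not Tesauro's method.

```python
# Simplified, illustrative wagering model (not Tesauro's actual method): pick
# the Daily Double bet that maximizes expected gain, optionally minus a penalty
# that grows quickly with the size of a potential public loss.

def best_wager(max_bet, p_correct, embarrassment=0.0, step=100):
    def value(wager):
        expected_gain = p_correct * wager - (1 - p_correct) * wager
        # Quadratic penalty: losing a huge bet on national TV hurts more than
        # the dollars alone suggest.
        return expected_gain - embarrassment * wager * wager
    return max(range(step, max_bet + 1, step), key=value)

# Pure statistics: if you're right 75% of the time, bet everything you can.
print(best_wager(10000, p_correct=0.75))                      # -> 10000
# Add a cost for humiliation and the optimal bet drops sharply.
print(best_wager(10000, p_correct=0.75, embarrassment=5e-5))  # -> 5000
```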
At about this point, running the questions on one server, it took Watson about two hours to go through all of its calculations, and they had to get that down to 3 seconds. So they scaled it out to over 2,000 cores, and that was a big effort, and I don't write that much about it in the book because I'm not a hardware guy. But it was a big deal, and they finally got it down to about two or 3 seconds, and they also prestructured a lot of the data so that it was already analyzed for easy reading by the computer: so-and-so is a noun, and here is a verb, and this is the object of the verb, and this is a name, and this is a country, and so all of that was spelled out so that the computer could speed-read. Its data cache at this point was about 75 GB, and then that structuring took it up to 500. And then they had to run the whole thing in RAM. As Watson kept improving its performance, they had this chart called the Jennings Arc, and it was named after Ken Jennings, who was the champion they were going to face in this match. They didn't know that they would be facing Brad Rutter, but they knew that they would be facing Jennings. It plotted how often players buzz in against the percentage of the questions they got right. So in the upper right are the Jeopardy champions, because they buzz in a lot and they get most of them right, and Watson was down here with, like, third-grade-level players, and was moving up. And they had this big blue cloud of Jeopardy champions and then you had the red dots, and they were Ken Jennings, and he was sort of an outlier. He was higher performing than the rest, and they had to get there. And they were making good progress, but when I started researching the book in the spring of 2010, they were having their first sparring matches. And Watson was winning about two out of three against former Jeopardy players, so these were good players. It was winning about two out of three, but it was making some really embarrassing blunders. One time when it was asked about the diet of a butterfly, Watson said that butterflies eat kosher. [laughter] And one time it was asked about a famous French bacteriologist and Watson answered, what is "How Tasty Was My Little Frenchman?" [laughter] It was the name of a 1971 Brazilian movie that it had come up with somehow. So this was a real concern for them because, again, Watson could get most of the questions right, but when it didn't know something, it was a horrible guesser, because it didn't have the kind of contextual smarts that humans have. So it would be looking for something like Oliver Twist, who was a character in Charles Dickens's 19th-century novel, and it came up with the Pet Shop Boys, who were a 1990s techno band. So they had teams focused on, where can Watson embarrass us, and let's go in and troubleshoot the areas where it's most likely to embarrass us. But the areas where Watson could embarrass them were as broad as the world of knowledge itself. So it was kind of a futile thing. And what's more, if you consider that they only had 120 questions in the two games that they were going to play, the chance that the embarrassment you steered clear of was going to be one that comes up in those games is remote. So as I told you, Dave Ferrucci is a very nervous guy, and one day last spring he comes to lunch with me and he's basically flying off the handle. He is extremely upset. He has just been on a two-hour conference call with the folks at Jeopardy, and they have told him that the machine, as he and his team have it, is buzzing too fast. They had the machine basically just buzzing electronically, and Jeopardy said, it is going to kill the humans. You have got to create a finger for that machine. And so for Ferrucci, first of all, all of his data from the sparring sessions was going to be, as he saw it, no good anymore if you change the nature of the machine. Secondly, he said, you are trying to graft human limitations onto my machine. We have built a brain and you are trying to turn it into a robot.
And what is more, if we build a finger and the finger is really fast, are you going to come in and say, no, I am sorry, that finger is a little too fast, and slow it down a little bit? And are you going to do other things to try to balance it? And he said, if we are trying to even the playing field, my machine is clueless when it comes to language. So why don't you have the humans deal with ones and zeros a little bit instead of everything being done with language? [laughter] But he lost that argument, and he knew he was going to lose it, because he knew on TV people were going to say it was unfair if his machine was sending e-mails. So his bigger question, which really freaked him out, was the potential of bias in the questions. Jeopardy has about nine people whose full-time job it is to write Jeopardy clues. And these people were going to be writing 15 tournament-of-champions games, and the computer was going to be in two of them. And he said it was impossible for them to forget that they were potentially writing for a computer, and to write questions for humans, which is what Watson was trained on. He said this is going to be turned into a Turing test, where, whether you are doing it consciously or subconsciously, you are going to be slanting the questions against the machine, which he often called my machine. So he would get on the phone with these Jeopardy people and have big arguments with them. And you have to understand that Jeopardy was born in the 1960s right out of the quiz show scandals. The quiz shows in the 1950s were crooked. They were bought. And Congress went in. President Eisenhower expressed indignation. It was a national scandal, and quiz shows were subjected to scrutiny and to a regime that I would say is much more stringent than anything the SEC has on publicly traded companies. [laughter] And so they do everything by procedure; it's like the space shuttle or something like that. Every contestant has to have an escort. The contestants can never be in the same room. If the board stops working, the contestants have to turn their backs to it, and somebody has to be talking to them to distract them so they are not thinking about the game. They have all of these procedures, and for a researcher on a major project they are undertaking to bring up the idea that their question writing was biased was poisonous to them. So I went out and talked to the chief of Jeopardy about two weeks later. His name is Harry Friedman; he is the executive producer, and he works on the Sony lot in Culver City, which is part of Los Angeles. And he told me that basically, we are stepping back. We don't know if we're going to do this project. This was in May, and I was supposed to deliver my first chapters in July, and I had this whole thing running, and it was devastating to me that he could step back, but I think he was engaged in a game of chicken with IBM. But anyway, it was a pretty nerve-racking time. In the end they found a bunch of Jeopardy clues that had already been written and they used those, and that satisfied Ferrucci, and they got past that bump. Part of my challenge in writing this book was to write not only about Watson, but about the broader implications for artificial intelligence and question answering.
So I talked to other people in AI and I met a lot of skeptics; a lot of people were skeptical about the Watson program, not only because it was full of advertising and promotion, but also because of the very nature of the computer itself. There are people at Vulcan here in Seattle who sponsor a project called Halo, and the Halo project attempts to actually teach the computer concepts so that it comes closer to understanding, so that it can reason based on things that it knows. So you teach it chemistry, and the idea was to teach it enough chemistry that it can pass a high school advanced placement test in chemistry. Watson could not do this, because Watson doesn't know anything, and Watson certainly cannot reason based on the answers it comes up with, because those are just strings of ones and zeros as far as Watson is concerned. But the trouble with Halo is that it costs about $15,000 per page of a textbook to teach this information to the machine. The machine is limited to what you teach it. It cannot reason outside of that, and so it can't really get past human intelligence at all. It can do a more efficient job of conveying what humans already know. And if you go outside of what it knows, it is very brittle and can't really deal with it. Then I went to the other extreme. I went to MIT and I talked to this researcher named Josh Tenenbaum, and he was skeptical about Watson for a different reason. He went over that before-and-after, the Baby Ruth Ginsburg thing, and he said, okay, I have only seen a before-and-after one time, but then he made one up: Thomas Jefferson Davis, you know, a founder of the Republic who later led a rebellion against it. And he came up with that in a minute, and he said, see, that is what human intelligence can do. Human intelligence not only can understand and answer questions, but it can figure out how to create new ones. And his idea is that we are just at the beginning of a long path towards truly smart machines, and he compared smart machines to the Apollo project. In 1960 Kennedy says we are going to go to the moon, and in 1969 people go to the moon. And he says that project began with Galileo, you know, writing the mathematics expressing motion, and then it goes through the whole scientific revolution, and the science was carried out from, you know, the 1500s until 1960. And then the science was mostly settled, and it was the engineers who take the science and send the man to the moon. And he says in terms of understanding the brain, which is where we are really going to get intelligent machines, we are really only at Galileo now. So he went to the other extreme. And Ferrucci would agree with him. Ferrucci had to build a machine in three years that can play Jeopardy, and he doesn't really care if you call it smart or not, and it isn't. I mean, it is not smart. So anyway, the match is getting close. They have a team, called the stupid team, that is focused on how Watson can embarrass itself, especially in Final Jeopardy, because it is a bad guesser and in Final Jeopardy it has to answer even if it has low confidence. With any other question, if it doesn't understand it, if it doesn't have confidence, it doesn't buzz and it doesn't make a fool of itself. In Final Jeopardy they worried that it might make a fool of itself.
And so they actually considered the idea of having Watson shrug its digital shoulders and say, I don't know, because that would be a much smaller embarrassment than what eventually happened. Watson wins the game. I was there that January afternoon, and those poor humans. The set kept breaking down, and you see it on TV as three half hours, but it went on for hours and hours. The machine did not suffer at all, but those humans were going through hell. And plus, it was like they were the visiting team, because the whole crowd was cheering for the machine. [laughter] So it was tough for them. Watson embarrassed itself, as I'm sure many of you know, with a category that was US cities, and it guessed that Toronto was a US city. If we get to it in the questions and answers, I'll be happy to walk you through why it made that mistake, or how it made that mistake, but I don't want to deal with it now. After the show, I am writing my last chapter, and the Sony people call me and say, Alex Trebek wants to talk to you. Alex Trebek is the host of Jeopardy, and he was a good sport during this, because he had to entertain the crowd while people fixed the machines behind him, for hours. And he was very upset, because he said that the IBM people had not played straight and that they had sandbagged. Sandbagging is a term you use when you pretend that you are weaker than you are: if you are in Las Vegas and you act like you don't have a good hand, and then you put down your winning cards. In the test matches where Watson played Ken Jennings and Brad Rutter the two days before this competition, Watson pursued a very dumb, elementary strategy. It just went straight down the board from the easiest to the hardest questions. And once the real game started, Watson began hunting the high-value questions, looking for daily doubles. And so Alex Trebek said that they flipped a switch and made it a smarter machine, and that that was not fair. And the IBM team said, well, you know, we were just testing out the electronics in the test matches and then we turned it to match play, and anyway, we sent a DVD to Ken and Brad, and they had seen that Watson shops for daily doubles. And Ken and Brad said that they sandbagged regularly in test matches, so why shouldn't the machine? [laughter] And so it wasn't as big a deal for them as it was for Alex Trebek. Now after the show there was all kinds of criticism of Watson, and it was as if humanity was offended that this machine had beaten humans at Jeopardy, and for the good of humans they were arguing back and saying, Watson doesn't know what it is. True. Watson doesn't know anything. True. Watson won because it was really fast on the buzzer. Well, that's true, but you have to take into account that it had to get the right answers a whole lot of times for the buzzer to make any difference at all, because if it is fast on the buzzer and gets the answers wrong, it gets killed. So I give them partial credit on that. But the larger story is that people wanted to denigrate Watson and show that human beings have superior intelligence. And this is true. But I think they are missing the point.
And the point is that this type of technology is coming, and I am sure some of you are working on technology like this that might be competing against what IBM is doing, and it is not going to be too long before Watson seems like an extremely primitive, highly wasteful machine for answering questions, because we're going to have much more intelligent, much more efficient ones in every area of our lives, providing answers for us in all kinds of industries. And so I wrote a fable for the Wall Street Journal about this, and it goes like this: Imagine a little town in Italy in the year 1550, and there is this one man in this town who is brilliant. He can sense, by the pattern of the birds and the feel of the air and the noises coming from the barn and many other data coming in, using these immense powers of pattern recognition and intelligence that he has, that tomorrow it is going to freeze, or tomorrow it is going to snow. This guy is the weatherman in this town, and he uses his human genius to come up with this analysis, and it is very valuable. And then a wagon comes into town with a big new instrument in it, and it is a barometer made in Florence in the 1550s. And the barometer doesn't know who it is, isn't intelligent, doesn't know what it is doing. It is just an idiotic machine, but it produces the same answers that this guy did with his magnificent brain. And so forget about what is so special about the human brain and just say, we are going to have machines, and many of you are going to build them, that answer our questions. So what are we as a society going to need to know to make a living, to advance beyond the machines, and to use our intelligence and these machines to keep our jobs and make the world a better place? I think that is the challenge we face as a society as these machines march forward, as I am sure they will. So that is my talk, and I would be very happy to answer any questions that you have about IBM, Watson, anything else. Yes? >>: Excuse me for coming in late. You may have mentioned this. At one point in your talk you said that they decided to use questions from previous Jeopardy games? And my fear is that the IBM team had run 20,000 or more questions through Watson to see how good it was; what if too many of the questions they ended up using in the game were actually part of the ones… >> Stephen Baker: Okay, they had a training set of 180,000 previous Jeopardy questions. They hadn't gone through all of them, but they had gone through more than 100,000 of them. I misspoke when I said previously written Jeopardy games; these had not been used yet. They were written for the season that was coming up, which was going to start in June. They wrote 100 games for the season, which was going to start in June of 2010 and go through November. And they could take 30 of those and use them for this. Then they hired a company that specializes in making sure that they do everything by the book, and that company chose them, and they did all that. So it was ones that had not been used before. Yes? >>: A technical question about the buzzing. >> Stephen Baker: Uh oh. >>: How is Watson given the clue? Because a human still needs some fractional time to read the clue, so is there still an advantage there? >> Stephen Baker: Watson gets it as basically an e-mail, right? It just gets text.
And then it has to parse the sentence and figure out what is the subject, what is the direct object, and all those types of things. So I would argue that humans have the advantage when it comes to reading a sentence quickly. I would argue that that is one of the areas where it doesn't have an advantage, because it can misunderstand a lot of these questions. >>: Yeah, we worked on parsers ourselves so we know [inaudible] >> Stephen Baker: And I think it took about half a second of those 3 seconds; half a second was usually needed for parsing. Yes? >>: You mentioned in the book that there was some sort of blocking used in games when humans played, so that you can't buzz too early. So there is a fair playing field because you can't buzz too soon. They had to have a mechanical or electronic way of ensuring that it doesn't buzz too soon, so I think that's what you were asking, yeah. >> Stephen Baker: Well, the other thing is, see, the way it works is there is a human sitting at the judges' table who listens to Alex Trebek's voice, and when he judges that Alex Trebek is finished reading the clue, he hits a button that turns on a light and opens up the buzzing. So the way humans like Ken Jennings and Brad Rutter turn out to be phenomenally fast buzzers is they anticipate the end of his sentence and they get a feeling for the rhythms of the voice, and so it's humans understanding humans, and in that way it is kind of like jazz, because you are waiting for that downbeat. Watson has no anticipation but lightning reflexes, about 10 ms once the light goes on, and so the humans are like jazz and Watson is more like techno. [laughter] Yes? >>: Can you describe the way it was, they had 100 or maybe 200 of these single-category expert systems, and then there would be some layer on top that would select from those hundred. Now how does that work when you also said there was this other problem where there were two levels [inaudible] it seems those were two questions to ask, because each question would have its own answer, so how do you combine them? The second level has to analyze the data from the lower-level one? >> Stephen Baker: You mean on the North Korea question? >>: If the question was, and this is alphabetically first of the noble gases. Well, it could produce the noble gases, but now it has to select them alphabetically, and so that would be a second-level process. And so how does that play into the speed? >> Stephen Baker: Well, it does the same thing. It calculates, just like a human would, if it works. It doesn't always work. But if it works, it says, okay, what are the noble gases? And then it lists them, and then the second part is to list them alphabetically and pick out the first one. >>: So then you can imagine that the alphabetizing has to wait for the other guy… >> Stephen Baker: It does, it does. >>: Okay, so they can wire it that way? >> Stephen Baker: Yes. Now with North Korea, you could look at a list of the northernmost countries. It does not have to be one before the other. It could look at 100, 200 countries and see which one is farthest north and then look among them for which one the United States does not have diplomatic relations with. Maybe it does it that way, but I kind of think about it in a human way, which is to think of the four… >>: And so [inaudible] Finland or something, what was the answer to that question because it was [inaudible]. If it ran them in the wrong order… >> Stephen Baker: Right.
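A toy Python sketch of the two-step decomposition being discussed in this exchange: the inner sub-question produces a candidate set, and the outer operation, alphabetizing or finding the farthest north, has to wait for that set. The hard-coded lookups stand in for Watson's real evidence search and are purely illustrative.

```python
# Toy illustration of nested decomposition: answer the inner sub-question first,
# then apply the outer operation to its result. Lookups are hard-coded stand-ins.

def inner_subquestion(topic):
    knowledge = {
        "noble gases": ["helium", "neon", "argon", "krypton", "xenon", "radon"],
        "countries without US diplomatic relations": ["Bhutan", "Cuba", "Iran", "North Korea"],
    }
    return knowledge[topic]

def outer_operation(candidates, operation):
    if operation == "alphabetically first":
        return min(candidates)                      # must wait for the full inner set
    if operation == "farthest north":
        latitude = {"Bhutan": 27.5, "Cuba": 21.5, "Iran": 32.4, "North Korea": 40.3}
        return max(candidates, key=latitude.get)
    raise ValueError(f"unknown operation: {operation}")

print(outer_operation(inner_subquestion("noble gases"), "alphabetically first"))   # argon
print(outer_operation(
    inner_subquestion("countries without US diplomatic relations"), "farthest north"))  # North Korea
```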
>>: I think the point of discrimination is when the final answer [inaudible] it wasn't looking at a separate subject. >>: Yeah, but the general way of doing it is they have 200 [inaudible] each producing a separate set of answers and then somehow choosing between those 200. There is no way to combine the two [inaudible] in the general [inaudible] >> Stephen Baker: Right. Each one comes back with its own candidate answers, and each one develops its own track record on each type of question, and so the computer develops more confidence in certain algorithms for certain types of questions, for each category. So it basically discounts the, I just… Yeah? >>: So I have two quick questions. The first is, how did IBM and Jeopardy try to train this? Because my question is, Watson didn't have very many environmental factors taken into its system, so it didn't know how well the other players were doing. It didn't get that sense of emotion. It didn't get the lights and the awe of being on the set of a TV show. It didn't get the intonation of Alex Trebek's voice, all of those environmental factors that humans hear and sense. So how did IBM try to take that into account? Did they try to claim that this was a fair match based on the fact that Watson wasn't getting all of these types of information, even though it didn't… Not that it impacts the humans, but… >> Stephen Baker: They didn't get too hung up on fairness, quite frankly. Because they thought that the Jeopardy people were hung up on fairness, and in a man-machine match it is never going to be fair, because they are totally different things. The machine had no memories of human life and no experiential memories. Here is one of the questions that would confuse a machine. When entering a room, there is this little piece of wood, what is this thing called? Baseboard. Okay. When entering a room, which way do you look to see the baseboard? Humans know, but the machine would have to look for statistical evidence, in literature or wherever, that you look down to see the baseboard. It would be a very difficult thing for the machine to figure out. So basically Ferrucci's point was, it will never be a fair match. It can only be a fun match. [laughter] And it can tell us things, but it is not going to be fair. >>: Can I go back to half of that? What was the most surprising thing, not about Watson specifically, but just your whole experience writing the book? What surprised you the most? It could be the types of people you interacted with, or something technical about Watson, or just in general? >> Stephen Baker: What surprised me the most? Well, one thing that surprised me, you know, I learned a lot in this book, because I went in, as I do in many projects, pretty ignorant about this stuff. Like, they had to build two Watsons. They had to build a development Watson and a game Watson. And the development Watson was slower but would give detail on absolutely every decision it made. And so that type of thing surprised me. But what really surprised me was that a machine could do this. It kind of just knocked my socks off. Yes? >>: How big was the team in terms of, like, man-years? >> Stephen Baker: Well, I would say between 20 and 25 people over four years, so that is almost 100 man-years. >>: And is this the same as the DeepQA project, or are they…? >> Stephen Baker: Yeah. Watson is the branded, fancy promotional name for it.
They call it Watson when they want to get on TV, and they call it DeepQA when they want to sell it to American Express. [laughter] Yeah? >>: Would you please tell us why Watson gave the answer Toronto…? >> Stephen Baker: Okay. Through statistical analysis of the Jeopardy categories, Watson learned to distrust them, because the questions were often not directly associated with the categories. For example, you might have American Authors, and then the question would say something about J.D. Salinger's protagonist in Catcher in the Rye, so it's Holden Caulfield, who is not an American author; he is a protagonist. And then you have others that are just bizarre. Like they had one called Country Clubs, and it had to do with the different sticks that are used in different countries to beat people with. [laughter] So, you know, in France it's a baton. A baton came up as a country club in Country Clubs. So Watson learns to discount the category names, to not pay a whole lot of attention to them. Now, if the answer it comes up with is Toronto, and the category is US Cities, that drives its confidence down a lot, but it doesn't kill it as a potential answer. And so basically Watson didn't know the answer. But it came up with Toronto and gave it 15% confidence, which is just minimal. Anything under 20% is basically garbage, and Chicago was at 12%. It should have shrugged its shoulders. It put about six question marks after Toronto to show people that it really didn't know, but it would have been much better to just put the question marks with nothing. [laughter] But the other thing is, because it doesn't understand a lot of questions and it doesn't understand a lot of data, it has to entertain the possibility that it may have misunderstood something, or that something unlikely can turn out to be true. So if you develop a system that is going to talk about male rock stars in American music history, and you have a list of male names, you know, Bob and Richard and whatnot, and then you have female names, and Alice is one of them, a computer that only goes by those lists would say that it is impossible that Alice Cooper could be a man. But it has to have its mind open to the possibility that Alice Cooper can be a man. And so it has to have its mind open that Toronto can be a US city, but here it came back and bit them. Yeah? >>: So what was the reaction of the development team to that response? >> Stephen Baker: Enormous embarrassment. And the funny thing was, it came after a session in which Watson had been utterly dominant. And they had every reason to be proud of what they had accomplished, because it had just mowed down the humans, and then it embarrassed itself right there. Incidentally, after that second round, even though it screwed up on that question, Sam Palmisano, the CEO of IBM, was sitting with the Jeopardy people, and they were worried about a good show, and Watson was making a mess of the good show, and the ratings did go down for the third day because of that, and Palmisano comes up to Ferrucci and says, maybe we should tone this thing down a little bit, dial this thing down a bit. So, any others? Yeah?
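As a footnote to the Toronto answer walked through above, here is a small Python sketch of that confidence logic: a category mismatch discounts a candidate rather than eliminating it, a threshold decides whether to answer at all, and Final Jeopardy forces an answer regardless. The numbers and helper names are invented to mirror the example, not taken from Watson.

```python
# Illustrative sketch of the Toronto moment. A category mismatch discounts a
# candidate's confidence but doesn't rule it out, and a threshold governs
# whether to answer; Final Jeopardy forces an answer regardless. Numbers invented.

ANSWER_THRESHOLD = 0.20          # below this, the best candidate is just a guess

def discounted(confidence, fits_category):
    # Categories are unreliable signals, so a mismatch cuts confidence sharply
    # instead of zeroing the candidate out entirely.
    return confidence if fits_category else confidence * 0.4

def respond(candidates, must_answer=False):
    """candidates: {answer: (raw_confidence, fits_category)}."""
    scored = {a: discounted(c, fits) for a, (c, fits) in candidates.items()}
    best = max(scored, key=scored.get)
    if scored[best] >= ANSWER_THRESHOLD or must_answer:
        return f"{best} ({scored[best]:.0%} confidence)"
    return "?????"                # the digital shrug

final_jeopardy = {"Toronto": (0.38, False), "Chicago": (0.12, True)}
print(respond(final_jeopardy, must_answer=True))   # -> "Toronto (15% confidence)"
```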
>>: I think it was in the article that Ken Jennings wrote after the game; he said that Watson was having trouble on short questions, and those were the ones that the two of them could win because they took less than 3 seconds for Alex to say, and I just want to hear what you… >> Stephen Baker: Watson needs at least 3 seconds to go through all of its processes. And so if you have a category in which the clue is just two words, I think it was something like wives of movie stars, and the only clue is Demi Moore or Ashton Kutcher or whatever it is, then that doesn't give Watson enough time, and in that category, where they were racing, the humans were beating it. Watson was getting them all right but coming in late. >> Kirsten Wiley: Let's do one more. >> Stephen Baker: One more? >>: I think with all the IBM guys [inaudible] have been working [inaudible] next chance they might have? >> Stephen Baker: I don't have, I mean, I know what Ferrucci wants to do. He wants to basically build a crowdsourced system where Watson learns to ask questions and can use the world to educate it, and build up a big knowledge base that way. But I don't know what their next ones would be. I would like to know if this is something that you folks would like to work on--I mean, this idea of 25 people working for four years on something, is that something that Microsoft should be doing? There are only a handful of companies, like three in the whole world, that could do a project like this, and you guys are one of them. I think it would be fun. And I would like to write about it, incidentally. [laughter] If you guys pick something really fun to do, I would love to come out here and do another book like this one. >> Kirsten Wiley: Thank you, Steve. >> Stephen Baker: Sure. [applause]