
>> Kirsten Wiley: Good afternoon, my name is Kirsten Wiley and I am here today to
introduce and welcome Stephen Baker who is visiting us as part of the Microsoft
Research Visiting Speaker Series. Stephen is here today to discuss his book Final
Jeopardy: Man vs. Machine and the Quest to Know Everything. For three nights in
February 2011, IBM’s computer sensation, Watson, battled human champions Ken
Jennings and Brad Rutter in an epic match on the quiz show Jeopardy. When Watson
trounced the competition, it left many of us wondering what the future looks like.
How do smart machines fit into our world and how will they disrupt it? Stephen Baker
was BusinessWeek's senior technology writer for a decade. He is the author of The
Numerati and has written for the LA Times, Boston Globe, and the Wall Street Journal.
Please join me in welcoming him to Microsoft. [applause]
>> Steven Baker: Okay. Is this the microphone? Yeah, okay, it is so little. Hi
everybody, thank you for coming. You know, I was here three years ago, a little over
three years ago to talk about The Numerati. Kim Ricketts brought me here and I was so
sorry to see--I had had such a nice time with her--I was sorry to see that she had
died several weeks ago, and so I just wanted to express my condolences to the Microsoft
community and the Seattle community and the book lovers in Seattle whom she did so much
to nurture. But you know I was here for The Numerati and it was about data and
people like you who understand and work with data. And this one is about people like
you building a great machine, except these are people like you who work at IBM. So there is
something in common.
Now, I was finishing up at BusinessWeek in 2009, and I was going to leave when
Bloomberg took over. I could get a deal with severance pay when Bloomberg took over
BusinessWeek in December of 2009, so I was heading for the exit and looking for my
next project. I had this idea to do a book about what do you need to know, and the idea
being that there is information. Machines and networks give us information, so much
information that they didn't used to give us and so what information do we need to keep
in our heads? And I did a big proposal for that and I had high hopes for the book, and my
agent, I mean my editor at Houghton Mifflin, said no, it is too vague. It is an interesting idea
but you need a story to hang it on; you need a tale. You can't just--people don't want to
read about Stephen Baker kind of talking to people wondering what you need to know.
You need a story. She also said you also don't have any answers. I said well I would get
that in the research, but she said no, you can't sell a book based on research that you
haven't done yet and convince us to buy it. [laughter] So then I was having lunch with
these people at IBM, this was November of 2009 and they told me about this Jeopardy
computer Watson that they were building that was going to take on humans, and I
thought this is my story because it has a beginning, it has an end, it ends in a
championship match so it can be almost like a sports story, a narration that tells about the
struggle with this idiotic machine in the beginning and gradually the machine gets
smarter and smarter and in the end it has this big confrontation and we don't know how
it's going to turn out.
So that was the kind of story that I wanted to write. I do a proposal, and I tell my editor
Amanda Cook at Houghton Mifflin we cannot afford to sit on this book the way we sat
on The Numerati for a full year between the time I wrote it and the time it was published.
This is news. This is happening. This event is going to take place in January. The TV
show is going to be in February and we have to run like hell and get that book in the
stores by September 2011. And she said, you know, we’re going to have to run a lot
faster than that. What I want you to do is do the first 9/10 of the book by November, this
last November, we edit that over the Christmas holidays, you go to the match in January,
you write the last chapter a couple of days after that match in January, and then the day
after the TV show we start selling the book. And so that was her idea. And then
somebody at Houghton Mifflin said, you know, there is so much anticipation before these
events. They are going to be hyping the hell out of this thing, and we want
something to sell during that period, so what we are going to do is try to convince
Amazon to sell a partial e-book, Amazon and B&N, a partial e-book where they will sell
the first nine chapters a few weeks before the event and then people can, e-book readers
can get the final chapter sent to them as an update right after the event.
So that is what we ended up doing and I thought it was kind of innovative and fun and
there were some glitches, but, you know, that's what it was. I should mention that if you
do read the book, I thought that it was important to get Microsoft's voice in this book and
it is not an easy thing for Microsoft to participate in a book that might turn out to be
lionizing engineers at IBM, so I didn't get Microsoft but I did try and I regret not having
Microsoft's voice in there. Now this story began in 2005. IBM, as you know, has these
grand challenges and part of the reason that they have them is because they don't have
anything to sell to consumers. IBM when I was a kid was the preeminent tech brand.
People wanted IBM typewriters, and later they wanted IBM computers and now nobody
buys anything that is associated with IBM and so they don't have a brand that people
know about as Microsoft does.
So they come up with these things and these are contrivances, these grand challenges, and
they are often criticized for them because it is looking for a lot of PR and it has a lot of
hype around it, but they would argue that they serve a purpose and I happen to agree.
And they had finished, they had had a chess match in 1997 where Deep Blue beat
Garry Kasparov, and then they did Blue Gene, which was at the time the fastest
supercomputer, and that was supposed to be the next grand challenge. It did not get as
much publicity, and they were looking for the next one. And the head of IBM Research
Paul Horn was walking around Research trying to engage, get teams to sign on to a
Jeopardy challenge. And he couldn't find anybody that was interested in it. First,
Jeopardy seems to be kind of cheap, kind of crass. It is a quiz show; it doesn't have the
elegance and sort of the eternal value of chess. It is just a quiz show that is run by a
company that puts advertising on it, and so I think researchers were a little bit hesitant to
sign up to build a machine to play Jeopardy. It seemed a little bit trivial. And they deal
with trivia. And secondly, it was hard because they had, IBM had a question answering
system called Piquant, and every year it took part in these government-sponsored tests,
the NIST competition. And Piquant had all kinds of troubles and it topped out in about the
35% range on questions that were much simpler than Jeopardy questions. And plus it had
a lot of time to deal with each question. I will give you an example of a problem with a
Piquant question. One of the problems is it had trouble figuring out what it was
supposed to look for. So in this one, one question says what is Francis Scott Key known
for? Now that seems like a pretty easy question, but if you ask a computer what
something is known for, what does known for tell you? Is it known for doing something?
Is it known for being something? Is he known for being the victim of something, or the
plaything of something? It wasn't clear. And so those sorts of misunderstandings
hobbled their computer and as I say it got about one out of three questions right.
Now Jeopardy, you have to answer a question in 3 seconds and it is a very wide, it is an
incredibly wide domain. Sometimes just figuring out what you are looking for is very
difficult. I will give you an example. Here is one Jeopardy clue. I am paraphrasing it
but: his daughter and grandson, both Indian premiers, were assassinated. So what
is the computer looking for there? It's not looking for a daughter or a grandson. It has to
figure out the relationship and say, if his daughter was killed then we are looking for a
father, and if his grandson was killed then we are looking for a grandfather. So teaching
the computer to make that kind of assessment turned out to be really challenging.
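As a rough illustration of that inversion step, here is a minimal sketch, assuming a hand-written table of kinship relations; the table and the string matching are invented stand-ins, not how Watson actually did its linguistic analysis.

```python
# Minimal sketch (not IBM's code): to answer the clue you must invert the
# relation it mentions. The relation table and matching are illustrative only.

INVERSE_RELATION = {
    "daughter": "father",        # "his daughter ... was assassinated" -> find the father
    "grandson": "grandfather",
    "son": "father",
}

def answer_type(clue: str) -> str | None:
    """Return the kind of person the clue is hunting for, if a kinship word appears."""
    for relation, target in INVERSE_RELATION.items():
        if f"his {relation}" in clue.lower():
            return target
    return None

clue = "His daughter and grandson, both Indian premiers, were assassinated."
print(answer_type(clue))  # -> "father": the clue wants the parent, not the victims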
Another thing that spooked them about the Jeopardy game: in the beginning they
thought that the computer would have to have voice recognition and that would just add
much more complexity and difficulty to it. So they couldn't get anybody to do it. But
Horn tried again because he was getting pressure from above and they really thought that
the future of their consulting business was building technology to grapple with oceans of
unstructured data that was flowing across networks.
So he finally got David Ferrucci to sign on, but he only signed on conditionally. Now
Ferrucci who you have probably seen on the, if you have been looking at the website or
you see him on the television commercials, he is a highly neurotic New Yorker. And he
is extremely loquacious, so he will talk and talk and talk and he is very nervous, and so
he makes for a great source for a book because the guy talks about anything at great length
which is great for me. He had been working in semantic analysis and building a software
platform to, an analytic platform to deal with different streams of unstructured data. It
was called UIMA. That was one of the reasons he turned down Jeopardy the first time
because he was busy with UIMA, but he finally said he would take it on; this was early in
2007, but he said he needed six months for a feasibility study.
He had two big issues. One of them, well both of them had to do with working at a big
company, big research outfits, so I thought it might be relevant to you. First, one of his
fears was that it would be too hard and the machine would fail and embarrass the
company and embarrass him and his researchers. The second fear, which is sort of the
diabolical twin of that fear was that it would be too easy, and that after IBM pumps all of
this money into it, and all of this hype already starts, and some college kid peeks out from
his dorm and says, you know, I think I have this figured out. I did it with a search engine
and some open source stuff and that would be his nightmare. And so he tried to, he
figured he would try to simulate that by getting one of his researchers to pretend that he is
what they call a basement hacker. But he had another problem, Ferrucci did: this
Piquant system, the system that got about one out of three right in question answering, he
wanted to kill it. And inside the company there were people who had
supported Piquant, and Piquant had a certain constituency within IBM Research, and he
had to basically figure out how to kill Piquant. So what he did was he set up his
researcher James Fan, who had just come from the University of Texas, and he put him
apart from the rest of the team and he said I am going to give you 30 days to come up
with a Jeopardy machine. You can use search engines, you can use Wikipedia, you can
use the whole internet, you can use any open source software you can get, anything that
anybody else would get, you can use it. Do the best you can in 30 days. And meanwhile
he told the Piquant team your machine is going to be facing the basement baseline in 30
days, but don't do anything to it. He basically said just change it so it answers
questions in the form of a question, the way Jeopardy does.
So Fan takes off on his mission here and he looks for all kinds of simple solutions
because he doesn't have much time. I will give you an idea of one of his solutions. He
looks, he types the entire convoluted Jeopardy clue into a search engine query and then
he looks at the first page of results, and the first page of results often points to a
Wikipedia page and the title of that Wikipedia page occasionally is the correct answer for
the Jeopardy clue. Now that would only work for maybe 5% of Jeopardy clues. You
could not build a system, a functional system with that, relying on that type of algorithm,
but he thought that if you had maybe 100 other algorithms that were equally clueless in
every other area but in their one area of specialty, maybe together they could provide 60,
70, 80% of the answers. So that was sort of Fan's idea.
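Here is a rough sketch of that idea: many individually weak candidate generators, each feeding a shared pool. The web_search function is a hypothetical placeholder for whatever search backend is available, not a real API, and the whole thing is an illustration rather than IBM's implementation.

```python
# Sketch of the "many weak specialists" idea (illustrative, not IBM's code).
# Each generator is nearly useless on its own; pooled together they cover more clues.

def web_search(query: str) -> list[str]:
    """Hypothetical stand-in: return result titles from some search backend."""
    raise NotImplementedError

def wikipedia_title_generator(clue: str) -> list[str]:
    """Fan's trick: if a top hit is a Wikipedia page, its title is a candidate answer."""
    candidates = []
    for title in web_search(clue)[:10]:
        if title.endswith(" - Wikipedia"):
            candidates.append(title.removesuffix(" - Wikipedia"))
    return candidates

def candidate_answers(clue: str, generators) -> set[str]:
    """Pool the (mostly wrong) guesses from every specialist generator."""
    pool: set[str] = set()
    for generate in generators:
        pool.update(generate(clue))
    return pool
```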
And he built his machine and it had all kinds of weaknesses, but in the bake-off
according to a couple of different metrics it matched Piquant. And so Ferrucci killed
Piquant and he used the architecture that Fan had designed for the Jeopardy
machine. So in mid-2007 he sends a memo that says we will do this: within 3 to 4 years
we can build a machine that will be competitive with championship caliber Jeopardy
players, and in 5 to 7 years it will be invincible. But he said he didn't think it was worth
it to spend the money and the extra two years to make it invincible. There was no
business case for making an invincible Jeopardy machine.
So they built a system that has all of these competing algorithms that come back down,
each of them bringing its own candidate answers and then it is up to the analytic engine
on his UIMA platform to sift through them and figure out which one it has confidence in
and whether it has enough confidence in each answer to bet on it. So I'll give you an
example of some of the problems they faced. Chile's--no, I'm sorry. Argentina's largest,
longest border is with this country. I already said it. It's Chile. So they give that to
Watson, which at that point was called Blue J, and lots of its algorithms go off
hunting. And some of them are specialists in limericks. And they bring back absolute
garbage that has to do with South American limericks. And some of them are specialists
in pun detection and some of them are specialists in things that have to do with numbers
and math. And so they're all bringing back garbage, but two of them bring back
something that is close to the answer. One of them comes back, one of them is kind of
like a search engine type algorithm that looks for how many times other borders are
mentioned with Argentina's border, I'm sorry, with Chile's border, anyway, Chile's border.
And another one is a geography specialist. And so they both come back and the search
engine one says that the answer is Bolivia and the geography specialist comes back and
says the answer is Chile, or Argentina, whichever one it was, anyway.
It picked the wrong one. It paid attention to the popularity instead of the geography. So
they have to go and look and see how it came up with that and what evidence it was
looking at and how it was weighing that evidence and then they go and they change the
weights a little bit so that on that type of question it will pay more attention to the
geography specialist rather than the search engine specialist. And then they have to go
through thousands more questions and see whether that screws up their accuracy in
other areas. So that is what they were doing day after day, tweaking all these different
algorithms. Another one that they came up with was called Nested Decomposition. And
that is important because there are many Jeopardy questions that require two levels of
analysis. So this is one of them. Of the four countries that do not have diplomatic
relations with the United States, this is the farthest north. So again, all of the algorithms
go out and they hunt and they bring back their ridiculous limericks and parables and
things like that, but one of them has the sense to break the question into two and say
okay, what are the four countries with which we don't have diplomatic relations? Bhutan,
Cuba, Iran, North Korea. And then it hands it over to a geography, just like a human
would, the geography part of it goes and looks and finds out which one is farthest north.
So that is the Nested Decomposition.
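Both the weight-tweaking on the Argentina clue and the nested decomposition feed the same final step: merging every specialist's candidates into one ranked list. Here is a toy sketch of that merge, with invented scores and weights; the real DeepQA merger was far more elaborate.

```python
# Toy sketch (not the real merger): each specialist scores its candidates, and a
# per-question-type weight decides how much that specialist's vote counts.
# All scores and weights below are invented for illustration.

from collections import defaultdict

def merge(votes: dict[str, dict[str, float]], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Combine specialist scores into one weighted confidence per candidate answer."""
    combined: dict[str, float] = defaultdict(float)
    for specialist, scored in votes.items():
        for answer, score in scored.items():
            combined[answer] += weights.get(specialist, 0.0) * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# The Argentina-border clue: popularity liked Bolivia, the geography specialist liked Chile.
votes = {
    "popularity": {"Bolivia": 0.9, "Chile": 0.2},
    "geography":  {"Chile": 0.9},
}
print(merge(votes, {"popularity": 0.7, "geography": 0.3}))  # Bolivia on top -> wrong
print(merge(votes, {"popularity": 0.3, "geography": 0.7}))  # Chile on top after re-weighting
```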
Then they had another category in Jeopardy called before and after. And before and after
is a thing specific to Jeopardy, which is two concepts that would have nothing to do with
each other except the shared word in the middle. So there is one that says this candy bar
is a Supreme Court justice. So it is Baby Ruth Ginsburg. Baby Ruth is a candy bar; Ruth
Ginsburg is a Supreme Court justice. And so they trained Watson to do that and Watson
could startle people with its ability to do those sorts of ludicrous before and after
questions. And this raised a concern among IBM people and especially Ferrucci, because
it raised the question of whether they were spending a lot of time training a machine to
basically win a quiz show. And as researchers, as you can all imagine, they were very,
very sensitive to this subject that they were participating in a big PR extravaganza where
they were going to spend a lot of time and a lot of PhD hours fine tuning a machine to
win a ridiculous game. And so they would always stress, you know, we taught it these
before and after algorithms, but it only took a tiny bit of our time and most of it, most of
our attention is focused on creating advances in English language, or natural language
systems that can answer questions and dig through data. And so it was always a sensitive
subject for them.
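The before-and-after construction itself is just string work; here is a toy sketch, illustrative only and not IBM's code, of gluing two answers together on their shared word.

```python
# Toy sketch of the "before and after" trick: two answers glued on a shared word.

def before_and_after(first: str, second: str) -> str | None:
    """Join two phrases when the last word of one is the first word of the other."""
    a, b = first.split(), second.split()
    if a[-1].lower() == b[0].lower():
        return " ".join(a + b[1:])
    return None

print(before_and_after("Baby Ruth", "Ruth Ginsburg"))            # Baby Ruth Ginsburg
print(before_and_after("Thomas Jefferson", "Jefferson Davis"))   # Thomas Jefferson Davis
```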
And that subject came up also in strategy, because strategy was another area where
clearly they were training a machine to win the game and it didn't really have anything to
do with what the machine was going to do in its career after the Jeopardy game. And so
they always tried to deemphasize the amount of effort they were putting into the
machine’s strategy. Now they had this guy Gerry Tesauro, I don't know if any of you know
him. He is an IBM researcher who specializes in games and in the 90s he created a
system where basically, a machine learning system to train a machine to be a champion at
backgammon. And it goes through millions and millions of simulated games and learns
incrementally, you know, calculating each move and the probabilities
with each move, and it became a backgammon champion. And so he used the same
system for Watson trying to figure out how much Watson should bet if it lands on a daily
double and it is behind by $1,000 with 10 questions to go. And so they
figured that out for Watson, and they came to the conclusion that humans underbet
dramatically. Humans are timid, and tend to pay way too much attention to the risks
involved and not enough to the rewards of betting heavily. So they went to Ferrucci and
they said we need to get Watson to bet much more aggressively in these daily doubles
and final Jeopardy.
And Ferrucci blanched at the idea, because it was scary to him that in the one game, or
two games when this machine is on national TV, a game that would have a lot to say
about where his career was going and his team, that Watson, even if it made statistical
sense to bet a lot, if it lost one of those big bets, what then? If Watson, if it makes sense
in a million games when there is no cost of losing for it to bet aggressively, that is not the
same as if you are on national TV and there is a high cost. And he said you guys didn't
represent the cost of what embarrassment means to us and what humiliation on national
TV means and you've got to incorporate that into your analysis. And so what they
basically did was turn their back just a little bit on the statistics that ran the whole
program and go back to the old ruling by the gut, returning to what humans always do,
which is look out for things that might kill you. And so they had to kind of rejigger and
make their betting strategy a bit more conservative than what the statisticians wanted.
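As a toy illustration of that compromise (not Tesauro's actual model; the probabilities and the risk term below are invented), pure expected value says bet the maximum, and adding a penalty for what a public loss would cost pulls the recommended wager down.

```python
# Toy wager model (invented numbers, not Tesauro's system): maximize expected gain,
# optionally minus a penalty that grows with how much a loss would hurt.

def best_wager(bankroll: int, p_correct: float, risk_aversion: float = 0.0) -> int:
    """Search wagers in $100 steps for the one with the best risk-adjusted value."""
    def value(wager: int) -> float:
        expected_gain = p_correct * wager - (1 - p_correct) * wager
        loss_penalty = risk_aversion * (1 - p_correct) * wager ** 2 / bankroll
        return expected_gain - loss_penalty
    return max(range(0, bankroll + 1, 100), key=value)

print(best_wager(10_000, p_correct=0.7))                      # 10000: statistics say bet it all
print(best_wager(10_000, p_correct=0.7, risk_aversion=2.0))   # 3300: the gut-check version
```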
When they, at about this point, well, when they did the questions on one server, it took
Watson about two hours to go through all of its calculations and they had to get that
down to 3 seconds. So they scaled it out to over 2000 cores and that was a big effort, and
I don't write that much about it in the book because I'm not a hardware guy. But it was a
big deal and they finally got it down to about two or three seconds. And they also sort of
pre-structured a lot of the data so that it was already analyzed for easy reading
for the computer: that so-and-so is a noun, and here is a verb, and this is the object of the
verb, and this is a name, and this is a county, and so all of that was spelled out so that the
computer could speed read. And its data cache at this point was about 75 GB, and then
that structuring took it up to 500 GB. And then they had to run the whole thing in RAM.
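Here is a small sketch of that pre-structuring trade-off; the tagging below is a trivial stand-in for the real linguistic pipeline. The point is to pay the analysis cost once, offline, so nothing has to be parsed during the three-second clock, at the price of a much larger footprint kept in RAM.

```python
# Sketch of pre-structuring (illustrative only): annotate the corpus once, offline,
# and store the analysis so question time is pure lookup. The "tagger" below is a
# trivial placeholder, not the real parser.

def cheap_tagger(sentence: str) -> list[tuple[str, str]]:
    """Placeholder tagger: label capitalized tokens as proper nouns, the rest as nouns."""
    return [(tok, "NNP" if tok[:1].isupper() else "NN") for tok in sentence.split()]

def build_index(corpus: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Offline step: the annotations take several times the space of the raw text."""
    return {sentence: cheap_tagger(sentence) for sentence in corpus}

index = build_index(["Argentina shares its longest border with Chile."])
# At question time the system only looks annotations up, which is why the roughly
# 75 GB of text grew to about 500 GB of pre-analyzed data held in RAM.
```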
When they were, as Watson kept improving its performance, they had this chart and it
was called the Jennings arc, and it was named after Ken Jennings who was the champion
who they were going to face in this match. They didn't know that they would be facing
Brad Rutter but they knew that they would be facing Jennings. So it showed how often
they buzz in, the number of questions they attempt, against the percentage of the questions they
got right. So in the upper right are the Jeopardy champions, because they buzz in a lot
and they get most of them right, and Watson was down here with like third grade level
people and was moving up. And they had this big blue cloud of Jeopardy champions and
then you had the red dots and they were Ken Jennings, and he was sort of like an outlier.
He was higher performing than the rest, and they had to get there. And they were making good
progress, but when I started the book, when I started researching the book in the spring of
2010, they were having their first sparring matches. And Watson was winning about two
out of three against former Jeopardy players, so these were good players. It was winning
about two out of three, but it was making some really embarrassing blunders. One time
when it was asked about the diet of a butterfly, Watson said that butterflies eat kosher.
[laughter] And one time it was asked about a famous French bacteriologist and Watson
answered, what is “How Tasty Was My Little Frenchman?” [laughter] It was the name of a
1971 Brazilian movie that it had come up with somehow. So this was a real concern for
them because, again, Watson could get most of the questions right, but when it didn't
know something, it was a horrible guesser. Because it didn't have any contextual, kind of
contextual smarts that humans have. So if it is looking for something, say it looked for
Oliver Twist, who was a character in a Charles Dickens 19th-century novel, and it came up
with the Pet Shop Boys, who were a 1990s techno band. So they had teams that were
focused on where can Watson embarrass us and let's go in and troubleshoot the areas
where it's most likely to embarrass us. But the areas where Watson could embarrass
them were as broad as the world of knowledge itself. So it was kind of a futile thing.
And what's more, if you think that they only had 120 questions in the two games that they
were going to play, the chances that the embarrassment you steered clear of was
going to be one that comes up in those games are remote.
So as I told you, Ferrucci, Dave Ferrucci, is a very nervous guy, and one day last spring he
comes to lunch with me and he's basically flying off the handle. He is extremely upset.
He has just been on a two-hour conference call with the folks at Jeopardy and they have
told him that the machine as he has it, and he and his team have it, it’s buzzing too fast.
And they had the machine basically just buzzing electronically and they said it is
going to kill the humans. You have got to create a finger for that machine. And so for
Ferrucci, first of all, all of his data from the sparring sessions was going to be as he saw it
no good anymore if you change the nature of the machine. Secondly, he said, you are
trying to graft human limitations onto my machine. We have built a brain
and you are trying to turn it into a robot. And what is more, if we build a finger and the
finger is really fast are you going to come in and say no I am sorry, that finger is a little
too fast and slow it down a little bit? And are you going to do other things to try to
balance it? And he said why don't you, you know, if we are trying to even the playing
field, my machine is clueless when it comes to language. So why don't you have the
humans deal with ones and zeros a little bit instead of everything being done
with language? [laughter]
But he lost that argument, and he knew he was going to lose it, because he knew on
TV people were going to say it was unfair if his machine was sending e-mails. So his
bigger question, which really freaked him out, was the potential of bias in the questions.
He, Jeopardy has about nine people whose full-time job it is to write Jeopardy clues. And
these people were going to be writing 15 tournament of champions games and the
computer was going to be in two of them. And he said it was impossible for them to
forget that they were writing potentially for a computer and to write questions for
humans, which is what Watson was trained on. He said this is going to
be turned into a Turing test where, whether you are doing it
consciously or subconsciously, you are going to be slanting the questions against the
machine, which he often called my machine. So he would get on the phone with these
Jeopardy people and have big arguments with them. And you have to understand that
Jeopardy comes from the world, it was born in the 1960s right out of the quiz show
scandals. And the quiz shows in the 1950s were crooked. They were bought. And
Congress went in. President Eisenhower expressed indignation. It was a national
scandal, and quiz shows were subjected to scrutiny and to a regime that I would say is
much more stringent than anything the SEC has on publicly traded companies. [laughter]
And so they do everything, they have all of their procedures, it's like for the space shuttle,
or something like that. Every contestant has to have an escort. The contestants can never
be in the same room. If the board stops working, they have to turn their
backs to it, and somebody has to be talking to them to distract them so they are
not thinking about the game. They have all of these procedures, and for a researcher on a
major project they were undertaking together to bring up the idea that their question writing
was biased was poisonous to them.
So I went out and talked to the chief of Jeopardy about two weeks later, his name is
Harry Friedman, he is the executive producer and he works on the Sony lot in California
in Culver City which is part of Los Angeles. And he told me that basically we are
stepping back. We don't know if we're going to do this project. This was in May and I
was supposed to deliver my first chapters in July and I had this whole thing running and
it was devastating to me that he could step back, but I think he was engaged in a game of
chicken with IBM. But anyway, it was a pretty nerve-racking time. In the end they
managed, they found a bunch of Jeopardy clues that had already been written and they
used those for it, and that satisfied Ferrucci and they got past that bump.
Now, part of my challenge in writing this book was to write not
only about Watson, but about the broader implications for artificial intelligence and
question answering. So I talked to other people in AI and I met a lot of skeptics; a lot of
people were skeptical about the Watson program, not only because it was full of
advertising and promotion, but also the very nature of the computer itself. There are
people at Vulcan here in Seattle who sponsor a project called Halo and the Halo project
attempts to actually teach the computer concepts so that it comes closer to understanding,
so that it can reason based on things that it knows. So you teach it chemistry and the idea
was to teach it enough chemistry so that it can pass an advanced placement test in
chemistry for high school. Watson could not do this, because Watson doesn't know
anything and Watson certainly cannot reason based on the answers it comes up with
because those are just strings of ones and zeros, as far as Watson is concerned.
But the trouble with Halo is that it costs about $15,000 per page of a textbook to teach
this information to the machine. The machine is limited to what you teach it. It cannot
reason outside of that and so it can't really get past human intelligence at all. It can do a
more efficient job of conveying what humans already know. And if you go outside of
what it knows, it is very brittle and can't really deal with it. Then I went to the other
extreme. I went to MIT and I talked to this researcher named Joshua Tenenbaum and he
was skeptical about Watson for a different reason. He went over the, that before and
after, the Baby Ruth Ginsburg thing, and he said okay, I have only seen a before and after
one time, but then he made up a before and after. And he said okay, I forget what it was.
Thomas Jefferson and Jefferson Davis--Thomas Jefferson Davis, you know, founder of the
republic who later led a rebellion against it. And so it was Thomas Jefferson
Davis. And he came up with that in a minute, and he said see, that is what human
intelligence can do. Human intelligence not only can understand and answer questions
but it can, it can figure out how to create new ones. And his idea is that we are just at the
beginning of a long path towards truly smart machines, and he compared
smart machines to the Apollo project. In 1960 Kennedy says we are going to go to the
moon and in 1969 people go to the moon. And he says that project began with Galileo,
you know, writing mathematics expressing motion, and then it goes through the whole
scientific revolution and the science was carried out from, you know, the 1500s until
1960. And then the science was mostly settled and then it was the engineers who take
science and they send the man to the moon. And he says in terms of understanding the
brain, which is where we are really going to get intelligent machines, we are really only
at Galileo now. So he went to the other extreme.
And Ferrucci would agree with him. Ferrucci has to build a machine in three years that
can play Jeopardy, and he doesn't really care if you call it smart or not, and it isn't. I mean it
is not smart. So anyway, the match is getting close. They have a team that is called a
stupid team that is focused on how Watson can embarrass itself especially in final
Jeopardy because it is a bad guesser and in final Jeopardy it has to answer, even if it has
low confidence. With any other questions, if it doesn't understand them, if it doesn't
have confidence, it doesn't bet and it doesn't make a fool of itself. In final Jeopardy they
worried that it might make a fool of itself. And so they actually considered the idea of
having Watson shrug its digital shoulders and say, I don't know, because that would be a
much smaller embarrassment than what eventually happened. Watson wins the game. I
went on that January afternoon and those poor humans. The set kept breaking down and,
you see it on TV as three half hours, but it went on for hours and hours. The machine did
not suffer at all, but those humans were going through hell. And plus they were, it was
like they were on a visiting team, because the whole crowd was cheering for the machine.
[laughter]
And so they had, you know, it was tough for them. Watson embarrassed itself as I'm sure
many of you know with a category, it was US cities and it guessed that Toronto was a US
city. I'll be happy, if we get the questions and answers, I'll be happy to walk you through
why it made that mistake, or how it made that mistake, but I don't want to deal with it
now. After the show, I am writing my last chapter and the Sony people call me and say
Alex Trebek wants to talk to you. Alex Trebek is the host of Jeopardy and he was a good
sport during this, because he had to entertain the crowd while people fix the machines
behind him, for hours. And he was very upset, because he said that the IBM people had
not played straight and that they had sandbagged. Sandbagging is a term you use when
you pretend that you are weaker than you are. It is as if you are in Las Vegas and you act
like you don't have a good hand or something like that, and then you put down your winning cards. And in
the test matches where Watson played Ken Jennings and Brad Rutter the two days before
this competition, Watson pursued a very dumb and elementary betting strategy. It just
went straight down the board from the easiest to the hardest questions. And once the
game started, Watson began hunting the high-value questions looking for daily doubles.
And so Alex Trebek said that they flipped the switch and made it a smarter machine and
that that was not fair. And the IBM team said well, you know, we were just testing out
the electronics in the test matches and then we turned it to match play and anyway we
sent a DVD to Ken and Brad and they had seen that Watson shops for daily doubles.
And Ken and Brad said that they sandbagged regularly in test matches and so why
shouldn't the machine? [laughter]
And so it wasn't as big a deal for them as it was for Alex Trebek. Now after the show
there was all kinds of criticism about Watson and it was as if humanity was offended that
this machine had beaten humans in Jeopardy and for the good of humans they were
arguing back and saying Watson doesn't know what it is. True. Watson doesn't know
anything. True. Watson won because it was really fast on the buzzer. Well, that's true,
but you have to take into account that they had, they had to have it get the right answers a
whole lot of times for the buzzer to make any difference at all, because if it is fast on the
buzzer and gets the answers wrong, it gets killed. So I give them partial, partial on that.
But the long and short of it is that people wanted to denigrate Watson and show that human beings
have superior intelligence. And this is true. But I think they are missing the point. And
the point is that this type of technology is coming, and I am sure some of you are working
on technology like this that might be competing against what IBM is doing, and it is not
going to be too long before Watson is going to seem like an extremely primitive, highly
wasteful machine that answers questions, because we're going to have much more
intelligent, much more efficient ones in every area of our lives providing answers for us
for all kinds of, in all kinds of industries.
And so I wrote a fable for the Wall Street Journal about this and it goes like this:
Imagine in the year 1550 in Italy in a little town and there is this one man in this town
who is brilliant. From the pattern of the birds and the feel in the air and
the noises coming from the barn and much other data that is coming in, he uses these
immense powers of pattern recognition and intelligence that he has to know that
tomorrow it is going to freeze, or tomorrow it is going to snow. And this guy is a
weatherman in this town, and he uses his human genius to come up with this analysis and
it is very valuable. And then a wagon comes into town with a big new instrument in it
and it is a barometer made in Florence in the 1550s. And the barometer doesn't know
who it is, isn't intelligent, doesn't know what it is doing. It is just an idiotic
machine, but it produces the same answers that this guy did with his magnificent brain.
And so forget about what is so special about the human brain and just say, we are going
to have machines, and many of you are going to build them, that answer our questions
and so what are we, what are we as a society going to need to know to basically make a
living and advance beyond the machines and use our intelligence to use the machines to
keep our jobs and make the world a better place? And so I think that is the challenge that
we face as a society as these machines march forward as I am sure that they will do. So
that is my talk and I would be very happy to answer any questions that you people have
about IBM, Watson, anything else. Yes?
>>: Excuse me for coming in late. You may have mentioned this. At one point in your
talk you said that they decided to use questions from previous Jeopardy games? And my
fear is that the IBM team had run 20,000 or more questions through Watson to see how
good, what if too many of the questions they ended up using in the game were actually
part of the ones…
>> Steven Baker: Okay, they had a training set of 180,000 previous Jeopardy questions.
They hadn't gone through all of them, but they had gone through many, they had gone
through more than 100,000 of them. I misspoke when I said previously written Jeopardy
games; they had not been used yet. They were written for the season that was coming up
that was going to start in June. They wrote 100 games for the season. It was going to
start in June of 2010 and go through November. And they could take 30 of those and use
them for this. Then they hire a company that specializes in making sure that they do
everything by the book, and they choose the games and they do all that. So it was ones that
had not been used before. Yes?
>>: A technical question about the buzzing.
>> Steven Baker: Uh oh.
>>: When, how is Watson given the clue? Because a human still needs some fractional
time to read the clue, so is there still an advantage there?
>> Steven Baker: Watson gets it as basically an e-mail, right? He just gets text. And
then it has to parse the sentence and figure out what is the subject, who is the direct
object, and all those types of things. So I would argue that humans have the advantage
when it comes to reading a sentence quickly. I would argue that that is one of the areas
that it doesn't have an advantage, because it can misunderstand a lot of these questions.
>>: Yeah, we worked on parsers ourselves so we know [inaudible]
>> Steven Baker: And it took, I think it took about a half a second of those 3 seconds, a
half of a second was usually needed for parsing. Yes?
>>: You mentioned in the book that there was some sort of blocking used in
games when humans played, so that you can't buzz too early. So there is a fair playing field
because you can't buzz too soon. They had to have a mechanical or electronic way of
ensuring that it doesn't buzz too soon, so I think that's what you were asking, yeah.
>> Steven Baker: Well, the other thing is, see the way it works is there is a human who
is sitting at the judges table who listens to Alex Trebek's voice and when he judges that
Alex Trebek is finished reading the clue, he hits a button that turns on a light and opens
up the bidding. So the way humans, like Ken Jennings and Brad Rutter, the way that they
turn out to be phenomenally fast buzzers is they anticipate the end of his sentence and
they get a feeling for the rhythms of the voice and so it's humans understanding humans,
and in that way it is kind of like jazz, because you are waiting for that downbeat and
Watson has no anticipation but lightning reflexes. It's about 10 ms once the light goes
on, and so the humans are like jazz and Watson is more like techno. [laughter] Yes?
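As a toy illustration of that buzzer race: the simulated human aims at the enable light with some jitter and is locked out briefly for buzzing early, while the machine simply reacts a fixed 10 milliseconds after the light. All of the timing numbers below are invented assumptions, not measurements.

```python
# Toy buzzer-race simulation (invented timings): human anticipation versus a fixed
# machine reflex. Jitter and lockout values are assumptions for illustration only.

import random

def human_buzz() -> float:
    """Aim for the enable light at t=0 with jitter; an early press costs a lockout."""
    t = random.gauss(0.0, 0.050)          # +/- 50 ms of anticipation jitter (assumed)
    return t if t >= 0 else t + 0.250     # early buzz: locked out for 250 ms (assumed)

def watson_buzz() -> float:
    return 0.010                           # fixed 10 ms reflex once the light is on

trials = 100_000
wins = sum(watson_buzz() < human_buzz() for _ in range(trials))
print(f"Watson takes the buzz in about {100 * wins / trials:.0f}% of these toy trials")
```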
>>: Can you describe the way it was, they had 100 or maybe 200 of these single category
expert systems and then there would be some layer on top that would say select from
those hundred. Now how does that work when you also said there was this other problem
where there were two levels [inaudible] it seems those were ideal questions to ask because
you can't, each ideal question would have an ideal answer but how do you combine
them? You have to do, the second level has to analyze the data from the lower level one?
>> Steven Baker: You mean on the North Korea question?
>>: If the question was, say, this is alphabetically first of the noble gases. Well, he could
produce the noble gases, but now he has to select them alphabetically and so that would
be a second level process. And so how does that play into the speed?
>> Steven Baker: Well it does the same thing. It calculates, just like a human would, if
it works. It doesn't always work. But if it works, it says okay, what are the noble gases?
And then it lists them, and then the second part is list them alphabetically and pick out the
first one.
>>: So then you can imagine that the alphabetizing has to wait for the other guy…
>> Steven Baker: It does, it does.
>>: Okay, so they can wire it that way?
>> Steven Baker: Yes. Now with North Korea, you could look at a list of the
northernmost countries. It does not have to be one before the other. It could look at 100,
200 countries and see which one is farthest North and then look among them which one
the United States does not have diplomatic relations with. Maybe it does it that way, but
I kind of think about it a human way, which is to think of the four…
>>: And so [inaudible] Finland or something, what was the answer to that question
because it was [inaudible]. If it ran them in the wrong order…
>> Steven Baker: Right.
>>: I think the point of discrimination is when the final answer [inaudible] it wasn't
looking at a separate subject.
>>: Yeah, but the general way of doing it is they have 200 [inaudible] each producing a
separate set of answers and then somehow choosing between those 200. There is no way
to combine the two [inaudible] in the general [inaudible]
>> Steven Baker: Right. Each one comes back with its own candidate answers and each
one develops its own record in each type of question and so the computer develops more
confidence in certain types of questions in certain, for each category. So it basically
discounts the, I just… Yeah?
>>: So I have two quick questions. The first is how did IBM and Jeopardy try to train
this, because my question is Watson didn't have very many environmental factors taken
into his system so he didn't know how well the other players were doing. Like he didn't
get that sense of emotion. He didn't get the lights and the awe of being on the set of a TV
show. He didn't get the intonation of Alex Trebek's voice, all of those environmental
factors that humans hear and sense and stuff. So how did IBM try to take that into
account? Did they try to make, you know, claim that this was a fair match based on the
fact that Watson wasn't getting all of these types of information even though it didn't…
Not that it impacts the humans but…
>> Steven Baker: They didn't get too hung up on fairness, quite frankly. Because they
thought that the Jeopardy people were hung up on fairness and in a man machine match it
is never going to be fair, because they are totally different things. The machine had no
memories of human life and no experience and no experiential memories and like here is
one of the questions that would confuse a machine. When entering, oh shoot, I forget
what the word is. When entering a room, there is this little piece of wood, what is this
thing called? Baseboard. Okay. When entering a room, which way do you look to see
the baseboard? So, humans know, but the machine would have to look for statistical
evidence that I looked down to see the baseboard affixed in literature or whatever. It
would be a very difficult thing for the machine to figure out. So basically what Ferrucci's
point was, is it will never be a fair match. It can only be a fun match. [laughter] And it
can tell us things, but it is not going to be fair.
>>: Can I go back to half of that? What was the most surprising thing, not about Watson
specifically, but just your whole experience writing the book like what surprised you the
most? Was it, it could be the types of people you interacted with or something
technically about Watson or just in general?
>> Steven Baker: What surprised me the most? Well one thing that surprised me, you
know, I learned a lot in this book. Because I went in as I do in many projects pretty
ignorant about this stuff. Like, they had to build two Watsons. They had to build a
development Watson and a game Watson. And so the development Watson was
slower but would give detail on absolutely every decision it made. And so that type of thing
surprised me. But what really surprised me was that a machine could do this. It kind of
just knocked my socks off. Yes?
>>: How big was the team in terms of like man years?
>> Steven Baker: Well I would say between 20 and 25 people times 4 years, so that is
almost 100 man-years.
>>: And is this the same as the DeepQA project or are they…?
>> Steven Baker: Yeah. It is the branded fancy promotional name for it. They call it
Watson when they want to get on TV and they call it DeepQA when they want to sell it to
American Express. [laughter] Yeah?
>>: Would you please tell us why Watson made the answer Toronto…?
>> Steven Baker: Okay. Through statistical analysis of the Jeopardy categories,
Watson learned to distrust them, because often the questions were not
directly associated with the categories. For example, you might have American authors,
and then the question would say something about J.D.
Salinger's protagonist in Catcher in the Rye, so it's Holden Caulfield who is not an
American author; he is a protagonist. And then you have others that are just bizarre.
Like they had one that is called country clubs, and it had to do with different sticks that
are used in different countries to beat people with. [laughter] So, you know, in France
it's a baton. A baton came up in the country clubs category. So Watson learns
to discount, not pay a whole lot of attention to the category names.
Now if the answer it comes up with is Toronto, and the category is US cities, that drives its
confidence down a lot, but it doesn't kill it as a potential answer. And so Watson got,
basically it didn't know the answer. But it came up with Toronto and gave it 15%
confidence, which is just minimal. For instance, anything under 20% is basically garbage,
and Chicago was at 12%. It should have shrugged its shoulders. It put about
six question marks after Toronto to show people that it really didn't know, but it would
have been much better to just put the question marks with nothing. [laughter] But the
other thing is because it doesn't understand a lot of questions and it doesn't understand a
lot of data, it has to entertain the possibility that it may have misunderstood something or
that something unlikely can turn out to be true. So if you look at, if you develop a system
that is going to talk about male rock stars in American music history, and you have a list
of names, you know, Bob, and Richard and whatnot, and then you have female names,
and Alice is one of them, if the computer is looking and says Alice Cooper, it would say
that it is impossible that Alice Cooper could be a man. But it has to have its mind open to
that possibility that Alice Cooper can be a man. And so it has to have its mind open that
Toronto can be a US city, but here it came back and bit them. Yeah?
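As a minimal sketch of that confidence-threshold point: with Toronto at roughly 15% and Chicago at 12%, and anything under about 20% being noise, a shrug would have been the better response. The threshold and the shrug output below are illustrative assumptions, not IBM's code.

```python
# Sketch of confidence thresholding in Final Jeopardy (illustrative only).

def final_jeopardy_response(candidates: dict[str, float], threshold: float = 0.20) -> str:
    """Answer only when the best candidate clears the confidence threshold; else shrug."""
    best_answer, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return "??????"                    # shrug rather than dignify a garbage guess
    return f"What is {best_answer}?"

print(final_jeopardy_response({"Toronto": 0.15, "Chicago": 0.12}))  # -> ??????
```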
>>: So what was the reaction of the development team to that response?
>> Steven Baker: Enormous embarrassment. And the funny thing was it came after a
session in which Watson had been utterly dominant. And they had every reason to be
proud of what they had accomplished, because it had just mowed down the humans and
then it embarrassed itself right there. Incidentally, after that second round, even though it
screwed up on that question, Sam Palmisano, the CEO of IBM, was sitting with the
Jeopardy people and they were worried about a good show, and Watson was making a
mess of the good show, and the ratings did go down for the third day because of that. And
Palmisano comes up to Ferrucci and says maybe we should tone this thing down a little
bit, dial this thing down a bit. So, any others? Yeah?
>>: I think it was in the article that Ken Jennings wrote after the game that said that
Watson was having trouble on short questions and those were the ones that the two of
them could do because they were taking less than 3 seconds for Alex to say, and I just
want to hear what you…
>> Steven Baker: Watson needs at least 3 seconds to come up with, to go through all of
its processes. And so if you have a category in which it is just two words, it says, I think
it's like, wives of movie stars. And the only clue is Demi Moore or whatever, Ashton
Kutcher, or whatever it is, then that doesn't give Watson enough time and in that category
where they were racing, the humans were beating it, Watson was getting them all right
but coming in late.
>> Kirsten Wiley: Let's do one more.
>> Steven Baker: One more?
>>: I think with all the IBM guys [inaudible] have been working [inaudible] next chance
they might have?
>> Steven Baker: I don't have, I mean I know what Ferrucci wants to do. He wants to
basically build a crowdsourced system where Watson learns to ask questions and can use
the world to educate it, and build up a big data, sort of a knowledge base that way. But I
don't know about what their next ones would be. I would like to know if this is
something that you folks would like to work--I mean this idea of the, 25 people working
for four years on something, is that something that Microsoft should be doing? I'm just, I
mean there are only a handful of companies, like three in the whole world that could do a
project like this and you guys are one of them. I think it would be fun. And I would like
to write about it incidentally. [laughter] If you guys pick something really fun to do, I
would love to come out here and do another book like this one.
>> Kirsten Wiley: Thank you Steve.
>> Steven Baker: Sure. [applause]