>>: Good afternoon. It's an honor today to welcome Sharon Bertsch McGrayne,
who is joining us as part of the Microsoft Research Visiting Speakers Series.
She's here today to discuss her book "Bayes' Rule: The Theory that Would not
Die." Sharon is a former newspaper reporter and former author and editor on
topics in physics for Encyclopedia Britannica. We all grew up on that back when
it ruled. Her first book, "Nobel Prize Women in Science," is still in print after 20
years, published by the National Academies of Science press. Another book,
"Prometheans in the Lab," is about pioneers in the chemical industry. This new
book on bayesian methods and reasoning has been an idea on how to bring out
some of the rich theory, including some of the controversy about statistical
reasoning around bayesian methods. I like this quote here, from the New York Times
book review by John Allen Paulos, who wrote, quote, if you're not thinking like a
bayesian, perhaps you should be.
I think it's better to cite others than to say it myself, although I've not been saying
things like that over the years. I've been a bayesian since long before I knew what that
meant.
I and a colleague, David Heckerman, whom I ran into at Stanford University, were
both interested in artificial intelligence and found in those days that we were in a
world where logic and theorem proving reigned and probability
was considered a throwback to ancient numerical methods that had very little to
do with reasoning and intelligence. But we and another colleague found out that
there was really no other way to tackle complexities of the real world than to embrace
probability. And over lunch I was talking to Sharon and she's been doing even
further research on some subtleties about bayesianism and Microsoft Research
and Microsoft Corporation.
I had mentioned to her earlier that I became weak-kneed in 1992, in July or
August, when I was negotiating with Nathan Myhrvold as to why I would ever want
to come and join a tiny Microsoft Research of six or seven people. He leaned
forward, I remember getting weak-kneed, he leaned forward and he said to me:
Listen to me, Bill Gates is a bayesian. And he'll support you to really bring stuff
into the world. I heard more about it at lunch today. Maybe Sharon can fill us in
during her talk. I'll turn it over to Sharon so she can share the broader story of
the theory that would not die. Please join me in giving her a warm welcome.
[applause]
>> Sharon Bertsch McGrayne: Thank you for inviting me, for that nice
introduction. And thank you for coming today. I always start all of my talks with
truth in advertising and say I'm not a scientist or engineer or mathematician, that I
started out in newspapers.
But I did start working on Theory that Wouldn't Die eight or nine years ago, when
after a year of working I was totally thrilled when I could search for the word
"bayesian" on the Web and I got 100,000 hits.
Last week I searched for bayesian, and I got more than 13 million hits. So today
I want to talk to you about this revolutionary explosion of interest in Bayes' Rule
and how Bayes became a pervasive tool for decision-making based on
incomplete information.
So in the process, I hope we'll come to understand why many of you in this room
are real revolutionaries about a very fundamental issue, analyzing information,
making data-based decisions.
Now, here at Microsoft you all know the 1998 patent for a spam filter that had
Bayes embedded in it. But before I talk about the spam filter patent I want to talk
about something that's making headlines more recently.
And I'm going to see if I can work this. There. Air France Jet Flight 447 took off
in the spring of 2009 from Rio de Janeiro bound overnight for Paris, met a very
intense high-altitude electrical storm, and disappeared without a trace. 228 people
aboard.
Last fall, I spent the afternoon with Olivier Ferrante, the man in charge of the
successful undersea search for the wreckage of Air France Flight 447. He's an
aviation engineer who works for the French Civil Aviation Agency and
remembers with great fondness two years that he spent in Renton at the FAA
Research Lab.
Ferrante was in charge of what became the biggest and most high-tech naval
search ever. These are the plane's two black boxes, which as you can see are
actually red and white. They're the size of shoeboxes, and they were lost in a
vast terrain the size of Switzerland, with the mountainous topography of Switzerland,
12,000 feet under the ocean.
After almost two years of fruitless searching by some of the world's leading
oceanographers, Ferrante hires some of the same people who are in the naval
search chapter of The Theory That Would Not Die. And their bayesian search software
calculated the most probable site for finding Air France 447, where it was found
last April after an undersea search of one week.
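
The idea behind a bayesian search can be sketched in a few lines of Python. This is only an illustration, not the software used for Air France 447: the four search cells, their prior probabilities, and the detection probability are made-up assumptions. The sketch shows how an unsuccessful search of the most likely cell lowers its probability and raises the others, which tells the searchers where to look next.

```python
def update_after_failed_search(prior, cell, p_detect):
    """Return updated cell probabilities after searching `cell` and finding nothing.

    prior    -- dict mapping cell name -> probability the wreck is there (sums to 1)
    cell     -- name of the cell that was searched
    p_detect -- chance the search finds the wreck if it really is in `cell`
    """
    # Probability of coming up empty: everything except "it was there and we saw it".
    p_nothing_found = 1.0 - prior[cell] * p_detect
    posterior = {}
    for name, p in prior.items():
        # A miss is only informative about the cell that was actually searched.
        likelihood = (1.0 - p_detect) if name == cell else 1.0
        posterior[name] = p * likelihood / p_nothing_found
    return posterior


# Hypothetical prior over four search cells (illustrative numbers only).
prior = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}

# Search the most probable cell with an assumed 90% detection chance; find nothing.
posterior = update_after_failed_search(prior, "A", p_detect=0.9)

# The next cell to search is the one with the highest updated probability.
best = max(posterior, key=posterior.get)
print(posterior)
print("search next:", best)
```

Repeating the update after every unsuccessful sweep keeps the probability map current, so each new search goes to the cell that is most probable given everything seen so far.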
A two-year fruitless search, Bayes finds it in one week. Now, for me the really
revolutionary thing about this is that the French authorities formally, in writing,
publicly credited Bayes with the discovery, because as we're going to see a lot of
people didn't dare even mention the word Bayes for decades of the 20th century.
But to understand this explosion of interest in Bayes, and why you all were such
revolutionaries, we have to go back a bit to Thomas Bayes and given the time
constraints I'm going to really race until the second world war and the fight in the
north Atlantic over the U-boats.
But I hope we're going to see two big patterns emerging, and the first is that
Bayes becomes an extreme example of a gap between the real world and
academia. That military super secrecy during the second world war entering the
Cold War afterwards had a profound effect on Bayes. That Microsoft Research's
use of Bayes is in a direct lineal descent from the bayesian warfare against the U
boats during the second world war and that Microsoft Research was a key player
in bringing Bayes to the public's attention and making it publicly acceptable.
Now, Bayes' Rule, of course, is named for the Reverend Thomas Bayes. He was
a Presbyterian minister, amateur mathematician, and lived in the first part of the
1700s in England. We know very little about him. The ubiquitous picture of
him is almost certainly of someone who lived much later.
We do know that he discovered his theorem during an inflammatory religious
controversy launched by the Scottish philosopher David Hume. The question
was whether scientists or others could use evidence about the real world to make
rational conclusions about God, the creator.
They called it God, the cause. God, the primary cause. Or just the cause. We
don't know that Bayes wanted to prove the existence of God, the cause, but we
do know he tried to deal with the issue of cause and effect mathematically. And
in so doing, of course, he produced this simple one-line theorem that allows us to
start with an initial idea. Bayes actually used the word "guess". He said if you don't
have enough reason to guess one way or the other, guess 50/50.
And then it commits us to modifying that initial idea with objective new information,
and then the really tough part is changing your mind in the face of the new data.
But Bayes didn't believe enough in his theorem to publish it. He files it away in a
notebook and dies 10 or 15 years later.
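
The one-line theorem described above can be written out as a small Python sketch. The 50/50 starting guess comes from the talk; the two likelihood values (how probable the new observation is if the idea is true versus false) are assumptions chosen only to show the update at work.

```python
def bayes_update(prior, p_data_if_true, p_data_if_false):
    """Return the updated belief in an idea after seeing one new observation.

    prior           -- belief in the idea before the observation (0..1)
    p_data_if_true  -- probability of the observation if the idea is true
    p_data_if_false -- probability of the observation if the idea is false
    """
    numerator = prior * p_data_if_true
    evidence = numerator + (1.0 - prior) * p_data_if_false
    return numerator / evidence


# Start with the 50/50 guess, then revise it after each of three observations.
belief = 0.5
for _ in range(3):
    # Assumed likelihoods: the observation is twice as probable if the idea is true.
    belief = bayes_update(belief, p_data_if_true=0.8, p_data_if_false=0.4)
    print(round(belief, 4))
```

Each pass uses the previous result as the new starting point, which is the "changing your mind in the face of the new data" step.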
And going through Bayes' papers, a friend of his, another Presbyterian minister,
amateur mathematician, Richard Price, spends a long time rewriting and editing
the essay and gets it published in a journal that no one pays any attention to.
A few years later, however, a young professional mathematician, Pierre Simon
Laplace, most known today for the Laplace transform, discovers the rule
independently in Paris in 1774 and calls it the Probability of Causes. Now,
Laplace mathematized every field of science known to his era.
He helped turn Newton's hypothesis about gravitation into a natural law. And he
spent 40 years of his career off and on transforming Bayes' Rule into the form
that's used today. And he actually used it. And until about 50 years ago, Bayes'
Rule was known as Laplace's work, and he's a hero of mine. I think he's
wonderful.
Now, over the course of Thomas Bayes and Laplace's lifetimes, western
scientists and governments worked very, very hard at compiling, at accumulating
lots of precise and trustworthy objective data. And by the mid 1800s, any
up-to-date statistician rejected Bayes' Rule and preferred to judge the probability of an
event according to how frequently it occurred.
And they become, of course, known as the frequentists, and they will be the
great opponents of Bayes' Rule up until a very short time ago. Because for them
modern science required both objectivity and precise answers, and Bayes's
approximations and measures of belief were anathema. They called it
amuck.
Now, by the time the second world war began in 1939, Bayes was virtually taboo
amongst sophisticated statisticians. Fortunately, Alan Turing wasn't a
statistician. He was a mathematician, of course. And besides fathering the
modern computer, computer science, software, artificial intelligence, Turing
machine, Turing test, he will also father the modern bayesian revival.
So I want to switch gears a little bit and dwell on Turing's story and the great
battle of the North Atlantic Ocean against the German U-boats because out of
this battle a lot of Microsoft Research interest in Bayes will grow. I also want to
dwell on Turing's story because it illustrates how Bayes worked as a pencil and
paper method as one of the earliest computer methods, and as an illustration of
military secrecy and the effect thereof.
Now, it's important to remember that during the second world war, England will
be cut off from the farms and factories of France and will be able to feed only one
in three of its residents. And it will depend on convoys of unarmed merchant
marine ships delivering 30 million tons of food and strategic supplies each year to
Britain from North and South America and from Africa.
Hitler said point-blank, U boats will win the war. Churchill said after the war, the
only thing that really scared me were those U boats.
And the German U boats did sink almost 3,000 allied ships and kill more than
50,000 merchant seamen. Now the German Navy ordered the U boats around
the Atlantic via radio messages that were encrypted with word scrambling
machines called the Enigmas, and this is a photograph of one that comes from
Frode Weierud's CryptoCellar website.
To standardize their communications, the German military purchased 40,000
enigma machines and distributed them among the different services like the
German railways, the Army, the Air Force, the diplomatic corps, the Italian
and Spanish allies, and so on. And each group incorporated its own security
measures and its own complexities, and of them all the German Navy operated
the most complex enigma code.
Now, if I can work this, an Enigma machine looks much like a very sturdy but
elaborate typewriter. It had wires coming out of here. We should see the wires
but they're not attached at this point.
It had wheels. These are the wheels. It had wiring. It had code books.
Tables. Some of it was doubly encoded. And there were features that could be
changed within hours, if necessary, and they could churn out millions upon
millions of permutations, and no one, German intelligence, German military, or
the British, ever thought that the British would be able to read those messages.
Now, Turing had been trained at Cambridge. He had a post-doc at Princeton, New
Jersey. He returned in the summer of 1939 and spends the summer on his own
on the enigma codes. He goes to Bletchley Park occasionally to consult
with experts at the British super secret decoding center that's
north of London.
He has standing orders that the day after war is declared he's to report there full
time. And on September 4, 1939, he reports for duty.
When he arrived, he was 27. He looked 16, people said. He was handsome.
Athletic. Shy. Very nervous. And he had lived openly as a homosexual in
Cambridge. He would devote the next six years to the enigma and to other code
and decoding projects and to the machines that were needed for decoding. Now,
when he arrives at Bletchley Park no one is working on the all-important Navy
Enigma. Turing however liked to work alone and he said afterwards that after a
few weeks he decided that no one else was doing anything about it, and I could
have it to myself.
And he begins to attack it. He first assigns a machine to eliminate the wheel
arrangements that couldn't produce the words he thought should be in the
German messages. And then he developed a very bayesian system that let him
guess a stretch of letters in the original message, hedge his bets, measure his
belief in the validity by assessing their probabilities and add more clues as they
could be found.
Now, Frode [inaudible], who gave us this picture, and this is an actual naval
enigma, not just one for one of the other services, he and some other
cryptography experts are trying to decode the remaining enigma messages.
And he said that even today a modern computer can take weeks or months to
solve a naval enigma message if all they know is the original language the
message was written in
and they have to do it by brute force. But if they have a machine like Turing's, one
that he invented to test the possible wheel combinations, and if they can guess
some of the words in the messages, then a modern computer can break an
enigma, naval enigma message in seconds or less.
Turing of course didn't have modern computers, but the principle remains the
same: he had his machine, and next he had to be able to guess
the most probable words in the message. So Bletchley Park begins to collect
clues. And the best source of them were a stretch of weather station ships that
the Germans stationed across the north Atlantic in the far north and they relayed
weather reports to the mainland, to the continent. Now, of course weather is
standardized vocabulary that's repeated a lot. So it was ideal for them, weather
for the night [inaudible] be ordered and so on.
And they could determine the likelihood of their guesses a bit by getting clues
from the British weather reports from ships in the north English channel.
In a fundamental breakthrough, though, Turing realizes that he couldn't
systematize his hunches or compare their probabilities without a unit of measurement.
He names his unit a ban, for the town nearby that's called Banbury, and defined it
as, quote, about the smallest change of weight of evidence that is directly
perceptible to human intuition. And he said that when odds of a hypothesis
would reach 50 to 1 they could figure that they had the right words.
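
The scoring idea can be sketched in Python as adding up weights of evidence on a logarithmic scale until the odds pass a threshold. This is an illustration under stated assumptions, not a reconstruction of the Bletchley Park procedure: it measures evidence in decibans (tenths of a ban, where one ban is a factor of ten in the odds), it starts from even odds, and the likelihood ratios assigned to the clues are invented. Only the 50-to-1 threshold comes from the talk.

```python
import math


def decibans(likelihood_ratio):
    """Weight of evidence, in decibans, for a clue with the given likelihood ratio."""
    return 10.0 * math.log10(likelihood_ratio)


THRESHOLD_ODDS = 50.0                    # the 50-to-1 figure quoted in the talk
threshold_db = decibans(THRESHOLD_ODDS)  # roughly 17 decibans

# Hypothetical clues: how many times more probable each clue is if the guess is right.
clue_ratios = [2.0, 3.0, 1.5, 4.0, 2.0]

total_db = 0.0   # even odds to start with (assumption), which is 0 decibans
clues_used = 0
for ratio in clue_ratios:
    total_db += decibans(ratio)   # multiplying odds becomes adding decibans
    clues_used += 1
    if total_db >= threshold_db:
        break

final_odds = 10 ** (total_db / 10.0)
print(clues_used, round(total_db, 2), round(final_odds, 1))
```

Working in logarithms means clerks only have to add small numbers rather than multiply probabilities, which suits a pencil-and-paper method.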
This, of course, is basically the same as the bit that Claude Shannon discovered
by using Bayes' rule at roughly the same time at Bell Telephone Laboratories. We
don't know whether Turing developed the system on his own or at Cambridge,
probably from lectures by Harold Jeffreys. But by June 1941, a year and a half
after the war started, Bletchley Park could read the German messages, the U boat
messages, within an hour of their arrival at Bletchley Park.
And for almost a month that summer no convoy is hit. The convoys are routed
around the U boats.
By the autumn of that year, by autumn of 1941, this Bayes-based system was
critically short of typists and junior clerks, however, otherwise known as girl power,
and Turing and three others write a personal letter to Churchill asking for more
resources.
And he responds immediately. Ian Fleming of James Bond fame planned a
super elaborate raid to capture code books that Turing needed. It was so
complex that just reading about it made me feel glad that they decided not to do it.
Navy seamen risked their lives and some died collecting code books for Turing
from sinking German ships. Now, this system didn't always work. The German
Navy added a fourth wheel, and this machine actually has four -- four wheels.
And with the fourth wheel, for most of the remaining year, Bletchley Park couldn't
crack the U boat codes. Eventually, though, when the U.S. begins making a lot
of the Turing wheel-testing machines, breaking the enigma codes becomes
routine.
However, shortly after Germany attacks Russia that summer, in June of '41, the
German Army started using new and highly sophisticated word scrambling
machines called the Lorenz machines and a family of ultra secret Lorenz codes.
And a group of British mathematicians is assembled and spends a year resorting
to a variety of techniques to break the Lorenz codes. They use, among others,
Bayes' rule, priors, Turing's bayesian scoring system, his fundamental unit of bans.
And then they incorporated the bayesian methods, some bayesian methods, into
the computers they were building to decrypt the German codes, the computers
called the Colossi.
They were of course the first large scale electronic digital computers. They were
built for a special purpose of decryption. But they were capable of making other
computations, ten models were built, and they were far ahead of anything at the
time in the United States.
The engineer who built the Colossi, Thomas Flowers, had strict orders to get the
current model up and operational by June 1, 1944. He was not told why. He and
his team, he said we worked until our eyeballs, we thought our eyeballs were
falling out.
But they get it operational by June 1, and on June 4 -- June 5, excuse me, a
message comes from Hitler, signed by Hitler, to his Army commander in
Normandy, Erwin Rommel. And he tells Rommel if there's an attack on the Normandy
coastline, don't do anything for five days because this will be a diversionary
attack and the real attack will come later somewhere else.
The message is decoded at Bletchley Park. It's raced by courier to Eisenhower
where he and his staff are deciding when to launch the invasion of Normandy.
Eisenhower can't tell his staff about Bletchley Park and the decoding. So he reads
it and hands it back to the courier and then turns to his staff and says: We go in
the morning, the morning of June 6, 1944.
And Eisenhower said that the war in Europe was shortened by Bletchley Park by
at least two years.
Now, a few days after Germany's surrender in May of 1945, the British
government issued a surprising and shocking order. Everything showing that
mathematics, decoding, computers, Turing, had helped win the war was declared
super secret. All but two of the Colossi were destroyed, and I think it doesn't take
a great pundit to realize that without that order Britain might well have
become the leader of the computer age, and you all might be working in
Manchester or Birmingham instead of the state of Washington.
The orders also prevented mathematicians from becoming war heroes, in which
case the modern word "geek" would have very different connotations. They'd
have connotations of heroic accomplishment. Now, after the war Turing, hard to
imagine, huh? Now after the war, Turing was working on computers and other
projects when two English spies for the Soviet Union flee to Moscow to escape arrest,
in 1950. One was named Guy Burgess. He was an openly homosexual
diplomat, posted in Washington DC, graduate of Cambridge University. U.S.
intelligence told the British the spies had been tipped off by another homosexual
graduate of Cambridge University, an eminent art historian named Anthony
Blunt. And the British government panicked at the thought of a ring of
homosexual spies from Cambridge. And the number of homosexuals arrested
spikes.
On the first day of Queen Elizabeth's reign in 1952 (some of us just celebrated her
60th anniversary as queen), Turing is arrested for homosexual activity in the
privacy of his home with a consenting adult and less than a decade after Britain
fought a war against Nazis who had conducted medical experiments on their
prisoners, Turing is found guilty and sentenced to chemical castration.
On June 7, 1954, the day after the tenth anniversary of the Normandy invasion
that he had made possible, Turing, father of the computers we use today,
commits suicide.
Anthony Blunt, on the other hand, who had started the witch-hunt, is knighted,
and it's 55 years, in 2009, before a British prime minister apologizes for the
government's treatment of Turing.
Now, two weeks ago, despite an international petition that some of you may have
signed, Prime Minister Cameron refused to pardon Turing.
His work lived on in cryptography, and especially in bayesian theory, because his
wartime assistant Jack Good, I. J. Good in his articles, developed and published
hundreds of articles on bayesian theory after the war.
But Turing wasn't the only one using Bayes during the war. The British Air
Ministry had organized a small group of scientists to improve its operational
efficiency. And they used a lot of Bayes, especially when only a few variables
were involved.
The U.S. Navy was very impressed and formed a group of 40 civilian physicists,
mathematicians, chemists and actuaries that they called the Anti-Submarine
Warfare Operations Research Group, ASWORG, headed by the physicist Philip
Morse at MIT and Columbia chemist George Kimball. And ASWORG was
devoted not just to avoiding the U boats, as Bletchley Park with fewer resources
had done, but to actually hunting down U boats and sinking them.
And they used Bayes for small detailed parts of big problems. For example, the
number of aircraft needed to protect a convoy of boats. Whether the air
squadron should deviate from its regular flight plan, small problems.
Now, Morse at MIT will start the direct link between the second world war
operations research battle against the U boats and Microsoft Research here in
Redmond, because Morse will become Ron Howard's Ph.D. advisor and Ron
Howard in turn will advise David Heckerman and Eric Horvitz at Stanford.
I'm getting ahead of the story a bit, because after the war, despite Bayes's
successes during the war, it emerged even more suspect and déclassé than
before. So for 30, 40 years during the Cold War a small group of maybe 100 or
more believers will struggle for acceptance.
For example, when Jack Good, Turing's statistical wartime assistant gave a talk
about bayesian theory at the Royal Statistical Society, the next speaker's
opening words were: "After that nonsense."
Harvard Business School professors who developed bayesian decision trees,
Howard Raiffa among them, were called socialists and so-called scientists. A
Swiss visitor to Berkeley's very frequentist statistics department in the 1950s
realized, he said, it was kind of dangerous to espouse Bayes.
So with the Cold War military applying Bayes' rule in secret, the bayesians
concentrated on building a mathematical theory that would make Bayes a
respectable part of mathematics. And many bayesians of that generation
remember the precise moment when Bayes's overarching logic would
descend upon them like an epiphany and they would talk about their religious
conversions.
Now both sides were proselytizing during this period as if their method was the one
and only way to do statistics. Both sides used religious terms. When the
bayesian Dennis Lindley was appointed chair of an English statistics department,
frequentists called him a Jehovah's Witness elected pope. He in turn, asked how
to encourage Bayes, retorted: Attend funerals. And frequentists answered, if
bayesians would only do what Thomas Bayes did and publish only after they
were dead we should all be saved a lot of trouble.
Now, as a result, during the Cold War, there were very few civilian applications of
Bayes in the public arena. For example, in 1972 statisticians were still wrestling
with the problem of how do you deal with something that has never happened.
And an MIT physicist, Norman Rasmussen, was asked to do a study of nuclear
power plant safety. Nuclear power plants had never had an accident before, so
the frequentists had no way of judging the likelihood -- the probability of a future
accident.
Now, Rasmussen did have the failure rates of pumps and valves and things like
that he could use, but it didn't produce enough data. So he had to turn to expert
opinion and even to bayesian analysis. And both were incendiary sources of
information at the time for something as incendiary as a nuclear power plant
safety plan.
Rasmussen went to California, to Stanford, to consult with Ron Howard and his
partner Jim Matheson, and they encouraged him to use Bayes, and Rasmussen's
report in 1974 predicted exactly what happened at Three Mile Island, that the
core damage wouldn't always be catastrophic, that human error and radioactivity
released outside of the building could be significant problems. But he hid the big
bad word Bayes in the appendix of Volume 3 of a massive multi-volume report to
the government.
By the late 1980s, however, pieces were falling together for the bayesian
statisticians. And imaging was forcing the issue for them. Industrial
automation at the time, the military, medical diagnostics, were producing blurry
images with ultrasound machines, PET scans, MRIs, electron micrographs,
military aircraft, infrared sensors, and all these blurry images raised the
question: what did the original object look like?
That seemed to be ideal for bayesian probability of causes. But Laplace's
method involved the integration of functions and with too many variables it was
just hopelessly complex.
But bayesians didn't realize that the key to making Bayes useful in the workplace
would be computational ease and not more polished theory.
Dennis Lindley, the theorist who had been programming his own computer since
1965, who regarded Bayes as ideal for computing, wrote: I consider it a major
mistake of my professional life not to have appreciated the need for computing
rather than mathematical analysis.
I should have seen that Bayes enabled one to compute numerical answers. But
many academic statisticians of those generations had started out as abstract
mathematicians who regarded a computer as a cop-out.
And there was a particularly poignant case involving a Canadian mathematician
who lives in Vancouver now, Keith Hastings, who published a paper in 1970 with
what's now called the Metropolis Algorithm or the Hastings Metropolis Algorithm
and Hastings used Markov chains, Monte Carlo sampling techniques. He dropped
out of research a year later because no one noticed or cared about his paper. And
he didn't even learn about the importance of his work for 20 years, until after he had
retired.
And Hastings told me with some anguish in his voice that his work was ignored
because, quote, a lot of statisticians were not oriented toward computing.
They took these theoretical courses. They cranked out theoretical papers, and
some of them wanted an exact answer, not estimates, definitely not measures of
belief.
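
For readers who want to see what such a sampler looks like, here is a minimal Python sketch of the technique that now carries Metropolis's and Hastings's names. It is not taken from the 1970 paper. It uses a symmetric random-walk proposal, which is the simpler Metropolis special case (the Hastings generalization adds a correction for asymmetric proposals), and the target distribution, step size, and seed are assumptions chosen for illustration.

```python
import math
import random


def metropolis_sampler(log_target, n_samples, start=0.0, step=2.0, seed=0):
    """Draw correlated samples whose long-run distribution matches the target.

    log_target -- function returning the log of the target density, up to a constant
    step       -- half-width of the symmetric uniform proposal (assumed value)
    """
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric distribution around the current point.
        proposal = x + rng.uniform(-step, step)
        # Always accept uphill moves; accept downhill moves with probability
        # equal to the ratio of target densities.
        log_accept = log_target(proposal) - log_target(x)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)  # a rejected move repeats the current point
    return samples


def log_standard_normal(x):
    """Log density of a standard normal, ignoring the normalizing constant."""
    return -0.5 * x * x


samples = metropolis_sampler(log_standard_normal, 20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(variance, 3))
```

The point of the method is that it needs only ratios of the target density, so the hard normalizing integral never has to be computed. The answers are estimates that improve with more samples, not exact values.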
When Dennis Lindley's student Adrian Smith, on the left, and the American
Alan Gelfand, on the right, finally put the pieces together in 1989 for the
bayesian statisticians, they had Bayes, Gibbs sampling, Monte Carlo, Markov
chains, iterations, and they wrote their watershed synthesis about MCMC very
fast, figuring that other people would put the pieces together, too.
But they also wrote it very carefully. In 12 pages, they used the word Bayes only
five times. I asked Gelfand why. He told me there was always some concern
about using the B word. A natural defensiveness on the part of bayesians in
terms of rocking the boat.
We were always an oppressed minority trying to get some recognition. And even
if we thought we were doing things the right way, we were only a small
component of the statistical community and we didn't have much outreach into
the scientific community.
But the next ten years after that paper, the bayesians passed in what they still
remember as a frenzy of activity, because, using MCMC and the relatively cheap
powerful workstations that were becoming available, and a few years later the
off-the-shelf software from Adrian Smith's student David Spiegelhalter,
who called it BUGS, bayesians could finally, after two and a half centuries,
compute complex and realistic problems.
When Gelfand and Smith gave an MCMC workshop at Ohio State early in 1991,
they were astonished when almost 80 scientists appeared. Not statisticians, but
scientists.
And with outsiders like this from computer science, physics, artificial intelligence,
refreshing and broadening Bayes, depoliticizing and secularizing it, it was adopted
almost overnight.
Now, during this frenetic period of Bayes's history, Microsoft becomes a key
player. Jim Matheson and Ron Howard had made the breakthrough as far as
bayesian networks were concerned and Matheson emphasized to me that we
regarded ourselves as engineers. So they organized the Stanford Research
Institute to apply Bayes to very complex problems.
Not the simple ones that had been done during the war. Ron Howard told me:
What I did was to combine the theory of decision-making that goes back to
Laplace and Bayes when they're making simple decisions under uncertainty with
balls and urns and combine it with systems operation and engineering research
from the second world war in terms of actual decisions that people who lie awake
at nights are concerned about.
Now, by 1990, during this frenzied period, another person interested in
Bayes enters the picture, and that's Nathan Myhrvold, who was helping to
organize Microsoft Research. He had been a physics student and a physics
post-doc and had read and met Ed Jaynes, who was a bayesian physicist.
And Myhrvold became interested in Bayes not for inference or decision-making
but for artificial intelligence, because it would help quantify the uncertainties for
machines that could think.
When he learned that his high school classmate David Heckerman had formed a
software company with Eric Horvitz and Jack Breese using bayesian networks
for diagnosing lymph node diseases, Myhrvold convinced Heckerman to join
Microsoft. I should have shown you Ron Howard's picture a long time ago. But
here's the first meeting of the three of them when they come to Microsoft to
explain what they've been doing.
And Eric Horvitz tells the story about how Myhrvold got him to join Microsoft by
leaning forward at the meeting and saying that Bill Gates is a bayesian and Bill
will give you the freedom to innovate broadly. And here during this first meeting
is Horvitz explaining Bayes rule to Microsoft.
Two weeks ago, I asked Myhrvold how Bill Gates became a bayesian. And he
said, well, when he was talking to Horvitz, I was stretching it a bit, he said. He
said if you did an experiment at the time asking Gates, Bill Gates, ten questions, it
would have been unclear that he was a bayesian.
On the other hand, Myhrvold said he always supported it. And in fact three years
later, in '96, it was Bill Gates who attracted the technology world's attention to
bayesian networks when he said that Microsoft's competitive advantage lies in its
expertise in bayesian networks.
Heckerman says at that point people's eyebrows started going up. But the
public, people like me, first took notice in 1998 with Microsoft's patent for
the bayesian spam filters.
It was filed, applied for in 1998. Up here, and approved in December of 2000.
And if you read further into the patent in column 27 -- I'm having trouble reading
it -- No. 17, I think, right? They're using a naive bayesian classifier, a limited
dependence bayesian classifier, a bayesian network classifier, a decision tree.
It's got loads, loaded with Bayes.
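
To give a feel for the first classifier named there, here is a minimal naive bayesian text classifier in Python. It is a sketch of the general technique, not the method in the Microsoft patent: the four training messages, the add-one smoothing, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. `messages` is a list of (text, label) pairs,
    where label is 'spam' or 'ham' (ham meaning legitimate mail)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def spam_probability(text, model):
    """Return the estimated probability that `text` is spam, treating words
    as independent given the label (the 'naive' assumption)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        # Start from the prior: the share of training messages with this label.
        log_score[label] = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence here
            # Add-one smoothing keeps an unseen word/label pair from zeroing the score.
            count = word_counts[label][word] + 1
            log_score[label] += math.log(count / (n_words + len(vocab)))
    # Turn the two log scores into a normalized probability.
    top = max(log_score.values())
    spam = math.exp(log_score["spam"] - top)
    ham = math.exp(log_score["ham"] - top)
    return spam / (spam + ham)


# Tiny made-up training set (illustrative only).
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
p_spam_like = spam_probability("buy cheap pills", model)
p_ham_like = spam_probability("monday meeting", model)
print(round(p_spam_like, 3), round(p_ham_like, 3))
```

A real filter would train on far more mail and tune its threshold, but the structure is the same: a prior, updated word by word with Bayes' rule.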
I don't think anyone who didn't live through that period can appreciate what that
spam filter did. The dramatic impact it had. At the time they were saying that
some people were spending a half hour a day cleaning out the spam from their
e-mail accounts. There were thoughts of congressional legislation, of lawsuits.
I began every morning swamped with Viagra ads. And I have to tell you that
Viagra wasn't something I was really thinking about buying. But for me, that
period is crystallized by a comment that a rather elderly, quite proper but very
distinguished author of children's books said. She said: You know nowadays I
hate to start work in the morning because when I turn on the computer, I have to
go through screen after screen of graphic pornography. Actually, that's not what
she said. She described it, but my husband says I can't tell you.
So this bayesian revolution was a modern paradigm shift for a very pragmatic
age. And it happened overnight. Not because people changed their minds about
the philosophy of science, but because suddenly Bayes worked. The battle
between bayesians and frequentists subsided. Prominent frequentists
moderated their positions in public. Bradley Efron, National Medal of Science
recipient, who wrote a classic defense of frequentism, told me: I've always been a
bayesian.
The controversy is not entirely over, of course. Holdouts remain in
clinical medicine, in the judiciary, even in physics. For example, a
British appeals judge, as you probably know, banned the use of Bayes in British
courtrooms and he actually overturned a man's conviction for murder because
the statistics on the number of Nike shoes in Great Britain were not firm. And he
wants firm statistics, not approximations.
And experts have said his decision could affect virtually every case involving
circumstantial evidence in Britain. And an international committee of statisticians
is working on the problem.
Physicists are still using frequentism. They're using it for the Higgs boson
search. There was a big conference held recently about statistics
being used in the search. And an attendee told me the astronomers were taking
mostly a bayesian approach for the issues. The particle physicists were mostly
considering frequentist approaches.
They appear to be the last strong holdouts in science against bayesian methods.
And the statisticians, he said, were divided. Thank you.
I'm happy to take questions.
[applause].
>>: Can you go back to the Air France case that you started with, just for a sec,
because I've thought about this distinction -- what's the difference between bayesian
statistics and Neyman-Pearson statistics -- for a long time, and even after reading
your book and other things I haven't quite got the essence of it.
So here's the Air France thing. You've got the bayesians using some bayesian
techniques to find the plane. Now, the frequentists would say something like you
shouldn't be using prior probabilities. But it seems to me that frequentists would
use them as well, would be using them, too, because surely they wouldn't be
searching in the Pacific Ocean. They're making a judgment about that. And so
when you think about it that way, it looks like there's a gradation between
these two approaches. They're making a decision on where to look. They're
going to look in the Atlantic.
>> Sharon Bertsch McGrayne: I didn't explain what Ferrante said Bayes did for him.
During the second year of the search a group of oceanographers looked not on
the path that the jet had been on, and this is the last known point. They
went up here. They thought they could use the currents and the wind patterns to
deduce how the jet had moved. That didn't work. Okay. As a matter of fact,
Larry Stone, the bayesian who is in the book, who also worked on the program at
Metron, which is a small -- a defense contractor in Virginia, he said when he
heard that the oceanographers were going to go up to the north, he was stunned.
I asked Ferrante, I told him that. And Ferrante said I was too.
Now, I'm not convinced that Ferrante understood the strength of the word
stunned in English. His English is magnificent. But étonné in French is not as
strong as stunned is in English. He was surprised. So what Ferrante said Bayes had
done for him was to be able to combine all the evidence gathered over two years
of searching. He said that the civilian safety agencies work very closely
internationally. They all know each other. They work and cooperate with each
other. And he had data from almost ten Russian crashes. He had another that
was quite close to the Air France from South Africa.
He had all the evidence from the oceanographers and so on. And Bayes allowed
him to combine it all. It gave him the probable location, and then it gave him a
day-to-day search plan for how he could allocate his assets, and frequentism
wouldn't have done that.
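[Editor's illustration: a minimal sketch of the kind of updating described here, assuming the search area is divided into a few cells, each with a prior probability of holding the wreck and a probability of detection if that cell is searched. The cell names, numbers, and function names are hypothetical, and this is not the model actually used in the Air France search.]

```python
def update_after_failed_search(belief, searched, p_detect):
    """Bayes' rule after searching one cell and finding nothing.

    belief:   dict mapping cell -> probability the wreck is there (sums to 1)
    searched: the cell that was just searched without success
    p_detect: dict mapping cell -> chance of finding the wreck if it is there
    """
    # Probability of the observed outcome (no find), summed over all cells.
    p_no_find = 1.0 - belief[searched] * p_detect[searched]
    posterior = {}
    for cell, prior in belief.items():
        # Likelihood of "no find": a miss if the wreck is in the searched cell,
        # certain if it is somewhere else.
        likelihood = (1.0 - p_detect[cell]) if cell == searched else 1.0
        posterior[cell] = prior * likelihood / p_no_find
    return posterior

def next_cell_to_search(belief, p_detect):
    """Pick the cell with the highest chance of a find on the next pass."""
    return max(belief, key=lambda cell: belief[cell] * p_detect[cell])

# Made-up numbers, for illustration only.
PRIOR = {"A": 0.5, "B": 0.3, "C": 0.2}
P_DETECT = {"A": 0.8, "B": 0.8, "C": 0.8}
POSTERIOR = update_after_failed_search(PRIOR, "A", P_DETECT)
```

[With these made-up numbers, a failed search of cell A shifts belief toward the unsearched cells without ruling A out, and next_cell_to_search(POSTERIOR, P_DETECT) picks cell B. Repeating the update after each pass is one simple way to turn accumulated evidence into a day-to-day plan.]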
>>: The frequentists, it sounds like, did apply prior probabilities here,
except they were just wrong. They went looking somewhere else.
>> Sharon Bertsch McGrayne: They were actually depending on a Coast Guard
search program that had been done by these same people, that's talked about in
the book, that uses the wind and the water currents to find lost sailors and so on,
but the crash occurred right near the equator, and at the changing of the
seasons, the currents and the winds are apparently very wild. And it just wasn't
adequate.
>>: What do you believe was the single most contributing factor to the intense
animosity that developed between these two schools of thought?
>> Sharon Bertsch McGrayne: I think much of the tenor was set by Fisher,
who was apparently a very damaged personality. Someone would ask him about
science, even a colleague, and he would interpret it as a personal attack. And
his career was spent in a series of scientific arguments at meetings and
publications and so on, several at a time.
And it's kind of -- Fisher was a very, very important statistician. He was the
founder of modern statistics, very important.
>>: So this didn't develop until the middle of the --
>> Sharon Bertsch McGrayne: He kept it up from the 1920s well into the 1950s.
He became so extreme then that when a bayesian at the Department of Health
NIH was using Bayes to show that cigarette smoking was probably a cause of
lung cancer, Fisher said no, it's the opposite way. Lung cancer probably causes
the smoking.
>>: So I'm curious whether your research has revealed instances where these
two schools of thought have actually collaborated and produced something that
either was able to do --
>> Sharon Bertsch McGrayne: No. No. Nowadays, yes. Nowadays people
apparently pick what works for their project.
>>: [inaudible].
>> Sharon Bertsch McGrayne: There's a bayesian at Stanford who talked about how,
as a young man, he would walk down the halls of the statistics department
at Stanford and there were signs making fun of Bayes on professors' doors.
He was very wounded. He told me that story three times. That doesn't
encourage cooperation.
>>: Have to make a comment that we both took Efron's course. We both liked
him very much. We read his paper, Why I'm not a Bayesian, very carefully.
Hearing now that he's a bayesian is very exciting for us.
>> Sharon Bertsch McGrayne: Yes.
>>: [inaudible] was a physicist for some years, in 1978 I was in the physics
department where we wrote a paper [inaudible] analysis to analyze physics data,
which seems to be entirely appropriate.
But one of my colleagues was Geoff Daniell, who wrote one of the first applications
of bayesian maximum entropy to image reconstruction from incomplete and noisy
data. So [inaudible] in 1972. Much as Eric and David ought to be praised, I
think there was a lot of impact of bayesian methods on reconstruction earlier.
>> Sharon Bertsch McGrayne: There was a lot of physics -- physics.
>>: Physics department.
>> Sharon Bertsch McGrayne: From the '30s, Fermi, Feynman and so on. So,
yeah.
>>: Daniell and Gull was what I was introduced to. We used it.
>>: One last question.
>>: I have two.
>>: Last questions.
>>: So do you see any bayesian inference making inroads into classical places
where frequentists live like in agricultural research and in drug testing? Like
really classical areas where it was invented in some ways?
>> Sharon Bertsch McGrayne: Berkeley. Berkeley would be one of those. But
you're specifically asking for the state universities, the land grant universities.
>>: About Iowa State.
>> Sharon Bertsch McGrayne: Okay. I'll have to tell a story. I got a series of
e-mails from someone who says he has used Bayes for many years for animal
genetics in a land grant thing. I wrote Dennis Lindley that. Dennis Lindley has
been wonderful and written me dozens of letters.
So I thought he'd like this. And he wrote back and said that this gentleman is a
natural bayesian polluted by frequentist ideas.
>>: Your second question?
>>: Yes. I thought I was denied the second question. But it was a [inaudible]
question. Would you say, either of you, would you say that bootstrapping is a
bayesian technique?
>>: It seems to me it straddles.
>> Sharon Bertsch McGrayne: I'm passing that one to you.
>>: [inaudible].
>> Sharon Bertsch McGrayne: I didn't ask him that.
>>: I don't think he called it bayesian.
>>: Yeah.
>>: Thanks a lot.
[applause]