>> Matt Richardson: Alright, good afternoon everybody. My name is Matt Richardson and I’m a
Researcher here at MSR in the Machine Learning Group. I have the distinct honor of introducing Pedro
Domingos to you all as part of the Microsoft Research Visiting Speaker Series. Pedro is here today to
discuss his book The Master Algorithm.
But before he starts I wanted to say a few words about Pedro. Pedro was my PhD Advisor at the
University of Washington. He has a very impressive biography which you can all read about in the talk
announcement. He’s won numerous awards including the KDD Innovation Award which is considered
the highest honor in the data mining field. He has an amazing breadth of knowledge and always brings a
valuable fresh perspective to any problem that he wants to tackle.
I haven’t read the book yet but I’m really looking forward to it. I’ve always admired Pedro’s writing
skills. I remember in grad school my office mate and I, who were both his students, were impressed that no matter how carefully we would write something, Pedro could always edit it to be more concise, clear, and illuminating. I expect Pedro will have brought the same writing style to his book. Without further ado, please help me in welcoming Pedro Domingos.
[applause]
>> Pedro Domingos: Alright, thanks everyone. Thanks Matt for the introduction. As you can see some
of my grad students even survived the ordeal and went on to do great things.
I’m here today to talk about The Master Algorithm: how the quest for the ultimate learning machine will remake our world. What is this all about? It’s about a change that is happening in the
world today that is as big as the internet or the personal computer, or electricity were in their time. In
fact it builds on all of them. You know it affects every part of society. It touches everybody’s lives. It
touches your life right now in ways that you’re probably not aware of.
You know, huge fortunes are being made because of it. Also, unfortunately, and sometimes unnecessarily, many jobs are being lost because of it. Children have been born that wouldn’t be alive if not for it. It may save your life one day.
This change is the rise of Machine Learning. That’s what the book is about. What is Machine Learning?
If you’re not familiar with Machine Learning this talk should be quite valuable. If you are familiar with Machine Learning you’ll hopefully get a new view of it compared to what you had before.
In one expression Machine Learning is the automation of discovery: it’s computers getting better with experience like we do, learning by themselves. It’s a little bit like the scientific method except it’s being done by computers instead of scientists. As a result it’s on steroids. You know the learning algorithms, they formulate hypotheses. They test them against data. They refine the hypotheses. They repeat the cycle, again a lot like scientists do except it’s millions of times faster.
As a result we accumulate knowledge: in any given period of time Machine Learning can accumulate millions of times more knowledge than human scientists ever could. Now most of this
knowledge so far is not very deep. It’s not you know like Newton’s Laws or the Theory of Relativity. It
tends to be more mundane knowledge. But you know mundane knowledge is what life is made of. You
know what do you search for on the web? What do you buy when you go on Amazon? What are your
tastes?
If you’re a company Machine Learning helps you understand your customers better. If you’re an
individual Machine Learning helps you find books to read, movies to see, jobs, even dates. A third of all
relationships that lead to marriage these days start on the internet and it’s Machine Learning algorithms
that propose potential dates for you. There are children alive today that wouldn’t have been born if not
for Machine Learning.
[laughter]
Let me give you another example. The Smart Phone in your pocket right now is chock-full of learning algorithms. The learning algorithms let it understand what you say, let it correct your typing errors. They predict what you’re going to do and then they help you, either in response to your commands or even on their own initiative. They use both their panoply of sensors and all the data going through them in order to do this.
You know they use GPS to figure out what your daily habits are. They can even compare that with your
calendar to you know to figure out if you’re the kind of person who tends to be tardy for meetings.
They can even use the accelerometers that they have to figure out you know what your characteristic
walk is. You know something that I think is going to happen in the fairly near future is that if your Smart Phone figures out that you’re about to have a heart attack it will call nine one one on its own. It will warn you. Machine Learning may well save your life one of these days.
Now with Machine Learning being so valuable it’s not a surprise that tech companies are all over it. For
example you may know that Google not long ago bought this company called DeepMind for over half a billion dollars. It had no customers and no products, just because it had better learning algorithms. But if you’re Google and you have a learning algorithm that lets you predict whether someone will click on an ad one percent better, that alone is worth you know fifty billion dollars or so every year, okay.
Another example is IBM, just a month ago, bought this medical imaging company for a billion dollars. Not so much because of what they do, but because they want access to their library of images in order to train their learning algorithms to do things like diagnose breast cancer from X-rays, and diabetes, and whatnot.
The kinds of things that today take very highly paid people to do. As a result people with expertise in Machine Learning are very highly sought after, right. Peter Lee, right, the Director of Microsoft Research, actually said that the cost of acquiring a top deep learning expert, deep learning being of course the very hot area in Machine Learning, is comparable to the cost of acquiring a top NFL quarterback prospect. The geeks have finally won, yay.
[laughter]
On the other hand you know Machine Learning also has a dark side. It is what’s behind the increasing
automation of white collar jobs. Some people say that the NSA uses it to spy on us. I don’t know because you know it’s secret.
[laughter]
But yes, it’s probably true at least to some degree. There’s even a lot of speculation these days in the media, as you’ve probably seen, that Machine Learning is going to lead to Terminator, right, big bad AIs and robots taking over and whatnot.
I think the take-home message from all of this is that Machine Learning could be your best friend. But it could also be your worst enemy depending on what you do with it, which is why I think we’re now at the point where everybody needs to have a basic understanding of Machine Learning. Not just computer scientists or Machine Learning researchers anymore.
That doesn’t mean that you need to understand the gory details of how Machine Learning works. It’s a
little bit like driving a car, right. You don’t need to understand how the engine runs. But you need to
understand you know what to do with the steering wheel and the pedals. I think that most people right
now don’t even know that Machine Learning algorithms have a steering wheel and pedals that they can
control, okay.
Alright, so how does this happen? How do computers learn things by themselves? This to a lot of
people I think comes as a surprise. Because they think of you know of computers as just these you know
fairly dumb things that just do exactly what we tell them again and again. You know learning requires a lot of intelligence, a lot of creativity. And computers aren’t supposed to have that, right.
You know Picasso said that computers are useless because they can only give us answers, okay. Well,
Machine Learning is what happens when computers start asking questions. In particular the question that a computer asks, or that a learning algorithm asks, is the following: here’s an input and here’s an output, how do I turn that input into that output? Here’s an X-ray of a breast and the output is there’s a tumor here, or no, there’s no tumor.
This is the question that the learning algorithm is asking: how do I go from one to the other? If you give it enough examples it often figures out how to do that better than highly paid experts can, you know, people like doctors and so on, okay.
Now, here’s maybe one way to look at this. Traditionally, right, computers have to be programmed by us, right. The algorithms, right, we were the ones that defined the algorithms. Then we input the algorithm into the computer. Then the data went in as the input. The algorithm did something to the data and out comes the output, right. This is how most computers in the world work today.
But Machine Learning turns this around. In Machine Learning actually the output has become an input.
What goes into the computer now is the data and the output. What the Machine Learning algorithm is
doing is saying like huh, if this is what goes in and this is what comes out. Then what is the algorithm
that turns one into the other? Clearly, if you can answer this question it’s very, very powerful, okay.
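A tiny sketch of that reversal in Python, with made-up candidate programs, just to illustrate the difference between running a hand-written algorithm on data and learning the algorithm from input-output pairs:

```python
# A toy illustration of the reversal just described (hypothetical names).
# Traditional programming: we write the algorithm, feed it data, get output.
def double(x):          # the algorithm, written by a human
    return 2 * x

output = double(21)     # data in, output out -> 42

# Machine learning: we feed in data AND outputs, and ask for the algorithm.
# Here a trivially simple "learner" searches a tiny space of candidate programs.
candidates = {
    "double":  lambda x: 2 * x,
    "square":  lambda x: x * x,
    "add_two": lambda x: x + 2,
}

examples = [(3, 6), (5, 10), (21, 42)]          # (input, output) pairs

def learn(examples):
    """Return the candidate program consistent with all the examples."""
    for name, program in candidates.items():
        if all(program(x) == y for x, y in examples):
            return name

print(learn(examples))   # -> "double": the learner recovered the algorithm
```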
Now, the amazing thing is that this one algorithm, right, the one that’s in the computer, can produce very, very different algorithms depending on the data that goes in. The same algorithm can learn to play chess or to do breast cancer diagnosis, or to do credit scoring, or to do product recommendations, or whatever.
Really, the Holy Grail of Machine Learning, and some people even say the Holy Grail of Computer Science, is to figure out what is the most general algorithm that you can have that does this. In some sense every Machine Learning algorithm aspires to be such a master algorithm, which is one algorithm that is independent of the particular application. Just by giving it data, that one master algorithm, right, it’s a master algorithm because it makes other algorithms, can turn into something that is good for a huge variety of different things, okay.
What I would like to do is just give you a flavor of how that happens. It’s quite fascinating, right. You
know I think one reason to learn about Machine Learning is that it’s very important in your life. But
another one is that it’s fascinating because learning algorithms come from all sorts of different areas
including neuroscience and evolution, and so forth.
You know to simplify things a bit there are five main schools of thought in Machine Learning. Each of
these schools of thought has its origins in a different field of science. Each of these schools has its own
master algorithm, a general purpose algorithm that in principle can be used to solve any problem, okay.
To fast-forward a little bit, what I’m going to argue is that at the end of the day none of these people really have the master algorithm. They do each have one part of it. The real master algorithm will come when we’re able to combine them all into one algorithm, okay.
What are those five schools? They’re the Symbolists who have their origins in logic and philosophy.
Their master algorithm is inverse deduction. It’s viewing induction as being the inverse operation of
deduction. There are the Connectionists, these days the most famous ones. Their idea is to reverse engineer the brain. They’re inspired by neuroscience. Their master algorithm is back propagation. The Evolutionaries, instead of being based on the brain, are based on evolution. They say let’s simulate evolution on the computer. They’re inspired by evolutionary biology. Their master algorithm, or the most powerful algorithm that they have, is something called genetic programming.
The Bayesians are another very famous school of thought in Machine Learning. Their origins are in statistics. In essence their master algorithm is probabilistic inference, which is really how you apply Bayes’ theorem, which is where the Bayesians take their name from. Finally the Analogizers: this is learning and reasoning by analogy. They actually have origins in many different fields, but perhaps the most important one is psychology, because there’s an enormous amount of evidence that we human beings do a lot of reasoning by analogy. Again, they have several algorithms but the most powerful, or powerful and widely used, one is kernel machines, also known as support vector machines, which until the recent upsurge in connectionism was actually the dominant approach in Machine Learning, okay.
Let’s see a little bit more what the basic idea of each of these schools is and what goes on there. Let’s
start with the Symbolists. Here are some of the most famous Symbolists in the world: Tom Mitchell at Carnegie Mellon, Steve Muggleton in the UK, Ross Quinlan in Australia. He was actually the first ever PhD in Computer Science from UW. In some ways he’s actually local.
[laughter]
You know he was actually on my PhD committee, as well. Here’s the basic idea in Inverse Deduction.
This is the idea that we’re going to solve learning in the same way that mathematicians you know solve
things by defining inverse operations. For example addition, right, gives us the answer to the question if
I add two and two what do I get as a result? Subtraction the inverse of that is the answer to the
question, what do I need to add to two in order to get four? Okay and the answer of course is two.
The basic idea in inverse deduction is essentially the same, right. You say well, deduction, right, is going from the general to the specific. Induction is going from the specific to the general, so it’s the opposite. Similar to addition and subtraction, right, deduction gives you the answer to the question: if I know that Socrates is human and that humans are mortal, what follows? Well it follows that Socrates is mortal of course, okay. Now, what is induction? Induction is asking the question: what knowledge am I missing such that if I know that Socrates is human I can infer that he’s mortal, okay.
Of course the answer is that you know what’s missing is the rule that humans are mortal. Now, once
I’ve acquired this general rule I can use it. You know I can apply it to other people. I can combine it in
arbitrary ways with other rules to potentially form very complicated chains of reasoning.
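A toy sketch of the idea, in Python rather than first order logic, with a single hand-coded fact; real inverse deduction systems are far more general than this:

```python
# A very simplified sketch of inverse deduction (not a real ILP system).
# Deduction: from "Socrates is human" and "humans are mortal", derive "Socrates is mortal".
# Induction: given the fact and the conclusion, hypothesize the missing rule.

facts = {("human", "Socrates")}
observation = ("mortal", "Socrates")

def induce_rule(facts, observation):
    """Propose a general rule 'A(X) implies B(X)' that explains the observation."""
    target_pred, entity = observation
    for fact_pred, fact_entity in facts:
        if fact_entity == entity:
            # Generalize the shared constant away: human(X) -> mortal(X)
            return f"{fact_pred}(X) -> {target_pred}(X)"

print(induce_rule(facts, observation))   # human(X) -> mortal(X), i.e. "humans are mortal"
```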
Now of course, you know I wrote this in English and computers don’t understand natural language yet.
In reality on the computer this is represented in something like first order logic, but the idea is the same,
okay? Here’s an example of the power of inverse deduction. You see this picture here. There’s a
Biologist here but it’s not this guy. It’s not the guy in the lab coat. This is actually Ross King a Computer
Scientist and Machine Learning Researcher.
The robot is this; the Biologist is this machine here. That machine is actually a complete robot scientist
in a box. It uses inverse deduction to come up with new knowledge of molecular biology starting from
knowing the basics you know like you know the central dogma, DNA, proteins, and so on and so forth.
Then it formulates hypotheses. It actually carries out the experiments physically on its own, right, using microarrays and sequencers, and stuff like that. Then it refines the hypotheses or rejects them and keeps going.
This robot is called Eve. It’s at the University of Manchester. Last year it discovered a new malaria drug. Now the interesting thing about this is that once you have one robot like that you can make a million. You’ve just doubled the number of biologists in the world.
[laughter]
Then you can make ten million and then you know then maybe you will cure cancer one of these days,
much sooner than we would otherwise, okay. Now the Connectionists think the Symbolists are mistaken, right. The idea in symbolic learning is that you can do it all at a certain level of abstraction without worrying about the substrate, in essence the way mathematicians and logicians prove things. But the Connectionists are like well, no, no, no, you’re not going to get there that way. The best learning algorithm in the universe is the one inside your skull. It’s your brain. Let’s reverse engineer the brain and come up with a general purpose learning algorithm that way.
The most famous of all Connectionists is Geoff Hinton. He’s been trying to, he started out as a
Psychologist. These days he’s mainly a Computer Scientist. But his goal in life for forty years has been
to figure out what is the algorithm by which the brain learns. He’s pretty sure that it can be
encapsulated in one algorithm. He’s been doing this since the seventies. In fact he says that at one
point you know he came home from work very excited saying, yay I did it. I figured out how the brain
works. His daughter said like, oh, dad not again.
[laughter]
But these days his ideas are really starting to pay off. In particular he was one of the co-inventors of back propagation. Back propagation these days is everywhere. Two other famous Connectionists are Yann LeCun and Yoshua Bengio. Yann is now the Director of AI Research at Facebook.
How do we do this? Well, the way we’re going to reverse engineer the brain is as follows, right. The brain is made of neurons. What we’re going to do is we’re going to build a mathematical model of a neuron. We’re going to build the simplest model we can that works essentially the same way a neuron does. Then we’re going to connect those neurons into a big network. Then of course the problem is how do you train that network? That’s where back prop comes in.
You know neurons have dendrites, and impulses come in through the dendrites. If the sum of those impulses multiplied by the strengths of the dendrites exceeds a threshold then the neuron fires what’s called an action potential, right. You can think of a neuron as being a cell in the shape of a tree. The dendrites are the roots, the axon is the trunk. The discharge goes down the trunk to the branches and the leaves. Then it connects with other neurons at what are called synapses.
The basic idea behind connectionism, and a hypothesis that we believe is true about the human brain, is that everything you know, everything you’ve learned in your life, is encoded in the strengths of those connections, okay.
The whole question is, well, how do you learn those connections? Well, first of all here’s the neuron but in math instead of in biology, right. What I have is a bunch of inputs. I multiply each input by a weight. I sum them up, so I’m really just doing a weighted sum of my inputs, nothing very complicated. If the result is above a certain threshold then the output is one, meaning I’ve detected something let’s say, and otherwise it’s zero, okay.
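In code, the neuron just described is only a few lines; the weights and threshold below are made-up numbers:

```python
# A minimal sketch of the artificial neuron: weighted sum of inputs vs. a threshold.
def neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Example: three inputs with different connection strengths.
print(neuron([1.0, 0.0, 1.0], weights=[0.4, 0.9, 0.3], threshold=0.5))  # fires: 1
print(neuron([0.0, 1.0, 0.0], weights=[0.4, 0.9, 0.3], threshold=1.0))  # doesn't: 0
```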
Now, connect this into a big network with lots of layers. In fact the term deep learning comes from the fact that it’s neural networks with many layers. But now we have this very difficult question which is, okay, let’s say I’m trying to learn to recognize cats, right. In comes a picture of a cat. I compute the values of the neurons. Then the output should be one. But the output is actually point two.
Now, how do I fix the problem? Right, this is what’s known as the credit assignment problem in
Machine Learning. It’s like or maybe more precisely the blame assignment. Because when this is
computing the right thing nothing needs to happen. When it’s making an error some connection
somewhere has to change. For a long time people didn’t know how to do that.
That’s the problem that back propagation solves. In essence back propagation solves the problem by saying well, I know there’s an error of point eight here. You have a small weight. You have a large weight, so maybe you need to change more because you’re more responsible for the error.
Now, there’s a corresponding you know error here and here. If I change you, you know how much will
we improve things? Well, how much is each of the weights responsible for that? Maybe something
needs to go up to make the neuron more likely to fire. Maybe something that is making a negative
contribution needs to go down so that it’s not preventing the neuron from firing, okay. In essence this is
what back prop is doing.
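A bare-bones sketch of that blame assignment for a single sigmoid neuron, with made-up inputs and weights; a full network applies the same chain-rule idea layer by layer:

```python
import math

# Nudge each weight in proportion to how responsible it is for the error (its gradient).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs  = [1.0, 0.5]       # features of the example (made up)
weights = [0.1, -0.2]      # current connection strengths (made up)
target  = 1.0              # the output we wanted, e.g. "this is a cat"

for step in range(500):
    output = sigmoid(sum(x * w for x, w in zip(inputs, weights)))   # starts around 0.5
    error = output - target
    grad = error * output * (1.0 - output)   # derivative of squared error w.r.t. the weighted sum
    # Bigger inputs carry more blame, so their weights move more.
    weights = [w - 1.0 * grad * x for w, x in zip(weights, inputs)]

print(round(sigmoid(sum(x * w for x, w in zip(inputs, weights))), 2))  # now much closer to 1.0
```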
As I mentioned, back prop these days, under the name of deep learning, is used for all sorts of things. In particular things to do with images: retrieving images, understanding videos, and also for speech, right. Like the Skype simultaneous translation system, as you may know, at heart is using deep learning for this.
But one very famous example of this that was actually on page one of the New York Times is the Google Cat Network, okay. This is a network that folks from Stanford and Google built. At the time it was the biggest neural network ever; it has on the order of a billion parameters, I think. They basically trained it by having it look at YouTube videos, okay. On YouTube videos the single most frequent thing that they found was cats, right.
[laughter]
Because people love to post videos of their cats, right. The network actually learns a lot of things besides cats. But cats, having the most data, were the thing that it learned best. It’s become known as the Google Cat Network.
Okay, now the Evolutionaries say well, you know sure the brain you know that’s fine. You know you can
fine tune the connections between your neurons. But how did the brain appear, right. The real master
algorithm is not the brain it’s evolution. Because evolution made not only the brain but all of life on
earth as well. Now that’s a powerful learning algorithm.
Indeed many biologists do think of evolution as pretty much an algorithm; we have a rough outline of how it works. Now, the first person to actually start pursuing this idea was
John Holland back in the sixties. He actually died recently. But then a bunch of other people followed. John Holland invented what are called genetic algorithms. John Koza went the next step by inventing genetic programming. Hod Lipson is one of the people today who are doing many interesting applications of genetic learning.
What’s the basic idea in genetic algorithms? It’s quite simple, right. In the same way that we had a computational implementation of the brain in the Connectionist school, here we’re going to have a computational implementation of evolution. You want to solve a problem, right. Let’s say you want to build a radio. You start out with a random pile of components. Literally, you start out with random piles of components, or rather with, you know, a thousand random piles of components. Those are the individuals in your population. Then they go out into the world and you see how well they do, right.
In the case of an animal it might be how well they survive and you know reproduce and what not. But in
the case of something like a radio well how much do they capture the signal that you want to capture
and so on. Then each organism or you know or program, or whatever has a fitness value. The ones with
the highest fitness get to produce the next generation. They get to you know cross over with some of
the others to [indiscernible] sexual reproduction. You combine you know some you know genes from
one with genes from the other.
In the case of computers, right, the genes are usually just bit strings, right. Because you know we don’t
need [indiscernible] we can just do it with a bit string. There’s also random mutation and then you get a
new population. Then you do the same thing again. If you do this for enough generations often
amazing things start to happen. Like for example people like John Koza have been able to develop, they have a lot of patents where the things were invented by the algorithm, new ways to build radios and amplifiers, and low pass filters, and things like that that are different from the ones that human designers created, but actually in many cases work a lot better.
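A minimal genetic-algorithm sketch; here the individuals are bit strings and the fitness function simply counts ones, standing in for "how well the radio works":

```python
import random

random.seed(0)   # for reproducibility

def fitness(bits):
    return sum(bits)

def crossover(mom, dad):
    point = random.randrange(1, len(mom))     # pick a crossover point
    return mom[:point] + dad[point:]

def mutate(bits, rate=0.01):
    return [1 - b if random.random() < rate else b for b in bits]

# A population of 50 random 20-bit individuals.
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]

for generation in range(40):
    # The fittest half gets to produce the next generation.
    population.sort(key=fitness, reverse=True)
    parents = population[:25]
    population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                  for _ in range(50)]

print(max(fitness(ind) for ind in population))   # close to the maximum of 20
```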
The next step in this and the most powerful version of evolutionary learning is what is called genetic
programming. John Koza’s idea was this: well, sure, nature does this with these strings of DNA, but we don’t have to do it with strings, right. A string is a very low level representation.
It’s very easy to muck things up by crossing over a string in some random place. You could have a perfectly good program and then you ruin it by basically crossing over in a bad place.
His idea was this: let us directly evolve programs, right. A program is really a tree of operations, you know, subroutines, etcetera, all the way down to additions and subtractions, and ANDs and ORs, and whatnot. The idea in genetic programming is that your individuals are actually program trees. Then what you do when you build the next generation is you pick a crossover point in the two parent trees, right, the mother program and the father program if you will. Then you switch the sub-trees.
For example if you had these two trees and you did this cross over at this point one of the resulting trees
would be the one with all the white nodes, which is actually one of Kepler’s Laws. It’s the law that gives
the length of the year as a function of the average distance of a planet from the sun. Okay, the length is
you know a constant times the square root of the distance cubed, okay.
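A small sketch of genetic-programming crossover, with programs as nested tuples and the Kepler formula written as a toy tree; the operator names are just illustrative:

```python
import random

# Kepler's third law as a tree: length_of_year = c * sqrt(distance ** 3)
kepler = ("mul", "c", ("sqrt", ("pow", "distance", 3)))
other  = ("add", ("mul", "distance", 2), ("sqrt", "c"))

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(mother, father):
    # Skip the mother's root so the child keeps some of the mother.
    mpath, _ = random.choice(list(subtrees(mother))[1:])
    _, fsub = random.choice(list(subtrees(father)))
    return replace(mother, mpath, fsub)

random.seed(1)
print(crossover(kepler, other))   # a child made by splicing a subtree of the father into the mother
```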
As I said, this type of approach has led to many interesting things like new electronic circuits. But these days perhaps the most exciting, or perhaps the most scary, thing that the Evolutionaries are working on is actually evolving real, physical robots, right. They’re not just doing this as a simulation on the computer anymore.
This is a real robot from Hod Lipson’s lab. They start out with these robots basically trying to crawl around and stand up, and then run faster and things like that. Then in each generation of robots the fittest ones actually get to program a 3D printer to produce the next generation of robots, okay.
[laughter]
If Terminator happens this might be the route by which we get there. Okay, so watch out for those little
spiders. You know one of them might be a robot.
[laughter]
Okay, so now the Bayesians. The Bayesians have a very different view of things, right. They don’t believe in being inspired by nature, whether it’s evolution or the brain, or whatever. They think we should solve learning from first principles, okay.
You know Bayesianism has a long history in statistics. Bayesians are known in Machine Learning as being the most fanatical of all the tribes. They have to be, because they were a persecuted minority in statistics for a long time. They had to become, you know, really determined. It’s a good thing they did because they certainly have a lot to contribute. These days, on the back of powerful computers [indiscernible], Bayesianism is on the rise even within statistics.
Within Computer Science probably the most famous Bayesian is Judea Pearl, who invented something called Bayesian networks and actually won the Turing Award, the Nobel Prize of Computer Science, a few years ago for that. He’s a Professor at UCLA.
Another famous Bayesian is Microsoft’s David Heckerman. In fact Microsoft Research in its early days
was a hot bed of Bayesian learning. It still is, but of course now it’s much, much more varied. You know
and perhaps the best known Bayesian these days in Machine Learning is Mike Jordan.
What do the Bayesians do, right? What do they believe in, right?
[laughter]
Well, Bayesians believe in Bayes’ theorem. If you have a learning algorithm that is incompatible with Bayes’ theorem you’re wrong.
[laughter]
In fact Bayesians love this theorem so much that there was a Bayesian Machine Learning startup that actually had Bayes’ theorem written in neon letters and they put it outside their office. Right, so that’s Bayes’ theorem in neon, shining through the night.
What is the big deal, right? Like this is a very small expression, right. It almost doesn’t merit being called a theorem. The proof is extremely simple but it’s extremely important, right. Bayes’ theorem is really just a way to say, well, I start out with a hypothesis and I don’t know how much I believe in it, right. The key problem that the Bayesians are dealing with is uncertainty. Any knowledge that you learn is always uncertain.
What I have is it’s really a combination of these things. There’s my prior probability of a hypothesis.
How much I believe in this hypothesis before I see any evidence. The hypothesis could be as simple as a
binary decision. This person has AIDS versus this person doesn’t have AIDS, right. Or it could be a whole
Bayesian network or a whole neural network, or decision tree, or program, or what have you.
You start out with your prior belief which is how much you believe in your hypothesis before you see
any evidence. Then there’s this other part which is the likelihood which is how likely the evidence is if
your hypothesis is true. If the hypothesis makes the evidence very likely, then in return the hypothesis becomes more probable, because it made what you’re seeing likely; that’s where the learning happens.
Then when you combine the two by multiplying them you get what’s called a posterior probability which
is how much you believe your hypothesis after you’ve seen the evidence. Then you also need to normalize using the marginal probability of the evidence. But let’s not worry about that.
It’s just to make everything add up to one, okay.
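A tiny worked example of that multiplication, with made-up numbers for a yes/no hypothesis like the diagnosis one:

```python
# Bayes' theorem in miniature: posterior is proportional to prior times likelihood.
prior      = {"disease": 0.01, "no_disease": 0.99}   # belief before seeing the test result
likelihood = {"disease": 0.99, "no_disease": 0.05}   # P(positive test | hypothesis)

unnormalized = {h: prior[h] * likelihood[h] for h in prior}
evidence = sum(unnormalized.values())                # marginal probability of the evidence
posterior = {h: round(p / evidence, 3) for h, p in unnormalized.items()}

print(posterior)   # {'disease': 0.167, 'no_disease': 0.833} -- belief after a positive test
```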
What you do in Bayesian learning is that you start out with a whole bunch of hypotheses and then the probabilities evolve. Hopefully some hypotheses become a lot more likely. Some of them become a lot less likely. But it may never be the case that there is a single hypothesis that you should believe in, okay. The Bayesian view in some ways is very deep in terms of what it says about the world: it says, no, there isn’t a single true hypothesis. There are just your prior beliefs, and then the evidence transforms them into posterior beliefs.
Now Bayesian learning has been used for all sorts of things. But a very famous one, that I think almost everyone has benefited from and that actually started here at Microsoft Research, is spam filters. Right, the first generation of spam filters, and today many of them still, are based on Bayesian learning. The hypothesis here is: is this a spam email or is this a good email? The evidence is things like the words in the email, right. If the email contains the word Viagra that makes it more likely to be spam; if it contains FREE in all capitals, more likely to be spam. If it contains four consecutive exclamation marks that makes it more likely to be spam.
[laughter]
If it contains the word your mom or the word your boss then it probably isn’t spam. Or you might want
to you know file it as not spam just to be safe, okay. You don’t want your mom to get mad at you.
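A toy naive-Bayes spam filter in that spirit, with invented per-word probabilities:

```python
# Each word independently nudges the spam probability up or down.
spam_prob  = {"viagra": 0.97, "free": 0.90, "!!!!": 0.85, "mom": 0.05, "boss": 0.10}
prior_spam = 0.4

def classify(words):
    # Multiply per-word likelihoods under the "naive" independence assumption.
    p_spam, p_ham = prior_spam, 1.0 - prior_spam
    for w in words:
        if w in spam_prob:
            p_spam *= spam_prob[w]
            p_ham  *= 1.0 - spam_prob[w]
    return p_spam / (p_spam + p_ham)       # posterior probability of spam

print(classify(["free", "viagra", "!!!!"]))   # very close to 1 -> spam
print(classify(["mom", "free"]))              # pulled back down by "mom"
```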
Okay, finally the Analogizers. The Analogizers are a looser tribe than the others, right. They’re really a
bunch of sort of like different sets of people that do have this thing in common that they learn by
analogy. By looking for similar, you know when you’re trying to solve a problem you find similar
problems in your experience. Then you try to you know transfer the solutions from one to the other.
You know one of the early pioneers in this was Peter Hart. You know there’s this algorithm called
nearest neighbor which we’ll see shortly, which despite its simplicity is surprisingly powerful. Vladimir
Vapnik is the father of kernel machines, also known as support vector machines. Douglas Hofstadter,
right he’s a cognitive scientist and he’s the author of Gödel, Escher, Bach. He recently wrote a whole
five hundred page book arguing that analogy is all there is to intelligence. Right that’s five hundred
pages explaining why everything we do is just analogical reasoning. He definitely believes that analogy
is the master algorithm.
What is the idea here? Well, let me illustrate it you know using a very simple puzzle. The puzzle is this;
let’s say I give you two countries. You know I’m calling them Posistan and Negaland.
[laughter]
I give you the map. I tell you where the main cities in each one are. You know, here’s Positiville the capital of Posistan, a bunch more cities in here, and some cities of Negaland. Then my question to you is: if I tell you where the cities are, where is the boundary between them? Where’s the frontier, okay? Now you don’t know, right. But a reasonable thing to do is to say, well, I’m going to assume that a point on the map is in Posistan if it’s closer to some city in Posistan than it is to any city in Negaland, okay.
For example this line here is the set of points that are at the same distance from this city and this city.
It’s part of the boundary, okay. Even though this is very, very simple right, all you have to do is
remember the data. You don’t actually have to do anything whatsoever at learning time, right.
Peter Hart actually proved the theorem that says if you give this algorithm enough data it can learn
anything. It can learn any function, right. In that basic sense of the term this is actually you know a
master algorithm. It’s completely general purpose.
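The nearest-neighbor rule fits in a few lines; the city coordinates below are invented:

```python
import math

# A point belongs to Posistan if its nearest city is a Posistan city.
cities = {
    (2.0, 3.0): "Posistan", (1.0, 5.0): "Posistan", (3.0, 4.5): "Posistan",
    (7.0, 2.0): "Negaland", (8.0, 5.0): "Negaland", (6.5, 6.0): "Negaland",
}

def classify(point):
    nearest = min(cities, key=lambda c: math.dist(point, c))
    return cities[nearest]

print(classify((2.5, 4.0)))   # Posistan
print(classify((7.5, 4.0)))   # Negaland
```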
Of course the question is how much data do you really need to learn and then how efficient is it? This
algorithm is not ideal for a couple of reasons. One is that notice there’s a lot of you know things that
you really don’t need to remember. If I threw away these cities, right, the frontier wouldn’t change. All
I should have to keep are what are called the support vectors. The support vectors are the examples
that cause the frontier to be where it is, right.
In fact support vector machines take their name from the support vectors because they just figure out
what those are. The other thing that they do is they produce a much smoother frontier, right. This one is a little bit jagged and it’s probably not the true frontier. Support vector machines throw away all the unnecessary examples. They produce a smoother frontier. The way they do that is by what’s called maximizing the margin.
Imagine that I tasked you with walking you know from south to north while keeping you know all the
cities in Posistan on your left and all the cities in Negaland on your right. But with one extra condition
which is you want to stay as far away from those cities as possible, okay. Imagine that the cities are
landmines and you don’t want to step anywhere close to them if you can help it, okay. This is, so you
want to maximize your margin of safety. That’s exactly what support vector machines are doing, okay.
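A sketch of the same map with a support vector machine, assuming scikit-learn is available; it keeps only the support-vector cities and puts the maximum-margin frontier between the two countries:

```python
from sklearn.svm import SVC

X = [[2.0, 3.0], [1.0, 5.0], [3.0, 4.5],      # Posistan cities
     [7.0, 2.0], [8.0, 5.0], [6.5, 6.0]]      # Negaland cities
y = ["Posistan"] * 3 + ["Negaland"] * 3

model = SVC(kernel="linear", C=10.0)          # linear boundary, maximum margin
model.fit(X, y)

print(model.predict([[4.0, 4.0], [7.5, 4.0]]))  # which side of the frontier each point is on
print(model.support_vectors_)                   # the cities that pin the frontier down
```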
You know these algorithms have been used for all sorts of things. But one very famous one is
Recommender Systems. Again, these days you know all sorts of learning algorithms get used to
recommend products to you. But the original one and still one of the best ones is in essence a variation
of nearest neighbor. If I want to recommend movies to you, what I do is I find people with similar tastes to yours. If you tended to give high stars when they give high stars, and low stars when they give low stars, then if there’s another movie that they give five stars to and you haven’t seen, I’m going to hypothesize that you would like that movie as well, okay.
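A toy nearest-neighbor recommender, with invented users and ratings:

```python
# Find the user whose past ratings most resemble yours and suggest
# the movie they loved that you haven't seen.
ratings = {
    "you": {"Alien": 5, "Amelie": 1, "Blade Runner": 5},
    "ann": {"Alien": 5, "Amelie": 2, "Blade Runner": 4, "Solaris": 5},
    "bob": {"Alien": 1, "Amelie": 5, "Blade Runner": 2, "Notting Hill": 5},
}

def similarity(a, b):
    shared = set(ratings[a]) & set(ratings[b])
    # Smaller rating differences on shared movies -> more similar tastes.
    return -sum(abs(ratings[a][m] - ratings[b][m]) for m in shared)

def recommend(user):
    neighbor = max((u for u in ratings if u != user), key=lambda u: similarity(user, u))
    unseen = set(ratings[neighbor]) - set(ratings[user])
    return max(unseen, key=lambda m: ratings[neighbor][m])

print(recommend("you"))   # "Solaris" -- the top unseen pick of your nearest neighbor, ann
```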
You know I’ve seen it in multiple places that, although I don’t think anybody knows this officially that a
third of Amazon’s business comes from its recommender system. Three quarters of Netflix’ business
comes from its recommender system. You know like this thing is really at the heart of what these
companies do.
Alright, so let’s take stock of what we’ve seen. We have the five tribes. Each of these tribes has a
particular problem that it’s solving that the others don’t solve. This is an important problem for
Machine Learning. If you want to have a universal learner you have to solve that problem.
The Symbolists learn rules that you can compose in arbitrary ways. The others don’t know how to do that. The Symbolists do it using their master algorithm; their solution to the problem is inverse deduction, right. Inverse deduction is how you learn these rules that you can compose. The Connectionists solve the credit assignment problem using back prop.
The Evolutionaries discover structure, right. Before you can start fine-tuning the connections in your brain you have to figure out what the structure of the brain is. Well, evolution is the one that came up with that. Their solution, or most powerful solution, is genetic programming. The Bayesians solve the problem of dealing with uncertainty. They use probability. They use Bayes’ theorem and then they do inference to figure out what the probabilities are after the evidence.
The Analogizers actually do something that none of the others can, which is to generalize from very few examples to things that are very different. Remember the Posistan and Negaland example: if all I knew was the location of the capitals of both countries, I could already form a reasonable approximation to the frontier. None of the other approaches can do this.
There are you know forms of analogy that are much more powerful than the ones that I showed you
here, okay. That even allow you to generalize to completely different domains. Like you can learn you
know to solve things as a physicist and then you get employed you know to predict the stock market
using the same skills that you acquired, right. Humans are able to do this, analogy is able to do this, the
other ones aren’t.
But here’s the thing: as much as each of these tribes really believes in its approach, and has been very brilliant and very determined in making progress, at the end of the day if we really want to have a master algorithm we have to solve all five problems at once. We need a single algorithm that actually has all of these properties.
The question is you know what would that algorithm look like, right? In some sense what we’re looking
for here is a grand unified theory of Machine Learning. In the same way that the standard model is a
grand unified theory of physics or the central you know dogma is the grand unified theory of biology.
What might that look like right? Well, we don’t have that yet. But we’re actually making very good
progress. In particular in the last decade we’ve made amazing amounts of progress.
Here’s the first thing to notice, right. These algorithms look completely different, right, but actually they’re not. All these learning algorithms, and many others that I haven’t shown, are really all composed of three pieces. The first piece is representation: what is essentially the programming language in which the learning algorithm is going to write the programs that it discovers, okay? Learning algorithms tend not to program in Java or C++, or anything like that. They program in things like first order logic, right, or it could be a neural network, okay.
Now the first thing that we need to do is to find the unified representation that is powerful enough to
do this. You know Matt in his PhD thesis found one. It’s called Markov logic networks. What it does is it actually combines logic, which is what the Symbolists use, with the graphical models that the Bayesians use. In particular it combines logic with Markov networks, which are a different type of, in some ways a more powerful type of, graphical model than Bayesian networks.
Really what a Markov logic network is, is just: you have your formulas in first order logic, in which you can encode basically anything you want to encode. Then you give them weights. If you really believe a formula you give it a high weight. If you’re not very sure you give it a low weight. Then a state of the world is very probable if a lot of the high-weight formulas are true in it. It turns out that this representation is a very nice generalization of almost everything that we use in Machine Learning. That’s the representation part.
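A bare-bones sketch of that idea, with the weighted formulas written as plain Python functions over a propositional world rather than real first order logic:

```python
import math

# A world is more probable the more high-weight formulas hold in it.
formulas = [
    (2.0, lambda w: (not w["smokes"]) or w["cancer"]),    # smoking causes cancer
    (0.5, lambda w: (not w["friends"]) or w["smokes"]),   # friends of smokers smoke
]

def score(world):
    # Unnormalized probability: exp(sum of weights of satisfied formulas).
    return math.exp(sum(weight for weight, formula in formulas if formula(world)))

worlds = [
    {"smokes": True, "cancer": True,  "friends": True},
    {"smokes": True, "cancer": False, "friends": True},
]
total = sum(score(w) for w in worlds)        # normalize over just these two worlds
for w in worlds:
    print(w, round(score(w) / total, 2))     # the world satisfying both formulas wins
```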
The second part that all learning algorithms have is evaluation. Evaluation is deciding, if I give you a candidate program, how good is it? Well, usually we decide it’s good because it fits the data well, meaning it makes accurate predictions. Also maybe it’s simple, or it has other desirable properties. One thing that we can use here is just the Bayesians’ posterior probability, right. This is a popular option.
But more generally the evaluation, I would argue, should not be a part of the algorithm. It should be something that it takes from the user. You the user tell the master algorithm what you want it to do for you. If you’re a company you say maximize my profits, or whatever your goal for that particular problem is. If you’re an individual you say, well, maximize my utility, or whatever you want. Then the algorithm just goes and optimizes that.
Finally there’s optimization which is really where most of the work goes in Machine Learning.
Optimization is finding the algorithm, the program that maximizes your score out of all the ones that the
language allows. For example in the case of Markov logic networks what is the set of formulas and their
weights? That for example most faithfully represents what I’ve seen in the world without being overly
complex, okay.
Here there’s a very natural combination of evolutionary and Connectionist ideas, right. We can use genetic programming to discover the formulas, right. We have a population of formulas. We mix and match them. We mutate them. You know we [indiscernible] them, have them remove things, take part of one and combine it with part of another, and so forth, right.
We can use genetic programming to discover the formulas. Then to learn their weights of course we can use back propagation, right, in pretty much the same way that it gets used in neural networks. Again, many standard neural networks, like Boltzmann machines for example, are direct special cases of this, okay.
You know we’ve made good progress. But of course you know we’re not there by any means. There’s a
lot that remains to be done. My feeling actually is that even if we succeed in completely unifying these
five you know paradigms and I think we’re getting pretty close. I don’t think at that point we will
actually have solved the problem. I think there are some major ideas that we’re going to need that
haven’t been discovered yet.
In some ways the people who are already working in one of these schools, who are very deep experts in one of them, are not in the best position for that. You’re actually better off if your thinking is not along those tracks and if you have more of a distance. We need your ideas. If you read the book and have a brilliant idea about what the master algorithm should be, please tell me so I can publish it.
[laughter]
Okay, let me conclude just by saying a little bit about what I think are the things that the master
algorithm will enable that current learning algorithms cannot yet do. The reason these things need a
master algorithm is that again all of those five problems are present in them. If only one was present then you could use the appropriate paradigm. But in most of the really important, really hard problems all five issues are present.
One of them is home robots, right. Home robots are coming at some point. We would all love to have a
robot that you know cooks dinner for us, does the dishes, you know makes the beds, and so on. But it’s
a very, very hard problem. Everybody in AI believes it can’t be solved without Machine Learning. But
also you know what kind of Machine Learning will it take to build such a robot? I think it is something
like the master algorithm that we’re looking for.
Here’s another one. Everybody these days is trying to turn the web into a big knowledge base. You
know there’s an effort like that in Microsoft, there’s one at Google. There’s you know one at Facebook,
there’s several in academia. This is the idea that, well, the web is a mass of text and images, which is very hard for computers to deal with. If we can turn that into something like a knowledge base in first order logic, then instead of issuing keyword queries and getting some pages back you can actually have a dialog with the computer, where you ask questions and it reasons over its knowledge and it gives you the answers, right.
This would clearly be a great thing to have. But again we can’t solve it without Machine Learning. We
can’t solve it without the kind of universal learner that we’re talking about here. I think each of the
paradigms that we have right now is not enough. You know people try to hack some things on top of
the one that they start with. But we need a more fundamental solution.
Here’s another example, perhaps the most important one of all which is curing cancer. Why is curing
cancer so hard? It’s because cancer is not a single disease. Every cancer is different. The same cancer
mutates as it goes along. Somebody’s cancer right now is different from the one that it was six months
ago. It’s very unlikely that there’s a single drug that will magically cure all cancers.
What you actually need is something like a learning program that takes in you know the genome of the
cancer, its mutations, right, relative to the genome of the patient which it also knows, and the patient’s
medical history. From that it predicts, what is the drug that will be you know the best for that patient,
or combination of drugs, or sequence of drugs? In a way it’s not unlike a recommender system. Except
instead of recommending a book or a movie it’s actually recommending a drug to treat your cancer.
This is something that’s beyond the ability of any human being to do, biologist or doctor, or whatever
because there’s just too much information. There’s just too much knowledge that goes into
understanding how a cell works, right. You don’t just need to cure the cancer you also need to not you
know destroy the cells in the process or harm them, right.
The amount of information that you need to bring into this model is too large. The number of things you have to attend to at diagnosis time, even just the sheer number of drugs that you might be able to use, are beyond any human being. But I think with something like the master algorithm we will be able to do it. You know there are many people working very energetically in that direction.
Finally, let me mention this idea of three hundred and sixty degree recommenders. The kinds of recommendation systems that we have today recommend one type of thing. They’re based on the knowledge that one of these companies has of you. Right, Amazon can recommend things based on your clicks on the Amazon site. Netflix can recommend things based on all you do on Netflix, same with Facebook and Twitter, and so on and so forth.
But what you would really like to have is a model of you that is based on all the data that you’ve ever generated. In fact in the limit that’s generated from your stream of consciousness. Imagine what you’re doing being recorded, video and audio, continuously. Now, what you want to do is learn on that a very, very good model of you, that knows you much better than any of these models that are based on a sliver of your data. Then that model helps you at every step of your life. It recommends things large and small to you, from books to read, to who to date, and where to go to college, and what house to buy, and whatnot, okay.
Again, this involves all the problems that we talked about. I think current Machine Learning algorithms are certainly not up to that. But I think if we discover the master algorithm we will be able to do this. Then I think your personal model would become even more important to you than your Smart Phone. You know, living life without the help of your personal model will come to seem impossible. With that model you will live a happier and more productive life than we do today.
Thank you.
[applause]
I’ll take questions, yeah.
>>: Where do these popular [indiscernible] fit in the five tribes?
>> Pedro Domingos: Yes.
>>: A sixth tribe, or what?
>> Pedro Domingos: No, no you asked a good question. Decision trees are very much a symbolist
algorithm, right. They come from the Symbolist school of Machine Learning like Ross Quinlan is a very
good example of that, right. Going you know all the way back to the sixties.
Boosting is a simple way of combining algorithms. You could think of these ensemble methods as a step on the way; they’re the simplest way you can combine these algorithms, right. They’re very successful. But they’re also somewhat shallow, right. It’s not a deep combination. It works but we should be able to do better.
>>: Thank you Pedro for the talk. Very quickly I wanted to get your thoughts on the [indiscernible] philosophical issue behind, what’s the ethics behind the universal learner? Obviously, when the universal learner provides suggestions that’s one thing. But one day the universal learner provides actual choices, that’s another thing. What are your thoughts on that?
>> Pedro Domingos: Well obviously, you know I didn’t have time to go into that. The implications of
this for you and for society are very large, right. Who makes the decisions and why? Who are the
algorithms serving, right? The thing to realize is that today already a lot of the decision is being made
for you by an algorithm, right.
It’s for example, if you want to you know take your pick right, buy a book, right. Or even you know let’s
say search results. The learning algorithm already throws out ninety-nine point nine nine percent of things. Then from the ten that you look at you get to pick, right. You’re ultimately in control, but only over what was already preselected by the algorithm.
You know the more the information society grows the more choices there are, right. This is a weird
thing. We have amazing choices today but we don’t have the time to make them. Which is why having a model is good, because what that model does is it’s going to make most of those choices for you. But it’s going to leave the ultimate control to you and it’s going to learn from what you say, right. I think this will be very useful.
At the same time you want to make sure that that model is owned by you, right. You know there’s a little bit of danger in the model being owned by someone who has a conflict of interest. Like, you know, Sergey Brin says that Google wants to be the third half of your brain.
[laughter]
Well, great but you know you don’t want the third half of your brain to always be trying to show you
ads, right. That would be a little worrisome, right. There are a lot of you know issues to consider here.
Yeah?
>>: Kind of looking at an early attempt and the current attempt. Look at [indiscernible] and also looking
at Watson...
>> Pedro Domingos: Right.
>>: What did each of those do poorly? What good things came out of those?
>> Pedro Domingos: Yeah, so [indiscernible] was the ultimate attempt by, so in AI there’s sort of like the knowledge engineering school, right: we’re going to build intelligent systems by writing in all the rules manually, right. Then that didn’t work because there are just too many rules. In fact, if you don’t learn it doesn’t matter whether you put in ten thousand or a hundred thousand, or a million rules. Your robot or your system will always very soon come up against something it doesn’t know.
That was a failure but it was an interesting failure because we learned from it. One thing that we
learned from it is that you’ve got to use Machine Learning. Watson uses Machine Learning a lot. But it uses, you know, sort of like this big hodgepodge of hundreds of different learning modules.
This is, I think, Watson well illustrates the problem with trying to build AI by just treating it as a pure engineering problem. You can engineer a system that will solve Jeopardy, but will that work well on other problems? Well, so far there’s no evidence that it really does. Yeah, so I think it’s because intelligence is complex that we need the simplest algorithms that we can get. I think both of these were useful but neither of them is really the answer.
>>: Have you seen Machine Learning being used in the legal field to make decisions? I mean from recommendations of…
>> Pedro Domingos: Yeah, let me give you, exactly this is one of those unexpected examples where we
take this highly qualified white collar work. It’s actually more easily automated than you know than
construction work, right.
One example of this is, you know, the Analogizers: actually one of their favorite applications is law, because of case-based reasoning, right. People in the law reason by cases, which is they look for a similar case. Then they say like, oh, the Supreme Court said this and this case is analogous, so how do we do that? This is one successful application.
There was, you know, a study to see whether you could predict the outcomes of various decisions. The winner was an algorithm of this type, much better than the human experts. Another example is eDiscovery, right. You know two companies have a lawsuit. They’re allowed to look through each other’s documents.
These days there’s like millions of emails, right. They used to basically employ junior lawyers to go through those piles of stuff. It turns out if you do that using some of these learning algorithms they actually do better than the human beings, right. I think partly it’s because the human beings are very bored and no one’s really paying attention to what they’re doing, but still. Yeah?
[laughter]
>>: Are you afraid that someone will hijack the system basically. You know I’m going [indiscernible] you
with [indiscernible] if you know how the opponent thinks [indiscernible], or [indiscernible] support the
system to give you the wrong advice.
>> Pedro Domingos: Yeah, so when you make [inaudible]. What if I have an enemy and that enemy’s in
control of the learning algorithm?
>>: Well either in control or he just knows what is in the learning algorithm, knows things about you.
>> Pedro Domingos: Yeah, actually I had this student of mine, Daniel [indiscernible], who in his internship at Microsoft Research worked on this problem. This is the problem of adversarial learning, right. I build a spam filter, right. I mean I remember when David Heckerman was like, oh, we’ve almost solved the spam problem, because their spam classifier was ninety-nine percent accurate.
But of course the spammers wouldn’t stay still. They figured out how to defeat the classifier. It turns out it’s not that hard. What you actually have to have is a learning algorithm that is a combination of Machine Learning and game theory.
We say like, if I deploy this classifier, what is the spammer going to do? Now let me do something that defeats what the spammer does. Like for example they would put hyphens in [indiscernible], which actually defeated the keyword search that the learning algorithm did. But if you now look for words with a hyphen in the middle, that actually is an even better way to detect spam. You need these kinds of algorithms. You know it is an arms race but it’s certainly an active area. Yeah.
>>: Do you see the Machine Learning algorithm being centralized, like, you know, living in the datacenter somewhere, or should it be distributed, like living in one’s tablet, phone, or?
>> Pedro Domingos: Yeah, that’s a very good question, right. Are they going to be centralized or
distributed?
>>: Yeah.
>> Pedro Domingos: I don’t think there’s going to be a unique answer. It’s going to depend on things
like how efficient the various parts of this are. How much you care about your privacy, right. I mean all
of these things these days, even your Smart Phone. What they do is, like, you speak into them and they send it to the Cloud. Then the Cloud is actually where the understanding is going on. Users don't necessarily care, right. Partly it's a question of just engineering efficiency. But partly it's a
question of for example how safe do you want to be, right? Ultimately if you don’t want for example
the government to subpoena your data from the Cloud then maybe it has to always be with you. In
which case you know you have the rights to it. I don’t think there’s going to be a unique solution.
I do think there’s going to be a place for a kind of company that is kind of like the repository of all your
data. The company that learns the model for all your data whether this is one of the current companies
or not is an interesting question. But I think you know I think we’re going to see things like that
happening, yeah.
>>: In terms of pairing machines and humans, has any one of the models been better than the others at things like augmented cognition?
>> Pedro Domingos: Yeah, I mean the reason these paradigms, you know, continue is that each one of them has successes that it points to, in terms of the things that it does better than the others. Often when you compare them with humans, right, there are certain things at which these algorithms are better than human beings, right.
You know a famous example these days is DeepMind, right. They can play a lot of Atari, you know, games better than any human being. At the same time you also see the things that they don't do very well. You know DeepMind plays Pong very well. But Pong is basically just making this, you know, paddle go up and down. It can't play Pac-Man very well. Pac-Man, you know, I've talked to some people there and they say, well, for that you start to need reasoning and planning, the kinds of things that the Symbolists are better at.
So the amazing thing is that you, the human being, cannot do these individual things as well as the algorithms. But overall we're still the best. It's the master algorithm that I think will be able to do things as well as, or better than, people, by combining these things and maybe some others that we haven't figured out yet.
>>: Is there a master scenario or a master test that will verify the master algorithm?
>> Pedro Domingos: Yeah, so this is definitely, you know, an important question. How will we know when we've found the master algorithm, right? I think the essential answer is that it has to be an algorithm that solves not just one problem but many different problems that it didn't see before.
These days when people do tests in Machine Learning it's like, oh, I want to learn to do object recognition. I will give you, you know, a database of images, train on some of them, then test on the others, right. But, you know, that's just vision, right. The real test is I will give you a set of tasks to learn on, you know, like for example vision and robotics, and whatnot.
Now I’m going to give you a task where the task itself is different from the ones that you saw before,
right. The traditional kind of learning wouldn’t have worked. You have to be able to do more than one.
It’s this idea that one algorithm has to be able to do many things at the same time.
If you think about it this is implicitly what the community already does with Machine Learning
algorithms, right. They test, they learn on these data sets for medical diagnosis or vision, or whatever.
Then they go test it on others and they get excited when it does well on those. In some sense I think
this is what we’re going to have to do. Yeah?
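Here is a minimal Python sketch of that evaluation idea, holding out an entire task rather than held-out examples from the same task; the three toy tasks and the placeholder nearest-neighbor learner are made up purely for illustration.

import random

random.seed(0)

def make_task(slope):
    # Each toy "task": does slope * x exceed 1?  A different slope per task.
    xs = [random.uniform(0, 2) for _ in range(40)]
    return [((x,), int(slope * x > 1.0)) for x in xs]

tasks = {"task_a": make_task(0.8), "task_b": make_task(1.0), "task_c": make_task(1.2)}

def train(examples):
    # Placeholder learner: 1-nearest-neighbor over whatever it was trained on.
    def predict(x):
        nearest = min(examples, key=lambda ex: abs(ex[0][0] - x[0]))
        return nearest[1]
    return predict

# Leave one whole task out: train on the other tasks, test on the unseen one.
for held_out in tasks:
    train_examples = [ex for name, exs in tasks.items() if name != held_out for ex in exs]
    model = train(train_examples)
    test = tasks[held_out]
    accuracy = sum(model(x) == y for x, y in test) / len(test)
    print(held_out, "accuracy:", round(accuracy, 2))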
>>: You said each of these schools of thought has had successes. Is there an example of a school of thought that failed, and why did it fail?
>> Pedro Domingos: That's a good question. I think the interesting thing about the history of Machine Learning is that all of these schools of thought failed at some point and then they came back. You know, maybe there are minor ones out there. I mean, just to be clear, there are other things that I talk about in the book that I didn't talk about here. Things like reinforcement learning, for example, and unsupervised learning, right. There are, you know, whole areas of Machine Learning that I haven't talked about.
But I think what is amazing is, like for example, take connectionism, right. In the fifties, when Frank Rosenblatt, you know, first started doing this, it was like, oh, we've almost figured out how the brain works. Then it died, right. Basically, the Symbolists killed it, right. You know [indiscernible] wrote this book saying, look, there are all these things that your neural networks can't do. For twenty years they were dead. Then in the eighties they came back when backprop was invented. Then they faded away again. Then the Bayesians were on the rise. Then the next decade, you know,
the Analogizers were on the rise. Now you know deep learning is on the rise again. Some of the more
optimistic of the deep learners are like, yeah, we've got it, right. We've solved vision. We've solved speech. Next we're going to do language and reasoning, and, you know, any day now, right.
I'm not kidding, right. My view is, if you look at the history, you know, just extrapolate in a naïve Machine Learning sense. Like, you know, there's going to be another five good years of deep learning and then some other school will have its turn, right. But I think eventually we have to converge to something that actually is not just one school or the other, but the combination of all of them. Yeah?
>>: Do you think that [indiscernible] solve a moral problem? Tell us what's good, what's bad…
>> Pedro Domingos: That's a great question, right. You know, the question of making moral decisions. In fact it's a very pertinent question these days because of the issue of, you know, intelligent robots in warfare, right. More and more we're seeing autonomous drones. Some countries actually have drones that can, you know, autonomously decide to fire. Some people say this is terrible and we should ban them. I think that's actually a very bad idea, because, you know, it could save a lot of lives to replace human soldiers by robots. Also robots, you know, they don't get angry, they don't get scared, they don't get vindictive. You know, robots may actually have better judgment.
But then there's this basic ethical judgment question. One thing that people like Ron Arkin are pursuing is that you can maybe program certain rules of morality into the robots. Like Asimov's Three Laws, right. But the problem with Asimov's Three Laws is that, you know, every one of his stories is about a failure of one of those laws.
[laughter]
Just those laws are not enough to figure out what to do in every circumstance. But now I think the Machine Learning answer to this question is, sure, you can program those basic things into the robot. But then the robot can learn to make moral decisions by observing what people do. The problem there is that I think if a robot does that it will be very confused. Because we have these moral principles but most of the time we don't follow them, right.
[laughter]
I think at the end of the day, you know, in this as in many other things, Machine Learning and AI are actually going to force us to confront these questions. You know, I think we in some ways are going to learn more and more about morality. Because we're going to have to teach the robots morality, and in that process I think we're going to have to figure some things out ourselves. I think the hard part of programming morality into robots is not the programming part, it's the morality part.
>>: What has led to the insider expectation that there would be one master algorithm? Why couldn’t it
be five or ten, or?
>> Pedro Domingos: No, so to be more precise, right, here's an analogy: think of Turing machines, right.
A universal Turing machine is a universal model of computation, right. There are dozens of other, even
hundreds of other universal models of computation that are Turing equivalent. It’s not that there’s only
one model, right. It’s that there are many equivalent ones. Likewise there will be many equivalent
versions of the master algorithm just like there are many equivalent versions of you know the laws of
physics, right. There's the Newtonian version, the [indiscernible] version, the Hamiltonian version, right.
>>: Just one standard model.
>> Pedro Domingos: Well, I mean we could even debate that, right. The standard model is a bit of a
patchwork, right. But the point is, right. We want to find the first one, right. We in induction, right, are
at the point where deduction was before Turing came up with the universal Turing machine, right.
In some sense it doesn't matter exactly what form it takes. You know, then there would be a lot of engineering of the variations and whatnot, right. Because again these things are probabilistic, so it's a little
bit different from deduction. But yes there could be more than one. But at some level at the end of the
day they have to be equivalent.
All of the schools of Machine Learning have these theorems that say my method can learn anything
given enough data. A neural network can learn anything if it’s large enough and you give it enough data,
same with a set of rules, same with nearest neighbor, etcetera, etcetera.
But the problem is, like, how well can they learn it, right? It's not just being able to learn it in principle. It's being able to learn it in practice. In practice different things may be better for different problems. But
to pick another analogy, think of the microprocessor, right. A microprocessor is not the best solution to any given computing problem. There's always the [indiscernible], right, the application-specific circuit that does better. Yet it's the microprocessor that we use for everything. The master algorithm, in a way, is for Machine Learning what the microprocessor is for computer architecture.
>>: To clarify there’s a difference between the master algorithm and strong AI, correct?
>> Pedro Domingos: Yeah, so this is another good question, right. The idea of strong AI is that we can
actually have a computer that is not just as intelligent as humans. But actually is indistinguishable from
humans, right.
The master algorithm hypothesis actually, in some ways, doesn't say anything about the strong AI hypothesis, right. The master algorithm is just a learning algorithm, right. It's not conscious. It's not an agent. It doesn't have a will of its own. Now, if you want to create strong AI, I definitely recommend that the shortest path to it is to try and get the master algorithm. You know, like Ray Kurzweil wants to invent strong AI by reverse engineering the human brain. I am willing to bet him that we will figure out the master algorithm long before we figure out how the brain works.
In fact you can't figure out how the brain works without Machine Learning, right. You know, there's this field of connectomics, where they take slices of the brain and literally do the circuit [indiscernible], literally reverse engineer the brain. But the amount of information is so large, right, that they are desperate. You know, like Sebastian Seung is always asking for, like, Machine Learning students who will be postdocs with him, to actually use Machine Learning to do the reverse engineering. Yeah, I think if you believe in strong AI this is a good path to it. But you don't have to believe in strong AI to think that the master algorithm is a good, you know, research direction to work on.
>>: Related to that, humans are very biased. We always think we're right, for example. Do you think the master algorithm will be more towards that way, or more statistics based?
>> Pedro Domingos: Sorry, humans are very what?
>>: They're very biased towards thinking they're always right, sort of overconfident normally.
>> Pedro Domingos: Yeah, human beings, exactly. There's a large body of psychology evidence, you know, that human beings tend to be overconfident, right. We human beings actually tend to overfit, in Machine Learning terms. I think the beauty of Machine Learning in some sense is that you have a knob that you can turn, right. Any good learning algorithm has this knob that allows it to have, you know, more bias or more variance, right, as the technical terms go.
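As a small illustration of that knob, here is a Python sketch using polynomial degree as the complexity control on made-up data; a low degree underfits (high bias) and a high degree overfits (high variance).

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 12):                      # the knob: model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    test_error = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print("degree", degree, "test error", round(float(test_error), 3))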
However, an interesting thing is why human beings are on the side of overfitting, right. I think the reason for that is probably that it's not the individual human being that's the unit. It's the group. It's the society, right. If I have a hundred people who all overfit, one of them won't actually be overfitting. That one will learn something. As a result the whole tribe, if you will, will learn something faster than if all of them were trying to be, you know, as accurate as possible.
Maybe there's something similar, you know, again in model [indiscernible]; we were just talking about boosting. There's a little bit of this. The individual models in [indiscernible] can be overfit, right. The decision trees for example can be wildly overfit. Then when you combine them you actually have a much better model, so maybe that's the reason, yeah. More questions?
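A minimal Python sketch of that point, assuming scikit-learn is available and using made-up data: each fully grown decision tree memorizes the noise, but an ensemble of fifty such trees, each trained on a bootstrap resample and averaged, usually does much better.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 200)
X_test = np.linspace(0, 1, 500).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test[:, 0])

# A single fully grown tree overfits the noisy training data.
tree = DecisionTreeRegressor().fit(X, y)
# Fifty equally overfit trees, each on a bootstrap resample, averaged together.
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50).fit(X, y)

print("single overfit tree, test error:", round(float(np.mean((tree.predict(X_test) - y_test) ** 2)), 3))
print("ensemble of 50 such trees:      ", round(float(np.mean((bag.predict(X_test) - y_test) ** 2)), 3))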
>>: Yeah, so in the [indiscernible] you showed that what goes into the master algorithm is output and data, subtly implying that it's supervised learning. But I thought the master algorithm [indiscernible] learning.
>> Pedro Domingos: No…
>>: Leave out that…
>> Pedro Domingos: No, I agree. So, you know, this is just a forty-five minute talk, right. That is the supervised version of things. In the unsupervised version just the data goes in and the algorithm figures things out, yeah, definitely.
>>: Do you think it might be the master algorithm…
>> Pedro Domingos: No, the master algorithm should be able to learn from any amount of supervision.
We already have things like Markov [indiscernible] that can learn from any amount of supervision,
completely supervised, completely unsupervised, or somewhere in between, okay.
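This is not Markov logic, but as a generic toy of the in-between regime, here is a tiny Python sketch of self-training: a couple of labeled points plus several unlabeled ones, with labels spreading to unlabeled points only when a labeled neighbor is close enough; the data and the 0.15 radius are made up for illustration.

labeled = {0.10: "low", 0.90: "high"}    # the supervised part (could be larger or empty)
unlabeled = [0.22, 0.35, 0.48, 0.77, 0.64]

for _ in range(5):                        # self-training rounds
    for x in list(unlabeled):
        nearest = min(labeled, key=lambda l: abs(l - x))
        if abs(nearest - x) <= 0.15:      # only adopt a label when we are "confident"
            labeled[x] = labeled[nearest]
            unlabeled.remove(x)

print(labeled)    # 0.22, 0.35, 0.48 end up low; 0.64, 0.77 end up high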
>> Amy Draves: We’ll take one more question.
>> Pedro Domingos: Yeah, one more question.
>>: Do you think [indiscernible], like, comes from [indiscernible] that is so unique that we can't simulate it in a computer with zeros and ones [indiscernible]?
>> Pedro Domingos: Yeah, so there are some people who say that, right. Like Roger Penrose says you
know we will never succeed because the brain is doing this mysterious quantum mechanics. Nobody
else believes that. I’m not even sure he believes that.
[laughter]
But in general this is a plausible worry. There are two versions of it. One is that there is some magic going on in the brain, right. If you're a scientist and a reductionist, that can't be the case, right. However, there's a more subtle version of this, which is that the brain is so complex that we will never figure it out, right. It's like the notion that if the brain were so simple that we could understand it, we would be so simple that we couldn't, right.
[laughter]
But the thing is that it's not one person trying to understand one brain. It's a whole scientific community of tens of thousands of people over decades trying to understand the brain. I think we will succeed in the end. Also because the master algorithm can be a lot simpler than the brain, right. The brain does a lot of things besides learning. It has a lot of evolutionary [indiscernible] which we don't need, you know, if all we want to do is have learning on a computer.
You know I think that we’re only going to find out by trying. Even if we don’t succeed I think we will
discover much better learning algorithms along the way. It’s definitely something worth doing. Thanks
everybody.
[applause]