>> Matt Richardson: Alright, good afternoon everybody. My name is Matt Richardson and I'm a Researcher here at MSR in the Machine Learning Group. I have the distinct honor of introducing Pedro Domingos to you all as part of the Microsoft Research Visiting Speaker Series. Pedro is here today to discuss his book The Master Algorithm. But before he starts I wanted to say a few words about Pedro. Pedro was my PhD Advisor at the University of Washington. He has a very impressive biography which you can all read about in the talk announcement. He's won numerous awards including the KDD Innovation Award, which is considered the highest honor in the data mining field. He has an amazing breadth of knowledge and always brings a valuable fresh perspective to any problem that he wants to tackle. I haven't read the book yet but I'm really looking forward to it. I've always admired Pedro's writing skills. I remember in grad school my office mate and I, who were both his students, were impressed that no matter how carefully we would write something, Pedro could always edit it to be more concise, clear, and illuminating. I expect Pedro will have brought the same writing style to his book. Without further ado, please help me in welcoming Pedro Domingos. [applause] >> Pedro Domingos: Alright, thanks everyone. Thanks Matt for the introduction. As you can see, some of my grad students even survived the ordeal and went on to do great things. I'm here today to talk about The Master Algorithm: how the quest for the ultimate learning machine will remake our world. What is this all about? It's about a change that is happening in the world today that is as big as the internet or the personal computer, or electricity were in their time. In fact it builds on all of them. It affects every part of society. It touches everybody's lives. It touches your life right now in ways that you're probably not aware of. Huge fortunes are being made because of it. Also, unfortunately, and sometimes unnecessarily, many jobs are being lost because of it. Children have been born that wouldn't be alive if not for it. It may save your life one day. This change is the rise of Machine Learning. That's what the book is about. What is Machine Learning? If you're not familiar with Machine Learning, this talk should be quite valuable. If you are familiar with Machine Learning, you'll hopefully get a different view of it than the one you had before. In one expression, Machine Learning is the automation of discovery: it's computers getting better with experience like we do, learning by themselves. It's a little bit like the scientific method, except it's being done by computers instead of scientists. As a result it's the scientific method on steroids. The learning algorithms formulate hypotheses. They test them against data. They refine the hypotheses. They repeat the cycle, a lot like scientists do, except millions of times faster. As a result, in any given period of time Machine Learning can accumulate millions of times more knowledge than human scientists ever could. Now most of this knowledge so far is not very deep. It's not like Newton's Laws or the Theory of Relativity. It tends to be more mundane knowledge. But mundane knowledge is what life is made of. What do you search for on the web? What do you buy when you go on Amazon? What are your tastes?
If you're a company, Machine Learning helps you understand your customers better. If you're an individual, Machine Learning helps you find books to read, movies to see, jobs, even dates. A third of all relationships that lead to marriage these days start on the internet, and it's Machine Learning algorithms that propose potential dates for you. There are children alive today that wouldn't have been born if not for Machine Learning. [laughter] Let me give you another example. The smart phone in your pocket right now is chock-full of learning algorithms. The learning algorithms let it understand what you say, let it correct your typing errors. They predict what you're going to do and then they help you, either in response to your commands or even on their own initiative. They use both their panoply of sensors and all the data going through them in order to do this. They use GPS to figure out what your daily habits are. They can even compare that with your calendar to figure out if you're the kind of person who tends to be tardy for meetings. They can even use the accelerometers that they have to figure out what your characteristic walk is. Something that I think is going to happen in the fairly near future is that if your smart phone figures out that you're about to have a heart attack, it will call nine one one on its own. It will warn you. Machine Learning may well save your life one of these days. Now with Machine Learning being so valuable, it's not a surprise that tech companies are all over it. For example, you may know that not long ago Google bought this company called DeepMind for over half a billion dollars. It had no customers and no products, just better learning algorithms. But if you're Google and you have a learning algorithm that lets you predict whether someone will click on an ad one percent better, that alone is worth fifty billion dollars or so every year, okay. Another example is IBM, which just a month ago bought a medical imaging company for a billion dollars. Not so much because of what they do, but because they want access to their library of images in order to train their learning algorithms to do things like diagnose breast cancer from x-rays, and diabetes, and whatnot. The kinds of things that today take very highly paid people to do. As a result, people with expertise in Machine Learning are very highly sought after, right. Peter Lee, the Director of Microsoft Research, actually said that the cost of acquiring a top deep learning expert, deep learning being of course the very hot area in Machine Learning, is comparable to the cost of acquiring a top NFL quarterback prospect. The geeks have finally won, yay. [laughter] On the other hand, Machine Learning also has a dark side. It is what's behind the increasing automation of white collar jobs. Some people say that the NSA uses it to spy on us. I don't know, because it's secret. [laughter] But yes, it's probably true at least to some degree. There's even a lot of speculation these days in the media, as you've probably seen, that Machine Learning is going to lead to Terminator, right, big bad AIs and robots taking over and whatnot. I think the take-home message from all of this is that Machine Learning could be your best friend.
But it could also be your worst enemy, depending on what you do with it, which is why I think we're now at the point where everybody needs to have a basic understanding of Machine Learning, not just computer scientists or Machine Learning researchers anymore. That doesn't mean that you need to understand the gory details of how Machine Learning works. It's a little bit like driving a car, right. You don't need to understand how the engine runs, but you need to understand what to do with the steering wheel and the pedals. I think that most people right now don't even know that Machine Learning algorithms have a steering wheel and pedals that they can control, okay. Alright, so how does this happen? How do computers learn things by themselves? This, to a lot of people, I think comes as a surprise, because they think of computers as these fairly dumb things that just do exactly what we tell them, again and again. Learning requires a lot of intelligence, a lot of creativity, and computers aren't supposed to have that, right. Picasso said that computers are useless because they can only give us answers, okay. Well, Machine Learning is what happens when computers start asking questions. In particular, the question that a computer, or a learning algorithm, asks is the following: here's an input and here's an output, how do I turn that input into that output? Here's an x-ray of a breast, and the output is there's a tumor here, or no, there's no tumor. This is the question that the learning algorithm is asking: how do I go from one to the other? If you give it enough examples, it often figures out how to do that better than highly paid experts can, people like doctors and so on, okay. Now, here's maybe one way to look at this. Traditionally, computers have to be programmed by us. We were the ones that defined the algorithms. We input the algorithm into the computer, then the data went in as the input, the algorithm did something to the data, and out comes the output, right. This is how most computers in the world work today. But Machine Learning turns this around. In Machine Learning, the output has become an input. What goes into the computer now is the data and the output. What the Machine Learning algorithm is doing is saying, huh, if this is what goes in and this is what comes out, then what is the algorithm that turns one into the other? Clearly, if you can answer this question it's very, very powerful, okay. Now, the amazing thing is that this one algorithm that's in the computer can produce very, very different algorithms depending on the data that goes in. The same algorithm can learn to play chess, or to do breast cancer diagnosis, or to do credit scoring, or to do product recommendations, or whatever. Really, the Holy Grail of Machine Learning, and some people even say the Holy Grail of Computer Science, is to figure out what is the most general algorithm that you can have that does this. In some sense every Machine Learning algorithm aspires to be such a master algorithm, one algorithm that is independent of the particular application.
But just by giving it data, that one master algorithm (it's a master algorithm because it makes other algorithms) can turn into something that is good for a huge variety of different things, okay. What I would like to do is just give you a flavor of how that happens. It's quite fascinating, right. I think one reason to learn about Machine Learning is that it's very important in your life. But another one is that it's fascinating, because learning algorithms come from all sorts of different areas, including neuroscience and evolution, and so forth. To simplify things a bit, there are five main schools of thought in Machine Learning. Each of these schools of thought has its origins in a different field of science. Each of these schools has its own master algorithm, a general purpose algorithm that in principle can be used to solve any problem, okay. To fast-forward a little bit, what I'm going to argue is that at the end of the day none of these people really have the master algorithm. They each have one part of it. The real master algorithm will come when we're able to combine them all into one algorithm, okay. What are those five schools? There are the Symbolists, who have their origins in logic and philosophy. Their master algorithm is inverse deduction: viewing induction as being the inverse operation of deduction. There are the Connectionists, these days the most famous ones. Their idea is to reverse engineer the brain. They're inspired by neuroscience. Their master algorithm is back propagation. The Evolutionaries, instead of being based on the brain, are based on evolution. They say, let's simulate evolution on the computer. They're inspired by evolutionary biology. Their master algorithm, or the most powerful algorithm they have, is something called genetic programming. The Bayesians are another very famous school of thought in Machine Learning. Their origins are in statistics. In essence their master algorithm is probabilistic inference, which is really how you apply Bayes' theorem, which is what the Bayesians take their name from. Finally the Analogizers; this is learning and reasoning by analogy. They actually have origins in many different fields, but perhaps the most important one is psychology, because there's an enormous amount of evidence that we human beings do a lot of reasoning by analogy. Again, they have several algorithms, but the most powerful, or at least most widely used, one is kernel machines, also known as support vector machines, which until the recent upsurge in connectionism was actually the dominant approach in Machine Learning, okay. Let's see a little bit more of what the basic idea of each of these schools is and what goes on there. Let's start with the Symbolists. Here are some of the most famous Symbolists in the world: Tom Mitchell at Carnegie Mellon, Steve Muggleton in the UK, Ross Quinlan in Australia. He was actually the first ever PhD in Computer Science from UW, so in some ways he's actually local. [laughter] He was actually on my PhD committee as well. Here's the basic idea in inverse deduction. This is the idea that we're going to solve learning in the same way that mathematicians solve things, by defining inverse operations. For example, addition gives us the answer to the question: if I add two and two, what do I get as a result?
Subtraction, the inverse of that, is the answer to the question: what do I need to add to two in order to get four? And the answer, of course, is two. The basic idea in inverse deduction is essentially the same, right. Deduction is going from the general to the specific. Induction is going from the specific to the general, so it's the opposite. Similarly to addition and subtraction, deduction gives you the answer to the question: if I know that Socrates is human and that humans are mortal, what follows? Well, it follows that Socrates is mortal, of course, okay. Now, what is induction? Induction is asking the question: what knowledge am I missing such that, if I know that Socrates is human, I can infer that he's mortal? Of course the answer is that what's missing is the rule that humans are mortal. Now, once I've acquired this general rule I can use it. I can apply it to other people. I can combine it in arbitrary ways with other rules to potentially form very complicated chains of reasoning. Now of course, I wrote this in English and computers don't understand natural language yet. In reality, on the computer this is represented in something like first order logic, but the idea is the same, okay? Here's an example of the power of inverse deduction. You see this picture here. There's a biologist here, but it's not this guy. It's not the guy in the lab coat; he's actually Ross King, a Computer Scientist and Machine Learning Researcher. The biologist is this machine here, the robot. That machine is actually a complete robot scientist in a box. It uses inverse deduction to come up with new knowledge of molecular biology, starting from knowing the basics, like the central dogma, DNA, proteins, and so on and so forth. Then it formulates hypotheses. It actually carries them out physically on its own, using microarrays and sequencers and things like that. Then it refines the hypotheses or rejects them and keeps going. This robot is called Eve. It's at the University of Manchester. Last year it discovered a new malaria drug. Now the interesting thing about this is that once you have one robot like that you can make a million. I've just doubled the number of biologists in the world. [laughter] Then you can make ten million, and then maybe you will cure cancer one of these days, much sooner than we would otherwise, okay. Now, the Connectionists think the Symbolists are mistaken. The idea in symbolic learning is that you can do it all at a certain level of abstraction without worrying about the substrate, in essence the way mathematicians and logicians prove things. But the Connectionists say, well, no, no, no, you're not going to get there that way. The best learning algorithm in the universe is the one inside your skull. It's your brain. Let's reverse engineer the brain and come up with a general purpose learning algorithm that way. The most famous of all Connectionists is Geoff Hinton. He started out as a psychologist; these days he's mainly a Computer Scientist. But his goal in life for forty years has been to figure out what is the algorithm by which the brain learns. He's pretty sure that it can be encapsulated in one algorithm. He's been doing this since the seventies. In fact he says that at one point he came home from work very excited, saying, yay, I did it. I figured out how the brain works. And his daughter said, oh, dad, not again.
[laughter] But these days his ideas are really starting to pay off. In particular, he was one of the co-inventors of back propagation, and back propagation these days is everywhere. Two other famous Connectionists are Yann LeCun and Yoshua Bengio. Yann is now the Director of AI Research at Facebook. How do we do this? Well, the way we're going to reverse engineer the brain is as follows. The brain is made of neurons. What we're going to do is build a mathematical model of a neuron, the simplest model we can that works essentially the same way a neuron does. Then we're going to connect those neurons into a big network. Then of course the problem is, how do you train that network? That's where back prop comes in. Neurons have dendrites, and impulses come in through the dendrites. If the sum of those impulses, multiplied by the strengths of the dendrites, exceeds a threshold, then the neuron fires what's called an action potential. You can think of a neuron as being a cell in the shape of a tree. The dendrites are the roots, the axon is the trunk. The discharge goes down the trunk to the branches and the leaves, and then it connects with other neurons at what are called synapses. The basic idea behind connectionism, and a hypothesis that we believe is true about the human brain, is that everything you know, everything you've learned in your life, is encoded in the strengths of those connections, okay. The whole question is, well, how do you learn those connections? Well, first of all, here's the neuron, but in math instead of in biology. What I have is a bunch of inputs. I multiply each input by a weight. I sum them up, so I'm really just doing a weighted average of my inputs, nothing very complicated. If the result is above a certain threshold then the output is one, meaning I've detected something, let's say, and otherwise it's zero, okay. Now, connect this into a big network with lots of layers. In fact the term deep learning comes from the fact that it's neural networks with many layers. But now we have this very difficult question, which is: okay, let's say I'm trying to learn to recognize cats, right. In comes a picture of a cat. I compute the values of the neurons. The output should be one, but the output is actually point two. Now, how do I fix the problem? This is what's known as the credit assignment problem in Machine Learning, or maybe more precisely the blame assignment problem. Because when the network is computing the right thing, nothing needs to happen. When it's making an error, some connection somewhere has to change. For a long time people didn't know how to do that. That's the problem that back propagation solves. In essence back propagation solves the problem by saying: well, I know there's an error of point eight here. You have a small weight, you have a large weight, so maybe you need to change more because you're more responsible for the error. Now, there's a corresponding error here and here. If I change you, how much will we improve things? Well, how much is each of the weights responsible for that? Maybe something needs to go up to make the neuron more likely to fire. Maybe something that is making a negative contribution needs to go down so that it's not preventing the neuron from firing, okay. In essence this is what back prop is doing.
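To make that a little more concrete, here is a minimal sketch (not from the talk, just an illustration) of a single artificial neuron of the kind described above: a weighted sum of inputs squashed to a value between zero and one, together with the simplest version of the error-driven weight update that back propagation generalizes to many layers. The AND task and all the numbers are made up for the example.

```python
import math, random

def sigmoid(z):
    # Smooth version of the threshold: output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, then squash.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Toy task: learn AND, i.e. output 1 only when both inputs are 1.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, lr = [random.uniform(-1, 1), random.uniform(-1, 1)], 0.0, 1.0

for _ in range(5000):
    for inputs, target in data:
        out = neuron(inputs, weights, bias)
        # Error signal: how much, and in which direction, each weight
        # contributed to the output being wrong.  Back propagation chains
        # this same idea backwards through many layers of neurons.
        delta = (out - target) * out * (1 - out)
        weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
        bias -= lr * delta

print([round(neuron(x, weights, bias)) for x, _ in data])  # expected: [0, 0, 0, 1]
```

The single-neuron update shown here is just the delta rule; in a real deep network the same error signal is propagated backwards through every layer to apportion the blame.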
As I mentioned, back prop these days, under the name of deep learning, is used for all sorts of things, in particular things to do with images: retrieving images, understanding videos, and also speech. The Skype simultaneous translation system, as you may know, at heart is using deep learning for this. But one very famous example, which was actually on page one of the New York Times, is the Google cat network, okay. This is a network that folks from Stanford and Google built. At the time it was the biggest neural network ever; it has on the order of a billion parameters, I think. They basically trained it by having it look at YouTube videos, okay. And on YouTube videos the single most frequent thing that they found was cats, right, [laughter] because people love to post videos of their cats. The network actually learns a lot of things besides cats, but cats, having the most data, were the thing it learned best. It's become known as the Google cat network. Okay, now the Evolutionaries say, well, sure, the brain, that's fine. You can fine tune the connections between your neurons. But how did the brain appear, right? The real master algorithm is not the brain, it's evolution, because evolution made not only the brain but all of life on earth as well. Now that's a powerful learning algorithm. Indeed, many biologists do think of evolution as pretty much an algorithm; we have a rough idea of how it works. Now, the first person to actually start pursuing this idea was John Holland, back in the sixties. He actually died recently. But then a bunch of other people followed. John Holland invented what are called genetic algorithms. John Koza went the next step by inventing genetic programming. Hod Lipson is one of the people today doing many interesting applications of genetic learning. What's the basic idea in genetic algorithms? It's quite simple. In the same way that the Connectionists have a computational implementation of the brain, here we're going to have a computational implementation of evolution. You want to solve a problem, right. Let's say you want to build a radio. You start out with a random pile of components. Literally, they start out with random piles of components, or rather with, say, a thousand random piles of components. Those are the individuals in your population. Then they go out into the world and you see how well they do. In the case of an animal it might be how well they survive and reproduce, and whatnot. In the case of something like a radio, it's how well they capture the signal that you want to capture, and so on. Then each organism, or program, or whatever, has a fitness value. The ones with the highest fitness get to produce the next generation. They get to cross over with some of the others to [indiscernible] sexual reproduction. You combine some genes from one with genes from the other. In the case of computers, the genes are usually just bit strings, because we don't need [indiscernible]; we can just do it with a bit string. There's also random mutation, and then you get a new population. Then you do the same thing again.
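As a toy sketch of the loop just described (the fitness function and all the parameters are made up purely for illustration), here is the whole select-crossover-mutate cycle in a few lines of Python:

```python
import random

def fitness(individual):
    # Made-up fitness: count the bits that are "on".  In a real application this
    # would score, say, how well a candidate circuit captures a radio signal.
    return sum(individual)

def crossover(mother, father):
    # Combine some "genes" from one parent with genes from the other.
    point = random.randrange(1, len(mother))
    return mother[:point] + father[point:]

def mutate(individual, rate=0.01):
    # Random mutation: occasionally flip a bit.
    return [1 - bit if random.random() < rate else bit for bit in individual]

def evolve(pop_size=100, length=50, generations=60):
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # The fittest half of the population gets to produce the next generation.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)

best = evolve()
print(fitness(best), "out of", len(best))   # typically at or near 50
```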
If you do this for enough generations, often amazing things start to happen. For example, people like John Koza have a lot of patents where the things patented were invented by the algorithm: new ways to build radios and amplifiers and low pass filters and things like that, which are different from the ones that human designers created but in many cases actually work a lot better. The next step in this, and the most powerful version of evolutionary learning, is what is called genetic programming. John Koza's idea was this: sure, nature does this with strings of DNA, but we don't have to do it with strings, right. A string is a very low level representation. It's very easy to muck things up by crossing over a string in the wrong place. You could have a perfectly good program and then you ruin it by crossing over in a bad place. His idea was: let us directly evolve programs. A program is really a tree of operations, subroutines, etcetera, all the way down to additions and subtractions, and ands and ors, and whatnot. The idea in genetic programming is that your individuals are actually program trees. Then what you do when you build the next generation is you pick a crossover point in the two parent trees, the mother program and the father program if you will, and then you switch the sub-trees. For example, if you had these two trees and you did the crossover at this point, one of the resulting trees would be the one with all the white nodes, which is actually one of Kepler's Laws. It's the law that gives the length of the year as a function of the average distance of a planet from the sun: the length is a constant times the square root of the distance cubed, okay. As I said, this type of approach has led to many interesting things, like new electronic circuits. But these days perhaps the most exciting, or perhaps the most scary, thing that the Evolutionaries are working on is actually evolving real, physical robots. They're not just doing this as a simulation on the computer anymore. This is a real robot from Hod Lipson's lab. They start out with these robots basically trying to crawl around and stand up, and then run faster, and things like that. Then in each generation of robots the fittest ones actually get to program a 3D printer to produce the next generation of robots, okay. [laughter] If Terminator happens, this might be the route by which we get there. Okay, so watch out for those little spiders. One of them might be a robot. [laughter] Okay, so now the Bayesians. The Bayesians have a very different view of things. They don't believe in being inspired by nature, whether it's evolution or the brain, or whatever. They think we should solve learning from first principles, okay. Bayesianism has a long history in statistics. Bayesians are known in Machine Learning as being the most fanatical of all the tribes. They have to be, because they were a persecuted minority in statistics for a long time. They had to become really determined. It's a good thing they did, because they certainly have a lot to contribute. These days, on the back of powerful computers, [indiscernible] Bayesianism is on the rise even within statistics.
Within Computer Science, probably the most famous Bayesian is Judea Pearl, who invented something called Bayesian networks and actually won the Turing Award, the Nobel Prize of Computer Science, a few years ago for that. He's a Professor at UCLA. Another famous Bayesian is Microsoft's David Heckerman. In fact Microsoft Research in its early days was a hotbed of Bayesian learning. It still is, but of course now it's much, much more varied. And perhaps the best known Bayesian these days in Machine Learning is Mike Jordan. What do the Bayesians do, right? What do they believe in? [laughter] Well, Bayesians believe in Bayes' theorem. If you have a learning algorithm that is incompatible with Bayes' theorem, you're wrong. [laughter] In fact Bayesians love this theorem so much that there was a Bayesian Machine Learning startup that had Bayes' theorem written in neon letters and put it outside their office. Right, so that's Bayes' theorem in neon, shining through the night. What is the big deal, right? This is a very small expression. It almost doesn't merit being called a theorem; the proof is extremely simple, but it's extremely important. Bayes' theorem is really just a way to say: well, I start out with a hypothesis, and I don't know how much I believe in it. The key problem that the Bayesians are dealing with is uncertainty. Any knowledge that you learn is always uncertain. What I have is really a combination of these things. There's my prior probability of a hypothesis: how much I believe in this hypothesis before I see any evidence. The hypothesis could be as simple as a binary decision, this person has AIDS versus this person doesn't have AIDS, or it could be a whole Bayesian network, or a whole neural network, or decision tree, or program, or what have you. You start out with your prior belief, which is how much you believe in your hypothesis before you see any evidence. Then there's this other part, the likelihood, which is how likely the evidence is if your hypothesis is true. If the hypothesis makes the evidence very likely, then in return the hypothesis also becomes more probable, because it made what you're seeing likely. When you combine the two by multiplying them, you get what's called the posterior probability, which is how much you believe your hypothesis after you've seen the evidence. Then you also need to normalize, using the marginal probability of the evidence, but let's not worry about that; it's just to make everything add up to one, okay. What you do in Bayesian learning is that you start out with a whole bunch of hypotheses and then the probabilities evolve. Hopefully some hypotheses become a lot more likely and some become a lot less likely. But it may never be the case that there is a single hypothesis that you should believe in, okay. The Bayesian view in some ways is very deep in terms of what it says about the world. It says, no, there isn't a single true hypothesis. There are just your prior beliefs, and then the evidence transforms them into posterior beliefs. Now Bayesian learning has been used for all sorts of things, but a very famous one, which I think almost everyone has benefited from, actually started here at Microsoft Research: spam filters. The first generation of spam filters, and today many of them still, are based on Bayesian learning.
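As a toy illustration of that prior-times-likelihood mechanic, here is the arithmetic for a single made-up piece of evidence about a single hypothesis; the numbers are invented purely for the example, not taken from the talk or from any real filter.

```python
# Hypothesis: "this email is spam".  All numbers below are made up.
prior = 0.5                # P(spam) before looking at the email at all
likelihood = 0.20          # P(the email contains "viagra" | spam)
likelihood_if_ham = 0.001  # P(the email contains "viagra" | not spam)

# Marginal probability of the evidence: the normalizer that makes beliefs sum to one.
evidence = likelihood * prior + likelihood_if_ham * (1 - prior)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = likelihood * prior / evidence
print(round(posterior, 3))   # about 0.995: near-certain spam after seeing the word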
In the spam filter, the hypothesis is: is this a spam email or is this a good email? The evidence is things like the words in the email. If the email contains the word Viagra, that makes it more likely to be spam. If it contains FREE in all capitals, more likely to be spam. If it contains four consecutive exclamation marks, that makes it more likely to be spam. [laughter] If it contains the words "your mom" or the words "your boss", then it probably isn't spam. Or you might want to file it as not spam just to be safe, okay. You don't want your mom to get mad at you. Okay, finally the Analogizers. The Analogizers are a looser tribe than the others. They're really a bunch of different sets of people who have this thing in common: they learn by analogy. When you're trying to solve a problem, you find similar problems in your experience and then you try to transfer the solutions from one to the other. One of the early pioneers in this was Peter Hart. There's this algorithm called nearest neighbor, which we'll see shortly, which despite its simplicity is surprisingly powerful. Vladimir Vapnik is the father of kernel machines, also known as support vector machines. Douglas Hofstadter is a cognitive scientist and the author of Gödel, Escher, Bach. He recently wrote a whole five hundred page book arguing that analogy is all there is to intelligence. That's five hundred pages explaining why everything we do is just analogical reasoning. He definitely believes that analogy is the master algorithm. What is the idea here? Well, let me illustrate it using a very simple puzzle. The puzzle is this: let's say I give you two countries. I'm calling them Posistan and Negaland. [laughter] I give you the map. I tell you where the main cities in each one are. Here's Positiville, the capital of Posistan, a bunch more cities in here, and some cities of Negaland. Then my question to you is: if I tell you where the cities are, where is the boundary between them? Where's the frontier, okay? Now you don't know, right. But a reasonable thing to do is to say: well, I'm going to assume that a point on the map is in Posistan if it's closer to some city in Posistan than it is to any city in Negaland, okay. For example, this line here is the set of points that are at the same distance from this city and this city. It's part of the boundary, okay. This is very, very simple; all you have to do is remember the data. You don't actually have to do anything whatsoever at learning time. And Peter Hart actually proved a theorem that says if you give this algorithm enough data it can learn anything. It can learn any function, right. In that basic sense of the term this is actually a master algorithm; it's completely general purpose. Of course the question is how much data do you really need to learn, and then how efficient is it? This algorithm is not ideal for a couple of reasons. One is that, notice, there are a lot of things that you really don't need to remember. If I threw away these cities, the frontier wouldn't change. All I should have to keep are what are called the support vectors. The support vectors are the examples that cause the frontier to be where it is, right. In fact support vector machines take their name from the support vectors, because they just figure out what those are.
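Before going on, here is the plain nearest-neighbor rule from the Posistan and Negaland puzzle as a tiny sketch; the city coordinates are invented, and only the labels and the distances matter.

```python
import math

# Hypothetical city locations on the map; Posistan to the west, Negaland to the east.
cities = {
    (1, 5): "Posistan", (2, 7): "Posistan", (3, 4): "Posistan",
    (7, 2): "Negaland", (8, 5): "Negaland", (6, 1): "Negaland",
}

def country_of(point):
    # A point belongs to whichever country has the closest city.  Just remembering
    # the cities implicitly draws the whole frontier; nothing happens at learning time.
    closest_city = min(cities, key=lambda city: math.dist(city, point))
    return cities[closest_city]

print(country_of((2, 6)))   # Posistan
print(country_of((7, 3)))   # Negaland
```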
The other thing that support vector machines do is produce a much smoother frontier. The nearest-neighbor one is a little bit jagged, and it's probably not the true frontier. Support vector machines throw away all the unnecessary examples and produce a smoother frontier. The way they do that is by doing what's called maximizing the margin. Imagine that I tasked you with walking from south to north while keeping all the cities in Posistan on your left and all the cities in Negaland on your right, but with one extra condition: you want to stay as far away from those cities as possible, okay. Imagine that the cities are landmines and you don't want to step anywhere close to them if you can help it. You want to maximize your margin of safety. That's exactly what support vector machines are doing, okay. These algorithms have been used for all sorts of things, but one very famous one is recommender systems. Again, these days all sorts of learning algorithms get used to recommend products to you. But the original one, and still one of the best, is in essence a variation of nearest neighbor. If I want to recommend movies to you, what I do is find people with similar tastes to yours. If you tended to give high stars when they gave high stars, and low stars when they gave low stars, then if there's another movie that they gave five stars to and you haven't seen, I'm going to hypothesize that you would like that movie as well, okay. I've seen it in multiple places, although I don't think anybody knows this officially, that a third of Amazon's business comes from its recommender system, and three quarters of Netflix's business comes from its recommender system. This thing is really at the heart of what these companies do. Alright, so let's take stock of what we've seen. We have the five tribes. Each of these tribes has a particular problem that it's solving that the others don't solve. This is an important problem for Machine Learning: if you want to have a universal learner, you have to solve that problem. The Symbolists learn rules that you can compose in arbitrary ways; the others don't know how to do that. Their solution to the problem, their master algorithm, is inverse deduction. Inverse deduction is how you learn these rules that you can compose. The Connectionists solve the credit assignment problem, using back prop. The Evolutionaries discover structure: before you can start fine tuning the connections in your brain, you have to figure out what the structure of the brain is, and evolution is what came up with that. Their solution, or most powerful solution, is genetic programming. The Bayesians solve the problem of dealing with uncertainty. They use probability, they use Bayes' theorem, and then they do inference to figure out what the probabilities are after the evidence. The Analogizers actually do something that none of the others can, which is to generalize from very few examples to things that are very different. Remember the Posistan and Negaland example: if all I knew was the location of the capitals of both countries, I could already form a reasonable approximation to the frontier. None of the other approaches can do this. And there are forms of analogy that are much more powerful than the ones that I showed you here, okay, that even allow you to generalize to completely different domains.
For instance, you can learn to solve things as a physicist and then get employed to predict the stock market, using the same skills that you acquired. Humans are able to do this, analogy is able to do this; the other approaches aren't. But here's the thing. As much as each of these tribes really believes in its approach, and has been very brilliant and very determined in making progress, at the end of the day, if we really want to have a master algorithm, we have to solve all five problems at once. We need a single algorithm that actually has all of these properties. The question is, what would that algorithm look like, right? In some sense what we're looking for here is a grand unified theory of Machine Learning, in the same way that the standard model is a grand unified theory of physics, or the central dogma is the grand unified theory of biology. What might that look like? Well, we don't have it yet, but we're actually making very good progress; in particular, in the last decade we've made amazing amounts of progress. Here's the first thing to notice. These algorithms look completely different, but actually they're not. All these learning algorithms, and many others that I haven't shown, are really all composed of three pieces. The first piece is representation: what is, essentially, the programming language in which the learning algorithm is going to write the programs that it discovers? Learning algorithms tend not to program in Java or C++, or anything like that. They program in things like first order logic, or it could be a neural network, okay. Now, the first thing that we need to do is to find a unified representation that is powerful enough to do this. Matt, in his PhD thesis, found one. It's called Markov logic networks. What it does is combine logic, which is what the Symbolists use, with the graphical models that the Bayesians use. In particular, it combines logic with Markov networks, which are a different, in some ways more powerful, type of graphical model than Bayesian networks. Really, a Markov logic network is just this: you have your formulas in first order logic, in which you can encode basically anything you want to encode, and then you give them weights. If you really believe a formula you give it a high weight. If you're not very sure you give it a low weight. Then a state of the world is very probable if a lot of the high weight formulas are satisfied in it. It turns out that this representation is a very nice generalization of almost everything that we use in Machine Learning. That's the representation part. The second part that all learning algorithms have is evaluation. Evaluation is deciding: if I give you a candidate program, how good is it? Usually we decide it's good because it fits the data well, meaning it makes accurate predictions, and also maybe it's simple, or has other desirable properties. One thing that we can use here is just the Bayesians' posterior probability; this is a popular option. But more generally, the evaluation, I would argue, should not be a part of the algorithm. It should be something that it takes from the user. You, the user, tell the master algorithm what you want it to do for you. If you're a company, you say: maximize my profits, or whatever your goal for that particular problem is. If you're an individual, you say: maximize my utility, or whatever you want.
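Going back to the representation piece for a moment, here is a toy sketch of the Markov logic idea with a single made-up weighted formula over two ground facts; the predicate names, the person, and the weight are invented purely for illustration.

```python
import math, itertools

# One weighted first-order-logic-style formula, already grounded for one person:
#   smokes(Anna) => cancer(Anna), with weight 1.5 (a made-up number).
formulas = [(1.5, lambda world: (not world["smokes"]) or world["cancer"])]

def unnormalized_probability(world):
    # A state of the world gets weight exp(sum of the weights of the formulas it
    # satisfies), so worlds satisfying more high-weight formulas are more probable.
    return math.exp(sum(weight for weight, holds in formulas if holds(world)))

worlds = [dict(zip(("smokes", "cancer"), values))
          for values in itertools.product([False, True], repeat=2)]
total = sum(unnormalized_probability(w) for w in worlds)

for w in worlds:
    print(w, round(unnormalized_probability(w) / total, 3))
# Only the world that violates the formula (smokes but no cancer) comes out much less probable.
```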
The algorithm then just goes and optimizes that evaluation for you. Finally there's optimization, which is really where most of the work goes in Machine Learning. Optimization is finding the algorithm, the program, that maximizes your score out of all the ones that the language allows. For example, in the case of Markov logic networks: what is the set of formulas and their weights that most faithfully represents what I've seen in the world without being overly complex, okay? Here there's a very natural combination of Evolutionary and Connectionist ideas. We can use genetic programming to discover the formulas: we have a population of formulas, we mix and match them, we mutate them, we have them remove things, take part of one and combine it with part of another, and so forth. So we can use genetic programming to discover the formulas, and then to learn their weights we can of course use back propagation, in pretty much the same way that it gets used in neural networks. And again, many standard neural networks, like Boltzmann machines for example, are direct special cases of this, okay. We've made good progress, but of course we're not there by any means. There's a lot that remains to be done. My feeling actually is that even if we succeed in completely unifying these five paradigms, and I think we're getting pretty close, I don't think at that point we will actually have solved the problem. I think there are some major ideas that we're going to need that haven't been discovered yet. In some ways the people who are already working in one of these schools, who are very deep experts in one of them, are not in the best position for that. You're actually better off if your thinking is not along those tracks and if you have more of a distance. We need your ideas. If you read the book and have a brilliant idea about what the master algorithm should be, please tell me so I can publish it. [laughter] Okay, let me conclude just by saying a little bit about what I think are the things that the master algorithm will enable that current learning algorithms cannot yet do. The reason these things need a master algorithm is that, again, all of those five problems are present in them. If only one were present, then you could use the appropriate paradigm. But in most of the really important, really hard problems, all five issues are present. One of them is home robots. Home robots are coming at some point. We would all love to have a robot that cooks dinner for us, does the dishes, makes the beds, and so on. But it's a very, very hard problem. Everybody in AI believes it can't be solved without Machine Learning. But what kind of Machine Learning will it take to build such a robot? I think it is something like the master algorithm that we're looking for. Here's another one. Everybody these days is trying to turn the web into a big knowledge base. There's an effort like that at Microsoft, there's one at Google, there's one at Facebook, there are several in academia. This is the idea that, well, the web is a mass of text and images, which is very hard for computers to deal with. If we can turn that into something like a knowledge base in first order logic, then instead of issuing keyword queries and getting some pages back, you can actually have a dialog with the computer,
where you ask questions and it reasons over its knowledge and gives you the answers, right. This would clearly be a great thing to have. But again, we can't solve it without Machine Learning, and we can't solve it without the kind of universal learner that we're talking about here. I think each of the paradigms that we have right now is not enough. People try to hack some things on top of the one that they start with, but we need a more fundamental solution. Here's another example, perhaps the most important one of all, which is curing cancer. Why is curing cancer so hard? It's because cancer is not a single disease. Every cancer is different. The same cancer mutates as it goes along. Somebody's cancer right now is different from the one it was six months ago. It's very unlikely that there's a single drug that will magically cure all cancers. What you actually need is something like a learning program that takes in the genome of the cancer, its mutations relative to the genome of the patient, which it also knows, and the patient's medical history, and from that predicts: what is the drug that will be best for that patient, or combination of drugs, or sequence of drugs? In a way it's not unlike a recommender system, except instead of recommending a book or a movie it's actually recommending a drug to treat your cancer. This is something that's beyond the ability of any human being to do, biologist or doctor or whatever, because there's just too much information. There's just too much knowledge that goes into understanding how a cell works. You don't just need to cure the cancer; you also need to not destroy the cells in the process, or harm them, right. Both the amount of information that you need to bring into this model, and the number of things you have to attend to at diagnosis time, even just the sheer number of drugs that you might be able to use, are beyond any human being. But I think with something like the master algorithm we will be able to do it, and there are many people working very energetically in that direction. Finally, let me mention this idea of three hundred and sixty degree recommenders. The kinds of recommendation systems that we have today recommend one type of thing, and they're based on the knowledge that one of these companies has of you. Amazon can recommend things based on your clicks on the Amazon site. Netflix can recommend things based on all you do on Netflix, same with Facebook and Twitter, and so on and so forth. But what you would really like to have is a model of you that is based on all the data that you've ever generated. In fact, in the limit, that model would be learned from your stream of consciousness: imagine what you're doing being recorded, video and audio, continuously. What you want to do is learn from that a very, very good model of you, one that knows you much better than any of these models that are based on a sliver of your data. Then that model helps you at every step of your life. It recommends things large and small to you, from books to read, to who to date, to where to go to college, and what house to buy, and whatnot, okay. Again, this involves all the problems that we talked about. I think current Machine Learning algorithms are certainly not up to that. But I think if we discover the master algorithm, we will be able to do this.
Then I think your personal model will become even more important to you than your smart phone. Living life without the help of your personal model will come to seem impossible. With that model you will live a happier and more productive life than we do today. Thank you. [applause] I'll take questions, yeah. >>: Where do these popular [indiscernible] fit in the five tribes? >> Pedro Domingos: Yes. >>: A sixth tribe, or what? >> Pedro Domingos: No, no, you asked a good question. Decision trees are very much a Symbolist algorithm. They come from the Symbolist school of Machine Learning; Ross Quinlan is a very good example of that, going all the way back to the sixties. Boosting is a simple way of combining algorithms. You could think of these ensemble methods as a step on the way; they're the simplest way you can combine these algorithms. They're very successful, but they're also somewhat shallow. It's not a deep combination. It works, but we should be able to do better. >>: Thank you, Pedro, for the talk. Very quickly, I wanted to get your thoughts on the [indiscernible] philosophical issue behind this: what's the ethics behind the universal learner? Obviously, when the universal learner provides suggestions, that's one thing. But one day the universal learner will make actual choices; that's another thing. What are your thoughts on that? >> Pedro Domingos: Well, obviously I didn't have time to go into that, but the implications of this for you and for society are very large, right. Who makes the decisions and why? Who are the algorithms serving? The thing to realize is that today, already, a lot of the decisions are being made for you by an algorithm. For example, if you want to, take your pick, buy a book, or even, let's say, look at search results: the learning algorithm already throws out ninety-nine point nine nine percent of things, and then from the ten that you look at, you get to pick. You're ultimately in control, but only over what was already preselected by the algorithm. The more the information society grows, the more choices there are. This is a weird thing: we have amazing choices today, but we don't have the time to make them. Which is why having a model is good, because what that model does is make most of those choices for you. But it's going to leave the ultimate control to you, and it's going to learn from what you say. I think this will be very useful. At the same time, you want to make sure that that model is owned by you. There's a little bit of danger in the model being owned by someone who has a conflict of interest. Sergey Brin says that Google wants to be the third half of your brain. [laughter] Well, great, but you don't want the third half of your brain to always be trying to show you ads, right. That would be a little worrisome. There are a lot of issues to consider here. Yeah? >>: Kind of looking at an early attempt and a current attempt, looking at Cyc and also looking at Watson... >> Pedro Domingos: Right. >>: What did each of those do poorly? What good things came out of them? >> Pedro Domingos: Yeah, so Cyc was the ultimate attempt of what in AI is sort of the knowledge engineering school, right: we're going to build intelligent systems by writing in all the rules manually.
Then that didn't work, because there are just too many rules. In fact, if you don't learn, it doesn't matter whether you put in ten thousand or a hundred thousand or a million rules; your robot or your system will always very soon come up against something it doesn't know. That was a failure, but it was an interesting failure, because we learned from it. One thing we learned from it is that you've got to use Machine Learning. Watson uses Machine Learning a lot, but it uses sort of this big hodgepodge of hundreds of different learning modules. Watson, I think, well illustrates the problem with trying to build AI by just treating it as a pure engineering problem. You can engineer a system that will solve Jeopardy, but will that work well on other problems? So far there's no evidence that it really does. So I think it's because intelligence is complex that we need the simplest algorithms that we can get. I think both of these were useful, but neither of them is really the answer. >>: Have you seen Machine Learning being used in the legal field to make decisions? I mean, from recommendations of… >> Pedro Domingos: Yeah, let me give you, exactly, this is one of those unexpected examples where we take this highly qualified white collar work and it's actually more easily automated than construction work, right. One example of this: the Analogizers, actually, one of their favorite applications is law, because of case based reasoning. People in the law reason by cases: they look for a similar case and then they say, oh, the Supreme Court said this, and this case is analogous, so how do we decide? This is one successful application. There have been things like a study to see whether you could predict the outcomes of various decisions; the winner was an algorithm of this type, which did much better than the human experts. Another example is eDiscovery. Two companies have a lawsuit and they're allowed to look through each other's documents. These days that's millions of emails. They used to basically employ junior lawyers to go through those piles of stuff. It turns out that if you do that using some of these learning algorithms, they actually do better than the human beings. I think partly it's because the human beings are very bored and no one's really paying attention to what they're doing, but still. Yeah? [laughter] >>: Are you afraid that someone will hijack the system, basically? You know, I'm going [indiscernible] you with [indiscernible] if you know how the opponent thinks [indiscernible], or [indiscernible] support the system to give you the wrong advice. >> Pedro Domingos: Yeah, so when you make [inaudible]. What if I have an enemy and that enemy's in control of the learning algorithm? >>: Well, either in control, or he just knows what is in the learning algorithm, knows things about you. >> Pedro Domingos: Yeah, actually I had this student of mine, Daniel [indiscernible], who in his internship at Microsoft Research worked on this problem. This is the problem of adversarial learning, right. I build a spam filter, right. I remember when David Heckerman was like, oh, we've almost solved the spam problem, because their spam classifier was ninety-nine percent accurate. But of course the spammers wouldn't stay still. They figured out how to defeat the classifier. It turns out it's not that hard.
What you actually have to have is a Machine Learning algorithm that is a combination of Machine Learning and game theory. You say: if I deploy this classifier, what is the spammer going to do? Now let me do something that defeats what the spammer does. For example, they would put hyphens in [indiscernible], which actually defeated the keyword matching that the learning algorithm did. But if you now look for words with a hyphen in the middle, that actually is an even better way to detect spam. You need these kinds of algorithms. It is an arms race, but it's certainly an active area. Yeah. >>: Do you see Machine Learning algorithms being centralized, you know, living in the datacenter somewhere, or should they be distributed, living on one's tablet or phone? >> Pedro Domingos: Yeah, that's a very good question. Are they going to be centralized or distributed? >>: Yeah. >> Pedro Domingos: I don't think there's going to be a unique answer. It's going to depend on things like how efficient the various parts of this are, and how much you care about your privacy. I mean, all of these things, these days, even your smart phone: what they do is you speak into them and they send it to the Cloud, and the Cloud is actually where the understanding is going on. Users don't necessarily care, right. Partly it's a question of just engineering efficiency, but partly it's a question of, for example, how safe do you want to be? Ultimately, if you don't want, for example, the government to subpoena your data from the Cloud, then maybe it has to always be with you, in which case you have the rights to it. I don't think there's going to be a unique solution. I do think there's going to be a place for a kind of company that is kind of like the repository of all your data, the company that learns the model from all your data. Whether this is one of the current companies or not is an interesting question, but I think we're going to see things like that happening, yeah. >>: In terms of pairing machines and humans, has any one of the models been better than the others at things like augmented cognition? >> Pedro Domingos: Yeah, I mean, the reason these paradigms continue is that each one of them has successes that it points to in terms of the things that it does better than the others. Often when you compare them with humans, there are certain things at which these algorithms are better than human beings. A famous example these days is DeepMind, right. They can play a lot of Atari games better than any human being. At the same time you also see the things that they don't do very well. DeepMind plays Pong very well, but Pong is basically just making this racket go up and down. It can't play Pac Man very well. Pac Man, you know, I've talked to some people there and they say, well, for that you start to need reasoning and planning, the kinds of things that the Symbolists are better at. So the amazing thing is that you, the human being, cannot do these individual things as well as the algorithms, but overall we're still the best. It's the master algorithm that I think will be able to do things as well as, or better than, people, by combining these things and maybe some others that we haven't figured out yet. >>: Is there a master scenario or a master test that will verify the master algorithm?
>> Pedro Domingos: Yeah, so this is definitely an important question. How will we know when we’ve found the master algorithm, right? I think the essential answer is that it has to be an algorithm that solves not just one problem but many different problems that it didn’t see before. These days when people do tests in Machine Learning it’s like, oh, I want to learn to do object recognition. I will give you a database of images, you train on some of them, then test on the others, right. But that’s just vision, right. The real test is: I will give you a set of tasks to learn on, for example vision and robotics and whatnot, and now I’m going to give you a task where the task itself is different from the ones that you saw before, right. The traditional kind of learning wouldn’t have worked. You have to be able to do more than one. It’s this idea that one algorithm has to be able to do many things at the same time. If you think about it, this is implicitly what the community already does with Machine Learning algorithms, right. They learn on these data sets for medical diagnosis or vision, or whatever, then they go test on others and get excited when it does well on those. In some sense I think this is what we’re going to have to do. Yeah? >>: You said each of these schools of thought has had successes. Is there an example of a school of thought that failed, and why did it fail? >> Pedro Domingos: That’s a good question. I think the interesting thing about the history of Machine Learning is that all of these schools of thought failed at some point and then they came back. You know, maybe there are minor ones that didn’t come back. I mean, just to be clear, there are other things I cover in the book that I didn’t talk about here, things like reinforcement learning for example and supervised learning, right. There are whole areas of Machine Learning that I haven’t talked about. But I think what is amazing is, for example, take connectionism, right. In the fifties, when Frank Rosenblatt first started doing this, it was like, oh, we’ve almost figured out how the brain works. Then it died, right. Basically, the Symbolists killed it, right: [indiscernible] wrote this book saying, look, there are all these things that your neural networks can’t do. For twenty years they were dead. Then in the eighties they came back when backprop was invented. Then they faded away again. Then the Bayesians were on the rise; then the next decade the Analogizers were on the rise. Now deep learning is on the rise again. Some of the more optimistic deep learners are like, yeah, we’ve got it, right. We’ve solved vision. We’ve solved speech. Next we’re going to do language and reasoning, any day now, right. I’m not kidding, right. My view is, if you look at the history and just extrapolate in a naïve Machine Learning sense, there are going to be another five good years of deep learning and then some other school will have its turn, right. But I think eventually we have to converge to something that is not just one school or the other, but the combination of all of them. Yeah? >>: Do you think [indiscernible] can solve moral problems, tell us what’s good, what’s bad… >> Pedro Domingos: That’s a great question, right, the question of making moral decisions.
In fact it’s a very pertinent question these days because of the issue of intelligent robots in warfare, right. More and more we’re seeing autonomous drones. Some countries actually have drones that can autonomously decide to fire. Some people say this is terrible and we should ban them. I think that’s actually a very bad idea, because it could save a lot of lives to replace human soldiers with robots. Also, robots don’t get angry, they don’t get scared, they don’t get vindictive. Robots may actually have better judgment. But then there’s this basic ethical judgment question. One thing that people like Ron Arkin are pursuing is that you can maybe program certain rules of morality into the robots, like Asimov’s Three Laws, right. But the problem with Asimov’s Three Laws is that every one of his stories is about a failure of one of those laws. [laughter] Those laws alone are not enough to figure out what to do in every circumstance. Now, I think the Machine Learning answer to this question is: sure, you can program those basic things into the robot, but then the robot can learn to make moral decisions by observing what people do. The problem there is that I think if a robot does that it will be very confused, because we have these moral principles but most of the time we don’t follow them, right. [laughter] I think at the end of the day, in this as in many other things, Machine Learning and AI are actually going to force us to confront these questions. I think in some ways we’re going to learn more and more about morality, because we’re going to have to teach the robots morality, and in that process we’re going to have to figure some things out ourselves. I think the hard part of programming morality into robots is not the programming part, it’s the morality part. >>: What has led to the expectation that there would be one master algorithm? Why couldn’t it be five or ten, or…? >> Pedro Domingos: No, so to be more precise, right, here’s an analogy: think of Turing machines, right. A universal Turing machine is a universal model of computation, right. There are dozens, even hundreds, of other universal models of computation that are Turing equivalent. It’s not that there’s only one model, right; it’s that there are many equivalent ones. Likewise there will be many equivalent versions of the master algorithm, just like there are many equivalent versions of the laws of physics, right. There’s the Newtonian version, the [indiscernible] version, the Hamiltonian version, right. >>: Just one standard model. >> Pedro Domingos: Well, I mean, we could even debate that, right. The standard model is a bit of a patchwork, right. But the point is, we want to find the first one, right. In induction we are at the point where deduction was before Turing came up with the universal Turing machine, right. In some sense it doesn’t matter exactly what form it takes; then there will be a lot of engineering of the variations and whatnot, right. Because again these things are probabilistic, so it’s a little bit different from deduction. But yes, there could be more than one. At some level, at the end of the day, they have to be equivalent. All of the schools of Machine Learning have these theorems that say: my method can learn anything given enough data.
A neural network can learn anything if it’s large enough and you give it enough data; same with a set of rules, same with nearest neighbor, etcetera, etcetera. But the problem is how well can they learn it, right? It’s not just being able to learn it in principle; it’s being able to learn it in practice. In practice different things may be better for different problems. But to pick another analogy, think of the microprocessor, right. A microprocessor is not the best solution to any computing problem. There’s always the [indiscernible], right, the application-specific circuit, that does better. Yet it’s the microprocessor that we use for everything. The master algorithm, in a way, is for Machine Learning what the microprocessor is for computer architecture. >>: To clarify, there’s a difference between the master algorithm and strong AI, correct? >> Pedro Domingos: Yeah, so this is another good question, right. The idea of strong AI is that we can actually have a computer that is not just as intelligent as humans but actually indistinguishable from humans, right. The master algorithm hypothesis in some ways doesn’t say anything about the strong AI hypothesis, right. The master algorithm is just a learning algorithm, right. It’s not conscious. It’s not an agent. It doesn’t have a will of its own. Now, if you want to create strong AI, I definitely recommend that the shortest path to it is to try and get the master algorithm. You know, Ray Kurzweil wants to invent strong AI by reverse engineering the human brain. I am willing to bet him that we will figure out the master algorithm long before we figure out how the brain works. In fact, you can’t figure out how the brain works without Machine Learning, right. There’s this field of connectomics where they take slices of the brain and literally do the circuit [indiscernible], literally reverse engineer the brain. But the amount of information is so large, right, that they are desperate; Sebastian Seung is always asking for Machine Learning students to be postdocs with him, to actually use Machine Learning to do the reverse engineering. Yeah, I think if you believe in strong AI this is a good path to it. But you don’t have to believe in strong AI to think that the master algorithm is a good research direction to work on. >>: Related to that, humans are very biased. We always think we’re right, for example. Do you think the master algorithm will be more towards that way or more statistics based? >> Pedro Domingos: Sorry, humans are very what? >>: They’re very biased towards thinking they’re always right, sort of overconfident normally. >> Pedro Domingos: Yeah, exactly. There’s a large body of psychological evidence that human beings tend to be overconfident, right. We human beings actually tend to overfit, in Machine Learning terms. I think the beauty of Machine Learning in some sense is that you have a knob that you can turn, right. Any good learning algorithm has this knob that allows it to have more bias or more variance, right, as the technical terms go. However, an interesting question is why human beings are on the side of overfitting, right. I think the reason is probably that it’s not the individual human being that’s the unit; it’s the group, it’s the society, right. If I have a hundred people who all overfit, one of them won’t actually be overfitting. That one will learn something.
As a result the whole tribe, if you will, will learn something faster than if all of them were trying to be as accurate as possible. Maybe there’s something similar in model [indiscernible]; we were just talking about boosting. There’s a little bit of this: the individual models in [indiscernible] can be overfit, right. The decision trees, for example, can be wildly overfit. Then when you combine them you actually have a much better model, so maybe that’s the reason, yeah. More questions? >>: Yeah, so in the [indiscernible] you showed, what goes into the master algorithm is output and data, subtly implying that it’s supervised learning. But I thought the master algorithm [indiscernible] learning. >> Pedro Domingos: No… >>: Leave out that… >> Pedro Domingos: No, I agree. This is just a forty-five minute talk, right; that was the supervised version of things. In the unsupervised version just the data goes in and the algorithm figures things out, yeah, definitely. >>: Do you think it might be the master algorithm… >> Pedro Domingos: No, the master algorithm should be able to learn from any amount of supervision. We already have things like Markov [indiscernible] that can learn from any amount of supervision: completely supervised, completely unsupervised, or somewhere in between, okay. >> Amy Draves: We’ll take one more question. >> Pedro Domingos: Yeah, one more question. >>: Do you think [indiscernible] comes from [indiscernible] that is so unique that we can’t simulate it in a computer with zeros and ones [indiscernible]? >> Pedro Domingos: Yeah, so there are some people who say that, right. Roger Penrose says we will never succeed because the brain is doing this mysterious quantum mechanics. Nobody else believes that. I’m not even sure he believes that. [laughter] But in general this is a plausible concern, and here are two versions of it. One is that there is some magic going on in the brain, right. If you’re a scientist and a reductionist, that can’t be the case, right. However, there’s a more subtle version, which is that the brain is so complex that we will never figure it out, right. It’s like the notion that if our brains were simple enough for us to understand, we would be too simple to understand them, right. [laughter] But the thing is, it’s not one person trying to understand one brain; it’s a whole scientific community of tens of thousands of people over decades trying to understand the brain. I think we will succeed in the end, also because the master algorithm can be a lot simpler than the brain, right. The brain does a lot of things besides learning, and it has a lot of evolutionary [indiscernible] which we don’t need if all we want to do is have learning on a computer. I think we’re only going to find out by trying. Even if we don’t succeed, I think we will discover much better learning algorithms along the way. It’s definitely something worth doing. Thanks everybody. [applause]
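(As an illustration of the bias/variance “knob” and the ensemble effect discussed in the last answers, here is a minimal sketch, assuming scikit-learn is available; the dataset is synthetic and the numbers are only indicative, not from the talk. Tree depth acts as the knob, a fully grown tree is allowed to overfit, and a bagged “tribe” of one hundred overfit trees typically generalizes better than any single one.)

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import BaggingClassifier

    # A noisy synthetic problem on which an unpruned tree will memorize the training data.
    X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                               flip_y=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The bias/variance "knob": a shallow tree has more bias, a fully grown tree more variance.
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

    # The "tribe": bagging builds 100 fully grown (overfit) trees on bootstrap samples
    # and lets them vote. (BaggingClassifier's default base estimator is a decision tree.)
    tribe = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

    for name, model in [("shallow tree", shallow), ("deep tree", deep), ("bagged trees", tribe)]:
        print(name, "train:", round(model.score(X_train, y_train), 3),
              "test:", round(model.score(X_test, y_test), 3))
    # Typically the deep tree scores near 1.0 on training data yet trails on test data,
    # while the ensemble of wildly overfit trees generalizes best.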