>> Eric Horvitz: We have Rodrigo de Salvo Braz here from UC Berkeley, who's
doing a post-doc with Stuart Russell's team. I'm very excited about this topic
area. For quite a few years the uncertainty in AI community has focused on
largely propositional representations -- Bayesian networks and all this rich
diagnostic reasoning and predictive modeling typically assume you have a fixed set
of propositions. And one of the directions that's kind of exciting, in opening up
our systems to have more of an open world intelligence, is to mesh first order
inference with the probabilistic calculus, and there's been kind
of an explosion of representations -- I don't call it an explosion, I call it a mini
firework of representations -- that combine first order logic with probability, with
probabilistic inference.
Rodrigo did his master's and bachelor's degrees at the Universidade de São
Paulo, and after that he got fascinated by intelligence and went off to Brown
University for a couple of years in the cognitive science department, doing work in
neural nets and in descriptive models of various kinds. And as we discussed last
night at dinner, he got a little frustrated with that work and went off to pursue his
Ph.D. more centrally in inferential methodologies, core AI, and this was at the
University of Illinois at Urbana-Champaign, working with Eyal Amir and Dan Roth
and others there.
He's been a post-doc in computer science at UC Berkeley and just recently
informed me that he's probably going to be taking a position at SRI pretty soon to
continue this work as a full-time researcher there. So, Rodrigo on first
order probabilistic inference.
>> Rodrigo de Salvo Braz: Thanks for the introduction and for having me over.
Very excited to be here. I have always wanted to visit Microsoft Research. It's
nice to finally make it.
Yeah, so today I'm going to talk about my work, my Ph.D. and also the post-doc.
It's actually two different parts of the same subject. In my Ph.D. I worked on what
we call lifted first order probabilistic inference, meaning inference that's
performed not only on a first order representation, but where the inference itself is also
carried out at the first order level, as opposed to having a first order representation
generate the propositions and then doing the inference at the propositional
level. So that was my Ph.D. thesis.
Then I went on to work with Stuart Russell. He was also working on first
order probabilistic inference, but from quite a different point of view. He was working
with a language that was more expressive, more complex; he was working with an
open universe assumption, but his inference was still propositional -- he was
working with sampling -- whereas I was working with simpler languages and more
sophisticated inference, because it was first order. But because of that I
couldn't deal with as much expressivity as he did. I didn't have the open universe
assumption.
So it was a nice complement to work with him on that, and that's what I have been
working on for the past year, in his language that he calls BLOG, for Bayesian logic.
So I'm going to talk about those two topics, and hopefully one day they will
integrate nicely and we're going to have all these things in one --
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yes. So that's it. So just ask any questions --
fireworks, if you like. Because sometimes I find that some of this
topic may look ambiguous to some people. It's good to clarify things right away.
If you want to follow along -- I'm surprised people don't have laptops here. Usually
everybody's like with their laptops [inaudible].
>>: [inaudible].
>> Rodrigo de Salvo Braz: Amazing. There you go. So for those who have
laptops, they can follow the slides from my Web page. All right. So these are the
parts of the talk. First I'm going to talk about first order probabilistic inference in
general -- the goal that it represents, what kind of things we want to achieve in
the future with it. Then I'm going to talk about my Ph.D. thesis, lifted inference,
then DBLOG, which is an extension to BLOG that I have been working on, and
the conclusions.
So the thing I'm most interested in is to abstract inference and learning from
different AI problems, right, we are all working on different problems in AI and
even though we share so many techniques, so many ideas about inference and
about probabilities and so on, still often it is the case that we have separate
solutions that are tailored to our problems. And of course that makes it harder to
reuse those solutions to new problems that come along and it's also harder to
integrate problems.
If you're working on one problem and then on another one, integrating those two
solutions may be quite a piece of work, and we have to rethink our models
because of that. So it would be nice to abstract all these problems and
solutions by separating the knowledge from the inference and learning
mechanisms that we are using. Once we do that, hopefully we can do this: we
can have just one box that does the inference and learning for us, so we can just
put in the knowledge about specific problems and use that. That's a vision that
AI has had since the beginning of AI, I suppose. But it's still something that's not
quite realized.
So if we want this magic box that does inference and learning, what should it do for
us? One thing that everybody seems to find very useful, especially if you're
describing more complex problems, is to have predicates or relations and
quantification over objects. This is something that's very powerful. You can
describe many different problems using this.
So at first people did that with logic, but then we also want other things. For
example, we'd like to have knowledge that has varying degrees of uncertainty,
because often that's the case in problems: we have knowledge, but it's
not certain knowledge.
Here I'm not committing to any specific uncertainty representation, so I'm just using
these brackets, these red brackets around my sentence, to say this is uncertain in
some way that I may commit to some form of later. But just to say, you know, it
may be that I have a tank; usually that means the color of the tank is green, but not
necessarily.
So those two things are pretty important for different problems, different
domains like language and vision and many others. And eventually, even though
that's not something I'm concerning myself with right now, you also want this box
to be able to use other things like modal knowledge and utilities and things like
that. But for now we're concentrating on the predicates and probability -- not
probabilities right now, uncertainty.
So I was talking about integrating problems, right? So if I have domain
knowledge about some common sense reasoning task -- I'm thinking about a
military situation with tanks and planes and the colors of objects and so on -- I
have that knowledge there. And I also may have, even related to the same
domain, something involving natural language processing. I also have knowledge
there about how verbs interact with meaning, what it means to
use the verb attack, and so on.
And if we share this inference and learning module, then it should make it easier
for us to actually integrate those things. And what's the point of integrating them?
The point is that you have a lot of synergy, a lot of feedback between those two
pieces of knowledge going on. You can resolve language
ambiguities by using common reasoning, common sense reasoning facts that
you are also observing. So bringing everything together may help you with all of
the problems at once. You can use the language for the common sense and vice
versa.
I think that when humans solve problems they use that a lot, and I think that's a
major direction for AI to integrate things as much as possible. And that's why I'm
so interested in, you know, a very general language that can do that.
So I was talking about predicates and things like that, and uncertainty. Logic has
those objects and properties -- it's a very rich language -- but in standard logics
the statements are usually absolute, so that doesn't do what we're trying to do here.
And graphical models, which are a machine learning technique, involve lots of
uncertainty, but then usually, as Eric was saying, you assume
a fixed set of propositions, and if you have objects that parameterize different
propositions, it's not so easy to use that with those models; you usually have to
have a separate thing, a separate module that generates new instances of
problems for you, and it's a separate, different stage, and usually that's even
[inaudible] solution, so that's not so convenient.
So just to give you a flavor of what you may do with graphical models if you
need to use objects and knowledge that is quantified: you have that piece of
knowledge there, so you're saying that an object is either a tank or a plane, and if
an object is a tank usually its color is green, and so on, and if an object is next
to a tank then probably it's also a tank. So if you want to use that
kind of thinking and you have the evidence that the color of a certain object is
red and the other is green, you can create a graphical model that represents that.
It's important to notice that even though I'm naming these variables with this not
so usual notation parameterizing with parentheses here, from the point of view of
a graphical model inference algorithm, those are just strings. They don't know
the structure that's involved in this problem. They don't know that this is the
same type of variable as that and that the same type of knowledge applies to
both of them. So the algorithm cannot take advantage of that structure.
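Just to make that concrete, here is a rough sketch of what propositionalization amounts to. The two-object domain, the predicate names, and the numbers are my own illustration, not anything from the actual systems being discussed:

```python
# Hypothetical two-object domain; the predicate names and potentials are made up.
objects = ["o1", "o2"]

# Ground the parameterized random variables into plain propositional names.
variables = [f"tank({o})" for o in objects] + [f"color({o})" for o in objects]

# Ground the parameterized factor "if X is a tank, its color is usually green"
# into one concrete factor per object.
def color_given_tank(is_tank, color):
    if is_tank:
        return 0.9 if color == "green" else 0.1
    return 0.5

factors = [((f"tank({o})", f"color({o})"), color_given_tank) for o in objects]

# To a generic graphical-model algorithm these are now just strings: the fact that
# tank(o1) and tank(o2) are instances of the same template has been lost.
print(variables, len(factors))
```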
Also, you have to do this transformation, right, and that's not something that the
framework of graphical models necessarily brings to you. That's something that
you have to come up with a solution for yourself. And as I said, the algorithm
doesn't know that this is talking about objects; for the algorithm
this is just a bunch of random variables.
And another interesting thing is that if I have different evidence, it's still the
same model, still the same problem, the same knowledge. I would like to
have one model that can be used with different evidence.
But here, every time I get new evidence, I get a different graphical model, so
formally, technically speaking, I get a new model, a different model that formally is
unrelated to the first one, and it would be nice to be able to formally establish
how they relate to each other and that they are in fact representing the same
knowledge and the same things. So we also lose that kind of thing if we do
something like this.
So what I just showed you is a technique called propositionalization, which is just
generating the random variables from first order representations. So a lot of
people worked on something like that, right; they came up with languages that were
very expressive, and when they actually had to do inference
usually they would create a graphical model, with the disadvantages that I mentioned
before, right?
And the languages have all different sorts of flavors. Some of them look like
graphical models, some of them look like logic, some of them look like
databases, some like description logics. Actually I think the field took quite a
while to understand that these things were doing essentially the same thing. It
wasn't so clear at first. It wasn't so clear what semantics a probabilistic
relational model should have. So that's why I think we got so many different
models. And eventually people started realizing we're really just talking about
predicates, and each ground atom is going to be a random variable, and so
on.
You also have, from Heckerman and Meek and Koller at MSR, one of the
systems in this line. So what people use is
what I would call smart propositionalization, because they don't generate
all the propositions, they don't just dump a set of propositions and a general
graphical model; they try to do something smarter than that. They only instantiate
those variables that are relevant to the evidence and the query at hand. And
so you can actually create something that's a lot smaller than you would get just by
propositionalizing. So that's usually the flavor of what people do.
Just to give you an idea about this type of model, one of them, which I find pretty
typical, is Bayesian logic programs, by Kersting and De Raedt. So they
designed the language to look like Prolog, but instead of using the usual Prolog
implication symbol they used the conditional probability symbol, because these
are like Prolog rules but they actually have associated CPTs, and also
combination rules: in case you have two different rules that are parents to the
same variable, you need to somehow combine those different CPTs into one.
So if you are given a bunch of rules like that, then you have something that will
generate graphical models for you. So that's more or less how it works. If you
have to decide whether a battalion needs rescuing, some kind of situation, and
that depends on whether there is a soldier in that battalion that's wounded. So if
you have a wounded soldier, you need a rescue operation there. And the way it
will work is by applying prolog inference and building the proof tree and using
that to build a graphical model like that.
So by doing that, you actually concentrate on the random variables that are
actually relevant to your problem, you don't propositionalize everything, but still
you go and instantiate in the case of a battalion, you're going to instantiate
variables for all the soldiers that you have in that battalion.
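A minimal sketch of that query-directed grounding idea, assuming a hypothetical membership table and hard-coded rules of my own (this is not the actual BLP machinery), just to show that the soldiers still get instantiated one by one:

```python
# Hypothetical battalion membership; in a real system this would come from the KB.
soldiers = {"b1": ["s1", "s2", "s3"]}

def parents_of(var):
    # needsRescue(B) depends on wounded(S) for every soldier S in battalion B
    if var.startswith("needsRescue"):
        b = var[var.index("(") + 1:-1]
        return [f"wounded({s})" for s in soldiers[b]]
    # wounded(S) depends on attacked(battalionOf(S)); hard-coded to b1 here
    if var.startswith("wounded"):
        return ["attacked(b1)"]
    return []

def relevant_network(query):
    # Backward-chain from the query, instantiating only reachable ground variables.
    nodes, frontier = set(), [query]
    while frontier:
        v = frontier.pop()
        if v not in nodes:
            nodes.add(v)
            frontier.extend(parents_of(v))
    return nodes

print(relevant_network("needsRescue(b1)"))   # still one ground node per soldier
```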
This generation of the graphical model happens even before you actually start doing
the probabilistic inference, so say you determine that a soldier is wounded; that
doesn't help you prevent the instantiation of other variables, because you
could in principle think of that, right? I already know this guy is wounded, I
already know I need a rescue, but because we're generating the graphical model
before even considering things like that, it's going to generate the graphical
model first and then it's going to do the probabilistic inference.
On the other end of the spectrum, as something that is a first order model that
looks like graphical models, there is work done by Laskey called Multi-Entity Bayesian
Networks. It's essentially a bunch of templates. Each of them looks like a
fragment of a Bayesian network, but the random variables have these
parameters, and in this case the model does know that those are
parameterized random variables, and uses them to instantiate new fragments.
So if you give evidence or give a certain situation, it uses those objects to plug
into the -- these variables here, then you have a bunch of fragments that have to
be combined into a new -- into a regular Bayesian network, and that's -- and then
it does the inference. So you just instantiate the fragments and you put
everything together.
So here, if you have two fragments on the same variable, you somehow combine them.
Not surprisingly, you get something very similar to what the Bayesian logic
program gives; you get a similar graphical model here, even though the representation
looks pretty different.
Okay. So that's the kind of work people have been doing until pretty recently. In my
Ph.D. thesis I did this lifted inference approach, which I'm going to explain
now.
Any questions so far? Just to clear that up? All right. So lifted inference.
Suppose I have that same query: I want to know if a certain battalion needs
rescue, right? Instead of instantiating random variables for each of the soldiers,
like we were doing before with those other approaches, I could do something that
looks more like logic inference, in which I determine the kind of things that will
be relevant to my query. But instead of instantiating everything,
I leave them uninstantiated; I just say, well, I just need that class of
random variables to answer this query.
What I'm showing is not a formal explanation, I'm just giving the idea of how you
can go about solving this problem without having to actually instantiate everything. It's
the way that maybe most people would think about solving it if they had to do it
by hand. We wouldn't start thinking about each individual soldier if we had to
decide that kind of thing. We'd think, well, a battalion has a bunch of soldiers, I
have to check whether some soldier in the battalion is wounded, so how do I do
that?
If I have another rule saying that a soldier can be wounded depending on whether
the battalion has been attacked, then I can check if the battalion
has been attacked and think about that. Well, the battalion has been attacked,
and I have so many soldiers. Each of them has a chance of having been
wounded, so what's the probability that I get at least one wounded soldier?
So I think that's how most people would do that by hand, they wouldn't go over
instantiating everything. And it's a faster approach. You don't have to
instantiate. It's more compact. It's also more intuitive. And it keeps the structure
of the problem, right, because during inference I know that this represents a
class of random variables, and I know that they're all related to each other, that
they all have the same knowledge on them, and I have that information available
for my algorithm. I can use that to make better decisions.
So that's the gist of lifted inference, just keeping things uninstantiated as much as
possible. This is just a very quick review of Bayesian Nets. I think you guys are
up to date on that.
>>: I have a question.
>> Rodrigo de Salvo Braz: Yeah? Sure.
>>: [inaudible] I don't know if you call it the any concept, if any soldier has been
wounded then that variable is true and then you mentioned having to do this
inference that an attack causes the probability of wounding of one particular
soldier and if you have N soldiers, you know, what's the probability that no
soldiers will be [inaudible] that's a complicated thing. How many of those sort of
complicated atomic things does this have to be able to do?
>> Rodrigo de Salvo Braz: Why do you say it's complicated? You have an
independent probability of each soldier being wounded, right?
>>: Right. In and of itself it's not complicated. In and of itself, it's -- you know,
that's a procedure that you could follow to see that if you have some probability
of something happening to any member of a collection, you know, that what's the
probability that it hasn't happened to any at all, it's happened to some, one of
them, do you -- is there any other type of inference that you --
>> Rodrigo de Salvo Braz: You're talking about -- this is almost like an inference
step.
>>: Right.
>> Rodrigo de Salvo Braz: A way of doing inference. It's almost like an
inference rule.
>>: Right. Are there other inference steps?
>> Rodrigo de Salvo Braz: Yes. That's a good point. This is the main one.
That's the one that's the essence of it, I would say, but --
>>: Each [inaudible] in that collection has a probability by itself and --
>> Rodrigo de Salvo Braz: Yeah. But that doesn't work for all situations, right,
because you may have situations in which things are not as independent and
then I'm going to talk about that. There is another way of doing it which I don't
show right now, but takes into account the dependencies between things.
So you can think of it as inference rules and you may have some of them
available. We have kind of -- we have two different ones and we have a third
one that kind of links the first two together. It maps one situation -- well, after I
talk about the actual operations then I will mention this third one.
Okay. So in Bayesian networks you have a joint probability that's the product of
conditional probabilities. You can also use factor networks, which is a notation that I
use more often, so I'm showing it here. This is essentially the same thing. You
have a joint probability that's proportional to the product of these functions. You
can represent that Bayesian net using factor networks if you want. Then you
have the task of marginalization. You just sum out a bunch of variables. You
can do that with variable elimination.
You can efficiently factor functions out and start obtaining new factors. Here I'm
using the color green to show new factors obtained by summing others out. So
you do that and you eventually get a function that's only on your variable of
interest, and that gives you the marginal probability.
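As a reminder of the propositional operation that everything here builds on, a tiny variable elimination sketch with factors represented as tables; the variables and potentials are arbitrary, just to show the multiply-and-sum-out step:

```python
from itertools import product

# A factor is (variables, table) where table maps binary assignments to values.
def make_factor(vars_, fn):
    return (vars_, {vals: fn(*vals) for vals in product([0, 1], repeat=len(vars_))})

def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    vars_ = v1 + [v for v in v2 if v not in v1]
    def val(assign):
        a = dict(zip(vars_, assign))
        return t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (vars_, {vals: val(vals) for vals in product([0, 1], repeat=len(vars_))})

def sum_out(factor, var):
    vars_, table = factor
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return (new_vars, new_table)

# phi1(A,B) * phi2(B,C); eliminating B yields a new factor on (A, C).
phi1 = make_factor(["A", "B"], lambda a, b: 0.9 if a == b else 0.1)
phi2 = make_factor(["B", "C"], lambda b, c: 0.7 if b == c else 0.3)
print(sum_out(multiply(phi1, phi2), "B"))
```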
So I was just showing that to kind of introduce the graph notation that I was using
and so on. So suppose I have a factor network with lots of redundancy like that --
again, here the parentheses only have meaning for us, not for the model. I have
several diseases and I may have epidemics of those diseases going on, and I
may have people who are sick with those different diseases, and people may be
hospitalized or not. So whether Bob is hospitalized or not depends on the
various diseases he may have and so on, and whether people are sick depends
on whether there are epidemics and so on.
And I'm assuming here -- and that's a very naive model -- that this function is exactly
the same function as that, so if there is an epidemic of flu, the probability of
somebody getting that disease is exactly the same as, if there is an epidemic of
measles, someone getting measles. It's a naive thing, but it will do for
illustrative purposes now.
So if I have a model with lots of redundancy like that, I would rather represent it
in a more intuitive way. Just talking about different diseases and people getting
sick and being hospitalized, so I could just say, well, those are the relations that I
have in my model. This slide is just here to show and define terms.
So I could call those classes of random variables atoms, just
because they are pretty much the same thing that you have in logic.
And you have the logical variables which are the parameters to these atoms.
Those factors that actually represent many different instances of the same
function we call par factors for parameterized factors. And we can also constrain
their instantiations. I can say well that factor holds between instances of those
two classes for the cases in which P is different from John. So for John it doesn't
apply.
So that's the kind of representation that I'm using for lifted inference.
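Roughly, the representation just described could be rendered as a data structure like this; the field names are mine, not the ones used in the actual lifted-inference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    predicate: str            # e.g. "sick"
    logical_vars: tuple       # e.g. ("P", "D")

@dataclass
class Parfactor:
    atoms: list               # classes of random variables the factor couples
    potential: callable       # maps a truth assignment of the atoms to a number
    constraints: list = field(default_factory=list)   # e.g. [("P", "!=", "john")]

# "sick(P, D) depends on epidemic(D), except when P = john"
pf = Parfactor(
    atoms=[Atom("epidemic", ("D",)), Atom("sick", ("P", "D"))],
    potential=lambda e, s: 0.6 if (e and s) else 0.4,
    constraints=[("P", "!=", "john")],
)
print(pf.atoms[1].predicate, pf.constraints)
```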
>>: [inaudible].
>> Rodrigo de Salvo Braz: You mean I could represent a constraint as another
factor, is that what --
>>: Parameterized factor.
>> Rodrigo de Salvo Braz: I could do that. We keep them separate because
those are the deterministic factors. They have their own different behavior, and
it's more convenient for computational purpose to keep them as constraints. It's
going to be hopefully more clear when I show it.
So what's the semantics of such a model, right? So if I give you this first order
model, what's the semantics of it? It stands for exactly the same thing as if I
propositionalized everything. It stands for full instantiation. That doesn't mean I'm
actually going to do it, it just means that's what it's representing, that's the joint
distribution it represents. And each factor here is just going to be the
instantiation of the par factor with particular objects. If I do that, I have a
standard graphical model and I have a joint distribution.
So let's see the operations that we can do in more detail, all right. So the first
one which is the one we were talking about is inversion elimination. And
inversion elimination works the following way. In just one step I can sum one of
the classes out just like that. It's very much like variable elimination but of course
it's not variable elimination because here I'm talking about classes of random
variables, I'm talking about when I sum over this it means I'm treating this atom
as if it were an individual random variable, even though it's not. That's how the
operation works.
And of course to do such a thing, you have to justify that it's correct, that you can
treat an atom as an individual random variable and that's still valid, right? But
that's what it does.
So you just sum it out, you get a new function that's on the remaining classes,
the remaining atoms and you have a new first order representation that
represents the marginalization over the instances, all the instances of sick PD.
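Schematically, and only for this particular epidemic/sick example, the identity that inversion elimination exploits is that the big sum over all instances of sick(P, D) pushes inside the product:

\[
\sum_{\{sick(P,D)\}} \prod_{P,D} \phi\big(epidemic(D),\, sick(P,D)\big)
= \prod_{P,D} \sum_{s \in \{0,1\}} \phi\big(epidemic(D),\, s\big)
= \prod_{D} \Big(\sum_{s \in \{0,1\}} \phi\big(epidemic(D),\, s\big)\Big)^{|People|},
\]

so the summation is done once for a generic person and the same result simply applies to everybody.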
Yes?
>>: So here if I -- so for you have not shown [inaudible] between any of your
[inaudible] models which [inaudible] sum action. So if I understand that correctly,
for a specific P [inaudible] I have probability of epidemic, I have probability of sick
given epidemic and I sum over the sick and I get the epidemic.
>> Rodrigo de Salvo Braz: Yes.
>>: And if it's the same for all Ps, then of course I can do it for an average rate.
Is that what you are assuming -- that for every person the relationship between sickness
and epidemic is the same?
>> Rodrigo de Salvo Braz: No.
>>: And therefore you do it once on instantiation?
>> Rodrigo de Salvo Braz: Yes. That's right. But it doesn't work all the time
because here these variables, they don't interact with each other. The instances
of sick PD, they are not interacting directly with each other. If they did then if you
work out the math, you see that you cannot do that. If you do that, it's just going
to be incorrect.
>>: So [inaudible].
>>: Are you pushing to the direction which we look on the [inaudible] on the first
order representation and you do something there that is more efficient in some
ways than going down to the [inaudible].
>> Rodrigo de Salvo Braz: Yeah. Actually the rest of this is like this, right.
So the idea is that I could instantiate everything, do variable elimination over all
the instances, obtain these guys here that are just over epidemic random
variables, and now these instances can be represented by that. Instead I'm just
doing it directly at the first order level. I never go down to the propositional level.
You can think about resolution, right, in logical reasoning.
If you have first order resolution you can look at it from that point of view. You're
doing resolution over quantified clauses, and it is just as if you had instantiated all
the clauses, then resolved each of them, and then gone back to the first
order level. So this is the same idea.
>>: So I'm confused. So you're still summing over all sick like all people times
disease number of sick, right? The sum that you showed is still a sum over all
like number of people times number of diseases?
>> Rodrigo de Salvo Braz: No, the sum is just over one symbol. I pretend that
that's an actual random variable. But let me get that. So this a binary variable,
right? So I'm just going to say, suppose there's a false, I plug it in here.
>>: You mean there's a false for all peoples and all diseases or a given person
in a disease?
>> Rodrigo de Salvo Braz: You can think of it for any person I'm going to do a
summation for -- because these things are completely separate, right, they are
like this, have this parallel structure, I can pretend that I'm doing the summation
just for this guy. Once I do it for this guy, this is going to be exactly the same
because the potential doesn't actually depend on the objects. It only depends on
whether this is true or false.
>>: Right.
>> Rodrigo de Salvo Braz: So if I do the summation for here, I obtain a new
function that depends on that and that's going to be exactly the same function I'm
going to get for this one. Because all these subproblems they all have exactly
the same structure and the actual numbers don't depend at all on the objects.
>>: So the number depends on the true or falseness of the predicate?
>> Rodrigo de Salvo Braz: Yes.
>>: So the sum needs to know -- I mean, you're going to end up with true terms
and false terms, so I mean, you just need to know the number of true, the
number of false, and that's how you do the sum? I mean, you can move on if you
like.
>> Rodrigo de Salvo Braz: That's okay. You're asking if I need to know the
number of objects that are true and the number that are false?
>>: It's going to depend on that, right? So --
>> Rodrigo de Salvo Braz: Yes, but because you have this nice structure here
and it's not something that's general, it's only some situation like this one, you
can actually ignore that fact. Because they are all independent, so you can -- it's
as if you were factoring things out. Suppose that you do decide to do it on the
propositional layer. You're going to have your summation, it's going to be a
summation over all these random variables, right, all of them, you're going to
have many of them, and you also instantiate all the factors. Because they are all
independent, you can factor each instance of a factor to its corresponding
summation. And you're going to end up with a summation over one factor here
times a summation over another factor here, and they're all going to have exactly
the same structure because they don't depend on the object.
So you do the first one or whatever, and you know the result for all of them. And
then you get a new -- and then you get these guys, a bunch of instances of
factors that all look the same and because they all look the same, you can just
abstract back like that. I'm going to move on because otherwise it's going to take
a lot of time. But we can talk more about it if you want. Are you on my schedule,
by the way?
>>: Yes.
>> Rodrigo de Salvo Braz: Okay. Good. So inversion elimination, this was
something that was proposed by David Poole in 2003, and he didn't
actually formalize it very completely, and the way he presented it was as if it
would work every time. Basically he was saying what you were saying, more or
less: oh, I'm just using generic objects, it applies to all the objects. But
actually that doesn't work all the time, because if the objects -- if the random
variables interact, then you don't have that nice separation anymore.
So what we did in the first paper on this subject was formalize it more carefully, and
then we realized, okay, this doesn't work all the time; what are the situations in which it
is correct, and what are the ones in which it's not correct? So here's the limitation,
right?
So because I have all the separate structures that have exactly the same
structure, I can do an operation on one of them and I'm going to get exactly the
same result. So it works. But if I have something like this, right, suppose I say
well, what's the relation between the variables for epidemics of two different
diseases, depending on the month, so I have that factor, right, and that
relationship. Suppose I want to marginalize over the epidemic variables; I
cannot just apply inversion elimination and obtain something like that and be
done. That's not going to work. I'm going to get something, but it's just not going
to be the correct thing. And the reason for that is that if you [inaudible]
something like that you obtain this, you obtain a very connected graphical model,
and things are not separate anymore, so when you're summing over one of the
variables, that's going to influence the summation of the other ones, so you -- it's
not true anymore that the summation over one variable is going to give you
exactly what the other would give.
So that's the limitation of inversion elimination. And because of this limitation,
we started thinking of other ways of solving things. So for this problem, even
though I have all this interaction between these random variables, there is -- this
is still a very simple thing in a way, because these are all symmetric variables.
They all have the same structure, they all have same knowledge on them, so it
seems unnecessary to treat each of them individually because it's really just a
mass of symmetric variables. So can we take advantage of that symmetry?
So as we were talking about before, we need to consider joint assignments of
those, and that's going to be an exponential number of them. But because they
are symmetric variables, in an assignment, when I say well this is true, this is false,
and so on, it doesn't really matter which ones are true and which ones are false, it only
matters how many are true and how many are false. So if instead of iterating over
all assignments I just iterate over the histograms, the number of trues and falses
that I have in that cluster of variables, that's going to be enough. And that's going to
be a polynomial number of [inaudible] instead of an exponential one. So that's the other
operation -- we were talking about two different operations, and that's the other
one: counting elimination.
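Roughly, for this particular epidemic/month example (n diseases, ordered pairs of distinct diseases, month m), the kind of identity counting elimination relies on is

\[
\sum_{\vec{e} \in \{0,1\}^{n}} \prod_{D \neq D'} \phi\big(e_D, e_{D'}, m\big)
= \sum_{k=0}^{n} \binom{n}{k}\, \phi(1,1,m)^{k(k-1)}\, \phi(1,0,m)^{k(n-k)}\, \phi(0,1,m)^{(n-k)k}\, \phi(0,0,m)^{(n-k)(n-k-1)},
\]

since an assignment with k trues contributes the same value regardless of which diseases are the true ones.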
And that's a more expensive operation because even though you have this
polynomial number of assignments, it's a number of assignments that does
depend on the number of objects that you have. I didn't mention that before, but
inversion elimination actually doesn't depend at all on the number of objects.
You may have a million objects. Doesn't matter because you're going to do the
summation for this generic object. You're going to do it only once.
>>: [inaudible] one specific [inaudible] epidemic independent or [inaudible]
without --
>> Rodrigo de Salvo Braz: Well, this is an undirected graphical model, right? So --
>>: Undirected where it's [inaudible] 3?
>> Rodrigo de Salvo Braz: Yes. So this notation that I'm using is just like a
Markov network, only I'm actually explicitly representing the functions over it.
So this is one clique. Well, I'm not linking this guy here, but this guy's linking to each
of those functions. This is a proposition. It links to all of those instances.
Yeah, now I'm kind of regretting not having put the equations here. I think you
guys would have appreciated that more. But I can go over that with you if you
like. I just wanted to keep it more high level, not get down to details. But the
idea is that you're going to have a summation again but instead of being a
summation of a generic random variable, you're going to have summation over
the number of values that you have in this cluster, and then you're going to get,
you're going to get a potential just on the remaining variable month. So basically
for each month you're going to have a different potential. Yeah?
>>: So this is assuming you don't have any evidence that distinguish between
like [inaudible] so what if you have everything --
>> Rodrigo de Salvo Braz: Yeah, that's a good question. That's actually a
crucial point, because this all depends on things being indistinguishable and
symmetric, and if you have knowledge about specific objects then you actually
have to separate those cases. And you do that in the following way. If I
tell you I know for a fact that there is an epidemic of the flu going on, so what you
do, you come here -- and that's why I was keeping the constraints separate,
because they come in really handy when that happens -- because then I'm going to
break this par factor into different parts. A par factor is just a set of
instantiations, right? I'm going to separate the instantiations involving the flu and
the instantiations that don't involve the flu. And the ones that don't involve the flu are
what we call the residual. It's going to be like this: the disease different from that one
but also different from the flu. And the other one is going to already be
instantiated; one of them is going to have the flu.
So you separate that. And then once you do that, then again every par factor
involves things that are completely indistinguishable and symmetric. So you go
on with your calculations. But in the presence of evidence, you have to do this.
It's what we call shattering. Actually, it doesn't even depend only
on evidence. If your knowledge distinguishes things in some way, if I have some
special rule for the flu, then I have to do that as well; I have to combine that with
this rule and separate them.
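A toy illustration of that shattering step, assuming a constraint representation of my own invention (this is only the flavor, not the real algorithm): given evidence about one specific disease, the parfactor over all diseases is split into an instantiated part and a residual that carries an extra constraint.

```python
# Split a parfactor's constraint set on evidence about one specific constant.
def shatter(parfactor_constraints, logical_var, observed_constant):
    instantiated = parfactor_constraints + [(logical_var, "=", observed_constant)]
    residual = parfactor_constraints + [(logical_var, "!=", observed_constant)]
    return instantiated, residual

# Evidence: epidemic(flu) is known, so split on D = flu vs. D != flu.
on_flu, rest = shatter([], "D", "flu")
print(on_flu)   # [('D', '=', 'flu')]   -- handled with the evidence plugged in
print(rest)     # [('D', '!=', 'flu')]  -- still fully symmetric, handled lifted
```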
Actually this is an interesting thing in terms of logic, because it starts
looking a lot like resolution and unification: when you have to identify the
intersection between two par factors, it looks very much like unification and
breaking clauses, like we do in logic. So it's an interesting link to think about,
between logic inference and this type of inference. It starts looking like that.
And that's a major limitation of the approach right now, because if you have a
database, right, with facts about each of your objects, then this is not going to be
helpful to you at all. Because you're going to have to look at that database and
take all those facts into account and basically propositionalize; you're going to
have things for each object. So it wouldn't be helpful in that case. So that's still a
major limitation.
It's not a limitation that you cannot get around, because you can do
something -- I'm going to talk about that later -- but essentially you can do some
kind of approximate inference where, when objects are not exactly the same but they
have similar properties in similar situations, then you can still consider
them as part of the same class. So there is a generalization that takes this
approximation into account. But right now, because this is exact inference, we're
not doing that.
Other questions before I move on? So this is lifted inference. If you run
experiments on this, very predictably you'll get graphs like that, right? The first
operation, inversion elimination, doesn't depend on
the domain size at all, so if you propositionalize and do inference, it's
going to be an exponential curve, while lifted inference remains the same.
And here, even though counting elimination does depend on the domain size, it's
really hard to get a graph that shows that, because the propositionalized version
still grows a lot faster. So for this case, in which you have clusters of symmetric
variables, these are much faster operations than propositionalizing.
All right. So I'm going to move on to the second part of the talk, which is about
BLOG and DBLOG. So I have 15 minutes, right, so it's not much.
>>: [inaudible].
>> Rodrigo de Salvo Braz: So let me tell you a little bit about BLOG, Bayesian
Logic, which is the work that Brian Milch and Stuart Russell have worked on
during Brian's Ph.D. thesis.
So it's a probabilistic logic language, just like those others I presented, and its
inference is just propositional sampling. You propositionalize variables as you go
and you sample them. And what distinguishes BLOG from other approaches is
those two things. It's an open universe language, so it doesn't assume you know
the number of objects in the world. You have distributions on the number of
objects in the world, which is a nice feature and very realistic,
because usually we don't know how many objects are out there. And it's also a
very expressive language; it looks very much like a programming language, and you
can write very arbitrary things in it. So here's an example of a BLOG program.
So you declare types, so I say I have this type battalion and the number of
battalions is a uniform distribution. I have this property of battalions that's a
boolean property, that's just going to be a Bernoulli distribution there, whether
battalion is large or not. The region of a battalion is going to be a
natural number, with a distribution from zero to three. And then you have
another type of object, soldiers. And the number of soldiers is a parameterized
variable, it depends on the battalion. So for each battalion you're going to have a
different set of soldiers. That's going to be a -- that's going to have a distribution,
too.
And you can write that the number of soldiers of a battalion is going to depend on
whether the battalion is large and the region of the battalion. And then you can
place queries like what's the average number of soldiers per battalion, for
example. So because I have distributions on the number of objects, that's an open
universe assumption, and it's a pretty expressive language. I didn't really
show when I was talking about par factors, I didn't want to be going down to
details of syntax and so on, but usually those par factors, they are just tables,
right? So it's a much simpler representation as opposed to here in which you
actually describe the computation of the distribution much more carefully, and
you have a lot of structure available there. It's a more expressive language.
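Roughly, the generative model just described -- if I rendered it as an ordinary sampler rather than in BLOG syntax -- looks like the following; the particular distributions and parameters are made up for illustration, not taken from the actual model:

```python
import random

def sample_world():
    world = {}
    world["n_battalions"] = random.randint(1, 5)              # number of battalions
    for b in range(world["n_battalions"]):
        world[f"large({b})"] = random.random() < 0.4          # Bernoulli property
        world[f"region({b})"] = random.randint(0, 3)          # region in 0..3
        mean = 100 if world[f"large({b})"] else 40            # depends on Large, Region
        world[f"n_soldiers({b})"] = random.randint(
            mean // 2, mean + world[f"region({b})"] * 10)
    return world

# A query like "average number of soldiers per battalion" is just a statistic
# of sampled worlds.
w = sample_world()
print(sum(w[f"n_soldiers({b})"] for b in range(w["n_battalions"])) / w["n_battalions"])
```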
You're also talking about sets of objects -- you're talking about a variable, say an
average, that's going to be over a set of objects. It's a much more expressive
language. So, inference in BLOG. Just going back to that first example that I
was using of a soldier being wounded and a battalion needing rescue and stuff
like that, so I'm just writing that in BLOG now. The only difference is that for the
probability of a soldier being wounded I am making the model more detailed now,
and I'm saying that it depends on whether the soldier is experienced or not --
the probability of them being wounded is different.
But basically that's the same example as before. So if I have a BLOG model like
that and I want to know what is the probability that a battalion needs rescue, so
how does the inference work? It's sampling, right? So when I have this query, I
need to know what its distribution. So that's going to depend on its parents.
And its parents, as we see here, you have this -- it's an substantially quantified
sentence. So that's going to depend on all the soldiers. I have to look at all the
soldiers, the set of soldiers and see which ones are in the battalion and how
many of them are wounded. So that's one of the parents of the -- of my variable
here is the set of soldiers for that battalion.
So I sample values for that. And once I have a set of soldiers, I have the parents
of this variable that I need which is going to be the variables wounded for each of
the soldiers. Once I do that, I have to sample, okay, is the soldier wounded, now
I need to sample for that. For that I need to determine its distribution, I need to
know which -- I have to have values for its parents. So the parent there is
whether the battalion has been attacked.
So if I do that, I can sample a value; if I get false, it means the battalion's not
attacked, so the soldier is not wounded. And unfortunately I'm going to do the
same thing for all the variables, even though I know that the battalion's not
attacked and I should be able to tell that all the soldiers are not wounded, because
the battalion wasn't attacked. But I'm going to sample values for each of them, and it's
going to be false for all of them, because that's the distribution.
And because of that, whether the battalion needs rescue or not is false. So that's how it
works. Just another example --
>>: [inaudible] some things --
>> Rodrigo de Salvo Braz: Yeah?
>>: Some things are interesting there. The notion that battalion has members,
i.e., the soldiers, and then if this attack thing holds for the battalion holds for each
of the members, right, we needed to know an attack, it wasn't an individual
soldier that was attacked but that the battalion was attacked and therefore each
of its members was attacked.
>> Rodrigo de Salvo Braz: I think when you write the statement there, let's see,
so here, right, battalion -- so I have this variable that says to which battalion a
soldier belongs to -- that's something I didn't go over in much detail, but
when I have a number statement for soldiers I'm saying for each battalion I have
a number of soldiers, and that actually constitutes a function on soldiers. So I
have this function called battalion of that's going to give me the battalion to which
the soldier belongs.
So if I do that, I have -- if the battalion to which the soldier belongs is attacked,
then I have a -- right? Does that make sense? I don't know if that answers your
question.
>>: Yeah. So if you go back to that picture.
>> Rodrigo de Salvo Braz: Yeah.
>>: The way you were explaining it, you were sort of going through variable by
variable and saying okay, what parents do I need to know about in order to figure
out if this variable is true? And you've got to wounded soldier one and you said
oh, I need to know about attacked and then when you got to the next wounded
soldier, you knew that that was going to depend on that same.
>> Rodrigo de Salvo Braz: On that same variable, yeah.
>>: Presumably in other cases it would depend on variable that's not the same.
>> Rodrigo de Salvo Braz: Yes.
>>: So this there was this membership concept. Are there any other concepts
that we should know about that come in handy when you're deciding what
depends on what?
>> Rodrigo de Salvo Braz: I don't think those are separate cases because what,
what I do is all the same thing, I look at the statement, the dependency statement
for this, and it says well look at the battalion of this soldier and that's going to be
your parent. So it determines there, okay, I need to know this variable. This guy
is going to do the same thing and it's going to -- just because that's how the
computation works, it's going to get exactly the same variable, and then it's going
to look up there.
>>: So [inaudible] like battalion which are [inaudible]?
>> Rodrigo de Salvo Braz: Yeah.
>>: As you wish and then you just sort of [inaudible].
>> Rodrigo de Salvo Braz: Well, they're not arbitrary exactly, because --
>>: [inaudible] they get.
>> Rodrigo de Salvo Braz: This function battalion of is what we call an origin
function, it's actually -- it has a very -- it's not even a probabilistic variable
because when I need to know the number of soldiers in a battalion, right, I have
to sample that. Okay. I need to know how many soldiers this particular battalion
has. I sample a number. In that case was 73.
When I do that, I generate 73 objects that I place on my universe. And then I
define the origin function for each of them to be that battalion that generated
them and that's a fixed value for all of them.
So when I go to the wounded dependency model, I say, well, is the soldier
wounded; then I plug the soldier in here, and the battalion is already determined because
that was determined when the soldiers were generated, so I have a value here
battalion 13. So now I know that I need to know whether battalion 13 has been
attacked or not. And then I sample that.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah. It's very much a generative model.
>>: How would -- like, for instance, would you be able to model several
battalions with several soldiers where like a single soldier can belong to
battalions like you can be [inaudible] or like you know or does that get too
[inaudible].
>> Rodrigo de Salvo Braz: No, you can do that. You can do that. But those will
be origin functions because the origin function's defined as a parameter of the
number here, right. Once you have that, you're saying all the soldiers are coming
from same battalion.
>>: Right.
>> Rodrigo de Salvo Braz: So that's it. But you know, I can define arbitrary
functions on things. I could also have another variable here saying alternative
battalion of and then it could be anything.
>>: So you could [inaudible].
>> Rodrigo de Salvo Braz: Yeah. And then I could use that variable if the
battalion's been attacked or the alternative battalion's been attacked. So even
though those origin functions they are under the hood different, they are
treated -- you use them in the same way as any other function. So it becomes
a very uniform process. So do you see now that it's actually always the same
case, that you're always looking at the parents like that?
I'm asking if that like the other [inaudible].
>>: I guess what I'm getting at is like if you -- this is like a programming language, right?
And if you define a programming language you say okay, I have base types, I
have integers, I have characters, I have real numbers, I have basic atomic
functions, I can add, I can subtract, I can multiply. I'm trying to get at what
exactly is the set of basic operations that you could define here, and the basic
types and so forth. Like if you do define it as a programming language.
>> Rodrigo de Salvo Braz: Well, you'd have.
>>: What's the [inaudible] things?
>> Rodrigo de Salvo Braz: You have the [inaudible] types like natural number
and boolean and things like that. You also have user-defined types like soldier
and battalion. And then you have these operations, right? Here I'm not showing
many of them, but you have things like plus and minus and all that. And it always
works the same way. You execute these instructions: every time you need the
value of a variable, you look to see if it's already defined and already has a value,
or if not then you sample a new value for it. And when you sample a new value
for it, you're going to trigger this recursive process in which you sample that
variable's parents and so on. So eventually you sample them and then you have
this partial world representation.
So it's all just this loop: sample parents, get distribution, sample value. If you
get evidence it's the same thing, too: you get the evidence and you need
to sample the parents of the evidence and see if that's a likely explanation, and so
on.
So this is one example, right, in which I had to sample values for all of the
wounded variables. Here is another one. Suppose that now when I sample
things, in this particular sampling run the battalion
has been attacked. And now it's a different sampling run, so I actually have a
different sample of soldiers. In this other world here that I'm generating now,
that battalion has 49 soldiers. So the battalion's been attacked, so now -- and
that's an interesting thing in BLOG, right? It uses context-dependent
independence here, which is called context-specific independence: now that I
know that the battalion's attacked, this variable matters -- to know the
distribution of wounded I actually need to know whether the soldier's experienced
or not. So I go through the dependency model here: okay, it's been attacked,
so now what's the experience of the soldier? And then I'm going to sample
something here. Once I do that, then I have a distribution for wounded.
Suppose it's true. And if it's true, then just like in a regular
programming language, if you have a disjunction and you get a true, you
know that the result is true and you don't even need to look at the others.
So in this particular case we take advantage of that, and for the other soldiers you
don't actually have to do any sampling, because you already answered your
query. But you have to run this many times, and many times it's going to be the case
that you have to sample all the soldiers.
So BLOG has this lazy evaluation property, which is nice because, as opposed to
some other methods in which you propositionalize first and then you start
doing the inference, BLOG does the inference and propositionalizes as it
goes. And as a result of that, it saves work sometimes.
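A toy version of that lazy, memoized sampling loop might look like the following; this is my own sketch, not the BLOG engine, and the rules and numbers are invented:

```python
import random

world = {}          # partially instantiated world, filled in on demand

def value(var):
    # Sample a variable only when its value is actually needed (lazy evaluation).
    if var not in world:
        world[var] = sample(var)
    return world[var]

def sample(var):
    if var.startswith("attacked"):
        return random.random() < 0.3
    if var.startswith("wounded"):                 # wounded(s, b)
        b = var.split(",")[1].rstrip(")")
        if not value(f"attacked({b})"):
            return False                          # context-specific independence
        return random.random() < 0.2
    if var.startswith("needsRescue"):             # needsRescue(b)
        b = var[var.index("(") + 1:-1]
        n = value(f"n_soldiers({b})")
        # short-circuit: stop as soon as one wounded soldier is found
        return any(value(f"wounded(s{i},{b})") for i in range(n))
    if var.startswith("n_soldiers"):
        return random.randint(10, 80)
    raise ValueError(var)

print(value("needsRescue(b1)"), len(world), "variables instantiated")
```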
So DBLOG is about trying to use BLOG for temporal
models. Now, because BLOG is so expressive already, you don't really need
anything more to express temporal models; you can just use natural numbers as
time steps, write your statements so that they depend on the previous time
step, and you're done. So it's a legitimate question: why do we even need to
bother with dynamic BLOG? You could just write a BLOG model that has time.
And the reason for that is that temporal problems have a lot of structure that you
want to take advantage of. So you have this Markov property there; if the
algorithm knows that that's happening, it can take advantage of that. Also,
usually in these problems you get the evidence and queries successively
for time steps, so usually you know that once you are in a new time step you're
never going to get evidence about previous time steps. And you may want to
take advantage of that.
So the language remains the same, but it's still nice to use a different algorithm
that takes advantage of this and is more efficient for temporal models.
So we're working on dynamic BLOG. We do make this change to the language:
if you just use natural numbers, the algorithm doesn't know whether a
natural number is actually representing time or not, all right? Because it could
be just some natural number. So what we do is have this new type,
time step, which is just natural numbers but is actually telling you that this is
representing temporal structure. So the algorithm knows.
But it works exactly the same way. You just have different literals: now instead
of zero, you have a time step literal, things like that. But it works exactly the same. So
much so that if you give this to the BLOG inference algorithm it still deals with it,
because time step is just one of the [inaudible] types. But it's not going to take
advantage of the structure. So we have the BLOG particle filter, right? So
particle filtering, as most of you know, just works like this: you have some sample for a
state for a particular time step, and then you get new evidence, use likelihood
weighting to see how likely an explanation each of them is, and then you
resample to get a population that represents the more likely hypotheses.
And we do the same thing in BLOG, in DBLOG. So you have a world there.
You have this partially instantiated Bayesian network that
represents a particular explanation that has been sampled, and then -- oh, wait a second.
This slide is still about dynamic Bayesian networks, so that's not about BLOG.
But it's essentially the same thing. You just sample the new variables using the
time slice and you generate a new particle.
So for DBLOG you have these particles. The particles now represent what I was
showing before, that network of variables that require the instantiation of each
other and so on to generate an explanation for your query or your evidence. So
when you get new evidence -- and the evidence is going to be BLOG statements
like that -- then you need to explain it. So here what I'm saying is that -- well, I
didn't actually explain this model, so let me go back really quick. This is about
aircraft. I have a bunch of aircraft. They have their positions at each time step;
that's going to be a random walk depending on the previous position. And I have
blips that an aircraft generates on the radar. It can be one blip, or maybe it's not
detected in that particular time step, right? The number of blips is going to
depend on the aircraft and on the time step. So at each time step I may or may
not generate a blip from an aircraft.
I can also have another set of blips that are just false alarms. I can have some
noise blips. And those blips are going to have the apparent position of an aircraft
that's going to be Gaussian noise around the position of its source. So here
source is another one of those origin functions, right? That's going to be the
aircraft that generated the blip. So I just look it up here. That's the position of my
source. Well, actually that shouldn't be the previous time step, it should be the current
time step. That's the position of my source at this time step and then there is a
Gaussian noise and that's the position of the blip.
So that's the model I'm using here. So suppose I have the situation where I'm
observing now that the number of blips -- that I have one blip only that comes from
this aircraft, that I have exactly one. So it has been generated. And the
apparent position of this blip is 3.2. So I need to get the distribution for this
variable, the distribution of B1 at time two -- because, yeah, I was using time steps.
So I do that by sampling a value for B1, which indicates here that's not a very
interesting example because B1 is always going to be the blip. There is only one
blip generated anyway. There's not a set of blips per aircraft. So that's going
to be a very simple thing. I'm always just going to instantiate B1 with the blip that
the aircraft generates. And if the aircraft doesn't generate a blip in that time
step, it means this particle will have weight zero.
Then once I know that, I also need to know the position of the source.
So I now sample the position of aircraft 1, and that's going to depend on the
position at the previous time step, so I use the previous particle as a set of
already instantiated variables. And then I explain my evidence, and then I have a
weight to evaluate this particle that I just generated. And like before, I have this
lazy instantiation: I just instantiate the things that I really need in order to
explain my evidence.
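(A sketch of the lazily instantiated particle update for the one-blip example; the 3.2
observation and the variable naming mirror the description above, but the code is an
illustration, not the actual DBLOG update.)

    import math, random

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def update_particle(particle, t, observed_blip_pos,
                        walk_sigma=1.0, blip_sigma=0.5, detection_prob=0.9):
        # particle maps already-instantiated variables, e.g. ('pos', aircraft, time) -> value;
        # only the variables needed to explain the evidence ever get instantiated.
        new = dict(particle)
        # If the aircraft generates no blip this step, the particle cannot explain
        # the evidence, so it gets weight zero.
        if random.random() >= detection_prob:
            return new, 0.0
        # Lazily instantiate the aircraft's position at time t from the previous step.
        prev_pos = new[('pos', 1, t - 1)]
        new[('pos', 1, t)] = random.gauss(prev_pos, walk_sigma)
        # Weight = likelihood of the observed apparent position (e.g. 3.2) given its source.
        weight = gaussian_pdf(observed_blip_pos, new[('pos', 1, t)], blip_sigma)
        return new, weight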
So now we have an issue that doesn't arise in dynamic Bayesian networks,
because you could think, well, dynamic BLOG is just like a dynamic Bayesian
network, there's nothing new; but you start having some issues that you didn't
have before. This lazy instantiation is one of them.
So just like in the previous case, but now for time step eight -- yeah, time step nine:
if I get a new blip, and this time it comes from aircraft two, and suppose that I
haven't observed aircraft two since the beginning, it's the first blip that comes out
of aircraft two, I'm going to have to sample the position of aircraft two at time step
nine. But because I have lazy instantiation, I'm not going to have any
values for that at the previous time step, or the one before, so I have to go all the way
back to the beginning. So the cost of this lazy instantiation is that you may
have to do something like that.
And so the nice property of dynamic Bayesian nets, that you have a constant
update time, would go away in this case.
>>: Can you guarantee that somehow that doesn't affect the [inaudible]? That if
you -- that you get the same answer as if you just instantiate [inaudible].
>> Rodrigo de Salvo Braz: Yeah. You're wondering whether what I did before
depended on the position of aircraft two, right?
>>: Right.
>> Rodrigo de Salvo Braz: It didn't. Because the fact that it has not been
sampled means it was irrelevant to the evidence. So even if I had instantiated it,
the evidence would have exactly the same weight. So, yes, it is guaranteed that
it didn't matter. It only matters in terms of cost.
>>: [inaudible] adversarial attack on the system to bring it down [inaudible] future
aircraft and then you're [inaudible] certain ways that will sort of make [inaudible].
>> Rodrigo de Salvo Braz: Right. Yeah, you could think of something like that.
And you know, it sounds like this is a problem --
>>: [inaudible] the probability of the previous time step, why do you need to
[inaudible].
>> Rodrigo de Salvo Braz: Because now you got evidence that depends on it.
Up to now you never had to bother about it; no matter where it was, the evidence
would have the same weight. But now, because I'm saying that this blip comes
from aircraft two, and I'm telling you the position is here, I need to --
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: [inaudible].
>> Rodrigo de Salvo Braz: But to instantiate this to [inaudible] for that, I need to
get its distribution and its distribution depends on the previous time step. And it's
a recursive thing.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Right. And actually, you know, this is a very
common problem in data association. People have worked on that, not with
BLOG but with, you know, hand-made systems, and they have circumvented this
problem. They don't generate things all the way back, basically by using
things like closed forms. And that's an option that's available to us as well:
because we know that this is a random walk, we could also code the model in
such a way as to bring that closed form in there. Just like in previous work, people
had to sit down and think, hey, what's the closed form of this, and then they
worked it out in their system. We can do the same thing here. But it's something
that's not done automatically; you have to think about it and write it down.
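(A sketch of the closed-form shortcut just mentioned for a Gaussian random walk: instead
of sampling every intermediate position, the position at time t given the last instantiated
position at time s is distributed as N(pos_s, (t - s) * sigma^2). As noted in the talk, this
currently has to be worked out by hand rather than derived automatically; the code below
is only illustrative.)

    import math, random

    def sample_position_step_by_step(last_pos, last_t, t, walk_sigma):
        # Cost grows with how far back the last instantiated value is.
        pos = last_pos
        for _ in range(t - last_t):
            pos = random.gauss(pos, walk_sigma)
        return pos

    def sample_position_closed_form(last_pos, last_t, t, walk_sigma):
        # Same distribution as above, but in constant time: the variances of the
        # independent Gaussian steps add, so the std dev is walk_sigma * sqrt(t - last_t).
        return random.gauss(last_pos, walk_sigma * math.sqrt(t - last_t))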
Actually, I don't know as much about dynamic Bayesian networks, but I would
assume that people may have worked on lazily instantiated dynamic Bayesian
nets too, and then they would run into the same problem as well. Do you know of
anything like that? Because that's not a BLOG-specific thing. If you have a
dynamic Bayesian network with lots of variables, and many of them are not
relevant and you don't want to instantiate all of them, then again you would run into
the same problem. Yes?
>>: So to do that sampling -- if the position of A2 at time two depends on any of the
state for the particle at time two, then does that -- would that mean that you
wouldn't be able to just figure out in closed form what it would be at time nine, you
would actually have to go through the sampling, or can you --
>> Rodrigo de Salvo Braz: No, I could look at the particular distribution there, the
transition distribution, and maybe I could figure out some closed form, right. Is
that what you're talking about?
>>: Yeah, I guess I'm wondering if -- I mean, yeah, if the position depends only on
the start state and how much time has passed and [inaudible], but if
it depends, for each particle, on some of the other state factors that had been
sampled in that particle, then it seems like you can't do a closed form; you'd
actually have to keep the history of the particle so that you can go back.
>> Rodrigo de Salvo Braz: Oh, because maybe the history of the particle affects it.
>>: [inaudible] position of A1 affects position of A2.
>> Rodrigo de Salvo Braz: Yeah, right.
>>: Then you need to know A1 at time two.
>> Rodrigo de Salvo Braz: So the closed form approach depends also on
whether these objects are interacting among themselves.
>>: Right.
>> Rodrigo de Salvo Braz: Yeah. So.
>>: It would depend on the evidence, so the evidence change so [inaudible] has
to [inaudible] the evidence.
>> Rodrigo de Salvo Braz: Well, yes, sure.
>>: The evidence [inaudible] the previous time step.
>> Rodrigo de Salvo Braz: So what you do, what you do and what people have
done in data association models in the past: when you get an observation of
something that you don't believe has been observed before, then you use the
closed form. Because I never got evidence about this guy, so I know that that's
the distribution at this time step given that I got no evidence so far.
Now, if you have observed it before, then you have to use the history, right,
because you actually know something about that object, so you may have
another closed form, but one that somehow depends on the history. Yeah. So you
have to do that.
>>: [inaudible] assuming that you realistically know which aircraft is giving you which
blip, and if you sort of put [inaudible].
>> Rodrigo de Salvo Braz: Right. That's not the usual situation.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Huh?
>>: If you [inaudible] over that.
>> Rodrigo de Salvo Braz: Yeah.
>>: Then you also have to input about whether that [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: In that case, [inaudible].
>> Rodrigo de Salvo Braz: Yeah. You're absolutely right. When you don't
have that -- if I just tell you, hey, that's my set of blips, those are their positions, I don't
know where they are coming from, then you have to generate the positions of all the
aircraft to be able to explain that set of blips. And then this problem of
instantiation wouldn't occur, because you are always instantiating everything, so you
always have everything available.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Even that. If I ask where is that blip coming from, I
have to instantiate all the aircraft and say which is more likely, which aircraft is
more likely to have generated it. So then that problem wouldn't arise, which is not
necessarily a good thing, because you would be instantiating everything. And
again you could have a form of lifted inference, right, in which you keep a closed-form
representation for a generic object representing all objects that have not
been observed so far. Because those are indistinguishable -- you know
exactly the same thing about them, you know the prior at the beginning,
and you know you never learned anything else. So you could use some kind of
lifted inference.
And actually when you look back at previous work in data association, essentially
that's what people are doing: they are implicitly representing unobserved objects
with a closed-form thing, without representing individuals.
But that's all done by hand and it's done for specific models. The challenge here
is actually having an algorithm that can take a model and figure all this stuff out --
closed forms and lifted inference and all that. So it's a --
>>: [inaudible]. So suppose at the beginning you only generate -- you generate a
number for how many aircraft you have, and suppose you sample it to be
one, and [inaudible] so 100 aircraft, so how do you [inaudible].
>> Rodrigo de Salvo Braz: Yeah, so that's a good point. This is atemporal -- the
number of aircraft here is an atemporal variable, right? Once you sample it, it's done,
it's fixed. And that's a problem that arises in DBNs as well. Because if you have DBNs
with atemporal variables, you have to do something smarter than just sampling them
once. And people have come up with solutions for that. One of the solutions is,
at every time step, to apply a transition kernel that preserves the posterior and
that also changes the values of the atemporal variables. We could do that
here as well. We could change the number of aircraft at some point.
But then you're going to have to regenerate the history. It's going to be very
expensive.
>>: So how do you --
>> Rodrigo de Salvo Braz: We don't handle that right now. That's still something
to be [inaudible] with. We are working on having an MCMC transition kernel that we
can apply to BLOG models, which is not that simple. Once we do that, then we
can apply it to the atemporal variables. But then we have to figure out how
efficient it's going to be to change those variable values. It's still not very clear to
us.
>>: Is there [inaudible] for dynamic programming approaches? Is that just
[inaudible]?
>> Rodrigo de Salvo Braz: I'm not sure. Because the structure here is so -- you know,
dynamic programming usually depends a lot on knowing exactly what your variables are,
and everything is fixed. But we haven't thought that much about that. So maybe there is
hope. So some of the things that we can mention about instantiation: one possible way
of dealing with it is preinstantiating things that you know will eventually become
necessary, even if they are not necessary for the current evidence.
If you know the type of queries and evidence that you are getting, a template kind
of queries and evidence, then you can infer what variables need to be instantiated.
You could even do smarter things like, what's the -- what's the probability of
needing this variable, and instantiate it according to that probability? So you can
do pretty sophisticated things here.
Other things: using things like mixing time to decide how far to sample
back. Maybe the initial position is completely irrelevant at this point anyway. So
you can also try to make a decision about how far to sample back. And the closed-form
things and the other things that you mentioned are also ways of dealing with it.
Okay. Last issue on BLOG inference, one that's not related to [inaudible] instantiation
or even to DBLOG. It turns out that as we worked with DBLOG we found
many limitations in BLOG, because this data association, for example, was one
thing that brought many challenging issues that made us rethink BLOG.
So this is one of those items.
So, just showing here how basic things work, right: rejection sampling. You get
the evidence, then you sample a value for that evidence variable and you see if it's
equal or not, and you keep the sample if it matches the evidence.
And you have likelihood weighting: you have the evidence, you sample values for
its parents so that you know its distribution, and you weight the sample according
to the likelihood. Now, suppose you get evidence like this: I'm telling you the set of
values of happy(P), for each person P, is true, true, false. So how do I do likelihood
weighting here? I could sample the values of those variables, right? Suppose I know
I have three people, and then I sample that and I get true, false, false. That didn't match
the evidence, so I throw the sample away and I keep doing it.
In this particular example it doesn't look so bad because there are only 8 possible
value combinations, but it's still pretty bad. Now, if you have variables with lots of
values, then it's going to be pretty hopeless, and if you have continuous variables,
then it's really hopeless, right -- you're always going to get probability zero if you
just sample the values and then compare with the evidence.
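(To make the degeneration concrete, a tiny sketch of the naive check being described:
sample values for the variables and compare them to the observed set. With continuous
values the exact match almost surely never happens, so every sample gets weight zero.
The names here are made up for illustration.)

    import random

    def naive_set_evidence_weight(num_people, observed_set):
        # Sample happy(P) for each person and compare the multiset to the evidence.
        sampled = sorted(random.choice([True, False]) for _ in range(num_people))
        # Weight is 1 only on an exact match -- which is why this reduces to
        # rejection sampling, and fails entirely for continuous values.
        return 1.0 if sampled == sorted(observed_set) else 0.0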
This becomes just like rejection sampling. It's no better than rejection sampling,
because you're just sampling the values and comparing them to the evidence. Even
though technically speaking this is likelihood weighting, not rejection sampling, it's
the same thing, because the evidence is this variable here and we are not sampling
that. Because you have a deterministic function between the evidence and its
parents, it ends up being just as bad as rejection sampling.
So this is one of the problems that we came up against -- this is very much like the
blip situation, right? That's the type of query that's going to be more
common in a real application: I just tell you, okay, I'm seeing three blips, and
those are the positions of the blips, I don't know where they are coming from, and
so on.
So you have to give the likelihood of such evidence. And if you just sample
those variables, it's not going to work. And this was not supposed to be here.
So here's what we do. Instead of just blindly doing likelihood weighting -- which is:
sample values for the parents and then compute the likelihood -- here, when we
sample the value for a parent, we determine its likelihood, and we know its
distribution here, we know it's a Gaussian.
So now I know that to have a valid sample I need to have one of those observed
values. So I do an importance proposal that's going to pick one of those
values proportionally to its likelihood. So if I do that, here I
pick for example 1.2 for the parent. And the probability of picking 1.2 is
going to be proportional to the likelihood of each value. So I know the probability
of getting that value in my specific proposal; I can keep track of that.
And then I do the same thing for the second parent. Now I only have two values
available, I have this choice, I keep track of that choice's probability, and then for
the last one I don't even have a choice. So now I have a proposal
distribution that's going to be much more efficient; it's going to work.
And not only that, it's going to pick values that are not only valid but also
usually more likely, because I'm going to consider the distribution of the parent
before I pick the value. So this is one of the changes that we made. This is not even
DBLOG, this is just BLOG -- this is just the BLOG mechanism for giving likelihood
to evidence. And the nice thing about that is that, you know, the importance sampling
part of it is very well known, it's something that people do on a regular basis, but
usually people do this by hand. And the nice thing about having this language
that has high-level constructs like sets and so on is that the algorithm can
automatically look at these constructs and build a proposal that's going to be effective
without you having to say anything.
This is something -- I don't have to write a proposal to get this done; I just write
my evidence and the algorithm looks at it and sees this is a set, and that's how it
has to do it. So that's one of the advantages of having this higher level
structure available to the algorithm.
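(A sketch of the set-evidence proposal just described: each observed value is matched to
a parent variable with probability proportional to its likelihood under that parent's
distribution, and the choice probabilities are tracked so the importance weight stays
correct. The names and the Gaussian assumption are illustrative; this is not the BLOG
source code.)

    import math, random

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def propose_matching(parent_means, observed_values, sigma):
        # parent_means: sampled positions of the sources (e.g. the aircraft);
        # observed_values: the set of apparent positions we were told about.
        remaining = list(observed_values)
        matching, weight = [], 1.0
        for mean in parent_means:
            # Likelihood of each still-unassigned observed value under this parent's Gaussian.
            likelihoods = [gaussian_pdf(v, mean, sigma) for v in remaining]
            total = sum(likelihoods)
            if total == 0.0:
                return matching, 0.0
            probs = [l / total for l in likelihoods]
            # Pick a value proportionally to its likelihood; remember the proposal probability.
            i = random.choices(range(len(remaining)), weights=probs, k=1)[0]
            # Importance weight: true likelihood of the chosen value over its proposal probability.
            weight *= likelihoods[i] / probs[i]
            matching.append((mean, remaining.pop(i)))
        return matching, weight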
All right. Future directions and all that. One of the things I mentioned to you before
is that lifted inference is an exact procedure that requires clusters of random
variables to be symmetric. And that's not a very realistic assumption. So really
we have to do something that uses approximations. The intuitive example here
is the following. If I'm thinking about the population of the United States and I'm
trying to make decisions and estimate probabilities, like in elections and
things like that, I don't have to consider every individual separately. Even though I
may have particular facts about each individual, I may have databases
talking about all these people, I still can do some kind of lifted inference -- if I'm
willing to ignore irrelevant factors, or things that are somewhat relevant but not
much, or that are not going to affect the result all that much. So I can ignore that and
have something that has a margin of accuracy.
So there is a generalization of lifted inference that takes that into account, that
doesn't require symmetric objects. So that's a very important direction.
Another thing is parameterized queries, right? What I showed before, the queries
were always propositions. And the reason for that is that if you have unbound
variables in the queries, then you have to keep track of possible values for the
logical variables in the queries, and it complicates things. But it would be really
interesting to be able to ask something like, what's the probability of X being sick with
measles? And then the answer is something like this: if X is John, then the
probability is such; if X is Mary, the probability is such; if X is someone else, any
of the other people, the probability is such.
Also, it would be a nice link to logic programming, because that's what logic
programming does, right: it instantiates, it binds all those logical variables for you
when you give a query. So that's something that's still to be done. Also, we don't
have function symbols explicitly, so it would also be nice if we could write things like
that. As I was saying before, right, the lifted inference language is not very
expressive, the way BLOG is. So these are all directions to make it more expressive.
Also using equality, adapting the parameterization, things like that. And of course
learning the parameters of functions.
Okay. The take-home messages here are the following. It's useful to keep the first
order representation not only for representing your model but also for doing
inference, using it in the inference itself. If you do lifted inference, then you have
something that's equivalent -- it gives you exactly the same answer as the
grounded inference -- but it's much faster. And as for DBLOG, this is about
bringing these temporal processing techniques to these first order representations
and all the variations that [inaudible] requires.
So basically what I want to do in the future is to increase the lifted inference part
of it and come up with an inference engine that people can
actually start using for real problems, not very different from what Pedro
Domingos is doing with Markov logic networks. He already has quite an
advanced engine, and that's the direction that I also want to follow. He's using
very different techniques, though. Now he has started using lifted
inference, but before he was also doing propositionalization.
And that's it. Thank you very much for your attention, and for staying for so long.
[applause]