>> Eric Horvitz: We have Rodrigo de Salvo Braz here from UC Berkeley, who's
doing a post-doc with Stuart Russell's team. I'm very excited about this topic
area. For quite a few years the uncertainty in AI community has focused on
largely propositional representations -- Bayesian networks and all this rich
diagnostic reasoning and predictive modeling typically assume you have a fixed set
of propositions. And one of the directions that's kind of exciting, in opening up
our systems to have more of an open world intelligence, is to mesh first order
inference with the probabilistic calculus, and there's been kind
of an explosion of representations -- I don't call it an explosion, I call it a mini
firework of representations -- that combine first order logic with probability, with
probabilistic inference.
Rodrigo did his master's and bachelor's degrees at the Universidade de São
Paulo, and after that he got fascinated by intelligence and went off to Brown
University for a couple of years in the cognitive science department, doing work in
neural nets and in descriptive models of various kinds. And as we discussed last
night at dinner, he got a little frustrated with that work and went off to pursue his
Ph.D. more centrally in inferential methodologies, core AI, and this was at the
University of Illinois at Urbana-Champaign, working with Eyal Amir and Dan Roth
and others there.
He's been a post-doc in computer science at UC Berkeley and just recently
informed me that he's probably going to be taking a position at SRI pretty soon to
continue this work as a full-time researcher there. So, Rodrigo on first
order probabilistic inference.
>> Rodrigo de Salvo Braz: Thanks for the introduction and for having me over.
Very excited to be here. I have always wanted to visit Microsoft Research. It's
nice to finally make it.
Yeah, so today I'm going to talk about my work, my Ph.D. and also the post-doc.
It's actually two different parts of the same subject. In my Ph.D. I worked on what
we call lifted first order probabilistic inference, meaning inference that's
performed not only on a first order representation, but where the inference itself is also
carried out at the first order level, as opposed to having a first order representation
generate the propositions and then doing the inference at the propositional
level. So that was my Ph.D. thesis.
Then I went on to work with Stuart Russell. He was also working on first
order probabilistic inference, but from quite a different point of view. He was working
with a language that was more expressive, more complex; he was working with an
open universe assumption, but his inference was still propositional -- he was
working with sampling -- whereas I was working with simpler languages and more
sophisticated inference, because it was first order. But because of that I
couldn't deal with as much expressivity as he did. I didn't have the open universe
assumption.
So it was a nice complement to work with him on that, and that's what I have been
working on for the past year, in his language that he calls BLOG, for Bayesian logic.
So I'm going to talk about those two topics, and hopefully one day they will
integrate nicely and we're going to have all these things in one --
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yes. So that's it. So just ask any questions --
fireworks, if you like. Because sometimes I find that some of this
topic may look ambiguous to some people. It's good to clarify things right away.
If you want to follow along -- I'm surprised people don't have laptops here. Usually
everybody's like with their laptops [inaudible].
>>: [inaudible].
>> Rodrigo de Salvo Braz: Amazing. There you go. So for those who have
laptops, they can follow the slides from my Web page. All right. So these are the
parts of the talk. First I'm going to talk about first order probabilistic inference in
general -- the goal that it represents, what kind of things we want to achieve in
the future with it. Then I'm going to talk about my Ph.D. thesis, lifted inference,
then DBLOG, which is an extension to BLOG that I have been working on, and
the conclusions.
So the thing I'm most interested in is to abstract inference and learning from
different AI problems, right, we are all working on different problems in AI and
even though we share so many techniques, so many ideas about inference and
about probabilities and so on, still often it is the case that we have separate
solutions that are tailored to our problems. And of course that makes it harder to
reuse those solutions to new problems that come along and it's also harder to
integrate problems.
If you're working on one problem and then on another one, integrating those two
solutions may be quite a piece of work, and we have to rethink our models
because of that. So it would be nice to abstract all these problems and
solutions by separating the knowledge from the inference and learning
mechanisms that we are using. Once we do that, hopefully we can do this: we
can have just one box that does the inference and learning for us, so we can just
put in the knowledge about specific problems and use that. That's a vision that
AI has had since the beginning of AI, I suppose. But it's still something that's not
quite realized.
So if we want this magic box that does inference and learning, what should it do for
us? One thing that everybody seems to find very useful, especially if you're
describing more complex problems, is to have predicates or relations and
quantification over objects. This is something that's very powerful. You can
describe many different problems using this.
So at first people did that with logic, but then we also want other things. For
example, we'd like to have knowledge that has varying degrees of uncertainty,
because often that's the case in problems: we have knowledge, but it's
not certain knowledge.
Here I'm not committing to any specific uncertainty representation, so I'm just using
these brackets, these red brackets around my sentence, to say this is uncertain in
some way that I may commit to some form of later. But just to say, you know, it
may be that I have a tank; usually that means the color of the tank is green, but not
necessarily.
So those two things are pretty important for different problems, different
domains like language and vision and many others. And eventually, even though
that's not something I'm concerning myself with right now, you also want this box
to be able to use other things like modal knowledge and utilities and things like
that. But for now we're concentrating on the predicates and probability -- not
probabilities right now, uncertainty.
So I was talking about integrating problems, right? So if I have domain
knowledge about some common sense reasoning task -- I'm thinking about a
military situation with tanks and planes and the colors of objects and so on -- I
have that knowledge there. And I also may have, even related to the same
domain, something involving natural language processing. I also have knowledge
there about how verbs interact with meaning, what it means to
use the verb attack, and so on.
And if we share this inference and learning module, then it should make it easier
for us to actually integrate those things. And what's the point of integrating them?
The point is that you have a lot of synergy, a lot of feedback between those two
pieces of knowledge going on. You can resolve language
ambiguities by using common reasoning, common sense reasoning facts that
you are also observing. So bringing everything together may help you with all of
the problems at once. You can use the language for the common sense and vice
versa.
I think that when humans solve problems they use that a lot, and I think that's a
major direction for AI to integrate things as much as possible. And that's why I'm
so interested in, you know, a very general language that can do that.
So I was talking about predicates and things like that, and uncertainty. Logic has
those objects and properties -- it's a very rich language -- but in standard logics
the statements are usually absolute, so that doesn't do what we're trying to do here.
And graphical models, which are a machine learning technique, involve lots of
uncertainty, but then usually, as Eric was saying, you assume
a fixed set of propositions, and if you have objects that parameterize different
propositions, it's not so easy to use that with those models; you usually have to
have a separate thing, a separate module that generates new instances of
problems for you, and it's a separate, different stage, and usually that's even
[inaudible] solution, so that's not so convenient.
So just to give you a flavor of what you may do with graphical models if you
need to use objects and knowledge that is quantified: you have that piece of
knowledge there, so you're saying that an object is either a tank or a plane, and if
an object is a tank usually its color is green, and so on, and if an object is next
to a tank then probably it's also a tank. So if you want to use that
kind of thinking and you have the evidence that the color of a certain object is
red and the other is green, you can create a graphical model that represents that.
It's important to notice that even though I'm naming these variables with this not
so usual notation parameterizing with parentheses here, from the point of view of
a graphical model inference algorithm, those are just strings. They don't know
the structure that's involved in this problem. They don't know that this is the
same type of variable as that and that the same type of knowledge applies to
both of them. So the algorithm cannot take advantage of that structure.
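Just to make that concrete, here is a rough sketch of what propositionalization amounts to. The two-object domain, the predicate names, and the numbers are my own illustration, not anything from the actual systems being discussed:

```python
# Hypothetical two-object domain; the predicate names and potentials are made up.
objects = ["o1", "o2"]

# Ground the parameterized random variables into plain propositional names.
variables = [f"tank({o})" for o in objects] + [f"color({o})" for o in objects]

# Ground the parameterized factor "if X is a tank, its color is usually green"
# into one concrete factor per object.
def color_given_tank(is_tank, color):
    if is_tank:
        return 0.9 if color == "green" else 0.1
    return 0.5

factors = [((f"tank({o})", f"color({o})"), color_given_tank) for o in objects]

# To a generic graphical-model algorithm these are now just strings: the fact that
# tank(o1) and tank(o2) are instances of the same template has been lost.
print(variables, len(factors))
```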
Also, you have to do this transformation, right, and that's not something that the
framework of graphical models necessarily brings to you. That's something that
you have to come up with a solution for yourself. And as I said, the algorithm
doesn't know that this is talking about objects; for the algorithm
this is just a bunch of random variables.
And another interesting thing is that if I have different evidence, it's still the
same model, still the same problem, the same knowledge. I would like to
have one model that can be used with different evidence.
But here, every time I get new evidence, I get a different graphical model, so
formally, technically speaking, I get a new model, a different model that formally is
unrelated to the first one, and it would be nice to be able to formally establish
how they relate to each other and that they are in fact representing the same
knowledge and the same things. So we also lose that kind of thing if we do
something like this.
So what I just showed you is a technique called propositionalization, which is just
generating the random variables from first order representations. So a lot of
people worked on something like that, right; they came up with languages that were
very expressive, and when they actually had to do inference
usually they would create a graphical model, with the disadvantages that I mentioned
before, right?
And the languages have all different sorts of flavors. Some of them look like
graphical models, some of them look like logic, some of them look like
databases, some like description logics. Actually I think the field took quite a
while to understand that these things were doing essentially the same thing. It
wasn't so clear at first. It wasn't so clear what semantics a probabilistic
relational model should have. So that's why I think we got so many different
models. And eventually people started realizing we're really just talking about
predicates, and each ground atom is going to be a random variable, and so
on.
You also have, from Heckerman and Meek and Koller at MSR, one of the
systems in this line. So what people use is
what I would call smart propositionalization, because they don't generate
all the propositions, they don't just dump a set of propositions and a general
graphical model; they try to do something smarter than that. They only instantiate
those variables that are relevant to the evidence and the query at hand. And
so you can actually create something that's a lot smaller than you would get just by
propositionalizing. So that's usually the flavor of what people do.
Just to give you an idea about this type of model, one of them, which I find pretty
typical, is Bayesian logic programs, by Kersting and De Raedt. So they
designed the language to look like Prolog, but instead of using the usual Prolog
implication symbol they used the conditional probability symbol, because these
are like Prolog rules but they actually have associated CPTs, and also
combination rules: in case you have two different rules that are parents to the
same variable, you need to somehow combine those different CPTs into one.
So if you are given a bunch of rules like that, then you have something that will
generate graphical models for you. So that's more or less how it works. If you
have to decide whether a battalion needs rescuing, some kind of situation, and
that depends on whether there is a soldier in that battalion that's wounded. So if
you have a wounded soldier, you need a rescue operation there. And the way it
will work is by applying prolog inference and building the proof tree and using
that to build a graphical model like that.
So by doing that, you actually concentrate on the random variables that are
actually relevant to your problem, you don't propositionalize everything, but still
you go and instantiate in the case of a battalion, you're going to instantiate
variables for all the soldiers that you have in that battalion.
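A minimal sketch of that query-directed grounding idea, assuming a hypothetical membership table and hard-coded rules of my own (this is not the actual BLP machinery), just to show that the soldiers still get instantiated one by one:

```python
# Hypothetical battalion membership; in a real system this would come from the KB.
soldiers = {"b1": ["s1", "s2", "s3"]}

def parents_of(var):
    # needsRescue(B) depends on wounded(S) for every soldier S in battalion B
    if var.startswith("needsRescue"):
        b = var[var.index("(") + 1:-1]
        return [f"wounded({s})" for s in soldiers[b]]
    # wounded(S) depends on attacked(battalionOf(S)); hard-coded to b1 here
    if var.startswith("wounded"):
        return ["attacked(b1)"]
    return []

def relevant_network(query):
    # Backward-chain from the query, instantiating only reachable ground variables.
    nodes, frontier = set(), [query]
    while frontier:
        v = frontier.pop()
        if v not in nodes:
            nodes.add(v)
            frontier.extend(parents_of(v))
    return nodes

print(relevant_network("needsRescue(b1)"))   # still one ground node per soldier
```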
This generation of the graphical model happens even before you actually start doing
the probabilistic inference, so say you determine that a soldier is wounded; that
doesn't help you prevent the instantiation of other variables, because you
could in principle think of that, right? I already know this guy is wounded, I
already know I need a rescue, but because we're generating the graphical model
before even considering things like that, it's going to generate the graphical
model first and then it's going to do the probabilistic inference.
On the other end of the spectrum, as something that is a first order model that
looks like graphical models, there is work done by Laskey called Multi-Entity Bayesian
Networks. It's essentially a bunch of templates. Each of them looks like a
fragment of a Bayesian network, but the random variables have these
parameters, and in this case the model does know that those are
parameterized random variables, and uses them to instantiate new fragments.
So if you give evidence or give a certain situation, it uses those objects to plug
into the -- these variables here, then you have a bunch of fragments that have to
be combined into a new -- into a regular Bayesian network, and that's -- and then
it does the inference. So you just instantiate the fragments and you put
everything together.
So here, if you have two fragments on the same variable, you somehow combine them.
Not surprisingly, you get something very similar to what the Bayesian logic
program gives; you get a similar graphical model here, even though the representation
looks pretty different.
Okay. So that's the kind of work people have been doing until pretty recently. In my
Ph.D. thesis I did this lifted inference approach, which I'm going to explain
now.
Any questions so far? Just to clear that up? All right. So lifted inference.
Suppose I have that same query: I want to know if a certain battalion needs
rescue, right? Instead of instantiating random variables for each of the soldiers,
like we were doing before with those other approaches, I could do something that
looks more like logic inference, in which I determine the kind of things that will
be relevant to my query. But instead of instantiating everything,
I leave them uninstantiated; I just say, well, I just need that class of
random variables to answer this query.
What I'm showing is not a formal explanation, I'm just giving the idea of how you
can go about solving this problem without having to actually instantiate everything. It's
the way that maybe most people would think about solving it if they had to do it
by hand. We wouldn't start thinking about each individual soldier if we had to
decide that kind of thing. We'd think, well, a battalion has a bunch of soldiers, I
have to check whether some soldier in the battalion is wounded, so how do I do
that?
If I have another rule saying that a soldier can be wounded depending on whether
the battalion has been attacked, then I can check if the battalion
has been attacked and think about that. Well, the battalion has been attacked,
and I have so many soldiers. Each of them has a chance of having been
wounded, so what's the probability that I get at least one wounded soldier?
So I think that's how most people would do that by hand, they wouldn't go over
instantiating everything. And it's a faster approach. You don't have to
instantiate. It's more compact. It's also more intuitive. And it keeps the structure
of the problem, right, because during inference I know that this represents a
class of random variables, and I know that they're all related to each other, that
they all have the same knowledge on them, and I have that information available
for my algorithm. I can use that to make better decisions.
So that's the gist of lifted inference, just keeping things uninstantiated as much as
possible. This is just a very quick review of Bayesian Nets. I think you guys are
up to date on that.
>>: I have a question.
>> Rodrigo de Salvo Braz: Yeah? Sure.
>>: [inaudible] I don't know if you call it the any concept, if any soldier has been
wounded then that variable is true and then you mentioned having to do this
inference that an attack causes the probability of wounding of one particular
soldier and if you have N soldiers, you know, what's the probability that no
soldiers will be [inaudible] that's a complicated thing. How many of those sort of
complicated atomic things does this have to be able to do?
>> Rodrigo de Salvo Braz: Why do you say it's complicated? You have an
independent probability of each soldier being wounded, right?
>>: Right. In and of itself it's not complicated. In and of itself, it's -- you know,
that's a procedure that you could follow to see that if you have some probability
of something happening to any member of a collection, you know, that what's the
probability that it hasn't happened to any at all, it's happened to some, one of
them, do you -- is there any other type of inference that you --
>> Rodrigo de Salvo Braz: You're talking about -- this is almost like an inference
step.
>>: Right.
>> Rodrigo de Salvo Braz: A way of doing inference. It's almost like an
inference rule.
>>: Right. Are there other inference steps?
>> Rodrigo de Salvo Braz: Yes. That's a good point. This is the main one.
That's the one that's the essence of it, I would say, but --
>>: Each [inaudible] in that collection has a probability by itself and --
>> Rodrigo de Salvo Braz: Yeah. But that doesn't work for all situations, right,
because you may have situations in which things are not as independent and
then I'm going to talk about that. There is another way of doing it which I don't
show right now, but takes into account the dependencies between things.
So you can think of it as inference rules and you may have some of them
available. We have kind of -- we have two different ones and we have a third
one that kind of links the first two together. It maps one situation -- well, after I
talk about the actual operations then I will mention this third one.
Okay. So in Bayesian networks you have a joint probability that's the product of
conditional probabilities. You can also use factor networks, which is a notation that I
use more often, so I'm showing it here. This is essentially the same thing. You
have a joint probability that's proportional to the product of these functions. You
can represent that Bayesian net using factor networks if you want. Then you
have the task of marginalization. You just sum out a bunch of variables. You
can do that with variable elimination.
You can efficiently factor functions out and start obtaining new factors. Here I'm
using the color green to show new factors obtained by summing others out. So
you do that and you eventually get a function that's only on your variable of
interest, and that gives you the marginal probability.
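As a reminder of the propositional operation that everything here builds on, a tiny variable elimination sketch with factors represented as tables; the variables and potentials are arbitrary, just to show the multiply-and-sum-out step:

```python
from itertools import product

# A factor is (variables, table) where table maps binary assignments to values.
def make_factor(vars_, fn):
    return (vars_, {vals: fn(*vals) for vals in product([0, 1], repeat=len(vars_))})

def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    vars_ = v1 + [v for v in v2 if v not in v1]
    def val(assign):
        a = dict(zip(vars_, assign))
        return t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return (vars_, {vals: val(vals) for vals in product([0, 1], repeat=len(vars_))})

def sum_out(factor, var):
    vars_, table = factor
    i = vars_.index(var)
    new_vars = vars_[:i] + vars_[i + 1:]
    new_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return (new_vars, new_table)

# phi1(A,B) * phi2(B,C); eliminating B yields a new factor on (A, C).
phi1 = make_factor(["A", "B"], lambda a, b: 0.9 if a == b else 0.1)
phi2 = make_factor(["B", "C"], lambda b, c: 0.7 if b == c else 0.3)
print(sum_out(multiply(phi1, phi2), "B"))
```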
So I was just showing that to kind of introduce the graph notation that I was using
and so on. So suppose I have a factor network with lots of redundancy like that --
again, here the parentheses only have meaning for us, not for the model. I have
several diseases and I may have epidemics of those diseases going on, and I
may have people who are sick with those different diseases, and people may be
hospitalized or not. So whether Bob is hospitalized or not depends on the
various diseases he may have and so on, and whether people are sick depends
on whether there are epidemics and so on.
And I'm assuming here -- and that's a very naive model -- that this function is exactly
the same function as that, so if there is an epidemic of flu, the probability of
somebody getting that disease is exactly the same as, if there is an epidemic of
measles, someone getting measles. It's a naive thing, but it will do for
illustrative purposes now.
So if I have a model with lots of redundancy like that, I would rather represent it
in a more intuitive way. Just talking about different diseases and people getting
sick and being hospitalized, so I could just say, well, those are the relations that I
have in my model. This slide is just here to show and define terms.
So I could call those classes of random variables atoms, just
because they are pretty much the same thing that you have in logic.
And you have the logical variables which are the parameters to these atoms.
Those factors that actually represent many different instances of the same
function we call par factors for parameterized factors. And we can also constrain
their instantiations. I can say well that factor holds between instances of those
two classes for the cases in which P is different from John. So for John it doesn't
apply.
So that's the kind of representation that I'm using for lifted inference.
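Roughly, the representation just described could be rendered as a data structure like this; the field names are mine, not the ones used in the actual lifted-inference implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    predicate: str            # e.g. "sick"
    logical_vars: tuple       # e.g. ("P", "D")

@dataclass
class Parfactor:
    atoms: list               # classes of random variables the factor couples
    potential: callable       # maps a truth assignment of the atoms to a number
    constraints: list = field(default_factory=list)   # e.g. [("P", "!=", "john")]

# "sick(P, D) depends on epidemic(D), except when P = john"
pf = Parfactor(
    atoms=[Atom("epidemic", ("D",)), Atom("sick", ("P", "D"))],
    potential=lambda e, s: 0.6 if (e and s) else 0.4,
    constraints=[("P", "!=", "john")],
)
print(pf.atoms[1].predicate, pf.constraints)
```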
>>: [inaudible].
>> Rodrigo de Salvo Braz: You mean I could represent a constraint as another
factor, is that what --
>>: Parameterized factor.
>> Rodrigo de Salvo Braz: I could do that. We keep them separate because
those are the deterministic factors. They have their own different behavior, and
it's more convenient for computational purpose to keep them as constraints. It's
going to be hopefully more clear when I show it.
So what's the semantics of such a model, right? So if I give you this first order
model, what's the semantics of it? It stands for exactly the same thing as if I
propositionalized everything. It stands for full instantiation. That doesn't mean I'm
actually going to do it, it just means that's what it's representing, that's the joint
distribution it represents. And each factor here is just going to be the
instantiation of the par factor with particular objects. If I do that, I have a
standard graphical model and I have a joint distribution.
So let's see the operations that we can do in more detail, all right. So the first
one which is the one we were talking about is inversion elimination. And
inversion elimination works the following way. In just one step I can sum one of
the classes out just like that. It's very much like variable elimination but of course
it's not variable elimination because here I'm talking about classes of random
variables, I'm talking about when I sum over this it means I'm treating this atom
as if it were an individual random variable, even though it's not. That's how the
operation works.
And of course to do such a thing, you have to justify that it's correct, that you can
treat an atom as an individual random variable and that's still valid, right? But
that's what it does.
So you just sum it out, you get a new function that's on the remaining classes,
the remaining atoms and you have a new first order representation that
represents the marginalization over the instances, all the instances of sick PD.
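Schematically, and only for this particular epidemic/sick example, the identity that inversion elimination exploits is that the big sum over all instances of sick(P, D) pushes inside the product:

\[
\sum_{\{sick(P,D)\}} \prod_{P,D} \phi\big(epidemic(D),\, sick(P,D)\big)
= \prod_{P,D} \sum_{s \in \{0,1\}} \phi\big(epidemic(D),\, s\big)
= \prod_{D} \Big(\sum_{s \in \{0,1\}} \phi\big(epidemic(D),\, s\big)\Big)^{|People|},
\]

so the summation is done once for a generic person and the same result simply applies to everybody.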
Yes?
>>: So here if I -- so for you have not shown [inaudible] between any of your
[inaudible] models which [inaudible] sum action. So if I understand that correctly,
for a specific P [inaudible] I have probability of epidemic, I have probability of sick
given epidemic and I sum over the sick and I get the epidemic.
>> Rodrigo de Salvo Braz: Yes.
>>: And if it's the same for all Ps, then of course I can do it for an average rate.
Is that what you are assuming -- that for every person the relationship between sickness
and epidemic is the same?
>> Rodrigo de Salvo Braz: No.
>>: And therefore you do it once on instantiation?
>> Rodrigo de Salvo Braz: Yes. That's right. But it doesn't work all the time
because here these variables, they don't interact with each other. The instances
of sick PD, they are not interacting directly with each other. If they did then if you
work out the math, you see that you cannot do that. If you do that, it's just going
to be incorrect.
>>: So [inaudible].
>>: Are you pushing to the direction which we look on the [inaudible] on the first
order representation and you do something there that is more efficient in some
ways than going down to the [inaudible].
>> Rodrigo de Salvo Braz: Yeah. Actually the rest of this is like this, right.
So the idea is that I could instantiate everything, do variable elimination over all
the instances, obtain these guys here that are just over epidemic random
variables, and now these instances can be represented by that. Instead I'm just
doing it directly at the first order level. I never go down to the propositional level.
You can think about resolution, right, in logical reasoning.
If you have first order resolution you can look at it from that point of view. You're
doing resolution over quantified clauses, and it is just as if you had instantiated all
the clauses, then resolved each of them, and then gone back to the first
order level. So this is the same idea.
>>: So I'm confused. So you're still summing over all sick like all people times
disease number of sick, right? The sum that you showed is still a sum over all
like number of people times number of diseases?
>> Rodrigo de Salvo Braz: No, the sum is just over one symbol. I pretend that
that's an actual random variable. But let me get that. So this a binary variable,
right? So I'm just going to say, suppose there's a false, I plug it in here.
>>: You mean there's a false for all peoples and all diseases or a given person
in a disease?
>> Rodrigo de Salvo Braz: You can think of it for any person I'm going to do a
summation for -- because these things are completely separate, right, they are
like this, have this parallel structure, I can pretend that I'm doing the summation
just for this guy. Once I do it for this guy, this is going to be exactly the same
because the potential doesn't actually depend on the objects. It only depends on
whether this is true or false.
>>: Right.
>> Rodrigo de Salvo Braz: So if I do the summation for here, I obtain a new
function that depends on that and that's going to be exactly the same function I'm
going to get for this one. Because all these subproblems they all have exactly
the same structure and the actual numbers don't depend at all on the objects.
>>: So the number depends on the true or falseness of the predicate?
>> Rodrigo de Salvo Braz: Yes.
>>: So the sum needs to know -- I mean, you're going to end up with true terms
and false terms, so I mean, you just need to know the number of true, the
number of false, and that's how you do the sum? I mean, you can move on if you
like.
>> Rodrigo de Salvo Braz: That's okay. You're asking if I need to know the
number of objects that are true and the number that are false?
>>: It's going to depend on that, right? So --
>> Rodrigo de Salvo Braz: Yes, but because you have this nice structure here
and it's not something that's general, it's only some situation like this one, you
can actually ignore that fact. Because they are all independent, so you can -- it's
as if you were factoring things out. Suppose that you do decide to do it on the
propositional layer. You're going to have your summation, it's going to be a
summation over all these random variables, right, all of them, you're going to
have many of them, and you also instantiate all the factors. Because they are all
independent, you can factor each instance of a factor to its corresponding
summation. And you're going to end up with a summation over one factor here
times a summation over another factor here, and they're all going to have exactly
the same structure because they don't depend on the object.
So you do the first one or whatever, and you know the result for all of them. And
then you get a new -- and then you get these guys, a bunch of instances of
factors that all look the same and because they all look the same, you can just
abstract back like that. I'm going to move on because otherwise it's going to take
a lot of time. But we can talk more about it if you want. Are you on my schedule,
by the way?
>>: Yes.
>> Rodrigo de Salvo Braz: Okay. Good. So inversion elimination, this was
something that was proposed by David Poole in 2003, and he didn't
actually formalize it very completely, and the way he presented it was as if it
would work every time. Basically he was saying what you were saying, more or
less: oh, I'm just using generic objects, it applies to all the objects. But
actually that doesn't work all the time, because if the objects -- if the random
variables interact, then you don't have that nice separation anymore.
So what we did in the first paper on this subject was formalize it more carefully, and
then we realized, okay, this doesn't work all the time; what are the situations in which it
is correct, and what are the ones in which it's not correct? So here's the limitation,
right?
So because I have all the separate structures that have exactly the same
structure, I can do an operation on one of them and I'm going to get exactly the
same result. So it works. But if I have something like this, right, suppose I say
well, what's the relation between the variables for epidemics of two different
diseases, depending on the month, so I have that factor, right, and that
relationship. Suppose I want to marginalize over the epidemic variables; I
cannot just apply inversion elimination and obtain something like that and be
done. That's not going to work. I'm going to get something, but it's just not going
to be the correct thing. And the reason for that is that if you [inaudible]
something like that you obtain this, you obtain a very connected graphical model,
and things are not separate anymore, so when you're summing over one of the
variables, that's going to influence the summation of the other ones, so you -- it's
not true anymore that the summation over one variable is going to give you
exactly what the other would give.
So that's the limitation of inversion elimination. And because of this limitation,
we started thinking of other ways of solving things. So for this problem, even
though I have all this interaction between these random variables, there is -- this
is still a very simple thing in a way, because these are all symmetric variables.
They all have the same structure, they all have same knowledge on them, so it
seems unnecessary to treat each of them individually because it's really just a
mass of symmetric variables. So can we take advantage of that symmetry?
So as we were talking about before, we need to consider joint assignments of
those, and that's going to be an exponential number of them. But because they
are symmetric variables, in an assignment, when I say well this is true, this is false,
and so on, it doesn't really matter which ones are true and which ones are false, it only
matters how many are true and how many are false. So if instead of iterating over
all assignments I just iterate over the histograms, the number of trues and falses
that I have in that cluster of variables, that's going to be enough. And that's going to
be a polynomial number of [inaudible] instead of an exponential one. So that's the other
operation -- we were talking about two different operations, and that's the other
one: counting elimination.
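Roughly, for this particular epidemic/month example (n diseases, ordered pairs of distinct diseases, month m), the kind of identity counting elimination relies on is

\[
\sum_{\vec{e} \in \{0,1\}^{n}} \prod_{D \neq D'} \phi\big(e_D, e_{D'}, m\big)
= \sum_{k=0}^{n} \binom{n}{k}\, \phi(1,1,m)^{k(k-1)}\, \phi(1,0,m)^{k(n-k)}\, \phi(0,1,m)^{(n-k)k}\, \phi(0,0,m)^{(n-k)(n-k-1)},
\]

since an assignment with k trues contributes the same value regardless of which diseases are the true ones.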
And that's a more expensive operation because even though you have this
polynomial number of assignments, it's a number of assignments that does
depend on the number of objects that you have. I didn't mention that before, but
inversion elimination actually doesn't depend at all on the number of objects.
You may have a million objects. Doesn't matter because you're going to do the
summation for this generic object. You're going to do it only once.
>>: [inaudible] one specific [inaudible] epidemic independent or [inaudible]
without --
>> Rodrigo de Salvo Braz: Well, this is an undirected graphical model, right? So --
>>: Undirected where it's [inaudible] 3?
>> Rodrigo de Salvo Braz: Yes. So this notation that I'm using is just like a
Markov network, only I'm actually explicitly representing the functions over it.
So this is one clique. Well, I'm not linking this guy here, but this guy's linking to each
of those functions. This is a proposition. It links to all of those instances.
Yeah, now I'm kind of regretting not having put the equations here. I think you
guys would have appreciated that more. But I can go over that with you if you
like. I just wanted to keep it more high level, not get down to details. But the
idea is that you're going to have a summation again but instead of being a
summation of a generic random variable, you're going to have summation over
the number of values that you have in this cluster, and then you're going to get,
you're going to get a potential just on the remaining variable month. So basically
for each month you're going to have a different potential. Yeah?
>>: So this is assuming you don't have any evidence that distinguish between
like [inaudible] so what if you have everything --
>> Rodrigo de Salvo Braz: Yeah, that's a good question. That's actually a
crucial point, because this all depends on things being indistinguishable and
symmetric, and if you have knowledge about specific objects then you actually
have to separate those cases. And you do that in the following way. If I
tell you I know for a fact that there is an epidemic of the flu going on, so what you
do, you come here -- and that's why I was keeping the constraints separate,
because they come in really handy when that happens -- because then I'm going to
break this par factor into different parts. A par factor is just a set of
instantiations, right? I'm going to separate the instantiations involving the flu and
the instantiations that don't involve the flu. And the ones that don't involve the flu are
what we call the residual. It's going to be like this: the disease different from that one
but also different from the flu. And the other one is going to already be
instantiated; one of them is going to have the flu.
So you separate that. And then once you do that, then again every par factor
involves things that are completely indistinguishable and symmetric. So you go
on with your calculations. But in the presence of evidence, you have to do this.
It's what we call shattering. Actually, it doesn't even depend only
on evidence. If your knowledge distinguishes things in some way, if I have some
special rule for the flu, then I have to do that as well; I have to combine that with
this rule and separate them.
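A toy illustration of that shattering step, assuming a constraint representation of my own invention (this is only the flavor, not the real algorithm): given evidence about one specific disease, the parfactor over all diseases is split into an instantiated part and a residual that carries an extra constraint.

```python
# Split a parfactor's constraint set on evidence about one specific constant.
def shatter(parfactor_constraints, logical_var, observed_constant):
    instantiated = parfactor_constraints + [(logical_var, "=", observed_constant)]
    residual = parfactor_constraints + [(logical_var, "!=", observed_constant)]
    return instantiated, residual

# Evidence: epidemic(flu) is known, so split on D = flu vs. D != flu.
on_flu, rest = shatter([], "D", "flu")
print(on_flu)   # [('D', '=', 'flu')]   -- handled with the evidence plugged in
print(rest)     # [('D', '!=', 'flu')]  -- still fully symmetric, handled lifted
```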
Actually this is an interesting thing in terms of logic, because it starts
looking a lot like resolution and unification: when you have to identify the
intersection between two par factors, it looks very much like unification and
breaking clauses, like we do in logic. So it's an interesting link to think about,
between logic inference and this type of inference. It starts looking like that.
And that's a major limitation of the approach right now, because if you have a
database, right, with facts about each of your objects, then this is not going to be
helpful to you at all. Because you're going to have to look at that database and
take all those facts into account and basically propositionalize; you're going to
have things for each object. So it wouldn't be helpful in that case. So that's still a
major limitation.
It's not a limitation that you cannot get around, because you can do
something -- I'm going to talk about that later -- but essentially you can do some
kind of approximate inference where, when objects are not exactly the same but they
have similar properties in similar situations, then you can still consider
them as part of the same class. So there is a generalization that takes this
approximation into account. But right now, because this is exact inference, we're
not doing that.
Other questions before I move on? So this is lifted inference. If you run
experiments on this, very predictably you'll get graphs like that, right? The first
operation, inversion elimination, doesn't depend on
the domain size at all, so if you propositionalize and do inference, it's
going to be an exponential curve, while lifted inference remains the same.
And here, even though counting elimination does depend on the domain size, it's
really hard to get a graph that shows that, because the propositionalized version
still grows a lot faster. So for this case, in which you have clusters of symmetric
variables, these are much faster operations than propositionalizing.
All right. So I'm going to move on to the second part of the talk, which is about
BLOG and DBLOG. So I have 15 minutes, right, so it's not much.
>>: [inaudible].
>> Rodrigo de Salvo Braz: So let me tell you a little bit about BLOG, Bayesian
Logic, which is the work that Brian Milch and Stuart Russell have worked on
during Brian's Ph.D. thesis.
So it's a probabilistic logic language, just like those others I presented, and its
inference is just propositional sampling. You propositionalize variables as you go
and you sample them. And what distinguishes BLOG from other approaches is
those two things. It's an open universe language, so it doesn't assume you know
the number of objects in the world. You have distributions on the number of
objects in the world, which is a nice feature and very realistic,
because usually we don't know how many objects are out there. And it's also a
very expressive language; it looks very much like a programming language, and you
can write very arbitrary things in it. So here's an example of a BLOG program.
So you declare types, so I say I have this type battalion and the number of
battalions is a uniform distribution. I have this property of battalions that's a
boolean property, that's just going to be a Bernoulli distribution there, whether
battalion is large or not. The region of a battalion is going to be a
natural number, with a distribution from zero to three. And then you have
another type of object, soldiers. And the number of soldiers is a parameterized
variable, it depends on the battalion. So for each battalion you're going to have a
different set of soldiers. That's going to be a -- that's going to have a distribution,
too.
And you can write that the number of soldiers of a battalion is going to depend on
whether the battalion is large and the region of the battalion. And then you can
place queries like what's the average number of soldiers per battalion, for
example. So because I have distributions on the number of objects, that's an open
universe assumption, and it's a pretty expressive language. I didn't really
show when I was talking about par factors, I didn't want to be going down to
details of syntax and so on, but usually those par factors, they are just tables,
right? So it's a much simpler representation as opposed to here in which you
actually describe the computation of the distribution much more carefully, and
you have a lot of structure available there. It's a more expressive language.
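Roughly, the generative model just described -- if I rendered it as an ordinary sampler rather than in BLOG syntax -- looks like the following; the particular distributions and parameters are made up for illustration, not taken from the actual model:

```python
import random

def sample_world():
    world = {}
    world["n_battalions"] = random.randint(1, 5)              # number of battalions
    for b in range(world["n_battalions"]):
        world[f"large({b})"] = random.random() < 0.4          # Bernoulli property
        world[f"region({b})"] = random.randint(0, 3)          # region in 0..3
        mean = 100 if world[f"large({b})"] else 40            # depends on Large, Region
        world[f"n_soldiers({b})"] = random.randint(
            mean // 2, mean + world[f"region({b})"] * 10)
    return world

# A query like "average number of soldiers per battalion" is just a statistic
# of sampled worlds.
w = sample_world()
print(sum(w[f"n_soldiers({b})"] for b in range(w["n_battalions"])) / w["n_battalions"])
```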
You're also talking about sets of objects -- you're talking about a variable, say an
average, that's going to be over a set of objects. It's a much more expressive
language. So, inference in BLOG. Just going back to that first example that I
was using of a soldier being wounded and a battalion needing rescue and stuff
like that, so I'm just writing that in BLOG now. The only difference is that for the
probability of a soldier being wounded I am making the model more detailed now,
and I'm saying that it depends on whether the soldier is experienced or not --
the probability of them being wounded is different.
But basically that's the same example as before. So if I have a BLOG model like
that and I want to know what is the probability that a battalion needs rescue, so
how does the inference work? It's sampling, right? So when I have this query, I
need to know what its distribution. So that's going to depend on its parents.
And its parents, as we see here, you have this -- it's an substantially quantified
sentence. So that's going to depend on all the soldiers. I have to look at all the
soldiers, the set of soldiers and see which ones are in the battalion and how
many of them are wounded. So that's one of the parents of the -- of my variable
here is the set of soldiers for that battalion.
So I sample values for that. And once I have a set of soldiers, I have the parents
of this variable that I need which is going to be the variables wounded for each of
the soldiers. Once I do that, I have to sample, okay, is the soldier wounded, now
I need to sample for that. For that I need to determine its distribution, I need to
know which -- I have to have values for its parents. So the parent there is
whether the battalion has been attacked.
So if I do that, I can sample a value; if I get false, it means the battalion's not
attacked, so the soldier is not wounded. And unfortunately I'm going to do the
same thing for all the variables, even though I know that the battalion's not
attacked and I should be able to tell that all the soldiers are not wounded, because
the battalion wasn't attacked. But I'm going to sample values for each of them, and it's
going to be false for all of them, because that's the distribution.
And because of that, whether the battalion needs rescue or not is false. So that's how it
works. Just another example --
>>: [inaudible] some things --
>> Rodrigo de Salvo Braz: Yeah?
>>: Some things are interesting there. The notion that battalion has members,
i.e., the soldiers, and then if this attack thing holds for the battalion holds for each
of the members, right, we needed to know an attack, it wasn't an individual
soldier that was attacked but that the battalion was attacked and therefore each
of its members was attacked.
>> Rodrigo de Salvo Braz: I think when you write the statement there, let's see,
so here, right, battalion -- so I have this variable that says to which battalion a
soldier belongs to -- that's something I didn't go over in much detail, but
when I have a number statement for soldiers I'm saying for each battalion I have
a number of soldiers, and that actually constitutes a function on soldiers. So I
have this function called battalion of that's going to give me the battalion to which
the soldier belongs.
So if I do that, I have -- if the battalion to which the soldier belongs is attacked,
then I have a -- right? Does that make sense? I don't know if that answers your
question.
>>: Yeah. So if you go back to that picture.
>> Rodrigo de Salvo Braz: Yeah.
>>: The way you were explaining it, you were sort of going through variable by
variable and saying okay, what parents do I need to know about in order to figure
out if this variable is true? And you've got to wounded soldier one and you said
oh, I need to know about attacked and then when you got to the next wounded
soldier, you knew that that was going to depend on that same.
>> Rodrigo de Salvo Braz: On that same variable, yeah.
>>: Presumably in other cases it would depend on variable that's not the same.
>> Rodrigo de Salvo Braz: Yes.
>>: So this there was this membership concept. Are there any other concepts
that we should know about that come in handy when you're deciding what
depends on what?
>> Rodrigo de Salvo Braz: I don't think those are separate cases because what,
what I do is all the same thing, I look at the statement, the dependency statement
for this, and it says well look at the battalion of this soldier and that's going to be
your parent. So it determines there, okay, I need to know this variable. This guy
is going to do the same thing and it's going to -- just because that's how the
computation works, it's going to get exactly the same variable, and then it's going
to look up there.
>>: So [inaudible] like battalion which are [inaudible]?
>> Rodrigo de Salvo Braz: Yeah.
>>: As you wish and then you just sort of [inaudible].
>> Rodrigo de Salvo Braz: Well, they're not arbitrary exactly, because --
>>: [inaudible] they get.
>> Rodrigo de Salvo Braz: This function battalion of is what we call an origin
function, it's actually -- it has a very -- it's not even a probabilistic variable
because when I need to know the number of soldiers in a battalion, right, I have
to sample that. Okay. I need to know how many soldiers this particular battalion
has. I sample a number. In that case was 73.
When I do that, I generate 73 objects that I place on my universe. And then I
define the origin function for each of them to be that battalion that generated
them and that's a fixed value for all of them.
So when I go to the wounded dependency model, I say, well, is the soldier
wounded; then I plug the soldier in here, and the battalion is already determined because
that was determined when the soldiers were generated, so I have a value here
battalion 13. So now I know that I need to know whether battalion 13 has been
attacked or not. And then I sample that.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah. It's very much a generative model.
>>: How would -- like, for instance, would you be able to model several
battalions with several soldiers where like a single soldier can belong to
battalions like you can be [inaudible] or like you know or does that get too
[inaudible].
>> Rodrigo de Salvo Braz: No, you can do that. You can do that. But those will
be origin functions because the origin function's defined as a parameter of the
number here, right. Once you have that, you're saying all the soldiers are coming
from same battalion.
>>: Right.
>> Rodrigo de Salvo Braz: So that's it. But you know, I can define arbitrary
functions on things. I could also have another variable here saying alternative
battalion of and then it could be anything.
>>: So you could [inaudible].
>> Rodrigo de Salvo Braz: Yeah. And then I could use that variable if the
battalion's been attacked or the alternative battalion's been attacked. So even
though those origin functions they are under the hood different, they are
treated -- you use them in the same way as any other function. So it becomes
a very uniform process. So do you see now that it's actually always the same
case, that you're always looking at the parents like that?
I'm asking if that like the other [inaudible].
>>: I guess what I'm getting at is like if you -- this is like a programming language, right?
And if you define a programming language you say okay, I have base types, I
have integers, I have characters, I have real numbers, I have basic atomic
functions, I can add, I can subtract, I can multiply. I'm trying to get at what
exactly is the set of basic operations that you could define here, and the basic
types and so forth. Like if you do define it as a programming language.
>> Rodrigo de Salvo Braz: Well, you'd have.
>>: What's the [inaudible] things?
>> Rodrigo de Salvo Braz: You have the [inaudible] types like natural number
and boolean and things like that. You also have user-defined types like soldier
and battalion. And then you have these operations, right? Here I'm not showing
many of them, but you have things like plus and minus and all that. And it always
works the same way. You execute these instructions: every time you need the
value of a variable, you look to see if it's already defined and already has a value,
or if not then you sample a new value for it. And when you sample a new value
for it, you're going to trigger this recursive process in which you sample that
variable's parents and so on. So eventually you sample them and then you have
this partial world representation.
So it's all just this loop: sample parents, get distribution, sample value. If you
get evidence it's the same thing, too: you get the evidence and you need
to sample the parents of the evidence and see if that's a likely explanation, and so
on.
So this is one example, right, in which I had to sample values for all of the
wounded variables. Here is another one. Suppose that now when I sample
things, in this particular sampling run the battalion
has been attacked. And now it's a different sampling run, so I actually have a
different sample of soldiers. In this other world here that I'm generating now,
that battalion has 49 soldiers. So the battalion's been attacked, so now -- and
that's an interesting thing in BLOG, right? It uses context-dependent
independence here, which is called context-specific independence: now that I
know that the battalion's attacked, this variable matters -- to know the
distribution of wounded I actually need to know whether the soldier's experienced
or not. So I go through the dependency model here: okay, it's been attacked,
so now what's the experience of the soldier? And then I'm going to sample
something here. Once I do that, then I have a distribution for wounded.
Suppose it's true. And if it's true, then just like in a regular
programming language, if you have a disjunction and you get a true, you
know that the result is true and you don't even need to look at the others.
So in this particular case we take advantage of that, and for the other soldiers you
don't actually have to do any sampling, because you already answered your
query. But you have to run this many times, and many times it's going to be the case
that you have to sample all the soldiers.
So BLOG has this lazy evaluation property, which is nice because, as opposed to
some other methods in which you propositionalize first and then you start
doing the inference, BLOG does the inference and propositionalizes as it
goes. And as a result of that, it saves work sometimes.
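A toy version of that lazy, memoized sampling loop might look like the following; this is my own sketch, not the BLOG engine, and the rules and numbers are invented:

```python
import random

world = {}          # partially instantiated world, filled in on demand

def value(var):
    # Sample a variable only when its value is actually needed (lazy evaluation).
    if var not in world:
        world[var] = sample(var)
    return world[var]

def sample(var):
    if var.startswith("attacked"):
        return random.random() < 0.3
    if var.startswith("wounded"):                 # wounded(s, b)
        b = var.split(",")[1].rstrip(")")
        if not value(f"attacked({b})"):
            return False                          # context-specific independence
        return random.random() < 0.2
    if var.startswith("needsRescue"):             # needsRescue(b)
        b = var[var.index("(") + 1:-1]
        n = value(f"n_soldiers({b})")
        # short-circuit: stop as soon as one wounded soldier is found
        return any(value(f"wounded(s{i},{b})") for i in range(n))
    if var.startswith("n_soldiers"):
        return random.randint(10, 80)
    raise ValueError(var)

print(value("needsRescue(b1)"), len(world), "variables instantiated")
```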
So DBLOG is about trying to use BLOG for temporal
models. Now, because BLOG is so expressive already, you don't really need
anything more to express temporal models; you can just use natural numbers as
time steps, write your statements so that they depend on the previous time
step, and you're done. So it's a legitimate question: why do we even need to
bother with dynamic BLOG? You could just write a BLOG model that has time.
And the reason for that is that temporal problems have a lot of structure that you
want to take advantage of. So you have this Markov property there; if the
algorithm knows that that's happening, it can take advantage of that. Also,
usually in these problems you get the evidence and queries successively
for time steps, so usually you know that once you are in a new time step you're
never going to get evidence about previous time steps. And you may want to
take advantage of that.
So the language remains the same, but it's still nice to use a different algorithm
that takes advantage of this and is more efficient for temporal models.
So we're working on dynamic BLOG. We do make this change to the language:
if you just use natural numbers, the algorithm doesn't know whether a
natural number is actually representing time or not, all right? Because it could
be just some natural number. So what we do is have this new type,
time step, which is just natural numbers but is actually telling you that this is
representing temporal structure. So the algorithm knows.
But it works exactly the same way. You just have different literals: now instead
of zero, you have a time step literal, things like that. But it works exactly the same. So
much so that if you give this to the BLOG inference algorithm it still deals with it,
because time step is just one of the [inaudible] types. But it's not going to take
advantage of the structure. So we have the BLOG particle filter, right? So
particle filtering, as most of you know, just works like this: you have some sample for a
state for a particular time step, and then you get new evidence, use likelihood
weighting to see how likely an explanation each of them is, and then you
resample to get a population that represents the more likely hypotheses.
And we do the same thing in BLOG, in DBLOG. So you have a world there.
You have this partially instantiated Bayesian network that
represents a particular explanation that has been sampled, and then -- oh, wait a second.
This slide is still about dynamic Bayesian networks, so that's not about BLOG.
But it's essentially the same thing. You just sample the new variables using the
time slice and you generate a new particle.
So for DBLOG you have these particles. The particles now represent what I was
showing before, that network of variables that require the instantiation of each
other and so on to generate an explanation for your query or your evidence. So
when you get new evidence -- and the evidence is going to be BLOG statements
like that -- then you need to explain it. So here what I'm saying is that -- well, I
didn't actually explain this model, so let me go back really quick. This is about
aircraft. I have a bunch of aircraft. They have their positions at each time step;
that's going to be a random walk depending on the previous position. And I have
blips that an aircraft generates on the radar. It can be one blip, or maybe it's not
detected in that particular time step, right? The number of blips is going to
depend on the aircraft and on the time step. So at each time step I may or may
not generate a blip from an aircraft.
I can also have another set of blips that are just false alarms. I can have some
noise blips. And those blips are going to have the apparent position of an aircraft
that's going to be Gaussian noise around the position of its source. So here
source is another one of those origin functions, right? That's going to be the
aircraft that generated the blip. So I just look it up here. That's the position of my
source. Well, actually that shouldn't be the previous time step, it should be the current
time step. That's the position of my source at this time step and then there is a
Gaussian noise and that's the position of the blip.
So that's the model I'm using here. So suppose I have the situation where I'm
observing now that the number of blips -- that I have one blip only that comes from
this aircraft, that I have exactly one. So it has been generated. And the
apparent position of this blip is 3.2. So I need to get the distribution for this
variable, the distribution of B1 at time two -- because, yeah, I was using time steps.
So I do that by sampling a value for B1, which indicates here that's not a very
interesting example because B1 is always going to be the blip. There is only one
blip generated anyway. There's not a set of blips per aircraft. So that's going
to be a very simple thing. I'm always just going to instantiate B1 with the blip that
the aircraft generates. And if the aircraft doesn't generate a blip in that time
step, it means this particle will have weight zero.
Then once I know that, I also need to know the position of the source.
So I now sample the position of aircraft 1, and that's going to depend on the
position at the previous time step, so I use the previous particle as a set of
already instantiated variables. And then I explain my evidence, and then I have a
weight to evaluate this particle that I just generated. And like before, I have this
lazy instantiation: I just instantiate the things that I really need in order to
explain my evidence.
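(A sketch of the lazily instantiated particle update for the one-blip example; the 3.2
observation and the variable naming mirror the description above, but the code is an
illustration, not the actual DBLOG update.)

    import math, random

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def update_particle(particle, t, observed_blip_pos,
                        walk_sigma=1.0, blip_sigma=0.5, detection_prob=0.9):
        # particle maps already-instantiated variables, e.g. ('pos', aircraft, time) -> value;
        # only the variables needed to explain the evidence ever get instantiated.
        new = dict(particle)
        # If the aircraft generates no blip this step, the particle cannot explain
        # the evidence, so it gets weight zero.
        if random.random() >= detection_prob:
            return new, 0.0
        # Lazily instantiate the aircraft's position at time t from the previous step.
        prev_pos = new[('pos', 1, t - 1)]
        new[('pos', 1, t)] = random.gauss(prev_pos, walk_sigma)
        # Weight = likelihood of the observed apparent position (e.g. 3.2) given its source.
        weight = gaussian_pdf(observed_blip_pos, new[('pos', 1, t)], blip_sigma)
        return new, weight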
So now we have an issue that doesn't arise in dynamic Bayesian networks,
because you could think, well, dynamic BLOG is just like a dynamic Bayesian
network, there's nothing new; but you start having some issues that you didn't
have before. This lazy instantiation is one of them.
So just like in the previous case, but now for time step eight -- yeah, time step nine:
if I get a new blip, and this time it comes from aircraft two, and suppose that I
haven't observed aircraft two since the beginning, it's the first blip that comes out
of aircraft two, I'm going to have to sample the position of aircraft two at time step
nine. But because I have lazy instantiation, I'm not going to have any
values for that at the previous time step, or the one before, so I have to go all the way
back to the beginning. So the cost of this lazy instantiation is that you may
have to do something like that.
And so the nice property of dynamic Bayesian nets, that you have a constant
update time, would go away in this case.
>>: Can you guarantee that somehow that doesn't affect the [inaudible]? That if
you -- that you get the same answer as if you just instantiate [inaudible].
>> Rodrigo de Salvo Braz: Yeah. You're wondering whether what I did before
depended on the position of aircraft two, right?
>>: Right.
>> Rodrigo de Salvo Braz: It didn't. Because the fact that it has not been
sampled means it was irrelevant to the evidence. So even if I had instantiated it,
the evidence would have exactly the same weight. So, yes, it is guaranteed that
it didn't matter. It only matters in terms of cost.
>>: [inaudible] adversarial attack on the system to bring it down [inaudible] future
aircraft and then you're [inaudible] certain ways that will sort of make [inaudible].
>> Rodrigo de Salvo Braz: Right. Yeah, you could think of something like that.
And you know, it sounds like this is a problem --
>>: [inaudible] the probability of the previous time step, why do you need to
[inaudible].
>> Rodrigo de Salvo Braz: Because now you got evidence that depends on it.
Up to now you never had to bother about it; no matter where it was, the evidence
would have the same weight. But now, because I'm saying that this blip comes
from aircraft two, and I'm telling you the position is here, I need to --
>>: [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: [inaudible].
>> Rodrigo de Salvo Braz: But to instantiate this to [inaudible] for that, I need to
get its distribution and its distribution depends on the previous time step. And it's
a recursive thing.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Right. And actually, you know, this is a very
common problem in data association. People have worked on that, not with
BLOG but with, you know, hand-made systems, and they have circumvented this
problem. They don't generate things all the way back, basically by using
things like closed forms. And that's an option that's available to us as well:
because we know that this is a random walk, we could also code the model in
such a way as to bring that closed form in there. Just like in previous work, people
had to sit down and think, hey, what's the closed form of this, and then they
worked it out in their system. We can do the same thing here. But it's something
that's not done automatically; you have to think about it and write it down.
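(A sketch of the closed-form shortcut just mentioned for a Gaussian random walk: instead
of sampling every intermediate position, the position at time t given the last instantiated
position at time s is distributed as N(pos_s, (t - s) * sigma^2). As noted in the talk, this
currently has to be worked out by hand rather than derived automatically; the code below
is only illustrative.)

    import math, random

    def sample_position_step_by_step(last_pos, last_t, t, walk_sigma):
        # Cost grows with how far back the last instantiated value is.
        pos = last_pos
        for _ in range(t - last_t):
            pos = random.gauss(pos, walk_sigma)
        return pos

    def sample_position_closed_form(last_pos, last_t, t, walk_sigma):
        # Same distribution as above, but in constant time: the variances of the
        # independent Gaussian steps add, so the std dev is walk_sigma * sqrt(t - last_t).
        return random.gauss(last_pos, walk_sigma * math.sqrt(t - last_t))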
Actually, I don't know as much about dynamic Bayesian networks, but I would
assume that people may have worked on lazily instantiated dynamic Bayesian
nets too, and then they would run into the same problem as well. Do you know of
anything like that? Because that's not a BLOG-specific thing. If you have a
dynamic Bayesian network with lots of variables, and many of them are not
relevant and you don't want to instantiate all of them, then again you would run into
the same problem. Yes?
>>: So to do that sampling -- if the position of A2 at time two depends on any of the
state for the particle at time two, then does that -- would that mean that you
wouldn't be able to just figure out in closed form what it would be at time nine, you
would actually have to go through the sampling, or can you --
>> Rodrigo de Salvo Braz: No, I could look at the particular distribution there, the
transition distribution, and maybe I could figure out some closed form, right. Is
that what you're talking about?
>>: Yeah, I guess I'm wondering if -- I mean, yeah, if the position depends only on
the start state and how much time has passed and [inaudible], but if
it depends, for each particle, on some of the other state factors that had been
sampled in that particle, then it seems like you can't do a closed form; you'd
actually have to keep the history of the particle so that you can go back.
>> Rodrigo de Salvo Braz: Oh, because maybe the history of the particle affects it.
>>: [inaudible] position of A1 affects position of A2.
>> Rodrigo de Salvo Braz: Yeah, right.
>>: Then you need to know A1 at time two.
>> Rodrigo de Salvo Braz: So the closed form approach depends also on
whether these objects are interacting among themselves.
>>: Right.
>> Rodrigo de Salvo Braz: Yeah. So.
>>: It would depend on the evidence, so the evidence change so [inaudible] has
to [inaudible] the evidence.
>> Rodrigo de Salvo Braz: Well, yes, sure.
>>: The evidence [inaudible] the previous time step.
>> Rodrigo de Salvo Braz: So what you do, what you do and what people have
done in data association models in the past: when you get an observation of
something that you don't believe has been observed before, then you use the
closed form. Because I never got evidence about this guy, so I know that that's
the distribution at this time step given that I got no evidence so far.
Now, if you have observed it before, then you have to use the history, right,
because you actually know something about that object, so you may have
another closed form, but one that somehow depends on the history. Yeah. So you
have to do that.
>>: [inaudible] assuming that you realistically know which aircraft is giving you which
blip, and if you sort of put [inaudible].
>> Rodrigo de Salvo Braz: Right. That's not the usual situation.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Huh?
>>: If you [inaudible] over that.
>> Rodrigo de Salvo Braz: Yeah.
>>: Then you also have to input about whether that [inaudible].
>> Rodrigo de Salvo Braz: Yeah.
>>: In that case, [inaudible].
>> Rodrigo de Salvo Braz: Yeah. You're absolutely right. When you don't
have that -- if I just tell you, hey, that's my set of blips, those are their positions, I don't
know where they are coming from, then you have to generate the positions of all the
aircraft to be able to explain that set of blips. And then this problem of
instantiation wouldn't occur, because you are always instantiating everything, so you
always have everything available.
>>: [inaudible].
>> Rodrigo de Salvo Braz: Even that. If I ask where is that blip coming from, I
have to instantiate all the aircraft and say which is more likely, which aircraft is
more likely to have generated it. So then that problem wouldn't arise, which is not
necessarily a good thing, because you would be instantiating everything. And
again you could have a form of lifted inference, right, in which you keep a closed-form
representation for a generic object representing all objects that have not
been observed so far. Because those are indistinguishable -- you know
exactly the same thing about them, you know the prior at the beginning,
and you know you never learned anything else. So you could use some kind of
lifted inference.
And actually when you look back at previous work in data association, essentially
that's what people are doing: they are implicitly representing unobserved objects
with a closed-form thing, without representing individuals.
But that's all done by hand and it's done for specific models. The challenge here
is actually having an algorithm that can take a model and figure all this stuff out --
closed forms and lifted inference and all that. So it's a --
>>: [inaudible]. So suppose at the beginning you only generate -- you generate a
number for how many aircraft you have, and suppose you sample it to be
one, and [inaudible] so 100 aircraft, so how do you [inaudible].
>> Rodrigo de Salvo Braz: Yeah, so that's a good point. This is atemporal -- the
number of aircraft here is an atemporal variable, right? Once you sample it, it's done,
it's fixed. And that's a problem that arises in DBNs as well. Because if you have DBNs
with atemporal variables, you have to do something smarter than just sampling them
once. And people have come up with solutions for that. One of the solutions is,
at every time step, to apply a transition kernel that preserves the posterior and
that also changes the values of the atemporal variables. We could do that
here as well. We could change the number of aircraft at some point.
But then you're going to have to regenerate the history. It's going to be very
expensive.
>>: So how do you --
>> Rodrigo de Salvo Braz: We don't handle that right now. That's still something
to be [inaudible] with. We are working on having an MCMC transition kernel that we
can apply to BLOG models, which is not that simple. Once we do that, then we
can apply it to the atemporal variables. But then we have to figure out how
efficient it's going to be to change those variable values. It's still not very clear to
us.
>>: Is there [inaudible] for dynamic programming approaches? Is that just
[inaudible]?
>> Rodrigo de Salvo Braz: I'm not sure. Because the structure here is so -- you know,
dynamic programming usually depends a lot on knowing exactly what your variables are,
and everything is fixed. But we haven't thought that much about that. So maybe there is
hope. So some of the things that we can mention about instantiation: one possible way
of dealing with it is preinstantiating things that you know will eventually become
necessary, even if they are not necessary for the current evidence.
If you know the type of queries and evidence that you are getting, a template kind
of queries and evidence, then you can infer what variables need to be instantiated.
You could even do smarter things like, what's the -- what's the probability of
needing this variable, and instantiate it according to that probability? So you can
do pretty sophisticated things here.
Other things: using things like mixing time to decide how far to sample
back. Maybe the initial position is completely irrelevant at this point anyway. So
you can also try to make a decision about how far to sample back. And the closed-form
things and the other things that you mentioned are also ways of dealing with it.
Okay. Last issue on BLOG inference, one that's not related to [inaudible] instantiation
or even to DBLOG. It turns out that as we worked with DBLOG we found
many limitations in BLOG, because this data association, for example, was one
thing that brought many challenging issues that made us rethink BLOG.
So this is one of those items.
So, just showing here how basic things work, right: rejection sampling. You get
the evidence, then you sample a value for that evidence variable and you see if it's
equal or not, and you keep the sample if it matches the evidence.
And you have likelihood weighting: you have the evidence, you sample values for
its parents so that you know its distribution, and you weight the sample according
to the likelihood. Now, suppose you get evidence like this: I'm telling you the set of
values of happy(P), for each person P, is true, true, false. So how do I do likelihood
weighting here? I could sample the values of those variables, right? Suppose I know
I have three people, and then I sample that and I get true, false, false. That didn't match
the evidence, so I throw the sample away and I keep doing it.
In this particular example it doesn't look so bad because there are only 8 possible
value combinations, but it's still pretty bad. Now, if you have variables with lots of
values, then it's going to be pretty hopeless, and if you have continuous variables,
then it's really hopeless, right -- you're always going to get probability zero if you
just sample the values and then compare with the evidence.
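(To make the degeneration concrete, a tiny sketch of the naive check being described:
sample values for the variables and compare them to the observed set. With continuous
values the exact match almost surely never happens, so every sample gets weight zero.
The names here are made up for illustration.)

    import random

    def naive_set_evidence_weight(num_people, observed_set):
        # Sample happy(P) for each person and compare the multiset to the evidence.
        sampled = sorted(random.choice([True, False]) for _ in range(num_people))
        # Weight is 1 only on an exact match -- which is why this reduces to
        # rejection sampling, and fails entirely for continuous values.
        return 1.0 if sampled == sorted(observed_set) else 0.0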
This becomes just like rejection sampling. It's no better than rejection sampling,
because you're just sampling the values and comparing them to the evidence. Even
though technically speaking this is likelihood weighting, not rejection sampling, it's
the same thing, because the evidence is this variable here and we are not sampling
that. Because you have a deterministic function between the evidence and its
parents, it ends up being just as bad as rejection sampling.
So this is one of the problems that we came up against -- this is very much like the
blip situation, right? That's the type of query that's going to be more
common in a real application: I just tell you, okay, I'm seeing three blips, and
those are the positions of the blips, I don't know where they are coming from, and
so on.
So you have to give the likelihood of such evidence. And if you just sample
those variables, it's not going to work. And this was not supposed to be here.
So here's what we do. Instead of just blindly doing likelihood weighting -- which is:
sample values for the parents and then compute the likelihood -- here, when we
sample the value for a parent, we determine its likelihood, and we know its
distribution here, we know it's a Gaussian.
So now I know that to have a valid sample I need to have one of those observed
values. So I do an importance proposal that's going to pick one of those
values proportionally to its likelihood. So if I do that, here I
pick for example 1.2 for the parent. And the probability of picking 1.2 is
going to be proportional to the likelihood of each value. So I know the probability
of getting that value in my specific proposal; I can keep track of that.
And then I do the same thing for the second parent. Now I only have two values
available, I have this choice, I keep track of that choice's probability, and then for
the last one I don't even have a choice. So now I have a proposal
distribution that's going to be much more efficient; it's going to work.
And not only that, it's going to pick values that are not only valid but also
usually more likely, because I'm going to consider the distribution of the parent
before I pick the value. So this is one of the changes that we made. This is not even
DBLOG, this is just BLOG -- this is just the BLOG mechanism for giving likelihood
to evidence. And the nice thing about that is that, you know, the importance sampling
part of it is very well known, it's something that people do on a regular basis, but
usually people do this by hand. And the nice thing about having this language
that has high-level constructs like sets and so on is that the algorithm can
automatically look at these constructs and build a proposal that's going to be effective
without you having to say anything.
This is something -- I don't have to write a proposal to get this done; I just write
my evidence and the algorithm looks at it and sees this is a set, and that's how it
has to do it. So that's one of the advantages of having this higher level
structure available to the algorithm.
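(A sketch of the set-evidence proposal just described: each observed value is matched to
a parent variable with probability proportional to its likelihood under that parent's
distribution, and the choice probabilities are tracked so the importance weight stays
correct. The names and the Gaussian assumption are illustrative; this is not the BLOG
source code.)

    import math, random

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def propose_matching(parent_means, observed_values, sigma):
        # parent_means: sampled positions of the sources (e.g. the aircraft);
        # observed_values: the set of apparent positions we were told about.
        remaining = list(observed_values)
        matching, weight = [], 1.0
        for mean in parent_means:
            # Likelihood of each still-unassigned observed value under this parent's Gaussian.
            likelihoods = [gaussian_pdf(v, mean, sigma) for v in remaining]
            total = sum(likelihoods)
            if total == 0.0:
                return matching, 0.0
            probs = [l / total for l in likelihoods]
            # Pick a value proportionally to its likelihood; remember the proposal probability.
            i = random.choices(range(len(remaining)), weights=probs, k=1)[0]
            # Importance weight: true likelihood of the chosen value over its proposal probability.
            weight *= likelihoods[i] / probs[i]
            matching.append((mean, remaining.pop(i)))
        return matching, weight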
All right. Future directions and all that. One of the things I mentioned to you before
is that lifted inference is an exact procedure that requires clusters of random
variables to be symmetric. And that's not a very realistic assumption. So really
we have to do something that uses approximations. The intuitive example here
is the following. If I'm thinking about the population of the United States and I'm
trying to make decisions and estimate probabilities, like in elections and
things like that, I don't have to consider every individual separately. Even though I
may have particular facts about each individual, I may have databases
talking about all these people, I still can do some kind of lifted inference -- if I'm
willing to ignore irrelevant factors, or things that are somewhat relevant but not
much, or that are not going to affect the result all that much. So I can ignore that and
have something that has a margin of accuracy.
So there is a generalization of lifted inference that takes that into account, that
doesn't require symmetric objects. So that's a very important direction.
Another thing is parameterized queries, right? What I showed before, the queries
were always propositions. And the reason for that is that if you have unbound
variables in the queries, then you have to keep track of possible values for the
logical variables in the queries, and it complicates things. But it would be really
interesting to be able to ask something like, what's the probability of X being sick with
measles? And then the answer is something like this: if X is John, then the
probability is such; if X is Mary, the probability is such; if X is someone else, any
of the other people, the probability is such.
Also, it would be a nice link to logic programming, because that's what logic
programming does, right: it instantiates, it binds all those logical variables for you
when you give a query. So that's something that's still to be done. Also, we don't
have function symbols explicitly, so it would also be nice if we could write things like
that. As I was saying before, right, the lifted inference language is not very
expressive, the way BLOG is. So these are all directions to make it more expressive.
Also using equality, adapting the parameterization, things like that. And of course
learning the parameters of functions.
Okay. The take-home messages here are the following. It's useful to keep the first
order representation not only for representing your model but also for doing
inference, using it in the inference itself. If you do lifted inference, then you have
something that's equivalent -- it gives you exactly the same answer as the
grounded inference -- but it's much faster. And as for DBLOG, this is about
bringing these temporal processing techniques to these first order representations
and all the variations that [inaudible] requires.
So basically what I want to do in the future is to increase the lifted inference part
of it and come up with an inference engine that people can
actually start using for real problems, not very different from what Pedro
Domingos is doing with Markov logic networks. He already has quite an
advanced engine, and that's the direction that I also want to follow. He's using
very different techniques, though. Now he has started using lifted
inference, but before he was also doing propositionalization.
And that's it. Thank you very much for your attention, and for staying for so long.
[applause]