>>: Hi, everyone. It's my pleasure to reintroduce Paul Smolensky to continue with his
talk series where we discuss making neural networks understandable. And I just
wanted to do a preview that we are going to do part 3. Paul has agreed to do a part 3
which will be mostly oriented towards linguistics on this coming Friday.
And thank you for listening. We really appreciate your taking the time to give this
extended talk series.
>> Paul Smolensky: Well, I appreciate the very valuable introduction I have been
getting. And although I had said that the talk part 2 would be addressing linguistics I
really should have said language. It is not so linguistic-y.
Okay. So last time I mentioned that there were two goals that I had when I came here.
One has to do with reverse engineering, trying to understand networks that exist, and I'll
come back to that towards the end. But what I'll talk most about is the goal of
engineering networks which are built to be understandable and which will hopefully, in virtue
of their greater interpretability, allow us to program into them useful constraints that will
allow them to perform at a higher level of linguistic competence, in virtue of being
able to process discrete computational structure, which is not really inherent in the
neural medium. Okay. So this is, I think, pretty much the same list I had last time.
I think we got to item number 2 or something. We'll see, maybe we'll get farther this
time.
The challenge you may remember concerns trying to unify two approaches to artificial
intelligence or cognitive science. And those are stated here in terms of hypotheses.
One, H, about the value of symbolic computation as a means of description of cognition
and intelligent behavior. And N, the other one, about the value of neural computation for the
same types of purposes. And the approach that this system called gradient symbolic
computation takes is to posit a cross-level mapping, to view H as describing a more
macroscopic level of the same system that N is describing at a more micro level. And
to link the two by an embedding mapping called psi here which maps symbolic
structures into a discrete subset of the vectors in the neural vector space. So here is
the picture I put up the last time to remind you about analogies to other fields, where
we have macro structure emerging from micro structure. The mapping psi is what links
this kind of description to that kind of description.
And you remember the idea is to try to design networks with the capability of
computing functions which on one level of description are taking structured symbolic
inputs like perhaps trees and mapping them into other structured objects, perhaps
propositional representation of the meaning of the sentence and to compute that
function not by operating on the symbols in an explicit way but by taking advantage of
this embedding to put the input into the vector space of neural network states, apply
neural network computation to produce an output which then can be reinterpreted
symbolically if desired.
And so I'll say some words about the kinds of function F that have been shown to be
computable in this way. And those theorems are the formal arguments that I can offer
that this computational system I'm describing really does provide a kind of unification
of these two quite different approaches to modeling intelligent cognition.
So I also mentioned that the result of this particular way of doing the embedding is a
kind of intermediate level in which tensors figure prominently. So we call that the tensorial
level, and so we can add a kind of intermediate hypothesis about the value of gradient
symbolic computation described at this level for characterizing the knowledge and
processing in intelligent systems.
Okay. So now, there's an important step which I would love to do right now. This is
where it belongs. But given my experience last time I fear that we might not get
beyond it. So I've decided to postpone it. And I hope we will get to it at the end
instead of at the beginning. But part of it has to do with characterizing neural
computation as involving not just activations and neurons but representations that are
in some sense sub-symbolic, where the conceptually interpretable entities, and this
talk is all about interpretation, the interpretable entities are distributed through
activation in extended parts of the network.
So that's a very important part of the whole story, but I propose to postpone most of it.
I will mention a few of the reasons for the centrality of distributed representations, but leave
the most important one in my way of thinking about it to the end. So for engineering
purposes, distributed representations provide important similarity relations. So if you
have one unit dedicated to one concept and another unit dedicated to another concept
and all the concepts have single units dedicated to them, then all concepts have zero
similarity to each other. That's how local representations are. On the contrary, if an
activation pattern is what is encoding a concept, then two different concepts will have
more similar or less similar activation patterns. That similarity will play an
important role in determining how what is learned about one of these concepts will
generalize to the other.
There is the fact that you can get many more distributed representations than you can
get local representations in a set of N neurons. There's only one representation per
neuron in the local case. Distributed representations afford the opportunity to pack the
N dimensional space with more than N conceptual entities, using distributed patterns
to encode them.
Okay. Then for reverse engineering it just seems to be a fact that the representations
learned by networks in their hidden layers and the representations that we find in
the brain have a very significant distributed component. You do rarely find individual
neurons that can be given a conceptual interpretation, but that falls far short of
providing enough understanding of the system to explain how it functions and why it
succeeds and why it fails when it does. So understanding neural computation as
trafficking in distributed representations is important for understanding --
>> Audience: In biology there are such terms local [indiscernible] so how does that
reconcile the possibility that there is maybe a small number of neurons that maybe do
have local representation?
>> Paul Smolensky: Well, the principle is that the design of networks must be such
that distributed representations are possible. But local representations will also be
possible and in some senses perhaps for some purposes preferable.
But local representations are a special case of distributed representations. If you are
set up to cope with distributed representations, then you can specialize to local, but the
reverse is just not true.
Okay. And so this is a fundamental aspect of neural computation, in my view. What
I'm postponing is a very general symmetry argument which leads to the conclusion that
neural computation must always allow for distributed representations, as I just said.
And I propose to skip that. Let's see if I can skip it.
Okay. That didn't work. Let's back up here. Hmm, hmm, hmm. Okay.
So I'll move on to the next topic now. And just review the actual proposal for how to
embed symbolic structure in distributed activation patterns in neural networks, what
this psi function looks like. So just lightning review of what we went through rather
slowly last time. So we will be using patterns of activation in which the individual
activity levels can be thought of as being the elements of tensors. An Nth-order
tensor is characterized by an N-dimensional array of real numbers. There will be N
subscripts or indices to distinguish these elements from each other.
The two basic operations that we talked about are, first, the outer or tensor product,
which increases the order of tensors. So if you multiply A and B together using the
outer product, you get a tensor of order N plus M, the sum of their orders. Whereas
contraction decreases the order. So contraction is tagged to a particular -- a contraction
operation is tagged to a particular pair of indices in a tensor. So to contract over a
specific pair I, J, in a tensor that has order at least 2, is to take the Ith and the Jth
indices and replace them both by Q, and then sum over all the possible values of
Q. And that's what is written out in a painful way here.
So it takes two indices, sets their values equal, and sums over all the possibilities. So
special cases of this that are very familiar are the dot product, when the two tensors are
just first order, and the matrix product, when the two tensors are second order. That
involves taking two indices, the second index of this and the first index of that, which
means the second index of the outer product and the third index of the outer product,
setting those two equal to each other and summing over all the possible values; that
gives us the matrix product.
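A minimal numpy sketch of those two operations, with illustrative dimensions (not taken from the talk): the outer product raises the order, contraction over a chosen pair of indices lowers it, and the dot product and matrix product fall out as special cases.

```python
import numpy as np

A = np.random.randn(3, 4)        # an order-2 tensor
B = np.random.randn(4, 5)        # another order-2 tensor

# Outer (tensor) product: order 2 + 2 = 4
T = np.tensordot(A, B, axes=0)   # shape (3, 4, 4, 5)

# Contract over indices 1 and 2 of T: set them equal, sum over the shared value
C = np.einsum('iqqj->ij', T)     # shape (3, 5)

# That contraction of the outer product is exactly the matrix product
assert np.allclose(C, A @ B)

# The dot product is the same pattern one order down
u, v = np.random.randn(4), np.random.randn(4)
assert np.isclose(np.einsum('q,q->', u, v), u @ v)
```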
But whenever I write two symbols like this next to each other, I'll try to be consistent
about using this open face for the tensors. You should not interpret that as a matrix
product unless it's explicitly described as a matrix product. Otherwise, everything will
be outer products because, as I told you last time, I'm experimenting with omitting the
outer product sign, which otherwise would be covering the page. Okay. So the
general concept of inner product with respect to two indices involves basically taking
two tensors, A and B, taking their outer product and then contracting over two indices,
one falling within the indices of A and the other falling within the indices of B. The
matrix product is one case of that. But we have use for inner products more generally
because the symbolic operations that these tensors are embodying will use the outer
product to bind information together and will use inner products to separate, to extract,
to unbind symbolic elements from one another. So some of the examples that we looked
at last time are shown here. So a simple case of a kind of slot with a name filled by
some element is represented. The representation of that binding together of this role
and this filler is achieved by the tensor product. So if the agent role corresponds to the
tensor A and the individual J corresponds to the tensor J, then it's their outer
product that represents this pairing, this ordered pair essentially.
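A hedged sketch of that binding (the role and filler vectors below are invented for illustration, not the ones on the slide): the agent/J pair is just one outer product.

```python
import numpy as np

agent = np.random.randn(4)       # hypothetical vector for the role "agent"
J = np.random.randn(6)           # hypothetical vector for the filler J

binding = np.outer(agent, J)     # order-2 tensor encoding the pair agent/J
```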
In the case of a graph we have two nodes linked by some labeled arc, let's say relation
R. And again, we just take the outer product of the tensors that encode the
individual labels on the nodes A and B as well as the label on the arc that joins them.
So writing these tensors here in the simpler way, we just take the three-way outer
product of those three tensors to encode that one triple, or one edge, of a graph. And
very analogously, if we have a propositional-type representation, we have a relation
expressed in terms of two arguments for a binary relation. Then we would take the
outer product of three tensors, one for the relation and one for each of the arguments,
and for the --
>> Audience: [indiscernible] that one is undirected and the other is directed?
>> Paul Smolensky: They are both directed in the sense that RXY and RYX are
different tensors.
>> Audience:
[speaker away from microphone.] the first one, not finding also to be interpreted as
very [indiscernible] obviously there.
>> Paul Smolensky: Yes. These are just two ways of writing the same thing.
>> Audience: Okay, so you could swap those? You actually do the real kinds of
computation at the end, sometimes you don't know which order it is because once you
do neuron, all the computation, you just get the tensors [indiscernible] sticking to one
definition there in order to [indiscernible.]
>> Paul Smolensky: Right, that's right. So a tensor product representation brings with
it a discipline for interpreting the different dimensions of the tensor, how they are
arrayed in the activation vector or activation pattern.
>> Audience: In the example you mentioned between not link node versus condition
planning [indiscernible] it is interesting why there is a link for two different ways.
>> Paul Smolensky: There is no reason to have two different ways. These are a
bunch of examples, things that are familiar from symbolic descriptions in AI.
>> Audience: I see.
>> Paul Smolensky: And how we encode them, that's all.
>> Audience: I see, okay.
>> Paul Smolensky: In the project that we are -- there are two projects that I will
describe very briefly if I get there and one of them is using this notation and one of
them is using that notation. So I wrote them both out. Otherwise it's kind of
redundant.
Okay. And here is the important point that came out in discussion after the lecture with
Li, in fact. So I wanted to emphasize it on this review that another type of tensor
product representation uses something you can think of as absolute positions within a
structure to individuate the symbols that comprise the structure. So in this case what
we are talking about is a string, AXB. And the representation of just the X in that
string -- one way of doing that is by having a vector that is associated with the role that
it plays, the position, which is the second position in the string. So R2 is the variable
whose value gives you the second position in the string. And in this case its value is X.
So the outer product of this vector R2 and X is the tensor that encodes the single
constituent X within this string.
With trees, you have a similar story, but now the set of positions is recursive. So in a
structure like this where we have this tree, if we want to talk about the X in the middle
of this one, then we can talk in the same way in terms of tensors that encode the
positions within the tree. In this case I tend to label them with bit strings. So R01 is the
position which is the left child, 0, of the right child, 1, of the root. So the path to the
root is indicated by these bits in the name of the role. And it doesn't matter whether
you do the R before the X or after the X, as long as you are consistent in carrying it
out.
>> Audience: May I ask a question?
>> Paul Smolensky: Yes.
>> Audience: During the study you have a [indiscernible] with A and B. Do you mean
it is for later on that you substitute the exact tensor of A and B? And that is just a
simple, that is a representation that is a tree you need to use this as the representation
of the X? But later on if you have real value of tensor A and B, you need to put A and
B somewhere to use together with this? I am not sure, what is the meaning of that
tree? The gray letter A and B?
>> Paul Smolensky: Hold on until the next slide, which is just one line away.
And the fact that the positions in trees are recursively related to each other can be
captured -- and has to be, if you want things to work out to do recursive function
computation -- by taking a role like this one, the position 01, left child of right child of root,
and expressing it in turn as an outer product of a vector that represents left child and a
vector that represents right child, in the particular order associated with the path to the
root. So in these I neglected to mention that I have been dropping the explicit
encoding of the order of the tensors, because we were just talking about first order
tensors all the way down here. But now, in order for us to be able to talk like this, we
have to say that we can start with first order tensors, vectors for the primitive roles of
left child and right child, and then generate an open-ended, limitless set of additional
tensors that are used to encode all the positions below. But --
[sneezing.]
>> Paul Smolensky: Gesundheit.
Their order increases as you go down the tree. So this second level has second order.
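A small numpy sketch of those recursive roles, with hypothetical two-dimensional primitive role vectors; the ordering of the factors below is one possible convention, chosen only for illustration.

```python
import numpy as np

r0 = np.array([1.0, 0.0])                    # primitive role: left child
r1 = np.array([0.0, 1.0])                    # primitive role: right child

# role for position 01 (left child of right child of the root), as an outer
# product of the primitive roles -- an order-2 tensor
r01 = np.outer(r0, r1)

# one level deeper in the tree: an order-3 role tensor
r011 = np.einsum('i,j,k->ijk', r0, r1, r1)
```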
>> Audience: But [indiscernible] because it is --
>> Paul Smolensky: Yes, you can infer that the two --
>> Audience: So this still doesn't ... [speaker away from microphone.]
>> Paul Smolensky: It is not necessary and I may not carry it through. I'm not sure I
remember. I at least wanted to point it out here because it is the first time that higher
than first order tensors are being used in the examples.
Okay. So binding is done by the outer product. I will -- oh, yes. So the point I wanted
to emphasize and just skipped yet again was that in order to encode structure --
symbolic structures like strings and trees -- in vectors, we make explicit the different roles
that symbols can occupy. Those roles are normally implicit in the way we draw the
diagrams, in the way we string the symbols together in a string. So it takes a little
getting used to sometimes to be making completely explicit the notion of role within a
structure because it is so often the job of a notation to hide that; make it implicit. But
we make it explicit in order to carry this forward.
All right. So outer product is used to bind together symbols and the roles that they play
in a structure or symbols to one another, if they are bound together in the structure.
And the remaining operation that we need is the means for putting together multiple
constituents to give the vector for the whole, which is what you were asking about with
the gray letters in the previous tree example. And it couldn't be simpler. It's just done
by addition. So if we have a two-variable structure like this, then each one is
represented the way I just said, but we add the two together to get the representation
of the structure as a whole. If we have two arcs, we add together the triples that we
talked about a moment ago. If we have three letters in a string, then we add together
three tensors, each one encoding a single symbol in its location and, for the tree, it's
the same story. Here we have three constituents to add together. We talked about the
X one, but here are the other two. They get added in. And by using the recursive
property of the definition of these embedded roles, that this role here is in fact the outer
product of two primitive roles for left and right child, we can observe a nice recursive
property of this representational scheme. So once we expand this out into its primitive
parts, we can factor out the R1 that these two have in common. That is to say these
constituents are both hanging off the right child of the root in this position. We factor
out the R1, then we get this expression here for the tree. And that is nothing but the
embedding of the symbol A bound to the left child position, and the embedding of the
tree structure now -- not an atomic symbol, but the embedding of the tree structure -- bound
together with the right child role.
So any binary tree that consists of a left child P, however complex that sub-tree might
be, and a right child Q: the encoding of that will be expressible in terms of the embedding
of P itself, the embedding of Q itself and then the two roles that each get bound to
when they are combined.
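A numerical sketch of the superposition and the recursive property just described, under the same hypothetical role and filler vectors: the whole structure is the sum of its constituent bindings, and binding a whole sub-tree to the right-child role distributes over that sum.

```python
import numpy as np

r0, r1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # hypothetical primitive roles
X, B = np.random.randn(3), np.random.randn(3)         # hypothetical fillers

# sub-tree Q with X as left child and B as right child: a sum of two bindings
Q = np.outer(X, r0) + np.outer(B, r1)

# binding the whole sub-tree Q to the right-child role equals the sum of its
# constituents each bound one level deeper (the recursive property)
lhs = np.einsum('fr,s->frs', Q, r1)
rhs = np.einsum('f,r,s->frs', X, r0, r1) + np.einsum('f,r,s->frs', B, r1, r1)
assert np.allclose(lhs, rhs)
# constituents of different order (such as a symbol bound directly to r0) live
# in different direct-sum blocks of the larger space, as discussed below
```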
Okay. So I call that a recursive representation because it obeys this equation here.
Yes?
>> Audience: Is the conjunction approximate or do you take care of trying find all
these tensors and [indiscernible]?
>> Paul Smolensky: In the process of constructing our representation from a symbolic
structure, it's exact. In the process of reverse engineering, we are going to have to
make do with approximate summation, but --
>> Audience: So how do you ensure that the first -- the first example, how do you
ensure that you get the vector that in truth or [indiscernible] rather than some kind of
posited elimination when you add that [indiscernible.]
>> Paul Smolensky: So what is encoded in my way of thinking about it is the
conjunction of two propositions. One asserting that the agent is J and the other
asserting that the patient is K.
>> Audience: How do you enforce the conjunction? I don't understand, I mean, this is
-- this is an assumption of the model, but how do you know that what you are going to
get is the conjunction? It doesn't follow the exact -- I mean, it is not clear to me that
you can do it with an addition. I mean, I can imagine how I would make it happen by
using proper vectors for agent and J and so on, but if they are not orthogonal, if they
don't have certain properties running through the data, it's going to be some kind of
fuzzy conjunction.
>> Paul Smolensky: Well, yes, if we are learning these from the data, then all bets are
off as to what extent these summations and the outer products themselves are going
to prove useful as a description of what has been learned in an unconstrained system,
I mean. We can build systems to be constrained so that they will always use addition
and outer product operations, and then we know that they are exact.
But for reverse engineering a generic network that has not been built to specifically use
tensor product representations, then all of this will at best be an approximation to what
we will find in there, I'm sure.
>> Audience: How about --
[overlapping speech.]
>> Paul Smolensky: I did want to point out that -- let me use the term proper to describe
a tensor product representation in which all the vectors encoding symbols and roles
are linearly independent of one another. So at the very least, whether or not you're
convinced that this summation encodes conjunctions as opposed to something else,
we can at least say that the result of this is unambiguous as to what is bound to agent
and what is bound to patient.
And then this is the world in which the theoretical work that I have done takes place,
the world of proper tensor product representations.
Firstly, the world of Microsoft is different because we can't afford to have as many units
as it takes to have linearly independent vectors for all of the symbols.
>> Audience: Right, and that is the other question that I was kind of meaning to ask
maybe later. Maybe now is the time. A lot of this, I'm sure that theoretically it is your
professional analysis, but [indiscernible] it is a inefficient way of representing symbolic
representations if you do it naively. Do you revert back later to not expanding and then
projecting, but actually computing your inner products rightly? [indiscernible]
>> Paul Smolensky: So the context that I've worked in, of what defines an acceptable kind
of computation, is: implementable in a neural net using neural operations. The notion of
efficiency is rather different because you have parallel computation of the multiplication
and addition operations available to you.
And I have not explored what it takes to do efficient emulation of these computations
using digital architecture instead.
But that certainly is also a critical thing now. Yes?
>> Audience: In the recursive tree, one, you're adding tensors of different
dimensionality, right? But is that a problem?
>> Paul Smolensky: Well, what you have to do is you have to have a big vector space
in which these are sub-vector spaces. The subspace of order 2 tensors, the subspace
of order 3 tensors all together in one big happy vector space.
So in that sense it's addition within that bigger space that is well defined. Sometimes --
>> Audience: [speaker away from microphone] -- dimensionality sort of boosted up to
this high dimensionality before doing the addition?
>> Paul Smolensky: In some sense, yes, that's right.
>> Audience: But how do you decide what slice of the -- you just kind of arbitrarily pick
like a corner of the space things slide to in their best dimension?
>> Paul Smolensky: Yes, I guess that's a way of saying it. Yeah. So you have the -- the
basis for the space as a whole is all built up by multiplying together these R0s and
R1s. And so if we just embed R0 and R1 themselves in the large space, then that
picks out a two-dimensional corner where the depth-one trees will live, and so on after
that.
It's all ripped off from particle physics without change. Multiple-particle systems consist
of a direct sum of spaces for three particles, four particles, five particles. So in the
equation here -- let's see.
Oh. Now, this equation here, right. So when we add together these tensors of
different order, in some references the symbol here could also be the direct sum
symbol instead of the regular sum symbol, because we are talking about adding
elements of two different subspaces in the bigger space. So if you are happier thinking
about it that way that's also a perfectly legitimate way of thinking about it. But I didn't
think it was worth going into that, but I guess I just did.
Okay. Further questions before I go on?
So we use outer products to bind together symbols to each other and to their roles in
the structures. And to extract information from a tensor that encodes multiple
constituents to unbind roles and figure out what is the symbol in the second position in
the string, we use the inner product. And just for convenience, I'm going to assume
that the vectors encoding symbols are not only independent but orthonormal: they are
orthogonal to each other and normalized to length 1. And we'll make good on that
asterisk at the bottom of the slide, to back off from that a little bit.
So if we take the example of strings, if we want to extract the symbol in position K
within a string, what we want to do is unbind the role RK, the vector which was
used to bind that symbol originally into that position. So here is our representation of
the string AXB again, called S. If we want to find out what is in the second position, we
need to take the inner product of S with the second role vector -- that's the role we are
trying to unbind. We take an inner product of these two, and the inner product is a
linear operation. So the inner product with S is just the sum of the inner products with
all of its constituent tensors, and because of the orthonormality assumption, R2 and R3
are orthogonal, so their dot product is zero. Similarly here for R2 and R1. The only one
that doesn't get wiped out in the inner product is exactly the role we want: R2 · R2 is
just 1, since the vectors have been normalized. So we end up pulling out exactly the
filler of that role in the structure as a whole, indicated here.
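A minimal sketch of that unbinding step for the string AXB, with orthonormal position roles as assumed here; all vectors below are invented for illustration.

```python
import numpy as np

A, X, B = np.random.randn(4), np.random.randn(4), np.random.randn(4)   # fillers
r1, r2, r3 = np.eye(3)                                                  # orthonormal roles

S = np.outer(A, r1) + np.outer(X, r2) + np.outer(B, r3)   # the string AXB

# the inner product with r2 wipes out the other constituents and returns X
assert np.allclose(S @ r2, X)
```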
And exactly the same thing works with the trees. Exactly the same thing works if we
are trying to extract information from the graph kind of representation I showed earlier,
but it plays out a little bit differently because we don't have positions in the graph. We
are not using that notion of role. We're identifying the nodes in terms of the positions
they play within these local three-way relationships instead. So the kind of question
that we want to ask in this case is: What is at the tail of an R edge from A? That is,
what is related by R to A according to the graph. Here is the representation of our
graph. It is representing the two triples. And if we take the product of the two bits of
information that we're using as our retrieval cue, the relation R and the atom A,
take their product here, that is what we want to use to probe the tensor for the graph
as a whole. That inner product will, for just the same kind of reasons as we saw
above, wipe out all of the constituents except those that in fact have AR in them and
that will pull out just B as the result. Okay?
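The same unbinding move for that graph query ("what is at the tail of an R edge from A?"), sketched with orthonormal illustrative vectors and with the relation kept in the middle index, following the convention mentioned just after this.

```python
import numpy as np

A, B, C = np.eye(3)          # orthonormal node vectors (illustrative)
R, Q = np.eye(2)             # orthonormal relation vectors (illustrative)

# graph with two edges: A -R-> B and C -Q-> A (relation in the second index)
G = np.einsum('i,j,k->ijk', A, R, B) + np.einsum('i,j,k->ijk', C, Q, A)

# probe with the outer product A (x) R, contracting over the first two indices
answer = np.einsum('ij,ijk->k', np.outer(A, R), G)
assert np.allclose(answer, B)
```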
>> Audience:
[speaker away from microphone.] the other check relational finding that is
[indiscernible.]
This is R here?
>> Paul Smolensky: Yes. So we need to just keep our discipline that R is in the
second, that the second dimension of our third order tensor is where the relation goes.
We could have chosen it to be the first one instead.
And then we would have RA here instead of AR as the probe.
Okay. So as far as this asterisk is concerned, so in a proper tensor product
representation, what we have is that the vectors are linearly independent. They aren't
necessarily orthonormal. But the mere fact that they are linearly independent means
that there exists another set of vectors which we can use exactly in this way. So
instead of R2, we use R2-plus in that case. Instead of R01, we use R01-plus. These
dual vectors have exactly the property needed to make this calculation go through,
namely their dot product with all the other roles is 0 and their dot product with their own
corresponding role is 1. RK-dual dot RK is 1; RK-dual dot any other RL is 0.
So all the calculations go through as before.
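A sketch of those dual (unbinding) vectors for linearly independent but non-orthonormal roles: one way to compute them, assuming nothing beyond the linear independence stated here, is as the rows of the pseudo-inverse of the matrix whose columns are the role vectors, which gives exactly the property that RK-dual dot RK is 1 and RK-dual dot any other RL is 0.

```python
import numpy as np

rng = np.random.default_rng(1)
roles = rng.standard_normal((5, 3))       # 3 linearly independent roles in R^5 (columns)
duals = np.linalg.pinv(roles)             # shape (3, 5); duals[k] is the dual of role k

assert np.allclose(duals @ roles, np.eye(3))     # R_k-dual . R_l = 1 if k == l, else 0

# unbinding with the dual recovers the filler exactly, even without orthonormality
fillers = rng.standard_normal((4, 3))            # columns are the bound fillers
S = fillers @ roles.T                            # sum over k of f_k (x) r_k
assert np.allclose(S @ duals[1], fillers[:, 1])
```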
>> Audience: So what do you -- [speaker away from microphone.] So this is some
notation for the ...
[speaker away from microphone.]
>> Paul Smolensky: Well, we can't do this, we can't unbind by taking the dot product
with the vectors that we use to do the binding, if those vectors that we are using to do
the binding are not orthonormal.
>> Audience: Oh, I see, okay.
>> Paul Smolensky: If they aren't orthonormal, then the vectors that we want to use
to do unbinding are not identical to the vectors that we use to do binding.
>> Audience:
[speaker away from microphone.] you don't want, you want tensor everything to be
zero so you can ...
[speaker away from microphone.]
>> Paul Smolensky: Either you use the exact unbinding vectors here which will make
the calculations go through exactly the same and you'll pull out exactly the right filler
for the role you're unbinding. If instead the vectors are approximately orthogonal in
some sense and you don't use the exact unbinding vectors here, which are the dual
vectors, instead you stick with using the original role vectors to do the unbinding, then
you will get some noisy approximations.
>> Audience: [speaker away from microphone] normal to begin with? Because you
can always orthogonalize everything.
>> Paul Smolensky: Well, one of the big advantages of distributed representations is
that we have similarity relations among the elements. So we don't want to lose the
ability to use that. We want to take advantage of that.
Yes?
>> Audience: [speaker away from microphone] -- conjunction, but what about
disjunction? This makes sense ... I mean, suppose ARB or ERB.
>> Paul Smolensky: Yeah. Well, I would need to go up a level and put the logical
structure of these expressions in a form in which we are actually conjoining clauses
together, but the clauses include things like disjunction signs. Then the operations
have to be the ones that are appropriately interpreting those signs. So it won't come
for free in the same way this low level conjunction comes for free.
So it may be a little misleading to even use the term conjunction for this, but certainly it
is true when you write down a graph you are saying that this link is there and this link is
there and this link is there. That's what the semantics of the picture is, right?
>> Audience: [speaker away from microphone.]
They are not things that are true/false. They are things that -- [speaker away from
microphone.]
>> Paul Smolensky: Yes, right. So operations on them could produce 0s and 1s, but
they themselves have some kind of richer content, is the idea anyway.
So what does gradient symbolic computation get from tensor product representations?
It gets a new level of description where we can talk about the data as being these
tensors, as opposed to just a bunch of numbers which are activity values, or a bunch
of symbols in some structure. We have tensors as a kind of intermediate
level. And we can derive techniques that apply to arbitrary distributed representations
which are interpretable in the sense that we can understand exactly what the data are
representing. We can build knowledge into networks that process these
representations because we know how the data are represented. We can construct
programs to do particular calculations that we want. I'm going to give you some
illustrations of that. We can write grammars that will allow networks to pick out the
tensors which are the embedding of the let's say trees that are generated by the
grammar or evaluated by the grammar. And of course, you also get massively parallel
deployment of this knowledge as I mentioned a moment ago. So this is all intended to
be construed as neural parallel computation.
For cognitive scientists we get a set of models that are really more about unconscious
automatic rapid processes than they are about deliberative ones. So reasoning about
disjunction is something that we do deliberatively. It would be handled within a larger
architecture in which there are inference processes that are built into network
machinery.
Okay, so --
>> Audience: The assumption that I'm finding automatically [indiscernible.] It's all the
neuron computation is mentioned in resultant factor. Somehow [indiscernible.]
>> Paul Smolensky: Well, the main thing about unbinding is I don't think the brain
does it.
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: We do it when we try to interpret the states of the brain, but so
that was a gross exaggeration, but the point is that whereas straightforward
implementations of sequential processing and so on would have us unbinding all the
time before we do anything, and I think it's the exception rather than the rule that
unbinding would be done --
>> Audience: [speaker away from microphone.]
-- a delay or something, in the previous slide?
>> Paul Smolensky: It's just a very crude characterization of sort of the threshold of
consciousness and what kind of processes are not accessible to it. Okay. So I want to
give you some examples of programming with these tensor product representations
and what kinds of functions can be computed, as a way of just arguing that this isn't just
any old way of packing, of taking vectors and giving them names like this tree, that
tree, this tree, but rather it's a form of vector encoding in which neural operations can
do what we want in terms of computing the functions for which those symbolic structures
were posited in the first place. Next time I'll talk about grammars, and I won't talk
about that today, but I'll talk about something more straightforward. Then we are
going into the dark land of super nerd slides for just a little while. Hopefully it will give
you a feel for what I mean when I say that these networks can be programmed.
So suppose we have a function that takes a sentence of a certain structure and maps
it into an interpretation in the form of some logical form like this. I want to construe
that really as a mapping between binary trees and here is a toy binary tree for
something that we can pretend is a passive structure that gets mapped into a binary
tree encoding of this proposition here, one of many ways of doing it.
But having chosen this particular way allows us to write this function down in a lisp-like
notation: extracting the right child of a node, extracting the left child of a node, putting
two children together to form a mother node from them. Those are the primitive
operations for tree manipulation here. And in terms of them, we can write down an
expression for this function. And no matter how big P is as a sub-tree, no matter how
big the agent sub-tree here is, this function will do the right thing and put all of A here,
however big it may be and all of P there, and so on.
So here is a network that computes this function in the sense that this group of units is
the input pool and the activation pattern here is the embedding using the tensor
product isomorphism of this tree, with some selection having been made for what
activation patterns correspond to R0 and R1 and the symbol A and symbol aux and so
on. Actually we don't need A. We just need aux.
Having made some choices about what numerical patterns will be used to embed the
primitive bits of this, it's all assembled using the tensor product schema I just laid out.
Here is our input to the network and this output pattern stands in the same relation. It's
the embedding of what we hope will turn out to be this. And the operation from the
input to the output is nothing but a matrix multiplication. This is a linear network, the
simplest kind of connectionist network you can have. And so the implementation of this
function is just multiplication of the input by this matrix of weights here.
Now, we can actually write down what the weight matrix needs to be. And here it is.
So having this expression for the function in terms of the primitives of binary tree
computation, we can write down an exact expression for what this matrix is like, looks
like. This is really the upper left corner of an infinite matrix, or an unboundedly big
matrix, because each of these things is actually a very, very simple but unbounded
matrix that implements extracting the right child, extracting the left child, and
stringing together two children. So despite the fact that we have a distributed
representation in which members of this tree are mooshed together, we know exactly
what weights it takes to produce exactly the right moosh to embed the output that we
want.
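To make that "programming" concrete without the infinite matrices, here is a bounded sketch in the same spirit (the role vectors and the particular function are invented for illustration): child extraction and construction are matrices, and a composed function, here one that simply swaps the two children of a depth-one tree, collapses into a single weight matrix W.

```python
import numpy as np

d = 4                                          # filler dimension (illustrative)
r0, r1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
I = np.eye(d)

# unbinding (extract a child) and binding (construct with a child) as matrices
W_ex0 = np.kron(I, r0[None, :])                # (d, 2d): inner product with r0
W_ex1 = np.kron(I, r1[None, :])
W_cons0 = np.kron(I, r0[:, None])              # (2d, d): bind to role r0
W_cons1 = np.kron(I, r1[:, None])

# the composed function cons(ex1(t), ex0(t)) is one linear map
W = W_cons0 @ W_ex1 + W_cons1 @ W_ex0          # (2d, 2d)

# check: the embedding of tree [P, Q] is mapped to the embedding of [Q, P]
P, Q = np.random.randn(d), np.random.randn(d)
tree = (np.outer(P, r0) + np.outer(Q, r1)).ravel()
swapped = (np.outer(Q, r0) + np.outer(P, r1)).ravel()
assert np.allclose(W @ tree, swapped)
```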
>> Audience: The way to -- [speaker away from microphone.] based upon the
principle you talked about earlier and [indiscernible] You said binding operation, you
get that from --
>> Paul Smolensky: Yes, essentially, essentially, yes. So essentially what this does is
take an inner product with the dual of the role vector for binding the right child, that's
right. Yes?
>> Audience: [speaker away from microphone.] Unique, given the input and output?
>> Paul Smolensky: This W, the total, whole picture?
>> Audience: The previous slide.
>> Paul Smolensky: Hmm, well, I guess if the -- I don't know. I think so.
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: What is learning?
>> Audience: [speaker away from microphone.]
[laughter.]
>> Audience: This one doesn't require learning.
[speaker away from microphone.]
You have many, many examples. You won't be able to see that directly. At least you
do the least square, you get optimal [indiscernible] that gives the error. Is that the way
you think about it?
[speaker away from microphone.]
>> Paul Smolensky: Well, it is an interesting question whether there is a learning
algorithm such that if you give it a whole bunch of pairs that are actually instances
of this mapping, it will learn a matrix that does it. I will tell you one thing
which will help a lot. There is a theorem that says that recursive functions all have a
certain -- well, at least the ones in the class that includes this all have a certain form in
which they are the product of some very simple -- it's really an identity matrix, but it's
infinite, for all of the depths of the trees -- times some smallish matrix which characterizes
the particular function that you're implementing. So a learning system that was built to
incorporate that structure would be able to generalize from what it observes at shallow
depths immediately to what should happen at deep depths.
So what it really is is building in a kind of translation invariance within the
geometry of the tree. So like a vision system that has translation invariance, a learning
system that has translation invariance built in somehow -- you get generalization in the
same sort of way here.
But let me --
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: Let me just --
>> Audience: [speaker away from microphone.] So you can figure out what the W
error given the constraint on the dimensionality of the --
>> Paul Smolensky: I mean, I haven't worked on questions like if we don't have
linearly independent vectors for all of our roles in the tree, what is the best matrix and
how would you possibly be able to learn it?
>> Audience: That example. [speaker away from microphone.] It is responsive to the
dimensions.
>> Paul Smolensky: So if you mean what order tensor is it? I mean, it's operating, it's
multiplying tensors to produce other tensors. So it's --
[overlapping speech.]
>> Paul Smolensky: But it can, we can write -- this expression itself characterizes
essentially an infinite matrix, which can deal with trees of any depth.
>> Audience: [speaker away from microphone] what is W cons actually, is that just the
W sub [indiscernible.]
>> Paul Smolensky: It constructs a tree by putting together the left child and the right
child to create a binary tree with those constituents. So it takes -- you take W cons 0
and multiply it by the tensor that encodes what you want the left child of the new tree to
be. And add to that W cons 1 times what you want the right child of the new tree to be.
Then you get the new tree with the right children. But it has sort of two parts, cons 0
and cons 1 because it has two arguments essentially.
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: Okay. Yes, thank you. Thank you. These are actually matrix
products here. Yeah. It should say that.
Yes?
>> Audience: So if I have the sentence "Few leaders are truly admired by George
Bush," is it the same W? Because you kind of hard coded that you need this aux, B
and by in the --[speaker away from microphone.] If I insert something, it's a different
tree.
>> Paul Smolensky: Yes, so the function I started with here was not, will not
generalize to that kind of case. If we could write a function using these operations that
did, then we could correspondingly build a network that would compute it. Yeah?
>> Audience: So sorry, I think I've totally gotten confused here. Before, the
representation of the tree was a tensor with dimensionality kind of like the depth of the
tree or something, but now you're saying there are 2D functions that are infinite in size.
Was there some sort of transformation I missed?
>> Paul Smolensky: Okay. So here is what is potentially quite confusing about the
picture. So it is totally because of the two-dimensionality of the page that I have these
units arrayed in two dimensions here. These are the elements of a tensor that has
high order. The order, as you say, determined by the depth beyond which everything
is 0.
So you could array it faithfully in many dimensions. Or you could just linearize it to one
long string. And maybe doing the latter would have been less misleading because
there's nothing 2-ish about it actually. But anyway, this is the tensor that we have
been talking about all along.
>> Audience: Uh-huh.
>> Paul Smolensky: That you get by binding these symbols to their roles in the tree.
That's what this is.
>> Audience: Okay.
>> Paul Smolensky: Then this is an operation that maps one of those to another one
of those.
>> Audience: But the operation is that 2D matrix multiplication? Maybe I misheard.
>> Paul Smolensky: If you imagine flattening this tensor out to one long string of
numbers, and similarly over here, then we just have a two-dimensional matrix for
taking the one string to another string.
>> Audience: Okay. So then it sort of flattens the tensor into one or 2D? In
order to ... not just for the diagram, but for --
>> Paul Smolensky: Well, just for the diagram really. So really, you have linear
transformations from the space of tensors to the space of tensors is really what this is
supposed to be trying to depict. So linear transformations from, you know, a space
which may have high dimension to others. We can talk about them without any
confusion but we can't necessarily draw them in such a nice way on the page.
>> Audience: Okay.
>> Paul Smolensky: I mean, if you were to see the actual equation that this is
depicting, what you would see is that W would have huge numbers of indices all over
the place. It would have two sets, huge sets of indices, one for the inputs and one for
the outputs. So there would be no 2-ness about that either. Well, except
that it is mapping from input to output. So that is two elements that are being related.
It only has one input, one output. It has two sets of indices, one for the inputs, one for
the outputs.
>> Audience:
[speaker away from microphone.] once you have the space which has tensor structure
inside, how do you unbind, for example, once you have the output? How do you know
that that is a tensor representation of the knowledge of the [indiscernible] which voices
some other representations about the knowledge?
>> Paul Smolensky: So in setting this up, some choice was made about what vectors
to use to encode left child and right child. And in theory you could make one choice for
this embedding and a different choice for that embedding. I don't think I did. I used the
same choice for both. But relative to the choices of the embedding of the primitive
roles, left child and right child, then you can unambiguously take inner products galore
with this vector and R0, R01, R0001, and each one will pull out what fills that position
in the output tree. And so if you were to take the inner product of this thing with R0,
you would get the activation pattern for the symbols. If you did the inner product with
respect to -- with the R1 vector, what you would get is a pattern that encodes this tree
here, the right child. Then if you took that and took an inner product with R0 then you
get a pattern for this tree. If you kept doing that, you would eventually get down to the
atoms like few and George and so on that are embedded in those constituents. You
guys can duke it out.
>> Audience: [speaker away from microphone] -- the Rs from the original tensors go
through the matrix to get the Rs in the output?
>> Paul Smolensky: Rs?
>> Audience: If you apply R0?
>> Paul Smolensky: Yeah.
>> Audience: Or R10, you are going to get B, right?
>> Paul Smolensky: Yeah.
>> Audience: If you multiply by the matrix you should presumably be getting R0.
>> Paul Smolensky: Yes, yes. That's what all these guys are in the business of doing.
>> Audience: So the input is tensor and the [indiscernible.]
[speaker away from microphone.] but there's lots of ways of contracting an N tensor
and [indiscernible] is it just that you set things up and you take the first N coordinates
and that's the contraction that you do to correspond to the indicated output?
>> Paul Smolensky: I think so. Why would you do anything else?
>> Audience: I mean, it is quite unique. I can contract and get an N tensor out. There
must be something that is happening inside which ones contract [indiscernible.]
>> Paul Smolensky: Yes, I guess it's true that -- let me back up to something I said
before. So the theorem would say there is a linear transformation from the space of
tensors to the space of tensors, okay? And there is no unique way of writing down a
linear transformation. So you need to make choices if you are going to do that. But as
far as I know, it's no different from the choices that you would make in doing that in
other contexts. Yes?
>> Audience: Does the mapping work hold true for any choice of role vectors as long
as they are orthogonal or do you have to carefully choose them in order to make this
work?
>> Paul Smolensky: As long as they are linearly independent we can do this because
we can define these in terms of the dual vectors which will compensate for any lack of
orthonormality that there might be.
So this function here is linearly neurally computable. Here is a linear neural network
that computes it. So there are results characterizing bigger and bigger sets of
functions that are linearly neurally computable. As far as I've gotten so far, it's a class
of primitive recursive functions which can be defined in terms of one another by
recursions like this. And we can, if anybody's curiosity is piqued, we can come back to
that. Let me try to at a higher level characterize what's going on in these results here.
So first of all, there is a single-step operation, that is to say a massively parallel operation,
from the distributed encoding of the input to the distributed encoding of the output. That's
what it means to be computing a function and we are implementing them in the
simplest kind of neural network here, linear transformations. Primitive tree
constructing and tree constituent accessing functions are implementable as linear
transformations. So we can realize them as matrix multiplication once we make the
right choices about bases and all of that.
An arbitrarily complex composition of these operations, tree constructing and tree
constituent accessing, an arbitrarily complex composition of these is still just a single
linear transformation. So linear networks can compute the set of recursive functions,
that is the closure under composition of the primitive tree operations including the one
example we looked at. And to implement a recursive function that is defined by a
primitive recursion, an equation -- a recursive equation in this class of primitive
recursions that I showed briefly, what we need to do is take the corresponding
recursion equation for matrices relating the input vector to the output vector, and solve
those recursion equations for the matrix, and we have a way of computing the recursive
function.
So this is a slide about that, but I'm going to skip that.
>> Audience: So I have a question. From everything I understand so far, the
operations are all linear.
>> Paul Smolensky: So far, yes.
>> Audience: In this model. Is it going to stay that way? Curious.
>> Paul Smolensky: No.
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: We will get to this point, we'll get to this point right here and then
we will be talking about multilinear later than linear operations.
But everything is stated in the multilinear -- they're sort of like polynomial functions.
Too bad Ronnie is not -- what?
>> Audience: If you [indiscernible] you can multiply these matrices?
>> Paul Smolensky: If you bind politicians --
>> Audience: The outer product between whatever you are partitioning, whatever the
partition leader is? I don't mean as words, but categories? Like a leader might be a
word and a politician might be a category.
>> Paul Smolensky: Okay.
>> Audience: Then you bind that and then you can multiply again, basically the same
matrix.
>> Paul Smolensky: Yes.
>> Audience: So then you end up with something that's -- that doesn't replace the
leaders with politicians, but it actually has all that information in the output. It has that
relation plus it has the information --
>> Paul Smolensky: Yeah, that's interesting. I think that should be doable, yup. Yes,
that's interesting, yes.
So the example that I looked at involved copying of symbols, but the very first
statement about the class of linearly computable functions is that transformations that
leave symbols in place but change one symbol to another are included. And what you're
interested in is changing one kind of symbol to another kind of symbol, I guess, one
that is somehow a category with multiple members and such. Yeah? Chris?
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: All right. So let me see. So the next item here is an example
from cognitive science about what happens when you take advantage of the similarity
structure of the distributed representations for the roles of symbols in a structure,
specifically the roles of phonemes in syllables.
But we are again running very short of time. So I wonder what --
So shall I keep going? Or shall we switch into a more open discussion mode? What is
your pleasure?
>> Audience: Well, I have an open discussion question. So far it seems as though
they are kind of [indiscernible] concept has had its own [indiscernible.]
Is that going to hold throughout the format? You're going to have these giant vector
spaces where everything is sort of multilinear?
>> Paul Smolensky: That is what things have looked like prior to coming here.
Right. So maybe I'll jump down to the slides about projects going on here. Why am I
not getting my slides here?
So let's see. Okay, so one of the projects involves taking tensor product
representations for syntactic trees, dependency parses, and mapping them into
tensor product representations for graphs in the formalism of AMR, for the meaning of the
the sentences. And we are interested in learning the transformation between them.
And I'm interested in learning the vectors that are used to encode the symbols and
their roles in the structures. But so far we haven't done any of the learning work yet.
The algorithms are waiting, but they are not tried yet.
And we use the same representational scheme as I indicated before for representing
these two types of graphs. By representing all the triples in them and superimposing
them, adding them together. Sorry, microphone, I guess I shouldn't touch my shirt.
And to have some kind of neural network that does the mapping from one tensor
product representation to another is what we're shooting for, but my boss would only
allow me 20 dimensions. I have 15,000 symbols. So after I stopped cursing him, I
started looking into the question of: Well, if you only can afford 20 dimensions, what
can you do in generating 15,000 vectors in representing 15,000 words which appear in
the corpus of inputs and outputs for this task? And here is a picture of what happens
when you try to minimize a function here which penalizes large dot products for
vectors.
So what we have here is a plot of the resulting vectors from an example run and what
you see is that there are 15,000 vectors, each 20 numbers long. And the biggest dot
product between two of them, two different ones is .72. So that means that the dot
product -- you remember when we tried to do unbinding, we really want the vector that
is binding a particular position we're interested in to have the property that when you
take a dot product with itself you get 1 and when you take a dot product with all
of the others you get 0, so as to wipe out all the other constituents that you are not going
after.
So what we get here instead: we get a 1 for the dot product of the role vector with itself,
just because we choose them to be normalized, but we get up to
about .7-something for all of the other roles that we are actually trying to exclude. That
constitutes the noise that we have in the representation because we don't have linearly
independent role vectors.
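A hedged sketch of that "many vectors in few dimensions" experiment (sizes are shrunk so it runs quickly, and the particular penalty and step size are illustrative guesses, not the ones from the talk): keep the vectors normalized and push down large pairwise dot products by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, steps, lr = 500, 20, 200, 0.1          # 15,000 x 20 in the real case

V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

for _ in range(steps):
    G = V @ V.T                               # all pairwise dot products
    np.fill_diagonal(G, 0.0)
    grad = (G ** 3) @ V                       # gradient (up to a constant) of sum of (v_i . v_j)^4
    V -= lr * grad
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # project back to unit length

G = V @ V.T
np.fill_diagonal(G, 0.0)
print("largest off-diagonal dot product:", np.abs(G).max())
```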
>> Audience: [speaker away from microphone.] and beta here?
>> Paul Smolensky: What is that?
>> Audience: You are solving for VJBK?
>> Paul Smolensky: No, I'm sorry. I just fix beta.
>> Audience: That's strange, because I can make the VJ.VK negative.
>> Paul Smolensky: So they are all normalized vectors. So you can try to make them
negative, but you can't make them huge.
>> Audience: And there's no notion of similarity, semantic similarity between the
vectors?
>> Paul Smolensky: Right. So this is a case where we are trying to make things as
orthogonal as we can, but in other situations we would want rather to impose some.
So one of the interesting things about learning is, once the network mapping syntax to
semantics can influence the choice of these things, will it prove helpful to make similar
the vectors for certain pairs of symbols or certain pairs of roles, for that matter?
So that's one of the projects.
>> Audience: [speaker away from microphone.] coordinates as you go along? Right?
Just keep the [indiscernible] small, but when you need to discriminate among things at
a high level, you add more features, more like passion values, almost. Your vector is
never fully, is actually never fully represented [indiscernible] vector.
>> Paul Smolensky: Right. I haven't pursued that line of inquiry yet but it's interesting.
I don't think what I have pursued is what you have in mind, but it certainly is close if not
the same. And that is, we really want the -- so the idea is to have purely hypothetical
space of dimension 15,000 here for each of the orders of the tensor. And then to do
something like principal components analysis over the set of vectors that we are really
trying to process, to pull out a lower dimensional subspace that is more manageable
and that can be done incrementally, as examples are processed.
So that, we haven't started on carrying that out. I have an algorithm but don't have any
experience with it. Yeah?
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: Yeah.
>> Audience: [speaker away from microphone] analysis?
>> Paul Smolensky: Well, they are dependency trees.
>> Audience: They are not [indiscernible]?
>> Paul Smolensky: They are just dependency parses.
>> Audience: [indiscernible] coming from the Stanford parser?
>> Paul Smolensky: Yeah, either the Stanford parser or Illinois parser.
>> Audience: Okay.
>> Paul Smolensky: But the idea is that we are helping ourselves to using standard
symbolic means for taking an input string and producing some kind of parse of it. And for
our purposes, of course, the more the structure in the syntax mirrors the structure in
the semantics, the better we are. The better our chances are for being able to readily
implement the mapping between them. So some people have used CCGs for this
purpose. Yeah?
>> Audience: So the dimensions of 20, what does that represent? Is that like a
constraint on realtime hardware? Single view?
>> Paul Smolensky: Usually better than that. We need 20 dimensions for A and another
20 for B, so we're up to 400. Then we have ten for R, so the overall dimensionality that
we get is 4,000.
>> Audience: Oh.
>> Paul Smolensky: Part of the reason that we have to be very skimpy here is that it
gets squared along the way.
>> Audience: Okay.
>> Paul Smolensky: But another fact is that we don't have the World Wide Web as our
database. We have 10,000 example pairs to work with. So think about the number of
parameters in the network that's mapping from one to the other -- right, we have 4,000
inputs and 4,000 outputs, so 16 million connections, and 10,000 examples to train them
on. And so there are various considerations, aside from sheer computational cost, that
also mean that if we are too profligate in our representations we won't expect to
generalize well beyond the training set. Yes?
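A quick back-of-the-envelope check of the sizes just quoted; the single fully connected input-to-output map is my assumption, used only to count connections.

```python
dim_A, dim_B, dim_R = 20, 20, 10          # dimensions quoted in the talk
repr_dim = dim_A * dim_B * dim_R          # 20 * 20 = 400, times 10 = 4,000
connections = repr_dim ** 2               # a full 4,000 x 4,000 map: 16,000,000
examples = 10_000
print(repr_dim, connections, connections // examples)   # 4000 16000000 1600
```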
>> Audience: [indiscernible] is that the input side or the output side?
>> Paul Smolensky: Well, the number actually after I did this went up to 15,500. That
was the union of the two.
>> Audience: [speaker away from microphone]-- on the input side that never show up
on the output side?
>> Paul Smolensky: Well, this number is coming down as we speak. The intern
working on this is doing named entity recognition, replacing all the proper names with
just a single symbol. So that number is going to go down. That will reduce the amount
of noise that we have to cope with, by a lot. And let's see. So another task -- and one
idea for it; this isn't the intern's original idea, but it's a more straightforward way of
using tensor product mechanisms to do it -- is a task that Facebook put together, which
has gotten a certain amount of attention for reasons that are somewhat mysterious to
me, but it has. And the idea is to code things in a relational format and use this exact
mechanism of just binding together, by the outer product, the relation and its
arguments. So here is what the task looks like. Let me see if I can do anything useful
about being able to read these things.
So there is some sort of old-fashioned adventure-game world for generating
sequences of happenings that get expressed in simple sentences. So you get a
sentence like "John promised a baseball to K. J gave a baseball to L." You get a
bunch of sentences in some sequence, and at various points during that sequence you
have to answer questions like: what did J give to K? Or where is the football? Or
things like that. So there's a supposedly increasingly difficult series of types of
questions for your program to have to answer. And so here the
example is what did J give K? So the approach here is to translate these things in a
pretty straightforward way into a logical form that is called neo-Davidsonian by some
people in linguistics, where we have these event variables and properties of events,
like this is a promising type of event that is being described here. And that event has
as its agent J -- very straightforward, simple things. But the point is that as soon as we
write it this way, then we can use the mechanism I just described and have tensor
product representations up the wazoo for all these predications, and we add
together the information in multiple sentences and then we query -- oops. There they
are! Now they're back.
We query -- so the question was, what did J give K? So we have one tensor which has
all the information from these four sentences, and we do some querying to figure out
what event is a giving event in which the agent is J and the recipient is K; and having
identified that event, what is the theme in that event? And that's the thing that was
given. So anyway, we use outer products -- I haven't rewritten this slide to get rid of all
the tensor product signs -- but that's in the works. We don't have results.
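Since the implementation is still in the works, here is only a minimal sketch of the scheme just described: neo-Davidsonian triples bound by third-order outer products, summed into one story tensor, then queried in two steps (find the relevant event, then unbind its theme). The vocabulary, the 64-dimensional vectors, and the particular scoring rule are illustrative assumptions, not the actual system.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 64   # toy size for every symbol type; the real dimensions differ

# Hypothetical vocabulary -- the names, roles, and event labels are illustrative.
symbols = ["give", "promise", "type", "agent", "recipient", "theme",
           "John", "Kate", "Lee", "football", "baseball", "e1", "e2"]
vec = {s: rng.standard_normal(dim) / np.sqrt(dim) for s in symbols}

def bind(event, role, filler):
    """One neo-Davidsonian triple as a third-order outer product."""
    return np.einsum('i,j,k->ijk', vec[event], vec[role], vec[filler])

# A tiny two-event story:
#   e1: John promised a baseball to Lee.   e2: John gave a football to Kate.
story = (bind("e1", "type", "promise") + bind("e1", "agent", "John")
         + bind("e1", "recipient", "Lee") + bind("e1", "theme", "baseball")
         + bind("e2", "type", "give") + bind("e2", "agent", "John")
         + bind("e2", "recipient", "Kate") + bind("e2", "theme", "football"))

def unbind(tensor, event, role):
    """Probe with an event and a role; returns a (noisy) filler vector."""
    return np.einsum('ijk,i,j->k', tensor, vec[event], vec[role])

def nearest(v):
    return max(symbols, key=lambda s: v @ vec[s] / np.linalg.norm(vec[s]))

# "What did John give Kate?"
# Step 1: score each candidate event for being a giving whose agent is John
# and whose recipient is Kate.
def score(event):
    return (unbind(story, event, "type") @ vec["give"]
            + unbind(story, event, "agent") @ vec["John"]
            + unbind(story, event, "recipient") @ vec["Kate"])

best = max(["e1", "e2"], key=score)
# Step 2: unbind the theme of that event and read off the nearest symbol.
print(best, nearest(unbind(story, best, "theme")))   # expected: e2 football
```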
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: What is that?
>> Audience: How do you figure out the [indiscernible.]
>> Paul Smolensky: How do you figure out where it is?
>> Audience: Yeah.
>> Paul Smolensky: Well, you have to find some kind of event that is in a class of
events that localize something, and query for events of that type. And what this
simple scheme does not do, which needs to be done for that type of question, is to tag
events with times so that you can keep track of something being in multiple places as
the story progresses. And then I don't know how we are going to be able to simply pick
out the most recent event that identifies the location of this object. I'm not sure how to
do that. But --
>> Audience: [speaker away from microphone.] What did J give K? How are you
going to solve that?
>> Paul Smolensky: What did J give K?
>> Audience: Unbind the ... because you've now got, there's a giving. There's a
relation giving between J and -- I mean, give is a three-place predicate.
>> Paul Smolensky: Well, in neo-Davidsonian terms we only have these triples. So
there isn't a proposition that has both J and K in it. There are two propositions that say
J is the agent of event one and K is the recipient in event one. So you have to be able
to cope with that, which makes certain things harder, not easier.
>> Audience: If you want to unbind such that you recover the theme?
>> Paul Smolensky: Yeah. So you need some unbinding to determine what event is
the relevant event, which is the giving event that has this agent and that recipient. And
once you've found the event, then you probe the tensor representing the story to find
out what element in that event is bound to the theme role, which is what is given --
that's the name I chose for what gets given.
So you unbind the theme for that event.
>> Audience: And then where in the representation of the question do you know that
that's how you are going to do the unbinding?
>> Paul Smolensky: Well, I guess you have to translate this sentence into the form:
return the theme of the event which is of type giving and has agent J and patient K.
You have to translate it into that somehow.
Somehow. Yeah. Right.
So anyway, again we have relatively high-order tensors here, so the dimensionality of
the vectors that encode these is going to be the product of the dimensionalities of
these three types. And so we are also going to have a challenge of
putting a lot of triples of symbols, many different types of triples of symbols into some
manageable size of vector space.
So let me see if there's any kind of sensible conclusion here -- oh.
Let me just mention this: what we have for reverse engineering, because I was really
hoping to get some of this done here. I'm just going to have one slide and then I
promise I'll stop.
So we have a machine learning approach using a generative model to address the
following problem. And I would like to try to use it on networks that have been trained
to produce English output, and try to analyze the representations in the hidden layers
of these networks with this tool.
So the problem that this tool addresses, you're given a set of vectors which are, let's
say the states of the hidden layer in a deep neural network, although we're also going
to try to do it for activation patterns in neural recordings. And different ones of these
vectors correspond to different inputs. You know what input corresponds to each of these
states of the hidden layer, let's say. You want to interpret them in order to be able to
say how they are encoding the domain data to the point where you can actually explain
how it succeeds in finding the output for the input using these intermediate
representations.
And the hypothesis that this pursues is that the vectors you're given here at the top are
actually tensor product representations, unbeknownst to anybody. So you have a
generative model which produces tensor product representations using choices for
what the roles and fillers are for each of the inputs, and what vectors encode each type
of role and each type of filler. Those are what have to be fit in the learning process in
order to maximize the likelihood of the set of vectors that you received as your starting
point. So a standard generative model and standard techniques have been
implemented and tried on some synthetic data with partial success. But I'd like to see
whether anything useful can come out of it applied to actual hidden layers of networks
that do the things in language which we need to understand, and about which we have
some information from linguistics concerning what might be in the representations that
makes it possible for some system to correctly produce those kinds of sentences.
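A toy illustration of the kind of fitting involved, as a sketch only: vectors that really are tensor product representations of short strings (with positional roles) are handed to an alternating-least-squares routine that recovers role and filler vectors. Squared reconstruction error stands in for the likelihood of the actual generative model, and the role/filler assignments are taken as known here; both are simplifications of what was just described.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: N vectors, each secretly sum_k F_true[:, s_k] (x) R_true[:, k]
# for a length-L string of symbols s_1..s_L with positional roles.
n_sym, L, dF, dR, N = 5, 4, 8, 6, 400
F_true = rng.standard_normal((dF, n_sym))             # true filler vectors
R_true = rng.standard_normal((dR, L))                 # true (positional) role vectors
strings = rng.integers(0, n_sym, size=(N, L))         # symbol index at each position
onehot = np.eye(n_sym)[strings]                       # (N, L, n_sym)
data = np.einsum('nls,fs,rl->nfr', onehot, F_true, R_true)

# Alternating least squares: fix R and solve for F, then fix F and solve for R.
F = rng.standard_normal((dF, n_sym))
R = rng.standard_normal((dR, L))
for _ in range(20):
    M = np.einsum('nls,rl->nsr', onehot, R)           # role side collapsed
    A = M.transpose(0, 2, 1).reshape(N * dR, n_sym)
    B = data.transpose(0, 2, 1).reshape(N * dR, dF)
    F = np.linalg.lstsq(A, B, rcond=None)[0].T        # best fillers given current roles
    G = np.einsum('nls,fs->nlf', onehot, F)           # filler side collapsed
    A2 = G.transpose(0, 2, 1).reshape(N * dF, L)
    B2 = data.reshape(N * dF, dR)
    R = np.linalg.lstsq(A2, B2, rcond=None)[0].T      # best roles given current fillers

recon = np.einsum('nls,fs,rl->nfr', onehot, F, R)
print("mean squared reconstruction error:", np.mean((recon - data) ** 2))
# Should be near zero if the fit succeeds; the recovered F and R need only match
# the true ones up to equivalences such as an overall rescaling.
```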
And I think I will stop right there. Thank you for your patience.
[applause.]
>> Paul Smolensky: Two, two minutes left before noon. No parting questions?
>> Audience: [speaker away from microphone.] presentation of the Facebook, what
are the results? Do you have some results?
>> Paul Smolensky: We have not implemented what I showed you yet. The intern
who is working on it implemented a much more special purpose approach, which had a
lot of the similar ideas buried in it but not in such a generalizable form. And he claims
to be able to answer 18 of the 20 types of questions at this point. At last report.
>> Audience: [speaker away from microphone.]
>> Paul Smolensky: I believe that's the claim, yeah. Yeah. But how hard it is to do
that, making all of the assumptions that are being made, I don't really know.
>> Audience: But what's the data set, right? Synthetic data?
>> Paul Smolensky: Yup, yup.
>> Audience: In case you're curious, we have a data set that is not synthetic, that
has questions, we can point you to, if you care --
>> Paul Smolensky: Oh, yes, yes. These are not synthetic fictional stories?
>> Audience: No, it's [speaker away from microphone.] 500 short stories, short
fictional stories with multiple choice questions.
>> Paul Smolensky: Right, multiple choice questions -- I'm about to learn more about
those. Chris has sent me some stuff. Yeah.
All right, well, Friday I'll try to say something more interesting from a
linguistics/language point of view.
Thank you.