>> Eric Horvitz: Okay, we'll get started. It's an honor today to have Ken Forbus with us. He's the Walter P. Murphy Professor of Computer Science and Professor of Education at Northwestern University. Ken did his Ph.D. and undergraduate work at MIT. He's been, in my mind, one of the core people over the decades working on challenges of representation and reasoning with qualitative knowledge, as well as analogical reasoning and learning and spatial reasoning. More recently, from my watching his work over the years, sketch understanding and natural language understanding. He's been a long-term advocate and leader in this realm of cognitive architectures: what's an architecture for reasoning about the world, more generally? He's been working in a number of application areas. He's a fellow of AAAI, the ACM, and the Cognitive Science Society, and he's received a number of awards for his work over the years. It's always a pleasure to have him visit us. Ken. >> Ken Forbus: Thank you, Eric, for that lovely introduction. I'm going to talk to you today about some of the work we're doing involving learning from what we call bespoke data, custom data. Why bespoke data? Well, as you all know, big data is a big thing these days. My heavens, it's wonderful, right? Speech recognition is finally getting seriously good in many ways, simultaneous translation in Skype, all sorts of great things it's capable of. But it's not what you want for everything. For example, if you're teaching a child to understand stories and read, you're actually interacting with that child. If it takes that child a million exposures to a word to learn it, you've got a problem. There's evidence that human children can actually learn new nouns in a single exposure. There's something we're doing that's very different from what today's machine learning systems do. If you're teaching a child to do a jigsaw puzzle, or you're teaching an assistant a new task, you don't want to have to teach someone how to fill out a form a million times. You want data that's actually tuned to your estimation of their capabilities, so you can work with them effectively with the same kinds of ranges of examples that human collaborators take. Now, why do I care? Well, one of the things we're trying to do is actually achieve human-level AI. If you think about today's AI systems, they're like drag racers. They're very efficient, they're very fast. But like a drag racer, each one does one thing. It's carefully crafted to do one thing. It has to be carefully maintained, babied, and nursed back to health every time it falls down, by carefully trained experts who know its internals in a way that we don't know each other's internals. If we did, cognitive science would be trivial. Now think instead about dogs. You can task a dog to do all sorts of cool things. They don't require a whole bunch of maintenance. You feed them, you give them some affection. They don't blue-screen on you a lot, right? There are all sorts of great things there. What if our AI systems were as robust, trainable, and taskable as mammals? One way I sometimes describe what we're trying to build is a sixth-grade idiot savant from Mars: very smart, but clearly not from around here. Enough that we can communicate with it and work with it, but we're not trying to make something that passes for human. That's not the point. Now, the way we're doing this is the Companion cognitive architecture.
What we've done is reformulate the goal in a way that I think is actually pretty productive. We're trying to build software social organisms: things that work with people using natural modalities. For us that means natural language and sketching. Those are sweet spots, because natural language gives you expressive power and sketching gives you the spatial aspects. They should learn and adapt over extended periods of time, things that can learn for weeks and months or years without having human beings maintaining them. In fact, they should be maintaining themselves most of the time. We shouldn't have to know their internal representations, just as we don't know the internal representations of our assistants, or associates, or our children. Now, why social organisms? Well, first of all, it's going to make them very useful. But I also think it's actually essential to making things that are as smart as we are. Mike Tomasello has made this argument very plainly in several books now, and I think there's a lot of convincing evidence for it. Also, Vygotsky argued that much of our knowledge is learned through interactions with people. We sometimes call Companions the first Vygotskian cognitive architecture. That's a goal, not an achievement; we're trying for that. There are many things we understand that we've never directly experienced. Yes, you need to ground things in sensory information. But none of us lived through the Revolutionary War. None of us have seen molecules directly except with the aid of a whole bunch of very carefully crafted equipment. We see the results of plate tectonics, but that's inference, as opposed to actually watching plate tectonics in action, given the time scale on which it happens, and the same goes for most examples of evolution. You have to have conceptual knowledge as well as physical knowledge. Now, if you think about cognitive architectures, there have been a lot of them. Some people, when they hear about a new cognitive architecture, say, oh my lord, why one more? Well, Newell actually broke things down by the time scale of tasks: the Biological Band, with neural modeling; the Cognitive Band, thinking about skills; the Rational Band, more about problem solving and reasoning; the Social Band; and in fact there's a Developmental Band above that. If you look at most cognitive architectures, they've focused here. For instance, two of the leading architectures, ACT-R and Soar, have skill learning as their signature phenomenon. If you look at ACT-R, where they're going is down here; they've worked a lot to model fMRI data, for example. Where Soar's been going is here; for instance, they've had Soar agents that run for many hours in training exercises, a real tour de force, and they're actually headed even further up. With Companions, we care about these two bands. Just as when you're modeling a complex phenomenon, you model it at multiple layers. Ultimately those models all have to talk together, but you need to explore each layer. You can't say I'm going to solve this one before I solve that one, because then we'd still be doing quantum mechanics; we wouldn't have chemistry and we wouldn't have meteorology. You have to do things in parallel and have them talk to each other. For us, we're starting here. We assume these folks are handling skill learning really well, and they are, but we're more interested in how you do reasoning and learning at these broader time scales. That's the big idea of Companions. Now I'm going to tell you about the hypotheses in a little bit more detail.
I'm going to walk you through three kinds of examples: learning games, both strategy games and a very simple Tic-Tac-Toe thing, which I'll show you a video of; learning spatial concepts; and learning plausible reasoning. One thing you'll see is that in all these cases it doesn't take much data to learn these things. You can get a surprising amount if you know a lot and if you use analogical learning as your learning mechanism. By analogy I mean Gentner's Structure-Mapping Theory. The idea is that analogy and similarity involve structured relational representations: you have entities with relations connecting them up. You're comparing one description, the base, against another description, the target. That gives rise to a bunch of correspondences saying what goes with what. Then, if there are things in the base that don't have anything corresponding in the target, you can project those as candidate inferences, a kind of pattern completion over the symbolic descriptions. Now, it turns out there's a substantial amount of psychological evidence supporting this model, in phenomena ranging from high-level and medium-level vision to auditory learning, learning mental models, textbook problem solving, and conceptual change. The evidence comes both from purely psychological work with humans in laboratories, like reaction-time studies and protocol analyses, and from computational modeling. The story is actually pretty interesting. We have these three hypotheses about human cognition, and we think that to build AIs we need to take them seriously. The first is that analogy is a central mechanism of reasoning and learning. A lot of things we think of as rule-based reasoning are probably actually applying, by analogy, something that's a high-level structure that still has some concrete stuff attached to it. Dedre Gentner's article "Why We're So Smart" is a very good introduction to this idea. The second is that common sense is basically analogical reasoning and learning from experience. You start out with within-domain analogies; they provide robustness and rapid predictions. When I go to start up a new car, if it's the same kind of car I've had before, then I do the same thing and that works. I don't have to have a formalized rule saying, oh, it's a lock, I have to turn the key in the lock, or it's one of those fancy fobs I just keep in my pocket and don't have to touch. I just do it. So even with one example you can do an analogy to learn some other stuff. But we do generalize. The generalization process gets us to first-principles reasoning, but those principles emerge slowly as generalizations from examples. Now, by slowly I don't mean millions of examples; I mean dozens. It's very different in terms of time scale, and I think much more like what humans get. The third is that qualitative representations are central to this. Qualitative representations provide a bridge between perception and cognition, and they provide an appropriate level of understanding for communication, action, and generalization. Those are the three big hypotheses we're exploring. Now, there are these models of analogical processing we have. I'm not going to go into gory detail about each one of them, because each one of them is an hour talk by itself, but I'm going to give you a picture of how they all fit together. That will show you how they get used in the subsequent examples. Here, very roughly, is what we think happens. You have the Structure-Mapping Engine, SME, which matches two descriptions.
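To make the matching idea concrete, here is a toy sketch in Python. It is emphatically not SME, which enforces much richer structural constraints such as parallel connectivity and systematicity; the representation of facts, the greedy alignment, and the scoring rule below are all simplifying assumptions.

```python
# Toy sketch of structure mapping, NOT the real SME. Facts are nested tuples,
# e.g. ('greater', ('mass', 'sun'), ('mass', 'planet')). We greedily align facts
# with identical predicates, score the overlap, and project unmatched base facts
# as candidate inferences. All names and the scoring are illustrative only.

def match_fact(b, t, corr):
    """Try to align two facts with the same predicate, extending the entity
    correspondence map `corr`; return the extended map or None on failure."""
    if isinstance(b, tuple) and isinstance(t, tuple):
        if len(b) != len(t) or b[0] != t[0]:   # predicates must be identical
            return None
        for barg, targ in zip(b[1:], t[1:]):
            corr = match_fact(barg, targ, corr)
            if corr is None:
                return None
        return corr
    if isinstance(b, tuple) or isinstance(t, tuple):
        return None                            # entity vs. relation: no match
    if corr.get(b, t) != t:                    # keep correspondences consistent
        return None
    return {**corr, b: t}

def depth(fact):
    """Nesting depth, so more relational (systematic) matches count for more."""
    return 1 + max((depth(a) for a in fact[1:] if isinstance(a, tuple)), default=0)

def substitute(fact, corr):
    """Rewrite a base fact using the base-to-target entity correspondences."""
    if isinstance(fact, tuple):
        return (fact[0],) + tuple(substitute(a, corr) for a in fact[1:])
    return corr.get(fact, fact)

def align(base, target):
    """Return (correspondences, structural score, candidate inferences)."""
    corr, matched, used = {}, [], set()
    for bfact in base:
        for tfact in target:
            if tfact in used:
                continue
            extended = match_fact(bfact, tfact, corr)
            if extended is not None:
                corr, matched, used = extended, matched + [bfact], used | {tfact}
                break
    score = sum(depth(b) for b in matched)
    inferences = [substitute(b, corr) for b in base if b not in matched]
    return corr, score, inferences
```

On the classic solar-system/atom example, a matcher in this spirit would align the sun with the nucleus and a planet with an electron, and project any unmatched causal structure in the base as a candidate inference about the target.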
The descriptions can be either examples or generalizations, and the candidate inferences give you things like predictions, explanations, or a possible principle to apply in dealing with some new situation. Now, where do you get these things? You get them from your memory, of course. We have a memory model called MAC/FAC, Many Are Called but Few Are Chosen. It's designed to scale up to human-sized memories, because the first stage is a cheap filter using flattened versions of the structured representations, with a special kind of vector designed so that it predicts what SME will produce in terms of overall match quality. It's an inaccurate prediction, of course; that's why you have the second stage, which actually uses SME over several examples in parallel. That's where you get your stuff. Now, the generalizations happen because you have basically another model, SAGE, which uses SME and MAC/FAC, and which sits there and is given instances of a concept, like models for a word. We've done this with spatial prepositions of contact, for example, in English and Dutch. You basically build up models analogically by combining these things using the matcher, keeping the stuff that's in common and deprecating the stuff that isn't in common by lowering its probability of being there. Generalizations for us are probabilistic: we have frequency information about every statement in them. They're partially abstracted; there can still be very concrete things in them. They don't have logical variables. We don't need to go to variables, because we can just use structure mapping to do the matching. Now, sometimes we do introduce variables. In fact, there are a couple of cases where, for various reasons, we've gone through and turned these things into hardcore, real-life logical rules or probabilistic rules. But that's not what we have to do by default to make knowledge useful. That's the essence of how the Companion architecture turns over, at a high level. Yes? >>: You said one of the things you do is you take a bunch of similar examples and abstract out the commonalities. How do you know which examples are similar… >> Ken Forbus: There are two answers to that question. First, we have these generalization contexts, which are basically the analogical models we're building for a problem. Like, how do I interpret "on"? Or how do I interpret "in"? Another example: if I'm doing language processing and doing word sense disambiguation, then for this word in this sense, what are the ways in which it's been used? That's categorizing things in terms of relevance to a problem. Then, to get the things that you compare against, you use the same retrieval model to retrieve from the pool of things you've been building up. If two of them are sufficiently similar, you store the merged version of those things back as a generalization. The same retrieval model's used again. Yeah? >>: I'm still not sure. What is similarity? Is the similarity metric something general? Does it have… >> Ken Forbus: Okay, the similarity metric is computed by SME. It's defined in advance. It's the same similarity metric for word sense disambiguation, matching visual stimuli, story understanding, counterterrorism, moral reasoning; it's insanely robust. It's the structural evaluation score that SME computes for two descriptions. It really is that general.
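A toy sketch of how those two pieces might fit together, continuing in the same spirit as the matcher above. The content-vector construction, the thresholds, and the merge rule are simplified assumptions, not the actual Companion implementation; `match` stands for any structural matcher returning correspondences and a score (for instance the `align` sketch above), and `rename` applies correspondences to a fact (like `substitute` above).

```python
# Toy sketch of MAC/FAC retrieval and SAGE-style generalization; simplified
# assumptions throughout, not the real Companion code.
from collections import Counter

def content_vector(case):
    """MAC stage: flatten a structured case into a bag of predicate occurrences."""
    return Counter(fact[0] for fact in case)

def mac_fac(probe, memory, match, k=3):
    """MAC: a cheap dot product over flattened vectors picks a few survivors.
    FAC: the expensive structural matcher is run only on those survivors."""
    pv = content_vector(probe)
    cheap = lambda case: sum(pv[p] * n for p, n in content_vector(case).items())
    survivors = sorted(memory, key=cheap, reverse=True)[:k]
    return max(survivors, key=lambda case: match(probe, case)[1])

def sage_add(example, generalizations, match, rename, assimilate=0.8):
    """SAGE-style assimilation: fold an example into its most similar
    generalization, keeping a frequency for every statement; otherwise keep
    the example around as a new exemplar."""
    def similarity(g):
        _, score, *rest = match(example, list(g["facts"]))
        return score / max(1, len(example))          # crude normalization
    best = max(generalizations, key=similarity, default=None)
    if best is not None and similarity(best) >= assimilate:
        corr, *rest = match(example, list(best["facts"]))
        best["count"] += 1
        for fact in example:                         # shared statements stay probable,
            f = rename(fact, corr)                   # idiosyncratic ones become rare
            best["facts"][f] = best["facts"].get(f, 0) + 1
    else:
        generalizations.append({"count": 1, "facts": {f: 1 for f in example}})
    return generalizations                           # P(statement) ~ frequency / count
```

In the real system, the flattened content vectors are designed so that their dot product predicts SME's structural evaluation score, and the probabilities attached to statements are exactly these per-statement frequencies relative to the number of assimilated examples.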
Okay, so here are some experiments we did early on: learning physics from cross-domain analogies. You start with a model of linear kinematics, and then transfer into rotational kinematics, electricity, and thermal problems. Now, MAC/FAC retrieves the correct precedent only forty percent of the time. If you know the psychology of this, that's actually high, because cross-domain analogies are relatively infrequent. Well, the beast didn't know much, so it was going to be able to find stuff more easily. Like humans, if you actually give it the precedent, it was able to transfer the information eighty-seven percent of the time. We've also done this for learning games, with sixty games generated by an external contractor, in an experiment run by that contractor. I think this is still the largest experiment in cross-domain analogies that anyone has ever done. John Laird at Michigan developed a sort of basic game, which the contractor then went crazy with. You're trying to take this character and get it to the exit by building a bridge to escape. Tom Hinrichs from our lab did a mini version of Rogue. There was a little version of Mummy Maze here, and a variant of that. What happened was you'd learn a game, a game that you'd never seen before. The Companion would basically learn HTN strategies by experimentation. Now, it knew enough about games that if it couldn't master a game in ten trials, it quit, because it never would. These games are easily the complexity of the kind of Atari games that it took the system in the Nature paper thirty-eight days to master. I'm not so impressed by those results. Now, given one of those learned games as a base, you would then try to learn some new games, and you'd measure how much faster you learn. Positive transfer is good: almost sixty percent of the time, it would have learned fifty percent faster given the prior game. There is some negative transfer; fifteen percent of the time it learned twenty-five percent more slowly than if it didn't have the analogy. We're kind of excited about that. Now, modalities: we want natural interaction. That's still a really hard problem, very much an open research problem. We've made some progress, but let me show you where we are. Our natural language approach is kind of different. We're focusing on deep understanding, all the way down to things you can do reasoning with. We're willing to simplify syntax, just like human cultures do with children. We're quite happy to simplify syntax because we want to go all the way to reasoning and decision making. We use James Allen's parser and ResearchCyc Knowledge Base contents, and we've been building our own lexicon to replace COMLEX. We use Discourse Representation Theory as the representation for semantic interpretation. We have a query-based abductive semantic interpreter, and we use the idea of narrative function: trying to figure out what a statement is telling you in the context you're working in. For example, in solving moral decision problems, the system is looking for a set of choices that it has to decide among. It's looking for the utilitarian costs of each decision, and it's also looking to see if any sacred values are involved. It will basically decide, in a human-like way, what to do based on those factors. The interesting thing is that those factors are the same for any moral decision problem.
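Here is a minimal sketch of the query-driven flavor of that interpreter, under heavy simplifying assumptions: the question patterns and predicate names below (choiceAvailable, utilitarianCost, sacredValueInvolved) are invented placeholders rather than the actual KB vocabulary, and real candidate readings come from parsing and abduction rather than arriving as ground facts. The point of structuring it this way is the scaling property discussed next: the work is driven by a fixed set of questions.

```python
# Toy sketch of query-driven (top-down) interpretation: instead of abducing every
# consequence of every sentence, we only look for answers to a fixed set of
# questions. Question patterns and predicate names are illustrative placeholders.

QUESTIONS = [                      # the same questions for any moral decision problem
    ("choiceAvailable", "?choice"),
    ("utilitarianCost", "?choice", "?cost"),
    ("sacredValueInvolved", "?value"),
]

def interpret(statements, questions=QUESTIONS):
    """Return bindings for each question, gathered from candidate readings of the
    text. Work grows with len(questions), not with sentences considered jointly."""
    answers = {q: [] for q in questions}
    for question in questions:
        for fact in statements:               # each statement is one candidate reading,
            bindings = unify(question, fact)  # e.g. ('utilitarianCost', 'pushOne', 'oneDeath')
            if bindings is not None:
                answers[question].append(bindings)
    return answers

def unify(pattern, fact):
    """Match a query pattern (with '?var' slots) against a ground fact."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    bindings = {}
    for p, f in zip(pattern[1:], fact[1:]):
        if isinstance(p, str) and p.startswith("?"):
            if bindings.get(p, f) != f:
                return None
            bindings[p] = f
        elif p != f:
            return None
    return bindings
```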
Normally, abduction grows exponentially with the number of sentences. In this system, it grows with the number of questions you're asking rather than with the number of sentences. By formulating it as a top-down problem, you can do pretty well. Here's an example of the kind of thing you can express: "Because of a dam on a river, twenty species of fish will be extinct." On the left is the predicate calculus, with predicates drawn from the Cyc Knowledge Base. If you're going to implement DRT, Cyc's microtheory machinery is your friend, because it's all about context. Each of these boxes is a microtheory in Cyc, with statements relating the microtheories to each other. This is the DRT version of that same hunk of predicate calculus. So it takes some work, but it can be done. Now, what do you do with this? Well, one of the testbeds we use is the strategy game Freeciv, an open-source version of Civilization II. It's cool because it's got spatial concepts: you've got terrain of different types, and you have to design transportation networks and figure out where to place your cities. You've got a complex economy; you can go bankrupt. You have a guns-versus-butter tradeoff, investment versus immediate spending. There's all sorts of complex stuff in here. It's a wonderfully rich domain. Now, one of the things you notice when you think about this game: qualitative reasoning about dynamics largely grew up around engineering, and in engineering you have a blueprint. You know all the parts in advance. That's a really simple world. In this world it's not that way. In this world you're reasoning about things that don't exist yet, and about things that can get destroyed. Object-level representations are impractical when you've got limited attention, storage, and processing time. If I have to build an explicit qualitative model for every tile in the game, I'm hosed; that's not going to scale. You have to plan for things that don't exist yet; you have to build models of the dynamics for things that don't exist. So we've introduced a kind of type-level qualitative model that uses predicates and collections as arguments. For instance, qprop+ normally says the first quantity is causally determined, in part, by the second quantity (I shouldn't stand in front of the screen). The type-level version says: a quantity of this type and a quantity of that type, applied to objects of these types, with this relationship between them, have a qprop between them. In other words, this cashes out to: for all x and for all y, if x and y are GaseousObjects and x is the same thing as y, then the pressure of x is qualitatively proportional to the temperature of y. That's how it's translated to the instance level, but you do the reasoning at the type level as much as possible.
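Written out as a formula (illustrative notation only; the actual Cyc and Companion predicate names differ), the type-level statement and its instance-level cash-out look roughly like this:

```latex
% Illustrative notation; not the exact KB predicates.
\text{qpropType+}\big(\text{Pressure},\ \text{Temperature},\ \text{GaseousObject},\ \text{GaseousObject},\ \text{sameObject}\big)
\;\equiv\;
\forall x\,\forall y\;\big[\text{GaseousObject}(x)\land\text{GaseousObject}(y)\land x = y\big]
\Rightarrow \text{qprop+}\big(\text{Pressure}(x),\ \text{Temperature}(y)\big)
```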
Now, if you're dealing with generic statements, like reading a simplified version of the Freeciv manual, this is a true blessing. Here's a little bit of the manual, and here's the translation after the whole abductive reasoning process has happened on the first sentence. It's detected these things; I've written this in frame-like syntax just for easy visual processing. Normally there are a lot more parentheses and they're all independent statements. You've got a type-level process that's a generic production process; that kind of event is tied to the language in the Cyc KB, we didn't introduce that. There's some event; it refers to a production from the sentence, and that's a discourse variable. It's done by something that's a Freeciv-City. The output created is something that's a type of amount of food, and there's a positive influence on the amount of food from the rate of production. Now, take the second sentence, and remember this because you'll see it again: as the population of the city increases, the food production of the city increases. This is a classic qualitative proportionality introduction. It's saying there's a positive qualitative proportionality, the constrained quantity is the rate of food production, and the thing that's governing it is the city population. Finally, "a citizen consumes food in the city": you're introducing another process. Notice it's a destruction event, another kind of thing from the ontology, and now you're filling out its dynamic consequences. These qualitative models are very powerful. One of the things about strategies in games like this is that they involve tradeoffs. Analysis of the qualitative model can identify tradeoffs and let you reason about them. And even a little bit of advice improves performance. >>: Is time represented in the previous slide? >> Ken Forbus: These are all things that are happening continuously while the process is active. In other words, it's a mechanism by which behavior is generated, as opposed to a description of the behavior itself. We have other ways of describing the behavior itself. Okay, so it turns out the type-level representations are very useful for advice. "Irrigating a place increases food production," absolutely true in the game. "Adding a university in a city increases its science output." Now, you take a half dozen pieces of advice like that and ask, okay, how well do I do at producing science? This is averaged over ten different games. By ninety-seven turns the experimental condition is still doing better on population, though not hugely better, because that's not what it's optimizing for; it's optimizing for science output. Here you're getting to the point where you can build libraries and other units; before that you couldn't build them. These two look pretty much the same on science output early on, but then the advice kicks in and you start getting more science produced by the cities. Okay, so that's one modality, and also a little bit of the game learning. Now, what about the other modality? Sketching is a very natural way for people to communicate knowledge. This is a picture of a geological formation. If you're an instructor in geoscience, what you want to see when students mark it up is something like this: that's the main fault, these are the directions of movement, these are called marker beds, and these indicate the displacement of the marker beds. We've actually built something with our sketch understanding system that handles things like this. It's domain-independent, which is very important for us because we want this to scale. In an experiment at the University of Wisconsin-Madison, a geoscience grad student made fifteen worksheets, and they showed significant pre-post gains using those worksheets in their intro geoscience class. We've had similar results with a unit on the heart for fifth-grade biology, and we're about to go back into the classrooms in an engineering design course for learning engineering graphics. We have independent evidence from laboratory studies that you can actually do assessment by looking at what order people draw things in a sketch, and at what they include and don't include when they're annotating diagrams. This is a massive effort involving a lot of people.
He is a well-known structural geoscientist. Bridget is the grad student who actually cranked out the worksheets; she's really good at this. Maria and Jeff are two of the CogSketch developers. That's a whole talk in itself, but I'm not going to go there. Instead I'm going to pivot to other roles for sketching. Sketching is also a tool for thinking. >>: [inaudible] completely. Can you say a few words about the ordering next? >> Ken Forbus: Okay, so if I have, for instance, a photosynthesis diagram, then a student who doesn't know photosynthesis will start with the visually salient parts. A student who understands photosynthesis will start from the input and go through the causal chain to the output. Very simple to catch, and if you've got digital ink, you can know what order people drew things in. Yeah, this has happened in a couple of domains now. Basically, if a student understands a domain well, for instance in geoscience, there are certain things they'll pick up versus certain surface features that are literally irrelevant. It's pretty easy; it's not rocket science to tell the difference. It's not a subtle signal. Okay, so this is a sketch by a painter, Shonah Trescott. She was on an Arctic expedition. You'll see some little stick figures here, and some notes about the background: no distinct horizon, disappearing figures. In the Arctic you can't really paint, and even in more genial climates artists often do this. This is the painting that resulted. She's perfectly capable of doing all sorts of very fine, subtle visual work, but to just think it out, she first did a sketch. This is pretty common. Sketching is an aid to thinking. What you'd really like is systems that can sketch with us, and sketch for themselves, in human-like ways. Here's a long-term vision. You want software that understands sketches as people do. Now, what does that mean? Here's a ramp and a block under gravity. You can infer the block will slide to the right and down, perfectly sensible. That requires fluent, natural interaction, human-like visual and spatial reasoning, and conceptual reasoning about the contents of the sketch, and you want it to be domain-general. If you look at the sketch recognition literature right now, every sub-problem and every domain is a separate system. You have to train the recognizers, you have to build new software. That doesn't scale. That's one of the reasons we've done some engineering workarounds for segmentation and conceptual labeling. We actually don't do recognition on the whole, because it turns out you don't need to. When people are talking to each other during sketching, that's how a lot of the labeling happens; it doesn't require recognition. Recognition is at best a catalyst. If you look at sketch recognition research, it's focused solely on that topic. We're focused on the rest. Even if recognition were perfect, and there are reasons to suspect it can never be perfect at the level you need, especially for education purposes, you still have to do what we're doing. Let me show you what that involves. The thing I'm about to show you is something you can do yourself if you download CogSketch, fire up the Design Coach, and ask it a question about the behavior. What I'm going to show is basically a rational reconstruction of the reasoning the system, with its truth maintenance system, is actually doing. Here's our ramp, so we have the ramp and the block.
CogSketch recognizes visually that they're touching directly, which causes it to extract an edge representing that surface contact. It computes the surface normal of that edge, because that's very relevant to how forces transmit. It's in quadrant three here; that's a qualitative way of describing angle. Now you think about gravity, and here we're using some of Jon Wetzel's qualitative mechanics work: you have a force applied to the object in the downward direction, and that is the only force on it. We're assuming friction doesn't matter here. Now you have a translational constraint from this edge saying it can't move in the downward direction. That, plus a little bit of other reasoning, says the translational motion will be in quadrant four, i.e., to the right and down. That's how it infers that stuff. Now, it turns out there are two parallel literatures, one in artificial intelligence about qualitative reasoning and one in cognitive psychology about categorical versus coordinate representations, and they're talking about the same thing. They're interestingly complementary literatures. The other thing is that the structure-mapping processes are used in visual reasoning. Let me show you some work from Andrew Lovett's thesis. Geometric analogies: A is to B as C is to one of those. If you download CogSketch, you'll see all of Evans' problems in a sketch that you can play with and experiment with. Andrew's model has the lovely feature that, of course, it gets them all right; this turns out to be a very easy task. It makes reaction-time predictions that are borne out in human behavioral experiments. In fact, there's a later paper where, by adding working-memory constraints and two strategies, which is the model you can actually play with in CogSketch, the correlation coefficient is point nine something. It's insanely good; it really is a simple task. A much harder task is Raven's Progressive Matrices, which is commonly used as a test of fluid intelligence in humans. Andrew's model is better than most adult Americans. Again, it makes reaction-time predictions: what's hard for people is hard for the model and vice versa. Finally, there's an oddity task that Dehaene and Spelke used to look at cross-cultural differences in geometric processing between Americans and the Munduruku. Andrew's model, again with the same representations for all these things, all automatically computed from PowerPoint stimuli copied and pasted into CogSketch, solves most of the problems. What's hard for it is hard for people and vice versa. You can do ablation studies on the models, obviously not on the people, and get some insights into what's happening across these different cultures. Okay, now at the risk of putting everybody to sleep, I'm going to switch briefly to a video. This is showing you a Companion learning Tic-Tac-Toe. [video] This is a demonstration of flexible multimodal instruction in the Companion cognitive architecture. We're going to teach our computer to play Tic-Tac-Toe through a combination of natural language and sketch interaction. We start by introducing the topic. I say, I'm going to teach you to play Tic-Tac-Toe. This provides some expectations about how to interpret future statements. >> Ken Forbus: You'll see the predicate calculus coming down here, by the way. [video] We create a new sketch, and I start by classifying the game. I say Tic-Tac-Toe is a marking game, as opposed to a piece-moving game or a construction game. This tells it to expect some kinds of marks as well as a board.
Now, we're going to draw the Tic-Tac-Toe board. It doesn't know what this is; it's not recognizing the ink or anything, but the board is going to be the background. We're going to explain what this is in natural language. I type: Tic-Tac-Toe is played on a three-by-three grid of squares. That contains a lot of information. It tells us that there's a spatial configuration, that it's Cartesian coordinates, and that the maximum extent of any dimension is three. On that basis it's able to label the glyph we just made as the board. Now, we go on to classify the game some more. We tell it that Tic-Tac-Toe is a two-player game, which means that it can expect us to introduce some player roles. X is a player. It understands that to mean that X is a game player, not an athlete or a musician. We can now draw an X. It doesn't recognize the X; rather, it understands in context that this must be the X. You can see that it labels it with its own little x below it. Now I can proceed to make an O. I can do this before I've introduced the player; it doesn't recognize it. I enter O as a player, and we can see that it understands and labels the O. The next thing we do is describe the actions in this game. I tell it that X and O take turns marking empty squares. That contains a lot of information about turn taking, the kinds of marks made, and the precondition that marks go on empty squares. The next thing we enter is a description of the goal. We're only describing one goal here: we're saying that the first player to mark three squares in a row wins. It only understands row to mean a horizontal row, not a column or a diagonal, but through demonstration we will teach it the other win conditions, such as vertical columns or diagonals. Then "X goes first" is highly ambiguous and context-sensitive, but it tells it who starts the game. At this point there's enough information in this representation to play a legal game. I tell it to start a game. What it does is take those marks I've made and put them in a catalog on a separate layer, create a new layer for the game state, and make its move. It made an X in the middle-left square. I respond with an O in the center square. It's playing a blind, random game; it doesn't understand strategy yet. I'm going to be a little cruel here: I'm going to demonstrate winning by a diagonal. I mark three in a row on the diagonal. In order to help teach it a new rule, I select the winning configuration. >> Ken Forbus: Our way of doing deictic reference in sketches. [video] It doesn't know that it's lost yet, so I tell it I win. It takes the winning configuration and creates a new rule for winning on the diagonal. We can inspect this rule if we look in the command transcript window. In this case the rule is specific to the particular diagonal; it will require another trial to learn the opposite diagonal. But if we demonstrate winning on a vertical column, it will generalize this to all columns, analogously to the rule it already has for rows. We only need a total of three trials to learn the complete rules of Tic-Tac-Toe through a combination of instruction and demonstration. This concludes our demonstration of flexible multimodal… [video ended] >>: Say that middle vertical column was not a win, in some special variant of no-vertical-win Tic-Tac-Toe. What would happen at that point? Would it realize it had generalized in an inappropriate way, and back off and create a special sub-case? >> Ken Forbus: Yes, except, now, this version won't.
Actually, if it learns a bad rule, it's hosed. >>: It's really hosed? It can't go [inaudible]? >> Ken Forbus: This version, no, right. Yeah, for that project our focus was entirely on what you have to do in terms of multimodal interaction to bootstrap these things. Okay, so what was going on under the hood there? You've got your language coming in, translated to a general logical form, a predicate calculus interpretation. It's interpreted in terms of communication events and instructional events, and it understands how to interpret them. In some cases we have to fall back on general heuristics for interpretation. In this case, no, because there's enough knowledge about the context: I'm being taught a game. That's an incredible driver. The digital ink gets turned into symbolic representations also. Interface gestures, like the fact that someone just added a glyph, also get turned into the same communication stream. That's what leads to building game rules. The interpretation process is as described earlier. This turns out to be a lot of fun, because the Cyc Knowledge Base has a boatload of stuff in it. In an earlier system, we talked about "the Jordan River enters the Dead Sea," and it thought there was an entering-a-container event where the container was the musical group the Grateful Dead. You get all sorts of amazing things if you have something that knows stuff about the world sitting there interpreting these things, until it gets down to the point of saying, well, X plays the role of something in a game. There's a whole set of instructional events. If you look at the intelligent tutoring literature, this is a spin on some of the kinds of instructional events you see there, but game-specific. For instance, win conditions are not something general, but defining configurations is pretty general, and the introduction and action generalization are pretty general. The game classification is from earlier work on GDL. There's basically nothing there that we really added ourselves. Communication events include both the stuff from language and the stuff from interpreting what's happening in the sketch world. Now, where we're going next with this: we've done hexapawn, which is kind of trivial; the only difference is that it's a piece-moving game. Tom can now describe all the regular moves of chess to the system by language and sketching. It can't do castling, capturing en passant, and pawn promotion yet. All of those are things that you really want to do by language; you really don't want to do them by demonstration. How do you demonstrate that you can't castle if the king has already moved? That doesn't work so well. We'd then like to expand to discussing strategy as well as rules for play. Now, spatial concepts. One place we've done this is Freeciv, where we looked at geospatial concepts. We can map a Freeciv map into CogSketch and sketch on top of the map. Here, Matt has drawn a circle and said, this is a strait. What that causes it to do is take the encoding of that region and shove it into a generalization context for the word strait. It's using that to build up a model of the concept. You can then draw circles elsewhere and ask it to classify those things. It's basically saying: if I do analogical retrieval over my encoding of this, across the whole set of generalization contexts for the concepts, which one gives the strongest match? That's its way of doing classification.
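A minimal sketch of that classification scheme, with simplified assumptions about the data layout: `match` is any structural matcher returning correspondences and a score, and in the real system the per-concept pools hold SAGE generalizations and exemplars, with retrieval going through MAC/FAC.

```python
# Toy sketch of classification by analogical retrieval over per-concept
# generalization contexts: "whose pool contains my strongest match?"

def learn(example, concept, contexts):
    """Teaching 'this region is a strait' just files the region's encoding
    into that concept's generalization context."""
    contexts.setdefault(concept, []).append(example)
    return contexts

def classify(probe, contexts, match):
    """contexts: {concept: [case, ...]}; match(a, b) -> (correspondences, score, ...).
    Return the concept whose best-matching stored case scores highest."""
    def best_score(cases):
        return max((match(probe, case)[1] for case in cases), default=0.0)
    return max(contexts, key=lambda concept: best_score(contexts[concept]))
```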
This works okay: sixty examples, ten per concept, tenfold cross-validation, about seventy-six percent accuracy. I'm not excited by that, but when you look at similar tasks in this area, it's not out of line. I think we should be able to do much better. One thing I worry about is overfitting; it's always dangerous to just use your own data. There's this lovely corpus by some folks at TU Berlin and Brown. What they did was take two hundred fifty concepts, like snowman, ice cream cone, giraffe, grapes, bunny rabbits, airplanes, et cetera, and got people on Mechanical Turk to make eighty sketches per concept. That's twenty thousand sketches. Now, they claim in their paper that this exhausts everyday objects. I don't believe it; it includes things like hand grenades and flying saucers, neither of which is an everyday object for me, and I especially hope they never become everyday objects for me together. But it's still a great corpus, and it's very tough. Here's the experiment. What they did was use pixel-level image properties and machine learning classifiers. We're using CogSketch encoding, because it's all ink, we just suck it in, plus analogical learning. Now, we start with ten concepts. Why ten? Partly runtime, because we have not engineered this the way we need to in order to scale up for large batch experiments, but also another problem that you're going to see in loving detail in a minute, because some of these things we just can't do yet. We know why, and we have partial solutions. We encode with CogSketch, eight-fold cross-validation, with those numbers of concepts. Typically, when we do analogical generalization, within twelve examples we're as good as we're going to get. The only example before this that was different was counterterrorism, where we needed about thirty to reliably figure out who the perpetrators were in some events. This was how they broke it up, and we encoded it their way. Here are the results for three versions of our system. These are two extremes in SAGE. This is the knob where you say I want everything to assimilate: I set my assimilation threshold so low it always merges everything into one model. Turned the other way, it has to be a near-perfect match before you'll assimilate. It's a way of doing both prototypes and exemplars in the same model. The other line is like these two, using SAGE, but it also has automatically derived near misses. If you get something retrieved from a neighboring concept that's mutually exclusive, that's obviously a near miss. It constructs small hypotheses about what those differences are, and in some cases that significantly helps in telling concepts apart. Here the differences turn out to be not so much. If you look at the pixel-level approach, it's also about fifty-six percent, but over all two hundred and fifty categories. So for the categories we can do at the moment, hey, we do about the same as they do. But there are two things that disturb me about this. The first is, you know, why so low, and why so slow? The second is that humans are up there, according to a Turk experiment they did on their dataset. That's a lot of headroom, and it really bothers me, because usually we get human-like performance from a small number of examples. What's going on? Well, if you look at it closely, here's at least one thing that's going on, and there could be others. Take a turtle; this is from their corpus. Here's a visualization of the CogSketch analysis of it.
It breaks things up into edges, and you can see the junctions between the edges. It also constructs edge cycles, connected sets of edges. Those are often cool because they segment stuff into objects. These edge-connected objects are higher-level descriptions that group those things together. Now, bad move: instead of just picking one level, the one that's most informative, we threw them all in. That's a lot of facts: four hundred and sixty-four facts for this turtle. Now, from other experiments, it looks like for analogical matching, if you can't handle between ten and a hundred relations per description, you're not in the game. You're not going to understand stories, you're not going to do moral decision making, you're not going to do visual processing, you're not going to solve textbook problems. But this is a lot. And when you start getting more texture, it gets worse. There's a lot of texture in this turtle, a lot of regions in this turtle, so what do you think is happening? Here's a principle we're trying to extract from this: part of the goal of encoding processes should be to construct concise, informative representations. We've always worried about informativeness, but we've never really worried about conciseness. With perceptual processing, that's a mistake. We think this is going to be an internal metric in the cognitive architecture. You start thinking in terms of upper bounds on assertions, and re-representing to abstract away detail when you exceed them. There is some evidence people do this, but it's very weak evidence; I would not take it to the bank. >>: Are you talking about a concise representation, or just a way of separating, essentially figuring [indiscernible], or so that it may be a layered representation? >> Ken Forbus: Oh, yeah, no, we're assuming layered. >>: Okay. >> Ken Forbus: CogSketch has three, actually four, different layers of representation. It can do groups, objects, and edges, and there's a fourth in terms of surfaces. It'll actually dynamically move from representation to representation. It couldn't do Raven's without that, for instance, or the [indiscernible] task. But we have to add more. If you're thinking of this in vision terms, this is clearly a texture problem. We're looking at planar Ising models from the vision literature to handle these regular, repeated structures. You basically take a whole bunch of elements that are similar in some way and turn them into a description that's one big chunk of stuff, which has a bunch of properties about the size and the eccentricity and all that. You've basically re-represented it. Now, that works for some of the turtles. It works nicely for this one, and this one, oh my god, that was a nasty little turtle before, right, because look at all the different textures in there. It works for this one; this is all one unit now, with some extra properties describing the visual properties of the things inside. >>: I missed what you're doing here to abstract [inaudible]? >> Ken Forbus: The idea of these Ising models is that you put little control points at various spots and ask, can I get rid of these? Are these things sufficiently alike that I can merge them together into one unit? Then you do some energy minimization over that. >>: [inaudible] understand that? >> Ken Forbus: Well, we didn't invent the technique. We just got it off the shelf. >>: Okay [inaudible].
>> Ken Forbus: But it isn't like… >>: It's for texture detection, right? >> Ken Forbus: What? >>: Ising models are for texture detection, right? >> Ken Forbus: Yeah, and we're treating this like texture. >>: It would be hard to do for hair, for example. I don't know, like strands of hair might be [indiscernible]? >> Ken Forbus: Yeah, and you know, people put in what the sketch recognition community calls adornment or decoration, right. >>: [inaudible]. >> Ken Forbus: Yes, people do that. You really have to learn how to handle that. Now, that works sometimes; it does not work all the time. Here it didn't actually figure out that you should group these things that way. Here it turned the entire turtle into one big blob. We're still trying to figure out what's going on there. Now, I'll close with one more example. This is learning to do inference on structures. This is not a problem we picked out; this is a machine learning community problem. The idea is, well, you've got the semantic web, which is growing by leaps and bounds. You've got the whole knowledge graph thing, and Google, and I'm sure Microsoft has an equivalent thing. >>: It looks like [inaudible]. >> Ken Forbus: I'm sorry? >>: It's like [indiscernible]. >> Ken Forbus: Okay. >>: Yeah, it's just like [indiscernible]. [laughter] >> Ken Forbus: Can you do traditional logical inference over it? Sort of, but the data's incomplete and noisy. Statistics? Well, they're structures, not feature vectors. If you're a dedicated machine learning person, you'll say, we can fix that: we can vectorize those suckers. You make a distributional vector space, you take the relations, you mush those into vectors and you crank around; you make a tensor network and you do stuff. Now, I think that's sort of sad. I think feature vectors are inherently not as expressive as relational representations. If you had the structure to begin with, it's really a mistake not to exploit it. You get good accuracy, but it requires a lot of examples, it has to train over all the relations at the same time, and it lacks interpretability: you get this number saying this is the best one, but you can't say why. Let me show you a better way. We build cases for analogy by path-finding. The insight here, which is not unique to us, is that if you have a relation between two entities, they're likely to be indirectly related to each other in a bunch of other ways. As usual with these sorts of algorithms, you put in limits on branching and search for tractability. Say one person is a parent of another; you also generate negative examples by corrupting that triple into something else. That's the same technique these folks used. Now, what we do is basically search through the database and grab a bunch of relations tying those two things together. Those become a case. We grab only ten positive cases and permute those to get ten negative cases. Ten; remember that number, it'll come up again. Then, for template learning, we do analogical generalization. We take two cases and match them to find out what corresponds, and then we use SAGE to construct templates. If, after this match, these are mushed together, then the statements that only appear in one of them get probability 0.5, and the statements that appear in both get probability 1.0. If you keep going, you can see that the things that aren't in common get smaller and smaller probabilities, while the things that are in common stay high.
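Here is a rough sketch of that case-construction step, with made-up limits and a very crude notion of "paths between the two entities" (intersecting the two entities' neighborhoods). Facts are stored predicate-first so they line up with the matcher sketches earlier, and the templates themselves would come from merging about ten such cases with the SAGE-style machinery sketched above.

```python
# Toy sketch of building analogy cases from a knowledge graph by path-finding,
# plus corrupting triples to make negative examples. Depth and branching limits
# and the corruption scheme are simplified assumptions.
import random
from collections import defaultdict

def index(triples):
    """triples: iterable of (head, relation, tail). Facts are stored
    predicate-first, i.e. (relation, head, tail)."""
    by_entity = defaultdict(list)
    for h, r, t in triples:
        by_entity[h].append((r, h, t))
        by_entity[t].append((r, h, t))
    return by_entity

def neighborhood(start, by_entity, max_depth=2, max_branch=20):
    """Entities and facts reachable from `start` within max_depth hops."""
    seen, facts, frontier = {start}, set(), {start}
    for _ in range(max_depth):
        nxt = set()
        for entity in frontier:
            for r, h, t in by_entity[entity][:max_branch]:
                facts.add((r, h, t))
                other = t if entity == h else h
                if other not in seen:
                    seen.add(other)
                    nxt.add(other)
        frontier = nxt
    return seen, facts

def build_case(head, tail, by_entity):
    """Keep facts whose endpoints lie in both entities' neighborhoods, i.e.
    roughly the relations tying head and tail together. That set is the case."""
    ents_h, facts_h = neighborhood(head, by_entity)
    ents_t, facts_t = neighborhood(tail, by_entity)
    shared = ents_h & ents_t
    return {(r, h, t) for (r, h, t) in facts_h | facts_t
            if h in shared and t in shared}

def corrupt(fact, entities):
    """Negative example: swap the tail entity for a random different one."""
    r, h, t = fact
    return (r, h, random.choice([e for e in entities if e != t]))
```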
Now, you can do better. The usual way we do these things is to just use SME and let that be that. What Chen figured out is that there are cases where, for a given task, some properties are really more important than others. You'd like to learn which those are, and you'd like that to bias the match. Think about the numbers computed by a structure-mapping computation, and now convolve them with task-specific importance weights. He computes those by extending logistic regression to work on structured data. Think of an analogy between vectors and structured cases: the dot product becomes structure mapping, and vector addition becomes structure addition, which SAGE is already doing with the alignment step. Then you train with gradient descent, et cetera. In the traditional way of doing it, with input vectors, each vector position tells you what goes with what; that's trivial, and then you take your dot product. With structure mapping, these are expressions, so I have to compute what goes with what, and then I get the quantities that I can train by gradient descent. How well does it do? Well, there are two datasets people have used for this: a WordNet dataset and a Freebase dataset, looking at eleven and thirteen relations respectively, with a large test set and a whole bunch of training data. The other models typically use ten thousand training examples and train on all relations at once. We use ten, and we can train for each relation independently; if you add a relation, we don't have to retrain all the others. How does it come out? That's the scoreboard, that's our system, and you'll notice the top systems in the two cases are not the same. We're right up there. We're not the best on either corpus, but we're right up there with three orders of magnitude less training data. I think that's pretty cool; I'm excited about this. We've done this plenty of times before on datasets that were generated from our own reasoning systems, but to be able to do it on datasets that the machine learning community has used, and get the same performance, we're very happy about that. >>: Do you gain anything with more training examples? >> Ken Forbus: Not at the moment. We're trying to figure out why that is. >>: Are the examples randomly chosen? >> Ken Forbus: Yeah, they're randomly chosen. >>: Or [indiscernible] the space in an interesting way based on the [inaudible]? >> Ken Forbus: They're randomly chosen. Now, the other thing it gives you is explanations. You can say, basically, by way of evidential explanations, what's in the matching. This example is deliberately chosen to not be easily interpretable. If you had a natural language generation system to test your knowledge, you'd have a hypothesis like, well, this person is Tongan, and you'd ask the human yes or no. They're going to say, uh, right, unless they have a lot of world knowledge, in which case why are you learning this knowledge? But you can say: well, his parent is Tongan, and a person from the same country as him also has Tongan ethnicity. Then you can look at the explanation and have more context; you have some knowledge about the reason why the system believes that. We think that's also very valuable. By sticking with structure, you learn faster and you get explanations, which are not inconsiderable advantages.
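A minimal sketch of the "structure mapping as the new dot product" idea, flattened considerably: the features are simply which template statements find a correspondent in a case, and a standard logistic regression learns an importance weight per statement. This is an assumption-laden approximation, not the actual algorithm; `match` is again any structural matcher that returns correspondences first.

```python
# Toy sketch: logistic regression whose "features" come from structure mapping
# between a learned template and a case, with one importance weight per
# template statement. Simplified assumptions throughout.
import math

def aligned_features(template_facts, case, match):
    """1.0 for each template statement that has a correspondent in the case."""
    corr = match(list(template_facts), list(case))[0]
    feats = []
    for fact in template_facts:
        renamed = tuple(corr.get(x, x) for x in fact)   # rename entities via corr
        feats.append(1.0 if renamed in case else 0.0)
    return feats

def predict(weights, bias, feats):
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))                   # logistic link

def train(template_facts, cases, labels, match, lr=0.5, epochs=200):
    """Stochastic gradient descent on the usual logistic loss; labels are 1 for
    positive cases and 0 for corrupted ones."""
    weights, bias = [0.0] * len(template_facts), 0.0
    for _ in range(epochs):
        for case, y in zip(cases, labels):
            feats = aligned_features(template_facts, case, match)
            err = y - predict(weights, bias, feats)
            weights = [w + lr * err * f for w, f in zip(weights, feats)]
            bias += lr * err
    return weights, bias
```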
Just to wrap up: for some purposes, bespoke data is better than big data. If I'm trying to train software assistants, and I'm looking at you, Cortana, you want to be able to say something to it once, like: never show me Fox News in my notification stream ever again. I can't say that, or I can say it but it won't listen. You want human-like learning of tasks, because you want assistants to be at least as good as we are at learning the tasks we're training them for. Rich knowledge supports learning from bespoke data. We didn't use analogy in learning Tic-Tac-Toe; we actually just talked it through, and because the system knew about games more broadly, knew about sketching, knew how to connect those things, and knew about instruction, it was able to interpret that information in a way that mattered. If you're doing learning, structure mapping can support learning with a small number of examples. Better representations, like sticking with structure when you have it, can drop your data needs by three orders of magnitude, as in the case we just saw, and we think that's true more broadly. A little joke here: just like orange is the new black, structure mapping is the new dot product. [laughter] Okay, so I'll end by thanking all the people who really made all this happen. Thank you. [applause] Questions, yeah? >>: Earlier in your talk you mentioned the word understanding. >> Ken Forbus: Yes. >>: Do you have a definition for that? What does understanding mean? >> Ken Forbus: That the system constructs representations that enable it to do the kinds of things that we would expect people to do, given the same material and the same context. >>: So the only way to measure whether a system understands is by having humans in the loop? >> Ken Forbus: Or by measuring its performance in some other way. Here's an example from another study we've done. You take little kids and give them a forced-choice task where you have things that are the same along one dimension versus another. It turns out you can pick the task so that four-year-olds can't do it, they're at chance, while eight-year-olds can do it. Then, by cleverly reordering the stimuli, you can get four-year-olds to learn it, without feedback. This is something Dedre Gentner and her student Laura Kotovsky did quite a while ago. Now here's the cool thing. If you're going to simulate that, you need to do two things. You need to figure out how to do the forced-choice task; in this case it's, are these things similar enough, or which of these is more similar? And you need a signal that says, I'm not doing it right yet. How do you do that? Well, in the forced-choice task, if my encoding of the situations is not sufficient to make it clear which one to pick, I know internally, without someone telling me, that this is not a good encoding. The system actually casts about a bit to figure out what a better encoding would be, and then, snap, it gets it. Internal signals like that, we think, are critically important, and we're on the lookout for them now, which is why, on the encoding question, we're now thinking conciseness should be a big signal. If you're getting too much stuff, you're clearly encoding it wrong; you have to think about the thing differently. Whether you get advice from people on how to do it, or you have to search your own space of alternatives, is an open question. But I think those internal signals are crucial. One thing that happened in Learning Reader is that there's good news and bad news. Say you want to detect surprise. You give it some new example, and it's been building up an analogical model of something.
It's got a bunch of examples in it, and nothing's retrieved. That's a surprise, right, because if you've really seen the whole space, then you should get something. Now, it turns out the bad news is that there are two reasons for that to happen: you really are seeing a different part of the space, or you have an error in your natural language understanding system. As you get more things with that same error, telling those two situations apart gets really hard. So that's the downside of it. AI's hard. [laughter] >>: [inaudible] some questions about the approach you've been taking over your career, versus the kinds of architectures and results coming out of some of the larger-scale big-data approaches. I know those things were touched on, but, you know, the Atari game learning by DeepMind and other groups doing that kind of thing, where people are thinking along those lines as a general community challenge problem in larger spaces like [indiscernible] games. People who look at connectionist-style, to use the old name for it, models that promise to learn from the data, potentially supercharging those someday with these kinds of abstractions, versus starting at the abstractions and using a little bit of learning on top. What are your thoughts on methodology and approach? Some people say that those will sooner or later bury these more human-authored kinds of approaches, even when humans just draw big boxes and put arrows between them. >> Ken Forbus: Well, okay, so think about what we're doing. We're making the bet that you can safely package off a lot of the perceptual processing and hand-jam that. For instance, CogSketch, for better or worse, is our model of 2D geometric processing and vision. It doesn't do everything in vision by any means; think about texture, color, shape, and shading, just a raft of problems it doesn't try to address. But it's a sweet spot for interacting with people. You know, gerbils learn in slow ways, as do people. If you think about how long rehabilitation takes after an accident, physical therapy takes forever, because our motor learning systems aren't that fast. You probably wouldn't want them to be that fast; they'd over-adapt in bad ways. But the stuff that humans can do that's pretty darn amazing is coupling the stuff that lets you interact with the world in reasonable ways with all this amazingly complex conceptual knowledge. If I were to make a bet, and I have made this bet, we're going to have human-level AIs helping us figure out how brains work, rather than an understanding of how brains work being central to making human-level AIs. There's an analogy that Ken Ford and Pat Hayes made between AI and artificial flight. They pointed out that the first flying machines did not flap their wings and did not have feathers; there were plenty of attempts to make those at the time. The ones that didn't, and succeeded, basically said: we're going to understand the principles of flight. So too for the principles of intelligence. Now, to understand those principles, I still take a lot of guidance from humans, because humans are the best example we've got, but I'm also willing to look at other animals. Eventually, in fact, now that materials have gotten better and different, people are trying to build flying machines that are like birds, but that's over a hundred years later. I think that's what will happen with AI too.
You know, eventually we are going to have one story that goes all the way down and says, you know, these dopamine receptors hook in [indiscernible], all the way up to your knowledge about intention. You know Meltzoff's Like-Me hypothesis, where you think of it as an analogy between yourself and other people, right? I can state that hypothesis without talking about dopamine. I can have a completely good scientific model of conceptual reasoning without knowing about neurons. So that's why I'm betting the way I do. I'm glad other people are making other bets, because I could be wrong. Ultimately, the whole story's got to hook together; you really have to have all these things going on. You guys are going to do some game learning? >> Eric Horvitz: We talked offline about it. But actually we'd like to get your opinions. >> Ken Forbus: Oh, yeah. I think games are great. We did the sketch games as a DARPA seedling. No more money for that, alas. But I do think it's the kind of platform where you can imagine really open-ended conversations. >> Eric Horvitz: Right. We can play later [inaudible] and we'll just talk about it offline, everybody out there. >> Ken Forbus: But, you know, if you think about the way people are playing Go or chess, it's not very human-like, right? Think back to the old Chase and Simon results. One of the reasons we're doing this is that we'd love to make a Companions executable that has the sketch game capability built in, as well as a whole bunch of conceptual reasoning built in, so that people could then experiment. I'll bet you could make a really cool, human-like chess player that could explain what it's doing in terms like those you'd see in a chess book, for example. Even if it wasn't the best chess player, hey, it would still be kind of cool. >> Eric Horvitz: Any other questions? Okay, thanks very much. >> Ken Forbus: Thank you. [applause]