>> Dan Bohus: Okay, everyone. I'm very excited to have Candy Sidner visit us today. She's a fantastic
researcher. She's done tons of great and seminal work in dialogue and communication collaboration.
And one of the things that I really admire about her is her ability to sort of like thread the whole space from
sort of theory to practical engineering issues. Like she's done great work, I think in the past, on like
discourse structure models with Barbara Grosz and SharedPlans. And she's brought that to life actually in
the Collagen framework, which has been used to develop a number of different systems. And then she's
also done great work at the intersection of dialogue and robotics, and both at Mitsubishi Electric and WPI.
And she's going to talk to us about some of her recent efforts on engagement in human-robot dialogue.
So Candy.
>> Candy Sidner: All right. I hope the mic is working. Especially for the people who weren't present. First
of all, thank you for the nice introduction. And it's nice to see all of you here today. For those of you who
know the work that Eric and Dan have been doing, I'm going to be talking about what is, I think, maybe in
the framework, the perspective of their framework, how it is you make decisions about when you engage
with someone.
So the work that I've been doing over the past several years is to look at the question of what happens
when people actually try and engage with one another in an interaction. So first I'm going to describe what
engagement is.
I'm going to show you some analysis that I've been doing of human-human dialogue. So what do people
do. What's their situation. I'll define for you what a connection event is, as a way of giving some more
grounded meaning to the notion of engagement.
And then I'll talk about a module that's being developed by my collaborators, Chuck Rich, Aaron Holroyd
and Brett Ponsler. Chuck is my long-standing colleague, and he also happens to be my spouse. And Brett
and Aaron are two students who are in the research group with us. I'll talk about the module we've been
developing and I'll show you a demonstration of this work in what's called the pointing game.
Okay. So what is engagement? Engagement is a process and it's a collaborative process by which two
participants -- or really more, if you want to look at the n-person case -- establish, maintain and
terminate or end their perceived connection to one another. There's a problem -- the initial problem is how
do they actually do that? I come from the language community. And people in the language community,
you just say well you say hello and everything happens.
It's really not like that. And Dan and Eric's work has been really looking at how this actually goes. Once
the connection is made, then you have to negotiate this. And you negotiate it as part of the whole
collaboration for whatever it is you're doing. The collaboration can be simply to talk to one another. It can
be you're talking to one another while you're doing some task. So all of those things are part of it.
Then there's the interesting problem of checking. So each participant in the interaction checks to see what
the other one is doing if they're still engaged. What kinds of things do they do? Well, they talk, they look,
they track the deictic references. I'll show you more about that. They do mutual gaze. There's some
other interesting things we'll look at.
And at some point they have to decide to terminate the interaction. They have to terminate their
connection. And again it's not simply that you say good-bye. It's actually more complicated than that. I
won't say too much about that today. You'll just have to believe me.
So where does the evidence for this sort of thing come from? Well, there's the collaboration itself, whatever
tasks we're doing. There's conversation management. That is the whole process by which we take turns.
Now, conversations go on. We don't just say everything and then the other person says something.
They're very complicated. But it's an indication that in fact we're engaged with one another.
Gaze is a very important cue for how it is that we indicate that we're connected to one another. We use it
to take turns. We use it to track objects. We use it to check the attention of the other person, are they
looking at the objects I'm pointing at or are they looking at me, are they looking around the room, what are
they doing? Are they looking at something else.
There are hand gestures. The obvious ones I'll talk about have to do things with pointing, with presenting
things, with explaining things. But there's a whole other range of gestures which I will say nothing about
which I call semantic gestures. They're all the things like it was really big or I'm not really sure about that.
All these wonderful things we do with our hands.
Some of which are cliches. They're kind of fixed and they're almost like fixed little presentation signals.
And others vary as far as everyone knows from person to person and the meanings are much more
tenuous.
Nonetheless, I'm not going to say anything about it, because I think that is probably the hardest problem I
can think of in communication. And I'm probably not going to get to it in my lifetime.
Head gestures. We nod our heads at one another. We shake our heads. We tilt our heads. There's all
these things that go on there. Body stance. Normally when we're communicating with someone, we
address that person with our bodies. But we can't always do that if we're doing some other task. If I'm
washing the dishes and you're there, I have to stand like this to wash the dishes. So I have to do
something to counter the fact that my body stance is not providing the right kind of information.
There are facial gestures. Facial gestures are enormously complicated. It's another really interesting area.
I'm going to say a tiny little bit about it in the course of showing you some video.
Lastly, there are things that have to do with social relationships and cultural norms. These are incredibly
important in how it is that we recognize and understand how the connectedness between the two of us is
going. They have to do with things like our relative status to one another. How much we are connected to
the other personally, a whole range of things that I'm not going to touch on at all, but you have to
acknowledge that they're there.
Okay. So how do you know when someone is engaged with you? All of the behaviors I'm going to talk
about are two-sided things. So there's what I do and there's what you do. And I'm trying to influence you
and you're trying to influence me.
So when we look at computational systems, the system has to not only be generating behavior but
recognizing it at the same time. When one participant generates a behavior and the other one responds, we
call that a connection event. And the question is what are the kind of behaviors that come up, and we'll talk
about that in a minute.
If the second person ignores the behavior of the first person, then we say that the event failed. So it's very
simple. I do something you either respond or you don't respond to it.
>>: Candy, so for multi-party, would you say that the second party is the group or do you --
>> Candy Sidner: Oh, Dan and I were just talking about this this morning. It's a really complicated matter,
because there's the initial part of the group, it's who you're talking to in the group. There's the overhearers,
the people who are sort of in the background. And then beyond overhearers there are bystanders who are
not really even listening in but are sort of there in some capacity. So there's a really interesting question.
And the real problem is, in multi-group things, is who do you actually address.
So if I produce a behavior, who is it that I address it to? So right now I'm addressing sort of my comments
to you but I'm trying also to catch the eye of other people in the room. But since you asked the question,
I'm really addressing you.
Well, is the group everybody in the room or is it you? That's the real question. And it's a really interesting
problem about multi-group things. And I mostly have not looked at multiple group interactions. So you can
ask this question of Dan and hear what he has to say. In fact, if you want to answer, go right ahead. This
is a seminar. People are allowed to have their opinions. But it's a really interesting question what a group
actually comes to.
So, okay. So what kinds of connection events are there? Well, the first type of connection event is what I
call an adjacency pair. This is a term that comes from the work of Sacks and Schegloff, who were
ethnomethodologists who did their work in the '70s. Their notion was that somebody says something.
This is entirely linguistic. Somebody says something and the other person responds -- that is, their
response is engendered by what the first person actually did.
So the paradigmatic example is the question. I say, what time is it? You say, 10:00 p.m. It's harder when
we start talking about adjacency pairs for other things, because sometimes I will say something like: I'm
going to go to the store. I'm thinking about buying a new dress. And depending on who I'm talking to I'm
getting ahas or maybe not ahas. Hi, Dave. So there's this interesting question about what constitutes the
adjacency pair.
In the work that I'm going to talk about we've reformulated the notion to not just be straightly linguistic. And
that's because in the data we see things like someone will say to the other person, and this is a kitchen
scene, "knife," and the other person hands them a knife. That's clearly a response to what was initially a
linguistic event. But the response doesn't have to be linguistic.
In fact, I'll show you, I hope, in the video I'm going to show you, one of the cases where there's a response.
The response is not even a task response, that is, giving somebody something, but in fact their faces do
the responding.
So there are all kinds of nonverbal responses that count, including when you nod at me when I'm
saying something. That counts as part of the adjacency pair. So we've expanded this notion.
The other thing to say about adjacency pairs is in the original work that Sacks and Schegloff did, an
adjacency pair was first thing, second thing. They had a notion of what was called a third turn repair.
If the first person said something like do you want to go to the store and the second person said which
store, and the first person responded: Macy's, then that was the third -- you had three turns instead of just
two.
It turns out that you really want to be able to expand that notion even out to four turns, because people will
say things and this comes from my data. One person asks the other: Can we eat these when we're done
and the second person says: Well, I don't really think so. And the first person says damn it, and the
second person says yeah, that's the way it is ha, ha.
Now you could count the damn it and ha, ha, as their own thing and say well they're just engendered by
themselves. But clearly they're not. The damn it is in response to what the first person said.
So we actually have an adjacency pair that goes on. That's not really a pair. It's a set of pairs. So we've
allowed for that in the data.
My adjacency pairs also include back channels. Back channels are the phenomenon of when I'm speaking
and you nod your head as I speak. I keep going on. You don't actually get a turn, but you're using a
nonverbal signal to give me information.
Another kind of connection event is directed gaze. And this is how we use our heads and our eyes to tell
the other person that we want them to do something. So there's sort of two ways to do this. One is to just
use your physical head to turn to look at something.
Now if I do that right here it's a little odd because there's nothing there on the floor for me to actually, for
you to actually see. But if we're doing a task together and what's over there is something that I want you to
actually look at, that's a way that I can actually get your gaze directed there, and that's something -- my
behavior and your response counts as one of these connection events.
Similarly, if I use my hand and I point, that's the same kind of thing. In that case I'm using both my
head gaze, because I can't really point accurately for the most part without turning my head
to the right place, plus the use of my appendage to indicate a space that I also want you to look at.
So that's another kind of directed gaze. Those are two of the cases that I've been coding in the data I've
been looking at. The third case that I'm going to talk about is mutual facial gaze. Whenever we have an
interaction, there are times we actually need to look at each other. In conversations where
there's not some kind of other task going on, most of the time we look at one another.
But not all of the time, by a lot. In one study I did, I looked at how two people interacted when one of them
was talking about various things that they were showing the other in a laboratory with all kinds of cool new toys
that people invent.
And in that experiment, what I discovered is that people look at each other about half the time. The rest of
the time they're looking around the room at various things. They're doing other things like they have to look
for the water glass. They have to look at it. Pick it up, when they drink from it. A whole bunch of things
like that. And that was in a fairly benign setting.
If you think about other settings in people's lives, for example, when you're outdoors you don't even look at
people that much when you talk to them because you have to navigate in the environment. And that takes
up a good amount of your face and eye time.
Nonetheless, we do look at each other at various points in the interaction. It serves a purpose, not only to
connect us. It often has other additional roles to play -- to tell us what it is we're
actually paying attention to, whether the person's understanding, and so forth.
Now, those are just three kinds of connection events that are pretty obvious ones. There may be others.
So, for example, touch may turn out to be a way that we make connection to one another.
It's very hard to think about that when you're in a setting like this, because in public settings, when we're
not among close family members, touch is a very constrained kind of behavior.
Another thing is emotion. If I'm really angry with you and conveying that, that's another kind of connection
event. It means that we're really connected, especially if you're responding to what it is that I'm doing
emotionally. So it may turn out that that's something else that indicates what the perceived connection is
between us.
But I'm guessing here because I don't have good data for that sort of thing. All right. So now I'm going to
show you some of what it is we've been doing in our human-human studies. We have collected a set of
videotapes of pairs of people doing a very simple task. And it's making canapés. So there's an instructor
who teaches the student, another person, how to make canapés.
These are crackers with lots of spreads on them, that sort of thing.
And after they do a set of these and arrange them on a plate, then the person who is the student takes the
role of the instructor with a third person who enters the environment, the original instructor leaves, and we
go through this all again.
And we have eight sets of these -- four sets of these interactions. So that's the study. And what I'm going
to show you now is just a piece of one of these interactions so you can see what's going on.
One of the things you should notice is that the instructor points a lot. He does other kinds of gestures,
namely the iconic ones, things like really big. Although he doesn't do that particular one. He also does the
sort of metaphorical gestures.
You'll see as he points, there are responses on the part of the other person.
>>: In this case, the instructor is the subject, not the confederate?
>> Candy Sidner: In this case, the instructor is the confederate. Okay. So there's a confederate and the
person who is learning from them. You won't see the case where they're essentially both nonconfederates.
I'm not going to show you that.
>>: So the confederate is aware of the purpose of --
>> Candy Sidner: That's right. That's right. And in the other case, neither person is aware of
exactly what's going on.
So first we're going to take a look -- so I have videos that are -- they're done with two sets of cameras. You
can see one camera there. That's the one that gets the student's face. You can't of course see the camera
that we're getting the view from, which is this person who is the instructor. And so let's take a look at what
he does. [video -- cracker followed by either cream cheese or [indiscernible] something. And then either
put on pimentos or olives or something small like that].
>> Candy Sidner: Okay. So that's a very brief bit of information. But what you see is he gestures at a
bunch of things that he's talking about, telling this person about, on one side of him; then he switches hands and
uses the other side to gesture at some things.
Interesting, one of the things you may not have noticed, and maybe I should play this again so you can see
it again, while he's pointing, he does not point this way. In fact, I have very few examples of this in all of
these videos.
Most of the pointing you'll see that goes on is this kind of stuff. I'm showing you things, this kind of -- there
are a bunch of different kinds. There are full taps, which is what this is, and there are half taps. I don't think we see
one of these, but one of the things that the subjects do is kind of voila, here it is, I'm presenting it to you.
So watch again. You can see this. Whoops. [video: Basically we start with a cracker followed by either
cream cheese or [indiscernible] type thing. And then either put on the pimentos or olives or [indiscernible]
something small like that].
>> Candy Sidner: Of course, what you notice is at the end the student nods his head. He doesn't actually
say anything. So here's a case where that adjacency pair, his part went on pretty long. He points out a
bunch of things. There was a very small nod. You'll see this when we look at the video from the other side.
And then at the end there was a much larger kind of nod.
So here's the same video from the other side. [followed by] not quite. We're off a little. Okay. [video: To
start off basically start with cracker followed by either cream cheese or [indiscernible] something and either
put on pimentos or olives or something small like that.]
>> Candy Sidner: So he did one of the things that I think is so interesting, which is, in his
response to what the teacher says about the things on this side, he nods his head as the teacher's
speaking, but at the end of it he doesn't nod his head -- he makes this funny expression with his face.
So clearly he's understood the interaction is happening, there's no confusion about that. But it certainly
wasn't done with anything resembling language or even a standard back channel.
And in the other case he actually nods his head fairly seriously. So that's fun. It's always fun to look at what
people do. But the real hard job is annotating data. And we've been using the ELAN system. I'll show you
an example of that in a minute. The annotators are myself and a student.
There's a whole slew of things we've obviously been annotating. We've been annotating what happens
with people's heads in terms of direction, where their eyes gaze, and to some extent this involves
judgment on our part, because when you're looking at a video, you're to some extent taking the part
of the other person in the interaction and trying to say, okay, what exactly is it that they are actually looking
at.
You look at -- we look at pointing and we annotate which kind of pointing they actually do, what's being
pointed at, what their body position is. Largely in these videos, because there's a table, they're facing one
another, but we set it up so that for part of the video the instructor actually has to turn to the side and get
some materials off of another table and do something with them and bring them back to the other table. So
during that time you get some of this phenomena of how do you talk to the other person when you're in fact
not even facing them.
Okay. Obviously we transcribed their speech. We transcribed the time intervals of referring expressions and what
the referents of the referring expressions are, and those are not the same thing as the referring expression.
So a lot of what people say are elided expressions for what they really mean. So the instructor says, for
example, crackers. And he taps on what's the cracker box. He's pointing to the cracker box and saying
crackers. Easy for us we know crackers in the cracker box, no big deal. But the point is the utterance and
the thing he was pointing to were not exactly the same.
We also code what the adjacency pairs are, that come from all of those other things, where the mutual
facial gazes are and what the responses are to pointing. All right. Give you a sense of how this all looks.
Here's the two videos, and here's all of the different annotation channels that we're keeping track of.
Now, before I say something about what we've been learning from all of that stuff, I'm going to give you a
little bit of a definition about what we actually mean by these things. So what's directed gaze? Directed
gaze happens because an initiator gazes at something for a period of time, which is what this piece of stuff
is right there.
And then there's a response. If there's going to be one by the other participant. And now when that
response actually happened we marked that here. But in fact there may or may not be a little break
between what the first participant does and what the second participant does. So optionally there may be a
space in here where one is gazing and the other is not or it might be that there's in fact a long overlap. And
that's what this dotted line is meant to suggest.
This part is the delay relative to the responder, and this point, when they're actually doing something
together, is the shared gaze. Okay. And if there's pointing, if it's not just the face, then sometime after the
gazer, the initiator starts gazing, they point and that happens for a period of time. Usually before the
responder actually gets to turn and look at something.
Okay. Mutual facial gaze is a lot simpler, the initiator gazes at the other person, obviously. And then
there's a response from the other person. This is the gaze point and this is where mutual facial gaze
occurs.
>>: Terminology, do you consider the mutual gaze phase to end at the time when either one of them stops
gazing or when the responder stops?
>> Candy Sidner: When either one of them does it. So that experience happens and then somebody looks
away. And it can be either one of them. Okay. So adjacency pairs, I think, I said a fair amount about this.
But there's a person who says something. They're the initiator. Then there's the response by the
responder. It can overlap with what the first person said. It can start immediately after, which actually
happens surprisingly often in my data, or there can be a break before the responder says something. And
then, of course, there can be, in the case of third turns, there can be another response by the initiator. And
I don't show here, but of course in the case where they're actually four sayings, four things before the
whole thing ends it will go on out here.
Okay. And finally there are back channels. Back channels are the initiator saying something, the
responder responds with some kind of head motion, or I think it's possible that there could be some kind of
other facial expression like the one I showed you.
And then that stops at some point.
>>: Verbal, or do you put things like --
>> Candy Sidner: We allow for uh-huhs as well. So there can be verbal expressions as well as nonverbal
ones. Thank you. I didn't think to mention that.
Back channels is a very complicated matter. It's been talked a lot about in the literature in terms of can you
back channel anywhere? Is it controlled in some way. Those are open questions as far as I'm concerned.
Okay. So we're interested in the amount of time that occurs between these various kinds of connection
events. And we've defined the time as the time from one connection event starting until the next connection
event occurs. And that's because that allows us to have overlaps. And the reason we've done this is we
have a hypothesis that the mean time between connection events captures what we all informally
experience as the pace of the conversation or interaction as a whole.
When you're interacting with someone else, there's the kind of uptake that the other person has in the
interaction. And sometimes a conversation can, an interaction as a whole can have a very kind of slow
pace. You say your words pretty slowly, the other person doesn't pick up right away. They may speak very
slowly.
And that feels very different than a conversation where you say something and the other person says
something or nods and there's this very quick uptake and that keeps happening, kind of rolls along in that
kind of way.
So we're interested in understanding this, because we think pace is a very important indicator of how it is
that the engagement process is actually going.
So the faster the pace is, the less time there is between connection events. So basically pace is
approximately one over the mean time between events. Okay. The reason we're -- yeah.
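As a rough illustration of that relationship, here is a minimal Python sketch of pace computed over a sliding window of connection event start times; the window length is an arbitrary assumption for the example.

    # Sketch: pace ~ 1 / (mean time between successive connection events),
    # computed over a sliding window. The 30-second window is an assumption.
    def current_pace(event_start_times, now, window_seconds=30.0):
        recent = sorted(t for t in event_start_times
                        if now - window_seconds <= t <= now)
        if len(recent) < 2:
            return 0.0  # not enough events in the window to estimate a pace
        gaps = [later - earlier for earlier, later in zip(recent, recent[1:])]
        mean_time_between_events = sum(gaps) / len(gaps)
        if mean_time_between_events <= 0:
            return 0.0
        return 1.0 / mean_time_between_events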
>>: Curious. I would see how that would make sense maybe with a task like the making of the hors
d'oeuvres or whatever, but as the cognitive complexity of the task increases, if I ask you a question that
requires you to really think, I may purposely sort of back out of the conversation in order to give you more
space to think, and so you're deeply engaged in our conversation but processing it in a different way.
Is that -- would you consider that engagement or would you consider that some other --
>> Candy Sidner: So you've asked exactly the right question. This is a question we've been asking
ourselves as well. When we're interacting, a very fast pace means boy it's very clear we're doing
something together. If you ask me a question and I have to say, I sit there for a minute and I think about it.
Obviously the pace of the conversation has changed. In fact, such a change happens sort of almost instantaneously,
if you will, whereas what we see is that the pace may change if you take a sliding window over time in the
conversation and look at connection events and the mean time over time.
So the question is where do you get evidence that the other person is no longer involved? Certainly if
they're indicating, they've slowed the pace down, but it's clear that they're still involved, that is, they do the
kinds of behaviors which largely are kind of looking up like this and the other one people do is look down,
they serve different cognitive purposes, apparently. But in that case you're clearly not ignoring whatever it
is I've asked you or vice versa. On the other hand, there can be in some of those kinds of circumstances,
it's clear that the participants are no longer engaged with one another.
So one of the reasons that we're interested in gestures in all of this is, because if it's just what I say and
what you say, clearly that's not enough, and so if I ask you a question and your gestural behavior comes
relatively quickly even though you haven't said anything yet you've begun to respond to my interaction.
That's why we want to count the nonverbal stuff as part of that particular process.
Now, I'll show you in a minute -- just a minute Tim, I'll show you in a minute some other issues that I think
come up that make this all a little more problematic. So your turn, Tim.
>>: I was going to say even outside of the nonverbal cues, psycholinguists have found that people will
signal they need more time based on the feeling of knowing, so they use hums and ahs to signal what you
just asked me is a difficult thing and I'm going to take some time to think about it. So they're much more
likely to say um instead of ah, and put you through a whole bunch of [indiscernible], and the length of time
between an um and a pickup is much longer than for an ah. People seem to be assessing how long information
seeking will take and signalling that to show they're still engaged. They still want to be part of it but they
need some time.
>>: So um versus ah is cross-cultural?
>> Candy Sidner: It's probably American, frankly.
>>: No. It's not. I know some countries don't use the same kind of disfluencies. And I know that someone
did a similar kind of analysis for Japanese and found that there were slightly different markers, but they
serve the same kind of purpose.
>> Candy Sidner: Yes. I actually had a colleague many years ago who didn't do ums and ahs he did
[indiscernible]. So whenever he didn't, whenever he wanted to indicate what was going on, and he
certainly didn't want to not have the floor, you got these Latinisms that were very interesting and very
strange. And it drove everybody else crazy. But, nonetheless, there are some people who can do
something much more interesting than um or ah.
Okay. So let me give you some statistics that come from one of my pairs. This is nine minutes of their
interaction. So it's not even their whole interaction, because most of these pairs run about 12 minutes. So
what do we see? Okay. Things like directed gaze. There are 19 directed gazes and what do you know,
most of them succeed. Usually when the one person turns their head the other person actually pays
attention and responds. The mean times, et cetera, are not very far off from one another.
On the other hand if we look at mutual facial gaze, it's really different. It succeeds roughly half the time.
And the rest of the time it doesn't work.
Does that mean the other person's not interested in what the first person is trying to find out about them?
The answer is I don't think so. Because this task, the task itself is actually very significant and takes a lot of
eye space. So if you're making crackers and you have to spread stuff on them and cut up little doodads to
put on top of them you actually have to use your eyes a lot to make that process happen.
So that consumes one of the participants, namely the student, for a lot of the time. So he seems to miss a
lot of the mutual facial gaze requests that the other person gives him. Adjacency pairs.
There are about 30 out of -- two-thirds of the whole bunch -- that succeed. But there are a surprising number
that fail. Why is this? Again, I think it has to do with this particular task. When the student is busy doing
things, the teacher occasionally is explaining other things about what's going on. And the student is
interested enough in what he's doing that he simply doesn't indicate any response at all to what the person
said. Did he not hear them? Well, probably not. But he simply doesn't respond to even with head shakes.
He doesn't do any of that stuff to indicate what it is that's going on.
Finally, there are about 15 back channels. That's not a tremendously big number. One of the interesting
things is the mean time between connection events, and the thing that's interesting here is while the
mean time is a little under six seconds, the maximum time is huge: 70 seconds. So what's going on
here? Well, again, this has to do with a particular task that they're doing.
For a long stretch of this interaction, one of them is actually making a bunch of canapés. And the other one
is organizing the plate, putting the plate together. So in some sense they're kind of doing parallel play,
parallel activities. They don't have to say anything to each other during those periods of time, and they
don't. They're not people who know each other well. Whereas, if they were, they might be using that kind
of dead air time to make jokes, talk about something that they're both doing. What's happening on
Saturday night. Whatever. None of that goes on here, because they don't know each other very well. And
also possibly because this is an experimental setting.
One of the ugly things about putting people in a laboratory is that they freeze up a little bit. And they act in
a sort of more formal way than they do in their kind of more normal circumstances. So that may be another
effect on this. Nonetheless, there are long stretches where they're just doing their own thing.
So the question is are they disengaged? Well, in a certain way they are. But clearly they're not in the
sense that they are still both committed to the task that they've undertaken and they're doing what they
need to do to get it done. So they're engaged at the level of their task and therefore have this connection
to one another. But their connection is not reflected at all in things like what they say to each other, how
they look at each other, or the other objects that they have in the room.
So clearly a component of the nature of how we see ourselves as connected to other people is mediated
by the nature of the activities that we have to undertake.
And so the interesting question for us in the long-term is how do we bring that to bear. All right. Now we're
going to switch gears from what data tells us and say, okay, let's think about how we get a robot to be
involved in this kind of activity.
And here's the setting for how it is we're thinking about these problems now. We are not yet to the point of
having the robot make canapés. In fact our robot can't do that. But he has hands. He has them like
this. He can point to things, because he's got enough degrees of freedom in his arm he can do tapping actions, but he
can't pick anything up. So he's never actually going to make canapés.
The simple case is that the robot points to something and the person points to something. So these are
two sides of the pointing game. There's the human. Human can nod and shake its head. The camera that
the robot has makes it possible to recognize head nods and head shakes. We're using the Watson system
from MIT to actually do that.
And the robot can also nod and shake his head because he's got the right kind of degrees of freedom in his
neck. He can say things to the human. But for the moment whatever the human says back is
gobbledygook as far as the robot is concerned. It's not because we don't want to do speech, but we've
done a simplified version of the game for the moment.
We started that originally because the robot's got motors that you will get to hear in his arms that made so
much noise it made the speech system impossible. We have thankfully two weeks ago finally gotten this
fixed. We had to take the robot back to its original designers and they put in different kind of motors and so
we're a lot happier, because it's a really screechy sound and it was horrible to even work with the robot.
That's the setup that we have. And what we're doing is we're developing a reusable software module for
robots. And it implements the recognition of engagement only. So engagement has two parts to it. It's
recognizing that someone else wants you to be engaged with them or is indicating their engagement.
And the second part is what do you do about that? That is, do you respond or not respond? And we're
trying to develop a set of generic algorithms that are independent of particular robot software details and
that can be an independent package, and this is being packaged up as a set of ROS messages, ROS being
the framework that's been developed by Willow Garage. Here's the basic picture in which this
engagement module actually sits. There's a whole lot of the rest of cognition going on. The robot has
sensors to the world. In our case they happen to be vision. There's some sensory input. We know when it
is that the person actually says something and where the sound is coming from, we just don't know what it
means.
But you can imagine having real speech understanding. The human is gazing and pointing and nodding
and shaking, doing all that stuff. And the robot has to make similar kinds of behaviors, and he actually has
to do that in terms of actuators.
Okay. So what is the engagement recognition module getting? It's getting essentially three sources of
information. From the sensory information it's getting what the human actually did, did they gaze, did they
point, nod, shake, that kind of thing.
It's also making use of what the robot itself is going to do. What the robot actually decided to do is going to
be important for this engagement recognition module. Furthermore, we need to know what the rest of the
cognition has said about what kind of goals the robot actually has.
Does the robot actually want to be engaged? What kind of engagement is he trying to do. And of course
how the floor changes, that is how turn-taking actually happens.
But what it gives back are basically two pieces of information. One, did the engagement goal succeed or
not? And, secondly, ongoing statistics, a sliding window about what the mean time is between
connection events.
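A rough sketch, in Python, of what those inputs and the two outputs might look like as plain data structures; the field names here are illustrative, not the actual ROS message definitions in the package.

    # Illustrative sketch of the recognition module's interface; these are not
    # the actual ROS message definitions used in the released package.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class RecognitionInputs:
        human_gaze_target: Optional[str]   # what the sensors say the human looks at
        human_point_target: Optional[str]  # what the human is pointing at, if anything
        human_nod: bool = False
        human_shake: bool = False
        robot_gaze_target: Optional[str] = None   # what the robot has decided to do
        robot_point_target: Optional[str] = None
        engagement_goals: List[str] = field(default_factory=list)  # e.g. "directed gaze"
        floor_holder: str = "human"        # who currently has the floor

    @dataclass
    class RecognitionOutputs:
        # 1. Did a robot-initiated connection event succeed or fail?
        goal_succeeded: Optional[bool] = None
        failed_recognizer: Optional[str] = None   # which recognizer reported the failure
        # 2. Sliding-window statistics on the current pace.
        mean_time_between_events: float = 0.0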
So to summarize, again, the information about where the human looks, what they point at, et cetera, we
want to be able to recognize what the human initiated connection events actually are.
And we want to be able to know when they terminate, because that tells us what the mean time between
connection events is. So we've got to recognize that process actually happening, and that will be because
the sensory information will tell us something about what it was that the human actually did.
Okay. Similarly, information on where the robot looks and points and stuff is to allow us to recognize that
the robot completed whatever it was that the human asked of it. So if the human wanted the robot to
respond to a directed gaze, for example, then the recognition module has to know that the robot
actually did it.
Okay. Did the robot actually turn and look at it and keep track? That means there's a connection event that
succeeded as opposed to one that failed. The engagement goals -- we need to know when to begin to start
waiting for human response, if the robot produces some kind of connection event. Because the real
problem for the robot is when should I stop waiting? I've done this thing, when should I stop doing this,
when should I stop expecting the human to actually do something, so that business about pace of the
conversation is going to guide what it is that the robot's going to actually do and that's why we need to
know about the engagement goals.
And finally we need to know about floor exchange, because you have to know when it is you're actually
supposed to be taking over and doing something. Okay. Now, when we started this, we thought, oh,
engagement recognition module is going to do all these things and all this stuff. As we pared away what
really needed to be there, it turned out there were only two things that the engagement module needed to be able
to tell the rest of the cognition. The first was: did a robot-initiated connection event succeed or fail? So
that the robot could decide when to stop looking and when to stop pointing.
So if the human says: I want you to do something, you've got to decide all these things about how you do
it -- when you're getting the human to do something you have to decide when to stop.
And finally, the statistics, in terms of a sliding window, about current pace, as a way to know whether the
connection for engagement is weakening. Now, these are just statistics that are then provided to the rest of the
cognitive architecture, because some other component, for example, the engagement production
component is going to have to decide what to do with the fact that the pace has changed. It has to make
decisions about how it is that it should respond, and it may involve a whole lot of other planning processes
than just the production component.
So that's why we were providing those kinds of statistics. Okay. The architecture we're using here involves
four different recognizers. So there's one for each of the different kinds of connection events I talked about, and
we distinguish out the back channel cases as a special case of adjacency pair recognition.
And these operate in parallel because in fact, of course, somebody can be saying something as well as
moving their head around. So we have to be able to keep track and make all of those things operate at the
same time.
Now, I'm not going to talk today about all four of these. I'm just going to pick one. I think I'll take the first
one, which is to give you a sense of how one of those recognizers actually works. And so we'll start with
directed gaze.
If you remember, there was the basic picture of what directed gaze is about. And so there are two different
kinds of things that could be going on here. At the start, either the human's pointing at an object or the
robot itself has directed the human's gaze to something, and if it's that case, then the robot, of course, is
waiting to see if the human will actually respond within the window that it currently has as its notion of what the
mean time is.
And if that happens, then in fact the statistics can report that in fact it succeeded and then at some point
either the human or the robot will look away and then the activity is over.
In the case where the robot's waiting and there's a timeout, because we've now moved past what it currently
thinks the pace of the conversation is, or because some other goal comes up like mutual facial gaze, where the
robot's not pointing, then we get a failure circumstance.
On the other side, if the robot is in the situation where the human is the one who is going to point, then this module is
saying okay, the human is waiting for the robot to do something. And either the robot's going to succeed in
gazing and this module is going to get information from the actuators that actually did what it was supposed
to do, or again there's going to be a timeout, and so this module is going to be able to report that it failed.
So it's a fairly simple mechanism.
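Here is a simplified Python sketch of that directed gaze recognizer as a small state machine; the state names and the timeout policy (waiting roughly one mean-time interval) are assumptions for illustration, not the module's actual code.

    # Simplified finite-state sketch of the directed gaze recognizer; the real
    # module's states and timeout policy may differ.
    IDLE = "idle"
    WAITING_FOR_HUMAN = "waiting for human"
    WAITING_FOR_ROBOT = "waiting for robot"

    class DirectedGazeRecognizer:
        def __init__(self, mean_time_between_events):
            self.state = IDLE
            self.timeout = mean_time_between_events  # wait roughly one pace interval
            self.waited = 0.0

        def update(self, dt, robot_directed_gaze=False, human_directed_gaze=False,
                   human_followed=False, robot_followed=False, other_goal_preempts=False):
            # Returns "succeeded", "failed", or None while still idle or waiting.
            self.waited += dt
            if self.state == IDLE:
                if robot_directed_gaze:      # robot gazed/pointed; wait for the human
                    self.state, self.waited = WAITING_FOR_HUMAN, 0.0
                elif human_directed_gaze:    # human gazed/pointed; wait for the robot
                    self.state, self.waited = WAITING_FOR_ROBOT, 0.0
                return None
            if self.state == WAITING_FOR_HUMAN:
                if human_followed:
                    self.state = IDLE
                    return "succeeded"
                if self.waited > self.timeout or other_goal_preempts:
                    self.state = IDLE
                    return "failed"
            elif self.state == WAITING_FOR_ROBOT:
                if robot_followed:           # actuators report the robot looked there
                    self.state = IDLE
                    return "succeeded"
                if self.waited > self.timeout:
                    self.state = IDLE
                    return "failed"
            return None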
>>: Do you think the sensors are noisy, how do you account for certain [indiscernible].
>> Candy Sidner: I'm really glad you raised this issue. This is clearly a very finite state interpretation of all
this stuff. And we've been talking about what it would mean to do this stuff with a much more serious
model of uncertainty built in. The Watson system is itself, you know, based on probabilistic kinds of
models. So it's got -- it kind of in its own way does uncertainty. But at any one of these particular points,
you could be correctly or incorrectly making assumptions about -- I mean, you know about the robot's goals
very clearly. But when it comes to human, you know, did they really look at it or didn't they?
You get a certain amount of information from the Watson system and you make some decisions. And
obviously we'd be a lot better off if we had some probabilistic ways to look at those things. That really turns
on having enough data to be able to do that kind of thing. Which we don't at the moment have.
So it's an issue that's been kind of racking our brains. In the work I did at Mitsubishi the way we got the
data was to run tons and tons of subjects. I had 100 people interact with Mel the Penguin. So we ended
up with lots and lots of data.
We are not at that point yet. So we just don't have a data source to give us the possibility of looking at
other ways besides some very simple finite state models. So I think of this as a bootstrapping process.
You get the whole thing going and then you're ready to run subjects and you get lots of people to play with
your thing, and then you have enough data to think about other ways to do it. So we'll see.
>>: This one will be a little easier. If you recognize that the human is gazing at something, right? What is
this thing doing to say you, robot, ought to respond by --
>> Candy Sidner: It's not doing anything. It's not doing anything. That production problem, which is -- that's a production problem. You recognize that the human is gazing. Some other component, namely a
production component, has to decide am I going to engage, am I going to do that? Am I too busy doing
something else that I can't actually do it? That's not the job of this component. This component's job is to,
when that production component decides, okay, I'm going to actually look at it, the job of this component is
to say, fine, that happened.
>>: Production.
>> Candy Sidner: I beg your pardon.
>>: You consider it part of the production thing's job to recognize a request to engage has happened rather
than part of your job to recognize a request to engage has happened. That seems odd as a structure to me.
>> Candy Sidner: Let me be clear about this. That the request has happened is something that this
component actually needs to know about.
>>: Yes, that's right.
>> Candy Sidner: So it gets information that says the human did some kind of pointing or something like
that.
>>: Some other process may or may not have cause to go take some action on it.
>> Candy Sidner: Right. So this is the other component that has to decide, okay, something like that
happened, I could decide to respond. I could use my, that connection event and respond to the connection
event or not. All this component does is say, fine, when that actually happens or doesn't happen, it keeps
track of that information. It's a kind of, you know, a kind of secretarial accounting, a bean counter.
>>: When it fails do you put out reasoning for the failure? Because that's a very distinct thing, the reason
we -- we decide not to engage.
>> Candy Sidner: We report out which module it was that failed, okay? So we're just talking about directed
gaze. So we report out that the directed gaze failed. And in the case where it's an adjacency phenomenon we
report, et cetera, et cetera, for each one of those boxes in -- for each one of these things, we report out
which component it was that failed as well as the failure. So that's the critical information.
It might turn out that we would really want more than that to reason about it, et cetera, and I don't know -- we just haven't reached the point to know whether that's critical yet or not. Okay. So there's obviously
these things for everything. We're not going to go through them.
As I mentioned, we are providing the results of our recognition component to ROS. Yes?
>>: No, I have a question, because you're not going to talk, necessarily, I think, about the pointing. Do you,
like -- you use Watson for the gaze; do you do pointing recognition also?
>> Candy Sidner: We use pointing recognition. We use a totally different way to do that.
>>: Vision based?
>> Candy Sidner: Simple tracker that's tracking what the robot's hand's doing and also what the person's
hand's doing. We don't use colored gloves at the moment. We thought we were going to have to do that.
But it turns out that the algorithms we're using are good enough without that.
They're doing very simple kind of geometric stuff. Little blob stuff. It works pretty well for the setup that
we're talking about. Okay. So the recognizer we've created is now available as a ROS package if you want
to play with it. I realize it's not in Microsoft Robotics Studio, but it's there if somebody wants to look at it
nonetheless.
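For a sense of what that kind of simple color-blob detection looks like, here is a generic OpenCV sketch that finds the centroid of a colored plate in a frame; it assumes OpenCV 4 and is not the project's actual tracking code.

    # Generic color-blob detection with OpenCV (assumes OpenCV 4); roughly the
    # kind of simple geometric approach described, not the actual tracker.
    import cv2
    import numpy as np

    def find_colored_plate(bgr_frame, lower_hsv, upper_hsv, min_area=500):
        hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blobs = [c for c in contours if cv2.contourArea(c) > min_area]
        if not blobs:
            return None
        # Return the pixel centroid of the largest blob of the requested color.
        m = cv2.moments(max(blobs, key=cv2.contourArea))
        return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])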
Now I'm going to demonstrate the pointing game and I'm going to demonstrate two versions of it. This is
the first version. And here the set up will always be the same, so I'll tell you a little bit about what is going
on. This is the robot obviously. Two degrees of freedom in his neck, two in his shoulder. One degree of
freedom here. He's got eyes that go back and forth in his head. He's got two degrees of freedom in his
eyes. He's got two degrees of freedom in his little lips, and he's got eyebrows that go up and down.
He has two cameras. The camera you can see up there is the stereo optic camera. That's how he
recognizes what the person's head is doing. There also is a camera you can't see which is up here. And
this is where we get the information about what are the objects that are on the table what the hands are
and what he's actually doing. And those are the kinds of various things in the setup. It matters that the
plates are different colors, because object recognition is a problem in its own right and we decided to use a
very simple algorithm.
So having colors makes object recognition really easy. Okay. So there are two versions of this. The first
version done last spring was an undergraduate project. And it's one giant state machine. It's really cute.
You'll see it works pretty well. It's kind of neat.
But everything is all kind of mooshed in there and together. There are some generic engagement rules but
they're all just part of this big ugly finite state machine. The second version of what I'll show you, the
observed behavior is sort of equivalent -- it's not exactly identical. It's much weaker on engagement
generation. You'll see that when we look at it but it has this engagement recognition model set up to make
use of that stuff.
So now, let's see, okay. This is the first one. So [beep] I'm a little worried. It's not making noise. I wonder
why. [Pause] [hello, my name is Melvin. Let's play the pointing game] [pause] you pointed at the orange
plate, please...
To start out with you'll notice something very interesting about this. The robot's just kind of looking forward
in space. And that will change in a minute but the very beginning they didn't have him do any kind of
natural movements in that way.
>>: You have a camera [indiscernible] the hand function?
>> Candy Sidner: Yeah, but the.
>>: The robot's eyes, I notice, don't follow whatever the camera on the top is doing. So that's a little bit
unnatural.
>> Candy Sidner: What the robot should be doing is either looking a little bit down at the subject, at Brett
as it turns out, or looking at Brett's hands. We'll see he does that better as it goes along [point again]
[pause] [you pointed at the orange plate. Please point again. [pause] [you pointed at the blue plate.
Please point again. ] [Pause] [chuckling] [you pointed at the orange plate. Please point again. ]
You see the big sigh at the end that's because Brett was very happy it all worked. We didn't have to do the
video again. Okay. Now, we're going to see the more recent version. And what you will notice -- I first
have to make this thing bigger. Full screen. What you'll notice about this case -- is it paused? Wait a
minute. First of all, stop. You'll also get to hear how noisy the motors are. What you'll notice about this
case is that the generation behavior of the robot is much, much simpler. Whoops. Where is the thing that
makes it -- oh, there it is. Okay. [hello, my name is Melvin. Let's play the pointing game] [pause] [you
pointed at the blue plate. Please point again. ] [pause] [you pointed at the blue plate. Please point at a
plate. ] [pause] okay. Thanks for playing.
Yea. Now, one of the things you'll notice about this is that Aaron is actually using -- so that's our other
student. Aaron is actually using nonverbal behavior to communicate with the robot. So when the robot
says please point, he nods yes and then later on he shakes his head no. You'll notice, however, of course
that there's only one plate to point at. And that's because the robot has a much simpler model of what it is
actually going to do as part of generation.
All right. So what I've shown you today are a couple of things. One, I've defined what the notion of
connection events is for various types of indicators of engagement. And secondly I've given you an
architecture that clarifies at least where this one particular piece fits. That is, how we do the recognition of
engagement with respect to the much larger part of the architecture.
And, thirdly, I've talked about how we have created a reusable module for doing engagement and made it
available in ROS. So what's next? Well, lots more data analysis. I have been through all of the
videotapes, but not coded them all for adjacency pairs and so forth. So a lot of the things I mentioned like
where the head turns and all that stuff is done, but I haven't done all the adjacency pair work yet. So
there's a lot of work to be done there. The other part of the analysis is to look at the question in more detail
and start counting in terms of deictic behavior, what kinds of deictics do we actually see, that is, how
often do people actually say something where they're pointing to something and they're actually naming the
thing in a direct way, and how much do they do this sort of elided "crackers" while pointing to the cracker box,
or "pimentos" when pointing to a jar of pimentos, and so forth. The reason this matters is when we
start talking about robots we're not talking about people anymore, and things that have to do with elisions
mean you need some other interesting bunch of information about the nature of how objects are
structured, what they contain, all this stuff, in order for all that stuff to make sense.
So it's a much bigger undertaking. We want to do some studies to actually evaluate this module, but that is
going to wait unfortunately until we have a better generation module as well so that we can actually do a
version of the pointing game that fits together nicely.
We also want to be able to do the reverse pointing game, where the person tells the robot to point to a
particular plate, and that of course means that we have to get speech into a better state than it currently is.
I mentioned one of the things that I'm interested in pursuing more, which is this question of parallel activity
as a weak form of engagement. Dan and I were talking today about, you know, is engagement kind of an
all or nothing thing, or there's this kind of gray scale of it. And if that's the case how do we want to begin to
model that kind of thing.
The other question is what other kinds of behaviors are there that signal engagement? The one about
facial displays, I showed you an example of that, because it occurs in the dataset. We don't really have
any way to envision at the moment to be able to deal with those things very nicely. There are people who
are working on the recognition of emotion, if you will, in faces. It comes from Paul Ekman's original work on
identifying emotion in people's faces, and there have been various vision algorithms that can do some of
this kind of thing.
But some facial displays are not really about emotion quite so much as they're just some indication of
change in the face that we pay attention to. So it's not clear what that really means.
And there's the question about engagement being modeled on some kind of scale. I've already talked about this a
little bit. And then there's the question of how we represent uncertainty from robot sensing, given the kind of finite
state models that we have. And it's pretty important because human behavior is really, really
unpredictable, even in this relatively controlled setting of two people sitting across the table from one
another.
Okay. That's it. Questions?
[applause]
>>: I have a question about mutual facial [indiscernible] -- the high failure rate that you saw there, was that
perhaps because it was not always intended? For example, looking at another person to see what they're
doing, versus the other person knowing that but not wanting to kind of react and maybe continuing what
they're doing. So it's not that the intent was to have mutual facial gaze; it was just to continue on.
Maybe there are two things happening.
>> Candy Sidner: There's a difference in the circumstance. Because of the nature of the activity they're
doing, you can tell what somebody is doing by looking at what they're doing with their hands. That's very
different than directing your attention to their face.
And so I distinguish them, when I'm talking about mutual facial gaze I'm talking about looking at their face.
And so in this circumstance, if I want to know what the student is doing, I can look at their hands and see
that.
I look at their face, presumably because I have something that I want to convey to them. There's some
reason I need to get their attention in terms of their face.
>>: Looking at the face can also give you other information maybe like are they happy doing the activity,
are they sad doing the activity. Looking at the hands doesn't really communicate that, too. So you might
be checking for that but at the same time the person doing the activity might not be --
>> Candy Sidner: Yep, that's very possible. So one of the questions about the rate of failure is, you know,
what's the typical rate of failure? Nobody knows. Nobody's ever -- there's been lots written about mutual
facial gaze. Nobody has ever counted before as far as I can tell. We don't really know about that stuff. I
know a little bit about back channels because Tim Bickmor for example did an account on back channels.
I've seen accounts of other interactions with robots to know something about what the back channel
behavior is there. But we don't really know.
>>: Now, I don't know how long the term failure has been used in that context -- it's not my field -- but I
would think that, particularly with something like facial gaze, if it's a class of interaction that you don't fully
understand yet, the use of the word failure sets up assumptions.
>> Candy Sidner: Yes, it sets up assumptions, you're quite right about that.
>>: Those assumptions may not prove out to be the case, but they can have a real impact on how quickly
you converge on what it actually is, just because of the assumption that we are all trying to accomplish
something. Which maybe we are. But it sounds like you don't have the data to --
>> Candy Sidner: I do know that in other circumstances, you know, with facial gaze, people don't look at
each other all the time. They move around, and the other person doesn't always even track them when
they move on to things. So it is a very complex phenomenon, and you're right, it may be misleading to call
these failures. But the problem is that at the moment I don't have another way to talk about it -- because I only have
what I can observe the two humans doing. I don't know when the one person looks at the other's face, did
they do it because they're just trying to find out if they're happy or sad or bored or if in fact they intend to
get information from them. I might be able to get a little bit. I'll go back, look at the data in terms of this.
Do they then say something, for example? So if I do mutual facial gaze with you because I'm about to say
something to you and I think it's important that you should pay attention, then clearly that's a case where I
really want to do it to convey information. I really need that facial gaze as opposed to I'm just kind of
checking in to see how life is going with you.
>>: I would hypothesize that if you ran the same test you showed at this distance, the rate of mutual gaze
would skyrocket, because this is a more comfortable distance at which to make eye contact with a stranger,
whereas at a three-and-a-half-foot distance, in our culture, that's a much less common thing to do with a
stranger. I don't know.
>> Candy Sidner: Well, they're strangers, remember, but they're strangers that have undertaken an activity.
It changes things. It's really different if you and I are standing this close to each other -- that would be really
strange, because we don't know each other and so forth. But once they're undertaking this activity, there is
this question about how much that mediates. These are all questions I don't know the answer to. But
you're right, there clearly are those kinds of effects.
I also have been looking at other data with another colleague, and these people are sitting fairly far apart.
And so you don't even see their eyes move very much, because they don't have to do that because the
distance is far enough that you don't have to worry about eye movement in order to see what's going on.
>>: Have you tried coding interactions in films?
>> Candy Sidner: You mean like in movies?
>>: Right.
>> Candy Sidner: Actors doing stuff?
>>: Exactly. In order to --
>> Candy Sidner: No.
>>: There the director, as somebody who has spent a lot of time observing human interactions, is
attempting to convey a class of engagement interaction, et cetera, et cetera.
>>: If you want to get really good insight into what you're doing, sit down with a very good director.
>> Candy Sidner: Find out what they do.
>>: They use the word pace. I just got off a production with a director who was always on about pace, and
his observation was: if you want a faster pace, don't speak faster -- that just makes it confusing. Cut the
gaps down. That works in plays because you know what the next person is going to say. So we know how
to do it there.
>> Candy Sidner: And you also know when they're going to finish what they're going to say so you get
more information presumably.
>>: In real time -- but you want someone with professional experience observing people, seeing whether it
looks realistic.
>> Candy Sidner: Yeah, right.
>>: So you could engage in a very interesting conversation with him about that.
>> Candy Sidner: Okay.
>>: One of the things that is unnatural about plays and movies with few exceptions in particular directors,
people talk simultaneously in plays and movies a lot less than they do in real life, unless you're talking
about Robert Altman or Woody Allen, somebody like that who makes a point of trying to mimic real life.
>> Candy Sidner: Yeah, that's interesting.
>>: A very long time ago I did narrative analysis and conversational analysis with [indiscernible] and we did
studies where we annotated somebody else's taped conversation and we also annotated conversations in
which we ourselves participated.
>> Candy Sidner: Yes.
>>: And the annotations are different because you can know what it is that you were trying to accomplish
at the time.
>> Candy Sidner: Right.
>>: It makes it very hard -- it would make it very hard for you to annotate your own interaction with the
robot, because you know so much of the system.
>> Candy Sidner: Yeah, sure.
>>: But have you -- like, if you were to have some kind of interaction with Dan's assistant, and then sit back
and annotate your own interaction, I don't know if it would be interesting and whether you would end up
with a different annotation.
>> Candy Sidner: I do know that a standard technique to do with people interacting with some kind of
software system, whether it's got faces or whatever, is they have their interaction, whatever the laboratory
setting is, and then one of the things you do is you go back with them through the videotape and ask them
about various -- especially the points you really are interested in, since you've often set these things up,
and ask them to tell you what it was that was going on there.
And that often is very revealing about that particular process. It's pretty hard in the case of the
human-human data for me to go back and do that now, because in fact it's been a fair length of time. This
data was captured about 10 months ago, so the participants probably wouldn't remember anymore, and a
bunch of them were undergraduates who have since graduated, so that's no help either. But it is an
interesting question whether, when we collect these kinds of videos, we should plan to look at them quickly,
come up with a set of questions and then go do that. But, frankly, if I had looked at them quickly I wouldn't
have known what I know now, having spent a lot of time on this stuff. Now I would really know what kinds
of questions I would want to ask. So there is this problem about what you're looking at. So, Dan.
>>: So this is what you were asking about earlier, about the [indiscernible]. I like the notion of connection
events and identifying some of these classes. But it also seems like -- I don't know if this is influenced by
the particular setup you have, where you [indiscernible] -- have you thought of looking at sort of the
disconnection? Right now everything is a positive definition; the space is defined by these positive events
that all indicate engagement. So the measure of engagement is dictated by that positive measure, and I'm
wondering if it makes sense to have signals for the opposite of that, where the lack of engagement is not
just the lack of the positive but actually negative events that --
>> Candy Sidner: Well, all of the failures constitute an interesting class. They're essentially negative events.
I say something, you don't respond. I point at something, you don't respond. Or the face: I look at you,
expecting to have you look at me so I can say something, and you don't. And so we could in fact -- we
haven't done this yet -- we could in fact try to think about what kind of measure one wants out of the
failures. At the moment the mean time between connection events is computed over the ones that
succeeded, not the ones that failed. So that's the other side, and I think that's true -- it would be useful to
look at that kind of thing.
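A tiny sketch of what such a measure might look like, assuming only a time-ordered list of connection events labeled as succeeded or failed; the bookkeeping here is an assumption, not the talk's actual analysis:

    # Illustrative only: mean time between successful connection events, plus a
    # simple failure rate as a complementary negative-side signal.
    def engagement_measures(events):
        """events: time-ordered list of (timestamp_seconds, succeeded) tuples."""
        success_times = [t for t, ok in events if ok]
        gaps = [b - a for a, b in zip(success_times, success_times[1:])]
        mean_time_between = sum(gaps) / len(gaps) if gaps else None
        failure_rate = (sum(1 for _, ok in events if not ok) / len(events)) if events else 0.0
        return mean_time_between, failure_rate

    # Hypothetical example:
    # engagement_measures([(0.0, True), (4.2, False), (6.5, True), (11.0, True)])
    # returns (5.5, 0.25): successes 5.5 seconds apart on average, 1 of 4 events failed.

The failure rate is one candidate for the "other side" of the measure; weighting different failure types differently would be a further refinement.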
You have a question?
>>: I was just curious about the use of a physical robot as opposed to watching the robot I'm talking to on a
TV screen. I'm sure that's something that people have looked at, thought about, et cetera. Do you have a
sense of how that distinction then [indiscernible]?
>> Candy Sidner: There are people who have looked at this. People are trying to understand what's the
difference between having a character on a screen, the kind of thing that Dan has been doing, versus
having this thing in front of you. There clearly is something different, and the question is, what is that? And so
people have been struggling to come up with ways to try and measure what this is. Mostly they've been
able to find out that there are differences in how much people trust the thing. The results are sort of not
very revealing in a certain way -- bits of information.
I originally got interested in robotic stuff because I wanted to look at the problem of how robots point in the
world. You can't do that very well. You can't point in this physical situation very well with something that's
on a screen that's 2-D. Whether that's the robot on television or a character you create with animation, it
looks weird. It's hard to make it work very well.
>>: It was interesting your robot was ambidextrous. I assume that was a technical limitation in how far it
could move.
>> Candy Sidner: Yes, it's a technical limitation. It turns out that, so he won't break himself by hitting
himself, he can't get his hands any closer together than this. These are all stupid things you have to
worry about when you have these physical devices. He was in fact designed so he could never clap his
hands together, because if he did he'd break his arms. This is a good robot in one sense: his arms are very
lightweight, so he can't hurt a person. That's the thing you have to worry about with robots. But he can hurt
himself really quickly. And we did it: when he was first mobile, he went into a doorway and broke his elbow
joint. This is the kind of thing you have to deal with.
So that's a technical limitation. But also, as you may have noticed -- if you remember from the short video
clip I showed you -- the teacher talks about the stuff here, and then when he turns and talks about the stuff
there, he switches hands. So that's another thing: people are ambidextrous and they do these things. It's
not a bad thing.
>>: Person to person, people are so one-handed dominant.
>> Candy Sidner: Yeah -- that they would go the other way. I've seen both among all the other subjects
that are involved.
>>: Is when they switch hands related to the fixation -- relative to the fixation?
>> Candy Sidner: I think so. I mean, it's much more awkward to go like this to point at something over
here when you've got an appendage that will do it this way instead. But there are people -- but again it may
have something to do with a certain amount of dominance.
>>: It's how the visual field is split in the brain -- the left visual field goes with sensation from the left. So it's
closer, to coordinate with the corresponding hand rather than the opposing hand.
>> Candy Sidner: That may in fact be why it's natural to do this -- you get this effect for things over here.
But there are people who will go to the trouble to point like that. I mean, I do have some cases where
people point that way. So it may have -- I think that's probably their right hand, but nonetheless. So there's
another interesting question in production, which is what's the right way to -- not at the level of conveying
engagement, but it is an interesting question how you use the limbs of the robot. We could make it
completely arbitrary. Well, thank you very much.
[applause]