>> Andrew Kun: Okay. So I guess the one thing is please feel free to interrupt with
questions at any time you like. And I do want to start this by thanking two people: Tim
Paek, who's been collaborating with me for almost two years now, and a lot of the work
that I'll be talking about here he's been participating in, so thanks, Tim, and looking
forward to continuing this; and also Ivan (phonetic), who invited me to this talk, and
hopefully we'll have a chance to collaborate as well.
I want to acknowledge my students who are generating all the data that we're going to be
talking about here today: John, who's working on a navigation experiment; Oskar, who's
in the audience here and is working on a press-to-talk experiment; Jacob, who's a
Microsoft Research intern this summer actually with Tim and also working on the
mapping experiment; and Owen, who did a lot of the video work you'll see here; Alex,
who's working on a human-human study that I'm not actually going to have a chance to
talk about much, but we're basically learning from human-human interaction, trying to
form our human computer interactions from that. And then Puneet (phonetic), a summer
intern who's working on one of our next steps, which is creating the UNH obstacle test,
which I'll talk about soon.
So a short outline. I just want to introduce the topic by talking about ubiquitous
computing in cars and what the relationship is, and then basically talk about the studies
that we've worked on and say a couple of words about what's next. So ubi-comp in cars,
the idea is ubiquitous computing, of computers being everywhere, networked but sort of
fading into the background.
I just did a little quick online search, and, you know, Intel certainly thinks that there's
something to this. This is a thing from their Web site where they're thinking about the
car being at the center of this entertainment and communications and all these things
coming together in the car.
Navigation is certainly something that we see a lot in cars. And I did like this picture
from the Web site of one of the major manufacturers. Does it look like the person's
driving to you? The left hand there seems to imply that this is a person driving. And as
you'll -- you can probably guess and you'll also see from our data, this would not be such
a good idea, right, driving and pointing at the same time. But, anyway, so I like
that picture.
The Zune, you can buy that with the in-car attachment and, you know, iPhones are
everywhere. So certainly cars are getting into this age of ubi-comp where things are
getting into cars. So how will this progress for cars? And one person who's in this field,
Russell Shields, who is the CEO, I believe, of Ygomi, thinks of cars as docking ports, or
cars are going to become docking ports in his opinion.
So what is a good docking port? In my opinion that would be something that provides
you with an open interface so that you could in fact dock into it.
Now, if you're a car manufacturer, this sounds good on the one hand because, as we've
seen in the slides before, people are bringing in these brought-in, third-party aftermarket
devices. And that's sort of a reality they have to deal with, and perhaps there is value to
saying, look, my car is an open interface. But then there's the liability issue too. So if
you're a car manufacturer and someone plugged in their MP3 player and then crashed
because they were playing with it, will you get sued? And also there is the profit issue.
So if someone else is producing the faceplate for the new radio and you're not the one
determining what the size and the shape of it is, then you're not making that extra money.
So there are certainly things that drive car manufacturers towards opening up the car and
sort of going into the ubi-comp age, and then there are definitely pressures that are
pushing in the opposite direction.
Now, having said that, the background we come from is police cars. Because, as many of
you know, we have this effort called Project54, and I'll talk about that in a second. But
police cars, the difference between a police car and my car is that while I might have
electronic devices in my car, they're really toys. So if I don't have my MP3 player, even
if I don't have my cell phone or navigation device, I can probably drive. People have
done this for many years and they were okay.
But you're not a police officer if you don't have lights. You're not a police officer if you
don't have a radio. You're not a police officer really now even if you don't have a
computer that you can run license plates on. That just is part of policing. So if you're a
police officer, there are things that you have to have, these electronic devices.
And so in a very real sense, police cars are the vanguard of ubi-comp in cars. Because
they actually both have these devices and really need them and want to use them on a
daily basis. It's easy enough for us to say, look, just don't use your cell phone, it's
prohibited, and the ubi-comp problem is solved. But for police this is not the way to go. So I
think that if you're going to do research in this area, in fact working with police is a very
nice place to be.
So for some of you a reminder, for perhaps others a quick introduction into this Project54
system, basically lots of devices, as I just said. We provide a way to integrate them into a
single system and provide a single user interface which has a voice modality so you can
issue voice commands to turn lights on, to run license plates and these sorts of things.
And there's also an LCD touchscreen as well as the original user interfaces. So if you're
resistant to technology or, for that matter, if your computer crashes, you still have the
fallback of everything works the way it used to work 50 years ago, but ideally you can
actually take advantage of the new technology as well.
Okay. So hopefully that sets the stage for some of the studies that we've done. There are
two problems that the studies that I'm going to talk about address. And the first one is
that there's no clear -- we don't quite know how in-car devices affect driving
performance. So there are obviously people working on this. But there's no formulaic
way to figure out if you put a device into a car how will that affect driving performance.
And then the other question is -- related is how does driving performance and your
likelihood of getting into a crash -- how are those related. Just because you're not driving
as well does not necessarily mean you're going to get into an accident.
Primarily, so far we've been interested in the first question, really, which is how do in-car
devices, the various in-car devices affect driving performance. And that's what I'm going
to concentrate on today.
Our goals are twofold. One, we'd love to have an evaluation tool. So it would be really
nice if we could evaluate these in-car devices and say, well, this is how they affect
driving, this is a safe device and this is not a safe device, or this is a safe user interaction
design and this is not.
And then we would like to propose ways to reduce potential distractions. We know that
there are distractions. We can identify them hopefully with the evaluation tool and
hopefully we can propose ways to reduce them.
Our major hypotheses in all the studies in this work are that what affects driving
performance are things such as the user interface characteristics. For example, for our
speech user interface, this could be the speech recognition accuracy, do you have
press-to-talk button or not and so forth. What we would call road conditions: are you
driving at night or day, are you driving in the city or on a highway, curves or no curves. And then the
psychological state of the driver: are you frustrated, are you happy. And, of course, all
the interactions between all of these things.
And in evaluating these hypotheses, we use a driving simulator. So what I'm going to
talk about really today is the driving simulator studies that evaluate the major hypotheses
and then, in fact, very specific hypotheses that come out of those.
So the driving simulator looks something like this. And, in fact, there is a link up there if
you want to get more about this. There is a video and a little explanation about what the
driving simulator does.
But, in short, this is a driving simulator with a 180-degree view in the front. And so that's
with three channels, so-called channels, so three screens. And you also have these
side-view mirrors, as well as a rear-view mirror, and then a motion platform. The motion
platform allows for feeling acceleration, deceleration only. But that turns out to be a nice
feature because you can actually tell -- as it turns out, it's very difficult to stop at a line
without feeling that deceleration. So that really helps with that.
The driving simulator helps us evaluate our hypotheses through driving performance.
And what constitutes driving performance? Well, primarily, we've looked at the variance
of things like lane position, steering wheel angle, velocity, and a distance if you're
following a car. So what position you are in the lane exactly doesn't really matter as long
as you are in the lane. So are you at zero centimeters or are you at plus ten centimeters.
It doesn't matter. What does matter, though, is if you're weaving in and out or if
you're changing that position a lot. So basically the variance.
Similarly, the steering wheel angle, the mean is probably going to be zero, otherwise
you're going to be in a ditch. But the question is how hard are you working. So you
could imagine a situation where, in fact, your lane position variance is low, you're able to
keep yourself in the lane, but you're really working hard at it. And that's a sign that
something's going on; you're probably overwhelmed by something, whether it's the road
being too difficult or there is something else going on in the car that's distracting you.
People tend to slow down if they're overwhelmed. So velocity and velocity variance
matters. And then people tend to have a harder time keeping distance to a vehicle. They
might just lose it and the variance goes up. Of course, simple things like lane departure.
So if you're departing lanes very often, that means that you're not doing so well.
Collisions and other things you can look at.
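As a rough sketch of how these variance-based measures might be computed from a simulator log -- the field names, units, and the lane-departure threshold below are illustrative assumptions, not the actual simulator output format:

import statistics

def driving_performance(samples):
    """samples: list of dicts with keys 'lane_pos_m' (offset from lane center),
    'steer_deg' (steering wheel angle), 'speed_mps', and 'headway_m' (distance
    to the lead vehicle), logged at a fixed rate."""
    lane = [s['lane_pos_m'] for s in samples]
    steer = [s['steer_deg'] for s in samples]
    speed = [s['speed_mps'] for s in samples]
    headway = [s['headway_m'] for s in samples]
    return {
        # It is the variance, not the mean, that matters: weaving around
        # in the lane, not where in the lane you happen to sit.
        'lane_pos_var_m2': statistics.pvariance(lane),
        'steer_var_deg2': statistics.pvariance(steer),
        'speed_var': statistics.pvariance(speed),
        'headway_var': statistics.pvariance(headway),
        # Crude lane-departure count, assuming a 3.6 m lane.
        'lane_departures': sum(1 for x in lane if abs(x) > 1.8),
    }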
Now, the simulator is also equipped with a Seeing Machines eye tracker. Again, there is
a URL if you want to take a look at more information about this eye tracker. The eye
tracker has two cameras and also a couple of IR pods that we can use to illuminate the
subject. And then what you do with this is basically look at the visual attention. So
things like fixations, which would be looking at a particular spot for, let's say, over 100
milliseconds or over 200 milliseconds, however you feel like the definition is appropriate.
And then the number, the timing of these things, so how often do you actually look at the
GPS screen. What is the timing; do you happen to do this before or after turns or is there
a voice announcement that prompts you to look at the GPS screen and so forth.
Scanning matters. Scanning meaning scanning left to right. Because especially in cities,
that's very important; that's how you find out if there's a pedestrian lurking behind that
parked car. And people tend to focus in on the road ahead as they get overwhelmed with,
for example, in-car activities.
PERCLOS, or percent closed time, so people look at how -- as it turns out, the way to
find out if you're getting tired is whether you're starting to close your eyes for longer and
longer periods of time. So if you have one of those devices -- truck drivers have these
now in trucks where they start beeping because it notices that you're falling asleep. And
one way to notice that you're falling asleep is that this percent closed time is increasing.
Time looking at the road. Just are you looking at things in the car or are you looking at
things outside the car. If you're looking at things outside the car, that's probably a good
sign that you're not going to crash.
And so these would be things that we can use to evaluate hypotheses through our studies
in the driving simulator, and then as well as with the eye tracker.
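A minimal sketch of how the gaze measures just described could be derived from eye tracker samples; the 100 millisecond fixation threshold, the region labels, and the eyelid-closure threshold are illustrative assumptions, not the Seeing Machines output format:

def fixations(gaze, min_dur=0.1):
    """gaze: list of (t_seconds, region) samples, region being a label such as
    'road', 'gps' or 'ptt_button'. Returns (region, start_time, duration) for
    every run of samples on one region lasting at least min_dur seconds."""
    out, i = [], 0
    while i < len(gaze):
        j = i
        while j + 1 < len(gaze) and gaze[j + 1][1] == gaze[i][1]:
            j += 1
        duration = gaze[j][0] - gaze[i][0]
        if duration >= min_dur:
            out.append((gaze[i][1], gaze[i][0], duration))
        i = j + 1
    return out

def perclos(eyelid, closed_threshold=0.8):
    """eyelid: list of (t, closure) samples with closure in [0, 1]. Returns the
    fraction of samples with the eye mostly closed -- the drowsiness cue
    mentioned above."""
    if not eyelid:
        return 0.0
    return sum(1 for _, c in eyelid if c >= closed_threshold) / len(eyelid)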
So I wanted to talk about four studies today, and, again, please do feel free to interrupt if
you have questions. An older study, the first study we've done had to do with a police
radio. And, again, having this police background for the project, it made sense to look at
the police radio. And we looked at the speech user interface versus the hardware user
interface interaction and how this affects driving performance.
In the following study, the speech user interface characteristics were varied, and we
looked at how that affects driving performance. And this is work with Tim Paek that was
actually published at Interspeech last year. And then Oskar's work, also with Tim's help,
is looking at the glove -- I'm sorry, what we call the glove press-to-talk button. So the
idea here is what if you didn't have a fixed location for the press-to-talk button but rather
you had it sort of floating on the steering wheel. But instead of instrumenting the
steering wheel, you just put a glove on someone, just to get results quicker.
And then we're also currently working with Tim on a navigation experiment where we're
looking at the differences between people getting printed instructions, a graphical user
interface and speech, so that would be the state-of-the-art type of instructions, and then
speech-only instructions, and this on city, highway and so forth.
So let's take a look at this police radio study. For some of you this may be a reminder,
but what we've looked at here is this is the picture of a -- one of my students who's
demonstrating this, so we have the speech user interface, which here is a press-to-talk
button, there is the microphone so you can talk to the computer and basically issue
commands such as change the radio station to X, Y, Z.
Or you can do that by operating tiny buttons and looking at a tiny screen on a commercial
police radio which sits in -- in fact, this particular radio is in all of the New Hampshire
police cruisers, maybe 1,500 of them between the state and local police.
And so you have relatively small buttons. You have to take your hand off the wheel and
you have to look for feedback on this small screen. In fact, given that there is something
like 200 channels, this actually isn't trivial; you kind of have to look at where you're
going.
And we found drastic changes in lane position variance, for example. You see that it's
highly significant: with the speech user interface, a lot smaller variance; with the radio's
hardware interface, a lot larger variance. And by the way, this was on a straight road.
There were no curves. People were just driving on a straight road. And they had a harder
time keeping in their lane. And then even more significant and even more dramatic
difference in the steering wheel angle variance where the hardware interaction showed
huge variance compared to the speech user interface variance.
>>: (Inaudible?)
>> Andrew Kun: Degree squared, sorry. In the previous one it was meter squared, so
this is degree squared, yes.
So having realized that there's -- having gone through past this, we wanted to get a little
more information about how speech user interface characteristics actually -- and
which speech user interface characteristics affect your driving performance.
So we've designed another experiment in which we had a sort of similar task, where you
were supposed to interact with a police radio. But we actually took the police radio
away. We made this a Wizard of Oz experiment basically. And we looked at speech
control only. So the secondary task -- the primary task being driving, the secondary task
was speech control of the radio. You have to issue commands such as change the
channel to this, retransmit the message that just came in, go back to this channel and so
forth.
And then we varied a couple -- three things, specifically. We varied speech recognition
accuracy. And, again, given that this was Wizard of Oz, it was easy to vary speech
recognition accuracy, right? And we had a high condition or a low condition. And then
we varied whether you have to use a press-to-talk button. So in one case you have to
push down, hold, and then release, a fixed press-to-talk button, which in fact in this case
was in the center console. And then the other option was that you had ambient
recognition. So you did not need to be using a press-to-talk button.
And finally we looked at the dialogue repair strategy; that is to say when the computer
did not understand what you said. It either misunderstood -- that is, it executed the wrong
command -- or it just said I didn't understand, please repeat.
>>: For the accuracy, how high was high and how low was low?
>> Andrew Kun: So 88 percent was high, and 44 percent was low. So it was truly low.
We wanted to have extremes for this first study. And that's a good question. Of course
you would want to have more graded result at some point presumably, but...
People drove a scenario which the map looked like this, so basically it was a curvy road.
And you want to have a curvy road so that people actually have to struggle a little bit of
driving. Straight driving is obviously a lot easier than having some sort of curves in your
driving scenario.
And then what happened was that you see that we basically varied the accuracy from 89
to 44, and then with or without press-to-talk and then particular parts of the road they had
to do one or the other. And so this is about one kilometer, so this is a reasonably long
scenario.
>>: In which direction they drive, clockwise or counterclockwise?
>> Andrew Kun: Clockwise. So this was the starting point. So they headed up. And
then there was a little bit of a warmup here. In fact, I'm not telling you the complete
story, so there was training obviously that went on.
>>: (Inaudible) the second half of the road is practically nothing.
>> Andrew Kun: Yes. So in fact they did not -- you're right. So I probably could have
given you a slide that left that off, because they did not actually have to drive the rest of it.
Or they drove a little bit so that we could do some comparisons of baseline driving and so
forth.
>>: (Inaudible) I just thought that the initial part of this, on the lower right part, which
means that training can -- it seemed quite long, so this is (inaudible).
>> Andrew Kun: Yeah. It's about -- this is where they ended up starting, and they
started the interactions here.
So, again -- or what we found here were really two important results. One is that the
recognition accuracy does influence your driving performance, specifically steering
wheel angle. Variance was higher for low-recognition accuracy. So for low-recognition
accuracy, you ended up with a higher average steering wheel variance than for high
recognition.
And then the second result was that if you had low accuracy, then using the press-to-talk
button also influenced your driving performance, specifically lane position, and
you see that on the X axis you have whether you had to use the press-to-talk, so you didn't have to use it in the
ambient-recognition condition versus, yes, you did have to use it. In the latter, the variance of
the lane position was higher.
So these, we thought, were important results. So if you're going to put a speech recognizer in
a car, you better make it work well enough, for one thing. And then be careful because if
it starts not working so well, then the press-to-talk button might become a problem. And
honestly it just kind of gives you an inkling that the press-to-talk button, there might be
something there, so this might actually be worth some further -- some further study.
>>: Just so I'm clear on this, so this is only for the low accuracy conditions (inaudible) so
the idea is that if you're low accuracy and you've got press-to-talk, you're getting a higher
variance.
>> Andrew Kun: Right. And so we can think about this, why would this be the case if
presumably -- and that's another hypothesis that kind of came out of this, is you're
probably frustrated. And perhaps you take it out on the press-to-talk.
>>: And the press-to-talk was on the wheel or on the --
>> Andrew Kun: Yes. It was on the wheel. In this case we moved it onto the wheel.
>>: So who are the drivers? Are those policemen or --
>> Andrew Kun: These are not policemen. These are subjects that we recruited from the
community, and it's mostly --
>>: (Inaudible) diverse.
>> Andrew Kun: It's diverse, yes. It's -- I don't have the statistics off the top of my head,
but it was on the order of ten drivers. And they were from UNH, some students and stuff.
>>: If you needed some related work, there's a kids' show on called Myth Busters where
they actually did a comparison. They didn't have the speech or the push-to-talk, but they
compared cell phone -- driving with cell phones versus alcohol. And I've forgotten what
the result was, but you might consider alcohol as another baseline condition.
>> Andrew Kun: Well, I'm sure that we'll find students who are going to be willing
too --
(Laughter.)
>> Andrew Kun: See, UNH is a dry campus, so we'll have to get special permission to
do these sorts of -- but, you know, I mean if that's what it takes to do science, right,
then...
Okay. So motivated by this result that the press-to-talk button does have an influence on
driving performance, we wanted to drill a little deeper. And this is an ongoing study that
Oskar is working on in which, again, we had the same task: drive on the same route,
do the same business of controlling this police radio, Wizard of Oz again. But now it's,
again, let's revisit this same speech recognition accuracy, high and low, but also let's take
a look at what the activation sequence is for the press-to-talk button. So is it push, hold
while you're talking and then release, or is it push and release and then speak and let the
endpointing be done automatically. Or no push, which, thanks to Ed Catrell (phonetic),
we realized is how we really designed our experiment.
>>: (Inaudible.)
>> Andrew Kun: Well, and so let me just say that in fact if you look at the last line here
right up, if you start out with this, so push-to-talk button, do you have no push-to-talk
button, there's ambient recognition, or you have a fixed push-to-talk button, which is
fixed on the steering wheel, or you have this glove, which allows you to -- here's the
picture of the glove, so it's basically a glove with a couple of sensors in the thumb and in
the index finger, and those are basically the switches, the press-to-talk switches. So this
sort of -- instead of instrumenting the steering wheel, which is harder, right now you have
basically the ability to push any point on the steering wheel and get the push-to-talk
button.
So back to this, so you have either -- you don't have to use the push-to-talk button, you
have this fixed push-to-talk button, which is on the steering wheel, or you have this
glove, which is sort of a floating push-to-talk button, and these are your three conditions.
Now, for these two, you can have push-hold-release or push-release. But in fact for
ambient, you can't. So hopefully this here explains it a little better, this table. So you
have the push-hold-release and push-release, which makes sense for fixed and glove, but
in fact don't make sense for ambient. So that's really sort of a no push condition. So just
if you want to set this up statistically, this would be a good way to look at it.
But at any rate, I think that what's important to take from this slide is we have these three
conditions for, you know, do you have a push-to-talk button at all and, if you do, is it
floating or is it fixed. And then if you do have a push-to-talk button, do you have to do
push-hold-release or push-release.
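One minimal way to write down this unbalanced design, purely as an illustration of the five cells in the table (the labels here are made up):

conditions = []
for ptt in ('ambient', 'fixed', 'glove'):
    if ptt == 'ambient':
        # Degenerate cell: no button at all, so "no push" is its only level.
        conditions.append({'ptt': ptt, 'activation': 'none'})
    else:
        for activation in ('push_hold_release', 'push_release'):
            conditions.append({'ptt': ptt, 'activation': activation})

# Five cells rather than a full 3 x 2 factorial, which is why the analysis
# has to treat ambient recognition as its own "no push" condition.
for c in conditions:
    print(c)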
Now, let me show you a couple of videos of what the interactions look like. So this is
again the glove, and then here is a video where you'll see a person using the fixed
push-to-talk button. And, by the way, I don't know how the lighting is. Hopefully you
guys can see. But one option is to potentially turn it lower.
>>: They tend not to like to because (inaudible) --
>> Andrew Kun: Okay. Well, let's see how it works out. Let's see how it works out.
And, anyway, you'll see the press-to-talk button -- here's the steering wheel and the
press-to-talk button is right here. This person will operate it. There will be a red circle
pointing out that she's pushing it, so that will help. And then there is the leading vehicle.
You can see that there's a curve coming up, so this is that curvy road this person is
driving on and she is basically following that leading vehicle. That's the primary task,
and the secondary task, again, is issue these commands to this police radio about
changing channels.
(Video played.)
>> Andrew Kun: Okay. So that's the fixed push-to-talk, and you could see those were
the misunderstanding where the computer would misunderstand things and you'd have to
fix it and so forth.
Now, one thing that -- so we're still looking at the data, and I think we're probably going
to end up collecting some more data. But one thing that we can certainly say,
one thing that came out of this that was interesting, we thought, was that people actually
tended to glance down at the fixed push-to-talk button. And I was surprised because I
thought that it's fixed, there's only one button, it's not -- what's there to look down for.
But as it turns out, as you'll see in this video, people do. And a lot -- a lot of the
subjects -- most of the subjects actually look down very often to see where the button is.
So what you'll see here -- by the way, this is a video from the viewpoint of the eye
tracker. So, I don't know, Ed can tell us how this compares to the video that he sees on
his eye tracker. But a couple of things here. The two vectors here would show you the
direction of the eye gaze. And you'll see that will be moving. This is the head position,
so the direction the head is pointed at. And the two numbers here, the green will be
counting up how many times a person looks down before he presses the button, and then
the red will be counting up how many times he does not look down before he presses the
button.
You'll be able to hear a beep when the button is pressed. So, ideally, and what's going to
happen here is you'll have a bunch of short snippets of basically the person looking down,
looking up, pressing. And then I cut off to the next interaction where the person looks
down, looks up and presses. So you can try to synch yourself up with the look down,
look up, listen for the beep, look down, look up, listen for the beep. And the beeps are
basically when the person will start to issue a command and I cut that off. That's not
there. It's basically -- the original video shows all the interactions, and this one in the
middle there is going to be a blacked-out spot just because it's going to be a little too
long. So here we go.
(Video played.)
>> Andrew Kun: So look down. There's a beep.
Okay. So in this particular case, 31 to 23, quite a large difference. So more than half the
time this person looked down, even though it's a fixed push-to-talk button, it's not going
anywhere. And there's really only one button you could possibly press.
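A hedged sketch of how the green/red counters shown in the video could be computed from the logged data; the 'ptt_button' region label and the two-second look-back window are assumptions for illustration:

def glances_before_presses(press_times, fixations, window_s=2.0):
    """press_times: button-press timestamps (the beeps), in seconds.
    fixations: list of (t, region) records, e.g. the output of a fixation detector.
    Returns (looked_down, did_not_look_down) counts across all presses."""
    looked = not_looked = 0
    for p in press_times:
        preceded = any(region == 'ptt_button' and p - window_s <= t <= p
                       for t, region in fixations)
        if preceded:
            looked += 1
        else:
            not_looked += 1
    return looked, not_looked

# Example with made-up timestamps: one press preceded by a glance down, one not.
print(glances_before_presses([5.2, 9.8], [(4.9, 'ptt_button'), (9.0, 'road')]))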
>>: So these are all first-time users.
>> Andrew Kun: That's true. And I'll get to that. And that's a question that is worth
asking, is does that make a difference.
>>: And was the button on the left side of the wheel?
>> Andrew Kun: It was. And I was wondering if anybody was going to catch that. But
for some reason --
>>: (Inaudible)
>> Andrew Kun: It was on the right side of the wheel. And for some reason the eye
tracker gives us the mirror image of the (inaudible).
>>: Is there any attempt to test maybe push-to-talk being on the left foot?
>> Andrew Kun: We did not do that. I think people have done that. We haven't done
that. And I can't quote a paper, but I'm not positive that that worked out so well for
people. But we haven't tested it.
>>: People looking at their foot and some point (inaudible).
>> Andrew Kun: That would be more exciting, right?
>>: The video went through real quick, but was there some learning even on these 50
trials --
>> Andrew Kun: Yes.
>>: That is, was the first half that you look more and then it would be more red on the
second one?
>> Andrew Kun: Well, in this particular example -- so whether that's a representative
example is a question for some statistical work. But you could see -- I don't know if you
noticed, but in fact in this case the last few glances -- the last ten probably were glances
down. So he definitely glanced -- it almost looked like it's the opposite. So that's a good
question. We have to look into that a little more.
And then one thing that is kind of interesting is just the difficulties that the eye tracker
has to go through and the difficulty that Oskar then has cleaning up the data, because of
the various -- the person looking one way or the other, the eye tracker only has so much
of an angle that it can track.
And then this last shot here is interesting. See the hand right in front of the camera,
which then if you put it right in front of the illumination, that kind of messes up the
contrast and so forth, so just interesting things that keep -- that go on.
Now, let me show you what the glove interaction looks like. So again:
(Video played.)
>> Andrew Kun: So that's how the glove interaction happens. And then take a look at
this person again and listen for the beeps and look for the glances.
(Video played.)
>> Andrew Kun: So the point is that there are few, right? So at least for this person -- obviously there's no reason to look at your -- you know where your index finger is.
You've learned that very early on, or your thumb. So you can basically do this without
looking. So we thought that was -- I certainly thought that was a surprising but interesting
result.
One thing that we also looked at that we've had a chance to look at already is where do
people actually interact with the push-to-talk button? By this I mean, if you think of the
car coordinate system, where on the steering wheel do they push the button. So if you
have a fixed push-to-talk button, that depends only on how much you're turning the
wheel. That gives you exactly the angle of where you're pushing. If you have the glove,
then you have to kind of transcribe it. So, for example:
(Video played.)
>> Andrew Kun: So Owen, one of the students, went through this where he basically
overlaid this fixed coordinate system on the steering wheel and transcribed where people
push the glove button. And when you do that, you get this sort of a graph, which shows a
couple of things. For one, the red, which is the fixed push-to-talk button, is centered
around roughly the 75 degree bin. So we binned this. Obviously it wasn't super precise,
but roughly the 75 degree bin, which is where, if you're heading straight, that's where on
the steering wheel the push-to-talk button is.
The glove push-to-talk is more towards the 30, 45 degree, which, when you think about
it, that's the ten o'clock-two o'clock setup. So if you do what you're told to do in driving
school, then that's basically where you're going to push the button. So that's a nice result
we thought.
And, also, see how it's more spread out, right, the blue versus the red. So this is
something that we thought would happen, that people would feel more comfortable
pressing the button sort of in a wider range of the steering wheel, and that is coming
across. Now, you might ask is that a good thing, but that's sort of a separate --
>>: (Inaudible) this is where the push-to-talk button was actually fixed on the steering
wheel?
>> Andrew Kun: The button is fixed on the steering wheel, and that happens to be in that
75 degree bin when you -- and then the only thing you have to do is really look at the --
>>: (Inaudible) or just on the trial set?
>> Andrew Kun: Well, that is kind of -- I mean, we could have really placed it
anywhere. But that's roughly where if you have a -- if you buy a car with a bunch of
press-to-talk buttons, that's where they will be, roughly the center of the steering wheel.
>>: (Inaudible) because of the possible false alarms, you have a more (inaudible) --
>> Andrew Kun: You could. And so that is -- I don't know that we have -- Oskar, do we
have any false -- we probably have a couple, but not too many.
>> Oskar Palinko: (Inaudible) significant.
>> Andrew Kun: The push-to-talk buttons are a little -- they're little microswitches, and
they do give you -- what do you call that -- tactile feedback by virtue of being the way
they are.
So it's actually not very easy to press them if you don't mean to press them. And they are
on the index finger and the thumb, which, when you're driving, are not -- you don't
necessarily drive like this. It's a valid question, you're absolutely right, but it's -- I
think the setup is such that that does not happen very often.
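As a small sketch of the push-location transcription described a few slides back: for the fixed button, the push angle in car coordinates is just the button's mounting angle plus the current steering wheel angle, while glove pushes are read off the video and then binned. The 75 degree mounting angle and 15 degree bins are assumptions:

from collections import Counter

FIXED_BUTTON_ANGLE_DEG = 75  # assumed button position with the wheel held straight

def fixed_push_angle(wheel_angle_deg):
    """Angle of a fixed-button press in car coordinates."""
    return (FIXED_BUTTON_ANGLE_DEG + wheel_angle_deg) % 360

def bin_push_angles(angles_deg, bin_width=15):
    """Histogram of push locations around the wheel, like the red/blue plot."""
    return Counter(int(a // bin_width) * bin_width for a in angles_deg)

# Made-up example: fixed-button presses cluster near the 75 degree bin,
# while glove presses spread out around the ten-and-two hand positions.
print(bin_push_angles([fixed_push_angle(w) for w in (-5, 0, 3, 8)]))
print(bin_push_angles([28, 35, 42, 48, 55]))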
And so that brings me to the last thing I wanted to discuss today. This is also an ongoing
study about navigation -- yes.
>>: You didn't show any of the -- at least I don't think I saw any of the data having to do
with variance, et cetera, from lane position. Was there any difference between the
push-to-talk and the --
>> Andrew Kun: So we're in the process of actually looking at that right now. In our
preliminary analysis, we see that the accuracy is showing up, so we're basically
reproducing the result that was the previous study, so that's good. And we need to drill
down, given that we have this unbalanced data set. We need to actually think about a
little more how we're going to process it. So, presumably, if this talk was given in a
month, I'd have slightly more results from both. But that's how it is.
So navigation, the motivation is sort of obvious. The personal navigation devices are
proliferating. So it's interesting to see how they affect driving. And when you think back
to that picture that I showed you at the beginning of the talk where the person was I think
driving and definitely pushing a button, is that such a good idea?
And, of course, even from the very first study I showed you where the -- the radio
interaction, you know that that's not a good idea. So pressing buttons on a tiny display,
that's not going to be so good.
But, anyway, even if you don't do that, we wanted to look at the following. What if you
had this task of follow directions to get to a destination, but then the directions are given
in three different possible ways: one would be you print out your directions from the
Web and you get them on a piece of paper and then off you go; two would be the state of
the art, you have a personal navigation device that has a graphical user interface and it
also gives you voice prompts to help you make turns; and then the third one is sort of,
well, what if you just took away the graphical user interface and you kept the voice
prompts only, would that actually do just as well in terms of getting you to the destination
but perhaps better in terms of driving performance.
And then we varied road conditions because we figured that highway and some sort of
a -- I call it suburban highway, so basically multilanes but perhaps more curves and more
buildings around, and then city, obviously, those probably are going to be different -- are
going to act differently.
So the cabin setup looks something like this. Here is our personal navigation device,
Goowy. It's somewhere where you'd probably put one if you rented a car and you had a
beanbag to put it on your dashboard. And then we also have a video camera, and you'll
see some video from this angle of what goes on in the car.
So the map -- the city part of the map looks something like this. And something to notice
here is that we have some short segments, we have a longer segment and some sort of a
middle-length segment. Because one hypothesis is that the length of the segment, the
amount of time you have to wait for the next prompt will actually influence your driving
performance; perhaps you'll get nervous or fidgety or you have to look over.
And then one thing we're also interested in, well, how many times do you actually look
over at the GPS screen, just to look, to see where you are. And the argument would be
you probably shouldn't look if you don't have to, because looking at the GPS screen is
probably not the best idea.
And then this is the highway, the straight segment here of the highway, and then what I
called earlier, this urban highway or something like that. So there are curves here
basically and also there is some more built-up -- it's a more built-up area.
Okay. So here's a video of a person using printed directions. And what you'll see is four
video segments. There are three video camera angles that you'll see. This one, so behind
the driver, you'll see that video camera from the side that I also showed you, and then
you'll also see the eye tracker video angle. And then at the end you'll see a segment
created in MATLAB, which will just show you what happened with the lane position.
And you'll see in this particular case that this person will deviate from the lane that he's
in. You see that right now he's on the highway, he's in this particular lane, printed
directions are right here, he'll pick them up and, as he picks them up, he'll start moving
into the next lane. And you'll see that on that last segment as well. So let's take a look.
(Video played.)
>> Andrew Kun: Okay. I see the glance. It's interesting to see how it becomes yellow,
by the way, there, which sort of -- and then -- okay. So here's -- glance is coming up.
There's a glance. Look at how he started moving and then he decided, all right, fine, I'll
just go. And then multiple glances. So here are the glances that he took down, and you
see that basically as he was getting ready and then took that glance, he basically moved
from his lane. This is the lane marker here. He ends up in the next lane, and then at that
point he just says, well, fine, I'll just go to that lane. So this is about one meter, so this is
a serious distance that he passed, and this is about five seconds of travel time on the X
axis.
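As a hedged sketch, in Python rather than the MATLAB actually used, of what that last video segment shows -- lane position over time with the glances at the printed directions marked; the numbers here are made up for illustration:

import matplotlib.pyplot as plt

def plot_lane_deviation(t, lane_pos_m, glance_times, lane_width_m=3.6):
    """t, lane_pos_m: time and lane-center offset series; glance_times: moments
    when the driver looked down at the printed directions."""
    plt.plot(t, lane_pos_m, label='offset from lane center (m)')
    plt.axhline(lane_width_m / 2, linestyle='--', label='lane marker')
    for g in glance_times:
        plt.axvline(g, color='r', alpha=0.3)
    plt.xlabel('time (s)')
    plt.ylabel('lane position (m)')
    plt.legend()
    plt.show()

# Made-up five-second example: roughly a one-meter drift into the next lane.
plot_lane_deviation([0, 1, 2, 3, 4, 5], [0.0, 0.1, 0.4, 0.9, 1.3, 1.8], [1.5, 2.5])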
Now, we did compare that already to what is the state of the art. Well, here's the same
driver using the graphical user interface and the speech interface, so that's the state of the
art that you have right now. This now is a (inaudible) scenario, and partly because the
video's more interesting when you actually hear some instructions. And so what you'll
hear here or what you will see is -- so there's the graphical user interface. You'll see the
person glancing over. You'll see three segments. The MATLAB segment will not be
here. But you'll see, again, the -- this angle, the angle from the side, and then also from
the point of view of the eye tracker.
You will also hear spoken directions. And I'll turn this up just to make sure that you hear
that. And the spoken directions will say something to the effect of turn right.
(Video played.)
>> Andrew Kun: See those glances. And see the glance in this case is to the left because
it's mirror image. And there's another glance on the GPS unit after he was done with the
curve, with the turn.
And then finally that same person:
(Video played.)
>> Andrew Kun: This is only speech. The graphical user interface is turned off. Notice that there are no
glances anymore. All right. So he hears the instructions and follows them and basically
looks at the road. So look at the steady gaze, right?
So while we don't actually -- we're still actually in the middle of collecting data. The data
that I've seen show clearly, as you'd expect, that when you have a piece of paper in your
hand, the variance of your steering wheel angle and your lane position is visible to the
naked eye even zoomed out on the map. And just this gives us a good indication that
there will be some interesting data as far as the glances are concerned. So we're looking
forward to the data collection being completed.
>>: (Inaudible.)
>> Andrew Kun: So a quick overview of what we've learned. Certainly low speech
recognition accuracy is a problem. If you're going to put a -- and of course it may not be
that big of a problem, because people will not use your system, but if they decide to use
it, they will have issues with their driving performance.
Press-to-talk button is an issue. So the design of the press-to-talk button, where you put
it, what kind of interaction it is, you should pay attention to. And then the question about
training. So this business of glancing down. So one question is would this person have
done this week after week after week. So if we bring back the same subject over and
over and over, are they going to stop doing it or no.
And I'm not sure that they will, because in fact there is no training going on. No one is
telling this person, look, don't look down. So perhaps they'll figure it out, but there is
really no feedback that says, you know, you really shouldn't do this.
So I'm wondering if, in certain situations, bad habits that are formed at the beginning,
while the interaction is being figured out by the user, are going to stay, because what exactly is
the training that we give our users and what exactly is the training that they're willing to
accept? So we might have to really design for this and think about this ahead of time,
because the bad -- I'm really of the opinion that the bad habits that they develop early on,
unless they crash and then they're told, hey, by the way, that was because you were
glancing down, what exactly is the feedback that makes them stop.
>>: Kind of curious as to whether we could train the users by simply having nothing
interesting to look at when they look down. So they look down, blank screen. Okay.
Maybe eventually they'll stop looking because they're not finding anything there.
>>: And hopefully the navigation experiment where you actually have this third
condition of the speech only, my guess is that that's going to be (inaudible) --
>>: (Inaudible) still was looking at that direction over and over and over each time
(inaudible) the same, nothing changes.
>> Andrew Kun: And what I wonder about is what is the training that tells them don't do
this. I don't think that there's anything. So unless he has some self-feedback of, oh, boy,
I just almost ran something over because I wasn't looking.
>>: (Inaudible) possibility might be that people actually believe what they see more than
what they hear.
>> Andrew Kun: True.
>>: So maybe if they gain more confidence that this system is actually doing a good job
(inaudible) --
>> Andrew Kun: You're right. You're absolutely right. It's an open question. I don't
have an answer.
>>: (Inaudible) happen to see anything, I mean, there's something to see (inaudible) they
want to see.
>>: So intelligibility, too, is also -- if they don't hear what was said, they're not sure, the
visual confirmation --
>> Andrew Kun: Right.
>>: That's why we use voice prompts.
>> Andrew Kun: Right. But it --
>>: (Inaudible.)
>> Andrew Kun: In this case, the video may not have sounded great, but, in fact, in the
car it was pretty clear. So I don't think that -- yeah.
>>: A comparable question may be whether or not they look down when they're pressing
their cruise control buttons as well.
>> Andrew Kun: Right.
>>: Because that's a completely driving -- I mean, I don't know how people use their
cruise control generally, but use accelerator and resume (inaudible) and they look down
on all those as well or (inaudible).
>>: College kids can (inaudible).
>> Andrew Kun: And also, I'm sorry, but that's actually an even -- well, the difference is
that there are a lot of buttons there, more than one anyway. So that might give you more
of a reason. And also I wonder how often you use it. So there's something to tactile
feedback or just -- people make buttons that are a different shape and feel, so that might
help.
>>: How were your subjects motivated to do a good job of driving? The stakes are
actually pretty low --
>> Andrew Kun: You're right. And that's certainly something that -- we don't actually -- in these designs, we actually don't have a reward for a job particularly well done. But
what we do ask them to do is drive as you normally would. And it seems to me, and
Oskar can correct me if I'm wrong -- but I think that people are pretty excited to be there.
They're not unhappy. And they're getting paid reasonably well. It's $20 an hour.
>>: I think (inaudible) said that they would get $5 more.
>> Andrew Kun: Okay. So I'm wrong. So in your experiment -- oh, that's right. So in
your experiment we said that for -- in what case?
>>: (Inaudible) get $15 if they complete the test and $5 more if they do it right -- I mean,
if they tried to drive (inaudible).
>>: Is there any options to introduce traffic into those simulations?
>> Andrew Kun: There is traffic. And you may not have noticed it, but there is ambient
traffic. And you can actually control individual cars and make them do things that kind
of cut you off or turn or whatnot. Yeah.
>>: Sort of a related question to that is thinking about other kinds of metrics and
performance that you might use (inaudible) and I know one thing that (inaudible) is
following position. So in the task where it's basically just sort of your position to the car
in front of you (inaudible).
>> Andrew Kun: Yeah. So we do have distance to cars; however, actually Tim was
suggesting a similar metric, which would be the two-second rule, so are you tailgating too
close.
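A tiny sketch of the two-second rule as a metric: time headway is the gap to the lead car divided by your own speed, and you can report the fraction of samples under two seconds. The function and example values here are illustrative:

def tailgating_fraction(gaps_m, speeds_mps, min_headway_s=2.0):
    """Fraction of samples where time headway (gap / speed) falls below the
    two-second rule. gaps_m and speeds_mps are parallel series."""
    if not gaps_m:
        return 0.0
    flagged = sum(1 for gap, v in zip(gaps_m, speeds_mps)
                  if v > 0 and gap / v < min_headway_s)
    return flagged / len(gaps_m)

# Example: a 30 m gap at 25 m/s is a 1.2 s headway, so it gets flagged.
print(tailgating_fraction([30.0, 60.0], [25.0, 25.0]))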
>>: Sure. I'm just thinking generally. I'm looking at the speed variability --
>> Andrew Kun: Yeah, we have that number, so we (inaudible).
>>: It would just be a nice thing to look at, I think. But it seems like the more you have
there -- I mean, some of those differences were real (inaudible).
>>: It seemed like some of the conditions you were seeing lots of glances, it felt like
there was a lot of stress in the video. I wonder if you could actually measure that
(inaudible).
>> Andrew Kun: Yes. Well, so you can't really measure stress, but one thing you can
measure is skin conductance, for example. And actually one of my slides talks about
that. So, yeah, we're -- we think that one reason that you may be doing poorly is some
sort of frustration or stress. And you can -- if you know that you're inducing it, then you
can -- when you measure a change in the skin conductance, you can argue that that's
related and that's a measure -- an effect of it.
>>: In fact, that's what we're having to do before Jacob decided, before I (inaudible). So
the next set of experiments were choosing a part, frustration. That frustration caused
by -- different types of frustration measured, physiological, depending on frustration
that's caused by not being able to fulfill the task or if it's caused by some property of the
speech interface and trying to (inaudible) as well.
>>: For the navigation study, if you were deciding the speech versus the speech plus
screen interfaces, do you think that the speech might be different between the two, like if
it was only speech you might say different things?
>> Andrew Kun: Yeah. And you're absolutely right. So, I mean, one thing is
ideally you'd have -- or probably you might be better off with landmarks. Right now
what we're doing is sort of trying to get a baseline, I think that's fair to say, of what
happens if you don't really change anything but move the Goowy away.
However, what we're hoping to learn, both from the Goowy plus speech as well as the
speech only is, well, how do people react? For example, what happens with the glances.
And that might actually give you an idea of how you should say things differently. So,
for example, if it turns out that they get fidgety in a long stretch, that they normally
would have actually looked down onto the GPS to confirm that they're on the right road,
the red line is still ahead of them and there are no -- they haven't lost it. Maybe you
should say something along the lines of you're still fine, or repeat, by the way, we'll be
taking the exit in a mile.
>>: So I think the drivers will be quite -- it's easy to tell which way they prefer between a
paper map and GPS. Have they -- have you asked them the question, say, hey, do you
really prefer the like (inaudible) touchless speech instruction versus (inaudible)?
>> Andrew Kun: We are asking the question of how do they like it and are they happy
with it.
>> : (Inaudible?)
>> Andrew Kun: I don't have the numbers, unfortunately. We're still collecting the data,
so we're -- but I haven't looked at those numbers, to be honest with you.
>>: I think that would be very interesting because --
>> Andrew Kun: I agree. They may not like -- they may really like the visual feedback.
So that's -- but then again, that goes back to John's question, so there might be things that
you can do differently in the speech that will reassure them and --
>>: When I was driving -- because sometimes I get frustrated using the GPS because
once they say it, they won't let me say, what did you say? This kind of interaction may
actually (inaudible) users' confidence (inaudible).
>> Andrew Kun: Sure. Yeah.
>>: (Inaudible) listening to music (inaudible) I may even switch off the GPS and just use
the (inaudible) the music shut off everything (inaudible)?
>> Andrew Kun: Yeah. And that's a good question. On the other hand, I think one
argument about the visual display is that it's really not that safe. I mean, looking away
from the road is probably not advisable, especially -- I mean, when do you actually look
at navigation devices? When you're actually lost or when you're in a new place. So you
in fact need the information. It's unlikely -- at least I can't do a quick glance. I usually -- I like to travel with my wife, and she looks at it. That's the safest setup that we have.
>>: (Inaudible) confusion is the way you look at it.
(Laughter.)
>> Andrew Kun: Exactly. Well, I know, because there's a delay, right? The GPS is
slightly delayed and I --
>>: (Inaudible.)
>> Andrew Kun: I know. I've missed turns many times because of that, because is it
really now? Oh, no.
>>: The trend in navigation displays is for higher and higher resolution and more
and more detailed graphics and more and more information, kind of away from what we
used to have, the low-res displays with a big arrow turning right, which were much
easier to glance at and interpret.
>>: (Inaudible.)
>>: We have the 3D buildings and we have perspective rendering and we have shadows
and drop shadows here and you see the angle of the sun and whether it's raining over
there.
>> Andrew Kun: I don't know, have you guys sat in a Prius, the Toyota Prius? So what's
with that display? They have this display that shows you when you started braking and
then the energy goes from the wheels into the whatever, the battery, and then when you
step on the accelerator, the reverse -- how does this matter and how is this safe?
>>: (Inaudible) impressive.
>> Andrew Kun: It is impressive, but, boy, it just doesn't seem safe at all. Because
people look at it and then they tell you, look.
>>: It's fun to look at from the passenger's --
>> Andrew Kun: That's right.
>>: I owned one of those cars, and I got one of the first in this country. And you learn to
ignore that display because it's so -- I mean, what it's basically doing is teaching you how
to drive.
>> Andrew Kun: Sure.
>>: Whenever you are heavy on any change, then you lose gas mileage and it teaches
you (inaudible). And then once you learn how to drive that way, then --
>> Andrew Kun: Yeah.
>>: So now that we've seen these experiments in some detail, can you give a little bit
more information about the human-human experiments that you guys are talking about?
>> Andrew Kun: Sure. I can tell you about them. I don't have the slide. But in the
human-human interactions, basically we're trying to find out -- in the latest study we've
run, we looked at a task that's similar to the map task, which is there is a driver and
there is a dispatcher and the dispatcher is trying to get you from point A to point B, and
the problem is that the dispatcher's map and reality don't match. So the dispatcher is
telling you to take a right but there is no right turn. And the reason to do this is so that
there will be an ongoing conversation.
And our interest is in the question what happens if you have multiple overlapping
dialogues with a machine as well as a hands-busy eyes-busy task. So, for example, the
analogous situation would be that I'm driving and I'm discussing with you the study and
Tim is in the car and every now and then he interrupts with, oh, you need to take a right
here, or something along the lines. That, and then I go talk to him about the directions
and then I go back to you. And most likely I'm going to be able to do this without
crashing.
But if I do that with a computer, it's not clear that -- it's probably going to influence my
driving performance, but also the speech interaction performance, right? And so what do
people -- right now what we're interested in is how do people do this and what is it that we
can learn from human-human interaction.
So one study that we've done looked at adjacency pairs because that's a nice, easy
way -- that's something the electrical engineers that we are can understand and take a
look at if you have this ongoing task and it's made up of adjacency pairs, where do people
interrupt, within, without, and also depending on what kind of urgency that interrupting
task has.
And we're actually continuing to look at that, and now actually the next study -- I was
just talking to Peter Heeman. Some of you probably know him from OGI. He's my
collaborator on this. And we're designing the next experiment where we're thinking
about sort of a 20-question duel being the ongoing task, where you ask a question -- you
have a turn and the other person has a turn, and whoever gets to the answer first, and then
having another interrupting task. But probably have something along the lines of a driver
and then a person at another location connected with headsets, so that we -- we realize
that we didn't have enough basically in the original -- in the last experiment we had data, but
there probably wasn't enough data at the right places. So we'd like to actually force a lot
more question and answer pairs, and then more carefully figure out where we're going to
insert the interruptions.
>>: I believe there's a study that says that cell phones are dangerous, but having a
passenger in the front seat actually is safer than driving (inaudible) and so that it actually
matters if that person is in the car (inaudible).
>> Andrew Kun: Yeah.
>>: I wonder if your 20 questions could actually (inaudible)
>> Andrew Kun: Right. Interesting.
>>: You might want to talk about (inaudible).
>> Andrew Kun: Yes, I'm about to do that. Someone asked about measuring frustration.
So I wanted to talk about a few things of what's next. So on a smaller scale, perhaps,
we're looking at frustration and then specifically in the small scale referring to, well, how
do you measure frustration. So one way to do that is to measure skin conductance, which
could be the physiological effect of frustration. And skin conductance, we have a nice
device that you're supposed to strap on little electrodes onto your fingers, which of course
doesn't really work if you're driving because there is -- for one thing, motion artifact. If
it's right on your fingers and you squeeze, then that creates problems.
So Owen, one of the students, is designing this glove. We like gloves. And he's trying to
fix the electrodes in places where the motion artifact will not be so pronounced and you'll
still get a decent reading. So that hopefully will be operational soon and then we can run
some studies.
>>: (Inaudible) here or here and to measure the skin (inaudible) --
>> Andrew Kun: We could. But the signal is not as strong. So the best signal is on your
palm and on your hand. So you're right. We might end up having to do that. We wanted
to give this a try first because the signal is nicer. But you're right, we might -- we might
have to do that.
And then if I can remind you of the problems that we wanted to address throughout our
studies, the in-car devices versus driving performance and then driving performance
versus probability of accidents. And what Tim and I have been discussing now for a
while and we're hoping to reach is this UNH -- Tim graciously gave up the naming rights
here -- obstacle test. So the -- how about if you could design an obstacle test that, if you're
driving and not being distracted, you can get through. So things like you're
driving in the city and people are pulling out in front of you or a pedestrian is jumping
out or a car braking in front of you. And you can handle this fine because you're not
distracted and it happens such that it gives you enough time, if you drive in a reasonable
way. But then what happens if you put a device in there? Does that distract you enough
that basically you cannot now pass this UNH obstacle test?
And if that's the case, perhaps this is a good way to then measure the impact of these
devices and even to use as a quantitative test of whether you should put this thing in the car or
not. So this is certainly a large goal that we have set out for ourselves.
And then tied to this -- this is the simulator-based world. Now, to remind you, we have
this Project54 system, which is deployed in roughly a thousand police cruisers in
New Hampshire and maybe a couple hundred around the country. So what you could do
is actually tie this UNH obstacle test to some law enforcement vehicles as well and get the
test to tell you whether this thing will work well, and then tie that to some perhaps naturalistic
studies, right, that go on in a police cruiser.
And we've actually completed a pilot recently where we looked at how police officers
use the Project54 system, meaning do they use the speech interface, do they use the
graphical interface, or do they use the original hardware interface. And let me see if I
can -- so, yeah, and this slide, just to remind you, so we have the speech interface,
press-to-talk, and microphone. You have the graphical user interface and you also have
in the center console the original hardware interface. So if you don't want to use
whichever, or perhaps you're really used to flipping the light switch on, that's the fastest
way to do it, or as it turns out the radar actually has a remote, and if you -- that's the
fastest way to catch someone, because you really have to do it quickly: cars fly by at 80
miles an hour, you can't -- if you issue a speech command that says lock, meaning lock
the speed, by the time it gets recognized, the car is gone. So speech is just not a realistic
scenario here.
So in this slide or in this slide here you see this is what people would see on the graphical
user interface, and you have an overlay here, sort of a heat map of, well, how many times
did a particular speech command get issued. So dark blue would say yes, often, and
lighter blue, less often. So some of the -- basically we can collect this type of data, so
how do people use things in the car.
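A hypothetical sketch of how such a usage heat map could be built from the logs: count how often each command was issued and map the count to a color intensity for the overlay. The log format and command names here are assumptions, not the actual Project54 logging format:

from collections import Counter

def command_frequencies(log_lines):
    """log_lines: strings like '2009-06-01 14:03:22 CMD lights_on' (assumed format)."""
    return Counter(line.rsplit(' ', 1)[-1] for line in log_lines if ' CMD ' in line)

def shade(count, max_count):
    """0.0 = lightest blue (rare command), 1.0 = darkest blue (frequent command)."""
    return count / max_count if max_count else 0.0

freqs = command_frequencies([
    '2009-06-01 14:03:22 CMD lights_on',
    '2009-06-01 14:05:10 CMD run_plate',
    '2009-06-01 14:06:41 CMD lights_on',
])
top = max(freqs.values())
for cmd, n in freqs.items():
    print(cmd, n, round(shade(n, top), 2))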
So we think that between the UNH obstacle test giving us a nice way to predict what's going to
happen and then perhaps informing that -- the design of this UNH obstacle test from some of the
naturalistic data that we can collect in really a large deployed base that we have a good
relationship with, we think that we have something that could be interesting.
Now, I said law enforcement here, but I do want to point out that from the point of view
of things getting into cars, devices getting into cars, law enforcement is the vanguard
because they really use it on a daily basis and they really need it on a daily basis, so this
is a nice place to -- it's a nice place to study.
>>: Are the physical button presses and all that stuff also instrumented?
>> Andrew Kun: We can log them. They're not -- they're instrumented by software. So
we basically -- given that everything is synched up, we know that someone actually
pressed a button and we can tell, so -- which is really important, because that's a key. I
doubt that they -- they sometimes use the graphical user interface, but my guess is they'll
flip switches and then they'll talk.
>>: I just think that it would be really fun to sort of map out all the commands that are
used as well as the actual interactions and see where is the balance (inaudible) which is
the classic things that are always touched.
>> Andrew Kun: That's exactly what we're hoping to -- in fact, the pilot was run I think
last summer, and now we're gearing up to basically deploy this in probably 20-ish
cruisers. And we have a nice statewide setup where we can wirelessly get the data back,
so I think that the data should start flowing sometime soon.
So a quick set of acknowledgments: The NSF for funding us, as well as the USDOJ,
where the majority of funding comes from; Microsoft Research for multiple things;
certainly Tim's collaboration; also Jacob, one of my grad students, is a Microsoft intern,
so that's very much appreciated. And also the in-kind contribution of software which
we're receiving to compensate our subjects in the navigation study. And Tellme who
provided the voice talent recordings for the navigation study, the turn directions.
So, with that, I'm going to plug my blog. I run this EC blogger blog where Oskar is one
of the main contributors. And we have stories that are relevant to this particular type of
research, as well as other stories. So if you feel like checking it out, please do.
(Applause.)
>> Andrew Kun: Thank you.