
>> Zhengyou Zhang: Okay. So let's get started. It's my great pleasure to introduce Dixon Cleveland. He's President and CTO of LC Technologies. I met him last year at a European conference on eye movement, and again this year, last month, at the [inaudible] Conference. I have tried his system. You will see the demo later on. I believe this is the best system available in the area of eye gaze. That's why I'm very excited to have him here to demo the system. And today the talk will be very special. There's no screen. This is the first time I host a talk without any computer or slides. Dixon, please. By the way, he has more than 25 years' experience in this area [inaudible]. It's a beautiful system. You will see.
>>: Why do you switch on the [inaudible] sunglasses?
Dixon Cleveland: You’re an [inaudible].
>>: I don't see where you're looking.
Dixon Cleveland: I'm your user. I need your help. What's wrong with this picture?
>>: I don't see where you're looking.
Dixon Cleveland: Can’t see where I'm looking. You need to know where I'm looking. Why do
you need to know where I'm looking? Why do you need to know where I'm looking?
>>: [inaudible].
Dixon Cleveland: You can see through the glasses.
>>: [inaudible].
>>: More than just eyes.
Dixon Cleveland: Oh, I see. You do it with points on my face.
>>: [inaudible] expression, the eye brows.
Dixon Cleveland: That's exactly right. So the deal is that eyes are really important. We as humans make eye contact. That's just one of the most important means of communication we have. I may be talking to you right now, but you're reading my face as much as you're listening to my words. Computers need to do the same thing. They put up these marvelous displays; you see all these beautiful graphics and these big screens with all kinds of pixels and color depth and all that sort of stuff. But does that computer have a clue what you're looking at when you look at that stuff? It doesn't have a clue. Just does not have a clue. Should it? Yes. It's dealing with a person. We do it. Our eyes are built for it. We've got whites in our eyes right next to these dark irises. So when I look one way or another you can tell immediately my eyes are moving around. It's a communication mechanism. It's the fundamental basis of all this.
So, can computers actually see our eyes? The answer is yes. Eye trackers are out there. They're around. LC Technologies happens to build a pretty good one. This can be done. This whole business of putting an eye tracker on a device: one day it will be a sugar cube sitting right at the very bottom of your device, whatever device it is. Great big screen, fine. Sugar cube sitting right there. All the way at the other end of the spectrum, you're going around with your handheld with a sugar cube right in there looking back at you, figuring out what you're doing with your eyes, communicating with you as if it were a human being that can see.
So I want to talk today about a lot of the physiological basis of what's happening inside our head, how our brains work such that we can make these decisions. What's going on behind our eyes? Let's un-cloud some of this issue of what's going on inside our head. So the whole story basically starts with light. The universe started with light. The universe is an internet of light, going all over. It's going between the planets, it's going between the galaxies, it's going between people; it's actually happening down at a microscopic level. Everything starts with light. So then human life comes along, and we humans, or any life form, have to figure out what's in our environment. Well, we pick up photons. So you have to have some mechanism to pick up the photons; that's the way you figure out what's going on in the world. And so we build these things, and in the human they became our eyes. And it's a pretty fantastic device. It's an amazingly fantastic device. The engineering that went into designing our eyes is just outstanding.
Well, why do we go to all this trouble? Well, we've got to survive, we've got to find food, we've got to do things, see what's going on out there. We've got two objectives in our vision. One is the big one: you've got to be able to see everything. If a bear is over there and that bear is going to come after you, you'd better see it in your peripheral vision. At the same time, if we're going to look at something in detail, so that we can use it in the way we really want to use it, we've got to be able to see with very, very high resolution. And so nature did a funny thing. It decided that instead of just building our camera with uniform pixel density every place, it would put this really, really high concentration of cones at the very central part of your vision; and your peripheral vision has 70 times less resolution than your central vision. This is a beautiful solution to the problem, because you get a very, very high resolution image when you look at something, and you still get to see the entire world. So peripheral vision versus central vision. But this creates all kinds of problems. And one of those problems is that if you want to go look at something you have to point your eyes. So if I want to look at you I've got to point my eyes over there. If all of a sudden I want to look at you instead, I've got to point my eyes over in this direction to look at you. But there is a wonderful piece of serendipity here, and that is that you now can tell what's interesting to me. I'm interested in you, so I look at you. I'm interested in you, so I look at you. That's cool. We are communicating now.
So the reciprocity of optics is a fairly important concept here, and that is: if I can see you, you can see me. Cool. Let's build on that. And that's why eye trackers can be built. We are the best eye trackers there are. You don't need to go buy an eye tracker. You've got two of them right there, and you've got this visual cortex in your brain that just does eye tracking like mad. It's perfect. I wish we could build a system that was just that good. But we're getting close to that. We can start to do that today. It's still pretty expensive, and it's still pretty clunky, but it's doable. The existence proof is out there. You can build those eye trackers.
So what else then happens in the brain? Let's talk a little bit more, staying with the eye for just
a second, let's talk about a couple other parameters. We said we’ve got 70 times the density of
cones in the central vision of the eye than we do out in our peripheral vision. That's true. Well,
what is the scope of that? Basically, if you take that very high central part, the foveola, the
central part of the macular region, and you hold your thumb out at roughly arm’s length, that
foveola covers about the size of your thumbnail. So when you point your eyes, I want to look at
your eyes, I stick it out there, that thumb covers about this much of your face. You're sitting,
what, 10 feet from me? So your eye has to point at least that accurately to get the information
that it wants. So the eyes are pretty damn accurate when they point.
Well, nature has a problem now. Let me back up a second. I forgot an important piece of this puzzle, and that is, if you were to have the same resolution of pixels all over your retina, it turns out that your optic nerve would be about this big around, the visual cortex to process all of that would be about half a cubic meter, and that's a little bit difficult to carry around. It just won't work. So to get high resolution in your central vision, nature really had to concentrate it down, and it went to all kinds of extremes to get high resolution in the center. Out in your peripheral vision there's a cell body for each rod and cone out there. When you start to get in towards the central part of the vision, it can't get the density that it needs by putting a cell body at every location. What it has to do at that point is put the cell bodies right around the outside of the macular region, and nothing but the wires and the receptors themselves are at that very central part, where there's 70 times the cone density that there is in the rest of your eye.
So nature went to all this trouble to do this. And in the process of doing that, it came up with the problem that, well, you've got to point the eyes. You have to move them. So then it came up with the ocular muscle systems. Those ocular muscles are the best muscles you've got in your body. You don't think about them; all of this stuff is kind of unconscious, and we'll get into the unconsciousness trip a little bit later, but what's happening is that those muscles have to be really, really precise and they have to be really, really fast. When I look and focus on you, back to the photon problem, I'm only getting a certain number of photons. So the photons come in, bounce off of you, and some of those photons happen to make it through my pupil back onto my retina so I can see your eyes, and when that happens there aren't many photons left. There are pretty few at that point. So even though we've got zillions of photons floating around in the universe, the number that get into my eyes is pretty small. So nature's got to make use of those photons the best way it can.
One of the ways that it makes use of those photons is this: when I take a picture I have to hold my eyes still for a certain period of time, and that period of time is about 250 milliseconds. So I go over and I fixate, hold my eyes still for about 250 milliseconds, wait for all those photons to come in, and develop enough of an image that can then go back into my occipital lobe and get that image processed and make sense out of it.
So nature's got a problem at this point, and that is it's got to hold that eye still within a couple of pixels for 250 milliseconds. That's a pretty astounding engineering problem. But it does it. Those muscles do it. They are like no other muscles in your body: they never get tired, they can hold your eye extremely stable, and then all of a sudden, bam, they can move your eyes at 300 to 600 degrees per second over a long distance, stop them on a dime and hold bloody still. That's a fantastic engineering problem, and nature solved that problem. It built the ocular muscle system to do that.
So obviously if nature went to all this trouble to design this complicated control system, this
complicated ocular muscle system, it’s important. It really is important to us. I don't mean to
harp back to this idea, but computers, they aren't paying attention to that process that's going
on. One of the reasons that we haven't thought to pay attention to that topic is that it's all
unconscious. By and large, everything we do with our eyes is unconscious.
>>: Quick question, how far back in evolution do you think this whole system developed? Like
how similar are we to like frogs?
Dixon Cleveland: Good question. That's a marvelous question. I don't know the answer to that question. I really don't know. I can speculate, but my speculation wouldn't be any better than yours, so I won't do that kind of speculation. But I don't know.
So you can see now, if this thumbnail is about 1.2 degrees across, that's the extent of the foveola in your eyeball itself, and across the foveola there are approximately 100 pixels, 100 cones. So that means that your eye has to be able to hold still to within about a 100th of a degree for a period of 250 milliseconds; otherwise, if the ocular muscle system weren't that good, why have all that density of cones? It just wouldn't be there. So the balance that nature finally chose is this one with high resolution, but the ocular muscle system has to hold the eye that still.
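A quick back-of-the-envelope check of those numbers (the values below are the ones quoted in the talk, rounded; the thumbnail geometry is an illustrative assumption):

```python
import math

# Figures quoted in the talk (approximate): the foveola spans ~1.2 degrees
# and contains on the order of 100 cones across its width.
foveola_deg = 1.2
cones_across = 100

# Angular size of one cone: roughly a hundredth of a degree.
deg_per_cone = foveola_deg / cones_across              # ~0.012 deg

# "Thumbnail at arm's length": assume a ~2 cm thumbnail at ~65 cm.
thumbnail_deg = math.degrees(math.atan2(0.02, 0.65))   # ~1.8 deg, same order as the foveola

fixation_s = 0.250   # the ~250 ms fixation time mentioned in the talk

print(f"one cone subtends ~{deg_per_cone:.3f} deg")
print(f"thumbnail at arm's length subtends ~{thumbnail_deg:.1f} deg")
print(f"so the eye must hold within ~{deg_per_cone:.3f} deg for {fixation_s*1000:.0f} ms")
```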
The reason I'm going into a lot of this detail about some of these numbers in the eye is that before we can actually design a good eye tracker we need to know what the eye is capable of. We need to build an instrument that's good enough to measure what the eye really does; but if all we are interested in is where people are looking, then we don't need to build it any better than that.
So there's this concept of a Heisenberg Uncertainty Principle of eye tracking. There's this Planck's constant thing in the real Uncertainty Principle which determines exactly how precisely you can measure something that's in motion. The analogous question here is: how precisely do our eyes really have to work? So that's kind of the concept that I'm getting at here. Just how precisely does the eye have to work? And one of the numbers that we just sort of derived here in this conversation was that the eye has to be held still to approximately within a 100th of a degree for 250 milliseconds. That's the engineering design requirement for the ocular muscle system. It's pretty fantastic, and the eye can do that. So if we are going to build an instrument we need to be able to measure that kind of stuff. Well, I shouldn't say we need to be able to, but that's the ultimate target. That's where eye tracking wants to go. Ultimately, that's the objective.
Well, we've been talking about this idea of the eye just holding still and taking a good picture, and that's true. And even if I look at your eyes and my head's doing this, I can still get a fairly good view, because my eyes are rotating during that 250 milliseconds in order to get that nice stable image where I can still see what's going on. So as we are driving down the road and we are bouncing around, we still see everything fine. By the way, you can tell when you're not seeing things fine: when things start to get too blurry, you get dizzy. There's some vestibular feedback that we get on our own that says our eyes aren't working very well. So if you're playing ball and running around like this and you're not dizzy, your eyes are basically getting the information that they need, your eyes are holding steady and your ocular muscles are holding your eyes still with respect to what it is that you're looking at, that moving ball as you go catch it.
>>: [inaudible], but you couldn't read that way.
Dixon Cleveland: You don't think so?
>>: I don't think so.
Dixon Cleveland: Good question. Somebody ought to run that experiment.
>>: I mean you're bobbing your head. Try reading a page of text at arm's length while your head is bobbing; we don't do that.
Dixon Cleveland: Indeed.
>>: You do in a sense. I mean, kids and adults are actually reading in a car. You’re bouncing
around, you're moving around. Some people get car sick.
>>: Walking with a cell phone.
Dixon Cleveland: Yeah.
>>: Walking on a treadmill.
Dixon Cleveland: So there is a degradation. But there's really not quite as much degradation as you might think. The ocular muscles are accommodating a lot of this stuff. They really do; they're marvelous at accommodating it. Well, once you've looked at something for 250 milliseconds and gotten a good, clear image, there's generally no reason, from a photographic standpoint, to continue looking at it. If that environment is constant, if you're looking at a word, once you've read that word and that information goes back into your occipital lobe, and your visual cortex processes it and says, oh, that's the word something, and that word goes up and your frontal lobe processes it, you don't need to look at that word anymore. You need to go look someplace else.
>>: Question. Does the eye evolve from the baby stage to adulthood? Because obviously when a kid is young, sometimes I look at the baby and I don't know what's in his mind or what he's really looking at. And you need to give them some stimulation for that.
Dixon Cleveland: You do. There's a very, very complicated thing, and when you're actually
born you have about 20 times this many connections between the rods and cones in your eyes
than your brain says you need. And there's a big problem of figuring out which cones and cells,
how they fit together geographically because they aren't laid down perfectly; so nature goes
through a process of apoptosis which is program cell death, which is sounds really awful and
weird and blah, but what's actually happening is it’s figuring out which of those connections are
the right ones to make the best use of the rods and cones that exist, and that is a process that
happens during the first several weeks of life. It begins during that period of time, and it really
actually happens up through about five years. And if people, if in the process people don't
untangle that and figure out exactly which connections are the right ones to make your eyes
work for you, you have serious reading problems.
>>: So we can design our eye tracker in that way. We can try various combinations.
Dixon Cleveland: Well, that's beyond the scope of eye tracking at this point. We're going to assume for the moment that people don't have amblyopia. Amblyopia is the disease you get when you've got one eye that your muscles can't control properly, for example, and they start doing funny things; and an eye can go completely blind because that process of apoptosis never does figure out which cells are right, and it just keeps wiping out cells and the connectivity until all the connectivity is gone. The rods and cones continue to work, but your occipital lobe sees nothing. It just doesn't get any data, in the worst case. Anyway, that's getting a little far afield. A wonderful question, an excellent topic, but a little far afield.
So the next thing these ocular muscles have to do is, once you've looked at one thing, you've taken the picture, you've gotten enough photons, and there's no more information to be had there, it's much more valuable now to start looking someplace else, so your eye then saccades to the next location. Bing. Instead of looking at you, I'm going to look at you, and I move my eyes all over the place. During saccades the eye has to move at really high speeds; and as it's moving, from looking at you over to looking at you, you don't perceive it, but the video signal, if you want to think of it in those terms, that goes from your eyeballs back to your occipital lobe, your visual cortex, stops. It's a phenomenon called saccadic suppression.
And saccadic suppression was actually discovered and kind of validated in a cool way, and that is, people actually flashed a light at you while your eye was saccading from one fixation to the next. People didn't notice the saccades or the flashes if they happened during that period of time. So that's where this concept of saccadic suppression came from. Do you actually notice it when your eye saccades from one fixation to the next? No. So that's happening way down at a much lower level, but it brings up this concept that your visual cortex is processing these images, while your perception of the environment happens in a completely different part of your brain. And that completely different part of the brain views the world not in an eye-centric frame of reference; it views it in a world-centric frame.
What's gravity? This room is out there, it's relatively stable, I walk around on a relatively, locally flat earth; so I can just set a frame of reference out there, some inertial coordinate frame. We work out of an inertial coordinate frame, and what you perceive as your environment is in that frame. But your eyes are going around collecting a little bit of data here, then they saccade over to that place with another fixation, they get a little bit of data there, and they're putting this image together. And remember, we're doing all this because somehow eye trackers need to be able to accommodate all this action that's going on.
So there's one other really important part of what's going on in your brain that we need to discuss, and that is all seated in a part of the brain called the superior colliculus. Fantastic chunk of brain. The question is, you can only look at one place at a time. You've got this thumbnail going around; you can put the thumbnail there, you can put the thumbnail there, so how do you choose where to put the thumbnail next? How do you choose where we are going to look next? That fundamental cognitive process of ours is essential to how we live. And you could think of it this way: when we need visual information, our brain somehow is optimizing the process of where to point the eyes. It's a winner-take-all decision: where's the one place I want to put my eyes next to get the most important information to me right now? How does the brain make that decision?
Well, some pretty interesting work was done by Doug Munoz up at Queen's University; and basically what he found was, and this work is now fairly old, it's rooted 10, 15 years ago with a lot of theory before that, but in the superior colliculus there is the equivalent of a map. If you were to take the folds of the brain in the SC and lay them out, you'd find a map. And at the center of the map is your foveola. So this is an eye-centric map. And that map starts off being blank. There's nothing in this map at all. It's empty. And some part of your brain comes up and says, a visual part of the brain if you're reading, it says, well, my fixation is right here right now, and I'm projecting that the next fixation for me to get the next most useful piece of information is over in that chunk of text over there, so I want that next fixation to go over there. So it sends a signal down, it goes into this map in the superior colliculus, and it starts to build a spike saying, I want information at coordinate X, Y with respect to where I'm looking right now.
And if there were no other inputs, eventually that spike would reach a level and hit a threshold and bam, that would trigger your next saccade, and so your saccade would move 13 degrees to the right, two degrees down, depending upon how you have the orientation of your book in this case, and bam, that's where your eye would go next. So if you get philosophical about this and think about what is going on, the superior colliculus is getting inputs from all over the brain. It gets inputs from your vestibular system. So you sit down and you feel something, and you think, well, maybe I need to look at that; it will send a signal into this map in the SC and start building up a spike at that location. If you hear a scream off in the distance and it's your little kid, you'll say, well, that needs some attention, so that part of your brain will send a signal down into your SC and it will start building a spike, and this is pretty important to you, so it will build that spike real fast with respect to some of the other ones you've got.
If you're walking along and you feel your balance is going a little crazy, you're going to trip over something, that part of your brain will say, I need visual information here. That's where I want to look next. It will send a signal to the superior colliculus. So the superior colliculus, this map, has got these spikes building up all over the place. One of those spikes eventually goes through the threshold that we were talking about. Bang. That's where your eye goes. Once it goes there, the map is cleared and starts again. All these pieces of your brain that would like visual attention will start putting their votes into the superior colliculus, and the superior colliculus, it doesn't, quote, make the decision, but it adjudicates that decision. That's where the adjudication of the decision about where your eye goes next is made. And that process is happening how often?
>>: Every 250 milliseconds.
Dixon Cleveland: Every 250 milliseconds. Exactly. And it goes on and on and on, 24 hours a day we are doing that; in REM sleep we are doing that. Do your ocular muscles ever get tired? Never. Do your eyes get tired? Sure. You perceive your eyes being tired, but what is it that actually perceives being tired? It's your eyelids. It's not your ocular muscles. Those ocular muscles say, I'm ready to go, man, I'm holding still, bam, saccade over to there; they're happy. They're just absolutely happy out there all day long. They don't need sleep. They're a lot like the muscles in birds. Once birds start to fly, those muscles, and physiologically there's a lot of similarity between those muscles, just keep going; the birds just fly and fly and fly.
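A toy sketch of the accumulate-to-threshold, winner-take-all process described above (generic accumulator code with made-up bids, rates, and coordinates, not a model of the superior colliculus from the talk):

```python
import random

# Hypothetical inputs bidding for the next fixation, each with a made-up
# accumulation rate (importance).  Coordinates are degrees relative to the
# current gaze point, mirroring the eye-centric SC map described in the talk.
bids = {
    "next word of text":   {"target": (13.0, -2.0), "rate": 4.0},
    "child screaming":     {"target": (-40.0, 5.0), "rate": 9.0},
    "something underfoot": {"target": (0.0, -30.0), "rate": 2.5},
}

THRESHOLD = 100.0
activity = {name: 0.0 for name in bids}

# Accumulate noisy activity until one spike crosses threshold: winner take all.
while True:
    for name, bid in bids.items():
        activity[name] += bid["rate"] * random.uniform(0.5, 1.5)
    winner = max(activity, key=activity.get)
    if activity[winner] >= THRESHOLD:
        break

dx, dy = bids[winner]["target"]
print(f"saccade triggered by '{winner}': move {dx:+.0f}, {dy:+.0f} degrees, then clear the map")
```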
One of the important things about all of this stuff, so you can start to see how this is going on cognitively: all parts of your brain want visual attention, we've got this mechanism for choosing where your eyes are going to go next, and as I look at you I can see where you choose to point your eyes, and that's very important information to me. And that then ties us back to this place where we want to have our computers do the same thing. It is a fundamentally, essentially human process, and we want to duplicate that process as best we can in computers to make them as interactive, as humanly interactive, as we can make them. So if anybody's got any questions at this point, I'm sort of finished with the idea of what's going on in your brain about all of this stuff. Yes?
>>: So how do you explain micro saccades?
Dixon Cleveland: I don't. There are a lot of theories about what micro saccades do. There is the theory of edge detection: you need to move your eye around a little bit, and the theory there is that you want to move it at least one cone, which is a 100th of a degree. But that theory kind of bothers me, because if you do it side to side, how do you get the vertical stuff at the same time? And micro saccades are known not to be cyclical [phonetic]; they go in different directions and they happen at random times. It's a marvelous phenomenon.
And there's another theory that says it's corrective. If you haven't got it centered up exactly where you want to look, then you should go make a correction and center whatever it is that you're really looking at in the central part of your vision. By the way, the central part of the vision actually has a tailed distribution. When I was talking about the foveola being 1.2 degrees across, that's only the very central part, and the distribution drops down, sort of almost like a bell curve, on either side. But first off, those motions are quite small. And unless you want to get into physiological studies, saying the eyeball is moving in a way you really wouldn't expect it to move, to try to figure out whether somebody's got some physiological issue with their eyes, in general that's a different realm of eye tracking. A very important one, and nice applications come out of that area, but that's not our case here. Yeah?
>>: It seems that our system, our vision system has very high reaction to movements, like our
eyes will not be able to avoid looking at moving objects.
Dixon Cleveland: Yes.
>>: Can you make a comment on it?
Dixon Cleveland: Well, that happens mostly with the rods rather than the cones. The rods are basically considered the black and white part of your vision. They can see very well in the dark and stuff like that. But one of the things they are is very sensitive to motion. Motion is important to survival. Generally, when things move in your environment, they are more important to how you interact with that environment than just static things. So that's one of their central roles, and that's kind of a different topic here. But you do get attracted by that, and a lot of the things that happen in the rod system, things you detect at the rod level, get fed back around to the superior colliculus; the superior colliculus builds a spike, and it then moves your eyes to look at that place where there is motion. That's a very well-known phenomenon.
>>: Question.
Dixon Cleveland: Just a second.
>>: Let's say you have more light; do you have shorter saccades?
Dixon Cleveland: Interestingly not. No. I don't know why that is. It's a marvelous idea, and
why nature didn't optimize it that way I'm not really sure. But basically what happens is that
your pupils stop down to make sure that you’ve got a relatively constant amount of light and
then the ocular system continues to operate at its own pace. Yes.
>>: What about the whites of your eyes? [inaudible] difficult to understand what you're looking
at. Is there a physical advantage [inaudible] evolution or is that a social advantage that selects
the whites of the eyes [inaudible]?
Dixon Cleveland: I don't really know. I've never really read a lot of good papers on that topic. So I could speculate one way or another. But as far as I'm concerned there are good concepts in both of those points of view, so I wouldn't say one or the other. The answer is probably yes, and yes. Yeah.
>>: Are there certain patterns of saccadic movement for novel or confusing inputs?
Dixon Cleveland: Yes. That was one of the early areas of research done by a lot of eye tracking researchers: trying to figure out the patterns for novel versus familiar inputs, and that has evolved recently into trying to differentiate whether somebody's a novice or an expert. And so obviously pilots' eye patterns are a perfect example of that. When somebody first learns to fly, they're looking all over, and who knows what they're looking at. And after a while, when they become an expert, they know what to look at. Their eye patterns do change considerably, and so one of the cool applications of eye tracking is actually being able to differentiate when somebody's learned something well enough to be moved over into the expert category.
Dixon Cleveland: There's another interesting phenomenon underlying that too, and that is that when you first learn something you learn it in your frontal lobe. You're conscious about it, you're aware of it. You can't be aware of everything. So as you learn, it transfers to different parts of the brain, and the cerebellum is one of the places that picks up a lot of that stuff. So when you walk, you just walk with your cerebellum; it's in control, and it's unconscious. So the learned knowledge actually transfers from one place to another.
And if you have ever taught somebody how to drive and you're teaching them about stop signs, for example, at first they won't even see the stop signs, then they'll start paying too much attention to the stop signs, and then after that there's actually a dip and they quit paying attention again. And you can actually tell, as the teacher watching this kid, that the frontal lobe is onto the next task; it hasn't quite made it back to the cerebellum part of the brain, that part hasn't learned yet, and they seem to have forgotten something. They're not forgetting; they're becoming an expert. So celebrate that change, don't penalize them and say, oh, you forgot to look at that. Yeah, they might've gotten in an accident as a consequence of it, but that wasn't necessarily slowing down their learning process. So there are those paradoxical behaviors that happen as one learns.
>>: So are there two levels of consciousness? When you're looking at something you may not
be paying attention to it and you might be thinking of something else, but your eyes are still
stuck to that particular spot. Does that make sense?
Dixon Cleveland: It does. That brings up the whole topic that, the way I've been talking so far, it actually sounds as though your eyes are just going all over the place sopping up information as fast as they can, and that is often the model. But there are often times when you don't want visual information. And in that case you'll actually see people get to the point where they'll close their eyes, they'll just go ugh; they're in their own head. Their cognitive process is elsewhere; they don't want to clog up the process with visual information. That's superfluous to them at the time. And that's one of the real problems of eye tracking. I can look at your eye, and I can tell when you look here or there that, yes, that's the more important thing for you to look at at the time, but I can't really tell whether visual information is what's really important in the central part of your brain at the time. So there's a lot of work to be done with cognitive psychology to be able to untangle those kinds of things and really fine-tune the use of eye tracking information. We as humans seem to be fairly tolerant of that. How we do that, the algorithms that we employ in our heads to do it, we need to go study that stuff. You had a question?
>>: Uh, yeah. So you mentioned the fatigue and it being caused mostly by eyelids. Is that
fatigue caused by also adjusting [inaudible] people up and down?
Dixon Cleveland: That's a good question. I don't know the answer to that question, but my
speculation is not very much.
>>: I think related to that I notice like most of these eye trackers use IR light.
Dixon Cleveland: Yeah.
>>: I guess with visible light shining in your face you don’t actually go and, basically your ocular
system knows how to go and stop [inaudible] so they reduce the light coming in, but for IR
[inaudible] apply?
Dixon Cleveland: No. We're not very [inaudible].
>>: And there's often fatigue [inaudible].
Dixon Cleveland: Right. And that's an important topic in eye tracking: we don't want to put too much IR light on the eyes. Two reasons. One is that it dries out the surface of your eye, and that in itself is not good for eye tracking, because the corneal reflection image is not as clear and can even go away, and you get congealed tears and stuff like that, so the ability of the eye tracker to track is not as good as it might be. And the second thing is that typically a point source of light ends up as a point source on the retina. So if there's a lot of light concentrated in one LED, it has the potential to warm up the rods and cones that it lands on, so we have to be a little careful about that. I actually sat on a committee in [inaudible] where we talked about that kind of thing and went back to David Sliney's work way back in the 1970s, when they were first beginning to figure out how much light damage was done, and eye trackers are fairly safe. There's really not a safety issue to speak of. We've paid attention to it at LC Technologies because we work a lot with people with disabilities, and the last thing you want to do is create even any kind of comfort problem with their eyes.
>>: How would you rank the [inaudible] power emitted by your systems compared to, say, the infrared you get walking around outside [inaudible]?
Dixon Cleveland: That's a wonderful way to frame that question. Typically, pretty small. I
forget exactly what the maximum permissible exposure number is, but when you're walking
around outside it's typically 5 to 10 times that in worst case. But also in that worst case when
you're outside you tend to squint fairly seriously.
>>: Worst case for you is brighter?
Dixon Cleveland: Yes.
>>: Okay.
Dixon Cleveland: Yeah. So your eye also has a natural protection method of squinting. But, as you might guess, eye trackers would love to see a naked eyeball out there in space. If there were just an eyeball out there floating, you could aim a camera at that thing and figure out where it was pointing fairly easily; but unfortunately we have these things called eyelids and eyebrows and glasses and stuff like that that kind of cloud the image of the eye. And squinting is a serious problem in eye tracking. It's just a plain serious problem. And it's one that we really do need to address more if eye trackers are going to become absolutely ubiquitous.
>>: On that point, I noticed that your materials online say that your systems work on 90 percent of the population. What's going on with the other 10 percent? [inaudible]?
Dixon Cleveland: We could talk for a whole hour on that topic. And I really dislike putting those numbers out, because there are so many ways that you have to slice that up. But in general, we use the bright pupil method; LC Technologies uses the bright pupil system, and that really provides a lot of advantage for a lot of people, because we get better contrast between the pupil and the surrounding iris and can calculate the pupil center significantly better using that approach, and that gives better accuracy, ultimately. But the bright pupil effect comes from light that goes in through the corneal surface, reflects off the retina, and reemerges from the eye; if somebody has low retinal reflectivity, and there are about one percent of people who have very, very low reflectivity, we have a hard time detecting a bright pupil in that case. In those cases using a dark pupil eye tracker actually is an advantage for those people.
Some people just have very droopy eyelids. As your eyelids come down you can still see just fine, but if the eye looks like this and the eyelid comes down and whacks off the top of the pupil, where's the pupil center? And if that lower eyelid comes up and the corneal reflection is down there... you have to be able to see both the pupil and the corneal reflection in order to predict a person's gaze accurately, and for some percentage of people the droopy eyelids just block too much of the pupil to be able to predict their gaze. Some people just plain have goopy eyeballs. They don't blink enough. And so there's this surface reflection that you get off the cornea that isn't as nice as it might be, and that can get in the way. So there are any number-
>>: [inaudible] dry in that case?
Dixon Cleveland: Yes. The lacrimal fluid in the eye is very complicated chemistry, and you have to blink a lot; and some people, in particular people who have motor-neuron type diseases, their ocular muscles continue to work but their blink muscles don't. So they can be difficult to track. So there are several different dimensions along which these cases can arise, and they're all fairly small, but they accumulate, and you have to be aware of them all if you're going to design for a general population. You have to accommodate as many of these cases as you can.
>>: Question.
Dixon Cleveland: Yes.
>>: How do you solve the glasses problem?
Dixon Cleveland: Well, regular glasses?
>>: Glasses. Yeah.
Dixon Cleveland: Generally eye trackers can work fairly well through glasses.
>>: But they have all these reflections on the same [inaudible]?
Dixon Cleveland: Yes, they do. If your glasses are tilted wrong, the camera is trying to see your eye through the glasses and superimposed on that image is a big reflection of the LED. Happily, most glasses are fit such that that reflection occurs outside of the image of the eye, but not always. A lot of people have glasses that, if you look at them from the side, are sort of tilted this funny way, and those people can have trouble. Another thing with regular glasses is people who have hard-line bifocals; there's a split. So when the camera is looking, it sees the top half of your eye through one lens and the bottom half through the other lens, and it throws up its hands and basically can't handle that. Graduated bifocals don't have that problem explicitly; the camera can still find a corneal reflection and find a pupil, but the calibration of the eye is different if you see it through one power versus another power. So a cool solution to that, eventually, is that you figure out which part of the lens you're looking at the person's eye through, you put some information into the computer about the glasses they're wearing, and you can figure all that out. We haven't gotten that far in the eye tracking industry; those problems are yet to be solved. So that is a good point.
>>: You mentioned the ocular muscles and the superior colliculus function 24 hours a day, even in sleep. Would you expand on that, the sleeping part?
Dixon Cleveland: No. My point was just that it keeps going. It doesn't have to sleep as much, and mostly where you see that activity is in REM sleep, when your eyes do go. So I was making the point that that's not a part of your system that gets tired or anything like that; it's happy to keep going. Yes.
>>: So when you calibrate one of these systems, one thing that I've noticed is that you can perform a calibration that works really great and then come back a day later and the calibration doesn't work on you anymore. I don't know about your system, maybe you've figured out the magic to make that calibration work over a longer time period, but what's going on there? What would cause that calibration to break down?
Dixon Cleveland: Again, there are any number of reasons. And the main one is: what is involved in the calibration procedure? There are two things that you're calibrating a lot of the time, and there are two general philosophical designs toward it. Do you mind if I put this topic off for just a bit? I'm going to get to that topic. I will answer that question. I don't mean to avoid it, but what I'd like to do at this point is talk a little bit about how eye trackers are designed and what some of the requirements of eye trackers are, and that will provide a good basis for answering your question.
So what are the general performance characteristics that you want out of an eye tracker? My personal opinion is that the most important thing, assuming that you've got the ability to track a large number of people and the system will actually find an eye in the first place, once you've solved that basic problem, is accuracy. There are some application cases where accuracy isn't particularly important. When somebody's driving, do they glance up outside the windshield and look down the road every once in a while? You don't have to know whether they're looking at this angle or that angle very precisely. But in a lot of applications with computers, and particularly with small handheld devices where you've got a lot of options, you want to know whether the user is looking at this coordinate or that coordinate. So accuracy ends up ultimately being pretty important if you want to talk about gaze-based interaction with a computer screen.
One of the things that I attempted to do in the earlier part of this discussion was to indicate that the eyes are capable of pointing very precisely. They can probably point, we don't know this for sure, but they can probably point with an accuracy, and a repeatability, of about one tenth of a degree. We know it's at least half a degree, because the foveola itself is about one degree across, and there would be no purpose in our eyes pointing with a half-degree error, because the image that we want to look at would then land at the outer edge of that foveola. It's just not the way we are designed. So we can point to at least half a degree, and a lot of people say, well, that's it, an eye tracker doesn't need to do any better.
Well, there are some nice old eye trackers, the Purkinje eye trackers that were developed by Cornsweet and Crane in the late 60s and early 70s, and those guys actually showed that the repeatability is probably closer to about a tenth of a degree. So, back to the Heisenberg Uncertainty Principle: that really ought to be the target of our eye tracking, to get those kinds of accuracies, because if the eyes can do that well, why not measure that well? And in fact, when you design your screens, the size of your icons is based upon how repeatably and how accurately your eyes can resolve those things. So we want to target for that. So accuracy is important.
The second thing that's really important is that we, as human beings, move around. I'm not standing here giving this lecture standing like this. I'm moving all over the place. And even if we don't want to move, all this research these days says you'd better stand up and not sit at your desk for too long, blah, blah, blah; we've got to keep moving. We're just designed that way. So we need to build eye trackers that can accommodate that.
So those two performance metrics are really central. We want to get accuracy, but we want to allow simultaneous freedom of head motion. So what you'll see on this demo system that we'll show you after the discussion here is how LC Technologies has demonstrated that that problem can be solved. What it actually does is take cameras and put them on a gimbal, so the cameras can move around. Was that our original idea? No. We borrowed it from our own eyes. Our own eyes are gimbaled. So if that's how nature chose to solve that problem, why don't we just solve it the same way? Just move them around.
And so we have the equivalent of peripheral vision and central vision. For the peripheral vision there's a couple of wide-field cameras. In this case we don't move them, we could, but they're actually different and they have a wide field of view, and when somebody comes in and sits down in front of the screen, it sees that there's a face over there and it swings those cameras around; and then these cameras are the central vision. They're telephoto lenses that can zero in on your eye, and they have a fairly small field of view. If you didn't have the gimbal, you'd have to hold your eye in this really tiny space. But what we've found is that if you can image the eye at roughly ten pixels per millimeter at the eye, you can get pretty good gaze tracking.
So that is one of the important things we want to do: get ten pixels per millimeter at the eye. So if you're a head-mounted system up real close, that's easy; a millimeter spans a lot of pixels. You don't have to get very much resolution in the camera. The further you get back, those pixels narrow down, and it's a tougher and tougher problem. And typically, if you want to sit 60 centimeters, or roughly two feet, away from the monitor, you need a fairly telephoto lens to get that sensitivity. You've also got to get enough photons into the camera to get that high resolution image at that point. And remember, the target, though there may be other problems with the noise, is that we are trying to get down to resolutions and accuracies of between one tenth and a half of a degree.
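For a rough feel of what ten pixels per millimeter at the eye implies for the camera (the pixel pitch and thin-lens scaling below are illustrative assumptions, not LC Technologies' actual optics):

```python
# Rough thin-lens scaling: an object at distance d images onto the sensor at a
# magnification of roughly f / d (for d much greater than f).
def required_focal_length_mm(px_per_mm_at_eye, distance_mm, pixel_pitch_mm):
    magnification = px_per_mm_at_eye * pixel_pitch_mm  # sensor mm per eye mm
    return magnification * distance_mm

# Illustrative numbers: 60 cm working distance, 4.8-micron pixels,
# and the ~10 px/mm figure mentioned in the talk.
distance_mm = 600.0
pixel_pitch_mm = 0.0048
px_per_mm = 10.0

f = required_focal_length_mm(px_per_mm, distance_mm, pixel_pitch_mm)
print(f"approx. focal length needed: {f:.0f} mm")   # ~29 mm with these assumptions

# At head-mounted range (say 5 cm) the same pixel pitch needs only a few mm of
# focal length, which is why the close-up case is the easy one.
print(f"at 50 mm range: {required_focal_length_mm(px_per_mm, 50.0, pixel_pitch_mm):.1f} mm")
```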
So then, as you move even further back, you need more and more telephoto lenses. Well, one way to do it is just to continually put more and more pixels into the camera sensor. But if you put too many pixels in the camera sensor, you're getting one photon per hour on one pixel. There's just not enough light out there. You would have to throw so much light on the subject to get the resolution out of the pixels that you want.
So then they say, well, why don't you just take all your LEDs, phase them like an array, point them at the eye, and not illuminate the entire field? Well, nature didn't design that approach into us, so there's not an existence proof that that's a really good way to go. So the approach that we use on our system is to just point those cameras, using the wide peripheral vision. There's a completely different vision system to find the face; you could use that also to do the reading of the facial expressions and the facial gestures, but the key thing that we, for our eye tracking, need out of those cameras is to find out where the eyes are so we can point our eye tracking cameras at them.
And as you'll see back there, this is a great big, clunky, ugly device; and it's expensive. It costs tens of thousands of dollars for that bloody thing. But it's a proof-of-principle model. It can be done. It's a commercially available piece of equipment that we have out there now; it's too expensive and too big, yes, but it demonstrates that the problem can be solved. And so it really comes down now to the place where we need to just put a bunch of good engineers on this job, solve the problems with the optics, use smaller motors, little MEMS devices; if we make a smaller camera we need smaller motors; and can we get this whole thing down into the sugar cube that I was talking about earlier on? Well, maybe not next year. But is that doable? I don't see why not. In fact, I'm confident it can be done. And that's exactly why I'm talking to you guys, because the environment at Microsoft is, let's build this thing. If we've got some reasonable confidence that you can build the thing and have it do what it's going to do and allow computers to communicate with us the way people do with their vision systems, let's do it.
So basically that's the chat, and we can go back and you guys can play with the demo at some
time. But I do want to get back to your question now. So, would you rephrase your question?
>>: Why do you calibrate the system? How long does the calibration last? Why does it change?
Dixon Cleveland: The eye is a tough little tennis ball. Its parameters don't change. So there is no good reason that a calibration you get today shouldn't work an hour from now, a day from now, a week from now, a month from now, even years from now. If your eyeball changed, if the radius of curvature of your cornea, if the flattening of the cornea towards the edges, if the location of your foveola within your retina, if any of those parameters changed, your occipital lobe would throw up its hands and say, what in the world? So that doesn't change. But what happens in most eye trackers is that they calibrate a combination of parameters. When projecting a gaze, remember the concept of gaze prediction. Let me set this up better. You've got a screen, you've got an eye tracker down here, you've got an eye up here, you're looking at a gaze point out here, and that's your direction of gaze. Behind this thing is this whole eyeball, and there is an LED in the center of the lens and it's throwing up light, excuse me, and there's a corneal reflection here someplace and a pupil center, so we have the pupil center and the corneal reflection. If your gaze angle at the eyeball is fixed, all this geometry ought to be fixed. But when you do a calibration we have to know, in theory, the general idea is: what's the X, Y, Z location of the eyeball in space? What's its orientation in space? And when you project that line out, where does it hit the object that you're looking at?
So it's a complicated, almost robotic problem of calculating the gaze point. Most eye trackers do a calibration where they throw all of this geometry, the optics geometry, the spatial geometry of the environment, all into one big model, and they say that the x-coordinate of the gaze is some constant plus some gain times the glint-pupil vector: in any one dimension, x_gaze = C + K * GPV, and then there are all kinds of other polynomial expansion terms on top of that. This is what's called a lumped parameter model, and you can kind of get the feeling that embedded in these coefficients C, K, and all the other higher-order terms in there is all the geometry of the eyeball and all of the geometry of this space, and if any of that changes then you've got to recalibrate.
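As an illustration of the kind of lumped-parameter polynomial calibration he is describing (a generic least-squares sketch, not the formula of any particular commercial tracker; the polynomial terms and all data values are assumptions):

```python
import numpy as np

def design_matrix(gpv):
    """Second-order polynomial terms of the glint-pupil vector (vx, vy)."""
    vx, vy = gpv[:, 0], gpv[:, 1]
    return np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])

def calibrate(gpv, screen_xy):
    """Fit lumped coefficients mapping glint-pupil vectors to screen points."""
    A = design_matrix(gpv)
    coeffs, *_ = np.linalg.lstsq(A, screen_xy, rcond=None)
    return coeffs            # eye geometry and room geometry are all "lumped" in here

def predict(coeffs, gpv):
    return design_matrix(gpv) @ coeffs

# Hypothetical calibration data: glint-pupil vectors (pixels) recorded while the
# user fixates known screen targets (millimeters).  Values are made up.
gpv_cal = np.array([[-12.0, -8.0], [0.0, -8.5], [12.0, -8.0],
                    [-12.0,  2.0], [0.0,  1.5], [12.0,  2.0]])
targets = np.array([[  50.0,  50.0], [200.0,  50.0], [350.0,  50.0],
                    [  50.0, 250.0], [200.0, 250.0], [350.0, 250.0]])

C = calibrate(gpv_cal, targets)
print(predict(C, np.array([[5.0, -3.0]])))   # estimated gaze point for a new sample
```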
So we've actually separated those. When we do a calibration on a human, we've already calibrated the geometry. And you'll see back on that thing that we've got a monitor that's actually fixed to the eye tracker, so the eye tracker down here knows its position relative to the environment. And all we do is calculate seven parameters of your eye, for each of your two eyes. So all we're doing is getting those parameters, and as I said before, because the eyeball is a tough, stable little device, a tough little tennis ball as I like to call it, you don't need to calibrate again.
>>: So those seven parameters are geometric parameters describing the geometry of the eyeball?
Dixon Cleveland: Yeah, that's right. Anatomical, geometric descriptions of your eyeball and your eyeball alone. I've got one other point here. And here's where it really gets down to one of the important problems of eye tracking. As good as your eyeball is regarding its physical and anatomical stability, there's one thing in the eyeball that is a potential problem for eye tracking, and that is the muscles that control your pupil diameter. You have two sets of muscles controlling your pupil. There's one set, the radial muscles, that run radially, and then there's the sphincter muscle. The sphincter muscle sits right around the outside perimeter of your pupil, and the radial muscles are attached in the other direction, so it's the counterbalance of these two sets of muscles that controls the pupil diameter. It turns out that the pupil center does not have to stay exactly on the optic axis in order to maintain a good image in your eye. If it were to drift off a little bit to the right or to the left, up or down, the photons that do get through the pupil would still converge at the same point on the retina. Get the idea there?
So it turns out that as the pupil opens and closes, it does not open and close about a precise, constant concentric point. As the radial muscles contract and your pupil dilates, it might dilate more to one side than the other. At that point the pupil center has actually drifted. Correspondingly, when your pupil closes back down, it might not go back to that same point; it could end up someplace else. So in the pupil-center-corneal-reflection method, the concept is that the center of the pupil represents a known and fixed location in the eyeball, and that's not quite true.
>>: And that also could happen right after the calibration.
Dixon Cleveland: Absolutely.
>>: But that does not fully explain why, the next time you come back, the calibration does not work. Is it because of the [inaudible] cones or the lighting?
Dixon Cleveland: There's another one. Since you asked the question in detail, I'll answer it in detail. An eyeball, the lens of the eye, the cornea sticking out, I'll exaggerate it, the optic axis of the eye, the first nodal point of the eye, and then, exaggerating again, here is the visual axis alongside the optic axis. At the back of the visual axis, right there, is where the foveola is. So when we point our eyes, we point such that the visual axis lands on the thing we want to look at and its image lands right in the middle of the foveola. So one of the issues with eye tracking is: what's this angle between the optic axis and the visual axis? In the optics field that angle is called kappa, and it has a vertical component and a horizontal component.
So if I calibrate and measure kappa, and then somehow I rotate my head 90 degrees, the eye tracker might make the assumption, which is a false assumption, that the orientation of the eye is just the same; it goes back to some equation like this and says, what's the gaze point? Well, the glint-pupil vector, with these two components, has actually shifted some, because the gaze vector, instead of projecting straight out of the eye, is off at an angle; and you may have corrected for that angle beautifully as long as you assume that there's no roll rotation, which is also called wheel rotation or torsion of the eye. Then you take the worst case, where the eye rolls 90 degrees. And now, instead of going out and saying, well, here's where the intercept of the optic axis is, so I'll translate over two centimeters to get to the actual gaze point, your head has rotated, and the visual axis now intercepts a different point on the screen. So if you don't measure the roll angle, you've lost that information.
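To make that concrete, here is a toy calculation (the kappa angles, distance, and geometry are made-up illustrative values, not measured ones) of how ignoring roll rotates the kappa correction and misplaces the gaze point:

```python
import numpy as np

def kappa_offset_on_screen(kappa_h_deg, kappa_v_deg, roll_deg, distance_mm):
    """Offset of the visual-axis intercept from the optic-axis intercept,
    after rotating the kappa correction by the eye's roll (torsion) angle."""
    offset = distance_mm * np.tan(np.radians([kappa_h_deg, kappa_v_deg]))
    r = np.radians(roll_deg)
    rot = np.array([[np.cos(r), -np.sin(r)],
                    [np.sin(r),  np.cos(r)]])
    return rot @ offset

d = 600.0                      # eye-to-screen distance, mm (illustrative)
no_roll   = kappa_offset_on_screen(2.0, 1.0, 0.0,  d)   # correction assumed at calibration
rolled_90 = kappa_offset_on_screen(2.0, 1.0, 90.0, d)   # where the correction really belongs

error_mm = np.linalg.norm(rolled_90 - no_roll)
print(f"assumed correction {no_roll.round(1)} mm, actual {rolled_90.round(1)} mm, "
      f"error ~{error_mm:.0f} mm on the screen")
```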
One of the biggest problems in eye tracking is range. If your eyeball is at this range when you calibrate and then you move your eyeball back, look at the new geometry. The camera now sees your eye from a different point of view, and the gaze angle is actually less than it was when the eye was closer. So that means the glint-pupil vector actually got smaller. You're still looking at exactly the same gaze point, but the projection, the measurement that the eye tracker is able to make of that glint-pupil vector, got smaller. It got smaller for two reasons. It's a double-barreled effect. One is that the angle actually got smaller; the included angle between the camera axis and your visual axis got smaller. But at the same time the eye is further away, and anything that's further away shrinks in the image. So the tracker then projects a gaze point that's too low. And conversely, if your eye were to move forward, it would project a gaze point that's too high. If you don't accommodate all that geometry correctly, and you come back a couple of days later and happen to be sitting further back or too close, you can get these kinds of errors.
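A toy calculation of that double-barreled range effect (the geometry is heavily simplified and every number is an illustrative assumption, not the actual LC Technologies model):

```python
import math

def glint_pupil_pixels(eye_to_camera_mm, gaze_target_mm_above_camera,
                       focal_mm=30.0, pixel_mm=0.0048, k_mm_per_rad=4.0):
    """Very simplified glint-pupil displacement seen by a camera below the screen.
    k_mm_per_rad approximates how far the pupil center shifts (mm on the eye)
    per radian between the camera axis and the gaze direction."""
    angle = math.atan2(gaze_target_mm_above_camera, eye_to_camera_mm)  # shrinks with range
    shift_on_eye_mm = k_mm_per_rad * angle
    magnification = focal_mm / eye_to_camera_mm                        # also shrinks with range
    return shift_on_eye_mm * magnification / pixel_mm

# Same gaze target (200 mm above the camera), two different eye ranges.
near = glint_pupil_pixels(600.0, 200.0)
far  = glint_pupil_pixels(700.0, 200.0)
print(f"near: {near:.1f} px, far: {far:.1f} px")  # smaller angle AND smaller image scale
```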
So the way LC Technologies has solved that problem is with a completely different kind of device for measuring range. Most of the eye trackers out there have a camera in the center and two LEDs, offset to the sides, that illuminate the eye. The camera sees the eye, and in the image of the eye there are two corneal reflections, and the distance between those two corneal reflections is related to range, so you can do a first-order correction of this geometry. But that's kind of fallacious, because it's assuming that the cornea is a sphere. It isn't. There's a lot of flattening out towards the edges, and that's just so the eye can focus better. So if you calibrate looking at some point in the center of the screen and then you look at a different place, even though your range didn't change at all, the distance between those two corneal reflections did change, because the surface is not a sphere. It varies.
So the approach that we've taken in our eye tracking is to solve all these problems explicitly. Rather than having a lumped parameter model that calculates the gaze point from the glint-pupil vector, we go through many steps. We do calculate the glint-pupil vector very precisely, but then we actually do some ray tracing. So as we find that the eye is oriented over in this direction, it accommodates the fact that there's a flattening of the cornea, and that geometry is explicitly calculated when we see a gaze off at an angle. We have a mechanism in here called the Asymmetric Aperture Method, which is able to measure the distance to the corneal reflection on the eye. That's kind of a whole different topic; we could go off on that one for half an hour too, but let's not for the moment. But we can measure the range to the eye without those two LEDs that are offset from each other. Our LEDs are at the center of the lens, and we actually look at the shape of the corneal reflection, and you will see that in there. But that allows us to measure the range.
So we minimize a lot of those effects by doing explicit modeling of all of the ray tracing, taking into account the geometry of the environment and the geometry of the eyeball; as we move back and forth we've got this Asymmetric Aperture for measuring the range to the eye more precisely, finding its X, Y, Z location in space, and therefore more accurately predicting its gaze point.
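As a rough sketch of what such an explicit geometric pipeline looks like, in contrast to the lumped model earlier (this is a generic, textbook-style simplification with assumed eye parameters and coordinates; it is not LC Technologies' actual algorithm, and it does not model their Asymmetric Aperture ranging or corneal flattening):

```python
import numpy as np

def gaze_point_on_screen(cornea_center, optic_dir, kappa_h_deg, kappa_v_deg,
                         screen_origin, screen_x, screen_y):
    """Apply the per-user kappa offset to the measured optic axis, then
    intersect the resulting visual axis with the screen plane."""
    def rot(axis, deg):
        # Rodrigues rotation matrix about a unit axis.
        a = np.radians(deg)
        k = axis / np.linalg.norm(axis)
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        return np.eye(3) + np.sin(a) * K + (1 - np.cos(a)) * (K @ K)

    # Rotate the optic axis by kappa (horizontal about screen-y, vertical about screen-x).
    visual_dir = rot(screen_y, kappa_h_deg) @ rot(screen_x, kappa_v_deg) @ optic_dir

    # Ray-plane intersection with the (pre-calibrated, fixed) screen geometry.
    normal = np.cross(screen_x, screen_y)
    t = np.dot(screen_origin - cornea_center, normal) / np.dot(visual_dir, normal)
    return cornea_center + t * visual_dir

# Hypothetical measured quantities (from image processing plus ranging), in mm.
cornea_center = np.array([0.0, 150.0, 600.0])     # eye position in screen coordinates
optic_dir     = np.array([0.05, -0.20, -1.0])     # measured optic-axis direction
screen_origin = np.array([-200.0, 0.0, 0.0])      # screen plane through z = 0
screen_x, screen_y = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])

print(gaze_point_on_screen(cornea_center, optic_dir, 2.0, 1.0,
                           screen_origin, screen_x, screen_y))
```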
>>: I’d like to leave some time for the people to see the fantastic system.
Dixon Cleveland: Sounds good.
>>: Otherwise we'll continue for another hour. I want to stop you here. Let's thank Dixon.