Zhengyou Zhang: It's my great pleasure to introduce Oleg Komogortsev. It's hard for me to pronounce
a Russian name, yeah. I know Oleg from his work and from meeting him in person earlier this year at
[indiscernible] 2014 in Florida. Why has my voice suddenly dropped? It's okay. So Oleg is an associate
professor at Texas State University. He has done a lot of work in [indiscernible] team,
as well as a lot of applications in privacy, security, and human-computer interaction; and today
he will talk about some of the projects that he has been working on.
Dr. Oleg Komogortsev: Thank you so much for your introduction. So today primarily I'm going to talk
about eye movements in biometrics and human computer interaction.
Each scientist, each of us, has a motivation. So my technological motivation can be presented with this
short video. Getting the cursor is tricky.
"A road diverges in the desert. Lexus. The road you're on is the one [indiscernible]. [indiscernible].
Forget your troubles with [indiscernible]."
Well, if you looked at this video, you were able to see two things. First of all, the surrounding
environment is able to detect who the user is from a distance; whether it does it via eye movements or
iris tracking, that's another question. And the second is that he can interact with the environment by
moving his eyes, and basically whenever he looks, the environment detects where he's looking and
presents content that is personalized to that person. And in my lab we are tackling both problems.
So today my talk will include discussion of both.
Well, let's start with why biometrics exists. Every day we use quite a lot of passwords. As a human,
it's very hard to remember the number of passwords that we need to use. And though we have to use
tons of passwords, people usually use three: the most secure one, the medium one, and the one for
the junk stuff. Well, biometrics is a science that aims to identify a person based on who the person is
rather than what the person remembers, such as a password, or what the person has, such as some sort of
ID card. However, the biggest challenge in biometrics is spoofing. And I would like to present a short
video that talks about spoofing.
[Music] So you can see here that the intruder took a picture of the authentic user's eye and printed it on
a contact lens, and now they're trying to fool a very sophisticated biometric system to get access to
some goods. And you can see that they were successfully able to do that, right? And so in my work
we're trying to prevent this. And we are doing it by identifying a person based on eye movements.
And for you to understand this technology, it's very important to consciously experience those eye
movement types. And so today we're going to experience them, the majority of them.
So the first eye movement that I would like you to experience is called an eye fixation. Could you please
look at number one. And now, while you are looking at number one, your brain is getting high-quality
information from this dot, and this is how you basically perceive the world around you.
The second eye movement that I would like you to experience is called a saccade. So could you please
look at number 1, and now look at number 2. Saccades are extremely rapid rotations of your eye. The
velocity during a saccade reaches up to 700 degrees per second. And that's why we are not able to
see during saccades. So you'll see there was a hairy monkey running between those two dots. When you
were moving your eyes, you were not able to see the monkey. And you see the world around you by
having those fixation spots and moving between them with saccades, and actually we get the illusion of
a stable picture with all the information around us from our brain, but it's a false sort of image,
because the brain reconstructs all this information from those little fixation spots that we exhibit.
And actually the area of high visual acuity during a fixation is quite small. So if I look at my thumb,
just a bit of my thumb I can see with high quality, and everything else is blurred. And if it was the
other way around, so let's say we could see everything with uniform quality around us, then the cell
distribution in the eye would be different and our brain would basically burn out, because it would
have to analyze all this information.
And the final eye movement I would like you to experience is called smooth pursuit. This is basically
what your eyes do when you are driving a car. And depending on how well you can track your target, the
quality of your vision will change. This is an example of a signal that is recorded from an eye tracker.
An eye tracker is a web-camera-like device.
So this is an example of one of those devices. Right now it is very cheap; it costs just $100. And the
signal that you get is similar to what you see on the screen. Here on the bottom you see the time,
here the position, the horizontal position, and you see those fixations and saccades, and you can see the
fixations are quite jittery and the saccades are quite rapid. And this is the signal that you have to
work with to basically analyze who is who.
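As a rough illustration of how fixations and saccades can be separated in such a positional signal, here is a minimal velocity-threshold sketch. The talk does not say which classification algorithm the lab uses; the function name `classify_ivt` and the 30 degrees-per-second cutoff are assumptions made for this example only.

```python
def classify_ivt(positions_deg, sampling_hz, velocity_threshold=30.0):
    """Label each inter-sample interval as 'fixation' or 'saccade'.

    positions_deg: horizontal gaze positions in degrees, one per sample.
    sampling_hz: eye tracker sampling frequency in hertz.
    velocity_threshold: assumed cutoff in degrees per second (hypothetical value).
    """
    labels = []
    for prev, curr in zip(positions_deg, positions_deg[1:]):
        velocity = abs(curr - prev) * sampling_hz  # degrees per second
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


# Synthetic trace at 1,000 Hz: jittery fixation near 0 degrees,
# a rapid jump, then a jittery fixation near 10 degrees.
trace = [0.00, 0.01, -0.01, 0.00, 2.50, 5.00, 7.50, 10.00, 10.01, 9.99]
labels = classify_ivt(trace, sampling_hz=1000)
print(labels)
```

The small jitter inside a fixation stays below the threshold, while the jump between the two positions is far above it.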
So right now I will talk about how it is possible to identify a person based on those eye movements.
Well, in the approach that we decided to take, we're not just slapping some machine learning
algorithms on the positional data that we are recording. One of the aims is to really understand what
is happening with the human, and more specifically with the physiology of the brain and the physiology
of the eye, to identify a person based on those characteristics.
In the case of eye movements, there are two main components that make eye movements possible. There is
the brain, and the brain sends a neuronal control signal to what is called the oculomotor plant. The
oculomotor plant is basically the eye globe and the muscles that operate the eye globe. And the idea is
that this would allow us to identify a person based on the structure of the oculomotor
plant and based on the functionality of the brain.
Also, what is important to understand is that the brain encodes two important questions when working
with eye movements: where the eye is going to go next and, to a lesser extent, how the eye is going to go
between two points. And the oculomotor plant encodes the information on how the eye is going to go
from one point to the next. And by decoding this information we will be able to identify who is who.
Well, this is an example of two people reading the text. One person is identified by the red dot and
another person is identified by a blue dot. If you look at this video carefully -- and I hope you can
see it -- you really can see that spatially and temporally those two people move their eyes differently.
So this is sort of an intuitive way to explain that there is a difference between people in terms of how
they move the eye. And these are two overlaid scan paths; a scan path is basically a sequence of
fixations and saccades. And you can see here that indeed the movements are different.
A little bit more information. I will start by identifying a person based on the oculomotor plant. So
the oculomotor plant is the eye globe and the muscles that rotate the eye globe. If you look at the
anatomy, there are three pairs of muscles. Two of them are responsible for rotating the eye globe
horizontally, two of them are responsible for rotating the eye globe vertically, and two are responsible
for the torsional rotation. And again my goal here is to identify a person based on the properties of
those muscles and the eye globe.
And this is the moment in time where you look at me, the presenter, not the screen. In my hands I have
a human muscle. When the muscle is not innervated by the brain, it acts like a rubber band. Basically
you extend the muscle, it pulls back, so it has passive elasticity. The picture becomes interesting, for
example, when we decide to make a saccade and the brain sends the neuronal control signal to the
muscle. It sounds like this: bom, bom, bom, bom, bom, bom, bom, bom, bom, bom, bom; bom, bom,
bom, bom. So this bom, bom, bom is very rapid neural firing that makes your muscles contract.
And when the muscle is bombarded by neurons, it has very interesting characteristics. For example,
the amount of force that the muscle is able to generate is different depending on how far the muscle is
extended or contracted.
Why does it happen? Because in the muscle we have filaments. Those filaments can be presented
roughly like this. And so when the muscle is shortened, the signal cannot penetrate the middle fibers.
And that's why the amount of force is less than the possible maximum. And when the muscle is extended,
the overlap between fibers becomes less. And again that's why the amount of force is less
than the possible maximum.
Another interesting property is that the amount of force changes depending on how fast the muscle is
contracting or expanding, and this has to do with the chemical reactions that take place to basically
convert the neuronal control signal to actual contraction. Also the fibers of the muscle are connected
by tendons. They're connected by tendons to the eye wall and to the eye globe, and the fibers between
themselves are also connected by tendons. So tendons also expand and contract, and we can create some
elements that would model them. Now you will probably be happy to look back at the slides, not me.
In addition we have such things as eye globe inertia and some viscous properties such as fatty tissue
around the eye. Well, what I'm going to say next is that now I would like to model the eye globe
and the oculomotor plant mathematically. So I'm going to put in elements that have a one-to-one
correspondence between the actual anatomical elements in the eye and the components of the model. And
so the first component is that neuronal control signal that I tried to mimic by bom-bom-bomming, and the
second part is the active contractile force that is generated by this neuronal control signal. Here I have
a linear spring that basically is responsible for these filaments that are connected to each other, for
this length-tension part. And I have a component that is responsible for the tendons. And by the way,
this is a system in equilibrium. The eye is stable; the muscles are bombarded to keep the eye staring
straight ahead, but it's not moving.
So as I mentioned, things become more interesting when the muscle starts contracting. In this case we
have a damping component which is added to the system and which is responsible for the representation
of this force-velocity relationship. An additional interesting thing that I hope you can see is that
when, for example, the muscle is contracting, the length-tension part of the component shortens, or you
can imagine it like this. And the series elastic part, the tendons, becomes longer. And so this is all
incorporated in the model. And right now I'll just give you a few ideas. I'm not going to go into the
deep math. I understand this is a Friday afternoon and the weather looks pretty nice here.
So what I'm going to do is basically just build the model by creating mathematical equations for each
muscle that is involved in rotating the eye globe. Actually there are two types of muscles, the agonist
and the antagonist. The agonist is the one that contracts and pulls the eye globe, and the antagonist
is the one that is being pulled and expands.
So what I'm doing here is basically building differential equations that would represent the system.
And this is the one-dimensional representation that is responsible for describing how the horizontal
component of movement is formed. So basically for each muscle I would have a differential equation
that would describe how the muscle works. Then I generate a third differential equation by employing
Newton's second law, which says that all forces combined, acting on a rigid body, equal the mass
multiplied by acceleration. And in my case I have inertia. And what I'm doing is basically summing
up all of those forces.
An additional interesting idea is that when the neuronal control signal is created by the brain, it
looks like a pulse-step function. Remember that rapid increase in firing, bom bom bom bom, bom bom bom
bom bom, so this is what this graph represents, just a rapid increase in firing in those neurons that
send the control signal to the muscle.
Well, if the active force inside of the muscle changed in the same rapid way, the muscle would just
tear apart. So that's why, because of those chemical reactions, it's basically smoothed, and we have a
low-pass filtering process to represent what is happening. And again, for each of the muscles I have a
differential equation that describes that.
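To make the smoothing idea concrete, here is a minimal sketch of a pulse-step control signal passed through a first-order low-pass filter. The first-order form dF/dt = (N - F) / tau, the time constant, and all signal levels are assumptions chosen for illustration; the talk does not give the actual equations or parameter values of the model.

```python
def pulse_step(n_samples, pulse_start, pulse_len, pulse_level, step_level, rest_level=0.0):
    """Build a pulse-step control signal: rest, a brief high pulse, then a sustained step."""
    signal = []
    for i in range(n_samples):
        if i < pulse_start:
            signal.append(rest_level)
        elif i < pulse_start + pulse_len:
            signal.append(pulse_level)
        else:
            signal.append(step_level)
    return signal


def low_pass(signal, dt, tau):
    """First-order low-pass filter, dF/dt = (N - F) / tau, integrated with forward Euler."""
    force = [signal[0]]
    for n in signal[1:]:
        force.append(force[-1] + dt * (n - force[-1]) / tau)
    return force


# Hypothetical values: 1 ms time step, 5 ms time constant, 30 ms pulse.
control = pulse_step(n_samples=200, pulse_start=20, pulse_len=30,
                     pulse_level=1.0, step_level=0.3)
force = low_pass(control, dt=0.001, tau=0.005)
print(control[20], round(force[20], 3))  # the control signal jumps, the force lags behind
```

The control signal changes instantly at the start of the pulse, while the filtered force rises gradually and later settles at the step level.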
As a result, I have a two-dimensional model of the oculomotor plant that has 12 differential equations
and 36 different characteristics that have a correspondence between the anatomical properties and the
components of the model. And what the model is able to do is accurately simulate
saccades. And this is the result of one of the simulations. The gray area is the simulated signal and the
black line is the recorded saccade. And you can see that they differ, but you should believe me, in the
world of eye tracking, that's not bad.
So again, coming sort of back to biometrics. When I started working in this area, the idea that I had in
my head was: what if all those anatomical properties are really different
from person to person? Before, nobody had tested this hypothesis. And then the question that you should
ask me is, well, how did you get those initial parameters with which you simulated those saccades?
Before answering this question I would like to remind you again a little bit about spoofing; right. So
we are trying to combat the problem of spoofing. So here, an additional little video.
[Music] John Anderson approved.
Actually the system behaved quite well; right? These were John Anderson's eyes and it identified him as
him. But unfortunately, the eyes were removed; right? So again, in the design of a system that would
use whatever method, we really would like to make sure that nobody would remove any part of the
body to be admitted to the system. And actually removal of body parts is not a joke. I think a very rich
businessman and [indiscernible] was dragged out of a bar and, you know, the thieves basically put his
finger on his expensive Mercedes so they could drive. But unfortunately the system required
continuous authentication. And that thing happened to him. And of course he was alive but without a
finger. So that thing is really important to prevent in a biometric system. So remember those ideas.
So I have the oculomotor plant. And basically why I showed the video is that if the eyes are
removed, right, then the system is not live. So it's impossible to extract those parameters from the eye
movement signal and identify a person. So if this is the actual method of identification of an
individual, nobody would even think of removing your eye and trying to present it to the camera.
Anyway, so those parameters, the initial parameters for the model, were extracted in this way. So if you
are sensitive to blood, I would like to ask you to close your eyes for the next 15 seconds. So what is
happening here is during the surgery, the surgeon cuts the eye and uses a special instrument to take
the extraocular muscle out, which is connected to a force-measuring device, which basically is able to
tell us how much force is generated at different eccentricity levels of where the eye is looking. So those
initial parameters were collected as a result of surgery from three people and were averaged in the model.
>>: How do you find [indiscernible]?
Dr. Oleg Komogortsev: A video?
>>: A volunteer.
Dr. Oleg Komogortsev: Oh, no, no, they're getting their eyes corrected, this is strabismus surgery. So [laughter]
So it's actually doing good, though it looks very painful. I mean, at the end the person gets the
problem corrected.
Okay, so the project we have created doesn't harm anybody, and you have this little baby who is looking
at an eye tracker. Again, if you look at the eye tracker, this is nothing else than basically a web
camera, so it has two sensors and some infrared lights. And another interesting idea is that in terms
of hardware it's extremely similar to an iris recognition device. So basically they are identical. So
this is something that I would like you to have in mind until a certain point in the presentation.
>>: [Inaudible]
Dr. Oleg Komogortsev: Actually not very many. And this is something again that I will mention later
as a possible future [indiscernible].
So this is the algorithm that we employed to extract those characteristics and identify a person. So we
record the eye movements with an eye tracker, then from the positional signal we extract those
fixations and saccades that you saw previously. And then we have the mathematical model of
the oculomotor plant that I presented to you. And the idea of the model is to simulate exactly the same
saccades that were recorded. Well, we start with the average values, but they don't match well in terms
of what was recorded, so we have an optimization algorithm that basically allows us to select better
parameters that provide a much closer match. And we go through this multi-iterative process with the
goal of reducing the error between the classified and simulated signal. And when this error is the
smallest, then this vector for each saccade goes to the biometric template and then the person can be
identified based on this template. Can you guess who those two people are? One of them is important
in the image recognition world; now they're in the biometrics world.
>>: [Indiscernible].
Dr. Oleg Komogortsev: No, it's Kevin Boyer and [indiscernible].
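The fitting loop described above can be sketched in miniature. This is not the 12-equation model from the talk: the one-parameter `simulate_saccade` function is a toy stand-in, the grid search replaces whatever optimization algorithm the lab actually uses, and all names and values here are hypothetical.

```python
import math


def simulate_saccade(amplitude_deg, tau_s, n_samples, dt):
    """Toy stand-in for the plant model: first-order approach to the target amplitude."""
    return [amplitude_deg * (1.0 - math.exp(-i * dt / tau_s)) for i in range(n_samples)]


def rmse(a, b):
    """Root mean square error between two equally long position traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def fit_tau(recorded, amplitude_deg, dt, candidates):
    """Return the candidate parameter whose simulation is closest to the recording."""
    best_tau, best_err = None, float("inf")
    for tau in candidates:
        err = rmse(recorded, simulate_saccade(amplitude_deg, tau, len(recorded), dt))
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau, best_err


# Synthetic "recorded" 30-degree saccade, generated with tau = 0.012 s.
recorded = simulate_saccade(30.0, 0.012, n_samples=80, dt=0.001)

average_tau = 0.020  # starting point, playing the role of the averaged parameters
start_err = rmse(recorded, simulate_saccade(30.0, average_tau, 80, 0.001))

candidates = [0.005 + 0.001 * k for k in range(30)]
best_tau, best_err = fit_tau(recorded, 30.0, 0.001, candidates)
print(round(best_tau, 3), best_err < start_err)
```

In the approach described in the talk, the fitted parameter vector for each saccade is what goes into the biometric template; here that vector has a single element.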
Okay, so this is another look at how the template is generated. But remember those fixations and
saccades. So for each saccade we are getting this vector of parameters, those anatomical characteristics
of the oculomotor plant, and then this template can be matched to another template. Again, we
employ Hotelling's T-square test, which proved to be the simplest, the quickest, and the most accurate.
Specific -- so I'm going to present actually the result of this method. The system on which we
investigated the performance of this method had a very accurate eye tracker. So this is a
commercial-grade system, very expensive and very accurate. The data capturing conditions are ideal. So
for example we are sampling at the rate of 1,000 hertz, and our calibration accuracy is on par with the
physiological capabilities of the system in the eye, 0.5 degrees. Once again, this is a quarter of my
thumb. This is a tower-mounted system, which also adds accuracy. And this is the largest study so far
in terms of the number of subjects. So we have 335.
>>: How much time did you ask the user to --
Dr. Oleg Komogortsev: One hundred dots -- sorry, this is actually the next information that I was going
to present.
So here we have a dot of light that is jumping back and forth. So the idea is to execute large
saccades, so approximately 30 degrees. So I would say in front of me this would be this much. So the
person has to make 100 saccades. And so our biometric template consists of 100 vectors.
>>: To the same 2 points or --
Dr. Oleg Komogortsev: Same two points. Same two points.
>>: [indiscernible].
Dr. Oleg Komogortsev: I would like to come back to the question of unpredictability when I talk about
spoofing. But this is a great question. So for those of you who maybe are not familiar with the world
of biometrics, there are several metrics that basically identify how accurately the system works. So
the first of them is false acceptance rate, and basically this is the percentage of the attempts from
impostors that are accepted as authentic users. Then there is false rejection rate, and those are
attempts from authentic users that are erroneously rejected. And what I'm going to present to you,
because I have a lot of data, is just the equal error rate. So this is the point where false rejection
rate equals false acceptance rate. And this is a point that is achieved just by varying the threshold
that distinguishes between two templates.
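The three metrics can be computed from two lists of match scores. The sketch below assumes similarity scores where higher means a better match and a comparison is accepted when the score reaches the threshold; the score values are made up for illustration and are not data from the study.

```python
def far_frr(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) at one threshold.

    Scores are similarities: a comparison is accepted when score >= threshold.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep thresholds over the observed scores and return (EER, threshold)
    at the point where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical match scores.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]    # authentic user vs. own template
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]    # impostor vs. someone else's template
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)
```

With real data FAR and FRR rarely cross exactly at an observed score, which is why the sketch reports the average of the two at the closest point.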
And in identification scenarios, basically the metric is called IR, identification rate, and this is
how often you correctly identify the individual [indiscernible]. So this is the graph. And what
the graph represents is basically the number of subjects. So we started with 50 and then proceeded
increasing the number of subjects to 335. And basically what it tells you is how well the
system scales on the big scale. So EER is really a representation of the performance of the system in
an authentication scenario. Basically when you come to the computer and say I'm Oleg, and the
system says yes or no.
So with this, the EER was approximately 14 percent. And just for the sake of thought, the EER for such
a system as fingerprint is 2 percent. And for the iris, it's even better. Depending on how good your
data is, you can get it to 0.1, or even lower. And you can see here that in terms of authentication the
performance is flat. And possibly for authentication, we don't even have to increase the number of
subjects because we know so much about the distributions of impostors and authentic users. But you
can see that the identification rate really goes down.
And so what we can see right now is the system can be used, for example, in the authentication mode
but not in the identification mode. And by the way, just for reference, the FBI -- from what
I've heard, the FBI considers an EER of 10 percent acceptable performance for a face recognition system.
So very close to the 10 percent. But again it's also important to realize that this is an ideal
scenario of data capture.
So remember, we are talking about two different physiological traits. So one of them is the oculomotor
plant, and we call it OPC for short, the oculomotor plant characteristics that allow us to identify a
person, and the second one is the brain. So I just described this part and now I'm going to talk about
the brain.
Okay. There are multiple zones in the brain that are responsible for vision. Almost all of them have
very complex names, and I don't want you to get lost in them. Our goal is to identify the physiology of
the brain, how it works, based on eye movements. But we started with processing simple metrics. And
those simple metrics that represent brain behavior are, for example, the number of fixations and the
duration of those fixations. Also we can analyze saccade amplitudes, because they are created as a
result of the neuronal control signal. Also such a thing as the saccade waveform. For example, here
you can see the velocity of the saccade, and you can see there is an acceleration period and a
deceleration period. So how those characteristics are related to each other in the human also can tell
who is who.
There are also fundamental properties of the human visual system. So there is what is called the
amplitude-duration relationship. This is how far you go with your eye, and how long it takes you to go
there. And you can see that some fundamental works say this is a linear relationship, others say this
is exponential, but you can see there is a lot of variability in the data. And we have a certain fit.
So this fit also changes depending on who is who.
Another fundamental characteristic is what is called the main sequence relationship. This is a
dependency between how far you go with your saccade, the amplitude, and what your peak velocity is
during a saccade. And this is an exponential relationship, but again you can see how much variability
we have in our data. And for example, those crosses represent a certain eye pathology that you can
immediately see by looking at this relationship. And again, different people have slightly different
relationships like this.
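One common way to write an exponential main sequence is V_peak = V_max * (1 - exp(-A / C)), where A is the saccade amplitude. The talk does not state which functional form or fitting procedure was used, so the form, the function names, and the synthetic data below are assumptions for illustration. For a fixed C the best V_max has a closed form, so the sketch only grid-searches over C.

```python
import math


def main_sequence(amplitude_deg, v_max, c):
    """Assumed main sequence form: peak velocity saturates as amplitude grows."""
    return v_max * (1.0 - math.exp(-amplitude_deg / c))


def fit_main_sequence(amplitudes, peak_velocities, c_candidates):
    """Least-squares fit of (v_max, c): grid over c, closed-form v_max for each c."""
    best = None
    for c in c_candidates:
        basis = [1.0 - math.exp(-a / c) for a in amplitudes]
        v_max = sum(b * v for b, v in zip(basis, peak_velocities)) / sum(b * b for b in basis)
        sse = sum((v - v_max * b) ** 2 for b, v in zip(basis, peak_velocities))
        if best is None or sse < best[0]:
            best = (sse, v_max, c)
    return best[1], best[2]


# Synthetic subject with v_max = 500 deg/s and c = 12 deg (made-up values).
amplitudes = [2, 5, 10, 15, 20, 30]
peaks = [main_sequence(a, 500.0, 12.0) for a in amplitudes]
v_max, c = fit_main_sequence(amplitudes, peaks, [float(k) for k in range(5, 21)])
print(round(v_max, 1), c)
```

The fitted pair (V_max, C) is the kind of per-person quantity that could differ slightly between individuals, as the speaker describes.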
Also we look at the metrics that are related to the scan paths. For example here you have four fixation
dots, 1, 2, 3, 4. And then the person basically makes saccades between those points, and those are
fixations. So we looked at the length of the scan path and inflections. And inflections are how
frequently the person changes the gaze direction. Also we look at the area, how large the area is that
is formed by fixations. And also the distances between different fixation groups.
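A minimal sketch of three of these scan path metrics follows. The exact definitions used in the study are not given in the talk, so the choices here are assumptions: inflections are counted as reversals of horizontal direction only, and the area is taken as the convex hull of the fixation points.

```python
import math


def scanpath_length(fixations):
    """Sum of Euclidean distances between consecutive fixations."""
    return sum(math.dist(p, q) for p, q in zip(fixations, fixations[1:]))


def horizontal_inflections(fixations):
    """Count how many times the horizontal gaze direction reverses."""
    steps = [q[0] - p[0] for p, q in zip(fixations, fixations[1:]) if q[0] != p[0]]
    return sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)


def convex_hull_area(points):
    """Area of the convex hull (Andrew's monotone chain plus the shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        hull = []
        for p in seq:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]

    hull = half(pts) + half(reversed(pts))
    pairs = zip(hull, hull[1:] + hull[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2.0


# Five hypothetical fixation positions in degrees of visual angle.
fixations = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
print(scanpath_length(fixations), horizontal_inflections(fixations),
      convex_hull_area(fixations))
```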
There are quite a few statistical tests that allow us to compare biometric templates, because the
biometric templates that you get from the brain-related stuff are, first of all, multidimensional and
have different lengths. So there are different statistical tests that allow us to explore certain
properties of the data. For us, we determined that the Cramér-von Mises test worked the best to compare
individual distributions. And because you have multiple distributions on which you have to make
comparisons, multiple distributions of this eye movement data, you also can use information fusion
techniques to improve the performance. And the one that worked the best for us was random forest.
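For reference, the two-sample Cramér-von Mises statistic can be computed directly from the two empirical distribution functions, and it works for samples of different lengths. This sketch returns only the statistic, not a p-value, and the fixation duration values are invented for the example. How the study turned the statistic into a match score is not described in the talk.

```python
def cramer_von_mises_2samp(x, y):
    """Two-sample Cramér-von Mises statistic.

    T = n*m / (n+m)^2 * sum over all pooled observations z of (F_x(z) - F_y(z))^2,
    where F_x and F_y are the empirical CDFs of the two samples.
    """
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    for z in xs + ys:
        f = sum(1 for v in xs if v <= z) / n
        g = sum(1 for v in ys if v <= z) / m
        total += (f - g) ** 2
    return total * n * m / (n + m) ** 2


# Hypothetical fixation durations in milliseconds from three recordings.
person_a_session_1 = [180, 210, 250, 230, 200]
person_a_session_2 = [190, 220, 240, 205]
person_b_session_1 = [300, 340, 310, 360, 330, 350]

same = cramer_von_mises_2samp(person_a_session_1, person_a_session_2)
different = cramer_von_mises_2samp(person_a_session_1, person_b_session_1)
print(same < different)
```

A smaller statistic means the two distributions look more alike, so it can serve as a distance between two recordings of one metric.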
So this is the performance of the method. And again, this is the number of subjects that we are
employing, and this is [indiscernible] rate and identification rate. So identification rate is this
dotted line and [indiscernible] rate is this line. So these brain-related metrics, as I call them,
provided us with the best authentication performance, so this golden line of 10 percent. And what I was
told here by some people who are very good at biometrics is that probably here there was some
statistical noise. But again, you can see that for small groups of people, even identification is more
or less accurate, and then it goes down as the number of subjects increases. Again, an important idea
here is that this is a very separate trait. So this is brain related, and then we have the stuff that
is related to the oculomotor plant.
Another important idea here is what happens to the methods when we decrease the sampling
frequency of the device. So the device that we used in the experiments had a sampling
frequency of 1,000 hertz. So, for example, this one, the $100 one, has a sampling frequency of just 30
hertz. So how well would it perform? So here we have a single separating curve that basically tells us
how the performance degrades if we decrease the sampling rate. And what we can see here is that there
is no decrease in the performance until 250 hertz. But really after 250 hertz, this equal error rate
becomes larger and larger. The 30 hertz still allows us to distinguish the person better than random.
So there is information, but the performance is much worse.
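One simple way to study this effect offline is to take a 1,000 Hz recording and keep only the samples a slower device would have captured. The talk does not say how the lower sampling rates were produced, so this nearest-sample approach (with no anti-alias filtering) is an assumption, and `downsample` is a hypothetical helper.

```python
def downsample(samples, source_hz, target_hz):
    """Pick the nearest original sample for each tick of the slower clock.

    No anti-alias filtering is applied; this only mimics a device that
    simply samples less often.
    """
    step = source_hz / target_hz
    out, i = [], 0
    while round(i * step) < len(samples):
        out.append(samples[round(i * step)])
        i += 1
    return out


one_second = list(range(1000))             # stand-in for 1 s of gaze samples at 1,000 Hz
at_250 = downsample(one_second, 1000, 250)
at_30 = downsample(one_second, 1000, 30)
print(len(at_250), len(at_30))
```

At 30 Hz a typical saccade, which lasts only tens of milliseconds, is covered by one or two samples, which is consistent with the degradation the speaker describes for saccade-based metrics.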
Okay, so we wanted then to work on a method that would allow us to combat the degradation with the
sampling frequency. So this method represents the attentional strategy, basically how the brain
allocates attention when it looks at something. And what we are doing can be represented with
another video. So you can see here there are two people who look at the video.
"Where does your journey end? You seek that which would bestow upon you the right to rule. The
quest to reclaim a homeland and slay a dragon. [music] Do not think I won't kill you. When did we
allow evil to become stronger than us? It is not our fight. It is our fight. What [indiscernible]?
[indiscernible] [music]"
It's interesting because directors of movies work really hard so that people concentrate their
attention on certain points. But we saw that there is no [indiscernible] in terms of where people are
looking, and still there are differences.
>>: How can we make sure that this is not the error in adaptations?
Dr. Oleg Komogortsev: By looking at a lot of data.
So this is the idea of the method. So we are basically creating what can be called a signature of your
attention. So we have this temporal sliding window that goes through the recording and basically
creates what we call fixation density maps. And basically those are imprints of the fixations that we
make in a short period of time. And so then for each segment -- our segment is five seconds --
we have a map like that. And you can see the difference between two people.
Basically in the experiment that we had, a person looked at the same video twice at an interval of
approximately 20 minutes. So here you can see this is session 1, this is session 2, from one person.
And you can see that they are more similar than what we can see from another person, for example,
from session 2. So this is the representation of the similarity metrics that we have.
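A simplified version of a fixation density map can be built by binning fixations into a coarse grid, one grid per five-second segment. The grid size, the normalized screen coordinates, and the use of plain counts (rather than, say, smoothed blobs around each fixation) are assumptions for this sketch; the talk does not specify how the maps were rendered.

```python
def fixation_density_maps(fixations, segment_s=5.0, grid=(4, 4), screen=(1.0, 1.0)):
    """Build one normalized fixation-count grid per time segment.

    fixations: iterable of (time_s, x, y), with x and y in screen units.
    Returns {segment_index: rows x cols grid whose cells sum to 1}.
    """
    cols, rows = grid
    maps = {}
    for t, x, y in fixations:
        seg = int(t // segment_s)
        cell_map = maps.setdefault(seg, [[0.0] * cols for _ in range(rows)])
        col = min(int(x / screen[0] * cols), cols - 1)
        row = min(int(y / screen[1] * rows), rows - 1)
        cell_map[row][col] += 1.0
    for cell_map in maps.values():
        total = sum(sum(r) for r in cell_map)
        for r in cell_map:
            for c in range(cols):
                r[c] /= total
    return maps


# Hypothetical fixations: (time in seconds, x, y) in normalized screen coordinates.
fixations = [(0.5, 0.1, 0.1), (2.0, 0.1, 0.2), (4.9, 0.9, 0.9), (5.1, 0.5, 0.5)]
maps = fixation_density_maps(fixations)
print(sorted(maps), maps[0][0][0])
```

Because each map only needs fixation positions, not the fine detail of saccades, it is plausible that this kind of feature holds up at low sampling rates, as reported later in the talk.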
>>: The same individual over time?
Dr. Oleg Komogortsev: This is a totally separate line of research. This is what is called template
aging. Again, if I can defer this question to the end, I think that will be good.
>>: You are session 1, session 2, the same content?
Dr. Oleg Komogortsev: Exactly the same.
>>: But would it matter if you watch it first time or watch it second time?
Dr. Oleg Komogortsev: Yes, it matters a little bit. So it shows there is no similarity between the first
and second session, and I would also defer the question of what happens if you watch it a third,
fourth, and tenth time.
>>: What if you drink a glass of wine before watching the --
Dr. Oleg Komogortsev: Excellent question. Can I also defer it? I'll remember. So that way you will
have more information by the end. So hopefully it will allow me to be more effective in the way that I
present.
So again, we looked at different metrics that would allow us to compare those attentional templates. In
our case, earth mover's distance, which is relatively simple, allowed us to get quite -- well, the best
performance that we were able to get.
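To give a feel for earth mover's distance, here is the one-dimensional case, where it reduces to a running sum of the mass that still has to be carried to the next bin. The maps in the talk are two-dimensional, and comparing those generally requires solving a transport problem, so this is a deliberately simplified illustration and not the study's implementation.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms of equal total mass,
    with unit distance between adjacent bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    carried, work = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b       # surplus (or deficit) that must move to the next bin
        work += abs(carried)   # moving it one bin costs its mass
    return work


# Hypothetical horizontal marginals of two fixation density maps.
print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))   # all mass moves two bins
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))   # half the mass moves two bins
```

Unlike a bin-by-bin difference, this distance grows with how far attention was shifted, not just whether the bins differ.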
We also looked at different fusion mechanisms, and likelihood ratio was, again, the best.
So the system was identical to what I was presenting before, so this is exactly the same setup. Same
335 subjects and the stimulus was one-half of the official trailer for Hobbit II. So this is the
performance. Again, here we are increasing the number of subjects. And you can see again that for
authentication scenario it's good. Well, again, comparatively good. It's close to this 10 percent line.
And again, it's another trait that we are considering. [indiscernible] not so good, plus decreases with
the number of people.
>>: Is it the number of templates you're matching against?
Dr. Oleg Komogortsev: So, we have the database; right. So the database is basically collected, it is
divided into training and testing sets, and for the validation I think we repeat it 20 times, randomly
splitting into training and testing. So this is the subject pool that we are working with.
So this is how the degradation of performance occurs. So this is the performance of CM, the previous
metrics that are related to the brain. And the sampling frequency goes down from 1,000 to 30 hertz. So
here you can see how those methods performed for this specific content, and the performance of the
method changes depending on what you're working with, whether text or video, and the duration. So
here it was at 20 percent and increased almost to 30 percent. But the performance of this fixation
density map method, which only uses [indiscernible] fixations, was pretty stable even when it went down
to 30 hertz. And we even tested at 15 hertz; it's very comparable to what you see here with 30 hertz.
Okay, a few words about yet another trait. So could you please move your eyes between 1 and 2. So
back and forth again and again and again. So do you think you're very accurate in terms of how you
position your eyes on those points? How many of you think that you are very precise? Okay, very
good.
Actually there is quite a few corrective behaviors. So those are different cases in terms of how our
visual system performs. So here I have this simple lines, straight line, it's basically the position of the
stimulus. This is the jumping position of the dot. And this is what is called normal saccade where we
basically match. With the saccade we match the position of the stimulus quite well, within a certain
threshold. Then what we can see is undershoot, basically the system undershoots the target and
doesn't correct. We have simple overshoot, where the system overshoots the target and doesn't correct.
We have dynamic overshoot where basically there is a pullback from that anatomical -- from the muscles.
We have express saccades, which are saccades that are very closely spaced. Usually for our human visual
system to make a new saccade we need at least 200 milliseconds to program the signal for the extra
ocular muscle to execute the new saccade.
But there is a phenomenon that is called express saccades. Those are saccades with latencies of less
than 150 milliseconds. And some people have them more, some people have them less. There's also
corrective undershoot where we undershoot the target and then correct to the target. Corrected
overshoot: we overshoot the target and correct. Then we have multiple corrections of different kinds, and even
compound saccades that have a very weird signal that basically go back and forth, back and forth,
trying to position to the target.
Now come back on the question of alcohol with that. Okay, so exactly the same setup. And we are
looking only at the uniqueness of this corrective behavior. So here you can see that the authentication
performance is actually much worse than for the other method. It's not random, but EER is at 30. And
identification is quite bad, approximately 10 percent.
Okay, so here I would like to concentrate -- this basically concludes the talk about how I can identify a
person based on what characteristics, and some of the questions I would defer to the future work, which
I'm going to talk about later.
And right now I would like to concentrate on spoofing. First, I would like to start with a few ideas,
how you can basically spoof a system that uses either iris or eye movements or both. So if you can
believe me, the most successful attack that you can employ to fool an iris recognition system is to
capture a person's eyes with a camera, which you can do with today's technology at a distance of 12
meters, print this iris pattern and present it to the system, making a little hole for your pupil so you
would get a corneal reflection.
And a lot of real commercial iris recognition systems will be spoofed by this simple technique. So this
is one idea.
The second idea is pretty [indiscernible]. So let's say if you have an eye tracking-based system, what would
happen, let's say, if you have a camera and you would prerecord the eye movements and then just give
it back to the system?
And now I would like to go back on the question in terms of randomness. For oculomotor plant
characteristics, because the anatomy of the eye stays the same no matter what saccades you are
executing, it is still a pretty reliable method to authenticate the person, because physiology of the eye
doesn't change. Brain behavior is a more difficult thing. But with the oculomotor plant characteristics, you
can identify a person even if each time you present a new random stimulus. And that is why intuitively
a prerecorded attack would not be effective.
The third type of the attack is the case where the intruder imprints things on the contact lens, the video
that you saw at the very beginning. We have not published anything on this topic. Intuitively again,
the system would fall back on the -- not iris recognition, but on eye movement biometrics.
The fourth case is the most interesting theoretically. So the case of mechanical replicas. Who watched
the movie Terminator? Quite a lot; right? Remember how they tried to identify who is Terminator and
who is not? Actually eye movement biometrics is the most effective way of identifying who is the
Terminator. Well, you can tell me, of course, that I'm crazy but --
>>: [Inaudible]
Dr. Oleg Komogortsev: Yeah, probably the smell.
So here is an attempt by a scientific group from Germany to get the mechanical replication of the eyes.
As you can see here we have one Euro, so the system is relatively small. You can of course see that it
makes some non-human-like movements, and especially the speed at which it rotates is not exactly the
same as for the oculomotor system, but at least I mean we have a concept where certain things are possible.
Mathematically we analyze how difficult it would be to spoof the system that uses eye movement
biometrics with the replications by mechanical replicas. And basically here we mathematically
implemented the mathematical models of the eye that exist, and then based on the signal, try to spoof,
at least theoretically, on the system that we have. So for example, this is one of the first representations
of oculomotor plant, Westheimer's second-order model. And as for those orders, second order
means that the differential equation that describes the dynamics of the system, the characteristic
differential equation, is of order two. So it has different disadvantages in that it doesn't even
generate realistic velocity profiles. So it's very easy to detect even without looking at anything
sophisticated.
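As a rough illustration of what a second-order plant model implies, here is a minimal sketch that simulates the step response of a generic second-order linear system. The natural frequency `omega`, the damping ratio `zeta`, the step size, and the function names are illustrative assumptions, not values or code from the talk. Because the system is linear, its peak velocity grows in direct proportion to the saccade amplitude, whereas measured human peak velocities level off for large saccades, which is one way such a signal could be told apart from a live eye.

```python
def simulate_second_order(amplitude_deg, omega=120.0, zeta=0.7, dt=1e-4, duration=0.3):
    """Step response of theta'' + 2*zeta*omega*theta' + omega^2*theta = omega^2*target.

    omega (rad/s) and zeta are assumed, illustrative parameters.
    Returns the position (deg) and velocity (deg/s) traces.
    """
    theta, velocity = 0.0, 0.0
    positions, velocities = [], []
    for _ in range(int(duration / dt)):
        accel = omega ** 2 * (amplitude_deg - theta) - 2.0 * zeta * omega * velocity
        velocity += accel * dt   # semi-implicit Euler: update velocity first
        theta += velocity * dt   # then position, using the new velocity
        positions.append(theta)
        velocities.append(velocity)
    return positions, velocities


def peak_velocity(amplitude_deg):
    """Peak eye velocity (deg/s) for a simulated saccade of the given amplitude."""
    _, velocities = simulate_second_order(amplitude_deg)
    return max(velocities)


positions_10, _ = simulate_second_order(10.0)
ratio = peak_velocity(20.0) / peak_velocity(10.0)  # linear model: doubling amplitude doubles peak velocity
```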
So the second model that we looked at was Robinson. It was a more complex model that provided
more accurate velocity profiles. But still they are not realistic. And the third model was the one that -- the
model that was created by me when I was doing the dissertation and refined by my students. So
this is the model that I presented to you. And they actually looked at even one more model,
[indiscernible] model. And this is how we tried to generate the spoof signal. So remember that the live
system has a lot of corrective behavior. So we tried to do the same thing by injecting this instability
into the system so it would make those overshoots, undershoots, and also inject jitter during fixation.
So we tried to do whatever we could to make it as human as possible.
And so this is how the system was spoofed, basically to this mechanism of identifying a person based
on oculomotor characteristics. We either fed the live signal or the signal generated by those models.
We have extracted the biometric template, and we have analyzed the numbers in those templates.
And here's an important idea: people are very, very variable. So they have a lot of
variability embedded in them. That is why we are so complex, have problems and many other things.
But an important thing to mention here is that for a mechanical system, it's extremely difficult to replicate some
type of variability. And if you want to do it in a mechanical system, it greatly increases the
complexity of your system.
So by looking at actually quite simple statistical methods, such as PCA, and looking at the covariance
matrices that are generated by the data, we are able to find out whether the data came from the
mechanical system or from the live system quite easily. So we used basically the same setup. And the
person was making horizontal saccades or the mechanical replica was doing the horizontal saccades.
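The idea of separating live from mechanical data by its variability can be sketched as follows. This is only an illustration under stated assumptions, not the method from the talk: the feature vectors are synthetic, the threshold `min_total_variance` is hypothetical, and the check uses the trace of the covariance matrix (which equals the sum of the PCA eigenvalues) rather than a full PCA.

```python
import random


def covariance_matrix(samples):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(d)]
            for i in range(d)]


def total_variance(samples):
    """Trace of the covariance matrix, i.e. the sum of all PCA eigenvalues."""
    cov = covariance_matrix(samples)
    return sum(cov[i][i] for i in range(len(cov)))


def looks_mechanical(samples, min_total_variance=0.1):
    """Flag recordings whose templates vary too little (hypothetical threshold)."""
    return total_variance(samples) < min_total_variance


# Synthetic data: same nominal template, very different spread (assumed values).
rng = random.Random(0)
nominal = [1.0, 2.0, 3.0]
live = [[m + rng.gauss(0.0, 1.0) for m in nominal] for _ in range(200)]
replica = [[m + rng.gauss(0.0, 0.01) for m in nominal] for _ in range(200)]
```

With these synthetic inputs, `looks_mechanical(replica)` is true and `looks_mechanical(live)` is false; real data would need a threshold learned from recordings.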
There are two important scenarios. First scenario where the intruder generated the mechanical system
just based on the averages that are published in the literature. Those averages from those three people.
And the scenario B that is adhered by many corporations is that where your passwords or, in this case,
biometric data is fully compromised by the intruder and they have access, direct access to your
authentic biometric template. So they use this biometric template to create a perfect mechanical
replica.
So in these systems we found out -- the performance metric here is correct recognition rate, and 1-A
was the most simple system. So you can see most simple system that spoofed the biometric system.
So correct recognition rate was the highest. And for the B, where everything was compromised, still
the detection rate was high. And we also have unpublished results where by additional analysis we can
get those numbers close to 99. So it's very reliable in a way how you can detect this mechanical
replica.
Okay, so the conclusion here is that we have extremely high detection rates for the mechanical replicas
of the eye. And this is a very important property of the biometric system. We also have work that
allows us to defend against [indiscernible] attack, but again it's unpublished so I'm not going to talk
about it here.
So my overall vision in terms of how the system works, and I call it ocular biometrics approach, is on
the iris recognition device. And again, iris recognition device is basically image sensor and some
infrared lights. We are able to collect iris information, plus information about the oculomotor plant
and the brain. And that way the system becomes very, very accurate and very secure.
And so funding for this was provided by NSF and NIST. And if you Google, for example, for eye
movement biometrics, or passwords, you will see a lot of articles about the work that we have done.
Okay, so future thoughts. So probably all of you all know this, this is Google Glass. So the idea is how
we can employ or how we can authenticate effectively using Google Glass. Right now, basically, you
need to swipe, it's very cumbersome, so basically either protect your device or do something that is not
really natural. So the idea is would it be possible to identify a person based -- basically put the device
on and then, you know, read the line of text. And then the device automatically detects that this is you.
And then if somebody else puts the glasses on it would immediately detect that this is not you.
Well, the challenge of the Google Glass is that the amount of power that is available to the device is
quite limited. They actually have a camera that is facing the eye, but it's very small, just one pixel. So
the question is can you design a device that would allow you to do eye tracking, either with this 1 pixel
camera, or with a little bit bigger one. And this is where there is an advantage of eye movement
biometrics over iris. Iris, to work correctly, needs really high resolution camera to get the eye image
and do some processing on this image. For eye movement-based biometrics, it can work even on
images that are much smaller. But again, this is a future work and something that we are interested in
working on.
Other applications of the technology. And after this I will come back to your question about alcohol.
So right now there is a big problem, for example, in professional sports in terms of detecting
traumatic brain injuries. But we found out that because we are extracting so many metrics from eye
movements, more than 50 at the moment, we are actually able to identify quite reliably who
has mild traumatic brain injury and who does not. We have a very limited sample and we would like to
grow it to more people, but for the very limited database that we have, we identified people with 100
percent accuracy and we published this paper at [indiscernible] that was presented this year, in addition
with eye movements, because really you are getting information about the brain behavior and about
oculomotor plant. It can tell us quite a lot of things.
And yes, you can tell whether the subject is fatigued. For example, the corrective behavior increases
dramatically. You can tell if the person is drunk even if they cannot follow the single dot on the screen.
So if you have a wearable computing device that is able to do eye tracking, not only can you basically
interact with this device by using eye movements, but you also can be authenticated and your
psychological or physiological state can be detected. But in terms of what we can do, only
traumatic brain injuries is published work.
And then of course the questions from you can be, so is the performance affected by the alcohol? So
basically right now we don't know how the template changes. But we really can see that there is a
sense in doing this analysis in terms of if the person has traumatic brain injury in terms of personalized
analysis. So if you have a biometric template and the person is healthy, and when the person has a
certain pathology, for example, acquired as a result of a hit, we would be able to detect this
pathology much more accurately than just based on some averages that are recommended by the
research literature.
>>: Doesn't age matter?
Dr. Oleg Komogortsev: That's right. That's right. So there is research, so I can now come back to the
template aging question. So, yes, everything in us ages, and so the performance of the muscles
changes, of course. So the muscles become less elastic, so they change their properties.
Yes?
>>: Yeah, the reason I mention [indiscernible] as an example, but basically the motor skills are affected
by things like anger for example, a high level of [indiscernible] and that changes the way we act. So on
these applications, I mean the classical [indiscernible] if you try to reconnect to your voice and then
you call and it doesn't work any more. So do we have similar problems?
Dr. Oleg Komogortsev: Yes, I think we do have similar problems. So unfortunately the resources of my
lab are limited and so we are able to say what we are able to say based on our experiments. So we
have not conducted extensive studies, either when people are drunk or because we have a very limited
database. But we have acquired those people, so we are writing, you know, whether they drank within
a certain period of time, how many hours they slept, how much coffee they drank and so much
information like this. We just did not have time to analyze. Actually the lines that I presented, the
performance lines are not even published.
So Yes?
>>: I agree with Christian that there are lots of interesting individual differences that could be -- things
ranging from, you know, fatigue, and anxiety and whether you had a fine glass of wine, or
familiarity with the task. Do you have any way of characterizing what those are? I mean they're
probably not random differences, right; if so, it would be interesting -- that's why I asked about the
differences within an individual.
Dr. Oleg Komogortsev: Yes, I don't think they are random. See, this is sort of the research in terms of --
we have an application, biometrics, can we make eye movement work in biometrics. For example,
psychologists take a fundamental approach where they break the performance of people into very
controlled experiments and publish meticulous journal papers on each small change in the person's
state. I would love to do it in the same way, but I don't have the luxury. So I can only report numbers
from very large recording of people. And we also have [indiscernible] recordings, basically the
recordings of over eight months, but we just did not analyze all of that data. The only thing that I can
tell is that there is definitely template aging. So this EER would increase, let's say, if instead of the person being
recorded again within a period of twenty minutes, she or he came back after a period of two weeks.
Even at two weeks we see differences that affect EER. So definitely there is a notion. For example,
right now there is a fight in the biometric community over whether there is aging, for example, in iris. So there are
different arguments on that. So here I can definitely say there is no need for the argument, there is a
template aging problem, which we should research more.
Yes?
>>: Going back to the individual differences briefly. There is some work from a few years ago that
shows that from repeat scan paths, that people begin to become more efficient and they [indiscernible]
scan paths at some level. So I wonder, one thing is at the rate of learning, is there a [indiscernible], and
are there any challenges for you in terms of the kind of fixation points or the kind of targets that you
present to a person? So if you present random ones, would a different kind of efficiency emerge in how
you scan them?
Dr. Oleg Komogortsev: So there is literature that talks -- that specifically talks about scan paths. What
I can tell is that biometric performance would be different for different stimuli. There is a statistical
difference in terms of how much we can extract. For example, for the movie that you saw, the
amplitudes of the saccades and distributions of those saccades are different than, let's say, for completely
random stuff. So the amount of oculomotor behavior that we can extract would be less because we
execute fewer saccades. Because the subject gets tired and doesn't want to look anywhere that follows.
Or even with text, one of the stimuli that we presented was text. Some people meticulously read the
text, others just, "give me the $20."
So there is definitely a lot of interesting information that can be extracted. And unfortunately the only thing
that I can say is that we have not analyzed it.
So this is actually not the end of my talk. I think I have time until 1:00. Because I have -- human
computer interaction component. So eye movement interaction --
>>: Is there any kind of upper limit which you guys have analyzed in terms of failure rates? Like for
example, we know that for face recognition it's around one in ten thousand, fingerprints one in 200,000.
Dr. Oleg Komogortsev: Yeah, so those -- so remember the EER is where the false acceptance rate equals
false rejection rate. We can change that. So we can make it more toward false acceptance -- I mean
less false acceptance, but then false rejection would be much higher.
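The trade-off just described can be made concrete with a small sketch. The similarity scores below are made-up numbers and the function names are hypothetical; the point is only to show how the false acceptance rate (FAR) and false rejection rate (FRR) move in opposite directions as the decision threshold changes, and how the equal error rate is read off where the two meet.

```python
def far_frr(genuine, impostor, threshold):
    """Accept a comparison when its similarity score is >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the candidate threshold where FAR and FRR are closest."""
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    best = min(sorted(set(genuine) | set(impostor)), key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2.0, best


# Made-up similarity scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1, 0.65]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
strict_far, strict_frr = far_frr(genuine_scores, impostor_scores, 0.9)
```

For these scores the two rates meet at a threshold of 0.6 with an error of 20 percent; raising the threshold to 0.9 drives false acceptance to zero but rejects four of the five genuine attempts.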
>>: Yes, but what I meant was that even for the example of the EER, there is like, you can see kind of
what kind of freedom you have. That's exactly what you were talking about a few minutes back is
there is a limit to the variability in the saccades and position where a user will point at. So is there
basically a limit, like for example we know coming back to the example of faces around 20,
fingerprints around [indiscernible], where is this limit where [indiscernible]?
Dr. Oleg Komogortsev: So I wish that I could answer beautifully like John Daugman did for
the iris work. He basically mathematically said that our irises are so highly unique that those
distributions between the imposters and authentic users are really quite separable for quite a lot of
people. What I could have done theoretically, of course, is inject some different oculomotor plant
characteristics and try to analyze it theoretically. We have not done that work. But definitely there is
a lot of similarity. It would be actually quite surprising to me if it would be really fundamentally
different on the level of muscle design. Like for example, for iris, I mean the pattern is formed
when we are still in the womb of our mothers. And so it's basically completely random. Whereas here it
has a certain purpose of guiding the eye. So the mechanisms are actually quite similar for each person.
And if you look at the psychological work, most scientists study how people are the same in trying to
average the performance and see how similar we are in our behavior. And only biometrics studies how
different we are. But the majority of the biometrics branches study static traits such as face or iris or
fingerprint. We can only name a few. The one that is more or less established is gait, and there are, I
guess, exotic ones such as heart rate or EEG, and they also have advantages, disadvantages.
>>: When we analyze faces, what we see is that the difference between faces increase a lot with age.
Basically already 2 percent. And you have something similar?
Dr. Oleg Komogortsev: Yes, so there is published psychological research that basically states the
maximum velocity of a saccade changes with age. So it immediately -- so those few fundamental metrics
that we are employing would be affected by aging. But for example if you would ask how it would
affect the movie, looking at the movie, we don't know. Like for example, the distribution of
fixations. Because remember there are two questions, how we move between points and where. So
how age affects where our eyes are pointed, I don't know. And I don't know of any literature that
would investigate that point.
Okay, so a few things that might be also interesting to you about the design of eye gaze interfaces. So
this is an example of something that we have designed in the lab. And this is basically a photo
viewer with the idea that the person would be able to select, let's say, different images, by looking at
folders and then selecting a specific image, and then this image would be presented to the person.
And right now eye tracking in human computer interaction is popular mostly for the disabled users. So
you can interact much more fluidly, for example, than with EEG, things like this. I think actually
Windows 8 is, in my opinion, quite well designed to work with eye gaze, because there are tiles and you can
change the separation between tiles. And those are the numbers from my lab in terms of when we
experimented, we found sort of optimal numbers. So for example, for the layout, the layout for both
performing eye gaze devices, for example, is for something like this should be greater than 0.5 degrees
of the visual angle. Once again, it's a quarter of the thumb, much less than that, otherwise there would
be problems. An individual component size should be greater than 2 degrees of the visual angle. And
those are the representation that we have.
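To turn guidelines stated in degrees of visual angle into on-screen sizes, one can use the standard relation size = 2 · d · tan(θ/2), where d is the viewing distance. The sketch below assumes a 600 mm viewing distance and a 96 pixels-per-inch display; both are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
import math


def visual_angle_to_mm(angle_deg, viewing_distance_mm):
    """Physical size on the screen that subtends the given visual angle."""
    return 2.0 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_to_pixels(angle_deg, viewing_distance_mm, pixels_per_inch):
    """Same size expressed in pixels for a display of the given density."""
    return visual_angle_to_mm(angle_deg, viewing_distance_mm) * pixels_per_inch / 25.4


# Assumed setup: 600 mm viewing distance, 96 ppi display.
min_separation_px = visual_angle_to_pixels(0.5, 600.0, 96.0)  # 0.5 degree guideline
min_component_px = visual_angle_to_pixels(2.0, 600.0, 96.0)   # 2 degree guideline
```

Under these assumptions the 0.5 degree figure comes to roughly 20 pixels and the 2 degree figure to roughly 79 pixels; a closer viewing distance or denser display changes both.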
And so for the conclusion of the talk, I wanted to present something that I hope will be thought
provoking. So here is an image -- a video from a game. And the goal that I have for you guys, or the
task, try and understand what is the goal of the main character in this movie.
[Music]
So what do you think, guys, is the goal of that guy?
>>: He's a [indiscernible].
Dr. Oleg Komogortsev: No, actually he's doing damage. So his idea, his goal, is to do as much
damage as possible before he dies. So with that goal in mind, and actually this branch of research, I
was and still am a big fan of Warcraft, so it was inspired by this game. So the idea here is to select a
target as soon as possible, to create a target. If you are able to select a target as soon as possible, then
you would be able to do more to that target. In this case it's damage.
So let's look at how basic mouse-based selection works. So what happens is that the brain sees an
interesting target in the periphery. And then basically it looks at that target. And then the hand, with all
the muscles and bones, moves the cursor to the target and then there is a click.
Well, with the conventional eye-based selection, what can we do? Again, brain sees the target,
programs the eye to go to the target, and basically eye dwells on the target and then the target is
clicked. So then we basically reduce the amount of time by the time that we need to move the mouse.
And so the idea is can we actually do better than this? And sort of can we click instantaneously? So
just program the movement to the target and then click. So this is the idea. And first of all, so this is
how the conventional method works. So this is the time where the brain is programming the signal for
the eye to go to the target. We have a saccade here, and then with a dwell time we select the target.
And what I want to do is that immediately after the signal is computed, I would like to select my target
here. The eye even did not land on the target yet.
Well, it makes sense to find out, you know, how much time it would save if such method would be
implemented. And here is basically my simple theoretical analysis where I have target acquisition
time, then saccade duration, then dwell time. And in the method that relies on instantaneous saccade
selection, there would be target acquisition time. And I start moving, and then I select. And
[indiscernible] basically how many samples I need from the eye tracker to determine where the saccade
is going to land.
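The comparison in this analysis can be written out as simple arithmetic. In the sketch below every number is a hypothetical placeholder (acquisition time, saccade duration, dwell time, number of samples, tracker rate), chosen only to show the structure of the calculation, not to reproduce the figures from the talk.

```python
def conventional_time_ms(acquisition_ms, saccade_ms, dwell_ms):
    """Dwell-based selection: acquire the target, fly there, then dwell on it."""
    return acquisition_ms + saccade_ms + dwell_ms


def instantaneous_time_ms(acquisition_ms, samples_needed, sampling_hz):
    """Saccade-based selection: acquire the target, then wait only for the
    samples needed to predict where the saccade will land."""
    return acquisition_ms + samples_needed * 1000.0 / sampling_hz


# Hypothetical values for illustration.
conventional = conventional_time_ms(acquisition_ms=200.0, saccade_ms=40.0, dwell_ms=100.0)
instantaneous = instantaneous_time_ms(acquisition_ms=200.0, samples_needed=14, sampling_hz=1000.0)
saving = (conventional - instantaneous) / conventional  # fraction of time saved
```

The saving depends heavily on the dwell time and on how many samples the predictor needs, so different assumptions give noticeably different percentages.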
And then roughly speaking, considering the size of the regular computer screen, my speed up would be
35 to 45 percent. So I need two parts. The first part would detect the onset of a saccade, the start of the
saccade in real time, and the second would predict the amplitude. So the question is whether it's possible
theoretically and whether it's possible practically. So in the theoretical evaluation, because now I have this
model of the oculomotor plant, again, that I presented to you, I can analyze this signal. So what I do here,
I employ the Kalman filter to process the signal and use the Chi-square test to also analyze this signal.
And what I can see in the Chi-square test part of the signal, I see those two distinct peaks. And the time
of the occurrence of the first peak is not -- doesn't exceed 14 milliseconds. So if I can connect the
properties of the saccade to the properties of this peak, I would be able to predict where a saccade is
going to land and click the target in 14 milliseconds.
And so the first question, is the size of this peak correlated to how far I'm going to go with the eye with
a saccade.
>>: Again, what that peak is, is it the detection of the saccade? Or the location, the destination --
Dr. Oleg Komogortsev: So basically because I have a model, I can simulate as many saccades as I
want. So what I'm basically doing is simulating the trajectory of a saccade by a model, and on top I'm
putting a 2-state linear Kalman filter. From the Kalman filter I'm getting the prediction in terms of
what my velocity is, I'm analyzing this velocity by Chi-square test, and running this temporal window,
so which gives me this precise signature in terms of how the signal is going to look like. And then I'm
connecting the properties of the predictive saccade to the properties of the peak.
And what I can see here is that the saccade amplitude and the value of the peak are very, very well
correlated. So I can predict where it's going to go quite accurately, at least theoretically.
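A minimal sketch of the onset-detection half of this pipeline is given below, assuming a two-state (position, velocity) constant-velocity Kalman filter and a chi-square statistic computed over a sliding window of differences between observed and predicted velocity. The noise parameters `q` and `r`, the velocity scale `sigma`, the window length, and the threshold are all hypothetical, and the amplitude-prediction step from the talk is not reproduced.

```python
from collections import deque


def detect_saccade_onset(positions, dt, window=5, sigma=20.0, threshold=50.0,
                         q=1e4, r=0.01):
    """Return the index of the first sample whose windowed chi-square statistic
    exceeds `threshold`, or None if no onset is found.

    positions: gaze positions in degrees, sampled every `dt` seconds.
    """
    x = [positions[0], 0.0]            # state: [position (deg), velocity (deg/s)]
    P = [[1.0, 0.0], [0.0, 1.0]]       # state covariance
    recent = deque(maxlen=window)      # last `window` chi-square terms
    for k in range(1, len(positions)):
        # Predict with F = [[1, dt], [0, 1]]; process noise q acts on velocity.
        xp = [x[0] + dt * x[1], x[1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        # Chi-square term: observed velocity versus predicted velocity.
        v_obs = (positions[k] - positions[k - 1]) / dt
        recent.append(((v_obs - xp[1]) / sigma) ** 2)
        if len(recent) == window and sum(recent) > threshold:
            return k
        # Update with a position measurement, H = [1, 0].
        s = p00 + r
        k0, k1 = p00 / s, p10 / s
        innovation = positions[k] - xp[0]
        x = [xp[0] + k0 * innovation, xp[1] + k1 * innovation]
        P = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return None


# Synthetic 1000 Hz trace: 50 samples of fixation, a 30-sample ramp at about 300 deg/s, then fixation.
trace = [0.0] * 50 + [0.3 * (i + 1) for i in range(30)] + [9.0] * 50
onset_index = detect_saccade_onset(trace, dt=0.001)
```

On this noise-free trace the statistic stays at zero during fixation and jumps at the first moving sample; real tracker noise would require tuning `sigma` and `threshold` against recorded data.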
So then the practical evaluation. In the practical evaluation, I'm basically doing a simple task of
selecting the target. And the target is moving back and forth only horizontally and randomly, because
two-dimensional movements are actually much more complex than horizontal movements.
Okay, and so basically it's a [indiscernible] test, and the numbers that I'm presenting here is that it was
much faster to select a target using instantaneous saccade, so I have a 57 percent reduction in
completion time. And you saw in theoretical analysis, this number was actually much smaller. And I
think the difference, because I related the amount of latency, or target acquisition time that usually
occurs on the human visual system.
And in terms of the throughput calculated as a result of the [indiscernible], we see a 48 percent
increase in terms of throughput.
Well, what is the advantage of this method? Like any eye-gaze based method, it has inherent problem,
of course, the Midas Touch problem. So if you employ this, your application should not suffer from
Midas Touch problem. So the question is, is it possible to create an application like this? And again World
of Warcraft is very much an example that is geared toward this, where if you erroneously select a friendly
target, nothing bad happens. So basically you can only do damage to friendly -- to unfriendly targets.
So in this case it's a problem of the designer how I can create an interface where erroneous selection
doesn't cause any problems. But again it's an interesting problem because now I can do things much,
much faster than before.
So we have also created a simple computer game that simulated a World of Warcraft environment. And
what it is, basically we have balloons, we have red and blue balloons. So blue ones are friendly and red
ones you should pop with your eyes. And this is the method of doing it by dwell time. And you can
see that it's very hard for the person to dwell on the target and get it selected. And so tasks take quite
a substantial amount of time. And for instantaneous saccade selection, it's much, much quicker. So you
can see here that the person is able to pop those red balloons much faster than before. And actually
when we do the introduction of computer science to kids who are interested in [indiscernible], they
love this game, so they all want to be computer scientists and design games that are moved by the eyes.
Okay, so in terms of future work, it is my personal goal, sort of, to make eye tracking part of existing
iris recognition devices. And I believe that it can be done with a software upgrade to increase the
security of those systems. Also to create methods for user authentication for -- on the wearable devices
like Google Glass, and also employ this technology to test for pathologies, psychological or physical
state.
And I would like to acknowledge the funding that I have. Those are the sources.
Thank you so much, guys, for your attention.
Yes.
>>: I think going back to the previous question of spoofing. So there is this -- I mean probably an
analogy to [indiscernible] problem where we ask the user to speak for a certain time and have these
graphs where, say, the longer you ask, the more features you can select and the stronger --
Dr. Oleg Komogortsev: Uh-huh. Uh-huh.
>>: So in terms of any biometric, the bigger factor from a human user perspective is the sloppiness --
Dr. Oleg Komogortsev: How fast, yes. That's right.
>>: -- saccade is looking at something and is recognized when you start putting in features like eye
movements and so on, the question of course is how long would the user have to do all these
movements before you actually think he is not spoofing.
Dr. Oleg Komogortsev: So we start to be non-random sort of from 10 saccades. So 10 saccades. And
then for 40 saccades we have quite good biometric performance. After 40 saccades, let's say, after 40
saccades, sometimes we see issues such as equipment slippage. And this is, for example, for the
desk-mounted system, or the person gets fatigued. So the data becomes more noisy. However,
statistical methods can remove part of this noise, or [indiscernible] this noise. So the final research
conclusion was, again, the more data we have, even if it's noisy, the higher is the accuracy.
>>: But your golden number was around 40?
Dr. Oleg Komogortsev: 40, right. So where we have the highest quality of our signal. So I think it's a
good balance between accuracy and the amount of data that should be collected for the first decision to
say who's who.
>>: How much time was it approximately?
Dr. Oleg Komogortsev: So in terms of physiologic limitations, how quickly we can tamp out those
saccades. So at the maximum you can make approximately four or five saccades per second. So at least
-- yeah, so the more time, the more comfortable you are doing it, so four seconds -- yeah, so [laughter]
so yes, that's right. Eight seconds is the answer.
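The eight-second figure follows from the numbers given in this exchange: about 40 saccades at a maximum of four to five saccades per second. A short worked calculation (the function name is just for illustration):

```python
def recording_time_s(saccades_needed, saccades_per_second):
    """Minimum recording time to collect the requested number of saccades."""
    return saccades_needed / saccades_per_second


fastest = recording_time_s(40, 5)  # 8.0 seconds at five saccades per second
slower = recording_time_s(40, 4)   # 10.0 seconds at four saccades per second
```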
>>: [indiscernible].
Dr. Oleg Komogortsev: Okay, great. Thank you very much. [applause]