>> Mary Czerwinski: Hi all, thanks for coming. It’s my pleasure to
introduce John Canny today from Berkeley. He is a professor of
computer science and his research is near and dear to our heart because
he marries HCI, with machine learning, with signal processing and even
big data. He is focusing these days on health and education primarily
and he will be talking to us about those topics today. And some news
about Berkeley: John’s participating in sponsoring the creation of the
Jacobs Design Institute, which is supposed to open next year?
>> John Canny: 2014, I did say next year, 2015 I should have said.
>> Mary Czerwinski: 2015, that’s kind of cool that it’s happening soon,
so welcome Jon.
>> John Canny: Thanks Mary and thanks for the chance to talk to folks.
So today I was going to talk about some work that we have been doing
for a few years on the opportunities of sensing without sensors or at
least without things that we normally recognize as sensors. And most
of the work has been directed at sensing some sort of affect. Most of
our work has actually been on stress. We did a little bit of work on
pain and some work that had a focus on depression and mental illness.
The signals though are quite similar that we end up being able to
detect and we are currently starting to move from the sensing and
detection into interventions based on what we find on the sensor data.
So to quickly review stress, stress is a very important and very global
regulatory mechanism in human beings that helps us function at enhanced
performance when we need to. That’s how we pull all nighters and make
conference deadlines, but it has negative effects obviously as well if
we leave it on too long because basically our stress mechanism is a
tradeoff between fight and flight ability, our ability to respond
accurately and sort of exceed our average potential briefly, and all of
the important regenerative and repair functions that we need to do as
well.
So we have, it’s a complex system, but roughly there are fast and slow
components of the stress response. The fast one is the adrenaline or
epinephrine response which is when we get excited our pupils dilate, we
are ready to spring into action and it happens very fast for obvious
reasons. If you are going to fight or flight you probably should do it
soon. And then there is a slower response which has to do, well,
largely with shutting down or sort of toning down the rest and digest
functions. So this mechanism is perhaps the one we should be most
concerned about because when that kicks in over long periods of time
that’s when we start to have trouble.
Okay, so two different physiological responses and they tend to have
two different types of signals that you can observe in the body in
response to them. And I will just clarify for those of you wondering: in
both cases I am talking about parts of the Sympathetic Nervous System.
So both of these parts are enhancing fight and flight and reducing rest
and digest. There is a complementary system that helps with your
regenerative functions, but we are only talking about the negative side
or the stress side I guess on this slide.
All right. So we have these fast chemical mechanisms and then the
signs you see in the body are heart rate increasing with fast stress
response, breathing speeds up, pupils dilate, we improve memory
retention and some types of recognition and our emotions are typically
intensified, amplified. Then the slow mechanism interestingly the
heart rate variability, the sort of variance in heart rate, changes
with the slow stress response. We get galvanic skin response changes,
little pulses effectively in changes in skin resistance with
perspiration, breathing often becomes more erratic and we have muscle
tension. So when we say we feel tense because of stress that’s not
just a kind of illusion, we really are tense. Many muscles in the body
become tenser in response to stress.
So heart rate variability, although we are not primarily trying to
measure it, is the gold standard that we use in the [indiscernible]
work. So just to quickly review what it is: when we talk about heart
rate variability we are typically talking about different measures of
variation of inter-beat interval and this is a very complex topic.
There are lots of different measures. The simplest one to understand
is just the variance or standard deviation of the inter-beat interval,
measured from a suitably chosen peak in the heart waveform. The others
are quite a bit more difficult to explain, like the low frequency/high
frequency ratio, which requires us to Fourier transform the interval signal.
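To make those two measures concrete, here is a minimal sketch, assuming a list of inter-beat (RR) intervals in seconds; the function name, the 4 Hz resampling rate and the standard LF/HF frequency bands are our illustrative choices, not the pipeline used in this work:

```python
# Hedged sketch of two standard HRV measures, not the actual Tricorder code.
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def hrv_measures(rr):
    """rr: inter-beat (RR) intervals in seconds."""
    rr = np.asarray(rr, dtype=float)
    sdnn = rr.std()  # simplest measure: std of the inter-beat interval
    # The LF/HF ratio needs an evenly sampled interval signal, so resample
    # the RR series (here at an assumed 4 Hz) before the Fourier analysis.
    t = np.cumsum(rr)
    rr_even = interp1d(t, rr, kind='cubic')(np.arange(t[0], t[-1], 0.25))
    freqs, psd = welch(rr_even - rr_even.mean(), fs=4.0,
                       nperseg=min(256, len(rr_even)))
    lf_band = (freqs >= 0.04) & (freqs < 0.15)   # low frequency band
    hf_band = (freqs >= 0.15) & (freqs < 0.40)   # high frequency band
    lf = np.trapz(psd[lf_band], freqs[lf_band])
    hf = np.trapz(psd[hf_band], freqs[hf_band])
    return sdnn, lf / hf
```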
The first one is easier to measure, but it’s harder typically to
provide physiological justification for. Biological arguments have
been made that this more complicated measure directly measures the
ratio of the sympathetic or stress response to the parasympathetic or
rest digest response. So, I am bringing it up, we will say more about
it later, it is a measurement we made as part of our study of
[indiscernible] sensing. The hardware we used is from another project we
call the Tricorder, the Berkeley Tricorder. That was a custom sensor that
we started building about 8 years ago and you can probably see some
dates on here. This is a 2008 version; it recorded all of these vital
signs. So in particular we had a heart wave form signal and it would
both store the data on a flash card and also stream it out over Bluetooth,
and it recorded a lot of other measures that we didn’t use for this
study.
So having done the work on the Tricorder by the way we became quite
enamored of the idea of not using hardware sensing. Hardware sensing
is obviously hard to do, it’s challenging, and there are lots of
adoption challenges even with fairly simple technologies like wrist
watches and so on. So we really started to focus on measurements that
we could make implicitly from people using existing electronic devices
in the way that they would normally use them. So we did some work that
I will describe on monitoring voice on a cell phone, typically during
phone calls and we also have a sensor in our lab, which I won’t talk
about, but we have the ability to do location analysis and also
affect analysis on voice signals of people in the lab at Berkeley.
The most interesting stuff which is current work is looking at motion
sensing from the mouse, from people using a mouse on a desktop,
potentially using a trackpad on a laptop which we haven’t done yet.
Another similar kind of signal, which we do have measurements of and
which we are analyzing right now, is the accelerometer on the phone. That
signal is quite rich and it covers everyday use of the phone. We are
interested in both the passive signal of somebody just holding the
phone in front of them and looking at something, and the tapping signals
from interacting with the screen widgets, dragging and so on. All of
these produce a signal that can be analyzed in a similar way to the
mouse signal.
Finally there are a lot of opportunities, we are not pursuing this, but
a number of other people are. Actually one of my former students has
been recently looking at camera sensing on cell phones for pulse
measurements for stress also. There is a lot of opportunity there, but
we are not currently following that though.
Okay, so speech encodes stress in an interesting way. I am not
an expert, but as I understand it part of our response to stress is for
the vocal cords to pull away and basically open up the breathing
passages so we can suck down more air rapidly. So those stress related
muscles actually pull on the vocal cords and change the shape of the
glottal pulses when we speak. This particular signal happens to be
very accurate in predicting stress. So, we have got about 95 percent
accuracy, and what we were measuring was a controlled experiment where
we deliberately stressed the subjects and had them in an unstressed
state. We then looked at the accuracy of their signal in predicting
which state they were in.
So it would be nice, it’s desirable to do analysis of that signal on
the phone, but it turns out to be computationally challenging because
it has a fairly complex inverse problem to solve. We are
interested in the source signal, which is the vocal cords basically
flickering I suppose; that’s the signal that produces our speech, but
before we hear it, it goes through the vocal tract and basically a complex
filter is applied by the way we shape and make phonetic sounds. So
what comes out is a lot more complex than the little, more or less
isolated, pulses that are going in.
So the task to do this analysis on a cell phone is to model and invert
this transform. So notice that we don’t know what the filter function
is because it’s varying as the shape of the vocal tract varies. So we
have to both estimate it and try to undo it.
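As a rough illustration of that inverse problem, here is a minimal sketch of the standard source-filter inverse-filtering idea (all-pole LPC analysis of a speech frame); this is one common approach, not necessarily the exact method used in this work:

```python
# Sketch of source-filter inverse filtering: estimate the vocal tract as an
# all-pole (LPC) filter, then apply the inverse filter to approximate the
# glottal excitation. A standard approach, not this project's exact code.
import numpy as np
from scipy.signal import lfilter

def glottal_residual(frame, order=16):
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation (Yule-Walker) estimate of the LPC coefficients
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^-k undoes the vocal tract filter,
    # leaving (roughly) the train of glottal pulses
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)
```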
Yeah?
>>: So when you say 90 percent accuracy at predicting stress you mean
the short term stress right?
>> John Canny: Yeah, I mean stress that we were able to induce with
math problems, yeah, short-term stress. Well, okay, to be a little bit
clearer, there are both probable adrenaline and probable
cortisol responses in the stress response that we measure, so it’s not
purely short-term and in fact a lot of the experiments we have done
have had more than a 5 minute lag between the stressor and the
measurements. So they are more likely to be from the slower response.
Well, this is more detail than I need, but there are a lot of features
that contribute to stress analysis. The one that works really well
though is the one that’s not on this page which is the glottal
features; those are the ones that require the de-convolving. So to remind
you, those tend to be worth more than all of the other ones on the screen
combined. So we basically did some work on accelerating that. So we have
an architecture for doing general speech analysis with all of those
features, but the features on that list were fairly easy to implement
because we used an existing tool kit called OpenSmile.
Yeah?
>>: So how do the classification features that you are looking at
differ from [indiscernible] recognition?
>> John Canny: Well, so some of them are essentially, it’s a broader
set. I mean speech recognition tends to be built on HMM models on the
MFCCs; MFCCs are more or less the standard features at the front end of
a speech recognizer. The other ones tend to be, you know --. Pitch
per se is considered more of an affect signal than a speech
recognition signal. A lot of these things are the things that speech
recognition people try to avoid because they are highly variable from
male to female speakers, children and so on.
So speech people try to, in a sense speech recognition people try to
factor out those effects, but we are very interested in them from an
affective point of view. But for pure speech these are basically sort
of filter banks, nonlinear filter banks. And then they go through
HMMs to basically track the little trajectories of phoneticization of
speech. When we do recognition we are actually using these more as
like static features. So again getting more of this kind of general
average shape of the vocal tract which is probably a better cue to
affect than dynamics, but there are some dynamic features that we --.
So my student did a course project and one of the things that he was
trying to measure, using the speech features and a simpler HMM, was
basically speaking rate, the number of phones per minute,
and also the durations of pauses, because diminution of pauses is a
sign of stress as well. So yeah, there is a little bit of stuff that
you can get from high level phonetic analysis of speech, but we didn’t
do much of that and we didn’t do any analysis of the content of
recognized speech either, which could have also been a good signal.
Yeah?
>>: [inaudible].
>> John Canny: Yeah, so the 90 percent: I am saying the experiment was
counterbalanced, where people would either start off relaxed and then
be sort of pressured into a stress state, or they would go the other
way around. So we were comparing the two signals for each speak --.
No, I am sorry; actually the 90 percent is a speaker independent
measure. So it’s actually a very strong measure. I need to keep the
experiments straight.
So that was a remarkable thing about our work and the work before which
is the stress measurements are actually speaker independent. So they
are surprisingly strong. The work I am going to talk about later is
trained per subject. But the vocal speech features are robust and you
can get those accuracies in a speaker independent way, surprisingly.
>>: [inaudible].
>> John Canny: Yes, so the point is that almost all the recognition,
or most of the accuracy, comes from the glottal features which are much
more speaker independent. The changes in them are more speaker
independent. So those are good questions.
So here is the framework that we built onto a phone platform. Again,
most of the features came from the OpenSmile Toolkit which is an open
source speech processing toolkit and our classification was just
simple: linear SVMs built on those features. The toolkit, I
think we have the accuracy figures, with let’s see, where are we? So
these figures by the way are for recognizing a discrete set of emotions
rather than stress. The numbers are lower, but also they are multiway
classifications so you would expect them to be lower. The numbers
for the stress figures down here are with and without those glottal
features.
It’s a bit misleading because you see only a 1 point increase. The
point though is that this is a very large increase at that level of
accuracy. And also when you look at this recognizer it’s using mostly
those features. So they do contribute a lot to recognition. But it’s
still worth pointing out that the stress features are actually
remarkably strong compared to say recognition of emotion. Recognizing
stress is relatively easy, certainly for machines, probably for people
too I suspect if you really ask people. A lot of people I know claim
that they can tell, certainly relatives claim they can tell when I am
stressed.
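For concreteness, the classification stage mentioned above (OpenSmile features feeding a linear SVM) might look roughly like the sketch below; the file names are hypothetical placeholders, and speaker-grouped cross-validation is our addition to reflect the speaker independence claim:

```python
# Hedged sketch of the classification stage: a linear SVM over per-utterance
# OpenSmile feature vectors. File names here are hypothetical placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GroupKFold, cross_val_score

X = np.loadtxt("opensmile_features.csv", delimiter=",")  # one row per utterance
y = np.loadtxt("labels.csv")           # 1 = stressed, 0 = unstressed
speakers = np.loadtxt("speakers.csv")  # speaker id per utterance

# Grouped CV: test speakers never appear in training, which is what a
# speaker-independent accuracy figure requires
scores = cross_val_score(LinearSVC(C=1.0), X, y, groups=speakers,
                         cv=GroupKFold(n_splits=5))
print(scores.mean())
```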
Okay, so that’s what we did earlier on recognition of stress from
voice. I should point out that there was related work by others at
Georgia Tech on actually discriminating depressed and non-depressed
patients from very similar signals. And they also obtained accuracy in
the 90's for detecting depression. But they had a control group and a
clinically depressed group and the signals were very strong between
those two.
Yeah, question?
>>: Yeah, I certainly feel my stress level is kind of continuous from a
little stressed to a lot stressed. Do these signals operate in that
same way?
>> John Canny: Yeah, so all of the measurements are going to be
quantitative measures. So I am quoting accuracies in order to give a
--. Well, actually these accuracy measures here assume some sort of
threshold has been set. ROC area is a kind of measure that is
capturing the accuracy as you vary the threshold. So ROC area is
normally used for a quantitative measure of some kind of feature. It
tells you that this should work over a range of different values. So
it is a useful quantitative predictor.
Okay, so the most recent work we did was on motion sensing because the
voice work is interesting, but there are a number of issues with it.
It requires somebody to speak at regular intervals if you want to track
stress. It also suffers from the usual challenges of speech, which is
that people will often do it in a noisy environment. In
fact people will often deliberately avoid quiet environments like their
office so they don’t disturb people. On the other hand things like
computer keyboards and mice are tremendous potential tools for sensing
because people use them so much and they are using them also in a
context where the potential stressors are actually hitting them at
the same time.
So we started looking at phone sensing and, excuse me mouse sensing and
phone sensing. So we have a system called MouStress which looks at the
stress-induced increases in muscle tension. As I mentioned there has
been work documenting stress certainly in the neck and shoulders, but
also in the arms and people have specifically looked at stress during
computer work. So there is a very large literature on this. This is
just a couple of references that have kind of told us the obvious that
yes, people do have muscle tension when they are stressed and it’s
measurable. So we are on good ground for trying to do this.
So then to the study our goal was to see if we could measure stress by
looking at fairly typical mouse movements such as moving, clicking,
dragging and steering. Steering is maybe a less common one, but it does
show up, and kind of mechanically it’s one of the best ones for
making the measurement. It’s not as common, but it would include
things like tracing your way through kind of nested pop-up menus. So
we have not yet done a naturalistic study on real interfaces. We did a
more, I suppose theoretical study where we had controlled tasks with
varying distances for people to move and varying size targets.
So those had similar dimensions, similar geometries for their clicking
task and for a dragging task where the idea was to move that target
onto that one. And the steering tasks look like this. These are also
tasks people have used often in other studies of mouse performance. So
our target sizes were in factors of two across a fairly large range;
this is in pixels, a big fraction of this screen, perhaps a fraction of an
inch to several inches for the movements, and there were 5 different
values for distance and 4 different values for width. Again, more or
less very similar to what people do in other types of mouse study.
So when we want to analyze the mechanics of mouse movement we adopt a
very simple, but nevertheless fairly widely used mechanical model which
is a Mass-Spring-Damper.
So in biomechanics it’s fairly common to consider muscles more as
adjustable springs rather than some kind of motor that’s supplying a
controlled force. That does seem to be how muscles work: when we
activate them, we typically activate them in pairs with a set point,
and there is some biological damping that causes the muscles not to
oscillate when we do that. So it’s a simple system. It’s characterized
by a second order differential equation, which means that basically it
rings with a diminishing sine wave.
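In standard notation (our rendering; the symbols are ours, not from the slides, with m the effective mass, c the damping and k the stiffness), that second order equation and its ringing solution are

```latex
m\ddot{x} + c\dot{x} + kx = k\,x_{\mathrm{set}}, \qquad
x(t) = x_{\mathrm{set}} + A\,e^{-\zeta\omega_n t}\cos(\omega_d t + \phi)
```

where \omega_n = \sqrt{k/m} is the natural frequency, \zeta = c/(2\sqrt{mk}) the damping ratio, and \omega_d = \omega_n\sqrt{1-\zeta^2} the damped (ringing) frequency when \zeta < 1.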
So that’s all we need to do. We want, though, to infer those models
somehow from the data. And the data is going to look like this. These
are some recordings in one dimension from those mouse movements. It’s
interesting you can see people very visibly hunting and making several
adjustments most of the time, but the dynamics of the models are
encoded in the curvatures of these little segments here and we use LPC
which is a standard signal processing technique in order to infer
second order models that match those trajectories.
Yeah?
>>: Is the model [inaudible] for different types of [inaudible]?
>> John Canny: Well, because we are trying to model the arm itself
intuitively it should work with any sort of position sensor, velocity
sensor or acceleration sensor because it’s really the mechanics of the
arm rather than the --. But yeah, if somehow, I see what you are
saying, yeah if it’s a tracking device that doesn’t involve perhaps
movement of the arm then yes, we have to look at that. The hope would
be that because the muscle tension seems to be a fairly global kind of
effect it might also apply to fingers, but yeah that would be worth
checking because it’s not obvious that would work.
So from this we fit the second order model. So LPC is a model that’s
commonly used to fit second order differential models to signals and
it’s very efficient. You basically just need to extract a few local
correlation coefficients from these signals and that gives you a second
order LPC model which has a very simple relationship to parameters of
the spring mass model. So these are parameters that you can get from
the LPC model and then those translate directly into stiffness and
effective mass of the mass model. In other words, we can observe a
trajectory, generate these second order LPC model coefficients, find
some roots and then derive mass-spring-damper parameters. None of
this is expensive to do computationally. So it’s easily done as a
little background process on a PC.
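A minimal sketch of that pipeline (our reconstruction with illustrative names, not the paper’s code), for a one-dimensional trajectory sampled at an assumed rate fs:

```python
# Hedged sketch: fit a 2nd-order LPC model to a 1-D mouse trajectory, take
# the complex pole pair, and read off the damped frequency and decay rate
# of the implied mass-spring-damper. Our reconstruction, not the paper's.
import numpy as np

def spring_params(x, fs):
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Local autocorrelations at lags 0..2 give the 2nd-order LPC coefficients
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + 2]
    a1, a2 = np.linalg.solve([[r[0], r[1]], [r[1], r[0]]], [r[1], r[2]])
    p = np.roots([1.0, -a1, -a2])[0]  # one pole of the conjugate pair
    if np.iscomplex(p):
        freq = abs(np.angle(p)) * fs / (2 * np.pi)  # damped ringing frequency (Hz)
        damping = -np.log(abs(p)) * fs              # exponential decay rate (1/s)
        return freq, damping
    return None  # overdamped segment: no ringing to measure
```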
So we designed an experiment to try to quantify the accuracy of this
and it was a counterbalanced design. Our goal was a within-subject
design because we didn’t know how strong the signal would be. We
wanted to have some measurement periods for each subject where they
were in a stressed state and an unstressed state, but because it was a
one hour experiment it was rather compressed so people would enter the
experiment and have a calming phase to try to get everybody to a
similar state and then they had to do a challenging math task which was
stressful for most people. Then we did the first mouse measurement
where they were given these pointing and dragging tasks. Then there
was this kind of calming exercise given to them for five minutes and
finally there was a second mouse measurement phase and a final exit
calming required by the IRB.
So that’s the first condition for subjects, and half of the subjects
were in the counterbalanced condition where they were given the
calming exercise first, then given the mouse task and then finally the
stressor in the second phase. So one thing to note about this design
is that it wasn’t ideal in terms of giving us the best stress signal to
measure with the mouse, because we actually separated the phases of the
stressor itself from the mouse task. So the measurement and the
stressor weren’t actually concurrent. And that was a choice we made
in order to make sure we had a very good, sort of robust, signal: we
basically used a task where people were concentrating on the math
problem that was similar to what people used before in other stress
studies.
So we could have tried to give them a mouse task at the same time, but
the concern was that it might have distracted them. Maybe they weren’t
really stressed. We thought it was safer to produce real stress which
we could validate with self report and HRV measurements, which we did,
then see what we could get from the mouse analysis later. So we
expected a certain amount of decay and in fact there was some decay of
the stress signal during the mouse measurement phase, but it was
nevertheless strong enough for us to get reasonable results. In fact
it was considerably better than we had expected. So I think this is
all just saying what I just said.
So we wanted to again have validation that the stressors were really
working so subjects were given self report questionnaires at the
beginning and ending of each phase; that’s important to point out. Sort of
for practical reasons we had to give them the survey when they were
transitioning from one behavior to another. Those were very reliably
significant and you can see that also there is a label here saying that
the surveys were taken at stress and calm stages when actually those
numbers represent the averages of the questionnaire responses at the
beginning and the end. So we called the stress questionnaire response,
the value average between here and here, and it was supposed to be
something in the middle here. Similarly the MStress signal was an
average of the reading before and after the phase.
So what we observed was that the difference in stress was higher during the
active stress phase relative to the calming phase. The difference is
about twice as big there as it was here, so roughly 2.1 versus
1.0. So in fact there was some decay over time once the
stressor was removed. Heart rate variability was measured
continuously; all of the subjects were hooked up to our Tricorder
instrument and we measured heart rate variability, and basically the
results were all over the map. We worked hard to get them cleaner than
this, but the reality of heart rate variability is that it’s a very
difficult signal to get consistent answers out of. Every reference we
have seen on the particular instrument has similar results.
The best results were actually for the fast stress response, the basic
heart rate difference was the stronger signal, not the variability at
all. There was a huge difference in the two measurements. So we
actually went over the data a number of times to check that and it’s
actually a very strong signal and it’s really there. Realistically,
none of the other measures I would say are reliably reportable,
although we have some significant results here. We have taken seven
different measures, and if you really want to have a significance of .05
when you are taking multiple measures you need to correct, and the
thresholds drop: you really should be working at
.05/7. So these effects aren’t really strong enough to be reported.
You are having too many chances to succeed I would argue.
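Concretely, the correction being described is the Bonferroni bound: with m = 7 measures tested at an overall level of alpha = .05, each individual test has to pass

```latex
\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{7} \approx 0.007
```

before it should be reported as significant.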
The other rather disturbing thing I would say is that these signals
were marginally significant, maybe, is one way to say it. There were
some others over here though, especially this one and there is one up
there which is actually as strong, and they are in the wrong direction.
So heart rate seemed to say they were more stressed in the calm phase.
And again, we went over this data a lot and while there are some
outliers a lot of it is coming from a little bit of motion artifact
which we weren’t able to sort of reliably eliminate, that is we couldn’t define
criteria that said we can throw this data out and keep the other data.
So the high order message was that we tried our best to get good HRV
results for this data set, and we got one, which was actually a fast
stress response. All of the others I would say were not credible. And
the last thing I would say is that when people do heart rate analysis
all of the formal work does involve hand clean up. So it involves
trying to eliminate the worst phases of non-signal and the worst
outliers of these values. And in spite of doing that we still didn’t
get a reliable signal in heart rate variability.
So there were several measures, but if you actually factor in
the Bonferroni correction they were less strong in the MStress, but none
of them I think were reliably reportable and some of them were actually
in the wrong direction, so not a good result. So now to the measures
that were derived from the dynamic measurements, the mass spring model,
here we got much more credible results. And just to remind you these
measurements were made in the aftermath periods, MStress or MCalm,
which were actually the intervals right after the stressor or the
calming influence.
The absolute values of the signals are pretty small, but you will see
on the graphs that those are actually, the parameters themselves are
clustered around those values and when you see differences like that
they are actually quite good. So we do get a reliable difference and a
nice strong p-value and they go in the right direction; in other
words people are more tense when they are stressed. And these are just
simply some different parameters. These are the actual frequencies of
the damped response these are the damping parameters. It was less
clear whether the damping parameters should be smaller or larger when
people are stressed. And our results were not very conclusive on that.
And as a reality check you might expect people to be a little bit
faster when they are stressed, but we didn’t get a robust signal
endorsing that, in fact we got virtually no signal. Okay and what I
just showed you were the aggregate across all of the pointing, dragging
and steering tasks. It didn’t matter; if you broke them down they were
individually significant in all of the cases, as long as you used the
frequency features. So yeah, it seems like there is a real signal
there. We didn’t get a signal from time surprisingly.
So let’s look at the signals a little bit more, one really nice feature
that we noticed of the stress signal as a function of task is that it
had a visible sensitivity to the distance of the task. So just to
recall we had in all 20 tasks, 5 different distances and 4 different
widths in powers of 2 and they are shown along here. This is the 5
different distances. These are the distances here and these are the
widths of the targets along here. And you can see that there is a
clear dependence on the distance and close to zero
sensitivity to the width. So, it’s quite a difference from a Fitts’s
law kind of sensitivity.
But you can also see a nice separation between stress and no stress,
but you can also see it would be important to model the sensitivity to
task if you want to really distinguish these things. There is no
separation if you don’t control for task.
>> [inaudible].
>> John Canny: Well we could, because the Fitts’s law index of
difficulty --. Well, you will see in a second the time follows exactly
the index of difficulty. But no, the index of difficulty is basically
the distance divided by the width, so that’s a sawtooth going like
that. And basically it’s only half right. It has the right
sensitivity to distance, but the completely wrong sensitivity to width.
So we use a different and simpler model and that’s what we built.
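To spell out the contrast (the Shannon form of Fitts’s index of difficulty below is our rendering; the talk only says "distance divided by width"): Fitts’s law is sensitive to both the distance D and the target width W, while the correction used here depends on D alone:

```latex
\mathrm{ID}_{\text{Fitts}} = \log_2\!\left(\frac{D}{W} + 1\right)
\qquad \text{vs.} \qquad f(D) = \alpha \log D
```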
We found similar results for dragging. The dragging curves look like
this, again virtually no width sensitivity. Maybe there’s a very slight
slope; we actually didn’t model that, we just measured the distance
sensitivity. And there is a good reason for that. Let’s see, finally,
yeah, so we turned this observation into a model where basically we
fitted a parameter based on the log of distance. So the other thing to
observe here is that the steps are constant in size and the target
distance is varying exponentially, so it argues that the right model is alpha times
the log of the distance, similar to Fitts’s law, but again without the
width sensitivity.
And again from that it’s very easy to derive what alpha should be and
in fact we use an alpha that’s independent of the subjects. I am
sorry, I didn’t actually quite, I was out of sync; the dragging tasks
are here, the other ones here were actually for the pointing tasks, but
you can see it’s the same kind of shape. The steering was more like a
Fitts’s law type of sensitivity. So, this one did show sensitivity to
the target width as well as the distance, again though with apparently
logarithmic dependence. So we did analyze the steering task using a
slightly more complex model that had both width and distance
parameters. And finally time looks like this, and if you work it out
these values are just proportional to the index of difficulty from the numbers below.
But here it’s also kind of obvious that there is really very little or
no separation in the time signals which was a bit of a surprise. All
right so now we have a very simple model, but nevertheless a model
which is very easy to compute: we can take those raw readings
and basically subtract off the staircase effect, which in the case of
clicking and dragging was independent of W, just depending on D. And
the numbers that you get out are sort of then normalized and you can
simply apply a classifier between them. Is that making sense?
So we basically remove the staircase dependence. So now if we have
any observation, as long as we know what the distance of the movement
was, we can produce a kind of canonical measurement which should be
different for the stressed and unstressed case. So from that data we
finally ran some kind of a classifier and derived some accuracy results.
And here is the result of that. We tried a few ways of classifying.
The simplest one is just taking the stressed and non-stressed points
and taking the mean of them; but these measurements, like the HRV
measurements, still have a lot of outliers, and mean values have
outlier problems, so taking mean values is a very bad model.
We instead used a max accuracy classifier which simply means we took
the threshold; it’s a one dimensional signal now. So we took the
threshold which gave us the highest accuracy which would be equivalent
to doing a support vector machine in one dimension, but it’s just simpler
to take the highest accuracy threshold. So that’s what the blue curve
here is. The red curve is taking a simpler threshold, which is just
simply the mean of the two sets, so taking an average mid point between
stressed and non-stressed populations of data.
Both of those are using the staircase model. If you take the staircase
model away you get this accuracy here. And to be more specific, I think
it says it here, but the measurements are made by taking our experimental
data, randomly taking a sample of some of the points as the test set
and then training the model, meaning just setting the threshold, on the
other points, and finally using the trained threshold to classify the
held out points. So along this axis here is the number of held out
points, the number of sampled points. There were only 100 points in
total, so the accuracy is generally increasing as the sample gets
bigger, but at some point it tapers off because you don’t have enough
data for the model. The model is the other points. But anyway, we
get accuracy of about 70 percent, and this is per user.
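A minimal sketch of that procedure (names are our stand-ins; alpha and the labels would come from the study data): subtract the staircase term, then learn the one-dimensional max-accuracy threshold on the training sample and apply it to the held-out points.

```python
# Hedged sketch of the classification procedure: staircase normalization
# followed by a 1-D max-accuracy threshold. Names are illustrative stand-ins.
import numpy as np

def normalize(freq, dist, alpha):
    # Subtract the staircase (log-distance) effect from the raw frequency
    return freq - alpha * np.log(dist)

def fit_threshold(x, y):
    # 1-D max-accuracy classifier: scan thresholds (and both orientations)
    best_acc, best = 0.0, (0.0, +1)
    for t in np.unique(x):
        for sign in (+1, -1):
            acc = np.mean((sign * (x - t) > 0) == y)
            if acc > best_acc:
                best_acc, best = acc, (t, sign)
    return best  # (threshold, sign)

def predict(x, threshold, sign):
    return sign * (x - threshold) > 0  # True = stressed
```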
So given, let’s say, a few hundred points of data for a user, assuming
it’s labeled as stressed and unstressed, or perhaps
it might be labeled as neutral, you can learn a threshold and then from
about 10 subsequent observations you should be able to classify
stressed and unstressed to about 70 percent accuracy.
Again, to remind you, the data that we were using was based on the state
of the subjects in the MStress and MCalm phases from the self report and
the HRV data: roughly half as much stress as the full stressors. The
data for the clicking and steering tasks is here. We get similar
results overall. This one is a bit lower and this one is about the
same so around 70 percent accuracy again. So it seems good enough for
practical use. And the key advantage of the staircase models for
clicking and dragging is they don’t require knowledge of W. That’s
extremely useful for a logger that’s application oblivious, that’s just
watching mouse activity: all the logger has to do is recognize the start
and the end of a mouse movement, let’s say with a time window; it doesn’t
need to know what the target size was in order to figure out the staircase
correction, because it’s only using the distance value.
Presumably, because it has that really nice logarithmic dependence on
the values that we were able to test, you should get fairly accurate
measurements at distances that weren’t among the ones we used.
So we are actually in the process of doing a subsequent study with more
realistic tasks where we simulate angry messages from supervisors
coming in e-mail to produce the stress in an actual GUI
context, so we can get more realistic movements, but still there is
pretty good evidence from this study that this should work.
So we are building this revised logger which will run as an independent
process and won’t need to be linked to applications, which has some
privacy advantages as well. Simply by looking at mouse
movements it should be able to report and evaluate a kind of real time
estimate of stress. So our original goal of this was sort of health
related: how can we help people monitor stress? But it does suggest
that we can perhaps generalize our goals a little bit: how can we
look at people’s levels of frustration or perhaps anxiety about user
interfaces or applications.
In the absence of measures of what the stressors are in people’s lives,
if we simply look at these stress levels as a function of time and if
we are able to get a little bit of information about which application
people are running then we can use this kind of measurement as a kind
of implicit usability measure, which I think would be pretty
interesting. I mean it’s simple enough that we could get masses of data
then depending on how you cross cut that data you could isolate factors
such as the application that people are running and get perhaps the
implicit usability information.
All right. So that’s the summary. I am going to wrap up there. We
have generalized to a cell phone: we have collected data from a similar
study on a cell phone. The only difference on the cell phone is the
tasks are more diverse and less controlled, meaning that we did do some
pointing and dragging tasks on the cell phone. On the other hand,
we observed, and we have the video of the experiments, that
people’s use of the cell phone was a lot more diverse than the mouse,
and that people would rest a hand on the table and sometimes hold the
phone up; in some cases they would be holding the phone in two hands
and doing two tasks.
So in terms of the dynamics it’s a lot more complex and most likely we
will have to at least attempt to recognize the distinct mechanical
states that people are in when they are using the phone. Nevertheless
there are some signs that certain types of features, like the basic
tapping on the screen, have a really nice dynamic ringing signal that
seems to be related to hand tension.
So all right, that’s the work in progress. I hope to have that soon.
So to summarize we have been working on senseless sensing, which is
trying to leverage existing technologies some of which seem to have
remarkably strong signals around affect generally, but especially for
stress. So we described recent measurements with MouStress, which gives
a ubiquitous, low cost, hopefully reliable measure of stress in the real
world from ordinary mouse use. We would like to see if we can get
similar measurements from cell phone use, and we think we would have
advantages over the voice based cell phone use in that it only requires
people holding the phone, which they arguably spend a lot of time doing,
and we will have to see if the environmental vibrations and so on are
trackable or not.
So, yes, and of course the work that my student Pablo is doing, and the
work Mary is doing, is on interventions, trying to tie some of
these measurements back and deliver appropriate, timely and effective
interventions for relieving stress and improving mental health.
Okay.
[clapping]
Yeah?
>>: [inaudible]?
>> John Canny: I don’t currently have a student who is oriented
towards that. I think it’s a great source. My former student
[indiscernible], who is at the University of Pittsburgh, has been doing
some work using the camera in contact mode. So he has built a small
video game that involves a lot of movement of the thumbs over the
camera sensor, so he gets a simple pulse signal from that data. But the face,
you might know, there has been work at MIT and elsewhere on recognizing
pulse from changes in facial color; there is enough blushing of the
face during red blood infiltration, I guess that’s not the word, but
there’s enough signal from the flushing of the face that you can get at
least a pulse signal, a fast stress signal, directly from the face. And
of course there is so much emotion in the face and there are a number
of toolkits.
Unfortunately a number of them seem to be proprietary right now. There
is a little bit of open source work in the OpenCV toolkit, the open
computer vision toolkit, that does at least face isolation and a little
bit of feature recognition. But anyway part of the trouble is that
there is a fairly significant technical on ramp for doing vision
analysis. So right now we are interested a little bit more in some
related topics. We are doing some work in deep neural networks for
general image recognition. It might be applicable to this, but for
right now, no we are not doing tracking. Do you know of work?
>>: [inaudible]
>> John Canny: Oh, I would love to do pupil dilation, but I don’t have
the resources right now. I think especially because it’s both a stress
cue but also an attention cue. And I think a lot of the work on
stress is focusing on these rather microscopic effects. There are very
important effects to do with attention and interest that are more on
the positive side, like you would like to know when people are being
effectively engaged and having an appropriate level of engagement, but
not becoming obsessed. So in a sense they are in the zone of ideal
behavior that is they are sort of attending to things without being
distracted. I think gaze patterns and pupil dilation could capture that. You want the
right combination where people are transitioning from one stimulus to
the other without being haphazard, applying appropriate focus, etc, I
mean that’s really about detecting whether people are in the zone.
That’s sort of ideally where we would like to move from remediating
stress to actually helping people get into the right
cognitive zone. So yes, that would be a great topic, but we aren’t
really there.
>>: [inaudible]
>> John Canny: So I will tell you a little secret, which is no. The
detail is we didn’t expect to really see the second order behavior in
gross behavior. And the truth is we don’t actually; it’s not strictly
second order. It’s two second orders stacked, because the gross
movement is really not what we are interested in. We are interested in
the second order little wiggling that’s on top of it, at a
different frequency. So the truth is we actually fit a full fourth
order model and throw away the lower frequency poles. And we take the high
frequency ones which turn out to be in the right frequency range for
the system that we are trying to detect. So you are very astute,
that’s not exactly what we are doing, but it works to essentially
filter out the other component.
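Our rough reconstruction of that trick, as a sketch (not the project’s actual code): fit a fourth order LPC model, then keep only the higher-frequency complex pole pair, which discards the gross-movement component.

```python
# Hedged sketch: fit a 4th-order LPC model, then keep only the
# higher-frequency pole pair, filtering out the gross-movement component.
import numpy as np

def high_frequency_pole(x, order=4):
    x = np.asarray(x, dtype=float) - np.mean(x)
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    poles = np.roots(np.concatenate(([1.0], -a)))
    upper = poles[np.imag(poles) > 0]  # one pole per conjugate pair
    if len(upper) == 0:
        return None                    # no oscillatory component found
    # largest pole angle corresponds to the highest ringing frequency
    return upper[np.argmax(np.abs(np.angle(upper)))]
```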
>>: And when I think about that model, when I move the mouse am I
basically just making a new set point for my [indiscernible]?
>> John Canny: Well again, it’s not exactly a fourth order system.
What we understand is that when people do a gross motion, it’s not that --
the spring-damper is supposed to be a sort of open loop system that
doesn’t have a big input. When you actually move you really have this
other system with a big forced input that’s responding to that input.
So you again don’t expect the simple model to hold. And when you look
at those poles they are just not robust. They are all over the place,
the dominant poles. So I am not sure if I am going to answer the
question, but yeah, what was the crux of the question?
>>: If the model of a motion is just making a set point from one
[inaudible]?
>> John Canny: So the simplest biomechanical models that I like are
basically changing the set point of the system. So yeah, we did try a
variety of things that intuitively might have helped such as trying to
only run the analysis during what appears to be a passive phase. You
can define passive as energy coming out of the system, and it turns
out to have a simple formula in terms of the signs of the
first and second derivatives. So we did it in the passive phase and it was
not as good. We also looked at active phase. So, I don’t know, I mean
yeah, we tried to do the things that might have helped, but they didn’t
really help.
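One natural reading of that formula (our interpretation of "energy coming out of the system"): the kinetic energy of the arm-mouse system is decreasing exactly when velocity and acceleration have opposite signs,

```latex
\frac{d}{dt}\left(\tfrac{1}{2}m\dot{x}^{2}\right) = m\,\dot{x}\,\ddot{x} < 0
\iff \operatorname{sign}(\dot{x}) \neq \operatorname{sign}(\ddot{x})
```

so a logger could mark a sample as passive from just the first and second derivatives of the position trace.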
>>: What did you do to try and calm your subjects down during the
study? [inaudible]
>> John Canny: Yeah, you know what, I honestly don’t recall. So it’s a
research question which interventions work best, and I think it might
have been a breathing exercise, but I am not completely sure. But
yeah, from the work that Pablo has been doing and Mary has been doing,
there are clearly different interventions having different effects on
different people. So it would be great to have a better understanding
of this in the context of this experiment. I am sure you can, you get
so many anomalies because some people don’t get calmed at all during
the calm phase and some people aren’t getting very stressed in the
stress phase. So there are all these outliers which make the data a
lot less clean than we would like it to be. And perhaps better machine
learning and modeling, so that you are giving each person the best
intervention, would probably help this data quite a bit.
>>: My second question is about checking for stress in the voice: do
you think there is a different type of change in the voice during the
fast stress response versus the slow one?
Because in real life you don’t know when the stimulus is, so if you
can distinguish between the two you can almost figure out what was the
stimulus that caused them to be stressed.
>> John Canny: So I do know that it’s a chronic signal. It’s the
same signal they find in depression. It’s most likely cortisol because
it’s basically permanent in depressed people.
>>: But could there be one from [inaudible]?
>> John Canny: Yeah, you would think that there would be; somehow it
might also be the same effect but stronger in a short-term situation.
So I don’t know if anyone has measured that.
>>: [inaudible]?
>> John Canny: Yes, yes, that’s certainly true. Yeah, the voice
really is a wonderful signal, and the breathing --. The good thing
about the work that we did earlier is that there are so many untapped
signals in vital signs if you are able to get them; it’s just really
hard to get them. There is definitely more in breathing that people
haven’t tapped yet. It would be great to be able to do that. We just
decided it is much easier to do the work, do the experiments and have
impact with the implicit signals, but those signals are often a whole lot
more ambiguous.
>> Mary Czerwinski: All right, let’s thank Jon again.
[clapping]