>> Li Deng: So it's my great pleasure to introduce Adrian Lee from the University of Washington to talk about brain dynamics. This is one of the topics that will probably capture the interest of a lot of researchers here doing neural network type of research.
So Professor Lee graduated from the MIT-Harvard Speech and Hearing Ph.D. program. He recently moved to UW just a few months ago.
>> Adrian KC Lee: A year ago.
>> Li Deng: A year ago. Wow, time flies. I got an opportunity to visit his lab a couple months ago. There's a lot of exciting things going on there. Today, he will come here to tell us something about speech and hearing research as related to brain imaging and as related to brain dynamics. To give floor to Adrian.
>> Adrian KC Lee: Thank you. Thank you, Li, for the invitation. It's great
to be here. My goal today is to try and convey what's different in
neuroscience. And really, in terms of the interface between neuroscience and
engineering. I think that's actually where the most exciting part is. So
where we're at and where we want to head to.
My home department is Speech and Hearing, and I'm an adjunct professor in electrical engineering. But our lab is actually based in the Institute for Learning and Brain Sciences, and we'll show you the type of equipment that we have for capturing brain dynamics over there.
But I want to start with something that maybe captures our thinking every day: the combination of neuroscience and engineering. What are the things, the challenges, that we face?
Well, one of the things that we're interested in is how do people selectively attend to a sound of interest? This is something that a human can do quite easily, but a machine cannot yet do properly. Another thing is recognizing speech. With just a brand new speaker coming in, a human can potentially understand even my weird accent, a mix of Colonial British, Australian and some Bostonian. Hopefully, you can still understand my speech properly. You don't need to train yourself on 20 hours of tokens of me speaking.
So how do we do that? It's something that we want to capture in terms of how
our brains do it and learn from it too.
One of the things we're also interested in is whether we can design machines that -- and it depends on the level -- can mimic, interact with, or even complement human abilities. So understanding where we're at in the brain, what our deficits are, can we capture brain signals and interject?
And, of course, you know, this is the ultimate goal, right: have a seamless human-computer interaction, and I think that's something that you're all interested in. So I truly believe that a successful interface between neuroscience and engineering -- although it actually starts from the brain-state classification problem, can we identify which brain state you're in -- can then lead to brain-computer interfaces and maybe artificial intelligence that actually enhances our capabilities.
So let's start with our basic problem, right. One of the things we study is
the cocktail party. Now, in this environment, there are lots of different
conversations coming on, right. And for us, it's really easy to pick a certain
conversation. For people at UW, they might say hey, I've got a Huskies ticket.
Can you come over and, you know, do you want to join me for the weekend to
watch it.
For grad students or post-docs, they may say hey, there's free drinks in room 204B. You're sharply tuned to that. And that was identified, back in 1953, as the cocktail party problem. Yet we cannot solve it computationally right now.
I tell you why this is also important. From a learning environment point of view, here's a Chinese lesson for the kids. It says ni hao. Repeat after me, and Jenny says ni hao. That's great. Unfortunately, if you've ever taught, that's not the environment that this is, right. So being able to selectively attend affects education, affects a lot of things.
This picture also reminds me that there's a lot of machine learning going on here, right. The interesting part is that these are the little machines that learn every single day. At I-LABS, we do a lot of brain imaging on how they learn. We want to understand how they do it so that we can learn better. So in some sense, this is a generic problem statement of how we can learn from the best learners.
Okay. So if the signal is not good, even though it's ni hao, poor Billy might have gotten mao, which is very different from the signal. Ni hao means how are you; mao means cat, right. So if you don't selectively attend and fish out the acoustics properly, there will be information loss.
So even though it's a very perceptual question, it has very high upstream consequences.
So it's really hard to describe what an auditory object is. So I'm just going
to play a game with you here. We're going to play the game called password.
You tell me what the password is. Listen to the male voice and tell me what
the password is. There might be a prize at the end. So male voice, give me
the password.
>>: The password is Huskies.
>> Adrian KC Lee: So what is the password?
>>: Huskies.
>>: The password is Huskies.
>> Adrian KC Lee: Okay. Now I said that there might be a prize at the end, right? So what did you --
>>: I'll buy you a drink.
>> Adrian KC Lee: Now listen to what you could have done to get that prize from me.
>>: The password is Huskies.
>>: I'll buy you a drink if you stand up now.
>> Adrian KC Lee: Ah! It is a very expensive game to play if I miss out, right? But I can direct your attention to the same physical stimulus, yet you pick up the signal depending on what you listen to. This is the dynamics that we talk about. Attention is rewiring your brain at the hundreds-of-milliseconds level, or giving you the respective cues to bias the physical stimulus so that you get one message out or the other. That's the cocktail party effect, and that's what an auditory object is. We listened to the male voice versus the female voice.
Now, it's slightly different from visual scene analysis. So in vision, here's a screen in front of the white board, and in that case, something's in the foreground, occluding the background, hence there's an edge. Edge detection in vision helps a lot to segregate objects in the foreground and background.
Audition is very different; we call it transparency. Spectral temporal elements add on top of each other, and it's like they're transparent. It's really hard to segregate them.
Now, there are different cues that we can use. Here, I'm using a visual analogy, and I have different things in different colors. So you could imagine that I can attend to a certain feature here if I'm attending to yellow. In terms of neurophysiology, we know that if you're attending to a certain color, those types of neurons are actually more active, and therefore there's a bias in terms of attention.
Now, you can selectively switch from one view to another, and also, I just
played a game with you: while I was fiddling around with the audio, I told you a lot about Huskies and UW. I primed you that the word is Huskies. So priming
also works.
So in our lab, we're interested in auditory cues, how they group sounds, how to group acoustic sounds, and how selective attention works. So what are the auditory cues that normally work? In an orchestra, you can listen for the violins versus the flute. So that's timbre. Intensity: I have a louder voice than the fan that's actually humming in the back. Now that I've directed you to that fan, you're probably just listening to that fan instead of me. I'm going to concentrate more on spatial cues and pitch.
Spatial cues are nice, especially when you have beamformers -- I know that with speakerphones in conference rooms, you can actually beamform to a certain speaker. It's the same thing we use for spatial cues in humans. Pitch: whether I have a high pitch or low pitch, a male voice versus a female voice.
So just briefly -- auditory spatial cues; I'm just going to talk about them briefly. So here's a subject's head. Here's an acoustic event coming on. And the sound arrives at the ipsilateral ear, the ear that's closer, slightly, ever so slightly faster than at the opposite ear, by tens or hundreds of microseconds. That's good enough for us to have something called the interaural time difference cue. We humans can actually go as low as 10 microseconds in that discrimination.
Interaural level difference: the head, at higher frequencies, casts an acoustic shadow such that the closer ear actually receives a signal a few dB higher than the ear that's further away. And that gives you the acoustic cues, and there are some spectral cues as well.
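To give a rough sense of the scale of these binaural cues, here is a minimal sketch (my addition, not from the talk) using the classic Woodworth spherical-head approximation for the interaural time difference; the head radius and speed of sound are assumed values.

```python
# Minimal sketch (assumed values, not from the talk): interaural time
# difference (ITD) from the Woodworth spherical-head approximation.
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius in meters
SPEED_OF_SOUND = 343.0    # meters per second, at room temperature

def itd_woodworth(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a far-field source at azimuth_deg
    (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (5, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD ~ {itd_woodworth(az) * 1e6:5.0f} microseconds")
```

Even a few degrees of azimuth already corresponds to tens of microseconds, which is why a 10-microsecond discrimination threshold is so striking.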
Given all this, I can direct you to listen to one cue versus another. What we want to do is to find the brain dynamics associated with you rewiring your brain really briefly to listen to our stimuli.
And here's the paradigm that we use. We ask the subject to -- we cue the subject which side to listen to. So here's a left arrow. And then they maintain fixation. And then what happens is a stimulus comes on. Two simultaneous streams: one coming from the left, one coming from the right. One has a high pitch and one has a low pitch. And so because the arrow was on the left, pointing to the left, the right answer, when we asked you to respond, would be three.
Now, you could imagine that I can just rewire your brain slightly differently using the same stimulus, but point the arrow up or down. So now, pay attention to the pitch. For the pitch, I've denoted high pitch in red, so if I have the arrow going up, you listen to that simultaneous speech again. And to answer correctly, you have to listen to the high pitch, and the correct answer should be two.
And what we're going to do, once you've been primed what to listen to, is look at the brain dynamics of how you actually rewire to listen to that cue. And so that's the dynamical process that your brain goes through.
The technology that we use is as follows: MEG measurement. This was newly installed at UW two years ago; it's a multimillion dollar machine. I think the price tag in inventory is $3.4 million. It sits in a room that is magnetically shielded. It has 306 channels, measuring the magnetic fields coming out of your brain.
Simultaneously, we use EEG, electroencephalography. So we're not only picking up the magnetic fields, we're also picking up the electric potential on your head.
On a separate session, we do an MRI scan. We want to know the anatomy inside
your brain so that we can relate the brain signals that you measure and
spatially co-register it such that we know where anatomically it's coming from.
Once you do that, when you co-register your anatomy with your brain activities
that you measure, what you can now do is to capture the brain dynamics and we
make brain movies. So I told you that they get a visual cue in the beginning,
right? So here's a brain movie showing the first 300 milliseconds. I have to
explain that brain a tiny bit. It's a funny brain view. Everything gets sort
of inflated. I've marked the visual area, the auditory area, attentional area
and executive region.
>>: So the color here indicates what?
>> Adrian KC Lee: The red spot is the flow of activity, brain activity. Precisely, this is MEG. So this is measuring roughly the postsynaptic potential. And if you look at the time scale, we can get down to millisecond precision. So here, at 1,000 milliseconds prior to the onset of sound, you receive a visual cue. It has about a 100 millisecond delay to reach the visual cortex, and as you see, there's a wave of activity now in your occipital cortex, in your visual cortex.
Marching further down, now you're holding fixation, but you're actually now
rewiring your brain to attend to either spatial cues or the pitch cues, right.
And you see there's a sustained activity in the attentional areas. This is
actually called the frontal eye fields. It's also involved in vision,
directing your eye gaze from one position to another.
If you think about moving your eye gaze, it's casting a spatial attention from
one position to another. And the question that we ask is, is this region, even
though it's only for eyes previously defined, is it also useful for auditory
spatial hearing.
>>: Is that commonly recognized region called attention?
>> Adrian KC Lee: So this is actually the pre-central sulcus. Okay. So at zero milliseconds, that's when the sound comes up. So just watch the auditory cortex when it receives the auditory sound. Now, this is fundamentally
slightly different from FMRI, right. Traditionally, in neuroscience, FMRI or
even in terms of, before that, lesion studies, people talk about where in the
brain matters, right.
Here, we're trading off a tiny bit of the spatial resolution, but we want the
brain dynamics. You know that in FMRI, every two seconds you take a brain
picture and you infer where things are in the brain. Two seconds is an
eternity in the brain, I hope, right. I can do a lot in two seconds. One
thousand one, one thousand two. You just switched off because you know that
one thousand two is redundant. I'm counting one second, two seconds. Your
brain shuts on and off depending on the context and all this stuff.
So we want to capture, at millisecond level, what the brain is doing.
Now, using this technology, what we can then do is very precisely work out the
difference between your wiring of the brain even before the onset of sound as
you're rewiring to listen to it. So right before the onset of sound, when
you're paying attention to space, the spatial cues, one part of the brain, the frontal eye fields, is more active. When you're paying attention to pitch, another region is more active; it turns out that this region, the superior temporal sulcus, has previously been recognized in musical pitch discrimination and absolute pitch -- people with absolute pitch show stronger activity here.
So naturally, your brain is recruiting different brain regions at different
times to help you to do the task.
Because of the fine resolution, and this is where we differ from FMRI, we can
distinguish what happens right after the sound onset. Also, FMRI is really
noisy, right. If you have ever been for an MRI scan, it's constant like that. It interferes with the very signals that we're measuring in speech and hearing. MEG is quiet. So we can definitively work out, during the stimulus, how we're reacting to the sound. And here, we can actually see that the spatial area, the frontal eye field, while still involved while the stimulus is on, is less active, or no longer differentially recruited, when the sound is already on.
Now, there are other ways of using this technology. Here, we're sending an attentional probe into the sound. So what do we mean by that? Well, we are also going to give you two sounds, one coming from the left, one coming from the right. And your task is to count how many Es. So let's say you listen to the stream coming from the left: E, E, M, A, E, L, K. Great. There are three Es. Respond three. However, we're going to tag this with a certain frequency -- it's high frequency for the brain dynamics people, it's low in terms of speech people, right: 35 hertz, 45 hertz. It's nothing that you think of as something that's high frequency. In the brain, we think of four to six hertz, four to eight hertz as low frequency, or even down to one to three hertz.
So in any case, we're tagging 35 to 45 hertz. We can talk more why we chose
those frequencies. So basically, here's a speech signal and we're going to
amplitude modulate it. We're going to use that as a probe to find out where in
the brain is actually responding to the 35 hertz or 45 hertz.
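As an illustration of the tagging idea only (a sketch with made-up signals and an assumed sampling rate, not the actual stimuli), the modulation itself is just a slow sinusoidal envelope imposed on each stream:

```python
# Minimal sketch: amplitude-modulation "tags" at 35 Hz and 45 Hz on two
# competing streams. The waveforms here are noise stand-ins for real speech.
import numpy as np

fs = 16000                         # assumed sampling rate
t = np.arange(2 * fs) / fs         # two seconds for the example
left_stream = np.random.randn(t.size)
right_stream = np.random.randn(t.size)

def am_tag(x, rate_hz, depth=0.5):
    """Impose a sinusoidal amplitude modulation at rate_hz with given depth."""
    return x * (1.0 + depth * np.sin(2 * np.pi * rate_hz * t))

left_tagged = am_tag(left_stream, 35.0)    # the cued (attended) side here
right_tagged = am_tag(right_stream, 45.0)  # the ignored side
```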
Remember that when I cue you to listen to the left, this is now in the foreground, right. And if it's true that you're modulating what you're listening to, the things that are in the background shouldn't matter. You're concentrating on the left stream, and we can look at the tags.
So what happens? Now, listen to the left; 35 hertz is the tag on that side. Let's tune the brain to 35 hertz. We're just doing a phase-locking value here, right? You can see that the left frontal eye field is locked to the stimulus. And also the auditory cortex. Remember, 45 hertz is the side that you don't listen to. We told you not to listen to that.
When we tune into 45 hertz, that signal is no longer there in the frontal eye fields. You can counterbalance, you know, which side you're listening to and which frequency; only the left frontal eye field is locked to the stimulus that you're listening to. And the probe, it turns out psychophysically, is transparent. As in, the subject doesn't even know -- even if we change the probe in midstream, the subjects don't even know. So it's a neat neuroscience cue that we can use, and it can also be utilized for a lot of eventual deployment for brain computing.
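For completeness, the phase-locking value itself can be sketched like this (my addition; the array shapes and the narrow-band filter settings are assumptions):

```python
# Minimal sketch: phase-locking value (PLV) between source activity and the
# tagging modulator at a given tag frequency, pooled over trials.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def plv_at_tag(brain_trials, stim_trials, fs, tag_hz, half_bw=2.0):
    """brain_trials, stim_trials: (n_trials, n_samples) arrays.
    Returns a value in [0, 1]; 1 means a perfectly consistent phase lag."""
    b, a = butter(4, [(tag_hz - half_bw) / (fs / 2), (tag_hz + half_bw) / (fs / 2)],
                  btype="band")
    phasors = []
    for x, s in zip(brain_trials, stim_trials):
        phase_x = np.angle(hilbert(filtfilt(b, a, x)))
        phase_s = np.angle(hilbert(filtfilt(b, a, s)))
        phasors.append(np.exp(1j * (phase_x - phase_s)))
    # Average the unit phasors over trials and time, then take the magnitude.
    return float(np.abs(np.mean(np.concatenate(phasors))))
```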
>>: So that spatial seems to [indiscernible] here.
>> Adrian KC Lee: That's because we combine the anatomical scans with MEG and EEG.
>>: Oh, okay. So after you do this, how does it compare to the FMRI?
>> Adrian KC Lee: Let's say FMRI, a canonical resolution is roughly one
millimeter. After a lot of spatial smoothing, it's around five to six
millimeters effectively. We would say one centimeter. So for us, we care
about the network approach. We don't care about whether this region of the
auditory cortex responds more to pitch or space. We want the global picture of
the brain.
Eric Larson is actually here if you want to learn more about this experiment; I'm just going to briefly describe it. One of the things that we want in the lab is to eventually work out how to tune a hearing aid dynamically. So hearing aids right now don't work in a very noisy environment. If you've ever had older relatives listening in a restaurant, they would rather take it off. That's because everything's been amplified. It doesn't selectively amplify a sound. Now, if we want to selectively amplify the sound, we need to know where you're attending, and one of the questions we also need to address is have you just switched attention from one person to another.
So here, we're designing an experiment just to see whether you've switched your attention from one speaker to another. We use the same visual cue, whether you listen to the left or right. Two-thirds of the time, great, you know, just respond to the two digits, one coming from the left, one coming from the right, from the side that we've cued you on.
But one third of the time, we're going to say, oh, no, just kidding. I'm going to switch the attention to the other side. Right at that 600 milliseconds, when we told you to switch your attention, where in the brain helps you switch that attention? Turns out that it's the right temporal parietal junction -- here's the manipulation point, right. Just kidding, switch attention.
Green is switch attention. White is hold attention. You see a massive difference in terms of that particular region.
There are other regions that are involved: the right frontal eye fields or the middle frontal gyrus. They're different regions. But for us, the question is whether we can capture that signal and do classification to work out whether you've just switched attention or not.
>>: But the area is normally not related to hearing.
>> Adrian KC Lee: That's right.
>>: But attention is generic attribute.
>> Adrian KC Lee: Yeah.
>>: That goes across vision?
>> Adrian KC Lee: That's right. So one of the questions that we have is, is attention supramodal, right. If you're paying attention, it doesn't matter whether you're paying attention to sound or space, it should just be an attention network. The auditory attentional network has not been studied thoroughly, and we want to contrast it to vision. The RTPJ has been implicated in visual attention, and I'll show you.
The coolest thing, though, is this. Not only do we find a difference in terms of the brain dynamics, we then correlate back to the behavioral performance. And you see a massive correlation, depending on whether you can switch attention or not, correlated with the differential activity of that region. Actually, when Eric showed me this, I was saying that's too good to be true. It's really highly significant.
Clinically, we can use this to perhaps diagnose central auditory processing disorder. There are people that have normal audiograms but just can't do things in a cocktail party environment. So from a clinical perspective, this is useful. For machine or brain-computer interface usage, it turns out that this particular region is a really good predictor of your brain-state. And this is what Li was pointing to: here's a visual attentional network that's been previously mapped out. I've shown you the left frontal eye fields, the right frontal eye fields, the right temporal parietal junction.
I think it's -- I'm a bit biased, I'm an auditory person. Auditory always cares about time and frequency. Those are not the native axes that the visual people think about. So the question that we want to address now is, yes, we have a network. How do these signals pass from one node to another? Also, when they pass a signal, can you tell that there is some correlation? How do they pass the signals in the first place? And I'll get back to that.
So as Li said, we sort of [indiscernible] my lab over here at UW. And looking
specifically into auditory brain sciences, we have a neuroengineering goal in
the lab as well. So I'm going to walk through some different experiments,
different people doing different things and hopefully if you're interested in
it, please also talk to them during the course of the day.
So Ross Maddox is also here. He is currently studying audiovisual binding problems. Now, why do we care about audiovisual binding? There are many objects that come on, right. If you have a computer parsing a video scene, how do you know whether the sound and the vision are the same person?
It turns out that we use a lot of temporal coherence. At the syllabic rate, my mouth moves at this four to seven hertz signal, and my syllables are also coming out at four to seven hertz. That gives you a cue that my mouth moving and my sound coming out are coming from the same object. And so we're designing experiments where Ross is looking into the different ways of binding, how temporal coherence can change your perception of binding.
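A crude way to picture that computation (a sketch only; the inputs and their alignment are assumptions, not the actual experiment code) is to measure coherence between the acoustic envelope and a lip-motion trace in the syllabic band:

```python
# Minimal sketch: magnitude-squared coherence between an audio envelope and a
# mouth-motion trace, averaged over the 4-7 Hz syllabic band.
import numpy as np
from scipy.signal import coherence

def syllabic_band_coherence(audio_envelope, mouth_motion, fs):
    """Both inputs: 1-D arrays on the same time base, sampled at fs."""
    f, cxy = coherence(audio_envelope, mouth_motion, fs=fs, nperseg=int(4 * fs))
    band = (f >= 4.0) & (f <= 7.0)
    return float(np.mean(cxy[band]))   # higher suggests the same talker
```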
And the implication is that if we can harness this, it can also be turned into how a machine can use that type of information for binding. I think one of the main problems is, you know, auditory scene analysis, visual scene analysis is still a very hard computational problem right now. Can we learn from humans to do that task?
Another project that we have is with a graduate student, Elliott Saba, who is co-supervised by Professor Les Atlas. And what we look at is the signal processing in the brain. Now, what does that mean? Speech people would tend to think about carriers of pitch, you know, a few hundred hertz, and then you have [indiscernible] at a thousand hertz.
As I alluded to, in the brain we talk about four to eight hertz as a carrier for some brain signals, and there's some high frequency at about 30 to 150 hertz. Here is a mouse moving in a maze, and where it goes in that particular part of the maze affects the phase precession: when they're at that particular region, that neuron fires slightly earlier in the phase of that carrier frequency.
So the phase of another frequency can be used to code where you are in space. In memory, in the hippocampus, people have done experiments on memory retrieval, item retrieval. If I give you a sequence of one, two, three, four items, they can be locked into different phases of the lower carrier to give you that sequential result back.
So we're starting to analyze the signal processing in the brain, the coupling of different frequencies, and trying to map that out. You can also imagine that this can be used eventually in EEG and MEG, right. I can dynamically look at how different parts of the brain are communicating using these lower frequencies and therefore infer what brain-state you're in or what items you're retrieving.
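One standard way to quantify this kind of cross-frequency coupling is a mean-vector-length index, sketched below (my addition, with band edges taken from the numbers mentioned above; the sampling rate must be well above twice the top of the fast band):

```python
# Minimal sketch: phase-amplitude coupling between a slow carrier (4-8 Hz)
# and the amplitude of a faster band (30-150 Hz) in one channel.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def _bandpass(x, fs, lo, hi, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def phase_amplitude_coupling(x, fs, slow=(4.0, 8.0), fast=(30.0, 150.0)):
    """Mean-vector-length coupling index for a single channel x sampled at fs."""
    slow_phase = np.angle(hilbert(_bandpass(x, fs, *slow)))
    fast_amp = np.abs(hilbert(_bandpass(x, fs, *fast)))
    # Large values mean the fast-band amplitude prefers certain slow phases.
    return float(np.abs(np.mean(fast_amp * np.exp(1j * slow_phase))))
```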
I've mentioned the problem statement: one of our lab goals is to get hearing aids dynamically tuned to the signals of interest. And in this case, we want to know what you're attending to, but the generic problem statement is: what is your brain-state? If we can measure that you just switched attention from one thing to another, I can also measure whether, oops, I just made an error. Here's an error signal from the anterior cingulate cortex, for example.
So a broader statement is can we classify brain states. So Eric Larson, who is
sitting in the back, a post-doc in our lab, he is working on this particular
problem. We start with brains. Lots of brains.
Let me take a step back too. In terms of brain computer interface, you can
classify all your brain signals all you want for a particular subject. You
will do very well. However, when you have a brand new subject, it's really
hard to predict the N plus 1 subject what that brain-state classification is
going to be.
We want to make a difference in that realm. Can we predict, for a brand new subject, what sort of brain classification we'll get, taking into account the bank of brain signals that we have?
And this is how we're approaching this problem. First, we calculate the gain matrix. What this means is that we have an MRI scan and we decimate the space in the brain. This is pure [indiscernible] laws, just to work out how the sensor is seeing each of the [indiscernible] in your brain. And then, however, unfortunately -- as we were talking about with Li before -- we're using cheap data right now. We're doing modeling, because we can simulate brain signals.
Ultimately, we are never going to measure individual neurons in the brain. Not yet, anyway. Instead, what we --
>>: [indiscernible].
>> Adrian KC Lee: That's right, that's right. But not the whole brain. If you poke the whole brain, I'm not sure the organism will survive, right?
>>: [indiscernible] signal from a single neuron.
>> Adrian KC Lee: That's true, yes. From a -- I think we really want to do a non-invasive imaging approach. I think that's where the commercialization aspect will come in. I'm not sure everyone will put up their hands for sticking needles into their brain. So what we're measuring is something coming out of the head, the MEG and EEG signals. To infer back where those signals are coming from in the brain, you need an inverse approach, right? The source space is high dimensional, the measurements are not as high dimensional, so it's rank deficient; you need an inverse approach.
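As a rough illustration of that kind of inverse (a minimal sketch, not the lab's actual pipeline), the L2-regularized minimum-norm solution mentioned next looks like this, with the gain matrix coming from the anatomy-based forward model:

```python
# Minimal sketch: L2-regularized minimum-norm inverse.
# G: (n_sensors, n_sources) gain/forward matrix; y: sensor data with shape
# (n_sensors,) or (n_sensors, n_times); lam: regularization strength, which
# would normally be set from the noise level.
import numpy as np

def min_norm_inverse(G, y, lam=1e-2):
    """Return source estimates x_hat = G.T (G G.T + lam I)^-1 y."""
    n_sensors = G.shape[0]
    K = G @ G.T + lam * np.eye(n_sensors)
    return G.T @ np.linalg.solve(K, y)
```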
So we do the inverse imaging with an L2-norm constraint. We can talk about the actual methods later on if you're interested. One other problem: it turns out that, luckily I guess, each of our brains is different. Look at each brain here. Each brain is just slightly, ever so slightly different in its geometry. They all have some resemblance to a generic brain, but they are all different.
The difference in the folding changes the physical properties of your EEG and MEG signals by the right hand rule. So a lot of brain computer interfaces right now use the motor strip, the motor central sulcus, and it happens that these all align quite nicely like this. It's nice.
Once you're in the prefrontal cortex, where the cool things are happening, right -- attention, executive control -- those are the brain signals that we really want to capture. The folding is so different that the field can massively change. Just imagine the folding, right? Just do the right hand rule here. The field will change very differently.
So one thing we have to do is to align all these little brains. Sorry, big
brains, whatever. Spherical morphing is the approach that we use. Look at one
brain, we're going to map it to a generic brain and come back to it just to
see.
So you can see that there's some sort of decimation, right? Low
[indiscernible]. But it captures mainly the sulcal gyral patterns, which is
important for us to get the field patterns. By doing so, what we can then do
is to start approaching solving this N plus 1 problem. If we have a new brain,
we're just going to cheat one more time. When we do the brain computer interface for the brand new subject, all we need to do is just ask this person to give an MRI scan. So we're almost getting all the information. We don't need so much training of the classification. The goal is: by just capturing the anatomy of the brand new subject, can we do much better in terms of brain computer interface?
And so in a preliminary modeling experiment to do just that, the answer is yes. What is the regime that we operate in, in general, in terms of brain computer interface when we want a massive deployment, right? It's low trial counts. We're not going to get the subject to do 100 trials and classify, right? For the commercialization aspect, that's sort of low.
We want low SNR, low trial counts. Those are the things that we want to handle. And it turns out that in this simulation, having the anatomy of the untrained subject -- right, so we're training on all the brain data that we have, with the anatomy; now, for a brand new subject with that new anatomy, can that actually help? -- in the low trial count and low SNR regime, it actually provides improvement.
In fact, if you only just use the sensor space, so only classifying brain signals using the EEG channels, you can't even start projecting to a new subject. It normally doesn't work unless it's a well defined motor task -- there are a few experiments that work, and that's actually the mainstay of BCI right now.
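The evaluation logic behind that "N plus 1" question can be sketched as leave-one-subject-out classification (hypothetical data layout and classifier choice, not the actual simulation code):

```python
# Minimal sketch: leave-one-subject-out brain-state classification.
# features[s]: (n_trials, n_features) array for subject s, e.g. source-space
# activity after morphing to a common anatomy; labels[s]: per-trial states.
import numpy as np
from sklearn.linear_model import LogisticRegression

def leave_one_subject_out(features, labels):
    scores = []
    for held_out in range(len(features)):
        train_x = np.vstack([features[s] for s in range(len(features)) if s != held_out])
        train_y = np.concatenate([labels[s] for s in range(len(labels)) if s != held_out])
        clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
        scores.append(clf.score(features[held_out], labels[held_out]))
    return float(np.mean(scores))   # accuracy on the brand new subject
```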
Okay. I'm going to conclude with a cool new experiment that's done by Nema, a
good friend of ours. He's a former student of Shehab. I know some of you know
Shehab as well. Here is a direct recording on the brain. So this person's
skull has been removed. Electrodes placed on the brain. This is for epilepsy
detection, pre-surgical planning.
And the subject has been played two sounds. Again, it's the cocktail party
problem. This is the CRM corpus, coordinate response measure and the two
sentences are going on. Speaker one says ready tiger go to green five now.
Speaker two simultaneously, just like the example I played for you, says ready
Ringo, go to red two now.
The person has to, you know, respond to where Ringo went to. So when the
subject listened to Ringo, great, okay. I need to know it's red two and ignore
green five.
This is the computational problem that the brain is facing, right. You're all familiar with computational auditory analysis, trying to fish things out of noise. It turns out, you know, in engineering, with the signal-to-noise ratio, right, there's always noise. In humans, there's no noise per se. It's what I'm attending to, and what I'm not attending to is noise. It's really hard to define noise in humans. So you can switch your attention at any point in time.
So this is the signal that you get. It's a summation of the two, and if I'm only recording one signal at a time, this is the brain response, the spectral temporal response in the auditory cortex, to speaker two alone, to speaker one alone. This is the result of the unmixing in the auditory cortex due to the attentional bias; compare it to when you're listening to speaker two alone or speaker one alone, and see the correspondence here.
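The logic of that unmixing result can be caricatured as follows (a sketch, not the published analysis): reconstruct a spectrogram-like response from the neural data and ask which speaker's spectrogram it resembles more.

```python
# Minimal sketch: decide the attended speaker by correlating a neurally
# reconstructed spectrogram with each speaker's clean spectrogram.
# All inputs are assumed (n_freq, n_time) arrays on the same grid.
import numpy as np

def decode_attended(reconstructed, spec_speaker1, spec_speaker2):
    def corr(a, b):
        return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    r1, r2 = corr(reconstructed, spec_speaker1), corr(reconstructed, spec_speaker2)
    return ("speaker 1" if r1 > r2 else "speaker 2", r1, r2)
```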
>>: So how do they process the signal here? You have ten different electrodes, do they do something?
>> Adrian KC Lee: Yeah, so the spatial dimension here is not the most important part. You can even tune down --
>>: A single electrode could --
>> Adrian KC Lee: No, there should be multiple electrodes.
>>: Do they [indiscernible].
>> Adrian KC Lee: Yeah, that actually gives you another piece of information.
The weighting of the electrodes may give you a clue of where the actual area of
the brain that's most responsive to this. So that actually gives you the where
question, right.
Here, at least, it gives you the spectral temporal dimension of the auditory cortex --
>>: [indiscernible].
>> Adrian KC Lee: I believe so. I actually need to re-read that paper.
>>: Okay.
>> Adrian KC Lee: But here's the sort of -- a proof of something that we
already know, right. We can do this. It turns out that the auditory cortex,
when you have attentional bias, can modulate the spectral temporal signals like
this.
So I guess to conclude, I want to sort of open up where we think the road ahead of us is, and also how that integrates with things that are perhaps commercializable in five to ten years, or a decade later.
I think speech perception is fundamentally something that is time critical.
In speech and hearing science, the time and frequency axes are always the most important dimensions to us. So one thing that we should start looking into is how
different brain regions coordinate at different times.
That should provide us with a new way of thinking in terms of the computational
modeling. It's not just the where. It's not just the anatomical connection.
People have been doing that a lot, right. How information is being passed
dynamically, that adds the time dimension, even though, from the engineering side, we've done that using, you know, hidden Markov models or whatever -- hidden state changes over time, right. You try to learn that. Why
don't we learn from the brain as well?
Another thing is the interaction between top-down and bottom-up attention. It
is really hard to ignore when a phone comes on right now, right. That's like
an auditory siren. You automatically switch to the phone. And, in fact,
that's why a phone rings. It's to grab your attention.
That's the computation between bottom-up and top-down attention. Oh, wait, I'm supposed to be listening to this speaker and to ignore that phone ring. That's the computation.
Now, a question that would be of interest is, is an auditory siren all or nothing? Does a siren just grab your attention like this, or is it sort of attention-grabbing but dependent on your brain-state, whether it grabs your attention or not?
So I understand that there's a great interest in Microsoft in looking into a triage alert system, like the busy-body system. In that sense, you want to know whether to alert the user, whether it's a good time to pull me away from my work.
Now, the decision point, of course, is based on a lot of data learning and how likely the decision model is to alert the user. You can flip it
to the other side, right. I want to know how likely the brain wants to be
alerted or, hey, brain, I'm going to give you about 30 percent chance of
actually capturing this signal. Just like if you have a green light on your
phone, right, and you actually -- if momentarily I'm just bored with my work, I
see a green light flashing. I can make that decision myself too, whether I
want to grab the phone and look at that message really quickly, right.
So here, it's giving an option of not just an all or nothing alarm. But let
the user, give the user back the chance to say hey, here's the signal in the
back. Oh, yeah, I'm actually interested in it because I got bored a tiny bit.
So it's an interaction between the computer and the user. And using that from a [indiscernible] acoustics end, how to integrate that in.
Now, of course, there are a few other things we can do, right? One, imagine in
10 or 20 years' time, when EEG technology is available such that we don't need
all these gels, right? In fact, dry caps are being produced now, slowly becoming
more of a mature technology.
You could have EEG caps that literally just have these micro sensors in your
hair, picking up the signals, wirelessly transfer to your phone device. The
key is what signals are you picking up? And what are the key signals from the brain that are usable to interact with the device? That's the key, right. And that's why we're interested in brain-state classification. We want a bank of vocabulary of your brain such that we know how to look it up.
Moreover, to look it up is one thing. We need to know the dynamics. The other
is how generic it is. We don't want to train every user all the time. Maybe
some day, by using anatomy, we can just say, hey, here's a new phone device out of the box. Here's an EEG cap. Just put it on this way, and it will do the learning itself.
So that's where I'm going to end and just like to acknowledge in terms of
funding. Li was asking what type of agencies now are interested in brain-state
classification. From a hearing aid point of view, the National Institute on Deafness and Other Communication Disorders. So sort of the hearing research branch in NIH is obviously interested in the hearing aid application. But defense agencies are also starting to get into brain-state classification for reasons that might be obvious.
And I'd like to thank my collaborators, both at UW and Boston University and also at Mass General, and if you want further information, here's our website. Thank you.
>>: Back to this attention sort of inverse engineering. Do you have any
computation model that somehow can give you similar [indiscernible] that
measure so that the same kind of [indiscernible] afterwards can cover one of
these?
>> Adrian KC Lee: There are two questions here, right? One is what dimension
should we use, right. So what feature space, what hyper space should we do
that classification for, for computational auditory signal analysis, auditory
stream segregation. I think what we haven't been using, and this is the type
of research that we want to at least give clues of what auditory cues we use,
right.
So grouping cues are the most important things. Continuity of sound. So there's a whole branch of auditory grouping literature --
>>: When you showed the result, you don't care about [indiscernible] as long
as you can produce something that gives you the same kind of measurements
[indiscernible].
>> Adrian KC Lee: No, no. The brain that you measured really cares about the
cues. The person that is doing it is using all the cues that the person can
use to segregate this. Or else this wouldn't work.
>>: Somehow you've got to get the cue.
>> Adrian KC Lee: Yes, and the interesting point, we were also talking about a metric of how good you are after you segregate, right. It's good to have an L2 norm, mean square error, the engineering standard go-to. This may not be the most relevant thing for the user, right. And a mean square error might have spikes in there, which are really perceptually unpleasant, right.
And just like MP3 coding -- why is it that transparent coding works? You need to work out, from the perceptual side, from the cochlear side, how you can make use of the masking. Here it is at a higher level. What grouping cues can
make the streaming more seamless, and you can still have some residual error
that is irrelevant in terms of getting a stream out.
>>: So go back a few slides here.
>> Adrian KC Lee: Okay, yes.
>>: So the [indiscernible] is at the base of the signal.
>> Adrian KC Lee: Yes.
>>: So there is a missing diagram that I really want to know in terms of the
raw signal that you measure.
>> Adrian KC Lee: Oh, from the -- yeah.
>>: And then you have those, and whatever process they do, they will cover
this, assuming that [indiscernible].
>> Adrian KC Lee: Absolutely. But we're measuring from the auditory cortex.
>>: You're asking the patient to focus on one of those?
>> Adrian KC Lee: Right. But as I showed you in previous slides, when I pay attention to space or pitch, it uses different areas. So here, there is a
spectral temporal modulation of the physical stimuli, which is great. But that
doesn't necessarily tell me where the attention signal is coming from.
Somewhere else in the brain is telling the auditory cortex, modulate this way.
So it is the trade-off between you want to know locally very well in terms of
the spiking of the neurons, versus I want a global view of the brain and how
things have been connected.
And I think you need both. Every single level in neuroscience is important, down to single unit neurophysiology; you have multiunit recording and the [indiscernible] neuroscience approach.
>>: I think the point of this piece of work is that it shows that whatever recording you have, it is sufficient to [indiscernible]. So what it means is, if a researcher has a model that can be verified through some [indiscernible], it can give you reasonably good output that you might measure over here.
>> Adrian KC Lee: Yes.
>>: It shows you can actually cover -- it allows you to investigate more
[indiscernible] modelling to figure out.
>> Adrian KC Lee: Right. And you could imagine if I drop some spectral
temporal element, would that matter? So there is some metric that you can use to get at what the brain cares about versus not.
>>: And so far, no idea about how attention is [indiscernible].
>> Adrian KC Lee: We're working on that one.
>>: Okay.
>>: What would happen if you did the same test on attention with subjects that
have severe attention deficit disorder? I work with a student that has a
severe [inaudible].
>> Adrian KC Lee: Yeah, there are a whole host of psychiatric disorders that
may influence attention, ADHD and also autism. We're talking about groupings
here, right. About how different frequencies in the cochlea, even though you
hear pitch coming out, they all have to fire at the same time and group all
these things, right.
So back in the [indiscernible] state, you talk about, you know, pitch
templates, frequency templates. But ultimately, it's a grouping binding
problem. How do I bind these things together from different frequencies or
different dimension.
It turns out that autism is characterized by a lack of, or hypo-, connectivity. The long range connectivity is not as good. Now, extrapolating, it's probably a problem with grouping, or at least some aspect of the ASD spectrum is a lack of grouping. That could also be true in ADHD. We don't know. And there's a whole
host of questions stemming from the perceptual grouping on to sort of a higher
learning disorder.
>>: Are there significant differences between gender and age? Is there a
tendency to discriminate, to focus attention? I chew my wife out all the time.
>> Adrian KC Lee: I really want to do that study without being fired, but we
don't know. We don't know. We haven't systematically studied it. That's the
official -- yeah.
>>: So these cues are very high up in the processing or the understanding of the [indiscernible]. Have you tried working with the lower level cues that influence the subject even without him knowing?
>> Adrian KC Lee: Yes. So let me rephrase this. This is a high dimensional feature space that you're looking at. There is space, there is pitch, there is all these things. You can tune into whether you can attentionally modulate one spatial cue, which might be stemming from the brain stem, if it's ITD or ILD, or other cues.
We're actually doing an experiment right now, just trying to see whether there
is a direct coupling from your eyes with very rudimentary neural cues coupling.
I think traditionally, speech and hearing sciences stem from the engineering
side. Ever since the '50s, '60s, it's always bottom-up. As good engineers,
anything that's top-down is just noise. Let's cut it out and understand the
bottom-up very well.
I think in the last five to ten years, we're starting to try to see what a
top-down signal is. So there's not much literature out there. So we have to
systematically go back all the way down to the brain stem, all the way down to the cochlea to see the top-down effect, and that's actually hard to measure.
>>: I wanted to add something about that. We've also done experiments where,
in addition to putting in these stimuli, we've put in short, quiet noise bursts that users don't even realize are there. And if you look at the performance on the test, these don't affect their ability to selectively attend to one thing or the other.
But then when we look at the brain activity, we see that there is a brain area there specifically involved in suppressing the response to this sort of distracting noise. We're trying to work out what the network is that allows you to do that.
>> Adrian KC Lee: So even if you don't know --
>>: Yeah.
>>: That's without the user being primed?
>>: Well, the users don't even know that they're there. Yeah, they're doing
some task with the distracting sound, and we ask them afterwards if they even
noticed it was there. Every subject said no. And yet, there is this brain
[indiscernible] that's involved in allowing them to not notice it.
>>: See if I can raise a question. So it sounds like this attention process
is kind of generic. But is it true that it really all happens kind of at the
top, or if like you measure what's going on in the auditory pathway, as
selection changes, is there a change in that pathway?
>> Adrian KC Lee: You would hope so. And I think that is the reason why
hearing aids don't work. Hearing aids, it's a bottom-up amplification. We
want to have the top-down signal. We want the control signal back.
>>: That would imply, then, that even if it really is a generic selection mechanism, the mechanism for going down to the feature level is direct too.
>> Adrian KC Lee: It could be. That would be a lot of efferent pathways that are not mapped out systematically. I think the hearing literature has always been about the afferent pathway, the bottom-up pathways. How the top-down pathways go all the way to the outer hair cells to tune specifically the cochlear amplifier would be the truly interesting part, right. It's hard to get access to that data.
>>: You're saying you're using spatial cues [indiscernible] precisional cues
and pitch cues or harmonic cues. Some other big cues are onset and offset cues, [indiscernible] not harmonically structured.
>> Adrian KC Lee: Yeah, that's the huge feature space right there. Onset
tends to be more important than offset, but there are people that talk about how the scene changes between the onset and offset; there are studies that -- [indiscernible] over at UCL specifically studies the transient onset/offset.
To me, I think onset is just, again, a huge cue. What exactly are we doing
in terms of as an organism to try and solve auditory signal analysis, right, is
to make sense of the world in terms of things that you care about, whether, you
know, the lion is coming over from the right as opposed to some sound coming in
the roaring sound and all these frequencies. It's important for you to solve
that problem so you use all the cues.
Onset is great, because as a sound emanates in space, if they all come from the same source, they will have the same onset. They will all have the same phase relationship. And I bet that our brain capitalizes on that information.
>>: Typically, offset is [indiscernible] by the reverberation and it is way
more basically [indiscernible] onset.
>> Adrian KC Lee: That's true. Onset is much more reliable. That's the great Bayesian brain that we have; we would weight onset as greater evidence than offset.
>>: So the same in [indiscernible].
>> Adrian KC Lee: Yes, yes.
>>: There's cues for onset or offset that aren't affected by reverberation at
all, which is watching someone's mouth or something like that.
>>: True.
>>: And, I mean, we're just beginning to look at it, but it's very likely that there's crossover of these signals that help you do this attentional gain that don't even come from acoustics.
>>: I thought we were just talking [indiscernible] here.
>>: So for these cues [indiscernible], is there no attention injected in this
at all?
>> Adrian KC Lee: No, no. I do not want to convey that at all. Especially since I just checked the box that this is going out; this talk is available to everyone. No, no. This is not --
>>: So people --
>> Adrian KC Lee: There are many studies in the past that talk about attention. There are great debates on whether scene analysis is pre-attentive or not. Does attention matter when you segregate streams? There's been a great, long debate since the early 2000s.
A lot of studies before, I think it would be fair to say, are from EEG event-related potential studies. They answer different questions. I think that, from a network approach, it's quite new. Just because the MEG, EEG, stringing all these things together, took a while to develop that technology.
So in terms of how the auditory cortex communicates with the frontal eye
fields, I could imagine that that's -- I could say that maybe, as far as I know, there is no other literature that talks about that.
>>: Now, how about the CASA community, computational auditory scene analysis. They have a group of people, do they --
>> Adrian KC Lee: I know that like Martin Cook, there are a lot of -- of
course, they care about auditory grouping. But one thing that it turns out
from my doctoral thesis, a lot of CASA people think of -- you have to phrase it
in a probabilistic sense. Each time frequency pixel either belongs to speaker one or speaker two. Or, you know, whichever speaker it is.
>>: Then how does attention come into all this?
>> Adrian KC Lee: So this is the interesting part. You can measure the same
thing in a human. It turns out that they don't have to add up. There's no
reason for a subject to think that, you know, each time frequency pixel has to
conserve probabilistically. We phrase that model because computationally, we
need that constraint.
Turns out when you do all the -- there are asymmetries in the auditory cues, so that we actually parse a scene differently, as a problem statement, than the way the CASA people phrase it.
>>: I see.
>>: So is the attention [indiscernible].
>> Adrian KC Lee: We don't know. I know that colleagues in our department are
starting to look into it. I can tell you that in visual scene analysis, Powell
Singer, who is actually a reader of my -- or of my dissertation back at MIT, he
goes to India and looks at a particular group of children where they have
this disease -- I don't actually know what the term is.
Basically, it's really hard for them to see, but with one quick procedure, they can fix the vision. So they can look at, you know, when they can't do scene analysis versus when they can, and how quickly that changes, and is it innate. So from the visual side, there's literature that can point us to the innateness of our ability to do this.
In auditory scene analysis, unfortunately, we don't have that information yet,
as far as I know.
>>: Some people are just more absent-minded. Do you think that's because of genetics?
>> Adrian KC Lee: This is the spouse-not filter that he was alluding to. I'm pretty sure I don't need to test that. That's got to be universal.
>> Li Deng: Thank you very much.
>> Adrian KC Lee: Thank you.