>> Li Deng: So it's my great pleasure to introduce Adrian Lee from the University of Washington to talk about brain dynamics. This is one of the topics that will probably capture the interest of a lot of researchers here doing neural network type of research. Professor Lee graduated from the Harvard-MIT Speech and Hearing Ph.D. program. He recently moved to UW just a few months ago.

>> Adrian KC Lee: A year ago.

>> Li Deng: A year ago. Wow, time flies. I got an opportunity to visit his lab a couple of months ago. There are a lot of exciting things going on there. Today, he will come here to tell us something about speech and hearing research as related to brain imaging and brain dynamics. So, to give the floor to Adrian.

>> Adrian KC Lee: Thank you. Thank you, Li, for the invitation. It's great to be here. My goal today is to try and convey what's different in neuroscience, and really, the interface between neuroscience and engineering. I think that's actually where the most exciting part is: where we're at and where we want to head. My home department is Speech and Hearing Sciences, and I'm an adjunct professor in Electrical Engineering. But our lab is actually based in the Institute for Learning and Brain Sciences, and I'll show you the type of equipment that we have for capturing brain dynamics over there.

But I want to start with something that maybe captures our thinking every day, the combination of neuroscience and engineering: what are the challenges that we face? Well, one of the things that we're interested in is how do people selectively attend to a sound of interest? This is something that a human can do quite easily, but a machine cannot yet do properly. Another thing is recognizing speech. With just a brand new speaker coming in, a human can potentially understand even my weird accent, a mix of colonial British, Australian and some Bostonian. Hopefully, you can still understand my speech properly. You don't need to train yourself on 20 hours of tokens of me speaking.

So how do we do that? It's something that we want to capture in terms of how our brains do it, and learn from it too. One of the things we're also interested in is whether we can design machines — and it depends on the level — that mimic, interact with, or even complement human abilities. So understanding where we're at in the brain, what our deficits are: can we capture brain signals and interject? And, of course, this is the ultimate goal, right: have seamless human-computer interaction, and I think that's something that you're all interested in. So I truly believe that a successful interface between neuroscience and engineering — although it actually starts from the brain-state classification problem: can we identify which brain state you're in — can lead to brain-computer interfaces and maybe artificial intelligence that actually enhances our capabilities.

So let's start with our basic problem. One of the things we study is the cocktail party. Now, in this environment, there are lots of different conversations going on, right, and for us, it's really easy to pick out a certain conversation. For people at UW, they might say hey, I've got a Huskies ticket — can you come over, do you want to join me for the weekend to watch it? For grad students or post-docs, they may say hey, there are free drinks in room 204B. You're sharply tuned to that. And that was identified, back in 1953, as the cocktail party problem.
Yet we cannot solve for it computationally right now. Let me tell you why this is also important. From a learning environment point of view, here's a Chinese lesson for the kids. It goes: ni hao. Repeat after me, ni hao. That's great. Unfortunately, if you've ever taught, you know what kind of environment this is, right — and Jenny says that's not what she heard. So being able to selectively attend affects education, affects a lot of things. This picture also reminds me that there's a lot of machine learning going on here, right. The interesting part is that these are the little machines that learn every single day. At I-LABS, we do a lot of brain imaging on how they learn. We want to understand how they do it so that we can learn better. So in some sense, this is a generic problem statement of how we can learn from the best learners.

Okay. So if the signal is not good, even though it's ni hao, poor Billy might have gotten mao, which is very different from the signal. Ni hao means "how are you"; mao means "cat", right. So if you don't selectively attend and fish out the acoustics properly, there will be information loss. So even though it's a very perceptual question, it has very high upstream consequences.

Now, it's really hard to describe what an auditory object is, so I'm just going to play a game with you here. We're going to play the game called password. You tell me what the password is. Listen to the male voice and tell me what the password is. There might be a prize at the end. So, male voice, give me the password.

>>: The password is Huskies.

>> Adrian KC Lee: So what is the password?

>>: Huskies.

>>: The password is Huskies.

>> Adrian KC Lee: Okay. Now, I said that there might be a prize at the end, right? So what did you --

>>: I'll buy you a drink.

>> Adrian KC Lee: Now listen to what you could have done to get that prize from me.

>>: The password is Huskies.

>>: I'll buy you a drink if you stand up now.

>> Adrian KC Lee: Ah! It is a very expensive game to play if I miss out, right? But I can direct your attention: it's the same physical stimulus, yet you pick up a different signal depending on what you listen to. This is the dynamics that we talk about. Attention is rewiring your brain at the hundreds-of-milliseconds level, or giving you the respective cues to bias the physical stimulus so that you get one message out or the other. That's the cocktail party effect, and that's what an auditory object is. We listened to the male voice versus the female voice.

Now, it's slightly different from visual scene analysis. In vision, here's a screen in front of the whiteboard, and in that case, something is in the foreground, occluding the background, hence there's an edge. Edge detection in vision helps a lot to segregate objects in the foreground and background. Audition is very different; we call it transparency. Spectrotemporal elements add on top of each other, like something transparent, and it's really hard to segregate them.

Now, there are different cues that we can use. Here I'm using a visual analogy, with different things in different colors. So you could imagine that I can attend to a certain feature here, say if I'm attending to yellow. In terms of neurophysiology, we know that if you're attending to a certain color, those types of neurons are more active, and therefore attention gives you a bias.
Now, you can selectively switch from one view to another. And also, I just played a game with you: because I was fiddling around with the audio, I told you a lot about Huskies and UW. I primed you that the word is Huskies. So priming also works. So in our lab, we're interested in auditory cues: how they group sounds — how to group acoustic sounds and how selective attention works.

So what are the auditory cues that normally work? In an orchestra, you can listen for the violins versus the flute. That's timbre. Intensity: I have a louder voice than the fan that's humming in the back. Now that I've directed you to that fan, you're probably just listening to that fan instead of me. I'm going to concentrate more on spatial cues and pitch. Spatial cues are nice, especially when you have beamformers — I know that with speakers in conference rooms, you can actually beamform to a certain talker. It's the same thing we use spatial cues for in humans. Pitch: whether I have a high pitch or a low pitch, male voice versus female voice.

So just briefly on auditory spatial cues. Here's a subject's head, and here's an acoustic event coming on. The sound arrives at the ipsilateral ear, the ear that's closer, ever so slightly faster than at the opposite ear, by tens or hundreds of microseconds. That's good enough for us to have something called the interaural time difference cue. We humans can actually go as low as 10 microseconds in that discrimination. Then there's the interaural level difference: at higher frequencies, the head casts an acoustic shadow such that the closer ear receives the signal a few dB higher than the ear that's further away. Those give you the acoustic cues, and there are some spectral cues as well.
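To make those two binaural cues concrete, here is a minimal sketch — my own illustration, not anything from the talk — of how a machine might estimate the interaural time difference by cross-correlating the two ear signals, and the interaural level difference as an RMS ratio in dB. The signals, delay and attenuation values are all simulated and purely illustrative.

```python
import numpy as np

fs = 44100                                    # sample rate (Hz)
t = np.arange(0, 0.05, 1 / fs)                # 50 ms snippet
source = np.random.randn(t.size)              # broadband stand-in for a talker

itd_true = 300e-6                             # source off to the left: ~300 us lead
delay = int(round(itd_true * fs))             # lead in samples
left = np.concatenate([source[delay:], np.zeros(delay)])  # left ear hears it earlier
right = 0.7 * source                          # right ear: later and ~3 dB quieter (head shadow)

# ITD estimate: lag of the cross-correlation peak, restricted to plausible lags
# (about +/- 1 ms for a human head). np.roll makes this circular, which is fine
# for a short illustrative snippet. Positive lag means the left ear leads.
max_lag = int(1e-3 * fs)
lags = np.arange(-max_lag, max_lag + 1)
xcorr = np.array([np.sum(right * np.roll(left, lag)) for lag in lags])
itd_est = lags[np.argmax(xcorr)] / fs

# ILD estimate: RMS level ratio between the two ears, in dB.
rms = lambda x: np.sqrt(np.mean(x ** 2))
ild_db = 20 * np.log10(rms(left) / rms(right))

print(f"estimated ITD: {itd_est * 1e6:.0f} us, estimated ILD: {ild_db:.1f} dB")
```

A real binaural model would run this kind of computation per frequency band — the level cue, as noted above, only becomes strong at high frequencies where the head casts a shadow — but the principle is the same.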
Given all this, I can direct you to listen to one cue versus another. What we want to do is find the brain dynamics associated with you rewiring your brain, really briefly, to listen to the stimulus. And here's the paradigm that we use. We cue the subject which side to listen to — here's a left arrow — and then they maintain fixation. Then a stimulus comes on: two simultaneous streams, one coming from the left, one coming from the right; one has a high pitch and one has a low pitch. And because the arrow was pointing to the left, the right answer, when we ask you to respond, would be three. Now, you could imagine that I can rewire your brain slightly differently using the same stimulus, but point the arrow up or down. So now, pay attention to the pitch. I've denoted the high pitch in red, so if the arrow goes up, you listen to that simultaneous speech again, and to answer correctly, you have to listen to the high pitch, and the correct answer should be two. And what we're going to do, once you've been primed on what to listen to, is look at the brain dynamics of how you actually rewire to listen to that cue. That's the dynamical process that your brain goes through.

The technology that we use is as follows. MEG measurement. This was newly installed at UW two years ago; it's a multimillion-dollar machine — I think the price tag in inventory is $3.4 million. It sits in a magnetically shielded room. It has 306 channels, measuring the magnetic fields coming out of your brain. Simultaneously, we use EEG, electroencephalography. So not only are we picking up the magnetic fields, we're also picking up the electric potentials on your head. In a separate session, we do an MRI scan. We want to know the anatomy inside your brain so that we can relate the brain signals that we measure and spatially co-register them, such that we know where anatomically they're coming from.

Once you do that — once you co-register your anatomy with the brain activity that you measure — what you can now do is capture the brain dynamics, and we make brain movies. So I told you that they get a visual cue in the beginning, right? Here's a brain movie showing the first 300 milliseconds. I have to explain that brain a tiny bit. It's a funny brain view: everything gets sort of inflated. I've marked the visual area, the auditory area, the attentional area and the executive region.

>>: So the color here indicates what?

>> Adrian KC Lee: The red spots are the flow of brain activity. Precisely, this is MEG, so it is measuring the post-[indiscernible] potential. And if you look at the time scale, we can get down to millisecond precision. So here, at 1,000 milliseconds prior to the onset of sound, you receive a visual cue. It takes about a 100-millisecond delay to reach the visual cortex, and as you see, there's a wave of activity now in your occipital cortex, your visual cortex. Marching further along, now you're holding fixation, but you're actually rewiring your brain to attend to either the spatial cues or the pitch cues, right. And you see there's sustained activity in the attentional areas. This is actually called the frontal eye fields. It's also involved in vision, directing your eye gaze from one position to another. If you think about moving your eye gaze, it's casting spatial attention from one position to another. And the question that we ask is: is this region, even though it was previously defined only for the eyes, also useful for auditory spatial hearing?

>>: Is that commonly recognized region called attention?

>> Adrian KC Lee: So this is actually the precentral sulcus. Okay. At zero milliseconds, that's when the sound comes on. So just watch the auditory cortex when it receives the auditory input.

Now, this is fundamentally slightly different from fMRI, right. Traditionally in neuroscience, with fMRI — or, before that, lesion studies — people talk about where in the brain matters. Here, we're trading off a tiny bit of the spatial resolution, but we want the brain dynamics. You know that in fMRI, every two seconds you take a brain picture and you infer where things are in the brain. Two seconds is an eternity in the brain, right — I can do a lot in two seconds. One thousand one, one thousand two. You just switched off because you know that "one thousand two" is redundant; I'm counting one second, two seconds. Your brain switches on and off depending on the context and all this stuff. So we want to capture, at the millisecond level, what the brain is doing.

Now, using this technology, what we can then do is very precisely work out the difference in the wiring of your brain even before the onset of sound, as you're rewiring to listen. So right before the onset of sound, when you're paying attention to space, the spatial cues, one part of the brain, the frontal eye fields, is more active.
When you're paying attention to pitch, another region — it turns out this region, the superior temporal sulcus — has previously been implicated in musical pitch discrimination and absolute pitch; people with absolute pitch show stronger activity here. So naturally, your brain is recruiting different brain regions at different times to help you do the task. Because of the fine time resolution, and this is where we differ from fMRI, we can distinguish what happens right after the sound onset. Also, fMRI is really noisy, right. If you have ever been in an MRI scanner, there's that constant noise, and it interferes with the very signals we're measuring in speech and hearing. MEG is quiet. So we can definitively work out, during the stimulus, how you're reacting to the sound. And here, we can actually see that while the spatial area, the frontal eye field, is still involved while the stimulus is on, it's less active — it's no longer differentially recruited once the sound is already on.

Now, there are other ways of using this technology. Here, we're sending an attentional probe into the sound. What do we mean by that? Well, we're again going to give you two sounds, one coming from the left, one coming from the right, and your task is to count how many E's there are. So let's say you listen to the stream coming from the left: E, E, M, A, E, L, K. Great, there are three E's; respond three. However, we're going to tag this with a certain frequency — it's high frequency for the brain dynamics people, it's low for the speech people, right: 35 hertz, 45 hertz. It's nothing that you would think of as high frequency. In the brain we think of four to six hertz, four to eight hertz as low frequency, or even down to one to three hertz. So in any case, we're tagging at 35 and 45 hertz; we can talk more about why we chose those frequencies. Basically, here's a speech signal and we're going to amplitude modulate it. We're going to use that as a probe to find out where in the brain is actually responding to the 35 hertz or the 45 hertz. Remember that when I cue you to listen to the left, this stream is now in the foreground, right. And if it's true that you're modulating what you're listening to, the things that are in the background shouldn't matter. You're concentrating on the left stream, and we can look at the tags.

So what happens? Now, listen to the left, tagged at 35 hertz. Let's tune the brain to 35 hertz — we're just computing a phase-locking value here, right? You can see that the left frontal eye field is locked to the stimulus, and also the auditory cortex. Remember, 45 hertz is the side that you don't listen to; we told you not to listen to that. When we tune into 45 hertz, that signal is no longer there in the frontal eye fields. You can counterbalance which side you're listening to and at what frequency; only the left frontal eye field is locked to the stimulus that you're listening to. And the probe, it turns out psychophysically, is transparent — as in, even if we change the probe mid-stream, the subjects don't even know. So it's a neat neuroscience cue that we can use, and also something that can be utilized for a lot of eventual deployment in brain-computer interfaces.
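As a rough illustration of the frequency-tagging analysis just described — a simulated sketch, not the lab's actual pipeline — one can amplitude-modulate a stream at the tag frequency and then compute a phase-locking value between a band-passed "brain channel" and a reference at that frequency. Locking should appear at the attended tag (35 Hz here) and not at the unattended one (45 Hz). All signals and parameters below are made up for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0                        # sample rate of the simulated neural channel (Hz)
t = np.arange(0, 10, 1 / fs)       # one 10-second trial
f_att, f_unatt = 35.0, 45.0        # tag frequencies: attended and unattended streams

# The attended stream is amplitude-modulated at 35 Hz (shown only to illustrate the probe).
modulator = 0.5 * (1 + np.sin(2 * np.pi * f_att * t))
tagged_stream = modulator * np.random.randn(t.size)      # stand-in for the speech carrier

# Simulated "brain channel": follows the attended 35 Hz modulation, plus noise.
brain = 0.3 * np.sin(2 * np.pi * f_att * t + 0.8) + np.random.randn(t.size)

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def plv(x, y):
    """Phase-locking value between two narrow-band signals (1 = perfect locking)."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * dphi)))

ref_att = np.sin(2 * np.pi * f_att * t)      # reference oscillation at each tag frequency
ref_unatt = np.sin(2 * np.pi * f_unatt * t)

print("PLV at 35 Hz (attended tag):  ", round(plv(bandpass(brain, 30, 40, fs), ref_att), 2))
print("PLV at 45 Hz (unattended tag):", round(plv(bandpass(brain, 40, 50, fs), ref_unatt), 2))
```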
>>: So that spatial seems to [indiscernible] here.

>> Adrian KC Lee: That's because we combine the anatomical scans with MEG and EEG.

>>: Oh, okay. So after you do this, how does it compare to the fMRI?

>> Adrian KC Lee: Let's say fMRI: a canonical resolution is roughly one millimeter, and after a lot of spatial smoothing, it's around five to six millimeters effectively. For us, we would say one centimeter. So for us, we care about the network approach. We don't care about whether this region of the auditory cortex responds more to pitch or space; we want the global picture of the brain.

Eric Larson is actually here if you want to learn more about this experiment; I'm just going to describe it briefly. One of the things that we want in the lab is to eventually work out how to tune a hearing aid dynamically. Hearing aids right now don't work in a very noisy environment. If you've ever had older relatives listening in a restaurant, they would rather take them off. That's because everything has been amplified; it doesn't selectively amplify one sound. Now, if we want to selectively amplify a sound, we need to know where you're attending, and one of the questions we also need to address is whether you've just switched attention from one person to another. So here, we're designing an experiment just to see whether you've switched your attention from one speaker to another. We use the same kind of visual cue, whether to listen to the left or the right. Two-thirds of the time, great, you just respond to the two digits, one coming from the left, one coming from the right, from the side that we've cued you on. But one-third of the time, we're going to say: oh no, just kidding, switch your attention to the other side.

Right at that 600 milliseconds, when we told you to switch your attention — where in the brain helps you switch that attention? It turns out it's the right temporoparietal junction. Here's the manipulation point, right: just kidding, switch attention. Green is switch attention; white is hold attention. You see a massive difference in that particular region. There are other regions that are involved — the right frontal eye fields, or the middle frontal gyrus; they're different regions. But for us, the question is whether we can capture that signal and do classification to work out whether you've just switched attention or not.

>>: But that area is normally not related to hearing.

>> Adrian KC Lee: That's right.

>>: But attention is a generic attribute.

>> Adrian KC Lee: Yeah. That's right.

>>: That goes across vision?

>> Adrian KC Lee: That's right. So one of the questions that we have is: is attention supramodal, right? If you're paying attention, it shouldn't matter whether you're paying attention to sound or to space; it should just be an attention network. The auditory attentional network has not been studied thoroughly, and we want to contrast it with vision. The RTPJ has been implicated in vision, and I'll show you.

The coolest thing, though, is this. Not only do we find a difference in the brain dynamics, we then correlate it back to the behavioral performance. And you see a massive correlation, depending on whether you can switch attention or not, with the differential activity of that region. Actually, when Eric showed me this, I was saying that's too good to be true. It's really highly significant. Clinically, we can perhaps use this to diagnose central auditory processing disorder — there are people who have normal audiograms but just can't do things in a cocktail party environment. So from a clinical perspective, this is useful. For machine or brain-computer interface usage, it turns out that one part of this region is a really good predictor of your brain state.
And this is what Li was pointing to: here's a visual attentional network that's been previously mapped out. I've shown you the left frontal eye fields, the right frontal eye fields, the right temporoparietal junction. I think — I'm a bit biased, I'm an auditory person. Audition always cares about time and frequency; those are not the native axes that the visual people think about. So the question that we want to address now is: yes, we have a network, but how do these signals pass from one node to another? Also, when they pass a signal, can you tell that there is some correlation? How do they pass the signals in the first place? I'll get back to that.

So, as Li said, we sort of [indiscernible] my lab over here at UW. Looking specifically into auditory brain sciences, we have a neuroengineering goal in the lab as well. So I'm going to walk through some different experiments, different people doing different things, and hopefully, if you're interested, please also talk to them during the course of the day.

Ross Maddox is also here. He is currently studying audiovisual binding problems. Now, why do we care about audiovisual binding? There are many objects that come on, right. If you have a computer parsing a video scene, how do you know whether the sound and the vision belong to the same person? It turns out that we use a lot of temporal coherence. At the syllabic rate, my mouth moves at this four-to-seven-hertz rate and my syllables are also coming out at four to seven hertz. That gives you a cue that my mouth moving and my sound coming out are coming from the same object. So we're designing experiments where Ross is looking into the different ways of binding, how temporal coherence can change your perception of binding. And the implication is that if we can harness this, it can also be turned into how a machine can use that type of information for binding. I think one of the main problems is, you know, auditory scene analysis and visual scene analysis are still very hard computational problems right now. Can we learn from humans to do that task?

Another project that we have is with a graduate student, Elliott Saba, who is co-supervised by Professor Les Atlas. What we look at is the signal processing in the brain. Now, what does that mean? Speech people tend to think about carriers of pitch, you know, a few hundred hertz, and then you have [indiscernible] at a thousand hertz. As I alluded to, in the brain we talk about four to eight hertz as the carrier for some brain signals, and there's some high-frequency activity at about 30 to 150 hertz. Here is a mouse moving in a maze, and where it goes in the maze affects the phase precession: when the animal is at that particular region, that neuron fires slightly earlier in the phase of that carrier frequency. So the phase of another frequency can be used to code where you are in space. In memory, in the hippocampus, people have done experiments on memory retrieval, item retrieval: if I give you a sequence of one, two, three, four items, they can be locked to different phases of the lower-frequency carrier to give you those sequential results back. So we're starting to analyze the signal processing in the brain, the coupling of different frequencies, and trying to map that out. You can also imagine that this can eventually be used in EEG and MEG, right: I can dynamically look at how different parts of the brain are communicating using these lower frequencies, and therefore infer what brain state you're in or what items you're retrieving.
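A minimal sketch of the kind of cross-frequency analysis being described — phase-amplitude coupling between a slow carrier and a fast rhythm — using a mean-vector-length-style modulation index on simulated data. This is my illustration of the general technique, not the lab's code; the bands and amplitudes are arbitrary.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0
t = np.arange(0, 20, 1 / fs)

# Simulated signal: a 6 Hz theta rhythm whose phase modulates the amplitude of 60 Hz gamma.
theta = np.sin(2 * np.pi * 6 * t)
gamma = (1 + theta) * 0.3 * np.sin(2 * np.pi * 60 * t)   # gamma bursts ride on theta peaks
x = theta + gamma + 0.5 * np.random.randn(t.size)

def bandpass(sig, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, sig)

phase_slow = np.angle(hilbert(bandpass(x, 4, 8, fs)))     # phase of the 4-8 Hz carrier
amp_fast = np.abs(hilbert(bandpass(x, 50, 70, fs)))       # envelope of the 50-70 Hz band

# Modulation index: length of the mean vector of the fast-band envelope
# distributed over slow-band phase (larger = stronger coupling).
mi = np.abs(np.mean(amp_fast * np.exp(1j * phase_slow))) / np.mean(amp_fast)
print("phase-amplitude coupling index:", round(mi, 3))
```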
I've mentioned the problem statement: one of our lab goals is to get hearing aids dynamically tuned to the signals of interest. In this case, we want to know what you're attending to, but the generic problem statement is: what is your brain state? If we can measure that you just switched attention from one thing to another, I can also measure whether, oops, I just made an error — here's an error signal from the anterior cingulate cortex, for example. So the broader statement is: can we classify brain states? Eric Larson, who is sitting in the back, a post-doc in our lab, is working on this particular problem. We start with brains. Lots of brains.

Let me take a step back, too. In terms of brain-computer interfaces, you can classify all the brain signals you want for a particular subject, and you will do very well. However, when you have a brand new subject, it's really hard to predict what that N-plus-1 subject's brain-state classification is going to be. We want to make a difference in that realm. Can we predict, for a brand new subject, what sort of brain classification to do, taking into account the bank of brain signals that we have? This is how we're approaching the problem. First, we calculate the gain matrix. What this means is that we have an MRI scan and we decimate the source space in the brain. This is pure [indiscernible] laws, just to work out how the sensors see each of the [indiscernible] in your brain. However — we were talking to Li before — we're using cheap data right now. We're doing modeling, because we can simulate brain signals. Ultimately, we are never going to measure individual neurons in the brain. Not yet, anyway. Instead, what we --

>>: [indiscernible].

>> Adrian KC Lee: That's right, that's right. But not the whole brain. If you poke the whole brain, I'm not sure the organism will survive, right?

>>: [indiscernible] signal from a single neuron.

>> Adrian KC Lee: That's true, yes. I think we really want a non-invasive imaging approach, and I think that's where the commercialization aspect will come in. I'm not sure everyone will put up their hands for sticking needles into their brain. So what we're measuring is something coming out of the head, the MEG and EEG signals. To infer back where those signals are coming from in the brain, you need an inverse approach, right? The source space is high-dimensional, the sensor space is not as high-dimensional — it's rank deficient — so you need an inverse approach. So we do inverse imaging with an L2-norm constraint. We can talk about the actual methods later on if you're interested.
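To make the inverse step concrete, here is a minimal sketch of an L2-norm-constrained (minimum-norm) estimate: with a gain matrix G mapping sources to sensors, the regularized estimate is x_hat = G^T (G G^T + lambda^2 I)^(-1) y. The gain matrix, noise level and regularization value below are simulated placeholders, not the actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 306, 5000     # e.g., 306 MEG channels, many cortical source locations

# Simulated gain (lead-field) matrix: how each source is seen at each sensor.
G = rng.standard_normal((n_sensors, n_sources)) / np.sqrt(n_sources)

# Simulate a few active sources and the resulting sensor measurement.
x_true = np.zeros(n_sources)
active = np.sort(rng.choice(n_sources, 5, replace=False))
x_true[active] = 1.0
y = G @ x_true + 0.01 * rng.standard_normal(n_sensors)

# Minimum-norm estimate with an L2 penalty (the problem is rank deficient,
# so we regularize rather than invert directly):
#   x_hat = G^T (G G^T + lambda^2 I)^(-1) y
lam2 = 0.1
x_hat = G.T @ np.linalg.solve(G @ G.T + lam2 * np.eye(n_sensors), y)

# With a real, spatially smooth lead field the estimate is blurrier than in this
# toy example; here we just check that the largest estimates sit near the truth.
print("true active sources:      ", active)
print("largest estimated |x_hat|:", np.sort(np.argsort(np.abs(x_hat))[-5:]))
```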
One other problem. It turns out that, luckily I guess, each of our brains is different. Look at each brain here: each brain is just slightly, ever so slightly different in geometry. They all have some resemblance to a generic brain, but they are all different, and the differences in the folding change the physical properties of your MEG and EEG signals, by the right-hand rule. So a lot of brain-computer interfaces right now use the motor strip, the central sulcus, and it happens that those align quite nicely across people, which is nice. But once you're in the prefrontal cortex, where the cool things are happening, right — attention, executive control — those are the brain signals that we really want to capture. The folding is so different there that it can massively change the fields. Just imagine the folding, right — just do the right-hand rule here: the field will change very differently. So one thing we have to do is align all these little brains — sorry, big brains, whatever. Spherical morphing is the approach that we use. Look at one brain: we're going to map it to a generic brain and come back to it, just to see. So you can see that there's some sort of decimation, right? Low [indiscernible]. But it captures mainly the sulcal-gyral patterns, which is important for us to get the field patterns.

By doing so, what we can then do is start approaching this N-plus-1 problem. If we have a new brain, we're just going to cheat one more time. When we do the brain-computer interface for the brand new subject, all we need to do is ask this person to give us an MRI scan. So we're almost getting all the information; we don't need so much training for the classification. The goal is: by just capturing the anatomy of the brand new subject, can we do much better in terms of brain-computer interfacing? And in a preliminary modeling experiment, the answer is yes. What is the regime that we care about in general for brain-computer interfaces, when we want massive deployment? It's low trial counts — we're not going to get the subject to do 100 trials and then classify, right? That's the commercialization aspect. We want low SNR, low trial counts; those are the things that we want. And it turns out that in this simulation, having the anatomy of the untrained subject — so we're training on all the brain data that we have, with the anatomy, and then asking whether the new subject's anatomy can actually help — in the low-trial-count, low-SNR regime, it actually provides an improvement. In fact, if you only use the sensor space, so only classifying brain signals using the EEG channels, you can't even start projecting to a new subject. It normally doesn't work unless it's a well-defined motor paradigm — there are a few experiments that work, and that's actually the mainstay of BCI right now.
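The following toy simulation is my own sketch of that idea, not the lab's experiment: if you know each subject's gain matrix (their "anatomy"), you can project sensor data into a shared source space, and a simple classifier trained on one subject has a chance of transferring to a brand new one; training directly on sensor channels typically stays near chance, because the forward fields differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sensors, n_sources, n_trials = 64, 200, 60
pattern = rng.standard_normal(n_sources)   # class-discriminative source pattern, shared across subjects

def simulate_subject(noise=2.0):
    """One subject: their own gain matrix ('anatomy') plus labeled trials."""
    G = rng.standard_normal((n_sensors, n_sources)) / np.sqrt(n_sources)
    labels = rng.integers(0, 2, n_trials)               # e.g., hold vs. switch attention
    src = np.outer(labels - 0.5, pattern)                # source-space activity differs by class
    sensors = src @ G.T + noise * rng.standard_normal((n_trials, n_sensors))
    return G, sensors, labels

def to_source_space(G, sensors, lam2=1.0):
    """Regularized minimum-norm projection of sensor data into the shared source space."""
    K = np.linalg.solve(G @ G.T + lam2 * np.eye(G.shape[0]), G).T
    return sensors @ K.T

def nearest_mean(X_train, y_train, X_test):
    """Classify each test trial by the closer class mean (deliberately simple)."""
    m0, m1 = X_train[y_train == 0].mean(0), X_train[y_train == 1].mean(0)
    return (np.linalg.norm(X_test - m1, axis=1) < np.linalg.norm(X_test - m0, axis=1)).astype(int)

G_a, sens_a, y_a = simulate_subject()       # training subject
G_b, sens_b, y_b = simulate_subject()       # the brand new "N+1" subject

# Sensor space: the two subjects' forward fields differ, so transfer is near chance.
acc_sensor = np.mean(nearest_mean(sens_a, y_a, sens_b) == y_b)

# Source space: each subject's own gain matrix maps them into the shared space.
acc_source = np.mean(nearest_mean(to_source_space(G_a, sens_a), y_a,
                                  to_source_space(G_b, sens_b)) == y_b)

print(f"cross-subject accuracy, sensor space: {acc_sensor:.2f}")
print(f"cross-subject accuracy, source space: {acc_source:.2f}")
```

The point is not the particular classifier; it is that the subject-specific anatomy is what makes the shared space — and hence the transfer — possible.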
Okay. I'm going to conclude with a cool new experiment that was done by Nima, a good friend of ours. He's a former student of Shihab — I know some of you know Shihab as well. Here is a direct recording on the brain. This person's skull has been opened and electrodes placed on the brain; this is for epilepsy detection, for pre-surgical planning. And the subject is played two sounds. Again, it's the cocktail party problem. This is the CRM corpus, the coordinate response measure, and two sentences are going on. Speaker one says "ready tiger, go to green five now." Speaker two simultaneously, just like the example I played for you, says "ready Ringo, go to red two now." The person has to respond to where Ringo went. So when the subject listens for Ringo: great, okay, I need to know it's red two and ignore green five. This is the computational problem that the brain is facing, right. You're all familiar with computational auditory scene analysis, trying to fish the signal out of the noise. As engineers, we think in terms of the signal-to-noise ratio; there's always noise. In humans, there's no noise: there's what I'm attending to, and what I'm not attending to is the noise. It's really hard to define noise in a human, because you can switch your attention at any point in time.

So this is the signal that you get: it's a summation of the two. And if I record one stream at a time, this is the brain response, the spectrotemporal response of the auditory cortex to speaker two alone, and to speaker one alone. And this is the result of the unmixing in the auditory cortex due to the attentional bias: when you're attending to speaker two or to speaker one, you see the correspondence here with speaker two alone and speaker one alone.

>>: So how do they process the signal here? You have ten different electrodes; do they do something?

>> Adrian KC Lee: Yeah, so the spatial dimension here is not the most important part. You can even tune down --

>>: A single electrode could --

>> Adrian KC Lee: No, there should be multiple electrodes.

>>: Do they [indiscernible]?

>> Adrian KC Lee: Yeah, that actually gives you another piece of information. The weighting of the electrodes may give you a clue of where the actual area of the brain is that's most responsive to this. So that gives you the "where" question, right. Here, at least, it gives you the spectrotemporal dimension of the auditory cortex --

>>: [indiscernible].

>> Adrian KC Lee: I believe so. I actually need to re-read that paper.

>>: Okay.

>> Adrian KC Lee: But here's sort of a proof of something that we already know, right — we can do this. It turns out that the auditory cortex, when you have an attentional bias, can modulate the spectrotemporal signals like this.
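In the spirit of that result — though not the study's actual method — attention decoding is often sketched as stimulus reconstruction: fit a linear (ridge) map from the multichannel neural response back to the speech envelope, then ask which talker's envelope the reconstruction correlates with. Everything below is simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
fs, dur, n_chan = 100, 60, 32                    # 100 Hz envelopes, 60 s, 32 recording channels
n = fs * dur

def slow_envelope(n, win=25):
    """A smoothed, positive random signal standing in for a talker's speech envelope."""
    return np.convolve(np.abs(rng.standard_normal(n)), np.ones(win) / win, mode="same")

env_attended = slow_envelope(n)                  # the talker the listener attends to
env_ignored = slow_envelope(n)                   # the competing talker

# Simulated neural channels: dominated by the attended talker, weakly by the other.
weights = rng.standard_normal(n_chan)
neural = (np.outer(env_attended, weights) + 0.2 * np.outer(env_ignored, weights)
          + 0.5 * rng.standard_normal((n, n_chan)))

# Fit a ridge-regression decoder on the first half of the data.
half = n // 2
X, y = neural[:half], env_attended[:half]
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_chan), X.T @ y)

# On held-out data, ask which talker's envelope the reconstruction matches better.
recon = neural[half:] @ w
corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print("correlation with attended talker:", round(corr(recon, env_attended[half:]), 2))
print("correlation with ignored talker: ", round(corr(recon, env_ignored[half:]), 2))
```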
So I guess to conclude, I want to sort of open up where we think the road ahead of us is, and also how that integrates with things that might be commercializable in five to ten years, or a decade later. I think speech perception is fundamentally something that is time critical. In speech and hearing science, the time and frequency axes are always the most important dimensions to us. So one thing that we should start looking into is how different brain regions coordinate at different times. That should provide us with a new way of thinking about computational modeling. It's not just the where; it's not just the anatomical connections — people have been doing that a lot, right. How information is being passed dynamically adds the time dimension, and on the engineering side we've done that using, you know, hidden Markov models or whatever — hidden time-state changes, right. You try to learn that. Why don't we learn from the brain as well?

Another thing is the interaction between top-down and bottom-up attention. It is really hard to ignore a phone when it goes off right now, right. That's like an auditory siren; you automatically switch to the phone. And, in fact, that's why a phone rings: it's to grab your attention. That's the competition with top-down attention: oh wait, I'm supposed to be listening to this speaker and to ignore that phone ring. That's the computation. Now, a question that would be of interest is: is an auditory siren all or nothing? Does a siren just grab your attention, full stop, or can it be attention-grabbing but dependent on your brain state, whether it grabs your attention or not?

I understand that there's great interest at Microsoft in looking into triage alert systems, like the Busybody system. In that sense, you want to know whether to alert the user, whether it's a good time to pull them away from their work. Now, the decision point, of course, is based on a lot of data and learning, and on how likely it is, from the decision model, that it should alert the user. You can flip it to the other side, right: I want to know how likely the brain wants to be alerted, or: hey, brain, I'm going to give you about a 30 percent chance of actually capturing this signal. Just like if you have a green light on your phone, right — if momentarily I'm just bored with my work and I see a green light flashing, I can make that decision myself too, whether I want to grab the phone and look at that message really quickly. So here, it's giving an option of not just an all-or-nothing alarm, but giving the user back the chance to say: hey, here's a signal in the background; oh yeah, I'm actually interested in it because I got a tiny bit bored. So it's an interaction between the computer and the user, and using that from a [indiscernible] acoustics end, how to integrate that in.

Now, of course, there are a few other things we can do, right? Imagine in 10 or 20 years' time, when EEG technology is available such that we don't need all these gels — in fact, dry caps are being produced now, slowly becoming a more mature technology. You could have EEG caps that literally just have micro sensors in your hair, picking up the signals and wirelessly transferring them to your phone. The key is: what signals are you picking up, and what are the key signals from the brain that are usable to interact with the device? That's the key, right, and that's why we're interested in brain-state classification. We want a bank of vocabulary of your brain states such that we know how to look them up. Moreover, looking it up is one thing; we also need to know the dynamics. The other thing is how generic it is. We don't want to train every user all the time. Maybe some day, by using anatomy, we can just say: hey, here's a new phone device out of the box, here are some EEG caps, just put it on this way and it will do the learning itself.

So that's where I'm going to end, and I'd just like to acknowledge the funding. Li was asking what type of agencies are now interested in brain-state classification. From a hearing aid point of view, the National Institute on Deafness and Other Communication Disorders — sort of the hearing research branch of NIH — is obviously interested in the hearing aid application. But defense agencies are also starting to get into brain-state classification, for reasons that might be obvious. And I'd like to thank my collaborators, both at UW and Boston University and also at Mass General, and if you want further information, here's our website. Thank you.

>>: Back to this attention, sort of inverse engineering. Do you have any computational model that somehow can give you similar [indiscernible] that measure, so that the same kind of [indiscernible] afterwards can cover one of these?

>> Adrian KC Lee: There are two questions here, right? One is what dimension should we use — what feature space, what hyperspace should we do that classification in, for computational auditory scene analysis, auditory stream segregation. I think what we haven't been using — and this is the type of research that we want to at least give clues about — is what auditory cues we use, right. So grouping cues are the most important things. Continuity of sound — there's a whole branch of auditory grouping literature. So --

>>: When you showed the result, you don't care about [indiscernible] as long as you can produce something that gives you the same kind of measurements [indiscernible].

>> Adrian KC Lee: No, no. The brain that you measured really cares about the cues.
The person doing it is using all the cues that they can use to segregate this, or else this wouldn't work.

>>: Somehow you've got to get the cue.

>> Adrian KC Lee: Yes. And the interesting point — we were also talking about a metric of how good you are after you segregate, right. It's good to have an L2 norm, mean-square error, the engineering standard go-to. This may not be the most relevant thing for the user, right: a mean-square-error solution might have spikes in it which are really perceptually unpleasant. It's just like MP3 coding — why is it that transparent coding works? What you learn from MP3 is that, from the perceptual side, from the cochlear side, you can exploit the masking. Here it's at a higher level: what grouping cues can make the streaming more seamless, such that you can still have some residual error that is irrelevant for getting a stream out?

>>: So go back a few slides here. So the [indiscernible] is at the base of the signal.

>> Adrian KC Lee: Okay, yes.

>>: So there is a missing diagram that I really want to know about, in terms of the raw signal that you measure.

>> Adrian KC Lee: Oh, from the -- yeah.

>>: And then you have those, and whatever process they do, they will recover this, assuming that [indiscernible].

>> Adrian KC Lee: Absolutely. But we're measuring from the auditory cortex.

>>: You're asking the patient to focus on one of those?

>> Adrian KC Lee: Right. But as I showed you in the previous slides, when I pay attention to space or pitch, it uses different areas. So here, there is a spectrotemporal modulation of the physical stimuli, which is great, but that doesn't necessarily tell me where the attention signal is coming from. Somewhere else in the brain is telling the auditory cortex: modulate this way. So it is the trade-off between wanting to know things locally very well, in terms of the spiking of the neurons, versus wanting a global view of the brain and how things are connected. And I think you need both. Every single level in neuroscience is important: down to single-unit neurophysiology, you have multi-unit recordings, and the [indiscernible] neuroscience approach.

>>: I think the point of this piece of work is that it shows you that whatever recording you have, it is sufficient to [indiscernible]. So what it means is, if a researcher has a model that can be verified through some [indiscernible], and that gives you reasonably good output, you might measure it over here.

>> Adrian KC Lee: Yes.

>>: It shows you can actually recover -- it allows you to investigate more [indiscernible] modeling to figure out.

>> Adrian KC Lee: Right. And you could imagine, if I drop some spectrotemporal element, would that matter? So there is some metric that you can use to get at what the brain cares about versus not.

>>: And so far, no idea about how attention is [indiscernible]?

>> Adrian KC Lee: We're working on that one.

>>: Okay.

>>: What would happen if you did the same test on attention with subjects that have severe attention deficit disorder? I work with a student that has a severe [inaudible].

>> Adrian KC Lee: Yeah, there's a whole host of psychiatric disorders that may influence attention — ADHD and also autism. We're talking about grouping here, right: how different frequencies in the cochlea, even though you hear a pitch coming out, all have to fire at the same time and be grouped together, right. So back in the [indiscernible] days, you talk about, you know, pitch templates, frequency templates.
But ultimately, it's a grouping, binding problem: how do I bind these things together across different frequencies or different dimensions? It turns out that autism is characterized by a lack of, or hypo-, connectivity — the long-range connectivity is not as good. Now extrapolate: it's probably a problem with grouping, or at least some aspect of the ASD spectrum is a lack of grouping. That could also be the case in ADHD; we don't know. And there's a whole host of questions stemming from perceptual grouping up to, sort of, higher learning disorders.

>>: Are there significant differences between gender and age? Is there a tendency to discriminate, to focus attention? I chew my wife out all the time.

>> Adrian KC Lee: I really want to do that study without being fired, but we don't know. We don't know. We haven't systematically studied it. That's the official -- yeah.

>>: So these cues are very high up in the processing, or the understanding of the [indiscernible]. Have you tried working with lower-level cues that influence the subject even without them knowing?

>> Adrian KC Lee: Yes. So let me rephrase this. This is a high-dimensional feature space that you're looking at. There is space, there is pitch, there are all these things. You can tune into whether you can attentionally modulate one special cue, which might be stemming from the brain stem if it's ITD or ILD, or other cues. We're actually doing an experiment right now, just trying to see whether there is a direct coupling from your eyes with very rudimentary neural cues. I think traditionally, speech and hearing sciences stem from the engineering side. Ever since the '50s and '60s, it's always been bottom-up: as good engineers, anything that's top-down is just noise — let's cut it out and understand the bottom-up very well. I think in the last five to ten years, we're starting to try to see what a top-down signal is. So there's not much literature out there, and we have to systematically go back all the way down to the brain stem, all the way down to the cochlea, to see the top-down effect — and that's actually hard to measure.

>>: I wanted to add something about that. We've also done experiments where, in addition to these stimuli, we've put in short, quiet noise bursts that users don't even realize are there. And if you look at the performance on the task, they don't affect the ability to selectively attend to one thing or the other. But then when we look at the brain activity, we see that there is a brain area there specifically involved in suppressing the response to these sorts of distracting noises. We're trying to work out what the network is that allows you to do that.

>> Adrian KC Lee: So even if you don't know --

>>: Yeah.

>>: That's without the user being primed?

>>: Well, the users don't even know that they're there. Yeah, they're doing some task with the distracting sound, and we ask them afterwards if they even noticed it was there. Every subject said no. And yet, there is this brain [indiscernible] that's involved in allowing them to not notice it.

>>: See if I can raise a question. So it sounds like this attention process is kind of generic. But is it true that it really all happens kind of at the top? Or if you measure what's going on in the auditory pathway, as the selection changes, is there a change in that pathway?

>> Adrian KC Lee: You would hope so. And I think that is the reason why hearing aids don't work. Hearing aids are a bottom-up amplification. We want to have the top-down signal. We want the control signal back.
>>: That would apply, then — even if it really is a generic selection mechanism, then the mechanism for going down to the feature level is direct too.

>> Adrian KC Lee: It could be. That would be a lot of efferent pathways that are not mapped out systematically. I think the hearing literature has always been about the afferent pathways, the bottom-up pathways. How the top-down pathways go all the way to the outer hair cells to specifically tune the cochlear amplifier would be the truly interesting part, right. It's hard to get access to that data.

>>: You're saying you're using spatial cues [indiscernible] precisional cues and pitch cues or harmonic cues. Some other big cues are onset and offset cues, [indiscernible] not harmonically structured.

>> Adrian KC Lee: Yeah, that's the huge feature space right there. Onset tends to be more important than offset, but there are people that talk about how the scene changes between the onset and offset — there are studies; [indiscernible] over at UCL specifically studies the transient onset/offset. To me, onset is, again, a huge cue. What exactly are we doing, as an organism, trying to solve auditory scene analysis? It's to make sense of the world in terms of the things that you care about — whether, you know, the lion is coming over from the right, as opposed to some roaring sound coming in at all these frequencies. It's important for you to solve that problem, so you use all the cues. Onset is great because, as sound emanates in space, if the components all come from the same source, they will have the same onset; they will all have the same phase relationship. And I bet that our brain capitalizes on that information.

>>: Typically, offset is [indiscernible] by the reverberation, and it is way more basically [indiscernible] onset.

>> Adrian KC Lee: That's true. Onset is much more reliable. That's the great Bayesian brain that we have: we would weight onset as greater evidence than offset.

>>: So the same in [indiscernible].

>> Adrian KC Lee: Yes, yes.

>>: There are cues for onset or offset that aren't affected by reverberation at all, which is watching someone's mouth or something like that.

>>: True.

>>: And, I mean, we're just beginning to look at it, but it's very likely that there's crossover of these signals — that help you do this attentional gain — that don't even come from acoustics.

>>: I thought we were just talking [indiscernible] here.

>>: So for these cues [indiscernible], is there no attention injected in this at all?

>> Adrian KC Lee: No, no. I do not want to convey that at all — especially since I just checked the box that this talk is going out; it's available to everyone. No, no. This is not --

>>: So people --

>> Adrian KC Lee: There are many studies in the past that talk about attention. There are great debates on whether scene analysis is pre-attentive or not: does attention matter when you segregate streams? There's been a great, long debate since the early 2000s. A lot of the studies before, I think it would be fair to say, are from EEG event-related potential studies, and they answer different questions. I think the network approach is quite new, just because stringing MEG, EEG and all these things together took a while — it took time to develop the technology. So in terms of how the auditory cortex communicates with the frontal eye fields, I could say that, as far as I know, there is no other literature that talks about that.

>>: Now, how about the CASA community, the computational audition people — do they -- they have a group of --

>> Adrian KC Lee: I know that, like Martin Cooke, there are a lot of — of course, they care about auditory grouping. But one thing that came out of my doctoral thesis: a lot of CASA people think you have to phrase it in a probabilistic sense — each time-frequency pixel belongs either to speaker one or to speaker two, or, you know, whichever speaker it is.
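The time-frequency-pixel formulation can be made concrete with an ideal binary mask — an oracle construction, since it uses the clean sources that a listener or a machine never has: assign each spectrogram pixel to whichever source is stronger there, then apply that mask to the mixture to "attend" to one talker. A simulated sketch with stand-in signals:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
s1 = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))   # stand-in for talker 1
s2 = np.sin(2 * np.pi * 340 * t) * (1 + np.sin(2 * np.pi * 5 * t))   # stand-in for talker 2
mix = s1 + s2

_, _, S1 = stft(s1, fs, nperseg=512)
_, _, S2 = stft(s2, fs, nperseg=512)
_, _, M = stft(mix, fs, nperseg=512)

# Oracle assignment: each time-frequency pixel goes to whichever talker is louder there.
mask = (np.abs(S1) > np.abs(S2)).astype(float)

# "Attend" to talker 1 by keeping only its pixels of the mixture.
_, s1_hat = istft(M * mask, fs, nperseg=512)

n = min(s1.size, s1_hat.size)
err = np.mean((s1_hat[:n] - s1[:n]) ** 2) / np.mean(s1[:n] ** 2)
print("relative reconstruction error for talker 1:", round(float(err), 3))
```

A human listener, of course, has no oracle; deciding which pixels to keep is exactly what attention and grouping cues have to accomplish, which is the contrast being drawn in the answer that follows.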
>>: Then how does attention come into all this?

>> Adrian KC Lee: So this is the interesting part. You can measure the same thing in a human, and it turns out that they don't have to add up. There's no reason for a subject to think that, you know, each time-frequency pixel has to be conserved probabilistically. We phrase the model that way because, computationally, we need that constraint. It turns out, when you do all the -- there are asymmetries in the auditory cues, so we actually parse a scene differently, as a problem statement, than the way the CASA people phrase it.

>>: I see.

>>: So is the attention [indiscernible]?

>> Adrian KC Lee: We don't know. I know that colleagues in our department are starting to look into it. I can tell you that in visual scene analysis, Pawan Sinha, who was actually a reader of my dissertation back at MIT, goes to India and looks at a particular group of children who have this disease — I don't actually know what the term is. Basically, it's really hard for them to see, but with one quick procedure, their vision can be fixed. So they can look at when the children can't do scene analysis versus when they can, how quickly that ability develops, and whether it is innate. So from the visual literature, there's work that can point us to the innateness of our ability to do this. In auditory scene analysis, unfortunately, we don't have that information yet, as far as I know.

>>: Some people are just more absent-minded. Do you think that's genetic?

>> Adrian KC Lee: This is the spouse filter that he was alluding to. I'm pretty sure I don't need to test that. That's got to be universal.

>> Li Deng: Thank you very much.

>> Adrian KC Lee: Thank you.