>> Arjmand Samuel: So, it's my pleasure and honor to welcome Professor Les Atlas to MSR. Les
is a distinguished professor at UW; he's been involved in different things. Mainly he is
working in signal processing of acoustics. Sound is his specialty, I believe, and sensing it
and working on it. So, Professor Atlas, please.
>> Les Atlas: Thank you. So I want to mention that this talk that I'm giving is a slightly updated
version of a talk I gave a few weeks ago in Cambridge, and it was given to a group of hearing
researchers, Brian Moore and his group and other people, Roy Patterson's group at Cambridge,
UK, and the reason for that is kind of right here: the Bloedel Hearing Research Scholar is
something that I was awarded about a year ago, which gave me some release time and some
motivation to work on how the ear works, because people have run into a limit in what they feel
the science and mathematics can tell them about the ear, and they wanted some help with that
limit. So, I also want to thank someone who's been very inspirational in this
work, Bishnu Atal, who many of you have heard of and are sponsors.
So, let me go ahead and get into my first point. And it's really just what I said: the
conventional tools and science for audio signal representations that we work with now,
that we're taught, are insufficient, and they limit our creativity. We are taught a certain set of
things that are a standard part of electrical engineering, computer science, physics, mathematics,
and those are limiting. Better tools and science would give us the ability to, for
example, come up with enhanced listening, be it for normal hearing, impaired listeners, or
machines: multiple simultaneous sources, noise, and reverb. These are standard problems that
people face in porting systems from the laboratory to the real world. And let me talk about
existence proofs that there are better solutions.
The first one is, we know our ears work well. And we can look at top-down approaches, people
like Shihab Shamma or Li Deng or others who work in dynamics. That's essential; having
models of how sounds evolve over time is an absolutely essential part, and I'm not saying that's
unimportant. But I'm going to stress something else that's important, which is more the
bottom-up, feature-based approach, and just from the standpoint of what really can be done if
you expand your imagination, let go of some of your previous notions and previous
mathematics, and allow for the fact that there might be a different world out there than what we've
learned. And I've got a picture here from a recent publication that actually made use of some
of our previous concepts, which is the notion of a modulation spectrum and modulation
filtering. This is a model of what the ear might be doing, according to McDermott et al. in Neuron,
and they cited a paper in Transactions on Signal Processing, something I've talked about before called
modulation filtering. Their argument, which is an argument that we've gone a bit beyond,
and I will be doing that the rest of my talk, is that, well, first of all, the way that the ear works,
it's got a bunch of subbands, and each subband is produced by something like a bandpass filter,
there's an envelope, and a compressive nonlinearity. Envelopes come out of every subband, and
whenever you talk about an envelope or modulation, there's another word for modulation,
which is product. Modulation, multiplying, product all mean the same thing, which
means if you really want to find a modulation envelope, you have to de-multiply the original
signal, find its carrier, or do something else to get rid of that original carrier. And I'm going to talk
about that old communications model, why it's old, why it's a bit dated, and how we can
move on from that point. It's the conventional model that people are using. This is a 2011
paper; they've got some pretty new results, nice results, using something called modulation
filtering, which I've argued for previously, but what's really missing-
>>: Is that experimental work or is it modeling work?
>> Les Atlas: Is what?
>>: Is this experimental work or is it modeling work?
>> Les Atlas: Both.
>>: It's both.
>> Les Atlas: So, this Neuron paper is both, and you'll find if you look in the literature, you'll
certainly see sets of both. But what's missing, in part of the scientific community and also the
engineering community, is this: people talk about a modulation envelope. And that's a well accepted
concept, because the way that the ear works, and the way that vocoders work, and the way that
you can get intelligible speech, is about these envelopes, these slow-moving envelopes, one for
every subband; you might have eight or sixteen or even four subbands, you can excite them with
noise, and you get intelligible speech. But when you have two talkers, or you have noise, or you
have reverberation going on, that falls apart. So the problems that we started with, which are
multiple sources, noise, reverb, are things that don't hold up well to having the envelopes only.
So the notion of what's called temporal fine structure is what the community is after. The problem is
they don't have an agreed-upon definition. There is kind of an operational definition; I'm going
to tell you why that definition is wrong, and I'm going to give you a substitute way of looking at
it.
Now, let me just give you another existence proof, something that's actually related to this
problem. When you go into a new room and you listen to a group of people talking, you can
single out a particular talker in a highly reverberant environment like that. You can understand
the speech, even at a negative signal to noise ratio. And there's another case of something
like that: radiofrequency, walking around, be it Wi-Fi that's running at 450 or 600
megabits per second, or 4G Internet, or 3G Internet. If you take a look at something which is
a city kind of layout, that's actually a Wi-Fi record, that's not 4G Internet. With little devices that look
like this thing that's small on the left, you can get a layout like this between buildings that tells
you about how well you'll do. And the key thing about that is when you use Wi-Fi or use 4G
Internet these days, there's no training sequence. It's a blind equalization that's done. And
there are tricks being done in Orthogonal Frequency Division Multiplexing that are remarkably
similar, as I will argue, to what our ear might be doing. That's the analogy I'm going to
make toward the end of my talk. So let's start heading toward that.
And first of all, to get to that point, that final point of OFDM, orthogonal frequency division
multiplexing, let's start with some history. The idea of breaking things into frequencies or
subbands, the first person to do it really was Alexander Graham Bell, who tried to send multiple
Morse code signals down a single line simultaneously, the multi-user problem for Morse code.
He failed in doing that, but by mistake, he was able to send voice over that same line; that's
called the telephone now. That was in about 1875. In 1877, Helmholtz ended up
writing a book, and a key part of that book is quoted right here: the notion of tone. People
used to count out zero crossings if they could, they used to try to make an oscillogram
or time-domain pictures, but they didn't have a notion of frequency; they didn't link what Fourier did
with what happens with sound. It was Helmholtz who made that link. Once we did that,
suddenly the area took off. 1906, the notion of product models, which are still being used, that's
AM radio, was initiated back then. 1933, vocoding models, AM and FM together. FM radio came
then, and the problem is, we're kind of stuck at that 1933 point in terms of the modeling we're
using for understanding how the ear works, and for understanding things like even mel-frequency
cepstra and what they're doing. Now, mel-frequency cepstra go well beyond what AM-FM
models do, but inherent in them, inherent in their formulation, is right there.
Let's take a look at something that's more modern: OFDM. And I'm really stressing the FDM
part; it doesn't have to be orthogonal, it's the notion of frequency division multiplexing. The first
papers are from 1966; I list 1942 because the first person to talk about frequency division multiplexing,
or spread spectrum in general, was Hedy Lamarr, the actress, in 1942. And that's used in all modern
Wi-Fi, 4G, and high-speed data communications. Now, it handles just the problem we are talking
about, multiple talkers; that's called multiple access. It uses reverberation; it doesn't just blindly
equalize, it uses reverb to its advantage in ways that I'm going to show you. Without the reverb,
the multiple paths between buildings, it would work worse. It works best if thousands of
frequency channels are available for one data stream. The more frequency channels, the
better, except for battery life. It's kind of like the ear: we have thousands of frequency
channels, highly overlapping, in the ear. They're not orthogonal, but they offer some novel and
surprisingly simple concepts for auditory encoding, whether we're trying to understand what
the ear does from a scientific standpoint, or come up with new algorithms that are insensitive
to noise and can single out sources. Now, what's different about it? We're talking about some
communication scheme, but what's the single kernel of difference between what auditory
models use and assume, the stuff I've even talked about before, and what's done in OFDM?
Well, it's this fact: it's multiple access, primarily additive, and not a product model. I'm going to
show you what I mean by this soon. When people talk about modulation coming out of every
subband, and then the fine structure that might be coming at the pitch rate, or at a higher
frequency rate, they're presuming a product model. A product model causes problems, which I
will show. If you use an additive model instead, things get better. Let's take a look at the
auditory system, just a simple model of it. And I'm going to keep a simple model from known
material, not making any strange speculation here: we have an input signal that's
broken into a bunch of subbands. It could be thousands if it's our auditory system; if it's some
device, you know, modern audio compression, it might not be thousands, you are limited by
compute, but in any case, you have a bunch of subbands that have certain properties. Now
what's special about our auditory system is the hair cell rectification right here. For the hair cell
rectification I'm showing an instantaneous nonlinearity that compresses the signal in a way
where, if the signal is negative going, it either provides zero or it lowers the resting rate, so it
has a slight amount of negative value, which I show by having this white line below the origin.
On the positive side, it pretty much duplicates the input. A half wave rectifier: not a perfect
model, but pretty much what the transduction between acoustic pressure and neural stimuli
does, followed by, now, I'm going to foreshadow what's coming, a crummy lowpass filter.
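A minimal sketch, in Python, of one subband's worth of that front end: half-wave rectification followed by a crummy, leaky-integrator lowpass. The sample rate, tone frequency, and roughly 50 hertz corner are illustrative assumptions, not values from the talk.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                              # sample rate in Hz (assumed)
t = np.arange(0, 0.1, 1 / fs)
subband = np.sin(2 * np.pi * 600 * t)   # stand-in for one subband's output

# Hair-cell-like instantaneous nonlinearity: half-wave rectification.
rectified = np.maximum(subband, 0.0)

# "Crummy" lowpass: a first-order leaky integrator,
#   y[n] = a * y[n-1] + (1 - a) * rectified[n].
a = np.exp(-2 * np.pi * 50 / fs)        # ~50 Hz corner (assumed)
slow_envelope = lfilter([1 - a], [1, -a], rectified)
```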
If you take a look at every hair cell in the ear, there are several neural fibers coming off it,
anywhere between one and eight for the hair cells that go up to the brain. And what comes off
them has a lowpass filter, but it's a leaky-integrator lowpass filter. Some of those fibers carry
information at higher rates than others. That's never really been looked at. So let me give you some
alternative views. Here's the conventional view of what happens in every subband in our
auditory system; here's the conventional view from the standpoint of auditory research and
psychoacoustics. Take a Hilbert transform to form a complex signal called an analytic signal, put
things in polar form. Okay? That's where the problem is. You put things in polar form and, as you'll
see, there is that product. That product's not going to be our friend, as I'm going to show you.
Because you start with something which will be a beautiful looking envelope: if you looked at
the original signal, a two-sided real signal, formed its one-sided analytic complex signal, and then
looked at its magnitude, which is again real and nonnegative, it's a beautiful looking temporal
envelope of the signal. It tracks it really nicely. You don't need a lowpass when you do this. But then
what people do is, they get an envelope that looks reasonable, nonnegative real, and then
they take what's left over, because that's what you need to put the whole signal back together,
yeah-
>>: What's the motive of the processing people have done? They don't even take the phase.
They just take [inaudible] energy as a function of time.
>> Les Atlas: Well, the energy as a function of time would be very similar to this if you put a
square there, for example, and lowpass filter it if you wish, if you don't use the analytic signal.
But the problem then is all you have are envelopes. Okay? So the standard thing is, if you only
have envelopes, you don't have what we call the fine structure, which is this branch down here;
you're not able to track multiple talkers, and not able to track a single talker in noise. These are
the things that I'm arguing for.
>>: [inaudible] into the phase there.
>> Les Atlas: This is the standard way people go after what they call temporal fine structure.
That's a way I'm going to argue against.
>>: Multiple [inaudible] simply take into spike time?
>> Les Atlas: Take what?
>>: Into spike time, of the low spiking one.
>> Les Atlas: Oh. Well the spiking comes from somewhere, yes. Okay. There's-
>>: Actually, I think I've seen, you know, all the correlation models, that they recall the spiking
time, whereas they're not, they cannot typically [inaudible] represent by this [inaudible] white
phase, the phase the other [inaudible].
>> Les Atlas: Well, I'm not going to argue with that. Okay? There are issues. But in terms of
psychoacoustic experiments these days, and some physiology that's being done with mammals,
which Shihab does, and Ian and other people at Cambridge do, they do use Hilbert phase, and
that is the most common thing that's done. The neurophysiologists and psychoacousticians tend
to use the Hilbert phase, which is that lower signal. You might be a little uncomfortable with it;
I'm sorry, I am too. Okay? So that's one example, okay? But here, let me make a more general
point: if you decompose a signal into a slowly varying, energy-like component, multiplied
by a finer time structure component, that's still going to be problematic. Any of those.
Especially if this is unimodular, or magnitude equal to one. Okay?
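For concreteness, here is a sketch of that conventional product-model decomposition, the Hilbert envelope times a unit-magnitude fine structure; the AM test tone is my own illustrative choice, not a signal from the talk.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.1, 1 / fs)
# An AM tone standing in for one subband of a real, two-sided signal.
x = (1 + 0.5 * np.cos(2 * np.pi * 30 * t)) * np.cos(2 * np.pi * 600 * t)

analytic = hilbert(x)                        # one-sided analytic signal
envelope = np.abs(analytic)                  # nonnegative real envelope
fine_structure = np.cos(np.angle(analytic))  # product-model TFS in [-1, 1]

# The product form reconstructs the original subband exactly.
assert np.allclose(envelope * fine_structure, x)
```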
So let's take a look at the alternative. That's the problem: it's a multiplicative view. So what
we're going to propose instead is something that's closer to what's done in frequency division
multiplexing. We're going to start with rectification; the ear does that. There are certain
properties of rectification, which are coming soon, that you're going to see. Really
interesting properties that no one really takes advantage of. That's half wave rectification.
Then there's this notion of integration after that, which is a lowpass filter; that top branch is a
modulation envelope. Now the new part is, instead of it being a product, it's additive. What we're
going to add is what we call a fast envelope. If you're talking about phase locking, you're phase
locking to this fast envelope or carrier; it might be at the pitch rate, or, if it's a lower subband in
speech, it might be at the input signal rate. So this addition, riding on top of the envelope, is at
a much higher frequency. That's a very different model. It's different algebra than we had before.
Let's talk a little bit more about this new additive model. So we're talking about the top being
something which is identical to the standard half wave rectifier, however you'd like to do it,
whether you use the Hilbert envelope or use rectification followed by a lowpass filter on a real
signal. If you look at, for example, Stuart Rosen's definitions, you tend to have frequencies up
to about 50 hertz and below for that envelope, maybe below 16 or 18 hertz in some people's
definitions. But this new part, which is additive, that bottom fast envelope, starts with a low
corner at about 50 hertz, below the pitch rate, below the lowest pitch, and goes on up to
whatever phase locking rate you can have. If you look in the latest edition of Brian's book,
you're up to four or five kilohertz for your phase locking rate. So I'm calling it a bandpass filter
just to show it doesn't go on forever.
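A sketch of the additive decomposition with those corners; the filter orders and the exact 50 hertz and 4 kilohertz corners are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def additive_envelopes(subband, fs):
    """Half-wave rectify one subband, then split the result additively
    into a slow envelope (below ~50 Hz) and a fast envelope (from ~50 Hz
    up to an assumed ~4 kHz phase-locking limit)."""
    rectified = np.maximum(subband, 0.0)
    b_lo, a_lo = butter(2, 50 / (fs / 2), btype="low")
    slow = filtfilt(b_lo, a_lo, rectified)
    b_bp, a_bp = butter(2, [50 / (fs / 2), 4000 / (fs / 2)], btype="band")
    fast = filtfilt(b_bp, a_bp, rectified)
    return slow, fast   # rectified ~ slow + fast, plus what's above 4 kHz

# Example (fs must exceed 8 kHz so the 4 kHz corner is below Nyquist):
# slow, fast = additive_envelopes(subband, fs=16000)
```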
And let's look at some of these models. For the conventional model, I'm using a Hilbert phase
because that's the most common among the people I was talking to in Cambridge, but you can
substitute your other favorite product model for the high-frequency stuff that you're phase
locking to. First problem with it: the complex exponential phase, and this is a theoretical issue, does not
have an inverse function in the standard sense. If you pair it with its original modulation, you're
okay. But if you make any change, you're stuck, mathematically or algebraically. The
underlying complex polar form's magnitude must always be nonnegative. We're going to see
the problems that develop. Because whenever the real part attempts to dip below zero, you've
got a subband of speech coming through a bandpass filter, in the ear or whatever device you're
building, and it wants to cross zero. It's not nonnegative. Whenever it crosses zero, this particular
Hilbert phase, or whatever definition you use for a polar form product model, is going to have
a problem. It's going to abruptly change its phase by plus or minus pi. It's not connected to any
known physiology. And let's contrast this to what I'm proposing instead, which is an additive
model: the fast envelope.
Now there's a deep theory behind it; I didn't pull this out of a hat. The reason it's connected to
frequency division multiplexing and high-speed data communication is something called
complementary statistics. If you take a look at, more recently, the work of Peter
Schreier and Louis Scharf, and before that, Picinbono in France, Transactions on Signal Processing,
1994, you can see new information in signals that hasn't been used in most audio processing.
It's very rarely used in audio processing; the only papers I've seen are ICASSP 2012, maybe one in
2011, that I'm a co-author on. I haven't seen anyone else use it in audio. It's very rarely used
for natural signals, in fact. This notion of a complementary envelope is usually used for complex
communication signals, where you have a cluster in the complex plane of 8 or 64 different points
in your signal constellation. And because you have this cluster in your complex plane, it's not
circular. If you look in the complex plane at the real versus the imaginary part, if you wish to use
quadrature that's fine, there's a pattern which is not exactly circular. That pattern's not
circular; it's got complementary statistics that are meaningful. It's got new information. Any
signal with periodic correlation, it could be speech, it could be sonar from a propeller, has
significant complementary statistics.
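To pin down the terminology, these are the standard second-order definitions in the notation of Schreier and Scharf (stated here for reference, not from the talk's slides). For a complex signal x(t), the ordinary covariance and the complementary covariance are

```latex
R_x(t,\tau) = \mathbb{E}\left[ x(t)\, x^{*}(t+\tau) \right],
\qquad
C_x(t,\tau) = \mathbb{E}\left[ x(t)\, x(t+\tau) \right].
```

Circular (proper) signals have C_x identically zero; a noncircular constellation, or a signal with periodic correlation, has a nonzero C_x, and that is the new information being referred to.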
>>: So is there any argument that this kind of [inaudible] is connected to auditory physiology,
no?
>> Les Atlas: I'm getting there.
>>: Okay. Sorry.
>> Les Atlas: I'm getting there. Okay? I'm about to get there, a couple slides. They still make
use of a simplified lowpass, and what I'm arguing for is the additive fast envelope; I'm trying to
get at that. Okay? When you look at these kinds of papers, they're talking about expected
values, things that are in the limit. Expectations where you have either infinite limits, or you
assume ergodicity or other ways of finding an expected value; if it's not ergodic you also have to assume
that the signals are harmonizable, other things that you need because they're not necessarily
stationary, yet you need to be able to find some statistics on them. It's complicated
mathematics, hard to get through, with these complementary statistics. What we do instead
these days, and it's not going to be in my talk, I don't have time for it today, is a generative model.
And what a generative model can do, it's much like having a DFT instead of a power spectral
density. If you have a power spectral density, that's a pencil and paper thing: you need to
know the underlying true autocorrelation function, and you take its Fourier transform with pencil
and paper. But if you have data you can't do that. You do a DFT, you do an FFT, you might
window it, you might use linear prediction with another characteristic. Those are all spectral
estimators. One estimator of the complementary part is the fast envelope. It is an estimator of it. It's
not an expected value, it's not a pencil and paper thing, but it works with the data.
Let me show you why. This is a slide where I can show you why that's true. This is the only math
I'll be giving you today, and it's actually very interesting. The reason is, if you take a look at this
generative model of what the complementary part is, this new information that has to do with
the non-circularity, if you took the analytic signal, the stuff that's really important in
communication signals, the reason you get 600 megabits per second, if you look at that, you get
a double frequency term where that complementary part is, and a zero frequency term where
the envelope is. The standard envelope is the signal beat by itself down to baseband; the
complementary part is at double the frequency that you started with. If your carrier is at 2 kilohertz,
the complementary part is at 4 kilohertz. That's what you'd normally get in complementary statistics.
Now when you work with a half wave rectifier, it gets a little more believable and plausible for
the auditory system. This is why. Let's assume a sinusoid's coming in. What happens to
the Fourier series for a rectified sinusoid gets really interesting. We have a sinusoid coming in; what
comes out? For a sinusoidal input, it's a sinusoid at the fundamental frequency, and, with a
minus pi phase shift, a minus-quadrature, double frequency term.
So here's the story. If you're talking about complementary statistics as new information, it's
usually a double frequency term. We know 1.5 kilohertz, 2 kilohertz is important for speech.
The double frequency term would be at 3 or 4 kilohertz. It's hard to argue for phase locking up at
those frequencies; really hard to argue, with Brian Moore and his group, for phase locking at those
frequencies, it's very rare. But if you're right at that 1.5 or 2 kilohertz, it's not so hard anymore. So
stuff that leaks through at the fundamental is right at the frequencies where you can get phase locking,
which is that complementary information. And by the way, your first harmonic for a sinusoid
is in minus quadrature. That's not just the nature of the Fourier series, it's for a sinusoidal input
specifically; I can't generalize this to the broadband case because it's nonlinear, but for a narrowband
case, it would be like this. The higher order terms for the sine terms are very small; they go away,
basically. The higher order terms for the cosine terms are also quite small, but they don't totally
go away.
So there's a key principle going on here that's linking a fundamental to its harmonic. If we're
talking about a subband that's centered at 3 kilohertz for speech, at 6 kilohertz, that first
harmonic, there's not going to be much phase locking there, so we're really not getting much. But if
you are talking about something at 800 hertz, 1600 hertz, you will get phase locking, and
maybe this term and this phase shift are important. So let's keep that in mind.
>>: Just from this I couldn't tell that there's a 270° phase shift.
>> Les Atlas: Oh, all I did is, I just did the algebra. I don't have the derivation on this slide, okay,
I can show that derivation separately. Okay? But the derivation is, if you start with this as a
signal model and then determine a_k, a_0, and b_k using the standard definition of a
Fourier series, you get these functions that I plotted. So all that is, is a closed form solution for
a_k and b_k, even though it's a series, an infinite series.
>>: [inaudible] for a series.
>> Les Atlas: Yes. If you use the standard definition of a Fourier series, there's a definition of what a_k is
equal to, another definition of what b_k is; you solve for them and then you plot them, and
you get these pictures, for this particular s(t). That's all that is. Okay. So this is the effect of
half wave rectification, which is interesting, a lot more interesting than I would've ever thought.
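For reference, the closed form being described is the textbook Fourier series of a half-wave rectified unit sinusoid (reconstructed here from the standard result rather than copied from the slide):

```latex
\max\{\sin(\omega_0 t),\,0\}
  \;=\; \frac{1}{\pi}
  \;+\; \frac{1}{2}\sin(\omega_0 t)
  \;-\; \frac{2}{\pi}\sum_{k=1}^{\infty}\frac{\cos(2k\,\omega_0 t)}{4k^{2}-1}.
```

The fundamental survives at half amplitude; the first harmonic at twice the fundamental is the minus-quadrature, double-frequency cosine term, with amplitude 2/(3*pi); and the higher cosine terms fall off like 1/k^2.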
So let's now do some demos that relate to these concepts.
What I'm going to do now is play two sounds that differ. This is a really crude
model of one versus two talkers. And the way I'm going to do this crude model is not really one
versus two sources; it's something which is consonant versus dissonant. The one which is
consonant is when they're harmonic; it sounds like one source, no question about it. When
they're dissonant or inharmonic, which is the 412 and 740 hertz pair, they don't necessarily
sound like one source; they sound like a chord, a crummy chord. And what we're going to do is
play the first one, then the second one, and we'll keep looping through, just to hear the
difference between them. You're going to listen to them now, and then I'm going to show you
the fast and the slow envelopes for these. That's what I'm getting to. I'm making the argument
that the fast envelope tells you the difference between consonance and dissonance, between
one source and multiple sources. But let's just start with the test signals so you understand how
they sound.
[sound]
>> Les Atlas: That's this one.
[sound]
>> Les Atlas: That one.
[sound]
>> Les Atlas: That one again.
[sound]
>> Les Atlas: Okay. One sounds like a single source; the second one sounds maybe like a train
whistle that has two reeds in it, whatever it is. Not a perfect chord, but certainly more dissonant
than the first one. Now let's take a look at these two. If I go with just the first case, which
was this one, not playing, there it is.
[sound]
Just that thing where they're harmonic and it sounds like one source. What I'm going to do, for
the sake of my discussion, is talk about half wave rectification: start with the usual, followed by a
lowpass filter. It's the simplest model, auditorily acceptable to most people, of what our auditory
system does to find an envelope. I'll call it the slow envelope, to distinguish it from the new concept,
the additive thing, not a product: the fast envelope. Okay, here's a way to look at it. You can
think of the single signal that's coming through the half wave rectifier. The lowpass filtered
version is just following the peaks of it, okay? Because of the lowpass filter. What's left over?
The bandpass filtered part. With a high enough corner on the bandpass filtered part, these two
summed together give you what came out of the half wave rectifier; they give you everything except the
negative-going part of the original signal. So what you're doing-
>>: So for the bandpass filter, what is the band frequency?
>> Les Atlas: So for this particular one, I think, what we used here, was it the corner at 450 for
this one?
>>: Four hundred fifty over two.
>> Les Atlas: 450 over 2, okay, so it’s half way.
>>: Is that the band?
>> Les Atlas: That's the low corner.
>>: And the upper corner was 3 [f c] over two, so three times 450 over two. So you would get
that single frequency term.
>> Les Atlas: Right. And likewise for this one, which shows a corner that was, we made sure that it
was flat. And we'll see the results shortly. But I'm going to be comparing the results to this
case, which is the inharmonic case. It takes a while to load.
[sound]
>> Les Atlas: Sounds like two things, maybe, a train whistle at least. We're going to do the same
thing for comparison. This is the more dissonant case, like two sources maybe: the slow
and the fast envelope. And we're going to look at them and compare them. Let's compare the
harmonic fast envelopes first, the 300 and the 600, clearly harmonic. Here's the fast envelope
coming out of the 600 hertz input. Look at its peaks. Line them up. There's the 300 hertz input's
fast envelope. Perfectly lined up. And the valleys also line up with the peaks of the other one.
The fast envelope structure aligns in time. So if I had two subbands, one corresponding to
about 300 hertz, another corresponding to about 600 hertz, look at the fast envelope that
comes out; call the lowpass filter after the half wave rectifier a leaky integrator, look at what
leaks out, line them up with their phase locking, and the phase locking lines up nicely, if you
correlated the two.
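Here is a sketch of that comparison, a reconstruction of the demo in Python with the harmonic 300/600 hertz pair against the inharmonic 412/740 hertz pair. The second rectification and the peak-of-cross-correlation measure below are my assumed stand-in for "line them up with their phase locking," not the talk's exact computation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 16000
t = np.arange(0, 0.5, 1 / fs)

def fast_env(tone_hz):
    """One subband tone -> half-wave rectify -> bandpass [f/2, 3f/2]."""
    rect = np.maximum(np.sin(2 * np.pi * tone_hz * t), 0.0)
    lo, hi = tone_hz / 2, 3 * tone_hz / 2
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, rect)

def alignment(e1, e2):
    """Shared periodicity of two fast envelopes: rectify again, remove
    the mean, take the peak of the normalized cross-correlation."""
    r1 = np.maximum(e1, 0.0); r1 -= r1.mean()
    r2 = np.maximum(e2, 0.0); r2 -= r2.mean()
    xc = np.correlate(r1, r2, mode="full")
    return np.abs(xc).max() / (np.linalg.norm(r1) * np.linalg.norm(r2))

print("harmonic   300/600:", alignment(fast_env(300.0), fast_env(600.0)))
print("inharmonic 412/740:", alignment(fast_env(412.0), fast_env(740.0)))
# The harmonic pair scores much higher: its fast envelopes stay locked,
# while the inharmonic pair shares no common periodicity.
```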
>>: That's funny. Just [inaudible]-
>> Les Atlas: Sure.
>>: Just to make sure I'm following. Aren’t those [inaudible] flipped? Isn't 600 the one in the
top and 300 the one in the bottom?
>> Les Atlas: You're right. Sorry about that. Okay. I could call this a scale and we'd get away with
it. Sorry about that. Yes, they are. That's a mistake. Thank you for pointing that out. Good
eye.
>>: Those envelopes, they line up in the same way as the original harmonics that you used to
create the sound line up. So what would happen if the original harmonics were shifted
relative to each other?
>> Les Atlas: So you mean that the original harmonics are all shifted together?
>>: Yeah, so could you go back to the last one?
>> Les Atlas: Okay, because I will be showing you the inharmonic case soon okay.
>>: No, I’m talking about the harmonic case.
>> Les Atlas: Okay.
>>: So yeah, it's a, hang on please, on the top right, you see that, so the top right is the sum of
the left signals, so you see that they have been aligned such that peaks of the 300 hertz line up
with the peaks of the 600 hertz. And of course as a result, the envelopes line up in the same
way. What would happen if you were to shift the 600 hertz wave by, say-
>> Les Atlas: 50 hertz.
>>: More than one?
>> Les Atlas: Both by 50 hertz, you're saying.
>>: No, if you shift it in time by a quarter period or so.
>> Les Atlas: Oh, if there's a relative phase delay between them.
>>: So pretty much I believe this one makes them belong to the same, us to perceive them as
belonging to the same signal, that they are in phase. My guess is that if you put, let's say, 57° for
the initial phase shift between 300 and 600, we may perceive-
>>: [inaudible] explanation is different. That particular phase relationship has to do with the fact
that they [inaudible] output of the half wave rectifier, right? Because with the half wave
rectifier, the second harmonic, the second beat has to be aligned with the other one to produce
the continuation of the low peak, because of the rectification.
>> Les Atlas: That's a factor. But if we went ahead and did a relative phase shift, and if you think of
what the ear does with the traveling wave, they're not in phase by the time they get to the hair
cell. Okay-
>>: I would expect in the ways they move around it won't change much our-
>> Les Atlas: Same perception, so-
>>: And every model would predict that. [inaudible] model would predict the same thing.
>> Les Atlas: Yeah. And so when I brought this example up to the psychoacoustics and
physiology people in Cambridge, it's now a discussion that went on for a week. And this to me
is a success. Okay? That's what I wanted to do. I want people to think a little differently.
>>: [inaudible] because of all that randomness, when you go to the phase spikes, you don't
wash out all these differences.
>> Les Atlas: You're looking for correlation somewhere else down the line. So if you do a
relative phase shift, and that's never exact, you get random firing at the peaks, okay,
and if there's some correlation somewhere down the line, in this huge structure in the auditory
midbrain, it can say these are likely from the same source. And all I want to do is contrast, so we
could put a phase shift in, and we could get something where they don't line up perfectly in the plot
that I showed you. Let's go back to the plot. We'd get something where we don't get exactly this
alignment. Okay? There'd be a relative phase shift in the bottom one versus the top one, but
over time there'd be correlation between the two, and that correlation is measured
somewhere else up in the auditory midbrain.
>>: [inaudible]
>> Les Atlas: Yeah, the cochlea, and over time, be it 10 milliseconds or 40 milliseconds, the
statistics would accumulate and say, hey, we've got one source, because they're firing in
synchrony in some way, even though there's a relative delay. Yes.
>>: And the magnitude of these envelopes would also be affected by the relative phase of the
original signals?
>> Les Atlas: Nope. Here's the magnitude of them right there. They're both, the slow-
>>: As you change the time alignment of the signals, the magnitude of this spectrum
stays the same?
>> Les Atlas: The magnitude of each subband output stays the same. Okay? The signal
itself, of course, if you look at the magnitude of the signal itself before it goes into the
individual subbands, of course that's changing. Okay? The signal itself is changing its peaks and
its shape. But if you look at the subbands in the auditory system, at what's coming out of each
one, call it a slow envelope that's defined by a half wave rectifier and a crummy lowpass filter
that only keeps low frequencies, that's flat as a board, with your phase shift or without. Okay?
And the reason I'm doing this particular case is because when we go to the inharmonic case,
the dissonant case, next, they're also both going to be flat as a board. So when you look at the
standard auditory model of a bunch of subbands followed by a rectifier followed by a lowpass
filter, consonant and dissonant shouldn't sound different. They may have different pitch, but
the consonance versus dissonance should not be noticeable. It's something else. What is that
something else? Okay? People would argue for the Hilbert phase. I argue against that, and in a
moment I'll tell you why. This is the alternative, which is the additive alternative.
Now, this was only the harmonic case. Let's go to the inharmonic case and see what happens. Now
they're not lined up. No matter what you do with the numbers that we chose, these are never
going to line up. You could shift the phase of the top relative to the bottom, but they're never
going to line up and accumulate their statistics, in terms of, for example, their phase locking, in such
a way that over a period of 20 or 500 milliseconds you can collect things where they're
synchronized. They're always asynchronous, and that's why they sound dissonant. You can
look at the peaks too, and the valleys, and we also have, [inaudible] are reversed again, sorry
about that. So they're not aligned, ever.
But look at their slow envelopes. There's no information in the slow envelope of either subband;
they're both flat. So consonance versus dissonance did not make it through the half wave rectifier
plus lowpass. It's in the highpass part, or the bandpass part, or the fast envelope, as I call it.
Okay, so that's my new concept. These are additive concepts; that's another important thing.
Let's compare this to the conventional view that's used by psychoacousticians and
neurophysiologists: the Hilbert phase, which is very popular now. And I'm going to give you a
different example. Let two tones beat; sum them together. Let's just play this case. It takes a
little while to load.
[sound]
>> Les Atlas: Now this would be considered the case of two tones falling into the same
auditory subband. Because they're falling into the same auditory subband, they're
beating with each other, and you can certainly hear that beating, whatever that envelope rate is
that you're staring at. And that's a pretty well understood phenomenon, going all the way
back to Helmholtz: you have an envelope, you could be tuning a guitar or something like that,
you're listening to the beating, you're waiting for it to go away, when you've got frequencies
that are close to each other. Now, the first kind of hint that there's an issue here is: which of the
two envelopes is correct, the top one or the bottom one? I don't know. They're both correct.
Now let's take a look at that in the frequency domain. They're very close. We presume they're
in the same subband, which is what they would be for our auditory system. Let's look at the
conventional Hilbert phase temporal fine structure. The way it's defined, that bottom
quantity is the cosine of the phase that comes after the analytic signal. You put the complex
number in polar form, and its phase function, through the cosine, is between -1 and 1.
That's what a cosine's going to give you, whatever phase you put in there. Let's take a look at
that signal. That's that same beating signal we just looked at. Let's look at some of its
characteristics. Let's zoom in on this portion. After we zoom in on it, I didn't want to use, I
told Scott not to use Matlab's interpolation between points, because there is no interpolation to
be done between points. It isn't a vertical line there; that's the discontinuity. So there's
something wrong mathematically with this: this definition gives you a discontinuous signal. This
is what people are using. And when I get to speech, you're going to see that this is not an
artificial case; this happens in speech also. So there's a severe discontinuity, plus or minus pi,
whenever that thing wanted to cross zero.
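A sketch reproducing that artifact numerically; the two tone frequencies are my own illustrative choice, not the demo's.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.25, 1 / fs)
# Two close tones in one subband; the envelope beats at 20 Hz.
x = np.cos(2 * np.pi * 500 * t) + np.cos(2 * np.pi * 520 * t)

analytic = hilbert(x)
tfs = np.cos(np.angle(analytic))   # conventional Hilbert fine structure

# Per-sample phase steps, wrapped to (-pi, pi]. A smooth ~510 Hz carrier
# advances about 0.2 rad per sample; at each envelope zero crossing the
# Hilbert phase jumps by roughly pi on top of that.
dphi = np.angle(analytic[1:] * np.conj(analytic[:-1]))
print("typical phase step:", np.median(np.abs(dphi)))   # ~0.2 rad
print("largest phase step:", np.abs(dphi).max())        # ~pi: discontinuity
```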
So let's take a look at this same issue. I argue there's a problem with the Hilbert phase and other
ways of representing this fine structure. And the problem is, fundamentally, that whenever your
envelope wants to cross zero you're going to have a discontinuity in any of the standard
definitions. What happens with our new fast envelope in this case? Do we still have a
discontinuity, do we have a mess? Let's take a look. Half wave rectifier; the slow envelope is just
that envelope. There's the fast envelope in blue. Okay. The fast envelope is not magnitude one.
It doesn't go between -1 and 1. It has modulation left on it. That's troubling for some
people. That was troubling for me at first, because I'm used to this product model. And the
product model is where the fine structure has a magnitude that's going between plus and minus
one, because you've taken all the modulation out of it. How could you say that the envelope's a
modulator if there's still modulation left in what you call the fine structure? I was stuck
thinking that way for the last 10 years. Our entire community's been stuck thinking that way
since the 1920s. Okay? There's no way to force this to go between plus and minus one without
forcing a discontinuity in it. And that discontinuity, as we're going to see, would exist in speech
if you analyze it this way, and it also causes troubles.
>>: So why do psychoacousticians use this model? I never liked this model to begin with. I
talked about [inaudible], so this stuff [inaudible]. Spiking information?
>> Les Atlas: Well, this is a predecessor to the spiking. Okay? This is what comes before the
spike; this is what the spiking's based on. I'm not arguing this in place
of spiking, at all. I'm saying the spiking's got to come from something. What does it come
from?
>>: The spiking come from the [inaudible]. [inaudible] going there, they could, you know,
[inaudible] down this there [inaudible], you don’t have to explain, you know, [inaudible] you
can still detect all kinds of spiking.
>> Les Atlas: I will argue that, but the psychoacoustic experiments that are being done these
days are using this, yes. You know, 2012 papers are using this, okay? So,
but let's zoom in on this portion. What I'm trying to do is give you a good alternative. You'll
have people who look at it other ways, but they still look at it in ways where that fine
structure, before they find the spiking, has to go between -1 and 1, so it still has to have
problems. And I'm arguing you have to get away from that.
So here's what happens with the fast envelope when I zoom in on that portion where the
envelope crossed zero. The fast envelope just goes to zero smoothly; there's no discontinuity,
there's no trouble. But what I want to get to is this case of speech, because we're going to do
the same kind of thing for speech in a moment. So I'm comparing a Hilbert phase version, or any
version which is a product form and its fine structure before you determine the spiking, with
something that we'll call a fast envelope, which is just the stuff that leaks through the leaky
integrator that comes after the half wave rectifier. And let's just compare them spectrally. If we
look at them spectrally, the red is the Hilbert case and the blue is the case of the fast
envelope. The fast envelope is giving you your two tones exactly. The Hilbert phase
is giving you all these harmonics, which are distortion products due to that discontinuity.
Now, we could compare these two by listening to them, but I think I want to get to the speech,
looking at the time. Let's take a look at a 1.5 kilohertz subband of speech. We've been talking about
artificial signals; does this really matter for speech? And my argument is that yes, it does. Let
me just play the speech sample.
[speech sample] A bicycle has two wheels.
>> Les Atlas: Let me move ahead. Let's pass that speech sample through a single subband
based on, well, the models that they used in Cambridge. And we can just play it after
it goes through that subband.
[sound]
>> Les Atlas: At 1.5 kilohertz, it sounds squeaky, but you can hear some of the speech coming
through just that subband. Now let's take a look at our various representations. Hilbert phase:
there's the signal in purple coming through there, not the original signal, but after the
subband, in purple right there. Its Hilbert envelope is the black dashed line. A beautiful looking
temporal envelope. These are time signals. There's the Hilbert phase for speech. What did it do?
Pretty ugly. This is any definition that's assuming a product form, where that fine structure has
to go between -1 and 1. If your spiking follows, fine. But any model that does that is
going to have these problems. Let's zoom in on some portions. A discontinuity right there. Let's
zoom in when it gets small. When the magnitude of the signal gets smaller, you've got some problems.
What it does right in this portion is not a function of the signal, it's a function of the word size
of your computer, really, because we're down at a very small signal level
and we're looking for a transcendental function, the inverse tangent.
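That word-size point is easy to check with a toy computation; the tiny perturbation below is an assumed stand-in for round-off at the bottom of the machine word.

```python
import numpy as np

rng = np.random.default_rng(0)
true_phase = 1.0   # radians

for scale in (1e0, 1e-8, 1e-16):
    # An analytic-signal sample of magnitude `scale`, plus double-precision
    # round-off noise of order 1e-16.
    z = scale * np.exp(1j * true_phase) \
        + 1e-16 * (rng.standard_normal() + 1j * rng.standard_normal())
    print(f"magnitude {scale:.0e}: recovered phase = {np.angle(z):+.4f}")
# At magnitude ~1e-16 the inverse tangent returns noise, not the phase.
```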
So, the Hilbert temporal fine structure, which pretty much everyone's using in psychoacoustics,
and which a lot of physiologists are using as a predecessor for the spiking model, has artifacts. It's
because it's of constant amplitude, even at times when the artifacts are pronounced, and that's
fundamentally because of the product model. It's also not terribly physiological. So
the fast envelope fine structure is an alternative that doesn't have these mathematical
problems and is also more physiological. Let's do the same experiment. Here's the Hilbert phase
temporal fine structure that would come out. If we use a half wave rectifier instead, let's look
at the fast envelope. I'll zoom in on the difference. The Hilbert is the red; the
conventional, product model is red, and our fast envelope is the blue one, right there. It has
no discontinuity.
>>: [inaudible] this fast envelope can capture the carrier somehow?
>> Les Atlas: That's exactly what it is; it's capturing both the pitch and the carrier, depending
on where you put that low corner.
>>: But typically a communication signal passing [inaudible] carrier is because of the frequency
and the cosine sign. But here, carrier changes the function pattern into [inaudible].
>> Les Atlas: We have to have it that way, otherwise we're going to get artifacts. Now, those
artifacts you get in man-made communication signals are the message that's being sent. Okay?
Here, we don't have such a clean message that was designed, that was man-made. So that's
why it's a problem to use those communication ideas for natural signals. There's nothing doing,
for example, phase shift keying, coming from our auditory system that we have control over.
Okay? So if we would put, or make, a receiver look for that, it's going to see artifacts, which
were zero-
>>: So is it [inaudible] to argue that the slow envelope is another [inaudible] message that gets
sent?
>> Les Atlas: The slope?
>>: I mean, in a man-made communication system, you get a constant error of fast [inaudible].
And then you get the message that you [inaudible], but now for this natural signal, what is the
analogy here? Do you try to make the analogy that the fast envelope is like, you know, a distorted
version of the carrier, and the slow envelope [inaudible]?
>> Les Atlas: Here's the closest I can come to answering that. The experiments are in progress right
now. Okay? They're working with guinea pigs; they're looking at this model and comparing it to
the Hilbert phase in Cambridge right now. In the model that we are using, we are comparing
against the red for where the spikes are. The probability of a spike is highest at the peaks of the red;
that's the old model. Okay? The new model says the probability is highest where the peaks in the
blue are, but if the peak is very small, it's a lower probability. That's all.
>>: I see.
>> Les Atlas: So what we’re saying is we are still phase locking, but when the signal level is
small or crossing zero the probability of phase locking at that phase drops down. That's all.
>>: So you can simulate using this one as a [inaudible] and then compare the spiking results
experimentally to see how much consistency you might get?
>> Les Atlas: Yes.
>>: Nothing has been [inaudible]?
>> Les Atlas: No. By December, well, the results will be ready by the end of, probably a paper
ready by December. A draft.
>>: And there’ll be experiments to verify that.
>> Les Atlas: Excuse me?
>>: Who [inaudible] experiment measurement to see whether using this model as probability,
you can tell the spikes they are consistently [inaudible] statistic with the [inaudible]?
>> Les Atlas: Like Stone and his colleagues in Cambridge. Okay, and physiology people from
the neurophysiology department; Ian, I can't think of his last name, his first name is Ian, who I
was there with. They're working with guinea pigs and looking at single units in the inferior colliculus
and lower-level cochlear nucleus.
>>: But in the cochlea, in the ear, you've probably [inaudible], you probably have to go to a
high level.
>> Les Atlas: You can't really get to the fibers coming off the hair cells. My argument would be
in the fibers, the eight or so fibers coming off each hair cell, each outer hair cell, you might see
something that spikes at the peaks of the blue. So here's a way to look at it: spike probability’s
high here, it's pretty high here, but it’s higher here than it is here. That's an example.
>>: That's very cool.
>> Les Atlas: Okay? So it's a model that doesn't mathematically break. If you followed
the red, and spiked at every peak of the red, what's going to happen here? A big spike here; what
happens there? It's meaningless. Okay? So when you look at the results, when people try to
predict what the physiology is doing using the Hilbert phase or other measures that
go between plus one and minus one with a product model, they don't work so well. Okay?
That's really not a big change. Now, there's a little more on this. We could listen to these, and if
you listened to the Hilbert phase it sounds horrible, all clicky and stuff; if you listen to the fast
envelope it's much closer to intelligible, it doesn't have the distortion. That should be obvious.
But there are a few last points I want to make. We have all the subbands in the ear. And I want
to get back to, you know, an application of what this could be. And on the right-hand side is
a really interesting one. With OFDM, orthogonal frequency division
multiplexing, we do blind equalization; in fact, if you walk around holding a Wi-Fi or cell
phone or something like that, the reverb helps you. Why does it help you? Well, look at a
couple of different frequencies. If we're at frequency one, some subband, and the strongest
reflected path arrives out of phase, you ignore that subband; it's weak, or it's weighted a
lot less. Whereas if you're at some other frequency and the distances of these paths are just
right, the timing of them is just right, you get constructive interference, not destructive like you
do at frequency one. So if you look at a whole bunch of frequencies, divided up into a bunch of
orthogonal frequencies, which is what they do in OFDM, you're going to get something that
does much better. As you walk around, it's automatically scanning over the frequencies. Now,
there are other things: it's orthogonal, which the ear isn't; there's a carrier channel coming
through, which the ear doesn't necessarily directly have. I'm just arguing for this kind of analogy.
And there are also carrier recovery issues; it's a little more complicated than that.
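A sketch of that frequency diversity effect for a hypothetical two-path channel, a direct path plus one reflection; the 0.5 millisecond excess delay and 0.9 reflection gain are arbitrary assumptions.

```python
import numpy as np

# Direct path plus one reflection: h(t) = delta(t) + 0.9 * delta(t - tau).
tau = 0.5e-3                        # excess path delay in seconds (assumed)
freqs = np.array([1000.0, 2000.0])  # two subcarrier frequencies in Hz

# Channel frequency response H(f) = 1 + 0.9 * exp(-j * 2 * pi * f * tau).
H = 1 + 0.9 * np.exp(-2j * np.pi * freqs * tau)
for f, h in zip(freqs, H):
    print(f"{f:6.0f} Hz: |H(f)| = {abs(h):.2f}")
# 1000 Hz: the reflection arrives out of phase -> destructive, |H| ~ 0.1.
# 2000 Hz: the reflection arrives in phase -> constructive, |H| ~ 1.9.
# An OFDM receiver keeps all the subcarriers but weights the weak ones less.
```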
So, we can compare what we have; this is kind of from Malcolm's work. What's the
difference between what we are arguing for now, which is this fast envelope, versus, say,
Malcolm Slaney's correlogram? In a correlogram, you're looking for this kind of information by
doing correlations on what's coming out of every subband, where this axis is the frequency of the
subband and this axis over here is autocorrelation lag. We might just contrast those two. In a
correlogram you've got perfect symmetry about the center points of the autocorrelation, perfect
symmetry right here and here, and a slight amount of time variation. In this case over here, we don't
have the exact perfect symmetry, but there's a much bigger difference: this correlogram is all
nonnegative real. Okay? This picture on the left can go
negative. Okay? And what's happening here with the picture on the left is, everything that's
blue is a zero crossing. And that zero crossing and that asymmetry is a very different
representation. The thing on the right is squared; it's a quadratic version of the signal. The
thing on the left, there is a rectifier in there, but there is the potential for it to be closer to obeying
superposition.
>>: [inaudible] this is for one segment of a vowel, it's [inaudible], it's not something you have to
change over time?
>> Les Atlas: Single speaker, one segment of a vowel about six, seven pitch periods.
>>: But the time is different in the correlogram; the axis is the time delay.
>> Les Atlas: This is samples here, and this is time in milliseconds.
>>: Oh-
>> Les Atlas: So they're not the same. They're both time, but different units of time.
>>: Can you plot that fast envelope [inaudible] something also in the same way that you do
correlation?
>> Les Atlas: Oh, it kind of is here. The only difference is that this is a sample index, and this is
actually, Scott converted this to actual time in milliseconds. So they really are equivalent. The
durations of the left one and the right one are pretty close, not exact. They really have
the same time axis; it's just that the units are different in this plot.
>>: I see.
>>: There's a straightforward mapping [inaudible] for the general frequencies so-
>> Les Atlas: Yeah, there is a straightforward mapping. There's a straightforward mapping of
this axis, of that axis; these axes are identical. But the axis coming out of the screen, okay, the
dependent variable, is wildly different. That's the big difference.
>>: How do you experimentally verify which one's more, you know, realistic?
>> Les Atlas: Well, I really don't know what to do with these, but the physiological experiments
that are going on in Cambridge right now are trying to confirm whether that fast envelope does
a better job of predicting phase locking.
>>: I see.
>> Les Atlas: For signals that are, for example, beating; and he's also using chirp sounds from
guinea pigs, their natural vocalizations, a set of signals that have the characteristics of the things
I've demonstrated for you. Let me make a few more summary points here.
The main point I'm making is that the envelope and its fine structure, which I call TFS, for
temporal fine structure, are additive and not multiplicative. You don't have these problems if they're
additive; you're not forcing the temporal fine structure to have these discontinuities or strange
behavior. Additivity is an easier decomposition to work with. If you align fast envelopes, it's a
good way to do segregation. We can also allow for, for example, an observation that happens in the
auditory system: neighboring subbands in speech that overlap heavily allow
for sharper tuning curves for loud sounds. How does that happen? Well, allowing the fast
envelopes to be additive can reinforce the tuning curve, or the narrowness of a subband, which
is actually quite narrow in the auditory system. And all of this is something called frequency
diversity. It's potentially a new view of pitch modeling; I was talking to Bob Carlyon about this
when I was in Cambridge.
There's a need for something more general than autocorrelation features, and this is, when Bob
and I talked, and when I was talking about the correlogram, there are a couple of things the
correlogram doesn't predict that happen. One is that the correlogram is not sensitive to forward versus
backward order. You could take a signal and reverse its order; the correlogram doesn't change,
okay, assuming the signal, for example, is a steady vowel. If you reverse it, the correlogram is
identical, whereas what we're showing with the fast envelope changes. And even for a transient
signal, if you reverse its order, the correlogram, over a certain interval, will not change.
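That insensitivity follows directly from the definition of autocorrelation (a standard identity, stated here for completeness). For a real signal x(t), reversing time leaves the autocorrelation unchanged:

```latex
R_{x(-t)}(\tau)
  = \int x(-t)\,x(-t-\tau)\,dt
  \;\overset{u\,=\,-t-\tau}{=}\;
  \int x(u+\tau)\,x(u)\,du
  = R_x(\tau).
```

So any purely autocorrelation-based representation, the correlogram included, assigns a time-reversed steady vowel exactly the same features, while the rectified fast envelope does not.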
Roy Patterson's auditory image: this is another person at Cambridge that I worked with. He
talks about a stabilized image, and what we're talking about with the fast envelope is a
computationally, or mathematically, defined way of doing what he calls an auditory image.
There's better potential for analysis than with Hilbert or other product models; the fact that it's an
additive model means that it can be decomposed. So, that's it. Done with 2 minutes left. Thank you.
>>: Can you say something about how the bandwidth of these filters [inaudible]?
>> Les Atlas: Okay, I don't want to fix on any two frequency endpoints, but let's just
say the lowest frequency of the bandpass filter is just lower than the lowest pitch you'd ever
get. Because you want to capture the pitch period, in speech that is; you want to capture the
periodicity of your pitch. Let's make it 50 hertz or 70 hertz or something like that. Let's let its
highest frequency capture the fastest phase locking of whatever mammalian species you're
working with. Now, cats phase lock at a higher frequency than humans do. The
phase locking of humans gets argued to be between 1.5 kilohertz and up to 8 kilohertz, somewhere in
that range for the high end. So it's basically everything outside of that low-frequency envelope-
>>: So it doesn't matter how big it is as long as-
>> Les Atlas: It can be large, yes. Now, there's going to be a certain subband width you start with,
which is of course going to limit it.
>>: I see. Does it come from this complementary-
>> Les Atlas: All the complementary information is in the fast envelope. Now let me be a little
more clear about what the theory of complementary statistics says. Okay. Everything that's down
in the slow envelope is the circular, or non-complementary, part. There's nothing which is
complementary in the slow envelope; it's all left out. But when you go to the fast envelope,
you have both.
>>: I see.
>> Les Atlas: You have a mixture of both. That's what the theory says, and that's what you get in
practice. That's a fancy way of saying that when you look at that fast envelope, it still has
modulation on top of it, but it's got other information buried in there that you wouldn't have
had in that slow envelope.
>>: Okay. So going back to my original question, which information do you [inaudible] as the
message, and which information do you [inaudible] as the carrier? And do you mix them together,
or do you consider them to be separate as [inaudible]?
>> Les Atlas: You know, I would have to say that, in terms of how the ear works, for
example, or how a speech recognizer should work, you should use both. Okay? Both, the best
you can. Why throw out any of it? But if you have pristine
conditions, that is, one talker, high SNR, you can use only the envelope and use noise as a
carrier. That's Bob Shannon's experiment; that was in Science. Eight subbands, and you get
intelligible speech. You throw some noise in there, you have a lower SNR, it falls apart. You
throw two talkers in, it falls apart. What does that suggest? It suggests that this fast envelope
is needed for noise, for multiple talkers, or for reverb, the things that would make the noise-excited
vocoder break down.
>>: Thank you very much, Les, for the wonderful talk. Let’s thank these people.