>> Ivan Tashev: Good morning, everyone, those who are in the room and those who are attending the talk remotely. Today we have the pleasure of hosting Eleftheria Georganti. She's a Ph.D. student at the University of Patras in Greece. She got her Master's degree in electrical and computer engineering in 2007, so for the last six years she has been a Ph.D. student, and she has had the opportunity to visit a lot of places: nine months at the Technical University of Denmark, then Philips Research in Eindhoven, in the Netherlands, and some time at the University of Oldenburg, always doing [indiscernible] research and processing. So today she's going to talk about her findings and her results. Without further ado, Eleftheria, you have the floor.

>> Eleftheria Georganti: Thank you very much for the introduction. So my name is Eleftheria Georganti, I'm pronouncing it also in the Greek way, and I will give you an overview of my research experience over the last six years of my work.

So, the contents of this presentation: I will initially introduce myself, although I already got a good introduction. Then I will tell you the topic of my Ph.D. thesis and provide you an overview of it. I will emphasize four segments of my Ph.D. work related to the analysis of statistical properties of responses and signals: I will present two methods for the estimation of the distance between a source and a receiver, from single-channel and from binaural signals, a method for room classification, and a framework for the estimation of acoustical parameters such as the direct-to-reverberant ratio and clarity. I will also refer to some side projects which are not actually part of my Ph.D. work, but on which I spent some time while at Philips or while collaborating with other colleagues in my laboratory at the University of Patras, related to automatic calibration of an ambient telephony system, subjective evaluation of signal enhancement techniques, and finally some work related to architectural room acoustics, in which we have quite some experience in my group. And I will finish the presentation with some ideas for the future.

So the first thing is where I come from, which is the University of Patras. You can see this red dot; this is where Patras is, about 250 kilometers from Athens. My group is the Audio and Acoustic Technology group, which is part of a bigger group called the Wire Communications Laboratory, which has all these other groups you can see here. We are actually a small group, I think, for [indiscernible]. The head of the group is Professor John Mourjopoulos. We have three post-docs and three Ph.D. students, and there is a wide variety of topics that we have been working on, but the expertise of the group is audio signal processing, room correction, and signal enhancement techniques. There are also some other topics that are not so signal-processing oriented, let's say; for example, one colleague is doing some work on novel sound reproduction devices, and Bob is working on acoustic energy harvesting.

Concerning me, [indiscernible] I told you about my background. I've been working on my Ph.D. since 2007 and had the chance to collaborate with other groups, as you can see here. I spent nine months at the Technical University of Denmark collaborating with Dr. Finn Jacobsen, who unfortunately passed away last week.
There I got a good insight into physical acoustics, which is something I didn't have such a good background in before that. Then another nine months at Philips Research, collaborating with Dr. Stephen van de Par and Aki Harma. And since we had established a good collaboration, I spent another three months at the University of Oldenburg, where Dr. Stephen van de Par got a professor position last year. I'm now in the final stage of my Ph.D. degree; I have actually already finished the writing and presented to the first committee, and I still have to do the final presentation in September. Something else I have also participated in is the AABBA group, which is a grouping of various laboratories in Europe and also in the States where everybody is working with binaural techniques and methodologies, and we try to exchange ideas, to collaborate, to further improve the understanding of the mechanisms, but also to propose some novel methodologies for signal processing.

So we'll start with a basic concept of my Ph.D. work and of everything that has to do with acoustics, and this is reverberation. We know very well that room acoustics introduce reverberation, and we can fully describe this phenomenon with the room impulse response, which can be easily measured following the standard methodology: driving, for example, a sweep signal through a loudspeaker at a specific position in the room, recording it with a microphone, and deconvolving, and we get the room impulse response in the time domain and the room transfer function in the frequency domain. These are complete acoustical descriptions containing all the information we need about the acoustical properties of the room and about how the acoustics affect the signals reproduced in the room. And, of course, it's known that the room responses will vary across different positions in the room. So, for example, here you can see that at position one we would have a flatter response with fewer deviations in the frequency domain than in the case of position two, which is at a larger distance.

My Ph.D. topic is entitled "Modeling, Analysis and Processing of Acoustical Room Responses and Signals under Reverberant Conditions." Bearing in mind what I told you in the previous slide, that the room responses contain all this important information for the acoustic engineer, my task was to try to determine, directly from signals, attributes that are normally easy to extract from responses. So going into the room, recording signals, and then from the signals trying to estimate acoustical parameters, the distance between the source and the receiver, and so on. The way I have approached this problem is based on this simple relationship: when the anechoic signal gets reproduced in the room, we get the reverberant signal, which is a convolution of the room impulse response for this pair of positions and the anechoic signal. In the frequency domain we would have a multiplication, but we can also view the expressions of the anechoic signal, the reverberant signal, and the room transfer function as distributions, as [indiscernible] distributions. We tried to determine what the statistical relationships between these three components are, and tried to find whether some statistical measures are related to acoustical parameters, for example the direct-to-reverberant ratio, clarity, and so on.
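For reference, the relationships just described can be written compactly as follows: convolution in the time domain, multiplication in the frequency domain, and addition in the log-magnitude (dB) domain. The letters follow the talk's later use of X for the room transfer function and Y for the anechoic signal; Z, for the reverberant signal, is a label added here.

```latex
z(t) = (x * y)(t)
\quad\Longleftrightarrow\quad
Z(f) = X(f)\,Y(f)
\quad\Longleftrightarrow\quad
20\log_{10}\lvert Z(f)\rvert = 20\log_{10}\lvert X(f)\rvert + 20\log_{10}\lvert Y(f)\rvert
```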
And to try to extract models for each of these three components, with emphasis on the response part, on the room transfer function, and, as a final aim, to try to examine the potential of using this knowledge for the development of novel signal processing techniques.

So we start with an overview of the analysis of the statistical properties of responses and signals. I hope it's clear why I'm motivated to do this, and I hope it will become clearer as I continue. As I have already mentioned, we have a convolution in the time domain, which is a multiplication in the frequency domain. In trying to determine the different relationships between these three expressions, the first step was to examine the relationship just by observing: taking the histograms, plotting them for the anechoic signal, the response, and the reverberant signal, and trying to see whether they are related, how they change with distance, within other rooms, and so on. We emphasized the measure of standard deviation, and there is a good reason to do that, as I will explain in a while, and kurtosis, which seems to work quite well. We tried to create some models for the standard deviation and the kurtosis of the room responses, and tried to estimate these statistical measures [indiscernible] from the reverberant signals. So the idea was to find a model for the room transfer function and then use only the reverberant signal in order to extract information about these statistical measures of the anechoic signal.

>>: [indiscernible] do you mean like a special signal like [indiscernible] or speech?

>> Eleftheria Georganti: Speech. It can be any type of signal; this analysis can be done with any type of signal. The methods I will refer to afterwards were developed for speech signals, but it doesn't have to be, of course, a [indiscernible] signal. The point is not to use [indiscernible] signals, but to be able to use any [indiscernible] signals.

So for the relationships, we have the convolution in the time domain, multiplication in the frequency domain, and addition in the logarithmic domain. Denoting each of these three terms as you can see here, where I have also used different colors, we can assume that X and Y are independent random variables. Then we can find some relationships for the standard deviation. This is the spectral standard deviation, so we are in the frequency domain: we look at the magnitude of the frequency response, and this is the standard deviation of these three terms, the reverberant signal, sigma X for the transfer function, and Y for the signal. And we can find some relationships for these two statistical measures for these three -- yes, please?

>>: [indiscernible].

>> Eleftheria Georganti: Sorry?

>>: [indiscernible] in the frequency domain, or in general they're complex.

>> Eleftheria Georganti: You mean a positive?

>>: Signal magnitude.

>> Eleftheria Georganti: Yeah, so of course you can always shift the frequency response wherever you want, so it's actually the relationship between the spectral values that is of interest for us. So the actual absolute values, the dB values, whether they're positive or negative, is not of concern at this point.

>>: Okay. [indiscernible].

>> Eleftheria Georganti: Yes, these are [indiscernible]. Yeah.

>>: And the frequency [indiscernible] general statistics?

>> Eleftheria Georganti: Yeah, the analysis is for specific frequency bands.
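In symbols, and as a sketch of the relationship being described, under the stated assumption that the log-magnitude spectra of X (the transfer function) and Y (the anechoic signal) are independent, the spectral variances simply add for the reverberant signal Z:

```latex
\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2,
\qquad
\sigma_W = \operatorname{std}_f\!\bigl\{\,20\log_{10}\lvert W(f)\rvert\,\bigr\},
\quad W \in \{X, Y, Z\}
```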
You can use either one-[indiscernible] fraction analysis or wider frequency bands. I have done several tests in order to find the most appropriate frequency range and bandwidth for the analysis, depending on the application you want to end up with.

The reason we have used the standard deviation is the well-known work of Manfred Schroeder in 1954, where he showed that, for distances above the critical distance, the spectral standard deviation converges to a value close to 5.6 dB when there is a diffuse sound field in the room. And then Jetzt, in 1979, showed that the standard deviation increases with distance: when we are very close to the [indiscernible], let's say you have the microphone and the sound source very close, we will have a smaller spectral standard deviation; the spectrum will be flatter, there will be fewer deviations. So the more you increase the distance, the more deviations will appear in the spectrum, and this can be captured with the spectral standard deviation. So we have an increase up to the critical distance of the room, which depends on the reverberation time and the volume of each room, and according to the theory, above this distance you would have a specific standard deviation value, which is 5.6 decibels. That's why we decided, in these first steps of my analysis, to use the standard deviation measure, about which there was already some existing knowledge, let's say. So here you can also see some results using one-[indiscernible] band analysis or full-band analysis.

>>: So this is the standard deviation of the spectrum of the room impulse response across the frequency bins?

>> Eleftheria Georganti: Exactly, yeah.

>>: So [indiscernible].

>> Eleftheria Georganti: It gets more deviated. So this is the standard deviation of the response, and, as I already showed you, we have found a relationship between the standard deviations of the reverberant signal, the response, and the anechoic signal. We were trying to relate these three components using those [indiscernible] models. So let's say this is actually the base, the framework that my Ph.D. thesis was built on.

As a further step, we decided to try to estimate distance from single-channel signals. The idea, as you can see here in the figure, is that we have a speaker situated at a specific distance from the microphone. This can be used, for example, in an open conference call, or if you have multiple devices in a room you might need to know the distance between the speaker and the device. So we were trying to find some ways to do that. One easy way would be to take the sound pressure level of the speaker and say, okay, if he gets closer it will get higher, and so on. But of course this is not a good idea, because each one of us speaks in a different way, depending on whether it's a man or a woman, and so on. So we tried to use some statistical features that depend on distance, and to employ pattern recognition techniques, machine learning techniques, in order to develop a method for distance estimation. As I have already told you, the first thing to do is to try to observe what is happening. So here you can see reverberant signals, they're speech signals, and the frequency spectra of signals recorded at 0.5 meters, one meter, two meters, and three meters.
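As a point of reference, here is a minimal sketch of the spectral standard deviation computation referred to above, for a measured room impulse response. The band limits and the small regularization constant are assumptions of this sketch, not values from the talk.

```python
import numpy as np

def spectral_std_db(rir, fs, f_lo=500.0, f_hi=4000.0):
    """Standard deviation (in dB) of the magnitude response of a room
    impulse response over a chosen frequency band."""
    spectrum = np.fft.rfft(rir)
    freqs = np.fft.rfftfreq(len(rir), d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(np.std(mag_db[band]))

# Schroeder's diffuse-field theory predicts a value of roughly 5.6 dB
# above the critical distance; smaller values are expected close to the
# source, where the response is flatter.
```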
And you can see that the histograms change, and these changes can be captured, probably, with kurtosis and skewness. Kurtosis shows how far a distribution is from the normal distribution, how peaky or how flat it is, and this changes with distance. And skewness shows how symmetrical it is. So we use these two features, and here you can see how they behave across time, across different time frames, the skewness and the kurtosis, and it's evident that they're really distance dependent. The idea was to use these two attributes and also some other ones related to the [indiscernible] of the linear prediction residual, which works mainly for speech, and also a [indiscernible] detector, which is the last line here. Using all these parameters and employing pattern recognition techniques, we could develop a method for distance estimation from single-channel signals. For this method we have employed Gaussian mixture models and used the five features I have already shown you, plus their filtered versions only for 10 to 15 kilohertz, so full band, but also a smaller frequency range. We have evaluated the method, and you can see here that when the system is trained with the same speakers it is evaluated on, the method got 72 percent, and when the speaker is unknown we got a worse performance. We also conducted a listening test, where we used the same signals as the ones we tested the method with and asked the test subjects to sort the signals in terms of distance, to say what sounds further and what sounds closer. Here you can see the results of this analysis, which are quite close to the unknown-speaker situation.

>>: Is this all in the same room?

>> Eleftheria Georganti: Yes, well, we did the test in two rooms.

>>: Tested in two rooms?

>> Eleftheria Georganti: Yeah, but these results are all with the training and the testing in the same room. So you train in one room and you test in the same room.

>>: [indiscernible].

>> Eleftheria Georganti: Exactly. So if you have a different room, of course, this is one of the biggest problems of distance estimation techniques right now, to find a way to have them work in unknown room environments.

>>: How many [indiscernible]?

>> Eleftheria Georganti: Yeah, maybe about 20 minutes of recordings for the training and then another ten minutes for the testing of the method.

>>: And the [indiscernible]?

>> Eleftheria Georganti: Yes. [indiscernible] yeah, okay.

So another thing that I have been working on was trying to estimate distance from binaural signals, which should be easier to do than from single-channel signals, where you have only one input. For this reason we have extended our framework analysis to the binaural scenario. Here we can also write the relationships as previously, but in this case we'll have two channels: a convolution for the left channel and a convolution for the right channel, similarly a multiplication in the frequency domain, and then in the logarithmic domain we can simply write the same relationships. And then we subtract the right spectrum from the left spectrum of the binaural signals. By doing this we can get rid of the anechoic signal factor, as you can see here, and in this way we can determine some relationships for the standard deviation.
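Written out, again as a sketch of the relationship being described, with the reverberant signals labelled Z_L, Z_R and the binaural responses X_L, X_R, the anechoic term Y cancels in the log-spectral difference, so its standard deviation depends only on the responses:

```latex
20\log_{10}\lvert Z_L(f)\rvert - 20\log_{10}\lvert Z_R(f)\rvert
  = 20\log_{10}\lvert X_L(f)\rvert - 20\log_{10}\lvert X_R(f)\rvert
\;\;\Longrightarrow\;\;
\sigma_{\Delta Z} = \sigma_{\Delta X}
```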
So the standard deviation of the difference between the spectra of the left and right binaural signals would be related only to the standard deviation of the responses, which, in this case, are the binaural responses. The idea is that such a methodology could lead to the determination of a novel distance estimation parameter. As you can see here, we do the operation of subtracting the left and right signal spectra and we calculate the standard deviation of the resulting spectrum, which proves, under specific circumstances, to be related only to the standard deviation of the binaural responses, as you can see here.

As I have already mentioned, if this were just the room response, we know very well that the standard deviation really depends on the distance between the source and the receiver; it's a highly distance-dependent feature when we are below the critical distance of the room. But this is not straightforward in the case of binaural responses; to my knowledge, there was nothing, let's say, well established. So the idea was to check whether this feature is distance dependent. And interestingly, though it was probably somewhat expected from my side, this differential standard deviation really does depend on the distance. Here you can see, for an orchestral musical piece, the time domain signal, and this is the feature as I have [indiscernible] it; the different colors correspond to the different distances between the source and the receiver. Similarly, for a signal consisting of guitar, you can see that the feature seems to be highly distance dependent, and it seems to behave in a signal-independent way.

So the idea was to use this feature and Gaussian mixture models, for which we already had the framework from the first method I presented to you, using frames of two seconds to extract the feature, in order to develop a method and then evaluate its performance. Here you can see that for one room, Carolina Hall, a publicly available [indiscernible], the mean performance was 75.5 percent when the system is tested with five different distance classes. Then we were trying to find some ways to increase the performance and get better results, and the idea was to introduce a binaural feature extraction framework. Since we had the binaural signals available, we were able to easily extract binaural cues, the interaural time difference, the interaural level difference and the coherence, for several frequency bands, and then calculate several statistical attributes of those features. This leads to the calculation of many more additional features that could also be used in our classifier to obtain better results. This is really an engineering approach: we were just trying to feed more features into the classifier, and at this stage we were not really concerned with why these statistical features of the binaural cues should be, let's say, distance dependent. But this was the first stage of the development of this work. So here you can see the results, where our mean performance has now increased by about 20 percent. We have used only four of these features out of the 430 I showed you before, but we have isolated them using feature selection algorithms.
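Here is a minimal sketch of the kind of binaural feature extraction described above: the differential log-spectrum standard deviation, a broadband interaural level difference, and interaural coherence, with simple statistics over frames. The frame length, band choices, and the exact cue definitions are assumptions of the sketch, not the talk's specification.

```python
import numpy as np
from scipy.signal import coherence
from scipy.stats import skew, kurtosis

def binaural_frame_features(left, right, fs, frame_s=2.0):
    """Per-frame binaural features: differential spectral std (dB),
    broadband ILD (dB), and mean interaural coherence."""
    n = int(frame_s * fs)
    feats = []
    for start in range(0, min(len(left), len(right)) - n + 1, n):
        l, r = left[start:start + n], right[start:start + n]
        L = np.abs(np.fft.rfft(l)) + 1e-12
        R = np.abs(np.fft.rfft(r)) + 1e-12
        diff_db = 20 * np.log10(L) - 20 * np.log10(R)
        _, coh = coherence(l, r, fs=fs, nperseg=1024)
        feats.append([np.std(diff_db),
                      10 * np.log10(np.sum(l ** 2) / np.sum(r ** 2)),
                      np.mean(coh)])
    return np.asarray(feats)

def feature_statistics(feats):
    """Statistics over frames, in the spirit of the larger feature set
    (mean, std, skewness, kurtosis of each per-frame cue)."""
    return np.concatenate([feats.mean(0), feats.std(0),
                           skew(feats, axis=0), kurtosis(feats, axis=0)])
```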
And now, to answer your question: this is again performance in one room, so we train the system in one room and we test it in the same room, but it was of interest to examine what would happen when we go to other rooms. So here, for the easier task of having only three rooms, so three different classes, you can see that the method seems to work quite well. You can find more information in a paper that is now in press, where we have analyzed in more detail the things I am referring to right now.

>>: [indiscernible]. How did you generate the [indiscernible]? Is this [indiscernible] provided by the database, or did you measure them yourself?

>> Eleftheria Georganti: Okay. So we have some binaural responses from a database, but we also have some from rooms at the university and in a conference center. I used all these responses, I convolved them with the anechoic signals, so I have a database, and then I use that database for the training and the testing of the method.

>>: [indiscernible].

>> Eleftheria Georganti: Using the [indiscernible]. So we go to the room, we take the [indiscernible], we put the loudspeaker at different distances, so one, two, three, four, five meters and so on, and we calculate the responses by deconvolving the recorded signal with the anechoic [indiscernible] that has been reproduced.

>>: [indiscernible]?

>> Eleftheria Georganti: Yes, but there is, of course, software that does everything easily. For example, I don't know if I should refer to it, okay, but there is a specific software, though of course you can always do it with Matlab; I have the code. It's something that is kind of straightforward to do, it's not difficult. It's just a deconvolution of the input signal and the output signal.

And this takes me now to the introduction of the room classification method. So since we had all these interesting binaural features, we --

>>: [indiscernible].

>> Eleftheria Georganti: Yes, please.

>>: It seems like your distance estimation is treated like a classification [indiscernible], one meter or two meters. So I don't know, it seems like a regression [indiscernible] would be more appropriate.

>> Eleftheria Georganti: Yeah, so classification [indiscernible] works well; that's why we have used GMMs. And for the binaural distance estimation technique we also used a support vector machine, which seems to work even better; they're much more robust and they can work better in unknown environments. Of course, I did some tests with regression. I would prefer it if it could work with linear regression, because it's a procedure where you can somehow fully understand why some of the features are, let's say, not relevant for the task. But at first the results were not so good. So right now, the methods as they have been developed are only oriented to work for specific distance classes, and then you have to find some tricks if you want to handle a distance that falls between two classes.

>>: [indiscernible].

>> Eleftheria Georganti: Yeah, so in some cases it might go to the other class: if it's at 1.5, you would have almost 50 percent confusion, so it might go to either side. But when you're at 1.1 meters and the other class is at two meters, then you would get it in the one-meter class. So it seems to be quite robust to small distance mismatches. Yeah. And that's why it also works quite well in [indiscernible].
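A small sketch of the classification setup described in the talk: one Gaussian mixture model per distance class, fed with per-frame features and scored by log-likelihood. The feature extractor here uses only spectral skewness and kurtosis as a stand-in for the full feature set, and the frame length and number of mixture components are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.mixture import GaussianMixture

def frame_features(signal, fs, frame_s=0.5):
    """Spectral skewness/kurtosis per frame of a reverberant signal."""
    n = int(frame_s * fs)
    feats = []
    for start in range(0, len(signal) - n + 1, n):
        mag_db = 20 * np.log10(np.abs(np.fft.rfft(signal[start:start + n])) + 1e-12)
        feats.append([skew(mag_db), kurtosis(mag_db)])
    return np.asarray(feats)

def train_distance_models(training_sets, n_components=8):
    """training_sets: dict mapping a distance class (e.g. 1.0, 2.0 m)
    to a feature matrix; returns one fitted GMM per class."""
    return {d: GaussianMixture(n_components=n_components, random_state=0).fit(X)
            for d, X in training_sets.items()}

def classify_distance(models, features):
    """Pick the distance class whose GMM explains the features best."""
    return max(models, key=lambda d: models[d].score(features))
```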
So since we have all this binaural feature framework, we were trying to think of other ways to exploit this information, and one idea was whether we could use such a framework for the classification of rooms. But a good question here would be: how can one classify rooms? According to the reverberation time, to the volume or how big they are, according to some acoustical parameters and how they behave in the rooms, or is there another way to do it? What are the cues that can tell me that I'm now in a lecture room and not in a big auditorium? What makes this room similar to another lecture room in the same building? Of course, this is not an easy question to answer, but we found in the literature the work of Floyd Toole, who defines three room categories, small, medium, and large, although this is for sound reproduction purposes only, and he somehow classifies them according to the reverberation time. So we have used, again, these databases with binaural room responses, and we have separated the rooms into small, medium, and large. We have created the different audio files and we are trying to find out whether these binaural statistical features are related to some of these room properties. Here you can see again the binaural statistical feature framework and the rooms that we have used for our analysis: in blue you can see the rooms that were used for the training, and in black the ones that were used for the testing of the method. And here is the performance of the method using Gaussian mixture models. So we got, let's say, some insight that such a framework could [indiscernible] some other, let's say, [indiscernible] techniques, for example this room classification, which is really abstract, of course.

Some further work of mine is related to the acoustical parameters. In this case we were also trying to estimate parameters, not from the measured responses but from the signals. So I have done some work related to the estimation of clarity. As you may be aware, it is the ratio of the energy of the early part, the first 50 milliseconds of the response, over the late part of the response.

>>: [indiscernible].

>> Eleftheria Georganti: Yes, the first 50 milliseconds over the late part, over the rest of the response. So this is a very typical acoustical parameter; it shows whether there is a good [indiscernible] signal, and many times it's used to tune signal enhancement techniques, for example dereverberation. So we have used this database of features, using two-second frames for the extraction, and we have extracted the clarity of the binaural responses; these are actually the values that were calculated from the responses. These values were used for the training of the method, and the other ones for the evaluation. In this case we used a linear regression technique, and it was found that the variance of the interaural time difference was given the highest weight; the factors multiplying the other features in the linear regression equation were very close to zero, so all these other features were found to be less important. And here you can see the results of this analysis: on the vertical axis you can see the error, and on the horizontal axis the actual clarity values. So it doesn't work so well, but it depends on what you want to do.
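For reference, a minimal sketch of how the clarity index just described (often written C50) can be computed from a measured impulse response; taking the direct-sound onset as the main peak is an assumption of this sketch.

```python
import numpy as np

def clarity_c50(rir, fs):
    """Clarity (dB): energy in the first 50 ms after the direct sound
    over the energy in the remainder of the impulse response."""
    onset = int(np.argmax(np.abs(rir)))      # direct-sound peak as onset
    split = onset + int(0.050 * fs)
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))
```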
And here you can also see the predicted and the [indiscernible] clarity values as a function of the time frame. So this is another indication that such an analysis and framework based on signals could [indiscernible] assist the estimation of other parameters, for example the clarity, or the direct-to-reverberant ratio, which is another acoustical parameter. In this case, instead of taking the first 50 milliseconds, it is calculated using only the direct signal, which is the main peak, plus maybe five more milliseconds after the main peak, and it is another important and widely used parameter in [indiscernible] acoustics. At the same time, there was also some already existing work that relates the standard deviation of the responses to the direct-to-reverberant ratio. So using the same framework as before, subtracting the left and the right reverberant signal spectra, but in this case again assuming [indiscernible] independence, we can find a relationship for the standard deviation, and then it can also be proved that this differential spectrum is related to the direct-to-reverberant ratio.

>>: [indiscernible].

>> Eleftheria Georganti: Excuse me?

>>: You're assuming this is [indiscernible] the left and right signals?

>> Eleftheria Georganti: Yes.

>>: Which is not correct.

>> Eleftheria Georganti: Which is not always correct, yeah.

>>: What is the sound source? You have this set of [indiscernible] where you put the sound source always in front?

>> Eleftheria Georganti: Yeah, so for the binaural [indiscernible] estimation technique, we have done this for all [indiscernible]. So it works well if you, like, train the system with various angles and then test with various angles; it works quite well. But yeah, of course we have also tried it not only for zero degrees; otherwise it could not be published, of course, because the reviewers would [indiscernible]. The signals change a lot, and that's normal. And okay, concerning this statistical independence here, this is actually work that is still in progress. Actually, I was trying not to assume this statistical independence, but then these relationships become much more complicated. So I was trying to see what happens if we assume this independence, which could be true for high frequencies and under other specific circumstances, and we end up with a relationship that relates the standard deviation of this [indiscernible] with the direct-to-reverberant ratio. And that was quite interesting, in my opinion. Here you can see that we've tested this method for three different signals, cello music, guitar, and a speech signal, for five different rooms. On the vertical axis you can see the standard deviation error, because you also estimate the standard deviation, but the most important thing is the direct-to-reverberant ratio error; in this case we used only zero degrees for the results presented here. The prediction error was always less than three dB, and it was found that the direct-to-reverberant ratio could be predicted from binaural signals.
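A quick sketch of the direct-to-reverberant ratio computation described above, from a measured impulse response, taking the direct part as the main peak plus about five milliseconds; the window length follows the talk's rough description, and the rest of the implementation is assumed.

```python
import numpy as np

def direct_to_reverberant_ratio(rir, fs, direct_ms=5.0):
    """DRR (dB): energy of the direct sound (main peak plus ~5 ms)
    over the energy of everything that follows it."""
    peak = int(np.argmax(np.abs(rir)))
    split = peak + int(direct_ms * 1e-3 * fs)
    direct = np.sum(rir[peak:split] ** 2)
    reverberant = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(direct / (reverberant + 1e-12))
```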
This sums up, in a very fast way, the work that is most closely related to my Ph.D. thesis; it is actually part of the text that is going to be my thesis. But there are also other things that I have done during my Ph.D. studies. So when I was at Philips for a few months, I worked on an ambient telephony system, where they were trying to estimate the positions of various devices situated in a room. Each device would have a microphone and a speaker; there would be, say, four devices in one room and another four devices in another room, and we were trying to find a way to determine where these devices are situated in the room.

>>: [indiscernible] system?

>> Eleftheria Georganti: I can't say where, no. Okay. So you have a speaker who is moving around in the room, and it's like a conference, an open conference call, a hands-free call. And these devices are used to render the sound, so if the speaker is here, close to this device, then this device would play the signal louder and the other devices would play the signal with less strength. The idea, of course, was also that these could be made more directional, to have a beam of sound only at the specific position where the speaker [indiscernible], but that is for further steps [indiscernible].

>>: [indiscernible].

>> Eleftheria Georganti: My task was this automatic calibration, which was to find where these devices are situated.

>>: [indiscernible] device?

>> Eleftheria Georganti: Exactly, yes. And also to find a way to determine which devices are in the one room and which devices are in the other room, and so on. Since there was a loudspeaker and a microphone in each device, it was very easy to calculate the impulse responses, so the idea was that, okay, if the user buys the system, he pushes a button, the system gets calibrated and he can start using it. So I have used these impulse responses; the input was always the [indiscernible] response between all these points.

>>: So they got the full [indiscernible] of the impulse responses between each of the speakers and each of the microphones?

>> Eleftheria Georganti: Exactly. So the idea was to use the multidimensional scaling technique, which takes as input the distances between points and then returns the actual positions, the XY coordinates, in two or three dimensions. I hope it's clear, but I can elaborate. Another thing that I also have some experience in is the subjective evaluation of signal enhancement.

>>: So you used your other stuff to estimate the distances and you put that --

>> Eleftheria Georganti: When you have the impulse response, it's easy to calculate the distance, because you calculate the delay between -- yeah. And it's easy to say this is --

>>: So all devices are connected to the same computer?

>> Eleftheria Georganti: Exactly.

>>: So they're all synchronized.

>> Eleftheria Georganti: They're synchronized, and you know the delay of the sound [indiscernible] of these things.

>>: [indiscernible].

>>: So the problem is you have the full [indiscernible] between the distances [indiscernible].

>> Eleftheria Georganti: Exactly, which is something that I didn't do myself; it was the multidimensional scaling that did it for me. I just had to find this method somewhere. And apart from this, there was also the task of trying to find which devices are in which room, and there you could probably see which responses had similar reverberation characteristics, so these responses are in this room and those responses are in the other room. Or you could see that a particular pair of responses is not so good, as if there were a [indiscernible] in between, so you would know that device seven is definitely in another room from device three, and so on. Okay.
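A rough sketch of the calibration idea described above, under the stated assumption that the devices are sample-synchronized: pairwise distances are taken from the arrival time of the main peak of each measured impulse response, and multidimensional scaling recovers relative coordinates. The function names and the use of scikit-learn's MDS are choices of this sketch, not the system's actual implementation.

```python
import numpy as np
from sklearn.manifold import MDS

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def distance_matrix_from_rirs(rirs, fs):
    """rirs[i][j]: measured impulse response from device i's loudspeaker
    to device j's microphone.  Distance is estimated from the arrival
    time of the main peak (system latency is ignored here)."""
    n = len(rirs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                delay = np.argmax(np.abs(rirs[i][j])) / fs
                D[i, j] = delay * SPEED_OF_SOUND
    return 0.5 * (D + D.T)  # symmetrize the two one-way estimates

def device_positions(D, dims=2):
    """Recover relative device coordinates with multidimensional scaling.
    Positions are defined only up to rotation, reflection, translation."""
    mds = MDS(n_components=dims, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(D)
```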
So this is more just to show you that I have done some other things, even if they're probably not as exciting, let's say, as my research topic. Okay. I have spent some time collaborating with other colleagues who have been mainly working on dereverberation techniques, and I have sometimes helped with the evaluation of the developed algorithms. Many times we have conducted listening tests in order to see what the listeners think of the algorithms and, let's say, of the perceptual artifacts of these techniques. Here I have mainly worked with dereverberation algorithms, and we were trying to extend, let's say, the expertise that was in the group in the development of dereverberation algorithms for single-channel signals to the binaural scenario. In this case, of course, you have several problems, because if you do different processing in the right and in the left ear, you destroy the binaural cues, the interaural time differences, level differences, coherence and so on, and this would also destroy the localization cues for the listener. So the idea was to find a way to extend the existing expertise and framework for the single-channel scenario to the binaural case, and we have proposed some ideas for doing the same processing on the left and on the right signal, by proposing some ideas on the calculation of the gain that you would use. I don't know if you're actually aware of dereverberation techniques; one way is to use spectral subtraction, where from some frequency bins you subtract something which is supposed to be the reverberation. This is done using a gain factor, which can be different for each frequency bin. The idea was to find a way to use the same gain factor for the left and the right, and this can be, for example, the minimum of the two, the maximum, or the average. So we did some --

>>: Hold, hold, hold.

>> Eleftheria Georganti: Yes.

>>: I kind of missed the main point. You have [indiscernible] trying to do speech enhancement and dereverberation of the two microphones?

>> Eleftheria Georganti: Exactly.

>>: So then after [indiscernible] user into [indiscernible].

>> Eleftheria Georganti: Exactly.

>>: So you do two-channel speech enhancement and not [indiscernible] of the speaker signal?

>> Eleftheria Georganti: No, a two-channel signal.

>>: [indiscernible] device [indiscernible] and the problem is speech in the room [indiscernible].

>> Eleftheria Georganti: Yes, exactly. Bearing in mind that we already had some techniques that were working in the single-channel case quite well and robustly, without so many artifacts; but, of course, when you want to do it binaurally, it gets more complicated, and this is an open research issue right now, to my knowledge. So I have also spent some time working on this, although I was not part of the development of these actual methods, but mainly on their evaluation.
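A rough sketch of the gain-linking idea just described: per-channel spectral-subtraction gains combined into a single gain applied to both ears so the interaural cues are preserved. The STFT settings, the gain floor, and the placeholder late-reverberation estimator `reverb_power_estimate` are assumptions of this sketch, not the method actually developed in the group.

```python
import numpy as np
from scipy.signal import stft, istft

def linked_gain_dereverb(left, right, fs, reverb_power_estimate,
                         link="min", gain_floor=0.1, nperseg=512):
    """Binaural spectral subtraction with one shared gain per
    time-frequency bin, so both ears receive identical processing."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)

    def channel_gain(X):
        power = np.abs(X) ** 2
        reverb = reverb_power_estimate(power)   # estimated reverberant power
        g = 1.0 - reverb / (power + 1e-12)      # spectral-subtraction gain
        return np.clip(g, gain_floor, 1.0)

    gL, gR = channel_gain(L), channel_gain(R)
    if link == "min":
        g = np.minimum(gL, gR)
    elif link == "max":
        g = np.maximum(gL, gR)
    else:                                       # "average"
        g = 0.5 * (gL + gR)

    _, out_left = istft(g * L, fs=fs, nperseg=nperseg)
    _, out_right = istft(g * R, fs=fs, nperseg=nperseg)
    return out_left, out_right
```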
>>: [indiscernible]. So you mentioned that applying some suppression techniques [indiscernible] independently can cause [indiscernible]. How are you able to [indiscernible] this?

>> Eleftheria Georganti: You ask the user where the source was at the beginning and where it is after the processing. So you can have --

>>: Users [indiscernible] recordings before and after?

>> Eleftheria Georganti: Yes, this is one way to do it, yes.

>>: [indiscernible] spatial cues?

>> Eleftheria Georganti: Sorry, but in this case, the person who did it was not destroying the binaural cues, so we didn't actually have to check whether the position would be destroyed, because we used the same gain on both signals. But yeah, the number would be 15 or 20 test subjects; usually we don't have so many test subjects for these tests.

>>: Was there something on the objective side used to [indiscernible] algorithm which does the same?

>> Eleftheria Georganti: Of course, we have used the objective measures; they calculate [indiscernible], and there are two others, but they are widely used.

>>: [indiscernible].

>> Eleftheria Georganti: They are single channel, that's true, and of course you have to evaluate them separately for each channel. To my knowledge, there are no real ways right now to evaluate and check whether a binaural [indiscernible] algorithm works well based on these metrics. First of all, the metrics are already a problem even in the single-channel scenario, although there are already some years of work there. In the binaural scenario it gets even more complicated, because you also have to think of the localization cues and other attributes.

>>: Thank you.

>> Eleftheria Georganti: And the last thing that I have spent quite some time on is related to architectural room acoustics, where there is a lot of experience in my group, mainly on the simulation of the acoustics of ancient Greek theaters, because there are many of them, actually, in Greece. In the last years, after we had simulated all of the [indiscernible] and wanted to do something more, we tried to simulate the acoustic effect of wearing a mask, because in the past the actors would usually wear a mask while playing tragedies and so on. So the masks changed the [indiscernible] of the actor's voice for the listener but also for the spectators. We collaborated with a director, so not an engineer; it was someone who was actually constructing the masks. He wanted to somehow explain some of the audible effects that the actors would mention: that the masks are felt to vibrate, that they resonate at specific frequencies, that the position of the mask changes the acoustic effect, or that they, let's say, distort the localization cues. So we did some measurements with a dummy head; that's the dummy head we have. These are the masks; there are various types, with open ears, closed ears, open mouth, and there are some others with a closed mouth. We measured the response using a loudspeaker and the microphones in the ears, but also with the loudspeaker on the opposite side, and also using the loudspeaker at the mouth as the source, so as if the wearer were speaking, and then recording the responses at the ears. This leads to the self-perception effect. And you can see some results, which I don't know are so interesting right now; for various degrees, this is the self-perception effect. These measurements were done outdoors, since we don't have an anechoic [indiscernible]; it is very expensive to have one.
And yeah, we have [indiscernible] some conclusions, which I don't know if they are so interesting for you right now, concerning the boost of some frequencies at specific orientations [indiscernible], and there is a cut in the high-frequency range.

So for the future, as I'm somehow, let's say, closing this first part of my work, I have thought of some ideas for future work. I would be interested to see how one can modify binaural cues, or the signals themselves, to create a different distance perception for the listener. I would be interested to see whether my work could also be employed in spatial audio encoding/decoding techniques, where many times you need some information about the [indiscernible] environment in order to encode it in the signal. I would like to extend my statistical analysis for the single-channel scenario, the [indiscernible] deviation and all these things I have referred to, to the binaural case and see in more detail how the spectral standard deviation changes in the binaural case: do we get higher values, do we get this 5.6 dB value of [indiscernible]? And I would try, apart from just observing, to also establish a theoretical framework for this. Of course, the early reflections are always an open topic, actually still an open topic; we cannot use statistical models, to my knowledge, to model them, but it would be nice if we could do something in this direction. There is also the idea of getting an acoustic perceptual map: using, let's say, binaural signals, which is the common situation for human listeners, and then being able to say, okay, how far is the sound source, how many sound sources are present, all these issues related to [indiscernible] auditory signal analysis, where I hope that my work would also assist. And finally, to explore the information extracted from the signals for signal enhancement [indiscernible] and see if we can improve algorithms for hearing aids or other similar applications. So thank you very much. Yes, please, I would be glad to answer any questions.

>>: You said you used [indiscernible] features [indiscernible]. Are all of these features based on [indiscernible] or also based on the temporal [indiscernible]?

>> Eleftheria Georganti: Both temporal and frequency domain.

>>: You get a lot of information.

>> Eleftheria Georganti: Exactly.

>>: How consistent is this [indiscernible] over different placements, same distance, different placements inside the room? Say, if you're standing in the corner, does it get worse, or --

>> Eleftheria Georganti: Yeah, for the binaural case, the truth is that I haven't tried it, so I haven't done another pair of positions with the same distance. But for the single-channel scenario, it works well. The good thing with these techniques -- because for the binaural [indiscernible] there are other techniques in the literature; it's not like I'm the first person who did that -- is that those are really sensitive to small changes. As you say, for example, if you go to another pair of positions, or if you move the microphone a little bit further, to 1.1 meters instead of one meter, they fail, because they are very accurate but also very sensitive. This feature seems to be more robust; you probably cannot get the best performance, the top performance, the 100 percent performance, but it seems like it works quite well in this respect.

>>: [indiscernible] then you're changing.

>> Eleftheria Georganti: Yes, exactly.
So for the binaural distance scenario, I should admit that I haven't tried it, but I think it will work, maybe not equally well, but similarly well.

>>: I have a question. Okay, you have your [indiscernible] to the sound source at zero degrees, so presumably the [indiscernible] is actually symmetric left and right. So pretty much the room impulse [responses] three meters to the left and to the right are, okay, pretty much the same; it's symmetric, because in the later [indiscernible] you see kind of the same statistics. Why does the difference between those signals, which are both statistically kind of the same, matter so much? Do you have any idea? [indiscernible].

>> Eleftheria Georganti: Well, the reason I use this differential thing is that you get rid of the signal. Of course, you could say: if the left and the right signal are almost the same, why don't you use only one signal, for example? But since we have two signals and we subtract them, we get something that is, let's say, signal-independent; it takes out the signal. This is the reason I have used the subtraction. According to the statistical analysis, the anechoic signal drops out and you end up with only a statistical attribute of the responses. Otherwise you would have the signal inside, which would have, of course, all these [indiscernible] deviations, which are difficult to model.

>>: But [indiscernible] I can have up to 20 dB attenuation for certain frequencies.

>> Eleftheria Georganti: Yes, so in this direction, I think that what makes this thing work is actually the classifier, which gets trained for this specific [indiscernible] and then does the work. But there is definitely something missing in this work of mine. For the single-channel case, I know very well how the standard deviation behaves for different frequency bands, whether it goes to 5.6 dB. But when you are in the binaural case, you have another situation: you have the sound effect, you have all these frequency ranges that are a little bit more boosted in the [indiscernible] frequency range, and all these differences when you compare a binaural response with a single-channel response. So I think one should do the same analysis also in the binaural case, but not only for zero degrees, because in the single-channel case you have only zero degrees and you're relaxed. One has to do this for several angles, and then you also have to find a theoretical, let's say, framework to better describe it, and then it might be easier to say why this thing works. So right now, I think it works for other angles because of the classifier.

>> Ivan Tashev: More questions?

>>: So you didn't talk at all about noise. What happens if -- I mean, have you tried, you know, you train your classifier and then at test time there is [indiscernible], the refrigerator is humming and the computer's on, and a kid is screaming and the projector is on?

>> Eleftheria Georganti: Yeah, I haven't tried to see how these methods work with noise at all. I have never tried it, but I suppose one could use some noise suppression techniques, subtract the noise somehow, if it's not too bad, of course, and the signal-to-noise ratio is quite okay, and then I think that you could probably get some good results with these methods. But I haven't tried it.
But I think that you would need this first stage of noise suppression before you can apply these methods.

>> Ivan Tashev: More questions?

>> Eleftheria Georganti: If not, thank you very much. Thank you very much for your time.