>> Ivan Tashev: Good morning, everyone, those who are in the
room and those who are attending the talk remotely. Today, we
have the pleasure to have Eleftheria Georganti. She's a Ph.D.
student at the University of Patras in Greece. She got her Master's
degree in electrical and computer engineering in 2007.
So for the last six years, she has been a Ph.D. student and had
the opportunity to visit a lot of places, spending nine months at
the Technical University of Denmark. Then she visited
Philips Research in Eindhoven, in the Netherlands, and spent some
time at the University of Oldenburg, always doing [indiscernible]
research and processing.
So today, she's going to talk about her findings and her results.
Without further ado, Eleftheria, you have the floor.
>> Eleftheria Georganti: Thank you very much for the
introduction. So my name is Eleftheria Georganti. I'm
pronouncing it also in the Greek way, and I will give you an
overview of my research experience over the last six years of my
work.
So the contents of this presentation: I will initially introduce
myself, although I already got a good introduction. Then I will
let you know the topic of my Ph.D. thesis and I will
provide you an overview of it. And I will emphasize four
segments of my Ph.D. work related to the analysis of statistical
properties of responses and signals. I will present to you two
methods for the estimation of distance between a source and a
receiver from single channel and binaural signals, a method for
room classification, and a framework for a method for the
estimation of acoustical parameters, such as the direct to
reverberant ratio and clarity.
And I will also refer to some side projects which are not
actually part of my Ph.D. work, but I have spent some time
working on other topics while being at Philips or while
collaborating with other colleagues in my laboratory at the
University of Patras, related to automatic calibration of an
ambient telephony system, subjective evaluation of signal
enhancement techniques, and finally some work related to
architectural room acoustics, in which we have quite some experience
in my group. And I will finish the presentation with some ideas
for the future.
So the first thing is where I come from, which is the University of
Patras. You can see this red dot, this is where Patras
is, it's like 250 kilometers from Athens. My group is the audio
and acoustic technology group, which is a part of a bigger group
called Wire Communication Laboratory and has all these other
groups you can see here.
We are actually a small group, I think, for [indiscernible] so my
professor, like, the head of the group is Professor John
Mourjopoulos. We have three post doc students and we have three
Ph.D. students, and there is a wide variety of topics that we've
been working on, but the expertise of the group is audio signal
processing, room correction, signal enhancement techniques. But
there are also some other topics but they're not so signal
processing oriented. Let's say, for example, for this was
doing some work on novel sound reproduction devices, or Bob is
working with acoustic energy harvesting.
Concerning me, [indiscernible] I told you about my background.
I've been working on my Ph.D. since 2007, and had the chance to
collaborate with other groups, as you can see here, and have
spent nine months at the Technical University of Denmark
collaborating with Dr. Finn Jacobsen, who unfortunately passed
away last week. And I got a good insight into physical acoustics,
and this is something in which I didn't have such a good
background before that.
Then another nine months at Philips Research, collaborating with
Dr. Stephen van de Par and Aki Harma. And since we had
established a good collaboration, I spent another three months at
the University of Oldenburg, where Dr. Stephen van de Par got a
professor position last year.
And I'm now in the final stage of my Ph.D. degree. I have
actually already finished the writing and presented
to the first committee, and I still have to do the final
presentation in September. And something else that
I have also participated in is the
intellectual group of AABBA, which is a grouping of various
laboratories in Europe and also in the States, and everybody's
working with binaural techniques and methodologies and we try to
exchange, let's say, collaborate and further improve the
understanding of the mechanisms, but also propose some novel
methodologies for signal processing.
So we'll start with a basic concept of my Ph.D. work and
everything that has to do with the acoustics, and this is
reverberation. We know very well that room acoustics introduces
reverberation. And we can fully describe this phenomenon with
the room impulse response, which can be easily determined
following the standard methodology by driving a sweep signal.
For example, a specific [indiscernible] room and [indiscernible]
microphone, we can divide it [indiscernible] and we get the
expressions of the [indiscernible] response in the time domain
and the room transfer function in the
frequency domain. And, of course, these are complete acoustical
descriptions containing all the information we need for the
acoustical properties of the room, and how the acoustics affect
the signals reproduced in the rooms.
And, of course, it's known that the room responses will vary
across different positions in the room. So, for example, here
you can see at position one, we would have a more flat response
and less [indiscernible] in the frequency domain than in the case
of position two, which is at a further distance.
My Ph.D. topic is entitled modeling, analysis and processing of
acoustical room responses and signals under reverberant
conditions. And bearing in mind all these things that I told you
in the previous slide, that the room responses contain all this
important information for the acoustic engineer, my task was to
try to determine attributes that normally are easy to extract
from responses, but directly from signals.
So getting in the room, recording signals, and then from the
signals trying to estimate acoustical parameters, the distance
between the source and receiver and so on.
The way I have approached this problem is based on this simple
relationship: when the anechoic signal gets reproduced in
the room, we get the reverberant signal, which is a convolution
of the room impulse response for this pair of positions and
the anechoic signal. In the frequency domain we would have a
multiplication, but we can also view the expressions of the signal,
the anechoic and the reverberant signal, and the room transfer
function as distributions, as [indiscernible] distributions.
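The relationship described here can be checked numerically on toy data. The following is a minimal sketch, not code from the talk: the two short sequences are made-up stand-ins for an anechoic signal and a room impulse response, and the helper names are illustrative. It shows that convolution in the time domain becomes multiplication in the frequency domain and addition in the log-magnitude (dB) domain.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short examples)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def convolve(h, s):
    """Direct linear convolution of two sequences."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, hv in enumerate(h):
        for j, sv in enumerate(s):
            out[i + j] += hv * sv
    return out


def zero_pad(x, n):
    """Zero-pad a sequence to length n."""
    return x + [0.0] * (n - len(x))


def level_db(z):
    """Magnitude of a complex spectral value in dB."""
    return 20.0 * math.log10(abs(z))


# Hypothetical toy data: a short "anechoic" signal and a short "room response".
s = [1.0, -0.5, 0.25, 0.8]
h = [1.0, 0.0, 0.6, 0.3]

y = convolve(h, s)  # reverberant signal in the time domain
n = len(y)
Y, H, S = dft(y), dft(zero_pad(h, n)), dft(zero_pad(s, n))

# Frequency domain: Y(k) = H(k) * S(k)
max_mult_err = max(abs(Y[k] - H[k] * S[k]) for k in range(n))

# Log-magnitude domain: level(Y) = level(H) + level(S)
max_db_err = max(
    abs(level_db(Y[k]) - (level_db(H[k]) + level_db(S[k]))) for k in range(n)
)
```

Both error values come out at the level of floating-point rounding, which is the additive structure that the statistical analysis in the talk builds on.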
We tried to determine what are the statistical relationships
between these three components and tried to find whether some
statistical measures are related to acoustical parameters, for
example, the direct to reverberant ratio, clarity and so on. And
we tried to extract models for each of these three components, with
emphasis on the response part, on the room transfer function, and
as a final aim tried to examine the potential of using
this knowledge for the development of novel signal
processing techniques.
So we start with an overview of an analysis of
statistical properties of responses and signals. I hope it's
clear why I'm motivated to do this, and I hope it will get
clearer as I continue.
So as I have already mentioned, we have a [indiscernible] time
domain, which is [indiscernible] frequency domain. And then
we were trying to determine the different relationships between
these three expressions. I have already done some work, and the
first step was to examine this relationship
just by observing things, just taking the histograms, plotting
them for the anechoic signal, the response and the reverberant signal,
and trying to see if they're related, how they change with distance,
within other rooms and so on.
So we emphasized the measures of standard deviation, and there
is a good reason to do that, as I will let you know in a while,
and kurtosis, which seems to work quite well. We tried to
create some models for the standard deviation and the kurtosis of
the room responses and tried to estimate the statistical measures
[indiscernible] from the reverberant signals. So the idea was to
find a model for a room transfer function and then use only the
reverberant signal in order to extract information for
these statistical measures of the anechoic signal.
>>: [indiscernible] do you mean like a special signal like
[indiscernible] or speech?
>> Eleftheria Georganti: Speech. It can be any type of signal.
For this analysis, it can be any type of signal. The methods I will
refer to afterwards were developed for speech signals. But it
shouldn't be, of course, [indiscernible] signal. This is not to
use the [indiscernible] signals and use any [indiscernible]
signals.
So then [indiscernible] relationships, we have the
convolution in the time domain, the multiplication in the frequency
domain, and the addition in the [indiscernible] domain. Denoting
each of these three terms, which you can see here, where I have also
used different colors, we can assume that X and Y are independent
random variables. Then we can find some relationships for the standard
deviation. This is the spectral standard deviation, so we are in the
frequency domain, we look at the magnitude of the frequency
response, and this is the standard deviation of these three:
[indiscernible] reverberant signal, sigma X for the transfer
function, and Y for the signal.
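The slide equations are not reproduced in the transcript, so the following is only a sketch of the standard result that the independence assumption gives. Here X is the log-magnitude room transfer function and Y the log-magnitude anechoic signal, as in the talk; the symbol Z for the log-magnitude reverberant signal is an assumption made for this note.

```latex
Z(\omega) = X(\omega) + Y(\omega),
\qquad
\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2
\quad \text{(if $X$ and $Y$ are independent)}
```

Under this assumption, a model for the spectral standard deviation of the room response and a measurement on the reverberant signal together constrain the spectral standard deviation of the anechoic signal.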
And we can find some relationships for these two statistical
measures for these three... yes, please?
>>: [indiscernible].
>> Eleftheria Georganti: Sorry?
>>: [indiscernible] in the frequency domain or in general
they're complex.
>> Eleftheria Georganti: You mean a positive?
>>: Signal magnitude.
>> Eleftheria Georganti: Yeah, so, of course, you can always, say,
shift the frequency response wherever you want, so it's actually
the relationship between the spectral values that is
of interest for us. So the actual absolute values, the dB
values, whether they're positive or negative, are not of concern
at this point.
>>: Okay. [indiscernible].
>> Eleftheria Georganti: Yes, these are [indiscernible].
>>: Yeah. And the frequency [indiscernible] general statistics?
>> Eleftheria Georganti: Yeah, it's for specific frequency
bands, the analysis. You can use either one [indiscernible]
fraction analysis or wider frequency bands. I have done several
tests in order to find what would be the most appropriate
frequency and bandwidth analysis, depending on the application
that you want to end up with.
So the reason we have used the standard deviation is that there
is the well known work of Manfred Schroeder in 1954, in which he
showed that, for distances above the critical distance, the
[indiscernible] standard deviation
converges to a value that is close to 5.6 dB
when there is a diffuse sound field in the room.
And then also, Jetzt, in 1979, showed that the standard
deviation would increase with distance. When we are very close to the
[indiscernible], let's say you have a microphone and the sound
source very close, we will have a smaller spectral standard
deviation. Of course, the spectrum will be flatter, there will
be fewer deviations. So the more you increase the distance, the
more deviations will appear on the spectrum, and this can be
captured with the spectral standard deviation.
So we have an increase up to the critical distance of a room,
which depends on the reverberation time and the volume of each of
the rooms. And according to the [indiscernible] theory, above
this distance, you would have a specific standard deviation
value, which is 5.6 decibels.
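The 5.6 dB figure can be checked with a small simulation. This is a minimal sketch, assuming the usual diffuse-field model in which each frequency bin of the transfer function is an independent complex Gaussian value. The function names, the room volume and the reverberation time are illustrative, and the critical distance formula is the common textbook approximation for an omnidirectional source, not something stated in the talk.

```python
import math
import random
import statistics


def diffuse_field_spectral_std_db(num_bins=100_000, seed=0):
    """Spectral standard deviation (dB) of a simulated diffuse-field response.

    Assumption: every frequency bin is an independent complex Gaussian value.
    """
    rng = random.Random(seed)
    levels_db = []
    for _ in range(num_bins):
        re, im = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        levels_db.append(10.0 * math.log10(re * re + im * im))
    return statistics.pstdev(levels_db)


def critical_distance_m(volume_m3, rt60_s):
    """Textbook approximation for an omnidirectional source: 0.057*sqrt(V/T60)."""
    return 0.057 * math.sqrt(volume_m3 / rt60_s)


sigma_db = diffuse_field_spectral_std_db()
d_c = critical_distance_m(volume_m3=200.0, rt60_s=0.8)  # hypothetical room
```

The simulated `sigma_db` lands close to 5.6 dB, and `d_c` is roughly 0.9 m for this hypothetical room, which illustrates why the distance dependence of the feature is confined to fairly short source-receiver distances.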
So that's why we decided in these first steps of my analysis to
use the measure of standard deviation, for which there was already
some existing, let's say, knowledge. So here you can
also see some results using one [indiscernible] band analysis or
full band analysis.
>>: So this is the standard deviation of the spectrum of the
room impulses across the frequency base?
>> Eleftheria Georganti: Exactly, yeah.
>>: So [indiscernible].
>> Eleftheria Georganti: It gets more deviated. So this is the
standard deviation of the response, and we were trying to see, as
I already showed you, that we have found a relationship
between the standard deviations of the reverberant signal, the response
and the anechoic signal. We were trying to relate these three
components and using those [indiscernible] model.
So let's say this is actually the base, the framework that my
Ph.D. thesis was based on.
So as a further step, we decided to try to estimate distance from
single channel signals, and the idea is as you can see here in
the figure, we have a speaker which is situated at a specific
distance from the microphone, and this can be also used in, for
example, in an open conference call or if you have multiple
devices in a room, you might need to know the distance between
the speaker and the device.
So we were trying to find some ways to do that. One easy way was
to, okay, take the sound level, the sound [indiscernible] level
of a speaker and say okay, if it gets closer, it will get higher
and so on. But, of course, this is not a good idea because each
one of us would speak in a different way depending on whether
it's a man or a woman, and so on.
So we tried to use some statistical features that depend on
distance, and to employ pattern recognition techniques,
machine learning techniques, in order to develop a method for the
distance estimation. So as I have already told you, the first
thing to do is try to observe what is happening.
So here, you can see reverberant signals. They're speech
signals, the frequency spectrum of signals recorded at 0.5
meters, one meter, two meters, and three meters. And you can see
that the histogram changes, and these changes could be captured,
probably, with kurtosis and skewness. Kurtosis shows how far the
distribution is from the normal distribution, how peaky it gets or
how flat it is, and this changes with distance. And skewness shows how
symmetrical it is.
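The two measures mentioned here are the third and fourth standardized moments. The following is a minimal sketch with made-up numbers; in the setting of the talk the input would be the spectral values of one frame of the reverberant signal, which is an assumption of this example and not the speaker's code.

```python
import statistics


def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)


def kurtosis(values):
    """Fourth standardized moment: 3 for a normal distribution."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 4 for v in values) / len(values)


# Hypothetical data: an evenly spread set and a "peaky" set with heavy tails.
flat_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
peaky_values = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -3.0, 3.0]

flat_skew = skewness(flat_values)
flat_kurt = kurtosis(flat_values)
peaky_kurt = kurtosis(peaky_values)
```

Both sets are symmetric, so their skewness is zero, while the peaky set has the higher kurtosis. Tracking these two numbers per time frame gives the kind of feature trajectories described next.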
So we use these two features and here you can see how they behave
across time, across different time frames, the skewness and the
kurtosis. And it's evident that they're really distance dependent.
So the idea was to use these two attributes and also some other
ones related to the [indiscernible] of the linear prediction
residual, which works mainly for speech, and also a [indiscernible]
detector, which is this last line of the code. And using all
these parameters and employing pattern recognition techniques,
okay, we could develop a method for distance estimation from
single channel signals.
For this method, we have employed Gaussian mixture models and
used the five features I have already shown to you, and their
filtered versions only for 10 to 15 kilohertz. So full band, but
also a smaller frequency range. And we have evaluated the
method, and you can see here that when the system is trained with
the same speakers that it is evaluated with, the method got 72 percent.
And then when the speaker is unknown, we got a worse performance.
We also conducted a listening test, where we used
the same signals as the signals with which we tested the method, and we
asked the test subjects to sort the signals in terms of distance
and say what sounds further and what sounds closer. And here you can
see the results of this analysis, which are quite close to the
unknown speaker situation.
>>: Is this all in the same room?
>> Eleftheria Georganti: Yes, we did the test in two rooms.
>>: Test in two rooms?
>> Eleftheria Georganti: Yeah, but for all these results, the
training and the testing are in the same room. So you train in
one room, and you test it in the same room.
>>: [indiscernible].
>> Eleftheria Georganti: Exactly. So you have a different
course, this is one of the biggest problems of distance
estimation techniques right now, to find a way to have them work
in unknown room environments.
>>: How many [indiscernible].
>> Eleftheria Georganti: Yeah, maybe about 20 minutes of
recordings for the training and then another ten minutes for the
testing of the method.
>>: And the [indiscernible].
>> Eleftheria Georganti: Yes. [indiscernible] yeah, okay. So
another thing that I have been working on was trying to estimate
distance from binaural signals, which should be easier
to do than from [indiscernible] signals where you have only one
input. For this reason, we have extended our analysis framework
to this binaural case scenario.
Here, we're also going to write the relationships as previously,
but in this case, we'll have two channels. So a convolution for
the left channel and a convolution for the right channel,
a multiplication, similarly, in the frequency domain, and then in the
[indiscernible] domain we can simply write the same
relationships. And then we [indiscernible] subtract the
right spectrum from the left spectrum of the [indiscernible]
signals.
So by doing this, we can get rid of this anechoic signal factor,
as you can see here, and in this way, we can determine some
relationships for the standard deviation. So the standard
deviation of the difference between the spectra of the left and
right [indiscernible] signals would be related
only to the standard deviation of the responses, which are
the binaural responses in this case.
So the idea is that such a methodology could lead to the
determination of a novel distance estimation parameter. So as
you can see here, we do the [indiscernible] of subtracting the
left and the right signal spectrum, reversing the spectrum, and
we calculate the standard deviation, which is proved, under
specific circumstances, to be related only to the standard
deviation of the binaural responses, as you can see here.
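The cancellation described above can be illustrated in the log-magnitude domain. This is a minimal sketch, not the method's actual implementation: the spectra are random made-up values in dB, the function names are illustrative, and the example assumes the ideal case in which each ear signal is exactly the source spectrum plus the corresponding binaural response.

```python
import random
import statistics


def differential_spectral_std(left_db, right_db):
    """Standard deviation of the left-minus-right log-magnitude spectrum."""
    diff = [lv - rv for lv, rv in zip(left_db, right_db)]
    return statistics.pstdev(diff)


def reverberant_db(source_db, response_db):
    """In dB, the reverberant spectrum is the source plus the room response."""
    return [s + h for s, h in zip(source_db, response_db)]


rng = random.Random(0)
num_bins = 256

# Hypothetical log-magnitude spectra (dB): two ear responses, two sources.
h_left = [rng.gauss(0.0, 5.0) for _ in range(num_bins)]
h_right = [rng.gauss(0.0, 5.0) for _ in range(num_bins)]
source_a = [rng.gauss(-20.0, 8.0) for _ in range(num_bins)]
source_b = [rng.gauss(-30.0, 3.0) for _ in range(num_bins)]

feature_a = differential_spectral_std(
    reverberant_db(source_a, h_left), reverberant_db(source_a, h_right)
)
feature_b = differential_spectral_std(
    reverberant_db(source_b, h_left), reverberant_db(source_b, h_right)
)
feature_rooms = differential_spectral_std(h_left, h_right)
```

All three values agree up to rounding: the source spectrum drops out of the difference, so the feature depends only on the pair of binaural responses.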
So as I have already mentioned, if this was just the room
response, we know very well that the standard deviation really
depends on the distance between the source and the receiver.
It's a highly distance dependent feature when we are below the
critical distance of the room. But this is not straightforward
in the case of binaural responses. At least to my knowledge, there
was nothing, let's say, well established.
So the idea was to check if this feature is distance dependent.
And interestingly, but probably it was somewhat expected from my
side, this differential standard deviation really depends on the
distance. So here, you can see for an orchestral musical piece,
this is the time domain. This is the feature that I have
[indiscernible] dimension and the different colors correspond to
the different distances between the source and the receiver.
And similarly, for a signal consisting of guitar, you can see
that the feature seems to be highly distance dependent. And
seems to behave in a signal independent way. So the idea was to
use this feature and Gaussian mixture models, for which we already had
the framework from the first method I have presented to you,
using frames of two seconds to extract this feature,
in order to develop a method and then evaluate what the
performance is.
So here, you can see for a room, Carolina Hall, I'd say available
[indiscernible], the mean performance was 75.5 percent when the
system is tested with five different distance classes.
And then we were trying to find some ways to increase the
performance and get some better results. And the idea was to
introduce a binaural feature extraction framework. So since we
had the binaural signals available, we were able to easily
extract binaural cues, interaural time difference,
interaural level difference and coherence, for several
frequency bands and then calculate several statistical attributes
of those features.
So this would lead to the calculation of [indiscernible] more
additional features, which could also be used in our
classifier to [indiscernible] obtain better results.
So here [indiscernible] this is really an
engineering approach, so we were just trying to feed more features
into the [indiscernible] classifier. At this stage, we were not
really concerned with why these statistical features of
binaural cues should be, let's say, distance dependent or
something like that. But this was the first step of
the development of this work.
So here you can see the results, where now our mean performance
increased by about 20 percent. And we have used only four of these
features out of the 430 that I showed you before, which
we have isolated using feature selection algorithms.
And so now I can answer your question: this is again
performance in one room, so we train the system in one room and we
test it in the same room, but it was of interest to examine what
would happen when we go to other rooms. So here, for the
easy task of having only three rooms, so three different classes,
you can see that the method seems to work quite well.
You can find more information in a paper that is now in press, where we
have analyzed in more detail the things I'm referring to right now.
>>: [indiscernible]. How you generated the [indiscernible]. Is
this a [indiscernible] provided by the database, or you measure
them by yourself?
>> Eleftheria Georganti: Okay. So we have some binaural
responses from a database, but we also have some from some rooms
at the university and in a conference center. So I use all these
responses. I convolved them with the anechoic signals. I have a
database, and then I use the database for the training and the
testing.
>>: [indiscernible].
>> Eleftheria Georganti: Using the [indiscernible]. So we go to
the room, we take the [indiscernible], we put the
loudspeaker at different distances, so one, two,
three, four, five meters and so on, and we calculate responses by
the [indiscernible] the recorded signal with anechoic
[indiscernible] has been reproduced.
>>: [indiscernible]?
>> Eleftheria Georganti: Yes, but there is, of course, software
that does everything easily. So, for example, there is, I
don't know if I should refer to it, okay, but there is a specific
software, but of course you can always do it with Matlab. I have
the code. But it's something that is kind of straightforward to
do. It's not difficult. It's just a deconvolution of the input
signal and the output signal.
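The deconvolution step mentioned here can be sketched as a regularized spectral division. This is not the speaker's Matlab code: it is a minimal NumPy sketch in which a short noise burst stands in for the sweep, the "room" is a made-up eight-tap response, and the function name and the `eps` regularization constant are assumptions.

```python
import numpy as np


def estimate_impulse_response(excitation, recording, eps=1e-12):
    """Estimate an impulse response by frequency-domain deconvolution."""
    n = len(recording)
    X = np.fft.rfft(excitation, n)  # excitation spectrum, zero-padded to n
    Y = np.fft.rfft(recording, n)   # recorded (reverberant) spectrum
    # Regularized version of H = Y / X to avoid dividing by tiny values.
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)


rng = np.random.default_rng(0)
excitation = rng.standard_normal(64)  # stand-in for the sweep
h_true = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.1])  # toy "room"
recording = np.convolve(excitation, h_true)  # what the microphone records

h_est = estimate_impulse_response(excitation, recording)
```

In this noise-free toy case the first eight samples of `h_est` match `h_true` and the remaining samples are essentially zero. With real recordings the regularization constant and the excitation design matter much more.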
And this takes me now to the introduction of the room classification
method. So since we had all these interesting features, binaural
features, we...
>>: [indiscernible].
>> Eleftheria Georganti: Yes, please.
>>: It seems like your distance estimation is true, like a
classification [indiscernible]. So I don't know, you know,
[indiscernible] two seem like a regression [indiscernible].
>> Eleftheria Georganti: Yeah, so...
>>: Or two meters. Or later regression classification is more
appropriate.
>> Eleftheria Georganti: Yeah, so [indiscernible] works well.
So that's why we have used GMM. And also, in the binaural distance
measurement technique, we also used a [indiscernible] vector
machine, which is even more, it seems like they're
much more robust and they can work better in unknown
environments. So, of course, I did some tests with
regression. I would prefer if it could work with linear
regression, because it's a procedure where you can fully
understand, somehow, why some of the features are, let's say, not
relevant for the task. But at first the results were
not so good.
So right now, the methods, as they have been developed, are only
oriented to work for specific distance classes, and then you have
to find some tricks if you want to find a
different class between the two classes.
>>: [indiscernible].
>> Eleftheria Georganti: Yeah, so in some places, it might go in
the other, if it's 1.5, you would have almost 50 percent
confusion. So you might go on either side. But in the
case when you're at 1.1 meters and then you have the other
class at two meters, then you would get it in the one meter
class. So it seems to be quite robust to small distance
mismatches. Yeah. And that's why it works also quite well in
[indiscernible].
So since we have all this binaural feature framework, we were
trying to think of other ways to exploit this information. So
one idea was whether we could use such a framework for the
classification of rooms. But a good question here would be how
can one classify rooms? According to the reverberation time,
to the volume or how big they are, or according to some
acoustical parameters and how they behave in the rooms, or is
there another way to do it? What are the ways that can tell
me that now I'm in a lecture room and I'm not in a big
auditorium? What makes this room similar to another lecture room
in the same building?
And so, of course, this is not an easy question to answer. But
we found in the literature the work of Floyd Toole. He defines
three room categories, small, medium, large, but this is for sound
reproduction purposes only. And he somehow
classifies them according to the reverberation time.
So we have used, again, these databases with binaural room
responses, and we have separated the rooms into small, medium and
large. We have created the different audio files and we
were trying to find, well, if these binaural statistical
features are related to some of these room properties.
So here you can see again the binaural feature statistical
framework. And the rooms that we have used for our analysis. So
with the blue color, you can see the rooms that were used for the
training and the black color are the ones that were used for the
testing of the method.
And here is the performance of the method using Gaussian
mixture models. So we got, let's say, some insight that such a
framework would [indiscernible] some other, let's say,
[indiscernible] techniques. For example, this room
classification, which is really abstract, of course.
And some further work of mine is related to the acoustical
parameters. In this case, we were also trying to estimate
parameters, not from the measured
responses but from the signals. So I have done some work related
to the estimation of clarity. Clarity, if you are aware,
shows the ratio of the energy of the early part,
of the first 50 milliseconds of the response, over the
[indiscernible] part of the response.
>>: [indiscernible].
>> Eleftheria Georganti: Yes, the 50 milliseconds. And then
over the late part, over the rest of the response.
So this is a very typically used acoustical parameter, and it
shows whether there is a good [indiscernible] signal and many
times, it's used to tune signal [indiscernible] techniques, for
example, dereverberation. And so we have used this database of
features using two second frames for the extraction, and we have
extracted the clarity of binaural responses, and these are
actually the values that were calculated from the
responses. And these values were used for the training of the
method and the other ones for the evaluation.
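The reference clarity values computed from the responses follow directly from the definition given above. This is a minimal sketch with a made-up impulse response; the function name is illustrative, and it assumes that the response has been trimmed so that index 0 is the arrival of the direct sound.

```python
import math


def clarity_c50_db(impulse_response, sample_rate_hz):
    """C50: energy of the first 50 ms over the energy of the rest, in dB.

    Assumption: index 0 of the response is the arrival of the direct sound.
    """
    split = int(round(0.050 * sample_rate_hz))
    early = sum(v * v for v in impulse_response[:split])
    late = sum(v * v for v in impulse_response[split:])
    return 10.0 * math.log10(early / late)


# Hypothetical response at 1 kHz sampling: 50 ms of early energy, 50 ms of tail.
sample_rate_hz = 1000
toy_response = [1.0] * 50 + [0.1] * 50

c50_db = clarity_c50_db(toy_response, sample_rate_hz)
```

Here the early part carries 100 times the energy of the late part, so `c50_db` comes out at 20 dB. Values like this, computed per measured response, are what the regression described next is trained to predict from signal features.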
In this case, we used the linear regression technique, and it was
found that the variance of the [indiscernible] time difference
was given the highest weight. So for the other features,
the factors that they were multiplied by in this linear
regression equation were very close to zero. So all these
other features were found to be less important.
And here you can see the results of this analysis. On the vertical
axis, you can see the error. And then on the horizontal axis are
the actual clarity values. So it works not so well, but it
depends on what you want to do.
And here, you can also see the predicted and the
[indiscernible] clarity values as a function of the time frame.
So this is another indication that probably such an analysis and
a framework for again a list of signals [indiscernible] assist
the estimation of other parameters. For example, the clarity or
the direct to reverberant ratio, which is another acoustical
parameter.
In this case, instead of having the first 50 milliseconds, it's
calculated only using the direct signal, which is the main peak,
and then maybe five more milliseconds after the main peak, and it's
another important and widely used parameter in [indiscernible]
acoustics.
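The direct to reverberant ratio described here differs from clarity only in where the response is split. This is a minimal sketch with a made-up impulse response; the function name is illustrative, and the choices to start the direct window at the main peak and to ignore any samples before it are assumptions of this example.

```python
import math


def direct_to_reverberant_ratio_db(impulse_response, sample_rate_hz,
                                   window_ms=5.0):
    """DRR: energy from the main peak to window_ms after it, over the rest.

    Assumption: samples before the main peak are ignored.
    """
    peak = max(range(len(impulse_response)),
               key=lambda i: abs(impulse_response[i]))
    end = peak + int(round(window_ms * 1e-3 * sample_rate_hz)) + 1
    direct = sum(v * v for v in impulse_response[peak:end])
    reverberant = sum(v * v for v in impulse_response[end:])
    return 10.0 * math.log10(direct / reverberant)


# Hypothetical response at 1 kHz sampling: a unit peak, then a weak tail.
sample_rate_hz = 1000
toy_response = [0.0, 0.0, 1.0] + [0.0] * 5 + [0.1] * 100

drr_db = direct_to_reverberant_ratio_db(toy_response, sample_rate_hz)
```

In this toy case the direct part and the tail carry the same energy, so `drr_db` is 0 dB, which corresponds to a receiver sitting at about the critical distance.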
And at the same time, there was also some already
existing work that relates the standard deviation of the
responses with the direct to reverberant ratio. So using the same
framework as before, by subtracting the left and the right
reverberant signal spectra, but in this case, assuming again
[indiscernible] independent, we can find a relationship for the
standard deviation. And then it can also be proved that this
differential spectrum is related to the direct to reverberant ratio.
>>: [indiscernible].
>> Eleftheria Georganti: Excuse me?
>>: You're assuming this is [indiscernible] the left and right
signals?
>> Eleftheria Georganti: Yes.
>>: Which is not correct. Okay.
>> Eleftheria Georganti: Which is not correct always, yeah.
>>: What is the sound source? You have this set of
[indiscernible] where you put the sound source always in front?
>> Eleftheria Georganti: Yeah, so for the binaural
[indiscernible] estimation technique, we have done this for all
[indiscernible]. So it works well if you, like, train the system
with various angles and then you test with various angles, it
works quite well. But yeah, of course we have also tried it
not only for zero degrees. Otherwise, it could not be published,
of course, because the reviewers would... It
changes, the signals change a lot,
and it's normal.
And okay. So yeah, concerning this statistical
independence here, this is actually work that is still in
progress. Actually, I was trying not to
assume this statistical independence, but these relationships
become much more complicated. So I was trying to see what happens if we
assume this independence, which could be true for high
frequencies and other specific circumstances.
And we end up with a relationship that relates the standard
deviation of this [indiscernible] with the direct to reverberant
ratio. And that was quite interesting in my opinion.
And here, you can see that we've tested this method for three
different signals, cello music, guitar, and a speech signal, for five
different rooms. And you can see on the vertical axis the
standard deviation error, because you would also estimate the
standard deviation. But this is the most important thing. It's
the direct to reverberant ratio error, and in this case we used
only zero degrees for the results I'm presenting here.
And the prediction error was always less than three dB, and it
was found that the direct to reverberant ratio could be predicted
from binaural signals.
This somehow sums up in a very fast way the work that is more,
let's say, related to my Ph.D. thesis, which is actually part of
the text that is going to be my Ph.D. thesis. But there are also
other things that I have done during my Ph.D.
studies. And so when I was at Philips for a few months, I
worked on an ambient telephony system.
And they were trying to estimate the position of various devices
situated in the room.
So each device would have a microphone and a speaker. There
would be like four devices in this setup, four devices in one
room. Another four devices in another room, and we were trying
to find a way to find where these devices are situated in the
room.
>>: [indiscernible] system?
>> Eleftheria Georganti: I can't say where, no. Okay. So you
have a speaker who is moving around in the room, and it's like
a conference, an open conference call. A hands free call, let's
say. And these devices are used to
render the sound, so if the speaker is here, close to this
device, then this device would play the signal louder. And then
the other devices would play the signal with less strength.
And the idea, of course, was also that these could be
more directed, to have a beam of sound only at the specific position
where the speaker [indiscernible], but this is for, like, further
steps [indiscernible].
>>: [indiscernible].
>> Eleftheria Georganti: But my task was this automatic
calibration, which was to find where these devices are situated.
So since there was a
>>: [indiscernible] device?
>> Eleftheria Georganti: Exactly, yes. And also find a way to
find which devices are in the one room and which devices are in
the other room and so on. So since there was a loudspeaker and
a microphone in each device, it was very easy to calculate the
impulse responses. So the idea was that, okay, the user buys a
system, pushes a button, gets the system calibrated, and can
start using it.
So I have used these impulse responses. The input of the signal
was always [indiscernible] response between all these points.
>>: So they got full [indiscernible] of the impulse responses
between each of the speakers and each of the microphones?
>> Eleftheria Georganti: Exactly. So the idea was to use the
multi dimensional scaling technique, which takes as input the
distances between points, and then it just returns the positions,
the actual XY set, depending if you're two or three dimensions,
of these devices. I hope it's clear. But I can elaborate.
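
As a rough illustration of the multi dimensional scaling step described here, the following is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the implementation used in the Philips system, which the talk does not describe; the function name `classical_mds`, the power-iteration eigen-solver, and the default parameters are assumptions made for this example. The input is a symmetric matrix of pairwise distances, and the output is a set of coordinates that reproduces those distances up to rotation, reflection, and translation.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Recover point coordinates from pairwise distances (classical MDS).

    dist: symmetric n x n list of lists with distances between devices.
    Returns n coordinate lists of length `dims`. The layout is only
    determined up to rotation, reflection, and translation.
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix B.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration: find the dominant remaining eigenpair of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        # Coordinate along this axis is eigenvector * sqrt(eigenvalue).
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflation removes this eigenpair so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

With exact distances between four devices at the corners of a 4 m by 3 m rectangle, the returned coordinates reproduce every pairwise distance. With measured, noisy distances the result is only a least-squares style approximation.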
Another thing that I also have some experience with is the
subjective evaluation of signal enhancement.
>>: So you used your other stuff to estimate the distances and
you put that
>> Eleftheria Georganti: When you have the impulse response,
it's easy to calculate the distance, because you calculate the
delay between the... yeah. And it's easy to say this is
>>: So all devices are going to the same computer?
>> Eleftheria Georganti: Exactly.
>>: So they're all synchronized.
>> Eleftheria Georganti: They're synchronized and you know the
delay of the sound current of these things.
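
The delay-to-distance step mentioned in this exchange can be sketched as follows. This is only an illustration under stated assumptions: playback and recording are sample-synchronized, the strongest peak of the impulse response is the direct path, and the speed of sound is about 343 m/s. The function name `distance_from_impulse_response` is hypothetical and not taken from the talk.

```python
def distance_from_impulse_response(h, sample_rate_hz, speed_of_sound=343.0):
    """Estimate the source-to-microphone distance in meters.

    h: impulse response samples, with sample 0 at the emission time
       (this requires synchronized playback and recording).
    The direct sound is assumed to be the largest absolute peak.
    """
    peak_index = max(range(len(h)), key=lambda n: abs(h[n]))
    delay_s = peak_index / sample_rate_hz  # time of flight in seconds
    return delay_s * speed_of_sound


# Example: direct sound at sample 140 and a weaker reflection at sample 300.
h = [0.0] * 1000
h[140] = 1.0
h[300] = 0.4
print(distance_from_impulse_response(h, 48000))  # roughly one meter
```

Running this for every loudspeaker and microphone pair would give the matrix of pairwise distances that a multi dimensional scaling step takes as input.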
>>: [indiscernible].
>>: So the problem is you have the full [indiscernible] between
the distances [indiscernible].
>> Eleftheria Georganti: Exactly, which is something that I
didn't do, but it was multi dimensional scaling that did it for
me. But I had to find this method somewhere. And apart from
this, there was also the thing of trying to find which devices
are in the one room, and then you could probably see which
responses had similar reverberation characteristics. So this
would mean that these responses are in this room, and these
responses are in the other room. Or you could see that this pair
of responses would be not so good. It would be like
[indiscernible] in between, so you would know that seven is
definitely in another room from device three and so on.
Okay. So this is more just to show you that I have done some
other things and it's not so, probably, let's say, exciting as my
research topic.
And okay. So I have spent some time collaborating with other
colleagues who have been mainly working with dereverberation
techniques. And I have sometimes helped with the evaluation of
the developed algorithms. And many times, we have conducted
listening tests in order to see what the listeners think of the
algorithms and, let's say, the perceptual artifacts of these
techniques.
So here I have mainly worked with dereverberation algorithms,
and we were trying to extend, let's say, the expertise that was
in the group in the development of dereverberation algorithms
for single channel signals to the binaural scenario.
And in this case, of course, you have several problems, because
if you do different processing in the right and in the left ear,
you would destroy the binaural cues, the [indiscernible] time
differences, level differences, coherence and so on. This would
also destroy the localization cues of the listener.
So the idea was to find a way to do this to, let's say, extend
the existing expertise and framework for the single channel
scenario to the binaural case.
And we have proposed some ideas to do the same processing here on
the left and on the right signal, by proposing some, let's say,
ideas on the calculation of this gain that you would use. I don't
know if you're actually aware of dereverberation techniques. One
way is to use spectral subtraction, where from some frequency
bins you subtract something, which is supposed to be the
reverberation.
So this is done using the factor D, a gain factor, which can be
different for each frequency bin. And the idea was to find a way
to use the same gain factor for the left and the right. And this
can be, for example, the minimum of the gains of these two
signals, the maximum, or the average.
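
The idea of applying one common gain to both ears can be sketched as below. This is a minimal illustration, not the group's algorithm: the magnitude spectral subtraction rule, the gain floor, the function names, and the per-bin reverberation estimates `rev_left` and `rev_right` are all assumptions made for the example. Only the combination rules (minimum, maximum, average) come from the talk.

```python
def subtraction_gain(magnitude, reverb_estimate, floor=0.1):
    """Per-bin gain for magnitude spectral subtraction, limited by `floor`."""
    gains = []
    for mag, rev in zip(magnitude, reverb_estimate):
        g = 1.0 - rev / mag if mag > 0.0 else floor
        gains.append(max(g, floor))
    return gains


def common_gain(gain_left, gain_right, rule="min"):
    """Combine left and right gains into one gain per frequency bin."""
    combine = {
        "min": min,
        "max": max,
        "avg": lambda a, b: 0.5 * (a + b),
    }[rule]
    return [combine(gl, gr) for gl, gr in zip(gain_left, gain_right)]


def dereverberate_binaural(spec_left, spec_right, rev_left, rev_right,
                           rule="min"):
    """Apply the same real-valued gain to both ear spectra (one frame).

    spec_left, spec_right: complex spectra of the left and right signals.
    rev_left, rev_right: hypothetical per-bin reverberation magnitudes.
    """
    g_left = subtraction_gain([abs(x) for x in spec_left], rev_left)
    g_right = subtraction_gain([abs(x) for x in spec_right], rev_right)
    g = common_gain(g_left, g_right, rule)
    out_left = [gk * x for gk, x in zip(g, spec_left)]
    out_right = [gk * x for gk, x in zip(g, spec_right)]
    return out_left, out_right
```

Because both channels are multiplied by the same real gain in each bin, the level ratio and the phase difference between left and right stay unchanged in that bin, which is why the interaural level and time cues survive the processing.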
So we did some
>>: Hold, hold, hold.
>> Eleftheria Georganti: Yes.
>>: I kind of missed the main point. You have [indiscernible]
try to do speech enhancement in dereverberation of the two
microphones.
>> Eleftheria Georganti: Exactly.
>>: So then after [indiscernible] user into [indiscernible].
>> Eleftheria Georganti: Exactly.
>>: So you do two channel speech enhancement and not
[indiscernible] of the speaker signal?
>> Eleftheria Georganti: No, two channel signal.
>>: [indiscernible] device [indiscernible] and the problem is
speech in the room [indiscernible].
>> Eleftheria Georganti: Yes, exactly. Bearing in mind that we
already had some techniques that were working in the single
channel case quite well and robustly, with not so many artifacts
and so on. But, of course, when you want to do it binaurally, it
gets more complicated. And this is an open research issue right
now, to my knowledge.
So I have also spent some time working with this, although I was
not a part of the development of these actual methods, but mainly
of the evaluation of those. And
>>: [indiscernible]. So you mentioned that providing some
suppression techniques, and the [indiscernible] independently can
cause [indiscernible]. How are you able to [indiscernible] this?
>> Eleftheria Georganti: You ask the user where was the source
at the beginning and where is it after the processing. So you
can have
>>: Users [indiscernible] recordings before and after.
>> Eleftheria Georganti: Yes.
>>: [indiscernible].
>> Eleftheria Georganti: This is one way to
>>: Spatial cues?
>> Eleftheria Georganti: Do it, yes. Sorry. Just but in this
case, this person who did it was not destroying the binaural
cues. So we didn't have actually to check whether the position
would be destroyed, because we would use the same gain on both
signals.
But yeah, the number would be 15 to 20 test subjects. Usually we
don't have so many test subjects for our tests.
>>: Was there something on the subjective side used to
[indiscernible] algorithm which does the same?
>> Eleftheria Georganti: Of course, we have used these objective
measures. They calculate [indiscernible] and there are two
others, but they are widely used.
>>: [indiscernible].
>> Eleftheria Georganti: They are single channel, that's true.
And, of course, so you have to evaluate them separately for each
channel. To my knowledge, this is not a [indiscernible]. There
are not, like, ways right now to evaluate and check if this
binaural [indiscernible] algorithm works great because it does
these metrics. First of all, the metrics are already a problem,
even in the single channel scenario, although there is already
some years of work. But in the binaural scenario, it gets even
more complicated, because you have to also think of these
localization cues and other attributes.
>>: Thank you.
>> Eleftheria Georganti: And the last thing that I have also
spent quite some time on is related to architectural room
acoustics. There is a lot of experience in my group, mainly on
the simulation of the acoustics of ancient Greek theaters,
because there are many, actually, in Greece.
So in the last years, after we had simulated all of the
[indiscernible] and we wanted to do something more, we have tried
to simulate the acoustic effect of wearing a mask. Because in
the past the actors would usually wear a mask while playing
tragedies and so on.
So the masks, they changed the [indiscernible] so the actor's
voice for the listener but also for the spectators. So we have
collaborated with a director, so not an engineer. It was someone
who was actually constructing the masks. And he wanted to somehow
explain some of these audible effects that the actors would
mention, like that the masks are felt to vibrate, that they
resonate on specific frequencies, or that the position of the
mask would change the acoustic effect, or that the localization
cues get, let's say, distracted.
And so we did some measurements with a dummy head. That's
the dummy head we have. So these are the masks. There are
various types with open ears, closed ears, open mouth. There are
some others with closed mouth. And we have measured the response
using a loudspeaker and the camera, the microphone in the ears
but also the loudspeaker on the opposite side.
Also using as loudspeaker the loudspeaker of the mouth, so
[indiscernible] speaking, then recording the responses in the
ears. So this would lead to the [indiscernible] perception
effect.
And you can see some results, which I don't know are so
interesting right now. For various degrees, this is the self
perception effect. These measurements were done in the open,
outdoors, since we don't have an anechoic [indiscernible] very
expensive to have one. And yeah, we have [indiscernible] some
conclusions, that I don't know if they are so interesting for you
right now, concerning the boost of some frequencies for specific
orientations [indiscernible], and there is a cut in the high
frequency range.
And so for the future, as I'm somehow, let's say, closing this
first part of my work, I have thought of some ideas for future
work. I would be interested to see how one can modify cues,
binaural cues, or the signals themselves to create a different
distance perception for the listener.
I would be interested to see if my work could also be involved
in spatial audio encoding/decoding techniques where, many times,
you need some information about the [indiscernible] environment
in order to encode it in the signal.
I would like to extend my statistical analysis for the single
channel scenario [indiscernible] deviation and all these things I
have referred to, to the binaural case and see in more detail how
the spectral standard deviation changes for the binaural case.
Is it like, do we get higher values, do we get this 5.66 dB value
of [indiscernible], and try also, apart from just observing, to
establish a theoretical framework for this.
Of course, the early reflections are always an open topic.
Actually, still an open topic. We cannot use statistical models,
to my knowledge, to model them. But it would be nice if we could
do something in this direction.
There is also the idea of getting an acoustic perceptual map. So
using, let's say, signals, you have binaural signals, which is
the common thing for human listeners, and then you can say, okay,
how far is the sound source, how many sound sources are present,
all these issues related to the [indiscernible] auditory signal
analysis, where I hope that my work would also assist in this
direction.
And finally, explore the information extracted from the signals
for signal enhancement [indiscernible] and see if we can improve
algorithms for hearing aids or other similar applications.
So thank you very much. I would be glad to answer any questions.
Yes, please.
>>: You said you used [indiscernible] features [indiscernible].
Are all of these features based on [indiscernible] or also based
on the temporal [indiscernible].
>> Eleftheria Georganti: Both temporal and frequency domain.
>>: You get a lot of information.
>> Eleftheria Georganti: Exactly.
>>: How consistent is this [indiscernible] over different
placements, same distance, different placements inside the room.
Say if you're standing in the corner, does it get worse, or
>> Eleftheria Georganti: Yeah, for the binaural case, the truth
is that I haven't tried it. So I haven't done another pair of
positions with the same distance. But for the single channel
scenario, it works well. So the good thing with these
techniques, because for the binaural [indiscernible] there are
other techniques in the literature. I'm not the first person
that did that.
But they are really sensitive to small changes. As you say, for
example, if you go to another pair of positions, or if you move
the microphone a little bit further, it's 1.1 meter instead of
one meter, they fail, because they are very accurate, but they
are also very sensitive. But these features seem to be more
robust. You probably cannot get the best performance, the top
performance, the 100 percent performance, but it seems like they
work quite well for these... this agreement in.
>>: [indiscernible] then you're changing.
>> Eleftheria Georganti: Yes, exactly. So for the binaural
distance scenario, I should admit that I haven't tried it, but I
think that it will work maybe not equally well, but similarly
well.
>>: I have a question. Okay, you have your [indiscernible] to
the sound source at zero degrees. So presuming that the
[indiscernible] is actually symmetric left and right. So pretty
much the room impulse [indiscernible] three meters to the left
and the right ear is, okay, pretty much the same. But it's
symmetric. Because during the later [indiscernible] see kind of
the same statistic. Why the difference between those signals
which are kind of both statistically the same matter so much? Do
you have any idea? [indiscernible].
>> Eleftheria Georganti: Well, the reason that I use this
differential thing is because you get rid of the signal. So of
course, let's say we use only one signal, and the assumption is
that the left and the right signal are almost the same. So why
don't you use only one signal, for example. But since we have
two signals and we subtract them, we get something that, let's
say, takes out the signal. So this is the reason that I have
used the subtraction.
So according to the statistical analysis, the anechoic gets out
and you only end up with a statistical attribute of a response.
Otherwise, you would have a signal inside, which would have, of
course, all this [indiscernible] deviations, which are difficult
to model and
>>: But have [indiscernible] I can have up to 20 dB attenuation
for certain frequencies.
>> Eleftheria Georganti: Yes, so in this direction, I think that
what makes this thing work is actually the classifier that gets
trained for this specific [indiscernible] and then it does the
work. But there is definitely something missing in this work of
mine. So I know that for the standard deviation, I know that for
the single channel case, I know very well how it behaves for
different frequency bands if it goes to 5.6 dB.
But when you are in the binaural case, you have another
situation. You have the sound effect. You have all these, let's
say, frequency ranges, but they are a little bit more boosted in
the [indiscernible] frequency range and all these differences
where you compare a binaural response with a single channel
response.
So I think one should analyze, do the same analysis also in the
binaural case, but not only for zero degrees, because with a
single channel case, you have only zero degrees and you're
relaxed. So one has to do this for several angles, and then you
have to find also a theoretical, let's say, framework to, yeah,
to better describe it. And then it might be more, let's say,
easier to say why does this thing work.
So right now, I think it works because of the classifier for
other angles.
>> Ivan Tashev: More questions?
>>: So you didn't talk at all about noise. So what happens if,
I mean, have you tried, like, you know, you train your classifier
and then at test time, there is [indiscernible] refrigerator was
humming and the computer's on, and the kid is screaming and the
projector is on.
>> Eleftheria Georganti: Yeah, I haven't tried to see how these
methods work with noise at all. I have never tried it, but I
suppose one could use some noise enhancement techniques, subtract
the noise somehow if it's not too bad, of course, and the signal
to noise ratio is quite okay. And then I think that you could
probably get some good results with these methods. But I haven't
tried it. But I think that you need this first stage of the
noise before you can apply these methods.
>> Ivan Tashev: More questions? If not, thank you very much.
>> Eleftheria Georganti: Thank you very much for your time.