>> Ivan Tashev: Good afternoon, everyone. It's the traditional five minutes
past the hour, so we can start. Today, we have Professor Kainam Thomas Wong
from Hong Kong Polytechnic University. He's an associate professor there, and
took his Bachelor's degree at the University of California, Los Angeles and his
Ph.D. from Purdue University.
He's going to talk today about acoustic velocity sensors and what kind of
beamforming and sound capture we can achieve with them. Without further ado,
Thomas, you have the floor.
>> Kainam Thomas Wong: As Ivan mentioned, my talk is about acoustic sensor
arrays. The objective we have is beamforming, and specifically, frequency-
invariant beamforming. What I mean by this term is that the beamformer weights
do not depend on frequency: not on the frequency of the incoming signals, not
on the frequency of the incoming interference, and not on the frequency of the
noise.
And we can realize this because of a special kind of sensor that we are going
to use. Not a microphone, not the [indiscernible], not a uniform circular
array, but a special kind of sensor called the acoustic velocity-sensor triad.
So basically, we have three velocity sensors in this system. So this is the
outline of my talk. First, what is a velocity-sensor triad? What are its
advantages for beamforming? And then, some specific algorithms for adaptive
beamforming using the velocity-sensor triad. Now, adaptive beamforming:
different people use this term to mean different things. What I mean here is
not a fixed beam pattern, not spatial matched filtering, but that some part of
the data model is unknown to the algorithm and changes over time, so the
algorithm itself adapts to the external environment.
And then there will be some data from a jury trial, using simulated speech
data. This presentation is actually based on a paper that just appeared last
month in JASA.
Okay. So what is a velocity-sensor triad? Now, this is the array manifold of
a velocity-sensor triad. So we actually have three components in the triad.
This is the measurement model. If the source comes in with elevation angle
theta and azimuth angle phi, this will be the measurement -- I mean, the array
manifold.
Now, notice that u, v, w turn out to be just the direction cosines along the x
axis, the y axis, and the z axis. So this is just that, this is just that, and
this is just that. This is just the property of the velocity sensor, and we
have three of them.
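For reference, and assuming the usual spherical convention (elevation theta measured from the z axis, azimuth phi in the x-y plane; the slide's exact convention is not shown here), the triad's array manifold is

$$
\mathbf{a}(\theta,\phi) \;=\; \begin{bmatrix} u \\ v \\ w \end{bmatrix} \;=\; \begin{bmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{bmatrix},
$$

the three direction cosines. Note that neither the frequency f nor the source range R appears anywhere in this expression.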
Now, notice some interesting properties of the array manifold. The array
manifold depends on the elevation angle and also on the azimuth angle. Very
importantly, there is no frequency, no wavelength here, unlike the ULA, the
uniform linear array, or the uniform circular array.
Now, this array manifold is applicable in the far field, with the source very
far away, or with the source relatively close by to the sensor itself. But
again, there's no R here. The R, the separation between the emitter and the
sensor, does not appear in the array manifold.
Now, those properties, that f does not appear here and R does not appear here,
will be really, really advantageous for adaptive beamforming, as we will see in
a moment.
Let me just skip over this page. Let me show you a picture of the velocity
sensor. Okay. So it's [indiscernible]. The velocity sensor for air acoustics
is actually commercially available, for example, from a company in Illinois.
And this is the spec sheet just downloaded from the company website: the
vertical axis is the [indiscernible] response, the horizontal axis is just
frequency. So according to this spec sheet, it has a pretty constant response
from 100 hertz to about maybe three to four kilohertz, within 3 dB. So this
kind of inexpensive velocity sensor would be good enough, perhaps, for speech
processing.
But that's just one example. There are other kinds of commercial models
available for the velocity sensor. A professor from the University of Illinois
Urbana-Champaign actually has a do-it-yourself kind of vector sensor. So this
is one, two, three velocity sensors and one microphone. So the x axis is here,
the y axis would be here, the z axis would be here. So they are not exactly
co-located, but relative to the wavelength, they're almost co-located.
Now, in here, there's an optional pressure sensor. It is optional. But there
are actually commercial products available for the vector sensor; one doesn't
have to do it yourself. This is a company in the Netherlands. This is a
picture I downloaded from the company website. According to the company, the
frequency range is from one tenth of a hertz to 20 kilohertz, within one dB.
So the human hearing range for a very young person. That's what the company
claims.
So the velocity-sensor triad is practical. It is commercially available. It's
been implemented. We can actually buy it. What we have been talking about is
for air acoustics, but actually, the velocity sensor has a long history, going
back over a century according to some references, according to some books. But
it mostly has its roots in underwater acoustics, for defense purposes.
So this is just the underwater-acoustic version, the vector hydrophone. This
is just the external case. So the black thing is just the support. For
example, a boat may tow an array of this type of vector hydrophone behind the
boat or behind a submarine.
Okay. So, so far, we see that the velocity-sensor triad is available if we
have money. Just go and buy it. So what are its advantages? Well, we hinted
at it a while ago. The advantage lies in the simplicity of its array manifold.
Simplicity in the sense that it is independent of frequency and independent of
R. Now, when we say independent of frequency, this is an idealization. When
we were looking at that product's [indiscernible] response, the
[indiscernible] does drop off after four kilohertz and so on. But it is
independent of frequency in the sense that, since the three components are the
same, the drop-off would be similar for all three. In that sense, it is
independent of frequency.
So there are actually four advantages that I'm going to talk about. The first
two are really simple; we mentioned the first one already. One unit, like the
Microflown in the picture: one tiny unit gives us the azimuth angle and also
the elevation angle. It doesn't have to, like a ULA, spread out on an array
grid. It can be very, very compact.
So the third advantage is what I would like to dwell on a little bit longer.
The array manifold does not depend on frequency, so when we do beamforming,
the beamformer weights also do not have to depend on frequency. We can make
them depend on frequency, okay, but that's just optional. The beamforming
weights do not have to depend on frequency.
Now, this is in contrast to a ULA, a linear array of uniformly spaced
microphones or omnidirectional pressure sensors. If we just consider the first
three, if it's just a three-element ULA, then this would be the ULA array
manifold familiar to, I think, all of us. The frequency appears through the
wavelength here. So if the ULA has a look direction of 30 degrees, and we set
the beamforming weights for two kilohertz -- using the speed of sound in air --
and then we plot the spatial matched filter beam pattern, well, if the signal
indeed comes in at two kilohertz, we get the purple one, and that is right,
because we get a peak at 30 degrees.
But if the frequency is 500 hertz, very much within the human speech range, the
peak may migrate significantly to the left, such that at 30 degrees it actually
has an attenuated [indiscernible] response. And if the frequency rises to 3.5
kilohertz, at 30 degrees we again may have only 0.75 and so on.
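For comparison, the three-element ULA manifold he refers to has the familiar form (a sketch, assuming the phase reference at the first element, inter-sensor spacing $d$, and angle $\theta$ measured from broadside):

$$
\mathbf{a}_{\mathrm{ULA}}(\theta,f) \;=\; \begin{bmatrix} 1 \\ e^{\,j2\pi (d/\lambda)\sin\theta} \\ e^{\,j4\pi (d/\lambda)\sin\theta} \end{bmatrix},
\qquad \lambda=\frac{c}{f},
$$

so weights matched to $f = 2$ kHz no longer place the peak at 30 degrees when the signal arrives at 500 Hz or 3.5 kHz.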
So basically, what I'm trying to show here is that for the ULA, or similar
kinds of [indiscernible], the UCA, the URA, the uniform rectangular array,
things like that, because the frequency appears in the array manifold, we
really need a lot of signal processing to account for the array manifold's
dependency on frequency. This is just for an L-shaped array. Similarly, the
frequency appears through the wavelength. I just give this example because,
for this L-shaped array, we can get the azimuth angle as well as the elevation
angle. But still, the frequency appears.
So if we have a beam pattern with a look direction of 30 degrees elevation and
30 degrees azimuth at a frequency of two kilohertz, then this would be the beam
pattern. But then, using the same beamforming weights at a frequency of 500
hertz, the beam pattern changes quite a bit. And if the frequency now becomes
4,000 hertz, the beam pattern is again quite different.
Now, this is the intended location of the peak, but here, where the peak should
be, we actually get a null now. So the frequency dependency of the array
manifold could be a really big problem.
>>: [indiscernible].
>> Kainam Thomas Wong: Right. So the reason we have this kind of phenomenon
is because the pressure sensors are not co-located. And if, as you mentioned,
we cannot use phase but we use time, then it would be wideband processing, and
the time and the direction would be coupled together. But for the vector
sensor, the look direction and the frequency axis can be decoupled, so we can
handle the two separately. And that gives us more versatility.
>>: Still, why does -- okay, you have estimated the delay-and-sum weights for
2,000 hertz, and that's fine. But if you talk phases, this is mostly processed
in the frequency domain. We can estimate the proper weights for every single
frequency band, and then [indiscernible] always the maximum will be at the
desired location. This is not a substantial amount of computation, which is a
problem for [indiscernible].
>> Kainam Thomas Wong: I would agree with you that this kind of system can be
used in the way you mention. Now, if it is adaptive signal processing, you
know, if some of the interference or sources are moving around, and if you
want the CPU to be really, really inexpensive and really cheap, the vector
sensor would be one alternative.
So how cheap a CPU is cheap enough, I think, is a system development question,
so, you know, you would know much more than I do. I suppose it probably
depends on how cheap is cheap enough. And also on how adaptive the
[indiscernible] needs to be.
>>: What this velocity sensor measures is actually the speed [inaudible].
>> Kainam Thomas Wong: Right. So the proper term, I guess, is the acoustic
particle velocity field vector. Particle velocity field vector. So one way to
measure it -- actually, maybe I'll skip over it -- one way to measure it is to
have two microphones and take [indiscernible] the difference. That's one way.
Another way is to measure the velocity directly, by optical methods, by
thermal methods, and also by other mechanical methods as well.
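As a sketch of the two-microphone approach he mentions (an assumption based on the standard linearized Euler equation, not a derivation from the talk): with two closely spaced pressure microphones $p_1$, $p_2$ a distance $d$ apart along the $x$ axis, $\rho_0\,\partial v_x/\partial t = -\partial p/\partial x$ gives

$$
v_x(t) \;\approx\; -\frac{1}{\rho_0}\int_{-\infty}^{t}\frac{p_2(\tau)-p_1(\tau)}{d}\,d\tau,
$$

where $\rho_0$ is the ambient air density.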
>>: Even though you can correct it computationally, just acquisition can be a
problem sometimes. You have to make up a lot more dynamic range in the front
end to capture the data, because of the frequency effects and the
[indiscernible] effects there too. There's computational [indiscernible] too.
>> Kainam Thomas Wong: And also, for some applications, the interference can
be -- I mean, some of the sources may have a known frequency band. Maybe not
just speech, maybe something else. For example, one thing this is used for is
to detect snipers, or to detect an unexpected sound that would indicate
something abnormal, that something that should not happen is happening.
So suppose the [indiscernible] is human speech, but we do want to reject some
impulsive interference whose frequency band may be at some known location.
The AVS might give us some additional versatility.
>>: [inaudible].
>> Kainam Thomas Wong: So I would totally agree that the AVS is not
categorically superior; it offers some positives and also some negatives.
Whether the AVS is [indiscernible] for a particular product is really a system
development judgment.
Okay. So basically, because we decouple the look direction from the frequency
axis, it could be computationally simpler if we use the vector sensor. Ivan
just mentioned that. Okay. So speech, music and background noises are really
broadband, the bandwidth may be at unpredictable locations, that bandwidth
varies over time, and typically it is [indiscernible] unknown.
So if we can decouple the look direction coordinate from the frequency
coordinate, it might give us some additional advantage. So for the
velocity-sensor triad, which I sometimes call the vector sensor, its beam
pattern does not depend on f and also does not depend on R.
So here's another advantage. If we look at the ULA in the near field, with R
the distance between the [indiscernible] and the sensor, the array manifold
would depend on R. But for the velocity-sensor triad, R does not appear. So
that's another advantage: the same beam pattern can be used regardless of how
far away or how close the emitter is from the sensor system.
Okay. So how about using the velocity-sensor triad for adaptive beamforming?
I guess many people here are actually beamforming experts, so I'll just go
through it really quickly. This is the scenario in my subsequent discussion: a
conference room. There are potentially up to six simultaneously active
speakers.
So the sensor in this case is put at the top of the ceiling. This is just the
simulation scenario I have for the subsequent pages. So this is the collected
data at time t. So that particular speaker is the desired speaker, and then I
have, on that diagram, [indiscernible] would be 5: five interfering speakers
and some noise. So I form a three-by-three spatial covariance matrix. Why
three by three? Because we have a triad. We have three velocity sensors, so
we have three by three.
So this is just the minimum-power distortionless-response (MPDR) beamformer.
Some people may be a little bit unfamiliar with this terminology, but it's
basically minimum-variance distortionless response. The only difference is
that in MVDR, this is supposed to reflect the real statistics, but in MPDR,
this is just the empirically collected data.
But the idea behind it is basically the same as MVDR. So in MVDR beamforming,
the weight vector here will ensure no distortion in this look direction, the
tuned direction, while minimizing the beamformer's overall output power.
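In standard notation (a sketch; the slide's exact symbols are not shown), with $\widehat{\mathbf{R}}$ the empirical $3\times 3$ covariance and $\mathbf{a}(\theta_0,\phi_0)$ the triad manifold at the look direction, the MPDR weights are

$$
\mathbf{w}_{\mathrm{MPDR}}
\;=\;
\frac{\widehat{\mathbf{R}}^{-1}\,\mathbf{a}(\theta_0,\phi_0)}
{\mathbf{a}^{H}(\theta_0,\phi_0)\,\widehat{\mathbf{R}}^{-1}\,\mathbf{a}(\theta_0,\phi_0)},
$$

which minimizes the output power $\mathbf{w}^{H}\widehat{\mathbf{R}}\,\mathbf{w}$ subject to the distortionless constraint $\mathbf{w}^{H}\mathbf{a}=1$; MVDR has the same form with the true interference-plus-noise covariance in place of $\widehat{\mathbf{R}}$.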
Now, remember, in the triad, we only have three sensors. We have a very
stressful situation here. We have the desired user, the blue-purple thing
here. This is the azimuth angle; that is the elevation angle. And we have
five interferers. So we actually have six people talking together, at the
same time. So this is a very stressful situation. The left-hand side and the
right-hand side are basically the same thing; the right is just a contour map.
Now, the SOI admittedly is not at a peak, but if you look at the null here,
and another null there, the two nulls are placed sort of nearby the five
interferers. So the beamformer would actually improve the SINR, the
signal-to-interference-and-noise ratio. I emphasize this is a very stressful
situation, because we have more emitters than we have sensors, which is three.
Now, I also have to admit, you know, my [indiscernible] scenario, and this is
the best-looking one.
Okay. This is just to summarize the advantages of using the velocity-sensor
triad. One set of beamforming weights for all frequencies. One set of
beamforming weights without regard to how close by the speaker is. One set of
beamforming weights regardless of the interfering sources' distances from the
sensor system. No need for any prior information on the time-frequency
structure of the signal -- of the [indiscernible] -- and of the interference.
So simplicity would be its primary advantage. Now, whether that simplicity is
worthwhile in any particular system, that will be a system development kind of
judgment.
Okay. Now, how about if the look direction is unknown? Well, there's
something called MUSIC. MUSIC has nothing to do with music; it's just an
acronym. It's a parameter estimation method. So through this kind of method,
we can actually estimate the desired look direction.
The assumption behind this method is that the desired speaker is actually the
loudest. Which I hope is the case, unless somebody tries to shout over the
desired speaker.
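The MUSIC estimator, in its standard form (a sketch; the details of his variant are in the paper), scans the look direction for the peak of

$$
P_{\mathrm{MUSIC}}(\theta,\phi)
\;=\;
\frac{1}{\mathbf{a}^{H}(\theta,\phi)\,\mathbf{E}_{n}\mathbf{E}_{n}^{H}\,\mathbf{a}(\theta,\phi)},
$$

where $\mathbf{E}_{n}$ collects the noise-subspace eigenvectors of the $3\times 3$ covariance; the manifold most nearly orthogonal to the noise subspace marks the estimated direction.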
We can see that the vertical axis is the estimation bias in degrees; this is
one degree. So in this particular scenario, with interference present, if the
SINR is somewhere around 10 or 15 dB, then the estimate could be correct to
within one degree.
Now, how about if the MUSIC method does not estimate too precisely, or if the
velocity-sensor triad is tuned by some machine or human, but tuned imprecisely?
What would happen? Well, that's the problem of mispointing the beamformer.
If the beamformer is mispointed -- I'm sorry, I don't have the numbers with me
on this slide, but I have the numbers in the paper -- then the desired user
could indeed be nulled. But there's a signal processing method called diagonal
loading. It's a very simple method: just add an extra identity matrix here,
scaled by the loading factor, gamma.
Then the desired speaker will not be nulled anymore.
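That is, the loaded weights take the standard form

$$
\mathbf{w}_{\mathrm{DL}}
\;=\;
\frac{(\widehat{\mathbf{R}}+\gamma\mathbf{I})^{-1}\,\mathbf{a}}
{\mathbf{a}^{H}(\widehat{\mathbf{R}}+\gamma\mathbf{I})^{-1}\,\mathbf{a}};
$$

larger $\gamma$ pushes the solution toward the spatial matched filter, which is why it tolerates pointing error, at some cost in interference rejection.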
>>: [indiscernible]?
>> Kainam Thomas Wong: According to the papers I have read, and I have read
most of the papers there, it's kind of ad hoc. They don't have a theory for
it. So I guess for a certain kind of application scenario, people will try
out ahead of time what range would typically work for that class of scenarios.
And, of course, for the right-hand side diagram, I have tried many, many
values, and finally this one looked the best.
>>: But still, the question is, if the [indiscernible] system uses a given set
of sensors and makes a mistake, isn't it better to make the same mistake when
you do the capturing? So if the localizer thinks the sound source is right
there, most [indiscernible] not the correct location, most probably the
beamforming should point where the system thinks it is, because for some
reason the sensors are not identical [indiscernible] capture the sound.
>> Kainam Thomas Wong: I think the pointing error may come from many different
causes. Maybe the cause that you mentioned is that the array itself is not
calibrated, such that when you say incorrect, maybe, I don't know, maybe what
you mean by correct is the nominal case: the array is perfectly calibrated,
it's an ideal ULA, the sensors are isotropic with [indiscernible] gain and
spaced half a wavelength apart or whatever, so it's a perfectly idealized
version. Then it has a beamforming vector. And then the actual physical ULA
that we have, which would be uncalibrated -- the microphones may not have
equal gain; they may not be located at the correct nominal positions -- that
non-ideal ULA would give us another set of beamformer weights.
So we should not try to make it too ideal; that I would totally agree with.
But the mispointing here is a different kind of mispointing. This mispointing
would exist whether we have the ideal ULA or the uncalibrated, imperfect ULA.
It's just that the MUSIC algorithm, or other kinds of parameter estimation
algorithms, has bias. And the bias could be quite significant if the SINR is
bad enough. Or the tuning is actually done by a person, and the person is a
little bit sloppy and does not manually tune to the correct direction.
So I'm talking about a different kind of mispointing, not whether the array is
calibrated or not. So even if the array is calibrated, the algorithm could
have estimation bias: because of noise, because of interference, because of
other kinds of imperfections.
Okay. Now, the jury trial is not very realistic, I would be the first to
admit. I did not actually have the money to buy a Microflown for that system.
>>: Which costs, by the way, $15,000.
>> Kainam Thomas Wong: And my graduate student is trying to build one, but is
still building it. So the amplifier is not that easy to do, apparently.
So what we did, for the sake of publishing the paper, was just download some
files from the internet -- people reading the news, people reading some book
or something -- and basically just put them into the data model here and then
through the signal processing.
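A minimal Python sketch of this toy data model (all function names, directions, and parameter values here are illustrative assumptions, not taken from the paper): mono speech waveforms are projected onto the triad manifold, white Gaussian noise is added, and the 3x3 empirical covariance drives the MPDR weights.

```python
import numpy as np

def triad_manifold(theta, phi):
    """Velocity-sensor triad manifold: direction cosines along
    x, y, z. Note: no frequency and no range dependence."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def simulate_triad_data(speech, directions, snr_db, rng):
    """Mix K mono signals onto the 3 triad channels and add white
    Gaussian noise (the no-reverberation scenario of the talk)."""
    A = np.stack([triad_manifold(th, ph) for th, ph in directions], axis=1)
    x = A @ speech                                  # 3 x T clean mixture
    noise_var = x.var() / 10 ** (snr_db / 10)
    return x + rng.normal(scale=np.sqrt(noise_var), size=x.shape)

def mpdr_weights(x, look_theta, look_phi, gamma=0.0):
    """MPDR weights from the empirical covariance, with optional
    diagonal loading gamma to guard against mispointing."""
    R = x @ x.T / x.shape[1] + gamma * np.eye(3)
    a = triad_manifold(look_theta, look_phi)
    w = np.linalg.solve(R, a)
    return w / (a @ w)                              # distortionless: w.a = 1

# Illustrative usage: desired speaker plus two interferers.
rng = np.random.default_rng(0)
speech = rng.standard_normal((3, 16000))      # stand-ins for speech files
dirs = [(1.0, 0.3), (1.2, 2.0), (0.8, -1.5)]  # (elevation, azimuth), radians
x = simulate_triad_data(speech, dirs, snr_db=10, rng=rng)
w = mpdr_weights(x, *dirs[0], gamma=1e-3)
y = w @ x                                     # enhanced output, length T
```

Note that the same weights w serve all frequencies and all source ranges, which is the paper's central point.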
Now, the jury is real. They are actual humans. The jury evaluation is on a
scale of zero to ten, so there are 11 possible marks. If it is zero, one, two,
or three, the speech would be totally unintelligible; if it's seven, eight,
nine, or ten, it would be totally intelligible. But, of course, a ten speech
would have very good sound quality beyond being intelligible. So here, I have
a 15-member jury. The vertical axis is the average score; the horizontal axis
is the SINR into the beamformer. I have three curves. The ISO is the
isotropic single sensor -- not a ULA, just one sensor. The black one.
This one is the AVS, but using spatial matched filtering: just a spatial
matched filter with the look direction at the desired speaker. The
interference would not affect the SMF beamforming weights. The interference
will affect the SMF beamformer output, but not the beamformer weights.
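The spatial matched filter here is just the steered manifold itself, normalized for a distortionless response (the standard form, shown for contrast with MPDR):

$$
\mathbf{w}_{\mathrm{SMF}}
\;=\;
\frac{\mathbf{a}(\theta_0,\phi_0)}{\lVert\mathbf{a}(\theta_0,\phi_0)\rVert^{2}},
$$

a data-independent weighting, which is why the interference changes its output but never its weights.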
The MPDR, as expected, has the best performance. As a matter of fact, where
the single microphone and the matched-filter AVS are both basically around a
score of one, the MPDR speech is already intelligible, because it's at seven.
Now, this is the case for three speakers. Remember, we have a triad, so three
speakers -- three speakers including the desired speaker. So the desired
speaker plus two interferers.
In this case, we have the desired speaker and five interferers, so it's very
stressful. Still, the MPDR gives us a little bit of help, by about a score of
one. But because the situation is really stressful, the gain over the spatial
matched filtering is just a score of one. Let me remind you, it's a very
stressful situation: six speakers simultaneously active.
>>: What kind of noise [indiscernible] do they measure it with?
>> Kainam Thomas Wong: Just white Gaussian. Additive white Gaussian noise.
>>: And in the corpus generation, you guys would [indiscernible], so we can
say that this is a no-reverberation case.
>> Kainam Thomas Wong: No reverberation. So it is really not that convincing.
But this is sort of just to show that, at least under this ideal situation,
there could be some improvement.
>>: Do you have an audio example? May we hear what it sounds like?
>> Kainam Thomas Wong: Unfortunately, when I was coming today, I just
remembered that I should have put the sound samples in, because we are talking
about speech. I actually do not have the proper sound samples on my laptop.
And also, because this was done in Hong Kong, the samples are in Chinese: the
students have limited English fluency, so I didn't want English language
problems affecting it.
And it's much easier to find people to listen to Chinese speech in Hong Kong.
So unfortunately, I don't have it. I should have it, but I don't. I'm sorry.
But this is really a toy scenario, I totally agree.
>>: [inaudible] not have a source that's the same as your original one you're
trying to focus on [inaudible].
>> Kainam Thomas Wong: I totally agree. That's a very good criticism. I
mean, I'm an academic, so my business is just publishing papers, and my
publisher will be happy with it. So part of the reason I came to Microsoft is
perhaps to learn from you guys what kind of realistic problems I should deal
with -- yeah, for reverberation, to develop a system, we really need to look
into that.
For publishing a paper, this toy scenario could get through, fortunately.
On the other hand, reverberation, you know, is in a way a somewhat different
dimension, because the purpose of this paper is just to demonstrate that this
kind of velocity-sensor triad can separate the frequency dimension and the
radial dimension from the look direction. That was actually the only thesis
of this paper.
So there are a lot of questions not addressed by this paper.
>>: I totally agree. So in general, you used a theoretical model of the
gradient sensors. Assuming that they are perfectly identical, how robust or
sensitive is this particular implementation of MPDR [indiscernible]?
>> Kainam Thomas Wong: Right. I have research work in progress with a
mathematician to look into that problem: if the channels mismatch in the gain,
the phase, and the location -- I mean, we have an array of such triads, and a
triad may not be at its nominal location. And if I have two triads, how about
if the orientations are not identical? I mean, just shifted by one or two
degrees -- it's possible.
Then how would it degrade performance? So I have research ongoing with a
mathematician, and we model that mismatch statistically: if the gain mismatch
has, say, a bias of zero, but the mismatch is stochastic and Gaussian
distributed, then how would it affect the direction finding? How would it
affect the beamforming? So we are trying to derive a nice and beautiful
equation to show us what the exact degradation is. So we're working on that.
>>: One note on this experiment, even for the six-speaker case. So the left
shows that you have a correct implementation of the beamformer, and with three
microphones, it can place [indiscernible] towards the desired direction and
[indiscernible] desired speakers. Perfect. But then once you go to multiple
speakers, even --
>> Kainam Thomas Wong: We have multiple speakers here.
>>: The six-speaker scenario. Then even in this case, I would go with
processing [indiscernible] estimation, the MVDR beamformer [indiscernible].
The reason for this is that speech is a very sparse signal, and you can have
six speakers talking, but it's unlikely to have six speakers in the same
frequency bin. So each frequency bin, processed separately with
frequency-dependent weights, will adaptively place the [indiscernible] towards
those two speakers which are [indiscernible], turning the six-speaker scenario
into the results of the two-speaker scenario.
>> Kainam Thomas Wong: Right. I would agree that would be an alternative. I
mean, that would be a very good method. The trade-off is very simple here, in
terms of the computation. So is that simplicity worthwhile? You know, it
depends on the system development philosophy, I guess.
So it may not be worthwhile. Or it might be worthwhile, depending on the
specification.
>>: It is not much CPU that's required. I think one mobile phone could easily
run at least ten of those beamformers in real time.
>> Kainam Thomas Wong: Okay.
>>: It's a dual-core CPU, which is today's telephones.
>> Kainam Thomas Wong: Okay.
>>: And, of course, the [indiscernible].
>> Kainam Thomas Wong: So, other comments?
>>: More questions. I think [indiscernible] to begin with, so feel free to
interject and ask questions, please.
>>: I have a question about the low-frequency performance if you were to make
a [inaudible]. As a measure of the velocity, it's measuring the
[indiscernible] pressure gradient, which makes it much more sensitive to -- if
you had a source that could produce a constant pressure amplitude and sweep it
across frequency, it's going to be much more sensitive at the high frequencies,
because you have a much greater pressure gradient.
If you wanted to make a wideband beamformer, it may be seven or eight octaves.
That means somewhere along the chain, you have to have compensation of maybe
50 dB. In other words, you've got to boost the low frequencies by 50 dB
relative to the very high frequencies. So that's going to limit your
low-frequency performance, and it's also going to be limited by the noise.
Now, as I understand it, these devices aren't made by -- each is measuring the
[indiscernible] loss?
>> Kainam Thomas Wong: I'm not too sure how the transducer works.
>>: It doesn't have to be a noisy process, which means that you're
[indiscernible] on your low-frequency performance. Do you have any comments
on how practical these are?
>> Kainam Thomas Wong: I really don't have any comment on that. My background
is signal processing, so I don't know much about those implementation issues,
yeah. So that's why --
>>: So the [indiscernible] I have seen is pretty much a ten-millimeter pipe
with a tiny wire going through, which is [indiscernible] through current, and
then they measure the [indiscernible], which means how cool or how hot the
wire is, that's it.
>>: That's proportional to the pressure gradient.
>>: Yes. And to get the frequency response as they show, you have to
[indiscernible] filter.
>>: Which is a heck of a lot for a [indiscernible]. So it's not really
frequency invariant. It's frequency invariant over a bandwidth. There are
going to be [indiscernible].
>> Kainam Thomas Wong: Right, right. Yeah, thanks for the comment. I really
don't know much about the physics [indiscernible] behind it, but I think that,
you know, that would be a very interesting research topic for a signal
processing person like me: how to correct for it by signal processing.
>>: I think it's for this reason that people are using spherical microphone
arrays, where you're not measuring the pressure at a point, but you're
measuring it at some distance away from the center of the system. And that
helps to deal with the low-frequency problems.
>> Kainam Thomas Wong: Okay. Thanks. Thanks for the comment.
>>: So still considering the sensor per se: if you have the sensors spaced
out at an increased distance, let's say [indiscernible] four or five
centimeters, so you can have the sensors a couple of centimeters away, then in
this case you can use the differences in the time of arrival, in the phases.
With this particular design, you put the sensors close to each other, which
means that the only cue you have for the direction of arrival is the
magnitudes, because the sensors have very specific patterns. Can you comment
on this? What do you gain and what do you lose?
>> Kainam Thomas Wong: I haven't really compared the two different approaches.
One obvious thing is that the time difference of arrival would need more than
one location; that point is very obvious. For computation power, you
mentioned it's not a big factor at all. I don't know if the AVS -- I mean,
this method -- would save a little bit of computation power, and however
important that saving might be.
>>: The moment you put them at a distance, we can use both the differences in
magnitudes and in the phases. From the moment you put them together, the only
cue you have is the magnitude, that's it. And this pretty much limits whatever
directivity pattern you do to a first-order directivity pattern. So the best
you can do is [indiscernible] the directivity pattern [indiscernible]
isotropic ambient noise fields. That's it.
While with the [indiscernible] microphone array, you can go a little bit
further than that.
>> Kainam Thomas Wong: Right. The directivity pattern here -- I don't know
what you might have in mind -- is the spatial matched filter pattern. With
eigenstructure signal processing, with adaptive beamforming, we can make the
main lobe much narrower than the spatial matched filter's main lobe. So with
some adaptive beamforming techniques, we can actually make the main lobe much,
much sharper than this. This is spatial matched filtering, kind of.
>>: So this is the equivalent of the [indiscernible] beamformer. It's pretty
much a delay-and-sum beamformer. And you cannot have it go a little bit
beyond that [indiscernible]. And that's the directivity pattern. That's it.
Four sensors.
>> Kainam Thomas Wong: The kind of beamforming you mentioned -- I mean, this
would be a subclass of the kind of beamforming that you mentioned. Even with
just the delay-and-sum beamformer, if we pick the delay and the weights
properly, we can make the main beamwidth much, much sharper than the
matched-filtering kind of beamwidth.
>>: [indiscernible] the sensitivity of any one of the velocity sensors is a
figure of eight [indiscernible]. And you're never going to get a directivity
that's any sharper than a figure of eight. That's your limit. In fact, the
sharpest you can do is to create a [inaudible].
>> Kainam Thomas Wong: Right.
>>: That's the physical limit. You need to have a higher-order sensor in
order to get sharper directivity.
>> Kainam Thomas Wong: The kind of thinking that I have -- maybe it is not
correct, but just for brainstorming -- is that, yeah, that is the
[indiscernible] of an individual sensor, but we have several of them. And if
we choose the summing weights wisely, the composite, the entire array's
composite beam pattern can be much sharper. It's a little bit like the ULA of
[indiscernible] isotropic sensors: each individual sensor has no directivity
at all. But if we have a ULA and choose the beam weights wisely, the entire
array can have a very sharp -- can have a somewhat sharper main lobe. So I
don't know if we're talking about the same thing.
>>: We're actually going to have a chat later, so maybe we can go through
that. So if you've got these three figure-of-eights that are -- say you've
got a case where you've got an incoming wave that is coming from this
direction. You're at the nulls of two of them. So the only sensor that's
going to give you any information is the one whose main lobe is aligned
[indiscernible]. That's the limit you're directed to.
>> Kainam Thomas Wong: But how about if --
>>: The other two sensors become useless in that case.
>>: Okay.
>>: Then you get --
>>: Yeah, and that's it, you go to six degrees. If you have the omni mic,
you can go a little bit farther.
>>: That's why he's [indiscernible], because if he has another one, so the
interference is going to have signal of three.
>>: That's correct.
>>: Puts a null there, essentially. There's a place where the gain's going to
be huge.
>>: You're absolutely -- you've got to do better than that.
>>: But I'm saying, like, if you steer the figure-of-eight beam
[indiscernible] in such a way that the null is in your interference --
>>: Okay, yeah. The null is steeper than the lobe, yeah.
>>: So [indiscernible].
>>: Assuming [indiscernible], yeah. If you have one interfering source that
you want to point a null to, then yeah, that's different. If you had one
source that -- if you wanted to measure the directivity of the resulting
system, it's never going to be any greater than the directivity
[indiscernible].
>> Kainam Thomas Wong: Right, right.
>>: [indiscernible] however you steer it.
>> Kainam Thomas Wong: Correct, right, right. My question is, even for the
ULA, a ULA of isotropic sensors, even if we have one source and we want to
steer towards the source, by wisely picking the beamformer weights, we can
still have a very high gain towards the source look direction.
>>: Yes.
>> Kainam Thomas Wong: So the kind of beamformer -- I mean, if we also use
some kind of beamformer and pick the beamformer weights wisely for the three
components in the velocity-sensor triad, can we actually have a sharper beam
towards the [indiscernible] direction? Just like for the isotropic case.
>>: [indiscernible].
>>: [inaudible].
>> Kainam Thomas Wong: For the ULA case --
>>: [inaudible].
>> Kainam Thomas Wong: Right, but that ULA beam would still be narrower than
any one individual isotropic sensor's gain. So for the ULA case, by doing the
signal processing wisely, we can add a little bit more gain beyond the gain of
an individual sensor.
>>: So you're talking about an array of this type of sensors, right?
>> Kainam Thomas Wong: Just one of them.
>>: I think the triad is on orthogonal coordinate axes; the ULA sensors are
all the same. So if you combine multiple things on the same coordinate axis,
you can get benefits by combining them if they're orthogonal.
>>: So if they have spatial diversity, then this doesn't happen.
>> Kainam Thomas Wong: But actually, there are two spatial dimensions. There
are actually two -- it's the azimuth and elevation. So there are actually two
independent coordinates, spatially speaking: azimuth and elevation. And we
have three of them, three sensors.
Now, I would agree with you, if the [indiscernible] happened to be parallel to
the X axis or Z axis, yeah, the other two would basically give us zero
response. So no matter what beamformer weights we pick, there's basically no
effect.
But how about a more general situation? If it is not [indiscernible] to the
X, Y, or Z axis, we have three sensors but two direction-of-arrival
coordinates: azimuth and elevation.
>>: But you add another degree to your set. You've added another degree of
freedom to your source. So just because it's in 3D doesn't make it any --
imagine you had two figure-of-eights. You can steer [indiscernible], whatever,
you can steer that to any angle you like. But the directivity index doesn't
change with angle. It doesn't matter if it's completely in line with one or
completely in line with the other. The directivity index stays the same. And
that directivity index is limited by the directivity pattern of any one of the
sensors.
>> Kainam Thomas Wong: But when we do beamforming with the velocity-sensor
triad, we are not just rotating it. We are also making one of them larger
relative to the other ones. This is not --
>>: It's the same thing. You put in a normalization term, and that's the
rotation.
>> Kainam Thomas Wong: Okay. I need to look more into it. But --
>>: How do you achieve more [indiscernible] directivity when you only have
first-order directional [indiscernible]? If you have them in two different
locations, yes, but this is at one point in space.
>> Kainam Thomas Wong: My problem with the discussion right now is that I
would need to have a mathematical definition of the directivity, because, I
mean, I need to have the precise definition. My impression is that we might
be using the same term slightly differently.
>>: So the pattern you gain is a function of the direction and the duration?
>> Kainam Thomas Wong: Yes, yes. I understand that. But when we talk about
the triad, then the directivity of the entire triad versus the directivity of
an individual velocity sensor, you know -- then I don't quite follow your
reasoning, but I'll think more about that.
>>: Maybe you can [indiscernible] this afternoon.
>> Ivan Tashev: Any more questions?
>>: Yes. You showed a whole bunch of commercial velocity triads. What
processing do they use compared to the method you presented?
>> Kainam Thomas Wong: Actually, I only showed one commercial one, the
Microflown. And I think they basically just make the product; they are not
into devising beamforming algorithms, as far as I know. There are some papers
that use the Microflown in some field testing, open-air and indoor testing.
And those people, some of them are related to Microflown, some of them are
not related. Offhand, I don't remember what kind of algorithms they use.
For the UIUC, University of Illinois Urbana-Champaign one, that professor and
his graduate student basically just built a system. They just built a system,
not to --
>>: Those underwater microphones, they seemed to be quite well developed or
well thought out. Do you know what processing they use?
>> Kainam Thomas Wong: MVDR. Most of them.
>> Ivan Tashev: More questions? Let's thank our speaker today.
>> Kainam Thomas Wong: Thank you.