>> Ivan: Good morning. It's my pleasure to introduce to you Jens Ahrens from
the Technical University of Berlin. He received his Master of Science degree with
distinction in electrical engineering from Graz University of Technology in 2005. Since
2006, he has worked for Deutsche Telekom Laboratories and the Technical University
of Berlin, where he defended his Ph.D. in October last year. Today he is going to
present Analytical Methods of Sound Field Synthesis: Theory and Applications. With no
further ado, Jens, you have the floor.
>> Jens Ahrens: Thank you, Ivan. The term sound field synthesis is not an
established term. People use many different terms. I will come back to this point
later. It is a method that we use for the presentation of audio, of spatial audio
signals.
But before I go into detail, I want to quickly add one comment to my biography
about the Deutsche Telekom Laboratories, because this is an institute which has
a special aspect: it is a public-private partnership between the University of
Technology and Deutsche Telekom. Deutsche Telekom is a private company and a
major German telecommunications provider, and used to be the parent company of
T-Mobile U.S.A. They have a department which is purely Telekom and a department
which is purely academic. I am affiliated with the academic department, which is part
of the University of Technology Berlin. I'm attached to a professor, but the scientific
staff is sponsored by Telekom. So they pay the university to pay us, basically, but
that doesn't mean that they force us to work on Telekom-relevant projects, although of
course they encourage us.
So now, back to the topic.
I guess you are aware that there are different approaches for the presentation of
spatial audio signals. A possible categorization might be head-related methods
on one side, which exclusively control the sound pressure at the eardrums of the
listener. Typically they employ head-related transfer functions, HRTFs, which
represent the acoustical influence of the human body on the sound waves that
impinge on a person, and they typically employ headphones. And on the other
hand, there are the room-related methods, which aim at serving a specific
portion of space. These are, for example, stereophony and ambisonics, which employ a
low number of loudspeakers, say, between two and maybe ten. And there's, for
example, sound field synthesis, which employs a significantly larger number of
loudspeakers.
And when I say sound field synthesis, I refer to the situation when an ensemble
of elementary sound sources, for example, loudspeakers, is driven such that a
sound field with specific desired physical properties evolves over an extended
area. And this extended area can be a plane or a volume.
So example systems, example ensembles of elementary sound sources, you can
see here on this slide. The top image shows a section of the loudspeaker system that
we have installed in our laboratory; I also have a panoramic image. It's
composed of 56 loudspeakers arranged on a circle of 1.5 meters in radius. So
this is about the smallest that is useful. The other image shows the largest
that is currently around. This is a lecture hall equipped with, I think, two and a half
thousand loudspeakers all around, along this gray ribbon, driven
with 800-something individual channels, sorry, driven by 16 synchronized
computers. So it's an enormous effort in terms of the hardware.
Now, this is the point where people often get skeptical. They say, okay, I have
two loudspeakers at home, and my sound system sounds great. Yes and no.
It certainly sounds great, but there are certain things that you cannot do with
stereophony or other methods that employ a low number of loudspeakers.
I would like to postpone a discussion of the use [inaudible] of sound field
synthesis to the end of the talk, because I prefer to outline the specific
properties of sound field synthesis first so that you can better grasp my
opinion.
So for the moment, the only motivation I want to outline is that what you cannot
do with stereophony is assure a plausible aural perspective. If you listen to a
stereo system, you face the two loudspeakers, and when you move your head to the
left and right, you hear how the sound sources move with you. So this is not
plausible, especially when you consider an extended, a large receiver area like
in a cinema or the like.
If you are familiar with the literature on sound field synthesis, you
have certainly stumbled over two specific methods. One is wave field
synthesis, and the other is near-field compensated higher order ambisonics.
The best thing is to just accept the terms as they are. It would go a bit too far to
describe why especially the latter is termed the way it is; it's more
confusing than clarifying. So I will refer to this method as ambisonics.
Both can be formulated analytically, and of course, if there's an analytical
solution, there can also be a numerical one.
I will not explicitly speak about numerical solutions. At specific points, I will
outline the impact of the results I present on the numerical solutions.
When I started working on these topics, the problem was that the methods listed
here are based on very, very different formulations, and this is even
diplomatically expressed. Actually, even the personalities that
work on the different methods -- it's two universes that clash. Either you
have people who work specifically on wave field synthesis or on ambisonics,
and hardly anybody in the entire world, actually, works intensively on both.
So our motivation was to find a framework that allows us to compare the two
methods. And this is what I am going to present. This unified
framework that I will present will also allow for certain extensions of the theory, and
I will especially pay attention to the possibilities and limitations of these methods.
So this is then the contents. This will be the biggest part of the presentation, the
theory. I will present the framework and the properties in considerable detail,
and after that, there will be a short illustration of what type of synthetic sound
fields you can use in order to creatively employ sound field synthesis.
So now let's go into the theory.
A very elegant starting point is to assume a continuous distribution of secondary
sources. Typically, in theory, you would not speak of loudspeakers, because
sometimes you assume properties that cannot easily be associated with
a loudspeaker, which is spatially a discrete entity. So in this case, you
assume a continuous layer of loudspeakers, of secondary sources, and then
you can easily compare the situation to related problems for which solutions
exist, for example scattering problems, where you have an inhomogeneous
boundary of a volume which represents the acoustical properties of a specific
object. And this is exactly what we have here. The continuous distribution of
secondary sources can be interpreted as an inhomogeneous boundary.
So the essential equation is what we term the synthesis equation. It states the
following: we have, in this case, an enclosing distribution of secondary sources
[inaudible]. The letter G refers to the spatial transfer function of the secondary
sources. We use the letter G because it may be interpreted as a Green's
function. And x refers to a point in space, and x0 refers to a point on the
secondary source distribution. And D is the driving signal of the secondary
sources.
And if you integrate over this distribution, which is enclosing in this case -- it
doesn't need to be a contour integral, it can also be a plane or whatever --
then this will yield the synthesized sound field. Of course, usually you would not
want to know what sound field arises if you drive the secondary sources in a
specific manner. You would rather want to know how to drive the loudspeakers
in order that a specific sound field arises. So you want to dictate this and find
out that. And this is the essence of the theory of sound field synthesis.
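For readers without the slides, the synthesis equation being described takes roughly the following form in the notation commonly used in the sound field synthesis literature; this is a reconstruction from the verbal description above, not a copy of the slide.

```latex
% Synthesis equation: the field P produced by a continuous distribution of
% secondary sources on the boundary \partial\Omega of the target region.
% x is a point in space, x_0 a point on the secondary source distribution,
% D the driving function and G the spatial transfer function (Green's
% function) of the secondary sources.
\[
  P(\mathbf{x},\omega) \,=\, \oint_{\partial\Omega}
      D(\mathbf{x}_0,\omega)\, G(\mathbf{x}-\mathbf{x}_0,\omega)\,
      \mathrm{d}A(\mathbf{x}_0)
\]
% Sound field synthesis prescribes the desired field S for P and solves
% this integral equation for the driving function D.
```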
A very important aspect of this continuous formulation is
the fact that you can construct a situation where you can show that a
physically perfect synthesis is possible, because if you meet a
situation where the requirements for this perfect synthesis are not fulfilled, you see
immediately, even before you find the solution, that certain limitations arise. And I
will outline that later.
So this is again the synthesis equation, just for reference. And you can physically
interpret the secondary source distribution as a single-layer potential, which is
often employed in scattering problems. There, a scattering object, an
object on which a specific field, for example a sound field, impinges, is
replaced by a continuous layer of secondary sources, by a single-layer
potential, which represents the acoustical properties
and the impact of this object on the impinging sound field. This is mathematically a
very, very similar formulation, so you can learn a lot from the solutions
that have been found there.
And for integrals like the one stated here, there is a theorem established by the
Swedish mathematician Erik Fredholm which, interpreted for this
problem, says that an exact solution exists for arbitrary source-free sound fields
if the secondary source distribution encloses the target volume. And it has to be
simply connected; there are some more prerequisites that have to be met. But if we
have a simply connected enclosing secondary source distribution, we know that we
can find a perfect solution. And this solution has been found via an
orthogonal expansion of the involved quantities: the sound field, the driving
function, and the [inaudible] secondary source transfer function.
Although this is possible for many or even arbitrary geometries, only the
solution for the sphere is practical, because in other cases you might just not find
a suitable orthogonal expansion, and if you find it, it will be very
difficult to implement in practice.
So now I want to illustrate the [inaudible] sphere problem. Let's assume a
sphere, which is indicated by the gray shading, which is centered around the
origin of the coordinate system. It has a radius upper-case R, and it encloses the
target volume. So we want to synthesize a sound field inside this sphere.
Then the synthesis equation looks like this. The integral over alpha
refers to the azimuth, beta to the colatitude, and this expression here
and here represents integration over the sphere.
So, according to the Fredholm theorem, we can expand the involved
quantities into specific orthogonal basis functions. In this case, these are
surface spherical harmonics. Don't be puzzled if you're not
familiar with spherical harmonics. It's not necessary to know the details in the
following. Just accept that it's a complete basis into which you can expand the
involved quantities.
And then you perform a comparison of coefficients, and you can
easily solve for the driving signal.
I want to propose an alternative which is actually exactly the same; later
in the talk, it will become clear why the interpretation that I propose is very helpful.
I propose to interpret this integral as a
convolution along the surface of the sphere, because then a convolution theorem
can be established which relates the coefficients of the involved quantities with
respect to an expansion into spherical harmonics. So this is actually exactly the
same; this is the comparison of coefficients. And as I said, later in the talk, it
will be clear why this can be beneficial.
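A hedged sketch of what the convolution interpretation buys you: the spherical harmonics coefficients of the three quantities become related by a simple product, so the comparison of coefficients reduces to a per-coefficient division. The constant c_n below depends on the spherical harmonics convention and is deliberately left unspecified.

```latex
% Spherical convolution theorem (schematically; c_n collects the
% convention-dependent normalization):
\[
  \mathring{S}_n^m(\omega) \,=\, c_n\, \mathring{D}_n^m(\omega)\,
      \mathring{G}_n^0(\omega)
  \quad\Longrightarrow\quad
  \mathring{D}_n^m(\omega) \,=\,
      \frac{\mathring{S}_n^m(\omega)}{c_n\,\mathring{G}_n^0(\omega)},
  \qquad \mathring{G}_n^0(\omega) \neq 0 .
\]
```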
So you have some type of space spectrum representation of the individual sound
fields, which can easily be solved for the coefficients of the driving function,
provided that there is no zero here, which is the case, for example, for
omnidirectional secondary sources. You can then compose the driving signal from the
coefficients. These are the basis functions, so you have to sum up over this
infinite amount of basis functions, but you don't need to go to infinity because this
series converges above a specific order, so you don't need to wait forever until the
computer has computed the result.
And as theoretically predicted, the solution is exact.
Now, using simulations to illustrate what it looks like: this is the top view on the
horizontal plane. This is the secondary source distribution, and it is driven
such that it synthesizes a plane wave that moves downwards on the screen.
We usually use a plane wave to illustrate the results because it has very
simple, well-defined properties -- plane wave fronts and also equal amplitude
anywhere in space -- so you can easily detect if something goes wrong.
And now on the next slide, I will present the view from the right onto the plane
that is perpendicular to the screen. This is the horizontal
plane, by the way. You see that indeed, in the entire volume, the sound field is
properly synthesized.
So it is a bit unfortunate that only the result for the sphere is
useful, because many other -- I'm sorry, I completely forgot about that.
If you look at the classical literature on ambisonics, you find a very
similar formulation. Only in that case, they don't assume a continuous distribution of
secondary sources but a discrete setup of loudspeakers. So instead of an integral,
you have a summation here over upper-case J loudspeakers, which is then expanded
into spherical harmonics, and the coefficients are compared. So if you compare this
to the solution I have proposed, you see that it's exactly the same.
So the first result is that ambisonics is a special case of the explicit single-layer
potential solution. More explicitly, it is the discrete formulation for the spherical
geometry.
The pity about the Fredholm theorem is that we want to use other geometries
than the sphere, for example a circular distribution. In this case, it is located in
the horizontal plane. Now we see already that the prerequisites for
the [inaudible] solution are not fulfilled, because the secondary source distribution
does not enclose the target volume. So we have to expect specific limitations.
But the way we found the solution for the sphere is also helpful here. In this
case, the synthesis equation represents a circular integral over the [inaudible].
And in that case, you can interpret it as a convolution along the
circle and relate the Fourier series expansion coefficients, which are the
coefficients of a different type of orthogonal expansion of the sound field. And
then again, you can solve it for the driving signal. And if there's no zero here,
you can reconstruct the driving function. These are the basis
functions in this case.
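The circular case has the same structure with Fourier series coefficients in place of spherical harmonics coefficients; again a hedged sketch, with the constant in front depending on how the contour integral is written (2πR if the arc length of a circle of radius R is integrated over).

```latex
% Circular convolution theorem and resulting driving function coefficients
% (coefficients referenced to the center of the array):
\[
  \mathring{S}_m(\omega) \,=\, 2\pi R\,\mathring{D}_m(\omega)\,
      \mathring{G}_m(\omega)
  \quad\Longrightarrow\quad
  \mathring{D}_m(\omega) \,=\,
      \frac{\mathring{S}_m(\omega)}{2\pi R\,\mathring{G}_m(\omega)},
  \qquad
  D(\alpha_0,\omega) \,=\, \sum_{m=-M}^{M} \mathring{D}_m(\omega)\,
      e^{\,\mathrm{i} m \alpha_0}.
\]
```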
Though we find out -- you see it already here, there's the radius R in the
equations -- that you have to reference the synthesis to the center of the setup. That
means there is only one single point, in the center of the setup, where in the
general case the sound field will be correct. Anywhere else, it is different
from the desired sound field.
So this looks, for example, for the plane wave like this. Again, it's a top
view on the horizontal plane. You have the secondary source distribution
indicated by the black circle, and you see that the plane wave exhibits plane wave
fronts, or at least straight wave fronts, in the horizontal plane, but it also exhibits a
specific amplitude decay, which is very characteristic for this type of synthesis,
which is termed -- I go back -- two-and-a-half-dimensional synthesis, because you
have secondary sources with a three-dimensional transfer function but you aim at
synthesis in a plane, and since it's something in between a two-dimensional
and a three-dimensional problem, you can term it a two-and-a-half-dimensional
problem.
And again, if we look from the right side onto the plane that is perpendicular to
the screen, it looks very different. This is again the secondary source
distribution. The plane wave inside the horizontal plane travels to the left, but
outside of the horizontal plane, of course, the sound field is very different from
what we want to have.
And another quick last example: this is for a linear secondary source distribution
situated along the x axis. You can find another convolution theorem which
relates the quantities in the wavenumber domain. And again, you can solve this for
the driving signal, and again you have to reference it to a line parallel to the
secondary source distribution, which is then the only location where it's correct.
Again, it's a two-and-a-half-dimensional synthesis, and the simulated sound fields
look like this. This is the secondary source distribution, and again we have this
characteristic amplitude decay. And we can also look at the plane that is
perpendicular to the screen, and you see again that outside of the horizontal plane,
which is indicated here, the sound field is different from what we want to have.
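The linear case reads analogously after a spatial Fourier transform along the array; a sketch of the wavenumber-domain relation, with the quantities referenced to a line y = y_ref parallel to the array as described above.

```latex
% Wavenumber-domain (k_x) relation for a linear secondary source
% distribution along the x axis; y_ref is the reference line on which the
% synthesis is exact.
\[
  \tilde{S}(k_x, y_{\mathrm{ref}}, \omega) \,=\,
      \tilde{D}(k_x,\omega)\,\tilde{G}(k_x, y_{\mathrm{ref}}, \omega)
  \quad\Longrightarrow\quad
  \tilde{D}(k_x,\omega) \,=\,
      \frac{\tilde{S}(k_x, y_{\mathrm{ref}}, \omega)}
           {\tilde{G}(k_x, y_{\mathrm{ref}}, \omega)} .
\]
```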
So this is a summary of the geometries for which solutions have been found. The
sphere is the classical problem for which many authors, some of whom are listed
here, found a solution. They came from different directions, but in the end, of
course, they found the same solution. And for the other geometries, we have
proposed solutions.
Excuse me.
So that was the explicit solution of the synthesis equation. You can also find an
implicit solution, and I will tell you in a minute what that is.
This is actually provided by the approach of wave field synthesis.
For wave field synthesis, you can start from two
different directions: you can either start with the Rayleigh integral or the
Kirchhoff-Helmholtz integral. I will quickly summarize how it can be derived from
the Kirchhoff-Helmholtz integral.
The Kirchhoff-Helmholtz integral represents a physical law. It represents the fact
that if you consider a volume, in this case the yellow area, which is source-free,
and if you know the sound field and the gradient of the sound field perpendicular to
the boundary, then you know exactly what sound field is present inside the
volume.
On the other hand, you can reinterpret this and state that if you
can control the sound pressure and the gradient of the sound field on the boundary
of a specific volume, you have full control over the sound field inside this volume,
inside this boundary.
In order to arrive at a practical solution, you have to apply a number
of approximations, which then yield a very simple driving function that can be
implemented very efficiently. It is then only a high-frequency approximation
of the correct solution, but this is not a problem, as I will show in the following
slides. Sorry, one slide in between.
This is the mathematical formulation. Actually, this is a little bit simplified, but
it represents what I want to say: that we find the driving function
without solving the entire equation. From the physical relationships between the
sound field on the boundary and the sound field in the volume, we can calculate
the driving function.
I want to emphasize that this S and this S are the same. So if we know the
sound field, we have to take some gradient and we have to do some selection of
the secondary sources. We cannot use them all; we have to carefully select
which ones we use. We can calculate the driving function without solving this
equation. This is why we call it an implicit solution. And just for illustration, the
secondary source selection is a consequence of one of the
approximations you apply, [inaudible] the fact that you use only those secondary
sources which are illuminated by the virtual sound field.
So assume such a secondary source distribution and you want to synthesize a
plane wave that travels to the right: you use only those secondary sources which
are illuminated by the virtual sound field. And for the point source, this is similar.
This is kind of intuitive, but actually, if you analyze the exact solution, you will find
that the other secondary sources on the wrong side of the array also
contribute -- of course in a very minor way, but physically they contribute to the
synthesized sound field, although it travels into the other direction.
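The gradient plus secondary source selection just described can be written compactly; what follows is a sketch of a Rayleigh-type WFS driving function, not a formula from the talk, and the 2.5D variant used with circular or linear arrays carries an additional frequency- and geometry-dependent correction factor that is omitted here.

```latex
% WFS driving function: directional gradient of the desired field S at the
% secondary source position x_0, with a selection window w that switches
% off the secondary sources not illuminated by the virtual sound field.
\[
  D(\mathbf{x}_0,\omega) \,=\, -2\, w(\mathbf{x}_0)\,
      \frac{\partial S(\mathbf{x},\omega)}{\partial \mathbf{n}(\mathbf{x}_0)}
      \bigg|_{\mathbf{x}=\mathbf{x}_0},
  \qquad
  w(\mathbf{x}_0) =
  \begin{cases}
    1 & \text{secondary source illuminated,}\\
    0 & \text{otherwise.}
  \end{cases}
\]
```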
Okay. Now, the illustration of the accuracy. On the left-hand side, you see the
exact solution, and on the right-hand side, the WFS solution for a
spherical secondary source distribution. This is the illuminated area, the
secondary sources that are active, and these are the secondary sources which
are inactive. And you see, at very low frequencies -- in this case, this is 200 hertz --
you see some minor deviations of the WFS solution compared to the exact
solution. But, and this is hard to prove, this is most certainly inaudible. So
these sound very much the same. And if you go to a higher frequency then -- yes?
>>: [Inaudible] you are using half of the points?
>> Jens Ahrens: Kind of. You apply two
approximations in this case, and the consequence of one of them is that you use
only half of the secondary sources. If you would only use half of the secondary
sources here, this would become more similar to that one, but not
exactly the same. Is that okay for the moment?
>>: [Inaudible] two approximations [inaudible] some other approximation.
>> Jens Ahrens: Exactly. And as I said, that is the high-frequency approximation
that is applied in WFS. So if you go to higher frequencies, if the wavelength is
significantly shorter than the dimensions of the secondary source distribution,
there's hardly any difference. And so I go back. As I said, this is probably -- most
probably -- not audible.
So to summarize the treatment of the continuous secondary source distributions:
we have an implicit and an explicit solution. The explicit solution is obviously
found by explicitly solving the equation, and the implicit solution is found without
solving the equation, by considering some physical relationships. And the
physical accuracy of the solutions is, at least when it comes to sound field
synthesis, equivalent.
So far so good.
The problem is just that in theory, okay, we have such continuous distributions,
but in practice, we have something like this: a discrete setup of a finite number of
secondary sources. So in order to mathematically grasp the impact
of this spatial discretization, we do not assume a discrete secondary source
distribution, but we assume a continuous distribution that is excited at discrete
points, because then all the mathematical relationships we have established
between the different quantities stay valid, and we just need to find a new
driving function that equals the continuous driving function at the sampling points
and vanishes elsewhere. This is exactly what is done in the analysis
of, for example, the discretization of a time-continuous signal.
And in order to emphasize the relationship to this time-domain sampling, I want to
quickly review the process.
Consider a continuous time domain signal [inaudible]. In this case it's purely
[inaudible], so it might have a magnitude spectrum like this. Typically there is
some anti-aliasing filter, which is important in order to ensure that the signal is
band limited; in this case it doesn't change much.
If you then discretize the signal in time with a specific
constant interval, you will get repetitions in the spectrum of the signal. And in
order to reconstruct the signal, you apply a low-pass filter in order to extract
the baseband from these repetitions, and the result, if
all parameters are selected accordingly, is that the reconstructed signal is
equal to the initial continuous-time signal. And this is very similar in spatial
discretization.
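Since the whole discretization argument rests on this analogy, here is a minimal numpy sketch, an illustration of my own rather than anything from the talk, that reproduces the time-domain picture: once a band-limited signal is sampled, copies of its baseband spectrum appear at multiples of the sampling rate.

```python
import numpy as np

fs_dense = 48000          # "continuous" signal approximated on a dense grid
fs_coarse = 4000          # coarse sampling rate used to provoke repetitions
t = np.arange(0, 0.25, 1 / fs_dense)

# Band-limited test signal (two tones well below fs_coarse / 2).
x = np.sin(2 * np.pi * 400 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)

# Sampling = multiplying with a Dirac comb on the dense grid.
comb = np.zeros_like(x)
comb[:: fs_dense // fs_coarse] = 1.0
x_sampled = x * comb

# Magnitude spectra: the sampled signal shows copies of the baseband
# spectrum repeated at multiples of fs_coarse.
f = np.fft.rfftfreq(len(t), 1 / fs_dense)
X = np.abs(np.fft.rfft(x))
X_sampled = np.abs(np.fft.rfft(x_sampled))

for k in (1, 2, 3):
    # Energy around the k-th repetition of the 400 Hz tone.
    idx = np.argmin(np.abs(f - (k * fs_coarse + 400)))
    print(f"repetition {k}: |X|={X[idx]:.1f}  |X_sampled|={X_sampled[idx]:.1f}")

# Reconstruction would low-pass the sampled signal below fs_coarse / 2
# to keep only the baseband, exactly as described above.
```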
So here, if we discretize a time domain signal, we get repetitions in the temporal
frequency domain. In spatial discretization, if we assume a constant sampling
interval, we get repetitions in a representation of the space spectrum;
in which representation these repetitions arise depends on
the geometry of the secondary source distribution.
So for the spherical distribution, we have repetitions in the coefficients of the
spherical harmonics expansion. For the circle, we have the repetitions in the
Fourier series expansion coefficients, and for the plane or the linear source, we
have repetitions in k-space.
So this should already ring a bell. Remember, for the sphere, we
found the solution in the spherical harmonics domain. Here we found it in the Fourier
series coefficients domain. And the solution for the linear distribution
we found in the wavenumber domain.
So we have already calculated everything we need. We just need to analyze it a
bit more. Now we'll use the example --
>>: So yeah, this is [inaudible].
>> Jens Ahrens: No. This is always the case.
I will analyze the artifacts later in the talk, but if you have placed the
loudspeakers closer than half the wavelength, you will hardly
have any impact, hardly have any difference between the continuous distribution
and the discrete distribution. But these spectral repetitions always arise.
It's just such that the spatial transfer function of the secondary sources, for
example, if using [inaudible], suppresses these repetitions, because
when you have a small sampling interval, when the loudspeakers are close
together, these repetitions are at very, very high space frequencies, and there
they are suppressed by the spatial transfer function of the secondary sources.
So you just don't realize that you have these repetitions there.
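The half-wavelength rule quoted here corresponds to the usual rule-of-thumb estimate for the frequency above which the repetitions start to matter:

```latex
% Rule-of-thumb spatial aliasing frequency for a loudspeaker spacing
% \Delta x_0 (repetitions negligible below f_al):
\[
  f_{\mathrm{al}} \,\approx\, \frac{c}{2\,\Delta x_0}
\]
% e.g. \Delta x_0 = 0.17 m, as in the laboratory array shown earlier,
% gives f_al of roughly 1 kHz.
```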
>>: [Inaudible].
>>: Try saying that again.
>>: Is there an assumption or does it get messier if you have nonuniform
[inaudible]?
>> Jens Ahrens: It gets significantly messier, because then you cannot identify
these repetitions. Something very [inaudible] happens then.
We have not analyzed it explicitly, but even this analysis is quite complicated.
This is the simplest case that we analyzed. If you analyze more
advanced cases, it's really difficult to deduce any findings.
So I want to illustrate this process of sampling and the spectral repetitions on
the example of the circular distribution. Remember, the driving function is a
Fourier series; at least these are the basis functions and these are the coefficients
[inaudible]. Again, you don't need to go from minus infinity
to infinity because of the convergence of this series.
So the magnitude of the driving function may look like this. It has some
triangular shape. On the horizontal axis you have the
order m; in the center is zero. And this is the time frequency.
Why it is such is not significant for the moment.
So this is what it looks like for a continuous driving
function. If you sample this driving function, these spectral repetitions arise.
>>: Excuse me. What is again the X axis?
>> Jens Ahrens: This is [inaudible] -- sorry. This is the order m, the index of the
Fourier series, which you can interpret as some type of space frequency.
So if you sample this continuous driving function, you get these repetitions. And
since the spatial bandwidth of this
driving function is not limited, the higher you go in frequency, the further they extend
to the sides. And the repetitions can overlap and interfere and cause [inaudible]
spatial aliasing.
And this is what we propose to term a spatially full-band driving function,
because, of course -- I go back once more -- you don't need to go from a very high
negative order to a very high positive order. You can consciously reduce the spatial
bandwidth of the driving function just by choosing narrower summation
limits, and then the continuous driving function may look like this. We just
decided here to sum only from minus 26 or minus 27 to plus 27. If you
then sample this driving function, the repetitions will not overlap and there will be
no interference, and the baseband will be uncorrupted.
This has of course significant impact on the synthesized sound field that I will
illustrate in a minute.
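To make the order-truncation argument concrete, here is a small numpy sketch, my own illustration with made-up coefficients rather than an actual driving function: a band-limited circular driving function of order M is sampled at N loudspeaker positions; because the repetitions are spaced N orders apart and 2M + 1 <= N, the baseband coefficients are recovered without corruption.

```python
import numpy as np

N = 56                      # number of loudspeakers on the circle
M = 27                      # spatial bandwidth: orders -M..M are kept

rng = np.random.default_rng(0)
orders = np.arange(-M, M + 1)
# Made-up coefficients of a band-limited continuous driving function
# (triangular-ish magnitude, random phases).
d = (1.0 - np.abs(orders) / (M + 1)) * np.exp(1j * rng.uniform(0, 2 * np.pi, orders.size))

def driving(alpha):
    """Continuous, band-limited driving function D(alpha)."""
    return np.sum(d[:, None] * np.exp(1j * orders[:, None] * alpha[None, :]), axis=0)

# Sample the driving function at the N loudspeaker positions.
alpha_l = 2 * np.pi * np.arange(N) / N
samples = driving(alpha_l)

# Fourier coefficients recovered from the samples (one period of the
# repeated spectrum). Repetitions sit N orders apart; since 2*M + 1 <= N
# they do not fold back onto the baseband.
d_hat = np.fft.fft(samples) / N                      # orders 0..N-1 (mod N)
d_hat = np.fft.fftshift(d_hat)                       # orders -N/2..N/2-1
recovered = d_hat[N // 2 - M : N // 2 + M + 1]       # baseband -M..M

print("max baseband error:", np.max(np.abs(recovered - d)))   # ~1e-15
```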
But before that, I want to say that if you analyze what happens in wave field
synthesis, you realize that it's a spatially full-band approach -- the first one,
this one, this is what happens in wave field synthesis. And if you analyze
ambisonics, then it's a narrowband approach, and this is again this case.
So we have one conclusion: in general, WFS constitutes a high-frequency
approximation -- which is probably irrelevant -- of near-field compensated
infinite order ambisonics.
Usually this is termed higher order ambisonics, and this term higher order refers
to the fact that -- go back again -- you stop at a certain order. They just chose
this terminology, although, retrospectively, it's not very useful.
So if you would go in ambisonics to infinite order, you would have approximately
the same. But the fact that WFS is spatially full-band and ambisonics is
narrowband has a very essential impact on the synthesized sound field.
At low frequencies, you hardly see any difference. By the way, this example is
equivalent to 27th-order ambisonics, so from minus 27 to plus 27. At a frequency of
one kilohertz, you hardly see any difference. If you go to two kilohertz, you
already see in the full-band example that a portion of the target area gets
corrupted by artifacts commonly referred to as spatial aliasing. And in the
ambisonics synthesis, something else happens: the sound field gets smaller
and the energy concentrates around the center of the array. If you go
even higher in frequency, you see that at a certain point, in WFS, the entire target
area is filled with artifacts superposed on the desired sound field, and in the
ambisonics example, the desired sound field gets very small and outside you
have significant artifacts.
This is only half of the truth, because you can interpret this monochromatic
analysis as some kind of frequency -- yes?
>>: So when you [inaudible] in the [inaudible] domain, what unit is this?
>> Jens Ahrens: It has no unit. I go back again. The order of 27 means that we
choose the bounds of this summation as minus 27 and plus 27.
>>: Oh, so it's more the size of the [inaudible].
>> Jens Ahrens: Yes and no. It represents the spatial bandwidth. I go a bit
forward again. You could also choose an order of ten;
then you would make this even narrower. So if you then sample it, the
repetitions themselves would be a bit narrower. But in a practical
implementation, it means that -- I want to go back again.
Each mode is a filtering operation. And at the end, you sum the filtered
signals according to this formulation. So if you reduce the order, you have to do
less filtering and fewer summations, which is a bit more efficient, but has an essential
impact [inaudible].
Does that answer your question?
>>: Yes.
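As an illustration of the remark that each mode is a filtering operation, here is a rough frequency-domain sketch of how such a narrowband renderer could be structured. The per-order filters below are placeholders, not the actual near-field compensation filters, which depend on the secondary source model and the array radius; only the structure, one filter per order followed by a weighted sum over orders for each loudspeaker, is the point.

```python
import numpy as np

def render_driving_signals(S, freqs, n_speakers=56, order=27, radius=1.5, c=343.0):
    """Frequency-domain sketch of modal (order-by-order) rendering.

    S     : spectrum of the virtual source signal, shape (len(freqs),)
    freqs : frequency bins in Hz
    Returns driving spectra with shape (n_speakers, len(freqs)).
    """
    alpha = 2 * np.pi * np.arange(n_speakers) / n_speakers   # loudspeaker angles
    k = 2 * np.pi * np.asarray(freqs) / c                    # wavenumbers
    D = np.zeros((n_speakers, len(freqs)), dtype=complex)

    for m in range(-order, order + 1):
        # Placeholder per-order filter (NOT the real near-field compensated one).
        filt_m = 1.0 / (1.0 + np.abs(m) / (k * radius + 1e-9))
        mode_signal = filt_m * S                             # filter the source signal
        # Distribute mode m to the loudspeakers with its angular weight.
        D += np.outer(np.exp(1j * m * alpha), mode_signal) / n_speakers
    return D

# Usage sketch: a flat source spectrum rendered at a few frequency bins.
# Reducing `order` means fewer filters and summations, as described above.
freqs = np.linspace(100, 2000, 8)
D = render_driving_signals(np.ones(len(freqs)), freqs)
print(D.shape)   # (56, 8)
```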
>> Jens Ahrens: So as I said, these monochromatic considerations --
>>: [Inaudible].
>> Jens Ahrens: I'm sorry. I forgot [inaudible]. This is the example that I
showed in the panoramic image, that one. These are the parameters. It is exactly
like in this example, in this image.
So in this case we looked at individual frequencies and investigated what
happens depending on the spatial bandwidth. But what is more essential for
the ear is the time domain structure of the sound field, and we have also made
some simulations. I start with wave field synthesis. You will now see the
impulse response of the loudspeaker system when it is driven in order to
synthesize a plane wave. So that means, once again, a plane wave that moves
downwards, and if you do that with a discrete array, it will look like this. In the
beginning, there's not much to see, but then you realize that in the wave field
synthesis case, you have indeed a very strong first wave front, and this is
actually what we want. But afterwards, you have a lot of different wave fronts,
[inaudible] which is spatial aliasing, which move into various directions. And in
the ambisonics example, it will look like this.
In the beginning it all looks very different, but then you realize that something
else goes on. In the center, we have this area where there's hardly any
corruption, where we have indeed the desired impulse-like wave front. But then
we have some low-frequency wave front around here, followed by high-frequency
wave fronts.
So here, a separation between low and high frequencies takes place. I
go back a bit in the slides -- you can also see this on this image. Here
you have the desired sound field superimposed with artifacts. Here,
the synthesized sound field exhibits a specific curvature at high frequencies
which is not there if you go to lower frequencies, like in this case, at low
frequencies [inaudible], and this is also apparent in the time domain analysis.
And this is perceptually a big difference. Here we have the
strong first wave front, and here we have a separation of low-frequency and
high-frequency energy.
So I see that it took me a bit longer than I initially planned, so I will have to
accelerate a bit.
And I want to illustrate the essential psycho-acoustical mechanisms on the
example of two sound sources emitting sufficiently coherent signals, for example
two loudspeakers emitting two signals. Yes?
>>: Is there any [inaudible] -- [inaudible]. Is there any advantage to [inaudible]?
>> Jens Ahrens: Yes. You could choose a different arrangement of the
loudspeakers, like the [inaudible] space, [inaudible] space microphone array, but
then you can reduce the spatial aliasing only for one specific
propagation direction of the plane wave.
If you want to synthesize a different propagation direction, then you would have
stronger artifacts than with uniform sampling. So let's maybe use this
example. You could choose to put the loudspeakers closer together at the point
where the virtual wave touches the secondary source distribution and
choose a wider spacing here. Then you could raise the aliasing frequency, the
frequency where the artifacts arise, but this would hold only if the plane wave
propagates downwards. If you want to synthesize a plane
wave that propagates in that direction, you have more aliasing than with the
uniform sampling.
So back to the psycho-acoustical mechanisms which might be relevant here.
There's one mechanism which is termed summing localization,
which happens, for example, when you listen to stereophony. Two
loudspeakers are playing, but you hear only one sound source
in the center, so the ear kind of combines the two signals and you hear some
average. You don't hear separate sounds from left and right; you hear them in the
center if the two loudspeakers are playing coherent signals. And if you delay one
loudspeaker by less than a millisecond, then the source moves towards the other
loudspeaker.
Then, if the time delay between the two wave fronts is larger than approximately a
millisecond, the precedence effect takes over. You will hear the sound
source exclusively in the earlier loudspeaker, so that only the early loudspeaker
determines localization, and the signal of the second loudspeaker will not
consciously be perceived; it just adds a bit to the spatial impression. And if you
have an even stronger separation between the two wave fronts in time, you can
perceive an echo.
And what happens here -- I'll go back -- is probably a mixture of everything.
You have a strong wave front followed by closely spaced wave fronts whose
spacing is much closer than one millisecond. So this could trigger summing
localization -- or might not; it's very difficult to say. And what happens here is very
hard to interpret.
We have made many informal and also some formal experiments, which
we have also published. We have measured tens of thousands of impulse
responses, of head-related impulse responses, from the system, in
order to simulate the loudspeaker system binaurally over headphones, in
order to have a controlled environment where you can seamlessly switch
between different listening positions and different methods, including head
tracking and everything. You can download some samples from our spatial
audio blog which contain the full binaural information, so that you can get an
idea of what it sounds like.
It's very hard to summarize our findings so far, but the only thing I
want to emphasize is that it's very difficult to find out what happens, because if
you think you have found a basic mechanism in one situation, in a different
situation, it might be completely different.
So I think since the time has progressed so much, I will skip -- I'm sorry.
>>: [Inaudible].
>> Jens Ahrens: Sorry. I was aiming at half past 11:00. So I can -- I'm sorry.
I'm a bit confused. So I can take it easy. Very good.
So I can summarize on the spatial discretization artifacts. You saw that at
low frequencies, even the discrete secondary source distributions
all perform comparably. But there is a big difference in the performance at high
frequencies, and what properties arise in the synthetic sound fields depends
essentially on the spatial bandwidth of the driving function. And I have put my
Ph.D. thesis here as a reference because I see this as the most significant result
of my work so far.
And by the way, the spectral repetitions and all these findings also hold
for numerical approaches, because again, it's still a discrete array of
loudspeakers, and the repetitions in the spatial spectrum
of the driving function arise anyway, no matter whether you have found the
solution numerically or whether you have used an explicit solution or an implicit
solution. The repetitions are always there.
And I also showed that a manipulation of the spatial bandwidth can change the
properties of the artifacts. I go back again to remind you: here you have
artifacts that are superimposed on the desired sound field, and here you can
separate the artifacts and the desired sound field.
And there are more possibilities. We have proposed a number of
methods that we decided to term local sound field synthesis, because you can -- I
go back again -- move this spot where the sound field is
accurately synthesized to different locations. And since this represents a local
increase of the accuracy, of course at the cost of more significant artifacts
elsewhere, we decided to term it local sound field synthesis.
So I think that was enough for the moment for the analysis of the basic
properties. We can move on to some application examples.
The plane wave example I showed is of course not very impressive in practice
because it's a very simple sound field. You can do much more.
You can, for example, have focused sources. I can show you quickly what a focused
source is. This is a focused source. We have a secondary source distribution
down here -- a linear one in this case -- and it synthesizes a sound field that
converges towards a focus point. This is where the sound field focuses, and this
is the area where it converges. It passes the focus point and diverges again.
And in this part of the target plane, it looks very similar to the sound field of a
monopole sound source that is located here. So if a listener is positioned
here, he will hear a sound source in front of the loudspeakers. This
actually works in practice. It exhibits some essential limitations but, still, it can
work.
And this is a very essential property of sound field synthesis, because if you think
about large venues like a cinema: in stereophony, the problem is that you cannot
create auditory events that are closer than the loudspeakers.
So if you want to, for example, synthesize a soundscape, replay a soundscape
like rain over stereophony, it always sounds like you're sitting in a dry
bubble and the rain is happening somewhere in the distance or around you, even
if it's a surround system. But you can use, for example, these focused sources
to place individual raindrops in the audience, and this essentially
increases the immersion, because you really feel like being inside the scene.
So one approach to synthesize --
>>: [Inaudible] question here [inaudible].
>>: [Inaudible]. [Laughter].
>> Jens Ahrens: I'm afraid I cannot. Not acoustically.
>>: [Laughter].
>> Jens Ahrens: So one way to find the driving function for focused sources is
to start with a point source, a purely diverging sound field indicated by the arrows,
and then reverse the timing of the loudspeakers.
In this case, the loudspeaker in the center plays first, and then, gradually
towards the sides, the other loudspeakers play.
If you now make the loudspeakers on the far ends play first and the loudspeaker
in the center play last, you will indeed get this sound field that converges towards
the focus point and then diverges again and makes up a focused source.
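A minimal sketch of this time-reversal idea for a linear array, my own delay-and-sum style illustration; a real WFS focused-source driving function additionally includes amplitude weights and a pre-equalization filter.

```python
import numpy as np

c = 343.0                                  # speed of sound in m/s
dx = 0.2                                   # loudspeaker spacing in m
x_speakers = np.arange(-2.0, 2.0 + dx, dx) # linear array along the x axis
focus = np.array([0.0, 1.0])               # focus point 1 m in front of the array

# Distance from every loudspeaker to the focus point.
dist = np.hypot(x_speakers - focus[0], focus[1])

# Point source behind the array: the centre loudspeaker (closest) fires first.
delays_point = (dist - dist.min()) / c

# Focused source: reverse the timing, so the outermost loudspeakers fire
# first and the wave fronts converge in the focus point before diverging.
delays_focused = (dist.max() - dist) / c

for x, dp, df in zip(x_speakers[::5], delays_point[::5], delays_focused[::5]):
    print(f"x = {x:+.1f} m   point: {dp*1e3:5.2f} ms   focused: {df*1e3:5.2f} ms")
```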
We have proposed a number of different formulations of this, which are
based on the manipulation of the desired sound field in a specific [inaudible]
domain like a plane wave decomposition. You decompose the sound field of a
monopole into plane waves, and then recompose it by taking only
one section of plane waves, those which propagate into the target area,
and the result is then a focused source.
I will not go into details because -- yes?
>>: [Inaudible] reconciled this [inaudible] at that point?
>> Jens Ahrens: No. I have to move back to that in order to show you
once again. I have prepared that.
Focused sources have very specific properties in terms of the aliasing artifacts,
which is shown here.
This is again a linear array. The secondary sources are marked by
black crosses, which you probably cannot see, but they are around there with a
spacing of 20 centimeters. For a focused source one meter in front of the
array, at a low frequency of one kilohertz, it all looks fine.
If you move on to a higher frequency, you see that around the focused source,
always, in any case, no matter whether it's spatially full-band or narrowband,
a region arises where there are no artifacts. The artifacts are elsewhere,
and this region gets smaller if you go higher in frequency.
So this is very different. And this can be beneficial if you're located
here, but it can be problematic if you're located there, because in the case of
focused sources, it is not such that the desired wave front
is followed by the artifacts; rather, the artifacts occur before the desired wave front.
And this obviously triggers very different perceptual mechanisms.
>>: [Inaudible]? Does using loudspeakers really change any of [inaudible]?
>> Jens Ahrens: Not in terms of the aliasing, no. If your [inaudible] loudspeaker
has a specific, more complex directivity towards higher frequencies, that has
some impact on the synthesized sound field, but it's not very
essential. It changes the amplitude distribution a little bit, but there's no essential
impact on the structure of the wave fronts, which is the more important thing.
>>: [Inaudible].
>> Jens Ahrens: You mean if the loudspeaker has a considerable spatial extent?
>>: Well, I mean, one, two meters --
>> Jens Ahrens: That is the spacing between the loudspeakers [inaudible]?
>>: [Inaudible] the size of the plane that the loudspeaker is driving is not
considerably smaller than [inaudible].
>> Jens Ahrens: At low frequencies, it's not. Again, back to the photograph, I'm
sorry.
So for these loudspeakers, the spacing is about 17 centimeters. The
loudspeakers we chose are significantly smaller, but they are two-way
loudspeakers where you have a midrange driver and a high-frequency driver.
And when the large driver is active, this is in a rather low frequency range
where there are hardly any artifacts anyway. And in the high frequency range,
where the considerable artifacts arise, we have the very small driver, which is
active, so it won't change much.
>>: [Inaudible] surface.
>> Jens Ahrens: Yeah. It's hard to say where the transition between
the two is, but the problem is that at high frequencies, it is very difficult
to measure the directivity of a loudspeaker with high spatial resolution. So it's
hard to say what happens.
So that was the focused source. And of course, you don't need to assume an
omnidirectional point source or focused source. You can also describe
sources with complex directivities.
An example is this one. The problem is that in
wave field synthesis, you have this strong first wave front that I showed you,
which on the one hand causes actually very accurate localization. This is often
praised in scientific papers. That's fine.
But in some cases, it can be a problem because the locatedness of a sound
source is very large. That means the amount to which the energy is focused to a
point is very high, so you hear a very small source.
Imagine a venue like this room equipped with a [inaudible] loudspeaker system;
you have three or four sound sources. You have a lot of space
in between the sound sources where there's nothing. So you would want to
control the perceived extent, the apparent source width, of the sound sources.
Some approaches that have been proposed in the context of stereophony
and such methods split a given sound source into several sources which
are put at different locations, and those individual sources are then driven with
decorrelated signals. This can sound extremely good. I heard a demonstration by
Ville Pulkki at last year's AES spatial audio conference in
Tokyo. It was really, really impressive.
The only problem is that you cannot control, for example, the orientation of
a sound source, so there is some room for improvement.
So we have recently proposed to model a virtual
sound source that vibrates in higher spatial modes. Like, for example, the string of
a guitar: it doesn't only vibrate in the very lowest mode like this, but there are also
some partials, some higher modes, and you can do the same with
the spatial vibration.
Then a very complex sound field evolves which lowers the coherence
between the signals at the two ears of the listener, and this can
increase the perceived extent.
I have an example, which will be a two-channel audio signal whose
channels contain the signals captured by two
virtual omnidirectional microphones put in the sound field at these positions,
approximately at the distance of the ears.
It will not sound exactly like what
you would hear, but you will at least get an idea of what it could sound like.
So first, the input signal. (Speaking German.)
If you are sitting in the center, you hear a sound source in between the two
loudspeakers, which is --
>>: [Inaudible].
>>: -- which is rather flawed. So maybe -- maybe you want to move here. And
the best is of course always to close your eyes. So I play again. (Speaking
German.)
And then a simulated source of half a meter of size, of length, a plane, a plate of
this size. (Speaking German.)
It's also the room, but even if you listen over headphones, you hear that the sound
gets indeed broader. Maybe -- you don't look so convinced, so maybe the
loudspeakers are too high or too far apart.
And also, there are many parameters that you can tune in this example, and
the parameters we chose are certainly far from optimal.
I have one last example. In many cases, you will, of course, want to
have a sound source that moves. In real life, a moving sound source
exhibits the Doppler effect. That is a consequence of the fact that the
speed of sound in air is constant irrespective of the motion of the source.
So if the source moves towards you, the pitch is raised, and if it moves away from
you, the pitch is lowered.
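For reference, the textbook frequency shift for a source moving at speed v past a stationary listener in still air is:

```latex
% Doppler shift of a moving source, stationary listener and medium:
\[
  f_{\text{observed}} \,=\, \frac{f_{\text{source}}}{1 - \frac{v}{c}\cos\theta}
\]
% v: source speed, c: speed of sound, \theta: angle between the source
% velocity and the line from source to listener. Approaching source
% (\cos\theta > 0): pitch raised; receding source: pitch lowered.
```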
In many applications you will not want that, because if you think, for example,
about a teleconferencing scenario, you have people talking and
you move them with some multi-touch or other interface, and if you move the
persons closer, their pitch will rise, and if you push them away, the
pitch will go down. This sounds very irritating, so you would want to suppress
the Doppler effect.
What you can do is synthesize a sequence of stationary positions and cross-fade
between them. Depending on how you choose the
parameters, on one hand this can suppress the Doppler effect, but on the other
hand, if you use very short intervals in the sequence, this can also create a
frequency shift.
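A minimal sketch of this cross-fading scheme, my own illustration of the general idea rather than the implementation used by the authors: render the source at a sequence of stationary positions with a hypothetical render() function and overlap-add the segments with complementary fades. Making the segments very short makes the repeated fading itself audible, which is the frequency-shift side effect mentioned above.

```python
import numpy as np

def crossfade_positions(render, positions, seg_len, fade_len, fs):
    """Overlap-add a moving source as a sequence of stationary renderings.

    render(pos, n) -> loudspeaker signals for a source fixed at `pos`,
                      shape (n_speakers, n); assumed to exist elsewhere.
    positions      -> list of source positions along the trajectory.
    seg_len, fade_len in samples; fs is only kept for documentation.
    """
    hop = seg_len - fade_len
    fade_in = np.linspace(0.0, 1.0, fade_len)
    window = np.concatenate([fade_in, np.ones(seg_len - 2 * fade_len), fade_in[::-1]])

    out = None
    for i, pos in enumerate(positions):
        seg = render(pos, seg_len) * window           # stationary rendering, faded
        if out is None:
            out = np.zeros((seg.shape[0], hop * len(positions) + fade_len))
        out[:, i * hop : i * hop + seg_len] += seg    # overlap-add
    return out

# Usage sketch with a dummy renderer (2 loudspeakers, low-level noise).
dummy = lambda pos, n: np.random.randn(2, n) * 0.1
sig = crossfade_positions(dummy, positions=[(x, 2.0) for x in np.linspace(-3, 3, 20)],
                          seg_len=4800, fade_len=480, fs=48000)
print(sig.shape)
```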
We have instead proposed to derive the driving signal from a model of
the sound field of a moving source in order to synthesize the Doppler effect, and
this indeed works. I have -- sorry -- an animation of it.
So this would be the moving source, and it moves quite quickly; this is of
course slow motion. You see obviously how the sound waves are
compressed in the direction of motion and how they are expanded in the other
direction. And if you drive a loudspeaker system accordingly, then it will look
like this. The source moves along here, approximately. So in the direction of
motion, the sound waves are compressed, and in the opposite direction, they are
expanded.
And you can do much more. I think these are enough examples for the moment,
so before I conclude, I want to mention the software we wrote. We
termed it the SoundScape Renderer; we had to find a name for it. It implements
the most basic versions of most of these spatial audio rendering algorithms:
wave field synthesis, [inaudible], [inaudible], and various versions of
binaural rendering.
This is an example with a loudspeaker system. These symbols represent the
56-channel loudspeaker system, and then you have a lot of sound sources. And
if you listen using a binaural method, the loudspeaker system is replaced by
yourself. And there's an interface for head tracking and everything. You can
download the source code.
The software only has one flaw. I hope you don't mind. [Laughter].
>>: That's [inaudible]. [Laughter].
>> Jens Ahrens: I'm sorry. It was not my decision.
So I finally conclude.
The physical fundamentals of sound field synthesis can elegantly be described
using integral equations. Then we have identified ambisonics and wave field
synthesis as two special cases of the solution. The spatial bandwidth is essential
in terms of the properties of the synthesized sound field.
Remember the different structures of the wave front. Then I showed that various
types of virtual sources are possible, moving sources, sources with directivity,
focused sources and the like.
The only problem is that perception of synthetic sound fields is hardly known.
Now, I have one last slide with an outlook. I will start with a white slide
in order to increase the excitement. [Laughter].
I have been one of the very few fortunate persons in this
world who have access to all the different types of audio reproduction systems. I
have access to this 800-channel wave field synthesis system. We have this
56-channel system in our laboratory. We have a heap of individual loudspeakers
that we can arrange in various ways. I have travelled to many laboratories in
order to hear their systems.
And I think it is indeed such that sound field synthesis can be superior to other
audio presentation methods, especially for large venues or especially if you have
a clear reference of what something should sound like. If you listen to a pop
music recording, it won't tell you much because there's no acoustic, no
unamplified pop music, so the sound of pop music is very closely related to the
sound of stereophony.
But if you listen to classical music played over stereophony or traditional
ambisonics or any other loudspeaker arrangement with a low number of
loudspeakers, it sounds different from real classical music in the sense that
it doesn't contain as many details, both spatially and also with respect to the
[inaudible]. So some information is getting lost.
This can be better if you use sound field synthesis, because you have more
degrees of freedom; you can add more details. Still, it doesn't sound perfect, but
it can be superior. Not always; in many situations, stereophony and other methods
are superior. But certainly, in the current state, sound field synthesis is not the
ultimate solution. It's just not reasonable to employ dozens or maybe hundreds
of loudspeakers for this potential benefit.
On the other hand, the other audio presentation methods can most likely not
overcome their limitations. For example, in stereophony, you will always have a
sweet spot, especially if you serve multiple people. There's no way to
overcome this.
So neither is the ultimate solution. Probably we will experience a convergence
between the methods.
There's a small company called IOSONO in Eastern Germany which
commercially sells wave field synthesis systems, and they have also equipped,
for example, Mann's Chinese Theatre in Hollywood with a system. And they have
a new generation of systems where they significantly reduce the number of
loudspeakers. Initially they had a loudspeaker spacing of
maybe 15 centimeters or so, and now they use half a meter to a meter. So this
significantly reduces the number of loudspeakers.
This new generation of systems I heard twice, once in the laboratory, where it
sounded really great; then I heard it at the Tonmeistertagung, which is a
German meeting of tone masters, of balance engineers. So it was a different
room, though the same system. I was not really convinced, but still, this is
certainly an important step in the right direction.
And also, this convergence of methods could be something like a desktop audio
system where you put a number of small loudspeakers onto
the screen and then form beams that control the sound pressure at the ears
of the listener, or, as some of you in the speech technology group have heard,
this linear array where you do some quiet-zone synthesis and such.
So it's not clear where to go, but it's certainly such that among the solutions that
are around, there is no ultimate solution.
And of course, whatever you do, since we cannot achieve ultimate physical
accuracy, we have to consider how the ears work and then optimize the methods
accordingly.
Usually I end such long talks with an image that summarizes and represents the
contents, but I think in this case, I'll leave this slide up because it's certainly a
good inspiration for the discussion.
Thank you very much. [Applause].
>>: So I [inaudible] you mentioned there's this commercial [inaudible] Yamaha.
Commercial [inaudible]. [Inaudible]?
>> Jens Ahrens: No. It's closely related. I've been forming some kind of
optimized [inaudible] synthesis. If you use an array of loudspeakers for beam
forming, you will have some -- what was I -- yeah. We will have something like
this.
The timing of the individual loudspeakers causes individual wave fronts, which of
course will not be exactly like this or like that, but you will have a specific,
complex structure of wave fronts. And with a [inaudible] array, you could at first
assume a continuous array, then form the beams, and then analyze how the
discretization affects the result. So there is a very close relationship.
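To make the "continuous first, then discretize" idea concrete, here is a minimal delay-and-sum sketch, my own illustration rather than the speaker's method; the frequency, aperture, and steering angle are assumptions. The same beam is formed once with a dense, quasi-continuous line of monopole loudspeakers and once with a coarse one, so the effect of the discretization on the superposed wave fronts can be compared.

```python
# Minimal sketch: delay-and-sum beam steering with a linear array of monopole
# loudspeakers, evaluated on a line 3 m in front of the array, once with a
# dense ("quasi-continuous") array and once with a coarse, clearly discrete one.
import numpy as np

C = 343.0                     # speed of sound, m/s
F = 2000.0                    # analysis frequency, Hz (assumed)
K = 2 * np.pi * F / C         # wavenumber
STEER = np.radians(30.0)      # steering angle of the beam (assumed)

def field(n_speakers, aperture=2.0):
    """Magnitude of the superposed pressure on a line parallel to the array."""
    x_spk = np.linspace(-aperture / 2, aperture / 2, n_speakers)
    delays = x_spk * np.sin(STEER) / C            # delay-and-sum steering delays
    x_eval = np.linspace(-4.0, 4.0, 81)
    y_eval = 3.0
    p = np.zeros_like(x_eval, dtype=complex)
    for xs, tau in zip(x_spk, delays):
        r = np.hypot(x_eval - xs, y_eval)         # loudspeaker-to-point distance
        # each loudspeaker radiates a delayed spherical wave
        p += np.exp(-1j * (K * r + 2 * np.pi * F * tau)) / r
    return x_eval, np.abs(p) / n_speakers

x, dense = field(200)    # approximates a continuous distribution
_, coarse = field(8)     # coarse array of 8 loudspeakers
for xi, d, c in zip(x[::10], dense[::10], coarse[::10]):
    print(f"x = {xi:5.1f} m   dense {d:6.3f}   coarse {c:6.3f}")
```

With the coarse array the spacing exceeds half a wavelength at the chosen frequency, so grating lobes and a more irregular wave-front structure show up in the printed field values, which is exactly the discretization effect being discussed.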
>>: So [inaudible] mathematically [inaudible] just to see the impact, but in
practice, is this even desirable, or what are the kinds of things that you might
have to generate?
>> Jens Ahrens: [Inaudible] wave can be useful, for example, in the creation of
reverberation, because you certainly don't need to accurately measure the
reverberation of a room in order to mimic it with a loudspeaker system; you
can use a set of plane waves impinging from different directions which carry
[inaudible] signals and which can create a pretty good spatial impression. So this
could be useful.
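A minimal sketch of that idea follows; the decay time, the number of directions, and the use of independent noise are my assumptions, not the speaker's recipe. It builds a crude diffuse reverberation layer from plane waves arriving from evenly spaced directions, each carrying a mutually decorrelated, exponentially decaying noise tail; each (direction, signal) pair would then be fed to whatever plane-wave driving function the renderer provides.

```python
# Hedged sketch: a set of plane waves from evenly spaced directions, each with
# a decorrelated, exponentially decaying noise signal, as a crude diffuse
# reverberation layer.
import numpy as np

FS = 48_000                  # sampling rate, Hz
RT60 = 1.2                   # assumed reverberation time, s
N_PLANE_WAVES = 12           # assumed number of plane-wave directions
rng = np.random.default_rng(0)

n = int(RT60 * FS)
t = np.arange(n) / FS
envelope = 10.0 ** (-3.0 * t / RT60)      # reaches -60 dB after RT60 seconds

directions = np.linspace(0.0, 2 * np.pi, N_PLANE_WAVES, endpoint=False)
# Independent noise per direction -> mutually decorrelated signals, same decay.
signals = envelope * rng.standard_normal((N_PLANE_WAVES, n))

for az, sig in zip(directions, signals):
    print(f"plane wave from {np.degrees(az):5.1f} deg, rms {sig.std():.3f}")
```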
Then what you often do, for efficiency, is synthesize point sources, because
they have a specific fixed location. This can be implemented very efficiently, so
you can have a high number of sources.
It sounds a bit boring, though. Imagine you synthesize a person speaking as an
omnidirectional source: no matter where you move in the target zone, you will
always have the impression that the person [inaudible] towards you. It enhances
the spatial impression if you impose a specific directivity on the sound source,
which you can also do. It is then less efficient to implement, so you cannot have
dozens of sources, but maybe ten sources or so on one computer in real time. But
then you will hear the orientation of the person that is talking to you, which can
be beneficial in aesthetic terms.
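A minimal sketch of why directivity conveys orientation; the cardioid pattern and the listener positions are assumptions chosen for the demo, not part of the talk. The level a listener receives from a directive virtual source depends on where they stand relative to the direction the source faces, whereas an omnidirectional source shows only distance decay.

```python
# Hedged sketch: omnidirectional vs. cardioid virtual point source.  The level
# at a listener position depends on the source orientation only in the
# directive case.
import numpy as np

SRC = np.array([0.0, 0.0])          # virtual source position
FACING = np.radians(0.0)            # direction the virtual talker faces (assumed)

def level(listener, directive):
    v = listener - SRC
    r = np.linalg.norm(v)
    g = 1.0 / r                                     # point-source distance decay
    if directive:
        phi = np.arctan2(v[1], v[0]) - FACING
        g *= 0.5 * (1.0 + np.cos(phi))              # simple cardioid weighting
    return 20 * np.log10(max(g, 1e-9))

for angle in (0, 90, 180):
    pos = 2.0 * np.array([np.cos(np.radians(angle)), np.sin(np.radians(angle))])
    print(f"listener at {angle:3d} deg:  omni {level(pos, False):6.1f} dB,"
          f"  cardioid {level(pos, True):6.1f} dB")
```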
>>: I guess what I was referring to is that it depends on the application. In some
applications, if you want to do teleconferencing and telepresence, I would expect
to reproduce you as a point source, not as a [inaudible] wave. So I'm not
particularly interested in how well this system will work [inaudible]; I'm interested
in how well I can pretend that [inaudible] another one that is [inaudible] there and
then perceive that. So that may be more interesting than [inaudible] showed later
[inaudible] sources.
>> Jens Ahrens: I agree. The reason why we chose the plane wave for the
analysis of the basic properties is that, first of all, you can decompose any sound
field into a continuum of plane waves, so you can deduce information from that.
On the other hand, the plane wave has a simple geometrical structure. As I
showed for two-and-a-half-dimensional synthesis: if you analyzed the sound field
synthesized by the circular array by means of a point source, you would have a
different curvature of the wave fronts, but the sound field of the point source
already exhibits an inherent amplitude decay. This incorrect two-and-a-half-dimensional
amplitude decay is then imposed on the already existing decay, and it can happen
that you just don't see it in a visual inspection.
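To make that masking effect concrete, a minimal numeric sketch follows; the extra 1/sqrt(r) factor used for the 2.5D error is the commonly quoted approximation and is an assumption on my part, not a result from the talk.

```python
# Hedged sketch: amplitude over distance for a plane wave and a point source,
# each with and without an assumed 1/sqrt(r) 2.5D amplitude error.
import numpy as np

r = np.array([0.5, 1.0, 1.5, 2.0])       # distance from the reference point, m

plane_desired = np.ones_like(r)           # plane wave: constant amplitude
plane_25d     = 1.0 / np.sqrt(r)          # with the extra 2.5D decay

point_desired = 1.0 / r                   # point source: inherent 1/r decay
point_25d     = (1.0 / r) / np.sqrt(r)    # 2.5D decay on top of 1/r

for row in zip(r, plane_desired, plane_25d, point_desired, point_25d):
    print("r={:3.1f} m  plane {:4.2f} vs {:4.2f}   point {:4.2f} vs {:4.2f}".format(*row))
```

Against the plane wave's flat reference the deviation is obvious, while for the point source it only steepens a curve that is already decaying, which is why the error can escape a visual inspection.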
With the plane wave, if I go back to the circular array, we know that this is what
we want: a plane wave front with constant amplitude everywhere. If you synthesize
the sound field of a point source, you have the amplitude decay anyway. So here we
see that there is not constant amplitude [inaudible], so this is a basic property of
this loudspeaker array, and it can happen that you don't see it if you use a more
complex sound field.
Is that what you --
>>: [Inaudible], I cannot imagine any real applications where that -- I mean,
there are some where that's a desired outcome.
>> Jens Ahrens: Of course, of course.
>>: The scenarios. So maybe it's not as important as: what are the scenarios that I
want to accurately reproduce, and how would I do that?
>> Jens Ahrens: Yeah. So there are --
>>: It's mathematically maybe convenient, but maybe --
>> Jens Ahrens: In practice, there are two essentially different scenarios.
One is where you record a [inaudible] with a microphone or microphone array
that captures the spatial information like a spherical array, and you want to
resynthesize or recreate the sound field by means of a loudspeaker array.
This is one scenario: you don't manipulate the signals or impose spatial
information onto them; you just want to recreate them.
The other scenario is where you record the sound sources individually, like people
sitting at a table where everybody has his or her individual microphone. Then you
have the individual audio signals in individual channels, so you can arrange them
in space as you like and reproduce the people as point sources, or as more
complex sources, or as moving sources. So these are the two different
scenarios.
>>: I have a question about [inaudible]. So here you have 56 loudspeakers in a
circle of [inaudible] meters, so that's exactly 1000 and [inaudible]. And then you
go [inaudible] and you [inaudible] able to overcome the spatial [inaudible].
[Inaudible]?
>> Jens Ahrens: If you do this order limitation, you concentrate the energy of
your desired sound field into a specific region. The orders close to zero, the
lower orders, describe the sound field around the center of the expansion; in this
case, that is the center of the array.
So the low orders describe the sound field around here, and the spatial extent of
that sound field is strongly dependent on the frequency. Say, a fifth-order sound
field at a hundred hertz has an extent of several meters, but at one kilohertz a
fifth-order sound field has an extent of maybe this size.
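As a rough check of those numbers, a common rule of thumb relates the radius of the accurately represented region to the order N via r ≈ N/k = N·c/(2πf). The exact threshold varies between authors, so the following minimal sketch is indicative only.

```python
# Hedged sketch: rule-of-thumb radius r = N * c / (2 * pi * f) of the region
# that an order-N sound field representation covers accurately.
import math

C = 343.0   # speed of sound, m/s
N = 5       # spherical-harmonics order (fifth order, as in the example above)

for f in (100.0, 1000.0):
    r = N * C / (2 * math.pi * f)
    print(f"order {N} at {f:6.0f} Hz -> accurate region radius about {r:4.2f} m")
```

This reproduces the stated behaviour: roughly a few meters across at 100 Hz, but only on the order of a quarter of a meter at 1 kHz.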
So here, at low frequencies, you do not notice that the sound field is order
limited. If you go to higher frequencies, you do notice the order limitation,
because this is your desired sound field, an order-limited plane wave: its energy
is concentrated around the center, and you have hardly any information out
here. This is what you lose.
If you go higher in frequency, the concentration around the center is even
stronger. So this is your desired sound field, the desired plane wave, and these
are the individual repetitions of the driving function; let me go back. This part
describes the sound field around the center, and the higher orders, in this case
where the repetitions are, describe the sound field at locations far from the
center.
>>: [Inaudible] spatial areas. Do we have the same resolution [inaudible]
variation? So the only thing you lose is [inaudible].
>> Jens Ahrens: Yeah. The zone where the sound field synthesis is actually
[inaudible]. This is what you [inaudible] here.
The problem is that if you assume a continuous secondary source distribution
synthesizing an order-limited sound field, you will only have that region in the
middle. Anywhere else you will have very low amplitude, 20, 30, 40 dB lower than
the desired sound field. So if the listener were located in the center, you would
hear the signal, and if you went one step to the side, the signal would fade away;
or, more precisely, the higher frequencies are reduced first, and the further you
go away, the lower the frequencies that remain.
So if we have a discrete array, we have these artifacts, which add energy where
the desired sound field does not have energy. It might even sound better to have
these artifacts than not to have them.
>>: Thank you. [Inaudible]. [Applause].
>> Jens Ahrens: Thank you very much for your attention.