>> Ivan: Good morning. It's my pleasure to introduce to you Jens Ahrens from
the Technical University of Berlin. He received his Master of Science degree with
distinction in electrical engineering from Graz University of Technology in 2005. Since
2006, he has worked for Deutsche Telekom Laboratories and the Technical University
of Berlin, where he defended his Ph.D. in October last year. Today he is going to
present Analytical Methods of Sound Field Synthesis: Theory and Applications. With no
further ado, Jens, you have the floor.
>> Jens Ahrens: Thank you, Ivan. The term sound field synthesis is not an
established term. People use many different terms. I will come back to this point
later. It is a method that we use for the presentation of audio, of spatial audio
signals.
But before I go into detail, I want to quickly add one comment to my biography
about the Deutsche Telekom Laboratories, because this is an institute which has
a special aspect: it is a public-private partnership between the University of
Technology and Deutsche Telekom. Deutsche Telekom is a private company and a
major German telecommunications provider, and used to be the parent company of
T-Mobile U.S.A. They have a department which is purely Telekom and a department
which is purely academic. I am affiliated with the academic department, which is part
of the University of Technology Berlin. I'm attached to a professor, but the scientific
staff is sponsored by Telekom. So they pay the university to pay us, basically, but
that doesn't mean that they force us to work on Telekom-relevant projects, although of
course they encourage us.
So now, back to the topic.
I guess you are aware that there are different approaches for the presentation of
spatial audio signals. A possible categorization might be head-related methods
on one side, which exclusively control the sound pressure at the eardrums of the
listener. Typically they employ head-related transfer functions, HRTFs, which
represent the acoustical influence of the human body on the sound waves that
impinge on a person, and they typically employ headphones. And on the other
hand, there are the room-related methods, which aim at serving a specific
portion of space. These are, for example, stereophony and ambisonics, which employ a
low number of loudspeakers, say, between two and maybe ten. And there's, for
example, sound field synthesis, which employs a significantly larger number of
loudspeakers.
And when I say sound field synthesis, I refer to the situation when an ensemble
of elementary sound sources, for example, loudspeakers, is driven such that a
sound field with specific desired physical properties evolves over an extended
area. And this extended area can be a plane or a volume.
So example systems, example ensembles of elementary sound sources, you can
see here on this slide. The top image shows a section of the loudspeaker system that
we have installed in our laboratory; I also have a panoramic image. It's
composed of 56 loudspeakers arranged on a circle of 1.5 meters in radius. So
this is about the smallest that is useful. The other image shows the largest
that is currently around. This is a lecture hall equipped with, I think, two and a half
thousand loudspeakers all around, along this gray ribbon, driven
with 800-something individual channels, sorry, driven by 16 synchronized
computers. So it's an enormous effort in terms of the hardware.
Now, this is the point where people often get skeptical. They say, okay, I have
two loudspeakers at home, and my sound system sounds great. Yes and no.
It certainly sounds great, but there are certain things that you cannot do with
stereophony or other methods that employ a low number of loudspeakers.
I would like to postpone a discussion of the use [inaudible] of sound field
synthesis to the end of the talk, because I prefer to outline the specific
properties of sound field synthesis first so that you can better grasp my
opinion.
So for the moment, the only motivation I want to outline is that what you cannot
do with stereophony is assure a plausible aural perspective. If you listen to a
stereo system, you face the two loudspeakers, and when you move your head to the
left and right, you hear how the sound sources move with you. So this is not
plausible, especially when you consider an extended, a large receiver area like
in a cinema or the like.
If you are familiar with the literature on sound field synthesis, you
have certainly stumbled over two specific methods. One is wave field
synthesis, and the other is near-field compensated higher order ambisonics.
The best thing is to just accept the terms as they are. It would go a bit too far to
describe why especially the latter is termed the way it is; it's more
confusing than clarifying. So I will refer to this method as ambisonics.
Both can be formulated analytically, and of course, if there's an analytical
solution, there can also be a numerical one.
I will not explicitly speak about numerical solutions. At specific points, I will
outline the impact of the results I present on the numerical solutions.
When I started working on these topics, the problem was that the methods listed
here are based on very, very different formulations, and this is even
diplomatically expressed. Actually, even the personalities that
work on the different methods -- it's two universes that clash. Either you
have people who work specifically on wave field synthesis or on ambisonics,
and hardly anybody in the entire world, actually, works intensively on both.
So our motivation was to find a framework that allows us to compare the two
methods. And this is what I am going to present. This unified
framework that I will present will also allow for certain extensions of the theory, and
I will especially pay attention to the possibilities and limitations of these methods.
So this is then the contents. This will be the biggest part of the presentation, the
theory. I will present the framework and the properties in considerable detail,
and after that, there will be a short illustration of what type of synthetic sound
fields you can use in order to creatively employ sound field synthesis.
So now let's go into the theory.
A very elegant starting point is to assume a continuous distribution of secondary
sources. Typically, in theory, you would not speak of loudspeakers, because
sometimes you assume properties that cannot easily be associated with
a loudspeaker, which is spatially a discrete entity. So in this case, you
assume a continuous layer of loudspeakers, of secondary sources, and then
you can easily compare the situation to related problems for which solutions
exist, for example scattering problems, where you have an inhomogeneous
boundary of a volume which represents the acoustical properties of a specific
object. And this is exactly what we have here. The continuous distribution of
secondary sources can be interpreted as an inhomogeneous boundary.
So the essential equation is what we term the synthesis equation. It states the
following: we have, in this case, an enclosing distribution of secondary sources
[inaudible]. The letter G refers to the spatial transfer function of the secondary
sources. We use the letter G because it may be interpreted as a Green's
function. And x refers to a point in space, and x0 refers to a point on the
secondary source distribution. And D is the driving signal of the secondary
sources.
And if you integrate over this distribution, which is enclosing in this case -- it
doesn't need to be a contour integral, it can also be a plane or whatever --
then this will yield the synthesized sound field. Of course, usually you would not
want to know what sound field arises if you drive the secondary sources in a
specific manner. You would rather want to know how to drive the loudspeakers
in order that a specific sound field arises. So you want to dictate this and find
out that. And this is the essence of the theory of sound field synthesis.
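For readers without the slides, the synthesis equation being described takes roughly the following form in the notation commonly used in the sound field synthesis literature; this is a reconstruction from the verbal description above, not a copy of the slide.

```latex
% Synthesis equation: the field P produced by a continuous distribution of
% secondary sources on the boundary \partial\Omega of the target region.
% x is a point in space, x_0 a point on the secondary source distribution,
% D the driving function and G the spatial transfer function (Green's
% function) of the secondary sources.
\[
  P(\mathbf{x},\omega) \,=\, \oint_{\partial\Omega}
      D(\mathbf{x}_0,\omega)\, G(\mathbf{x}-\mathbf{x}_0,\omega)\,
      \mathrm{d}A(\mathbf{x}_0)
\]
% Sound field synthesis prescribes the desired field S for P and solves
% this integral equation for the driving function D.
```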
A very important aspect of this continuous formulation is
the fact that you can construct a situation where you can show that a
physically perfect synthesis is possible, because if you meet a
situation where the requirements for this perfect synthesis are not fulfilled, you see
immediately, even before you find the solution, that certain limitations arise. And I
will outline that later.
So this is again the synthesis equation, just for reference. And you can physically
interpret the secondary source distribution as a single-layer potential, which is
often employed in scattering problems. There, a scattering object, an
object on which a specific field, for example a sound field, impinges, is
replaced by a continuous layer of secondary sources, by a single-layer
potential, which represents the acoustical properties
and the impact of this object on the impinging sound field. This is mathematically a
very, very similar formulation, so you can learn a lot from the solutions
that have been found there.
And for integrals like the one stated here, there is a theorem established by the
Swedish mathematician Erik Fredholm which, interpreted for this
problem, says that an exact solution exists for arbitrary source-free sound fields
if the secondary source distribution encloses the target volume. And it has to be
simply connected; there are some more prerequisites that have to be met. But if we
have a simply connected enclosing secondary source distribution, we know that we
can find a perfect solution. And this solution has been found via an
orthogonal expansion of the involved quantities: the sound field, the driving
function, and the [inaudible] secondary source transfer function.
Although this is possible for many or even arbitrary geometries, only the
solution for the sphere is practical, because in other cases you might just not find
a suitable orthogonal expansion, and if you find it, it will be very
difficult to implement in practice.
So now I want to illustrate the [inaudible] sphere problem. Let's assume a
sphere, which is indicated by the gray shading, which is centered around the
origin of the coordinate system. It has a radius upper-case R, and it encloses the
target volume. So we want to synthesize a sound field inside this sphere.
Then the synthesis equation looks like this. The integral over alpha
refers to the azimuth, beta to the colatitude, and this expression here
and here represents integration over the sphere.
So, according to the Fredholm theorem, we can expand the involved
quantities into specific orthogonal basis functions. In this case, these are
surface spherical harmonics. Don't be puzzled if you're not
familiar with spherical harmonics. It's not necessary to know the details in the
following. Just accept that it's a complete basis into which you can expand the
involved quantities.
And then you perform a comparison of coefficients, and you can
easily solve for the driving signal.
I want to propose an alternative which is actually exactly the same; later
in the talk, it will become clear why the interpretation that I propose is very helpful.
I propose to interpret this integral as a
convolution along the surface of the sphere, because then a convolution theorem
can be established which relates the coefficients of the involved quantities with
respect to an expansion into spherical harmonics. So this is actually exactly the
same; this is the comparison of coefficients. And as I said, later in the talk, it
will be clear why this can be beneficial.
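A hedged sketch of what the convolution interpretation buys you: the spherical harmonics coefficients of the three quantities become related by a simple product, so the comparison of coefficients reduces to a per-coefficient division. The constant c_n below depends on the spherical harmonics convention and is deliberately left unspecified.

```latex
% Spherical convolution theorem (schematically; c_n collects the
% convention-dependent normalization):
\[
  \mathring{S}_n^m(\omega) \,=\, c_n\, \mathring{D}_n^m(\omega)\,
      \mathring{G}_n^0(\omega)
  \quad\Longrightarrow\quad
  \mathring{D}_n^m(\omega) \,=\,
      \frac{\mathring{S}_n^m(\omega)}{c_n\,\mathring{G}_n^0(\omega)},
  \qquad \mathring{G}_n^0(\omega) \neq 0 .
\]
```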
So you have some type of space spectrum representation of the individual sound
fields, which can easily be solved for the coefficients of the driving function,
provided that there is no zero here, which is the case, for example, for
omnidirectional secondary sources. You can then compose the driving signal from the
coefficients. These are the basis functions, so you have to sum up over this
infinite amount of basis functions, but you don't need to go to infinity because this
series converges above a specific order, so you don't need to wait forever until the
computer has computed the result.
And as theoretically predicted, the solution is exact.
Now, using simulations to illustrate what it looks like: this is the top view on the
horizontal plane. This is the secondary source distribution, and it is driven
such that it synthesizes a plane wave that moves downwards on the screen.
We usually use a plane wave to illustrate the results because it has very
simple, well-defined properties -- plane wave fronts and also equal amplitude
anywhere in space -- so you can easily detect if something goes wrong.
And now on the next slide, I will present the view from the right onto the plane
that is perpendicular to the screen. This is the horizontal
plane, by the way. You see that indeed, in the entire volume, the sound field is
properly synthesized.
So it is a bit unfortunate that only the result for the sphere is
useful, because many other -- I'm sorry, I completely forgot about that.
If you look at the classical literature on ambisonics, you find a very
similar formulation. Only in that case, they don't assume a continuous distribution of
secondary sources but a discrete setup of loudspeakers. So instead of an integral,
you have a summation here over upper-case J loudspeakers, which is then expanded
into spherical harmonics, and the coefficients are compared. So if you compare this
to the solution I have proposed, you see that it's exactly the same.
So the first result is that ambisonics is a special case of the explicit single-layer
potential solution. More explicitly, it is the discrete formulation for the spherical
geometry.
The pity about the Fredholm theorem is that we want to use other geometries
than the sphere, for example a circular distribution. In this case, it is located in
the horizontal plane. Now we see already that the prerequisites for
the [inaudible] solution are not fulfilled, because the secondary source distribution
does not enclose the target volume. So we have to expect specific limitations.
But the way we found the solution for the sphere is also helpful here. In this
case, the synthesis equation represents a circular integral over the [inaudible].
And in that case, you can interpret it as a convolution along the
circle and relate the Fourier series expansion coefficients, which are the
coefficients of a different type of orthogonal expansion of the sound field. And
then again, you can solve it for the driving signal. And if there's no zero here,
you can reconstruct the driving function. These are the basis
functions in this case.
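The circular case has the same structure with Fourier series coefficients in place of spherical harmonics coefficients; again a hedged sketch, with the constant in front depending on how the contour integral is written (2πR if the arc length of a circle of radius R is integrated over).

```latex
% Circular convolution theorem and resulting driving function coefficients
% (coefficients referenced to the center of the array):
\[
  \mathring{S}_m(\omega) \,=\, 2\pi R\,\mathring{D}_m(\omega)\,
      \mathring{G}_m(\omega)
  \quad\Longrightarrow\quad
  \mathring{D}_m(\omega) \,=\,
      \frac{\mathring{S}_m(\omega)}{2\pi R\,\mathring{G}_m(\omega)},
  \qquad
  D(\alpha_0,\omega) \,=\, \sum_{m=-M}^{M} \mathring{D}_m(\omega)\,
      e^{\,\mathrm{i} m \alpha_0}.
\]
```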
Though we find out -- you see it already here, there's the radius R in the
equations -- that you have to reference the synthesis to the center of the setup. That
means there is only one single point, in the center of the setup, where in the
general case the sound field will be correct. Anywhere else, it is different
from the desired sound field.
So this looks, for example, for the plane wave like this. Again, it's a top
view on the horizontal plane. You have the secondary source distribution
indicated by the black circle, and you see that the plane wave exhibits plane wave
fronts, or at least straight wave fronts, in the horizontal plane, but it also exhibits a
specific amplitude decay, which is very characteristic for this type of synthesis,
which is termed -- I go back -- two-and-a-half-dimensional synthesis, because you
have secondary sources with a three-dimensional transfer function but you aim at
synthesis in a plane, and since it's something in between a two-dimensional
and a three-dimensional problem, you can term it a two-and-a-half-dimensional
problem.
And again, if we look from the right side onto the plane that is perpendicular to
the screen, it looks very different. This is again the secondary source
distribution. The plane wave inside the horizontal plane travels to the left, but
outside of the horizontal plane, of course, the sound field is very different from
what we want to have.
And another quick last example: this is for a linear secondary source distribution
situated along the x axis. You can find another convolution theorem which
relates the quantities in the wavenumber domain. And again, you can solve this for
the driving signal, and again you have to reference it to a line parallel to the
secondary source distribution, which is then the only location where it's correct.
Again, it's a two-and-a-half-dimensional synthesis, and the simulated sound fields
look like this. This is the secondary source distribution, and again we have this
characteristic amplitude decay. And we can also look at the plane that is
perpendicular to the screen, and you see again that outside of the horizontal plane,
which is indicated here, the sound field is different from what we want to have.
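The linear case reads analogously after a spatial Fourier transform along the array; a sketch of the wavenumber-domain relation, with the quantities referenced to a line y = y_ref parallel to the array as described above.

```latex
% Wavenumber-domain (k_x) relation for a linear secondary source
% distribution along the x axis; y_ref is the reference line on which the
% synthesis is exact.
\[
  \tilde{S}(k_x, y_{\mathrm{ref}}, \omega) \,=\,
      \tilde{D}(k_x,\omega)\,\tilde{G}(k_x, y_{\mathrm{ref}}, \omega)
  \quad\Longrightarrow\quad
  \tilde{D}(k_x,\omega) \,=\,
      \frac{\tilde{S}(k_x, y_{\mathrm{ref}}, \omega)}
           {\tilde{G}(k_x, y_{\mathrm{ref}}, \omega)} .
\]
```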
So this is a summary of the geometries for which solutions have been found. The
sphere is the classical problem for which many authors, some of whom are listed
here, found a solution. They came from different directions, but in the end, of
course, they found the same solution. And for the other geometries, we have
proposed solutions.
Excuse me.
So that was the explicit solution of the synthesis equation. You can also find an
implicit solution, and I will tell you in a minute what that is.
This is actually provided by the approach of wave field synthesis.
For wave field synthesis, you can start from two
different directions: you can either start with the Rayleigh integral or the
Kirchhoff-Helmholtz integral. I will quickly summarize how it can be derived from
the Kirchhoff-Helmholtz integral.
The Kirchhoff-Helmholtz integral represents a physical law. It represents the fact
that if you consider a volume, in this case the yellow area, which is source-free,
and if you know the sound field and the gradient of the sound field perpendicular to
the boundary, then you know exactly what sound field is present inside the
volume.
On the other hand, you can reinterpret this and state that if you
can control the sound pressure and the gradient of the sound field on the boundary
of a specific volume, you have full control over the sound field inside this volume,
inside this boundary.
In order to arrive at a practical solution, you have to apply a number
of approximations, which then yield a very simple driving function that can be
implemented very efficiently. It is then only a high-frequency approximation
of the correct solution, but this is not a problem, as I will show in the following
slides. Sorry, one slide in between.
This is the mathematical formulation. Actually, this is a little bit simplified, but
it represents what I want to say: that we find the driving function
without solving the entire equation. From the physical relationships between the
sound field on the boundary and the sound field in the volume, we can calculate
the driving function.
I want to emphasize that this S and this S are the same. So if we know the
sound field, we have to take some gradient and we have to do some selection of
the secondary sources. We cannot use them all; we have to carefully select
which ones we use. We can calculate the driving function without solving this
equation. This is why we call it an implicit solution. And just for illustration, the
secondary source selection is a consequence of one of the
approximations you apply, [inaudible] the fact that you use only those secondary
sources which are illuminated by the virtual sound field.
So assume such a secondary source distribution and you want to synthesize a
plane wave that travels to the right: you use only those secondary sources which
are illuminated by the virtual sound field. And for the point source, this is similar.
This is kind of intuitive, but actually, if you analyze the exact solution, you will find
that the other secondary sources on the wrong side of the array also
contribute -- of course in a very minor way, but physically they contribute to the
synthesized sound field, although it travels into the other direction.
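The gradient plus secondary source selection just described can be written compactly; what follows is a sketch of a Rayleigh-type WFS driving function, not a formula from the talk, and the 2.5D variant used with circular or linear arrays carries an additional frequency- and geometry-dependent correction factor that is omitted here.

```latex
% WFS driving function: directional gradient of the desired field S at the
% secondary source position x_0, with a selection window w that switches
% off the secondary sources not illuminated by the virtual sound field.
\[
  D(\mathbf{x}_0,\omega) \,=\, -2\, w(\mathbf{x}_0)\,
      \frac{\partial S(\mathbf{x},\omega)}{\partial \mathbf{n}(\mathbf{x}_0)}
      \bigg|_{\mathbf{x}=\mathbf{x}_0},
  \qquad
  w(\mathbf{x}_0) =
  \begin{cases}
    1 & \text{secondary source illuminated,}\\
    0 & \text{otherwise.}
  \end{cases}
\]
```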
Okay. Now, the illustration of the accuracy. On the left-hand side, you see the
exact solution, and on the right-hand side, the WFS solution for a
spherical secondary source distribution. This is the illuminated area, the
secondary sources that are active, and these are the secondary sources which
are inactive. And you see, at very low frequencies -- in this case, this is 200 hertz --
you see some minor deviations of the WFS solution compared to the exact
solution. But, and this is hard to prove, this is most certainly inaudible. So
these sound very much the same. And if you go to a higher frequency then -- yes?
>>: [Inaudible] you are using half of the points?
>> Jens Ahrens: Kind of. You apply two
approximations in this case, and the consequence of one of them is that you use
only half of the secondary sources. If you would only use half of the secondary
sources here, this would become more similar to that one, but not
exactly the same. Is that okay for the moment?
>>: [Inaudible] two approximations [inaudible] some other approximation.
>> Jens Ahrens: Exactly. And as I said, that is the high-frequency approximation
that is applied in WFS. So if you go to higher frequencies, if the wavelength is
significantly shorter than the dimensions of the secondary source distribution,
there's hardly any difference. And so I go back. As I said, this is probably -- most
probably -- not audible.
So to summarize the treatment of the continuous secondary source distributions:
we have an implicit and an explicit solution. The explicit solution is obviously
found by explicitly solving the equation, and the implicit solution is found without
solving the equation, by considering some physical relationships. And the
physical accuracy of the solutions is, at least when it comes to sound field
synthesis, equivalent.
So far so good.
The problem is just that in theory, okay, we have such continuous distributions,
but in practice, we have something like this: a discrete setup of a finite number of
secondary sources. So in order to mathematically grasp the impact
of this spatial discretization, we do not assume a discrete secondary source
distribution, but we assume a continuous distribution that is excited at discrete
points, because then all the mathematical relationships we have established
between the different quantities stay valid, and we just need to find a new
driving function that equals the continuous driving function at the sampling points
and vanishes elsewhere. This is exactly what is done in the analysis
of, for example, the discretization of a time-continuous signal.
And in order to emphasize the relationship to this time-domain sampling, I want to
quickly review the process.
Consider a continuous time domain signal [inaudible]. In this case it's purely
[inaudible], so it might have a magnitude spectrum like this. Typically there is
some anti-aliasing filter, which is important in order to ensure that the signal is
band limited; in this case it doesn't change much.
If you then discretize the signal in time with a specific
constant interval, you will get repetitions in the spectrum of the signal. And in
order to reconstruct the signal, you apply a low-pass filter in order to extract
the baseband from these repetitions, and the result, if
all parameters are selected accordingly, is that the reconstructed signal is
equal to the initial continuous-time signal. And this is very similar in spatial
discretization.
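Since the whole discretization argument rests on this analogy, here is a minimal numpy sketch, an illustration of my own rather than anything from the talk, that reproduces the time-domain picture: once a band-limited signal is sampled, copies of its baseband spectrum appear at multiples of the sampling rate.

```python
import numpy as np

fs_dense = 48000          # "continuous" signal approximated on a dense grid
fs_coarse = 4000          # coarse sampling rate used to provoke repetitions
t = np.arange(0, 0.25, 1 / fs_dense)

# Band-limited test signal (two tones well below fs_coarse / 2).
x = np.sin(2 * np.pi * 400 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)

# Sampling = multiplying with a Dirac comb on the dense grid.
comb = np.zeros_like(x)
comb[:: fs_dense // fs_coarse] = 1.0
x_sampled = x * comb

# Magnitude spectra: the sampled signal shows copies of the baseband
# spectrum repeated at multiples of fs_coarse.
f = np.fft.rfftfreq(len(t), 1 / fs_dense)
X = np.abs(np.fft.rfft(x))
X_sampled = np.abs(np.fft.rfft(x_sampled))

for k in (1, 2, 3):
    # Energy around the k-th repetition of the 400 Hz tone.
    idx = np.argmin(np.abs(f - (k * fs_coarse + 400)))
    print(f"repetition {k}: |X|={X[idx]:.1f}  |X_sampled|={X_sampled[idx]:.1f}")

# Reconstruction would low-pass the sampled signal below fs_coarse / 2
# to keep only the baseband, exactly as described above.
```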
So here, if we discretize a time domain signal, we get repetitions in the temporal
frequency domain. In spatial discretization, if we assume a constant sampling
interval, we get repetitions in a representation of the space spectrum;
in which representation these repetitions arise depends on
the geometry of the secondary source distribution.
So for the spherical distribution, we have repetitions in the coefficients of the
spherical harmonics expansion. For the circle, we have the repetitions in the
Fourier series expansion coefficients, and for the plane or the linear source, we
have repetitions in k-space.
So this should already ring a bell. Remember, for the sphere, we
found the solution in the spherical harmonics domain. Here we found it in the Fourier
series coefficients domain. And the solution for the linear distribution
we found in the wavenumber domain.
So we have already calculated everything we need. We just need to analyze it a
bit more. Now we'll use the example --
>>: So yeah, this is [inaudible].
>> Jens Ahrens: No. This is always the case.
I will analyze the artifacts later in the talk, but if you have placed the
loudspeakers closer than half the wavelength, you will hardly
have any impact, hardly have any difference between the continuous distribution
and the discrete distribution. But these spectral repetitions always arise.
It's just such that the spatial transfer function of the secondary sources, for
example, if using [inaudible], suppresses these repetitions, because
when you have a small sampling interval, when the loudspeakers are close
together, these repetitions are at very, very high space frequencies, and there
they are suppressed by the spatial transfer function of the secondary sources.
So you just don't realize that you have these repetitions there.
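The half-wavelength rule quoted here corresponds to the usual rule-of-thumb estimate for the frequency above which the repetitions start to matter:

```latex
% Rule-of-thumb spatial aliasing frequency for a loudspeaker spacing
% \Delta x_0 (repetitions negligible below f_al):
\[
  f_{\mathrm{al}} \,\approx\, \frac{c}{2\,\Delta x_0}
\]
% e.g. \Delta x_0 = 0.17 m, as in the laboratory array shown earlier,
% gives f_al of roughly 1 kHz.
```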
>>: [Inaudible].
>>: Try saying that again.
>>: Is there an assumption or does it get messier if you have nonuniform
[inaudible]?
>> Jens Ahrens: It gets significantly messier, because then you cannot identify
these repetitions. Something very [inaudible] happens then.
We have not analyzed it explicitly, but even this analysis is quite complicated.
This is the simplest case that we analyzed. If you analyze more
advanced cases, it's really difficult to deduce any findings.
So I want to illustrate this process of sampling and the spectral repetitions on
the example of the circular distribution. Remember, the driving function is a
Fourier series; at least these are the basis functions and these are the coefficients
[inaudible]. Again, you don't need to go from minus infinity
to infinity because of the convergence of this series.
So the magnitude of the driving function may look like this. It has some
triangular shape. On the horizontal axis you have the
order m; in the center is zero. And this is the time frequency.
Why it is such is not significant for the moment.
So this is what it looks like for a continuous driving
function. If you sample this driving function, these spectral repetitions arise.
>>: Excuse me. What is again the X axis?
>> Jens Ahrens: This is [inaudible] -- sorry. This is the order m, the index of the
Fourier series, which you can interpret as some type of space frequency.
So if you sample this continuous driving function, you get these repetitions. And
since the spatial bandwidth of this
driving function is not limited, the higher you go in frequency, the further they extend
to the sides. And the repetitions can overlap and interfere and cause [inaudible]
spatial aliasing.
And this is what we propose to term a spatially full-band driving function,
because, of course -- I go back once more -- you don't need to go from a very high
negative order to a very high positive order. You can consciously reduce the spatial
bandwidth of the driving function just by choosing narrower summation
limits, and then the continuous driving function may look like this. We just
decided here to sum only from minus 26 or minus 27 to plus 27. If you
then sample this driving function, the repetitions will not overlap and there will be
no interference, and the baseband will be uncorrupted.
This has of course significant impact on the synthesized sound field that I will
illustrate in a minute.
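To make the order-truncation argument concrete, here is a small numpy sketch, my own illustration with made-up coefficients rather than an actual driving function: a band-limited circular driving function of order M is sampled at N loudspeaker positions; because the repetitions are spaced N orders apart and 2M + 1 <= N, the baseband coefficients are recovered without corruption.

```python
import numpy as np

N = 56                      # number of loudspeakers on the circle
M = 27                      # spatial bandwidth: orders -M..M are kept

rng = np.random.default_rng(0)
orders = np.arange(-M, M + 1)
# Made-up coefficients of a band-limited continuous driving function
# (triangular-ish magnitude, random phases).
d = (1.0 - np.abs(orders) / (M + 1)) * np.exp(1j * rng.uniform(0, 2 * np.pi, orders.size))

def driving(alpha):
    """Continuous, band-limited driving function D(alpha)."""
    return np.sum(d[:, None] * np.exp(1j * orders[:, None] * alpha[None, :]), axis=0)

# Sample the driving function at the N loudspeaker positions.
alpha_l = 2 * np.pi * np.arange(N) / N
samples = driving(alpha_l)

# Fourier coefficients recovered from the samples (one period of the
# repeated spectrum). Repetitions sit N orders apart; since 2*M + 1 <= N
# they do not fold back onto the baseband.
d_hat = np.fft.fft(samples) / N                      # orders 0..N-1 (mod N)
d_hat = np.fft.fftshift(d_hat)                       # orders -N/2..N/2-1
recovered = d_hat[N // 2 - M : N // 2 + M + 1]       # baseband -M..M

print("max baseband error:", np.max(np.abs(recovered - d)))   # ~1e-15
```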
But before that, I want to say that if you analyze what happens in wave field
synthesis, you realize that it's a spatially full-band approach -- the first one,
this one, this is what happens in wave field synthesis. And if you analyze
ambisonics, then it's a narrowband approach, and this is again this case.
So we have one conclusion: in general, WFS constitutes a high-frequency
approximation -- which is probably irrelevant -- of near-field compensated
infinite order ambisonics.
Usually this is termed higher order ambisonics, and this term higher order refers
to the fact that -- go back again -- you stop at a certain order. They just chose
this terminology, although, retrospectively, it's not very useful.
So if you would go in ambisonics to infinite order, you would have approximately
the same. But the fact that WFS is spatially full-band and ambisonics is
narrowband has a very essential impact on the synthesized sound field.
At low frequencies, you hardly see any difference. By the way, this example is
equivalent to 27th-order ambisonics, so from minus 27 to plus 27. At a frequency of
one kilohertz, you hardly see any difference. If you go to two kilohertz, you
already see in the full-band example that a portion of the target area gets
corrupted by artifacts commonly referred to as spatial aliasing. And in the
ambisonics synthesis, something else happens: the sound field gets smaller
and the energy concentrates around the center of the array. If you go
even higher in frequency, you see that at a certain point, in WFS, the entire target
area is filled with artifacts superposed on the desired sound field, and in the
ambisonics example, the desired sound field gets very small and outside you
have significant artifacts.
This is only half of the truth, because you can interpret this monochromatic
analysis as some kind of frequency -- yes?
>>: So when you [inaudible] in the [inaudible] domain, what unit is this?
>> Jens Ahrens: It has no unit. I go back again. The order of 27 means that we
choose the bounds of this summation as minus 27 and plus 27.
>>: Oh, so it's more the size of the [inaudible].
>> Jens Ahrens: Yes and no. It represents the spatial bandwidth. I go a bit
forward again. You could also choose an order of ten;
then you would make this even narrower. So if you then sample it, the
repetitions themselves would be a bit narrower. But in a practical
implementation, it means that -- I want to go back again.
Each mode is a filtering operation. And at the end, you sum the filtered
signals according to this formulation. So if you reduce the order, you have to do
less filtering and fewer summations, which is a bit more efficient, but has an essential
impact [inaudible].
Does that answer your question?
>>: Yes.
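As an illustration of the remark that each mode is a filtering operation, here is a rough frequency-domain sketch of how such a narrowband renderer could be structured. The per-order filters below are placeholders, not the actual near-field compensation filters, which depend on the secondary source model and the array radius; only the structure, one filter per order followed by a weighted sum over orders for each loudspeaker, is the point.

```python
import numpy as np

def render_driving_signals(S, freqs, n_speakers=56, order=27, radius=1.5, c=343.0):
    """Frequency-domain sketch of modal (order-by-order) rendering.

    S     : spectrum of the virtual source signal, shape (len(freqs),)
    freqs : frequency bins in Hz
    Returns driving spectra with shape (n_speakers, len(freqs)).
    """
    alpha = 2 * np.pi * np.arange(n_speakers) / n_speakers   # loudspeaker angles
    k = 2 * np.pi * np.asarray(freqs) / c                    # wavenumbers
    D = np.zeros((n_speakers, len(freqs)), dtype=complex)

    for m in range(-order, order + 1):
        # Placeholder per-order filter (NOT the real near-field compensated one).
        filt_m = 1.0 / (1.0 + np.abs(m) / (k * radius + 1e-9))
        mode_signal = filt_m * S                             # filter the source signal
        # Distribute mode m to the loudspeakers with its angular weight.
        D += np.outer(np.exp(1j * m * alpha), mode_signal) / n_speakers
    return D

# Usage sketch: a flat source spectrum rendered at a few frequency bins.
# Reducing `order` means fewer filters and summations, as described above.
freqs = np.linspace(100, 2000, 8)
D = render_driving_signals(np.ones(len(freqs)), freqs)
print(D.shape)   # (56, 8)
```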
>> Jens Ahrens: So as I said, these monochromatic considerations --
>>: [Inaudible].
>> Jens Ahrens: I'm sorry. I forgot [inaudible]. This is the example that I
showed in the panoramic image, that one. These are the parameters. It is exactly
like in this example, in this image.
So in this case we looked at individual frequencies and investigated what
happens depending on the spatial bandwidth. But what is more essential for
the ear is the time domain structure of the sound field, and we have also made
some simulations. I start with wave field synthesis. You will now see the
impulse response of the loudspeaker system when it is driven in order to
synthesize a plane wave. So that means, once again, a plane wave that moves
downwards, and if you do that with a discrete array, it will look like this. In the
beginning, there's not much to see, but then you realize that in the wave field
synthesis case, you have indeed a very strong first wave front, and this is
actually what we want. But afterwards, you have a lot of different wave fronts,
[inaudible] which is spatial aliasing, which move into various directions. And in
the ambisonics example, it will look like this.
In the beginning it all looks very different, but then you realize that something
else goes on. In the center, we have this area where there's hardly any
corruption, where we have indeed the desired impulse-like wave front. But then
we have some low-frequency wave front around here, followed by high-frequency
wave fronts.
So here, a separation between low and high frequencies takes place. I
go back a bit in the slides -- you can also see this on this image. Here
you have the desired sound field superimposed with artifacts. Here,
the synthesized sound field exhibits a specific curvature at high frequencies
which is not there if you go to lower frequencies, like in this case, at low
frequencies [inaudible], and this is also apparent in the time domain analysis.
And this is perceptually a big difference. Here we have the
strong first wave front, and here we have a separation of low-frequency and
high-frequency energy.
So I see that it took me a bit longer than I initially planned, so I will have to
accelerate a bit.
And I want to illustrate the essential psycho-acoustical mechanisms on the
example of two sound sources emitting sufficiently coherent signals, for example
two loudspeakers emitting two signals. Yes?
>>: Is there any [inaudible] -- [inaudible]. Is there any advantage to [inaudible]?
>> Jens Ahrens: Yes. You could choose a different arrangement of the
loudspeakers, like the [inaudible] space, [inaudible] space microphone array, but
then you can reduce the spatial aliasing only for one specific
propagation direction of the plane wave.
If you want to synthesize a different propagation direction, then you would have
stronger artifacts than with uniform sampling. So let's maybe use this
example. You could choose to put the loudspeakers closer together at the point
where the virtual wave touches the secondary source distribution and
choose a wider spacing here. Then you could raise the aliasing frequency, the
frequency where the artifacts arise, but this would hold only if the plane wave
propagates downwards. If you want to synthesize a plane
wave that propagates in that direction, you have more aliasing than with the
uniform sampling.
So back to the psycho-acoustical mechanisms which might be relevant here.
There's one mechanism which is termed summing localization,
which happens, for example, when you listen to stereophony. Two
loudspeakers are playing, but you hear only one sound source
in the center, so the ear kind of combines the two signals and you hear some
average. You don't hear separate sounds from left and right; you hear them in the
center if the two loudspeakers are playing coherent signals. And if you delay one
loudspeaker by less than a millisecond, then the source moves towards the other
loudspeaker.
Then, if the time delay between the two wave fronts is larger than approximately a
millisecond, the precedence effect takes over. You will hear the sound
source exclusively in the earlier loudspeaker, so that only the early loudspeaker
determines localization, and the signal of the second loudspeaker will not
consciously be perceived; it just adds a bit to the spatial impression. And if you
have an even stronger separation between the two wave fronts in time, you can
perceive an echo.
And what happens here -- I'll go back -- is probably a mixture of everything.
You have a strong wave front followed by closely spaced wave fronts whose
spacing is much closer than one millisecond. So this could trigger summing
localization -- or might not; it's very difficult to say. And what happens here is very
hard to interpret.
We have made many informal and also some formal experiments, which
we have also published. We have measured tens of thousands of impulse
responses, of head-related impulse responses, from the system, in
order to simulate the loudspeaker system binaurally over headphones, in
order to have a controlled environment where you can seamlessly switch
between different listening positions and different methods, including head
tracking and everything. You can download some samples from our spatial
audio blog which contain the full binaural information, so that you can get an
idea of what it sounds like.
It's very hard to summarize our findings so far, but the only thing I
want to emphasize is that it's very difficult to find out what happens, because if
you think you have found a basic mechanism in one situation, in a different
situation, it might be completely different.
So I think since the time has progressed so much, I will skip -- I'm sorry.
>>: [Inaudible].
>> Jens Ahrens: Sorry. I was aiming at half past 11:00. So I can -- I'm sorry.
I'm a bit confused. So I can take it easy. Very good.
So I can summarize on the spatial discretization artifacts. You saw that at
low frequencies, even the discrete secondary source distributions
all perform comparably. But there is a big difference in the performance at high
frequencies, and what properties arise in the synthetic sound fields depends
essentially on the spatial bandwidth of the driving function. And I have put my
Ph.D. thesis here as a reference because I see this as the most significant result
of my work so far.
And by the way, the spectral repetitions and all these findings also hold
for numerical approaches, because again, it's still a discrete array of
loudspeakers, and the repetitions in the spatial spectrum
of the driving function arise anyway, no matter whether you have found the
solution numerically or whether you have used an explicit solution or an implicit
solution. The repetitions are always there.
And I also showed that a manipulation of the spatial bandwidth can change the
properties of the artifacts. I go back again to remind you: here you have
artifacts that are superimposed on the desired sound field, and here you can
separate the artifacts and the desired sound field.
And there are more possibilities. We have proposed a number of
methods that we decided to term local sound field synthesis, because you can -- I
go back again -- move this spot where the sound field is
accurately synthesized to different locations. And since this represents a local
increase of the accuracy, of course at the cost of more significant artifacts
elsewhere, we decided to term it local sound field synthesis.
So I think that was enough for the moment for the analysis of the basic
properties. We can move on to some application examples.
The plane wave example I showed is of course not very impressive in practice
because it's a very simple sound field. You can do much more.
You can, for example, have focused sources. I can show you quickly what a focused
source is. This is a focused source. We have a secondary source distribution
down here -- a linear one in this case -- and it synthesizes a sound field that
converges towards a focus point. This is where the sound field focuses, and this
is the area where it converges. It passes the focus point and diverges again.
And in this part of the target plane, it looks very similar to the sound field of a
monopole sound source that is located here. So if a listener is positioned
here, he will hear a sound source in front of the loudspeakers. This
actually works in practice. It exhibits some essential limitations but, still, it can
work.
And this is a very essential property of sound field synthesis, because if you think
about large venues like a cinema: in stereophony, the problem is that you cannot
create auditory events that are closer than the loudspeakers.
So if you want to, for example, synthesize a soundscape, replay a soundscape
like rain over stereophony, it always sounds like you're sitting in a dry
bubble and the rain is happening somewhere in the distance or around you, even
if it's a surround system. But you can use, for example, these focused sources
to place individual raindrops in the audience, and this essentially
increases the immersion, because you really feel like being inside the scene.
So one approach to synthesize --
>>: [Inaudible] question here [inaudible].
>>: [Inaudible]. [Laughter].
>> Jens Ahrens: I'm afraid I cannot. Not acoustically.
>>: [Laughter].
>> Jens Ahrens: So one way to find the driving function for focused sources is
to start with a point source, a purely diverging sound field indicated by the arrows,
and then reverse the timing of the loudspeakers.
In this case, the loudspeaker in the center plays first, and then, gradually
towards the sides, the other loudspeakers play.
If you now make the loudspeakers on the far ends play first and the loudspeaker
in the center play last, you will indeed get this sound field that converges towards
the focus point and then diverges again and makes up a focused source.
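A minimal sketch of this time-reversal idea for a linear array, my own delay-and-sum style illustration; a real WFS focused-source driving function additionally includes amplitude weights and a pre-equalization filter.

```python
import numpy as np

c = 343.0                                  # speed of sound in m/s
dx = 0.2                                   # loudspeaker spacing in m
x_speakers = np.arange(-2.0, 2.0 + dx, dx) # linear array along the x axis
focus = np.array([0.0, 1.0])               # focus point 1 m in front of the array

# Distance from every loudspeaker to the focus point.
dist = np.hypot(x_speakers - focus[0], focus[1])

# Point source behind the array: the centre loudspeaker (closest) fires first.
delays_point = (dist - dist.min()) / c

# Focused source: reverse the timing, so the outermost loudspeakers fire
# first and the wave fronts converge in the focus point before diverging.
delays_focused = (dist.max() - dist) / c

for x, dp, df in zip(x_speakers[::5], delays_point[::5], delays_focused[::5]):
    print(f"x = {x:+.1f} m   point: {dp*1e3:5.2f} ms   focused: {df*1e3:5.2f} ms")
```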
We have proposed a number of different formulations of this, which are
based on the manipulation of the desired sound field in a specific [inaudible]
domain like a plane wave decomposition. You decompose the sound field of a
monopole into plane waves, and then recompose it by taking only
one section of plane waves, those which propagate into the target area,
and the result is then a focused source.
I will not go into details because -- yes?
>>: [Inaudible] reconciled this [inaudible] at that point?
>> Jens Ahrens: No. I have to move back to that in order to show you
once again. I have prepared that.
Focused sources have very specific properties in terms of the aliasing artifacts,
which is shown here.
This is again a linear array. The secondary sources are marked by
black crosses, which you probably cannot see, but they are around there with a
spacing of 20 centimeters. For a focused source one meter in front of the
array, at a low frequency of one kilohertz, it all looks fine.
If you move on to a higher frequency, you see that around the focused source,
always, in any case, no matter whether it's spatially full-band or narrowband,
a region arises where there are no artifacts. The artifacts are elsewhere,
and this region gets smaller if you go higher in frequency.
So this is very different. And this can be beneficial if you're located
here, but it can be problematic if you're located there, because in the case of
focused sources, it is not such that the desired wave front
is followed by the artifacts; rather, the artifacts occur before the desired wave front.
And this obviously triggers very different perceptual mechanisms.
>>: [Inaudible]? Does using loudspeakers really change any of [inaudible]?
>> Jens Ahrens: Not in terms of the aliasing, no. If your [inaudible] loudspeaker
has a specific, more complex directivity towards higher frequencies, that has
some impact on the synthesized sound field, but it's not very
essential. It changes the amplitude distribution a little bit, but there's no essential
impact on the structure of the wave fronts, which is the more important thing.
>>: [Inaudible].
>> Jens Ahrens: You mean if the loudspeaker has a considerable spatial extent?
>>: Well, I mean, one, two meters --
>> Jens Ahrens: That is the spacing between the loudspeakers [inaudible]?
>>: [Inaudible] the size of the plane that the loudspeaker is driving is not
considerably smaller than [inaudible].
>> Jens Ahrens: At low frequencies, it's not. Again, back to the photograph, I'm
sorry.
So for these loudspeakers, the spacing is about 17 centimeters. The
loudspeakers we chose are significantly smaller, but they are two-way
loudspeakers where you have a midrange driver and a high-frequency driver.
And when the large driver is active, this is in a rather low frequency range
where there are hardly any artifacts anyway. And in the high frequency range,
where the considerable artifacts arise, we have the very small driver, which is
active, so it won't change much.
>>: [Inaudible] surface.
>> Jens Ahrens: Yeah. It's hard to say where the transition between
the two is, but the problem is that at high frequencies, it is very difficult
to measure the directivity of a loudspeaker with high spatial resolution. So it's
hard to say what happens.
So that was the focused source. And of course, you don't need to assume an
omnidirectional point source or focused source. You can also describe
sources with complex directivities.
An example is this one. The problem is that in
wave field synthesis, you have this strong first wave front that I showed you,
which on the one hand causes actually very accurate localization. This is often
praised in scientific papers. That's fine.
But in some cases, it can be a problem because the locatedness of a sound
source is very large. That means the amount to which the energy is focused to a
point is very high, so you hear a very small source.
Imagine a venue like this room equipped with a [inaudible] loudspeaker system;
you have three or four sound sources. You have a lot of space
in between the sound sources where there's nothing. So you would want to
control the perceived extent, the apparent source width, of the sound sources.
Some approaches that have been proposed in the context of stereophony
and such methods split a given sound source into several sources which
are put at different locations, and those individual sources are then driven with
decorrelated signals. This can sound extremely good. I heard a demonstration by
Ville Pulkki at last year's AES spatial audio conference in
Tokyo. It was really, really impressive.
The only problem is that you cannot control, for example, the orientation of
a sound source, so there is some room for improvement.
So we have recently proposed to model a virtual
sound source that vibrates in higher spatial modes. Like, for example, the string of
a guitar: it doesn't only vibrate in the very lowest mode like this, but there are also
some partials, some higher modes, and you can do the same with
the spatial vibration.
Then a very complex sound field evolves which lowers the coherence
between the signals at the two ears of the listener, and this can
increase the perceived extent.
I have an example, which will be a two-channel audio signal whose
channels contain the signals captured by two
virtual omnidirectional microphones put in the sound field at these positions,
approximately at the distance of the ears.
It will not sound exactly like what
you would hear, but you will at least get an idea of what it could sound like.
So first, the input signal. (Speaking German.)
If you are sitting in the center, you hear a sound source in between the two
loudspeakers, which is --
>>: [Inaudible].
>>: -- which is rather flawed. So maybe -- maybe you want to move here. And
the best is of course always to close your eyes. So I play again. (Speaking
German.)
And then a simulated source of half a meter of size, of length, a plane, a plate of
this size. (Speaking German.)
It's also the room, but even if you listen over headphones, you hear that the sound
gets indeed broader. Maybe -- you don't look so convinced, so maybe the
loudspeakers are too high or too far apart.
And also, there are many parameters that you can tune in this example, and
the parameters we chose are certainly far from optimal.
I have one last example. In many cases, you will, of course, want to
have a sound source that moves. In real life, a moving sound source
exhibits the Doppler effect. That is a consequence of the fact that the
speed of sound in air is constant irrespective of the motion of the source.
So if the source moves towards you, the pitch is raised, and if it moves away from
you, the pitch is lowered.
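For reference, the textbook frequency shift for a source moving at speed v past a stationary listener in still air is:

```latex
% Doppler shift of a moving source, stationary listener and medium:
\[
  f_{\text{observed}} \,=\, \frac{f_{\text{source}}}{1 - \frac{v}{c}\cos\theta}
\]
% v: source speed, c: speed of sound, \theta: angle between the source
% velocity and the line from source to listener. Approaching source
% (\cos\theta > 0): pitch raised; receding source: pitch lowered.
```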
In many applications you will not want that, because if you think, for example,
about a teleconferencing scenario, you have people talking and
you move them with some multi-touch or other interface, and if you move the
persons closer, their pitch will rise, and if you push them away, the
pitch will go down. This sounds very irritating, so you would want to suppress
the Doppler effect.
What you can do is synthesize a sequence of stationary positions and cross-fade
between them. Depending on how you choose the
parameters, on one hand this can suppress the Doppler effect, but on the other
hand, if you use very short intervals in the sequence, this can also create a
frequency shift.
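A minimal sketch of this cross-fading scheme, my own illustration of the general idea rather than the implementation used by the authors: render the source at a sequence of stationary positions with a hypothetical render() function and overlap-add the segments with complementary fades. Making the segments very short makes the repeated fading itself audible, which is the frequency-shift side effect mentioned above.

```python
import numpy as np

def crossfade_positions(render, positions, seg_len, fade_len, fs):
    """Overlap-add a moving source as a sequence of stationary renderings.

    render(pos, n) -> loudspeaker signals for a source fixed at `pos`,
                      shape (n_speakers, n); assumed to exist elsewhere.
    positions      -> list of source positions along the trajectory.
    seg_len, fade_len in samples; fs is only kept for documentation.
    """
    hop = seg_len - fade_len
    fade_in = np.linspace(0.0, 1.0, fade_len)
    window = np.concatenate([fade_in, np.ones(seg_len - 2 * fade_len), fade_in[::-1]])

    out = None
    for i, pos in enumerate(positions):
        seg = render(pos, seg_len) * window           # stationary rendering, faded
        if out is None:
            out = np.zeros((seg.shape[0], hop * len(positions) + fade_len))
        out[:, i * hop : i * hop + seg_len] += seg    # overlap-add
    return out

# Usage sketch with a dummy renderer (2 loudspeakers, low-level noise).
dummy = lambda pos, n: np.random.randn(2, n) * 0.1
sig = crossfade_positions(dummy, positions=[(x, 2.0) for x in np.linspace(-3, 3, 20)],
                          seg_len=4800, fade_len=480, fs=48000)
print(sig.shape)
```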
We have instead proposed to derive the driving signal from a model of
the sound field of a moving source in order to synthesize the Doppler effect, and
this indeed works. I have -- sorry -- an animation of it.
So this would be the moving source, and it moves quite quickly; this is of
course slow motion. You see obviously how the sound waves are
compressed in the direction of motion and how they are expanded in the other
direction. And if you drive a loudspeaker system accordingly, then it will look
like this. The source moves along here, approximately. So in the direction of
motion, the sound waves are compressed, and in the opposite direction, they are
expanded.
And you can do much more. I think these are enough examples for the moment,
so before I conclude, I want to mention the software we wrote. We
termed it the SoundScape Renderer; we had to find a name for it. It implements
the most basic versions of most of these spatial audio rendering algorithms:
wave field synthesis, [inaudible], [inaudible], and various versions of
binaural rendering.
This is an example with a loudspeaker system. These symbols represent the
56-channel loudspeaker system, and then you have a lot of sound sources. And
if you listen using a binaural method, the loudspeaker system is replaced by
yourself. And there's an interface for head tracking and everything. You can
download the source code.
The software only has one flaw. I hope you don't mind. [Laughter].
>>: That's [inaudible]. [Laughter].
>> Jens Ahrens: I'm sorry. It was not my decision.
So I finally conclude.
The physical fundamentals of sound field synthesis can elegantly be described
using integral equations. Then we have identified ambisonics and wave field
synthesis as two special cases of the solution. The spatial bandwidth is essential
in terms of the properties of the synthesized sound field.
Remember the different structures of the wave front. Then I showed that various
types of virtual sources are possible, moving sources, sources with directivity,
focused sources and the like.
The only problem is that perception of synthetic sound fields is hardly known.
Now, I have one last slide with an outlook. I will start with a white slide
in order to increase the excitement. [Laughter].
I have been one of the very few fortunate persons in this
world who have access to all the different types of audio reproduction systems. I
have access to this 800-channel wave field synthesis system. We have this
56-channel system in our laboratory. We have a heap of individual loudspeakers
that we can arrange in various ways. I have travelled to many laboratories in
order to hear their systems.
And I think it is indeed such that sound field synthesis can be superior to other
audio presentation methods, especially for large venues or especially if you have
a clear reference of what something should sound like. If you listen to a pop
music recording, it won't tell you much because there's no acoustic, no
unamplified pop music, so the sound of pop music is very closely related to the
sound of stereophony.
But if you listen to classical music played over stereophony or traditional
ambisonics or any other loudspeaker arrangement with a low number of
loudspeakers, it sounds different from real classical music in the sense that
it doesn't contain as many details, both spatially and also with respect to the
[inaudible]. So some information is getting lost.
This can be better if you use sound field synthesis, because you have more
degrees of freedom; you can add more details. Still, it doesn't sound perfect, but
it can be superior. Not always; in many situations, stereophony and other methods
are superior. But certainly, in the current state, sound field synthesis is not the
ultimate solution. It's just not reasonable to employ dozens or maybe hundreds
of loudspeakers for this potential benefit.
On the other hand, the other audio presentation methods can most likely not
overcome their limitations. For example, in stereophony, you will always have a
sweet spot, especially if you serve multiple people. There's no way to
overcome this.
So neither is the ultimate solution. Probably we will experience a convergence
between the methods.
There's a small company called IOSONO in Eastern Germany which
commercially sells wave field synthesis systems, and they have also equipped,
for example, Mann's Chinese Theatre in Hollywood with a system. And they have
a new generation of systems where they significantly reduce the number of
loudspeakers. Initially they had a loudspeaker spacing of
maybe 15 centimeters or so, and now they use half a meter to a meter. So this
significantly reduces the number of loudspeakers.
This new generation of systems I heard twice, once in the laboratory, where it
sounded really great; then I heard it at the Tonmeistertagung, which is a
German meeting of tone masters, of balance engineers. So it was a different
room, though the same system. I was not really convinced, but still, this is
certainly an important step in the right direction.
And also, this convergence of methods could be something like a desktop audio
system where you put a number of small loudspeakers onto
the screen and then form beams that control the sound pressure at the ears
of the listener, or, as some of you in the speech technology group have heard,
this linear array where you do some quiet-zone synthesis and such.
So it's not clear where to go, but it's certainly such that among the solutions that
are around, there is no ultimate solution.
And of course, whatever you do, since we cannot achieve ultimate physical
accuracy, we have to consider how the ears work and then optimize the methods
accordingly.
Usually I end such long talks with an image that summarizes and represents the
contents, but I think in this case, I'll leave this slide up because it's certainly a
good inspiration for the discussion.
Thank you very much. [Applause].
>>: So I [inaudible] you mentioned there's this commercial [inaudible] Yamaha.
Commercial [inaudible]. [Inaudible]?
>> Jens Ahrens: No. It's closely related. I've been forming some kind of
optimized [inaudible] synthesis. If you use an array of loudspeakers for beam
forming, you will have some -- what was I -- yeah. We will have something like
this.
The timing of the individual loudspeakers causes individual wave fronts, which of
course will not be exactly like this or like that, but you will have a specific,
complex structure of wave fronts. And with a [inaudible] array, you could at first
assume a continuous array, then form the beams, and then analyze how the
discretization affects the result. So there is a very close relationship.
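To make the "continuous first, then discretize" idea concrete, here is a minimal delay-and-sum sketch, my own illustration rather than the speaker's method; the frequency, aperture, and steering angle are assumptions. The same beam is formed once with a dense, quasi-continuous line of monopole loudspeakers and once with a coarse one, so the effect of the discretization on the superposed wave fronts can be compared.

```python
# Minimal sketch: delay-and-sum beam steering with a linear array of monopole
# loudspeakers, evaluated on a line 3 m in front of the array, once with a
# dense ("quasi-continuous") array and once with a coarse, clearly discrete one.
import numpy as np

C = 343.0                     # speed of sound, m/s
F = 2000.0                    # analysis frequency, Hz (assumed)
K = 2 * np.pi * F / C         # wavenumber
STEER = np.radians(30.0)      # steering angle of the beam (assumed)

def field(n_speakers, aperture=2.0):
    """Magnitude of the superposed pressure on a line parallel to the array."""
    x_spk = np.linspace(-aperture / 2, aperture / 2, n_speakers)
    delays = x_spk * np.sin(STEER) / C            # delay-and-sum steering delays
    x_eval = np.linspace(-4.0, 4.0, 81)
    y_eval = 3.0
    p = np.zeros_like(x_eval, dtype=complex)
    for xs, tau in zip(x_spk, delays):
        r = np.hypot(x_eval - xs, y_eval)         # loudspeaker-to-point distance
        # each loudspeaker radiates a delayed spherical wave
        p += np.exp(-1j * (K * r + 2 * np.pi * F * tau)) / r
    return x_eval, np.abs(p) / n_speakers

x, dense = field(200)    # approximates a continuous distribution
_, coarse = field(8)     # coarse array of 8 loudspeakers
for xi, d, c in zip(x[::10], dense[::10], coarse[::10]):
    print(f"x = {xi:5.1f} m   dense {d:6.3f}   coarse {c:6.3f}")
```

With the coarse array the spacing exceeds half a wavelength at the chosen frequency, so grating lobes and a more irregular wave-front structure show up in the printed field values, which is exactly the discretization effect being discussed.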
>>: So [inaudible] mathematically [inaudible] just to see the impact, but in
practice, is this even desirable, or what are the kinds of things that you might
have to generate?
>> Jens Ahrens: [Inaudible] wave can be useful, for example, in the creation of
reverberation, because you certainly don't need to accurately measure the
reverberation of a room in order to mimic it with a loudspeaker system; you
can use a set of plane waves impinging from different directions which carry
[inaudible] signals and which can create a pretty good spatial impression. So this
could be useful.
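A minimal sketch of that idea follows; the decay time, the number of directions, and the use of independent noise are my assumptions, not the speaker's recipe. It builds a crude diffuse reverberation layer from plane waves arriving from evenly spaced directions, each carrying a mutually decorrelated, exponentially decaying noise tail; each (direction, signal) pair would then be fed to whatever plane-wave driving function the renderer provides.

```python
# Hedged sketch: a set of plane waves from evenly spaced directions, each with
# a decorrelated, exponentially decaying noise signal, as a crude diffuse
# reverberation layer.
import numpy as np

FS = 48_000                  # sampling rate, Hz
RT60 = 1.2                   # assumed reverberation time, s
N_PLANE_WAVES = 12           # assumed number of plane-wave directions
rng = np.random.default_rng(0)

n = int(RT60 * FS)
t = np.arange(n) / FS
envelope = 10.0 ** (-3.0 * t / RT60)      # reaches -60 dB after RT60 seconds

directions = np.linspace(0.0, 2 * np.pi, N_PLANE_WAVES, endpoint=False)
# Independent noise per direction -> mutually decorrelated signals, same decay.
signals = envelope * rng.standard_normal((N_PLANE_WAVES, n))

for az, sig in zip(directions, signals):
    print(f"plane wave from {np.degrees(az):5.1f} deg, rms {sig.std():.3f}")
```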
Then what you often do, for efficiency, is synthesize point sources, because
they have a specific fixed location. This can be implemented very efficiently, so
you can have a high number of sources.
It sounds a bit boring, though. Imagine you synthesize a person speaking as an
omnidirectional source: no matter where you move in the target zone, you will
always have the impression that the person [inaudible] towards you. It enhances
the spatial impression if you impose a specific directivity on the sound source,
which you can also do. It is then less efficient to implement, so you cannot have
dozens of sources, but maybe ten sources or so on one computer in real time. But
then you will hear the orientation of the person that is talking to you, which can
be beneficial in aesthetic terms.
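A minimal sketch of why directivity conveys orientation; the cardioid pattern and the listener positions are assumptions chosen for the demo, not part of the talk. The level a listener receives from a directive virtual source depends on where they stand relative to the direction the source faces, whereas an omnidirectional source shows only distance decay.

```python
# Hedged sketch: omnidirectional vs. cardioid virtual point source.  The level
# at a listener position depends on the source orientation only in the
# directive case.
import numpy as np

SRC = np.array([0.0, 0.0])          # virtual source position
FACING = np.radians(0.0)            # direction the virtual talker faces (assumed)

def level(listener, directive):
    v = listener - SRC
    r = np.linalg.norm(v)
    g = 1.0 / r                                     # point-source distance decay
    if directive:
        phi = np.arctan2(v[1], v[0]) - FACING
        g *= 0.5 * (1.0 + np.cos(phi))              # simple cardioid weighting
    return 20 * np.log10(max(g, 1e-9))

for angle in (0, 90, 180):
    pos = 2.0 * np.array([np.cos(np.radians(angle)), np.sin(np.radians(angle))])
    print(f"listener at {angle:3d} deg:  omni {level(pos, False):6.1f} dB,"
          f"  cardioid {level(pos, True):6.1f} dB")
```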
>>: I guess what I was referring to is that it depends on the application. In some
applications, if you want to do teleconferencing and telepresence, I would expect
to reproduce you as a point source, not as a [inaudible] wave. So I'm not
particularly interested in how well this system will work [inaudible]; I'm interested
in how well I can pretend that [inaudible] another one that is [inaudible] there and
then perceive that. So that may be more interesting than [inaudible] showed later
[inaudible] sources.
>> Jens Ahrens: I agree. The reason why we chose the plane wave for the
analysis of the basic properties is that, first of all, you can decompose any sound
field into a continuum of plane waves, so you can deduce information from that.
On the other hand, the plane wave has a simple geometrical structure. As I
showed for two-and-a-half-dimensional synthesis: if you analyzed the sound field
synthesized by the circular array by means of a point source, you would have a
different curvature of the wave fronts, but the sound field of the point source
already exhibits an inherent amplitude decay. This incorrect two-and-a-half-dimensional
amplitude decay is then imposed on the already existing decay, and it can happen
that you just don't see it in a visual inspection.
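To make that masking effect concrete, a minimal numeric sketch follows; the extra 1/sqrt(r) factor used for the 2.5D error is the commonly quoted approximation and is an assumption on my part, not a result from the talk.

```python
# Hedged sketch: amplitude over distance for a plane wave and a point source,
# each with and without an assumed 1/sqrt(r) 2.5D amplitude error.
import numpy as np

r = np.array([0.5, 1.0, 1.5, 2.0])       # distance from the reference point, m

plane_desired = np.ones_like(r)           # plane wave: constant amplitude
plane_25d     = 1.0 / np.sqrt(r)          # with the extra 2.5D decay

point_desired = 1.0 / r                   # point source: inherent 1/r decay
point_25d     = (1.0 / r) / np.sqrt(r)    # 2.5D decay on top of 1/r

for row in zip(r, plane_desired, plane_25d, point_desired, point_25d):
    print("r={:3.1f} m  plane {:4.2f} vs {:4.2f}   point {:4.2f} vs {:4.2f}".format(*row))
```

Against the plane wave's flat reference the deviation is obvious, while for the point source it only steepens a curve that is already decaying, which is why the error can escape a visual inspection.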
With the plane wave, if I go back to the circular array, we know that this is what
we want: a plane wave front with constant amplitude everywhere. If you synthesize
the sound field of a point source, you have the amplitude decay anyway. So here we
see that there is not constant amplitude [inaudible], so this is a basic property of
this loudspeaker array, and it can happen that you don't see it if you use a more
complex sound field.
Is that what you --
>>: [Inaudible], I cannot imagine any real applications where that -- I mean,
there are some where that's a desired outcome.
>> Jens Ahrens: Of course, of course.
>>: The scenarios. So maybe it's not as important as: what are the scenarios that I
want to accurately reproduce, and how would I do that?
>> Jens Ahrens: Yeah. So there are --
>>: It's mathematically maybe convenient, but maybe --
>> Jens Ahrens: In practice, there are two essentially different scenarios.
One is where you record a [inaudible] with a microphone or microphone array
that captures the spatial information like a spherical array, and you want to
resynthesize or recreate the sound field by means of a loudspeaker array.
This is one scenario: you don't manipulate the signals or impose spatial
information onto them; you just want to recreate them.
The other scenario is where you record the sound sources individually, like people
sitting at a table where everybody has his or her individual microphone. Then you
have the individual audio signals in individual channels, so you can arrange them
in space as you like and reproduce the people as point sources, or as more
complex sources, or as moving sources. So these are the two different
scenarios.
>>: I have a question about [inaudible]. So here you have 56 loudspeakers in a
circle of [inaudible] meters, so that's exactly 1000 and [inaudible]. And then you
go [inaudible] and you [inaudible] able to overcome the spatial [inaudible].
[Inaudible]?
>> Jens Ahrens: If you do this order limitation, you concentrate the energy of
your desired sound field into a specific region. The orders close to zero, the
lower orders, describe the sound field around the center of the expansion; in this
case, that is the center of the array.
So the low orders describe the sound field around here, and the spatial extent of
that sound field is strongly dependent on the frequency. Say, a fifth-order sound
field at a hundred hertz has an extent of several meters, but at one kilohertz a
fifth-order sound field has an extent of maybe this size.
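As a rough check of those numbers, a common rule of thumb relates the radius of the accurately represented region to the order N via r ≈ N/k = N·c/(2πf). The exact threshold varies between authors, so the following minimal sketch is indicative only.

```python
# Hedged sketch: rule-of-thumb radius r = N * c / (2 * pi * f) of the region
# that an order-N sound field representation covers accurately.
import math

C = 343.0   # speed of sound, m/s
N = 5       # spherical-harmonics order (fifth order, as in the example above)

for f in (100.0, 1000.0):
    r = N * C / (2 * math.pi * f)
    print(f"order {N} at {f:6.0f} Hz -> accurate region radius about {r:4.2f} m")
```

This reproduces the stated behaviour: roughly a few meters across at 100 Hz, but only on the order of a quarter of a meter at 1 kHz.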
So here, at low frequencies, you do not notice that the sound field is order
limited. If you go to higher frequencies, you do notice the order limitation,
because this is your desired sound field, an order-limited plane wave: its energy
is concentrated around the center, and you have hardly any information out
here. This is what you lose.
If you go higher in frequency, the concentration around the center is even
stronger. So this is your desired sound field, the desired plane wave, and these
are the individual repetitions of the driving function; let me go back. This part
describes the sound field around the center, and the higher orders, in this case
where the repetitions are, describe the sound field at locations far from the
center.
>>: [Inaudible] spatial areas. Do we have the same resolution [inaudible]
variation? So the only thing you lose is [inaudible].
>> Jens Ahrens: Yeah. The zone where the sound field synthesis is actually
[inaudible]. This is what you [inaudible] here.
The problem is that if you assume a continuous secondary source distribution
synthesizing an order-limited sound field, you will only have that region in the
middle. Anywhere else you will have very low amplitude, 20, 30, 40 dB lower than
the desired sound field. So if the listener were located in the center, you would
hear the signal, and if you went one step to the side, the signal would fade away;
or, more precisely, the higher frequencies are reduced first, and the further you
go away, the lower the frequencies that remain.
So if we have a discrete array, we have these artifacts, which add energy where
the desired sound field does not have energy. It might even sound better to have
these artifacts than not to have them.
>>: Thank you. [Inaudible]. [Applause].
>> Jens Ahrens: Thank you very much for your attention.