>> Mark Thomas: Hello, everyone. I'd like to introduce Archontis Politis, who has been working with me for the past 12 weeks. Archontis joins us from Aalto University, in Finland, where he works on problems of spatial audio capture and reproduction as part of his PhD work, and it seemed only natural that we should give him an image-processing problem to work on here. So without any further ado, please, you have the floor. >> Archontis Politis: Thank you, Mark. So, yes, I will jump straight into the presentation. What was my task here for this internship? I was working on an audio acoustics problem through a kind of graphics-problem route, and that was quite interesting, because I haven't done that kind of work before. The topic of the presentation is Applications of 3D Spherical Transforms to Acoustics and Personalization of Head-Related Transfer Functions. Before I go to the actual task and the method we used, I will speak a little bit about the motivation, about what led us to what we finally did. The motivation for this work was HRTF personalization. HRTFs, or head-related transfer functions, are direction-dependent filters that model the acoustical response from a sound source to the listener's ears. Basically, they are the transfer functions from a source at some distance from the listener to the eardrums of the listener, and they are crucial for high-quality spatial sound reproduction over headphones, since if we convolve a set of these filters for a certain direction with, for example, a monophonic recording, we can create the sensation that the sound is coming from that specific direction. One property of these filters is that they are highly individualized, so we have to measure them for each individual, and an individual's set of HRTFs encodes important spatial cues, mainly cues having to do with the elevation of a sound source, but also with its azimuth. And this creates a problem, because it means that ideally we have to measure a set of HRTFs for each individual for whom we want to deploy an immersive application that uses full spatial sound over headphones, and measurement of HRTFs is a very lengthy and costly process. You have to build a room, an anechoic space like the ones in the photo, and you also need some special apparatus specifically designed for the task of measuring HRTFs around the listener. This is an example of the anechoic space here at Microsoft Research, with the apparatus with a rotating arc that measures the HRTFs around the head of the listener who sits in the middle. The lower picture, I think, is some older setup at either NASA or some Air Force facility, where they have a full grid of loudspeakers and the listener has to get inside that grid. So this is not really an option for general deployment of immersive audiovisual applications. The research world has treated this problem with two or three different approaches. One of the things people have proposed is that you somehow model the HRTFs as a set of filters with a few parameters, and then you present a task to the user so that they can tune these parameters to make these HRTFs match what they would like to hear, or match their own.
The second approach is based on numerical simulation, so you get some kind of rough model of the head and the ears, then you put it in an acoustic simulator on a computer and try to model the scattering effect that the head has on a sound wave up to 20 kilohertz, over the audible frequency range, and this is also quite a nice solution. The problem is that it is also extremely lengthy. Normally, for most numerical simulation methods, it will take days to compute the full audible frequency range. The third approach, and the one that relates most to what we are doing, is to measure a database of HRTFs for many subjects and then try to find, for a specific user, the set that would match them well, based on some of their features, mostly morphological features. So what people have done is measure tens or hundreds of subjects and keep a directory of anthropometric features, and then for any new user, all they need to do is measure the user's anthropometric features, find the subjects in the database whose features are closest to the user's, and pick their HRTFs or a set of HRTFs, hoping that these would match the user quite well. The problem with this approach up until now is that there is no direct connection -- or at least no very clear direct connection -- between an HRTF and all these specific anthropometric features. It is still unclear and an open problem how, for example, these individual anthropometric features affect specific features in the HRTF response. So even though this approach has been studied quite a lot, it's still an open problem, and many times the relations that have been used are quite arbitrary. The main motivation of this research was, first, that we know the HRTFs clearly depend on the head shape and the head size in some way, since basically they model the scattering of the sound around the head. What we were thinking is that we could determine a similarity measure between heads in a database that is based directly on the total head shape rather than on individual features. The way we approached this problem was to see the head as a three-dimensional shape, a surface or distribution, and perform a spectral or harmonic decomposition on it, so to try to find spatial frequency components that model this head shape in some way and to determine similarity based on these spatial frequency components. The first step towards that was to consider one quite popular spatial transform, popular in many fields, including acoustics, since it has been used in many acoustic applications, and that's the spherical harmonic transform, which is basically like a two-dimensional Fourier transform for data or functions defined on the unit sphere. It has been used in acoustics for modeling of loudspeaker radiation patterns and microphone directivity patterns, for spherical array beamforming and spatial filtering, which has been a very active field of research over the last decades, for immersive 3D sound recording and reproduction, with the family of methods that have been popularized as ambisonics, and finally for interpolation of HRTFs. Since HRTFs are measured around the listener on a spherical grid, the spherical harmonic transform is a pretty natural way to first compress all this data and also to use the inverse transform to interpolate between the measurement points. So let's look at a little bit of the math of the spherical harmonic transform.
We get the spherical harmonic coefficients by projecting our function onto the spherical harmonic basis functions, which are also just called the spherical harmonics. These spherical harmonics depend on the azimuth and the elevation, with a normalization term that keeps them orthonormal under integration over the unit sphere, and then, using these harmonic coefficients, we can reconstruct our function or interpolate our data for any direction we want. The domain of the function is the unit sphere, so it depends on azimuth and elevation, and even though theoretically the summation is infinite, for most practical functions, especially in acoustics, the function can be described perfectly well with a finite sum, so the transform is practical. An example of the use of the spherical harmonic transform is in HRTF interpolation, as I mentioned. These are HRTF magnitudes for the left ear of a user, measured here at MSR, at 1 kilohertz, and this grid of points is the measurement grid, and we can see the magnitude variation across different directions. If we perform a spherical harmonic transform on this data, we get the spherical harmonic spectrum, and using this spectrum, by performing the inverse transform, we get a very nice, smooth surface at any direction we want that models this HRTF directivity function very well. >>: I have a question about that. The title of the plot says the parameter is L equals 15 -- what does that mean? >> Archontis Politis: Yes. That's actually the order of the transform, meaning that we use more and more harmonic components to capture the variation of the function across angles. And this order is determined by a few factors. First of all, it's determined by how smooth our original function, let's say the HRTF, is across different directions, but it's also limited by how many measurements we have done, so by a kind of sampling condition. In this case, the order of the transform was 15, which actually means we used 256 coefficients, so basically, with 256 coefficients, we could interpolate for any direction we wanted at this single frequency. So yes, that's also a quick example of what these basis functions, the spherical harmonics, look like, and we can see that they have more angular variation at increasing orders, and also that different basis functions, different spherical harmonics, have different symmetry properties, meaning that they can capture different kinds of properties of the shape or distribution that we are transforming. Of special interest to our work here was the fact that the spherical harmonic transform is also quite popular in the graphics community. It's used mainly for tasks that have to do with lighting of 3D objects and rendering, but it has also been used, for example, for fast approximation of 3D heads. This is an example of a 3D head captured with laser scanning, and this is what the spherical harmonic transform gives with an order of 25 and 676 coefficients. You can see that it captures most features of the head in a smooth representation, and the same for the other head, too. One thing we can note here is that very complex surfaces, like, for example, the back of the ear, cannot really be captured by a transform like this. Because it's a two-dimensional transform, it can only capture the furthest points on the surface of the head, and it cannot model radial variations like that.
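To make the interpolation example above concrete, here is a minimal sketch in Python of that kind of spherical harmonic fit: least-squares coefficients from magnitudes sampled on a spherical grid, and the inverse transform evaluated at a new direction. This is not the code used in the work; the function names, the synthetic data, and the random measurement grid are purely illustrative, and only numpy and scipy are assumed.

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, zen):
    # Columns are the complex spherical harmonics Y_l^m evaluated at each
    # direction; azi is the azimuth, zen the zenith (colatitude) angle.
    cols = []
    for l in range(order + 1):
        for m in range(-l, l + 1):
            cols.append(sph_harm(m, l, azi, zen))  # scipy: sph_harm(m, l, azimuth, colatitude)
    return np.stack(cols, axis=-1)                 # shape (num_dirs, (order + 1) ** 2)

def sht_fit(order, azi, zen, values):
    # Least-squares spherical harmonic coefficients for scattered samples on the sphere.
    Y = sh_matrix(order, azi, zen)
    coeffs, *_ = np.linalg.lstsq(Y, values.astype(complex), rcond=None)
    return coeffs

def sht_eval(order, coeffs, azi, zen):
    # Inverse transform: reconstruct or interpolate the function at any directions.
    return np.real(sh_matrix(order, azi, zen) @ coeffs)

# Toy example: order 15 gives (15 + 1) ** 2 = 256 coefficients, as in the talk,
# fitted to a synthetic stand-in for one ear's HRTF magnitude at one frequency.
rng = np.random.default_rng(0)
azi = rng.uniform(0.0, 2.0 * np.pi, 1000)
zen = rng.uniform(0.05, np.pi - 0.05, 1000)
mag = 1.0 + 0.3 * np.cos(zen) * np.sin(azi)
coeffs = sht_fit(15, azi, zen, mag)
print(sht_eval(15, coeffs, np.array([0.5]), np.array([1.0])))
```

In practice the measurement grid and the maximum usable order are tied together by the sampling condition mentioned in the answer above.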
Of even more interest to our work in this internship was the fact that the spherical harmonic transform has also been used, again in the graphics community, for the problem of trying to detect similar 3D objects in a 3D model database: if we have a database of thousands of 3D models and we do a query with a new 3D model, the problem is to find similar models in the database. As I mentioned, the spherical harmonic transform, as it is, has its limitations, in that it cannot capture radial variations, so by itself it's not very well suited to this problem for complex 3D shapes. Essentially, if the origin is inside the 3D model, it requires that every point of the 3D model is visible from the origin, so there are no occlusions or anything like that. So the way people used it before for this similarity problem is that they took the 3D model, created concentric spheres growing outwards from the center of the object, took the intersection of these concentric spheres with the 3D model, and applied the spherical harmonic transform to each of these spheres separately, ending up with a two-dimensional matrix of spherical harmonic coefficients, which was then used as a template to find the similarity between the 3D models. One other nice detail about this study and similar studies is that for this kind of problem it makes sense not to use the spherical harmonic coefficients directly but to use the energy of the coefficients for each order L, which has the important property that it doesn't depend on the rotation of the object. So if, instead of the coefficients directly, we use the spectral energies, we end up with a smaller vector, and that vector doesn't change whether the object is rotated or not, which is a very nice property for this similarity matching problem. So finally, we saw that the spherical harmonic transform can be quite good for our task, which relates also to this HRTF personalization task. However, the way people have used it before is a bit ad hoc. It's somewhere between a spatial representation and a harmonic representation: it breaks down the object into these spheres, and then you have a harmonic representation for each one of these spheres. What we thought is that we can go one step further and use a full three-dimensional transform, so instead of only capturing the angular variation across each one of these spheres, we can also try to capture the radial variation, and then we end up with a full three-dimensional transform that models both angular and radial frequency components. So we focused on two 3D spherical transforms. One is the spherical Fourier-Bessel transform, which has been used quite a lot in physics and chemistry and, I think, also in some image processing problems, and the second one is the spherical harmonic oscillator transform, which is not very well known by this name. There are only a few works that mention it like that, but I think it has been used quite a lot in quantum physics and quantum engineering. So, the first one, the spherical Fourier-Bessel transform -- just a few notes here. First of all, this transform is no longer defined, like the spherical harmonic transform, only on the surface of a sphere; it's defined over 3D space.
It also has a radial component, and here the domain of integration is a solid sphere going from zero up to some radius that we choose so that it completely encloses our shape. This radius of the domain determines many properties of the basis functions, like the scaling of the radial functions and the normalization term. We can also see that the basis functions for this transform contain the spherical harmonics, so the angular components are still captured by the spherical harmonics, while the radial component is captured by the spherical Bessel functions. This is an example of what the basis functions look like. It's not very clear, but what we can say at least is that this is a specific basis function, and apart from the angular variation, it's now a full three-dimensional function. So it's defined in the full 3D space, and we can see at least that it has variation both across the angular dimension and across the radial dimension. The second transform is the spherical harmonic oscillator transform. Similar to the spherical Fourier-Bessel transform, it's also three-dimensional. The difference from the spherical Fourier-Bessel is that for the radial component it uses the associated Laguerre polynomials, and the domain of integration is not a solid sphere but goes from zero to infinity. However, the transform is still suitable to capture and model shapes that are concentrated in some region. Again, that's a picture of an example of the harmonic oscillator transform basis functions. Next, after we implemented the basic form of these transforms, we had to think of what we could use them for, and naturally, one good application is compression and interpolation not only of data on a sphere but of data in 3D space. Data that are sampled spherically are especially well suited to these transforms, and a natural application for that can be the 3D interpolation of near-field HRTFs. Near-field HRTFs refer to the fact that even though HRTFs are known not to change, apart from a scaling factor, for source distances beyond 1 to 1.5 meters, that doesn't hold true for distances close to the head, so if we want to make a sound appear like it's coming really close to the head, we actually have to measure HRTFs at different distances from the head. So we have a radial variation, and this is something that can be compressed and expressed by use of the full 3D transforms, and also, after we have the harmonic coefficients, it's very easy and fast to interpolate at any point we want. The second task, and the one that motivated this research, is that we can use these 3D transforms to capture the harmonic components of head shapes: quickly get a 3D scan of a user by using, for example, the Kinect, and then, by use of the Fourier-Bessel transform or the SHOT, get the harmonic components, or the spectrum, of the user's head, compare it with the spectra of the heads in a database for which we have also measured the HRTFs, and then just pull the HRTFs that correspond to the head closest to the user's. And as I mentioned before, a nice property of the three-dimensional transforms, similar to the spherical harmonic transform, is that we can use this spectral energy vector, which is rotationally invariant, so the method will be robust, for example, if there is a head in the database that is similar to the user's but, due to the measurement procedure, is rotated or non-aligned.
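The rotation invariance mentioned here comes from the fact that a rotation only mixes coefficients of the same order among themselves, so summing the energies within each order gives a descriptor that does not change when the head is rotated. A minimal sketch of that bookkeeping, assuming a flat coefficient vector ordered by (l, m); for the 3D transforms the same idea would be applied per (radial index n, order l) block:

```python
import numpy as np

def per_order_energy(coeffs, order):
    # coeffs: flat array of (order + 1) ** 2 spherical harmonic coefficients,
    # ordered l = 0..order with m = -l..l inside each order.
    energies = []
    idx = 0
    for l in range(order + 1):
        block = coeffs[idx: idx + 2 * l + 1]      # the 2l + 1 coefficients of order l
        energies.append(np.sum(np.abs(block) ** 2))
        idx += 2 * l + 1
    return np.array(energies)                     # length order + 1, rotation invariant
```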
>>: What about translation? If you hadn't properly found the center or had misaligned the center somehow, is it -- >> Archontis Politis: So translation is a problem. The transform is not robust to translation. Of course, that depends on how much it is translated. It seems like you cannot have both. For example, you can use a normal three-dimensional FFT, and that will be kind of translation invariant if you take the energy, but it won't be rotationally invariant at all, while these transforms are rotationally invariant but not translation invariant. For this problem, in this study, we did some rough alignment of the heads first, by detecting the ends of the ears, so all of the heads were pretty much aligned to have the interaural axis here, and the center was put in the middle of the interaural axis. So all of them had kind of a common reference point, and we were hoping that after that the deviations would be small enough that we would still capture similarities between two heads without huge errors. So after we implemented the basic transforms, we wanted to apply them to the head scans, and we needed to perform a series of steps to manage that, starting from a 3D mesh coming from laser scans. The way we did it is that we sampled the head scans using a ray tracer that was shooting rays in uniformly arranged directions all over the 3D sphere. Then we could find the collision points of the rays with the head mesh, and based on these collision points, on a grid of concentric spheres, we could determine whether points were inside the head or close to the boundary of the head, and then use these sampled points as input to the transform. We used two cases of sampling, as sketched below. One was solid sampling, so considering the head as a solid object, meaning that any point detected inside the head had a binary value of 1 and all the other points had zero. The second case was considering the head as a shell with some width, meaning that all the points within some margin of the head surface had a value of 1, while the other points were zero. Between these two conditions, we couldn't predict from the beginning which one would perform best. Also, the shell sampling has the advantage that it's more robust to errors or irregularities like holes, so it wouldn't care about those, while for the solid sampling, we need to make sure that the mesh is closed in order to detect whether points are inside the head or outside. >>: Are you going to talk more about your sampling rates, like how dense from the center? >> Archontis Politis: Yes. We tried a few things. In the beginning, I took very high sampling rates, and I can actually show a little bit here, too. So we had a number of rays shooting from the center, uniformly distributed, and these were around 15,000, and then we had concentric spheres that were at about 1 millimeter difference in radius. And finally we decided that we would like to keep this radial resolution, but we reduced the angular resolution quite a lot by reducing the rays to around 5,000, without significant changes, at least not seen in this study.
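Here is a rough sketch of the solid and shell sampling just described, under the assumption that the ray tracer has already returned, for each of the uniformly distributed ray directions, the sorted distances at which that ray crosses the mesh; the names and the 3 mm shell margin are illustrative, not the actual implementation.

```python
import numpy as np

def sample_head(hit_distances, radii, mode="solid", margin=0.003):
    # hit_distances: one sorted 1D array per ray, with the distances (from the
    #                head center) at which the ray crosses the mesh surface.
    # radii:         radii of the concentric sampling spheres (e.g. 1 mm apart).
    occ = np.zeros((len(hit_distances), len(radii)))
    for i, hits in enumerate(hit_distances):
        for j, r in enumerate(radii):
            if mode == "solid":
                # with the origin inside a closed mesh, the point at distance r along
                # the ray is inside the head when an odd number of crossings lie beyond r
                occ[i, j] = 1.0 if np.count_nonzero(hits > r) % 2 == 1 else 0.0
            else:
                # "shell": mark only the points lying within some margin of the surface
                occ[i, j] = 1.0 if hits.size and np.min(np.abs(hits - r)) < margin else 0.0
    return occ   # (num_rays, num_radii) array of 0/1 samples fed to the transform
```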
The uniform distribution of the rays was also something we found important for optimizing the transform, because if you do just a direct, naive implementation of it, it's quite slow and costly, both in memory and in processing. If you exploit the fact that inside the transform there is a spherical harmonic transform, which becomes just a single summation if you make sure that your rays are as uniformly distributed on the sphere as possible, so that the integration becomes just a sum, then you can get a good speed-up. And there were a few other optimizations: for example, for all the spheres that were completely inside the head, because all the samples had just the value of 1, or 0, whether it was the solid condition or the shell condition, we could also very quickly determine the coefficients of the transform. Yes. So this is an example of what the method was seeing, basically. That's a head scan of my scary mentor, Mark, and this is an example of the ray tracer collision points, where the different colors show how many times the ray has exited the head and then entered it again, if it hits, for example, the neck or the ear. And this is a case of the solid sampling. Here we have a very coarse condition of only around 15 spheres, I think, which captures just the very basic shape of the head, while here I think I had around 200 spheres, and we can see that it captures the head quite well. And for the case of the shell sampling, so keeping only the points that are around the mesh, this is a case of 5,000 points, and this is a case of 15,000 points. Again, that depends on how much detail you are trying to model, so in the beginning we were really trying to capture all the variations on the ear, too, but then along the way we decided that it's probably better to separate the two things: first try to capture just the head shape, and then maybe try to capture the ear shape separately and use that somehow. That allowed us to reduce the number of sampling points quite a lot. And this is an example of what these head spectra look like. This is the case for the SHOT, this is the case for the Fourier-Bessel transform, and these are the representations after we have computed the spectral energies, so these are the rotationally invariant vectors that we are actually using to compare similarities between the heads. You can see that there are various periodicities. This has to do with the way the basis functions are indexed in the transform. Basically, every few coefficients comes the coefficient that corresponds to a basis function that varies only radially and has no angular variation, and these coefficients integrate over the interior of the head, which is solid, so they get quite a high value, while the coefficients that have angular variation get suppressed. So first we get the spectra, then we compute these energy spectra, which are also reduced in length, and then, to compute the similarity between the heads, we just take the Euclidean distance between these energy spectra, the Euclidean distance between these vectors for two different heads, and we have a metric that determines somehow how close they are in shape. >>: So before you cut to the results, I wanted to ask why you've chosen to evaluate these two particular transforms, the SHOT and the Fourier-Bessel one -- are they both complete, orthogonal basis function sets? >> Archontis Politis: Yes, yes.
>>: So then theoretically, if you used -- >> Archontis Politis: It wouldn't matter. >>: Then it shouldn't matter, right? So are you going to be comparing them in terms of the number of coefficients you need, or is there one you would expect -- >> Archontis Politis: There were a few reasons, and actually, the original motivation for that was dropped midway. The original idea was to go with the SHOT transform. It's novel and it has some nice properties. One of them was that you could have data on a rectangular grid, and there was a perfectly defined transformation to make the SHOT work on data on a rectangular grid without error, while in the case of the Fourier-Bessel transform, you would need to interpolate the data from the rectangular grid to a spherical grid, which means that you would lose some information there. So in that case, we were going with the SHOT, but also trying the Fourier-Bessel transform, because it's the more commonly used one, in a way. But then we realized that, since we can define the sampling grid ourselves, and in a way that can speed up the computations a lot, we would use spherical sampling grids, and in that case it wouldn't matter anymore which one of the two we used. But then I decided to use both of them, mainly because of the different boundary conditions. The Fourier-Bessel transform is defined on a solid sphere without seeing anything outside of it, and that sounds perfect for a head, for example, because you can enclose it completely in a sphere, while for the SHOT, you have to consider the size of the shape, too, to be sure that it's captured well enough -- that the transform works, basically. And it wasn't clear how that would work, so we decided to keep both of them. And then in terms of implementation, at least after you have the math done right in some functions, there was no difference in applying one or the other. They were also pretty much comparable in terms of speed. >>: That would be the other consideration, if one of them were much faster than the other, but they're kind of comparable? >> Archontis Politis: Yes -- neither was, in this case at least. If somebody looks at them in much more detail, maybe there are things that can be optimized. Maybe the polynomials of one have a nicer formula to compute than the other's, but nothing was apparent, at least. So yes, after we managed to apply the transforms to the heads, we wanted to see if we got anything reasonable out of them, and one way to do that was to try to reconstruct the heads back from the coefficients. Since we could define this reconstruction on any kind of grid we wanted, this is an example of trying to reconstruct the head in a horizontal plane passing through the ears, and we can see that for both transforms the head shape is captured quite well, so there is a clear boundary going from very close to 1 down to 0. Also, here I have plotted directly the points coming from the ray-tracer collisions with the true mesh, and you can see that they are very well aligned along the boundary that the transform gives. And you can also see that things like the pinna lobe here are also captured, something that, for example, the spherical harmonic transform wouldn't be able to model.
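As an illustration of this reconstruction check, here is a sketch, under my own notational assumptions, of an unnormalized spherical Fourier-Bessel basis function -- a spherical Bessel radial part whose wavenumber is chosen so that it vanishes at the domain radius R, times a spherical harmonic -- and of evaluating the inverse transform on the horizontal plane through the ears. The dict of coefficients keyed by (n, l, m) and the 15 cm domain radius are assumptions made for the example, not the actual implementation.

```python
import numpy as np
from scipy.special import spherical_jn, sph_harm
from scipy.optimize import brentq

def spherical_bessel_zero(l, n):
    # n-th positive zero of j_l, found by bracketing sign changes on a fine grid.
    x = np.linspace(1e-6, 60.0 + 10.0 * l, 20000)
    y = spherical_jn(l, x)
    changes = np.where(np.sign(y[:-1]) != np.sign(y[1:]))[0]
    a, b = x[changes[n - 1]], x[changes[n - 1] + 1]
    return brentq(lambda t: spherical_jn(l, t), a, b)

def fb_basis(n, l, m, r, azi, zen, R):
    # Unnormalized (n, l, m) Fourier-Bessel basis function inside a sphere of radius R.
    k = spherical_bessel_zero(l, n) / R            # radial wavenumber from the boundary condition
    return spherical_jn(l, k * r) * sph_harm(m, l, azi, zen)

def reconstruct_on_plane(coeffs, R=0.15, res=64):
    # Sum of coefficients times basis functions on a grid in the horizontal plane.
    xs = np.linspace(-R, R, res)
    img = np.zeros((res, res), dtype=complex)
    for ix, x in enumerate(xs):
        for iy, y in enumerate(xs):
            r = np.hypot(x, y)
            if r < 1e-9 or r > R:
                continue
            azi, zen = np.arctan2(y, x), np.pi / 2.0
            img[ix, iy] = sum(c * fb_basis(n, l, m, r, azi, zen, R)
                              for (n, l, m), c in coeffs.items())
    return np.real(img)                            # near 1 inside the head, near 0 outside
```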
Then we could also reconstruct the full head on a 3D volumetric grid and get a 3D representation of it, so that's an example of how the transform performs going from a very low-order representation, using just a few coefficients, to a pretty high-order representation, and we can see that it goes from looking like a rough ball to something that starts to get quite accurate with regard to the head. All right. Finally, after this work on the transforms and the applications to the heads, we wanted to see whether it was of any use for this HRTF matching problem: using the head scan of a user, trying to detect the closest head scan in the database, then pulling the HRTF and seeing whether it would match that user at all. However, we decided as a first step to check not the full HRTF but the ITDs, mainly because the ITDs are better understood, in the sense that we know they depend mainly on the head shape and the head size, while the HRTF magnitudes depend both on the head shape and size and also quite strongly on the ear and the ear shape. So the magnitudes seemed to need some more work in order to separate them into head-related features and ear-related features. And the ITD is anyway one of the two major components of the HRTFs, and we thought that it would be a nice first step. So ITD means the interaural time difference, and it models the time difference it takes for a sound coming from a certain direction to propagate from one ear to the other. It looks something like this if you plot it in 3D space, measured for a single subject, and that's in milliseconds, I think. It's almost zero in the medial plane, because there are no time differences between the two ears, and it reaches its maximum at the extreme lateral directions, to the left and the right. So what we did to apply the method and evaluate it was first to apply the transform to all the heads in the database and get the spectra for each head. Then, since we also had the measured HRTFs for each head, we extracted the ITDs for each subject. We determined similarity between the ITDs directly for all of the subjects in the database by taking the Euclidean distance between the ITDs over all directions, and having these two similarity metrics, one for the ITDs and one for the head spectra, we constructed a distance matrix first for the heads and then for the ITDs themselves. Then we also used as baselines two quite popular approaches to using non-personalized HRTFs. One was to use an average ITD from the whole database, and the other was to use an ITD measured from an anthropomorphic mannequin, meaning that you take kind of a doll that models kind of an average person, you put microphones in its ears, you measure its HRTFs, and then you use these HRTFs or ITDs for the user. So these are the baseline conditions, average and generic. The generic corresponds to the mannequin ITDs, and for each subject, we also computed the ITD distances between the subject's own ITDs and the average and the generic ones. Before I move to the results, this is an example of what this head similarity looks like. In this case, I plot the three most similar heads for two original heads, and we can see that, even though it's very hard to say anything visually about the transform itself, it seems like it's getting something about the shape and the size of the heads, at least.
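This is my reading of the evaluation just described, as a sketch with illustrative array names: for each subject, the method picks the other subject with the nearest head spectrum and is charged the ITD error of that match, compared against the average-ITD and mannequin-ITD baselines and against the best match achievable from the ITDs themselves.

```python
import numpy as np

def evaluate(itds, head_spectra, mannequin_itd):
    # itds:          (num_subjects, num_directions) measured ITDs per subject
    # head_spectra:  (num_subjects, descriptor_len) rotation-invariant head spectra
    # mannequin_itd: (num_directions,) ITD of the generic head-and-torso simulator
    n = itds.shape[0]
    avg_itd = itds.mean(axis=0)
    err_method, err_avg, err_generic, err_best = [], [], [], []
    for s in range(n):
        others = [t for t in range(n) if t != s]
        # head-shape match: nearest head spectrum among the remaining subjects
        d_head = np.linalg.norm(head_spectra[others] - head_spectra[s], axis=-1)
        match = others[int(np.argmin(d_head))]
        err_method.append(np.linalg.norm(itds[s] - itds[match]))
        err_avg.append(np.linalg.norm(itds[s] - avg_itd))
        err_generic.append(np.linalg.norm(itds[s] - mannequin_itd))
        # lower bound: the other subject whose ITD really is the closest
        err_best.append(min(np.linalg.norm(itds[s] - itds[t]) for t in others))
    return [np.array(e) for e in (err_method, err_avg, err_generic, err_best)]

def score(err_method, err_baseline):
    # Fraction of subjects for which the matched ITD beats a baseline ITD;
    # the percentages quoted in the results are scores of this kind.
    return float(np.mean(np.asarray(err_method) < np.asarray(err_baseline)))
```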
So, about the results. This is a pretty confusing plot, but what it shows is basically the following. I forgot to mention that we used 144 subjects in the database. For each subject in the database, this plot shows the ITD difference between their own ITDs and the ITDs that were returned by our method, which is the blue line, labeled head; the ITD difference between the subject's own ITDs and the average ITD; and the same for the subject's own ITDs and the mannequin's, which is the line labeled HATS. Also, the bottom line, the purple one, shows the ITD distance between each subject and its closest match in terms of the ITD distance itself. So if we take one subject, which other subject is closest, has the least ITD distance to that subject, and how large is that distance? And we can see that this line basically defines a lower bound on the performance we can get with our method, because whether we select a person from the database randomly or by using any kind of algorithm, we can never go below that line. For the rest of the lines, it's pretty hard to see what is going on. We can say that the HATS, the generic head, doesn't seem to perform too well -- it has the most variation, the yellow line -- while the average and the head look pretty close. So in order to evaluate that a bit further, we created some scores that basically say for how many subjects the method was performing better, so for how many subjects the blue line was better than the average, the orange line, or the yellow one, the generic ITD. And the scores were basically these: for 64% of the subjects, the method based on the SHOT transform was performing better than the average ITD, and for 71% of the subjects, it was performing better than the generic ITD. The Fourier-Bessel transform looks quite close but performs slightly worse, and the spherical harmonic transform, which we also included for comparison and evaluation -- so just applying the more traditional approach of using the spherical harmonic transform to get the similarity of the heads -- performs significantly worse than the full three-dimensional transforms, which is a good motivation to keep looking at these transforms, even though they require some extra work compared to the simpler approach. One comment about the average ITD is that there seems to be a high number of subjects here for which the orange line is pretty low, so it seems that there are subjects in the database that are very close to the average ITD. And actually, for some cases, the orange line goes even below the purple line, which means that there is not a single subject in the whole database that has a closer ITD to that specific user than the average ITD. So it seems that it would be advantageous to do some preliminary study on the ITDs or the HRTFs themselves to somehow cluster the users, and that can probably give more information for the matching problem, since it seems like there are subjects that are very average, in a way, very close to the average HRTF, but there are also subjects that are very far from that, so they're highly individualized, in a way. Some comments on future work or potential next steps: of course, the ITD is one part of the HRTF.
The second part is the HRTF magnitudes, but as I mentioned, this doesn't work directly because of the effect of the pinna, so one idea was to apply this shape similarity to the ear shapes too, and in that way we could end up with two sets of spectra, one for the head and one for the ear, and then somehow use these two similarities to pick HRTF magnitudes from the database. However, that requires some kind of factorization or decomposition of the HRTF magnitudes into head-related and pinna-related components. If there is a way to do that, then it probably also means that you can pull the head-related part from one subject and the pinna-related part from a different subject, if the method shows that this is a good way to go. Finally, to conclude, this was a study on some 3D spherical transforms that have not been used before in acoustics, and they seem to have interesting properties with potential for interpolation, registration, and similarity finding or matching of 3D data. It looks to us -- and we did some preliminary steps to validate this -- that they're suitable for some larger-scale applications, such as 3D interpolation of near-field HRTFs and HRTF personalization, and we got some promising results on personalization of HRTFs by applying these transforms to head meshes and ITDs. And that should be it. Thank you very much, and I would like to especially thank the guys on the team, my mentor, Mark Thomas, Hannes, David and Ivan, who is not here today, and also the great interns who left already and left me alone here, Matt, Long and Supreeth. I hope they will watch this presentation in the future, or probably not. Yes, anyway, thank you very much. >>: So one quick question. You talked about using Kinect or whatever to do the 3D head scans. Did you actually do much of that, or did you just use the scans from the head database that you already had? >> Archontis Politis: No, this study didn't use those lower-quality kinds of scans. It was based on pretty high-quality laser scans that were measured by the guys here. So that would be a natural progression, to check it with lower-resolution scans, or scans of the same person but measured in different conditions, with different setups, maybe with different methods to generate the scan, and hopefully the method should be robust to that. So the same person, measured with different setups and in different ways, after some alignment for this translation problem based on some very rough features, should hopefully be detected as very similar compared to other heads. But yes, we haven't checked that yet. >>: Can you give an idea of how long it would take to process one head? Say you have everything set up, and then I scan somebody. >> Archontis Politis: Well, for the resolution that we used, at least for these results, that would be around a minute per head, per transform. The transforms took pretty much the same time; using the same sampling grid, it was about a minute each. And this was a pretty high resolution -- maybe 1 millimeter per sphere is not required -- but these things need to be checked individually with some simpler cases, not just by running a full head scan and checking what is going on, because it's fairly easy to derive sampling conditions for the angular variation, but at least to me, it's not apparent how to derive the condition across the radial dimension.
There is a radius squared in the integration there, and some other things that make it a bit more involved. >>: You didn't mention the other motivation for using SHOT. >> Archontis Politis: Oh, yeah, actually, that was the main one. How could I forget it? >>: We could take it after we're done. >> Archontis Politis: Yes, so to your question about why use one or the other: SHOT sounded great, because you combine it with head and you have a head SHOT, and that narrows it down. >>: We have no further questions. No one online? >> Archontis Politis: Sorry? >>: No one online has asked any questions? >> Archontis Politis: I don't see anything here. >>: Thanks. Thank you.