Springer Topics in Signal Processing, Volume 19

Series Editors: Jacob Benesty, INRS-EMT, University of Quebec, Montreal, QC, Canada; Walter Kellermann, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

The aim of the Springer Topics in Signal Processing series is to publish very high quality theoretical works, new developments, and advances in the field of signal processing research. Important applications of signal processing are covered as well. Within the scope of the series are textbooks, monographs, and edited books. More information about this series at http://www.springer.com/series/8109

Franz Zotter · Matthias Frank

Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality

Franz Zotter, Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria
Matthias Frank, Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria

ISSN 1866-2609; ISSN 1866-2617 (electronic); Springer Topics in Signal Processing
ISBN 978-3-030-17206-0; ISBN 978-3-030-17207-7 (eBook)
https://doi.org/10.1007/978-3-030-17207-7

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access: This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

The intention of this textbook is to provide a concise explanation of the fundamentals and background of the surround sound recording and playback technology Ambisonics. Although Ambisonic technology has been practiced in the academic world for quite some time, the recent ITU,1 MPEG-H,2 and ETSI3 standards now firmly fix it in the production and media broadcasting world.
What is more, the Internet giants Google/YouTube recently recommended tools that have been well adopted from what the academic world is currently using.4,5 Last but most importantly, the boost given to Ambisonic technology by recent advancements has been in usability: ways to obtain safe Ambisonic decoders,6,7 the availability of higher-order Ambisonic main microphone arrays (Eigenmike,8 Zylia9) and their filter-design theory, and above all the usability gained by plugins that integrate higher-order Ambisonic production into digital audio workstations or mixers.7,10,11,12,13,14,15 This progress was a great motivation to write a book about the basics.

1 https://www.itu.int/rec/R-REC-BS.2076/en.
2 https://www.iso.org/standard/69561.html.
3 https://www.techstreet.com/standards/etsi-ts-103-491?product_id=1987449.
4 https://support.google.com/jump/answer/6399746?hl=en.
5 https://developers.google.com/vr/concepts/spatial-audio.
6 https://bitbucket.org/ambidecodertoolbox/adt.git.
7 https://plugins.iem.at/.
8 https://mhacoustics.com/products.
9 https://www.zylia.co.
10 http://www.matthiaskronlachner.com/?p=2015.
11 http://www.blueripplesound.com/product-listings/pro-audio.
12 https://b-com.com/en/bcom-spatial-audio-toolbox-render-plugins.
13 https://harpex.net/.
14 http://forumnet.ircam.fr/product/panoramix-en/.
15 http://research.spa.aalto.fi/projects/sparta_vsts/.

The book is dedicated to providing a deeper understanding of Ambisonic technologies, especially for, but not limited to, readers who are scientists, audio-system engineers, and audio recording engineers. As the underlying maths would, from time to time, get too long for practical readability, the book comes with a comprehensive appendix containing the beautiful mathematical details.
For a common understanding, the introductory section spans a perspective on Ambisonics from its origins in coincident recordings in the 1930s, to the Ambisonic concepts of the 1970s, and to classical ways of applying Ambisonics in first-order coincident sound scene recording and reproduction as practiced from the 1980s on.

In its main contents, this book intends to provide all the psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities and special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. As advanced outcomes, the aim of the book is to explain higher-order Ambisonic decoding, 3D audio effects, and higher-order Ambisonic recording with microphones or main microphone arrays. These techniques are shown to be suitable for supplying audience areas ranging from studio-sized rooms to venues with hundreds of listeners, as well as headphone-based playback, regardless of whether the 3D audio material is live, interactive, or studio-produced. The book comes with various practical examples based on free software tools and open scientific data for reproducible research.

Our Ambisonic events experience: In the past years, we have contributed to organizing Symposia on Ambisonics (Ambisonics Symposium 2009 in Graz, 2010 in Paris, 2011 in Lexington, 2012 in York, 2014 in Berlin), and have demonstrated and brought the technology to various winter/summer schools and conferences (EAA Winter School Merano 2013, EAA Symposium Berlin 2014, workshops and Ambisonic music repertory demonstration at the Darmstädter Ferienkurse für Neue Musik in 2014, ICAD workshop in Graz 2015, ICSA workshop 2015 in Graz with PURE Ambisonics night, summer school at ICSA 2017 in Graz, a course at the Kraków film music festival 2015, the mAmbA demo facility at DAGA in Aachen 2016, Al Di Meola's live 3D audio concert hosted in Graz in June 2016, and the AES Convention Milano 2018).
In 2017 (ICSA Graz) and 2018 (TMT Cologne), we initiated and organized Europe's First and Second Student 3D Audio Production Competition together with Markus Zaunschirm and Daniel Rudrich.

Graz, Austria
February 2019
Franz Zotter
Matthias Frank

Acknowledgements

To our lab and colleagues: Traditionally at the Institute of Electronic Music and Acoustics (IEM), there had been a lot of activity in developing and applying Ambisonics by Robert Höldrich, Alois Sontacchi, Markus Noisternig, Thomas Musil, Johannes Zmölnig, and Winfried Ritsch, even before our active time of research. Most of the developments were done in pure-data, e.g., with [iem_ambi], [iem_bin_ambi], [iem_matrix], and the CUBEmixer. Dear colleagues deserve to be mentioned who contributed a lot of skill to improve the usability of Ambisonics: Hannes Pomberger and his mathematical talent, Matthias Kronlachner, who developed the ambix and mcfx VST plugin suites in 2014, and Daniel Rudrich, who developed the IEM Plugin Suite, which also involves technology elaborated together with our colleagues Markus Zaunschirm, Christian Schörkhuber, and Sebastian Grill. We thank you all for your support; it's the best environment to work in!

First readers: We thank Archontis Politis (Aalto Univ., Espoo and Tampere Univ., Finland), Nicolas Epain (b<>com, France), and Matthias Kronlachner (Harman, Germany/US) for being our first critical readers, supplying us with valuable comments.

Open Access Funding: We are grateful for funding from our local government of Styria (Steiermark), Section 8, Office for Science and Research (Wissenschaft und Forschung), which covers roughly half of the open access publishing costs. We gratefully thank our University (University of Music and Performing Arts, "Kunstuni", Graz), its library in transition to open access, and its vice rectorate for research for the other half.
Outline

First-order Ambisonics is nowadays strongly revived by internet technology supported by Google/YouTube and Facebook 360°, by 360° audio and video recording and rendering, and by VR in games. This renaissance lies in two benefits: (i) compact main microphone arrays capture the entire surrounding sound scene in only four audio channels (e.g., Zoom H3-VR, Oktava A-Format Microphone, Røde NT-SF1, Sennheiser AMBEO VR Mic), and (ii) the format easily permits rotation of the sound scene, allowing surround audio scenes to be rendered, e.g., on head-tracked headphones, head-mounted AR/VR sets, or mobile devices, as described in Chap. 1.

Auditory events and vector-base panning: Chapter 2 of this book is dedicated to conveying a comprehensive understanding of the localization impressions in multi-loudspeaker playback and their models, followed by Chap. 3, which outlines the essentials of practical vector panning models and their extension by downmix from imaginary loudspeakers; both are fundamental to contemporary Ambisonics.

Harmonic functions, Ambisonic encoding and decoding: Based on the ideals of accurate localization with panning-invariant loudness and perceived width, Chap. 4 provides a profound mathematical derivation of higher-order Ambisonic panning functions in 2D and 3D in terms of angular harmonics. These idealized functions can be maximized in their directional focus (max-rE), and they are strictly limited in their directional resolution. This resolution limit entails perfectly well-defined constraints on loudspeaker layouts under which ideal measures for accurate localization as well as panning-invariant loudness and width are reached. Highly relevant for practical decoding, All-Round Ambisonic decoding to loudspeakers and TAC/MagLS decoders for headphones are explained in Chap. 4.

The Ambisonic signal processing chain and effects are described in Chap. 5.
It illustrates the signal flow from source encoding through the Ambisonic bus to decoding, and where input-specific or general insert and auxiliary Ambisonic effects are located. In particular, the chapter describes the working principles behind frequency-independent manipulation effects that are either mirroring/rotating/re-mapping, warping, or directionally weighting, as well as frequency-dependent effects. Frequency-dependent effects can introduce widening, depth or diffuseness, convolution reverb, or feedback-delay-network (FDN)-based diffuse reverberation. Directional resolution enhancements are outlined in terms of SDM/SIRR pre-processing of recorded reverberation and in terms of available tools such as HARPEX, DirAC, and COMPASS for recorded signals.

Compact higher-order Ambisonic microphones rely on the solutions of the Helmholtz equation. Their processing uses a frequency-independent decomposition of the spherical array signals into spherical harmonics and the frequency-dependent radial-focusing filtering associated with each spherical harmonic order, which together yield the Ambisonic signals. The critical part is to handle the properties of the radial-focusing filters in the processing of higher-order Ambisonic microphone arrays (e.g., the Eigenmike). To keep the noise level and the sidelobes in the recordings low and the frequency response balanced, a careful way of radial filter design is outlined in Chap. 6.

Compact higher-order loudspeaker arrays oppose the otherwise inwards-oriented Ambisonic surround playback, as described in Chap. 7. This outlooking last chapter discusses the IKO and loudspeaker cubes as compact spherical loudspeaker arrays with Ambisonically controlled radiation patterns. In natural environments with acoustic reflections, such directivity-controlled arrays have their own sound-projecting and distance-changing effects, and they can be used to simulate sources of specific directivity patterns.
Contents

1 XY, MS, and First-Order Ambisonics
  1.1 Blumlein Pair: XY Recording and Playback
  1.2 MS Recording and Playback
  1.3 First-Order Ambisonics (FOA)
    1.3.1 2D First-Order Ambisonic Recording and Playback
    1.3.2 3D First-Order Ambisonic Recording and Playback
  1.4 Practical Free-Software Examples
    1.4.1 Pd with iemmatrix, iemlib, and zexy
    1.4.2 Ambix VST Plugins
  1.5 Motivation of Higher-Order Ambisonics
  References

2 Auditory Events of Multi-loudspeaker Playback
  2.1 Loudness
  2.2 Direction
    2.2.1 Time Differences on Frontal, Horizontal Loudspeaker Pair
    2.2.2 Level Differences on Frontal, Horizontal Loudspeaker Pair
    2.2.3 Level Differences on Horizontally Surrounding Pairs
    2.2.4 Level Differences on Frontal, Horizontal to Vertical Pairs
    2.2.5 Vector Models for Horizontal Loudspeaker Pairs
    2.2.6 Level Differences on Frontal Loudspeaker Triangles
    2.2.7 Level Differences on Frontal Loudspeaker Rectangles
    2.2.8 Vector Model for More than 2 Loudspeakers
    2.2.9 Vector Model for Off-Center Listening Positions
  2.3 Width
    2.3.1 Model of the Perceived Width
  2.4 Coloration
  2.5 Open Listening Experiment Data
  References

3 Amplitude Panning Using Vector Bases
  3.1 Vector-Base Amplitude Panning (VBAP)
  3.2 Multiple-Direction Amplitude Panning (MDAP)
  3.3 Challenges in 3D Triangulation: Imaginary Loudspeaker Insertion and Downmix
  3.4 Practical Free-Software Examples
    3.4.1 VBAP/MDAP Object for Pd
    3.4.2 SPARTA Panner Plugin
  References

4 Ambisonic Amplitude Panning and Decoding in Higher Orders
  4.1 Direction Spread in First-Order 2D Ambisonics
  4.2 Higher-Order Polynomials and Harmonics
  4.3 Angular/Directional Harmonics in 2D and 3D
  4.4 Panning with Circular Harmonics in 2D
  4.5 Ambisonics Encoding and Optimal Decoding in 2D
  4.6 Listening Experiments on 2D Ambisonics
  4.7 Panning with Spherical Harmonics in 3D
  4.8 Ambisonic Encoding and Optimal Decoding in 3D
  4.9 Ambisonic Decoding to Loudspeakers
    4.9.1 Sampling Ambisonic Decoder (SAD)
    4.9.2 Mode Matching Decoder (MAD)
    4.9.3 Energy Preservation on Optimal Layouts
    4.9.4 Loudness Deficiencies on Sub-optimal Layouts
    4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)
    4.9.6 All-Round Ambisonic Decoding (AllRAD)
    4.9.7 EPAD and AllRAD on Sub-optimal Layouts
    4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts
  4.10 Practical Studio/Sound Reinforcement Application Examples
  4.11 Ambisonic Decoding to Headphones
    4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)
    4.11.2 Magnitude Least Squares (MagLS)
    4.11.3 Diffuse-Field Covariance Constraint
  4.12 Practical Free-Software Examples
    4.12.1 Pd and Circular/Spherical Harmonics
    4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder
    4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder
  References

5 Signal Flow and Effects in Ambisonic Productions
  5.1 Embedding of Channel-Based, Spot-Microphone, and First-Order Recordings
  5.2 Frequency-Independent Ambisonic Effects
    5.2.1 Mirror
    5.2.2 3D Rotation
    5.2.3 Directional Level Modification/Windowing
    5.2.4 Warping
  5.3 Parametric Equalization
  5.4 Dynamic Processing/Compression
  5.5 Widening (Distance/Diffuseness/Early Lateral Reflections)
  5.6 Feedback Delay Networks for Diffuse Reverberation
  5.7 Reverberation by Measured Room Impulse Responses and Spatial Decomposition Method in Ambisonics
  5.8 Resolution Enhancement: DirAC, HARPEX, COMPASS
  5.9 Practical Free-Software Examples
    5.9.1 IEM, ambix, and mcfx Plug-In Suites
    5.9.2 Aalto SPARTA
    5.9.3 Røde
  References

6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
  6.1 Equation of Compression
  6.2 Equation of Motion
  6.3 Wave Equation
    6.3.1 Elementary Inhomogeneous Solution: Green's Function (Free Field)
  6.4 Basis Solutions in Spherical Coordinates
  6.5 Scattering by Rigid Higher-Order Microphone Surface
  6.6 Higher-Order Microphone Array Encoding
  6.7 Discrete Sound Pressure Samples in Spherical Harmonics
  6.8 Regularizing Filter Bank for Radial Filters
  6.9 Loudness-Normalized Sub-band Side-Lobe Suppression
  6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression
  6.11 Practical Free-Software Examples
    6.11.1 Eigenmike em32 Encoding Using mcfx and IEM Plug-In Suites
    6.11.2 SPARTA Array2SH
  References

7 Compact Spherical Loudspeaker Arrays
  7.1 Auditory Events of Ambisonically Controlled Directivity
    7.1.1 Perceived Distance
    7.1.2 Perceived Direction
  7.2 First-Order Compact Loudspeaker Arrays and Cubes
  7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO
    7.3.1 Directivity Control
    7.3.2 Control System and Verification Based on Measurements
  7.4 Auditory Objects of the IKO
    7.4.1 Static Auditory Objects
    7.4.2 Moving Auditory Objects
  7.5 Practical Free-Software Examples
    7.5.1 IEM Room Encoder and Directivity Shaper
    7.5.2 IEM Cubes 5.1 Player and Surround with Depth
    7.5.3 IKO
  References

Appendix

Chapter 1
XY, MS, and First-Order Ambisonics

"Directionally sensitive microphones may be of the light moving strip type. […] the strips may face directions at 45° on each side of the centre line to the sound source." — Alan Dower Blumlein [1], Patent, 1931

Abstract: This chapter describes first-order Ambisonic technologies, starting from classical coincident audio recording and playback principles from the 1930s until the invention of first-order Ambisonics in the 1970s. Coincident recording is based on arrangements of directional microphones with the smallest possible spacings in between. Hereby, incident sound arrives with approximately equal delay at all microphones. Intensity-based coincident stereophonic recording such as XY and MS typically yields stable directional playback on a stereophonic loudspeaker pair. While the stereo width is adjustable by MS processing, the directional mapping of first-order Ambisonics is a bit more rigid: the omnidirectional and figure-of-eight recording pickup patterns are reproduced unaltered by equivalent patterns in playback. In perfect appreciation of the benefits of coincident first-order Ambisonic recording technologies in VR and field recording, the chapter gives practical examples for encoding and for headphone- and loudspeaker-based decoding. It concludes with a desire for a higher-order Ambisonics format to get a larger sweet area and accommodate first-order resolution-enhancement algorithms, the embedding of alternative, channel-based recordings, etc.
Intensity-based coincident stereophonic recording such as XY uses two figure-of-eight microphones, after Blumlein's original work [1] from the 1930s, with an angular spacing of 90°, see [2–4]. Another representative, MS, uses an omnidirectional and a lateral figure-of-eight microphone [2]. Both typically yield a stable directional playback in stereo, but the signals often get too correlated, yielding a lack of depth and diffuseness of the recording space when played back [5, 6] and compared to delay-based AB stereophony or equivalence-based alternatives.

Gerzon's work in the 1970s [7] gave us what we call first-order Ambisonic recording and playback technology today. Ambisonics preserves the directional mapping by recording and reproducing with spatially undistorted omnidirectional and figure-of-eight patterns on circularly (2D) or spherically (3D) surrounding loudspeaker layouts.

© The Author(s) 2019. F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19, https://doi.org/10.1007/978-3-030-17207-7_1

1.1 Blumlein Pair: XY Recording and Playback

The XY technique dates back to Blumlein's patent from the 1930s [1] and his patents thereafter [4]. At that time, manufacturers had started producing ribbon microphones, nowadays outdated, which offered the means to record with figure-of-eight pickup patterns.

Blumlein Pair using 90°-angled figure-of-eight microphones (XY). Blumlein's classic coincident microphone pair [3, Fig. 3] uses two figure-of-eight microphones pointing to ±45°, see Fig. 1.1. Its directional pickup pattern is described by cos φ, where φ is the angle enclosed by the microphone aiming direction and the sound source. Using a mathematically positive coordinate definition for X (front-right) and Y (front-left), with the polar angle ϕ = 0 aiming at the front, the figure-of-eight X uses the angle φ = ϕ + 45° and Y the angle φ = ϕ − 45°, so that the pickup pattern of the microphone pair is:

g_XY(ϕ) = [cos(ϕ + 45°); cos(ϕ − 45°)].   (1.1)

Assuming a signal s coming from the angle ϕ, the signals recorded are [X, Y]ᵀ = g_XY(ϕ) s. Sound sources from the left 45°, the front 0°, and the right −45° will be received by the pairs of gains:

right: g_XY(−45°) = [1; 0],   center: g_XY(0°) = [1/√2; 1/√2],   left: g_XY(45°) = [0; 1].

Fig. 1.1 Blumlein pair consisting of 90°-angled figure-of-eight microphones: (a) Blumlein XY pair, (b) picture of the recording setup.

Obviously, a source moving from the right −45° to the left 45° pans the signal from the channel X to the channel Y. This property provides a strongly perceivable lateralization of lateral sources when feeding the left and right channels of a stereophonic loudspeaker pair by Y and X, respectively. However, ideally there should not be any dominant sounds arriving from the sides, as for the source angles between −135° ≤ ϕ ≤ −45° and 45° ≤ ϕ ≤ 135° the Blumlein pair produces out-of-phase signals between X and Y. The back directions are mapped with consistent sign again, however left-right reversed. It is only possible to avoid this by decreasing the angle between the microphone pair, which, however, would make the stereo image narrower. Therefore, coincident XY recording pairs nowadays most often use cardioid directivities 1/2 + 1/2 cos φ instead. They receive all directions without sign change and easily permit stereo width adjustments by varying the angle between the microphones.

1.2 MS Recording and Playback

Blumlein's patent [1] considers sum and difference signals between a pair of channels/microphones, yielding M-S stereophony. In M-S [8], the sum signal represents the mid (omnidirectional, sometimes cardioid-directional to the front) and the difference the side signal (figure-of-eight). MS recordings can also be taken with cardioid microphones and permit manipulation of the stereo width of the recording.
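Before turning to the MS details, the XY pickup gains of Eq. (1.1) and the out-of-phase side region discussed in Sect. 1.1 can be checked numerically. A minimal Python/NumPy sketch (our own illustration; function names are not from the book's toolchain):

```python
import numpy as np

def g_xy(phi_deg):
    """Blumlein XY pickup gains, Eq. (1.1), for a source at polar angle phi (degrees)."""
    phi = np.radians(phi_deg)
    return np.array([np.cos(phi + np.pi / 4),   # X: cos(phi + 45°), aims front-right
                     np.cos(phi - np.pi / 4)])  # Y: cos(phi - 45°), aims front-left

# Sources from the right (-45°), center (0°), and left (+45°):
print(g_xy(-45))   # ≈ [1, 0]: signal only in X
print(g_xy(0))     # ≈ [0.707, 0.707]: equal gains
print(g_xy(45))    # ≈ [0, 1]: signal only in Y

# Lateral sources (45° < |phi| < 135°) yield out-of-phase X/Y signals:
x, y = g_xy(90)
print(x * y < 0)   # True
```

The sign flip at ϕ = 90° is exactly the out-of-phase lateral region that motivates the cardioid-based XY pairs mentioned at the end of Sect. 1.1.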
MS recording by omnidirectional and figure-of-eight microphone (native MS). Mid-side recording can be done by using a pair of coincident microphones with an omnidirectional (mid, W) and a side-ways oriented figure-of-eight (side, Y) directivity, Fig. 1.2. The pair of pickup patterns is described by the vector

g_WY(ϕ) = [1; sin(ϕ)]   (1.2)

that depends on the angle ϕ of the sound source. Equation (1.2) maps a single sound s from ϕ to the mid W and side Y signals by the gains [W, Y]ᵀ = g_WY(ϕ) s:

left: g_WY(90°) = [1; 1],   center: g_WY(0°) = [1; 0],   right: g_WY(−90°) = [1; −1].

Fig. 1.2 Native mid-side recording with the coincident arrangement of an omnidirectional microphone heading front and a figure-of-eight microphone heading left: (a) native MS recording, (b) picture of the recording setup.

MS recording with a pair of 180°-angled cardioids. Two coincident cardioid microphones (cardioid directivity 1/2 + 1/2 cos ϕ) pointing to the polar angles 90° (left) and −90° (right) are also applicable to mid-side recording, Fig. 1.3. Their pickup patterns

g_C±90°(ϕ) = 1/2 [1 + cos(ϕ − 90°); 1 + cos(ϕ + 90°)] = 1/2 [1 + sin(ϕ); 1 − sin(ϕ)]   (1.3)

are encoded into the MS pickup patterns (W, Y) by a matrix

g_WY(ϕ) = [1, 1; 1, −1] g_C±90°(ϕ).   (1.4)

The matrix eliminates the cardioids' figure-of-eight characteristics by their sum signal, and their omnidirectional characteristics by the difference. We obtain the MS signal pair (W, Y) from the cardioid microphone signals as

[W; Y] = [1, 1; 1, −1] [C90°; C−90°].   (1.5)

Fig. 1.3 Mid-side recording by 180°-angled cardioids: (a) 180°-angled cardioid microphones, (b) picture of the recording setup.

Fig. 1.4 Change of the stereo width by modifying the balance between W and Y signals of MS (left). Decoding of the M/S signal pair (W, Y) to a stereo loudspeaker pair (right).

Decoding of MS signals to a stereo loudspeaker pair.
Decoding of the mid-side signal pair to the left and right loudspeaker is done by feeding both signals to both loudspeakers, however out-of-phase for the side signal, Fig. 1.4b:

[L; R] = 1/2 [1, 1; 1, −1] [W; Y].   (1.6)

An interesting aspect of the 180°-angled cardioid microphone MS is that after inserting the cardioid-to-MS encoder Eq. (1.5) into the decoder Eq. (1.6), a brief calculation shows that the matrices invert each other. In this case, the cardioid signals are directly fed to the loudspeakers, [L, R] = [C90°, C−90°].

Stereo width. Modifying the mid versus side signal balance before stereo playback, using a blending parameter α, allows changing the width of the stereo image from α = 0 (narrow) to α = 1 (full), Fig. 1.4a, see also [9]:

[L; R] = 1/2 [1, 1; 1, −1] [2 − α, 0; 0, α] [W; Y].   (1.7)

In stereophonic MS playback, the playback loudspeaker directions at ±30° are not identical to the peaks of the recording pickup pattern of the side channel (Y) at ±90°. Ambisonics assumes a stricter correspondence between the directional patterns of recording and the patterns mapped onto the playback system.

1.3 First-Order Ambisonics (FOA)

After Cooper and Shiga [10] worked on expressing panning strategies for arbitrary surround loudspeaker setups in terms of a directional Fourier series, the notion and technology of Ambisonics was developed by Fellgett [11], Gerzon [7], and Craven [12]. In particular, they were also considering a suitable recording technology. Essentially based on similar considerations as MS, one can define first-order Ambisonic recording. For 2D recordings, a Double-MS microphone arrangement is suitable and only requires one more microphone than MS recording: a front-back oriented figure-of-eight microphone. The scheme is extended to 3D first-order Ambisonics by a third figure-of-eight microphone with up-down aiming. Oftentimes, first-order Ambisonics is still the basis of nowadays' virtual reality applications and 360° audio streams on the internet.
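The "brief calculation" from Sect. 1.2, showing that the cardioid-to-MS encoder (1.5) and the MS decoder (1.6) invert each other, and the width control of Eq. (1.7) can be sketched numerically (Python/NumPy; an illustrative sketch, variable names are ours):

```python
import numpy as np

# Cardioid-to-MS encoder, Eq. (1.5), and MS-to-stereo decoder, Eq. (1.6)
ENC = np.array([[1, 1],
                [1, -1]])          # [W, Y] from [C+90, C-90]
DEC = 0.5 * np.array([[1, 1],
                      [1, -1]])    # [L, R] from [W, Y]

# The matrices invert each other: cardioid signals pass straight to L/R
print(DEC @ ENC)                   # identity matrix

def ms_width(w, y, alpha):
    """Stereo width control, Eq. (1.7): alpha = 0 (narrow) ... alpha = 1 (full)."""
    return DEC @ np.diag([2 - alpha, alpha]) @ np.array([w, y])

# alpha = 1 reproduces the plain MS decoding of Eq. (1.6)
print(ms_width(1.0, 0.4, 1.0))     # equals DEC @ [1.0, 0.4]
```

At α = 0, only the mid signal survives and both loudspeakers receive identical signals (a mono, maximally narrow image), which matches the narrow end of Fig. 1.4a.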
In addition to potential loudspeaker playback, it permits interactive playback on head-tracked headphones to render the acoustic sound scene static around the listener. First-order Ambisonic recording has the advantage that it can be done with only a few high-quality microphones. However, the sole distribution of first-order Ambisonic recordings to playback loudspeakers is typically not convincing without going to higher orders and directional enhancements (Sect. 5.8).

1.3.1 2D First-Order Ambisonic Recording and Playback

The first-order Ambisonic format in 2D consists of one signal corresponding to an omnidirectional pickup pattern (called W), and two signals corresponding to figure-of-eight pickup patterns aligned with the Cartesian axes (X and Y).

Native 2D Ambisonic recording (Double-MS). To record the Ambisonic channels W, X, Y, one can use a Double-MS arrangement as shown in Fig. 1.5.

Fig. 1.5 Native 2D first-order Ambisonic recording with an omnidirectional and a figure-of-eight microphone heading front, and a figure-of-eight microphone heading left; photo shown on the right

2D Ambisonic recording with four 90°-angled cardioids. Extending the MS scheme for recording with cardioid microphones, Fig. 1.3, cardioid microphones could be used to obtain the front-back and left-right figure-of-eight pickup patterns by corresponding pair-wise differences, and one omnidirectional pattern as their sum, Fig. 1.6. However, the use of 4 microphones for only 3 output signals is inefficient.

Fig. 1.6 2D first-order Ambisonic recording with four 90°-angled cardioid microphones, sums and differences between them (front±back, left±right)

Fig.
1.7 2D first-order Ambisonics with three 120°-angled cardioid microphones

2D Ambisonic recording with three 120°-angled cardioids. Assuming 3 coincident cardioid microphones aiming at the angles 0°, ±120° in the horizontal plane, cf. Fig. 1.7, we obtain as the pickup pattern for the incoming sound

\[
\mathbf{g}(\varphi) = \frac{1}{2} + \frac{1}{2}\begin{bmatrix} \cos\varphi \\ \cos(\varphi + 120^\circ) \\ \cos(\varphi - 120^\circ) \end{bmatrix}.
\]

Combining all three microphone signals yields an omnidirectional pickup pattern, as \(\sum_{k=0}^{N-1} \cos\bigl(\varphi + \tfrac{2\pi}{N}k\bigr) = 0\). Moreover, introducing the differences between the front and the two back microphone signals, and between the left and right microphone signals, yields an encoding matrix to obtain the omnidirectional W and the two X and Y figure-of-eight characteristics

\[
\frac{2}{3}\begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & -1 \\ 0 & -\sqrt{3} & \sqrt{3} \end{bmatrix} \mathbf{g}(\varphi)
= \begin{bmatrix} 1 \\ \cos\varphi \\ \sin\varphi \end{bmatrix}.
\tag{1.8}
\]

Fig. 1.8 2D first-order Ambisonic decoding to 4 loudspeakers

2D Ambisonic decoding to loudspeakers. The W, X, and Y channels of 2D first-order Ambisonics (Double-MS) can easily be played on an arrangement of four loudspeakers: front, back, left, right. While the omnidirectional signal contribution is played by all of the loudspeakers, the figure-of-eight contributions are played out-of-phase by the corresponding front-back or left-right pair of loudspeakers, Fig. 1.8:

\[
\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix} =
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}.
\tag{1.9}
\]

The decoding weights obviously discretize the directional pickup characteristics of the Ambisonic channels at the directions of the loudspeaker layout. Consequently, if the loudspeaker layout is more arbitrary and described by the set of its angles \(\{\varphi_l\}\), the sampling decoder can be given as

\[
\begin{bmatrix} S_{\varphi_1} \\ \vdots \\ S_{\varphi_L} \end{bmatrix} =
\frac{1}{2}\begin{bmatrix} 1 & \cos\varphi_1 & \sin\varphi_1 \\ \vdots & \vdots & \vdots \\ 1 & \cos\varphi_L & \sin\varphi_L \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}.
\tag{1.10}
\]

To achieve a panning-invariant and balanced mapping by this decoder, loudspeakers should be evenly arranged.
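As an illustrative sketch (ours, not from the book), the 2D sampling decoder of Eq. (1.10) applied to an encoded source reduces to the panning gains \(\tfrac{1}{2}\bigl(1 + \cos(\varphi_l - \varphi_s)\bigr)\), i.e., a cardioid evaluated at each loudspeaker angle:

```python
import numpy as np

def encode_2d(azi):
    """First-order 2D Ambisonic encoding gains [W, X, Y] for azimuth azi (radians)."""
    return np.array([1.0, np.cos(azi), np.sin(azi)])

def sampling_decoder_2d(speaker_azis):
    """Eq. (1.10): rows sample the W/X/Y patterns at the loudspeaker angles."""
    return 0.5 * np.column_stack([np.ones(len(speaker_azis)),
                                  np.cos(speaker_azis),
                                  np.sin(speaker_azis)])

# Square layout at 0°, 90°, 180°, -90° (front, left, back, right)
azis = np.radians([0.0, 90.0, 180.0, -90.0])
D = sampling_decoder_2d(azis)

# A source panned to the front yields a cardioid gain pattern over the loudspeakers:
gains = D @ encode_2d(np.radians(0.0))
# front: 1/2·(1+1) = 1, left/right: 1/2, back: 0
```

The same matrix `D` with the four fixed angles reproduces the decoder of Eq. (1.9) up to the factor 1/2.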
Moreover, it can be favorable to sharpen the spatial image by attenuating W by \(\frac{1}{\sqrt{3}}\) to map a sound by a supercardioid playback pattern.

Playback to head-tracked headphones and interactive rotation. In headphone playback, the headphone signals are generated by convolution with the head-related impulse responses (HRIRs) of all four loudspeakers contributing to the left and the right ear signals:

\[
\begin{bmatrix} L_{\mathrm{ear}} \\ R_{\mathrm{ear}} \end{bmatrix} =
\begin{bmatrix}
h_L^{0^\circ}(t)* & h_L^{90^\circ}(t)* & h_L^{180^\circ}(t)* & h_L^{-90^\circ}(t)* \\
h_R^{0^\circ}(t)* & h_R^{90^\circ}(t)* & h_R^{180^\circ}(t)* & h_R^{-90^\circ}(t)*
\end{bmatrix}
\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix}.
\tag{1.11}
\]

Fig. 1.9 2D first-order Ambisonic decoding to head-tracked headphones

To rotate the Ambisonic input scene of the decoder, it is sufficient to obtain a new set of figure-of-eight signals by mixing the X, Y channels with the following matrix depending on the rotation angle \(\rho\), keeping W unaltered:

\[
\begin{bmatrix} W \\ \tilde{X} \\ \tilde{Y} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\rho & -\sin\rho \\ 0 & \sin\rho & \cos\rho \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \end{bmatrix}.
\tag{1.12}
\]

This effect is important for head-tracked headphone playback to render the VR/360° audio scene static around the listener. A complete playback system is shown in Fig. 1.9. The big advantage of such a system is that rotational updates can be done at high control rates while the HRIRs of the convolver stay constant.

1.3.2 3D First-Order Ambisonic Recording and Playback

The first-order Ambisonic format in 3D consists of a signal W corresponding to an omnidirectional pickup pattern, and three signals (X, Y, and Z) corresponding to figure-of-eight pickup patterns aligned with the Cartesian coordinate axes. In three dimensions, we cannot work with figure-of-eight patterns described by \(\sin\varphi\) or \(\cos\varphi\) of the azimuth angle alone anymore. It is more convenient to describe an arbitrarily oriented figure-of-eight characteristic \(\cos\phi\) using the inner product between a variable direction vector (direction of the arriving sound) and a fixed direction vector (microphone direction).
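A minimal sketch of the scene rotation in Eq. (1.12), with our own function names (not the book's code): rotating the scene by \(\rho\) moves a source encoded at azimuth \(\varphi\) to \(\varphi + \rho\), while W stays unaltered:

```python
import numpy as np

def rotate_foa_2d(w, x, y, rho):
    """Eq. (1.12): rotate the horizontal FOA scene by rho (radians); W is unchanged."""
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(rho), -np.sin(rho)],
                  [0.0, np.sin(rho), np.cos(rho)]])
    return R @ np.array([w, x, y])

# A source encoded at 0° (W=1, X=cos 0=1, Y=sin 0=0), rotated by +90°,
# equals the encoding of a source at 90°: W=1, X=0, Y=1.
w, x, y = rotate_foa_2d(1.0, 1.0, 0.0, np.radians(90.0))
```

For head tracking, \(\rho\) would be set to the negative of the tracked head azimuth, so the scene stays static around the listener.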
Direction vectors are of unit length, \(\|\boldsymbol{\theta}\| = 1\), and their inner product corresponds to \(\boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta} = \cos\phi\), where \(\phi\) is the angle enclosed by the direction of arrival \(\boldsymbol{\theta}\) and the microphone direction \(\boldsymbol{\theta}_1\). Consequently, a cardioid pickup pattern aiming at \(\boldsymbol{\theta}_1\) is described by \(\tfrac{1}{2} + \tfrac{1}{2}\boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}\).

Native 3D Ambisonic recording (Triple-MS). To record the Ambisonic channels W, X, Y, Z, one can use a Triple-MS scheme as shown in Fig. 1.10. With the transposed unit direction vectors representing the aiming of the figure-of-eight channels, \(\boldsymbol{\theta}_X^\mathrm{T} = [1, 0, 0]\), \(\boldsymbol{\theta}_Y^\mathrm{T} = [0, 1, 0]\), \(\boldsymbol{\theta}_Z^\mathrm{T} = [0, 0, 1]\), to produce the direction dipoles \(\boldsymbol{\theta}_X^\mathrm{T}\boldsymbol{\theta}\), \(\boldsymbol{\theta}_Y^\mathrm{T}\boldsymbol{\theta}\), and \(\boldsymbol{\theta}_Z^\mathrm{T}\boldsymbol{\theta}\), we can mathematically describe the pickup patterns of native 3D first-order Ambisonics as

\[
\mathbf{g}_{\mathrm{WXYZ}}(\boldsymbol{\theta}) =
\begin{bmatrix} 1 \\ \boldsymbol{\theta}_X^\mathrm{T}\boldsymbol{\theta} \\ \boldsymbol{\theta}_Y^\mathrm{T}\boldsymbol{\theta} \\ \boldsymbol{\theta}_Z^\mathrm{T}\boldsymbol{\theta} \end{bmatrix}
= \begin{bmatrix} 1 \\ \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix}\boldsymbol{\theta} \end{bmatrix}
= \begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix}.
\tag{1.13}
\]

Fig. 1.10 Native 3D first-order Ambisonic recording with an omnidirectional and three figure-of-eight microphones aligned with the Cartesian axes X, Y, Z: (a) native 3D FOA recording, (b) picture of the recording setup

3D Ambisonic recording with a tetrahedral arrangement of cardioids. The principle that worked for three cardioid microphones on the horizon also works for a coincident tetrahedral microphone array of cardioids with the aiming directions FLU-FRD-BLD-BRU, see Fig. 1.11, and [12],

\[
\mathbf{g}(\boldsymbol{\theta}) = \frac{1}{2} + \frac{1}{2}
\begin{bmatrix} \boldsymbol{\theta}_{\mathrm{FLU}}^\mathrm{T} \\ \boldsymbol{\theta}_{\mathrm{FRD}}^\mathrm{T} \\ \boldsymbol{\theta}_{\mathrm{BLD}}^\mathrm{T} \\ \boldsymbol{\theta}_{\mathrm{BRU}}^\mathrm{T} \end{bmatrix}\boldsymbol{\theta}
= \frac{1}{2} + \frac{1}{2\sqrt{3}}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}\boldsymbol{\theta}.
\tag{1.14}
\]

Encoding is achieved there by the matrix that adds all microphone signals in the first line (W, omnidirectional), subtracts back from front microphone signals in the second line (X, figure-of-eight), subtracts right from left microphone signals in the third line (Y, figure-of-eight), and subtracts down from up microphone signals in the last line (Z, figure-of-eight), see also Fig. 1.11,

\[
\mathbf{g}_{\mathrm{WXYZ}}(\boldsymbol{\theta}) = \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ \sqrt{3} & \sqrt{3} & -\sqrt{3} & -\sqrt{3} \\ \sqrt{3} & -\sqrt{3} & \sqrt{3} & -\sqrt{3} \\ \sqrt{3} & -\sqrt{3} & -\sqrt{3} & \sqrt{3} \end{bmatrix}
\mathbf{g}(\boldsymbol{\theta}).
\tag{1.15}
\]

Fig. 1.11 Tetrahedral arrangement of cardioid microphones for 3D first-order Ambisonics and their encoding: (a) tetrahedral array with four cardioids, (b) encoder of microphone signals; microphone capsules point inwards to minimize their spacing

Fig. 1.12 Practical tetrahedral recording setups with cardioid microphones; the Soundfield SPS200, Oktava MK4012, and Soundfield ST450 offer a fixed 4-capsule geometry. Equally important: Zoom's H3-VR, Røde's NT-SF1, Sennheiser's AMBEO VR Mic

As Fig. 1.12 shows, practical microphone layouts should be as closely spaced as possible. Nevertheless, for high frequencies the microphones cannot be considered coincident anymore, and besides a directional error, there will be a loss of presence in the diffuse field. Typically, a shelving filter is used to slightly boost high frequencies. Roughly, a high-shelf filter with a 3 dB boost is sufficient to correct timbral defects at frequencies above which the microphone spacing exceeds half a wavelength, e.g., 5 kHz for a 3.4 cm spacing of the microphones. More advanced strategies are found, e.g., in [7, 13–15].

3D Ambisonic decoding to loudspeakers. As before in the 2D case, a sampling decoder can be defined that represents the continuous directivity patterns associated with the channels W, X, Y, Z to map the signals to the discrete directions of the loudspeakers. Given the set of loudspeaker directions \(\{\boldsymbol{\theta}_l\}\) and the unit vectors to X, Y, Z, the loudspeaker signals of the sampling decoder become

\[
\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} =
\frac{1}{2}\underbrace{\begin{bmatrix}
1 & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_X & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_Z \\
\vdots & \vdots & \vdots & \vdots \\
1 & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_X & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta}_Z
\end{bmatrix}}_{\boldsymbol{D}}
\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} 1 & \boldsymbol{\theta}_1^\mathrm{T} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^\mathrm{T} \end{bmatrix}
\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}.
\tag{1.16}
\]

Equivalent panning function/virtual microphone.
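The A-format-to-B-format step of Eqs. (1.14)–(1.15) can be verified numerically; a rough sketch with our own variable names (not the book's code):

```python
import numpy as np

# Cardioid aiming directions FLU, FRD, BLD, BRU as rows, cf. Eq. (1.14)
DIRS = np.array([[1, 1, 1],
                 [1, -1, -1],
                 [-1, 1, -1],
                 [-1, -1, 1]]) / np.sqrt(3.0)

def capsule_gains(theta):
    """Cardioid pickup of the four capsules for a unit direction vector theta."""
    return 0.5 + 0.5 * DIRS @ theta

# Encoder of Eq. (1.15): W is half the sum; X, Y, Z are sqrt(3)-scaled differences.
ENC = 0.5 * np.array([[1, 1, 1, 1],
                      [1, 1, -1, -1],
                      [1, -1, 1, -1],
                      [1, -1, -1, 1]], dtype=float)
ENC[1:] *= np.sqrt(3.0)

theta = np.array([0.0, 1.0, 0.0])        # sound arriving from the left
wxyz = ENC @ capsule_gains(theta)
# recovers the native patterns [1, theta] of Eq. (1.13): [1, 0, 1, 0]
```

The same check with arbitrary unit vectors `theta` confirms that the encoder exactly inverts the tetrahedral pickup, as long as the capsules are ideally coincident cardioids.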
The sampling decoder together with the native Ambisonic directivity patterns \(\mathbf{g}_{\mathrm{WXYZ}}^\mathrm{T}(\boldsymbol{\theta}) = [1, \boldsymbol{\theta}^\mathrm{T}]\) yields the mapping of a signal \(s\) from the direction \(\boldsymbol{\theta}\) to the loudspeakers:

\[
\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} =
\frac{1}{2}\begin{bmatrix} 1 & \boldsymbol{\theta}_1^\mathrm{T} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^\mathrm{T} \end{bmatrix}
\begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix} s
= \frac{1}{2}\begin{bmatrix} 1 + \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta} \\ \vdots \\ 1 + \boldsymbol{\theta}_L^\mathrm{T}\boldsymbol{\theta} \end{bmatrix} s.
\tag{1.17}
\]

This result means that the gain of a source from \(\boldsymbol{\theta}\) at each loudspeaker \(\boldsymbol{\theta}_l\) corresponds to evaluating a cardioid pattern aligned with \(\boldsymbol{\theta}\). Consequently, the Ambisonic mapping corresponds to a signal distribution to the loudspeakers using weights obtained by discretizing an Ambisonics-equivalent first-order panning function. Equivalently, Ambisonic playback using a sampling decoder is comparable to recording each loudspeaker signal with a virtual first-order cardioid microphone aligned with the loudspeaker's direction \(\boldsymbol{\theta}_l\). It is decisive for a panning-independent loudness mapping and balanced performance that the directions of the loudspeaker layout are well chosen. Also, it can be preferable to reduce the level of the omnidirectional channel W by \(\frac{1}{\sqrt{3}}\) to map a sound by the narrower supercardioid playback pattern instead of a cardioid pattern, which is rather broad. Decoder design problems were addressed early by Gerzon [16], Malham [17], and Daniel [18]. A current solution for higher-order decoding is given in Sect. 4.9.6 on All-Round Ambisonic Decoding.

3D Ambisonic decoding to headphones. 3D Ambisonic decoding to headphones uses the same approach as for 2D above, except that additional rotational degrees of freedom are implemented to compensate for any change in head orientation. Rotation concerns the three directional components X, Y, Z:

\[
\begin{bmatrix} \tilde{X} \\ \tilde{Y} \\ \tilde{Z} \end{bmatrix} =
\boldsymbol{R}(\alpha, \beta, \gamma)
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}.
\tag{1.18}
\]

For the definition of the rotation matrix \(\boldsymbol{R}(\alpha, \beta, \gamma)\) and the meaning of its angles, refer to Eq. 5.5 of Sect. 5.2.2. The selection of a suitable set of HRIRs is a question of directional discretization of the 3D directions, as addressed in the decoder above.
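A sketch (ours) of the first-order panning gains of Eq. (1.17), including the optional W attenuation by \(1/\sqrt{3}\) mentioned above for supercardioid playback patterns:

```python
import numpy as np

def foa_panning_gains(speaker_dirs, source_dir, w_gain=1.0):
    """Eq. (1.17): gain per loudspeaker = 1/2 * (w_gain + theta_l^T theta).
    w_gain=1 yields cardioid patterns; w_gain=1/sqrt(3) yields supercardioids."""
    return 0.5 * (w_gain + speaker_dirs @ source_dir)

# Octahedral layout: front, left, back, right, top, bottom
speakers = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0],
                     [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

g_card = foa_panning_gains(speakers, np.array([1.0, 0.0, 0.0]))
# cardioid: front 1, lateral loudspeakers 0.5, back 0
g_super = foa_panning_gains(speakers, np.array([1.0, 0.0, 0.0]),
                            w_gain=1.0 / np.sqrt(3.0))
# supercardioid: the back loudspeaker gets a small negative (out-of-phase) gain
```

Each row of gains is exactly the virtual cardioid (or supercardioid) microphone aligned with the loudspeaker direction, evaluated at the source direction.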
Signals obtained for the virtual loudspeakers are again convolved with the corresponding HRIRs for the left and the right ear.

1.4 Practical Free-Software Examples

The practical examples below show first-order Ambisonic panning of a mono sound, decoded to simple loudspeaker layouts. These are either a square layout with 4 loudspeakers at the azimuth angles [0°, 90°, 180°, −90°] or an octahedral layout with 6 loudspeakers at azimuth [0°, 90°, 180°, −90°, 0°, 0°] and elevation [0°, 0°, 0°, 0°, 90°, −90°].

1.4.1 Pd with Iemmatrix, Iemlib, and Zexy

Pd is free and can load and install its extensions from the internet. Required software components are:

• pure-data (free, http://puredata.info/downloads/pure-data)
• iemmatrix (free download within pure-data)
• zexy (free download within pure-data)
• iemlib (free download within pure-data)

Figure 1.13 gives an example of horizontal (2D) first-order Ambisonic panning, decoded to 4 loudspeaker signals and 2 headphone signals. Figure 1.14 shows the processing inside the Pd abstraction [FOA_binaural_decoder] contained in the Fig. 1.13 example, which uses SADIE database¹ subject 1 (KU100 dummy head) HRIRs to render headphone signals. Figure 1.15 sketches first-order Ambisonic panning in 3D with decoding to an octahedral loudspeaker layout; master level [multiline∼] and hardware outlets [dac∼] were omitted for easier readability.

1.4.2 Ambix VST Plugins

This example uses a DAW and ready-to-use VST plug-ins to render first-order Ambisonics. As DAW, we recommend Reaper (reaper.fm) because it nicely facilitates higher-order Ambisonics by allowing tracks of up to 64 channels. Moreover, it is relatively low-priced and there is a fully functional free evaluation version available. You can also use any other DAW that supports VST and sufficiently many multi-track

¹ https://www.york.ac.uk/sadie-project/Resources/SADIEIIDatabase/D1/D1_HRIR_WAV.zip.
Fig. 1.13 First-order 2D encoding and decoding in pure data (Pd) for a square layout

Fig. 1.14 Binaural rendering to headphones in pure data (Pd) by convolution with SADIE KU100 HRIRs

channels. The example employs the freely available ambiX plug-in suite (http://www.matthiaskronlachner.com/?p=2015), although there exist other Ambisonic plug-ins, especially for first order.

Track name        Ins  Outs  FX
Virtual source 1   1    4    ambix_encoder_o1
MASTER             4    6    ambix_decoder_o1
Fig. 1.15 First-order 3D encoding and decoding in pure data (Pd) using [mtx_spherical_harmonics] for an octahedral layout ([dac∼] omitted for simplicity)

After creating the new track for the virtual source and importing a mono/stereo audio file (via drag-and-drop), the next step is the setup of the track channels. As shown in the table, the virtual source has a single-channel (mono) input and 4 output channels to send the 4 channels of first-order Ambisonics to the Master. The option to send to the Master is activated by default, cf. left in Fig. 1.16. The Master track itself requires 4 input channels and 6 output channels to feed the 6 loudspeakers (right). In Reaper, there is no separate adjustment for input and output channels, thus the Master track has to be set to 6 channels. In the source track FX, the ambix_encoder_o1 plug-in can be used to encode the virtual source signal at an arbitrary location on a sphere by inserting it into the track of the virtual source, cf. its panning GUI in Fig. 1.17. For adding more sources, the track of the virtual source can simply be copied or duplicated. All effects and routing options are maintained for the new tracks.
In order to decode the 4 first-order Ambisonic Master channels to the loudspeakers, the ambix_decoder_o1 plug-in is added to the Master track. The plug-in requires a preset that defines the decoding matrix and its channel sequence and normalization. For the exemplary octahedral setup with 6 loudspeakers, the following text can be copied to a text file and saved as a config file, e.g., "octahedral.config". The decoder matrix contains W, −Y, Z, X, with W as a constant and −Y, Z, X referring to the Cartesian coordinates of the octahedron.

Fig. 1.16 1st-order example in Reaper DAW: routing

Fig. 1.17 1st-order example in Reaper DAW: encoder

#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END

Fig. 1.18 1st-order example in Reaper DAW: decoder

After loading the preset into the decoder plug-in, the decoder can generate the loudspeaker signals as shown in Fig. 1.18. In the example, the virtual source is panned to the front, resulting in the highest level for loudspeaker 1 (front). Loudspeaker 3 (back) is 12 dB quieter because of a side-lobe-suppressing supercardioid weighting implied by the switch /coeff_scale n3d, a trick to keep things simple. As shown on the SADIE-II website², the SADIE-II head-related impulse responses can be used to render Ambisonics to headphones. The listing below shows a configuration file to be used with ambix_binaural, cf. Fig. 1.19, again using the trick of selecting n3d to keep the numbers simple and obtain a supercardioid weighting:

#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#HRTF
44K_16bit/azi_0,0_ele_0,0.wav
44K_16bit/azi_90,0_ele_0,0.wav
44K_16bit/azi_180,0_ele_0,0.wav
44K_16bit/azi_270,0_ele_0,0.wav
44K_16bit/azi_0,0_ele_90,0.wav
44K_16bit/azi_0,0_ele_-90,0.wav
#END

² https://www.york.ac.uk/sadie-project/ambidec.html.
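The roughly 12 dB attenuation of the back loudspeaker can be checked with a small calculation (our sketch): with the supercardioid weighting, the front and back gains of Eq. (1.17) with W scaled by \(1/\sqrt{3}\) differ by about 11.4 dB:

```python
import numpy as np

# Supercardioid panning gains for a frontal source on the octahedron, cf. Eq. (1.17)
w = 1.0 / np.sqrt(3.0)        # W attenuation implied by the n3d trick
g_front = 0.5 * (w + 1.0)     # loudspeaker 1, aligned with the source
g_back = 0.5 * (w - 1.0)      # loudspeaker 3, opposite direction (out of phase)

level_diff_db = 20.0 * np.log10(g_front / abs(g_back))
# ≈ 11.4 dB, i.e., roughly the 12 dB difference observed in the decoder display
```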
Fig. 1.19 1st-order example in Reaper DAW: binaural decoder

#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END

For decoding to less regular loudspeaker layouts, the IEM AllRADecoder³ permits editing loudspeaker coordinates and automatically calculating a decoder within the plug-in. For decoding to headphones, the IEM BinauralDecoder offers a high-quality decoder. The technology behind both plug-ins is explained in Chap. 4. In addition to the virtual sources, you can also add a 4-channel recording done with a B-format microphone by placing the 4-channel file in a new track. Reaper will automatically set the number of track channels to 4 and send the channels to the Master. Note that some B-format microphones use a different order and/or weighting of the Ambisonic channels. Simple conversion to the AmbiX format can be done by inserting the ambix_converter_o1 plug-in into the microphone track.

1.5 Motivation of Higher-Order Ambisonics

Diffuseness, spaciousness, depth? Diffuse sound fields are typically characterized by sound arriving randomly from evenly distributed directions at evenly distributed delays. It is practical knowledge that the impression of diffuseness and spaciousness benefits from decorrelated signals, which is typically achieved by large distances between the microphones rather than by coincident microphones. Due to the evenness of diffuse sound fields, one would still hope that a low spatial resolution is sufficient to map the diffuseness and spatial depth of a room using coincident microphones or first-order Ambisonics. Nevertheless, high directional correlation during playback destroys this hope and in fact yields a perceptually impeded playback of diffuseness, spaciousness, and depth.

³ https://plugins.iem.at/.
The technical advantages in interactivity and VR, as well as the known shortcomings of first-order coincident recording techniques, offer enough motivation to increase the directional resolution and go to higher-order Ambisonics, as presented in the subsequent chapters. For professional productions, it is often not sufficient to rely only on first-order coincident microphone recordings. By contrast, higher-order Ambisonics is able to drastically improve the mapping of diffuseness, spaciousness, and depth, as shown in the upcoming chapter on the psychoacoustic properties of many-loudspeaker systems. Recording with a higher-order main microphone array increases the required technological complexity. Nevertheless, digital signal processing and the theory presented in the later chapters are nowadays powerful enough to achieve this goal. After all, delay-based stereophonic recording, such as AB, or equivalence-based recording, such as ORTF, INA5, etc., is often required, and its mapping properties for spaciousness and diffuseness are well known. What is nice about higher-order Ambisonics: it can make use of these benefits by embedding such recordings appropriately, see Fig. 1.20.

Facts about higher orders: Ambisonics extended to higher orders permits a refinement of the directional resolution and hereby improves the mapping of uncorrelated sounds in playback. Figure 1.21a shows the correlation introduced in two neighboring loudspeaker signals when using Ambisonics, given their spacing of 60°. Given the just noticeable difference (JND) of the inter-aural cross correlation, the figure indicates that an Ambisonic order of ≥3 might be necessary to perceptually preserve decorrelation.

Fig. 1.20 How is a microphone tree represented in Ambisonics, when it consists of 6 cardioids spaced by 60 cm and 60° on a horizontal ring, and a ring of 4 supercardioids spaced by 40 cm and 90° as a height layer, pointing upwards?
Fig. 1.21 Relation between the Ambisonic order, decorrelation, and perceived depth: (a) inter-channel cross correlation of two Ambisonically driven loudspeakers spaced by 60°, (b) perceived depth in dependence of the Ambisonic playback order for two listening positions (1: center, 2: off-center)

For this reason, the perception of spatial depth strongly improves when increasing the Ambisonic order from 1 up to 3, Fig. 1.21b. However, this is only the case when seated at the central listening position. Outside this sweet spot, orders higher than 3, e.g., 5, additionally improve the mapping of depth [19]. Therefore, higher-order Ambisonics is important for preserving spatial impressions when supplying a large audience.

Fig. 1.22 Perceptual sweet area of Ambisonic playback from the front at first (light gray), third (dark gray), and fifth order (black). It marks the area in which the perceived direction is plausible, i.e., does not collapse into a single loudspeaker other than C

Figure 1.22 shows that the sweet area of perceptually plausible playback increases with the Ambisonic order [20]. With fifth-order Ambisonics, nearly all the area spanned by the horizontal loudspeakers at the IEM CUBE, the 12 × 10 m concert space at our lab, becomes a valid listening area.

References

1. A.D. Blumlein, Improvements in and relating to sound-transmission, sound-recording and sound-reproducing systems. GB patent 394325 (1933), http://www.richardbrice.net/blumlein_patent.htm
2. M.A. Gerzon, Why coincident microphones? Reproduced from Studio Sound, by permission of IPC Media Ltd, publishers of Hi-Fi News (www.hifinews.co.uk), vol.
13 (1971), https://www.michaelgerzonphotos.org.uk/articles/Why
3. P.B. Vanderlyn, In search of Blumlein: the inventor incognito. J. Audio Eng. Soc. 26(9) (1978)
4. S.P. Lipshitz, Stereo microphone techniques: are the purists wrong? J. Audio Eng. Soc. 34(9) (1986)
5. S. Weinzierl, Handbuch der Audiotechnik (Springer, Berlin, 2008)
6. A. Friesecke, Die Audio-Enzyklopädie (K. G. Saur, 2007)
7. M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround sound, prepr. L-20 of 50th Audio Eng. Soc. Conv. (1975)
8. G. Bore, S.F. Temmer, M-S stereophony and compatibility. Audio Magazine (1958), http://www.americanradiohistory.com
9. M.A. Gerzon, Application of Blumlein shuffling to stereo microphone techniques. J. Audio Eng. Soc. 42(6) (1994)
10. D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. 20(5), 346–360 (1972)
11. P. Fellgett, Ambisonic reproduction of directionality in surround-sound systems. Nature 252, 534–538 (1974)
12. P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space and yielding various directional outputs. U.S. Patent no. 4,042,779 (1977)
13. C. Faller, M. Kolundžija, Design and limitations of non-coincidence correction filters for soundfield microphones, in prepr. 7766, 126th AES Conv., Munich (2009)
14. J.-M. Batke, The B-format microphone revisited, in 1st Ambisonics Symposium, Graz (2009)
15. A. Heller, E. Benjamin, Calibration of soundfield microphones using the diffuse-field response, in prepr. 8711, 133rd AES Conv., San Francisco (2012)
16. M. Gerzon, General metatheory of auditory localization, in prepr. 3306, Conv. Audio Eng. Soc. (1992)
17. D.G. Malham, A. Myatt, 3D sound spatialization using ambisonic techniques. Comput. Music J. 19(4), 58–70 (1995)
18. J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in AES 16th International Conference: Spatial Sound Reproduction (1999)
19. M.
Frank, F. Zotter, Spatial impression and directional resolution in the reproduction of reverberation, in Fortschritte der Akustik – DAGA, Aachen (2016)
20. M. Frank, F. Zotter, Exploring the perceptual sweet area in Ambisonics, in AES 142nd Conv. (2017)

Chapter 2
Auditory Events of Multi-loudspeaker Playback

It is evident that until one knows what information needs to be presented at the listener's ears, no rational system design can proceed.
Michael A. Gerzon, 1976, and AES Vienna 1992 [1]

Abstract This chapter describes the perceptual properties of auditory events, the sound images that we localize in terms of direction and width, when distributing a signal with different amplitudes to one or a couple of loudspeakers. These amplitude differences are what methods for amplitude panning implement, and they are also what the mapping of any coincident-microphone recording implies when reproduced over the directions of a loudspeaker layout.
Therefore, several listening experiments on localization are described and analyzed that are essential to understand and model the psychoacoustic properties of amplitude panning on the multiple loudspeakers of a 3D audio system. For delay-based recordings or diffuse sounds, there is some relation; however, it is found to be less stable for the desired applications. Moreover, amplitude panning is not only about consistent directional localization. Loudness, spectrum, temporal structure, and the perceived width should be panning-invariant. The chapter also shows the experiments and models required to understand and provide those panning-invariant aspects, especially for moving sounds. It concludes with openly available response data of most of the presented listening experiments.

Starting from classic listening experiments on stereo panning by Leakey [2] and Wendt [3], and pairwise horizontal panning by Theile [4], this chapter explores the relevant perceptual properties for 3D amplitude panning and their models. Important experimental studies considered here are, for instance, those by Simon [5], Kimura [6], F. Wendt [7], Lee [8], Helm [9], and Frank [10, 11]. By the experimental results, it is possible to firmly establish Gerzon's [1] \(E\) and \(\mathbf{r}_E\) estimators for perceived loudness, direction, and width that apply to most stationary sounds in typical studio and performance environments.

2.1 Loudness

At a measurement point in the free field, the same signal fed to equalized loudspeakers at exactly the same acoustic distance would superimpose constructively (+6 dB).
In a room with early reflections and a less strict equality of the incoming pair of sounds (typical slight inaccuracies in loudspeaker/listener position, different mounting situations, different directions in the directivities of ears and loudspeakers), the superposition can be regarded as stochastically constructive (+3 dB), in particular at frequencies that are not very low. For this reason, typical amplitude-panning rules keep the weights distributing the signal to the loudspeakers normalized by the root of the squares instead of normalizing the linear sum, in order to obtain constant loudness ([12], VBAP):

\[
g_l \leftarrow \frac{g_l}{\sqrt{\sum_{l=1}^{L} g_l^2}}.
\tag{2.1}
\]

Loudness model. If all loudspeakers are equalized, located at the same distance to the listener, and fed by the same signal with different amplitude gains \(g_l\), a constructive interference could be expected, so that the amplitude becomes [1]

\[
P = \sum_{l=1}^{L} g_l.
\tag{2.2}
\]

However, the interference stops being strictly constructive as soon as the room is not entirely anechoic, the sitting position is not exactly centered, or even under anechoic and centered conditions at high frequencies, when the superposition at the ears cannot be assumed to be purely constructive anymore. Then it is better to assume a less well-defined, stochastic superposition in which the squared amplitude is determined by the sum of the squared weights [1]:

\[
E = \sum_{l=1}^{L} g_l^2.
\tag{2.3}
\]

Therefore, the most common amplitude-panning rules use root-of-squares normalization to obtain a loudness impression that is as constant as possible. The measure \(E\) seems to be most useful when designing and evaluating amplitude-panning or coincident-microphone techniques. It is not surprising that ITU-R BS.1770-4¹ uses the Leq(RLB) measure as a loudness model: it is essentially the RMS level after high-pass filtering, cf. [13], which is closely related to the \(E\) measure detected from the loudspeaker signals. An interesting refinement was proposed by Laitinen et al.
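A small numerical sketch (ours, not the book's code) of the normalization in Eq. (2.1) and the two superposition measures of Eqs. (2.2)–(2.3):

```python
import numpy as np

def normalize_gains(g):
    """Eq. (2.1): scale panning gains so that E = sum(g^2) = 1."""
    g = np.asarray(g, dtype=float)
    return g / np.sqrt(np.sum(g**2))

# Signal split equally between two loudspeakers:
g = normalize_gains([1.0, 1.0])   # -> [1/sqrt(2), 1/sqrt(2)]
E = np.sum(g**2)                  # stochastic loudness measure, Eq. (2.3) -> 1
P = np.sum(g)                     # fully constructive amplitude, Eq. (2.2) -> sqrt(2)
```

After E-normalization, the stochastic (+3 dB-type) loudness stays constant for any panning direction, while the fully constructive sum `P` still varies, which is exactly why Eq. (2.1) is preferred over linear-sum normalization.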
[14], which uses a \left(\sum_{l=1}^{L} g_l^p\right)^{1/p} measure, in which the exponent p is close to 1 at low frequencies under anechoic conditions and close to 2 at high frequencies or under reverberant conditions.

2.2 Direction

In the early years of stereophony, researchers investigated the differences in delay times and amplitudes required to control the perceived direction. Below, only experiments are considered that did not use fixation of the listener's head.

2.2.1 Time Differences on Frontal, Horizontal Loudspeaker Pair

The dissertation of K. Wendt in 1963 [3] reports notably accurate listening experiments on ±30° two-channel stereophony using time delays, in which listeners indicated from where they heard the sounds for each of the tested time differences. H. Lee revisited these properties in 2013 [8], but with musical sound material and an experiment in which the listener adjusted the time differences until the perceived direction matched that of a corresponding fixed reference loudspeaker, Fig. 2.1. Time differences are seldom applicable to reliable angular auditory event placement: auditory images are strongly frequency-dependent (not shown here) and therefore unstable for narrow-band sounds. Leakey and Cherry showed in 1957 [2] that time-delay stereophony loses its effect in the presence of background noise.

2.2.2 Level Differences on Frontal, Horizontal Loudspeaker Pair

K. Wendt's [3] and H. Lee's [8] experiments deliver insights into sound source positioning with ±30° two-channel stereophony, however this time with level differences. As opposed to Fig. 2.1, in which auditory image panning with time differences was characterized by statistical spreads of up to 15°, level-difference-based panning shows a clearly smaller spread of perceived directions, below 10°, Fig. 2.2.

Signal dependency. Wendt [3] described the signal dependency of panning curves for various transient and band-limited sounds, and Lee [8] for musical sounds.
¹ https://www.itu.int/rec/R-REC-BS.1770-4-201510-I/en Algorithms to measure audio programme loudness and true-peak audio level (10/2015).

Fig. 2.1 K. Wendt's experiment [3] used angular marks helping to specify the localized direction (left). Right: results for time differences between impulse signals fed to the loudspeakers, without head fixation (the diagram shows means and standard deviations; the standard deviation was interpolated for the figure). In gray: results of the time-difference adjustment experiment of Lee [8] using musical material (25, 50, 75% quartiles, symmetrized diagram).

Fig. 2.2 Wendt's [3] results for crack (impulsive) signals with level differences and without head fixation (the figure shows means and standard deviations; the standard deviation was interpolated to plot this figure). In gray: results of Lee's [8] level-difference adjustment experiment with musical sounds (25, 50, 75% quartiles, symmetrized diagram).

A new, comprehensive investigation of the frequency dependency was carried out by Helm and Kurz [9]. With level differences of {0, 3, 6, 9, 12} dB and third-octave filtered pulsed pink noise at {125, 250, 500, 1k, 2k, 4k} Hz, they showed that the perceived angle, pointed at by the listeners using a motion-tracked pointer, was similar between the broad-band case and the third-octave bands below 2 kHz. In bands above 2 kHz, smaller level differences cause a larger lateralization, see the interpolated curves in Fig. 2.3.

Fig. 2.3 Panning curves for the frontal ±30° loudspeaker pair from [9]: a third-octave vs. broad-band, on the example of the 500 Hz and 4 kHz third-octave bands; b slope of the different bands, based on the 3 and 6 dB conditions.

2.2.3 Level Differences on Horizontally Surrounding Pairs

Successive pairwise panning on neighboring loudspeaker pairs is typically used to pan auditory events freely along the loudspeakers of a horizontally surrounding loudspeaker ring. The classical research targeted specifically at such applications was contributed by Theile and Plenge in 1977 [4]. They used a mobile reference loudspeaker with a reference sound that could be moved to match the perceived direction of a loudspeaker pair playing pink noise with level differences, at different orientations with respect to the listener's head. There is also the experiment of Pulkki [15] using a level-adjustment task, in which levels were adjusted to match the auditory event to that of a reference loudspeaker at three different reference directions and for different head orientations. A comprehensive experiment was done by Simon et al. [5], who used a graphical user interface displaying the floor plan of a 45°-spaced loudspeaker ring to have the listeners specify the perceived direction. Martin et al. in 1999 [16] used a graphical user interface showing the floor plan of a 5.1 ring in their experiment, and last but not least, Matthias Frank used a direct pointing method to enter the perceived direction [10] in one of his experiments. As the experiments did not seem to yield consistent results, a comprehensive level-difference adjustment experiment with 24 loudspeakers arranged as a horizontal ring was done in [17] and partially repeated later in [11], see the results in Fig. 2.4.
In the repeated experiment [11], it became clear that in the anechoic room, a large amount of the differently pronounced localization biases can be avoided by encouraging the listeners to do front-back and left-right head motions of a few centimeters whenever there is doubt.

Fig. 2.4 Medians and 95% confidence intervals of adjusted level differences to align amplitude-panned pink noise with a harmonic complex tone from {±15°, 0°}, for a a frontal and b a lateral 60° stereo pair; a uses data from [17] with 4 responses per direction from 5 listeners; b uses data from [11] with 20 responses per direction. Despite the considerably different spread, frontal and lateral stereo pairs seem to yield pretty much the same tendency.

2.2.4 Level Differences on Frontal, Horizontal to Vertical Pairs

T. Kimura investigated the localization of auditory events on frontal, vertical ±13.5° loudspeaker pairs quite extensively in 2012 [6, 18]. The work of F. Wendt in 2013 [7, 19] also investigates slant and vertical loudspeaker pairs, Fig. 2.5. Kimura used pulsed white noise, Wendt pulsed pink noise. Obviously, the horizontal spread is always smaller than the vertical spread, and the spread does not align with the direction of the loudspeaker pair. The largest vertical spread appears for the vertical loudspeaker pair.

2.2.5 Vector Models for Horizontal Loudspeaker Pairs

A weighted sum of the loudspeakers' direction vectors θ1, θ2 can be conceived as a simple linear model of the perceived direction, using a linear blending parameter 0 ≤ q ≤ 1:

r = (1 − q)\,θ_1 + q\,θ_2.    (2.4)

The parameter q adjusts where the resulting vector r is located on the connecting line between θ1 and θ2.
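A small numeric sketch of this blend model, assuming a hypothetical ±30° frontal pair of unit direction vectors in the horizontal plane:

```python
import math

# Linear blend r = (1 - q) * theta1 + q * theta2 of two loudspeaker
# direction vectors, with blending parameter 0 <= q <= 1.
alpha = math.radians(30)
theta1 = (math.cos(alpha), math.sin(alpha))    # left loudspeaker, +30 deg
theta2 = (math.cos(alpha), -math.sin(alpha))   # right loudspeaker, -30 deg

def blend(q):
    return tuple((1.0 - q) * a + q * b for a, b in zip(theta1, theta2))

def angle_deg(r):
    return math.degrees(math.atan2(r[1], r[0]))

# q = 0 or q = 1 reproduce the loudspeaker directions; q = 1/2 lies on the
# chord between them and points to the center (0 degrees).
center = angle_deg(blend(0.5))
```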
On frontal loudspeaker pairs, localization curves typically run through the middle direction q = 1/2 for level differences of 0 dB. If only one of the loudspeakers is active, the result is either of the loudspeaker directions, thus the parameter is q = 0 or q = 1.

Fig. 2.5 Mean values and 95% confidence intervals of the direct-pointing experiments of Kimura (top) with level differences on a vertical ±13.5° loudspeaker pair, and results of F. Wendt (bottom) on frontally arranged horizontal, slant, and vertical ±20° loudspeaker pairs, showing two-dimensional 95% confidence (solid) and standard-deviation ellipses (dotted).

Classical definitions. As the simplest choices for q, one could insert q = \frac{g_2}{g_1 + g_2} or q = \frac{g_2^2}{g_1^2 + g_2^2} to get the vector definitions as weighted averages using either the linear or the squared gains, according to [1]:

r_V = \frac{g_1 θ_1 + g_2 θ_2}{g_1 + g_2},\qquad r_E = \frac{g_1^2 θ_1 + g_2^2 θ_2}{g_1^2 + g_2^2}.    (2.5)

For both models, equal gains g1 = g2 yield q = 1/2, and the endpoints with g2 = 0 or g1 = 0 correspond to q = 0 or q = 1, respectively. However, the slope of the rE vector is steeper than that of rV. For instance, if g2 = 2 g1, the vector rV lies at q = 2/3 of the line between θ1 and θ2, while rE lies at q = 4/5 of the connecting line.

The rV vector for the ±α loudspeaker pair at the directions θ^T_{1,2} = (cos α, ±sin α) corresponds to the tangent law [20], whose formal origin lies in a model of summing localization based on a simple model of the ear signals, cf. Appendix A.7. The equivalence of this law to the vector model follows from the tangent tan φ as the ratio of the y and x components of the rV vector:

\tan φ = \frac{g_1 \sin α + g_2 \sin(−α)}{g_1 \cos α + g_2 \cos α} = \frac{g_1 − g_2}{g_1 + g_2} \tan α.

Fig. 2.6 Fit of the rV, rE, and rγ models: a third-octave noise on a frontal 60° stereo pair using data from [9]; with data from [11]: b pink noise on the frontal and c on the lateral 60° stereo pair, cf. Figs. 2.3 and 2.4; d the horizontal and vertical frontal 20° pairs from [7], Fig. 2.5.

Adjusted slope. Differently steep curves were fitted by an adjustable-slope model [17]

r_γ = \frac{|g_1|^γ θ_1 + |g_2|^γ θ_2}{|g_1|^γ + |g_2|^γ},    (2.6)

which uses γ = 1 for rV and γ = 2 for rE. Figure 2.6 compares the predictions by rV, rE, and rγ to the frequency-dependently perceived directions on frontal horizontal pairs, to perceived directions on a lateral stereo pair, and to perceived directions on a frontal pair that is either horizontal or vertical, using the various studies mentioned above.

Practical choice rE. While the specific exponent γ that most closely fits the experimental data may vary, a constant value is preferable. Figure 2.6 indicates that in most cases focusing on rE is reasonable and sufficiently precise, see also [11].

Fig. 2.7 The indirect level-adjustment experiment of Pulkki [21] shows the spread and mean of the adjusted VBAP angles for frontal loudspeaker triplets, and the experiments of F.
Wendt [7, 19] use a direct pointing method to obtain results in the shape of two-dimensional 95% confidence (solid) and standard-deviation ellipses (dotted), for {−∞, 0, +11.71} dB for the top loudspeaker (left diagram) or the right loudspeaker (center diagram), respectively, or {−∞, 0, +11.51} dB for the bottom loudspeaker (right diagram).

2.2.6 Level Differences on Frontal Loudspeaker Triangles

V. Pulkki [21] and F. Wendt [7, 19] investigated localization properties for frontal loudspeaker triplets with level differences, see Fig. 2.7. Both used pulsed pink noise in their experiments. While V. Pulkki used an indirect adjustment task that evaluated the VBAP control angles required to obtain auditory events directionally matching the respective reference loudspeakers, F. Wendt used a direct pointing method. Wendt's experiments indicate that loudspeaker triplets with three different azimuthal positions yield a smaller spread in the indicated direction than those containing vertical loudspeaker pairs (which was not the case in Pulkki's experiments).

2.2.7 Level Differences on Frontal Loudspeaker Rectangles

F. Wendt [7, 19] moreover presents experiments on frontal loudspeaker rectangles, again using a pointer method and pulsed pink noise, Fig. 2.8. Again, it seems that arrangements avoiding vertical loudspeaker pairs exhibit a smaller statistical spread in the responses.

Fig. 2.8 Wendt's experiments on frontal loudspeaker rectangles, showing two-dimensional 95% confidence (solid) and standard-deviation ellipses (dotted). The experimental setup of this and the above-mentioned experiments is shown.
Left: each of the corner loudspeakers is raised once by +6 dB in level; right: both left/right loudspeaker levels are raised once by {+3, +6} dB, and both top/bottom pairs are raised once by +6 dB.

2.2.8 Vector Model for More than 2 Loudspeakers

For more than two active loudspeakers and in 3D, a vector model based on the exponent γ = 2 yields the rE vector [1]

r_E = \frac{\sum_{l=1}^{L} g_l^2\, θ_l}{\sum_{l=1}^{L} g_l^2}.    (2.7)

2.2.9 Vector Model for Off-Center Listening Positions

At off-center listening positions, the distances to the loudspeakers are not equal anymore, resulting in an additional attenuation and delay for each loudspeaker, depending on the position. For stationary sounds, this effect can be incorporated into the energy vector by additional weights w_{r,l} and w_{τ,l}:

r_E = \frac{\sum_{l=1}^{L} (w_{r,l}\, w_{τ,l}\, g_l)^2\, θ_l}{\sum_{l=1}^{L} (w_{r,l}\, w_{τ,l}\, g_l)^2}.    (2.8)

The weight w_{r,l} models the attenuation of a point-source-like propagation, 1/r. The reference distance is the distance to the closest loudspeaker at the evaluated listening position, thus the weight of each loudspeaker results in

w_{r,l} = \frac{1}{r_l}.    (2.9)

The incorporation of delays into the energy vector requires a transformation that yields the weights w_{τ,l} for each loudspeaker. It is reasonable that these weights attenuate the lagging signals in order to reduce their influence on the predicted direction. An attenuation of 1/4 dB per ms is known from the echo threshold in [22], similarly [23], and has successfully been applied to the prediction of localization in rooms [24]. With the delay τ_l = r_l/c in seconds at the listening position under test, the weight of each loudspeaker is calculated as

w_{τ,l} = 10^{−\frac{1000\, τ_l}{4 \cdot 20}}.    (2.10)

Further weights can be applied in order to model the precedence effect in more detail, as proposed by Stitt [25, 26].
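The off-center prediction of Eqs. (2.8)–(2.10) can be sketched as follows. The geometry (a ±30° pair at 2.5 m radius, listener shifted 0.5 m to the left) is an assumed example chosen for illustration, not taken from the experiments:

```python
import math

# Energy vector with distance weights w_r = 1/r and delay weights that
# attenuate lagging loudspeakers by 1/4 dB per ms (Eqs. 2.8-2.10).
C_SOUND = 343.0                                              # m/s
alpha = math.radians(30)
speakers = [(2.5 * math.cos(alpha),  2.5 * math.sin(alpha)),   # left
            (2.5 * math.cos(alpha), -2.5 * math.sin(alpha))]   # right
gains = [1.0, 1.0]                                           # same signal

def weighted_r_e(speakers, gains, listener):
    lx, ly = listener
    dists = [math.hypot(x - lx, y - ly) for x, y in speakers]
    lags = [(d - min(dists)) / C_SOUND for d in dists]       # in seconds
    w = [(1.0 / d) * 10.0 ** (-1000.0 * t / (4.0 * 20.0))
         for d, t in zip(dists, lags)]
    ex = ey = den = 0.0
    for (x, y), g, wl, d in zip(speakers, gains, w, dists):
        e = (wl * g) ** 2                  # energy weight per loudspeaker
        ex += e * (x - lx) / d             # unit direction as seen
        ey += e * (y - ly) / d             # from the listening position
        den += e
    return ex / den, ey / den

# Centered listener: symmetric setup, prediction straight ahead (0 deg).
rx, ry = weighted_r_e(speakers, gains, (0.0, 0.0))
center_deg = math.degrees(math.atan2(ry, rx))

# Listener shifted 0.5 m to the left (+y): the prediction stays between
# the loudspeaker directions as seen from that position.
rx, ry = weighted_r_e(speakers, gains, (0.0, 0.5))
shifted_deg = math.degrees(math.atan2(ry, rx))
```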
Listening test results in [27] compared the differently complex extensions of the energy vector and revealed that the simple weighting with w_{r,l} and w_{τ,l} is sufficient for a rough prediction of the perceived direction in typical playback scenarios.

The left side of Fig. 2.9 shows the directions predicted by the energy vector for various listening positions when playing back the same signal on a standard stereo loudspeaker pair with a radius of 2.5 m. The absolute localization error can be calculated from the difference between the predicted direction and the desired panning direction. The right side of Fig. 2.9 depicts areas with localization errors within 4 ranges: 0°…10° (white, perfect localization), 10°…30° (light gray, plausible localization), 30°…90° (gray, rough localization), and >90° (dark gray, poor localization). Concerning a single playback scenario, i.e. a single panning direction on a loudspeaker setup, the perceptual sweet area for plausible playback can be estimated by the area with localization errors below 30°. For the prediction of a more general sweet area, the absolute localization errors can be computed for all possible panning directions on a fine grid of 1° and averaged at each listening position, as shown in Fig. 2.10.

Fig. 2.9 Predictions of perceived directions by the energy vector for different listening positions in a standard stereo setup with two loudspeakers playing the same signal. Gray-scale areas on the right indicate listening areas with predicted absolute localization errors within different angular ranges.

Fig. 2.10 Predictions of mean absolute localization errors by the energy vector in a standard stereo setup for panning directions between −30° and 30°.

2.3 Width

M.
Frank [10] investigated the auditory source width of frontal loudspeaker pairs with 0 dB level difference and various aperture angles, as well as the influence of an additional center loudspeaker on the auditory source width. The response was given by reading numbers off a left-right symmetric scale written on the loudspeaker arrangement (Fig. 2.11). Figure 2.11 (right) shows the statistical analysis of the responses. Obviously, the additional center loudspeaker decreases the auditory source width. Auditory source width is difficult to compare for different directions, and also single loudspeakers yield auditory source widths that vary with direction. Still, a relatively constant auditory source width is desirable for moving auditory events. For static auditory events, the narrowest-possible extent can be desirable.

Fig. 2.11 Experimental setup and results (confidence intervals) of the experiments of M. Frank on the auditory source width of frontal stereo pairs of the angles ±5°, …, ±40°, with and without an additional center loudspeaker (C).

2.3.1 Model of the Perceived Width

The angle 2 arccos ‖rE‖ describes the aperture of a cap cut off the unit sphere perpendicular to the rE vector, at its tip, as seen from the origin, see Fig. 2.12. As the rE vector length lies between 0 (unclear direction) and 1 (only one loudspeaker active), this angle stays between 180° and 0°. M. Frank's experiments on the auditory source width [10, 28] showed that stereo pairs of larger half angles α were also heard as wider. The length of the rE vector gets shorter with the half angle α: in a symmetrical loudspeaker pair θ^T_{1,2} = (cos α, ±sin α) with g1 = g2 = 1, the y coordinate of the rE vector cancels and its length is ‖rE‖ = r_{E,x} = cos α.
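A numeric check of this statement, and of the resulting cap angle, for an assumed ±30° pair:

```python
import math

# ||r_E|| of a symmetric +-alpha pair with equal gains: the y components
# cancel and the length reduces to cos(alpha); the associated cap angle
# 2*arccos||r_E|| then equals the pair's aperture 2*alpha.
def r_e_length_pair(alpha_deg):
    a = math.radians(alpha_deg)
    gains = [1.0, 1.0]
    thetas = [(math.cos(a), math.sin(a)), (math.cos(a), -math.sin(a))]
    den = sum(g ** 2 for g in gains)
    ex = sum(g ** 2 * t[0] for g, t in zip(gains, thetas)) / den
    ey = sum(g ** 2 * t[1] for g, t in zip(gains, thetas)) / den
    return math.hypot(ex, ey)

length = r_e_length_pair(30.0)                   # cos(30 deg)
cap_deg = math.degrees(2 * math.acos(length))    # 60 deg = 2 * alpha
```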
The corresponding spherical cap is of the same size as the loudspeaker pair, 2 arccos ‖rE‖ = 2α. However, only 5/8 of this size was indicated by the listeners of the experiments, which yields the following estimator of the perceived width:

\mathrm{ASW} = \frac{5}{8} \cdot \frac{180°}{π} \cdot 2 \arccos ‖r_E‖.    (2.11)

Fig. 2.12 Cap size associated with the rE length model for L+R (left plot) and L+R+C (right plot).

Fig. 2.13 The model of the perceived width as 5/8 of the half-angle arccos ‖rE‖ matches the half-angle of the experiment, except for a lower limit, which is determined by the apparent source width (ASW) due to the room-acoustical setting.

For an additional center loudspeaker g3 = 1, θ^T = (1, 0), the estimator yields

‖r_E‖ = r_{E,x} = \frac{1}{3} + \frac{2}{3} \cos α,

an increase matching the experiments, as arccos ‖rE‖ < α, see Figs. 2.13 and 2.12.

2.4 Coloration

Although research primarily focuses on the spatial fidelity of multi-loudspeaker playback, the overall quality of surround sound playback was found to be largely determined by timbral fidelity (70%) [29]. Loudspeakers in a studio or performance space are often characterized by different colorations caused by different reflection patterns (most often from the wall behind the loudspeaker). When changing the active loudspeakers, or their number, these differences become audible. On the one hand, static coloration, e.g. due to the frequency responses of the loudspeakers, can typically be equalized. On the other hand, changes in coloration during the movement of a source cannot be equalized easily and yield annoying comb filters. Although coloration is often assessed verbally [30], we employ a simple technical predictor based on the composite loudness level (CLL) by Ono [31, 32].
The CLL spectrum predicts the perceived coloration and is calculated from the sum of the loudnesses of both ears in each third-octave band. Studies on loudspeaker and headphone equalization show that differences in third-octave band levels of less than 1 dB are inaudible to most listeners [33, 34]. This criterion can also be applied to the perception of coloration, i.e., differences between CLL spectra of less than 1 dB are assumed to be inaudible.

Pairwise panning between loudspeakers results in a single active loudspeaker for source directions that coincide with the direction of a loudspeaker, and in two equally loud loudspeakers for source directions exactly between two neighboring loudspeakers, cf. Fig. 2.14. In the second case, the different propagation paths from the two loudspeakers to the ears create a comb filter.

Fig. 2.14 Coloration predicted by composite loudness levels for a single loudspeaker C (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray).

This comb filter is not present for sources played from a single loudspeaker. Thus, moving a source between the two directions yields noticeable coloration. This is in contrast to static sources, for which Theile's experiments [35] indicated that they are perceived without coloration. The actual shape of the afore-mentioned comb filter depends on the angular distance between the loudspeakers: the first notch and its depth decrease with the distance. This implies that coloration increases for playback with higher loudspeaker densities. A similar comb filter is created when using a triplet of loudspeakers with the same loudspeaker density as the pair, e.g. L, C, R compared to C, R.
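The two-path comb filter can be sketched in the free field; the 10 cm path-length difference below is an assumed illustrative value, not one of the measured setups:

```python
import math

# Summing a direct path and a second coherent path that is delta_r meters
# longer gives the comb-filter magnitude |1 + exp(-j*2*pi*f*delta_r/c)|.
C_SOUND = 343.0
delta_r = 0.1      # assumed path-length difference in meters

def comb_db(f_hz):
    phase = 2.0 * math.pi * f_hz * delta_r / C_SOUND
    mag = math.hypot(1.0 + math.cos(phase), math.sin(phase))
    return 20.0 * math.log10(max(mag, 1e-12))

# The first notch lies where the path difference is half a wavelength;
# a larger delta_r (wider loudspeaker spacing) moves it down in frequency.
f_notch = C_SOUND / (2.0 * delta_r)     # 1715 Hz for 10 cm
low_freq_db = comb_db(0.0)              # +6 dB, fully constructive
notch_db = comb_db(f_notch)             # deep notch
```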
In order to avoid a strong increase in source width or annoying phasing effects, the outmost loudspeakers L and R are strongly reduced in level, typically by around −12 dB compared to loudspeaker C. In doing so, the similarity of the comb filters yields barely any coloration when moving a source between the two directions, cf. Fig. 2.15. Judging from what is shown above, it appears beneficial to always activate a few loudspeakers to stabilize the coloration, as opposed to using just one loudspeaker and moving the playback to another one. Keeping the number of simultaneously active loudspeakers more or less constant not only prevents coloration of source movements, it also yields a more constant source width. Because of this relation between coloration and source width, the fluctuation of ‖rE‖ is also a simple predictor of panning-dependent coloration. In general, the strongest coloration is perceived under anechoic listening conditions. In reverberant rooms, the additional comb filters introduced by reflections help to conceal the comb filters due to multi-loudspeaker playback.

Fig. 2.15 Coloration predicted by composite loudness levels for loudspeaker C with additional −12 dB from L and R (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray).

2.5 Open Listening Experiment Data

Experimental data from azimuthal localization on frontal and lateral loudspeaker pairs (Figs. 2.3 and 2.4), azimuthal/elevational localization on horizontal, skew, and vertical frontal pairs (Fig. 2.5), triangles (Fig. 2.7), and quadrilaterals (Fig. 2.8) are available online at https://opendata.iem.at in the listening experiment data project, as well as the data of the width experiment in Fig. 2.11.
The opendata.iem.at listening experiment data project contains evaluation routines to analyze the 95% confidence intervals symmetrically, based on means, standard deviations, and the inverse Student's t-distribution (CIMEAN.m), or more robustly based on medians, inter-quartile ranges, and Student's t-distribution (CI2.m), or for two-dimensional data analysis (robust_multivariate_confidence_region.m). The MATLAB script plot_gathered_data.m reads the formatted listening experiment data, and its exemplary code generates figures like the ones above. In order to support others in providing their own listening experiment data, the MATLAB functions write_experimental_data.m and read_experimental_data.m are provided on the website.

References

1. M. Gerzon, General metatheory of auditory localization, in prepr. 3306, Convention Audio Engineering Society (1992)
2. D.M. Leakey, E.C. Cherry, Influence of noise upon the equivalence of intensity differences and small time delays in two-loudspeaker systems. J. Acoust. Soc. Am. 29, 284–286 (1957)
3. K. Wendt, Das Richtungshören bei Überlagerung zweier Schallfelder bei Intensitäts- und Laufzeitstereophonie, Ph.D. Thesis, RWTH Aachen (1963)
4. G. Theile, G. Plenge, Localization of lateral phantom sources. J. Audio Eng. Soc. 25(4), 196–200 (1977)
5. L. Simon, R. Mason, F. Rumsey, Localisation curves for a regularly-spaced octagon loudspeaker array, in prepr. 7015, Convention Audio Engineering Society (2009)
6. T. Kimura, H. Ando, 3D audio system using multiple vertical panning for large-screen multiview 3D video display. ITE Trans. Media Technol. Appl. 2(1), 33–45 (2014)
7. F. Wendt, M. Frank, F. Zotter, Panning with height on 2, 3, and 4 loudspeakers, in Proceedings of the ICSA (Erlangen, 2013)
8. H. Lee, F. Rumsey, Level and time panning of phantom images for musical sources. J. Audio Eng. Soc. 61(12), 978–988 (2013)
9. J.M. Helm, E. Kurz, M. Frank, Frequency-dependent amplitude-panning curves, in 29th Tonmeistertagung (Köln, 2016)
10. M.
Frank, Phantom sources using multiple loudspeakers in the horizontal plane, Ph.D. Thesis, Kunstuni Graz (2013)
11. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in Fortschritte der Akustik - DAGA (Kiel, 2017)
12. V. Pulkki, Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc. 45(6), 456–466 (1997)
13. G. Soulodre, Evaluation of objective measures of loudness, in prepr. 6161, 116th AES Convention (Berlin, 2004)
14. M.-V. Laitinen, J. Vilkamo, K. Jussila, A. Politis, V. Pulkki, Gain normalization in amplitude panning as a function of frequency and room reverberance, in Proceedings 55th AES Conference (Helsinki, 2014)
15. V. Pulkki, Compensating displacement of amplitude-panned virtual sources, in 22nd Conference Audio Engineering Society (2003)
16. G. Martin, W. Woszczyk, J. Corey, R. Quesnel, Sound source localization in a five-channel surround sound reproduction system, in prepr. 4994, Convention Audio Engineering Society (1999)
17. F. Zotter, M. Frank, Generalized tangent law for horizontal pairwise amplitude panning, in Proceedings of the 3rd ICSA (Graz, 2015)
18. T. Kimura, H. Ando, Listening test for three-dimensional audio system based on multiple vertical panning, in Acoustics-12 (Hong Kong, 2012)
19. F. Wendt, Untersuchung von Phantomschallquellen vertikaler Lautsprecheranordnungen, M. Thesis, Kunstuni Graz (2013)
20. H. Clark, G. Dutton, P. Vanderlyn, The 'stereosonic' recording and reproduction system. Repr. from Proc. Inst. Electr. Eng. 1957 in J. Audio Eng. Soc. 6(2), 102–117 (1958)
21. V. Pulkki, Localization of amplitude-panned virtual sources II: two- and three-dimensional panning. J. Audio Eng. Soc. 49(9), 753–767 (2001)
22. B. Rakerd, W.M. Hartmann, J. Hsu, Echo suppression in the horizontal and median sagittal planes. J. Acoust. Soc. Am. 107(2) (2000)
23. P.W. Robinson, A. Walther, C. Faller, J.
Braasch, Echo thresholds for reflections from acoustically diffusive architectural surfaces. J. Acoust. Soc. Am. 134(4) (2013)
24. F. Zotter, M. Frank, M. Kronlachner, J. Choi, Efficient phantom source widening and diffuseness in Ambisonics, in EAA Symposium on Auralization and Ambisonics (Berlin, 2014)
25. P. Stitt, S. Bertet, M. van Walstijn, Extended energy vector prediction of ambisonically reproduced image direction at off-center listening positions. J. Audio Eng. Soc. 64(5), 299–310 (2016)
26. P. Stitt, S. Bertet, M. van Walstijn, Off-center listening with third-order Ambisonics: dependence of perceived source direction on signal type. J. Audio Eng. Soc. 65(3), 188–197 (2017)
27. E. Kurz, M. Frank, Prediction of the listening area based on the energy vector, in Proceedings of the 4th International Conference on Spatial Audio (ICSA) (Graz, 2017)
28. M. Frank, Source width of frontal phantom sources: perception, measurement, and modeling. Arch. Acoust. 38(3), 311–319 (2013)
29. F. Rumsey, S. Zieliński, R. Kassier, S. Bech, On the relative importance of spatial and timbral fidelities in judgments of degraded multichannel audio quality. J. Acoust. Soc. Am. 118(2), 968–976 (2005)
30. S. Choisel, F. Wickelmaier, Evaluation of multichannel reproduced sound: scaling auditory attributes underlying listener preference. J. Acoust. Soc. Am. 121(1), 388–400 (2007)
31. K. Ono, V. Pulkki, M. Karjalainen, Binaural modeling of multiple sound source perception: methodology and coloration experiments, in Audio Eng. Soc. Conv. 111 (2001)
32. K. Ono, V. Pulkki, M. Karjalainen, Binaural modeling of multiple sound source perception: coloration of wideband sound, in Audio Eng. Soc. Conv. 112 (2002)
33. A. Ramsteiner, G.
Spikofski, Ermittlung von Wahrnehmbarkeitsschwellen für Klangfarbenunterschiede unter Verwendung eines diffusfeldentzerrten Kopfhörers, in Fortschritte der Akustik - DAGA (1987)
34. M. Karjalainen, E. Piirilä, A. Järvinen, J. Huopaniemi, Comparison of loudspeaker equalization methods based on DSP techniques. J. Audio Eng. Soc. 47(1/2), 14–31 (1999)
35. G. Theile, Über die Lokalisation im überlagerten Schallfeld, Ph.D. Thesis, Technische Universität Berlin (1980)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Chapter 3 Amplitude Panning Using Vector Bases

"The method is straightforward and can be used on many occasions successfully."
Ville Pulkki [1], Ph.D. Thesis, 2001.

Abstract This chapter describes Ville Pulkki's famous vector-base amplitude panning (VBAP) as the most robust and generic algorithm of amplitude panning that works on nearly any surrounding loudspeaker layout. VBAP activates the smallest-possible number of loudspeakers, which gives a directionally robust auditory event localization for virtual sound sources, but it can also cause fluctuations in width and coloration for moving sources.
Multiple-direction amplitude panning (MDAP), proposed by Pulkki, is a modification that increases the number of activated loudspeakers. In this way, more direction-independence is achieved at the cost of an increased perceived source width and reduced localization accuracy at off-center positions. As vector-base panning methods rely on convex-hull triangulation, irregular loudspeaker layouts yielding degenerate vector bases can become a problem. Imaginary loudspeaker insertion and downmix is shown as a robust method improving the behavior, in particular for smaller surround-with-height loudspeaker layouts. The chapter concludes with some practical examples using free software tools that accomplish amplitude panning on vector bases.

Vector-base amplitude panning (VBAP) was extensively described and investigated in [2], alongside the stabilization of moving sources by adding spread with multiple-direction amplitude panning (MDAP) [3]. Since then, VBAP and MDAP have become the most common and popular amplitude panning techniques, as they are particularly robust and can automatically adapt to specific playback layouts.

3.1 Vector-Base Amplitude Panning (VBAP)

Assuming the rV model to predict the perceived direction, an intended auditory event at a panning direction θ, which we call the virtual source, can theoretically be controlled by the criterion according to V. Pulkki [2]

θ = \sum_{l=1}^{L} \tilde{g}_l\, θ_l.    (3.1)

Here, θl are the direction vectors of the loudspeakers involved, and the amplitude weights g̃l need to be normalized for constant loudness:

g_l = \frac{\tilde{g}_l}{\sqrt{\sum_{l=1}^{L} \tilde{g}_l^2}}.    (3.2)

Moreover, the weights gl should always stay positive to avoid in-head localization or other irritating listening experiences.
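A minimal 2D sketch of the criterion (3.1) with the normalization (3.2), assuming a horizontal loudspeaker pair; the triplet case inverts a 3×3 base in the same way:

```python
import math

# Pairwise VBAP in 2D: solve theta = g1*theta1 + g2*theta2 for the gains by
# inverting the 2x2 loudspeaker base, then normalize for constant loudness.
def vbap_pair(pan_deg, az1_deg, az2_deg):
    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)
    (x1, y1), (x2, y2) = unit(az1_deg), unit(az2_deg)
    tx, ty = unit(pan_deg)
    det = x1 * y2 - x2 * y1
    g1 = (tx * y2 - ty * x2) / det     # Cramer's rule for the 2x2 system
    g2 = (x1 * ty - y1 * tx) / det
    norm = math.hypot(g1, g2)          # sqrt(g1^2 + g2^2)
    return g1 / norm, g2 / norm

g_center = vbap_pair(0.0, 30.0, -30.0)   # equal gains, 1/sqrt(2) each
g_edge = vbap_pair(30.0, 30.0, -30.0)    # only the matching loudspeaker
```

Panning onto a loudspeaker yields the gain pair (1, 0), and center panning on a symmetric pair yields equal gains, consistent with the constant-E normalization of (3.2).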
For loudspeaker rings around the horizon, always 1 or 2 loudspeakers will be contributing to the auditory event; for loudspeakers arranged on a surrounding sphere, always 1 up to 3 loudspeakers will be used, whose directions must enclose the direction of the desired auditory event, the virtual source. For the directional stability of the auditory event, the angle enclosed between the loudspeakers should stay smaller than 90°. The system of equations for VBAP [2] uses 3 loudspeaker directions and gains to model the panning direction $\theta$

$$\theta = [\theta_1, \theta_2, \theta_3]\begin{bmatrix}\tilde{g}_1\\ \tilde{g}_2\\ \tilde{g}_3\end{bmatrix} = \mathbf{L}\,\tilde{\mathbf{g}} \quad\Rightarrow\quad \tilde{\mathbf{g}} = \mathbf{L}^{-1}\theta, \qquad \mathbf{g} = \frac{\tilde{\mathbf{g}}}{\|\tilde{\mathbf{g}}\|}. \tag{3.3}$$

The selection of the activated loudspeaker triplet is preceded by forming all triplets of the convex hull spanned by all the given playback loudspeakers. To find the loudspeaker triplet that needs to be activated, the list of all triplets is searched for the one with all-positive weights, $g_1 \ge 0$, $g_2 \ge 0$, $g_3 \ge 0$.

Figure 3.1 shows the localization curve for VBAP between a loudspeaker at 0° and 45° for a centrally seated listener and one shifted to the left. The experiment is described in [4]; results were gathered on a circle of 8 loudspeakers (2.5 m radius, cf. Fig. 3.1), and listeners indicated the perceived direction by naming numbers from a 5° scale mounted on the loudspeaker setup. Black whiskers of the results (95% confidence intervals and medians) for the centrally seated listener indicate a mismatch between the slope of the perceived angles with VBAP and the ideal curve, which is represented by the dashed line; the mismatch can be understood by the better match of other exponents γ in Fig. 2.6. The directional spread is quite narrow. For an off-center, left-shifted listening position, the perceived direction is shown in terms of a 5° histogram (gray bubbles) in Fig. 3.1. For this off-center position, it becomes clear that the closest loudspeaker dominates localization within a third of the panning directions.
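The solve-and-search logic of Eq. (3.3) can be sketched in Python for the simpler two-dimensional (pairwise) case; the 3D case works the same way with a 3×3 inverse over convex-hull triplets. This is our own illustrative sketch, not code from the book, and it assumes loudspeaker angles sorted around the ring:

```python
import math

def vbap_2d(pan_deg, spk_deg):
    """Pairwise 2D VBAP: search the (sorted) ring for the loudspeaker pair
    whose 2x2 vector base yields all-positive weights g = L^-1 theta,
    then normalize for constant loudness, Eq. (3.2)."""
    t = math.radians(pan_deg)
    theta = (math.cos(t), math.sin(t))
    spks = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
            for a in spk_deg]
    for i in range(len(spks)):
        j = (i + 1) % len(spks)
        (x1, y1), (x2, y2) = spks[i], spks[j]
        det = x1 * y2 - x2 * y1
        if abs(det) < 1e-9:
            continue  # degenerate vector base
        g1 = (theta[0] * y2 - theta[1] * x2) / det  # first entry of L^-1 theta
        g2 = (x1 * theta[1] - y1 * theta[0]) / det  # second entry of L^-1 theta
        if g1 >= -1e-9 and g2 >= -1e-9:             # all-positive pair found
            norm = math.hypot(g1, g2)
            gains = [0.0] * len(spks)
            gains[i], gains[j] = g1 / norm, g2 / norm
            return gains
    raise ValueError("no enclosing loudspeaker pair found")

# panning halfway between the loudspeakers at 0 and 90 deg gives equal gains
print(vbap_2d(45, [0, 90, 180, 270]))
```

Panning exactly onto a loudspeaker yields a single active channel with gain 1, which is precisely the non-smooth width behavior discussed below.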
Still, the directional mapping seems to be monotonic with the panning angle, and the perceived direction stays within the loudspeaker pair, which is a robust result, at least.

Fig. 3.1 Perceived directions for VBAP between loudspeakers at 0° and 45° from [4]. 95% confidence intervals and medians (black) are for a centrally seated listener in a circle of 2.5 m radius. Localization for a left-shifted listener (1.25 m) can become bi-modal, so that a 5° bubble histogram is shown (gray).

In Fig. 3.2 we see that responses from [5], in which the panning angle of amplitude-panned lateral loudspeaker pairs was adjusted to match reference loudspeakers set up in steps of 15°, fairly match the reference directions using VBAP. The $r_E$ vector model (black curve) delivers a better match, with only one exception at 105°. This motivates VBIP as an alternative strategy.

Fig. 3.2 VBAP angles on a 60°-spaced horizontal loudspeaker ring starting at 0° (a) or 30° (b), perceptually adjusted to match panned pink noise with a harmonic-complex acoustic reference in 15° steps, from [5]; the black curve shows the $r_E$ model prediction.

Fig. 3.3 The width measure $2\arccos\|r_E\|$ for a virtual source on a horizontal and a vertical trajectory (45° azimuth) using VBAP on an octahedral arrangement.

Vector-Base Intensity Panning (VBIP).
With nearly the same set of equations, but improving the perceptual mapping by the squares of the weights, the auditory event can be controlled corresponding to the direction of the $r_E$ vector

$$\theta = [\theta_1, \theta_2, \theta_3]\begin{bmatrix}\tilde{g}_1^2\\ \tilde{g}_2^2\\ \tilde{g}_3^2\end{bmatrix} = \mathbf{L}\,\tilde{\mathbf{g}}_{\mathrm{sq}} \quad\Rightarrow\quad \tilde{\mathbf{g}}_{\mathrm{sq}} = \mathbf{L}^{-1}\theta, \qquad \tilde{\mathbf{g}} = \begin{bmatrix}\sqrt{\tilde{g}_{\mathrm{sq}1}}\\ \sqrt{\tilde{g}_{\mathrm{sq}2}}\\ \sqrt{\tilde{g}_{\mathrm{sq}3}}\end{bmatrix}, \qquad \mathbf{g} = \frac{\tilde{\mathbf{g}}}{\|\tilde{\mathbf{g}}\|}. \tag{3.4}$$

This formulation appears more contemporary due to the excellent match of the $r_E$ model to predict experimental results, as shown earlier.

Non-smooth VBAP/VBIP width. If one of the loudspeakers is exactly aligned with the virtual source for either VBAP or VBIP, e.g. $\theta_1 = \theta$, the resulting gains are $g_{1,2,3} = (1, 0, 0)$, and therefore only 1 loudspeaker will be activated. For a virtual source between 2 loudspeakers, e.g. $\theta_1 + \theta_2 \propto \theta$, we obtain $g_{1,2,3} = (1, 1, 0)/\sqrt{2}$, and hereby only 2 loudspeakers will be active. This behavior yields audible variation in the perceived width and coloration. For virtual-source movements that cross a common edge of neighboring loudspeaker triplets, there will often be unexpectedly pronounced jumps. Figure 3.3 illustrates the variation of the perceived width with VBAP on an octahedral arrangement of loudspeakers in the directions $\theta_l^{\mathrm{T}} \in \{[\pm1, 0, 0], [0, \pm1, 0], [0, 0, \pm1]\}$.

3.2 Multiple-Direction Amplitude Panning (MDAP)

In order to adjust the $r_E$ or $r_V$ vector not only directionally but also in length, and thus to control the number of active loudspeakers for moving sound objects, Pulkki extended VBAP to multiple-direction amplitude panning (MDAP [3]). Hereby not only the perceived width but also the coloration can be held constant.

Direction spread in MDAP. MDAP employs more than one virtual source distributed around the panning direction as a directional spreading strategy. For horizontal loudspeaker rings, MDAP can consist of a pair of virtual VBAP sources at the angles $\varphi_s \pm \alpha$ around the panning direction.
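To make this pair-of-sources idea concrete, here is a rough Python sketch of MDAP on a horizontal ring (our own illustration, assuming sorted loudspeaker angles and a simple pairwise VBAP helper): the gains of two virtual sources at $\varphi_s \pm \alpha$ are superposed and re-normalized according to Eq. (3.2):

```python
import math

def ring_vbap(pan, spk):
    """Unnormalized pairwise VBAP gains on a sorted horizontal ring (radians).
    Returns all zeros if no enclosing pair is found."""
    g = [0.0] * len(spk)
    for i in range(len(spk)):
        j = (i + 1) % len(spk)
        a1, a2 = spk[i], spk[j]
        det = math.cos(a1) * math.sin(a2) - math.cos(a2) * math.sin(a1)
        g1 = (math.cos(pan) * math.sin(a2) - math.sin(pan) * math.cos(a2)) / det
        g2 = (math.cos(a1) * math.sin(pan) - math.sin(a1) * math.cos(pan)) / det
        if g1 >= -1e-9 and g2 >= -1e-9:  # enclosing pair with positive weights
            g[i], g[j] = g1, g2
            return g
    return g

def mdap(pan, spk, alpha):
    """MDAP: superpose the VBAP gains of two virtual sources at pan +/- alpha
    and re-normalize for constant loudness, Eq. (3.2)."""
    g = [x + y for x, y in zip(ring_vbap(pan - alpha, spk),
                               ring_vbap(pan + alpha, spk))]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

spk = [math.radians(a) for a in range(0, 360, 60)]  # uniform ring, L = 6
g = mdap(0.0, spk, math.radians(0.9 * 180 / 6))
# panning onto the loudspeaker at 0 deg now activates 3 loudspeakers, not 1
```

With the spread switched on, the frontal loudspeaker and both of its neighbors carry signal, which is exactly what flattens the width measure compared to plain VBAP.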
In a ring of $L$ loudspeakers with a uniform angular spacing of $360°/L$, the angle $\alpha = 90\%\cdot\frac{180°}{L}$ yields optimally flat width for all panning directions, as shown for $L = 6$ in the comparison between MDAP and VBAP in Fig. 3.4. Moreover, MDAP seems to equalize the aiming of the $r_E$ measure to the aiming of the $r_V$ measure, which is the one controlled by VBAP and MDAP.

Fig. 3.4 Width-related angle $\arccos\|r_E\|$ and angular error $\angle r_E - \varphi_s$ for VBAP and MDAP, with MDAP using two virtual sources at $\pm 90\%\cdot\frac{180°}{L}$ around the intended panning direction. This not only achieves minimal angular errors but also panning-invariant width.

Listening experiment results. Experiments from [4] in Fig. 3.5 investigate the perceived width for two possible horizontal loudspeaker ring layouts, both with 45° spacings, but one starting at 0° ("0"), the other at 22.5° ("1/2"). Widths of MDAP with a direction spread of α = 22.5° are perceived as significantly similar on both ring layouts, while VBAP yields significantly narrower results for panning onto the frontal loudspeaker in the "0" layout, which activates a single loudspeaker only. Note that VBAP 1/2 and MDAP 1/2 are identical with α = 22.5° and were treated as one condition.

Fig. 3.5 Perceived width differences when using VBAP and MDAP for panning onto or between loudspeaker directions, from the listening experiment in [4].

Fig. 3.6 Coloration change of horizontally moving sources with VBAP and MDAP on different loudspeaker aperture angles, from the listening experiment in [4].
Moreover, a more constant width measure also describes a more constant number of activated loudspeakers while panning. Figure 3.6 shows that listeners can hear the difference in coloration changes with rotatory panning using pink noise at a constant speed. The figure shows that coloration fluctuations of MDAP are always clearly smaller than those of VBAP on similar loudspeaker rings. Moreover, coloration changes are more pronounced on rings of 16 loudspeakers than with 8 loudspeakers, which is explained by their faster fluctuation.

Figure 3.7 shows the results from [6] for a central and a left-shifted, off-center listening position when using MDAP on an 8-channel ring of loudspeakers. At the central listening position, the perceived directional spread around the loudspeaker positions 0° and 45° obviously increases as expected, as indicated by the whiskers (95% confidence intervals and medians). Moreover, the spread of MDAP seems to slightly decrease the slope mismatch between the underlying VBAP algorithm and the perceptual curve around the 22.5° direction. Although MDAP enforces a larger number of active loudspeakers, its localization is still similarly robust as that of VBAP, also at off-center listening positions. The perceived direction can be assumed to stay at least confined within a strictly directionally limited activation of loudspeakers. Correspondingly, the gray 5° histogram bubbles of Fig. 3.7 indicate the perceived directions when the listener is located left-shifted off-center. While localization is slightly attracted by the closer loudspeaker at 0°, the larger spread causes a more monotonic outcome that is less split than with VBAP in Fig. 3.1.

For a more exhaustive study, Frank used 6 loudspeakers on the horizon and gave his listeners the task to align the direction of MDAP-panned pink noise with acoustical references (harmonic complexes) every 15° by adjusting the panning direction [5]. The results in Fig.
3.8 contain 24 answers from 6 subjects responding four times (by repetition and symmetrization).

Fig. 3.7 Perceived directions for MDAP panning on an 8-channel 2.5 m radius loudspeaker ring within the interval [0°, 45°] at a central (black medians and 95%-confidence whiskers) and 1.25 m left-shifted off-center listening position (gray 5° bubble histogram); the dashed line indicates the ideal panning curve.

Fig. 3.8 MDAP pink-noise directions on horizontal rings of 60°-spaced loudspeakers, adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15°. Markers and whiskers indicate 95% confidence intervals and medians, the black curve the $r_E$ vector model.

The black line shows directions indicated by the $r_E$ vector model for the tested conditions. Obviously, the confidence intervals of the adjusted MDAP angles match quite well both the reference directions and the predictions by the $r_E$ vector model, in particular for angles between 0° and 90° (except 75°) for the ring starting at 0°, and from 0° to 120° for the 30°-rotated ring. The mismatch is much less than 4° for panning angles ≤ 90°.

Fig.
3.9 The width measure $2\arccos\|r_E\|$ for virtual sources on a horizontal and a vertical path on an octahedron setup using MDAP with 8 additional half-amplitude virtual sources at 45° distance to the main virtual source.

MDAP with 3D loudspeaker layouts. For more arbitrary 3D loudspeaker arrangements, multiple directions could be arranged ring-like, see Fig. 3.9. This arrangement uses 8 additional virtual sources inclined by 45° with respect to the main virtual source. At least mathematically, however, it requires post-optimization of the amplitudes and angles of the virtual sources in order to accurately match the desired $r_V$ or $r_E$ vector in direction and length on irregular loudspeaker arrangements, cf. [7]. Non-uniform $r_V$ vector lengths of the individual virtual sources involved cause a distorted resultant vector. In particular, their superposition is distorted towards those of the multiple virtual-source directions with the longest $r_V$ vectors. Epain's article [7] proposes an optimization that retrieves the optimal orientation and weighting of the multiple virtual sources for every panning direction.

3.3 Challenges in 3D Triangulation: Imaginary Loudspeaker Insertion and Downmix

Surrounding loudspeaker hemispheres typically exhibit the following two problems:

• Loudspeaker rectangles at the sides of standard setups with height (ITU-R BS.2051-0 [8]) can be decomposed ambiguously into triangles at the sides, back, and top. This can yield noticeable ranges within loudspeaker quadrilaterals in which auditory events are unexpectedly created by just two of the loudspeakers.
• Signals of virtual sources below the horizon usually get lost.

The problem of unfavorable or ambiguous triangulations into loudspeaker triplets appears subtle; however, it can cause clearly audible deficiencies.
This is especially the case when ambiguous triangulation yields asymmetric behavior between left and right, e.g., for the top, rear, and lateral directions, where we would manually define loudspeaker quadrilaterals instead of triangles, see [9]. As surrounding loudspeaker hemispheres are typically open by 180° towards below, VBAP/VBIP/MDAP is numerically unstable and theoretically useless for any panning direction below the horizon. Although the absence of loudspeakers below renders downwards amplitude panning theoretically infeasible, it is still reasonable to preserve signals of virtual sources that are meant for playback on spherically surrounding setups.

In the case of the asymmetric loudspeaker rectangles, see Fig. 3.10, and a missing lower hemisphere of surrounding loudspeakers, the insertion of one or more imaginary loudspeakers in the vertical direction (nadir) or in the middle of the rectangle (the average direction vector) has proven to be a useful strategy, e.g. in [10]. Any imaginary loudspeaker aims at either extending the admissible triangulation towards open parts of the surround loudspeaker setup, or at covering parts with potential asymmetry, see [9].

Fig. 3.10 VBAP on the ITU D (4 + 5 + 0) setup [8]. Top row: Insertion of an imaginary loudspeaker at nadir preserves the loudness of downward-panned signals, shown for a vertical path and E values in dB for the factors $\{\frac{1}{\sqrt{5}}, \frac{1}{2\sqrt{5}}, 0\}$ re-distributing the signal to the 5 existing horizontal loudspeakers. Middle row: Due to typical triangulation, two left-right mirrored vertical paths (45° azimuth) yield asymmetric behavior, as shown by the $2\arccos\|r_E\|$ measure.
Bottom row: Insertion of an imaginary loudspeaker at 65° fixes the symmetry and feeds the signal with a factor $\frac{1}{\sqrt{4}}$ to the 4 existing neighboring loudspeakers.

The signal of the imaginary loudspeaker can be dealt with in two ways:

• it can be dismissed; e.g., for an imaginary loudspeaker below at nadir, this would still yield a signal near the closest horizontal pair of loudspeakers for virtual sources panned to below-horizontal directions, unless panned exactly to nadir;
• it can be down-mixed to the neighboring M loudspeakers by a factor of $\frac{1}{\sqrt{M}}$, or less, as in Fig. 3.10; alternatively, for control yielding perfectly flat E measures, the resulting down-mixed gain vector can be re-normalized by Eq. (3.2).

3.4 Practical Free-Software Examples

3.4.1 VBAP/MDAP Object for Pd

There is a classic VBAP/MDAP implementation by Ville Pulkki that is available as an external in pure data (Pd). The example in Fig. 3.11 illustrates its use together with some other useful externals in Pd. In the patch, [define_loudspeakers] sets up an octahedral layout, the [vbap] external computes VBAP/MDAP gains from the panning angles and a spread parameter, and the resulting gains distribute a test signal to the six loudspeaker channels via [mtx_*~], with level meters on each output. Software requirements are: Fig.
3.11 Vector-Base/Multi-Direction Amplitude Panning (VBAP/MDAP) example in pure data (Pd) using Pulkki's [vbap] external for an octahedral layout.

• pure-data (free, http://puredata.info/downloads/pure-data)
• iemmatrix (free download within pure-data)
• zexy (free download within pure-data)
• vbap (free download within pure-data).

3.4.2 SPARTA Panner Plugin

The SPARTA Panner under http://research.spa.aalto.fi/projects/sparta_vsts/plugins.html provides a vector-base amplitude panning interface (VBAP) and multiple-direction amplitude panning (MDAP), see Fig. 3.12, with a frequency-dependent loudness normalization by $\sqrt[p]{\sum_{l} g_l^p}$, adjustable to the listening conditions, see Laitinen [11]. The parameter DTT can be varied between 0 (standard, frequency-independent VBAP normalization, i.e. diffuse-field normalization), 0.5 for typical listening environments, and 1 for the anechoic chamber. The plugin allows one to either manually enter the azimuth and elevation angles of multiple panning directions (if more than one input signal is used) and of the playback loudspeakers, or to import/export them from/to preset files. Of course, all panning directions can be time-varying and be moved via mouse, automation, or controls.

Fig. 3.12 The Panner VST plug-in from Aalto University's SPARTA plug-in suite manages Vector-Base Amplitude Panning within sequencers supporting VST.

References

1. V. Pulkki, Spatial sound generation and perception by amplitude panning techniques, Ph.D. dissertation, Helsinki University of Technology (2001)
2. V. Pulkki, Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc. 45(6), 456–466 (1997)
3. V. Pulkki, Uniform spreading of amplitude panned virtual sources, in Proceedings of the Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE (New Paltz, NY, 1999)
4. M.
Frank, Phantom sources using multiple loudspeakers in the horizontal plane, Ph.D. dissertation, University of Music and Performing Arts Graz (2013)
5. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in Fortschritte der Akustik - DAGA (Kiel, 2017)
6. M. Frank, Source width of frontal phantom sources: perception, measurement, and modeling. Arch. Acoust. 38(3), 311–319 (2013)
7. N. Epain, C.T. Jin, F. Zotter, Ambisonic decoding with constant angular spread. Acta Acust. United Acust. 100(5) (2014)
8. ITU, Recommendation BS.2051: Advanced sound system for programme production (ITU, 2018)
9. M. Romanov, M. Frank, F. Zotter, T. Nixon, Manipulations improving amplitude panning on small standard loudspeaker arrangements for surround with height, in Tonmeistertagung (2016)
10. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
11. M.-V. Laitinen, J. Vilkamo, K. Jussila, A. Politis, V. Pulkki, Gain normalization in amplitude panning as a function of frequency and room reverberance, in Proceedings 55th AES Conference (Helsinki, 2014)
Chapter 4
Ambisonic Amplitude Panning and Decoding in Higher Orders

…the second-order Ambisonic system offers improved imaging over a wider area than the first-order system and is suitable for larger rooms. Jeffrey S. Bamford [1], Canadian Acoustics, 1994.

Abstract Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then by Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing the properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. The required mathematical formulations for 3D higher-order Ambisonics are then developed, with the idea of improving the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previous models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, sound-reinforcement systems, and to headphones. The behavior is analyzed by a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.

Cooper [2] used higher-order angular harmonics to formulate circular panning of auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term Ambisonics became common for technology using spherical harmonic functions. Around the early 2000s, most notably Bamford [6], Malham [7], Poletti [8], Jot [9], and Daniel [10] pioneered the development of higher-order Ambisonic panning and decoding, as did Ward and Abhayapala [11], Dickins [12], and, at the lab of the authors, Sontacchi [13]. Another leap happened around 2010, when Ambisonic decoding to loudspeakers could be largely improved by considering regularization methods [14], singular-value decomposition [15], and all-round Ambisonic decoding (AllRAD) [15, 16],
a combination of vector-base panning techniques with Ambisonics, yielding the most robust and flexible higher-order decoding method known today. For headphones, after the work of Jot [9] that outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17–19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] as the essential binaural decoders. Both remove HRTF delays or optimize HRTF phases at high frequencies to avoid spectral artifacts. By interaural covariance correction, MagLS/TAC manage to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].

4.1 Direction Spread in First-Order 2D Ambisonics

In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a single sound source from the angle $\varphi_s$ to the direction of each loudspeaker $\varphi$ is described by the shape of the panning function (or direction-spread function) in Eq. (1.17). The directional spreading is not infinitely narrow, but determined by what can be represented by first-order directivity patterns. Consequently, sound from the angle $\varphi_s$ will be mapped by a dipole pattern aligned with the source and an additional omnidirectional pattern. We can involve a spread parameter $a$ to make the directional spread to the loudspeaker system adjustable and either cardioid-shaped ($a = 1$), 2D-supercardioid-shaped ($a = \sqrt{2}$), or 2D-hypercardioid-shaped ($a = 2$), using:

$$g(\varphi) = 1 + a\,\cos(\varphi - \varphi_s). \tag{4.1}$$

This function represents how first-order Ambisonic panning would distribute a mono signal to loudspeakers.
With the loudspeaker positions described by the set of angles $\{\varphi_l\}$, a vector of amplitude-panning gains with an entry for each loudspeaker can be determined by sampling the direction-spread function:

$$\mathbf{g} = \begin{bmatrix} g_1 \\ \vdots \\ g_L \end{bmatrix} = 1 + a \begin{bmatrix} \cos(\varphi_1 - \varphi_s) \\ \vdots \\ \cos(\varphi_L - \varphi_s) \end{bmatrix}. \tag{4.2}$$

With these gain values, we evaluate the models of perceived loudness, direction, and width introduced in Chap. 2, in order to enter a discussion of perceptual goals. If the loudspeaker directions $\{\theta_l\}$ are chosen suitably, it is possible to obtain panning-independent loudness, direction, and width measures $E = \sum_l g_l^2$, $r_E = \frac{\sum_l g_l^2\, \theta_l}{E}$, and $\frac{5}{8}\,\frac{180°}{\pi}\, 2\arccos\|r_E\|$. How is it done? For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of 4 loudspeakers with uniform angular spacing and $a = \sqrt{2}$, which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.

Fig. 4.1 Width-related angle $\arccos\|r_E\|$ and angular error $\angle r_E - \varphi_s$ for first-order Ambisonics on a ring with 90°-spaced loudspeakers using a cardioid or 2D super-/hyper-cardioid pattern; the measures are all panning-invariant and the super-cardioid weighting is the 2D max-$r_E$ weighting with $\arccos\|r_E\| = 45°$.

Direction spread in FOA. The panning-function interpretation with its directional spread has some similarity to MDAP, with its attempt to directionally spread an amplitude-panned signal. Similar to the discrete virtual spread by $\pm\alpha = \arccos\|r_E\|$ around the panning direction, the virtual direction spread of first-order Ambisonics is described by its continuous panning function $g(\varphi)$ in Eq. (4.1).
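The panning-invariance claims above (cf. Fig. 4.1) are easy to check numerically. The following Python sketch (our own, not from the book) samples Eq. (4.1) on a uniform ring of L = 4 loudspeakers with the max-$r_E$ weight $a = \sqrt{2}$ and evaluates the measures of Chap. 2:

```python
import math

def foa_measures(phi_s, L, a):
    """Sample g(phi) = 1 + a*cos(phi - phi_s) on a uniform ring of L
    loudspeakers and evaluate the loudness E and the rE vector."""
    phis = [2 * math.pi * l / L for l in range(L)]
    g2 = [(1 + a * math.cos(p - phi_s)) ** 2 for p in phis]
    E = sum(g2)
    rx = sum(w * math.cos(p) for w, p in zip(g2, phis)) / E
    ry = sum(w * math.sin(p) for w, p in zip(g2, phis)) / E
    return E, math.hypot(rx, ry), math.atan2(ry, rx)

# with L = 4 and a = sqrt(2): E stays constant, rE points exactly at phi_s,
# and arccos||rE|| = 45 deg for every panning angle
for phi_s in (0.0, 0.3, 1.1):
    E, r, ang = foa_measures(phi_s, 4, math.sqrt(2))
    print(round(E, 6), round(math.degrees(math.acos(r)), 3))
```

Changing `a` or the number of loudspeakers in this sketch immediately reproduces the trends of Fig. 4.1.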
To inspect the continuous function by the $r_E$ measure defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum. Because of the symmetry around $\varphi_s$, we may set $\varphi_s = 0$ for convenience, which knowingly causes $r_{E,y} = 0$, and evaluate

$$r_{E,x} = \frac{\int_0^{2\pi} g^2(\varphi)\cos\varphi\, d\varphi}{\int_0^{2\pi} g^2(\varphi)\, d\varphi} = \frac{\int_0^{2\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi]\cos\varphi\, d\varphi}{\int_0^{2\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi]\, d\varphi} = \frac{2a}{2 + a^2}. \tag{4.3}$$

The maximum of $r_{E,x} = \frac{2a}{2+a^2}$ is found by $\frac{d}{da}\, r_{E,x} = \frac{4 + 2a^2 - 4a^2}{(2+a^2)^2} = 0$, hence at $a = \sqrt{2}$. Consequently, the 2D max-$r_E$ weight yields $r_{E,x} = \frac{2\sqrt{2}}{4} = \frac{1}{\sqrt{2}}$ and the angle $\arccos\|r_E\| = 45°$. This would resemble a 2D-MDAP-equivalent source spread of ±45°. Note that first-order Ambisonics cannot map to a smaller spread than this. Only higher orders permit reducing this spread further, to a desired angle below 90°.

Ideal loudspeaker layouts. Not only is the directional aiming of the virtual, continuous first-order Ambisonic panning function ideal and its width panning-invariant, but also its loudness measure is panning-invariant. However, decoding to a physical loudspeaker setup can degrade this ideal behavior. For which loudspeaker layouts are these properties preserved by sampling decoding? The 2D first-order Ambisonic components (W, X, Y) correspond to $\{1, \cos\varphi, \sin\varphi\}$ patterns, a first-order Fourier series in the angle. When sampling the playback directions by $L = 3$ uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval's theorem ensures panning-invariant loudness E for any panning direction. For an ideal $r_E$ measure, however, one more loudspeaker is required, $L \ge 4$, for a uniformly spaced horizontal ring. To explain this increase exhaustively, the concept of circular/spherical polynomials and t-designs will be introduced in this chapter.
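The closed form of Eq. (4.3) and its optimum can also be verified with a small numerical check (our own sketch; the grid-based Riemann sum is effectively exact for these trigonometric integrands):

```python
import math

def re_x(a, n=4096):
    """Riemann-sum evaluation of Eq. (4.3) for g(phi) = 1 + a*cos(phi),
    i.e. for the panning direction phi_s = 0."""
    num = den = 0.0
    for k in range(n):
        phi = 2 * math.pi * k / n
        g2 = (1 + a * math.cos(phi)) ** 2
        num += g2 * math.cos(phi)
        den += g2
    return num / den

a = math.sqrt(2)
print(abs(re_x(a) - 2 * a / (2 + a * a)) < 1e-9)    # matches 2a/(2+a^2)
print(re_x(a) > re_x(1.0) and re_x(a) > re_x(2.0))  # maximum at a = sqrt(2)
```

Both the cardioid (a = 1) and the hypercardioid (a = 2) give $r_{E,x} = 2/3$, visibly below the supercardioid optimum $1/\sqrt{2}$.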
For a brief explanation, $g^2(\varphi)$ is a second-order expression; therefore, to represent the ideal constant loudness $E = \int g^2(\varphi)\, d\varphi$ of the continuous panning function consistently after discretization, $E = \frac{2\pi}{L}\sum_l g_l^2$, it requires $L = 3$ uniformly spaced loudspeakers, as argued before. By contrast, the expressions $g^2(\varphi)\cos\varphi$ and $g^2(\varphi)\sin\varphi$ are third-order and appear in $r_E \cdot E = \int g^2(\varphi)\, [\cos\varphi, \sin\varphi]^{\mathrm{T}}\, d\varphi$. Consequently, ideal mapping of $r_E$ (direction and width) requires at least one more loudspeaker, $L = 4$, for a uniformly spaced arrangement to make the continuous and the discretized form $r_E\, E = \frac{2\pi}{L}\sum_l g_l^2\, [\cos\varphi_l, \sin\varphi_l]^{\mathrm{T}}$ perfectly equal.

Towards a higher-order panning function. An Nth-order cardioid pattern is obtained from the cardioid pattern by taking its Nth power

$$g_\mathrm{N}(\varphi) = \frac{1}{2^\mathrm{N}}\,(1 + \cos\varphi)^\mathrm{N},$$

which makes it narrower. With N = 2 this becomes, using $\cos^2\varphi = \frac{1}{2}(1 + \cos 2\varphi)$,

$$g_2(\varphi) = \frac{1}{4}\,(1 + 2\cos\varphi + \cos^2\varphi) = \frac{1}{8}\,(3 + 4\cos\varphi + \cos 2\varphi).$$

More generally, Chebyshev polynomials $T_m(\cos\varphi) = \cos m\varphi$, cf. [23, Eq. 3.11.6], can be used to argue that there is always a fully equivalent cosine series describing the higher-order 2D panning function in the azimuth angle

$$g(\varphi) = \sum_{m=0}^{\mathrm{N}} a_m \cos m\varphi. \tag{4.4}$$

Rotated panning function. In first-order Ambisonics, panning functions consist of an omnidirectional part, $\cos(0\varphi) = 1$, and a figure-of-eight towards x, $\cos\varphi$, but that was not all: recording and playback also required a figure-of-eight pattern towards y, $\sin\varphi$. The additional component allows rotated first-order directivities to be expressed by a basis set of fixed directivities.
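The counting argument above, that E is already exact with L = 3 while $r_E$ requires L = 4, can be demonstrated numerically; this is our own sketch, using the max-$r_E$ weight $a = \sqrt{2}$:

```python
import math

def sampled_measures(phi_s, L, a=math.sqrt(2)):
    """E and ||rE|| of g(phi) = 1 + a*cos(phi - phi_s) sampled by L
    uniformly spaced loudspeakers."""
    phis = [2 * math.pi * l / L for l in range(L)]
    g2 = [(1 + a * math.cos(p - phi_s)) ** 2 for p in phis]
    E = sum(g2)
    rx = sum(w * math.cos(p) for w, p in zip(g2, phis)) / E
    ry = sum(w * math.sin(p) for w, p in zip(g2, phis)) / E
    return E, math.hypot(rx, ry)

for L in (3, 4):
    Es, rs = zip(*[sampled_measures(ps, L) for ps in (0.0, 0.2, 0.5, 1.0)])
    # E is panning-invariant for both rings, ||rE|| only from L = 4 on
    print(L, round(max(Es) - min(Es), 9), round(max(rs) - min(rs), 4))
```

For L = 3 the third-order term $g^2\cos\varphi$ aliases, so $\|r_E\|$ fluctuates strongly with the panning angle, while for L = 4 it is constant.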
For higher orders, a panning function rotated to a non-zero aiming $\varphi_s \ne 0$,

$$g(\varphi - \varphi_s) = \sum_{m=0}^{\mathrm{N}} a_m \cos[m(\varphi - \varphi_s)], \tag{4.5}$$

can be re-expressed by the addition theorem $\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$ into a series involving the sinusoids (the odd-symmetric part of a Fourier series),

$$g(\varphi - \varphi_s) = \sum_{m=0}^{\mathrm{N}} a_m\, [\cos m\varphi_s \cos m\varphi + \sin m\varphi_s \sin m\varphi] = \sum_{m=0}^{\mathrm{N}} a_m^{(c)} \cos m\varphi + \sum_{m=0}^{\mathrm{N}} a_m^{(s)} \sin m\varphi. \tag{4.6}$$

We conclude: higher-order Ambisonics in 2D (and the associated set of theoretical microphone directivities) is based on the Fourier series in the azimuth angle $\varphi$.

4.2 Higher-Order Polynomials and Harmonics

The previous section required that direction and length of the $r_E$ vector resulting from amplitude panning on loudspeakers match the desired auditory event direction and width. Harmonic functions with strict symmetry around a panning direction $\theta_s$ will help us in achieving this goal and in defining good sampling. Regardless of the dimensionality, be it 2D or 3D, we desire to define continuous and resolution-limited axisymmetric functions around the panning direction $\theta_s$ to fulfill our perceptual goals of panning-invariant loudness E and width $r_E$, and perfect alignment between the panning direction $\theta_s$ and the localized direction $r_E$. Then we hope to find suitable directional discretization schemes for ideal loudspeaker layouts, so that the measures E and $r_E$ are perfectly reconstructed in playback.

The projection of a variable direction vector $\theta$ onto the panning direction $\theta_s$ always yields the cosine of the enclosed angle, $\theta_s^{\mathrm{T}}\theta = \cos\phi$, no matter whether it is in two or three dimensions. Hereby, constructing the panning function based on this projection readily meets the desired goals. The mth power thereof, $(\theta_s^{\mathrm{T}}\theta)^m = \cos^m\phi$, helps to build an Nth-order power series $g = \sum_{m=0}^{\mathrm{N}} a_m\, (\theta_s^{\mathrm{T}}\theta)^m$ to describe a virtual Ambisonic panning function.
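Equation (4.6) is a pure re-expansion, which a quick numerical check (our own sketch) confirms, here using the second-order cardioid coefficients from above, $a = (3/8, 1/2, 1/8)$:

```python
import math

a = [0.375, 0.5, 0.125]   # coefficients of g_2 = (3 + 4cos(phi) + cos(2phi))/8
phi_s = 0.7               # rotation (panning direction)

def rotated_direct(phi):
    """Left-hand side of Eq. (4.6): series in the relative angle."""
    return sum(am * math.cos(m * (phi - phi_s)) for m, am in enumerate(a))

def rotated_series(phi):
    """Right-hand side of Eq. (4.6): fixed cos/sin basis, with the rotation
    absorbed into the coefficients a_m^(c) and a_m^(s)."""
    ac = [am * math.cos(m * phi_s) for m, am in enumerate(a)]
    asn = [am * math.sin(m * phi_s) for m, am in enumerate(a)]
    return (sum(c * math.cos(m * phi) for m, c in enumerate(ac)) +
            sum(s * math.sin(m * phi) for m, s in enumerate(asn)))

for phi in (0.0, 0.9, 2.5):
    assert abs(rotated_direct(phi) - rotated_series(phi)) < 1e-12
```

The fixed cos/sin basis is exactly the set of 2N + 1 signals an Ambisonic bus carries in 2D; rotating the source only changes the coefficients.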
For 2D, such a circular polynomial $g = \sum_{m=0}^{\mathrm{N}} a_m (\theta_s^{\mathrm{T}}\theta)^m$ contains all $(\mathrm{N}+1)(\mathrm{N}+2)/2$ mixed powers by $(\theta_s^{\mathrm{T}}\theta)^m = (\theta_{xs}\theta_x + \theta_{ys}\theta_y)^m = \sum_{k=0}^{m}\binom{m}{k}(\theta_{xs}\theta_x)^k(\theta_{ys}\theta_y)^{m-k}$ of the direction vectors' entries $\theta_s^{\mathrm{T}} = [\theta_{xs}, \theta_{ys}]$ and $\theta = [\theta_x, \theta_y]^{\mathrm{T}}$. However, we could already recognize that it only takes $2\mathrm{N}+1$ functions to express $g = \sum_{m=0}^{\mathrm{N}} a_m (\theta_s^{\mathrm{T}}\theta)^m = \sum_{m=0}^{\mathrm{N}} a_m \cos^m\phi$: first, an initial polynomial in the relative azimuth $\phi = \varphi - \varphi_s$ relating to a harmonic series of $\mathrm{N}+1$ cosines or Chebyshev polynomials, $g = \sum_{m=0}^{\mathrm{N}} b_m \cos m\phi = \sum_{m=0}^{\mathrm{N}} b_m T_m(\theta_s^{\mathrm{T}}\theta)$. Then, in terms of the absolute azimuth $\varphi$, the trigonometric addition theorem re-expresses the series into one of $\mathrm{N}+1$ cosines and N sines, with $T_m(\theta_s^{\mathrm{T}}\theta) = \cos[m(\varphi - \varphi_s)] = \cos m\varphi_s \cos m\varphi + \sin m\varphi_s \sin m\varphi$. As shown in the upcoming section, we can alternatively obtain such orthonormal harmonic functions by solving a second-order differential equation that is generally used to define harmonics, which bears the later benefit that we can use the approach to define spherical harmonics in three space dimensions.

Spherical polynomials are similar, $g = \sum_{n=0}^{\mathrm{N}} a_n (\theta_s^{\mathrm{T}}\theta)^n$, involving the expressions $(\theta_s^{\mathrm{T}}\theta)^n = (\theta_{xs}\theta_x + \theta_{ys}\theta_y + \theta_{zs}\theta_z)^n = \sum_{k=0}^{n}\sum_{l=0}^{n-k}\binom{n}{k}\binom{n-k}{l}(\theta_{zs}\theta_z)^k(\theta_{xs}\theta_x)^l(\theta_{ys}\theta_y)^{n-k-l}$. Again, all these $(\mathrm{N}+1)(\mathrm{N}+2)(\mathrm{N}+3)/6$ combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not in 3D. On the sphere, the $\mathrm{N}+1$ orthogonal Legendre polynomials $P_n(\cos\phi)$ replace the cosine series as a basis for $g = \sum_{n=0}^{\mathrm{N}} c_n P_n(\cos\phi)$, as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in $(\mathrm{N}+1)^2$ spherical harmonics and their addition theorem as a basis in terms of absolute directions, $P_n(\theta_s^{\mathrm{T}}\theta) = \frac{4\pi}{2n+1}\sum_{m=-n}^{n} Y_n^m(\theta_s)\,Y_n^m(\theta)$. Dickins' thesis is interesting for further reading [12].
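For order n = 1, where $P_1(x) = x$ and the spherical harmonics have simple Cartesian forms, the addition theorem can be checked directly. This is our own sketch, assuming real-valued, fully (4π) normalized harmonics:

```python
import math

def Y1(v):
    """Real first-order spherical harmonics (full 4*pi normalization) in
    Cartesian form, ordered m = -1, 0, +1."""
    c = math.sqrt(3 / (4 * math.pi))
    x, y, z = v
    return [c * y, c * z, c * x]

def unit(azimuth, zenith):
    """Unit direction vector from azimuth and zenith angle."""
    return (math.sin(zenith) * math.cos(azimuth),
            math.sin(zenith) * math.sin(azimuth),
            math.cos(zenith))

u, v = unit(0.3, 1.0), unit(2.0, 0.4)
lhs = sum(a * b for a, b in zip(u, v))    # P_1(cos gamma) = cos gamma
rhs = 4 * math.pi / 3 * sum(a * b for a, b in zip(Y1(u), Y1(v)))
assert abs(lhs - rhs) < 1e-12
```

Here the sum over the three harmonics collapses to the dot product of the two direction vectors, which is exactly $\cos\gamma$ of the enclosed angle.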
In both regimes, 2D and 3D, the concept of circular or spherical polynomials will be used to determine optimal layouts, so-called t-designs. Such t-designs are directional sampling grids that are able to keep the information about the constant part of any circular (2D) or spherical (3D) polynomial up to the order N ≤ t. This is a mathematical key property exploited to determine requirements for preserving the E and r_E measures during Ambisonic playback with optimal loudspeaker setups, but not only: t-designs also simplify the numerical integration of circular or spherical harmonics to define state-of-the-art Ambisonic decoders or mapping effects.

4.3 Angular/Directional Harmonics in 2D and 3D

The Laplacian is defined in D-dimensional Cartesian space as

$$\Delta=\sum_{j=1}^{D}\frac{\partial^2}{\partial x_j^2},\tag{4.7}$$

and for any function f, the Laplacian Δf describes the curvature. Any harmonic function is proportional to its curvature by an eigenvalue λ,

$$\Delta f=-\lambda\,f,\tag{4.8}$$

and is therefore an oscillatory function. Generally, eigensolutions Δf = −λf of the Laplacian are called harmonics. For suitable eigenvalues λ, harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval. It seems desirable to find such harmonics for functions only exhibiting directional dependencies, i.e. in the azimuth angle ϕ in 2D, and in the azimuth and zenith angles ϕ, ϑ in 3D.

4.4 Panning with Circular Harmonics in 2D

For two dimensions, Appendix A.3.2 uses the generalized chain rule to convert the Laplacian Δ = ∂²/∂x² + ∂²/∂y² of the 2D Cartesian coordinate system to polar coordinates with the radius r and the angle ϕ to the x axis,

$$\Delta=\frac{1}{r}\frac{\partial}{\partial r}+\frac{\partial^2}{\partial r^2}+\frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2}.$$

For functions Φ = Φ(ϕ) purely in the angle ϕ, all radial derivatives of Δ vanish and there remains (∂ → d)

$$\frac{\mathrm d^2\Phi}{\mathrm d\varphi^2}=-\lambda r^2\,\Phi.\tag{4.9}$$

It only yields useful solutions with λr² = m², m ∈ ℤ, cf. Appendix A.3.4 and Fig. 4.2.
Fig. 4.2: Circular harmonics with m = −3, …, 3, plotted as polar diagrams with the radius R = 20 lg|Φ_m| and grayscale distinguishing positive (gray) and negative (black) signs.

The resulting circular harmonics are

$$\Phi_m(\varphi)=\frac{1}{\sqrt{2\pi}}\begin{cases}\sqrt{2}\,\sin(|m|\varphi),&\text{for } m<0,\\ 1,&\text{for } m=0,\\ \sqrt{2}\,\cos(m\varphi),&\text{for } m>0,\end{cases}\tag{4.10}$$

which defines how to decompose panning functions of limited order |m| ≤ N. The harmonics are periodic in azimuth, and orthogonal and normalized (orthonormal) on the period −π ≤ ϕ ≤ π. Due to their completeness, any square-integrable function g(ϕ) can be expanded into a series of the harmonics using coefficients γ_m,

$$g(\varphi)=\sum_{m=-\infty}^{\infty}\gamma_m\,\Phi_m(\varphi).\tag{4.11}$$

For a known function g(ϕ), the coefficients γ_m are obtained by the transformation integral

$$\gamma_m=\int_{-\pi}^{\pi} g(\varphi)\,\Phi_m(\varphi)\,\mathrm d\varphi,\tag{4.12}$$

as shown in Appendix Eq. (A.14).

2D panning function. An infinitely narrow angular range around a desired direction, |ϕ − ϕ_s| < ε → 0, is represented by the transformation integral over a Dirac delta distribution δ(ϕ − ϕ_s), cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are

$$\gamma_m=\Phi_m(\varphi_\mathrm{s}).\tag{4.13}$$

As the infinite circular harmonic series is complete, the panning function is

$$g(\varphi)=\sum_{m=-\infty}^{\infty}\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi)=\delta(\varphi-\varphi_\mathrm{s}),\tag{4.14}$$

and in practice we resolution-limit it to the Nth Ambisonic order, |m| ≤ N, and use an additional weight a_m that allows us to design its side lobes,

$$g_N(\varphi)=\sum_{m=-N}^{N} a_m\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi).\tag{4.15}$$

Fig. 4.3: 2D unweighted (a_m = 1, basic) and max-r_E-weighted Ambisonic panning functions for the orders N = 1, 2, 5.
The max-r_E panning function [24] uses the weights a_m = cos(πm/(2(N+1))), as derived in Appendix Eq. (A.20). The spread is now adjustable by the order to ±90°/(N+1). The result is shown in Fig. 4.3, compared with no side-lobe suppression, a_m = 1 (basic). It is easy to recognize: Φ_m(ϕ_s) represents the recorded or encoded directions, and Φ_m(ϕ) represents the decoded playback directions.

Optimal sampling of the 2D panning function. In the theory of circular/spherical polynomials in the variable ζ = cos(ϕ − ϕ_s), so-called t-designs in 2D are optimal point sets of given angles {ϕ_l} with l = 1, …, L and size L. A t-design allows to perfectly compute the integral (constant part) over the polynomials P_m(ζ) of limited degree m ≤ t by discrete summation,

$$\int_{-\pi}^{\pi} P_m(\cos\phi)\,\mathrm d\phi=\frac{2\pi}{L}\sum_{l=1}^{L} P_m[\cos(\varphi_l-\varphi_\mathrm{s})],\tag{4.16}$$

regardless of any angular shift ϕ_s. In 2D, the Chebyshev polynomials T_m(cos φ) = cos(mφ) are orthogonal polynomials, therefore an Nth-order panning function composed of cos(mφ) is always a polynomial of Nth degree. Knowing this, it is clear that the integral over g_N², required to evaluate the loudness measure E, is a polynomial of the order 2N. The integral to calculate r_E is over g_N² cos φ and thus of the order 2N+1. In playback, to get a perfectly panning-invariant loudness measure E of the continuous panning function and also the perfectly oriented r_E vector of constant spread arccos ‖r_E‖, the parameter t must be t ≥ 2N+1. In 2D, all regular polygons are t-designs with L = t+1 points,

$$\varphi_l=\frac{2\pi}{t+1}(l-1).\tag{4.17}$$

We can use the smallest set of 2N+2 angles, ϕ_l = (180°/(N+1))(l−1), as optimal 2D layout.
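The panning-invariance claims are easy to reproduce numerically. A minimal sketch (assumptions: NumPy; the harmonics of Eq. (4.10); the max-r_E weights above; the smallest optimal ring of L = 2N+2 loudspeakers; the r_E vector evaluated from the squared sampled gains as in Chap. 2):

```python
import numpy as np

def circ_harmonics(N, phi):
    """Circular harmonics Phi_m(phi) of Eq. (4.10), m = -N..N."""
    m = np.arange(-N, N + 1)
    y = np.where(m < 0, np.sqrt(2) * np.sin(np.abs(m) * phi),
                 np.where(m > 0, np.sqrt(2) * np.cos(m * phi), 1.0))
    return y / np.sqrt(2 * np.pi)

N = 3
L = 2 * N + 2                               # smallest optimal ring, L = 2N+2
phi_l = np.pi / (N + 1) * np.arange(L)      # loudspeaker angles, cf. Eq. (4.17)
a = np.cos(np.pi * np.abs(np.arange(-N, N + 1)) / (2 * (N + 1)))   # max-rE

def gN(phi, phi_s):
    """Weighted panning function of Eq. (4.15)."""
    return np.sum(a * circ_harmonics(N, phi_s) * circ_harmonics(N, phi))

E_all, err_all = [], []
for phi_s in np.linspace(0, 2 * np.pi, 37):
    g = np.array([gN(p, phi_s) for p in phi_l])    # sampled at the ring
    E = np.sum(g**2)                               # loudness measure
    rE = g**2 @ np.stack([np.cos(phi_l), np.sin(phi_l)], axis=1) / E
    # angular error of the rE direction, wrapped to (-pi, pi]
    err = (np.arctan2(rE[1], rE[0]) - phi_s + np.pi) % (2 * np.pi) - np.pi
    E_all.append(E)
    err_all.append(err)
E_all, err_all = np.array(E_all), np.array(err_all)
```

On this t ≥ 2N+1 design, E stays constant and r_E aims exactly at the panning direction for every ϕ_s, to machine precision.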
4.5 Ambisonic Encoding and Optimal Decoding in 2D

To encode a signal s into the Ambisonic signals χ_m, we multiply the signal with the encoder representing the direction of the signal at the angle ϕ_s, i.e. with the weights Φ_m(ϕ_s),

$$\chi_m(t)=\Phi_m(\varphi_\mathrm{s})\,s(t),\tag{4.18}$$

or in vector notation,

$$\boldsymbol\chi_N=\boldsymbol y_N(\varphi_\mathrm{s})\,s,\tag{4.19}$$

using the column vector y_N = [Φ_{−N}(ϕ_s), …, Φ_N(ϕ_s)]^T of 2N+1 components. The Ambisonic signals in χ_N are weighted by the side-lobe-suppressing weights a_N = [a_{|−N|}, …, a_N]^T, expressed by multiplication with a diagonal matrix diag{a_N}, and then decoded to the L loudspeaker signals x by a sampling decoder

$$\boldsymbol D=\frac{2\pi}{L}\left[\boldsymbol y_N(\varphi_1),\dots,\boldsymbol y_N(\varphi_L)\right]^\mathrm{T}=\frac{2\pi}{L}\,\boldsymbol Y_N^\mathrm{T},\tag{4.20}$$

using

$$\boldsymbol x=\boldsymbol D\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol\chi_N.\tag{4.21}$$

In total, the system for encoding and decoding can also be written to yield a set of loudspeaker gains for one virtual source,

$$\boldsymbol g=\boldsymbol D\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\varphi_\mathrm{s}),\tag{4.22}$$

or in particular for the 2D sampling decoder, g = (2π/L) Y_N^T diag{a_N} y_N(ϕ_s).

4.6 Listening Experiments on 2D Ambisonics

There are several listening experiments discussing the features of Ambisonics, most of which are summarized in [25]; they are discussed below, complemented with those from [26]. The perceptually adjusted panning angle of 2nd-order max-r_E Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4, similar to MDAP in Fig. 3.8, but with a slightly more accurate median, by 0.5° on average, in particular at side and back panning directions.

Fig. 4.4: 2nd-order max-r_E-weighted Ambisonic panning with pink noise on a horizontal ring of 60°-spaced loudspeakers, adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15°.
Markers and whiskers indicate 95% confidence intervals and medians; the black curve shows the r_E vector model.

Fig. 4.5: Frank's 2008 pointing experiment [27] on central and off-center listening seats for 3 virtual sources (A, B, C), using 1st-order (left) and 5th-order (right) Ambisonics on 12 horizontal loudspeakers (IEM CUBE), indicates more stable localization with high orders. Moreover, for 5th order, max-r_E weighting and omission of delay compensation were preferred. Omission of the max-r_E weights ("basic") or alternative "in-phase" weights that entirely suppress any side lobe yield less precise localization at off-center listening positions.

Fig. 4.6: Experiments at an off-center position show (a) that max-r_E outperforms the basic, rectangularly truncated Fourier series at off-center listening positions, (b) where it can avoid splitting of the auditory event.
Stitt's experiments (c) imply that localization with higher orders is more stable, and that the localization deficiency at off-center listening seats seems to be proportional to the ratio of the distance from the center to the radius of the loudspeaker ring, and not to the specific time delays, which are larger for large loudspeaker rings, cf. [28].

Another aspect to investigate is how stable the results are for central and off-center listening seats, as shown in Fig. 4.5. It illustrates that max-r_E with the highest order achieves the best localization stability at off-center listening seats. Astonishingly, the delay compensation for non-uniform delay times to the center deteriorated the results, most probably because of the nearly linear frontal arrangement of loudspeakers, which is more robust to lateral shifts of the listening position than a circular arrangement.

Fig. 4.7: Predicted sweet area sizes using the r_E model of Sect. 2.2.9 for the loudspeaker layouts and playback orders used in Stitt's experiments [28]: first order (top), third order (bottom), small (left) and large (right) arrangement.

Figure 4.6a, b shows the direction histograms for two different weightings a_m and illustrates that proper side-lobe suppression of the panning function by max-r_E weights is decisive at shifted listening positions to avoid splitting of the auditory image, as appears in Fig. 4.6b without the weights (basic). Peter Stitt's work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet area model from Sect. 2.2.9 for the first order (top row) and third order (bottom row) in Fig. 4.7,
with both the small setup (left) and the large setup (right).

Fig. 4.8: The perceptual sweet area as investigated by Frank [29] nearly covers the entire area enclosed by the IEM CUBE playback setup (black = 5th-, gray = 3rd-, light gray = 1st-order Ambisonics): (a) listening area size experiment, (b) sweet area for frontal sound. The area is smallest for 1st-order Ambisonics.

Fig. 4.9: Frank's 2013 experiments showed for max-r_E Ambisonics on three frontal loudspeakers that large perceived widths are not entirely accurately modeled by the optimally sampled, therefore constant, r_E vector for low orders, however reasonably well, and significantly better for high orders.

Frank's 2016 experiments [29] used scales on the floor from which listeners read off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b, the criterion for listeners to indicate leaving the sweet area was the frontally panned sound being mapped outside the loudspeakers L, C, and R. It showed that a sweet area providing perceptually plausible playback measures at least 2/3 of the radius of the loudspeaker setup if the order is high enough. The perceived width of auditory events is investigated in the experimental results of Fig. 4.9 [25], in which pink noise was frontally panned for different orientations of the loudspeaker ring (with one loudspeaker in front, or with the front direction lying quarter- and half-spaced with respect to the loudspeaker spacing).
Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with L = 2N+2 loudspeakers provides a constant r_E length.

Fig. 4.10: Frank's 2013 experiments on the variation of the sound coloration of virtual sources rotating at a speed of 100°/s imply that max-r_E weighting outperforms the "basic" weighting with a_m = 1, and that adjusting the number of loudspeakers to the Ambisonic order seems reasonable.

The panning-invariant r_E length is not perfectly reflected in the perceived widths with 3rd order on 8 loudspeakers, for which the on-loudspeaker position is perceived as significantly wider. By contrast, the high-order experiment with 7th order on 16 loudspeakers perfectly validates the model. Figure 4.10 shows experiments investigating the time-variant change of sound coloration for a pink-noise virtual source rotating at a speed of 100°/s, for different Ambisonic panning setups. There is an obvious advantage, at both the central and the off-center listening position, of a reduced fluctuation in coloration when using the side-lobe-suppressing max-r_E weighting instead of the basic, rectangular truncation of the Fourier series. At the off-center listening position, max-r_E weights achieve good results with regard to constant coloration for both investigated arrangements, 3rd order on 8 and 7th order on 16 loudspeakers.

How well would diffuse signals be preserved in playback? All the above experiments deal with how non-diffuse signals are presented. To complement what is shown in Fig. 1.21 of Chap. 1,
the relation between the Ambisonic order and its ability to preserve diffuse fields is estimated here by the covariance between uncorrelated directions. Assume a max-r_E-weighted Nth-order Ambisonic panning function g(θ_s^T θ), normalized to g(1) = 1, that encodes two sounds s_{1,2} from the two directions θ_1 and θ_2, with the sounds being uncorrelated and of unit variance, E{s_i s_j} = δ_{ij}. We can find that the Ambisonic representation mixes the sounds at their respective mapped directions, x_1 = s_1 + g_{12} s_2 and x_2 = s_2 + g_{12} s_1, using the crosstalk gain g_{12} = g(cos φ), and yields an increase of their correlation:

$$R\{x_1x_2\}=\frac{E\{x_1x_2\}}{\sqrt{E\{x_1^2\}\,E\{x_2^2\}}}=\frac{E\{(1+g_{12}^2)\,s_1s_2+g_{12}(s_1^2+s_2^2)\}}{\sqrt{E\{s_1^2+2g_{12}s_1s_2+g_{12}^2s_2^2\}\,E\{s_2^2+2g_{12}s_1s_2+g_{12}^2s_1^2\}}}=\frac{2\,g_{12}}{1+g_{12}^2}.\tag{4.23}$$

This result was presented in Fig. 1.21 and was used to argue that the directional separation of first-order Ambisonics, with its high crosstalk term g_{12}, might be too weak. Higher-order Ambisonics decreases this directional crosstalk and therefore improves the representation of diffuse sound fields.

4.7 Panning with Spherical Harmonics in 3D

In three space dimensions, the spherical coordinate system has a radius r and two angles: the azimuth ϕ, indicating the polar angle of the orthogonal projection onto the xy plane, and the zenith angle ϑ, indicating the angle to the z axis, according to the right-handed spherical coordinate systems in ISO 31-11 and ISO 80000-2 [30, 31], Fig. 4.11 (the spherical coordinate system). By the generalized chain rule, Appendix A.3 re-writes the Laplacian in spherical coordinates in 3D, with r signifying the radius, ϕ the azimuth angle, and the zenith angle ϑ re-expressed as ζ = z/r = cos ϑ, yielding the operator

$$\Delta=\frac{\partial^2}{\partial r^2}+\frac{2}{r}\frac{\partial}{\partial r}+\frac{1}{r^2(1-\zeta^2)}\frac{\partial^2}{\partial\varphi^2}-\frac{2\zeta}{r^2}\frac{\partial}{\partial\zeta}+\frac{1-\zeta^2}{r^2}\frac{\partial^2}{\partial\zeta^2}.$$
Any radius-dependent part is removed to define an eigenproblem yielding the basis for panning functions, taking only r²Δ_{ϕ,ζ,3D}:

$$\left[\frac{1}{1-\zeta^2}\frac{\partial^2}{\partial\varphi^2}-2\zeta\frac{\partial}{\partial\zeta}+(1-\zeta^2)\frac{\partial^2}{\partial\zeta^2}\right]Y=-\lambda\,Y,\tag{4.24}$$

whose solutions with λ = n(n+1) define the spherical harmonics

$$Y_n^m(\boldsymbol\theta)=Y_n^m(\varphi,\vartheta)=\Theta_n^m(\vartheta)\,\Phi_m(\varphi).\tag{4.25}$$

The prerequisites are (i) periodicity in ϕ and (ii) that the function Y_n^m is finite on the sphere. In addition to the circular harmonics Φ_m expressing the dependency on the azimuth ϕ according to Eq. (4.10), the spherical harmonics contain the associated Legendre functions P_n^{|m|} and their normalization term,

$$\Theta_n^m(\vartheta)=N_n^{|m|}\,P_n^{|m|}(\cos\vartheta),\tag{4.26}$$

to express the dependency on the zenith angle ϑ. The index n ≥ 0 expresses the order, and the directional resolution can be limited by requiring 0 ≤ n ≤ N. The index m is the degree, and for each n it is limited by −n ≤ m ≤ n. The spherical harmonics, Fig. 4.12, are orthonormal on the sphere −π ≤ ϕ ≤ π, 0 ≤ ϑ ≤ π, and for unbounded order N → ∞ they are complete; see also Appendix A.3.7. The spherical harmonics permit a series representation of square-integrable 3D directional functions by the coefficients γ_nm,

$$g(\boldsymbol\theta)=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}\,Y_n^m(\boldsymbol\theta).\tag{4.27}$$

From a known function g(θ), the coefficients are obtained by the transformation integral over the unit sphere 𝕊², cf. Appendix Eq. (A.38),

$$\gamma_{nm}=\int_{\mathbb S^2} g(\boldsymbol\theta)\,Y_n^m(\boldsymbol\theta)\,\mathrm d\theta.\tag{4.28}$$

Note that the above N3D normalization, ∫_{𝕊²} |Y_n^m(θ)|² dθ = 1, defines each spherical harmonic only up to an arbitrary phase it might be multiplied with. Legendre functions for the zenith dependency might be defined differently in the literature, and for azimuth, some implementations use sin(mϕ) instead of sin(|m|ϕ). In Ambisonics, real-valued functions and the SN3D normalization, proportional to √((n−|m|)!/(n+|m|)!), are
preferred, as are positive signs of the first-order dipole components in the directions of the respective coordinate axes x, y, z. This might require involving the Condon–Shortley phase (−1)^m to correct the signs of the Legendre functions, or a factor −1 for m < 0 to correct the sign of the azimuthal sinusoids, depending on the implementation of the respective functions. It is often helpful to employ converters and directional checks to ensure compatibility!

Fig. 4.12: Spherical harmonics indexed by the Ambisonic channel number ACN = n² + n + m; the rows show the spherical harmonics of the orders 0 ≤ n ≤ 3, each with the 2n+1 harmonics of the degrees −n ≤ m ≤ n. Plotted are polar diagrams with the radius R = 20 lg|Y_n^m|, normalized to the upper 30 dB of each pattern, with positive (gray) and negative (black) color indicating the sign. The order n counts the circular zero crossings, and |m| counts those running through zenith and nadir.

3D panning function. An infinitely narrow directional range around a desired direction, θ_s^T θ > cos ε → 1, is represented by the transformation integral over the Dirac delta δ(1 − θ_s^T θ), cf. Eq. (A.41), so that the coefficients of the panning function are

$$\gamma_{nm}=Y_n^m(\boldsymbol\theta_\mathrm{s}).\tag{4.29}$$

As the infinitely many spherical harmonics are complete, the panning function is

$$g(\boldsymbol\theta)=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}Y_n^m(\boldsymbol\theta_\mathrm{s})\,Y_n^m(\boldsymbol\theta)=\delta(1-\boldsymbol\theta_\mathrm{s}^\mathrm{T}\boldsymbol\theta),\tag{4.30}$$

and in practice, the finite-resolution Nth-order panning function with n ≤ N employs a weight a_n to reduce side lobes and optimize the spread,

$$g_N(\boldsymbol\theta)=\sum_{n=0}^{N}\sum_{m=-n}^{n}a_n\,Y_n^m(\boldsymbol\theta_\mathrm{s})\,Y_n^m(\boldsymbol\theta).\tag{4.31}$$

The max-r_E panning function uses the weights a_n = P_n(cos(137.9°/(N+1.51))), as derived in Appendix Eq. (A.46). The spread is now adjustable by the order to ±137.9°/(N+1.51). Figure 4.13 shows a comparison to the basic weighting a_n = 1.
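A minimal sketch of these weights (assuming NumPy's Legendre module; the order N = 3 is an arbitrary example):

```python
import numpy as np
from numpy.polynomial import legendre

def max_re_weights(N):
    """Max-rE weights a_n = P_n(cos(137.9 deg / (N + 1.51))), n = 0..N."""
    x = np.cos(np.deg2rad(137.9 / (N + 1.51)))
    return np.array([legendre.legval(x, np.eye(n + 1)[n])
                     for n in range(N + 1)])

a = max_re_weights(3)          # a[0] = 1; weights decay towards higher n
spread = 137.9 / (3 + 1.51)    # half-width of the adjustable spread, degrees
```

The zeroth weight is always 1 (since P_0 = 1), and the weights decay monotonically, which is what suppresses the side lobes.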
An alternative expression that uses Legendre polynomials P_n and only depends on the angle φ to the panning direction θ_s is obtained by replacing the sum over m by the spherical harmonics addition theorem, Σ_{m=−n}^{n} Y_n^m(θ_s) Y_n^m(θ) = (2n+1)/(4π) P_n(cos φ):

$$g_N(\phi)=\sum_{n=0}^{N}\frac{2n+1}{4\pi}\,a_n\,P_n(\cos\phi).\tag{4.32}$$

Comparison to first-order Ambisonics shows: now Y_n^m(θ_s) represents the recorded or encoded directions, and Y_n^m(θ) represents the decoded playback directions.

Fig. 4.13: 3D unweighted (a_n = 1, basic) and max-r_E-weighted Ambisonic panning functions for the orders N = 1, 2, 5.

Optimal sampling of the 3D panning function. In the theory of spherical polynomials in the variable ζ = θ_s^T θ, so-called t-designs describe point sets of given directions {θ_l} with l = 1, …, L and size L that allow to perfectly compute the integral (constant part) over the polynomials P_n(ζ) of limited order n ≤ t by discrete summation,

$$\int_{-\pi}^{\pi}\mathrm d\varphi\int_{-1}^{1}P_n(\zeta)\,\mathrm d\zeta=\frac{4\pi}{L}\sum_{l=1}^{L}P_n(\boldsymbol\theta_\mathrm{s}^\mathrm{T}\boldsymbol\theta_l),\tag{4.33}$$

relative to any axis θ_s the point set is projected onto. In 3D, the Legendre polynomials P_n(ζ) are orthogonal polynomials, therefore an Nth-order panning function composed thereof is a polynomial of Nth order. The loudness measure E is calculated by the integral over g_N², therefore over a polynomial of the order 2N. The integral to calculate r_E runs over g_N² ζ, therefore over a polynomial of the order 2N+1. In playback, to get a perfectly panning-invariant loudness measure E of the continuous panning function and also the perfectly oriented r_E vector of constant spread arccos ‖r_E‖, the parameter t must be t ≥ 2N+1.
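Eq. (4.33) is easy to check numerically. A sketch (assuming NumPy; the octahedron corners as a known spherical 3-design; the projection axis is an arbitrary example): the discrete average reproduces the vanishing integrals of P_1 to P_3, but fails for n = 4, as expected of a design with t = 3.

```python
import numpy as np
from numpy.polynomial import legendre

# Corners of the octahedron, a spherical 3-design:
octa = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
theta_s = np.array([0.3, -0.5, 0.81])          # arbitrary projection axis
theta_s /= np.linalg.norm(theta_s)

zeta = octa @ theta_s                           # zeta_l = theta_s^T theta_l
avg = [4 * np.pi / len(octa)
       * np.sum(legendre.legval(zeta, np.eye(n + 1)[n]))
       for n in range(5)]
# avg[0] = 4*pi (constant part); avg[1..3] vanish like the true integrals;
# avg[4] generally does not, since t = 3 < 4.
```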
In 3D there are only 5 geometrically regular layouts:

• the tetrahedron, L = 4 corners, is a 2-design,
• the octahedron, L = 6 corners, is a 3-design,
• the hexahedron (cube), L = 8 corners, is a 3-design,
• the icosahedron, L = 12 corners, is a 5-design,
• the dodecahedron, L = 20 corners, is a 5-design.

For instance, for N = 1, the octahedron is a suitable spherical design; for N = 2, the icosahedral or dodecahedral layouts are suitable. Exceeding the geometrically regular layouts, there are designs found by optimization to be regular under the mathematical rule to approximate ∫_{𝕊²} Y_n^m(θ) dθ = √(4π) δ_{n0}δ_{m0} accurately by (4π/L) Σ_l Y_n^m(θ_l) for all n ≤ t and |m| ≤ n. A large collection of such designs by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34] is available on the following websites:

http://neilsloane.com/sphdesigns/dim3/
http://homepage.univie.ac.at/manuel.graef/quadrature.php (Chebyshev-type quadratures on 𝕊²), and
https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html.

Figure 4.14 gives some graphical examples (t-designs from Gräf's website, Chebyshev-type quadrature).

4.8 Ambisonic Encoding and Optimal Decoding in 3D

To encode a signal s into the Ambisonic signals χ_nm, we multiply the signal with the encoder representing the direction θ_s of the signal, i.e. with the weights Y_n^m(θ_s),

$$\chi_{nm}(t)=Y_n^m(\boldsymbol\theta_\mathrm{s})\,s(t),\tag{4.34}$$

or in vector notation,

$$\boldsymbol\chi_N=\boldsymbol y_N(\boldsymbol\theta_\mathrm{s})\,s,\tag{4.35}$$

using the column vector y_N = [Y_0^0(θ_s), Y_1^{−1}(θ_s), …, Y_N^N(θ_s)]^T of (N+1)² components. The Ambisonic signals in χ_N are weighted by the side-lobe-suppressing weights a_N = [a_0, a_1, a_1, a_1, a_2, …, a_N]^T, expressed by multiplication with a diagonal matrix diag{a_N}, and then decoded to the L loudspeaker signals x by a sampling decoder

$$\boldsymbol D=\frac{4\pi}{L}\left[\boldsymbol y_N(\boldsymbol\theta_1),\dots,\boldsymbol y_N(\boldsymbol\theta_L)\right]^\mathrm{T}=\frac{4\pi}{L}\,\boldsymbol Y_N^\mathrm{T},\tag{4.36}$$

using

$$\boldsymbol x=\boldsymbol D\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol\chi_N.\tag{4.37}$$

In total, the system for encoding and decoding, Eqs. (4.35) and (4.36),
can also be written to yield loudspeaker gains for one signal,

$$\boldsymbol g=\boldsymbol D\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\boldsymbol\theta_\mathrm{s}),\tag{4.38}$$

or in particular for the 3D sampling decoder, g = (4π/L) Y_N^T diag{a_N} y_N(θ_s).

4.9 Ambisonic Decoding to Loudspeakers

Ambisonic decoding to loudspeakers has been dealt with by numerous researchers: in the past particularly because the results are not very stable for first-order Ambisonics, and later because they strongly depend on how uniform the loudspeaker layout is for higher-order Ambisonics. Moreover, Solvang found that even the use of too many loudspeakers has a degrading effect [35]. For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited; for higher-order Ambisonic decoding, one can, e.g., find works by Daniel on max-r_E [37] and pseudo-inverse decoding [10], and also by Poletti [14, 38, 39]. What turned out to be the most practical solution is the All-Round Ambisonic Decoding approach (AllRAD), due to its feature of allowing imaginary loudspeaker insertion and downmix as described in the sections above, cf. [40]. It moreover does not impose restrictions on the Ambisonic order, which for other decoders often yields poor controllability of panning-dependent fluctuations in loudness and directional mapping errors.

The playable set of directions θ_l or ϕ_l is usually finite and discrete, and it is represented by the surrounding loudspeakers' directions. The directional distribution of the surrounding loudspeakers is typically not a t-design (with t ≥ 2N+1 in general), and in 2D sometimes not even a regular polygon with L ≥ 2N+2 loudspeakers. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.

4.9.1 Sampling Ambisonic Decoder (SAD)

The sampling decoder as introduced above is the simplest decoding method. For two dimensions (D = 2) and three (D = 3), it uses the matrix Y_N = [y_N(θ_1), …, y_N(θ_L)]
containing the respective circular or spherical harmonics y_N(θ) sampled at the loudspeaker directions {θ_l}:

$$\boldsymbol D=\frac{S_{D-1}}{L}\,\boldsymbol Y_N^\mathrm{T},\tag{4.39}$$

with the circumference of the unit circle denoted as S_1 = 2π, or the surface of the unit sphere written as S_2 = 4π. The factor S_{D−1}/L expresses that each loudspeaker synthesizes a fraction of the E measure on the circle or sphere of the surrounding directions. However, the sampling decoder would neither yield perfectly constant loudness and width measures E and r_E, nor a correct aiming of the localization measure r_E, if the loudspeaker layout weren't optimal. For instance, concerning loudness: for panning towards directional regions of poor loudspeaker coverage, sampling misses out the main lobe of the panning function, yielding a noticeably reduced loudness.

4.9.2 Mode-Matching Decoder (MAD)

The mode-matching method is used in [10, 39] and yields a fundamentally different decoder design. Its concept is to re-encode the gain vector g of the loudspeakers for any panning direction θ_s by the encoding matrix Y_N = [y_N(θ_1), …, y_N(θ_L)] of all loudspeaker directions {θ_l}. Ideally, the re-encoded result should match the encoding of the panning direction with side lobes suppressed, Y_N g = diag{a_N} y_N(θ_s). Using the definition g = D diag{a_N} y_N(θ_s) of the panning gains, we obtain

$$\boldsymbol Y_N\,\boldsymbol D\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\boldsymbol\theta_\mathrm{s})=\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\boldsymbol\theta_\mathrm{s}),\qquad\Rightarrow\qquad \boldsymbol D=\boldsymbol Y_N^\mathrm{T}\,(\boldsymbol Y_N\boldsymbol Y_N^\mathrm{T})^{-1},\tag{4.40}$$

so that the decoder D is required to be a right inverse of the matrix Y_N, i.e. Y_N D = Y_N Y_N^T (Y_N Y_N^T)^{−1} = I, see Eq. (A.63) in Appendix A.4. For the inverse of Y_N Y_N^T to exist, it is necessary to have at least as many loudspeakers as harmonics, i.e. L ≥ (N+1)² for D = 3 or L ≥ 2N+1 for D = 2.
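On an optimal uniform ring the two constructions coincide, since there (2π/L) Y_N Y_N^T = I, so the right inverse of Eq. (4.40) reduces to the sampling decoder of Eq. (4.39). A quick 2D sketch (assuming NumPy and the circular harmonics of Eq. (4.10)):

```python
import numpy as np

def circ_harmonics(N, phi):
    """Circular harmonics Phi_m(phi) of Eq. (4.10), m = -N..N."""
    m = np.arange(-N, N + 1)
    y = np.where(m < 0, np.sqrt(2) * np.sin(np.abs(m) * phi),
                 np.where(m > 0, np.sqrt(2) * np.cos(m * phi), 1.0))
    return y / np.sqrt(2 * np.pi)

N, L = 3, 8
phi_l = 2 * np.pi * np.arange(L) / L                     # uniform ring
YN = np.stack([circ_harmonics(N, p) for p in phi_l]).T   # (2N+1) x L
D_sad = 2 * np.pi / L * YN.T                             # Eq. (4.39), S_1 = 2*pi
D_mad = YN.T @ np.linalg.inv(YN @ YN.T)                  # Eq. (4.40)
```

Both matrices are identical here; on an irregular layout they diverge, which is exactly where their differing behavior (next sections) becomes relevant.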
However, this is not yet a sufficient criterion: in directions poorly covered with loudspeakers, the inversion boosts the loudness, so that (Y_N Y_N^T)^{−1} is often numerically ill-conditioned unless the loudspeaker layout is at least uniformly designed. Mode-matching decoding is ill-conditioned on hemispherical or semicircular loudspeaker layouts. The solution is equivalently described by the more general pseudo-inverse Y_N^†, which is a right inverse for fat matrices.

4.9.3 Energy Preservation on Optimal Layouts

For instance, for an order of N = 2, 2D Ambisonics should work optimally with a ring of 45°-spaced loudspeakers on the horizon, a circular (2N+1)-design, or for 3D with a spherical (2N+1)-design. On a t-design selected by t ≥ 2N, the loudness measure E is panning-invariant in general,

$$E=\|\boldsymbol g\|^2=\boldsymbol y_N^\mathrm{T}(\boldsymbol\theta_\mathrm{s})\,\mathrm{diag}\{\boldsymbol a_N\}\,\underbrace{\boldsymbol D^\mathrm{T}\boldsymbol D}_{=\boldsymbol I}\,\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\boldsymbol\theta_\mathrm{s})=\|\mathrm{diag}\{\boldsymbol a_N\}\,\boldsymbol y_N(\boldsymbol\theta_\mathrm{s})\|^2=\mathrm{const.}$$

This is because a t ≥ 2N-design discretization preserves orthonormality,

$$\int\boldsymbol y_N(\boldsymbol\theta)\,\boldsymbol y_N^\mathrm{T}(\boldsymbol\theta)\,\mathrm d\theta=\frac{S_{D-1}}{L}\sum_{l=1}^{L}\boldsymbol y_N(\boldsymbol\theta_l)\,\boldsymbol y_N^\mathrm{T}(\boldsymbol\theta_l)=\frac{S_{D-1}}{L}\,\boldsymbol Y_N\boldsymbol Y_N^\mathrm{T}=\boldsymbol I,\tag{4.41}$$

which implies D^T D = (S_{D−1}/L) Y_N Y_N^T = I for the sampling decoder, and we notice the panning-invariant norm of g(θ) within its coefficients γ_N = diag{a_N} y_N(θ_s) by the Parseval theorem, ∫ g²(θ) dθ = ‖γ_N‖².

Fig. 4.15: Analysis of loudness, localization error, and width for 3rd-order sampling (SAD) and mode-matching (MAD) Ambisonic decoding over all panning angles on a sub-optimal 45°-spaced loudspeaker ring with a gap at −90°, compared to VBIP.

The panning-invariant E measure also holds for the mode-matching decoder using a t ≥ 2N design, as it then becomes equivalent to a sampling decoder,

$$\boldsymbol D=\boldsymbol Y_N^\mathrm{T}\,(\boldsymbol Y_N\boldsymbol Y_N^\mathrm{T})^{-1}=\boldsymbol Y_N^\mathrm{T}\Bigl(\frac{L}{S_{D-1}}\,\boldsymbol I\Bigr)^{-1}=\frac{S_{D-1}}{L}\,\boldsymbol Y_N^\mathrm{T}.$$
Under these ideal conditions, both decoders are energy-preserving.

4.9.4 Loudness Deficiencies on Sub-optimal Layouts

For 2D layouts, Fig. 4.15 shows what happens if a decoder is calculated for a t ≥ 2N+1 design with one loudspeaker removed: while, for panning across the gap, the sampling Ambisonic decoder (SAD) yields a quieter signal, moderate localization errors, and width fluctuations, the mode-matching decoder (MAD) yields a strong loudness increase and severe jumps in localization and width. MAD is therefore not very practical on sub-optimal layouts, and SAD only slightly more so.

4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)

To establish panning-invariant loudness for decoding to non-uniform surround loudspeaker layouts, one can ensure a constant loudness measure E by enforcing D^T D = I, which is otherwise only achieved on t ≥ 2N designs. We may search for a decoding matrix D whose entries are closest to the sampling decoder under the constraint of being column-orthogonal:

$$\Bigl\|\boldsymbol D-\frac{S_{D-1}}{L}\,\boldsymbol Y_N^\mathrm{T}\Bigr\|_\mathrm{Fro}^2\to\min\qquad\text{subject to}\quad \boldsymbol D^\mathrm{T}\boldsymbol D=\boldsymbol I.\tag{4.42}$$

The singular value decomposition

$$\boldsymbol Y_N^\mathrm{T}=\boldsymbol U\,[\mathrm{diag}\{\boldsymbol s\},\;\boldsymbol 0]^\mathrm{T}\,\boldsymbol V^\mathrm{T}\tag{4.43}$$

can be used to create

$$\boldsymbol D=\boldsymbol U\,[\boldsymbol I,\;\boldsymbol 0]^\mathrm{T}\,\boldsymbol V^\mathrm{T}\tag{4.44}$$

by replacing the singular values s with ones. Such a decoder is column-orthogonal: as the singular value decomposition delivers U^T U = I and V V^T = I, it follows that D^T D = V [I, 0] U^T U [I, 0]^T V^T = V V^T = I. The energy-preserving decoder in this basic version requires L ≥ 2N+1 loudspeakers in 2D or L ≥ (N+1)² in 3D to work. Note that if the loudspeaker setup directions already form a t ≥ 2N design, the sampling, mode-matching, and energy-preserving decoders are equivalent.

4.9.6 All-Round Ambisonic Decoding (AllRAD)

In Chap. 3
on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. Hereby E = const., r_E ≈ r_E θ_s, and r_E ≈ const., and this works for nearly any loudspeaker layout. While, to calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of ±α around the panning direction θ_s, one could also think of superimposing a quasi-continuous distribution of virtual sources weighted by a continuous panning function g(θ). The ideal continuous panning function of axisymmetric directional spread around the panning direction θ_s is described by the Ambisonic panning function, g(θ) = y_N^T(θ) diag{a_N} y_N(θ_s). This rotation-invariant continuous function is optimal in terms of the loudness, width, and localization measures, which are all evaluated by continuous integrals: E = ∫ g²(θ) dθ = const. expresses panning-invariant loudness, and r_E = (1/E) ∫ g²(θ) θ dθ = r_E θ_s indicates a perfect alignment with the panning direction and a panning-invariant width, r_E = const. However, the optimal values of these integrals are only preserved by discretization with optimal t ≥ 2N+1-design loudspeaker layouts.

All-round Ambisonic decoding (AllRAD) is preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning, g_AllRAD(θ) = D y_N(θ), by a decoder D whose result matches best with VBAP, g_VBAP(θ). Without max-r_E weights yet, we use this here to define AllRAD by a minimum-mean-square-error problem over all panning directions θ:

$$\min_{\boldsymbol D}\int_{\mathbb S^2}\|\boldsymbol g_\mathrm{VBAP}(\boldsymbol\theta)-\boldsymbol D\,\boldsymbol y_N(\boldsymbol\theta)\|^2\,\mathrm d\theta.\tag{4.45}$$
(4.45) Equivalently, as described by Zotter and Frank [40], who coined the name, we may define AllRAD as VBAP synthesis on the physical loudspeakers when using as multiple-virtual-source inputs the Ambisonic panning function $g_\mathrm{AMBI}(\boldsymbol{\theta}) = \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_\mathrm{s})$ sampled at an optimal layout of virtual loudspeakers. Here, we write the synthesis as the integral over infinitely many virtual loudspeakers θ,

$\boldsymbol{g} = \int \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})\, g_\mathrm{AMBI}(\boldsymbol{\theta})\, \mathrm{d}\theta = \int \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_\mathrm{s})\, \mathrm{d}\theta = \underbrace{\int \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{d}\theta}_{=:\, D}\; \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_\mathrm{s}).$

We can obviously pull the term $\mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_\mathrm{s})$ out of the integral. The remaining integral defines the AllRAD matrix D. We may interpret it as a transformation of the VBAP loudspeaker gain functions $\boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})$ into spherical harmonic coefficients. In the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual loudspeakers,

$D = \int \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{d}\theta = \frac{S_\mathrm{D}}{\hat{L}} \sum_{l=1}^{\hat{L}} \boldsymbol{g}_\mathrm{VBAP}(\hat{\boldsymbol{\theta}}_l)\, \boldsymbol{y}_N^\mathrm{T}(\hat{\boldsymbol{\theta}}_l) = \frac{S_\mathrm{D}}{\hat{L}}\, \hat{G}^\mathrm{T} \hat{Y}_N^\mathrm{T},$ (4.46)

using the directions $\{\hat{\boldsymbol{\theta}}_l\}$ of a t-design. As VBAP's gain functions aren't smooth (their derivatives are non-continuous), they are order-unlimited, and a t-design of sufficiently high t should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part permits improvements by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3. Note also that the decoder needs to be scaled properly. For instance, the norm of the omnidirectional component (first column) could be equalized to one, as it would typically be with a sampling decoder; there are alternative strategies to circumvent the scaling problem [41]. Fig.
4.16 Analysis of loudness, localization error, and width measure ($E/E_\mathrm{ideal}$ in dB, $\angle(\boldsymbol{r}_\mathrm{E}, \varphi_\mathrm{s})$ in °, and $\arccos|\boldsymbol{r}_\mathrm{E}|$ in °) for energy-preserving Ambisonic decoding (EPAD, N = 3) and all-round Ambisonic decoding (AllRAD, N = 3), compared with VBIP, for all panning angles on a sub-optimal 45°-spaced loudspeaker ring with a gap at −90°

4.9.7 EPAD and AllRAD on Sub-optimal Layouts

Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equiangular arrangement that is sub-optimal due to the missing loudspeaker at −90°. Both decoders manage to either stabilize the loudness perfectly well (EPAD) or keep the directional and spread mapping errors small (AllRAD). We notice that for EPAD, with the constraint L ≥ 2N + 1 just fulfilled for the N = 3 and L = 7 of the simulation, it would not be possible to remove any further loudspeakers without degradation.

4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts

In typical loudspeaker playback situations for large audiences, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, this does not permit decoding by sampling with optimal t-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. And yet, they still require some care when used with hemispherical loudspeaker layouts, see [15, 40] for further reading.

EPAD with hemispherical loudspeaker layouts. Even for a hemispherical layout, the energy-preserving decoding method requires L ≥ (N + 1)² loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive: Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn't the number be half as many?
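The basic EPAD construction of Sect. 4.9.5, Eqs. (4.43)–(4.44), boils down to one singular value decomposition. A minimal NumPy sketch for the 2D case (circular harmonics; the function and variable names are ours, not the book's):

```python
import numpy as np

def epad_decoder(Y):
    """Energy-preserving decoder from the harmonics matrix Y sampled at the
    loudspeaker directions (rows: harmonics, columns: loudspeakers).
    Replacing the singular values of Y^T by ones yields D with D^T D = I."""
    U, s, Vt = np.linalg.svd(Y.T, full_matrices=False)  # Y^T = U diag(s) V^T
    return U @ Vt                                       # singular values -> 1

# toy example: 8 equidistant 2D loudspeakers, order N = 2
L, N = 8, 2
phi = 2 * np.pi * np.arange(L) / L
# circular harmonics rows: [1, cos(phi), sin(phi), ..., cos(N phi), sin(N phi)]
Y = np.vstack([np.ones(L)] +
              [f(m * phi) for m in range(1, N + 1) for f in (np.cos, np.sin)])
D = epad_decoder(Y)   # L x (2N+1), column-orthogonal
```

On a t ≥ 2N-design such as this equidistant ring, the result coincides (up to scale) with the sampling decoder; on irregular rings it is the closest column-orthogonal decoder.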
Table 4.1 Integration ranges 0 ≤ ϑ ≤ ϑmax to obtain (N + 1)(N + 2)/2 Slepian functions with minimum loudness fluctuation max E / min E for panning on the hemisphere

N:               1       2       3       4       5       6       7       8       9
ϑmax:            115°    114°    113°    111°    108°    106°    104°    102°    101°
max E / min E:   0.5 dB  0.4 dB  0.4 dB  0.3 dB  0.3 dB  0.3 dB  0.2 dB  0.3 dB  0.2 dB

We can show that while the spherical harmonics are orthonormal on the sphere $\mathbb{S}^2$, i.e. $\int_{\mathbb{S}^2} \boldsymbol{y}_N(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{d}\theta = I$, they aren't orthogonal on the hemispherical cap $\mathbb{S} \subset \mathbb{S}^2$:

$\int_{\mathbb{S}:\, \vartheta \le \vartheta_\mathrm{max}} \boldsymbol{y}_N(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, \mathrm{d}\theta = G.$ (4.47)

Here, G is called Gram matrix, and it is evaluated by $\frac{4\pi}{\hat{L}} \sum_{l \in \mathbb{S}} \boldsymbol{y}_N(\boldsymbol{\theta}_l)\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta}_l)$ using a high-enough t-design. By singular-value decomposition of the positive semi-definite matrix $G = Q\, \mathrm{diag}\{\boldsymbol{s}\}\, Q^\mathrm{T}$, with $Q^\mathrm{T} Q = Q Q^\mathrm{T} = I$, we diagonalize G and find new basis functions $\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})$, the so-called Slepian functions [42], that are orthogonal on $\mathbb{S}$:

$Q^\mathrm{T} G\, Q = \mathrm{diag}\{\boldsymbol{s}\} = \int_{\mathbb{S}} Q^\mathrm{T}\, \boldsymbol{y}_N(\boldsymbol{\theta})\, \boldsymbol{y}_N^\mathrm{T}(\boldsymbol{\theta})\, Q\, \mathrm{d}\theta, \qquad \Rightarrow\quad \tilde{\boldsymbol{y}}_N(\boldsymbol{\theta}) = Q^\mathrm{T} \boldsymbol{y}_N(\boldsymbol{\theta}).$

Typically, the singular values in s are sorted in descending order, $s_1 \ge s_2 \ge \dots \ge s_{(N+1)^2}$, so that it is possible to cut out the basis functions of significantly large contribution to the upper hemisphere $\mathbb{S}$ by

$\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta}) = [I,\, 0]\, Q^\mathrm{T}\, \boldsymbol{y}_N(\boldsymbol{\theta}).$ (4.48)

Typically, the numerical integral is extended to slightly below the horizon, see Table 4.1, so that truncation to the (N + 1)(N + 2)/2 most significant basis functions, see Fig. 4.17, produces a minimum fluctuation in the loudness measure $\tilde{E} = \|\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})\|^2$ for panning on the hemisphere. With $\tilde{\boldsymbol{y}}_N(\boldsymbol{\theta})$, EPAD is calculated in the same way as for the ordinary harmonics,

$\tilde{Y}_N^\mathrm{T} = \tilde{U}\, [\mathrm{diag}\{\boldsymbol{s}\},\, 0]^\mathrm{T}\, \tilde{V}^\mathrm{T}, \qquad \tilde{D} = \tilde{U}\, [I,\, 0]^\mathrm{T}\, \tilde{V}^\mathrm{T},$ (4.49)

with the main difference that the lower limit for the number of loudspeakers decreases to L ≥ (N + 1)(N + 2)/2. Interfaced to the spherical harmonics by $[I, 0]\, Q^\mathrm{T}$, the hemispherical energy-preserving decoder becomes

$D = \tilde{D}\, [I,\, 0]\, Q^\mathrm{T}.$ (4.50)

Fig.
4.17 Slepian basis functions for the upper hemisphere, composed of the spherical harmonics up to 3rd order, using 0° ≤ ϑ ≤ 113° as integration interval and (N + 1)(N + 2)/2 functions. To get these nice shapes, the Slepian functions were found separately for every degree m, where they were moreover de-mixed by QR decomposition

AllRAD with hemispherical loudspeaker layouts. Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout does not contain any loudspeaker direction vector pointing to the lower half space, so one could simply omit the information of the lower half space. However, the Ambisonic panning function implies a directional spread, so that panning exactly to the horizon also produces content below it, whose omission causes (i) a loss in loudness and (ii) a slight elevation of the perceived direction, cf. Fig. 4.18. As discussed in the section on triangulation, Sect. 3.3, the insertion of imaginary loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at the nadir to stabilize both loudness and localization for panning to the horizon. Signal contributions below but close to the horizon largely contribute to the horizontal loudspeakers, and it is therefore safe to discard the signal that would feed the imaginary loudspeaker at the nadir without loss of loudness. Moreover, this contribution from below also reinforces signals on the horizontal loudspeakers so that localization is pulled back down. Both effects can be observed in Fig. 4.18, which shows the loudness measure E as well as mislocalization and width by the measure $\boldsymbol{r}_\mathrm{E}$, using max-$r_\mathrm{E}$-weighted AllRAD with 5th-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA).
It consists of 25 loudspeakers set up in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation. Rings two and four start at 0 degrees, the others are rotated half-way.

Performance comparison on hemispherical layouts. Figure 4.18 shows a comparison of AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout.

Fig. 4.18 Perceptual measures for 5th-order max-$r_\mathrm{E}$-weighted AllRAD on the IEM mAmbA layout [43] with (black) and without insertion of the bottom imaginary loudspeaker (black dotted) whose signal is discarded, and max-$r_\mathrm{E}$-weighted EPAD (gray), for panning on a vertical circle: E in dB (top), orientation error of $\boldsymbol{r}_\mathrm{E}$ in degrees (middle), and width expressed by $\arccos r_\mathrm{E}$ in degrees (bottom). The thin dashed line shows AllRAD without imaginary loudspeakers

While (top in Fig. 4.18) AllRAD produces a loudness fluctuation roughly spanning 1 dB for panning on the hemisphere, EPAD only exhibits 0.3 dB, as specified in Table 4.1. While in monophonic playback of noise, loudness differences of less than 0.5 dB can be heard, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should be no problem with either EPAD or AllRAD. Concerning the directional mapping, EPAD produces a more strongly pronounced ripple, with $\boldsymbol{r}_\mathrm{E}$ indicating sounds on the horizon ϑs = ±90° to be pulled upwards towards 0° more with EPAD (7°) than with AllRAD (3°). In terms of width, both EPAD and AllRAD exhibit the ≈20° average associated with max-$r_\mathrm{E}$ weighting. However, EPAD also produces a greater fluctuation, and it widens up to about 30° for panning to the horizon ϑs = ±90°.
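The discretized AllRAD construction of Eq. (4.46) can be sketched compactly in its 2D analog, where VBAP reduces to pairwise panning on a ring and the quadrature weight becomes 2π/L̂; the function names and the dense virtual ring standing in for a t-design are our assumptions:

```python
import numpy as np

def vbap2d_gains(phi_s, phi_ls):
    """Pairwise 2D VBAP: gains on a sorted loudspeaker ring (angles phi_ls)
    for the panning angle phi_s, normalized to unit 2-norm."""
    L = len(phi_ls)
    g = np.zeros(L)
    src = np.array([np.cos(phi_s), np.sin(phi_s)])
    for i in range(L):
        j = (i + 1) % L
        base = np.column_stack([[np.cos(phi_ls[i]), np.sin(phi_ls[i])],
                                [np.cos(phi_ls[j]), np.sin(phi_ls[j])]])
        gij = np.linalg.solve(base, src)
        if np.all(gij >= -1e-9):      # the active pair encloses the source
            g[i], g[j] = gij
            break
    return g / np.linalg.norm(g)

def allrad2d(phi_ls, N, L_virt=360):
    """2D analog of Eq. (4.46): VBAP gains of a dense virtual ring times
    sampled circular harmonics, with quadrature weight 2*pi/L_virt."""
    phi_v = 2 * np.pi * np.arange(L_virt) / L_virt
    Yv = np.vstack([np.ones(L_virt)] +
                   [f(m * phi_v) for m in range(1, N + 1)
                    for f in (np.cos, np.sin)])             # (2N+1) x L_virt
    G = np.array([vbap2d_gains(p, phi_ls) for p in phi_v])  # L_virt x L
    return (2 * np.pi / L_virt) * G.T @ Yv.T                # L x (2N+1)

# ring with a gap at -90 deg, as in the sub-optimal layout discussion
D = allrad2d(np.deg2rad([0, 45, 90, 135, 180, 225, 315]), N=3)
```

As in the 3D case, the resulting decoder would still need max-$r_\mathrm{E}$ weights and a proper overall scaling before use.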
With the 9 loudspeakers of the ITU [44] 4 + 5 + 0 layout (horizontal ring: ϕ = 0°, ±30°, ±120°; upper ring at 40° elevation with ϕ = ±30°, ±120°), it is not possible anymore to use EPAD with 5th order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to N = 2, and to lose level towards below-horizon directions, we can use the reduced set of 6 Slepian functions; alternatively, all 9 spherical harmonics of N = 2 would also be conceivable. For AllRAD, imaginary loudspeakers are inserted at the sides at azimuth/elevation ±75°/27°, up at 0°/78°, back at 180°/35°, and below at 0°/−90°. It is reasonable to downmix the imaginary loudspeakers with a factor of one for up, sides, and back and to re-normalize the VBAP gain matrix, while discarding the signal of the imaginary loudspeaker below. AllRAD permits the order N = 5, which resolves the frontal loudspeaker triplet much better for horizontal panning. Figure 4.19 shows the result of max-$r_\mathrm{E}$-weighted 2nd-order EPAD and 5th-order AllRAD for the 4 + 5 + 0 layout using a vertical panning curve. While the perfectly constant loudness measure of EPAD might be favored over the almost +3 dB loudness

Fig. 4.19 Perceptual measures for Ambisonic panning on the ITU [44] 4 + 5 + 0 layout with insertion and downmix of imaginary loudspeakers at the sides, back, and top, and insertion and disposal of an imaginary loudspeaker below for AllRAD.
Measures are evaluated for max-$r_\mathrm{E}$-weighted 5th-order AllRAD (black) and 2nd-order EPAD (gray), for panning on a vertical circle: E in dB (top), orientation error of $\boldsymbol{r}_\mathrm{E}$ in degrees (middle), and width expressed by $\arccos r_\mathrm{E}$ in degrees (bottom)

increase of front and back for AllRAD, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven to be clearly superior in practice.

4.10 Practical Studio/Sound Reinforcement Application Examples

This section analyzes the application of 3D Ambisonic amplitude panning, consisting of encoding and AllRAD, to studio applications (with typical setups of 2 m radius) and sound reinforcement applications (for an audience of, e.g., 250 people). Application scenarios are sketched in [43], and various other examples are given below. Requirements of constant loudness and width are analyzed below, and as sound reinforcement requires a particularly large sweet area, the $\boldsymbol{r}_\mathrm{E}$ vector model for off-center listening positions from Sect. 2.2.9 is used to depict the sweet-area size. The analysis of the decoders above described loudness measures for panning on a circle. To observe them with panning across all directions in Figs. 4.20 and 4.22, world-map-like mappings using a gray-scale representation of the loudness and width measures are more reasonable. For the several loudspeaker layouts, their axes are azimuth horizontally and zenith vertically, and the gray-scale map displays the loudness measure E in dB (left column) and the width measure $\arccos r_\mathrm{E}$ in degrees (right column). As 5th-order max-$r_\mathrm{E}$-weighted AllRAD typically produces minor directional mapping errors, these aren't explicitly shown in Figs. 4.20 and 4.22. However, the mappings of the sweet-area size of plausible localization in Figs. 4.21 and 4.23 illustrate the usefulness of the systems for listening areas hosting the number of listeners targeted for either the studio or the sound reinforcement application.
Figure 4.20 illustrates AllRAD's tendency to attenuate signals panned to too closely spaced loudspeaker ensembles, as in the front section of the ITU [44] 4 + 5 + 0 layout.

Fig. 4.20 Comparison of 5th-order max-$r_\mathrm{E}$-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios: (a) ITU [44] 4 + 5 + 0 (1 listener), (b) IEM Production Studio (4 listeners). The left column shows the loudness measure E in dB and the right column the width measure $\arccos r_\mathrm{E}$ in degrees; the loudspeaker positions are marked with a white + sign

Fig. 4.21 Comparison of the calculated sweet-area size for 5th-order max-$r_\mathrm{E}$-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios: (a) ITU [44] 4 + 5 + 0 (1 listener), (b) IEM Production Studio (4 listeners). As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g. 10°)

By contrast, for instance the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the horizon, and signals panned to the largely spaced below-horizon triangles tend to get louder. Moreover, it is easier for loudspeaker systems of many channels, such as the IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, to yield smooth loudness and width mappings. Still, also with only a few loudspeakers, a slight direction adjustment in the layout can fix some of the behavior, as with the IEM Production Studio, whose ±45° loudspeakers in the elevated layer are superior to a ±30° spacing.
Fig. 4.22 Comparison of 5th-order max-$r_\mathrm{E}$-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement: (a) IEM mAmbA (40 listeners), (b) IEM CUBE (80 listeners), (c) Lobby demo room at AES 144 Milan (200 listeners), (d) KUG Ligeti Hall (250 listeners). The left column shows the loudness measure E in dB and the right column the width measure $\arccos r_\mathrm{E}$ in degrees; the loudspeaker positions are marked with a white + sign

A hint for designing good decoders sometimes is idealization: often it is better to disregard the true loudspeaker setup locations and to feed the decoder design with idealized positions instead. Hereby one can trade slight directional distortions for a more uniform loudness distribution. For instance at the IEM CUBE, loudspeaker

Fig. 4.23 Comparison of the calculated sweet-area size for 5th-order max-$r_\mathrm{E}$-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement: (a) IEM mAmbA (40 listeners), (b) IEM CUBE (80 listeners), (c) Lobby demo room at AES 144 Milan (200 listeners), (d) KUG Ligeti Hall (250 listeners). As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g.
10°)

locations of the horizontal ring could be idealized to 30° to get a smoother loudness mapping than the one shown in Fig. 4.22.

4.11 Ambisonic Decoding to Headphones

Typically, Ambisonic decoding to headphones can be done similarly to decoding to loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the website SOFAconventions.² This headphone decoding approach classically uses a small set of so-called virtual loudspeakers, as found in many places in the technical literature, e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is relevant in many other important works [18, 45, 46] and the SADIE project,³ and it is employed in Sect. 1.4.2 on first-order Ambisonics.

Coarse. However, as outlined in some research papers [9, 18, 46], these approaches have in common that low-order Ambisonic synthesis is problematic. Either, when inserting a dense grid of virtual-loudspeaker HRIRs, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. Or, what had been the solution for a long time, a coarse grid of virtual-loudspeaker HRIRs does not attenuate high frequencies, but then the spatial quality strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before Ambisonic decomposition, and then to re-insert the otherwise missing interaural time delay afterwards for any sound panned in Ambisonics, which unfortunately yields an object-based panning system rather than a scene-based Ambisonic system.

Dense.
Some dense-grid approaches propose to keep the HRIR time delays, or, if formulated in the frequency domain, the HRTF (head-related transfer function) phases, and hereby to stay in a scene-based Ambisonic format while correcting spectral deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally, the most recent solutions proposed by Jin, Sun, and Epain [17, 48] or Zaunschirm, Schörkhuber, and Höldrich [20, 21] modify the HRIR time delays/HRTF phases only above, e.g., 3 kHz, without any object-based re-insertion afterwards. The omission of high-frequency interaural time-delay/phase information is a reasonable trade-off made in favor of the more important accuracy in spectral magnitude.

What does directional HRIR smoothing do to high frequencies? The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. For a spherical head model with the radius R = 0.0875 m and speed of sound c = 343 m/s, the Woodworth-Schlosberg formula [50] is composed of this consideration, see Fig. 4.24.

² https://www.sofaconventions.org.
³ https://www.york.ac.uk/sadie-project/database.html.

Fig. 4.24 Rigid-sphere model of delay to ear depending on angle φ (here: head rotation)

Fig. 4.25 Time delay to the ear depending on azimuth of horizontal sound: (a) HRIR delay model, (b) 360°-measured KU100 dummy-head HRIR set from TH Köln (color displays dB levels)
The left ear receives a distant horizontal sound from the azimuth interval 0 ≤ φ ≤ π/2 as direct sound, anticipated by $\tau = -\frac{R}{c}\sin\varphi$, or for −π/2 < φ ≤ 0 as an indirect sound, delayed by $\tau = -\frac{R}{c}\varphi$:

$\tau(\varphi) = -\frac{R}{c}\, \min\{\sin\varphi,\, \varphi\},$ (4.51)

as plotted in Fig. 4.25a, and recognizable from dummy-head measurements⁴ in Fig. 4.25b. If the HRIR is smoothed across an angular range, the time-delay curve gets spread across time as well, see Fig. 4.26. In this way, depending on whether the smoothing uses a continuous or discrete set of directions, one either obtains something like a comb filter or a sinc-shaped frequency response. This smoothing is least disturbing around the direct-ear side, as shown left in Fig. 4.26, and, as the indirect ear also encounters high-frequency shadowing effects, it is most disturbing mainly for frontal and rear sounds at 0° or 180°, as shown right in Fig. 4.26. The corresponding frequency responses are roughly exemplified by what third-order-equivalent Ambisonic smoothing would do to either 45°-spaced HRIRs in Fig. 4.27a or 15°-spaced ones in Fig. 4.27b.

⁴ Data HRIR_CIRC360.sofa from http://sofacoustics.org/data/database/thk.

Fig. 4.26 Directionally smoothed playback to multiple HRIRs within a window, e.g. 30°, causes different impulse response shapes at 90° and 0°

Fig. 4.27 Differences of directionally smoothed HRTF frequency responses for a horizontal sound from either 0° or 90°, smoothed within a ±22.5° window, which roughly corresponds to the Ambisonics order N = 3; (a) and (b) use a grid of 45°- or 7.5°-spaced HRIRs, respectively. The dashed line shows the theoretical frequency limit 1.87 kHz for N = 3

To get an upper frequency limit, it is insightful to work in the frequency domain, where the HRIR is denoted head-related transfer function (HRTF).
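The delay model of Eq. (4.51) and the order-dependent frequency limit quoted for Fig. 4.27 can be checked with a few lines; the function names are ours:

```python
import math

R, c = 0.0875, 343.0   # sphere radius in m, speed of sound in m/s

def tau(phi):
    """Delay to the left ear vs. horizontal azimuth phi in rad, Eq. (4.51);
    negative values mean the sound arrives early (direct side)."""
    return -R / c * min(math.sin(phi), phi)

def f_limit(N):
    """Upper limit of accurate phase representation, f_N = c*N/(2*pi*R)."""
    return c * N / (2 * math.pi * R)

# direct sound from 90 deg arrives about R/c = 0.26 ms early,
# and the limit grows by roughly 624 Hz per Ambisonic order
```

For N = 3, the limit evaluates to roughly 1.87 kHz, which is the dashed line shown in Fig. 4.27.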
A simplified linearized-phase version around φ = 0 uses $\tau \approx -\frac{R}{c}\varphi$, and the resulting Fourier transform with ω = 2πf is

$H \approx e^{-\mathrm{i}\omega\,\tau(\varphi)} = e^{\mathrm{i}\frac{\omega}{c} R\,\varphi}.$ (4.52)

To represent it by a circular or spherical harmonics transformation limited to the order N, the maximum phase change represented by the harmonic $e^{\mathrm{i}N\varphi}$ implies that we can only resolve the phase up to $\frac{\omega}{c} R \le N$; hence the range of accurate operation is limited in frequency,

$f_N \le \frac{cN}{2\pi R} = N \cdot 624\ \mathrm{Hz}.$ (4.53)

As the high-frequency HRTF phase evolves more rapidly over the angle than the finite order can represent, this typically yields an attenuation of the high frequencies when obtaining circular/spherical harmonics coefficients by the transformation integral. Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether the directional smoothing is done by Ambisonics, VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above the frequency limit and re-insert it afterwards, but is re-insertion necessary?

4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)

As a pre-requisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency the removal of the HRTF linear phase trend remains inaudible in direct HRTF-based rendering without panning or smoothing. In fact, most of their listeners could not distinguish the absence of the linear phase trend when removed above 3 kHz for various sound examples (drums, speech, pink noise, rendered at the directions 10°, −45°, 80°, −130°). They had their subjects compare the result to a reference with unaltered HRTFs, and the result is analyzed in Fig. 4.28.
By this finding, it is possible to split up each of the 2 × 1 HRIRs $\boldsymbol{h}(t, \boldsymbol{\theta})$ into an unaltered low-pass band and a time-aligned high-pass band to unify the high-frequency HRIR delay,

$\hat{\boldsymbol{h}}(t, \boldsymbol{\theta}) = \boldsymbol{h}_{\le 3\,\mathrm{kHz}}(t, \boldsymbol{\theta}) + \begin{bmatrix} h_{\mathrm{left},\, > 3\,\mathrm{kHz}}[t - \tau(\arcsin\theta_y),\ \boldsymbol{\theta}] \\ h_{\mathrm{right},\, > 3\,\mathrm{kHz}}[t + \tau(\arcsin\theta_y),\ \boldsymbol{\theta}] \end{bmatrix}.$ (4.54)

The time-delay model τ(φ) uses the angle to the left/right ear on the positive/negative y axis, so $\arccos(\pm\theta_y)$, but shifted by 90°, hence $\varphi = \pm\arcsin\theta_y$. This removal allows using all available HRIRs of dense measurement sets for binaural synthesis of high accuracy, using a suitable linear Ambisonic decoder such as AllRAD. Assuming the resulting modified left and right HRIRs for all directions are gathered in the matrix $\hat{H}(t) = [\hat{\boldsymbol{h}}(t, \boldsymbol{\theta}_1), \dots, \hat{\boldsymbol{h}}(t, \boldsymbol{\theta}_L)]^\mathrm{T}$, the 2 × (N + 1)² filter set for decoding each of the Ambisonic channels to the ears becomes:

$\hat{H}_\mathrm{SH}(t) = \hat{H}(t)^\mathrm{T}\, D\, \mathrm{diag}\{\boldsymbol{a}\}.$ (4.55)

Fig. 4.28 Experiment on the audibility of the removal of the linear phase trend from HRTFs above a varied cutoff frequency, from [21], showing medians and 95% confidence intervals (ratings from "no difference" to "severe difference"; directions 10°, −45°, 80°, −130°; cutoffs from 0 Hz to 7 kHz and unaltered reference)

Fig. 4.29 Exemplary horizontal cross-sections (f = 2000, 4000, 8000 Hz) of linear (lin), time-aligned (ta), and magnitude-least-squares (mls) Ambisonic left-ear HRTF representations of third order N = 3, compared to the high-order N = 35 (max) representation, for the TH Köln HRIR_L2702.sofa set

Results achieved by a pseudo-inverse decoding to the hereby time-aligned HRIRs, using R = 0.085 m with N = 3 from the 2702-directions Cologne HRIRs⁵, are shown in Fig. 4.29.
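A sketch of the band split of Eq. (4.54) for one left-ear HRIR, using an FFT-domain crossover at 3 kHz; the hard spectral split and the function names are our simplifications (in practice a smooth crossover and fractional-delay filters would be used):

```python
import numpy as np

fs, fc = 48000, 3000.0        # sample rate, crossover frequency in Hz
R, c = 0.0875, 343.0          # sphere radius in m, speed of sound in m/s

def tau(phi):
    """Ear-delay model of Eq. (4.51) in seconds."""
    return -R / c * min(np.sin(phi), phi)

def time_align_hf(h, theta_y):
    """Left-ear split of Eq. (4.54): the band below fc stays unaltered,
    while the band above fc is shifted to h[t - tau(arcsin(theta_y))],
    removing the direction-dependent high-frequency delay.
    theta_y is the y component of the unit direction vector."""
    n = len(h)
    f = np.fft.rfftfreq(n, 1 / fs)
    H = np.fft.rfft(h)
    t0 = tau(np.arcsin(theta_y))
    shift = np.exp(-2j * np.pi * f * t0)      # spectrum of h(t - t0)
    H_out = np.where(f < fc, H, H * shift)    # keep low band, align high band
    return np.fft.irfft(H_out, n)
```

The right-ear response would use the opposite sign of the shift, as in Eq. (4.54).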
The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) at frequencies above 2 kHz in representing the original HRTFs (max).

⁵ Data HRIR_L2702.sofa from http://sofacoustics.org/data/database/thk.

4.11.2 Magnitude Least Squares (MagLS)

As an alternative to the high-frequency time-delay disposal, Schörkhuber et al. present an optimum-phase approach [21] that disregards the phase match in favor of an improved magnitude match above cutoff. Formulated exemplarily for the left ear, across every HRTF direction $\boldsymbol{\theta}_l$ and for every discrete frequency $\omega_k$, with $h_{l,k} = h(\boldsymbol{\theta}_l, \omega_k)$, this becomes

$\min_{\hat{\boldsymbol{h}}_{\mathrm{SH},k}}\ \sum_{l=1}^{L} \left( \left| \boldsymbol{y}_N(\boldsymbol{\theta}_l)^\mathrm{T}\, \hat{\boldsymbol{h}}_{\mathrm{SH},k} \right| - |h_{l,k}| \right)^2.$ (4.56)

Typically, one would need to solve such magnitude-least-squares or magnitude-squares-least-squares tasks with semidefinite relaxation, see Kassakian [51]. In practice, however, results already turn out to be perfect with an iterative combination of the reconstructed phase $\hat{\varphi}_{l,k-1}$ from the previous frequency $\omega_{k-1}$ with the HRTF magnitude $|h_{l,k}|$ of the current frequency $\omega_k$, before a linear decomposition thereof into spherical harmonic coefficients $\hat{\boldsymbol{h}}_{\mathrm{SH},k}$. Every frequency below cutoff, $\omega_k < 2\pi f_N$, just uses the linear least-squares spherical harmonics decomposition with the left-inverse of the spherical harmonics $Y_N$ sampled at the HRTF measurement nodes,

$\hat{\boldsymbol{h}}_{\mathrm{SH},k} = (Y_N^\mathrm{T} Y_N)^{-1} Y_N^\mathrm{T}\, [h_{l,k}]_l.$ (4.57)

Continuing with the first frequency above or equal to cutoff, $\omega_k \ge 2\pi f_N$, the algorithm proceeds as

$\hat{\varphi}_{l,k-1} = \angle\!\left[ \boldsymbol{y}_N(\boldsymbol{\theta}_l)^\mathrm{T}\, \hat{\boldsymbol{h}}_{\mathrm{SH},k-1} \right],$ (4.58)

$\hat{\boldsymbol{h}}_{\mathrm{SH},k} = (Y_N^\mathrm{T} Y_N)^{-1} Y_N^\mathrm{T}\, \left[ |h_{l,k}|\, e^{\mathrm{i}\hat{\varphi}_{l,k-1}} \right]_l,$ (4.59)

and then moves to the next frequency, k ← k + 1. The results are typically transformed back to the time domain to get a real-valued impulse response for every spherical harmonic to the regarded ear. The results of the MagLS approach (mls) outperform the time-alignment approach (ta) in the exemplary results shown for N = 3 in Fig.
4.29, in particular at the highest frequencies, where the sphere-model-based delay simplification is no longer sufficiently helpful.

4.11.3 Diffuse-Field Covariance Constraint

For both of the above approaches that modify the high-frequency phase, Zaunschirm et al. [20] note that low-order rendering degrades envelopment in diffuse fields, so they introduce an additional covariance constraint as defined by Vilkamo [22]. It can be implemented as a 2 × 2 filter matrix equalizing the resulting frequency-domain diffuse-field covariance matrix to the one of the original HRTF dataset. On the main diagonal, this covariance matrix shows the diffuse-field ear sensitivities (left and right), and off-diagonal it contains the diffuse-field interaural cross-correlation. At every frequency, the 2 × 2 diffuse-field covariance matrix of the original, very-high-order spherical harmonics HRTF dataset $H_\mathrm{SH}$ of the dimensions (M + 1)² × 2, with M ≫ N, is given by

$R = H_\mathrm{SH}^\mathrm{H}\, H_\mathrm{SH}.$ (4.60)

The derivation why this inner product of spherical harmonic coefficients represents the diffuse-field covariance is given in Appendix A.5. The low-order, high-frequency-modified HRTF coefficient set $\hat{H}_\mathrm{SH}$ of the dimensions (N + 1)² × 2 also has a 2 × 2 covariance matrix $\hat{R}$ that will differ from the more accurate R,

$\hat{R} = \hat{H}_\mathrm{SH}^\mathrm{H}\, \hat{H}_\mathrm{SH}.$ (4.61)

Its diffuse-field reproduction improves after equalizing $\hat{R}$ to R by a 2 × 2 filter matrix M,

$\hat{H}_\mathrm{SH,corr} = \hat{H}_\mathrm{SH}\, M.$ (4.62)

Fig. 4.30 Covariance constraint filters |M_ij| enhance the binaural decorrelation of MagLS by negative crosstalk M12 and M21, under corresponding correction of the diffuse-field sensitivities M11 and M22 at playback orders N < 3

Appendix A.5 shows the derivation of M based on [20, 22].
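Both the MagLS recursion, Eqs. (4.57)–(4.59), and the covariance correction of Eq. (4.63) are short matrix computations. A minimal NumPy sketch, assuming coefficient matrices of shape (coefficients × 2 ears); the function names and shapes are our conventions, not the book's code:

```python
import numpy as np

def magls_coeffs(H, Y, f, f_cut):
    """MagLS SH coefficients for one ear, Eqs. (4.57)-(4.59).
    H: L x K HRTFs (L directions, K frequency bins), Y: L x (N+1)^2 sampled
    spherical harmonics, f: K bin frequencies, f_cut: cutoff frequency."""
    pinv = np.linalg.pinv(Y)                   # left-inverse (Y^T Y)^{-1} Y^T
    h_sh = np.zeros((Y.shape[1], H.shape[1]), dtype=complex)
    for k in range(H.shape[1]):
        if k == 0 or f[k] < f_cut:
            h_sh[:, k] = pinv @ H[:, k]                  # least squares, (4.57)
        else:
            phase = np.angle(Y @ h_sh[:, k - 1])         # previous phase, (4.58)
            h_sh[:, k] = pinv @ (np.abs(H[:, k]) * np.exp(1j * phase))  # (4.59)
    return h_sh

def covariance_filter(H_sh, H_sh_hat):
    """2x2 correction M of Eq. (4.63) at one frequency, from the high-order
    reference H_sh and the low-order set H_sh_hat (both coefficients x 2).
    Afterwards, H_sh_hat @ M has the reference's diffuse-field covariance."""
    X = np.linalg.cholesky(H_sh.conj().T @ H_sh).conj().T           # R = X^H X
    Xh = np.linalg.cholesky(H_sh_hat.conj().T @ H_sh_hat).conj().T  # R^ = X^H X^
    U, _, Vh = np.linalg.svd(Xh.conj().T @ X)                       # X^H X = U S V^H
    return np.linalg.inv(Xh) @ Vh.conj().T @ U.conj().T @ X
```

One can verify algebraically that $M^\mathrm{H} \hat{R} M = X^\mathrm{H} U V^\mathrm{H} V U^\mathrm{H} X = R$, i.e. the corrected set reproduces the reference covariance exactly.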
In summary, it is composed of factors obtained by Cholesky and SVD matrix decompositions:

$\hat{H}_\mathrm{SH,corr} = \hat{H}_\mathrm{SH}\, \hat{X}^{-1}\, V\, U^\mathrm{H}\, X,$ (4.63)

Cholesky factors: $H_\mathrm{SH}^\mathrm{H} H_\mathrm{SH} = X^\mathrm{H} X$, $\hat{H}_\mathrm{SH}^\mathrm{H} \hat{H}_\mathrm{SH} = \hat{X}^\mathrm{H} \hat{X}$; SVD: $\hat{X}^\mathrm{H} X = U\, S\, V^\mathrm{H}$.

While MagLS binaural decoding with orders higher than 2 or 3 does not require the covariance correction, the correction enhances the decorrelation of the ear signals for 1st- to 2nd-order reproduction, as shown in Fig. 4.30.

4.12 Practical Free-Software Examples

4.12.1 Pd and Circular/Spherical Harmonics

Similarly to the example section on first-order encoding and decoding in pure data (Pd), Fig. 4.31 shows 3rd-order 2D Ambisonic encoding and decoding for an octagon loudspeaker layout. The implementation [mtx_circular_harmonics] of the circular harmonics is used from the iemmatrix library, and the numbers for $\frac{180}{\pi} = 57.29$ and $a_m = \cos\frac{\pi m}{2(N+1)}$ were pre-calculated. Note the similarity to the first-order 2D example of Fig. 1.13, to which the main change is the use of the circular harmonics matrix object. For decoding to headphones, programming in Pd also looks rather similar to the first-order example in Fig. 1.14, only more HRIRs matching the respective
loudspeaker positions need to be employed. To work in 3 dimensions, programming in Pd would also be similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx_spherical_harmonics]. Typically, pre-calculated decoders including AllRAD and max-$r_\mathrm{E}$ weighting are used and loaded into Pd by, e.g., [mtx D.mtx] to keep the programming simple.

Fig. 4.31 2D encoding and decoding in Pd using [mtx_circular_harmonics] with 3rd order, 8 equidistant loudspeakers, and a max-$r_\mathrm{E}$-weighted decoder

4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder

For encoding single- or multi-channel signals into Ambisonics, there are the ambix_encode_o<N> or ambix_encode_i<L>_o<N> VST plugins available from Kronlachner's ambix plugin suite, or the IEM MultiEncoder from the IEM plugin suite. As exemplarily shown in Fig. 4.32, the MultiEncoder allows encoding channel-based multi-channel audio material, where channel-based [52] typically refers to each channel of the multi-channel material being meant to be played back on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding of virtual playback directions can also be found referred to as beds or virtual panning spots. Fig.
Fig. 4.32 MultiEncoder plug-in: encoding of a 4 + 5 + 0 recording

Fig. 4.33 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

Fig. 4.34 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

Fig. 4.35 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

The IEM AllRADecoder permits manually entering or importing the loudspeaker coordinates and channel indices, with the coordinates specified by azimuth and elevation angles in degrees, as exemplified for the IEM production studio in Fig. 4.33. The figure also shows that just entering the pure 5 + 7 + 0 layout would produce an error message: "Point of origin not within convex hull. Try adding imaginary loudspeakers." By adding an imaginary loudspeaker below, whose signal is typically omitted, see Fig. 4.34, it becomes geometrically valid to calculate and employ the resulting decoder; however, it is better to also insert an imaginary loudspeaker at the rear whose signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.

4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder

Particularly relevant for headphone-based listening, rendering of anechoic sounds will typically not externalize well, as it does not match the mental expectation of ordinary listening environments [53–56]. To avoid in-head localization instead of the desired external sound image, one can, e.g., use the IEM RoomEncoder plugin, see Fig. 4.36. It is based on an image-source room model and encodes first-order wall reflections, involving reflection factors and propagation delays, together with the desired direct sound.

Fig. 4.36 RoomEncoder plug-in

Fig. 4.37 BinauralDecoder plug-in
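The image-source idea behind such a room encoder can be sketched in a few lines. This is a simplified illustration under our own assumptions (shoebox room, one frequency-independent reflection factor, first-order reflections only, 1/distance amplitude decay), not the RoomEncoder's implementation.

```python
# Sketch of a first-order image-source model: each of the 6 wall reflections
# becomes a delayed, attenuated extra source to be encoded like the direct sound.
import numpy as np

c = 343.0                                    # speed of sound in m/s
room = np.array([8.0, 6.0, 3.0])             # shoebox dimensions in m
beta = 0.8                                   # wall reflection factor (assumed)
src = np.array([3.0, 2.0, 1.5])              # source position
rcv = np.array([5.0, 4.0, 1.4])              # receiver position

# first-order image sources: mirror the source at each of the 6 walls
images = []
for axis in range(3):
    for wall in (0.0, room[axis]):
        img = src.copy()
        img[axis] = 2.0 * wall - img[axis]
        images.append(img)

reflections = []
for img in images:
    d = np.linalg.norm(img - rcv)
    reflections.append({
        'delay_s': d / c,                    # propagation delay
        'gain': beta / d,                    # reflection factor and 1/r decay
        'direction': (img - rcv) / d,        # direction to encode with y_N
    })
```

Each entry would be encoded with its own y_N(direction), delayed and scaled, and summed with the direct sound onto the Ambisonic bus.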
The MagLS approach for Ambisonic decoding, using the KU100 measurements from the Cologne University of Applied Sciences and (optionally) their headphone equalization curves, is implemented by the IEM BinauralDecoder, see Fig. 4.37. Combining both the IEM RoomEncoder and the IEM BinauralDecoder with an Ambisonics-encoded single-channel sound (e.g. using ambix_encoder), one can simply try to place the source and receiver together in the symmetry plane of the room, and then slightly shift one of them sideways to see how externalization improves by slight asymmetry in the ear signals.

References

1. J.S. Bamford, Ambisonic sound for the masses (1994)
2. D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. 20(5), 346–360 (1972)
3. P. Fellgett, Ambisonic reproduction of directionality in surround-sound systems. Nature 252, 534–538 (1974)
4. M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround sound, in prepr. L-20 of 50th Audio Engineering Society Convention (1975)
5. P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space and yielding various directional outputs, U.S. Patent no. 4,042,779 (1977)
6. J.S. Bamford, An analysis of ambisonic sound systems of first and second order, Master's thesis, University of Waterloo, Ontario (1995)
7. D.G. Malham, A. Myatt, 3D sound spatialization using ambisonic techniques. Comput. Music J. 19(4), 58–70 (1995)
8. M.A. Poletti, The design of encoding functions for stereophonic and polyphonic sound systems. J. Audio Eng. Soc. 44(11), 948–963 (1996)
9. J.-M. Jot, V. Larcher, J.-M. Pernaux, A comparative study of 3-D audio encoding and rendering techniques, in 16th AES Conference (Rovaniemi, 1999)
10. J. Daniel, Représentation des champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Ph.D. dissertation, Université Paris 6 (2001)
11. D.B. Ward, T.D. Abhayapala, Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech Audio Process. 9(6), 697–707 (2001)
12. G. Dickins, Sound field representation, reconstruction and perception, Master's thesis, Australian National University, Canberra (2003)
13. A. Sontacchi, Neue Ansätze der Schallfeldreproduktion, Ph.D. dissertation, TU Graz (2003)
14. M.A. Poletti, Robust two-dimensional surround sound reproduction for nonuniform loudspeaker layouts. J. Audio Eng. Soc. 55(7/8), 598–610 (2007)
15. F. Zotter, H. Pomberger, M. Noisternig, Energy-preserving ambisonic decoding. Acta Acust. United Acust. 98(1), 37–47 (2012)
16. J.-M. Batke, F. Keiler, Using VBAP-derived panning functions for 3D ambisonics decoding, in 2nd International Symposium on Ambisonics and Spherical Acoustics (Paris, 2010)
17. D. Sun, Generation and perception of three-dimensional sound fields using higher order ambisonics, Ph.D. thesis, University of Sydney, School of Electrical and Information Engineering (2013)
18. Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. 141(6) (2017)
19. F. Brinkmann, S. Weinzierl, Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition, in AES Conference AVAR (Redmond, 2018)
20. M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc. Am. 143(6), 3616–3627 (2018)
21. C. Schörkhuber, M. Zaunschirm, R. Höldrich, Binaural rendering of ambisonic signals via magnitude least squares, in Fortschritte der Akustik – DAGA (Munich, 2018)
22. J. Vilkamo, T. Bäckström, A. Kuntz, Optimized covariance domain framework for time-frequency processing of spatial audio. J. Audio Eng. Soc. 61(6), 403–411 (2013)
23. F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), NIST Handbook of Mathematical Functions (Cambridge University Press, Cambridge, 2010), http://dlmf.nist.gov. Accessed June 2012
24. J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in AES 16th International Conference: Spatial Sound Reproduction (1999)
25. M. Frank, How to make ambisonics sound good, in Forum Acusticum (Krakow, 2014)
26. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in Fortschritte der Akustik – DAGA (Kiel, 2017)
27. M. Frank, A. Sontacchi, F. Zotter, Localization experiments using different 2D ambisonic decoders, in 25th Tonmeistertagung (2008)
28. P. Stitt, S. Bertet, M. van Walstijn, Off-centre localisation performance of ambisonics and HOA for large and small loudspeaker array radii. Acta Acust. United Acust. 100(5), 937–944 (2014)
29. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Convention (2017)
30. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)
31. ISO 80000-2, Quantities and units – Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
32. R.H. Hardin, N.J.A. Sloane, McLaren's improved snub cube and other new spherical designs in three dimensions. Discret. Comput. Geom. 15, 429–441 (1996), http://neilsloane.com/sphdesigns/dim3/
33. M. Gräf, D. Potts, On the computation of spherical designs by a new optimization approach based on fast spherical Fourier transforms. Numer. Math. 119 (2011), http://homepage.univie.ac.at/manuel.graef/quadrature.php
34. R.S. Womersley, Efficient spherical designs with good geometric properties, in Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan (Springer, Berlin, 2018), pp. 1243–1285
35. A. Solvang, Spectral impairment of 2D higher-order ambisonics. J. Audio Eng. Soc. 56(4) (2008)
36. M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in prepr. 3345, 92nd AES Convention (Vienna, 1992)
37. J. Daniel, J.-B. Rault, J.-D. Polack, Ambisonics encoding of other audio formats for multiple listening conditions, in prepr. 4795, 105th AES Convention (San Francisco, 1998)
38. M.A. Poletti, A unified theory of horizontal holographic sound systems. J. Audio Eng. Soc. 48(12), 1155–1182 (2000)
39. M.A. Poletti, Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc. (2005)
40. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
41. F. Zotter, M. Frank, Ambisonic decoding with panning-invariant loudness on small layouts (AllRAD2), in 144th AES Convention, prepr. 9943 (Milano, 2018)
42. R. Pail, G. Plank, W.-D. Schuh, Spatially restricted data distributions on the sphere: the method of orthonormalized functions and applications. J. Geodesy 75, 44–56 (2001)
43. M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts and broadcasts. J. Audio Eng. Soc. 65(9) (2017)
44. ITU, Recommendation BS.2051: Advanced sound system for programme production (ITU, 2018)
45. M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich, A 3D ambisonic based binaural sound reproduction system, in 24th AES Conference (Banff, 2003)
46. B. Bernschütz, A.V. Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves with reduced modal order. Acta Acust. United Acust. 100(5) (2014)
47. S. Delikaris-Manias, J. Vilkamo, Adaptive mixing of excessively directive and robust beamformers for reproduction of spatial sound, in Parametric Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
48. C.T. Jin, N. Epain, D. Sun, Perceptually motivated binaural rendering of higher order ambisonic sound scenes, in Fortschritte der Akustik – AIA-DAGA (Merano, 2013)
49. J.B. Keller, Geometrical theory of diffraction. J. Opt. Soc. Am. 52(2), 116–130 (1962)
50. R.S. Woodworth, H. Schlosberg, Experimental Psychology (Holt, Rinehart and Winston, 1954)
51. P.W. Kassakian, Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis, Ph.D. dissertation, EECS Department, University of California, Berkeley (2006)
52. ITU, Recommendation BS.2076: Audio Definition Model (ITU, 2017)
53. S. Werner, G. Götz, F. Klein, Influence of head tracking on the externalization of auditory events at divergence between synthesized and listening room using a binaural headphone system, in prepr. 9690, 142nd AES Convention (Berlin, 2017)
54. F. Klein, S. Werner, T. Mayenfels, Influences of tracking on externalization of binaural synthesis in situations of room divergence. J. Audio Eng. Soc. 65(3), 178–187 (2017)
55. J. Cubick, Investigating distance perception, externalization and speech intelligibility in complex acoustic environments, Ph.D. dissertation, DTU Copenhagen (2017)
56. G. Plenge, Über das Problem der Im-Kopf-Lokalisation. Acustica 26(5), 241–252 (1972)
Chapter 5 Signal Flow and Effects in Ambisonic Productions

"This system offers significantly enhanced spatialisation technologies […] with new creative possibilities opening up to anyone with the appropriate number of audio channels available on their computer systems." Dave G. Malham [1], ICMC, Beijing, 1999.

Abstract This chapter presents the internal working principles of various Ambisonic 3D audio effects. No matter which digital audio workstation or processing software is used in a production, the general Ambisonic signal infrastructure is outlined as an important overview of the signal processing chain. The effects presented are frequency-independent effects such as directional re-mapping (mirror, rotation, warping) and re-weighting (directional level modification), and frequency-dependent effects such as widening/distance/diffuseness, diffuse reverberation, and resolution-enhanced convolution reverberation.

The typical audio processing steps for Ambisonic surround-sound signal manipulation are shown in the block diagram Fig. 5.1 from [2]. The descriptions of the multi-venue application in [3] and of live effects in [4] might be encouraging.

Ambisonic encoding and Ambisonic bus. From the previous section we know that representing single-channel signals s_c(t) together with their direction θ_c is a matter of encoding: of multiplying the signal by the coefficients y_N(θ_c) obtained by evaluating the spherical harmonics at the direction from which the signal should appear to come. In productions, there will be multiple signals, either representing spot microphones or virtual playback spots of embedded channel-based content (beds), e.g. stereo or 5.1 material. With all input signals encoded and summed up on an Ambisonic bus, we obtain the multi-channel Ambisonic signal representation of an entire audio production
Fig. 5.1 Block diagram as in [2]

χ_N(t) = Σ_{c=1}^{C} y_N(θ_c) s_c(t).    (5.1)

Ambisonic surround-sound signal. Without decoding to a specific loudspeaker layout, the signal χ_N of the Ambisonic bus might appear somewhat virtual. Nevertheless, it can be regarded as a surround-sound signal x(θ, t) whose amplitude can be evaluated and metered at any direction θ and any time t, using the expansion into spherical harmonics

x(θ, t) = y_N^T(θ) χ_N(t).    (5.2)

Upmixing. As first-order recordings are not highly resolved, there are several works on algorithms with resolution-enhancement strategies that re-assign time-frequency bins more sharply to directions. A good summary of such input-specific insert effects is given in [5, 6]. Available solutions are DirAC, HOA-DirAC, COMPASS, and Harpex.

Higher order. Higher-order microphones require more of the acoustic holophonic and holographic basics than presented above, yielding pre-processing filters as input-specific insert effect. Higher-order recording is dealt with in the subsequent Chap. 6, after the derivation of the wave equation and its solutions in the spherical coordinate system.

Insert effects: Generic re-mapping and leveling. One can imagine that it should be possible to manipulate the surround-sound signal x(θ, t) in various ways. For instance, effects based on directional re-mapping can take signals out of their original directional range and place them back into the Ambisonic signal at manipulated directions. Also, directions can be altered in amplitude level so that, for instance, signals at directions with unwanted content undergo attenuation. Many more useful effects are presented below.

Decoding to loudspeakers/headphones.
To map the modified Ambisonic signal χ̃_Ñ to loudspeakers or headphones, an Ambisonic decoder is needed, as discussed in the previous chapter. For decoding to headphones, one should consider taking only as few HRIR directions to decode to as possible [7, 8] before the signals get convolved and mixed, to avoid coloration at frontal directions, where delays in the HRIRs change too strongly over direction to get resolved properly [9, 10]. Alternatively, the approach in [11] proposed removal of the HRIR delay at high frequencies and diffuse-field covariance equalization by a 2 × 2 filter system, cf. Sect. 4.11.

5.1 Embedding of Channel-Based, Spot-Microphone, and First-Order Recordings

Microphone arrays for near-coincident higher-order Ambisonic recording based on holography will be discussed in the subsequent chapter. Nevertheless, it is possible to use (i) spot and close microphones and encode their direction into the directional panorama, (ii) first-order microphone arrays to fill the Ambisonic channels only up to the first order, or (iii) more classical non-coincident or equivalence-stereophonic microphone arrays whose typical playback directions are encoded in Ambisonics.

The study by Kurz et al. [12] investigated how recordings by first-order encoding of the Soundfield ST450 and the Oktava MK4012 tetrahedral microphone arrays compare to the equivalence-stereophonic ORTF pair, see Fig. 5.2. In addition, ORTF-like mapping of the Oktava MK4012's frontal signals to the ±30° directions in 5th order was tested instead of its first-order encoding. Figure 5.3 shows the results of the study in terms of the perceptual attributes localization and spatial depth. It seems that a mixture between ORTF-like 5th-order encoding and first-order encoding of the MK4012 microphone achieves preferred results, while the first-order-encoded output of the ST450 Soundfield microphone is rated fair in both attributes; the ORTF microphone only ranked well in terms of localization.
The results of the ST450 were independent of its orientation, whereas the localization of the first-order-encoded MK4012 was found to depend on the orientation. This dependency of the MK4012 arises because its microphones are not sufficiently coincident. As a bottom line of the detailed analysis, one should be encouraged to keep using classical microphone techniques where known to be appropriate and to encode their output in higher-order beds or virtual playback directions. However, this should be done with the awareness that stereophonic recording won't necessarily work for a large audience area, for which the robustness in directional mapping of equivalence-based techniques seems attractive.

Fig. 5.2 Ensemble of the Ambisonic and reference microphones of the study by Kurz et al. [12]; the pixelized microphone prototype by AKG was excluded from the study

Fig. 5.3 Median values and 95% confidence intervals for each attribute (a sound localization, b spatial depth) from experiments in [12] for different microphones, orientations, and playback processing

An interesting layout is, e.g., specified in Hendrickx et al.'s work [13], in which they use an equivalence-stereophonic six-channel microphone array. Another interesting idea was used at the ICSA Ambisonics Summer School 2017: a height layer of suitably inclined super-cardioid microphones was added at a small vertical distance above the horizontal microphone layer, similar to the upwards-pointing directional microphones suggested in Lee's and Wallis' work [14, 15], to provide sufficiently attenuated horizontal sounds to the height layer.
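The ORTF-like 5th-order mapping tested in the study can be sketched in a few lines. This is a hypothetical numpy illustration of ours (horizontal-only circular harmonics, placeholder signals), not the processing used in [12].

```python
# Hypothetical sketch: mapping two frontal signals to +/-30 degrees at 5th order.
import numpy as np

def ch2d(order, phi):
    """Unnormalized circular harmonics up to `order` for azimuth `phi`."""
    comps = [np.ones_like(phi)]
    for m in range(1, order + 1):
        comps += [np.cos(m * phi), np.sin(m * phi)]
    return np.array(comps)

N = 5
y_l = ch2d(N, np.radians(30.0))              # encoding vector, left at +30 deg
y_r = ch2d(N, np.radians(-30.0))             # encoding vector, right at -30 deg

t = np.arange(512) / 48000.0
s_l, s_r = np.sin(2*np.pi*440*t), np.sin(2*np.pi*554*t)  # placeholder signals
ambi = np.outer(y_l, s_l) + np.outer(y_r, s_r)           # 11-channel 2D bus
```

The higher encoding order keeps the two frontal directions sharply separated, in contrast to encoding the same signals at first order only.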
Fig. 5.4 Median values and 95% confidence intervals of a listening experiment comparing channel-based orchestra recordings on headphone playback (a spatial quality, b timbral quality), either directly rendered using the corresponding HRIRs or via binaural Ambisonic decoding of different orders

Binaural rendering study using surround-with-height material. In another study by Lee, Frank, and Zotter [16], static headphone-based rendering of channel-based recordings was compared using direct HRIR-based rendering or Ambisonics-based binaural rendering, cf. Sect. 4.11. The aim was to find out whether differently recorded material could be rendered at high quality via binaural Ambisonic renderers, or under which settings this would imply quality degradation compared to channel-based binaural rendering. The results from the half of the listening experiment done in Graz are analyzed in Fig. 5.4. The renderers compared were the channel-based reference "ref", a low-passed mono anchor designed to have poor quality "0", a first-order binaural Ambisonic renderer "1c" based on a cube layout with loudspeakers at ±90°, ±270° azimuth and ±35.3° elevation, and MagLS binaural Ambisonic renderers at the orders "1", "2", "3", "4", and "5". Obviously, for orders 2 and above, there is not much quality degradation compared to the reference channel-based binaural rendering. The spatial quality cannot be distinguished from the reference for MagLS with Ambisonic orders 3 and above, and the timbral quality cannot be distinguished for Ambisonic orders 2 and above. While this result simplifies the practical requirements for headphone playback remarkably, it can be supposed that, due to the limited sweet-spot size, loudspeaker playback would typically still require higher orders.
5.2 Frequency-Independent Ambisonic Effects

Many frequency- and time-independent Ambisonic effects are based on the aforementioned re-mapping of directions and manipulation of directional amplitudes, see e.g. Kronlachner's thesis [2, 17]; advanced effects can be found in [18]. In general, the surround-sound signal can be manipulated by any thinkable transformation that modifies the directional mapping and amplitude of its contents. The formulation

x̃(θ̃, t) = g(θ) x(θ, t)    (5.3)

expresses an operation that picks out every direction θ of the input signal, weights it by a directional gain g(θ), and re-maps it to a new direction θ̃ = τ{θ} within a transformed signal x̃. To find out how this affects Ambisonic signals, we write both x and x̃ as Ambisonic signals expanded in spherical/circular harmonics, x(θ, t) = y_N^T(θ) χ_N(t) and x̃(θ̃, t) = y_Ñ^T(θ̃) χ̃_Ñ(t), so that

y_Ñ^T(θ̃) χ̃_Ñ(t) = g(θ) y_N^T(θ) χ_N(t),

and use the orthonormality ∫_S y_Ñ(θ̃) y_Ñ^T(θ̃) dθ̃ = I, by multiplying with y_Ñ(θ̃) from the left and integrating over the sphere S, to get χ̃_Ñ(t):

χ̃_Ñ(t) = [∫_S y_Ñ(θ̃) g(θ) y_N^T(θ) dθ̃] χ_N(t) =: T χ_N(t),    (5.4)

and we find the transformed signals to be just re-mixed Ambisonic input signals, re-mixed by the matrix T (note that the result might require an increased Ambisonic order Ñ). Numerical evaluation of the matrix T = ∫_S y_Ñ(θ̃) g(θ) y_N^T(θ) dθ̃ is best done using a high-enough t-design Θ = [θ_l] to discretize the integration variable θ̃ = τ{θ}. For the discretized input directions θ, an inverse mapping θ = τ⁻¹{θ̃} of the output direction must exist (the directional re-mapping must be bijective), so that we can write

T = ∫_S y_Ñ(θ̃) g(τ⁻¹{θ̃}) y_N^T(τ⁻¹{θ̃}) dθ̃ = (4π/L̂) Y_Ñ,Θ diag{g_τ⁻¹{Θ}} Y_N,τ⁻¹{Θ}^T.

This formalism is generic and covers simplistic and more complex tasks. It helps in understanding that every frequency-independent directional weighting and/or re-mapping just re-mixes the Ambisonic signals by a matrix, as in Fig. 5.6a.
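The discretized computation of T can be checked in a few lines. The sketch below is our own, not from the book, and uses the 2D/circular analog of the formalism, where orthonormal circular harmonics and equi-angular sampling make the quadrature exact.

```python
# Sketch (2D analog of Eq. (5.4)): T re-mixes Ambisonic signals for any
# directional gain g and bijective re-mapping tau, tested here with a rotation.
import numpy as np

def ch(order, phi):
    """Orthonormal circular harmonics, shape (2*order+1, len(phi))."""
    phi = np.atleast_1d(phi)
    rows = [np.full_like(phi, 1.0/np.sqrt(2.0*np.pi))]
    for m in range(1, order + 1):
        rows += [np.cos(m*phi)/np.sqrt(np.pi), np.sin(m*phi)/np.sqrt(np.pi)]
    return np.vstack(rows)

def transform_matrix(order, gain, inv_map, L=720):
    """T = (2*pi/L) * Y(phi_out) diag{g(inv_map(phi_out))} Y(inv_map(phi_out))^T."""
    phi_out = 2.0*np.pi*np.arange(L)/L
    phi_in = inv_map(phi_out)                 # tau^-1: output -> input azimuth
    return (2.0*np.pi/L) * (ch(order, phi_out) * gain(phi_in)) @ ch(order, phi_in).T

N = 3
ones = lambda p: np.ones_like(p)
T_id = transform_matrix(N, ones, lambda p: p)             # neutral: T = identity
T_rot = transform_matrix(N, ones, lambda p: p - np.pi/2)  # rotate scene by +90 deg
x = ch(N, 0.0)[:, 0]                                      # source encoded at 0 deg
x_rot = T_rot @ x                                         # re-mixed coefficients
```

With neutral gain and identity mapping, T_id is the identity matrix, and T_rot maps the encoding of 0° exactly onto the encoding of +90° — the re-mapping really is just a matrix multiplication on the Ambisonic channels.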
The ambix VST plugin suite implements several such effects, e.g. in the VST plugins ambix_mirror, ambix_rotate, ambix_directional_loudness, and ambix_warp. The sections below explain how these and other effects work inside.

5.2.1 Mirror

Mirroring does not actually require the generic re-mapping and re-weighting formalism from above. The spherical harmonics associated with the Ambisonic channels are shown in Fig. 4.12, and upon closer inspection one recognizes their symmetries, see Fig. 5.5. To mirror the Ambisonic sound scene with regard to a plane of symmetry, it is sufficient to sign-invert the channels associated with odd-symmetric spherical harmonics, as in Fig. 5.6b. Formally, the transform matrix consists of a diagonal matrix T = diag{c} only, with the corresponding sign-change sequence c.

Fig. 5.5 Ambisonic signals associated with odd-symmetric spherical harmonics are sign-inverted to mirror the sound scene. For every Cartesian axis, the illustrations above show spherical harmonics up to the third order, with the order index n organized in rows and the mode index m in columns. Even harmonics are blurred for visual distinction

Up-down: For instance, spherical harmonics with |m| = n are even-symmetric with regard to z = 0 (up-down), and from this index on, every second harmonic in m is. To flip up and down, it is therefore sufficient to invert the signs of the spherical harmonics that are odd-symmetric with regard to z = 0; they are characterized by n + m being an odd number, i.e. c_nm = (−1)^(n+m).

Left-right: The sin(mϕ)-related spherical harmonics with m < 0 are odd-symmetric with regard to y = 0 (left-right); therefore, sign-inverting the signals with index m < 0 exchanges left and right in the Ambisonic surround signal, i.e. c_nm = (−1)^(m<0).
Fig. 5.6 Block diagrams of frequency-independent transformations such as re-mapping and re-weighting (left, matrix operations), or mirroring (right, sign-only operations)

Front-back: Every odd-numbered harmonic with m > 0 is odd-symmetric with regard to x = 0 (front-back), and so is every even-numbered harmonic with m < 0. Inverting the sign of these harmonics, c_nm = (−1)^(m+(m<0)), flips front and back in the Ambisonic surround signal.

5.2.2 3D Rotation

Rotation can be expressed by a general rotation matrix R consisting of a rotation around z by χ, around y by ϑ, and again around z by ϕ, see Fig. 5.7. This rotation matrix maps every direction θ to a rotated direction θ̃:

θ̃ = R(ϕ, ϑ, χ) θ,    (5.5)

R = [cos ϕ, −sin ϕ, 0; sin ϕ, cos ϕ, 0; 0, 0, 1] · [cos ϑ, 0, −sin ϑ; 0, 1, 0; sin ϑ, 0, cos ϑ] · [cos χ, −sin χ, 0; sin χ, cos χ, 0; 0, 0, 1],

with the matrix rows separated by semicolons.

Fig. 5.7 zyz-rotation on the plain example of great-circle navigation of a paper plane around the earth. With the original location at the zenith, a first rotation around z determines the course, and the subsequent rotations around y and z relocate the plane in zenith and azimuth

Using this as a transform rule τ{θ} = R θ with neutral gain g(θ) = 1, we find the transform matrix by the inverse mapping θ = R^T θ̃ as

T = (4π/L̂) Y_N,Θ Y_N,R^TΘ^T.    (5.6)

Using the L̂ directions of a t ≥ 2N design is sufficient to sample the harmonics accurately. With the resulting T, rotation is implemented as in Fig. 5.6a. There is plenty of potential for simplification: as only the spherical harmonics of a given order n are required to re-express a rotated spherical harmonic of the same order n, T is actually block-diagonal, T = blkdiag_n{T_n}, and within each spherical harmonic order, the integral could be evaluated more efficiently using a smaller t ≥ 2n design. Moreover, there are various fast and recursive ways to calculate the entries of T, as in [19–25], implemented in most plugins.
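The zyz composition of Eq. (5.5) is easy to verify numerically. This is a sketch of ours under the sign convention printed above, not library code.

```python
# zyz rotation matrix R = Rz(phi) Ry(theta) Rz(chi), as in Eq. (5.5).
import numpy as np

def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def Ry(b):
    return np.array([[ np.cos(b), 0.0, -np.sin(b)],
                     [ 0.0,       1.0,  0.0      ],
                     [ np.sin(b), 0.0,  np.cos(b)]])

def R_zyz(phi, theta, chi):
    return Rz(phi) @ Ry(theta) @ Rz(chi)

R = R_zyz(0.3, 0.5, -0.2)
assert np.allclose(R @ R.T, np.eye(3))       # rotation matrices are orthogonal

# the zenith is invariant under the first z-rotation (chi), as in Fig. 5.7
zenith = np.array([0.0, 0.0, 1.0])
assert np.allclose(R_zyz(0.0, np.pi/2, 0.7) @ zenith, [-1.0, 0.0, 0.0])
```

Applying R^T to the t-design directions, as in Eq. (5.6), then yields the channel re-mixing matrix T.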
And yet, in practice a naïve implementation can be fast enough and pragmatic.

Rotation around z. One special case of rotation is important and particularly simple to implement. A directional encoding in azimuth either equals Φ_m(ϕs) in 2D or contains it in 3D. For m > 0, the azimuth encoding Φ_m(ϕs) depends on cos(mϕs), and its negative-sign version Φ_−m(ϕs) depends on sin(|m|ϕs). The encoding angle can be offset by the trigonometric addition theorems, which can be written as a matrix:

[sin m(ϕs + ϕ); cos m(ϕs + ϕ)] = [cos mϕ, sin mϕ; −sin mϕ, cos mϕ] [sin mϕs; cos mϕs] =: R(mϕ) [sin mϕs; cos mϕs].    (5.7)

By this, any Ambisonic signal, be it 2D or 3D, can be rotated around z by the matrices R(mϕ) applied to the signal pairs with ±m:

[Φ_−m(ϕs + ϕ); Φ_m(ϕs + ϕ)] = R(mϕ) [Φ_−m(ϕs); Φ_m(ϕs)],
[Y_n^−m(ϕs + ϕ, ϑs); Y_n^m(ϕs + ϕ, ϑs)] = R(mϕ) [Y_n^−m(ϕs, ϑs); Y_n^m(ϕs, ϑs)].    (5.8)

Figure 5.8a shows the processing scheme implementing only the non-zero entries of the associated matrix operation T. Combined with a fixed set of 90° rotations around y (read from files), it can be used to access all rotational degrees of freedom in 3D [20]. The rotation effect is one of the most important features when using head-tracked interactive VR playback on headphones. Here, rotation counteracting the head movement has the task of supporting the impression of a static image of the virtual outside world.

5.2.3 Directional Level Modification/Windowing

What might be most important when mixing is the option to treat the gains of different directions differently: it might be necessary to attenuate directions of uninteresting or disturbing content while boosting directions of a soft target signal.

Fig. 5.8 Rotation around z and Ambisonic widening/diffuseness apply simple 2 × 2 rotation matrices/filter matrices to each Ambisonic signal pair χ_{n,m}, χ_{n,−m} of the same order n. Note that the order of the input/output channels plotted is not the typical ACN sequence, to avoid crossing connections and hereby simplify the diagram

For such a manipulation there is a neutral directional re-mapping θ̃ = θ, and the transform to define the matrix T that is implemented as in Fig. 5.6a remains

T = (4π/L̂) Y_Ñ,Θ diag{g_Θ} Y_N,Θ^T.    (5.9)

In the simplest version, as implemented in ambix_directional_loudness, the gain function just consists of two mutually exclusive regions, e.g. a region of diameter α around the direction θ_g and the complementary region outside, with separately controlled gains g_in and g_out:

g(θ) = g_in u(θ^T θ_g − cos(α/2)) + g_out u(cos(α/2) − θ^T θ_g),    (5.10)

where u(x) represents the unit-step function that is 1 for x ≥ 0 and 0 otherwise. Note that the Ambisonic order of this effect would need to be enlarged for it to be lossless. However, with reasonably chosen sizes α and gain ratios g_in/g_out, the effect will nevertheless produce reasonable results. Figure 5.9 shows a window at azimuth and elevation of 22.5° with an aperture of 50°, using g_in = 1 and g_out = 0 at the order N = 10, with a grid of encoded directions to illustrate the influence of the transformation. For reference: entries of the tensor used to analytically re-expand the product of two spherical functions x(θ) g(θ), given by their spherical harmonic coefficients χ_nm, γ_nm, are called Gaunt coefficients or Clebsch–Gordan coefficients [6, 26].

Fig. 5.9 Directionally windowed Ambisonic test image with points at every 90° in azimuth, interleaved in azimuth for the elevations ±60° and ±22.5°, using the orders N = Ñ = 10, a window size of α/2 = 50° around azimuth and elevation of 0°, and max-rE weighting: (a) original, (b) windowed

5.2.4 Warping

Gerzon [27, Eq. 4a] described the effect dominance that is meant to warp the Ambisonic surround scene to modify how vitally the essential parts in front of the scene are presented.

Warping with regard to a direction. For mathematical simplicity, we describe this bilinear warping with regard to the z direction. To warp with regard to the frontal direction, one first rotates the front upwards, applies the warping operation there, and then rotates back. The bilinear warping modifies the normalized z coordinate ζ = cos ϑ = θ_z so that signals from the horizon ζ = 0 are pulled to ζ̃ = α,

ζ̃ = (α + ζ) / (1 + αζ),    (5.11)

while keeping at the poles ζ̃ = ±1 what was originally there at ζ = ±1. Hereby, the surround signal gets squeezed towards or stretched away from the zenith, or, when rotating before and after, towards/away from any direction. The integral can be discretized and solved by a suitable t-design as before, only that for lossless operation the output order Ñ must be higher than the input order N. We get a matrix T, implemented as in Fig. 5.6a, that is computed by

T = (4π/L̂) Y_Ñ,Θ diag{g_τ⁻¹{Θ}} Y_N,τ⁻¹{Θ}^T.    (5.12)

The inverse mapping yields

ζ = τ⁻¹(ζ̃) = (ζ̃ − α) / (1 − αζ̃),    (5.13)

and it modifies the coordinates of the t-design inserted for θ̃_l = θ_l = [θ_x,l, θ_y,l, θ_z,l]^T with ζ̃_l = θ_z,l accordingly:

τ⁻¹{θ_l} = [θ_x,l √(1 − (τ⁻¹{ζ̃_l})²); θ_y,l √(1 − (τ⁻¹{ζ̃_l})²); τ⁻¹{ζ̃_l}]
= [θ_x,l √(1 − ((θ_z,l − α)/(1 − αθ_z,l))²); θ_y,l √(1 − ((θ_z,l − α)/(1 − αθ_z,l))²); (θ_z,l − α)/(1 − αθ_z,l)].    (5.14)

The gain g(ζ̃) of the generic transformation is useful to preserve the loudness of what becomes wider and therefore louder in terms of the E measure after re-mapping. To preserve loudness, the resulting surround signal is divided by the square root of the stretch applied, which is related to the slope of the mapping by g = √(dζ/dζ̃). Expressed as de-emphasis gain, we get

g(ζ̃_l) = √(1 − α²) / (1 − αζ̃_l) = √(1 − α²) / (1 − αθ_z,l).    (5.15)

Fig. 5.10 Warping of the horizontal plane by 22.5° downwards; the original Ambisonic test image contains points at every 90° in azimuth, interleaved in azimuth for the elevations ±60° and ±22.5°; orders are N = Ñ = 10; max-rE weighted: (a) original, (b) warped

Figure 5.10 shows warping of the horizontal plane by 22.5° downwards, using the test image parameters as with windowing; the de-emphasis attenuates widened areas. In the same fashion, Kronlachner [17] describes another warping curve that warps with regard to a fixed horizontal plane and pole, either squeezing or stretching the content towards or away from the horizon, symmetrically for both the upper and lower hemispheres (the second option of the ambix_warp plugin).

5.3 Parametric Equalization

There are two ways of employing parametric equalizers on Ambisonic channels. Either the single- or multi-channel input of a mono encoder or a multiple-input encoder is filtered by parametric equalizers, or each of the Ambisonic signal's channels is filtered by the same parametric equalizer, see Fig. 5.11a.
5.11 Block diagram of processing that commonly and equally affects all Ambisonic signals, such as parametric equalization and dynamic processing (compression), without recombining the signals

Bass management is often important to avoid overdriving smaller loudspeakers of, e.g., a 5th-order hemispherical playback system with subwoofer signals: all 36 channels from the Ambisonic bus are sent to a decoder section, in which the frequencies below 70–100 Hz are removed by a 4th-order high-pass filter before running through the Ambisonic decoder, while the first channel of the Ambisonic bus alone, the omnidirectional channel, is sent to a subwoofer section, in which a 4th-order low-pass filter removes the frequencies above 70–100 Hz before the signal is sent to the subwoofers. If the playback system is time-aligned between subwoofer and higher frequencies, the 4th-order crossovers should be Linkwitz–Riley filters (squared Butterworth high-pass and low-pass filters) to preserve phase equality [28]. For more information on parametric equalizers, the reader is referred to Udo Zölzer's book on digital audio effects [29].

5.4 Dynamic Processing/Compression

Individual compression of different Ambisonic channels would destroy the directional consistency of the Ambisonic signal. Consequently, dynamic processing should rather affect the levels of all Ambisonic channels in the same way. As it typically contains all the audio signals, it is useful to have the first, omnidirectional Ambisonic channel control the dynamic processor as side-chain input, see Fig. 5.11b. For more information on dynamic processing, the reader is referred to Udo Zölzer's book on digital audio effects [29]. Moreover, it is sometimes useful to compress the vocals of a singer separately.
To this end, directional compression would first extract a part of the Ambisonic signals by a directional window, creating one set of Ambisonic signals without the directional region of the window, and another one exclusively containing it. The compression is applied to the resulting window signal before re-combining it with the rest signal.

5.5 Widening (Distance/Diffuseness/Early Lateral Reflections)

Basic widening and diffuseness effects can be regarded as inspired by Gerzon [30] and Laitinen [31], who proposed to apply frequency-dependent panning filters, mapping different frequencies to directions dispersed around the panning direction. The resulting effect is fundamentally different from, and superior to, increasing the spread by frequency-independent MDAP or by Ambisonics with reduced order, which could yield audible comb filtering. To apply this technique to Ambisonics, Zotter et al. [32] proposed to employ a dispersive, i.e. frequency-dependent, rotation of the Ambisonic scene around the z axis as in Eq. (5.8) by the matrix R as described above and in Fig. 5.8b, using 2 × 2 matrices of filters to implement the frequency-dependent argument m φ̂ cos ωτ,

  R(m φ̂ cos ωτ) = [ cos(m φ̂ cos ωτ)  sin(m φ̂ cos ωτ);  −sin(m φ̂ cos ωτ)  cos(m φ̂ cos ωτ) ],  (5.16)

whose parameters φ̂ and τ allow to control the magnitude and change rate of the rotation with increasing frequency. How this filter matrix is implemented efficiently was described in [33], where the sinusoidally frequency-varying pair of functions

  g₁(ω) = cos[α cos(ωτ)],   g₂(ω) = sin[α cos(ωτ)],  (5.17)

was found to correspond to the sparse impulse responses in the time domain

  g₁(t) = Σ_{q=−∞}^{∞} J_{|q|}(α) cos(π|q|/2) δ(t − qτ),
  g₂(t) = Σ_{q=−∞}^{∞} J_{|q|}(α) sin(π|q|/2) δ(t − qτ),  (5.18)

allowing for truncation to just a few terms in q, typically 11 taps within −5 ≤ q ≤ 5 or fewer, and hereby an efficient implementation.
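The truncated taps of Eq. (5.18) are easy to check numerically. The following sketch (assuming NumPy/SciPy; the value of α is illustrative and corresponds to m φ̂ for one harmonic degree) builds the 11 taps and compares their frequency response against the target pair of Eq. (5.17), with the tap spacing τ normalized to one sample:

```python
import numpy as np
from scipy.special import jv  # Bessel functions J_|q|

# Sparse FIR taps of Eq. (5.18), truncated to -5 <= q <= 5.
alpha = np.pi / 4          # illustrative value of m * phi_hat
Q = 5
q = np.arange(-Q, Q + 1)
g1 = jv(np.abs(q), alpha) * np.cos(np.pi / 2 * np.abs(q))  # even-q taps
g2 = jv(np.abs(q), alpha) * np.sin(np.pi / 2 * np.abs(q))  # odd-q taps

# Frequency responses of the truncated taps vs. the targets of Eq. (5.17).
w = np.linspace(0, np.pi, 256)
G1 = np.sum(g1[:, None] * np.exp(-1j * np.outer(q, w)), axis=0)
G2 = np.sum(g2[:, None] * np.exp(-1j * np.outer(q, w)), axis=0)
err1 = np.max(np.abs(G1 - np.cos(alpha * np.cos(w))))  # truncation error
err2 = np.max(np.abs(G2 - np.sin(alpha * np.cos(w))))
```

For this moderate α, the terms beyond |q| = 5 are negligible, so the truncation errors are tiny; larger α (stronger rotation) spreads energy to higher |q| and needs more taps.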
For the implementation of the filter matrix, for each degree m, the value α = m φ̂ is used. (It might be helpful to be reminded of a phase-modulated cosine and sine from radio communication, whose spectra are the same functions as this impulse-response pair.) As the algorithm places successive frequencies at slightly displaced directions, the auditory source width increases. Moreover, the frequency-dependent part causes a smearing of the temporal fine structure of the signal. In [34], it was found that implementations discarding the negative values of q, i.e. keeping only q ≥ 0, sound more natural and still exhibit a sufficiently strong effect. Time constants τ around 1.5 ms yield a widening effect, and a diffuseness and distance impression is obtained with τ around 15 ms. The parameter φ̂ is adjustable between 0 (no effect) and larger values; beyond 80°, the audio quality starts to degrade. The use as diffusing effect has turned out to be useful as a simple simulation of early lateral reflections, because most parts of the spectrum are played back near the reversal points ±φ̂ of the dispersion contour. For naturally sounding early reflections, additional shelving filters introducing attenuation of high frequencies prove useful.

Figures 5.12 and 5.13 show experimental ratings of the perceived effect strength (width or distance) of the above algorithm in [34], which was implemented as frequency-dependent (dispersive) panning on just a few loudspeakers, L = 3, 4, 5, 7, evenly arranged from −90° to 90° on the horizon at 2.5 m distance from the central listening position. The loudspeakers were controlled by a sampling decoder of the orders N = 1, 2, 3, 5 with the center of the max-r_E-weighted panning direction at 0° in front. The signal was speech; the reference "REF" was the unprocessed signal played from the frontal loudspeaker. The experiment tested the algorithm with both the symmetric impulse responses suggested by Eq.
(5.18) and those truncated to their causal side q ≥ 0, for a listening position at the center of the arrangement (bullet markers) and 1.25 m shifted to the right, off-center (square markers). Figure 5.12 indicates for the widening algorithm with τ = 1.5 ms that the perceived width saturates above N > 2 at both listening positions. Although the causal-sided implementation is weaker in effect strength, it highly outperforms the symmetric FIR implementation in terms of audio quality (right diagram), while still producing a clearly noticeable effect compared to the unprocessed reference (left diagram). A more pronounced preference for the causal-sided implementation in terms of audio quality is found in Fig. 5.13 for the setting τ = 15 ms, where the algorithm increases the diffuseness or perceived distance for orders N > 2 at both listening positions.

Fig. 5.12 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as widening effect using the setting τ = 1.5 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

Fig. 5.13 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as distance/diffuseness effect using the setting τ = 15 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

5.6 Feedback Delay Networks for Diffuse Reverberation

Feedback delay networks (FDN, cf. [35, 36]) can directly be employed to create diffuse Ambisonic reverberation. A dense response and an individual reverberation for every encoded source can be expected when feeding the Ambisonic signals directly into the inputs of the FDN.
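A minimal numerical sketch of such an FDN loop, following the design detailed next (an orthogonally scaled Hadamard feedback matrix and per-delay gains g^τᵢ derived from a single broadband T60), can look as follows; the low sample rate, the prime delay lengths, and all names are illustrative:

```python
import numpy as np
from scipy.linalg import hadamard

fs = 8000                # low rate keeps this toy example fast
T60 = 1.0                # target broadband reverberation time in seconds
tau = [149, 211, 263, 293, 373, 419, 467, 523]  # prime delay lengths (samples)
M = len(tau)

g = 10 ** (-3 / (T60 * fs))               # per-sample decay for a T60 envelope
gains = np.array([g ** t for t in tau])   # per-delay gains g^tau_i

A = hadamard(M) / np.sqrt(M)              # orthogonal mixing matrix, A.T @ A = I

def fdn_impulse(n_out):
    """Impulse response of the loop: read delays, attenuate, mix with A, feed back."""
    buf = [np.zeros(t) for t in tau]      # one circular buffer per delay line
    idx = [0] * M
    y = np.zeros((n_out, M))
    x = 1.0                               # impulse injected into all lines at n = 0
    for n in range(n_out):
        out = np.array([buf[m][idx[m]] for m in range(M)])
        y[n] = out
        fb = A @ (gains * out) + x
        x = 0.0
        for m in range(M):
            buf[m][idx[m]] = fb[m]
            idx[m] = (idx[m] + 1) % tau[m]
    return y

y = fdn_impulse(fs)   # 1 s of response; the envelope decays by 60 dB per T60
```

Because each gain g^τᵢ exactly compensates its delay length, every feedback path decays at the same per-sample rate g, which is what gives exact control of the reverberation time.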
As in Fig. 5.14, an FDN consists of a matrix A that is orthogonal, AᵀA = I, and that should mix the signals of the feedback loop well enough to distribute them across all channels, coupling the resonators associated with the different delays τᵢ. These delays should not have common divisors, to avoid pronounced resonance frequencies, and are therefore typically chosen to be related to prime numbers. Small delays are typically selected to be more closely spaced, {2, 3, 5, …} ms, to simulate a diffuse part with a densely spaced response at the beginning, while long delays spaced further apart often make the reverberation more interesting. Using unity channel gains g_lo^τᵢ = g_mi^τᵢ = g_hi^τᵢ = 1 and any orthogonal matrix A, the reverberation time becomes infinite. For smaller channel gains, the FDN produces a decaying output.

Reverberation is characterized by the exponentially decaying envelope 10^(−3t/T60). For a single delay of the length τᵢ, the corresponding gain is g^τᵢ with g = 10^(−3/T60). This factor with the corresponding exponent provides an equal reverberation decay rate in every channel, and hereby exact control of the reverberation time. To make the effect sound natural, it is typical to adjust the gains within a high-mid-low filter set to decrease the reverberation towards higher frequency bands by the gains g_hi^τᵢ ≤ g_mi^τᵢ ≤ g_lo^τᵢ.

The vector gathering the current sample of every feedback path is multiplied by the matrix A. For real-time calculation, Rocchesso [37] proposed to use a scaled Hadamard matrix A = (1/√M) H of the dimensions M = 2^k. It consists of ±1 entries only and hereby perfectly mixes the signal across the different feedback paths to create a diffuse set of resonances. What is more, this not only replaces the M × M multiply-adds of the matrix multiplication by sums and differences, it is moreover equivalent to the efficient implementation as fast Walsh–Hadamard transform (FWHT), a butterfly algorithm. Figure 5.15 shows a graphical implementation example of a 16-channel FWHT in the real-time signal-processing environment Pure Data.

Fig. 5.14 Feedback delay network (FDN) for Ambisonic reverb. The matrix A is unitary, and the gain g = 10^(−3/T60) to the power of the delay τᵢ allows to adjust a spatially and temporally diffuse reverberation effect in different bands (lo, mi, hi)

Fig. 5.15 The fast Walsh–Hadamard transform (FWHT) variant implemented in the 16-channel feedback delay network reverberator [rev3∼] in Pure Data requires only 4 × 16 sums/differences to replace the 16 × 16 multiplies of the matrix multiplication by A

5.7 Reverberation by Measured Room Impulse Responses and Spatial Decomposition Method in Ambisonics

The first-order spatial impulse response of a room, measured at the listener, can be improved by resolution enhancement with the spatial decomposition method (SDM) by Tervo [38], which is a broadband version of spatial impulse response rendering (SIRR) by Merimaa and Pulkki [39, 40]. For reliable measurements, typically loudspeakers are employed, and the typical measurement signals are not impulses but swept sines, which are reverted to impulses by deconvolution. A room impulse response is typically sparse at its beginning, where direct sound and early reflections arrive at the measurement location. Generally, it is likely that the arrival times in this early part do not coincide and are well separated from each other, so that one can assume their temporal disjointness at the receiver. From a room impulse response h(t) that complies with this assumption, for which there consequently is a direction of arrival (DOA) θ_DOA(t) for every time instant, one could construct an Ambisonic receiver-directional room impulse response as in [41], h(θ_R, t) = h(t) δ[1 − θ_Rᵀ θ_DOA(t)], depending on the direction θ_R at the receiver.
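Sampled discretely, this construction amounts to weighting each sample of h(t) by the spherical harmonics evaluated at θ_DOA(t). A minimal sketch, assuming first-order harmonics in ACN/N3D convention (omitting a common 1/√(4π) factor) and purely illustrative toy data for h(t) and the DOA trajectory:

```python
import numpy as np

def y1_n3d(theta):
    """First-order real spherical harmonics (ACN order W, Y, Z, X; N3D norm,
    omitting the common 1/sqrt(4*pi) factor) for unit vectors theta of shape (..., 3)."""
    x, y, z = theta[..., 0], theta[..., 1], theta[..., 2]
    return np.stack([np.ones_like(x),
                     np.sqrt(3) * y,
                     np.sqrt(3) * z,
                     np.sqrt(3) * x], axis=-1)

# Toy omnidirectional RIR and DOA trajectory (illustrative placeholders).
rng = np.random.default_rng(1)
n = 512
h = np.exp(-np.arange(n) / 100) * rng.standard_normal(n)
doa = rng.standard_normal((n, 3))
doa /= np.linalg.norm(doa, axis=1, keepdims=True)   # unit DOA vector per sample

# One Ambisonic RIR channel per harmonic; channel 0 stays the omni response h(t).
h_amb = h[:, None] * y1_n3d(doa)    # shape (n, 4)
```

In practice h(t) and θ_DOA(t) come from a measurement and its directional analysis, and the same per-sample weighting extends to higher orders by evaluating more harmonics.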
This response can be transformed into the spherical-harmonic domain by integrating it over y_N(θ_R) dθ_R on S², to get the set of Nth-order Ambisonic room impulse responses h̃_N(t) = h(t) y_N[θ_DOA(t)]. A signal s(t) convolved with this vector of impulse responses theoretically generates a 3D Ambisonic image of the mono sound in the room of the measurement. This can be done, e.g., by the plug-in mcfx_convolver. Now there are two problems to be solved: (i) how to estimate θ_DOA(t), and (ii) how to deal with the diffuse part of h(t), when there is more than one sound arrival at a time.

Estimation of the DOA. One could just detect the temporal peaks of the room impulse response and assign a guessed evolution of the direction of arrival, as suggested in [42], and hereby span the envelopment of the room impulse response. Alternatively, if the room impulse response was recorded by a microphone array as in [38], array processing can be used to estimate the direction of arrival θ_DOA(t). For first-order Ambisonic microphone arrays, when suitably band-limited to the frequency range in which the directional mapping is correct, e.g. between 200 Hz and 4 kHz, the vector r_DOA of Eq. (A.83) in Appendix A.6.2 yields a suitable estimate,

  r̃_DOA(t) = W(t) [X(t), Y(t), Z(t)]ᵀ = −ρc h(t) v(t),   θ_DOA(t) = r̃_DOA(t)/‖r̃_DOA(t)‖.  (5.19)

Figure 5.16 shows the directional analysis of the first 100 ms of a first-order directional impulse response taken from the openair library.¹ This response was measured in St. Andrew's Church, Lyddington, UK (2600 m³ volume, 11.5 m source-receiver distance) with a Soundfield SPS422B microphone.

¹ http://www.openairlib.net.

Fig. 5.16 Spatial distribution of the first 100 ms of a first-order directional impulse response measured at St. Andrew's Church, Lyddington. Brightness and size of the circles indicate the level

Fig.
5.17 Frequency-dependent reverberation time calculated from the original 1st-order impulse response and from SDM-enhanced impulse responses (1st-, 3rd-, 5th-, and 20th-order SDM) without spectral decay recovery

The direct sound from the front is clearly visible, as well as strong early reflections from front and back, and equally distributed weak directions from the diffuse reverb.

Spectral decay recovery for higher-order RIRs. The second task mentioned above arises because the multiplication of h(t) by y_N[θ_DOA(t)] to obtain h̃_N(t) degrades the spectral decay at higher orders. Without further processing, the resulting response typically exhibits a noticeably increased spectral brightness [38, 41, 43]. This unnatural brightness mainly affects the diffuse reverberation tail, where temporal disjointness is a poor assumption. There, the corresponding rapid changes of θ_DOA(t) cause a strong amplitude modulation in the pre-processing of the late room impulse response at high Ambisonic orders. Typically, long decays of low frequencies leak into high frequencies and hereby result in an erroneous spectral brightening of the diffuse tail. Figure 5.17 analyzes this behavior in terms of an erroneous increase of the reverberation time at high frequencies, especially when using high orders.

In order to equalize the spectral decay and hereby the reverberation time of the SDM-enhanced impulse response, there is a helpful pseudo-allpass property of the spherical harmonics for direct and diffuse fields, as described in Eqs. (A.52) and (A.55) of Appendix A.3.7. The signals in the vector h̃(t) = [h̃ₙᵐ(t)]ₙₘ are first decomposed into frequency bands, yielding the sub-band responses h̃ₙᵐ(t, b). We can equalize the spectral sub-band decay for every band b and order n by targeting fulfillment of the pseudo-allpass property

  Σ_{m=−n}^{n} E{|hₙᵐ(t, b)|²} = (2n + 1) E{|h₀⁰(t, b)|²}.  (5.20)

The formulation above relies on the correct spectral decay of the omnidirectional signal h₀⁰(t, b) = h̃₀⁰(t, b), which is unaffected by the modulation. Correction is achieved by

  hₙᵐ(t, b) = h̃ₙᵐ(t, b) √[ (2n + 1) E{|h̃₀⁰(t, b)|²} / Σ_{m=−n}^{n} E{|h̃ₙᵐ(t, b)|²} ];  (5.21)

here, the expression E{|·|²} refers to an estimate of the squared signal envelope.

Perceptual evaluation. Frank's 2016 experiments [44] measuring the area of the sweet spot also investigated the plausibility of reverberation created by Ambisonically SDM-processed measurements at different order settings, N = 1, 3, 5. For Fig. 5.18b, listeners indicated at which distance from the room's center they heard envelopment begin to collapse towards the nearest loudspeakers. One can observe that rendering diffuse reverberation for a large audience benefits from a high Ambisonic order. Moreover, experiments in [43] revealed an improvement of the perceived spatial depth mapping, i.e. a clearer separation between foreground and background sound for the SDM-processed higher-order reverberation, cf. Fig. 1.21b.

Fig. 5.18 (a) Sweet-spot experiment; (b) median size for diffuse sound. The perceptual sweet-spot size investigated by Frank [44] for SDM-processed RIRs covers an area in the IEM CUBE that increases with the SDM order N chosen (black = 5th, gray = 3rd, light gray = 1st order Ambisonics). In comparison to panned direct sound, one should keep some distance to the loudspeakers to avoid breakdown of envelopment

5.8 Resolution Enhancement: DirAC, HARPEX, COMPASS

The concept of parametric audio processing [5] describes ways to obtain resolution-enhanced first-order Ambisonic recordings by parametric decomposition and rendering.
One main idea is to decompose the short-term stationary signals of a sound scene into a directional stream and a less directional, diffuse stream. For synthesis of the directional part based on mono signals, it is clear how to obtain the narrowest presentation: by amplitude panning or higher-order Ambisonic panning with consistent r_E-vector predictions as in Chap. 2. The synthesis of diffuse and enveloping parts based on a mono signal can require extra processing, such as widening/diffuseness effects or reverberation as in Sects. 5.5 and 5.6, which both also provide a directionally wide distribution of sound. Or, more practically, the recording itself could deliver sufficiently many uncorrelated instances of the diffuse sound to be played back by surrounding virtual sources. Envelopment and diffuseness rely on providing a consistently low interaural covariance or cross-correlation, i.e. sufficiently strong decorrelation.

DirAC. A main goal of DirAC (directional audio coding [5]) is finding signals and parameters for sound rendering by analyzing first-order Ambisonic recordings. One variant is to use the intensity-vector-based analysis in the short-term Fourier transform (STFT) domain, see also Appendix A.6.2:

  r_DOA(t, ω) = −ρc Re{p(t, ω)* v(t, ω)} / |p(t, ω)|² = Re{W(t, ω)* [X(t, ω), Y(t, ω), Z(t, ω)]ᵀ} / (√2 |W(t, ω)|²),  (5.22)

which can be treated similarly to the r_E vector, regarding direction and diffuseness, ψ = 1 − ‖r_DOA‖.

Single-channel DirAC is Ville Pulkki's original way to decompose the W(t, ω) signal in the STFT domain into a directional signal √(1 − ψ) W(t, ω) that is synthesized by amplitude panning, and a diffuse signal √ψ W(t, ω) to be synthesized diffusely [45]. Virtual-microphone DirAC uses a first-order Ambisonic decoder to the given loudspeaker layout and time-frequency-adaptive sharpening masks increasing the focus of direct sounds, see Vilkamo [46] and [5, Ch. 6], or e.g. Sect. 5.2.3.
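Per STFT bin, the analysis of Eq. (5.22) reduces to a few vector operations. A sketch, assuming the traditional B-format convention with the √2 gain on X, Y, Z as in Eq. (5.22); the helper name and test values are illustrative:

```python
import numpy as np

def dirac_params(W, X, Y, Z):
    """DOA unit vector and diffuseness per STFT bin from B-format bins,
    cf. Eq. (5.22); small constants guard against division by zero."""
    V = np.stack([X, Y, Z], axis=-1)
    r = np.real(np.conj(W)[..., None] * V) / (np.sqrt(2) * np.abs(W)[..., None] ** 2 + 1e-20)
    norm = np.linalg.norm(r, axis=-1)
    theta = r / (norm[..., None] + 1e-20)   # DOA estimate
    psi = np.clip(1 - norm, 0.0, 1.0)       # diffuseness
    return theta, psi

# A single plane-wave bin from the front (+x): W = s, X = sqrt(2)*s, Y = Z = 0.
s = 0.7 + 0.3j
theta, psi = dirac_params(np.array([s]), np.array([np.sqrt(2) * s]),
                          np.array([0j]), np.array([0j]))
```

For the ideal single plane wave the analysis returns the frontal direction with zero diffuseness; mixtures of arrivals shrink ‖r_DOA‖ and raise ψ towards 1.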
Playback of the diffuse sounds benefits from an optional diffuseness effect.

HARPEX (high angular-resolution plane-wave expansion [47]) is Svein Berge's patented solution to optimally decode sub-band signals. It is based on the observation he made with Natasha Barrett that decoding to a tetrahedral loudspeaker layout perceptually outperforms others if the tetrahedron nodes are rotationally aligned with the sources of the recording. HARPEX accomplishes convincing diffuse and direct sound reproduction by decoding to a variably adapted virtual loudspeaker layout in every sub-band. The layout is adaptively rotation-aligned with the sources detected in the band; HARPEX is typically described using an estimator for direction pairs.

COMPASS (coding and multidirectional parameterization of Ambisonic sound scenes [48]) by Archontis Politis can be seen as an extension of DirAC. In contrast to DirAC, it tries to detect and separate multiple direct sound sources from the ambient or background sound. This is done by applying two different kinds of beamformers: one that contains only the direct sound of each sound source (source signals) and one that contains everything but the direct sound (ambient signal). Similarly as before, the source signals are reproduced using amplitude panning, and the ambient signal is sent to the decorrelator. In contrast to DirAC, COMPASS is not limited to first-order input but can also enhance the spatial resolution of higher-order inputs.

5.9 Practical Free-Software Examples

5.9.1 IEM, ambix, and mcfx Plug-In Suites

The ambix_converter is an important tool when adapting between the different Ambisonic scaling conventions, e.g. the standard SN3D normalization that uses only √[(2 − δ_m)/(4π) · (n − |m|)!/(n + |m|)!] for normalization instead of the full √[(2 − δ_m)(2n + 1)/(4π) · (n − |m|)!/(n + |m|)!] that is called N3D, see Fig. 5.19.
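The two normalizations above differ only by the per-order factor √(2n + 1), so a conversion is a per-channel rescaling; with ACN ordering (i = n² + n + m, hence n = ⌊√i⌋), it can be sketched as follows (function name and values are illustrative):

```python
import numpy as np

def sn3d_to_n3d(ambi, N):
    """Rescale ACN-ordered SN3D channels to N3D: each order-n channel is
    multiplied by sqrt(2n + 1), the factor by which the two norms differ."""
    acn = np.arange((N + 1) ** 2)
    n = np.floor(np.sqrt(acn)).astype(int)   # order n of each ACN channel index
    return ambi * np.sqrt(2 * n + 1)

# All-ones SN3D channel values of an order N = 2 signal (illustrative).
out = sn3d_to_n3d(np.ones(9), 2)
```

The inverse (N3D to SN3D) divides by the same factors, and the zeroth-order channel is unchanged in either direction.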
This alternating definition owes to a practical choice of the ambix format [49]: it avoids high-order channels becoming louder than the zeroth-order channel. The converter also permits adapting between channel sequences such as ACN's i = n² + n + m or SID's i = n² + 2(n − |m|) + (m < 0). It is advisable to use test recordings with known main directions, e.g. front, left, top, and to check that the channel separation for decoded material roughly exceeds 20 dB for 5th-order material. Moreover, the converter contains inversion of the Condon–Shortley phase, which typically causes a 180° rotation around the z axis, and it contains the left-right, front-back, and top-bottom flips discussed in the mirroring operations above.

Fig. 5.19 ambix_converter plug-in

The ambix_warping plugin, see Fig. 5.20, implements the above-mentioned warping operations shifting horizontal sounds towards one of the poles, or into both polar directions. Warping can be applied to any direction other than zenith and nadir by placing the plugin between two mutually inverting ambix_rotation or IEM SceneRotator instances that intermediately rotate the zenith to another direction.

The IEM SceneRotator, like the ambix_rotation plugin, can be controlled by head tracking and is essential for an immersive headphone-based experience, see Fig. 5.21. Its processing is done as described above.

The ambix_directional_loudness plugin in Fig. 5.22 implements the above-mentioned directional amplitude window in either circular or equi-rectangular spherical shape. Several of these windows can be made, soloed, and remote-controlled, each one allowing to set a gain for the inside and outside region. This is often useful in practice when, e.g., reinforcing or attenuating desired or undesired signal parts within an Ambisonic scene.

Fig. 5.20 ambix_warping plug-in in Reaper

Fig. 5.21 IEM SceneRotator and ambix_rotator plug-ins

Fig.
5.22 ambix_directional_loudness plug-in

Fig. 5.23 EnergyVisualizer plug-in

To observe the changes made to the Ambisonic scene, the IEM EnergyVisualizer can be helpful, see Fig. 5.23. If, for instance, the Ambisonic scene requires dynamic compression, as outlined in the section above, the IEM OmniCompressor is a helpful tool. It uses the omnidirectional Ambisonic channel to derive the compression gains (as a side-chain controlling all other Ambisonic channels). Similarly to the directional_loudness plug-in, the IEM DirectionalCompressor allows to select a window, but this time for setting different dynamic compression within and outside the selected window, see Fig. 5.24.

Fig. 5.24 OmniCompressor and DirectionalCompressor plug-ins

The multi-channel mcfx_filter plugin in Fig. 5.25 not only implements a set of parametric equalizers and low and high cuts that can be toggled between filter skirts of either 2nd or 4th order, it also features a real-time spectrum analyzer to observe the changes done to the signal. It is not only practical for Ambisonic purposes: it is simply a set of parametric filters that is equally applied to all channels and controlled from one interface.

Fig. 5.25 mcfx_filter plug-in

The mcfx_convolver plug-in in Fig. 5.26 is useful for many purposes, also scientific ones, e.g. when testing binaural filters or driving multi-channel arrays with filters. Its configuration files use the jconvolver format, which specifies which filter file (typically stored in multi-channel wav files) connects which of its multiple inlets to which of its multiple outlets. It is also used to implement the SDM-based reverberation described in the sections above.

Fig. 5.26 mcfx_convolver plug-in

For a cheaper reverberation network, the IEM FDNReverb described above can be used, see Fig. 5.27. It is not in particular an Ambisonic tool, but can be used in any multi-channel environment. The particularity of the implementation in the IEM suite is that a slow onset can be adjusted.

Fig. 5.27 FDNReverb plug-in

The ambix_widening plug-in in Fig. 5.28 implements the widening by frequency-dependent, dispersive rotation of the Ambisonic scene around the z axis as described above. With time-constant settings exceeding 5 ms, it can also be used to cheaply stylize lateral reflections instead of the IEM RoomEncoder (Fig. 4.36), or just as a widening effect. The setting single-sided permits suppressing the slow attack of the Bessel sequence.

Fig. 5.28 ambix_widening plug-in in Reaper

Another quite helpful tool is the mcfx_gain_delay plug-in in Fig. 5.29. It permits to solo or mute individual channels, as well as delay and attenuate them individually. What is more, and often even more useful: it is invaluable for testing the signal chain, as one can step through the channels with different signals.

Fig. 5.29 mcfx_gain_delay plug-in

5.9.2 Aalto SPARTA

The SPARTA plug-in suite by Aalto University provides Ambisonic tools for encoding, decoding to loudspeakers and headphones, as well as visualization. A special feature is the COMPASS decoder plug-in in Fig. 5.30, which can increase the spatial resolution of first-, second-, and third-order recordings. Playback can be done either on arbitrary loudspeaker arrangements or via their virtualization on headphones. The signal-dependent parametric processing allows to adjust the balance between direct and diffuse sound in each frequency band. In order to suppress artifacts due to the processing, the parametric playback (Par) can be mixed with the static decoding (Lin) of the original recording.

Fig. 5.30 COMPASS Decoder plug-in
While in general it is advisable to keep the parametric contribution below 2/3 for noticeable directional improvement at low artifacts, in recordings with cymbals or hi-hats it is advisable to fade towards Lin starting at around 4 kHz.

5.9.3 Røde

The Soundfield plug-in by Røde in Fig. 5.31 was originally designed to process the signals from the four cardioid microphone capsules of the company's Soundfield microphone. However, it also supports first-order Ambisonics as input format. It can decode to various loudspeaker arrangements by placing virtual microphones in the directions of the loudspeakers. The directivity of each virtual microphone can be adjusted between first-order cardioid and hyper-cardioid. Moreover, higher-order directivity patterns are possible using parametric signal-dependent processing, resulting in an increase of the spatial resolution.

Fig. 5.31 Soundfield by Røde plug-in

References

1. D.G. Malham, Higher-order ambisonic systems for the spatialisation of sound, in Proceedings of the ICMC (Beijing, 1999)
2. M. Frank, F. Zotter, A. Sontacchi, Producing 3D audio in ambisonics, in 57th International AES Conference, LA, The Future of Audio Entertainment Technology? Cinema, Television and the Internet (2015)
3. M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts and broadcasts. J. Audio Eng. Soc. 65(9) (2017)
4. D. Rudrich, F. Zotter, M. Frank, Efficient spatial ambisonic effects for live audio, in 29th TMT (Cologne, 2016)
5. V. Pulkki, S. Delikaris-Manias, A. Politis (eds.), Spatial Decomposition by Spherical Array Processing (Wiley, New Jersey, 2017)
6. A. Politis, Microphone array processing for parametric spatial audio techniques, Ph.D. dissertation, Aalto University, Helsinki (2016)
7. B. Bernschütz, Microphone arrays and sound field decomposition for dynamic binaural recording, Ph.D. thesis, TU Berlin (TH Köln) (2016), http://dx.doi.org/10.14279/depositonce-5082
8. B. Bernschütz, A.V.
Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves with reduced modal order. Acta Acustica u. Acustica 100(5) (2014)
9. Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. 141(6) (2017)
10. C. Andersson, Headphone auralization of acoustic spaces recorded with spherical microphone arrays, M. thesis, Chalmers TU, Gothenburg (2017)
11. M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc. Am. 143(6), 3616–3627 (2018)
12. E. Kurz, F. Pfahler, M. Frank, Comparison of first-order ambisonic microphone arrays, in International Conference on Spatial Audio (ICSA) (2015)
13. E. Hendrickx, P. Stitt, J.-C. Messonier, J.-M. Lyzwa, B.F. Katz, C. Boishéraud, Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis. J. Acoust. Soc. Am. 141(3), 2011–2023 (2017)
14. H. Lee, C. Gribben, Effect of vertical microphone layer spacing for a 3D microphone array. J. Audio Eng. Soc. 62(12), 870–884 (2014)
15. R. Wallis, H. Lee, The reduction of vertical interchannel crosstalk: the analysis of localisation thresholds for natural sound sources. MDPI Appl. Sci. 7(3) (2017)
16. H. Lee, M. Frank, F. Zotter, Spatial and timbral fidelities of binaural ambisonics decoders for main microphone array recordings, in AES IIA Conference (York, 2019)
17. M. Kronlachner, F. Zotter, Spatial transformations for the enhancement of ambisonic recordings, in Proceedings of the ICSA, Erlangen (2014)
18. M. Hafsati, N. Epain, J. Daniel, Editing ambisonic sound scenes, in 4th International Conference on Spatial Audio (ICSA) (Graz, 2017)
19. G. Aubert, An alternative to Wigner d-matrices for rotating real spherical harmonics. AIP Adv. 3(6), 062121 (2013).
https://doi.org/10.1063/1.4811853 20. F. Zotter, Analysis and synthesis of sound-radiation with spherical arrays, PhD. Thesis, University of Music and Performing Arts, Graz (2009) 21. D. Pinchon, P.E. Hoggan, Rotation matrices for real spherical harmonics: general rotations of atomic orbitals in space-fixed axes. J. Phys. A 40, 1751–8113 (2007) 22. J. Ivanic, K. Ruedenberg, Rotation matrices for real spherical harmonics. Direct determination by recursion. J. Phys. Chem. 100, 6342–6347 (1996) 23. J. Ivanic, K. Ruedenberg, Additions and corrections. J. Phys. Chem. A 102, 9099–9100 (1998) 24. C.H. Choi, J. Ivanic, M.S. Gordon, K. Ruedenberg, Rapid and stable determination of rotation matrices between spherical harmonics by direct recursion. J. Chem. Phys. 111, 8825–8831 (1999) 25. N.A. Gumerov, R. Duraiswami, Recursions for the computation of multipole translation and rotation coefficients for the 3-d helmholtz equation. SIAM 25, 1344–1381 (2003) 26. J.R. Driscoll, D.M.J. Healy, Computing fourier transforms and convolutions on the 2-sphere. Adv. Appl. Math. 15, 202–250 (1994) 27. M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in prepr. 3345, 92nd AES Convention (Vienna, 1992) 28. S.P. Lipshitz, J. Vanderkoy, In-phase crossover network design. J. Audio Eng. Soc. 34(11), 889–894 (1986) 29. U. Zölzer, Digital Audio Effects, 2nd edn. (Wiley, New Jersey, 2011) 30. M.A. Gerzon, Signal processing for simulating realistic stereo images, in prepr. 3423, Convention Audio Engineering Society (San Francisco, 1992) 31. M.-V. Laitinen, T. Pihlajamäki, C. Erkut, V. Pulkki, Parametric time-frequency representation of spatial sound in virtual worlds. ACM TAP 9(2) (2012) 32. F. Zotter, M. Frank, M. Kronlachner, J. Choi, Efficient phantom source widening and diffuseness in ambisonics, in EAA Symposium on Auralization and Ambisonics (Berlin, 2014) 33. F. Zotter, M. Frank, Efficient phantom source widening. Arch. Acoust. 38(1) (2013) 34. F. Zotter, M. 
Frank, Phantom source widening by filtered sound objects, in prepr. 9728, Convention, Audio Engineering Society (Berlin, 2017) References 129 35. J. Stautner, M. Puckette, Designing multi-channel reverberators. Comput. Music J. 6(1), 52–65 (1982) 36. J.-M. Jot, A. Chaigne, Digital delay networks for designing artificial reverberators, in 90th AES Convention, prepr. 3030 (Paris, 1991) 37. D. Rocchesso, Maximally diffusive yet efficient feedback delay networks for artificial reverberation. IEEE Signal Process. Lett. 4(9) (1997) 38. S. Tervo, J. Pätynen, A. Kuusinen, T. Lokki, Spatial decomposition method for room impulse responses. J. Audio Eng. Soc. 61(1/2), 17–28 (2013), http://www.aes.org/e-lib/browse.cfm? elib=16664 39. J. Merimaa, V. Pulkki, Spatial impulse response rendering i: Analysis and synthesis. J. Audio Eng. Soc. 53(12), 1115–1127 (2005) 40. V. Pulkki, J. Merimaa, Spatial impulse response rendering ii: Reproduction of diffuse sounds and listening tests. J. Audio Eng. Soc. 54(1/2), 3–20 (2006) 41. M. Zaunschirm, M. Frank, F. Zotter, Brir synthesis using first-order microphone arrays, in prepr. 9944, 144th AES Convention (Milano, 2018) 42. C. Pörschmann, P. Stade, J.M. Arend, Binauralization of omnidirectional room impulse responses – algorithm and technical evaluation,” in DAFx-17 (Edinburgh, 2017) 43. M. Frank, F. Zotter, Spatial impression and directional resolution in the reproduction of reverberation, in Fortschritte der Akustik - DEGA (Aachen, 2016) 44. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Convention (2017) 45. V. Pulkki, Spatial sound reproduction with directional audio conding. J. Audio Eng. Soc. 55(6), 503–516 (2007) 46. J. Vilkamo, T. Lokki, V. Pulkki, Directional audio coding: virtual microphone-based synthesis and subjective evaluation. J. Audio Eng. Soc. 57(9) (2009) 47. S. Berge, Device and method converting spatial audio signal, US Patent, no. 8,705,750 B2 (2014) 48. A. Politis, S. Tervo, V. 
Chapter 6
Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)

… a turning point has been the design of HOA microphones, opening an exciting experimental field in terms of real 3D sound field recording …
Jérôme Daniel [1] at the Ambisonics Symposium 2009.

Abstract Unlike pressure-gradient transducers, single-transducer microphones with higher-order directivity apparently turned out to be difficult to manufacture at reasonable audio quality. Therefore, nowadays, higher-order Ambisonic recording with compact devices is based on compact spherical arrays of pressure transducers.
To prepare for higher-order Ambisonic recording based on arrays, we first need a model of the sound pressure that the individual transducers of such an array would receive in an arbitrary surrounding sound field. The lossless, linear wave equation is the most suitable model to describe how sound propagates when the sound field is composed of surrounding sound sources. Fundamentally, the wave equation models sound propagation by how small packages of air react (i) when being expanded or compressed by a change of the internal pressure, and (ii) to directional differences in the outside pressure by starting to move. Based thereupon, the inhomogeneous solutions of the wave equation describe how an entire free sound field builds up when excited by an omnidirectional sound source, as a simplified model of an arbitrary physical source, such as a loudspeaker, human talker, or musical instrument. After addressing these basics, the chapter shows a way to get Ambisonic signals of high spatial and timbral quality from the array signals, considering the necessary diffuse-field equalization, side-lobe suppression, and trade-off between spatial resolution and low-frequency noise boost. The chapter concludes with application examples.

Gary Elko and Jens Meyer are the well-known inventors of the first commercially available compact spherical microphone array able to record higher-order Ambisonics [2], the Eigenmike. There are several inspiring scientific works with valuable contributions that can be recommended for further reading [3–12], above all Boaz Rafaely's excellent introductory book [13]. This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones.
The theory enables processing of the received microphone signals such that the surrounding sound-field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

6.1 Equation of Compression

Wave propagation involves reversible short-term temperature fluctuations that become effective when air is compressed by sound, causing the specific stiffness of air in sound propagation. Appendix A.6.1 shows how to derive this adiabatic compression relation from the first law of thermodynamics and the ideal gas law. It relates the relative volume change ΔV/V₀ to the pressure change p = −K ΔV/V₀ by the bulk modulus of air K. After expressing the bulk modulus by more common constants,¹ K = ρc², and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimension ṗ = −ρc² ∂vₓ/∂x, cf. Appendix A.6.1, we get the three-dimensional compression equation:

\frac{\partial p}{\partial t} = -\rho c^2\, \nabla^{\mathrm{T}} \boldsymbol{v}. \qquad (6.1)

Here the inner product of the Del symbol ∇ᵀ = (∂/∂x, ∂/∂y, ∂/∂z) with v yields what is called the divergence, div(v) = ∇ᵀv = ∂vₓ/∂x + ∂v_y/∂y + ∂v_z/∂z. The equation means: independently of whether the outer boundaries of a small package of air are traveling at a common velocity, if there are directions into which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease of interior pressure over time.

¹ Typical constants are: density ρ = 1.2 kg/m³, speed of sound c = 343 m/s.

6.2 Equation of Motion

The equation of motion is relatively simple to understand from the Newtonian equation of motion, e.g. for the x direction, Fₓ = m ∂vₓ/∂t. It equates the external force to mass m times acceleration, i.e. the increase in velocity ∂vₓ/∂t. For a small package of air with constant volume V₀ = ΔxΔyΔz, the mass is obtained by the air density, m = ρV₀, and the force equals the decrease in pressure over the three space directions, times the corresponding partial surface, e.g. for the x direction

F_x = -\bigl[\,p(x+\Delta x) - p(x)\,\bigr]\, \Delta y\, \Delta z.

For the x direction, this yields after expanding by Δx

-\frac{\Delta p}{\Delta x}\, V_0 = \rho V_0\, \frac{\partial v_x}{\partial t}.

Dividing by −V₀ and letting Δx → 0, we obtain the typical shape of the equation of motion for all three space directions

\nabla p = -\rho\, \frac{\partial \boldsymbol{v}}{\partial t}. \qquad (6.2)

The equation of motion means: independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package, causing a proportional acceleration into this direction.

6.3 Wave Equation

We can combine the compression equation ∂p/∂t = −ρc² ∇ᵀv with the equation of motion ∇p = −ρ ∂v/∂t by differentiating the first one with regard to time, ∂²p/∂t² = −ρc² ∇ᵀ ∂v/∂t, and applying the gradient ∇ᵀ to the second one, yielding the Laplacian ∇ᵀ∇ = Δ, hence Δp = −ρ ∇ᵀ ∂v/∂t. Division of the first result by c² and equating both terms yields the lossless wave equation Δp = (1/c²) ∂²p/∂t², which is typically written as

\Delta p - \frac{1}{c^2}\, \frac{\partial^2 p}{\partial t^2} = 0. \qquad (6.3)

Obviously, the wave equation relates curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order time derivative). If p is a pure sinusoidal oscillation sin(ωt + φ₀), the second derivative in time corresponds to a factor −ω², and by substitution with the wave number k = ω/c, we can write the frequency-domain wave equation as

(\Delta + k^2)\, p = 0, \qquad \text{Helmholtz equation}. \qquad (6.4)

6.3.1 Elementary Inhomogeneous Solution: Green's Function (Free Field)

The Green's function is an elementary prototype for solutions to inhomogeneous problems (Δ + k²)p = −q; it is defined by (Δ + k²)G = −δ.
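As a quick symbolic sanity check of the wave equation (6.3): any waveform traveling at speed c solves it. The following sketch (not part of the book's text; it assumes sympy is available) verifies this for a one-dimensional traveling sinusoid, and also checks the Helmholtz relation (6.4) with k = ω/c:

```python
import sympy as sp

x, t, c, omega = sp.symbols('x t c omega', positive=True)

# A rightward-traveling sinusoid p(x, t) = sin(omega*(t - x/c))
p = sp.sin(omega * (t - x / c))

# Residual of the 1-D wave equation (6.3): d^2p/dx^2 - (1/c^2) d^2p/dt^2
residual = sp.diff(p, x, 2) - sp.diff(p, t, 2) / c**2
assert sp.simplify(residual) == 0   # the traveling wave is a solution

# In the frequency domain, the time curvature becomes -omega^2 * p,
# so with k = omega/c the Helmholtz relation (6.4) holds as well
k = omega / c
assert sp.simplify(sp.diff(p, x, 2) + k**2 * p) == 0
```

The same check works for any twice-differentiable function of (t − x/c), which is the essence of d'Alembert's solution.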
A general excitation q of the equation can be represented by its convolution with the Dirac delta distribution, ∫ q(s) δ(r − s) dV(s) = q(r). Consequently, as the wave equation is linear, the general solution must also equal the convolution of the Green's function with the excitation function over space, p(r) = ∫ q(s) G(r − s) dV(s); if formulated in the time domain: also over time. The integral superimposes the acoustical responses of any point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time. The Green's function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),

G = \frac{e^{-\mathrm{i}kr}}{4\pi r}, \qquad (6.5)

with the wave number k = ω/c and the distance between source and receiver r = ‖r − r_s‖₂. Acoustic source phenomena are characterized by the behavior of the Green's function: far away, the amplitude decays with 1/r, and the phase −kr = −ω r/c corresponds to the radially increasing delay r/c. Both are expressed in Sommerfeld's radiation condition lim_{r→∞} r (∂p/∂r + ik p) = 0.

Plane waves. The radius coordinate of the Green's function is the distance between two Cartesian position vectors, the source and receiver locations r_s and r. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector, r_s = r_s θ_s. This permits the far-field approximation

\lim_{r_s\to\infty} \|r_s\boldsymbol{\theta}_s - \boldsymbol{r}\| = \lim_{r_s\to\infty} \sqrt{r_s^2 - 2 r_s \boldsymbol{\theta}_s^{\mathrm{T}}\boldsymbol{r} + r^2} = \lim_{r_s\to\infty} r_s \sqrt{1 - \tfrac{2\boldsymbol{\theta}_s^{\mathrm{T}}\boldsymbol{r}}{r_s} + \tfrac{r^2}{r_s^2}} = r_s - \boldsymbol{\theta}_s^{\mathrm{T}}\boldsymbol{r} \qquad (6.6)

(with lim_{x→0} √(1 − 2x) = 1 − x). For the phase approximation, we notice, for instance at a wave length of 30 cm, that even a relatively small distance difference, e.g. between 15 m and 15 m + 15 cm, could change the sign of the wave. To approximate the phase of the Green's function, we must therefore at least use r_s − θ_sᵀr as approximation.
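The accuracy of the far-field distance approximation (6.6) is easy to inspect numerically. The sketch below is an illustration with hypothetical numbers (using numpy): it compares the exact source-receiver distance ‖r_s θ_s − r‖ to the approximation r_s − θ_sᵀr for a growing source radius:

```python
import numpy as np

theta_s = np.array([0.0, 1.0, 0.0])           # source direction (unit vector)
r = np.array([0.1, 0.05, -0.02])              # receiver position in metres

for rs in [1.0, 10.0, 100.0]:
    exact = np.linalg.norm(rs * theta_s - r)  # true source-receiver distance
    approx = rs - theta_s @ r                 # far-field approximation (6.6)
    print(rs, exact - approx)                 # error shrinks roughly with 1/rs
```

The residual error behaves like (r² − (θ_sᵀr)²)/(2 r_s), which is why it is harmless for the magnitude but, scaled by k, would not be for the phase at short wave lengths.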
By contrast, this level of precision is irrelevant for the magnitude approximation; e.g., the error would be negligible if we used 1/(15 m) instead of the magnitude 1/(15 m + 15 cm). At a large distance r_s assumed to be constant, the Green's function is proportional to a plane wave from the source direction θ_s:

\lim_{r_s\to\infty} G = \frac{e^{-\mathrm{i}k r_s}}{4\pi r_s}\; e^{\mathrm{i}k\, \boldsymbol{\theta}_s^{\mathrm{T}}\boldsymbol{r}}. \qquad (6.7)

The plane-wave part

p = e^{\mathrm{i}k\, \boldsymbol{\theta}_s^{\mathrm{T}}\boldsymbol{r}} \qquad (6.8)

is of unit magnitude |p| = 1, and its phase evaluates the projection of the position vector onto the plane-wave arrival direction θ_s. Towards the direction θ_s, the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction −θ_s, the phase grows negative, implying an increasing time delay, which is constant on any plane perpendicular to θ_s. Plane waves are an invaluable tool to locally approximate, within a small region, sound fields from sources that are sufficiently far away.²

² This is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infinite-amplitude point source infinitely far away with infinite anticipation t_s → +∞ (non-causal).

6.4 Basis Solutions in Spherical Coordinates

Figure 4.11 shows spherical coordinates [14, 15] using radius r, azimuth φ, and zenith angle ϑ. For simplification, the zenith angle is replaced by ζ = cos ϑ = z/r here. We may solve the Helmholtz equation (Δ + k²)p = 0 in spherical coordinates by the radial and directional parts of the Laplacian Δ = Δ_r + Δ_{φ,ζ}, as identified in Appendix A.3:

\Delta_r = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}, \qquad \Delta_{\varphi,\zeta} = \frac{1-\zeta^2}{r^2}\frac{\partial^2}{\partial \zeta^2} - \frac{2\zeta}{r^2}\frac{\partial}{\partial \zeta} + \frac{1}{r^2(1-\zeta^2)}\frac{\partial^2}{\partial \varphi^2}. \qquad (6.9)

We already know the spherical harmonics as directional eigensolutions from Sect. 4.7,

\Delta_{\varphi,\zeta} Y_n^m = -\frac{n(n+1)}{r^2}\, Y_n^m, \qquad (6.10)

and assume them to be a factor of the solution, p_{nm} = R\, Y_n^m, determining the value of Δ_{φ,ζ} in (Δ_r + k² + Δ_{φ,ζ}) p_{nm} = 0. After insertion, multiplication by r²/Y_n^m, and re-expressing the differentials ∂/∂r = k ∂/∂(kr) and ∂²/∂r² = k² ∂²/∂(kr)², we find a separated radial differential equation:

(kr)^2\, \frac{\partial^2 R}{\partial (kr)^2} + 2(kr)\, \frac{\partial R}{\partial (kr)} + \bigl[(kr)^2 - n(n+1)\bigr]\, R = 0. \qquad (6.11)

Appendix A.6.4 shows how to get physical solutions R of this so-called spherical Bessel differential equation: spherical Hankel functions of the second kind h_n^{(2)}(kr) are able to represent radiation (radially outgoing into every direction), consistently with the Green's function G, diverging with an (n+1)-fold pole at kr = 0, a physical behavior that would also be observed after spatially differentiating G, see Fig. 6.1; spherical Bessel functions j_n(kr) = ℜ{h_n^{(2)}(kr)} are real-valued, converge everywhere, exhibit an n-fold zero at kr = 0, and cannot represent radiation.

Fig. 6.1 Spherical Bessel functions j_n(kr) = ℜ{h_n^{(2)}(kr)} (top left), imaginary part of the spherical Hankel functions ℑ{h_n^{(2)}(kr)} (top right), and magnitude |h_n^{(2)}(kr)| in dB (bottom), over kr

Implementations typically rely on the accurate standard libraries implementing cylindrical Bessel and Hankel functions:

j_n(kr) = \sqrt{\frac{\pi}{2kr}}\, J_{n+\frac{1}{2}}(kr), \qquad h_n^{(2)}(kr) = \sqrt{\frac{\pi}{2kr}}\, H^{(2)}_{n+\frac{1}{2}}(kr). \qquad (6.12)

Wave spectra and spherical basis solutions. Any sound field evaluated at a radius r where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed fields, j_n(kr) Y_n^m(θ), and radiating fields, h_n^{(2)}(kr) Y_n^m(θ):

p = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \bigl[\, b_{nm}\, j_n(kr) + c_{nm}\, h_n^{(2)}(kr) \,\bigr]\, Y_n^m(\boldsymbol{\theta}). \qquad (6.13)

Here, b_{nm} are the coefficients of incoming waves that pass through and emanate from radii larger than r, and c_{nm} are the coefficients of outgoing waves radiating from sources at radii smaller than r; the coefficients are called wave spectra of the incoming and outgoing waves, cf. [16].

Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients b_{nm}, while c_{nm} = 0 in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonic coefficients χ_{nm} as a set of Ambisonic signals, is described by the incoming wave spectrum, see Appendix A.6.5, Eq. (A.119),

b_{nm} = 4\pi\, \mathrm{i}^n\, \chi_{nm}. \qquad (6.14)

Figure 6.2 shows a single plane wave incoming from the direction θ_s, represented by

b_{nm} = 4\pi\, \mathrm{i}^n\, Y_n^m(\boldsymbol{\theta}_s). \qquad (6.15)

Fig. 6.2 Plane wave from the y axis (φ = ϑ = π/2) in horizontal cross section; the plots show ℜ{p e^{iφ}} with p from Eq. (6.13), c_{nm} = 0, and b_{nm} = 4π iⁿ Y_n^m(π/2, π/2) from Eq. (6.15), at four time steps corresponding to 0°, 60°, 120°, and 180° phase shifts φ, for a long wave (top) and a short wave (bottom); the simulation uses N = 25 and the area shows |kx|, |ky| < 2π and 8π for the two wave lengths shown

6.5 Scattering by Rigid Higher-Order Microphone Surface

Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius r = a, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of the sound particle velocity.

Fig. 6.3 32-channel higher-order Ambisonic microphone Eigenmike EM32

Fig. 6.4 Plane waves scattered by a rigid sphere for ka = π (top) and ka = 4π (bottom); time steps correspond to 0°, 60°, 120°, and 180° phase shifts φ in the plot ℜ{p e^{iφ}}, showing p from Eq. (6.13) with b_{nm} = 4π iⁿ Y_n^m(π/2, π/2) from Eq. (6.15) and c_{nm} from Eq. (6.16); the simulation uses N = 25

The radial sound particle velocity is obtained via the equation of motion Eq. (6.2) by differentiating Eq. (6.13). This requires evaluating the differentiated spherical radial solutions j_n′(x) as well as h_n^{(2)′}(x), which is implemented by

f_n'(x) = \frac{n}{x}\, f_n(x) - f_{n+1}(x)

for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius a requires

v_r\big|_{r=a} = \frac{\mathrm{i}}{\rho c} \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \bigl[\, b_{nm}\, j_n'(ka) + c_{nm}\, h_n^{(2)\prime}(ka) \,\bigr]\, Y_n^m(\boldsymbol{\theta}) = 0,

which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround sound by velocity-canceling outgoing waves, c_{nm} = −[j_n′(ka)/h_n^{(2)′}(ka)] b_{nm}. The coefficients ψ_{nm} yield the sound pressure in Fig. 6.4,

p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \psi_{nm}\, Y_n^m(\boldsymbol{\theta}), \quad \text{with} \quad \psi_{nm} = \Bigl[\, j_n(kr) - \frac{j_n'(ka)}{h_n^{(2)\prime}(ka)}\, h_n^{(2)}(kr) \,\Bigr]\, b_{nm}. \qquad (6.16)

The two terms of the bracket are typically further simplified by finding a common denominator and recognizing the Wronskian Eq. (A.97), j_n(x) h_n^{(2)′}(x) − j_n′(x) h_n^{(2)}(x) = i/x², in the numerator:

\psi_{nm}\big|_{r=a} = \frac{\mathrm{i}}{(ka)^2\, h_n^{(2)\prime}(ka)}\, b_{nm}. \qquad (6.17)

Fig. 6.5 Attenuation in dB of Ambisonic signals of different orders, |(ka)² h_n^{(2)′}(ka)|⁻¹, for varying values of ka

Relation of recorded sound pressure to Ambisonic signal. The scattering equation relates the recorded sound pressure, expanded in spherical harmonics, to the Ambisonic signal of the surround sound scene, see the frequency responses in Fig. 6.5:

\psi_{nm}\big|_{r=a} = \frac{4\pi\, \mathrm{i}^{n+1}}{(ka)^2\, h_n^{(2)\prime}(ka)}\, \chi_{nm}. \qquad (6.18)

It is formally convenient that, as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals ψ_{nm}, the Ambisonic signals χ_{nm} of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing. Recognizable from Fig. 6.6, and following our intuition, waves of lengths larger than the diameter 2a of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function i^{n+1} [(ka)² h_n^{(2)′}(ka)]⁻¹ attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.

6.6 Higher-Order Microphone Array Encoding

The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples p(t) from the microphone array into their spherical harmonic coefficients ψ_N(t): to which amount do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the order to which the microphone arrangement allows decomposition? The frequency-independent matrix (Y_N^T)^† does the conversion. It is the left inverse of the spherical harmonics sampled at the microphone positions, as shown in the upcoming section. The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, differing only in filter characteristics for different spherical harmonic orders. Robustness to noise, microphone matching, and positioning is the key here, and it is only achieved by the careful design of these filters, as shown in further sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-r_E-weighted and √E-normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the side lobes.

Fig. 6.6 Plane-wave sound pressure image ℜ{p e^{−ika}} on a rigid sphere with varying ka, using ψ_{nm} from Eq. (6.17) expanded over the spherical harmonics, p = Σ ψ_{nm} Y_n^m, with χ_{nm} = Y_n^m(0, 0) for a plane wave from z; panels (a)–(h) show diameters of 1/32, 1/16, 1/8, 1/4, 1/2, 1, 2, and 4 wave lengths, i.e. ka = π/32, π/16, π/8, π/4, π/2, π, 2π, 4π. With the wave length λ = c/f, the value ka relates to the diameter 2a in wave lengths via ka/π = 2a/λ, to express the frequency dependency; the simulation uses N = 50; for a = 4.2 cm, the ka values correspond to f = 125, 250, 500, 1000, 2000, 4000, 8000, 16000 Hz

Fig. 6.7 Higher-order Ambisonic microphone encoding: sound pressure samples p(t) are spherical-harmonics decomposed by the matrix (Y_N^T)^†, and the resulting coefficient signals ψ_N(t) are converted to Ambisonic signals χ_N(t) by the sharpening filters ρ_n(ω)

6.7 Discrete Sound Pressure Samples in Spherical Harmonics

To determine the Ambisonic signals χ_{nm}, we obviously need to find ψ_{nm} based on all sound pressure samples p(θ_i) recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients ψ_{nm}, expanded over the spherical harmonics Y_n^m(θ_i) sampled at every microphone position. A vector and matrix notation p = [p(θ_i)]_i and Y_N^T = [y(θ_i)^T]_{i,nm} is helpful:

\begin{bmatrix} p(\boldsymbol{\theta}_1) \\ \vdots \\ p(\boldsymbol{\theta}_M) \end{bmatrix} = \begin{bmatrix} Y_0^0(\boldsymbol{\theta}_1) & \dots & Y_N^N(\boldsymbol{\theta}_1) \\ \vdots & \ddots & \vdots \\ Y_0^0(\boldsymbol{\theta}_M) & \dots & Y_N^N(\boldsymbol{\theta}_M) \end{bmatrix} \begin{bmatrix} \psi_{00} \\ \vdots \\ \psi_{NN} \end{bmatrix}, \qquad \boldsymbol{p}_N = \boldsymbol{Y}_N^{\mathrm{T}} \boldsymbol{\psi}_N. \qquad (6.19)

Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix Y_N is well-conditioned. Typically, more microphones are used than coefficients searched, M ≥ (N+1)². Inversion is a matter of mean-square error minimization: as the M dimensions may contain more degrees of freedom than (N+1)², the coefficient vector ψ_N giving the closest model p_N to the measurement p is searched:

\min_{\boldsymbol{\psi}_N} \|\boldsymbol{e}\|^2, \quad \text{with} \quad \boldsymbol{e} = \boldsymbol{p}_N - \boldsymbol{p} = \boldsymbol{Y}_N^{\mathrm{T}} \boldsymbol{\psi}_N - \boldsymbol{p}. \qquad (6.20)

The minimum mean square error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

\boldsymbol{\psi}_N = (\boldsymbol{Y}_N \boldsymbol{Y}_N^{\mathrm{T}})^{-1} \boldsymbol{Y}_N\, \boldsymbol{p} = (\boldsymbol{Y}_N^{\mathrm{T}})^{\dagger}\, \boldsymbol{p}. \qquad (6.21)

The resulting left inverse (Y_N Y_N^T)^{-1} Y_N inverts the thin matrix Y_N^T from the left. (Y_N^T)^† symbolizes the pseudo-inverse; it is a left inverse for thin matrices. If the microphones are arranged in a t-design and the order N is chosen suitably, then the transposed matrix times 4π/M is equivalent to the left inverse. A more thorough discussion of spherical point sets can be found in [17–19]. The maximum-determinant points [20] are a particular kind of critical directional sampling scheme that allows using exactly as few microphones, M = (N+1)², as spherical harmonic coefficients obtained, yielding a well-conditioned square matrix Y_N that can be inverted directly, without left/pseudo-inversion. The 25 maximum-determinant points for N = 4 are used in the simulation example below.³

Finite-order assumption and spatial aliasing. An important implication of estimating ψ_{nm} is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are attenuated well enough below suitable frequency limits, cf. Fig. 6.5.
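The decomposition (6.19)–(6.21) can be sketched in a few lines. The following toy illustration (not the book's code) uses scipy's complex-valued spherical harmonics in place of the real-valued ones used in the text, and a hypothetical near-uniform Fibonacci point set instead of a t-design or maximum-determinant layout; the least-squares structure is the same:

```python
import numpy as np
from scipy.special import sph_harm

N, M = 3, 36                                   # order, number of microphones

# Hypothetical near-uniform sampling: Fibonacci sphere points
i = np.arange(M)
zen = np.arccos(1 - 2 * (i + 0.5) / M)         # zenith angles
azi = np.pi * (1 + 5**0.5) * i                 # azimuth angles

# Y^T: rows = sampling points, columns = harmonics up to order N
YT = np.column_stack([sph_harm(m, n, azi, zen)
                      for n in range(N + 1) for m in range(-n, n + 1)])

rng = np.random.default_rng(1)
psi = rng.standard_normal((N + 1)**2)          # some coefficient signal frame
p = YT @ psi                                   # modelled pressure samples (6.19)

# MMSE solution (6.21): left (pseudo-)inverse of the thin matrix Y^T
psi_hat = np.linalg.pinv(YT) @ p
print(np.allclose(psi_hat, psi))               # coefficients are recovered
```

Because the M = 36 points exceed the (N+1)² = 16 coefficients and the sampling is close to uniform, Y^T has full column rank and the pseudo-inverse acts as the left inverse of Eq. (6.21).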
However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb T −1 filters. Figure m6.8 shows spatial aliasing of ψ N = (Y N ) p in the angular domain p = ψnm Yn . 6.8 Regularizing Filter Bank for Radial Filters −1 The filters in (ka)2 h (2) of Fig. 6.5 exhibit an nth-order zero at 0 Hz, ka = 0. n (ka) To retrieve the Ambisonic signals χnm from the sound pressure signals ψnm , their inverse would have a n-fold (unstable) pole at 0 Hz. Considering that microphone self noise and array imperfection cause erroneous signals louder than the acoustically expected nth-order vanishing signals around 0 Hz, filter shapes will moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders n must be stabilized by high-pass slopes of at least the order n, see also [6, 9, 21–25], and with (n + 1)th-order high-pass slopes, see Fig. 6.9, such errors are being cut off by first-order high-pass slopes at exemplary cut-on frequencies at 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, yielding a 3 md04.0025 on https://web.maths.unsw.edu.au/~rsw/Sphere/Images/MD/md_data.html. 6.8 Regularizing Filter Bank for Radial Filters 143 (a) 1 32 -wave-length diameter, i.e. k a = π 32 (c) 1 16 -wave-length diameter, i.e. k a = π 16 (d) 1-wave-length diameter, i.e. k a = π (e) 1 8 -wave-length diameter, i.e. k a = π 8 (f) 2-wave-length diameter, i.e. k a = 2π (g) 1 4 -wave-lengths diameter, i.e. k a = π 4 (b) 1 2 -wave-length diameter, i.e. k a = π 2 , (h) 4-wave-lengths diameter, i.e. k a = 4π Fig. 
6.8 Interpolated plane-wave sound pressure image ℜ{ p e−ika } on rigid-sphere array with 25 microphones allowing decomposition up to the order N = 4; simulation uses orders up to 25, and the aliasing-free operation can only be expected within kr < N 144 6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless) 30dB n=0 n=1 n=2 n=3 n=4 20dB 10dB -0dB n<1 -10dB 32Hz n<2 63Hz 125Hz n<3 250Hz 500Hz n<4 1kHz 2kHz n<5 4kHz 8kHz (2) Fig. 6.9 Filters (ka)2 h n (ka)/dB over frequency/Hz, regularized with (n + 1)th-order high-pass filters n<1 0dB n<2 n<3 n<4 sum H n<5 0 H1 -20dB H2 -40dB H H -60dB 32Hz 63Hz 125Hz 250Hz 500Hz 1kHz 2kHz 4kHz 8kHz 3 4 Fig. 6.10 Stabilizing filter bank/dB over frequency/Hz: signal orders n > b are excluded from the band b noise boost of 20 dB for a 4th-order microphone with a = 4.2 cm, at most. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. A zero-phase, nth-order Butterworth high-pass ωn response is defined by Hhi = 1+ω n and amplitude-complementary to the low pass 1 Hlo = 1+ωn , so that Hhi + Hlo = 1. Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: The band-pass filters Hb (ω) are composed of a (b + 1)th-order high- and (b + 2)th-order low-pass skirt at ωb , and ωb+1 , respectively, except for the band b = 0 (low-pass) and b = N (high-pass) Ĥ0 (ω) = 1+ 1 ω 2 , ω1 ω b+1 ωb b+1 1 + ωωb Ĥb (ω) = 1+ ω N+1 ωN N+1 . 1 + ωωN 1 ω b+2 ωb+1 , ĤN (ω) = (6.22) To make the bands perfectly reconstructing, filters are normalized by the sum response Hb = N Ĥb b=0 Ĥb (ω) . (6.23) 6.8 Regularizing Filter Bank for Radial Filters 145 By adjusting the cut-on frequencies ωb of the different orders b = 1, . . . 
, N, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz. This filter bank design moreover allows to adjust loudness and sidelobe suppression in every frequency band, separately. 6.9 Loudness-Normalized Sub-band Side-Lobe Suppression The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. Ideally, this variation of the order comes with the necessity of individual max-r E sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so also diffuse-field equalization of the E measure is desirable in every band. To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26, 27], that are modified by a filter bank employing diffuse-field normalized max-r E -weights in separate frequency bands b = 0, . . . , N, cf. Fig. 6.11, with the nth order discarded for bands below b < n: N ika an,b Hb (ω) i−n−1 (ka)2 h (2) n (ka) e . ρn (ω) = (6.24) b=n √ Here, eika removes the linear phase of h (2) n , and an,b is the set of diffuse-field ( E) equalized max-r E weights for the band b in which the Ambisonic orders retrieved are 0 ≤ n ≤ b an,b = ⎧ ⎪ ⎨ ⎪ ⎩ 137.9◦ Pn cos b+1.51 0, N 2 2 , for n ≤ b ◦ n=0 (2n+1) Pn cos 137.9◦ N+1.51 n=0 (2n+1) Pn cos 137.9 b+1.51 b (6.25) otherwise. Figure 6.12 shows the polar patterns of the corresponding direction-spread functions. For the implementation of ρn (ω) by fast block filtering, ω = 2π f and k = ω/c are uniformly sampled with frequency, and the inverse discrete Fourier transform yields the associated impulse responses (attention: the value at 0 Hz must be replaced for stable results, and cyclic time-domain shifts and windows are necessary). The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 
6.13, and it has minimal side lobes.

Fig. 6.11 Filter-bank-regularized responses in dB over frequency in Hz: diffuse-field equalized, max-$r_\mathrm{E}$-weighted spherical microphone array responses using $\rho_n(\omega) = \sum_{b=n}^{\mathrm{N}} a_{n,b}\, H_b(\omega)\, (ka)^2 h_n^{(2)}(ka)$

Fig. 6.12 Diffuse-field equalized (to $E = 1$) max-$r_\mathrm{E}$ direction-spread functions for $b = 0, \dots, 4$; even orders are plotted on the upper, odd orders on the lower semi-circle

Fig. 6.13 Direction spread in dB over frequency in Hz in a zenithal cross section in degrees through the Ambisonic signal of simulated microphone processing in response to a plane wave from the zenith, with the parameters $a = 4.2$ cm, M = 25 mics, max-$r_\mathrm{E}$-weighted in bands with the cut-on frequencies 90, 680, 1650, 2600 Hz for the orders 1, 2, 3, 4. The simulation is done with the order $\mathrm{N}_\mathrm{sim} = 30$, and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be accurate to within $\pm 0.5$ dB; the map shows the direction spread normalized to its value at $0^\circ$ for every frequency to make its shape easier to read

6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression

Typical gain matching between the microphones is not always more accurate than 0.5 dB. The result is that the physically dominant omnidirectional signal leaks into the higher-order signals by directionally random gain variations. Acoustically, however, higher-order components are expected to be weak and to require amplification.

(a) 50, 160, 500, 1600 Hz cut-on with $<\pm 0.5$ dB gain matching, no sidelobe suppression; (b) 50, 160, 500, 1600 Hz cut-on with only 4th-order sidelobe suppression, assuming perfect gain match; (c) 50, 160, 500, 1600 Hz cut-on with individual max-$r_\mathrm{E}$ sidelobe suppression per band, assuming perfect gain match. Fig.
6.14 Influence of carelessly selected cut-on frequencies for regularization (top), and of non-individual sidelobe suppression per band (middle), in contrast to ideal results (bottom); the maps show direction spreads normalized to their values at $0^\circ$ for every frequency to make side lobes easier to read

The effect on mapping is equivalent to that of microphone self-noise; however, gain mismatch yields a correlated signal exciting the microphones, whereas self-noise yields low-frequency noise. If the regularization filters were set to 50, 160, 500, 1600 Hz and sidelobe suppression turned off for testing, one would get the poor image of Fig. 6.14a, where high-order signals at low frequencies are strongly boosted. If a noise-free case is assumed and only the max-$r_\mathrm{E}$ sidelobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-$r_\mathrm{E}$ weights in Fig. 6.14c.

Self-noise behavior. Assuming that the self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the M microphone signals $p_i = \mathcal{N}$ into the $(\mathrm{N}+1)^2$ spherical harmonic coefficient signals $\psi_{nm} = \frac{(\mathrm{N}+1)^2}{M}\,\mathcal{N}$, if $M \approx (\mathrm{N}+1)^2$ and the microphone arrangement permits a well-conditioned pseudo-inversion $\boldsymbol{Y}_\mathrm{N}^\dagger$. The spectral change of the microphone self-noise due to the radial filters $\rho_n(\omega)$ can be described by the noise of the $(2n+1)$ Fig.
6.15 Self-noise modification $|G(\omega)|^2$ in dB over frequency in Hz for the filter bank configurations using the cut-on frequencies 2 k, 3 k, 4 k, 5 k Hz (no noise amplification), 600, 2 k, 3.5 k, 4.2 k Hz (5 dB noise amplification), 280, 1.3 k, 2.6 k, 3.6 k Hz (10 dB noise amplification), 150, 950, 2 k, 3.15 k Hz (15 dB noise amplification), and 90, 680, 1.65 k, 2.6 k Hz (20 dB noise amplification)

signals of the same order, amplified by $|\rho_n(\omega)|^2$, in comparison to the zeroth-order signal:

$$|G(\omega)|^2 = \frac{\sum_{n=0}^{\mathrm{N}} (2n+1)\,|\rho_n(\omega)|^2}{|(ka)^2 h_0^{(2)}(ka)|^2}. \quad (6.26)$$

Figure 6.15 analyzes the noise amplification for the simulation example (max-$r_\mathrm{E}$ weighting in each sub-band, $a = 4.2$ cm) and shows the dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boosts. The trade-off here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16. Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They are measured on a $12^\circ \times 11.25^\circ$ azimuth $\times$ zenith grid, yielding $480 \times 256$-pt impulse responses for each of the 32 transducers.

6.11 Practical Free-Software Examples

6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM Plug-In Suites

We give a practical signal processing example for the Eigenmike em32 that is applicable, e.g., in digital audio workstations. First, the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order Ambisonic signals. The preset (json file) is provided online at http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx-convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders $n = 0, \dots, 4$ as defined above.
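The regularizing filter bank of Eqs. (6.22)–(6.23), the diffuse-field equalized max-$r_\mathrm{E}$ weights of Eq. (6.25), and the noise measure of Eq. (6.26) can be prototyped compactly. The following Python sketch is an illustration under stated assumptions, not the book's implementation: it takes the radial-filter magnitude implied by Eq. (6.24) as $|\rho_n(\omega)| = |(ka)^2 h_n^{(2)}(ka)|\sum_b a_{n,b} H_b(\omega)$ (the phase factors have unit modulus), computes the spherical Hankel function by upward recurrence, and uses the text's example radius $a = 4.2$ cm.

```python
import numpy as np

def legendre(n, x):
    # Legendre polynomial P_n(x) via the three-term recurrence
    p0, p1 = 1.0, x
    if n == 0:
        return 1.0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def sph_hankel2(n, x):
    # spherical Hankel function h_n^(2)(x) by upward recurrence
    h0 = 1j * np.exp(-1j * x) / x
    h1 = -np.exp(-1j * x) * (x - 1j) / x**2
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, (2 * k + 1) / x * h1 - h0
    return h1

def filter_bank(f, cuton):
    # zero-phase band filters of Eqs. (6.22)/(6.23); cuton = (w_1, ..., w_N)
    N = len(cuton)
    H = np.zeros((N + 1, len(f)))
    H[0] = 1 / (1 + (f / cuton[0])**2)                        # band 0: low-pass
    for b in range(1, N):
        hi = (f / cuton[b-1])**(b+1) / (1 + (f / cuton[b-1])**(b+1))
        H[b] = hi / (1 + (f / cuton[b])**(b+2))
    H[N] = (f / cuton[N-1])**(N+1) / (1 + (f / cuton[N-1])**(N+1))
    return H / H.sum(axis=0)                                  # Eq. (6.23)

def max_re_weights(N):
    # diffuse-field (sqrt(E)) equalized sub-band max-rE weights, Eq. (6.25)
    a = np.zeros((N + 1, N + 1))                              # a[n, b]
    def pe(order):
        c = np.cos(np.deg2rad(137.9 / (order + 1.51)))
        p = np.array([legendre(n, c) for n in range(order + 1)])
        return p, ((2 * np.arange(order + 1) + 1) * p**2).sum()
    _, eN = pe(N)
    for b in range(N + 1):
        p, eb = pe(b)
        a[:b+1, b] = p * np.sqrt(eN / eb)
    return a

def noise_gain_db(f, cuton, a_radius=0.042, c=343.0):
    # self-noise modification |G(w)|^2 of Eq. (6.26), in dB
    N = len(cuton)
    ka = 2 * np.pi * f / c * a_radius
    H, a = filter_bank(f, cuton), max_re_weights(N)
    ref = np.abs(ka**2 * sph_hankel2(0, ka))**2
    g = np.zeros(len(f))
    for n in range(N + 1):
        rho = np.abs(ka**2 * sph_hankel2(n, ka)) * (a[n] @ H)  # |rho_n(w)|
        g += (2 * n + 1) * rho**2
    return 10 * np.log10(g / ref)

f = np.geomspace(30, 8000, 200)
gdb = noise_gain_db(f, [90, 680, 1650, 2600])   # the "20 dB boost" setting
```

As in Fig. 6.15, lowering the cut-on frequencies admits more of the low-frequency boost of the higher orders and therefore raises the maximum of $|G(\omega)|^2$.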
The convolver presets (wav files and config files for mcfx-convolver) are provided online at http://phaidra.kug.ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB.

Fig. 6.16 Direction spread in dB over frequency in Hz and zenith in degrees of the filter bank with different settings to achieve 0, 5, 10, 15, 20 dB noise boosts, using the cut-on frequencies (a) 2 k, 3 k, 4 k, 5 k Hz; (b) 600, 2 k, 3.5 k, 4.2 k Hz; (c) 280, 1.3 k, 2.6 k, 3.6 k Hz; (d) 150, 950, 2 k, 3.15 k Hz; (e) 90, 680, 1.65 k, 2.6 k Hz, each with individual max-$r_\mathrm{E}$ weights per band; the maps show direction spreads normalized to their values at $0^\circ$ at every frequency, as above

Fig. 6.17 IEM MatrixMultiplier encoding the Eigenmike em32 signals (a), and mcfx-convolver applying radial filters to the encoded em32 recording (b)

Fig. 6.18 Practical equalization of the em32 transducer characteristics by two parametric shelving filters of the mcfx_filter, cf. [28]

As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. In practice, this behavior is sufficiently well equalized using two parametric shelving filters: a low shelf at 500 Hz with a gain of −5 dB, and a high shelf at 5 kHz with a gain of +5 dB, see Fig. 6.18.

6.11.2 SPARTA Array2SH

The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals as well as calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays Fig.
6.19 SPARTA Array2SH encoding for, e.g., the em32

and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows the radial filters to be adjusted in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.

References

1. J. Daniel, Evolving views on HOA: from technological to pragmatic concerns, in Proceedings of the 1st Ambisonics Symposium (Graz, 2009)
2. G.W. Elko, R.A. Kubli, J. Meyer, Audio system based on at least second-order eigenbeams, PCT Patent WO 03/061336 A1 (2003)
3. G.W. Elko, Superdirectional microphone arrays, in Acoustic Signal Processing for Telecommunication, ed. by J. Benesty, S.L. Gay (Kluwer Academic Publishers, Dordrecht, 2000)
4. J. Meyer, G. Elko, A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'02), vol. 2 (Orlando, 2002)
5. G.W. Elko, Differential microphone arrays, in Audio Signal Processing for Next-Generation Multimedia Communication Systems, ed. by Y. Huang, J. Benesty (Springer, Berlin, 2004)
6. J. Daniel, S. Moreau, Further study of sound field coding with higher order ambisonics, in 116th AES Convention (2004)
7. S.-O. Petersen, Localization of sound sources using 3d microphone array, M. Thesis, University of Southern Denmark, Odense (2004). www.oscarpetersen.dk/speciale/Thesis.pdf
8. B. Rafaely, Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio Process. (2005)
9. S. Moreau, Étude et réalisation d'outils avancés d'encodage spatial pour la technique de spatialisation sonore Higher Order Ambisonics: microphone 3d et contrôle de distance, Ph.D. Thesis, Université du Maine (2006)
10. H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition (Springer, Berlin, 2007)
11. Z. Li, R.
Duraiswami, Flexible and optimal design of spherical microphone arrays for beamforming. IEEE Trans. ASLP 15(2) (2007)
12. W. Song, W. Ellermeier, J. Hald, Using beamforming and binaural synthesis for the psychoacoustical evaluation of target sources in noise. J. Acoust. Soc. Am. 123(2) (2008)
13. B. Rafaely, Fundamentals of Spherical Array Processing, 2nd edn. (Springer, Berlin, 2019)
14. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)
15. ISO 80000-2, Quantities and units, Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
16. E.G. Williams, Fourier Acoustics (Academic Press, Cambridge, 1999)
17. B. Rafaely, B. Weiss, E. Bachmat, Spatial aliasing in spherical microphone arrays. IEEE Trans. Signal Process. 55(3) (2007)
18. F. Zotter, Sampling strategies for acoustic holography/holophony on the sphere, in NAG-DAGA (Rotterdam, 2009)
19. P. Lecomte, P.-A. Gauthier, C. Langrenne, A. Berry, A. Garcia, A fifty-node Lebedev grid and its applications to ambisonics. J. Audio Eng. Soc. 64(11) (2016)
20. I.H. Sloan, R.S. Womersley, Extremal systems of points and numerical integration on the sphere. Adv. Comput. Math. 21, 107–125 (2004)
21. B. Bernschütz, C. Pörschmann, S. Spors, Soft-limiting bei modaler Amplitudenverstärkung bei sphärischen Mikrofonarrays im Plane-Wave-Decomposition-Verfahren, in Fortschritte der Akustik - DAGA (2011)
22. T. Rettberg, S. Spors, On the impact of noise introduced by spherical beamforming techniques on data-based binaural synthesis, in Fortschritte der Akustik - DAGA (2013)
23. T. Rettberg, S. Spors, Time-domain behaviour of spherical microphone arrays at high orders, in Fortschritte der Akustik - DAGA (2014)
24. B. Rafaely, Fundamentals of Spherical Array Processing, 1st edn. (Springer, Berlin, 2015)
25. D.L. Alon, B.
Rafaely, Spatial decomposition by spherical array processing, in Parametric Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
26. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic recording, in Fortschritte der Akustik - DAGA (Nürnberg, 2015)
27. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: The icosahedral loudspeaker. Comput. Music J. 41(3) (2017)
28. F. Zotter, M. Frank, C. Haar, Spherical microphone array equalization for ambisonics, in Fortschritte der Akustik - DAGA (Nürnberg, 2015)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Chapter 7 Compact Spherical Loudspeaker Arrays

Le haut-parleur anonymise la source réelle. [The loudspeaker anonymizes the real source.] Pierre Boulez [1], ICA, 1983.

…adjustable radiation naturalize[s] alien sounds by embedding them in the natural spatiality of the room. OSIL project [2], InSonic, 2015.

Abstract This chapter introduces auditory objects that can be created by adjustable-directivity sources in rooms.
After showing basic positioning properties in distance and direction, we describe physical first- and higher-order spherical loudspeaker arrays and their control, such as the loudspeaker cubes or the icosahedral loudspeaker (IKO). Not only static auditory objects but also those traversing space by time-varying beamforming are considered here. Signal dependency and different practical setups are discussed and briefly analyzed. This young Ambisonic technology brings new means of expression to sound reinforcement and electroacoustic or computer music.

While surrounding Ambisonic loudspeaker arrays play sound from outside the listening area into the audience, compact spherical loudspeaker arrays play sound into the room from a single position. Directivity adjustable in orientation and shape can be used to steer sound beams in order to excite wall reflections in the given acoustic environment. The directional shapes and orientations of such beams are all controlled by—guess what—Ambisonic signals. Despite the huge practical difference, the two applications do not only share the spherical harmonics that lend their shapes to Ambisonic signals: the control of radiating sound beams employs nearly the same model- or measurement-based radial steering filters as those of compact higher-order Ambisonic microphones.

The works of Warusfel [3], Kassakian [4], Avizienis [5], Zotter [6, 7], Pomberger [8], Pollow [9], and Mattioli Pasqual [10] established the electroacoustic background technology required to describe compact spherical loudspeaker arrays built with electrodynamic transducers. The early works on auditory objects were written by Schmeder [11] and by Sharma, Frank, and Zotter [2, 12, 13], and contemporary results were found in the project "Orchestrating the Space by Icosahedral Loudspeaker" (OSIL) between 2015 and 2018 [14–19]. © The Author(s) 2019 F. Zotter and M.
Frank, Ambisonics, Springer Topics in Signal Processing 19, https://doi.org/10.1007/978-3-030-17207-7_7

7 Compact Spherical Loudspeaker Arrays

7.1 Auditory Events of Ambisonically Controlled Directivity

7.1.1 Perceived Distance

Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker array from omnidirectional to second order can create auditory events that are perceptually closer than the physical distance to the loudspeaker array. The experimental results can be explained by the increase of the direct-to-reverberant energy ratio, as the sound beam of the directional source does not excite room reflections as much. Wendt extended Laitinen's work by experiments employing a simulation of a third-order directional source in a virtual room (third-order image source model) played back by a loudspeaker ring in an anechoic room [16]. He could show that the perceived distance between the listener and the higher-order directional source can be controlled not only by the order of the directivity pattern but also by the orientation of the source (towards the listener, away from the listener). Beams projecting sounds away from the listener were perceived behind the source, cf. Fig. 7.1. Again, the perceptual results could be modeled by simple measures known from room acoustics.

7.1.2 Perceived Direction

Using a similar room simulation, the study in [21] asked participants to indicate the perceived direction of an auditory event created by a third-order directional source. The results showed that for different source orientations, listeners perceived auditory objects at directions that often did not coincide with the sound source, but with the delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound and the three first reflections after 6, 8, and 9 ms. For some orientations, even

Fig.
7.1 Perceived distance (medians and 95% confidence intervals) depending on the order and direction of a static source emitting sound with max-$r_\mathrm{E}$ directivity ($180^\circ$ towards the listener, $0^\circ$ away from the listener) in a simulated environment; the relative auditory distance is shown over the beam order/direction from $3/180^\circ$ to $3/0^\circ$

Fig. 7.2 IKO perceived directions (black circles, radii indicate the relative amount of answers) and modeling (gray crosses), 3rd-order max-$r_\mathrm{E}$ beam, 2nd-order image source model; gray shading in the background indicates the level of each path, with path delays of 6, 8, 9, 12, 14, 17, and 25 ms

the second-order reflections at 12 and 14 ms dominated localization. However, the influence of later reflections is reduced by the precedence effect. The perceived directions can be modeled by the extended energy vector originally developed for off-center listening positions in surrounding loudspeaker arrangements, as also shown in [17]. Experiments in [22] showed that panning between a reflection and the direct sound creates auditory objects in between. When applying the appropriate delay and gain to the direct sound to compensate for the longer path of the reflection, the localization curves are similar to those of standard stereo using a pair of loudspeakers.

7.2 First-Order Compact Loudspeaker Arrays and Cubes

The simplest way of creating a loudspeaker array with adjustable directivity in a practical sense is a cube with loudspeakers on its plane surfaces, as suggested by Misdariis [23]. Restricting the directivity control to two dimensions reduces the number of loudspeaker drivers to four and facilitates equipping the array with a carrying handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.

Directivity control.
First-order Ambisonics utilizes monopole and dipole modes, which directly translate to the corresponding far-field radiation patterns. Due to the cubic shape, these modes can easily be created by either playing all four drivers in phase or playing opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency responses of such monopole and dipole modes need to be equalized to enable their phase- and magnitude-aligned superposition in the far field. Filters and measurement data of the cube loudspeakers built at IEM [24] are freely available at http://phaidra.kug.ac.at/o:67631.

Fig. 7.3 Design of a loudspeaker cube: prototype (a), and vertical (b) and horizontal (c) cross section plots

Fig. 7.4 System controlling the monopole and dipole modes of the loudspeaker cubes, to accomplish first-order beamforming with the shape parameter $\alpha$ and beam direction $\varphi_0$

To overcome the compressive effort of interior volume changes at low frequencies, the filter $H_\mathrm{bctl}$ in Fig. 7.4 first equalizes the smaller velocity of the loudspeaker cones when driven omnidirectionally to the velocity when driven as dipoles, and second, it attenuates the monopole pattern slightly to account for its more efficient radiation at low frequencies. The filter $H_\mathrm{EQ}$ is a general equalizer required to obtain a flat frequency response, $0 \le \alpha \le 1$ is a first-order omni-to-dipole beam-shape parameter, and $\varphi_0$ is the beam direction. The filter $H_\mathrm{bctl}$ can be specified as a 5th-order IIR filter purely based on geometric and electroacoustic parameters [19].

Direct and indirect sound with two cubes. The study in [19] examined the width of the listening area for the creation of a central auditory object between a pair of loudspeaker cubes, cf. Fig. 7.5.
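The monopole/dipole mixing behind Fig. 7.4 can be sketched as static driver gains. This is a minimal, illustrative sketch assuming four drivers at $0^\circ, 90^\circ, 180^\circ, 270^\circ$ and the first-order pattern $g(\varphi) = \alpha + (1-\alpha)\cos(\varphi - \varphi_0)$; the mode scaling chosen here and the omission of the equalization filters $H_\mathrm{bctl}$ and $H_\mathrm{EQ}$ are simplifying assumptions, not the measured design of [24].

```python
import numpy as np

def cube_driver_gains(alpha, phi0):
    """First-order beam g(phi) = alpha + (1 - alpha) * cos(phi - phi0),
    composed of a monopole mode (all four drivers in phase) and two dipole
    modes (opposing drivers out of phase). The frequency-dependent
    equalization (H_bctl, H_EQ of Fig. 7.4) is deliberately left out."""
    phi_l = np.deg2rad([0.0, 90.0, 180.0, 270.0])      # driver azimuths
    mono = alpha / 4 * np.ones(4)                      # omni share
    dipole = (1 - alpha) / 2 * np.cos(phi_l - phi0)    # two dipole pairs
    return mono + dipole

gains = cube_driver_gains(alpha=0.5, phi0=0.0)         # cardioid-like beam at 0 deg
```

Note that the dipole shares of opposing drivers are exactly opposite in sign, reproducing the out-of-phase driving described above.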
Steering the two beams directly at the listener yielded a narrow listening area that increases with the distance to the loudspeakers, similar to what is known from typical stereo setups, cf. Fig. 2.9. A much wider listening area is achieved by steering the beams at the front wall to excite reflections. To this end, max-$r_\mathrm{E}$ (super-cardioid) beams were chosen and oriented in a way that ideally suppresses the direct sound from the loudspeaker cubes at the listening position. The proposed setup of two loudspeaker cubes can be used to play back stable L, C, R channels of a surround production without the need for an actual center loudspeaker.

Fig. 7.5 Width of the listening area for a central auditory object at two distances from a pair of loudspeaker cubes with different orientations of max-$r_\mathrm{E}$/super-cardioid beams: (a) beam pair aiming directly at the central listening position at 2.5 m distance, (b) beam pair exciting front-wall reflections

Surround with depth. Together with the distance control described by Laitinen [20], the stable in-between auditory image has been used in [19] to establish a surround-with-depth system consisting of a quadraphonic setup of four loudspeaker cubes. As a first layer, it uses the direct sounds from the 4 loudspeakers at $\pm 45^\circ$ and $\pm 135^\circ$ together with the 4 in-between images at $0^\circ$, $\pm 90^\circ$, and $180^\circ$ to obtain 8 directions for third-order Ambisonic surround panning. As a second layer for depth, it uses 4 cardioid beams pointing into the 4 room corners to provide the impression of distant sounds. Blending between those two layers is used to control the distance impression of surround sounds.
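The blending between the two layers can be illustrated with a simple crossfade. The equal-power law and the function below are illustrative assumptions for clarity, not the actual implementation of [19]:

```python
import numpy as np

def surround_with_depth(direct_layer, depth_layer, d):
    """Crossfade between the direct surround layer (8 panning directions) and
    the depth layer (4 corner-aimed cardioid beams); d in [0, 1] shifts the
    distance impression from near (d = 0) to distant (d = 1).
    Equal-power blending is an assumed, illustrative choice."""
    g_near, g_far = np.cos(d * np.pi / 2), np.sin(d * np.pi / 2)
    return g_near * np.asarray(direct_layer), g_far * np.asarray(depth_layer)
```

At the extremes, only one layer remains active, while intermediate values of d mix both with constant total energy.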
7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO

With transducers mounted on spheres or polyhedra, higher-order radiators can be built. Typically, these are Platonic solids such as dodecahedra or icosahedra, as they can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loudspeakers are also mounted onto a common interior volume. Hereby, the higher-order modes can be controlled at a reduced impedance of the inner stiffness; however, this also causes acoustic coupling of the transducer motions. Typically, multiple-input-multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling and to control the velocity of the transducer cones. If this is accomplished, the acoustic radiation can be modeled and equalized by the spherical cap model, cf. [6, 15, 25, 26].

Fig. 7.6 Powerful icosahedral loudspeaker array (IKO by IEM and Sonible) and reflecting baffles in the Ligeti concert hall, in preparation of an electroacoustic music concert

Cap model. Higher-order loudspeaker arrays on a compact spherical housing are modeled by the spherical cap model. For the exterior air, it assumes that the radial surface velocity is a boundary condition consisting of separate spherical cap shapes of the aperture angle $\alpha$ centered around the directions $\{\boldsymbol{\theta}_l\}$, each unity in value. These idealized transducer shapes driven by the transducer velocities $v_l$ compose the surface velocity

$$v(\boldsymbol{\theta}) = \sum_{l=1}^{L} u\!\left(\boldsymbol{\theta}_l^\mathrm{T}\boldsymbol{\theta} - \cos\tfrac{\alpha}{2}\right) v_l. \quad (7.1)$$

Here, $u(\zeta)$ denotes the unit step function that is unity for $\zeta \ge 0$ and zero otherwise. The surface velocity distribution can be decomposed into spherical harmonics as

$$v(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}) \sum_{l=1}^{L} w_{nm}^{(l)}\, v_l. \quad (7.2)$$

The coefficients $w_{nm}^{(l)}$ of the $l$th cap are defined by the spherical convolution Eq.
(A.56) of a Dirac delta $\delta(\boldsymbol{\theta}_l^\mathrm{T}\boldsymbol{\theta} - 1)$ pointing to the cap center with a zenithal cap $u(\cos\vartheta - \cos\frac{\alpha}{2})$:

$$w_{nm}^{(l)} = w_n\, Y_n^m(\boldsymbol{\theta}_l), \quad (7.3)$$

where $Y_n^m(\boldsymbol{\theta}_l)$ are the coefficients expressing the Dirac delta, extended to a cap by weighting with $w_n$. The term $w_n = 2\pi \int_{\cos\frac{\alpha}{2}}^{1} P_n(\zeta)\,\mathrm{d}\zeta$ is derived in Eq. (A.60):

$$w_n = \begin{cases} 2\pi\, \dfrac{P_{n-1}\!\left(\cos\frac{\alpha}{2}\right) - \cos\frac{\alpha}{2}\, P_n\!\left(\cos\frac{\alpha}{2}\right)}{n+1}, & \text{for } n > 0,\\ 2\pi\left(1 - \cos\frac{\alpha}{2}\right), & \text{for } n = 0. \end{cases} \quad (7.4)$$

Decoder. Without radiation control yet, any low-order target spherical harmonic $n \le \mathrm{N}$ can be synthesized as a velocity pattern $\phi_{nm}$ by superimposing the spherical cap coefficients $w_{nm}^{(l)}$ with suitable transducer velocities $v_l$, i.e. $\phi_{nm} = \sum_l w_n Y_n^m(\boldsymbol{\theta}_l)\, v_l$. We write a matrix/vector notation with the matrix $\boldsymbol{Y} = [\boldsymbol{y}(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}(\boldsymbol{\theta}_L)]$ containing the spherical harmonics $\boldsymbol{y}(\boldsymbol{\theta}) = [Y_n^m(\boldsymbol{\theta})]_{nm}$ sampled at the transducer positions $\{\boldsymbol{\theta}_l\}$ to represent Dirac deltas pointing there, and $\boldsymbol{w} = [w_n]_{nm}$ to represent the cap shape,

$$\boldsymbol{\phi} = \mathrm{diag}\{\boldsymbol{w}\}\, \boldsymbol{Y}\, \boldsymbol{v}. \quad (7.5)$$

As long as the order $\mathrm{N}$ up to which the coefficients are controlled is low enough, $L \ge (\mathrm{N}+1)^2$, and the transducers are well-distributed, perfect control is feasible. The corresponding velocities are found by solving a least-squares problem, see Appendix A.4, Eq. (A.63), yielding the right inverse of the $\mathrm{N}$th-order cap-coefficient matrix,

$$\boldsymbol{v} = \boldsymbol{Y}_\mathrm{N}^\mathrm{T} (\boldsymbol{Y}_\mathrm{N} \boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1}\, \mathrm{diag}\{\boldsymbol{w}_\mathrm{N}\}^{-1} \boldsymbol{\phi}_\mathrm{N} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{w}_\mathrm{N}\}^{-1} \boldsymbol{\phi}_\mathrm{N}. \quad (7.6)$$

The right inverse $\boldsymbol{D} = \boldsymbol{Y}_\mathrm{N}^\mathrm{T} (\boldsymbol{Y}_\mathrm{N} \boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1}$ is a mode-matching decoder, cf. Eq. (4.40).

Exterior problem. The radiated sound pressure is described by the exterior problem, denoted by the coefficients $c_{nm}$ in Eq. (6.13) and the spherical Hankel functions $h_n^{(2)}(kr)$. To relate it to the surface velocity at the array radius $r = a$, we derive the exterior solution with regard to the radius, $\frac{\partial p}{\partial r} = k \frac{\partial p}{\partial kr} = -\mathrm{i}kc\,\rho\, v$, cf. Eq. (6.2):

$$v(\boldsymbol{\theta}) = \frac{\mathrm{i}}{\rho c} \sum_{n=0}^{\infty}\sum_{m=-n}^{n} h_n^{(2)}(ka)\, Y_n^m(\boldsymbol{\theta})\, c_{nm}. \quad (7.7)$$

Comparing Eq. (7.2) to Eq.
(7.7) yields $c_{nm} = \rho c\,[\mathrm{i}\, h_n^{(2)}(ka)]^{-1} \sum_{l=1}^{L} w_n\, Y_n^m(\boldsymbol{\theta}_l)\, v_l$, the coefficients to calculate the radiated pressure. Far away, we replace the spherical Hankel function, which approaches $h_n^{(2)}(kr) \to \mathrm{i}^{n+1}\, \frac{e^{-\mathrm{i}kr}}{kr}$, by the term $\mathrm{i}^{n+1} k^{-1}$ in Eq. (6.13), so that the radiated far-field sound pressure $p \propto \sum \mathrm{i}^{n+1} k^{-1}\, Y_n^m\, c_{nm}$ becomes

$$p(\boldsymbol{\theta}) \propto \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \frac{\mathrm{i}^n\, w_n}{k\, h_n^{(2)}(ka)}\, Y_n^m(\boldsymbol{\theta}) \sum_{l=1}^{L} Y_n^m(\boldsymbol{\theta}_l)\, v_l. \quad (7.8)$$

7.3.1 Directivity Control

The spherical harmonic coefficients of the far-field sound pressure pattern in Eq. (7.8) are controlled by the cap velocities $v_l$,

$$\psi_{nm} = \frac{\mathrm{i}^n\, w_n}{k\, h_n^{(2)}(ka)} \sum_{l=1}^{L} Y_n^m(\boldsymbol{\theta}_l)\, v_l, \quad (7.9)$$

and we desire to form the directional sound beam they represent according to a max-$r_\mathrm{E}$ pattern $a_n Y_n^m(\boldsymbol{\theta}_0)$, yielding radiation focused towards $\boldsymbol{\theta}_0$:

$$\psi_{nm} = a_n\, Y_n^m(\boldsymbol{\theta}_0). \quad (7.10)$$

To find suitable cap velocities $v_l$, we equate the models Eqs. (7.9) and (7.10). In matrix/vector notation, the equation is

$$\mathrm{diag}\{[\mathrm{i}^n w_n k^{-1} / h_n^{(2)}(ka)]_{nm}\}\, \boldsymbol{Y}\, \boldsymbol{v} = \mathrm{diag}\{[a_n]_{nm}\}\, \boldsymbol{y}(\boldsymbol{\theta}_0). \quad (7.11)$$

The diagonal matrix on the left is easy to invert, and for patterns up to the order $n \le \mathrm{N}$, the mode-matching decoder $\boldsymbol{D}$ of Eq. (7.6) already gives us a way to define velocities inverting the matrix $\boldsymbol{Y}_\mathrm{N}$ from the right. The preliminary solution becomes

$$\Rightarrow \quad \boldsymbol{v} = \boldsymbol{D}\, \mathrm{diag}\{[\mathrm{i}^{-n}\, w_n^{-1}\, k\, h_n^{(2)}(ka)\, a_n]_{nm}\}\, \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_0). \quad (7.12)$$

On-axis equalized, sidelobe-suppressing directivity control limiting the excursion. The inverse cap-shape coefficient $w_n^{-1}$ and the max-$r_\mathrm{E}$ weight $a_n$ can be regarded as part of the radiation control filters $\mathrm{i}^{-n}\, k\, h_n^{(2)}(ka)$. The expression $\mathrm{i}^{-n-1}\, (ka)^2 h_n^{(2)}(ka)$ of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor $k$. Practical implementation of radiation control filters and their regularization is therefore quite similar to the radial filters of spherical microphone arrays.
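The cap-shape coefficients $w_n$ of Eq. (7.4), which appear throughout the expressions above, are easy to evaluate and to check against their defining integral $w_n = 2\pi\int_{\cos\alpha/2}^{1}P_n(\zeta)\,\mathrm{d}\zeta$. A minimal numpy sketch (the aperture angle of 1 rad in the check is an arbitrary example value):

```python
import numpy as np

def legendre(n, x):
    # Legendre polynomial P_n(x) via the three-term recurrence
    p0, p1 = np.ones_like(x), x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def cap_weights(N, alpha):
    """Cap-shape coefficients w_n of Eq. (7.4) for the aperture angle alpha
    in radians: w_0 = 2*pi*(1 - cos(alpha/2)), and for n > 0
    w_n = 2*pi*(P_{n-1}(c) - c*P_n(c)) / (n + 1) with c = cos(alpha/2)."""
    c = np.cos(alpha / 2)
    w = [2 * np.pi * (1 - c)]
    for n in range(1, N + 1):
        w.append(2 * np.pi * (legendre(n - 1, c) - c * legendre(n, c)) / (n + 1))
    return np.array(w)

# numerical check of w_2 against 2*pi * integral of P_2 from cos(alpha/2) to 1
zeta = np.linspace(np.cos(0.5), 1.0, 20001)          # alpha = 1 rad
w2_num = 2 * np.pi * np.trapz(legendre(2, zeta), zeta)
```

With $\mathrm{diag}\{\boldsymbol{w}\}$ at hand, the mode-matching decoder of Eq. (7.6) reduces to a right inverse of the sampled spherical-harmonics matrix, e.g. via `np.linalg.pinv`.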
There are three main differences, as explained in [15]:

• With loudspeaker arrays, it is rather the excursion that is limited. This primarily entails a different strategy of adjusting the filter-bank cut-on frequencies, which due to the array size lie at lower frequencies, where group-delay distortions are less disturbing and linear-phase implementations would cause avoidably long delays.

• Moreover, instead of cut-on filter slopes of $(n+1)$th order required for noise removal in signals obtained from spherical microphone arrays, limited excursion requires cut-on slopes of at least $(n+3)$th order, i.e. 4th order to cut on the 1st-order Ambisonic signals. Thereof, one additional order is caused by the qualitative difference of $k^{-1}$ in the radial filters, and another order by the conversion of velocity to excursion by a factor $(\mathrm{i}\omega)^{-1}$.

• Finally, instead of the diffuse-field equalization that is useful for surround playback of spherical microphone array signals, it is more useful to equalize spherical sound beams on-axis (free field). On-axis equalization yields a different scaling of the sub-band max-$r_\mathrm{E}$ weights:

$$a_{n,b} = \begin{cases} P_n\!\left(\cos\frac{137.9^\circ}{b+1.51}\right) \dfrac{\sum_{n=0}^{\mathrm{N}} (2n+1)\, P_n\!\left(\cos\frac{137.9^\circ}{\mathrm{N}+1.51}\right)}{\sum_{n=0}^{b} (2n+1)\, P_n\!\left(\cos\frac{137.9^\circ}{b+1.51}\right)}, & \text{for } n \le b,\\ 0, & \text{otherwise.} \end{cases} \quad (7.13)$$

Typically, the cut-on frequencies for compact spherical loudspeaker arrays are low, and linear-phase filter banks would require long pre-delays. It is useful to employ Linkwitz-Riley filters for the crossovers to get a low-latency implementation. To emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as the combination of an all-pass $A_m$ with twice the phase response of an $m$th-order Butterworth low-pass, combined either with the magnitude-squared low-pass response $[1 + (\omega/\omega_c)^{2m}]^{-1}$ or the high-pass response $(\omega/\omega_c)^{2m}[1 + (\omega/\omega_c)^{2m}]^{-1}$ per crossover. Such a minimum-phase crossover is of even order, so that the minimum-order cut-on
slope must be rounded up to the next even order $2\lceil\frac{b+3}{2}\rceil$. Plain high/low crossovers would be in-phase unless combined with further crossovers to form narrower bands. However, an in-phase filter bank is obtained after inserting the product of all all-passes in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For the band $b$ containing the Ambisonic orders $0 \le n \le b$, the modified filter bank is

$$H_b(\omega) = \frac{\left(\frac{\omega}{\omega_b}\right)^{2\lceil\frac{b+3}{2}\rceil}}{1+\left(\frac{\omega}{\omega_b}\right)^{2\lceil\frac{b+3}{2}\rceil}} \cdot \frac{1}{1+\left(\frac{\omega}{\omega_{b+1}}\right)^{2\lceil\frac{b+4}{2}\rceil}} \cdot \prod_{b'=0}^{\mathrm{N}} A_{\omega_{b'}}^{\lceil\frac{b'+3}{2}\rceil}(\omega). \quad (7.14)$$

The sum $\sum_b H_b(\omega)$ is considered to be sufficiently flat, so that the radial filters for compact spherical loudspeaker arrays using Eqs. (7.8), (7.13), (7.14) become

$$\rho_n(\omega) = \sum_{b=n}^{\mathrm{N}} a_{n,b}\, H_b(\omega)\, \mathrm{i}^{-n}\, w_n^{-1}\, h_n^{(2)}(ka)\, e^{\mathrm{i}ka}. \quad (7.15)$$

Figure 7.7 shows the block diagram to control compact spherical loudspeaker arrays by Ambisonic input signals, including the radiation control filters, the decoder, and a voltage-equalizing crosstalk canceller feeding the loudspeakers.

Fig. 7.7 Signal processing for higher-order compact spherical loudspeaker array control: the Ambisonic directivity signals $\boldsymbol{\chi}_\mathrm{N}(t)$ run through the radiation control filters $\rho_n(\omega)$, a decoder yields the desired velocities $\boldsymbol{v}(t)$, and a crosstalk canceller/equalizer provides suitable output voltages $\boldsymbol{u}(t)$

7.3.2 Control System and Verification Based on Measurements

Velocity equalization/crosstalk cancellation. In the frequency domain, laser vibrometer measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-output (MIMO) system from transducer input voltages $u_l(\omega)$ to transducer velocities $v_l(\omega)$,

$$\boldsymbol{v}(\omega) = \boldsymbol{T}(\omega)\, \boldsymbol{u}(\omega), \quad (7.16)$$

including the effect of acoustic coupling through the common enclosure. Corresponding open measurement data sets can be found online (http://phaidra.kug.ac.at/o:67609), as described in [18]. Theoretically, the frequency-domain inverse of the matrix $\boldsymbol{T}(\omega)$ can be used to equalize and control the transducer velocities with acoustic crosstalk cancelled, as indicated in Fig.
7.7, u(ω) = T −1 (ω) v(ω). (7.17) In practice, this is only useful up to the frequency at which the loudspeaker cone vibration breaks up into modes, so typically below 1 kHz. Control system: The entire control system with Ambisonic signals χ N (ω) as inputs uses Eqs. (7.6), (7.15), (7.17) u(ω) = T −1 (ω) D diag{ρ(ω)} χ N (ω). (7.18) Directivity measurement. It is useful to characterize the directivity obtained by measurements to verify the results; high-resolution 648 × 20 measurements G(ω) of the IKO are found online1 . The sound pressure can be decomposed with the known directional sampling by left-inversion of a spherical harmonics matrix Y T17 , see Appendix A.4, Eq. (A.65, which can be up to 17th order on a 10◦ × 10◦ grid in azimuth and zenith: p(ω) = G(ω) u(ω), ⇒ ψ 17 (ω) = (Y T17 )† p(ω). (7.19) With the highly resolved spherical harmonics coefficients, polar diagrams or balloon diagrams can be evaluated at any direction p(θ, ω) = y17 (θ)T ψ 17 (ω), (7.20) given any control system delivering suitable voltages u for beamforming, as e.g. obtained by Eq. (7.18). 1 http://phaidra.kug.ac.at/o:67609. 7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO (a) velocities 163 (b) directivities Fig. 7.8 Measurements on the IKO as a MIMO system in terms of transducer output velocities (left) and radiation patterns (right) depending on the transducer input voltages Fig. 7.9 Horiontal cross section of the IKO’s directivity/dB over frequency/Hz and azimuth/degrees when beamforming to 0◦ azimuth on the horizon, with radiation control filters above, with filterbank frequencies (38, 75, 125, 210) Hz To inspect the frequency-dependent directivity, a horizontal cross section is shown in Fig 7.9. The beamforming gets effective above 100 Hz and a beam width of ±30◦ is held until 2 kHz. The filterbank starts the 0th order above 38 Hz, and with 75, 125, 210 Hz, 1st, 2nd, and 3rd order are successively added including on-axis equalized max-r E weightings. 
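To make the sub-band weighting of Eq. (7.13) concrete, the following minimal sketch evaluates the weights numerically; it assumes SciPy's `eval_legendre`, and the function name `max_re_subband_weights` is ours, not part of any IEM software:

```python
import numpy as np
from scipy.special import eval_legendre

def max_re_subband_weights(N):
    # a[n, b]: weight for Ambisonic order n in filter-bank band b (zero for n > b).
    # On-axis equalization per Eq. (7.13): each band b is rescaled so that its
    # on-axis pressure sum matches that of the full order-N max-rE design.
    a = np.zeros((N + 1, N + 1))
    n = np.arange(N + 1)
    ref = np.sum((2*n + 1) * eval_legendre(n, np.cos(np.deg2rad(137.9/(N + 1.51)))))
    for b in range(N + 1):
        nb = np.arange(b + 1)
        pn = eval_legendre(nb, np.cos(np.deg2rad(137.9/(b + 1.51))))
        a[:b + 1, b] = pn * ref / np.sum((2*nb + 1) * pn)
    return a

w = max_re_subband_weights(3)
```

For the highest band b = N, the rescaling ratio is unity and the plain max-$r_\mathrm{E}$ weights remain; for the lowest band, the single 0th-order weight is boosted so the on-axis level matches the full-order beam.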
Above 2 kHz, both spatial aliasing and the modal breakup of the transducer cones affect the directivity. However, these beamforming-direction-dependent distortions are often negligible in typical rooms.

7.4 Auditory Objects of the IKO

7.4.1 Static Auditory Objects

The study in [16] showed that distance control by changing the directivity and its orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The experiments used stationary pink noise and could create auditory objects nearly 2 m behind the IKO, which corresponds to the distance between the IKO and the front wall of the playback room.

Fig. 7.10 Distance control using the IKO in the IEM CUBE: auditory distance from the IKO in m over the transition between front and back beam

The maximum distance of auditory objects created by the IKO is strongly signal-dependent. Experiments in [14] showed that the auditory distance of pink noise bursts decreased for shorter fade-in times, while the fade-out time had no influence, cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can be explained by the precedence effect, which favors the earlier direct sound over the reflected sound from the walls. While this effect is strong for transient sounds, it is inhibited for stationary sounds with long fade-in times. However, the precedence effect can even be reduced for transient click sounds by the simultaneous playback of a masker sound that reduces the influence of the direct sound [29]. In comparison to no masker, playing a pink noise masker doubles the auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the target sound very softly further increases the distance and yields a perception that is detached from the IKO.

Fig. 7.11 Signal-dependent distance of auditory objects created by the IKO: markers indicate short or long fade-in and fade-out of pink noise bursts, · indicates the transient click sound

Fig. 7.12 Distance of auditory objects created by the IKO playing a transient click sound in dependence of maskers: no masker (M0), noise at different levels (MN1, MN2), and playback at very low level (room maskers MR1 and MR2)

7.4.2 Moving Auditory Objects

The studies in [14, 15] extended the previous listening experiments towards simple time-varying beam directions, such as left/right, front/back, or circles. To report the perceived locations of the moving auditory objects, listeners used a touch screen that showed a floor plan of the room, including the listening position and the position of the IKO. They had to indicate the location of the auditory object's trajectory every 500 ms. The perceived trajectories depend on the listening position, but they can always be recognized, cf. Fig. 7.13.

Fig. 7.13 Average perceived locations for each 500-ms step during front/back movement (dark gray) and left/right movement (light gray) at two listening positions; the triangle indicates the start and the asterisk the end of the trajectory

Fig. 7.14 Average perceived locations for each 500-ms step during circular movement of a transient sound (dark gray) and stationary noise (light gray), without and with additional reflectors; the triangle indicates the start and the asterisk the end of the trajectory

The empirical knowledge was applied in the artistic study in [14] about body-space relations, composing sounds that are spatialized with different static directions and simple movements. For concerts, the artistic practice evolved towards setting up the IKO together with reflector baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception of moving transient and stationary sounds. The baffles obviously reduce the signal dependency by contributing additional reflection paths that contrast with the direct sound.

7.5 Practical Free-Software Examples

7.5.1 IEM Room Encoder and Directivity Shaper

The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only be used to simulate the room reflections of an omnidirectional sound source based on the image-source method, but it also supports directional sound sources. As its format, it employs Ambisonics with ACN ordering and adjustable normalization up to seventh order. Thus, it permits utilizing data from directivity measurements or even directional recordings done with a surrounding spherical microphone, e.g. to put real instrument recordings into the virtual room. As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple means to generate a frequency-dependent directivity pattern from scratch and to apply it to a mono input signal. This is useful, e.g., to generate the typical rotary-speaker effect of a Leslie cabinet.

Fig. 7.15 IEM Directivity Shaper plug-in

7.5.2 IEM Cubes 5.1 Player and Surround with Depth

As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event in between them to replace an actual center loudspeaker.
In order to play back an entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The plug-in provides control of the shape, direction, and level of all beams, as well as a delay compensation for the reflection paths.

Surround sound with depth can be realized with a quadraphonic setup of four loudspeaker cubes and a combination of the cubes Surround Decoder and multiple Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder controls position and distance, i.e. the blending between the two layers. The output of the plug-in is a 10-channel audio stream comprising 7 channels for third-order (inner layer) and 3 channels for first-order 2D Ambisonics (outer depth layer). The cubes Surround Decoder plug-in decodes the 10-channel audio stream and distributes the signals to the 16 drivers of the four loudspeaker cubes. For each loudspeaker cube, the directions to excite the direct and reflected sound of the inner layer and the diffuse sound of the depth layer can be adjusted in order to adapt to the playback environment. Additionally, the directivity patterns for the direct, reflected, and diffuse sound beams can be controlled, as well as a delay to compensate for the longer propagation paths of the reflected sound. The plug-ins are available under https://git.iem.at/audioplugins/CubeSpeakerPlugins.

Fig. 7.16 IEM cubes 5.1 Player, cubes Surround Decoder, and Distance Encoder plug-ins

7.5.3 IKO

Spatialization using the IKO can use a similar infrastructure of plug-ins as surrounding loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix_encoder or the IEM StereoEncoder or MultiEncoder, create the third-order Ambisonic signals that are subsequently fed to a decoder. Decoding to the IKO requires the processing steps shown in Fig. 7.7: radiation control filters in the spherical harmonic domain, decoding from spherical harmonics to transducer signals, as well as crosstalk cancellation and equalization of the transducers. This processing can be summarized in a 16 (spherical harmonics up to third order) × 20 (transducers) filter matrix. The convolution can be done efficiently using the mcfx_convolver plug-in. Filter presets for the IKO can be found under http://phaidra.kug.ac.at/o:79235.

References

1. P. Boulez, L'acoustique et la musique contemporaine, in Proceedings of the 11th International Congress on Acoustics, vol. 8 (Paris, 1983), https://www.icacommission.org/Proceedings/ICA1983Paris/ICA11%20Proceedings%20Vol8.pdf
2. M. Frank, G.K. Sharma, F. Zotter, What we already know about spatialization with compact spherical arrays as variable-directivity loudspeakers, in Proceedings of the inSONIC2015 (Karlsruhe, 2015)
3. O. Warusfel, P. Derogis, R. Caussé, Radiation synthesis with digitally controlled loudspeakers, in 103rd AES Convention (Paris, 1997)
4. P. Kassakian, D. Wessel, Characterization of spherical loudspeaker arrays, in Convention Paper #1430 (San Francisco, 2004)
5. R. Avizienis, A. Freed, P. Kassakian, D. Wessel, A compact 120 independent element spherical loudspeaker array with programmable radiation patterns, in 120th AES Convention (Paris, France, 2006)
6. F. Zotter, R. Höldrich, Modeling radiation synthesis with spherical loudspeaker arrays, in Proceedings of the ICA (Madrid, 2007), http://iaem.at/Members/zotter
7. F. Zotter, Analysis and synthesis of sound-radiation with spherical arrays, Ph.D. thesis, University of Music and Performing Arts, Graz (2009)
8. H. Pomberger, Angular and radial directivity control for spherical loudspeaker arrays, M. thesis, Institut für Elektronische Musik und Akustik, Kunstuni Graz / Technical University Graz, Graz, Austria (2008)
9. M. Pollow, G.K. Behler, Variable directivity for platonic sound sources based on spherical harmonics optimization. Acta Acust. United Acust. 95(6), 1082–1092 (2009)
10. A.M. Pasqual, P. Herzog, J.R. Arruda, Theoretical and experimental analysis of the behavior of a compact spherical loudspeaker array for directivity control. J. Acoust. Soc. Am. 128(6), 3478–3488 (2010)
11. A. Schmeder, An exploration of design parameters for human-interactive systems with compact spherical loudspeaker arrays, in 1st Ambisonics Symposium (Graz, 2009)
12. G.K. Sharma, F. Zotter, M. Frank, Orchestrating wall reflections in space by icosahedral loudspeaker: findings from first artistic research exploration, in Proceedings of the ICMC/SMC (Athens, 2014), pp. 830–835
13. F. Zotter, M. Frank, A. Fuchs, D. Rudrich, Preliminary study on the perception of orientation-changing directional sound sources in rooms, in Proceedings of the Forum Acusticum (Kraków, 2014)
14. F. Wendt, G.K. Sharma, M. Frank, F. Zotter, R. Höldrich, Perception of spatial sound phenomena created by the icosahedral loudspeaker. Comput. Music J. 41(1) (2017)
15. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: the icosahedral loudspeaker. Comput. Music J. 41(3) (2017)
16. F. Wendt, F. Zotter, M. Frank, R. Höldrich, Auditory distance control using a variable-directivity loudspeaker. MDPI Appl. Sci. 7(7) (2017)
17. F. Wendt, M. Frank, On the localization of auditory objects created by directional sound sources in a virtual room, in VDT Tonmeistertagung (Cologne, 2018)
18. F. Schultz, M. Zaunschirm, F. Zotter, Directivity and electro-acoustic measurements of the IKO, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.cfm?elib=19557
19. T. Deppisch, N. Meyer-Kahlen, F. Zotter, M. Frank, Surround with depth on first-order beam-controlling loudspeakers, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.cfm?elib=19494
20. M.-V. Laitinen, A. Politis, I. Huhtakallio, V. Pulkki, Controlling the perceived distance of an auditory object by manipulation of loudspeaker directivity. J. Acoust. Soc. Am. 137(6), EL462–EL468 (2015)
21. F. Zotter, M. Frank, Investigation of auditory objects caused by directional sound sources in rooms. Acta Phys. Pol. A 128(1-A) (2015), http://przyrbwn.icm.edu.pl/APP/PDF/128/a128z1ap01.pdf
22. F. Zagala, J. Linke, F. Zotter, M. Frank, Amplitude panning between beamforming-controlled direct and reflected sound, in Audio Engineering Society Convention 142 (Berlin, 2017), http://www.aes.org/e-lib/browse.cfm?elib=18679
23. N. Misdariis, F. Nicolas, O. Warusfel, R. Caussé, Radiation control on a multi-loudspeaker device: La Timée, in International Computer Music Conference (La Habana, 2001)
24. N. Meyer-Kahlen, F. Zotter, K. Pollack, Design and measurement of a first-order, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.cfm?elib=19559
25. F. Zotter, H. Pomberger, A. Schmeder, Efficient directivity pattern control for spherical loudspeaker arrays, in ACOUSTICS08 (2008)
26. F. Zotter, A. Schmeder, M. Noisternig, Crosstalk cancellation for spherical loudspeaker arrays, in Fortschritte der Akustik – DAGA (Dresden, 2008)
27. S.P. Lipshitz, J. Vanderkooy, In-phase crossover network design. J. Audio Eng. Soc. 34(11), 889–894 (1986)
28. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order Ambisonic recording, in Fortschritte der Akustik – DAGA (Nürnberg, 2015)
29. J. Linke, F. Wendt, F. Zotter, M. Frank, How masking affects auditory objects of beamformed sounds, in Fortschritte der Akustik – DAGA (Munich, 2018)
30. J. Linke, F. Wendt, F. Zotter, M. Frank, How the perception of moving sound beams is influenced by masking and reflector setup, in VDT Tonmeistertagung (Cologne, 2018)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Appendix

Many have written of the experience of mathematical beauty as being comparable to that derived from the greatest art.
S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah [1], Frontiers in Human Neuroscience, Feb. 2014.

A.1 Harmonic Functions

The Laplacian is defined in the D-dimensional Cartesian space as

$$\Delta = \sum_{j=1}^{D} \frac{\partial^2}{\partial x_j^2}.$$

The Laplacian eigenproblem $\Delta f = -\lambda f$ is solved by harmonics on a finite interval.

A.2 Laplacian in Orthogonal Coordinates

In general, coordinates can be expressed by n-tuples of values. For instance, the Cartesian coordinates are $(x_1, x_2, \dots)$ and the coordinates of another coordinate system are $(u_1, u_2, \dots)$, and both describe the location of a point in a space depending on a finite number of dimensions. Each location of the space should be accessible by both coordinate systems, and there should be a bijective mapping between both systems, e.g. $u_j = u_j(x_1, x_2, \dots)$. A single differentiation with regard to the component $x_i$, for instance, is described by the chain rule and consists of the sum of weighted partial differentials with regard to $u_j$:

$$\frac{\partial}{\partial x_i} = \sum_j \frac{\partial u_j}{\partial x_i}\,\frac{\partial}{\partial u_j}. \qquad (\mathrm{A.1})$$

Written in terms of vectors, the (Cartesian) gradient $\nabla = \frac{\partial}{\partial \boldsymbol{x}}$ with $\frac{\partial}{\partial \boldsymbol{u}}$ yields:

$$\nabla = \frac{\partial}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{u}^\mathrm{T}}{\partial \boldsymbol{x}}\,\frac{\partial}{\partial \boldsymbol{u}} =: \boldsymbol{J}_{\partial u/\partial x}^\mathrm{T}\,\frac{\partial}{\partial \boldsymbol{u}}, \qquad (\mathrm{A.2})$$

for which the Jacobian matrix $\boldsymbol{J}_{\partial u/\partial x} = \bigl[\frac{\partial u_j}{\partial x_i}\bigr]_{ij}$, written either in dependency of $\boldsymbol{x}$ or $\boldsymbol{u}$, represents all the partial derivatives of the mapping between the coordinate systems. For bijective mappings, the Jacobian of the inverse mapping also exists, $\boldsymbol{J}_{\partial x/\partial u} = \bigl[\frac{\partial x_i}{\partial u_j}\bigr]_{ji}$. The coordinate systems are equivalent if the determinant of the Jacobian is non-zero, $|\boldsymbol{J}| \neq 0$.

Orthogonal coordinate systems have the interesting property that the rows of the Jacobian (or its columns) are orthogonal, so that $\boldsymbol{J}^\mathrm{T}\boldsymbol{J}$ yields a diagonal matrix. With $\boldsymbol{J}_{\partial x/\partial u} = \bigl(\frac{\partial \boldsymbol{x}}{\partial \boldsymbol{u}}\bigr)^\mathrm{T}$, the meaning of this property becomes easier to understand: the differential changes in location $\partial \boldsymbol{x}/\partial u_j$ of each Cartesian coordinate into the direction of each individual non-Cartesian coordinate $u_j$ describe an orthogonal set of motion directions in space, whose orientation depends on the location and whose individual lengths may vary.

Our goal here is to obtain a description of the Laplacian $\Delta = \sum_i \frac{\partial^2}{\partial x_i^2}$ in the Helmholtz equation, and it can be obtained with the chain rule, now calculating from $x_i$ to $u_j$,

$$\Delta = \sum_i \frac{\partial}{\partial x_i}\Bigl(\frac{\partial}{\partial x_i}\Bigr) = \sum_i \frac{\partial}{\partial x_i}\Bigl(\sum_j \frac{\partial u_j}{\partial x_i}\frac{\partial}{\partial u_j}\Bigr) = \sum_{i,j} \frac{\partial^2 u_j}{\partial x_i^2}\,\frac{\partial}{\partial u_j} + \sum_{i,j,k} \frac{\partial u_j}{\partial x_i}\frac{\partial u_k}{\partial x_i}\,\frac{\partial^2}{\partial u_j\,\partial u_k} \qquad (\mathrm{A.3})$$

$$= \sum_j \Bigl(\sum_i \frac{\partial^2 u_j}{\partial x_i^2}\Bigr)\frac{\partial}{\partial u_j} + \boldsymbol{1}^\mathrm{T}\Bigl[\underbrace{\boldsymbol{J}^\mathrm{T}\boldsymbol{J}}_{\text{ortho: diag}} \circ \frac{\partial^2}{\partial \boldsymbol{u}\,\partial \boldsymbol{u}^\mathrm{T}}\Bigr]\boldsymbol{1} \overset{\text{ortho}}{=} \sum_j \Bigl(\sum_i \frac{\partial^2 u_j}{\partial x_i^2}\Bigr)\frac{\partial}{\partial u_j} + \sum_j \Bigl(\sum_i \Bigl(\frac{\partial u_j}{\partial x_i}\Bigr)^{\!2}\Bigr)\frac{\partial^2}{\partial u_j^2},$$

with $\circ$ denoting the element-wise, i.e. Hadamard, product.
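The diagonality of $\boldsymbol{J}^\mathrm{T}\boldsymbol{J}$ for an orthogonal coordinate system can be checked numerically. This is a minimal sketch using the spherical mapping $x = r\cos\varphi\sqrt{1-\zeta^2}$, $y = r\sin\varphi\sqrt{1-\zeta^2}$, $z = r\zeta$ treated in the next section as an example; the function name is ours:

```python
import numpy as np

def jacobian_spherical(r, phi, zeta):
    # Columns: d(x,y,z)/dr, d(x,y,z)/dphi, d(x,y,z)/dzeta for the mapping
    # x = r cos(phi) sqrt(1-zeta^2), y = r sin(phi) sqrt(1-zeta^2), z = r zeta.
    s = np.sqrt(1.0 - zeta**2)
    return np.array([
        [np.cos(phi)*s, -r*np.sin(phi)*s, -r*zeta*np.cos(phi)/s],
        [np.sin(phi)*s,  r*np.cos(phi)*s, -r*zeta*np.sin(phi)/s],
        [zeta,           0.0,              r                    ],
    ])

r, phi, zeta = 2.0, 0.7, 0.3
J = jacobian_spherical(r, phi, zeta)
G = J.T @ J  # diagonal iff the coordinate directions are orthogonal
```

The diagonal entries come out as $1$, $r^2(1-\zeta^2)$, and $r^2/(1-\zeta^2)$; their reciprocals are exactly the factors that weight $\partial^2/\partial r^2$, $\partial^2/\partial\varphi^2$, and $\partial^2/\partial\zeta^2$ in the spherical Laplacian derived in the following section.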
Orthogonal coordinates largely simplify the Laplacian (see the last line) and make it consist of first- and second-order differentials with regard to the new coordinates individually, with all mixed derivatives canceling. Both first- and second-order differentials are weighted by the partial derivatives of the coordinate mapping. For each $u_j$, the Laplacian is composed of those two expressions, $\Delta = \sum_j \Delta_{u_j}$, where

$$\Delta_{u_j} = \Bigl(\sum_i \frac{\partial^2 u_j}{\partial x_i^2}\Bigr)\frac{\partial}{\partial u_j} + \Bigl(\sum_i \Bigl(\frac{\partial u_j}{\partial x_i}\Bigr)^{\!2}\Bigr)\frac{\partial^2}{\partial u_j^2}. \qquad (\mathrm{A.4})$$

A.3 Laplacian in Spherical Coordinates

The right-handed spherical coordinate system in ISO 31-11, ISO 80000-2, [2, 3] uses a radius $r$, an azimuth angle $\varphi$, and a zenith angle $\vartheta$, mapping to Cartesian coordinates

$$x = r\cos\varphi\sin\vartheta, \quad y = r\sin\varphi\sin\vartheta, \quad z = r\cos\vartheta,$$

or inversely $r = \sqrt{x^2+y^2+z^2}$, $\varphi = \arctan\frac{y}{x}$, $\vartheta = \arctan\frac{\sqrt{x^2+y^2}}{z}$, see Fig. 4.11. Re-expressing the zenith-angle coordinate by $\zeta = \cos\vartheta = \frac{z}{r}$ reduces the effort in calculation and yields

$$x = r\cos\varphi\sqrt{1-\zeta^2}, \quad y = r\sin\varphi\sqrt{1-\zeta^2}, \quad z = r\zeta.$$

In order to obtain solutions along the angular dimensions azimuth and zenith, we first need to re-write the Laplacian from Cartesian to spherical coordinates. For the first-order derivative along the x axis, we get the generalized differential

$$\frac{\partial}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial}{\partial r} + \frac{\partial \varphi}{\partial x}\frac{\partial}{\partial \varphi} + \frac{\partial \zeta}{\partial x}\frac{\partial}{\partial \zeta}.$$

As the Cartesian and spherical coordinates are orthogonal, all mixed second-order derivatives vanish. We may derive a second time wrt. x:

$$\frac{\partial}{\partial x}\Bigl(\frac{\partial}{\partial x}\Bigr) = \frac{\partial^2 r}{\partial x^2}\frac{\partial}{\partial r} + \Bigl(\frac{\partial r}{\partial x}\Bigr)^{\!2}\frac{\partial^2}{\partial r^2} + \frac{\partial^2 \varphi}{\partial x^2}\frac{\partial}{\partial \varphi} + \Bigl(\frac{\partial \varphi}{\partial x}\Bigr)^{\!2}\frac{\partial^2}{\partial \varphi^2} + \frac{\partial^2 \zeta}{\partial x^2}\frac{\partial}{\partial \zeta} + \Bigl(\frac{\partial \zeta}{\partial x}\Bigr)^{\!2}\frac{\partial^2}{\partial \zeta^2}.$$

Obviously, we require all first-order derivatives squared and all second-order derivatives of the spherical coordinates.

A.3.1 The Radial Part

With $r = \sqrt{x^2+y^2+z^2}$, we obtain for the radial part $\Delta_r = \bigl(\sum_i \frac{\partial^2 r}{\partial x_i^2}\bigr)\frac{\partial}{\partial r} + \bigl(\sum_i (\frac{\partial r}{\partial x_i})^2\bigr)\frac{\partial^2}{\partial r^2}$ of the Laplacian

$$\frac{\partial r}{\partial x} = \frac{x}{r}, \quad \Bigl(\frac{\partial r}{\partial x}\Bigr)^{\!2} = \frac{x^2}{r^2}, \quad \frac{\partial^2 r}{\partial x^2} = \frac{\partial}{\partial x}\frac{x}{r} = \frac{1}{r} - \frac{x^2}{r^3}.$$

For x, y, z altogether, this is

$$\Delta_r = \frac{3r^2 - x^2 - y^2 - z^2}{r^3}\,\frac{\partial}{\partial r} + \frac{x^2 + y^2 + z^2}{r^2}\,\frac{\partial^2}{\partial r^2} = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}. \qquad (\mathrm{A.5})$$

2D. In two dimensions, there is no z coordinate, therefore there is just one term fewer:

$$\Delta_{r,\mathrm{2D}} = \frac{2r^2 - x^2 - y^2}{r^3}\,\frac{\partial}{\partial r} + \frac{x^2 + y^2}{r^2}\,\frac{\partial^2}{\partial r^2} = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r}. \qquad (\mathrm{A.6})$$

A.3.2 The Azimuthal Part

With $\varphi = \arctan\frac{y}{x}$ and $\frac{\mathrm{d}}{\mathrm{d}x}\arctan x = \frac{1}{1+x^2}$, the azimuthal part $\Delta_\varphi = \bigl(\sum_i \frac{\partial^2 \varphi}{\partial x_i^2}\bigr)\frac{\partial}{\partial \varphi} + \bigl(\sum_i (\frac{\partial \varphi}{\partial x_i})^2\bigr)\frac{\partial^2}{\partial \varphi^2}$ of the Laplacian becomes, with $r_{xy}^2 = x^2 + y^2$,

$$\frac{\partial \varphi}{\partial x} = -\frac{y}{r_{xy}^2}, \quad \Bigl(\frac{\partial \varphi}{\partial x}\Bigr)^{\!2} = \frac{y^2}{r_{xy}^4}, \qquad \frac{\partial \varphi}{\partial y} = \frac{x}{r_{xy}^2}, \quad \Bigl(\frac{\partial \varphi}{\partial y}\Bigr)^{\!2} = \frac{x^2}{r_{xy}^4},$$

$$\frac{\partial^2 \varphi}{\partial x^2} = \frac{2xy}{r_{xy}^4}, \qquad \frac{\partial^2 \varphi}{\partial y^2} = -\frac{2xy}{r_{xy}^4}, \qquad \frac{\partial \varphi}{\partial z} = 0.$$

It only depends on x and y, and altogether we obtain

$$\Delta_\varphi = \frac{2xy - 2xy}{r_{xy}^4}\,\frac{\partial}{\partial \varphi} + \frac{x^2 + y^2}{r_{xy}^4}\,\frac{\partial^2}{\partial \varphi^2} = \frac{1}{r_{xy}^2}\frac{\partial^2}{\partial \varphi^2} = \frac{1}{r^2(1-\zeta^2)}\frac{\partial^2}{\partial \varphi^2}. \qquad (\mathrm{A.7})$$

2D. In two dimensions, $\zeta = 0$, therefore

$$\Delta_{\varphi,\mathrm{2D}} = \frac{1}{r^2}\frac{\partial^2}{\partial \varphi^2}. \qquad (\mathrm{A.8})$$

A.3.3 The Zenithal Part

The zenith angle is actually $\vartheta$, and we define $\zeta = \cos\vartheta$ as a variable to express it in order to simplify the derivation. With $\zeta = \frac{z}{\sqrt{x^2+y^2+z^2}} = \frac{z}{r}$, the zenithal part $\Delta_\zeta = \bigl(\sum_i \frac{\partial^2 \zeta}{\partial x_i^2}\bigr)\frac{\partial}{\partial \zeta} + \bigl(\sum_i (\frac{\partial \zeta}{\partial x_i})^2\bigr)\frac{\partial^2}{\partial \zeta^2}$ becomes

$$\frac{\partial \zeta}{\partial x} = -\frac{xz}{r^3}, \quad \Bigl(\frac{\partial \zeta}{\partial x}\Bigr)^{\!2} = \frac{x^2 z^2}{r^6}, \qquad \frac{\partial \zeta}{\partial z} = \frac{1}{r} - \frac{z^2}{r^3} = \frac{r_{xy}^2}{r^3}, \quad \Bigl(\frac{\partial \zeta}{\partial z}\Bigr)^{\!2} = \frac{r_{xy}^4}{r^6},$$

$$\frac{\partial^2 \zeta}{\partial x^2} = z\,\frac{3x^2 - r^2}{r^5}, \qquad \frac{\partial^2 \zeta}{\partial z^2} = -\frac{3\,r_{xy}^2\,z}{r^5}.$$

For x, y, and z altogether, we get

$$\Delta_\zeta = \frac{z(3x^2 + 3y^2 - 2r^2) - 3r_{xy}^2 z}{r^5}\,\frac{\partial}{\partial \zeta} + \frac{(x^2+y^2)z^2 + r_{xy}^4}{r^6}\,\frac{\partial^2}{\partial \zeta^2} = \frac{1-\zeta^2}{r^2}\frac{\partial^2}{\partial \zeta^2} - \frac{2\zeta}{r^2}\frac{\partial}{\partial \zeta}. \qquad (\mathrm{A.9})$$

2D. This part does not exist in 2D.

A.3.4 Azimuthal Solution in 2D and 3D

The azimuth harmonics $\Phi$ are found by solving $\Delta_\varphi \Phi = -\lambda \Phi$, i.e.

$$\frac{\mathrm{d}^2 \Phi}{\mathrm{d}\varphi^2} = -\lambda\,r_{xy}^2\,\Phi. \qquad (\mathrm{A.10})$$

We know that $(\cos x)'' = -\cos x$ and $(\sin x)'' = -\sin x$, therefore we can insert the solutions

$$\hat\Phi = \begin{cases}\cos(a\varphi), & \text{for } a \ge 0,\\ \sin(|a|\varphi), & \text{for } a < 0,\end{cases}$$

and obtain with $\frac{\mathrm{d}^2 \hat\Phi}{\mathrm{d}\varphi^2} = -a^2\hat\Phi$ the characteristic equation that fixes $a$:

$$-a^2 = -\lambda\,r_{xy}^2.$$

Geometrically, we desire that $\hat\Phi(\varphi) = \hat\Phi(\varphi + 2\pi l)$ with $l \in \mathbb{Z}$. This is only possible with $\lambda\,r_{xy}^2 = m^2$ and $m \in \mathbb{Z}$. We can therefore define for $-\infty \le m \le \infty$ the terms of a normalized Fourier series

$$\Phi_m(\varphi) = \frac{1}{\sqrt{2\pi}}\begin{cases}\sqrt{2}\,\sin(|m|\varphi), & \text{for } m < 0,\\ 1, & \text{for } m = 0,\\ \sqrt{2}\,\cos(m\varphi), & \text{for } m > 0.\end{cases} \qquad (\mathrm{A.11})$$

The azimuth harmonics are orthogonal: none of the products $\cos(i\varphi)\sin(j\varphi)$, $\cos(i\varphi)\cos(j\varphi)$, or $\sin(i\varphi)\sin(j\varphi)$ produces a constant component unless $i = j$, excluding the mixed cosine-sine product. Normalization ensures that the non-zero result is unity,

$$\int_0^{2\pi} \Phi_i\,\Phi_j\,\mathrm{d}\varphi = \delta_{ij} = \begin{cases}1, & \text{for } i = j,\\ 0, & \text{else,}\end{cases} \qquad (\mathrm{A.12})$$

by $\int_0^{2\pi} \bigl(\tfrac{1}{\sqrt{2\pi}}\bigr)^2 \mathrm{d}\varphi = \int_0^{2\pi} \frac{\cos^2(m\varphi)}{\pi}\,\mathrm{d}\varphi = \int_0^{2\pi} \frac{\sin^2(m\varphi)}{\pi}\,\mathrm{d}\varphi = 1$.

2D functions. With unbounded $|m| \to \infty$, the circular harmonics are complete in the Hilbert space of square-integrable circular polynomials. By their orthonormality, we can derive a transformation integral of a function $g(\varphi)$ that should be represented as the series

$$g = \sum_{j=-\infty}^{\infty} \gamma_j\,\Phi_j \qquad (\mathrm{A.13})$$

by integration of $g$ over $\Phi_m$:

$$\int_{-\pi}^{\pi} g(\varphi)\,\Phi_m\,\mathrm{d}\varphi = \sum_{j=-\infty}^{\infty} \gamma_j \underbrace{\int_{-\pi}^{\pi} \Phi_m\,\Phi_j\,\mathrm{d}\varphi}_{=\delta_{mj}} = \gamma_m. \qquad (\mathrm{A.14})$$

2D panning functions. To decompose an infinitely narrow unit-surface Dirac delta function that represents an infinite-order panning function towards the direction $\varphi_\mathrm{s}$,

$$\delta(\varphi - \varphi_\mathrm{s}) = \lim_{\epsilon \to 0}\begin{cases}\frac{1}{2\epsilon}, & \text{for } |\varphi - \varphi_\mathrm{s}| \le \epsilon,\\ 0, & \text{otherwise,}\end{cases} \qquad (\mathrm{A.15})$$

we then obtain the coefficients from the transformation integral

$$\gamma_m = \int_{-\pi}^{\pi} \delta(\varphi - \varphi_\mathrm{s})\,\Phi_m\,\mathrm{d}\varphi = \Phi_m(\varphi_\mathrm{s})\lim_{\epsilon \to 0}\int_{-\epsilon}^{\epsilon} \frac{1}{2\epsilon}\,\mathrm{d}\varphi = \Phi_m(\varphi_\mathrm{s}). \qquad (\mathrm{A.16})$$

Typically, a finite-order series will be employed as a panning function

$$g_\mathrm{N} = \sum_{m=-\mathrm{N}}^{\mathrm{N}} a_m\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi), \qquad (\mathrm{A.17})$$

involving weights $a_m = a_{|m|}$ controlling the side lobes.
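The orthonormality (A.12) and the sampling property (A.16) of the circular harmonics can be verified by simple quadrature; a minimal sketch (the function name `circ_harm` is ours):

```python
import numpy as np

def circ_harm(m, phi):
    # Normalized circular harmonics of Eq. (A.11).
    phi = np.asarray(phi, dtype=float)
    if m < 0:
        return np.sqrt(2) * np.sin(abs(m) * phi) / np.sqrt(2*np.pi)
    if m == 0:
        return np.ones_like(phi) / np.sqrt(2*np.pi)
    return np.sqrt(2) * np.cos(m * phi) / np.sqrt(2*np.pi)

phi = np.linspace(0.0, 2*np.pi, 4096, endpoint=False)
dphi = phi[1] - phi[0]
# orthonormality (A.12), checked by periodic Riemann quadrature
g23 = np.sum(circ_harm(2, phi) * circ_harm(3, phi)) * dphi   # -> 0
g33 = np.sum(circ_harm(3, phi) * circ_harm(3, phi)) * dphi   # -> 1
# panning coefficient (A.16): gamma_m = Phi_m(phi_s)
phi_s = 0.5
gamma2 = circ_harm(2, phi_s)
```

On a uniform periodic grid, this quadrature reproduces the continuous orthonormality up to machine precision, which is why uniform sampling is so well behaved for circular harmonics.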
To evaluate the E measure of loudness, we can write

$$E = \int_{-\pi}^{\pi} g_\mathrm{N}^2\,\mathrm{d}\varphi = \sum_{i,j} a_i a_j\,\Phi_i(\varphi_\mathrm{s})\Phi_j(\varphi_\mathrm{s})\int_{-\pi}^{\pi}\Phi_i\,\Phi_j\,\mathrm{d}\varphi = \sum_{m=-\mathrm{N}}^{\mathrm{N}} a_m^2\,\Phi_m(\varphi_\mathrm{s})^2 = \sum_{i=0}^{\mathrm{N}} \frac{2-\delta_i}{2\pi}\,a_i^2. \qquad (\mathrm{A.18})$$

For $\varphi_\mathrm{s} = 0$, we obtain an axisymmetric function in terms of a pure cosine series, as $\sin 0 = 0$,

$$g_\mathrm{N}(\varphi) = \sum_{m=0}^{\mathrm{N}} a_m\,\frac{2-\delta_m}{2\pi}\,\cos(m\varphi), \qquad (\mathrm{A.19})$$

with $\delta_m = 1$ for $m = 0$ and $0$ elsewhere. The axisymmetric panning function is easier to design.

2D max-$r_\mathrm{E}$. For the narrowest-possible spread, we maximize the length of $r_\mathrm{E}$,

$$r_\mathrm{E} = \frac{\int_{-\pi}^{\pi} g_\mathrm{N}^2\cos\varphi\,\mathrm{d}\varphi}{\int_{-\pi}^{\pi} g_\mathrm{N}^2\,\mathrm{d}\varphi} = \frac{\sum_{i,j=0}^{\mathrm{N}} a_i a_j\,\frac{(2-\delta_i)(2-\delta_j)}{(2\pi)^2}\int_{-\pi}^{\pi}\cos(i\varphi)\cos(j\varphi)\cos\varphi\,\mathrm{d}\varphi}{E} = \frac{\sum_{i=1}^{\mathrm{N}} a_i a_{i-1}}{\pi E} =: \frac{\hat r_\mathrm{E}}{E}, \qquad (\mathrm{A.20})$$

where we used $\cos(i\varphi)\cos(\varphi) = \frac{1}{2}\{\cos[(i+1)\varphi] + \cos[(i-1)\varphi]\}$, inserted the orthogonality of the cosines $\int_{-\pi}^{\pi}\frac{2-\delta_i}{2\pi}\cos(i\varphi)\cos(j\varphi)\,\mathrm{d}\varphi = \delta_{ij}$, and combined the pairs $(a_i a_{i-1} + a_i a_{i+1})$ into $2\,a_i a_{i-1}$ under the sum. To maximize, we zero the derivative with regard to $a_m$:

$$\frac{\partial r_\mathrm{E}}{\partial a_m} = \frac{\hat r_\mathrm{E}'}{E} - \frac{\hat r_\mathrm{E}}{E^2}E' = \frac{\hat r_\mathrm{E}' - E'\,r_\mathrm{E}}{E} = 0 \quad\Rightarrow\quad a_{m-1} + a_{m+1} - (2-\delta_m)\,a_m\,r_\mathrm{E} = 0.$$

If we assume $a_m = \cos(m\alpha)$ and insert it into $a_{m+1} + a_{m-1} = (2-\delta_m)\,a_m\,r_\mathrm{E}$, we recognize by the theorem

$$\cos[(m+1)\alpha] + \cos[(m-1)\alpha] = 2\cos(m\alpha)\cos(\alpha)$$

that $r_\mathrm{E} = \cos\alpha$. And to maximize $r_\mathrm{E}$ by constraining $a_{\mathrm{N}+1} = 0$, we get the smallest-possible spread $\alpha = \pm\frac{\pi}{2}\frac{1}{\mathrm{N}+1} = \pm\frac{90^\circ}{\mathrm{N}+1}$:

$$a_m = \begin{cases}\cos\bigl(\frac{\pi}{2}\frac{m}{\mathrm{N}+1}\bigr), & \text{for } 0 \le m \le \mathrm{N},\\ 0, & \text{elsewhere.}\end{cases} \qquad (\mathrm{A.21})$$

The max-$r_\mathrm{E}$ panning function in 2D consequently is

$$g_\mathrm{N}(\varphi) = \sum_{m=-\mathrm{N}}^{\mathrm{N}} a_m\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi) = \sum_{m=-\mathrm{N}}^{\mathrm{N}} \cos\Bigl(\frac{\pi}{2}\frac{|m|}{\mathrm{N}+1}\Bigr)\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi). \qquad (\mathrm{A.22})$$

A.3.5 Towards Spherical Harmonics (3D)

The spherical harmonics are harmonics depending only on the angular terms. We may superimpose both angular parts $\Delta_{\varphi\zeta} = \Delta_\varphi + \Delta_\zeta$ of the Laplacian and solve the eigenproblem $r^2\,\Delta_{\varphi\zeta} Y = -\lambda Y$:

$$\frac{1}{1-\zeta^2}\frac{\partial^2 Y}{\partial \varphi^2} + (1-\zeta^2)\frac{\partial^2 Y}{\partial \zeta^2} - 2\zeta\,\frac{\partial Y}{\partial \zeta} = -\lambda Y.$$

We assume Y to be a product of the azimuth harmonics $\Phi_m(\varphi)$ from above and yet undefined zenith harmonics $\Theta(\zeta)$,

$$Y = \Phi_m\,\Theta, \qquad (\mathrm{A.23})$$

which yields a differential equation ($\partial \to \mathrm{d}$) only in $\zeta$ after inserting $\frac{\mathrm{d}^2}{\mathrm{d}\varphi^2}\Phi_m = -m^2\,\Phi_m$:

$$\frac{-m^2}{1-\zeta^2}\,\Phi_m\Theta + (1-\zeta^2)\,\Phi_m\frac{\mathrm{d}^2\Theta}{\mathrm{d}\zeta^2} - 2\zeta\,\Phi_m\frac{\mathrm{d}\Theta}{\mathrm{d}\zeta} = -\lambda\,\Phi_m\Theta.$$

And after dividing by $\Phi_m$, we obtain the associated Legendre differential equation

$$(1-\zeta^2)\frac{\mathrm{d}^2\Theta}{\mathrm{d}\zeta^2} - 2\zeta\,\frac{\mathrm{d}\Theta}{\mathrm{d}\zeta} + \Bigl[\lambda - \frac{m^2}{1-\zeta^2}\Bigr]\Theta = 0. \qquad (\mathrm{A.24})$$

A.3.6 Zenithal Solution: Associated Legendre Differential Equation

The associated Legendre differential equation (written in x and y for mathematical simplicity) is

$$(1-x^2)\,y'' - 2x\,y' + \Bigl[\lambda - \frac{m^2}{1-x^2}\Bigr]y = 0,$$

or after gathering the derivatives

$$[(1-x^2)\,y']' + \Bigl[\lambda - \frac{m^2}{1-x^2}\Bigr]y = 0.$$

Simplifying the differential equation by $\frac{1}{1-x^2}$. In the associated Legendre differential equation, we would like to get rid of the denominator $\frac{1}{1-x^2}$. In this case, it is typical to substitute $y = (1-x^2)^\alpha v$ and try out which $\alpha$ succeeds. For insertion into the differential equation, the derivative of y is

$$y' = -2\alpha(1-x^2)^{\alpha-1}x\,v + (1-x^2)^\alpha v',$$

and the second-order derivative term is

$$[(1-x^2)\,y']' = [-2\alpha(1-x^2)^\alpha x\,v + (1-x^2)^{\alpha+1} v']'$$
$$= 4\alpha^2(1-x^2)^{\alpha-1}x^2 v - 2\alpha(1-x^2)^\alpha v - 2\alpha(1-x^2)^\alpha x\,v' - 2(\alpha+1)(1-x^2)^\alpha x\,v' + (1-x^2)^{\alpha+1}v''$$
$$= (1-x^2)^\alpha\Bigl[\frac{4\alpha^2 x^2}{1-x^2}\,v - 2\alpha\,v - 2(2\alpha+1)\,x\,v' + (1-x^2)\,v''\Bigr].$$

Together with the term $\bigl[\lambda - \frac{m^2}{1-x^2}\bigr]y$, the associated Legendre differential equation becomes

$$(1-x^2)^\alpha\Bigl[\frac{4\alpha^2 x^2 - m^2}{1-x^2}\,v + (\lambda - 2\alpha)\,v - 2(2\alpha+1)\,x\,v' + (1-x^2)\,v''\Bigr] = 0.$$

We see that the term $\frac{1}{1-x^2}$ entirely cancels by $\alpha = \frac{m}{2}$, which fixes the substitution

$$y = \sqrt{1-x^2}^{\,m}\,v. \qquad (\mathrm{A.25})$$

Note that for rotationally symmetric solutions around the Cartesian z coordinate, the choice of $m = 0$ would ensure a constant azimuthal part $\Phi_m = \mathrm{const}$. Re-inserting $x = \zeta = \cos\vartheta$, the preceding factor $\sqrt{1-\cos^2\vartheta}^{\,m} = \sin^m\vartheta$ is understandably required to represent shapes that aren't rotationally symmetric around z, but around any other, freely rotated axis, for which we also required the sinusoids in 2D. The differential equation for $v = v(\cos\vartheta)$ is

$$(1-x^2)\,v'' - 2(m+1)\,x\,v' + [\lambda - m(m+1)]\,v = 0. \qquad (\mathrm{A.26})$$

Still, the above equation is singular at $x = \pm 1$, which means that the second-derivative term multiplied by $(1-x^2)$ vanishes there, locally rendering the differential equation into a first-order one. Instead of the more comprehensive Frobenius method, we keep it simple: desired spherical polynomials $Y_n^m = \Phi_m\,\Theta_n^m = P_n(\theta_x, \theta_y, \theta_z)$, whose azimuthal factor $\Phi_m(\varphi)$ is, up to normalization, an mth-order polynomial in $\frac{\theta_x}{\sqrt{1-\theta_z^2}}$ and $\frac{\theta_y}{\sqrt{1-\theta_z^2}}$, imply that $\Theta_n^m$ must contain $\sqrt{1-\theta_z^2}^{\,m}$ times an $(n-m)$th-order polynomial in $\theta_z$ to be polynomial and of nth order; in condensed notation this is $y = \sqrt{1-x^2}^{\,m}\sum_{k=0}^{n-m} a_k x^k$, see also [4].

Power series for v. With $v = \sum_{k=0}^{\infty} a_k x^k$, we get after inserting and deriving

$$(1-x^2)\sum_{k=2}^{\infty} k(k-1)\,a_k x^{k-2} - 2(m+1)\,x\sum_{k=1}^{\infty} k\,a_k x^{k-1} + [\lambda - m(m+1)]\sum_{k=0}^{\infty} a_k x^k = 0,$$

$$\sum_{k=2}^{\infty} k(k-1)\,a_k x^{k-2} - \sum_{k=2}^{\infty} k(k-1)\,a_k x^{k} - 2(m+1)\sum_{k=1}^{\infty} k\,a_k x^{k} + [\lambda - m(m+1)]\sum_{k=0}^{\infty} a_k x^k = 0.$$

For $k \ge 2$, all sum terms are present, and the comparison of coefficients for the kth power yields:

$$(k+1)(k+2)\,a_{k+2} = \bigl[k(k-1) + 2(m+1)k - \lambda + m(m+1)\bigr]a_k,$$
$$a_{k+2} = \frac{k(k+2m+1) + m(m+1) - \lambda}{(k+1)(k+2)}\,a_k.$$

Typically for such a two-step recurrence, the two starting conditions $a_0 = 1$, $a_1 = 0$ and $a_0 = 0$, $a_1 = 1$ yield a pair of linearly independent solutions (even and odd). If the series in x should converge, it will most certainly do so when v is polynomial and stops at some order. To design y to be of some arbitrary finite order $n \in \mathbb{Z}$, we take into account that $\sqrt{1-x^2}^{\,m}$ is of mth order already, so the polynomial v must be of $(n-m)$th order, and $|m| \le n$.
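The termination of this power series can be checked numerically: a sketch that builds $v$ from the recurrence above with $\lambda = n(n+1)$ (the choice that makes the numerator vanish at $k = n - m$), and compares $y = \sqrt{1-x^2}^{\,m} v$ with SciPy's associated Legendre function up to a scale factor; the function name is ours:

```python
import numpy as np
from scipy.special import lpmv

def assoc_legendre_series(n, m, x):
    # Polynomial part v from the two-step recurrence
    #   a_{k+2} = [k(k+2m+1) + m(m+1) - lam] / [(k+1)(k+2)] * a_k,
    # with lam = n(n+1), so the series terminates at k = n - m;
    # the prefactor sqrt(1-x^2)^m restores y = sqrt(1-x^2)^m * v.
    lam = n * (n + 1)
    a = np.zeros(n - m + 1)
    a[(n - m) % 2] = 1.0  # even/odd starting condition, matching the parity of n - m
    for k in range((n - m) % 2, n - m - 1, 2):
        a[k + 2] = (k*(k + 2*m + 1) + m*(m + 1) - lam) / ((k + 1)*(k + 2)) * a[k]
    v = sum(ak * x**k for k, ak in enumerate(a))
    return np.sqrt(1.0 - x**2)**m * v

x = np.linspace(-0.9, 0.9, 7)
f = assoc_legendre_series(3, 2, x)  # proportional to P_3^2
g = lpmv(2, 3, x)                   # SciPy's associated Legendre function
```

Because the series starts from an arbitrary unit coefficient, the two results agree only up to the overall normalization that the Rodrigues formula will fix below.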
The series terminates at the coefficient $a_k$ for $k = n-m$ if the numerator is forced to zero by a suitably chosen $\lambda$, thus $\lambda = (n-m)(n+m+1) + m(m+1) = n(n+1)$. Corresponding to the termination at either an even or odd $k = n-m$, the even ($a_0 = 1$, $a_1 = 0$) or odd ($a_0 = 0$, $a_1 = 1$) starting conditions must be chosen. The otherwise wrong-parity solution is an infinite series [5, Eq. 3.2.45] whose convergence radius $R$ indicates singularities at $x = \pm1$:
\[
R = \lim_{k\to\infty}\left|\frac{a_k}{a_{k+2}}\right| = \lim_{k\to\infty}\frac{(k+1)(k+2)}{k(k+2m+1) + m(m+1) - n(n+1)} = \lim_{k\to\infty}\frac{k^2+\cdots}{k^2+\cdots} = 1.
\tag{A.27}\]
Using $\lambda = n(n+1)$ and writing the differentials in condensed form, the defining differential equations for the associated Legendre functions $P_n^m$ ($m$ is no exponent but a second index) and their polynomial part $v_n^m$ become
\[
\frac{d}{dx}\Big[(1-x^2)\frac{d}{dx}P_n^m\Big] + \Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m = 0,
\tag{A.28}\]
\[
(1-x^2)^{-m}\frac{d}{dx}\Big[(1-x^2)^{m+1}\frac{d}{dx}v_n^m\Big] + \big[n(n+1) - m(m+1)\big]v_n^m = 0.
\tag{A.29}\]
Orthogonality of associated Legendre functions. The resulting associated Legendre differential equation
\[
\big[(1-x^2)\,P_n^{m\,\prime}\big]' + \Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^m = 0
\]
yields a sequence of finite-order functions $P_n^m$ with the order $n \in \mathbb{N}_0$ and $|m| \le n$. Before even defining these functions, we can prove their orthogonality $\int_{-1}^{1}P_n^mP_l^m\,dx = 0$ for $n \ne l$. This means no product of a pair of associated Legendre functions of different indices $n \ne l$ produces any constant part on $x \in [-1;1]$, and $P_n^m$ and $P_l^m$ do not contain shapes of the respective other function. This is important to uniquely decompose shapes and to define transformation integrals. We multiply the differential equation with $P_l^m$ and integrate it over $x$:
\[
\int_{-1}^{1}\big[(1-x^2)P_n^{m\,\prime}\big]'P_l^m\,dx + \int_{-1}^{1}\Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^mP_l^m\,dx = 0.
\]
Integration by parts of the first integral yields
\[
\int_{-1}^{1}\big[(1-x^2)P_n^{m\,\prime}\big]'P_l^m\,dx = \underbrace{\Big[(1-x^2)\,P_n^{m\,\prime}P_l^m\Big]_{-1}^{1}}_{=0} - \int_{-1}^{1}(1-x^2)\,P_n^{m\,\prime}P_l^{m\,\prime}\,dx,
\tag{A.30}\]
where the vanishing part is because of $(1-x^2) = 0$ at the endpoints $x = \pm1$, where $P_n^{m\,\prime}$ and $P_l^m$ are finite. We get
\[
\int_{-1}^{1}(1-x^2)\,P_n^{m\,\prime}P_l^{m\,\prime}\,dx = \int_{-1}^{1}\Big[n(n+1) - \frac{m^2}{1-x^2}\Big]P_n^mP_l^m\,dx.
\]
We could have arrived at an alternative expression, with the only difference in $l(l+1)$ instead of $n(n+1)$,
\[
\int_{-1}^{1}(1-x^2)\,P_l^{m\,\prime}P_n^{m\,\prime}\,dx = \int_{-1}^{1}\Big[l(l+1) - \frac{m^2}{1-x^2}\Big]P_l^mP_n^m\,dx,
\]
if we had started by integrating the differential equation of $P_l^m$ against $P_n^m$ instead. The difference of both equations is
\[
\big[n(n+1) - l(l+1)\big]\int_{-1}^{1}P_n^mP_l^m\,dx = 0,
\]
and the scalar in brackets only vanishes for $n = l$. For the equation to hold at any other $n \ne l$, we conclude that the associated Legendre functions must be orthogonal, $\int_{-1}^{1}P_n^mP_l^m\,dx = 0$. (Orthogonality need not hold for different $m$, as the azimuthal functions already achieve this orthogonality in azimuth.)

Solving for the polynomial part of the associated Legendre functions. To solve the differential equation for the polynomial part $v_n^m$ in a way that arrives at the elegant Rodrigues formula, we first play with a test function $u_n = (1-x^2)^n$, differentiated: $u_n' = -2nx(1-x^2)^{n-1} = -2n(1-x^2)^{-1}x\,u_n$. We may write its derivative as the differential equation
\[
(1-x^2)\,u_n' + 2nx\,u_n = 0
\]
and derive it $l$ times by the Leibniz rule $(fg)^{(l)} = \sum_{k=0}^{l}\binom{l}{k}f^{(k)}g^{(l-k)}$ for repeated differentiation of products, with the binomial coefficient $\binom{l}{k} = \frac{l!}{k!(l-k)!}$ and $f^{(k)} = \frac{d^kf}{dx^k}$ for simplicity. The few non-zero derivatives of $x$ and $(1-x^2)$ simplify differentiation, $x' = 1$, $(1-x^2)' = -2x$, $(1-x^2)'' = -2$:
\[
(1-x^2)\,u_n^{(l+1)} + l\,(-2x)\,u_n^{(l)} + \tfrac{(l-1)l}{2}(-2)\,u_n^{(l-1)} + 2nx\,u_n^{(l)} + 2nl\,u_n^{(l-1)} = 0,
\]
\[
(1-x^2)\,u_n^{(l+1)} - 2(l-n)\,x\,u_n^{(l)} + l(2n-l+1)\,u_n^{(l-1)} = 0.
\]
This equation matches
\[
(1-x^2)\,v_n^{m\,\prime\prime} - 2(m+1)\,x\,v_n^{m\,\prime} + \big[n(n+1) - m(m+1)\big]v_n^m = 0
\]
by matching the coefficients $l - n = m + 1$, hence $l = m + n + 1$, which nicely implies $l(2n-l+1) = n(n+1) - m(m+1)$:
\[
(1-x^2)\,u_n^{(m+n+2)} - 2(m+1)\,x\,u_n^{(m+n+1)} + \big[n(n+1) - m(m+1)\big]u_n^{(m+n)} = 0.
\]
We therefore find the solutions
\[
v_n^m = u_n^{(n+m)} = \frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n, \qquad\text{yielding}\qquad y_n^m = \sqrt{1-x^2}^{\,m}\,\frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n.
\]
Rodrigues formula. By the above, the Rodrigues formula for the associated Legendre functions $P_n^m$ becomes
\[
P_n^m = \frac{(-1)^{n+m}}{2^nn!}\,\sqrt{1-x^2}^{\,m}\,\frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n
\qquad\text{or}\qquad
P_n^m = (-1)^m\sqrt{1-x^2}^{\,m}\,\frac{d^m}{dx^m}P_n,
\tag{A.31}\]
with
\[
P_n = \frac{(-1)^n}{2^nn!}\,\frac{d^n}{dx^n}(1-x^2)^n,
\]
and $P_n = P_n^0$ are the Legendre polynomials. The Legendre polynomials are normalized to $P_n(1) = 1$ by the factor $\frac{(-1)^n}{2^nn!}$. Because $(1-x^2)$ is zero at $x = 1$ with any positive integer exponent, only the part of its $n$-fold derivative that exclusively differentiates the power $(1-x^2)^n$ all $n$ times is responsible for its value there: $n!\,(-2x)^n(1-x^2)^0\big|_{x=1} = n!\,2^n(-1)^n$. The scaling of the associated Legendre functions with $m > 0$ is somewhat more arbitrary in sign and value.

Indices $n$ and $m$. The boundaries for the index $m \in \mathbb{Z}$ of the Legendre functions are typically $-n \le m \le n$; however, due to the shift of the eigenvalue by $\frac{m^2}{1-x^2}$, the functions for positive and negative $m$ are linearly dependent. We observe this by inspecting the highest-order terms in [6]:
\[
\frac{2^nn!\;P_n^m}{(-1)^{n+m}\sqrt{1-x^2}^{\,m}} = \frac{d^{n+m}}{dx^{n+m}}(1-x^2)^n = (-1)^n\Big[\frac{(2n)!}{(n-m)!}\,x^{n-m} - \cdots\Big],
\]
\[
\frac{2^nn!\;P_n^{-m}}{(-1)^{n-m}\sqrt{1-x^2}^{\,-m}} = \frac{d^{n-m}}{dx^{n-m}}(1-x^2)^n = (-1)^n\Big[\frac{(2n)!}{(n+m)!}\,x^{n+m} - \cdots\Big],
\]
so that comparing the leading terms (with $(1-x^2)^{-m} \approx (-x^2)^{-m}$ for the formal comparison of highest orders) yields
\[
P_n^{-m} = (-1)^m\,\frac{(n-m)!}{(n+m)!}\,P_n^m,
\tag{A.32}\]
and to avoid confusion, it is convenient to only use $m \ge 0$, i.e. $|m|$, to evaluate the associated Legendre functions.
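The Rodrigues formula (A.31) and the linear dependence (A.32) can be checked with exact polynomial arithmetic; a small Python sketch (the helper names are ours, chosen for illustration, not the book's):

```python
from math import comb, factorial

def one_minus_x2_pow(n):
    # coefficient list of (1 - x^2)^n in ascending powers of x
    c = [0] * (2 * n + 1)
    for k in range(n + 1):
        c[2 * k] = (-1) ** k * comb(n, k)
    return c

def diff(c, times):
    # exact repeated differentiation of a coefficient list
    for _ in range(times):
        c = [i * c[i] for i in range(1, len(c))]
    return c

def assoc_legendre(n, m, x):
    # Rodrigues formula (A.31), usable for -n <= m <= n
    d = diff(one_minus_x2_pow(n), n + m)
    poly = sum(ci * x ** i for i, ci in enumerate(d))
    return (-1) ** (n + m) / (2 ** n * factorial(n)) * (1 - x * x) ** (m / 2) * poly
```

For example, `assoc_legendre(3, 0, x)` reproduces $P_3 = (5x^3-3x)/2$, and `assoc_legendre(3, -2, x)` equals $\frac{1!}{5!}$ times `assoc_legendre(3, 2, x)`, as predicted by (A.32).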
Alternative definition: three-term recurrence. Any polynomial of the order $n$ can be decomposed into Legendre polynomials, $\sum_{i=0}^{n}c_iP_i$, and the Legendre polynomial $P_j$ is orthogonal to all those Legendre polynomials, $\int_{-1}^{1}P_nP_j\,dx = 0$, if $j > n$. With this knowledge it is interesting to describe $\int_{-1}^{1}(xP_i)P_j\,dx$. As $(xP_i)$ is of $(i+1)$th order, the integral must vanish for $j > i+1$. Because of commutativity it equals $\int_{-1}^{1}P_i(xP_j)\,dx$, and with $(xP_j)$ being of $(j+1)$th order, it also vanishes for $i > j+1$. Hereby, the re-expansion of $xP_n$ can maximally use three terms, $xP_n = \alpha P_{n-1} + \gamma P_n + \beta P_{n+1}$. In fact, only two terms remain, as $P_{2k}$ are even functions on $x \in [-1;1]$ and $P_{2k+1}$ are odd, thus orthogonal; the product $xP_n$ changes the parity of $P_n$, leaving $xP_n = \alpha P_{n-1} + \beta P_{n+1}$. At $x = 1$ all polynomials are normalized to $P_i(1) = 1$, therefore evaluation at $x = 1$ leaves $1 = \alpha + \beta$, so $\alpha = 1-\beta$, hence
\[
xP_n = \beta_nP_{n+1} + (1-\beta_n)P_{n-1}.
\]
As also the associated Legendre functions $P_n^m$ for a specific $m$ are orthogonal, the recurrence is more general:
\[
xP_n^m = \beta_n^mP_{n+1}^m + (1-\beta_n^m)P_{n-1}^m.
\]
To determine the coefficient $\beta_n^m$, we only need to find out how the highest-power coefficients $x^{n-m+1}$ of the polynomial parts in $xP_n^m$ and $P_{n+1}^m$ are related. We see this after inserting $P_n^m = \frac{(-1)^{n+m}}{2^nn!}\sqrt{1-x^2}^{\,m}\,v_n^m$ and division by $\sqrt{1-x^2}^{\,m}\frac{(-1)^{n+m}}{2^nn!}$, which leaves a recurrence for the polynomial part:
\[
\underbrace{x\,v_n^m}_{O=n-m+1} = -\frac{\beta_n^m}{2(n+1)}\underbrace{v_{n+1}^m}_{O=n-m+1} - 2n\,(1-\beta_n^m)\underbrace{v_{n-1}^m}_{O=n-m-1}.
\]
Of the highest powers $x^{n-m+1}$ in both $x\,v_n^m$ and $v_{n+1}^m$, the coefficients $c_{n,n-m}^m$ and $c_{n+1,n-m+1}^m$ define
\[
\beta_n^m = -2(n+1)\,\frac{c_{n,n-m}^m}{c_{n+1,n-m+1}^m}.
\]
To find it, we binomially expand $(1-x^2)^n = (-1)^n\sum_{k=0}^{n}(-1)^k\binom{n}{k}x^{2(n-k)}$:
\[
\frac{v_n^m}{(-1)^n} = \frac{d^{n+m}}{dx^{n+m}}\sum_{k=0}^{n}(-1)^k\binom{n}{k}x^{2(n-k)} = \sum_{k=0}^{\lfloor\frac{n-m}{2}\rfloor}(-1)^k\binom{n}{k}\frac{(2n-2k)!}{(n-m-2k)!}\,x^{n-m-2k},
\]
so that with $k = 0$ we can find $c_{n,n-m}^m = (-1)^n\frac{(2n)!}{(n-m)!}$ for the highest-power coefficient of $v_n^m$. Accordingly, the coefficient of the recurrence is
\[
\beta_n^m = 2(n+1)\,\frac{(2n)!/(n-m)!}{(2n+2)!/(n-m+1)!} = \frac{2(n+1)(n-m+1)}{(2n+1)(2n+2)} = \frac{n-m+1}{2n+1},
\]
hence with $1-\beta_n^m = \frac{n+m}{2n+1}$ and $xP_n^m = \frac{n-m+1}{2n+1}P_{n+1}^m + \frac{n+m}{2n+1}P_{n-1}^m$, we can construct $P_n^m$ recursively by
\[
P_{n+1}^m = \frac{2n+1}{n-m+1}\,xP_n^m - \frac{n+m}{n-m+1}\,P_{n-1}^m.
\tag{A.33}\]
The start value is $P_n^n = \frac{1}{2^nn!}\sqrt{1-x^2}^{\,n}\frac{d^{2n}}{dx^{2n}}(1-x^2)^n = \frac{(-1)^n(2n)!}{2^nn!}\sqrt{1-x^2}^{\,n}$, and for $n = m$ the term $P_{n-1}^m$ is excluded.

Normalization. A unity square integral (orthonormalization) simplifies the definition of transform integrals. We would like to obtain the corresponding factor $N_n^m$ with
\[
\int_{-1}^{1}(P_n^mN_n^m)^2\,dx = 1.
\]
Normalization for $m = 0$ is easy to find by repeated integration by parts:
\[
\frac{2^{2n}n!^2}{(N_n)^2} = 2^{2n}n!^2\int_{-1}^{1}P_n^2\,dx = \int_{-1}^{1}\big[(1-x^2)^n\big]^{(n)}\big[(1-x^2)^n\big]^{(n)}dx
\]
\[
= \underbrace{\Big[\big[(1-x^2)^n\big]^{(n-1)}\big[(1-x^2)^n\big]^{(n)}\Big]_{-1}^{1}}_{=0} - \int_{-1}^{1}\big[(1-x^2)^n\big]^{(n-1)}\big[(1-x^2)^n\big]^{(n+1)}dx = \cdots
\]
\[
= (-1)^n\int_{-1}^{1}(1-x^2)^n\big[(1-x^2)^n\big]^{(2n)}dx = \int_{-1}^{1}(1-x^2)^n\,(2n)!\,dx = (2n)!\int_{0}^{\pi}\sin^{2n}\vartheta\,\sin\vartheta\,d\vartheta.
\]
With the integral $\int_0^\pi\sin^{2n+1}\vartheta\,d\vartheta = 2\frac{(2n)!!}{(2n+1)!!} = 2\frac{2^{2n}n!^2}{(2n+1)!}$, this is $(N_n)^{-2} = \frac{(2n)!}{2^{2n}n!^2}\,\frac{2\cdot2^{2n}n!^2}{(2n+1)!} = \frac{2}{2n+1}$. For $N_n^m$, a trick inserting the relation between $P_n^m$ and $P_n^{-m}$ is used [6], with integration by parts until the differentials are of the same order:
\[
\frac{1}{(N_n^m)^2} = \int_{-1}^{1}P_n^mP_n^m\,dx = (-1)^m\frac{(n+m)!}{(n-m)!}\int_{-1}^{1}P_n^{-m}P_n^m\,dx
= (-1)^m\frac{(n+m)!}{(n-m)!}\,\frac{1}{2^{2n}n!^2}\int_{-1}^{1}\big[(1-x^2)^n\big]^{(n-m)}\big[(1-x^2)^n\big]^{(n+m)}dx
\]
\[
= \frac{(n+m)!}{(n-m)!}\,\frac{1}{2^{2n}n!^2}\int_{-1}^{1}\Big[\big[(1-x^2)^n\big]^{(n)}\Big]^2dx = \frac{(n+m)!}{(n-m)!}\underbrace{\frac{2}{2n+1}}_{=1/(N_n)^2}
\qquad\Rightarrow\qquad
N_n^m = (-1)^m\sqrt{\frac{(2n+1)}{2}\,\frac{(n-m)!}{(n+m)!}}.
\tag{A.34}\]
The $(-1)^m$ can be excluded if not used in the Rodrigues formula. (It is always a wise idea to check and compare signs, as conventions may differ … in practice, $(-1)^m$ is a rotation around $z$ by $180^\circ$.)

A.3.7 Spherical Harmonics

With all the above definitions, we obtain the fully normalized spherical harmonics
\[
Y_n^m(\varphi,\vartheta) = N_n^{|m|}\,P_n^{|m|}(\cos\vartheta)\,\Phi_m(\varphi).
\tag{A.35}\]
Orthonormality.
They are orthonormal when integrated over the sphere:
\[
\int_{\mathbb{S}^2}Y_n^mY_{n'}^{m'}\,d\cos\vartheta\,d\varphi = \delta_{nn'}\delta_{mm'}.
\tag{A.36}\]
Transform integral. Because of their completeness in the Hilbert space, any square-integrable function $g(\varphi,\vartheta)$ can be decomposed by
\[
g(\varphi,\vartheta) = \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'}\gamma_{n'm'}\,Y_{n'}^{m'}(\varphi,\vartheta).
\tag{A.37}\]
From a known function $g(\varphi,\vartheta)$, the coefficients are obtained by integrating $g$ with another spherical harmonic $Y_n^m$ over the unit sphere $\mathbb{S}^2$, $\int_{-1}^{1}d\cos\vartheta\int_{0}^{2\pi}d\varphi$. For a simple notation, we gather the two variables in a direction vector $\boldsymbol\theta = [\cos\varphi\sin\vartheta,\ \sin\varphi\sin\vartheta,\ \cos\vartheta]^{\mathrm T}$ and write
\[
\int_{\mathbb{S}^2}g(\boldsymbol\theta)\,Y_n^m\,d\boldsymbol\theta = \sum_{n'=0}^{\infty}\sum_{m'=-n'}^{n'}\gamma_{n'm'}\underbrace{\int_{\mathbb{S}^2}Y_{n'}^{m'}Y_n^m\,d\boldsymbol\theta}_{\delta_{nn'}\delta_{mm'}} = \gamma_{nm}.
\tag{A.38}\]
Parseval's theorem. Due to orthonormality, the integral norm of any pattern $g(\boldsymbol\theta)$ composed as $\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}Y_n^m(\boldsymbol\theta)$ is equivalent to
\[
\int_{\mathbb{S}^2}|g(\boldsymbol\theta)|^2\,d\boldsymbol\theta = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\gamma_{nm}|^2,
\tag{A.39}\]
because $\int_{\mathbb{S}^2}\sum_{n,n',m,m'}\gamma_{nm}\gamma_{n'm'}^{*}\,Y_n^m(\boldsymbol\theta)\,Y_{n'}^{m'}(\boldsymbol\theta)\,d\boldsymbol\theta = \sum_{n,n',m,m'}\gamma_{nm}\gamma_{n'm'}^{*}\,\delta_{nn'}\delta_{mm'}$.

3D panning functions: Dirac delta on the sphere. An infinitely narrow range around the desired direction $\boldsymbol\theta_{\mathrm s}$ can be described by limiting the dot product $\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol\theta > \cos\varepsilon \to 1$. A unit-surface Dirac delta distribution $\delta(1-\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol\theta)$ can be described as
\[
\delta(1-\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol\theta) = \frac{1}{2\pi}\lim_{\varepsilon\to0}
\begin{cases}
\frac{1}{1-\cos\varepsilon}, & \text{for } \arccos\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol\theta < \varepsilon,\\
0, & \text{otherwise,}
\end{cases}
\tag{A.40}\]
and its coefficients are found by the transformation integral
\[
\gamma_{nm} = \int_{\mathbb{S}^2}\delta(1-\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol\theta)\,Y_n^m(\boldsymbol\theta)\,d\boldsymbol\theta = Y_n^m(\boldsymbol\theta_{\mathrm s})\int_{0}^{2\pi}d\varphi\,\lim_{\varepsilon\to0}\int_{\cos\varepsilon}^{1}\frac{d\zeta}{2\pi(1-\cos\varepsilon)} = Y_n^m(\boldsymbol\theta_{\mathrm s}).
\tag{A.41}\]
Typically, a finite-order panning function with $n \le \mathrm N$ employs a weight $a_n$ to reduce side lobes:
\[
g_{\mathrm N}(\boldsymbol\theta) = \sum_{n=0}^{\mathrm N}\sum_{m=-n}^{n}a_n\,Y_n^m(\boldsymbol\theta_{\mathrm s})\,Y_n^m(\boldsymbol\theta).
\tag{A.42}\]
Assuming the panning direction is $\boldsymbol\theta_{\mathrm s} = [0,0,1]^{\mathrm T}$, we get the axisymmetric panning function, with $Y_n^0 = \sqrt{\frac{2n+1}{4\pi}}\,P_n$,
\[
g_{\mathrm N}(\vartheta) = \frac{1}{2\pi}\sum_{n=0}^{\mathrm N}\frac{2n+1}{2}\,a_n\,P_n(\cos\vartheta).
\tag{A.43}\]
We can evaluate its $E$ measure by integrating $g_{\mathrm N}^2$ over the sphere:
\[
E = \int_{0}^{2\pi}d\varphi\int_{-1}^{1}g_{\mathrm N}(\zeta)^2\,d\zeta = \frac{1}{2\pi}\sum_{i,j}\frac{2i+1}{2}\,\frac{2j+1}{2}\,a_ia_j\underbrace{\int_{-1}^{1}P_iP_j\,d\zeta}_{\frac{2}{2i+1}\delta_{ij}} = \sum_{n=0}^{\mathrm N}\frac{2n+1}{4\pi}\,a_n^2.
\tag{A.44}\]
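The closed form (A.44) of the $E$ measure can be cross-checked against direct numerical integration of $g_{\mathrm N}^2$ over the sphere; a minimal pure-Python sketch (the helper names are ours):

```python
from math import pi

def legendre(n, x):
    # Bonnet recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def g(zeta, a):
    # axisymmetric panning function, Eq. (A.43)
    return sum((2 * n + 1) / (4 * pi) * an * legendre(n, zeta)
               for n, an in enumerate(a))

def energy_closed(a):
    # E = sum_n (2n+1) a_n^2 / (4 pi), Eq. (A.44)
    return sum((2 * n + 1) * an ** 2 for n, an in enumerate(a)) / (4 * pi)

def energy_numeric(a, steps=20000):
    # E = 2 pi * integral_{-1}^{1} g(zeta)^2 dzeta, midpoint rule
    h = 2.0 / steps
    return 2 * pi * h * sum(g(-1 + (i + 0.5) * h, a) ** 2 for i in range(steps))
```

Both paths agree to numerical precision for any weight choice, e.g. `a = [1.0, 0.8, 0.4]`.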
The $r_{\mathrm E}$ measure is, because of the axisymmetry, perfectly aligned with $z$; therefore, its length is calculated by
\[
r_{\mathrm E} = \frac{2\pi\int_{-1}^{1}g_{\mathrm N}(\zeta)^2\,\zeta\,d\zeta}{E} = \frac{1}{2\pi E}\sum_{i,j}\frac{2i+1}{2}\,\frac{2j+1}{2}\,a_ia_j\int_{-1}^{1}P_i\,\frac{(j+1)P_{j+1}+jP_{j-1}}{2j+1}\,d\zeta
\]
\[
= \frac{\sum_{n=0}^{\mathrm N}\big[n\,a_na_{n-1} + (n+1)\,a_na_{n+1}\big]}{4\pi E} = \frac{\sum_{n=1}^{\mathrm N}2n\,a_na_{n-1}}{4\pi E},
\tag{A.45}\]
using $\zeta P_j = \frac{(j+1)P_{j+1}+jP_{j-1}}{2j+1}$.

3D max-$r_{\mathrm E}$. For the narrowest-possible spread, we maximize $r_{\mathrm E}$, which we decompose into $r_{\mathrm E} = \frac{\hat r_{\mathrm E}}{E}$, and we zero its derivative, as for 2D:
\[
\frac{\partial r_{\mathrm E}}{\partial a_n} = \frac{1}{E}\frac{\partial\hat r_{\mathrm E}}{\partial a_n} - \frac{\hat r_{\mathrm E}}{E^2}\frac{\partial E}{\partial a_n} = \frac{1}{E}\Big[\frac{\partial\hat r_{\mathrm E}}{\partial a_n} - \frac{\partial E}{\partial a_n}\,r_{\mathrm E}\Big] = 0
\quad\Rightarrow\quad
n\,a_{n-1} + (n+1)\,a_{n+1} - (2n+1)\,a_n\,r_{\mathrm E} = 0.
\tag{A.46}\]
If we assume that $a_n = P_n(\zeta)$, we see by
\[
(n+1)\,P_{n+1} + n\,P_{n-1} = (2n+1)\,P_n\,\zeta
\tag{A.47}\]
that $r_{\mathrm E} = \zeta$ and $a_n = P_n(\zeta) = P_n(r_{\mathrm E})$. We maximize $r_{\mathrm E}$ under the constraint $P_{\mathrm N+1}(r_{\mathrm E}) = 0$, which (A.46) enforces at $n = \mathrm N$, where $a_{\mathrm N+1} = 0$. Therefore, $r_{\mathrm E}$ must be as close to 1 as possible and be a zero of the Legendre polynomial $P_{\mathrm N+1}$, i.e. its largest zero. It can be discovered by a root-finding algorithm in MATLAB, e.g. Newton–Raphson, once the function $P_n$ is implemented. In [7], the useful approximation $r_{\mathrm E} = \cos\frac{2.4062}{\mathrm N+1.51} = \cos\frac{137.9^\circ}{\mathrm N+1.51}$ was given.

Squared norm mirror/rotation invariance. The norm of any pattern $a(\boldsymbol\theta)$ is invariant under an orthogonal coordinate transform (rotation/mirror) $\hat{\boldsymbol\theta} = \boldsymbol R\boldsymbol\theta$ with $\boldsymbol R^{\mathrm T}\boldsymbol R = \boldsymbol I$:
\[
\int_{\mathbb{S}^2}a^2(\boldsymbol\theta)\,d\boldsymbol\theta = \int_{\mathbb{S}^2}b^2(\boldsymbol\theta)\,d\boldsymbol\theta,
\qquad\text{with } b(\boldsymbol\theta) = a(\boldsymbol R\boldsymbol\theta).
\tag{A.48}\]
The norm equivalence of the corresponding spherical harmonic coefficients $\alpha_{nm}$ and $\beta_{nm}$ follows from Parseval's theorem:
\[
\sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\alpha_{nm}|^2 = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}|\beta_{nm}|^2.
\tag{A.49}\]
In vector notation of the coefficients, i.e. $\boldsymbol\alpha = [\alpha_{00},\dots,\alpha_{\mathrm{NN}}]^{\mathrm T}$ and $\boldsymbol\beta = [\beta_{00},\dots,\beta_{\mathrm{NN}}]^{\mathrm T}$, this is $\|\boldsymbol\alpha\|^2 = \|\boldsymbol\beta\|^2$. To fulfill this equivalence, both vectors are related by an orthogonal matrix $\boldsymbol\beta = \boldsymbol Q\boldsymbol\alpha$ with $\boldsymbol Q^{\mathrm T}\boldsymbol Q = \boldsymbol I$, and hence $\|\boldsymbol\beta\|^2 = \boldsymbol\beta^{\mathrm T}\boldsymbol\beta = \boldsymbol\alpha^{\mathrm T}\boldsymbol Q^{\mathrm T}\boldsymbol Q\boldsymbol\alpha = \boldsymbol\alpha^{\mathrm T}\boldsymbol\alpha = \|\boldsymbol\alpha\|^2$. Moreover, rotation/mirroring creates neither components of higher nor of lower orders, so that $\boldsymbol Q$ must be block-structured:
\[
\boldsymbol Q = \begin{bmatrix}\boldsymbol Q_0 & \boldsymbol 0 & \cdots & \boldsymbol 0\\ \boldsymbol 0 & \boldsymbol Q_1 & & \vdots\\ \vdots & & \ddots & \end{bmatrix}.
\tag{A.50}\]
The order subspaces in $n$ therefore stay de-coupled, so that the coefficient vectors for every order subspace $n$ are related and norm-equivalent under mirror/rotation operations:
\[
\|\boldsymbol\alpha_n\|^2 = \|\boldsymbol\beta_n\|^2,\qquad \boldsymbol\beta_n = \boldsymbol Q_n\boldsymbol\alpha_n,\qquad \boldsymbol Q_n^{\mathrm T}\boldsymbol Q_n = \boldsymbol I_{2n+1}.
\tag{A.51}\]
Pseudo all-pass character of the Dirac delta. Dirac delta distributions $\delta(\boldsymbol\theta^{\mathrm T}\boldsymbol\theta_{\mathrm s}-1)$ yield the coefficients $Y_n^m(\boldsymbol\theta_{\mathrm s})$, and due to rotation invariance they yield constant energy in every spherical harmonic order $n$, regardless of the aiming $\boldsymbol\theta_{\mathrm s}$. One can determine the norm for zenithal aiming $\vartheta = 0$, i.e. $\boldsymbol\theta_z = [0,0,1]^{\mathrm T}$, yielding a non-zero coefficient only for $m = 0$:
\[
Y_n^m(\boldsymbol\theta_z) = \sqrt{\tfrac{2n+1}{4\pi}}\,\underbrace{P_n(1)}_{=1}\,\delta_m = \sqrt{\tfrac{2n+1}{4\pi}}\,\delta_m.
\]
Because of the rotation invariance we recognize a pseudo-allpass character (Unsöld's theorem) of the spherical harmonics of any order $n$:
\[
\sum_{m=-n}^{n}|Y_n^m(\boldsymbol\theta_{\mathrm s})|^2 = \sum_{m=-n}^{n}|Y_n^m(\boldsymbol\theta_z)|^2 = \frac{2n+1}{4\pi} = (2n+1)\,|Y_0^0(\boldsymbol\theta_{\mathrm s})|^2.
\]
For encoded single-direction Ambisonic signals $\alpha_{nm}(t)$, this implies
\[
\sum_{m=-n}^{n}|\alpha_{nm}(t)|^2 = (2n+1)\,|\alpha_{00}(t)|^2.
\tag{A.52}\]
Expected norm in the diffuse field. An ideal diffuse sound field is composed of directional signals $a(\boldsymbol\theta_{\mathrm s},t)$ from all directions, with no correlation for signals from different directions, $E\{a(\boldsymbol\theta_1,t)\,a(\boldsymbol\theta_2,t)\} = \sigma_a^2\,\delta(\boldsymbol\theta_1^{\mathrm T}\boldsymbol\theta_2-1)$. Its coefficients are obtained by the integral over the directions
\[
\alpha_{nm}(t) = \int_{\mathbb{S}^2}a(\boldsymbol\theta_{\mathrm s},t)\,Y_n^m(\boldsymbol\theta_{\mathrm s})\,d\boldsymbol\theta_{\mathrm s},
\tag{A.53}\]
and we can show that not only the directional signals but also the spherical harmonic coefficients are expected to be orthogonal, by the above correlation and the orthonormality $\int Y_n^mY_{n'}^{m'}\,d\boldsymbol\theta = \delta_{nn'}\delta_{mm'}$ of the spherical harmonics:
\[
E\{\alpha_{nm}(t)\,\alpha_{n'm'}(t)\} = \int_{\mathbb{S}^2}\!\int_{\mathbb{S}^2}E\{a(\boldsymbol\theta_1,t)\,a(\boldsymbol\theta_2,t)\}\,Y_n^m(\boldsymbol\theta_1)\,Y_{n'}^{m'}(\boldsymbol\theta_2)\,d\boldsymbol\theta_1d\boldsymbol\theta_2 = \sigma_a^2\int_{\mathbb{S}^2}Y_n^m(\boldsymbol\theta)\,Y_{n'}^{m'}(\boldsymbol\theta)\,d\boldsymbol\theta = \sigma_a^2\,\delta_{nn'}\delta_{mm'}.
\tag{A.54}\]
In a perfectly diffuse field, we therefore expect the same norm in every spherical harmonic component per frequency band.
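Returning to the max-$r_{\mathrm E}$ design above: the text suggests finding the largest zero of $P_{\mathrm N+1}$ by root finding, e.g. Newton–Raphson in MATLAB; an equivalent Python sketch (the helper names are ours), using the approximation from [7] as start value:

```python
from math import cos, pi

def legendre_pair(n, x):
    # returns (P_n(x), P_{n-1}(x)) via the Bonnet recurrence
    p0, p1 = 1.0, x
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1, p0

def max_rE_3d(N):
    # largest zero of P_{N+1}, i.e. r_E of the 3D max-r_E weighting
    x = cos(137.9 / 180 * pi / (N + 1.51))       # approximation as start value
    for _ in range(30):
        p, pm1 = legendre_pair(N + 1, x)
        dp = (N + 1) * (pm1 - x * p) / (1 - x * x)   # (1-x^2) P_n' = n (P_{n-1} - x P_n)
        x -= p / dp                                  # Newton-Raphson step
    return x
```

The max-$r_{\mathrm E}$ weights then follow as $a_n = P_n(\texttt{max\_rE\_3d}(\mathrm N))$; for $\mathrm N = 3$, the exact value $\approx 0.86114$ differs from the approximation $\cos(137.9^\circ/4.51) \approx 0.86090$ by only about $2\cdot10^{-4}$.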
We could reformulate this to $E\{|\alpha_{nm}(t)|^2\} = E\{|\alpha_{00}(t)|^2\}$; however, the temporal disjointness assumption of SDM leaves only drastically thinned-out content in the individual higher-order spherical harmonics. To cover the available temporal information from all $(2n+1)$ spherical harmonic signals within each order $n$, and for a formulation similar to that of a single-direction component, we may re-formulate
\[
\sum_{m=-n}^{n}E\big\{|\alpha_{nm}(t)|^2\big\} = (2n+1)\,E\big\{|\alpha_{00}(t)|^2\big\}.
\tag{A.55}\]
Spherical convolution. By the argumentation used above to prove rotation invariance, we can argue that isotropic filtering of spherical patterns is invariant under rotation, and must therefore depend only on the order $n$. Spherical convolution is defined in [8] by the coefficients $\beta_{nm}$ of a function $b(\boldsymbol\theta)$ convolved with the coefficients $\alpha_n$ of a rotationally symmetric shape $a(\boldsymbol\theta) = a(\vartheta)$:
\[
\gamma_{nm} = \alpha_n\,\beta_{nm}.
\tag{A.56}\]
Spherical cap function. A rotationally symmetric spherical cap function at $\pm\frac{\alpha}{2}$ centered around $\vartheta = 0$, briefly $\boldsymbol\theta_z$, can be written in terms of a unit step. We find the shape coefficients $w_n$ for its spherical harmonic decomposition
\[
u\big(\cos\vartheta - \cos\tfrac{\alpha}{2}\big) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}w_n\,Y_n^m(\boldsymbol\theta)\,Y_n^m(\boldsymbol\theta_z) = \sum_{n=0}^{\infty}w_n\,P_n(\cos\vartheta)\,\frac{2n+1}{4\pi},
\tag{A.57}\]
where $Y_n^m(\boldsymbol\theta_z) = \sqrt{\frac{2n+1}{4\pi}}\,P_n(1)\,\delta_m = \sqrt{\frac{2n+1}{4\pi}}\,\delta_m$ and $P_n^0 = P_n$ were used. The coefficients $w_n$ are obtained by integration $2\pi\int_{-1}^{1}\cdots\,P_{n'}(\cos\vartheta)\,d\cos\vartheta$ over Legendre polynomials, using their orthogonality $\int_{-1}^{1}P_n(\zeta)P_{n'}(\zeta)\,d\zeta = \frac{2}{2n+1}\,\delta_{nn'}$ for the right-hand side, leaving $w_n\,\frac{2n+1}{4\pi}\cdot2\pi\cdot\frac{2}{2n+1} = w_n$, so that
\[
w_n = 2\pi\int_{\cos\frac{\alpha}{2}}^{1}P_n(x)\,dx.
\]
The integral is solved by $2^nn!\,P_n = \frac{d^nv^n}{dx^n}$ with $v = (x^2-1)$, after replacing the innermost derivative by $\frac{d}{dx} = \frac{dv}{dx}\frac{d}{dv} = 2x\frac{d}{dv}$, and the Leibniz rule for repeated derivatives:
\[
\frac{d^nv^n}{dx^n} = \frac{d^{n-1}}{dx^{n-1}}\big(2x\,n\,v^{n-1}\big) = 2n\Big[x\,\frac{d^{n-1}v^{n-1}}{dx^{n-1}} + (n-1)\,\frac{d^{n-2}v^{n-1}}{dx^{n-2}}\Big].
\]
We may increase $n$ by one, observe that the last expression has one fewer differentiation, thus is an integrated version, and obtain after re-inserting $2^nn!\,P_n = \frac{d^nv^n}{dx^n}$:
\[
2^{n+1}(n+1)!\,P_{n+1} = 2(n+1)\Big[2^nn!\,x\,P_n + n\,2^nn!\int P_n\,dx\Big]
\quad\Rightarrow\quad
\int P_n\,dx = \frac{P_{n+1} - x\,P_n}{n} + C.
\tag{A.58}\]
With the definite integration limits $x_0 = \cos\frac{\alpha}{2}$ and $1$, the upper boundary vanishes as $P_{n+1}(1) - 1\cdot P_n(1) = 0$, and $\int_{x_0}^{1}P_n\,dx$ only depends on the lower boundary:
\[
\int_{x_0}^{1}P_n(x)\,dx = -\frac{P_{n+1}(x_0) - x_0\,P_n(x_0)}{n},
\tag{A.59}\]
\[
w_n = -2\pi\,\frac{P_{n+1}\big(\cos\tfrac{\alpha}{2}\big) - \cos\tfrac{\alpha}{2}\,P_n\big(\cos\tfrac{\alpha}{2}\big)}{n}, \qquad\text{for } n > 0,
\tag{A.60}\]
and $w_0 = 2\pi\int_{\cos\frac{\alpha}{2}}^{1}dx = 2\pi\big(1-\cos\frac{\alpha}{2}\big)$ for $n = 0$. The recurrence $(2n+1)\,xP_n - (n+1)\,P_{n+1} = n\,P_{n-1}$ alternatively yields for $n > 0$
\[
w_n = 2\pi\,\frac{P_{n-1}\big(\cos\tfrac{\alpha}{2}\big) - P_{n+1}\big(\cos\tfrac{\alpha}{2}\big)}{2n+1}.
\tag{A.61}\]

A.4 Encoding to SH and Decoding to SH

Mode-matching decoder: $L$ loudspeakers driven by the weights $g_l$ and given by their directions $\{\boldsymbol\theta_l\}$ produce a pattern $f(\boldsymbol\theta)$ linearly composed of Dirac deltas,
\[
f(\boldsymbol\theta) = \sum_{l=1}^{L}\delta(\boldsymbol\theta^{\mathrm T}\boldsymbol\theta_l - 1)\,g_l = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}Y_n^m(\boldsymbol\theta)\sum_{l=1}^{L}Y_n^m(\boldsymbol\theta_l)\,g_l = \boldsymbol y(\boldsymbol\theta)^{\mathrm T}\,\boldsymbol Y\,\boldsymbol g,
\tag{A.62}\]
in an order-unlimited representation. The vector $\boldsymbol y(\boldsymbol\theta) = [Y_n^m(\boldsymbol\theta)]_{nm}$ contains all the spherical harmonics $0 \le n \le \infty$, $-n \le m \le n$, in a suitable order, e.g. the Ambisonic Channel Number (ACN) $n^2 + n + m$, and the matrix $\boldsymbol Y = [\boldsymbol y(\boldsymbol\theta_l)]_l$ contains the spherical harmonic coefficient vectors of every loudspeaker. Obviously, the spherical harmonic coefficients synthesized by the loudspeakers are $\boldsymbol\phi = \boldsymbol Y\boldsymbol g$, so that $f(\boldsymbol\theta) = \boldsymbol y(\boldsymbol\theta)^{\mathrm T}\boldsymbol Y\boldsymbol g = \boldsymbol y(\boldsymbol\theta)^{\mathrm T}\boldsymbol\phi$. With $L$ loudspeakers, at most $(\mathrm N+1)^2 \le L$ spherical harmonics can be controlled. Therefore, control typically restricts to the under-determined $\mathrm N$th-order subspace $\boldsymbol\phi_{\mathrm N} = \boldsymbol Y_{\mathrm N}\boldsymbol g$, in which we can synthesize any coefficient vector $\boldsymbol\phi_{\mathrm N}$.
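A tiny numerical example of mode matching at first order ($\mathrm N = 1$); the octahedral loudspeaker layout and the function names are illustrative assumptions, not prescribed by the text, and the minimum-norm right-inverse solution used here is the standard closed form of the constrained problem:

```python
import numpy as np

def sh1(dirs):
    # real-valued spherical harmonics up to first order, ACN order (Y00, Y1-1, Y10, Y11)
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0, c1 = np.sqrt(1 / (4 * np.pi)), np.sqrt(3 / (4 * np.pi))
    return np.stack([c0 * np.ones_like(x), c1 * y, c1 * z, c1 * x])

# assumed octahedral layout: 6 loudspeaker directions
L = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
              [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
Y = sh1(L)                                    # (N+1)^2 x L matrix of the layout
D = Y.T @ np.linalg.inv(Y @ Y.T)              # minimum-norm (right-inverse) decoder

phi = sh1(np.array([[0.0, 0.0, 1.0]]))[:, 0]  # encoded zenithal source
g = D @ phi                                   # loudspeaker weights reproducing phi
```

`Y @ D` is the identity, so the decoder reproduces any first-order $\boldsymbol\phi_{\mathrm N}$ exactly; for the zenithal source, the top loudspeaker receives the largest weight.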
To get a finite and well-determined solution despite the exceeding and arbitrary degrees of freedom in $\boldsymbol g$, the least-squares solution for $\boldsymbol g$ is searched under the constraint
\[
\min\|\boldsymbol g\|^2 \quad\text{subject to: } \boldsymbol\phi_{\mathrm N} = \boldsymbol Y_{\mathrm N}\boldsymbol g,
\tag{A.63}\]
yielding the cost function with the Lagrange multipliers $\boldsymbol\lambda$:
\[
J(\boldsymbol g,\boldsymbol\lambda) = \boldsymbol g^{\mathrm T}\boldsymbol g + (\boldsymbol\phi_{\mathrm N} - \boldsymbol Y_{\mathrm N}\boldsymbol g)^{\mathrm T}\boldsymbol\lambda.
\]
For the optimum in $\boldsymbol g$, its derivative to $\boldsymbol g$ is zero, and in $\boldsymbol\lambda$ the corresponding derivative:
\[
\frac{\partial J}{\partial\boldsymbol g} = 2\boldsymbol g_{\mathrm{opt}} - \boldsymbol Y_{\mathrm N}^{\mathrm T}\boldsymbol\lambda = \boldsymbol 0,
\qquad
\frac{\partial J}{\partial\boldsymbol\lambda} = \boldsymbol\phi_{\mathrm N} - \boldsymbol Y_{\mathrm N}\boldsymbol g = \boldsymbol 0.
\]
For $\boldsymbol g$ the equation yields $\boldsymbol g_{\mathrm{opt}} = \frac{1}{2}\boldsymbol Y_{\mathrm N}^{\mathrm T}\boldsymbol\lambda$, and for $\boldsymbol\lambda$ the original constraint, which only allows to insert the optimal $\boldsymbol g$, yielding $\boldsymbol\phi_{\mathrm N} = \boldsymbol Y_{\mathrm N}(\frac{1}{2}\boldsymbol Y_{\mathrm N}^{\mathrm T}\boldsymbol\lambda_{\mathrm{opt}})$. Inversion of $\frac{1}{2}\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T}$ from the left yields the multipliers $\boldsymbol\lambda_{\mathrm{opt}} = 2(\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T})^{-1}\boldsymbol\phi_{\mathrm N}$, so that
\[
\boldsymbol g = \boldsymbol Y_{\mathrm N}^{\mathrm T}(\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T})^{-1}\boldsymbol\phi_{\mathrm N}.
\tag{A.64}\]
The solution is right-inverse to $\boldsymbol Y_{\mathrm N}$, i.e. $\boldsymbol Y_{\mathrm N}\big[\boldsymbol Y_{\mathrm N}^{\mathrm T}(\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T})^{-1}\big] = \boldsymbol I$.

Best-fit encoder by MMSE: When given $M$ samples of a spherical function $g(\boldsymbol\theta)$ at the locations $\{\boldsymbol\theta_l\}$, we can minimize the mean-square error (MMSE)
\[
\min\sum_{l=1}^{M}\Big|g(\boldsymbol\theta_l) - \sum_{n=0}^{\mathrm N}\sum_{m=-n}^{n}Y_n^m(\boldsymbol\theta_l)\,\gamma_{nm}\Big|^2
\]
to find suitable spherical harmonic coefficients $\gamma_{nm}$. Using the matrix notation from above, this is
\[
\min\|\boldsymbol e\|^2 = \min\|\boldsymbol g - \boldsymbol Y_{\mathrm N}^{\mathrm T}\boldsymbol\gamma_{\mathrm N}\|^2,
\tag{A.65}\]
and we find by zeroing the derivative
\[
\frac{\partial}{\partial\boldsymbol\gamma_{\mathrm N}}\boldsymbol e^{\mathrm T}\boldsymbol e = 2\,\frac{\partial\boldsymbol e^{\mathrm T}}{\partial\boldsymbol\gamma_{\mathrm N}}\,\boldsymbol e = 2\,\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T}\boldsymbol\gamma_{\mathrm N} - 2\,\boldsymbol Y_{\mathrm N}\boldsymbol g = \boldsymbol 0
\quad\Longrightarrow\quad
\boldsymbol\gamma_{\mathrm N} = (\boldsymbol Y_{\mathrm N}\boldsymbol Y_{\mathrm N}^{\mathrm T})^{-1}\boldsymbol Y_{\mathrm N}\,\boldsymbol g.
\tag{A.66}\]
The resulting matrix is left-inverse to the thin matrix $\boldsymbol Y_{\mathrm N}^{\mathrm T}$, and can be written in terms of the more general pseudo-inverse $(\boldsymbol Y_{\mathrm N}^{\mathrm T})^{\dagger}$.

A.5 Covariance Constraint for Binaural Ambisonic Decoding

The interaural covariance matrix is related to the expectation value of the auto-/cross-covariances of the left and right HRTFs:
\[
\boldsymbol R = \int_{\mathbb{S}^2}\begin{bmatrix}h_{\mathrm{left}}(\boldsymbol\theta,\omega)\\ h_{\mathrm{right}}(\boldsymbol\theta,\omega)\end{bmatrix}^{*}\big[h_{\mathrm{left}}(\boldsymbol\theta,\omega)\ \ h_{\mathrm{right}}(\boldsymbol\theta,\omega)\big]\,d\boldsymbol\theta = \int_{\mathbb{S}^2}\boldsymbol h(\boldsymbol\theta,\omega)^{*}\,\boldsymbol h(\boldsymbol\theta,\omega)^{\mathrm T}\,d\boldsymbol\theta.
\tag{A.67}\]
When specified in terms of spherical harmonic coefficients, $h = \boldsymbol y^{\mathrm T}\boldsymbol h_{\mathrm{SH}}$, the integral $\int_{\mathbb{S}^2}h_1^*h_2\,d\boldsymbol\theta$ in any of $\boldsymbol R$'s entries simplifies by the orthonormality of the spherical harmonics, $\boldsymbol h_{\mathrm{SH}1}^{\mathrm H}\big[\int_{\mathbb{S}^2}\boldsymbol y^*\boldsymbol y^{\mathrm T}d\boldsymbol\theta\big]\boldsymbol h_{\mathrm{SH}2} = \boldsymbol h_{\mathrm{SH}1}^{\mathrm H}\boldsymbol h_{\mathrm{SH}2}$, and we obviously only need the inner product between the spherical harmonic coefficients of the HRTFs. A very-high-order spherical harmonic HRTF dataset $\boldsymbol H_{\mathrm{SH}}$ of dimensions $(M+1)^2\times2$, with the order $M \gg \mathrm N$, yields at every frequency a covariance matrix
\[
\boldsymbol R = \boldsymbol H_{\mathrm{SH}}^{\mathrm H}\boldsymbol H_{\mathrm{SH}} = \boldsymbol X^{\mathrm H}\boldsymbol X
\]
that can be factored into a quadratic form of a $2\times2$ matrix $\boldsymbol X$ by Cholesky factorization, which reduces the degrees of freedom involved to the minimum required size.

The $\mathrm N$th-order Ambisonically reproduced, high-frequency-modified HRTF dataset $\hat{\boldsymbol H}_{\mathrm{SH}}$ of dimensions $(M+1)^2\times2$ also has a $2\times2$ covariance matrix $\hat{\boldsymbol R}$ that will differ from $\boldsymbol R$, and which we also decompose into Cholesky factors $\hat{\boldsymbol X}$:
\[
\hat{\boldsymbol R} = \hat{\boldsymbol H}_{\mathrm{SH}}^{\mathrm H}\hat{\boldsymbol H}_{\mathrm{SH}} = \hat{\boldsymbol X}^{\mathrm H}\hat{\boldsymbol X}.
\tag{A.68}\]
To equalize $\boldsymbol R = \hat{\boldsymbol R}$, the reproduced HRTF set is corrected by a $2\times2$ filter matrix $\boldsymbol M$:
\[
\hat{\boldsymbol H}_{\mathrm{SH,corr}} = \hat{\boldsymbol H}_{\mathrm{SH}}\,\boldsymbol M.
\tag{A.69}\]
This is done properly as soon as
\[
\boldsymbol X^{\mathrm H}\boldsymbol X = \boldsymbol M^{\mathrm H}\hat{\boldsymbol X}^{\mathrm H}\hat{\boldsymbol X}\boldsymbol M = \boldsymbol M^{\mathrm H}\hat{\boldsymbol X}^{\mathrm H}\boldsymbol Q^{\mathrm H}\boldsymbol Q\,\hat{\boldsymbol X}\boldsymbol M,
\tag{A.70}\]
and the orthogonal matrix $\boldsymbol Q$ is used to compensate for the degrees of freedom that the Cholesky factors $\boldsymbol X$ and $\hat{\boldsymbol X}$ have in sign, phase, and mixing with regard to each other. We recognize the root and hereby the preliminary solution for $\boldsymbol M$:
\[
\boldsymbol X = \boldsymbol Q\,\hat{\boldsymbol X}\boldsymbol M,
\quad\Rightarrow\quad
\boldsymbol M = \hat{\boldsymbol X}^{-1}\boldsymbol Q^{\mathrm H}\boldsymbol X.
\tag{A.71}\]
This leaves $\hat{\boldsymbol H}_{\mathrm{SH,corr}} = \hat{\boldsymbol H}_{\mathrm{SH}}\hat{\boldsymbol X}^{-1}\boldsymbol Q^{\mathrm H}\boldsymbol X$ depending on an unspecified orthogonal $2\times2$ matrix $\boldsymbol Q$. To obtain corrected-covariance HRTFs $\hat{\boldsymbol H}_{\mathrm{SH,corr}}$ of highest-possible phase alignment and correlation to the uncorrected counterpart $\hat{\boldsymbol H}_{\mathrm{SH}}$, we maximize the trace, i.e. the sum of diagonal elements:
\[
\max\,\mathrm{Re}\,\mathrm{Tr}\{\hat{\boldsymbol H}_{\mathrm{SH}}^{\mathrm H}\hat{\boldsymbol H}_{\mathrm{SH,corr}}\} = \max\,\mathrm{Re}\,\mathrm{Tr}\{\hat{\boldsymbol X}^{\mathrm H}\hat{\boldsymbol X}\hat{\boldsymbol X}^{-1}\boldsymbol Q^{\mathrm H}\boldsymbol X\} = \max\,\mathrm{Re}\,\mathrm{Tr}\{\hat{\boldsymbol X}^{\mathrm H}\boldsymbol Q^{\mathrm H}\boldsymbol X\} = \max\,\mathrm{Re}\,\mathrm{Tr}\{\boldsymbol Q^{\mathrm H}\boldsymbol X\hat{\boldsymbol X}^{\mathrm H}\}.
\]
For the last expression, the property $\mathrm{Tr}\{\boldsymbol A\boldsymbol B\} = \mathrm{Tr}\{\boldsymbol B\boldsymbol A\}$ was used.
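The covariance equalization, including the orthogonal-matrix choice, can be sketched numerically (NumPy; the two covariance matrices below are made-up stand-ins for one frequency bin). The unitary $\boldsymbol Q$ is obtained from an SVD as in an orthogonal Procrustes problem:

```python
import numpy as np

def covariance_correction(R, R_hat):
    # Cholesky factors with R = X^H X (numpy returns L with R = L L^H, so X = L^H)
    X = np.linalg.cholesky(R).conj().T
    X_hat = np.linalg.cholesky(R_hat).conj().T
    # unitary Q^H = V U^H from the SVD X X_hat^H = U S V^H (Procrustes-like step)
    U, s, Vh = np.linalg.svd(X @ X_hat.conj().T)
    Qh = Vh.conj().T @ U.conj().T
    return np.linalg.inv(X_hat) @ Qh @ X     # correction filter M = X_hat^{-1} Q^H X

# made-up 2x2 target and reproduced covariance matrices
R = np.array([[1.0, 0.3], [0.3, 0.8]])
R_hat = np.array([[0.7, 0.1], [0.1, 0.9]])
M = covariance_correction(R, R_hat)
```

$\boldsymbol M^{\mathrm H}\hat{\boldsymbol R}\boldsymbol M$ reproduces $\boldsymbol R$ exactly for any unitary $\boldsymbol Q$; the SVD-based choice additionally keeps the corrected HRTFs maximally aligned with the uncorrected ones.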
An orthogonal matrix $\boldsymbol Q^{\mathrm H} = \boldsymbol V\boldsymbol U^{\mathrm H}$ composed of two orthogonal matrices $\boldsymbol U$ and $\boldsymbol V$ would yield $\mathrm{Tr}\{\boldsymbol X\hat{\boldsymbol X}^{\mathrm H}\boldsymbol V\boldsymbol U^{\mathrm H}\} = \mathrm{Tr}\{\boldsymbol U^{\mathrm H}\boldsymbol X\hat{\boldsymbol X}^{\mathrm H}\boldsymbol V\}$, and it would maximize the trace if $\boldsymbol U$ and $\boldsymbol V$ diagonalized $\boldsymbol X\hat{\boldsymbol X}^{\mathrm H}$. This is accomplished by the singular-value decomposition (SVD) $\boldsymbol X\hat{\boldsymbol X}^{\mathrm H} = \boldsymbol U\boldsymbol S\boldsymbol V^{\mathrm H}$, when the singular values $\boldsymbol S = \mathrm{diag}\{[s_1,s_2]\}$ are real and positive, as in most SVD implementations. Using $\boldsymbol U$ and $\boldsymbol V$, the desired solution is:
\[
\hat{\boldsymbol H}_{\mathrm{corr,SH}} = \hat{\boldsymbol H}_{\mathrm{SH}}\,\hat{\boldsymbol X}^{-1}\boldsymbol V\boldsymbol U^{\mathrm H}\boldsymbol X.
\tag{A.72}\]
If the SVD delivers negative or complex-valued singular values, the complex/negative factor just needs to be pulled out and factored into either the corresponding left or right singular vector.

A.6 Physics of the Helmholtz Equation

A.6.1 Adiabatic Compression

We search for a physical compression equation relating pressure $p$ and volume $V$.

Ideal gas. The gas pressure $p$ inside the volume $V$ obeys the ideal gas law [9]
\[
pV = nRT,
\tag{A.73}\]
with $n$ measuring the amount of substance in moles, $R$ the gas constant, and $T$ the temperature. This would yield a valid compression equation if the medium of sound propagation were isothermal. However, this is not the case, $T \ne \mathrm{const}$: local temperature fluctuations happen too fast to be equalized by thermal dissipation. Isothermal compression would be too soft, the resulting speed of sound off by $-15\%$. A compression law involving fluctuations of all three quantities $(p, V, T)$ needs an additional equation.

First law of thermodynamics. In thermodynamics [10–12], the enthalpy $H$ describes the energy required to heat up a freely expanding gas under constant pressure $p$. The enthalpy amounts to the internal energy $U$ required to heat up the gas in a constant volume, which is easier, plus the ideal-gas volume work $pV$ taken by the gas to expand under the constant external pressure:
\[
H = U + pV, \qquad\text{specifically}\qquad nc_{\mathrm p}T = nc_VT + nRT, \quad\Rightarrow\quad R = c_{\mathrm p} - c_V.
\tag{A.74}\]
The quantities $c_{\mathrm p}$ and $c_V$ are the specific heat capacities for heating up a gas that is expanding ($p = \mathrm{const.}$) or confined in a fixed volume ($V = \mathrm{const.}$) to a temperature $T$, which can be accurately measured or modeled. Obviously, the gas constant $R$ is the difference between the two. To make sound propagation isenthalpic, the energy must fluctuate between the internal energy $U$ and the volume work $pV$.

Adiabatic process. The above steady-state equations are not useful yet to describe short-term fluctuations of $p$, $V$, and $T$ in time and space. A differential formulation related to the change in enthalpy, internal energy, and volume work, $dH = dU + p\,dV$, is more useful. Moreover, we regard packages of a constant amount of substance whose internal heat-up is just due to compression and not due to external enthalpy sources; the process is therefore isenthalpic, $dH = 0$, see [10, Sect. 3.12.2], [13]:
\[
0 = nc_V\,dT + \frac{nRT}{V}\,dV.
\]
We may divide by $nc_VT$, replace $R = c_{\mathrm p}-c_V$, and obtain $\frac{dT}{T} + \big(\frac{c_{\mathrm p}}{c_V}-1\big)\frac{dV}{V} = 0$, whose integration yields $\ln T + \big(\frac{c_{\mathrm p}}{c_V}-1\big)\ln V = \ln TV^{\frac{c_{\mathrm p}}{c_V}-1} = \mathrm{const}$, hence $TV^{\frac{c_{\mathrm p}}{c_V}-1} = \mathrm{const}$, and with the ideal gas equation inserted as $T = \frac{pV}{nR}$, the adiabatic process law becomes
\[
pV^{\frac{c_{\mathrm p}}{c_V}} = \mathrm{const},
\tag{A.75}\]
for which the adiabatic exponent is frequently expressed as $\gamma = \frac{c_{\mathrm p}}{c_V}$. For air, the exponent is $\gamma = 1.4$, and we may express a state change as $(p_0,V_0) \to (p_0+p,\,V_0+V)$. The equation $p_0V_0^\gamma = (p_0+p)(V_0+V)^\gamma$ yields, after division by $p_0V_0^\gamma$ and by $\big(1+\frac{V}{V_0}\big)^\gamma$:
\[
1 + \frac{p}{p_0} = \Big(1+\frac{V}{V_0}\Big)^{-\gamma} \approx 1 - \gamma\,\frac{V}{V_0},
\qquad\text{hence}\qquad
p = -\gamma\,p_0\,\frac{V}{V_0}.
\]
Assuming the Cartesian coordinates $x$, $y$, $z$ measured in the resting gas to define its volume $V_0 = \Delta x\Delta y\Delta z$, as well as its deflected coordinates $\xi(x)$, $\eta(y)$, $\zeta(z)$ after a volume change to $V$, we can approximate the volume change well enough by the three independent volume changes $\Delta\xi\,\Delta y\Delta z$, $\Delta x\,\Delta\eta\,\Delta z$, and $\Delta x\Delta y\,\Delta\zeta$, resulting from the superimposed individual elongations into the three coordinates' directions:
\[
\lim_{V_0\to0}\frac{V}{V_0} = \lim_{V_0\to0}\frac{\Delta\xi\,\Delta y\Delta z + \Delta x\,\Delta\eta\,\Delta z + \Delta x\Delta y\,\Delta\zeta}{\Delta x\Delta y\Delta z} = \frac{\partial\xi}{\partial x} + \frac{\partial\eta}{\partial y} + \frac{\partial\zeta}{\partial z} = \nabla^{\mathrm T}\boldsymbol\xi.
\]
Replacing the bulk modulus $K = \gamma p_0 = \rho c^2$ of air by the more common constants$^1$ $\rho$ and $c = \sqrt{K/\rho}$, and applying the derivative in time $\frac{\partial}{\partial t}$ using the velocity $\frac{\partial\boldsymbol\xi}{\partial t} = \boldsymbol v$, we get the equation of compression in its typical form:
\[
\frac{\partial p}{\partial t} = -\rho c^2\,\nabla^{\mathrm T}\boldsymbol v.
\tag{A.76}\]
($^1$ Typical constants are $\gamma = 1.4$, $p_0 = 10^5\,$Pa, $\rho = 1.2\,$kg/m$^3$, $c = 343\,$m/s.)

A.6.2 Potential and Kinetic Sound Energies, Intensity, Diffuseness

The potential energy density, or volume work stored in the elastic medium that gets compressed by a deformation $dV$, increases with $dw_{\mathrm p} = p\,dV$, while deformation also increases the pressure by $dp = K\,dV$. We may substitute for $dV = K^{-1}dp$, yielding $dw_{\mathrm p} = K^{-1}p\,dp$. The volume work stored by a pressure increase from 0 to $p$ is
\[
w_{\mathrm p} = \int_0^p\frac{p\,dp}{K} = \frac{p^2}{2K} = \frac{p^2}{2\rho c^2}.
\tag{A.77}\]
The kinetic energy density stored in the motion of the medium along any axis, e.g. $x$, increases by acceleration against its mass, $dw_{v_x} = \rho\,\frac{dv_x}{dt}\,dx$. The velocity is $v_x = \frac{dx}{dt}$, so that we substitute for $dx = v_x\,dt$ to get $dw_{v_x} = \rho\,v_x\,dv_x$. The total kinetic energy density stored in velocities increasing from 0 to $v_x$, $v_y$, $v_z$ is
\[
w_{\mathrm v} = \rho\int_0^{v_x}v_x\,dv_x + \rho\int_0^{v_y}v_y\,dv_y + \rho\int_0^{v_z}v_z\,dv_z = \rho\,\frac{v_x^2+v_y^2+v_z^2}{2} = \frac{\rho\,\|\boldsymbol v\|^2}{2}.
\tag{A.78}\]
Total energy density and intensity. The total energy density therefore becomes
\[
w = w_{\mathrm p} + w_{\mathrm v} = \frac{p^2}{2\rho c^2} + \frac{\rho\,\|\boldsymbol v\|^2}{2},
\tag{A.79}\]
and derived with regard to time, using (A.76) and the equation of motion $\rho\frac{\partial\boldsymbol v}{\partial t} = -\nabla p$, it becomes
\[
\frac{\partial w}{\partial t} = \frac{p}{\rho c^2}\,\frac{\partial p}{\partial t} + \rho\,\boldsymbol v^{\mathrm T}\frac{\partial\boldsymbol v}{\partial t} = -p\,\nabla^{\mathrm T}\boldsymbol v - \boldsymbol v^{\mathrm T}\nabla p = -\nabla^{\mathrm T}(p\boldsymbol v) = -\nabla^{\mathrm T}\boldsymbol I,
\tag{A.80}\]
and defines the (time-domain) intensity vector $\boldsymbol I = p\boldsymbol v$ that describes the energy flow in space. Hereby, $\frac{\partial w}{\partial t} = -\nabla^{\mathrm T}\boldsymbol I$ expresses that only a non-zero divergence of the intensity causes energy increase (source) or loss (absorption) in the lossless medium.

Direction of arrival and diffuseness: The intensity vector carries a meaning in its own right: it displays into which direction the energy flows (direction of emission).
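The adiabatic bulk modulus and the $-15\%$ remark about isothermal compression can be checked directly, with the typical constants from the footnote:

```python
from math import sqrt

gamma, p0, rho = 1.4, 1.0e5, 1.2   # adiabatic exponent, static pressure [Pa], density [kg/m^3]

K = gamma * p0                      # adiabatic bulk modulus, K = gamma p0 = rho c^2
c = sqrt(K / rho)                   # speed of sound, about 342 m/s

c_iso = sqrt(p0 / rho)              # hypothetical isothermal speed (gamma = 1)
ratio = c_iso / c                   # = 1/sqrt(gamma), about 0.85, i.e. roughly -15 %
```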
In the frequency domain, it becomes $\boldsymbol I = \mathrm{Re}\{p^*\boldsymbol v\}$, and for a plane-wave sound field $p = e^{ik\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol r}$, where $\boldsymbol v = -\frac{\nabla p}{ik\rho c} = -\frac{\boldsymbol\theta_{\mathrm s}}{\rho c}\,p$, it indicates the direction of arrival (DOA):
\[
\boldsymbol r_{\mathrm{DOA}} = -\frac{\rho c\,\boldsymbol I}{|p|^2} = -\frac{\rho c\,\mathrm{Re}\{p^*\boldsymbol v\}}{|p|^2} = \frac{\rho c\,|p|^2}{\rho c\,|p|^2}\,\boldsymbol\theta_{\mathrm s} = \boldsymbol\theta_{\mathrm s}.
\tag{A.81}\]
An ideal, uniformly enveloping diffuse field is composed of uncorrelated plane waves, $E\big\{\frac{a(\boldsymbol\theta_1)^*}{\sqrt{4\pi}}\frac{a(\boldsymbol\theta_2)}{\sqrt{4\pi}}\big\} = \frac{a^2}{4\pi}\,\delta(1-\boldsymbol\theta_1^{\mathrm T}\boldsymbol\theta_2)$, resulting in the sound pressure $p = \int_{\mathbb{S}^2}\frac{a(\boldsymbol\theta_{\mathrm s})}{\sqrt{4\pi}}\,e^{ik\boldsymbol\theta_{\mathrm s}^{\mathrm T}\boldsymbol r}\,d\boldsymbol\theta_{\mathrm s}$. While the expected squared sound pressure is non-zero as before, $E\{|p|^2\} = a^2$, the expected intensity of the uniformly surrounding waves vanishes, $-\rho c\,E\{\boldsymbol I\} = \frac{a^2}{4\pi}\int_{\mathbb{S}^2}\boldsymbol\theta_{\mathrm s}\,d\boldsymbol\theta_{\mathrm s} = \boldsymbol 0$.

Assuming stochastic interference of all sources, the intensity-based DOA estimator $\boldsymbol r_{\mathrm{DOA}} = -\frac{\rho c\,\mathrm{Re}\{p^*\boldsymbol v\}}{|p|^2}$ is therefore the physical equivalent to the $\boldsymbol r_{\mathrm E}$ vector measure. A typical diffuseness measure $0 \le \psi \le 1$ relies on its length between 0 and 1:
\[
\psi = 1 - \|\boldsymbol r_{\mathrm{DOA}}\|^2.
\tag{A.82}\]
The signals $W = p$ and $[X,Y,Z]^{\mathrm T} = \sqrt{2}\,\frac{\nabla p}{ik} = -\rho c\sqrt{2}\,\boldsymbol v$ of a first-order Ambisonic microphone allow to describe a time-domain estimator $\boldsymbol r_{\mathrm{DOA}}$:
\[
\boldsymbol r_{\mathrm{DOA}} = -\frac{\rho c\,E\{p\,\boldsymbol v\}}{E\{p^2\}} = \frac{E\{W\,[X,Y,Z]^{\mathrm T}\}}{\sqrt{2}\,E\{W^2\}}.
\tag{A.83}\]

A.6.3 Green's Function in 3 Cartesian Dimensions

We may compose the Green's function, the solution to the inhomogeneous wave equation
\[
\Big(\triangle - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\Big)G = -\delta(t)\,\delta(\boldsymbol r),
\]
from products of complex exponentials with regard to time and the Cartesian directions,
\[
e^{i\omega t + ik_xx + ik_yy + ik_zz} = e^{i\boldsymbol k^{\mathrm T}\boldsymbol r}\,e^{i\omega t},
\tag{A.84}\]
where the position $x$, $y$, $z$ was gathered in a position vector $\boldsymbol r$, and the wave numbers $k_x$, $k_y$, $k_z$ of the individual coordinates were gathered in a wave-number vector $\boldsymbol k$. From this solution, we compose the Green's function by superimposing all spatial and temporal complex exponentials in $\boldsymbol k$ and $\omega$, weighted by an unknown coefficient $\gamma$:
\[
G = \iint\gamma\,e^{i\boldsymbol k^{\mathrm T}\boldsymbol r}\,e^{i\omega t}\,d\omega\,d\boldsymbol k.
\]
Because of $\triangle e^{i\boldsymbol k^{\mathrm T}\boldsymbol r} = (-k_x^2-k_y^2-k_z^2)\,e^{i\boldsymbol k^{\mathrm T}\boldsymbol r} = -k^2\,e^{i\boldsymbol k^{\mathrm T}\boldsymbol r}$ and $\frac{\partial^2}{\partial t^2}e^{i\omega t} = -\omega^2e^{i\omega t}$, insertion into the inhomogeneous wave equation yields
\[
-\iint\gamma\Big(k^2 - \frac{\omega^2}{c^2}\Big)e^{i\boldsymbol k^{\mathrm T}\boldsymbol r}\,e^{i\omega t}\,d\omega\,d\boldsymbol k = -\delta(t)\,\delta(\boldsymbol r).
\tag{A.85}\]
Multiple transformations $\iint e^{-i\hat{\boldsymbol k}^{\mathrm T}\boldsymbol r}\,e^{-i\hat\omega t}\,d\boldsymbol r\,dt$ remove the integrals by orthogonality,
\[
-\iint\gamma\Big(k^2-\frac{\omega^2}{c^2}\Big)\underbrace{\Big[\int e^{i(\boldsymbol k-\hat{\boldsymbol k})^{\mathrm T}\boldsymbol r}\,d\boldsymbol r\Big]}_{(2\pi)^3\,\delta(\boldsymbol k-\hat{\boldsymbol k})}\underbrace{\Big[\int e^{i(\omega-\hat\omega)t}\,dt\Big]}_{2\pi\,\delta(\omega-\hat\omega)}\,d\omega\,d\boldsymbol k = -\underbrace{\int\delta(t)\,e^{-i\hat\omega t}\,dt}_{1}\;\underbrace{\int\delta(\boldsymbol r)\,e^{-i\hat{\boldsymbol k}^{\mathrm T}\boldsymbol r}\,d\boldsymbol r}_{1},
\]
and the unknown coefficient remains $\gamma = \frac{1}{(2\pi)^{3+1}}\,\frac{1}{k^2-\frac{\omega^2}{c^2}}$. Letting $G$ stay in the frequency domain, the factor $\frac{1}{2\pi}$ in $\gamma$ and the integral over $e^{i\omega t}\,d\omega$ are omitted. We transform $\gamma$ back via $\boldsymbol k$:
\[
G = \frac{1}{(2\pi)^3}\iiint\frac{e^{i\boldsymbol k^{\mathrm T}\boldsymbol r}}{k^2-\frac{\omega^2}{c^2}}\,d\boldsymbol k = \frac{1}{(2\pi)^3}\iiint\frac{e^{ikr\zeta}}{k^2-\frac{\omega^2}{c^2}}\,d\boldsymbol k.
\]
By re-expressing $\boldsymbol k^{\mathrm T}\boldsymbol r = kr\,\boldsymbol\theta_k^{\mathrm T}\boldsymbol\theta_r = kr\cos\vartheta = kr\zeta$ we simplify the integral. Now we already see formally that the Green's function can only depend on the distance $r$ between source and receiver location, $G = G(\omega,r)$. The book [14, pp. 110–112] shows a notably compact derivation, which we will use below.

Derivation by transforming back from the Fourier domain: For three dimensions, the transformation back from the Fourier domain is relatively easy to accomplish. Before going into details, we recognize that the substitution of $\boldsymbol k^{\mathrm T}\boldsymbol r$ by $kr\cos\vartheta$ contains the radius $k = \|\boldsymbol k\|$ of the wave vector and the cosine of the angle between $\boldsymbol r$ and $\boldsymbol k$. In $\boldsymbol k$ space, we can always define a correspondingly oriented coordinate system for any $\boldsymbol r$ so as to simplify the integral $\iiint_{-\infty}^{\infty}d\boldsymbol k = \int_0^{\infty}\!\int_0^{2\pi}\!\int_{-1}^{1}k^2\,dk\,d\varphi\,d\zeta$. After re-arranging the integrals, we get
After re-arranging the integrals, we get 0 0 −1 % 2π G= dϕ (2π)3 $ 0 = 1 1 (2π)2 ir = 1 1 (2π)2 ir = 1 1 (2π)2 ir = 1 1 (2π)2 ir ∞ %1 ikr ζ dζ −1 e 2 k 2 − ωc2 $ ∞ 1 eikr − e−ikr 2 k dk ikr k 2 − ω22 0 0 c $ $ ∞ ikr $ ∞ −ikr ∞ eikr k e − e−ikr e k 1 1 k dk = dk − dk 2 2 2 (2π)2 ir 0 0 0 k 2 − ωc2 k 2 − ωc2 k 2 − ωc2 $ $ −∞ −i(−k)r ∞ eikr k e (−k) dk − d(−k) 2 2 0 0 k 2 − ωc2 (−k)2 − ωc2 $ $ 0 ∞ eikr k eikr k dk + dk 2 2 0 −∞ k 2 − ω2 k 2 − ωc2 c $ ∞ eikr k dk. (A.86) 2 −∞ k 2 − ω2 c k 2 dk = 1 (2π)2 The denominator is expanded in partial fractions G= 1 1 2 2(2π ) ir $ ∞ −∞ 1 2 k 2 − ωc2 eikr dk + k − ωc $ = ∞ −∞ 1 1 2k k− ωc + 1 1 2k k+ ωc eikr dk . k + ωc , yielding (A.87) To obtain causal temporal solutions, there needs to be a specific solution of the improper and singular integrals. 200 Appendix Fig. A.1 Closure of improper integration path using C+ over the positive imaginary half plane for t > 0, and C− for t < 0, and regularization of pole for causal result, i.e. non-zero only for C+ Causal solutions (integral in ω). Causal responses obeying the transformation $ h(t) = ∞ −∞ eiω t dω ω−a (A.88) %∞ are obtained by replacing the improper integral −∞ by a closed integration contour, and by introducing vanishing regularization. Jordan’s lemma states that improper %∞ integration −∞ is equivalent to a closed integration path C+ of positive orientation involving the additional semi-circle on the upper half of the complex number plane %∞ %π % R −∞ = C+ = lim R→∞ −R dω + 0 R dϕ if the integrand of the semi-circle van(i cos ϕ−sin ϕ) R t ishes, i.e. lim R→∞ e R eiϕ −a = 0. This is the case for positive times t > 0. For negative times, the integral can be closed using the lower part of the complex num%∞ % −π % R ber plane, −∞ = C− = lim R→∞ −R dω + 0 R dϕ , if the semi-circular integral vanishes, which is true for negative times t < 0 in our case, see Fig. A.1. We get ⎧⎨ C + h(t) = ⎩ eiω t dω, ω−a if t > 0, eiω t C− ω−a dω, if t < 0. 
According to Cauchy's integral formula for analytic regular functions $f(z)$ over a single pole $\frac{1}{z-a}$, we obtain
\[
\oint_{C_\pm}\frac{f(z)}{z-a}\,dz = \begin{cases}\pm2\pi i\,f(a), & \text{if the path } C_\pm \text{ surrounds } a,\\ 0, & \text{if } a \text{ lies outside the path } C_\pm.\end{cases}
\tag{A.90}\]
If a pole on the real axis, $a\in\mathbb{R}$, is slightly shifted by a vanishing imaginary amount (regularization), $\lim_{\epsilon\to0^+}\oint_{C_\pm}\frac{e^{i\omega t}}{\omega-i\epsilon-a}\,d\omega$, so that it lies within the path $C_+$ and not in $C_-$, the result is perfectly causal and vanishes at negative times: $h(t) = 2\pi i\,\lim_{\epsilon\to0^+}e^{iat-\epsilon t}\,u(t)$, with the unit step function $u(t) = 1$ for $t\ge0$ and $0$ for $t<0$, see Fig. A.1.

Fig. A.2: Causality-enforcing regularization $\omega = \lim_{\epsilon\to0^+}\omega-i\epsilon$ of the poles in the Green's function with regard to the wave number $k$. For a positive radius $r\ge0$, closure of the improper integration path requires $C_+$ with the semicircle on the positive imaginary half-plane (or $C_-$ for $r\le0$).

Integral in $k$. Causality requires the specific regularization in frequency shown above: replacement of $\omega$ by $\lim_{\epsilon\to0^+}\omega-i\epsilon\,c$ guarantees causality in the partial-fraction expanded Green's function Eq. (A.87). Jordan's lemma requires to use the path $C_+$ to close the improper path in $k$ for a positive radius $r\ge0$, cf. Fig. A.2:
\[
G = \lim_{\epsilon\to0^+}\frac{1}{2(2\pi)^2}\frac{1}{ir}\Big[\underbrace{\oint_{C_+}\frac{e^{ikr}}{k-\frac{\omega}{c}+i\epsilon}\,dk}_{=0,\ \text{pole outside }C_+} + \oint_{C_+}\frac{e^{ikr}}{k+\frac{\omega}{c}-i\epsilon}\,dk\Big] = \frac{2\pi i\,e^{-i\frac{\omega}{c}r}}{2(2\pi)^2\,ir} = \frac{e^{-i\frac{\omega}{c}r}}{4\pi r}.
\tag{A.91}\]

A.6.4 Radial Solution of the Helmholtz Equation

The radial part of the Helmholtz equation in spherical coordinates is characterized by the spherical Bessel differential equation in $x = kr$:
\[
y'' + 2x^{-1}y' + \big[1-n(n+1)\,x^{-2}\big]\,y = 0.
\tag{A.92}\]
Recursive construction. For $n = 0$, we know that the omnidirectional Green's function is a solution diverging at $x = 0$, and it is proportional to $y \propto \frac{e^{-ix}}{x}$. We can simplify the equation by inserting $y = x^{-1}u_n$, which yields, with $y' = x^{-1}u_n'-x^{-2}u_n$ and $y'' = x^{-1}u_n''-2x^{-2}u_n'+2x^{-3}u_n$, after multiplying with $x$:
\[
u_n'' - 2x^{-1}u_n' + 2x^{-2}u_n + 2x^{-1}u_n' - 2x^{-2}u_n + \big[1-n(n+1)x^{-2}\big]u_n = 0,
\]
\[
u_n'' + \big[1-n(n+1)x^{-2}\big]u_n = 0.
\tag{A.93}\]
Moreover, we attempt to find a recursive definition for n > 0 using the approach y_n = x^{-1}u_n, u_n = -x^a\,[x^{-a}u_{n-1}]'. We evaluate the recursion for the derivatives,

u_n' = -u_{n-1}'' + a\,x^{-1}u_{n-1}' - a\,x^{-2}u_{n-1}, \text{ and with } -u_{n-1}'' = [1 - n(n-1)\,x^{-2}]\,u_{n-1}:
u_n' = a\,x^{-1}u_{n-1}' + \{1 - [n(n-1)+a]\,x^{-2}\}\,u_{n-1},
u_n'' = a\,x^{-1}u_{n-1}'' - a\,x^{-2}u_{n-1}' + \{1 - [n(n-1)+a]\,x^{-2}\}\,u_{n-1}' + 2[n(n-1)+a]\,x^{-3}u_{n-1}
      = \{1 - [n(n-1)+2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2)+2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}.

The equation u_n'' + [1 - n(n+1)x^{-2}]\,u_n = 0, using the above expressions, becomes

\{1 - [n(n-1)+2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2)+2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1} + [1 - n(n+1)x^{-2}]\,[-u_{n-1}' + a\,x^{-1}u_{n-1}] = 0.

Comparing coefficients for u_{n-1}' and u_{n-1} yields a = n:

u_{n-1}': \quad 1 - 1 - [n(n-1) + 2a - n(n+1)]\,x^{-2} = 2(n-a)\,x^{-2} = 0,
u_{n-1}: \quad [-a+a]\,x^{-1} + [n(n-1)(a+2) + 2a - an(n+1)]\,x^{-3} = [2n(n-1) + 2a(1-n)]\,x^{-3} = 2(n-1)(n-a)\,x^{-3} = 0,

and hereby a recurrence for y_n: from u_n = -x^n\,[x^{-n}u_{n-1}]' with y_n = x^{-1}u_n, i.e. u_n = x\,y_n,

y_n = -x^{n-1}\,[x^{-(n-1)}\,y_{n-1}]' \quad\Rightarrow\quad y_{n+1} = -x^n\,[x^{-n}\,y_n]'.   (A.94)

Singular and regular solution. We know from the Green's function that the omnidirectional solution should be proportional to g_0 \propto e^{-\mathrm{i}x}. The typical radial solution for an omnidirectional source field is chosen to be the spherical Hankel function of the second kind²

h_0^{(2)}(kr) = \frac{e^{-\mathrm{i}kr}}{-\mathrm{i}kr}, \qquad h_{n+1}^{(2)}(kr) = -(kr)^n\,\frac{\mathrm{d}}{\mathrm{d}(kr)}\Big[\frac{h_n^{(2)}(kr)}{(kr)^n}\Big].   (A.95)

However, this solution is not sufficient to solve problems without singularity at r = 0. We know that the function \frac{\sin(kr)}{kr} is finite at kr = 0, and so are all real parts of the spherical Hankel functions of the second kind, the spherical Bessel functions

j_0(kr) = \frac{\sin(kr)}{kr}, \qquad j_{n+1}(kr) = -(kr)^n\,\frac{\mathrm{d}}{\mathrm{d}(kr)}\Big[\frac{j_n(kr)}{(kr)^n}\Big].   (A.96)
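Expanding the product rule in the recurrence (A.94) gives y_{n+1} = \frac{n}{x}\,y_n - y_n', which holds for j_n and h_n^{(2)} alike and can be cross-checked against a standard library. A sketch (SciPy's spherical Bessel routines assumed; the sample grid is an arbitrary choice):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

x = np.linspace(0.5, 10.0, 50)   # arbitrary evaluation grid away from x = 0

def h2(n, x, derivative=False):
    # spherical Hankel function of the second kind, h_n^(2) = j_n - i*y_n
    return spherical_jn(n, x, derivative) - 1j * spherical_yn(n, x, derivative)

# Expanding the product rule: y_{n+1} = -x^n d/dx [x^{-n} y_n] = (n/x) y_n - y_n'
for n in range(4):
    j_next = n/x * spherical_jn(n, x) - spherical_jn(n, x, derivative=True)
    h_next = n/x * h2(n, x) - h2(n, x, derivative=True)
    assert np.allclose(j_next, spherical_jn(n + 1, x))
    assert np.allclose(h_next, h2(n + 1, x))
print("recursion (A.95)/(A.96) verified for n = 0..3")
```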
² Note that some scholars use the Fourier expansion e^{\mathrm{i}\omega t} with opposite sign, e^{-\mathrm{i}\omega t}, which requires the complex conjugate in every expression containing imaginary constants, h_n^{(1)} = h_n^{(2)*}.

The solutions are linearly independent. One checks after some calculation that their Wronskian is non-zero [15, Eq. 10.50.1]:

\det\begin{bmatrix} j_n(kr) & h_n^{(2)}(kr) \\ j_n'(kr) & h_n^{(2)\prime}(kr) \end{bmatrix} = j_n(kr)\,h_n^{(2)\prime}(kr) - j_n'(kr)\,h_n^{(2)}(kr) = -\frac{\mathrm{i}}{(kr)^2}.   (A.97)

Below, the Frobenius method is shown as an alternative way to get these functions.

Alternative way: Frobenius method. Given a second-order differential equation with singular coefficients, it can be solved by a generalized infinite power series:

y'' + \Big[\sum_{l=0}^{\infty} a_l x^l\Big]\,x^{-1}\,y' + \Big[\sum_{l=0}^{\infty} b_l x^l\Big]\,x^{-2}\,y = 0, \qquad \text{solution: } y = \sum_{k=0}^{\infty} c_k\,x^{k+\gamma}.   (A.98)

Insertion of the solution yields

\sum_{k=0}^{\infty} (k+\gamma-1)(k+\gamma)\,c_k\,x^{k+\gamma-2} + \sum_{k'=0}^{\infty}\sum_{l=0}^{\infty} [(k'+\gamma)\,a_l + b_l]\,c_{k'}\,x^{k'+l+\gamma-2} = 0;

an index shift k' + l = k, with l = 0 \ldots k, allows to pull out the common factor x^{k+\gamma-2}:

\sum_{k=0}^{\infty} \Big\{ (k+\gamma-1)(k+\gamma)\,c_k + \sum_{l=0}^{k} [(k-l+\gamma)\,a_l + b_l]\,c_{k-l} \Big\}\,x^{k+\gamma-2} = 0
\sum_{k=0}^{\infty} \Big\{ [(k+\gamma+a_0-1)(k+\gamma) + b_0]\,c_k + \sum_{l=1}^{k} [(k-l+\gamma)\,a_l + b_l]\,c_{k-l} \Big\}\,x^{k+\gamma-2} = 0.

The coefficient of every exponent of x in the above equation must be zero:

indicial equation for k = 0: \quad [(\gamma+a_0-1)\,\gamma + b_0]\,c_0 = 0,   (A.99)
indicial equation for k = 1: \quad [(\gamma+a_0)(\gamma+1) + b_0]\,c_1 + [a_1\gamma + b_1]\,c_0 = 0,   (A.100)
recurrence for k > 1: \quad c_k = -\frac{\sum_{l=1}^{k} [(k-l+\gamma)\,a_l + b_l]\,c_{k-l}}{(k+\gamma+a_0-1)(k+\gamma) + b_0}.   (A.101)

Depending on the specific values found for \gamma, the recurrence, etc., the Frobenius method suggests how to find or construct an independent pair of solutions.

Spherical Bessel differential equation. In y'' + 2x^{-1}y' + [-n(n+1) + x^2]\,x^{-2}\,y = 0, all a_l and b_l are zero except a_0 = 2, b_0 = -n(n+1), and b_2 = 1.
The indicial equations and the recurrence become

[\gamma(\gamma+1) - n(n+1)]\,c_0 = 0,   (A.102)
[(\gamma+1)(\gamma+2) - n(n+1)]\,c_1 = 0,   (A.103)
[(k+\gamma+1)(k+\gamma) - n(n+1)]\,c_k = -c_{k-2}.   (A.104)

We see that the recurrence is again a two-step recurrence, so that one can choose between an even solution using c_0 \neq 0, c_1 = 0, yielding \gamma = n or \gamma = -(n+1) [or an odd solution that won't be used, with c_0 = 0, c_1 \neq 0, yielding \gamma+1 = n or \gamma+1 = -(n+1)].

Spherical Bessel functions. The choice \gamma = n yields a solution converging everywhere: powers of x are all positive, and the recurrence c_k = -\frac{c_{k-2}}{k(k+2n+1)} yields a convergence radius R = \lim_{k\to\infty}\big|\frac{c_{k-2}}{c_k}\big| = \lim_{k\to\infty} k(k+2n+1) = \infty. With a starting value c_0 = \frac{2^n n!}{(2n+1)!}, the solutions are called spherical Bessel functions [5, Chap. 3.4],

j_n(x) = (2x)^n \sum_{k=0}^{\infty} \frac{(-1)^k\,(n+k)!}{k!\,[2(n+k)+1]!}\,x^{2k},   (A.105)

which are a physical set of regular solutions with an n-fold zero at 0. The spherical Bessel function for n = 0 is

j_0(x) = \frac{\sin x}{x}.   (A.106)

With the above recursive definition iterated [5, Eq. 3.4.15], one could define

j_{n+1} = -x^n\,\frac{\mathrm{d}}{\mathrm{d}x}\Big[\frac{j_n}{x^n}\Big], \qquad j_n = (-x)^n\,\Big(\frac{1}{x}\frac{\mathrm{d}}{\mathrm{d}x}\Big)^n j_0.   (A.107)

Spherical Neumann functions. For \gamma = -(n+1) and c_0 \neq 0, c_1 = 0, the recurrence c_k = -\frac{c_{k-2}}{k(k-2n-1)} yields the spherical Neumann functions with an (n+1)-fold pole at 0. They also obey the recursive definition from above,

y_0 = -\frac{\cos x}{x}, \qquad y_n = (-x)^n\,\Big(\frac{1}{x}\frac{\mathrm{d}}{\mathrm{d}x}\Big)^n y_0.   (A.108)

Spherical Hankel functions. The spherical Neumann and Bessel functions, based on either \cos or \sin, are clearly linearly independent. The spherical Bessel functions are useful to represent fields convergent everywhere. Physical source fields (Green's function) diverge at the source location and exhibit a specific way the phase radiates, with G \propto \frac{e^{-\mathrm{i}x}}{x}. The spherical Bessel and Neumann functions are asymptotically similar to [15, Eq. 10.52.3]
\lim_{x\to\infty} j_n(x) = \frac{\sin(x-\frac{n\pi}{2})}{x}, \qquad \lim_{x\to\infty} y_n(x) = -\frac{\cos(x-\frac{n\pi}{2})}{x},   (A.109)

therefore only their combination to spherical Hankel functions of the second kind,

h_n^{(2)} = j_n - \mathrm{i}\,y_n,   (A.110)

yields a useful physical set of singular solutions. They inherit their (n+1)-fold pole at 0 from the spherical Neumann functions. Their limiting form for large arguments is

\lim_{x\to\infty} h_n^{(2)}(x) = \lim_{x\to\infty} -x^{n-1}\,\frac{\mathrm{d}}{\mathrm{d}x}\Big[\frac{h_{n-1}^{(2)}}{x^{n-1}}\Big] = -\lim_{x\to\infty}\Big[-\frac{n-1}{x}\,h_{n-1}^{(2)} + \frac{\mathrm{d}}{\mathrm{d}x}\,h_{n-1}^{(2)}\Big] = -\lim_{x\to\infty}\frac{\mathrm{d}}{\mathrm{d}x}\,h_{n-1}^{(2)} = (-1)^n \lim_{x\to\infty}\frac{\mathrm{d}^n}{\mathrm{d}x^n}\,h_0^{(2)}(x) = \mathrm{i}^n\,h_0^{(2)}(x).   (A.111)

With Eqs. (A.110), (A.106) and (A.108), the zeroth-order spherical Hankel function is h_0^{(2)}(x) = \frac{e^{-\mathrm{i}x}}{-\mathrm{i}x}.

Alternative implementation by cylindrical functions. We can transform the spherical Bessel differential equation by inserting y = x^\alpha u and obtain, after division by x^\alpha,

u'' + \frac{2(\alpha+1)}{x}\,u' + \Big[1 + \frac{\alpha(\alpha+1) - n(n+1)}{x^2}\Big]\,u = 0.

For \alpha = -\frac{1}{2}, with \alpha(\alpha+1) - n(n+1) = -(n^2+n+\frac{1}{4}) = -(n+\frac{1}{2})^2, the equation for u becomes the Bessel differential equation

u'' + \frac{1}{x}\,u' + \Big[1 - \frac{(n+\frac{1}{2})^2}{x^2}\Big]\,u = 0.   (A.112)

Consequently, the spherical Bessel functions and spherical Hankel functions of the second kind can be implemented using the Bessel and Hankel functions found in any standard maths programming library. The specific relations are:

j_n(x) = \sqrt{\frac{\pi}{2x}}\;J_{n+\frac{1}{2}}(x), \qquad h_n^{(2)}(x) = \sqrt{\frac{\pi}{2x}}\;H_{n+\frac{1}{2}}^{(2)}(x).   (A.113)

A.6.5 Green's Function in Spherical Solutions, Angular Distributions, Plane Waves

We can write the inhomogeneous Helmholtz equation (\Delta + k^2)\,G = -\delta to be excited by a source at the direction \theta_0 at the radius r_0. We decompose the excitation into Dirac delta functions in radius and direction, -r_0^{-2}\,\delta(r-r_0)\,\delta(\theta_0^{\mathrm{T}}\theta - 1).
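The directional decomposition below relies on the orthonormality of the spherical harmonics, \int_{S^2} Y_n^m(\theta)\,Y_{n'}^{m'}(\theta)\,\mathrm{d}\theta = \delta_{nn'}\delta_{mm'}. A numerical sketch of this property (complex-valued Y_n^m assembled from SciPy's associated Legendre functions; the quadrature grid and the tested orders are arbitrary choices, and the book's real-valued convention differs only in the azimuth factor):

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Y(n, m, colat, azi):
    """Orthonormal complex spherical harmonic Y_n^m (m >= 0 here for brevity)."""
    norm = np.sqrt((2*n + 1)/(4*np.pi) * factorial(n - m)/factorial(n + m))
    return norm * lpmv(m, n, np.cos(colat)) * np.exp(1j*m*azi)

# Gauss-Legendre nodes in cos(colatitude) x uniform azimuth: exact for low orders
NQ = 16
x, w = np.polynomial.legendre.leggauss(NQ)
colat, azi = np.arccos(x), 2*np.pi*np.arange(2*NQ)/(2*NQ)
C, A = np.meshgrid(colat, azi, indexing="ij")
W = np.outer(w, np.full(2*NQ, 2*np.pi/(2*NQ)))       # surface quadrature weights

def inner(n, m, n2, m2):
    """Inner product of two spherical harmonics over the sphere."""
    return np.sum(W * Y(n, m, C, A) * np.conj(Y(n2, m2, C, A)))

print(abs(inner(3, 2, 3, 2)), abs(inner(3, 2, 4, 2)))   # close to 1 and 0
```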
The directional part need not be restricted to the spherical Dirac delta function, so we can take a distribution of sources at r_0, weighted by the panning function g(\theta),

(\Delta + k^2)\,p = -r_0^{-2}\,\delta(r-r_0)\,g(\theta).   (A.114)

From the spherical basis solutions, we know that at a radius other than r_0, p can be expanded into spherical harmonics

p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \psi_{nm}\,Y_n^m(\theta).   (A.115)

Acting on the decomposition of p, the directional part of the Laplacian yields the eigenvalue \Delta_{\varphi,\zeta}\,Y_n^m = -n(n+1)\,r^{-2}\,Y_n^m of the spherical harmonics, and its radial part is \Delta_r = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}, as around Eq. (6.11), hence

\sum_{n=0}^{\infty}\sum_{m=-n}^{n} \Big[\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + k^2 - \frac{n(n+1)}{r^2}\Big]\,\psi_{nm}\,Y_n^m(\theta) = -r_0^{-2}\,\delta(r-r_0)\,g(\theta).

Obviously, \psi_{nm} must depend on k and r, so we may pull the factor k into the differentials, \frac{\mathrm{d}}{\mathrm{d}r} = k\,\frac{\mathrm{d}}{\mathrm{d}(kr)}, to get the differential operator k^2\big[\frac{\mathrm{d}^2}{\mathrm{d}(kr)^2} + \frac{2}{kr}\frac{\mathrm{d}}{\mathrm{d}(kr)} + 1 - \frac{n(n+1)}{(kr)^2}\big], and observe kr as its variable on the left, replacing kr by x for brevity. Applying the factor k^{-2} and the spherical harmonics transform \int_{S^2} Y_n^m(\theta)\,\mathrm{d}\theta to the equation removes the double sum on the left (orthogonality) and decomposes the panning function g(\theta) on the right into \gamma_{nm}:

\Big[\frac{\mathrm{d}^2}{\mathrm{d}x^2} + \frac{2}{x}\frac{\mathrm{d}}{\mathrm{d}x} + 1 - \frac{n(n+1)}{x^2}\Big]\,\psi_{nm} = -(kr_0)^{-2}\,\delta(r-r_0)\,\gamma_{nm}.

We collect the x-independent term \gamma_{nm} as a factor of the solution, \psi_{nm} = y\,\gamma_{nm}, and get the inhomogeneous spherical Bessel differential equation

y'' + \frac{2}{x}\,y' + \Big[1 - \frac{n(n+1)}{x^2}\Big]\,y = -x_0^{-2}\,\delta(r-r_0).   (A.116)

As described, e.g., in [16, 17], the inhomogeneous differential equation can be solved by the Lagrangian variation of the parameters for equations of the type y'' + p\,y' + q\,y = r, knowing its independent homogeneous solutions y_1 = h_n^{(2)}(x) and y_2 = j_n(x). It uses a solution y = u\,y_1 + v\,y_2 with variable parameters u and v, which upon first and second-order differentiation becomes

y' = u\,y_1' + u'y_1 + v\,y_2' + v'y_2,
y'' = u\,y_1'' + 2u'y_1' + u''y_1 + v\,y_2'' + 2v'y_2' + v''y_2.
Inserted into the equation y'' + p\,y' + q\,y = r, this yields

u\,\underbrace{(y_1''+p\,y_1'+q\,y_1)}_{=0} + v\,\underbrace{(y_2''+p\,y_2'+q\,y_2)}_{=0} + u''y_1 + 2u'y_1' + v''y_2 + 2v'y_2' + p\,(u'y_1+v'y_2) = \Big(\frac{\mathrm{d}}{\mathrm{d}x}+p\Big)(u'y_1+v'y_2) + u'y_1' + v'y_2' = r.

Now two functions u and v are to be determined from only one equation, so we may pose an additional constraint. The above equation simplifies if the term (u'y_1 + v'y_2) vanishes. By this and the simplified equation, we get two conditions,

I: \quad u'y_1 + v'y_2 = 0,
II: \quad u'y_1' + v'y_2' = r,

and obtain by elimination with either A = I\,y_1' - II\,y_1 or B = I\,y_2' - II\,y_2

A: \quad v'\,\underbrace{(y_1'y_2 - y_1y_2')}_{-W} = -r\,y_1, \qquad B: \quad u'\,\underbrace{(y_1y_2' - y_1'y_2)}_{W} = -r\,y_2,

so that the solution y = u\,y_1 + v\,y_2 uses u = -\int \frac{r\,y_2}{W}\,\mathrm{d}x and v = \int \frac{r\,y_1}{W}\,\mathrm{d}x, with the Wronskian W = y_1y_2' - y_1'y_2. In our case, we have y_1 = h_n^{(2)}(x), y_2 = j_n(x), r = -x_0^{-2}\,\delta(r-r_0), and, from Eq. (A.97), W = h_n^{(2)}j_n' - h_n^{(2)\prime}j_n = \frac{\mathrm{i}}{x^2}, i.e. \frac{1}{W} = -\mathrm{i}\,x^2. With integration constants enforcing the physical solutions (no singularity inside, pure radiation outside):

y = -h_n^{(2)}(x)\int_0^x \mathrm{i}\,x'^2\,j_n(x')\,x_0^{-2}\,\delta(r-r_0)\,\mathrm{d}x' - j_n(x)\int_x^\infty \mathrm{i}\,x'^2\,h_n^{(2)}(x')\,x_0^{-2}\,\delta(r-r_0)\,\mathrm{d}x'.

To convert \delta(r-r_0) into \delta(x-x_0) with x = kr, we use \int \delta(x)\,\mathrm{d}x = \int \delta(r)\,\mathrm{d}r = 1 with \frac{\mathrm{d}x}{\mathrm{d}r} = k, hence \mathrm{d}x = k\,\mathrm{d}r, and obviously, by \int \delta(x)\,k\,\mathrm{d}r = \int \delta(r)\,\mathrm{d}r, we find \delta(r) = k\,\delta(x):

y = -h_n^{(2)}(x)\int_0^x \mathrm{i}\,x'^2\,j_n(x')\,k\,x_0^{-2}\,\delta(x'-x_0)\,\mathrm{d}x' - j_n(x)\int_x^\infty \mathrm{i}\,x'^2\,h_n^{(2)}(x')\,k\,x_0^{-2}\,\delta(x'-x_0)\,\mathrm{d}x'
  = -\mathrm{i}\,k \begin{cases} h_n^{(2)}(x)\,j_n(x_0), & \text{for } x \geq x_0,\\ j_n(x)\,h_n^{(2)}(x_0), & \text{for } x \leq x_0. \end{cases}

The solution becomes, after re-substituting x = kr and expanding \psi_{nm} = y\,\gamma_{nm} over the spherical harmonics p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\psi_{nm}\,Y_n^m(\theta):

p = -\mathrm{i}\,k \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \gamma_{nm}\,Y_n^m(\theta) \begin{cases} h_n^{(2)}(kr)\,j_n(kr_0), & \text{for } r \geq r_0,\\ j_n(kr)\,h_n^{(2)}(kr_0), & \text{for } r \leq r_0. \end{cases}   (A.117)
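For the point-source special case of this expansion, the spherical-harmonic addition theorem \sum_m Y_n^m(\theta)\,Y_n^m(\theta_0) = \frac{2n+1}{4\pi}\,P_n(\theta^{\mathrm{T}}\theta_0) collapses the sum over m, so the truncated modal sum can be cross-checked against the closed-form Green's function \frac{e^{-\mathrm{i}kR}}{4\pi R}. A numerical sketch (wave number, radii, angle, and truncation order are arbitrary choices):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def h2(n, x):
    # spherical Hankel function of the second kind, h_n^(2) = j_n - i*y_n
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)

k, r, r0, Theta = 2.0, 0.7, 1.5, 0.8    # field radius, source radius, angle
N = 40                                   # truncation order of the modal sum

# modal expansion with the addition theorem collapsing the sum over m
G_modal = sum(-1j*k * (2*n + 1)/(4*np.pi) * eval_legendre(n, np.cos(Theta))
              * spherical_jn(n, k*min(r, r0)) * h2(n, k*max(r, r0))
              for n in range(N + 1))

# closed-form Green's function between field point and source point
R = np.sqrt(r**2 + r0**2 - 2*r*r0*np.cos(Theta))
G_closed = np.exp(-1j*k*R) / (4*np.pi*R)
print(abs(G_modal - G_closed))           # small: the expansion converges
```

The fast decay of j_n(kr) beyond n \approx kr makes the truncated sum converge quickly at interior radii.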
Green's function. For the Green's function at the direction \theta_0, the angular panning function is expanded as \gamma_{nm} = Y_n^m(\theta_0), and we get the formulation of the Green's function in terms of spherical basis functions:

G = -\mathrm{i}\,k \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\theta_0)\,Y_n^m(\theta) \begin{cases} h_n^{(2)}(kr)\,j_n(kr_0), & \text{for } r \geq r_0,\\ j_n(kr)\,h_n^{(2)}(kr_0), & \text{for } r \leq r_0. \end{cases}   (A.118)

Plane waves/far-field approximation. Equation (6.7) in Sect. 6.3.1 formulates plane waves p = e^{\mathrm{i}k\theta_0^{\mathrm{T}}r} as the far-field limit p = \lim_{r_0\to\infty} \frac{4\pi}{-\mathrm{i}k\,h_0^{(2)}(kr_0)}\,G. Using a distribution of plane waves driven by the gains g(\theta) = \sum_{n,m}\gamma_{nm}\,Y_n^m from Eq. (A.117) consequently yields, with \lim_{r_0\to\infty} h_n^{(2)}(kr_0) = \mathrm{i}^n\,h_0^{(2)}(kr_0),

p = 4\pi \sum_{n=0}^{\infty}\sum_{m=-n}^{n} j_n(kr)\,\lim_{r_0\to\infty}\frac{h_n^{(2)}(kr_0)}{h_0^{(2)}(kr_0)}\;Y_n^m(\theta)\,\gamma_{nm} = 4\pi \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \mathrm{i}^n\,j_n(kr)\,Y_n^m(\theta)\,\gamma_{nm},   (A.119)

or, for a single plane-wave direction \gamma_{nm} = Y_n^m(\theta_0),

p = 4\pi \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \mathrm{i}^n\,j_n(kr)\,Y_n^m(\theta)\,Y_n^m(\theta_0).   (A.120)

A.7 Sine and Tangent Law

The sine and tangent law [18] observes the sound pressure of plane waves at the two locations x = 0, y = \pm d at ear distance, in order to simulate the ear signals. A plane wave from the left half of the room, from the angle \varphi > 0, first arrives at the left ear, p_\mathrm{left} = e^{\mathrm{i}kd\sin\varphi}, and later at the right one, p_\mathrm{right} = e^{-\mathrm{i}kd\sin\varphi}. The phase difference is \Delta\varphi = 2kd\sin\varphi.

A superimposed pair of plane waves from the directions \pm\alpha arrives at the left ear as p_\mathrm{left} = g_1\,e^{\mathrm{i}kd\sin\alpha} + g_2\,e^{-\mathrm{i}kd\sin\alpha}, and at the right ear as p_\mathrm{right} = g_1\,e^{-\mathrm{i}kd\sin\alpha} + g_2\,e^{\mathrm{i}kd\sin\alpha} = p_\mathrm{left}^*. The phase difference \Delta_{\pm\alpha} = 2\angle p_\mathrm{left} = 2\arctan\frac{(g_1-g_2)\sin(kd\sin\alpha)}{(g_1+g_2)\cos(kd\sin\alpha)} can be linearized for long wavelengths, kd \to 0, to \Delta_{\pm\alpha} \approx 2\arctan\big[kd\,\frac{g_1-g_2}{g_1+g_2}\sin\alpha\big] \approx 2\,\frac{g_1-g_2}{g_1+g_2}\,kd\sin\alpha. Comparing the phase difference of the single plane wave with the one of the superimposed pair, 2kd\sin\varphi = 2kd\,\frac{g_1-g_2}{g_1+g_2}\sin\alpha, one arrives at the sine law

\sin\varphi = \frac{g_1-g_2}{g_1+g_2}\,\sin\alpha.
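The sine law can be reproduced numerically by superimposing the two plane-wave pressures and reading off the interaural phase. A sketch (the gains, panning angle, and the small value of kd are arbitrary choices for the low-frequency regime):

```python
import numpy as np

def panned_angle_sine_law(g1, g2, alpha):
    """Perceived plane-wave angle from the sine law for gains at +/-alpha."""
    return np.arcsin((g1 - g2) / (g1 + g2) * np.sin(alpha))

# cross-check against the interaural phase of the superposed plane-wave pair
kd, alpha = 0.05, np.radians(30)   # small kd: long-wavelength approximation
g1, g2 = 0.8, 0.2
p_left = g1*np.exp(1j*kd*np.sin(alpha)) + g2*np.exp(-1j*kd*np.sin(alpha))
dphi_pair = 2*np.angle(p_left)     # interaural phase difference of the pair
phi = np.arcsin(dphi_pair / (2*kd))  # plane-wave angle with the same phase
print(np.degrees(phi), np.degrees(panned_angle_sine_law(g1, g2, alpha)))
```

Both values nearly coincide; the residual difference shrinks with kd, as expected from the linearization.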
If we claim that our hearing possesses the ability to estimate not only the interaural phase difference but also its derivative with regard to head rotation, \frac{\partial}{\partial\delta}, we arrive at a value pair of binaural features (\Delta\varphi, \frac{\partial\Delta\varphi}{\partial\delta}) = 2kd\,(\sin\varphi, \cos\varphi) that should be matched by the one of the stereophonic plane-wave pair. For stereo, the phase difference derived with regard to head rotation is

2kd\,\frac{\partial}{\partial\delta}\,\frac{g_1\sin(\alpha+\delta) + g_2\sin(-\alpha+\delta)}{g_1+g_2}\Big|_{\delta=0} = 2kd\,\frac{g_1+g_2}{g_1+g_2}\,\cos\alpha = 2kd\cos\alpha,

and yields (\Delta_{\pm\alpha}, \frac{\partial\Delta_{\pm\alpha}}{\partial\delta}) = 2kd\,\big(\frac{g_1-g_2}{g_1+g_2}\sin\alpha, \cos\alpha\big).

In polar coordinates, the radius of both value pairs differs. While the plane wave yields a value pair at the radius 2kd in the binaural feature space, the stereophonic pair reaches the radius 2kd only at \pm\alpha, at which one of the two gains must vanish, and amplitude panning can be used to connect these two points 2kd\,(\pm\sin\alpha, \cos\alpha) by a straight line. The plane wave with the most similar feature pair must lie at the same polar angle. We may equate the tangents of both value pairs, \frac{\Delta\varphi}{\partial\Delta\varphi/\partial\delta} = \frac{\Delta_{\pm\alpha}}{\partial\Delta_{\pm\alpha}/\partial\delta}, and obtain the tangent law:

\tan\varphi = \frac{g_1-g_2}{g_1+g_2}\,\tan\alpha.

If, instead of the angle of the plane wave with the features closest to those of a given amplitude difference, the amplitude difference whose features are closest to those of a given plane wave is searched, then the sine law is the better match, even in the two-dimensional feature space.

References

1. S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah, The experience of mathematical beauty and its neural correlates. Front. Hum. Neurosci. 8, e00068 (2014)
2. Mathematical signs and symbols for use in physical sciences and technology (1978)
3. ISO 80000-2, Quantities and units, Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
4. G. Dickins, Sound field representation, reconstruction and perception. Master's thesis, Australian National University, Canberra (2003)
5. W. Cassing, H. van Hees, Mathematische Methoden für Physiker (Goethe-Universität Frankfurt, 2014), https://th.physik.uni-frankfurt.de/~hees/publ/maphy.pdf
6. A. Sommerfeld, Vorlesungen über Theoretische Physik, Band 6: Partielle Differentialgleichungen in der Physik, 6th edn. (Harri Deutsch, 1978; reprint 1992)
7. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
8. J.R. Driscoll, D.M.J. Healy, Computing Fourier transforms and convolutions on the 2-sphere. Adv. Appl. Math. 15, 202–250 (1994)
9. Wikipedia contributors, Ideal gas, https://en.wikipedia.org/wiki/Ideal_gas. Accessed Dec 2017
10. G. Franz, Physik III: Thermodynamik (Hochschule München, 2014), http://www.fb06.fhmuenchen.de/fbalt/queries/download.php?id=6696
11. Wikipedia contributors, Enthalpy, https://en.wikipedia.org/wiki/Enthalpy. Accessed Dec 2017
12. Wikipedia contributors, First law of thermodynamics, https://en.wikipedia.org/wiki/First_law_of_thermodynamics. Accessed Dec 2017
13. Wikipedia contributors, Adiabatic process, https://en.wikipedia.org/wiki/Adiabatic_process. Accessed Dec 2017
14. D.G. Duffy, Green's Functions with Applications (CRC Press, Boca Raton, 2001)
15. F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), NIST Handbook of Mathematical Functions (Cambridge University Press, Cambridge, 2010), http://dlmf.nist.gov. Accessed June 2012
16. E. Kreyszig, Advanced Engineering Mathematics, 8th edn. (Wiley, New York, 1999)
17. U.H. Gerlach, Linear Mathematics in Infinite Dimensions: Signals, Boundary Value Problems, and Special Functions, Beta edn. (2007), http://www.math.ohio-state.edu/~gerlach/math/BVtypset/
18. H. Clark, G. Dutton, P.B. Vanderlyn, The 'stereosonic' recording and reproduction system, reprinted in J. Audio Eng. Soc. 6(2), 102–117 (1958), from Proceedings of the Institution of Electrical Engineers (1957)