

Springer Topics in Signal Processing
Franz Zotter
Matthias Frank
Ambisonics
A Practical 3D Audio Theory
for Recording, Studio Production,
Sound Reinforcement,
and Virtual Reality
Springer Topics in Signal Processing
Volume 19
Series Editors
Jacob Benesty, INRS-EMT, University of Quebec, Montreal, QC, Canada
Walter Kellermann, Friedrich-Alexander-Universität Erlangen-Nürnberg,
Erlangen, Germany
The aim of the Springer Topics in Signal Processing series is to publish very high
quality theoretical works, new developments, and advances in the field of signal
processing research. Important applications of signal processing will be covered as
well. Within the scope of the series are textbooks, monographs, and edited books.
More information about this series at http://www.springer.com/series/8109
Franz Zotter • Matthias Frank
Ambisonics
A Practical 3D Audio Theory for Recording,
Studio Production, Sound Reinforcement,
and Virtual Reality
Franz Zotter
Institute of Electronic Music and Acoustics
University of Music and Performing Arts
Graz, Austria
Matthias Frank
Institute of Electronic Music and Acoustics
University of Music and Performing Arts
Graz, Austria
ISSN 1866-2609
ISSN 1866-2617 (electronic)
Springer Topics in Signal Processing
ISBN 978-3-030-17206-0
ISBN 978-3-030-17207-7 (eBook)
https://doi.org/10.1007/978-3-030-17207-7
© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to
the original author(s) and the source, provide a link to the Creative Commons license and indicate if
changes were made.
The images or other third party material in this book are included in the book’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The intention of this textbook is to provide a concise explanation of fundamentals
and background of the surround sound recording and playback technology
Ambisonics.
Although Ambisonic technology has been practiced in the academic world for
quite some time, the recent ITU,1 MPEG-H,2 and ETSI3 standards are now
firmly fixing it in the production and media broadcasting world.
What is more, the Internet giants Google/YouTube recently recommended tools
that are well adopted from current academic practice.4,5
Last but most importantly, the boost given to Ambisonic technology by
recent advancements has been in usability: ways to obtain safe Ambisonic decoders,6,7 the availability of higher-order Ambisonic main microphone arrays
(Eigenmike,8 Zylia9) and their filter-design theory, and above all the usability
gained by plugins integrating higher-order Ambisonic production into digital audio
workstations or mixers.7,10,11,12,13,14,15 This progress was a great motivation to
write a book about the basics.
1 https://www.itu.int/rec/R-REC-BS.2076/en.
2 https://www.iso.org/standard/69561.html.
3 https://www.techstreet.com/standards/etsi-ts-103-491?product_id=1987449.
4 https://support.google.com/jump/answer/6399746?hl=en.
5 https://developers.google.com/vr/concepts/spatial-audio.
6 https://bitbucket.org/ambidecodertoolbox/adt.git.
7 https://plugins.iem.at/.
8 https://mhacoustics.com/products.
9 https://www.zylia.co.
10 http://www.matthiaskronlachner.com/?p=2015.
11 http://www.blueripplesound.com/product-listings/pro-audio.
12 https://b-com.com/en/bcom-spatial-audio-toolbox-render-plugins.
13 https://harpex.net/.
14 http://forumnet.ircam.fr/product/panoramix-en/.
15 http://research.spa.aalto.fi/projects/sparta_vsts/.
The book is dedicated to providing a deeper understanding of Ambisonic technologies, especially for, but not limited to, readers who are scientists, audio-system
engineers, and audio recording engineers. As, from time to time, the underlying
mathematics would get too long for practical readability, the book comes with a comprehensive appendix containing the beautiful mathematical details.
For a common understanding, the introductory section spans a perspective on
Ambisonics from its origins in coincident recordings from the 1930s, to the
Ambisonic concepts from the 1970s, and to classical ways of applying Ambisonics
in first-order coincident sound scene recording and reproduction that have been
practiced from the 1980s on.
In its main contents, this book intends to provide all the psychoacoustical, signal
processing, acoustical, and mathematical knowledge needed to understand the inner
workings of modern processing utilities and special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. As advanced
outcomes, the book aims to explain higher-order Ambisonic decoding, 3D
audio effects, and higher-order Ambisonic recording with microphones or main
microphone arrays. These techniques are shown to be suitable for supplying audience
areas ranging from studio-sized to hundreds of listeners, or headphone-based
playback, regardless of whether the 3D audio material is live, interactive, or
studio-produced.
The book comes with various practical examples based on free software tools
and open scientific data for reproducible research.
Our Ambisonic events experience: In the past years, we have contributed to
organizing Symposia on Ambisonics (Ambisonics Symposium 2009 in Graz, 2010
in Paris, 2011 in Lexington, 2012 in York, 2014 in Berlin), and demonstrated and
brought the technology to various winter/summer schools and conferences (EAA
Winter School Merano 2013, EAA Symposium Berlin 2014, workshops and
Ambisonic music repertory demonstration at the Darmstädter Ferienkurse für Neue
Musik in 2014, ICAD workshop in Graz 2015, ICSA workshop 2015 in Graz with
PURE Ambisonics night, summer school at ICSA 2017 in Graz, a course at the Kraków
film music festival 2015, mAmbA demo facility at DAGA in Aachen 2016, Al Di
Meola's live 3D audio concert hosted in Graz in June 2016, and AES Convention
Milano 2018).
In 2017 (ICSA Graz) and 2018 (TMT Cologne), we initiated and organized
Europe’s First and Second Student 3D Audio Production Competition together with
Markus Zaunschirm and Daniel Rudrich.
Graz, Austria
February 2019
Franz Zotter
Matthias Frank
Acknowledgements
To our lab and colleagues: Traditionally at the Institute of Electronic Music and
Acoustics (IEM), there had been a lot of activity in developing and applying
Ambisonics by Robert Höldrich, Alois Sontacchi, Markus Noisternig, Thomas
Musil, Johannes Zmölnig, and Winfried Ritsch, even before our active time of research.
Most of these developments were done in pure-data, e.g., with [iem_ambi],
[iem_bin_ambi], [iem_matrix], and the CUBEmixer. Dear colleagues who contributed
a lot of skill to improve the usability of Ambisonics deserve to be mentioned:
Hannes Pomberger and his mathematical talent, Matthias Kronlachner, who
developed the ambix and mcfx VST plugin suites in 2014, and Daniel Rudrich,
who developed the IEM Plugin Suite, which also involves technology that was
elaborated together with our colleagues Markus Zaunschirm, Christian
Schörkhuber, and Sebastian Grill.
We thank you all for your support; it’s the best environment to work in!
First readers: We thank Archontis Politis (Aalto Univ., Espoo and Tampere Univ.,
Finland), Nicolas Epain (b<>com, France), and Matthias Kronlachner (Harman,
Germany/US) for being our first critical readers and supplying us with valuable comments.
Open Access Funding: We are grateful for funding from our local government of Styria (Steiermark), Section 8, Office for Science and Research
(Wissenschaft und Forschung), which covers roughly half of the open access publishing
costs. We gratefully thank our University (University of Music and Performing
Arts, “Kunstuni”, Graz) for the other half, as well as its transition-to-open-access
library and the vice rectorate for research.
Outline
First-order Ambisonics is nowadays strongly revived by internet technology: Google/YouTube and Facebook 360°, 360° audio and video recording and
rendering, as well as VR in games. This renaissance lies in its benefits: (i) its
compact main microphone arrays capture the entire surrounding sound scene in
only four audio channels (e.g., Zoom H3-VR, Oktava A-Format Microphone, Røde
NT-SF1, Sennheiser AMBEO VR Mic), and (ii) the format easily permits rotation of the
sound scene, allowing surround audio scenes to be rendered, e.g., on head-tracked
headphones, head-mounted AR/VR sets, or mobile devices, as described in Chap. 1.
Auditory events and vector-base panning: Chapter 2 of this book is dedicated to
conveying a comprehensive understanding of the localization impressions in
multi-loudspeaker playback and its models, followed by Chap. 3 that outlines the
essentials of practical vector panning models and their extensions by downmix from
imaginary loudspeakers, which are both fundamental to contemporary Ambisonics.
Harmonic functions, Ambisonic encoding and decoding: Based on the ideals of
accurate localization with panning-invariant loudness and perceived width, Chap. 4
provides a profound mathematical derivation of higher-order Ambisonic panning
functions in 2D and 3D in terms of angular harmonics. These idealized functions
can be maximized in their directional focus (max-rE), and they are strictly limited in
their directional resolution. This resolution limit entails well-defined
constraints on loudspeaker layouts that let us reach ideal measures for accurate
localization as well as panning-invariant loudness and width. Highly
relevant for practical decoding, All-Round Ambisonic decoding to loudspeakers
and TAC/MagLS decoders for headphones are explained in Chap. 4.
The Ambisonic signal processing chain and effects are described in Chap. 5. It
illustrates the signal flow from source encoding through Ambisonic bus to decoding
and where input-specific or general insert and auxiliary Ambisonic effects are
located. In particular, the chapter describes the working principles behind
frequency-independent manipulation effects that are either mirroring/rotating/
re-mapping, warping, or directionally weighting, or such effects that are
frequency-dependent. Frequency-dependent effects can introduce widening, depth
or diffuseness, convolution reverb, or feedback-delay-network (FDN)-based diffuse
reverberation. Directional resolution enhancements are outlined in terms of
SDM/SIRR pre-processing of recorded reverberation and in terms of available tools
such as HARPEX, DirAC, and COMPASS for recorded signals.
Compact higher-order Ambisonic microphones rely on the solutions of the
Helmholtz equation. Their processing uses a frequency-independent decomposition of the spherical array signals into spherical harmonics and the
frequency-dependent radial-focusing filtering associated with each spherical harmonic order, which yield the Ambisonic signals. The critical part is to handle the
properties of the radial-focusing filters in the processing of higher-order Ambisonic
microphone arrays (e.g., the Eigenmike). To keep the noise level and the side lobes
in the recordings low while maintaining a balanced frequency response, a careful radial
filter design is outlined in Chap. 6.
Compact higher-order loudspeaker arrays oppose the otherwise inwards-oriented Ambisonic surround playback, as described in Chap. 7. This final,
outlook-like chapter discusses the IKO and loudspeaker cubes as compact spherical loudspeaker
arrays with Ambisonically controlled radiation patterns. In natural environments
with acoustic reflections, such directivity-controlled arrays have their own
sound-projecting and distance-changing effects, and they can be used to simulate
sources of specific directivity patterns.
Contents

1 XY, MS, and First-Order Ambisonics . . . 1
1.1 Blumlein Pair: XY Recording and Playback . . . 2
1.2 MS Recording and Playback . . . 3
1.3 First-Order Ambisonics (FOA) . . . 5
1.3.1 2D First-Order Ambisonic Recording and Playback . . . 6
1.3.2 3D First-Order Ambisonic Recording and Playback . . . 9
1.4 Practical Free-Software Examples . . . 13
1.4.1 Pd with Iemmatrix, Iemlib, and Zexy . . . 13
1.4.2 Ambix VST Plugins . . . 13
1.5 Motivation of Higher-Order Ambisonics . . . 18
References . . . 21

2 Auditory Events of Multi-loudspeaker Playback . . . 23
2.1 Loudness . . . 24
2.2 Direction . . . 25
2.2.1 Time Differences on Frontal, Horizontal Loudspeaker Pair . . . 25
2.2.2 Level Differences on Frontal, Horizontal Loudspeaker Pair . . . 25
2.2.3 Level Differences on Horizontally Surrounding Pairs . . . 27
2.2.4 Level Differences on Frontal, Horizontal to Vertical Pairs . . . 28
2.2.5 Vector Models for Horizontal Loudspeaker Pairs . . . 28
2.2.6 Level Differences on Frontal Loudspeaker Triangles . . . 31
2.2.7 Level Differences on Frontal Loudspeaker Rectangles . . . 31
2.2.8 Vector Model for More than 2 Loudspeakers . . . 32
2.2.9 Vector Model for Off-Center Listening Positions . . . 32
2.3 Width . . . 34
2.3.1 Model of the Perceived Width . . . 35
2.4 Coloration . . . 36
2.5 Open Listening Experiment Data . . . 38
References . . . 38

3 Amplitude Panning Using Vector Bases . . . 41
3.1 Vector-Base Amplitude Panning (VBAP) . . . 42
3.2 Multiple-Direction Amplitude Panning (MDAP) . . . 44
3.3 Challenges in 3D Triangulation: Imaginary Loudspeaker Insertion and Downmix . . . 48
3.4 Practical Free-Software Examples . . . 50
3.4.1 VBAP/MDAP Object for Pd . . . 50
3.4.2 SPARTA Panner Plugin . . . 51
References . . . 52

4 Ambisonic Amplitude Panning and Decoding in Higher Orders . . . 53
4.1 Direction Spread in First-Order 2D Ambisonics . . . 54
4.2 Higher-Order Polynomials and Harmonics . . . 57
4.3 Angular/Directional Harmonics in 2D and 3D . . . 58
4.4 Panning with Circular Harmonics in 2D . . . 58
4.5 Ambisonics Encoding and Optimal Decoding in 2D . . . 61
4.6 Listening Experiments on 2D Ambisonics . . . 61
4.7 Panning with Spherical Harmonics in 3D . . . 67
4.8 Ambisonic Encoding and Optimal Decoding in 3D . . . 71
4.9 Ambisonic Decoding to Loudspeakers . . . 72
4.9.1 Sampling Ambisonic Decoder (SAD) . . . 72
4.9.2 Mode Matching Decoder (MAD) . . . 73
4.9.3 Energy Preservation on Optimal Layouts . . . 73
4.9.4 Loudness Deficiencies on Sub-optimal Layouts . . . 74
4.9.5 Energy-Preserving Ambisonic Decoder (EPAD) . . . 74
4.9.6 All-Round Ambisonic Decoding (AllRAD) . . . 75
4.9.7 EPAD and AllRAD on Sub-optimal Layouts . . . 77
4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts . . . 77
4.10 Practical Studio/Sound Reinforcement Application Examples . . . 81
4.11 Ambisonic Decoding to Headphones . . . 85
4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC) . . . 88
4.11.2 Magnitude Least Squares (MagLS) . . . 89
4.11.3 Diffuse-Field Covariance Constraint . . . 90
4.12 Practical Free-Software Examples . . . 91
4.12.1 Pd and Circular/Spherical Harmonics . . . 91
4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder . . . 92
4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder . . . 95
References . . . 96

5 Signal Flow and Effects in Ambisonic Productions . . . 99
5.1 Embedding of Channel-Based, Spot-Microphone, and First-Order Recordings . . . 101
5.2 Frequency-Independent Ambisonic Effects . . . 103
5.2.1 Mirror . . . 104
5.2.2 3D Rotation . . . 106
5.2.3 Directional Level Modification/Windowing . . . 107
5.2.4 Warping . . . 109
5.3 Parametric Equalization . . . 110
5.4 Dynamic Processing/Compression . . . 111
5.5 Widening (Distance/Diffuseness/Early Lateral Reflections) . . . 112
5.6 Feedback Delay Networks for Diffuse Reverberation . . . 114
5.7 Reverberation by Measured Room Impulse Responses and Spatial Decomposition Method in Ambisonics . . . 116
5.8 Resolution Enhancement: DirAC, HARPEX, COMPASS . . . 119
5.9 Practical Free-Software Examples . . . 120
5.9.1 IEM, ambix, and mcfx Plug-In Suites . . . 120
5.9.2 Aalto SPARTA . . . 125
5.9.3 Røde . . . 126
References . . . 127

6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless) . . . 131
6.1 Equation of Compression . . . 132
6.2 Equation of Motion . . . 132
6.3 Wave Equation . . . 133
6.3.1 Elementary Inhomogeneous Solution: Green’s Function (Free Field) . . . 133
6.4 Basis Solutions in Spherical Coordinates . . . 135
6.5 Scattering by Rigid Higher-Order Microphone Surface . . . 137
6.6 Higher-Order Microphone Array Encoding . . . 139
6.7 Discrete Sound Pressure Samples in Spherical Harmonics . . . 141
6.8 Regularizing Filter Bank for Radial Filters . . . 142
6.9 Loudness-Normalized Sub-band Side-Lobe Suppression . . . 145
6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression . . . 146
6.11 Practical Free-Software Examples . . . 148
6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM Plug-In Suites . . . 148
6.11.2 SPARTA Array2SH . . . 150
References . . . 151

7 Compact Spherical Loudspeaker Arrays . . . 153
7.1 Auditory Events of Ambisonically Controlled Directivity . . . 154
7.1.1 Perceived Distance . . . 154
7.1.2 Perceived Direction . . . 154
7.2 First-Order Compact Loudspeaker Arrays and Cubes . . . 155
7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO . . . 157
7.3.1 Directivity Control . . . 159
7.3.2 Control System and Verification Based on Measurements . . . 162
7.4 Auditory Objects of the IKO . . . 164
7.4.1 Static Auditory Objects . . . 164
7.4.2 Moving Auditory Objects . . . 165
7.5 Practical Free-Software Examples . . . 166
7.5.1 IEM Room Encoder and Directivity Shaper . . . 166
7.5.2 IEM Cubes 5.1 Player and Surround with Depth . . . 167
7.5.3 IKO . . . 169
References . . . 169

Appendix . . . 171
Chapter 1
XY, MS, and First-Order Ambisonics
Directionally sensitive microphones may be of the light moving
strip type. […] the strips may face directions at 45◦ on each side
of the centre line to the sound source.
Alan Dower Blumlein [1], Patent, 1931
Abstract This chapter describes first-order Ambisonic technologies starting from
classical coincident audio recording and playback principles from the 1930s until
the invention of first-order Ambisonics in the 1970s. Coincident recording is based
on arrangements of directional microphones at the smallest possible spacings in
between. Hereby, incident sound arrives with approximately equal delay at all microphones. Intensity-based coincident stereophonic recording such as XY and MS
typically yields stable directional playback on a stereophonic loudspeaker pair. While
the stereo width is adjustable by MS processing, the directional mapping of first-order Ambisonics is a bit more rigid: the omnidirectional and figure-of-eight recording pickup patterns are reproduced unaltered by equivalent patterns in playback. In
perfect appreciation of the benefits of coincident first-order Ambisonic recording
technologies in VR and field recording, the chapter gives practical examples for
encoding, headphone- and loudspeaker-based decoding. It concludes with a desire
for a higher-order Ambisonics format to get a larger sweet area and accommodate first-order resolution-enhancement algorithms, the embedding of alternative,
channel-based recordings, etc.
Intensity-based coincident stereophonic recording such as XY uses two figure-of-eight microphones with an angular
spacing of 90°, after Blumlein’s original work [1] from the 1930s (see [2–4]). Another representative, MS, uses an omnidirectional and
a lateral figure-of-eight microphone [2]. Both typically yield a stable directional
playback in stereo, but signals often get too correlated, yielding a lack in depth
and diffuseness of the recording space when played back [5, 6] and compared to
delay-based AB stereophony or equivalence-based alternatives.
Gerzon’s work in the 1970s [7] gave us what we call first-order Ambisonic recording and playback technology today. Ambisonics preserves the directional mapping
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_1
by recording and reproducing with spatially undistorted omnidirectional and figure-of-eight patterns on circularly (2D) or spherically (3D) surrounding loudspeaker
layouts.
1.1 Blumlein Pair: XY Recording and Playback
The XY technique dates back to Blumlein’s patent from the 1930s [1] and his patents
thereafter [4]. Back then, manufacturers started producing ribbon microphones, nowadays outdated, that offered a means to record with figure-of-eight pickup patterns.
Blumlein Pair using 90◦ -angled figure-of-eight microphones (XY). Blumlein’s
classic coincident microphone pair [3, Fig. 3] uses two figure-of-eight microphones
pointing to ±45°, see Fig. 1.1. Its directional pickup pattern is described by cos φ,
where φ is the angle enclosed by the microphone aiming and the sound source. Using a mathematically positive coordinate definition for X (front-right) and Y (front-left), with the polar
angle ϕ = 0 aiming at the front, the figure-of-eight X uses the angle φ = ϕ + 45°
and Y the angle φ = ϕ − 45°, so that the pickup pattern of the microphone pair is:

$$\mathbf{g}_\mathrm{XY}(\varphi)=\begin{bmatrix}\cos(\varphi+45^\circ)\\ \cos(\varphi-45^\circ)\end{bmatrix}.\tag{1.1}$$
Assuming a signal s coming from the angle ϕ, the signals recorded are $[X, Y]^\mathrm{T} = \mathbf{g}_\mathrm{XY}(\varphi)\, s$.
Sound sources from the left 45◦ , the front 0◦ and the right −45◦ will be received by
the pair of gains:
Fig. 1.1 Blumlein pair consisting of 90°-angled figure-of-eight microphones: (a) Blumlein XY pair; (b) picture of the recording setup
$$\text{right: } \mathbf{g}_\mathrm{XY}(-45^\circ)=\begin{bmatrix}1\\0\end{bmatrix},\qquad
\text{center: } \mathbf{g}_\mathrm{XY}(0^\circ)=\begin{bmatrix}\tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{2}}\end{bmatrix},\qquad
\text{left: } \mathbf{g}_\mathrm{XY}(45^\circ)=\begin{bmatrix}0\\1\end{bmatrix}.$$
Obviously, a source moving from the right −45◦ to the left 45◦ pans the signal from
the channel X to the channel Y. This property provides a strongly perceivable lateralization of lateral sources when feeding the left and right channel of a stereophonic
loudspeaker pair by Y and X, respectively.
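As a quick numerical check of Eq. (1.1), a minimal sketch with NumPy (our own illustration, not from the book; the helper name xy_gains is hypothetical) reproduces the right/center/left gain triple:

```python
import numpy as np

def xy_gains(phi_deg):
    """Blumlein XY pickup gains of Eq. (1.1): X aims front-right (-45 deg),
    Y front-left (+45 deg), each with a cos(phi) figure-of-eight pattern."""
    phi = np.radians(phi_deg)
    return np.array([np.cos(phi + np.pi / 4),   # X channel
                     np.cos(phi - np.pi / 4)])  # Y channel

for src in (-45, 0, 45):  # source from the right, center, left
    print(src, np.round(xy_gains(src), 4))
```

Running the loop confirms that a right-hand source lands entirely in X, a frontal source splits equally with gain 1/√2 per channel, and a left-hand source lands entirely in Y.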
However, ideally there should not be any dominant sounds arriving from the
sides, as for the source angles between −135◦ ≤ ϕ ≤ −45◦ and 45◦ ≤ ϕ ≤ 135◦ the
Blumlein pair produces out-of-phase signals between X and Y. The back directions
are mapped with consistent sign again, however, left-right reversed. It is only possible
to avoid this by decreasing the angle between the microphone pair, which, however,
would make the stereo image narrower.
Therefore, coincident XY recording pairs nowadays most often use cardioid directivities $\frac{1}{2}+\frac{1}{2}\cos\varphi$, instead. They receive all directions without sign change and easily permit stereo width adjustments by varying the angle between the microphones.
1.2 MS Recording and Playback
Blumlein’s patent [1] considers sum and difference signals between a pair of channels/microphones, yielding M-S stereophony. In M-S [8], the sum signal represents
the mid (omnidirectional, sometimes cardioid-directional to front) and the difference the side signal (figure-of-eight). MS recordings can also be taken with cardioid
microphones and permit manipulation of the stereo width of the recording.
MS recording by omnidirectional and figure-of-eight microphone (native MS).
Mid-side recording can be done by using a pair of coincident microphones with
an omnidirectional (mid, W) and a side-ways oriented figure-of-eight (side, Y)
directivity, see Fig. 1.2. The pair of pickup patterns is described by the vector:

$$\mathbf{g}_\mathrm{WY}(\varphi)=\begin{bmatrix}1\\ \sin(\varphi)\end{bmatrix}\tag{1.2}$$

Fig. 1.2 Native mid-side recording with the coincident arrangement of an omnidirectional microphone heading front and a figure-of-eight microphone heading left: (a) native MS recording; (b) picture of the recording setup
that depends on the angle ϕ of the sound source. Equation (1.2) maps a single sound
s from ϕ to the mid W and side Y signals by the gains $[W, Y]^\mathrm{T} = \mathbf{g}_\mathrm{WY}(\varphi)\, s$:

$$\text{left: } \mathbf{g}_\mathrm{WY}(90^\circ)=\begin{bmatrix}1\\1\end{bmatrix},\qquad
\text{center: } \mathbf{g}_\mathrm{WY}(0^\circ)=\begin{bmatrix}1\\0\end{bmatrix},\qquad
\text{right: } \mathbf{g}_\mathrm{WY}(-90^\circ)=\begin{bmatrix}1\\-1\end{bmatrix}.$$
MS recording with a pair of 180°-angled cardioids. Two coincident cardioid microphones (cardioid directivity $\frac{1}{2}+\frac{1}{2}\cos\varphi$) pointing to the polar angles 90° (left) and
−90° (right) are also applicable to mid-side recording, Fig. 1.3. Their pickup patterns

$$\mathbf{g}_{\mathrm{C}\pm90^\circ}(\varphi)=\frac{1}{2}\begin{bmatrix}1+\cos(\varphi-90^\circ)\\ 1+\cos(\varphi+90^\circ)\end{bmatrix}=\frac{1}{2}\begin{bmatrix}1+\sin(\varphi)\\ 1-\sin(\varphi)\end{bmatrix}\tag{1.3}$$

are encoded into the MS pickup patterns (W, Y) by a matrix

$$\mathbf{g}_\mathrm{WY}(\varphi)=\begin{bmatrix}1&1\\1&-1\end{bmatrix}\mathbf{g}_{\mathrm{C}\pm90^\circ}(\varphi).\tag{1.4}$$

The matrix eliminates the cardioids’ figure-of-eight characteristics by their sum signal, and their omnidirectional characteristics by the difference. We obtain the MS
signal pair (W, Y) from the cardioid microphone signals as

$$\begin{bmatrix}W\\Y\end{bmatrix}=\begin{bmatrix}1&1\\1&-1\end{bmatrix}\begin{bmatrix}C_{90^\circ}\\C_{-90^\circ}\end{bmatrix}.\tag{1.5}$$
Fig. 1.3 Mid-side recording by 180°-angled cardioids: (a) 180°-angled cardioid microphones; (b) picture of the recording setup
Fig. 1.4 Change of the stereo width by modifying the balance between the W and Y signals of MS (a, left). Decoding of the MS signal pair (W, Y) to a stereo loudspeaker pair (b, right)
Decoding of MS signals to a stereo loudspeaker pair. Decoding of the mid-side
signal pair to the left and right loudspeakers is done by feeding both signals to both
loudspeakers, however out-of-phase for the side signal, Fig. 1.4b:

$$\begin{bmatrix}L\\R\end{bmatrix}=\frac{1}{2}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\begin{bmatrix}W\\Y\end{bmatrix}.\tag{1.6}$$
An interesting aspect about the 180°-angled cardioid microphone MS is that after
inserting the cardioid-to-MS encoder Eq. (1.5) into the decoder Eq. (1.6), a brief calculation shows that the matrices invert each other. In this case, the cardioid signals are
directly fed to the loudspeakers, $[L, R] = [C_{90^\circ}, C_{-90^\circ}]$.
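This inversion can be verified numerically; a minimal sketch with NumPy (our own illustration, not from the book):

```python
import numpy as np

enc = np.array([[1, 1], [1, -1]])        # cardioid-to-MS encoder matrix, Eq. (1.5)
dec = 0.5 * np.array([[1, 1], [1, -1]])  # MS-to-stereo decoder matrix, Eq. (1.6)

# the decoder applied after the encoder yields the identity matrix
print(dec @ enc)  # -> [[1. 0.], [0. 1.]]

# hence arbitrary cardioid signals pass straight through to the loudspeakers:
# [L, R] = [C_{+90}, C_{-90}]
c = np.array([0.3, -0.8])                # example cardioid signal samples
assert np.allclose(dec @ (enc @ c), c)
```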
Stereo width. Modifying the mid versus side signal balance before stereo playback,
using a blending parameter α, allows changing the width of the stereo image from
α = 0 (narrow) to α = 1 (full), Fig. 1.4a, see also [9]:

$$\begin{bmatrix}L\\R\end{bmatrix}=\frac{1}{2}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\begin{bmatrix}2-\alpha&0\\0&\alpha\end{bmatrix}\begin{bmatrix}W\\Y\end{bmatrix}.\tag{1.7}$$
In stereophonic MS playback, the playback loudspeaker directions at ±30° are not
identical to the peaks of the recording pickup pattern of the side channel (Y) at ±90°.
Ambisonics assumes a stricter correspondence between the directional patterns of
recording and the patterns mapped onto the playback system.
1.3 First-Order Ambisonics (FOA)
After Cooper and Shiga [10] worked on expressing panning strategies for arbitrary
surround loudspeaker setups in terms of a directional Fourier series, the notion
and technology of Ambisonics was developed by Fellgett [11], Gerzon [7], and
Craven [12]. In particular, they were also considering a suitable recording technology.
Essentially based on similar considerations as MS, one can define first-order
Ambisonic recording. For 2D recordings, a Double-MS microphone arrangement is
suitable and only requires one more microphone than MS recording: a front-back
oriented figure-of-eight microphone. The scheme is extended to 3D first-order
Ambisonics by a third figure-of-eight microphone aiming up-down. First-order
Ambisonics is still often the basis of today's virtual-reality applications and
360◦ audio streams on the internet. In addition to potential loudspeaker playback,
it permits interactive playback on head-tracked headphones to render the acoustic
sound scene static to the listener.
First-order Ambisonic recording has the advantage that it can be done with
only a few high-quality microphones. However, the sole distribution of first-order
Ambisonic recordings to playback loudspeakers is typically not convincing without
going to higher orders and directional enhancements (Sect. 5.8).
1.3.1 2D First-Order Ambisonic Recording and Playback
The first-order Ambisonic format in 2D consists of one signal corresponding to an
omnidirectional pickup pattern (called W), and two signals corresponding to the
figure-of-eight pickup patterns aligned with the Cartesian axes (X and Y).
Native 2D Ambisonic recording (Double-MS). To record the Ambisonic channels
W, X, Y, one can use a Double-MS arrangement as shown in Fig. 1.5.
2D Ambisonic recording with four 90◦ -angled cardioids. Extending the MS scheme
for recording with cardioid microphones, Fig. 1.3, cardioid microphones could be
used to obtain the front-back and left-right figure-of-eight pickup patterns by corresponding pair-wise differences, and one omnidirectional pattern as their sum, Fig. 1.6.
However, the use of 4 microphones for only 3 output signals is inefficient.
Fig. 1.5 Native 2D first-order Ambisonic recording with an omnidirectional and a figure-of-eight
microphone heading front, and a figure-of-eight microphone heading left; photo of the recording setup shown on the right
Fig. 1.6 2D first-order Ambisonic recording with four 90◦-angled cardioid microphones, sums and
differences between them (front±back, left±right); photo of the recording setup shown on the right
Fig. 1.7 2D first-order Ambisonics with three 120◦-angled cardioid microphones; photo of the recording setup shown on the right
2D Ambisonic recording with three 120◦ -angled cardioids. Assuming 3 coincident cardioid microphones aiming at the angles 0◦ , ±120◦ in the horizontal plane,
cf. Fig. 1.7, we obtain as the pickup pattern for the incoming sound
$$\mathbf{g}(\varphi) = \frac{1}{2} + \frac{1}{2}\begin{bmatrix} \cos(\varphi) \\ \cos(\varphi+120^\circ) \\ \cos(\varphi-120^\circ) \end{bmatrix}.$$
Combining all the three microphone signals yields an omnidirectional pickup
pattern, as $\sum_{k=0}^{N-1}\cos(\varphi + \frac{2\pi}{N}k) = 0$. Moreover, introducing the differences between
the front and the two back microphone signals and between the left and right microphone
signals yields an encoding matrix to obtain the omnidirectional W and the two X and
Y figure-of-eight characteristics

$$\frac{2}{3}\begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & -1 \\ 0 & -\sqrt{3} & \sqrt{3} \end{bmatrix}\mathbf{g}(\varphi) = \begin{bmatrix} 1 \\ \cos(\varphi) \\ \sin(\varphi) \end{bmatrix}. \tag{1.8}$$
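The encoder can be verified numerically. In this sketch (ours, not the book's; NumPy assumed, and the second microphone is taken to aim at −120◦, matching the sign conventions used here), the encoded channels reproduce 1, cos ϕ, and sin ϕ for arbitrary ϕ:

```python
import numpy as np

def cardioid_triplet(phi):
    """Pickup of three coincident cardioids aiming at 0°, -120°, +120°."""
    aims = np.radians([0.0, -120.0, 120.0])
    return 0.5 + 0.5 * np.cos(phi - aims)

# Encoding matrix of Eq. (1.8): sum, front-minus-back, left-minus-right
T = (2/3) * np.array([[1, 1, 1],
                      [2, -1, -1],
                      [0, -np.sqrt(3), np.sqrt(3)]])

for phi in np.radians([0.0, 37.0, 90.0, 200.0, 315.0]):
    w, x, y = T @ cardioid_triplet(phi)
    assert np.allclose([w, x, y], [1.0, np.cos(phi), np.sin(phi)])
```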
Fig. 1.8 2D first-order Ambisonic decoding to 4 loudspeakers
2D Ambisonic decoding to loudspeakers. The W, X, and Y channels of 2D first-order Ambisonics (Double-MS) can easily be played on an arrangement of four
loudspeakers: front, back, left, right. While the omnidirectional signal contribution
is played by all of the loudspeakers, the figure-of-eight contributions are played out-of-phase by the corresponding front-back or left-right pair of loudspeakers, Fig. 1.8.
$$\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix}\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.9}$$
The decoding weights obviously discretize the directional pickup characteristics of
the Ambisonic channels at the directions of the loudspeaker layout. Consequently, if
the loudspeaker layout is more arbitrary and described by the set of its angles {ϕl},
the sampling decoder can be given as
$$\begin{bmatrix} S_{\varphi_1} \\ \vdots \\ S_{\varphi_L} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & \cos(\varphi_1) & \sin(\varphi_1) \\ \vdots & \vdots & \vdots \\ 1 & \cos(\varphi_L) & \sin(\varphi_L) \end{bmatrix}\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.10}$$
To achieve a panning-invariant and balanced mapping by this decoder, loudspeakers
should be evenly arranged. Moreover, it can be favorable to sharpen the spatial image
by attenuating W by $\frac{1}{\sqrt{3}}$ to map a sound by a supercardioid playback pattern.
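A sampling decoder for an arbitrary horizontal layout, Eq. (1.10), can be sketched as follows (the function name and its optional supercardioid switch are ours; NumPy assumed):

```python
import numpy as np

def sampling_decoder_2d(speaker_azimuths_deg, supercardioid=False):
    """Rows evaluate [1, cos, sin] at each loudspeaker azimuth, Eq. (1.10);
    optionally W is attenuated by 1/sqrt(3) for a supercardioid pattern."""
    phi = np.radians(np.asarray(speaker_azimuths_deg, dtype=float))
    D = 0.5 * np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
    if supercardioid:
        D[:, 0] /= np.sqrt(3)
    return D

# For the square layout, this reproduces the rows of Eq. (1.9) scaled by 1/2
D = sampling_decoder_2d([0, 90, 180, -90])
assert np.allclose(2 * D, [[1, 1, 0], [1, 0, 1], [1, -1, 0], [1, 0, -1]])
```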
Playback to head-tracked headphones and interactive rotation. In headphone playback, the headphone signals are generated by convolution with the head-related
impulse responses of all four loudspeakers contributing to the left and the right ear
signals

$$\begin{bmatrix} L_{\mathrm{ear}} \\ R_{\mathrm{ear}} \end{bmatrix} = \begin{bmatrix} h_L^{0^\circ}(t)\ast & h_L^{90^\circ}(t)\ast & h_L^{180^\circ}(t)\ast & h_L^{-90^\circ}(t)\ast \\ h_R^{0^\circ}(t)\ast & h_R^{90^\circ}(t)\ast & h_R^{180^\circ}(t)\ast & h_R^{-90^\circ}(t)\ast \end{bmatrix}\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix}. \tag{1.11}$$
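The convolution and summation of Eq. (1.11) can be sketched directly (the helper below is ours; NumPy assumed; real HRIRs would come, e.g., from the SADIE database):

```python
import numpy as np

def binaural_render(speaker_feeds, hrirs_left, hrirs_right):
    """Ear signals as sums of loudspeaker feeds convolved with their HRIRs."""
    left = sum(np.convolve(h, s) for h, s in zip(hrirs_left, speaker_feeds))
    right = sum(np.convolve(h, s) for h, s in zip(hrirs_right, speaker_feeds))
    return left, right

# Sanity check with unit-impulse "HRIRs": each ear hears the sum of the feeds
rng = np.random.default_rng(0)
feeds = [rng.standard_normal(8) for _ in range(4)]
impulses = [np.array([1.0])] * 4
left, right = binaural_render(feeds, impulses, impulses)
assert np.allclose(left, sum(feeds)) and np.allclose(right, sum(feeds))
```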
Fig. 1.9 2D first-order Ambisonic decoding to head-tracked headphones
To rotate the Ambisonic input scene of the decoder, it is sufficient to obtain a new
set of figure-of-eight signals by mixing the X, Y channels with the following matrix
depending on the rotation angle ρ, keeping W unaltered
$$\begin{bmatrix} W \\ \tilde{X} \\ \tilde{Y} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\rho & -\sin\rho \\ 0 & \sin\rho & \cos\rho \end{bmatrix}\begin{bmatrix} W \\ X \\ Y \end{bmatrix}. \tag{1.12}$$
This effect is important for head-tracked headphone playback to render the VR/360◦
audio scene static around the listener. A complete playback system is shown in
Fig. 1.9. The big advantage of such a system is that rotational updates can be done
at high control rates and the HRIRs of the convolver are constant.
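The rotation (1.12) is cheap enough for high control rates; a sketch (ours, NumPy assumed):

```python
import numpy as np

def rotate_foa_2d(w, x, y, rho):
    """Rotate a 2D FOA scene by the angle rho, Eq. (1.12); W stays unaltered."""
    x_new = np.cos(rho) * x - np.sin(rho) * y
    y_new = np.sin(rho) * x + np.cos(rho) * y
    return w, x_new, y_new

# Rotating a source encoded at azimuth phi by rho re-encodes it at phi + rho
phi, rho = np.radians(30.0), np.radians(45.0)
w, x, y = rotate_foa_2d(1.0, np.cos(phi), np.sin(phi), rho)
assert np.allclose([x, y], [np.cos(phi + rho), np.sin(phi + rho)])
```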
1.3.2 3D First-Order Ambisonic Recording and Playback
The first-order Ambisonic format in 3D consists of a signal W corresponding to an
omnidirectional pickup pattern, and three signals (X, Y, and Z) corresponding to
figure-of-eight pickup patterns aligned with the Cartesian coordinate axes.
In three dimensions, we can no longer work with figure-of-eight patterns described by
sin ϕ or cos ϕ of the azimuth angle alone. It is more convenient to describe
the arbitrarily oriented figure-of-eight characteristics cos(φ) using the inner product
between a variable direction vector (direction of arriving sound) and a fixed direction
vector (microphone direction). Direction vectors are of unit length $\|\theta\| = 1$, and their
inner product corresponds to $\theta_1^T\theta = \cos(\phi)$, where φ is the angle enclosed by the
direction of arrival θ and the microphone direction θ₁. Consequently, a cardioid
pickup pattern aiming at θ₁ is described by $\frac{1}{2} + \frac{1}{2}\theta_1^T\theta$.
Native 3D Ambisonic recording (Triple-MS). To record the Ambisonic channels
W, X, Y, Z, one can use a Triple-MS scheme as shown in Fig. 1.10. With the transposed unit direction vectors representing the aiming of the figure-of-eight channels,
$\theta_X^T = [1, 0, 0]$, $\theta_Y^T = [0, 1, 0]$, $\theta_Z^T = [0, 0, 1]$, to produce the direction dipoles $\theta_X^T\theta$,
Fig. 1.10 Native 3D first-order Ambisonic recording with an omnidirectional and three figure-of-eight microphones aligned with the Cartesian axes X, Y, Z; photo of the recording setup shown on the right
$\theta_Y^T\theta$, and $\theta_Z^T\theta$, we can mathematically describe the pickup patterns of native 3D first-order Ambisonics as

$$\mathbf{g}_{WXYZ}(\theta) = \begin{bmatrix} 1 \\ \theta_X^T\theta \\ \theta_Y^T\theta \\ \theta_Z^T\theta \end{bmatrix} = \begin{bmatrix} 1 \\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\theta \end{bmatrix} = \begin{bmatrix} 1 \\ \theta \end{bmatrix}. \tag{1.13}$$
3D Ambisonic recording with a tetrahedral arrangement of cardioids. The principle
that worked for three cardioid microphones on the horizon also works for a coincident
tetrahedral microphone array of cardioids with the aiming directions FLU-FRD-BLD-BRU, see Fig. 1.11, and [12],

$$\mathbf{g}(\theta) = \frac{1}{2} + \frac{1}{2}\begin{bmatrix} \theta_{FLU}^T \\ \theta_{FRD}^T \\ \theta_{BLD}^T \\ \theta_{BRU}^T \end{bmatrix}\theta = \frac{1}{2} + \frac{1}{2\sqrt{3}}\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}\theta. \tag{1.14}$$
Encoding is achieved there by the matrix that adds all microphone signals in the
first line (W omnidirectional), subtracts back from front microphone signals in the
second line (X figure-of-eight), subtracts right from left microphone signals in the
third line (Y figure-of-eight), and subtracts down from up microphone signals in the
last line (Z figure-of-eight), see also Fig. 1.11,
$$\mathbf{g}_{WXYZ}(\theta) = \frac{1}{2}\begin{bmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} \\ \sqrt{3}\begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \end{bmatrix}\mathbf{g}(\theta). \tag{1.15}$$
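Equations (1.14) and (1.15) combine to the ideal patterns of Eq. (1.13); a numerical check (ours, NumPy assumed; x points front, y left, z up):

```python
import numpy as np

s3 = 1.0 / np.sqrt(3.0)
# Cardioid aiming directions FLU, FRD, BLD, BRU of Eq. (1.14)
aims = s3 * np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]])

def a_format(theta):
    """Pickup of the four coincident cardioids for a direction theta."""
    return 0.5 + 0.5 * aims @ theta

# Encoder of Eq. (1.15): sum row for W, scaled difference rows for X, Y, Z
enc = 0.5 * np.vstack([np.ones(4),
                       np.sqrt(3) * np.array([[1, 1, -1, -1],
                                              [1, -1, 1, -1],
                                              [1, -1, -1, 1]])])

theta = np.array([0.6, 0.8, 0.0])          # any unit direction vector
assert np.allclose(enc @ a_format(theta), [1.0, *theta])
```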
Fig. 1.11 Tetrahedral arrangement of cardioid microphones for 3D first-order Ambisonics and
their encoding; microphone capsules point inwards to minimize their spacing: (a) tetrahedral array with four cardioids; (b) encoder of microphone signals
Fig. 1.12 Practical tetrahedral recording setups with cardioid microphones; the Soundfield SPS200,
Oktava MK4012, and Soundfield ST450 offer a fixed 4-capsule geometry. Equally important:
Zoom's H3-VR, Røde's NT-SF1, Sennheiser's AMBEO VR Mic
As Fig. 1.12 shows, practical microphone layouts should be as closely spaced as
possible. Nevertheless for high frequencies, the microphones cannot be considered
coincident anymore, and besides a directional error, there will be a loss of presence in
the diffuse field. Typically a shelving filter is used to slightly boost high frequencies.
Roughly, a high-shelf filter with a 3 dB boost is sufficient to correct timbral defects
at frequencies above which the microphone spacing exceeds half a wavelength, e.g.,
5 kHz for a 3.4 cm spacing of the microphones. More advanced strategies are found,
e.g., in [7, 13–15].
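The quoted transition frequency follows from the half-wavelength condition; a back-of-the-envelope sketch (ours):

```python
def coincidence_limit(spacing_m, c=343.0):
    """Frequency above which the spacing exceeds half a wavelength, f = c/(2d)."""
    return c / (2.0 * spacing_m)

# For a 3.4 cm capsule spacing this is roughly the 5 kHz quoted in the text
f = coincidence_limit(0.034)
assert 4900.0 < f < 5200.0
```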
3D Ambisonic decoding to loudspeakers. As before in the 2D case, a sampling
decoder can be defined that represents the continuous directivity patterns associated
with the channels W, X, Y, Z to map the signals to the discrete directions of the
loudspeakers. Given the set of loudspeaker directions {θl } and the unit-vectors to X,
Y, Z, the loudspeaker signals of the sampling decoder become
$$\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & \theta_1^T\theta_X & \theta_1^T\theta_Y & \theta_1^T\theta_Z \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \theta_L^T\theta_X & \theta_L^T\theta_Y & \theta_L^T\theta_Z \end{bmatrix}\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix} = \underbrace{\frac{1}{2}\begin{bmatrix} 1 & \theta_1^T \\ \vdots & \vdots \\ 1 & \theta_L^T \end{bmatrix}}_{\mathbf{D}}\begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}. \tag{1.16}$$
Equivalent panning function/virtual microphone. The sampling decoder together
with the native Ambisonic directivity patterns $\mathbf{g}_{WXYZ}^T(\theta) = [1, \theta^T]$ yields the mapping of a signal s from the direction θ to the loudspeakers to be

$$\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & \theta_1^T \\ \vdots & \vdots \\ 1 & \theta_L^T \end{bmatrix}\begin{bmatrix} 1 \\ \theta \end{bmatrix} s = \frac{1}{2}\begin{bmatrix} 1+\theta_1^T\theta \\ \vdots \\ 1+\theta_L^T\theta \end{bmatrix} s. \tag{1.17}$$
This result means that the gain of a source from θ at each loudspeaker θl corresponds
to evaluating a cardioid pattern aligned with θ. Consequently, the Ambisonic mapping
corresponds to a signal distribution to the loudspeakers using weights obtained by
discretization of an Ambisonics-equivalent first-order panning function.
Equivalently, Ambisonic playback using a sampling decoder is comparable to
recording each loudspeaker signal with a virtual first-order cardioid microphone
aligned with the loudspeaker’s direction θl .
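This virtual-microphone view can be sketched as cardioid gains (ours; NumPy assumed), here for the octahedral layout used later in Sect. 1.4:

```python
import numpy as np

def panning_gains(theta_src, speaker_dirs):
    """Cardioid panning function of Eq. (1.17): g_l = (1 + theta_l^T theta)/2."""
    return 0.5 * (1.0 + speaker_dirs @ theta_src)

# Octahedral layout as unit vectors (front, left, back, right, top, bottom)
dirs = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

g = panning_gains(np.array([1.0, 0.0, 0.0]), dirs)      # source straight ahead
assert np.isclose(g[0], 1.0) and np.isclose(g[2], 0.0)  # cardioid: front 1, back 0
```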
It is decisive for a panning-independent loudness mapping and balanced performance that the directions of the loudspeaker layout are well chosen. Also, it can be
preferred to reduce the level of the omnidirectional channel W by $\frac{1}{\sqrt{3}}$ to map a sound
by the narrower supercardioid playback pattern instead of a cardioid pattern, which
is rather broad.
Decoder design problems were early addressed by Gerzon [16], Malham [17],
and Daniel [18]. A current solution for higher-order decoding is given in Sect. 4.9.6
on All-round Ambisonic decoding.
3D Ambisonic decoding to headphones. 3D Ambisonic decoding to headphones
uses the same approach as for 2D above, except that additional rotational degrees of freedom are
implemented to compensate for any change in head orientation. Rotation concerns
the three directional components X, Y, Z

$$\begin{bmatrix} \tilde{X} \\ \tilde{Y} \\ \tilde{Z} \end{bmatrix} = \mathbf{R}(\alpha, \beta, \gamma)\,\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}. \tag{1.18}$$
For the definition of the rotation matrix R(α, β, γ ) and the meaning of its angles
refer to Eq. 5.5 of Sect. 5.2.2. The selection of a suitable set of HRIRs is a question
of directional discretization of the 3D directions, as addressed in the decoder above.
Signals obtained for virtual loudspeakers are again to be convolved with the corresponding HRIRs for the left and the right ear.
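The book defines R(α, β, γ) in Eq. 5.5 of Sect. 5.2.2; as a placeholder, the sketch below (ours) composes a rotation from elementary rotations in a z-y-z convention (an assumption made here for illustration), which yields a proper orthonormal rotation matrix in any case:

```python
import numpy as np

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def rot_y(b):
    return np.array([[np.cos(b), 0.0, np.sin(b)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(b), 0.0, np.cos(b)]])

def rotation(alpha, beta, gamma):
    """z-y-z Euler composition (illustrative convention; see Sect. 5.2.2)."""
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

R = rotation(0.3, -0.2, 0.8)
assert np.allclose(R.T @ R, np.eye(3))    # orthonormal
assert np.isclose(np.linalg.det(R), 1.0)  # proper rotation
```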
1.4 Practical Free-Software Examples
The practical examples below show first-order Ambisonic panning of a mono sound,
decoded to simple loudspeaker layouts. These are either a square layout with 4
loudspeakers at the azimuth angles [0◦, 90◦, 180◦, −90◦] or an octahedral layout with 6 loudspeakers at azimuth [0◦, 90◦, 180◦, −90◦, 0◦, 0◦] and elevation
[0◦, 0◦, 0◦, 0◦, 90◦, −90◦].
1.4.1 Pd with Iemmatrix, Iemlib, and Zexy
Pd is free and it can load and install its extensions from the internet. Required software
components are:
• pure-data (free, http://puredata.info/downloads/pure-data)
• iemmatrix (free download within pure-data)
• zexy (free download within pure-data)
• iemlib (free download within pure-data)
Figure 1.13 gives an example for horizontal (2D) first-order Ambisonic panning,
decoded to 4 loudspeaker signals and 2 headphone signals.
Figure 1.14 shows the processing inside the Pd abstraction
[FOA_binaural_decoder] contained in the Fig. 1.13 example, which uses SADIE
database1 subject 1 (KU100 dummy head) HRIRs to render headphone signals.
Figure 1.15 sketches a first-order Ambisonic panning in 3D with decoding to
an octahedral loudspeaker layout; master level [multiline∼] and hardware outlets
[dac∼] were omitted for easier readability.
1.4.2 Ambix VST Plugins
This example uses a DAW and ready-to-use VST plug-ins to render first-order
Ambisonics. As DAW, we recommend Reaper (reaper.fm) because it nicely facilitates higher-order Ambisonics by allowing tracks of up to 64 channels. Moreover, it is
relatively low-priced and there is a fully functional free evaluation version available.
You can also use any other DAW that supports VST and sufficiently many multi-track
1 https://www.york.ac.uk/sadie-project/Resources/SADIEIIDatabase/D1/D1_HRIR_WAV.zip.
Fig. 1.13 First-order 2D encoding and decoding in pure data (Pd) for a square layout
Fig. 1.14 Binaural rendering to headphones on pure data (Pd) by convolution with SADIE KU100 HRIRs
channels. The example employs the freely available ambiX plug-in suite (http://www.
matthiaskronlachner.com/?p=2015), although there exist other Ambisonics plug-ins,
especially for first-order.
Track Name         Ins   Outs   FX
Virtual source 1   1     4      ambix_encoder_o1
MASTER             4     6      ambix_decoder_o1
Fig. 1.15 First-order 3D encoding and decoding in pure data (Pd) using
[mtx_spherical_harmonics] for an octahedral layout ([dac∼] omitted for simplicity)
After creating the new track for the virtual source and importing a mono/stereo
audio file (via drag-and-drop), the next step is the setup of the track channels. As
shown in the table, the virtual source has a single-channel (mono) input and 4 output
channels to send the 4 channels of first-order Ambisonics to the Master. The option
to send to the Master is activated by default, cf. left in Fig. 1.16. The Master track
itself requires 4 input channels and 6 output channels to feed the 6 loudspeakers
(right). In Reaper, there is no separate adjustment for input and output channels, thus
the Master track has to be set to 6 channels.
In the source track FX, the ambix_encoder_o1 can be used to encode the virtual
source signal at an arbitrary location on a sphere by inserting the plug-in into the
track of the virtual source, cf. its panning GUI in Fig. 1.17. For adding more sources,
the track of the virtual source can simply be copied or duplicated. All effects and
routing options are maintained for the new tracks.
In order to decode the 4 first-order Ambisonics Master channels to the loudspeakers, the ambix_decoder_o1 plug-in is added to the Master track. The plug-in requires a
preset that defines the decoding matrix and its channel sequence and normalization.
For the exemplary octahedral setup with 6 loudspeakers, the following text can be
copied to a text file and saved as config-file, e.g., “octahedral.config”. The decoder
Fig. 1.16 1st-order example in Reaper DAW: routing
Fig. 1.17 1st-order example in Reaper DAW: encoder
matrix contains W, -Y, Z, X: W is constant, and -Y, Z, X refer to the Cartesian
coordinates of the octahedron.
#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END
Fig. 1.18 1st-order example in Reaper DAW: decoder
After loading the preset into the decoder plug-in, the decoder can generate the
loudspeaker signals as shown in Fig. 1.18. In the example, the virtual source is panned
to the front, resulting in the highest level for loudspeaker 1 (front). Loudspeaker
3 (back) is 12 dB quieter because of a side-lobe-suppressing supercardioid weighting
implied by the switch /coeff_scale n3d, a trick to keep things simple.
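The ≈12 dB figure can be reproduced by hand: with n3d scaling the first-order channels carry a weight of √3, so a front-panned source yields decoder gains of 1 ± √3 at the front and back loudspeakers (a sketch under these assumptions; NumPy assumed):

```python
import numpy as np

front_gain = 1.0 + np.sqrt(3.0)        # front loudspeaker: W + sqrt(3) X
back_gain = abs(1.0 - np.sqrt(3.0))    # back loudspeaker:  W - sqrt(3) X
level_db = 20.0 * np.log10(front_gain / back_gain)
assert 11.0 < level_db < 12.5          # roughly the 12 dB quoted above
```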
As shown on the SADIE-II website,2 the SADIE-II head-related impulse responses
can be used to render Ambisonics to headphones. The listing below shows a configuration file to be used with ambix_binaural, cf. Fig. 1.19, again using the trick of
selecting n3d to keep the numbers simple and to obtain supercardioid weighting:
#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#HRTF
44K_16bit/azi_0,0_ele_0,0.wav
44K_16bit/azi_90,0_ele_0,0.wav
44K_16bit/azi_180,0_ele_0,0.wav
44K_16bit/azi_270,0_ele_0,0.wav
44K_16bit/azi_0,0_ele_90,0.wav
44K_16bit/azi_0,0_ele_-90,0.wav
#END
2 https://www.york.ac.uk/sadie-project/ambidec.html.
Fig. 1.19 1st-order example in Reaper DAW: binaural decoder
#DECODERMATRIX
1 0 0 1
1 1 0 0
1 -1 0 0
1 0 0 -1
1 0 1 0
1 0 -1 0
#END
For decoding to less regular loudspeaker layouts, the IEM AllRADecoder3 permits
editing loudspeaker coordinates and automatically calculating a decoder within the
plugin. For decoding to headphones, the IEM BinauralDecoder offers a high-quality
decoder. The technology behind both plugins is explained in Chap. 4.
In addition to the virtual sources, you can also add a 4-channel recording done
with a B-format microphone by placing the 4-channel file in a new track. Reaper
will automatically set the number of track channels to 4 and send the channels to the
Master. Note that some B-format microphones use a different order and/or weighting
of the Ambisonics channels. Simple conversion to the AmbiX-format can be done
by inserting the ambix_converter_o1 plug-in into the microphone track.
1.5 Motivation of Higher-Order Ambisonics
Diffuseness, spaciousness, depth? Diffuse sound fields are typically characterized
by sound arriving randomly from evenly distributed directions at evenly distributed
delays. It is practical knowledge that the impression of diffuseness and spaciousness
3 https://plugins.iem.at/.
benefits from decorrelated signals, which is typically achieved by large
distances between the microphones rather than by coincident microphones.
Due to the evenness of diffuse sound fields, one would still hope that a low spatial
resolution is sufficient to map diffuseness and spatial depth of a room, using coincident microphones or first-order Ambisonics. Nevertheless, high directional correlation during playback destroys this hope and in fact yields a perceptually impeded
playback of diffuseness, spaciousness, and depth.
The technical advantages in interactivity and VR as well as the known shortcomings of first-order coincident recording techniques offer enough motivation to
increase the directional resolution and go to higher-order Ambisonics, as presented
in the subsequent chapters. For professional productions, it is often not sufficient to
only rely on first-order coincident microphone recordings. By contrast, higher-order
Ambisonics is able to drastically improve the mapping of diffuseness, spaciousness,
and depth, as shown in the upcoming chapter about psychoacoustical properties of
many-loudspeaker systems.
Recording with a higher-order main microphone array increases the required
technological complexity. Nevertheless, digital signal processing and the theory presented in the later chapters are nowadays powerful enough to achieve this goal.
After all, it seems that delay-based stereophonic recording, such as AB, or
equivalence-based recording, such as ORTF, INA5, etc., is often required and well-known in its mapping properties for spaciousness and diffuseness, correspondingly.
What is nice about higher-order Ambisonics: it can make use of these benefits by
embedding such recordings appropriately, see Fig. 1.20.
Facts about higher orders: Ambisonics extended to higher orders permits a refinement of the directional resolution and hereby improves the mapping of uncorrelated
sounds in playback. Figure 1.21a shows the correlation introduced in two neighboring loudspeaker signals when using Ambisonics, given their spacing of 60◦ . Given
the just noticeable difference (JND) of the inter-aural cross correlation, the figure
indicates that an Ambisonic order of ≥3 might be necessary to perceptually preserve
decorrelation.
Fig. 1.20 How is a microphone tree represented in Ambisonics, when it consists of 6 cardioids
spaced by 60 cm and 60◦ on a horizontal ring, and a ring of 4 supercardioids spaced by 40 cm and
90◦ as height layer, pointing upwards?
Fig. 1.21 Relation between the Ambisonic order, decorrelation, and perceived depth: (a) inter-channel cross correlation of two Ambisonically driven loudspeakers spaced by 60◦; (b) perceived depth in dependence of Ambisonics playback order for two listening positions (1: center, 2: off-center)
Fig. 1.22 Perceptual sweet area of Ambisonic playback from the front at first (light gray), third (dark gray), and fifth order (black). It marks the area in which the perceived direction is plausible, i.e., does not collapse into a single loudspeaker other than C (axes: x and y in m)
For this reason, the perception of spatial depth strongly improves when increasing
the Ambisonic order from 1 up to 3, Fig. 1.21b. However, this is only the case when
seated at the central listening position. Outside this sweet spot, higher orders than
3, e.g., 5, additionally improve the mapping of depth [19]. Therefore, higher-order
Ambisonics is important for preserving spatial impressions and when supplying a
large audience.
Figure 1.22 shows that the sweet area of perceptually plausible playback increases
with the Ambisonic order [20]. With fifth-order Ambisonics, nearly all the area
spanned by the horizontal loudspeakers at the IEM CUBE, the 12 × 10 m concert
space at our lab, becomes a valid listening area.
References
1. A.D. Blumlein, Improvements in and relating to sound-transmission, sound-recording and
sound-reproducing systems. GB patent 394325 (1933), http://www.richardbrice.net/blumlein_
patent.htm
2. M.A. Gerzon, Why coincident microphones? Studio Sound, vol. 13 (1971), https://www.michaelgerzonphotos.org.uk/articles/Why
3. P.B. Vanderlyn, In search of Blumlein: the inventor incognito. J. Audio Eng. Soc. 26(9) (1978)
4. S.P. Lipshitz, Stereo microphone techniques: Are the purists wrong? J. Audio Eng. Soc. 34(9)
(1986)
5. S. Weinzierl, Handbuch der Audiotechnik (Springer, Berlin, 2008)
6. A. Friesecke, Die Audio-Enzyklopädie. G. Sazur (2007)
7. M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround
sound, prepr. L-20 of 50th Audio Eng. Soc. Conv. (1975)
8. G. Bore, S.F. Temmer, M-S stereophony and compatibility. Audio Magazine (1958), http://
www.americanradiohistory.com
9. M.A. Gerzon, Application of Blumlein shuffling to stereo microphone techniques. J. Audio
Eng. Soc. 42(6) (1994)
10. D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. 20(5), 346–360
(1972)
11. P.B. Fellgett, Ambisonic reproduction of directionality in surround-sound systems. Nature 252,
534–538 (1974)
12. P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space
and yielding various directional outputs, U.S. Patent, no. 4,042,779 (1977)
13. C. Faller, M. Kolundžija, Design and limitations of non-coincidence correction filters for soundfield microphones, in prepr. 7766, 126th AES Conv, Munich (2009)
14. J.-M. Batke, The b-format microphone revisited, in 1st Ambisonics Symposium, Graz (2009)
15. A. Heller, E. Benjamin, Calibration of soundfield microphones using the diffuse-field response,
in prepr. 8711, 133rd AES Conv, San Francisco (2012)
16. M. Gerzon, General metatheory of auditory localization, in prepr. 3306, Conv. Audio Eng. Soc.
(1992)
17. D.G. Malham, A. Myatt, 3D Sound spatialization using ambisonic techniques. Comput. Music.
J. 19(4), 58–70 (1995)
18. J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in AES 6th International Conference: Spatial Sound Reproduction (1999)
19. M. Frank, F. Zotter, Spatial impression and directional resolution in the reproduction of reverberation, in Fortschritte der Akustik - DEGA, Aachen (2016)
20. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Conv.
(2017)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 2
Auditory Events of Multi-loudspeaker
Playback
It is evident that until one knows what information needs to be
presented at the listener’s ears, no rational system design can
proceed.
Michael A. Gerzon 1976 and AES Vienna [1] 1992.
Abstract This chapter describes the perceptual properties of auditory events, the
sound images that we localize in terms of direction and width, when distributing a
signal with different amplitudes to one or a couple of loudspeakers. These amplitude
differences are what methods for amplitude panning implement, and they are also
what mapping of any coincident-microphone recording implies when reproduced
over the directions of a loudspeaker layout. Therefore several listening experiments
on localization are described and analyzed that are essential to understand and model
the psychoacoustical properties of amplitude panning on multiple loudspeakers of
a 3D audio system. For delay-based recordings or diffuse sounds, there is some
relation, however, it is found to be less stable for the desired applications. Moreover,
amplitude panning is not only about consistent directional localization. Loudness,
spectrum, temporal structure, or the perceived width should be panning-invariant.
The chapter also shows experiments and models required to understand and provide
those panning-invariant aspects, especially for moving sounds. It concludes with
openly-available response data of most of the presented listening experiments.
Starting from classic listening experiments on stereo panning by Leakey [2], Wendt
[3], and pairwise horizontal panning by Theile [4], this chapter explores the relevant
perceptual properties for 3D amplitude panning and their models. Important experimental studies considered here are for instance those by Simon [5], Kimura [6],
F. Wendt [7], Lee [8], Helm [9], and Frank [10, 11]. By the experimental results, it
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_2
is possible to firmly establish Gerzon's [1] $E$, $\mathbf{r}_E$, and $\|\mathbf{r}_E\|$ estimators for perceived
loudness, direction, and width that apply to most stationary sounds in typical studio
and performance environments.
2.1 Loudness
At a measurement point in the free field, the same signal fed to equalized loudspeakers
of exactly the same acoustic distance would superimpose constructively (+6 dB).
In a room with early reflections and a less strict equality of the incoming pair of
sounds (typical, slight inaccuracy in loudspeaker/listener position, different mounting situations, different directions in the directivities of ears and loudspeakers), the
superposition can be regarded as stochastically constructive (+3 dB) in particular at
frequencies that aren’t very low.
For the above reasoning, typical amplitude panning rules try to keep the weights
distributing the signal to the loudspeakers normalized by root of squares instead of
normalizing to the linear sum, in order to obtain constant loudness ([12], VBAP):
$$g_l \leftarrow \frac{g_l}{\sqrt{\sum_{l=1}^{L} g_l^2}}. \tag{2.1}$$
Loudness Model. If all loudspeakers are equalized, located at the same distance to the
listener, and fed by the same signal with different amplitude gains gl , a constructive
interference could be expected so that the amplitude becomes [1]
$$P = \sum_{l=1}^{L} g_l. \qquad (2.2)$$
However, the interference ceases to be strictly constructive as soon as the room is not
entirely anechoic, the sitting position is not exactly centered, or even for anechoic and
centered conditions at high frequencies, when the superposition at the ears cannot
be assumed to be purely constructive anymore. Then it is better to assume a less
well-defined, stochastic superposition in which a squared amplitude is determined
by the sum of the squared weights [1]:
$$E = \sum_{l=1}^{L} g_l^2. \qquad (2.3)$$
Therefore, the most common amplitude panning rules use root-squares normalization
to obtain a loudness impression that is as constant as possible.
The measure E seems to be most useful when designing and evaluating amplitude-panning or coincident microphone techniques. It is not surprising that the ITU-R
BS.1770-4¹ uses the Leq(RLB) measure as a loudness model: it is essentially the
RMS level after high-pass filtering, cf. [13], which is closely related to the E measure
detected from loudspeaker signals.
An interesting refinement was proposed by Laitinen et al. [14], which uses a measure $\sqrt[p]{\sum_{l=1}^{L} g_l^p}$ in which the exponent p is close to 1 at low frequencies under anechoic conditions and close to 2 at high frequencies/under reverberant conditions.
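To make the normalization concrete, Eqs. (2.1)–(2.3) and the p-exponent refinement can be sketched in a few lines of Python; the function names are ours, not from the book or any library:

```python
import math

def normalize_energy(gains):
    """Gain normalization by the root of squares, Eq. (2.1):
    keeps the energy measure E = sum(g_l^2), Eq. (2.3), equal to 1."""
    norm = math.sqrt(sum(g * g for g in gains))
    return [g / norm for g in gains]

def p_norm_measure(gains, p):
    """Laitinen-style measure (sum_l |g_l|^p)^(1/p): p close to 1 models
    coherent summation (low frequencies, anechoic conditions), p close
    to 2 models energetic summation (high frequencies, reverberation)."""
    return sum(abs(g) ** p for g in gains) ** (1.0 / p)

# Two loudspeakers fed equally:
g = normalize_energy([1.0, 1.0])
E = sum(x * x for x in g)   # energy measure, Eq. (2.3): stays at 1.0
P = sum(g)                  # amplitude measure, Eq. (2.2): sqrt(2)
```

Under fully coherent summation the same gains would still give a +3 dB level increase (P = √2), which is why the E-based normalization is only a compromise that the p-exponent refinement interpolates.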
2.2 Direction
In the early years of stereophony, researchers investigated the differences in delay times and amplitudes required to control the perceived direction. Below, only experiments that did not use fixation of the listener's head are considered.
2.2.1 Time Differences on Frontal, Horizontal Loudspeaker
Pair
The dissertation of K. Wendt in 1963 [3] presents notably accurate listening experiments on ±30° two-channel stereophony using time delays, in which listeners indicated from where they heard the sounds for each of the tested time differences. H. Lee revisited these properties in 2013 [8], but with musical sound material and an experiment in which the listener adjusted the time differences until the perceived direction matched that of a corresponding fixed reference loudspeaker, Fig. 2.1.
Time differences are seldom applicable to reliable angular placement of auditory events: the auditory images are strongly frequency-dependent (not shown here) and therefore unstable for narrow-band sounds. Leakey and Cherry showed in 1957 [2] that time-delay stereophony loses its effect in the presence of background noise.
2.2.2 Level Differences on Frontal, Horizontal Loudspeaker
Pair
K. Wendt's [3] and H. Lee's [8] experiments deliver insights into sound-source positioning with ±30° two-channel stereophony, however this time with level differences. As opposed to Fig. 2.1, in which auditory image panning with time differences was characterized by statistical spreads of up to 15°, the spread of perceived directions for level-difference-based panning clearly stays below 10°, Fig. 2.2.
Signal dependency. Wendt [3] described the signal dependency of the panning curves for various transient and band-limited sounds, and Lee [8] for musical sounds. A new
1 https://www.itu.int/rec/R-REC-BS.1770-4-201510-I/en Algorithms to measure audio programme loudness and true-peak audio level (10/2015).
Fig. 2.1 K. Wendt's experiment [3] used angular marks helping to specify the localized direction (left). The right plot (angle in degrees over time delay in ms) shows results for time differences between impulse signals fed to the loudspeakers, no head fixation (the diagram shows means and standard deviations; the standard deviation was interpolated for the figure). In gray: results of the time-difference adjustment experiment of Lee [8] using musical material (25, 50, 75% quartiles, symmetrized diagram)
Fig. 2.2 Wendt's [3] results (angle in degrees over level difference in dB) for crack (impulsive) signals with level differences and without head fixation (the figure shows means and standard deviations; the standard deviation was interpolated to plot the figure). In gray: results of Lee's [8] level-difference adjustment experiment with musical sounds (25, 50, 75% quartiles, symmetrized diagram)
comprehensive investigation on frequency dependency was carried out by Helm and
Kurz [9]. With level differences {0, 3, 6, 9, 12} dB and third-octave filtered pulsed
pink noise at {125, 250, 500, 1k, 2k, 4k} Hz, they showed that the perceived angle
pointed at by the listeners using a motion-tracked pointer was similar between the
broad-band case and third-octave bands below 2 kHz. In bands above 2 kHz, smaller
level differences cause a larger lateralization, see interpolated curves in Fig. 2.3.
Fig. 2.3 Panning curve for the frontal ±30° loudspeaker pair from [9] on the example of the 500 Hz and 4 kHz third-octave bands (a third-octave vs. broad-band), and the slopes for the different bands (b slope in degrees per dB), based on the 3 and 6 dB conditions
2.2.3 Level Differences on Horizontally Surrounding Pairs
Successive pairwise panning on neighboring loudspeaker pairs is typically used to pan auditory events freely along the loudspeakers of a horizontally surrounding loudspeaker ring. The classical research specifically targeted at such applications was contributed by Theile and Plenge in 1977 [4]. They used a mobile reference loudspeaker with some reference sound that could be moved to match the perceived direction of a loudspeaker pair playing pink noise with level differences, at different orientations with respect to the listener's head. There is also the experiment of Pulkki [15] using a level-adjustment task, in which levels were adjusted so as to match the auditory event to that of a reference loudspeaker at three different reference directions and for different head orientations. A comprehensive experiment was done
by Simon et al. [5], who used a graphical user interface displaying the floor plan of
a 45◦ -spaced loudspeaker ring to have the listeners specify the perceived direction.
Martin et al. in 1999 [16] used a graphical user interface showing the floorplan of
a 5.1 ring in their experiment, and last but not least, Matthias Frank used a direct
pointing method to enter the perceived direction [10] in one of his experiments.
As the experiments did not seem to yield consistent results, a comprehensive level-difference adjustment experiment with 24 loudspeakers arranged as a horizontal ring was done in [17] and partially repeated later in [11], see results in Fig. 2.4. In the repeated experiment [11], it became clear that in the anechoic room, a large amount of the differently pronounced localization biases can be avoided by encouraging the listeners to perform front-back and left-right head motions of a few centimeters whenever there is doubt.
Fig. 2.4 Medians and 95% confidence intervals for adjusted level differences to align amplitude-panned pink noise with a harmonic complex tone from {±15°, 0°}, for a frontal and b lateral 60° stereo pair; a uses data from [17] with 4 responses per direction from 5 listeners; b uses data from [11] with 20 responses per direction. Despite the considerably different spread, frontal and lateral stereo pairs seem to yield pretty much the same tendency
2.2.4 Level Differences on Frontal, Horizontal to Vertical
Pairs
T. Kimura investigated the localization of auditory events between frontal, vertical ±13.5° loudspeaker pairs quite extensively in 2012 [6, 18]. The work of F. Wendt in 2013 [7, 19] also investigated slanted and vertical loudspeaker pairs, Fig. 2.5. Kimura used pulsed white noise, Wendt pulsed pink noise.
Obviously, the horizontal spread is always smaller than the vertical spread and the
spread does not align with the direction of the loudspeaker pair. The largest vertical
spread appears for the vertical loudspeaker pair.
2.2.5 Vector Models for Horizontal Loudspeaker Pairs
A weighted sum of the loudspeakers' direction vectors θ₁, θ₂ could be conceived as a simple linear model of the perceived direction, using a linear blending parameter 0 ≤ q ≤ 1:
$$\boldsymbol{r} = (1-q)\,\boldsymbol{\theta}_1 + q\,\boldsymbol{\theta}_2. \qquad (2.4)$$
The parameter q adjusts where the resulting vector r is located on the connecting line between θ₁ and θ₂. On frontal loudspeaker pairs, localization curves typically run through the middle direction q = 1/2 for level differences of 0 dB. If only one
Fig. 2.5 Mean values and 95% confidence intervals of the direct-pointing experiments of Kimura (top) with level differences on a vertical ±13.5° loudspeaker pair, and results of F. Wendt (bottom) on frontally arranged horizontal, slant, and vertical ±20° loudspeaker pairs showing two-dimensional 95% confidence (solid) and standard deviation ellipses (dotted)
loudspeaker is active, the result is either of the loudspeaker directions; thus, the parameter is q = 0 or q = 1.
Classical definitions. As the simplest choice for q, one could insert q = g₂/(g₁+g₂) or q = g₂²/(g₁²+g₂²) to get the vector definitions as weighted averages using either the linear or the squared gains according to [1]:
$$\boldsymbol{r}_\mathrm{V} = \frac{g_1\boldsymbol{\theta}_1 + g_2\boldsymbol{\theta}_2}{g_1+g_2}, \qquad \boldsymbol{r}_\mathrm{E} = \frac{g_1^2\boldsymbol{\theta}_1 + g_2^2\boldsymbol{\theta}_2}{g_1^2+g_2^2}. \qquad (2.5)$$
For both models, equal gains g₁ = g₂ yield q = 1/2, and the endpoints with g₂ = 0 or g₁ = 0 correspond to q = 0 or q = 1, respectively. However, the slope of the rE vector is steeper than that of rV. For instance, if g₂ = 2g₁, the vector rV lies at q = 2/3 of the line between θ₁ and θ₂, while rE lies at q = 4/5 of the connecting line.
The rV vector for the ±α loudspeaker pair at the directions θᵀ₁,₂ = (cos α, ± sin α) corresponds to the tangent law [20], whose formal origin lies in a model of summing localization based on a simple model of the ear signals, cf. Appendix A.7. The equivalence of this law to the vector model follows from the tangent tan φ as the ratio of the y to the x component of the rV vector:
$$\tan\varphi = \frac{g_1\sin\alpha + g_2\sin(-\alpha)}{g_1\cos\alpha + g_2\cos\alpha} = \frac{g_1-g_2}{g_1+g_2}\,\tan\alpha.$$
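The equivalence is easy to check numerically. The following Python sketch (the helper function is ours) computes the weighted direction vector for a symmetric ±α pair and compares its azimuth to the tangent law:

```python
import math

def pair_vector(g1, g2, alpha_deg, gamma=1.0):
    """Weighted direction vector for a symmetric +/-alpha loudspeaker
    pair; gamma=1 gives the r_V vector, gamma=2 the r_E vector."""
    a = math.radians(alpha_deg)
    w1, w2 = abs(g1) ** gamma, abs(g2) ** gamma
    x = (w1 * math.cos(a) + w2 * math.cos(a)) / (w1 + w2)
    y = (w1 * math.sin(a) - w2 * math.sin(a)) / (w1 + w2)
    return x, y

g1, g2, alpha = 1.0, 0.5, 30.0
x, y = pair_vector(g1, g2, alpha, gamma=1.0)
phi_vector = math.degrees(math.atan2(y, x))
phi_tangent = math.degrees(math.atan(
    (g1 - g2) / (g1 + g2) * math.tan(math.radians(alpha))))
# phi_vector and phi_tangent agree for any gain pair
```

With gamma=2, the same helper reproduces the steeper rE curve for the same gains.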
Fig. 2.6 Fit of the rV, rE, and rγ models a for third-octave noise on a frontal stereo pair using data from [9], and with data from [11]: b pink noise frontal and c lateral, cf. Figs. 2.3 and 2.4; d horizontal and vertical from [7], Fig. 2.5
Adjusted slope. Differently steep curves were fitted by an adjustable-slope model [17]
$$\boldsymbol{r}_\gamma = \frac{|g_1|^\gamma\boldsymbol{\theta}_1 + |g_2|^\gamma\boldsymbol{\theta}_2}{|g_1|^\gamma + |g_2|^\gamma}, \qquad (2.6)$$
which uses γ = 1 for r V and γ = 2 for r E . Figure 2.6 compares the prediction by
r V , r E , and r γ to frequency-dependently perceived directions in frontal horizontal
pairs, to perceived directions in a lateral stereo pair, and to perceived directions in
a frontal pair that is either horizontal or vertical, using various studies mentioned
above.
Practical choice r E . While a specific exponent γ closely fitting the experimental
data may vary, a constant value is preferable. Figure 2.6 indicates that in most cases
focusing on r E is reasonable and sufficiently precise, see also [11].
Fig. 2.7 The indirect level-adjustment experiment of Pulkki [21] shows the spread and mean of the adjusted VBAP angles for frontal loudspeaker triplets, and the experiments of F. Wendt [7, 19] use a direct pointing method to obtain results in the shape of two-dimensional 95% confidence (solid) and standard deviation ellipses (dotted) for {−∞, 0, +11.71} dB for the top loudspeaker (left diagram) or the right loudspeaker (center diagram), respectively, or {−∞, 0, +11.51} dB for the bottom loudspeaker (right)
2.2.6 Level Differences on Frontal Loudspeaker Triangles
V. Pulkki [21] and F. Wendt [7, 19] investigated localization properties for frontal
loudspeaker triplets with level differences, see Fig. 2.7. Both used pulsed pink noise
in their experiments.
While V. Pulkki used an indirect adjustment task, evaluating the VBAP control angles required to obtain auditory events directionally matching the respective reference loudspeakers, F. Wendt used a direct pointing method. Wendt's experiments indicate that loudspeaker triplets with three different azimuthal positions yield a smaller spread in the indicated direction than those containing vertical loudspeaker pairs (which was not the case in Pulkki's experiments).
2.2.7 Level Differences on Frontal Loudspeaker Rectangles
F. Wendt [7, 19] moreover presents experiments about frontal loudspeaker rectangles,
again using a pointer method and pulsed pink noise, Fig. 2.8.
Again it seems that arrangements avoiding vertical loudspeaker pairs exhibit a
smaller statistical spread in the responses.
Fig. 2.8 Wendt’s experiments about frontal loudspeaker rectangles showing two-dimensional 95%
confidence (solid) and standard deviation ellipses (dotted). The experimental setup of this and
above-mentioned experiments is shown. Left: each of the corner loudspeakers is raised once by
+6 dB in level, right: both left/right loudspeaker levels are raised once by {+3, +6} dB, and both
top/bottom pairs are once raised by +6 dB
2.2.8 Vector Model for More than 2 Loudspeakers
For more than two active loudspeakers and in 3D, a vector model based on the exponent γ = 2 yields the rE vector [1]
$$\boldsymbol{r}_\mathrm{E} = \frac{\sum_{l=1}^{L} g_l^2\,\boldsymbol{\theta}_l}{\sum_{l=1}^{L} g_l^2}. \qquad (2.7)$$
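Eq. (2.7) can be sketched in Python for arbitrary gain and direction sets (the function name is ours); with equal gains on a ±30° pair it reproduces the frontal direction and the length cos 30° used in the width model later on:

```python
import math

def energy_vector(gains, directions):
    """r_E of Eq. (2.7): average of the unit direction vectors theta_l,
    weighted by the squared gains g_l^2."""
    e = sum(g * g for g in gains)                 # E, Eq. (2.3)
    dim = len(directions[0])
    return tuple(sum(g * g * d[i] for g, d in zip(gains, directions)) / e
                 for i in range(dim))

# Equal gains on a +/-30 deg pair: r_E points frontally, length cos(30 deg)
a = math.radians(30.0)
rE = energy_vector([1.0, 1.0],
                   [(math.cos(a), math.sin(a)),
                    (math.cos(a), -math.sin(a))])
rE_length = math.hypot(*rE)
```

The direction vectors may just as well be 3D unit vectors; the function is dimension-agnostic.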
2.2.9 Vector Model for Off-Center Listening Positions
At off-center listening positions, the distances to the loudspeakers are no longer equal, resulting in an additional attenuation and delay for each loudspeaker, depending on the position. For stationary sounds, this effect can be incorporated into the energy vector by the additional weights w_{r,l} and w_{τ,l}
$$\boldsymbol{r}_\mathrm{E} = \frac{\sum_{l=1}^{L} (w_{r,l}\,w_{\tau,l}\,g_l)^2\,\boldsymbol{\theta}_l}{\sum_{l=1}^{L} (w_{r,l}\,w_{\tau,l}\,g_l)^2}. \qquad (2.8)$$
The weight w_{r,l} models the 1/r attenuation of point-source-like propagation. The reference distance is the distance to the closest loudspeaker at the evaluated listening position; thus, the weight of each loudspeaker results in
$$w_{r,l} = \frac{1}{r_l}. \qquad (2.9)$$
The incorporation of delays into the energy vector requires a transformation that yields the weights w_{τ,l} for each loudspeaker. It is reasonable that these weights attenuate the lagging signals in order to reduce their influence on the predicted direction. An attenuation of 1/4 dB per millisecond is known from the echo threshold in [22], similarly [23], and has successfully been applied for the prediction of localization in rooms [24]. With the delay τ_l = r_l/c in seconds at the listening position under test, the weight of each loudspeaker is calculated as
$$w_{\tau,l} = 10^{-\frac{1000}{4\cdot 20}\,\tau_l}. \qquad (2.10)$$
Further weights can be applied in order to model the precedence effect in more detail, as proposed by Stitt [25, 26]. Listening test results in [27] compared the differently complex extensions of the energy vector and revealed that the simple weighting with w_{r,l} and w_{τ,l} is sufficient for a rough prediction of the perceived direction in typical playback scenarios.
The left side of Fig. 2.9 shows the directions predicted by the energy vector for various listening positions when playing back the same signal on a standard stereo loudspeaker pair with a radius of 2.5 m. The absolute localization error can be calculated as the difference between the predicted direction and the desired panning direction. The right side of Fig. 2.9 depicts areas with localization errors within 4 ranges: 0°…10° (white, perfect localization), 10°…30° (light gray, plausible localization), 30°…90° (gray, rough localization), and >90° (dark gray, poor localization).
Concerning a single playback scenario, i.e. a single panning direction on a
loudspeaker setup, the perceptual sweet area for plausible playback can be estimated
by the area with localization errors below 30◦ . For the prediction of a more general
sweet area, the absolute localization errors can be computed for all possible panning
directions in a fine grid of 1◦ and averaged at each listening position as shown in
Fig. 2.10.
Fig. 2.9 Predictions of perceived directions by the energy vector for different listening positions in
a standard stereo setup with two loudspeakers playing the same signal. Gray-scale areas on the right
indicate listening areas with predicted absolute localization errors within different angular ranges
Fig. 2.10 Predictions of mean absolute localization errors by the energy vector in a standard stereo setup for panning directions between −30° and 30°
2.3 Width
M. Frank [10] investigated the auditory source width for frontal loudspeaker pairs
with 0 dB level difference and various aperture angles, as well as the influence of
an additional center loudspeaker on the auditory source width. The response was
given by reading numbers off a left-right symmetric scale written on the loudspeaker
arrangement (Fig. 2.11).
Figure 2.11 (right) shows the statistical analysis of the responses. Obviously the
additional center loudspeaker decreases the auditory source width.
Auditory source width is difficult to compare for different directions, and single loudspeakers also yield auditory source widths that vary with direction. Still, a relatively constant auditory source width is desirable for moving auditory events. For static auditory events, the narrowest possible extent can be desirable.
Fig. 2.11 Experimental setup and results (perceived source width in ° over the aperture angle between the two outmost active loudspeakers in °) of experiments of M. Frank (confidence intervals) about auditory source width of frontal stereo pairs of the angles ±5°, …, ±40° and with an additional center loudspeaker (C)
2.3.1 Model of the Perceived Width
The angle 2 arccos ∥rE∥ describes the aperture of a cap cut off the unit sphere perpendicular to the rE vector, at its tip, as seen from the origin, see Fig. 2.12. As the length of the rE vector lies between 0 (unclear direction) and 1 (only one loudspeaker active), this angle stays between 180° and 0°.
M. Frank’s experiments about the auditory source width [10, 28] showed that
stereo pairs of larger half angles α were also heard as wider. The length of the
rE vector gets shorter with the half angle α. In a symmetrical loudspeaker pair θᵀ₁,₂ = (cos α, ± sin α) with g₁ = g₂ = 1, the y coordinate of the rE vector cancels and its length is
$$\|\boldsymbol{r}_\mathrm{E}\| = r_{\mathrm{E},x} = \cos\alpha.$$
The corresponding spherical cap is of the same size as the loudspeaker pair, 2 arccos ∥rE∥ = 2α. However, only 5/8 of this size was indicated by the listeners of the experiments, which yields the following estimator of the perceived width:
$$\mathrm{ASW} = \frac{5}{8}\cdot\frac{180^\circ}{\pi}\cdot 2\arccos\|\boldsymbol{r}_\mathrm{E}\|. \qquad (2.11)$$
Fig. 2.12 Cap size associated with r E length model for L+R (left plot) and L+R+C (right plot)
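The estimator of Eq. (2.11) can be sketched in Python (function name ours), including the effect of an additional center loudspeaker:

```python
import math

def asw_deg(gains, directions):
    """Perceived-width estimator of Eq. (2.11): 5/8 of the cap angle
    2*arccos(||r_E||), returned in degrees."""
    e = sum(g * g for g in gains)
    rE = [sum(g * g * d[i] for g, d in zip(gains, directions)) / e
          for i in range(len(directions[0]))]
    norm = math.sqrt(sum(c * c for c in rE))
    return 5.0 / 8.0 * math.degrees(2.0 * math.acos(min(norm, 1.0)))

a = math.radians(30.0)
pair = [(math.cos(a), math.sin(a)), (math.cos(a), -math.sin(a))]
asw_pair = asw_deg([1.0, 1.0], pair)       # 5/8 * 2*30 deg = 37.5 deg
asw_lrc = asw_deg([1.0, 1.0, 1.0],
                  pair + [(1.0, 0.0)])     # with center C: narrower
```

For the L+R+C case, ∥rE∥ = 1/3 + (2/3) cos α > cos α, so the estimated width shrinks, matching the experimental trend.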
Fig. 2.13 Model of the perceived width as 5/8 of the half-angle arccos ∥rE∥ matches the half-angle of the experiment, except for a lower limit, which is determined by the apparent source width (ASW) due to the room acoustical setting
For an additional center loudspeaker with g₃ = 1, θᵀ₃ = (1, 0), the estimator yields
$$\|\boldsymbol{r}_\mathrm{E}\| = r_{\mathrm{E},x} = \frac{1}{3} + \frac{2}{3}\cos\alpha,$$
an increase matching the experiments, as arccos ∥rE∥ < α, see Figs. 2.13 and 2.12.
2.4 Coloration
Although research primarily focuses on the spatial fidelity of multi-loudspeaker playback, the overall quality of surround-sound playback was found to be largely determined by timbral fidelity (70%) [29]. Loudspeakers in a studio or performance space are often characterized by different colorations that are caused by different reflection patterns (most often from the wall behind the loudspeaker). When changing the active loudspeakers, or their number, these differences become audible. On the one hand, static coloration, e.g. due to the frequency responses of the loudspeakers, can typically be equalized. On the other hand, changes in coloration during the movement of a source cannot be equalized easily and yield annoying comb filters.
Although coloration is often assessed verbally [30], we employ a simple technical predictor based on the composite loudness level (CLL) by Ono [31, 32]. The CLL spectrum predicts the perceived coloration and is calculated from the sum of the loudnesses of both ears in each third-octave band. Studies on loudspeaker and headphone equalization show that differences in third-octave band levels of less than 1 dB are inaudible to most listeners [33, 34]. This criterion can also be applied to the perception of coloration, i.e., differences between CLL spectra of less than 1 dB are assumed to be inaudible.
Pairwise panning between loudspeakers results in a single active loudspeaker for
source directions that coincide with the direction of a loudspeaker and two equally
loud loudspeakers for source directions exactly between two neighboring loudspeakers, cf. Fig. 2.14. In the second case, the different propagation paths from the two
loudspeakers to the ears create a comb filter. This comb filter is not present for sources
Fig. 2.14 Coloration (CLL in dB over frequency in Hz) predicted by composite loudness levels for a single loudspeaker C (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)
played from a single loudspeaker. Thus, moving a source between the two directions
yields noticeable coloration. This is in contrast to static sources, for which Theile’s
experiments [35] indicated that they are perceived without coloration.
The actual shape of the aforementioned comb filter depends on the angular distance between the loudspeakers. The frequency of the first notch and its depth decrease with the distance. This implies that coloration increases for playback with higher loudspeaker densities.
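The comb filter of a two-path superposition can be sketched as the magnitude of 1 + a·e^{−jωΔt}, where Δt is the path-length difference expressed as a delay and a the relative gain of the second path; the Python below uses illustrative values of our own choosing:

```python
import math

def comb_db(f_hz, delay_ms, a):
    """Magnitude in dB of 1 + a*exp(-j*2*pi*f*delay): superposition of
    a direct path and a second path delayed by delay_ms with gain a."""
    w = 2.0 * math.pi * f_hz * delay_ms * 1e-3
    return 20.0 * math.log10(math.hypot(1.0 + a * math.cos(w),
                                        a * math.sin(w)))

delay = 0.2                                # ms path difference (assumed)
peak_equal = comb_db(5000.0, delay, 1.0)   # constructive peak: +6 dB
a_12 = 10.0 ** (-12.0 / 20.0)              # side loudspeaker at -12 dB
notch_attn = comb_db(2500.0, delay, a_12)  # only a shallow notch
```

With equal gains (a = 1) the notch at 1/(2Δt) is infinitely deep, whereas attenuating the second path by 12 dB limits the ripple to roughly ±2 dB, illustrating why strongly attenuated outer loudspeakers cause little additional coloration.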
A similar comb filter is created when using a triplet of loudspeakers with the same loudspeaker density as the pair, e.g. L, C, R compared to C, R. In order to avoid a strong increase in source width or annoying phasing effects, the outmost loudspeakers L and R are strongly reduced in level, typically by around −12 dB compared to loudspeaker C. In doing so, the similarity of the comb filters yields barely any coloration when moving a source between the two directions, cf. Fig. 2.15.
Judging from what is shown above, it appears beneficial to always activate a few loudspeakers to stabilize the coloration, as opposed to using just one loudspeaker and moving the playback to another one. Keeping the number of simultaneously active loudspeakers more or less constant not only prevents coloration of source movements, it also yields a more constant source width. Because of this relation between coloration and source width, the fluctuation of ∥rE∥ is also a simple predictor of panning-dependent coloration.
In general, the strongest coloration is perceived under anechoic listening conditions. In reverberant rooms, the additional comb filters introduced by reflections help
to conceal the comb filters due to multi-loudspeaker playback.
Fig. 2.15 Coloration (CLL in dB over frequency in Hz) predicted by composite loudness levels for loudspeaker C with additional −12 dB from L and R (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)
2.5 Open Listening Experiment Data
Experimental data from azimuthal localization in frontal and lateral loudspeaker pairs (Figs. 2.3 and 2.4), azimuthal/elevational localization in horizontal, skew, and vertical frontal pairs (Fig. 2.5), triangles (Fig. 2.7), and quadrilaterals (Fig. 2.8) are available online at https://opendata.iem.at in the listening experiment data project, as well as the data of the width experiment in Fig. 2.11.
The opendata.iem.at listening experiment data project contains evaluation routines to analyze the 95% confidence intervals symmetrically, based on means, standard deviations, and the inverse Student's t-distribution (CIMEAN.m), more robustly based on medians, inter-quartile ranges, and Student's t-distribution (CI2.m), or, for two-dimensional data analysis, robust_multivariate_confidence_region.m. The MATLAB script plot_gathered_data.m reads the formatted listening experiment data, and its exemplary code generates figures like the ones above.
In order to support others in providing their own listening experiment data, the MATLAB functions write_experimental_data.m and read_experimental_data.m are provided on the website.
References
1. M. Gerzon, General metatheory of auditory localization, in prepr. 3306, Convention Audio
Engineering Society (1992)
2. D.M. Leakey, E.C. Cherry, Influence of noise upon the equivalence of intensity differences and small time delays in two-loudspeaker systems. J. Acoust. Soc. Am. 29, 284–286 (1957)
3. K. Wendt, Das Richtungshören bei Überlagerung zweier Schallfelder bei Intensitäts- und Laufzeitstereophonie, Ph.D. Thesis, RWTH Aachen (1963)
4. G. Theile, G. Plenge, Localization of lateral phantom sources. J. Audio Eng. Soc. 25(4), 196–200 (1977)
5. L. Simon, R. Mason, F. Rumsey, Localisation curves for a regularly-spaced octagon loudspeaker
array, in prepr. 7015, Convention Audio Engineering Society (2009)
6. T. Kimura, H. Ando, 3D audio system using multiple vertical panning for large-screen multiview
3D video display. ITE Trans. Media Technol. Appl. 2(1), 33–45 (2014)
7. F. Wendt, M. Frank, F. Zotter, Panning with height on 2, 3, and 4 loudspeakers, in Proceedings
of the ICSA (Erlangen, 2013)
8. H. Lee, F. Rumsey, Level and time panning of phantom images for musical sources. J. Audio
Eng. Soc. 61(12), 978–988 (2013)
9. J.M. Helm, E. Kurz, M. Frank, Frequency-dependent amplitude-panning curves, in 29th Tonmeistertagung (Köln, 2016)
10. M. Frank, Phantom sources using multiple loudspeakers in the horizontal plane, Ph.D. Thesis,
Kunstuni Graz (2013)
11. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in Fortschritte der Akustik - DAGA (Kiel, 2017)
12. V. Pulkki, Virtual sound source positioning using vector base amplitude panning. J. Audio Eng.
Soc. 45(6), 456–466 (1997)
13. G. Soulodre, Evaluation of objective measures of loudness, in prepr. 6161, 116th AES Convention (Berlin, 2004)
14. M.-V. Laitinen, J. Vilkamo, K. Jussila, A. Politis, V. Pulkki, Gain normalization in amplitude
panning as a function of frequency and room reverberance, in Proceedings 55th AES Conference
(Helsinki, 2014)
15. V. Pulkki, Compensating displacement of amplitude-panned virtual sources, in 22nd Conference Audio Engineering Society (2003)
16. G. Martin, W. Woszczyk, J. Corey, R. Quesnel, Sound source localization in a five-channel
surround sound reproduction system, in prepr. 4994, Convention Audio Engineering Society
(1999)
17. F. Zotter, M. Frank, Generalized tangent law for horizontal pairwise amplitude panning, in
Proceedings of the 3rd ICSA, Graz (2015)
18. T. Kimura, H. Ando, Listening test for three-dimensional audio system based on multiple vertical panning, in Acoustics-12, Hong Kong (2012)
19. F. Wendt, Untersuchung von Phantomschallquellen vertikaler Lautsprecheranordnungen, M. Thesis, Kunstuni Graz (2013)
20. H. Clark, G. Dutton, P. Vanderlyn, The 'stereosonic' recording and reproduction system, Proc. Inst. Electr. Eng. (1957); repr. J. Audio Eng. Soc. 6(2), 102–117 (1958)
21. V. Pulkki, Localization of amplitude-panned virtual sources ii: two- and three-dimensional
panning. J. Audio Eng. Soc. 49(9), 753–767 (2001)
22. B. Rakerd, W.M. Hartmann, J. Hsu, Echo suppression in the horizontal and median sagittal
planes. J. Acoust. Soc. Am. 107(2) (2000)
23. P.W. Robinson, A. Walther, C. Faller, J. Braasch, Echo thresholds for reflections from acoustically diffusive architectural surfaces. J. Acoust. Soc. Am. 134(4) (2013)
24. F. Zotter, M. Frank, M. Kronlachner, J. Choi, Efficient phantom source widening and diffuseness
in ambisonics, in EAA Symposium on Auralization and Ambisonics (Berlin, 2014)
25. P. Stitt, S. Bertet, M. van Walstijn, Extended energy vector prediction of ambisonically reproduced image direction at off-center listening positions. J. Audio Eng. Soc. 64(5), 299–310
(2016)
26. P. Stitt, S. Bertet, M. Van Walstijn, Off-center listening with third-order ambisonics: dependence
of perceived source direction on signal type. J. Audio Eng. Soc. 65(3), 188–197 (2017)
27. E. Kurz, M. Frank, Prediction of the listening area based on the energy vector, in Proceedings
of the 4th International Conference on Spatial Audio (ICSA) (Graz, 2017)
28. M. Frank, Source width of frontal phantom sources: perception, measurement, and modeling.
Arch. Acoust. 38(3), 311–319 (2013)
29. F. Rumsey, S. Zieliński, R. Kassier, S. Bech, On the relative importance of spatial and timbral
fidelities in judgments of degraded multichannel audio quality. J. Acoust. Soc. Am. 118(2),
968–976 (2005), http://link.aip.org/link/?JAS/118/968/1
30. S. Choisel, F. Wickelmaier, Evaluation of multichannel reproduced sound: scaling auditory
attributes underlying listener preference. J. Acoust. Soc. Am. 121(1), 388–400 (2007)
31. K. Ono, V. Pulkki, M. Karjalainen, Binaural modeling of multiple sound source perception:
methodology and coloration experiments. Audio Eng. Soc. Conv. 111, 11 (2001), http://www.
aes.org/e-lib/browse.cfm?elib=9884
32. K. Ono, V. Pulkki, M. Karjalainen, Binaural modeling of multiple sound source perception:
Coloration of wideband sound. Audio Eng. Soc. Conv. 112 4 (2002), http://www.aes.org/e-lib/
browse.cfm?elib=11331
33. A. Ramsteiner, G. Spikofski, Ermittlung von Wahrnehmbarkeitsschwellen für Klangfarbenunterschiede unter Verwendung eines diffusfeldentzerrten Kopfhörers (Fortschritte der Akustik,
DAGA, 1987)
34. M. Karjalainen, E. Piirilä, A. Järvinen, J. Huopaniemi, Comparison of loudspeaker equalization
methods based on DSP techniques. J. Audio Eng. Soc. 47(1/2), 14–31 (1999), http://www.aes.
org/e-lib/browse.cfm?elib=12117
35. G. Theile, Über die Lokalisation im überlagerten Schallfeld, Ph.D. dissertation, Technische
Universität (Berlin, 1980)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 3
Amplitude Panning Using Vector Bases
The method is straightforward and can be used on many
occasions successfully.
Ville Pulkki [1], Ph.D. Thesis, 2001.
Abstract This chapter describes Ville Pulkki's famous vector-base amplitude panning (VBAP) as the most robust and generic amplitude-panning algorithm, which works on nearly any surrounding loudspeaker layout. VBAP activates the smallest-possible number of loudspeakers, which gives a directionally robust auditory event localization for virtual sound sources, but it can also cause fluctuations in width and coloration for moving sources. Multiple-direction amplitude panning (MDAP), proposed by Pulkki, is a modification that increases the number of activated loudspeakers. In this way, more direction-independence is achieved at the cost of an increased perceived source width and reduced localization accuracy at off-center positions. As vector-base panning methods rely on convex-hull triangulation, irregular loudspeaker layouts yielding degenerate vector bases can become a problem. Imaginary loudspeaker insertion and downmix is shown to be a robust method for improving the behavior, in particular for smaller surround-with-height loudspeaker layouts. The chapter concludes with some practical examples using free software tools that accomplish amplitude panning on vector bases.
Vector-base amplitude panning (VBAP) was extensively described and investigated in [2], alongside the stabilization of moving sources by adding spread with multiple-direction amplitude panning (MDAP) [3]. Since then, VBAP and MDAP have become the most common and popular amplitude-panning techniques, as they are particularly robust and can automatically adapt to specific playback layouts.
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_3
3.1 Vector-Base Amplitude Panning (VBAP)
Assuming the r_V model to predict the perceived direction, an intended auditory event at a panning direction θ, which we call the virtual source, can theoretically be controlled by the criterion according to V. Pulkki [2]

θ = Σ_{l=1}^{L} g̃_l θ_l .    (3.1)
Here, θ_l are the direction vectors of the loudspeakers involved, and the amplitude weights g̃_l need to be normalized for constant loudness

g_l = g̃_l / √(Σ_{l=1}^{L} g̃_l²) .    (3.2)
Moreover, the weights g_l should always stay positive to avoid in-head localization or other irritating listening experiences. For loudspeaker rings around the horizon, 1 or 2 loudspeakers will always contribute to the auditory event; for loudspeakers arranged on a surrounding sphere, 1 up to 3 loudspeakers will be used, whose directions must enclose the direction of the desired auditory event, the virtual source. For the directional stability of the auditory event, the angle enclosed between the loudspeakers should stay smaller than 90°.
The system of equations for VBAP [2] uses 3 loudspeaker directions and gains to model the panning direction θ

θ = [θ₁, θ₂, θ₃] [g̃₁, g̃₂, g̃₃]ᵀ = L · g̃   ⇒   g̃ = L⁻¹ θ ,   g = g̃ / ‖g̃‖ .    (3.3)
The selection of the activated loudspeaker triplet is preceded by forming all triplets of the convex hull spanned by the given playback loudspeakers. To find the loudspeaker triplet that needs to be activated, the list of all triplets is searched for the one with all-non-negative weights, g₁ ≥ 0, g₂ ≥ 0, g₃ ≥ 0.
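As a minimal sketch of this triplet search and gain computation (the helper names `sph2cart` and `vbap_gains` and the hand-listed octahedron triangulation are illustrative, not taken from the original implementation):

```python
import numpy as np

def sph2cart(azi_deg, ele_deg):
    """Unit direction vector from azimuth/elevation in degrees (illustrative helper)."""
    a, e = np.radians(azi_deg), np.radians(ele_deg)
    return np.array([np.cos(e)*np.cos(a), np.cos(e)*np.sin(a), np.sin(e)])

def vbap_gains(theta, triplets, speaker_dirs):
    """Search the triplet list for the all-non-negative solution of Eq. (3.3)."""
    for tri in triplets:
        L = np.column_stack([speaker_dirs[i] for i in tri])  # 3x3 vector base
        g_tilde = np.linalg.solve(L, theta)                  # g~ = L^-1 * theta
        if np.all(g_tilde >= -1e-9):                         # enclosing triplet found
            g = np.zeros(len(speaker_dirs))
            g[list(tri)] = np.clip(g_tilde, 0.0, None)
            return g / np.linalg.norm(g)                     # normalization, Eq. (3.2)
    raise ValueError("no enclosing triplet found")

# octahedral layout: front, left, rear, right, top, bottom
dirs = [sph2cart(a, e) for a, e in [(0,0), (90,0), (180,0), (-90,0), (0,90), (0,-90)]]
# hand-listed triangulation of the octahedron's convex hull
tris = [(0,1,4), (1,2,4), (2,3,4), (3,0,4), (0,1,5), (1,2,5), (2,3,5), (3,0,5)]
g = vbap_gains(sph2cart(45, 0), tris, dirs)  # pan halfway between front and left
```

Panning halfway between the front and left loudspeakers activates exactly those two with equal gains, as expected from the symmetry of Eq. (3.3).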
Figure 3.1 shows the localization curve for VBAP between a loudspeaker at 0° and one at 45°, for a centrally seated listener and one shifted to the left. The experiment is described in [4]; results were gathered by a 1.8 m circle of 8 loudspeakers, and listeners indicated the perceived direction by naming numbers from a 5° scale mounted on the loudspeaker setup. Black whiskers of the results (95% confidence intervals and medians) for the centrally seated listener indicate a mismatch in the slope of the perceived angles with VBAP; the ideal curve is represented by the dashed line, and the mismatch can be understood by the better match of other exponents γ in Fig. 2.6. The directional spread is quite narrow. For an off-center, left-shifted listening position, the perceived directions are shown in terms of a 5° histogram (gray bubbles) in Fig. 3.1. For this off-center position, it becomes clear that the closest loudspeaker dominates localization within a third of the panning directions. Still, the directional
Fig. 3.1 Perceived directions for VBAP between loudspeakers at 0° and 45°, from [4]. 95% confidence intervals and medians (black) are for a centrally seated listener in a circle of 2.5 m radius. Localization for the left-shifted listener (1.25 m) can become bi-modal, so that a 5° bubble histogram is shown (gray)
mapping seems to be monotonic with the panning angle, and the perceived direction stays within the loudspeaker pair, which is at least a robust result.
In Fig. 3.2 we see that responses from [5], in which the panning angle of amplitude-panned lateral loudspeaker pairs was adjusted to match reference loudspeakers set up in steps of 15°, fairly match the reference directions using VBAP. The r_E vector model (black curve) delivers a better match, with only one exception at 105°. This motivates VBIP as an alternative strategy.
Fig. 3.2 VBAP angles on a 60°-spaced horizontal loudspeaker ring starting at 0° (a) or 30° (b), perceptually adjusted to match panned pink noise with a harmonic-complex acoustic reference in 15° steps, from [5]; the black curve shows the r_E model prediction
Fig. 3.3 The width measure 2 arccos ‖r_E‖ for a virtual source on a horizontal and a vertical trajectory (45° azimuth) using VBAP on an octahedral arrangement
Vector-Base Intensity Panning (VBIP). With nearly the same set of equations, but improving the perceptual mapping by the squares of the weights, the auditory event can be controlled corresponding to the direction of the r_E vector

θ = [θ₁, θ₂, θ₃] [g̃₁², g̃₂², g̃₃²]ᵀ = L · g̃_sq   ⇒   g̃_sq = L⁻¹ θ ,   g̃ = [√g̃_sq,1, √g̃_sq,2, √g̃_sq,3]ᵀ ,   g = g̃ / ‖g̃‖ .    (3.4)
This formulation appears more contemporary due to the excellent match of the r_E model with experimental results, as shown earlier.
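Under the same triplet-search assumption as for VBAP, the VBIP gains of Eq. (3.4) can be sketched as follows (illustrative names; panning to 22.5° between loudspeakers at 0° and 45° yields equal gains by symmetry):

```python
import numpy as np

def vbip_gains(theta, L_cols):
    """VBIP, Eq. (3.4): solve in the squared-gain domain, then take square roots."""
    g_sq = np.linalg.solve(L_cols, theta)   # g~_sq = L^-1 theta
    g_sq = np.clip(g_sq, 0.0, None)         # guard against tiny negative round-off
    g = np.sqrt(g_sq)                       # g~_l = sqrt(g~_sq,l)
    return g / np.linalg.norm(g)            # loudness normalization, Eq. (3.2)

# loudspeaker base: 0° and 45° on the horizon, third direction at zenith
L_cols = np.column_stack([[1.0, 0.0, 0.0],
                          [np.cos(np.pi/4), np.sin(np.pi/4), 0.0],
                          [0.0, 0.0, 1.0]])
theta = np.array([np.cos(np.pi/8), np.sin(np.pi/8), 0.0])  # pan to 22.5°
g = vbip_gains(theta, L_cols)
```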
Non-smooth VBAP/VBIP width. If one of the loudspeakers is exactly aligned with the virtual source for either VBAP or VBIP, e.g. θ₁ = θ, the resulting gains are g₁,₂,₃ = (1, 0, 0), and therefore only 1 loudspeaker will be activated. For a virtual source exactly between 2 loudspeakers, e.g. θ₁ + θ₂ ∝ θ, we obtain g₁,₂,₃ = (1, 1, 0)/√2, and hereby only 2 loudspeakers will be active. This behavior yields audible variation in the perceived width and coloration. For virtual-source movements that cross a common edge of neighboring loudspeaker triplets, there will often be unexpectedly intense jumps that are quite pronounced.
Figure 3.3 illustrates the variation of the perceived width with VBAP on an octahedral arrangement of loudspeakers in the directions θ_lᵀ ∈ {[±1, 0, 0], [0, ±1, 0], [0, 0, ±1]}.
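The non-smooth width can be made tangible by evaluating the measure 2 arccos ‖r_E‖ for the two gain vectors just mentioned (a minimal sketch on a 90°-spaced triplet):

```python
import numpy as np

def width_deg(g, dirs):
    """Width measure 2*arccos(||r_E||) in degrees from gains g and unit directions."""
    E = np.sum(g**2)
    r_E = (g**2) @ dirs / E                     # energy vector
    return 2.0 * np.degrees(np.arccos(np.linalg.norm(r_E)))

dirs = np.array([[1,0,0], [0,1,0], [0,0,1]], float)  # triplet of 90°-spaced speakers
g_on  = np.array([1.0, 0.0, 0.0])                    # panned exactly onto a speaker
g_mid = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)       # panned between two speakers
w_on, w_mid = width_deg(g_on, dirs), width_deg(g_mid, dirs)  # 0° vs. 90° width
```

The width measure jumps from 0° (one active loudspeaker) to 90° (two active loudspeakers), which is exactly the fluctuation visible in Fig. 3.3.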
3.2 Multiple-Direction Amplitude Panning (MDAP)
In order to adjust the r_E or r_V vector not only in direction but also in length, and thus to control the number of active loudspeakers for moving sound objects, Pulkki extended VBAP to multiple-direction amplitude panning (MDAP) [3]. Hereby, not only the perceived width but also the coloration can be held constant.
Direction spread in MDAP. MDAP employs more than one virtual source distributed around the panning direction as a directional spreading strategy. For horizontal loudspeaker rings, MDAP can consist of a pair of virtual VBAP sources at the angles ϕ_s ± α around the panning direction. In a ring of L loudspeakers with uniform angular spacing of 360°/L, the angle α = 90% · 180°/L yields optimally flat width for all panning directions, as shown for L = 6 in the comparison between MDAP and VBAP in Fig. 3.4. Moreover, MDAP seems to equalize the aiming of the r_E measure to the aiming of the r_V measure, which is the one controlled by VBAP and MDAP.
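A sketch of MDAP on a horizontal ring (hypothetical helper names; 2D VBAP pair search by a 2×2 solve, two virtual sources at ϕ_s ± α, gains superposed and re-normalized by Eq. (3.2)):

```python
import numpy as np

def vbap_ring(phi, speakers):
    """2D VBAP: find the enclosing loudspeaker pair with non-negative weights."""
    L = len(speakers)
    t = np.array([np.cos(phi), np.sin(phi)])
    for i in range(L):
        j = (i + 1) % L
        M = np.column_stack([[np.cos(speakers[i]), np.sin(speakers[i])],
                             [np.cos(speakers[j]), np.sin(speakers[j])]])
        g_pair = np.linalg.solve(M, t)
        if np.all(g_pair >= -1e-9):                # enclosing pair found
            g = np.zeros(L)
            g[[i, j]] = np.clip(g_pair, 0.0, None)
            return g
    raise ValueError("no enclosing pair found")

def mdap_ring(phi_s, speakers, alpha):
    """MDAP: superpose two VBAP virtual sources at phi_s ± alpha, then normalize."""
    g = vbap_ring(phi_s - alpha, speakers) + vbap_ring(phi_s + alpha, speakers)
    return g / np.linalg.norm(g)                   # Eq. (3.2)

speakers = np.radians(np.arange(6) * 60.0)         # L = 6 ring, 60° spacing
alpha = np.radians(0.9 * 180 / 6)                  # 90% of 180°/L = 27°
g = mdap_ring(np.radians(0.0), speakers, alpha)    # pan onto the 0° loudspeaker
```

Panning onto the 0° loudspeaker of the L = 6 ring now activates three loudspeakers instead of one, which is what keeps width and coloration more constant while panning.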
Fig. 3.4 Width-related angle arccos ‖r_E‖ and angular error ∠r_E − ϕ_s for VBAP and MDAP, with MDAP using two virtual sources at ±90% · 180°/L around the intended panning direction (for L = 6: ±27°). This not only achieves minimal angular errors but also panning-invariant width
Listening experiment results. Experiments from [4] in Fig. 3.5 investigate the perceived width for two possible horizontal loudspeaker ring layouts, both with 45° spacing, but one starting at 0° ("0"), the other at 22.5° ("1/2"). Widths of MDAP with a direction spread of α = 22.5° are perceived as significantly similar on both
ring layouts, while VBAP yields significantly narrower results for panning onto the frontal loudspeaker in the "0" layout, which activates only a single loudspeaker. Note that VBAP 1/2 and MDAP 1/2 are identical with α = 22.5° and were treated as one condition.
Fig. 3.5 Perceived width differences when using VBAP and MDAP for panning onto or between loudspeaker directions, from the listening experiment in [4]; relative source width rated from narrow to wide for the conditions VBAP 0, VBAP 1/2, MDAP 0, and MDAP 1/2
Fig. 3.6 Coloration change of horizontally moving sources with VBAP and MDAP on different loudspeaker aperture angles, from the listening experiment in [4]; ratings range from imperceptible to very intense
Moreover, a more constant width measure also describes a more constant number of activated loudspeakers while panning. Figure 3.6 shows that listeners can hear coloration changes with rotatory panning of pink noise at constant speed. The figure shows that the coloration fluctuations of MDAP are always clearly smaller than those of VBAP on similar loudspeaker rings. Moreover, coloration changes are more pronounced on rings of 16 loudspeakers than with 8 loudspeakers, which is explained by their faster fluctuation.
Figure 3.7 shows the results from [6] for a central and a left-shifted off-center listening position when using MDAP on an 8-channel ring of loudspeakers. At the central listening position, the perceived directional spread around the loudspeaker positions 0° and 45° obviously increases as expected, as indicated by the whiskers (95% confidence intervals and medians). Moreover, the spread of MDAP seems to slightly decrease the slope mismatch between the underlying VBAP algorithm and the perceptual curve around the 22.5° direction.
Although MDAP enforces a larger number of active loudspeakers, its localization is still similarly robust as that of VBAP, also at off-center listening positions. The perceived direction can be assumed to stay at least confined within a strictly directionally limited activation of loudspeakers. Correspondingly, the gray 5°-histogram bubbles of Fig. 3.7 indicate the perceived directions when the listener is located left-shifted off-center. While localization is slightly attracted by the closer loudspeaker at 0°, the larger spread causes a more monotonic outcome that is less split than with VBAP in Fig. 3.1.
For a more exhaustive study, Frank used 6 loudspeakers on the horizon and gave his listeners the task to align an MDAP pink-noise direction to match acoustical references every 15° (harmonic complex) by adjusting the panning direction [5]. The results in Fig. 3.8 contain 24 answers from 6 subjects responding four times
Fig. 3.7 Perceived directions for MDAP panning on an 8-channel 2.5 m radius loudspeaker ring within the interval [0°, 45°] at a central (black medians and 95%-confidence whiskers) and 1.25 m left-shifted off-center listening position (gray 5° bubble histogram); the dashed line indicates the ideal panning curve
Fig. 3.8 MDAP pink-noise directions on horizontal rings of 60°-spaced loudspeakers starting at 0° (a) or 30° (b), adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15°. Markers and whiskers indicate medians and 95% confidence intervals, the black curve the r_E vector model
(by repetition and symmetrization). The black line shows the directions indicated by the r_E vector model for the tested conditions. Obviously, the confidence intervals of the adjusted MDAP angles match both the reference directions and the predictions of the r_E vector model quite well, in particular for angles between 0° and 90° (except 75°) for the ring starting at 0°, and from 0° to 120° for the 30°-rotated ring. The mismatch is much less than 4° for panning angles ≤ 90°.
Fig. 3.9 The width measure 2 arccos ‖r_E‖ for virtual sources on a horizontal and a vertical path on an octahedron setup using MDAP with 8 additional half-amplitude virtual sources at 45° distance from the main virtual source
MDAP with 3D loudspeaker layouts. For more arbitrary 3D loudspeaker arrangements, the multiple directions could be arranged ring-like, see Fig. 3.9. This arrangement uses 8 additional virtual sources inclined by 45° with regard to the main virtual source.
At least mathematically, however, it requires post-optimization of the amplitudes and angles of the virtual sources in order to accurately match the desired r_V or r_E vector in direction and length on irregular loudspeaker arrangements, cf. [7]. Non-uniform r_V vector lengths of the individual virtual sources involved cause a distorted resultant vector. In particular, their superposition is distorted towards those of the multiple virtual-source directions with the longest r_V vectors. Epain's article [7] proposes an optimization retrieving the optimal orientation and weighting of the multiple virtual sources for every panning direction.
3.3 Challenges in 3D Triangulation: Imaginary
Loudspeaker Insertion and Downmix
Surrounding loudspeaker hemispheres typically exhibit the following two problems:
• Loudspeaker rectangles at the sides of standard setups with height (ITU-R BS.2051-0 [8]) can be decomposed ambiguously into triangles at the sides, back, and top. This can yield noticeable ranges within loudspeaker quadrilaterals in which auditory events are unexpectedly created by just two of the loudspeakers.
• Signals of virtual sources below the horizon usually get lost.
The problem of unfavorable or ambiguous triangulations into loudspeaker triplets appears subtle; however, it can cause clearly audible deficiencies, especially when ambiguous triangulation yields asymmetric behavior between left and right, e.g., for the top, rear, and lateral directions, where we would manually define loudspeaker quadrilaterals instead of triangles, see [9].
As surrounding loudspeaker hemispheres are typically open by 180° towards below, VBAP/VBIP/MDAP is numerically unstable and theoretically useless for any panning direction below the horizon. Although the absence of loudspeakers below renders downward amplitude panning theoretically infeasible, it is still reasonable to preserve signals of virtual sources that are meant for playback on spherically surrounding setups.
In the case of asymmetric loudspeaker rectangles, see Fig. 3.10, and a missing lower hemisphere of surrounding loudspeakers, the insertion of one or more imaginary loudspeakers in the vertical direction (nadir) or in the middle of the rectangle (the average direction vector) has proven to be a useful strategy, e.g. in [10]. Any imaginary loudspeaker aims either at extending the admissible triangulation towards open parts of the surround loudspeaker setup, or at covering parts with potential asymmetry, see [9].
Fig. 3.10 VBAP on the ITU D (4+5+0) setup [8]. Top row: insertion of an imaginary loudspeaker at nadir preserves the loudness of downward-panned signals, shown for a vertical path and E values in dB for the factors {1/√5, 1/(2√5), 0} to re-distribute the signal to the 5 existing horizontal loudspeakers. Middle row: due to typical triangulation, two left-right mirrored vertical paths (45° azimuth) yield asymmetric behavior, as shown by the 2 arccos ‖r_E‖ measure. Bottom row: insertion of an imaginary loudspeaker at 65° fixes symmetry and feeds the signal with a factor 1/√4 to the 4 existing neighbor loudspeakers
The signal of the imaginary loudspeaker can be dealt with in two ways:
• it can be dismissed; e.g., for the imaginary loudspeaker at nadir, this would still yield a signal on the closest horizontal pair of loudspeakers for virtual sources panned to below-horizontal directions, unless panned exactly to nadir;
• it can be down-mixed to the neighboring M loudspeakers by a factor of 1/√M, or less, as in Fig. 3.10; alternatively, for control yielding perfectly flat E measures, the resulting down-mixed gain vector can be re-normalized by Eq. (3.2).
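The down-mixing option can be sketched as follows (illustrative function; the index handling, the 1/√M default, and the re-normalization are assumptions in the spirit of the text):

```python
import numpy as np

def downmix_imaginary(g, imaginary_idx, neighbor_idx, factor=None, renormalize=True):
    """Redistribute the imaginary loudspeaker's gain to its M real neighbors.

    factor defaults to 1/sqrt(M); renormalize applies Eq. (3.2) for a flat E measure.
    """
    g = np.asarray(g, float).copy()
    M = len(neighbor_idx)
    if factor is None:
        factor = 1.0 / np.sqrt(M)
    g[neighbor_idx] += factor * g[imaginary_idx]   # feed the M neighbors
    g[imaginary_idx] = 0.0                         # imaginary channel is discarded
    if renormalize:
        g /= np.linalg.norm(g)                     # Eq. (3.2): flat E measure
    return g

# e.g. a source panned exactly to nadir: all gain sits on the imaginary loudspeaker
# (index 5), which is then spread to the 5 horizontal loudspeakers (indices 0..4)
g = downmix_imaginary([0, 0, 0, 0, 0, 1.0], 5, [0, 1, 2, 3, 4])
```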
3.4 Practical Free-Software Examples
3.4.1 VBAP/MDAP Object for Pd
There is a classic VBAP/MDAP implementation by Ville Pulkki that is available as an external in pure data (Pd). The example in Fig. 3.11 illustrates its use together with some other useful externals in Pd. Software requirements are:
• pure-data (free, http://puredata.info/downloads/pure-data)
• iemmatrix (free download within pure-data)
• zexy (free download within pure-data)
• vbap (free download within pure-data).
Fig. 3.11 Vector-Base/Multi-Direction Amplitude Panning (VBAP/MDAP) example in pure data (Pd) using Pulkki's [vbap] external for an octahedral layout. In the patch, [define_loudspeakers 3 0 0 90 0 180 0 -90 0 0 90 0 -90] specifies the octahedral layout; azimuth, elevation, and spread values feed the [vbap] object, whose gains are shaped into a panning vector ([mtx 6 1]) that scales a test signal ([osc~ 1000]) via [mtx_*~ 6 1 10] to [dac~ 1 2 3 4 5 6], with [prvu~] level meters for the Front, Left, Rear, Right, Top, and Bottom loudspeakers
3.4.2 SPARTA Panner Plugin
The SPARTA Panner, available under http://research.spa.aalto.fi/projects/sparta_vsts/plugins.html, provides a vector-base amplitude panning interface (VBAP) and multiple-direction amplitude panning (MDAP), see Fig. 3.12, with a frequency-dependent loudness normalization by ᵖ√(Σ_l g_lᵖ), adjustable to the listening conditions, see Laitinen [11]. The parameter DTT can be varied between 0 (standard, frequency-independent VBAP normalization, i.e. diffuse-field normalization), 0.5 for typical listening environments, and 1 for the anechoic chamber. The plugin allows to either manually enter the azimuth and elevation angles of multiple panning directions (if more than one input signal is used) and of the playback loudspeakers, or to import/export them from/to preset files. Of course, all panning directions can be time-varying and be moved per mouse, automation, or controls.
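The p-norm loudness normalization itself can be sketched as follows (only the normalization; the frequency-dependent choice of p and the plugin's DTT blending are not reproduced here):

```python
import numpy as np

def normalize_gains(g, p):
    """Normalize panning gains so that (sum |g_l|^p)^(1/p) = 1."""
    g = np.asarray(g, float)
    return g / np.sum(np.abs(g)**p)**(1.0 / p)

g = np.array([0.6, 0.8, 0.0])
g_amp = normalize_gains(g, 1.0)   # p = 1: amplitude normalization (coherent, low freq.)
g_nrg = normalize_gains(g, 2.0)   # p = 2: energy normalization (diffuse, high freq.)
```

Blending p between 1 and 2 across frequency is what adapts the perceived loudness to the listening conditions, as described by Laitinen [11].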
Fig. 3.12 The Panner VST plug-in from Aalto University’s SPARTA plug-in suite manages
Vector-Base Amplitude Panning within sequencers supporting VST
References
1. V. Pulkki, Spatial sound generation and perception by amplitude panning techniques, Ph.D.
dissertation, Helsinki University of Technology (2001)
2. V. Pulkki, Virtual sound source positioning using vector base amplitude panning. J. Audio Eng.
Soc. 45(6), 456–466 (1997)
3. V. Pulkki, Uniform spreading of amplitude panned virtual sources, in Proceedings of the Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE (New Paltz, NY, 1999)
4. M. Frank, Phantom sources using multiple loudspeakers in the horizontal plane, Ph.D. Thesis,
Kunstuni Graz (2013)
5. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in
Fortschritte der Akustik - DAGA (Kiel, 2017)
6. M. Frank, Source width of frontal phantom sources: perception, measurement, and modeling.
Arch. Acoust. 38(3), 311–319 (2013)
7. N. Epain, C.T. Jin, F. Zotter, Ambisonic decoding with constant angular spread. Acta Acust.
United Acust. 100(5) (2014)
8. ITU, Recommendation BS.2051: Advanced sound system for programme production (ITU,
2018)
9. M. Romanov, M. Frank, F. Zotter, T. Nixon, Manipulations improving amplitude panning on
small standard loudspeaker arrangements for surround with height, in Tonmeistertagung (2016)
10. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
11. M.-V. Laitinen, J. Vilkamo, K. Jussila, A. Politis, V. Pulkki, Gain normalization in amplitude
panning as a function of frequency and room reverberance, in Proceedings 55th AES Conference
(Helsinki, 2014)
Chapter 4
Ambisonic Amplitude Panning
and Decoding in Higher Orders
…the second-order Ambisonic system offers improved imaging
over a wider area than the first-order system and is suitable for
larger rooms.
Jeffrey S. Bamford [1], Canadian Acoustics, 1994.
Abstract Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. The required mathematical formulations for 3D higher-order Ambisonics are then developed, with the idea of improving the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previous models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, sound-reinforcement systems, and to headphones. The behavior is analyzed by a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.
Cooper [2] used higher-order angular harmonics to formulate circular panning of auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term Ambisonics became common for technology using spherical harmonic functions. Around the early 2000s, most notably Bamford [6], Malham [7], Poletti [8], Jot [9], and Daniel [10] pioneered the development of higher-order Ambisonic panning and decoding, as did Ward and Abhayapala [11], Dickens [12], and, at the lab of the authors, Sontacchi [13].
Another leap happened around 2010, when Ambisonic decoding to loudspeakers could be largely improved by considering regularization methods [14], singular-value decomposition [15], and all-round Ambisonic decoding (AllRAD) [15, 16],
a combination of vector-base panning techniques with Ambisonics, yielding the most robust and flexible higher-order decoding method known today.
For headphones, after the work of Jot [9], which outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17–19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] as the essential binaural decoders. Both remove HRTF delays or optimize HRTF phases at high frequencies to avoid spectral artifacts. By interaural covariance correction, MagLS/TAC manage to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].
4.1 Direction Spread in First-Order 2D Ambisonics
In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a single sound source from the angle ϕ_s to the direction of each loudspeaker ϕ is described by the shape of the panning function (or direction-spread function) in Eq. (1.17). The directional spreading is not infinitely narrow, but determined by what can be represented by first-order directivity patterns. Consequently, sound from the angle ϕ_s will be mapped by a dipole pattern aligned with the source and an additional omnidirectional pattern. We can involve a spread parameter a to make the directional spread to the loudspeaker system adjustable and either cardioid-shaped (a = 1), 2D-supercardioid-shaped (a = √2), or 2D-hypercardioid-shaped (a = 2), using:

g(ϕ) = 1 + a cos(ϕ − ϕ_s) .    (4.1)
This function represents how first-order Ambisonic panning would distribute a mono signal to the loudspeakers. With the loudspeaker positions described by the set of angles {ϕ_l}, a vector of amplitude-panning gains with an entry for each loudspeaker could be determined by sampling the direction-spread function:

g = [g₁, …, g_L]ᵀ = 1 + a [cos(ϕ₁ − ϕ_s), …, cos(ϕ_L − ϕ_s)]ᵀ .    (4.2)
With these gain values, we evaluate models of perceived loudness, direction, and
width, as introduced in Chap. 2, in order to enter a discussion of perceptual goals.
If the loudspeaker directions {θ_l} are chosen suitably, it is possible to obtain panning-independent loudness, direction, and width measures E = Σ_l g_l², r_E = (1/E) Σ_l g_l² θ_l, and (5/8) · (180°/π) · 2 arccos ‖r_E‖. How is it done?
For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of 4 loudspeakers with uniform angular spacing and a = √2, which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
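The check by computer that the text mentions is quick to sketch: sampling Eq. (4.2) with a = √2 on a uniform 4-loudspeaker ring gives a loudness E and a width angle that are the same for every panning direction (illustrative sketch):

```python
import numpy as np

a = np.sqrt(2)                                # 2D supercardioid (max-rE) spread weight
phis = np.radians([0.0, 90.0, 180.0, 270.0])  # uniform ring of L = 4 loudspeakers

def measures(phi_s):
    """Loudness E and width angle of the sampled panning function, Eq. (4.2)."""
    g = 1 + a * np.cos(phis - phi_s)
    E = np.sum(g**2)
    r_E = np.array([np.sum(g**2 * np.cos(phis)),
                    np.sum(g**2 * np.sin(phis))]) / E
    return E, np.degrees(np.arccos(np.linalg.norm(r_E)))

E0, w0 = measures(np.radians(0.0))
E1, w1 = measures(np.radians(10.0))  # E and width come out the same for any angle
```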
Direction spread in FOA. The panning-function interpretation with its directional spread has some similarity to MDAP, with its attempt to directionally spread an amplitude-panned signal. Similar to the discrete virtual spread by ±α = arccos ‖r_E‖
Fig. 4.1 Width-related angle arccos ‖r_E‖ and angular error ∠r_E − ϕ_s for first-order Ambisonics on a ring with 90°-spaced loudspeakers using cardioid or 2D super-/hyper-cardioid patterns; the measures are all panning-invariant, and the super-cardioid weighting is the 2D max-r_E weighting with arccos ‖r_E‖ = 45°
around the panning direction, the virtual direction spread of first-order Ambisonics is described by its continuous panning function g(ϕ) in Eq. (4.1). To inspect the continuous function by the r_E measure defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum. Because of the symmetry around ϕ_s, we may for convenience set ϕ_s = 0, which knowingly causes r_{E,y} = 0, and evaluate

r_{E,x} = ∫₀²π g²(ϕ) cos ϕ dϕ / ∫₀²π g²(ϕ) dϕ = ∫₀π [1 + 2a cos ϕ + a² cos² ϕ] cos ϕ dϕ / ∫₀π [1 + 2a cos ϕ + a² cos² ϕ] dϕ = a / (1 + a²/2) .    (4.3)
The maximum of r_{E,x} = 2a/(2 + a²) is found by d/da r_{E,x} = (4 + 2a² − 4a²)/(2 + a²)² = 0, hence at a = √2. Consequently, the 2D max-r_E weight yields r_{E,x} = √2/2 = 1/√2 and the angle arccos ‖r_E‖ = 45°. This would resemble a 2D-MDAP-equivalent source spread of ±45°. Note that first-order Ambisonics cannot map to a smaller spread than this. Only higher orders permit further reducing this spread to a desired angle below 90°.
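Eq. (4.3) and the maximum at a = √2 are easy to verify numerically (a sketch; uniform sampling of the periodic integrands is exact here, since they are low-order trigonometric polynomials):

```python
import numpy as np

def r_E_x(a, n=1024):
    """Numerical evaluation of Eq. (4.3) for the panning function 1 + a*cos(phi)."""
    phi = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    g2 = (1 + a * np.cos(phi))**2
    return np.mean(g2 * np.cos(phi)) / np.mean(g2)

# compare with the closed form a / (1 + a^2/2) for cardioid, max-rE, hypercardioid;
# the maximum over a is at a = sqrt(2), where r_E,x = sqrt(2)/2, i.e. arccos = 45°
vals = [(r_E_x(a), a / (1 + a**2 / 2)) for a in (0.5, np.sqrt(2), 2.0)]
```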
Ideal loudspeaker layouts. Not only are the directional aiming of the virtual, continuous first-order Ambisonic panning function ideal and its width panning-invariant, its loudness measure is panning-invariant as well. However, decoding to a physical loudspeaker setup can degrade this ideal behavior. For which loudspeaker layouts are these properties preserved by sampling decoding?
The 2D first-order Ambisonic components (W, X, Y) correspond to {1, cos ϕ, sin ϕ} patterns, a first-order Fourier series in the angle. When sampling the playback directions by L = 3 uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval's theorem ensures panning-invariant loudness E for any panning direction.
For an ideal r_E measure, however, one more loudspeaker is required, L ≥ 4, for a uniformly spaced horizontal ring. To explain this increase exhaustively, the concept of circular/spherical polynomials and t-designs will be introduced in this chapter. For a brief explanation: g²(ϕ) is a second-order expression, and therefore, to represent the ideal constant loudness E = ∫ g²(ϕ) dϕ of the continuous panning function consistently after discretization, E = (2π/L) Σ_l g_l², it requires L = 3 uniformly spaced loudspeakers, as argued before. By contrast, the expressions g²(ϕ) cos ϕ and g²(ϕ) sin ϕ are third-order and appear in r_E · E = ∫ g²(ϕ) [cos ϕ, sin ϕ]ᵀ dϕ. Consequently, ideal mapping of r_E (direction and width) requires at least one more loudspeaker, L = 4, for a uniformly spaced arrangement to make the continuous and the discretized form r_E E = (2π/L) Σ_l g_l² [cos ϕ_l, sin ϕ_l]ᵀ perfectly equal.
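A quick numerical check of the L = 3 versus L = 4 argument (sketch; uniform rings, max-r_E weight a = √2): the loudness E is already panning-invariant for L = 3, but a panning-invariant ‖r_E‖ additionally needs L = 4.

```python
import numpy as np

def measures(L, phi_s, a=np.sqrt(2)):
    """Loudness E and energy-vector length for a sampled FOA panning function."""
    phis = 2 * np.pi * np.arange(L) / L           # uniform ring of L loudspeakers
    g = 1 + a * np.cos(phis - phi_s)              # sampled panning function, Eq. (4.2)
    E = np.sum(g**2)
    r_E = np.hypot(np.sum(g**2 * np.cos(phis)), np.sum(g**2 * np.sin(phis))) / E
    return E, r_E

# E is panning-invariant already for L = 3 ...
E3a, r3a = measures(3, 0.0)
E3b, r3b = measures(3, np.radians(60.0))
# ... but a panning-invariant r_E (width) additionally needs L = 4
E4a, r4a = measures(4, 0.0)
E4b, r4b = measures(4, np.radians(60.0))
```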
Towards a higher-order panning function. An Nth-order cardioid pattern is obtained from the cardioid pattern by taking its Nth power,

g_N(ϕ) = (1/2^N) (1 + cos ϕ)^N,

which makes it narrower. With N = 2, this becomes, using cos²ϕ = ½(1 + cos 2ϕ),

g₂(ϕ) = ¼ (1 + 2 cos ϕ + cos²ϕ) = ⅛ (3 + 4 cos ϕ + cos 2ϕ).
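The identity above can be spot-checked numerically; the following sketch (plain NumPy, not part of the book) compares the squared cardioid with its equivalent cosine series on a dense angular grid.

```python
import numpy as np

phi = np.linspace(-np.pi, np.pi, 1001)

# squared first-order cardioid vs. its cosine-series form from above
g2_power = (0.5 * (1 + np.cos(phi)))**2
g2_series = (3 + 4 * np.cos(phi) + np.cos(2 * phi)) / 8

assert np.allclose(g2_power, g2_series)
```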
More generally, the Chebyshev polynomials T_m(cos ϕ) = cos mϕ, cf. [23, Eq. 3.11.6], can be used to argue that there is always a fully equivalent cosine series describing the higher-order 2D panning function in the azimuth angle,

g(ϕ) = Σ_{m=0}^{N} a_m cos mϕ.    (4.4)
Rotated panning function. In first-order Ambisonics, panning functions consist of an omnidirectional part, cos(0ϕ) = 1, and a figure-of-eight to x, cos ϕ, but that was not all: Recording and playback also required a figure-of-eight pattern to y, sin ϕ. The additional component allows expressing rotated first-order directivities by a basis set of fixed directivities. For higher orders, a panning function rotated to a non-zero aiming ϕ_s ≠ 0,

g(ϕ − ϕ_s) = Σ_{m=0}^{N} a_m cos[m(ϕ − ϕ_s)],    (4.5)

can be re-expressed by the addition theorem cos(α + β) = cos α cos β − sin α sin β into a series involving the sinusoids (odd-symmetric part of a Fourier series),

g(ϕ − ϕ_s) = Σ_{m=0}^{N} a_m [cos mϕ_s cos mϕ + sin mϕ_s sin mϕ]    (4.6)
           = Σ_{m=0}^{N} a_m^(c) cos mϕ + Σ_{m=0}^{N} a_m^(s) sin mϕ.
We conclude: Higher-order Ambisonics in 2D (and the associated set of theoretical
microphone directivities) is based on the Fourier series in the azimuth angle ϕ.
4.2 Higher-Order Polynomials and Harmonics
The previous section required that direction and length of the r E vector resulting from
amplitude panning on loudspeakers matched the desired auditory event direction and
width. Harmonic functions with strict symmetry around a panning direction θ s will
help us in achieving this goal and in defining good sampling.
Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and
resolution-limited axisymmetric functions around the panning direction θs to fulfill
our perceptual goals of a panning-invariant loudness E, width r E , and perfect
alignment between panning direction θs and localized direction r E . Then we hope
to find suitable directional discretization schemes for ideal loudspeaker layouts, so
that the measures E and r E are perfectly reconstructed in playback.
The projection of a variable direction vector θ onto the panning direction θ_s always yields the cosine of the enclosed angle, θ_s^T θ = cos φ, no matter whether it is in two or three dimensions. Hereby, constructing the panning function based on this projection readily meets the desired goals. The mth power thereof, (θ_s^T θ)^m = cos^m φ, helps to build an Nth-order power series g = Σ_{m=0}^{N} a_m (θ_s^T θ)^m to describe a virtual Ambisonic panning function.

For 2D, such a circular polynomial g = Σ_{m=0}^{N} a_m (θ_s^T θ)^m contains all (N+1)(N+2)/2 mixed powers by (θ_s^T θ)^m = (θ_xs θ_x + θ_ys θ_y)^m = Σ_{k=0}^{m} (m choose k) (θ_xs θ_x)^k (θ_ys θ_y)^{m−k} of the direction vectors' entries θ_s^T = [θ_xs, θ_ys] and θ = [θ_x, θ_y]^T. However, we could already recognize that it only takes 2N + 1 functions to express g = Σ_{m=0}^{N} a_m (θ_s^T θ)^m = Σ_{m=0}^{N} a_m cos^m φ: First, an initial polynomial with relative azimuth φ = ϕ − ϕ_s relates to a harmonic series of N + 1 cosines or Chebyshev polynomials, g = Σ_{m=0}^{N} b_m cos mφ = Σ_{m=0}^{N} b_m T_m(θ_s^T θ). Then, in terms of absolute azimuth ϕ, the trigonometric addition theorem re-expresses the series into one of N + 1 cosines and N sines, with T_m(θ_s^T θ) = cos[m(ϕ − ϕ_s)] = cos mϕ_s cos mϕ + sin mϕ_s sin mϕ. As shown in the upcoming section, we can alternatively obtain such orthonormal harmonic functions by solving a second-order differential equation that is generally used to define harmonics, which bears the later benefit that we can use the approach to define spherical harmonics in three space dimensions.

Spherical polynomials are similar, g = Σ_{n=0}^{N} a_n (θ_s^T θ)^n, involving the expressions (θ_s^T θ)^n = (θ_xs θ_x + θ_ys θ_y + θ_zs θ_z)^n = Σ_{k=0}^{n} Σ_{l=0}^{n−k} (n choose k)(n−k choose l) (θ_zs θ_z)^k (θ_xs θ_x)^l (θ_ys θ_y)^{n−k−l}. Again, all these (N+1)(N+2)(N+3)/6 combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not in 3D. On the sphere, the N + 1 orthogonal Legendre polynomials P_n(cos φ) replace the cosine series as a basis for g = Σ_{n=0}^{N} c_n P_n(cos φ), as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in (N+1)² spherical harmonics and their addition theorem as a basis in terms of absolute directions, ((2n+1)/(4π)) P_n(θ_s^T θ) = Σ_{m=−n}^{n} Y_n^m(θ_s) Y_n^m(θ). Dickins' thesis is interesting for further reading [12].
In both regimes, 2D and 3D, the circular or spherical polynomials concept will be used to determine optimal layouts, so-called t-designs. Such t-designs are directional sampling grids that are able to keep the information about the constant part of any circular (2D) or spherical (3D) polynomial up to the order N ≤ t. This will be a mathematical key property exploited to determine requirements for preserving the E and r_E measures during Ambisonic playback with optimal loudspeaker setups. Moreover, t-designs simplify the numerical integration of circular or spherical harmonics to define state-of-the-art Ambisonic decoders or mapping effects.
4.3 Angular/Directional Harmonics in 2D and 3D
The Laplacian is defined in the D-dimensional Cartesian space as

Δ = Σ_{j=1}^{D} ∂²/∂x_j²,    (4.7)

and for any function f, the Laplacian Δf describes the curvature. Any harmonic function is proportional to its curvature by an eigenvalue λ,

Δf = −λ f,    (4.8)

and therefore is an oscillatory function. Generally, eigensolutions of Δf = −λ f are called harmonics. For suitable eigenvalues λ, harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval.
finite interval. It seems desirable to find such harmonics for functions only exhibiting
directional dependencies, i.e. in the azimuth angle ϕ in 2D, and azimuth and zenith
angle ϕ, ϑ in 3D.
4.4 Panning with Circular Harmonics in 2D
For two dimensions, Appendix A.3.2 uses the generalized chain rule to convert the Laplacian of the 2D Cartesian coordinate system Δ = ∂²/∂x² + ∂²/∂y² to a polar coordinate system with the radius r and the angle ϕ to the x axis, Δ = (1/r) ∂/∂r + ∂²/∂r² + (1/r²) ∂²/∂ϕ². For functions Φ = Φ(ϕ) purely in the angle ϕ, the radial derivatives of Δ vanish and it remains (∂ → d)

d²Φ/dϕ² = −λ r² Φ.    (4.9)

It only yields useful solutions with λ r² = m², m ∈ ℤ, cf. Appendix A.3.4, Fig. 4.2,
Fig. 4.2 Circular harmonics with m = −3, …, 3 plotted as polar diagrams using the radius R = 20 lg |Φ_m| and grayscale to distinguish between positive (gray) and negative (black) signs

Φ_m(ϕ) = (1/√(2π)) × { √2 sin(|m|ϕ) for m < 0,   1 for m = 0,   √2 cos(mϕ) for m > 0 },    (4.10)
which defines how to decompose panning functions of limited order |m| ≤ N. The harmonics are periodic in azimuth, orthogonal, and normalized (orthonormal) on the period −π ≤ ϕ ≤ π. Due to their completeness, any square-integrable function g(ϕ) can be expanded into a series of the harmonics using coefficients γ_m,

g(ϕ) = Σ_{m=−∞}^{∞} γ_m Φ_m(ϕ).    (4.11)

For a known function g(ϕ), the coefficients γ_m are obtained by the transformation integral

γ_m = ∫_{−π}^{π} g(ϕ) Φ_m(ϕ) dϕ,    (4.12)
as shown in Appendix Eq. (A.14).
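As a quick numerical illustration (a NumPy sketch with our own naming, not the book's code), the harmonics of Eq. (4.10) can be implemented and their orthonormality on −π ≤ ϕ ≤ π verified by Riemann-sum integration:

```python
import numpy as np

def circ_harmonic(m, phi):
    """Circular harmonic of Eq. (4.10), orthonormal on -pi <= phi <= pi."""
    if m < 0:
        return np.sqrt(2) * np.sin(abs(m) * phi) / np.sqrt(2 * np.pi)
    if m == 0:
        return np.ones_like(phi) / np.sqrt(2 * np.pi)
    return np.sqrt(2) * np.cos(m * phi) / np.sqrt(2 * np.pi)

# Riemann-sum check of the orthonormality underlying Eq. (4.12)
phi = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
dphi = phi[1] - phi[0]
for m1 in range(-3, 4):
    for m2 in range(-3, 4):
        ip = np.sum(circ_harmonic(m1, phi) * circ_harmonic(m2, phi)) * dphi
        assert abs(ip - (m1 == m2)) < 1e-6
```

On a uniform periodic grid, such sums over trigonometric products are exact up to floating-point precision, which is why the tolerance can be so tight.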
2D panning function. An infinitely narrow angular range around a desired direction, |ϕ − ϕ_s| < ε → 0, is represented by the transformation integral over a Dirac delta distribution δ(ϕ − ϕ_s), cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are

γ_m = Φ_m(ϕ_s).    (4.13)

As the infinite circular harmonic series is complete, the panning function is

g(ϕ) = Σ_{m=−∞}^{∞} Φ_m(ϕ_s) Φ_m(ϕ) = δ(ϕ − ϕ_s),    (4.14)

and in practice we resolution-limit it to the Nth Ambisonic order, |m| ≤ N, and use an additional weight a_m that allows us to design its side lobes
Fig. 4.3 2D unweighted a_m = 1 basic and weighted max-r_E Ambisonic panning functions for the orders N = 1, 2, 5
g_N(ϕ) = Σ_{m=−N}^{N} a_m Φ_m(ϕ_s) Φ_m(ϕ).    (4.15)
The max-r_E panning function [24] uses the weights a_m = cos(πm/(2(N+1))), as derived in Appendix Eq. (A.20). The spread is now adjustable by the order to ±90°/(N+1). The result is shown in Fig. 4.3, compared with no side-lobe suppression when a_m = 1 (basic). It is easy to recognize: Φ_m(ϕ_s) represents the recorded or encoded directions, and Φ_m(ϕ) represents the decoded playback directions.
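Eq. (4.15) with the max-r_E weights can be sketched in a few lines of NumPy (our own hypothetical code, collapsing the harmonic products into the cosine series of Eq. (4.4)):

```python
import numpy as np

def panning_2d(phi, phi_s, N, max_re=True):
    """Nth-order 2D Ambisonic panning function g_N(phi), Eq. (4.15).

    The sum over m = -N..N collapses into the cosine series
    g = a_0/(2 pi) + (1/pi) * sum_m a_m cos(m (phi - phi_s))."""
    g = np.full_like(phi, 1.0 / (2 * np.pi), dtype=float)
    for m in range(1, N + 1):
        a_m = np.cos(np.pi * m / (2 * (N + 1))) if max_re else 1.0
        g = g + a_m / np.pi * np.cos(m * (phi - phi_s))
    return g

phi = np.linspace(-np.pi, np.pi, 3600, endpoint=False)
g_maxre = panning_2d(phi, 0.3, 3)
g_basic = panning_2d(phi, 0.3, 3, max_re=False)

assert abs(phi[np.argmax(g_maxre)] - 0.3) < 0.01   # main lobe aims at phi_s
assert g_maxre.min() > g_basic.min()               # suppressed side lobes
```

The last comparison reproduces what Fig. 4.3 shows: the rectangular truncation ("basic") exhibits deeper negative side lobes than the max-r_E-weighted series.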
Optimal sampling of the 2D panning function. In the theory of circular/spherical polynomials in the variable ζ = cos(ϕ − ϕ_s), so-called t-designs in 2D are optimal point sets of given angles {ϕ_l} with l = 1, …, L and size L. A t-design allows perfect computation of the integral (constant part) over the polynomials P_m(ζ) of limited degree m ≤ t by discrete summation,

∫_{−π}^{π} P_m(cos φ) dφ = (2π/L) Σ_{l=1}^{L} P_m[cos(ϕ_l − ϕ_s)],    (4.16)
regardless of any angular shift ϕs . In 2D, Chebyshev polynomials Tm (cos φ) =
cos(mφ) are orthogonal polynomials, therefore an Nth-order panning function composed out of cos(mφ) is always a polynomial of Nth degree. Knowing this, it is clear
that the integral over g_N² required to evaluate the loudness measure E is a polynomial of the order 2N. The integral to calculate r_E is over g_N² cos(φ) and thus of the order
2N + 1. In playback, to get a perfectly panning-invariant loudness measure E of the
continuous panning function and also the perfectly oriented r E vector of constant
spread arccos r_E, the parameter t must be t ≥ 2N + 1. In 2D, all regular polygons with L = t + 1 points

ϕ_l = 2π(l − 1)/(t + 1)    (4.17)

are t-designs. We can use the smallest set of 2N + 2 angles ϕ_l = (180°/(N+1)) (l − 1) as optimal 2D layout.
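The t-design property of the regular polygon, Eq. (4.16), is easy to verify numerically; here is a small check (our own sketch) that the discrete sum reproduces the vanishing integrals of cos(mφ) for all 1 ≤ m ≤ t, regardless of the shift φ_s:

```python
import numpy as np

t = 5                                   # exactness degree
L = t + 1                               # regular polygon, Eq. (4.17)
phi_l = 2 * np.pi * np.arange(L) / L
phi_s = 0.7                             # arbitrary angular shift

for m in range(1, t + 1):
    # (2 pi / L) sum_l cos(m (phi_l - phi_s)) vs. integral of cos(m phi) = 0
    s = 2 * np.pi / L * np.sum(np.cos(m * (phi_l - phi_s)))
    assert abs(s) < 1e-9
# the constant part (m = 0) integrates to 2 pi
assert abs(2 * np.pi / L * np.sum(np.cos(0 * phi_l)) - 2 * np.pi) < 1e-12
```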
4.5 Ambisonics Encoding and Optimal Decoding in 2D
To encode a signal s into Ambisonic signals χ_m, we multiply the signal with the encoder representing the direction of the signal at the angle ϕ_s by the weights Φ_m(ϕ_s),

χ_m(t) = Φ_m(ϕ_s) s(t),    (4.18)

or in vector notation

χ_N = y_N(ϕ_s) s,    (4.19)

using the column vector y_N = [Φ_{−N}(ϕ_s), …, Φ_N(ϕ_s)]^T of 2N + 1 components. The Ambisonic signals in χ_N are weighted by side-lobe suppressing weights a_N = [a_{|−N|}, …, a_N]^T, expressed by the multiplication with a diagonal matrix diag{a_N}, and then decoded to the L loudspeaker signals x by a sampling decoder

D = (2π/L) [y_N(ϕ_1), …, y_N(ϕ_L)]^T = (2π/L) Y_N^T,    (4.20)

using

x = D diag{a_N} χ_N.    (4.21)

In total, the system for encoding and decoding can also be written to yield a set of loudspeaker gains for one virtual source,

g = D diag{a_N} y_N(ϕ_s),    (4.22)

or in particular for the 2D sampling decoder g = (2π/L) Y_N^T diag{a_N} y_N(ϕ_s).
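The complete 2D encode/decode chain of Eq. (4.22) can be sketched as follows (NumPy, our own naming); on the optimal layout of L = 2N + 2 loudspeakers, the loudness E stays constant and the r_E vector aims exactly at the panning angle:

```python
import numpy as np

def y_vec(phi, N):
    """Encoder y_N = [Phi_-N(phi), ..., Phi_N(phi)]^T, cf. Eq. (4.10)."""
    m = np.arange(1, N + 1)
    return np.concatenate([np.sqrt(2) * np.sin(m[::-1] * phi),
                           [1.0],
                           np.sqrt(2) * np.cos(m * phi)]) / np.sqrt(2 * np.pi)

N = 3
L = 2 * N + 2                                    # optimal 2D layout
phi_l = np.pi / (N + 1) * np.arange(L)           # Eq. (4.17) with t = 2N + 1
Y = np.stack([y_vec(p, N) for p in phi_l], axis=1)
a = np.cos(np.pi * np.abs(np.arange(-N, N + 1)) / (2 * (N + 1)))  # max-r_E

def gains(phi_s):
    """g = (2 pi / L) Y_N^T diag{a_N} y_N(phi_s), Eq. (4.22)."""
    return 2 * np.pi / L * Y.T @ (a * y_vec(phi_s, N))

E_ref = None
for phi_s in np.linspace(0, 2 * np.pi, 36, endpoint=False):
    g = gains(phi_s)
    E = np.sum(g**2)                                       # loudness measure
    rE = g**2 @ np.stack([np.cos(phi_l), np.sin(phi_l)], axis=1) / E
    E_ref = E if E_ref is None else E_ref
    assert np.isclose(E, E_ref)                            # panning-invariant E
    err = np.angle(np.exp(1j * (np.arctan2(rE[1], rE[0]) - phi_s)))
    assert abs(err) < 1e-9                                 # r_E aims at phi_s
```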
4.6 Listening Experiments on 2D Ambisonics
There are several listening experiments discussing the features of Ambisonics, most of which are summarized in [25]; they are discussed below, complemented with those from [26].
The perceptually adjusted panning angle of 2nd-order max-r_E Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4, similar to MDAP in Fig. 3.8, but with a slightly more accurate median by 0.5° on average, in particular at side and back panning directions.

Fig. 4.4 2nd-order max-r_E-weighted Ambisonic panning with pink noise on horizontal rings of 60°-spaced loudspeakers adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15°. Markers and whiskers indicate 95% confidence intervals and medians, the black curve the r_E vector model

Fig. 4.5 Frank's 2008 pointing experiments [27] on center and off-center listening seats for 3 virtual sources (A, B, C) using 1st-order (left) and 5th-order (right) Ambisonics on 12 horizontal loudspeakers (IEM CUBE) indicate a more stable localization with high orders. Moreover, for 5th order, max-r_E weighting and omission of delay compensation were preferred. Omission of max-r_E weights ("basic") or alternative "in-phase" weights that entirely suppress any side lobe yields less precise localization at off-center listening positions

Fig. 4.6 Experiments at an off-center position show a that max-r_E outperforms the basic, rectangularly truncated Fourier series at off-center listening positions, b where it can avoid splitting of the auditory event. Stitt's experiments c imply that localization with higher orders is more stable and that the localization deficiency at off-center listening seats seems to be proportional to the ratio of the distance to the center and the radius of the loudspeaker ring, and not to the specific time delays that are larger for large loudspeaker rings, cf. [28]
Another aspect to investigate is how stable the results are for center and off-center listening seats, as shown in Fig. 4.5. It illustrates that max-r_E with the highest order achieves the best stability with regard to localization at off-center listening seats. Astonishingly, the delay compensation for non-uniform delay times to the center deteriorated the results, most probably because the nearly linear frontal arrangement of loudspeakers is more robust to lateral shifts of the listening position than a circular arrangement.

Fig. 4.7 Predicted sweet area sizes using the r_E model of Sect. 2.2.9 for loudspeaker layouts and playback orders used in Stitt's experiments [28]: first order (top), third order (bottom), small (left), and large (right)
Figure 4.6a, b shows the direction histograms for two different weightings a_m, and it illustrates that proper side-lobe suppression of the panning function by using max-r_E weights is decisive at shifted listening positions to avoid splitting of the auditory image, as it appears in Fig. 4.6b without the weights (basic).
Peter Stitt's work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet area model from Sect. 2.2.9 for the first order (top row) and third order (bottom row) in Fig. 4.7, for both the small setup (left) and the large setup (right).
Fig. 4.8 The perceptual sweet area as investigated by Frank [29] nearly covers the entire area enclosed by the IEM CUBE as a playback setup (black = 5th, gray = 3rd, light gray = 1st-order Ambisonics); a listening area size experiment, b sweet area for frontal sound panned between the loudspeakers L, C, and R. It is smallest for 1st-order Ambisonics

Fig. 4.9 Frank's 2013 experiments showed for max-r_E Ambisonics on three frontal loudspeakers that large perceived widths are not entirely accurately modeled by the optimally sampled, therefore constant, r_E vector for low orders, however reasonably well, and significantly for high orders/narrow widths (relative source width from narrow to wide; stimuli on a loudspeaker, quarter-, and half-spaced; 8 loudspeakers/3rd order and 16 loudspeakers/7th order)
Frank’s 2016 experiments [29] used scales on the floor from which listeners read
off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b,
the criterion for listeners to indicate leaving the sweet area was when the frontally
panned sound was mapped outside the loudspeaker pairs L, C, and R. It showed that
a sweet area providing perceptually plausible playback measures at least 2/3 of the
radius of the loudspeaker setup if the order is high enough.
The perceived width of auditory events is investigated in the experimental results of Fig. 4.9, [25], in which pink noise was frontally panned in different orientations of the loudspeaker ring (with one loudspeaker in front, and with the front direction lying quarter- and half-spaced with regard to the loudspeaker spacing). Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with L = 2N + 2 loudspeakers provides constant r_E length. The panning-invariant length is not perfectly reflected in the perceived widths with 3rd order on 8 loudspeakers, for which the on-loudspeaker position is perceived as being significantly wider. By contrast, the high-order experiment with 7th order on 16 loudspeakers perfectly validates the model.

Fig. 4.10 Frank's 2013 experiments on the variation of the sound coloration of virtual sources rotating at a speed of 100°/s imply that max-r_E weighting outperforms the "basic" weighting with a_m = 1, and that adjusting the number of loudspeakers to the Ambisonic order seems to be reasonable (coloration from imperceptible to very intense; central and off-center positions; 8 loudspeakers/3rd order and 16 loudspeakers/7th order)
Figure 4.10 shows experiments investigating the time-variant change in sound
coloration for a pink-noise virtual source rotating at a speed of 100◦ /s, and for
different Ambisonic panning setups. There is an obvious advantage of a reduced
fluctuation in coloration at both listening positions, centered and off-center, when
using the side-lobe-suppressing “max-r E ” weighting instead of the “basic” rectangular truncation of the Fourier series. At the off-center listening position, max-r E
weights achieve good results with regard to constant coloration for both 3rd and 7th
order arrangements with 8 and 16 loudspeakers that were investigated.
How well are diffuse signals preserved in playback? All the above experiments deal with how non-diffuse signals are presented. To complement what is shown in Fig. 1.21 of Chap. 1 with an explanation, the relation between the Ambisonic order and its ability to preserve diffuse fields is estimated here by the covariance between uncorrelated directions. Assume a max-r_E-weighted Nth-order Ambisonic panning function g(θ_s^T θ) that is normalized to g(1) = 1 and encodes two sounds s_{1,2} from two directions θ_1 and θ_2, with the sounds being uncorrelated and of unit variance, E{s_i s_j} = δ_{ij}. We can find that the Ambisonic representation mixes the sounds at their respective mapped directions and yields an increase of their correlation, x_1 = s_1 + g_12 s_2 and x_2 = s_2 + g_12 s_1, using g_12 = g(cos φ),
R{x_1 x_2} = E{x_1 x_2} / √(E{x_1²} E{x_2²})    (4.23)
           = E{(1 + g_12²) s_1 s_2 + g_12 (s_1² + s_2²)} / √(E{s_1² + 2 g_12 s_1 s_2 + g_12² s_2²} · E{s_2² + 2 g_12 s_1 s_2 + g_12² s_1²})
           = 2 g_12 / (1 + g_12²).
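This relation is easily confirmed by simulation; the following sketch (our own, not from the book) mixes two uncorrelated noise signals with a crosstalk gain g_12 and compares the measured normalized correlation with 2 g_12/(1 + g_12²):

```python
import numpy as np

rng = np.random.default_rng(1)
s1, s2 = rng.standard_normal((2, 200_000))   # uncorrelated, unit variance

for g12 in (0.2, 0.5, 0.8):
    x1 = s1 + g12 * s2                       # crosstalk mix as above
    x2 = s2 + g12 * s1
    r_sim = np.mean(x1 * x2) / np.sqrt(np.mean(x1**2) * np.mean(x2**2))
    r_theory = 2 * g12 / (1 + g12**2)
    assert abs(r_sim - r_theory) < 0.01
```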
This result was presented in Fig. 1.21 and was used to argue that the directional
separation of first-order Ambisonics by its high crosstalk term g12 might be too
weak. Higher-order Ambisonics decreases this directional crosstalk and therefore
improves the representation of diffuse sound fields.
4.7 Panning with Spherical Harmonics in 3D
In three space dimensions, the spherical coordinate system has a radius r and two angles: the azimuth ϕ indicating the polar angle of the orthogonal projection onto the xy plane, and the zenith angle ϑ indicating the angle to the z axis, according to the right-handed spherical coordinate systems in ISO 31-11, ISO 80000-2, [30, 31], Fig. 4.11.
By the generalized chain rule, Appendix A.3 re-writes the Laplacian to spherical coordinates in 3D with r signifying the radius, ϕ the azimuth angle, and the zenith angle ϑ re-expressed as ζ = z/r = cos ϑ, yielding the operator Δ = (2/r) ∂/∂r + ∂²/∂r² + 1/(r²(1−ζ²)) ∂²/∂ϕ² − (2ζ/r²) ∂/∂ζ + ((1−ζ²)/r²) ∂²/∂ζ². Any radius-dependent part is removed to define an eigenproblem yielding the basis for panning functions, taking only r² Δ_{ϕ,ζ,3D},

[ 1/(1−ζ²) ∂²/∂ϕ² − 2ζ ∂/∂ζ + (1−ζ²) ∂²/∂ζ² ] Y = −λ Y,    (4.24)

whose solution with λ = n(n + 1) defines the spherical harmonics

Y_n^m(θ) = Y_n^m(ϕ, ϑ) = Θ_n^m(ϑ) Φ_m(ϕ).    (4.25)
The pre-requisites are (i) periodicity in ϕ and (ii) that the function Y_n^m is finite on the sphere. In addition to the circular harmonics Φ_m expressing the dependency on the azimuth ϕ according to Eq. (4.10), the spherical harmonics contain the associated Legendre functions P_n^{|m|} and their normalization term

Θ_n^m(ϑ) = N_n^{|m|} P_n^{|m|}(cos ϑ)    (4.26)

to express the dependency on the zenith angle ϑ. The index n ≥ 0 expresses the order, and the directional resolution can be limited by requiring 0 ≤ n ≤ N. The index m is the degree, and for each n it is limited by −n ≤ m ≤ n.

Fig. 4.11 The spherical coordinate system
The spherical harmonics, Fig. 4.12, are orthonormal on the sphere −π ≤ ϕ ≤
π and 0 ≤ ϑ ≤ π , and for unbounded order N → ∞ they are complete; see also
Appendix A.3.7.
The spherical harmonics permit a series representation of square-integrable 3D directional functions by the coefficients γ_nm,

g(θ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} γ_nm Y_n^m(θ).    (4.27)

From a known function g(θ), the coefficients are obtained by the transformation integral over the unit sphere S², cf. Appendix Eq. (A.38),

γ_nm = ∫_{S²} g(θ) Y_n^m(θ) dθ.    (4.28)
Note that the above N3D normalization ∫_{S²} |Y_n^m(θ)|² dθ = 1 defines each spherical harmonic except for an arbitrary phase it might be multiplied with. Legendre functions for the zenith dependency might be defined differently in literature, and for azimuth, some implementations use sin(mϕ) instead of sin(|m|ϕ). In Ambisonics, real-valued functions and the SN3D normalization √(½ (n−|m|)!/(n+|m|)!) are preferred, as well as positive signs of the first-order dipole components in the directions of the respective coordinate axes x, y, z. This might require involving the Condon-Shortley phase (−1)^m to correct the signs of the Legendre functions, or −1 for m < 0 to correct the sign of the azimuthal sinusoids, depending on the implementation of the respective functions. It is often helpful to employ converters and directional checks to ensure compatibility!

Fig. 4.12 Spherical harmonics indexed by the Ambisonic channel number ACN = n² + n + m; rows show spherical harmonics for the orders 0 ≤ n ≤ 3 with the 2n + 1 harmonics of the degrees −n ≤ m ≤ n. What is plotted is a polar diagram with the radius R = 20 lg |Y_n^m| normalized to the upper 30 dB of each pattern, with positive (gray) and negative (black) color indicating the sign. The order n counts the circular zero crossings, and |m| counts those running through zenith and nadir
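The ACN indexing used in Fig. 4.12 (and in AmbiX-style channel ordering) is a simple arithmetic mapping; a small sketch:

```python
import math

def acn(n, m):
    """Ambisonic channel number for order n and degree m."""
    assert -n <= m <= n
    return n * n + n + m

def order_degree(index):
    """Inverse mapping: recover (n, m) from an ACN index."""
    n = math.isqrt(index)      # acn lies in [n^2, n^2 + 2n], so isqrt gives n
    return n, index - n * n - n

# first-order channels in ACN order: Y_1^-1, Y_1^0, Y_1^1
assert [acn(1, m) for m in (-1, 0, 1)] == [1, 2, 3]
assert all(order_degree(acn(n, m)) == (n, m)
           for n in range(6) for m in range(-n, n + 1))
```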
3D panning function. An infinitely narrow direction range around a desired direction, θ_s^T θ > cos ε → 1, is represented by the transformation integral over the Dirac delta δ(1 − θ_s^T θ), cf. Eq. (A.41), so that the coefficients of the panning function are

γ_nm = Y_n^m(θ_s).    (4.29)
As the infinitely many spherical harmonics are complete, the panning function is

g(θ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Y_n^m(θ_s) Y_n^m(θ) = δ(1 − θ_s^T θ),    (4.30)

and in practice, the finite-resolution Nth-order panning function with n ≤ N employs a weight a_n to reduce side lobes and optimize the spread,

g_N(θ) = Σ_{n=0}^{N} Σ_{m=−n}^{n} a_n Y_n^m(θ_s) Y_n^m(θ).    (4.31)
The max-r_E panning function uses the weights a_n = P_n(cos(137.9°/(N + 1.51))), as derived in Appendix Eq. (A.46). The spread is now adjustable by the order to ±137.9°/(N + 1.51). Figure 4.13 shows a comparison to the basic weighting a_n = 1. An alternative expression that uses Legendre polynomials P_n and only depends on the angle φ to the panning direction θ_s is obtained by replacing the sum over m by the spherical harmonics addition theorem Σ_{m=−n}^{n} Y_n^m(θ_s) Y_n^m(θ) = ((2n+1)/(4π)) P_n(cos φ),

g_N(φ) = Σ_{n=0}^{N} ((2n+1)/(4π)) a_n P_n(cos φ).    (4.32)

Comparison to first-order Ambisonics shows: now Y_n^m(θ_s) represents the recorded or encoded directions, and Y_n^m(θ) represents the decoded playback directions.
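Eq. (4.32) can be evaluated directly with Legendre polynomials; the sketch below (NumPy, with our own assumed naming) computes the 3D max-r_E weights and the axisymmetric panning function:

```python
import numpy as np
from numpy.polynomial import legendre

def P(n, x):
    """Legendre polynomial P_n(x)."""
    return legendre.legval(x, [0.0] * n + [1.0])

def g3d(phi, N, max_re=True):
    """Axisymmetric Nth-order panning function g_N(phi), Eq. (4.32)."""
    zeta_E = np.cos(np.deg2rad(137.9) / (N + 1.51))   # max-r_E argument
    g = np.zeros_like(phi, dtype=float)
    for n in range(N + 1):
        a_n = P(n, zeta_E) if max_re else 1.0
        g = g + (2 * n + 1) / (4 * np.pi) * a_n * P(n, np.cos(phi))
    return g

phi = np.linspace(0, np.pi, 1801)
assert np.argmax(g3d(phi, 3)) == 0                          # aims at phi = 0
assert g3d(phi, 3).min() > g3d(phi, 3, max_re=False).min()  # weaker side lobes
```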
Optimal sampling of the 3D panning function. In the theory of spherical polynomials in the variable ζ = θ_s^T θ, so-called t-designs describe point sets of given directions {θ_l} with l = 1, …, L and size L that allow perfect computation of the integral (constant part) over the polynomials P_n(ζ) of limited order n ≤ t by discrete summation,

∫_{−π}^{π} dϕ ∫_{−1}^{1} P_n(ζ) dζ = (4π/L) Σ_{l=1}^{L} P_n(θ_s^T θ_l),    (4.33)
Fig. 4.13 3D unweighted a_n = 1 basic and weighted max-r_E Ambisonic panning functions for the orders N = 1, 2, 5
relative to any axis θ_s the point set is projected onto. In 3D, the Legendre polynomials P_n(ζ) are orthogonal polynomials, therefore an Nth-order panning function composed thereof is a polynomial of Nth order. The loudness measure E is calculated by the integral over g_N², therefore over a polynomial of the order 2N. The integral to calculate r_E runs over g_N² ζ, therefore over a polynomial of the order
2N + 1. In playback, to get a perfectly panning-invariant loudness measure E of the
continuous panning function and also the perfectly oriented r E vector of constant
spread arccos r E , the parameter t must be t ≥ 2N + 1. In 3D there are only 5
geometrically regular layouts
• the tetrahedron, L = 4 corners, is a 2-design,
• the octahedron, L = 6 corners, is a 3-design,
• the hexahedron (cube), L = 8 corners, is a 3-design,
• the icosahedron, L = 12 corners, is a 5-design,
• the dodecahedron, L = 20 corners, is a 5-design.
For instance, for N = 1, the octahedron is a suitable spherical design, for N = 2, the
icosahedral or dodecahedral layouts are suitable.
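The design property of Eq. (4.33) can be verified numerically for the octahedron (a hypothetical NumPy sketch): the discrete sum reproduces the integral, which is 4π for n = 0 and zero otherwise, for all n ≤ 3 and any projection axis θ_s:

```python
import numpy as np
from numpy.polynomial import legendre

def P(n, x):
    """Legendre polynomial P_n(x)."""
    return legendre.legval(x, [0.0] * n + [1.0])

# octahedron corners: a spherical 3-design with L = 6
theta_l = np.array([[1, 0, 0], [-1, 0, 0],
                    [0, 1, 0], [0, -1, 0],
                    [0, 0, 1], [0, 0, -1]], dtype=float)
theta_s = np.array([0.3, -0.5, 0.8])
theta_s /= np.linalg.norm(theta_s)        # arbitrary unit projection axis

L = len(theta_l)
for n in range(4):                        # exact up to t = 3
    s = 4 * np.pi / L * np.sum(P(n, theta_l @ theta_s))
    exact = 4 * np.pi if n == 0 else 0.0  # integral of P_n over the sphere
    assert abs(s - exact) < 1e-9
```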
Exceeding the geometrically regular layouts, there are designs found by optimization to be regular under the mathematical rule to approximate ∫_{S²} Y_n^m(θ) dθ = √(4π) δ_n accurately by (4π/L) Σ_l Y_n^m(θ_l) for all n ≤ t and |m| ≤ n. A large collection can be found by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34], available on the following websites:

http://neilsloane.com/sphdesigns/dim3/
http://homepage.univie.ac.at/manuel.graef/quadrature.php
(Chebyshev-type Quadratures on S²), and
https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html.

Figure 4.14 gives some graphical examples.
Fig. 4.14 t-designs from Gräf’s website (Chebyshev-type quadrature)
4.8 Ambisonic Encoding and Optimal Decoding in 3D
To encode a signal s into Ambisonic signals χ_nm, we multiply the signal with the encoder representing the direction θ_s of the signal by the weights Y_n^m(θ_s),

χ_nm(t) = Y_n^m(θ_s) s(t),    (4.34)

or in vector notation

χ_N = y_N(θ_s) s,    (4.35)

using the column vector y_N = [Y_0^0(θ_s), Y_1^{−1}(θ_s), …, Y_N^N(θ_s)]^T of (N + 1)² components. The Ambisonic signals in χ_N are weighted by side-lobe suppressing weights a_N = [a_0, a_1, a_1, a_1, a_2, …, a_N]^T, expressed by the multiplication with a diagonal matrix diag{a_N}, and then decoded to the L loudspeaker signals x by a sampling decoder

D = (4π/L) [y_N(θ_1), …, y_N(θ_L)]^T = (4π/L) Y_N^T,    (4.36)

using

x = D diag{a_N} χ_N.    (4.37)

In total, the system for encoding Eq. (4.35) and decoding Eq. (4.36) can also be written to yield loudspeaker gains for one signal,

g = D diag{a_N} y_N(θ_s),    (4.38)

or in particular for the 3D sampling decoder g = (4π/L) Y_N^T diag{a_N} y_N(θ_s).
4.9 Ambisonic Decoding to Loudspeakers
Ambisonic decoding to loudspeakers has been dealt with by numerous researchers in the past, particularly because results are not very stable for first-order Ambisonics, and later because they strongly depend on how uniform the loudspeaker layout is for higher-order Ambisonics. Moreover, Solvang found that even the use of too many loudspeakers has a degrading effect [35].
For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited, and for higher-order Ambisonic decoding, one can, e.g., find works by Daniel with max-r_E [37] and pseudo-inverse decoding [10], and also by Poletti [14, 38, 39].
What turned out to be the most practical solution is the All-Round Ambisonic Decoding approach (AllRAD), due to its feature of allowing imaginary loudspeaker insertion and downmix as described in the sections above, cf. [40]. It moreover does not restrict the Ambisonic order, a restriction that for other decoders often yields poor controllability of panning-dependent fluctuations in loudness and of directional mapping errors.
The playable set of directions θ_l or ϕ_l is usually finite and discrete, and it is represented by the surrounding loudspeakers' directions. The directional distribution of the surrounding loudspeakers is typically not a t-design (with t ≥ 2N + 1), in general, and in 2D sometimes not even a regular polygon with L ≥ 2N + 2 loudspeakers, in particular. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.
4.9.1 Sampling Ambisonic Decoder (SAD)
The sampling decoder as introduced above is the simplest decoding method.
For two dimensions (D = 2) and three (D = 3), it uses the matrix Y_N = [y_N(θ_1), …, y_N(θ_L)] containing the respective circular or spherical harmonics y_N(θ) sampled at the loudspeaker directions {θ_l},

D = (S_{D−1}/L) Y_N^T,    (4.39)

with the circumference of the unit circle denoted as S_1 = 2π or the surface of the unit sphere written as S_2 = 4π. The factor S_{D−1}/L expresses that each loudspeaker synthesizes a fraction of the E measure on the circle or sphere of the surrounding directions. However, the sampling decoder would neither yield perfectly constant loudness and width measures, E, r_E, nor a correct aiming of the localization measure r_E if the loudspeaker layout weren't optimal. For instance, concerning loudness, for panning towards directional regions of poor loudspeaker coverage, sampling misses out the main lobe of the panning function, yielding a noticeably reduced loudness.
4.9.2 Mode Matching Decoder (MAD)
The mode-matching method is used in [10, 39] and yields a fundamentally different
decoder design. Its concept is to re-encode the gain vector g of the loudspeakers for
any panning direction θ s by the encoding matrix Y N = [ yN (θ1 ), . . . , yN (θL )] for all
loudspeaker directions {θl }. Ideally, the re-encoded result should match the encoding
of the panning direction with sidelobes suppressed
Y_N g = diag{a_N} y_N(θ_s).

Using the definition g = D diag{a_N} y_N(θ_s) of the panning gains, we obtain

Y_N D diag{a_N} y_N(θ_s) = diag{a_N} y_N(θ_s),
⇒ D = Y_N^T (Y_N Y_N^T)^{−1},    (4.40)
so that the decoder D is required to be a right-inverse of the matrix Y_N, i.e. Y_N D =
Y_N Y_N^T (Y_N Y_N^T)^{−1} = I, see Eq. (A.63) in Appendix A.4. For the inverse of Y_N Y_N^T to
exist, it is necessary to have at least as many loudspeakers as harmonics, i.e. L ≥ (N +
1)² for D = 3 or L ≥ 2N + 1 for D = 2. However, this is not yet a sufficient criterion:
in directions poorly covered with loudspeakers, the inversion boosts the loudness,
and (Y_N Y_N^T)^{−1} is often numerically ill-conditioned unless the loudspeaker layout
is at least close to uniform. Mode-matching decoding is ill-conditioned on
hemispherical or semicircular loudspeaker layouts. The solution is equivalently
described by the more general pseudo-inverse Y_N^†, which is the right-inverse
of fat matrices.
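The right-inverse construction of Eq. (4.40) and its loudness boost on a layout with a gap can be sketched in the same hypothetical 2D setting as before (circular harmonics instead of spherical ones; not the book's code):

```python
# Hypothetical 2D sketch of mode-matching decoding, Eq. (4.40).
import numpy as np

def circ_harmonics(phi, N):
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2 * np.pi))]
    for n in range(1, N + 1):
        rows.append(np.cos(n * phi) / np.sqrt(np.pi))
        rows.append(np.sin(n * phi) / np.sqrt(np.pi))
    return np.array(rows)

def mad(phi_l, N):
    Y = circ_harmonics(phi_l, N)
    return Y.T @ np.linalg.inv(Y @ Y.T)      # right-inverse: Y_N D = I

N = 3
ring = 2 * np.pi * np.arange(8) / 8
D = mad(ring, N)
assert np.allclose(circ_harmonics(ring, N) @ D, np.eye(2 * N + 1))

# With the -90 deg loudspeaker removed (L = 7 = 2N+1), the inverse still
# exists, but panning into the gap now boosts the loudness instead of
# losing it, as in the MAD curve of Fig. 4.15.
gap = np.delete(ring, 6)                     # drop the 270-deg speaker
D_gap = mad(gap, N)
E = lambda Dm, ps: float(np.sum((Dm @ circ_harmonics(ps, N)[:, 0]) ** 2))
print(E(D_gap, -np.pi / 2) / E(D_gap, 0.0))  # ratio > 1: louder in the gap
```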
4.9.3 Energy Preservation on Optimal Layouts
For instance, for an order of N = 2, 2D Ambisonics should work optimally with a
ring of 45◦ spaced loudspeakers on the horizon, a circular (2N + 1)-design, or for
3D, a spherical (2N + 1)-design. On a t-design selected by t ≥ 2N, the loudness
measure E is panning-invariant, in general,
E = ‖g‖² = y_N^T(θ_s) diag{a_N} D^T D diag{a_N} y_N(θ_s) = ‖diag{a_N} y_N(θ_s)‖² = const.,

using D^T D = I.
This is because a t ≥ 2N-design discretization preserves orthonormality:

∫ y_N(θ) y_N^T(θ) dθ = (S^{D−1}/L) Σ_{l=1}^{L} y_N(θ_l) y_N^T(θ_l) = (S^{D−1}/L) Y_N Y_N^T = I,    (4.41)
Fig. 4.15 Analysis of loudness (E/E_ideal in dB), localization error (∠r_E − φ_s in °), and width (arccos ‖r_E‖ in °) for 3rd-order sampling (SAD) and mode-matching (MAD) Ambisonic decoding for all panning angles on a sub-optimal 45°-spaced loudspeaker ring with gap at −90°, compared to VBIP
4 Ambisonic Amplitude Panning and Decoding in Higher Orders
which implies for the sampling decoder D^T D = (S^{D−1}/L) Y_N Y_N^T = I, and we notice the
panning-invariant norm of g(θ) within its coefficients γ_N = diag{a_N} y_N(θ_s) by the
Parseval theorem, ∫ g²(θ) dθ = ‖γ_N‖². The panning-invariant E measure also holds
for the mode-matching decoder on a t ≥ 2N-design, as it then becomes proportional
to the sampling decoder: D = Y_N^T (Y_N Y_N^T)^{−1} = Y_N^T ((L/S^{D−1}) I)^{−1} = (S^{D−1}/L) Y_N^T. Under
these ideal conditions, both decoders are energy-preserving.
4.9.4 Loudness Deficiencies on Sub-optimal Layouts
For 2D layouts, Fig. 4.15 shows what happens if a decoder is calculated for a t ≥
2N + 1-design with one loudspeaker removed: while, for panning across the gap, the
sampling Ambisonic decoder (SAD) yields a quieter signal with moderate localization
errors and width fluctuations, the mode-matching decoder (MAD) yields a strong
loudness increase and severe jumps in localization and width. MAD is therefore not
very practical with sub-optimal layouts, SAD only slightly more so.
4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)
To establish panning-invariant loudness for decoding to non-uniform surround
loudspeaker layouts one can ensure a constant loudness measure E by enforcing
DT D = I, which is otherwise only achieved on t ≥ 2N-designs. We may search for
a decoding matrix D whose entries are closest to the sampling decoder under the
constraint to be column-orthogonal:
‖D − √(S^{D−1}/L) Y_N^T‖²_Fro → min    (4.42)

subject to D^T D = I.
The singular value decomposition of

Y_N^T = U [diag{s}, 0]^T V^T    (4.43)

can be used to create

D = U [I, 0]^T V^T    (4.44)

by replacing the singular values s with ones. Such a decoder is column-orthogonal,
as the singular-value decomposition delivers U^T U = I and V V^T = I, and as a consequence¹ D^T D = I. The energy-preserving decoder in this basic version requires
L ≥ 2N + 1 loudspeakers in 2D or L ≥ (N + 1)² in 3D to work.
Note that if the loudspeaker setup directions are already a t ≥ 2N design, the
sampling, mode-matching, and energy-preserving decoders are equivalent.
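Continuing the hypothetical 2D setting from above (circular harmonics; not the book's code), the SVD construction of Eqs. (4.43)–(4.44) takes only a few lines and restores panning-invariant loudness on the gapped ring:

```python
# Hypothetical 2D sketch of the energy-preserving decoder, Eqs. (4.42)-(4.44).
import numpy as np

def circ_harmonics(phi, N):
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2 * np.pi))]
    for n in range(1, N + 1):
        rows.append(np.cos(n * phi) / np.sqrt(np.pi))
        rows.append(np.sin(n * phi) / np.sqrt(np.pi))
    return np.array(rows)

N = 3
ring = 2 * np.pi * np.arange(8) / 8
gap = np.delete(ring, 6)                     # 45-deg ring, -90 deg missing
Y = circ_harmonics(gap, N)                   # (2N+1) x L, with L = 7 >= 2N+1

# SVD of Y_N^T = U diag(s) V^T; replacing the singular values by ones
# yields the closest column-orthogonal decoder, Eq. (4.44).
U, s, Vt = np.linalg.svd(Y.T, full_matrices=False)
D = U @ Vt
assert np.allclose(D.T @ D, np.eye(2 * N + 1))   # E is panning-invariant

E = lambda ps: float(np.sum((D @ circ_harmonics(ps, N)[:, 0]) ** 2))
print(E(0.0), E(-np.pi / 2))                     # equal loudness everywhere
```

Because D^T D = I, E = ‖D y_N(θ_s)‖² = ‖y_N(θ_s)‖² is the same for every panning direction, including across the gap.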
4.9.6 All-Round Ambisonic Decoding (AllRAD)
In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result
in terms of loudness, width, and localization was achieved by MDAP that distributes
a signal to an arrangement of several superimposed VBAP virtual sources. Hereby
Hereby E = const., the direction of r_E approximately follows θ_s, and ‖r_E‖ ≈ const. This works for nearly any loudspeaker layout.
While, to calculate loudspeaker gains, MDAP superimposes an arrangement of
discrete virtual sources within a range of ±α around the panning direction θ s , one
could also think of superimposing a quasi-continuous distribution of virtual sources
that are weighted by a continuous panning function g(θ).
The ideal continuous panning function g(θ) of axisymmetric directional spread
around the panning direction θ s is described by g(θ) = yTN (θ) diag{aN } yN (θ s ), the
Ambisonic panning function. This rotation-invariant continuous function is optimal
in terms of loudness, width, and localization measures, which are all evaluated by continuous integrals: E = ∫ g²(θ) dθ = const. expresses panning-invariant loudness, and
r_E = (1/E) ∫ g²(θ) θ dθ = ‖r_E‖ θ_s indicates perfect alignment with the panning
direction and a panning-invariant width ‖r_E‖ = const. However, the optimal values of
¹ In detail, this follows from D^T D = V [I, 0] U^T U [I, 0]^T V^T = V [I, 0][I, 0]^T V^T = V V^T = I.
these integrals are only preserved by discretization with optimal t ≥ 2N + 1-design
loudspeaker layouts.
All-round Ambisonic decoding (AllRAD) is preceded by the work of Batke and
Keiler [16]. They describe Ambisonic panning g_AllRAD(θ) = D y_N(θ) by a decoder
D whose result best matches VBAP g_VBAP(θ). Without max-r_E weights yet,
we use this here to define AllRAD by a minimum-mean-square-error problem over all panning directions θ:
min_D ∫_{S²} ‖g_VBAP(θ) − D y_N(θ)‖² dθ.    (4.45)
Equivalently, as described by Zotter and Frank [40] who coined the name, we may
define AllRAD as VBAP synthesis on the physical loudspeakers when using as
multiple-virtual-source inputs the Ambisonic panning function gAMBI (θ ) = yTN (θ )
diag{aN } yN (θ s ) sampled at an optimal layout of virtual loudspeakers. Here, we write
the synthesis as the integral over infinitely many virtual loudspeakers θ ,
g = ∫ g_VBAP(θ) g_AMBI(θ) dθ = ∫ g_VBAP(θ) y_N^T(θ) diag{a_N} y_N(θ_s) dθ
  = [∫ g_VBAP(θ) y_N^T(θ) dθ] diag{a_N} y_N(θ_s).
We can obviously pull the term diag{aN } yN (θ s ) out of the integral. The remaining
integral defines the AllRAD matrix D. We may interpret it as a transformation of the
VBAP loudspeaker gain functions g VBAP (θ ) into spherical harmonic coefficients. In
the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual
loudspeakers
D = ∫ g_VBAP(θ) y_N^T(θ) dθ = (S^{D−1}/L̂) Σ_{l=1}^{L̂} g_VBAP(θ̂_l) y_N^T(θ̂_l) = (S^{D−1}/L̂) Ĝ Ŷ_N^T,    (4.46)
using the directions {θ̂_l} of a t-design. As VBAP's gain functions are not smooth
(their derivatives are discontinuous), they are order-unlimited, and a t-design of sufficiently high t should be used. In 3D practice, the 5200-point Chebyshev-type
design from [33] is dense enough. Note that the VBAP part permits improvements
by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or
hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.
Note that the decoder needs to be scaled properly. For instance, the norm of the
omnidirectional component (first column) could be equalized to one, as it would
typically be with a sampling decoder; there are alternative strategies to circumvent
the scaling problem [41].
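A 2D analogue of Eq. (4.46) can be sketched compactly: pairwise 2D VBAP plays the role of the triplet-wise VBAP, and a dense virtual ring replaces the t-design of virtual loudspeakers. This is a hypothetical toy construction, not the book's 3D implementation:

```python
# Hypothetical 2D analogue of AllRAD, Eq. (4.46): pairwise VBAP on a ring.
import numpy as np

def circ_harmonics(phi, N):
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full_like(phi, 1.0 / np.sqrt(2 * np.pi))]
    for n in range(1, N + 1):
        rows.append(np.cos(n * phi) / np.sqrt(np.pi))
        rows.append(np.sin(n * phi) / np.sqrt(np.pi))
    return np.array(rows)

def vbap_gains(speakers, phi):
    """2D pairwise VBAP gains of one virtual source at azimuth phi."""
    L = len(speakers)
    order = np.argsort(speakers)
    g = np.zeros(L)
    for i in range(L):                  # find the enclosing loudspeaker pair
        a, b = order[i], order[(i + 1) % L]
        base = np.array([[np.cos(speakers[a]), np.cos(speakers[b])],
                         [np.sin(speakers[a]), np.sin(speakers[b])]])
        gi = np.linalg.solve(base, [np.cos(phi), np.sin(phi)])
        if np.all(gi >= -1e-9):         # source lies inside this pair
            g[[a, b]] = gi / np.linalg.norm(gi)
            return g
    return g

N, L = 3, 8
ring = 2 * np.pi * np.arange(L) / L
Lhat = 720                              # dense ring of virtual loudspeakers
phis = 2 * np.pi * np.arange(Lhat) / Lhat
G = np.array([vbap_gains(ring, p) for p in phis])        # Lhat x L gains
Yhat = circ_harmonics(phis, N)                            # (2N+1) x Lhat
D = (2 * np.pi / Lhat) * G.T @ Yhat.T                     # Eq. (4.46), 2D
print(D.shape)                                            # L x (2N+1)
```

Each column of D is the transform of the VBAP gain functions into (here circular) harmonic coefficients; panning with g = D diag{a_N} y_N(θ_s) then concentrates the gains on the loudspeakers nearest to the panning direction.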
Fig. 4.16 Analysis of loudness (E/E_ideal in dB), localization error (∠r_E − φ_s in °), and width (arccos ‖r_E‖ in °) for energy-preserving Ambisonic decoding (EPAD) and all-round Ambisonic decoding (AllRAD) for all panning angles on a sub-optimal 45°-spaced loudspeaker ring with gap at −90°, compared to VBIP
4.9.7 EPAD and AllRAD on Sub-optimal Layouts
Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equiangular arrangement rendered suboptimal by the missing loudspeaker at −90°. Both
decoders cope well: EPAD stabilizes the loudness perfectly, while AllRAD keeps
the directional and spread mapping errors small. We notice that for EPAD, with the
constraint L ≥ 2N + 1 just fulfilled for N = 3 and L = 7 in the simulation, it would
not be possible to remove any further loudspeakers without degradation.
4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts
In typical loudspeaker playback situations for large audiences, a solid floor and no
loudspeakers below ear level are considered practical for several reasons. However,
this does not permit decoding by sampling with optimal t-design layouts covering
all directions. As shown above, EPAD and AllRAD do not require such arrays. And
yet, they still require some care when used with hemispherical loudspeaker layouts,
see [15, 40] for further reading.
EPAD with hemispherical loudspeaker layouts. Even for a hemispherical layout, the
energy-preserving decoding method requires L ≥ (N + 1)² loudspeakers to achieve a
perfectly panning-invariant loudness. However, this is counter-intuitive: why should
one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn't the number be half as many?
Table 4.1 Integration ranges 0 ≤ ϑ ≤ ϑ_max to obtain (N + 1)(N + 2)/2 Slepian functions with
minimum loudness fluctuation max E / min E for panning on the hemisphere

N            1       2       3       4       5       6       7       8       9
ϑ_max        115°    114°    113°    111°    108°    106°    104°    102°    101°
max E/min E  0.5 dB  0.4 dB  0.4 dB  0.3 dB  0.3 dB  0.3 dB  0.2 dB  0.3 dB  0.2 dB
We can show that while the spherical harmonics are orthonormal on the sphere
S², i.e. ∫_{S²} y_N(θ) y_N^T(θ) dθ = I, they are not orthogonal on the hemisphere S ⊂ S²,
here extended to ϑ ≤ ϑ_max:

∫_S y_N(θ) y_N^T(θ) dθ = G.    (4.47)

Here, G is called the Gram matrix, and it is evaluated by (4π/L̂) Σ_{l: θ_{z,l} ≥ 0} y_N(θ_l) y_N^T(θ_l) using a
high-enough t-design. By singular-value decomposition of the positive semi-definite
matrix G = Q diag{s} Q^T, with Q^T Q = Q Q^T = I, we diagonalize G and find new
basis functions ỹ_N(θ), the so-called Slepian functions [42], that are orthogonal on S:

Q^T G Q = diag{s} = ∫_S Q^T y_N(θ) y_N^T(θ) Q dθ,    ⇒ ỹ_N(θ) = Q^T y_N(θ).
Typically, the singular values in s are sorted descendingly s1 ≥ s2 ≥ · · · ≥ s(N+1)2
so that it is possible to cut out basis functions of significantly large contribution to
the upper hemisphere S by
ỹN (θ ) = [I, 0] Q T yN (θ ).
(4.48)
Typically, the numerical integral is extended to slightly below the horizon, see
Table 4.1, so that truncation to the (N + 1)(N + 2)/2 most significant basis functions, see Fig. 4.17, produces a minimum fluctuation in the loudness measure
Ẽ = ỹN (θ )2 for panning on the hemisphere.
With ỹ_N(θ), EPAD is calculated in the same way as for the ordinary harmonics,

Ỹ_N^T = Ũ [diag{s}, 0]^T Ṽ^T,    D̃ = Ũ [I, 0]^T Ṽ^T,    (4.49)
with the main difference that the lower limit for the number of loudspeakers decreases
to L ≥ (N + 1)(N + 2)/2. Interfaced to the spherical harmonics by [I, 0] Q T , the
hemispherical energy-preserving decoder becomes
D = D̃ [I, 0] Q T .
(4.50)
Fig. 4.17 Slepian basis functions for the upper hemisphere, composed of the spherical harmonics
up to 3rd order, using 0° ≤ ϑ ≤ 113° as integration interval and (N + 1)(N + 2)/2 functions. To
get these nice shapes, the Slepian functions were found separately for every degree m, where they
were moreover de-mixed by QR decomposition
AllRAD with hemispherical loudspeaker layouts. Because of the vector-base
amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout does not
contain any loudspeaker direction vector pointing into the lower half space, so
one could simply omit the information of the lower half space. However, the Ambisonic
panning function implies a directional spread, so that panning exactly to the horizon
also produces content below it, whose omission causes (i) a loss in loudness and (ii) a
slight elevation of the perceived direction, cf. Fig. 4.18.
As discussed in Sect. 3.3 on triangulation, the insertion of imaginary
loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts,
it is not necessary to downmix the signal of the imaginary loudspeaker at the nadir to
stabilize both loudness and localization for panning to the horizon.
Signal contributions below but close to the horizon largely map to the horizontal loudspeakers, and it is therefore safe to dispose of the signal that would feed
the imaginary loudspeaker at the nadir without loss of loudness. Moreover, this contribution from below also reinforces signals on the horizontal loudspeakers, so that
localization is pulled back down. Both effects can be observed in Fig. 4.18, which shows the
loudness measure E as well as mislocalization and width by the measure r_E, using
max-r_E-weighted 5th-order AllRAD along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). It consists of 25 loudspeakers
set up in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation.
Rings two and four start at 0 degrees azimuth, the others are rotated half-way.
Performance comparison on hemispherical layouts. Figure 4.18 shows a comparison of AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout.
Fig. 4.18 Perceptual measures for 5th-order max-r_E-weighted AllRAD on the IEM mAmbA layout [43]
with (black) and without (black dotted) insertion of the bottom imaginary loudspeaker whose signal
is disposed of, and max-r_E-weighted EPAD (gray), for panning on a vertical circle: E in dB (top),
orientation error of r_E in degrees (middle), and width expressed by arccos ‖r_E‖ in degrees (bottom).
The thin dashed line shows AllRAD without imaginary loudspeakers
While (top in Fig. 4.18) AllRAD produces a loudness fluctuation roughly spanning
1 dB for panning on the hemisphere, EPAD only exhibits 0.3 dB, as specified in
Table 4.1. While in monophonic playback of noise, loudness differences of less than
0.5 dB can be heard, it is safe to assume that a weak directional loudness fluctuation
of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should
be no problem with either EPAD or AllRAD.
Concerning the directional mapping, EPAD produces a more strongly pronounced
ripple, with r_E indicating that sounds on the horizon ϑ_s = ±90° are pulled upwards
towards 0° more with EPAD (7°) than with AllRAD (3°). In terms of width, both
EPAD and AllRAD exhibit the ≈20° average associated with max-r_E weighting.
However, EPAD also produces a greater fluctuation, and it widens up to about 30°
for panning to the horizon ϑ_s = ±90°.
With the 9 loudspeakers of the ITU [44] 4 + 5 + 0 layout (horizontal ring:
φ = 0, ±30°, ±120°; upper ring at 40° elevation with φ = ±30°, ±120°), it is no longer
possible to use EPAD with 5th order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to N = 2, and to
lose level towards below-horizon directions, we can use the reduced set of 6 Slepian
functions; alternatively, all 9 spherical harmonics of N = 2 are also conceivable.
For AllRAD, imaginary loudspeakers are inserted at the sides at azimuth/elevation
±75°/27°, up at 0°/78°, back at 180°/35°, and below at 0°/−90°. It is reasonable to
downmix the imaginary loudspeakers for up, sides, and back with a factor of one and
re-normalize the VBAP gain matrix, while disposing of the signal of the imaginary
loudspeaker below. AllRAD permits the use of order N = 5, which resolves the
frontal loudspeaker triplet much better for horizontal panning.
Figure 4.19 shows the result of max-r E -weighted 2nd-order EPAD and 5th-order
AllRAD for the 4 + 5 + 0 layout using a vertical panning curve. While the perfectly
constant loudness measure of EPAD might be favored over the almost +3 dB loudness
Fig. 4.19 Perceptual measures for Ambisonic panning on the ITU [44] 4 + 5 + 0 layout with
insertion and downmix of imaginary loudspeakers at the sides, back, and top, and insertion and
disposal of an imaginary loudspeaker below for AllRAD. Measures are evaluated for max-r_E-weighted 5th-order AllRAD (black) and 2nd-order EPAD (gray), for panning on a vertical circle:
E in dB (top), orientation error of r_E in degrees (middle), and width expressed by arccos ‖r_E‖ in
degrees (bottom)
increase of front and back for AllRAD, AllRAD's lower directional error, narrower
width mapping, greater flexibility, and simplicity have often proven to be clearly
superior in practice.
4.10 Practical Studio/Sound Reinforcement Application Examples
This section analyzes the application of 3D Ambisonic amplitude panning consisting
of encoding and AllRAD to studio (with typical setups of 2 m radius) and sound reinforcement applications (for an audience of, e.g., 250 people). Application scenarios
are sketched in [43], and various other examples are given below. Requirements of a
constant loudness and width are analyzed below, and as sound reinforcement requires
a particularly large sweet area, the r E vector model for off-center listening positions
from Sect. 2.2.9 is used to depict the sweet area size.
The analysis of decoders above described loudness measures for panning on a
circle. To observe them with panning across all directions in Figs. 4.20 and 4.22,
world-map-like mappings using a gray-scale representation of the loudness and width
measures are more reasonable. For several loudspeaker layouts, the maps' axes are azimuth
horizontally and zenith angle vertically, and the gray-scale map displays the loudness measure E in dB (left column) and the width measure arccos ‖r_E‖ in degrees (right
column). As 5th-order max-r_E-weighted AllRAD typically produces minor directional mapping errors, they aren't explicitly shown in Figs. 4.20 and 4.22. However,
the mappings of the sweet area size of plausible localization in Figs. 4.21 and 4.23
illustrate the usefulness of the systems for listening areas hosting the number of
listeners targeted for either the studio or the sound reinforcement application.
Figure 4.20 illustrates AllRAD's tendency to attenuate signals panned to too closely
spaced loudspeaker ensembles, as in the front section of the ITU [44] 4 + 5 + 0. By
Fig. 4.20 Comparison of 5th-order max-r_E-weighted AllRAD for panning across all directions on
hemispherical loudspeaker layouts in studios: (a) ITU [44] 4+5+0 studio (1 listener) and (b) IEM
Production Studio (4 listeners). The left column shows the loudness measure E in dB and the right
column the width measure arccos ‖r_E‖ in degrees; the loudspeaker positions are marked with a
white + sign
Fig. 4.21 Comparison of the calculated sweet area size for 5th-order max-r_E-weighted AllRAD
for panning across all directions on hemispherical loudspeaker layouts in studios: (a) ITU [44]
4+5+0 (1 listener) and (b) IEM Production Studio (4 listeners). As a plausibility definition, the
directional mapping errors depending on the listening position should stay within angular bounds
(e.g. 10°)
contrast, for instance the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the
horizon, and signals panned to the largely spaced below-horizon triangles tend to get
louder. Moreover, it is easier for loudspeaker systems with many channels, such as the IEM
CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, to yield smooth loudness and
width mappings. Still, also with only a few loudspeakers, a slight direction adjustment
in the layout can fix some of the behavior, as with the IEM Production Studio, whose
±45° loudspeakers in the elevated layer are superior to a ±30° spacing.
Fig. 4.22 Comparison of 5th-order max-r_E-weighted AllRAD for panning across all directions on
various hemispherical loudspeaker layouts for sound reinforcement: (a) IEM mAmbA (40 listeners),
(b) IEM CUBE (80 listeners), (c) Lobby demo room at AES 144 Milan (200 listeners), and (d) KUG
Ligeti Hall (250 listeners). The left column shows the loudness measure E in dB and the right
column the width measure arccos ‖r_E‖ in degrees; the loudspeaker positions are marked with a
white + sign
A hint for designing good decoders is sometimes idealization: often it is better
to disregard the true loudspeaker setup locations and feed the decoder design with
idealized positions instead. Hereby one can trade slight directional distortions for
a more uniform loudness distribution. For instance at the IEM CUBE, loudspeaker
Fig. 4.23 Comparison of the calculated sweet area size for 5th-order max-r_E-weighted AllRAD for
panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement:
(a) IEM mAmbA (40 listeners), (b) IEM CUBE (80 listeners), (c) Lobby demo room at AES 144
Milan (200 listeners), and (d) KUG Ligeti Hall (250 listeners). As a plausibility definition, the
directional mapping errors depending on the listening position should stay within angular bounds
(e.g. 10°)
locations of the horizontal ring could be idealized to 30° to get a smoother loudness
mapping than the one shown in Fig. 4.22.
4.11 Ambisonic Decoding to Headphones
Typically, Ambisonic decoding to headphones can be done similarly as for loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback
directions. Various databases of such HRIRs can be found, e.g., on the website SOFAconventions.² This headphone decoding approach classically uses a small set of
so-called virtual loudspeakers, as found in many places in the technical literature,
e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is
relevant in many other important works [18, 45, 46] and the SADIE project,³ and it is
employed in Sect. 1.4.2 on first-order Ambisonics.
Coarse. However, as outlined in some research papers [9, 18, 46], these approaches
have in common that low-order Ambisonic synthesis is problematic. On the one hand,
inserting a dense grid of virtual-loudspeaker HRIRs makes the Ambisonic
smoothing attenuate high frequencies at frontal and dorsal directions. On the other hand, what had
been the solution for a long time, a coarse grid of virtual-loudspeaker HRIRs does
not attenuate high frequencies, but its spatial quality then strongly depends
on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed
to remove the time delays of the HRIRs before Ambisonic decomposition and
to re-insert the otherwise missing interaural time delay afterwards, for any sound
panned in Ambisonics, which unfortunately yields an object-based panning system
rather than a scene-based Ambisonic system.
Dense. Some dense-grid approaches propose to keep the HRIR time delays, or if
formulated in the frequency domain, the phases of the head-related transfer functions (HRTFs), and hereby stay in a scene-based Ambisonic format, while correcting spectral
deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally,
the most recent solutions proposed by Jin, Sun, and Epain [17, 48] or Zaunschirm,
Schörkhuber, and Höldrich [20, 21] modify the HRIR time delays/HRTF phases, but
only above, e.g., 3 kHz, without any object-based re-insertion afterwards. The omission of high-frequency interaural time-delay/phase information is a reasonable trade-off in favor of a more important accuracy in spectral magnitude.
What does directional HRIR smoothing do to high frequencies? The geometrical
theory of diffraction [49] suggests that HRIRs must always contain at least the delay
to the ear of either the shortest direct path or the shortest indirect path via the surface
of the head. For a spherical head model with the radius R = 0.0875 m and speed
of sound c = 343 m/s, the Woodworth–Schlosberg formula [50] formalizes this
2 https://www.sofaconventions.org.
3 https://www.york.ac.uk/sadie-project/database.html.
Fig. 4.24 Rigid-sphere model of delay to the ear depending on angle φ (here: head rotation)

Fig. 4.25 Time delay to the ear depending on the azimuth of a horizontal sound: (a) HRIR delay
model and (b) 360°-measured KU100 dummy-head HRIR set from TH Köln (color displays dB
levels)
consideration, see Fig. 4.24. The left ear receives a distant horizontal sound from
the azimuth interval 0 ≤ φ ≤ π/2 as direct sound, anticipated by τ = −(R/c) sin φ, or for
−π/2 < φ ≤ 0 as an indirect sound delayed by τ = −(R/c) φ,

τ(φ) = −(R/c) min{sin φ, φ},    (4.51)
as plotted in Fig. 4.25a, and recognizable from dummy-head measurements4 in
Fig. 4.25b.
If the HRIR is smoothed across an angular range, the time-delay curve gets spread
across time as well, see Fig. 4.26. In this way, depending on whether the smoothing
uses a continuous or discrete set of directions, one either obtains something like a
comb filter or a sinc-shaped frequency response. This smoothing is least disturbing around the direct-ear side as shown left in Fig. 4.26, and, as the indirect ear
also encounters high-frequency shadowing effects, it is most disturbing mainly for
frontal and rear sounds at 0◦ or 180◦ , as shown right in Fig. 4.26. The corresponding frequency responses are roughly exemplified with what third-order Ambisonics equivalent smoothing would do to either 45◦ -spaced HRIRs in Fig. 4.27a or
15◦ -spaced ones in Fig. 4.27b.
⁴ Data HRIR_CIRC360.sofa from http://sofacoustics.org/data/database/thk.
Fig. 4.26 Directionally smoothed playback to multiple HRIRs within a window, e.g. 30◦ , causes
different impulse response shapes at 90◦ and 0◦
Fig. 4.27 Differences of directionally smoothed HRTF frequency responses for a horizontal sound
from either 0° or 90°, smoothed within a ±22.5° window, which roughly corresponds to the
Ambisonics order N = 3; (a) sparse HRIR set and (b) dense HRIR set use a grid of 45°- or 7.5°-spaced
HRIRs, respectively. The dashed line shows the theoretical frequency limit 1.87 kHz for N = 3
To get an upper frequency limit, it is insightful to work in the frequency domain,
where the HRIR is denoted head-related transfer function (HRTF). A simplified
linearized-phase version around φ = 0 uses τ ≈ −(R/c) φ, and the result in the Fourier
transform with ω = 2π f is

H ≈ e^{−iωτ(φ)} = e^{i (ω/c) R φ}.    (4.52)

To represent it by a circular or spherical harmonics transformation limited to the order
N, a maximum phase change represented by the harmonic e^{iNφ} implies that we can
only resolve the phase up to (ω/c) R ≤ N, hence the range of accurate operation is limited
in frequency,

f_N ≤ cN/(2πR) = N · 624 Hz.    (4.53)
(4.53)
As the high-frequency HRTF phase evolves more rapidly over the angle than what the
finite order can represent, this typically yields attenuation of the high frequencies
when obtaining circular/spherical harmonic coefficients by the transformation integral.
Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether the directional smoothing is done by Ambisonics,
VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above
the frequency limit and re-insert it, but is re-insertion necessary?
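The delay model and the resulting per-order frequency limit are easy to evaluate; this short sketch (not from the book) encodes Eqs. (4.51) and (4.53) directly:

```python
# Sketch of Eqs. (4.51) and (4.53): spherical-head delay and order limit.
import numpy as np

R, c = 0.0875, 343.0                   # head radius in m, speed of sound in m/s

def tau(phi):
    """Delay to the left ear for horizontal azimuth phi in rad, Eq. (4.51)."""
    return -R / c * np.minimum(np.sin(phi), phi)

def f_limit(N):
    """Frequency up to which order N resolves the HRTF phase, Eq. (4.53)."""
    return c * N / (2 * np.pi * R)

print(tau(np.pi / 2) * 1000)           # direct sound arrives R/c early (in ms)
print(f_limit(3))                      # about 1.87 kHz for N = 3
```

With these numbers, c/(2πR) ≈ 624 Hz per order, so even N = 3 only resolves the HRTF phase accurately below roughly 1.9 kHz, which motivates the high-frequency treatments of the following sections.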
4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)
As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21]
tested above which frequency the removal of the HRTF linear phase trend remains
inaudible in direct HRTF-based rendering without panning or smoothing. In fact,
most of their listeners could not detect the absence of the linear phase trend
when it was removed above 3 kHz, for various sound examples (drums, speech, pink noise)
rendered at the directions 10°, −45°, 80°, −130°. They had their subjects compare the
result to a reference with unaltered HRTFs, and the result is analyzed in Fig. 4.28.
By this finding, it is possible to split up each of the 2 × 1 HRIRs h(t, θ) into
an unaltered low-pass band and a time-aligned high-pass band to unify the high-frequency HRIR delay:

ĥ(t, θ) = h_{≤3 kHz}(t, θ) + [ h_{left,>3 kHz}[t − τ(arcsin θ_y), θ] ; h_{right,>3 kHz}[t + τ(arcsin θ_y), θ] ].    (4.54)

The time-delay model τ(φ) uses the angle to the left/right ear on the positive/negative
y axis, so arccos(±θ_y), but shifted by 90°, hence φ = ± arcsin θ_y.
This removal allows using all available HRIRs of dense measurement sets for binaural synthesis of high accuracy with a suitable linear Ambisonic decoder such as
AllRAD. Assuming the resulting modified left and right HRIRs for all directions are
denoted as the 2 × L matrix Ĥ(t) = [ĥ(t, θ_1), . . . , ĥ(t, θ_L)], the 2 × (N + 1)² filter
set for decoding each of the Ambisonic channels to the ears becomes:
Ĥ_SH(t) = Ĥ(t) D diag{a_N}.    (4.55)

Fig. 4.28 Experiment on the audibility of the removal of the linear phase trend from HRTFs above
a varied cutoff frequency (0 Hz to 7 kHz), from [21], showing medians and 95% confidence intervals
on a scale from "no difference" to "severe difference" for the directions 10°, −45°, 80°, −130°, and
a hidden reference
Fig. 4.29 Exemplary horizontal cross-sections at f = 2000, 4000, and 8000 Hz of linear (lin),
time-aligned (ta), and MagLS/magnitude-least-squares (mls) third-order (N = 3) Ambisonic left-ear
HRTF representations of the TH Cologne HRIR_L2702.sofa set, compared to a high-order N = 35
representation (max)
Results achieved by pseudo-inverse decoding to the hereby time-aligned HRIRs, using
R = 0.085 m and N = 3 with the 2702-direction Cologne HRIR set,⁵ are shown in
Fig. 4.29. The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) at frequencies above 2 kHz in representing the original HRTFs (max).
4.11.2 Magnitude Least Squares (MagLS)
As an alternative to high-frequency time-delay disposal, Schörkhuber et al. present an optimum-phase approach [21] that disregards the phase match in favor of an improved magnitude match above the cutoff. Formulated exemplarily for the left ear, across every HRTF direction θl, and for every discrete frequency ωk, with h_l,k = h(θl, ωk), this becomes
min_{ĥ_SH,k}  Σ_{l=1}^{L} ( | y_N(θl)ᵀ ĥ_SH,k | − | h_l,k | )² .  (4.56)
Typically, one would need to solve such magnitude least-squares or magnitude-squared least-squares tasks with semidefinite relaxation, see Kassakian [51]. In practice, however, results already turn out excellent with an iterative combination of the reconstructed phase φ̂_l,k−1 from the previous frequency ωk−1 with the HRTF magnitude |h_l,k| of the current frequency ωk, before a linear decomposition thereof into the spherical harmonic coefficients ĥ_SH,k.
Every frequency below the cutoff, ωk < 2π fN, just uses the linear least-squares spherical harmonics decomposition with the left inverse of the spherical harmonics matrix Y_N sampled at the HRTF measurement nodes,

ĥ_SH,k = (Y_Nᵀ Y_N)⁻¹ Y_Nᵀ [ h_l,k ]_l .  (4.57)

⁵ Data HRIR_L2702.sofa from http://sofacoustics.org/data/database/thk.

4 Ambisonic Amplitude Panning and Decoding in Higher Orders
Continuing with the first frequency above/equal to the cutoff, ωk ≥ 2π fN, the algorithm proceeds as

φ̂_l,k−1 = ∠[ y_N(θl)ᵀ ĥ_SH,k−1 ],  (4.58)
ĥ_SH,k = (Y_Nᵀ Y_N)⁻¹ Y_Nᵀ [ |h_l,k| e^(i φ̂_l,k−1) ]_l ,  (4.59)

and then moves on to the next frequency k ← k + 1. The results are typically transformed back to the time domain to get a real-valued impulse response for every spherical harmonic to the regarded ear.
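The iteration of Eqs. (4.57)–(4.59) can be sketched in a few lines of Python. This is an illustration only: first-order real spherical harmonics and random synthetic "HRTF" data stand in for a measured set, and frequency-bin indices replace ωk:

```python
import numpy as np

rng = np.random.default_rng(0)

# First-order real spherical harmonics Y (N3D) at L random directions,
# standing in for the HRTF measurement grid.
L = 100
v = rng.normal(size=(L, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)            # unit direction vectors
Y = np.hstack([np.ones((L, 1)), np.sqrt(3.0) * v[:, [1, 2, 0]]])

# Synthetic complex "HRTFs" h[l, k] over K frequency bins, cutoff bin k_cut.
K, k_cut = 64, 20
h = rng.normal(size=(L, K)) + 1j * rng.normal(size=(L, K))

Y_left_inv = np.linalg.pinv(Y)                           # (Y^T Y)^{-1} Y^T
h_SH = np.zeros((Y.shape[1], K), dtype=complex)
for k in range(K):
    if k < k_cut:
        # below cutoff: plain linear least-squares decomposition, Eq. (4.57)
        h_SH[:, k] = Y_left_inv @ h[:, k]
    else:
        # above cutoff: re-use the phase reconstructed from the previous bin
        # and combine it with the current magnitudes, Eqs. (4.58)-(4.59)
        phase = np.angle(Y @ h_SH[:, k - 1])
        h_SH[:, k] = Y_left_inv @ (np.abs(h[:, k]) * np.exp(1j * phase))
```

In a real implementation the resulting coefficients would be transformed back to real-valued impulse responses per spherical harmonic.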
The results of the MagLS approach (mls) outperform the time-alignment approach (ta) in the exemplary results shown for N = 3 in Fig. 4.29, in particular at the highest frequencies, where the sphere-model-based delay simplification is no longer sufficiently helpful.
4.11.3 Diffuse-Field Covariance Constraint
For both of the above approaches that modify the high-frequency phase, Zaunschirm et al. [20] note that low-order rendering degrades envelopment in diffuse fields, so they introduce an additional covariance constraint as defined by Vilkamo [22]. It can be implemented as a 2 × 2 filter matrix equalizing the resulting frequency-domain diffuse-field covariance matrix to the one of the original HRTF dataset. On the main diagonal, this covariance matrix shows the diffuse-field ear sensitivities (left and right), and off-diagonal it contains the diffuse-field inter-aural cross-correlation.
At every frequency, the 2 × 2 diffuse-field covariance matrix of the original, very-high-order spherical harmonics HRTF dataset H_SH of the dimensions 2 × (M + 1)² with M ≫ N is given by

R = H_SHᴴ H_SH .  (4.60)
The derivation why this inner product of spherical harmonic coefficients represents the diffuse-field covariance is given in Appendix A.5. The low-order, high-frequency-modified HRTF coefficient set Ĥ_SH of the dimensions 2 × (N + 1)² also has a 2 × 2 covariance matrix R̂ that will differ from the more accurate R,

R̂ = Ĥ_SHᴴ Ĥ_SH .  (4.61)
Its diffuse-field reproduction improves after equalizing R̂ to R by a 2 × 2 filter matrix,
Ĥ_SH,corr = Ĥ_SH M .  (4.62)

Fig. 4.30 Covariance constraint filters |Mij| enhance the binaural decorrelation of MagLS by negative crosstalk M12 and M21, under corresponding correction of the diffuse-field sensitivities M11 and M22, at playback orders N < 3
Appendix A.5 shows the derivation of M based on [20, 22]. In summary, it is composed of factors obtained by Cholesky and SVD matrix decompositions:

Ĥ_SH,corr = Ĥ_SH X̂⁻¹ V Uᴴ X ,  (4.63)

with the Cholesky factors H_SHᴴ H_SH = Xᴴ X and Ĥ_SHᴴ Ĥ_SH = X̂ᴴ X̂, and the SVD X̂ Xᴴ = U S Vᴴ.
While MagLS binaural decoding with orders higher than 2 or 3 does not require the covariance correction, the correction enhances the decorrelation of the ear signals for 1st- to 2nd-order reproduction, as shown in Fig. 4.30.
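The covariance algebra of Eqs. (4.60)–(4.63) can be verified numerically. In the sketch below, random complex matrices (ears along the columns, at one frequency bin) stand in for the HRTF coefficient sets, so only the equalization algebra is demonstrated, not real binaural data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the HRTF coefficient sets at one frequency bin:
# columns = (left, right) ear; rows = spherical harmonic coefficients.
H_SH = rng.normal(size=(36, 2)) + 1j * rng.normal(size=(36, 2))    # high order, M = 5
Hh_SH = rng.normal(size=(16, 2)) + 1j * rng.normal(size=(16, 2))   # low order, N = 3

R = H_SH.conj().T @ H_SH        # target diffuse-field covariance, Eq. (4.60)
Rh = Hh_SH.conj().T @ Hh_SH     # low-order covariance, Eq. (4.61)

# Cholesky factors R = X^H X (numpy returns L with R = L L^H, so X = L^H)
X = np.linalg.cholesky(R).conj().T
Xh = np.linalg.cholesky(Rh).conj().T

# SVD of Xh X^H and the 2x2 correction filter M of Eq. (4.63)
U, s, Vh = np.linalg.svd(Xh @ X.conj().T)
M = np.linalg.inv(Xh) @ Vh.conj().T @ U.conj().T @ X

H_corr = Hh_SH @ M              # Eq. (4.62): corrected set now has covariance R
```

After the correction, H_corr.conj().T @ H_corr reproduces the target covariance R up to numerical precision.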
4.12 Practical Free-Software Examples
4.12.1 Pd and Circular/Spherical Harmonics
Similar to the example section on first-order encoding and decoding in pure data (Pd), Fig. 4.31 shows 3rd-order 2D Ambisonic encoding and decoding for an octagon loudspeaker layout. The implementation [mtx_circular_harmonics] of the circular harmonics is used from the iemmatrix library, and the numbers for 180/π = 57.29 and am = cos( πm / (2·(N+1)) ) were pre-calculated. Note the similarity to the first-order 2D example of Fig. 1.13, to which the main change is the use of the circular harmonics matrix object.
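The pre-calculated numbers visible in the patch can be reproduced quickly (shown here in Python rather than Pd):

```python
import numpy as np

N = 3
m = np.arange(N + 1)
a = np.cos(np.pi * m / (2 * (N + 1)))   # 2D max-rE weights a_m = cos(πm/(2(N+1)))
deg = 180 / np.pi                        # radian-to-degree factor

# a ≈ [1.000, 0.924, 0.707, 0.383] and deg ≈ 57.2958 -- the values
# 0.38, 0.71, 0.92, 1 and 57.29 appearing in the patch of Fig. 4.31
```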
Fig. 4.31 2D encoding and decoding in Pd using [mtx_circular_harmonics] with 3rd order, 8 equidistant loudspeakers, and a max-rE weighted decoder

For decoding to headphones, programming in Pd also looks rather similar to the first-order example in Fig. 1.14, only more HRIRs matching the respective
loudspeaker positions need to be employed. To work in 3 dimensions, programming in Pd would also be similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx_spherical_harmonics]. Typically, pre-calculated decoders including AllRAD and max-rE weighting are used and loaded by, e.g., [mtx D.mtx] into Pd to keep the programming simple.
4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM
AllRADecoder
For encoding single- or multi-channel signals into Ambisonics, there are the ambix_encode_o<N> or ambix_encode_i<L>_o<N> VST plugins available from Kronlachner's ambix plugin suite, or the IEM MultiEncoder from the IEM plugin suite. As exemplarily shown in Fig. 4.32, the multi encoder allows encoding channel-based multi-channel audio material, where channel-based [52] typically refers to each channel of the multi-channel material being meant for playback on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding of virtual playback directions can also be found referred to as beds or virtual panning spots.
Fig. 4.32 MultiEncoder plug-in: encoding of a 4 + 5 + 0 recording
Fig. 4.33 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio
Fig. 4.34 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio
Fig. 4.35 AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio
The IEM AllRADecoder permits manual entry or import of the loudspeaker coordinates and channel indices, with the coordinates specified by the azimuth and elevation angles in degrees, as exemplified for the IEM production studio in Fig. 4.33. The figure also shows that just entering the pure 5 + 7 + 0 layout would produce the error message "Point of origin not within convex hull. Try adding imaginary loudspeakers." By adding an imaginary loudspeaker below, whose signal is typically omitted, see Fig. 4.34, it becomes geometrically valid to calculate and employ the resulting decoder; however, it is better to also insert an imaginary loudspeaker at the rear, whose signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.
4.12.3 Reaper, IEM RoomEncoder, and IEM
BinauralDecoder
Particularly relevant for headphone-based listening, rendering of anechoic sounds will typically not externalize well, as it does not match the mental expectation of ordinary listening environments [53–56]. To avoid that this causes in-head localization rather than the desired external sound image, one can, e.g., use the IEM RoomEncoder plugin, see Fig. 4.36. It is based on an image-source room model and encodes first-order wall reflections, involving reflection factors and propagation delays, together with the desired direct sound.

Fig. 4.36 RoomEncoder plug-in

Fig. 4.37 BinauralDecoder plug-in
The MagLS approach for Ambisonic decoding, using the KU100 measurements from the Cologne University of Applied Sciences and (optionally) their headphone equalization curves, is implemented by the IEM BinauralDecoder, see Fig. 4.37.
In combination of both IEM RoomEncoder and IEM BinauralDecoder with an Ambisonics-encoded single-channel sound (e.g. using ambix_encoder), one can simply try to place the source and receiver together in the symmetry plane of the room, and then slightly shift one of the two sideways to see how externalization improves by the slight asymmetry in the ear signals.
References
1. J.S. Bamford, Ambisonic sound for the masses (1994)
2. D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. 20(5), 346–360
(1972)
3. P. Fellgett, Ambisonic reproduction of directionality in surround-sound systems. Nature 252,
534–538 (1974)
4. M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround
sound, in prepr. L-20 of 50th Audio Engineering Society Convention (1975)
5. P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space
and yielding various directional outputs, U.S. Patent, no. 4,042,779 (1977)
6. J.S. Bamford, An analysis of ambisonic sound systems of first and second order, Master’s
thesis, University of Waterloo, Ontario (1995)
7. D.G. Malham, A. Myatt, 3D sound spatialization using ambisonic techniques. Comput. Music
J. 19(4), 58–70 (1995)
8. M.A. Poletti, The design of encoding functions for stereophonic and polyphonic sound systems.
J. Audio Eng. Soc. 44(11), 948–963 (1996)
9. J.-M. Jot, V. Larcher, J.-M. Pernaux, A comparative study of 3-d audio encoding and rendering
techniques, in 16th AES Conference (Rovaniemi, 1999)
10. J. Daniel, Représentation des champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Ph.D. dissertation, Université
Paris 6 (2001)
11. D.B. Ward, T.D. Abhayapala, Reproduction of a plane-wave sound field using an array of
loudspeakers. IEEE Trans. Speech Audio Process. 9(6), 697–707 (2001)
12. G. Dickins, Sound field representation, reconstruction and perception, Master’s thesis, Australian National University, Canberra (2003)
13. A. Sontacchi, Neue Ansätze der Schallfeldreproduktion, Ph.D. dissertation, TU Graz (2003)
14. M.A. Poletti, Robust two-dimensional surround sound reproduction for nonuniform loudspeaker layouts. J. Audio Eng. Soc. 55(7/8), 598–610 (2007)
15. F. Zotter, H. Pomberger, M. Noisternig, Energy-preserving ambisonic decoding. Acta Acust.
United Acust. 98(11), 37–47 (2012)
16. J.-M. Batke, F. Keiler, Using vbap-derived panning functions for 3d ambisonics decoding, in
2nd International Symposium on Ambisonics and Spherical Acoustics (Paris, 2010)
17. D. Sun, Generation and perception of three-dimensional sound fields using higher order ambisonics, Ph.D. thesis, School of Electrical and Information Engineering, University of Sydney (2013)
18. Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. 141(6)
(2017)
19. F. Brinkmann, S. Weinzierl, Comparison of head-related transfer functions pre-processing
techniques for spherical harmonics decomposition, in AES Conference AVAR (Redmont, 2018)
20. M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by
head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc.
Am. 143(6), 3616–3627 (2018)
21. C. Schörkhuber, M. Zaunschirm, R. Höldrich, Binaural rendering of ambisonic signals via
magnitude least squares, in Fortschritte der Akustik - DAGA (Munich, 2018)
22. J. Vilkamo, T. Bäckström, A. Kuntz, Optimized covariance domain framework for time-frequency processing of spatial audio. J. Audio Eng. Soc. 61(6), 403–411 (2013)
23. F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), NIST Handbook of Mathematical Functions
(Cambridge University Press, Cambridge, 2000), http://dlmf.nist.gov. Accessed June 2012
24. J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in AES 6th International Conference: Spatial Sound Reproduction (1999)
25. M. Frank, How to make ambisonics sound good, in Forum Acusticum, Krakow (2014)
26. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in
Fortschritte der Akustik - DAGA (Kiel, 2017)
27. M. Frank, A. Sontacchi, F. Zotter, Localization experiments using different 2d ambisonic
decoders, in 25th Tonmeistertagung (2008)
28. P. Stitt, S. Bertet, M.v. Walstijn, Off-centre localisation performance of ambisonics and hoa for
large and small loudspeaker array radii. Acta Acust. United Acust. 100(5), 937–944 (2014)
29. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Convention (2017)
30. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology
(1978)
31. ISO 80000-2, Quantities and units – Part 2: Mathematical signs and symbols to be used in the
natural sciences and technology (2009)
32. R.H. Hardin, N.J.A. Sloane, McLaren's improved snub cube and other new spherical designs
in three dimensions. Discret. Comput. Geom. 15, 429–441 (1996), http://neilsloane.com/
sphdesigns/dim3/
33. M. Gräf, D. Potts, On the computation of spherical designs by a new optimization approach
based on fast spherical fourier transforms. Numer. Math. 119 (2011), http://homepage.univie.
ac.at/manuel.graef/quadrature.php
34. R.S. Womersley, Efficient spherical designs with good geometric properties, in Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan (Springer, Berlin, 2018), pp. 1243–1285
35. A. Solvang, Spectral impairment of 2d higher-order ambisonics. J. Audio Eng. Soc. 56(4)
(2008)
36. M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in prepr. 3345, 92nd AES Convention, Vienna, 1992
37. J. Daniel, J.-B. Rault, J.-D. Polack, Ambisonics encoding of other audio formats for multiple
listening conditions, in prepr. 4795, 105th AES Convention (San Francisco, 1998)
38. M.A. Poletti, A unified theory of horizontal holographic sound systems. J. Audio Eng. Soc.
48(12), 1155–1182 (2000)
39. M.A. Poletti, Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc. (2005)
40. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
41. F. Zotter, M. Frank, Ambisonic decoding with panning-invariant loudness on small layouts
(allrad2), in 144th AES Convention, prepr. 9943 (Milano, 2018)
42. R. Pail, G. Plank, W.-D. Schuh, Spatially restricted data distributions on the sphere: the method
of orthonormalized functions and applications. J. Geodesy 75, 44–56 (2001)
43. M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts
and broadcasts. J. Audio Eng. Soc. 65(9) (2017)
44. ITU, Recommendation BS.2051: Advanced sound system for programme production. ITU
(2018)
45. M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich, A 3d ambisonic based binaural sound
reproduction system, in 24th AES Conference (Banff, 2003)
46. B. Bernschütz, A. V. Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves
with reduced modal order. Acta Acust. United Acust. 100(5) (2014)
47. S. Delikaris-Manias, J. Vilkamo, Adaptive mixing of excessively directive and robust beamformers for reproduction of spatial sound, in Parametric Time-Frequency Domain Spatial
Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
48. C.T. Jin, N. Epain, D. Sun, Perceptually motivated binaural rendering of higher order ambisonic
sound scenes, in Fortschritte der Akustik AIA-DAGA (Merano, 2013). March
49. J.B. Keller, Geometrical theory of diffraction. J. Opt. Soc. Am. 52(2), 116–130 (1962)
50. R.S. Woodworth, H. Schlosberg, Experimental Psychology (Holt, Rinehart and Winston, 1954)
51. P.W. Kassakian, Convex approximation and optimization with applications in magnitude filter
design and radiation pattern synthesis, Ph.D. dissertation, EECS Department, University of
California, Berkeley (2006)
52. ITU, Recommendation BS.2076: Audio Definition Model (ITU, 2017)
53. S. Werner, G. Götz, F. Klein, Influence of head tracking on the externalization of auditory events
at divergence between synthesized and listening room using a binaural headphone system, in
prepr. 9690, 142nd AES Convention (Berlin, 2017)
54. F. Klein, S. Werner, T. Mayenfels, Influences of tracking on externalization of binaural synthesis
in situations of room divergence. J. Audio Eng. Soc. 65(3), 178–187 (2017)
55. J. Cubick, Investigating distance perception, externalization and speech intelligibility in complex acoustic environments, Ph.D. dissertation, DTU Copenhagen (2017)
56. G. Plenge, Über das Problem der Im-Kopf-Lokalisation. Acustica 26(5), 241–252 (1972)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 5
Signal Flow and Effects in Ambisonic
Productions
This system offers significantly enhanced spatialisation
technologies […] with new creative possibilities opening up to
anyone with the appropriate number of audio channels
available on their computer systems.
Dave G. Malham [1], ICMC, Beijing, 1999.
Abstract This chapter presents the internal working principles of various Ambisonic 3D audio effects. No matter which digital audio workstation or processing software is used in a production, the general Ambisonic signal infrastructure is outlined as an important overview of the signal processing chain. The effects presented are frequency-independent effects such as directional re-mapping (mirror, rotation, warping) and re-weighting (directional level modification), and frequency-dependent effects such as widening/distance/diffuseness, diffuse reverberation, and resolution-enhanced convolution reverberation.
The typical audio processing steps for Ambisonic surround-sound signal manipulation are shown in the block diagram of Fig. 5.1 from [2]. The descriptions of the multi-venue application in [3] and of live effects in [4] might be encouraging.
Ambisonic encoding and Ambisonic bus. From the previous section we know that representing single-channel signals sc(t) together with their direction θc is a matter of encoding: of multiplying the signal by the coefficients y_N(θc) obtained by evaluating the spherical harmonics at the direction from which the signal should appear to come. In productions, there will be multiple signals, representing either spot microphones or virtual playback spots of embedded channel-based content (beds), e.g. stereo or 5.1 material. With all input signals encoded and summed up on an Ambisonic bus, we obtain the multi-channel Ambisonic signal representation of an entire audio production
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_5
Fig. 5.1 Block diagram as in [2]
χ_N(t) = Σ_{c=1}^{C} y_N(θc) sc(t).  (5.1)
Ambisonic surround-sound signal. Without decoding to a specific loudspeaker layout, the signal χ_N of the Ambisonic bus might appear somewhat virtual. Nevertheless, it allows to be drawn as a surround-sound signal x(θ, t) whose amplitude can be evaluated and metered at any direction θ and any time t, using the expansion into spherical harmonics

x(θ, t) = y_N(θ)ᵀ χ_N(t).  (5.2)
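A minimal numerical sketch of Eqs. (5.1) and (5.2) in Python, restricted to first order with real N3D spherical harmonics and two hypothetical sources:

```python
import numpy as np

def y1(u):
    # first-order real spherical harmonics (N3D, ACN order W, Y, Z, X)
    x, y, z = u
    return np.array([1.0, np.sqrt(3.0) * y, np.sqrt(3.0) * z, np.sqrt(3.0) * x])

# Eq. (5.1): encode C = 2 source signals at their directions, summed on the bus
dirs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]   # front, left
sigs = [np.ones(8), 0.5 * np.ones(8)]
chi = sum(np.outer(y1(th), s) for th, s in zip(dirs, sigs))     # 4 x 8 bus signal

# Eq. (5.2): meter the surround-sound signal at arbitrary directions
x_front = y1(dirs[0]) @ chi     # large amplitude: a source is encoded there
x_back = y1(-dirs[0]) @ chi     # much smaller amplitude opposite to it
```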
Upmixing. As first-order recordings are not highly resolved, there are several works on algorithms with resolution-enhancement strategies that re-assign time-frequency bins more sharply to directions. A good summary of such input-specific insert effects is given in [5, 6]. Available solutions are DirAC, HOA-DirAC, COMPASS, and Harpex.
Higher order. Higher-order microphones require more of the acoustic holophonic and holographic basics than presented above, yielding pre-processing filters as an input-specific insert effect. Higher-order recording is dealt with in the subsequent Chap. 6, after the derivation of the wave equation and its solutions in the spherical coordinate system.
Insert effects: Generic re-mapping and leveling. One can imagine that it should
be possible to manipulate the surround-sound signal x(θ , t) in various ways. For
instance, effects based on directional re-mapping can take signals out of their original
directional range and place them back into the Ambisonic signal at manipulated
directions. Also, directions can be altered in amplitude levels so that, for instance,
signals at directions with unwanted content undergo attenuation. Many more useful
effects are presented below.
Decoding to loudspeakers/headphones. To map the modified Ambisonic signal χ̃_Ñ to loudspeakers or headphones, an Ambisonic decoder is needed as discussed in the previous chapter. For decoding to headphones, one should consider decoding to only as few HRIR directions as possible [7, 8] before the signals get convolved and mixed, to avoid coloration at frontal directions, where the delays in the HRIRs change too strongly over direction to get resolved properly [9, 10]. Alternatively, the approach in [11] proposed removal of the HRIR delay at high frequencies and diffuse-field covariance equalization by a 2 × 2 filter system, cf. Sect. 4.11.
5.1 Embedding of Channel-Based, Spot-Microphone,
and First-Order Recordings
Microphone arrays for near-coincident higher-order Ambisonic recording based on holography will be discussed in the subsequent chapter. Nevertheless, it is possible to use (i) spot and close microphones and encode their direction into the directional panorama, (ii) first-order microphone arrays to fill the Ambisonic channels only up to the first order, or (iii) more classical non-coincident or equivalence-stereophonic microphone arrays whose typical playback directions are encoded in Ambisonics.
The study by Kurz et al. [12] investigated how recordings by first-order encoding of the Soundfield ST450 and the Oktava MK4012 tetrahedral microphone arrays compare to the equivalence-stereophonic ORTF, see Fig. 5.2. In addition, ORTF-like mapping of the Oktava MK4012's frontal signals to the ±30° directions in 5th order was tested instead of its first-order encoding. Figure 5.3 shows the results of the study in terms of the perceptual attributes localization and spatial depth. It seems that a mixture between ORTF-like 5th-order encoding and first-order encoding of the MK4012 microphone achieves preferred results; the first-order encoded output of the ST450 Soundfield microphone is rated fair in both attributes, while the ORTF microphone ranked well only in terms of localization. The results of the ST450 were independent of its orientation, whereas the localization of the first-order-encoded MK4012 was found to depend on the orientation. This dependency of the MK4012 arises because its microphones are not sufficiently coincident.
As a bottom line of the detailed analysis, one should be encouraged to keep using classical microphone techniques where they are known to be appropriate and to encode their output in higher-order beds or virtual playback directions. However, this should be done with the awareness that stereophonic recording won't necessarily work for a
large audience area, for which the robustness in directional mapping of equivalence-based techniques seems attractive.

Fig. 5.2 Ensemble of the Ambisonic and reference microphones of the study by Kurz et al. [12]; the pixelized microphone prototype by AKG was excluded from the study

Fig. 5.3 Median values and 95% confidence intervals for each attribute (a sound localization, b spatial depth) from the experiments in [12] for different microphones (MSTC64, ST450, SPS200, MK4012), orientations, and playback processing
An interesting layout is, e.g., specified in Hendrickx et al.'s work [13], in which they use an equivalence-stereophonic six-channel microphone array. Another interesting idea was used at the ICSA Ambisonics Summer School 2017: a height layer of suitably inclined super-cardioid microphones was added at a small vertical distance to the horizontal microphone layer, similar to the upwards-pointing directional microphones suggested in Lee's and Wallis' work [14, 15], to provide sufficiently attenuated horizontal sounds to the height layer.
Fig. 5.4 Median values and 95% confidence intervals of the listening experiment comparing channel-based orchestra recordings on headphone playback (a spatial quality, b timbral quality), either directly rendered using the corresponding HRIRs or via binaural Ambisonic decoding of different orders
Binaural rendering study using surround-with-height material. In another study
by Lee, Frank, and Zotter [16], static headphone-based rendering of channel-based
recordings was compared using direct HRIR-based rendering or Ambisonics-based
binaural rendering, cf. Sect. 4.11. The aim was to find whether differently recorded
material could be rendered at high quality via binaural Ambisonics renderers, or under
which settings this would imply quality degradation when compared to channel-based
binaural rendering.
The results from the half of the listening experiment done in Graz are analyzed in Fig. 5.4, and the renderers compared were: channel-based ("ref"), a low-passed mono anchor designed to have poor quality ("0"), a first-order binaural Ambisonic renderer ("1C") based on a cube layout with loudspeakers at ±90°, ±270° azimuth and ±35.3° elevation, and MagLS binaural Ambisonic renderers at the orders "1", "2", "3", "4", and "5". Obviously, for orders 2 and above, there is not much quality degradation compared to the reference channel-based binaural rendering. The spatial quality cannot be distinguished from the reference for MagLS with Ambisonic orders 3 and above, and the timbral quality cannot be distinguished for Ambisonic orders 2 and above.

While this result simplifies the practical requirements for headphone playback remarkably, it can be supposed that due to the limited sweet-spot size, loudspeaker playback would typically still require higher orders.
5.2 Frequency-Independent Ambisonic Effects
Many frequency- and time-independent Ambisonic effects are based on the aforementioned re-mapping of directions and manipulation of directional amplitudes, see
e.g. Kronlachner’s thesis, [2, 17]; advanced effects can be found in [18]. In general,
the surround-sound signal allows to be manipulated by any thinkable transformation
that modifies the directional mapping and amplitude of its contents. The formulation
x̃(θ̃, t) = g(θ) x(θ, t)  (5.3)

expresses an operation that is able to pick out every direction θ of the input signal, weight its signal by a directional gain g(θ), and re-map it to a new direction θ̃ = τ{θ} within a transformed signal x̃. To find out how this affects Ambisonic signals, we write both x and x̃ as Ambisonic signals x(θ, t) = y_N(θ)ᵀ χ_N(t) and x̃(θ̃, t) = y_Ñ(θ̃)ᵀ χ̃_Ñ(t) expanded in spherical/circular harmonics,

y_Ñ(θ̃)ᵀ χ̃_Ñ(t) = g(θ) y_N(θ)ᵀ χ_N(t),

and use ∫_{S^D} y_Ñ(θ̃) y_Ñ(θ̃)ᵀ dθ̃ = I by multiplying with y_Ñ(θ̃) and integrating over dθ̃, to get χ̃_Ñ(t) on the left:

χ̃_Ñ(t) = [ ∫_{S^D} y_Ñ(θ̃) g(θ) y_N(θ)ᵀ dθ̃ ] χ_N(t) =: T χ_N(t),  (5.4)

to find that the transformed signals are just the Ambisonic input signals re-mixed by the matrix T (note that it might require an increased Ambisonic order Ñ). Numerical evaluation of the matrix T = ∫_{S^D} y_Ñ(θ̃) g(θ) y_N(θ)ᵀ dθ̃ is best done by using a high-enough t-design Θ = [θl] to discretize the integration variable θ̃ = τ{θ}. For the discretized input directions θ, an inverse mapping θ = τ⁻¹{θ̃} of the output direction must exist (directional re-mapping must be bijective), so that we can write

T = ∫_{S^D} y_Ñ(θ̃) g(τ⁻¹{θ̃}) y_N(τ⁻¹{θ̃})ᵀ dθ̃ = (4π/L̂) Y_Ñ,Θ diag{g(τ⁻¹{Θ})} Y_N,τ⁻¹{Θ}ᵀ .
This formalism is generic and covers simplistic and more complex tasks. It helps to understand that every frequency-independent directional weighting and/or re-mapping is just re-mixing the Ambisonic signals by a matrix, as in Fig. 5.6a.
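The discretized evaluation of T can be sketched at first order, where an octahedron already is a sufficient t-design. The sketch below is illustrative: the harmonics are normalized here so that the integral of y yᵀ over the sphere is the identity (as assumed in the derivation), and for a bijective re-mapping the sum is taken equivalently over the input directions with their forward-mapped counterparts:

```python
import numpy as np

def y1(u):
    # first-order real spherical harmonics, normalized so that ∮ y yᵀ dθ = I
    x, y, z = u
    return np.array([1.0, np.sqrt(3.0) * y, np.sqrt(3.0) * z,
                     np.sqrt(3.0) * x]) / np.sqrt(4.0 * np.pi)

# the octahedron is a 3-design: exact for the degree-2 products inside T at N = 1
dirs = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)

def transform_matrix(tau, g):
    # T = (4π/L̂) Σ_l y(τ{θ_l}) g(θ_l) y(θ_l)ᵀ  -- discretized re-map/re-weight
    return (4.0 * np.pi / len(dirs)) * sum(
        np.outer(y1(tau(u)), g(u) * y1(u)) for u in dirs)

# neutral transform (identity mapping, unity gain) re-mixes by the identity
T0 = transform_matrix(lambda u: u, lambda u: 1.0)

# re-mapping by a 90° rotation around z moves an encoded source accordingly
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T_rot = transform_matrix(lambda u: Rz @ u, lambda u: 1.0)
src = np.array([1.0, 0.0, 0.0])
```

Applying T_rot to an encoded source y1(src) yields exactly y1(Rz @ src), since the rotation maps the octahedron onto itself.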
The ambix VST plugin suite implements several effects, e.g. in the VST plugins ambix_mirror, ambix_rotate, ambix_directional_loudness, ambix_warp.
The sections below explain how these and other effects work inside.
5.2.1 Mirror
Mirroring does not actually require the generic re-mapping and re-weighting formalism from above, yet. The spherical harmonics associated with the Ambisonic
channels are shown in Fig. 4.12 and upon closer inspection one recognizes their
symmetries, see Fig. 5.5. To mirror the Ambisonic sound scene with regard to planes
of symmetry, it is sufficient to sign-invert channels associated with odd-symmetric
spherical harmonics as in Fig. 5.6b. Formally, the transform matrix consists of a
diagonal matrix T = diag{c} only, with the corresponding sign-change sequence c.
Fig. 5.5 Ambisonic signals associated with odd-symmetric spherical harmonics are sign-inverted to mirror the sound scene. For every Cartesian axis, the illustrations above show spherical harmonics up to the third order, with the order index n organized in rows and the mode index m in columns. Even harmonics are blurred for visual distinction
Up-down: For instance, spherical harmonics with |m| = n are even-symmetric with regard to z = 0 (up-down), and from this index on, every second harmonic in m is. To flip up and down, it is therefore sufficient to invert the signs of the odd-symmetric spherical harmonics with regard to z = 0; they are characterized by n + m being an odd number, or cnm = (−1)^(n+m).
Left-right: The sin ϕ-related spherical harmonics with m < 0 are odd-symmetric with regard to y = 0 (left-right); therefore sign-inverting the signals with index m < 0 exchanges left and right in the Ambisonic surround signal, i.e. c_{nm} = (−1)^{m<0}.
106
5 Signal Flow and Effects in Ambisonic Productions
Fig. 5.6 Block diagrams of frequency-independent transformations such as re-mapping and reweighting (left, matrix operations), or mirroring (right, sign-only operations)
Front-back: Every odd-numbered m > 0 is odd-symmetric with regard to x = 0
(front-back), and so is every even-numbered harmonic with m < 0. Inverting the
sign of these harmonics, cnm = (−1)m+(m<0) , flips front and back in the Ambisonic
surround signal.
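The three sign-change sequences can be sketched in a few lines; the following is a minimal illustration assuming ACN channel ordering (the function name and the plane labels are hypothetical, not an established API):

```python
import numpy as np

def mirror_signs(N, plane):
    """Sign-change sequence c for mirroring an ACN-ordered Ambisonic
    signal set of order N (sketch of Sect. 5.2.1)."""
    c = []
    for n in range(N + 1):
        for m in range(-n, n + 1):
            if plane == "up-down":        # c_nm = (-1)**(n+m)
                c.append((-1) ** (n + m))
            elif plane == "left-right":   # c_nm = (-1)**(m<0)
                c.append(-1 if m < 0 else 1)
            elif plane == "front-back":   # c_nm = (-1)**(m+(m<0))
                c.append((-1) ** (m + (m < 0)))
    return np.array(c)
```

For first order (ACN sequence W, Y, Z, X), the left-right flip inverts only the Y channel, the up-down flip only Z, and the front-back flip only X, as the symmetry rules predict.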
5.2.2 3D Rotation
Rotation can be expressed by a general rotation matrix R consisting of a rotation
around z by χ , around y by ϑ, and again around z by ϕ, see Fig. 5.7. This rotation
matrix maps every direction θ to a rotated direction θ̃:
\tilde\theta = R(\varphi, \vartheta, \chi)\, \theta, \qquad (5.5)

R = \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\vartheta & 0 & -\sin\vartheta \\ 0 & 1 & 0 \\ \sin\vartheta & 0 & \cos\vartheta \end{bmatrix} \begin{bmatrix} \cos\chi & -\sin\chi & 0 \\ \sin\chi & \cos\chi & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Using this as the transform rule θ̃ = τ{θ} = Rθ with neutral gain g(θ) = 1, we find the transform matrix by the inverse mapping θ = R^T θ̃ as
Fig. 5.7 zyz-Rotation on the plain example of great-circle navigation of a paper plane around the
earth. With the original location at the zenith, a first rotation around z determines the course, and
the subsequent rotations around y and z relocate the plane in zenith and azimuth
T = \frac{4\pi}{\hat L}\, Y_{\tilde N,\Theta}\, Y_{N,R^T\Theta}^T. \qquad (5.6)
Using the L̂ directions of a t ≥ 2N-design is sufficient to sample the harmonics
accurately. With the resulting T , rotation is implemented as in Fig. 5.6a.
There is plenty of potential for simplification: As only the spherical harmonics of
a given order n are required to re-express a rotated spherical harmonic of the same
order n, T is actually block diagonal T = blk diagn {T n }, and within each spherical
harmonic order, the integral could be more efficiently evaluated using a smaller
t ≥ 2n-design. Moreover, there are various fast and recursive ways to calculate the
entries of T as in [19–25] and implemented in most plugins. And yet, in practice a
naïve implementation can be fast enough and pragmatic.
Rotation around z. One special case of rotation is important and particularly simple to implement. A directional encoding in azimuth always either is equal to Φ_m(ϕ_s) in 2D, or contains it in 3D. For m > 0, the azimuth encoding Φ_m(ϕ_s) depends on cos(mϕ_s), and its negative-sign version Φ_{−m}(ϕ_s) depends on sin(|m|ϕ_s). The encoding angle can be offset by the trigonometric addition theorems. They can be written as a matrix:
\begin{bmatrix} \sin m(\varphi_s+\varphi) \\ \cos m(\varphi_s+\varphi) \end{bmatrix} = \underbrace{\begin{bmatrix} \cos m\varphi & \sin m\varphi \\ -\sin m\varphi & \cos m\varphi \end{bmatrix}}_{R(m\varphi)} \begin{bmatrix} \sin m\varphi_s \\ \cos m\varphi_s \end{bmatrix}. \qquad (5.7)
By this, any Ambisonic signal, be it 2D or 3D, can be rotated around z by the matrices R(mϕ) for the signal pairs with ±m:

\begin{bmatrix} \Phi_{-m}(\varphi_s+\varphi) \\ \Phi_{m}(\varphi_s+\varphi) \end{bmatrix} = R(m\varphi) \begin{bmatrix} \Phi_{-m}(\varphi_s) \\ \Phi_{m}(\varphi_s) \end{bmatrix}, \qquad \begin{bmatrix} Y_n^{-m}(\varphi_s+\varphi, \vartheta_s) \\ Y_n^{m}(\varphi_s+\varphi, \vartheta_s) \end{bmatrix} = R(m\varphi) \begin{bmatrix} Y_n^{-m}(\varphi_s, \vartheta_s) \\ Y_n^{m}(\varphi_s, \vartheta_s) \end{bmatrix}. \qquad (5.8)
Figure 5.8a shows the processing scheme implementing only the non-zero entries
of the associated matrix operation T . Combined with a fixed set of 90◦ rotations
around y (read from files), it can be used to access all rotational degrees of freedom
in 3D [20].
The rotation effect is one of the most important features when using head-tracked interactive VR playback on headphones. Here, rotation counteracting the head movement supports the impression of a static image of the virtual outside world.
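The pairwise rotation of Eqs. (5.7) and (5.8) can be sketched as follows; this is a minimal illustration assuming ACN channel ordering for one frame of Ambisonic signals (the function name is hypothetical):

```python
import numpy as np

def rotate_z(ambi, phi):
    """Rotate one ACN-ordered Ambisonic frame around z by phi (radians),
    applying the 2x2 matrix of Eq. (5.7) to every +/-m signal pair
    (sketch; the channel count must be (N+1)**2)."""
    out = np.array(ambi, dtype=float)
    N = int(np.sqrt(len(out))) - 1
    for n in range(N + 1):
        for m in range(1, n + 1):
            i_neg = n * n + n - m            # ACN index of (n, -m)
            i_pos = n * n + n + m            # ACN index of (n, +m)
            c, s = np.cos(m * phi), np.sin(m * phi)
            y_neg, y_pos = out[i_neg], out[i_pos]
            out[i_neg] = c * y_neg + s * y_pos    # sin-type channel
            out[i_pos] = -s * y_neg + c * y_pos   # cos-type channel
    return out
```

Only the ±m pairs of each order mix with each other; channels with m = 0 stay untouched, reflecting the sparse scheme of Fig. 5.8a.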
5.2.3 Directional Level Modification/Windowing
What might be most important when mixing is the option to treat the gains of different
directions differently: it might be necessary to attenuate directions of uninteresting
or disturbing content while boosting directions of a soft target signal. For such a
Fig. 5.8 Rotation around z and Ambisonic widening/diffuseness apply simple 2 × 2 rotation matrices/filter matrices to each Ambisonic signal pair χ_{n,m}, χ_{n,−m} of the same order n. Note that the order of the plotted input/output channels is not the typical ACN sequence, to avoid crossing connections and hereby simplify the diagram
manipulation there is a neutral directional re-mapping θ̃ = θ, and the transform defining the matrix T implemented as in Fig. 5.6a remains

T = \frac{4\pi}{\hat L}\, Y_{\tilde N,\Theta}\, \mathrm{diag}\{g_{\Theta}\}\, Y_{N,\Theta}^T. \qquad (5.9)
In the simplest version, as implemented in ambix_directional_loudness, the
gain function just consists of two mutually exclusive regions, e.g. within a region
of diameter α around the direction θ g , and a complementary region outside, with
separately controlled gains gin and gout :
g(\theta) = g_\mathrm{in}\, u(\theta^T\theta_g - \cos\tfrac{\alpha}{2}) + g_\mathrm{out}\, u(\cos\tfrac{\alpha}{2} - \theta^T\theta_g), \qquad (5.10)
where u(x) represents the unit-step function that is 1 for x ≥ 0 and 0 otherwise. Note that the Ambisonic order of this effect would need to be larger for it to be lossless. However, with reasonably chosen sizes α and gain ratios g_in/g_out, the effect will nevertheless produce reasonable results. Figure 5.9 shows a window at azimuth and elevation of 22.5° with an aperture of 50°, using g_in = 1, g_out = 0, and the order N = 10, with a grid of encoded directions to illustrate the influence of the transformation.
For reference: the entries of the tensor used to analytically re-expand the product of two spherical functions x(θ) g(θ) given by their spherical harmonic coefficients χ_{nm}, γ_{nm} are called Gaunt coefficients or Clebsch–Gordan coefficients [6, 26].
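The two-region gain function of Eq. (5.10) is easily sketched; the following minimal illustration (the function name is hypothetical) evaluates g(θ) for an array of unit direction vectors:

```python
import numpy as np

def window_gain(theta, theta_g, alpha, g_in, g_out):
    """Two-region gain g(theta) of Eq. (5.10): g_in inside the cap of
    aperture alpha around the unit vector theta_g, g_out outside
    (sketch operating on an L x 3 array of unit directions)."""
    inside = theta @ theta_g >= np.cos(alpha / 2)
    return np.where(inside, g_in, g_out)
```

Evaluating such gains at the t-design directions Θ and inserting them into Eq. (5.9) yields the re-mixing matrix T of the directional loudness effect.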
Fig. 5.9 Directionally windowed Ambisonic test image with points at every 90° in azimuth, interleaved in azimuth for the elevations ±60° and ±22.5°, using the order N = Ñ = 10, a window size of α/2 = 50° around azimuth and elevation of 0°, and max-r_E weighting. (a) original, (b) windowed
5.2.4 Warping
Gerzon [27, Eq. 4a] described the effect dominance that is meant to warp the
Ambisonic surround scene to modify how vitally the essential parts in front of the
scene are presented.
Warping wrt. a direction. For mathematical simplicity, we describe this bilinear
warping with regard to the z direction. To warp with regard to the frontal direction, one
first rotates the front upwards, applies the warping operation there, and then rotates
back. The bilinear warping modifies the normalized z coordinate ζ = cos ϑ = θz so
that signals from the horizon ζ = 0 are pulled to ζ̃ = α,

\tilde\zeta = \frac{\alpha + \zeta}{1 + \alpha\zeta}, \qquad (5.11)

while keeping the poles where they originally were, ζ̃ = ±1 for ζ = ±1. Hereby, the
surround signal gets squeezed towards or stretched away from the zenith, or when
rotating before and after: towards/from any direction.
The integral can be discretized and solved by a suitable t-design as before, only that for lossless operation, the output order Ñ must be higher than the input order N. We get a matrix T, implemented as in Fig. 5.6a, that is computed by

T = \frac{4\pi}{\hat L}\, Y_{\tilde N,\Theta}\, \mathrm{diag}\{g_{\tau^{-1}\{\Theta\}}\}\, Y_{N,\tau^{-1}\{\Theta\}}^T. \qquad (5.12)
The inverse mapping yields

\zeta = \tau^{-1}\{\tilde\zeta\} = \frac{\tilde\zeta - \alpha}{1 - \alpha\tilde\zeta}, \qquad (5.13)

and it modifies the coordinates of the t-design inserted for θ̃_l = θ_l = [θ_{x,l}, θ_{y,l}, θ_{z,l}]^T with ζ̃_l = θ_{z,l} accordingly
Fig. 5.10 Warping of the horizontal plane by 22.5° downwards; the original Ambisonic test image contains points at every 90° in azimuth, interleaved in azimuth for the elevations ±60° and ±22.5°; orders are N = Ñ = 10; max-r_E weighted. (a) original, (b) warped
\tau^{-1}\{\theta_l\} = \begin{bmatrix} \theta_{x,l}\,\sqrt{1-(\tau^{-1}\{\tilde\zeta_l\})^2} \\ \theta_{y,l}\,\sqrt{1-(\tau^{-1}\{\tilde\zeta_l\})^2} \\ \tau^{-1}\{\tilde\zeta_l\} \end{bmatrix} = \begin{bmatrix} \theta_{x,l}\,\sqrt{1-\bigl(\frac{\theta_{z,l}-\alpha}{1-\alpha\theta_{z,l}}\bigr)^2} \\ \theta_{y,l}\,\sqrt{1-\bigl(\frac{\theta_{z,l}-\alpha}{1-\alpha\theta_{z,l}}\bigr)^2} \\ \frac{\theta_{z,l}-\alpha}{1-\alpha\theta_{z,l}} \end{bmatrix}. \qquad (5.14)
The gain g(ζ̃) of the generic transformation is useful to preserve the loudness of what becomes wider and therefore louder in terms of the E measure after re-mapping. To preserve loudness, the resulting surround signal is divided by the square root of the stretch applied, which is related to the slope of the mapping by g = \bigl(\frac{d\zeta}{d\tilde\zeta}\bigr)^{-\frac{1}{2}}. Expressed as de-emphasis gain, we get

g(\tilde\zeta_l) = \frac{1-\alpha\tilde\zeta_l}{\sqrt{1-\alpha^2}} = \frac{1-\alpha\theta_{z,l}}{\sqrt{1-\alpha^2}}. \qquad (5.15)
Figure 5.10 shows warping of the horizontal plane by 20◦ downwards, using the test
image parameters as with windowing; de-emphasis attenuates widened areas.
In the same fashion, Kronlachner [17] describes another warping curve that warps with regard to a fixed horizontal plane and pole, either squeezing or stretching the content towards or away from the horizon, symmetrically for the upper and lower hemispheres (second option of the ambix_warp plugin).
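The inverse mapping and de-emphasis gain of Eqs. (5.13) and (5.15) can be sketched as below; this minimal illustration (the function name is an assumption) additionally re-normalizes the horizontal components so the mapped direction stays a unit vector:

```python
import numpy as np

def warp_inverse(theta, alpha):
    """Sketch of Eqs. (5.13) and (5.15): inverse bilinear z-warp applied
    to t-design directions theta (L x 3 unit vectors). Returns the
    pre-warped unit directions and the de-emphasis gains."""
    z_t = theta[:, 2]                        # zeta~_l = theta_z,l
    z = (z_t - alpha) / (1 - alpha * z_t)    # Eq. (5.13)
    # rescale the horizontal components to keep unit-length directions
    s = np.sqrt((1 - z**2) / np.maximum(1 - z_t**2, 1e-12))
    mapped = np.column_stack([theta[:, 0] * s, theta[:, 1] * s, z])
    g = (1 - alpha * z_t) / np.sqrt(1 - alpha**2)    # Eq. (5.15)
    return mapped, g
```

Evaluating the spherical harmonics at the mapped directions and weighting by g yields the warping matrix T of Eq. (5.12).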
5.3 Parametric Equalization
There are two ways of employing parametric equalizers on Ambisonic channels. Either the single-/multi-channel input of a mono or multiple-input encoder is filtered by parametric equalizers, or each of the Ambisonic signal's channels is filtered by the same parametric equalizer, see Fig. 5.11a.
Fig. 5.11 Block diagram of processing that commonly and equally affects all Ambisonic signals,
such as parametric equalization and dynamic processing (compression), without recombining the
signals
Bass management is often important to not overdrive smaller loudspeaker systems
of, e.g., a 5th-order hemispherical playback system with subwoofer signals: All
36 channels from the Ambisonic bus can be sent to a decoder section, in which
frequencies below 70–100 Hz are high-cut by a 4th-order filter before running through
the Ambisonic decoder, while the first channel from the Ambisonics bus alone, the
omnidirectional channel, is being sent to a subwoofer section, in which a 4th-order
filter high-cut removes the high frequencies above 70–100 Hz before the signal is sent
to the subwoofers. If the playback system is time-aligned between subwoofer and
higher frequencies, the 4th-order crossovers should be Linkwitz–Riley filters (either
squared Butterworth high-pass or low-pass filters) to preserve phase equality [28].
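The squared-Butterworth construction can be verified numerically; this is a sketch under assumed example values (48 kHz sample rate, 80 Hz crossover), not part of any particular plugin:

```python
import numpy as np
from scipy.signal import butter, sosfreqz

# Sketch of the bass-management crossover: a 4th-order Linkwitz-Riley
# pair at 80 Hz (within the 70-100 Hz range above), built as squared
# 2nd-order Butterworth filters, so both branches stay phase-equal.
fs, fc = 48000, 80.0
lp = butter(2, fc, "low", fs=fs, output="sos")
hp = butter(2, fc, "high", fs=fs, output="sos")
lr_lp = np.vstack([lp, lp])    # squaring = cascading the same filter
lr_hp = np.vstack([hp, hp])

w, H_lp = sosfreqz(lr_lp, worN=1024, fs=fs)
w, H_hp = sosfreqz(lr_hp, worN=1024, fs=fs)
# the in-phase branches sum to a flat (allpass) magnitude response
flatness_error = np.max(np.abs(np.abs(H_lp + H_hp) - 1))
```

The flatness error stays at numerical precision, which is exactly the phase-equality property [28] that makes Linkwitz-Riley crossovers suitable for the subwoofer split.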
For more information on parametric equalizers, the reader is referred to Udo
Zölzer’s book on Digital audio effects [29].
5.4 Dynamic Processing/Compression
Individual compression of different Ambisonic channels would destroy the directional consistency of the Ambisonics signal. Consequently, dynamic processing
should rather affect the levels of all Ambisonic channels in the same way. As it
typically contains all the audio signals, it is useful to have the first, omnidirectional
Ambisonic channel control the dynamic processor as side-chain input, see Fig. 5.11b.
For more information on dynamic processing, the reader is referred to Udo Zölzer’s
book on Digital audio effects [29].
Moreover, it is sometimes useful to compress the vocals of a singer separately.
To this end, the directional compression would first extract a part of the Ambisonic
signals by a directional window, creating one set of Ambisonic signals without the
directional region of the window, and another one exclusively containing it. The compression is applied to the resulting window signal before re-combining it with the remaining signals.
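A minimal sketch of the omnidirectional side-chain idea from Fig. 5.11b (static gain curve only, no attack/release smoothing; the function name is an assumption):

```python
import numpy as np

def omni_sidechain_compress(ambi, thresh_db, ratio, eps=1e-12):
    """Compressor sketch: the gain is derived from the omnidirectional
    channel W (ACN 0, first row of a channels x samples array) and
    applied identically to all channels, so the directional balance
    of the Ambisonic scene is preserved."""
    level_db = 20 * np.log10(np.abs(ambi[0]) + eps)
    over = np.maximum(level_db - thresh_db, 0.0)   # dB above threshold
    gain = 10 ** (-over * (1 - 1 / ratio) / 20)    # static curve
    return ambi * gain
```

Because the same gain multiplies every channel, the ratios between the Ambisonic signals, and hence the encoded directions, remain intact.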
5.5 Widening (Distance/Diffuseness/Early Lateral
Reflections)
Basic widening and diffuseness effects can be regarded as inspired by Gerzon [30] and Laitinen [31], who proposed to apply frequency-dependent panning filters, mapping different frequencies to directions dispersed around the panning direction. The resulting effect is fundamentally different from and superior to frequency-independent MDAP with enlarged spread or Ambisonics with reduced order, which could yield audible comb filtering.
To apply this technique to Ambisonics, Zotter et al. [32] proposed to employ a dispersive, i.e. frequency-dependent, rotation of the Ambisonic scene around the z-axis as in Eq. (5.8) by the matrix R as described above and in Fig. 5.8b, using 2 × 2 matrices of filters to implement the frequency-dependent argument m φ̂ cos ωτ,

R(m\hat\varphi\cos\omega\tau) = \begin{bmatrix} \cos(m\hat\varphi\cos\omega\tau) & \sin(m\hat\varphi\cos\omega\tau) \\ -\sin(m\hat\varphi\cos\omega\tau) & \cos(m\hat\varphi\cos\omega\tau) \end{bmatrix}, \qquad (5.16)
whose parameters φ̂ and τ allow to control the magnitude and change rate of the
rotation with increasing frequency. How this filter matrix is implemented efficiently
was described in [33], where a sinusoidally frequency-varying pair of functions
g_1(\omega) = \cos[\alpha \cos(\omega\tau)], \qquad g_2(\omega) = \sin[\alpha \cos(\omega\tau)], \qquad (5.17)

was found to correspond to the sparse impulse responses in the time domain

g_1(t) = \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\, \cos(\tfrac{\pi}{2}|q|)\, \delta(t - q\tau), \qquad g_2(t) = \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\, \sin(\tfrac{\pi}{2}|q|)\, \delta(t - q\tau), \qquad (5.18)
allowing for truncation to just a few terms in q, typically 11 taps within −5 ≤ q ≤ 5 or fewer, and hereby an efficient implementation. For the implementation of the filter matrix, the value α = m φ̂ is used for each degree m. (It might be helpful to be reminded of a phase-modulated cosine and sine from radio communication, whose spectra are the same functions as this impulse response pair.)
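The correspondence between Eqs. (5.17) and (5.18) can be checked numerically; this sketch uses assumed example parameters (α = 0.8, τ = 1.5 ms) and builds the truncated tap set for g1:

```python
import numpy as np
from scipy.special import jv

# Sparse FIR taps of Eq. (5.18), truncated to |q| <= 5, checked
# against the target frequency response g1 of Eq. (5.17).
alpha, tau = 0.8, 0.0015             # example dispersion and time constant
qs = np.arange(-5, 6)
h1 = jv(np.abs(qs), alpha) * np.cos(np.pi / 2 * np.abs(qs))

w = 2 * np.pi * np.linspace(0, 2000, 200)      # rad/s test grid
G1 = np.sum(h1[:, None] * np.exp(-1j * w[None, :] * qs[:, None] * tau),
            axis=0)
trunc_error = np.max(np.abs(G1 - np.cos(alpha * np.cos(w * tau))))
```

The truncation error is tiny because the Bessel-function weights J_{|q|}(α) decay rapidly in |q| for moderate α, which is what makes the 11-tap implementation efficient.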
As the algorithm places successive frequencies at slightly displaced directions,
the auditory source width increases. Moreover, the frequency-dependent part causes
a smearing of the temporal fine structure in the signal. In [34], it was found that implementations discarding the negative values of q, i.e. keeping q ≥ 0, sound more natural and still exhibit a sufficiently strong effect. Time constants τ around 1.5 ms
yield a widening effect, and a diffuseness and distance impression is obtained with τ
around 15 ms. The parameter φ̂ is adjustable between 0 (no effect) and larger values.
Beyond 80° the audio quality starts to degrade. The use as diffusing effect has turned out to be useful as a simple simulation of early lateral reflections, because most parts of the spectrum are played back near the reversal points ±φ̂ of the dispersion contour. For natural-sounding early reflections, additional shelving filters introducing attenuation of high frequencies prove useful.
Figures 5.12 and 5.13 show experimental ratings of the perceived effect strength
(width or distance) of the above algorithm in [34], which was implemented as
frequency-dependent (dispersive) panning on just a few loudspeakers L = 3, 4, 5, 7
evenly arranged from −90◦ to 90◦ on the horizon at 2.5 m distance from the central listening position. Loudspeakers were controlled by a sampling decoder of the
orders N = 1, 2, 3, 5 with the center of the max-r E -weighted panning direction at
0◦ in front. The signal was speech and as a reference it used the frontal loudspeaker
with the unprocessed signal “REF”. The experiment tested the algorithm with both
the symmetric impulse responses suggested by Eq. (5.18), and such truncated to their
causal q ≥ 0-side, for a listening position at the center of the arrangement (bullet
marker) and at 1.25 m shifted to the right, off-center (square marker). Figure 5.12
indicates for the widening algorithm with τ = 1.5 ms that the perceived width saturates for N > 2 at both listening positions. Although the causal-sided implementation is weaker in effect strength, it clearly outperforms the symmetric FIR implementation in terms of audio quality (right diagram), while still producing a clearly noticeable effect when compared to the unprocessed reference (left diagram).

Fig. 5.12 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as widening effect using the setting τ = 1.5 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

Fig. 5.13 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as distance/diffuseness effect using the setting τ = 15 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)
A more pronounced preference for the causal-sided implementation in terms of audio quality is found in Fig. 5.13 for the setting τ = 15 ms, where the algorithm increases the diffuseness or perceived distance for orders N > 2 at both listening positions.
5.6 Feedback Delay Networks for Diffuse Reverberation
Feedback delay networks (FDN, cf. [35, 36]) can directly be employed to create diffuse Ambisonic reverberation. A dense response and an individual reverberation for
every encoded source can be expected when feeding the Ambisonic signals directly
into the inputs of the FDN.
As in Fig. 5.14, FDNs consist of a matrix A that is orthogonal, A^T A = I, and should mix the signals of the feedback loop well enough to distribute them across all the different channels, to couple the resonators associated with the different delays τ_i. These delays should not have common divisors to avoid pronounced resonance frequencies, and are therefore typically chosen to be related to prime numbers. Small delays are typically selected to be more closely spaced, {2, 3, 5, ...} ms, to simulate a diffuse part with a densely spaced response at the beginning, and long delays further apart often make the reverberation more interesting. Using unity channel gains g_{lo}^{τ_i} = g_{mi}^{τ_i} = g_{hi}^{τ_i} = 1 and any orthogonal matrix A, the reverberation time becomes infinite. For smaller channel gains, the FDN produces decaying output.
Reverberation is characterized by the exponentially decaying envelope 10^{-\frac{3t}{T_{60}}}. For a single delay of the length τ_i, the corresponding gain is g^{τ_i} with g = 10^{-\frac{3}{T_{60}}}. This factor with the corresponding exponent provides an equal reverberation decay rate in every channel, and hereby exact control of the reverberation time. To make the effect sound natural, it is typical to adjust the gains within a high-mid-low filter set to decrease the reverberation towards higher frequency bands by the gains g_{hi}^{τ_i} ≤ g_{mi}^{τ_i} ≤ g_{lo}^{τ_i}.
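The per-delay gain rule can be sketched in a few lines, here with assumed example values for T60 and the delay set:

```python
import numpy as np

# Per-delay feedback gain g**tau_i for a chosen reverberation time:
# every loop traversal of delay tau_i applies g**tau_i, so the
# envelope reaches -60 dB after T60 seconds in every channel (sketch).
T60 = 2.0                                        # seconds, example value
taus = np.array([0.002, 0.003, 0.005, 0.007])    # delay lengths in s
g = 10 ** (-3 / T60)
gains = g ** taus
decay_after_T60_db = 20 * np.log10(g ** T60)     # -60 dB by construction
```

Because the exponent is the delay length itself, short and long delay lines decay at exactly the same rate, which is what gives the FDN its precise reverberation-time control.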
The vector gathering the current sample of every feedback path is multiplied by the matrix A. For calculation in real time, Rocchesso proposed in [37] to use a scaled Hadamard matrix A = \frac{1}{\sqrt{M}} H of the dimensions M = 2^k. It consists of ±1 entries only and hereby perfectly mixes the signal across the different feedbacks to create a diffuse set of resonances. What is more, this not only replaces the M × M multiplies and adds of the matrix multiplication by sums and differences, it is
Fig. 5.14 Feedback delay network (FDN) for Ambisonic reverb. The matrix A is unitary, and the gain g = 10^{-\frac{3}{T_{60}}} to the power of the delay τ_i allows to adjust a spatially and temporally diffuse reverberation effect in different bands (lo, mi, hi)
moreover equivalent to the efficient implementation as Fast Walsh-Hadamard Transform (FWHT), a butterfly algorithm. Figure 5.15 shows a graphical implementation
example of a 16-channel FWHT in the real-time signal processing environment Pure
Data.
Fig. 5.15 The fast Walsh-Hadamard transform (FWHT) variant implemented in the 16-channel
feedback delay network reverberator [rev3∼] in Pure Data requires only 4 × 16 sums/differences
to replace the 16 × 16 multiplies of matrix multiplication by A
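The butterfly structure of the FWHT can be sketched as follows; this is a minimal in-place variant for power-of-two lengths, with a 1/√M scaling assumed so that the operation is orthogonal:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (butterfly) of a length-2**k
    vector: M*log2(M) sums/differences replace the M x M multiplies
    of H @ x; the 1/sqrt(M) scaling keeps the operation orthogonal
    (sketch)."""
    x = np.array(x, dtype=float)
    M = len(x)
    h = 1
    while h < M:
        for i in range(0, M, 2 * h):
            for j in range(i, i + h):
                # butterfly: sum and difference of a sample pair
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x / np.sqrt(M)
```

Applying fwht twice returns the input, reflecting the orthogonality A^T A = I required for the lossless mixing stage of the feedback loop.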
5.7 Reverberation by Measured Room Impulse Responses
and Spatial Decomposition Method in Ambisonics
The first-order spatial impulse response of a room at the listener can be improved by
resolution enhancements of the spatial decomposition method (SDM) by Tervo [38],
which is a broad-band version of spatial impulse response rendering (SIRR) by
Merimaa and Pulkki [39, 40]. For reliable measurements, typically loudspeakers are employed, and the typical measurement signals aren't impulses, but swept-sine signals that are reverted to impulses by deconvolution. A room impulse response
is typically sparse in its beginning whenever direct sound and early reflections arrive
at the measurement location. Generally, it is likely that those arrival times in the early
part do not coincide and are well separated from each other, so that one can assume
their temporal disjointness at the receiver.
From a room impulse response h(t) that complies with this assumption, for which there consequently is a direction of arrival (DOA) θ_DOA(t) for every time instant, one could construct an Ambisonic receiver-directional room impulse response as in [41], h(θ_R, t) = h(t) δ[1 − θ_R^T θ_DOA(t)], depending on the direction θ_R at the receiver. This response can be transformed into the spherical harmonic domain by integrating it over \int_{\mathbb{S}^2} y_N(\theta_R)\, d\theta_R, to get the set of Nth-order Ambisonic room impulse responses

\tilde{h}_N(t) = h(t)\, y_N[\theta_\mathrm{DOA}(t)].
A signal s(t) convolved by this vector of impulse responses theoretically generates
a 3D Ambisonic image of the mono sound in the room of the measurement. This
can be done, e.g., by the plug-in mcfx_convolver. Now there are two problems to
be solved: (i) how to estimate θ DOA (t), (ii) how to deal with the diffuse part of h(t),
when there are more sound arrivals at a time than one.
Estimation of the DOA. One could now just detect the temporal peaks of the room
impulse response and assign the guessed evolution of the direction of arrival as
suggested in [42], and hereby span the envelopment of the room impulse response.
Alternatively, if the room impulse response was recorded by a microphone array as
in [38], array processing can be used to estimate the direction of arrival θ DOA (t).
For first-order Ambisonic microphone arrays, when suitably band-limited to the
frequency range in which the directional mapping is correct, e.g. between 200 Hz
and 4 kHz, the vector r DOA of Eq. (A.83) in Appendix A.6.2 yields a suitable estimate
\tilde{r}_\mathrm{DOA}(t) = W(t) \begin{bmatrix} X(t) \\ Y(t) \\ Z(t) \end{bmatrix} = -\rho c\, h(t)\, v(t), \qquad \theta_\mathrm{DOA}(t) = \frac{\tilde{r}_\mathrm{DOA}(t)}{\|\tilde{r}_\mathrm{DOA}(t)\|}. \qquad (5.19)
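The per-sample estimate of Eq. (5.19) can be sketched directly; this minimal illustration (the function name is an assumption) operates on the four channels of a B-format room impulse response:

```python
import numpy as np

def doa_from_bformat(W, X, Y, Z):
    """Per-sample DOA estimate from a first-order (B-format) room
    impulse response, following Eq. (5.19): r ~ W*[X, Y, Z], then
    normalized to a unit direction vector (sketch)."""
    r = np.stack([W * X, W * Y, W * Z])
    return r / np.maximum(np.linalg.norm(r, axis=0), 1e-12)
```

In practice the channels would first be band-limited to the range in which the microphone's directional mapping is reliable, as noted above.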
Figure 5.16 shows the directional analysis of the first 100 ms of a first-order directional impulse response taken from the openair lib.¹ This response was measured in St. Andrew's Church Lyddington, UK (2600 m³ volume, 11.5 m source-receiver distance) with a Soundfield SPS422B microphone.
1 http://www.openairlib.net.
Fig. 5.16 Spatial distribution of the first 100 ms of a first-order directional impulse response measured at St. Andrew's Church Lyddington. Brightness and size of the circles indicate the level

Fig. 5.17 Frequency-dependent reverberation time calculated from original and SDM-enhanced impulse responses without spectral decay recovery (1st-order original, and 1st-, 3rd-, 5th-, and 20th-order SDM)
The direct sound from the front is clearly visible, as well as strong early reflections
from front and back, and equally distributed weak directions from the diffuse reverb.
Spectral decay recovery for higher-order RIRs. The second task mentioned above is that the multiplication of h(t) by y_N[θ_DOA(t)] to obtain h̃(t) degrades the spectral decay at higher orders. Without further processing, the resulting response typically exhibits a noticeably increased spectral brightness [38, 41, 43]. This unnatural
brightness mainly affects the diffuse reverberation tail, where temporal disjointness
is a poor assumption. There, the corresponding rapid changes of θ DOA (t) cause a
strong amplitude modulation in the pre-processing of the late room impulse response
at high Ambisonic orders. Typically, long decays of low frequencies leak into high
frequencies, and hereby result in an erroneous spectral brightening of the diffuse tail.
Figure 5.17 analyses the behavior in terms of an erroneous increase of reverberation
time at high frequencies, especially when using high orders.
In order to equalize the spectral decay and hereby the reverberation time of the
SDM-enhanced impulse response, there is a helpful pseudo-allpass property of the
spherical harmonics for direct and diffuse fields, as described in Eqs. (A.52) and
(A.55) of Appendix A.3.7. The signals in the vector \tilde{h}_N(t) = [\tilde{h}_n^m(t)]_{nm} are first decomposed into frequency bands, yielding the sub-band responses \tilde{h}_n^m(t, b). We can equalize the spectral sub-band decay for every band b and order n by targeting fulfillment of the pseudo-allpass property

\sum_{m=-n}^{n} E\{|h_n^m(t,b)|^2\} = (2n+1)\, E\{|h_0^0(t,b)|^2\}. \qquad (5.20)
The formulation above relies on the correct spectral decay of the omnidirectional signal h_0^0(t,b) = \tilde{h}_0^0(t,b), which is unaffected by modulation. Correction is achieved by

h_n^m(t,b) = \tilde{h}_n^m(t,b)\, \sqrt{\frac{(2n+1)\, E\{|\tilde{h}_0^0(t,b)|^2\}}{\sum_{m=-n}^{n} E\{|\tilde{h}_n^m(t,b)|^2\}}}; \qquad (5.21)

here, the expression E\{|\cdot|^2\} refers to estimation of the squared signal envelope.
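The correction of Eq. (5.21) can be sketched as below for one frequency band; the moving-average envelope estimator and the function name are assumptions of this illustration:

```python
import numpy as np

def decay_recovery(h_sub, win=64):
    """Sketch of Eq. (5.21) for one band b: rescale the 2n+1 signals of
    every order n so their summed squared envelope matches (2n+1) times
    the omnidirectional envelope. h_sub is an ACN-ordered
    (N+1)**2 x T array of sub-band responses."""
    def env2(x):   # smoothed squared envelope as estimate of E{|.|^2}
        return np.convolve(x ** 2, np.ones(win) / win, mode="same")
    omni2 = env2(h_sub[0])
    out = np.array(h_sub, dtype=float)
    n_max = int(np.sqrt(h_sub.shape[0])) - 1
    for n in range(1, n_max + 1):
        idx = slice(n * n, (n + 1) * (n + 1))   # ACN indices of order n
        sum2 = sum(env2(c) for c in h_sub[idx])
        out[idx] *= np.sqrt((2 * n + 1) * omni2
                            / np.maximum(sum2, 1e-12))
    return out
```

The omnidirectional channel itself is left untouched; only the higher orders are rescaled towards the pseudo-allpass target of Eq. (5.20).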
Perceptual evaluation. Frank's 2016 experiments [44] measuring the area of the sweet spot also investigated the plausibility of reverberation created by Ambisonically SDM-processed measurements at different order settings, N = 1, 3, 5. For Fig. 5.18b, listeners indicated at which distance from the room's center they heard that envelopment began to collapse to the nearest loudspeakers. One can observe that rendering diffuse reverberation for a large audience benefits from a high Ambisonic order. Moreover, experiments in [43] revealed an improvement of the perceived spatial depth mapping, i.e. a clearer separation between foreground and background sound for the SDM-processed higher-order reverberation, cf. Fig. 1.21b.
Fig. 5.18 The perceptual sweet spot size as investigated by Frank [44] for SDM-processed RIRs covers an area in the IEM CUBE that increases with the SDM order N chosen (black = 5th, gray = 3rd, light gray = 1st order Ambisonics); (a) sweet spot experiment, (b) median size for diffuse sound. In comparison to panned direct sound, one should keep some distance to the loudspeakers to avoid breakdown of envelopment
5.8 Resolution Enhancement: DirAC, HARPEX,
COMPASS
The concept of parametric audio processing [5] describes ways to obtain resolution-enhanced first-order Ambisonic recordings by parametric decomposition and rendering. One main idea is to decompose short-term stationary signals of a sound scene
into a directional and a less directional diffuse stream.
For synthesis of the directional part based on mono signals, it is clear how to obtain the narrowest presentation: by amplitude panning or higher-order Ambisonic panning with consistent r_E vector predictions, as in Chap. 2.
The synthesis of diffuse and enveloping parts based on a mono signal can require
extra processing such as either widening/diffuseness effects or reverberation as in
Sects. 5.5 and 5.6, which both also provide a directionally wide distribution of sound.
Or more practically, the recording itself could deliver sufficiently many uncorrelated
instances of the diffuse sound to be played back by surrounding virtual sources.
Envelopment and diffuseness rely on providing a consistently low interaural covariance or cross-correlation, i.e. sufficiently strong decorrelation.
DirAC. A main goal of DirAC (Directional Audio Coding [5]) is finding signals
and parameters for sound rendering by analyzing first-order Ambisonic recordings.
One variant is to use the intensity-vector-based analysis in the short-term Fourier
transform (STFT), see also Appendix A.6.2:
r_\mathrm{DOA}(t,\omega) = -\frac{\rho c\, \mathrm{Re}\{p(t,\omega)^*\, v(t,\omega)\}}{|p(t,\omega)|^2} = \frac{\mathrm{Re}\{W(t,\omega)^*\, [X(t,\omega), Y(t,\omega), Z(t,\omega)]^T\}}{\sqrt{2}\, |W(t,\omega)|^2}, \qquad (5.22)

which can be treated similarly as the r_E vector, regarding direction and diffuseness \psi = 1 - \|r_\mathrm{DOA}\|.
Single-channel DirAC is Ville Pulkki's original way to decompose the W(t, ω) signal in the STFT domain into a directional signal \sqrt{1-\psi}\, W(t,\omega) that is synthesized by amplitude panning, and a diffuse signal \sqrt{\psi}\, W(t,\omega) to be synthesized diffusely [45].
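The per-bin analysis and split can be sketched as follows, assuming first-order STFT bins as input (the function name is an assumption, and the diffuseness is clipped for numerical safety):

```python
import numpy as np

def dirac_split(W, X, Y, Z):
    """Per-bin DirAC parameters from first-order STFT bins following
    Eq. (5.22): DOA vector r, diffuseness psi, and the split of W
    into a directional and a diffuse stream (sketch)."""
    r = np.real(np.conj(W) * np.stack([X, Y, Z])) \
        / np.maximum(np.sqrt(2) * np.abs(W) ** 2, 1e-12)
    psi = np.clip(1 - np.linalg.norm(r, axis=0), 0.0, 1.0)
    return np.sqrt(1 - psi) * W, np.sqrt(psi) * W, r
```

For a single plane wave the DOA vector has unit length, so the diffuseness vanishes and the entire W signal is routed to the directional (panned) stream.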
Virtual-microphone DirAC uses a first-order Ambisonic decoder for the given loudspeaker layout and time-frequency-adaptive sharpening masks increasing the focus of direct sounds, see Vilkamo [46] and [5, Ch. 6], or, e.g., Sect. 5.2.3. Playback of diffuse sounds benefits from an optional diffuseness effect.
HARPEX (high angular-resolution plane-wave expansion [47]) is Svein Berge's patented solution to optimally decode sub-band signals. It is based on the observation he made with Natasha Barrett that decoding to a tetrahedral loudspeaker layout performs perceptually best if the tetrahedron nodes are rotationally aligned with the sources of the recording. HARPEX accomplishes convincing diffuse and direct sound reproduction by decoding to a variably adapted virtual loudspeaker layout in every sub-band. The layout is adaptively rotation-aligned with sources detected in the band. HARPEX is typically described using an estimator for direction pairs.
120
5 Signal Flow and Effects in Ambisonic Productions
COMPASS (COding and Multidirectional Parameterization of Ambisonic Sound
Scenes [48]) by Archontis Politis can be seen as an extension of DirAC. In contrast
to DirAC, it tries to detect and separate multiple direct sound sources from the ambient
or background sound. This is done by applying two different kinds of beamformers:
one that contains only the direct sound for each sound source (source signals) and
one that contains everything but the direct sound (ambient signal). As before, the source signals are reproduced using amplitude panning, and the ambient signal is
sent to the decorrelator. In contrast to DirAC, COMPASS is not limited to first-order
input but can also enhance the spatial resolution of higher-order inputs.
5.9 Practical Free-Software Examples
5.9.1 IEM, ambix, and mcfx Plug-In Suites
The ambix_converter is an important tool when adapting between the different Ambisonic scaling conventions, e.g. the standard SN3D normalization that uses only \sqrt{\frac{2-\delta_m}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}} for normalization instead of the full \sqrt{\frac{(2-\delta_m)(2n+1)}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}} that is called N3D, see Fig. 5.19. This deviating definition is a practical choice of the ambix format [49] to avoid high-order channels becoming louder than the zeroth-order channel. Also it permits to adapt between channel sequences such as ACN's i = n² + n + m or SID's i = n² + 2(n − |m|) + (m < 0). It is advisable to use test
recordings with the main directions, e.g. front, left, top, and to check that the channel
separation for decoded material is roughly exceeding 20 dB for 5th-order material.
Moreover, it contains inversion of the Condon-Shortley phase that typically causes a 180° rotation around the z axis, and it contains the left-right, front-back, and top-bottom flips discussed in the mirroring operations above.
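The per-order relation between the two normalizations can be sketched directly; this minimal illustration (the function name is an assumption) converts ACN-ordered SN3D channels to N3D:

```python
import numpy as np

def sn3d_to_n3d(ambi):
    """Convert ACN-ordered SN3D channels to N3D: the ratio of the two
    normalizations above is sqrt(2n+1) per order n (sketch)."""
    out = np.array(ambi, dtype=float)
    N = int(np.sqrt(len(out))) - 1
    for n in range(N + 1):
        out[n * n:(n + 1) * (n + 1)] *= np.sqrt(2 * n + 1)
    return out
```

The inverse conversion divides by the same per-order factor, so the zeroth-order channel is identical in both conventions.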
Fig. 5.19 ambix_converter
plug-in
The ambix_warping plugin, see Fig. 5.20, implements the above-mentioned warping operations, shifting horizontal sounds towards one of the poles, or into both polar directions. Warping can be applied to any direction other than zenith and nadir by placing it between two mutually inverting ambix_rotation or IEM SceneRotator objects that intermediately rotate zenith to another direction.
The IEM SceneRotator, like the ambix_rotation plugin, can be controlled by head tracking and is essential for an immersive headphone-based experience, see Fig. 5.21. Its processing is done as described above.
The ambix_directional_loudness plugin in Fig. 5.22 implements the abovementioned directional amplitude window in either circular or equi-rectangular spherical shape. Several of these windows can be made, soloed, and remote controlled,
each one of which allowing to set a gain for the inside and outside region. This is
often useful in practice, when, e.g., reinforcing or attenuating desired or undesired
signal parts within an Ambisonic scene.
Fig. 5.20 ambix_warping plug-in in Reaper
Fig. 5.21 IEM SceneRotator and ambix_rotator plug-ins
122
5 Signal Flow and Effects in Ambisonic Productions
Fig. 5.22 ambix_directional_loudness plug-in
Fig. 5.23 EnergyVisualizer plug-in
To observe the changes made to the Ambisonic scene, the IEM Energy
Visualizer can be helpful, see Fig. 5.23.
If, for instance, the Ambisonic scene requires dynamic compression, as outlined
in the section above, the IEM OmniCompressor is a helpful tool. It uses the omnidirectional Ambisonic channel to derive the compression gains (as a side-chain for all
other Ambisonic channels). Similarly as the directional_loudness plug-in, the
IEM DirectionalCompressor allows to select a window, but this time for setting
different dynamic compression within and outside the selected window, see Fig. 5.24.
The multichannel mcfx_filter plugin in Fig. 5.25 does not only implement a set
of parametric equalizers, a low- and high cut that can be toggled between filter skirts
5.9 Practical Free-Software Examples
123
Fig. 5.24 OmniCompressor and DirectionalCompressor plug-in
Fig. 5.25 mcfx_filter plug-in
of either 2nd or 4th order, but it also features a real-time spectrum analyzer to observe
the changes done to the signal. It is not only practical for Ambisonic purposes, it’s
just a set of parametric filters that is equally applied to all channels and controlled
from one interface.
The mcfx_convolver plug-in in Fig. 5.26 is useful for many purposes, also
scientific ones, e.g., when testing binaural filters or driving multi-channel arrays with
filters, etc. Its configuration files use the jconvolver format that specifies which
filter file (typically stored in multi-channel wav files) connects which of its multiple
inlets to which of its multiple outlets. It is also used to implement the SDM-based
reverberation described in the above sections.
For a cheaper reverberation network, the IEM FDNReverb network described
above can be used, see Fig. 5.27. It is not in particular an Ambisonic tool, but can
124
5 Signal Flow and Effects in Ambisonic Productions
Fig. 5.26 mcfx_convolver plug-in
Fig. 5.27 FDNReverb plug-in
be used in any multi-channel environment. The particularity of the implementation
in the IEM suite is that a slow onset can be adjusted.
The ambix_widening plug-in in Fig. 5.28 implements the widening by frequencydependent, dispersive rotation of the Ambisonic scene around the z axis as described
above. It can also be used to cheaply stylize lateral reflections instead of the IEM
RoomEncoder (Fig. 4.36) with time constant settings exceeding 5 ms, or just as a
5.9 Practical Free-Software Examples
125
Fig. 5.28 ambix_widening plug-in in Reaper
Fig. 5.29 mcfx_gain_delay plug-in
widening effect. The setting single-sided permits to suppress the slow attack of the
Bessel sequence.
Another tool is quite helpful, the mcfx_gain_delay plug-in in Fig. 5.29. It permits
to to solo or mute individual channels, as well as delay and attenuate them differently.
What is more and often even more useful: It is invaluably helpful for testing the signal
chain, as one can step through the channels with different signals.
5.9.2 Aalto SPARTA
The SPARTA plug-in suite by Aalto University provides Ambisonic tools for encoding, decoding on loudspeakers and headphones, as well as visualization. A special
feature is the COMPASS decoder plug-in Fig. 5.30 that can increase the spatial resolution of first-, second-, and third-order recordings. Playback can be done either
126
5 Signal Flow and Effects in Ambisonic Productions
Fig. 5.30 COMPASS Decoder plug-in
on arbitrary loudspeaker arrangements or their virtualization on headphones. The
signal-dependent parametric processing allows to adjust the balance between direct
and diffuse sound in each frequency band. In order to suppress artifacts due to the
processing, the parametric playback (Par) can be mixed with the static decoding
(Lin) of the original recording. While it is advisable to keep the parametric contribution below 2/3 for noticable directional improvements and low artifacts, in general,
in recordings with cymbals or hihats it is advisable to fade towards lin starting at
around 4 kHz.
5.9.3 Røde
The Soundfield plug-in by Røde in Fig. 5.31 was originally designed to process the
signals from the four cardioid microphone capsules of their Soundfield microphone.
However, it also supports first-order Ambisonics as input format. It can decode to
various loudspeaker arrangements by placing virtual microphones into the directions
of the loudspeakers. The directivity of each virtual microphone can be adjusted
between first-order cardioid and hyper-cardioid. Moreover, higher-order directivity
patterns are possible using a parametric signal-dependent processing, resulting in an
increase of the spatial resolution.
References
127
Fig. 5.31 Soundfield by Røde plug-in
References
1. D.G. Malham, Higher-order ambisonic systems for the spatialisation of sound, in Proceedings
of the ICMC (Bejing, 1999)
2. M. Frank, F. Zotter, A. Sontacchi, Producing 3D audio in ambisonics, in 57th International
AES Conference, LA, The Future of Audio Entertainment Technology? Cinema, Television and
the Internet (2015)
3. M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts
and broadcasts. J. Audio Eng. Soc. 65(9) (2017)
4. D. Rudrich, F. Zotter, M. Frank, Efficient spatial ambisonic effects for live audio, in 29th TMT,
Cologne, November 2016
5. V. Pulkki, S. Delikaris-Manias, A. Politis (eds.), Spatial Decomposition by Spherical Array
Processing (Wiley, New Jersey, 2017)
6. A. Politis, Microphone array processing for parametric spatial audio techniques, Ph.D. dissertation, Aalto University, Helsinki (2016)
7. B. Bernschütz, Microphone arrays and sound field decomposition for dynamic binaural recording, Ph.D. Thesis, TU Berlin (TH Köln) (2016), http://dx.doi.org/10.14279/depositonce-5082
8. B. Bernschütz, A.V. Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves
with reduced modal order. Acta Acustica u. Acustica 100(5) (2014)
128
5 Signal Flow and Effects in Ambisonic Productions
9. Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. 141(6)
(2017)
10. C. Andersson, Headphone auralization of acoustic spaces recorded with spherical microphone
arrays, M. Thesis, Chalmers TU, Gothenburg (2017)
11. M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by
head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc.
Am. 143(6), 3616–3627 (2018)
12. E. Kurz, F. Pfahler, M. Frank, Comparison of first-order ambisonic microphone arrays, in
International Conference, Spatial Audio (ICSA) (2015)
13. E. Hendrickx, P. Stitt, J.-C. Messonier, J.-M. Lyzwa, B.F. Katz, C. Boishéraud, Influence of
head tracking on the externalization of speech stimuli for non-individualized binaural synthesis.
J. Ac. Soc. Am. 141(3), 2011–2023 (2017)
14. H. Lee, C. Gribben, Effect of vertical microphone layer spacing for a 3d microphone array. J.
Audio Eng. Soc. 62(12), 870–884 (2014)
15. R. Wallis, H. Lee, The reduction of vertical interchannel cross talk: The analysis of localisation
thresholds for natural sound sources, MDPI Appl. Sci. 7(3) (2017)
16. H. Lee, M. Frank, F. Zotter, Spatial and timbral fidelities of binaural ambisonics decoders for
main microphone array recordings, in AES IIA Conference (York, 2019)
17. M. Kronlachner, F. Zotter, Spatial transformations for the enhancement of ambisonic recordings, in Proceedings of the ICSA, Erlangen (2014)
18. M. Hafsati, N. Epain, J. Daniel, Editing ambisonic sound scenes, in 4th International Conference, Spatial Audio (ICSA) (Graz, 2017)
19. G. Aubert, An alternative to wigner d-matrices for rotating real spherical harmonics. AIP
Adv. 3(6), 062121 (2013). https://doi.org/10.1063/1.4811853
20. F. Zotter, Analysis and synthesis of sound-radiation with spherical arrays, PhD. Thesis, University of Music and Performing Arts, Graz (2009)
21. D. Pinchon, P.E. Hoggan, Rotation matrices for real spherical harmonics: general rotations of
atomic orbitals in space-fixed axes. J. Phys. A 40, 1751–8113 (2007)
22. J. Ivanic, K. Ruedenberg, Rotation matrices for real spherical harmonics. Direct determination
by recursion. J. Phys. Chem. 100, 6342–6347 (1996)
23. J. Ivanic, K. Ruedenberg, Additions and corrections. J. Phys. Chem. A 102, 9099–9100 (1998)
24. C.H. Choi, J. Ivanic, M.S. Gordon, K. Ruedenberg, Rapid and stable determination of rotation
matrices between spherical harmonics by direct recursion. J. Chem. Phys. 111, 8825–8831
(1999)
25. N.A. Gumerov, R. Duraiswami, Recursions for the computation of multipole translation and
rotation coefficients for the 3-d helmholtz equation. SIAM 25, 1344–1381 (2003)
26. J.R. Driscoll, D.M.J. Healy, Computing fourier transforms and convolutions on the 2-sphere.
Adv. Appl. Math. 15, 202–250 (1994)
27. M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in prepr. 3345, 92nd AES Convention (Vienna, 1992)
28. S.P. Lipshitz, J. Vanderkoy, In-phase crossover network design. J. Audio Eng. Soc. 34(11),
889–894 (1986)
29. U. Zölzer, Digital Audio Effects, 2nd edn. (Wiley, New Jersey, 2011)
30. M.A. Gerzon, Signal processing for simulating realistic stereo images, in prepr. 3423, Convention Audio Engineering Society (San Francisco, 1992)
31. M.-V. Laitinen, T. Pihlajamäki, C. Erkut, V. Pulkki, Parametric time-frequency representation
of spatial sound in virtual worlds. ACM TAP 9(2) (2012)
32. F. Zotter, M. Frank, M. Kronlachner, J. Choi, Efficient phantom source widening and diffuseness
in ambisonics, in EAA Symposium on Auralization and Ambisonics (Berlin, 2014)
33. F. Zotter, M. Frank, Efficient phantom source widening. Arch. Acoust. 38(1) (2013)
34. F. Zotter, M. Frank, Phantom source widening by filtered sound objects, in prepr. 9728, Convention, Audio Engineering Society (Berlin, 2017)
References
129
35. J. Stautner, M. Puckette, Designing multi-channel reverberators. Comput. Music J. 6(1), 52–65
(1982)
36. J.-M. Jot, A. Chaigne, Digital delay networks for designing artificial reverberators, in 90th AES
Convention, prepr. 3030 (Paris, 1991)
37. D. Rocchesso, Maximally diffusive yet efficient feedback delay networks for artificial reverberation. IEEE Signal Process. Lett. 4(9) (1997)
38. S. Tervo, J. Pätynen, A. Kuusinen, T. Lokki, Spatial decomposition method for room impulse
responses. J. Audio Eng. Soc. 61(1/2), 17–28 (2013), http://www.aes.org/e-lib/browse.cfm?
elib=16664
39. J. Merimaa, V. Pulkki, Spatial impulse response rendering i: Analysis and synthesis. J. Audio
Eng. Soc. 53(12), 1115–1127 (2005)
40. V. Pulkki, J. Merimaa, Spatial impulse response rendering ii: Reproduction of diffuse sounds
and listening tests. J. Audio Eng. Soc. 54(1/2), 3–20 (2006)
41. M. Zaunschirm, M. Frank, F. Zotter, Brir synthesis using first-order microphone arrays, in
prepr. 9944, 144th AES Convention (Milano, 2018)
42. C. Pörschmann, P. Stade, J.M. Arend, Binauralization of omnidirectional room impulse
responses – algorithm and technical evaluation,” in DAFx-17 (Edinburgh, 2017)
43. M. Frank, F. Zotter, Spatial impression and directional resolution in the reproduction of reverberation, in Fortschritte der Akustik - DEGA (Aachen, 2016)
44. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Convention (2017)
45. V. Pulkki, Spatial sound reproduction with directional audio conding. J. Audio Eng. Soc. 55(6),
503–516 (2007)
46. J. Vilkamo, T. Lokki, V. Pulkki, Directional audio coding: virtual microphone-based synthesis
and subjective evaluation. J. Audio Eng. Soc. 57(9) (2009)
47. S. Berge, Device and method converting spatial audio signal, US Patent, no. 8,705,750 B2
(2014)
48. A. Politis, S. Tervo, V. Pulkki, Compass: Coding and multidirectional parameterization of
ambisonic sound scenes, in 2018 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP) (IEEE, 2018), pp. 6802–6806
49. C. Nachbar, F. Zotter, A. Sontacchi, E. Deleflie, Ambix - suggesting an ambisonic format, in
Proceedings of the 3rd Ambisonics Symposium (Lexington, Kentucky, 2011)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 6
Higher-Order Ambisonic Microphones
and the Wave Equation (Linear, Lossless)
… a turning point has been the design of HOA microphones,
opening an exciting experimental field in terms of real 3D sound
field recording …
Jérôme Daniel [1] at Ambisonics Symposium 2009.
Abstract Unlike pressure-gradient transducers, single-transducer microphones with
higher-order directivity apparently turned out to be difficult to manufacture at reasonable audio quality. Therefore nowadays, higher-order Ambisonic recording with
compact devices is based on compact spherical arrays of pressure transducers. To
prepare for higher-order Ambisonic recording based on arrays, we first need a model
of the sound pressure that the individual transducers of such an array would receive
in an arbitrary surrounding sound field. The lossless, linear wave equation is the
most suitable model to describe how sound propagates when the sound field is composed of surrounding sound sources. Fundamentally, the wave equation models sound
propagation by how small packages of air react (i) when being expanded or compressed by a change of the internal pressure, and to (ii) directional differences in the
outside pressure by starting to move. Based there upon, the inhomogeneous solutions of the wave equation describe how an entire free sound field builds up if being
excited by an omnidirectional sound source, as a simplified model of an arbitrary
physical source, such as a loudspeaker, human talker, or musical instrument. After
adressing these basics, the chapter shows a way to get Ambisonic signals of high
spatial and timbral quality from the array signals, considering the necessary diffusefield equalization, side-lobe suppression, and trade off between spatial resolution
and low-frequeny noise boost. The chapter concludes with application examples.
Gary Elko and Jens Meyer are the well-known inventors of the first commercially
available compact spherical microphone array that is able to record higher-order
Ambisonics [2], the Eigenmike. There are several inspiring scientific works with
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_6
131
132
6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
valuable contributions that can be recommended for further reading [3–12], above
all Boaz Rafaely’s excellent introductory book [13].
This mathematical theory might appear extensive, but it cannot be avoided when
aiming at an in-depth understanding of higher-order Ambisonic microphones. The
theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers
may want to skip the physical introduction and resume in Sect. 6.5 on spherical
scattering or Sect. 6.6 on the processing of the array signals.
6.1 Equation of Compression
Wave propagation involves reversible short-term temperature fluctuations becoming
effective when air is being compressed by sound, causing the specific stiffness of
air in sound propagation. The Appendix A.6.1 shows how to derive this adiabatic
compression relation based on the first law of thermodynamics and the ideal gas
law. It relates the relative volume change VV0 to the pressure change p = −K VV0
by the bulk modulus of air. After expressing the bulk modulus by more common
constants1 K = ρ c2 and differentially formulating the volume change over time
using the change of the sound particle velocity in space, e.g. in one dimension ṗ =
x
, cf. Appendix A.6.1, we get the three-dimensional compression equation:
−ρ c2 ∂v
∂x
∂p
= −ρ c2 ∇ T v.
∂t
(6.1)
∂
Here the inner product of the Del symbol ∇ T = ( ∂∂x , ∂∂y , ∂z
) with v yields what is
∂v
x
z
+ ∂ yy + ∂v
. The equation means: Indepencalled divergence div(v) = ∇ T v = ∂v
∂x
∂z
dently of whether the outer boundaries of a small package of air are traveling at a
common velocity: If there are directions into which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease
of interior pressure over time.
6.2 Equation of Motion
The equation of motion is relatively simple to understand from the Newtonian equax
equates the external force to
tion of motion, e.g. for the x direction, Fx = m ∂v
∂t
. For a small package of
mass m times acceleration, i.e. increase in velocity ∂v
∂t
air with constant volume V0 = xyz, the mass is obtained by the air density m = ρ V0 , and the force equals the decrease of in pressure over the three
space directions, times the corresponding partial surface, e.g. for the x direction
1 Typical
constants are: density ρ = 1.2 kg/m3 , speed of sound c = 343 m/s.
6.2 Equation of Motion
133
Fx = −[ p(x + x) − p(x)]yz. For the x direction, this yields after expanding
by x
x
∂vx
p
−
V0 = ρ V0
.
x
∂t
Dividing by −V0 and letting V0 → 0, we obtain the typical shape of the equation of
motion for all three space directions
∇ p = −ρ
∂v
.
∂t
(6.2)
The equation of motion means: Independently of the common exterior pressure load
on all the outer boundaries of a small air package, an outer pressure decrease
into any direction implies a corresponding pushing force on the package causing a
proportional acceleration into this direction.
6.3 Wave Equation
We can combine the compression equation
∂p
∂t
= −ρ c2 ∇ T v with the equation of
motion ∇ p = −ρ ∂v
by deriving the first one with regard to time ∂∂t 2p = −ρ c2 ∇ T ∂v
∂t
∂t
and the second one with the gradient ∇ T yielding the Laplacian ∇ T ∇ = , hence
. Division of the first result by c2 and equating both terms yields the
p = −ρ∇ T ∂v
∂t
2
lossless wave equation p = c12 ∂t∂ 2 p that is typically written as
−
2
1 ∂2 p = 0.
c2 ∂t 2
(6.3)
Obviously, the wave equation relates the curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order derivative).
If p is a pure sinusoidal oscillation sin(ω t + φ0 ), the second derivative in time
corresponds to a factor −ω2 , and by substitution with the wave-number k = ωc , we
can write the frequency-domain wave equation as
( + k 2 ) p = 0,
Helmholtz equation.
(6.4)
6.3.1 Elementary Inhomogeneous Solution: Green’s
Function (Free Field)
The Green’s function is an elementary prototype for solutions to inhomogeneous
problems ( + k 2 ) p = −q, which is defined as
+ k 2 G = −δ.
134
6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
A general excitation q ofthe equation can be represented by its convolution with the
Dirac delta distribution q(s) δ(r − s) dV (s) = q(r). Consequently, as the wave
equation is linear, the general solution must therefore also
equal the convolution of
the Green’s function with the excitation function p(r) = q(s) G(r − s) dV (s) over
space; if formulated in the time domain: also over time. The integral superimposes
acoustical responses of any point in time and space of the source phenomenon,
weighted by the corresponding source strength in space and time.
The Green’s function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),
G=
e−ik r
,
4πr
(6.5)
with the wave number k = ωc and distance between source and receiver
r = r − r s 2 .
Acoustic source phenomena are characterized by the behavior of the Green’s
function: far away, the amplitude decays with r1 and the phase −kr = −ω rc corresponds to the radially increasing delay rc . Both is expressed in Sommerfeld’s radiation
condition limr →∞ r ∂r∂ p + ik p = 0.
Plane waves. The radius coordinate of the Green’s function is the distance between
two Cartesian position vectors r s and r, the source and receiver location. Letting one
of them become large is denoted by re-expressing it in terms of radius and direction
vector r s = rs θ s . This permits far-field approximation
rs = r s − r =
(rs θ s − r)T (rs θ s − r) =
lim rs = lim rs 1 − 2
rs →∞
rs →∞
θ Ts r
rs
+ rr2 = rs − θ Ts r.
s
2
rs2 − 2rs θ Ts r + r 2
(with lim
x→0
√
(6.6)
1 − 2x = 1 − x).
For the phase approximation, for instance at a wave-length of 30 cm, we notice even
for a relatively small distance difference, e.g. between 15 m and 15 m + 15 cm, we
could change the sign of the wave. To approximate the phase of the Green’s function,
we must therefore at least use rs − θ Ts r as approximation. By contrast, this level of
precision is irrelevant for the magnitude approximation, e.g., it would be negligible
1
.
if we used 151m instead of the magnitude 15 m+15
cm
At a large distance rs assumed to be constant, the Green’s function is proportional
to a plane wave from the source direction θ s
lim G =
rs →∞
e−ik rs
4π rs
eik θ s r .
T
(6.7)
The plane-wave part is of unit magnitude | p| = 1
p = eik θ s
T
r
(6.8)
6.3 Wave Equation
135
and its phase evaluates the projection of the position vector onto the plane-wave
arrival direction θ s . Towards the direction θ s , the phase grows positive, i.e. the signal
arrives earlier. Towards the plane-wave propagation direction −θ s the phase grows
negatively, implying an increasing time delay, which is constant on any plane perpendicular to θ s .
Plane waves are an invaluable tool to locally approximate sound fields from
sources that are sufficiently far away, within a small region.2
6.4 Basis Solutions in Spherical Coordinates
Figure 4.11 shows spherical coordinates [14, 15] using radius r , azimuth ϕ, and
zenith ϑ. For simplification, zenith is replaced by ζ = cos ϑ = rz , here. We may
solve the Helmholtz equation ( + k 2 ) p = 0 in spherical coordinates by the radial
and directional parts of the Laplacian = r + ϕ,ζ , as identified in Appendix A.3
r =
∂2
2 ∂
,
+
2
∂r
r ∂r
ϕ,ζ =
1
∂2
1 − ζ 2 ∂2
2 ∂
+
−
ζ
.
r 2 ∂ζ 2 r 2 ∂ζ
r 2 (1 − ζ 2 ) ∂ϕ 2
(6.9)
We already know the spherical harmonics as directional eigensolutions from Sect. 4.7
ϕ,ζ Ynm = −
n(n + 1) m
Yn
r2
(6.10)
and assume them to be a factor of the solution pnm = R Ynm determining the value
of ϕ,ζ in (r + k 2 + ϕ,ζ ) pnm = 0. We find a separated radial differential equation
2
∂
after insertion, multiplication by Yr m , and re-expressing the differentials ∂r∂ = k ∂kr
and
∂2
∂r 2
∂
= k 2 ∂(kr
)2
2
(kr )2
n
∂2
∂
+ (kr )2 − n(n + 1) R = 0.
+ 2(kr )
∂(kr )2
∂(kr )
(6.11)
Appendix A.6.4 shows how to get physical solutions for R of this, so-called, spherical
Bessel differential equation: spherical Hankel functions of the second kind h (2)
n (kr )
able to represent radiation (radially outgoing into every direction), consistently with
Green’s function G, diverging with an (n + 1)-fold pole at kr = 0, a physical behavior that would also be observed after spatially differentiating G, see Fig. 6.1; spherical
Bessel functions jn (kr ) = ℜ{h (2)
n (kr )} are real-valued, converge everywhere, exhibit
2 This
is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite
energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infiniteamplitude point-source infinitely far away is required with infinite anticipation ts → +∞ (noncausal).
136
6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
1
1
ℜ{h(2)(kr)}
0
ℜ{h(2)(kr)}
1
(2)
ℜ{h (kr)}
2
(2)
ℜ{h3 (kr)}
0.5
0
0
5
10
15
ℑ{h(2)(kr)}
0
ℑ{h(2)(kr)}
0.5
1
(2)
2
(2)
ℑ{h3 (kr)}
ℑ{h (kr)}
0
0
5
10
15
(2)
40
|h0 (kr)|dB
20
|h(2)(kr)|dB
0
|h(2)(kr)|dB
−20
|h(2)(kr)|dB
0.03
1
2
3
0.1
0.32
1
3.2
10
32
(2)
Fig. 6.1 Spherical Bessel functions jn (kr ) = ℜ{h n (kr )} (top left), imaginary part of spherical
(2)
Hankel functions ℑ{h (2)
n (kr )} (top right), and magnitude/dB of |h n (kr )| (bottom), over kr
an n-fold zero at kr = 0, and can’t represent radiation. Implementations typically
rely on the accurate standard libraries implementing cylindrical Bessel and Hankel
functions:
jn (kr ) =
π 1
J 1 (kr ),
2 kr n+ 2
h (2)
n (kr ) =
π 1 (2)
H 1 (kr ).
2 kr n+ 2
(6.12)
Wave spectra and spherical basis solutions. Any sound field evaluated at a radius r
where the air is source-free and homogeneous in any direction can be represented by
spherical basis functions for enclosed jn (kr )Ynm (θ ) and radiating fields h n (kr )Ynm (θ)
∞
n
p=
bnm jn (kr ) + cnm h n (kr ) Ynm (θ) .
(6.13)
n=0 m=−n
Here, bnm are the coefficients for incoming waves that pass through and emanate
from radii larger than r and cnm are the coefficients of outgoing waves radiating
from sources at radii smaller than r ; the coefficients are called wave spectra of the
incoming and outgoing waves, cf. [16].
Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients bnm , while cnm = 0 in Eq. (6.13). The sum of incoming plane waves from
all directions, whose amplitudes are given by the spherical harmonics coefficients
χnm as a set of Ambisonic signals are described by the incoming wave spectrum, see
Appendix A.6.5, Eq. (A.119)
bnm = 4π in χnm .
(6.14)
Figure 6.2 shows a single plane wave incoming from the direction θ s represented by
bnm = 4π in Ynm (θ s )
(6.15)
6.4 Basis Solutions in Spherical Coordinates
137
Fig. 6.2 Plane wave from y axis ϕ = ϑ = π2 in horizontal cross section; time steps correspond
to 0◦ , 60◦ , 120◦ , and 180◦ phase shifts φ in the plot ℜ{ p eiφ } showing p from Eq. (6.13) with
cnm = 0 and bnm of Eq. (6.15) with bnm = 4π in Ynm ( π2 , π2 ); long wave (top), short wave (bottom);
simulation uses N = 25 and area shows |kx|, |ky| < 2π and 8π
at four different time steps corresponding to 0◦ , 60◦ , 120◦ and 180◦ time shifts for
the two wave lengths shown.
6.5 Scattering by Rigid Higher-Order Microphone Surface
Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere
of some radius r = a, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of
the sound particle velocity. The radial sound particle velocity is obtained via the
Fig. 6.3 32-channel
higher-order Ambisonic mic.
Eigenmike EM32
138
6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)
Fig. 6.4 Plane waves scattered by rigid sphere ka = π (top) or ka = 4π (bottom); time steps
correspond to 0◦ , 60◦ , 120◦ , and 180◦ phase shifts φ in the plot ℜ{ p eiφ } showing p from Eq. (6.13)
with bnm and cnm from Eq. (6.15) with bnm = 4π in Ynm ( π2 , π2 ) and Eq. (6.16); simulation uses
N = 25
equation of motion Eq. (6.2) by deriving Eq. (6.13). This requires to evaluate differentiated spherical radial solutions jn (x) as well as h (2)
n (x), which is implemented
by f n (x) = nx f n (x) − f n+1 (x) for either of the functions, cf. e.g. [16]. A sound-hard
boundary condition at the radius a requires
i
vr r =a =
ρc
∞
n
bnm jn (kr ) + cnm h (2)
n (kr )
Y m (θ)
r =a n
= 0,
n=0 m=−n
which is fulfilled by a vanishing term in square brackets. The rigid boundary responds
to incoming surround-sound by velocity-canceling outgoing waves h (2)
n (ka) cnm =
− jn (ka) bnm . The coefficients ψnm yield the sound pressure in Fig. 6.4,
jn (ka) ψnm Ynm (θ ), with ψnm = jn (kr ) − h (2)
(kr
)
bnm .
n
h (2)
n (ka) r =a
n=0 m=−n
(6.16)
∞
p=
n
The two terms of the bracket are typically further simplified by a common
denominator and recognizing the Wronskian Eq. (A.97) in the numerator
jn (x)h n (x)− jn (x)h n (x)
= x 2 hi (x)
h (x)
n
n
ψnm |r =a =
i
(ka)2
h (2)
n (ka)
bnm .
(6.17)
6.5 Scattering by Rigid Higher-Order Microphone Surface
139
0
2
(2)
−1
|(ka) h’0 (ka)| dB
−20
|(ka)2 h’(2)(ka)|−1dB
−40
|(ka)2 h’(2)
(ka)|−1dB
2
1
2
(2)
−1
|(ka) h’3 (ka)| dB
−60
0.03
0.1
0.32
1
3.2
10
32
Fig. 6.5 Attenuation/dB of Ambisonic signals of different orders for varying values of ka
Relation of recorded sound pressure to Ambisonic signal. The scattering equation relates the recorded sound pressure expanded in spherical harmonics to the
Ambisonic signal of surround sound scene, see frequency responses in Fig. 6.5,
ψnm |r =a =
4π in+1
(ka)2 h (2)
n (ka)
χnm .
(6.18)
It is formally convenient that as soon as the sound pressure is given in terms
of its spherical harmonic coefficient signals ψnm , the Ambisonic signals χnm of a
concentric playback system are obviously just an inversely filtered version thereof,
with no need for further unmixing/matrixing.
Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger
than the diameter 2a of the sphere will only weakly map to complicated high-order
patterns. It is therefore easily understood that the transfer function in+1 [(ka)2 h (2)
n
(ka)]−1 attenuates the reception of high-order Ambisonic signals at low frequencies,
see Fig. 6.5.
6.6 Higher-Order Microphone Array Encoding
The block diagram of Ambisonic encoding of higher-order microphone array signals
is shown in Fig. 6.7. The first processing step is about decomposing the pressure
samples p(t) from the microphone array into its spherical harmonics coefficients
ψ N (t): To which amount do the samples contain omnidirectional, figure-of-eight, and
other spherical harmonic patterns, up to which the microphone arrangement allows
decomposition. The frequency-independent matrix (Y TN )† does the conversion. It is
the left-inverse to the spherical harmonics sampled at the microphone positions, as
shown in the upcoming section.
The second step then sharpens the sound pressure image to an Ambisonic signal by
filtering the spherical harmonic coefficient signals. The basic relation between sound
pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a
filter for every coefficient signal, differing only in filter characteristics for different
spherical harmonic orders. Robustness to noise, microphone matching, and positioning is the key here, and it is only achieved by a careful design of these filters, as shown in the sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-r_E weighted and E-normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)

Fig. 6.6 Plane-wave sound pressure image ℜ{p e^{−ika}} on a rigid sphere with varying ka, using ψ_nm from Eq. (6.17) expanded over the spherical harmonics, p = Σ_nm ψ_nm Y_n^m, and χ_nm = Y_n^m(0, 0) for a plane wave from z. Panels (a)–(h) show diameters of 1/32, 1/16, 1/8, 1/4, 1/2, 1, 2, and 4 wave lengths, i.e. ka = π/32, π/16, π/8, π/4, π/2, π, 2π, 4π. With the wave length λ = c/f, the value ka/π = 2 (f/c) a = 2a/λ relates ka to the diameter 2a in wave lengths to express the frequency dependency; the simulation uses N = 50; for a = 4.2 cm, the ka values correspond to f = 125, 250, 500, 1000, 2000, 4000, 8000, 16000 Hz

Fig. 6.7 Higher-order Ambisonic microphone encoding: sound pressure samples p(t) are spherical-harmonics decomposed by the matrix (Y_N^T)†, and the resulting coefficient signals ψ_N(t) are converted to Ambisonic signals χ_N(t) by the sharpening filters ρ_n(ω)
6.7 Discrete Sound Pressure Samples in Spherical
Harmonics
To determine the Ambisonic signals χ_nm, we obviously need to find ψ_nm based on all sound pressure samples p(θ_i) recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients ψ_nm expanded over the spherical harmonics Y_n^m(θ_i) sampled at every microphone position. A vector and matrix notation p = [p(θ_i)]_i and Y_N^T = [y(θ_i)^T]_{i,nm} is helpful:
$$
\begin{bmatrix} p(\boldsymbol{\theta}_1)\\ \vdots\\ p(\boldsymbol{\theta}_M) \end{bmatrix}
= \begin{bmatrix} Y_0^0(\boldsymbol{\theta}_1) & \dots & Y_N^N(\boldsymbol{\theta}_1)\\ \vdots & & \vdots\\ Y_0^0(\boldsymbol{\theta}_M) & \dots & Y_N^N(\boldsymbol{\theta}_M) \end{bmatrix}
\begin{bmatrix} \psi_0^0\\ \vdots\\ \psi_N^N \end{bmatrix},
\qquad \boldsymbol{p}_N = \boldsymbol{Y}_N^{\mathrm{T}}\,\boldsymbol{\psi}_N. \quad (6.19)
$$
Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix Y_N is well-conditioned. Typically, more microphones are used than coefficients sought, M ≥ (N + 1)². Inversion is a matter of mean-square error minimization: as the M dimensions may contain more degrees of freedom than (N + 1)², the coefficient
vector ψ_N giving the closest model p_N to the measurement p is searched,

$$\min_{\boldsymbol{\psi}_N} \|\boldsymbol{e}\|^2, \qquad \text{with } \boldsymbol{e} = \boldsymbol{p}_N - \boldsymbol{p} = \boldsymbol{Y}_N^{\mathrm{T}}\boldsymbol{\psi}_N - \boldsymbol{p}. \quad (6.20)$$

The minimum-mean-square-error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

$$\boldsymbol{\psi}_N = (\boldsymbol{Y}_N\boldsymbol{Y}_N^{\mathrm{T}})^{-1}\boldsymbol{Y}_N\,\boldsymbol{p} = (\boldsymbol{Y}_N^{\mathrm{T}})^{\dagger}\,\boldsymbol{p}. \quad (6.21)$$
The resulting left inverse (Y N Y TN )−1 Y N inverts the thin matrix Y TN from the left.
(Y TN )† symbolizes the pseudo inverse; it is left-inverse for thin matrices.
If the microphones are arranged in a t-design and the order N is chosen suitably, then the transpose matrix times 4π/L is equivalent to the left inverse. A more thorough discussion on spherical point sets can be found in [17–19].
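As a sketch of how this behaves in practice, the following minimal Python example (not from the book; it assumes real-valued, orthonormal spherical harmonics up to N = 1 and a hypothetical octahedral microphone layout, which is a spherical 3-design) verifies both the left-inverse property and the 4π/L-transpose shortcut numerically:

```python
import numpy as np

# Hypothetical array layout: the 6 octahedron vertices, a spherical 3-design,
# sufficient for order N = 1, i.e. (N+1)^2 = 4 coefficients.
dirs = np.array([[ 1, 0, 0], [-1, 0, 0],
                 [ 0, 1, 0], [ 0, -1, 0],
                 [ 0, 0, 1], [ 0, 0, -1]], dtype=float)

def real_sh_order1(xyz):
    """Real-valued, orthonormal spherical harmonics up to order N = 1."""
    x, y, z = xyz.T
    c0, c1 = np.sqrt(1 / (4 * np.pi)), np.sqrt(3 / (4 * np.pi))
    return np.stack([np.full_like(x, c0),  # Y_0^0
                     c1 * y,               # Y_1^-1
                     c1 * z,               # Y_1^0
                     c1 * x])              # Y_1^1

Y = real_sh_order1(dirs)          # 4 x 6 matrix Y_N; its transpose YT is thin
YT = Y.T
left_inv = np.linalg.pinv(YT)     # (Y_N^T)^dagger = (Y_N Y_N^T)^-1 Y_N

print(np.allclose(left_inv @ YT, np.eye(4)))        # True: left-inverse property
print(np.allclose(left_inv, (4 * np.pi / 6) * Y))   # True: t-design shortcut (4*pi/L) Y
```

For a t-design, the pseudo-inverse thus degenerates to a simple scaled transpose, which is why such layouts are computationally attractive.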
The maximum-determinant points [20] are a particular kind of critical directional sampling scheme that allows using exactly as few microphones, M = (N + 1)², as there are spherical harmonic coefficients obtained, yielding a well-conditioned square matrix Y_N that can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for N = 4 are used in the simulation example below.³
Finite-order assumption and spatial aliasing. An important implication of estimating ψ_nm is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are attenuated well enough below suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows the spatial aliasing of ψ_N = (Y_N^T)^{−1} p in the angular domain p = Σ_nm ψ_nm Y_n^m.
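The mechanism can be illustrated with a toy setup (again a hypothetical octahedral N = 1 array, not an example from the book): a purely third-order pressure pattern does not vanish after decomposition, but leaks into the first-order coefficient estimate.

```python
import numpy as np

# Hypothetical N = 1 array (6 mics on an octahedron) sampling a purely
# third-order pressure pattern P_3(z): the finite-order assumption is violated.
dirs = np.array([[ 1, 0, 0], [-1, 0, 0], [ 0, 1, 0],
                 [ 0, -1, 0], [ 0, 0, 1], [ 0, 0, -1]], dtype=float)
x, y, z = dirs.T
c0, c1 = np.sqrt(1 / (4 * np.pi)), np.sqrt(3 / (4 * np.pi))
Y = np.stack([np.full(6, c0), c1 * y, c1 * z, c1 * x])   # orders 0 and 1 only

p = (5 * z**3 - 3 * z) / 2         # Legendre polynomial P_3(z), purely order 3
psi = np.linalg.pinv(Y.T) @ p      # order-1 coefficient estimate

# The order-3 content does not vanish: it aliases into the Y_1^0 coefficient.
print(np.round(psi, 3))
```

The estimated Y_1^0 coefficient is clearly nonzero although the sampled sound field contains no first-order component at all; this is spatial aliasing in its simplest form.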
6.8 Regularizing Filter Bank for Radial Filters
The filters i^{n+1} [(ka)² h_n^{(2)}(ka)]^{−1} of Fig. 6.5 exhibit an nth-order zero at 0 Hz, ka = 0. To retrieve the Ambisonic signals χ_nm from the sound pressure signals ψ_nm, their inverse would have an n-fold (unstable) pole at 0 Hz. Considering that microphone self-noise and array imperfections cause erroneous signals louder than the acoustically expected nth-order vanishing signals around 0 Hz, such filter shapes will moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders n must be stabilized by high-pass slopes of at least the order n, see also [6, 9, 21–25]. With (n + 1)th-order high-pass slopes, see Fig. 6.9, such errors are cut off by first-order high-pass slopes at exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, yielding a
³ md04.0025 on https://web.maths.unsw.edu.au/~rsw/Sphere/Images/MD/md_data.html.
Fig. 6.8 Interpolated plane-wave sound pressure image ℜ{p e^{−ika}} on a rigid-sphere array with 25 microphones allowing decomposition up to the order N = 4. Panels (a)–(h) show diameters of 1/32, 1/16, 1/8, 1/4, 1/2, 1, 2, and 4 wave lengths, i.e. ka = π/32, π/16, π/8, π/4, π/2, π, 2π, 4π; the simulation uses orders up to 25, and aliasing-free operation can only be expected within kr < N
Fig. 6.9 Filters (ka)² h_n^{(2)}(ka) in dB over frequency in Hz for the orders n = 0, ..., 4, regularized with (n + 1)th-order high-pass filters
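To get a feeling for why regularization is indispensable, a small numerical sketch (my own illustration, assuming SciPy's spherical Bessel routines with h_n^{(2)} = j_n − i y_n, and the a = 4.2 cm, c = 343 m/s values used in the text) evaluates the unregularized magnitudes (ka)² h_n^{(2)}(ka):

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def h2(n, x):
    """Spherical Hankel function of the second kind, h_n^(2) = j_n - 1j*y_n."""
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)

a, c = 0.042, 343.0                # sphere radius and speed of sound from the text
for f in [125.0, 1000.0, 8000.0]:
    ka = 2 * np.pi * f / c * a
    gains = [20 * np.log10(np.abs(ka**2 * h2(n, ka))) for n in range(5)]
    print(f"{f:6.0f} Hz:", np.round(gains, 1))
```

At 125 Hz the n = 4 magnitude already exceeds 80 dB relative to unity, i.e. microphone self-noise in the fourth-order signals would be boosted massively without the high-pass regularization.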
Fig. 6.10 Stabilizing filter bank H_0, ..., H_4 in dB over frequency in Hz: signal orders n > b are excluded from the band b
noise boost of at most 20 dB for a 4th-order microphone with a = 4.2 cm. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below it, due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensating the loudness loss in every band. A zero-phase, nth-order Butterworth high-pass response is defined by $H_\mathrm{hi} = \frac{\omega^n}{1+\omega^n}$ and is amplitude-complementary to the low-pass $H_\mathrm{lo} = \frac{1}{1+\omega^n}$, so that $H_\mathrm{hi} + H_\mathrm{lo} = 1$.
Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: the band-pass filters H_b(ω) are composed of a (b + 1)th-order high-pass and a (b + 2)th-order low-pass skirt at ω_b and ω_{b+1}, respectively, except for the band b = 0 (low-pass) and b = N (high-pass):

$$\hat{H}_0(\omega) = \frac{1}{1 + \left(\frac{\omega}{\omega_1}\right)^{2}}, \qquad
\hat{H}_b(\omega) = \frac{\left(\frac{\omega}{\omega_b}\right)^{b+1}}{1 + \left(\frac{\omega}{\omega_b}\right)^{b+1}} \cdot \frac{1}{1 + \left(\frac{\omega}{\omega_{b+1}}\right)^{b+2}}, \qquad
\hat{H}_N(\omega) = \frac{\left(\frac{\omega}{\omega_N}\right)^{N+1}}{1 + \left(\frac{\omega}{\omega_N}\right)^{N+1}}. \quad (6.22)$$

To make the bands perfectly reconstructing, the filters are normalized by the sum response

$$H_b(\omega) = \frac{\hat{H}_b(\omega)}{\sum_{b'=0}^{N} \hat{H}_{b'}(\omega)}. \quad (6.23)$$
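The construction of Eqs. (6.22) and (6.23) can be sketched in a few lines of NumPy (a minimal sketch, not the book's implementation; the cut-on frequencies are taken from the 20 dB noise-boost setting mentioned in the text, and frequency in Hz stands in for ω, which is fine for a shape check):

```python
import numpy as np

def butterworth_bands(omega, omegas, N):
    """Zero-phase Butterworth band splitting per Eqs. (6.22)/(6.23): band b has a
    (b+1)th-order high-pass skirt at omega_b and a (b+2)th-order low-pass skirt
    at omega_{b+1}; the lowest band is pure low-pass, the highest pure high-pass."""
    w = np.asarray(omega, dtype=float)
    H = []
    for b in range(N + 1):
        Hb = np.ones_like(w)
        if b > 0:                                  # high-pass skirt at omega_b
            r = (w / omegas[b]) ** (b + 1)
            Hb *= r / (1 + r)
        if b < N:                                  # low-pass skirt at omega_{b+1}
            Hb *= 1 / (1 + (w / omegas[b + 1]) ** (b + 2))
        H.append(Hb)
    H = np.array(H)
    return H / H.sum(axis=0)                       # Eq. (6.23): normalize the sum

omegas = {1: 90.0, 2: 680.0, 3: 1650.0, 4: 2600.0}   # cut-on frequencies, orders 1..4
f = np.logspace(1, 4, 50)                            # 10 Hz .. 10 kHz
H = butterworth_bands(f, omegas, N=4)
print(np.allclose(H.sum(axis=0), 1.0))               # True: perfect reconstruction
```

By construction, the normalized bands sum to unity at every frequency, and only the band b = 0 reaches down to 0 Hz.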
By adjusting the cut-on frequencies ω_b of the different orders b = 1, ..., N, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz. This filter bank design moreover allows adjusting loudness and sidelobe suppression in every frequency band separately.
6.9 Loudness-Normalized Sub-band Side-Lobe
Suppression
The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. Ideally, this variation of the order comes with the necessity of individual max-r_E sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so diffuse-field equalization of the E measure is also desirable in every band.
To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26, 27], modified by a filter bank employing diffuse-field normalized max-r_E weights in separate frequency bands b = 0, ..., N, cf. Fig. 6.11, with the nth order discarded for bands b < n:

$$\rho_n(\omega) = \sum_{b=n}^{N} a_{n,b}\, H_b(\omega)\; \mathrm{i}^{-n-1} (ka)^2 h_n^{(2)}(ka)\, e^{\mathrm{i}ka}. \quad (6.24)$$

Here, e^{ika} removes the linear phase of h_n^{(2)}, and a_{n,b} is the set of diffuse-field (√E) equalized max-r_E weights for the band b, in which the Ambisonic orders retrieved are 0 ≤ n ≤ b:
$$a_{n,b} = \begin{cases}
P_n\!\left(\cos\frac{137.9^\circ}{b+1.51}\right)\sqrt{\dfrac{\sum_{n=0}^{N}(2n+1)\,P_n^2\!\left(\cos\frac{137.9^\circ}{N+1.51}\right)}{\sum_{n=0}^{b}(2n+1)\,P_n^2\!\left(\cos\frac{137.9^\circ}{b+1.51}\right)}}, & \text{for } n \le b,\\
0, & \text{otherwise.}
\end{cases} \quad (6.25)$$
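The weights of Eq. (6.25) can be sketched numerically; the following minimal implementation (my own sketch, using SciPy's Legendre evaluation) confirms that the √E equalization makes every band carry the same E measure:

```python
import numpy as np
from scipy.special import eval_legendre

def max_re_weights(N):
    """Diffuse-field (sqrt E) equalized max-r_E weights a_{n,b} per Eq. (6.25),
    for bands b = 0..N; orders n > b are discarded in band b."""
    def g(n, order):                 # max-r_E weight of order n for design order `order`
        return eval_legendre(n, np.cos(np.deg2rad(137.9 / (order + 1.51))))
    E_ref = sum((2 * n + 1) * g(n, N)**2 for n in range(N + 1))
    a = np.zeros((N + 1, N + 1))     # a[n, b]
    for b in range(N + 1):
        E_b = sum((2 * n + 1) * g(n, b)**2 for n in range(b + 1))
        for n in range(b + 1):
            a[n, b] = g(n, b) * np.sqrt(E_ref / E_b)
    return a

a = max_re_weights(4)
# Equalization check: every band carries the same E measure
E = [(np.arange(5) * 2 + 1) @ (a[:, b]**2) for b in range(5)]
print(np.allclose(E, E[0]))          # True
```

The normalization cancels by construction, so Σ_n (2n + 1) a_{n,b}² is identical for all bands b, which is exactly the diffuse-field loudness matching the text asks for.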
Figure 6.12 shows the polar patterns of the corresponding direction-spread functions.
For the implementation of ρn (ω) by fast block filtering, ω = 2π f and k = ω/c are
uniformly sampled with frequency, and the inverse discrete Fourier transform yields
the associated impulse responses (attention: the value at 0 Hz must be replaced for
stable results, and cyclic time-domain shifts and windows are necessary).
The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6.13, and it
has minimal side lobes.
Fig. 6.11 Filter-bank-regularized, diffuse-field equalized, max-r_E weighted spherical microphone array responses in dB over frequency in Hz, using ρ_n(ω) = Σ_{b=n}^{N} a_{n,b} H_b(ω) (ka)² h_n^{(2)}(ka)
Fig. 6.12 Diffuse-field equalized (to E = 1) max-r_E direction-spread functions for the bands b = 0, ..., 4; even orders are plotted on the upper, odd orders on the lower semi-circle
Fig. 6.13 Direction spread in dB over frequency in Hz in a zenithal cross section in degrees through the Ambisonic signal of the simulated microphone processing response to a plane wave from zenith, with the parameters a = 4.2 cm, M = 25 mics, max-r_E weighted in bands with 90, 680, 1650, 2600 Hz for the cut-on of the orders 1, 2, 3, 4. The simulation is done with the order N_sim = 30 and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be accurate up to < ±0.5 dB; the map shows the direction spread normalized to its value at 0° for every frequency to make its shape easier to read
6.10 Influence of Gain Matching, Noise, Side-Lobe
Suppression
The gain matching between typical microphones is not always more accurate than 0.5 dB. The result is that the physically dominant omnidirectional signal will leak into the higher-order signals by directionally random gain variations. However, acoustically, higher-order components are expected to be weak and to require amplification.
(a) 50, 160, 500, 1600 Hz cut-on with < ±0.5 dB gain matching, no sidelobe suppression
(b) 50, 160, 500, 1600 Hz cut-on with only 4th-order sidelobe suppression, assuming perfect gain matching
(c) 50, 160, 500, 1600 Hz cut-on with individual max-r_E sidelobe suppression per band, assuming perfect gain matching
Fig. 6.14 Influence of carelessly selected cut-on frequencies for regularization (top), and of non-individual sidelobe suppression per band (middle), in contrast to ideal results (bottom); the maps show direction spreads normalized to their values at 0° for every frequency to make side lobes easier to read
The effect on mapping is equivalent to that of microphone self-noise; however, gain mismatch yields a correlated signal exciting the microphones, whereas self-noise yields low-frequency noise.
If the regularization filters were set to 50, 160, 500, 1600 Hz and sidelobe suppression were turned off for testing, one would get the poor image of Fig. 6.14a, where high-order signals at low frequencies are highly boosted.
If a noise-free case is assumed, and only the max-r E side-lobe suppression of the
highest band is used for all bands, one gets the image in Fig. 6.14b, which improves
with individual max-r E weights in Fig. 6.14c.
Self-noise behavior. Assuming that the self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the M microphone signals p_i = N into the (N + 1)² spherical harmonic coefficient signals ψ_nm = (N + 1)²/M · N, if M ≈ (N + 1)² and the microphone arrangement permits a well-conditioned pseudo-inversion Y_N^†. The spectral change of the microphone self-noise due to the radial filters ρ_n(ω) can be described by the noise of the (2n + 1)
Fig. 6.15 Self-noise modification |G(ω)|² in dB over frequency in Hz for the filter bank configurations using the cut-on frequencies 2k, 3k, 4k, 5k (no noise amplification), 600, 2k, 3.5k, 4.2k (5 dB noise amplification), 280, 1.3k, 2.6k, 3.6k (10 dB noise amplification), 150, 950, 2k, 3.15k (15 dB noise amplification), and 90, 680, 1.65k, 2.6k (20 dB noise amplification)
signals of the same order, amplified by |ρ_n(ω)|², in comparison to the zeroth-order signal:

$$|G(\omega)|^2 = \frac{\sum_{n=0}^{N} (2n+1)\,|\rho_n(\omega)|^2}{\left|(ka)^2 h_0^{(2)}(ka)\right|^2}. \quad (6.26)$$
Figure 6.15 analyzes the noise amplification for the simulation example (max-r_E weighting in each sub-band, a = 4.2 cm) and shows the dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boosts. The trade-off here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.
Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They are measured on a 12° × 11.25° azimuth × zenith grid, yielding 480 × 256 pt impulse responses for each of the 32 transducers.
6.11 Practical Free-Software Examples
6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM
Plug-In Suites
We give a practical signal processing example for the Eigenmike em32 which is
applicable e.g. in digital audio workstations. First the 32 signals are encoded by matrix
multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order
signals. The preset (json file) is provided online http://phaidra.kug.ac.at/o:79231. The
radial filtering that sharpens the surround sound image uses mcfx-convolver, see
Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different
filter curves for the orders n = 0, . . . , 4 as defined above. The convolver presets (wav
(a) 0 dB noise boost (2k, 3k, 4k, 5k), individual max-r_E weighting
(b) 5 dB noise boost (600, 2k, 3.5k, 4.2k), individual max-r_E weights
(c) 10 dB noise boost (280, 1.3k, 2.6k, 3.6k), individual max-r_E weights
(d) 15 dB noise boost (150, 950, 2k, 3.15k), individual max-r_E weights
(e) 20 dB noise boost (90, 680, 1.65k, 2.6k), individual max-r_E weights
Fig. 6.16 Direction spread in dB over frequency in Hz and zenith in degrees of the filter bank with different settings to achieve 0, 5, 10, 15, 20 dB noise boosts; the maps show direction spreads normalized to their values at 0° at every frequency as above
(a) em32 encoding
(b) em32 radial filters
Fig. 6.17 IEM MatrixMultiplier encoding the Eigenmike em32 signals and mcfx-convolver applying radial filters to the encoded em32 recording
files and config files for mcfx-convolver) are provided online http://phaidra.kug.ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB.
Fig. 6.18 Practical equalization of the em32 transducer characteristics by two parametric shelving filters of the mcfx_filter, cf. [28]
As found in [28], the em32 transducers exhibit a frequency response that favors
low frequencies and attenuates high frequencies. This behavior is sufficiently well
equalized in practice using two parametric shelving filters, a low shelf at 500 Hz
with a gain of −5 dB, and a high shelf at 5 kHz using a gain of +5 dB, see Fig. 6.18.
6.11.2 SPARTA Array2SH
The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals, as well as calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays
Fig. 6.19 SPARTA Array2SH encoding for, e.g., em32
and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.
References
1. J. Daniel, Evolving views on HOA: from technological to pragmatic concerns, in Proceedings of the 1st Ambisonics Symposium (Graz, 2009)
2. G.W. Elko, R.A. Kubli, J. Meyer, Audio system based on at least second-order eigenbeams, in
PCT Patent, vol. WO 03/061336, no. A1 (2003)
3. G.W. Elko, Superdirectional microphone arrays, in Acoustic Signal Processing for Telecommunication, ed. by J. Benesty, S.L. Gay (Kluwer Academic Publishers, Dordrecht, 2000)
4. J. Meyer, G. Elko, A highly scalable spherical microphone array based on an orthonormal
decomposition of the soundfield, in IEEE International Conference on Acoustics, Speech, and
Signal Processing, 2002. Proceedings.(ICASSP’02), vol. 2 (Orlando, 2002)
5. G.W. Elko, Differential microphone arrays, in Audio Signal Processing for Next-Generation
Multimedia Communication Systems, ed. by Y. Huang, J. Benesty (Springer, Berlin, 2004)
6. J. Daniel, S. Moreau, Further study of sound ELD coding with higher order ambisonics, in
116th AES Convention (2004)
7. S.-O. Petersen, Localization of sound sources using 3d microphone array, M. Thesis, University
of South Denmark, Odense (2004). www.oscarpetersen.dk/speciale/Thesis.pdf
8. B. Rafaely, Analysis and design of spherical microphone arrays. IEEE Trans. Speech Audio
Process. (2005)
9. S. Moreau, Étude et réalisation d’outils avancés d’encodage spatial pour la technique de spatialisation sonore Higher Order Ambisonics: microphone 3d et contrôle de distance, Ph.D.
Thesis, Université du Maine (2006)
10. H. Teutsch, Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield
Decomposition (Springer, Berlin, 2007)
11. Z. Li, R. Duraiswami, Flexible and optimal design of spherical microphone arrays for beamforming. IEEE Trans. ASLP 15(2) (2007)
12. W. Song, W. Ellermeier, J. Hald, Using beamforming and binaural synthesis for the psychoacoustical evaluation of target sources in noise. J. Acoust. Soc. Am. 123(2) (2008)
13. B. Rafaely, Fundamentals of Spherical Array Processing, 2nd edn. (Springer, Berlin, 2019)
14. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology
(1978)
15. ISO 80000-2, Quantities and units - Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
16. E.G. Williams, Fourier Acoustics (Academic, Cambridge, 1999)
17. B. Rafaely, B. Weiss, E. Bachmat, Spatial aliasing in spherical microphone arrays. IEEE Trans.
Signal Process. 55(3) (2007)
18. F. Zotter, Sampling strategies for acoustic holography/holophony on the sphere, in NAG-DAGA,
Rotterdam (2009)
19. P. Lecomte, P.-A. Gauthier, C. Langrenne, A. Berry, A. Garcia, A fifty-node Lebedev grid and
its applications to ambisonics. J. Audio Eng. Soc. 64(11) (2016)
20. I.H. Sloan, R.S. Womersley, Extremal systems of points and numerical integration on the
sphere. Adv. Comput. Math. 21, 107–125 (2004)
21. B. Bernschütz, C. Pörschmann, S. Spors, Soft-limiting bei modaler amplitudenverstärkung
bei sphärischen mikrofonarrays im plane-wave decomposition verfahren, in Fortschritte der
Akustik - DAGA (2011)
22. T. Rettberg, S. Spors, On the impact of noise introduced by spherical beamforming techniques
on data-based binaural synthesis, in Fortschritte der Akustik - DAGA (2013)
23. T. Rettberg, S. Spors, Time-domain behaviour of spherical microphone arrays at high orders,
in Fortschritte der Akustik - DAGA (2014)
24. B. Rafaely, Fundamentals of Spherical Array Processing, 1st edn. (Springer, Berlin, 2015)
25. D.L. Alon, B. Rafaely, Spatial decomposition by spherical array processing, in Parametric
Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley,
New Jersey, 2017)
26. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic
recording, in Fortschritte der Akustik – DAGA Nürnberg (2015)
27. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: The icosahedral loudspeaker. Comput. Music J. 41(3) (2017)
28. F. Zotter, M. Frank, C. Haar, Spherical microphone array equalization for ambisonics, in
Fortschritte der Akustik - DAGA (Nürnberg, 2015)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Chapter 7
Compact Spherical Loudspeaker Arrays
Le haut-parleur anonymise la source réelle.
Pierre Boulez [1], ICA, 1983.
…adjustable radiation naturalize[s] alien sounds by embedding
them in the natural spatiality of the room.
OSIL project [2], InSonic, 2015.
Abstract This chapter introduces auditory objects that can be created by adjustable-directivity sources in rooms. After showing basic positioning properties in distance and direction, we describe physical first- and higher-order spherical loudspeaker arrays and their control, such as the loudspeaker cubes or the icosahedral loudspeaker (IKO). Not only static auditory objects, but also such traversing space by their time-varying beamforming are considered here. Signal dependency and different practical setups are discussed and briefly analyzed. This young Ambisonic technology brings new means of expression to sound reinforcement, electroacoustic, or computer music.
While surrounding Ambisonic loudspeaker arrays play sound from outside the listening area into the audience, compact spherical loudspeaker arrays play sound into the room from a single position. Directivity adjustable in orientation and shape can be used to steer sound beams in order to excite wall reflections in the given acoustic environment. The directional shapes and orientations of such beams are all controlled by, guess what, Ambisonic signals. Despite the huge practical difference, both applications do not only share the spherical harmonics that lend their shapes to Ambisonic signals: the control of radiating sound beams employs nearly the same model- or measurement-based radial steering filters as those of compact higher-order Ambisonic microphones.
The works of Warusfel [3], Kassakian [4], Avizienis [5], Zotter [6, 7], Pomberger
[8], Pollow [9], Mattioli Pasqual [10] established the electroacoustic background
technology required to describe compact spherical loudspeaker arrays built with
electrodynamic transducers. The early works on auditory objects were written by
Schmeder [11], Sharma, Frank, and Zotter [2, 12, 13]. And some contemporary
results were found in the project “Orchestrating the Space by Icosahedral Loudspeaker” (OSIL) between 2015 and 2018 [14–19].
© The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7_7
7 Compact Spherical Loudspeaker Arrays
7.1 Auditory Events of Ambisonically Controlled
Directivity
7.1.1 Perceived Distance
Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker array from omnidirectional to second order was able to create auditory events that were perceptually closer than the physical distance to the loudspeaker array. The experimental results can be explained by the increase of the direct-to-reverberant energy ratio, as the sound beam of the directional source does not excite room reflections as much.
Wendt extended Laitinen's work by experiments employing a simulation of a third-order directional source in a virtual room (third-order image source model) played back by a loudspeaker ring in an anechoic room [16]. He could show that the perceived distance between the listener and the higher-order directional source could be controlled not only by the order of the directivity pattern but also by the orientation of the source (towards the listener, away from the listener). Beams projecting sounds away from the listener were perceived behind the source, cf. Fig. 7.1. Again, the perceptual results could be modeled by simple measures known from room acoustics.
7.1.2 Perceived Direction
Using a similar room simulation, the study in [21] asked participants to indicate the
perceived direction of an auditory event created by a third-order directional source.
The results showed that for different source orientations, listeners perceived auditory
objects at directions that often did not coincide with the sound source, but with the
delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound
and the three first reflections after 6, 8, and 9 ms. For some orientations, still even
Fig. 7.1 Perceived distance (medians and 95% confidence intervals) depends on order and direction of a static source emitting sound with max-r_E directivity (180° towards listener, 0° away from listener) in a simulated environment; relative auditory distance over beam order/direction from 3/180° to 3/0°
Fig. 7.2 IKO perceived directions (black circles, radii indicate the relative amount of answers) and modeling (gray crosses), 3rd-order max-r_E beam, 2nd-order image source model; perceived direction in degrees over source orientation in degrees, with the delay of each path in ms; gray shading in the background indicates the level of each path
the second-order reflections at 12 and 14 ms were dominating localization. However, the influence of later reflections is reduced by the precedence effect. The perceived directions can be modeled by the extended energy vector originally developed for off-center listening positions in surrounding loudspeaker arrangements, as also shown in [17]. Experiments in [22] showed that panning between a reflection and the direct sound creates auditory objects in between. When applying the appropriate delay and gain to the direct sound to compensate for the longer path of the reflection, the localization curves are similar to those of standard stereo using a pair of loudspeakers.
7.2 First-Order Compact Loudspeaker Arrays and Cubes
The simplest way of creating a loudspeaker array with adjustable directivity in a
practical sense is a cube with loudspeakers on its plane surfaces, as suggested by
Misdariis [23]. Restricting the directivity control to two dimensions reduces the
number of loudspeaker drivers to four and facilitates to equip the array with a carrying
handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.
Directivity control. First-order Ambisonics utilizes monopole and dipole modes, which directly translate to the corresponding far-field radiation patterns. Due to the cubic shape, these modes can easily be created by either playing all four drivers in phase or the opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency responses of such monopole and dipole modes need to be equalized to enable their phase- and magnitude-aligned superposition in the far field. Filters and measurement data of cube loudspeakers built at IEM [24] are freely available on http://phaidra.kug.ac.at/o:67631.
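The first-order beam shaping behind Fig. 7.4 can be sketched by the driver gains alone (a simplified illustration of an omni-to-dipole blend, not the full system: the filters H_bctl and H_EQ are omitted, and α = 1 is assumed here to denote the omnidirectional end of the blend):

```python
import numpy as np

# Four drivers on the cube's side faces, at azimuths 0, 90, 180, 270 degrees.
phi_l = np.deg2rad([0, 90, 180, 270])

def driver_gains(alpha, phi0_deg):
    """Gain of each driver for an alpha-blended omni/dipole beam aimed at phi0.
    alpha = 1: monopole; alpha = 0: dipole; alpha = 0.5: cardioid."""
    return alpha + (1 - alpha) * np.cos(phi_l - np.deg2rad(phi0_deg))

print(driver_gains(1.0, 0))   # monopole: all drivers in phase, equal gain
print(driver_gains(0.0, 0))   # dipole: opposing drivers out of phase
print(driver_gains(0.5, 0))   # cardioid beam towards phi0 = 0
```

Playing all drivers with equal gains yields the monopole, sign-inverted opposing pairs yield the dipole, and any 0 ≤ α ≤ 1 blends between them, matching the in-phase/out-of-phase description above.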
(a) loudspeaker cube
(b) vertical cross section
(c) horizontal cross section
Fig. 7.3 Design of a loudspeaker cube: prototype, and vertical and horizontal cross-section plots
Fig. 7.4 System controlling the monopole and dipole modes of the loudspeaker cubes, to accomplish first-order beamforming with the shape parameter α and beam direction ϕ0
To overcome the compressive effort of interior volume changes at low frequencies, the filter H_bctl in Fig. 7.4 equalizes, as a first step, the smaller velocity of the loudspeaker cones when driven omnidirectionally to the velocity when driven in dipoles, and as a second step, it attenuates the monopole pattern slightly to account for its more efficient radiation at low frequencies. The filter H_EQ is a general equalizer required to obtain a flat frequency response, 0 ≤ α ≤ 1 is a first-order omni-to-dipole beam-shape parameter, and ϕ0 is the beam direction. The filter H_bctl can be specified as a 5th-order IIR filter purely based on geometric and electroacoustic parameters [19].
Direct and indirect sound with two cubes. The study in [19] examined the width of the listening area for the creation of a central auditory object between a pair of loudspeaker cubes, cf. Fig. 7.5. Steering the two beams directly at the listener yielded a narrow listening area that increased with the distance to the loudspeakers, similar to typical stereo applications, cf. Fig. 2.9. A much wider listening area is achieved by steering the beams to the front wall to excite reflections. To this end, max-r_E (super-cardioid) beams were chosen and oriented in a way to ideally suppress direct sound from the loudspeaker cubes at the listening position. The proposed setup of two loudspeaker cubes can be used to play back stable L, C, R channels of a surround production without the need of an actual center loudspeaker.
Fig. 7.5 Width of the listening area for a central auditory object at two distances from a pair of loudspeaker cubes with different orientations of max-r_E/super-cardioid beams: (a) beam pair aiming directly at the central listening position at 2.5 m distance, (b) beam pair exciting front-wall reflections

Surround with depth: Together with the distance control described by Laitinen [20], the stable in-between auditory image has been used in [19] to establish a surround-with-depth system consisting of a quadraphonic setup of four loudspeaker cubes. As a first layer, it uses the direct sounds from the 4 loudspeakers at ±45° and ±135° together with the 4 in-between images at 0°, ±90°, and 180° to obtain 8 directions for third-order Ambisonic surround panning. As a second layer for depth, surround with depth uses 4 cardioid beams pointing into the 4 room corners to provide the impression of distant sounds. Blending between those two layers is used to control the distance impression of surround sounds.
7.3 Higher-Order Compact Spherical Loudspeaker Arrays
and IKO
With transducers mounted on spheres or polyhedra, higher-order radiators can be built. Typically, those are Platonic solids such as dodecahedra or icosahedra, as they can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loudspeakers are also mounted onto a common interior volume. Hereby, the higher-order modes can be controlled at a reduced impedance of the inner stiffness; however, this also causes acoustic coupling of the transducer motions. Typically, multiple-input-multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling and to control the velocity of the transducer cones. If this is accomplished, the acoustic radiation can be modeled and equalized by the spherical cap model, cf. [6, 15, 25, 26].
7 Compact Spherical Loudspeaker Arrays
Fig. 7.6 Powerful icosahedral loudspeaker array (IKO by IEM and Sonible) and reflecting baffles
in Ligeti concert hall, in preparation of an electroacoustic music concert
Cap model. Higher-order loudspeaker arrays on a compact spherical housing are
modeled by the spherical cap model. It assumes for the exterior air that the radial
surface velocity is a boundary condition consisting of separated spherical cap shapes
of the size α centered around the directions {θl }, each unity in value. These idealized
transducer shapes driven by the transducer velocities vl compose the surface velocity
v(θ) = Σ_{l=1}^{L} u(θ_l^T θ − cos(α/2)) v_l.    (7.1)
Here, u(ζ ) denotes the unit step function that is unity for ζ ≥ 0 and zero otherwise.
The surface velocity distribution can be decomposed into spherical harmonics as

v(θ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Y_n^m(θ) Σ_{l=1}^{L} w_{nm}^{(l)} v_l.    (7.2)
The coefficients w_{nm}^{(l)} of the lth cap are defined by spherical convolution Eq. (A.56) of a Dirac delta δ(θ_l^T θ − 1) pointing to the cap center with a zenithal cap u(cos ϑ − cos(α/2)):

w_{nm}^{(l)} = w_n Y_n^m(θ_l),    (7.3)
where Y_n^m(θ_l) are the coefficients expressing the Dirac delta, extended to a cap by weighting with w_n. The term w_n = 2π ∫_{cos(α/2)}^{1} P_n(ζ) dζ is derived in Eq. (A.60):

w_n = 2π [P_{n−1}(cos(α/2)) − cos(α/2) P_n(cos(α/2))]/(n+1),  for n > 0,
w_n = 2π [1 − cos(α/2)],  for n = 0.    (7.4)
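The cap weight is, by Eq. (A.60), the integral w_n = 2π ∫_{cos(α/2)}^{1} P_n(ζ) dζ, and the closed form of Eq. (7.4) follows from a standard Legendre integral identity. A minimal numerical cross-check, a sketch using SciPy (the aperture value is an arbitrary assumption):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def cap_weight(n, alpha):
    """Closed-form cap coefficient w_n of Eq. (7.4)."""
    c = np.cos(alpha / 2)
    if n == 0:
        return 2 * np.pi * (1 - c)
    return 2 * np.pi * (eval_legendre(n - 1, c) - c * eval_legendre(n, c)) / (n + 1)

alpha = np.radians(50)   # hypothetical cap aperture
for n in range(6):
    # definition: w_n = 2*pi * integral of P_n(z) from cos(alpha/2) to 1
    w_num = 2 * np.pi * quad(lambda t: eval_legendre(n, t), np.cos(alpha / 2), 1)[0]
    assert abs(cap_weight(n, alpha) - w_num) < 1e-10
```

The quadrature check confirms the closed form for each order, so the caps can be expanded into spherical harmonics without numerical integration at run time.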
Decoder. Without radiation control yet, any low-order target spherical harmonic n ≤ N can be synthesized as velocity pattern φ_nm by superimposing the spherical cap coefficients w_{nm}^{(l)} with suitable transducer velocities v_l, i.e. φ_nm = Σ_l w_n Y_n^m(θ_l) v_l. We write a matrix/vector notation with the matrix Y = [y(θ_1), …, y(θ_L)] containing the spherical harmonics y(θ) = [Y_n^m(θ)]_{nm} sampled at the transducer positions {θ_l} to represent Dirac deltas pointing there, and w = [w_n]_{nm} to represent the cap shape,

φ = diag{w} Y v.    (7.5)

As long as the order N up to which coefficients are controlled is low enough, L ≥ (N+1)², and the transducers are well-distributed, perfect control is feasible. The corresponding velocities are found by solving a least-squares problem, see Appendix A.4, Eq. (A.63), yielding the right inverse of the Nth-order cap-coefficient matrix,

v = Y_N^T (Y_N Y_N^T)^{−1} diag{w_N}^{−1} φ_N = D diag{w_N}^{−1} φ_N.    (7.6)

The right inverse D = Y_N^T (Y_N Y_N^T)^{−1} is a mode-matching decoder, cf. Eq. (4.40).
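The right inverse of Eq. (7.6) can be sketched numerically. The snippet below is an illustration, not the book's implementation: it builds a real-valued spherical harmonics matrix (orthonormal convention assumed) for a hypothetical set of well-distributed transducer directions and verifies that D is a right inverse of Y_N; the cap-weight diagonal of Eq. (7.6) is omitted for brevity.

```python
import numpy as np
from scipy.special import sph_harm

def real_sh(n, m, azi, zen):
    """Real-valued spherical harmonic built from the complex SciPy one."""
    Y = sph_harm(abs(m), n, azi, zen)
    if m < 0:
        return np.sqrt(2) * Y.imag
    if m == 0:
        return Y.real
    return np.sqrt(2) * Y.real

rng = np.random.default_rng(0)
N, L = 3, 64                          # order; hypothetical transducer count
azi = rng.uniform(0, 2 * np.pi, L)
zen = np.arccos(rng.uniform(-1, 1, L))   # well-distributed random directions
Y = np.array([real_sh(n, m, azi, zen)
              for n in range(N + 1) for m in range(-n, n + 1)])  # (N+1)^2 x L

# mode-matching decoder: right inverse D = Y^T (Y Y^T)^{-1}
D = Y.T @ np.linalg.inv(Y @ Y.T)
assert np.allclose(Y @ D, np.eye((N + 1)**2), atol=1e-10)
```

Since L = 64 ≥ (N+1)² = 16 and the directions are well spread, Y Y^T is invertible and Y D equals the identity, i.e. every target coefficient vector φ_N is reproduced exactly.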
Exterior problem. The radiated sound pressure is described by the exterior problem denoted by the coefficients c_nm in Eq. (6.13) and the spherical Hankel functions h_n^{(2)}(kr). To relate it to a time-derived surface velocity at the array radius r = a, we derive the exterior solution with regard to radius, ∂p/∂r = k ∂p/∂(kr) = −ikcρ v, cf. Eq. (6.2),

v(θ) = (i/ρc) Σ_{n=0}^{∞} Σ_{m=−n}^{n} h_n^{(2)′}(ka) Y_n^m(θ) c_nm.    (7.7)

Comparing Eq. (7.2) to Eq. (7.7) yields c_nm = ρc [i h_n^{(2)′}(ka)]^{−1} Σ_{l=1}^{L} w_n Y_n^m(θ_l) v_l, the coefficients to calculate the radiated pressure. Far away, we replace the spherical Hankel function that approaches h_n^{(2)}(kr) → i^{n+1} k^{−1} e^{−ikr}/r by the term i^{n+1} k^{−1} in Eq. (6.13), so that the radiated far-field sound pressure p ∝ Σ_{nm} i^{n+1} k^{−1} Y_n^m c_nm becomes

p(θ) ∝ Σ_{n=0}^{∞} Σ_{m=−n}^{n} Y_n^m(θ) [i^n w_n / (k h_n^{(2)′}(ka))] Σ_{l=1}^{L} Y_n^m(θ_l) v_l.    (7.8)
7.3.1 Directivity Control
The spherical harmonics coefficients of the far-field sound pressure pattern in Eq. (7.8) are controlled by the cap velocities v_l,

ψ_nm = [i^n w_n / (k h_n^{(2)′}(ka))] Σ_{l=1}^{L} Y_n^m(θ_l) v_l,    (7.9)

and we desire to form the directional sound beam they represent according to a max-rE pattern a_n Y_n^m(θ_0) yielding radiation focused towards θ_0,

ψ_nm = a_n Y_n^m(θ_0).    (7.10)

To find suitable cap velocities v_l, we equate the models Eqs. (7.9) and (7.10). In matrix/vector notation, the equation is

diag{[i^n w_n k^{−1} / h_n^{(2)′}(ka)]_{nm}} Y v = diag{[a_n]_{nm}} y(θ_0).    (7.11)

The diagonal matrix on the left is easy to invert, and for patterns up to the order n ≤ N, the mode-matching decoder D of Eq. (7.6) already gives us a way to define velocities inverting the matrix Y_N from the right. The preliminary solution becomes

v = D diag{[i^{−n} w_n^{−1} k h_n^{(2)′}(ka) a_n]_{nm}} y_N(θ_0).    (7.12)
(7.12)
On-axis equalized, sidelobe-suppressing directivity control limiting the excursion. The inverse cap shape coefficient w_n^{−1} and the max-rE weight a_n can be regarded as a part of the radiation control filters i^{−n} k h_n^{(2)′}(ka). The expression i^{−n−1} (ka)² h_n^{(2)′}(ka) of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor k. Practical implementation of radiation control filters and their regularization is therefore quite similar to radial filters of spherical microphone arrays. There are three main differences, as explained in [15]:

• With loudspeaker arrays, it is rather the excursion that is limited, which primarily entails a different strategy of adjusting the filter bank cut-on frequencies, which due to size are at lower frequencies where group-delay distortions are less disturbing, and linear-phase implementations would cause avoidably long delays.
• Moreover, instead of cut-on filter slopes of (n+1)th order required for noise removal in signals obtained from spherical microphone arrays, limited excursion requires cut-on slopes of at least (n+3)th order, i.e. 4th order to cut on the 1st-order Ambisonic signals. Thereof, one additional order is caused by the qualitative difference of k^{−1} in radial filters, and another order by the conversion of velocity to excursion by a factor (iω)^{−1}.
• Finally, instead of diffuse-field equalization that is useful for surround sound playback of spherical microphone array signals, it is more useful to equalize spherical sound beams on-axis (free field).
On-axis equalization yields a different scaling of the sub-band max-rE weights a_{n,b}:

a_{n,b} = P_n(cos(137.9°/(b+1.51))) · [Σ_{n′=0}^{N} (2n′+1) P_{n′}(cos(137.9°/(N+1.51)))] / [Σ_{n′=0}^{b} (2n′+1) P_{n′}(cos(137.9°/(b+1.51)))],  for n ≤ b,
a_{n,b} = 0,  otherwise.    (7.13)
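The sub-band weights of Eq. (7.13) are straightforward to compute. The helper below is a hedged sketch of the reading above (the function name is ours; the fraction normalizes each band's on-axis gain to that of the full Nth-order beam, so for b = N the scaling factor is one):

```python
import numpy as np
from scipy.special import eval_legendre

def max_re_subband_weights(N, b):
    """Sub-band max-rE weights a_{n,b} of Eq. (7.13), as read here."""
    rE = lambda order: np.cos(np.radians(137.9) / (order + 1.51))
    num = sum((2 * n + 1) * eval_legendre(n, rE(N)) for n in range(N + 1))
    den = sum((2 * n + 1) * eval_legendre(n, rE(b)) for n in range(b + 1))
    a = np.zeros(N + 1)
    for n in range(b + 1):
        a[n] = eval_legendre(n, rE(b)) * num / den
    return a

a = max_re_subband_weights(3, 1)   # band b=1 of a 3rd-order array
```

Orders above the band limit are zero (a_{n,b} = 0 for n > b), and the on-axis normalization boosts the low-order bands relative to plain max-rE weights.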
Typically, cut-on frequencies for compact spherical loudspeaker arrays are low, and linear-phase filterbanks would require long pre-delays. It is useful to employ Linkwitz-Riley filters for the crossovers, to get a low-latency implementation. To emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as combination of an all-pass A_m with twice the phase response of an mth-order Butterworth low-pass, combined either with the magnitude-squared low-pass response [1 + (ω/ω_c)^{2m}]^{−1} or high-pass response (ω/ω_c)^{2m} [1 + (ω/ω_c)^{2m}]^{−1} per crossover. Such a minimum-phase crossover is of even order, so that the minimum-order cut-on slope must be rounded up to the next even order 2⌈(b+3)/2⌉. Plain high/low crossovers would be in-phase unless combined with further crossovers to form narrower bands. However, an in-phase filterbank is obtained after inserting the product of all all-passes in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For the band b containing Ambisonic orders 0 ≤ n ≤ b, the modified filterbank is

H_b(ω) = [(ω/ω_b)^{2⌈(b+3)/2⌉} / (1 + (ω/ω_b)^{2⌈(b+3)/2⌉})] · [1 / (1 + (ω/ω_{b+1})^{2⌈(b+4)/2⌉})] · Π_{b′=0}^{N} A_{ω_{b′}}^{⌈(b′+3)/2⌉}(ω).    (7.14)
The sum Σ_b H_b(ω) is considered to be sufficiently flat, so that the radial filters for compact spherical loudspeaker arrays using Eqs. (7.8), (7.13), (7.14) become

ρ_n(ω) = Σ_{b=n}^{N} a_{n,b} H_b(ω) i^{−n} w_n^{−1} h_n^{(2)′}(ka) e^{ika}.    (7.15)
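The magnitude part of the Linkwitz-Riley bands in Eq. (7.14) can be sketched as follows; the all-pass chain is omitted, so this only illustrates the magnitude-complementary band splitting (cut-on frequencies taken from the filterbank frequencies quoted for the IKO; everything else is an illustrative assumption):

```python
import numpy as np

def lp_mag(f, fc, m):
    """Magnitude-squared Butterworth low-pass, the Linkwitz-Riley magnitude."""
    return 1.0 / (1.0 + (f / fc)**(2 * m))

def hp_mag(f, fc, m):
    return (f / fc)**(2 * m) / (1.0 + (f / fc)**(2 * m))

f = np.logspace(0, 4, 500)           # Hz
# complementary by construction: both magnitudes sum to one at every frequency
assert np.allclose(lp_mag(f, 100.0, 2) + hp_mag(f, 100.0, 2), 1.0)

cuton = [38.0, 75.0, 125.0, 210.0]   # Hz, filterbank frequencies quoted for the IKO
def band_mag(f, b, N=3):
    """Magnitude of band b in Eq. (7.14), all-pass chain omitted."""
    h = hp_mag(f, cuton[b], int(np.ceil((b + 3) / 2)))
    if b < N:
        h = h * lp_mag(f, cuton[b + 1], int(np.ceil((b + 4) / 2)))
    return h
```

The exact low/high complementarity per crossover is what makes the summed filterbank approximately flat above the lowest cut-on frequency; below it, all bands roll off to protect the transducer excursion.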
Figure 7.7 shows the block diagram to control compact spherical loudspeaker arrays
by Ambisonic input signals, including the radiation control filters, the decoder, and
a voltage-equalizing crosstalk canceller feeding the loudspeakers.
Fig. 7.7 Signal processing for higher-order compact spherical loudspeaker array control: the Ambisonic directivity signals χ_N(t) run through radiation control filters ρ_n(ω), a decoder yielding desired velocities v(t), and a crosstalk canceller/equalizer provides suitable output voltages u(t)
7.3.2 Control System and Verification Based on Measurements
Velocity equalization/crosstalk cancellation. In the frequency domain, laser vibrometer measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-output (MIMO) system of transducer input voltages u_l(ω) to transducer velocities v_l(ω),

v(ω) = T(ω) u(ω),    (7.16)

including the effect of acoustic coupling through the common enclosure. Corresponding open measurement data sets¹ can be found online, as described in [18]. Theoretically, the frequency-domain inverse of the matrix T(ω) can be used to equalize and control the transducer velocities with acoustic crosstalk cancelled, as indicated in Fig. 7.7,

u(ω) = T^{−1}(ω) v(ω).    (7.17)

In practice, this is only useful up to the frequency at which the loudspeaker cone vibration breaks up into modes, so typically below 1 kHz.
Control system: The entire control system with Ambisonic signals χ_N(ω) as inputs uses Eqs. (7.6), (7.15), (7.17),

u(ω) = T^{−1}(ω) D diag{ρ(ω)} χ_N(ω).    (7.18)
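Per frequency bin, the crosstalk cancellation of Eq. (7.17) amounts to solving a small linear system rather than forming an explicit inverse. A sketch with synthetic (random, hence hypothetical) MIMO matrices, sized like the IKO's 20 transducers:

```python
import numpy as np

rng = np.random.default_rng(1)
F, L = 64, 20        # frequency bins, transducers (the IKO has 20)
# hypothetical measured voltage-to-velocity MIMO matrices T(omega), one per bin
T = rng.normal(size=(F, L, L)) + 1j * rng.normal(size=(F, L, L))
v_des = rng.normal(size=(F, L)) + 1j * rng.normal(size=(F, L))

# Eq. (7.17): solve T(omega) u(omega) = v(omega) for every bin at once
u = np.linalg.solve(T, v_des[..., None])[..., 0]
assert np.allclose(np.einsum('flm,fm->fl', T, u), v_des)
```

`np.linalg.solve` broadcasts over the leading frequency axis, so all bins are processed in one call; in practice, regularization near the cone break-up frequency would be added before solving.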
Directivity measurement. It is useful to characterize the directivity obtained by measurements to verify the results; high-resolution 648 × 20 measurements G(ω) of the IKO are found online¹. With the known directional sampling on a 10° × 10° grid in azimuth and zenith, the sound pressure can be decomposed up to 17th order by left-inversion of a spherical harmonics matrix Y_17^T, see Appendix A.4, Eq. (A.65):

p(ω) = G(ω) u(ω),  ⇒  ψ_17(ω) = (Y_17^T)^† p(ω).    (7.19)
With the highly resolved spherical harmonics coefficients, polar diagrams or balloon diagrams can be evaluated at any direction,

p(θ, ω) = y_17(θ)^T ψ_17(ω),    (7.20)

given any control system delivering suitable voltages u for beamforming, as e.g. obtained by Eq. (7.18).
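The left inversion of Eq. (7.19) can be illustrated with synthetic data on the 10° × 10° grid. Complex-valued spherical harmonics are used here for simplicity (an assumption; the concrete SH convention of the measurement data may differ):

```python
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(3)
N = 17
# 10-degree grid in azimuth and zenith: 36 x 18 = 648 directions, as in the text
azi, zen = np.meshgrid(np.radians(np.arange(5, 360, 10)),
                       np.radians(np.arange(5, 180, 10)))
azi, zen = azi.ravel(), zen.ravel()

# complex spherical harmonics matrix, one column per (n, m) up to order 17
Y = np.column_stack([sph_harm(m, n, azi, zen)
                     for n in range(N + 1) for m in range(-n, n + 1)])

# synthetic pressure from known coefficients, then left inversion as in Eq. (7.19)
psi = rng.normal(size=(N + 1)**2) + 1j * rng.normal(size=(N + 1)**2)
p = Y @ psi
psi_hat = np.linalg.lstsq(Y, p, rcond=None)[0]
assert np.allclose(psi_hat, psi, atol=1e-6)
```

With 648 directions and 324 coefficients the least-squares problem is overdetermined and, for this grid, well conditioned, so the coefficients are recovered essentially exactly.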
1 http://phaidra.kug.ac.at/o:67609.
Fig. 7.8 Measurements on the IKO as a MIMO system in terms of (a) transducer output velocities (left) and (b) radiation patterns (right) depending on the transducer input voltages
Fig. 7.9 Horizontal cross section of the IKO's directivity/dB over frequency/Hz and azimuth/degrees when beamforming to 0° azimuth on the horizon, with radiation control filters above, with filterbank frequencies (38, 75, 125, 210) Hz
To inspect the frequency-dependent directivity, a horizontal cross section is shown in Fig. 7.9. The beamforming becomes effective above 100 Hz, and a beam width of ±30° is held until 2 kHz. The filterbank starts the 0th order above 38 Hz, and with 75, 125, 210 Hz, the 1st, 2nd, and 3rd orders are successively added, including on-axis equalized max-rE weightings. Above 2 kHz, both spatial aliasing and modal breakup of the transducer cones affect the directivity. However, these beamforming-direction-dependent distortions are often negligible in typical rooms.
7.4 Auditory Objects of the IKO
7.4.1 Static Auditory Objects
Fig. 7.10 Distance control using the IKO in the IEM CUBE
The study in [16] showed that distance control by changing the directivity and its
orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The
experiments used stationary pink noise and could create auditory objects nearly 2 m
behind the IKO, which corresponds to the distance between the IKO and the front
wall of the playback room.
The maximum distance of auditory objects created by the IKO is strongly signal-dependent. Experiments in [14] showed that the auditory distance of pink noise
bursts decreased for shorter fade-in times, while the fade-out time had no influence,
cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can
be explained by the precedence effect, that favors the earlier direct sound over the
reflected sound from the walls. While this effect is strong for transient sounds, it is
inhibited for stationary sounds with long fade-in times.
However, the precedence effect can even be reduced for transient click sounds
by simultaneous playback of a masker sound that reduces the influence of the direct
sound [29]. In comparison to no masker, playing a pink noise masker doubles the
auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the
target sound very softly further increases the distance and yields a perception that is detached from the IKO.

Fig. 7.11 Signal-dependent distance of auditory objects created by the IKO: markers indicate short or long fade-in and fade-out of pink noise bursts, · indicates a transient click sound

Fig. 7.12 Distance of auditory objects created by the IKO playing a transient click sound in dependence of maskers: no masker (M0), noise at different levels (MN1, MN2) and playback at very low level (room maskers MR1 and MR2)
7.4.2 Moving Auditory Objects
The studies in [14, 15] extended the previous listening experiments towards simple
time-varying beam directions, such as from the left to the right, front/back or circles.
To report the perceived locations of the moving auditory objects, listeners used a
touch screen that showed a floor plan of the room, including the listening position
and the position of the IKO. They had to indicate the location of the auditory object’s
trajectory every 500 ms. The perceived trajectories depend on the listening position, but they can always be recognized, cf. Fig. 7.13. The empirical knowledge was
applied in the artistic study in [14] about body-space relations, composing sounds that are spatialized with different static directions and simple movements.

Fig. 7.13 Average perceived locations for each 500 ms step during front/back movement (dark gray) and left/right movement (light gray) at two listening positions; triangle indicates start and asterisk end of the trajectory

For concerts, the artistic practice evolved to set up the IKO together with reflector baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception of moving transient and stationary sounds. The baffles obviously reduce the signal-dependency by contributing more additional reflection paths, contrasting the direct sound.

Fig. 7.14 Average perceived locations for each 500 ms step during circular movement of a transient sound (dark gray) and stationary noise (light gray) without and with additional reflectors; triangle indicates start and asterisk end of the trajectory
7.5 Practical Free-Software Examples
7.5.1 IEM Room Encoder and Directivity Shaper
The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only be used to simulate the room reflections of an omnidirectional sound source based on the image-source method, but it also supports directional sound sources. As format, it employs Ambisonics with ACN ordering and adjustable normalization up to seventh order. Thus, it enables utilizing data from directivity measurements or even directional recordings done with a surrounding spherical microphone, e.g. to put real instrument recordings into the virtual room.
As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple means to generate a frequency-dependent directivity pattern from scratch and to apply it to a mono input signal. This is useful to generate the typical rotary-speaker effect of a Leslie cabinet.
Fig. 7.15 IEM Directivity Shaper plug-in
7.5.2 IEM Cubes 5.1 Player and Surround with Depth
As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event
in between them to replace an actual center loudspeaker. In order to play back an
entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by
two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The
plug-in provides control of the shape, direction, and level of all beams, as well as delay compensation for the reflection paths.
Surround sound with depth can be realized with a quadraphonic setup of four loudspeaker cubes and a combination of the cubes Surround Decoder and multiple
Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder
controls position and distance, i.e. the blending between the two layers. The output of
the plug-in is a 10-channel audio stream including 7 channels for third-order (inner
layer) and 3 for first-order 2D Ambisonics (outer depth layer). The cubes Surround
Decoder plug-in decodes the 10-channel audio stream and distributes the signals to
the 16 drivers of four loudspeaker cubes. For each loudspeaker cube, the directions to excite direct and reflected sound of the inner layer and the diffuse sound
of the depth layer can be adjusted in order to adapt to the playback environment.
Additionally, the directivity patterns for direct, reflected, and diffuse sound beams
Fig. 7.16 IEM cubes 5.1 Player, cubes Surround Decoder and Distance
Encoder plug-ins
can be controlled, as well as a delay to compensate for the longer propagation paths of
the reflected sound. The plug-ins are available under https://git.iem.at/audioplugins/
CubeSpeakerPlugins.
7.5.3 IKO
Spatialization using the IKO can use a similar infrastructure of plug-ins as surrounding loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix_encoder
or the IEM StereoEncoder or MultiEncoder, create the third-order Ambisonic
signals that are subsequently fed to a decoder. Decoding to the IKO requires the
processing steps as shown in Fig. 7.7: radiation control filters in the spherical harmonic domain, decoding from spherical harmonics to transducer signals, as well as
crosstalk cancellation and equalization of the transducers. This processing can be
summarized in a 16 (spherical harmonics up to third order) × 20 (transducers) filter matrix. Convolution can be done efficiently using the mcfx_convolver plug-in.
Filter presets for the IKO can be found under http://phaidra.kug.ac.at/o:79235.
References
1. P. Boulez, L’acoustique et la musique contemporaine, in Proceedings of the 11th International
Congress on Acoustics, vol. 8 (Paris, 1983), https://www.icacommission.org/Proceedings/
ICA1983Paris/ICA11%20Proceedings%20Vol8.pdf
2. M. Frank, G.K. Sharma, F. Zotter, What we already know about spatialization with compact spherical arrays as variable-directivity loudspeakers, in Proceedings of the inSONIC2015
(Karlsruhe, 2015)
3. O. Warusfel, P. Derogis, R. Causse, Radiation synthesis with digitally controlled loudspeakers,
in 103rd AES Convention (Paris, 1997)
4. P. Kassakian, D. Wessel, Characterization of spherical loudspeaker arrays, in Convention paper
#1430 (San Francisco, 2004)
5. R. Avizienis, A. Freed, P. Kassakian, D. Wessel, A compact 120 independent element spherical
loudspeaker array with programmable radiation patterns, in 120th AES Convention (France,
Paris, 2006)
6. F. Zotter, R. Höldrich, Modeling radiation synthesis with spherical loudspeaker arrays, in
Proceedings of the ICA (Madrid, 2007), http://iaem.at/Members/zotter
7. F. Zotter, Analysis and synthesis of sound-radiation with spherical arrays, PhD. Thesis, University of Music and Performing Arts, Graz (2009)
8. H. Pomberger, Angular and radial directivity control for spherical loudspeaker arrays, M. Thesis, Institut für Elektronische Musik und Akustik, Kunstuni Graz, Technical University Graz, Graz, Austria (2008)
9. M. Pollow, G.K. Behler, Variable directivity for platonic sound sources based on spherical
harmonics optimization. Acta Acust. United Acust. 95(6), 1082–1092 (2009)
10. A.M. Pasqual, P. Herzog, J.R. Arruda, Theoretical and experimental analysis of the behavior
of a compact spherical loudspeaker array for directivity control. J. Acoust. Soc. Am. 128(6),
3478–3488 (2010)
11. A. Schmeder, An exploration of design parameters for human-interactive systems with compact
spherical loudspeaker arrays, in 1st Ambisonics Symposium (Graz, 2009)
12. G.K. Sharma, F. Zotter, M. Frank, Orchestrating wall reflections in space by icosahedral loudspeaker: findings from first artistic research exploration, in Proceedings of the ICMC/SMC
(Athens, 2014), pp. 830–835
13. F. Zotter, M. Frank, A. Fuchs, D. Rudrich, Preliminary study on the perception of orientationchanging directional sound sources in rooms, in Proceedings of the Forum Acusticum (Kraków,
2014)
170
7 Compact Spherical Loudspeaker Arrays
14. F. Wendt, G.K. Sharma, M. Frank, F. Zotter, R. Höldrich, Perception of spatial sound phenomena
created by the icosahedral loudspeaker. Comput. Music J. 41(1) (2017)
15. F. Zotter, M. Zaunschirm, M. Frank, M. Kronlachner, A beamformer to play with wall reflections: the icosahedral loudspeaker. Comput. Music J. 41(3) (2017)
16. F. Wendt, F. Zotter, M. Frank, R. Höldrich, Auditory distance control using a variable-directivity
loudspeaker. MDPI Appl. Sci. 7(7) (2017)
17. F. Wendt, M. Frank, On the localization of auditory objects created by directional sound sources
in a virtual room, in VDT Tonmeistertagung (Cologne, 2018)
18. F. Schultz, M. Zaunschirm, F. Zotter, Directivity and electro-acoustic measurements of the
IKO, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.
cfm?elib=19557
19. T. Deppisch, N. Meyer-Kahlen, F. Zotter, M. Frank, Surround with depth on first-order beamcontrolling loudspeakers, in Audio Engineering Society Convention 144 (2018), http://www.
aes.org/e-lib/browse.cfm?elib=19494
20. M.-V. Laitinen, A. Politis, I. Huhtakallio, V. Pulkki, Controlling the perceived distance of
an auditory object by manipulation of loudspeaker directivity. J. Acoust. Soc. Am. 137(6)
EL462–EL468 (2015)
21. F. Zotter, M. Frank, Investigation of auditory objects caused by directional sound sources
in rooms. Acta Phys. Pol. A 128(1-A) (2015), http://przyrbwn.icm.edu.pl/APP/PDF/128/
a128z1ap01.pdf
22. F. Zagala, J. Linke, F. Zotter, M. Frank, Amplitude panning between beamforming-controlled
direct and reflected sound, in Audio Engineering Society Convention 142 (Berlin, 2017), http://
www.aes.org/e-lib/browse.cfm?elib=18679
23. N. Misdariis, F. Nicolas, O. Warusfel, R. Caussé, Radiation control on multi-loudspeaker device: La Timée, in International Computer Music Conference (La Habana, 2001)
24. N. Meyer-Kahlen, F. Zotter, K. Pollack, Design and measurement of a first-order, in Audio Engineering Society Convention 144 (2018), http://www.aes.org/e-lib/browse.cfm?elib=19559
25. F. Zotter, H. Pomberger, A. Schmeder, Efficient directivity pattern control for spherical loudspeaker arrays, in ACOUSTICS08 (2008)
26. F. Zotter, A. Schmeder, M. Noisternig, Crosstalk cancellation for spherical loudspeaker arrays,
in Fortschritte der Akustik - DAGA (Dresden, 2008)
27. S.P. Lipshitz, J. Vanderkooy, In-phase crossover network design. J. Audio Eng. Soc. 34(11), 889–894 (1986)
28. S. Lösler, F. Zotter, Comprehensive radial filter design for practical higher-order ambisonic
recording, in Fortschritte der Akustik – DAGA Nürnberg (2015)
29. J. Linke, F. Wendt, F. Zotter, M. Frank, How masking affects auditory objects of beamformed
sounds, in Fortschritte der Akustik, DAGA (Munich, 2018)
30. J. Linke, F. Wendt, F. Zotter, M. Frank, How the perception of moving sound beams is influenced
by masking and reflector setup, in VDT Tonmeistertagung (Cologne, 2018)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter’s Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
Appendix
Many have written of the experience of mathematical beauty as
being comparable to that derived from the greatest art.
S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah [1]
Frontiers in Human Neuroscience, Feb. 2014.
A.1 Harmonic Functions
The Laplacian is defined in the D-dimensional Cartesian space as

Δ = Σ_{j=1}^{D} ∂²/∂x_j².

The Laplacian eigenproblem

Δf = −λf

is solved by harmonics, on a finite interval.
A.2 Laplacian in Orthogonal Coordinates
In general, coordinates can be expressed by n-tuples of values. For instance, the Cartesian coordinates are (x_1, x_2, …) and the coordinates of another coordinate system are (u_1, u_2, …), and both describe the location of a point in a space depending on a finite number of dimensions. Each location of the space should be accessible by both
© The Editor(s) (if applicable) and The Author(s) 2019
F. Zotter and M. Frank, Ambisonics, Springer Topics in Signal Processing 19,
https://doi.org/10.1007/978-3-030-17207-7
coordinate systems and there should be a bijective mapping between both systems,
e.g. u j = u j (x1 , x2 , . . . ). A single differentiation with regard to the component xi ,
for instance, is described by the chain rule and consists of the sum of weighted partial
differentials with regard to u j :
∂/∂x_i = Σ_j (∂u_j/∂x_i) ∂/∂u_j.    (A.1)

Written in terms of vectors, the (Cartesian) gradient ∇ = ∂/∂x expressed with ∂/∂u yields

∇ = ∂/∂x = (∂u^T/∂x) ∂/∂u := J_{∂u/∂x}^T ∂/∂u,    (A.2)

for which the Jacobian matrix J_{∂u/∂x} = [∂u_j/∂x_i]_{ij}, written either in dependency of x or of u, represents all the partial derivatives of the mapping between the coordinate systems.

For bijective mappings, the Jacobian of the inverse mapping also exists, J_{∂x/∂u} = [∂x_i/∂u_j]_{ji}. The coordinate systems are equivalent if the determinant of the Jacobian is non-zero, |J| ≠ 0.
Orthogonal coordinate systems have the interesting property that the rows of the Jacobian (or its columns) are orthogonal, so that J^T J yields a diagonal matrix. With J_{∂x/∂u} = ∂x/∂u^T, the meaning of this property becomes easier to understand: the differential change in location ∂x/∂u_j of each Cartesian coordinate into the direction of each individual non-Cartesian coordinate u_j describes an orthogonal set of motion directions in space, whose orientation depends on the location and whose individual lengths may vary.
Our goal here is to obtain a description of the Laplacian Δ = Σ_i ∂²/∂x_i² in the Helmholtz equation; it can be obtained with the chain rule, now calculating from x_i to u_j,

Δ = Σ_i ∂/∂x_i (∂/∂x_i) = Σ_i ∂/∂x_i (Σ_j (∂u_j/∂x_i) ∂/∂u_j)
  = Σ_{i,j} (∂²u_j/∂x_i²) ∂/∂u_j + Σ_{i,j,k} (∂u_j/∂x_i)(∂u_k/∂x_i) ∂²/(∂u_j ∂u_k),    (A.3)

with Σ_{i,j,k} (∂u_j/∂x_i)(∂u_k/∂x_i) ∂²/(∂u_j ∂u_k) = 1^T [J^T J ∘ ∂²/(∂u^T ∂u)] 1, which for orthogonal coordinates (diagonal J^T J) reduces to Σ_{i,j} (∂u_j/∂x_i)² ∂²/∂u_j², and with ∘ denoting the element-wise, i.e. Hadamard, product. Orthogonal coordinates largely simplify the Laplacian (see last line) and make it consist of first- and second-order differentials with regard to the new coordinates, individually, with all mixed
derivatives canceling. Both first- and second-order differentials are weighted by the
partial derivatives of the coordinate mapping. For each u j , the Laplacian is composed
of those two expressions
Δ = Σ_j Δ_{u_j},  where  Δ_{u_j} = Σ_i (∂²u_j/∂x_i²) ∂/∂u_j + Σ_i (∂u_j/∂x_i)² ∂²/∂u_j².    (A.4)

A.3 Laplacian in Spherical Coordinates
The right-handed spherical coordinate system in ISO 31-11, ISO 80000-2, [2, 3], uses a radius r, an azimuth angle ϕ, and a zenith angle ϑ, mapping to Cartesian coordinates x = r cos ϕ sin ϑ, y = r sin ϕ sin ϑ, z = r cos ϑ, or inversely r = √(x² + y² + z²), ϕ = arctan(y/x), ϑ = arctan(√(x² + y²)/z), see Fig. 4.11.

Re-expressing the zenith angle coordinate by ζ = cos ϑ = z/r reduces the effort in calculation and yields x = r cos ϕ √(1 − ζ²), y = r sin ϕ √(1 − ζ²), z = r ζ.
In order to obtain solutions along the angular dimensions azimuth and zenith,
we first need to re-write the Laplacian from Cartesian to spherical coordinates. For
the first-order derivative along the x axis, we get the generalized differential

∂/∂x = (∂r/∂x) ∂/∂r + (∂ϕ/∂x) ∂/∂ϕ + (∂ζ/∂x) ∂/∂ζ.
As the Cartesian and spherical coordinates are orthogonal, any mixed second-order derivatives in the spherical coordinates vanish. We may derive a second time wrt. x:

∂/∂x (∂/∂x) = (∂²r/∂x²) ∂/∂r + (∂r/∂x)² ∂²/∂r² + (∂²ϕ/∂x²) ∂/∂ϕ + (∂ϕ/∂x)² ∂²/∂ϕ² + (∂²ζ/∂x²) ∂/∂ζ + (∂ζ/∂x)² ∂²/∂ζ².
Obviously, we require all first-order derivatives squared, and all second-order derivatives of the spherical coordinates.
A.3.1 The Radial Part
With r = √(x² + y² + z²), we obtain for the radial part Δ_r = Σ_i [(∂²r/∂x_i²) ∂/∂r + (∂r/∂x_i)² ∂²/∂r²] of the Laplacian
(∂r/∂x)² = [∂√(x² + y² + z²)/∂x]² = [2x/(2√(x² + y² + z²))]² = (x/r)² = x²/r²,
∂²r/∂x² = ∂/∂x (x/r) = 1/r + x ∂/∂x (1/r) = 1/r − (x/r)(x/r²) = (r² − x²)/r³.

For x, y, z altogether, this is:

Δ_r = [(3r² − x² − y² − z²)/r³] ∂/∂r + [x²/r² + y²/r² + z²/r²] ∂²/∂r² = (2/r) ∂/∂r + ∂²/∂r².    (A.5)
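The intermediate derivatives of the radial part can be verified symbolically, e.g. with SymPy; a minimal sketch:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
r = sp.sqrt(x**2 + y**2 + z**2)

# (dr/dx)^2 = x^2/r^2  and  d^2r/dx^2 = (r^2 - x^2)/r^3
assert sp.simplify(sp.diff(r, x)**2 - x**2 / r**2) == 0
assert sp.simplify(sp.diff(r, x, 2) - (r**2 - x**2) / r**3) == 0

# summed over x, y, z: coefficient 2/r of the first-order and 1 of the
# second-order radial differential, as in Eq. (A.5)
assert sp.simplify(sum(sp.diff(r, c, 2) for c in (x, y, z)) - 2 / r) == 0
assert sp.simplify(sum(sp.diff(r, c)**2 for c in (x, y, z)) - 1) == 0
```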
2D. In two dimensions, there is no z coordinate, therefore there is just one term fewer:

Δ_{r,2D} = [(2r² − x² − y²)/r³] ∂/∂r + [x²/r² + y²/r²] ∂²/∂r² = (1/r) ∂/∂r + ∂²/∂r².    (A.6)

A.3.2 The Azimuthal Part
With ϕ = arctan(y/x) and (d/dx) arctan x = 1/(1 + x²), the azimuthal part Δ_ϕ = Σ_i [(∂²ϕ/∂x_i²) ∂/∂ϕ + (∂ϕ/∂x_i)² ∂²/∂ϕ²] of the Laplacian becomes
(∂ϕ/∂x)² = [∂ arctan(y/x)/∂x]² = [1/(1 + y²/x²) · (−y/x²)]² = [−y/(x² + y²)]² = y²/r_xy⁴,
(∂ϕ/∂y)² = [∂ arctan(y/x)/∂y]² = [1/(1 + y²/x²) · (1/x)]² = [x/(x² + y²)]² = x²/r_xy⁴,
∂²ϕ/∂x² = ∂/∂x (−y/r_xy²) = 2xy/r_xy⁴,
∂²ϕ/∂y² = ∂/∂y (x/r_xy²) = −2xy/r_xy⁴,
∂ϕ/∂z = 0.

It only depends on x and y, and altogether, we obtain

Δ_ϕ = [(2xy − 2xy)/r_xy⁴] ∂/∂ϕ + [(x² + y²)/r_xy⁴] ∂²/∂ϕ² = (1/r_xy²) ∂²/∂ϕ² = [1/(r²(1 − ζ²))] ∂²/∂ϕ².    (A.7)
2D. In two dimensions, ζ = 0, therefore
Δ_{ϕ,2D} = (1/r²) ∂²/∂ϕ².    (A.8)

A.3.3 The Zenithal Part
The zenith angle is actually ϑ, and we define ζ = cos ϑ as a variable to express it in order to simplify the derivation. With ζ = z/√(x² + y² + z²) = z/r, the zenithal part Δ_ζ = Σ_i [(∂²ζ/∂x_i²) ∂/∂ζ + (∂ζ/∂x_i)² ∂²/∂ζ²] becomes
(∂ζ/∂x)² = [∂(z/r)/∂x]² = [−(z/r²)(x/r)]² = (−xz/r³)² = x²z²/r⁶,
(∂ζ/∂z)² = [∂(z/r)/∂z]² = [1/r − z²/r³]² = [(r² − z²)/r³]² = (r_xy²/r³)² = r_xy⁴/r⁶,
∂²ζ/∂x² = ∂/∂x (−xz/r³) = −z/r³ + 3xz (x/r⁵) = z(3x² − r²)/r⁵,
∂²ζ/∂z² = ∂/∂z (r_xy²/r³) = −3(r_xy²/r⁴)(z/r) = −3 r_xy² z/r⁵.

For x, y, and z altogether, we get

Δ_ζ = [z(3x² + 3y² − 2r² − 3r_xy²)/r⁵] ∂/∂ζ + [((x² + y²)z² + r_xy⁴)/r⁶] ∂²/∂ζ²
    = −(2z/r³) ∂/∂ζ + (r² r_xy²/r⁶) ∂²/∂ζ² = −(2ζ/r²) ∂/∂ζ + [(1 − ζ²)/r²] ∂²/∂ζ².    (A.9)
2D. This part does not exist in 2D.
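The three parts (A.5), (A.7), and (A.9) can be cross-checked against the Cartesian Laplacian for a concrete test function; a SymPy sketch (the test function r²x is an arbitrary choice, and the comparison is done numerically at an arbitrary point):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
rr, zeta, phi = sp.symbols('r zeta varphi', positive=True)

# test function r^2 * x written in spherical coordinates: F = r^3 sqrt(1-zeta^2) cos(phi)
F = rr**3 * sp.sqrt(1 - zeta**2) * sp.cos(phi)

# Laplacian assembled from the radial (A.5), azimuthal (A.7), and zenithal (A.9) parts
lap_sph = (sp.diff(F, rr, 2) + 2/rr * sp.diff(F, rr)
           + sp.diff(F, phi, 2) / (rr**2 * (1 - zeta**2))
           + (1 - zeta**2)/rr**2 * sp.diff(F, zeta, 2)
           - 2*zeta/rr**2 * sp.diff(F, zeta))

# the same function and its Cartesian Laplacian
r = sp.sqrt(x**2 + y**2 + z**2)
lap_cart = sum(sp.diff(r**2 * x, c, 2) for c in (x, y, z))

# substitute spherical by Cartesian coordinates and compare at a sample point
subs = {rr: r, zeta: z/r, phi: sp.atan2(y, x)}
vals = {x: sp.Rational(3, 10), y: sp.Rational(7, 10), z: sp.Rational(11, 10)}
diff_expr = (lap_sph.subs(subs) - lap_cart).subs(vals)
assert abs(float(diff_expr)) < 1e-12
```

Both sides evaluate to the same number, confirming that the reassembled spherical Laplacian agrees with the Cartesian one for this test function.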
A.3.4 Azimuthal Solution in 2D and 3D
The azimuth harmonics Φ are found by solving Δ_ϕ Φ = −λΦ, i.e.,

d²Φ/dϕ² = −λ r_xy² Φ.    (A.10)

We know that (cos x)″ = −cos x and (sin x)″ = −sin x, therefore we can insert the solutions

Φ̂ = cos(aϕ), for a ≥ 0,  and  Φ̂ = sin(|a|ϕ), for a < 0,

and obtain with d²Φ̂/dϕ² = −a² Φ̂ the characteristic equation that fixes a,

−a² = −λ r_xy².

Geometrically, we desire that Φ̂(ϕ) = Φ̂(ϕ + 2πl) with l ∈ Z. This is only possible with λ r_xy² = m², and m ∈ Z.
We can therefore define for −∞ ≤ m ≤ ∞ the terms of a normalized Fourier
series
⎧√
⎪ 2 sin(|m|ϕ), for m < 0,
1 ⎨
(A.11)
m (ϕ) = √
1,
for m = 0,
2π ⎪
⎩√
2 cos(mϕ), for m > 0.
The azimuth harmonics are orthogonal: none of the products cos(iϕ) sin(jϕ), cos(iϕ) cos(jϕ), or sin(iϕ) sin(jϕ) produces a constant component unless i = j, excluding the mixed cosine and sine product. Normalization ensures that the nonzero result is unity,

∫₀^{2π} Φ_i Φ_j dϕ = δ_ij = { 1, for i = j;  0, else },  (A.12)

by ∫₀^{2π} (1/√(2π))² dϕ = ∫₀^{2π} (√2 cos mϕ)²/(2π) dϕ = ∫₀^{2π} (√2 sin mϕ)²/(2π) dϕ = 1.
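The definition (A.11) and the orthonormality (A.12) are easy to verify numerically. A minimal sketch in Python with numpy; the helper name circ_harm is ours, not from any library:

```python
import numpy as np

def circ_harm(m, phi):
    """Normalized circular harmonic Phi_m(phi) as in (A.11)."""
    if m < 0:
        return np.sin(abs(m) * phi) / np.sqrt(np.pi)   # sqrt(2)/sqrt(2*pi)
    if m == 0:
        return np.ones_like(phi) / np.sqrt(2 * np.pi)
    return np.cos(m * phi) / np.sqrt(np.pi)

# A Riemann sum on a uniform periodic grid is exact for trigonometric
# polynomials well below the grid's Nyquist limit.
K = 256
phi = np.arange(K) * 2 * np.pi / K
orders = range(-2, 3)
gram = np.array([[np.sum(circ_harm(i, phi) * circ_harm(j, phi)) * 2 * np.pi / K
                  for j in orders] for i in orders])
print(np.allclose(gram, np.eye(5)))  # True
```

The Gram matrix of the harmonics for −2 ≤ m ≤ 2 comes out as the identity, confirming (A.12).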
2D functions. With unbounded |m| → ∞, the circular harmonics are complete in the Hilbert space of square-integrable circular polynomials. By their orthonormality, we can derive a transformation integral of a function g(ϕ) that should be represented as the series

g = Σ_{j=−∞}^{∞} γ_j Φ_j  (A.13)

by integration of g over Φ_m,

∫_{−π}^{π} g(ϕ) Φ_m dϕ = Σ_{j=−∞}^{∞} γ_j ∫_{−π}^{π} Φ_m Φ_j dϕ = γ_m,  (A.14)

using ∫_{−π}^{π} Φ_m Φ_j dϕ = δ_mj.
2D panning functions. To decompose an infinitely narrow unit-surface Dirac delta function that represents an infinite-order panning function towards the direction ϕ_s,

δ(ϕ − ϕ_s) = { lim_{ε→0} 1/(2ε), for |ϕ − ϕ_s| ≤ ε;  0, otherwise },  (A.15)

we then obtain the coefficients from the transformation integral

γ_m = ∫_{−π}^{π} δ(ϕ − ϕ_s) Φ_m dϕ = Φ_m(ϕ_s) lim_{ε→0} ∫_{−ε}^{ε} 1/(2ε) dϕ = Φ_m(ϕ_s).  (A.16)
Typically, a finite-order series will be employed as a panning function,

g_N = Σ_{m=−N}^{N} a_m Φ_m(ϕ_s) Φ_m(ϕ),  (A.17)
involving a weight a_m = a_{|m|} controlling the side lobes. To evaluate the E measure of loudness, we can write

E = ∫_{−π}^{π} g_N² dϕ = ∫_{−π}^{π} [Σ_{i=−N}^{N} a_i Φ_i(ϕ_s) Φ_i] [Σ_{j=−N}^{N} a_j Φ_j(ϕ_s) Φ_j] dϕ = Σ_{i,j} a_i a_j Φ_i(ϕ_s) Φ_j(ϕ_s) ∫_{−π}^{π} Φ_i Φ_j dϕ
  = Σ_{i=0}^{N} (2 − δ_i)/(2π) a_i²,  (A.18)

using a_m = a_{|m|} and sin² + cos² = 1 to combine the contributions of ±m.
For ϕ_s = 0, we obtain an axisymmetric function in terms of a pure cosine series, as sin 0 = 0,

g_N(ϕ) = Σ_{m=0}^{N} a_m (2 − δ_m)/(2π) cos(mϕ),  (A.19)

with δ_m = 1 for m = 0 and δ_m = 0 elsewhere. The axisymmetric panning function is easier to design.
2D max-r_E. For the narrowest-possible spread, we maximize the length of r_E,

r_E = ∫_{−π}^{π} g_N² cos ϕ dϕ / ∫_{−π}^{π} g_N² dϕ = Σ_{i,j=0}^{N} a_i a_j (2 − δ_i)(2 − δ_j) ∫_{−π}^{π} cos(iϕ) cos(jϕ) cos(ϕ) dϕ / [(2π)² E]
    = Σ_{i=1}^{N} a_i a_{i−1} / (π E) =: r̂_E/E,  (A.20)

where we used the axisymmetric form at ϕ_s = 0 and cos(iϕ) cos(ϕ) = [cos((i+1)ϕ) + cos((i−1)ϕ)]/(2 − δ_i), inserted the orthogonality of the cosines, ∫_{−π}^{π} (2 − δ_i) cos(iϕ) cos(jϕ)/(2π) dϕ = δ_ij, and combined (a_i a_{i−1} + a_i a_{i+1}) terms into 2 Σ_i a_i a_{i−1}. To maximize, we zero the derivative with respect to a_m:
∂r_E/∂a_m = r̂_E′/E − r̂_E E′/E² = (1/E) [r̂_E′ − E′ r_E] = 0
⇒ a_{m−1} + a_{m+1} − (2 − δ_m) a_m r_E = 0.

If we assume that a_m = cos(mα), and we insert this for (a_{m+1} + a_{m−1})/(2 − δ_m) = a_m r_E, we recognize by inserting the above theorem, cos(mα) cos(α) = [cos((m+1)α) + cos((m−1)α)]/(2 − δ_m), that

[cos((m+1)α) + cos((m−1)α)]/(2 − δ_m) = cos(mα) r_E = cos(mα) cos(α),

so that r_E = cos α. And to maximize r_E by constraining that a_{N+1} = 0, we get the smallest-possible spread α = ±(π/2)/(N+1) = ±90°/(N+1):

a_m = { cos(π m/(2(N+1))), for 0 ≤ m ≤ N;  0, elsewhere }.  (A.21)
The max-r_E panning function in 2D consequently is

g_N(ϕ) = Σ_{m=−N}^{N} a_m Φ_m(ϕ_s) Φ_m(ϕ) = Σ_{m=−N}^{N} cos(π|m|/(2(N+1))) Φ_m(ϕ_s) Φ_m(ϕ).  (A.22)
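The weights (A.21) and the resulting spread measure can be checked numerically. A small sketch assuming numpy; the helper names max_re_weights_2d and r_e_2d are illustrative only:

```python
import numpy as np

def max_re_weights_2d(N):
    """2D max-rE weights a_m = cos(pi*m/(2(N+1))) from (A.21)."""
    return np.cos(np.pi * np.arange(N + 1) / (2 * (N + 1)))

def r_e_2d(a):
    """rE = r_hat/E with r_hat = sum a_m a_{m-1}/pi, cf. (A.18) and (A.20)."""
    N = len(a) - 1
    r_hat = np.sum(a[1:] * a[:-1]) / np.pi
    E = np.sum((2 - (np.arange(N + 1) == 0)) * a**2) / (2 * np.pi)
    return r_hat / E

a = max_re_weights_2d(3)
print(np.isclose(r_e_2d(a), np.cos(np.pi / 8)))  # True
```

For N = 3 the predicted value r_E = cos(90°/4) = cos(π/8) is reproduced exactly.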
A.3.5 Towards Spherical Harmonics (3D)
The spherical harmonics are harmonics depending only on the angular terms. We may superimpose both parts Δ_ϕζ = Δ_ϕ + Δ_ζ of the Laplacian and solve the eigenproblem r² Δ_ϕζ Y = −λY:

1/(1 − ζ²) ∂²Y/∂ϕ² − 2ζ ∂Y/∂ζ + (1 − ζ²) ∂²Y/∂ζ² = −λY.
We assume Y to be a product of the azimuth harmonics m (ϕ) from above and
undefined zenith harmonics (ζ )
Y = m
,
(A.23)
which yields a differential equation (∂ → d) only in ζ after inserting
−m 2 m
d
−m 2
m − 2ζ m
1 − ζ2
dζ
+ (1 − ζ 2 )m
d2
dζ 2
= −λm .
d2
d2 ϕ m
=
And after dividing by Φ_m, we obtain the associated Legendre differential equation

−m²/(1 − ζ²) Θ − 2ζ dΘ/dζ + (1 − ζ²) d²Θ/dζ² = −λ Θ,
[(1 − ζ²) d²/dζ² − 2ζ d/dζ + λ − m²/(1 − ζ²)] Θ = 0.  (A.24)

A.3.6 Zenithal Solution: Associated Legendre Differential Equation
The associated Legendre differential equation (written in x and y for mathematical simplicity) is

(1 − x²) y″ − 2x y′ + [λ − m²/(1 − x²)] y = 0,

or after gathering the derivatives

[(1 − x²) y′]′ + [λ − m²/(1 − x²)] y = 0.
Simplifying the differential equation by 1/(1 − x²). In the associated Legendre differential equation, we would like to get rid of the denominator 1/(1 − x²). In this case, it is typical to substitute y = (1 − x²)^α v and try out which α succeeds. For insertion into the differential equation, the derivative of y is

y′ = −α(1 − x²)^{α−1} 2x v + (1 − x²)^α v′ = −2α(1 − x²)^{α−1} x v + (1 − x²)^α v′,

and the second-order derivative term is

[(1 − x²) y′]′ = [−2α(1 − x²)^α x v + (1 − x²)^{α+1} v′]′
  = 4α²(1 − x²)^{α−1} x² v − 2α(1 − x²)^α v − 2α(1 − x²)^α x v′ − 2(α+1)(1 − x²)^α x v′ + (1 − x²)^{α+1} v″
  = (1 − x²)^α [ 4α²/(1 − x²) x² v − 2α v − 2(2α+1) x v′ + (1 − x²) v″ ].
Together with the term [λ − m²/(1 − x²)] y, the associated Legendre differential equation becomes

(1 − x²)^α [ 4α²/(1 − x²) x² v − 2α v − 2(2α+1) x v′ + (1 − x²) v″ + (λ − m²/(1 − x²)) v ] = 0,
(4α²x² − m²)/(1 − x²) v + (λ − 2α) v − 2(2α+1) x v′ + (1 − x²) v″ = 0.

We see that the term (4α²x² − m²)/(1 − x²) entirely loses its denominator by α = m/2, as (m²x² − m²)/(1 − x²) = −m², which fixes the substitution

y = √(1 − x²)^m v.  (A.25)

Note that for rotationally symmetric solutions around the Cartesian z coordinate, the choice of m = 0 would ensure a constant azimuthal part Φ_m = const. Re-inserting x = ζ = cos ϑ, the preceding term √(1 − cos²ϑ)^m = sin^m ϑ is understandably required to represent shapes that aren't rotationally symmetric around z, but around any other, freely rotated axis, for which we also required the sinusoids in 2D. The differential equation for v = v(cos ϑ) is

(1 − x²) v″ − 2(m+1) x v′ + [λ − m(m+1)] v = 0.  (A.26)
Still, the above equation is singular at x = ±1, which means that the second-derivative term multiplied by (1 − x²) vanishes there, locally rendering the differential equation into a first-order differential equation. Instead of the more comprehensive Frobenius method we keep it simple: Desired spherical polynomials Y_n^m = Φ_m Θ_n^m = P_n(θ_x, θ_y, θ_z), whose azimuthal factor Φ_m(ϕ) is an mth-order polynomial in θ_x/√(1 − θ_z²) and θ_y/√(1 − θ_z²), imply that Θ_n^m must contain √(1 − θ_z²)^m times an (n−m)th-order polynomial in θ_z to be polynomial and of nth order; in condensed notation this is y = √(1 − x²)^m Σ_{k=0}^{n−m} a_k x^k, see also [4].
Power series for v. With v = Σ_{k=0}^{∞} a_k x^k, we get after inserting and deriving

(1 − x²) Σ_{k=2}^{∞} k(k−1) a_k x^{k−2} − 2(m+1) x Σ_{k=1}^{∞} k a_k x^{k−1} + [λ − m(m+1)] Σ_{k=0}^{∞} a_k x^k = 0,
Σ_{k=2}^{∞} k(k−1) a_k x^{k−2} − Σ_{k=2}^{∞} k(k−1) a_k x^k − 2(m+1) Σ_{k=1}^{∞} k a_k x^k + [λ − m(m+1)] Σ_{k=0}^{∞} a_k x^k = 0.

For k ≥ 2, all sum terms are present and the comparison of coefficients for the kth power yields:

(k+1)(k+2) a_{k+2} = [k(k−1) + 2(m+1)k + m(m+1) − λ] a_k,
a_{k+2} = [k(k+2m+1) + m(m+1) − λ]/[(k+1)(k+2)] a_k.
Typically for such a two-step recurrence, the two starting conditions a₀ = 1, a₁ = 0 and a₀ = 0, a₁ = 1 yield a pair of linearly independent solutions (even and odd). If the series in x should converge, it will most certainly do so when v is polynomial and stops at some order. To design y to be of some arbitrary finite order n, we take into account that √(1 − x²)^m is of mth order already, so the polynomial v must be of (n−m)th order, and |m| ≤ n. The series is forced to stop at the coefficient a_k for k = n − m if the numerator is forced to become zero by a suitably chosen λ, thus λ = (n−m)(n+m+1) + m(m+1) = n(n+1). Corresponding to the termination either at an even or odd k = n − m, the even (a₀ = 1, a₁ = 0) or odd (a₀ = 0, a₁ = 1) starting conditions must be chosen. The otherwise wrong-parity solution is an infinite series [5, Eq. 3.2.45] whose convergence radius R indicates singularities at x = ±1,

R = lim_{k→∞} a_k/a_{k+2} = lim_{k→∞} (k+1)(k+2)/[k(k+2m+1) + m(m+1) − n(n+1)] = lim_{k→∞} (k² + ⋯)/(k² + ⋯) = 1.  (A.27)
Using λ = n(n+1) and writing the differentials in condensed form, the defining differential equations for the associated Legendre functions P_n^m (m is no exponent but a second index) and their polynomial part v_n^m become

d/dx [(1 − x²) d/dx P_n^m] + [n(n+1) − m²/(1 − x²)] P_n^m = 0,  (A.28)
(1 − x²)^{−m} d/dx [(1 − x²)^{m+1} d/dx v_n^m] + [n(n+1) − m(m+1)] v_n^m = 0.  (A.29)
Orthogonality of associated Legendre functions. The resulting associated Legendre differential equation

[(1 − x²) P_n^m′]′ + [n(n+1) − m²/(1 − x²)] P_n^m = 0

yields a sequence of finite-order functions P_n^m with the order n ∈ ℕ₀ and |m| ≤ n. Before even defining these functions, we can prove their orthogonality, ∫_{−1}^{1} P_n^m P_l^m dx = 0 for n ≠ l. This means no product of a pair of associated Legendre functions of different indices n ≠ l produces any constant part on x ∈ [−1; 1], and P_n^m and P_l^m do not contain shapes of the respective other function. This is important to uniquely decompose shapes and to define transformation integrals. We multiply the differential equation with P_l^m and integrate it over x:

∫_{−1}^{1} [(1 − x²) P_n^m′]′ P_l^m dx + ∫_{−1}^{1} [n(n+1) − m²/(1 − x²)] P_n^m P_l^m dx = 0.
Integration by parts of the first integral yields

∫_{−1}^{1} [(1 − x²) P_n^m′]′ P_l^m dx = (1 − x²) P_n^m′ P_l^m |_{−1}^{1} − ∫_{−1}^{1} (1 − x²) P_n^m′ P_l^m′ dx,  (A.30)

where the boundary part vanishes because of (1 − x²) = 0 at the endpoints x = ±1, where P_n^m′ and P_l^m are finite. We get

∫_{−1}^{1} (1 − x²) P_n^m′ P_l^m′ dx = ∫_{−1}^{1} [n(n+1) − m²/(1 − x²)] P_n^m P_l^m dx.
We could have arrived at an alternative expression, with the only difference in l(l+1) instead of n(n+1),

∫_{−1}^{1} (1 − x²) P_l^m′ P_n^m′ dx = ∫_{−1}^{1} [l(l+1) − m²/(1 − x²)] P_l^m P_n^m dx,

if we had started by integrating the differential equation of P_l^m over P_n^m instead. The difference of both equations is

[n(n+1) − l(l+1)] ∫_{−1}^{1} P_n^m P_l^m dx = 0,

and the scalar in brackets only vanishes for n = l. For the equation to hold at other n ≠ l, we conclude that the associated Legendre functions must be orthogonal, ∫_{−1}^{1} P_n^m P_l^m dx = 0. (Orthogonality need not hold for different m, as Φ_m achieves this orthogonality in azimuth.)
Solving for the polynomial part of associated Legendre functions. To solve the differential equation for the polynomial part v_n^m in a way that arrives at the elegant Rodrigues formula, we first play with a test function

u_n = (1 − x²)^n, differentiated u_n′ = −2n x (1 − x²)^{n−1} = −2n (1 − x²)^{−1} x u_n.

We may write its derivative as the differential equation

(1 − x²) u_n′ + 2n x u_n = 0

and derive it l times by the Leibniz rule (f g)^{(n)} = Σ_{k=0}^{n} (n choose k) f^{(k)} g^{(n−k)} for repeated differentiation of products, with the binomial coefficient (n choose k) = n!/(k!(n−k)!) and f^{(k)} = d^k f/dx^k for simplicity. The few non-zero derivatives of x and (1 − x²) simplify differentiation: x′ = 1, (1 − x²)′ = −2x, (1 − x²)″ = −2:

(1 − x²) u_n^{(l+1)} + l(−2x) u_n^{(l)} + [(l−1)l/2](−2) u_n^{(l−1)} + 2n x u_n^{(l)} + 2n l u_n^{(l−1)} = 0,
(1 − x²) u_n^{(l+1)} − 2(l−n) x u_n^{(l)} + l(2n−l+1) u_n^{(l−1)} = 0.

This equation matches (1 − x²) v_n^m″ − 2(m+1) x v_n^m′ + [n(n+1) − m(m+1)] v_n^m = 0 by matching the coefficients l − n = m + 1, hence l = m + n + 1, which nicely implies l(2n−l+1) = n(n+1) − m(m+1):

(1 − x²) u_n^{(m+n+2)} − 2(m+1) x u_n^{(m+n+1)} + [n(n+1) − m(m+1)] u_n^{(m+n)} = 0.

We therefore find the solutions v_n^m = u_n^{(n+m)} = d^{n+m}/dx^{n+m} (1 − x²)^n, yielding y_n^m = √(1 − x²)^m d^{n+m}/dx^{n+m} (1 − x²)^n.
Rodrigues formula. By the above, the Rodrigues formula for the associated Legendre functions P_n^m becomes

P_n^m = (−1)^{n+m}/(2^n n!) √(1 − x²)^m d^{n+m}/dx^{n+m} (1 − x²)^n,
or P_n^m = (−1)^m √(1 − x²)^m d^m/dx^m P_n, with P_n = (−1)^n/(2^n n!) d^n/dx^n (1 − x²)^n,  (A.31)

and P_n = P_n^0 are the Legendre polynomials. The Legendre polynomials are normalized to P_n(1) = 1 by the factor (−1)^n/(2^n n!). Because (1 − x²) is zero at x = 1 with any positive integer exponent, only the part of its n-fold derivative that exclusively affects the power of (1 − x²)^n for n times is responsible for its value there: n!(−2x)^n (1 − x²)^0 |_{x=1} = n! 2^n (−1)^n. The scaling of the associated Legendre functions with m > 0 is somewhat more arbitrary in sign and value.
Indices n and m. The boundaries for the index m ∈ ℤ of the Legendre functions are typically −n ≤ m ≤ n; however, due to the shift of the eigenvalue by m²/(1 − x²), functions for positive and negative m are linearly dependent. We observe this by inspecting the highest-order terms in [6]:

2^n n! √(1 − x²)^m P_n^m = (−1)^{n+m} (1 − x²)^m d^{n+m}/dx^{n+m} (1 − x²)^n = x^{2m} d^{n+m}/dx^{n+m} x^{2n} − ⋯ = (2n)!/(n−m)! x^{n+m} − ⋯,
2^n n! √(1 − x²)^m P_n^{−m} = (−1)^{n+m} d^{n−m}/dx^{n−m} (1 − x²)^n = (−1)^m d^{n−m}/dx^{n−m} x^{2n} − ⋯ = (−1)^m (2n)!/(n+m)! x^{n+m} − ⋯

⟹ P_n^{−m} = (−1)^m (n−m)!/(n+m)! P_n^m,  (A.32)

and to avoid confusion, it is convenient to only use m ≥ 0, or |m|, to evaluate the associated Legendre functions.
Alternative definition: three-term recurrence. Any polynomial P̃_n of the order n can be decomposed into Legendre polynomials, P̃_n = Σ_{i=0}^{n} c_i P_i, and the Legendre polynomial P_j is orthogonal to all those Legendre polynomials, ∫_{−1}^{1} P̃_n P_j dx = 0, if j > n. With this knowledge it is interesting to describe ∫_{−1}^{1} (x P_i) P_j dx. As (x P_i) is of (i+1)th order, the integral must vanish for j > i + 1. Because of commutativity it equals ∫_{−1}^{1} P_i (x P_j) dx, and (x P_j) being of (j+1)th order, it also vanishes for i > j + 1. Hereby, re-expansion of x P_n can maximally use three terms, x P_n = α P_{n−1} + γ P_n + β P_{n+1}. In fact only two terms remain, as P_{2k} are even functions on x ∈ [−1; 1] and P_{2k+1} are odd, thus orthogonal. The product x P_n changes the parity of P_n, leaving x P_n = α P_{n−1} + β P_{n+1}. At x = 1 all polynomials were normalized to P_i(1) = 1, therefore evaluation at x = 1 leaves 1 = α + β, so α = 1 − β, hence

x P_n = β_n P_{n+1} + (1 − β_n) P_{n−1}.
As also the associated Legendre functions P_n^m for a specific m are orthogonal, the recurrence is more general,

x P_n^m = β_n^m P_{n+1}^m + (1 − β_n^m) P_{n−1}^m.

To determine the coefficient β_n^m, we only need to find out how the highest-power coefficients x^{n−m+1} of the polynomial parts in x P_n^m and P_{n+1}^m are related. We see this after inserting P_n^m = √(1 − x²)^m (−1)^{n+m}/(2^n n!) v_n^m and division by √(1 − x²)^m (−1)^{n+m}/(2^n n!), which leaves a recurrence for the polynomial part,

x v_n^m = −β_n^m/(2(n+1)) v_{n+1}^m − 2n (1 − β_n^m) v_{n−1}^m,

whose left-hand side and first right-hand term are of the order n − m + 1, and whose last term is of the order n − m − 1. Of the highest powers x^{n−m+1} in both x v_n^m and v_{n+1}^m, the coefficients c^m_{n,n−m} and c^m_{n+1,n−m+1} define

β_n^m = −2(n+1) c^m_{n,n−m}/c^m_{n+1,n−m+1}.
To find it, we binomially expand (1 − x²)^n = (−1)^n Σ_{k=0}^{n} (−1)^k (n choose k) x^{2(n−k)}:

v_n^m/(−1)^n = d^{n+m}/dx^{n+m} Σ_{k=0}^{n} (−1)^k (n choose k) x^{2(n−k)} = Σ_{k=0}^{⌊(n−m)/2⌋} (n choose k) (2n−2k)! (−1)^k/(n−m−2k)! x^{n−m−2k},

so that with k = 0 we can find c^m_{n,n−m} = (−1)^n (2n)!/(n−m)! for the highest-power coefficient of v_n^m. Accordingly, the coefficient of the recurrence is

β_n^m = 2(n+1) (n−m+1)/[(2n+1)(2n+2)] = (n−m+1)/(2n+1),

hence with 1 − β_n^m = (n+m)/(2n+1), and x P_n^m = (n−m+1)/(2n+1) P_{n+1}^m + (n+m)/(2n+1) P_{n−1}^m,
we can construct P_n^m recursively by

P_{n+1}^m = (2n+1)/(n−m+1) x P_n^m − (n+m)/(n−m+1) P_{n−1}^m.  (A.33)

The start value is P_n^n = (−1)^{2n}/(2^n n!) √(1 − x²)^n d^{2n}/dx^{2n} (1 − x²)^n = (−1)^n (2n)!/(2^n n!) √(1 − x²)^n, and for n = m, the term P_{n−1}^m is excluded.
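The recurrence (A.33) with this start value can be checked against a reference implementation; a minimal sketch using scipy's lpmv, which follows the same Condon–Shortley sign convention as the Rodrigues formula above:

```python
import numpy as np
from scipy.special import lpmv

def assoc_legendre(n, m, x):
    """P_n^m(x) via the three-term recurrence (A.33),
    started from P_m^m = (-1)^m (2m-1)!! sqrt(1-x^2)^m."""
    p_mm = (-1.0)**m * np.prod(np.arange(1, 2 * m, 2)) * np.sqrt(1 - x**2)**m
    if n == m:
        return p_mm
    p_prev, p_curr = p_mm, (2 * m + 1) * x * p_mm  # P_{m+1}^m, n = m case of (A.33)
    for k in range(m + 1, n):
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr
                                  - (k + m) * p_prev) / (k - m + 1)
    return p_curr

x = np.linspace(-0.99, 0.99, 7)
ok = all(np.allclose(assoc_legendre(n, m, x), lpmv(m, n, x))
         for n in range(6) for m in range(n + 1))
print(ok)  # True
```

All orders up to n = 5 agree with scipy's lpmv to machine precision.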
Normalization. A unity square integral (orthonormalization) simplifies the definition of transform integrals. We would like to obtain the corresponding factor N_n^m with

∫_{−1}^{1} (P_n^m N_n^m)² dx = 1.

Normalization for m = 0 is easy to find by repeated integration by parts:

2^{2n} n!²/(N_n)² = 2^{2n} n!² ∫_{−1}^{1} P_n² dx = ∫_{−1}^{1} [(1 − x²)^n]^{(n)} [(1 − x²)^n]^{(n)} dx
  = [(1 − x²)^n]^{(n−1)} [(1 − x²)^n]^{(n)} |_{−1}^{1} − ∫_{−1}^{1} [(1 − x²)^n]^{(n−1)} [(1 − x²)^n]^{(n+1)} dx
  = ⋯ = (−1)^n ∫_{−1}^{1} (1 − x²)^n [(1 − x²)^n]^{(2n)} dx = ∫_{−1}^{1} (1 − x²)^n (2n)! dx = (2n)! ∫₀^{π} sin^{2n} ϑ sin ϑ dϑ,

where the boundary term vanishes at x = ±1. With the integral ∫₀^{π} sin^{2n+1} ϑ dϑ = 2 (2n)!!/(2n+1)!! = 2 · 2^{2n} n!²/(2n+1)!, this is (N_n)^{−2} = 2/(2n+1).
m
m
−m
. For Nn , a trick to insert the relation between Pn and Pn is used [6], and
2n+1
integration by parts until the differentials are of the same order
$ 1
$ 1
(−1)m (n − m)! −m
m P m dx =
Pn dx
=
P
Pnm
n n
m
2
(n + m)!
(Nn )
−1
−1
1
186
Appendix
$
(n+m) (n−m)
1 (−1)m (n − m)! 1 = 2n 2
(1 − x 2 )
(1 − x 2 )
dx = · · ·
(n + m)!
2 n!
−1
$
(n−m) (n−m)
1 2 (n + m)!
(n − m)! 1
(1 − x 2 )
(1 − x 2 )
dx =
=
2n
2
(n + m)! −1 2 n!
2n + 1 (n − m)!
'
⇒ Nnm = (−1)m
=1/(Nn )2
(2n + 1) (n − m)!
2
(n + m)!
(A.34)
The (−1)^m can be excluded if not used in the Rodrigues formula. (It is always a wise idea to check and compare signs, as conventions may differ … in practice, (−1)^m is a rotation around z by 180°.)
A.3.7 Spherical Harmonics

With all the above definitions, we obtain the fully normalized spherical harmonics

Y_n^m(ϕ, ϑ) = N_n^{|m|} P_n^{|m|}(cos ϑ) Φ_m(ϕ).  (A.35)
Orthonormality. They are orthonormal when integrated over the sphere S²:

∫_{S²} Y_n^m Y_{n′}^{m′} d cos ϑ dϕ = δ_{nn′} δ_{mm′}.  (A.36)
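The orthonormality can be verified by quadrature, e.g. Gauss–Legendre in ζ = cos ϑ and a uniform grid in ϕ. A sketch assuming numpy/scipy; real_sh is our own helper built from lpmv, (A.34), and (A.11):

```python
import math
import numpy as np
from scipy.special import lpmv
from numpy.polynomial.legendre import leggauss

def real_sh(n, m, zeta, phi):
    """Real Y_n^m(phi, theta) with zeta = cos(theta), following (A.35)."""
    am = abs(m)
    # N_n^m from (A.34); its (-1)^m cancels the Condon-Shortley phase in lpmv.
    norm = (-1.0)**am * math.sqrt((2 * n + 1) * math.factorial(n - am)
                                  / (2 * math.factorial(n + am)))
    if m < 0:
        azi = np.sin(am * phi) / np.sqrt(np.pi)
    elif m == 0:
        azi = np.full_like(phi, 1 / np.sqrt(2 * np.pi))
    else:
        azi = np.cos(m * phi) / np.sqrt(np.pi)
    return norm * lpmv(am, n, zeta) * azi

# Quadrature over the sphere: Gauss-Legendre in zeta, uniform in phi.
zq, wq = leggauss(16)
phiq = np.arange(64) * 2 * np.pi / 64
Z, P = np.meshgrid(zq, phiq)
W = np.tile(wq, (64, 1)) * 2 * np.pi / 64

def inner(n1, m1, n2, m2):
    return np.sum(W * real_sh(n1, m1, Z, P) * real_sh(n2, m2, Z, P))

print(np.isclose(inner(2, 1, 2, 1), 1.0), np.isclose(inner(2, 1, 3, 1), 0.0))
# True True
```

Both quadratures are exact for the band-limited integrands involved, so (A.36) is reproduced to machine precision.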
Transform integral. Because of their completeness in the Hilbert space, any square-integrable function g(ϕ, ϑ) can be decomposed by

g(ϕ, ϑ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} γ_nm Y_n^m(ϕ, ϑ).  (A.37)

From a known function g(ϕ, ϑ), the coefficients are obtained by integrating g with another spherical harmonic Y_n^m over the unit sphere S², ∫_{−1}^{1} d cos ϑ ∫₀^{2π} dϕ. For a simple notation, we gather the two variables in a direction vector θ = [cos ϕ sin ϑ, sin ϕ sin ϑ, cos ϑ]^T and write

∫_{S²} g(θ) Y_n^m dθ = Σ_{n′=0}^{∞} Σ_{m′=−n′}^{n′} γ_{n′m′} ∫_{S²} Y_{n′}^{m′} Y_n^m dθ = γ_nm,  (A.38)

using ∫_{S²} Y_{n′}^{m′} Y_n^m dθ = δ_{nn′} δ_{mm′}.
Parseval's theorem. Due to orthonormality, the integral norm of any pattern g(θ) composed as Σ_{n=0}^{∞} Σ_{m=−n}^{n} γ_nm Y_n^m(θ) is equivalent to

∫_{S²} |g(θ)|² dθ = Σ_{n=0}^{∞} Σ_{m=−n}^{n} |γ_nm|²,  (A.39)

because Σ_{n,n′,m,m′} γ_nm γ*_{n′m′} ∫_{S²} Y_n^m(θ) Y_{n′}^{m′}(θ) dθ = Σ_{n,n′,m,m′} γ_nm γ*_{n′m′} δ_{nn′} δ_{mm′}.
3D panning functions: Dirac delta on the sphere. An infinitely narrow range around the desired direction θ_s can be described by limiting the dot product θ_s^T θ > cos ε → 1. A unit-surface Dirac delta distribution δ(1 − θ_s^T θ) can be described as

δ(1 − θ_s^T θ) = 1/(2π) · { lim_{ε→0} 1/(1 − cos ε), for arccos θ_s^T θ < ε;  0, otherwise }.  (A.40)

And its coefficients are found by the transformation integral

γ_nm = ∫_{S²} δ(1 − θ_s^T θ) Y_n^m(θ) dθ = Y_n^m(θ_s) · 1/(2π) ∫₀^{2π} dϕ lim_{ε→0} 1/(1 − cos ε) ∫_{cos ε}^{1} dζ = Y_n^m(θ_s).  (A.41)
Typically, a finite-order panning function with n ≤ N employs a weight a_n to reduce side lobes,

g_N(θ) = Σ_{n=0}^{N} Σ_{m=−n}^{n} a_n Y_n^m(θ_s) Y_n^m(θ).  (A.42)
Assuming the panning direction is θ_s = [0, 0, 1]^T, we get the axisymmetric panning function, with Y_n^0 = √((2n+1)/(4π)) P_n:

g_N(ϑ) = 1/(2π) Σ_{n=0}^{N} (2n+1)/2 a_n P_n(cos ϑ).  (A.43)
We can evaluate its E measure by integrating g_N² over the sphere,

E = ∫₀^{2π} dϕ ∫_{−1}^{1} g_N(ζ)² dζ = 2π Σ_{i,j} 1/(2π)² (2i+1)/2 · (2j+1)/2 a_i a_j ∫_{−1}^{1} P_i P_j dζ = Σ_{n=0}^{N} (2n+1)/(4π) a_n²,  (A.44)

using the orthogonality ∫_{−1}^{1} P_i P_j dζ = 2/(2i+1) δ_ij.
The r_E measure is, because of the axisymmetry, perfectly aligned with z; therefore, its length is calculated by

r_E = 2π ∫_{−1}^{1} g_N(ζ)² ζ dζ / E
    = 1/(8π E) Σ_{i,j} (2i+1)(2j+1) a_i a_j ∫_{−1}^{1} P_i ζ P_j dζ
    = 1/(8π E) Σ_{i,j} (2i+1) a_i a_j ∫_{−1}^{1} P_i [(j+1) P_{j+1} + j P_{j−1}] dζ
    = 1/(4π E) Σ_{n=0}^{N} [n a_n a_{n−1} + (n+1) a_n a_{n+1}] = 1/(2π E) Σ_{n=1}^{N} n a_n a_{n−1},  (A.45)

using ζ P_j = [(j+1) P_{j+1} + j P_{j−1}]/(2j+1) and the orthogonality of the Legendre polynomials.
3D max-r_E. For the narrowest-possible spread, we maximize r_E, which we decompose into r_E = r̂_E/E, and we zero its derivative, as in 2D:

∂r_E/∂a_n = r̂_E′/E − r̂_E E′/E² = (1/E) [r̂_E′ − E′ r_E] = 0
⇒ n a_{n−1} + (n+1) a_{n+1} − (2n+1) a_n r_E = 0.  (A.46)

If we assume that a_n = P_n(ζ), we see by the recurrence (n+1) P_{n+1} + n P_{n−1} = (2n+1) P_n ζ that

(n+1) P_{n+1} + n P_{n−1} = (2n+1) P_n ζ = (2n+1) P_n r_E,  (A.47)

so that r_E = ζ and a_n = P_n(ζ) = P_n(r_E). We maximize r_E under the constraint that P_{N+1}(r_E) = 0. Therefore, r_E must be as close to 1 as possible, and be a zero of the Legendre polynomial P_{N+1}. It can be discovered by a root-finding algorithm in MATLAB, e.g. Newton–Raphson, when the function P_n is implemented. In [7], the useful approximation r_E = cos(137.9°/(N+1.51)) = cos(2.4062/(N+1.51)) was given.
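The exact max-r_E value and the approximation from [7] can be compared with numpy's Legendre tools (a sketch, not the book's MATLAB implementation):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

# Exact 3D max-rE: the largest zero of P_{N+1}, versus the approximation
# rE = cos(137.9 deg/(N+1.51)).
errs = []
for N in range(1, 6):
    coef = np.zeros(N + 2)
    coef[N + 1] = 1.0                       # Legendre-basis coefficients of P_{N+1}
    r_exact = max(Legendre(coef).roots())   # largest zero of P_{N+1}
    r_approx = np.cos(np.deg2rad(137.9) / (N + 1.51))
    errs.append(abs(r_exact - r_approx))
print(max(errs) < 5e-3)  # True
```

For 1 ≤ N ≤ 5 the approximation stays within a few thousandths of the exact Legendre zero.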
Squared-norm mirror/rotation invariance. The norm of any pattern a(θ) is invariant under an orthogonal coordinate transform (rotation/mirror) θ̂ = R θ with R^T R = I,

∫_{S²} a²(θ) dθ = ∫_{S²} b²(θ) dθ, with b(θ) = a(R θ).  (A.48)

The norm equivalence of the corresponding spherical harmonics coefficients α_nm and β_nm follows from Parseval's theorem,

Σ_{n=0}^{∞} Σ_{m=−n}^{n} |α_nm|² = Σ_{n=0}^{∞} Σ_{m=−n}^{n} |β_nm|².  (A.49)

In vector notation of the coefficients, i.e. α = [α_00, …, α_NN]^T and β = [β_00, …, β_NN]^T, this is ‖α‖² = ‖β‖². To fulfill this equivalence, both vectors are related by an orthogonal matrix β = Q α with Q^T Q = I, and hence ‖β‖² = β^T β = α^T Q^T Q α = α^T α = ‖α‖². Moreover, rotation/mirroring neither creates components of higher nor lower orders, so that Q must be block structured,

Q = diag{Q₀, Q₁, Q₂, …}.  (A.50)

The order subspaces in n therefore stay de-coupled, so that the coefficient vectors for every order subspace n are related and norm-equivalent under mirror/rotation operations:

‖α_n‖² = ‖β_n‖², β_n = Q_n α_n, Q_n^T Q_n = I_{2n+1}.  (A.51)
Pseudo-allpass character of the Dirac delta. Dirac delta distributions δ(θ^T θ_s − 1) yield the coefficients Y_n^m(θ_s), and due to rotation invariance they yield constant energy in every spherical harmonic order n, regardless of the aiming θ_s. One can determine the norm for zenithal aiming ϑ = 0, i.e. θ_z = [0, 0, 1]^T, yielding a non-zero coefficient only for m = 0:

Y_n^m(θ_z) = √((2n+1)/(4π)) P_n(1) δ_m = √((2n+1)/(4π)) δ_m.

Because of the rotation invariance we recognize a pseudo-allpass character (Unsöld's theorem) of the spherical harmonics of any order n:

Σ_{m=−n}^{n} |Y_n^m(θ_s)|² = Σ_{m=−n}^{n} |Y_n^m(θ_z)|² = (2n+1)/(4π) = (2n+1) |Y_0^0(θ_s)|².

For encoded single-direction Ambisonic signals α_nm(t), this implies

Σ_{m=−n}^{n} |α_nm(t)|² = (2n+1) |α_00(t)|².  (A.52)
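Unsöld's theorem holds for complex-valued spherical harmonics as well, so it can be checked directly with scipy's sph_harm (note scipy's argument order: order m, degree n, azimuth, zenith); the aiming angles below are arbitrary:

```python
import numpy as np
from scipy.special import sph_harm

# Per-order energy of a Dirac delta is direction-independent:
# sum_m |Y_n^m(theta_s)|^2 = (2n+1)/(4*pi).
az, zen = 1.3, 0.7   # arbitrary azimuth/zenith aiming angles
ok = all(
    np.isclose(sum(abs(sph_harm(m, n, az, zen))**2 for m in range(-n, n + 1)),
               (2 * n + 1) / (4 * np.pi))
    for n in range(5))
print(ok)  # True
```

Changing az and zen leaves every per-order sum unchanged, as (A.52) predicts.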
Expected norm in the diffuse field. An ideal diffuse sound field is composed of directional signals a(θ_s, t) from all directions, with no correlation for signals from different directions, E{a(θ₁, t) a(θ₂, t)} = σ_a² δ(θ₁^T θ₂ − 1). Its coefficients are obtained by the integral over the directions

α_nm(t) = ∫_{S²} a(θ_s, t) Y_n^m(θ_s) dθ_s,  (A.53)

and we can show that not only the directional signals, but also the spherical harmonic coefficients are orthogonal in expectation, using E{a(θ₁, t) a(θ₂, t)} = σ_a² δ(θ₁^T θ₂ − 1) and the orthonormality ∫_{S²} Y_n^m Y_{n′}^{m′} dθ = δ_{nn′} δ_{mm′} of the spherical harmonics:

E{α_nm(t) α_{n′m′}(t)} = ∫_{S²} ∫_{S²} E{a(θ₁, t) a(θ₂, t)} Y_n^m(θ₁) Y_{n′}^{m′}(θ₂) dθ₁ dθ₂ = σ_a² ∫_{S²} Y_n^m(θ) Y_{n′}^{m′}(θ) dθ = σ_a² δ_{nn′} δ_{mm′}.  (A.54)

In a perfectly diffuse field, we therefore expect the same norm in every spherical harmonic component per frequency band. We could reformulate this to E{|α_nm(t)|²} = E{|α_00(t)|²}; however, the temporal disjointness assumption of SDM leaves only drastically thinned-out content in the individual higher-order spherical harmonics. To cover the available temporal information from all (2n+1) spherical harmonic signals within each order n, and for a formulation similar to the single-direction case, we may re-formulate

Σ_{m=−n}^{n} E{|α_nm(t)|²} = (2n+1) E{|α_00(t)|²}.  (A.55)
Spherical convolution. By the argumentation used above to prove rotation invariance, we can argue that isotropic filtering of spherical patterns is invariant under rotation, and must therefore depend only on the order n. Spherical convolution is defined in [8] by the coefficients β_nm of a function b(θ) convolved with the coefficients α_n of a rotationally symmetric shape a(θ) = a(θ_z):

γ_nm = α_n β_nm.  (A.56)
Spherical cap function. A rotationally symmetric spherical cap function of aperture ±α/2, centered around ϑ = 0, briefly θ_z, can be written in terms of a unit step u. We find the shape coefficients w_n for its spherical harmonic decomposition by

u(cos ϑ − cos(α/2)) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} w_n Y_n^m(θ) Y_n^m(θ_z) = Σ_{n=0}^{∞} w_n P_n(cos ϑ) (2n+1)/(4π),  (A.57)

where Y_n^m(θ_z) = √((2n+1)/(4π)) δ_m and P_n^0 = P_n was used. The coefficients w_n are obtained by integration of both sides over Legendre polynomials, ∫_{−1}^{1} P_{n′}(cos ϑ) … d cos ϑ, and for the right-hand side using their orthogonality ∫_{−1}^{1} P_n(ζ) P_{n′}(ζ) dζ = 2/(2n+1) δ_{nn′}, leaving w_n · 2/(2n+1) · (2n+1)/(4π) = w_n/(2π), so that

w_n = 2π ∫_{cos(α/2)}^{1} P_n(x) dx.
The integral is solved using 2^n n! P_n = d^n v^n/dx^n with v = (x² − 1), after replacing the innermost derivative by d/dx = (dv/dx) d/dv = 2x d/dv, and Leibniz' rule for repeated derivatives:

d^n v^n/dx^n = d^{n−1}/dx^{n−1} [2x dv^n/dv] = d^{n−1}(2x n v^{n−1})/dx^{n−1}
  = 2n [ (n−1 choose 0) x d^{n−1} v^{n−1}/dx^{n−1} + (n−1 choose 1) (dx/dx) d^{n−2} v^{n−1}/dx^{n−2} ]
  = 2n [ x d^{n−1} v^{n−1}/dx^{n−1} + (n−1) d^{n−2} v^{n−1}/dx^{n−2} ].

We may increase n by one, observe that the last expression has one fewer differential, thus is an integrated version, and obtain after re-inserting 2^n n! P_n = d^n v^n/dx^n:

2^{n+1} (n+1)! P_{n+1} = 2(n+1) [ 2^n n! x P_n + n 2^n n! ∫ P_n dx ],
∫ P_n dx = (P_{n+1} − x P_n)/n + C.  (A.58)

With the definite integration limits x₀ = cos(α/2) and 1, ∫_{x₀}^{1} P_n dx only depends on the lower boundary, as P_{n+1}(1) − 1·P_n(1) = 0:

∫_{x₀}^{1} P_n(x) dx = −[P_{n+1}(x₀) − x₀ P_n(x₀)]/n,  (A.59)

w_n = −2π [P_{n+1}(cos(α/2)) − cos(α/2) P_n(cos(α/2))]/n, for n > 0,  (A.60)

and w₀ = 2π ∫_{cos(α/2)}^{1} dx = 2π (1 − cos(α/2)) for n = 0. The recurrence (2n+1) x P_n − (n+1) P_{n+1} = n P_{n−1} alternatively yields, for n > 0,

w_n = 2π [P_{n−1}(cos(α/2)) − cos(α/2) P_n(cos(α/2))]/(n+1) = 2π [P_{n−1}(cos(α/2)) − P_{n+1}(cos(α/2))]/(2n+1).  (A.61)
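The closed form (A.61) can be validated against the defining integral; a sketch using scipy with an assumed 90° cap aperture:

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.integrate import quad

# Cap weights w_n = 2*pi*(P_{n-1}(c) - P_{n+1}(c))/(2n+1), c = cos(alpha/2),
# checked against the defining integral w_n = 2*pi * int_c^1 P_n(x) dx.
alpha = np.deg2rad(90.0)
c = np.cos(alpha / 2)
ok = True
for n in range(1, 8):
    w_closed = 2 * np.pi * (eval_legendre(n - 1, c)
                            - eval_legendre(n + 1, c)) / (2 * n + 1)
    w_num, _ = quad(lambda x: eval_legendre(n, x), c, 1)
    ok = ok and np.isclose(w_closed, 2 * np.pi * w_num)
print(ok)  # True
```

Both routes agree for all tested orders, so either form of (A.61) can be used in practice.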
A.4 Encoding to SH and Decoding to SH
Mode-matching decoder: L loudspeakers driven by the weights g_l and given by their directions {θ_l} produce a pattern f(θ) linearly composed of Dirac deltas,

f(θ) = Σ_{l=1}^{L} δ(θ^T θ_l − 1) g_l  (A.62)
     = Σ_{n=0}^{∞} Σ_{m=−n}^{n} Y_n^m(θ) Σ_{l=1}^{L} Y_n^m(θ_l) g_l = y(θ)^T Y g,

in an order-unlimited representation. The vector y = [Y_n^m(θ)]_{nm} contains all the spherical harmonics 0 ≤ n ≤ ∞, −n ≤ m ≤ n, in a suitable order, e.g. the Ambisonic Channel Number (ACN) n² + n + m, and the matrix Y = [y(θ_l)]_l contains the spherical harmonic coefficient vectors of every loudspeaker. Obviously, the spherical harmonic coefficients synthesized by the loudspeakers are φ = Y g, so that

f(θ) = y(θ)^T Y g = y(θ)^T φ.

With L loudspeakers, at most (N+1)² ≤ L spherical harmonics can be controlled. Therefore control typically restricts to the under-determined Nth-order subspace

φ_N = Y_N g,

in which we can synthesize any coefficient vector φ_N. To get a finite and well-determined solution with the exceeding and arbitrary degrees of freedom in g, the least-squares solution for g is searched under the constraint

min ‖g‖²  subject to: φ_N = Y_N g,  (A.63)

yielding the cost function with the Lagrange multipliers λ,

J(g, λ) = g^T g + (φ_N − Y_N g)^T λ.

For the optimum, the derivative of J with respect to g is zeroed, and so is the one with respect to λ:

∂J/∂g = 2 g_opt − Y_N^T λ = 0,  ∂J/∂λ = φ_N − Y_N g = 0.

The first equation yields g_opt = ½ Y_N^T λ, and the second the original constraint φ_N = Y_N g, into which only the optimal g can be inserted, yielding φ_N = Y_N (½ Y_N^T λ_opt). Multiplication by (½ Y_N Y_N^T)^{−1} from the left yields the multipliers λ_opt = 2 (Y_N Y_N^T)^{−1} φ_N, so that

g = Y_N^T (Y_N Y_N^T)^{−1} φ_N.  (A.64)

The solution is right-inverse to Y_N, i.e. Y_N [Y_N^T (Y_N Y_N^T)^{−1}] = I.
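The minimum-norm solution (A.64) can be illustrated with a random full-rank stand-in for Y_N (it coincides with numpy's pseudo-inverse of Y_N):

```python
import numpy as np

# Minimum-norm decoder g = Y_N^T (Y_N Y_N^T)^{-1} phi_N from (A.64),
# demonstrated with a random matrix standing in for Y_N.
rng = np.random.default_rng(1)
YN = rng.standard_normal((4, 9))     # (N+1)^2 = 4 rows, L = 9 loudspeakers
phiN = rng.standard_normal(4)
g = YN.T @ np.linalg.inv(YN @ YN.T) @ phiN
print(np.allclose(YN @ g, phiN))  # True: the constraint is met exactly
```

Among all g satisfying φ_N = Y_N g, this one has the smallest Euclidean norm, which is exactly what the Lagrange formulation enforces.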
Best-fit encoder by MMSE: When given L samples of a spherical function g(θ) at the locations {θ_l}, we can minimize the mean-square error (MMSE)

min Σ_{l=1}^{L} [ g(θ_l) − Σ_{n=0}^{N} Σ_{m=−n}^{n} Y_n^m(θ_l) γ_nm ]²

to find suitable spherical harmonic coefficients γ_nm. Using the matrix notation from above, this is

min ‖e‖² = min ‖g − Y_N^T γ_N‖²,  (A.65)

and we find by zeroing the derivative

∂(e^T e)/∂γ_N = 2 (∂e/∂γ_N)^T e = −2 Y_N e = 2 Y_N Y_N^T γ_N − 2 Y_N g = 0,
⟹ γ_N = (Y_N Y_N^T)^{−1} Y_N g.  (A.66)
The resulting matrix is left-inverse to the thin matrix Y_N^T, and can be written in terms of the more general pseudo-inverse (Y_N^T)^†.
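Likewise, the least-squares encoder (A.66) can be sketched with random data standing in for the sampled harmonics Y_N and the samples g:

```python
import numpy as np

# Least-squares encoder gamma_N = (Y_N Y_N^T)^{-1} Y_N g from (A.66).
rng = np.random.default_rng(2)
YN = rng.standard_normal((4, 12))    # (N+1)^2 = 4, L = 12 sample points
g = rng.standard_normal(12)
gamma = np.linalg.inv(YN @ YN.T) @ YN @ g
# The normal equations hold: the residual is orthogonal to the row space.
print(np.allclose(YN @ (g - YN.T @ gamma), 0))  # True
```

The vanishing projected residual is the defining property of the MMSE fit; gamma equals np.linalg.pinv(YN.T) @ g.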
A.5 Covariance Constraint for Binaural Ambisonic Decoding
The interaural covariance matrix is related to the expectation value of the auto-/cross-covariances of the left and right HRTFs:

R = ∫_{S²} [h_left(θ, ω); h_right(θ, ω)] [h*_left(θ, ω), h*_right(θ, ω)] dθ = ∫_{S²} h(θ, ω) h(θ, ω)^H dθ.  (A.67)

When specified in terms of spherical harmonic coefficients, h = y^T h_SH, the integral ∫_{S²} h₁* h₂ dθ behind any of R's entries simplifies by the orthogonality of the spherical harmonics, h_SH1^H ∫_{S²} y y^T dθ h_SH2 = h_SH1^H h_SH2, and we obviously only need the inner products between the spherical-harmonic coefficients of the HRTFs.
A very-high-order spherical harmonics HRTF dataset H_SH^H of dimensions 2 × (M+1)², with the order M ≫ N, yields a covariance matrix at every frequency,

R = H_SH^H H_SH = X^H X,

that can be factored into a quadratic form of a 2 × 2 matrix X by Cholesky factorization, which reduces the degrees of freedom involved to the minimum required size. The Nth-order Ambisonically reproduced, high-frequency-modified HRTF dataset Ĥ_SH of the same dimensions also has a 2 × 2 covariance matrix R̂ that will differ from R, and which we also decompose into Cholesky factors X̂,

R̂ = Ĥ_SH^H Ĥ_SH = X̂^H X̂.  (A.68)
To equalize R = R̂, the reproduced HRTF set is corrected by a 2 × 2 filter matrix M,

Ĥ_SH,corr = Ĥ_SH M.  (A.69)

This is done properly as soon as

X^H X = M^H X̂^H X̂ M = M^H X̂^H Q^H Q X̂ M,  (A.70)

where the unitary matrix Q, Q^H Q = I, is used to compensate for degrees of freedom that the Cholesky factors X and X̂ have in sign, phase, and mixing with regard to each other. We recognize the root and hereby the preliminary solution for M:

X = Q X̂ M,  ⟹ M = X̂^{−1} Q^H X.  (A.71)
This leaves Ĥ_SH,corr = Ĥ_SH X̂^{−1} Q^H X depending on an unspecified unitary 2 × 2 matrix Q. To obtain corrected-covariance HRTFs Ĥ_corr,SH of highest-possible phase alignment and correlation to the uncorrected counterpart Ĥ_SH, we maximize the trace, i.e. the sum of the diagonal elements,

max Re Tr{Ĥ_SH^H Ĥ_corr,SH} = max Re Tr{Ĥ_SH^H Ĥ_SH X̂^{−1} Q^H X} = max Re Tr{X̂^H X̂ X̂^{−1} Q^H X} = max Re Tr{X̂^H Q^H X} = max Re Tr{Q^H X X̂^H}.

For the last expression, the property Tr{AB} = Tr{BA} was used. A unitary matrix Q^H = V U^H composed of two unitary matrices U and V would yield Tr{V U^H X X̂^H} = Tr{U^H X X̂^H V}, and it maximizes the trace if U and V diagonalize X X̂^H. This is accomplished by the singular-value decomposition (SVD), X X̂^H = U S V^H, whose singular values S = diag{[s₁, s₂]} are real and positive, as in most SVD implementations. Using U and V, the desired solution is

Ĥ_corr,SH = Ĥ_SH X̂^{−1} V U^H X.  (A.72)
If the SVD delivers negative or complex-valued singular values, the complex/negative factor just needs to be pulled out and factored into either the corresponding left or right singular vector.
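The covariance correction (A.69)–(A.72) reduces to small complex 2 × 2 algebra that can be sketched end to end; random Hermitian matrices stand in for the HRTF covariances at one frequency:

```python
import numpy as np

# Covariance-matching filter M = X_hat^{-1} V U^H X, cf. (A.71)/(A.72).
rng = np.random.default_rng(3)

def rand_cov():
    """Random Hermitian positive-definite 2x2 stand-in for R or R_hat."""
    A = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
    return A.conj().T @ A

R, R_hat = rand_cov(), rand_cov()
X = np.linalg.cholesky(R).conj().T        # upper factor, R = X^H X
X_hat = np.linalg.cholesky(R_hat).conj().T
U, s, Vh = np.linalg.svd(X @ X_hat.conj().T)
M = np.linalg.inv(X_hat) @ Vh.conj().T @ U.conj().T @ X   # X_hat^{-1} V U^H X
print(np.allclose(M.conj().T @ R_hat @ M, R))  # True
```

Applying M indeed maps the reproduced covariance R̂ back onto the target R, independently of the unitary alignment chosen by the SVD.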
A.6 Physics of the Helmholtz Equation

A.6.1 Adiabatic Compression

We search for a physical compression equation relating pressure p and volume V.
Ideal gas. The gas pressure p inside the volume V obeys the ideal gas law [9],

p V = n R T,  (A.73)

with n measuring the amount of substance in moles, R the gas constant, and T the temperature. This would yield a valid compression equation if the medium of sound propagation were isothermal. However, this is not the case, T ≠ const: local temperature fluctuations happen too fast to be equalized by thermal dissipation. Isothermal compression would be too soft, and the resulting speed of sound would be off by −15%. A compression law involving fluctuations of all three quantities (p, V, T) needs an additional equation.
First law of thermodynamics. In thermodynamics [10–12], the enthalpy H describes the energy required to heat up a freely expanding gas under constant pressure p. The enthalpy consists of the internal energy U required to heat up the gas in a constant volume, which is easier, plus the ideal-gas volume work p V taken by the gas to expand under the constant external pressure:

H = U + p V,  specifically n c_p T = n c_V T + n R T,  (A.74)
⟹ R = c_p − c_V.

The quantities c_p and c_V are the specific heat capacities for heating up a gas that is expanding (p = const.) or confined in a fixed volume (V = const.) to a temperature T, and they can be accurately measured or modeled. Obviously, the gas constant R is the difference between the two. To make sound propagation isenthalpic, the energy must fluctuate between internal energy U and volume work pV.
Adiabatic process. The above steady-state equations are not yet useful to describe short-term fluctuations of p, V, and T in time and space. A differential formulation related to the change in enthalpy, internal energy, and volume work, dH = dU + p dV, is more useful. Moreover, we regard packages of a constant amount of substance whose internal heating is due to compression alone and not due to external enthalpy sources; the process is therefore isenthalpic, dH = 0, see [10, Sect. 3.12.2], [13]:
    0 = n c_V dT + (n R T / V) dV.

We may divide by n c_V T, replace R = c_p − c_V, and obtain dT/T + (c_p/c_V − 1) dV/V = 0, whose integration yields ln T + (c_p/c_V − 1) ln V = ln[T V^{c_p/c_V − 1}] = const, hence T V^{c_p/c_V − 1} = const, and with the ideal gas equation inserted as T = pV/(nR), the adiabatic process law becomes

    p V^{c_p/c_V} = const,        (A.75)
for which the adiabatic exponent is frequently expressed as γ = c_p/c_V. For air, the exponent is γ = 1.4, and we may express a state change as (p₀, V₀) → (p₀ + p, V₀ + V). The equation p₀ V₀^γ = (p₀ + p)(V₀ + V)^γ yields, after division by p₀ V₀^γ and by (1 + V/V₀)^γ:

    1 + p/p₀ = (1 + V/V₀)^{−γ} ≈ 1 − γ V/V₀,    hence    p = −γ p₀ V/V₀.
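A quick numerical sanity check of this linearization, and of the bulk modulus K = γp₀ = ρc² it leads to below, using the typical constants for air (γ = 1.4, p₀ = 10⁵ Pa, ρ = 1.2 kg/m³):

```python
import numpy as np

gamma, p0, rho = 1.4, 1e5, 1.2          # adiabatic exponent, static pressure, density

# exact adiabatic state change (p0 + p)(V0 + V)^gamma = p0 V0^gamma, solved for p
rel_V = 1e-4                             # small relative volume change V/V0
p_exact = p0 * ((1 + rel_V)**(-gamma) - 1)
p_lin = -gamma * p0 * rel_V              # linearization p = -gamma p0 V/V0
print(p_exact, p_lin)                    # both about -14 Pa

c = np.sqrt(gamma * p0 / rho)            # speed of sound from K = gamma p0 = rho c^2
print(round(c, 1))                       # close to the tabulated 343 m/s
```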
Assuming the Cartesian coordinates x, y, z measured in the resting gas to define its volume V₀ = Δx Δy Δz, as well as its deflected coordinates ξ(x), η(y), ζ(z) after a volume change to V, we can approximate the volume change well enough by the three independent volume changes Δξ Δy Δz, Δx Δη Δz, and Δx Δy Δζ, resulting from the superimposed individual elongations into the three coordinates' directions:

    lim_{V₀→0} V/V₀ = lim_{V₀→0} (Δξ Δy Δz + Δx Δη Δz + Δx Δy Δζ)/(Δx Δy Δz) = ∂ξ/∂x + ∂η/∂y + ∂ζ/∂z = ∇ᵀ ξ.
Replacing the bulk modulus K = γ p₀ = ρ c² of air by more common constants,¹ where c = √(K/ρ), and applying the derivative in time ∂/∂t, we get the equation of compression in its typical form using the velocities ∂ξ/∂t = v:

    ∂p/∂t = −ρ c² ∇ᵀ v.        (A.76)

¹ Typical constants are γ = 1.4, p₀ = 10⁵ Pa, ρ = 1.2 kg/m³, c = 343 m/s.

A.6.2 Potential and Kinetic Sound Energies, Intensity, Diffuseness

The potential energy density or volume work stored in the elastic medium that gets compressed by a deformation dV increases with dw_p = p dV, while deformation also increases the pressure by dp = K dV. We may substitute for dV = K⁻¹ dp,
yielding dw_p = K⁻¹ p dp. The volume work stored by a pressure increase from 0 to p is

    w_p = ∫₀^p p/K dp = p²/(2K) = p²/(2ρc²).        (A.77)
The kinetic energy density stored in the motion of the medium along any axis, e.g. x, increases by acceleration against its mass, dw_{v_x} = ρ (dv_x/dt) dx. The velocity is v_x = dx/dt, so that we substitute for dx = v_x dt to get dw_{v_x} = ρ v_x dv_x. The total kinetic energy density stored in velocities increasing from 0 to v_x, v_y, v_z is

    w_v = ρ [∫₀^{v_x} v_x dv_x + ∫₀^{v_y} v_y dv_y + ∫₀^{v_z} v_z dv_z] = ρ (v_x² + v_y² + v_z²)/2 = ρ v²/2.        (A.78)
Total energy density and intensity. The total energy density therefore becomes

    w = w_p + w_v = p²/(2ρc²) + ρ v²/2,        (A.79)

and derived with regard to time, using the compression equation (A.76) and Euler's equation of motion, ρ ∂v/∂t = −∇p, it becomes

    ∂w/∂t = (p/(ρc²)) ∂p/∂t + ρ vᵀ ∂v/∂t = −p ∇ᵀv − vᵀ∇p = −∇ᵀ(p v) = −∇ᵀ I,        (A.80)

and defines the (time-domain) intensity vector I = p v that describes the energy flow in space. Hereby, ∂w/∂t = −∇ᵀI expresses that only a non-zero divergence of the intensity causes an energy increase (source) or loss (absorption) in the lossless medium.
Direction of arrival and diffuseness: The intensity vector carries a meaning in its own right: it displays into which direction the energy flows (direction of emission). In the frequency domain, it becomes I = Re{p* v}, and for a plane-wave sound field p = e^{ik θ_sᵀ r}, where v = −∇p/(ikρc) = −θ_s p/(ρc), it indicates the direction of arrival (DOA)

    r_DOA = −ρc I/|p|² = −ρc Re{p* v}/|p|² = ρc |p|² θ_s/(ρc |p|²) = θ_s.        (A.81)
An ideal, uniformly enveloping diffuse field is composed of uncorrelated plane waves, E{a(θ₁)* a(θ₂)} = (a²/4π) δ(1 − θ₁ᵀθ₂), resulting in the sound pressure p = ∫_{S²} a(θ_s)/√(4π) e^{ikθ_sᵀr} dθ_s. While the expected sound pressure is non-zero as before, E{|p|²} = a²/(4π) = |p|², the expected intensity of the uniformly surrounding waves vanishes, −ρc E{I} = (a²/4π) ∫_{S²} θ_s dθ_s = 0.

Assuming stochastic interference of all sources, the intensity-based DOA estimator r_DOA = −ρc Re{p* v}/|p|² is therefore the physical equivalent of the r_E vector measure.
A typical diffuseness measure 0 ≤ ψ ≤ 1 relies on its length between 0 and 1:

    ψ = 1 − ‖r_DOA‖.        (A.82)

The signals W = p and [X, Y, Z]ᵀ = √2 ∇p/(ik) = −ρc √2 v of a first-order Ambisonic microphone allow to describe a time-domain estimator r_DOA:

    r_DOA = −ρc E{I}/E{p²} = −ρc E{p v}/E{p²} = E{W [X, Y, Z]ᵀ}/(√2 E{W²}).        (A.83)
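The estimators (A.82) and (A.83) can be sketched on synthetic signals. The model below is a hypothetical illustration: for a single plane wave from θ_s, the first-order signals obey [X, Y, Z]ᵀ = √2 θ_s W (from v = −θ_s p/(ρc)), and a handful of uncorrelated waves from the six coordinate directions serve as a crude stand-in for a diffuse field:

```python
import numpy as np

rng = np.random.default_rng(1)

def doa_and_diffuseness(W, XYZ):
    """time-domain DOA estimator (A.83) and diffuseness (A.82)"""
    r_doa = np.mean(W * XYZ, axis=1) / (np.sqrt(2) * np.mean(W**2))
    return r_doa, 1 - np.linalg.norm(r_doa)

# single plane wave from theta_s: W = p, [X,Y,Z] = sqrt(2) * theta_s * p
theta_s = np.array([0.6, 0.8, 0.0])
p = rng.standard_normal(200000)
r_doa, psi = doa_and_diffuseness(p, np.sqrt(2) * theta_s[:, None] * p)
print(np.round(r_doa, 3), round(psi, 3))   # recovers theta_s, diffuseness 0

# crude diffuse field: uncorrelated waves from the six directions +/- x, y, z
dirs = np.vstack([np.eye(3), -np.eye(3)])
sigs = rng.standard_normal((6, 200000))
W = sigs.sum(axis=0)
XYZ = np.sqrt(2) * dirs.T @ sigs
r_doa, psi = doa_and_diffuseness(W, XYZ)
print(round(psi, 2))                       # close to 1
```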
Green’s Function in 3 Cartesian Dimensions
We may compose the Green's function, the solution to the inhomogeneous wave equation

    (Δ − (1/c²) ∂²/∂t²) G = −δ(t) δ(r),

from products of complex exponentials with regard to time and the Cartesian directions

    e^{iωt + ik_x x + ik_y y + ik_z z} = e^{ikᵀr} e^{iωt},        (A.84)

where the position x, y, z was gathered in a position vector r, and the wave numbers k_x, k_y, k_z of the individual coordinates were gathered in a wave-number vector k. From this solution, we compose the Green's function by superimposing all spatial and temporal complex exponentials in k and ω, weighted by an unknown coefficient γ:

    G = ∫∫ γ e^{ikᵀr} e^{iωt} dω dk.
Because Δ e^{ikᵀr} = (−k_x² − k_y² − k_z²) e^{ikᵀr} = −k² e^{ikᵀr} and ∂²/∂t² e^{iωt} = −ω² e^{iωt}, insertion into the inhomogeneous wave equation yields

    −∫∫ γ (k² − ω²/c²) e^{ikᵀr} e^{iωt} dω dk = −δ(t) δ(r).        (A.85)

Multiple transformations ∫∫ e^{−ik̂ᵀr} e^{−iω̂t} dr dt remove the integrals by orthogonality:

    −∫∫ γ (k² − ω²/c²) [∫ e^{i(k−k̂)ᵀr} dr] [∫ e^{i(ω−ω̂)t} dt] dω dk = −[∫ δ(t) e^{−iω̂t} dt] [∫ δ(r) e^{−ik̂ᵀr} dr],

where the bracketed integrals evaluate to (2π)³ δ(k − k̂), 2π δ(ω − ω̂), 1, and 1, respectively,
and the unknown coefficient remains γ = 1/(2π)^{3+1} · 1/(k² − ω²/c²). Regarding G in the frequency domain, the factor 1/(2π) in γ and the integral ∫ e^{iωt} dω are omitted. We transform γ back via k:

    G = 1/(2π)³ ∫∫ e^{ikᵀr}/(k² − ω²/c²) dk = 1/(2π)³ ∫∫ e^{ikrζ}/(k² − ω²/c²) dk.

By re-expressing kᵀr = kr θ_kᵀθ_r = kr cos ϑ = kr ζ we simplify the integral. Now we already see formally that the Green's function can only depend on the distance r between source and receiver location, G = G(ω, r).

The book [14, pp. 110–112] shows a notably compact derivation, which we will use below.
Derivation by transforming back from the Fourier domain: For three dimensions, the transformation back from the Fourier domain is relatively easy to accomplish. Before going into details, we recognize that the substitution of kᵀr by kr cos ϑ contains the radius of the wave vector k = ‖k‖ and the cosine of the angle between r and k. In k-space, we can always define a correspondingly oriented coordinate system for any r, so as to simplify the integral ∫∫∫_{−∞}^{∞} dk = ∫₀^∞ ∫₀^{2π} ∫₀^{π} k² dk dϕ d cos ϑ = ∫₀^∞ ∫₀^{2π} ∫_{−1}^{1} k² dk dϕ dζ. After re-arranging the integrals, we get
    G = 1/(2π)³ ∫₀^{2π} dϕ ∫₀^∞ [∫_{−1}^{1} e^{ikrζ} dζ] k²/(k² − ω²/c²) dk
      = 1/(2π)² ∫₀^∞ (e^{ikr} − e^{−ikr})/(ikr) · k²/(k² − ω²/c²) dk
      = 1/((2π)² ir) ∫₀^∞ (e^{ikr} − e^{−ikr}) k/(k² − ω²/c²) dk
      = 1/((2π)² ir) [∫₀^∞ e^{ikr} k/(k² − ω²/c²) dk − ∫₀^∞ e^{−ikr} k/(k² − ω²/c²) dk]
      = 1/((2π)² ir) [∫₀^∞ e^{ikr} k/(k² − ω²/c²) dk − ∫₀^{−∞} e^{−i(−k)r} (−k)/((−k)² − ω²/c²) d(−k)]
      = 1/((2π)² ir) [∫₀^∞ e^{ikr} k/(k² − ω²/c²) dk + ∫_{−∞}^{0} e^{ikr} k/(k² − ω²/c²) dk]
      = 1/((2π)² ir) ∫_{−∞}^{∞} e^{ikr} k/(k² − ω²/c²) dk.        (A.86)
The denominator is expanded in partial fractions, k/(k² − ω²/c²) = (1/2) [1/(k − ω/c) + 1/(k + ω/c)], yielding

    G = 1/(2(2π)² ir) [∫_{−∞}^{∞} e^{ikr}/(k − ω/c) dk + ∫_{−∞}^{∞} e^{ikr}/(k + ω/c) dk].        (A.87)

To obtain causal temporal solutions, there needs to be a specific solution of the improper and singular integrals.
Fig. A.1 Closure of the improper integration path using C₊ over the positive imaginary half plane for t > 0, and C₋ for t < 0, and regularization of the pole for a causal result, i.e. non-zero only for C₊
Causal solutions (integral in ω). Causal responses obeying the transformation

    h(t) = ∫_{−∞}^{∞} e^{iωt}/(ω − a) dω        (A.88)

are obtained by replacing the improper integral ∫_{−∞}^{∞} by a closed integration contour, and by introducing a vanishing regularization. Jordan's lemma states that the improper integration ∫_{−∞}^{∞} is equivalent to a closed integration path C₊ of positive orientation involving the additional semi-circle on the upper half of the complex number plane, ∫_{−∞}^{∞} = ∮_{C₊} = lim_{R→∞} [∫_{−R}^{R} dω + ∫₀^{π} R dϕ], if the integrand on the semi-circle vanishes, i.e. lim_{R→∞} e^{(i cos ϕ − sin ϕ) R t}/(R e^{iϕ} − a) = 0. This is the case for positive times t > 0. For negative times, the integral can be closed using the lower part of the complex number plane, ∫_{−∞}^{∞} = ∮_{C₋} = lim_{R→∞} [∫_{−R}^{R} dω + ∫₀^{−π} R dϕ], if the semi-circular integral vanishes, which is true for negative times t < 0 in our case, see Fig. A.1. We get

    h(t) = ∮_{C₊} e^{iωt}/(ω − a) dω, if t > 0;    h(t) = ∮_{C₋} e^{iωt}/(ω − a) dω, if t < 0.        (A.89)

According to Cauchy's integral formula for analytic regular functions f(z) over a single pole 1/(z − a), we obtain

    ∮_{C±} f(z)/(z − a) dz = ±2πi f(a), if the path C± surrounds a;    = 0, if a lies outside the path C±.        (A.90)

If a pole on the real axis, a ∈ ℝ, is slightly shifted by a vanishing imaginary amount to lim_{ε→0⁺} ∮_{C±} e^{iωt}/(ω − iε − a) dω (regularization) so that it lies within the path C₊ and not in C₋, the result is perfectly causal and vanishes at negative times: h(t) = 2πi lim_{ε→0⁺} e^{iat − εt} u(t), with the unit step function u(t) = 1 for t ≥ 0 and 0 for t < 0, see Fig. A.1.
Fig. A.2 Causality-enforcing regularization ω = lim_{ε→0⁺} ω − iε of the poles in the Green's function wrt. wave number k. For a positive radius r ≥ 0, closure of the improper integration path requires C₊ with the semicircle on the positive imaginary half plane (or C₋ for r ≤ 0)
Integral in k. Causality requires the specific regularization in frequency as shown above: replacement of ω by lim_{ε→0⁺} ω − iεc guarantees causality in the partial-fraction expanded Green's function Eq. (A.87). Jordan's lemma requires to use the path C₊ to close the improper path in k for a positive radius r ≥ 0, cf. Fig. A.2:

    G = lim_{ε→0⁺} 1/(2(2π)² ir) [∮_{C₊} e^{ikr}/(k − ω/c + iε) dk + ∮_{C₊} e^{ikr}/(k + ω/c − iε) dk] = 2πi e^{−i(ω/c)r}/(2(2π)² ir) = e^{−i(ω/c)r}/(4πr).        (A.91)
A.6.4 Radial Solution of the Helmholtz Equation
The radial part of the Helmholtz equation in spherical coordinates is characterized by the spherical Bessel differential equation in x = kr:

    y″ + 2x⁻¹ y′ + [1 − n(n+1) x⁻²] y = 0.        (A.92)
Recursive construction. For n = 0, we know that the omnidirectional Green's function is a solution diverging from x = 0, and it is proportional to y ∝ e^{−ix}/x. We can simplify the equation by inserting y = x⁻¹ u_n, which yields with y′ = x⁻¹ u_n′ − x⁻² u_n and y″ = x⁻¹ u_n″ − 2x⁻² u_n′ + 2x⁻³ u_n after multiplying with x:

    u_n″ − 2x⁻¹ u_n′ + 2x⁻² u_n + 2x⁻¹ u_n′ − 2x⁻² u_n + [1 − n(n+1)x⁻²] u_n = 0
    ⇒ u_n″ + [1 − n(n+1)x⁻²] u_n = 0.        (A.93)
Moreover, we attempt to find a recursive definition for n > 0 using the approach

    y_n = x⁻¹ u_n,    u_n = −x^a [x^{−a} u_{n−1}]′.

We evaluate the recursion for the derivatives

    u_n  = −u_{n−1}′ + a x⁻¹ u_{n−1},
    u_n′ = −u_{n−1}″ + a x⁻¹ u_{n−1}′ − a x⁻² u_{n−1},  with  −u_{n−1}″ = [1 − n(n−1) x⁻²] u_{n−1}
         = a x⁻¹ u_{n−1}′ + {1 − [n(n−1) + a] x⁻²} u_{n−1},
    u_n″ = a x⁻¹ u_{n−1}″ − a x⁻² u_{n−1}′ + {1 − [n(n−1) + a] x⁻²} u_{n−1}′ + 2[n(n−1) + a] x⁻³ u_{n−1}
         = {1 − [n(n−1) + 2a] x⁻²} u_{n−1}′ + {[n(n−1)(a+2) + 2a] x⁻³ − a x⁻¹} u_{n−1}.

The equation u_n″ + [1 − n(n+1)x⁻²] u_n = 0 using the above expressions becomes

    {1 − [n(n−1) + 2a] x⁻²} u_{n−1}′ + {[n(n−1)(a+2) + 2a] x⁻³ − a x⁻¹} u_{n−1} + [1 − n(n+1)x⁻²] [−u_{n−1}′ + a x⁻¹ u_{n−1}] = 0.

Comparing coefficients for u_{n−1}′ and u_{n−1} yields a = n:

    u_{n−1}′:  1 − 1 − [n(n−1) + 2a − n(n+1)] x⁻² = 2(n − a) x⁻² = 0,
    u_{n−1}:   [−a + a] x⁻¹ + [n(n−1)(a+2) + 2a − a n(n+1)] x⁻³ = 2(n−1)(n − a) x⁻³ = 0,

and hereby a recurrence for y_n from u_n = −x^n [x^{−n} u_{n−1}]′ with y_n = x⁻¹ u_n, u_n = x y_n:

    y_n = −x^{n−1} [x^{−(n−1)} y_{n−1}]′    ⇒    y_{n+1} = −x^n [x^{−n} y_n]′.        (A.94)
Singular and regular solution. We know from the Green's function that the omnidirectional solution should be proportional to g₀ ∝ e^{−ix}. The typical radial solution for an omnidirectional source field is chosen to be the spherical Hankel function of the second kind²

    h₀^{(2)}(kr) = e^{−ikr}/(−ikr),    h_{n+1}^{(2)}(kr) = −(kr)^n d/d(kr) [(kr)^{−n} h_n^{(2)}(kr)].        (A.95)
However, this solution is not sufficient to solve problems without singularity at r = 0. We know that the function sin(kr)/(kr) is finite at kr = 0, and so are all real parts of the spherical Hankel functions of the second kind, the spherical Bessel functions:

    j₀(kr) = sin(kr)/(kr),    j_{n+1}(kr) = −(kr)^n d/d(kr) [(kr)^{−n} j_n(kr)].        (A.96)
² Note that some scholars use the Fourier expansion with the opposite sign, e^{−iωt}, instead of e^{iωt}, and require the complex conjugate in every expression containing imaginary constants, h_n^{(1)} = h_n^{(2)*}.
The solutions are linearly independent. One checks after some calculation that their Wronskian is non-zero [15, Eq. 10.50.1]:

    det[ j_n(kr), h_n^{(2)}(kr); j_n′(kr), h_n^{(2)}′(kr) ] = j_n(kr) h_n^{(2)}′(kr) − j_n′(kr) h_n^{(2)}(kr) = −i/(kr)².        (A.97)

Below, the Frobenius method is shown as an alternative way to get these functions.
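Both the recursion (A.96) and the Wronskian (A.97) are easy to confirm numerically with SciPy's spherical Bessel routines, building h_n^{(2)} = j_n − i y_n from its real and imaginary parts:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

x = np.linspace(0.5, 20.0, 200)
for n in range(5):
    jn  = spherical_jn(n, x)
    djn = spherical_jn(n, x, derivative=True)
    hn  = spherical_jn(n, x) - 1j * spherical_yn(n, x)   # h_n^(2) = j_n - i y_n
    dhn = spherical_jn(n, x, derivative=True) - 1j * spherical_yn(n, x, derivative=True)
    assert np.allclose(jn * dhn - djn * hn, -1j / x**2)        # Wronskian (A.97)
    assert np.allclose(spherical_jn(n + 1, x), -djn + n / x * jn)  # recursion (A.96)
print("recursion (A.96) and Wronskian (A.97) confirmed for n = 0..4")
```

The recursion check uses (A.96) expanded by the product rule, j_{n+1} = −j_n′ + (n/x) j_n.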
Alternative way: Frobenius method. Given a second-order differential equation with singular coefficients, it can be solved by a generalized infinite power series:

    y″ + (Σ_{l=0}^∞ a_l x^l) x⁻¹ y′ + (Σ_{l=0}^∞ b_l x^l) x⁻² y = 0,    solution: y = Σ_{k=0}^∞ c_k x^{k+γ}.        (A.98)
Insertion of the solution yields

    Σ_{k=0}^∞ (k+γ−1)(k+γ) c_k x^{k+γ−2} + Σ_{k′=0}^∞ Σ_{l=0}^∞ [(k′+γ) a_l + b_l] c_{k′} x^{k′+l+γ−2} = 0;

an index shift k′ + l = k, with l = 0 … k, allows to pull out the common factor x^{k+γ−2}:

    Σ_{k=0}^∞ { (k+γ−1)(k+γ) c_k + Σ_{l=0}^{k} [(k−l+γ) a_l + b_l] c_{k−l} } x^{k+γ−2} = 0,
    Σ_{k=0}^∞ { [(k+γ+a₀−1)(k+γ) + b₀] c_k + Σ_{l=1}^{k} [(k−l+γ) a_l + b_l] c_{k−l} } x^{k+γ−2} = 0.
The coefficient of every exponent of x in the above equation must be zero:

    indicial equation for k = 0:    [(γ + a₀ − 1) γ + b₀] c₀ = 0,        (A.99)
    indicial equation for k = 1:    [(γ + a₀)(γ + 1) + b₀] c₁ + [a₁ γ + b₁] c₀ = 0,        (A.100)
    recurrence for k > 1:    c_k = −Σ_{l=1}^{k} [(k − l + γ) a_l + b_l] c_{k−l} / [(k + γ + a₀ − 1)(k + γ) + b₀].        (A.101)
Depending on the specific values found for γ, the recurrence, etc., the Frobenius method suggests how to find or construct an independent pair of solutions.

Spherical Bessel differential equation. In y″ + 2x⁻¹ y′ + [−n(n+1) + x²] x⁻² y = 0, all a_l and b_l are zero except a₀ = 2, b₀ = −n(n+1), and b₂ = 1. The indicial equations and recurrence become
    [γ (γ + 1) − n(n + 1)] c₀ = 0,        (A.102)
    [(γ + 1)(γ + 2) − n(n + 1)] c₁ = 0,        (A.103)
    [(k + γ + 1)(k + γ) − n(n + 1)] c_k = −c_{k−2}.        (A.104)

We see that the recurrence is again a two-step recurrence, so that one can choose between an even solution using c₀ ≠ 0, c₁ = 0 yielding γ = n or γ = −(n+1) [or an odd solution that won't be used, with c₀ = 0, c₁ ≠ 0 yielding γ + 1 = n or γ + 1 = −(n+1)].
Spherical Bessel functions. The choice γ = n yields a solution converging everywhere: powers of x are all positive, and the recurrence c_k = −c_{k−2}/[(n+k+1)(n+k) − n(n+1)] = −c_{k−2}/[k(k+2n+1)] yields a convergence radius R = lim_{k→∞} |c_{k−2}/c_k| = lim_{k→∞} k(k+2n+1) = ∞. With a starting value c₀ = 2^n n!/(2n+1)!, the solutions are called spherical Bessel functions [5, Chap. 3.4]:

    j_n = (2x)^n Σ_{k=0}^∞ (−1)^k (n+k)!/(k! [2(n+k)+1]!) x^{2k},        (A.105)

which are a physical set of regular solutions with an n-fold zero at 0. The spherical Bessel function for n = 0 is

    j₀(x) = sin x / x.        (A.106)
With the above recursive definition iterated [5, Eq. 3.4.15], one could define

    j_{n+1} = −x^n (d/dx) [x^{−n} j_n],    j_n = (−x)^n (x⁻¹ d/dx)^n j₀.        (A.107)
Spherical Neumann functions. For γ = −(n+1) and c₀ ≠ 0, c₁ = 0, the recurrences are c_k = −c_{k−2}/[(k−n)(k−n−1) − n(n+1)] = −c_{k−2}/[k(k−2n−1)] and yield the spherical Neumann functions with an (n+1)-fold pole at 0. They also obey the recursive definition from above:

    y₀ = −cos x / x,    y_n = (−x)^n (x⁻¹ d/dx)^n y₀.        (A.108)
Spherical Hankel functions. The spherical Neumann and Bessel functions based on either sin or cos are clearly linearly independent. The spherical Bessel functions are useful to represent fields convergent everywhere. Physical source fields (Green's function) diverge at the source location and exhibit a specific way the phase radiates, with G ∝ e^{−ix}/x. The spherical Bessel and Neumann functions are asymptotically similar to [15, Eq. 10.52.3]

    lim_{x→∞} j_n = (−1)^n (dⁿ/dxⁿ) sin(x) / x,    lim_{x→∞} y_n = (−1)^{n+1} (dⁿ/dxⁿ) cos(x) / x,        (A.109)

therefore only their combination to spherical Hankel functions of the second kind

    h_n^{(2)} = j_n − i y_n        (A.110)

yields a useful physical set of singular solutions. They inherit their (n+1)-fold pole at 0 from the spherical Neumann functions. Their limiting form for large arguments is

    lim_{x→∞} h_n^{(2)}(x) = −x^{n−1} lim_{x→∞} (d/dx) [x^{−(n−1)} h_{n−1}^{(2)}] = −lim_{x→∞} [−(n−1) x⁻¹ h_{n−1}^{(2)} + (d/dx) h_{n−1}^{(2)}] = −lim_{x→∞} (d/dx) h_{n−1}^{(2)}
    = (−1)^n lim_{x→∞} (dⁿ/dxⁿ) h₀^{(2)} = iⁿ h₀^{(2)}(x).        (A.111)

With Eqs. (A.110), (A.106) and (A.108), the zeroth-order spherical Hankel function is h₀^{(2)}(x) = e^{−ix}/(−ix).
Alternative implementation by cylindrical functions. We can transform the spherical Bessel differential equation by inserting y = x^α u and obtain after division by x^α

    u″ + 2α x⁻¹ u′ + α(α−1) x⁻² u + 2x⁻¹ u′ + 2α x⁻² u + [1 − n(n+1) x⁻²] u = 0
    ⇒ u″ + 2(α+1) x⁻¹ u′ + [1 + (α(α+1) − n(n+1)) x⁻²] u = 0.

For α = −1/2, the equation for u becomes the Bessel differential equation with α(α+1) − n(n+1) = −(n² + n + 1/4) = −(n + 1/2)²:

    u″ + x⁻¹ u′ + [1 − (n + 1/2)² x⁻²] u = 0.        (A.112)

Consequently, the spherical Bessel functions and spherical Hankel functions of the second kind can be implemented using the Bessel and Hankel functions that can be found in any standard maths programming library. The specific relations are:

    j_n(x) = √(π/(2x)) J_{n+1/2}(x),    h_n^{(2)}(x) = √(π/(2x)) H_{n+1/2}^{(2)}(x).        (A.113)
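These relations are directly checkable against SciPy's cylindrical Bessel functions, using H^{(2)} = J − iY:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, jv, yv

x = np.linspace(0.1, 30.0, 300)
for n in range(4):
    f = np.sqrt(np.pi / (2 * x))
    assert np.allclose(spherical_jn(n, x), f * jv(n + 0.5, x))          # j_n via J_{n+1/2}
    h2 = spherical_jn(n, x) - 1j * spherical_yn(n, x)                   # h_n^(2) = j_n - i y_n
    assert np.allclose(h2, f * (jv(n + 0.5, x) - 1j * yv(n + 0.5, x)))  # via H^(2)_{n+1/2}
print("half-integer-order relations (A.113) confirmed for n = 0..3")
```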
A.6.5 Green's Function in Spherical Solutions, Angular Distributions, Plane Waves
We can write the inhomogeneous Helmholtz equation (Δ + k²) G = −δ to be excited by a source at the direction θ₀ at the radius r₀. We decompose the excitation into a Dirac delta in radius and direction, −r₀⁻² δ(r − r₀) δ(θ₀ᵀθ − 1). The directional part need not be restricted to the spherical Dirac delta function, so we can take a distribution of sources at r₀, weighted by the panning function g(θ):

    (Δ + k²) p = −r₀⁻² δ(r − r₀) g(θ).        (A.114)
From the spherical basis solutions, we know that at a radius other than r₀, p can be expanded into spherical harmonics

    p = Σ_{n=0}^∞ Σ_{m=−n}^{n} ψ_{nm} Y_n^m(θ).        (A.115)
Acting on the decomposition of p, the directional part of the Laplacian will yield the eigenvalue Δ_{ϕ,ζ} Y_n^m = −n(n+1) r⁻² Y_n^m of the spherical harmonics, and its radial part is Δ_r = ∂²/∂r² + 2r⁻¹ ∂/∂r, as around Eq. (6.11), hence

    (Δ + k²) Σ_{n=0}^∞ Σ_{m=−n}^{n} ψ_{nm} Y_n^m(θ) = Σ_{n=0}^∞ Σ_{m=−n}^{n} [∂²/∂r² + (2/r) ∂/∂r + k² − n(n+1)/r²] ψ_{nm} Y_n^m(θ) = −r₀⁻² δ(r − r₀) g(θ).
Obviously, ψ_{nm} must depend on k and r, so we may pull the factor k into the differentials, d/dr = k d/d(kr), to get the differential operator k² [d²/d(kr)² + (2/(kr)) d/d(kr) + 1 − n(n+1)/(kr)²], and observe kr as its variable on the left; we replace kr by x for brevity. Applying the factor k⁻² and the spherical harmonics transform ∫_{S²} Y_n^m(θ) dθ to the equation removes the double sum on the left (orthogonality) and decomposes the panning function g(θ) on the right into γ_{nm}:
    [d²/dx² + (2/x) d/dx + 1 − n(n+1)/x²] ψ_{nm} = −(kr₀)⁻² δ(r − r₀) γ_{nm}.

We collect the x-independent term γ_{nm} as a factor of the solution, ψ_{nm} = y γ_{nm}, and get

    y″ + (2/x) y′ + [1 − n(n+1)/x²] y = −x₀⁻² δ(r − r₀),        (A.116)
the inhomogeneous spherical Bessel differential equation. As described, e.g., in [16, 17], the inhomogeneous differential equation can be solved by the Lagrangian variation of the parameters for equations of the type y″ + p y′ + q y = r, knowing its independent homogeneous solutions y₁ = h_n^{(2)}(x) and y₂ = j_n(x).

It uses a solution y = u y₁ + v y₂ with variable parameters u and v, which upon first- and second-order differentiation becomes

    y′ = u y₁′ + u′ y₁ + v y₂′ + v′ y₂,
    y″ = u y₁″ + 2u′ y₁′ + u″ y₁ + v y₂″ + 2v′ y₂′ + v″ y₂.
Inserted into the equation y″ + p y′ + q y = r, this yields

    u (y₁″ + p y₁′ + q y₁) + v (y₂″ + p y₂′ + q y₂) + u″ y₁ + 2u′ y₁′ + v″ y₂ + 2v′ y₂′ + p (u′ y₁ + v′ y₂)
    = u′ y₁′ + v′ y₂′ + (d/dx + p)(u′ y₁ + v′ y₂) = r,

where the first two parentheses vanish because y₁ and y₂ solve the homogeneous equation.
Now two functions u and v are to be determined from only one equation, so we may pose an additional constraint. The above equation would simplify if the term (u′ y₁ + v′ y₂) vanished. By this and the simplified equation, we get two conditions

    I:   u′ y₁ + v′ y₂ = 0,
    II:  u′ y₁′ + v′ y₂′ = r,

and obtain by elimination with either A = I y₁′ − II y₁ or B = I y₂′ − II y₂:

    A:  v′ (y₁′ y₂ − y₁ y₂′) = −r y₁,  with  y₁′ y₂ − y₁ y₂′ = W,
    B:  u′ (y₁ y₂′ − y₁′ y₂) = −r y₂,  with  y₁ y₂′ − y₁′ y₂ = −W.
So the solution y = u y₁ + v y₂ uses u = ∫ r y₂/W dx and v = −∫ r y₁/W dx. In our case, we have y₁ = h_n^{(2)}(x), y₂ = j_n(x), r = −x₀⁻² δ(r − r₀), and the Wronskian W = (ix²)⁻¹ from Eq. (A.97), hence with integration constants enforcing the physical solutions:

    y = −h_n^{(2)}(x) ∫₀^x i x′² j_n(x′) x₀⁻² δ(r − r₀) dx′ − j_n(x) ∫_x^∞ i x′² h_n^{(2)}(x′) x₀⁻² δ(r − r₀) dx′.
To convert δ(r − r₀) into δ(x − x₀) with x = kr, we use ∫ δ(x) dx = ∫ δ(r) dr = 1 with the integration variable replaced: dx/dr = k, hence dx = k dr, and obviously by ∫ δ(x) k dr = ∫ δ(r) dr we find δ(r) = k δ(x),
    y = −h_n^{(2)}(x) ∫₀^x i x′² j_n(x′) k x₀⁻² δ(x′ − x₀) dx′ − j_n(x) ∫_x^∞ i x′² h_n^{(2)}(x′) k x₀⁻² δ(x′ − x₀) dx′
      = −ik h_n^{(2)}(x) j_n(x₀), for x ≥ x₀;    = −ik j_n(x) h_n^{(2)}(x₀), for x ≤ x₀.
The solution becomes, after re-substituting x = kr and expanding ψ_{nm} = y γ_{nm} over the spherical harmonics p = Σ_{n=0}^∞ Σ_{m=−n}^{n} ψ_{nm} Y_n^m(θ):

    p = −ik Σ_{n=0}^∞ Σ_{m=−n}^{n} γ_{nm} Y_n^m(θ) · { h_n^{(2)}(kr) j_n(kr₀), for r ≥ r₀;  j_n(kr) h_n^{(2)}(kr₀), for r ≤ r₀ }.        (A.117)
Green’s function. For the Green’s function at the direction θ 0 , the angular panning
function is expanded as φnm = Ynm (θ 0 ), and we get the formulation of the Green’s
function in terms of spherical basis functions:
G = −i k
∞ n
Ynm (θ 0 ) Ynm (θ )
n=0 m=−n
h (2)
for r ≥ r0 ,
n (kr ) jn (kr 0 )
(2)
jn (kr ) h n (kr0 ) for r ≤ r0 .
(A.118)
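A numerical check of (A.118) against the closed form e^{−ikd}/(4πd) of Sect. A.6.3 is straightforward once the addition theorem Σ_m Y_n^m(θ₀) Y_n^m(θ) = (2n+1)/(4π) P_n(θ₀ᵀθ), valid for the real-valued orthonormal spherical harmonics used here, collapses the inner sum; the numbers below are arbitrary test values:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

h2 = lambda n, x: spherical_jn(n, x) - 1j * spherical_yn(n, x)  # h_n^(2), Eq. (A.110)

k, r, r0, cosT = 2 * np.pi, 0.5, 1.5, 0.3   # receiver radius r <= source radius r0
n = np.arange(61)                            # truncation at N = 60

# (A.118) with the inner sum replaced by the Legendre addition theorem
G_series = -1j * k * np.sum((2 * n + 1) / (4 * np.pi)
                            * spherical_jn(n, k * r) * h2(n, k * r0)
                            * eval_legendre(n, cosT))

d = np.sqrt(r**2 + r0**2 - 2 * r * r0 * cosT)    # source-receiver distance
G_closed = np.exp(-1j * k * d) / (4 * np.pi * d)
print(abs(G_series - G_closed))                  # tiny truncation error
```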
Plane waves/far-field approximation. Equation (6.7) in Sect. 6.3.1 formulates plane waves p = e^{ik θ₀ᵀ r} as the far-field limit p = 4π lim_{r₀→∞} (r₀/e^{−ikr₀}) G = 4π lim_{r₀→∞} G/(−ik h₀^{(2)}(kr₀)). Using a distribution of plane waves driven by the gains g(θ) = Σ_{n,m} γ_{nm} Y_n^m, Eq. (A.117) consequently yields, with lim_{r₀→∞} h_n^{(2)}(kr₀) = iⁿ h₀^{(2)}(kr₀):

    p = 4π Σ_{n=0}^∞ Σ_{m=−n}^{n} j_n(kr) [lim_{r₀→∞} h_n^{(2)}(kr₀)/h₀^{(2)}(kr₀)] Y_n^m(θ) γ_{nm} = 4π Σ_{n=0}^∞ Σ_{m=−n}^{n} iⁿ j_n(kr) Y_n^m(θ) γ_{nm},        (A.119)
or for a single plane-wave direction, γ_{nm} = Y_n^m(θ₀):

    p = 4π Σ_{n=0}^∞ Σ_{m=−n}^{n} iⁿ j_n(kr) Y_n^m(θ) Y_n^m(θ₀).        (A.120)
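With the same addition theorem as above, Σ_m Y_n^m(θ) Y_n^m(θ₀) = (2n+1)/(4π) P_n(cos Θ), Eq. (A.120) reduces to the classical plane-wave expansion e^{ikr cos Θ} = Σ_n iⁿ (2n+1) j_n(kr) P_n(cos Θ), which a few SciPy lines confirm:

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

kr, cosT = 8.0, -0.4                 # arbitrary test values, cosT = theta_0^T theta
n = np.arange(41)                    # truncation at N = 40 (enough for kr = 8)
p_series = np.sum(1j**n * (2 * n + 1) * spherical_jn(n, kr) * eval_legendre(n, cosT))
p_closed = np.exp(1j * kr * cosT)    # e^{ik theta_0^T r} = e^{ikr cos Theta}
print(abs(p_series - p_closed))      # tiny truncation error
```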
A.7 Sine and Tangent Law

The sine and tangent law [18] observes the sound pressure of plane waves at two locations x = 0, y = ±d at ear distance in order to simulate the ear signals. A plane wave from the left half of the room, from the angle ϕ > 0, first arrives at the left ear, p_left = e^{ikd sin ϕ}, and later at the right one, p_right = e^{−ikd sin ϕ}. The phase difference is Δϕ = 2kd sin ϕ.
A superimposed pair of plane waves from the directions ±α arrives at the left ear as p_left = g₁ e^{ikd sin α} + g₂ e^{−ikd sin α}, at the right as p_right = g₁ e^{−ikd sin α} + g₂ e^{ikd sin α} = p_left*. The phase difference Δϕ_{±α} = 2∠p_left = 2 arctan[(g₁ − g₂) sin(kd sin α) / ((g₁ + g₂) cos(kd sin α))] can be linearized for long wavelengths kd → 0 to Δϕ_{±α} ≈ 2 arctan[kd (g₁−g₂)/(g₁+g₂) sin α] ≈ 2 (g₁−g₂)/(g₁+g₂) kd sin α.
Comparing the phase difference of the single plane wave with the one of the superimposed pair, 2kd sin ϕ = 2kd (g₁−g₂)/(g₁+g₂) sin α, one arrives at the sine law

    sin ϕ = (g₁ − g₂)/(g₁ + g₂) sin α.
If we claim our hearing to possess the ability to not only estimate the interaural phase difference but also its derivative with regard to head rotation ∂/∂δ, we arrive at a value pair of binaural features (Δϕ, ∂Δϕ/∂δ) = 2kd (sin ϕ, cos ϕ) that should match the one of the stereophonic plane-wave pair. For stereo, the phase difference derived with regard to head rotation is

    ∂Δϕ_{±α}/∂δ = 2kd [g₁ (∂/∂δ) sin(α+δ)|_{δ=0} + g₂ (∂/∂δ) sin(−α+δ)|_{δ=0}]/(g₁+g₂) = 2kd (g₁+g₂)/(g₁+g₂) cos α = 2kd cos α,

and yields

    (Δϕ_{±α}, ∂Δϕ_{±α}/∂δ) = 2kd ((g₁−g₂)/(g₁+g₂) sin α, cos α).
In polar coordinates, the radius of both value pairs differs. While the plane wave yields a value pair at the radius 2kd in the binaural feature space, the stereophonic pair only reaches the radius 2kd at ±α, at which one of the two gains must vanish, and amplitude panning can be used to connect these two points 2kd (±sin α, cos α) by a straight line. The plane wave with the most similar feature pair must lie at the same polar angle. We may equate the tangents of both points, Δϕ/(∂Δϕ/∂δ) = Δϕ_{±α}/(∂Δϕ_{±α}/∂δ), and obtain the tangent law:

    tan ϕ = (g₁ − g₂)/(g₁ + g₂) tan α.
If, instead of the angle of the plane wave whose features are closest to those of a given amplitude difference, one searches for the amplitude difference whose features best match those of a given plane wave, then the sine law is the best match, even in the two-dimensional feature space.
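Both laws can be compared numerically; the gains below are arbitrary example values:

```python
import numpy as np

def sine_law(g1, g2, alpha):
    """plane-wave angle matching the low-frequency phase difference of a +/-alpha pair"""
    return np.arcsin((g1 - g2) / (g1 + g2) * np.sin(alpha))

def tangent_law(g1, g2, alpha):
    """plane-wave angle at the same polar angle in the (phase, derivative) feature plane"""
    return np.arctan((g1 - g2) / (g1 + g2) * np.tan(alpha))

alpha = np.radians(45)                       # loudspeakers at +/-45 degrees
for g1, g2 in [(1.0, 0.0), (0.85, 0.53), (0.71, 0.71)]:
    print(np.degrees(sine_law(g1, g2, alpha)),
          np.degrees(tangent_law(g1, g2, alpha)))
# both laws reproduce the loudspeaker itself (g2 = 0) and the center (g1 = g2);
# for intermediate gains the tangent law predicts a wider angle than the sine law
```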
References
1. S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah, The experience of mathematical beauty
and its neural correlates. Front. Hum. Neurosci. 8, e00068 (2014)
2. Mathematical signs and symbols for use in physical sciences and technology (1978)
3. ISO 80000-2, Quantities and units, Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
4. G. Dickins, Sound field representation, reconstruction and perception. Master’s thesis, Australian National University, Canberra (2003)
5. W. Cassing, H. van Hees, Mathematische Methoden für Physiker, Goethe-Universität Frankfurt (2014), https://th.physik.uni-frankfurt.de/~hees/publ/maphy.pdf
6. A. Sommerfeld, Vorlesungen über Theoretische Physik (Band 6) Partielle Differentialgleichungen in der Physik, 6th edn. (Harri-Deutsch, 1978) (reprint 1992)
7. F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
8. J.R. Driscoll, D.M.J. Healy, Computing Fourier transforms and convolutions on the 2-sphere. Adv. Appl. Math. 15, 202–250 (1994)
9. Wikipedia contributors, Ideal gas, https://en.wikipedia.org/wiki/Ideal_gas. Accessed Dec 2017
10. G. Franz, Physik III: Thermodynamik (Hochschule München, 2014), http://www.fb06.fhmuenchen.de/fbalt/queries/download.php?id=6696
11. Wikipedia contributors, Enthalpy, https://en.wikipedia.org/wiki/Enthalpy. Accessed Dec 2017
12. Wikipedia contributors, First law of thermodynamics, https://en.wikipedia.org/wiki/First_law_of_thermodynamics. Accessed Dec 2017
13. Wikipedia contributors, Adiabatic process, https://en.wikipedia.org/wiki/Adiabatic_process. Accessed Dec 2017
14. D.G. Duffy, Green’s Functions with Applications (CRC Press, Boca Raton, 2001)
15. F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), NIST Handbook of Mathematical Functions
(Cambridge University Press, Cambridge, 2000), http://dlmf.nist.gov. Accessed June 2012
16. E. Kreyszig, Advanced Engineering Mathematics, 8th edn. (Wiley, New York, 1999)
17. U.H. Gerlach, Linear Mathematics in Infinite Dimensions: Signals, Boundary Value Problems, and Special Functions, Beta edn. (2007), http://www.math.ohio-state.edu/~gerlach/math/BVtypset/
18. H. Clark, G. Dutton, P. Vanderlyn, The 'stereosonic' recording and reproduction system, reprinted in the J. Audio Eng. Soc., from Proceedings of the Institution of Electrical Engineers, 1957, vol. 6(2) (1958), pp. 102–117