Third-Order Ambisonic Extensions for Max/MSP with Musical Applications

Graham Wakefield
Graduate Program in Media Arts and Technology, University of California Santa Barbara
wakefield@mat.ucsb.edu

Abstract

This paper describes a package of extensions (externals) for Cycling '74's Max/MSP software to facilitate the exploration of Ambisonic techniques of up to third order. Areas of exploration well suited to the Max/MSP environment, and techniques for composition within the Ambisonic domain using the presented externals, are described.

1 Introduction

Ambisonics (Gerzon 1973) is a technique to re-create the impression of (or synthesize) a spatial sound field; alternative techniques include Wave Field Synthesis (Berkhout 1988) and Vector Base Panning (Pulkki 1997). Ambisonics is a two-part process: encoding recorded or virtual spatial sources into an Ambisonic-domain representation, then decoding this representation onto an array of spatial outputs (typically loudspeakers). The Ambisonic domain is a multi-channel representation of spatial sound fields based upon cylindrical (2-D) or spherical (3-D) harmonics. First-order Ambisonics, also known as B-Format, encodes sound fields as an omnidirectional signal (named W) plus three additional difference signals, one for each of the axes X, Y and Z. Higher Order Ambisonics (HOA) uses higher-order spherical harmonics for additional encoded signals, increasing the detail of directional information and expanding the acceptable listening area of the decoded spatial sound field (Malham 2003). This paper describes a package of externals developed by the author to add Ambisonic capabilities to Max/MSP up to third order, with full access to the Ambisonic-domain signals.

Max/MSP (Puckette 1988, Zicarelli 1998) is a flexible visual data-flow programming environment with audio signal processing capabilities, including a C API for developers to extend the functionality of Max/MSP by building extensions (externals). The rationale behind the development of the presented externals is twofold:

• To facilitate rapid experimentation and prototyping of Ambisonic-domain processing techniques (some examples follow in this paper)
• To develop tools for prototyping virtual environments within multi-user facilities such as the UCSB CNSI AlloSphere (Pope 2005, http://www.mat.ucsb.edu/allosphere/)

Ambisonic extensions for Max/MSP have until recently been limited to first-order representations (e.g. Courribet and Béguin 2005). Concurrently with the present set of externals, the Institute for Computer Music and Sound Technology (ICST) of the Zurich School of Music, Drama and Dance (Hochschule Musik und Theater Zürich, HMT) announced a set of Ambisonics externals for Max/MSP, and the two sets share many capabilities. Beneficially, both the ICST externals and those described in this paper can be used together within the same application.

2 About the Ambisonic externals

The package described in this paper contains several externals encapsulating distinct elements of the Ambisonic process. Some aspects, however, are common to all. Instantiation arguments set the order of encoding and whether the format is two-dimensional (pantophonic) or three-dimensional (periphonic), determining how many signal inlets and/or outlets each external requires. Ambisonic-domain inlets and outlets are labeled by the channel naming conventions of Jérôme Daniel (2000) to assist the correct connection of signal graphs.

2.1 Encoding sources

A monophonic virtual or microphone source is encoded to the Ambisonic representation using the external ambi.encode~. The monophonic source signal is connected to the first inlet, and the azimuth and elevation control signals (in degrees or radians) are connected to the second and third inlets respectively (Figure 1).
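The first-order encoding performed at this stage can be sketched as follows. This is an illustrative Python sketch of the standard B-Format encoding equations only, not the external's actual C source, and the function name is the author's own invention; the classic 1/√2 weighting on W is assumed here.

```python
import math

def ambi_encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order (B-Format) channels W, X, Y, Z.

    Angles are in radians. W carries the omnidirectional signal, scaled by
    1/sqrt(2) per the classic B-Format convention; X, Y and Z carry the
    directional difference signals for the three axes.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z

# A unit sample at centre-front (azimuth 0, elevation 0) excites W and X only.
w, x, y, z = ambi_encode_first_order(1.0, 0.0, 0.0)
```

Recomputing these trigonometric terms for every sample is what makes the sample-accurate mode comparatively CPU-intensive; the block-rate mode instead holds one encoding matrix per signal vector.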
The outlets of the external carry the encoded Ambisonic representation. By default, the azimuth and elevation signal inputs are sampled at block rate with linear interpolation; a more CPU-intensive, sample-accurate mode (recalculating the encoding matrix at every sample) is available if required by particular applications (e.g. fast-moving sources).

Figure 1. A simple Max/MSP patch, encoding a virtual source (white noise) at centre-front into a 3-D, first-order (B-Format) representation, rotated by π/2 radians, and decoded onto an array of four loudspeakers.

Since one of the principal benefits of Ambisonics is the compact representation of a complex sound field, the author provides an external, named ambi.encoden~, that encodes up to 16 individually positioned virtual sources to a mixed encoded output. The third instantiation argument defines how many monophonic sources the object will encode, and thus how many signal inlets it will have. Azimuth and elevation can be specified (in degrees or radians) for individual sources using Max messages (Figure 2); ambi.encoden~ is better suited to static or infrequently moving sources, since encoding matrices are recalculated at message rate.

Ambi.granulate~ offers the ability to create even larger numbers of spatial events, based upon the granulation of a buffered delay line; creating thousands of discrete Ambisonic events per second would otherwise be difficult in Max/MSP. The encoding matrix of each grain is calculated when the grain is scheduled. Parameters for amplitude, azimuth, elevation, duration, inter-onset time, delay etc. are set (as centre value and deviation range) using Max messages, and the sizes of the delay buffer and maximum grain concurrency per block are specified as instantiation arguments (limited by available RAM).

2.3 Decoding the sound field

The Ambisonic-domain signal representation is decoded into a multi-channel array, typically for a loudspeaker array, using the ambi.decode~ external. The number of output channels is determined by instantiation arguments, and the relative orientations of the speakers are specified using Max messages. If no speaker orientations are specified, a clockwise horizontal ring beginning from centre-left is assumed by default (Figures 1, 2).

It should be noted that Ambisonics works best with regular orthonormal speaker layouts (Daniel 2000). For the 2-D case, a regular (polygonal) layout has all speakers equally spaced on the radius of a circle around the listener. In the 3-D case, only five regular polyhedra exist (the Platonic solids), so usually a semi- or quasi-regular layout is sought; maintaining an equatorial ring is usually preferred for psychoacoustic reasons. Hollerweger (2006) provides extensive detail and MATLAB code for the choice of speaker layouts in Ambisonic decoding. Ambi.decode~ uses the projection method to generate the decoding matrix, which is less sensitive to the regularity of the speaker layout than the pseudo-inverse method. Nevertheless, directional and energy-balance distortions will increase as the layout becomes more irregular (Daniel 2000, Hollerweger 2006).

Figure 2. A Max/MSP patch encoding 16 individually positionable and tunable band-passed noise sources to a third-order Ambisonic representation, then decoded to a 16-speaker array.

Several different encoding conventions exist to scale Ambisonic signals at various orders. The original B-Format definition specified a scalar of √2 for the W channel, the Furse-Malham set (Malham 1999) scales second-order signals to achieve normalized values in the Ambisonic domain, and Daniel (2000) specified up to third order without normalizing weights except for the original √2 W channel.
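The projection decoding described in section 2.3 can be sketched as follows, restricted to first order and a horizontal ring for clarity. This is an illustrative Python sketch only, not ambi.decode~'s actual implementation, and the function name is invented for this example.

```python
import math

def ambi_decode_projection(b_format, speaker_azimuths):
    """Decode a first-order (W, X, Y, Z) frame onto a horizontal speaker ring.

    Projection ('sampling') decoding: each speaker's gain is the encoded
    field sampled in that speaker's direction, scaled by 1/N. The sqrt(2)
    factor undoes the B-Format W weighting. Azimuths are in radians;
    speaker elevations are assumed to be zero, so Z is ignored here.
    """
    w, x, y, z = b_format
    n = len(speaker_azimuths)
    return [(math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az)) / n
            for az in speaker_azimuths]

# A source encoded at 90 degrees, decoded to a square array: the speaker
# at 90 degrees receives the largest share of the signal.
square = [math.radians(a) for a in (0.0, 90.0, 180.0, 270.0)]
gains = ambi_decode_projection((1.0 / math.sqrt(2.0), 0.0, 1.0, 0.0), square)
```

Because the decoder is a fixed matrix applied per sample, its cost scales with the number of speakers times the number of Ambisonic channels, independent of how many sources were encoded.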
In the presented Max/MSP externals the author opted to employ the convention presented by Daniel (2000), since the floating-point signal operations of Max/MSP obviate the need for normalization, while retaining compatibility with existing B-Format recordings.

The ambi.decode~ external includes sets of order coefficients from the literature (basic, in-phase, max-rE), which can be recalled using Max messages. In addition, Max messages can be used to directly set the balance between the zeroth- (omnidirectional), first-, second- and third-order encoded channels, to enhance or suppress the spatial separation, or to improve the spatial impression for combinations of distinct order representations and/or unusual speaker arrays.

2.4 Rotating the sound field

An Ambisonic sound field can be rotated using a set of matrix operations (Malham 2003). The ambi.rotate~ external can be inserted into the Ambisonic-domain signal stream and used to rotate an entire sound field (of up to third order) around one or more of the principal axes (Figures 1, 2). Max messages with float arguments of 'rotate', 'tilt' and 'tumble' transform the field around the Z, Y and X axes respectively. When tilt and tumble are set to zero (the default), the rotation calculations are simplified for efficiency. Rotating an entire sound field can be much more efficient than rotating each individual source when the number of sources exceeds the number of encoding channels. Sound-field rotation is well suited to the sonification of a virtual environment, where navigation shifts the apparent orientations of all sound sources in unison.

3 The Ambisonic externals in use

The Ambisonic externals described in this paper give the designer of a signal graph a great deal of flexibility to define spatial sound fields and explore spatial manipulation techniques, due to the direct access they provide to individual Ambisonic channels, and the comprehensive range of signal and event processing tools that the Max/MSP environment itself provides. In this section a number of possible areas of exploration are described.

A particular benefit of the Ambisonic encoding is that summing the encoded channels correctly superposes spatial representations encoded at the same order. A multitude of sources may therefore be individually manipulated before mixing down to a single Ambisonic representation to be decoded over many loudspeakers. Mixing Ambisonic signals of different orders is also possible, at the cost of spatial balance.

3.1 Using existing spatial recordings

A spatial sound-field representation can be captured to disk as a multi-channel sound file using standard Max/MSP objects such as sfrecord~ and record~, though it may be necessary to normalize the signals when recording to an integer-based file format. Similarly, an Ambisonic sound file can be played back over a speaker array using the Max/MSP built-in objects sfplay~, play~ etc. connected to an ambi.decode~ object. Placing sound-field manipulations between the file player objects and the decoder allows real-time transformation of the stored Ambisonic sound field at playback.

Recordings for particular speaker arrangements can be somewhat generalized to larger speaker arrays using Ambisonics. For example, a typical 5.1 surround layout can be simulated in Ambisonics by a set of five virtual sources placed around the listener. A sound source encoded for 5.1 could be spatialized using ambi.encoden~ with five inputs and the LFE channel submixed into the omnidirectional W Ambisonic channel, and the Ambisonic signals could then be decoded onto a larger array of loudspeakers.

3.2 Simulating distance cues

Spatial hearing is based upon directional and distance-based cues. Ambisonics is a convenient approach to tackling directional information, since it is based on spherical coordinates. Distance is not represented in the standard Ambisonic encoding (for mathematical reasons all sources are assumed to lie on the surface of a unit sphere), yet distance is a vital component in virtual environment simulation. Distance cues can nevertheless be used in combination with Ambisonics (Malham 1998), and existing Max/MSP signal processing objects can be used alongside the Ambisonics externals accordingly (Figure 3).

Figure 3: Simulating distance cues in combination with Ambisonics. Inset shows the reverb_by_distance subpatcher.

Distance cues might include processing a sound source before encoding (decreasing amplitude with increasing distance, applying IIR filters to simulate air absorption and near-field compensation (Daniel 2003), and using variable delays to mimic Doppler shift), as well as mixing 'room effect' first reflections and tail reverberation into the encoded sound field according to distance and/or position. The Jitter extensions to Max/MSP include bindings to OpenGL for three-dimensional rendering; Ambisonic encoding alongside distance cues can facilitate the auralization of rendered virtual worlds within a single application.

3.3 Processing in the Ambisonic domain

Access to a sample-accurate Ambisonic encoder suggests experimentation with high-frequency variations in spatial orientation. Furthermore, access to the Ambisonic-domain signals within a flexible DSP environment invites experimentation with new manipulations of the sound field, and the possibility of directly synthesizing the directional difference signals (rather than, or in addition to, encoding virtual sources). For example, a relatively simple FM synthesis technique, used to generate the difference signals of the Ambisonic domain itself, may result in quite complex and new spatial sound events.
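The idea of synthesizing the difference signals directly might be sketched as follows: two slightly detuned FM operators generate X and Y themselves, so that the decoded field's apparent direction wanders at audio rate. This is a Python illustration of the general idea only; all names and parameter values are arbitrary choices for the example, not taken from the author's patches.

```python
import math

TWO_PI = 2.0 * math.pi

def fm(t, carrier_hz, mod_hz, index):
    """Two-operator FM: a sine carrier phase-modulated by a sine modulator."""
    return math.sin(TWO_PI * carrier_hz * t + index * math.sin(TWO_PI * mod_hz * t))

def fm_sound_field(t):
    """Directly synthesize first-order horizontal difference signals with FM.

    X and Y are produced by detuned FM operators rather than by encoding a
    panned point source, so no single direction is ever 'encoded'; the
    spatial image emerges from the relationship between the two channels.
    """
    x = fm(t, 200.0, 50.0, 2.0)
    y = fm(t, 201.0, 50.5, 2.0)
    w = (x + y) / math.sqrt(2.0)  # derive an omni channel from the difference signals
    return w, x, y

sr = 44100
field = [fm_sound_field(i / sr) for i in range(sr // 10)]  # 100 ms of W, X, Y frames
```

Feeding such frames to a decoder in place of an encoded source is exactly the kind of Ambisonic-domain experiment the externals' open signal access permits.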
Certainly it must be valid to attempt the extension of existing synthesis techniques with spatial parameters! The author has, for example, explored passing comb filters in a feedback matrix across the Ambisonic channels, producing subtle, complex and chaotic spatial images warranting further investigation.

Certain signal processes are quite expensive and thus often cannot be duplicated per channel for large speaker arrays. Because the number of Ambisonic-domain signals is almost always less than the number of loudspeakers, it can be more efficient to perform expensive operations in the Ambisonic domain. The author has employed a similar approach to spatialize individual bands of a spectral resynthesis process. The spectral resynthesis produces a summed monophonic output from a large number of individual bands, but provides independent amplitude control for each voice. To spatialize the individual bands, four spectral resynthesizers were Ambisonic-encoded as four virtual speaker locations (in a regular tetrahedral layout), then decoded onto an array of 16 real loudspeakers. The amplitude of a particular frequency band can thus be modulated spatially quite efficiently using four rather than sixteen resynthesis banks; although spatial accuracy is traded off, the sonic effect is quite effective. A similar approach might be taken to simulate frequency-variant radiation patterns, as described by Daniel (2003).

4 Conclusions

Ambisonics is a powerful and efficient means to work with spatial digital audio. Second- and third-order Ambisonics involve complex algorithms that are more efficiently and conveniently managed using compiled C externals than existing Max/MSP objects.
The open-ended nature of the Max/MSP signal-graph architecture, and the access to the Ambisonic-domain signals that the presented externals deliver, provide a flexible environment for exploring new kinds of signal manipulation and techniques for composition in the Ambisonic domain.

The extensions described in this paper constitute a minimal set with which to begin exploring Ambisonics in electroacoustic composition; a few key areas are under consideration for future development. Speaker arrays of 30 or more may demand fourth-, fifth- or higher-order encoding and decoding externals to expand the sweet area. An HRTF decoder for headphone listening would be a welcome addition. At the time of writing, the externals are available as binaries from the author's website (http://www.grahamwakefield.net).

5 Acknowledgments

The presented Ambisonic extensions to Max/MSP were inspired by work on the CSL signal-processing library (Pope and Ramakrishnan 2003) alongside Florian Hollerweger (IEM, Graz, Austria) and Jorge Castellanos (MAT, Santa Barbara, USA).

References

Berkhout, A. J. 1988. A Holographic Approach to Acoustic Control. Journal of the Audio Engineering Society, 36(12), pp. 977-995.

Courribet, B., and Béguin, R. 2005. Outils de panoramisation pour l'environnement 5.1 dans MaxMSP. Journées d'Informatique Musicale, June 2005.

Daniel, J. 2000. Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia. PhD thesis, Université Paris 6.

Daniel, J. 2003. Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format. Proceedings of the AES 23rd International Conference, Copenhagen, Denmark, 2003.

Gerzon, M. A. 1973. Periphony: With-Height Sound Reproduction. Journal of the Audio Engineering Society, 21(1), pp. 2-10.

Hollerweger, F. 2006. Periphonic Sound Spatialization in Multi-User Virtual Environments. MSc.
thesis, Institute of Electronic Music and Acoustics (IEM), March 2006.

Malham, D. G. 1999. Higher Order Ambisonic Systems for the Spatialisation of Sound. Proceedings of the International Computer Music Conference (ICMC99), Beijing, October 1999.

Malham, D. G. 2003. Space in Music - Music in Space. MPhil thesis, University of York, April 2003.

Pope, S. T., and Ramakrishnan, C. 2003. The CREATE Signal Library ("Sizzle"): Design, Issues, and Applications. Proceedings of the 2003 International Computer Music Conference (ICMC '03).

Pope, S. T. 2005. Audio in the UCSB CNSI AlloSphere. Online: http://www.create.ucsb.edu/~stp/AlloSphereAudio_01.pdf. Retrieved July 2006.

Puckette, M. 1988. The Patcher. Proceedings of the 14th International Computer Music Conference, Cologne.

Pulkki, V. 1997. Virtual Sound Source Positioning Using Vector Base Amplitude Panning. Journal of the Audio Engineering Society, 45(6), pp. 456-466.

Zicarelli, D. 1998. An Extensible Real-Time Signal Processing Environment for Max. Proceedings of the International Computer Music Conference, Ann Arbor, Michigan.