Organized Sound article - Camille Goudeseune

advertisement
Interpolated mappings for musical instruments
Camille Goudeseune
Beckman Institute, University of Illinois at Urbana-Champaign
405 N Mathews, Urbana IL 61801 USA
cog@uiuc.edu
Abstract
Software-based musical instruments have controls for input, a sound synthesiser for output, and
mappings connecting the two. An effective layout of controls considers how many degrees of freedom
each has, as well as the overhead of selecting each one while performing. An isolated mapping from one
control to one synthesis parameter needs an appropriate choice of proportional, integral, or derivative
control (the control’s value, or that value’s rate of change, drives the synthesis parameter’s value, or that
value’s rate of change). Beyond this, a compound mapping cross-coupling several controls and synthesis
parameters can surprisingly increase the performer’s intuitive understanding of the instrument.
Such a compound mapping is essentially a continuous function from Rd to Re for arbitrary
integers d and e, where 1  d < e. Such a function is easily constructed by associating several sets of d
control values with corresponding sets of e parameter values (i.e., sounds). This ‘pointwise’ mapping
can then be extended through a geometric technique called simplicial interpolation to produce a
continuous mapping, which can be adjusted and refined by simply moving or adding new pairs of
‘points.’ Furthermore, given the initial sounds in Re, the corresponding control settings in Rd can be
found automatically. An open-source C++ implementation of this technique is available.
1
1. Introduction
Since both an instrument and a composition written for it may involve software, and since
mappings are typically realised with software, mapping properly belongs to the disciplines of both
instrument design and composition. Here we focus on the real-time part, the instrument.
If the performer can comprehend the mappings embedded in an instrument, obviously a more
refined performance can result. This argues for static mappings over dynamic, and simple over complex
(although we shall see that overly simple mappings can be suboptimal). Explicit construction of a
mapping may or may not be better than having an algorithm compute it as the solution to some set of
criteria.
Whatever decisions are made about mappings in an instrument, they result in what performers
call the feel of the instrument, its responsiveness and controllability, its ‘consistency, continuity, and
coherence’ (Garnett and Goudeseune 1999). Listeners perceive the result in the range, accuracy, and
speed of performed gestures.
Three questions can be asked about a mapping: from what? to what? and by what means? Here
we answer the first two by considering a musical instrument as something into which the performer puts
gestures, and which then outputs sounds. Since adjustable mappings require software, we explore the
world where software lies between the gestures and the sounds. So a mapping is from an instrument’s
controls, to the inputs of a sound synthesiser (‘synthesis parameters’). An expanded answer to these first
two questions considers the interactions between individual controls and individual synthesis parameters.
Software-based instruments can be motivated from asking what sounds a given controller might
produce, or equally well from how one might perform a given family of synthesised sounds. Ryan (1992)
2
calls instrument design the putting of ‘physical handles on phantom models,’ discovering which controls
(‘handles’) work well with mappings into a synthesiser (abstract ‘models’ of sound).
One answer to the third question is explored at length in this paper. Given an instrument’s
controls, we must construct a smooth mapping from their several degrees of freedom (or dimensions,
geometrically speaking), to a possibly different number of scalar inputs accepted by the instrument’s
synthesiser. As a musically useful starting point, such a mapping can be built up from a pointwise map,
an association of particular input values with output values: when the performer does this, the instrument
should sound like this. An interpolator then produces reasonable intermediate outputs for intermediate
inputs. Many techniques of interpolation can generate intermediate data from discretely sampled data.
Simplicial interpolation, presented here, is well suited to the application domain of musical instruments.
Since it is often the case that a sound synthesiser has many more real-time inputs than a human
can attend to simultaneously, we would like to reduce the complexity of a large set of parameters. A
high-dimensional interpolator (HDI) lets the performer control a large number of parameters with a
much smaller number.
A parameter which takes on a discrete set of values can often be reduced to a continuous one. If
the values can be ordered from least to greatest, the parameter can directly be treated as continuous
though with coarse resolution. If the values are not so orderable, as with the ad hoc timbres offered by
the stops of a pipe organ, they may be embedded in a space of dimension greater than one by means of
perceptual discrimination, situating similar values close together. (This is attempted in two dimensions
by the layout of both organ stops on a console and orchestral musicians on a stage.) Some comparison
(i.e., ordering) of discrete values is necessary for any generalization or theory, and such an ordering leads
directly to speaking of points in a continuous space.
3
2. Controls and driving graphs
By controller we mean the complete interface or set of commands available to the
instrumentalist. By control we mean a single indivisible part of the controller. A control’s value is its
instantaneous state. We call a control scalar if its value is a single continuous scalar. If its value is
discrete, the control is often called a switch.
A dimension is a linear continuum. Its value is a scalar, the dimension’s realization at an instant
of time. Often a dimension is associated with a single scalar control, in which case we identify the values
of the dimension and the control. Sometimes we speak interchangeably of parameter, dimension, and
degree of freedom.
A continuous control drives a dimension if a change in its value produces a corresponding
change in the dimension’s value.
Since one control may drive several dimensions, and again, several controls may drive the same
dimension, we can imagine a set of points corresponding to controls, a second set of points
corresponding to dimensions, and arrows from points in the first set to points in the second. We call this
diagram the driving graph of the instrument (figures 1 and 2). Hunt, Wanderley, and Kirk (2000)
mention three common mapping strategies, namely ‘one-to-one,’ ‘one-to-many,’ and ‘many-to-one.’ The
concept of driving graph unifies and extends these, and also makes available many results from the
mathematical field of graph theory.
4
Value of control
Value of dimension
50 dB
20 dB
Control
Mapping
Dimension
0 dB
Figure 1. The simplest nontrivial driving graph of an
instrument: one scalar control driving one dimension.
violin elevation
FM synthesiser
pitch
Octave switch
amplitude
pitch
amplitude
index of modulation
Delay lines
output
relative amplitudes
violin azimuth
violin elevation
pitch
amplitude
pitch
overall amplitude
index of modulation
amplitude of delay #1
violin azimuth
amplitude of delay #2
amplitude of delay #3
amplitude of delay #4
5
Figure 2. Driving graph of an instrument based on an electric violin tracked in pitch, amplitude, and
spatial orientation. Rotating the violin upwards raises the output pitch by an octave; rotating about
a vertical axis changes the relative amplitude of several delay lines through which the sound passes.
The gain of a scalar control (with respect to some dimension that it drives) describes how
strongly input affects output. For the same range of input values, a high-gain control has a wider range of
output values while one with low gain demands less accuracy and thus is easier to play.
The order of a scalar control (again, with respect to a driven dimension) is an integer describing
how direct the mapping is from the value of the control to the value of the dimension. If the mapping is
direct, the control has order zero. A control of order one has a direct relationship between the control’s
value and the rate of change of the dimension’s value. A control of order –1 has the opposite behaviour:
a direct relationship between rate of change of the control’s value and the dimension’s value. To speak
of the order of a scalar control, we require only that the mapping is continuous; in particular, the
mapping need not be linear. (For reasons of extensibility, this separation of discrete and scalar controls
departs from the three-level hierarchy of discrete (‘set-and-forget’ auxiliary) controls, order-zero scalar
controls, and order-one scalar controls presented in (Vertegaal, Ungvary, and Kieslinger 1996: 310). A
discrete control cannot be said to have an order when it is not based on an underlying continuum.)
A more general description of how input drives output is given by a constant-coefficient linear
differential equation (Sheridan and Ferrell 1981: 178–80). The coefficients indicate how much gain is
present at each order. But we can practically restrict ourselves to equations with only one nonzero order
coefficient. In fact we prefer an alternative notation for the order of a control, borrowed from the field of
proportional-integral-derivative or PID control (Dean and Wellman 1991: 144). If x(t) is the control’s
value at time t and y(t) is its dimension’s value, then a proportional control can be expressed as y(t) =
6
t
(x(t)) for some mapping  : RR; an integral control has the form y(t) = (  x(t )dt ); and a
0
derivative control has form y(t) = ( dtd x(t)). In other words, the value of a proportional control depends
purely on its input, that of an integral control on the input’s history, and that of a derivative control on
the input’s rate of change. Orders –1, 0, and 1 correspond to derivative, proportional, and integral
controls respectively. (A mnemonic: –1, 0, and 1 correspond to a polynomial’s change of degree when
differentiated, unmodified, or integrated respectively.) Orders greater than 1 or less than –1 play a
relatively minor role in both industrial processes and musical instruments, so the terms proportional,
integral, and derivative control often suffice. However, sophisticated models of transient phenomena
such as the breakdown and reestablishment of Helmholtz motion when bow velocity changes or reverses
(i.e., acceleration), may involve greater orders.
Three commonplace examples illustrate PID controls. Fingertip placement on a fingerboard,
driving pitch, is a proportional control. Only where the finger is, not how fast it moves or where it has
been, determines pitch. Amplitude often uses a derivative control: the speed of a bow, not its current or
previous positions, determines the amplitude of the string’s vibration. Laptop computers are now often
seen on stage in computer music concerts; a sound synthesis parameter (perhaps volume, pitch, or a filter
parameter) driven by the laptop’s ‘trackpoint’ built-in mouse exemplifies an integral control, because the
parameter’s value depends purely on the history of nudges applied to the trackpoint.
Several results from PID control theory apply to musical instruments. Proportional controls are
most common. Derivative controls are more agile than proportional controls but fare worse at holding a
constant value, which suits them well to controlling sound amplitude. Integral controls are, not
surprisingly, the opposite: they lack agility but once they reach a desired output value they easily
maintain it (by zeroing their input: once x(t) is zero, as t continues to increase then
t
 x(t )dt remains
0
7
constant). Derivative and integral controls can also be characterised in terms of transient response, the
ability to execute sudden changes in value. Derivative controls have good transient response while
integral controls, especially ones with large gain, tend to overshoot the desired output value. Finally,
integral controls work better when the controlled system has high hysteresis (Dean and Wellman 1991:
148–9).
3. Inputs to a mapping
If a controller has several controls, particular controls can be selected, adjusted, and deselected.
Selection is the allocation of performer resources to a particular control; adjustment is the actual change
of state of the control; and deselection is the relinquishing of resources in preparation for subsequent
selections. Less abstractly, when a performer selects a control, he directs attention and possibly executes
some muscular motion. By resources we mean the performer’s finite cognition, attention, and physique:
only this many limbs with this range of motion, this spatial resolution, and this speed.
Selecting a control may be as simple as moving a limb to it. But sometimes it may be unwieldy
to have all the controls directly accessible by mere positioning of a hand. Then we need to replace ‘space
multiplexing’ with ‘time multiplexing’ (Buxton 1986); in other words, we need secondary controls to
change the behaviour of the primary controls. Mutes are a common example of secondary controls.
Also, a single multi-way selector switch can associate a primary control with one of several sets of
scalars, for example keyboard couplers on an organ. It is difficult to rigourously define when a control is
secondary, but the label applies well when several of the following hold: the control modifies the
behaviour of another control which is manipulated more often; the control can be left unattended for a
while, physically or only attentionally; the control causes a sound or behaviour which is in some way
nonstandard.
8
Introducing secondary controls reduces the number of primary controls, thereby simplifying the
controller. Of course this is a compromise: simultaneous adjustment of multiple controls is reduced.
Also, the interface is deepened even while it is narrowed: the performer’s mental model of the
instrument is more elaborate and takes longer to learn. A range of compromise in fact exists. At one
extreme there are k primary controls (strings on a stringed instrument, keys on a multiply touch-sensitive
clavier) and no secondary controls, at the other extreme one primary control with a single k-way selector
switch (e.g., the Ondes Martenot). Between these two extremes there may be m primary controls with an
n-way selector switch, where mn  k (bass guitar: m = 4 strings, n = 3-way pickup switch). If only a few
secondary controls extend the interface of an orchestral instrument, they can often be operated by the
feet, for instance as a bank of toggle switches or a three- or four-way ‘gas pedal.’
3.1 Continuous controls: sliders and multisliders
We abbreviate ‘continuous control’ with the term slider, distinguishing one-dimensional scalar
sliders and higher-dimensional multisliders. Scalar sliders include linear sliders and rotary knobs in a
physical control apparatus, sliders or scrollbars on a computer display, and nonmanually controlled
sensors of pressure, proximity, and so on. Joysticks, mice, virtual reality ‘wands,’ and motion-tracking
systems for dancers are examples of multisliders. As a multislider concurrently drives several
dimensions, it is suitable when performance gestures are desired more in the product space of these
dimensions than in each individual dimension. (To draw freehand, one prefers a mouse; to draw straight
vertical and horizontal lines, the humble Etch-a-Sketch is better (Buxton 1986).) Considering which
dimensions are coupled in performance gesture and which are independent shows the instrument designer
where multisliders are appropriate.
9
A slider can be absolute or relative. An absolute slider has a direct correspondence between the
slider’s position and the scalar’s value. A relative slider, in combination with a secondary control, can
change the origin of its coordinate system relative to that of the scalar. This secondary control is usually
a momentary switch (one which remains engaged only as long as force is applied to it). When the switch
is engaged, the slider is active: moving the slider adjusts the scalar’s value. When the switch is not held,
the slider is inactive and can be moved without adjusting the scalar’s value. Another way to think of this
is that the switch changes the slider’s behaviour between changing the parameter value and changing the
origin of the coordinate system. A more sophisticated kind of relative slider varies its gain continuously,
with yet another slider.
A relative slider is useful if the effective range of the slider needs to extend beyond its physical
range (‘pawing’ a computer mouse—here the secondary control is lifting the mouse from the desk).
Three reasons explain the rarity of relative sliders in musical instruments: the time required to perform a
change of coordinate system undesirably constrains real-time performance; the secondary controls
introduce what human-computer interface specialists call mode, undesirable state in the interface itself;
relative sliders do not offer direct proprioceptive feedback (a consequence of this extra state).
A bank of sliders with multiple selection typically requires two-handed or multi-fingered
operation. If all sliders in a bank can be selected simultaneously, the bank differs from a single
multislider only in that it provides a set of primary axes for the space in which gestures are performed.
‘Rotated’ gestures are harder to perform with an array of sliders than with a true multislider: try drawing
straight diagonal lines with an Etch-a-Sketch. If all controls cannot be selected simultaneously, this may
be due to either physical or attentional limitations. Physical limitations, as is the case with a pianist’s ten
fingers and finite hand span, impose hard rather than gradual constraints on which gestures are
performable. Attentional limitations lead to gradually more constrained gestures as more controls are
10
selected. This is because of rehearsal: the violinist can learn to play double and triple stops with more
accurate intonation and bowing, trading off accuracy and rehearsal time. But no amount of rehearsal
gives a pianist extra fingers.
If only one control at a time can be selected, as with sliders displayed on a computer screen and
manipulated with a mouse, then no gestures involving correlated parameters can be performed, a severe
restriction. A bank of sliders is better suited for ‘set-and-forget’ parameters than for direct real-time
control. At the other extreme, the Continuum controller is a sophisticated multislider with multiple
selection. It accurately tracks the pressure and x-y position of up to ten fingers on a smooth surface as
large as a piano keyboard (Haken, Fitz, Tellman, Wolfe and Christensen 1997). A bank of multipleselection controllers may have an additional implicit secondary control: each member of the bank is
enabled by depressing a momentary switch (touching the Continuum’s surface). This is particularly
natural for polyphonic controllers, where selecting one more control produces one more sound.
3.2 Multisliders and cross-coupling
Vertegaal, Eaglestone, and Clarke (1994) investigated the use of several controllers for a fourdimensional timbre space (overtone content, brightness, articulation of attack, and speed of envelope).
One controller was a single multislider, a glove tracked in three-dimensional position and in one
rotational dimension. The other controllers had a single two-dimensional primary control (mouse,
joystick); secondary switches applied it to either the first or the last pair of dimensions of the timbre
space. This pairwise division was reasonable: the first pair dealt with steady-state spectral content, the
second with attack. Visual feedback of the four dimensions was presented as two square grids each
containing a cursor, again emphasizing the pairwise division.
11
The glove controller fared most poorly in this study. This was attributed to its high latency, but
subjects may have also had difficulty cognizing three separate presentations of information: glove
position; the visual display; and the actual sound. In particular the pairwise structure of the display and
the sound was not reflected in the measured parameters of the glove. The visual feedback may have
distracted rather than assisted the performer: even a simple joystick suffers if its x-y position is rendered
indirectly, for instance as the size and hue of a coloured disc instead of as the x-y position of a point in a
square.
A more recent study found that a multislider sometimes fared better than a bank of sliders with
multiple selection, even when subjects could not verbalise how the multislider worked. Hunt and Kirk
(1999, 2000) conducted several thousand trials where subjects used three different controllers: a mouse
controlling four sliders displayed on a screen; four physical sliders; and a mouse plus two physical
sliders. With each controller the subject had to duplicate a short sound which might vary in volume,
pitch, unidimensional timbre, or stereo position. In all cases the mouse performed most poorly, as
predicted by our theory of selection overhead. In simple tasks where only one parameter of the sound
changed, the bank of physical sliders showed the best performance though the multislider (mouse plus
two sliders) showed improvement as trials progressed. For complex tasks the multislider was best. This
is remarkable since the four parameters were not simply assigned to slider one, slider two, mouse
x-position, and mouse y-position. Rather, each parameter depended on expressions like overall mouse
speed plus average of the slider positions, or mouse y-position plus the speed of slider one. The
experimenters concluded that the simplest mapping of controls to synthesis parameters is not necessarily
optimal. (We hope for another study where the four parameters are simply assigned to the four obvious
linear controls, for an even stronger conclusion.)
12
We call a set of controls cross-coupled if they lie in the same connected component of the
driving graph (figure 2: violin elevation and pitch). In other words, the controls cannot be considered
individually. Rovan, Wanderley, Dubnov, and Depalle (1997) elaborately cross-couple the components
of a MIDI wind controller to more realistically drive an additive-synthesis clarinet sound model. They
suggest that simpler, less cross-coupled, mappings can help novices learn to play; technically, crosscoupled controls are harder to learn because part-task training on individual controls transfers poorly to
the whole task. But both Rovan’s instrument and Hunt and Kirk’s multislider indicate that crosscoupling can produce a better controller, once learned. If an instrument seems to demand cross-coupled
controls, grouping them together to drive an HDI (again following Rovan) encourages the performer to
think of the controls as a single entity. The converse holds, too: a set of strongly non-coupled controls
like a bank of sliders is inappropriate for driving an HDI, because it incorrectly suggests that each control
has an inherent meaning.
3.3 Input devices
Rotary knobs and linear sliders provide inexpensive control of continuous parameters. Bounded
sliders typically work absolutely while unbounded sliders are naturally relative.
The simple joystick is like a pair of rotary knobs cross-coupled. Force-feedback joysticks offer
variable resistance to motion or haptic display of impulses and vibrations. SensAble Technologies’
Phantom is a desktop-sized articulated arm which senses position, and applies force in, three dimensions;
it has been extended to measure orientation and apply torque, for an impressive total of six dimensions
each of input and output (Chen 1999).
13
Pressure sensors such as force-sensing resistors offer continuous control in one dimension.
They can be combined into an isometric joystick (two or three degrees of freedom) or a spaceball (six
degrees of freedom, strongly cross-coupled).
The ribbon controller and touchpad track the position (and pressure) of one or more fingertips
along a line and on a surface respectively. A touchpad may be divided into smaller touchpads, sliders,
and switches on a single physical surface. This is the input-side analogue to a window manager for
graphical output. Tactile feedback can be provided for such multiple devices by laying a cardboard
template on the physical device (Buxton, Hill, and Rowley 1985).
Light pens and tablets with styli are more accurate than touchpads but require the hand to hold a
stylus, so they work poorly when the hand also has other tasks to perform. Buttons mounted on the stylus
may offer a greater repertoire of gestures than a touchpad. Motion tracking systems measure the position
and/or orientation of sensors freely moving in space, typically attached to a glove. Each sensor is like a
tablet plus stylus, extended to three (or six) dimensions: the working area is now a volume instead of a
surface.
Selection overhead 
Property sensed

low
medium
high
Discrete state
momentary switch,
toggle switch, hat switch
multi-way switch
numerical keypad
Fingertip location (and
unidirectional force)
ribbon (1–2),
touchpad (2–3)
multi-touch pad (15)
Continuum (30)
Unidirectional force
applied by hand
force-sensing resistor (1)
torque sensor (1),
isometric joystick (2–3),
spaceball (6)
14
Location (and orientation)
of manipulandum
large knob (1),
slider (1),
trackball (2)
small knob (1),
mouse (2),
joystick (2–3)
Location (and orientation)
of, and force applied by,
manipulandum
tablet+stylus (2–4),
wand (3–6),
motion tracking glove (15–30)
pressure-sensitive
tablet+stylus (3–5),
force-sensing mouse (4),
Phantom (6)
Figure 3. Elementary manual controls. Parenthesised numbers
indicate how many degrees of freedom a control has.
Properties of these devices are summarized in figure 3. Beyond what is shown in this figure, a
control may be:
*
primary / secondary;
*
separate / an element of a bank / an element of a multiply selectable
*
absolute / relative;
*
bounded / unbounded in its motion;
*
with / without visual feedback (at the control itself, or separately in a
bank;
computer display).
Figure 3 shows that selection overhead generally correlates with number of degrees of freedom.
Efficient controls, those with a high ratio of degrees of freedom to selection overhead, are: touchpads,
trackball; spaceball; motion-tracking glove. Other considerations being equal, these are therefore
especially recommended as manual controls for musical instruments. Those with low ratios may have
other advantages like high resolution or small size.
15
This terminology concisely summarises the findings of Vertegaal, Ungvary, and Kieslinger
(1996), supported by experimental results (Wanderley, Viollet, Isart, and Rodet 2000). An effective
isometric (scalar) control is integral with respect to applied force, while an effective isotonic control is
proportional with respect to position. Isometric controls rely particularly on tactile feedback, how hard
the control is pushed, rather than visual feedback. A discrete control works best when its different states
correspond to different spatial positions, not different speeds or forces; visual feedback can play a
greater role here.
Finally, common orchestral instruments can be themselves used as controls. The instrument’s
sound can be analysed into parameter streams which then drive a sound synthesiser. Examples of such
parameters are pitch, amplitude, and timbral information such as spectral centroid and amount of
unpitched noise. The literature speaks of amplitude following (envelope following) and pitch tracking.
Secondary parameters like depth of vibrato can also be derived from the tracking data. We can also add
other devices to the instrument such as motion tracking and pedals.
4. Interpolation
4.1 Classical interpolation
In the most general sense, interpolation is ‘the performance of a numerical procedure that
generates an estimate of functional dependence at a particular location, based upon knowledge of the
functional dependence at some surrounding locations’ (Watson 1992: 101). Interpolation is a convenient
starting point for constructing a mapping between two Euclidean spaces, from a space of dimension equal
to the number of degrees of freedom of the instrument’s continuous controls, to the space of parameters
fed to the synthesiser. (This can be taken even farther, to the space of perceptual parameters of the
16
synthesiser’s sound output; this perception can also be automated (Goudeseune 2001: 151–6). Here we
simply identify ‘sound’ with ‘synthesiser parameters.’)
It is easy to specify several pairs of points in the input and output spaces, in effect specifying
‘when the controls have these values, make this sound;’ interpolation then defines intermediate sounds
for intermediate values of the controls. Having tried out the instrument resulting from this map, one can
then refine the map by moving input points (make that sound over here instead), moving output points
(no, that sound needs to be more like this one), or introducing new pairs of points (adjust the sounds
which the interpolator happened to produce in this little region). With an appropriate choice of
interpolator we need assume little else about the system, not even linearity of the synthesiser or of human
perception. In short, constructing continuous maps by extending pointwise maps through interpolation
has the advantages of low effort, generality, scalability and local adjustment. We do require that the
interpolator accept arbitrary spatial arrangements of points, not only rectilinear grids; otherwise these
advantages vanish.
The result of all this is that given such points, an interpolator can be used as a controller which,
moment by moment, takes as input a point in Rd and outputs a point in Re. This output point moves
through an unbounded nonlinear e-dimensional region containing the given fixed output points. In a realtime musical application the controller takes as input d real-valued data streams from physical controls
and outputs e data streams to a sound synthesiser. Typically d is 2 or 3 while e lies between 5 and 50, but
the mathematics holds for any 0 < d < e.
In this context, we define the unidimensional interpolation problem thus: given a finite
pointwise map S = {(xi, yi)}  RR, construct a function f : RR such that yi = f(xi) for all i, and
such that f has nice properties. Among these properties continuity is usually most desirable; others are
differentiability, having bounded higher derivatives, being C and having adjustable smoothness and
17
tension. More generally: given sets A and B and a function g : AB, construct f : AB such that f 
g and f is nice. Our niceness is not treated directly in classical interpolation theory, because we have no
ideal function approximated by the interpolation; there is no error measurement to speak of.
Certain implicit assumptions of interpolation theory may not hold for musical instruments. For
interpolation to make sense, the input and output spaces must themselves be continuous. For the
mapping to be repeatable, the input and output spaces should not change with time; in particular, large
hysteresis in the sound synthesiser (when its output is influenced by past as well as current inputs) is
incompatible with this method of constructing mappings.
The property that the extended map f agrees with the pointwise map S characterizes the restricted
class of interpolators called exact. We assume exactness from now on (and thereby disregard neural
network interpolation; neural networks are better at classifying than interpolating). Classical exact
interpolators include proximal interpolators, B-splines and kriging (Collins and Bolstad 1996). Though
simple, proximal interpolation and its generalizations are inappropriate for most musical uses since they
produce many discontinuities in f. B-splining does better, constructing f from patches of polynomials so
it is not only continuous but C2. It tends to produce artifacts such as overshoots beyond the input values
when these input values are not monotonically distributed, because it prefers minimum curvature to
avoiding such artifacts. It is not recommended for points not spaced on a grid (Eckstein 1989;
Hutchinson and Gessler 1994). As musical applications need not in general be smooth (breaks in
woodwind register, for example), B-splining seems ill-suited for general musical use. The more
sophisticated regularised spline with tension avoids overshoots and constructs f to be C. Mitasova,
Mitas, Brown, Gerdes, Kosinovsky, and Baker (1995) describe its implementation in a geographical
information system; Mitas and Mitasova (1999) sketch a generalization to more than three dimensions.
18
Kriging considers how quickly the variance between input points changes in different parts of the
space (Krige 1981; Oliver and Webster 1990). It becomes cumbersome in higher dimensions. Sárközy
(1998) warns that the data should satisfy several stationarity conditions; these may not hold in musical
situations. Kriging also performs poorly on sparse data, which can well be the case with timbre spaces.
Hardy (1990) demonstrates that kriging is relatively poor at handling smoothness, an important factor for
musical instruments.
Finally, multiquadrics (Hardy 1990) and polynomial interpolation with basis functions are two
examples of function-fitting methods. These are less commonly used, perhaps because they generalise
poorly to nongridded data points (Sárközy 1998).
4.2 High-dimensional interpolators
‘How can I control e parameters with only d scalar controls (d < e)?’ We can restate this
question as ‘how can I make a desirable collection of gestures in Re using only d degrees of freedom?’
Here Re is a family of sounds, perhaps a family of steady-state timbres; we leave desirability undefined
in this general context. At any rate, by ruling out undesirable and redundant gestures we hope to find an
interesting d-dimensional space inside Re. It is in fact possible to reduce the number of dimensions from
e to d, practically because a high-dimensional space is simply too large to be exhausted by one
instrument. More technically, the size of the space grows exponentially with the number of dimensions
(e is an exponent, after all). Exhaustive exploration of such a space risks producing an incomprehensible
instrument. So we expect that fewer dimensions suffice to describe the subset of the whole space which
we call interesting or desirable.
One way to find an interesting d-dimensional space inside Re is to first specify several desirable
points pi in Re (chosen to correspond to a priori interesting sounds), then construct a d-space enveloping
19
them, and finally define a mapping from Rd into this d-space. Since this d-space is in Re, this defines a
mapping all the way from Rd to Re. (At least d+1 points are needed; otherwise a smaller value of d
could have been chosen.) If we build up such a mapping from a pointwise map by choosing
corresponding preimage points qi in Rd, we call the new mapping a high-dimensional interpolator or
HDI. Beyond the mere existence of this mapping, there are desiderata to balance: continuity,
differentiability, linearity, including all important parts of the range space, exactness, extensibility to
larger spaces, ease of editing the map locally and globally. For now we require only continuity.
4.2.1 Automatic generation of preimage points
We divide an interpolator into an initialization phase and a running phase. The first phase is selfexplanatory; in the second phase the interpolator actually performs the mapping from a query point q to
an image point p.
As part of the initialization phase, an interpolator can automatically find good preimages qi in Rd.
Taking the unit d-cube as the domain from which to choose the qi, we find a set of qi such that (i) their
normalised pairwise Euclidean distances approximate those of the pi, and (ii) they adequately fill the
d-cube. By filling we mean that their convex hull has nonzero volume (to take advantage of all d
dimensions at our disposal), and that the projection of this hull onto some principal axis of Rd yields the
unit interval (to take advantage of all the distance at our disposal).
I have implemented two methods of choosing the qi, a genetic algorithm (GA) and Sammon’s
mapping; both are random incremental algorithms. The population members of the GA are sets of qi,
and the fitness function minimised is RMS error between corresponding distance pairs of the pi and a
given set of qi. Sammon’s mapping iteratively refines a set of qi in the manner of simulated annealling.
Kohonen (1997: 31–2) suggests that Sammon’s mapping requires 104·k to 105·k steps to arrive at an
20
adequate solution, where k is the number of points qi, but in fact certain choices of temperature gradient
yield a good solution in merely 10·k to 100·k steps—if a good result is forthcoming at all. Since most of
the improvement happens in the first few iterations, these two incremental algorithms are run with only a
few iterations but many times, keeping the best result found. Sammon’s mapping uses 700 runs of 70·k
iterations each; the GA uses 10 runs averaging around 1,000 iterations each. Both take about 100 msec
on current desktop computers to find an adequate solution for moderate values of k. (Recall that k  d+1;
k has no upper bound, though.) The GA better avoids suboptimal local minima, but Sammon’s mapping
uses less memory and is slightly faster. One cannot be universally recommended over the other.
4.2.2 Simplicial interpolation
Recall a few definitions from algebraic topology and computational geometry. The triangle in R2
and the tetrahedron in R3 generalise to a d-simplex in Rd. A triangular mesh similarly generalises to a
simplicial d-complex. A Delaunay triangulation of a set of points is a simplicial complex based on those
points, with certain nice properties. A simplicial map is a kind of mapping between Delaunay
triangulations. A point q lies in the convex hull of a set of points {qi} if it lies ‘between’ the qi:
q = wiqi for some positive weights wi which sum to 1. Given a d-simplex σ with vertices {Vi}, any
point q in Rd is expressible in terms of barycentric coordinates with respect to σ as q = biVi where
bi = 1. This is a weighted sum of the Vi. The point q lies in σ if and only if all 0 ≤ bi ≤ 1. For more
rigour and background, see textbooks such as (Alexandroff 1961; de Berg, van Kreveld, Overmars and
Schwarzkopf 1997) or look up individual terms in the online reference (Weisstein 1999).
Simplicial interpolation consists of an initialization phase and a running phase. At initialization,
a Delaunay triangulation of the preimage points qi in Rd is constructed and used to extend the pointwise
map qi→pi to a simplicial map defined on the convex hull of the points in Rd. The simplicial map is
21
then extended beyond the hull to all of Rd, to allow arbitrarily input values. In the running phase, the
algorithm determines which simplex contains a given query point q in Rd, computes q’s barycentric
coordinates with respect to that simplex, and then finds the corresponding point p in the simplicial
d-complex in Re (figure 4). Two points in Rd and Re correspond if they lie in corresponding simplices,
and have the same barycentric coordinates with respect to those simplices.
Re
p1, ..., pk
p
Rd
q1, ..., qk
q
Figure 4. Pointwise map extended to a simplicial map, used to map
an arbitrary query point q in Rd to an image point p in Re.
The running phase first induces a d-triangulation of the pi in Re from the d-triangulation of the qi.
Given a query point q in Rd, it computes which d-simplex  contains q with a constant-time bucketsearch point location algorithm derived from a FORTRAN code by Edahiro, Kokubo, and Asano (1984). It
then computes the barycentric coordinates of q with respect to . The image p of q is then the point in
the simplex in the triangulation in Re corresponding to , with the same barycentric coordinates as those
of q (see figure 4 again). In other words, q is characterized as a weighted sum of the qi; p is the same
weighted sum of the corresponding pi. (Recall that barycentric coordinates can be interpreted as weights
22
in a weighted sum.) This approach is due to a refinement by H. Edelsbrunner of my earlier algorithm
(Choi, Bargar, and Goudeseune 1995).
The special case where q lies outside the convex hull of the qi is handled by partitioning Rd with
a set of ray-simplices. Instead of one of the already described simplices we choose the unique raysimplex which contains q, and then compute barycentric coordinates as usual with respect to that raysimplex (figure 5). First we place a central point C in Rd, at the centroid of the smallest axially-aligned
hyperrectangle containing the hull (the hull’s ‘bounding box’). Since the hull is convex, C is in fact
contained in the hull. Consider a face of the hull, i.e., a (d1)-simplex on the boundary of the hull.
Together with C, its vertices define a d-simplex. The set of all such simplices partitions the hull; but if
we extend each simplex away from C (imagine rays extending outwards from C through all the vertices
of the hull), the resultant ray-simplices in fact partition all of Rd. (We ignore overlapping faces of
adjacent ray-simplices, since their barycentric coordinates agree there). Formally, we define the raysimplex (, V) of a simplex  and a vertex V of  as those points whose barycentric coordinates with
respect to  are all nonnegative, with the possible exception of the coordinate corresponding to V. There
are many ways to choose the point in Re corresponding to C and thereby induce corresponding raysimplices which partition Re; the particular way does not affect the continuity and piecewise linearity of
this extension to the mapping. The point in Re corresponding to C we conveniently define one dimension
at a time: its ith coordinate is the median of the ith coordinates of each point on the boundary of the hull in
Re.
23
Re
p1 ,..., pk
p
p
q
C
R
d
q1, ..., qk
q
Figure 5. The centroid C and the four edges of the hull of the qi induce four ray-simplices,
one of which is shaded. If q lies outside the hull, its mapping to p is defined
in terms of corresponding ray-simplices instead of corresponding simplices.
We now analyse the complexity of the algorithm. Initialization consists of running the genetic
algorithm or Sammon’s mapping (both use O(k) time and space); computing a Delaunay triangulation
(also O(k), based on an efficient implementation by Clarkson, Mehlhorn, and Deidel (1993)); and
initializing the bucket-search algorithm. The bucket search works only for the case d = 2, and initialises
in O(k) time and space; in short, initialization is entirely O(k). For larger d a generalised bucket search
takes O(k3d3) time and O(k2) space, not linear but still eminently practical (Goudeseune 2001: 145).
During the running phase, the total time complexity for mapping a point from Rd to Re is O(d3+ de)
(Goudeseune 2001: 145). This is well behaved for large e, and does not increase at all as points are
added (as k increases).
Since the domain of the simplicial interpolator is the unbounded totality of Rd, its range may also
be unbounded. This may not be compatible with the synthesiser’s inputs, where bounds such as
24
nonnegative amplitudes and stable filter coefficients may apply. Three methods can ensure bounded
output from a simplicial interpolator (or any generator of unbounded data, for that matter). (i) If a realvalued component of Re must remain within the closed interval [a, b], it can simply be clamped: if out of
range, instead use a or b as appropriate. (ii) If the endpoints must be excluded, a sigmoid-shaped
function like an arctangent monotonically maps R to (a, b). (iii) In some cases valid input cannot be
broken down into individual dimensions of Re, because of interdimensional constraints (stable filter
coefficients, for example). If the valid subset V of Re can still be computed a priori, dilate a maximal
convex subset of V (ideally V itself) to cover an e-rectangle. This reduces the problem to one solvable by
(i) or (ii): the output of the simplicial interpolator is some point p in Re. Applying a clamp or sigmoid to
each coordinate of p moves p into this e-rectangle. Inverting the dilation then carries p into V, thereby
making p a valid input for the synthesiser.
Source code for simplicial interpolation, including automatic generation of preimage points, is
available at <http://zx81.isl.uiuc.edu/interpolation/>.
4.2.3 Other interpolators
Bowler, Manning, Purvis and Bailey (1990) present a way to define continuous piecewise linear
mappings from Rd to Re. The technique takes as input a d-lattice whose vertices are analogous to the qi
of simplicial interpolation. At each point of this lattice is stored the value of the corresponding point pi.
This defines a pointwise map from Rd to Re. A naive extension from this pointwise map to a continuous
map would first find which cell of the lattice contains a given point q in Rd, and then construct the image
p of q by interpolating among the pi-values associated with each of the cell’s 2d vertices. By dividing
each lattice cell canonically into d-simplices they reduce the number of points to interpolate among from
25
2d to d+1. In fact this turns out to be a special case of simplicial interpolation, with the qi arranged in a
lattice and using a fixed (not necessarily Delaunay) triangulation.
A lower bound for computing the image of q has been shown to be O(d3 + de), the same as the
exact cost using simplicial interpolation (Goudeseune 2001: 147–8). So Bowler’s interpolator is no
faster than simplicial interpolation. It differs in that it constrains the qi to be a regular d-lattice.
Simplicial interpolation needs far fewer points than this lattice. If a particular application already has
this d-lattice constraint and can afford its high O(2d) memory usage, Bowler’s simpler algorithm may be
indicated. Otherwise simplicial interpolation is preferred for its greater generality and flexibility.
Bilinear (trilinear, multilinear) interpolation (Watson 1992: 139) differs from simplicial
interpolation in that the number of control parameters is not constant but rather increases as log2 of the
number of data points (which value must be a power of 2). Bilinear interpolation may therefore be
preferred when the number of data points is fixed to be 4 or 8, or if like Bowler’s interpolator the qi are
constrained to a lattice (in which case interpolation is done cell by cell, as in Rovan et al. (1997)). It is
even simpler than Bowler’s algorithm, but becomes intractably slow for large d since each cell has 2d
points among which to interpolate.
Neither of these interpolations deeply uses the structure of Re. Since the actual operation of
interpolating is computing p as a weighted sum of pi, the only operations required on Re are scalar
multiplication and (vector) addition. Since the real line itself has these operations, mapping to R and
mapping to Re are structurally equivalent here.
26
5. Conclusion
Stated colloquially, reducing how many dimensions of control an instrument has makes it less
frightening to its performer. More formally, such a reduction concentrates the set of all possible inputs
into a more interesting set by avoiding the redundancy inherent in the exponential growth of increasing
dimensionality. Even more formally, it reduces the dimensionality of the set of synthesis parameters to
the dimensionality of the set of perceptual parameters: it rejects all that the performer cannot actually
understand and hear, while performing. Designing an instrument around the performer’s
cognition/perception instead of the engineer’s convenience is echoed by Jacob, Sibert, McFarlane, and
Mullen (1994) in the context of visual tasks: they conclude that “choosing an input device for a task
requires looking at the deeper perceptual structure of the task, the device, and the interrelationship
between task and device.”
The value of any dimension-reducing controller is found exactly in how well it loses information.
A controller based on a custom pointwise mapping extended to a continuous mapping by simplicial
interpolation precisely defines what information is lost; a controller which tries to preserve all
information (such as a complete set of linear sliders) effectively still loses information because it is
difficult to use. A synthesiser with many degrees of freedom can be played ad hoc by a finitely attentive
human performer, exploring first this and then that region of its parameter space. But richer music is
more likely if the instrument’s rate of information consumption is systematically matched to its
performer’s rate of production. Put another way, controlled loss of information is about discovering what
the performer can and cannot do, about matching that dividing line with the one between expressive and
inexpressive.
27
References
Alexandroff, P. 1961. Elementary concepts of topology. Trans. A. Farley. New York: Dover.
de Berg, M., M. van Kreveld, M. Overmars, and O. Schwarzkopf. 1997. Computational geometry:
algorithms and applications. Berlin: Springer.
Bowler, I., P. Manning, A. Purvis, and N. Bailey. 1990. On mapping n articulation onto m synthesisercontrol parameters. Proceedings of the International Computer Music Conference 1990:181–4.
Buxton, W. 1986. There’s more to interaction than meets the eye: some issues in manual input. In User
Centered System Design: New Perspectives on Human-Computer Interaction, ed. Norman and
Draper, 319–337. Hillsdale, NJ: Erlbaum.
Buxton, W., R. Hill, and P. Rowley. 1985. Issues and techniques in touch-sensitive tablet input.
Proceedings of SIGGRAPH 1985:215–24.
Chen, E. 1999. Six degree-of-freedom haptic system for desktop virtual prototyping applications.
Proceedings of the First International Workshop on Virtual Reality and Prototyping: 97–106,
Laval, France.
Choi, I., R. Bargar, and C. Goudeseune. 1995. A manifold interface for a high dimensional control
space. Proceedings of the International Computer Music Conference 1995:385–92.
Clarkson, K., K. Mehlhorn, and R. Deidel. 1993. Four results on randomised incremental constructions.
In Computational Geometry: Theory and Applications, 185–221.
<http://cm.bell-labs.com/netlib/voronoi/hull.html>.
28
Collins, F., Jr., and P. Bolstad. 1996. A comparison of spatial interpolation techniques in temperature
estimation. Proceedings of the Third International Conference/Workshop on Integrating GIS
and Environmental Modelling. Santa Barbara: National Center for Geographic Information and
Analysis.
<http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/collins_fred/collins.html>.
Dean, T., and Wellman, M. 1991. Planning and control. San Mateo, CA: Morgan Kaufmann.
Eckstein, B. 1989. Evaluation of spline and weighted average interpolation algorithms. Computers and
Geosciences 15(1):79–94.
Edahiro, M., I. Kokubo, and T. Asano. 1984. A new point-location algorithm and its practical
efficiency—comparison with existing algorithms. ACM Transactions on Graphics 3(2):86–109.
Garnett, G., and C. Goudeseune. 1999. Performance factors in control of high-dimensional spaces.
Proceedings of the International Computer Music Conference 1999:268–71.
Goudeseune, C. 2001. Composing with parameters for synthetic instruments. DMA thesis. UrbanaChampaign, IL: University of Illinois. <http://zx81.isl.uiuc.edu/camilleg/dissertation>.
Haken, L, K. Fitz, E. Tellman, P. Wolfe, and P. Christensen. 1997. A continuous music keyboard
controlling polyphonic morphing using bandwidth-enhanced oscillators. Proceedings of the
International Computer Music Conference 1997:375–8.
Hardy, R. 1990. Theory and applications of the multiquadric-biharmonic method. Computers and
Mathematics with Applications 19:163–208.
29
Hunt, A., and R. Kirk. 1999. Radical user interfaces for real-time control. Proc. Euromicro 1999,
2:6–12. Los Alamitos, CA: IEEE Computer Society.
———. 2000. Mapping strategies for musical performance. In Trends in Gestural Control of Music,
ed. Wanderley and Battier. Paris: IRCAM, Centre Pompidou.
Hunt, A., M. Wanderley, and R. Kirk. 2000. Towards a model for instrumental mapping in expert
musical interaction. Proceedings of the International Computer Music Conference 2000:209–12.
Hutchinson, M., and P. Gessler. 1994. Splines—more than just a smooth interpolator. Geoderma
62:45–67.
Jacob, R., L. Sibert, D. McFarlane, and M. Mullen, Jr. 1994. Integrality and separability of input
devices. ACM Transactions on Computer-Human Interaction 1(1):3–26.
Kohonen, T. 1997. Self-organizing maps, 2nd ed. Berlin: Springer.
Krige, D. 1981. Lognormal-de Wijsian geostatistics for ore evaluation. South African Institute of
Mining and Metallurgy Monograph Series: Geostatistics I. Johannesburg: South Africa
Institute of Mining and Metallurgy.
Mitas, L., and H. Mitasova. 1999. Spatial interpolation. In P. Longley, M. Goodchild, D. Maguire, and
D. Rhind (eds.) Geographical Information Systems: Principles, Techniques, Management and
Applications, 481–92. Cambridge: GeoInformation International.
Mitasova, H., L. Mitas, W. Brown, D. Gerdes, I. Kosinovsky, and T. Baker. 1995. Modelling spatially
and temporally distributed phenomena: new methods and tools for GRASS GIS. International
Journal of Geographical Information Systems 9(4):433–46.
30
Oliver, M., and R. Webster. 1990. Kriging: a method of interpolation for geographical information
systems. International Journal of Geographical Information Systems 4(3):313–32.
Rovan, J., M. Wanderley, S. Dubnov, and P. Depalle. 1997. Instrumental gestural mapping strategies as
expressivity determinants in computer music performance. Kansei, the Technology of Emotion.
Proceedings of the Associazione di Informatica Musicale Italiana International Workshop
1997:68–73.
Ryan, J. 1992. Effort and expression. Proceedings of the International Computer Music Conference
1992:414–16.
Sárközy, F. 1998. GIS functions—interpolation.
<http://www.agt.bme.hu/public_e/funcint/funcint.html>.
Shepard, D. 1968. A two-dimensional interpolation function for irregularly spaced data. Proceedings of
the National Conference of the Association for Computing Machinery 23:517–24.
Sheridan, T., and W. Ferrell. 1981. Man-machine systems: information control and decision models of
human performance. Cambridge, MA: MIT Press.
Vertegaal, R., B. Eaglestone, and M. Clarke. 1994. An evaluation of input devices for use in the ISEE
human-synthesiser interface. Proceedings of the International Computer Music Conference
1994:159–62.
Vertegaal, R., T. Ungvary, and M. Kieslinger. 1996. Towards a musician’s cockpit: transducers,
feedback and musical function. Proceedings of the International Computer Music Conference
1996:308–11.
31
Wanderley, M., J.-P. Viollet, F. Isart, and X. Rodet. 2000. On the choice of transducer technologies for
specific musical functions. Proceedings of the International Computer Music Conference
2000:244–7.
Watson, D. 1992. Contouring: a guide to the analysis and display of spatial data. New York:
Pergamon Press.
Weisstein, E. 1999. Eric Weisstein’s world of mathematics. <http://mathworld.wolfram.com>.
32
Download