Clock Induction William A. Sethares March 19, 2006

Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706-1691 USA. 608-262-5669 sethares@ece.wisc.edu

Povel and Essens [6] hypothesize the presence of an internal clock that measures the passage of time. When hearing a rhythmic passage, the listener compares the sound to the beating of the internal clock. In this view, the essence of the perception lies in the relationship between the sound and the clock, between the regular succession within the sound and the regular succession of the internal clock. Like Parncutt’s [3] model of pulse salience, this is an attempt to model rhythmic perception using a regular succession (the clock) as a part of the cognitive (and/or perceptual) process.

Several kinds of clocks are possible. The simplest is an absolute clock that allows accurate time stamping of events; the times at which the events occur constitute the information that is remembered and subject to further processing. For
example, if the internal clock has an accuracy of 1 ms, a sequence of sounds could
be notated directly by the interonset intervals, the time between the start of successive tones. Thus the sequence
represents an alternation
between shorter and longer sounds. The absolute clock model fails to explain several important aspects of perception such as the perceived structural identity of
sequences that are presented at different tempos. From an information-processing
perspective, the absolute time-stamp model predicts that all sequences consisting
of the same set of intervals should be processed equally well. In fact, simple arrangements of sounds are much easier to remember, duplicate, and predict than
complex arrangements. For example, the periodic sequence
is much simpler than the periodic sequence
even though both contain exactly the same set of durations.
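The absolute-clock coding can be made concrete with a minimal Python sketch. It time-stamps hypothetical onsets in milliseconds (the example sequences of the original are not reproduced here) and stores them as interonset intervals. It then checks that two differently ordered sequences share one set of durations, which is all that a pure time-stamp model has to work with.

```python
def interonset_intervals(onsets_ms):
    """Absolute-clock coding: the time between the starts of successive tones."""
    return [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]


# Hypothetical onset times (ms) for two sequences built from the same durations.
regular_onsets = [0, 200, 400, 800, 1000, 1200, 1600]
irregular_onsets = [0, 200, 600, 800, 1200, 1400, 1600]

regular = interonset_intervals(regular_onsets)      # [200, 200, 400, 200, 200, 400]
irregular = interonset_intervals(irregular_onsets)  # [200, 400, 200, 400, 200, 200]

# Same multiset of durations, so a pure time-stamp model predicts equal
# difficulty, even though the first arrangement repeats a simple unit.
same_durations = sorted(regular) == sorted(irregular)
print(regular, irregular, same_durations)
```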
In a relative clock, the basic time unit is derived from the sequence itself.
For example, if “1” represents the 100 ms time interval, the sequences can be
relabelled
In the relative clock, the basic unit of time is flexible and it can accommodate changing tempos since the coding is independent of the actual tempo of the pattern. But
it also predicts that any two sequences with the same number of elements should
be equally easy to conceptualize. This is incorrect, as shown by the sequences
and
above.
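The relabelling step can be sketched as follows. This sketch assumes that the unit is the greatest common divisor of the integer (ms) intervals, which is one simple way to derive a unit "from the sequence itself". The function name and the interval values are hypothetical. Both tempos yield the same code, which illustrates tempo independence. The code says nothing about which of two equally long sequences is easier, and that is the weakness noted above.

```python
from functools import reduce
from math import gcd


def relative_code(iois_ms):
    """Relative-clock coding: express each interval in a unit taken from the
    sequence itself. Assumption for this sketch: the unit is the greatest
    common divisor of the integer (ms) intervals."""
    unit = reduce(gcd, iois_ms)
    return unit, [d // unit for d in iois_ms]


pattern_100 = [100, 100, 200, 100, 100, 200]  # hypothetical, "1" = 100 ms
pattern_150 = [150, 150, 300, 150, 150, 300]  # same pattern, slower tempo

print(relative_code(pattern_100))  # (100, [1, 1, 2, 1, 1, 2])
print(relative_code(pattern_150))  # (150, [1, 1, 2, 1, 1, 2])
```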
A third possibility is a hierarchical clock, which is most easily pictured as a
collection of relative clocks operating at different rates simultaneously. For the
sequences
and
, one clock operates at the basic pulse rate of 100 ms and a
second operates at a slower rate (every 400 ms) that defines a larger-scale hierarchical encoding of the sequence. Like the relative clock, the basic unit of time is
flexible and can accommodate changing tempos. The higher levels are represented
as regular groupings of the lower levels.
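A hierarchical clock can be pictured as relative clocks nested by integer ratios. The sketch below uses a hypothetical function name and parameters. It lists the tick times for a 100 ms pulse grouped in fours, which matches the 100 ms and 400 ms levels described above. Changing `base_ms` rescales every level together, and this is the tempo flexibility that the hierarchical clock inherits from the relative clock.

```python
def hierarchical_ticks(base_ms, ratios, duration_ms):
    """Tick times (ms) for a stack of relative clocks.

    base_ms is the period of the fastest clock; each entry of `ratios` is the
    number of ticks of one level that make up one tick of the next level up.
    """
    levels, period = [], base_ms
    for ratio in [1, *ratios]:
        period *= ratio
        levels.append(list(range(0, duration_ms, period)))
    return levels


# 100 ms pulse grouped in fours (400 ms), over a hypothetical 1600 ms span.
pulse, slow = hierarchical_ticks(100, [4], 1600)
print(slow)  # [0, 400, 800, 1200]
```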
While real music would likely consist of several such hierarchical clocks,
Povel and Essens [6] focus on simple sequences like
and
and ask fundamental questions about how the two clocks relate to each other. The lower level
(faster) clock is assumed to operate at the correct rate. At what rate will the slower
clock tick? Where in the sequence will it start?
A given temporal pattern may have several possible clock rates, and each of
these rates may begin at a variety of locations. To see the problem, consider the
sequence
containing 12 low-level clock beats.
This is (somewhat) compatible
with higher level clocks with two, three, four, and six pulses per period.
[Diagram: the low-level clock, the sequence, and the candidate higher-level clocks 2(a), 4, 6, 2(b), and 3, aligned beat by beat.]
As shown, the clocks are labelled by their periodicity and phase: clocks 2(a) and 2(b) both have period two, and 2(b) is out of phase with 2(a). Since 2(a) aligns with four of the lower level events and 2(b) aligns with only one, 2(a) is preferred to 2(b). But how is one to choose between other clocks that may align with some events and miss others?
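Enumerating candidate clocks by period and phase, and counting where their ticks fall, is easy to sketch. The 12-beat pattern below is hypothetical. It was chosen only so that the two period-two clocks split four events to one, as in the example above. Even this small case leaves several clocks with similar counts, which is why plain counting is not enough.

```python
def clock_alignment(pattern, period, phase):
    """Count the ticks of one clock that land on events (1) and on silences (0).

    pattern is one cycle of a binary grid, one entry per low-level beat.
    """
    ticks = range(phase, len(pattern), period)
    hits = sum(pattern[t] for t in ticks)
    return hits, len(ticks) - hits


# Hypothetical 12-beat pattern (the sequence in the text is not reproduced).
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

for period in (2, 3, 4, 6):          # periods that divide the 12-beat cycle
    for phase in range(period):
        hits, misses = clock_alignment(pattern, period, phase)
        print(f"period {period}, phase {phase}: {hits} on events, {misses} on silence")
```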
Povel and Essens were unable to find a simple rule that considers only the alignment (and misalignment) of the clock with the events. To proceed, they observe that some of the events appear accented: some are perceptually more important than others. This occurs even when the actual sound of each event is identical: it is the placement of the sounds in time that causes the accent. A tone becomes perceptually marked (accented) in a temporal sequence if:
(i) the tone is isolated
(ii) it is the second of a cluster of two tones
(iii) it is the initial or final tone of a cluster of three or more
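The three accent rules can be applied mechanically to a binary pattern, as in the sketch below. One assumption of this sketch is that the pattern repeats, so a cluster may wrap from the end of the cycle to its start. Another is that a pattern with no silences receives no accents. Both are choices made for illustration, and neither is stated in [6].

```python
def povel_accents(pattern):
    """Mark accented tones in one cycle of a repeating binary pattern.

    A tone is accented if it is isolated, if it is the second tone of a
    two-tone cluster, or if it is the first or last tone of a cluster of
    three or more. Assumption: the pattern repeats, so a cluster may wrap
    from the end of the cycle back to the start.
    """
    n = len(pattern)
    accents = [0] * n
    if all(pattern):           # no silences: no cluster boundaries to use
        return accents
    # Start scanning just after a silence so that no cluster is split in two.
    start = next(i for i in range(n) if pattern[i - 1] == 0)
    cluster = []
    for k in range(n):
        i = (start + k) % n
        if pattern[i]:
            cluster.append(i)
            continue
        # A silence closes the current cluster (if any).
        if len(cluster) == 1:
            accents[cluster[0]] = 1
        elif len(cluster) == 2:
            accents[cluster[1]] = 1
        elif len(cluster) >= 3:
            accents[cluster[0]] = accents[cluster[-1]] = 1
        cluster = []
    return accents


print(povel_accents([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]))
# [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
```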
Using these observations to recode the sequences, Povel and Essens propose a
simple formula that considers all possible clocks and all possible phase shifts and
rates them using a weighted sum of the number of clock ticks
(i) that coincide with accented events
(ii) that coincide with (unaccented) events
(iii) that occur during silence
This weighted sum provides a measure of the strength of the clock induced by the sequence; the clock with the largest measure is the one that best fits¹ the input.

¹ Actually, the model of clock induction strength in [6] uses only (ii) and (iii) (counterevidence), though the authors believe that positive evidence (i) may play a role in the suggestion of possible clocks. This is not needed in their model because all possible clocks are rated.
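A weighted sum of this kind can be sketched as below, with accents supplied as a list. The three weights are assumptions of this sketch. Setting `w_accent = 0` mirrors the counterevidence-only model mentioned in the footnote, and `w_silence = 4` is an assumed value, not a figure quoted from [6]. Slower clocks have fewer ticks and so accumulate less counterevidence. For that reason, the example compares only the phases of a single period. How [6] compares clocks of different periods is not modelled here.

```python
def clock_score(pattern, accents, period, phase,
                w_accent=0.0, w_unaccented=1.0, w_silence=4.0):
    """Weighted evidence for one clock; higher means a better fit.

    Ticks on accented tones add w_accent; ticks on unaccented tones and on
    silences subtract w_unaccented and w_silence. All three weights are
    assumptions of this sketch (w_accent = 0 mirrors a counterevidence-only
    model).
    """
    score = 0.0
    for t in range(phase, len(pattern), period):
        if not pattern[t]:
            score -= w_silence
        elif accents[t]:
            score += w_accent
        else:
            score -= w_unaccented
    return score


def best_clock(pattern, accents, periods):
    """Return the (period, phase) pair with the highest score."""
    candidates = [(p, ph) for p in periods for ph in range(p)]
    return max(candidates, key=lambda c: clock_score(pattern, accents, *c))


# Hypothetical pattern with accents marked by the three rules in the text.
pattern = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
accents = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(best_clock(pattern, accents, [4]))  # (4, 0)
```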
Povel and Essens test the model using a collection of temporal patterns that
consist of all possible rearrangements of sequences with 16 low-level clock ticks
that have five interonset intervals with duration 1, two with duration 2, and one
each with durations 3 and 4. The patterns are shown schematically in Fig. 1,
ordered by the measure of clock induction with the easiest at the top and the
hardest at the bottom. Subjects were asked to listen to the sequences until they
could reproduce them by tapping. The researchers recorded a number of features of the subjects’ performance, including
(i) the number of times the subjects chose to listen to the sequence before responding
(ii) the accuracy of the subjects’ reproduction of the intervals in the sequence.
Overall, the results agree with the predictions of the model. Subjects needed fewer
hearings for the easier sequences and made smaller errors in the reproductions.
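The stimuli are defined by an interval inventory: five 1s, two 2s, one 3, and one 4, for a total of 16 ticks. The sketch below turns one ordering of that inventory into the kind of binary grid used in the analyses that follow. The particular ordering is hypothetical and is not claimed to be one of the 35 patterns in Fig. 1.

```python
from collections import Counter


def intervals_to_grid(intervals):
    """Binary grid with a 1 at each onset and 0 on the silent ticks between."""
    grid = []
    for d in intervals:
        grid.extend([1] + [0] * (d - 1))
    return grid


# Interval inventory from the text: five 1s, two 2s, one 3, and one 4.
INVENTORY = Counter({1: 5, 2: 2, 3: 1, 4: 1})

ordering = [2, 1, 1, 2, 1, 1, 1, 3, 4]      # hypothetical ordering
assert Counter(ordering) == INVENTORY      # uses exactly the stated intervals
grid = intervals_to_grid(ordering)
print(len(grid), grid)  # 16 [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
```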
It is a good idea to listen to a few of these patterns to hear (understand) the
tasks that confronted the subjects; the thirty-five sequences are performed using
a “stick” sound on the CD in [7]. Subjects found considerable differences in the
difficulty of the sequences. “Those stimuli experienced as simple seemed to organize themselves automatically and were remembered without any cognitive activity. For difficult sequences, subjects reported that they used various mnemonics such as assigning numbers to successive tones.”

[Pattern chart: the 35 patterns, numbered 1 to 35 and arranged from easier (top) to harder (bottom).]
Figure 1: Povel and Essens [6] rate these 35 patterns according to the degree to which they induce the best clock, which corresponds roughly to the complexity of the sequence. All thirty-five patterns may be heard in sound example [7]. Twenty-four test subjects were asked to listen to these sequences until they could reproduce them. In general, the sequences rated easier required fewer repetitions than those rated difficult. Moreover, subjects made smaller errors in the durations when reproducing the easier patterns than when reproducing the more complex patterns. Thus the sequences at the top are simpler and more comprehensible than those towards the bottom.
Thus easy patterns are those which have a strong relationship between the
clocks at different hierarchical levels with accented events occurring at (and helping to define the location of) pulses of the higher level clock. Harder patterns do
not show this kind of organization. Because these patterns have been thoroughly
studied they can be used to test a variety of other rhythmic identification methods.
For instance, the autocorrelation method is applied to patterns 1, 11, and 33
(representative of an easy, a medium, and a hard pattern) in Fig. 2. The period
16 sequence is coded using Brown’s modified binary notation because this gives
clearer pictures than either the binary notation or Povel’s accented binary encoding. Only the easy one shows a significant peak at 4, which represents a clock of
length 4. The medium and hard ones do not show any obvious peaks in the autocorrelation other than the peak at period 16 which represents the complete length
of the sequence.
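The autocorrelation step treats a 16-tick pattern as one cycle of a repeating sequence, so a circular autocorrelation applies. Brown’s modified binary notation is not reproduced here. The sketch assumes plain binary (1 = onset) and a hypothetical pattern, so its values will not match Fig. 2. It does show the structure of the computation, including the peak at lag 16 (the full pattern length).

```python
def circular_autocorrelation(x):
    """r[k] = sum_n x[n] * x[(n + k) mod N] for lags k = 0..N.

    The pattern is treated as one cycle of a repeating sequence, so lag N
    (the full pattern length) matches lag 0.
    """
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n + 1)]


# Hypothetical 16-tick pattern in plain binary notation (1 = onset).
x = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
r = circular_autocorrelation(x)
for lag, value in enumerate(r):
    print(f"lag {lag:2d}: {value}")
```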
Similarly, the FFT is applied to a periodic extension of patterns 1, 11, and 33
in Fig. 3. The sequence was periodically extended to give a more detailed picture,
and Brown’s modified binary notation was used. As with the autocorrelation, it
is possible to locate various periodicities in the data. On the other hand, these periodicities are buried among a large number of spurious peaks and features.

[Plot: three panels (a), (b), (c); horizontal axis: period, 0 to 16.]
Figure 2: The autocorrelation method is applied to (a) pattern 1, (b) pattern 11, and (c) pattern 33 of Fig. 1. Only the easy pattern (a) shows a discernible peak at clock length 4.
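The FFT of a periodic extension can be sketched with NumPy. The sketch assumes one tick per unit of time, so frequencies are in cycles per tick rather than the Hz of Fig. 3, and it uses plain binary coding of a hypothetical pattern. Tiling the pattern makes the spectrum nonzero only at multiples of 1/16 cycles per tick. A line at frequency f corresponds to a period of 1/f ticks.

```python
import numpy as np


def periodic_spectrum(pattern, repeats=8):
    """Magnitude spectrum of a pattern tiled `repeats` times.

    Frequencies are in cycles per tick (assumption: one tick per unit of
    time); a line at frequency f corresponds to a period of 1/f ticks.
    """
    x = np.tile(np.asarray(pattern, dtype=float), repeats)
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs, magnitude


pattern = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]  # hypothetical
freqs, mag = periodic_spectrum(pattern)

# Strongest non-DC lines, reported as periods in ticks.
order = np.argsort(mag[1:])[::-1][:4] + 1
for k in order:
    print(f"period {1 / freqs[k]:5.2f} ticks, magnitude {mag[k]:.2f}")
```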
Finally, the periodicity transform (PT) is applied to patterns 1, 11, and 33 in
Fig. 4. All of the patterns, whether easy or hard, show periodicities at sensible
locations (2, 4, and 8). The strength of the PT is thus that it clearly depicts periodic features and suppresses those periods that are incommensurable with the
input sequence. For example, unlike the autocorrelation and FFT methods, the PT
shows nothing at periods 3 and 5. While the PT does a good job of picking out
which clocks to consider (i.e., the consistent peaks at period 4) it does not do a
good job of distinguishing the easy patterns from the difficult patterns.
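The core operation of the periodicity transform is orthogonal projection onto the subspace of p-periodic sequences. For a period that divides the length, this projection replaces each sample by the mean of its residue class. The sketch below computes these projections, then removes them from smallest to largest period in the spirit of the PT's small-to-large strategy. It is a simplified, threshold-free variant restricted to divisors of the pattern length, and it is not the full PT, so its numbers need not match Fig. 4.

```python
import numpy as np


def project_periodic(x, p):
    """Orthogonal projection of x onto the p-periodic sequences of the same
    length: each sample is replaced by the mean of its residue class mod p.
    Assumption of this sketch: p divides len(x)."""
    x = np.asarray(x, dtype=float)
    if len(x) % p:
        raise ValueError("this sketch only handles periods that divide len(x)")
    class_means = x.reshape(len(x) // p, p).mean(axis=0)
    return np.tile(class_means, len(x) // p)


def periodicity_strengths(x, periods):
    """Remove projections from smallest to largest period, recording the
    energy each one explains (a simplified, threshold-free variant)."""
    residual = np.asarray(x, dtype=float)
    strengths = {}
    for p in sorted(periods):
        component = project_periodic(residual, p)
        strengths[p] = float(np.sum(component ** 2))
        residual = residual - component
    return strengths, residual


x = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]  # hypothetical pattern
strengths, residual = periodicity_strengths(x, [1, 2, 4, 8, 16])
print(strengths)
```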
[Plot: three panels (a), (b), (c); horizontal axis: frequency (Hz), with peaks labelled period=4, period=2, and period=1.]
Figure 3: The FFT is applied to a periodic extension of (a) pattern 1, (b) pattern 11, and (c) pattern 33 of Fig. 1. Peaks can be found at periods 1, 2, and 4, though they are hidden among a large number of features that have no obvious interpretation.

[Plot: three panels (a), (b), (c); horizontal axis: period, 0 to 16.]
Figure 4: The periodicity transform is applied to a periodic extension of (a) pattern 1, (b) pattern 11, and (c) pattern 33 from Fig. 1. Peaks can be found at periods 2, 4, and 8 (and 16, the length of the sequence). Even the hardest of the patterns shows peaks at the clock rate of 4.

The input to the Povel and Essens model (and to the autocorrelation, the FFT, and the PT of this section) is a sequence of symbols representing a regular succession of abstracted sonic events. If such events can be accurately determined, then
any of these methods can be readily applied. One case where this is possible is
when the sound is a collection of clearly distinguishable pulses. For example, the
sound files [7] that demonstrate the patterns of Fig. 1 contain isolated percussive
hits at the specified time interval. These can be readily detected by simple means
such as looking for peaks in the energy, and then parsed into symbolic sequences.
But for typical musical scenarios such simple techniques are unreliable. The problem then becomes one of finding appropriate feature vectors that can extract the
relevant information from the sound waveform.
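For clean percussive material, the simple energy approach mentioned above can be sketched in a few lines. Everything here is an assumption for illustration:

- the synthetic decaying bursts
- the 8 kHz sample rate
- the 40-sample frames
- the threshold of 0.1 of the peak frame energy
- the 125 ms tick used to parse onsets into a binary sequence

On real musical recordings this kind of detector is exactly what becomes unreliable.

```python
import numpy as np


def detect_onsets(signal, sr, frame=40, threshold=0.1):
    """Onset times (s) where short-time energy rises above a threshold.

    Assumptions: isolated percussive hits, non-overlapping frames of
    `frame` samples, and a threshold given as a fraction of the largest
    frame energy.
    """
    n_frames = len(signal) // frame
    frames = np.asarray(signal[: n_frames * frame], dtype=float).reshape(n_frames, frame)
    energy = np.sum(frames ** 2, axis=1)
    active = energy > threshold * energy.max()
    previous = np.concatenate(([False], active[:-1]))
    starts = np.flatnonzero(active & ~previous)   # inactive -> active
    return starts * frame / sr


def onsets_to_grid(onsets, tick, length):
    """Quantize onset times to the nearest tick of a cycle of `length` ticks."""
    grid = [0] * length
    for t in onsets:
        grid[int(round(t / tick)) % length] = 1
    return grid


# Synthetic test signal: decaying 1 kHz bursts on a hypothetical 125 ms grid.
sr, tick = 8000, 0.125
pattern = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
n = np.arange(400)
burst = np.exp(-n / 80) * np.sin(2 * np.pi * 1000 * n / sr)
signal = np.zeros(int(len(pattern) * tick * sr))
for i, on in enumerate(pattern):
    if on:
        start = int(round(i * tick * sr))
        signal[start:start + len(burst)] += burst

onsets = detect_onsets(signal, sr)
print(onsets_to_grid(onsets, tick, len(pattern)) == pattern)  # True
```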
The accurate transcription of soundwaves into timed events is essentially the
problem of labelling events with an absolute clock. In order to parse a sound
which includes the inevitable temporal irregularities that occur in music, there
must be a way of clustering similar intervals. For example, if one interval is
198 ms and the next is 201 ms, these should be considered “the same.” But it
is not easy to draw firm boundaries. Would an interval of 240 ms also be “the
same” interval or would it be a different one? In other words, what is needed
is a flexible method of timing, that is, a relative clock. None of the symbolic processing techniques of this section sheds light on how to model the flexibility of a relative clock.
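A small sketch shows the difficulty. Here a greedy rule (hypothetical, as are the two tolerance values) clusters intervals around running means. Any fixed tolerance must decide in advance whether 240 ms counts as the same interval as a 200 ms one, and the answer flips as the tolerance changes.

```python
def cluster_intervals(iois_ms, tolerance):
    """Greedy clustering of interonset intervals.

    Each interval joins the first existing cluster whose mean lies within
    `tolerance` (a fraction of that mean); otherwise it starts a new cluster.
    """
    clusters = []
    for d in iois_ms:
        for cluster in clusters:
            mean = sum(cluster) / len(cluster)
            if abs(d - mean) <= tolerance * mean:
                cluster.append(d)
                break
        else:
            clusters.append([d])
    return clusters


iois = [198, 201, 240, 402, 199]        # hypothetical measured intervals (ms)
print(cluster_intervals(iois, 0.10))    # [[198, 201, 199], [240], [402]]
print(cluster_intervals(iois, 0.25))    # [[198, 201, 240, 199], [402]]
```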
For the hierarchical clock model to be useful, there must be a way for the ear
to find all the clocks: both the higher and the lower levels. Moreover, these clocks
must be flexible relative clocks rather than rigid absolute models. More simply
stated, a model of rhythmic perception must provide a way to determine the low
level tatum as well as the higher levels such as the beat and measure.
Finding the Pulse with an Adaptive Oscillator
The oscillator models provide one way to find clocks that are capable of tracking changes in the tempo of a sound pattern, as long as the change is not too rapid.
By using several adaptive oscillators with different starting values, it is possible
to locate more than one level of the metrical hierarchy simultaneously since the
oscillators operate independently. While some authors have suggested that better
performance might result from coupling the oscillators in some way, it is not obvious
how to implement the coupling.
The next three figures show the objective functions for the adaptive wavetable
oscillator when applied to patterns 1, 11, and 33. All pattern numbers in this section refer to the sequences in Fig. 1. These represent an easy, a medium, and a difficult pattern for the clock induction model of Povel and Essens, and are in good agreement with the tapping experiments. According to the model, the listeners synchronize their internal clock to the sound pattern. This is conceptually
similar to the adaptive oscillators which attempt to synchronize their phase and
frequency parameters to the input sequence.
Fig. 5 shows the objective function for the Gaussian pulse model (other waveshapes have objective functions that look qualitatively similar) when applied to
the easy pattern 1. The figure shows how the oscillator responds from all possible
starting states.
[Plot: surface of the objective function; axes α (1/8 to 1/1) and β (0 to 1).]
Figure 5: The objective function for the adaptive wavetable oscillator applied to input pattern 1 from Fig. 1. Depending on the initialization, the algorithm may converge to any of its maxima, including maxima that repeat over multiple cycles.
The axes are normalized so that the rate of one symbol per time unit corresponds to . The phase parameter ranges from zero to one, corresponding
to one complete cycle of . The behavior of the oscillator can be read directly
from the figure: from any given location the algorithm always drives the parameters in the direction of steepest ascent. Suppose, for example, that is initialized
near 1. For any , it will climb the hill leading to the point (observe that the figure “wraps around” so that the hill leading up to the point is the other side of the hill leading up to ).
On the other hand, if is initialized near , there are two local maxima:
the oscillator may converge to either or to 2 . The first represents
a period two clock that is aligned with the starting pulse. The second represents a
period two clock aligned with the second pulse of each pair. Povel and Essens assumed that the proper clock rate was . When initialized near , the
adaptive oscillator converges to the expected clock phase at . Other initializations may lead to other synchronizations, however: , ,
and are all stable points. The latter two represent clocks that require
more than one pass through the 16 periodic cycle before repeating.
² This second peak is almost hidden because its value is low, but the slope in all directions points to it as a local maximum.

For comparison, Fig. 6 shows the objective function for the adaptive wavetable oscillator when the input is pattern 11 and Fig. 7 shows the objective function
when the input is pattern 33. In all cases, there is a strong peak at . This is
the basic low level pulse rate that is assumed to be known by the Povel and Essens
model (and by other methods of symbol-level processing). Thus the oscillators are
capable of finding this basic periodicity by themselves (subject to being initialized
close enough to the correct rate) and of acting as the flexible relative clock. Near
there is a unique peak at for pattern 11 but pattern 33 also has an
“off-beat” peak at . Overall, as the patterns get more difficult (according to the clock induction model) the objective functions become more complex
and have more peaks. Thus the oscillators capture something of the “difficulty”
of the patterns in the variety of possible convergent locations.
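The steepest-ascent behavior described above can be imitated with a small sketch. This is not the adaptive wavetable oscillator's actual update law. The objective J(α, β) = Σ_k x[k] g(αk + β), with a periodic Gaussian pulse g, is an assumption, and so are the pulse width and the step sizes. Here α is the clock rate in cycles per tick, so α = 1/4 ticks once every four symbols, and β is the phase in cycles. On an idealized input with an event every four ticks, started near α = 1/4 with a phase error, the parameters climb back to α = 1/4, β = 0.

```python
import numpy as np

SIGMA = 0.1  # width of the Gaussian pulse, in cycles (assumed)


def wrap(phi):
    """Map a phase in cycles to the interval [-0.5, 0.5)."""
    return phi - np.floor(phi + 0.5)


def objective_and_gradient(x, alpha, beta):
    """J(alpha, beta) = sum_k x[k] * g(alpha*k + beta) and its partial derivatives,
    where g is a periodic Gaussian pulse with peak 1 at integer phases."""
    k = np.arange(len(x))
    d = wrap(alpha * k + beta)
    g = np.exp(-d ** 2 / (2 * SIGMA ** 2))
    dg = -d / SIGMA ** 2 * g               # dg/dphase
    return np.sum(x * g), np.sum(x * dg * k), np.sum(x * dg)


def adapt(x, alpha, beta, mu_alpha=1e-5, mu_beta=1e-3, steps=500):
    """Steepest ascent on J from the initial state (alpha, beta)."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        _, grad_alpha, grad_beta = objective_and_gradient(x, alpha, beta)
        alpha += mu_alpha * grad_alpha
        beta = (beta + mu_beta * grad_beta) % 1.0   # phase lives on a circle
    return alpha, beta


x = np.array([1, 0, 0, 0] * 4, dtype=float)   # idealized input: event every 4 ticks
alpha, beta = adapt(x, alpha=0.25, beta=0.1)
print(round(alpha, 4), round(float(wrap(beta)), 4))
```

Starting from other values of α and β lets the same loop wander to other local maxima, which is the multiplicity of convergent locations discussed above.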
In a pair of studies [1] and [2], Eck used a collection of FitzHugh-Nagumo oscillators to find downbeats within the Povel and Essens patterns. The frequency of the oscillators was fixed at the correct value (corresponding to ) and then the phases were initialized randomly. The oscillators were run for several cycles until the outputs settled into a steady state. For most of the patterns the majority of oscillators converged to the same phase. For two of the patterns (23 and 33) there were two solutions. This should come as no surprise since the error surface for pattern 33 (in Fig. 7) has two stable points at and . Though the adaptive wavetable oscillators differ in detail from Eck’s oscillators, they have the same kind of dynamics and hence the same kind of stable points.

[Plot: surface of the objective function; axes α (1/8 to 1/1) and β (0 to 1).]
Figure 6: The objective function for the adaptive wavetable oscillator applied to input pattern 11 from Fig. 1. The large peak near is the basic pulse rate of the low-level clock. Other peaks represent possible convergent values for other initializations.
[Plot: surface of the objective function; axes α (1/8 to 1/1) and β (0 to 1).]
Figure 7: The objective function for the adaptive wavetable oscillator applied to input pattern 33 from Fig. 1. As the patterns become harder, the objective functions become more complex, with a greater number of spurious peaks.
Eck suggests that the oscillator failure rate (the percentage of oscillators that
do not converge) might be used as a predictor of pattern complexity, since this
failure rate increases along with the complexity of the patterns. Eck also experimented by tying the oscillators together into a fully connected network. The
connected model is better at finding and synchronizing to the multiple periodic
components than the uncoupled oscillators. One of the main weaknesses of Eck’s approach is that the frequency (period) of the oscillators was prespecified. The parallel with adaptive wavetable oscillators suggests that this should not actually be required.
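A single FitzHugh-Nagumo relaxation oscillator, the building block of Eck’s model, can be simulated in a few lines. This sketch is not Eck’s network. It integrates one oscillator with Euler steps under a constant drive. The parameters (a = 0.7, b = 0.8, ε = 0.08, drive 0.5) are common textbook values chosen as assumptions, not values taken from [1] or [2]. Rhythmic input pulses could be added to `drive` to perturb the oscillator's phase, though how [1] and [2] couple the input is not reproduced here.

```python
import numpy as np


def fitzhugh_nagumo(drive, dt=0.05, a=0.7, b=0.8, eps=0.08, v0=-1.0, w0=1.0):
    """Euler integration of a FitzHugh-Nagumo oscillator:

        dv/dt = v - v**3 / 3 - w + I(t)
        dw/dt = eps * (v + a - b * w)

    `drive` holds I(t) sampled every dt. The parameter values are common
    textbook choices, not those used in [1] or [2].
    """
    v, w = v0, w0
    trace = np.empty(len(drive))
    for n, current in enumerate(drive):
        dv = v - v ** 3 / 3 - w + current
        dw = eps * (v + a - b * w)
        v, w = v + dt * dv, w + dt * dw
        trace[n] = v
    return trace


steps = 20000                        # 1000 time units at dt = 0.05
drive = np.full(steps, 0.5)          # constant input that sustains oscillation
v = fitzhugh_nagumo(drive)
settled = v[steps // 2:]             # discard the initial transient
print(settled.min(), settled.max())
```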
References
[1] D. Eck, “A network of relaxation oscillators that finds downbeats in rhythms,”
in G. Dorffner, ed. Artificial Neural Networks - ICANN 2001, 1239-1247,
Berlin, Springer, 2001.
[2] D. Eck, “Finding downbeats with a relaxation oscillator,” Psychological Research, 66(1), 18-25, 2002.
[3] R. Parncutt, “A perceptual model of pulse salience and metrical accent in
musical rhythms,” Music Perception, Vol. 11, No. 4, Summer 1994.
[4] D. J. Povel, “Internal representation of simple temporal patterns,” J. of Experimental Psychology, Vol. 7, No. 1, 3-18, 1981.
[5] D. J. Povel, “A theoretical framework for rhythmic perception,” Psychological
Research 45: 315-337, 1984.
[6] D. J. Povel and P. Essens, “Perception of temporal patterns,” Music Perception, Vol. 2, No. 4, 411-440, Summer 1985.
[7] Povel’s Sequences (povelN.mp3 0:20). The thirty-five sequences from Povel and Essens [6] are ordered from simplest to most complex. See Fig. 1.