Clock Induction

William A. Sethares
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706-1691 USA. 608-262-5669 sethares@ece.wisc.edu

March 19, 2006

Povel and Essens [6] hypothesize the presence of an internal clock that measures the passage of time. When hearing a rhythmic passage, the listener compares the sound to the beating of the internal clock. In this view, the essence of the perception lies in the relationship between the sound and the clock, between the regular succession within the sound and the regular succession of the internal clock. Like Parncutt’s [3] model of pulse salience, this is an attempt to model rhythmic perception using a regular succession (the clock) as a part of the cognitive (and/or perceptual) process.

Several kinds of clocks are possible. The simplest is an absolute clock that allows accurate time stamping of events; the times at which the events occur constitute the information that is remembered and subject to further processing. For example, if the internal clock has an accuracy of 1 ms, a sequence of sounds could be notated directly by the interonset intervals, the times between the starts of successive tones. A sequence notated this way can represent, for instance, an alternation between shorter and longer sounds.

The absolute clock model fails to explain several important aspects of perception, such as the perceived structural identity of sequences that are presented at different tempos. From an information-processing perspective, the absolute time-stamp model predicts that all sequences consisting of the same set of intervals should be processed equally well. In fact, simple arrangements of sounds are much easier to remember, duplicate, and predict than complex arrangements: one periodic sequence can be much simpler than another even though both contain exactly the same set of durations.

In a relative clock, the basic time unit is derived from the sequence itself.
For example, if “1” represents the 100 ms time interval, the sequences can be relabelled in multiples of that unit. In the relative clock, the basic unit of time is flexible and can accommodate changing tempos, since the coding is independent of the actual tempo of the pattern. But it also predicts that any two sequences with the same number of elements should be equally easy to conceptualize. This is incorrect, as shown by the two sequences compared above.

A third possibility is a hierarchical clock, which is most easily pictured as a collection of relative clocks operating at different rates simultaneously. For the two sequences above, one clock operates at the basic pulse rate of 100 ms and a second operates at a slower rate (every 400 ms) that defines a larger-scale hierarchical encoding of the sequence. Like the relative clock, the basic unit of time is flexible and can accommodate changing tempos. The higher levels are represented as regular groupings of the lower levels.

While real music would likely consist of several such hierarchical clocks, Povel and Essens [6] focus on simple sequences and ask fundamental questions about how the two clocks relate to each other. The lower level (faster) clock is assumed to operate at the correct rate. At what rate will the slower clock tick? Where in the sequence will it start? A given temporal pattern may have several possible clock rates, and each of these rates may begin at a variety of locations.

To see the problem, consider a sequence containing 12 low-level clock beats. This is (somewhat) compatible with higher level clocks with two, three, four, and six pulses per period.

[Diagram: the low-level clock and the sequence, aligned against the higher level clocks labelled Clock 2(a), Clock 4, Clock 6, Clock 2(b), and Clock 3.]

As shown, the clocks are labelled by their periodicity and phase: clocks 2(a) and 2(b) both have period two, and 2(b) is out of phase with 2(a). Since 2(a) aligns with four of the lower level events and 2(b) aligns with only one, 2(a) is preferred to 2(b).
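The comparison of candidate clocks by alignment can be sketched in a few lines of Python. This is only an illustration: the 12-tick pattern below is a hypothetical stand-in (it is not the sequence from the original diagram), the function names are invented for this sketch, and phases are numbered from zero rather than labelled (a) and (b).

```python
def grid_from_intervals(intervals):
    """Expand interonset intervals (in low-level ticks) into a 0/1 onset grid."""
    grid = []
    for n in intervals:
        grid.append(1)                 # an event starts each interval
        grid.extend([0] * (n - 1))     # the rest of the interval is silent
    return grid


def clock_hits(grid, period, phase):
    """Count the ticks of the (period, phase) clock that coincide with an event."""
    return sum(grid[t] for t in range(phase, len(grid), period))


def candidate_clocks(grid):
    """List (period, phase, hits) for every period that divides the grid length."""
    n = len(grid)
    out = []
    for period in range(2, n // 2 + 1):
        if n % period == 0:
            for phase in range(period):
                out.append((period, phase, clock_hits(grid, period, phase)))
    return out


# Hypothetical 12-tick pattern with events at ticks 0, 1, 2, 4, and 8.
example = grid_from_intervals([1, 1, 2, 4, 4])

for period, phase, hits in candidate_clocks(example):
    print(f"period {period}, phase {phase}: {hits} aligned events")
```

For this assumed pattern the in-phase period-two clock aligns with four events and the out-of-phase one with a single event, which mirrors the preference described above. Counting alignments alone does not settle the choice among the remaining clocks.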
But how is one to choose between other clocks that may align with some events and miss others? Povel and Essens were unable to find a simple rule that considers only the alignment (and misalignment) of the clock with the events. To proceed, they observe that some of the events appear accented: some are perceptually more important than others. This occurs even when the actual sound of each event is identical: it is the placement of the sounds in time that causes the accent. A tone becomes perceptually marked (accented) in a temporal sequence if:

(i) the tone is isolated
(ii) it is the second of a cluster of two tones
(iii) it is the initial or final tone of a cluster of three or more

Using these observations to recode the sequences, Povel and Essens propose a simple formula that considers all possible clocks and all possible phase shifts and rates them using a weighted sum of the number of clock ticks

(i) that coincide with accented events
(ii) that coincide with (unaccented) events
(iii) that occur during silence

This weighted sum provides a measure of the strength of the clock induced by the sequence; the clock with the largest measure is the one that best fits^1 with the input.

^1 Actually, the model of clock induction strength in [6] uses only (ii) and (iii) (counterevidence), though the authors believe that positive evidence (i) may play a role in the suggestion of possible clocks. This is not needed in their model because all possible clocks are rated.

Povel and Essens test the model using a collection of temporal patterns that consist of all possible rearrangements of sequences with 16 low-level clock ticks that have five interonset intervals with duration 1, two with duration 2, and one each with durations 3 and 4. The patterns are shown schematically in Fig. 1, ordered by the measure of clock induction with the easiest at the top and the hardest at the bottom. Subjects were asked to listen to the sequences until they could reproduce them by tapping.
The researchers recorded a number of features of the subjects’ performance, including

(i) the number of times the subjects chose to listen to the sequence before responding
(ii) the accuracy of the subjects’ reproduction of the intervals in the sequence.

Overall, the results agree with the predictions of the model. Subjects needed fewer hearings for the easier sequences and made smaller errors in the reproductions. It is a good idea to listen to a few of these patterns to hear (understand) the tasks that confronted the subjects; the thirty-five sequences are performed using a “stick” sound on the CD in [7].

[Figure 1: schematic of patterns 1 to 35, ordered from easier (top) to harder (bottom).]
Figure 1: Povel and Essens [6] rate these 35 patterns according to the degree to which they induce the best clock, which corresponds roughly to the complexity of the sequence. All thirty-five patterns may be heard in sound example [7]. Twenty-four test subjects were asked to listen to these sequences until they could reproduce them. In general, the sequences rated easier required fewer repetitions than those rated difficult. Moreover, subjects made smaller errors in the durations when reproducing the easier patterns than when reproducing the more complex patterns. Thus the sequences at the top are simpler and more comprehensible than those towards the bottom.

Subjects found considerable differences in the difficulty of the sequences. “Those stimuli experienced as simple seemed to organize themselves automatically and were remembered without any cognitive activity. For difficult sequences, subjects reported that they used various mnemonics such as assigning numbers to successive tones.” Thus easy patterns are those which have a strong relationship between the clocks at different hierarchical levels, with accented events occurring at (and helping to define the location of) pulses of the higher level clock. Harder patterns do not show this kind of organization.
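The accent rules and the counterevidence measure of Povel and Essens described earlier can be sketched as follows. Several things here are assumptions made for illustration rather than details taken from the text: a "cluster" is taken to be a run of events on adjacent low-level ticks, the pattern is treated as cyclic, the weights (4 for a tick on silence, 1 for a tick on an unaccented event) are illustrative parameter values, and the arrangement of intervals is a hypothetical one that merely uses the interval set of the test patterns.

```python
def grid_from_intervals(intervals):
    """Expand interonset intervals (in low-level ticks) into a 0/1 onset grid."""
    grid = []
    for n in intervals:
        grid.append(1)
        grid.extend([0] * (n - 1))
    return grid


def _mark(run, acc):
    """Apply the three accent rules to one cluster of adjacent events."""
    if len(run) == 1:            # (i) isolated tone
        acc[run[0]] = 1
    elif len(run) == 2:          # (ii) second tone of a pair
        acc[run[1]] = 1
    elif len(run) >= 3:          # (iii) first and last tone of a longer cluster
        acc[run[0]] = 1
        acc[run[-1]] = 1


def accents(grid):
    """Return a 0/1 list marking accented events on a cyclic grid."""
    if 0 not in grid:
        raise ValueError("sketch assumes at least one silent tick")
    n = len(grid)
    acc = [0] * n
    start = grid.index(0) + 1    # begin after a silence so no cluster is split
    run = []
    for k in range(n):
        i = (start + k) % n
        if grid[i]:
            run.append(i)
        else:
            _mark(run, acc)
            run = []
    return acc


def counterevidence(grid, acc, period, phase, w_silence=4, w_unaccented=1):
    """Weighted count of clock ticks on silence and on unaccented events."""
    ticks = range(phase, len(grid), period)
    silent = sum(1 for t in ticks if grid[t] == 0)
    unaccented = sum(1 for t in ticks if grid[t] == 1 and acc[t] == 0)
    return w_silence * silent + w_unaccented * unaccented


# Hypothetical arrangement of five 1s, two 2s, one 3, and one 4 (16 ticks).
pattern = grid_from_intervals([1, 1, 2, 1, 3, 1, 1, 2, 4])
acc = accents(pattern)
scores = {phase: counterevidence(pattern, acc, 4, phase) for phase in range(4)}
print(scores)
```

In this sketch a lower score means less counterevidence and hence a more strongly induced clock. For the assumed pattern, the period-four clock starting on the first event has by far the lowest score, which is the kind of pattern the model would rate as easy.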
Because these patterns have been thoroughly studied, they can be used to test a variety of other rhythmic identification methods. For instance, the autocorrelation method is applied to patterns 1, 11, and 33 (representative of an easy, a medium, and a hard pattern) in Fig. 2. The period 16 sequence is coded using Brown’s modified binary notation because this gives clearer pictures than either the binary notation or Povel’s accented binary encoding. Only the easy pattern shows a significant peak at 4, which represents a clock of length 4. The medium and hard patterns do not show any obvious peaks in the autocorrelation other than the peak at period 16, which represents the complete length of the sequence.

[Figure 2: panels (a), (b), (c); horizontal axis: period, 0 to 16.]
Figure 2: The autocorrelation method is applied to (a) pattern 1, (b) pattern 11, and (c) pattern 33 of Fig. 1. Only the easy pattern (a) shows a discernible peak at clock length 4.

Similarly, the FFT is applied to a periodic extension of patterns 1, 11, and 33 in Fig. 3. The sequence was periodically extended to give a more detailed picture, and Brown’s modified binary notation was used. As with the autocorrelation, it is possible to locate various periodicities in the data. On the other hand, these periodicities are buried among a large number of spurious peaks and features.

Finally, the periodicity transform (PT) is applied to patterns 1, 11, and 33 in Fig. 4. All of the patterns, whether easy or hard, show periodicities at sensible locations (2, 4, and 8). The strength of the PT is thus that it clearly depicts periodic features and suppresses those periods that are incommensurable with the input sequence. For example, unlike the autocorrelation and FFT methods, the PT shows nothing at periods 3 and 5. While the PT does a good job of picking out which clocks to consider (i.e., the consistent peaks at period 4), it does not do a good job of distinguishing the easy patterns from the difficult patterns.
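The autocorrelation and FFT analyses can be illustrated with NumPy. This is a minimal sketch under stated assumptions: the input is a hypothetical 16-tick pattern in plain binary onset coding (not Brown's modified binary notation, and not one of the numbered patterns of Fig. 1), the autocorrelation is taken circularly because the pattern repeats, and the function names are invented here.

```python
import numpy as np

# Hypothetical 16-tick pattern in plain binary coding (1 = onset, 0 = silence).
pattern = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0], dtype=float)


def circular_autocorrelation(x):
    """Correlate a periodic sequence with each of its cyclic shifts."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x, np.roll(x, lag)) for lag in range(len(x))])


def magnitude_spectrum(x, repeats=4):
    """Magnitude of the FFT of a periodic extension of the sequence."""
    extended = np.tile(np.asarray(x, dtype=float), repeats)
    return np.abs(np.fft.rfft(extended))


ac = circular_autocorrelation(pattern)
spec = magnitude_spectrum(pattern)

for lag in range(1, 9):
    print(f"lag {lag}: {ac[lag]:.0f}")

# With 4 repeats of a 16-tick pattern, a period of p ticks maps to bin 64 / p.
for period in (2, 4, 8, 16):
    print(f"period {period}: |X| = {spec[64 // period]:.2f}")
```

A peak in `ac` at lag 4 suggests a period-four clock, while the spectrum has to be read through the period-to-bin mapping, which is one reason its features are harder to interpret.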
[Figure 3: panels (a), (b), (c); horizontal axis: frequency (Hz), 0 to 12; marked peaks: period=4, period=2, period=1.]
Figure 3: The FFT is applied to a periodic extension of (a) pattern 1, (b) pattern 11, and (c) pattern 33 of Fig. 1. Peaks can be found at periods 1, 2, and 4, though they are hidden among a large number of features that have no obvious interpretation.

[Figure 4: panels (a), (b), (c); horizontal axis: period, 0 to 16.]
Figure 4: The periodicity transform is applied to a periodic extension of (a) pattern 1, (b) pattern 11, and (c) pattern 33 from Fig. 1. Peaks can be found at periods 2, 4, and 8 (and 16, the length of the sequence). Even the hardest of the patterns shows peaks at the clock rate of 4.

The input to the Povel and Essens model (and to the autocorrelation, the FFT, and the PT of this section) is a sequence of symbols representing a regular succession of abstracted sonic events. If such events can be accurately determined, then any of these methods can be readily applied. One case where this is possible is when the sound is a collection of clearly distinguishable pulses. For example, the sound files [7] that demonstrate the patterns of Fig. 1 contain isolated percussive hits at the specified time intervals. These can be readily detected by simple means, such as looking for peaks in the energy, and then parsed into symbolic sequences. But for typical musical scenarios such simple techniques are unreliable. The problem then becomes one of finding appropriate feature vectors that can extract the relevant information from the sound waveform.

The accurate transcription of soundwaves into timed events is essentially the problem of labelling events with an absolute clock. In order to parse a sound which includes the inevitable temporal irregularities that occur in music, there must be a way of clustering similar intervals. For example, if one interval is 198 ms and the next is 201 ms, these should be considered “the same.” But it is not easy to draw firm boundaries.
Would an interval of 240 ms also be “the same” interval or would it be a different one? In other words, what is needed is a flexible method of timing, that is, a relative clock. None of the symbolic processing techniques of this section sheds light on how to model the flexibility of a relative clock.

For the hierarchical clock model to be useful, there must be a way for the ear to find all the clocks: both the higher and the lower levels. Moreover, these clocks must be flexible relative clocks rather than rigid absolute models. More simply stated, a model of rhythmic perception must provide a way to determine the low level tatum as well as the higher levels such as the beat and measure.

Finding the Pulse with an Adaptive Oscillator

The oscillator models provide one way to find clocks that are capable of tracking changes in the tempo of a sound pattern, as long as the change is not too rapid. By using several adaptive oscillators with different starting values, it is possible to locate more than one level of the metrical hierarchy simultaneously, since the oscillators operate independently. While some authors have suggested that better performance might result from coupling the oscillators in some way, it is not obvious how to implement the coupling.

The next three figures show the objective functions for the adaptive wavetable oscillator when applied to patterns 1, 11, and 33. All pattern numbers in this section refer to the sequences in Fig. 1. These represent an easy, a medium, and a difficult pattern for the clock induction model of Povel and Essens, and the ratings are in good agreement with the tapping experiments. According to the model, the listeners synchronize their internal clock to the sound pattern. This is conceptually similar to the adaptive oscillators, which attempt to synchronize their phase and frequency parameters to the input sequence.

Fig. 5 shows the objective function for the Gaussian pulse model (other waveshapes have objective functions that look qualitatively similar) when applied to the easy pattern 1. The figure shows how the oscillator responds from all possible starting states.

[Figure 5: surface plot; axis α with ticks 1/8, 1/4, 1/2, 3/4, 1/1; axis β with ticks 0 to 1.]
Figure 5: The objective function for the adaptive wavetable oscillator applied to input pattern 1 from Fig. 1. Depending on the initialization, the algorithm may converge to any of its maxima, including maxima that correspond to clocks which repeat over multiple cycles.

The rate axis is normalized to the rate of one symbol per time unit. The phase parameter ranges from zero to one, corresponding to one complete cycle of the oscillator. The behavior of the oscillator can be read directly from the figure: from any given location the algorithm always drives the parameters in the direction of steepest ascent. Suppose, for example, that the rate is initialized near 1. For any initial phase, the oscillator will climb the hill leading to the same peak (observe that the figure “wraps around” in phase, so that the hill at one edge of the plot is the other side of the hill at the opposite edge). On the other hand, if the rate is initialized near that of a period two clock, there are two local maxima, and the oscillator may converge to either one.^2 The first represents a period two clock that is aligned with the starting pulse. The second represents a period two clock aligned with the second pulse of each pair.

^2 This second peak is almost hidden because its value is low, but the slope in all directions points to it as a local maximum.

Povel and Essens assumed that the proper clock was the one of length 4. When initialized near that rate, the adaptive oscillator converges to the expected clock phase. Other initializations may lead to other synchronizations, however: there are several further stable points, some of which represent clocks that require more than one pass through the period 16 cycle before repeating.

For comparison, Fig. 6 shows the objective function for the adaptive wavetable oscillator when the input is pattern 11, and Fig. 7 shows the objective function when the input is pattern 33. In all cases, there is a strong peak at the basic low level pulse rate, which is assumed to be known by the Povel and Essens model (and by other methods of symbol-level processing). Thus the oscillators are capable of finding this basic periodicity by themselves (subject to being initialized close enough to the correct rate) and of acting as the flexible relative clock. At one of the slower rates there is a unique peak for pattern 11, whereas pattern 33 also has an “off-beat” peak at the same rate. Overall, as the patterns get more difficult (according to the clock induction model), the objective functions become more complex and have more peaks. Thus the oscillators capture something of the “difficulty” of the patterns in the variety of possible convergent locations.

[Figure 6: surface plot; axis α with ticks 1/8, 1/4, 1/2, 3/4, 1/1; axis β with ticks 0 to 1.]
Figure 6: The objective function for the adaptive wavetable oscillator applied to input pattern 11 from Fig. 1. The large peak is at the basic pulse rate of the low-level clock. Other peaks represent possible convergent values for other initializations.

In a pair of studies [1] and [2], Eck used a collection of Fitzhugh-Nagumo oscillators to find downbeats within the Povel and Essens patterns. The frequency of the oscillators was fixed at the correct value, and then the phases were initialized randomly. The oscillators were run for several cycles until the outputs settled into a steady state. For most of the patterns the majority of oscillators converged to the same phase. For two of the patterns (23 and 33) there were two solutions. This should come as no surprise, since the error surface for pattern 33 (in Fig. 7) has two stable points at that rate. Though the adaptive wavetable oscillators are different in detail from Eck’s oscillators, they have the same kind of dynamics and hence the same kind of stable points.
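The shape of such objective surfaces can be explored with a simplified stand-in. The sketch below is not the adaptive wavetable oscillator itself: it assumes the objective is the average product of a binary input with a train of Gaussian pulses, uses a hypothetical 16-tick pattern rather than one of the numbered patterns of Fig. 1, and evaluates the surface on a grid instead of adapting the parameters by steepest ascent. The names `rate`, `phase`, and `width` are chosen for this illustration.

```python
import numpy as np

# Hypothetical 16-tick pattern in plain binary coding (1 = onset, 0 = silence).
pattern = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0], dtype=float)


def gaussian_pulse(s, width=0.05):
    """Pulse centred at oscillator position s = 0, wrapped onto [0, 1)."""
    d = np.minimum(s, 1.0 - s)            # wrapped distance from the pulse centre
    return np.exp(-0.5 * (d / width) ** 2)


def objective(x, rate, phase, cycles=8, width=0.05):
    """Average product of the repeated input with the oscillator output."""
    signal = np.tile(np.asarray(x, dtype=float), cycles)
    k = np.arange(len(signal))
    s = np.mod(rate * k + phase, 1.0)     # oscillator position at each tick
    return float(np.mean(signal * gaussian_pulse(s, width)))


def objective_surface(x, rates, phases):
    """Evaluate the objective on a grid of (rate, phase) starting states."""
    return np.array([[objective(x, a, b) for b in phases] for a in rates])


rates = np.linspace(0.1, 1.0, 46)
phases = np.linspace(0.0, 1.0, 41, endpoint=False)
surface = objective_surface(pattern, rates, phases)

i, j = np.unravel_index(np.argmax(surface), surface.shape)
print(f"largest value at rate {rates[i]:.2f}, phase {phases[j]:.3f}")
```

Under these assumptions a rate of 1 fires on every tick and so collects every event, while a rate of 1/4 scores best at the phase that places its pulses on the events spaced four ticks apart. A hill-climbing oscillator started near one of these points would be expected to settle there.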
[Figure 7: surface plot; axis α with ticks 1/8, 1/4, 1/2, 3/4, 1/1; axis β with ticks 0 to 1.]
Figure 7: The objective function for the adaptive wavetable oscillator applied to input pattern 33 from Fig. 1. As the patterns become harder, the objective functions become more complex, with a greater number of spurious peaks.

Eck suggests that the oscillator failure rate (the percentage of oscillators that do not converge) might be used as a predictor of pattern complexity, since this failure rate increases along with the complexity of the patterns. Eck also experimented with tying the oscillators together into a fully connected network. The connected model is better at finding and synchronizing to the multiple periodic components than the uncoupled oscillators. One of the main weaknesses of Eck’s approach is that the frequency (period) of the oscillators was prespecified. The parallel with adaptive wavetable oscillators suggests that this should not actually be required.

References

[1] D. Eck, “A network of relaxation oscillators that finds downbeats in rhythms,” in G. Dorffner, ed., Artificial Neural Networks - ICANN 2001, 1239-1247, Berlin, Springer, 2001.

[2] D. Eck, “Finding downbeats with a relaxation oscillator,” Psychological Research, 66(1), 18-25, 2002.

[3] R. Parncutt, “A perceptual model of pulse salience and metrical accent in musical rhythms,” Music Perception, Vol. 11, No. 4, Summer 1994.

[4] D. J. Povel, “Internal representation of simple temporal patterns,” J. of Experimental Psychology, Vol. 7, No. 1, 3-18, 1981.

[5] D. J. Povel, “A theoretical framework for rhythmic perception,” Psychological Research, 45, 315-337, 1984.

[6] D. J. Povel and P. Essens, “Perception of temporal patterns,” Music Perception, Vol. 2, No. 4, 411-440, Summer 1985.

[7] Povel’s Sequences (povelN.mp3, 0:20). The thirty-five sequences from Povel and Essens [6] are ordered from simplest to most complex. See Fig. 1.